Sample records for physical interaction datasets

  1. Comparative analysis and assessment of M. tuberculosis H37Rv protein-protein interaction datasets

    PubMed Central

    2011-01-01

    Background: M. tuberculosis is a formidable bacterial pathogen. There is thus an increasing demand for understanding the function and relationship of proteins in various strains of M. tuberculosis. Protein-protein interaction (PPI) data are crucial for this kind of knowledge. However, the quality of the main available M. tuberculosis PPI datasets is unclear, which hampers the effectiveness of research that relies on these datasets. Here, we analyze the two main available M. tuberculosis H37Rv PPI datasets. The first is the high-throughput B2H PPI dataset from Wang et al.'s recent paper in the Journal of Proteome Research. The second is from the STRING database, version 8.3, consisting entirely of H37Rv PPIs predicted using various methods. We find that these two datasets have a surprisingly low level of agreement. We postulate the following causes for this low level of agreement: (i) the H37Rv B2H PPI dataset is of low quality; (ii) the H37Rv STRING PPI dataset is of low quality; and/or (iii) the H37Rv STRING PPIs are predictions of other forms of functional association rather than direct physical interactions.

    Results: To test the quality of these two datasets, we evaluate them based on correlated gene expression profiles, coherent informative GO term annotations, and conservation in other organisms. We observe that a significantly greater portion of PPIs in the H37Rv STRING PPI dataset (with score ≥ 770) than in the H37Rv B2H PPI dataset have correlated gene expression profiles and coherent informative GO term annotations in both interaction partners. Predicted H37Rv interologs derived from non-M. tuberculosis experimental PPIs are much more similar to the H37Rv STRING functional associations dataset (with score ≥ 770) than to the H37Rv B2H PPI dataset. H37Rv physical interologs predicted from IntAct also show extremely low similarity with the H37Rv B2H PPI dataset; this similarity is much lower than that between S. aureus MRSA252 physical interologs predicted from IntAct and S. aureus MRSA252 pull-down PPIs. Comparative analysis with several representative two-hybrid PPI datasets in other species further confirms that the H37Rv B2H PPI dataset is of low quality. Next, to test the possibility that the H37Rv STRING PPIs are not purely direct physical interactions, we compare M. tuberculosis H37Rv protein pairs that catalyze adjacent steps in enzymatic reactions to both the B2H PPIs and the predicted PPIs in STRING; these pairs show much lower similarity with the B2H PPIs than with the STRING PPIs. This result strongly suggests that the H37Rv STRING PPIs correspond to indirect relationships between protein pairs rather than to B2H-style direct interactions. For more precise support, we turn to S. cerevisiae for its comprehensively studied interactome. We compare S. cerevisiae predicted PPIs in STRING to three independent protein relationship datasets, which respectively comprise PPIs reported in Y2H assays, protein pairs reported to be in the same protein complexes, and protein pairs that catalyze successive reaction steps in enzymatic reactions. Our analysis reveals that S. cerevisiae predicted STRING PPIs have much higher similarity to the latter two types of protein pairs than to two-hybrid PPIs. As the H37Rv STRING PPIs are predicted using similar methods as the S. cerevisiae predicted STRING PPIs, this suggests that the H37Rv STRING PPIs are likewise more likely to correspond to the latter two types of protein pairs than to two-hybrid PPIs.

    Conclusions: The H37Rv B2H PPI dataset has low quality. It should not be used as the gold standard to assess the quality of other (possibly predicted) H37Rv PPI datasets. The H37Rv STRING PPI dataset also has low quality; nevertheless, a subset consisting of STRING PPIs with score ≥ 770 has satisfactory quality. However, these STRING “PPIs” should be interpreted as functional associations, which include a substantial portion of indirect protein interactions, rather than as direct physical interactions. These two factors cause the strikingly low similarity between these two main H37Rv PPI datasets. The results and conclusions from this comparative analysis provide valuable guidance for using these M. tuberculosis H37Rv PPI datasets in subsequent studies for a wide range of purposes. PMID:22369691
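The "level of agreement" analysis above reduces to comparing two sets of unordered protein pairs. A minimal sketch of one such comparison (Jaccard similarity) follows; the dataset names echo the study, but the Rv identifiers below are illustrative placeholders, not real H37Rv interactions.

```python
# Sketch: agreement between two PPI datasets, each a collection of
# protein pairs. Interactions are undirected, so each pair is
# normalized to a frozenset before comparison.

def normalize(pairs):
    """Treat each interaction as an unordered pair."""
    return {frozenset(p) for p in pairs}

def jaccard(a, b):
    """Jaccard similarity of two interaction sets."""
    a, b = normalize(a), normalize(b)
    return len(a & b) / len(a | b) if a | b else 0.0

# illustrative placeholder pairs
b2h    = [("Rv0001", "Rv0002"), ("Rv0003", "Rv0004")]
string = [("Rv0002", "Rv0001"), ("Rv0005", "Rv0006")]
print(jaccard(b2h, string))  # 1 shared pair out of 3 distinct -> 0.333...
```

The same overlap computation underlies the interolog comparisons in the abstract; only the provenance of the pair sets changes.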

  2. GODIVA2: interactive visualization of environmental data on the Web.

    PubMed

    Blower, J D; Haines, K; Santokhee, A; Liu, C L

    2009-03-13

    GODIVA2 is a dynamic website that provides visual access to several terabytes of physically distributed, four-dimensional environmental data. It allows users to explore large datasets interactively without the need to install new software or download and understand complex data. Through the use of open international standards, GODIVA2 maintains a high level of interoperability with third-party systems, allowing diverse datasets to be mutually compared. Scientists can use the system to search for features in large datasets and to diagnose the output from numerical simulations and data processing algorithms. Data providers around Europe have adopted GODIVA2 as an INSPIRE-compliant dynamic quick-view system for providing visual access to their data.

  3. The New LASP Interactive Solar IRradiance Datacenter (LISIRD)

    NASA Astrophysics Data System (ADS)

    Baltzer, T.; Wilson, A.; Lindholm, D. M.; Snow, M. A.; Woodraska, D.; Pankratz, C. K.

    2017-12-01

    The University of Colorado at Boulder's Laboratory for Atmospheric and Space Physics (LASP) has a long history of providing state-of-the-art solar instrumentation and datasets to the community. In 2005, LASP created a web interface called LISIRD which provided plotting of and access to a number of measured and modeled solar irradiance datasets, and it has been used extensively by members of the community both within and outside of LASP. In August of 2017, LASP is set to release a new version of LISIRD for use by anyone interested in viewing and downloading the datasets it serves. This talk will describe the new LISIRD, with emphasis on the features it enables: new and more functional plotting interfaces; better dataset browse and search capabilities; more datasets; easier addition of datasets from a wider array of resources; a cleaner interface with better use of screen real estate; and much easier updating of the metadata describing each dataset. Much of this capability is leveraged off new infrastructure that will also be touched upon.

  4. HitPredict version 4: comprehensive reliability scoring of physical protein-protein interactions from more than 100 species.

    PubMed

    López, Yosvany; Nakai, Kenta; Patil, Ashwini

    2015-01-01

    HitPredict is a consolidated resource of experimentally identified, physical protein-protein interactions with confidence scores to indicate their reliability. The study of genes and their inter-relationships using methods such as network and pathway analysis requires high quality protein-protein interaction information. Extracting reliable interactions from most of the existing databases is challenging because they either contain only a subset of the available interactions, or a mixture of physical, genetic and predicted interactions. Automated integration of interactions is further complicated by varying levels of accuracy of database content and lack of adherence to standard formats. To address these issues, the latest version of HitPredict provides a manually curated dataset of 398 696 physical associations between 70 808 proteins from 105 species. Manual confirmation was used to resolve all issues encountered during data integration. For improved reliability assessment, this version combines a new score derived from the experimental information of the interactions with the original score based on the features of the interacting proteins. The combined interaction score performs better than either of the individual scores in HitPredict as well as the reliability score of another similar database. HitPredict provides a web interface to search proteins and visualize their interactions, and the data can be downloaded for offline analysis. Data usability has been enhanced by mapping protein identifiers across multiple reference databases. Thus, the latest version of HitPredict provides a significantly larger, more reliable and usable dataset of protein-protein interactions from several species for the study of gene groups. Database URL: http://hintdb.hgc.jp/htp. © The Author(s) 2015. Published by Oxford University Press.

  5. Inferring Mechanisms of Compensation from E-MAP and SGA Data Using Local Search Algorithms for Max Cut

    NASA Astrophysics Data System (ADS)

    Leiserson, Mark D. M.; Tatar, Diana; Cowen, Lenore J.; Hescott, Benjamin J.

    A new method based on a mathematically natural local search framework for max cut is developed to uncover functionally coherent module and BPM motifs in high-throughput genetic interaction data. Unlike previous methods, which also consider physical protein-protein interaction data, our method utilizes genetic interaction data only; this becomes increasingly important as high-throughput genetic interaction data is becoming available in settings where less is known about physical interaction data. We compare modules and BPMs obtained to previous methods and across different datasets. Despite needing no physical interaction information, the BPMs produced by our method are competitive with previous methods. Biological findings include a suggested global role for the prefoldin complex and a SWR subcomplex in pathway buffering in the budding yeast interactome.

  6. Inferring mechanisms of compensation from E-MAP and SGA data using local search algorithms for max cut.

    PubMed

    Leiserson, Mark D M; Tatar, Diana; Cowen, Lenore J; Hescott, Benjamin J

    2011-11-01

    A new method based on a mathematically natural local search framework for max cut is developed to uncover functionally coherent module and BPM motifs in high-throughput genetic interaction data. Unlike previous methods, which also consider physical protein-protein interaction data, our method utilizes genetic interaction data only; this becomes increasingly important as high-throughput genetic interaction data is becoming available in settings where less is known about physical interaction data. We compare modules and BPMs obtained to previous methods and across different datasets. Despite needing no physical interaction information, the BPMs produced by our method are competitive with previous methods. Biological findings include a suggested global role for the prefoldin complex and a SWR subcomplex in pathway buffering in the budding yeast interactome.

  7. Inferring Mechanisms of Compensation from E-MAP and SGA Data Using Local Search Algorithms for Max Cut

    PubMed Central

    Leiserson, Mark D.M.; Tatar, Diana; Cowen, Lenore J.

    2011-01-01

    Abstract A new method based on a mathematically natural local search framework for max cut is developed to uncover functionally coherent module and BPM motifs in high-throughput genetic interaction data. Unlike previous methods, which also consider physical protein-protein interaction data, our method utilizes genetic interaction data only; this becomes increasingly important as high-throughput genetic interaction data is becoming available in settings where less is known about physical interaction data. We compare modules and BPMs obtained to previous methods and across different datasets. Despite needing no physical interaction information, the BPMs produced by our method are competitive with previous methods. Biological findings include a suggested global role for the prefoldin complex and a SWR subcomplex in pathway buffering in the budding yeast interactome. PMID:21882903
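The local search framework named in the three records above can be illustrated by the classic single-flip heuristic for max cut: move a vertex to the other side whenever doing so increases the total weight of cut edges. This is a sketch of the general framework only, under the assumption of a simple weighted edge list; it is not the paper's exact algorithm.

```python
# Single-flip local search for max cut: repeatedly flip any vertex whose
# flip increases the cut weight, until no improving flip exists.

def local_search_max_cut(n, edges):
    """n vertices; edges: list of (u, v, weight). Returns (side, cut_weight)."""
    side = [0] * n  # start with every vertex on one side

    def gain(v):
        # change in cut weight if vertex v switches sides
        g = 0
        for u, w, wt in edges:
            if v in (u, w):
                other = w if v == u else u
                g += wt if side[other] == side[v] else -wt
        return g

    improved = True
    while improved:
        improved = False
        for v in range(n):
            if gain(v) > 0:
                side[v] ^= 1
                improved = True
    cut = sum(wt for u, v, wt in edges if side[u] != side[v])
    return side, cut

# a triangle: any bipartition cuts at most 2 of the 3 edges
print(local_search_max_cut(3, [(0, 1, 1), (1, 2, 1), (0, 2, 1)])[1])  # 2
```

Local search guarantees only a local optimum, which is why such methods are typically analyzed for the quality of the motifs they recover rather than for global optimality.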

  8. CHiCP: a web-based tool for the integrative and interactive visualization of promoter capture Hi-C datasets.

    PubMed

    Schofield, E C; Carver, T; Achuthan, P; Freire-Pritchett, P; Spivakov, M; Todd, J A; Burren, O S

    2016-08-15

    Promoter capture Hi-C (PCHi-C) allows the genome-wide interrogation of physical interactions between distal DNA regulatory elements and gene promoters in multiple tissue contexts. Visual integration of the resultant chromosome interaction maps with other sources of genomic annotations can provide insight into underlying regulatory mechanisms. We have developed Capture HiC Plotter (CHiCP), a web-based tool that allows interactive exploration of PCHi-C interaction maps and integration with both public and user-defined genomic datasets. CHiCP is freely accessible from www.chicp.org and supports most major HTML5 compliant web browsers. Full source code and installation instructions are available from http://github.com/D-I-L/django-chicp ob219@cam.ac.uk. © The Author 2016. Published by Oxford University Press. All rights reserved.

  9. Application of Seasonal CRM Integrations to Develop Statistics and Improved GCM Parameterization of Subgrid Cloud-Radiation Interactions

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Xiaoqing Wu; Xin-Zhong Liang; Sunwook Park

    2007-01-23

    The work supported by this ARM project lays a solid foundation for improving the parameterization of subgrid cloud-radiation interactions in the NCAR CCSM and the resulting climate simulations. We have made significant use of CRM simulations and concurrent ARM observations to produce long-term, consistent cloud and radiative property datasets at the cloud scale (Wu et al. 2006, 2007). With these datasets, we have investigated the mesoscale enhancement of surface heat fluxes by cloud systems (Wu and Guimond 2006), quantified the effects of cloud horizontal inhomogeneity and vertical overlap on the domain-averaged radiative fluxes (Wu and Liang 2005), and subsequently validated and improved the physically based mosaic treatment of subgrid cloud-radiation interactions (Liang and Wu 2005). We have implemented the mosaic treatment in the CCM3. The 5-year (1979-1983) AMIP-type simulation showed significant impacts of subgrid cloud-radiation interaction on the climate simulations (Wu and Liang 2005). We have actively participated in CRM intercomparisons that foster the identification and physical understanding of common errors in cloud-scale modeling (Xie et al. 2005; Xu et al. 2005; Grabowski et al. 2005).

  10. Big Data in HEP: A comprehensive use case study

    NASA Astrophysics Data System (ADS)

    Gutsche, Oliver; Cremonesi, Matteo; Elmer, Peter; Jayatilaka, Bo; Kowalkowski, Jim; Pivarski, Jim; Sehrish, Saba; Mantilla Surez, Cristina; Svyatkovskiy, Alexey; Tran, Nhan

    2017-10-01

    Experimental Particle Physics has been at the forefront of analyzing the world's largest datasets for decades. The HEP community was the first to develop suitable software and computing tools for this task. In recent times, new toolkits and systems, collectively called Big Data technologies, have emerged to support the analysis of Petabyte and Exabyte datasets in industry. While the principles of data analysis in HEP have not changed (filtering and transforming experiment-specific data formats), these new technologies use different approaches, promise a fresh look at the analysis of very large datasets, and could potentially reduce the time-to-physics through increased interactivity. In this talk, we present an active LHC Run 2 analysis, searching for dark matter with the CMS detector, as a testbed for Big Data technologies. We directly compare the traditional NTuple-based analysis with an equivalent analysis using Apache Spark on the Hadoop ecosystem and beyond. In both cases, we start the analysis with the official experiment data formats and produce publication-quality physics plots. We will discuss the advantages and disadvantages of each approach and give an outlook on further studies needed.
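The "filtering and transforming" principle mentioned in the abstract above is toolkit-independent and can be sketched in a few lines. The event fields and cut values below are illustrative placeholders, not CMS's actual data format or selection.

```python
# Toy sketch of the filter/transform pattern shared by NTuple- and
# Spark-style HEP analyses: select events passing cuts, then derive a
# summary quantity (here, a crude missing-energy histogram).

events = [
    {"met": 250.0, "njets": 3},
    {"met": 80.0,  "njets": 2},
    {"met": 310.0, "njets": 4},
]

# selection (filter) ...
selected = [e for e in events if e["met"] > 200.0 and e["njets"] >= 3]

# ... then a transform: bin the selected events into a histogram
bins = [0, 100, 200, 300, 400]
hist = [0] * (len(bins) - 1)
for e in selected:
    for i in range(len(hist)):
        if bins[i] <= e["met"] < bins[i + 1]:
            hist[i] += 1
print(hist)  # [0, 0, 1, 1]
```

In a Spark-based version the list comprehension would become distributed `filter`/`map` stages over the same logical dataflow; the physics content of the cuts is unchanged.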

  11. GUDM: Automatic Generation of Unified Datasets for Learning and Reasoning in Healthcare.

    PubMed

    Ali, Rahman; Siddiqi, Muhammad Hameed; Idris, Muhammad; Ali, Taqdir; Hussain, Shujaat; Huh, Eui-Nam; Kang, Byeong Ho; Lee, Sungyoung

    2015-07-02

    A wide array of biomedical data are generated and made available to healthcare experts. However, due to the diverse nature of these data, it is difficult to predict outcomes from them. It is therefore necessary to combine these diverse data sources into a single unified dataset. This paper proposes a global unified data model (GUDM) to provide a global unified data structure for all data sources and to generate a unified dataset with a "data modeler" tool. The proposed tool implements a user-centric, priority-based approach which can easily resolve the problems of unified data modeling and of overlapping attributes across multiple datasets. The tool is illustrated using sample diabetes mellitus data. The diverse data sources used to generate the unified dataset for diabetes mellitus include clinical trial information, a social media interaction dataset, and physical activity data collected using different sensors. To demonstrate the significance of the unified dataset, we adopted a well-known rough set theory based rule creation process to create rules from it. The evaluation of the tool on six different sets of locally created diverse datasets shows that the tool, on average, reduces the time effort of experts and knowledge engineers by 94.1% when creating unified datasets.
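The "priority-based approach" for overlapping attributes described above can be sketched as a merge in which higher-priority sources win attribute conflicts. The source names, record fields, and priority ordering below are hypothetical illustrations, not GUDM's actual schema.

```python
# Hedged sketch of a priority-based unification of overlapping
# attributes: apply sources from lowest to highest priority, so a
# higher-priority source overwrites any attribute it shares with a
# lower-priority one.

def unify(records_by_source, priority):
    """records_by_source: {source_name: {attribute: value}};
    priority: source names ordered from highest to lowest priority."""
    unified = {}
    for source in reversed(priority):                      # low priority first,
        unified.update(records_by_source.get(source, {}))  # high overwrites
    return unified

sources = {
    "clinical":     {"patient_id": "p1", "glucose": 7.2},
    "social_media": {"patient_id": "p1", "mood": "low"},
    "sensor":       {"steps": 4200, "glucose": 6.9},
}
merged = unify(sources, priority=["clinical", "sensor", "social_media"])
print(merged["glucose"])  # 7.2 -- the clinical source outranks the sensor
```

Non-overlapping attributes (here `mood` and `steps`) pass through untouched, so the unified record is a superset of every source.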

  12. GUDM: Automatic Generation of Unified Datasets for Learning and Reasoning in Healthcare

    PubMed Central

    Ali, Rahman; Siddiqi, Muhammad Hameed; Idris, Muhammad; Ali, Taqdir; Hussain, Shujaat; Huh, Eui-Nam; Kang, Byeong Ho; Lee, Sungyoung

    2015-01-01

    A wide array of biomedical data are generated and made available to healthcare experts. However, due to the diverse nature of these data, it is difficult to predict outcomes from them. It is therefore necessary to combine these diverse data sources into a single unified dataset. This paper proposes a global unified data model (GUDM) to provide a global unified data structure for all data sources and to generate a unified dataset with a “data modeler” tool. The proposed tool implements a user-centric, priority-based approach which can easily resolve the problems of unified data modeling and of overlapping attributes across multiple datasets. The tool is illustrated using sample diabetes mellitus data. The diverse data sources used to generate the unified dataset for diabetes mellitus include clinical trial information, a social media interaction dataset, and physical activity data collected using different sensors. To demonstrate the significance of the unified dataset, we adopted a well-known rough set theory based rule creation process to create rules from it. The evaluation of the tool on six different sets of locally created diverse datasets shows that the tool, on average, reduces the time effort of experts and knowledge engineers by 94.1% when creating unified datasets. PMID:26147731

  13. Spark and HPC for High Energy Physics Data Analyses

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sehrish, Saba; Kowalkowski, Jim; Paterno, Marc

    A full High Energy Physics (HEP) data analysis is divided into multiple data reduction phases. Processing within these phases is extremely time-consuming, so intermediate results are stored in files held in mass storage systems and referenced as part of large datasets. This processing model limits what can be done with interactive data analytics. Growth in the size and complexity of experimental datasets, along with emerging big data tools, is beginning to change the traditional ways of doing data analyses. The use of big data tools for HEP analysis looks promising, mainly because extremely large HEP datasets can be represented and held in memory across a system, and accessed interactively by encoding an analysis using high-level programming abstractions. The mainstream tools, however, are not designed for scientific computing or for exploiting the available HPC platform features. We use an example from the Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider (LHC) in Geneva, Switzerland. The LHC is the highest-energy particle collider in the world. Our use case focuses on searching for new types of elementary particles explaining Dark Matter in the universe. We use HDF5 as our input data format, and Spark to implement the use case. We show the benefits and limitations of using Spark with HDF5 on Edison at NERSC.

  14. Enhancer Sharing Promotes Neighborhoods of Transcriptional Regulation Across Eukaryotes

    PubMed Central

    Quintero-Cadena, Porfirio; Sternberg, Paul W.

    2016-01-01

    Enhancers physically interact with transcriptional promoters, looping over distances that can span multiple regulatory elements. Given that enhancer–promoter (EP) interactions generally occur via common protein complexes, it is unclear whether EP pairing is predominantly deterministic or proximity guided. Here, we present cross-organismic evidence suggesting that most EP pairs are compatible, largely determined by physical proximity rather than specific interactions. By reanalyzing transcriptome datasets, we find that the transcription of gene neighbors is correlated over distances that scale with genome size. We experimentally show that nonspecific EP interactions can explain such correlation, and that EP distance acts as a scaling factor for the transcriptional influence of an enhancer. We propose that enhancer sharing is commonplace among eukaryotes, and that EP distance is an important layer of information in gene regulation. PMID:27799341

  15. Big Data in HEP: A comprehensive use case study

    DOE PAGES

    Gutsche, Oliver; Cremonesi, Matteo; Elmer, Peter; ...

    2017-11-23

    Experimental Particle Physics has been at the forefront of analyzing the world's largest datasets for decades. The HEP community was the first to develop suitable software and computing tools for this task. In recent times, new toolkits and systems, collectively called Big Data technologies, have emerged to support the analysis of Petabyte and Exabyte datasets in industry. While the principles of data analysis in HEP have not changed (filtering and transforming experiment-specific data formats), these new technologies use different approaches, promise a fresh look at the analysis of very large datasets, and could potentially reduce the time-to-physics through increased interactivity. In this talk, we present an active LHC Run 2 analysis, searching for dark matter with the CMS detector, as a testbed for Big Data technologies. We directly compare the traditional NTuple-based analysis with an equivalent analysis using Apache Spark on the Hadoop ecosystem and beyond. In both cases, we start the analysis with the official experiment data formats and produce publication-quality physics plots. Lastly, we will discuss the advantages and disadvantages of each approach and give an outlook on further studies needed.

  16. Big Data in HEP: A comprehensive use case study

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gutsche, Oliver; Cremonesi, Matteo; Elmer, Peter

    Experimental Particle Physics has been at the forefront of analyzing the world's largest datasets for decades. The HEP community was the first to develop suitable software and computing tools for this task. In recent times, new toolkits and systems, collectively called Big Data technologies, have emerged to support the analysis of Petabyte and Exabyte datasets in industry. While the principles of data analysis in HEP have not changed (filtering and transforming experiment-specific data formats), these new technologies use different approaches, promise a fresh look at the analysis of very large datasets, and could potentially reduce the time-to-physics through increased interactivity. In this talk, we present an active LHC Run 2 analysis, searching for dark matter with the CMS detector, as a testbed for Big Data technologies. We directly compare the traditional NTuple-based analysis with an equivalent analysis using Apache Spark on the Hadoop ecosystem and beyond. In both cases, we start the analysis with the official experiment data formats and produce publication-quality physics plots. Lastly, we will discuss the advantages and disadvantages of each approach and give an outlook on further studies needed.

  17. Long-term dataset on aquatic responses to concurrent climate change and recovery from acidification

    NASA Astrophysics Data System (ADS)

    Leach, Taylor H.; Winslow, Luke A.; Acker, Frank W.; Bloomfield, Jay A.; Boylen, Charles W.; Bukaveckas, Paul A.; Charles, Donald F.; Daniels, Robert A.; Driscoll, Charles T.; Eichler, Lawrence W.; Farrell, Jeremy L.; Funk, Clara S.; Goodrich, Christine A.; Michelena, Toby M.; Nierzwicki-Bauer, Sandra A.; Roy, Karen M.; Shaw, William H.; Sutherland, James W.; Swinton, Mark W.; Winkler, David A.; Rose, Kevin C.

    2018-04-01

    Concurrent regional and global environmental changes are affecting freshwater ecosystems. Decadal-scale data on lake ecosystems that can describe processes affected by these changes are important as multiple stressors often interact to alter the trajectory of key ecological phenomena in complex ways. Due to the practical challenges associated with long-term data collections, the majority of existing long-term data sets focus on only a small number of lakes or few response variables. Here we present physical, chemical, and biological data from 28 lakes in the Adirondack Mountains of northern New York State. These data span the period from 1994-2012 and harmonize multiple open and as-yet unpublished data sources. The dataset creation is reproducible and transparent; R code and all original files used to create the dataset are provided in an appendix. This dataset will be useful for examining ecological change in lakes undergoing multiple stressors.

  18. Comparative visualization of genetic and physical maps with Strudel.

    PubMed

    Bayer, Micha; Milne, Iain; Stephen, Gordon; Shaw, Paul; Cardle, Linda; Wright, Frank; Marshall, David

    2011-05-01

    Data visualization can play a key role in comparative genomics, for example, underpinning the investigation of conserved synteny patterns. Strudel is a desktop application that allows users to easily compare both genetic and physical maps interactively and efficiently. It can handle large datasets from several genomes simultaneously, and allows all-by-all comparisons between these. Installers for Strudel are available for Windows, Linux, Solaris and Mac OS X at http://bioinf.scri.ac.uk/strudel/.

  19. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Li, Song

    CFD (Computational Fluid Dynamics) is a widely used technique in the engineering design field. It uses mathematical methods to simulate and predict flow characteristics in a certain physical space. Because the numerical results of CFD computations are very hard to understand, VR (virtual reality) and data visualization techniques are introduced into CFD post-processing to improve the understandability and functionality of the computation. In many cases CFD datasets are very large (multi-gigabyte), and more and more interaction between the user and the datasets is required. For traditional VR applications, limited computing power is a major factor preventing large datasets from being visualized effectively. This thesis presents a new system design that speeds up the traditional VR application by using parallel and distributed computing, along with the idea of using handheld devices to enhance the interaction between a user and a VR CFD application. Techniques from different research areas, including scientific visualization, parallel computing, distributed computing, and graphical user interface design, are used in the development of the final system. As a result, the new system can be flexibly built on a heterogeneous computing environment and dramatically shortens the computation time.

  20. Solar Irradiance Data Products at the LASP Interactive Solar IRradiance Datacenter (LISIRD)

    NASA Astrophysics Data System (ADS)

    Lindholm, D. M.; Ware DeWolfe, A.; Wilson, A.; Pankratz, C. K.; Snow, M. A.; Woods, T. N.

    2011-12-01

    The Laboratory for Atmospheric and Space Physics (LASP) has developed the LASP Interactive Solar IRradiance Datacenter (LISIRD, http://lasp.colorado.edu/lisird/) web site to provide access to a comprehensive set of solar irradiance measurements and related datasets. Current data holdings include products from NASA missions SORCE, UARS, SME, and TIMED-SEE. The data provided covers a wavelength range from soft X-ray (XUV) at 0.1 nm up to the near infrared (NIR) at 2400 nm, as well as Total Solar Irradiance (TSI). Other datasets include solar indices, spectral and flare models, solar images, and more. The LISIRD web site features updated plotting, browsing, and download capabilities enabled by dygraphs, JavaScript, and Ajax calls to the LASP Time Series Server (LaTiS). In addition to the web browser interface, most of the LISIRD datasets can be accessed via the LaTiS web service interface that supports the OPeNDAP standard. OPeNDAP clients and other programming APIs are available for making requests that subset, aggregate, or filter data on the server before it is transported to the user. This poster provides an overview of the LISIRD system, summarizes the datasets currently available, and provides details on how to access solar irradiance data products through LISIRD's interfaces.
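The OPeNDAP-style access described above lets clients subset, aggregate, or filter on the server by encoding a constraint in the request URL. The sketch below builds such a URL; the host, dataset name, variables, and exact constraint syntax are hypothetical illustrations, and real LISIRD/LaTiS endpoints may differ.

```python
# Hedged sketch: assemble an OPeNDAP-style request URL with a projection
# (which variables to return) and selections (server-side filters), so the
# server subsets the data before transport.

def opendap_url(base, dataset, variables, selections):
    """Build '<base>/<dataset>.csv?<projection>&<selection>...'."""
    projection = ",".join(variables)           # variables to project
    constraint = "&".join([projection] + selections)  # append filters
    return f"{base}/{dataset}.csv?{constraint}"

# hypothetical endpoint and dataset name, for illustration only
url = opendap_url(
    "http://example.org/latis", "sorce_tsi",
    ["time", "irradiance"],
    ["time>=2003-01-01", "time<2004-01-01"],
)
print(url)
```

The point of the pattern is that the subsetting cost is paid server-side: only the one-year, two-variable slice crosses the network, not the full dataset.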

  1. Comparative visualization of genetic and physical maps with Strudel

    PubMed Central

    Bayer, Micha; Milne, Iain; Stephen, Gordon; Shaw, Paul; Cardle, Linda; Wright, Frank; Marshall, David

    2011-01-01

    Summary: Data visualization can play a key role in comparative genomics, for example, underpinning the investigation of conserved synteny patterns. Strudel is a desktop application that allows users to easily compare both genetic and physical maps interactively and efficiently. It can handle large datasets from several genomes simultaneously, and allows all-by-all comparisons between these. Availability and implementation: Installers for Strudel are available for Windows, Linux, Solaris and Mac OS X at http://bioinf.scri.ac.uk/strudel/. Contact: strudel@scri.ac.uk; micha.bayer@scri.ac.uk PMID:21372085

  2. Optimizing tertiary storage organization and access for spatio-temporal datasets

    NASA Technical Reports Server (NTRS)

    Chen, Ling Tony; Rotem, Doron; Shoshani, Arie; Drach, Bob; Louis, Steve; Keating, Meridith

    1994-01-01

    We address in this paper data management techniques for efficiently retrieving requested subsets of large datasets stored on mass storage devices. This problem represents a major bottleneck that can negate the benefits of fast networks, because the time to access a subset from a large dataset stored on a mass storage system is much greater than the time to transmit that subset over a network. This paper focuses on very large spatial and temporal datasets generated by simulation programs in the area of climate modeling, but the techniques developed can be applied to other applications that deal with large multidimensional datasets. The main requirement we have addressed is the efficient access of subsets of information contained within much larger datasets, for the purpose of analysis and interactive visualization. We have developed data partitioning techniques that partition datasets into 'clusters' based on analysis of data access patterns and storage device characteristics. The goal is to minimize the number of clusters read from mass storage systems when subsets are requested. We emphasize in this paper proposed enhancements to current storage server protocols to permit control over the physical placement of data on storage devices. We also discuss in some detail the interface between the application programs and the mass storage system, as well as a workbench to help scientists design the best reorganization of a dataset for anticipated access patterns.
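The core of cluster-based subsetting is deciding which stored clusters a requested subset touches, since only those need to be read from mass storage. A minimal one-dimensional sketch follows, assuming fixed-size clusters along a single axis (the paper's clusters are derived from access patterns and device characteristics, so this is a simplification).

```python
# Sketch: map a requested half-open index range [start, stop) onto the
# fixed-size clusters that overlap it. Reading only these clusters
# minimizes mass-storage traffic for the request.

def clusters_for_range(start, stop, cluster_size):
    """Indices of clusters overlapping the half-open range [start, stop)."""
    if stop <= start:
        return []
    first = start // cluster_size
    last = (stop - 1) // cluster_size
    return list(range(first, last + 1))

# a request for elements [250, 520) of a dataset stored in clusters of 100
print(clusters_for_range(250, 520, 100))  # [2, 3, 4, 5]
```

A multidimensional dataset applies the same computation per axis and reads the Cartesian product of the per-axis cluster index lists, which is why aligning cluster shapes with anticipated access patterns matters so much.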

  3. MCL-CAw: a refinement of MCL for detecting yeast complexes from weighted PPI networks by incorporating core-attachment structure

    PubMed Central

    2010-01-01

    Background The reconstruction of protein complexes from the physical interactome of organisms serves as a building block towards understanding the higher-level organization of the cell. Over the past few years, several independent high-throughput experiments have helped to catalogue an enormous amount of physical protein interaction data from organisms such as yeast. However, these individual datasets show a lack of correlation with each other and also contain a substantial number of false positives (noise). Several affinity scoring schemes have also been devised to improve the quality of these datasets. Therefore, the challenge now is to detect meaningful as well as novel complexes from protein interaction (PPI) networks derived by combining datasets from multiple sources and by making use of these affinity scoring schemes. In attempts to tackle this challenge, the Markov Clustering algorithm (MCL) has proved to be a popular and reasonably successful method, mainly due to its scalability, robustness, and ability to work on scored (weighted) networks. However, MCL produces many noisy clusters, which either do not match known complexes or have additional proteins that reduce the accuracies of correctly predicted complexes. Results Inspired by recent experimental observations by Gavin and colleagues on the modularity structure in yeast complexes and the distinctive properties of "core" and "attachment" proteins, we develop a core-attachment based refinement method coupled to MCL for the reconstruction of yeast complexes from scored (weighted) PPI networks. We combine physical interactions from two recent "pull-down" experiments to generate an unscored PPI network. We then score this network using available affinity scoring schemes to generate multiple scored PPI networks.
The evaluation of our method (called MCL-CAw) on these networks shows that: (i) MCL-CAw derives a larger number of yeast complexes, with better accuracy, than MCL, particularly in the presence of natural noise; (ii) affinity scoring can effectively reduce the impact of noise on MCL-CAw and thereby improve the quality (precision and recall) of its predicted complexes; (iii) MCL-CAw responds well to most available scoring schemes. We discuss several instances where MCL-CAw was successful in deriving meaningful complexes, and where it missed a few proteins or whole complexes due to affinity scoring of the networks. We compare MCL-CAw with several recent complex detection algorithms on unscored and scored networks, and assess the relative performance of the algorithms on these networks. Further, we study the impact of augmenting physical datasets with computationally inferred interactions for complex detection. Finally, we analyse the essentiality of proteins within predicted complexes to understand a possible correlation between protein essentiality and the ability of proteins to form complexes. Conclusions We demonstrate that core-attachment based refinement in MCL-CAw improves the predictions of MCL on yeast PPI networks. We show that affinity scoring improves the performance of MCL-CAw. PMID:20939868
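The base Markov Clustering procedure that MCL-CAw refines alternates two steps on a column-stochastic matrix: expansion (matrix powering, which spreads random-walk flow) and inflation (element-wise powering, which strengthens strong flows and prunes weak ones). A minimal sketch of plain MCL follows; the core-attachment refinement itself, and the paper's affinity-scored networks, are not reproduced here, and the tolerance and iteration count are illustrative choices.

```python
import numpy as np

def mcl(adjacency, expansion=2, inflation=2.0, iterations=50):
    """Plain Markov Clustering on a weighted adjacency matrix.
    Returns a list of clusters as frozensets of node indices."""
    # Add self-loops, then column-normalise to get a stochastic matrix.
    M = adjacency + np.eye(adjacency.shape[0])
    M = M / M.sum(axis=0)
    for _ in range(iterations):
        M = np.linalg.matrix_power(M, expansion)  # expansion: spread flow
        M = M ** inflation                        # inflation: sharpen flow
        M = M / M.sum(axis=0)                     # re-normalise columns
    # Nodes reachable from the same "attractor" row form one cluster.
    clusters = []
    for row in M:
        members = frozenset(np.nonzero(row > 1e-6)[0])
        if members and members not in clusters:
            clusters.append(members)
    return clusters

# Two disconnected triangles should come out as two clusters.
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    A[i, j] = A[j, i] = 1.0
clusters = mcl(A)
```

MCL-CAw then post-processes such clusters by identifying densely connected "core" proteins and re-attaching peripheral "attachment" proteins, which is where the accuracy gains reported above come from.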

  4. Comprehensive curation and analysis of global interaction networks in Saccharomyces cerevisiae

    PubMed Central

    Reguly, Teresa; Breitkreutz, Ashton; Boucher, Lorrie; Breitkreutz, Bobby-Joe; Hon, Gary C; Myers, Chad L; Parsons, Ainslie; Friesen, Helena; Oughtred, Rose; Tong, Amy; Stark, Chris; Ho, Yuen; Botstein, David; Andrews, Brenda; Boone, Charles; Troyanskya, Olga G; Ideker, Trey; Dolinski, Kara; Batada, Nizar N; Tyers, Mike

    2006-01-01

    Background The study of complex biological networks and prediction of gene function has been enabled by high-throughput (HTP) methods for detection of genetic and protein interactions. Sparse coverage in HTP datasets may, however, distort network properties and confound predictions. Although a vast number of well substantiated interactions are recorded in the scientific literature, these data have not yet been distilled into networks that enable system-level inference. Results We describe here a comprehensive database of genetic and protein interactions, and associated experimental evidence, for the budding yeast Saccharomyces cerevisiae, as manually curated from 31,793 abstracts and online publications. This literature-curated (LC) dataset contains 33,311 interactions, on the order of all extant HTP datasets combined. Surprisingly, HTP protein-interaction datasets currently achieve only around 14% coverage of the interactions in the literature. The LC network nevertheless shares attributes with HTP networks, including scale-free connectivity and correlations between interactions, abundance, localization, and expression. We find that essential genes or proteins are enriched for interactions with other essential genes or proteins, suggesting that the global network may be functionally unified. This interconnectivity is supported by a substantial overlap of protein and genetic interactions in the LC dataset. We show that the LC dataset considerably improves the predictive power of network-analysis approaches. The full LC dataset is available at the BioGRID and SGD databases. Conclusion Comprehensive datasets of biological interactions derived from the primary literature provide critical benchmarks for HTP methods, augment functional prediction, and reveal system-level attributes of biological networks. PMID:16762047

  5. LEAP: biomarker inference through learning and evaluating association patterns.

    PubMed

    Jiang, Xia; Neapolitan, Richard E

    2015-03-01

    Single nucleotide polymorphism (SNP) high-dimensional datasets are available from Genome Wide Association Studies (GWAS). Such data provide researchers opportunities to investigate the complex genetic basis of diseases. Much of the genetic risk might be due to undiscovered epistatic interactions, in which combinations of several genes jointly affect disease. Research aimed at discovering interacting SNPs from GWAS datasets has proceeded in two directions. First, tools were developed to evaluate candidate interactions. Second, algorithms were developed to search over the space of candidate interactions. Another problem when learning interacting SNPs, which has not received much attention, is evaluating how likely it is that the learned SNPs are associated with the disease. A complete system should provide this information as well. We develop such a system. Our system, called LEAP, includes a new heuristic search algorithm for learning interacting SNPs, and a Bayesian network based algorithm for computing the probability of their association. We evaluated the performance of LEAP using 100 1,000-SNP simulated datasets, each of which contains 15 SNPs involved in interactions. When learning interacting SNPs from these datasets, LEAP outperformed seven other methods. Furthermore, only SNPs involved in interactions were found to be probable. We also used LEAP to analyze real Alzheimer's disease and breast cancer GWAS datasets. We obtained interesting and new results from the Alzheimer's dataset, but limited results from the breast cancer dataset. We conclude that our results support that LEAP is a useful tool for extracting candidate interacting SNPs from high-dimensional datasets and determining their probability. © 2015 The Authors. *Genetic Epidemiology published by Wiley Periodicals, Inc.

  6. HYDRA Hyperspectral Data Research Application Tom Rink and Tom Whittaker

    NASA Astrophysics Data System (ADS)

    Rink, T.; Whittaker, T.

    2005-12-01

    HYDRA is a freely available, easy to install tool for visualization and analysis of large local or remote hyper/multi-spectral datasets. HYDRA is implemented on top of the open source VisAD Java library via Jython - the Java implementation of the user friendly Python programming language. VisAD provides data integration, through its generalized data model, user-display interaction and display rendering. Jython has an easy to read, concise, scripting-like syntax which eases software development. HYDRA allows data sharing of large datasets through its support of the OpenDAP and OpenADDE server-client protocols. Users can explore and interrogate data, and subset it in physical and/or spectral space to isolate key areas of interest for further analysis without having to download an entire dataset. It also has an extensible data input architecture to recognize new instruments and understand different local file formats; currently NetCDF and HDF4 are supported.

  7. DIVE: A Graph-based Visual Analytics Framework for Big Data

    PubMed Central

    Rysavy, Steven J.; Bromley, Dennis; Daggett, Valerie

    2014-01-01

    The need for data-centric scientific tools is growing; domains like biology, chemistry, and physics are increasingly adopting computational approaches. As a result, scientists must now deal with the challenges of big data. To address these challenges, we built a visual analytics platform named DIVE: Data Intensive Visualization Engine. DIVE is a data-agnostic, ontologically-expressive software framework capable of streaming large datasets at interactive speeds. Here we present the technical details of the DIVE platform, multiple usage examples, and a case study from the Dynameomics molecular dynamics project. We specifically highlight our novel contributions to structured data model manipulation and high-throughput streaming of large, structured datasets. PMID:24808197

  8. Experimental Database with Baseline CFD Solutions: 2-D and Axisymmetric Hypersonic Shock-Wave/Turbulent-Boundary-Layer Interactions

    NASA Technical Reports Server (NTRS)

    Marvin, Joseph G.; Brown, James L.; Gnoffo, Peter A.

    2013-01-01

    A database compilation of hypersonic shock-wave/turbulent boundary layer experiments is provided. The experiments selected for the database are either 2D or axisymmetric, and include both compression corner and impinging type SWTBL interactions. The strength of the interactions ranges from attached to incipient separation to fully separated flows. The experiments were chosen based on criteria that ensure the quality of the datasets, relevance to NASA's missions, and usefulness for validation and uncertainty assessment of CFD Navier-Stokes predictive methods, both now and in the future. The emphasis in the selected datasets is on surface pressures and surface heating throughout the interaction, but some wall shear stress distributions and flowfield profiles are also included. For selected cases, example CFD grids and setup information are provided, along with surface pressure and wall heating results from simulations using current NASA real-gas Navier-Stokes codes, against which future CFD investigators can compare and evaluate physics modeling improvements and conduct validation and uncertainty assessments of future CFD code developments. The experimental database is presented in tabulated form in the Appendices describing each experiment. The database is also provided in computer-readable ASCII files located on a companion DVD.

  9. Forward and small-x QCD physics results from CMS experiment at LHC

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cerci, Deniz Sunar, E-mail: deniz.sunar.cerci@cern.ch

    2016-03-25

    The Compact Muon Solenoid (CMS) is one of the two large, multi-purpose experiments at the Large Hadron Collider (LHC) at CERN. During the Run I phase a large pp collision dataset was collected, and the CMS collaboration has explored measurements that shed light on a new era. Forward and small-x quantum chromodynamics (QCD) physics measurements with the CMS experiment cover a wide range of physics subjects. Some of the highlights, in terms of testing very low-x QCD, underlying-event and multiple-interaction characteristics, photon-mediated processes, jets with large rapidity separation at high pseudo-rapidities, and the inelastic proton-proton cross section dominated by diffractive interactions, are presented. Results are compared to Monte Carlo (MC) models with different parameter tunes for the description of the underlying event and to perturbative QCD calculations. The prominent role of multi-parton interactions has been confirmed in the semihard sector, but no clear deviation from standard DGLAP parton evolution due to BFKL effects has been observed. An outlook to the prospects at 13 TeV is given.

  10. Dataset definition for CMS operations and physics analyses

    NASA Astrophysics Data System (ADS)

    Franzoni, Giovanni; Compact Muon Solenoid Collaboration

    2016-04-01

    Data recorded at the CMS experiment are funnelled into streams, integrated in the HLT menu, and further organised in a hierarchical structure of primary datasets and secondary datasets/dedicated skims. Datasets are defined according to the final-state particles reconstructed by the high level trigger, the data format and the use case (physics analysis, alignment and calibration, performance studies). During the first LHC run, new workflows have been added to this canonical scheme, to best exploit the flexibility of the CMS trigger and data acquisition systems. The concepts of data parking and data scouting have been introduced to extend the physics reach of CMS, offering the opportunity of defining physics triggers with extremely loose selections (e.g. a dijet resonance trigger collecting data at 1 kHz). In this presentation, we review the evolution of the dataset definition during LHC run I, and we discuss the plans for run II.

  11. A Virtual Reality Visualization Tool for Neuron Tracing

    PubMed Central

    Usher, Will; Klacansky, Pavol; Federer, Frederick; Bremer, Peer-Timo; Knoll, Aaron; Angelucci, Alessandra; Pascucci, Valerio

    2017-01-01

    Tracing neurons in large-scale microscopy data is crucial to establishing a wiring diagram of the brain, which is needed to understand how neural circuits in the brain process information and generate behavior. Automatic techniques often fail for large and complex datasets, and connectomics researchers may spend weeks or months manually tracing neurons using 2D image stacks. We present a design study of a new virtual reality (VR) system, developed in collaboration with trained neuroanatomists, to trace neurons in microscope scans of the visual cortex of primates. We hypothesize that using consumer-grade VR technology to interact with neurons directly in 3D will help neuroscientists better resolve complex cases and enable them to trace neurons faster and with less physical and mental strain. We discuss both the design process and technical challenges in developing an interactive system to navigate and manipulate terabyte-sized image volumes in VR. Using a number of different datasets, we demonstrate that, compared to widely used commercial software, consumer-grade VR presents a promising alternative for scientists. PMID:28866520

  12. A Virtual Reality Visualization Tool for Neuron Tracing.

    PubMed

    Usher, Will; Klacansky, Pavol; Federer, Frederick; Bremer, Peer-Timo; Knoll, Aaron; Yarch, Jeff; Angelucci, Alessandra; Pascucci, Valerio

    2018-01-01

    Tracing neurons in large-scale microscopy data is crucial to establishing a wiring diagram of the brain, which is needed to understand how neural circuits in the brain process information and generate behavior. Automatic techniques often fail for large and complex datasets, and connectomics researchers may spend weeks or months manually tracing neurons using 2D image stacks. We present a design study of a new virtual reality (VR) system, developed in collaboration with trained neuroanatomists, to trace neurons in microscope scans of the visual cortex of primates. We hypothesize that using consumer-grade VR technology to interact with neurons directly in 3D will help neuroscientists better resolve complex cases and enable them to trace neurons faster and with less physical and mental strain. We discuss both the design process and technical challenges in developing an interactive system to navigate and manipulate terabyte-sized image volumes in VR. Using a number of different datasets, we demonstrate that, compared to widely used commercial software, consumer-grade VR presents a promising alternative for scientists.

  13. Machine learning action parameters in lattice quantum chromodynamics

    NASA Astrophysics Data System (ADS)

    Shanahan, Phiala E.; Trewartha, Daniel; Detmold, William

    2018-05-01

    Numerical lattice quantum chromodynamics studies of the strong interaction are important in many aspects of particle and nuclear physics. Such studies require significant computing resources to undertake. A number of proposed methods promise improved efficiency of lattice calculations, and access to regions of parameter space that are currently computationally intractable, via multi-scale action-matching approaches that necessitate parametric regression of generated lattice datasets. The applicability of machine learning to this regression task is investigated, with deep neural networks found to provide an efficient solution even in cases where approaches such as principal component analysis fail. The high information content and complex symmetries inherent in lattice QCD datasets require custom neural network layers to be introduced and present opportunities for further development.

  14. Trace Gas/Aerosol Interactions and GMI Modeling Support

    NASA Technical Reports Server (NTRS)

    Penner, Joyce E.; Liu, Xiaohong; Das, Bigyani; Bergmann, Dan; Rodriquez, Jose M.; Strahan, Susan; Wang, Minghuai; Feng, Yan

    2005-01-01

    Current global aerosol models use different physical and chemical schemes and parameters, different meteorological fields, and often different emission sources. Since the physical and chemical parameterization schemes are often tuned to obtain results that are consistent with observations, it is difficult to assess the true uncertainty due to meteorology alone. Under the framework of the NASA global modeling initiative (GMI), the differences and uncertainties in aerosol simulations (for sulfate, organic carbon, black carbon, dust and sea salt) solely due to different meteorological fields are analyzed and quantified. Three meteorological datasets available from the NASA DAO GCM, the GISS-II' GCM, and the NASA finite volume GCM (FVGCM) are used to drive the same aerosol model. The global sulfate and mineral dust burdens with FVGCM fields are 40% and 20% less than those with DAO and GISS fields, respectively, due to its heavier rainfall. Meanwhile, the sea salt burden predicted with FVGCM fields is 56% and 43% higher than those with DAO and GISS, respectively, due to its stronger convection, especially over the Southern Hemispheric Ocean. Sulfate concentrations at the surface in the Northern Hemisphere extratropics and in the middle to upper troposphere differ by more than a factor of 3 between the three meteorological datasets. The agreement between model calculated and observed aerosol concentrations in the industrial regions (e.g., North America and Europe) is quite similar for all three meteorological datasets. Away from the source regions, however, the comparisons with observations differ greatly for DAO, FVGCM and GISS, and the performance of the model using different datasets varies widely depending on sites and species. Global annual average aerosol optical depth at 550 nm is 0.120-0.131 for the three meteorological datasets.

  15. Three-dimensional global MHD modeling of a coronal mass ejection interacting with the solar wind

    NASA Astrophysics Data System (ADS)

    An, J.; Inoue, S.; Magara, T.; Lee, H.; Kang, J.; Hayashi, K.; Tanaka, T.; Den, M.

    2013-12-01

    We developed a three-dimensional (3D) magnetohydrodynamic (MHD) code to reproduce the structure of the solar wind, the propagation of a coronal mass ejection (CME), and the interaction between them. This MHD code is based on the finite volume method and a total variation diminishing (TVD) scheme with an unstructured grid system. In particular, this grid system can avoid the singularity at the north and south poles and relax tight CFL conditions around the poles, both of which would arise in the spherical coordinate system (Tanaka 1995). In this study, we constructed a model of the solar wind driven by the physical values at 50 solar radii obtained from the MHD tomographic method (Hayashi et al. 2003), where interplanetary scintillation (IPS) observational data are used. By comparing the result to the observational data obtained from the near-Earth OMNI dataset, we confirmed that our simulation reproduces the velocity, temperature and density profiles obtained from the near-Earth OMNI dataset. We then insert a spheromak-type CME (Kataoka et al. 2009) into our solar-wind model and investigate the propagation process of the CME interacting with the solar wind. In particular, we discuss how the magnetic twist accumulated in a CME affects the CME-solar wind interaction.

  16. Search strategy using LHC pileup interactions as a zero bias sample

    NASA Astrophysics Data System (ADS)

    Nachman, Benjamin; Rubbo, Francesco

    2018-05-01

    Due to a limited bandwidth and a large proton-proton interaction cross section relative to the rate of interesting physics processes, most events produced at the Large Hadron Collider (LHC) are discarded in real time. A sophisticated trigger system must quickly decide which events should be kept and is very efficient for a broad range of processes. However, there are many processes that cannot be accommodated by this trigger system. Furthermore, there may be models of physics beyond the standard model (BSM) constructed after data taking that could have been triggered, but no trigger was implemented at run time. Both of these cases can be covered by exploiting pileup interactions as an effective zero bias sample. At the end of high-luminosity LHC operations, this zero bias dataset will have accumulated about 1 fb-1 of data from which a bottom-line cross section limit of O(1) fb can be set for BSM models already in the literature and those yet to come.
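    The quoted O(1) fb bottom line follows from standard Poisson counting: with zero observed events and negligible background, the 95% CL upper limit on the signal mean is about 3 events, so sigma < 3 / (efficiency x luminosity), i.e. roughly 3 fb for 1 fb-1 at full efficiency. A sketch of this arithmetic, assuming a background-free counting experiment (the scan step and function names are illustrative):

    ```python
    import math

    def poisson_upper_limit(n_obs=0, cl=0.95):
        """Upper limit on the Poisson signal mean with n_obs observed events
        and no background, found by a simple scan. For n_obs = 0 this is
        analytic: s_up = -ln(1 - cl), about 3.0 at 95% CL."""
        s = 0.0
        while True:
            # P(N <= n_obs | mean s): cumulative Poisson probability
            p = sum(math.exp(-s) * s**k / math.factorial(k)
                    for k in range(n_obs + 1))
            if p <= 1.0 - cl:
                return s
            s += 1e-3

    def cross_section_limit(lumi_fb=1.0, efficiency=1.0, cl=0.95, n_obs=0):
        """Bottom-line limit: sigma < s_up / (efficiency * luminosity)."""
        return poisson_upper_limit(n_obs, cl) / (efficiency * lumi_fb)

    limit_fb = cross_section_limit(lumi_fb=1.0)  # about 3 fb, i.e. O(1) fb
    ```

    A real analysis would fold in backgrounds and systematic uncertainties (e.g. via CLs), but the order of magnitude is set by this counting argument.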

  17. Machine learning action parameters in lattice quantum chromodynamics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shanahan, Phiala; Trewartha, Daniel; Detmold, William

    Numerical lattice quantum chromodynamics studies of the strong interaction underpin theoretical understanding of many aspects of particle and nuclear physics. Such studies require significant computing resources to undertake. A number of proposed methods promise improved efficiency of lattice calculations, and access to regions of parameter space that are currently computationally intractable, via multi-scale action-matching approaches that necessitate parametric regression of generated lattice datasets. The applicability of machine learning to this regression task is investigated, with deep neural networks found to provide an efficient solution even in cases where approaches such as principal component analysis fail. Finally, the high information content and complex symmetries inherent in lattice QCD datasets require custom neural network layers to be introduced and present opportunities for further development.

  18. Machine learning action parameters in lattice quantum chromodynamics

    DOE PAGES

    Shanahan, Phiala; Trewartha, Daniel; Detmold, William

    2018-05-16

    Numerical lattice quantum chromodynamics studies of the strong interaction underpin theoretical understanding of many aspects of particle and nuclear physics. Such studies require significant computing resources to undertake. A number of proposed methods promise improved efficiency of lattice calculations, and access to regions of parameter space that are currently computationally intractable, via multi-scale action-matching approaches that necessitate parametric regression of generated lattice datasets. The applicability of machine learning to this regression task is investigated, with deep neural networks found to provide an efficient solution even in cases where approaches such as principal component analysis fail. Finally, the high information content and complex symmetries inherent in lattice QCD datasets require custom neural network layers to be introduced and present opportunities for further development.

  19. HyRA: A Hybrid Recommendation Algorithm Focused on Smart POI. Ceutí as a Study Scenario.

    PubMed

    Alvarado-Uribe, Joanna; Gómez-Oliva, Andrea; Barrera-Animas, Ari Yair; Molina, Germán; Gonzalez-Mendoza, Miguel; Parra-Meroño, María Concepción; Jara, Antonio J

    2018-03-17

    Nowadays, the Physical Web, together with the increase in the use of mobile devices, the Global Positioning System (GPS), and Social Networking Sites (SNS), has led users to share enriched information on the Web, such as their tourist experiences. Therefore, an area that has been significantly improved by using the contextual information provided by these technologies is tourism. In this way, the main goals of this work are to propose and develop an algorithm focused on the recommendation of Smart Points of Interaction (Smart POIs) for a specific user according to his/her preferences and the Smart POIs' context. Hence, a novel Hybrid Recommendation Algorithm (HyRA) is presented by incorporating an aggregation operator into the user-based Collaborative Filtering (CF) algorithm as well as including the Smart POIs' categories and geographical information. For the experimental phase, two real-world datasets have been collected and preprocessed. In addition, one Smart POIs' categories dataset was built. As a result, a dataset composed of 16 Smart POIs, another constituted by the explicit preferences of 200 respondents, and a last dataset integrated by 13 Smart POIs' categories are provided. The experimental results show that the recommendations suggested by HyRA are promising.
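    The user-based collaborative filtering step that HyRA builds on predicts a user's score for a Smart POI from the ratings of the most similar users. The following is a generic baseline sketch only: HyRA's aggregation operator and its use of POI categories and geographic information are not reproduced, and the similarity measure, neighbourhood size, and data are illustrative assumptions.

    ```python
    import numpy as np

    def predict_rating(ratings, user, item, k=2):
        """User-based CF: weighted average of the k most similar users'
        ratings for `item`, with cosine similarity over co-rated items.
        `ratings` is a users x items matrix; 0 means "not rated"."""
        def cosine(u, v):
            mask = (u > 0) & (v > 0)  # compare on co-rated items only
            if not mask.any():
                return 0.0
            return float(u[mask] @ v[mask] /
                         (np.linalg.norm(u[mask]) * np.linalg.norm(v[mask]) + 1e-12))

        sims = np.array([cosine(ratings[user], ratings[o]) if o != user else -1.0
                         for o in range(ratings.shape[0])])
        neighbours = [n for n in np.argsort(sims)[::-1][:k]
                      if ratings[n, item] > 0 and sims[n] > 0]
        if not neighbours:
            return 0.0
        weights = sims[neighbours]
        return float(weights @ ratings[neighbours, item] / weights.sum())

    # Toy 3-user x 3-POI ratings matrix; user 0 has not rated POI 2.
    ratings = np.array([[5., 3., 0.],
                        [4., 3., 4.],
                        [5., 3., 5.]])
    pred = predict_rating(ratings, user=0, item=2)
    ```

    A hybrid scheme like HyRA would then blend this CF score with content signals (POI categories) and geographical proximity before ranking recommendations.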

  20. HyRA: A Hybrid Recommendation Algorithm Focused on Smart POI. Ceutí as a Study Scenario

    PubMed Central

    Gómez-Oliva, Andrea; Molina, Germán

    2018-01-01

    Nowadays, the Physical Web, together with the increase in the use of mobile devices, the Global Positioning System (GPS), and Social Networking Sites (SNS), has led users to share enriched information on the Web, such as their tourist experiences. Therefore, an area that has been significantly improved by using the contextual information provided by these technologies is tourism. In this way, the main goals of this work are to propose and develop an algorithm focused on the recommendation of Smart Points of Interaction (Smart POIs) for a specific user according to his/her preferences and the Smart POIs’ context. Hence, a novel Hybrid Recommendation Algorithm (HyRA) is presented by incorporating an aggregation operator into the user-based Collaborative Filtering (CF) algorithm as well as including the Smart POIs’ categories and geographical information. For the experimental phase, two real-world datasets have been collected and preprocessed. In addition, one Smart POIs’ categories dataset was built. As a result, a dataset composed of 16 Smart POIs, another constituted by the explicit preferences of 200 respondents, and a last dataset integrated by 13 Smart POIs’ categories are provided. The experimental results show that the recommendations suggested by HyRA are promising. PMID:29562590

  1. Learning to recognize rat social behavior: Novel dataset and cross-dataset application.

    PubMed

    Lorbach, Malte; Kyriakou, Elisavet I; Poppe, Ronald; van Dam, Elsbeth A; Noldus, Lucas P J J; Veltkamp, Remco C

    2018-04-15

    Social behavior is an important aspect of rodent models. Automated measuring tools that make use of video analysis and machine learning are an increasingly attractive alternative to manual annotation. Because machine learning-based methods need to be trained, it is important that they are validated using data from different experiment settings. To develop and validate automated measuring tools, there is a need for annotated rodent interaction datasets. Currently, the availability of such datasets is limited to two mouse datasets. We introduce the first publicly available rat social interaction dataset, RatSI. We demonstrate the practical value of the novel dataset by using it as the training set for a rat interaction recognition method. We show that behavior variations induced by the experiment setting can lead to reduced performance, which illustrates the importance of cross-dataset validation. Consequently, we add a simple adaptation step to our method and improve the recognition performance. Most existing methods are trained and evaluated in one experimental setting, which limits the predictive power of the evaluation to that particular setting. We demonstrate that cross-dataset experiments provide more insight into the performance of classifiers. With our novel, public dataset we encourage the development and validation of automated recognition methods. We are convinced that cross-dataset validation enhances our understanding of rodent interactions and facilitates the development of more sophisticated recognition methods. Combining them with adaptation techniques may enable us to apply automated recognition methods to a variety of animals and experiment settings. Copyright © 2017 Elsevier B.V. All rights reserved.

  2. The Interaction Network Ontology-supported modeling and mining of complex interactions represented with multiple keywords in biomedical literature.

    PubMed

    Özgür, Arzucan; Hur, Junguk; He, Yongqun

    2016-01-01

    The Interaction Network Ontology (INO) logically represents biological interactions, pathways, and networks. INO has been demonstrated to be valuable in providing a set of structured ontological terms and associated keywords to support literature mining of gene-gene interactions from biomedical literature. However, previous work using INO focused on single keyword matching, while many interactions are represented with two or more interaction keywords used in combination. This paper reports our extension of INO to include combinatory patterns of two or more literature mining keywords co-existing in one sentence to represent specific INO interaction classes. Such keyword combinations and related INO interaction type information could be automatically obtained via SPARQL queries, exported in Excel format, and used in an INO-supported SciMiner, an in-house literature mining program. We studied the gene interaction sentences from the commonly used benchmark Learning Logic in Language (LLL) dataset and one internally generated vaccine-related dataset to identify and analyze interaction types containing multiple keywords. Patterns obtained from the dependency parse trees of the sentences were used to identify the interaction keywords that are related to each other and collectively represent an interaction type. The INO ontology currently has 575 terms, including 202 terms under the interaction branch. The relations between the INO interaction types and associated keywords are represented using the INO annotation relations 'has literature mining keywords' and 'has keyword dependency pattern'. The keyword dependency patterns were generated by running the Stanford Parser to obtain dependency relation types. Out of the 107 interactions in the LLL dataset represented with two-keyword interaction types, 86 were identified by using the direct dependency relations.
The LLL dataset contained 34 gene regulation interaction types, each of which is associated with multiple keywords. A hierarchical display of these 34 interaction types and their ancestor terms in INO resulted in the identification of specific gene-gene interaction patterns from the LLL dataset. The phenomenon of having multi-keyword interaction types was also frequently observed in the vaccine dataset. By modeling and representing multiple textual keywords for interaction types, the extended INO enabled the identification of complex biological gene-gene interactions represented with multiple keywords.

  3. Dataset of anomalies and malicious acts in a cyber-physical subsystem.

    PubMed

    Laso, Pedro Merino; Brosset, David; Puentes, John

    2017-10-01

    This article presents a dataset produced to investigate how data and information quality estimations enable the detection of anomalies and malicious acts in cyber-physical systems. Data were acquired using a cyber-physical subsystem consisting of liquid containers for fuel or water, along with its automated control and data-acquisition infrastructure. The described data consist of time series representing five operational scenarios - normal, anomalies, breakdown, sabotage, and cyber-attacks - corresponding to 15 different real situations. The dataset is publicly available in the .zip file published with the article, to support the investigation and comparison of faulty-operation detection and characterization methods for cyber-physical systems.

  4. CMS Analysis and Data Reduction with Apache Spark

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gutsche, Oliver; Canali, Luca; Cremer, Illia

    Experimental Particle Physics has been at the forefront of analyzing the world's largest datasets for decades. The HEP community was among the first to develop suitable software and computing tools for this task. In recent times, new toolkits and systems for distributed data processing, collectively called "Big Data" technologies, have emerged from industry and open-source projects to support the analysis of Petabyte and Exabyte datasets in industry. While the principles of data analysis in HEP have not changed (filtering and transforming experiment-specific data formats), these new technologies use different approaches and tools, promising a fresh look at the analysis of very large datasets that could potentially reduce the time-to-physics with increased interactivity. Moreover, these new tools are typically actively developed by large communities, often profiting from industry resources, and released under open-source licenses. These factors boost the adoption and maturity of the tools and of the communities supporting them, while also helping to reduce the cost of ownership for end users. In this talk, we present studies of using Apache Spark for end-user data analysis. We study the HEP analysis workflow separated into two thrusts: the reduction of centrally produced experiment datasets, and the end-analysis up to the publication plot. For the first thrust, CMS is working together with CERN openlab and Intel on the CMS Big Data Reduction Facility. The goal is to reduce 1 PB of official CMS data to 1 TB of ntuple output for analysis. We present the progress of this 2-year project with first results of scaling up Spark-based HEP analysis. For the second thrust, we present studies on using Apache Spark for a CMS Dark Matter physics search, comparing Spark's feasibility, usability and performance to the ROOT-based analysis.

  5. The interplay between human population dynamics and flooding in Bangladesh: a spatial analysis

    NASA Astrophysics Data System (ADS)

    di Baldassarre, G.; Yan, K.; Ferdous, MD. R.; Brandimarte, L.

    2014-09-01

    In Bangladesh, socio-economic and hydrological processes are both extremely dynamic and inter-related. Human population patterns are often explained as a response, or adaptation strategy, to physical events, e.g. flooding, salt-water intrusion, and erosion. Meanwhile, these physical processes are exacerbated, or mitigated, by diverse human interventions, e.g. river diversion, levees and polders. In this context, this paper describes an attempt to explore the complex interplay between floods and societies in Bangladeshi floodplains. In particular, we performed a spatially-distributed analysis of the interactions between the dynamics of human settlements and flood inundation patterns. To this end, we used flood simulation results from the LISFLOOD-FP inundation model, as well as global population distribution datasets, such as the Gridded Population of the World (20 years, from 1990 to 2010) and HYDE (310 years, from 1700 to 2010). The outcomes of this work highlight the behaviour of Bangladeshi floodplains as complex human-water systems and indicate the need to go beyond traditional narratives based on one-way cause-effect relationships, e.g. climate change leading to migration.

  6. Systematic chemical-genetic and chemical-chemical interaction datasets for prediction of compound synergism

    PubMed Central

    Wildenhain, Jan; Spitzer, Michaela; Dolma, Sonam; Jarvik, Nick; White, Rachel; Roy, Marcia; Griffiths, Emma; Bellows, David S.; Wright, Gerard D.; Tyers, Mike

    2016-01-01

    The network structure of biological systems suggests that effective therapeutic intervention may require combinations of agents that act synergistically. However, a dearth of systematic chemical combination datasets has limited the development of predictive algorithms for chemical synergism. Here, we report two large datasets of linked chemical-genetic and chemical-chemical interactions in the budding yeast Saccharomyces cerevisiae. We screened 5,518 unique compounds against 242 diverse yeast gene deletion strains to generate an extended chemical-genetic matrix (CGM) of 492,126 chemical-gene interaction measurements. This CGM dataset contained 1,434 genotype-specific inhibitors, termed cryptagens. We selected 128 structurally diverse cryptagens and tested all pairwise combinations to generate a benchmark dataset of 8,128 pairwise chemical-chemical interaction tests for synergy prediction, termed the cryptagen matrix (CM). An accompanying database resource called ChemGRID was developed to enable analysis, visualisation and downloads of all data. The CGM and CM datasets will facilitate the benchmarking of computational approaches for synergy prediction, as well as chemical structure-activity relationship models for anti-fungal drug discovery. PMID:27874849

  7. Finding undetected protein associations in cell signaling by belief propagation.

    PubMed

    Bailly-Bechet, M; Borgs, C; Braunstein, A; Chayes, J; Dagkessamanskaia, A; François, J-M; Zecchina, R

    2011-01-11

    External information propagates in the cell mainly through signaling cascades and transcriptional activation, allowing the cell to react to a wide spectrum of environmental changes. High-throughput experiments identify numerous molecular components of such cascades that may, however, interact through unknown partners. Some of them may be detected using data coming from the integration of a protein-protein interaction network and mRNA expression profiles. This inference problem can be mapped onto the problem of finding appropriate optimal connected subgraphs of a network defined by these datasets. The optimization procedure turns out to be computationally intractable in general. Here we present a new distributed algorithm for this task, inspired by statistical physics, and apply this scheme to alpha-factor and drug perturbation data in yeast. We identify the role of the COS8 protein, a member of a gene family of previously unknown function, and validate the results by genetic experiments. The algorithm we present is especially suited for very large datasets, can run in parallel, and can be adapted to other problems in systems biology. On renowned benchmarks it outperforms other algorithms in the field.

  8. The Virtual Environment for Rapid Prototyping of the Intelligent Environment

    PubMed Central

    Bouzouane, Abdenour; Gaboury, Sébastien

    2017-01-01

    Advances in domains such as sensor networks and electronic and ambient intelligence have allowed us to create intelligent environments (IEs). However, research in IE is being held back by the fact that researchers face major difficulties, such as a lack of resources for their experiments. Indeed, they cannot easily build IEs to evaluate their approaches. This is mainly because of economic and logistical issues. In this paper, we propose a simulator to build virtual IEs. Simulators are a good alternative to physical IEs because they are inexpensive, and experiments can be conducted easily. Our simulator is open source and it provides users with a set of virtual sensors that simulates the behavior of real sensors. This simulator gives the user the capacity to build their own environment, providing a model to edit inhabitants’ behavior and an interactive mode. In this mode, the user can directly act upon IE objects. This simulator gathers data generated by the interactions in order to produce datasets. These datasets can be used by scientists to evaluate several approaches in IEs. PMID:29112175

  9. The radiation-belt electron phase-space-density response to stream-interaction regions: A study combining multi-point observations, data-assimilation, and physics-based modeling

    NASA Astrophysics Data System (ADS)

    Kellerman, A. C.; Shprits, Y.; McPherron, R. L.; Kondrashov, D. A.; Weygand, J. M.; Zhu, H.; Drozdov, A.

    2017-12-01

    Presented is an analysis of the phase-space density (PSD) response to the stream-interaction region (SIR), which utilizes a reanalysis dataset principally comprised of the data-assimilative Versatile Electron Radiation Belt (VERB) code, Van Allen Probe and GOES observations. The dataset spans the period 2012-2017, and includes several SIR (and CIR) storms. The PSD is examined for evidence of injections, transport, acceleration, and loss by considering the instantaneous and time-averaged change at adiabatic invariant values that correspond to ring-current, relativistic, and ultra-relativistic energies. In the solar wind, the following variables in the slow and fast wind on either side of the stream interface (SI) are considered in each case: the coronal hole polarity, IMF, solar wind speed, density, pressure, and SI tilt angle. In the magnetosphere, the Dst, AE, and past PSD state are considered. Presented is an analysis of the dominant mechanisms, both external and internal to the magnetosphere, that cause radiation-belt electron non-adiabatic changes during the passage of these fascinating solar wind structures.

  10. The Virtual Environment for Rapid Prototyping of the Intelligent Environment.

    PubMed

    Francillette, Yannick; Boucher, Eric; Bouzouane, Abdenour; Gaboury, Sébastien

    2017-11-07

    Advances in domains such as sensor networks and electronic and ambient intelligence have allowed us to create intelligent environments (IEs). However, research in IE is being held back by the fact that researchers face major difficulties, such as a lack of resources for their experiments. Indeed, they cannot easily build IEs to evaluate their approaches. This is mainly because of economic and logistical issues. In this paper, we propose a simulator to build virtual IEs. Simulators are a good alternative to physical IEs because they are inexpensive, and experiments can be conducted easily. Our simulator is open source and it provides users with a set of virtual sensors that simulates the behavior of real sensors. This simulator gives the user the capacity to build their own environment, providing a model to edit inhabitants' behavior and an interactive mode. In this mode, the user can directly act upon IE objects. This simulator gathers data generated by the interactions in order to produce datasets. These datasets can be used by scientists to evaluate several approaches in IEs.

  11. Nuclear Potential Clustering As a New Tool to Detect Patterns in High Dimensional Datasets

    NASA Astrophysics Data System (ADS)

    Tonkova, V.; Paulus, D.; Neeb, H.

    2013-02-01

    We present a new approach for the clustering of high dimensional data without prior assumptions about the structure of the underlying distribution. The proposed algorithm is based on a concept adapted from nuclear physics. To partition the data, we model the dynamic behaviour of nucleons interacting in an N-dimensional space. An adaptive nuclear potential, comprising a short-range attractive term (strong interaction) and a long-range repulsive term (Coulomb force), is assigned to each data point. By modelling the dynamics, nucleons that are densely distributed in space fuse to build nuclei (clusters), whereas single-point clusters repel each other. The formation of clusters is completed when the system reaches the state of minimal potential energy. The data are then grouped according to the particles' final effective potential energy level. The performance of the algorithm is tested with several synthetic datasets showing that the proposed method can robustly identify clusters even when complex configurations are present. Furthermore, quantitative MRI data from 43 multiple sclerosis patients were analyzed, showing a reasonable splitting into subgroups according to the individual patients' disease grade. The good performance of the algorithm on such highly correlated non-spherical datasets, which are typical for MRI derived image features, shows that Nuclear Potential Clustering is a valuable tool for automated data analysis, not only in the MRI domain.
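The attraction/repulsion balance described above can be illustrated with a toy pair potential. The functional forms (exponential short-range attraction, 1/r long-range repulsion) and all parameter values below are illustrative assumptions for the sketch, not the paper's actual adaptive potential:

```python
import math

def pair_potential(r, a=5.0, r0=1.0, b=0.5):
    """Toy nuclear-style pair potential between two data points at distance r:
    a short-range attractive term plus a long-range repulsive term."""
    return -a * math.exp(-r / r0) + b / r

# Grid-search the separation at which the two terms balance: points closer
# than this "fuse" (attraction dominates), distant clusters repel.
rs = [0.05 * i for i in range(1, 80)]
r_eq = min(rs, key=pair_potential)
print(round(r_eq, 2))  # 0.4 with these parameters
```

The equilibrium separation plays the role of the cluster scale; the actual algorithm evolves all N-dimensional points under such forces until the total potential energy is minimal.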

  12. A multimodal dataset for authoring and editing multimedia content: The MAMEM project.

    PubMed

    Nikolopoulos, Spiros; Petrantonakis, Panagiotis C; Georgiadis, Kostas; Kalaganis, Fotis; Liaros, Georgios; Lazarou, Ioulietta; Adam, Katerina; Papazoglou-Chalikias, Anastasios; Chatzilari, Elisavet; Oikonomou, Vangelis P; Kumar, Chandan; Menges, Raphael; Staab, Steffen; Müller, Daniel; Sengupta, Korok; Bostantjopoulou, Sevasti; Katsarou, Zoe; Zeilig, Gabi; Plotnik, Meir; Gotlieb, Amihai; Kizoni, Racheli; Fountoukidou, Sofia; Ham, Jaap; Athanasiou, Dimitrios; Mariakaki, Agnes; Comanducci, Dario; Sabatini, Edoardo; Nistico, Walter; Plank, Markus; Kompatsiaris, Ioannis

    2017-12-01

    We present a dataset that combines multimodal biosignals and eye tracking information gathered under a human-computer interaction framework. The dataset was developed in the vein of the MAMEM project that aims to endow people with motor disabilities with the ability to edit and author multimedia content through mental commands and gaze activity. The dataset includes EEG, eye-tracking, and physiological (GSR and heart rate) signals collected from 34 individuals (18 able-bodied and 16 motor-impaired). Data were collected during interaction with a specifically designed interface for web browsing and multimedia content manipulation, and during imaginary movement tasks. The presented dataset will contribute towards the development and evaluation of modern human-computer interaction systems that would foster the integration of people with severe motor impairments back into society.

  13. 'Tagger' - a Mac OS X Interactive Graphical Application for Data Inference and Analysis of N-Dimensional Datasets in the Natural Physical Sciences.

    NASA Astrophysics Data System (ADS)

    Morse, P. E.; Reading, A. M.; Lueg, C.

    2014-12-01

    Pattern-recognition in scientific data is not only a computational problem but a human-observer problem as well. Human observation of - and interaction with - data visualization software can augment, select, interrupt and modify computational routines and facilitate processes of pattern and significant feature recognition for subsequent human analysis, machine learning, expert and artificial intelligence systems. 'Tagger' is a Mac OS X interactive data visualisation tool that facilitates human-computer interaction for the recognition of patterns and significant structures. It is a graphical application developed using the Quartz Composer framework. 'Tagger' follows a Model-View-Controller (MVC) software architecture: the application problem domain (the Model) is to facilitate novel ways of abstractly representing data to a human interlocutor, presenting these via different viewer modalities (e.g. chart representations, particle systems, parametric geometry) to the user (View) and enabling interaction with the data (Controller) via a variety of Human Interface Devices (HID). The software enables the user to create an arbitrary array of tags that may be appended to the visualised data, which are then saved into output files as forms of semantic metadata. Three fundamental problems that are not strongly supported by conventional scientific visualisation software are addressed: 1] how to visually animate data over time; 2] how to rapidly deploy unconventional parametrically driven data visualisations; 3] how to construct and explore novel interaction models that capture the activity of the end-user as semantic metadata that can be used to computationally enhance subsequent interrogation. Saved tagged data files may be loaded into Tagger, so that tags may be tagged, if desired. 
Recursion opens up the possibility of refining or overlapping different types of tags, tagging a variety of different POIs or types of events, and of capturing different types of specialist observations of important or noticeable events. Other visualisations and modes of interaction will also be demonstrated, with the aim of discovering knowledge in large datasets in the natural, physical sciences. Fig.1 Wave height data from an oceanographic Wave Rider Buoy. Colors/radii are driven by wave height data.

  14. Where Have All the Interactions Gone? Estimating the Coverage of Two-Hybrid Protein Interaction Maps

    PubMed Central

    Huang, Hailiang; Jedynak, Bruno M; Bader, Joel S

    2007-01-01

    Yeast two-hybrid screens are an important method for mapping pairwise physical interactions between proteins. The fraction of interactions detected in independent screens can be very small, and an outstanding challenge is to determine the reason for the low overlap. Low overlap can arise from either a high false-discovery rate (interaction sets have low overlap because each set is contaminated by a large number of stochastic false-positive interactions) or a high false-negative rate (interaction sets have low overlap because each misses many true interactions). We extend capture–recapture theory to provide the first unified model for false-positive and false-negative rates for two-hybrid screens. Analysis of yeast, worm, and fly data indicates that 25% to 45% of the reported interactions are likely false positives. Membrane proteins have higher false-discovery rates on average, and signal transduction proteins have lower rates. The overall false-negative rate ranges from 75% for worm to 90% for fly, which arises from a roughly 50% false-negative rate due to statistical undersampling and a 55% to 85% false-negative rate due to proteins that appear to be systematically lost from the assays. Finally, statistical model selection conclusively rejects the Erdös-Rényi network model in favor of the power law model for yeast and the truncated power law for worm and fly degree distributions. Much as genome sequencing coverage estimates were essential for planning the human genome sequencing project, the coverage estimates developed here will be valuable for guiding future proteomic screens. All software and datasets are available in Datasets S1 and S2, Figures S1–S5, and Tables S1−S6, and are also available from our Web site, http://www.baderzone.org. PMID:18039026
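The capture-recapture starting point behind such coverage estimates is the textbook Lincoln-Petersen estimator, sketched below with made-up screen sizes; the paper's contribution is extending this baseline with explicit false-positive and false-negative rates, which this sketch omits:

```python
def lincoln_petersen(n1, n2, overlap):
    """Classical two-sample capture-recapture estimate of total population
    size: if two independent screens of sizes n1 and n2 share `overlap`
    items, the estimated total is n1 * n2 / overlap."""
    if overlap == 0:
        raise ValueError("no overlap between screens: estimate undefined")
    return n1 * n2 / overlap

# Hypothetical numbers: two screens report 1,000 and 1,200 interactions,
# sharing only 150. Low overlap implies a much larger true interaction set.
print(lincoln_petersen(1000, 1200, 150))  # 8000.0
```

Under this naive model, low overlap is read purely as undersampling; the unified model in the paper is needed to tell that apart from contamination by false positives.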

  15. New insights into protein-protein interaction data lead to increased estimates of the S. cerevisiae interactome size.

    PubMed

    Sambourg, Laure; Thierry-Mieg, Nicolas

    2010-12-21

    As protein interactions mediate most cellular mechanisms, protein-protein interaction networks are essential in the study of cellular processes. Consequently, several large-scale interactome mapping projects have been undertaken, and protein-protein interactions are being distilled into databases through literature curation; yet protein-protein interaction data are still far from comprehensive, even in the model organism Saccharomyces cerevisiae. Estimating the interactome size is important for evaluating the completeness of current datasets, in order to measure the remaining efforts that are required. We examined the yeast interactome from a new perspective, by taking into account how thoroughly proteins have been studied. We discovered that the set of literature-curated protein-protein interactions is qualitatively different when restricted to proteins that have received extensive attention from the scientific community. In particular, these interactions are less often supported by yeast two-hybrid, and more often by more complex experiments such as biochemical activity assays. Our analysis showed that high-throughput and literature-curated interactome datasets are more correlated than commonly assumed, but that this bias can be corrected for by focusing on well-studied proteins. We thus propose a simple and reliable method to estimate the size of an interactome, combining literature-curated data involving well-studied proteins with high-throughput data. It yields an estimate of at least 37,600 direct physical protein-protein interactions in S. cerevisiae. Our method leads to higher and more accurate estimates of the interactome size, as it accounts for interactions that are genuine yet difficult to detect with commonly-used experimental assays. This shows that we are even further from completing the yeast interactome map than previously expected.
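The scaling idea behind such estimates, measuring a screen's sensitivity against curated interactions among well-studied proteins and scaling up its total count, can be sketched as follows; the helper name and the toy interaction sets are hypothetical, and the paper's actual procedure is more careful:

```python
def estimate_interactome(ht_screen, curated_well_studied):
    """Simplified sketch: treat curated interactions among well-studied
    proteins as a reference set, measure the fraction of them that a
    high-throughput (HT) screen recovers, and scale the screen's total
    count by that sensitivity."""
    recovered = len(ht_screen & curated_well_studied)
    if recovered == 0:
        raise ValueError("screen recovers nothing from the reference set")
    sensitivity = recovered / len(curated_well_studied)
    return len(ht_screen) / sensitivity

# Toy data: interactions as unordered protein pairs (frozensets).
reference = {frozenset(p) for p in [("A", "B"), ("C", "D"), ("E", "F"), ("G", "H")]}
screen = {frozenset(p) for p in [("A", "B"), ("C", "D"), ("I", "J"),
                                 ("K", "L"), ("M", "N"), ("O", "P")]}
print(estimate_interactome(screen, reference))  # 6 / 0.5 = 12.0
```

The key design point mirrored from the abstract is that sensitivity is estimated only on well-studied proteins, where the curated reference is least biased toward easy-to-detect interactions.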

  16. A Sample Data Publication: Interactive Access, Analysis and Display of Remotely Stored Datasets From Hurricane Charley

    NASA Astrophysics Data System (ADS)

    Weber, J.; Domenico, B.

    2004-12-01

    This paper is an example of what we call data interactive publications. With a properly configured workstation, readers can click on "hotspots" in the document that launch an interactive analysis tool called the Unidata Integrated Data Viewer (IDV). The IDV enables readers to access, analyze and display datasets on remote servers as well as documents describing them. Beyond the parameters and datasets initially configured into the paper, the analysis tool will have access to all the other dataset parameters as well as to a host of other datasets on remote servers. These data interactive publications are built on top of several data delivery, access, discovery, and visualization tools developed by Unidata and its partner organizations. For purposes of illustrating this integrative technology, we use data from the event of Hurricane Charley over Florida from August 13-15, 2004. This event illustrates how components of this process fit together. The Local Data Manager (LDM), Open-source Project for a Network Data Access Protocol (OPeNDAP) and Abstract Data Distribution Environment (ADDE) services, Thematic Realtime Environmental Distributed Data Service (THREDDS) cataloging services, and the IDV are highlighted in this example of a publication with embedded pointers for accessing and interacting with remote datasets. An important objective of this paper is to illustrate how these integrated technologies foster the creation of documents that allow the reader to learn the scientific concepts by direct interaction with illustrative datasets, and help build a framework for integrated Earth System science.

  17. Phylo_dCor: distance correlation as a novel metric for phylogenetic profiling.

    PubMed

    Sferra, Gabriella; Fratini, Federica; Ponzi, Marta; Pizzi, Elisabetta

    2017-09-05

    Elaboration of powerful methods to predict functional and/or physical protein-protein interactions from genome sequence is one of the main tasks in the post-genomic era. Phylogenetic profiling allows the prediction of protein-protein interactions at a whole-genome level in both Prokaryotes and Eukaryotes. For this reason, it is considered one of the most promising methods. Here, we propose an improvement of phylogenetic profiling that enables the handling of large genomic datasets and the inference of global protein-protein interactions. This method uses the distance correlation as a new measure of phylogenetic profile similarity. We constructed robust reference sets and developed Phylo_dCor, a parallelized version of the algorithm for calculating the distance correlation that makes it applicable to large genomic data. Using Saccharomyces cerevisiae and Escherichia coli genome datasets, we showed that Phylo_dCor outperforms phylogenetic profiling methods previously described based on the mutual information and Pearson's correlation as measures of profile similarity. In this work, we constructed and assessed robust reference sets and propose the distance correlation as a measure for comparing phylogenetic profiles. To make it applicable to large genomic data, we developed Phylo_dCor, a parallelized version of the algorithm for calculating the distance correlation. Two R scripts that can be run on a wide range of machines are available upon request.
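The distance-correlation measure at the heart of the method can be sketched in pure Python for two one-dimensional phylogenetic profiles; the actual tool is a parallelized R implementation, and the toy profiles below are made up:

```python
import math

def dcor(x, y):
    """Sample distance correlation between two equal-length 1-D profiles:
    build pairwise distance matrices, double-center them, then form the
    ratio of distance covariance to the geometric mean of distance variances."""
    n = len(x)
    def centered(v):
        d = [[abs(v[i] - v[j]) for j in range(n)] for i in range(n)]
        row = [sum(r) / n for r in d]          # row means (= column means: d is symmetric)
        grand = sum(row) / n
        return [[d[i][j] - row[i] - row[j] + grand for j in range(n)]
                for i in range(n)]
    A, B = centered(x), centered(y)
    dcov2 = sum(A[i][j] * B[i][j] for i in range(n) for j in range(n)) / n**2
    dvarx = sum(a * a for r in A for a in r) / n**2
    dvary = sum(b * b for r in B for b in r) / n**2
    return math.sqrt(dcov2 / math.sqrt(dvarx * dvary))

profile1 = [0, 1, 1, 0, 1]   # toy presence/absence profile across 5 genomes
profile2 = [0, 2, 2, 0, 2]   # affinely related profile
print(round(dcor(profile1, profile2), 6))  # 1.0
```

Unlike Pearson's correlation, distance correlation is zero only for independent profiles and captures non-linear relationships, which is the motivation given for adopting it.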

  18. Metadata improvements driving new tools and services at a NASA data center

    NASA Astrophysics Data System (ADS)

    Moroni, D. F.; Hausman, J.; Foti, G.; Armstrong, E. M.

    2011-12-01

    The NASA Physical Oceanography DAAC (PO.DAAC) is responsible for distributing and maintaining satellite-derived oceanographic data from a number of NASA and non-NASA missions for the physical disciplines of ocean winds, sea surface temperature, ocean topography and gravity. Currently its holdings consist of over 600 datasets with a data archive in excess of 200 Terabytes. The PO.DAAC has recently embarked on a metadata quality and completeness project to migrate, update and improve metadata records for over 300 public datasets. An interactive database management tool has been developed to allow data scientists to enter, update and maintain metadata records. This tool communicates directly with PO.DAAC's Data Management and Archiving System (DMAS), which serves as the new archival and distribution backbone as well as a permanent repository of dataset and granule-level metadata. Although we will briefly discuss the tool, the more important ramification is the ability to expose, propagate and leverage the metadata in a number of ways. First, the metadata are exposed directly through a faceted and free-text search interface from Drupal-based PO.DAAC web pages, allowing for quick browsing and data discovery, especially by "drilling" through the various facet levels that organize datasets by time/space resolution, processing level, sensor, measurement type etc. Furthermore, the metadata can now be exposed through web services to produce metadata records in a number of different formats such as FGDC and ISO 19115, or potentially propagated to visualization and subsetting tools, and other discovery interfaces. The fundamental concept is that the metadata forms the essential bridge between the user, and the tool or discovery mechanism, for a broad range of ocean earth science data records.

  19. Fluid Structure Interaction of Parachutes in Supersonic Planetary Entry

    NASA Technical Reports Server (NTRS)

    Sengupta, Anita

    2011-01-01

    A research program to provide physical insight into disk-gap-band parachute operation in the supersonic regime on Mars was conducted. The program included supersonic wind tunnel tests, computational fluid dynamics and fluid structure interaction simulations. Specifically, the nature and cause of the "area oscillation" phenomenon were investigated to determine the scale, aerodynamic, and aero-elastic dependence of the supersonic parachute collapse and re-inflation event. A variety of non-intrusive, temporally resolved, and high resolution diagnostic techniques were used to interrogate the flow and generate validation datasets. The results of flow visualization, particle image velocimetry, load measurements, and photogrammetric reconstruction will be presented. Implications to parachute design, use, and verification will also be discussed.

  20. A Semantic Sensor Web for Environmental Decision Support Applications

    PubMed Central

    Gray, Alasdair J. G.; Sadler, Jason; Kit, Oles; Kyzirakos, Kostis; Karpathiotakis, Manos; Calbimonte, Jean-Paul; Page, Kevin; García-Castro, Raúl; Frazer, Alex; Galpin, Ixent; Fernandes, Alvaro A. A.; Paton, Norman W.; Corcho, Oscar; Koubarakis, Manolis; De Roure, David; Martinez, Kirk; Gómez-Pérez, Asunción

    2011-01-01

    Sensing devices are increasingly being deployed to monitor the physical world around us. One class of application for which sensor data is pertinent is environmental decision support systems, e.g., flood emergency response. For these applications, the sensor readings need to be put in context by integrating them with other sources of data about the surrounding environment. Traditional systems for predicting and detecting floods rely on methods that need significant human resources. In this paper we describe a semantic sensor web architecture for integrating multiple heterogeneous datasets, including live and historic sensor data, databases, and map layers. The architecture provides mechanisms for discovering datasets, defining integrated views over them, continuously receiving data in real-time, and visualising on screen and interacting with the data. Our approach makes extensive use of web service standards for querying and accessing data, and semantic technologies to discover and integrate datasets. We demonstrate the use of our semantic sensor web architecture in the context of a flood response planning web application that uses data from sensor networks monitoring the sea-state around the coast of England. PMID:22164110

  1. Interactive visualization and analysis of multimodal datasets for surgical applications.

    PubMed

    Kirmizibayrak, Can; Yim, Yeny; Wakid, Mike; Hahn, James

    2012-12-01

    Surgeons use information from multiple sources when making surgical decisions. These include volumetric datasets (such as CT, PET, MRI, and their variants), 2D datasets (such as endoscopic videos), and vector-valued datasets (such as computer simulations). Presenting all the information to the user in an effective manner is a challenging problem. In this paper, we present a visualization approach that displays the information from various sources in a single coherent view. The system allows the user to explore and manipulate volumetric datasets, display analysis of dataset values in local regions, combine 2D and 3D imaging modalities and display results of vector-based computer simulations. Several interaction methods are discussed: in addition to traditional interfaces including mouse and trackers, gesture-based natural interaction methods are shown to control these visualizations with real-time performance. An example of a medical application (medialization laryngoplasty) is presented to demonstrate how the combination of different modalities can be used in a surgical setting with our approach.

  2. An interactive web application for the dissemination of human systems immunology data.

    PubMed

    Speake, Cate; Presnell, Scott; Domico, Kelly; Zeitner, Brad; Bjork, Anna; Anderson, David; Mason, Michael J; Whalen, Elizabeth; Vargas, Olivia; Popov, Dimitry; Rinchai, Darawan; Jourde-Chiche, Noemie; Chiche, Laurent; Quinn, Charlie; Chaussabel, Damien

    2015-06-19

    Systems immunology approaches have proven invaluable in translational research settings. The current rate at which large-scale datasets are generated presents unique challenges and opportunities. Mining aggregates of these datasets could accelerate the pace of discovery, but new solutions are needed to integrate the heterogeneous data types with the contextual information that is necessary for interpretation. In addition, enabling tools and technologies facilitating investigators' interaction with large-scale datasets must be developed in order to promote insight and foster knowledge discovery. State of the art application programming was employed to develop an interactive web application for browsing and visualizing large and complex datasets. A collection of human immune transcriptome datasets were loaded alongside contextual information about the samples. We provide a resource enabling interactive query and navigation of transcriptome datasets relevant to human immunology research. Detailed information about studies and samples are displayed dynamically; if desired the associated data can be downloaded. Custom interactive visualizations of the data can be shared via email or social media. This application can be used to browse context-rich systems-scale data within and across systems immunology studies. This resource is publicly available online at [Gene Expression Browser Landing Page ( https://gxb.benaroyaresearch.org/dm3/landing.gsp )]. The source code is also available openly [Gene Expression Browser Source Code ( https://github.com/BenaroyaResearch/gxbrowser )]. We have developed a data browsing and visualization application capable of navigating increasingly large and complex datasets generated in the context of immunological studies. This intuitive tool ensures that, whether taken individually or as a whole, such datasets generated at great effort and expense remain interpretable and a ready source of insight for years to come.

  3. Differential C3NET reveals disease networks of direct physical interactions

    PubMed Central

    2011-01-01

    Background Genes may have different interactions in different cell conditions, which can be mapped into different networks. Differential analysis of gene networks allows spotting condition-specific interactions that form, for instance, disease networks when the conditions compared are a disease, such as cancer, and normal. This could potentially allow the development of better and more subtly targeted drugs to cure cancer. Differential network analysis with direct physical gene interactions needs to be explored in this endeavour. Results C3NET is a recently introduced information-theory-based gene network inference algorithm that infers direct physical gene interactions from expression data, and it was shown to give consistently higher inference performance over various networks than its competitors. In this paper, we present DC3net, an approach that employs C3NET to infer disease networks. We apply DC3net to synthetic and real prostate cancer datasets, with promising results. With loose cutoffs, we predicted 18583 interactions from tumor and normal samples in total. Although the literature contains no reference interaction databases for the specific conditions of our samples, we found verification for 54 of our predicted direct physical interactions in only four biological interaction databases. As an example, we predicted a prostate cancer specific interaction between RAD50 and TRF2 that turned out to be supported by the literature. It is known that the RAD50 complex associates with TRF2 in the S phase of the cell cycle, which suggests that this predicted interaction may promote telomere maintenance in tumor cells, allowing them to divide indefinitely. Our enrichment analysis suggests that the identified tumor specific gene interactions may be potentially important in driving the growth of prostate cancer.
    Additionally, we found that the most highly connected subnetwork of our predicted tumor specific network is enriched for all proliferation genes, which further suggests that the genes in this network may serve in the process of oncogenesis. Conclusions Our approach reveals disease specific interactions. It may help to make experimental follow-up studies more cost- and time-efficient by prioritizing disease-relevant parts of the global gene network. PMID:21777411
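
    As a rough illustration of the differential-network idea described in this record, the sketch below infers a toy network per condition by keeping each gene's single strongest partner (the C3NET principle), then takes the set difference of edges. It is a simplified stand-in, not the published algorithm: absolute Pearson correlation replaces the mutual-information estimator, and the gene names and expression data are hypothetical.

```python
import numpy as np

def c3net_like(expr, genes):
    """Toy C3NET-style inference: score all gene pairs (here with absolute
    Pearson correlation as a stand-in for mutual information), then keep,
    for each gene, only its single highest-scoring partner."""
    scores = np.abs(np.corrcoef(expr))
    np.fill_diagonal(scores, 0.0)
    edges = set()
    for i in range(len(genes)):
        j = int(np.argmax(scores[i]))          # strongest partner of gene i
        edges.add(tuple(sorted((genes[i], genes[j]))))
    return edges

rng = np.random.default_rng(0)
genes = ["g1", "g2", "g3", "g4"]

# Hypothetical expression matrices (genes x samples) for two conditions.
base = rng.normal(size=(4, 30))
normal = base.copy()
tumor = base.copy()
tumor[1] = tumor[0] + 0.05 * rng.normal(size=30)  # g1-g2 coupled only in tumor

tumor_net = c3net_like(tumor, genes)
normal_net = c3net_like(normal, genes)

# DC3net-style differential step: edges inferred in the tumor condition but
# not in the normal condition are candidate disease-specific interactions.
tumor_specific = tumor_net - normal_net
```

    The strongly coupled pair is recovered in the tumor network; a real analysis would add significance cutoffs and database-based verification as described above.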

  4. Online sparse Gaussian process based human motion intent learning for an electrically actuated lower extremity exoskeleton.

    PubMed

    Long, Yi; Du, Zhi-Jiang; Chen, Chao-Feng; Dong, Wei; Wang, Wei-Dong

    2017-07-01

    The most important step for a lower extremity exoskeleton is to infer human motion intent (HMI), which contributes to achieving human-exoskeleton collaboration. Since the user is in the control loop, the relationship between human-robot interaction (HRI) information and HMI is nonlinear and complicated, and difficult to model with analytical approaches. The nonlinear mapping can instead be learned using machine learning approaches. Gaussian Process (GP) regression is suitable for high-dimensional, small-sample nonlinear regression problems, but is restrictive for large datasets due to its computational complexity. In this paper, an online sparse GP algorithm is constructed to learn the HMI. The original training dataset is collected while the user wears the exoskeleton system with friction compensation and performs movement as unconstrained as possible. The dataset has two kinds of data: (1) physical HRI, collected by torque sensors placed at the interaction cuffs for the active joints, i.e., the knee joints; (2) joint angular position, measured by optical position sensors. To reduce the computational complexity of GP, grey relational analysis (GRA) is utilized to refine the original dataset and provide the final training dataset. The hyper-parameters are optimized offline by maximizing the marginal likelihood and are then applied in the online GP regression algorithm. The HMI, i.e., the angular position of the human joints, is regarded as the reference trajectory for the mechanical legs. To verify the effectiveness of the proposed algorithm, experiments are performed on a subject at a natural speed. The experimental results show that the HMI can be obtained in real time, and the method can be extended and employed in similar exoskeleton systems.
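
    The regression step described above can be sketched with a minimal closed-form GP predictor. This is a simplified stand-in, not the authors' online sparse algorithm: the kernel hyperparameters are fixed here rather than optimized by maximizing the marginal likelihood, and the sine target merely stands in for the torque-to-joint-angle mapping.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.5, variance=1.0):
    """Squared-exponential kernel k(x, x') = s^2 exp(-|x - x'|^2 / (2 l^2))."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / length_scale**2)

def gp_predict(x_train, y_train, x_test, noise=1e-4):
    """Closed-form GP posterior mean at x_test given noisy observations."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_star = rbf_kernel(x_test, x_train)
    alpha = np.linalg.solve(K, y_train)    # K^{-1} y
    return K_star @ alpha                  # posterior mean

# Hypothetical stand-in for the HRI-to-joint-angle mapping: a smooth
# nonlinear function observed at a modest number of sensor readings.
x_train = np.linspace(0.0, 2 * np.pi, 25)
y_train = np.sin(x_train)
x_test = np.array([1.0, 2.5, 4.0])
y_pred = gp_predict(x_train, y_train, x_test)
```

    The cubic cost of the linear solve in `gp_predict` is exactly the bottleneck that motivates sparse and online variants for real-time exoskeleton control.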

  5. Understanding patient outcomes after acute respiratory distress syndrome: identifying subtypes of physical, cognitive and mental health outcomes.

    PubMed

    Brown, Samuel M; Wilson, Emily L; Presson, Angela P; Dinglas, Victor D; Greene, Tom; Hopkins, Ramona O; Needham, Dale M

    2017-12-01

    With improving short-term mortality in acute respiratory distress syndrome (ARDS), understanding survivors' posthospitalisation outcomes is increasingly important. However, little is known regarding associations among physical, cognitive and mental health outcomes. Identification of outcome subtypes may advance understanding of post-ARDS morbidities. We analysed baseline variables and 6-month health status for participants in the ARDS Network Long-Term Outcomes Study. After division into derivation and validation datasets, we used weighted network analysis to identify subtypes from predictors and outcomes in the derivation dataset. We then used recursive partitioning to develop a subtype classification rule and assessed adequacy of the classification rule using a kappa statistic with the validation dataset. Among 645 ARDS survivors, 430 were in the derivation and 215 in the validation datasets. Physical and mental health status, but not cognitive status, were closely associated. Four distinct subtypes were apparent (percentages in the derivation cohort): (1) mildly impaired physical and mental health (22% of patients), (2) moderately impaired physical and mental health (39%), (3) severely impaired physical health with moderately impaired mental health (15%) and (4) severely impaired physical and mental health (24%). The classification rule had high agreement (kappa=0.89 in validation dataset). Female Latino smokers had the poorest status, while male, non-Latino non-smokers had the best status. We identified four post-ARDS outcome subtypes that were predicted by sex, ethnicity, pre-ARDS smoking status and other baseline factors. These subtypes may help develop tailored rehabilitation strategies, including investigation of combined physical and mental health interventions, and distinct interventions to improve cognitive outcomes. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. 
No commercial use is permitted unless otherwise expressly granted.
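
    Agreement of the classification rule was assessed with a kappa statistic; a minimal implementation of Cohen's kappa is sketched below. The subtype labels are hypothetical illustrations, not the study's data.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    # Expected chance agreement from the raters' marginal label frequencies.
    chance = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / n**2
    return (observed - chance) / (1 - chance)

# Hypothetical subtype labels (1-4) assigned by a classification rule
# versus a reference subtyping in a validation sample.
rule      = [1, 2, 2, 3, 4, 1, 2, 3, 4, 2]
reference = [1, 2, 2, 3, 4, 1, 2, 4, 4, 2]
kappa = cohen_kappa(rule, reference)   # high agreement despite one mismatch
```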

  6. Automated Analysis of Fluorescence Microscopy Images to Identify Protein-Protein Interactions

    DOE PAGES

    Venkatraman, S.; Doktycz, M. J.; Qi, H.; ...

    2006-01-01

    The identification of protein interactions is important for elucidating biological networks. One obstacle in comprehensive interaction studies is the analysis of large datasets, particularly those containing images. An automated system to analyze an image-based protein interaction dataset is therefore needed. Such an analysis system is described here: it automatically extracts features from fluorescence microscopy images obtained from a bacterial protein interaction assay. These features are used to relay quantitative values that aid in the automated scoring of positive interactions. Experimental observations indicate that identifying at least 50% positive cells in an image is sufficient to detect a protein interaction. Based on this criterion, the automated system achieves 100% accuracy in detecting positive interactions for a dataset of 16 images. Algorithms were implemented using MATLAB, and the software developed is available on request from the authors.
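
    The reported >=50% positive-cell criterion reduces to a simple decision rule, sketched here under the assumption that each cell has already been classified positive or negative by the feature-extraction stage; the per-cell labels are hypothetical.

```python
def score_interaction(cell_labels, threshold=0.5):
    """Score an image as a positive interaction when at least `threshold`
    of its segmented cells are classified positive (the >= 50% criterion
    reported for the assay)."""
    if not cell_labels:
        return False
    positive_fraction = sum(cell_labels) / len(cell_labels)
    return positive_fraction >= threshold

# Hypothetical per-cell classifications (1 = fluorescent/positive).
image_a = [1, 1, 0, 1, 1, 0, 1, 1]   # 6/8 cells positive -> interaction
image_b = [0, 1, 0, 0, 0, 1, 0, 0]   # 2/8 cells positive -> no interaction
```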

  7. Simultaneous fits in ISIS on the example of GRO J1008-57

    NASA Astrophysics Data System (ADS)

    Kühnel, Matthias; Müller, Sebastian; Kreykenbohm, Ingo; Schwarm, Fritz-Walter; Grossberger, Christoph; Dauser, Thomas; Pottschmidt, Katja; Ferrigno, Carlo; Rothschild, Richard E.; Klochkov, Dmitry; Staubert, Rüdiger; Wilms, Joern

    2015-04-01

    Parallel computing and steadily increasing computation speed have led to a new tool for analyzing multiple datasets and datatypes: fitting several datasets simultaneously. With this technique, physically connected parameters of individual datasets can be treated as a single parameter by implementing this connection directly in the fit. We discuss the terminology, implementation, and possible issues of simultaneous fits based on the X-ray data analysis tool Interactive Spectral Interpretation System (ISIS). While all data modeling tools in X-ray astronomy in principle allow data from multiple datasets to be fitted individually, the syntax used in these tools is often not well suited for this task. Applying simultaneous fits to the transient X-ray binary GRO J1008-57, we find that the spectral shape depends only on X-ray flux. We determine time-independent parameters, such as the folding energy E_fold, with unprecedented precision.
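
    The core idea of tying a physically connected parameter across datasets can be illustrated outside ISIS with a small linear example: two hypothetical datasets share a slope but have independent offsets, and both are fitted in one stacked least-squares problem rather than in two separate fits.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two hypothetical datasets that share one physical parameter (the slope)
# but have independent normalizations (offsets), mimicking a parameter
# that is tied across observations in a simultaneous fit.
x1 = np.linspace(0.0, 1.0, 40)
x2 = np.linspace(0.0, 1.0, 60)
true_slope = 2.0
y1 = true_slope * x1 + 0.5 + 0.05 * rng.normal(size=x1.size)
y2 = true_slope * x2 - 1.0 + 0.05 * rng.normal(size=x2.size)

# Stack both datasets into one linear system; the parameter columns are
# [shared slope, offset of dataset 1, offset of dataset 2].
A = np.block([
    [x1[:, None], np.ones((x1.size, 1)), np.zeros((x1.size, 1))],
    [x2[:, None], np.zeros((x2.size, 1)), np.ones((x2.size, 1))],
])
y = np.concatenate([y1, y2])
slope, off1, off2 = np.linalg.lstsq(A, y, rcond=None)[0]
```

    Because both datasets constrain the single slope column, its estimate uses all 100 points at once, which is the precision gain the abstract describes for tied parameters.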

  8. Open University Learning Analytics dataset.

    PubMed

    Kuzilek, Jakub; Hlosta, Martin; Zdrahal, Zdenek

    2017-11-28

    Learning Analytics focuses on the collection and analysis of learners' data to improve their learning experience by providing informed guidance and to optimise learning materials. To support the research in this area we have developed a dataset, containing data from courses presented at the Open University (OU). What makes the dataset unique is the fact that it contains demographic data together with aggregated clickstream data of students' interactions in the Virtual Learning Environment (VLE). This enables the analysis of student behaviour, represented by their actions. The dataset contains the information about 22 courses, 32,593 students, their assessment results, and logs of their interactions with the VLE represented by daily summaries of student clicks (10,655,280 entries). The dataset is freely available at https://analyse.kmi.open.ac.uk/open_dataset under a CC-BY 4.0 license.
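
    A minimal sketch of working with such daily click summaries, using illustrative field names rather than the exact OULAD schema, aggregating per-student engagement measures from (student, course, day, clicks) rows:

```python
from collections import defaultdict

# Hypothetical rows modeled on the dataset's daily VLE summaries:
# (student_id, course, day, sum_of_clicks). Field layout is illustrative.
rows = [
    (11391, "AAA", 0, 4),
    (11391, "AAA", 0, 7),
    (11391, "AAA", 1, 2),
    (28400, "AAA", 0, 1),
    (28400, "AAA", 3, 9),
]

# Total clicks per student: a basic engagement measure derivable from
# the daily click summaries.
clicks_per_student = defaultdict(int)
for student, course, day, clicks in rows:
    clicks_per_student[student] += clicks

# Number of distinct active days per student.
active_days = defaultdict(set)
for student, course, day, clicks in rows:
    active_days[student].add(day)
```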

  9. Open University Learning Analytics dataset

    PubMed Central

    Kuzilek, Jakub; Hlosta, Martin; Zdrahal, Zdenek

    2017-01-01

    Learning Analytics focuses on the collection and analysis of learners’ data to improve their learning experience by providing informed guidance and to optimise learning materials. To support the research in this area we have developed a dataset, containing data from courses presented at the Open University (OU). What makes the dataset unique is the fact that it contains demographic data together with aggregated clickstream data of students’ interactions in the Virtual Learning Environment (VLE). This enables the analysis of student behaviour, represented by their actions. The dataset contains the information about 22 courses, 32,593 students, their assessment results, and logs of their interactions with the VLE represented by daily summaries of student clicks (10,655,280 entries). The dataset is freely available at https://analyse.kmi.open.ac.uk/open_dataset under a CC-BY 4.0 license. PMID:29182599

  10. Unbalanced 2 x 2 Factorial Designs and the Interaction Effect: A Troublesome Combination

    PubMed Central

    2015-01-01

    In this power study, ANOVAs of unbalanced and balanced 2 x 2 datasets are compared (N = 120). Datasets are created under the assumption that the H1 of the effects is true. The effects are constructed in two ways, assuming: 1. contributions to the effects solely in the treatment groups; 2. contrasting contributions in treatment and control groups. The main question is whether the two ANOVA correction methods for imbalance (applying Sums of Squares Type II or III; SS II or SS III) offer satisfactory power in the presence of an interaction. Overall, SS II showed higher power, but results varied strongly. When compared to a balanced dataset, for some unbalanced datasets the rejection rate of H0 of main effects was undesirably higher. SS III showed consistently somewhat lower power. When the effects were constructed with equal contributions from control and treatment groups, the interaction could be re-estimated satisfactorily. When an interaction was present, SS III led consistently to somewhat lower rejection rates of H0 of main effects, compared to the rejection rates found in equivalent balanced datasets, while SS II produced strongly varying results. In data constructed with effects only in the treatment groups and no effects in the control groups, the H0 of moderate and strong interaction effects was often not rejected and SS II seemed applicable. Even then, SS III provided slightly better results when a true interaction was present. ANOVA did not always allow a satisfactory re-estimation of the unique interaction effect. Yet, SS II worked better only when an interaction effect could be excluded, whereas SS III results were just marginally worse in that case. Overall, SS III provided consistently 1 to 5% lower rejection rates of H0 in comparison with analyses of balanced datasets, while results of SS II varied too widely for general application. PMID:25807514
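
    The SS II / SS III divergence under imbalance traces back to weighted versus unweighted marginal means. The sketch below illustrates this with hypothetical cell data (it is not a full ANOVA implementation): the cell means show no true effect of factor A, yet the sample-size-weighted marginals differ while the unweighted marginals, which Type III sums of squares effectively test, do not.

```python
import numpy as np

# Hypothetical unbalanced 2 x 2 layout: the cell means for factor A are
# identical at both levels of B, but the cell sizes differ.
cells = {
    ("a1", "b1"): np.full(30, 10.0),
    ("a1", "b2"): np.full(10, 20.0),
    ("a2", "b1"): np.full(10, 10.0),
    ("a2", "b2"): np.full(30, 20.0),
}

def weighted_marginal(level):
    """Marginal mean of factor A weighting cells by their sample sizes
    (what an ordinary row mean of the raw data estimates)."""
    data = np.concatenate([v for (a, b), v in cells.items() if a == level])
    return data.mean()

def unweighted_marginal(level):
    """Marginal mean of factor A averaging the cell means equally
    (the least-squares means that Type III SS effectively compare)."""
    means = [v.mean() for (a, b), v in cells.items() if a == level]
    return float(np.mean(means))
```

    Here the unweighted marginals agree (no spurious A effect), while the weighted marginals differ purely because of the imbalance, which is why SS II and SS III can disagree on the same data.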

  11. Land-atmosphere interaction patterns in southeastern South America using satellite products and climate models

    NASA Astrophysics Data System (ADS)

    Spennemann, P. C.; Salvia, M.; Ruscica, R. C.; Sörensson, A. A.; Grings, F.; Karszenbaum, H.

    2018-02-01

    In regions of strong Land-Atmosphere (L-A) interaction, soil moisture (SM) conditions can impact the atmosphere by modulating the land surface fluxes. The importance of identifying L-A interaction regions lies in the potential improvement of weather/seasonal forecasts and a better understanding of the physical mechanisms involved. This study aims to compare the terrestrial segment of the L-A interaction from satellite products and climate models, motivated by previous modeling studies pointing out southeastern South America (SESA) as a L-A hotspot during austral summer. In addition, the L-A interaction under anomalously dry or wet conditions over SESA is analyzed. To identify L-A hotspots, the AMSRE-LPRM SM and MODIS land surface temperature products, coupled climate models, and uncoupled land surface models were used. SESA stands out as a strong L-A interaction hotspot across different metrics, temporal scales and independent datasets, showing consistency between models and satellite estimations. Both AMSRE-LPRM bands (X and C) consistently show a strong L-A interaction hotspot over the Pampas ecoregion. An intensification and larger spatial extent of the L-A interaction in dry summers compared to wet summers was observed in both satellite products and models. These results, which were derived from measured physical variables, are encouraging and promising for future studies analyzing L-A interactions. L-A interaction analysis is proposed here as a meeting point between the remote sensing and climate modelling communities of Argentina, within a region with the highest agricultural and livestock production of the continent but an important lack of in-situ SM observations.

  12. Evolving hard problems: Generating human genetics datasets with a complex etiology.

    PubMed

    Himmelstein, Daniel S; Greene, Casey S; Moore, Jason H

    2011-07-07

    A goal of human genetics is to discover genetic factors that influence individuals' susceptibility to common diseases. Most common diseases are thought to result from the joint failure of two or more interacting components rather than single component failures. This greatly complicates both the task of selecting informative genetic variants and the task of modeling interactions between them. We and others have previously developed algorithms to detect and model the relationships between these genetic factors and disease. Previously, these methods have been evaluated with datasets simulated according to pre-defined genetic models. Here we develop and evaluate a model-free evolution strategy to generate datasets which display a complex relationship between individual genotype and disease susceptibility. We show that this model-free approach is capable of generating a diverse array of datasets with distinct gene-disease relationships for an arbitrary interaction order and sample size. We specifically generate eight hundred Pareto fronts, one for each independent run of our algorithm. In each run, the predictiveness of single genetic variants and of pairs of genetic variants has been minimized, while the predictiveness of third-, fourth-, or fifth-order combinations is maximized. Two hundred runs of the algorithm are further dedicated to creating datasets with predictive fourth- or fifth-order interactions and minimized lower-level effects. This method and the resulting datasets will allow the capabilities of novel methods to be tested without pre-specified genetic models. This allows researchers to evaluate which methods will succeed on human genetics problems where the model is not known in advance. We further make the entire Pareto-optimal front of datasets from each run freely available to the community so that novel methods may be rigorously evaluated. These 76,600 datasets are available from http://discovery.dartmouth.edu/model_free_data/.

  13. Graph theoretic analysis of protein interaction networks of eukaryotes

    NASA Astrophysics Data System (ADS)

    Goh, K.-I.; Kahng, B.; Kim, D.

    2005-11-01

    Owing to recent progress in high-throughput experimental techniques, datasets of large-scale protein interactions of prototypical multicellular species, the nematode worm Caenorhabditis elegans and the fruit fly Drosophila melanogaster, have been assayed. The datasets are obtained mainly using the yeast two-hybrid method, which yields false positives and false negatives simultaneously. Accordingly, while it is desirable to test such datasets through further wet experiments, here we invoke recently developed network theory to test these high-throughput datasets in a simple way. Based on the fact that the key biological processes indispensable to maintaining life are conserved across eukaryotic species, and on a comparison of the structural properties of the protein interaction networks (PINs) of the two species with those of the yeast PIN, we find that while the worm and yeast PIN datasets exhibit similar structural properties, the current fly dataset, though the most comprehensively screened to date, does not correctly reflect generic structural properties as it stands: its modularity is suppressed and its connectivity correlation is lacking. Addition of interologs to the current fly dataset increases the modularity and enhances the occurrence of triangular motifs as well. The connectivity correlation function of the fly, however, remains distinct under such interolog additions, for which we present a possible scenario through in silico modeling.
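
    Structural properties such as triangular motifs and clustering, discussed above, can be computed directly from an adjacency representation. The toy networks below are hypothetical illustrations, not the actual PIN datasets: one is modular and triangle-rich, the other sparser and tree-like.

```python
from itertools import combinations

def triangles(adj):
    """Count triangular motifs in an undirected graph given as
    {node: set(neighbors)}."""
    count = 0
    for u, v, w in combinations(adj, 3):
        if v in adj[u] and w in adj[u] and w in adj[v]:
            count += 1
    return count

def avg_clustering(adj):
    """Average local clustering coefficient: for each node, the fraction
    of its neighbor pairs that are themselves connected."""
    total = 0.0
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue                      # clustering of degree<2 nodes is 0
        links = sum(1 for a, b in combinations(sorted(nbrs), 2) if b in adj[a])
        total += links / (k * (k - 1) / 2)
    return total / len(adj)

# Hypothetical toy interaction maps.
modular = {
    "p1": {"p2", "p3"}, "p2": {"p1", "p3"}, "p3": {"p1", "p2", "p4"},
    "p4": {"p3", "p5", "p6"}, "p5": {"p4", "p6"}, "p6": {"p4", "p5"},
}
treelike = {
    "p1": {"p2"}, "p2": {"p1", "p3", "p4"}, "p3": {"p2"},
    "p4": {"p2", "p5"}, "p5": {"p4", "p6"}, "p6": {"p5"},
}
```

    Comparing such statistics between two interaction maps is, in miniature, the kind of structural comparison the abstract applies to the worm, fly, and yeast PINs.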

  14. NP-PAH Interaction Dataset

    EPA Pesticide Factsheets

    Dataset presents concentrations of organic pollutants, such as polyaromatic hydrocarbon compounds, in water samples. Water samples of known volume and concentration were allowed to equilibrate with a known mass of nanoparticles. The mixture was then ultracentrifuged and sampled for analysis. This dataset is associated with the following publication: Sahle-Demessie, E., A. Zhao, C. Han, B. Hann, and H. Grecsek. Interaction of engineered nanomaterials with hydrophobic organic pollutants. Journal of Nanotechnology. Hindawi Publishing Corporation, New York, NY, USA, 27(28): 284003, (2016).

  15. Non-covalent interactions across organic and biological subsets of chemical space: Physics-based potentials parametrized from machine learning

    NASA Astrophysics Data System (ADS)

    Bereau, Tristan; DiStasio, Robert A.; Tkatchenko, Alexandre; von Lilienfeld, O. Anatole

    2018-06-01

    Classical intermolecular potentials typically require an extensive parametrization procedure for any new compound considered. To do away with prior parametrization, we propose a combination of physics-based potentials with machine learning (ML), coined IPML, which is transferable across small neutral organic and biologically relevant molecules. ML models provide on-the-fly predictions for environment-dependent local atomic properties: electrostatic multipole coefficients (significant error reduction compared to previously reported), the population and decay rate of valence atomic densities, and polarizabilities across conformations and chemical compositions of H, C, N, and O atoms. These parameters enable accurate calculations of intermolecular contributions—electrostatics, charge penetration, repulsion, induction/polarization, and many-body dispersion. Unlike other potentials, this model is transferable in its ability to handle new molecules and conformations without explicit prior parametrization: All local atomic properties are predicted from ML, leaving only eight global parameters—optimized once and for all across compounds. We validate IPML on various gas-phase dimers at and away from equilibrium separation, where we obtain mean absolute errors between 0.4 and 0.7 kcal/mol for several chemically and conformationally diverse datasets representative of non-covalent interactions in biologically relevant molecules. We further focus on hydrogen-bonded complexes—essential but challenging due to their directional nature—where datasets of DNA base pairs and amino acids yield an extremely encouraging 1.4 kcal/mol error. Finally, and as a first look, we consider IPML for denser systems: water clusters, supramolecular host-guest complexes, and the benzene crystal.

  16. Channel Classification across Arid West Landscapes in Support of OHW Delineation

    DTIC Science & Technology

    2013-01-01

    Figure 5. National Hydrography Dataset for Chinle Creek, AZ...the OHW boundary is determined by observing recent physical evidence subsequent to flow. Channel morphology and physical features associated with the...data from the National Hydrography Dataset (NHD) (USGS 2010). The NHD digital stream data were downloaded as a line

  17. Querying Large Biological Network Datasets

    ERIC Educational Resources Information Center

    Gulsoy, Gunhan

    2013-01-01

    New experimental methods have resulted in increasing amounts of genetic interaction data being generated every day. Biological networks are used to store the genetic interaction data gathered. The increasing amount of available data requires fast large-scale analysis methods. Therefore, we address the problem of querying large biological network datasets.…

  18. Gravity, aeromagnetic and rock-property data of the central California Coast Ranges

    USGS Publications Warehouse

    Langenheim, V.E.

    2014-01-01

    Gravity, aeromagnetic, and rock-property data were collected to support geologic-mapping, water-resource, and seismic-hazard studies for the central California Coast Ranges. These data are combined with existing data to provide gravity, aeromagnetic, and physical-property datasets for this region. The gravity dataset consists of approximately 18,000 measurements. The aeromagnetic dataset consists of total-field anomaly values from several detailed surveys that have been merged and gridded at an interval of 200 m. The physical property dataset consists of approximately 800 density measurements and 1,100 magnetic-susceptibility measurements from rock samples, in addition to previously published borehole gravity surveys from Santa Maria Basin, density logs from Salinas Valley, and intensities of natural remanent magnetization.

  19. A web Accessible Framework for Discovery, Visualization and Dissemination of Polar Data

    NASA Astrophysics Data System (ADS)

    Kirsch, P. J.; Breen, P.; Barnes, T. D.

    2007-12-01

    A web accessible information framework, currently under development within the Physical Sciences Division of the British Antarctic Survey, is described. The datasets accessed are generally heterogeneous in nature, from fields including space physics, meteorology, atmospheric chemistry, ice physics, and oceanography. Many of these are returned in near real time over a 24/7 limited-bandwidth link from remote Antarctic stations and ships. The requirement is to provide various user groups - each with disparate interests and demands - a system incorporating a browsable and searchable catalogue, bespoke data summary visualization, metadata access facilities, and download utilities. The system allows timely access to raw and processed datasets through an easily navigable discovery interface. Once discovered, a summary of the dataset can be visualized in a manner prescribed by the particular projects and user communities, or the dataset may be downloaded, subject to any accessibility restrictions that may exist. In addition, access to related ancillary information, including software, documentation, related URLs and information concerning non-electronic media (of particular relevance to some legacy datasets), is made directly available, having automatically been associated with a dataset during the discovery phase. Major components of the framework include the relational database containing the catalogue; the organizational structure of the systems holding the data (enabling automatic updates of the system catalogue and real-time access to data); the user interface design; and administrative and data management scripts allowing straightforward incorporation of utilities, datasets and system maintenance.

  20. NP_PAH_interaction dataset

    EPA Pesticide Factsheets

    Concentrations of different polyaromatic hydrocarbons in water before and after interaction with nanomaterials. The results show the capacity of engineered nanomaterials to adsorb different organic pollutants. This dataset is associated with the following publication: Sahle-Demessie, E., A. Zhao, C. Han, B. Hann, and H. Grecsek. Interaction of engineered nanomaterials with hydrophobic organic pollutants. Journal of Nanotechnology. Hindawi Publishing Corporation, New York, NY, USA, 27(28): 284003, (2016).

  1. Common Structure in Different Physical Properties: Electrical Conductivity and Surface Waves Phase Velocity

    NASA Astrophysics Data System (ADS)

    Mandolesi, E.; Jones, A. G.; Roux, E.; Lebedev, S.

    2009-12-01

    Recently, different studies have been undertaken on the correlation between diverse geophysical datasets. Magnetotelluric (MT) data are used to map the electrical conductivity structure of the Earth's interior, but one of the problems of the MT method is its lack of resolution when mapping zones beneath a region of high conductivity. Joint inversion of different datasets in which a common structure is recognizable reduces non-uniqueness and may improve the quality of interpretation when the datasets are sensitive to different physical properties with an underlying common structure. A common structure is recognized if the changes in physical properties occur at the same spatial locations. A common structure may be recognized in 1D inversion of seismic and MT datasets, and numerous authors have shown that exploiting a 2D common structure may also lead to an improvement in inversion quality when datasets are jointly inverted. In this presentation, a tool to constrain 2D MT inversion with phase velocities of surface wave seismic data (SW) is proposed; it is being developed and tested on synthetic data. The results obtained suggest that a joint inversion scheme could be applied with success along a section profile for which data are compatible with a 2D MT model.

  2. The MISTRALS programme data portal

    NASA Astrophysics Data System (ADS)

    Brissebrat, Guillaume; Belmahfoud, Nizar; Cloché, Sophie; Darras, Sabine; Descloitres, Jacques; Drocourt, Yoann; Ferré, Hélène; Henriot, Nicolas; Ramage, Karim

    2017-04-01

    Mediterranean Integrated STudies at Regional And Local Scales (MISTRALS) is a decennial programme of systematic observations and research dedicated to understanding Mediterranean Basin environmental processes and their evolution under global change. It is composed of eight multidisciplinary projects that cover all the components of the Earth system (atmosphere, ocean, continental surfaces, lithosphere...) and their interactions, all the disciplines (physics, chemistry, marine biogeochemistry, biology, geology, sociology…) and different time scales. For example, the Hydrological cycle in the Mediterranean eXperiment (HyMeX) aims at improving the predictability of rainfall extreme events and assessing the social and economic vulnerability to extreme events and adaptation capacity. The Paleo Mediterranean Experiment (PaleoMeX) is dedicated to the study of the interactions between climate, societies and civilizations of the Mediterranean world during the last 10000 years. Many long-term monitoring research networks are associated with MISTRALS, such as the Mediterranean Ocean Observing System on Environment (MOOSE), the Centre d'Observation Régional pour la Surveillance du Climat et de l'environnement Atmosphérique et océanographique en Méditerranée occidentale (CORSICA) and the environmental observations from the Mediterranean Eurocentre for Underwater Sciences and Technologies (MEUST-SE). Therefore, the data generated or used by the different MISTRALS projects are very heterogeneous. They include in situ observations, satellite products, model outputs, social science surveys... Some datasets are produced automatically by operational networks, and others come from research instruments and analysis procedures. They correspond to different time scales (historical time series, observatories, campaigns...) and are managed by several data centres.
    They originate from many scientific communities, with different data sharing practices, specific expectations, and different file formats and data processing tools. The MISTRALS data portal - http://mistrals.sedoo.fr/ - has been designed and developed as a unified tool for sharing scientific data in spite of these many sources of heterogeneity, and for fostering collaboration between research communities. The metadata (data descriptions) are standardized and comply with international standards (ISO 19115-19139; the INSPIRE European Directive; the Global Change Master Directory Thesaurus). A search tool allows users to browse the catalog by keyword or multicriteria selection (area, period, physical property...) and to access the data. Every in situ dataset is available in its native format, but the most commonly used datasets have been homogenized (property names, units, quality flags...) and inserted into a relational database, in order to enable accurate data selection and download in standard formats. At present the MISTRALS data portal provides access to about 650 datasets. It counts more than 675 registered users and about 100 data requests every month. The number of available datasets is increasing daily, owing to the provision of campaign datasets by several projects. Every scientist is invited to browse the catalog, complete the online registration and use MISTRALS data. Feel free to contact mistrals-contact@sedoo.fr with any questions.

  3. Hierarchical cortical transcriptome disorganization in autism.

    PubMed

    Lombardo, Michael V; Courchesne, Eric; Lewis, Nathan E; Pramparo, Tiziano

    2017-01-01

    Autism spectrum disorders (ASD) are etiologically heterogeneous and complex. Functional genomics work has begun to identify a diverse array of dysregulated transcriptomic programs (e.g., synaptic, immune, cell cycle, DNA damage, WNT signaling, cortical patterning and differentiation) potentially involved in ASD brain abnormalities during childhood and adulthood. However, it remains unclear whether such diverse dysregulated pathways are independent of each other or instead reflect coordinated hierarchical systems-level pathology. Two ASD cortical transcriptome datasets were re-analyzed using consensus weighted gene co-expression network analysis (WGCNA) to identify common co-expression modules across datasets. Linear mixed-effect models and Bayesian replication statistics were used to identify replicable differentially expressed modules. Eigengene network analysis was then utilized to identify between-group differences in how co-expression modules interact and cluster into hierarchical meta-modular organization. Protein-protein interaction analyses were also used to determine whether dysregulated co-expression modules show enhanced interactions. We find replicable evidence for 10 gene co-expression modules that are differentially expressed in ASD cortex. Rather than being independent non-interacting sources of pathology, these dysregulated co-expression modules work in synergy and physically interact at the protein level. These systems-level transcriptional signals are characterized by downregulation of synaptic processes coordinated with upregulation of immune/inflammation, response to other organism, catabolism, viral processes, translation, protein targeting and localization, cell proliferation, and vasculature development. Hierarchical organization of meta-modules (clusters of highly correlated modules) is also highly affected in ASD. 
These findings highlight that dysregulation of the ASD cortical transcriptome is characterized by the dysregulation of multiple coordinated transcriptional programs producing synergistic systems-level effects that cannot be fully appreciated by studying the individual component biological processes in isolation.
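
    As an illustration of the module "eigengene" concept used in the WGCNA-style analysis above, the sketch below computes an eigengene as the first principal component of a genes-by-samples expression matrix. The synthetic data, module size, and sign convention are illustrative assumptions, not the paper's pipeline:

```python
import numpy as np

def module_eigengene(expr):
    """First principal component ("eigengene") of a genes x samples
    expression matrix, as used in WGCNA-style co-expression analyses."""
    centered = expr - expr.mean(axis=1, keepdims=True)
    # SVD of the centered matrix; the top right-singular vector is the
    # eigengene profile across samples.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    eig = vt[0]
    # Fix the sign so the eigengene correlates positively with the
    # module's mean expression (an arbitrary but common convention).
    if np.corrcoef(eig, centered.mean(axis=0))[0, 1] < 0:
        eig = -eig
    return eig

rng = np.random.default_rng(0)
trend = rng.normal(size=20)                      # shared "module" signal over 20 samples
expr = trend + 0.3 * rng.normal(size=(50, 20))   # 50 co-expressed genes plus noise
eigengene = module_eigengene(expr)
print(abs(np.corrcoef(eigengene, trend)[0, 1]))  # close to 1 for a coherent module
```

    Differential expression of a module between groups can then be tested on this single eigengene vector instead of on every member gene.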

  4. Reproducibility in Data-Scarce Environments

    NASA Astrophysics Data System (ADS)

    Darch, P. T.

    2016-12-01

    Among the usual requirements for reproducibility are large volumes of data and computationally intensive methods. Many fields within the earth sciences, however, do not meet these requirements: data are scarce and data-intensive methods are not well established. How can science be reproducible under these conditions? What changes, both infrastructural and cultural, are needed to advance reproducibility? This paper presents findings from a long-term social scientific case study of an emergent and data-scarce field, the deep subseafloor biosphere, which studies interactions between microbial communities living in the seafloor and the physical environments they inhabit. Several factors make reproducibility seem a distant goal for this community: - The relative newness of the field: serious study began in the late 1990s; - The highly multidisciplinary nature of the field: researchers come from a range of physical and life science backgrounds; - Data scarcity: domain researchers produce much of these data in their own onshore laboratories by analyzing cores from international ocean drilling expeditions, and allocation of cores is negotiated between researchers from many fields. These factors interact in multiple ways to inhibit reproducibility: - Incentive structures emphasize producing new data and new knowledge rather than reanalysing extant data; - Only a few steps of laboratory analyses can be reproduced (such as analysis of DNA sequences, but not extraction of DNA from cores) due to the scarcity of cores; - Methodological heterogeneity is a consequence of multidisciplinarity, as researchers bring different techniques from diverse fields; - Few standards for data collection or analysis are available at this early stage of the field; - While datasets from multiple biological and physical phenomena can be integrated into a single workflow, curation tends to be divergent. Each type of dataset may be subject to disparate policies and contributed to different databases. Our study demonstrates that data scarcity can be particularly acute in emerging scientific fields and often results from resource scarcity more generally. Reproducibility tends to be a low priority among the many other scientific challenges these researchers face.

  5. Cloud computing for detecting high-order genome-wide epistatic interaction via dynamic clustering.

    PubMed

    Guo, Xuan; Meng, Yu; Yu, Ning; Pan, Yi

    2014-04-10

    Taking advantage of high-throughput single nucleotide polymorphism (SNP) genotyping technology, large genome-wide association studies (GWASs) are considered to hold promise for unravelling complex relationships between genotype and phenotype. At present, traditional single-locus methods are insufficient to detect multi-locus interactions, which broadly exist in complex traits. In addition, statistical tests for high-order epistatic interactions involving more than two SNPs pose computational and analytical challenges, because the computation increases exponentially as the cardinality of SNP combinations grows. In this paper, we provide a simple, fast and powerful method using dynamic clustering and cloud computing to detect genome-wide multi-locus epistatic interactions. We constructed systematic experiments to compare statistical power against several recently proposed algorithms, including TEAM, SNPRuler, EDCF and BOOST. Furthermore, we applied our method to two real GWAS datasets, the age-related macular degeneration (AMD) and rheumatoid arthritis (RA) datasets, where we found some novel potential disease-related genetic factors that do not show up in detections of two-locus epistatic interactions. Experimental results on simulated data demonstrate that our method is more powerful than several recently proposed methods on both two- and three-locus disease models. Our method discovered many novel high-order associations that are significantly enriched in cases in the two real GWAS datasets. Moreover, the running times of the cloud implementation on the AMD and RA datasets are roughly 2 hours and 50 hours, respectively, for detecting two-locus interactions on a cluster with forty small virtual machines. Therefore, we believe that our method is suitable and effective for the full-scale analysis of multi-locus epistatic interactions in GWAS.

  6. Cloud computing for detecting high-order genome-wide epistatic interaction via dynamic clustering

    PubMed Central

    2014-01-01

    Background Taking advantage of high-throughput single nucleotide polymorphism (SNP) genotyping technology, large genome-wide association studies (GWASs) are considered to hold promise for unravelling complex relationships between genotype and phenotype. At present, traditional single-locus methods are insufficient to detect multi-locus interactions, which broadly exist in complex traits. In addition, statistical tests for high-order epistatic interactions involving more than two SNPs pose computational and analytical challenges, because the computation increases exponentially as the cardinality of SNP combinations grows. Results In this paper, we provide a simple, fast and powerful method using dynamic clustering and cloud computing to detect genome-wide multi-locus epistatic interactions. We constructed systematic experiments to compare statistical power against several recently proposed algorithms, including TEAM, SNPRuler, EDCF and BOOST. Furthermore, we applied our method to two real GWAS datasets, the age-related macular degeneration (AMD) and rheumatoid arthritis (RA) datasets, where we found some novel potential disease-related genetic factors that do not show up in detections of two-locus epistatic interactions. Conclusions Experimental results on simulated data demonstrate that our method is more powerful than several recently proposed methods on both two- and three-locus disease models. Our method discovered many novel high-order associations that are significantly enriched in cases in the two real GWAS datasets. Moreover, the running times of the cloud implementation on the AMD and RA datasets are roughly 2 hours and 50 hours, respectively, for detecting two-locus interactions on a cluster with forty small virtual machines. Therefore, we believe that our method is suitable and effective for the full-scale analysis of multi-locus epistatic interactions in GWAS. PMID:24717145
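
    To make the two-locus testing idea in these records concrete, the sketch below runs an exhaustive pairwise scan: for each SNP pair it forms the 9 joint genotypes (0/1/2 x 0/1/2) and scores association with case/control status via a Pearson chi-square statistic. The simulated disease model, effect sizes, and SNP indices are illustrative assumptions, not the paper's algorithm (which additionally uses dynamic clustering and cloud parallelism to scale):

```python
import numpy as np
from itertools import combinations

def chi2_stat(table):
    """Pearson chi-square statistic for a contingency table
    (rows with zero total are skipped)."""
    table = table[table.sum(axis=1) > 0]
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.sum()
    return float(((table - expected) ** 2 / expected).sum())

def pairwise_scan(genotypes, phenotype):
    """Exhaustive two-locus scan: test the 9 joint genotypes of each
    SNP pair against case/control status."""
    n_snps = genotypes.shape[1]
    scores = {}
    for i, j in combinations(range(n_snps), 2):
        joint = genotypes[:, i] * 3 + genotypes[:, j]   # 0..8 joint genotype code
        table = np.zeros((9, 2))
        for g, p in zip(joint, phenotype):
            table[g, p] += 1
        scores[(i, j)] = chi2_stat(table)
    return scores

rng = np.random.default_rng(1)
n, m = 600, 6
genotypes = rng.integers(0, 3, size=(n, m))
# Hypothetical interaction: SNPs 1 and 4 jointly raise disease risk.
risk = (genotypes[:, 1] == 2) & (genotypes[:, 4] == 2)
phenotype = (rng.random(n) < np.where(risk, 0.9, 0.3)).astype(int)
scores = pairwise_scan(genotypes, phenotype)
best_pair = max(scores, key=scores.get)
print(best_pair)  # the interacting pair (1, 4) should rank highest
```

    The exponential blow-up the abstract mentions is visible here: the pair loop is O(m^2), and a three-locus scan would be O(m^3), which is what motivates clustering heuristics and cloud-scale parallelism.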

  7. Statistical Physics of Complex Substitutive Systems

    NASA Astrophysics Data System (ADS)

    Jin, Qing

    Diffusion processes are central to human interactions. Despite extensive studies that span multiple disciplines, our knowledge is limited to spreading processes in non-substitutive systems. Yet, a considerable number of ideas, products, and behaviors spread by substitution; to adopt a new one, agents must give up an existing one. This captures the spread of scientific constructs--forcing scientists to choose, for example, a deterministic or probabilistic worldview--as well as the adoption of durable items, such as mobile phones, cars, or homes. In this dissertation, I develop a statistical physics framework to describe, quantify, and understand substitutive systems. By empirically exploring three collected high-resolution datasets pertaining to such systems, I build a mechanistic model describing substitutions, which not only analytically predicts the universal macroscopic phenomenon discovered in the collected datasets, but also accurately captures the trajectories of individual items in a complex substitutive system, demonstrating a high degree of regularity and universality in substitutive systems. I also discuss the origins and interpretation of the parameters in the substitution model and possible generalized forms of the mathematical framework. This systematic study of substitutive systems could potentially guide the understanding and prediction of all spreading phenomena driven by substitutions, from electric cars to scientific paradigms, and from renewable energy to new healthy habits.

  8. QuIN: A Web Server for Querying and Visualizing Chromatin Interaction Networks.

    PubMed

    Thibodeau, Asa; Márquez, Eladio J; Luo, Oscar; Ruan, Yijun; Menghi, Francesca; Shin, Dong-Guk; Stitzel, Michael L; Vera-Licona, Paola; Ucar, Duygu

    2016-06-01

    Recent studies of the human genome have indicated that regulatory elements (e.g. promoters and enhancers) at distal genomic locations can interact with each other via chromatin folding and affect gene expression levels. Genomic technologies for mapping interactions between DNA regions, e.g., ChIA-PET and Hi-C, can generate genome-wide maps of interactions between regulatory elements. These interaction datasets are important resources to infer distal gene targets of non-coding regulatory elements and to facilitate prioritization of critical loci for important cellular functions. With the increasing diversity and complexity of genomic information and public ontologies, making sense of these datasets demands integrative and easy-to-use software tools. Moreover, network representation of chromatin interaction maps enables effective data visualization, integration, and mining. Currently, there is no software that can take full advantage of network theory approaches for the analysis of chromatin interaction datasets. To fill this gap, we developed a web-based application, QuIN, which enables: 1) building and visualizing chromatin interaction networks, 2) annotating networks with user-provided private and publicly available functional genomics and interaction datasets, 3) querying network components based on gene name or chromosome location, and 4) utilizing network-based measures to identify and prioritize critical regulatory targets and their direct and indirect interactions. QuIN's web server is available at http://quin.jax.org. QuIN is developed in Java and JavaScript, utilizing an Apache Tomcat web server and MySQL database, and the source code is available under the GPLV3 license on GitHub: https://github.com/UcarLab/QuIN/.
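
    The network representation the abstract describes can be sketched in a few lines: anchors (interacting DNA regions) become nodes, observed interactions become edges, and a gene-name query walks the adjacency list. The anchor coordinates, gene annotation, and query helper below are toy assumptions for illustration, not QuIN's data model or API:

```python
from collections import defaultdict

# Hypothetical ChIA-PET-style anchor pairs: (chrom, start, end) regions
# observed to interact, plus a toy gene annotation for one anchor.
interactions = [
    (("chr1", 1000, 2000), ("chr1", 50000, 51000)),
    (("chr1", 50000, 51000), ("chr1", 90000, 91000)),
    (("chr2", 5000, 6000), ("chr2", 8000, 9000)),
]
gene_at = {("chr1", 50000, 51000): "GENE_A"}   # assumed annotation

# Build an adjacency-list network: nodes are anchors, edges are interactions.
network = defaultdict(set)
for a, b in interactions:
    network[a].add(b)
    network[b].add(a)

def query_by_gene(name):
    """Return the direct interaction partners of the anchor annotated
    with the given gene name (a QuIN-style gene-name query)."""
    anchors = [node for node, gene in gene_at.items() if gene == name]
    return {partner for a in anchors for partner in network[a]}

partners = query_by_gene("GENE_A")
print(sorted(partners))   # the two chr1 anchors linked to GENE_A's anchor
```

    Indirect interactions (point 4 in the abstract) would then be breadth-first traversals of the same adjacency structure.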

  9. Predicting protein complexes from weighted protein-protein interaction graphs with a novel unsupervised methodology: Evolutionary enhanced Markov clustering.

    PubMed

    Theofilatos, Konstantinos; Pavlopoulou, Niki; Papasavvas, Christoforos; Likothanassis, Spiros; Dimitrakopoulos, Christos; Georgopoulos, Efstratios; Moschopoulos, Charalampos; Mavroudi, Seferina

    2015-03-01

    Proteins are considered to be the most important individual components of biological systems and they combine to form physical protein complexes which are responsible for certain molecular functions. Despite the wide availability of protein-protein interaction (PPI) information, not much information is available about protein complexes. Experimental methods are limited in terms of time, efficiency, cost and performance constraints. Existing computational methods have provided encouraging preliminary results, but they face certain disadvantages: they require parameter tuning, some cannot handle weighted PPI data, and others do not allow a protein to participate in more than one protein complex. In the present paper, we propose a new fully unsupervised methodology for predicting protein complexes from weighted PPI graphs. The proposed methodology is called evolutionary enhanced Markov clustering (EE-MC) and it is a hybrid combination of an adaptive evolutionary algorithm and a state-of-the-art clustering algorithm named enhanced Markov clustering. EE-MC was compared with state-of-the-art methodologies when applied to datasets from human and the yeast Saccharomyces cerevisiae. Using publicly available datasets, EE-MC outperformed existing methodologies (in some datasets the separation metric increased by 10-20%). Moreover, when applied to new human datasets, its performance was encouraging in the prediction of protein complexes consisting of proteins with high functional similarity. Specifically, 5737 protein complexes were predicted, and 72.58% of them are enriched for at least one gene ontology (GO) function term. EE-MC is by design able to overcome intrinsic limitations of existing methodologies, such as their inability to handle weighted PPI networks, their constraint to assign every protein to exactly one cluster, and the difficulties they face concerning parameter tuning. This was experimentally validated, and new potentially true human protein complexes were suggested as candidates for further validation using experimental techniques. Copyright © 2015 Elsevier B.V. All rights reserved.
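
    The Markov clustering (MCL) core that EE-MC builds on alternates "expansion" (a matrix power, spreading random-walk flow) with "inflation" (an elementwise power plus column normalization, strengthening intra-cluster flow) until the flow matrix converges. The sketch below is the plain MCL iteration on a toy weighted PPI matrix, not the paper's evolutionary-enhanced variant, and the graph and parameters are illustrative:

```python
import numpy as np

def mcl(adj, expansion=2, inflation=2.0, iters=50):
    """Minimal Markov clustering (MCL) sketch on a weighted adjacency
    matrix: alternate expansion (matrix power) and inflation
    (elementwise power + column normalization)."""
    m = adj + np.eye(len(adj))            # self-loops stabilize the iteration
    m = m / m.sum(axis=0)                 # make the matrix column-stochastic
    for _ in range(iters):
        m = np.linalg.matrix_power(m, expansion)   # expansion step
        m = m ** inflation                         # inflation step
        m = m / m.sum(axis=0)
    # Rows that retain mass define the clusters (columns they cover).
    clusters = set()
    for row in m:
        members = frozenset(np.nonzero(row > 1e-6)[0])
        if members:
            clusters.add(members)
    return clusters

# Two triangles joined by one weak edge; weights stand in for PPI confidence.
adj = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    adj[i, j] = adj[j, i] = 1.0
adj[2, 3] = adj[3, 2] = 0.2               # weak inter-complex link
clusters = mcl(adj)
print(clusters)  # two clusters: {0, 1, 2} and {3, 4, 5}
```

    The inflation exponent is exactly the kind of sensitive parameter the abstract says EE-MC tunes automatically with its evolutionary layer: larger values fragment the graph into more, tighter clusters.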

  10. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mitrani, J

    Bayesian networks (BNs) are an excellent tool for modeling uncertainties in systems with several interdependent variables. A BN is a directed acyclic graph, and consists of a structure, or the set of directional links between variables that depend on other variables, and conditional probabilities (CPs) for each variable. In this project, we apply BNs to understand uncertainties in NIF ignition experiments. One can represent various physical properties of National Ignition Facility (NIF) capsule implosions as variables in a BN. A dataset containing simulations of NIF capsule implosions was provided. The dataset was generated from a radiation hydrodynamics code, and it contained 120 simulations of 16 variables. Relevant knowledge about the physics of NIF capsule implosions and greedy search algorithms were used to search for hypothetical structures for a BN. Our preliminary results found 6 links between variables in the dataset. However, we thought there should have been more links between the dataset variables based on the physics of NIF capsule implosions. Important reasons for the paucity of links are the relatively small size of the dataset and the sampling of the values for dataset variables. Another factor that might have caused the paucity of links is that in the dataset 20% of the simulations represented successful fusion and 80% did not (simulations of unsuccessful fusion are useful for measuring certain diagnostics), which skewed the distributions of several variables and possibly reduced the number of links. Nevertheless, by illustrating the interdependencies and conditional probabilities of several parameters and diagnostics, an accurate and complete BN built from an appropriate simulation set would provide uncertainty quantification for NIF capsule implosions.
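
    A greedy structure search of the kind described scores candidate link sets and keeps the best. The sketch below shows only the scoring step for linear-Gaussian BNs, comparing a chain structure against an empty one via a BIC score; the variable names ("drive", "velocity", "yield") and the data-generating model are hypothetical stand-ins for implosion quantities, not the project's actual dataset or code:

```python
import numpy as np

def gaussian_bic(data, parents):
    """BIC score of a linear-Gaussian Bayesian network: regress each
    variable on its parents; higher (less negative) is better."""
    n, _ = data.shape
    score = 0.0
    for child, pa in parents.items():
        y = data[:, child]
        X = np.column_stack([np.ones(n)] + [data[:, p] for p in pa])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        sigma2 = resid @ resid / n
        loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
        score += loglik - 0.5 * (X.shape[1] + 1) * np.log(n)  # BIC penalty
    return score

rng = np.random.default_rng(2)
n = 200
drive = rng.normal(size=n)                    # hypothetical laser drive
vel = 2.0 * drive + 0.5 * rng.normal(size=n)  # velocity depends on drive
yield_ = 1.5 * vel + 0.5 * rng.normal(size=n) # yield depends on velocity
data = np.column_stack([drive, vel, yield_])

chain = {0: [], 1: [0], 2: [1]}               # drive -> velocity -> yield
empty = {0: [], 1: [], 2: []}
print(gaussian_bic(data, chain) > gaussian_bic(data, empty))  # True
```

    With only 120 simulations, the BIC penalty term dominates for weak dependencies, which is one concrete way a small dataset yields the paucity of links the abstract reports.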

  11. SCSPOD14, a South China Sea physical oceanographic dataset derived from in situ measurements during 1919-2014.

    PubMed

    Zeng, Lili; Wang, Dongxiao; Chen, Ju; Wang, Weiqiang; Chen, Rongyu

    2016-04-26

    In addition to the oceanographic data available for the South China Sea (SCS) from the World Ocean Database (WOD) and Array for Real-time Geostrophic Oceanography (Argo) floats, a suite of observations has been made by the South China Sea Institute of Oceanology (SCSIO) starting from the 1970s. Here, we assemble a SCS Physical Oceanographic Dataset (SCSPOD14) based on 51,392 validated temperature and salinity profiles collected from these three datasets for the period 1919-2014. A gridded dataset of climatological monthly mean temperature, salinity, and mixed and isothermal layer depth derived from an objective analysis of profiles is also presented. Comparisons with the World Ocean Atlas (WOA) and IFREMER/LOS Mixed Layer Depth Climatology confirm the reliability of the new dataset. This unique dataset offers an invaluable baseline perspective on the thermodynamic processes, spatial and temporal variability of water masses, and basin-scale and mesoscale oceanic structures in the SCS. We anticipate improvements and regular updates to this product as more observations become available from existing and future in situ networks.

  12. SCSPOD14, a South China Sea physical oceanographic dataset derived from in situ measurements during 1919–2014

    PubMed Central

    Zeng, Lili; Wang, Dongxiao; Chen, Ju; Wang, Weiqiang; Chen, Rongyu

    2016-01-01

    In addition to the oceanographic data available for the South China Sea (SCS) from the World Ocean Database (WOD) and Array for Real-time Geostrophic Oceanography (Argo) floats, a suite of observations has been made by the South China Sea Institute of Oceanology (SCSIO) starting from the 1970s. Here, we assemble a SCS Physical Oceanographic Dataset (SCSPOD14) based on 51,392 validated temperature and salinity profiles collected from these three datasets for the period 1919–2014. A gridded dataset of climatological monthly mean temperature, salinity, and mixed and isothermal layer depth derived from an objective analysis of profiles is also presented. Comparisons with the World Ocean Atlas (WOA) and IFREMER/LOS Mixed Layer Depth Climatology confirm the reliability of the new dataset. This unique dataset offers an invaluable baseline perspective on the thermodynamic processes, spatial and temporal variability of water masses, and basin-scale and mesoscale oceanic structures in the SCS. We anticipate improvements and regular updates to this product as more observations become available from existing and future in situ networks. PMID:27116565
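
    One of the derived quantities in this dataset, mixed layer depth, is commonly computed from a profile with a temperature-threshold criterion. The sketch below uses one such convention (temperature 0.2 °C below its value at a 10 m reference level); the specific thresholds and the toy profile are illustrative assumptions, not necessarily the criterion SCSPOD14 adopts:

```python
import numpy as np

def mixed_layer_depth(depth, temp, dT=0.2, ref_depth=10.0):
    """Mixed layer depth via a temperature-threshold criterion: the
    shallowest depth where temperature drops more than dT below its
    value at a ~10 m reference level (one common convention)."""
    t_ref = np.interp(ref_depth, depth, temp)
    cooler = (depth >= ref_depth) & (temp <= t_ref - dT)
    if not cooler.any():
        return float(depth[-1])               # mixed to the bottom of the profile
    i = int(np.argmax(cooler))                # first level meeting the criterion
    # Linear interpolation between the bracketing levels.
    return float(np.interp(t_ref - dT,
                           temp[i - 1:i + 1][::-1],
                           depth[i - 1:i + 1][::-1]))

depth = np.array([0., 5., 10., 20., 30., 40., 50., 75., 100.])
temp = np.array([28.0, 28.0, 27.99, 27.98, 27.95, 27.5, 26.8, 25.0, 23.5])
print(mixed_layer_depth(depth, temp))   # between 30 m and 40 m for this profile
```

    Applying such a criterion profile-by-profile and then objectively analyzing the results onto a grid is how a climatological mixed layer depth field like the one described can be assembled.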

  13. Note: The performance of new density functionals for a recent blind test of non-covalent interactions

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mardirossian, Narbe; Head-Gordon, Martin

    Benchmark datasets of non-covalent interactions are essential for assessing the performance of density functionals and other quantum chemistry approaches. In a recent blind test, Taylor et al. benchmarked 14 methods on a new dataset consisting of 10 dimer potential energy curves calculated using coupled cluster with singles, doubles, and perturbative triples (CCSD(T)) at the complete basis set (CBS) limit (80 data points in total). Finally, the dataset is particularly interesting because compressed, near-equilibrium, and stretched regions of the potential energy surface are extensively sampled.

  14. Note: The performance of new density functionals for a recent blind test of non-covalent interactions

    DOE PAGES

    Mardirossian, Narbe; Head-Gordon, Martin

    2016-11-09

    Benchmark datasets of non-covalent interactions are essential for assessing the performance of density functionals and other quantum chemistry approaches. In a recent blind test, Taylor et al. benchmarked 14 methods on a new dataset consisting of 10 dimer potential energy curves calculated using coupled cluster with singles, doubles, and perturbative triples (CCSD(T)) at the complete basis set (CBS) limit (80 data points in total). Finally, the dataset is particularly interesting because compressed, near-equilibrium, and stretched regions of the potential energy surface are extensively sampled.

  15. Dataset of Scientific Inquiry Learning Environment

    ERIC Educational Resources Information Center

    Ting, Choo-Yee; Ho, Chiung Ching

    2015-01-01

    This paper presents the dataset collected from student interactions with INQPRO, a computer-based scientific inquiry learning environment. The dataset contains records of 100 students and is divided into two portions. The first portion comprises (1) "raw log data", capturing the student's name, interfaces visited, the interface…

  16. Runaways and weathervanes: The shape of stellar bow shocks

    NASA Astrophysics Data System (ADS)

    Henney, W. J.; Tarango-Yong, J. A.

    2017-11-01

    Stellar bow shocks are the result of the supersonic interaction between a stellar wind and its environment. Some of these are "runaways": high-velocity stars that have been ejected from a star cluster. Others are "weather vanes", where it is the local interstellar medium itself that is moving, perhaps as the result of a champagne flow of ionized gas from a nearby HII region. We propose a new two-dimensional classification scheme for bow shapes, which is based on dimensionless geometric ratios that can be estimated from observational images. The two ratios are related to the flatness of the bow's apex, which we term "planitude", and the openness of its wings, which we term "alatude". We calculate the inclination-dependent tracks on the planitude-alatude plane that are predicted by simple models for the bow shock shape. We also measure the shapes of bow shocks from three different observational datasets: mid-infrared arcs around hot main-sequence stars, far-infrared arcs around luminous cool stars, and emission-line arcs around proplyds and other young stars in the Orion Nebula. Clear differences are found between the different datasets in their distributions on the planitude-alatude plane, which can be used to constrain the physics of the bow shock interaction and emission mechanisms in the different classes of object.
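
    The two ratios can be estimated numerically from a sampled bow shape. The sketch below assumes the definitions planitude = Rc/R0 (apex radius of curvature over star-apex distance) and alatude = R90/R0 (perpendicular radius over star-apex distance), which match the "flatness" and "openness" description above but are this sketch's assumption; the test shape is a parabola, for which alatude = sqrt(2 x planitude) exactly:

```python
import numpy as np

# Bow shape sampled in polar form r(theta) around the star at the origin,
# apex at theta = 0. Test case: a parabola x = R0 - y^2/(2*Rc) with
# hypothetical apex distance R0 and apex radius of curvature Rc.
R0, Rc = 1.0, 2.0
theta = np.linspace(0.0, np.pi / 2, 2001)
# In polars the parabola gives a*r^2 + b*r - R0 = 0; take the positive root.
a = np.sin(theta) ** 2 / (2 * Rc)
b = np.cos(theta)
r = np.where(a > 0, (-b + np.sqrt(b ** 2 + 4 * a * R0)) / (2 * a + 1e-300), R0)
x, y = r * np.cos(theta), r * np.sin(theta)

# Planitude: fit the apex region to x ~ R0 - y^2/(2*Rc) to recover Rc.
near = y < 0.3 * R0
slope = np.polyfit(y[near] ** 2, x[0] - x[near], 1)[0]
Rc_est = 1.0 / (2.0 * slope)
planitude = Rc_est / r[0]          # Rc / R0
# Alatude: perpendicular radius (theta = 90 degrees) over apex distance.
alatude = r[-1] / r[0]             # R90 / R0
print(round(planitude, 2), round(alatude, 2))  # ~2.0 and ~2.0 for this parabola
```

    Observationally, the same two numbers could be read off a traced arc: R0 and R90 directly from the image, and Rc from a quadratic fit around the apex.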

  17. Linking Automated Data Analysis and Visualization with Applications in Developmental Biology and High-Energy Physics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ruebel, Oliver

    2009-11-20

    Knowledge discovery from large and complex collections of today's scientific datasets is a challenging task. With the ability to measure and simulate more processes at increasingly finer spatial and temporal scales, the increasing number of data dimensions and data objects is presenting tremendous challenges for data analysis and effective data exploration methods and tools. Researchers are overwhelmed with data and standard tools are often insufficient to enable effective data analysis and knowledge discovery. The main objective of this thesis is to provide important new capabilities to accelerate scientific knowledge discovery from large, complex, and multivariate scientific data. The research covered in this thesis addresses these scientific challenges using a combination of scientific visualization, information visualization, automated data analysis, and other enabling technologies, such as efficient data management. The effectiveness of the proposed analysis methods is demonstrated via applications in two distinct scientific research fields, namely developmental biology and high-energy physics. Advances in microscopy, image analysis, and embryo registration enable for the first time measurement of gene expression at cellular resolution for entire organisms. Analysis of high-dimensional spatial gene expression datasets is a challenging task. By integrating data clustering and visualization, analysis of complex, time-varying, spatial gene expression patterns and their formation becomes possible. The analysis framework has been integrated with MATLAB and the visualization, making advanced analysis tools accessible to biologists and enabling bioinformatics researchers to directly integrate their analysis with the visualization. Laser wakefield particle accelerators (LWFAs) promise to be a new compact source of high-energy particles and radiation, with wide applications ranging from medicine to physics. To gain insight into the complex physical processes of particle acceleration, physicists model LWFAs computationally. The datasets produced by LWFA simulations are (i) extremely large, (ii) of varying spatial and temporal resolution, (iii) heterogeneous, and (iv) high-dimensional, making analysis and knowledge discovery from complex LWFA simulation data a challenging task. To address these challenges, this thesis describes the integration of the visualization system VisIt and the state-of-the-art index/query system FastBit, enabling interactive visual exploration of extremely large three-dimensional particle datasets. Researchers are especially interested in beams of high-energy particles formed during the course of a simulation. This thesis describes novel methods for automatic detection and analysis of particle beams enabling a more accurate and efficient data analysis process. By integrating these automated analysis methods with visualization, this research enables more accurate, efficient, and effective analysis of LWFA simulation data than previously possible.
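
    The index/query workflow described reduces, at its core, to compound range selections over per-particle arrays (which FastBit accelerates with bitmap indexes). The sketch below expresses such a query as a plain boolean mask over small synthetic arrays; the particle model, thresholds, and window are toy assumptions standing in for real LWFA output:

```python
import numpy as np

# Synthetic LWFA-like particle data: longitudinal momentum and position
# for 100,000 particles (a toy stand-in for terabyte-scale output).
rng = np.random.default_rng(3)
n = 100_000
px = rng.exponential(scale=1.0, size=n)      # longitudinal momentum
x = rng.uniform(0.0, 1.0, size=n)            # position along the plasma

# A FastBit-style compound range query, as a boolean mask:
# "particles with px > 5 inside the window 0.4 < x < 0.6".
beam = (px > 5.0) & (x > 0.4) & (x < 0.6)
print(beam.sum(), "of", n, "particles selected")
```

    Bitmap indexing makes repeated queries like this sublinear in practice, which is what enables the interactive beam exploration the thesis describes.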

  18. Dataset Lifecycle Policy

    NASA Technical Reports Server (NTRS)

    Armstrong, Edward; Tauer, Eric

    2013-01-01

    The presentation focused on describing a new dataset lifecycle policy that the NASA Physical Oceanography DAAC (PO.DAAC) has implemented for its new and current datasets to foster improved stewardship and consistency across its archive. The overarching goal is to implement this dataset lifecycle policy for all new GHRSST GDS2 datasets and bridge the mission statements from the GHRSST Project Office and PO.DAAC to provide the best quality SST data in a cost-effective, efficient manner, preserving its integrity so that it will be available and usable to a wide audience.

  19. A Robust Dynamic Heart-Rate Detection Algorithm Framework During Intense Physical Activities Using Photoplethysmographic Signals

    PubMed Central

    Song, Jiajia; Li, Dan; Ma, Xiaoyuan; Teng, Guowei; Wei, Jianming

    2017-01-01

    Dynamic accurate heart-rate (HR) estimation using a photoplethysmogram (PPG) during intense physical activities is always challenging due to corruption by motion artifacts (MAs). It is difficult to reconstruct a clean signal and extract HR from contaminated PPG. This paper proposes a robust HR-estimation algorithm framework that uses one-channel PPG and tri-axis acceleration data to reconstruct the PPG and calculate the HR based on features of the PPG and spectral analysis. Firstly, the signal is judged by the presence of MAs. Then, the spectral peaks corresponding to acceleration data are filtered from the periodogram of the PPG when MAs exist. Different signal-processing methods are applied based on the amount of remaining PPG spectral peaks. The main MA-removal algorithm (NFEEMD) includes the repeated single-notch filter and ensemble empirical mode decomposition. Finally, HR calibration is designed to ensure the accuracy of HR tracking. The NFEEMD algorithm was performed on the 23 datasets from the 2015 IEEE Signal Processing Cup Database. The average estimation errors were 1.12 BPM (12 training datasets), 2.63 BPM (10 testing datasets) and 1.87 BPM (all 23 datasets), respectively. The Pearson correlation was 0.992. The experiment results illustrate that the proposed algorithm is not only suitable for HR estimation during continuous activities, like slow running (13 training datasets), but also for intense physical activities with acceleration, like arm exercise (10 testing datasets). PMID:29068403
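
    The spectral-analysis step underlying such algorithms can be shown in isolation: take the periodogram of a PPG window and pick the dominant peak in a physiological band. The sketch below omits the motion-artifact removal that is the paper's actual contribution (NFEEMD); the sampling rate, window length, and simulated 120 BPM pulse are illustrative assumptions:

```python
import numpy as np

def hr_from_ppg(ppg, fs, lo=0.7, hi=3.5):
    """Estimate heart rate (BPM) as the dominant spectral peak of a PPG
    window in a physiological band (0.7-3.5 Hz, i.e. 42-210 BPM).
    A simplified spectral step, with no motion-artifact handling."""
    freqs = np.fft.rfftfreq(len(ppg), d=1.0 / fs)
    power = np.abs(np.fft.rfft(ppg - ppg.mean())) ** 2
    band = (freqs >= lo) & (freqs <= hi)
    return 60.0 * freqs[band][np.argmax(power[band])]

fs = 125.0                                   # a typical PPG sampling rate
t = np.arange(0, 8, 1 / fs)                  # 8-second analysis window
rng = np.random.default_rng(4)
ppg = np.sin(2 * np.pi * 2.0 * t) + 0.5 * rng.normal(size=t.size)  # 2 Hz pulse
print(hr_from_ppg(ppg, fs))                  # close to 120 BPM
```

    During intense activity, acceleration-correlated peaks can dominate this spectrum, which is why the paper first notch-filters the peaks found in the accelerometer periodogram before picking the HR peak.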

  20. QuIN: A Web Server for Querying and Visualizing Chromatin Interaction Networks

    PubMed Central

    Thibodeau, Asa; Márquez, Eladio J.; Luo, Oscar; Ruan, Yijun; Shin, Dong-Guk; Stitzel, Michael L.; Ucar, Duygu

    2016-01-01

    Recent studies of the human genome have indicated that regulatory elements (e.g. promoters and enhancers) at distal genomic locations can interact with each other via chromatin folding and affect gene expression levels. Genomic technologies for mapping interactions between DNA regions, e.g., ChIA-PET and Hi-C, can generate genome-wide maps of interactions between regulatory elements. These interaction datasets are important resources to infer distal gene targets of non-coding regulatory elements and to facilitate prioritization of critical loci for important cellular functions. With the increasing diversity and complexity of genomic information and public ontologies, making sense of these datasets demands integrative and easy-to-use software tools. Moreover, network representation of chromatin interaction maps enables effective data visualization, integration, and mining. Currently, there is no software that can take full advantage of network theory approaches for the analysis of chromatin interaction datasets. To fill this gap, we developed a web-based application, QuIN, which enables: 1) building and visualizing chromatin interaction networks, 2) annotating networks with user-provided private and publicly available functional genomics and interaction datasets, 3) querying network components based on gene name or chromosome location, and 4) utilizing network-based measures to identify and prioritize critical regulatory targets and their direct and indirect interactions. AVAILABILITY: QuIN’s web server is available at http://quin.jax.org. QuIN is developed in Java and JavaScript, utilizing an Apache Tomcat web server and MySQL database, and the source code is available under the GPLV3 license on GitHub: https://github.com/UcarLab/QuIN/. PMID:27336171

  1. Ranking Causal Anomalies via Temporal and Dynamical Analysis on Vanishing Correlations.

    PubMed

    Cheng, Wei; Zhang, Kai; Chen, Haifeng; Jiang, Guofei; Chen, Zhengzhang; Wang, Wei

    2016-08-01

    The modern world has witnessed a dramatic increase in our ability to collect, transmit and distribute real-time monitoring and surveillance data from large-scale information systems and cyber-physical systems. Detecting system anomalies thus attracts a significant amount of interest in many fields such as security, fault management, and industrial optimization. Recently, the invariant network has been shown to be a powerful way of characterizing complex system behaviours. In the invariant network, a node represents a system component and an edge indicates a stable, significant interaction between two components. Structures and evolutions of the invariant network, in particular the vanishing correlations, can shed important light on locating causal anomalies and performing diagnosis. However, existing approaches to detecting causal anomalies with the invariant network often use the percentage of vanishing correlations to rank possible causal components, which has several limitations: 1) fault propagation in the network is ignored; 2) the root causal anomalies may not always be the nodes with a high percentage of vanishing correlations; 3) temporal patterns of vanishing correlations are not exploited for robust detection. To address these limitations, in this paper we propose a network-diffusion-based framework to identify significant causal anomalies and rank them. Our approach can effectively model fault propagation over the entire invariant network, and can perform joint inference on both the structural and the time-evolving broken invariance patterns. As a result, it can locate high-confidence anomalies that are truly responsible for the vanishing correlations, and can compensate for unstructured measurement noise in the system. Extensive experiments on synthetic datasets, bank information system datasets, and coal plant cyber-physical system datasets demonstrate the effectiveness of our approach.
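
    The diffusion idea can be sketched generically: propagate each node's broken-correlation fraction through the invariant network with a random-walk-with-restart iteration, so a root cause whose whole neighborhood is broken outranks nodes that merely sit next to the fault. This is a generic sketch of limitation 2) above, not the paper's exact model, and the toy network and broken fractions are assumptions:

```python
import numpy as np

def rank_causal_anomalies(adj, broken, alpha=0.6, iters=200):
    """Propagate per-node broken-correlation fractions through the
    invariant network via random-walk-with-restart style diffusion.
    Converges because alpha < 1 and W is row-stochastic."""
    W = adj / np.maximum(adj.sum(axis=1, keepdims=True), 1e-12)
    score = broken.astype(float).copy()
    for _ in range(iters):
        score = alpha * W.T @ score + (1 - alpha) * broken
    return score

# Toy invariant network: hub 0 linked to 1, 2, 3; node 4 hangs off node 3.
adj = np.zeros((5, 5))
for i, j in [(0, 1), (0, 2), (0, 3), (3, 4)]:
    adj[i, j] = adj[j, i] = 1.0
# A fault at node 0 broke the 0-1 and 0-2 invariants, so leaf nodes 1 and 2
# show a HIGHER raw broken fraction (1.0) than the root cause (2/3).
broken = np.array([2 / 3, 1.0, 1.0, 0.0, 0.0])
score = rank_causal_anomalies(adj, broken)
print(int(np.argmax(score)))   # 0: diffusion recovers the root cause
```

    A percentage-only ranking would put nodes 1 and 2 first; the diffusion pass concentrates evidence on the hub that explains all the vanished correlations at once.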

  2. Analyzing How We Do Analysis and Consume Data, Results from the SciDAC-Data Project

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ding, P.; Aliaga, L.; Mubarak, M.

    One of the main goals of the Dept. of Energy funded SciDAC-Data project is to analyze the more than 410,000 high energy physics datasets that have been collected, generated and defined over the past two decades by experiments using the Fermilab storage facilities. These datasets have been used as the input to over 5.6 million recorded analysis projects, for which detailed analytics have been gathered. The analytics and meta information for these datasets and analysis projects are being combined with knowledge of their part of the HEP analysis chains for major experiments to understand how modern computing and data delivery is being used. We present the first results of this project, which examine in detail how the CDF, D0, NOvA, MINERvA and MicroBooNE experiments have organized, classified and consumed petascale datasets to produce their physics results. The results include analysis of the correlations in dataset/file overlap, data usage patterns, data popularity, dataset dependency and temporary dataset consumption. The results provide critical insight into how workflows and data delivery schemes can be combined with different caching strategies to more efficiently perform the work required to mine these large HEP data volumes and to understand the physics analysis requirements for the next generation of HEP computing facilities. In particular we present a detailed analysis of the NOvA data organization and consumption model corresponding to their first and second oscillation results (2014-2016) and the first look at the analysis of the Tevatron Run II experiments. We present statistical distributions for the characterization of these data and data driven models describing their consumption.

  3. Analyzing how we do Analysis and Consume Data, Results from the SciDAC-Data Project

    NASA Astrophysics Data System (ADS)

    Ding, P.; Aliaga, L.; Mubarak, M.; Tsaris, A.; Norman, A.; Lyon, A.; Ross, R.

    2017-10-01

    One of the main goals of the Dept. of Energy funded SciDAC-Data project is to analyze the more than 410,000 high energy physics datasets that have been collected, generated and defined over the past two decades by experiments using the Fermilab storage facilities. These datasets have been used as the input to over 5.6 million recorded analysis projects, for which detailed analytics have been gathered. The analytics and meta information for these datasets and analysis projects are being combined with knowledge of their part of the HEP analysis chains for major experiments to understand how modern computing and data delivery is being used. We present the first results of this project, which examine in detail how the CDF, D0, NOvA, MINERvA and MicroBooNE experiments have organized, classified and consumed petascale datasets to produce their physics results. The results include analysis of the correlations in dataset/file overlap, data usage patterns, data popularity, dataset dependency and temporary dataset consumption. The results provide critical insight into how workflows and data delivery schemes can be combined with different caching strategies to more efficiently perform the work required to mine these large HEP data volumes and to understand the physics analysis requirements for the next generation of HEP computing facilities. In particular we present a detailed analysis of the NOvA data organization and consumption model corresponding to their first and second oscillation results (2014-2016) and the first look at the analysis of the Tevatron Run II experiments. We present statistical distributions for the characterization of these data and data driven models describing their consumption.

  4. User-Appropriate Viewer for High Resolution Interactive Engagement with 3d Digital Cultural Artefacts

    NASA Astrophysics Data System (ADS)

    Gillespie, D.; La Pensée, A.; Cooper, M.

    2013-07-01

    Three dimensional (3D) laser scanning is an important documentation technique for cultural heritage. This technology has been adopted from the engineering and aeronautical industry and is an invaluable tool for the documentation of objects within museum collections (La Pensée, 2008). The datasets created via close range laser scanning are extremely accurate and the created 3D dataset allows for a more detailed analysis in comparison to other documentation technologies such as photography. The dataset can be used for a range of different applications including: documentation; archiving; surface monitoring; replication; gallery interactives; educational sessions; conservation and visualization. However, the novel nature of a 3D dataset is presenting a rather unique challenge with respect to its sharing and dissemination. This is in part due to the need for specialised 3D software and a supported graphics card to display high resolution 3D models. This can be detrimental to one of the main goals of cultural institutions, which is to share knowledge and enable activities such as research, education and entertainment. This has limited the presentation of 3D models of cultural heritage objects to mainly either images or videos. Yet with recent developments in computer graphics, increased internet speed and emerging technologies such as Adobe's Stage 3D (Adobe, 2013) and WebGL (Khronos, 2013), it is now possible to share a dataset directly within a webpage. This allows website visitors to interact with the 3D dataset allowing them to explore every angle of the object, gaining an insight into its shape and nature. This can be very important considering that it is difficult to offer the same level of understanding of the object through the use of traditional mediums such as photographs and videos. Yet this presents a range of problems: this is a very novel experience and very few people have engaged with 3D objects outside of 3D software packages or games. 
This paper presents results of research that aims to provide a methodology for museums and cultural institutions for prototyping a 3D viewer within a webpage, thereby not only allowing institutions to promote their collections via the internet but also providing a tool for users to engage in a meaningful way with cultural heritage datasets. The design process encompasses evaluation as the central part of the design methodology, focusing on how slight changes to navigation, object engagement and aesthetic appearance can influence the user's experience. The prototype used in this paper was created using WebGL with the Three.Js (Three.JS, 2013) library, and datasets were loaded in the OpenCTM (Geelnard, 2010) file format. The overall design is centred on creating an easy-to-learn interface allowing non-skilled users to interact with the datasets, while also providing tools allowing skilled users to discover more about the cultural heritage object. User testing was carried out, allowing users to interact with 3D datasets within the interactive viewer. The results are analysed and the insights learned are discussed in relation to an interface designed to interact with 3D content. The results will lead to the design of interfaces for interacting with 3D objects, which allow both skilled and non-skilled users to engage with 3D cultural heritage objects in a meaningful way.

  5. Scalable Visual Analytics of Massive Textual Datasets

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Krishnan, Manoj Kumar; Bohn, Shawn J.; Cowley, Wendy E.

    2007-04-01

    This paper describes the first scalable implementation of a text processing engine used in visual analytics tools. These tools aid information analysts in interacting with and understanding large textual information content through visual interfaces. By developing a parallel implementation of the text processing engine, we enabled visual analytics tools to exploit cluster architectures and handle massive datasets. The paper describes key elements of our parallelization approach and demonstrates virtually linear scaling when processing multi-gigabyte data sets such as PubMed. This approach enables interactive analysis of large datasets beyond the capabilities of existing state-of-the-art visual analytics tools.

  6. Exploratory visualization of astronomical data on ultra-high-resolution wall displays

    NASA Astrophysics Data System (ADS)

    Pietriga, Emmanuel; del Campo, Fernando; Ibsen, Amanda; Primet, Romain; Appert, Caroline; Chapuis, Olivier; Hempel, Maren; Muñoz, Roberto; Eyheramendy, Susana; Jordan, Andres; Dole, Hervé

    2016-07-01

    Ultra-high-resolution wall displays feature a very high pixel density over a large physical surface, which makes them well-suited to the collaborative, exploratory visualization of large datasets. We introduce FITS-OW, an application designed for such wall displays, that enables astronomers to navigate in large collections of FITS images, query astronomical databases, and display detailed, complementary data and documents about multiple sources simultaneously. We describe how astronomers interact with their data using both the wall's touch-sensitive surface and handheld devices. We also report on the technical challenges we addressed in terms of distributed graphics rendering and data sharing over the computer clusters that drive wall displays.

  7. An Ensemble Multilabel Classification for Disease Risk Prediction

    PubMed Central

    Liu, Wei; Zhao, Hongling; Zhang, Chaoyang

    2017-01-01

    It is important to identify and prevent disease risk as early as possible through regular physical examinations. We formulate disease risk prediction as a multilabel classification problem. A novel Ensemble Label Power-set Pruned datasets Joint Decomposition (ELPPJD) method is proposed in this work. First, we transform the multilabel classification into a multiclass classification. Then, we propose the pruned datasets and joint decomposition methods to deal with the imbalanced learning problem. Two strategies, size balanced (SB) and label similarity (LS), are designed to decompose the training dataset. In the experiments, the dataset is drawn from real physical examination records. We contrast the performance of the ELPPJD method with the two different decomposition strategies. Moreover, a comparison between ELPPJD and the classic multilabel classification methods RAkEL and HOMER is carried out. The experimental results show that the ELPPJD method with the label similarity strategy has outstanding performance. PMID:29065647
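The label power-set transformation mentioned in this abstract, which turns a multilabel problem into a multiclass one, can be sketched in a few lines. The pruning and joint-decomposition steps of ELPPJD are not reproduced here, and the disease labels are invented for illustration:

```python
# Label power-set transformation: every distinct combination of labels
# becomes one class of an ordinary multiclass problem.

def label_powerset(y_multilabel):
    """Map each label set to an integer class id.
    y_multilabel: iterable of sets of labels, one set per record."""
    classes = {}          # frozenset of labels -> class id
    y_multiclass = []
    for labels in y_multilabel:
        key = frozenset(labels)
        if key not in classes:
            classes[key] = len(classes)
        y_multiclass.append(classes[key])
    return y_multiclass, classes

# Toy records: the first and third share the same label combination.
y = [{"hypertension"}, {"hypertension", "diabetes"}, {"hypertension"}]
encoded, mapping = label_powerset(y)
print(encoded)  # -> [0, 1, 0]
```

The downside motivating the pruning step is visible here: rare label combinations each spawn their own (tiny, imbalanced) class.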

  8. Asteroid Family Physical Properties

    NASA Astrophysics Data System (ADS)

    Masiero, J. R.; DeMeo, F. E.; Kasuga, T.; Parker, A. H.

    An asteroid family is typically formed when a larger parent body undergoes a catastrophic collisional disruption, and as such, family members are expected to show physical properties that closely trace the composition and mineralogical evolution of the parent. Recently a number of new datasets have been released that probe the physical properties of a large number of asteroids, many of which are members of identified families. We review these datasets and the composite properties of asteroid families derived from this plethora of new data. We also discuss the limitations of the current data, as well as the open questions in the field.

  9. Interactions Between Channel Topography and Hydrokinetic Turbines: Sediment Transport, Turbine Performance, and Wake Characteristics

    NASA Astrophysics Data System (ADS)

    Hill, Craig Steven

    Accelerating marine hydrokinetic (MHK) renewable energy development towards commercial viability requires investigating interactions between the engineered environment and its surrounding physical and biological environments. Complex and energetic hydrodynamic and morphodynamic environments desired for such energy conversion installations present difficulties for designing efficient yet robust sustainable devices, while permitting agency uncertainties regarding MHK device environmental interactions result in lengthy and costly processes prior to installing and demonstrating emerging technologies. A research program at St. Anthony Falls Laboratory (SAFL), University of Minnesota, utilized multi-scale physical experiments to study the interactions between axial-flow hydrokinetic turbines, turbulent open channel flow, sediment transport, turbulent turbine wakes, and complex hydro-morphodynamic processes in channels. Model axial-flow current-driven three-bladed turbines (rotor diameters dT = 0.15m and 0.5m) were installed in open channel flumes with both erodible and non-erodible substrates. Device-induced local scour was monitored over several hydraulic conditions and material sizes. Synchronous velocity, bed elevation and turbine performance measurements provide an indication of the effect channel topography has on device performance. Complementary experiments were performed in a realistic meandering outdoor research channel with active sediment transport to investigate device interactions with bedform migration and secondary turbulent flow patterns in asymmetric channel environments. 
The suite of experiments undertaken during this research program at SAFL in multiple channels with stationary and mobile substrates under a variety of turbine configurations provides an in-depth investigation into how axial-flow hydrokinetic devices respond to turbulent channel flow and topographic complexity, and how they impact local and far-field sediment transport characteristics. Results provide the foundation for investigating advanced turbine control strategies for optimal power production in non-stationary environments, while also providing a robust data-set for computational model validation for further investigating the interactions between energy conversion devices and the physical environment.

  10. Is missing geographic positioning system data in accelerometry studies a problem, and is imputation the solution?

    PubMed Central

    Meseck, Kristin; Jankowska, Marta M.; Schipperijn, Jasper; Natarajan, Loki; Godbole, Suneeta; Carlson, Jordan; Takemoto, Michelle; Crist, Katie; Kerr, Jacqueline

    2016-01-01

    The main purpose of the present study was to assess the impact of global positioning system (GPS) signal lapse on physical activity analyses, to discover any existing associations between missing GPS data and environmental and demographic attributes, and to determine whether imputation is an accurate and viable method for correcting GPS data loss. Accelerometer and GPS data of 782 participants from 8 studies were pooled to represent a range of lifestyles and interactions with the built environment. Periods of GPS signal lapse were identified and extracted. Generalised linear mixed models were run with the number of lapses and the length of lapses as outcomes. The signal lapses were imputed using a simple ruleset, and imputation was validated against person-worn camera imagery. A final generalised linear mixed model was used to identify the difference between the amount of GPS minutes pre- and post-imputation for the activity categories of sedentary, light, and moderate-to-vigorous physical activity. Over 17% of the dataset consisted of GPS data lapses. No strong associations were found between increasing lapse length and number of lapses and the demographic and built environment variables. A significant difference was found between the pre- and post-imputation minutes for each activity category. No demographic or environmental bias was found for length or number of lapses, but imputation of GPS data may make a significant difference for inclusion of physical activity data that occurred during a lapse. Imputing GPS data lapses is a viable technique for returning spatial context to accelerometer data and improving the completeness of the dataset. PMID:27245796
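The abstract mentions imputing signal lapses "using a simple ruleset" without giving the rules. One plausible rule of this kind, shown purely for illustration (the paper's actual ruleset may differ), is to carry the last valid GPS fix forward across short lapses:

```python
# Hypothetical lapse-imputation rule: last observation carried forward,
# applied only to lapses up to max_gap epochs long.

def impute_gps(points, max_gap=10):
    """points: list with one (lat, lon) tuple or None per accelerometer epoch.
    Returns a copy with short lapses filled from the last valid fix."""
    out = list(points)
    last, gap = None, 0
    for i, p in enumerate(out):
        if p is not None:
            last, gap = p, 0
        else:
            gap += 1
            if last is not None and gap <= max_gap:
                out[i] = last  # reuse the most recent known location
    return out

track = [(32.7, -117.1), None, None, (32.8, -117.2)]
print(impute_gps(track))
```

A leading lapse (no prior fix) is left as None, which mirrors the general problem: imputation can restore spatial context only where nearby valid fixes exist.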

  11. Kinetic physics in ICF: present understanding and future directions

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rinderknecht, Hans G.; Amendt, P. A.; Wilks, S. C.

    Kinetic physics has the potential to impact the performance of indirect-drive inertial confinement fusion (ICF) experiments. Systematic anomalies in the National Ignition Facility implosion dataset have been identified in which kinetic physics may play a role, including inferred missing energy in the hohlraum, drive asymmetry in near-vacuum hohlraums, low areal density and high burn-averaged ion temperatures (〈Ti〉) compared with mainline simulations, and low ratios of the DD-neutron and DT-neutron yields and inferred 〈Ti〉. Several components of ICF implosions are likely to be influenced or dominated by kinetic physics: laser-plasma interactions in the LEH and hohlraum interior; the hohlraum wall blowoff, blowoff/gas and blowoff/ablator interfaces; the ablator and ablator/ice interface; and the DT fuel all present conditions in which kinetic physics can significantly affect the dynamics. This review presents the assembled experimental data and simulation results to date, which indicate that the effects of long mean-free-path plasma phenomena and self-generated electromagnetic fields may have a significant impact in ICF targets. Finally, simulation and experimental efforts are proposed to definitively quantify the importance of these effects at ignition-relevant conditions, including priorities for ongoing study.

  12. Kinetic physics in ICF: present understanding and future directions

    DOE PAGES

    Rinderknecht, Hans G.; Amendt, P. A.; Wilks, S. C.; ...

    2018-03-19

    Kinetic physics has the potential to impact the performance of indirect-drive inertial confinement fusion (ICF) experiments. Systematic anomalies in the National Ignition Facility implosion dataset have been identified in which kinetic physics may play a role, including inferred missing energy in the hohlraum, drive asymmetry in near-vacuum hohlraums, low areal density and high burn-averaged ion temperatures (〈Ti〉) compared with mainline simulations, and low ratios of the DD-neutron and DT-neutron yields and inferred 〈Ti〉. Several components of ICF implosions are likely to be influenced or dominated by kinetic physics: laser-plasma interactions in the LEH and hohlraum interior; the hohlraum wall blowoff, blowoff/gas and blowoff/ablator interfaces; the ablator and ablator/ice interface; and the DT fuel all present conditions in which kinetic physics can significantly affect the dynamics. This review presents the assembled experimental data and simulation results to date, which indicate that the effects of long mean-free-path plasma phenomena and self-generated electromagnetic fields may have a significant impact in ICF targets. Finally, simulation and experimental efforts are proposed to definitively quantify the importance of these effects at ignition-relevant conditions, including priorities for ongoing study.

  13. Kinetic physics in ICF: present understanding and future directions

    NASA Astrophysics Data System (ADS)

    Rinderknecht, Hans G.; Amendt, P. A.; Wilks, S. C.; Collins, G.

    2018-06-01

    Kinetic physics has the potential to impact the performance of indirect-drive inertial confinement fusion (ICF) experiments. Systematic anomalies in the National Ignition Facility implosion dataset have been identified in which kinetic physics may play a role, including inferred missing energy in the hohlraum, drive asymmetry in near-vacuum hohlraums, low areal density and high burn-averaged ion temperatures (〈Ti〉) compared with mainline simulations, and low ratios of the DD-neutron and DT-neutron yields and inferred 〈Ti〉. Several components of ICF implosions are likely to be influenced or dominated by kinetic physics: laser-plasma interactions in the LEH and hohlraum interior; the hohlraum wall blowoff, blowoff/gas and blowoff/ablator interfaces; the ablator and ablator/ice interface; and the DT fuel all present conditions in which kinetic physics can significantly affect the dynamics. This review presents the assembled experimental data and simulation results to date, which indicate that the effects of long mean-free-path plasma phenomena and self-generated electromagnetic fields may have a significant impact in ICF targets. Simulation and experimental efforts are proposed to definitively quantify the importance of these effects at ignition-relevant conditions, including priorities for ongoing study.

  14. A global distributed basin morphometric dataset

    NASA Astrophysics Data System (ADS)

    Shen, Xinyi; Anagnostou, Emmanouil N.; Mei, Yiwen; Hong, Yang

    2017-01-01

    Basin morphometry is vital information for relating storms to hydrologic hazards, such as landslides and floods. In this paper we present the first comprehensive global dataset of distributed basin morphometry at 30 arc seconds resolution. The dataset includes nine prime morphometric variables; in addition we present formulas for generating twenty-one additional morphometric variables based on combinations of the prime variables. The dataset can aid different applications, including studies of land-atmosphere interaction and modelling of floods and droughts for sustainable water management. The validity of the dataset has been confirmed by successfully reproducing Hack's law.
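Hack's law, used above as a validity check, relates mainstream length L to drainage area A as a power law, L = c * A^h with h typically near 0.6. A minimal check of this kind is a least-squares fit in log-log space; the numbers below are synthetic, not taken from the dataset:

```python
# Fit Hack's law L = c * A^h by ordinary least squares on log-transformed data.
import math

def fit_hack(areas, lengths):
    """areas: drainage areas; lengths: mainstream lengths. Returns (c, h)."""
    xs = [math.log(a) for a in areas]
    ys = [math.log(l) for l in lengths]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Slope of the log-log regression line is the Hack exponent h.
    h = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    c = math.exp(my - h * mx)
    return c, h

areas = [10, 100, 1000, 10000]               # km^2, synthetic
lengths = [1.8 * a ** 0.6 for a in areas]    # exact Hack scaling with h = 0.6
c, h = fit_hack(areas, lengths)
print(round(c, 3), round(h, 3))  # -> 1.8 0.6
```

For a real morphometric dataset the fitted h would scatter around the expected value, and the closeness of that fit is what "successfully reproducing Hack's law" amounts to.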

  15. The Spectral Image Processing System (SIPS) - Interactive visualization and analysis of imaging spectrometer data

    NASA Technical Reports Server (NTRS)

    Kruse, F. A.; Lefkoff, A. B.; Boardman, J. W.; Heidebrecht, K. B.; Shapiro, A. T.; Barloon, P. J.; Goetz, A. F. H.

    1993-01-01

    The Center for the Study of Earth from Space (CSES) at the University of Colorado, Boulder, has developed a prototype interactive software system called the Spectral Image Processing System (SIPS) using IDL (the Interactive Data Language) on UNIX-based workstations. SIPS is designed to take advantage of the combination of high spectral resolution and spatial data presentation unique to imaging spectrometers. It streamlines analysis of these data by allowing scientists to rapidly interact with entire datasets. SIPS provides visualization tools for rapid exploratory analysis and numerical tools for quantitative modeling. The user interface is X-Windows-based, user friendly, and provides 'point and click' operation. SIPS is being used for multidisciplinary research concentrating on use of physically based analysis methods to enhance scientific results from imaging spectrometer data. The objective of this continuing effort is to develop operational techniques for quantitative analysis of imaging spectrometer data and to make them available to the scientific community prior to the launch of imaging spectrometer satellite systems such as the Earth Observing System (EOS) High Resolution Imaging Spectrometer (HIRIS).

  16. Introducing video recording in primary care midwifery for research purposes: procedure, dataset, and use.

    PubMed

    Spelten, Evelien R; Martin, Linda; Gitsels, Janneke T; Pereboom, Monique T R; Hutton, Eileen K; van Dulmen, Sandra

    2015-01-01

    Video recording studies have been found to be complex; however, very few studies describe the actual introduction and enrolment of the study, the resulting dataset, and its interpretation. In this paper we describe the introduction and the use of video recordings of health care provider (HCP)-client interactions in primary care midwifery for research purposes. We also report on the process of data management, data coding, and the resulting dataset. We describe our experience in undertaking a study using video recording to assess the interaction of the midwife and her client in the first antenatal consultation, in a real-life clinical practice setting in the Netherlands. Midwives from six practices across the Netherlands were recruited to videotape 15-20 intakes. The introduction, complexity, and intrusiveness of the study were discussed within the research group. The numbers of valid and missing recordings were measured; reasons not to participate, non-response analyses, and the inter-rater reliability of the coded videotapes were assessed. Video recordings were supplemented by questionnaires for midwives and clients. The Roter Interaction Analysis System (RIAS) was used for coding, as well as an obstetric topics scale. At the introduction of the study, more initial hesitation in co-operation was found among the midwives than among their clients. The intrusive nature of the recording on the interaction was perceived to be minimal. The complex nature of the study affected recruitment and data collection. Combining the dataset with the questionnaires and medical records proved to be a challenge. The final dataset included videotapes of 20 midwives (7-23 recordings per midwife). Of the 460 eligible clients, 324 gave informed consent. The study resulted in a significant dataset of first antenatal consultations, involving recordings of 269 clients and 194 partners. 
Video recording of midwife-client interaction was both feasible and challenging, and resulted in a unique dataset of recordings of midwife-client interaction. Video recording studies will benefit from a tight design and from vigilant monitoring to ensure effective data collection. We provide suggestions to promote the successful introduction of video recording for research purposes. Copyright © 2014 Elsevier Ltd. All rights reserved.

  17. Distributing and storing data efficiently by means of special datasets in the ATLAS collaboration

    NASA Astrophysics Data System (ADS)

    Köneke, Karsten; ATLAS Collaboration

    2011-12-01

    With the start of the LHC physics program, the ATLAS experiment started to record vast amounts of data. This data has to be distributed and stored on the world-wide computing grid in a smart way in order to enable an effective and efficient analysis by physicists. This article describes how the ATLAS collaboration chose to create specialized reduced datasets in order to efficiently use computing resources and facilitate physics analyses.

  18. Best Practices for International Collaboration and Applications of Interoperability within a NASA Data Center

    NASA Astrophysics Data System (ADS)

    Moroni, D. F.; Armstrong, E. M.; Tauer, E.; Hausman, J.; Huang, T.; Thompson, C. K.; Chung, N.

    2013-12-01

    The Physical Oceanographic Distributed Active Archive Center (PO.DAAC) is one of 12 data centers sponsored by NASA's Earth Science Data and Information System (ESDIS) project. The PO.DAAC is tasked with archival and distribution of NASA Earth science missions specific to physical oceanography, many of which have interdisciplinary applications for weather forecasting/monitoring, ocean biology, ocean modeling, and climate studies. PO.DAAC has a 20-year history of cross-project and international collaborations with partners in Europe, Japan, Australia, and the UK. Domestically, the PO.DAAC has successfully established lasting partners with non-NASA institutions and projects including the National Oceanic and Atmospheric Administration (NOAA), United States Navy, Remote Sensing Systems, and Unidata. A key component of these partnerships is PO.DAAC's direct involvement with international working groups and science teams, such as the Group for High Resolution Sea Surface Temperature (GHRSST), International Ocean Vector Winds Science Team (IOVWST), Ocean Surface Topography Science Team (OSTST), and the Committee on Earth Observing Satellites (CEOS). To help bolster new and existing collaborations, the PO.DAAC has established a standardized approach to its internal Data Management and Archiving System (DMAS), utilizing a Data Dictionary to provide the baseline standard for entry and capture of dataset and granule metadata. Furthermore, the PO.DAAC has established an end-to-end Dataset Lifecycle Policy, built upon both internal and external recommendations of best practices toward data stewardship. Together, DMAS, the Data Dictionary, and the Dataset Lifecycle Policy provide the infrastructure to enable standardized data and metadata to be fully ingested and harvested to facilitate interoperability and compatibility across data access protocols, tools, and services. 
The Dataset Lifecycle Policy provides the checks and balances to help ensure all incoming HDF and netCDF-based datasets meet minimum compliance requirements with the Lawrence Livermore National Laboratory's actively maintained Climate and Forecast (CF) conventions with additional goals toward metadata standards provided by the Attribute Convention for Dataset Discovery (ACDD), the International Organization for Standardization (ISO) 19100-series, and the Federal Geographic Data Committee (FGDC). By default, DMAS ensures all datasets are compliant with NASA's Global Change Master Directory (GCMD) and NASA's Reverb data discovery clearinghouse (also known as ECHO). For data access, PO.DAAC offers several widely-used technologies, including File Transfer Protocol (FTP), Open-source Project for a Network Data Access Protocol (OPeNDAP), and Thematic Realtime Environmental Distributed Data Services (THREDDS). These access technologies are available directly to users or through PO.DAAC's web interfaces, specifically the High-level Tool for Interactive Data Extraction (HiTIDE), Live Access Server (LAS), and PO.DAAC's set of search, image, and Consolidated Web Services (CWS). Lastly, PO.DAAC's newly introduced, standards-based CWS provide singular endpoints for search, imaging, and extraction capabilities, respectively, across L2/L3/L4 datasets. Altogether, these tools, services and policies serve to provide flexible, interoperable functionality for both users and data providers.

  19. Natural Hazards characterisation in industrial practice

    NASA Astrophysics Data System (ADS)

    Bernardara, Pietro

    2017-04-01

    The definition of rare hydroclimatic extremes (down to a 10^-4 annual probability of occurrence) is of the utmost importance for the design of high-value industrial infrastructure, such as grids, power plants, and offshore platforms. Underestimating as well as overestimating the risk may lead to huge costs (e.g. expensive mid-life works or overdesign), which may even prevent the project from happening. Nevertheless, the uncertainties associated with extrapolation towards rare frequencies are huge and manifold. They are mainly due to the scarcity of observations, the lack of quality of extreme value records, and the arbitrary choice of the models used for extrapolation. This often puts design engineers in uncomfortable situations when they must choose the design values to use. Providentially, recent progress in earth observation techniques, information technology, historical data collection, and weather and ocean modelling is making huge datasets available. Careful use of big datasets of observations and modelled data is leading towards a better understanding of the physics of the underlying phenomena and the complex interactions between them, and thus of extreme event frequency extrapolations. This will move engineering practice from single-site, small-sample application of statistical analysis to a more spatially coherent, physically driven extrapolation of extreme values. A few examples from EDF industrial practice are given to illustrate these advances and their potential impact on design approaches.
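The extrapolation problem described here can be made concrete with a toy return-level calculation. A Gumbel distribution fitted by the method of moments is a common textbook choice (not necessarily EDF's actual procedure), and the annual maxima below are synthetic:

```python
# Extrapolate a 1e-4 annual exceedance probability level from a short record
# of annual maxima, using a Gumbel fit by the method of moments.
import math

EULER_GAMMA = 0.5772  # Euler-Mascheroni constant, used in the moment estimator

def gumbel_return_level(annual_maxima, p_exceed):
    n = len(annual_maxima)
    mean = sum(annual_maxima) / n
    var = sum((x - mean) ** 2 for x in annual_maxima) / (n - 1)
    beta = math.sqrt(6 * var) / math.pi     # scale parameter
    mu = mean - EULER_GAMMA * beta          # location parameter
    # Invert the Gumbel CDF F(x) = exp(-exp(-(x - mu)/beta)) at F = 1 - p_exceed.
    return mu - beta * math.log(-math.log(1 - p_exceed))

maxima = [4.1, 3.7, 5.0, 4.4, 3.9, 4.8, 4.2, 5.3, 4.0, 4.6]  # synthetic, e.g. m of surge
print(round(gumbel_return_level(maxima, 1e-4), 2))
```

The projected level lies well beyond the largest observation in the 10-year record, which is exactly why the abstract stresses that such extrapolations are highly model- and sample-dependent.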

  20. Exploring pathway interactions in insulin resistant mouse liver

    PubMed Central

    2011-01-01

    Background Complex phenotypes such as insulin resistance involve different biological pathways that may interact and influence each other. Interpretation of related experimental data would be facilitated by identifying relevant pathway interactions in the context of the dataset. Results We developed an analysis approach to study interactions between pathways by integrating gene and protein interaction networks, biological pathway information and high-throughput data. This approach was applied to a transcriptomics dataset to investigate pathway interactions in insulin resistant mouse liver in response to a glucose challenge. We identified regulated pathway interactions at different time points following the glucose challenge and also studied the underlying protein interactions to find possible mechanisms and key proteins involved in pathway cross-talk. A large number of pathway interactions were found for the comparison between the two diet groups at t = 0. The initial response to the glucose challenge (t = 0.6) was typified by an acute stress response, and pathway interactions showed large overlap between the two diet groups, while the pathway interaction networks for the late response were more dissimilar. Conclusions Studying pathway interactions provides a new perspective on the data that complements established pathway analysis methods such as enrichment analysis. This study provided new insights into how interactions between pathways may be affected by insulin resistance. In addition, the analysis approach described here can be generally applied to different types of high-throughput data and will therefore be useful for analysis of other complex datasets as well. PMID:21843341

  1. Accelerometry-based classification of human activities using Markov modeling.

    PubMed

    Mannini, Andrea; Sabatini, Angelo Maria

    2011-01-01

    Accelerometers are a popular choice as body-motion sensors: the reason lies partly in their capability of extracting information that is useful for automatically inferring the physical activity in which the human subject is involved, besides their role in feeding estimators of biomechanical parameters. Automatic classification of human physical activities is highly attractive for pervasive computing systems, where contextual awareness may ease human-machine interaction, and in biomedicine, where wearable sensor systems are proposed for long-term monitoring. This paper is concerned with the machine learning algorithms needed to perform the classification task. Hidden Markov Model (HMM) classifiers are studied by contrasting them with Gaussian Mixture Model (GMM) classifiers. HMMs incorporate the statistical information available on movement dynamics into the classification process, without discarding the time history of previous outcomes as GMMs do. An example of the benefits of the obtained statistical leverage is illustrated and discussed by analyzing two datasets of accelerometer time series.
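The HMM advantage the abstract describes, using the time history of previous outcomes rather than classifying each frame independently, can be shown with a tiny Viterbi decoder. All numbers and state names below are invented for illustration and are not taken from the paper:

```python
# Toy illustration: sticky transition probabilities let an HMM smooth over a
# single noisy frame, where a frame-by-frame (GMM-style) decision would flip state.

def viterbi(obs_lik, trans, prior):
    """obs_lik: per-frame likelihoods of each state; trans: state transition matrix;
    prior: initial state probabilities. Returns the most likely state path."""
    n_states = len(prior)
    # Each entry: [probability of best path ending in state s, that path]
    paths = [[prior[s] * obs_lik[0][s], [s]] for s in range(n_states)]
    for frame in obs_lik[1:]:
        new = []
        for s in range(n_states):
            best = max(range(n_states), key=lambda r: paths[r][0] * trans[r][s])
            new.append([paths[best][0] * trans[best][s] * frame[s],
                        paths[best][1] + [s]])
        paths = new
    return max(paths, key=lambda p: p[0])[1]

# States: 0 = walking, 1 = sitting. Frame 3 is a noisy outlier favouring "sitting".
trans = [[0.9, 0.1], [0.1, 0.9]]
likelihoods = [[0.8, 0.2], [0.7, 0.3], [0.4, 0.6], [0.8, 0.2]]
print(viterbi(likelihoods, trans, prior=[0.5, 0.5]))  # -> [0, 0, 0, 0]
```

A per-frame argmax would label the third frame as state 1; the HMM's time history overrides that single noisy observation, which is the "statistical leverage" the abstract refers to.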

  2. 3D Imaging of Microbial Biofilms: Integration of Synchrotron Imaging and an Interactive Visualization Interface

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Thomas, Mathew; Marshall, Matthew J.; Miller, Erin A.

    2014-08-26

    Understanding the interactions of structured communities known as “biofilms” and other complex matrices is possible through X-ray micro-tomography imaging of the biofilms. Feature detection and image processing for this type of data focus on efficiently identifying and segmenting biofilms and bacteria in the datasets. The datasets are very large and often require manual intervention due to low contrast between objects and high noise levels. New software is therefore required for effective interpretation and analysis of the data. This work describes the development and application of tools to analyze and visualize high-resolution X-ray micro-tomography datasets.

  3. Data on the interaction between thermal comfort and building control research.

    PubMed

    Park, June Young; Nagy, Zoltan

    2018-04-01

    This dataset contains bibliographic information on thermal comfort and building control research. In addition, instructions for a data-driven literature survey method guide readers in reproducing their own literature surveys on related bibliographic datasets. Based on specific search terms, all relevant bibliographic datasets are downloaded. We examine keyword co-occurrences to trace historical developments and recent trends, and a citation network that represents the interaction between thermal comfort and building control research. Results and discussion are described in the research article entitled "Comprehensive analysis of the relationship between thermal comfort and building control research - A data-driven literature review" (Park and Nagy, 2018).

  4. Genome-wide gene–gene interaction analysis for next-generation sequencing

    PubMed Central

    Zhao, Jinying; Zhu, Yun; Xiong, Momiao

    2016-01-01

    The critical barrier in interaction analysis for next-generation sequencing (NGS) data is that the traditional pairwise interaction analysis that is suitable for common variants is difficult to apply to rare variants because of its prohibitive computational time, large number of tests and low power. The great challenges for successful detection of interactions with NGS data are (1) the demand for a paradigm change in interaction analysis; (2) severe multiple testing; and (3) heavy computations. To meet these challenges, we shift the paradigm of interaction analysis between two SNPs to interaction analysis between two genomic regions. In other words, we take a gene as a unit of analysis and use functional data analysis techniques as dimensional reduction tools to develop a novel statistic to collectively test interaction between all possible pairs of SNPs within two genomic regions. By intensive simulations, we demonstrate that the functional logistic regression for interaction analysis has the correct type I error rates and higher power to detect interaction than the currently used methods. The proposed method was applied to a coronary artery disease dataset from the Wellcome Trust Case Control Consortium (WTCCC) study and the Framingham Heart Study (FHS) dataset, and the early-onset myocardial infarction (EOMI) exome sequence datasets of European origin from the NHLBI's Exome Sequencing Project. We discovered that 6 of 27 pairs of significantly interacting genes in the FHS were replicated in the independent WTCCC study, and 24 pairs of genes interacted significantly after applying Bonferroni correction in the EOMI study. PMID:26173972
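The region-level shift described above can be illustrated with a numpy sketch. This is an illustration of the idea only, not the authors' method: the paper uses functional data analysis expansions, whereas the sketch below takes the PCA shortcut of summarizing each region by one score, then tests a region-by-region interaction term with a likelihood-ratio statistic.

```python
import numpy as np

def region_score(genotypes):
    # genotypes: (n_samples, n_snps) matrix of 0/1/2 allele counts for one region.
    X = genotypes - genotypes.mean(axis=0)
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return X @ vt[0]                      # leading principal-component score

def logistic_loglik(X, y, iters=500, lr=0.5):
    # Plain gradient-ascent logistic regression; returns the fitted log-likelihood.
    X = np.column_stack([np.ones(len(y)), X])
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w += lr * X.T @ (y - p) / len(y)
    p = 1.0 / (1.0 + np.exp(-X @ w))
    eps = 1e-12
    return np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def interaction_lrt(g1, g2, y):
    # One test per region pair instead of one per SNP pair.
    s1, s2 = region_score(g1), region_score(g2)
    ll_null = logistic_loglik(np.column_stack([s1, s2]), y)
    ll_full = logistic_loglik(np.column_stack([s1, s2, s1 * s2]), y)
    return 2 * (ll_full - ll_null)        # approx. chi-square(1) under no interaction
```

Collapsing each region to a few scores is what tames both the multiple-testing burden and the computation: two regions of 100 rare variants each would otherwise need 10,000 pairwise tests.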

  5. A texture-based framework for improving CFD data visualization in a virtual environment

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bivins, Gerrick O'Ron

    2005-01-01

    In the field of computational fluid dynamics (CFD) accurate representations of fluid phenomena can be simulated but require large amounts of data to represent the flow domain. Most datasets generated from a CFD simulation can be coarse, ~10,000 nodes or cells, or very fine with node counts on the order of 1,000,000. A typical dataset solution can also contain multiple solutions for each node, pertaining to various properties of the flow at a particular node. Scalar properties such as density, temperature, pressure, and velocity magnitude are properties that are typically calculated and stored in a dataset solution. Solutions are not limited to just scalar properties. Vector quantities, such as velocity, are also often calculated and stored for a CFD simulation. Accessing all of this data efficiently during runtime is a key problem for visualization in an interactive application. Understanding simulation solutions requires a post-processing tool to convert the data into something more meaningful. Ideally, the application would present an interactive visual representation of the numerical data for any dataset that was simulated while maintaining the accuracy of the calculated solution. Most CFD applications currently sacrifice interactivity for accuracy, yielding highly detailed flow descriptions but limiting interaction for investigating the field.

  7. New physics in the visible final states of B → D(*) τν

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ligeti, Zoltan; Papucci, Michele; Robinson, Dean J.

    We derive compact expressions for the helicity amplitudes of the many-body B → D (*) (→ DY)τ(→ Xν)ν decays, specifically for X = ℓν or π and Y = π or γ. We include contributions from all ten possible new physics four-Fermi operators with arbitrary couplings. Our results capture interference effects in the full phase space of the visible τ and D * decay products which are missed in analyses that treat the τ or D * or both as stable. The τ interference effects are sizable, formally of order m τ/m B for the standard model, and may be of order unity in the presence of new physics. Treating interference correctly is essential when considering kinematic distributions of the τ or D * decay products, and when including experimentally unavoidable phase space cuts. Our amplitude-level results also allow for efficient exploration of new physics effects in the fully differential phase space, by enabling experiments to perform such studies on fully simulated Monte Carlo datasets via efficient event reweighting. As an example, we explore a class of new physics interactions that can fit the observed R(D (*) ) ratios, and show that analyses including more differential kinematic information can provide greater discriminating power for new physics than single kinematic variables alone.

  8. New physics in the visible final states of B → D(*) τν

    DOE PAGES

    Ligeti, Zoltan; Papucci, Michele; Robinson, Dean J.

    2017-01-18

    We derive compact expressions for the helicity amplitudes of the many-body B → D (*) (→ DY)τ(→ Xν)ν decays, specifically for X = ℓν or π and Y = π or γ. We include contributions from all ten possible new physics four-Fermi operators with arbitrary couplings. Our results capture interference effects in the full phase space of the visible τ and D * decay products which are missed in analyses that treat the τ or D * or both as stable. The τ interference effects are sizable, formally of order m τ/m B for the standard model, and may be of order unity in the presence of new physics. Treating interference correctly is essential when considering kinematic distributions of the τ or D * decay products, and when including experimentally unavoidable phase space cuts. Our amplitude-level results also allow for efficient exploration of new physics effects in the fully differential phase space, by enabling experiments to perform such studies on fully simulated Monte Carlo datasets via efficient event reweighting. As an example, we explore a class of new physics interactions that can fit the observed R(D (*) ) ratios, and show that analyses including more differential kinematic information can provide greater discriminating power for new physics than single kinematic variables alone.

  9. Shifting Stakes: Understanding the Dynamic Roles of Individuals and Organizations in Social Media Protests.

    PubMed

    Spiro, Emma S; Monroy-Hernández, Andrés

    2016-01-01

    In this paper we examine two protests characterized by substantial social media presence and distributed participation frameworks via two core questions: what roles did organizations and individuals play, and how did participants' social interactions change over the course of the protests? To answer these questions, we analyzed a large Twitter activity dataset for the #YoSoy132 student uprising in Mexico and Brazil's "bus rebellion." Results indicate that individuals initially took prominence at the protests but faded in importance as the movements dwindled and organizations took over. Regarding the dynamics and structure of the interactions, we found that key time points with unique social structures often map to exogenous events such as coordinated protests in physical locations. Our results have important consequences for the visibility of such social movements and their ability to attract continued participation by individuals and organizations.

  10. Shifting Stakes: Understanding the Dynamic Roles of Individuals and Organizations in Social Media Protests

    PubMed Central

    2016-01-01

    In this paper we examine two protests characterized by substantial social media presence and distributed participation frameworks via two core questions: what roles did organizations and individuals play, and how did participants’ social interactions change over the course of the protests? To answer these questions, we analyzed a large Twitter activity dataset for the #YoSoy132 student uprising in Mexico and Brazil’s “bus rebellion.” Results indicate that individuals initially took prominence at the protests but faded in importance as the movements dwindled and organizations took over. Regarding the dynamics and structure of the interactions, we found that key time points with unique social structures often map to exogenous events such as coordinated protests in physical locations. Our results have important consequences for the visibility of such social movements and their ability to attract continued participation by individuals and organizations. PMID:27776191

  11. Intuitive, but not simple: including explicit water molecules in protein-protein docking simulations improves model quality.

    PubMed

    Parikh, Hardik I; Kellogg, Glen E

    2014-06-01

    Characterizing the nature of interaction between proteins that have not been experimentally cocrystallized requires a computational docking approach that can successfully predict the spatial conformation adopted in the complex. In this work, the Hydropathic INTeractions (HINT) force field model was used for scoring docked models in a data set of 30 high-resolution crystallographically characterized "dry" protein-protein complexes and was shown to reliably identify native-like models. However, most current protein-protein docking algorithms fail to explicitly account for water molecules involved in bridging interactions that mediate and stabilize the association of the protein partners, so we used HINT to illuminate the physical and chemical properties of bridging waters and account for their energetic stabilizing contributions. The HINT water Relevance metric identified the "truly" bridging waters at the 30 protein-protein interfaces, and we utilized them in "solvated" docking by manually inserting them into the input files for the rigid body ZDOCK program. By accounting for these interfacial waters, a statistically significant improvement of ∼24% in the average hit-count within the top-10 predictions for the protein-protein dataset was seen, compared to standard "dry" docking. The results also show scoring improvement, with medium and high accuracy models ranking much better than incorrect ones. These improvements can be attributed to the physical presence of water molecules that alter surface properties and better represent native shape and hydropathic complementarity between interacting partners, with concomitantly more accurate native-like structure predictions. © 2013 Wiley Periodicals, Inc.

  12. Climate Model Diagnostic Analyzer Web Service System

    NASA Astrophysics Data System (ADS)

    Lee, S.; Pan, L.; Zhai, C.; Tang, B.; Jiang, J. H.

    2014-12-01

    We have developed a cloud-enabled web-service system that empowers physics-based, multi-variable model performance evaluations and diagnoses through the comprehensive and synergistic use of multiple observational data, reanalysis data, and model outputs. We have developed a methodology to transform an existing science application code into a web service using a Python wrapper interface and Python web service frameworks. The web-service system, called Climate Model Diagnostic Analyzer (CMDA), currently supports (1) all the observational datasets from Obs4MIPs and a few ocean datasets from NOAA and Argo, which can serve as observation-based reference data for model evaluation, (2) many of the CMIP5 model outputs covering a broad range of atmosphere, ocean, and land variables from the CMIP5 specific historical runs and AMIP runs, and (3) ECMWF reanalysis outputs for several environmental variables in order to supplement observational datasets. Analysis capabilities currently supported by CMDA are (1) the calculation of annual and seasonal means of physical variables, (2) the calculation of time evolution of the means in any specified geographical region, (3) the calculation of correlation between two variables, (4) the calculation of difference between two variables, and (5) the conditional sampling of one physical variable with respect to another variable. A web user interface is chosen for CMDA because it not only lowers the learning curve and removes the adoption barrier of the tool but also enables instantaneous use, avoiding the hassle of local software installation and environment incompatibility. CMDA will be used as an educational tool for the summer school organized by JPL's Center for Climate Science in 2014. In order to support 30+ simultaneous users during the school, we have deployed CMDA to the Amazon cloud environment. 
The cloud-enabled CMDA will provide each student with a virtual machine while the user interaction with the system will remain the same through web-browser interfaces. The summer school will serve as a valuable testbed for the tool development, preparing CMDA to serve its target community: Earth-science modeling and model-analysis community.
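The Python-wrapper pattern the abstract describes can be sketched with a minimal WSGI application: an existing science routine exposed through an HTTP query interface. The routine, parameter names, and URL scheme below are invented for illustration; the real CMDA system is built on Python web-service frameworks.

```python
from urllib.parse import parse_qs

def seasonal_mean(values, season_length=3):
    # Stand-in "science application code": mean over one season window.
    return sum(values[:season_length]) / season_length

def app(environ, start_response):
    # Handles e.g. GET /seasonal_mean?values=1,2,3
    qs = parse_qs(environ.get("QUERY_STRING", ""))
    values = [float(v) for v in qs.get("values", ["0"])[0].split(",")]
    result = seasonal_mean(values)
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [str(result).encode("utf-8")]
```

Served locally with `wsgiref.simple_server.make_server("", 8000, app).serve_forever()`, this makes the wrapped routine usable from any browser, which is the adoption-barrier argument the abstract makes for a web interface.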

  13. Detecting treatment-subgroup interactions in clustered data with generalized linear mixed-effects model trees.

    PubMed

    Fokkema, M; Smits, N; Zeileis, A; Hothorn, T; Kelderman, H

    2017-10-25

    Identification of subgroups of patients for whom treatment A is more effective than treatment B, and vice versa, is of key importance to the development of personalized medicine. Tree-based algorithms are helpful tools for the detection of such interactions, but none of the available algorithms allow for taking into account clustered or nested dataset structures, which are particularly common in psychological research. Therefore, we propose the generalized linear mixed-effects model tree (GLMM tree) algorithm, which allows for the detection of treatment-subgroup interactions, while accounting for the clustered structure of a dataset. The algorithm uses model-based recursive partitioning to detect treatment-subgroup interactions, and a GLMM to estimate the random-effects parameters. In a simulation study, GLMM trees show higher accuracy in recovering treatment-subgroup interactions, higher predictive accuracy, and lower type II error rates than linear-model-based recursive partitioning and mixed-effects regression trees. Also, GLMM trees show somewhat higher predictive accuracy than linear mixed-effects models with pre-specified interaction effects, on average. We illustrate the application of GLMM trees on an individual patient-level data meta-analysis on treatments for depression. We conclude that GLMM trees are a promising exploratory tool for the detection of treatment-subgroup interactions in clustered datasets.
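The core splitting idea behind tree-based subgroup detection can be sketched in a few lines. This is a bare-bones illustration only: unlike GLMM trees it ignores random effects and model-based stability tests, and simply picks the cut point that maximizes the gap in estimated treatment effect between the two child nodes.

```python
import numpy as np

def treatment_effect(y, treat):
    # Difference in mean outcome between treated (1) and control (0) units.
    return y[treat == 1].mean() - y[treat == 0].mean()

def best_split(x, treat, y, min_size=10):
    # Scan candidate cut points on covariate x; keep the one that most
    # separates the treatment effects of the two resulting subgroups.
    best_cut, best_gap = None, 0.0
    for cut in np.unique(x)[1:]:
        left = x < cut
        right = ~left
        if min(left.sum(), right.sum()) < min_size:
            continue
        gap = abs(treatment_effect(y[left], treat[left]) -
                  treatment_effect(y[right], treat[right]))
        if gap > best_gap:
            best_cut, best_gap = cut, gap
    return best_cut, best_gap
```

Applied recursively, such splits grow a tree whose leaves are candidate subgroups; the GLMM tree algorithm additionally fits a GLMM so that cluster-level random effects are not mistaken for treatment-subgroup interactions.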

  14. A Semantically Enabled Metadata Repository for Solar Irradiance Data Products

    NASA Astrophysics Data System (ADS)

    Wilson, A.; Cox, M.; Lindholm, D. M.; Nadiadi, I.; Traver, T.

    2014-12-01

    The Laboratory for Atmospheric and Space Physics, LASP, has been conducting research in atmospheric and space science for over 60 years, and providing the associated data products to the public. LASP has a long history, in particular, of making space-based measurements of the solar irradiance, which serves as crucial input to several areas of scientific research, including solar-terrestrial interactions, atmospheric science, and climate. LISIRD, the LASP Interactive Solar Irradiance Data Center, serves these datasets to the public, including solar spectral irradiance (SSI) and total solar irradiance (TSI) data. The LASP extended metadata repository, LEMR, is a database of information about the datasets served by LASP, such as parameters, uncertainties, temporal and spectral ranges, current version, alerts, etc. It serves as the definitive, single source of truth for that information. The database is populated with information garnered via web forms and automated processes. Dataset owners keep the information current and verified for datasets under their purview. This information can be pulled dynamically for many purposes. Web sites such as LISIRD can include this information in web page content as it is rendered, ensuring users get current, accurate information. It can also be pulled to create metadata records in various metadata formats, such as SPASE (for heliophysics) and ISO 19115. Once these records are made available to the appropriate registries, our data will be discoverable by users coming in via those organizations. The database is implemented as an RDF triplestore, a collection of subject-predicate-object data entities identifiable with a URI. This capability, coupled with SPARQL-over-HTTP read access, enables semantic queries over the repository contents. To create the repository we leveraged VIVO, an open source semantic web application, to manage and create new ontologies and populate repository content. 
    A variety of ontologies were used in creating the triplestore, including ontologies that came with VIVO, such as FOAF. Also, the W3C DCAT ontology was integrated and extended to describe properties of our data products that we needed to capture, such as spectral range. The presentation will describe the architecture, ontology issues, and tools used to create LEMR and plans for its evolution.
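The triplestore model described above can be illustrated with a pure-Python sketch (this stands in for the actual VIVO/SPARQL stack, and every URI below is invented): metadata lives as subject-predicate-object triples keyed by URI, and a query pattern with unbound positions plays the role of SPARQL variables.

```python
class TripleStore:
    """A toy in-memory triplestore of (subject, predicate, object) tuples."""

    def __init__(self):
        self.triples = set()

    def add(self, s, p, o):
        self.triples.add((s, p, o))

    def query(self, s=None, p=None, o=None):
        # None acts like a SPARQL variable: match any value in that position.
        return [t for t in self.triples
                if (s is None or t[0] == s)
                and (p is None or t[1] == p)
                and (o is None or t[2] == o)]

store = TripleStore()
store.add("lemr:sorce_tsi", "dcat:theme", "Total Solar Irradiance")
store.add("lemr:sorce_tsi", "lemr:spectralRange", "total")
store.add("lemr:sorce_ssi", "dcat:theme", "Solar Spectral Irradiance")
```

A call such as `store.query(p="dcat:theme")` then answers "which datasets have a declared theme?", the kind of cross-dataset question that motivates a semantic repository over flat metadata files.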

  15. Tracking Research Data Footprints via Integration with Research Graph

    NASA Astrophysics Data System (ADS)

    Evans, B. J. K.; Wang, J.; Aryani, A.; Conlon, M.; Wyborn, L. A.; Choudhury, S. A.

    2017-12-01

    The researcher of today is likely to be part of a team that will use subsets of data from at least one, if not more, external repositories, and that same data could be used by multiple researchers for many different purposes. At best, the repositories that host this data will know who is accessing their data, but rarely what they are using it for, so funders of data-collection programs and the repositories that store the data are unlikely to know: 1) which research funding contributed to the collection and preservation of a dataset, and 2) which data contributed to high impact research and publications. In times of funding shortages there is a growing need to be able to trace the footprint of a dataset from the originator that collected the data to the repository that stores the data and ultimately to any derived publications. The Research Data Alliance's Data Description Registry Interoperability Working Group (DDRIWG) has addressed this problem through the development of a distributed graph, called Research Graph, that can map each piece of the research interaction puzzle by building aggregated graphs. It can connect datasets on the basis of co-authorship or other collaboration models such as joint funding and grants, and can connect research datasets, publications, grants and researcher profiles across research repositories and infrastructures such as DataCite and ORCID. National Computational Infrastructure (NCI) in Australia is one of the early adopters of Research Graph. The graphic view and quantitative analysis help NCI track the usage of their national reference data collections, thus quantifying the role that these NCI-hosted data assets play within the funding-researcher-data-publication cycle. The graph can unlock the complex interactions of the research projects by tracking the contribution of datasets, the various funding bodies and the downstream data users. 
The RMap Project is a similar initiative that aims to capture complex relationships among scholarly publications and their underlying data, including IEEE publications. There are plans to combine RMap and Research Graph in the near future and to add physical samples to Research Graph.

  16. GLEAM v3: updated land evaporation and root-zone soil moisture datasets

    NASA Astrophysics Data System (ADS)

    Martens, Brecht; Miralles, Diego; Lievens, Hans; van der Schalie, Robin; de Jeu, Richard; Fernández-Prieto, Diego; Verhoest, Niko

    2016-04-01

    Evaporation determines the availability of surface water resources and the requirements for irrigation. In addition, through its impacts on the water, carbon and energy budgets, evaporation influences the occurrence of rainfall and the dynamics of air temperature. Therefore, reliable estimates of this flux at regional to global scales are of major importance for water management and meteorological forecasting of extreme events. However, the global-scale magnitude and variability of the flux, and the sensitivity of the underlying physical process to changes in environmental factors, are still poorly understood due to the limited global coverage of in situ measurements. Remote sensing techniques can help to overcome the lack of ground data. However, evaporation is not directly observable from satellite systems. As a result, recent efforts have focussed on combining the observable drivers of evaporation within process-based models. The Global Land Evaporation Amsterdam Model (GLEAM, www.gleam.eu) estimates terrestrial evaporation based on daily satellite observations of meteorological drivers of terrestrial evaporation, vegetation characteristics and soil moisture. Since the publication of the first version of the model in 2011, GLEAM has been widely applied for the study of trends in the water cycle, interactions between land and atmosphere and hydrometeorological extreme events. A third version of the GLEAM global datasets will be available from the beginning of 2016 and will be distributed using www.gleam.eu as gateway. The updated datasets include separate estimates for the different components of the evaporative flux (i.e. transpiration, bare-soil evaporation, interception loss, open-water evaporation and snow sublimation), as well as variables like the evaporative stress, potential evaporation, root-zone soil moisture and surface soil moisture. 
A new dataset using SMOS-based input data of surface soil moisture and vegetation optical depth will also be distributed. The most important updates in GLEAM include the revision of the soil moisture data assimilation system, the evaporative stress functions and the infiltration of rainfall. In this presentation, we will highlight the changes of the methodology and present the new datasets, their validation against in situ observations and the comparisons against alternative datasets of terrestrial evaporation, such as GLDAS-Noah, ERA-Interim and previous GLEAM datasets. Preliminary results indicate that the magnitude and the spatio-temporal variability of the evaporation estimates have been slightly improved upon previous versions of the datasets.

  17. Immersive Interaction, Manipulation and Analysis of Large 3D Datasets for Planetary and Earth Sciences

    NASA Astrophysics Data System (ADS)

    Pariser, O.; Calef, F.; Manning, E. M.; Ardulov, V.

    2017-12-01

    We will present the implementation and study of several use cases of utilizing Virtual Reality (VR) for immersive display, interaction and analysis of large and complex 3D datasets. These datasets have been acquired by instruments across several Earth, planetary and solar space robotics missions. First, we will describe the architecture of the common application framework that was developed to input data, interface with VR display devices and program input controllers in various computing environments. Tethered and portable VR technologies will be contrasted and the advantages of each highlighted. We will then present experimental immersive-analytics visual constructs that enable augmentation of 3D datasets with 2D ones such as images and statistical and abstract data. We will conclude by presenting a comparative analysis with traditional visualization applications and share the feedback provided by our users: scientists and engineers.

  18. Three visualization approaches for communicating and exploring PIT tag data

    USGS Publications Warehouse

    Letcher, Benjamin; Walker, Jeffrey D.; O'Donnell, Matthew; Whiteley, Andrew R.; Nislow, Keith; Coombs, Jason

    2018-01-01

    As the number, size and complexity of ecological datasets have increased, narrative and interactive raw data visualizations have emerged as important tools for exploring and understanding these large datasets. As a demonstration, we developed three visualizations to communicate and explore passive integrated transponder tag data from two long-term field studies. We created three independent visualizations for the same dataset, allowing separate entry points for users with different goals and experience levels. The first visualization uses a narrative approach to introduce users to the study. The second visualization provides interactive cross-filters that allow users to explore multi-variate relationships in the dataset. The last visualization allows users to visualize the movement histories of individual fish within the stream network. This suite of visualization tools allows a progressive discovery of more detailed information and should make the data accessible to users with a wide variety of backgrounds and interests.

  19. The strength of friendship ties in proximity sensor data.

    PubMed

    Sekara, Vedran; Lehmann, Sune

    2014-01-01

    Understanding how people interact and socialize is important in many contexts from disease control to urban planning. Datasets that capture this specific aspect of human life have increased in size and availability over the last few years. We have yet to understand, however, to what extent such electronic datasets may serve as a valid proxy for real life social interactions. For an observational dataset, gathered using mobile phones, we analyze the problem of identifying transient and non-important links, as well as how to highlight important social interactions. Applying the Bluetooth signal strength parameter to distinguish between observations, we demonstrate that weak links, compared to strong links, have a lower probability of being observed at later times, while such links also have, on average, lower link weights and a lower probability of sharing an online friendship. Further, the role of link strength is investigated in relation to social network properties.
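The signal-strength thresholding step can be sketched as follows (the -80 dBm cutoff and all readings below are invented for illustration, not taken from the paper): proximity links are classified as strong or weak by Bluetooth signal strength, and persistence is the fraction of links that reappear in a later observation window.

```python
def split_links(observations, threshold=-80):
    # observations: iterable of (person_a, person_b, rssi_dbm) tuples.
    strong = {frozenset((a, b)) for a, b, rssi in observations if rssi >= threshold}
    weak = {frozenset((a, b)) for a, b, rssi in observations if rssi < threshold}
    return strong, weak - strong   # a link counts as strong if it ever was

def persistence(links, later_observations):
    # Fraction of the given links observed again in a later window.
    later = {frozenset((a, b)) for a, b, _ in later_observations}
    return len(links & later) / len(links) if links else 0.0
```

Comparing `persistence(strong, ...)` against `persistence(weak, ...)` across windows reproduces, in miniature, the paper's finding that strong links are more likely to be observed again at later times.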

  20. Uranium plume persistence impacted by hydrologic and geochemical heterogeneity in the groundwater and river water interaction zone of Hanford site

    NASA Astrophysics Data System (ADS)

    Chen, X.; Zachara, J. M.; Vermeul, V. R.; Freshley, M.; Hammond, G. E.

    2015-12-01

    The behavior of a persistent uranium plume in an extended groundwater- river water (GW-SW) interaction zone at the DOE Hanford site is dominantly controlled by river stage fluctuations in the adjacent Columbia River. The plume behavior is further complicated by substantial heterogeneity in physical and geochemical properties of the host aquifer sediments. Multi-scale field and laboratory experiments and reactive transport modeling were integrated to understand the complex plume behavior influenced by highly variable hydrologic and geochemical conditions in time and space. In this presentation we (1) describe multiple data sets from field-scale uranium adsorption and desorption experiments performed at our experimental well-field, (2) develop a reactive transport model that incorporates hydrologic and geochemical heterogeneities characterized from multi-scale and multi-type datasets and a surface complexation reaction network based on laboratory studies, and (3) compare the modeling and observation results to provide insights on how to refine the conceptual model and reduce prediction uncertainties. The experimental results revealed significant spatial variability in uranium adsorption/desorption behavior, while modeling demonstrated that ambient hydrologic and geochemical conditions and heterogeneities in sediment physical and chemical properties both contributed to complex plume behavior and its persistence. Our analysis provides important insights into the characterization, understanding, modeling, and remediation of groundwater contaminant plumes influenced by surface water and groundwater interactions.

  1. Studying Cold Nuclear Matter with the MPC-EX of PHENIX

    NASA Astrophysics Data System (ADS)

    Grau, Nathan; Phenix Collaboration

    2017-09-01

    Highly asymmetric collision systems, such as d+Au, provide a unique environment to study cold nuclear matter. Potential measurements range from pinning down the modification of the nuclear wave function, i.e. saturation, to studying final state interactions, i.e. energy loss. The PHENIX experiment has enhanced the muon piston calorimeter (MPC) with a silicon-tungsten preshower, the MPC-EX. With its fine segmentation the MPC-EX extends the photon detection capability at 3 < | η | < 3.8. In this talk we review the current status of the detector, its calibration, and its identification capabilities using the 2016 d+Au dataset. We also discuss the specific physics observables the MPC-EX can measure.

  2. Interoperable Solar Data and Metadata via LISIRD 3

    NASA Astrophysics Data System (ADS)

    Wilson, A.; Lindholm, D. M.; Pankratz, C. K.; Snow, M. A.; Woods, T. N.

    2015-12-01

    LISIRD 3 is a major upgrade of the LASP Interactive Solar Irradiance Data Center (LISIRD), which serves several dozen space-based solar irradiance and related data products to the public. Through interactive plots, LISIRD 3 provides data browsing supported by data subsetting and aggregation. Incorporating a semantically enabled metadata repository, LISIRD 3 shows users current, vetted, consistent information about the datasets offered. Users can now also search for datasets based on metadata fields such as dataset type and/or spectral or temporal range. This semantic database enables metadata browsing, so users can discover the relationships between datasets, instruments, spacecraft, missions, and PIs. The database also enables the creation and publication of metadata records in a variety of formats, such as SPASE or ISO, making these datasets more discoverable, and opens the possibility of a public SPARQL endpoint, making the metadata browsable in an automated fashion. LISIRD 3's data access middleware, LaTiS, provides dynamic, on-demand reformatting of data and timestamps, subsetting and aggregation, and other server-side functionality via a RESTful, OPeNDAP-compliant API, enabling interoperability between LASP datasets and many common tools. LISIRD 3's templated front-end design, coupled with the uniform data interface offered by LaTiS, allows easy integration of new datasets. Consequently, the number and variety of datasets offered by LISIRD has grown to encompass several dozen, with many more to come. This poster will discuss the design and implementation of LISIRD 3, including the tools used, capabilities enabled, and issues encountered.

  3. ConTour: Data-Driven Exploration of Multi-Relational Datasets for Drug Discovery.

    PubMed

    Partl, Christian; Lex, Alexander; Streit, Marc; Strobelt, Hendrik; Wassermann, Anne-Mai; Pfister, Hanspeter; Schmalstieg, Dieter

    2014-12-01

    Large scale data analysis is nowadays a crucial part of drug discovery. Biologists and chemists need to quickly explore and evaluate potentially effective yet safe compounds based on many datasets that are in relationship with each other. However, there is a lack of tools that support them in these processes. To remedy this, we developed ConTour, an interactive visual analytics technique that enables the exploration of these complex, multi-relational datasets. At its core ConTour lists all items of each dataset in a column. Relationships between the columns are revealed through interaction: selecting one or multiple items in one column highlights and re-sorts the items in other columns. Filters based on relationships enable drilling down into the large data space. To identify interesting items in the first place, ConTour employs advanced sorting strategies, including strategies based on connectivity strength and uniqueness, as well as sorting based on item attributes. ConTour also introduces interactive nesting of columns, a powerful method to show the related items of a child column for each item in the parent column. Within the columns, ConTour shows rich attribute data about the items as well as information about the connection strengths to other datasets. Finally, ConTour provides a number of detail views, which can show items from multiple datasets and their associated data at the same time. We demonstrate the utility of our system in case studies conducted with a team of chemical biologists, who investigate the effects of chemical compounds on cells and need to understand the underlying mechanisms.

  4. Overview of Nuclear Physics Data: Databases, Web Applications and Teaching Tools

    NASA Astrophysics Data System (ADS)

    McCutchan, Elizabeth

    2017-01-01

    The mission of the United States Nuclear Data Program (USNDP) is to provide current, accurate, and authoritative data for use in pure and applied areas of nuclear science and engineering. This is accomplished by compiling, evaluating, and disseminating extensive datasets. Our main products include the Evaluated Nuclear Structure File (ENSDF) containing information on nuclear structure and decay properties and the Evaluated Nuclear Data File (ENDF) containing information on neutron-induced reactions. The National Nuclear Data Center (NNDC), through the website www.nndc.bnl.gov, provides web-based retrieval systems for these and many other databases. In addition, the NNDC hosts several on-line physics tools, useful for calculating various quantities relating to basic nuclear physics. In this talk, I will first introduce the quantities which are evaluated and recommended in our databases. I will then outline the searching capabilities which allow one to quickly and efficiently retrieve data. Finally, I will demonstrate how the database searches and web applications can provide effective teaching tools concerning the structure of nuclei and how they interact. Work supported by the Office of Nuclear Physics, Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-98CH10886.

  5. Exploring 4D Flow Data in an Immersive Virtual Environment

    NASA Astrophysics Data System (ADS)

    Stevens, A. H.; Butkiewicz, T.

    2017-12-01

    Ocean models help us to understand and predict a wide range of intricate physical processes which comprise the atmospheric and oceanic systems of the Earth. Because these models output an abundance of complex time-varying three-dimensional (i.e., 4D) data, effectively conveying the myriad information from a given model poses a significant visualization challenge. The majority of the research effort into this problem has concentrated around synthesizing and examining methods for representing the data itself; by comparison, relatively few studies have looked into the potential merits of various viewing conditions and virtual environments. We seek to improve our understanding of the benefits offered by current consumer-grade virtual reality (VR) systems through an immersive, interactive 4D flow visualization system. Our dataset is a Regional Ocean Modeling System (ROMS) model representing a 12-hour tidal cycle of the currents within New Hampshire's Great Bay estuary. The model data was loaded into a custom VR particle system application using the OpenVR software library and the HTC Vive hardware, which tracks a headset and two six-degree-of-freedom (6DOF) controllers within a 5m-by-5m area. The resulting visualization system allows the user to coexist in the same virtual space as the data, enabling rapid and intuitive analysis of the flow model through natural interactions with the dataset and within the virtual environment. Whereas a traditional computer screen typically requires the user to reposition a virtual camera in the scene to obtain the desired view of the data, in virtual reality the user can simply move their head to the desired viewpoint, completely eliminating the mental context switches from data exploration/analysis to view adjustment and back. 
The tracked controllers become tools to quickly manipulate (reposition, reorient, and rescale) the dataset and to interrogate it by, e.g., releasing dye particles into the flow field, probing scalar velocities, placing a cutting plane through a region of interest, etc. It is hypothesized that the advantages afforded by head-tracked viewing and 6DOF interaction devices will lead to faster and more efficient examination of 4D flow data. A human factors study is currently being prepared to empirically evaluate this method of visualization and interaction.

  6. An interactive, stereoscopic virtual environment for medical imaging visualization, simulation and training

    NASA Astrophysics Data System (ADS)

    Krueger, Evan; Messier, Erik; Linte, Cristian A.; Diaz, Gabriel

    2017-03-01

    Recent advances in medical image acquisition allow for the reconstruction of anatomies with 3D, 4D, and 5D renderings. Nevertheless, standard anatomical and medical data visualization still relies heavily on the use of traditional 2D didactic tools (i.e., textbooks and slides), which restrict the presentation of image data to a 2D slice format. While these approaches have their merits, not least being cost-effective and easy to disseminate, anatomy is inherently three-dimensional. By using 2D visualizations to illustrate more complex morphologies, important interactions between structures can be missed. In practice, such as in the planning and execution of surgical interventions, professionals require intricate knowledge of anatomical complexities, which can be more clearly communicated and understood through intuitive interaction with 3D volumetric datasets, such as those extracted from high-resolution CT or MRI scans. Open-source, high-quality 3D medical imaging datasets are freely available, and with the emerging popularity of 3D display technologies, affordable and consistent 3D anatomical visualizations can be created. In this study we describe the design, implementation, and evaluation of one such interactive, stereoscopic visualization paradigm for human anatomy extracted from 3D medical images. A stereoscopic display was created by projecting the scene onto the lab floor using sequential frame stereo projection and viewed through active shutter glasses. By incorporating a PhaseSpace motion tracking system, a single viewer can navigate an augmented reality environment and directly manipulate virtual objects in 3D. While this paradigm is sufficiently versatile to enable a wide variety of applications in need of 3D visualization, we designed our study to work as an interactive game, which allows users to explore the anatomy of various organs and systems. 
In this study we describe the design, implementation, and evaluation of an interactive and stereoscopic visualization platform for exploring and understanding human anatomy. This system can present medical imaging data in three dimensions and allows for direct physical interaction and manipulation by the viewer. This should provide numerous benefits over traditional, 2D display and interaction modalities, and in our analysis, we aim to quantify and qualify users' visual and motor interactions with the virtual environment when employing this interactive display as a 3D didactic tool.

  7. Toward a Physical Characterization of Raindrop Collision Outcome Regimes

    NASA Technical Reports Server (NTRS)

    Testik, F. Y.; Barros, Ana P.; Bilven, Francis L.

    2011-01-01

    A comprehensive raindrop collision outcome regime diagram that delineates the physical conditions associated with the outcome regimes (i.e., bounce, coalescence, and different breakup types) of binary raindrop collisions is proposed. The proposed diagram builds on a theoretical regime diagram defined in the phase space of collision Weber numbers We and the drop diameter ratio p by including critical angle of impact considerations. In this study, the theoretical regime diagram is first evaluated against a comprehensive dataset for drop collision experiments representative of raindrop collisions in nature. Subsequently, the theoretical regime diagram is modified to explicitly describe the dominant regimes of raindrop interactions in (We, p) by delineating the physical conditions necessary for the occurrence of distinct types of collision-induced breakup (neck/filament, sheet, disk, and crown breakups) based on critical angle of impact consideration. Crown breakup is a subtype of disk breakup for lower collision kinetic energy that presents distinctive morphology. Finally, the experimental results are analyzed in the context of the comprehensive collision regime diagram, and conditional probabilities that can be used in the parameterization of breakup kernels in stochastic models of raindrop dynamics are provided.

  8. GeoNotebook: Browser based Interactive analysis and visualization workflow for very large climate and geospatial datasets

    NASA Astrophysics Data System (ADS)

    Ozturk, D.; Chaudhary, A.; Votava, P.; Kotfila, C.

    2016-12-01

    Jointly developed by Kitware and NASA Ames, GeoNotebook is an open source tool designed to give the maximum amount of flexibility to analysts, while dramatically simplifying the process of exploring geospatially indexed datasets. Packages like Fiona (backed by GDAL), Shapely, Descartes, Geopandas, and PySAL provide a stack of technologies for reading, transforming, and analyzing geospatial data. Combined with the Jupyter notebook and libraries like matplotlib/Basemap, it is possible to generate detailed geospatial visualizations. Unfortunately, the visualizations generated are either static or do not perform well for very large datasets, and this setup requires a great deal of boilerplate code to create and maintain. Other extensions exist to remedy these problems, but they provide a separate map for each input cell and do not support map interactions that feed back into the Python environment. To support interactive data exploration and visualization on large datasets we have developed an extension to the Jupyter notebook that provides a single dynamic map that can be managed from the Python environment, and that can communicate back with a server which can perform operations like data subsetting on a cloud-based cluster.

  9. SPRINT: ultrafast protein-protein interaction prediction of the entire human interactome.

    PubMed

    Li, Yiwei; Ilie, Lucian

    2017-11-15

    Proteins usually perform their functions by interacting with other proteins. Predicting which proteins interact is a fundamental problem. Experimental methods are slow, expensive, and have a high rate of error. Many computational methods have been proposed, among which sequence-based ones are very promising. However, so far no such method is able to predict effectively the entire human interactome: they require too much time or memory. We present SPRINT (Scoring PRotein INTeractions), a new sequence-based algorithm and tool for predicting protein-protein interactions. We comprehensively compare SPRINT with state-of-the-art programs on the seven most reliable human PPI datasets and show that it is more accurate while running orders of magnitude faster and using very little memory. SPRINT is the only sequence-based program that can effectively predict the entire human interactome: it requires between 15 and 100 min, depending on the dataset. Our goal is to transform the very challenging problem of predicting the entire human interactome into a routine task. The source code of SPRINT is freely available from https://github.com/lucian-ilie/SPRINT/ and the datasets and predicted PPIs from www.csd.uwo.ca/faculty/ilie/SPRINT/ .

  10. Soil Physical, Chemical, and Thermal Characterization, Teller Road Site, Seward Peninsula, Alaska, 2016

    DOE Data Explorer

    Graham, David; Kholodov, Alexander; Wilson, Cathy; Moon, Ji-Won; Romanovsky, Vladimir; Busey, Bob

    2018-02-05

    This dataset provides the results of physical, chemical, and thermal characterization of soils at the Teller Road Site, Seward Peninsula, Alaska. Soil pits were dug from 7-14 September 2016 at designated Intensive Stations 2 through 9 at the Teller Road MM 27 Site. This dataset includes field observations and descriptions of soil layers or horizons, field measurements of soil volumetric water content, soil temperature, thermal conductivity, and heat capacity. Laboratory measurements of soil properties include gravimetric water content, bulk density, volumetric water content, and total carbon and nitrogen.

  11. Soil Physical, Chemical, and Thermal Characterization, Council Road Site, Seward Peninsula, Alaska, 2016

    DOE Data Explorer

    Alexander Kholodov; David Graham; Ji-Won Moon

    2018-01-22

    This dataset provides the results of physical, chemical, and thermal characterization of soils at the Council Road Site at MM71, Seward Peninsula, Alaska. Soil pits were dug on 11 September 2016 at three sites. This dataset includes field observations and descriptions of soil layers or horizons, field measurements of soil volumetric water content, soil temperature, thermal conductivity, and heat capacity. Laboratory measurements of soil properties include gravimetric water content, bulk density, volumetric water content, total carbon and nitrogen, and elemental composition from X-ray fluorescence for some elements.

  12. VEMAP Phase 2 bioclimatic database. I. Gridded historical (20th century) climate for modeling ecosystem dynamics across the conterminous USA

    USGS Publications Warehouse

    Kittel, T.G.F.; Rosenbloom, N.A.; Royle, J. Andrew; Daly, Christopher; Gibson, W.P.; Fisher, H.H.; Thornton, P.; Yates, D.N.; Aulenbach, S.; Kaufman, C.; McKeown, R.; Bachelet, D.; Schimel, D.S.; Neilson, R.; Lenihan, J.; Drapek, R.; Ojima, D.S.; Parton, W.J.; Melillo, J.M.; Kicklighter, D.W.; Tian, H.; McGuire, A.D.; Sykes, M.T.; Smith, B.; Cowling, S.; Hickler, T.; Prentice, I.C.; Running, S.; Hibbard, K.A.; Post, W.M.; King, A.W.; Smith, T.; Rizzo, B.; Woodward, F.I.

    2004-01-01

    Analysis and simulation of biospheric responses to historical forcing require surface climate data that capture those aspects of climate that control ecological processes, including key spatial gradients and modes of temporal variability. We developed a multivariate, gridded historical climate dataset for the conterminous USA as a common input database for the Vegetation/Ecosystem Modeling and Analysis Project (VEMAP), a biogeochemical and dynamic vegetation model intercomparison. The dataset covers the period 1895-1993 on a 0.5° latitude/longitude grid. Climate is represented at both monthly and daily timesteps. Variables are: precipitation, minimum and maximum temperature, total incident solar radiation, daylight-period irradiance, vapor pressure, and daylight-period relative humidity. The dataset was derived from US Historical Climate Network (HCN), cooperative network, and snowpack telemetry (SNOTEL) monthly precipitation and mean minimum and maximum temperature station data. We employed techniques that rely on geostatistical and physical relationships to create the temporally and spatially complete dataset. We developed a local kriging prediction model to infill discontinuous and limited-length station records based on the spatial autocorrelation structure of climate anomalies. A spatial interpolation model (PRISM) that accounts for physiographic controls was used to grid the infilled monthly station data. We implemented a stochastic weather generator (modified WGEN) to disaggregate the gridded monthly series to dailies. Radiation and humidity variables were estimated from the dailies using a physically-based empirical surface climate model (MTCLIM3). Derived datasets include a 100 yr model spin-up climate and a historical Palmer Drought Severity Index (PDSI) dataset. The VEMAP dataset exhibits statistically significant trends in temperature, precipitation, solar radiation, vapor pressure, and PDSI for US National Assessment regions. 
The historical climate and companion datasets are available online at data archive centers. © Inter-Research 2004.
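
    As a toy illustration of the monthly-to-daily disaggregation step, a sketch like the following preserves the monthly total while spreading it stochastically across days. This is not the actual WGEN algorithm, which uses a Markov chain for wet/dry occurrence and fitted distributions for amounts; the wet-day probability and weighting scheme here are assumptions.

```python
import random

def disaggregate_monthly_precip(monthly_total, n_days, p_wet=0.3, seed=0):
    """Toy weather-generator-style disaggregation: draw a wet/dry day
    sequence, then split the monthly total across wet days in proportion
    to random weights, so the daily values sum back to the monthly value."""
    rng = random.Random(seed)
    wet = [rng.random() < p_wet for _ in range(n_days)]
    if not any(wet):
        wet[rng.randrange(n_days)] = True  # force at least one wet day
    weights = [rng.random() if is_wet else 0.0 for is_wet in wet]
    total_weight = sum(weights)
    return [monthly_total * w / total_weight for w in weights]
```

    Conservation of the monthly total is the key invariant: whatever daily structure the generator imposes, the gridded monthly series is reproduced exactly when the dailies are summed.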

  13. A Systems Biology Methodology Combining Transcriptome and Interactome Datasets to Assess the Implications of Cytokinin Signaling for Plant Immune Networks.

    PubMed

    Kunz, Meik; Dandekar, Thomas; Naseem, Muhammad

    2017-01-01

    Cytokinins (CKs) play an important role in plant growth and development. Several studies also highlight the modulatory implications of CKs for plant-pathogen interactions. However, the mechanisms by which CKs modulate immune networks in plants are still not fully understood. A detailed analysis of high-throughput transcriptome (RNA-Seq and microarray) datasets under modulated plant CK conditions, merged with the cellular interactome (large-scale protein-protein interaction data), has the potential to unlock the contribution of CKs to plant defense. Here, we describe a detailed systems biology methodology for the acquisition and analysis of the various omics datasets that delineate the role of plant CKs in impacting immune pathways in Arabidopsis.

  14. Exploiting Amino Acid Composition for Predicting Protein-Protein Interactions

    PubMed Central

    Roy, Sushmita; Martinez, Diego; Platero, Harriett; Lane, Terran; Werner-Washburne, Margaret

    2009-01-01

    Background Computational prediction of protein interactions typically uses protein domains as classifier features because they capture conserved information about interaction surfaces. However, approaches relying on domains as features cannot be applied to proteins without any domain information. In this paper, we explore the contribution of pure amino acid composition (AAC) for protein interaction prediction. This simple feature, which is based on normalized counts of single or pairs of amino acids, is applicable to proteins from any sequenced organism and can be used to compensate for the lack of domain information. Results AAC performed on par with protein interaction prediction based on domains on three yeast protein interaction datasets. Similar behavior was obtained using different classifiers, indicating that our results are a function of features and not of classifiers. In addition to yeast datasets, AAC performed comparably on worm and fly datasets. Prediction of interactions for the entire yeast proteome identified a large number of novel interactions, the majority of which co-localized or participated in the same processes. Our high-confidence interaction network included both well-studied and uncharacterized proteins. Proteins with known function were involved in actin assembly and cell budding. Uncharacterized proteins interacted with proteins involved in reproduction and cell budding, thus providing putative biological roles for the uncharacterized proteins. Conclusion AAC is a simple, yet powerful feature for predicting protein interactions, and can be used alone or in conjunction with protein domains to predict new and validate existing interactions. More importantly, AAC alone performs on par with existing, but more complex, features, indicating the presence of sequence-level information that is predictive of interaction but not necessarily restricted to domains. PMID:19936254
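
    The AAC feature described in this abstract, normalized counts of single or adjacent pairs of amino acids, is straightforward to compute. A minimal sketch (function names are illustrative, not taken from the paper's code):

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def aac_features(seq):
    """Normalized single amino acid composition: a 20-dim vector."""
    counts = Counter(seq)
    n = len(seq)
    return [counts.get(a, 0) / n for a in AMINO_ACIDS]

def pair_aac_features(seq):
    """Normalized counts of adjacent amino acid pairs: a 400-dim vector."""
    counts = Counter(zip(seq, seq[1:]))
    n = max(len(seq) - 1, 1)
    return [counts.get((a, b), 0) / n for a, b in product(AMINO_ACIDS, repeat=2)]

def interaction_features(seq_a, seq_b):
    """Feature vector for a candidate interacting pair: concatenated AACs.
    This vector would be fed to any off-the-shelf classifier."""
    return aac_features(seq_a) + aac_features(seq_b)
```

    Because the features depend only on the raw sequences, they apply to any sequenced organism, which is exactly the advantage over domain-based features that the abstract emphasizes.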

  15. Quantifying Spatially Integrated Floodplain and Wetland Systems for the Conterminous US

    NASA Astrophysics Data System (ADS)

    Lane, C.; D'Amico, E.; Wing, O.; Bates, P. D.

    2017-12-01

    Wetlands interact with other waters across a variable connectivity continuum, from permanent to transient, from fast to slow, and from primarily surface water to exclusively groundwater flows. Floodplain wetlands typically experience fast and frequent surface and near-surface groundwater interactions with their river networks, leading to an increasing effort to tailor management strategies for these wetlands. Management of floodplain wetlands is contingent on accurate floodplain delineation, and though this has proven challenging, multiple efforts are being made to alleviate this data gap at the conterminous scale using spatial, physical, and hydrological floodplain proxies. In this study, we derived and contrasted floodplain extents using the following nationally available approaches: 1) a geospatial-buffer floodplain proxy (Lane and D'Amico 2016, JAWRA 52(3):705-722), 2) a regionalized flood frequency analysis coupled to a 30m resolution continental-scale hydraulic model (RFFA; Smith et al. 2015, WRR 51:539-553), and 3) a soils-based floodplain analysis (Sangwan and Merwade 2015, JAWRA 51(5):1286-1304). The geospatial approach uses National Wetlands Inventory and buffered National Hydrography Datasets. RFFA estimates extreme flows based on catchment size, regional climatology and upstream annual rainfall and routes these flows through a hydraulic model built with data from USGS HydroSHEDS, NOAA, and the National Elevation Dataset. Soil-based analyses define floodplains based on attributes within the USDA soil-survey data (SSURGO). Nearly 30% (by count) of U.S. freshwater wetlands are located within floodplains according to the geospatial analysis, contrasted with 37% (soils-based) and 53% (RFFA-based). The dichotomies between approaches are mainly a function of input data-layer resolution, accuracy, coverage, and extent, further discussed in this presentation. 
Ultimately, these spatial analyses and findings will improve floodplain and integrated wetland system extent assessment. This will lead to better management of the physically, chemically, and biologically integrated floodplain wetlands affecting the integrity of downstream waterbodies at multiple scales.

  16. Prediction of binding constants of protein ligands: A fast method for the prioritization of hits obtained from de novo design or 3D database search programs

    NASA Astrophysics Data System (ADS)

    Böhm, Hans-Joachim

    1998-07-01

    A dataset of 82 protein-ligand complexes of known 3D structure and binding constant Ki was analysed to elucidate the important factors that determine the strength of protein-ligand interactions. The following parameters were investigated: the number and geometry of hydrogen bonds and ionic interactions between the protein and the ligand, the size of the lipophilic contact surface, the flexibility of the ligand, the electrostatic potential in the binding site, water molecules in the binding site, cavities along the protein-ligand interface and specific interactions between aromatic rings. Based on these parameters, a new empirical scoring function is presented that estimates the free energy of binding for a protein-ligand complex of known 3D structure. The function distinguishes between buried and solvent accessible hydrogen bonds. It tolerates deviations in the hydrogen bond geometry of up to 0.25 Å in the length and up to 30° in the hydrogen bond angle without penalizing the score. The new energy function reproduces the binding constants (ranging from 3.7 × 10⁻² M to 1 × 10⁻¹⁴ M, corresponding to binding energies between -8 and -80 kJ/mol) of the dataset with a standard deviation of 7.3 kJ/mol corresponding to 1.3 orders of magnitude in binding affinity. The function can be evaluated very fast and is therefore also suitable for the application in a 3D database search or de novo ligand design program such as LUDI. The physical significance of the individual contributions is discussed.
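
    The general shape of such an empirical scoring function can be sketched as follows. The coefficients below are placeholders in the spirit of Böhm's earlier (1994) function, not the values fitted in this paper, and the linear fall-off widths beyond the tolerated 0.25 Å / 30° deviations are assumptions for illustration.

```python
def geometry_penalty(d_length, d_angle):
    """Penalty f(dR, dAlpha) for one hydrogen bond or ionic contact:
    deviations up to 0.25 Å in length and 30° in angle cost nothing;
    larger deviations reduce the contribution linearly (assumed widths)."""
    f_len = 1.0 if d_length <= 0.25 else max(0.0, 1.0 - (d_length - 0.25) / 0.4)
    f_ang = 1.0 if d_angle <= 30.0 else max(0.0, 1.0 - (d_angle - 30.0) / 50.0)
    return f_len * f_ang

def score_complex(hbonds, ionic, lipo_area, n_rot,
                  dG0=5.4, dG_hb=-4.7, dG_ionic=-8.3,
                  dG_lipo=-0.17, dG_rot=1.4):
    """Estimate the binding free energy (kJ/mol) of a protein-ligand
    complex as a sum of weighted terms: hydrogen bonds, ionic
    interactions, lipophilic contact area, and rotatable-bond entropy.

    hbonds, ionic -- lists of (length deviation Å, angle deviation °)
    lipo_area     -- lipophilic contact surface (Å^2)
    n_rot         -- number of rotatable bonds in the ligand
    """
    dG = dG0
    dG += dG_hb * sum(geometry_penalty(dr, da) for dr, da in hbonds)
    dG += dG_ionic * sum(geometry_penalty(dr, da) for dr, da in ionic)
    dG += dG_lipo * lipo_area
    dG += dG_rot * n_rot
    return dG
```

    A function of this additive form is cheap to evaluate, which is why it suits the hit-prioritization use case in 3D database search or de novo design programs such as LUDI.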

  17. Using the Gravity Model to Estimate the Spatial Spread of Vector-Borne Diseases

    PubMed Central

    Barrios, José Miguel; Verstraeten, Willem W.; Maes, Piet; Aerts, Jean-Marie; Farifteh, Jamshid; Coppin, Pol

    2012-01-01

    Gravity models are commonly used spatial interaction models. They have been widely applied in a large set of domains dealing with interactions amongst spatial entities. The spread of vector-borne diseases is also related to the intensity of interaction between spatial entities, namely, the physical habitat of pathogens’ vectors and/or hosts, and urban areas, thus humans. This study implements the concept behind gravity models in the spatial spread of two vector-borne diseases, nephropathia epidemica and Lyme borreliosis, based on current knowledge on the transmission mechanism of these diseases. Two sources of information on vegetated systems were tested: the CORINE land cover map and MODIS NDVI. The size of vegetated areas near urban centers and a local indicator of occupation-related exposure were found to be significant predictors of disease risk. Both the land cover map and the space-borne dataset were suitable, yet not equivalent, input sources for locating and measuring vegetated areas of importance for disease spread. The overall results point at the compatibility of the gravity model concept and the spatial spread of vector-borne diseases. PMID:23202882
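
    A minimal sketch of the gravity-model idea applied here: interaction intensity grows with the "masses" of two spatial entities (e.g. urban population and vegetated patch size) and decays with the distance between them. The masses, exponent, and aggregation below are illustrative assumptions, not the paper's fitted model; its actual covariates came from CORINE/NDVI patch measurements.

```python
def gravity_interaction(mass_a, mass_b, distance, k=1.0, beta=2.0):
    """Gravity-model interaction intensity between two spatial entities:
    T = k * m_a * m_b / d^beta (beta=2 mimics Newtonian gravity)."""
    return k * (mass_a * mass_b) / distance ** beta

def exposure_risk(urban_mass, patches, k=1.0, beta=2.0):
    """Aggregate disease-exposure proxy for one urban centre: sum of
    gravity terms over surrounding vegetated patches, each given as
    (patch_size, distance)."""
    return sum(gravity_interaction(urban_mass, size, dist, k, beta)
               for size, dist in patches)
```

    Under this formulation, a large vegetated patch close to town contributes far more to the risk proxy than an equally large patch twice as far away, which is the intuition the study tests against observed disease incidence.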

  18. Using the gravity model to estimate the spatial spread of vector-borne diseases.

    PubMed

    Barrios, José Miguel; Verstraeten, Willem W; Maes, Piet; Aerts, Jean-Marie; Farifteh, Jamshid; Coppin, Pol

    2012-11-30

    Gravity models are commonly used spatial interaction models. They have been widely applied in a large set of domains dealing with interactions amongst spatial entities. The spread of vector-borne diseases is also related to the intensity of interaction between spatial entities, namely, the physical habitat of pathogens’ vectors and/or hosts, and urban areas, thus humans. This study implements the concept behind gravity models in the spatial spread of two vector-borne diseases, nephropathia epidemica and Lyme borreliosis, based on current knowledge on the transmission mechanism of these diseases. Two sources of information on vegetated systems were tested: the CORINE land cover map and MODIS NDVI. The size of vegetated areas near urban centers and a local indicator of occupation-related exposure were found to be significant predictors of disease risk. Both the land cover map and the space-borne dataset were suitable, yet not equivalent, input sources for locating and measuring vegetated areas of importance for disease spread. The overall results point at the compatibility of the gravity model concept and the spatial spread of vector-borne diseases.

  19. Physical properties of biological entities: an introduction to the ontology of physics for biology.

    PubMed

    Cook, Daniel L; Bookstein, Fred L; Gennari, John H

    2011-01-01

    As biomedical investigators strive to integrate data and analyses across spatiotemporal scales and biomedical domains, they have recognized the benefits of formalizing languages and terminologies via computational ontologies. Although ontologies for biological entities-molecules, cells, organs-are well-established, there are no principled ontologies of physical properties-energies, volumes, flow rates-of those entities. In this paper, we introduce the Ontology of Physics for Biology (OPB), a reference ontology of classical physics designed for annotating biophysical content of growing repositories of biomedical datasets and analytical models. The OPB's semantic framework, traceable to James Clerk Maxwell, encompasses modern theories of system dynamics and thermodynamics, and is implemented as a computational ontology that references available upper ontologies. In this paper we focus on the OPB classes that are designed for annotating physical properties encoded in biomedical datasets and computational models, and we discuss how the OPB framework will facilitate biomedical knowledge integration. © 2011 Cook et al.

  20. a Web-Based Interactive Platform for Co-Clustering Spatio-Temporal Data

    NASA Astrophysics Data System (ADS)

    Wu, X.; Poorthuis, A.; Zurita-Milla, R.; Kraak, M.-J.

    2017-09-01

    Since current studies on clustering analysis mainly focus on exploring spatial or temporal patterns separately, a co-clustering algorithm is utilized in this study to enable the concurrent analysis of spatio-temporal patterns. To allow users to adopt and adapt the algorithm for their own analysis, it is integrated within the server side of an interactive web-based platform. The client side of the platform, running within any modern browser, is a graphical user interface (GUI) with multiple linked visualizations that facilitates the understanding, exploration and interpretation of the raw dataset and co-clustering results. Users can also upload their own datasets and adjust clustering parameters within the platform. To illustrate the use of this platform, an annual temperature dataset from 28 weather stations over 20 years in the Netherlands is used. After the dataset is loaded, it is visualized in a set of linked visualizations: a geographical map, a timeline and a heatmap. This aids the user in understanding the nature of their dataset and the appropriate selection of co-clustering parameters. Once the dataset is processed by the co-clustering algorithm, the results are visualized in small multiples, a heatmap, and a timeline to provide multiple views for better understanding and further interpretation. Since the visualization and analysis are integrated in a seamless platform, the user can explore different sets of co-clustering parameters and instantly view the results in order to do iterative, exploratory data analysis. As such, this interactive web-based platform allows users to analyze spatio-temporal data using the co-clustering method and helps them understand the results using multiple linked visualizations.
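
    The co-clustering idea, simultaneously grouping rows (e.g. stations) and columns (e.g. time steps) so that each block of the data matrix is homogeneous, can be sketched with a minimal alternating loop. This is an illustrative stand-in, not the algorithm used by the platform; the prototype-based initialization is an assumption for determinism.

```python
import numpy as np

def co_cluster(X, n_row, n_col, n_iter=10):
    """Minimal alternating co-clustering: assign each row and each column
    to a cluster so that every cell of X is approximated by its block
    mean, then iterate row and column reassignment until stable."""
    # seed clusters using the first n_row rows / n_col columns as prototypes
    rows = np.array([np.argmin([((X[i] - X[r]) ** 2).sum()
                                for r in range(n_row)]) for i in range(X.shape[0])])
    cols = np.array([np.argmin([((X[:, j] - X[:, c]) ** 2).sum()
                                for c in range(n_col)]) for j in range(X.shape[1])])
    for _ in range(n_iter):
        # block means for the current assignment
        M = np.zeros((n_row, n_col))
        for r in range(n_row):
            for c in range(n_col):
                block = X[np.ix_(rows == r, cols == c)]
                M[r, c] = block.mean() if block.size else 0.0
        # reassign each row / column to the cluster whose block-mean
        # profile it matches best (squared-error criterion)
        rows = np.array([np.argmin([((X[i] - M[r, cols]) ** 2).sum()
                                    for r in range(n_row)]) for i in range(X.shape[0])])
        cols = np.array([np.argmin([((X[:, j] - M[rows, c]) ** 2).sum()
                                    for c in range(n_col)]) for j in range(X.shape[1])])
    return rows, cols
```

    Applied to a station-by-time temperature matrix, the returned row and column labels jointly identify groups of stations that behave similarly over groups of time steps, which is exactly the pattern the platform's small multiples and heatmap views display.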

  1. The MISTRALS programme data portal

    NASA Astrophysics Data System (ADS)

    Fleury, Laurence; Brissebrat, Guillaume; Belmahfoud, Nizar; Boichard, Jean-Luc; Brosolo, Laetitia; Cloché, Sophie; Descloitres, Jacques; Ferré, Hélène; Focsa, Loredana; Labatut, Laurent; Mastrorillo, Laurence; Mière, Arnaud; Petit de la Villéon, Loïc; Ramage, Karim; Schmechtig, Catherine

    2014-05-01

    Mediterranean Integrated STudies at Regional And Local Scales (MISTRALS) is a decennial programme of systematic observations and research dedicated to understanding environmental processes in the Mediterranean Basin and their evolution under global change. It is composed of eight multidisciplinary projects that cover all the components of the Earth system (atmosphere, ocean, continental surfaces, lithosphere...) and their interactions, many disciplines (physics, chemistry, marine biogeochemistry, biology, geology, sociology...) and different time scales. For example, the Hydrological cycle in the Mediterranean eXperiment (HyMeX) aims at improving the predictability of rainfall extreme events and assessing the social and economic vulnerability to extreme events and adaptation capacity, while the Paleo Mediterranean Experiment (PaleoMeX) is dedicated to the study of the interactions between climate, societies and civilizations of the Mediterranean world during the last 10000 years. Many long-term monitoring research networks are associated with MISTRALS, like the Mediterranean Ocean Observing System on Environment (MOOSE), the Centre d'Observation Régional pour la Surveillance du Climat et de l'environnement Atmosphérique et océanographique en Méditerranée occidentale (CORSICA) and the environmental observations from the Mediterranean Eurocentre for Underwater Sciences and Technologies (MEUST-SE). Therefore, the data generated or used by the different MISTRALS projects are very heterogeneous. They include in situ observations, satellite products, model outputs, qualitative field surveys... Some datasets are automatically produced by operational networks, and others come from research instruments and analysis procedures. They correspond to different time scales (historical time series, observatories, campaigns...) and are managed by different data centres.
They originate from many scientific communities, with varied data sharing cultures and specific expectations, and use different file formats and data processing tools. The MISTRALS data portal - http://mistrals.sedoo.fr/ - has been designed and developed as a unified tool to share scientific data in spite of these many sources of heterogeneity, and to foster collaboration between research communities. The metadata (data descriptions) are standardized and comply with international standards (ISO 19115-19139; INSPIRE European Directive; Global Change Master Directory Thesaurus). A search tool allows users to browse the catalogue by keyword or by multicriteria selection (location, period, physical property...) and to access the data. Data sets managed by different data centres (ICARE, IPSL, SEDOO, CORIOLIS) are available through interoperability protocols (OPeNDAP, xml requests...) or archive synchronisation. At present the MISTRALS data portal provides access to more than 400 datasets and has more than 500 registered users. The number of available datasets is increasing daily, due to the provision of campaign datasets (2012, 2013) by several projects. Every in situ data set is available in its native format, but the most popular data sets have also been homogenized (property names, units, quality flags...) and inserted into a relational database, in order to enable more accurate data selection and download of different datasets in a shared format. Every scientist is invited to make use of the different MISTRALS tools and data. Do not hesitate to browse the catalogue and fill in the online registration form. Feel free to contact mistrals-contact@sedoo.fr for any question.

  2. CMIP: a software package capable of reconstructing genome-wide regulatory networks using gene expression data.

    PubMed

    Zheng, Guangyong; Xu, Yaochen; Zhang, Xiujun; Liu, Zhi-Ping; Wang, Zhuo; Chen, Luonan; Zhu, Xin-Guang

    2016-12-23

    A gene regulatory network (GRN) represents the interactions of genes inside a cell or tissue, in which vertexes and edges stand for genes and their regulatory interactions, respectively. Reconstruction of gene regulatory networks, in particular genome-scale networks, is essential for comparative exploration of different species and mechanistic investigation of biological processes. Currently, most network inference methods are computationally intensive; they are usually effective for small-scale tasks (e.g., networks with a few hundred genes) but have difficulty constructing GRNs at genome scale. Here, we present a software package for gene regulatory network reconstruction at a genomic level, in which gene interaction is measured by conditional mutual information within a parallel computing framework (so the package is named CMIP). The package is a greatly improved implementation of our previous PCA-CMI algorithm. In CMIP, we provide not only an automatic threshold determination method but also an effective parallel computing framework for network inference. Performance tests on benchmark datasets show that the accuracy of CMIP is comparable to most current network inference methods. Moreover, running tests on synthetic datasets demonstrate that CMIP can handle large datasets, especially genome-wide datasets, within an acceptable time period. In addition, successful application on a real genomic dataset confirms the practical applicability of the package. This new software package provides a powerful tool for genomic network reconstruction to the biological community. The software can be accessed at http://www.picb.ac.cn/CMIP/.
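
    The core quantity behind such methods, conditional mutual information, can be sketched in a few lines under a Gaussian assumption, where CMI(X;Y|Z) reduces to a ratio of covariance determinants (the form used by PCA-CMI-style estimators). The synthetic data and variable names below are illustrative only, not CMIP's API.

```python
import math
import random

def cov(cols):
    """Sample covariance matrix of a list of equal-length variables."""
    n = len(cols[0])
    means = [sum(c) / n for c in cols]
    return [[sum((cols[i][t] - means[i]) * (cols[j][t] - means[j])
                 for t in range(n)) / (n - 1)
             for j in range(len(cols))] for i in range(len(cols))]

def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    d, k = 1.0, len(a)
    for i in range(k):
        p = max(range(i, k), key=lambda r: abs(a[r][i]))
        if abs(a[p][i]) < 1e-12:
            return 0.0
        if p != i:
            a[i], a[p] = a[p], a[i]
            d = -d
        d *= a[i][i]
        for r in range(i + 1, k):
            f = a[r][i] / a[i][i]
            for j in range(i, k):
                a[r][j] -= f * a[i][j]
    return d

def gaussian_cmi(x, y, z):
    """CMI(X;Y|Z) = 0.5 * ln(|C(X,Z)||C(Y,Z)| / (|C(Z)||C(X,Y,Z)|))."""
    return 0.5 * math.log(det(cov([x, z])) * det(cov([y, z]))
                          / (det(cov([z])) * det(cov([x, y, z]))))

rng = random.Random(42)
z = [rng.gauss(0, 1) for _ in range(2000)]
x = [v + rng.gauss(0, 1) for v in z]   # linked to y only through z
y = [v + rng.gauss(0, 1) for v in z]
w = [v + rng.gauss(0, 1) for v in y]   # directly regulated by y
indirect, direct = gaussian_cmi(x, y, z), gaussian_cmi(w, y, z)
```

    The indirect pair scores near zero once the common regulator z is conditioned on, while the direct pair retains a large CMI; thresholding this score is how such methods prune spurious edges.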

  3. A family of interaction-adjusted indices of community similarity.

    PubMed

    Schmidt, Thomas Sebastian Benedikt; Matias Rodrigues, João Frederico; von Mering, Christian

    2017-03-01

    Interactions between taxa are essential drivers of ecological community structure and dynamics, but they are not taken into account by traditional indices of β diversity. In this study, we propose a novel family of indices that quantify community similarity in the context of taxa interaction networks. Using publicly available datasets, we assessed the performance of two specific indices that are Taxa INteraction-Adjusted (TINA, based on taxa co-occurrence networks), and Phylogenetic INteraction-Adjusted (PINA, based on phylogenetic similarities). TINA and PINA outperformed traditional indices when partitioning human-associated microbial communities according to habitat, even for extremely downsampled datasets, and when organising ocean micro-eukaryotic plankton diversity according to geographical and physicochemical gradients. We argue that interaction-adjusted indices capture novel aspects of diversity outside the scope of traditional approaches, highlighting the biological significance of ecological association networks in the interpretation of community similarity.
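
    A simplified reading of the interaction-adjusted idea can be sketched as follows: similarity between two samples is computed through a taxa-taxa association matrix W rather than by direct overlap of shared taxa. The published TINA/PINA definitions differ in detail, and the abundance vectors and association matrices here are invented for illustration.

```python
import math

def interaction_adjusted_similarity(a, b, W):
    """Cosine-style similarity of abundance vectors a and b, routed
    through a taxa association matrix W (W[i][j] ~ strength of the
    association between taxon i and taxon j)."""
    def quad(u, v):
        return sum(u[i] * W[i][j] * v[j]
                   for i in range(len(u)) for j in range(len(v)))
    return quad(a, b) / math.sqrt(quad(a, a) * quad(b, b))

# Two samples dominated by different taxa:
a, b = [1.0, 0.0], [0.0, 1.0]
identity = [[1.0, 0.0], [0.0, 1.0]]    # no associations: plain cosine
associated = [[1.0, 1.0], [1.0, 1.0]]  # taxa 0 and 1 strongly co-occur
s_plain = interaction_adjusted_similarity(a, b, identity)
s_adjusted = interaction_adjusted_similarity(a, b, associated)
```

    With no associations the two samples look completely dissimilar; once the two taxa are known to co-occur strongly, the samples are scored as ecologically equivalent, which is exactly the behavior traditional β-diversity indices cannot express.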

  4. A family of interaction-adjusted indices of community similarity

    PubMed Central

    Schmidt, Thomas Sebastian Benedikt; Matias Rodrigues, João Frederico; von Mering, Christian

    2017-01-01

    Interactions between taxa are essential drivers of ecological community structure and dynamics, but they are not taken into account by traditional indices of β diversity. In this study, we propose a novel family of indices that quantify community similarity in the context of taxa interaction networks. Using publicly available datasets, we assessed the performance of two specific indices that are Taxa INteraction-Adjusted (TINA, based on taxa co-occurrence networks), and Phylogenetic INteraction-Adjusted (PINA, based on phylogenetic similarities). TINA and PINA outperformed traditional indices when partitioning human-associated microbial communities according to habitat, even for extremely downsampled datasets, and when organising ocean micro-eukaryotic plankton diversity according to geographical and physicochemical gradients. We argue that interaction-adjusted indices capture novel aspects of diversity outside the scope of traditional approaches, highlighting the biological significance of ecological association networks in the interpretation of community similarity. PMID:27935587

  5. PIVOT: platform for interactive analysis and visualization of transcriptomics data.

    PubMed

    Zhu, Qin; Fisher, Stephen A; Dueck, Hannah; Middleton, Sarah; Khaladkar, Mugdha; Kim, Junhyong

    2018-01-05

    Many R packages have been developed for transcriptome analysis, but their use often requires familiarity with R, and integrating the results of different packages requires scripts to wrangle the data types. Furthermore, exploratory data analyses often generate multiple derived datasets, such as data subsets or data transformations, which can be difficult to track. Here we present PIVOT, an R-based platform that wraps open source transcriptome analysis packages with a uniform user interface and graphical data management that allows non-programmers to interactively explore transcriptomics data. PIVOT supports more than 40 popular open source packages for transcriptome analysis and provides an extensive set of tools for statistical data manipulations. A graph-based visual interface is used to represent the links between derived datasets, allowing easy tracking of data versions. PIVOT further supports automatic report generation, publication-quality plots, and program/data state saving, such that all analyses can be saved, shared and reproduced. PIVOT will allow researchers with broad backgrounds to easily access sophisticated transcriptome analysis tools and interactively explore transcriptome datasets.

  6. Integrated dataset of impact of dissolved organic matter on particle behavior and phototoxicity of titanium dioxide nanoparticles

    EPA Pesticide Factsheets

    This dataset was generated to both qualitatively and quantitatively examine the interactions between nano-TiO2 and natural organic matter (NOM). This integrated dataset assembles all data generated in this project through a series of experiments. This dataset is associated with the following publication: Li, S., H. Ma, L. Wallis, M. Etterson, B. Riley, D. Hoff, and S. Diamond. Impact of natural organic matter on particle behavior and phototoxicity of titanium dioxide nanoparticles. SCIENCE OF THE TOTAL ENVIRONMENT. Elsevier BV, AMSTERDAM, NETHERLANDS, 542: 324-333, (2016).

  7. A unified high-resolution wind and solar dataset from a rapidly updating numerical weather prediction model

    DOE PAGES

    James, Eric P.; Benjamin, Stanley G.; Marquis, Melinda

    2016-10-28

    A new gridded dataset for wind and solar resource estimation over the contiguous United States has been derived from hourly updated 1-h forecasts from the National Oceanic and Atmospheric Administration High-Resolution Rapid Refresh (HRRR) 3-km model composited over a three-year period (approximately 22 000 forecast model runs). The unique dataset features hourly data assimilation, and provides physically consistent wind and solar estimates for the renewable energy industry. The wind resource dataset shows strong similarity to that previously provided by a Department of Energy-funded study, and it includes estimates in southern Canada and northern Mexico. The solar resource dataset represents an initial step towards application-specific fields such as global horizontal and direct normal irradiance. This combined dataset will continue to be augmented with new forecast data from the advanced HRRR atmospheric/land-surface model.

  8. Theory of impossible worlds: Toward a physics of information.

    PubMed

    Buscema, Paolo Massimo; Sacco, Pier Luigi; Della Torre, Francesca; Massini, Giulia; Breda, Marco; Ferilli, Guido

    2018-05-01

    In this paper, we introduce an innovative approach to the fusion between datasets in terms of attributes and observations, even when they are not related at all. With our technique, starting from datasets representing independent worlds, it is possible to analyze a single global dataset, and transferring each dataset onto the others is always possible. This procedure allows a deeper perspective in the study of a problem, by offering the chance of looking into it from other, independent points of view. Even unrelated datasets create a metaphoric representation of the problem, useful in terms of speed of convergence and predictive results, preserving the fundamental relationships in the data. In order to extract such knowledge, we propose a new learning rule named double backpropagation, by which an auto-encoder concurrently codifies all the different worlds. We test our methodology on different datasets and different issues, to underline the power and flexibility of the Theory of Impossible Worlds.

  9. Theory of impossible worlds: Toward a physics of information

    NASA Astrophysics Data System (ADS)

    Buscema, Paolo Massimo; Sacco, Pier Luigi; Della Torre, Francesca; Massini, Giulia; Breda, Marco; Ferilli, Guido

    2018-05-01

    In this paper, we introduce an innovative approach to the fusion between datasets in terms of attributes and observations, even when they are not related at all. With our technique, starting from datasets representing independent worlds, it is possible to analyze a single global dataset, and transferring each dataset onto the others is always possible. This procedure allows a deeper perspective in the study of a problem, by offering the chance of looking into it from other, independent points of view. Even unrelated datasets create a metaphoric representation of the problem, useful in terms of speed of convergence and predictive results, preserving the fundamental relationships in the data. In order to extract such knowledge, we propose a new learning rule named double backpropagation, by which an auto-encoder concurrently codifies all the different worlds. We test our methodology on different datasets and different issues, to underline the power and flexibility of the Theory of Impossible Worlds.

  10. The global coastline dataset: the observed relation between erosion and sea-level rise

    NASA Astrophysics Data System (ADS)

    Donchyts, G.; Baart, F.; Luijendijk, A.; Hagenaars, G.

    2017-12-01

    Erosion of sandy coasts is considered one of the key risks of sea-level rise. Because sandy coastlines of the world are often highly populated, erosive coastline trends result in risk to populations and infrastructure. Most of our understanding of the relation between sea-level rise and coastal erosion is based on local or regional observations and generalizations of numerical and physical experiments. Until recently there was no reliable global scale assessment of the location of sandy coasts and their rate of erosion and accretion. Here we present the global coastline dataset that covers erosion indicators on a local scale with global coverage. The dataset uses our global coastline transects grid defined with an alongshore spacing of 250 m and a cross shore length extending 1 km seaward and 1 km landward. This grid matches up with pre-existing local grids where available. We present the latest results on validation of coastal-erosion trends (based on optical satellites) and classification of sandy versus non-sandy coasts. We show the relation between sea-level rise (based on both tide gauges and multi-mission satellite altimetry) and observed erosion trends over the last decades, taking into account broken-coastline trends (for example due to nourishments). An interactive web application presents the publicly-accessible results using a backend based on Google Earth Engine. It allows both researchers and stakeholders to use objective estimates of coastline trends, particularly when authoritative sources are not available.

  11. Association between volume and momentum of online searches and real-world collective unrest

    NASA Astrophysics Data System (ADS)

    Qi, Hong; Manrique, Pedro; Johnson, Daniela; Restrepo, Elvira; Johnson, Neil F.

    A fundamental idea from physics is that macroscopic transitions can occur as a result of an escalation in the correlated activity of a many-body system's constituent particles. Here we apply this idea in an interdisciplinary setting, whereby the particles are individuals, their correlated activity involves online search activity surrounding the topics of social unrest, and the macroscopic phenomenon being measured are real-world protests. Our empirical study covers countries in Latin America during 2011-2014 using datasets assembled from multiple sources by subject matter experts. We find specifically that the volume and momentum of searches on Google Trends surrounding mass protest language, can detect - and may even pre-empt - the macroscopic on-street activity. Not only can this simple open-source solution prove an invaluable aid for monitoring civil order, our study serves to strengthen the increasing literature in the physics community aimed at understanding the collective dynamics of interacting populations of living objects across the life sciences.

  12. Multivariate spatiotemporal visualizations for mobile devices in Flyover Country

    NASA Astrophysics Data System (ADS)

    Loeffler, S.; Thorn, R.; Myrbo, A.; Roth, R.; Goring, S. J.; Williams, J.

    2017-12-01

    Visualizing and interacting with complex multivariate and spatiotemporal datasets on mobile devices is challenging due to their smaller screens, reduced processing power, and limited data connectivity. Pollen data require visualizing pollen assemblages spatially, temporally, and across multiple taxa to understand plant community dynamics through time. Drawing from cartography, information visualization, and paleoecology, we have created new mobile-first visualization techniques that represent multiple taxa across many sites and enable user interaction. Using pollen datasets from the Neotoma Paleoecology Database as a case study, the visualization techniques allow ecological patterns and trends to be quickly understood on a mobile device compared to traditional pollen diagrams and maps. This flexible visualization system can be used for datasets beyond pollen, with the only requirements being point-based localities and multiple variables changing through time or depth.

  13. Physical punishment and childhood aggression: the role of gender and gene-environment interplay.

    PubMed

    Boutwell, Brian B; Franklin, Cortney A; Barnes, J C; Beaver, Kevin M

    2011-01-01

    A large body of research has linked spanking with a range of adverse outcomes in children, including aggression, psychopathology, and criminal involvement. Despite evidence concerning the association of spanking with antisocial behavior, not all children who are spanked develop antisocial traits. Given the heterogeneous effects of spanking on behavior, it is possible that a third variable may condition the influence of corporal punishment on child development. We test this possibility using data drawn from a nationally representative dataset of twin siblings. Our findings suggest that genetic risk factors condition the effects of spanking on antisocial behavior. Moreover, our results provide evidence that the interaction between genetic risk factors and corporal punishment may be particularly salient for males. © 2011 Wiley Periodicals, Inc.

  14. The FLIGHT Drosophila RNAi database

    PubMed Central

    Bursteinas, Borisas; Jain, Ekta; Gao, Qiong; Baum, Buzz; Zvelebil, Marketa

    2010-01-01

    FLIGHT (http://flight.icr.ac.uk/) is an online resource compiling data from high-throughput Drosophila in vivo and in vitro RNAi screens. FLIGHT includes details of RNAi reagents and their predicted off-target effects, alongside RNAi screen hits, scores and phenotypes, including images from high-content screens. The latest release of FLIGHT is designed to enable users to upload, analyze, integrate and share their own RNAi screens. Users can perform multiple normalizations, view quality control plots, detect and assign screen hits and compare hits from multiple screens using a variety of methods including hierarchical clustering. FLIGHT integrates RNAi screen data with microarray gene expression as well as genomic annotations and genetic/physical interaction datasets to provide a single interface for RNAi screen analysis and data mining in Drosophila. PMID:20855970

  15. Artificial intelligence support for scientific model-building

    NASA Technical Reports Server (NTRS)

    Keller, Richard M.

    1992-01-01

    Scientific model-building can be a time-intensive and painstaking process, often involving the development of large and complex computer programs. Despite the effort involved, scientific models cannot easily be distributed and shared with other scientists. In general, implemented scientific models are complex, idiosyncratic, and difficult for anyone but the original scientific development team to understand. We believe that artificial intelligence techniques can facilitate both the model-building and model-sharing process. In this paper, we overview our effort to build a scientific modeling software tool that aids the scientist in developing and using models. This tool includes an interactive intelligent graphical interface, a high-level domain specific modeling language, a library of physics equations and experimental datasets, and a suite of data display facilities.

  16. A Bayesian network approach for modeling local failure in lung cancer

    NASA Astrophysics Data System (ADS)

    Oh, Jung Hun; Craft, Jeffrey; Lozi, Rawan Al; Vaidya, Manushka; Meng, Yifan; Deasy, Joseph O.; Bradley, Jeffrey D.; El Naqa, Issam

    2011-03-01

    Locally advanced non-small cell lung cancer (NSCLC) patients suffer from a high local failure rate following radiotherapy. Despite many efforts to develop new dose-volume models for early detection of tumor local failure, no significant improvement has been reported when such models are applied prospectively. Based on recent studies of the role of biomarker proteins in hypoxia and inflammation in predicting tumor response to radiotherapy, we hypothesize that combining physical and biological factors within a suitable framework could improve the overall prediction. To test this hypothesis, we propose a graphical Bayesian network framework for predicting local failure in lung cancer. The proposed approach was tested using two different datasets of locally advanced NSCLC patients treated with radiotherapy. The first dataset was collected retrospectively and comprises clinical and dosimetric variables only. The second dataset was collected prospectively; in addition to clinical and dosimetric information, blood was drawn from the patients at various time points to extract candidate biomarkers as well. Our preliminary results show that the proposed method can be used as an efficient method to develop predictive models of local failure in these patients and to interpret relationships among the different variables in the models. We also demonstrate the potential use of heterogeneous physical and biological variables to improve the model prediction. With the first dataset, we achieved better performance compared with competing Bayesian-based classifiers. With the second dataset, the combined model had a slightly higher performance compared to individual physical and biological models, with the biological variables making the largest contribution. Our preliminary results highlight the potential of the proposed integrated approach for predicting post-radiotherapy local failure in NSCLC patients.
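
    The graphical-model idea can be sketched with a toy discrete Bayesian network: a dosimetric variable and a biomarker each influence local failure, and inference marginalizes over whatever is unobserved. All conditional probability table numbers and variable names below are invented for illustration; they are not taken from the study's datasets.

```python
# Hypothetical conditional probability tables (all numbers invented).
P_DOSE = {0: 0.5, 1: 0.5}      # 0 = inadequate, 1 = adequate dose coverage
P_MARKER = {0: 0.7, 1: 0.3}    # 0 = biomarker normal, 1 = elevated
P_FAIL = {(0, 0): 0.40, (0, 1): 0.60,   # P(failure | dose, marker)
          (1, 0): 0.15, (1, 1): 0.35}

def p_failure(dose=None, marker=None):
    """P(local failure | evidence) by enumeration over hidden parents."""
    num = den = 0.0
    for d in (0, 1):
        if dose is not None and d != dose:
            continue
        for m in (0, 1):
            if marker is not None and m != marker:
                continue
            w = P_DOSE[d] * P_MARKER[m]   # prior weight of this world
            den += w
            num += w * P_FAIL[(d, m)]
    return num / den
```

    Queries with partial evidence (only the dose known, only the biomarker known, or neither) all run through the same enumeration, which is the practical appeal of the Bayesian network formulation for mixed physical/biological predictors.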

  17. A novel feature extraction scheme with ensemble coding for protein-protein interaction prediction.

    PubMed

    Du, Xiuquan; Cheng, Jiaxing; Zheng, Tingting; Duan, Zheng; Qian, Fulan

    2014-07-18

    Protein-protein interactions (PPIs) play key roles in most cellular processes, such as cell metabolism, immune response, endocrine function, DNA replication, and transcription regulation. PPI prediction is one of the most challenging problems in functional genomics. Although PPI data have been increasing because of the development of high-throughput technologies and computational methods, many problems are still far from being solved. In this study, a novel predictor was designed by using the Random Forest (RF) algorithm with the ensemble coding (EC) method. To reduce computational time, a feature selection method (DX) was adopted to rank the features and search for the optimal feature combination. The DXEC method integrates many features and physicochemical/biochemical properties to predict PPIs. On the Gold Yeast dataset, the DXEC method achieves 67.2% overall precision, 80.74% recall, and 70.67% accuracy. On the Silver Yeast dataset, the DXEC method achieves 76.93% precision, 77.98% recall, and 77.27% accuracy. On the human dataset, the prediction accuracy reaches 80% for the DXEC-RF method. We extended the experiment to bigger and more realistic datasets, maintaining 50% recall on the Yeast All dataset and 80% recall on the Human All dataset. These results show that the DXEC method is suitable for performing PPI prediction. The prediction service of the DXEC-RF classifier is available at http://ailab.ahu.edu.cn:8087/DXECPPI/index.jsp.
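
    The feature-extraction side of such predictors can be sketched simply: each protein sequence is turned into a fixed-length numeric vector (here, plain amino-acid composition), and a candidate pair is encoded by concatenation. The actual DXEC scheme is richer, combining ensemble coding over physicochemical property groups, so this shows only the simplest component, and the sequences are made up.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq):
    """Fraction of each of the 20 standard amino acids in the sequence."""
    n = len(seq)
    return [seq.count(aa) / n for aa in AMINO_ACIDS]

def pair_features(seq_a, seq_b):
    """Fixed-length encoding of a candidate interacting protein pair."""
    return composition(seq_a) + composition(seq_b)

# Two short hypothetical sequences encoded as one 40-dimensional vector:
features = pair_features("MKTAYIAKQR", "GAVLIMCFYW")
```

    Vectors like this one are what a classifier such as a Random Forest consumes; because every pair maps to the same 40 dimensions regardless of sequence length, training and prediction stay uniform across the dataset.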

  18. Understanding metropolitan patterns of daily encounters.

    PubMed

    Sun, Lijun; Axhausen, Kay W; Lee, Der-Horng; Huang, Xianfeng

    2013-08-20

    Understanding of the mechanisms driving our daily face-to-face encounters is still limited; the field lacks large-scale datasets describing both individual behaviors and their collective interactions. However, here, with the help of travel smart card data, we uncover such encounter mechanisms and structures by constructing a time-resolved in-vehicle social encounter network on public buses in a city (about 5 million residents). Using a population scale dataset, we find physical encounters display reproducible temporal patterns, indicating that repeated encounters are regular and identical. On an individual scale, we find that collective regularities dominate distinct encounters' bounded nature. An individual's encounter capability is rooted in his/her daily behavioral regularity, explaining the emergence of "familiar strangers" in daily life. Strikingly, we find individuals with repeated encounters are not grouped into small communities, but become strongly connected over time, resulting in a large, but imperceptible, small-world contact network or "structure of co-presence" across the whole metropolitan area. Revealing the encounter pattern and identifying this large-scale contact network are crucial to understanding the dynamics in patterns of social acquaintances, collective human behaviors, and, particularly, disclosing the impact of human behavior on various diffusion/spreading processes.

  19. Quantum-assisted learning of graphical models with arbitrary pairwise connectivity

    NASA Astrophysics Data System (ADS)

    Realpe-Gómez, John; Benedetti, Marcello; Biswas, Rupak; Perdomo-Ortiz, Alejandro

    Mainstream machine learning techniques rely heavily on sampling from generally intractable probability distributions. There is increasing interest in the potential advantages of using quantum computing technologies as sampling engines to speed up these tasks. However, some pressing challenges in state-of-the-art quantum annealers have to be overcome before we can assess their actual performance. The sparse connectivity, resulting from the local interaction between quantum bits in physical hardware implementations, is considered the most severe limitation on constructing powerful machine learning models. Here we show how to surpass this `curse of limited connectivity' bottleneck and illustrate our findings by training probabilistic generative models with arbitrary pairwise connectivity on a real dataset of handwritten digits and two synthetic datasets in experiments with up to 940 quantum bits. Our model can be trained in quantum hardware without full knowledge of the effective parameters specifying the corresponding Boltzmann-like distribution. Therefore, the need to infer the effective temperature at each iteration is avoided, speeding up learning, and the effect of noise in the control parameters is mitigated, improving accuracy. This work was supported in part by NASA, AFRL, ODNI, and IARPA.

  20. Research Applications of Data from Arctic Ocean Drifting Platforms: The Arctic Buoy Program and the Environmental Working Group CD's.

    NASA Astrophysics Data System (ADS)

    Moritz, R. E.; Rigor, I.

    2006-12-01

    The Arctic Buoy Program was initiated in 1978 to measure surface air pressure, surface temperature and sea-ice motion in the Arctic Ocean, on the space and time scales of synoptic weather systems, and to make the data available for research, forecasting and operations. The program, subsequently renamed the International Arctic Buoy Programme (IABP), has endured and expanded over the past 28 years. A hallmark of the IABP is the production, dissemination and archival of research-quality datasets and analyses. These datasets have been used by the authors of over 500 papers on meteorology, sea-ice physics, oceanography, air-sea interactions, climate, remote sensing and other topics. Elements of the IABP are described briefly, including measurements, analysis, data dissemination and data archival. Selected highlights of the research applications are reviewed, including ice dynamics, ocean-ice modeling, low-frequency variability of Arctic air-sea-ice circulation, and recent changes in the age, thickness and extent of Arctic sea ice. The extended temporal coverage of the data disseminated on the Environmental Working Group CD's is important for interpreting results in the context of climate.

  1. Dynamic patterns and ecological impacts of declining ocean pH in a high-resolution multi-year dataset.

    PubMed

    Wootton, J Timothy; Pfister, Catherine A; Forester, James D

    2008-12-02

    Increasing global concentrations of atmospheric CO2 are predicted to decrease ocean pH, with potentially severe impacts on marine food webs, but empirical data documenting ocean pH over time are limited. In a high-resolution dataset spanning 8 years, pH at a north-temperate coastal site declined with increasing atmospheric CO2 levels and varied substantially in response to biological processes and physical conditions that fluctuate over multiple time scales. Applying a method to link environmental change to species dynamics via multispecies Markov chain models reveals strong links between in situ benthic species dynamics and variation in ocean pH, with calcareous species generally performing more poorly than noncalcareous species in years with low pH. The models project the long-term consequences of these dynamic changes, which predict substantial shifts in the species dominating the habitat as a consequence of both direct effects of reduced calcification and indirect effects arising from the web of species interactions. Our results indicate that pH decline is proceeding at a more rapid rate than previously predicted in some areas, and that this decline has ecological consequences for nearshore benthic ecosystems.
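
    The Markov chain projection idea, estimating transition probabilities between observed community states and then projecting the long-run composition, can be sketched as follows. The two-state example (calcareous vs. noncalcareous dominance) and all transition numbers are illustrative assumptions, not the study's fitted multispecies model.

```python
def transition_matrix(seq, states):
    """Maximum-likelihood transition probabilities from a state sequence."""
    idx = {s: i for i, s in enumerate(states)}
    k = len(states)
    counts = [[0] * k for _ in range(k)]
    for a, b in zip(seq, seq[1:]):
        counts[idx[a]][idx[b]] += 1
    P = []
    for row in counts:
        s = sum(row)
        P.append([c / s for c in row] if s else [1.0 / k] * k)
    return P

def stationary(P, iters=500):
    """Long-run state distribution by repeated application of P."""
    k = len(P)
    pi = [1.0 / k] * k
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(k)) for j in range(k)]
    return pi

# Estimation from a short observed sequence:
P_hat = transition_matrix(["a", "a", "b", "a"], ["a", "b"])

# Projection for a hypothetical two-state chain in which calcareous
# dominance persists 90% of the time and recovers 50% of the time:
P = [[0.9, 0.1],
     [0.5, 0.5]]
pi = stationary(P)
```

    For this chain the stationary distribution is (5/6, 1/6); lowering the persistence probability in low-pH scenarios shifts that long-run balance away from calcareous species, which is the kind of habitat-dominance projection the abstract describes.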

  2. Construction and Analysis of Long-Term Surface Temperature Dataset in Fujian Province

    NASA Astrophysics Data System (ADS)

    Li, W. E.; Wang, X. Q.; Su, H.

    2017-09-01

    Land surface temperature (LST) is a key parameter of land surface physical processes on global and regional scales, linking the heat fluxes and interactions between the ground and atmosphere. Based on MODIS 8-day LST products (MOD11A2) from the split-window algorithms, we constructed the monthly and annual LST dataset of Fujian Province from 2000 to 2015. We then analyzed the monthly and yearly time series LST data and further investigated the LST distribution and its evolution features. The average LST of Fujian Province reached its highest in July and its lowest in January. The monthly and annual LST time series present significant periodic features (annual and interannual) from 2000 to 2015. The spatial distribution showed that the LST in the north and west of Fujian Province was lower than in the south and east. With the rapid development and urbanization of the coastal area in Fujian Province, the LST in coastal urban regions was significantly higher than in mountainous rural regions. The LST distribution might be affected by climate, topography and land cover types. The spatio-temporal distribution characteristics of LST could provide good references for agricultural layout and environment monitoring in Fujian Province.

  3. Understanding metropolitan patterns of daily encounters

    PubMed Central

    Sun, Lijun; Axhausen, Kay W.; Lee, Der-Horng; Huang, Xianfeng

    2013-01-01

    Understanding of the mechanisms driving our daily face-to-face encounters is still limited; the field lacks large-scale datasets describing both individual behaviors and their collective interactions. However, here, with the help of travel smart card data, we uncover such encounter mechanisms and structures by constructing a time-resolved in-vehicle social encounter network on public buses in a city (about 5 million residents). Using a population scale dataset, we find physical encounters display reproducible temporal patterns, indicating that repeated encounters are regular and identical. On an individual scale, we find that collective regularities dominate distinct encounters’ bounded nature. An individual’s encounter capability is rooted in his/her daily behavioral regularity, explaining the emergence of “familiar strangers” in daily life. Strikingly, we find individuals with repeated encounters are not grouped into small communities, but become strongly connected over time, resulting in a large, but imperceptible, small-world contact network or “structure of co-presence” across the whole metropolitan area. Revealing the encounter pattern and identifying this large-scale contact network are crucial to understanding the dynamics in patterns of social acquaintances, collective human behaviors, and—particularly—disclosing the impact of human behavior on various diffusion/spreading processes. PMID:23918373

  4. A Novel Algorithm for Detecting Protein Complexes with the Breadth First Search

    PubMed Central

    Tang, Xiwei; Wang, Jianxin; Li, Min; He, Yiming; Pan, Yi

    2014-01-01

    Most biological processes are carried out by protein complexes. A substantial number of false positives in protein-protein interaction (PPI) data can compromise the utility of the datasets for complex reconstruction. To reduce the impact of such discrepancies, a number of data integration and affinity scoring schemes have been devised; these methods encode the reliability (confidence) of physical interactions between pairs of proteins. The challenge then is to identify novel and meaningful protein complexes from the weighted PPI network. To address this problem, a novel protein complex mining algorithm, ClusterBFS (Cluster with Breadth-First Search), is proposed. Based on weighted density, ClusterBFS detects protein complexes in the weighted network by breadth-first search, starting from a given seed protein. The experimental results show that ClusterBFS performs significantly better than other computational approaches in identifying protein complexes. PMID:24818139
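    The seed-growth procedure can be sketched as follows. This is an illustrative re-implementation of the weighted-density BFS idea, not the authors' code; the toy network and threshold are invented for the example.

```python
from collections import deque

def cluster_bfs(graph, seed, min_density=0.5):
    """Grow a candidate complex from `seed` by breadth-first search,
    admitting a neighbor only if the cluster's weighted density
    (2 * sum of internal edge weights / possible edge count) stays at or
    above `min_density`.  `graph[u][v]` is the confidence weight of (u, v)."""
    cluster = {seed}
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, {}):
            if v in cluster:
                continue
            trial = cluster | {v}
            internal = sum(graph.get(a, {}).get(b, 0.0)
                           for a in trial for b in trial if a < b)
            n = len(trial)
            density = 2.0 * internal / (n * (n - 1))
            if density >= min_density:
                cluster.add(v)
                queue.append(v)
    return cluster

# Toy weighted PPI network: a dense triangle {A, B, C} plus a weakly
# attached protein D that the density threshold keeps out.
g = {
    "A": {"B": 0.9, "C": 0.8},
    "B": {"A": 0.9, "C": 0.7, "D": 0.2},
    "C": {"A": 0.8, "B": 0.7},
    "D": {"B": 0.2},
}
```

    Starting from seed "A" with threshold 0.5, the cluster absorbs B and C but rejects D, whose inclusion would drop the weighted density below the cutoff.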

  5. Implementing DOIs for Oceanographic Satellite Data at PO.DAAC

    NASA Astrophysics Data System (ADS)

    Hausman, J.; Tauer, E.; Chung, N.; Chen, C.; Moroni, D. F.

    2013-12-01

    The Physical Oceanographic Distributed Active Archive Center (PO.DAAC) is NASA's archive for physical oceanographic satellite data. It distributes over 500 datasets from gravity, ocean wind, sea surface topography, sea ice, ocean currents, salinity, and sea surface temperature satellite missions. A dataset is a collection of granules/files that share the same mission/project, versioning, processing level, and spatial and temporal characteristics. The large number of datasets is partially due to the number of satellite missions, but mostly because a single satellite mission typically has multiple versions, or even multiple temporal and spatial resolutions, of data. As a result, a user might mistake one dataset for a different dataset from the same satellite mission. Given PO.DAAC's vast variety and volume of data and growing requirements to report dataset usage, it has begun implementing DOIs for the datasets it archives and distributes. However, this was not as simple as registering a name for a DOI and providing a URL. Before implementing DOIs, multiple questions needed to be answered. What are the sponsor and end-user expectations regarding DOIs? At what level does a DOI get assigned (dataset, file/granule)? Do all data get a DOI, or only selected data? How do we create a DOI? How do we create landing pages and manage them? What changes need to be made to the data archive, life cycle policy, and web portal to accommodate DOIs? What if the data also exist at another archive and a DOI already exists? How is a DOI included if the data were obtained via a subsetting tool? How does a researcher or author provide a unique, definitive reference (standard citation) for a given dataset? This presentation will discuss how these questions were answered through changes in policy, process, and system design. Implementing DOIs is not a trivial undertaking, but as DOIs are rapidly becoming the de facto approach, it is worth the effort. Researchers have historically referenced the source satellite and data center (or archive), but scientific writings typically do not provide enough detail to point to a singular, uniquely identifiable dataset. DOIs give researchers the means to be precise in their data citations, providing needed clarity, standardization, and permanence.

  6. Relieving the tension between weak lensing and cosmic microwave background with interacting dark matter and dark energy models

    NASA Astrophysics Data System (ADS)

    An, Rui; Feng, Chang; Wang, Bin

    2018-02-01

    We constrain interacting dark matter and dark energy (IDMDE) models using 450 deg² of cosmic shear data from the Kilo Degree Survey (KiDS) and the angular power spectra from Planck's latest cosmic microwave background measurements. We revisit the discordance problem in the standard Lambda cold dark matter (ΛCDM) model between weak lensing and Planck datasets and extend the discussion by introducing interacting dark sectors. The IDMDE models are found to alleviate the discordance between KiDS and Planck previously inferred in the ΛCDM model, and to be moderately favored by a combination of the two datasets.

  7. Enabling systematic interrogation of protein-protein interactions in live cells with a versatile ultra-high-throughput biosensor platform | Office of Cancer Genomics

    Cancer.gov

    The vast datasets generated by next-generation gene sequencing and expression profiling have transformed biological and translational research. However, technologies to produce large-scale functional genomics datasets, such as high-throughput detection of protein-protein interactions (PPIs), are still in early development. While a number of powerful technologies have been employed to detect PPIs, a single PPI biosensor platform featuring both high sensitivity and robustness in a mammalian cell environment remains to be established.

  8. Prediction of Solvent Physical Properties using the Hierarchical Clustering Method

    EPA Science Inventory

    Recently a QSAR (Quantitative Structure Activity Relationship) method, the hierarchical clustering method, was developed to estimate acute toxicity values for large, diverse datasets. This methodology has now been applied to the estimation of solvent physical properties, including sur...

  9. Species interactions in occurrence data for a community of tick-transmitted pathogens

    PubMed Central

    Estrada-Peña, Agustín; de la Fuente, José

    2016-01-01

    Interactions between tick species, their realized range of hosts, the pathogens they carry and transmit, and the geographic distribution of species in the Western Palearctic were determined based on evidence published between 1970 and 2014. These relationships were linked to remotely sensed features of temperature and vegetation and used to extract the network of interactions among the organisms. The resulting datasets focus on niche overlap among ticks and hosts, species interactions, and the fraction of the environmental niche in which tick-borne pathogens may circulate as a result of interactions and overlapping environmental traits. The datasets provide a valuable resource for researchers interested in tick-borne pathogens, as they reconcile the abiotic and biotic sides of the niche, allowing exploration of the importance of each host species acting as a vertebrate reservoir in the circulation of tick-transmitted pathogens in the environmental niche. PMID:27479213

  10. Genome-wide interaction study of smoking and bladder cancer risk

    PubMed Central

    Figueroa, Jonine D.; Han, Summer S.; Garcia-Closas, Montserrat; Baris, Dalsu; Jacobs, Eric J.; Kogevinas, Manolis; Schwenn, Molly; Malats, Nuria; Johnson, Alison; Purdue, Mark P.; Caporaso, Neil; Landi, Maria Teresa; Prokunina-Olsson, Ludmila; Wang, Zhaoming; Hutchinson, Amy; Burdette, Laurie; Wheeler, William; Vineis, Paolo; Siddiq, Afshan; Cortessis, Victoria K.; Kooperberg, Charles; Cussenot, Olivier; Benhamou, Simone; Prescott, Jennifer; Porru, Stefano; Bueno-de-Mesquita, H.Bas; Trichopoulos, Dimitrios; Ljungberg, Börje; Clavel-Chapelon, Françoise; Weiderpass, Elisabete; Krogh, Vittorio; Dorronsoro, Miren; Travis, Ruth; Tjønneland, Anne; Brenan, Paul; Chang-Claude, Jenny; Riboli, Elio; Conti, David; Gago-Dominguez, Manuela; Stern, Mariana C.; Pike, Malcolm C.; Van Den Berg, David; Yuan, Jian-Min; Hohensee, Chancellor; Rodabough, Rebecca; Cancel-Tassin, Geraldine; Roupret, Morgan; Comperat, Eva; Chen, Constance; De Vivo, Immaculata; Giovannucci, Edward; Hunter, David J.; Kraft, Peter; Lindstrom, Sara; Carta, Angela; Pavanello, Sofia; Arici, Cecilia; Mastrangelo, Giuseppe; Karagas, Margaret R.; Schned, Alan; Armenti, Karla R.; Hosain, G.M.Monawar; Haiman, Chris A.; Fraumeni, Joseph F.; Chanock, Stephen J.; Chatterjee, Nilanjan; Rothman, Nathaniel; Silverman, Debra T.

    2014-01-01

    Bladder cancer is a complex disease with known environmental and genetic risk factors. We performed a genome-wide interaction study (GWAS) of smoking and bladder cancer risk based on primary scan data from 3002 cases and 4411 controls from the National Cancer Institute Bladder Cancer GWAS. Alternative methods were used to evaluate both additive and multiplicative interactions between individual single nucleotide polymorphisms (SNPs) and smoking exposure. SNPs with interaction P values < 5 × 10⁻⁵ were evaluated further in an independent dataset of 2422 bladder cancer cases and 5751 controls. We identified 10 SNPs that showed association in a consistent manner with the initial dataset and in the combined dataset, providing evidence of interaction with tobacco use. Further, two of these novel SNPs showed strong evidence of association with bladder cancer in tobacco use subgroups that approached genome-wide significance. Specifically, rs1711973 (FOXF2) on 6p25.3 was a susceptibility SNP for never smokers [combined odds ratio (OR) = 1.34, 95% confidence interval (CI) = 1.20–1.50, P value = 5.18 × 10⁻⁷]; and rs12216499 (RSPH3-TAGAP-EZR) on 6q25.3 was a susceptibility SNP for ever smokers (combined OR = 0.75, 95% CI = 0.67–0.84, P value = 6.35 × 10⁻⁷). In our analysis of smoking and bladder cancer, the tests for multiplicative interaction seemed to more commonly identify susceptibility loci with associations in never smokers, whereas the additive interaction analysis identified more loci with associations among smokers—including the known smoking and NAT2 acetylation interaction. Our findings provide additional evidence of gene–environment interactions for tobacco and bladder cancer. PMID:24662972

  11. A gridded hourly rainfall dataset for the UK applied to a national physically-based modelling system

    NASA Astrophysics Data System (ADS)

    Lewis, Elizabeth; Blenkinsop, Stephen; Quinn, Niall; Freer, Jim; Coxon, Gemma; Woods, Ross; Bates, Paul; Fowler, Hayley

    2016-04-01

    An hourly gridded rainfall product has great potential for use in many hydrological applications that require high temporal resolution meteorological data. One important example of this is flood risk management, with flooding in the UK highly dependent on sub-daily rainfall intensities amongst other factors. Knowledge of sub-daily rainfall intensities is therefore critical to designing hydraulic structures or flood defences to appropriate levels of service. Sub-daily rainfall rates are also essential inputs for flood forecasting, allowing for estimates of peak flows and stage for flood warning and response. In addition, an hourly gridded rainfall dataset has significant potential for practical applications such as better representation of extremes and pluvial flash flooding, validation of high resolution climate models and improving the representation of sub-daily rainfall in weather generators. A new 1km gridded hourly rainfall dataset for the UK has been created by disaggregating the daily Gridded Estimates of Areal Rainfall (CEH-GEAR) dataset using comprehensively quality-controlled hourly rain gauge data from over 1300 observation stations across the country. Quality control measures include identification of frequent tips, daily accumulations and dry spells, comparison of daily totals against the CEH-GEAR daily dataset, and nearest neighbour checks. The quality control procedure was validated against historic extreme rainfall events and the UKCP09 5km daily rainfall dataset. General use of the dataset has been demonstrated by testing the sensitivity of a physically-based hydrological modelling system for Great Britain to the distribution and rates of rainfall and potential evapotranspiration. 
Of the sensitivity tests undertaken, the largest improvements in model performance were seen when the hourly gridded rainfall dataset was combined with potential evapotranspiration disaggregated to hourly intervals, with 61% of catchments showing an increase in Nash-Sutcliffe efficiency (NSE) between observed and simulated streamflows as a result of more realistic sub-daily meteorological forcing.

  12. Assessing reliability of protein-protein interactions by integrative analysis of data in model organisms.

    PubMed

    Lin, Xiaotong; Liu, Mei; Chen, Xue-wen

    2009-04-29

    Protein-protein interactions play vital roles in nearly all cellular processes and are involved in the construction of biological pathways such as metabolic and signal transduction pathways. Although large-scale experiments have enabled the discovery of thousands of previously unknown linkages among proteins in many organisms, high-throughput interaction data are often associated with high error rates. Since protein interaction networks have been utilized in numerous biological inferences, these experimental errors inevitably affect the quality of such predictions. Thus, it is essential to assess the quality of protein interaction data. In this paper, a novel Bayesian network-based integrative framework is proposed to assess the reliability of protein-protein interactions. We develop a cross-species in silico model that assigns likelihood scores to individual protein pairs based on information extracted entirely from model organisms. Our approach integrates multiple microarray datasets and novel features derived from gene ontology; furthermore, the confidence scores for cross-species protein mappings are explicitly incorporated into the model. Applying our model to predict protein interactions in the human genome, we achieve 80% sensitivity and 70% specificity. Finally, we assess the overall quality of the experimentally determined yeast protein-protein interaction dataset. We observe that the more high-throughput experiments confirm an interaction, the higher its likelihood score, which supports the effectiveness of our approach. This study demonstrates that model organisms provide important information for protein-protein interaction inference and assessment. The proposed method can assess not only the overall quality of an interaction dataset, but also the quality of individual protein-protein interactions. We expect the method to improve continually as more high-quality interaction data from more model organisms become available; it is readily scalable to genome-wide application.
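    The core of such evidence integration can be conveyed with a naive-Bayes-style score that sums log-likelihood ratios of independent evidence features; the feature names and ratio values below are hypothetical stand-ins for values that would be learned from gold-standard positive and negative interaction sets, and this is a simplification of the paper's full Bayesian network.

```python
def interaction_score(features, llr_table):
    """Sum the log-likelihood ratios of the observed evidence features
    for a candidate protein pair (naive-Bayes-style integration).
    A positive total favors a true interaction; a negative total
    favors a false positive."""
    return sum(llr_table[f] for f in features)

# Hypothetical log-likelihood ratios for individual evidence types.
llr = {
    "coexpression": 1.2,     # correlated expression in microarray data
    "shared_go": 0.9,        # coherent gene ontology annotation
    "conserved": 1.5,        # interolog in a model organism
    "no_coexpression": -0.8, # evidence against interaction
}

score_strong = interaction_score(["coexpression", "shared_go", "conserved"], llr)
score_weak = interaction_score(["no_coexpression"], llr)
```

    Pairs supported by several independent lines of evidence accumulate a high score, mirroring the paper's observation that interactions confirmed by more experiments receive higher likelihood scores.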

  13. Hydrodynamic variability of the Cretan Sea derived from Argo float profiles and multi-parametric buoy measurements during 2010-2012

    NASA Astrophysics Data System (ADS)

    Kassis, Dimitris; Korres, Gerasimos; Petihakis, George; Perivoliotis, Leonidas

    2015-12-01

    In this work, we examine the complex hydrology of the Cretan Sea, an important area that affects the dynamics of the Eastern Mediterranean basin. We use T/S profile data derived from the first Argo float deployed in the area, during June 2010, within the framework of the Greek Argo program. Temperature and salinity profiles were measured over a 2-year period, analyzed, and combined with time series data recorded from the POSEIDON E1-M3A multi-parametric instrumentation platform operating in the area since 2007. The acquired datasets have been enriched with available CTD profiles taken at the mooring site during cruise maintenance surveys. The combined research activities resulted in a large dataset of physical properties, allowing extended geographical coverage and an in-depth analysis of Cretan Sea dynamics during this 2-year period. Data analysis shows significant variability of water masses of different origin at subsurface and deep layers. This confirms previous findings describing the area as transitional, with water masses of different origin meeting and interacting. Furthermore, additional features of the area are described by combining information from satellite altimetry. In this study, new circulation systems are identified at intermediate and subsurface layers, affecting both the dynamic behavior of the basin's upper thermocline and the spatio-temporal variability of intermediate/deep water masses. We further investigate the physical properties of the water column and suggest an updated mesoscale circulation picture based on the dynamics of the variable hydrological regimes of the Cretan Sea basin.

  14. Parallel Index and Query for Large Scale Data Analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chou, Jerry; Wu, Kesheng; Ruebel, Oliver

    2011-07-18

    Modern scientific datasets present numerous data management and analysis challenges. State-of-the-art index and query technologies are critical for facilitating interactive exploration of large datasets, but numerous challenges remain in designing a system for processing general scientific datasets. The system needs to be able to run on distributed multi-core platforms, efficiently utilize the underlying I/O infrastructure, and scale to massive datasets. We present FastQuery, a novel software framework that addresses these challenges. FastQuery utilizes a state-of-the-art index and query technology (FastBit) and is designed to process massive datasets on modern supercomputing platforms. We apply FastQuery to the processing of a massive 50TB dataset generated by a large-scale accelerator modeling code. We demonstrate the scalability of the tool to 11,520 cores. Motivated by the scientific need to search for interesting particles in this dataset, we use our framework to reduce search time from hours to tens of seconds.
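    The bitmap-indexing idea behind FastBit can be sketched in pure Python: each value bin gets a bitmap (here a set of record ids) marking which records fall in it, so a range query reduces to unioning precomputed bitmaps instead of scanning the raw data. The data, bin edges, and function names are illustrative; the real system uses compressed bitmaps and runs in C++.

```python
def build_bitmap_index(values, bin_edges):
    """Map each bin [bin_edges[i], bin_edges[i+1]) to the set of
    record ids whose value falls in that bin."""
    index = [set() for _ in range(len(bin_edges) - 1)]
    for rid, v in enumerate(values):
        for i in range(len(bin_edges) - 1):
            if bin_edges[i] <= v < bin_edges[i + 1]:
                index[i].add(rid)
                break
    return index

def range_query(index, bin_edges, lo, hi):
    """Return record ids for all bins lying entirely within [lo, hi)."""
    hits = set()
    for i, bucket in enumerate(index):
        if bin_edges[i] >= lo and bin_edges[i + 1] <= hi:
            hits |= bucket
    return hits

# Toy particle energies; a query for [6, 10) touches only two bitmaps.
energies = [0.5, 3.2, 7.9, 2.1, 9.4, 6.6]
edges = [0, 2, 4, 6, 8, 10]
idx = build_bitmap_index(energies, edges)
high_energy = range_query(idx, edges, 6, 10)
```

    Answering the query is independent of the number of records scanned per bin, which is what makes the approach attractive for searches like "interesting particles" over terabyte-scale data.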

  15. Prediction of drug indications based on chemical interactions and chemical similarities.

    PubMed

    Huang, Guohua; Lu, Yin; Lu, Changhong; Zheng, Mingyue; Cai, Yu-Dong

    2015-01-01

    Discovering potential indications of novel or approved drugs is a key step in drug development. Previous computational approaches can be categorized as disease-centric or drug-centric according to the starting point of the problem, and as small-scale or large-scale applications according to the diversity of the datasets. Here, a classifier has been constructed to predict the indications of a drug, using a large drug indication dataset, based on the assumption that interactive/associated drugs, or drugs with similar structures, are more likely to target the same diseases. To examine the classifier, it was run five times on a dataset of 1,573 drugs retrieved from the Comprehensive Medicinal Chemistry database and evaluated by 5-fold cross-validation, yielding five first-order prediction accuracies of approximately 51.48%. Meanwhile, the model yielded a first-order prediction accuracy of 50.00% in an independent test on a dataset of 32 other drugs for which drug repositioning has been confirmed. Interestingly, some clinically repurposed drug indications that were not included in the datasets were successfully identified by our method. These results suggest that our method may become a useful tool for associating novel molecules with new indications, or existing drugs with alternative indications.
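    A minimal sketch of similarity-based indication transfer, assuming set-based structural fingerprints compared with Jaccard (Tanimoto) similarity; the fingerprints, drug names, and indications below are invented for illustration and do not reproduce the paper's classifier.

```python
def jaccard(a, b):
    """Tanimoto/Jaccard similarity of two fingerprint sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def predict_indications(query_fp, known, top_k=1):
    """Rank known drugs by fingerprint similarity to the query and
    return the indications of the top_k most similar drugs."""
    ranked = sorted(known, key=lambda d: jaccard(query_fp, d["fp"]),
                    reverse=True)
    out = []
    for d in ranked[:top_k]:
        out.extend(d["indications"])
    return out

# Hypothetical fingerprints (sets of substructure keys) and indications.
known_drugs = [
    {"name": "drugA", "fp": {1, 2, 3, 4}, "indications": ["hypertension"]},
    {"name": "drugB", "fp": {5, 6, 7},    "indications": ["diabetes"]},
]

guess = predict_indications({1, 2, 3}, known_drugs)
```

    A query molecule sharing most of drugA's substructure keys inherits drugA's indication, which is the "similar structures target the same diseases" assumption in its simplest form.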

  16. Prediction of Drug Indications Based on Chemical Interactions and Chemical Similarities

    PubMed Central

    Huang, Guohua; Lu, Yin; Lu, Changhong; Cai, Yu-Dong

    2015-01-01

    Discovering potential indications of novel or approved drugs is a key step in drug development. Previous computational approaches can be categorized as disease-centric or drug-centric according to the starting point of the problem, and as small-scale or large-scale applications according to the diversity of the datasets. Here, a classifier has been constructed to predict the indications of a drug, using a large drug indication dataset, based on the assumption that interactive/associated drugs, or drugs with similar structures, are more likely to target the same diseases. To examine the classifier, it was run five times on a dataset of 1,573 drugs retrieved from the Comprehensive Medicinal Chemistry database and evaluated by 5-fold cross-validation, yielding five first-order prediction accuracies of approximately 51.48%. Meanwhile, the model yielded a first-order prediction accuracy of 50.00% in an independent test on a dataset of 32 other drugs for which drug repositioning has been confirmed. Interestingly, some clinically repurposed drug indications that were not included in the datasets were successfully identified by our method. These results suggest that our method may become a useful tool for associating novel molecules with new indications, or existing drugs with alternative indications. PMID:25821813

  17. MiSTIC, an integrated platform for the analysis of heterogeneity in large tumour transcriptome datasets

    PubMed Central

    Sargeant, Tobias; Laperrière, David; Ismail, Houssam; Boucher, Geneviève; Rozendaal, Marieke; Lavallée, Vincent-Philippe; Ashton-Beaucage, Dariel; Wilhelm, Brian; Hébert, Josée; Hilton, Douglas J.

    2017-01-01

    Abstract Genome-wide transcriptome profiling has enabled non-supervised classification of tumours, revealing different sub-groups characterized by specific gene expression features. However, the biological significance of these subtypes remains for the most part unclear. We describe herein an interactive platform, Minimum Spanning Trees Inferred Clustering (MiSTIC), that integrates the direct visualization and comparison of the gene correlation structure between datasets, the analysis of the molecular causes underlying co-variations in gene expression in cancer samples, and the clinical annotation of tumour sets defined by the combined expression of selected biomarkers. We have used MiSTIC to highlight the roles of specific transcription factors in breast cancer subtype specification, to compare the aspects of tumour heterogeneity targeted by different prognostic signatures, and to highlight biomarker interactions in AML. A version of MiSTIC preloaded with datasets described herein can be accessed through a public web server (http://mistic.iric.ca); in addition, the MiSTIC software package can be obtained (github.com/iric-soft/MiSTIC) for local use with personalized datasets. PMID:28472340

  18. Integrated genome browser: visual analytics platform for genomics.

    PubMed

    Freese, Nowlan H; Norris, David C; Loraine, Ann E

    2016-07-15

    Genome browsers that support fast navigation through vast datasets and provide interactive visual analytics functions can help scientists achieve deeper insight into biological systems. Toward this end, we developed Integrated Genome Browser (IGB), a highly configurable, interactive and fast open source desktop genome browser. Here we describe multiple updates to IGB, including all-new capabilities to display and interact with data from high-throughput sequencing experiments. To demonstrate, we describe example visualizations and analyses of datasets from RNA-Seq, ChIP-Seq and bisulfite sequencing experiments. Understanding results from genome-scale experiments requires viewing the data in the context of reference genome annotations and other related datasets. To facilitate this, we enhanced IGB's ability to consume data from diverse sources, including Galaxy, Distributed Annotation and IGB-specific Quickload servers. To support future visualization needs as new genome-scale assays enter wide use, we transformed the IGB codebase into a modular, extensible platform for developers to create and deploy all-new visualizations of genomic data. IGB is open source and is freely available from http://bioviz.org/igb. Contact: aloraine@uncc.edu. © The Author 2016. Published by Oxford University Press.

  19. Glycan array data management at Consortium for Functional Glycomics.

    PubMed

    Venkataraman, Maha; Sasisekharan, Ram; Raman, Rahul

    2015-01-01

    Glycomics, the study of structure-function relationships of complex glycans, has reshaped post-genomics biology. Glycans mediate fundamental biological functions via their specific interactions with a variety of proteins. Recognizing the importance of glycomics, large-scale research initiatives such as the Consortium for Functional Glycomics (CFG) were established to address the challenges of the field. Over the past decade, the CFG has generated novel reagents and technologies for glycomics analyses, which in turn have led to the generation of diverse datasets. These datasets have contributed to understanding glycan diversity and structure-function relationships at the molecular (glycan-protein interactions), cellular (gene expression and glycan analysis), and whole-organism (mouse phenotyping) levels. Among these analyses and datasets, screening of glycan-protein interactions on glycan array platforms has gained particular prominence and has contributed to cross-disciplinary recognition of the importance of glycomics in areas such as immunology, infectious diseases, and cancer biomarkers. This manuscript outlines methodologies for capturing data from glycan array experiments and the online tools for accessing and visualizing glycan array data implemented at the CFG.

  20. EasyFRAP-web: a web-based tool for the analysis of fluorescence recovery after photobleaching data.

    PubMed

    Koulouras, Grigorios; Panagopoulos, Andreas; Rapsomaniki, Maria A; Giakoumakis, Nickolaos N; Taraviras, Stavros; Lygerou, Zoi

    2018-06-13

    Understanding protein dynamics is crucial in order to elucidate protein function and interactions. Advances in modern microscopy facilitate the exploration of the mobility of fluorescently tagged proteins within living cells. Fluorescence recovery after photobleaching (FRAP) is an increasingly popular functional live-cell imaging technique which enables the study of the dynamic properties of proteins at a single-cell level. As an increasing number of labs generate FRAP datasets, there is a need for fast, interactive and user-friendly applications that analyze the resulting data. Here we present easyFRAP-web, a web application that simplifies the qualitative and quantitative analysis of FRAP datasets. EasyFRAP-web permits quick analysis of FRAP datasets through an intuitive web interface with interconnected analysis steps (experimental data assessment, different types of normalization and estimation of curve-derived quantitative parameters). In addition, easyFRAP-web provides dynamic and interactive data visualization and data and figure export for further analysis after every step. We test easyFRAP-web by analyzing FRAP datasets capturing the mobility of the cell cycle regulator Cdt2 in the presence and absence of DNA damage in cultured cells. We show that easyFRAP-web yields results consistent with previous studies and highlights cell-to-cell heterogeneity in the estimated kinetic parameters. EasyFRAP-web is platform-independent and is freely accessible at: https://easyfrap.vmnet.upatras.gr/.
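    The double-normalization step that FRAP analysis tools of this kind typically perform can be sketched as follows. This is the standard textbook formula (bleached-ROI intensity corrected for whole-cell acquisition bleaching, both scaled by their pre-bleach means), not necessarily easyFRAP-web's exact implementation, and the toy traces are invented.

```python
def double_normalize(roi, whole, n_pre):
    """Double-normalize a FRAP recovery curve.

    roi    -- intensity trace of the bleached region of interest
    whole  -- intensity trace of the whole cell (acquisition bleaching)
    n_pre  -- number of pre-bleach frames used for the baselines
    """
    roi_pre = sum(roi[:n_pre]) / n_pre
    whole_pre = sum(whole[:n_pre]) / n_pre
    # Divide out acquisition bleaching, then scale to pre-bleach = 1.
    return [(whole_pre / w) * (r / roi_pre) for r, w in zip(roi, whole)]

# Toy traces: 2 pre-bleach frames, bleach at frame 2, partial recovery.
roi = [100.0, 100.0, 20.0, 50.0, 70.0]
whole = [1000.0, 1000.0, 900.0, 900.0, 900.0]
norm = double_normalize(roi, whole, n_pre=2)
```

    The normalized curve starts at 1.0 before the bleach, drops at the bleach frame, and recovers toward (but below) 1.0, the shape from which mobile fraction and half-time are then estimated.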

  1. cellVIEW: a Tool for Illustrative and Multi-Scale Rendering of Large Biomolecular Datasets

    PubMed Central

    Le Muzic, Mathieu; Autin, Ludovic; Parulek, Julius; Viola, Ivan

    2017-01-01

    In this article we introduce cellVIEW, a new system to interactively visualize large biomolecular datasets at the atomic level. Our tool is unique and has been specifically designed to match the ambitions of our domain experts to model and interactively visualize structures comprising several billion atoms. The cellVIEW system integrates acceleration techniques to allow for real-time graphics performance at a 60 Hz display rate on datasets representing large viruses and bacterial organisms. Inspired by the work of scientific illustrators, we propose a level-of-detail scheme whose purpose is twofold: accelerating the rendering and reducing visual clutter. The main part of our datasets consists of macromolecules, but they also comprise nucleic acid strands stored as sets of control points. For that specific case, we extend our rendering method to support the dynamic generation of DNA strands directly on the GPU. It is noteworthy that our tool has been implemented directly inside a game engine. We chose to rely on a third-party engine to reduce the software development workload and to make bleeding-edge graphics techniques more accessible to end-users. To our knowledge, cellVIEW is the only suitable solution for interactive visualization of large biomolecular landscapes at the atomic level, and it is freely available to use and extend. PMID:29291131

  2. Integrating genome-wide association study summaries and element-gene interaction datasets identified multiple associations between elements and complex diseases.

    PubMed

    He, Awen; Wang, Wenyu; Prakash, N Tejo; Tinkov, Alexey A; Skalny, Anatoly V; Wen, Yan; Hao, Jingcan; Guo, Xiong; Zhang, Feng

    2018-03-01

    Chemical elements are closely related to human health. Extensive genomic profile data for complex diseases offer a good opportunity to systematically investigate the relationships between elements and complex diseases/traits. In this study, we applied a gene set enrichment analysis (GSEA) approach to detect associations between elements and complex diseases/traits through integrating element-gene interaction datasets and genome-wide association study (GWAS) data for complex diseases/traits. To illustrate the performance of GSEA, element-gene interaction datasets for 24 elements were extracted from the Comparative Toxicogenomics Database (CTD). GWAS summary datasets of 24 complex diseases or traits were downloaded from the dbGaP or GEFOS websites. We observed significant associations between 7 elements and 13 complex diseases or traits (all false discovery rate (FDR) < 0.05), including reported relationships, such as aluminum vs. Alzheimer's disease (FDR = 0.042), calcium vs. bone mineral density (FDR = 0.031), and magnesium vs. systemic lupus erythematosus (FDR = 0.012), as well as novel associations, such as nickel vs. hypertriglyceridemia (FDR = 0.002) and bipolar disorder (FDR = 0.027). Our results are consistent with previous biological studies, supporting the good performance of GSEA. These findings, based on the GSEA framework, provide novel clues for discovering causal relationships between elements and complex diseases. © 2017 WILEY PERIODICALS, INC.
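    GSEA proper uses a ranked enrichment statistic with permutation testing; the simpler hypergeometric overrepresentation test sketched below conveys the core idea of scoring an element's gene set against a set of disease-associated genes. All counts are hypothetical.

```python
from math import comb

def hypergeom_p(overlap, n_disease, n_element, universe):
    """P(overlap >= observed) when n_disease genes are drawn at random
    from a universe containing n_element element-associated genes:
    the simplest gene-set overrepresentation test."""
    return sum(
        comb(n_element, k) * comb(universe - n_element, n_disease - k)
        for k in range(overlap, min(n_disease, n_element) + 1)
    ) / comb(universe, n_disease)

# Hypothetical counts: 100 genes total, 10 linked to an element in CTD,
# 5 disease-associated from GWAS, 3 of which overlap both sets.
p = hypergeom_p(3, 5, 10, 100)
```

    An overlap this large is unlikely by chance (p well below 0.05), which is the kind of signal that would then be carried forward into FDR correction across all element-trait pairs.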

  3. Developing Novel Machine Learning Algorithms to Improve Sedentary Assessment for Youth Health Enhancement.

    PubMed

    Golla, Gowtham Kumar; Carlson, Jordan A; Huan, Jun; Kerr, Jacqueline; Mitchell, Tarrah; Borner, Kelsey

    2016-10-01

    Sedentary behavior of youth is an important determinant of health. However, better measures are needed to improve understanding of this relationship and the mechanisms at play, as well as to evaluate health promotion interventions. Wearable accelerometers are considered the standard for assessing physical activity in research, but they do not perform well for assessing posture (i.e., sitting vs. standing), a critical component of sedentary behavior. The machine learning algorithms that we propose for assessing sedentary behavior will allow us to re-examine existing accelerometer data to better understand the association between sedentary time and health in various populations. We collected two datasets, a laboratory-controlled dataset and a free-living dataset. We trained machine learning classifiers separately on each dataset and compared performance across datasets. The classifiers predict five postures: sit, stand, sit-stand, stand-sit, and stand/walk. We compared a manually constructed hidden Markov model (HMM) with an automated HMM from existing software. The manually constructed HMM achieved higher F1-macro scores on both datasets.
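
A hand-constructed HMM of the kind compared above can be sketched as a small Viterbi decoder that smooths noisy per-window posture labels; the three states and the transition/emission probabilities here are illustrative assumptions, not the paper's actual model (which also includes transition postures such as sit-stand).

```python
import math

# Minimal hand-constructed HMM smoothing of per-window posture labels.
# States and probabilities are illustrative, not the study's actual values.
states = ["sit", "stand", "walk"]
log = math.log
# Transitions strongly favor staying in the same posture.
trans = {s: {t: log(0.9) if s == t else log(0.05) for t in states} for s in states}
# Emission: probability the raw classifier outputs label o given true state s.
emit = {s: {o: log(0.8) if s == o else log(0.1) for o in states} for s in states}
prior = {s: log(1 / 3) for s in states}

def viterbi(observations):
    """Most likely true-posture sequence given noisy classifier outputs."""
    v = [{s: prior[s] + emit[s][observations[0]] for s in states}]
    back = []
    for o in observations[1:]:
        col, ptr = {}, {}
        for s in states:
            best_prev = max(states, key=lambda p: v[-1][p] + trans[p][s])
            ptr[s] = best_prev
            col[s] = v[-1][best_prev] + trans[best_prev][s] + emit[s][o]
        v.append(col)
        back.append(ptr)
    path = [max(states, key=lambda s: v[-1][s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

# A lone "stand" between sits is smoothed away as likely classifier noise:
print(viterbi(["sit", "sit", "stand", "sit", "sit"]))
# → ['sit', 'sit', 'sit', 'sit', 'sit']
```

The smoothing effect is exactly why an HMM layer on top of a per-window classifier helps for posture: isolated misclassifications are overridden by the strong self-transition probabilities.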

  4. Branch: an interactive, web-based tool for testing hypotheses and developing predictive models.

    PubMed

    Gangavarapu, Karthik; Babji, Vyshakh; Meißner, Tobias; Su, Andrew I; Good, Benjamin M

    2016-07-01

    Branch is a web application that provides users with the ability to interact directly with large biomedical datasets. The interaction is mediated through a collaborative graphical user interface for building and evaluating decision trees. These trees can be used to compose and test sophisticated hypotheses and to develop predictive models. Decision trees are built and evaluated based on a library of imported datasets and can be stored in a collective area for sharing and re-use. Branch is hosted at http://biobranch.org/ and the open source code is available at http://bitbucket.org/sulab/biobranch/. Contact: asu@scripps.edu or bgood@scripps.edu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  5. NASA GES DISC On-line Visualization and Analysis System for Gridded Remote Sensing Data

    NASA Technical Reports Server (NTRS)

    Leptoukh, Gregory G.; Berrick, S.; Rui, H.; Liu, Z.; Zhu, T.; Teng, W.; Shen, S.; Qin, J.

    2005-01-01

    The ability to use data stored in the current NASA Earth Observing System (EOS) archives for studying regional or global phenomena is highly dependent on having a detailed understanding of the data's internal structure and physical implementation. Gaining this understanding and applying it to data reduction is a time-consuming task that must be undertaken before the core investigation can begin. This is an especially difficult challenge when science objectives require users to deal with large multi-sensor datasets that are usually of different formats, structures, and resolutions. The NASA Goddard Earth Sciences Data and Information Services Center (GES DISC) has taken a major step towards meeting this challenge by developing an infrastructure with a Web interface that allows users to perform interactive analysis online without downloading any data: the GES-DISC Interactive Online Visualization and Analysis Infrastructure, or "Giovanni." Giovanni provides interactive, online analysis tools to facilitate data users' research. Several instances of this interface have been created to serve TRMM users, aerosol scientists, and ocean color and agriculture applications users. The first generation of these tools supports gridded data only. The user selects geophysical parameters, an area of interest, and a time period, and the system generates output on screen in a matter of seconds. The currently available output options are: an area plot, averaged or accumulated over any available data period for any rectangular area; a time plot, i.e., a time series averaged over any rectangular area; Hovmoller plots, i.e., image views of any longitude-time or latitude-time cross section; ASCII output for all plot types; and image animation for area plots. Another analysis suite deals with parameter intercomparison: scatter plots, temporal correlation maps, GIS-compatible outputs, etc. This allows users to focus on data content (i.e., science parameters) and eliminates the need for expensive learning, development, and processing tasks that are redundantly incurred by an archive's user community. The current implementation utilizes the GrADS-DODS Server (GDS) and provides subsetting and analysis services across the Internet for any GrADS-readable dataset. The subsetting capability allows users to retrieve a specified temporal and/or spatial subdomain from a large dataset, eliminating the need to download everything simply to access a small relevant portion of a dataset. The analysis capability allows users to retrieve the results of an operation applied to one or more datasets on the server. We use this approach to read pre-processed binary files and/or to read and extract the needed parts directly from HDF or HDF-EOS files. These subsets then serve as inputs to GrADS analysis scripts. The system can be used in a wide variety of Earth science applications, such as study and monitoring of climate and weather events, and modeling, and it can be easily configured for new applications.

  6. Accessing Solar Irradiance Data Products From the LASP Interactive Solar IRradiance Datacenter (LISIRD)

    NASA Astrophysics Data System (ADS)

    Ware Dewolfe, A.; Wilson, A.; Lindholm, D. M.; Pankratz, C. K.; Snow, M.; Woods, T. N.

    2009-12-01

    The Laboratory for Atmospheric and Space Physics (LASP) is enhancing the LASP Interactive Solar IRradiance Datacenter (LISIRD) to provide access to a comprehensive set of solar spectral irradiance measurements. LISIRD has recently been updated to serve many new datasets and models, including sunspot index, photometric sunspot index, Lyman-alpha, and magnesium-II core-to-wing ratio. A new user interface emphasizes web-based interactive visualizations, allowing users to explore and compare this data before downloading it for analysis. The data provided covers a wavelength range from soft X-ray (XUV) at 0.1 nm up to the near infrared (NIR) at 2400 nm, as well as wavelength-independent Total Solar Irradiance (TSI). Combined data from the SORCE, TIMED-SEE, UARS-SOLSTICE, and SME instruments provide almost continuous coverage from 1981 to the present, while Hydrogen Lyman-alpha (121.6 nm) measurements / models date from 1947 to the present. This poster provides an overview of the LISIRD system, summarizes the data sets currently available, describes future plans and capabilities, and provides details on how to access solar irradiance data through LISIRD interfaces at http://lasp.colorado.edu/lisird/.

  7. Solar Irradiance Data Products at the LASP Interactive Solar IRradiance Datacenter (LISIRD)

    NASA Astrophysics Data System (ADS)

    Ware Dewolfe, A.; Wilson, A.; Lindholm, D. M.; Pankratz, C. K.; Snow, M. A.; Woods, T. N.

    2010-12-01

    The Laboratory for Atmospheric and Space Physics (LASP) has developed the LASP Interactive Solar IRradiance Datacenter (LISIRD) to provide access to a comprehensive set of solar irradiance measurements. LISIRD has recently been updated to serve many new datasets and models, including data from SORCE, UARS-SOLSTICE, SME, and TIMED-SEE, and model data from the Flare Irradiance Spectral Model (FISM). The user interface emphasizes web-based interactive visualizations, allowing users to explore and compare this data before downloading it for analysis. The data provided covers a wavelength range from soft X-ray (XUV) at 0.1 nm up to the near infrared (NIR) at 2400 nm, as well as wavelength-independent Total Solar Irradiance (TSI). Combined data from the SORCE, TIMED-SEE, UARS-SOLSTICE, and SME instruments provide continuous coverage from 1981 to the present, while Lyman-alpha measurements, FISM daily data, and TSI models date from the 1940s to the present. LISIRD will also host Glory TSI data as part of the SORCE data system. This poster provides an overview of the LISIRD system, summarizes the data sets currently available, describes future plans and capabilities, and provides details on how to access solar irradiance data through LISIRD’s interfaces.

  8. Workshop on Incomplete Network Data Held at Sandia National Labs – Livermore

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Soundarajan, Sucheta; Wendt, Jeremy D.

    2016-06-01

    While network analysis is applied in a broad variety of scientific fields (including physics, computer science, biology, and the social sciences), how networks are constructed and the resulting bias and incompleteness have drawn more limited attention. For example, in biology, gene networks are typically developed via experiment -- many actual interactions are likely yet to be discovered. In addition to this incompleteness, the data-collection processes can introduce significant bias into the observed network datasets. For instance, if you observe part of the World Wide Web network through a classic random walk, then high-degree nodes are more likely to be found than if you had selected nodes at random. Unfortunately, such incomplete and biasing data-collection methods must often be used.
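
The degree bias of random-walk sampling mentioned above is easy to demonstrate: a walker's stationary probability of being at a node is proportional to that node's degree. The toy hub-and-ring graph below is an illustrative assumption, not data from the workshop.

```python
import random

# Demonstration of degree bias in random-walk sampling of a network.
random.seed(0)

# Toy graph: a ring of 50 nodes, each also connected to one central hub.
n = 50
adj = {i: [(i + 1) % n, (i - 1) % n, "hub"] for i in range(n)}
adj["hub"] = list(range(n))  # hub has degree 50; ring nodes have degree 3

def walk(adj, start, steps):
    """Record the nodes visited by a simple uniform random walk."""
    node, visited = start, []
    for _ in range(steps):
        node = random.choice(adj[node])
        visited.append(node)
    return visited

visited = walk(adj, 0, 10_000)
hub_fraction = visited.count("hub") / len(visited)
# Stationary theory predicts deg(hub)/2m = 50/200 = 25% of visits.
print(f"hub is 1 of {n + 1} nodes but ~{hub_fraction:.0%} of visits")
```

Sampling nodes uniformly at random would visit the hub only about 2% of the time, so any statistic computed from the walk-sampled subnetwork is skewed toward high-degree nodes.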

  9. Assembling a protein-protein interaction map of the SSU processome from existing datasets.

    PubMed

    Lim, Young H; Charette, J Michael; Baserga, Susan J

    2011-03-10

    The small subunit (SSU) processome is a large ribonucleoprotein complex involved in small ribosomal subunit assembly. It consists of the U3 snoRNA and ∼72 proteins. While most of its components have been identified, the protein-protein interactions (PPIs) among them remain largely unknown, and thus the assembly, architecture and function of the SSU processome remains unclear. We queried PPI databases for SSU processome proteins to quantify the degree to which the three genome-wide high-throughput yeast two-hybrid (HT-Y2H) studies, the genome-wide protein fragment complementation assay (PCA) and the literature-curated (LC) datasets cover the SSU processome interactome. We find that coverage of the SSU processome PPI network is remarkably sparse. Two of the three HT-Y2H studies each account for four and six PPIs between only six of the 72 proteins, while the third study accounts for as little as one PPI and two proteins. The PCA dataset has the highest coverage among the genome-wide studies with 27 PPIs between 25 proteins. The LC dataset was the most extensive, accounting for 34 proteins and 38 PPIs, many of which were validated by independent methods, thereby further increasing their reliability. When the collected data were merged, we found that at least 70% of the predicted PPIs have yet to be determined and 26 proteins (36%) have no known partners. Since the SSU processome is conserved in all Eukaryotes, we also queried HT-Y2H datasets from six additional model organisms, but only four orthologues and three previously known interologous interactions were found. This provides a starting point for further work on SSU processome assembly, and spotlights the need for a more complete genome-wide Y2H analysis.

  10. Assembling a Protein-Protein Interaction Map of the SSU Processome from Existing Datasets

    PubMed Central

    Baserga, Susan J.

    2011-01-01

    Background The small subunit (SSU) processome is a large ribonucleoprotein complex involved in small ribosomal subunit assembly. It consists of the U3 snoRNA and ∼72 proteins. While most of its components have been identified, the protein-protein interactions (PPIs) among them remain largely unknown, and thus the assembly, architecture and function of the SSU processome remains unclear. Methodology We queried PPI databases for SSU processome proteins to quantify the degree to which the three genome-wide high-throughput yeast two-hybrid (HT-Y2H) studies, the genome-wide protein fragment complementation assay (PCA) and the literature-curated (LC) datasets cover the SSU processome interactome. Conclusions We find that coverage of the SSU processome PPI network is remarkably sparse. Two of the three HT-Y2H studies each account for four and six PPIs between only six of the 72 proteins, while the third study accounts for as little as one PPI and two proteins. The PCA dataset has the highest coverage among the genome-wide studies with 27 PPIs between 25 proteins. The LC dataset was the most extensive, accounting for 34 proteins and 38 PPIs, many of which were validated by independent methods, thereby further increasing their reliability. When the collected data were merged, we found that at least 70% of the predicted PPIs have yet to be determined and 26 proteins (36%) have no known partners. Since the SSU processome is conserved in all Eukaryotes, we also queried HT-Y2H datasets from six additional model organisms, but only four orthologues and three previously known interologous interactions were found. This provides a starting point for further work on SSU processome assembly, and spotlights the need for a more complete genome-wide Y2H analysis. PMID:21423703

  11. Exploring the interaction between O₃ and NOx pollution patterns in the atmosphere of Barcelona, Spain using the MCR-ALS method.

    PubMed

    Malik, Amrita; Tauler, Roma

    2015-06-01

    This work focuses on understanding the behaviour and patterns of three atmospheric pollutants, namely nitric oxide (NO), nitrogen dioxide (NO2), and ozone (O3), along with their mutual interactions in the atmosphere of Barcelona, Spain. Hourly samples of NO, NO2, and O3 were collected at the same city location for three consecutive years (2010-2012). The study explores the seasonal, annual, and weekday-weekend variations in their diurnal profiles, along with possible identification of their sources and mutual interactions in the region. Multivariate Curve Resolution-Alternating Least Squares (MCR-ALS) was applied to the individual datasets of these pollutants, as well as to all of them simultaneously (augmented mode), to resolve the profiles related to their sources and variation patterns in the atmosphere. Analysis of the individual datasets confirmed source-related variations in each pollutant's profiles, while the profiles resolved from the augmented datasets additionally suggested mutual interactions among the pollutants and their pattern variations. The study suggests vehicular pollution as the major source of atmospheric nitrogen oxides and indicates the presence of a weekend ozone effect in the region. Copyright © 2015 Elsevier B.V. All rights reserved.
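
At its core, MCR-ALS decomposes a data matrix D (e.g., time x measured channels) into a bilinear model D ≈ C Sᵀ of contribution and source profiles, alternating constrained least-squares solves for C and S. The sketch below shows only that core on synthetic two-component data; the matrix sizes, component count, and clipping-based nonnegativity constraint are simplifying assumptions, not the paper's configuration.

```python
import numpy as np

# Minimal MCR-ALS sketch: D (time x channels) ≈ C @ S.T with nonnegativity.
# Synthetic two-component data stands in for real pollutant profiles.
rng = np.random.default_rng(1)
C_true = np.abs(rng.normal(size=(100, 2)))   # contribution profiles over time
S_true = np.abs(rng.normal(size=(3, 2)))     # source profiles over 3 channels
D = C_true @ S_true.T + 0.01 * rng.normal(size=(100, 3))

C = np.abs(rng.normal(size=(100, 2)))        # random nonnegative initial guess
for _ in range(50):
    # Alternate least-squares solves, clipping to enforce nonnegativity.
    S = np.clip(np.linalg.lstsq(C, D, rcond=None)[0].T, 0, None)
    C = np.clip(np.linalg.lstsq(S, D.T, rcond=None)[0].T, 0, None)

lack_of_fit = np.linalg.norm(D - C @ S.T) / np.linalg.norm(D)
print(f"lack of fit: {lack_of_fit:.3f}")
```

Production MCR-ALS implementations add further constraints (unimodality, closure, proper nonnegative least squares) and handle the column-wise augmented matrices used for the simultaneous multi-pollutant analysis.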

  12. Drug/Cell-line Browser: interactive canvas visualization of cancer drug/cell-line viability assay datasets.

    PubMed

    Duan, Qiaonan; Wang, Zichen; Fernandez, Nicolas F; Rouillard, Andrew D; Tan, Christopher M; Benes, Cyril H; Ma'ayan, Avi

    2014-11-15

    Recently, several high profile studies collected cell viability data from panels of cancer cell lines treated with many drugs applied at different concentrations. Such drug sensitivity data for cancer cell lines provide suggestive treatments for different types and subtypes of cancer. Visualization of these datasets can reveal patterns that may not be obvious by examining the data without such efforts. Here we introduce Drug/Cell-line Browser (DCB), an online interactive HTML5 data visualization tool for interacting with three of the recently published datasets of cancer cell lines/drug-viability studies. DCB uses clustering and canvas visualization of the drugs and the cell lines, as well as a bar graph that summarizes drug effectiveness for the tissue of origin or the cancer subtypes for single or multiple drugs. DCB can help in understanding drug response patterns and prioritizing drug/cancer cell line interactions by tissue of origin or cancer subtype. DCB is an open source Web-based tool that is freely available at http://www.maayanlab.net/LINCS/DCB. Contact: avi.maayan@mssm.edu. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  13. EnviroAtlas - Austin, TX - Demographics by Block Group

    EPA Pesticide Factsheets

    This EnviroAtlas dataset is a summary of key demographic groups for the EnviroAtlas community. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).

  14. A Comparison Study of Classifier Algorithms for Cross-Person Physical Activity Recognition

    PubMed Central

    Saez, Yago; Baldominos, Alejandro; Isasi, Pedro

    2016-01-01

    Physical activity is widely known to be one of the key elements of a healthy life. The many benefits of physical activity described in the medical literature include weight loss and reductions in the risk factors for chronic diseases. With the recent advances in wearable devices, such as smartwatches or physical activity wristbands, motion tracking sensors are becoming pervasive, which has led to an impressive growth in the amount of physical activity data available and an increasing interest in recognizing which specific activity a user is performing. Moreover, big data and machine learning are now cross-fertilizing each other in an approach called “deep learning”, which consists of massive artificial neural networks able to detect complicated patterns from enormous amounts of input data to learn classification models. This work compares various state-of-the-art classification techniques for automatic cross-person activity recognition under different scenarios that vary widely in how much information is available for analysis. We have incorporated deep learning by using Google’s TensorFlow framework. The data used in this study were acquired from PAMAP2 (Physical Activity Monitoring in the Ageing Population), a publicly available dataset containing physical activity data. To perform cross-person prediction, we used the leave-one-subject-out (LOSO) cross-validation technique. When working with large training sets, the best classifiers obtain very high average accuracies (e.g., 96% using extra randomized trees). However, when the data volume is drastically reduced (where available data are only 0.001% of the continuous data), deep neural networks performed the best, achieving 60% in overall prediction accuracy. We found that even when working with only approximately 22.67% of the full dataset, we can statistically obtain the same results as when working with the full dataset. 
This finding enables the design of more energy-efficient devices and facilitates cold starts and big data processing of physical activity records. PMID:28042838
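
The leave-one-subject-out (LOSO) protocol used above can be sketched as a split generator: each fold trains on all subjects but one and tests on the held-out subject, so accuracy reflects cross-person generalization rather than memorizing a person's movement style. The subject IDs and windows below are toy stand-ins.

```python
# Minimal leave-one-subject-out (LOSO) cross-validation split generator.
# Subject IDs and sensor windows are illustrative stand-ins, not PAMAP2 data.

def loso_splits(subject_ids):
    """Yield (held_out, train_indices, test_indices), one fold per subject."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test

subjects = ["p1", "p1", "p2", "p2", "p2", "p3"]  # one ID per sensor window
for held_out, train, test in loso_splits(subjects):
    print(held_out, train, test)
# → p1 [2, 3, 4, 5] [0, 1]
# → p2 [0, 1, 5] [2, 3, 4]
# → p3 [0, 1, 2, 3, 4] [5]
```

Equivalently, scikit-learn's `LeaveOneGroupOut` produces the same folds when the group labels are subject IDs; the key point is that no window from the test subject ever appears in the training set.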

  15. A Comparison Study of Classifier Algorithms for Cross-Person Physical Activity Recognition.

    PubMed

    Saez, Yago; Baldominos, Alejandro; Isasi, Pedro

    2016-12-30

    Physical activity is widely known to be one of the key elements of a healthy life. The many benefits of physical activity described in the medical literature include weight loss and reductions in the risk factors for chronic diseases. With the recent advances in wearable devices, such as smartwatches or physical activity wristbands, motion tracking sensors are becoming pervasive, which has led to an impressive growth in the amount of physical activity data available and an increasing interest in recognizing which specific activity a user is performing. Moreover, big data and machine learning are now cross-fertilizing each other in an approach called "deep learning", which consists of massive artificial neural networks able to detect complicated patterns from enormous amounts of input data to learn classification models. This work compares various state-of-the-art classification techniques for automatic cross-person activity recognition under different scenarios that vary widely in how much information is available for analysis. We have incorporated deep learning by using Google's TensorFlow framework. The data used in this study were acquired from PAMAP2 (Physical Activity Monitoring in the Ageing Population), a publicly available dataset containing physical activity data. To perform cross-person prediction, we used the leave-one-subject-out (LOSO) cross-validation technique. When working with large training sets, the best classifiers obtain very high average accuracies (e.g., 96% using extra randomized trees). However, when the data volume is drastically reduced (where available data are only 0.001% of the continuous data), deep neural networks performed the best, achieving 60% in overall prediction accuracy. We found that even when working with only approximately 22.67% of the full dataset, we can statistically obtain the same results as when working with the full dataset. 
This finding enables the design of more energy-efficient devices and facilitates cold starts and big data processing of physical activity records.

  16. Global evaluation of ammonia bidirectional exchange and livestock diurnal variation schemes

    EPA Pesticide Factsheets

    There is no EPA generated dataset in this study. This dataset is associated with the following publication: Zhu, L., D. Henze, J. Bash, G. Jeong, K. Cady-Pereira, M. Shephard, M. Luo, F. Poulot, and S. Capps. Global evaluation of ammonia bidirectional exchange and livestock diurnal variation schemes. Atmospheric Chemistry and Physics. Copernicus Publications, Katlenburg-Lindau, GERMANY, 15: 12823-12843, (2015).

  17. SAMPL5: 3D-RISM partition coefficient calculations with partial molar volume corrections and solute conformational sampling.

    PubMed

    Luchko, Tyler; Blinov, Nikolay; Limon, Garrett C; Joyce, Kevin P; Kovalenko, Andriy

    2016-11-01

    Implicit solvent methods for classical molecular modeling are frequently used to provide fast, physics-based hydration free energies of macromolecules. Less commonly considered is the transferability of these methods to other solvents. The Statistical Assessment of Modeling of Proteins and Ligands 5 (SAMPL5) distribution coefficient dataset and the accompanying explicit solvent partition coefficient reference calculations provide a direct test of solvent model transferability. Here we use the 3D reference interaction site model (3D-RISM) statistical-mechanical solvation theory, with a well tested water model and a new united atom cyclohexane model, to calculate partition coefficients for the SAMPL5 dataset. The cyclohexane model performed well in training and testing (R = 0.98 for amino acid neutral side chain analogues), but only if a parameterized solvation free energy correction was used. In contrast, the same protocol, using single solute conformations, performed poorly on the SAMPL5 dataset, obtaining R = 0.73 compared to the reference partition coefficients, likely due to the much larger solute sizes. Including solute conformational sampling through molecular dynamics coupled with 3D-RISM (MD/3D-RISM) improved agreement with the reference calculation to R = 0.93. Since our initial calculations only considered partition coefficients and not distribution coefficients, solute sampling provided little benefit when comparing against experiment, where ionized and tautomer states are more important. Applying a simple pKa correction improved agreement with experiment from R = 0.54 to R = 0.66, despite a small number of outliers. Better agreement is possible by accounting for tautomers and improving the ionization correction.

  18. SAMPL5: 3D-RISM partition coefficient calculations with partial molar volume corrections and solute conformational sampling

    NASA Astrophysics Data System (ADS)

    Luchko, Tyler; Blinov, Nikolay; Limon, Garrett C.; Joyce, Kevin P.; Kovalenko, Andriy

    2016-11-01

    Implicit solvent methods for classical molecular modeling are frequently used to provide fast, physics-based hydration free energies of macromolecules. Less commonly considered is the transferability of these methods to other solvents. The Statistical Assessment of Modeling of Proteins and Ligands 5 (SAMPL5) distribution coefficient dataset and the accompanying explicit solvent partition coefficient reference calculations provide a direct test of solvent model transferability. Here we use the 3D reference interaction site model (3D-RISM) statistical-mechanical solvation theory, with a well tested water model and a new united atom cyclohexane model, to calculate partition coefficients for the SAMPL5 dataset. The cyclohexane model performed well in training and testing (R=0.98 for amino acid neutral side chain analogues) but only if a parameterized solvation free energy correction was used. In contrast, the same protocol, using single solute conformations, performed poorly on the SAMPL5 dataset, obtaining R=0.73 compared to the reference partition coefficients, likely due to the much larger solute sizes. Including solute conformational sampling through molecular dynamics coupled with 3D-RISM (MD/3D-RISM) improved agreement with the reference calculation to R=0.93. Since our initial calculations only considered partition coefficients and not distribution coefficients, solute sampling provided little benefit comparing against experiment, where ionized and tautomer states are more important. Applying a simple pKa correction improved agreement with experiment from R=0.54 to R=0.66, despite a small number of outliers. Better agreement is possible by accounting for tautomers and improving the ionization correction.
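
The partition coefficients above follow from solvation (transfer) free energies via the standard relation log P = (ΔG_water − ΔG_solvent) / (RT ln 10). The sketch below applies this relation; the solvation free energy values are hypothetical, not SAMPL5 results.

```python
import math

# Partition coefficient from solvation free energies (hypothetical values):
#   log P(cyclohexane/water) = (dG_water - dG_cyclohexane) / (RT ln 10)
R = 1.987204e-3   # gas constant, kcal/(mol K)
T = 298.15        # temperature, K

def log_p(dG_water, dG_cyclohexane):
    """Cyclohexane/water log P from solvation free energies in kcal/mol."""
    return (dG_water - dG_cyclohexane) / (R * T * math.log(10))

# Hypothetical solute better solvated in water by 2.7 kcal/mol: negative log P,
# i.e., it partitions preferentially into the aqueous phase.
print(round(log_p(-6.2, -3.5), 2))  # → -1.98
```

The distribution coefficient log D additionally accounts for ionization and tautomer populations at a given pH, which is why the abstract's simple pKa correction improves agreement with experiment.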

  19. Self-reported physical activity among blacks: estimates from national surveys.

    PubMed

    Whitt-Glover, Melicia C; Taylor, Wendell C; Heath, Gregory W; Macera, Caroline A

    2007-11-01

    National surveillance data provide population-level estimates of physical activity participation, but generally do not include detailed subgroup analyses, which could provide a better understanding of physical activity among subgroups. This paper presents a descriptive analysis of self-reported regular physical activity among black adults using data from the 2003 Behavioral Risk Factor Surveillance System (n=19,189), the 2004 National Health Interview Survey (n=4263), and the 1999-2004 National Health and Nutrition Examination Survey (n=3407). Analyses were conducted between January and March 2006. Datasets were analyzed separately to estimate the proportion of black adults meeting national physical activity recommendations overall and stratified by gender and other demographic subgroups. The proportion of black adults reporting regular physical activity ranged from 24% to 36%. Regular physical activity was highest among men; younger age groups; the highest education and income groups; those who were employed and married; overweight, but not obese, men; and normal-weight women. This pattern was consistent across surveys. The observed physical activity patterns were consistent with national trends. The data suggest that older black adults and those with low education and income levels are at greatest risk for inactive lifestyles and may require additional attention in efforts to increase physical activity in black adults. The variability across datasets reinforces the need for objective measures in national surveys.

  20. Phonetic Variation and Interactional Contingencies in Simultaneous Responses

    ERIC Educational Resources Information Center

    Walker, Gareth

    2016-01-01

    An auspicious but unexplored environment for studying phonetic variation in naturalistic interaction is where two or more participants say the same thing at the same time. Working with a core dataset built from the multimodal Augmented Multi-party Interaction corpus, the principles of conversation analysis were followed to analyze the sequential…

  1. Full data acquisition in Kelvin Probe Force Microscopy: Mapping dynamic electric phenomena in real space

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Balke, Nina; Kalinin, Sergei V.; Jesse, Stephen

    Kelvin probe force microscopy (KPFM) has provided deep insights into the role local electronic, ionic and electrochemical processes play on the global functionality of materials and devices, even down to the atomic scale. Conventional KPFM utilizes heterodyne detection and bias feedback to measure the contact potential difference (CPD) between tip and sample. This measurement paradigm, however, permits only partial recovery of the information encoded in bias- and time-dependent electrostatic interactions between the tip and sample and effectively down-samples the cantilever response to a single measurement of CPD per pixel. This level of detail is insufficient for electroactive materials, devices, or solid-liquid interfaces, where non-linear dielectrics are present or spurious electrostatic events are possible. Here, we simulate and experimentally validate a novel approach for spatially resolved KPFM capable of a full information transfer of the dynamic electric processes occurring between tip and sample. General acquisition mode, or G-Mode, adopts a big data approach utilising high speed detection, compression, and storage of the raw cantilever deflection signal in its entirety at high sampling rates (> 4 MHz), providing a permanent record of the tip trajectory. We develop a range of methodologies for analysing the resultant large multidimensional datasets involving classical, physics-based and information-based approaches. Physics-based analysis of G-Mode KPFM data recovers the parabolic bias dependence of the electrostatic force for each cycle of the excitation voltage, leading to a multidimensional dataset containing spatial and temporal dependence of the CPD and capacitance channels. We use multivariate statistical methods to reduce data volume and separate the complex multidimensional data sets into statistically significant components that can then be mapped onto separate physical mechanisms. Overall, G-Mode KPFM offers a new paradigm to study dynamic electric phenomena in electroactive interfaces as well as a promising approach to extend KPFM to solid-liquid interfaces.
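
The parabolic bias dependence mentioned above implies the CPD can be recovered per bias cycle from a quadratic fit of force versus voltage, F(V) ≈ a(V − V_CPD)² + c, so V_CPD = −b/(2a) for the fitted polynomial aV² + bV + c. The synthetic bias sweep below is an illustrative assumption, not instrument data.

```python
import numpy as np

# Recovering CPD from the parabolic bias dependence of electrostatic force:
#   F(V) ≈ a * (V - V_cpd)^2 + c,  so  V_cpd = -b / (2a) from a quadratic fit.
# Synthetic sweep; all coefficients are illustrative, not instrument values.
rng = np.random.default_rng(7)
v = np.linspace(-5, 5, 101)            # applied bias sweep (V)
v_cpd_true = 0.35                      # "unknown" contact potential difference
force = -2.0 * (v - v_cpd_true) ** 2 + 0.5 + 0.05 * rng.normal(size=v.size)

a, b, c = np.polyfit(v, force, 2)      # fit F = a v^2 + b v + c
v_cpd = -b / (2 * a)                   # parabola vertex = CPD estimate
print(f"recovered CPD: {v_cpd:.2f} V")
```

Repeating this fit for every excitation cycle at every pixel yields exactly the spatially and temporally resolved CPD (and, via the curvature a, capacitance gradient) channels described in the abstract.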

  2. Evaluation of Application Space Expansion for the Sensor Fish

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    DeRolph, Christopher R.; Bevelhimer, Mark S.

    The Pacific Northwest National Laboratory has developed an instrument known as the sensor fish that can be released into downstream passage routes at hydropower facilities to collect data on the physical conditions that a fish might be exposed to during passage through a turbine. The US Department of Energy Wind and Water Power Program sees value in expanding the sensor fish application space beyond large Kaplan turbines in the northwest United States to evaluate conditions to which a greater variety of fish species are exposed. Development of fish-friendly turbines requires an understanding of both physical passage conditions and biological responsesmore » to those conditions. Expanding the use of sensor fish into other application spaces will add to the knowledge base of physical passage conditions and could also enhance the use of sensor fish as a site-specific tool in mitigating potential impacts to fish populations from hydropower. The Oak Ridge National Laboratory (ORNL) National Hydropower Assessment Program (NHAAP) database contains hydropower facility characteristics that, along with national fish distribution data, were used to evaluate potential interactions between fish species and project characteristics related to downstream passage issues. ORNL developed rankings for the turbine types in the NHAAP database in terms of their potential to impact fish through injury or mortality during downstream turbine passage. National-scale fish distributions for 31 key migratory species were spatially intersected with hydropower plant locations to identify facilities where turbines with a high threat to fish injury or mortality overlap with the potential range of a sensitive fish species. A dataset was produced that identifies hydropower facilities where deployment of the sensor fish technology might be beneficial in addressing issues related to downstream fish passage. 
The dataset can be queried to target specific geographic regions, fish species, license expiration dates, generation capacity levels, ownership characteristics, turbine characteristics, or any combination of these metrics.

  3. Full data acquisition in Kelvin Probe Force Microscopy: Mapping dynamic electric phenomena in real space

    DOE PAGES

    Balke, Nina; Kalinin, Sergei V.; Jesse, Stephen; ...

    2016-08-12

    Kelvin probe force microscopy (KPFM) has provided deep insights into the role local electronic, ionic and electrochemical processes play on the global functionality of materials and devices, even down to the atomic scale. Conventional KPFM utilizes heterodyne detection and bias feedback to measure the contact potential difference (CPD) between tip and sample. This measurement paradigm, however, permits only partial recovery of the information encoded in bias- and time-dependent electrostatic interactions between the tip and sample and effectively down-samples the cantilever response to a single measurement of CPD per pixel. This level of detail is insufficient for electroactive materials, devices, or solid-liquid interfaces, where non-linear dielectrics are present or spurious electrostatic events are possible. Here, we simulate and experimentally validate a novel approach for spatially resolved KPFM capable of a full information transfer of the dynamic electric processes occurring between tip and sample. General acquisition mode, or G-Mode, adopts a big data approach utilising high speed detection, compression, and storage of the raw cantilever deflection signal in its entirety at high sampling rates (> 4 MHz), providing a permanent record of the tip trajectory. We develop a range of methodologies for analysing the resultant large multidimensional datasets involving classical, physics-based and information-based approaches. Physics-based analysis of G-Mode KPFM data recovers the parabolic bias dependence of the electrostatic force for each cycle of the excitation voltage, leading to a multidimensional dataset containing spatial and temporal dependence of the CPD and capacitance channels. We use multivariate statistical methods to reduce data volume and separate the complex multidimensional data sets into statistically significant components that can then be mapped onto separate physical mechanisms. 
Overall, G-Mode KPFM offers a new paradigm for studying dynamic electric phenomena at electroactive interfaces, as well as a promising approach to extending KPFM to solid-liquid interfaces.

  4. Learning from physics-based earthquake simulators: a minimal approach

    NASA Astrophysics Data System (ADS)

    Artale Harris, Pietro; Marzocchi, Warner; Melini, Daniele

    2017-04-01

    Physics-based earthquake simulators aim to generate synthetic seismic catalogs of arbitrary length, accounting for fault interaction, elastic rebound, realistic fault networks, and simple earthquake nucleation processes such as rate-and-state friction. Through comparison of synthetic and real catalogs, seismologists can gain insight into the earthquake occurrence process. Moreover, earthquake simulators can be used to infer aspects of the statistical behavior of earthquakes within the simulated region, by analyzing timescales not accessible through observations. The development of earthquake simulators is commonly led by the approach "the more physics, the better", pushing seismologists toward ever more Earth-like simulators. However, despite its immediate attractiveness, we argue that this kind of approach makes it increasingly difficult to understand which physical parameters are really relevant to describing the features of the seismic catalog in which we are interested. For this reason, here we take the opposite, minimal approach and analyze the behavior of a purposely simple earthquake simulator applied to a set of California faults. The idea is that a simple model may be more informative than a complex one for some specific scientific objectives, because it is more understandable. The model has three main components: the first is a realistic tectonic setting, i.e., a fault dataset of California; the other two are quantitative laws for earthquake generation on each single fault, and the Coulomb Failure Function for modeling fault interaction. The final goal of this work is twofold. On one hand, we aim to identify the minimum set of physical ingredients that can satisfactorily reproduce the features of the real seismic catalog, such as short-term seismic clustering, and to investigate the hypothetical long-term behavior and fault synchronization. On the other hand, we want to investigate the limits of predictability of the model itself.

  5. An Integrative Analysis of Preeclampsia Based on the Construction of an Extended Composite Network Featuring Protein-Protein Physical Interactions and Transcriptional Relationships

    PubMed Central

    Vaiman, Daniel; Miralles, Francisco

    2016-01-01

    Preeclampsia (PE) is a pregnancy disorder defined by hypertension and proteinuria. This disease remains a major cause of maternal and fetal morbidity and mortality. Defective placentation is generally described as being at the root of the disease. The characterization of the transcriptome signature of the preeclamptic placenta has allowed the identification of differentially expressed genes (DEGs). However, we still lack detailed knowledge of how these DEGs impact the function of the placenta. The tools of network biology offer a methodology to explore complex diseases at a systems level. In this study we performed a cross-platform meta-analysis of seven publicly available gene expression datasets comparing non-pathological and preeclamptic placentas. Using the rank product algorithm we identified a total of 369 DEGs consistently modified in PE. The DEGs were used as seeds to build both an extended physical protein-protein interaction network and a transcription factor regulatory network. Topological and clustering analysis was conducted to analyze the connectivity properties of the networks. Finally both networks were merged into a composite network which presents an integrated view of the regulatory pathways involved in preeclampsia and the crosstalk between them. This network is a useful tool for exploring the relationships between the DEGs and enables hypothesis generation for functional experimentation. PMID:27802351
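The rank product step named above can be sketched in a few lines (a minimal illustration with invented ranks and placeholder gene names, not the authors' pipeline):

```python
import math

def rank_product(rank_lists):
    """Geometric mean of a gene's ranks across several differential-
    expression rankings (rank 1 = most differentially expressed).
    Genes with a consistently low rank product are robust DEG calls."""
    k = len(rank_lists)
    genes = rank_lists[0].keys()
    return {g: math.prod(r[g] for r in rank_lists) ** (1.0 / k) for g in genes}

# Toy ranks of three hypothetical genes in two expression datasets
ranks_a = {"GENE1": 1, "GENE2": 2, "GENE3": 3}
ranks_b = {"GENE1": 2, "GENE2": 1, "GENE3": 3}
rp = rank_product([ranks_a, ranks_b])  # GENE3 -> 3.0; GENE1, GENE2 -> sqrt(2)
```

In practice the rank product is compared against a permutation-based null distribution to assign significance; only the ranking statistic itself is shown here.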

  6. Where the bugs are: analyzing distributions of bacterial phyla by descriptor keyword search in the nucleotide database.

    PubMed

    Squartini, Andrea

    2011-07-26

    The associations between bacteria and their environment underlie their preferential interactions with given physical or chemical conditions. Microbial ecology aims at extracting conserved patterns of occurrence of bacterial taxa in relation to defined habitats and contexts. In the present report the NCBI nucleotide sequence database is used as a dataset to extract information on the distribution of each of the 24 phyla of the bacteria superkingdom and of the Archaea. Over two and a half million records are filtered by their cross-association with each of 48 sets of keywords, defined to cover natural or artificial habitats, interactions with plant, animal or human hosts, and physical-chemical conditions. The results are processed to show: (a) how the different descriptors enrich or deplete the proportions at which the phyla occur in the total database; (b) the order of abundance in which the different keywords score for each phylum (preferred habitats or conditions), and the extent to which phyla are clustered around a few descriptors (specific) or spread across many (cosmopolitan); (c) which keywords identify the communities ranking highest for diversity and evenness. A number of cues emerge from the results, contributing to sharpening the picture of the functional systematic diversity of prokaryotes. Suggestions are given for a future automated service dedicated to refining and updating this kind of analysis via public bioinformatic engines.
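The enrichment/depletion computation in point (a) reduces to comparing a phylum's share of the keyword-matching records with its share of the whole database; a minimal sketch with invented counts:

```python
def fold_enrichment(phylum_in_kw, total_in_kw, phylum_in_db, total_in_db):
    """Ratio of a phylum's frequency among records matching a keyword
    to its frequency in the whole database (>1 enriched, <1 depleted)."""
    return (phylum_in_kw / total_in_kw) / (phylum_in_db / total_in_db)

# Invented counts: a phylum holding 10% of all records but 30% of the
# records matching one habitat keyword is 3-fold enriched there.
ratio = fold_enrichment(300, 1_000, 100_000, 1_000_000)
```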

  7. Exposure Render: An Interactive Photo-Realistic Volume Rendering Framework

    PubMed Central

    Kroes, Thomas; Post, Frits H.; Botha, Charl P.

    2012-01-01

    The field of volume visualization has undergone rapid development during the past years, both due to advances in suitable computing hardware and due to the increasing availability of large volume datasets. Recent work has focused on increasing the visual realism in Direct Volume Rendering (DVR) by integrating a number of visually plausible but often effect-specific rendering techniques, for instance modeling of light occlusion and depth of field. Besides yielding more attractive renderings, the more realistic lighting in particular has a positive effect on perceptual tasks. Although these new rendering techniques yield impressive results, they exhibit limitations in terms of their flexibility and their performance. Monte Carlo ray tracing (MCRT), coupled with physically based light transport, is the de-facto standard for synthesizing highly realistic images in the graphics domain, although usually not from volumetric data. Due to the stochastic sampling of MCRT algorithms, numerous effects can be achieved in a relatively straightforward fashion. For this reason, we have developed a practical framework that applies MCRT techniques to DVR. With this work, we demonstrate that a host of realistic effects, including physically based lighting, can be simulated in a generic and flexible fashion, leading to interactive DVR with improved realism. In the hope that this improved approach to DVR will see more use in practice, we have made our framework available under a permissive open source license. PMID:22768292

  8. Interactive Visualization and Analysis of Geospatial Data Sets - TrikeND-iGlobe

    NASA Astrophysics Data System (ADS)

    Rosebrock, Uwe; Hogan, Patrick; Chandola, Varun

    2013-04-01

    The visualization of scientific datasets is becoming an ever-increasing challenge as advances in computing technologies have enabled scientists to build high resolution climate models that have produced petabytes of climate data. To interrogate and analyze these large datasets in real-time is a task that pushes the boundaries of computing hardware and software. But integration of climate datasets with geospatial data requires a considerable amount of effort and close familiarity with various data formats and projection systems, which has prevented widespread utilization outside of the climate community. TrikeND-iGlobe is a sophisticated software tool that bridges this gap, allows easy integration of climate datasets with geospatial datasets and provides sophisticated visualization and analysis capabilities. The objective for TrikeND-iGlobe is the continued building of an open source 4D virtual globe application using NASA World Wind technology that integrates analysis of climate model outputs with remote sensing observations as well as demographic and environmental data sets. This will facilitate a better understanding of global and regional phenomena, and the impact analysis of climate extreme events. The critical aim is real-time interactive interrogation. At the data-centric level the primary aim is to enable the user to interact with the data in real-time for the purpose of analysis - locally or remotely. TrikeND-iGlobe provides the basis for the incorporation of modular tools that provide extended interactions with the data, including sub-setting, aggregation, re-shaping, time series analysis methods and animation to produce publication-quality imagery. TrikeND-iGlobe may be run locally or can be accessed via a web interface supported by high-performance visualization compute nodes placed close to the data. 
It supports visualizing heterogeneous data formats: traditional geospatial datasets along with scientific datasets that carry geographic coordinates (NetCDF, HDF, etc.). It also supports multiple data access mechanisms, including HTTP, FTP, WMS, WCS, and Thredds Data Server for NetCDF data. For scientific data, TrikeND-iGlobe supports various visualization capabilities, including animations, vector field visualization, etc. TrikeND-iGlobe is a collaborative open-source project; contributors include NASA (ARC-PX), ORNL (Oak Ridge National Laboratory), Unidata, Kansas University, CSIRO CMAR Australia and Geoscience Australia.

  9. Spatio-temporal Eigenvector Filtering: Application on Bioenergy Crop Impacts

    NASA Astrophysics Data System (ADS)

    Wang, M.; Kamarianakis, Y.; Georgescu, M.

    2017-12-01

    A suite of 10-year ensemble-based simulations was conducted to investigate the hydroclimatic impacts due to large-scale deployment of perennial bioenergy crops across the continental United States. Given the large size of the simulated dataset (about 60 TB), traditional hierarchical spatio-temporal statistical modelling cannot be implemented for the evaluation of physics parameterizations and biofuel impacts. In this work, we propose a filtering algorithm that takes into account the spatio-temporal autocorrelation structure of the data while avoiding spatial confounding. This method is used to quantify the robustness of simulated hydroclimatic impacts associated with bioenergy crops to alternative physics parameterizations and observational datasets. Results are evaluated against those obtained from three alternative Bayesian spatio-temporal specifications.

  10. shinyheatmap: Ultra fast low memory heatmap web interface for big data genomics.

    PubMed

    Khomtchouk, Bohdan B; Hennessy, James R; Wahlestedt, Claes

    2017-01-01

    Transcriptomics, metabolomics, metagenomics, and other various next-generation sequencing (-omics) fields are known for their production of large datasets, especially across single-cell sequencing studies. Visualizing such big data has posed technical challenges in biology, both in terms of available computational resources as well as programming acumen. Since heatmaps are used to depict high-dimensional numerical data as a colored grid of cells, efficiency and speed have often proven to be critical considerations in the process of successfully converting data into graphics. For example, rendering interactive heatmaps from large input datasets (e.g., 100k+ rows) has been computationally infeasible on both desktop computers and web browsers. In addition to memory requirements, programming skills and knowledge have frequently been barriers-to-entry for creating highly customizable heatmaps. We propose shinyheatmap: an advanced user-friendly heatmap software suite capable of efficiently creating highly customizable static and interactive biological heatmaps in a web browser. shinyheatmap is a low memory footprint program, making it particularly well-suited for the interactive visualization of extremely large datasets that cannot typically be computed in-memory due to size restrictions. Also, shinyheatmap features a built-in high performance web plug-in, fastheatmap, for rapidly plotting interactive heatmaps of datasets as large as 10^5-10^7 rows within seconds, effectively shattering previous performance benchmarks of heatmap rendering speed. shinyheatmap is hosted online as a freely available web server with an intuitive graphical user interface: http://shinyheatmap.com. The methods are implemented in R, and are available as part of the shinyheatmap project at: https://github.com/Bohdan-Khomtchouk/shinyheatmap. 
Users can access fastheatmap directly from within the shinyheatmap web interface, and all source code has been made publicly available on Github: https://github.com/Bohdan-Khomtchouk/fastheatmap.
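shinyheatmap's internals are not reproduced here, but one standard strategy for making a matrix with 10^5+ rows renderable at interactive speed is to aggregate contiguous row blocks before drawing, so the browser only has to paint a small grid; a hypothetical sketch of that idea:

```python
def downsample_rows(matrix, target_rows):
    """Average contiguous blocks of rows so that a very tall matrix is
    reduced to roughly `target_rows` rows before rendering as a heatmap."""
    n = len(matrix)
    block = max(1, n // target_rows)
    out = []
    for start in range(0, n, block):
        chunk = matrix[start:start + block]
        cols = len(chunk[0])
        out.append([sum(row[c] for row in chunk) / len(chunk) for c in range(cols)])
    return out

tall = [[float(i), 2.0 * i] for i in range(100_000)]  # 100k rows, 2 columns
small = downsample_rows(tall, 100)                    # 100 aggregated rows
```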

  11. The interfacial character of antibody paratopes: analysis of antibody-antigen structures.

    PubMed

    Nguyen, Minh N; Pradhan, Mohan R; Verma, Chandra; Zhong, Pingyu

    2017-10-01

    In this study, computational methods are applied to investigate the general properties of the antigen-engaging residues of a paratope, using a non-redundant dataset of 403 antibody-antigen complexes, to dissect the contribution of hydrogen bonds, hydrophobic and van der Waals contacts and ionic interactions, as well as the role of water molecules at the antigen-antibody interface. Consistent with previous reports using smaller datasets, we found that Tyr, Trp, Ser, Asn, Asp, Thr, Arg, Gly and His contribute substantially to the interactions between antibody and antigen. Furthermore, antibody-antigen interactions can be mediated by interfacial waters. However, there is no reported comprehensive analysis of the large number of structured waters that engage in higher-order structures at the antibody-antigen interface. From our dataset, we have found the presence of interfacial waters in 242 complexes. We present evidence that suggests a compelling role of these interfacial waters in interactions of antibodies with a range of antigens differing in shape complementarity. Finally, we carry out 296 835 pairwise 3D structure comparisons of 771 structures of contact residues of antibodies with their interfacial water molecules from our dataset using the CLICK method. A heuristic clustering algorithm is used to obtain unique structural similarities, which separate into 368 different clusters. These clusters are used to identify structural motifs of contact residues of antibodies for epitope binding. This clustering database of contact residues is freely accessible at http://mspc.bii.a-star.edu.sg/minhn/pclick.html. Contact: minhn@bii.a-star.edu.sg, chandra@bii.a-star.edu.sg or zhong_pingyu@immunol.a-star.edu.sg. Supplementary data are available at Bioinformatics online.

  12. Dynamic Modularity of Host Protein Interaction Networks in Salmonella Typhi Infection

    PubMed Central

    Dhal, Paltu Kumar; Barman, Ranjan Kumar; Saha, Sudipto; Das, Santasabuj

    2014-01-01

    Background Salmonella Typhi is a human-restricted pathogen, which causes typhoid fever and remains a global health problem in developing countries. Although previously reported host expression datasets had identified putative biomarkers and therapeutic targets of typhoid fever, the underlying molecular mechanism of pathogenesis remains incompletely understood. Methods We used five gene expression datasets of human peripheral blood from patients suffering from S. Typhi or other bacteremic infections or a non-infectious disease like leukemia. The expression datasets were merged into a human protein interaction network (PIN) and the expression correlation between the hubs and their interacting proteins was measured by calculating Pearson Correlation Coefficient (PCC) values. The differences in the average PCC for each hub between the disease states and their respective controls were calculated for the studied datasets. The individual hubs and their interactors with expression, PCC and average PCC values were treated as dynamic subnetworks. The hubs that showed unique trends of alteration specific to S. Typhi infection were identified. Results We identified S. Typhi infection-specific dynamic subnetworks of the host, which involve 81 hubs and 1343 interactions. The major enriched GO biological process terms in the identified subnetworks were regulation of apoptosis and biological adhesion, while the enriched pathways include cytokine signalling in the immune system and downstream TCR signalling. The dynamic nature of the hubs CCR1, IRS2 and PRKCA with their interactors was studied in detail. The difference in the dynamics of the subnetworks specific to S. Typhi infection suggests a potential molecular model of typhoid fever. Conclusions Hubs and their interactors of the S. Typhi infection-specific dynamic subnetworks carrying distinct PCC values compared with the non-typhoid and other disease conditions reveal new insight into the pathogenesis of S. Typhi. 
PMID:25144185
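The per-hub average PCC described in the Methods can be sketched as follows (a from-scratch Pearson correlation applied to invented expression profiles, not the authors' code):

```python
from math import sqrt

def pcc(x, y):
    """Pearson Correlation Coefficient of two expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def hub_average_pcc(hub_profile, interactor_profiles):
    """Average PCC between a hub and each of its interacting proteins."""
    vals = [pcc(hub_profile, p) for p in interactor_profiles]
    return sum(vals) / len(vals)

hub = [1.0, 2.0, 3.0, 4.0]
partners = [[2.0, 4.0, 6.0, 8.0],   # PCC = +1
            [4.0, 3.0, 2.0, 1.0]]   # PCC = -1
avg = hub_average_pcc(hub, partners)  # averages to 0.0
```

Comparing each hub's average PCC between disease and control samples then yields the dynamic subnetwork behavior described in the abstract.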

  13. Antineutrino Oscillations and a Search for Non-standard Interactions with the MINOS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Isvan, Zeynep

    2012-01-01

    MINOS searches for neutrino oscillations using the disappearance of muon neutrinos from the NuMI beam at Fermilab between two detectors. The Near Detector, located near the source, measures the beam composition before flavor change occurs. The energy spectrum is measured again at the Far Detector after the neutrinos travel a distance. The mixing angle and mass splitting between the second and third mass states are extracted from the energy-dependent difference between the spectra at the two detectors. NuMI is able to produce an antineutrino-enhanced beam as well as a neutrino-enhanced beam. Collecting data in antineutrino mode allows the direct measurement of antineutrino oscillation parameters. From the analysis of the antineutrino-mode data we measure $|\Delta\bar{m}^{2}_{\text{atm}}| = 2.62^{+0.31}_{-0.28}\times10^{-3}\,\text{eV}^{2}$ and $\sin^{2}(2\bar{\theta}_{23}) = 0.95^{+0.10}_{-0.11}$, which is the most precise measurement of antineutrino oscillation parameters to date. A difference between neutrino and antineutrino oscillation parameters may indicate new physics involving interactions that are not part of the Standard Model, called non-standard interactions, that alter the apparent disappearance probability. Collecting data in neutrino and antineutrino mode independently allows a direct search for non-standard interactions. In this dissertation non-standard interactions are constrained by a combined analysis of neutrino and antineutrino datasets and no evidence of such interactions is found.
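For context, the quoted best-fit values enter the standard two-flavor survival probability, P = 1 - sin^2(2θ) sin^2(1.27 Δm² L / E); a sketch evaluating it with the abstract's parameters and the 735 km Fermilab-to-Soudan baseline (the baseline figure is taken from the experiment's public descriptions, not from this abstract):

```python
import math

def survival_probability(L_km, E_GeV, dm2_eV2, sin2_2theta):
    """Two-flavor muon (anti)neutrino survival probability, with the
    conventional 1.27 factor (L in km, E in GeV, mass splitting in eV^2)."""
    phase = 1.27 * dm2_eV2 * L_km / E_GeV
    return 1.0 - sin2_2theta * math.sin(phase) ** 2

DM2, SIN2_2T = 2.62e-3, 0.95  # antineutrino best-fit values from the abstract
p_dip = survival_probability(735.0, 1.56, DM2, SIN2_2T)   # near oscillation maximum
p_high = survival_probability(735.0, 50.0, DM2, SIN2_2T)  # high energy: little disappearance
```

At the oscillation maximum the survival probability bottoms out near 1 - sin^2(2θ) ≈ 0.05, which is the energy-dependent dip MINOS fits in the Far Detector spectrum.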

  14. EnviroAtlas - Percent Stream Buffer Zone As Natural Land Cover for the Conterminous United States

    EPA Pesticide Factsheets

    This EnviroAtlas dataset shows the percentage of land area within a 30 meter buffer zone along the National Hydrography Dataset (NHD) high resolution stream network, and along water bodies such as lakes and ponds that are connected via flow to the streams, that is classified as forest land cover, modified forest land cover, and natural land cover using the 2006 National Land Cover Dataset (NLCD) for each Watershed Boundary Dataset (WBD) 12-digit hydrological unit (HUC) in the conterminous United States. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).

  15. iASeq: integrative analysis of allele-specificity of protein-DNA interactions in multiple ChIP-seq datasets

    PubMed Central

    2012-01-01

    Background ChIP-seq provides new opportunities to study allele-specific protein-DNA binding (ASB). However, detecting allelic imbalance from a single ChIP-seq dataset often has low statistical power, since only sequence reads mapped to heterozygote SNPs are informative for discriminating the two alleles. Results We develop a new method, iASeq, to address this issue by jointly analyzing multiple ChIP-seq datasets. iASeq uses a Bayesian hierarchical mixture model to learn correlation patterns of allele-specificity among multiple proteins. Using the discovered correlation patterns, the model allows one to borrow information across datasets to improve detection of allelic imbalance. Application of iASeq to 77 ChIP-seq samples from 40 ENCODE datasets and 1 genomic DNA sample in GM12878 cells reveals that the allele-specificity of multiple proteins is highly correlated, and demonstrates the ability of iASeq to improve allelic inference compared to analyzing each individual dataset separately. Conclusions iASeq illustrates the value of integrating multiple datasets in allele-specificity inference and offers a new tool to better analyze ASB. PMID:23194258
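The single-dataset baseline that iASeq improves upon is essentially an exact binomial test on allele-specific read counts at a heterozygous SNP; a minimal sketch of that baseline (not the iASeq hierarchical model itself):

```python
from math import comb

def allelic_imbalance_pvalue(ref_reads, alt_reads):
    """Two-sided exact binomial p-value against the null of balanced
    (50/50) binding of the two alleles at a heterozygous SNP."""
    n = ref_reads + alt_reads
    pmf = [comb(n, k) * 0.5 ** n for k in range(n + 1)]
    observed = pmf[ref_reads]
    # sum probabilities of all outcomes at least as extreme as observed
    return min(1.0, sum(p for p in pmf if p <= observed + 1e-12))

p_skewed = allelic_imbalance_pvalue(18, 2)    # strong reference-allele bias
p_balanced = allelic_imbalance_pvalue(10, 10) # no evidence of imbalance
```

With few reads per SNP this test is underpowered, which is exactly the motivation for borrowing strength across datasets as iASeq does.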

  16. Animated analysis of geoscientific datasets: An interactive graphical application

    NASA Astrophysics Data System (ADS)

    Morse, Peter; Reading, Anya; Lueg, Christopher

    2017-12-01

    Geoscientists are required to analyze and draw conclusions from increasingly large volumes of data. There is a need to recognise and characterise features and changing patterns of Earth observables within such large datasets. It is also necessary to identify significant subsets of the data for more detailed analysis. We present an innovative, interactive software tool and workflow to visualise, characterise, sample and tag large geoscientific datasets from both local and cloud-based repositories. It uses an animated interface and human-computer interaction to utilise the capacity of human expert observers to identify features via enhanced visual analytics. 'Tagger' enables users to analyze datasets that are too large in volume to be drawn legibly on a reasonable number of single static plots. Users interact with the moving graphical display, tagging data ranges of interest for subsequent attention. The tool provides a rapid pre-pass process using fast GPU-based OpenGL graphics and data-handling and is coded in the Quartz Composer visual programming language (VPL) on Mac OS X. It makes use of interoperable data formats, and cloud-based (or local) data storage and compute. In a case study, Tagger was used to characterise a decade (2000-2009) of data recorded by the Cape Sorell Waverider Buoy, located approximately 10 km off the west coast of Tasmania, Australia. These data serve as a proxy for the understanding of Southern Ocean storminess, which has both local and global implications. This example shows use of the tool to identify and characterise four different types of storm and non-storm events during this time. Events characterised in this way are compared with conventional analysis, noting advantages and limitations of data analysis using animation and human interaction. Tagger provides a new ability to make use of humans as feature detectors in computer-based analysis of large-volume geoscience and other data.

  17. Docking pose selection by interaction pattern graph similarity: application to the D3R grand challenge 2015

    NASA Astrophysics Data System (ADS)

    Slynko, Inna; Da Silva, Franck; Bret, Guillaume; Rognan, Didier

    2016-09-01

    High affinity ligands for a given target tend to share key molecular interactions with important anchoring amino acids and therefore often present quite conserved interaction patterns. This simple concept was formalized in a topological knowledge-based scoring function (GRIM) for selecting the most appropriate docking poses from previously X-rayed interaction patterns. GRIM first converts protein-ligand atomic coordinates (docking poses) into a simple 3D graph describing the corresponding interaction pattern. In a second step, the proposed graphs are compared to those found in template structures from the Protein Data Bank. Last, all docking poses are rescored according to an empirical score (GRIMscore) accounting for the overlap of maximum common subgraphs. Taking the opportunity of the public D3R Grand Challenge 2015, GRIM was used to rescore docking poses for 36 ligands (6 HSP90α inhibitors, 30 MAP4K4 inhibitors) prior to the release of the corresponding protein-ligand X-ray structures. When applied to the HSP90α dataset, for which many protein-ligand X-ray structures are already available, GRIM provided very high quality solutions (mean rmsd = 1.06 Å, n = 6) as top-ranked poses, and significantly outperformed a state-of-the-art scoring function. In the case of MAP4K4 inhibitors, for which preexisting 3D knowledge is scarce and chemical diversity is much larger, the accuracy of GRIM poses decreases (mean rmsd = 3.18 Å, n = 30), although GRIM still outperforms an energy-based scoring function. GRIM rescoring appears to be quite robust in comparison with the other approaches competing in the same challenge (42 submissions for the HSP90 dataset, 27 for the MAP4K4 dataset), as it ranked 3rd and 2nd, respectively, for the two investigated datasets. The rescoring method is quite simple to implement, independent of the docking engine, and applicable to any target for which at least one holo X-ray structure is available.
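The GRIMscore formula itself is not spelled out in the abstract; as a crude stand-in for comparing a docking pose's interaction pattern against an X-ray template, one can take a Jaccard overlap of labeled interaction edges (the residue names and interaction types below are invented for illustration):

```python
def pattern_overlap(pose_pattern, template_pattern):
    """Jaccard similarity of two sets of (residue, interaction_type)
    edges -- a simplistic proxy for graph-based pattern rescoring."""
    a, b = set(pose_pattern), set(template_pattern)
    return len(a & b) / len(a | b) if (a | b) else 0.0

pose = {("ASP93", "hbond"), ("PHE138", "aromatic"), ("LYS58", "ionic")}
template = {("ASP93", "hbond"), ("PHE138", "aromatic"), ("THR184", "hbond")}
score = pattern_overlap(pose, template)  # 2 shared edges / 4 total = 0.5
```

The real method operates on 3D graphs and maximum common subgraphs, so geometric placement matters as well as edge labels; this sketch captures only the set-overlap intuition.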

  18. PICKLE 2.0: A human protein-protein interaction meta-database employing data integration via genetic information ontology

    PubMed Central

    Gioutlakis, Aris; Klapa, Maria I.

    2017-01-01

    It has been acknowledged that source databases recording experimentally supported human protein-protein interactions (PPIs) exhibit limited overlap. Thus, the reconstruction of a comprehensive PPI network requires appropriate integration of multiple heterogeneous primary datasets, which present the PPIs at various genetic reference levels. Existing PPI meta-databases perform integration via normalization; namely, PPIs are merged after being converted to a certain target level. Hence, the node set of the integrated network depends each time on the number and type of the combined datasets. Moreover, the irreversible a priori normalization process hinders the identification of normalization artifacts in the integrated network, which originate from the nonlinearity characterizing the genetic information flow. PICKLE (Protein InteraCtion KnowLedgebasE) 2.0 implements a new architecture for this recently introduced human PPI meta-database. Its main novel feature over the existing meta-databases is its approach to primary PPI dataset integration via genetic information ontology. Building upon the PICKLE principles of using the reviewed human complete proteome (RHCP) of UniProtKB/Swiss-Prot as the reference protein interactor set, and filtering out protein interactions with low probability of being direct based on the available evidence, PICKLE 2.0 first assembles the RHCP genetic information ontology network by connecting the corresponding genes, nucleotide sequences (mRNAs) and proteins (UniProt entries) and then integrates PPI datasets by superimposing them on the ontology network without any a priori transformations. Importantly, this process allows the resulting heterogeneous integrated network to be reversibly normalized to any level of genetic reference without loss of the original information, the latter being used for identification of normalization biases, and enables the appraisal of potential false positive interactions through PPI source database cross-checking. 
The PICKLE web-based interface (www.pickle.gr) allows for the simultaneous query of multiple entities and provides integrated human PPI networks at either the protein (UniProt) or the gene level, at three PPI filtering modes. PMID:29023571

  19. The 3D Reference Earth Model: Status and Preliminary Results

    NASA Astrophysics Data System (ADS)

    Moulik, P.; Lekic, V.; Romanowicz, B. A.

    2017-12-01

    In the 20th century, seismologists constructed models of how average physical properties (e.g. density, rigidity, compressibility, anisotropy) vary with depth in the Earth's interior. These one-dimensional (1D) reference Earth models (e.g. PREM) have proven indispensable in earthquake location, imaging of interior structure, understanding material properties under extreme conditions, and as a reference in other fields, such as particle physics and astronomy. Over the past three decades, new datasets motivated more sophisticated efforts that yielded models of how properties vary both laterally and with depth in the Earth's interior. Though these three-dimensional (3D) models exhibit compelling similarities at large scales, differences in the methodology, representation of structure, and dataset upon which they are based have prevented the creation of 3D community reference models. As part of the REM-3D project, we are compiling and reconciling reference seismic datasets of body wave travel-time measurements, fundamental mode and overtone surface wave dispersion measurements, and normal mode frequencies and splitting functions. These reference datasets are being inverted for a long-wavelength, 3D reference Earth model that describes the robust long-wavelength features of mantle heterogeneity. As a community reference model with fully quantified uncertainties and tradeoffs and an associated publicly available dataset, REM-3D will facilitate Earth imaging studies, earthquake characterization, inferences on temperature and composition in the deep interior, and be of improved utility to emerging scientific endeavors, such as neutrino geoscience. Here, we summarize progress made in the construction of the reference long period dataset and present a preliminary version of REM-3D in the upper-mantle. 
In order to determine the level of detail warranted for inclusion in REM-3D, we analyze the spectrum of discrepancies between models inverted with different subsets of the reference dataset. This procedure allows us to evaluate the extent of consistency in imaging heterogeneity at various depths and between spatial scales.

  20. The BioGRID interaction database: 2017 update

    PubMed Central

    Chatr-aryamontri, Andrew; Oughtred, Rose; Boucher, Lorrie; Rust, Jennifer; Chang, Christie; Kolas, Nadine K.; O'Donnell, Lara; Oster, Sara; Theesfeld, Chandra; Sellam, Adnane; Stark, Chris; Breitkreutz, Bobby-Joe; Dolinski, Kara; Tyers, Mike

    2017-01-01

    The Biological General Repository for Interaction Datasets (BioGRID: https://thebiogrid.org) is an open access database dedicated to the annotation and archival of protein, genetic and chemical interactions for all major model organism species and humans. As of September 2016 (build 3.4.140), the BioGRID contains 1 072 173 genetic and protein interactions, and 38 559 post-translational modifications, as manually annotated from 48 114 publications. This dataset represents interaction records for 66 model organisms and represents a 30% increase compared to the previous 2015 BioGRID update. BioGRID curates the biomedical literature for major model organism species, including humans, with a recent emphasis on central biological processes and specific human diseases. To facilitate network-based approaches to drug discovery, BioGRID now incorporates 27 501 chemical–protein interactions for human drug targets, as drawn from the DrugBank database. A new dynamic interaction network viewer allows the easy navigation and filtering of all genetic and protein interaction data, as well as of bioactive compounds and their established targets. BioGRID data are directly downloadable without restriction in a variety of standardized formats and are freely distributed through partner model organism databases and meta-databases. PMID:27980099

  1. The Human and Physical Determinants of Wildfires and Burnt Areas in Israel

    NASA Astrophysics Data System (ADS)

    Levin, Noam; Tessler, Naama; Smith, Andrew; McAlpine, Clive

    2016-09-01

    Wildfires are expected to increase in Mediterranean landscapes as a result of climate change and changes in land-use practices. In order to advance our understanding of human and physical factors shaping spatial patterns of wildfires in the region, we compared two independently generated datasets of wildfires for Israel that cover approximately the same study period. We generated a site-based dataset containing the location of 10,879 wildfires (1991-2011), and compared it to a dataset of burnt areas derived from MODIS imagery (2000-2011). We hypothesized that the physical and human factors explaining the spatial distribution of burnt areas derived from remote sensing (mostly large fires, >100 ha) will differ from those explaining site-based wildfires recorded by national agencies (mostly small fires, <10 ha). Small wildfires recorded by forestry agencies were concentrated within planted forests and near built-up areas, whereas the largest wildfires were located in more remote regions, often associated with military training areas and herbaceous vegetation. We conclude that to better understand wildfire dynamics, consolidation of wildfire databases should be achieved, combining field reports and remote sensing. As nearly all wildfires in Mediterranean landscapes are caused by human activities, improving the management of forest areas and raising public awareness to fire risk are key considerations in reducing fire danger.

  2. The Human and Physical Determinants of Wildfires and Burnt Areas in Israel.

    PubMed

    Levin, Noam; Tessler, Naama; Smith, Andrew; McAlpine, Clive

    2016-09-01

    Wildfires are expected to increase in Mediterranean landscapes as a result of climate change and changes in land-use practices. In order to advance our understanding of human and physical factors shaping spatial patterns of wildfires in the region, we compared two independently generated datasets of wildfires for Israel that cover approximately the same study period. We generated a site-based dataset containing the location of 10,879 wildfires (1991-2011), and compared it to a dataset of burnt areas derived from MODIS imagery (2000-2011). We hypothesized that the physical and human factors explaining the spatial distribution of burnt areas derived from remote sensing (mostly large fires, >100 ha) will differ from those explaining site-based wildfires recorded by national agencies (mostly small fires, <10 ha). Small wildfires recorded by forestry agencies were concentrated within planted forests and near built-up areas, whereas the largest wildfires were located in more remote regions, often associated with military training areas and herbaceous vegetation. We conclude that to better understand wildfire dynamics, consolidation of wildfire databases should be achieved, combining field reports and remote sensing. As nearly all wildfires in Mediterranean landscapes are caused by human activities, improving the management of forest areas and raising public awareness to fire risk are key considerations in reducing fire danger.

  3. Development of a large-sample catchment-scale hydro-meteorological, land cover and physical dataset for Chile

    NASA Astrophysics Data System (ADS)

    Alvarez-Garreton, C. D.; Mendoza, P. A.; Zambrano-Bigiarini, M.; Galleguillos, M. H.; Boisier, J. P.; Lara, A.; Cortés, G.; Garreaud, R.; McPhee, J. P.; Addor, N.; Puelma, C.

    2017-12-01

    We provide the first catchment-based hydrometeorological, vegetation and physical data set over 531 catchments in Chile (17.8 S - 55.0 S). We compiled publicly available streamflow records at daily time steps for the period 1980-2015, and generated basin-averaged time series of the following hydrometeorological variables: 1) daily precipitation from three different gridded sources (re-analysis and satellite-based); 2) daily maximum and minimum temperature; 3) 8-day potential evapotranspiration (PET) based on MODIS imagery and daily PET based on the Hargreaves formula; and 4) daily snow water equivalent. Additionally, catchments are characterized by their main physical attributes (area, mean elevation, mean slope) and land cover characteristics. We synthesized these datasets with several indices characterizing the spatial distribution of climatic, hydrological, topographic and vegetation attributes. The new catchment-based dataset is unprecedented in the region and provides information that can be used in a myriad of applications, including catchment classification and regionalization studies, impacts of different land cover types on catchment response, characterization of drought history and projections, climate change impacts on hydrological processes, etc. Derived practical applications include water management and allocation strategies, decision making and adaptation planning to climate change. This data set will be publicly available and we encourage the community to use it.
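The basin-averaging step described above is conceptually simple; here is a minimal sketch using synthetic arrays (random precipitation and a toy catchment mask) rather than the actual Chilean data:

```python
import numpy as np

# Hypothetical illustration of building a basin-averaged daily series
# from a gridded precipitation field. `precip` is (n_days, n_lat, n_lon);
# `basin_mask` flags grid cells inside one catchment. Both are synthetic
# stand-ins, not the real dataset described in the record.
rng = np.random.default_rng(0)
precip = rng.gamma(shape=0.5, scale=4.0, size=(365, 20, 20))  # mm/day
basin_mask = np.zeros((20, 20), dtype=bool)
basin_mask[5:15, 8:16] = True  # toy catchment footprint

# Basin-averaged daily series: mean over the masked cells for each day.
basin_precip = precip[:, basin_mask].mean(axis=1)
print(basin_precip.shape)  # (365,)
```

In a real workflow the mask would come from a catchment polygon rasterized onto the precipitation grid, and cells would typically be area-weighted.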

  4. Predicting drug-target interactions by dual-network integrated logistic matrix factorization

    NASA Astrophysics Data System (ADS)

    Hao, Ming; Bryant, Stephen H.; Wang, Yanli

    2017-01-01

    In this work, we propose a dual-network integrated logistic matrix factorization (DNILMF) algorithm to predict potential drug-target interactions (DTI). The prediction procedure consists of four steps: (1) inferring new drug/target profiles and constructing the profile kernel matrix; (2) diffusing the drug profile kernel matrix with the drug structure kernel matrix; (3) diffusing the target profile kernel matrix with the target sequence kernel matrix; and (4) building the DNILMF model and smoothing new drug/target predictions based on their neighbors. We compare our algorithm with the state-of-the-art method on the benchmark dataset. Results indicate that the DNILMF algorithm outperforms previously reported approaches in terms of AUPR (area under the precision-recall curve) and AUC (area under the receiver operating characteristic curve) based on 5 trials of 10-fold cross-validation. We conclude that the performance improvement depends not only on the proposed objective function, but also on the nonlinear diffusion technique used, which is important but understudied in the DTI prediction field. In addition, we also compile a new DTI dataset to increase the diversity of currently available benchmark datasets. The top prediction results for the new dataset are confirmed by experimental studies or supported by other computational research.
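The logistic link at the core of logistic matrix factorization can be sketched in a few lines: each drug and each target gets a latent vector, and the interaction probability is the sigmoid of their inner product. The kernel-diffusion and neighborhood-smoothing steps of DNILMF are omitted here, and the factors are random stand-ins rather than trained values:

```python
import numpy as np

# Minimal sketch of the logistic matrix factorization (LMF) link.
# U and V are untrained random latent factors, for illustration only.
rng = np.random.default_rng(1)
n_drugs, n_targets, k = 5, 4, 3
U = rng.normal(size=(n_drugs, k))    # drug latent factors
V = rng.normal(size=(n_targets, k))  # target latent factors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# (n_drugs, n_targets) matrix of predicted interaction probabilities.
P = sigmoid(U @ V.T)
print(P.shape)
```

Training would fit U and V by maximizing a weighted logistic likelihood over observed interactions; DNILMF additionally fuses drug-structure and target-sequence kernels into this step.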

  5. MiSTIC, an integrated platform for the analysis of heterogeneity in large tumour transcriptome datasets.

    PubMed

    Lemieux, Sebastien; Sargeant, Tobias; Laperrière, David; Ismail, Houssam; Boucher, Geneviève; Rozendaal, Marieke; Lavallée, Vincent-Philippe; Ashton-Beaucage, Dariel; Wilhelm, Brian; Hébert, Josée; Hilton, Douglas J; Mader, Sylvie; Sauvageau, Guy

    2017-07-27

    Genome-wide transcriptome profiling has enabled non-supervised classification of tumours, revealing different sub-groups characterized by specific gene expression features. However, the biological significance of these subtypes remains for the most part unclear. We describe herein an interactive platform, Minimum Spanning Trees Inferred Clustering (MiSTIC), that integrates the direct visualization and comparison of the gene correlation structure between datasets, the analysis of the molecular causes underlying co-variations in gene expression in cancer samples, and the clinical annotation of tumour sets defined by the combined expression of selected biomarkers. We have used MiSTIC to highlight the roles of specific transcription factors in breast cancer subtype specification, to compare the aspects of tumour heterogeneity targeted by different prognostic signatures, and to highlight biomarker interactions in AML. A version of MiSTIC preloaded with datasets described herein can be accessed through a public web server (http://mistic.iric.ca); in addition, the MiSTIC software package can be obtained (github.com/iric-soft/MiSTIC) for local use with personalized datasets. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
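The minimum-spanning-tree idea behind MiSTIC-style clustering can be illustrated on synthetic data: build an MST over pairwise correlation distances between genes, then cut the longest edges so the tree falls apart into clusters. This is a generic sketch of the technique, not the MiSTIC implementation:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import minimum_spanning_tree, connected_components

# Two synthetic co-expression groups of three "genes" each.
rng = np.random.default_rng(2)
base1, base2 = rng.normal(size=50), rng.normal(size=50)
expr = np.array([base1 + 0.1 * rng.normal(size=50) for _ in range(3)] +
                [base2 + 0.1 * rng.normal(size=50) for _ in range(3)])

dist = 1.0 - np.corrcoef(expr)       # correlation distance between genes
mst = minimum_spanning_tree(dist).toarray()
mst[mst > 0.5] = 0.0                  # cut long edges (toy threshold)
n_clusters, labels = connected_components(csr_matrix(mst), directed=False)
print(n_clusters)  # 2
```

Cutting the single long cross-group edge leaves the two co-expressed groups as separate components.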

  6. Physical fitness and academic performance: empirical evidence from the National Administrative Senior High School Student Data in Taiwan.

    PubMed

    Liao, Pei-An; Chang, Hung-Hao; Wang, Jiun-Hao; Wu, Min-Chen

    2013-06-01

    This study examined the relationship between changes in physical fitness across the 3-year span of senior high school study and academic performance measured by standardized tests in Taiwan. A unique dataset of 149 240 university-bound senior high school students from 2009 to 2011 was constructed by merging two nationwide administrative datasets of physical fitness test performance and university entrance exam scores. Hierarchical linear regression models were used. All regressions included controls for students' baseline physical fitness status, changes of physical fitness performance over time, age and family economic status. Some notable findings were revealed. An increase of 1 SD in students' overall physical fitness from the first to third school year is associated with an increase in the university entrance exam scores by 0.007 and 0.010 SD for male and female students, respectively. An increase of 1 SD in anaerobic power (flexibility) from the first to third school year is positively associated with an increase in the university entrance exam scores by 0.018 (0.010) SD among female students. We suggest that education and school health policymakers should consider and design policies to improve physical fitness as part of their overall strategy of improving academic performance.
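The effect sizes quoted above are standardized coefficients: both predictor and outcome are expressed in SD units, so a slope of b means "a 1 SD change in fitness is associated with a b SD change in exam score". A toy sketch with synthetic data (the real study used hierarchical models with covariates) shows the mechanics:

```python
import numpy as np

# Synthetic stand-in data with a deliberately tiny true effect (0.01 SD),
# of the same order as the effects reported in the record.
rng = np.random.default_rng(5)
fitness_change = rng.normal(size=1000)
exam = 0.01 * fitness_change + rng.normal(size=1000)

def zscore(x):
    return (x - x.mean()) / x.std()

# Slope of a simple regression on z-scored variables = standardized
# coefficient (here equal to the Pearson correlation).
b = np.polyfit(zscore(fitness_change), zscore(exam), 1)[0]
print(round(b, 3))
```

Such small standardized effects can still be statistically significant in administrative datasets with hundreds of thousands of students, which is why the study's sample size matters for interpretation.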

  7. Teaching the Thrill of Discovery: Student Exploration of the Large-Scale Structures of the Universe

    NASA Astrophysics Data System (ADS)

    Juneau, Stephanie; Dey, Arjun; Walker, Constance E.; NOAO Data Lab

    2018-01-01

    In collaboration with the Teen Astronomy Cafes program, the NOAO Data Lab is developing online Jupyter Notebooks as a free and publicly accessible tool for students and teachers. Each interactive activity teaches students simultaneously about coding and astronomy, with a focus on large datasets. Students thus learn state-of-the-art techniques at the intersection of astronomy and data science. During the activity entitled “Our Vast Universe”, students use real spectroscopic data to measure the distance to galaxies before moving on to a catalog with distances to over 100,000 galaxies. Exploring this dataset gives students an appreciation of the large number of galaxies in the universe (2 trillion!), and leads them to discover how galaxies are located in large and impressive filamentary structures. During the Teen Astronomy Cafes program, the notebook is supplemented with visual material conducive to discussion, and hands-on activities involving cubes representing model universes. These steps help build the students’ physical intuition and give them a better grasp of the concepts before using software and coding. At the end of the activity, students have made their own measurements and have experienced scientific research directly. More information is available online for the Teen Astronomy Cafes (teensciencecafe.org/cafes) and the NOAO Data Lab (datalab.noao.edu).
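The classroom measurement of galaxy distances typically works by reading off the observed wavelength of a known emission line, computing the redshift, and applying Hubble's law. A toy version (the spectral line and measured wavelength below are illustrative, not taken from the actual notebook):

```python
# Redshift from a shifted emission line, then distance via Hubble's law
# (v = H0 * d, a good approximation for nearby galaxies).
H0 = 70.0           # Hubble constant, km/s/Mpc
c = 299_792.458     # speed of light, km/s
rest = 6562.8       # H-alpha rest wavelength, Angstroms
observed = 6783.0   # hypothetical measured wavelength, Angstroms

z = (observed - rest) / rest   # redshift
velocity = c * z               # recession velocity, km/s
distance = velocity / H0       # distance in Mpc
print(round(z, 4), round(distance, 1))
```

Repeating this for thousands of galaxies and plotting their positions is what reveals the filamentary large-scale structure the activity aims to show.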

  8. GSNFS: Gene subnetwork biomarker identification of lung cancer expression data.

    PubMed

    Doungpan, Narumol; Engchuan, Worrawat; Chan, Jonathan H; Meechai, Asawin

    2016-12-05

    Gene expression has been used to identify disease gene biomarkers, but there are ongoing challenges. Single-gene or gene-set biomarkers are inadequate to provide sufficient understanding of complex disease mechanisms and the relationships among those genes. Network-based methods have thus been considered for inferring the interactions within a group of genes to further study the disease mechanism. Recently, the Gene-Network-based Feature Set (GNFS) method, which is capable of handling case-control and multiclass expression data for gene biomarker identification, has been proposed, partly taking network topology into account. However, its performance relies on a greedy search for building subnetworks and thus requires further improvement. In this work, we establish a new approach named Gene Sub-Network-based Feature Selection (GSNFS) by implementing the GNFS framework with two proposed searching and scoring algorithms, namely gene-set-based (GS) search and parent-node-based (PN) search, to identify subnetworks. An additional dataset is used to validate the results. The two proposed searching algorithms of the GSNFS method for subnetwork expansion are concerned with the degree of connectivity and the scoring scheme for building subnetworks and their topology. For each iteration of expansion, the neighbour genes of the current subnetwork whose expression data improve the overall subnetwork score are recruited. While the GS search calculates the subnetwork score using an activity score of the current subnetwork and the gene expression values of its neighbours, the PN search uses the expression value of the corresponding parent of each neighbour gene. Four lung cancer expression datasets were used for subnetwork identification. In addition, the use of pathway data and protein-protein interactions as network data, to account for the interactions among significant genes, is discussed. 
Classification was performed to compare the performance of the identified gene subnetworks with three subnetwork identification algorithms. The two searching algorithms resulted in better classification and gene/gene-set agreement compared to the original greedy search of the GNFS method. The identified lung cancer subnetwork using the proposed searching algorithm resulted in an improvement of the cross-dataset validation and an increase in the consistency of findings between two independent datasets. The homogeneity measurement of the datasets was conducted to assess dataset compatibility in cross-dataset validation. The lung cancer dataset with higher homogeneity showed a better result when using the GS search while the dataset with low homogeneity showed a better result when using the PN search. The 10-fold cross-dataset validation on the independent lung cancer datasets showed higher classification performance of the proposed algorithms when compared with the greedy search in the original GNFS method. The proposed searching algorithms provide a higher number of genes in the subnetwork expansion step than the greedy algorithm. As a result, the performance of the subnetworks identified from the GSNFS method was improved in terms of classification performance and gene/gene-set level agreement depending on the homogeneity of the datasets used in the analysis. Some common genes obtained from the four datasets using different searching algorithms are genes known to play a role in lung cancer. The improvement of classification performance and the gene/gene-set level agreement, and the biological relevance indicated the effectiveness of the GSNFS method for gene subnetwork identification using expression data.
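The expansion loop shared by GNFS/GSNFS-style methods can be sketched generically: start from a seed gene, repeatedly add the network neighbour whose expression most improves the subnetwork score, and stop when no neighbour helps. The score below (difference in mean subnetwork activity between case and control samples) and the toy network are illustrative stand-ins for the paper's scoring schemes:

```python
import numpy as np

# Toy gene network and expression data; genes A and B carry real signal.
rng = np.random.default_rng(3)
genes = ["A", "B", "C", "D", "E"]
neighbours = {"A": {"B", "C"}, "B": {"A", "D"}, "C": {"A"},
              "D": {"B", "E"}, "E": {"D"}}
labels = np.array([0] * 10 + [1] * 10)          # control/case labels
expr = {g: rng.normal(size=20) for g in genes}
for g in ("A", "B"):
    expr[g][labels == 1] += 2.0                 # inject case signal

def score(subnet):
    # Subnetwork "activity" = mean expression over member genes.
    activity = np.mean([expr[g] for g in subnet], axis=0)
    return abs(activity[labels == 1].mean() - activity[labels == 0].mean())

subnet = {"A"}                                  # seed gene
while True:
    frontier = set().union(*(neighbours[g] for g in subnet)) - subnet
    best = max(frontier, key=lambda g: score(subnet | {g}), default=None)
    if best is None or score(subnet | {best}) <= score(subnet):
        break
    subnet.add(best)
print(sorted(subnet))
```

The GS and PN variants differ precisely in how `score` is computed for a candidate neighbour, not in the overall expand-until-no-improvement loop.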

  9. Toward a complete dataset of drug-drug interaction information from publicly available sources.

    PubMed

    Ayvaz, Serkan; Horn, John; Hassanzadeh, Oktie; Zhu, Qian; Stan, Johann; Tatonetti, Nicholas P; Vilar, Santiago; Brochhausen, Mathias; Samwald, Matthias; Rastegar-Mojarad, Majid; Dumontier, Michel; Boyce, Richard D

    2015-06-01

    Although potential drug-drug interactions (PDDIs) are a significant source of preventable drug-related harm, there is currently no single complete source of PDDI information. In the current study, all publicly available sources of PDDI information that could be identified using a comprehensive and broad search were combined into a single dataset. The combined dataset merged fourteen different sources including 5 clinically-oriented information sources, 4 Natural Language Processing (NLP) corpora, and 5 bioinformatics/pharmacovigilance information sources. As a comprehensive PDDI source, the merged dataset might benefit the pharmacovigilance text mining community by making it possible to compare the representativeness of NLP corpora for PDDI text extraction tasks, and by specifying elements that can be useful for future PDDI extraction purposes. An analysis of the overlap between and across the data sources showed that there was little overlap. Even comprehensive PDDI lists such as DrugBank, KEGG, and the NDF-RT had less than 50% overlap with each other. Moreover, all of the comprehensive lists had incomplete coverage of two data sources that focus on PDDIs of interest in most clinical settings. Based on this information, we think that systems that provide access to the comprehensive lists, such as APIs into RxNorm, should be careful to inform users that the lists may be incomplete with respect to PDDIs that drug experts suggest clinicians be aware of. In spite of the low degree of overlap, several dozen cases were identified where PDDI information provided in drug product labeling might be augmented by the merged dataset. Moreover, the combined dataset was also shown to improve the performance of an existing PDDI NLP pipeline and a recently published PDDI pharmacovigilance protocol. 
Future work will focus on improvement of the methods for mapping between PDDI information sources, identifying methods to improve the use of the merged dataset in PDDI NLP algorithms, integrating high-quality PDDI information from the merged dataset into Wikidata, and making the combined dataset accessible as Semantic Web Linked Data. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
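The overlap analysis described above reduces each source to a set of unordered drug pairs and compares the sets. A minimal sketch, with illustrative drug names standing in for normalized identifiers, using frozensets so that A-B and B-A count as the same pair:

```python
# Two hypothetical PDDI sources as sets of unordered drug pairs.
source_a = {frozenset(p) for p in [("warfarin", "aspirin"),
                                   ("simvastatin", "clarithromycin"),
                                   ("digoxin", "amiodarone")]}
source_b = {frozenset(p) for p in [("aspirin", "warfarin"),
                                   ("digoxin", "verapamil")]}

# Overlap and Jaccard index between the two sources.
overlap = source_a & source_b
jaccard = len(overlap) / len(source_a | source_b)
print(len(overlap), round(jaccard, 2))  # 1 0.25
```

In the actual study, drug names must first be mapped to a common vocabulary (e.g. RxNorm concepts) before such set comparisons are meaningful, which is why the mapping step is singled out as future work.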

  10. LiDAR Vegetation Investigation and Signature Analysis System (LVISA)

    NASA Astrophysics Data System (ADS)

    Höfle, Bernhard; Koenig, Kristina; Griesbaum, Luisa; Kiefer, Andreas; Hämmerle, Martin; Eitel, Jan; Koma, Zsófia

    2015-04-01

    Our physical environment undergoes constant changes in space and time with strongly varying triggers, frequencies, and magnitudes. Monitoring these environmental changes is crucial to improve our scientific understanding of complex human-environmental interactions and helps us to respond to environmental change by adaptation or mitigation. The three-dimensional (3D) description of Earth surface features and the detailed monitoring of surface processes using 3D spatial data have gained increasing attention within the last decades, such as in climate change research (e.g., glacier retreat), carbon sequestration (e.g., forest biomass monitoring), precision agriculture and natural hazard management. In all those areas, 3D data have helped to improve our process understanding by allowing us to quantify the structural properties of Earth surface features and their changes over time. This advancement has been fostered by technological developments and the increased availability of 3D sensing systems. In particular, LiDAR (light detection and ranging) technology, also referred to as laser scanning, has made significant progress and has evolved into an operational tool in environmental research and geosciences. The main result of LiDAR measurements is a highly spatially resolved 3D point cloud. Each point within the LiDAR point cloud has an XYZ coordinate associated with it and often additional information such as the strength of the returned backscatter. The point cloud provided by LiDAR contains rich geospatial, structural, and potentially biochemical information about the surveyed objects. To deal with the inherently unorganized structure and the large data volume (frequently millions of XYZ coordinates) of LiDAR datasets, a multitude of algorithms for automatic 3D object detection (e.g., of single trees) and physical surface description (e.g., biomass) have been developed. 
However, the exchange of datasets and approaches (i.e., extraction algorithms) among LiDAR users so far lags behind. We propose a novel concept, the LiDAR Vegetation Investigation and Signature Analysis System (LVISA), which shall enhance the sharing of i) reference datasets of single vegetation objects with rich reference data (e.g., plant species, basic plant morphometric information) and ii) approaches for information extraction (e.g., single tree detection, tree species classification based on waveform LiDAR features). We will build an extensive LiDAR data repository to support the development and benchmarking of LiDAR-based object information extraction. LVISA uses international web service standards (Open Geospatial Consortium, OGC) for geospatial data access and analysis (e.g., OGC Web Processing Services). This will allow the research community to identify plant-object-specific vegetation features from LiDAR data, while accounting for differences in LiDAR systems (e.g., beam divergence), settings (e.g., point spacing), and calibration techniques. It is the goal of LVISA to develop generic 3D information extraction approaches, which can be seamlessly transferred to other datasets, timestamps and extraction tasks. The current prototype of LVISA can be visited and tested online via http://uni-heidelberg.de/lvisa. Video tutorials provide a quick overview of and entry into the functionality of LVISA. We will present the current advances of LVISA and highlight future research and extensions of LVISA, such as integrating low-cost LiDAR data and datasets acquired by highly temporal scanning of vegetation (e.g., continuous measurements). Everybody is invited to join the LVISA development and share datasets and analysis approaches in an interoperable way via the web-based LVISA geoportal.
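One of the extraction tasks mentioned above, single-tree detection, is often implemented as local-maximum finding on a rasterized canopy height model (CHM). A toy illustration with a synthetic CHM (real workflows rasterize the point cloud first and tune the window size to crown diameter):

```python
import numpy as np
from scipy.ndimage import maximum_filter

# Synthetic 40x40 canopy height model with three Gaussian "crowns".
chm = np.zeros((40, 40))
rr, cc = np.mgrid[0:40, 0:40]
for (r, c, h) in [(10, 10, 18.0), (25, 30, 22.0), (30, 8, 15.0)]:
    chm += h * np.exp(-((rr - r) ** 2 + (cc - c) ** 2) / 8.0)

# A cell is a tree top if it equals the maximum of its 7x7 window
# and exceeds a minimum height threshold (here 2 m).
local_max = (chm == maximum_filter(chm, size=7)) & (chm > 2.0)
n_trees = int(local_max.sum())
print(n_trees)  # 3
```

Benchmarking such detectors across sensors and point densities is exactly the kind of comparison a shared repository like LVISA is meant to enable.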

  11. Joining the yellow hub: Uses of the Simple Application Messaging Protocol in Space Physics analysis tools

    NASA Astrophysics Data System (ADS)

    Génot, V.; André, N.; Cecconi, B.; Bouchemit, M.; Budnik, E.; Bourrel, N.; Gangloff, M.; Dufourg, N.; Hess, S.; Modolo, R.; Renard, B.; Lormant, N.; Beigbeder, L.; Popescu, D.; Toniutti, J.-P.

    2014-11-01

    Interest in data communication between analysis tools in planetary sciences and space physics is illustrated in this paper via several examples of the uses of SAMP. The Simple Application Messaging Protocol was developed within the framework of the IVOA from an earlier protocol called PLASTIC. SAMP enables easy communication and interoperability between astronomy software, stand-alone and web-based; it is now increasingly adopted by the planetary sciences and space physics community. Its attractiveness is based, on one hand, on the use of common file formats for exchange and, on the other hand, on established messaging models. Examples of uses at the CDPP and elsewhere are presented. The CDPP (Centre de Données de la Physique des Plasmas, http://cdpp.eu/), the French data center for plasma physics, has been engaged for more than a decade in the archiving and dissemination of data products from space missions and ground observatories. Besides these activities, the CDPP has developed services like AMDA (Automated Multi Dataset Analysis, http://amda.cdpp.eu/), which enables in-depth analysis of large amounts of data through dedicated functionalities such as visualization, conditional search and cataloging. Besides AMDA, the 3DView (http://3dview.cdpp.eu/) tool provides immersive visualizations and is being further developed to include simulation and observational data. These tools and their interactions with each other, notably via SAMP, are presented via science cases of interest to the planetary sciences and space physics communities.
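A SAMP message is essentially a small mapping: an mtype naming the action plus string parameters. The widely supported "table.load.votable" mtype, used when one tool asks another to load a table, looks like this (the URL and identifiers below are illustrative):

```python
# Structure of a SAMP message per the SAMP specification.
message = {
    "samp.mtype": "table.load.votable",
    "samp.params": {
        "url": "http://example.org/catalog.vot",  # hypothetical table URL
        "table-id": "catalog-1",
        "name": "Example catalog",
    },
}

# With a SAMP hub running, astropy's client could broadcast it, e.g.:
#   from astropy.samp import SAMPIntegratedClient
#   client = SAMPIntegratedClient()
#   client.connect()
#   client.notify_all(message)
print(message["samp.mtype"])  # table.load.votable
```

This decoupling — a hub relaying typed messages between otherwise independent applications — is what lets AMDA, 3DView, and astronomy tools such as TOPCAT interoperate without knowing each other's internals.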

  12. Scalable and Interactive Segmentation and Visualization of Neural Processes in EM Datasets

    PubMed Central

    Jeong, Won-Ki; Beyer, Johanna; Hadwiger, Markus; Vazquez, Amelio; Pfister, Hanspeter; Whitaker, Ross T.

    2011-01-01

    Recent advances in scanning technology provide high resolution EM (Electron Microscopy) datasets that allow neuroscientists to reconstruct complex neural connections in a nervous system. However, due to the enormous size and complexity of the resulting data, segmentation and visualization of neural processes in EM data is usually a difficult and very time-consuming task. In this paper, we present NeuroTrace, a novel EM volume segmentation and visualization system that consists of two parts: a semi-automatic multiphase level set segmentation with 3D tracking for reconstruction of neural processes, and a specialized volume rendering approach for visualization of EM volumes. It employs view-dependent on-demand filtering and evaluation of a local histogram edge metric, as well as on-the-fly interpolation and ray-casting of implicit surfaces for segmented neural structures. Both methods are implemented on the GPU for interactive performance. NeuroTrace is designed to be scalable to large datasets and data-parallel hardware architectures. A comparison of NeuroTrace with a commonly used manual EM segmentation tool shows that our interactive workflow is faster and easier to use for the reconstruction of complex neural processes. PMID:19834227

  13. EnviroAtlas - Memphis, TN - Tree Cover Configuration and Connectivity, Water Background

    EPA Pesticide Factsheets

    This EnviroAtlas dataset categorizes forest land cover into structural elements (e.g. core, edge, connector, etc.). Forest is defined as Trees & Forest and Woody Wetlands. Water was considered background (value 129) during the analysis to create this dataset; however, it has been converted to value 10 to distinguish it from land-area background. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).
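The water-background conversion described above amounts to a simple raster reclassification: cells carrying the analysis background value (129) become value 10 so they stay distinct from land-area background. A sketch on a tiny illustrative grid:

```python
import numpy as np

# Toy raster: 129 marks water background, other values are structural
# classes. Reclassify 129 -> 10 in place.
grid = np.array([[129, 1, 2],
                 [129, 3, 129],
                 [4, 5, 6]])
grid[grid == 129] = 10
print(grid.tolist())  # [[10, 1, 2], [10, 3, 10], [4, 5, 6]]
```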

  14. Simulation of Smart Home Activity Datasets

    PubMed Central

    Synnott, Jonathan; Nugent, Chris; Jeffers, Paul

    2015-01-01

    A globally ageing population is resulting in an increased prevalence of chronic conditions which affect older adults. Such conditions require long-term care and management to maximize quality of life, placing an increasing strain on healthcare resources. Intelligent environments such as smart homes facilitate long-term monitoring of activities in the home through the use of sensor technology. Access to sensor datasets is necessary for the development of novel activity monitoring and recognition approaches. Access to such datasets is limited due to issues such as sensor cost, availability and deployment time. The use of simulated environments and sensors may address these issues and facilitate the generation of comprehensive datasets. This paper provides a review of existing approaches for the generation of simulated smart home activity datasets, including model-based approaches and interactive approaches which implement virtual sensors, environments and avatars. The paper also provides recommendations for future work in intelligent environment simulation. PMID:26087371

  15. EnviroAtlas - Austin, TX - Land Cover by Block Group

    EPA Pesticide Factsheets

    This EnviroAtlas dataset describes the percentage of each block group that is classified as impervious, forest, green space, and agriculture. Forest is defined as Trees & Forest. Green space is defined as Trees & Forest, Grass & Herbaceous, and Agriculture. This dataset also includes the area per capita for each block group for some land cover types. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).

  16. EnviroAtlas - Austin, TX - Impervious Proximity Gradient

    EPA Pesticide Factsheets

    At any given 1-square-meter point in this EnviroAtlas dataset, the value shown gives the percentage of impervious surface within the 1 square kilometer centered over the given point. Water is shown as '-99999' in this dataset to distinguish it from land areas with low impervious cover. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).
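The gradient described above is a moving-window statistic: for each cell, the mean of a binary impervious mask over a square window centered on it. A sketch with a toy 5x5 window standing in for the 1 km² neighborhood:

```python
import numpy as np
from scipy.ndimage import uniform_filter

# Synthetic binary impervious mask (1 = impervious, 0 = pervious).
rng = np.random.default_rng(4)
impervious = (rng.random((20, 20)) < 0.3).astype(float)

# Percentage of impervious surface within a 5x5 window at each cell.
pct = 100.0 * uniform_filter(impervious, size=5)
print(pct.shape)
```

A production workflow would additionally mask water cells with the -99999 sentinel after the filtering step.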

  17. Simulation of Smart Home Activity Datasets.

    PubMed

    Synnott, Jonathan; Nugent, Chris; Jeffers, Paul

    2015-06-16

    A globally ageing population is resulting in an increased prevalence of chronic conditions which affect older adults. Such conditions require long-term care and management to maximize quality of life, placing an increasing strain on healthcare resources. Intelligent environments such as smart homes facilitate long-term monitoring of activities in the home through the use of sensor technology. Access to sensor datasets is necessary for the development of novel activity monitoring and recognition approaches. Access to such datasets is limited due to issues such as sensor cost, availability and deployment time. The use of simulated environments and sensors may address these issues and facilitate the generation of comprehensive datasets. This paper provides a review of existing approaches for the generation of simulated smart home activity datasets, including model-based approaches and interactive approaches which implement virtual sensors, environments and avatars. The paper also provides recommendations for future work in intelligent environment simulation.

  18. Assessment of Irrigation Physics in a Land Surface Modeling Framework Using Non-Traditional and Human-Practice Datasets

    NASA Technical Reports Server (NTRS)

    Lawston, Patricia M.; Santanello, Joseph A.; Rodell, Matthew; Franz, Trenton E.

    2017-01-01

    Irrigation increases soil moisture, which in turn controls water and energy fluxes from the land surface to the planetary boundary layer and determines plant stress and productivity. Therefore, developing a realistic representation of irrigation is critical to understanding land-atmosphere interactions in agricultural areas. Irrigation parameterizations are becoming more common in land surface models and are growing in sophistication, but there is difficulty in assessing the realism of these schemes, due to limited observations (e.g., soil moisture, evapotranspiration) and scant reporting of irrigation timing and quantity. This study uses the Noah land surface model run at high resolution within NASA's Land Information System to assess the physics of a sprinkler irrigation simulation scheme and model sensitivity to choice of irrigation intensity and greenness fraction datasets over a small, high-resolution domain in Nebraska. Differences between experiments are small at the interannual scale but become more apparent at seasonal and daily time scales. In addition, this study uses point and gridded soil moisture observations from fixed and roving Cosmic Ray Neutron Probes and co-located human practice data to evaluate the realism of irrigation amounts and soil moisture impacts simulated by the model. Results show that field-scale heterogeneity resulting from the individual actions of farmers is not captured by the model and the amount of irrigation applied by the model exceeds that applied at the two irrigated fields. However, the seasonal timing of irrigation and soil moisture contrasts between irrigated and non-irrigated areas are simulated well by the model. Overall, the results underscore the necessity of both high-quality meteorological forcing data and proper representation of irrigation for accurate simulation of water and energy states and fluxes over cropland.

  19. Inferring Boolean network states from partial information

    PubMed Central

    2013-01-01

    Networks of molecular interactions regulate key processes in living cells. Therefore, understanding their functionality is a high priority in advancing biological knowledge. Boolean networks are often used to describe cellular networks mathematically and are fitted to experimental datasets. The fitting often results in ambiguities since the interpretation of the measurements is not straightforward and since the data contain noise. In order to facilitate a more reliable mapping between datasets and Boolean networks, we develop an algorithm that infers network trajectories from a dataset distorted by noise. We analyze our algorithm theoretically and demonstrate its accuracy using simulation and microarray expression data. PMID:24006954
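
    The abstract does not spell out the algorithm, but the underlying idea — select the Boolean network trajectory most consistent with noisy measurements — can be illustrated with a brute-force toy (the `update` function is hypothetical, and exhaustive search over initial states is only feasible for very small networks; the paper's actual method is more sophisticated):

```python
from itertools import product

def infer_trajectory(update, n_nodes, observed):
    """Return the Boolean-network trajectory that minimizes total Hamming
    distance to a noisy observation sequence (brute force over initial states).
    `update` maps a state tuple to the next state tuple."""
    def simulate(state):
        traj = [state]
        for _ in range(len(observed) - 1):
            state = update(state)
            traj.append(state)
        return traj

    def mismatch(traj):
        return sum(a != b for s, o in zip(traj, observed) for a, b in zip(s, o))

    return min((simulate(s) for s in product((0, 1), repeat=n_nodes)),
               key=mismatch)
```

For a two-node "swap" network and an observation sequence with a single flipped bit, the recovered trajectory is the clean, dynamics-consistent one.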

  20. Partitioned fluid-solid coupling for cardiovascular blood flow: left-ventricular fluid mechanics.

    PubMed

    Krittian, Sebastian; Janoske, Uwe; Oertel, Herbert; Böhlke, Thomas

    2010-04-01

    We present a 3D code-coupling approach which has been specialized towards cardiovascular blood flow. For the first time, the prescribed geometry movement of the cardiovascular flow model KaHMo (Karlsruhe Heart Model) has been replaced by a myocardial composite model. Deformation is driven by fluid forces and myocardial response, i.e., both its contractile and constitutive behavior. Whereas the arbitrary Lagrangian-Eulerian formulation (ALE) of the Navier-Stokes equations is discretized by finite volumes (FVM), the solid mechanical finite elasticity equations are discretized by a finite element (FEM) approach. Taking advantage of specialized numerical solution strategies for non-matching fluid and solid domain meshes, an iterative data exchange guarantees the interface equilibrium of the underlying governing equations. The focus of this work is on left-ventricular fluid-structure interaction based on patient-specific magnetic resonance imaging datasets. Multi-physical phenomena are described by temporal visualization and characteristic FSI numbers. The results show flow patterns in good agreement with previous observations. A deeper understanding of cavity deformation, blood flow, and their vital interaction can help to improve surgical treatment and clinical therapy planning.

  1. The DMLite Rucio Plugin: ATLAS data in a filesystem

    NASA Astrophysics Data System (ADS)

    Lassnig, M.; van Dongen, D.; Brito Da Rocha, R.; Alvarez Ayllon, A.; Calfayan, P.

    2014-06-01

    Rucio is the next-generation data management system of the ATLAS experiment. Historically, clients interacted with the data management system via specialised tools, but in Rucio additional methods are provided. To support filesystem-like interaction with all ATLAS data, a plugin to the DMLite software stack has been developed. It is possible to mount Rucio as a filesystem, and execute regular filesystem operations in a POSIX fashion. This is exposed via various protocols, for example, WebDAV or NFS, which then removes any dependency on Rucio for client software. The main challenge for this work is the mapping of the set-like ATLAS namespace into a hierarchical filesystem, whilst preserving the high performance features of the former. This includes listing and searching for data, creation of files, datasets and containers, and the aggregation of existing data - all within directories with potentially millions of entries. This contribution details the design and implementation of the plugin. Furthermore, an evaluation of the performance characteristics is given, to show that this approach can scale to the requirements of ATLAS physics analysis.

  2. A collection of non-human primate computed tomography scans housed in MorphoSource, a repository for 3D data

    PubMed Central

    Copes, Lynn E.; Lucas, Lynn M.; Thostenson, James O.; Hoekstra, Hopi E.; Boyer, Doug M.

    2016-01-01

    A dataset of high-resolution microCT scans of primate skulls (crania and mandibles) and certain postcranial elements was collected to address questions about primate skull morphology. The sample consists of 489 scans taken from 431 specimens, representing 59 species from most primate families. These data have transformative reuse potential as such datasets are necessary for conducting high-powered research into primate evolution, but require significant time and funding to collect. Similar datasets were previously only available to select research groups across the world. The physical specimens are vouchered at Harvard’s Museum of Comparative Zoology. The data collection took place at the Center for Nanoscale Systems at Harvard. The dataset is archived on MorphoSource.org. Though this is the largest high-fidelity comparative dataset yet available, its provisioning on a web archive that allows unlimited researcher contributions promises a future with vastly increased digital collections available at researchers’ fingertips. PMID:26836025

  3. The MISTRALS programme data portal

    NASA Astrophysics Data System (ADS)

    Fleury, Laurence; Brissebrat, Guillaume; Belmahfoud, Nizar; Boichard, Jean-Luc; Brosolo, Laetitia; Cloché, Sophie; Descloitres, Jacques; Ferré, Hélène; Focsa, Loredana; Henriot, Nicolas; Labatut, Laurent; Mière, Arnaud; Petit de la Villéon, Loïc; Ramage, Karim; Schmechtig, Catherine; Vermeulen, Anne; André, François

    2015-04-01

    Mediterranean Integrated STudies at Regional And Local Scales (MISTRALS) is a decennial programme for systematic observations and research dedicated to understanding Mediterranean Basin environmental processes and their evolution under global change. It is composed of eight multidisciplinary projects that cover all the components of the Earth system (atmosphere, ocean, continental surfaces, lithosphere...) and their interactions, all the disciplines (physics, chemistry, marine biogeochemistry, biology, geology, sociology...) and different time scales. For example, Hydrological cycle in the Mediterranean eXperiment (HyMeX) aims at improving the predictability of extreme rainfall events, and assessing the social and economic vulnerability to extreme events and adaptation capacity. Paleo Mediterranean Experiment (PaleoMeX) is dedicated to the study of the interactions between climate, societies and civilizations of the Mediterranean world during the last 10000 years. Many long-term monitoring research networks are associated with MISTRALS, such as Mediterranean Ocean Observing System on Environment (MOOSE), Centre d'Observation Régional pour la Surveillance du Climat et de l'environnement Atmosphérique et océanographique en Méditerranée occidentale (CORSICA) and the environmental observations from Mediterranean Eurocentre for Underwater Sciences and Technologies (MEUST-SE). Therefore, the data generated or used by the different MISTRALS projects are very heterogeneous. They include in situ observations, satellite products, model outputs, social sciences surveys... Some datasets are automatically produced by operational networks, and others come from research instruments and analysis procedures. They correspond to different time scales (historical time series, observatories, campaigns...) and are managed by several data centres. 
They originate from many scientific communities, with different data sharing practices, specific expectations, and different file formats and data processing tools. The MISTRALS data portal - http://mistrals.sedoo.fr/ - has been designed and developed as a unified tool for sharing scientific data in spite of many sources of heterogeneity, and for fostering collaboration between research communities. The metadata (data description) are standardized and comply with international standards (ISO 19115-19139; INSPIRE European Directive; Global Change Master Directory Thesaurus). A search tool allows users to browse the catalog by keyword or multicriteria selection (area, period, physical property...) and to access data. Data sets managed by different data centres (ICARE, IPSL, SEDOO, CORIOLIS) are available through interoperability protocols (OPeNDAP, xml requests...) or archive synchronisation. Every in situ data set is available in the native format, but the most commonly used data sets have been homogenized (property names, units, quality flags...) and inserted in a relational database, in order to enable accurate data selection and download of different data sets in a shared format. At present the MISTRALS data portal provides access to about 550 datasets. It counts more than 600 registered users and about 100 data requests every month. The number of available datasets is increasing daily, due to the provision of campaign datasets (2012, 2013, 2014) by several projects. Every scientist is invited to browse the catalog, complete the online registration form and use MISTRALS data. Feel free to contact mistrals-contact@sedoo.fr for any question.

  4. The MISTRALS programme data portal

    NASA Astrophysics Data System (ADS)

    Brissebrat, Guillaume; Albert-Aguilar, Alexandre; Belmahfoud, Nizar; Cloché, Sophie; Darras, Sabine; Descloitres, Jacques; Ferré, Hélène; Fleury, Laurence; Focsa, Loredana; Henriot, Nicolas; Labatut, Laurent; Petit de la Villéon, Loïc; Ramage, Karim; Schmechtig, Catherine; Vermeulen, Anne

    2016-04-01

    Mediterranean Integrated STudies at Regional And Local Scales (MISTRALS) is a decennial programme for systematic observations and research dedicated to understanding Mediterranean Basin environmental processes and their evolution under global change. It is composed of eight multidisciplinary projects that cover all the components of the Earth system (atmosphere, ocean, continental surfaces, lithosphere...) and their interactions, all the disciplines (physics, chemistry, marine biogeochemistry, biology, geology, sociology...) and different time scales. For example, Hydrological cycle in the Mediterranean eXperiment (HyMeX) aims at improving the predictability of extreme rainfall events, and assessing the social and economic vulnerability to extreme events and adaptation capacity. Paleo Mediterranean Experiment (PaleoMeX) is dedicated to the study of the interactions between climate, societies and civilizations of the Mediterranean world during the last 10000 years. Many long-term monitoring research networks are associated with MISTRALS, such as Mediterranean Ocean Observing System on Environment (MOOSE), Centre d'Observation Régional pour la Surveillance du Climat et de l'environnement Atmosphérique et océanographique en Méditerranée occidentale (CORSICA) and the environmental observations from Mediterranean Eurocentre for Underwater Sciences and Technologies (MEUST-SE). Therefore, the data generated or used by the different MISTRALS projects are very heterogeneous. They include in situ observations, satellite products, model outputs, social sciences surveys... Some datasets are automatically produced by operational networks, and others come from research instruments and analysis procedures. They correspond to different time scales (historical time series, observatories, campaigns...) and are managed by several data centres. 
They originate from many scientific communities, with different data sharing practices, specific expectations, and different file formats and data processing tools. The MISTRALS data portal - http://mistrals.sedoo.fr/ - has been designed and developed as a unified tool for sharing scientific data in spite of many sources of heterogeneity, and for fostering collaboration between research communities. The metadata (data description) are standardized and comply with international standards (ISO 19115-19139; INSPIRE European Directive; Global Change Master Directory Thesaurus). A search tool allows users to browse the catalog by keyword or multicriteria selection (area, period, physical property...) and to access data. Data sets managed by different data centres (ICARE, IPSL, SEDOO, CORIOLIS) are available through interoperability protocols (OPeNDAP, xml requests...) or archive synchronisation. Every in situ data set is available in the native format, but the most commonly used data sets have been homogenized (property names, units, quality flags...) and inserted in a relational database, in order to enable accurate data selection and download of different data sets in a shared format. At present the MISTRALS data portal provides access to about 600 datasets. It counts more than 675 registered users and about 100 data requests every month. The number of available datasets is increasing daily, due to the provision of campaign datasets by several projects. Every scientist is invited to browse the catalog, complete the online registration form and use MISTRALS data. Feel free to contact mistrals-contact@sedoo.fr for any question.

  5. A Novel Time-Varying Spectral Filtering Algorithm for Reconstruction of Motion Artifact Corrupted Heart Rate Signals During Intense Physical Activities Using a Wearable Photoplethysmogram Sensor

    PubMed Central

    Salehizadeh, Seyed M. A.; Dao, Duy; Bolkhovsky, Jeffrey; Cho, Chae; Mendelson, Yitzhak; Chon, Ki H.

    2015-01-01

    Accurate estimation of heart rates from photoplethysmogram (PPG) signals during intense physical activity is a very challenging problem. This is because strenuous and high intensity exercise can result in severe motion artifacts in PPG signals, making accurate heart rate (HR) estimation difficult. In this study we investigated a novel technique to accurately reconstruct motion-corrupted PPG signals and HR based on time-varying spectral analysis. The algorithm is called Spectral filter algorithm for Motion Artifacts and heart rate reconstruction (SpaMA). The idea is to calculate the power spectral density of both PPG and accelerometer signals for each time shift of a windowed data segment. By comparing time-varying spectra of PPG and accelerometer data, those frequency peaks resulting from motion artifacts can be distinguished from the PPG spectrum. The SpaMA approach was applied to three different datasets and four types of activities: (1) training datasets from the 2015 IEEE Signal Process. Cup Database recorded from 12 subjects while performing treadmill exercise from 1 km/h to 15 km/h; (2) test datasets from the 2015 IEEE Signal Process. Cup Database recorded from 11 subjects while performing forearm and upper arm exercise; and (3) Chon Lab dataset including 10 min recordings from 10 subjects during treadmill exercise. The ECG signals from all three datasets provided the reference HRs which were used to determine the accuracy of our SpaMA algorithm. The performance of the SpaMA approach was calculated by computing the mean absolute error between the estimated HR from the PPG and the reference HR from the ECG. The average estimation errors using our method on the first, second and third datasets are 0.89, 1.93 and 1.38 beats/min respectively, while the overall error on all 33 subjects is 1.86 beats/min and the performance on only treadmill experiment datasets (22 subjects) is 1.11 beats/min. 
Moreover, it was found that dynamics of heart rate variability can be accurately captured using the algorithm where the mean Pearson’s correlation coefficient between the power spectral densities of the reference and the reconstructed heart rate time series was found to be 0.98. These results show that the SpaMA method has a potential for PPG-based HR monitoring in wearable devices for fitness tracking and health monitoring during intense physical activities. PMID:26703618
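
    SpaMA's full reconstruction pipeline is more elaborate than the abstract can convey, but its central screening step — discard PPG spectral peaks that coincide with accelerometer (motion) peaks before picking an HR candidate — can be sketched with a naive stdlib-only DFT. The window length, guard band, and synthetic single-tone signals below are illustrative assumptions, not the paper's parameters:

```python
import cmath
import math

def spectrum(x):
    """Magnitude spectrum via a naive DFT (fine for short toy windows)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2)]

def hr_candidate_bin(ppg, accel, guard=1):
    """Strongest PPG frequency bin that is not within `guard` bins of the
    dominant accelerometer bin -- a caricature of SpaMA's peak screening."""
    sp, sa = spectrum(ppg), spectrum(accel)
    motion = max(range(1, len(sa)), key=sa.__getitem__)  # skip the DC bin
    survivors = [k for k in range(1, len(sp)) if abs(k - motion) > guard]
    return max(survivors, key=sp.__getitem__)
```

With a toy PPG carrying a cardiac tone at bin 5 plus a stronger-than-usual motion tone at bin 12, and an accelerometer dominated by bin 12, the screening keeps bin 5 as the HR candidate.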

  6. A Novel Time-Varying Spectral Filtering Algorithm for Reconstruction of Motion Artifact Corrupted Heart Rate Signals During Intense Physical Activities Using a Wearable Photoplethysmogram Sensor.

    PubMed

    Salehizadeh, Seyed M A; Dao, Duy; Bolkhovsky, Jeffrey; Cho, Chae; Mendelson, Yitzhak; Chon, Ki H

    2015-12-23

    Accurate estimation of heart rates from photoplethysmogram (PPG) signals during intense physical activity is a very challenging problem. This is because strenuous and high intensity exercise can result in severe motion artifacts in PPG signals, making accurate heart rate (HR) estimation difficult. In this study we investigated a novel technique to accurately reconstruct motion-corrupted PPG signals and HR based on time-varying spectral analysis. The algorithm is called Spectral filter algorithm for Motion Artifacts and heart rate reconstruction (SpaMA). The idea is to calculate the power spectral density of both PPG and accelerometer signals for each time shift of a windowed data segment. By comparing time-varying spectra of PPG and accelerometer data, those frequency peaks resulting from motion artifacts can be distinguished from the PPG spectrum. The SpaMA approach was applied to three different datasets and four types of activities: (1) training datasets from the 2015 IEEE Signal Process. Cup Database recorded from 12 subjects while performing treadmill exercise from 1 km/h to 15 km/h; (2) test datasets from the 2015 IEEE Signal Process. Cup Database recorded from 11 subjects while performing forearm and upper arm exercise; and (3) Chon Lab dataset including 10 min recordings from 10 subjects during treadmill exercise. The ECG signals from all three datasets provided the reference HRs which were used to determine the accuracy of our SpaMA algorithm. The performance of the SpaMA approach was calculated by computing the mean absolute error between the estimated HR from the PPG and the reference HR from the ECG. The average estimation errors using our method on the first, second and third datasets are 0.89, 1.93 and 1.38 beats/min respectively, while the overall error on all 33 subjects is 1.86 beats/min and the performance on only treadmill experiment datasets (22 subjects) is 1.11 beats/min. 
Moreover, it was found that dynamics of heart rate variability can be accurately captured using the algorithm where the mean Pearson's correlation coefficient between the power spectral densities of the reference and the reconstructed heart rate time series was found to be 0.98. These results show that the SpaMA method has a potential for PPG-based HR monitoring in wearable devices for fitness tracking and health monitoring during intense physical activities.

  7. Soil Moisture fusion across scales using a multiscale nonstationary Spatial Hierarchical Model

    NASA Astrophysics Data System (ADS)

    Kathuria, D.; Mohanty, B.; Katzfuss, M.

    2017-12-01

    Soil moisture (SM) datasets from remote sensing (RS) platforms (such as SMOS and SMAP) and reanalysis products from land surface models are typically available on a coarse spatial granularity of several square km. Ground based sensors, on the other hand, provide observations on a finer spatial scale (meter scale or less) but are sparsely available. SM is affected by high variability due to complex interactions between geologic, topographic, vegetation and atmospheric variables and these interactions change dynamically with footprint scales. Past literature has largely focused on the scale specific effect of these covariates on soil moisture. The present study proposes a robust Multiscale-Nonstationary Spatial Hierarchical Model (MN-SHM) which can assimilate SM from point to RS footprints. The spatial structure of SM across footprints is modeled by a class of scalable covariance functions whose nonstationarity depends on atmospheric forcings (such as precipitation) and surface physical controls (such as topography, soil-texture and vegetation). The proposed model is applied to fuse point and airborne (~1.5 km) SM data obtained during the SMAPVEX12 campaign in the Red River watershed in Southern Manitoba, Canada with SMOS (~30 km) data. It is observed that precipitation, soil-texture and vegetation are the dominant factors which affect the SM distribution across various footprint scales (750 m, 1.5 km, 3 km, 9 km, 15 km and 30 km). We conclude that MN-SHM handles change-of-support problems easily while retaining reasonable predictive accuracy across multiple spatial resolutions in the presence of surface heterogeneity. The MN-SHM can be considered as a complex non-stationary extension of traditional geostatistical prediction methods (such as Kriging) for fusing multi-platform multi-scale datasets.

  8. Climate Model Diagnostic Analyzer Web Service System

    NASA Astrophysics Data System (ADS)

    Lee, S.; Pan, L.; Zhai, C.; Tang, B.; Jiang, J. H.

    2013-12-01

    The latest Intergovernmental Panel on Climate Change (IPCC) Fourth Assessment Report stressed the need for the comprehensive and innovative evaluation of climate models with newly available global observations. The traditional approach to climate model evaluation, which compares a single parameter at a time, identifies symptomatic model biases and errors but fails to diagnose the model problems. The model diagnosis process requires physics-based multi-variable comparisons that typically involve large-volume and heterogeneous datasets, making them both computationally- and data-intensive. To address these challenges, we are developing a parallel, distributed web-service system that enables the physics-based multi-variable model performance evaluations and diagnoses through the comprehensive and synergistic use of multiple observational data, reanalysis data, and model outputs. We have developed a methodology to transform an existing science application code into a web service using a Python wrapper interface and Python web service frameworks (i.e., Flask, Gunicorn, and Tornado). The web-service system, called Climate Model Diagnostic Analyzer (CMDA), currently supports (1) all the datasets from Obs4MIPs and a few ocean datasets from NOAA and Argo, which can serve as observation-based reference data for model evaluation and (2) many of CMIP5 model outputs covering a broad range of atmosphere, ocean, and land variables from the CMIP5 specific historical runs and AMIP runs. Analysis capabilities currently supported by CMDA are (1) the calculation of annual and seasonal means of physical variables, (2) the calculation of time evolution of the means in any specified geographical region, (3) the calculation of correlation between two variables, and (4) the calculation of difference between two variables. 
A web user interface is chosen for CMDA because it not only lowers the learning curve and removes the adoption barrier of the tool but also enables instantaneous use, avoiding the hassle of local software installation and environment incompatibility. CMDA is planned to be used as an educational tool for the summer school organized by JPL's Center for Climate Science in 2014. The requirements of the educational tool are defined with the interaction with the school organizers, and CMDA is customized to meet the requirements accordingly. The tool needs to be production quality for 30+ simultaneous users. The summer school will thus serve as a valuable testbed for the tool development, preparing CMDA to serve the Earth-science modeling and model-analysis community at the end of the project. This work was funded by the NASA Earth Science Program called Computational Modeling Algorithms and Cyberinfrastructure (CMAC).

  9. Understanding Information Flow Interaction along Separable Causal Paths in Environmental Signals

    NASA Astrophysics Data System (ADS)

    Jiang, P.; Kumar, P.

    2017-12-01

    Multivariate environmental signals reflect the outcome of complex inter-dependencies, such as those in ecohydrologic systems. Transfer entropy and information partitioning approaches have been used to characterize such dependencies. However, these approaches capture net information flow occurring through a multitude of pathways involved in the interaction and as a result mask our ability to discern causal interactions within a subsystem of interest through specific pathways. We build on recent developments of momentary information transfer along causal paths proposed by Runge [2015] to develop a framework for quantifying information decomposition along separable causal paths. Momentary information transfer along causal paths captures the amount of information flow between any two variables lagged at two specific points in time. Our approach expands this concept to characterize the causal interaction in terms of synergistic, unique and redundant information flow through separable causal paths. Multivariate analysis using this novel approach reveals a more precise understanding of causality and feedback. We illustrate our approach with synthetic and observed time series data. We believe the proposed framework helps better delineate the internal structure of complex systems in geoscience where huge volumes of observational data exist, and it will also help the modeling community by providing a new way to look at the complexity of real and modeled systems. Runge, Jakob. "Quantifying information transfer and mediation along causal pathways in complex systems." Physical Review E 92.6 (2015): 062829.
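
    The path-specific decomposition developed in this work goes well beyond a snippet, but the quantity it builds on — information transfer between lagged variables — can be illustrated with a plain lag-1 transfer entropy estimated from empirical frequencies. This is only the textbook building block on binary toy series, with none of the causal-path conditioning the paper proposes:

```python
from collections import Counter
from math import log2

def transfer_entropy(x, y):
    """Lag-1 transfer entropy T(X -> Y) from empirical frequencies:
    sum over (y1, y0, x0) of p(y1, y0, x0) * log2[ p(y1|y0,x0) / p(y1|y0) ]."""
    triples = Counter(zip(y[1:], y[:-1], x[:-1]))   # (y_{t+1}, y_t, x_t)
    pair_yx = Counter(zip(y[:-1], x[:-1]))          # (y_t, x_t)
    pair_yy = Counter(zip(y[1:], y[:-1]))           # (y_{t+1}, y_t)
    single = Counter(y[:-1])                        # y_t
    n = len(y) - 1
    te = 0.0
    for (y1, y0, x0), c in triples.items():
        p_cond_full = c / pair_yx[(y0, x0)]         # p(y1 | y0, x0)
        p_cond = pair_yy[(y1, y0)] / single[y0]     # p(y1 | y0)
        te += (c / n) * log2(p_cond_full / p_cond)
    return te
```

A constant driver carries no information (TE is exactly zero), while a series that simply copies the driver's previous value shows a clearly positive TE.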

  10. Scaling identity connects human mobility and social interactions.

    PubMed

    Deville, Pierre; Song, Chaoming; Eagle, Nathan; Blondel, Vincent D; Barabási, Albert-László; Wang, Dashun

    2016-06-28

    Massive datasets that capture human movements and social interactions have catalyzed rapid advances in our quantitative understanding of human behavior during the past years. One important aspect affecting both areas is the critical role space plays. Indeed, growing evidence suggests both our movements and communication patterns are associated with spatial costs that follow reproducible scaling laws, each characterized by its specific critical exponents. Although human mobility and social networks develop concomitantly as two prolific yet largely separated fields, we lack any known relationships between the critical exponents explored by them, despite the fact that they often study the same datasets. Here, by exploiting three different mobile phone datasets that capture simultaneously these two aspects, we discovered a new scaling relationship, mediated by a universal flux distribution, which links the critical exponents characterizing the spatial dependencies in human mobility and social networks. Therefore, the widely studied scaling laws uncovered in these two areas are not independent but connected through a deeper underlying reality.

  11. Scaling identity connects human mobility and social interactions

    PubMed Central

    Deville, Pierre; Song, Chaoming; Eagle, Nathan; Blondel, Vincent D.; Barabási, Albert-László; Wang, Dashun

    2016-01-01

    Massive datasets that capture human movements and social interactions have catalyzed rapid advances in our quantitative understanding of human behavior during the past years. One important aspect affecting both areas is the critical role space plays. Indeed, growing evidence suggests both our movements and communication patterns are associated with spatial costs that follow reproducible scaling laws, each characterized by its specific critical exponents. Although human mobility and social networks develop concomitantly as two prolific yet largely separated fields, we lack any known relationships between the critical exponents explored by them, despite the fact that they often study the same datasets. Here, by exploiting three different mobile phone datasets that capture simultaneously these two aspects, we discovered a new scaling relationship, mediated by a universal flux distribution, which links the critical exponents characterizing the spatial dependencies in human mobility and social networks. Therefore, the widely studied scaling laws uncovered in these two areas are not independent but connected through a deeper underlying reality. PMID:27274050

  12. A Nonlinear Model for Interactive Data Analysis and Visualization and an Implementation Using Progressive Computation for Massive Remote Climate Data Ensembles

    NASA Astrophysics Data System (ADS)

    Christensen, C.; Liu, S.; Scorzelli, G.; Lee, J. W.; Bremer, P. T.; Summa, B.; Pascucci, V.

    2017-12-01

    The creation, distribution, analysis, and visualization of large spatiotemporal datasets is a growing challenge for the study of climate and weather phenomena in which increasingly massive domains are utilized to resolve finer features, resulting in datasets that are simply too large to be effectively shared. Existing workflows typically consist of pipelines of independent processes that preclude many possible optimizations. As data sizes increase, these pipelines are difficult or impossible to execute interactively and instead simply run as large offline batch processes. Rather than limiting our conceptualization of such systems to pipelines (or dataflows), we propose a new model for interactive data analysis and visualization systems in which we comprehensively consider the processes involved from data inception through analysis and visualization in order to describe systems composed of these processes in a manner that facilitates interactive implementations of the entire system rather than of only a particular component. We demonstrate the application of this new model with the implementation of an interactive system that supports progressive execution of arbitrary user scripts for the analysis and visualization of massive, disparately located climate data ensembles. It is currently in operation as part of the Earth System Grid Federation server running at Lawrence Livermore National Lab, and accessible through both web-based and desktop clients. Our system facilitates interactive analysis and visualization of massive remote datasets up to petabytes in size, such as the 3.5 PB 7km NASA GEOS-5 Nature Run simulation, previously only possible offline or at reduced resolution. To support the community, we have enabled general distribution of our application using public frameworks including Docker and Anaconda.

  13. Experiments to Determine Whether Recursive Partitioning (CART) or an Artificial Neural Network Overcomes Theoretical Limitations of Cox Proportional Hazards Regression

    NASA Technical Reports Server (NTRS)

    Kattan, Michael W.; Hess, Kenneth R.

    1998-01-01

    New computationally intensive tools for medical survival analyses include recursive partitioning (also called CART) and artificial neural networks. A challenge that remains is to better understand the behavior of these techniques in an effort to know when they will be effective tools. Theoretically, they may overcome limitations of the traditional multivariable survival technique, the Cox proportional hazards regression model. Experiments were designed to test whether the new tools would, in practice, overcome these limitations. Two datasets in which theory suggests CART and the neural network should outperform the Cox model were selected. The first was a published leukemia dataset manipulated to have a strong interaction that CART should detect. The second was a published cirrhosis dataset with pronounced nonlinear effects that a neural network should fit. Repeated sampling of 50 training and testing subsets was applied to each technique. The concordance index C was calculated for each technique on the testing dataset as a measure of predictive accuracy. In the interaction dataset, CART outperformed Cox (P < 0.05) with a C improvement of 0.1 (95% CI, 0.08 to 0.12). In the nonlinear dataset, the neural network outperformed the Cox model (P < 0.05), but by a very slight amount (0.015). As predicted by theory, CART and the neural network were able to overcome limitations of the Cox model. Experiments like these are important to increase our understanding of when one of these new techniques will outperform the standard Cox model. Further research is necessary to predict which technique will perform best a priori and to assess the magnitude of superiority.
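    For reference, Harrell's concordance index C used in this study can be computed from observed times, event indicators, and predicted risk scores. A minimal pure-Python sketch follows (toy data, not the paper's leukemia or cirrhosis datasets; tied event times are simply skipped, a simplification of Harrell's full tie handling):

```python
def concordance_index(times, events, risks):
    """Harrell's C: the fraction of comparable pairs in which the
    subject who fails earlier is assigned the higher predicted risk.
    A pair is comparable if the earlier observed time ends in an event."""
    concordant, ties, comparable = 0, 0, 0
    n = len(times)
    for i in range(n):
        for j in range(i + 1, n):
            # Order the pair so `a` has the shorter observed time.
            a, b = (i, j) if times[i] < times[j] else (j, i)
            if times[a] == times[b] or not events[a]:
                continue  # censored-first or tied pairs are not comparable
            comparable += 1
            if risks[a] > risks[b]:
                concordant += 1
            elif risks[a] == risks[b]:
                ties += 1
    return (concordant + 0.5 * ties) / comparable

# Toy example: higher predicted risk should mean earlier failure.
times  = [2, 4, 6, 8]
events = [1, 1, 0, 1]   # 1 = event observed, 0 = censored
risks  = [0.9, 0.7, 0.4, 0.2]
```

    C = 0.5 corresponds to random prediction and C = 1.0 to perfect ranking, which is why a 0.1 improvement in C is a substantial gain.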

  14. A multi-source dataset of urban life in the city of Milan and the Province of Trentino.

    PubMed

    Barlacchi, Gianni; De Nadai, Marco; Larcher, Roberto; Casella, Antonio; Chitic, Cristiana; Torrisi, Giovanni; Antonelli, Fabrizio; Vespignani, Alessandro; Pentland, Alex; Lepri, Bruno

    2015-01-01

    The study of socio-technical systems has been revolutionized by the unprecedented amount of digital records that are constantly being produced by human activities such as accessing Internet services, using mobile devices, and consuming energy and knowledge. In this paper, we describe the richest open multi-source dataset ever released on two geographical areas. The dataset is composed of telecommunications, weather, news, social networks and electricity data from the city of Milan and the Province of Trentino. The unique multi-source composition of the dataset makes it an ideal testbed for methodologies and approaches aimed at tackling a wide range of problems including energy consumption, mobility planning, tourist and migrant flows, urban structures and interactions, event detection, urban well-being and many others.

  15. A multi-source dataset of urban life in the city of Milan and the Province of Trentino

    NASA Astrophysics Data System (ADS)

    Barlacchi, Gianni; de Nadai, Marco; Larcher, Roberto; Casella, Antonio; Chitic, Cristiana; Torrisi, Giovanni; Antonelli, Fabrizio; Vespignani, Alessandro; Pentland, Alex; Lepri, Bruno

    2015-10-01

    The study of socio-technical systems has been revolutionized by the unprecedented amount of digital records that are constantly being produced by human activities such as accessing Internet services, using mobile devices, and consuming energy and knowledge. In this paper, we describe the richest open multi-source dataset ever released on two geographical areas. The dataset is composed of telecommunications, weather, news, social networks and electricity data from the city of Milan and the Province of Trentino. The unique multi-source composition of the dataset makes it an ideal testbed for methodologies and approaches aimed at tackling a wide range of problems including energy consumption, mobility planning, tourist and migrant flows, urban structures and interactions, event detection, urban well-being and many others.

  16. EnviroAtlas - Austin, TX - Atlas Area Boundary

    EPA Pesticide Factsheets

    This EnviroAtlas dataset shows the boundary of the Austin, TX Atlas Area. It represents the outside edge of all the block groups included in the EnviroAtlas Area. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).

  17. EnviroAtlas - Fresno, CA - Riparian Buffer Land Cover by Block Group

    EPA Pesticide Factsheets

    This EnviroAtlas dataset describes the percentage of different land cover types within 15- and 50-meters of hydrologically connected streams, rivers, and other water bodies within the Atlas Area. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).

  18. A multi-source dataset of urban life in the city of Milan and the Province of Trentino

    PubMed Central

    Barlacchi, Gianni; De Nadai, Marco; Larcher, Roberto; Casella, Antonio; Chitic, Cristiana; Torrisi, Giovanni; Antonelli, Fabrizio; Vespignani, Alessandro; Pentland, Alex; Lepri, Bruno

    2015-01-01

    The study of socio-technical systems has been revolutionized by the unprecedented amount of digital records that are constantly being produced by human activities such as accessing Internet services, using mobile devices, and consuming energy and knowledge. In this paper, we describe the richest open multi-source dataset ever released on two geographical areas. The dataset is composed of telecommunications, weather, news, social networks and electricity data from the city of Milan and the Province of Trentino. The unique multi-source composition of the dataset makes it an ideal testbed for methodologies and approaches aimed at tackling a wide range of problems including energy consumption, mobility planning, tourist and migrant flows, urban structures and interactions, event detection, urban well-being and many others. PMID:26528394

  19. A curated transcriptomic dataset collection relevant to embryonic development associated with in vitro fertilization in healthy individuals and patients with polycystic ovary syndrome.

    PubMed

    Mackeh, Rafah; Boughorbel, Sabri; Chaussabel, Damien; Kino, Tomoshige

    2017-01-01

    The collection of large-scale datasets available in public repositories is rapidly growing and providing opportunities to identify and fill gaps in different fields of biomedical research. However, users of these datasets should be able to selectively browse datasets related to their field of interest. Here we made available a collection of transcriptome datasets related to human follicular cells from normal individuals or patients with polycystic ovary syndrome, in the process of their development, during in vitro fertilization. After RNA-seq dataset exclusion and careful selection based on study description and sample information, 12 datasets, encompassing a total of 85 unique transcriptome profiles, were identified in NCBI Gene Expression Omnibus and uploaded to the Gene Expression Browser (GXB), a web application specifically designed for interactive query and visualization of integrated large-scale data. Once the datasets were annotated in GXB, multiple sample groupings were made in order to create rank lists that allow easy data interpretation and comparison. The GXB tool also allows users to browse a single gene across multiple projects to evaluate its expression profiles in multiple biological systems/conditions in web-based, customized graphical views. The curated dataset is accessible at the following link: http://ivf.gxbsidra.org/dm3/landing.gsp.

  20. A curated transcriptomic dataset collection relevant to embryonic development associated with in vitro fertilization in healthy individuals and patients with polycystic ovary syndrome

    PubMed Central

    Mackeh, Rafah; Boughorbel, Sabri; Chaussabel, Damien; Kino, Tomoshige

    2017-01-01

    The collection of large-scale datasets available in public repositories is rapidly growing and providing opportunities to identify and fill gaps in different fields of biomedical research. However, users of these datasets should be able to selectively browse datasets related to their field of interest. Here we made available a collection of transcriptome datasets related to human follicular cells from normal individuals or patients with polycystic ovary syndrome, in the process of their development, during in vitro fertilization. After RNA-seq dataset exclusion and careful selection based on study description and sample information, 12 datasets, encompassing a total of 85 unique transcriptome profiles, were identified in NCBI Gene Expression Omnibus and uploaded to the Gene Expression Browser (GXB), a web application specifically designed for interactive query and visualization of integrated large-scale data. Once the datasets were annotated in GXB, multiple sample groupings were made in order to create rank lists that allow easy data interpretation and comparison. The GXB tool also allows users to browse a single gene across multiple projects to evaluate its expression profiles in multiple biological systems/conditions in web-based, customized graphical views. The curated dataset is accessible at the following link: http://ivf.gxbsidra.org/dm3/landing.gsp. PMID:28413616

  1. Physical environment virtualization for human activities recognition

    NASA Astrophysics Data System (ADS)

    Poshtkar, Azin; Elangovan, Vinayak; Shirkhodaie, Amir; Chan, Alex; Hu, Shuowen

    2015-05-01

    Human activity recognition research relies heavily on extensive datasets to verify and validate the performance of activity recognition algorithms. However, obtaining real datasets is expensive and highly time-consuming. A physics-based virtual simulation can accelerate the development of context-based human activity recognition algorithms and techniques by generating relevant training and testing videos simulating diverse operational scenarios. In this paper, we discuss in detail the requisite capabilities of a virtual environment to serve as a test bed for evaluating and enhancing activity recognition algorithms. To demonstrate the numerous advantages of virtual environment development, a newly developed virtual environment simulation modeling (VESM) environment is presented here to generate calibrated multisource imagery datasets suitable for the development and testing of recognition algorithms for context-based human activities. The VESM environment serves as a versatile test bed to generate a vast amount of realistic data for the training and testing of sensor processing algorithms. To demonstrate the effectiveness of the VESM environment, we present various simulated scenarios and processed results to infer proper semantic annotations from the high-fidelity imagery data for human-vehicle activity recognition under different operational contexts.

  2. Climate Model Diagnostic Analyzer

    NASA Technical Reports Server (NTRS)

    Lee, Seungwon; Pan, Lei; Zhai, Chengxing; Tang, Benyang; Kubar, Terry; Zhang, Zia; Wang, Wei

    2015-01-01

    The comprehensive and innovative evaluation of climate models with newly available global observations is critically needed for the improvement of climate model current-state representation and future-state predictability. A climate model diagnostic evaluation process requires physics-based multi-variable analyses that typically involve large-volume and heterogeneous datasets, making them both computation- and data-intensive. With the exploratory nature of climate data analyses and an explosive growth of datasets and service tools, scientists are struggling to keep track of their datasets, tools, and execution/study history, let alone share them with others. In response, we have developed a cloud-enabled, provenance-supported, web-service system called Climate Model Diagnostic Analyzer (CMDA). CMDA enables physics-based, multivariable model performance evaluations and diagnoses through the comprehensive and synergistic use of multiple observational data, reanalysis data, and model outputs. At the same time, CMDA provides a crowd-sourcing space where scientists can organize their work efficiently and share their work with others. CMDA is empowered by many current state-of-the-art software packages in web service, provenance, and semantic search.

  3. Towards the prediction of essential genes by integration of network topology, cellular localization and biological process information

    PubMed Central

    2009-01-01

    Background The identification of essential genes is important for the understanding of the minimal requirements for cellular life and for practical purposes, such as drug design. However, the experimental techniques for essential gene discovery are labor-intensive and time-consuming. Considering these experimental constraints, a computational approach capable of accurately predicting essential genes would be of great value. We therefore present here a machine learning-based computational approach relying on network topological features, cellular localization and biological process information for prediction of essential genes. Results We constructed a decision tree-based meta-classifier and trained it on datasets with individual and grouped attributes (network topological features, cellular compartments and biological processes) to generate various predictors of essential genes. We showed that the predictors with better performances are those generated by datasets with integrated attributes. Using the predictor with all attributes, i.e., network topological features, cellular compartments and biological processes, we obtained the best predictor of essential genes, which was then used to classify yeast genes with unknown essentiality status. Finally, we generated decision trees by training the J48 algorithm on datasets with all network topological features, cellular localization and biological process information to discover cellular rules for essentiality. We found that the number of protein physical interactions, the nuclear localization of proteins and the number of regulating transcription factors are the most important factors determining gene essentiality. Conclusion We were able to demonstrate that network topological features, cellular localization and biological process information are reliable predictors of essential genes. Moreover, by constructing decision trees based on these data, we could discover cellular rules governing essentiality. PMID:19758426

  4. Monte Carlo simulations of soft proton flares: testing the physics with XMM-Newton

    NASA Astrophysics Data System (ADS)

    Fioretti, Valentina; Bulgarelli, Andrea; Malaguti, Giuseppe; Spiga, Daniele; Tiengo, Andrea

    2016-07-01

    Low energy protons (< 100 - 300 keV) in the Van Allen belt and the outer regions can enter the field of view of X-ray focusing telescopes, interact with the Wolter-I optics, and reach the focal plane. The funneling of soft protons was discovered after the damaging of the Chandra/ACIS Front-Illuminated CCDs in September 1999 after the first passages through the radiation belt. The use of special filters protects the XMM-Newton focal plane below an altitude of 70000 km, but above this limit the effect of soft protons is still present in the form of sudden flares in the count rate of the EPIC instruments that can last from hundreds of seconds to hours and can hardly be disentangled from X-ray photons, causing the loss of large amounts of observing time. The accurate characterization of (i) the distribution of the soft proton population, (ii) the physics interaction at play, and (iii) the effect on the focal plane is mandatory to evaluate the background and design the proton magnetic diverter on board future X-ray focusing telescopes (e.g. ATHENA). Several solutions have been proposed so far for the primary population and the physics interaction; however, the difficulty of making precise angle and energy measurements in the laboratory means that the smoking gun is still unclear. Since the only real data available are the XMM-Newton spectra of soft proton flares in orbit, we try to characterize the input proton population and the physics interaction by simulating, using the BoGEMMS framework, the proton interaction with a simplified model of the X-ray mirror module and the focal plane, and comparing the result with a real observation. The analysis of ten orbits of observations of the EPIC/pn instrument shows that the detection of flares in regions far outside the radiation belt is largely influenced by the different orientation of the Earth's magnetosphere with respect to XMM-Newton's orbit, confirming the solar origin of the soft proton population.
    The Equator-S proton spectrum at 70000 km altitude is used for the proton population entering the optics, where a combined multiple and Firsov scattering is used as the physics interaction. If the thick filter is used, the soft protons in the 30-70 keV energy range are the main contributors to the simulated spectrum below 10 keV. We are able to reproduce the proton vignetting observed in real datasets, with a 50% decrease from the inner to the outer region, but a maximum flux of 0.01 counts cm-2 s-1 keV-1 is obtained below 10 keV, about 5 times lower than the EPIC/MOS detection and 100 times lower than the EPIC/pn one. Given the high variability of the flare intensity, we conclude that an average spectrum, based on the analysis of a full season of soft proton events, is required to compare Monte Carlo simulations with real events.

  5. Social Priorities as Data

    NASA Astrophysics Data System (ADS)

    Grubert, E.

    2015-12-01

    Decision makers' responses to local risks and expected changes to a community from circumstances like natural hazards, human developments, and demographic changes can greatly affect social and environmental outcomes in a community. Translating physical data based in disciplines like engineering and geosciences into positive outcomes for communities can be challenging and often results in conflict that appears to pit "science" against "the public." Scientists can be reluctant to offer recommendations for action based on their work, often (and often correctly) noting that their role is not to make value judgments for a community - particularly for a community that is not their own. Conversely, decision makers can be frustrated by the lack of guidance they receive to help translate data into effective and acceptable action. The solution posed by this submission, given the goal of co-production of knowledge by scientists and decision makers to foster better community outcomes, is to involve the community directly by integrating social scientific methods that address decision making and community engagement into the scientist-decision maker interaction. Specifically, the missing dataset in many scientist-decision maker interactions is the nature of community priorities. Using scientifically valid methods to rigorously collect and characterize community priorities to help recommend tradeoffs between different outcomes indicated by the work of physical and natural scientists can bridge the gap between science and action by involving the community in the process. This submission presents early work on US preferences for different types of social and environmental outcomes designed to integrate directly with engineering and physical science frameworks like Life Cycle Assessment and Environmental Impact Statements. Cardinal preference data are based on surveys of US adults using tools like the Analytical Hierarchy Process, budget allocation, and ranking.

  6. Distributive On-line Processing, Visualization and Analysis System for Gridded Remote Sensing Data

    NASA Technical Reports Server (NTRS)

    Leptoukh, G.; Berrick, S.; Liu, Z.; Pham, L.; Rui, H.; Shen, S.; Teng, W.; Zhu, T.

    2004-01-01

    The ability to use data stored in the current Earth Observing System (EOS) archives for studying regional or global phenomena is highly dependent on having a detailed understanding of the data's internal structure and physical implementation. Gaining this understanding and applying it to data reduction is a time-consuming task that must be undertaken before the core investigation can begin. This is an especially difficult challenge when science objectives require users to deal with large multi-sensor data sets that are usually of different formats, structures, and resolutions, for example, when preparing data for input into modeling systems. The NASA Goddard Earth Sciences Data and Information Services Center (GES DISC) has taken a major step towards meeting this challenge by developing an infrastructure with a Web interface that allows users to perform interactive analysis online without downloading any data: the GES-DISC Interactive Online Visualization and Analysis Infrastructure, or "Giovanni." Giovanni provides interactive, online analysis tools for data users to facilitate their research. Several instances of this interface have been created to serve TRMM users, aerosol scientists, and Ocean Color and agriculture applications users. The first generation of these tools supports gridded data only. The user selects geophysical parameters, an area of interest, and a time period, and the system generates an output on screen in a matter of seconds. The currently available output options are: an area plot, averaged or accumulated over any available data period for any rectangular area; a time plot, i.e., a time series averaged over any rectangular area; image views of any longitude-time and latitude-time cross sections; ASCII output for all plot types; and an image animation for the area plot. In the future, we will add correlation plots, GIS-compatible outputs, etc. This allows users to focus on data content (i.e., science parameters) and eliminates the need for expensive learning, development and processing tasks that are redundantly incurred by an archive's user community. The current implementation utilizes the GrADS-DODS Server (GDS), a stable, secure data server that provides subsetting and analysis services across the Internet for any GrADS-readable dataset. The subsetting capability allows users to retrieve a specified temporal and/or spatial subdomain from a large dataset, eliminating the need to download everything simply to access a small relevant portion of a dataset. The analysis capability allows users to retrieve the results of an operation applied to one or more datasets on the server. In our case, we use this approach to read pre-processed binary files and/or to read and extract the needed parts from HDF or HDF-EOS files. These subsets then serve as inputs into GrADS processing and analysis scripts. It can be used in a wide variety of Earth science applications, such as the study and monitoring of climate and weather events, and modeling. It can be easily configured for new applications.
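    The "area plot" operation described in this record (averaging a gridded parameter over a rectangular region for each time step) can be sketched in a few lines of pure Python. The grid, indices, and values here are hypothetical, not Giovanni's actual data model:

```python
def area_average_series(grid, lat_slice, lon_slice):
    """Average a [time][lat][lon] grid over a rectangular
    lat/lon window, producing one value per time step."""
    series = []
    for field in grid:  # one 2-D field per time step
        window = [row[lon_slice] for row in field[lat_slice]]
        cells = [v for row in window for v in row]
        series.append(sum(cells) / len(cells))
    return series

# Hypothetical 2-time-step, 3x4 grid of some geophysical parameter.
grid = [
    [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]],
    [[2, 2, 2, 2], [4, 4, 4, 4], [6, 6, 6, 6]],
]
series = area_average_series(grid, slice(0, 2), slice(1, 3))
```

    A real implementation would additionally weight each cell by its latitude-dependent area (e.g., by the cosine of latitude), which this sketch omits for brevity.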

  7. Phylogenetic conservatism in plant-soil feedback and its implications for plant abundance

    USDA-ARS?s Scientific Manuscript database

    Plant interactions with macro-mutualists (e.g., seed dispersers, pollinators) and antagonists (e.g., herbivores, pathogens) often exhibit phylogenetic conservatism, but conservatism of interactions with soil microorganisms is understudied. We assembled one of the best available datasets to examine c...

  8. P-MartCancer: Interactive Online Software to Enable Analysis of Shotgun Cancer Proteomic Datasets

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Webb-Robertson, Bobbie-Jo M.; Bramer, Lisa M.; Jensen, Jeffrey L.

    P-MartCancer is a new interactive web-based software environment that enables biomedical and biological scientists to perform in-depth analyses of global proteomics data without requiring direct interaction with the data or with statistical software. P-MartCancer offers a series of statistical modules associated with quality assessment, peptide and protein statistics, protein quantification and exploratory data analyses, driven by the user via customized workflows and interactive visualization. Currently, P-MartCancer offers access to multiple cancer proteomic datasets generated through the Clinical Proteomics Tumor Analysis Consortium (CPTAC) at the peptide, gene and protein levels. P-MartCancer is deployed using Azure technologies (http://pmart.labworks.org/cptac.html); the web service is alternatively available via Docker Hub (https://hub.docker.com/r/pnnl/pmart-web/), and many statistical functions can be utilized directly from an R package available on GitHub (https://github.com/pmartR).

  9. Next-generation sequencing coupled with a cell-free display technology for high-throughput production of reliable interactome data

    PubMed Central

    Fujimori, Shigeo; Hirai, Naoya; Ohashi, Hiroyuki; Masuoka, Kazuyo; Nishikimi, Akihiko; Fukui, Yoshinori; Washio, Takanori; Oshikubo, Tomohiro; Yamashita, Tatsuhiro; Miyamoto-Sato, Etsuko

    2012-01-01

    Next-generation sequencing (NGS) has been applied to various kinds of omics studies, resulting in many biological and medical discoveries. However, high-throughput protein-protein interactome datasets derived from detection by sequencing are scarce, because protein-protein interaction analysis requires many cell manipulations to examine the interactions. The low reliability of high-throughput data is also a problem. Here, we describe a cell-free display technology combined with NGS that can improve both the coverage and reliability of interactome datasets. The completely cell-free method offers high throughput and a large detection space, testing the interactions without using clones. The quantitative information provided by NGS reduces the number of false positives. The method is suitable for the in vitro detection of proteins that interact not only with the bait protein, but also with DNA, RNA and chemical compounds. Thus, it could become a universal approach for exploring the large space of protein sequences and interactome networks. PMID:23056904

  10. Using Global Total Electron Content to Understand Interminimum Changes in Solar EUV Irradiance and Thermospheric Composition

    NASA Astrophysics Data System (ADS)

    McDonald, S. E.; Emmert, J. T.; Krall, J.; Mannucci, A. J.; Vergados, P.

    2017-12-01

    To understand how and why the distribution of geospace plasma in the ionosphere/plasmasphere is evolving over multi-decadal time scales in response to solar, heliospheric and atmospheric forcing, it is critically important to have long-term, stable datasets. In this study, we use a newly constructed dataset of GPS-based total electron content (TEC) developed by JPL. The JPL Global Ionosphere Mapping (GIM) algorithm was used to generate a 35-station dataset spanning two solar minimum periods (1993-2014). We also use altimeter-derived TEC measurements from TOPEX-Poseidon and Jason-1 to construct a continuous dataset for the 1995-2014 time period. Both long-term datasets are compared to each other to study interminimum changes in the global TEC (during 1995-1996 and 2008-2009). We use the SAMI3 physics-based model of the ionosphere to compare the simulations of 1995-2014 with the JPL TEC and TOPEX/Jason-1 datasets. To drive SAMI3, we use the Naval Research Laboratory Solar Spectral Irradiance (NRLSSI) model to specify the EUV irradiances, and NRLMSIS to specify the thermosphere. We adjust the EUV irradiances and thermospheric constituents to match the TEC datasets and draw conclusions regarding sources of the differences between the two solar minimum periods.
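    Total electron content, the quantity both datasets above measure, is the line integral of electron density along the signal path, conventionally reported in TEC units (1 TECU = 10^16 electrons/m^2). A minimal trapezoidal-rule sketch follows; the altitude grid and density values are hypothetical, not from either dataset:

```python
def tec_tecu(altitudes_m, electron_density_m3):
    """Integrate an electron-density profile (electrons/m^3) over
    altitude (m) with the trapezoidal rule, returning TEC in TECU."""
    total = 0.0
    for i in range(len(altitudes_m) - 1):
        dh = altitudes_m[i + 1] - altitudes_m[i]
        total += 0.5 * (electron_density_m3[i] + electron_density_m3[i + 1]) * dh
    return total / 1e16  # 1 TECU = 1e16 electrons/m^2

# Hypothetical coarse profile with a density peak near 300 km.
alts = [100e3, 200e3, 300e3, 400e3, 500e3]
dens = [1e10, 5e11, 1e12, 6e11, 2e11]
tec = tec_tecu(alts, dens)
```

    GPS-based TEC estimates additionally require mapping slant paths to the vertical and removing receiver/satellite biases, which the GIM algorithm handles and this sketch does not.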

  11. 3Drefine: an interactive web server for efficient protein structure refinement

    PubMed Central

    Bhattacharya, Debswapna; Nowotny, Jackson; Cao, Renzhi; Cheng, Jianlin

    2016-01-01

    3Drefine is an interactive web server for consistent and computationally efficient protein structure refinement with the capability to perform web-based statistical and visual analysis. The 3Drefine refinement protocol utilizes iterative optimization of the hydrogen-bonding network combined with atomic-level energy minimization of the optimized model using composite physics- and knowledge-based force fields for efficient protein structure refinement. The method has been extensively evaluated in blind CASP experiments as well as on large-scale and diverse benchmark datasets, and exhibits consistent improvement over the initial structure in both global and local structural quality measures. The 3Drefine web server allows for convenient protein structure refinement through text or file input submission, email notification, and a provided example submission, and is freely available without any registration requirement. The server also provides comprehensive analysis of submissions through various energy and statistical feedback and interactive visualization of multiple refined models through the JSmol applet, which is equipped with numerous protein model analysis tools. The web server has been extensively tested and used by many users. As a result, the 3Drefine web server conveniently provides a useful tool easily accessible to the community. The 3Drefine web server has been made publicly available at the URL: http://sysbio.rnet.missouri.edu/3Drefine/. PMID:27131371

  12. Integrative Analysis of Transcription Factor Combinatorial Interactions Using a Bayesian Tensor Factorization Approach

    PubMed Central

    Ye, Yusen; Gao, Lin; Zhang, Shihua

    2017-01-01

    Transcription factors play a key role in the transcriptional regulation of genes and the determination of cellular identity through combinatorial interactions. However, current studies of combinatorial regulation are limited by the lack of experimental data from the same cellular environment and by pervasive data noise. Here, we adopt a Bayesian CANDECOMP/PARAFAC (CP) factorization approach (BCPF) to integrate multiple datasets in a network paradigm for determining precise TF interaction landscapes. In our first application, we apply BCPF to integrate three networks, each built from diverse ENCODE datasets of multiple cell lines, to predict a global and precise TF interaction network. This network gives 38 novel TF interactions with distinct biological functions. In our second application, we apply BCPF to the TF regulatory networks of seven cell types and predict seven cell-lineage TF interaction networks. By further exploring their dynamics and modularity, we find that cell lineage-specific hub TFs participate in cell type- or lineage-specific regulation by interacting with non-specific TFs. Furthermore, we illustrate the biological function of hub TFs by taking those of the cancer lineage and the blood lineage as examples. Taken together, our integrative analysis reveals a more precise and extensive description of human TF combinatorial interactions. PMID:29033978
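    The CANDECOMP/PARAFAC model underlying BCPF approximates a three-way tensor as a sum of rank-one outer products. A minimal rank-1 alternating-least-squares sketch in pure Python follows (toy tensor, not the Bayesian formulation or the ENCODE data used in the paper):

```python
def rank1_cp_als(T, iters=50):
    """Fit T[i][j][k] ~ a[i]*b[j]*c[k] by alternating least squares:
    each factor has a closed-form update with the others held fixed."""
    I, J, K = len(T), len(T[0]), len(T[0][0])
    a, b, c = [1.0] * I, [1.0] * J, [1.0] * K
    for _ in range(iters):
        denom = sum(x * x for x in b) * sum(x * x for x in c)
        a = [sum(T[i][j][k] * b[j] * c[k] for j in range(J) for k in range(K)) / denom
             for i in range(I)]
        denom = sum(x * x for x in a) * sum(x * x for x in c)
        b = [sum(T[i][j][k] * a[i] * c[k] for i in range(I) for k in range(K)) / denom
             for j in range(J)]
        denom = sum(x * x for x in a) * sum(x * x for x in b)
        c = [sum(T[i][j][k] * a[i] * b[j] for i in range(I) for j in range(J)) / denom
             for k in range(K)]
    return a, b, c

# Toy tensor that is exactly rank 1: T[i][j][k] = u[i]*v[j]*w[k].
u, v, w = [1.0, 2.0], [1.0, 3.0], [2.0, 1.0, 0.5]
T = [[[ui * vj * wk for wk in w] for vj in v] for ui in u]
a, b, c = rank1_cp_als(T)
```

    A rank-R CP model sums R such triplets, and the Bayesian variant in the paper additionally places priors on the factors to handle noise; both are beyond this sketch.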

  14. Interaction Junk: User Interaction-Based Evaluation of Visual Analytic Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Endert, Alexander; North, Chris

    2012-10-14

    With the growing need for visualization to aid users in understanding large, complex datasets, the ability for users to interact with and explore these datasets is critical. As visual analytic systems have advanced to leverage powerful computational models and data analytics capabilities, the modes by which users engage and interact with the information remain limited. Often, users are taxed with directly manipulating parameters of these models through traditional GUIs (e.g., using sliders to directly manipulate the value of a parameter). However, the purpose of user interaction in visual analytic systems is to enable visual data exploration, where users can focus on their task as opposed to the tool or system. As a result, users can engage freely in data exploration and decision-making for the purpose of gaining insight. In this position paper, we discuss how evaluating visual analytic systems can be approached through user interaction analysis, where the goal is to minimize the cognitive translation between the visual metaphor and the mode of interaction (i.e., reducing the "interaction junk"). We motivate this concept through a discussion of traditional GUIs used in visual analytics for direct manipulation of model parameters, and the importance of designing interactions that support visual data exploration.

  15. Differential network analysis reveals the genome-wide landscape of estrogen receptor modulation in hormonal cancers

    PubMed Central

    Hsiao, Tzu-Hung; Chiu, Yu-Chiao; Hsu, Pei-Yin; Lu, Tzu-Pin; Lai, Liang-Chuan; Tsai, Mong-Hsun; Huang, Tim H.-M.; Chuang, Eric Y.; Chen, Yidong

    2016-01-01

    Several mutual information (MI)-based algorithms have been developed to identify dynamic gene-gene and function-function interactions governed by key modulators (genes, proteins, etc.). Due to intensive computation, however, these methods rely heavily on prior knowledge and are limited in genome-wide analysis. We present the modulated gene/gene set interaction (MAGIC) analysis to systematically identify genome-wide modulation of interaction networks. Based on a novel statistical test employing conjugate Fisher transformations of correlation coefficients, MAGIC features fast computation and adaptation to variations of clinical cohorts. In simulated datasets, MAGIC achieved greatly improved computational efficiency and overall superior performance compared with the MI-based method. We applied MAGIC to construct the estrogen receptor (ER)-modulated gene and gene set (representing biological function) interaction networks in breast cancer. Several novel interaction hubs and functional interactions were discovered. The ER+-dependent interaction between TGFβ and NFκB was further shown to be associated with patient survival. The findings were verified in independent datasets. Using MAGIC, we also assessed the essential roles of ER modulation in another hormonal cancer, ovarian cancer. Overall, MAGIC is a systematic framework for comprehensively identifying and constructing modulated interaction networks in a whole-genome landscape. A MATLAB implementation of MAGIC is available for academic use at https://github.com/chiuyc/MAGIC. PMID:26972162
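
    MAGIC's statistical test builds on Fisher's z-transformation of correlation coefficients. The paper's exact conjugate-transformation statistic is not reproduced here, but the textbook Fisher z-test for a difference between two independent Pearson correlations, which this family of tests builds on, can be sketched as:

```python
import numpy as np
from scipy import stats

def fisher_corr_diff_test(r1, n1, r2, n2):
    """Two-sided z-test for a difference between two independent Pearson
    correlations via Fisher's z-transformation. A textbook version of the
    kind of test MAGIC builds on, not the paper's exact statistic."""
    z1, z2 = np.arctanh(r1), np.arctanh(r2)        # Fisher transform
    se = np.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))  # std. error of z1 - z2
    z = (z1 - z2) / se
    p = 2.0 * stats.norm.sf(abs(z))                # two-sided p-value
    return z, p
```

For example, comparing r = 0.8 (n = 100) against r = 0.2 (n = 100) gives a highly significant difference, while identical correlations give z = 0 and p = 1.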

  16. Emory University: High-Throughput Protein-Protein Interaction Dataset for Lung Cancer-Associated Genes | Office of Cancer Genomics

    Cancer.gov

    To discover novel PPI signaling hubs for lung cancer, the CTD2 Center at Emory utilized large-scale genomics datasets and the literature to compile a set of lung cancer-associated genes. A library of expression vectors was generated for these genes and utilized for detecting pairwise PPIs with cell lysate-based TR-FRET assays in a high-throughput screening format.

  17. Segmentation-less Digital Rock Physics

    NASA Astrophysics Data System (ADS)

    Tisato, N.; Ikeda, K.; Goldfarb, E. J.; Spikes, K. T.

    2017-12-01

    In the last decade, Digital Rock Physics (DRP) has become an avenue to investigate the physical and mechanical properties of geomaterials. DRP offers the advantage of simulating laboratory experiments on numerical samples that are obtained from analytical methods. Potentially, DRP could save part of the time and resources that are allocated to performing complicated laboratory tests. Like classic laboratory tests, the goal of DRP is to accurately estimate the physical properties of rocks, such as hydraulic permeability or elastic moduli. Nevertheless, the physical properties of samples imaged using micro-computed tomography (μCT) are typically estimated through segmentation of the μCT dataset. Segmentation proves to be a challenging and arbitrary procedure that typically leads to inaccurate estimates of physical properties. Here we present a novel technique to extract physical properties from a μCT dataset without the use of segmentation. We show examples in which we use the segmentation-less method to simulate elastic wave propagation and pressure wave diffusion to estimate elastic properties and permeability, respectively. The proposed method takes advantage of effective medium theories and uses the density and porosity measured in the laboratory to constrain the results. We discuss the results and highlight that segmentation-less DRP is more accurate than segmentation-based DRP approaches and theoretical modeling for the studied rock. In conclusion, the segmentation-less approach presented here seems to be a promising method to improve accuracy and to ease the overall workflow of DRP.

  18. Updates to FuncLab, a Matlab based GUI for handling receiver functions

    NASA Astrophysics Data System (ADS)

    Porritt, Robert W.; Miller, Meghan S.

    2018-02-01

    Receiver functions are a versatile tool commonly used in seismic imaging. Depending on how they are processed, they can be used to image discontinuity structure within the crust or mantle, or they can be inverted for seismic velocity, either directly or jointly with complementary datasets. However, modern studies generally require large datasets, which can be challenging to handle; therefore, FuncLab was originally written as an interactive Matlab GUI to assist in handling these large datasets. This software uses a project database to allow interactive trace editing, data visualization, H-κ stacking for crustal thickness and Vp/Vs ratio, and common conversion point stacking while minimizing computational costs. Since its initial release, significant advances have been made in the implementation of web services, and changes in the underlying Matlab platform have necessitated a significant revision to the software. Here, we present revisions to the software, including new features such as data downloading via irisFetch.m, receiver function calculations via processRFmatlab, on-the-fly cross-section tools, interface picking, and more. Alongside the descriptions of the tools, we present the software's application to a test dataset in Michigan, Wisconsin, and neighboring areas following the passage of the USArray Transportable Array. The software is made available online at https://robporritt.wordpress.com/software.
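
    The H-κ stacking mentioned above grid-searches crustal thickness H and the Vp/Vs ratio κ, summing receiver-function amplitudes at the arrival times predicted for the Ps conversion and its crustal multiples (in the style of Zhu & Kanamori). A simplified single-trace sketch follows; the default grids, weights, and crustal Vp are illustrative assumptions, not FuncLab's actual settings:

```python
import numpy as np

def hk_stack(rf, t, p, vp=6.5, H_grid=None, k_grid=None, w=(0.7, 0.2, 0.1)):
    """Zhu & Kanamori-style H-kappa stack for a single receiver function.
    rf: amplitude trace, t: time axis (s), p: ray parameter (s/km).
    Returns the stack grid and the best-fitting (H, kappa)."""
    H_grid = np.arange(20.0, 60.0, 0.5) if H_grid is None else H_grid
    k_grid = np.arange(1.6, 2.0, 0.01) if k_grid is None else k_grid
    amp = lambda time: np.interp(time, t, rf)      # amplitude at a given time
    S = np.zeros((len(H_grid), len(k_grid)))
    qp = np.sqrt((1.0 / vp) ** 2 - p ** 2)         # P vertical slowness
    for i, H in enumerate(H_grid):
        for j, k in enumerate(k_grid):
            qs = np.sqrt((k / vp) ** 2 - p ** 2)   # S vertical slowness
            tPs, tPpPs, tPpSs = H * (qs - qp), H * (qs + qp), 2 * H * qs
            # Ps and PpPs arrive positive; PpSs+PsPs arrives negative.
            S[i, j] = w[0] * amp(tPs) + w[1] * amp(tPpPs) - w[2] * amp(tPpSs)
    i, j = np.unravel_index(np.argmax(S), S.shape)
    return S, H_grid[i], k_grid[j]
```

Given a trace with pulses at the arrival times predicted for some true (H, κ), the stack attains its maximum at (or very near) that grid point.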

  19. Exploring relationship between human mobility and social ties: Physical distance is not dead

    NASA Astrophysics Data System (ADS)

    Jin, Bo; Liao, Binbing; Yuan, Ning; Wang, Wenjun

    2015-06-01

    Partly due to the difficulty of accessing a worldwide dataset that simultaneously captures location history and social networks, our understanding of the relationship between human mobility and social ties has been limited. However, this topic is essential for deeper study from the perspectives of human dynamics and social networks. In this paper, we examine the location history data and social network data of 712 email users and 399 offline events users from a map-editing-based social network website. Based on these data, we conduct all our experiments from both the individual and the community perspectives. We find that physical distance is still the most influential factor for social ties among the nine representative human mobility features extracted from our GPS trajectory dataset, although the Internet revolution has made long-distance communication dramatically faster, easier and cheaper than ever before and, in turn, has partly expanded the physical scope of social networks. Furthermore, we find that, to a certain extent, proximity in the South-North direction is more influential on social ties than proximity in the East-West direction. To the best of our knowledge, this difference between South-North and East-West is raised and quantitatively supported by a large dataset for the first time here. We believe our findings on the interplay of human mobility and social ties offer a new perspective to this field of study.

  20. Development and comparison of projection and image space 3D nodule insertion techniques

    NASA Astrophysics Data System (ADS)

    Robins, Marthony; Solomon, Justin; Sahbaee, Pooyan; Samei, Ehsan

    2016-04-01

    This study aimed to develop and compare two methods of inserting computerized virtual lesions into CT datasets. 24 physical (synthetic) nodules of three sizes and four morphologies were inserted into an anthropomorphic chest phantom (LUNGMAN, KYOTO KAGAKU). The phantom was scanned (Somatom Definition Flash, Siemens Healthcare) with and without nodules present, and images were reconstructed with filtered back projection and iterative reconstruction (SAFIRE) at 0.6 mm slice thickness using a standard thoracic CT protocol at multiple dose settings. Virtual 3D CAD models based on the physical nodules were virtually inserted (accounting for the system MTF) into the nodule-free CT data using two techniques: projection-based and image-based insertion. Nodule volumes were estimated using a commercial segmentation tool (iNtuition, TeraRecon, Inc.). Differences were tested using paired t-tests and R2 goodness of fit between the virtually and physically inserted nodules. Both insertion techniques resulted in nodule volumes very similar to the real nodules (<3% difference), and in most cases the differences were not statistically significant. Also, R2 values were all >0.97 for both insertion techniques. These data imply that these techniques can confidently be used as a means of inserting virtual nodules into CT datasets. These techniques can be instrumental in building hybrid CT datasets composed of patient images with virtually inserted nodules.
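
    The statistical comparison described above (paired t-tests and R2 goodness of fit between virtually and physically inserted nodule volumes) can be reproduced on any matched set of volume measurements. The numbers below are hypothetical stand-ins, not the study's data:

```python
import numpy as np
from scipy import stats

# Hypothetical volumes (mm^3) for the same nodules measured after physical
# and virtual insertion; real values would come from the segmentation tool.
v_phys = np.array([52.1, 110.3, 248.9, 55.0, 118.2, 260.4])
v_virt = np.array([51.4, 112.0, 245.7, 54.2, 120.1, 256.8])

t_stat, p_val = stats.ttest_rel(v_phys, v_virt)        # paired t-test
r, _ = stats.pearsonr(v_phys, v_virt)
r2 = r ** 2                                            # goodness of fit
pct_diff = 100 * np.mean(np.abs(v_virt - v_phys) / v_phys)
```

A non-significant p-value together with a high R2 and a small mean percent difference is the pattern the study reports for both insertion techniques.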

  1. NOXclass: prediction of protein-protein interaction types.

    PubMed

    Zhu, Hongbo; Domingues, Francisco S; Sommer, Ingolf; Lengauer, Thomas

    2006-01-19

    Structural models determined by X-ray crystallography play a central role in understanding protein-protein interactions at the molecular level. Interpretation of these models requires the distinction between non-specific crystal packing contacts and biologically relevant interactions. This has been investigated previously and classification approaches have been proposed. However, less attention has been devoted to distinguishing different types of biological interactions. These interactions are classified as obligate and non-obligate according to the effect of the complex formation on the stability of the protomers. So far no automatic classification methods for distinguishing obligate, non-obligate and crystal packing interactions have been made available. Six interface properties have been investigated on a dataset of 243 protein interactions. The six properties have been combined using a support vector machine algorithm, resulting in NOXclass, a classifier for distinguishing obligate, non-obligate and crystal packing interactions. We achieve an accuracy of 91.8% for the classification of these three types of interactions using a leave-one-out cross-validation procedure. NOXclass allows the interpretation and analysis of protein quaternary structures. In particular, it generates testable hypotheses regarding the nature of protein-protein interactions, when experimental results are not available. We expect this server will benefit the users of protein structural models, as well as protein crystallographers and NMR spectroscopists. A web server based on the method and the datasets used in this study are available at http://noxclass.bioinf.mpi-inf.mpg.de/.
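
    The NOXclass recipe of combining interface properties with a support vector machine and evaluating by leave-one-out cross-validation is straightforward to sketch with scikit-learn. The features below are synthetic stand-ins for the six interface properties, not the NOXclass dataset:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-in for six interface properties (e.g. interface area,
# area ratio, composition) over three interaction classes:
# 0 = obligate, 1 = non-obligate, 2 = crystal packing.
X = np.vstack([rng.normal(c, 1.0, size=(40, 6)) for c in (0.0, 2.0, 4.0)])
y = np.repeat([0, 1, 2], 40)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
# Leave-one-out cross-validation: one fit per sample, mean 0/1 accuracy.
acc = cross_val_score(clf, X, y, cv=LeaveOneOut()).mean()
```

With well-separated synthetic classes the LOO accuracy is high; on the real 243-interaction dataset the paper reports 91.8%.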

  2. Network representations of immune system complexity

    PubMed Central

    Subramanian, Naeha; Torabi-Parizi, Parizad; Gottschalk, Rachel A.; Germain, Ronald N.; Dutta, Bhaskar

    2015-01-01

    The mammalian immune system is a dynamic multi-scale system composed of a hierarchically organized set of molecular, cellular and organismal networks that act in concert to promote effective host defense. These networks range from those involving gene regulatory and protein-protein interactions underlying intracellular signaling pathways and single cell responses to increasingly complex networks of in vivo cellular interaction, positioning and migration that determine the overall immune response of an organism. Immunity is thus not the product of simple signaling events but rather non-linear behaviors arising from dynamic, feedback-regulated interactions among many components. One of the major goals of systems immunology is to quantitatively measure these complex multi-scale spatial and temporal interactions, permitting development of computational models that can be used to predict responses to perturbation. Recent technological advances permit collection of comprehensive datasets at multiple molecular and cellular levels while advances in network biology support representation of the relationships of components at each level as physical or functional interaction networks. The latter facilitate effective visualization of patterns and recognition of emergent properties arising from the many interactions of genes, molecules, and cells of the immune system. We illustrate the power of integrating ‘omics’ and network modeling approaches for unbiased reconstruction of signaling and transcriptional networks with a focus on applications involving the innate immune system. We further discuss future possibilities for reconstruction of increasingly complex cellular and organism-level networks and development of sophisticated computational tools for prediction of emergent immune behavior arising from the concerted action of these networks. PMID:25625853

  3. Bayesian correlated clustering to integrate multiple datasets

    PubMed Central

    Kirk, Paul; Griffin, Jim E.; Savage, Richard S.; Ghahramani, Zoubin; Wild, David L.

    2012-01-01

    Motivation: The integration of multiple datasets remains a key challenge in systems biology and genomic medicine. Modern high-throughput technologies generate a broad array of different data types, providing distinct—but often complementary—information. We present a Bayesian method for the unsupervised integrative modelling of multiple datasets, which we refer to as MDI (Multiple Dataset Integration). MDI can integrate information from a wide range of different datasets and data types simultaneously (including the ability to model time series data explicitly using Gaussian processes). Each dataset is modelled using a Dirichlet-multinomial allocation (DMA) mixture model, with dependencies between these models captured through parameters that describe the agreement among the datasets. Results: Using a set of six artificially constructed time series datasets, we show that MDI is able to integrate a significant number of datasets simultaneously, and that it successfully captures the underlying structural similarity between the datasets. We also analyse a variety of real Saccharomyces cerevisiae datasets. In the two-dataset case, we show that MDI’s performance is comparable with the present state-of-the-art. We then move beyond the capabilities of current approaches and integrate gene expression, chromatin immunoprecipitation–chip and protein–protein interaction data, to identify a set of protein complexes for which genes are co-regulated during the cell cycle. Comparisons to other unsupervised data integration techniques—as well as to non-integrative approaches—demonstrate that MDI is competitive, while also providing information that would be difficult or impossible to extract using other methods. Availability: A Matlab implementation of MDI is available from http://www2.warwick.ac.uk/fac/sci/systemsbiology/research/software/. Contact: D.L.Wild@warwick.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23047558

  4. A modified active appearance model based on an adaptive artificial bee colony.

    PubMed

    Abdulameer, Mohammed Hasan; Sheikh Abdullah, Siti Norul Huda; Othman, Zulaiha Ali

    2014-01-01

    The active appearance model (AAM) is one of the most popular model-based approaches and has been extensively used to extract features by highly accurate modeling of human faces under various physical and environmental circumstances. However, in such an active appearance model, fitting the model to the original image is a challenging task. The state of the art shows that optimization methods are applicable to this problem; however, applying optimization effectively remains a common difficulty. Hence, in this paper we propose an AAM-based face recognition technique that resolves the fitting problem of the AAM by introducing a new adaptive artificial bee colony (ABC) algorithm. The adaptation increases the efficiency of fitting compared with the conventional ABC algorithm. We have used three datasets in our experiments: the CASIA dataset, the property 2.5D face dataset, and the UBIRIS v1 images dataset. The results reveal that the proposed face recognition technique performs effectively in terms of face recognition accuracy.

  5. Heliophysics Legacy Data Restoration

    NASA Astrophysics Data System (ADS)

    Candey, R. M.; Bell, E. V., II; Bilitza, D.; Chimiak, R.; Cooper, J. F.; Garcia, L. N.; Grayzeck, E. J.; Harris, B. T.; Hills, H. K.; Johnson, R. C.; Kovalick, T. J.; Lal, N.; Leckner, H. A.; Liu, M. H.; McCaslin, P. W.; McGuire, R. E.; Papitashvili, N. E.; Rhodes, S. A.; Roberts, D. A.; Yurow, R. E.

    2016-12-01

    The Space Physics Data Facility (SPDF), in collaboration with the National Space Science Data Coordinated Archive (NSSDCA), is converting datasets from older NASA missions to online storage. Valuable science is still buried within these datasets, particularly when modern algorithms are applied on computers with vastly more storage and processing power than were available when the data were originally measured, and when the data are analyzed in conjunction with other data and models. The data were also not readily accessible as archived on 7- and 9-track tapes, microfilm, microfiche, and other media. Although many datasets have now been moved online in formats that are readily analyzed, others will still require some deciphering to puzzle out the data values and their scientific meaning. There is an ongoing effort to convert the datasets to the modern Common Data Format (CDF) and add metadata for use in browse and analysis tools such as CDAWeb.

  6. L2 Teaching in the Wild: A Closer Look at Correction and Explanation Practices in Everyday L2 Interaction

    ERIC Educational Resources Information Center

    Theodorsdottor, Gudrun

    2018-01-01

    This article argues for a reconceptualization of the concept of "corrective feedback" for the investigation of correction practices in everyday second language (L2) interaction ("in the wild"). Expanding the dataset for L2 research as suggested by Firth and Wagner (1997) to include interactions from the wild has consequences…

  7. A novel method based on new adaptive LVQ neural network for predicting protein-protein interactions from protein sequences.

    PubMed

    Yousef, Abdulaziz; Moghadam Charkari, Nasrollah

    2013-11-07

    Protein-protein interaction (PPI) data are among the most important data for understanding cellular processes. Many interesting methods have been proposed to predict PPIs; however, methods that use only the sequences of proteins as prior knowledge are more universal. In this paper, a sequence-based, fast, and adaptive PPI prediction method is introduced to assign two proteins to an interaction class (yes, no). First, in order to improve the representation of the sequences, twelve physicochemical properties of amino acids are used with different representation methods to transform the sequences of protein pairs into feature vectors. Then, to speed up the learning process and reduce the effect of noisy PPI data, principal component analysis (PCA) is carried out as a feature extraction algorithm. Finally, a new adaptive Learning Vector Quantization (LVQ) predictor is designed to deal with both balanced and imbalanced datasets. Accuracies of 93.88%, 90.03%, and 89.72% were obtained on the S. cerevisiae, H. pylori, and independent datasets, respectively. The results of various experiments indicate the efficiency and validity of the method. © 2013 Published by Elsevier Ltd.
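
    The pipeline described (physicochemical feature vectors, then PCA, then an LVQ classifier) can be sketched as follows. Since common Python libraries do not ship an LVQ implementation, a minimal LVQ1 variant is written out by hand, and the data are synthetic stand-ins for real protein-pair features:

```python
import numpy as np
from sklearn.decomposition import PCA

class LVQ1:
    """Minimal LVQ1 classifier: one prototype per class, pulled toward
    correctly matched samples and pushed away from mismatched ones."""
    def fit(self, X, y, lr=0.05, epochs=30, seed=0):
        rng = np.random.default_rng(seed)
        self.classes_ = np.unique(y)
        # Initialize each prototype at its class mean.
        self.W = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        for _ in range(epochs):
            for i in rng.permutation(len(X)):
                j = np.argmin(np.linalg.norm(self.W - X[i], axis=1))
                sign = 1.0 if self.classes_[j] == y[i] else -1.0
                self.W[j] += sign * lr * (X[i] - self.W[j])
        return self

    def predict(self, X):
        d = np.linalg.norm(X[:, None, :] - self.W[None, :, :], axis=2)
        return self.classes_[d.argmin(axis=1)]

# Hypothetical stand-in for physicochemical feature vectors of protein pairs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (100, 24)), rng.normal(1.5, 1.0, (100, 24))])
y = np.repeat([0, 1], 100)                     # 1 = interacting, 0 = not
Xp = PCA(n_components=10).fit_transform(X)     # PCA feature extraction
acc = (LVQ1().fit(Xp, y).predict(Xp) == y).mean()
```

The paper's adaptive LVQ additionally tunes the learning dynamics per dataset; this sketch only shows the basic prototype-update rule the method builds on.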

  8. P-MartCancer-Interactive Online Software to Enable Analysis of Shotgun Cancer Proteomic Datasets.

    PubMed

    Webb-Robertson, Bobbie-Jo M; Bramer, Lisa M; Jensen, Jeffrey L; Kobold, Markus A; Stratton, Kelly G; White, Amanda M; Rodland, Karin D

    2017-11-01

    P-MartCancer is an interactive web-based software environment that enables statistical analyses of peptide or protein data, quantitated from mass spectrometry-based global proteomics experiments, without requiring in-depth knowledge of statistical programming. P-MartCancer offers a series of statistical modules associated with quality assessment, peptide and protein statistics, protein quantification, and exploratory data analyses, driven by the user via customized workflows and interactive visualization. Currently, P-MartCancer offers access to, and the capability to analyze, multiple cancer proteomic datasets generated through the Clinical Proteomics Tumor Analysis Consortium at the peptide, gene, and protein levels. P-MartCancer is deployed as a web service (https://pmart.labworks.org/cptac.html) and is alternatively available via Docker Hub (https://hub.docker.com/r/pnnl/pmart-web/). Cancer Res; 77(21); e47-50. ©2017 American Association for Cancer Research.

  9. EnviroAtlas - 303(d) Impairments by 12-digit HUC for the Conterminous United States

    EPA Pesticide Factsheets

    This EnviroAtlas dataset depicts the total length of stream or river flowlines that have impairments submitted to the EPA by states under section 303(d) of the Clean Water Act. It also contains the total lengths of streams, rivers, and canals, total waterbody area, and stream density (stream length per area) from the US Geological Survey's high-resolution National Hydrography Dataset (NHD).This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).

  10. A daily global mesoscale ocean eddy dataset from satellite altimetry.

    PubMed

    Faghmous, James H; Frenger, Ivy; Yao, Yuanshun; Warmka, Robert; Lindell, Aron; Kumar, Vipin

    2015-01-01

    Mesoscale ocean eddies are ubiquitous coherent rotating structures of water with radial scales on the order of 100 kilometers. Eddies play a key role in the transport and mixing of momentum and tracers across the World Ocean. We present a global daily mesoscale ocean eddy dataset that contains ~45 million mesoscale features and 3.3 million eddy trajectories that persist at least two days, as identified in the AVISO dataset over the period 1993-2014. This dataset, along with the open-source eddy identification software, allows users to extract eddies with any parameters (minimum size, lifetime, etc.), to study global eddy properties and dynamics, and to empirically estimate the impact eddies have on mass or heat transport. Furthermore, our open-source software may be used to identify mesoscale features in model simulations and compare them to observed features. Finally, this dataset can be used to study the interaction between mesoscale ocean eddies and other components of the Earth System.

  11. EnviroAtlas - Austin, TX - Green Space Proximity Gradient

    EPA Pesticide Factsheets

    In any given 1-square meter point in this EnviroAtlas dataset, the value shown gives the percentage of square meters of greenspace within 1/4 square kilometer centered over the given point. Green space is defined as Trees & Forest, Grass & Herbaceous, and Agriculture. Water is shown as -99999 in this dataset to distinguish it from land areas with very low green space. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).

  12. EnviroAtlas - Austin, TX - Tree Cover Configuration and Connectivity, Water Background

    EPA Pesticide Factsheets

    This EnviroAtlas dataset categorizes forest land cover into structural elements (e.g. core, edge, connector, etc.). In this community, Forest is defined as Trees & Forest (Trees & Forest - 40 = 1; All Else = 0). Water was considered background (value 129) during the analysis to create this dataset, however it has been converted into value 10 to distinguish it from land area background. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).

  13. EnviroAtlas - New York, NY - Green Space Proximity Gradient

    EPA Pesticide Factsheets

    In any given 1-square meter point in this EnviroAtlas dataset, the value shown gives the percentage of square meters of greenspace within 1/4 square kilometer centered over the given point. In this community, green space is defined as Trees & Forest and Grass & Herbaceous. Water is shown as -99999 in this dataset to distinguish it from land areas with very low green space. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).

  14. EnviroAtlas - Des Moines, IA - Green Space Proximity Gradient

    EPA Pesticide Factsheets

    In any given 1-square meter point in this EnviroAtlas dataset, the value shown gives the percentage of square meters of greenspace within 1/4 square kilometer centered over the given point. Green space is defined as Trees & Forest, Grass & Herbaceous, and Agriculture. Water is shown as -99999 in this dataset to distinguish it from land areas with very low green space. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://enviroatlas.epa.gov/EnviroAtlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).

  15. EnviroAtlas - Cleveland, OH - Green Space Proximity Gradient

    EPA Pesticide Factsheets

    In any given 1-square meter point in this EnviroAtlas dataset, the value shown gives the percentage of square meters of greenspace within 1/4 square kilometer centered over the given point. In this community, green space is defined as Trees & Forest, Grass & Herbaceous, Woody Wetlands, and Emergent Wetlands. Water is shown as -99999 in this dataset to distinguish it from land areas with very low green space. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).

  16. EnviroAtlas - Memphis, TN - Green Space Proximity Gradient

    EPA Pesticide Factsheets

    For any given 1-square-meter point in this EnviroAtlas dataset, the value shown gives the percentage of green space within the 1/4 square kilometer centered on that point. Green space is defined as Trees & Forest, Grass & Herbaceous, Agriculture, Woody Wetlands, and Emergent Wetlands. Water is shown as -99999 in this dataset to distinguish it from land areas with very low green space. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).
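
    The moving-window calculation these gradient records describe (percent green space within a square neighborhood centered on each point) can be sketched with a summed-area table. The toy grid, 5-cell window, and binary class coding below are illustrative only, not the EnviroAtlas production workflow.

```python
import numpy as np

def green_space_pct(mask, window):
    """Percent of 1-cells in a window x window square centered on each cell,
    computed via a summed-area table; cells outside the grid count as non-green."""
    h, w = mask.shape
    half = window // 2
    sat = np.zeros((h + 1, w + 1))
    sat[1:, 1:] = mask.cumsum(axis=0).cumsum(axis=1)
    pct = np.empty((h, w))
    for i in range(h):
        r0, r1 = max(0, i - half), min(h, i + half + 1)
        for j in range(w):
            c0, c1 = max(0, j - half), min(w, j + half + 1)
            s = sat[r1, c1] - sat[r0, c1] - sat[r1, c0] + sat[r0, c0]
            pct[i, j] = 100.0 * s / (window * window)
    return pct

# Toy binary land-cover grid (1 = green space); the EnviroAtlas layers use a
# 1-m grid and a 1/4 sq km neighborhood, i.e. roughly a 500 x 500 m window.
grid = np.ones((10, 10))
pct = green_space_pct(grid, 5)
```

    Interior cells of an all-green grid score 100%, while edge cells score less because the off-grid part of the window counts as non-green, which is how a proximity gradient tapers near data boundaries.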

  17. EnviroAtlas - Durham, NC - Land Cover Summaries by Block Group

    EPA Pesticide Factsheets

    This EnviroAtlas dataset describes the percentage of each block group that is classified as impervious, forest, green space, wetland, and agriculture. Impervious is a combination of dark and light impervious. Green space is a combination of Trees & Forest and Grass & Herbaceous. This dataset also includes the area per capita for each block group for impervious, forest, and green space land cover. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).
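
    The per-block-group quantities this record lists (percent cover and area per capita) are simple ratios. A minimal sketch, using a hypothetical record layout rather than the dataset's actual attribute names:

```python
# Hypothetical block-group record; the real dataset carries many more
# attributes (wetland, agriculture, dark/light impervious, etc.).
block_groups = [
    {"id": "370630001011", "area_m2": 2_500_000, "green_m2": 900_000, "pop": 1200},
]

def summarize(bg):
    """Percent green cover and green space area per capita for one block group."""
    return {
        "id": bg["id"],
        "pct_green": 100.0 * bg["green_m2"] / bg["area_m2"],
        "green_per_capita_m2": bg["green_m2"] / bg["pop"],
    }

summary = summarize(block_groups[0])
```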

  18. A daily global mesoscale ocean eddy dataset from satellite altimetry

    PubMed Central

    Faghmous, James H.; Frenger, Ivy; Yao, Yuanshun; Warmka, Robert; Lindell, Aron; Kumar, Vipin

    2015-01-01

    Mesoscale ocean eddies are ubiquitous coherent rotating structures of water with radial scales on the order of 100 kilometers. Eddies play a key role in the transport and mixing of momentum and tracers across the World Ocean. We present a global daily mesoscale ocean eddy dataset that contains ~45 million mesoscale features and 3.3 million eddy trajectories that persist at least two days, as identified in the AVISO dataset over the period 1993–2014. This dataset, along with the open-source eddy identification software, allows users to extract eddies with any parameters (minimum size, lifetime, etc.), to study global eddy properties and dynamics, and to empirically estimate the impact eddies have on mass or heat transport. Furthermore, our open-source software may be used to identify mesoscale features in model simulations and compare them to observed features. Finally, this dataset can be used to study the interaction between mesoscale ocean eddies and other components of the Earth System. PMID:26097744
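
    Because the dataset ships with open-source identification software, users can re-extract eddies under their own thresholds. A minimal sketch of such parameter-based filtering, using a hypothetical record layout (the real dataset's fields differ):

```python
# Hypothetical eddy-trajectory records for illustration only.
eddies = [
    {"id": 1, "lifetime_days": 2, "radius_km": 60},
    {"id": 2, "lifetime_days": 45, "radius_km": 120},
    {"id": 3, "lifetime_days": 10, "radius_km": 30},
]

def select(trajectories, min_lifetime_days=4, min_radius_km=50):
    """Keep trajectories meeting minimum lifetime and radial-scale thresholds."""
    return [e for e in trajectories
            if e["lifetime_days"] >= min_lifetime_days
            and e["radius_km"] >= min_radius_km]

kept = select(eddies)
```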

  19. Analysis of the STAT3 interactome using in-situ biotinylation and SILAC.

    PubMed

    Blumert, Conny; Kalkhof, Stefan; Brocke-Heidrich, Katja; Kohajda, Tibor; von Bergen, Martin; Horn, Friedemann

    2013-12-06

    Signal transducer and activator of transcription 3 (STAT3) is activated by a variety of cytokines and growth factors. To generate a comprehensive data set of proteins interacting specifically with STAT3, we applied stable isotope labeling with amino acids in cell culture (SILAC). For high-affinity pull-down using streptavidin, we fused STAT3 with a short peptide tag allowing biotinylation in situ (bio-tag), which did not affect STAT3 functions. By this approach, 3642 coprecipitated proteins were detected in human embryonic kidney-293 cells. Filtering using statistical and functional criteria finally extracted 136 proteins as putative interaction partners of STAT3. Both a physical interaction network analysis and the enrichment of known and predicted interaction partners suggested that our filtering criteria successfully enriched true STAT3 interactors. Our approach identified numerous novel interactors, including ones previously predicted to associate with STAT3. By reciprocal coprecipitation, we were able to verify the physical association between STAT3 and selected interactors, including the novel interaction with TOX4, a member of the TOX high mobility group box family. Applying the same method, we next investigated the activation-dependency of the STAT3 interactome. Again, we identified both known and novel interactions. Thus, our approach makes it possible to study protein-protein interactions effectively and comprehensively. The location, activity, function, degradation, and synthesis of proteins are significantly regulated by interactions of proteins with other proteins, biopolymers and small molecules. Thus, the comprehensive characterization of interactions of proteins in a given proteome is the next milestone on the path to understanding the biochemistry of the cell. 
In order to generate a comprehensive interactome dataset of proteins specifically interacting with a selected bait protein, we fused our bait protein STAT3 with a short peptide tag allowing biotinylation in situ (bio-tag). This bio-tag allows an affinity pull-down using streptavidin but affected neither the activation of STAT3 by tyrosine phosphorylation nor its transactivating potential. We combined SILAC for accurate relative protein quantification, subcellular fractionation to increase the coverage of interacting proteins, high-affinity pull-down and a stringent filtering method to successfully analyze the interactome of STAT3. With our approach we confirmed several known STAT3 interactors and identified numerous novel ones. The approach applied provides a rapid and effective method, which is broadly applicable for studying protein-protein interactions and their dependency on post-translational modifications. © 2013. Published by Elsevier B.V. All rights reserved.
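
    The SILAC quantification underlying this filtering compares heavy-labeled (bait pull-down) to light-labeled (control) protein intensities; proteins strongly enriched in the pull-down are candidate interactors. A toy sketch with invented intensities and a simple ratio cutoff (the study's actual filtering combined statistical and functional criteria):

```python
import math

# Invented heavy/light intensities for illustration only.
proteins = {
    "TOX4":  {"heavy": 8.0e6, "light": 1.0e6},
    "ACTB":  {"heavy": 2.1e6, "light": 2.0e6},   # background binder
    "STAT3": {"heavy": 9.5e6, "light": 0.5e6},   # the bait itself
}

def enriched(quant, min_log2_ratio=2.0):
    """Candidates whose log2(heavy/light) ratio exceeds the cutoff."""
    return sorted(name for name, v in quant.items()
                  if math.log2(v["heavy"] / v["light"]) >= min_log2_ratio)

hits = enriched(proteins)
```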

  20. Assessing land ownership as a driver of change in the distribution, structure, and composition of California's forests.

    NASA Astrophysics Data System (ADS)

    Easterday, K.; Kelly, M.; McIntyre, P. J.

    2015-12-01

    Climate change is forecasted to have considerable influence on the distribution, structure, and function of California's forests. However, human interactions with forested landscapes (e.g. fire suppression, resource extraction, etc.) have complicated scientific understanding of the relative contributions of climate change and anthropogenic land management practices as drivers of change. Observed changes in forest structure towards smaller, denser forests across California have been attributed to both climate change (e.g. increased temperatures and declining water availability) and management practices (e.g. fire suppression and logging). Disentangling how these drivers of change act both together and apart is important to developing sustainable policy and land management practices as well as enhancing knowledge of human and natural system interactions. To that end, a comprehensive historical dataset - the Vegetation Type Mapping project (VTM) - and a modern forest inventory dataset (FIA) are used to analyze how spatial variations in vegetation composition and structure over a ~100 year period can be explained by land ownership.

  1. Open-Source Python Tools for Deploying Interactive GIS Dashboards for a Billion Datapoints on a Laptop

    NASA Astrophysics Data System (ADS)

    Steinberg, P. D.; Bednar, J. A.; Rudiger, P.; Stevens, J. L. R.; Ball, C. E.; Christensen, S. D.; Pothina, D.

    2017-12-01

    The rich variety of software libraries available in the Python scientific ecosystem provides a flexible and powerful alternative to traditional integrated GIS (geographic information system) programs. Each such library focuses on doing a certain set of general-purpose tasks well, and Python makes it relatively simple to glue the libraries together to solve a wide range of complex, open-ended problems in Earth science. However, choosing an appropriate set of libraries can be challenging, and it is difficult to predict how much "glue code" will be needed for any particular combination of libraries and tasks. Here we present a set of libraries that have been designed to work well together to build interactive analyses and visualizations of large geographic datasets, in standard web browsers. The resulting workflows run on ordinary laptops even for billions of data points, and easily scale up to larger compute clusters when available. The declarative top-level interface used in these libraries means that even complex, fully interactive applications can be built and deployed as web services using only a few dozen lines of code, making it simple to create and share custom interactive applications even for datasets too large for most traditional GIS systems. The libraries we will cover include GeoViews (HoloViews extended for geographic applications) for declaring visualizable/plottable objects, Bokeh for building visual web applications from GeoViews objects, Datashader for rendering arbitrarily large datasets faithfully as fixed-size images, Param for specifying user-modifiable parameters that model your domain, Xarray for computing with n-dimensional array data, Dask for flexibly dispatching computational tasks across processors, and Numba for compiling array-based Python code down to fast machine code. 
We will show how to use the resulting workflow with static datasets and with simulators such as GSSHA or AdH, allowing you to deploy flexible, high-performance web-based dashboards for your GIS data or simulations without needing major investments in code development or maintenance.

  2. Reconstruction of a Three Hourly 1-km Land Surface Air Temperature Dataset in the Qinghai-Tibet Plateau

    NASA Astrophysics Data System (ADS)

    Zhou, J.; Ding, L.

    2017-12-01

    Land surface air temperature (SAT) is an important parameter in the modeling of radiation balance and energy budget of the earth surface. Generally, SAT is measured at ground meteorological stations; SAT mapping is then possible through a spatial interpolation process. The interpolated SAT map relies on the spatial distribution of ground stations, the terrain, and many other factors; thus, it has great uncertainties in regions with complicated terrain. Alternatively, a SAT map can be obtained through physical modeling of interactions between the land surface and the atmosphere. Such datasets generally have coarse spatial resolution (e.g. coarser than 0.1°) and cannot satisfy applications at fine scales, e.g. 1 km. This presentation reports the reconstruction of a three hourly 1-km SAT dataset from 2001 to 2015 over the Qinghai-Tibet Plateau. The terrain in the Qinghai-Tibet Plateau, especially in the eastern part, is extremely complicated. Two SAT datasets of good quality are used in this study. The first one is from the 3h China Meteorological Forcing Dataset with a 0.1° resolution released by the Institute of Tibetan Plateau Research, Chinese Academy of Sciences (Yang et al., 2010); the second one is from the ERA-Interim product with the same temporal resolution and a 0.125° resolution. A statistical approach is developed to downscale the spatial resolution of the derived SAT to 1-km. The elevation and the normalized difference vegetation index (NDVI) are selected as two scaling factors in the downscaling approach. Results demonstrate a significantly negative correlation between the SAT and elevation in all seasons; there is also a significantly negative correlation between the SAT and NDVI in the vegetation growth seasons, while the correlation decreases in the other seasons. Therefore, a temporally dynamic downscaling approach is feasible to enhance the spatial resolution of the SAT. 
Compared with the SAT at 0.1° or 0.125° resolution, the reconstructed 1-km SAT provides much more spatial detail in areas with complicated terrain. Additionally, the 1-km SAT agrees well with the ground measured air temperatures as well as the SAT before downscaling. The reconstructed SAT will be beneficial for the modeling of surface radiation balance and energy budget over the Qinghai-Tibet Plateau.
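
    The downscaling idea (fit a statistical relationship between SAT and the scaling factors at the coarse scale, then evaluate it with 1-km elevation and NDVI) can be sketched as an ordinary least-squares fit. The numbers below are synthetic and illustrative; the actual approach is temporally dynamic:

```python
import numpy as np

rng = np.random.default_rng(0)
elev = rng.uniform(3000.0, 5500.0, 200)   # coarse-pixel elevation (m)
ndvi = rng.uniform(0.0, 0.6, 200)         # coarse-pixel NDVI
sat = 30.0 - 0.006 * elev - 5.0 * ndvi    # synthetic coarse-scale SAT (deg C)

# Fit SAT ~ 1 + elevation + NDVI at the coarse scale.
X = np.column_stack([np.ones_like(elev), elev, ndvi])
coef, *_ = np.linalg.lstsq(X, sat, rcond=None)

# Apply the fitted relationship to fine-scale (1-km) predictor values.
sat_1km = coef[0] + coef[1] * 4000.0 + coef[2] * 0.3
```

    The negative fitted coefficients mirror the reported negative SAT-elevation and SAT-NDVI correlations.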

  3. Docking pose selection by interaction pattern graph similarity: application to the D3R grand challenge 2015.

    PubMed

    Slynko, Inna; Da Silva, Franck; Bret, Guillaume; Rognan, Didier

    2016-09-01

    High affinity ligands for a given target tend to share key molecular interactions with important anchoring amino acids and therefore often present quite conserved interaction patterns. This simple concept was formalized in a topological knowledge-based scoring function (GRIM) for selecting the most appropriate docking poses from previously X-rayed interaction patterns. GRIM first converts protein-ligand atomic coordinates (docking poses) into a simple 3D graph describing the corresponding interaction pattern. In a second step, the proposed graphs are compared to those found from template structures in the Protein Data Bank. Last, all docking poses are rescored according to an empirical score (GRIMscore) accounting for overlap of maximum common subgraphs. Taking advantage of the public D3R Grand Challenge 2015, GRIM was used to rescore docking poses for 36 ligands (6 HSP90α inhibitors, 30 MAP4K4 inhibitors) prior to the release of the corresponding protein-ligand X-ray structures. When applied to the HSP90α dataset, for which many protein-ligand X-ray structures are already available, GRIM provided very high quality solutions (mean rmsd = 1.06 Å, n = 6) as top-ranked poses, and significantly outperformed a state-of-the-art scoring function. In the case of MAP4K4 inhibitors, for which preexisting 3D knowledge is scarce and chemical diversity is much larger, the accuracy of GRIM poses decreased (mean rmsd = 3.18 Å, n = 30), although GRIM still outperformed an energy-based scoring function. GRIM rescoring appears to be quite robust in comparison to the other approaches competing for the same challenge (42 submissions for the HSP90 dataset, 27 for the MAP4K4 dataset), as it ranked 3rd and 2nd, respectively, for the two investigated datasets. The rescoring method is quite simple to implement, independent of the docking engine, and applicable to any target for which at least one holo X-ray structure is available.
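
    The intuition behind rescoring by interaction-pattern overlap can be sketched by representing each pose as a set of (anchoring residue, interaction type) features and ranking poses by similarity to features pooled from template X-ray structures. GRIM itself matches 3D interaction graphs via maximum common subgraphs; the residues, features, and Jaccard score below are invented stand-ins for illustration:

```python
# Features pooled from hypothetical template complexes.
template = {("ASP93", "hbond"), ("ASN51", "hbond"), ("PHE138", "aromatic")}

# Features detected in two hypothetical docking poses.
poses = {
    "pose_a": {("ASP93", "hbond"), ("PHE138", "aromatic"), ("LYS58", "ionic")},
    "pose_b": {("GLY97", "hbond")},
}

def jaccard(a, b):
    """Set-overlap score standing in for a common-subgraph score."""
    return len(a & b) / len(a | b)

# Rescore: poses sharing more template interactions rank higher.
ranked = sorted(poses, key=lambda p: jaccard(poses[p], template), reverse=True)
```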

  4. EnviroAtlas - Austin, TX - Block Groups

    EPA Pesticide Factsheets

    This EnviroAtlas dataset is the base layer for the Austin, TX EnviroAtlas area. The block groups are from the US Census Bureau and are included/excluded based on EnviroAtlas criteria described in the procedure log. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).

  5. EnviroAtlas - Austin, TX - Demographics by Block Group Web Service

    EPA Pesticide Factsheets

    This EnviroAtlas web service supports research and online mapping activities related to EnviroAtlas (https://enviroatlas.epa.gov/EnviroAtlas). This EnviroAtlas dataset is a summary of key demographic groups for the EnviroAtlas community. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).

  6. Hydrogeophysical Cyberinfrastructure For Real-Time Interactive Browser Controlled Monitoring Of Near Surface Hydrology: Results Of A 13 Month Monitoring Effort At The Hanford 300 Area

    NASA Astrophysics Data System (ADS)

    Versteeg, R. J.; Johnson, T.; Henrie, A.; Johnson, D.

    2013-12-01

    The Hanford 300 Area, located adjacent to the Columbia River in south-central Washington, USA, is the site of former research and uranium fuel rod fabrication facilities. Waste disposal practices at the site included discharging between 33 and 59 metric tons of uranium over a 40-year period into shallow infiltration galleries, resulting in persistent uranium contamination within the vadose and saturated zones. Uranium transport from the vadose zone to the saturated zone is intimately linked with water table fluctuations and river water driven by upstream dam operations. Different remedial efforts have occurred at the site to address uranium contamination. Numerous investigations are occurring at the site, both to investigate remedial performance and to increase the understanding of uranium dynamics. Several of these studies include acquisition of large hydrological and time-lapse electrical geophysical data sets. Such datasets contain large amounts of information on hydrological processes. There are substantial challenges in handling the volume of such datasets, in processing them, and in providing users with the ability to effectively access and synergize the hydrological information contained in raw and processed data. These challenges motivated the development of a cloud-based cyberinfrastructure for dealing with large electrical hydrogeophysical datasets. This cyberinfrastructure is modular and extensible and includes data management, data processing, visualization and result mining capabilities. Specifically, it provides for data transmission to a central server, data parsing into a relational database and processing of the data using a PNNL-developed parallel inversion code on either dedicated or commodity compute clusters. 
Access to results is done through a browser with interactive tools allowing for generation of on-demand visualization of the inversion results as well as interactive data mining and statistical calculation. This infrastructure was used for the acquisition and processing of an electrical geophysical time-lapse survey which was collected over a highly instrumented field site in the Hanford 300 Area. Over a 13-month period between November 2011 and December 2012, 1,823 time-lapse datasets were collected (roughly 5 datasets a day for a total of 23 million individual measurements) on three parallel resistivity lines of 30 m each with 0.5 meter electrode spacing. In addition, hydrological and environmental data were collected from dedicated and general-purpose sensors. This dataset contains rich information on near-surface processes on a range of different spatial and temporal scales (ranging from hourly to seasonal). We will show how this cyberinfrastructure was used to manage and process this dataset and how the cyberinfrastructure can be used to access, mine and visualize the resulting data and information.

  7. Association between urbanisation and type 2 diabetes: an ecological study.

    PubMed

    Gassasse, Zakariah; Smith, Dianna; Finer, Sarah; Gallo, Valentina

    2017-01-01

    Previous studies have explored the effect of urbanisation on the prevalence of type 2 diabetes (T2D) at regional/national level. The aim of this study is to investigate the association between urbanisation and T2D at country level, worldwide, and to explore the role of intermediate variables (physical inactivity, sugar consumption and obesity). The potential effect modification of gross domestic product (GDP) was also assessed. Data for 207 countries were collected from accessible datasets. Directed acyclic graphs were used to describe the association between urbanisation, T2D and their intermediate variables (physical inactivity, sugar consumption and obesity). Urbanisation was measured as urban percentage (UP) and as agglomeration index (AI). Crude and multivariate linear regression analyses were conducted to explore selected associations. The interaction between urbanisation and T2D across levels of GDP per capita was investigated. The association between urbanisation and T2D diverged by exposure: AI was positively associated, while UP was negatively associated, with T2D prevalence. Physical inactivity and obesity were statistically significantly associated with increased prevalence of T2D. In middle-income countries (MIC), UP, AI and GDP were significantly associated with T2D prevalence, while in high-income countries (HIC), physical inactivity and obesity were the main determinants of T2D prevalence. The type of urban growth, not urbanisation per se, predicted T2D prevalence at country level. In MIC, population density and GDP were the main determinants of diabetes, while in HIC, these were physical inactivity and obesity. Globalisation is playing an important role in the rise of T2D worldwide.

  8. Critical Zone Co-dynamics: Quantifying Interactions between Subsurface, Land Surface, and Vegetation Properties Using UAV and Geophysical Approaches

    NASA Astrophysics Data System (ADS)

    Dafflon, B.; Leger, E.; Peterson, J.; Falco, N.; Wainwright, H. M.; Wu, Y.; Tran, A. P.; Brodie, E.; Williams, K. H.; Versteeg, R.; Hubbard, S. S.

    2017-12-01

    Improving understanding and modelling of terrestrial systems requires advances in measuring and quantifying interactions among subsurface, land surface and vegetation processes over relevant spatiotemporal scales. Such advances are important to quantify natural and managed ecosystem behaviors, as well as to predict how watershed systems respond to increasingly frequent hydrological perturbations, such as droughts, floods and early snowmelt. Our study focuses on the joint use of UAV-based multi-spectral aerial imaging, ground-based geophysical tomographic monitoring (incl., electrical and electromagnetic imaging) and point-scale sensing (soil moisture sensors and soil sampling) to quantify interactions between above and below ground compartments of the East River Watershed in the Upper Colorado River Basin. We evaluate linkages between physical properties (incl. soil composition, soil electrical conductivity, soil water content), metrics extracted from digital surface and terrain elevation models (incl., slope, wetness index) and vegetation properties (incl., greenness, plant type) in a 500 x 500 m hillslope-floodplain subsystem of the watershed. Data integration and analysis is supported by numerical approaches that simulate the control of soil and geomorphic characteristics on hydrological processes. Results provide an unprecedented window into critical zone interactions, revealing significant below- and above-ground co-dynamics. Baseline geophysical datasets provide lithological structure along the hillslope, which includes a surface soil horizon, underlain by a saprolite layer and the fractured Mancos shale. Time-lapse geophysical data show very different moisture dynamics in various compartments and locations during the winter and growing season. Integration with aerial imaging reveals a significant linkage between plant growth and the subsurface wetness, soil characteristics and the topographic gradient. 
The obtained information about the organization and connectivity of the landscape is being transferred to larger regions using aerial imaging and will be used to constrain multi-scale, multi-physics hydro-biogeochemical simulations of the East River watershed response to hydrological perturbations.

  9. Addressing key concepts in physical geography through interactive learning activities in an online geo-ICT environment

    NASA Astrophysics Data System (ADS)

    Verstraeten, Gert; Steegen, An; Martens, Lotte

    2016-04-01

    The increasing number of geospatial datasets and free online geo-ICT tools offers new opportunities for education in Earth Sciences. Geospatial technology indeed provides an environment through which interactive learning can be introduced in Earth Sciences curricula. However, the effectiveness of such e-learning approaches in terms of learning outcomes has rarely been addressed. Here, we present our experience with the implementation of digital interactive learning activities within an introductory Physical Geography course attended by 90 undergraduate students in Geography, Geology, Biology and Archaeology. Two traditional lectures were replaced by interactive sessions (each 2 h) in a flexible classroom where students had to work both in team and individually in order to explore some key concepts through the integrated use of geospatial data within Google Earth™. A first interactive lesson dealt with the classification of river systems and aimed to examine the conditions under which rivers tend to meander or to develop a braided pattern. Students were required to collect properties of rivers (river channel pattern, channel slope, climate, discharge, lithology, vegetation, etc.). All these data are available on a global scale and have been added as separate map layers in Google Earth™. Each student collected data for at least two rivers and added this information to a Google Drive Spreadsheet accessible to the entire group. This resulted in a database of more than one hundred rivers spread over various environments worldwide. In a second phase small groups of students discussed the potential relationships between river channel pattern and its controlling factors. Afterwards, the findings of each discussion group were presented to the entire audience. The same set-up was followed in a second interactive session to explore spatial variations in ecosystem properties such as net primary production and soil carbon content. 
The qualitative evaluation of both interactive sessions showed that the majority of students perceive these as very useful and inspiring. Students were more capable in exploring the spatial linkages between various environmental variables and processes compared to traditional lectures. Furthermore, the format of the sessions offered a forum in which undergraduate students from a variety of disciplines discussed the learning content in mixed groups. The success of interactive learning activities, however, strongly depends on the quality of the educational infrastructure (flexible spaces, wireless connections with sufficient broadband capacity).
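
    The meander-versus-braid question the first session explored is classically framed as an empirical slope-discharge threshold. A toy classifier in that spirit; the constant and exponent below are illustrative placeholders, not values from the literature or from the students' dataset:

```python
def channel_pattern(slope, discharge_m3s, k=0.012, exponent=-0.44):
    """Classify a channel as braided or meandering relative to an
    illustrative slope-discharge threshold curve."""
    threshold = k * discharge_m3s ** exponent
    return "braided" if slope > threshold else "meandering"

steep = channel_pattern(0.01, 100.0)     # steep channel, moderate discharge
gentle = channel_pattern(0.0005, 100.0)  # gentle channel, same discharge
```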

  10. Analysing and correcting the differences between multi-source and multi-scale spatial remote sensing observations.

    PubMed

    Dong, Yingying; Luo, Ruisen; Feng, Haikuan; Wang, Jihua; Zhao, Jinling; Zhu, Yining; Yang, Guijun

    2014-01-01

    Differences exist among the analysis results of agriculture monitoring and crop production based on remote sensing observations, which are obtained at different spatial scales from multiple remote sensors in the same time period, and processed by the same algorithms, models or methods. These differences can be quantitatively described mainly from three aspects, i.e. multiple remote sensing observations, crop parameter estimation models, and spatial scale effects of surface parameters. Our research proposed a new method to analyse and correct the differences between multi-source and multi-scale spatial remote sensing surface reflectance datasets, aiming to provide references for further studies in agricultural applications with multiple remotely sensed observations from different sources. The new method was constructed on the basis of the physical and mathematical properties of multi-source and multi-scale reflectance datasets. Statistical theory was used to extract the statistical characteristics of multiple surface reflectance datasets, and to further quantitatively analyse the spatial variations of these characteristics at multiple spatial scales. Then, taking the surface reflectance at the small spatial scale as the baseline data, Gaussian distribution theory was applied to correct the multiple surface reflectance datasets based on the obtained physical characteristics, mathematical distribution properties, and their spatial variations. This proposed method was verified by two sets of multiple satellite images, which were obtained in two experimental fields located in Inner Mongolia and Beijing, China, with different degrees of homogeneity of underlying surfaces. 
Experimental results indicate that differences between surface reflectance datasets at multiple spatial scales could be effectively corrected over non-homogeneous underlying surfaces, providing a basis for further multi-source and multi-scale crop growth monitoring and yield prediction, and for the corresponding consistency analysis and evaluation.
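
    The core correction step (rescale one reflectance dataset so its Gaussian parameters match the small-spatial-scale baseline) can be sketched as moment matching. The synthetic reflectance values below are illustrative, and the published method additionally models spatial variation of the statistics:

```python
import numpy as np

def match_gaussian(data, baseline):
    """Standardize `data`, then rescale it to the baseline's mean and std."""
    z = (data - data.mean()) / data.std()
    return z * baseline.std() + baseline.mean()

rng = np.random.default_rng(1)
baseline = rng.normal(0.18, 0.02, 1000)   # fine-scale surface reflectance
coarse = rng.normal(0.25, 0.05, 1000)     # coarse-scale dataset to correct
corrected = match_gaussian(coarse, baseline)
```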

  11. Analysing and Correcting the Differences between Multi-Source and Multi-Scale Spatial Remote Sensing Observations

    PubMed Central

    Dong, Yingying; Luo, Ruisen; Feng, Haikuan; Wang, Jihua; Zhao, Jinling; Zhu, Yining; Yang, Guijun

    2014-01-01

    Differences exist among the analysis results of agriculture monitoring and crop production based on remote sensing observations, which are obtained at different spatial scales from multiple remote sensors in the same time period, and processed by the same algorithms, models or methods. These differences can be quantitatively described mainly from three aspects, i.e. multiple remote sensing observations, crop parameter estimation models, and spatial scale effects of surface parameters. Our research proposed a new method to analyse and correct the differences between multi-source and multi-scale spatial remote sensing surface reflectance datasets, aiming to provide references for further studies in agricultural applications with multiple remotely sensed observations from different sources. The new method was constructed on the basis of the physical and mathematical properties of multi-source and multi-scale reflectance datasets. Statistical theory was used to extract the statistical characteristics of multiple surface reflectance datasets, and to further quantitatively analyse the spatial variations of these characteristics at multiple spatial scales. Then, taking the surface reflectance at the small spatial scale as the baseline data, Gaussian distribution theory was applied to correct the multiple surface reflectance datasets based on the obtained physical characteristics, mathematical distribution properties, and their spatial variations. This proposed method was verified by two sets of multiple satellite images, which were obtained in two experimental fields located in Inner Mongolia and Beijing, China, with different degrees of homogeneity of underlying surfaces. 
    Experimental results indicate that differences between surface reflectance datasets at multiple spatial scales could be effectively corrected over non-homogeneous underlying surfaces, providing a basis for further multi-source and multi-scale crop growth monitoring and yield prediction, and for the corresponding consistency analysis and evaluation. PMID:25405760
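    The Gaussian-based correction step described above amounts to moment matching: rescaling one reflectance dataset so its mean and standard deviation agree with the small-scale baseline. A minimal sketch of that idea (not the authors' implementation; the function and variable names are illustrative):

```python
import statistics

def match_gaussian(baseline, target):
    """Rescale `target` reflectance values so their mean and standard
    deviation match those of the `baseline` (fine-scale) dataset,
    assuming both are approximately Gaussian-distributed."""
    mu_b, sd_b = statistics.fmean(baseline), statistics.pstdev(baseline)
    mu_t, sd_t = statistics.fmean(target), statistics.pstdev(target)
    # Standardize each target value, then re-express it in baseline units.
    return [(x - mu_t) / sd_t * sd_b + mu_b for x in target]
```

    Any per-pixel spatial variation of these statistics, as analysed in the paper, would refine this global rescaling.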

  12. Physical stability of drugs after storage above and below the glass transition temperature: Relationship to glass-forming ability.

    PubMed

    Alhalaweh, Amjad; Alzghoul, Ahmad; Mahlin, Denny; Bergström, Christel A S

    2015-11-10

    Amorphous materials are inherently unstable and tend to crystallize upon storage. In this study, we investigated the extent to which the physical stability and inherent crystallization tendency of drugs are related to their glass-forming ability (GFA), the glass transition temperature (Tg), and thermodynamic factors. Differential scanning calorimetry was used to produce the amorphous state of 52 drugs [18 compounds crystallized upon heating (Class II) and 34 remained in the amorphous state (Class III)] and to perform in situ storage of the amorphous material for 12 h at temperatures 20°C above or below the Tg. A computational model based on the support vector machine (SVM) algorithm was developed to predict the structure-property relationships. All drugs maintained their class when stored at 20°C below the Tg. Fourteen of the Class II compounds crystallized when stored above the Tg, whereas all except one of the Class III compounds remained amorphous. These results were related only to the glass-forming ability; no relationship to, e.g., thermodynamic factors was found. The experimental data were used for computational modeling, and a classification model was developed that correctly predicted the physical stability above the Tg. The use of a large dataset revealed that molecular features related to aromaticity and π-π interactions reduce the inherent physical stability of amorphous drugs. Copyright © 2015 Elsevier B.V. All rights reserved.
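    The classification model above maps computed molecular descriptors to a stability class. As a rough sketch of that descriptor-to-class workflow, using a simple perceptron in place of the paper's support vector machine (the one-dimensional descriptors and labels below are toy values, not the study's data):

```python
def train_perceptron(samples, labels, epochs=20, lr=0.1):
    """Tiny linear classifier illustrating a descriptor -> class workflow.
    This is a perceptron stand-in, not the paper's SVM."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):  # y in {-1, +1}
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # misclassified: nudge the decision boundary
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b
```

    In the study itself, the descriptors would encode properties such as aromaticity, and the labels the observed Class II/III behaviour.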

  13. Thermodynamic Data Rescue and Informatics for Deep Carbon Science

    NASA Astrophysics Data System (ADS)

    Zhong, H.; Ma, X.; Prabhu, A.; Eleish, A.; Pan, F.; Parsons, M. A.; Ghiorso, M. S.; West, P.; Zednik, S.; Erickson, J. S.; Chen, Y.; Wang, H.; Fox, P. A.

    2017-12-01

    A large number of legacy datasets are contained in geoscience literature published between 1930 and 1980 and are not expressed external to the publication text in digitized formats. Extracting, organizing, and reusing these "dark" datasets is highly valuable for many within the Earth and planetary science community. As a part of the Deep Carbon Observatory (DCO) data legacy missions, the DCO Data Science Team and Extreme Physics and Chemistry community identified thermodynamic datasets related to carbon, or more specifically datasets about the enthalpy and entropy of chemicals, as a proof-of-principle analysis. The data science team endeavored to develop a semi-automatic workflow, which includes identifying relevant publications, extracting contained datasets using OCR methods, collaborative reviewing, and registering the datasets via the DCO Data Portal, whose 'Linked Data' feature provides a mechanism for connecting rescued datasets beyond their individual data sources to research domains, DCO Communities, and more, making data discovery and retrieval more effective. To date, the team has successfully rescued, deposited, and registered additional datasets from publications with thermodynamic sources. These datasets contain 3 main types of data: (1) heat content or enthalpy data determined for a given compound as a function of temperature using high-temperature calorimetry, (2) heat content or enthalpy data determined for a given compound as a function of temperature using adiabatic calorimetry, and (3) direct determination of heat capacity of a compound as a function of temperature using differential scanning calorimetry. The data science team integrated these datasets and delivered a spectrum of data analytics including visualizations, which will lead to a comprehensive characterization of the thermodynamics of carbon and carbon-related materials.

  14. Deep convolutional neural networks for pan-specific peptide-MHC class I binding prediction.

    PubMed

    Han, Youngmahn; Kim, Dongsup

    2017-12-28

    Computational scanning of peptide candidates that bind to a specific major histocompatibility complex (MHC) can speed up the peptide-based vaccine development process, and therefore various methods are being actively developed. Recently, machine-learning-based methods have generated successful results by training on large amounts of experimental data. However, many machine-learning-based methods are generally less sensitive in recognizing locally-clustered interactions, which can synergistically stabilize peptide binding. The deep convolutional neural network (DCNN) is a deep learning method inspired by the visual recognition process of the animal brain, and it is known to be able to capture meaningful local patterns from 2D images. Once peptide-MHC interactions are encoded into image-like array (ILA) data, a DCNN can be employed to build a predictive model for peptide-MHC binding prediction. In this study, we demonstrated that a DCNN is able not only to reliably predict peptide-MHC binding, but also to sensitively detect locally-clustered interactions. Nonapeptide-HLA-A and -B binding data were encoded into ILA data. A DCNN, as a pan-specific prediction model, was trained on the ILA data. The DCNN showed higher performance than other prediction tools on the latest benchmark datasets, which consist of 43 datasets for 15 HLA-A alleles and 25 datasets for 10 HLA-B alleles. In particular, the DCNN outperformed other tools for alleles belonging to the HLA-A3 supertype. The F1 scores of the DCNN were 0.86, 0.94, and 0.67 for the HLA-A*31:01, HLA-A*03:01, and HLA-A*68:01 alleles, respectively, which were significantly higher than those of other tools. We found that the DCNN was able to recognize locally-clustered interactions that could synergistically stabilize peptide binding. We developed ConvMHC, a web server providing user-friendly web interfaces for peptide-MHC class I binding predictions using the DCNN.
    The ConvMHC web server is accessible at http://jumong.kaist.ac.kr:8080/convmhc . We developed a novel method for peptide-HLA-I binding prediction using a DCNN trained on ILA-encoded peptide binding data, and demonstrated its reliable performance in nonapeptide binding prediction through independent evaluation on the latest IEDB benchmark datasets. Our approach can be applied to characterize locally-clustered patterns in molecular interactions, such as protein/DNA, protein/RNA, and drug/protein interactions.
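    The key preprocessing step is turning a peptide sequence into a 2D, image-like array that a convolutional network can consume. A simplified one-hot sketch (the paper's actual ILA encoding also incorporates MHC interaction information; this stand-in encodes only the peptide):

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def encode_peptide(peptide):
    """One-hot encode a nonapeptide into a 9x20 image-like array:
    one row per residue, one column per amino-acid type."""
    grid = []
    for residue in peptide:
        row = [0] * len(AMINO_ACIDS)
        row[AMINO_ACIDS.index(residue)] = 1
        grid.append(row)
    return grid
```

    A 2D convolution sliding over such an array can then pick up locally-clustered residue patterns of the kind the abstract describes.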

  15. Integrative Analysis of GWASs, Human Protein Interaction, and Gene Expression Identified Gene Modules Associated With BMDs

    PubMed Central

    He, Hao; Zhang, Lei; Li, Jian; Wang, Yu-Ping; Zhang, Ji-Gang; Shen, Jie; Guo, Yan-Fang

    2014-01-01

    Context: To date, few systems genetics studies in the bone field have been performed. We designed our study from a systems-level perspective by integrating genome-wide association studies (GWASs), a human protein-protein interaction (PPI) network, and gene expression to identify gene modules contributing to osteoporosis risk. Methods: First we searched for modules significantly enriched with bone mineral density (BMD)-associated genes in the human PPI network by using 2 large meta-analysis GWAS datasets through a dense module search algorithm. One included 7 individual GWAS samples (Meta7). The other was from the Genetic Factors for Osteoporosis Consortium (GEFOS2). One was assigned as a discovery dataset and the other as an evaluation dataset, and vice versa. Results: In total, 42 modules and 129 modules were identified as significant in both the Meta7 and GEFOS2 datasets for femoral neck and spine BMD, respectively. For hip BMD, 3340 modules were identified in Meta7 only. As candidate modules, they were assessed for their biological relevance to BMD by gene set enrichment analysis in 2 expression profiles generated from circulating monocytes in subjects with low versus high BMD values. Interestingly, 2 modules were significantly enriched in monocytes from the low BMD group in both gene expression datasets (nominal P value <.05). These 2 modules comprised 16 nonredundant genes. Functional enrichment analysis revealed that both modules were enriched for genes involved in Wnt receptor signaling and osteoblast differentiation. Conclusion: We highlighted 2 modules and novel genes playing important roles in the regulation of bone mass, providing important clues for therapeutic approaches for osteoporosis. PMID:25119315

  16. Region effects influence local tree species diversity.

    PubMed

    Ricklefs, Robert E; He, Fangliang

    2016-01-19

    Global patterns of biodiversity reflect both regional and local processes, but the relative importance of local ecological limits to species coexistence, as influenced by the physical environment, in contrast to regional processes including species production, dispersal, and extinction, is poorly understood. Failure to distinguish regional influences from local effects has been due, in part, to sampling limitations at small scales, environmental heterogeneity within local or regional samples, and incomplete geographic sampling of species. Here, we use a global dataset comprising 47 forest plots to demonstrate significant region effects on diversity, beyond the influence of local climate, which together explain more than 92% of the global variation in local forest tree species richness. Significant region effects imply that large-scale processes shaping the regional diversity of forest trees exert influence down to the local scale, where they interact with local processes to determine the number of coexisting species.

  17. Visualizing the ground motions of the 1906 San Francisco earthquake

    USGS Publications Warehouse

    Chourasia, A.; Cutchin, S.; Aagaard, Brad T.

    2008-01-01

    With advances in computational capabilities and refinement of seismic wave-propagation models in the past decade, large three-dimensional simulations of earthquake ground motion have become possible. The resulting datasets from these simulations are multivariate, temporal, and multi-terabyte in size. Past visual representations of results from seismic studies have been largely confined to static two-dimensional maps. New visual representations provide scientists with alternate ways of viewing and interacting with these results, potentially leading to new and significant insight into the physical phenomena. Visualizations can also be used for pedagogic and general dissemination purposes. We present a workflow for visual representation of the data from a ground motion simulation of the great 1906 San Francisco earthquake. We have employed state-of-the-art animation tools for visualization of the ground motions with a high degree of accuracy and visual realism. © 2008 Elsevier Ltd.

  18. Social cohesion and self-rated health: The moderating effect of neighborhood physical disorder.

    PubMed

    Bjornstrom, Eileen E S; Ralston, Margaret L; Kuhl, Danielle C

    2013-12-01

    Using data from the Los Angeles Family and Neighborhood Survey and its companion datasets, we examined how neighborhood disorder, perceived danger and both individually perceived and contextually measured neighborhood social cohesion are associated with self-rated health. Results indicate that neighborhood disorder is negatively associated with health and the relationship is explained by perceived cohesion and danger, which are both also significant predictors of health. Further, individually perceived cohesion emerges as a more important explanation of self-rated health than neighborhood-level social cohesion. Finally, neighborhood disorder and perceived cohesion interact to influence health, such that cohesion is especially beneficial when residents live in neighborhoods characterized by low to moderate disorder; once disorder is at high levels, cohesion no longer offers protection against poor health. We interpret our findings as they relate to prior research on neighborhoods, psychosocial processes, and health, and discuss their implications for intervention efforts that address disorder in urban communities.

  19. Signal analysis of accelerometry data using gravity-based modeling

    NASA Astrophysics Data System (ADS)

    Davey, Neil P.; James, Daniel A.; Anderson, Megan E.

    2004-03-01

    Triaxial accelerometers have been used to measure human movement parameters in swimming. Interpretation of the data is difficult due to interference sources, including the interaction of external bodies. In this investigation the authors developed a model to simulate the physical movement of the lower back. Theoretical accelerometry outputs were derived, giving an ideal, or noiseless, dataset. An experimental data collection apparatus was developed by adapting a system to the aquatic environment for investigation of swimming. Model data were compared against recorded data and showed strong correlation. Comparison of recorded and modeled data can be used to identify changes in body movement; this is especially useful when cyclic patterns are present in the activity. Strong correlations between the datasets allowed development of signal processing algorithms for swimming stroke analysis, first on the pure noiseless dataset and then applied to performance data. Video analysis was also used to validate study results and has shown potential to provide acceptable results.
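    Comparing modeled against recorded accelerometer traces, as above, typically reduces to a correlation measure. A minimal Pearson-correlation sketch (illustrative, not the authors' code):

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length signals, e.g. a
    modeled and a recorded accelerometer trace; 1.0 means perfect
    linear agreement."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```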

  20. Dynamic Server-Based KML Code Generator Method for Level-of-Detail Traversal of Geospatial Data

    NASA Technical Reports Server (NTRS)

    Baxes, Gregory; Mixon, Brian; Linger, TIm

    2013-01-01

    Web-based geospatial client applications such as Google Earth and NASA World Wind must listen to data requests, access appropriate stored data, and compile a data response to the requesting client application. This process occurs repeatedly to support multiple client requests and application instances. Newer Web-based geospatial clients also provide user-interactive functionality that is dependent on fast and efficient server responses. With massively large datasets, server-client interaction can become severely impeded because the server must determine the best way to assemble data to meet the client application's request. In client applications such as Google Earth, the user interactively wanders through the data using visually guided panning and zooming actions. With these actions, the client application is continually issuing data requests to the server without knowledge of the server's data structure or extraction/assembly paradigm. A method for efficiently controlling the networked access of a Web-based geospatial browser to server-based datasets, in particular massively sized datasets, has been developed. The method specifically uses the Keyhole Markup Language (KML), an Open Geospatial Consortium (OGC) standard used by Google Earth and other KML-compliant geospatial client applications. The innovation is based on establishing a dynamic cascading KML strategy that is initiated by a KML launch file provided by a data server host to a Google Earth or similar KML-compliant geospatial client application user. Upon execution, the launch KML code issues a request for image data covering an initial geographic region. The server responds with the requested data along with subsequent dynamically generated KML code that directs the client application to make follow-on requests for higher level-of-detail (LOD) imagery to replace the initial imagery as the user navigates into the dataset.
    The approach provides an efficient data traversal path and mechanism that can be flexibly established for any dataset regardless of size or other characteristics. The method yields significant improvements in user-interactive geospatial client and data server interaction and in the associated network bandwidth requirements. The innovation uses a C- or PHP-code-like grammar that provides a high degree of processing flexibility. A set of language lexer and parser elements is provided that offers a complete language grammar for writing and executing language directives. A script is wrapped and passed to the geospatial data server by a client application as a component of a standard KML-compliant statement. The approach provides an efficient means for a geospatial client application to request server preprocessing of data prior to client delivery. Data is structured in a quadtree format. As the user zooms into the dataset, geographic regions are subdivided into four child regions. Conversely, as the user zooms out, four child regions collapse into a single, lower-LOD region.
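    The quadtree traversal described above can be sketched as a simple bounding-box subdivision: each zoom-in replaces a region with its four children (the function and tuple layout here are illustrative, not the actual server code):

```python
def subdivide(region):
    """Split a (west, south, east, north) bounding box into its four
    child quadrants, as in a quadtree level-of-detail scheme."""
    w, s, e, n = region
    mx, my = (w + e) / 2, (s + n) / 2
    return [
        (w, s, mx, my),   # south-west child
        (mx, s, e, my),   # south-east child
        (w, my, mx, n),   # north-west child
        (mx, my, e, n),   # north-east child
    ]
```

    Zooming out is the inverse: the four children collapse back into the single parent box.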

  1. OpenSHS: Open Smart Home Simulator.

    PubMed

    Alshammari, Nasser; Alshammari, Talal; Sedky, Mohamed; Champion, Justin; Bauer, Carolin

    2017-05-02

    This paper develops a new hybrid, open-source, cross-platform 3D smart home simulator, OpenSHS, for dataset generation. OpenSHS offers an opportunity for researchers in the field of the Internet of Things (IoT) and machine learning to test and evaluate their models. Following a hybrid approach, OpenSHS combines advantages from both interactive and model-based approaches. This approach reduces the time and effort required to generate simulated smart home datasets. We have designed a replication algorithm for extending and expanding a dataset. A small sample dataset produced by OpenSHS can be extended without affecting the logical order of the events. Replication provides a solution for generating large, representative smart home datasets. We have built an extensible library of smart devices that facilitates the simulation of current and future smart home environments. Our tool divides the dataset generation process into three distinct phases: first, design, in which the researcher designs the initial virtual environment by building the home, importing smart devices, and creating contexts; second, simulation, in which the participant simulates his/her context-specific events; and third, aggregation, in which the researcher applies the replication algorithm to generate the final dataset. We conducted a study to assess the ease of use of our tool on the System Usability Scale (SUS).

  2. OpenSHS: Open Smart Home Simulator

    PubMed Central

    Alshammari, Nasser; Alshammari, Talal; Sedky, Mohamed; Champion, Justin; Bauer, Carolin

    2017-01-01

    This paper develops a new hybrid, open-source, cross-platform 3D smart home simulator, OpenSHS, for dataset generation. OpenSHS offers an opportunity for researchers in the field of the Internet of Things (IoT) and machine learning to test and evaluate their models. Following a hybrid approach, OpenSHS combines advantages from both interactive and model-based approaches. This approach reduces the time and effort required to generate simulated smart home datasets. We have designed a replication algorithm for extending and expanding a dataset. A small sample dataset produced by OpenSHS can be extended without affecting the logical order of the events. Replication provides a solution for generating large, representative smart home datasets. We have built an extensible library of smart devices that facilitates the simulation of current and future smart home environments. Our tool divides the dataset generation process into three distinct phases: first, design, in which the researcher designs the initial virtual environment by building the home, importing smart devices, and creating contexts; second, simulation, in which the participant simulates his/her context-specific events; and third, aggregation, in which the researcher applies the replication algorithm to generate the final dataset. We conducted a study to assess the ease of use of our tool on the System Usability Scale (SUS). PMID:28468330

  3. The iPlant Collaborative: Cyberinfrastructure for Plant Biology.

    PubMed

    Goff, Stephen A; Vaughn, Matthew; McKay, Sheldon; Lyons, Eric; Stapleton, Ann E; Gessler, Damian; Matasci, Naim; Wang, Liya; Hanlon, Matthew; Lenards, Andrew; Muir, Andy; Merchant, Nirav; Lowry, Sonya; Mock, Stephen; Helmke, Matthew; Kubach, Adam; Narro, Martha; Hopkins, Nicole; Micklos, David; Hilgert, Uwe; Gonzales, Michael; Jordan, Chris; Skidmore, Edwin; Dooley, Rion; Cazes, John; McLay, Robert; Lu, Zhenyuan; Pasternak, Shiran; Koesterke, Lars; Piel, William H; Grene, Ruth; Noutsos, Christos; Gendler, Karla; Feng, Xin; Tang, Chunlao; Lent, Monica; Kim, Seung-Jin; Kvilekval, Kristian; Manjunath, B S; Tannen, Val; Stamatakis, Alexandros; Sanderson, Michael; Welch, Stephen M; Cranston, Karen A; Soltis, Pamela; Soltis, Doug; O'Meara, Brian; Ane, Cecile; Brutnell, Tom; Kleibenstein, Daniel J; White, Jeffery W; Leebens-Mack, James; Donoghue, Michael J; Spalding, Edgar P; Vision, Todd J; Myers, Christopher R; Lowenthal, David; Enquist, Brian J; Boyle, Brad; Akoglu, Ali; Andrews, Greg; Ram, Sudha; Ware, Doreen; Stein, Lincoln; Stanzione, Dan

    2011-01-01

    The iPlant Collaborative (iPlant) is a United States National Science Foundation (NSF) funded project that aims to create an innovative, comprehensive, and foundational cyberinfrastructure in support of plant biology research (PSCIC, 2006). iPlant is developing cyberinfrastructure that uniquely enables scientists throughout the diverse fields that comprise plant biology to address Grand Challenges in new ways, to stimulate and facilitate cross-disciplinary research, to promote biology and computer science research interactions, and to train the next generation of scientists on the use of cyberinfrastructure in research and education. Meeting humanity's projected demands for agricultural and forest products and the expectation that natural ecosystems be managed sustainably will require synergies from the application of information technologies. The iPlant cyberinfrastructure design is based on an unprecedented period of research community input, and leverages developments in high-performance computing, data storage, and cyberinfrastructure for the physical sciences. iPlant is an open-source project with application programming interfaces that allow the community to extend the infrastructure to meet its needs. iPlant is sponsoring community-driven workshops addressing specific scientific questions via analysis tool integration and hypothesis testing. These workshops teach researchers how to add bioinformatics tools and/or datasets into the iPlant cyberinfrastructure enabling plant scientists to perform complex analyses on large datasets without the need to master the command-line or high-performance computational services.

  4. The iPlant Collaborative: Cyberinfrastructure for Plant Biology

    PubMed Central

    Goff, Stephen A.; Vaughn, Matthew; McKay, Sheldon; Lyons, Eric; Stapleton, Ann E.; Gessler, Damian; Matasci, Naim; Wang, Liya; Hanlon, Matthew; Lenards, Andrew; Muir, Andy; Merchant, Nirav; Lowry, Sonya; Mock, Stephen; Helmke, Matthew; Kubach, Adam; Narro, Martha; Hopkins, Nicole; Micklos, David; Hilgert, Uwe; Gonzales, Michael; Jordan, Chris; Skidmore, Edwin; Dooley, Rion; Cazes, John; McLay, Robert; Lu, Zhenyuan; Pasternak, Shiran; Koesterke, Lars; Piel, William H.; Grene, Ruth; Noutsos, Christos; Gendler, Karla; Feng, Xin; Tang, Chunlao; Lent, Monica; Kim, Seung-Jin; Kvilekval, Kristian; Manjunath, B. S.; Tannen, Val; Stamatakis, Alexandros; Sanderson, Michael; Welch, Stephen M.; Cranston, Karen A.; Soltis, Pamela; Soltis, Doug; O'Meara, Brian; Ane, Cecile; Brutnell, Tom; Kleibenstein, Daniel J.; White, Jeffery W.; Leebens-Mack, James; Donoghue, Michael J.; Spalding, Edgar P.; Vision, Todd J.; Myers, Christopher R.; Lowenthal, David; Enquist, Brian J.; Boyle, Brad; Akoglu, Ali; Andrews, Greg; Ram, Sudha; Ware, Doreen; Stein, Lincoln; Stanzione, Dan

    2011-01-01

    The iPlant Collaborative (iPlant) is a United States National Science Foundation (NSF) funded project that aims to create an innovative, comprehensive, and foundational cyberinfrastructure in support of plant biology research (PSCIC, 2006). iPlant is developing cyberinfrastructure that uniquely enables scientists throughout the diverse fields that comprise plant biology to address Grand Challenges in new ways, to stimulate and facilitate cross-disciplinary research, to promote biology and computer science research interactions, and to train the next generation of scientists on the use of cyberinfrastructure in research and education. Meeting humanity's projected demands for agricultural and forest products and the expectation that natural ecosystems be managed sustainably will require synergies from the application of information technologies. The iPlant cyberinfrastructure design is based on an unprecedented period of research community input, and leverages developments in high-performance computing, data storage, and cyberinfrastructure for the physical sciences. iPlant is an open-source project with application programming interfaces that allow the community to extend the infrastructure to meet its needs. iPlant is sponsoring community-driven workshops addressing specific scientific questions via analysis tool integration and hypothesis testing. These workshops teach researchers how to add bioinformatics tools and/or datasets into the iPlant cyberinfrastructure enabling plant scientists to perform complex analyses on large datasets without the need to master the command-line or high-performance computational services. PMID:22645531

  5. Psychometric properties and a latent class analysis of the 12-item World Health Organization Disability Assessment Schedule 2.0 (WHODAS 2.0) in a pooled dataset of community samples.

    PubMed

    MacLeod, Melissa A; Tremblay, Paul F; Graham, Kathryn; Bernards, Sharon; Rehm, Jürgen; Wells, Samantha

    2016-12-01

    The 12-item World Health Organization Disability Assessment Schedule 2.0 (WHODAS 2.0) is a brief measurement tool used cross-culturally to capture the multi-dimensional nature of disablement across six domains: understanding and interacting with the world; moving and getting around; self-care; getting on with people; life activities; and participation in society. Previous psychometric research supports that the WHODAS 2.0 functions as a general factor of disablement. In a pooled dataset from community samples of adults (N = 447), we used confirmatory factor analysis to confirm a one-factor structure. Latent class analysis was used to identify subgroups of individuals based on their patterns of responses. We identified four distinct classes, or patterns of disablement: (1) pervasive disability; (2) physical disability; (3) emotional, cognitive, or interpersonal disability; (4) no/low disability. Convergent validity of the latent class subgroups was found with respect to socio-demographic characteristics, number of days affected by disabilities, stress, mental health, and substance use. These classes offer a simple and meaningful way to classify people with disabilities based on the 12-item WHODAS 2.0. Focusing on individuals with a high probability of being in the first three classes may help guide interventions. Copyright © 2016 John Wiley & Sons, Ltd.

  6. The PO.DAAC Portal and its use of the Drupal Framework

    NASA Astrophysics Data System (ADS)

    Alarcon, C.; Huang, T.; Bingham, A.; Cosic, S.

    2011-12-01

    The Physical Oceanography Distributed Active Archive Center portal (http://podaac.jpl.nasa.gov) is the primary interface for discovering and accessing oceanographic datasets collected from the vantage point of space. In addition, it provides information about NASA's satellite missions and operational activities at the data center. Recently the portal underwent a major redesign and deployment utilizing the Drupal framework. The Drupal framework was chosen as the platform for the portal due to its flexibility, open source community, and modular infrastructure. The portal features efficient content addition and management, mailing lists, forums, role based access control, and a faceted dataset browse capability. The dataset browsing was built as a custom Drupal module and integrates with a SOLR search engine.

  7. A maximum likelihood analysis of the CoGeNT public dataset

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kelso, Chris, E-mail: ckelso@unf.edu

    The CoGeNT detector, located in the Soudan Underground Laboratory in Northern Minnesota, consists of a 475 gram (330 gram fiducial mass) p-type point-contact germanium detector that measures the ionization charge created by nuclear recoils. This detector has searched for recoils created by dark matter since December 2009. We analyze the public dataset from the CoGeNT experiment to search for evidence of dark matter interactions with the detector. We perform an unbinned maximum likelihood fit to the data and compare the significance of different WIMP hypotheses relative to each other and to the null hypothesis of no WIMP interactions. This work presents the current status of the analysis.
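    An unbinned maximum likelihood fit evaluates a probability density at each observed recoil energy and sums the logarithms, rather than binning the spectrum first. A toy sketch with a flat background plus an exponential "signal" component (the model shape, energy window, and parameters are illustrative, not CoGeNT's actual likelihood):

```python
import math

def log_likelihood(energies, signal_frac, tau, e_min=0.5, e_max=3.0):
    """Unbinned log-likelihood for a toy mixture model: an exponential
    signal spectrum with decay constant `tau` plus a flat background,
    both normalized on [e_min, e_max] (units and values illustrative)."""
    norm_sig = tau * (math.exp(-e_min / tau) - math.exp(-e_max / tau))
    norm_bkg = e_max - e_min
    total = 0.0
    for e in energies:
        p_sig = math.exp(-e / tau) / norm_sig
        p_bkg = 1.0 / norm_bkg
        # Mixture density, one log term per observed event.
        total += math.log(signal_frac * p_sig + (1 - signal_frac) * p_bkg)
    return total
```

    Comparing the maximized log-likelihood at `signal_frac > 0` against the null hypothesis `signal_frac = 0` gives the relative significance of a WIMP hypothesis.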

  8. MaGnET: Malaria Genome Exploration Tool.

    PubMed

    Sharman, Joanna L; Gerloff, Dietlind L

    2013-09-15

    The Malaria Genome Exploration Tool (MaGnET) is a software tool enabling intuitive 'exploration-style' visualization of functional genomics data relating to the malaria parasite, Plasmodium falciparum. MaGnET provides innovative integrated graphic displays for different datasets, including genomic location of genes, mRNA expression data, protein-protein interactions and more. Any selection of genes to explore made by the user is easily carried over between the different viewers for different datasets, and can be changed interactively at any point (without returning to a search). Free online use (Java Web Start) or download (Java application archive and MySQL database; requires local MySQL installation) at http://malariagenomeexplorer.org . Contact: joanna.sharman@ed.ac.uk or dgerloff@ffame.org . Supplementary data are available at Bioinformatics online.

  9. TableViewer for Herschel Data Processing

    NASA Astrophysics Data System (ADS)

    Zhang, L.; Schulz, B.

    2006-07-01

    The TableViewer utility is a GUI tool written in Java to support interactive data processing and analysis for the Herschel Space Observatory (Pilbratt et al. 2001). The idea was inherited from a prototype written in IDL (Schulz et al. 2005). It allows the user to graphically view and analyze tabular data organized in columns with equal numbers of rows. It can be run either as a standalone application, where data access is restricted to FITS (FITS 1999) files only, or from the Quick Look Analysis (QLA) or Interactive Analysis (IA) command line, from where objects are also accessible. The graphic display is very versatile, allowing plots in either linear or log scales. Zooming, panning, and changing data columns are performed rapidly using a group of navigation buttons. Selecting and de-selecting fields of data points controls the input to simple analysis tasks such as building a statistics table or generating power spectra. The binary data stored in a TableDataset, a Product, or FITS files can also be displayed as tabular data, where values in individual cells can be modified. TableViewer provides several processing utilities which, besides calculating statistics for all or selected channels and computing power spectra, allow the user to convert or repair datasets by changing the unit names of data columns and by modifying data values with a simple calculator tool. Interactively selected data can be separated out, and modified datasets can be saved to FITS files. The tool will be very helpful especially in the early phases of Herschel data analysis, when quick access to the contents of data products is important. TableDataset and Product are Java classes defined in herschel.ia.dataset.

  10. Interactive Scripting for Analysis and Visualization of Arbitrarily Large, Disparately Located Climate Data Ensembles Using a Progressive Runtime Server

    NASA Astrophysics Data System (ADS)

    Christensen, C.; Summa, B.; Scorzelli, G.; Lee, J. W.; Venkat, A.; Bremer, P. T.; Pascucci, V.

    2017-12-01

    Massive datasets are becoming more common due to increasingly detailed simulations and higher resolution acquisition devices. Yet accessing and processing these huge data collections for scientific analysis is still a significant challenge. Solutions that rely on extensive data transfers are increasingly untenable and often impossible due to lack of sufficient storage at the client side as well as insufficient bandwidth to conduct such large transfers, which in some cases could entail petabytes of data. Large-scale remote computing resources can be useful, but utilizing such systems typically entails some form of offline batch processing with long delays, data replications, and substantial cost for any mistakes. Both types of workflows can severely limit the flexible exploration and rapid evaluation of new hypotheses that are crucial to the scientific process and thereby impede scientific discovery. In order to facilitate interactivity in both analysis and visualization of these massive data ensembles, we introduce a dynamic runtime system suitable for progressive computation and interactive visualization of arbitrarily large, disparately located spatiotemporal datasets. Our system includes an embedded domain-specific language (EDSL) that allows users to express a wide range of data analysis operations in a simple and abstract manner. The underlying runtime system transparently resolves issues such as remote data access and resampling while at the same time maintaining interactivity through progressive and interruptible processing. Computations involving large amounts of data can be performed remotely in an incremental fashion that dramatically reduces data movement, while the client receives updates progressively, thereby remaining robust to fluctuating network latency or limited bandwidth. This system facilitates interactive, incremental analysis and visualization of massive remote datasets up to petabytes in size.
Our system is now available for general use in the community through both docker and anaconda.
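    The progressive, interruptible processing described above can be illustrated with a minimal sketch (hypothetical function name; the actual system uses an embedded DSL and a remote runtime): a computation that yields a refined estimate after each data chunk, so a client can render intermediate results and stop pulling at any point.

    ```python
    def progressive_mean(chunks):
        """Yield a running mean after each chunk arrives, so a client
        can display intermediate results and interrupt at any point."""
        total, count = 0.0, 0
        for chunk in chunks:
            total += sum(chunk)
            count += len(chunk)
            yield total / count
    ```

    Because the client drives the generator, slow or lossy networks simply slow the refinement rate rather than blocking the interface.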

  11. Integrating environmental covariates and crop modeling into the genomic selection framework to predict genotype by environment interactions.

    PubMed

    Heslot, Nicolas; Akdemir, Deniz; Sorrells, Mark E; Jannink, Jean-Luc

    2014-02-01

    Development of models to predict genotype by environment interactions, in unobserved environments, using environmental covariates, a crop model and genomic selection. Application to a large winter wheat dataset. Genotype by environment interaction (G*E) is one of the key issues when analyzing phenotypes. The use of environment data to model G*E has long been a subject of interest but is limited by the same problems as those addressed by genomic selection methods: a large number of correlated predictors each explaining a small amount of the total variance. In addition, non-linear responses of genotypes to stresses are expected to further complicate the analysis. Using a crop model to derive stress covariates from daily weather data for predicted crop development stages, we propose an extension of the factorial regression model to genomic selection. This model is further extended to the marker level, enabling the modeling of quantitative trait loci (QTL) by environment interaction (Q*E), on a genome-wide scale. A newly developed ensemble method, soft rule fit, was used to improve this model and capture non-linear responses of QTL to stresses. The method is tested using a large winter wheat dataset, representative of the type of data available in a large-scale commercial breeding program. Accuracy in predicting genotype performance in unobserved environments for which weather data were available increased by 11.1% on average and the variability in prediction accuracy decreased by 10.8%. By leveraging agronomic knowledge and the large historical datasets generated by breeding programs, this new model provides insight into the genetic architecture of genotype by environment interactions and could predict genotype performance based on past and future weather scenarios.
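    The factorial-regression idea above, regressing genotype performance on crop-model-derived stress covariates and then predicting performance in unobserved environments, can be sketched in miniature (illustrative only; the paper's actual model works genome-wide with many covariates and an ensemble method):

    ```python
    def fit_sensitivity(phenotypes, stress):
        """Least-squares slope of one genotype's phenotype against a
        crop-model-derived stress covariate (one factorial-regression term)."""
        n = len(stress)
        mx = sum(stress) / n
        my = sum(phenotypes) / n
        sxy = sum((x - mx) * (y - my) for x, y in zip(stress, phenotypes))
        sxx = sum((x - mx) ** 2 for x in stress)
        slope = sxy / sxx
        intercept = my - slope * mx
        return intercept, slope

    def predict(intercept, slope, stress_new):
        """Predict genotype performance in an unobserved environment
        from its forecast stress covariate."""
        return intercept + slope * stress_new
    ```

    The full model extends this to many correlated covariates and to marker-level (Q*E) terms, which is why shrinkage methods from genomic selection are needed.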

  12. MPact: the MIPS protein interaction resource on yeast.

    PubMed

    Güldener, Ulrich; Münsterkötter, Martin; Oesterheld, Matthias; Pagel, Philipp; Ruepp, Andreas; Mewes, Hans-Werner; Stümpflen, Volker

    2006-01-01

    In recent years, the Munich Information Center for Protein Sequences (MIPS) yeast protein-protein interaction (PPI) dataset has been used in numerous analyses of protein networks and has been called a gold standard because of its quality and comprehensiveness [H. Yu, N. M. Luscombe, H. X. Lu, X. Zhu, Y. Xia, J. D. Han, N. Bertin, S. Chung, M. Vidal and M. Gerstein (2004) Genome Res., 14, 1107-1118]. MPact and the yeast protein localization catalog provide information related to the proximity of proteins in yeast. Besides the integration of high-throughput data, information about experimental evidence for PPIs in the literature was compiled by experts, adding up to 4300 distinct PPIs connecting 1500 proteins in yeast. As the interaction data are a complementary part of CYGD, interactive mapping of data onto other integrated data types such as the functional classification catalog [A. Ruepp, A. Zollner, D. Maier, K. Albermann, J. Hani, M. Mokrejs, I. Tetko, U. Güldener, G. Mannhaupt, M. Münsterkötter and H. W. Mewes (2004) Nucleic Acids Res., 32, 5539-5545] is possible. A survey of signaling proteins and a comparison with pathway data from KEGG demonstrate that an extensive overview of the complexity of this functional network in yeast can be obtained only on the basis of these manually annotated data. The implementation of a web-based PPI-analysis tool allows analysis and visualization of protein interaction networks and facilitates integration of our curated data with high-throughput datasets. The complete dataset as well as user-defined sub-networks can be retrieved easily in the standardized PSI-MI format. The resource can be accessed through http://mips.gsf.de/genre/proj/mpact.

  13. Intermolecular interactions in the condensed phase: Evaluation of semi-empirical quantum mechanical methods

    NASA Astrophysics Data System (ADS)

    Christensen, Anders S.; Kromann, Jimmy C.; Jensen, Jan H.; Cui, Qiang

    2017-10-01

    To facilitate further development of approximate quantum mechanical methods for condensed phase applications, we present a new benchmark dataset of intermolecular interaction energies in the solution phase for a set of 15 dimers, each containing one charged monomer. The reference interaction energy in solution is computed via a thermodynamic cycle that integrates dimer binding energy in the gas phase at the coupled cluster level and solute-solvent interaction with density functional theory; the estimated uncertainty of such calculated interaction energy is ±1.5 kcal/mol. The dataset is used to benchmark the performance of a set of semi-empirical quantum mechanical (SQM) methods that include DFTB3-D3, DFTB3/CPE-D3, OM2-D3, PM6-D3, PM6-D3H+, and PM7 as well as the HF-3c method. We find that while all tested SQM methods tend to underestimate binding energies in the gas phase with a root-mean-squared error (RMSE) of 2-5 kcal/mol, they overestimate binding energies in the solution phase with an RMSE of 3-4 kcal/mol, with the exception of DFTB3/CPE-D3 and OM2-D3, for which the systematic deviation is less pronounced. In addition, we find that HF-3c systematically overestimates binding energies in both gas and solution phases. As most approximate QM methods are parametrized and evaluated using data measured or calculated in the gas phase, the dataset represents an important first step toward calibrating QM based methods for application in the condensed phase where polarization and exchange repulsion need to be treated in a balanced fashion.

  14. Using routine clinical and administrative data to produce a dataset of attendances at Emergency Departments following self-harm.

    PubMed

    Polling, C; Tulloch, A; Banerjee, S; Cross, S; Dutta, R; Wood, D M; Dargan, P I; Hotopf, M

    2015-07-16

    Self-harm is a significant public health concern in the UK. This is reflected in the recent addition to the English Public Health Outcomes Framework of rates of attendance at Emergency Departments (EDs) following self-harm. However, there is currently no source of data with which to measure this outcome. Routinely available data on inpatient admissions following self-harm miss the majority of cases presenting to services. We aimed to investigate (i) whether a dataset of ED presentations could be produced using a combination of routinely collected clinical and administrative data and (ii) whether this dataset could be validated against another produced using methods similar to those used in previous studies. Using the Clinical Record Interactive Search system, the electronic health records (EHRs) used in four EDs were linked to Hospital Episode Statistics to create a dataset of attendances following self-harm. This dataset was compared with an audit dataset of ED attendances created by manual searching of ED records, and the proportion of total cases detected by each dataset was compared. There were 1932 attendances detected by the EHR dataset and 1906 by the audit. The EHR and audit datasets detected 77% and 76% of all attendances, respectively, and both detected 82% of individual patients. There were no differences in age, sex, ethnicity or marital status between those detected and those missed by the EHR method. Both datasets revealed more than double the number of self-harm incidents that could be identified from inpatient admission records. It was possible to use routinely collected EHR data to create a dataset of attendances at EDs following self-harm. The dataset detected the same proportion of attendances and individuals as the audit dataset, proved more comprehensive than the use of inpatient admission records, and did not show a systematic bias in the cases it missed.
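    The dataset comparison reported above amounts to measuring what fraction of all known attendances each source detects. A minimal sketch (hypothetical identifiers; not the authors' code):

    ```python
    def detection_rates(ehr_ids, audit_ids):
        """Proportion of all known attendances (union of both sources)
        detected by each dataset, as in the EHR-vs-audit comparison."""
        ehr, audit = set(ehr_ids), set(audit_ids)
        union = ehr | audit
        return len(ehr) / len(union), len(audit) / len(union)
    ```

    The same set operations give the cases missed by one source (`union - ehr`), which is what the demographic-bias check above examines.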

  15. Scalable, Secure Analysis of Social Sciences Data on the Azure Platform

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Simmhan, Yogesh; Deng, Litao; Kumbhare, Alok

    2012-05-07

    Human activity and interaction data are beginning to be collected at population scales through the pervasiveness of social media and the willingness of people to volunteer information. This can allow social science researchers to understand and model human behavior with better accuracy and predictive power. Political and social scientists are starting to correlate such large-scale social media datasets with events that impact society, as evidence abounds of the virtual and physical public spaces intersecting and influencing each other [1,2]. Managers of Cyber Physical Systems such as Smart Power Grid utilities are investigating the impact of consumer behavior on power consumption, and the possibility of influencing the usage profile [3]. Data collection is also made easier through technology such as mobile apps, social media sites and search engines that directly collect data, and sensors such as smart meters and room occupancy sensors that indirectly measure human activity. These technology platforms also provide a convenient framework for "human sensors" to record and broadcast data for behavioral studies, as a form of crowd-sourced citizen science. This has the added advantage of engaging the broader public in STEM activities and helping to influence public policy.

  16. Manipulation of volumetric patient data in a distributed virtual reality environment.

    PubMed

    Dech, F; Ai, Z; Silverstein, J C

    2001-01-01

    Due to increases in network speed and bandwidth, distributed exploration of medical data in immersive Virtual Reality (VR) environments is becoming increasingly feasible. The volumetric display of radiological data in such environments presents a unique set of challenges. The sheer size and complexity of the datasets involved not only make them difficult to transmit to remote sites, but these datasets also require extensive user interaction in order to make them understandable to the investigator and manageable to the rendering hardware. A sophisticated VR user interface is required in order for the clinician to focus on the aspects of the data that will provide educational and/or diagnostic insight. We will describe a software system of data acquisition, data display, Tele-Immersion, and data manipulation that supports interactive, collaborative investigation of large radiological datasets. The hardware required in this strategy is still at the high-end of the graphics workstation market. Future software ports to Linux and NT, along with the rapid development of PC graphics cards, open the possibility for later work with Linux or NT PCs and PC clusters.

  17. Ocean Carbon States: Data Mining in Observations and Numerical Simulations Results

    NASA Astrophysics Data System (ADS)

    Latto, R.; Romanou, A.

    2017-12-01

    Advanced data mining techniques are rapidly becoming widely used in Climate and Earth Sciences with the purpose of extracting new meaningful information from increasingly larger and more complex datasets. This is particularly important in studies of the global carbon cycle, where any lack of understanding of its combined physical and biogeochemical drivers is detrimental to our ability to accurately describe, understand, and predict CO2 concentrations and their changes in the major carbon reservoirs. The analysis presented here evaluates the use of cluster analysis as a means of identifying and comparing spatial and temporal patterns extracted from observational and model datasets. As the observational data are organized into various regimes, which we call "ocean carbon states", we gain insight into the physical and/or biogeochemical processes controlling the ocean carbon cycle as well as how well these processes are simulated by a state-of-the-art climate model. We find that cluster analysis effectively produces realistic, dynamic regimes that can be associated with specific processes at different temporal scales for both observations and the model. In addition, we show how these regimes can be used to illustrate and characterize biases in the model's air-sea flux of CO2. These biases are attributed to biases in salinity, sea surface temperature, wind speed, and nitrate, which are then used to identify the physical processes that are inaccurately reproduced by the model. In this presentation, we provide a proof-of-concept application using simple datasets, and we expand to more complex ones, using several physical and biogeochemical variable pairs, thus providing considerable insight into the mechanisms and phases of the ocean carbon cycle over different temporal and spatial scales.
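    As a rough illustration of the clustering step described above, here is a minimal k-means sketch on (x, y) variable pairs (illustrative only; the study's actual feature choices and clustering details are not specified here):

    ```python
    import random

    def kmeans(points, k, iters=50, seed=0):
        """Minimal k-means: partition (x, y) variable pairs into k regimes
        ('ocean carbon states') by nearest-centroid assignment."""
        rng = random.Random(seed)
        centroids = rng.sample(points, k)
        clusters = [[] for _ in range(k)]
        for _ in range(iters):
            clusters = [[] for _ in range(k)]
            for p in points:
                j = min(range(k), key=lambda c: (p[0] - centroids[c][0]) ** 2
                        + (p[1] - centroids[c][1]) ** 2)
                clusters[j].append(p)
            # Recompute each centroid as the mean of its cluster;
            # keep the old centroid if a cluster ends up empty.
            centroids = [(sum(p[0] for p in cl) / len(cl),
                          sum(p[1] for p in cl) / len(cl)) if cl else centroids[j]
                         for j, cl in enumerate(clusters)]
        return centroids, clusters
    ```

    Each resulting cluster is a candidate "regime" whose members can then be mapped back to their spatial and temporal locations for physical interpretation.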

  18. Prediction of AL and Dst Indices from ACE Measurements Using Hybrid Physics/Black-Box Techniques

    NASA Astrophysics Data System (ADS)

    Spencer, E.; Rao, A.; Horton, W.; Mays, L.

    2008-12-01

    ACE measurements of the solar wind velocity, IMF and proton density are used to drive a hybrid Physics/Black-Box model of the nightside magnetosphere. The core physics is contained in a low-order nonlinear dynamical model of the nightside magnetosphere called WINDMI. The model is augmented by wavelet-based nonlinear mappings between the solar wind quantities and the input into the physics model, followed by further wavelet-based mappings of the model output field-aligned currents onto the ground-based magnetometer measurements of the AL index and Dst index. The black-box mappings are introduced at the input stage to account for uncertainties in the way the solar wind quantities are transported from the ACE spacecraft at L1 to the magnetopause. Similar mappings are introduced at the output stage to account for a spatially and temporally varying westward auroral electrojet geometry. The parameters of the model are tuned using a genetic algorithm and trained using the large geomagnetic storm dataset of October 3-7, 2000. Its predictive performance is then evaluated on subsequent storm datasets, in particular the April 15-24, 2002 storm. This work is supported by grant NSF 7020201.

  19. 2D/3D fetal cardiac dataset segmentation using a deformable model.

    PubMed

    Dindoyal, Irving; Lambrou, Tryphon; Deng, Jing; Todd-Pokropek, Andrew

    2011-07-01

    To segment the fetal heart in order to facilitate the 3D assessment of the cardiac function and structure. Ultrasound acquisition typically results in drop-out artifacts of the chamber walls. The authors outline a level set deformable model to automatically delineate the small fetal cardiac chambers. The level set is penalized from growing into an adjacent cardiac compartment using a novel collision detection term. The region based model allows simultaneous segmentation of all four cardiac chambers from a user defined seed point placed in each chamber. The segmented boundaries are automatically penalized from intersecting at walls with signal dropout. Root mean square errors of the perpendicular distances between the algorithm's delineation and manual tracings are within 2 mm which is less than 10% of the length of a typical fetal heart. The ejection fractions were determined from the 3D datasets. We validate the algorithm using a physical phantom and obtain volumes that are comparable to those from physically determined means. The algorithm segments volumes with an error of within 13% as determined using a physical phantom. Our original work in fetal cardiac segmentation compares automatic and manual tracings to a physical phantom and also measures inter observer variation.

  20. A Standard for Sharing and Accessing Time Series Data: The Heliophysics Application Programmers Interface (HAPI) Specification

    NASA Astrophysics Data System (ADS)

    Vandegriff, J. D.; King, T. A.; Weigel, R. S.; Faden, J.; Roberts, D. A.; Harris, B. T.; Lal, N.; Boardsen, S. A.; Candey, R. M.; Lindholm, D. M.

    2017-12-01

    We present the Heliophysics Application Programmers Interface (HAPI), a new interface specification that both large and small data centers can use to expose time series data holdings in a standard way. HAPI was inspired by the similarity of existing services at many Heliophysics data centers, and these data centers have collaborated to define a single interface that captures best practices and represents what everyone considers the essential, lowest common denominator for basic data access. This low level access can serve as infrastructure to support greatly enhanced interoperability among analysis tools, with the goal being simplified analysis and comparison of data from any instrument, model, mission or data center. The three main services a HAPI server must perform are 1. list a catalog of datasets (one unique ID per dataset), 2. describe the content of one dataset (JSON metadata), and 3. retrieve numerical content for one dataset (stream the actual data). HAPI defines both the format of the query to the server, and the response from the server. The metadata is lightweight, focusing on use rather than discovery, and the data format is a streaming one, with Comma Separated Values (CSV) being required and binary or JSON streaming being optional. The HAPI specification is available at GitHub, where projects are also underway to develop reference implementation servers that data providers can adapt and use at their own sites. Also in the works are data analysis clients in multiple languages (IDL, Python, Matlab, and Java). Institutions which have agreed to adopt HAPI include Goddard (CDAWeb for data and CCMC for models), LASP at the University of Colorado Boulder, the Particles and Plasma Interactions node of the Planetary Data System (PPI/PDS) at UCLA, the Plasma Wave Group at the University of Iowa, the Space Sector at the Johns Hopkins Applied Physics Lab (APL), and the tsds.org site maintained at George Mason University. 
Over the next year, the adoption of a uniform way to access time series data is expected to significantly enhance interoperability within the Heliophysics data environment. https://github.com/hapi-server/data-specification
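    A HAPI data request is a URL built from a dataset ID and a time range, answered with a CSV stream. The sketch below follows the endpoint layout in the public HAPI specification, but the server URL and dataset ID are hypothetical:

    ```python
    import csv
    import io
    from urllib.parse import urlencode

    def hapi_data_url(server, dataset, start, stop):
        """Build a HAPI 'data' request URL (the third required service);
        the 'catalog' and 'info' endpoints are built analogously."""
        query = urlencode({"id": dataset, "time.min": start, "time.max": stop})
        return f"{server}/hapi/data?{query}"

    def parse_hapi_csv(text):
        """Parse the required CSV response: one record per line, time first."""
        return [row for row in csv.reader(io.StringIO(text))]
    ```

    Fetching the URL (e.g. with urllib) and feeding the body to `parse_hapi_csv` yields rows ready for plotting; the JSON `info` response describes each column's name and units.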

  1. Screening for High Conductivity/Low Viscosity Ionic Liquids Using Product Descriptors.

    PubMed

    Martin, Shawn; Pratt, Harry D; Anderson, Travis M

    2017-07-01

    We seek to optimize ionic liquids (ILs) for application to redox flow batteries. As part of this effort, we have developed a computational method for suggesting ILs with high conductivity and low viscosity. Since ILs consist of cation-anion pairs, we consider a method for treating ILs as pairs using product descriptors for QSPRs, a concept borrowed from the prediction of protein-protein interactions in bioinformatics. We demonstrate the method by predicting electrical conductivity, viscosity, and melting point on a dataset taken from the ILThermo database on June 18th, 2014. The dataset consists of 4,329 measurements taken from 165 ILs made up of 72 cations and 34 anions. We benchmark our QSPRs on the known values in the dataset, then extend our predictions to screen all 2,448 possible cation-anion pairs in the dataset. © 2017 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
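    The product-descriptor idea above, treating a cation-anion pair as a single object by combining the two ions' descriptor vectors, can be sketched as the set of all pairwise products (an illustrative sketch; the paper's exact descriptor construction may differ):

    ```python
    def product_descriptors(cation_desc, anion_desc):
        """Pair-level feature vector for an ionic liquid: all pairwise
        products of the cation's and anion's individual descriptors,
        so a QSPR model sees the cation-anion pair as one object."""
        return [c * a for c in cation_desc for a in anion_desc]
    ```

    Because the pair features are built from per-ion descriptors, every possible cation-anion combination can be screened, including pairs with no measured data.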

  2. EnviroAtlas - Austin, TX - Riparian Buffer Land Cover by Block Group

    EPA Pesticide Factsheets

    This EnviroAtlas dataset describes the percentage of forested, vegetated, and impervious land within 15- and 50-meters of hydrologically connected streams, rivers, and other water bodies within the EnviroAtlas community area. Forest is defined as Trees & Forest. Vegetated cover is defined as Trees & Forest and Grass & Herbaceous. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).

  3. EnviroAtlas - Austin, TX - Estimated Percent Green Space Along Walkable Roads

    EPA Pesticide Factsheets

    This EnviroAtlas dataset estimates green space along walkable roads. Green space within 25 meters of the road centerline is included and the percentage is based on the total area between street intersections. Green space provides valuable benefits to neighborhood residents and walkers by providing shade, improved aesthetics, and outdoor gathering spaces. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).

  4. EnviroAtlas - Austin, TX - Park Access by Block Group

    EPA Pesticide Factsheets

    This EnviroAtlas dataset shows the block group population that is within and beyond an easy walking distance (500m) of a park entrance. Park entrances were included in this analysis if they were within 5km of the EnviroAtlas community boundary. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).

  5. EnviroAtlas - Austin, TX - Historic Places by Census Block Group

    EPA Pesticide Factsheets

    This EnviroAtlas dataset portrays the total number of historic places located within each Census Block Group (CBG). The historic places data were compiled from the National Register of Historic Places, which provides official federal lists of districts, sites, buildings, structures and objects significant to American history, architecture, archeology, engineering, and culture. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).

  6. Screening for High Conductivity/Low Viscosity Ionic Liquids Using Product Descriptors

    DOE PAGES

    Martin, Shawn; Pratt, III, Harry D.; Anderson, Travis M.

    2017-02-21

    We seek to optimize ionic liquids (ILs) for application to redox flow batteries. As part of this effort, we have developed a computational method for suggesting ILs with high conductivity and low viscosity. Since ILs consist of cation-anion pairs, we consider a method for treating ILs as pairs using product descriptors for QSPRs, a concept borrowed from the prediction of protein-protein interactions in bioinformatics. We demonstrate the method by predicting electrical conductivity, viscosity, and melting point on a dataset taken from the ILThermo database on June 18th, 2014. The dataset consists of 4,329 measurements taken from 165 ILs made up of 72 cations and 34 anions. In conclusion, we benchmark our QSPRs on the known values in the dataset, then extend our predictions to screen all 2,448 possible cation-anion pairs in the dataset.

  7. Screening for High Conductivity/Low Viscosity Ionic Liquids Using Product Descriptors

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Martin, Shawn; Pratt, III, Harry D.; Anderson, Travis M.

    We seek to optimize ionic liquids (ILs) for application to redox flow batteries. As part of this effort, we have developed a computational method for suggesting ILs with high conductivity and low viscosity. Since ILs consist of cation-anion pairs, we consider a method for treating ILs as pairs using product descriptors for QSPRs, a concept borrowed from the prediction of protein-protein interactions in bioinformatics. We demonstrate the method by predicting electrical conductivity, viscosity, and melting point on a dataset taken from the ILThermo database on June 18th, 2014. The dataset consists of 4,329 measurements taken from 165 ILs made up of 72 cations and 34 anions. In conclusion, we benchmark our QSPRs on the known values in the dataset, then extend our predictions to screen all 2,448 possible cation-anion pairs in the dataset.

  8. Two is better than one: Physical interactions improve motor performance in humans

    NASA Astrophysics Data System (ADS)

    Ganesh, G.; Takagi, A.; Osu, R.; Yoshioka, T.; Kawato, M.; Burdet, E.

    2014-01-01

    How do physical interactions with others change our own motor behavior? Utilizing a novel motor learning paradigm in which the hands of two individuals are physically connected without their conscious awareness, we investigated how the interaction forces from a partner adapt the motor behavior of physically interacting humans. We observed the motor adaptations during physical interaction to be mutually beneficial, such that both the worse and the better of the interacting partners improve motor performance during and after interactive practice. We show that these benefits cannot be explained by multi-sensory integration by an individual, but require physical interaction with a reactive partner. Furthermore, the benefits are determined by both the interacting partner's performance and the similarity of the partner's behavior to one's own. Our results demonstrate the fundamental neural processes underlying human physical interactions and suggest advantages of interactive paradigms for sports training and physical rehabilitation.

  9. A micro X-ray computed tomography dataset of South African hermit crabs (Crustacea: Decapoda: Anomura: Paguroidea) containing scans of two rare specimens and three recently described species.

    PubMed

    Landschoff, Jannes; Du Plessis, Anton; Griffiths, Charles L

    2018-04-01

    Along with the conventional deposition of physical types at natural history museums, the deposition of 3-dimensional (3D) image data has been proposed for rare and valuable museum specimens, such as irreplaceable type material. Micro computed tomography (μCT) scan data of 5 hermit crab species from South Africa, including rare specimens and type material, depicted the main identification characteristics of calcified body parts. However, low image contrast, especially in larger (>50 mm total length) specimens, did not allow sufficient 3D reconstruction of weakly calcified and fine characteristics, such as soft tissue of the pleon, mouthparts, gills, and setation. Reconstructions of soft tissue were sometimes possible, depending on individual sample and scanning characteristics. The raw data of seven scans are publicly available for download from the GigaDB repository. Calcified body parts visualized from μCT data can aid taxonomic validation and provide an additional, virtual deposition of rare specimens. Whether a nondestructive, nonstaining μCT approach can also serve taxonomy by reconstructing soft tissue structures, microscopic spines, and setae depends on species characteristics. Within these limitations, the presented dataset can be used for future morphological studies. Our virtual specimens will be most valuable to taxonomists, who can download a digital avatar for 3D examination. At the same time, in the event of physical damage to or loss of the original physical specimen, this dataset serves as a vital insurance policy.

  10. Application of the Interacting Quantum Atoms Approach to the S66 and Ionic-Hydrogen-Bond Datasets for Noncovalent Interactions.

    PubMed

    Suárez, Dimas; Díaz, Natalia; Francisco, Evelio; Martín Pendás, Angel

    2018-04-17

    The interacting quantum atoms (IQA) method can assess, systematically and in great detail, the strength and physics of both covalent and noncovalent interactions. The lack of a pair density in density functional theory (DFT), which precludes the direct IQA decomposition of the characteristic exchange-correlation energy, has been recently overcome by means of a scaling technique, which can largely expand the applicability of the method. To better assess the utility of the augmented IQA methodology to derive quantum chemical decompositions at the atomic and molecular levels, we report the results of Hartree-Fock (HF) and DFT calculations on the complexes included in the S66 and the ionic H-bond databases of benchmark geometry and binding energies. For all structures, we perform single-point and geometry optimizations using HF and selected DFT methods with triple-ζ basis sets followed by full IQA calculations. Pairwise dispersion energies are accounted for by the D3 method. We analyze the goodness of the HF-D3 and DFT-D3 binding energies, the magnitude of numerical errors, the fragment and atomic distribution of formation energies, etc. It is shown that fragment-based IQA decomposes the formation energies in comparable terms to those of perturbative approaches and that the atomic IQA energies hold the promise of rigorously quantifying atomic and group energy contributions in larger biomolecular systems. © 2018 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.

  11. Diversity of gastrointestinal helminths in Dall's sheep and the negative association of the abomasal nematode, Marshallagia marshalli, with fitness indicators

    USDA-ARS?s Scientific Manuscript database

    Gastrointestinal helminths can have a detrimental effect on the fitness of wild ungulates. Arctic and Subarctic ecosystems are ideal for the study of host-parasite interactions due to the comparatively simple ecological interactions and limited confounding factors. We used a unique dataset collected...

  12. Collaborative volume visualization with applications to underwater acoustic signal processing

    NASA Astrophysics Data System (ADS)

    Jarvis, Susan; Shane, Richard T.

    2000-08-01

    Distributed collaborative visualization systems represent a technology whose time has come. Researchers at the Fraunhofer Center for Research in Computer Graphics have been working in the areas of collaborative environments and high-end visualization systems for several years. The medical application TeleInVivo is an example of a system that marries visualization and collaboration. With TeleInVivo, users can exchange and collaboratively interact with volumetric datasets in geographically distributed locations. Since the examination of many physical phenomena produces data that are naturally volumetric, the visualization frameworks used by TeleInVivo have been extended for non-medical applications. The system can now be made compatible with almost any dataset that can be expressed in terms of magnitudes within a 3D grid. Coupled with advances in telecommunications, telecollaborative visualization is now possible virtually anywhere. Expert data quality assurance and analysis can occur remotely and interactively without having to send all the experts into the field. Building upon this point-to-point concept of collaborative visualization, one can envision a larger pooling of resources to form a broad overview of a region of interest from the contributions of numerous distributed members.

  13. Brain-inspired cheminformatics of drug-target brain interactome, synthesis, and assay of TVP1022 derivatives.

    PubMed

    Romero-Durán, Francisco J; Alonso, Nerea; Yañez, Matilde; Caamaño, Olga; García-Mera, Xerardo; González-Díaz, Humberto

    2016-04-01

    The use of cheminformatics tools is gaining importance in the field of translational research from medicinal chemistry to neuropharmacology. In particular, we need them for the analysis of chemical information on large datasets of bioactive compounds. These compounds form large multi-target complex networks (drug-target interactome networks), resulting in a very challenging data analysis problem. Artificial neural network (ANN) algorithms may help us predict the interactions of drugs and targets in the CNS interactome. In this work, we trained different ANN models able to predict a large number of drug-target interactions. These models predict a dataset of thousands of interactions of central nervous system (CNS) drugs characterized by >30 different experimental measures in >400 different experimental protocols for >150 molecular and cellular targets present in 11 different organisms (including human). The model was able to classify cases of non-interacting vs. interacting drug-target pairs with satisfactory performance. A second aim focuses on two main directions: the synthesis and assay of new derivatives of TVP1022 (S-analogues of rasagiline) and the comparison with other rasagiline derivatives recently reported. Finally, we used the best of our models to predict drug-target interactions for the best newly synthesized compound against a large number of CNS protein targets. Copyright © 2015 Elsevier Ltd. All rights reserved.

  14. FUn: a framework for interactive visualizations of large, high-dimensional datasets on the web.

    PubMed

    Probst, Daniel; Reymond, Jean-Louis

    2018-04-15

    During the past decade, big data have become a major tool in scientific endeavors. Although statistical methods and algorithms are well-suited for analyzing and summarizing enormous amounts of data, the results do not allow for a visual inspection of the entire dataset. Current scientific software, including R packages and Python libraries such as ggplot2, matplotlib and plot.ly, does not support interactive visualizations of datasets exceeding 100 000 data points on the web. Other solutions enable the web-based visualization of big data only through data reduction or statistical representations. However, recent hardware developments, especially advancements in graphical processing units, allow for the rendering of millions of data points on a wide range of consumer hardware such as laptops, tablets and mobile phones. As with the challenges and opportunities brought to virtually every scientific field by big data, the visualization of and interaction with copious amounts of data are both demanding and hold great promise. Here we present FUn, a framework consisting of a client (Faerun) and server (Underdark) module, facilitating the creation of web-based, interactive 3D visualizations of large datasets and enabling record-level visual inspection. We also introduce a reference implementation providing access to SureChEMBL, a database containing patent information on more than 17 million chemical compounds. The source code and the most recent builds of Faerun and Underdark, Lore.js and the data preprocessing toolchain used in the reference implementation are available on the project website (http://doc.gdb.tools/fun/). daniel.probst@dcb.unibe.ch or jean-louis.reymond@dcb.unibe.ch.

  15. Active learning for clinical text classification: is it better than random sampling?

    PubMed

    Figueroa, Rosa L; Zeng-Treitler, Qing; Ngo, Long H; Goryachev, Sergey; Wiechmann, Eduardo P

    2012-01-01

    This study explores active learning algorithms as a way to reduce the requirements for large training sets in medical text classification tasks. Three existing active learning algorithms (distance-based (DIST), diversity-based (DIV), and a combination of both (CMB)) were used to classify text from five datasets. The performance of these algorithms was compared to that of passive learning on the five datasets. We then conducted a novel investigation of the interaction between dataset characteristics and the performance results. Classification accuracy and area under receiver operating characteristics (ROC) curves for each algorithm at different sample sizes were generated. The performance of active learning algorithms was compared with that of passive learning using a weighted mean of paired differences. To determine why the performance varies on different datasets, we measured the diversity and uncertainty of each dataset using relative entropy and correlated the results with the performance differences. The DIST and CMB algorithms performed better than passive learning. With a statistical significance level set at 0.05, DIST outperformed passive learning in all five datasets, while CMB was found to be better than passive learning in four datasets. We found strong correlations between the dataset diversity and the DIV performance, as well as the dataset uncertainty and the performance of the DIST algorithm. For medical text classification, appropriate active learning algorithms can yield performance comparable to that of passive learning with considerably smaller training sets. In particular, our results suggest that DIV performs better on data with higher diversity and DIST on data with lower uncertainty.
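As a concrete illustration of the distance-based (DIST) strategy evaluated above, here is a minimal sketch; the one-dimensional toy data and distance function are invented for illustration and are not the paper's clinical-text features.

```python
def distance_based_sampling(unlabeled, labeled, distance, batch_size):
    """DIST-style active learning: pick the unlabeled examples farthest
    from the current labeled set, on the assumption that they carry the
    most new information for the classifier."""
    def min_dist(x):
        # Distance to the nearest already-labeled example.
        return min(distance(x, y) for y in labeled)

    ranked = sorted(unlabeled, key=min_dist, reverse=True)
    return ranked[:batch_size]

# Toy 1-D feature space: labeled points cluster near 0, so DIST
# queries the far-away candidates first.
labeled = [0.0, 0.1, 0.2]
unlabeled = [0.15, 0.5, 3.0, 7.0]
picked = distance_based_sampling(unlabeled, labeled, lambda a, b: abs(a - b), 2)
# picked == [7.0, 3.0]
```

In a real text-classification setting the distance would be computed between document feature vectors (e.g. cosine distance over TF-IDF vectors) rather than scalars.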

  16. Active learning for clinical text classification: is it better than random sampling?

    PubMed Central

    Figueroa, Rosa L; Ngo, Long H; Goryachev, Sergey; Wiechmann, Eduardo P

    2012-01-01

    Objective This study explores active learning algorithms as a way to reduce the requirements for large training sets in medical text classification tasks. Design Three existing active learning algorithms (distance-based (DIST), diversity-based (DIV), and a combination of both (CMB)) were used to classify text from five datasets. The performance of these algorithms was compared to that of passive learning on the five datasets. We then conducted a novel investigation of the interaction between dataset characteristics and the performance results. Measurements Classification accuracy and area under receiver operating characteristics (ROC) curves for each algorithm at different sample sizes were generated. The performance of active learning algorithms was compared with that of passive learning using a weighted mean of paired differences. To determine why the performance varies on different datasets, we measured the diversity and uncertainty of each dataset using relative entropy and correlated the results with the performance differences. Results The DIST and CMB algorithms performed better than passive learning. With a statistical significance level set at 0.05, DIST outperformed passive learning in all five datasets, while CMB was found to be better than passive learning in four datasets. We found strong correlations between the dataset diversity and the DIV performance, as well as the dataset uncertainty and the performance of the DIST algorithm. Conclusion For medical text classification, appropriate active learning algorithms can yield performance comparable to that of passive learning with considerably smaller training sets. In particular, our results suggest that DIV performs better on data with higher diversity and DIST on data with lower uncertainty. PMID:22707743

  17. Building a better search engine for earth science data

    NASA Astrophysics Data System (ADS)

    Armstrong, E. M.; Yang, C. P.; Moroni, D. F.; McGibbney, L. J.; Jiang, Y.; Huang, T.; Greguska, F. R., III; Li, Y.; Finch, C. J.

    2017-12-01

    Free-text searching of earth science datasets has been implemented with varying degrees of success and completeness across the spectrum of the 12 NASA earth science data centers. At the JPL Physical Oceanography Distributed Active Archive Center (PO.DAAC), the search engine has been developed around the Solr/Lucene platform. Others have chosen other popular enterprise search platforms, such as Elasticsearch. Regardless, the default implementations of these search engines, leveraging factors such as dataset popularity, term frequency, and inverse document frequency, do not fully meet the needs of precise relevancy and ranking of earth science search results. For the PO.DAAC, this shortcoming has been identified for several years by its external User Working Group, which has issued several recommendations to improve the relevancy and discoverability of datasets related to remotely sensed sea surface temperature, ocean wind, waves, salinity, height, and gravity, which together comprise over 500 publicly available datasets. Recently, the PO.DAAC has teamed with an effort led by George Mason University to improve the search and relevancy ranking of oceanographic data via a simple search interface and powerful backend services called MUDROD (Mining and Utilizing Dataset Relevancy from Oceanographic Datasets to Improve Data Discovery), funded by the NASA AIST program. MUDROD has mined and utilized the combination of PO.DAAC earth science dataset metadata, usage metrics, and user feedback and search history to objectively extract relevance for improved data discovery and access. In addition to improved dataset relevance and ranking, the MUDROD search engine also returns recommendations for related datasets and related user queries. 
This presentation will report on the use cases that drove the architecture and development, and the success metrics and improvements in search precision and recall that MUDROD has demonstrated over the existing PO.DAAC search interfaces.
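The "default implementation" relevance that the abstract says falls short can be sketched as plain TF-IDF ranking. This toy version (hypothetical dataset titles, simplified whitespace tokenization, smoothed IDF) shows the kind of term-statistics-only scoring that Lucene-style engines apply before signals such as usage metrics or user feedback are added.

```python
import math
from collections import Counter

def tfidf_rank(query, docs):
    """Return document indices ranked by a plain TF-IDF score for the query."""
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]

    def idf(term):
        # Smoothed inverse document frequency of a query term.
        df = sum(1 for toks in tokenized if term in toks)
        return math.log((1 + n) / (1 + df)) + 1

    scores = []
    for i, toks in enumerate(tokenized):
        tf = Counter(toks)
        score = sum(tf[t] * idf(t) for t in query.lower().split())
        scores.append((score, -i))  # tie-break by original order
    return [-neg_i for _, neg_i in sorted(scores, reverse=True)]

# Hypothetical dataset titles, not real PO.DAAC records.
docs = [
    "sea surface temperature dataset level 4",
    "ocean wind speed dataset",
    "sea surface salinity climatology",
]
order = tfidf_rank("sea surface temperature", docs)
# order == [0, 2, 1]: the exact-topic title ranks first.
```

MUDROD's contribution, per the abstract, is precisely to re-rank such scores using mined metadata, usage metrics, and search history.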

  18. Monitoring and long-term assessment of the Mediterranean Sea physical state

    NASA Astrophysics Data System (ADS)

    Simoncelli, Simona; Fratianni, Claudia; Clementi, Emanuela; Drudi, Massimiliano; Pistoia, Jenny; Grandi, Alessandro; Del Rosso, Damiano

    2017-04-01

    The near-real-time monitoring and long-term assessment of the physical state of the ocean are crucial for the wide CMEMS user community, providing a continuous and up-to-date overview of key indicators computed from operational analysis and reanalysis datasets. This constitutes an operational warning system for particular events, stimulating research towards a deeper understanding of them and consequently increasing the uptake of CMEMS products. Ocean Monitoring Indicators (OMIs) of some Essential Ocean Variables have been identified and developed by the Mediterranean Monitoring and Forecasting Centre (MED-MFC) under the umbrella of the CMEMS MYP WG (Multi Year Products Working Group). These OMIs were first implemented operationally for the physical reanalysis products and then applied to the operational analysis product. Sea surface temperature, salinity, and height, as well as heat, water, and momentum fluxes at the air-sea interface, have been implemented operationally since the development of the reanalysis system as real-time monitoring of data production. Their consistency analysis against available observational products, or against budget values recognized in the literature, guarantees the high quality of the numerical dataset. The results of the reanalysis validation procedures have been published yearly since 2014 in the QUality Information Document available through the CMEMS catalogue (http://marine.copernicus.eu), together with the yearly dataset extension. New OMIs of the winter mixed layer depth, the eddy kinetic energy, and the heat content will be presented; in particular, we will analyze their time evolution and trends starting from 1987, and then focus on the recent period 2013-2016, when the reanalysis and analysis datasets overlap, to show their consistency despite their different system implementations (i.e., atmospheric forcing, wave coupling, nesting). 
Finally, the focus will be on the 2016 sea state and circulation of the Mediterranean Sea and its anomaly with respect to the climatological fields, in order to detect the peculiarities of 2016 early.

  19. Interacting with Petabytes of Earth Science Data using Jupyter Notebooks, IPython Widgets and Google Earth Engine

    NASA Astrophysics Data System (ADS)

    Erickson, T. A.; Granger, B.; Grout, J.; Corlay, S.

    2017-12-01

    The volume of Earth science data gathered from satellites, aircraft, drones, and field instruments continues to increase. For many scientific questions in the Earth sciences, managing this large volume of data is a barrier to progress, as it is difficult to explore and analyze large volumes of data using the traditional paradigm of downloading datasets to a local computer for analysis. Furthermore, methods for communicating Earth science algorithms that operate on large datasets in an easily understandable and reproducible way are needed. Here we describe a system for developing, interacting with, and sharing well-documented Earth science algorithms that combines existing software components: Jupyter Notebook: An open-source, web-based environment that supports documents combining code and computational results with text narrative, mathematics, images, and other media. These notebooks provide an environment for interactive exploration of data and development of well-documented algorithms. Jupyter Widgets / ipyleaflet: An architecture for creating interactive user interface controls (such as sliders, text boxes, etc.) in Jupyter Notebooks that communicate with Python code. This architecture includes a default set of UI controls (sliders, dropdowns, etc.) as well as APIs for building custom UI controls. The ipyleaflet project is one example that offers a custom interactive map control allowing a user to display and manipulate geographic data within the Jupyter Notebook. Google Earth Engine: A cloud-based geospatial analysis platform that provides access to petabytes of Earth science data via a Python API. The combination of Jupyter Notebooks, Jupyter Widgets, ipyleaflet, and Google Earth Engine makes it possible to explore and analyze massive Earth science datasets via a web browser, in an environment suitable for interactive exploration, teaching, and sharing. 
Using these environments can make Earth science analyses easier to understand and reproduce, which may increase the rate of scientific discoveries and the transition of discoveries into real-world impacts.

  20. Reconstructing the dark sector interaction with LISA

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cai, Rong-Gen; Yang, Tao; Tamanini, Nicola, E-mail: cairg@itp.ac.cn, E-mail: nicola.tamanini@cea.fr, E-mail: yangtao@itp.ac.cn

    We perform a forecast analysis of the ability of the LISA space-based interferometer to reconstruct the dark sector interaction using gravitational wave standard sirens at high redshift. We employ Gaussian process methods to reconstruct the distance-redshift relation in a model-independent way. We adopt simulated catalogues of standard sirens given by merging massive black hole binaries visible by LISA, with an electromagnetic counterpart detectable by future telescopes. The catalogues are based on three different astrophysical scenarios for the evolution of massive black hole mergers, following the semi-analytic model of E. Barausse, Mon. Not. Roy. Astron. Soc. 423 (2012) 2533. We first use these standard siren datasets to assess the potential of LISA in reconstructing a possible interaction between vacuum dark energy and dark matter. Then we combine the LISA cosmological data with supernovae data simulated for the Dark Energy Survey. We consider two scenarios distinguished by the time duration of the LISA mission: 5 and 10 years. Using only LISA standard siren data, the dark sector interaction can be well reconstructed from redshift z ∼1 to z ∼3 (for a 5-year mission) and from z ∼1 up to z ∼5 (for a 10-year mission), though the reconstruction is inefficient at lower redshift. When combined with the DES datasets, the interaction is well reconstructed in the whole redshift region from z ∼0 to z ∼3 (5 yr) and from z ∼0 to z ∼5 (10 yr), respectively. Massive black hole binary standard sirens can thus be used to constrain the dark sector interaction at redshift ranges not reachable by the usual supernovae datasets, which probe only the z ∼< 1.5 range. Gravitational wave standard sirens will not only constitute a complementary and alternative way, with respect to familiar electromagnetic observations, to probe the cosmic expansion, but will also provide new tests to constrain possible deviations from the standard ΛCDM dynamics, especially at high redshift.

  1. A Bayesian network approach to predicting nest presence of the federally-threatened piping plover (Charadrius melodus) using barrier island features

    USGS Publications Warehouse

    Gieder, Katherina D.; Karpanty, Sarah M.; Fraser, James D.; Catlin, Daniel H.; Gutierrez, Benjamin T.; Plant, Nathaniel G.; Turecek, Aaron M.; Thieler, E. Robert

    2014-01-01

    Sea-level rise and human development pose significant threats to shorebirds, particularly for species that utilize barrier island habitat. The piping plover (Charadrius melodus) is a federally-listed shorebird that nests on barrier islands and rapidly responds to changes in its physical environment, making it an excellent species with which to model how shorebird species may respond to habitat change related to sea-level rise and human development. The uncertainty and complexity in predicting sea-level rise, the responses of barrier island habitats to sea-level rise, and the responses of species to sea-level rise and human development necessitate a modelling approach that can link species to the physical habitat features that will be altered by changes in sea level and human development. We used a Bayesian network framework to develop a model that links piping plover nest presence to the physical features of their nesting habitat on a barrier island that is impacted by sea-level rise and human development, using three years of data (1999, 2002, and 2008) from Assateague Island National Seashore in Maryland. Our model performance results showed that we were able to successfully predict nest presence given a wide range of physical conditions within the model’s dataset. We found that model predictions were more successful when the range of physical conditions included in model development was varied rather than when those physical conditions were narrow. We also found that all model predictions had fewer false negatives (nests predicted to be absent when they were actually present in the dataset) than false positives (nests predicted to be present when they were actually absent in the dataset), indicating that our model correctly predicted nest presence better than nest absence. 
These results indicated that our approach of using a Bayesian network to link specific physical features to nest presence will be useful for modelling impacts of sea-level rise- or human-related habitat change on barrier islands. We recommend that potential users of this method utilize multiple years of data that represent a wide range of physical conditions in model development, because the model performed less well when constructed using a narrow range of physical conditions. Further, given that there will always be some uncertainty in predictions of future physical habitat conditions related to sea-level rise and/or human development, predictive models will perform best when developed using multiple, varied years of data input.
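At prediction time, a discrete Bayesian network of the kind described reduces to lookups in conditional probability tables (CPTs), plus summation when variables are unobserved. The sketch below is a deliberately tiny stand-in: the two habitat variables, their states, and all probabilities are hypothetical, not values from the Assateague Island model.

```python
# Hypothetical CPT: P(nest present | beach_width, vegetation).
cpt_nest = {
    ("wide", "sparse"): 0.70,
    ("wide", "dense"): 0.30,
    ("narrow", "sparse"): 0.20,
    ("narrow", "dense"): 0.05,
}
# Hypothetical priors over the habitat variables.
p_width = {"wide": 0.4, "narrow": 0.6}
p_veg = {"sparse": 0.5, "dense": 0.5}

def predict_nest(beach_width, vegetation, threshold=0.5):
    """Predict presence/absence from the CPT for observed habitat features."""
    p = cpt_nest[(beach_width, vegetation)]
    return p >= threshold, p

# Marginal P(nest) obtained by summing out both habitat variables.
p_nest = sum(cpt_nest[(w, v)] * p_width[w] * p_veg[v]
             for w in p_width for v in p_veg)

present, p = predict_nest("wide", "sparse")
# present is True, p == 0.70; p_nest == 0.275
```

The paper's point about training data carries over directly: the CPT entries are estimated from observed nests, so combinations of physical conditions absent from a narrow training set leave the corresponding table entries poorly constrained.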

  2. A Modified Active Appearance Model Based on an Adaptive Artificial Bee Colony

    PubMed Central

    Othman, Zulaiha Ali

    2014-01-01

    The active appearance model (AAM) is one of the most popular model-based approaches and has been used extensively to extract features through highly accurate modeling of human faces under various physical and environmental circumstances. However, fitting such a model to the original image is a challenging task. The state of the art shows that optimization methods can resolve this problem, although applying optimization effectively is itself a common difficulty. Hence, in this paper we propose an AAM-based face recognition technique that resolves the fitting problem of AAM by introducing a new adaptive ABC algorithm. The adaptation increases the efficiency of fitting compared with the conventional ABC algorithm. We used three datasets in our experiments: the CASIA dataset, the property 2.5D face dataset, and the UBIRIS v1 image dataset. The results revealed that the proposed face recognition technique performed effectively in terms of face recognition accuracy. PMID:25165748

  3. Preliminary Geologic Map of the Laredo, Crystal City-Eagle Pass, San Antonio, and Del Rio 1 x 2 Quadrangles, Texas, and the Nuevo Laredo, Ciudad Acuna, Piedras Negras, and Nueva Rosita 1 x 2 Quadrangles, Mexico

    USGS Publications Warehouse

    Page, William R.; Berry, Margaret E.; VanSistine, D. Paco; Snyders, Scott R.

    2009-01-01

    The purpose of this map is to provide an integrated, bi-national geologic map dataset for display and analyses on an Arc Internet Map Service (IMS) dedicated to environmental health studies in the United States-Mexico border region. The IMS web site was designed by the US-Mexico Border Environmental Health Initiative project and collaborators, and the IMS and project web site address is http://borderhealth.cr.usgs.gov/. The objective of the project is to acquire, evaluate, analyze, and provide earth, biologic, and human health resources data within a GIS framework (IMS) to further our understanding of possible linkages between the physical environment and public health issues. The geologic map dataset is just one of many datasets included in the web site; other datasets include biologic, hydrologic, geographic, and human health themes.

  4. A dataset on human navigation strategies in foreign networked systems.

    PubMed

    Kőrösi, Attila; Csoma, Attila; Rétvári, Gábor; Heszberger, Zalán; Bíró, József; Tapolcai, János; Pelle, István; Klajbár, Dávid; Novák, Márton; Halasi, Valentina; Gulyás, András

    2018-03-13

    Humans are involved in various real-life networked systems. The most obvious examples are social and collaboration networks, but the language they use and its associated mental lexicon, or the physical map of their territory, can also be interpreted as networks. How do they find paths between endpoints in these networks? How do they obtain information about a foreign networked world they find themselves in, how do they build a mental model of it, and how well do they succeed in using it? Large, open datasets allowing the exploration of such questions are hard to find. Here we report a dataset collected by a smartphone application, in which players navigate between fixed-length source and destination English words step by step by changing only one letter at a time. The paths reflect how the players master their navigation skills in such a foreign networked world. The dataset can be used in the study of human mental models of the world around us or, in a broader scope, to investigate navigation strategies in complex networked systems.
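The game underlying this dataset is the classic word-ladder navigation problem: each word is a node, and two words are linked if they differ by exactly one letter. A breadth-first search sketch (the tiny vocabulary here is invented for illustration) makes that network structure, and the notion of a shortest path through it, explicit:

```python
from collections import deque
from string import ascii_lowercase

def word_ladder(source, target, vocabulary):
    """Shortest path between words, changing one letter per step.

    Mirrors the game in the dataset: every move keeps the word length
    fixed and must land on a valid word."""
    vocab = set(vocabulary) | {target}
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        word = path[-1]
        if word == target:
            return path
        # Generate all one-letter mutations that are valid words.
        for i in range(len(word)):
            for c in ascii_lowercase:
                nxt = word[:i] + c + word[i + 1:]
                if nxt in vocab and nxt not in seen:
                    seen.add(nxt)
                    queue.append(path + [nxt])
    return None  # no path within this vocabulary

path = word_ladder("cold", "warm", ["cord", "card", "ward", "worm", "word"])
# e.g. cold → cord → word → ward → warm (4 moves)
```

Human players, of course, do not run BFS; comparing their recorded paths against such shortest paths is exactly the kind of analysis the dataset enables.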

  5. Finding the traces of behavioral and cognitive processes in big data and naturally occurring datasets.

    PubMed

    Paxton, Alexandra; Griffiths, Thomas L

    2017-10-01

    Today, people generate and store more data than ever before as they interact with both real and virtual environments. These digital traces of behavior and cognition offer cognitive scientists and psychologists an unprecedented opportunity to test theories outside the laboratory. Despite general excitement about big data and naturally occurring datasets among researchers, three "gaps" stand in the way of their wider adoption in theory-driven research: the imagination gap, the skills gap, and the culture gap. We outline an approach to bridging these three gaps while respecting our responsibilities to the public as participants in and consumers of the resulting research. To that end, we introduce Data on the Mind (http://www.dataonthemind.org), a community-focused initiative aimed at meeting the unprecedented challenges and opportunities of theory-driven research with big data and naturally occurring datasets. We argue that big data and naturally occurring datasets are most powerfully used to supplement, not supplant, traditional experimental paradigms in order to understand human behavior and cognition, and we highlight emerging ethical issues related to the collection, sharing, and use of these powerful datasets.

  6. A curated compendium of monocyte transcriptome datasets of relevance to human monocyte immunobiology research

    PubMed Central

    Rinchai, Darawan; Boughorbel, Sabri; Presnell, Scott; Quinn, Charlie; Chaussabel, Damien

    2016-01-01

    Systems-scale profiling approaches have become widely used in translational research settings. The resulting accumulation of large-scale datasets in public repositories represents a critical opportunity to promote insight and foster knowledge discovery. However, resources that can serve as an interface between biomedical researchers and such vast and heterogeneous dataset collections are needed in order to fulfill this potential. Recently, we have developed an interactive data browsing and visualization web application, the Gene Expression Browser (GXB). This tool can be used to overlay deep molecular phenotyping data with rich contextual information about analytes, samples and studies along with ancillary clinical or immunological profiling data. In this note, we describe a curated compendium of 93 public datasets generated in the context of human monocyte immunological studies, representing a total of 4,516 transcriptome profiles. Datasets were uploaded to an instance of GXB along with study description and sample annotations. Study samples were arranged in different groups. Ranked gene lists were generated based on relevant group comparisons. This resource is publicly available online at http://monocyte.gxbsidra.org/dm3/landing.gsp. PMID:27158452

  7. EnviroAtlas - Portland, ME - Land Cover by Block Group

    EPA Pesticide Factsheets

    This EnviroAtlas dataset describes the percentage of each block group that is classified as impervious, forest, green space, wetland, and agriculture. Impervious is a combination of dark and light impervious. Forest is combination of trees and forest and woody wetlands. Green space is a combination of trees and forest, grass and herbaceous, agriculture, woody wetlands, and emergent wetlands. Wetlands includes both Woody and Emergent Wetlands. This dataset also includes the area per capita for each block group for impervious, forest, and green space land cover. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).
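The class combinations described above are simple sums of base land-cover percentages. A minimal sketch (the base-class field names and the example percentages are hypothetical, not EnviroAtlas attribute names):

```python
def derived_land_cover(pct):
    """Combine base land-cover percentages into EnviroAtlas-style classes.

    `pct` maps a base class name to its percent of block-group area.
    The combinations follow the dataset description: e.g. green space =
    trees/forest + grass/herbaceous + agriculture + woody wetlands +
    emergent wetlands."""
    return {
        "impervious": pct["dark_impervious"] + pct["light_impervious"],
        "forest": pct["trees_forest"] + pct["woody_wetlands"],
        "green_space": (pct["trees_forest"] + pct["grass_herbaceous"]
                        + pct["agriculture"] + pct["woody_wetlands"]
                        + pct["emergent_wetlands"]),
        "wetlands": pct["woody_wetlands"] + pct["emergent_wetlands"],
    }

# Made-up block group for illustration.
block_group = {
    "dark_impervious": 10.0, "light_impervious": 5.0,
    "trees_forest": 30.0, "grass_herbaceous": 20.0,
    "agriculture": 5.0, "woody_wetlands": 8.0, "emergent_wetlands": 2.0,
}
cover = derived_land_cover(block_group)
# cover == {"impervious": 15.0, "forest": 38.0, "green_space": 65.0, "wetlands": 10.0}
```

Note that the derived classes deliberately overlap (woody wetlands count toward forest, green space, and wetlands), so they need not sum to 100%.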

  8. Constraints on secret neutrino interactions after Planck

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Forastieri, Francesco; Lattanzi, Massimiliano; Natoli, Paolo, E-mail: francesco.forastieri@unife.it, E-mail: lattanzi@fe.infn.it, E-mail: natoli@fe.infn.it

    Neutrino interactions beyond the standard model of particle physics may affect the cosmological evolution and can be constrained through observations. We consider the possibility that neutrinos possess secret scalar or pseudoscalar interactions mediated by the Nambu-Goldstone boson of a still unknown spontaneously broken global U(1) symmetry, as in, e.g., Majoron models. In such scenarios, neutrinos still decouple at T ≅ 1 MeV, but become tightly coupled again ("recouple") at later stages of the cosmological evolution. We use available observations of the cosmic microwave background (CMB) anisotropies, including Planck 2013 and the joint BICEP2/Planck 2015 data, to derive constraints on the quantity γ_νν^4, parameterizing the neutrino collision rate due to scalar or pseudoscalar interactions. We consider both a minimal extension of the standard ΛCDM model, and more complicated scenarios with extra relativistic degrees of freedom or a non-vanishing tensor amplitude. For a wide range of dataset and model combinations, we find a typical constraint γ_νν^4 ∼< 0.9 × 10^−27 (95% C.L.), implying an upper limit z_νrec ∼< 850 on the redshift of neutrino recoupling, leaving open the possibility that the latter occurred well before hydrogen recombination. In the framework of Majoron models, the upper limit on γ_νν roughly translates into a constraint g ∼< 8.2 × 10^−7 on the Majoron-neutrino coupling constant g. In general, the data show a weak (∼1σ) but intriguing preference for non-zero values of γ_νν^4, with best fits in the range γ_νν^4 = (0.15–0.35) × 10^−27, depending on the particular dataset. This is more evident when either high-resolution CMB observations from the ACT and SPT experiments are included, or the possibility of non-vanishing tensor modes is considered. 
In particular, for the minimal model ΛCDM+γ_νν and including the Planck 2013, ACT, and SPT data, we report γ_νν^4 = (0.44 +0.17/−0.36) × 10^−27 (300 ∼< z_νrec ∼< 550) at the 68% confidence level.

  9. Learning a peptide-protein binding affinity predictor with kernel ridge regression

    PubMed Central

    2013-01-01

    Background The cellular function of a vast majority of proteins is performed through physical interactions with other biomolecules, which, most of the time, are other proteins. Peptides represent templates of choice for mimicking a secondary structure in order to modulate protein-protein interactions. They are thus an interesting class of therapeutics, since they also display strong activity, high selectivity, low toxicity and few drug-drug interactions. Furthermore, predicting peptides that would bind to specific MHC alleles would be of tremendous benefit for improving vaccine-based therapy and possibly generating antibodies with greater affinity. Modern computational methods have the potential to accelerate and lower the cost of drug and vaccine discovery by selecting potential compounds for testing in silico prior to biological validation. Results We propose a specialized string kernel for small biomolecules, peptides and pseudo-sequences of binding interfaces. The kernel incorporates physico-chemical properties of amino acids and elegantly generalizes eight kernels, including the Oligo, the Weighted Degree, the Blended Spectrum, and the Radial Basis Function. We provide a low-complexity dynamic programming algorithm for the exact computation of the kernel and a linear-time algorithm for its approximation. Combined with kernel ridge regression and SupCK, a novel binding pocket kernel, the proposed kernel yields biologically relevant and good prediction accuracy on the PepX database. For the first time, a machine learning predictor is capable of predicting the binding affinity of any peptide to any protein with reasonable accuracy. The method was also applied to both single-target and pan-specific Major Histocompatibility Complex class II benchmark datasets and three Quantitative Structure Affinity Model benchmark datasets. 
Conclusion On all benchmarks, our method significantly (p-value ≤ 0.057) outperforms the current state-of-the-art methods at predicting peptide-protein binding affinities. The proposed approach is flexible and can be applied to predict any quantitative biological activity. Moreover, generating reliable peptide-protein binding affinities will also improve systems biology modelling of interaction pathways. Lastly, the method should be of value to a large segment of the research community, with the potential to accelerate the discovery of peptide-based drugs and facilitate vaccine development. The proposed kernel is freely available at http://graal.ift.ulaval.ca/downloads/gs-kernel/. PMID:23497081
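The kernel ridge regression machinery behind such a predictor is compact enough to sketch. The toy below is not the GS kernel from the paper: it substitutes a plain k-mer spectrum kernel, one of the simplest string kernels, and invented peptide affinities, purely to illustrate how a Gram matrix over sequences plugs into the ridge solution α = (K + λI)⁻¹y.

```python
import numpy as np
from collections import Counter

def kmer_counts(seq, k=2):
    """k-mer count profile of a peptide sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def spectrum_kernel(s, t, k=2):
    """Inner product of k-mer count profiles (a simple string kernel)."""
    cs, ct = kmer_counts(s, k), kmer_counts(t, k)
    return sum(cs[m] * ct[m] for m in cs)

def krr_fit(seqs, y, lam=0.1):
    """Kernel ridge regression: solve (K + lam*I) alpha = y."""
    K = np.array([[spectrum_kernel(a, b) for b in seqs] for a in seqs], float)
    return np.linalg.solve(K + lam * np.eye(len(seqs)), np.asarray(y, float))

def krr_predict(seqs, alpha, query):
    """Predicted affinity: k(query, .) dotted with the dual weights."""
    k_vec = np.array([spectrum_kernel(query, s) for s in seqs], float)
    return float(k_vec @ alpha)

# toy peptides with invented "affinities"
peptides = ["ACDEFG", "ACDEFH", "WYWYWY", "WYWYWA"]
y = [1.0, 0.9, -1.0, -0.8]
alpha = krr_fit(peptides, y, lam=0.01)
pred = krr_predict(peptides, alpha, "ACDEFG")
```

Swapping in a richer kernel only changes `spectrum_kernel`; the fit and prediction steps are untouched, which is what makes the kernel formulation attractive for sequence data.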

  10. Improving protein–protein interactions prediction accuracy using protein evolutionary information and relevance vector machine model

    PubMed Central

    An, Ji‐Yong; Meng, Fan‐Rong; Chen, Xing; Yan, Gui‐Ying; Hu, Ji‐Pu

    2016-01-01

    Abstract Predicting protein–protein interactions (PPIs) is a challenging task, essential to constructing protein interaction networks and important for facilitating our understanding of the mechanisms of biological systems. Although a number of high-throughput technologies have been proposed to detect PPIs, they have unavoidable shortcomings, including high cost, time intensity, and inherently high false-positive rates. For these reasons, many computational methods have been proposed for predicting PPIs. However, the problem is still far from being solved. In this article, we propose a novel computational method called RVM-BiGP that combines the relevance vector machine (RVM) model and Bi-gram Probabilities (BiGP) for PPI detection from protein sequences. The major improvements include: (1) protein sequences are represented using the Bi-gram Probabilities (BiGP) feature representation on a Position Specific Scoring Matrix (PSSM), in which the protein evolutionary information is contained; (2) to reduce the influence of noise, the Principal Component Analysis (PCA) method is used to reduce the dimension of the BiGP vector; (3) the powerful and robust Relevance Vector Machine (RVM) algorithm is used for classification. Five-fold cross-validation experiments executed on yeast and Helicobacter pylori datasets achieved very high accuracies of 94.57% and 90.57%, respectively. Experimental results are significantly better than previous methods. To further evaluate the proposed method, we compare it with the state-of-the-art support vector machine (SVM) classifier on the yeast dataset. The experimental results demonstrate that our RVM-BiGP method is significantly better than the SVM-based method. In addition, we achieved 97.15% accuracy on the imbalanced yeast dataset, which is higher than that on the balanced yeast dataset. 
The promising experimental results show the efficiency and robustness of the proposed method, which can serve as an automatic decision-support tool for future proteomics research. To facilitate extensive future proteomics studies, we developed a freely available web server called RVM-BiGP-PPIs in Hypertext Preprocessor (PHP) for predicting PPIs. The web server, including source code and the datasets, is available at http://219.219.62.123:8888/BiGP/. PMID:27452983
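The shape of the described pipeline, bi-gram probability features from a PSSM followed by PCA reduction and classification, can be sketched in a few lines. Everything here is illustrative: the PSSMs are random toy matrices for a two-class problem rather than real protein pairs, and a nearest-centroid rule stands in for the RVM stage, which has no standard NumPy implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def bigram_features(pssm):
    """Bi-gram probability descriptor from an L x 20 PSSM:
    B[a, b] = sum_k pssm[k, a] * pssm[k+1, b], flattened to 400 dims."""
    B = pssm[:-1].T @ pssm[1:]               # 20 x 20 matrix of bi-gram products
    return B.ravel()

def pca_reduce(X, n_components):
    """Project the centered feature matrix onto its top principal components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

# toy PSSMs for two hypothetical classes: probability mass concentrated on
# the first five vs. the last five amino-acid columns
alpha_pos = np.r_[np.full(5, 5.0), np.ones(15)]
alpha_neg = np.r_[np.ones(15), np.full(5, 5.0)]
pssms = [rng.dirichlet(alpha_pos, size=30) for _ in range(10)] + \
        [rng.dirichlet(alpha_neg, size=30) for _ in range(10)]
X = np.array([bigram_features(p) for p in pssms])
y = np.array([1] * 10 + [0] * 10)

Z = pca_reduce(X, n_components=5)

# nearest-centroid rule standing in for the RVM classification stage
c1, c0 = Z[y == 1].mean(axis=0), Z[y == 0].mean(axis=0)
pred = (np.linalg.norm(Z - c1, axis=1) < np.linalg.norm(Z - c0, axis=1)).astype(int)
accuracy = (pred == y).mean()
```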

  11. ConnectViz: Accelerated Approach for Brain Structural Connectivity Using Delaunay Triangulation.

    PubMed

    Adeshina, A M; Hashim, R

    2016-03-01

    Stroke is a cardiovascular disease with high mortality and long-term disability worldwide. Normal functioning of the brain depends on an adequate supply of oxygen and nutrients to the brain's complex network through the blood vessels. During a cerebrovascular incident, patients can be affected by a hemorrhagic stroke, ischemia or other blood-vessel dysfunctions. Structurally, the left and right carotid arteries and the left and right vertebral arteries are responsible for supplying blood to the brain, scalp and face. However, a number of impairments in the function of the frontal lobes may occur as a result of any decrease in blood flow through one of the internal carotid arteries. Such impairment commonly results in numbness, weakness or paralysis. Recently, the concept of the brain's wiring representation, the connectome, was introduced. However, construction and visualization of such a brain network requires tremendous computation. Consequently, previously proposed approaches have suffered from the common problems of high memory consumption and slow execution. Furthermore, interactivity in previously proposed frameworks for brain networks is also an outstanding issue. This study proposes an accelerated approach for brain connectomic visualization based on the graph theory paradigm using the compute unified device architecture, extending the previously proposed SurLens Visualization and computer-aided hepatocellular carcinoma frameworks. The accelerated brain structural connectivity framework was evaluated with stripped brain datasets from the Department of Surgery, University of North Carolina, Chapel Hill, USA. Significantly, our proposed framework is able to generate and extract points and edges of datasets, display nodes and edges in the datasets in the form of a network and clearly map data volume to the corresponding brain surface. 
Moreover, with the framework, surfaces of the dataset are displayed simultaneously with the nodes and the edges. The framework is very efficient at providing greater interactivity, representing the nodes and the edges intuitively, all achieved at a considerably interactive speed for instantaneous mapping of the datasets' features. Uniquely, the connectomic algorithm performs remarkably fast with normal hardware requirement specifications.
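The paper's CUDA implementation is not reproduced here, but the core step of deriving a node-and-edge network from point data via Delaunay triangulation can be sketched on the CPU with SciPy; the point cloud below is random stand-in data, not a brain dataset.

```python
import numpy as np
from scipy.spatial import Delaunay

rng = np.random.default_rng(42)
points = rng.random((50, 3))                 # toy 3D "node" coordinates

tri = Delaunay(points)                       # tetrahedralization of the point cloud

# collect the unique undirected edges of every tetrahedron
edges = set()
for simplex in tri.simplices:                # each simplex has 4 vertices in 3D
    for i in range(4):
        for j in range(i + 1, 4):
            a, b = sorted((int(simplex[i]), int(simplex[j])))
            edges.add((a, b))

# adjacency-list view of the resulting connectivity graph
adjacency = {i: set() for i in range(len(points))}
for a, b in edges:
    adjacency[a].add(b)
    adjacency[b].add(a)
```

Each tetrahedron contributes six undirected edges; deduplicating them yields the connectivity graph that a renderer would then draw over the brain surface.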

  12. ConnectViz: Accelerated approach for brain structural connectivity using Delaunay triangulation.

    PubMed

    Adeshina, A M; Hashim, R

    2015-02-06

    Stroke is a cardiovascular disease with high mortality and long-term disability worldwide. Normal functioning of the brain depends on an adequate supply of oxygen and nutrients to the brain's complex network through the blood vessels. During a cerebrovascular incident, patients can be affected by a hemorrhagic stroke, ischemia or other blood-vessel dysfunctions. Structurally, the left and right carotid arteries and the left and right vertebral arteries are responsible for supplying blood to the brain, scalp and face. However, a number of impairments in the function of the frontal lobes may occur as a result of any decrease in blood flow through one of the internal carotid arteries. Such impairment commonly results in numbness, weakness or paralysis. Recently, the concept of the brain's wiring representation, the connectome, was introduced. However, construction and visualization of such a brain network requires tremendous computation. Consequently, previously proposed approaches have suffered from the common problems of high memory consumption and slow execution. Furthermore, interactivity in previously proposed frameworks for brain networks is also an outstanding issue. This study proposes an accelerated approach for brain connectomic visualization based on the graph theory paradigm using the Compute Unified Device Architecture (CUDA), extending the previously proposed SurLens Visualization and Computer Aided Hepatocellular Carcinoma (CAHECA) frameworks. The accelerated brain structural connectivity framework was evaluated with stripped brain datasets from the Department of Surgery, University of North Carolina, Chapel Hill, United States. Significantly, our proposed framework is able to generate and extract points and edges of datasets, display nodes and edges in the datasets in the form of a network and clearly map data volume to the corresponding brain surface. 
Moreover, with the framework, surfaces of the dataset are displayed simultaneously with the nodes and the edges. The framework is very efficient at providing greater interactivity, representing the nodes and the edges intuitively, all achieved at a considerably interactive speed for instantaneous mapping of the datasets' features. Uniquely, the connectomic algorithm performs remarkably fast with normal hardware requirement specifications.

  13. Data Bookkeeping Service 3 - Providing Event Metadata in CMS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Giffels, Manuel; Guo, Y.; Riley, Daniel

    The Data Bookkeeping Service 3 (DBS) provides a catalog of event metadata for Monte Carlo and recorded data of the Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider (LHC) at CERN, Geneva. It comprises all information necessary for tracking datasets, their processing history and the associations between runs, files and datasets, on a large scale of about 200,000 datasets and more than 40 million files, adding up to around 700 GB of metadata. The DBS is an essential part of the CMS Data Management and Workload Management (DMWM) systems [1]; all kinds of data processing, such as Monte Carlo production and the processing of recorded event data, as well as physics analysis done by the users, rely heavily on the information stored in DBS.

  14. XRD and spectral dataset of the UV-A stable nanotubes of 3,5-bis(trifluoromethyl)benzylamine derivative of tyrosine.

    PubMed

    Govindhan, R; Karthikeyan, B

    2017-10-01

    The data presented in this article are related to research on UV-A stable nanotubes. The nanotubes were prepared from the 3,5-bis(trifluoromethyl)benzylamine derivative of tyrosine (BTTP). XRD data reveal the size of the nanotubes. The as-synthesized nanotubes (BTTPNTs) were characterized by UV-vis optical absorption studies [1] and photophysical degradation kinetics. The resulting dataset is made available to enable critical or extended analyses of BTTPNTs as excellent light-resistant materials.

  15. The influence of data characteristics on detecting wetland/stream surface-water connections in the Delmarva Peninsula, Maryland and Delaware

    USGS Publications Warehouse

    Vanderhoof, Melanie; Distler, Hayley; Lang, Megan W.; Alexander, Laurie C.

    2018-01-01

    The dependence of downstream waters on upstream ecosystems necessitates an improved understanding of watershed-scale hydrological interactions, including connections between wetlands and streams. An evaluation of such connections is challenging when (1) accurate and complete datasets of wetland and stream locations are often not available and (2) natural variability in surface-water extent influences the frequency and duration of wetland/stream connectivity. The Upper Choptank River watershed on the Delmarva Peninsula in eastern Maryland and Delaware is dominated by a high density of small, forested wetlands. In this analysis, wetland/stream surface-water connections were quantified using multiple wetland and stream datasets, including headwater streams and depressions mapped from a lidar-derived digital elevation model. Surface-water extent was mapped across the watershed for spring 2015 using Landsat-8, Radarsat-2 and Worldview-3 imagery. The frequency of wetland/stream connections increased as a more complete and accurate stream dataset was used and surface-water extent was included, in particular when the spatial resolution of the imagery was finer (i.e., <10 m). Depending on the datasets used, 12–60% of wetlands by count (21–93% of wetlands by area) experienced surface-water interactions with streams during spring 2015. This translated into a range of 50–94% of the watershed contributing direct surface-water runoff to streamflow. This finding suggests that our interpretation of the frequency and duration of wetland/stream connections will be influenced not only by the spatial and temporal characteristics of wetlands, streams and potential flowpaths, but also by the completeness, accuracy and resolution of input datasets.

  16. Testing charged current quasi-elastic and multinucleon interaction models in the NEUT neutrino interaction generator with published datasets from the MiniBooNE and MINERνA experiments

    NASA Astrophysics Data System (ADS)

    Wilkinson, C.; Terri, R.; Andreopoulos, C.; Bercellie, A.; Bronner, C.; Cartwright, S.; de Perio, P.; Dobson, J.; Duffy, K.; Furmanski, A. P.; Haegel, L.; Hayato, Y.; Kaboth, A.; Mahn, K.; McFarland, K. S.; Nowak, J.; Redij, A.; Rodrigues, P.; Sánchez, F.; Schwehr, J. D.; Sinclair, P.; Sobczyk, J. T.; Stamoulis, P.; Stowell, P.; Tacik, R.; Thompson, L.; Tobayama, S.; Wascko, M. O.; Żmuda, J.

    2016-04-01

    There has been a great deal of theoretical work on sophisticated charged current quasi-elastic (CCQE) neutrino interaction models in recent years, prompted by a number of experimental results that measured unexpectedly large CCQE cross sections on nuclear targets. As the dominant interaction mode at T2K energies, and the signal process in oscillation analyses, it is important for the T2K experiment to include realistic CCQE cross section uncertainties in T2K analyses. To this end, T2K's Neutrino Interaction Working Group has implemented a number of recent models in NEUT, T2K's primary neutrino interaction event generator. In this paper, we give an overview of the models implemented and present fits to published νμ and ν̄μ CCQE cross section measurements from the MiniBooNE and MINERνA experiments. The results of the fits are used to select a default cross section model for future T2K analyses and to constrain the cross section uncertainties of the model. We find strong tension between datasets for all models investigated. Among the evaluated models, the combination of a modified relativistic Fermi gas with multinucleon CCQE-like interactions gives the most consistent description of the available data.

  17. Crowdsourcing Physical Network Topology Mapping With Net.Tagger

    DTIC Science & Technology

    2016-03-01

    Net.Tagger is a novel approach to network infrastructure mapping that combines smartphone apps with crowdsourced collection to gather data for offline aggregation and analysis. The project aims to build a map of physical network infrastructure such as fiber-optic cables, facilities, and access points. Further work targets the backend server infrastructure, including a full security audit, better web services handling, and integration with the OSM stack and dataset.

  18. Prediction of protein interaction hot spots using rough set-based multiple criteria linear programming.

    PubMed

    Chen, Ruoying; Zhang, Zhiwang; Wu, Di; Zhang, Peng; Zhang, Xinyang; Wang, Yong; Shi, Yong

    2011-01-21

    Protein-protein interactions are fundamentally important in many biological processes, and there is a pressing need to understand the principles of protein-protein interactions. Mutagenesis studies have found that only a small fraction of surface residues, known as hot spots, are responsible for the physical binding in protein complexes. However, revealing hot spots by mutagenesis experiments is usually time-consuming and expensive. To complement the experimental efforts, we propose a new computational approach in this paper to predict hot spots. Our method, Rough Set-based Multiple Criteria Linear Programming (RS-MCLP), integrates rough sets theory and multiple criteria linear programming to choose dominant features and computationally predict hot spots. Our approach is benchmarked on a dataset of 904 alanine-mutated residues, and the results show that our RS-MCLP method performs better than other methods, e.g., MCLP, Decision Tree, Bayes Net, and the existing HotSprint database. In addition, we reveal several biological insights based on our analysis. We find that four features (the change of accessible surface area, percentage of the change of accessible surface area, size of a residue, and atomic contacts) are critical in predicting hot spots. Furthermore, by analyzing the distribution of amino acids we find that three residues (Tyr, Trp, and Phe) are abundant in hot spots. Copyright © 2010 Elsevier Ltd. All rights reserved.
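The linear-programming half of the approach can be illustrated with a small LP classifier: choose a hyperplane minimizing the total misclassification slack. This is a generic formulation on synthetic data, not the authors' RS-MCLP model; the four feature columns mirror the features named above, but every number is invented.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(7)

# toy residue features: [delta-ASA, % delta-ASA, residue size, atomic contacts]
n = 40
X_hot = rng.normal(loc=[1.5, 0.6, 1.2, 1.0], scale=0.3, size=(n, 4))
X_not = rng.normal(loc=[0.3, 0.2, 0.8, 0.2], scale=0.3, size=(n, 4))
X = np.vstack([X_hot, X_not])
y = np.r_[np.ones(n), -np.ones(n)]           # +1 hot spot, -1 otherwise

# LP: minimize sum(xi) subject to y_i (w . x_i + b) >= 1 - xi_i, xi_i >= 0
d, m = X.shape[1], len(y)
c = np.r_[np.zeros(d + 1), np.ones(m)]       # objective: total slack
A_ub = np.hstack([-(y[:, None] * X), -y[:, None], -np.eye(m)])
b_ub = -np.ones(m)
bounds = [(-10, 10)] * (d + 1) + [(0, None)] * m
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")

w, b = res.x[:d], res.x[d]
accuracy = (np.sign(X @ w + b) == y).mean()
```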

  19. 3Drefine: an interactive web server for efficient protein structure refinement.

    PubMed

    Bhattacharya, Debswapna; Nowotny, Jackson; Cao, Renzhi; Cheng, Jianlin

    2016-07-08

    3Drefine is an interactive web server for consistent and computationally efficient protein structure refinement with the capability to perform web-based statistical and visual analysis. The 3Drefine refinement protocol utilizes iterative optimization of the hydrogen bonding network combined with atomic-level energy minimization of the optimized model using composite physics- and knowledge-based force fields for efficient protein structure refinement. The method has been extensively evaluated in blind CASP experiments as well as on large-scale and diverse benchmark datasets, and it exhibits consistent improvement over the initial structure in both global and local structural quality measures. The 3Drefine web server allows for convenient protein structure refinement through text or file input submission, email notification and provided example submissions, and is freely available without any registration requirement. The server also provides comprehensive analysis of submissions through various energy and statistical feedback and interactive visualization of multiple refined models through the JSmol applet, which is equipped with numerous protein model analysis tools. The web server has been extensively tested and used by many users. As a result, the 3Drefine web server conveniently provides a useful tool easily accessible to the community. The 3Drefine web server has been made publicly available at the URL: http://sysbio.rnet.missouri.edu/3Drefine/. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  20. IntellEditS: intelligent learning-based editor of segmentations.

    PubMed

    Harrison, Adam P; Birkbeck, Neil; Sofka, Michal

    2013-01-01

    Automatic segmentation techniques, despite demonstrating excellent overall accuracy, can often produce inaccuracies in local regions. As a result, correcting segmentations remains an important task that is often laborious, especially when done manually for 3D datasets. This work presents a powerful tool called Intelligent Learning-Based Editor of Segmentations (IntellEditS) that minimizes user effort and further improves segmentation accuracy. The tool partners interactive learning with an energy-minimization approach to editing. Based on interactive user input, a discriminative classifier is trained and applied to the edited 3D region to produce soft voxel labeling. The labels are integrated into a novel energy functional along with the existing segmentation and image data. Unlike the state of the art, IntellEditS is designed to correct segmentation results represented not only as masks but also as meshes. In addition, IntellEditS accepts intuitive boundary-based user interactions. The versatility and performance of IntellEditS are demonstrated on both MRI and CT datasets consisting of varied anatomical structures and resolutions.

  1. Parallel Rendering of Large Time-Varying Volume Data

    NASA Technical Reports Server (NTRS)

    Garbutt, Alexander E.

    2005-01-01

    Interactive visualization of large time-varying 3D volume datasets has been and still is a great challenge to the modern computational world. It stretches the limits of the memory capacity, disk space, network bandwidth and CPU speed of a conventional computer. In this SURF project, we propose to develop a parallel volume rendering program on SGI's Prism, a cluster computer equipped with state-of-the-art graphics hardware. The proposed program combines parallel computing and hardware rendering in order to achieve an interactive rendering rate. We use 3D texture mapping and a hardware shader to implement 3D volume rendering on each workstation. We use SGI's VisServer to enable remote rendering using Prism's graphics hardware. And last, we will integrate this new program with ParVox, a parallel distributed visualization system developed at JPL. At the end of the project, we will demonstrate remote interactive visualization using this new hardware volume renderer on JPL's Prism system using a time-varying dataset from selected JPL applications.

  2. Passing messages between biological networks to refine predicted interactions.

    PubMed

    Glass, Kimberly; Huttenhower, Curtis; Quackenbush, John; Yuan, Guo-Cheng

    2013-01-01

    Regulatory network reconstruction is a fundamental problem in computational biology. There are significant limitations to such reconstruction using individual datasets, and increasingly researchers attempt to construct networks using multiple, independent datasets obtained from complementary sources, but methods for this integration are lacking. We developed PANDA (Passing Attributes between Networks for Data Assimilation), a message-passing model using multiple sources of information to predict regulatory relationships, and used it to integrate protein-protein interaction, gene expression, and sequence motif data to reconstruct genome-wide, condition-specific regulatory networks in yeast as a model. The resulting networks were not only more accurate than those produced using individual datasets and other existing methods, but they also captured information regarding specific biological mechanisms and pathways that were missed using other methodologies. PANDA is scalable to higher eukaryotes, applicable to specific tissue or cell type data and conceptually generalizable to include a variety of regulatory, interaction, expression, and other genome-scale data. An implementation of the PANDA algorithm is available at www.sourceforge.net/projects/panda-net.

  3. Mining Gene Regulatory Networks by Neural Modeling of Expression Time-Series.

    PubMed

    Rubiolo, Mariano; Milone, Diego H; Stegmayer, Georgina

    2015-01-01

    Discovering gene regulatory networks from data is one of the most studied topics in recent years. Neural networks can be successfully used to infer an underlying gene network by modeling expression profiles as time series. This work proposes a novel method based on a pool of neural networks for obtaining a gene regulatory network from a gene expression dataset. They are used for modeling each possible interaction between pairs of genes in the dataset, and a set of mining rules is applied to accurately detect the underlying relations among genes. The results obtained on artificial and real datasets confirm the method's effectiveness for discovering regulatory networks from a proper modeling of the temporal dynamics of gene expression profiles.
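The per-pair modeling idea, fit one small predictive model for every ordered gene pair and then apply a mining rule to keep high-scoring pairs as edges, can be sketched with a linear least-squares fit standing in for each neural model:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 200

# toy expression time series: gene 0 drives gene 1; gene 2 is independent
g = np.zeros((3, T))
g[0] = rng.normal(size=T)
g[2] = rng.normal(size=T)
for t in range(T - 1):
    g[1, t + 1] = 0.8 * g[0, t] + 0.1 * rng.normal()

def pair_score(x, y):
    """R^2 of predicting y(t+1) from x(t) with a least-squares fit
    (a linear stand-in for one small neural model per gene pair)."""
    A = np.c_[x[:-1], np.ones(len(x) - 1)]
    coef, *_ = np.linalg.lstsq(A, y[1:], rcond=None)
    resid = y[1:] - A @ coef
    return 1.0 - resid.var() / y[1:].var()

scores = {(i, j): pair_score(g[i], g[j])
          for i in range(3) for j in range(3) if i != j}

# mining rule: keep ordered pairs whose predictive score is high
edges = [p for p, s in scores.items() if s > 0.5]
```

On this toy series only the true driving pair (gene 0 → gene 1) survives the threshold rule.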

  4. Hierarchical modeling of molecular energies using a deep neural network

    NASA Astrophysics Data System (ADS)

    Lubbers, Nicholas; Smith, Justin S.; Barros, Kipton

    2018-06-01

    We introduce the Hierarchically Interacting Particle Neural Network (HIP-NN) to model molecular properties from datasets of quantum calculations. Inspired by a many-body expansion, HIP-NN decomposes properties, such as energy, as a sum over hierarchical terms. These terms are generated from a neural network—a composition of many nonlinear transformations—acting on a representation of the molecule. HIP-NN achieves state-of-the-art performance on a dataset of 131k ground-state organic molecules and predicts energies with a 0.26 kcal/mol mean absolute error. With minimal tuning, our model is also competitive on a dataset of molecular dynamics trajectories. In addition to enabling accurate energy predictions, the hierarchical structure of HIP-NN helps to identify regions of model uncertainty.
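The sum-over-hierarchical-terms structure can be shown with a toy network: each "interaction layer" applies a nonlinear transformation and contributes a per-atom energy readout, and the total energy is the sum of the per-layer terms. The weights here are random, so this illustrates the decomposition only, not a trained HIP-NN.

```python
import numpy as np

rng = np.random.default_rng(0)

def hierarchical_energy(features, weights, readouts):
    """Toy HIP-NN-style decomposition: the total energy is a sum of
    per-atom terms read out after each nonlinear layer."""
    h = features
    terms = []
    for W, r in zip(weights, readouts):
        h = np.tanh(h @ W)               # one nonlinear transformation
        terms.append(h @ r)              # per-atom energy term at this level
    E_terms = [t.sum() for t in terms]   # hierarchical contributions
    return sum(E_terms), E_terms

n_atoms, d = 5, 8
features = rng.normal(size=(n_atoms, d))   # stand-in atomic representation
weights = [rng.normal(size=(d, d)) * 0.5 for _ in range(3)]
readouts = [rng.normal(size=d) for _ in range(3)]

E, E_terms = hierarchical_energy(features, weights, readouts)
```

In the trained model the later terms of such an expansion should shrink, which is what makes the per-term magnitudes usable as a rough uncertainty signal.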

  5. ROS-based ground stereo vision detection: implementation and experiments.

    PubMed

    Hu, Tianjiang; Zhao, Boxin; Tang, Dengqing; Zhang, Daibing; Kong, Weiwei; Shen, Lincheng

    This article concentrates on an open-source implementation of flying object detection in cluttered scenes, which is of significance for ground stereo-aided autonomous landing of unmanned aerial vehicles. The ground stereo vision guidance system is presented with details on system architecture and workflow. The Chan-Vese detection algorithm is further considered and implemented in the Robot Operating System (ROS) environment. A data-driven interactive scheme is developed to collect datasets for parameter tuning and performance evaluation. Outdoor flight experiments captured the stereo sequential-image dataset and recorded the simultaneous data from the pan-and-tilt unit, onboard sensors and differential GPS. Experimental results using the collected dataset validate the effectiveness of the published ROS-based detection algorithm.

  6. Persistent identifiers for CMIP6 data in the Earth System Grid Federation

    NASA Astrophysics Data System (ADS)

    Buurman, Merret; Weigel, Tobias; Juckes, Martin; Lautenschlager, Michael; Kindermann, Stephan

    2016-04-01

    The Earth System Grid Federation (ESGF) is a distributed data infrastructure that will provide access to the CMIP6 experiment data. The data consist of thousands of datasets composed of millions of files. Over the course of the CMIP6 operational phase, datasets may be retracted and replaced by newer versions that consist of completely or partly new files. Each dataset is hosted at a single data centre, but can have one or several backups (replicas) at other data centres. To keep track of the different data entities and the relationships between them, to ensure their consistency and to improve the exchange of information about them, Persistent Identifiers (PIDs) are used. These are unique identifiers that are registered at a globally accessible server, along with some metadata (the PID record). While usually providing access to the data object they refer to, as long as it exists, the metadata record will remain available even beyond the object's lifetime. Besides providing access to data and metadata, PIDs will allow scientists to communicate effectively and at a fine granularity about CMIP6 data. The initiative to introduce PIDs in the ESGF infrastructure has been described and agreed upon through a series of white papers governed by the WGCM Infrastructure Panel (WIP). In CMIP6, each dataset and each file is assigned a PID that keeps track of the data object's physical copies throughout the object lifetime. In addition to this, its relationship with other data objects is stored in the PID record. A human-readable version of this information is available on an information page, also linked in the PID record. A possible application that exploits the information available from the PID records is a smart information tool, which a scientific user can call to find out whether his/her version was replaced by a new one, to view and browse the related datasets and files, and to get access to the various copies or to additional metadata on a dedicated website. 
The PID registration process is embedded in the ESGF data publication process. During the first publication, the PID records are populated with metadata including the parent dataset(s), other existing versions and the physical location. Every subsequent publication, un-publication or replica publication of a dataset or file then updates the PID records to keep track of changing physical locations of the data (or lack thereof) and of reported errors in the data. Assembling the metadata records and registering the PIDs on a central server is a potential performance bottleneck, as millions of data objects may be published in a short timeframe when the CMIP6 experiment phase begins. For this reason, the PID registration and metadata update tasks are pushed to a message queueing system that facilitates high availability and scalability, and are then processed asynchronously. This will lead to a slight delay in PID registration but will avoid blocking resources at the data centres and slowing down the publication of the data so eagerly awaited by the scientists.
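The asynchronous registration pattern described above, where publication events push messages to a queue and a consumer updates PID records later, can be sketched with Python's standard library; the handle names and record fields below are invented for illustration.

```python
import queue
import threading

# toy in-memory "handle server"
pid_records = {}

def register_pid(task):
    """Create or update a PID record with dataset location metadata."""
    rec = pid_records.setdefault(task["pid"], {"locations": set(), "versions": []})
    rec["locations"].add(task["location"])
    if task.get("replaces"):
        rec["versions"].append(task["replaces"])

def worker(q):
    """Consume registration messages asynchronously until the queue closes."""
    while True:
        task = q.get()
        if task is None:                 # sentinel: shut down
            q.task_done()
            break
        register_pid(task)
        q.task_done()

q = queue.Queue()
t = threading.Thread(target=worker, args=(q,))
t.start()

# publication events push messages instead of blocking on the handle server
q.put({"pid": "hdl:21.14100/abc", "location": "dkrz.de"})
q.put({"pid": "hdl:21.14100/abc", "location": "ceda.ac.uk"})   # replica
q.put({"pid": "hdl:21.14100/def", "location": "llnl.gov",
       "replaces": "hdl:21.14100/abc"})                        # new version
q.put(None)
q.join()
t.join()
```

The publisher returns as soon as the message is queued, which is the property that keeps data publication from blocking on the central registration service.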

  7. Recently amplified arctic warming has contributed to a continual global warming trend

    NASA Astrophysics Data System (ADS)

    Huang, Jianbin; Zhang, Xiangdong; Zhang, Qiyi; Lin, Yanluan; Hao, Mingju; Luo, Yong; Zhao, Zongci; Yao, Yao; Chen, Xin; Wang, Lei; Nie, Suping; Yin, Yizhou; Xu, Ying; Zhang, Jiansong

    2017-12-01

    The existence and magnitude of the recently suggested global warming hiatus, or slowdown, have been strongly debated1-3. Although various physical processes4-8 have been examined to elucidate this phenomenon, the accuracy and completeness of the observational data that comprise global average surface air temperature (SAT) datasets are a concern9,10. In particular, these datasets lack either complete geographic coverage or in situ observations over the Arctic, owing to the sparse observational network in this area9. As a consequence, the contribution of Arctic warming to global SAT changes may have been underestimated, leading to uncertainty in the hiatus debate. Here, we constructed a new Arctic SAT dataset using the most recently updated global SATs2 and a drifting-buoy-based Arctic SAT dataset11 by employing the 'data interpolating empirical orthogonal functions' method12. Our estimate of the global SAT rate of increase is around 0.112 °C per decade, instead of 0.05 °C per decade from IPCC AR51, for 1998-2012. Analysis of this dataset shows that the amplified Arctic warming over the past decade has significantly contributed to a continual global warming trend, rather than a hiatus or slowdown.

  8. "Whoa! We're Going Deep in the Trees!": Patterns of Collaboration around an Interactive Information Visualization Exhibit

    ERIC Educational Resources Information Center

    Davis, Pryce; Horn, Michael; Block, Florian; Phillips, Brenda; Evans, E. Margaret; Diamond, Judy; Shen, Chia

    2015-01-01

    In this paper we present a qualitative analysis of natural history museum visitor interaction around a multi-touch tabletop exhibit called "DeepTree" that we designed around concepts of evolution and common descent. DeepTree combines several large scientific datasets and an innovative visualization technique to display a phylogenetic…

  9. Peer Interaction in Text Chat: Qualitative Analysis of Chat Transcripts

    ERIC Educational Resources Information Center

    Golonka, Ewa M.; Tare, Medha; Bonilla, Carrie

    2017-01-01

    Prior research has shown that intermediate-level adult learners of Russian who worked interactively with partners using text chat improved their vocabulary and oral production skills more than students who worked independently (Tare et al., 2014). Drawing on the dataset from Tare et al. (2014), the current study follows up to explore the nature of…

  10. Advanced Visualization and Interactive Display Rapid Innovation and Discovery Evaluation Research (VISRIDER) Program Task 6: Point Cloud Visualization Techniques for Desktop and Web Platforms

    DTIC Science & Technology

    2017-04-01

    Report period: October 2013 – September 2014. The report surveys various point cloud visualization techniques for viewing large-scale LiDAR datasets and evaluates their potential use on thick-client desktop platforms.

  11. Systemic Console: Advanced analysis of exoplanetary data

    NASA Astrophysics Data System (ADS)

    Meschiari, Stefano; Wolf, Aaron S.; Rivera, Eugenio; Laughlin, Gregory; Vogt, Steve; Butler, Paul

    2012-10-01

    Systemic Console is a tool for advanced analysis of exoplanetary data. It comprises a graphical tool for fitting radial velocity and transit datasets and a library of routines for non-interactive calculations. Among its features are interactive plotting of RV curves and transits, combined fitting of RV and transit timing (primary and secondary), interactive periodograms and FAP estimation, and bootstrap and MCMC error estimation. The console package includes public radial velocity and transit data.

  12. Are Interactions between cis-Regulatory Variants Evidence for Biological Epistasis or Statistical Artifacts?

    PubMed

    Fish, Alexandra E; Capra, John A; Bush, William S

    2016-10-06

    The importance of epistasis, or statistical interactions between genetic variants, to the development of complex disease in humans has been controversial. Genome-wide association studies of statistical interactions influencing human traits have recently become computationally feasible and have identified many putative interactions. However, statistical models used to detect interactions can be confounded, which makes it difficult to be certain that observed statistical interactions are evidence for true molecular epistasis. In this study, we investigate whether there is evidence for epistatic interactions between genetic variants within the cis-regulatory region that influence gene expression after accounting for technical, statistical, and biological confounding factors. We identified 1,119 interactions (FDR = 5%) that appear to regulate gene expression in human lymphoblastoid cell lines, a tightly controlled, largely genetically determined phenotype. Many of these interactions replicated in an independent dataset (90 of 803 tested, Bonferroni threshold). We then performed an exhaustive analysis of both known and novel confounders, including ceiling/floor effects, missing genotype combinations, haplotype effects, single variants tagged through linkage disequilibrium, and population stratification. Every interaction could be explained by at least one of these confounders, and replication in independent datasets did not protect against some confounders. Assuming that the confounding factors provide a more parsimonious explanation for each interaction, we find it unlikely that cis-regulatory interactions contribute strongly to human gene expression, which calls into question the relevance of cis-regulatory interactions for other human phenotypes. We additionally propose several best practices for epistasis testing to protect future studies from confounding. Copyright © 2016 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
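An epistasis scan of the kind described reduces, per variant pair, to fitting a linear model with a product term and testing its coefficient. A minimal sketch on simulated, purely additive data (all data and effect sizes invented), where the fitted interaction coefficient should come out near zero:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
g1 = rng.integers(0, 3, n).astype(float)  # genotype dosage at variant 1 (0/1/2)
g2 = rng.integers(0, 3, n).astype(float)  # genotype dosage at variant 2 (0/1/2)

# simulate expression with a purely additive architecture (no epistasis)
expr = 1.0 + 0.5 * g1 + 0.3 * g2 + rng.normal(0, 0.1, n)

# design matrix with an interaction (product) term, as in a standard scan
X = np.column_stack([np.ones(n), g1, g2, g1 * g2])
beta, *_ = np.linalg.lstsq(X, expr, rcond=None)
# beta[3] is the interaction coefficient; near zero for this additive simulation
```

Confounders such as ceiling effects or haplotype structure can make beta[3] spuriously non-zero even when the true architecture is additive, which is exactly the failure mode the study dissects.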

  13. Association between urbanisation and type 2 diabetes: an ecological study

    PubMed Central

    Gassasse, Zakariah; Smith, Dianna; Finer, Sarah

    2017-01-01

    Introduction Previous studies have explored the effect of urbanisation on the prevalence of type 2 diabetes (T2D) at regional/national level. The aim of this study is to investigate the association between urbanisation and T2D at country level, worldwide, and to explore the role of intermediate variables (physical inactivity, sugar consumption and obesity). The potential effect modification by gross domestic product (GDP) was also assessed. Methods Data for 207 countries were collected from accessible datasets. Directed acyclic graphs were used to describe the association between urbanisation, T2D and their intermediate variables (physical inactivity, sugar consumption and obesity). Urbanisation was measured as urban percentage (UP) and as agglomeration index (AI). Crude and multivariate linear regression analyses were conducted to explore the selected associations. The interaction between urbanisation and T2D across levels of GDP per capita was investigated. Results The association between urbanisation and T2D diverged by exposure: AI was positively associated, while UP was negatively associated, with T2D prevalence. Physical inactivity and obesity were statistically significantly associated with increased prevalence of T2D. In middle-income countries (MIC), UP, AI and GDP were significantly associated with T2D prevalence, while in high-income countries (HIC), physical inactivity and obesity were the main determinants of T2D prevalence. Conclusions The type of urban growth, not urbanisation per se, predicted T2D prevalence at country level. In MIC, population density and GDP were the main determinants of diabetes, while in HIC these were physical inactivity and obesity. Globalisation is playing an important role in the rise of T2D worldwide. PMID:29104770

  14. Combined use of semantics and metadata to manage Research Data Life Cycle in Environmental Sciences

    NASA Astrophysics Data System (ADS)

    Aguilar Gómez, Fernando; de Lucas, Jesús Marco; Pertinez, Esther; Palacio, Aida

    2017-04-01

    The use of metadata to contextualize datasets is well established in the Earth System Sciences. Several initiatives and tools help data managers choose the metadata standard that best fits their use case, such as the DCC Metadata Directory (http://www.dcc.ac.uk/resources/metadata-standards). In our use case, we have been gathering physical, chemical and biological data from a water reservoir since 2010. A well-defined metadata scheme is crucial not only to contextualize our own data but also to integrate datasets from other sources such as satellites or meteorological agencies. We have therefore chosen EML (Ecological Metadata Language), which brings together many different elements to describe a dataset: the project context, the definition of instrumentation and parameters, the software used for processing and quality control, and the publication details. These metadata elements help both humans and machines to understand and process the dataset. However, metadata alone is not enough to fully support the data life cycle, from the definition of a Data Management Plan (DMP) to publication and re-use. To do so, we need to define not only metadata and attributes but also the relationships between them, which requires semantics. Ontologies, as knowledge representations, can define the elements of a research data life cycle (DMP, datasets, software, etc.), how these elements are related to one another, and how they interact. The first advantage of developing an ontology for a knowledge domain is that it provides a common vocabulary hierarchy (i.e. a conceptual schema) that can be used and standardized by all agents interested in the domain, whether humans or machines. This use of ontologies is one of the foundations of the Semantic Web, where ontologies play a key role in establishing a common terminology between agents.
To develop the ontology we are using Protégé, an open-source, freely available graphical ontology-development tool that supports a rich knowledge model. To process and manage the ontology we are using Semantic MediaWiki, an extension of MediaWiki that can process semantic queries and export data in RDF. Our final goal is to integrate our data repository portal with a semantic processing engine in order to have a complete system for managing the data life cycle stages and their relationships, including a machine-actionable DMP solution, dataset and software management, computing resources for processing and analysis, and publication features (DOI minting). In this way we will be able to reproduce the full data life cycle chain while guaranteeing the FAIR+R principles.
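The triple-based representation that Semantic MediaWiki exports as RDF can be illustrated minimally: data-life-cycle elements become subjects and objects, their relationships become predicates. All element names below are invented:

```python
# Subject-predicate-object triples linking data-life-cycle elements
# (the real system stores these in Semantic MediaWiki / RDF).
triples = [
    ("DMP-2017", "describes", "Dataset-reservoir-2010"),
    ("Dataset-reservoir-2010", "documentedBy", "EML-record-42"),
    ("Dataset-reservoir-2010", "processedWith", "QC-software-v1"),
    ("EML-record-42", "includes", "Instrumentation-metadata"),
]

def query(triples, predicate):
    """Return all (subject, object) pairs linked by a given predicate."""
    return [(s, o) for s, p, o in triples if p == predicate]

print(query(triples, "describes"))  # [('DMP-2017', 'Dataset-reservoir-2010')]
```

A real semantic query ("which datasets does this DMP describe?") is exactly this pattern match, scaled up over the ontology's vocabulary.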

  15. Comprehensive preclinical evaluation of a multi-physics model of liver tumor radiofrequency ablation.

    PubMed

    Audigier, Chloé; Mansi, Tommaso; Delingette, Hervé; Rapaka, Saikiran; Passerini, Tiziano; Mihalef, Viorel; Jolly, Marie-Pierre; Pop, Raoul; Diana, Michele; Soler, Luc; Kamen, Ali; Comaniciu, Dorin; Ayache, Nicholas

    2017-09-01

    We aim to develop a framework for the validation of a subject-specific multi-physics model of liver tumor radiofrequency ablation (RFA). The RFA computation becomes subject-specific after several levels of personalization: geometrical and biophysical (hemodynamics, heat transfer and an extended cellular necrosis model). We present a comprehensive experimental setup combining multimodal, pre- and postoperative anatomical and functional images, as well as interventional monitoring of intra-operative signals: the temperature and delivered power. To exploit this dataset, an efficient processing pipeline is introduced, which copes with image noise, variable resolution and anisotropy. The validation study includes twelve ablations from five healthy pig livers: a mean point-to-mesh error between predicted and actual ablation extent of 5.3 ± 3.6 mm is achieved. This enables an end-to-end preclinical validation framework that considers the available dataset.

  16. WE-G-207-06: 3D Fluoroscopic Image Generation From Patient-Specific 4DCBCT-Based Motion Models Derived From Physical Phantom and Clinical Patient Images

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dhou, S; Cai, W; Hurwitz, M

    2015-06-15

    Purpose: Respiratory-correlated cone-beam CT (4DCBCT) images acquired immediately prior to treatment have the potential to represent patient motion patterns and anatomy during treatment, including both intra- and inter-fractional changes. We develop a method to generate patient-specific motion models based on 4DCBCT images acquired with existing clinical equipment, and use them to generate time-varying volumetric images (3D fluoroscopic images) representing motion during treatment delivery. Methods: Motion models are derived by deformably registering each 4DCBCT phase to a reference phase and performing principal component analysis (PCA) on the resulting displacement vector fields. 3D fluoroscopic images are estimated by iteratively optimizing the resulting PCA coefficients through comparison of cone-beam projections simulating kV treatment imaging with digitally reconstructed radiographs generated from the motion model. Patient and physical phantom datasets are used to evaluate the method in terms of tumor localization error compared to manually defined ground-truth positions. Results: 4DCBCT-based motion models were derived and used to generate 3D fluoroscopic images at treatment time. For the patient datasets, the average tumor localization error and the 95th percentile were 1.57 and 3.13, respectively, in subsets of four patient datasets. For the physical phantom datasets, the average tumor localization error and the 95th percentile were 1.14 and 2.78, respectively, in two datasets. 4DCBCT motion models are shown to perform well in the context of generating 3D fluoroscopic images due to their ability to reproduce anatomical changes at treatment time. Conclusion: This study showed the feasibility of deriving 4DCBCT-based motion models and using them to generate 3D fluoroscopic images at treatment time in real clinical settings.
4DCBCT-based motion models were found to account for the 3D non-rigid motion of the patient anatomy during treatment and have the potential to localize the tumor and other patient anatomical structures at treatment time even when inter-fractional changes occur. This project was supported, in part, through a Master Research Agreement with Varian Medical Systems, Inc., Palo Alto, CA. The project was also supported, in part, by Award Number R21CA156068 from the National Cancer Institute.
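The core of the Methods step above, PCA over displacement vector fields (DVFs), can be sketched on synthetic data; the DVFs below are a toy stand-in (one sinusoidal breathing mode plus noise), not clinical data:

```python
import numpy as np

# Toy stand-in for 4DCBCT-derived DVFs: each row is one respiratory
# phase's displacement field flattened to a vector.
rng = np.random.default_rng(1)
n_phases, n_voxels = 10, 300
breathing = np.sin(np.linspace(0, 2 * np.pi, n_phases, endpoint=False))
mode = rng.normal(size=n_voxels)            # one dominant motion pattern
dvfs = np.outer(breathing, mode) + rng.normal(0, 0.01, (n_phases, n_voxels))

# PCA on the DVFs: mean field + leading eigenvectors of phase-to-phase variation
mean_dvf = dvfs.mean(axis=0)
U, s, Vt = np.linalg.svd(dvfs - mean_dvf, full_matrices=False)
components = Vt[:2]                         # keep the top 2 motion modes

# A new deformation is then parameterised by just 2 PCA coefficients;
# these coefficients are what the fluoroscopic-image optimisation adjusts
# to match the measured kV projections.
coeffs = np.array([1.5, 0.0])
new_dvf = mean_dvf + coeffs @ components
```

The dimensionality reduction is the point: instead of optimizing a full voxel-wise DVF against each projection, only a handful of PCA coefficients are optimized.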

  17. IMG/M: integrated genome and metagenome comparative data analysis system

    DOE PAGES

    Chen, I-Min A.; Markowitz, Victor M.; Chu, Ken; ...

    2016-10-13

    The Integrated Microbial Genomes with Microbiome Samples (IMG/M: https://img.jgi.doe.gov/m/) system contains annotated DNA and RNA sequence data of (i) archaeal, bacterial, eukaryotic and viral genomes from cultured organisms, (ii) single cell genomes (SCG) and genomes from metagenomes (GFM) from uncultured archaea, bacteria and viruses and (iii) metagenomes from environmental, host associated and engineered microbiome samples. Sequence data are generated by DOE's Joint Genome Institute (JGI), submitted by individual scientists, or collected from public sequence data archives. Structural and functional annotation is carried out by JGI's genome and metagenome annotation pipelines. A variety of analytical and visualization tools provide support for examining and comparing IMG/M's datasets. IMG/M allows open access interactive analysis of publicly available datasets, while manual curation, submission and access to private datasets and computationally intensive workspace-based analysis require login/password access to its expert review (ER) companion system (IMG/M ER: https://img.jgi.doe.gov/mer/). Since the last report published in the 2014 NAR Database Issue, IMG/M's dataset content has tripled in terms of number of datasets and overall protein coding genes, while its analysis tools have been extended to cope with the rapid growth in the number and size of datasets handled by the system.

  18. IMG/M: integrated genome and metagenome comparative data analysis system

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chen, I-Min A.; Markowitz, Victor M.; Chu, Ken

    The Integrated Microbial Genomes with Microbiome Samples (IMG/M: https://img.jgi.doe.gov/m/) system contains annotated DNA and RNA sequence data of (i) archaeal, bacterial, eukaryotic and viral genomes from cultured organisms, (ii) single cell genomes (SCG) and genomes from metagenomes (GFM) from uncultured archaea, bacteria and viruses and (iii) metagenomes from environmental, host associated and engineered microbiome samples. Sequence data are generated by DOE's Joint Genome Institute (JGI), submitted by individual scientists, or collected from public sequence data archives. Structural and functional annotation is carried out by JGI's genome and metagenome annotation pipelines. A variety of analytical and visualization tools provide support for examining and comparing IMG/M's datasets. IMG/M allows open access interactive analysis of publicly available datasets, while manual curation, submission and access to private datasets and computationally intensive workspace-based analysis require login/password access to its expert review (ER) companion system (IMG/M ER: https://img.jgi.doe.gov/mer/). Since the last report published in the 2014 NAR Database Issue, IMG/M's dataset content has tripled in terms of number of datasets and overall protein coding genes, while its analysis tools have been extended to cope with the rapid growth in the number and size of datasets handled by the system.

  19. IMG/M: integrated genome and metagenome comparative data analysis system

    PubMed Central

    Chen, I-Min A.; Markowitz, Victor M.; Chu, Ken; Palaniappan, Krishna; Szeto, Ernest; Pillay, Manoj; Ratner, Anna; Huang, Jinghua; Andersen, Evan; Huntemann, Marcel; Varghese, Neha; Hadjithomas, Michalis; Tennessen, Kristin; Nielsen, Torben; Ivanova, Natalia N.; Kyrpides, Nikos C.

    2017-01-01

    The Integrated Microbial Genomes with Microbiome Samples (IMG/M: https://img.jgi.doe.gov/m/) system contains annotated DNA and RNA sequence data of (i) archaeal, bacterial, eukaryotic and viral genomes from cultured organisms, (ii) single cell genomes (SCG) and genomes from metagenomes (GFM) from uncultured archaea, bacteria and viruses and (iii) metagenomes from environmental, host associated and engineered microbiome samples. Sequence data are generated by DOE's Joint Genome Institute (JGI), submitted by individual scientists, or collected from public sequence data archives. Structural and functional annotation is carried out by JGI's genome and metagenome annotation pipelines. A variety of analytical and visualization tools provide support for examining and comparing IMG/M's datasets. IMG/M allows open access interactive analysis of publicly available datasets, while manual curation, submission and access to private datasets and computationally intensive workspace-based analysis require login/password access to its expert review (ER) companion system (IMG/M ER: https://img.jgi.doe.gov/mer/). Since the last report published in the 2014 NAR Database Issue, IMG/M's dataset content has tripled in terms of number of datasets and overall protein coding genes, while its analysis tools have been extended to cope with the rapid growth in the number and size of datasets handled by the system. PMID:27738135

  20. Drug-target interaction prediction: A Bayesian ranking approach.

    PubMed

    Peska, Ladislav; Buza, Krisztian; Koller, Júlia

    2017-12-01

    In silico prediction of drug-target interactions (DTI) can provide valuable information and speed up the process of drug repositioning: finding novel uses for existing drugs. In our work, we focus on machine learning algorithms supporting the drug-centric repositioning approach, which aims to find novel uses for existing or abandoned drugs. We aim to propose a per-drug, ranking-based method, which reflects the needs of drug-centric repositioning research better than conventional drug-target prediction approaches. We propose Bayesian Ranking Prediction of Drug-Target Interactions (BRDTI). The method is based on Bayesian Personalized Ranking matrix factorization (BPR), which has been shown to be an excellent approach for various preference learning tasks but had not previously been used for DTI prediction. In order to deal successfully with the challenges of DTI, we extended BPR with: (i) the incorporation of target bias, (ii) a technique to handle new drugs and (iii) content alignment, to take structural similarities of drugs and targets into account. Evaluation on five benchmark datasets shows that BRDTI outperforms several state-of-the-art approaches in terms of per-drug nDCG and AUC. BRDTI's nDCG results are 0.929, 0.953, 0.948, 0.897 and 0.690 for the G-Protein Coupled Receptor (GPCR), Ion Channel (IC), Nuclear Receptor (NR), Enzyme (E) and Kinase (K) datasets, respectively. Additionally, BRDTI significantly outperformed the other methods (BLM-NII, WNN-GIP, NetLapRLS and CMF) w.r.t. nDCG in 17 out of 20 cases. Furthermore, BRDTI was also shown to be able to predict novel drug-target interactions not contained in the original datasets. The average recall at the top 10 predicted targets for each drug was 0.762, 0.560, 1.000 and 0.404 for the GPCR, IC, NR, and E datasets, respectively.
Based on this evaluation, we conclude that BRDTI is an appropriate choice for researchers looking for an in silico DTI prediction technique to use in drug-centric repositioning scenarios. The BRDTI software and supplementary materials are available online at www.ksi.mff.cuni.cz/~peska/BRDTI. Copyright © 2017 Elsevier B.V. All rights reserved.
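The BPR matrix-factorization core underlying BRDTI can be sketched as plain BPR with stochastic gradient updates on sampled (drug, interacting target, non-interacting target) triples; this omits the paper's extensions (target bias, new-drug handling, content alignment), and the toy interaction set is invented:

```python
import numpy as np

rng = np.random.default_rng(0)
n_drugs, n_targets, k = 8, 12, 4
interactions = {(0, 1), (0, 3), (1, 2), (2, 5), (3, 7), (4, 0), (5, 9)}
pos = list(interactions)

P = rng.normal(0, 0.1, (n_drugs, k))    # drug latent factors
Q = rng.normal(0, 0.1, (n_targets, k))  # target latent factors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

lr, reg = 0.05, 0.01
for _ in range(20000):
    d, i = pos[rng.integers(len(pos))]   # sample an observed interaction
    j = int(rng.integers(n_targets))     # sample a candidate negative target
    if (d, j) in interactions:
        continue                         # skip if j actually interacts with d
    x_dij = P[d] @ (Q[i] - Q[j])         # pairwise ranking score
    g = sigmoid(-x_dij)                  # gradient weight of the BPR loss
    pd = P[d].copy()                     # update with pre-step values
    P[d] += lr * (g * (Q[i] - Q[j]) - reg * P[d])
    Q[i] += lr * (g * pd - reg * Q[i])
    Q[j] += lr * (-g * pd - reg * Q[j])

scores = P @ Q.T  # higher score = target ranked higher for that drug
```

BPR optimizes the ranking directly (interacting targets above non-interacting ones per drug), which is why per-drug nDCG is the natural evaluation metric.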

  1. Physics Model-Based Scatter Correction in Multi-Source Interior Computed Tomography.

    PubMed

    Gong, Hao; Li, Bin; Jia, Xun; Cao, Guohua

    2018-02-01

    Multi-source interior computed tomography (CT) has great potential to provide ultra-fast and organ-oriented imaging at low radiation dose. However, X-ray cross-scattering from multiple simultaneously activated X-ray imaging chains compromises imaging quality. Previously, we published two hardware-based scatter correction methods for multi-source interior CT. Here, we propose a software-based scatter correction method, with the benefit of requiring no hardware modifications. The new method is based on a physics model and an iterative framework. The physics model was derived analytically and is used to calculate X-ray scattering signals in both the forward direction and the cross directions in multi-source interior CT. The physics model was integrated into an iterative scatter correction framework to reduce scatter artifacts. The method was applied to phantom data from both Monte Carlo simulations and physical experiments designed to emulate image acquisition in a multi-source interior CT architecture recently proposed by our team. The proposed scatter correction method reduced scatter artifacts significantly, even with only one iteration. Within a few iterations, the reconstructed images converged quickly toward the "scatter-free" reference images. After applying the scatter correction method, the maximum CT number error at the regions-of-interest (ROIs) was reduced to 46 HU in the numerical phantom dataset and 48 HU in the physical phantom dataset, and the contrast-to-noise ratio at those ROIs increased by up to 44.3% and 19.7%, respectively. The proposed physics model-based iterative scatter correction method could be useful for scatter correction in dual-source or multi-source CT.
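The iterative framework can be caricatured as a fixed-point loop: estimate scatter from the current image with a physics model, subtract it from the measured projections, repeat. The "physics model" below is a deliberately crude invented stand-in (a constant fraction of the mean signal), only to show why a few iterations suffice when the update is a contraction:

```python
import numpy as np

def scatter_model(primary):
    """Hypothetical stand-in for the analytic scatter model: scatter is
    taken as a fixed fraction of the mean primary signal."""
    return 0.2 * primary.mean()

measured = np.array([5.0, 6.0, 7.0])  # projections = primary + scatter
primary = measured.copy()             # initial guess: no scatter

for _ in range(10):
    scatter = scatter_model(primary)  # estimate scatter from current image
    primary = measured - scatter      # corrected projections for next round
```

Here the fixed point is primary = [4, 5, 6] (scatter 1.0), and each iteration shrinks the error by the scatter fraction 0.2, mirroring the fast convergence reported above.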

  2. The Vaigat Rock Avalanche Laboratory, west-central Greenland

    NASA Astrophysics Data System (ADS)

    Dunning, S.; Rosser, N. J.; Szczucinski, W.; Norman, E. C.; Benjamin, J.; Strzelecki, M.; Long, A. J.; Drewniak, M.

    2013-12-01

    Rock avalanches have unusually high mobility and pose both an immediate hazard and far-field impacts associated with dam breach, glacier collapse and, where they run out into water, tsunamis. Such secondary hazards can often pose higher risks than the original landslide. Predicting the future threats posed by potential rock avalanches relies heavily on an understanding of the physics derived from interpreting the deposits left by previous events, yet drawing comparisons between multiple events is normally challenging because interactions with complex mountainous terrain make each event's deposits unique. Consequently, numerical models and the interpretation of the underlying physics governing landslide mobility are commonly case-specific and poorly suited to extrapolation beyond the single events the model is tuned to. Here we present a high-resolution LiDAR and hyperspectral dataset captured across a unique cluster of large rock avalanche source areas and deposits in the Vaigat strait, west-central Greenland. Vaigat offers the unprecedented opportunity to model a sample of >15 rock avalanches of various ages sourced from an 80 km coastal escarpment. At Vaigat many of the key variables (topography, geology, post-glacial history) are held constant across all landslides, providing the chance to investigate variations in dynamics and emplacement style related to variable landslide volume, drop height, and thinning/spreading over relatively simple, unrestricted run-out zones both onto land and into water. Our data suggest that this region preserves landslide deposits exceptionally well and hence is well suited to calibrating numerical models of run-out dynamics. We use these data to aid the interpretation of deposit morphology, structure, lithology and run-out characteristics in more complex settings.
Uniquely, we are also able to calibrate our models using a far-field dataset of well-preserved tsunami run-up deposits resulting from the Paatuut landslide of 21 November 2000. The study was funded by Polish National Science Centre grant No. 2011/01/B/ST10/01553, and project UK NERC ARSF IG13-15.

  3. Statistical mechanics of complex neural systems and high dimensional data

    NASA Astrophysics Data System (ADS)

    Advani, Madhu; Lahiri, Subhaneil; Ganguli, Surya

    2013-03-01

    Recent experimental advances in neuroscience have opened new vistas into the immense complexity of neuronal networks. This proliferation of data challenges us on two parallel fronts. First, how can we form adequate theoretical frameworks for understanding how dynamical network processes cooperate across widely disparate spatiotemporal scales to solve important computational problems? Second, how can we extract meaningful models of neuronal systems from high dimensional datasets? To aid in these challenges, we give a pedagogical review of a collection of ideas and theoretical methods arising at the intersection of statistical physics, computer science and neurobiology. We introduce the interrelated replica and cavity methods, which originated in statistical physics as powerful ways to quantitatively analyze large highly heterogeneous systems of many interacting degrees of freedom. We also introduce the closely related notion of message passing in graphical models, which originated in computer science as a distributed algorithm capable of solving large inference and optimization problems involving many coupled variables. We then show how both the statistical physics and computer science perspectives can be applied in a wide diversity of contexts to problems arising in theoretical neuroscience and data analysis. Along the way we discuss spin glasses, learning theory, illusions of structure in noise, random matrices, dimensionality reduction and compressed sensing, all within the unified formalism of the replica method. Moreover, we review recent conceptual connections between message passing in graphical models, and neural computation and learning. Overall, these ideas illustrate how statistical physics and computer science might provide a lens through which we can uncover emergent computational functions buried deep within the dynamical complexities of neuronal networks.

  4. Developing a new global network of river reaches from merged satellite-derived datasets

    NASA Astrophysics Data System (ADS)

    Lion, C.; Allen, G. H.; Beighley, E.; Pavelsky, T.

    2015-12-01

    In 2020, the Surface Water and Ocean Topography satellite (SWOT), a joint mission of NASA/CNES/CSA/UK, will be launched. One of its major products will be measurements of continental water extent, including the width, height, and slope of rivers and the surface area and elevations of lakes. The mission will improve the monitoring of continental water and also our understanding of the interactions between different hydrologic reservoirs. For rivers, SWOT measurements of slope must be carried out over predefined river reaches. As such, an a priori dataset for rivers is needed in order to facilitate analysis of the raw SWOT data. The information required to produce this dataset includes measurements of river width, elevation, slope, planform, river network topology, and flow accumulation. To produce this product, we have linked two existing global datasets: the Global River Widths from Landsat (GRWL) database, which contains river centerline locations, widths, and a braiding index derived from Landsat imagery, and a modified version of the HydroSHEDS hydrologically corrected digital elevation product, which contains heights and flow accumulation measurements for streams at 3 arc-second spatial resolution. Merging these two datasets requires considerable care. The difficulties lie, among others, in the difference in resolution (30 m versus 3 arc-seconds) and the age of the datasets (2000 versus ~2010; some rivers have moved, and the braided sections differ). As such, we have developed custom software to merge the two datasets, taking into account the spatial proximity of river channels in the two datasets and ensuring that flow accumulation in the final dataset always increases downstream. Here, we present our preliminary results for a portion of South America and demonstrate the strengths and weaknesses of the method.
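One of the merge constraints mentioned above, that flow accumulation must always increase downstream, can be enforced with a simple running-maximum pass along each reach; this sketch is our own illustration of the constraint, not the authors' software:

```python
def enforce_monotone_accumulation(flow_acc):
    """flow_acc: accumulation values ordered upstream -> downstream.
    Returns a corrected series that never decreases downstream, by
    carrying the running maximum forward over inconsistent cells."""
    fixed = []
    running_max = float("-inf")
    for value in flow_acc:
        running_max = max(running_max, value)
        fixed.append(running_max)
    return fixed

# toy reach with two inconsistencies introduced by snapping between grids
reach = [10, 12, 11, 15, 14, 20]
print(enforce_monotone_accumulation(reach))  # [10, 12, 12, 15, 15, 20]
```

Such inconsistencies arise naturally when 30 m centerlines are snapped onto a coarser 3 arc-second accumulation grid, which is why the merge software must check for them explicitly.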

  5. Prediction of Drug-Target Interaction Networks from the Integration of Protein Sequences and Drug Chemical Structures.

    PubMed

    Meng, Fan-Rong; You, Zhu-Hong; Chen, Xing; Zhou, Yong; An, Ji-Yong

    2017-07-05

    Knowledge of drug-target interactions (DTI) plays an important role in discovering new drug candidates. Unfortunately, the experimental approach to determining DTI has unavoidable shortcomings, including its time-consuming and expensive nature. This motivates us to develop an effective computational method to predict DTI based on protein sequence. In this paper, we propose a novel computational approach based on protein sequence, namely PDTPS (Predicting Drug Targets with Protein Sequence). The PDTPS method combines bi-gram probabilities (BIGP), Position Specific Scoring Matrix (PSSM), and Principal Component Analysis (PCA) with a Relevance Vector Machine (RVM). To evaluate the prediction capacity of PDTPS, experiments were carried out on enzyme, ion channel, GPCR, and nuclear receptor datasets using five-fold cross-validation. The proposed PDTPS method achieved average accuracies of 97.73%, 93.12%, 86.78%, and 87.78% on the enzyme, ion channel, GPCR and nuclear receptor datasets, respectively. The experimental results show that our method has good prediction performance. Furthermore, to further evaluate its prediction performance, we compared the proposed PDTPS method with the state-of-the-art support vector machine (SVM) classifier on the enzyme and ion channel datasets, and with other existing methods on all four datasets. The promising comparison results further demonstrate the efficiency and robustness of the proposed PDTPS method. This makes it a useful tool suitable for predicting DTI, as well as other bioinformatics tasks.

  6. Parallel Planes Information Visualization

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bush, Brian

    2015-12-26

    This software presents a user-provided multivariate dataset as an interactive three dimensional visualization so that the user can explore the correlation between variables in the observations and the distribution of observations among the variables.

  7. The Network Structure Underlying the Earth Observation Assessment

    NASA Astrophysics Data System (ADS)

    Vitkin, S.; Doane, W. E. J.; Mary, J. C.

    2017-12-01

    The Earth Observations Assessment (EOA 2016) is a multiyear project designed to assess the effectiveness of civil earth observation data sources (instruments, sensors, models, etc.) on societal benefit areas (SBAs) for the United States. Subject matter experts (SMEs) provided input and scored how data sources inform products, product groups, key objectives, SBA sub-areas, and SBAs in an attempt to quantify the relationships between data sources and SBAs. The resulting data were processed by Integrated Applications Incorporated (IAI) using MITRE's PALMA software to create normalized relative impact scores for each of these relationships. However, PALMA processing obscures the natural network representation of the data. Any network analysis that might identify patterns of interaction among data sources, products, and SBAs is therefore impossible. Collaborating with IAI, we cleaned and recreated a network from the original dataset. Using R and Python we explore the underlying structure of the network and apply frequent itemset mining algorithms to identify groups of data sources and products that interact. We reveal interesting patterns and relationships in the EOA dataset that were not immediately observable from the EOA 2016 report and provide a basis for further exploration of the EOA network dataset.
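The frequent-itemset step described above can be illustrated with a naive counting miner over toy SME scoring "transactions", where each transaction lists the data sources an SME linked to one product (source names invented; a real analysis would use an Apriori/FP-growth implementation):

```python
from itertools import combinations
from collections import Counter

# Each "transaction" = the set of data sources linked to one product.
transactions = [
    {"MODIS", "Landsat", "GOES"},
    {"MODIS", "Landsat"},
    {"MODIS", "GOES"},
    {"Landsat", "GOES"},
    {"MODIS", "Landsat", "GOES"},
]

def frequent_itemsets(transactions, size, min_support):
    """Count all item combinations of a given size and keep those that
    appear in at least min_support transactions."""
    counts = Counter()
    for t in transactions:
        for combo in combinations(sorted(t), size):
            counts[combo] += 1
    return {c: n for c, n in counts.items() if n >= min_support}

pairs = frequent_itemsets(transactions, size=2, min_support=3)
```

Frequent pairs of data sources are exactly the "groups of data sources and products that interact" which the network representation makes recoverable from the raw EOA scoring data.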

  8. Global analysis of b → sℓℓ anomalies

    NASA Astrophysics Data System (ADS)

    Descotes-Genon, Sébastien; Hofer, Lars; Matias, Joaquim; Virto, Javier

    2016-06-01

    We present a detailed discussion of the current theoretical and experimental situation of the anomaly in the angular distribution of B → K*(→ Kπ)μ+μ-, observed at LHCb in the 1 fb-1 dataset and recently confirmed by the 3 fb-1 dataset. The impact of this data and other recent measurements on b → sℓ+ℓ- transitions (ℓ = e, μ) is considered. We review the observables of interest, focusing on their theoretical uncertainties and their sensitivity to New Physics, based on an analysis employing the QCD factorisation approach including several sources of hadronic uncertainties (form factors, power corrections, charm-loop effects). We perform fits to New Physics contributions including experimental and theoretical correlations. The solution that we proposed in 2013 to solve the B → K*μ+μ- anomaly, with a contribution C_9^NP ≃ -1, is confirmed and reinforced. A wider range of New-Physics scenarios with high significances (between 4 and 5σ) emerges from the fit, some of them being particularly relevant for model building. More data are needed to discriminate among them conclusively. The inclusion of b → se+e- observables increases the significance of the favoured scenarios under the hypothesis of New Physics breaking lepton flavour universality. Several tests illustrate the robustness of our conclusions.

  9. Procedures to develop a computerized adaptive test to assess patient-reported physical functioning.

    PubMed

    McCabe, Erin; Gross, Douglas P; Bulut, Okan

    2018-06-07

    The purpose of this paper is to demonstrate the procedures to develop and implement a computerized adaptive patient-reported outcome (PRO) measure using secondary analysis of a dataset and items from fixed-format legacy measures. We conducted secondary analysis of a dataset of responses from 1429 persons with work-related lower extremity impairment. We calibrated three measures of physical functioning on the same metric, based on item response theory (IRT). We evaluated the efficiency and measurement precision of various computerized adaptive test (CAT) designs using computer simulations. IRT and confirmatory factor analyses support combining the items from the three scales into a CAT item bank of 31 items. The item parameters for IRT were calculated using the generalized partial credit model. CAT simulations show that reducing the test length from the full 31 items to a maximum of 8 or 20 items is possible without a significant loss of information (95% and 99% correlations with legacy measure scores, respectively). We demonstrated the feasibility and efficiency of using CAT for PRO measurement of physical functioning. The procedures we outlined are straightforward and can be applied to other PRO measures. Additionally, we have included all the information necessary to implement the CAT of physical functioning in the electronic supplementary material of this paper.
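    The adaptive item-selection loop at the heart of a CAT can be sketched as follows. This uses the simpler two-parameter logistic (2PL) model rather than the generalized partial credit model the study used, and the item parameters below are invented:

```python
import math

def p_correct(theta, a, b):
    """2PL item response function: P(correct | ability theta)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

def next_item(theta, bank, administered):
    """Pick the unadministered item with maximum information at theta."""
    candidates = [i for i in range(len(bank)) if i not in administered]
    return max(candidates, key=lambda i: item_information(theta, *bank[i]))

# toy item bank: (discrimination a, difficulty b) per item
bank = [(1.2, -1.0), (0.8, 0.0), (1.5, 0.5), (1.0, 1.5)]
theta = 0.0  # current ability estimate
first = next_item(theta, bank, administered=set())
print(first)
```

    A full CAT alternates this selection step with re-estimation of theta after each response, stopping at a maximum test length or precision target.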

  10. Commuting and health in Cambridge: a study of a 'natural experiment' in the provision of new transport infrastructure

    PubMed Central

    2010-01-01

    Background Modifying transport infrastructure to support active travel (walking and cycling) could help to increase population levels of physical activity. However, there is limited evidence for the effects of interventions in this field, and to the best of our knowledge no study has convincingly demonstrated an increase in physical activity directly attributable to this type of intervention. We have therefore taken the opportunity presented by a 'natural experiment' in Cambridgeshire, UK to establish a quasi-experimental study of the effects of a major transport infrastructural intervention on travel behaviour, physical activity and related wider health impacts. Design and methods The Commuting and Health in Cambridge study comprises three main elements: a cohort study of adults who travel to work in Cambridge, using repeated postal questionnaires and basic objective measurement of physical activity using accelerometers; in-depth quantitative studies of physical activity energy expenditure, travel and movement patterns and estimated carbon emissions using household travel diaries, combined heart rate and movement sensors and global positioning system (GPS) receivers; and a longitudinal qualitative interview study to elucidate participants' attitudes, experiences and practices and to understand how environmental and social factors interact to influence travel behaviour, for whom and in what circumstances. The impacts of a specific intervention - the opening of the Cambridgeshire Guided Busway - and of other changes in the physical environment will be examined using a controlled quasi-experimental design within the overall cohort dataset. Discussion Addressing the unresolved research and policy questions in this area is not straightforward. 
The challenges include those of effectively combining different disciplinary perspectives on the research problems, developing common methodological ground in measurement and evaluation, implementing robust quantitative measurement of travel and physical activity behaviour in an unpredictable 'natural experiment' setting, defining exposure to the intervention, defining controls, and conceptualising an appropriate longitudinal analytical strategy. PMID:21080928

  11. Social Environment Shapes the Speed of Cooperation

    PubMed Central

    Nishi, Akihiro; Christakis, Nicholas A.; Evans, Anthony M.; O’Malley, A. James; Rand, David G.

    2016-01-01

    Are cooperative decisions typically made more quickly or slowly than non-cooperative decisions? While this question has attracted considerable attention in recent years, most research has focused on one-shot interactions. Yet it is repeated interactions that characterize most important real-world social interactions. In repeated interactions, the cooperativeness of one’s interaction partners (the “social environment”) should affect the speed of cooperation. Specifically, we propose that reciprocal decisions (choices that mirror behavior observed in the social environment), rather than cooperative decisions per se, occur more quickly. We test this hypothesis by examining four independent decision time datasets with a total of 2,088 subjects making 55,968 decisions. We show that reciprocal decisions are consistently faster than non-reciprocal decisions: cooperation is faster than defection in cooperative environments, while defection is faster than cooperation in non-cooperative environments. These differences are further enhanced by subjects’ previous behavior – reciprocal decisions are faster when they are consistent with the subject’s previous choices. Finally, mediation analyses of a fifth dataset suggest that the speed of reciprocal decisions is explained, in part, by feelings of conflict – reciprocal decisions are less conflicted than non-reciprocal decisions, and less decision conflict appears to lead to shorter decision times. PMID:27435940

  12. Characterizing interactions in online social networks during exceptional events

    NASA Astrophysics Data System (ADS)

    Omodei, Elisa; De Domenico, Manlio; Arenas, Alex

    2015-08-01

    Nowadays, millions of people interact on a daily basis on online social media like Facebook and Twitter, where they share and discuss information about a wide variety of topics. In this paper, we focus on a specific online social network, Twitter, and we analyze multiple datasets, each one consisting of individuals' online activity before, during and after an exceptional event in terms of the volume of communications registered. We consider important events that occurred in different arenas that range from policy to culture or science. For each dataset, the users' online activities are modeled by a multilayer network in which each layer conveys a different kind of interaction, specifically: retweeting, mentioning and replying. This representation allows us to unveil that these distinct types of interaction produce networks with different statistical properties, in particular concerning the degree distribution and the clustering structure. These results suggest that models of online activity cannot discard the information carried by this multilayer representation of the system, and should account for the different processes generated by the different kinds of interactions. Furthermore, our analysis unveils the presence of statistical regularities among the different events, suggesting that the non-trivial topological patterns that we observe may represent universal features of the social dynamics on online social networks during exceptional events.
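    The per-layer contrast described above can be sketched by computing a degree distribution separately for each interaction layer (the edge lists below are invented placeholders, not Twitter data):

```python
from collections import Counter

def degree_distribution(edges):
    """Map degree -> number of nodes with that degree (undirected edges)."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return Counter(deg.values())

# toy multilayer network: one edge list per interaction type
layers = {
    "retweet": [("a", "b"), ("a", "c"), ("a", "d")],
    "mention": [("a", "b"), ("b", "c")],
    "reply":   [("c", "d")],
}
for name, edges in layers.items():
    print(name, dict(degree_distribution(edges)))
```

    Comparing these distributions (and clustering coefficients) layer by layer is what reveals that retweets, mentions, and replies generate structurally different networks.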

  13. Violent Interaction Detection in Video Based on Deep Learning

    NASA Astrophysics Data System (ADS)

    Zhou, Peipei; Ding, Qinghai; Luo, Haibo; Hou, Xinglin

    2017-06-01

    Violent interaction detection is of vital importance in some video surveillance scenarios like railway stations, prisons or psychiatric centres. Existing vision-based methods are mainly based on hand-crafted features such as statistical features between motion regions, leading to poor adaptability to other datasets. Inspired by the development of convolutional networks for common activity recognition, we construct a FightNet to represent complicated visual violent interactions. In this paper, a new input modality, the image acceleration field, is proposed to better extract motion attributes. Firstly, each video is framed as RGB images. Secondly, the optical flow field is computed using consecutive frames, and the acceleration field is obtained from the optical flow field. Thirdly, the FightNet is trained with three kinds of input modalities, i.e., RGB images for the spatial network, and optical flow images and acceleration images for the temporal networks. By fusing results from the different inputs, we conclude whether a video contains a violent event or not. To provide researchers a common ground for comparison, we have collected a violent interaction dataset (VID) containing 2314 videos, with 1077 fight videos and 1237 non-fight videos. By comparison with other algorithms, experimental results demonstrate that the proposed model for violent interaction detection shows higher accuracy and better robustness.
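    The acceleration field mentioned above can be sketched as the temporal difference of consecutive dense optical-flow fields (a minimal numpy version; the paper's exact formulation may differ):

```python
import numpy as np

def acceleration_field(flow_t0: np.ndarray, flow_t1: np.ndarray) -> np.ndarray:
    """Approximate the acceleration field as the frame-to-frame
    difference of two dense optical-flow fields, each an H x W x 2
    array of per-pixel displacement vectors."""
    return flow_t1 - flow_t0

# toy flows: uniform motion that speeds up between frames
h, w = 4, 4
flow_a = np.full((h, w, 2), 1.0)   # 1.0 px/frame everywhere
flow_b = np.full((h, w, 2), 1.5)   # 1.5 px/frame everywhere
acc = acceleration_field(flow_a, flow_b)
print(acc.shape, float(acc.mean()))
```

    Sudden punches and kicks produce large acceleration magnitudes, which is why this modality can sharpen the temporal network's motion cues.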

  14. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Li, Jun-Hao; Liu, Shun; Zheng, Ling-Ling

    Long non-coding RNAs (lncRNAs) are emerging as important regulatory molecules in developmental, physiological, and pathological processes. However, the precise mechanisms and functions of most lncRNAs remain largely unknown. Recent advances in high-throughput sequencing of immunoprecipitated RNAs after cross-linking (CLIP-Seq) provide powerful ways to identify biologically relevant protein–lncRNA interactions. In this study, by analyzing millions of RNA-binding protein (RBP) binding sites from 117 CLIP-Seq datasets generated by 50 independent studies, we identified 22,735 RBP–lncRNA regulatory relationships. We found that a single lncRNA will generally be bound and regulated by one or multiple RBPs, the combination of which may coordinately regulate gene expression. We also revealed the expression correlation of these interaction networks by mining expression profiles of over 6000 normal and tumor samples from 14 cancer types. Our combined analysis of CLIP-Seq data and genome-wide association study data discovered hundreds of disease-related single nucleotide polymorphisms residing in the RBP binding sites of lncRNAs. Finally, we developed interactive web implementations to provide visualization, analysis, and downloading of the aforementioned large-scale datasets. Our study represents an important step in the identification and analysis of RBP–lncRNA interactions and shows that these interactions may play crucial roles in cancer and genetic diseases.

  15. Computational and informatics strategies for identification of specific protein interaction partners in affinity purification mass spectrometry experiments

    PubMed Central

    Nesvizhskii, Alexey I.

    2013-01-01

    Analysis of protein interaction networks and protein complexes using affinity purification and mass spectrometry (AP/MS) is among the most commonly used and successful applications of proteomics technologies. One of the foremost challenges of AP/MS data is the large number of false-positive protein interactions present in unfiltered datasets. Here we review computational and informatics strategies for detecting specific protein interaction partners in AP/MS experiments, with a focus on incomplete (as opposed to genome-wide) interactome mapping studies. These strategies range from standard statistical approaches, to empirical scoring schemes optimized for a particular type of data, to advanced computational frameworks. The common denominator among these methods is the use of label-free quantitative information such as spectral counts or integrated peptide intensities that can be extracted from AP/MS data. We also discuss related issues such as combining multiple biological or technical replicates, and dealing with data generated using different tagging strategies. Computational approaches for benchmarking of scoring methods are discussed, and the need for generation of reference AP/MS datasets is highlighted. Finally, we discuss the possibility of more extended modeling of experimental AP/MS data, including integration with external information such as protein interaction predictions based on functional genomics data. PMID:22611043

  16. Exploring Plant Co-Expression and Gene-Gene Interactions with CORNET 3.0.

    PubMed

    Van Bel, Michiel; Coppens, Frederik

    2017-01-01

    Selecting and filtering a reference expression and interaction dataset when studying specific pathways and regulatory interactions can be a very time-consuming and error-prone task. In order to reduce the duplicated effort required to amass such datasets, we have created the CORNET (CORrelation NETworks) platform, which allows for easy access to a wide variety of data types: coexpression data, protein-protein interactions, regulatory interactions, and functional annotations. The CORNET platform outputs its results in either text format or through the Cytoscape framework, which is automatically launched by the CORNET website. CORNET 3.0 is the third iteration of the web platform designed for user exploration of the coexpression space of plant genomes, with a focus on the model species Arabidopsis thaliana. Here we describe the platform: the tools, data, and best practices when using the platform. We indicate how the platform can be used to infer networks from a set of input genes, such as upregulated genes from an expression experiment. By exploring the network, new target and regulator genes can be discovered, allowing for follow-up experiments and more in-depth study. We also indicate how to avoid common pitfalls when evaluating the networks and how to avoid overinterpretation of the results. All CORNET versions are available at http://bioinformatics.psb.ugent.be/cornet/.

  17. Optimal design method to minimize users' thinking mapping load in human-machine interactions.

    PubMed

    Huang, Yanqun; Li, Xu; Zhang, Jie

    2015-01-01

    The discrepancy between human cognition and machine requirements/behaviors usually results in serious mental mapping loads, or even disasters, during product operation. It is important to help people avoid confusion and difficulty in human-machine interaction in today's mentally demanding work environments. The objective is to improve the usability of a product and minimize the user's thinking mapping and interpreting load in human-machine interactions. An optimal human-machine interface design method is introduced, based on minimizing the mental load of the thinking mapping process between users' intentions and the affordances of product interface states. By analyzing the users' thinking mapping problem, an operating action model is constructed. First, according to human natural instincts and acquired knowledge, an expected ideal design with minimized thinking load is uniquely determined. Then, creative alternatives, in terms of the ways humans obtain operational information, are provided as digital interface state datasets. Finally, using cluster analysis, an optimum solution is picked out from the alternatives by calculating the distances between the two datasets. Considering multiple factors to minimize users' thinking mapping loads, a solution nearest to the ideal value is found in a human-car interaction design case. The clustering results show the method's effectiveness in finding an optimum solution to mental-load minimization problems in human-machine interaction design.
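    The selection step described above (picking the alternative nearest the ideal design) can be sketched as a Euclidean-distance comparison; the feature vectors below are invented placeholders, not the paper's actual scores:

```python
import numpy as np

def nearest_to_ideal(ideal: np.ndarray, alternatives: np.ndarray) -> int:
    """Return the index of the candidate design whose feature vector
    is closest (Euclidean distance) to the ideal design."""
    dists = np.linalg.norm(alternatives - ideal, axis=1)
    return int(np.argmin(dists))

# toy feature vectors scoring each candidate interface state dataset
ideal = np.array([1.0, 0.0, 1.0])
candidates = np.array([
    [0.9, 0.2, 0.8],   # closest to the ideal
    [0.1, 0.9, 0.2],
    [0.5, 0.5, 0.5],
])
print(nearest_to_ideal(ideal, candidates))
```

    Other distance measures (or cluster-to-cluster distances) could be substituted without changing the overall selection logic.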

  18. Being an honest broker of hydrology: Uncovering, communicating and addressing model error in a climate change streamflow dataset

    NASA Astrophysics Data System (ADS)

    Chegwidden, O.; Nijssen, B.; Pytlak, E.

    2017-12-01

    Any model simulation has errors, including errors in meteorological data, process understanding, model structure, and model parameters. These errors may express themselves as bias, timing lags, and differences in sensitivity between the model and the physical world. The evaluation and handling of these errors can greatly affect the legitimacy, validity and usefulness of the resulting scientific product. In this presentation we will discuss a case study of handling and communicating model errors during the development of a hydrologic climate change dataset for the Pacific Northwestern United States. The dataset was the result of a four-year collaboration between the University of Washington, Oregon State University, the Bonneville Power Administration, the United States Army Corps of Engineers and the Bureau of Reclamation. Along the way, the partnership facilitated the discovery of multiple systematic errors in the streamflow dataset. Through an iterative review process, some of those errors could be resolved. For the errors that remained, honest communication of the shortcomings promoted the dataset's legitimacy. Thoroughly explaining errors also improved ways in which the dataset would be used in follow-on impact studies. Finally, we will discuss the development of the "streamflow bias-correction" step often applied to climate change datasets that will be used in impact modeling contexts. We will describe the development of a series of bias-correction techniques through close collaboration among universities and stakeholders. Through that process, both universities and stakeholders learned about the others' expectations and workflows. This mutual learning process allowed for the development of methods that accommodated the stakeholders' specific engineering requirements. The iterative revision process also produced a functional and actionable dataset while preserving its scientific merit. 
We will describe how encountering earlier techniques' pitfalls allowed us to develop improved methods for scientists and practitioners alike.

  19. Interactive client side data visualization with d3.js

    NASA Astrophysics Data System (ADS)

    Rodzianko, A.; Versteeg, R.; Johnson, D. V.; Soltanian, M. R.; Versteeg, O. J.; Girouard, M.

    2015-12-01

    Geoscience data associated with near surface research and operational sites is increasingly voluminous and heterogeneous (both in terms of providers and data types - e.g. geochemical, hydrological, geophysical, modeling data, of varying spatiotemporal characteristics). Such data allows scientists to investigate fundamental hydrological and geochemical processes relevant to agriculture, water resources and climate change. For scientists to easily share, model and interpret such data requires novel tools with capabilities for interactive data visualization. Under sponsorship of the US Department of Energy, Subsurface Insights is developing the Predictive Assimilative Framework (PAF): a cloud based subsurface monitoring platform which can manage, process and visualize large heterogeneous datasets. Over the last year we transitioned our visualization method from a server side approach (in which images and animations were generated using Jfreechart and Visit) to a client side one that utilizes the D3 Javascript library. Datasets are retrieved using web service calls to the server, returned as JSON objects and visualized within the browser. Users can interactively explore primary and secondary datasets from various field locations. Our current capabilities include interactive data contouring and heterogeneous time series data visualization. While this approach is very powerful and not necessarily unique, special attention needs to be paid to latency and responsiveness issues as well as to cross-browser code compatibility, so that users have an identical, fluid and frustration-free experience across different computational platforms. We gratefully acknowledge support from the US Department of Energy under SBIR Award DOE DE-SC0009732, the use of data from the Lawrence Berkeley National Laboratory (LBNL) Sustainable Systems SFA Rifle field site and collaboration with LBNL SFA scientists.

  20. Predicting radiotherapy outcomes using statistical learning techniques

    NASA Astrophysics Data System (ADS)

    El Naqa, Issam; Bradley, Jeffrey D.; Lindsay, Patricia E.; Hope, Andrew J.; Deasy, Joseph O.

    2009-09-01

    Radiotherapy outcomes are determined by complex interactions between treatment, anatomical and patient-related variables. A common obstacle to building maximally predictive outcome models for clinical practice is the failure to capture the potential complexity of heterogeneous variable interactions and applicability beyond institutional data. We describe a statistical learning methodology that can automatically screen for nonlinear relations among prognostic variables and generalize to unseen data. In this work, several types of linear and nonlinear kernels to generate interaction terms and approximate the treatment-response function are evaluated. Examples of institutional datasets of esophagitis, pneumonitis and xerostomia endpoints were used. Furthermore, an independent RTOG dataset was used for 'generalizability' validation. We formulated the discrimination between risk groups as a supervised learning problem. The distribution of patient groups was initially analyzed using principal component analysis (PCA) to uncover potential nonlinear behavior. The performance of the different methods was evaluated using bivariate correlations and actuarial analysis. Over-fitting was controlled via cross-validation resampling. Our results suggest that a modified support vector machine (SVM) kernel method provided superior performance on leave-one-out testing compared to logistic regression and neural networks in cases where the data exhibited nonlinear behavior on PCA. For instance, in prediction of esophagitis and pneumonitis endpoints, which exhibited nonlinear behavior on PCA, the method provided 21% and 60% improvements, respectively. Furthermore, evaluation on the independent pneumonitis RTOG dataset demonstrated good generalizability beyond institutional data in contrast with other models. 
This indicates that the prediction of treatment response can be improved by utilizing nonlinear kernel methods for discovering important nonlinear interactions among model variables. These models have the capacity to predict on unseen data. Part of this work was first presented at the Seventh International Conference on Machine Learning and Applications, San Diego, CA, USA, 11-13 December 2008.
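    A hedged sketch of kernel-based risk-group discrimination with leave-one-out testing, on synthetic data with an interaction-driven (nonlinear) boundary. A simple RBF-kernel-weighted vote stands in for the paper's modified SVM kernel method:

```python
import numpy as np

def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    return np.exp(-gamma * np.sum((x - y) ** 2))

def loo_kernel_accuracy(X, y, gamma=1.0):
    """Leave-one-out accuracy of a kernel-weighted vote: each held-out
    case is classified by the RBF-weighted mean label of the rest."""
    correct = 0
    for i in range(len(X)):
        w = np.array([rbf_kernel(X[i], X[j], gamma)
                      for j in range(len(X)) if j != i])
        labels = np.array([y[j] for j in range(len(X)) if j != i])
        pred = int(w @ labels / w.sum() > 0.5)
        correct += pred == y[i]
    return correct / len(X)

# synthetic stand-in for prognostic variables: the outcome depends on
# an interaction term, so no linear model can separate the groups
rng = np.random.default_rng(42)
X = rng.normal(size=(60, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)
acc = loo_kernel_accuracy(X, y, gamma=1.0)
print(round(acc, 2))
```

    The point of the nonlinear kernel is exactly this: the XOR-like class structure above defeats a linear boundary, while local kernel weighting recovers it.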

  1. Analyzing Saturn's Magnetospheric Data After Cassini - Improving and Future-Proofing Cassini / MAPS Tools and Data

    NASA Astrophysics Data System (ADS)

    Brown, L. E.; Faden, J.; Vandegriff, J. D.; Kurth, W. S.; Mitchell, D. G.

    2017-12-01

    We present a plan to provide enhanced longevity to analysis software and science data used throughout the Cassini mission for viewing Magnetosphere and Plasma Science (MAPS) data. While a final archive is being prepared for Cassini, the tools that read from this archive will eventually become moribund as real world hardware and software systems evolve. We will add an access layer over existing and planned Cassini data products that will allow multiple tools to access many public MAPS datasets. The access layer is called the Heliophysics Application Programmer's Interface (HAPI), and this is a mechanism being adopted at many data centers across Heliophysics and planetary science for the serving of time series data. Two existing tools are also being enhanced to read from HAPI servers, namely Autoplot from the University of Iowa and MIDL (Mission Independent Data Layer) from The Johns Hopkins Applied Physics Lab. Thus both tools will be able to access data from RPWS, MAG, CAPS, and MIMI. In addition to being able to access data from each other's institutions, these tools will be able to read from all the new datasets expected to come online using the HAPI standard in the near future. The PDS also plans to use HAPI for all the holdings at the Planetary and Plasma Interactions (PPI) node. A basic presentation of the new HAPI data server mechanism is presented, as is an early demonstration of the modified tools.
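    A hedged sketch of what a HAPI time-series request looks like. The server URL and dataset id below are hypothetical; the endpoint and query parameter names follow the public HAPI specification, and the CSV handling is a simplification:

```python
import csv
import io
from urllib.parse import urlencode

def hapi_data_url(server, dataset, start, stop, parameters=None):
    """Build a HAPI /data request URL (HAPI 3.x query parameter names)."""
    query = {"dataset": dataset, "start": start, "stop": stop}
    if parameters:
        query["parameters"] = ",".join(parameters)
    return f"{server}/data?{urlencode(query)}"

def parse_hapi_csv(text):
    """Parse a HAPI CSV response: first column is an ISO 8601 time,
    remaining columns are numeric data values."""
    rows = []
    for rec in csv.reader(io.StringIO(text)):
        rows.append((rec[0], [float(v) for v in rec[1:]]))
    return rows

# hypothetical server and dataset id, for illustration only
url = hapi_data_url("https://example.org/hapi", "RPWS_KEY_PARAMS",
                    "2017-09-01T00:00:00Z", "2017-09-02T00:00:00Z",
                    parameters=["B_total"])
print(url)

sample = "2017-09-01T00:00:00Z,1.25\n2017-09-01T00:01:00Z,1.30\n"
print(parse_hapi_csv(sample))
```

    Because every HAPI server exposes the same /data contract, a client like this (or Autoplot/MIDL) can read RPWS, MAG, CAPS, and MIMI data without per-instrument code.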

  2. Dynamic Moss Observed with Hi-C

    NASA Technical Reports Server (NTRS)

    Alexander, Caroline; Winebarger, Amy; Morton, Richard; Savage, Sabrina

    2014-01-01

    The High-resolution Coronal Imager (Hi-C), flown on 11 July 2012, has revealed an unprecedented level of detail and substructure within the solar corona. Hi-C imaged a large active region (AR11520) with 0.2-0.3'' spatial resolution and 5.5s cadence over a 5 minute period. An additional dataset with a smaller FOV, the same resolution, but with a higher temporal cadence (1s) was also taken during the rocket flight. This dataset was centered on a large patch of 'moss' emission that initially seemed to show very little variability. Image processing revealed this region to be much more dynamic than first thought with numerous bright and dark features observed to appear, move and disappear over the 5 minute observation. Moss is thought to be emission from the upper transition region component of hot loops so studying its dynamics and the relation between the bright/dark features and underlying magnetic features is important to tie the interaction of the different atmospheric layers together. Hi-C allows us to study the coronal emission of the moss at the smallest scales while data from SDO/AIA and HMI is used to give information on these structures at different heights/temperatures. Using the high temporal and spatial resolution of Hi-C the observed moss features were tracked and the distribution of displacements, speeds, and sizes were measured. This allows us to comment on both the physical processes occurring within the dynamic moss and the scales at which these changes are occurring.

  4. Evaluating virtual hosted desktops for graphics-intensive astronomy

    NASA Astrophysics Data System (ADS)

    Meade, B. F.; Fluke, C. J.

    2018-04-01

    Visualisation of data is critical to understanding astronomical phenomena. Today, many instruments produce datasets that are too big to be downloaded to a local computer, yet many of the visualisation tools used by astronomers are deployed only on desktop computers. Cloud computing is increasingly used to provide a computation and simulation platform in astronomy, but it also offers great potential as a visualisation platform. Virtual hosted desktops, with graphics processing unit (GPU) acceleration, allow interactive, graphics-intensive desktop applications to operate co-located with astronomy datasets stored in remote data centres. By combining benchmarking and user experience testing, with a cohort of 20 astronomers, we investigate the viability of replacing physical desktop computers with virtual hosted desktops. In our work, we compare two Apple MacBook computers (one old and one new, representing hardware at opposite ends of its useful lifetime) with two virtual hosted desktops: one commercial (Amazon Web Services) and one in a private research cloud (the Australian NeCTAR Research Cloud). For two-dimensional image-based tasks and graphics-intensive three-dimensional operations - typical of astronomy visualisation workflows - we found that benchmarks do not necessarily provide the best indication of performance. When compared to typical laptop computers, virtual hosted desktops can provide a better user experience, even with lower performing graphics cards. We also found that virtual hosted desktops are equally simple to use, provide greater flexibility in choice of configuration, and may actually be a more cost-effective option for typical usage profiles.

  5. EnviroAtlas - Historic Places by 12-digit HUC for the Conterminous United States

    EPA Pesticide Factsheets

    This EnviroAtlas dataset portrays the total number of historic places located within each 12-digit Hydrologic Unit (HUC). The historic places data were compiled from the National Park Service's National Register of Historic Places (NRHP), which provides official federal lists of districts, sites, buildings, structures and objects significant to American history, architecture, archeology, engineering, and culture. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).

  6. EnviroAtlas - Memphis, TN - Estimated Percent Tree Cover Along Walkable Roads

    EPA Pesticide Factsheets

    This EnviroAtlas dataset estimates tree cover along walkable roads. The road width is estimated for each road and percent tree cover is calculated in an 8.5 meter strip beginning at the estimated road edge. Percent tree cover is calculated for each block between road intersections. Tree cover provides valuable benefits to neighborhood residents and walkers by providing shade, improved aesthetics, and outdoor gathering spaces. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).
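    The strip calculation these records describe can be sketched on a toy raster: given a canopy grid, an estimated road edge, and the 8.5 meter strip width, report the share of strip cells flagged as tree. The grid, function name, and one-sided geometry below are illustrative assumptions, not EnviroAtlas's actual processing chain:

    ```python
    def percent_tree_cover_in_strip(tree_grid, cell_size, road_y, road_half_width, strip_width=8.5):
        """
        Percent of raster cells covered by trees within a strip that starts at the
        estimated road edge and extends strip_width metres outward (one side shown).

        tree_grid       -- 2D list of 0/1 values (1 = tree canopy present)
        cell_size       -- raster resolution in metres
        road_y          -- y-coordinate of the road centreline (metres)
        road_half_width -- half of the estimated road width (metres)
        """
        strip_min = road_y + road_half_width  # estimated road edge
        strip_max = strip_min + strip_width   # outer edge of the strip
        in_strip = tree_count = 0
        for row_idx, row in enumerate(tree_grid):
            y = (row_idx + 0.5) * cell_size   # cell-centre y-coordinate
            if strip_min <= y < strip_max:
                in_strip += len(row)
                tree_count += sum(row)
        return 100.0 * tree_count / in_strip if in_strip else 0.0

    # Toy 1 m raster: road centreline at y = 0, estimated road width 4 m (half = 2).
    grid = [
        [1, 1, 0, 0],   # y = 0.5  (on the road)
        [0, 1, 1, 0],   # y = 1.5  (on the road)
        [1, 1, 1, 1],   # y = 2.5  (in strip)
        [0, 0, 1, 1],   # y = 3.5  (in strip)
    ]
    print(percent_tree_cover_in_strip(grid, 1.0, 0.0, 2.0))  # 75.0
    ```

    In the real dataset this tally would be restricted to the road segments of a single block between intersections, then repeated per block.
    
    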

  7. EnviroAtlas - Portland, ME - Estimated Percent Tree Cover Along Walkable Roads

    EPA Pesticide Factsheets

    This EnviroAtlas dataset estimates tree cover along walkable roads. The road width is estimated for each road and percent tree cover is calculated in an 8.5 meter strip beginning at the estimated road edge. Percent tree cover is calculated for each block between road intersections. Tree cover provides valuable benefits to neighborhood residents and walkers by providing shade, improved aesthetics, and outdoor gathering spaces. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).

  8. EnviroAtlas - New York, NY - Estimated Percent Tree Cover Along Walkable Roads

    EPA Pesticide Factsheets

    This EnviroAtlas dataset estimates tree cover along walkable roads. The road width is estimated for each road and percent tree cover is calculated in an 8.5 meter strip beginning at the estimated road edge. Percent tree cover is calculated for each block between road intersections. Tree cover provides valuable benefits to neighborhood residents and walkers by providing shade, improved aesthetics, and outdoor gathering spaces. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).

  9. EnviroAtlas - Green Bay, WI - Estimated Percent Tree Cover Along Walkable Roads

    EPA Pesticide Factsheets

    This EnviroAtlas dataset estimates tree cover along walkable roads. The road width is estimated for each road and percent tree cover is calculated in an 8.5 meter strip beginning at the estimated road edge. Percent tree cover is calculated for each block between road intersections. Tree cover provides valuable benefits to neighborhood residents and walkers by providing shade, improved aesthetics, and outdoor gathering spaces. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).

  10. EnviroAtlas - Pittsburgh, PA - Estimated Percent Tree Cover Along Walkable Roads

    EPA Pesticide Factsheets

    This EnviroAtlas dataset estimates tree cover along walkable roads. The road width is estimated for each road and percent tree cover is calculated in an 8.5 meter strip beginning at the estimated road edge. Percent tree cover is calculated for each block between road intersections. Tree cover provides valuable benefits to neighborhood residents and walkers by providing shade, improved aesthetics, and outdoor gathering spaces. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).

  11. EnviroAtlas - Portland, OR - Estimated Percent Tree Cover Along Walkable Roads

    EPA Pesticide Factsheets

    This EnviroAtlas dataset estimates tree cover along walkable roads. The road width is estimated for each road and percent tree cover is calculated in an 8.5 meter strip beginning at the estimated road edge. Percent tree cover is calculated for each block between road intersections. Tree cover provides valuable benefits to neighborhood residents and walkers by providing shade, improved aesthetics, and outdoor gathering spaces. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).

  12. EnviroAtlas - Paterson, NJ - Estimated Percent Tree Cover Along Walkable Roads

    EPA Pesticide Factsheets

    This EnviroAtlas dataset estimates tree cover along walkable roads. The road width is estimated for each road and percent tree cover is calculated in an 8.5 meter strip beginning at the estimated road edge. Percent tree cover is calculated for each block between road intersections. Tree cover provides valuable benefits to neighborhood residents and walkers by providing shade, improved aesthetics, and outdoor gathering spaces. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).

  13. EnviroAtlas - Des Moines, IA - Estimated Percent Tree Cover Along Walkable Roads

    EPA Pesticide Factsheets

    This EnviroAtlas dataset estimates tree cover along walkable roads. The road width is estimated for each road and percent tree cover is calculated in an 8.5 meter strip beginning at the estimated road edge. Percent tree cover is calculated for each block between road intersections. Tree cover provides valuable benefits to neighborhood residents and walkers by providing shade, improved aesthetics, and outdoor gathering spaces. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).

  14. EnviroAtlas - Phoenix, AZ - Estimated Percent Tree Cover Along Walkable Roads

    EPA Pesticide Factsheets

    This EnviroAtlas dataset estimates tree cover along walkable roads. The road width is estimated for each road and percent tree cover is calculated in an 8.5 meter strip beginning at the estimated road edge. Percent tree cover is calculated for each block between road intersections. Tree cover provides valuable benefits to neighborhood residents and walkers by providing shade, improved aesthetics, and outdoor gathering spaces. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).

  15. EnviroAtlas - Milwaukee, WI - Estimated Percent Tree Cover Along Walkable Roads

    EPA Pesticide Factsheets

    This EnviroAtlas dataset estimates tree cover along walkable roads. The road width is estimated for each road and percent tree cover is calculated in an 8.5 meter strip beginning at the estimated road edge. Percent tree cover is calculated for each block between road intersections. Tree cover provides valuable benefits to neighborhood residents and walkers by providing shade, improved aesthetics, and outdoor gathering spaces. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).

  16. EnviroAtlas - Tampa, FL - Estimated Percent Tree Cover Along Walkable Roads

    EPA Pesticide Factsheets

    This EnviroAtlas dataset estimates tree cover along walkable roads. The road width is estimated for each road and percent tree cover is calculated in an 8.5 meter strip beginning at the estimated road edge. Percent tree cover is calculated for each block between road intersections. Tree cover provides valuable benefits to neighborhood residents and walkers by providing shade, improved aesthetics, and outdoor gathering spaces. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).

  17. EnviroAtlas - Durham, NC - Estimated Percent Tree Cover Along Walkable Roads

    EPA Pesticide Factsheets

    This EnviroAtlas dataset estimates tree cover along walkable roads. The road width is estimated for each road and percent tree cover is calculated in an 8.5 meter strip beginning at the estimated road edge. Percent tree cover is calculated for each block between road intersections. Tree cover provides valuable benefits to neighborhood residents and walkers by providing shade, improved aesthetics, and outdoor gathering spaces. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).

  18. EnviroAtlas - Fresno, CA - Estimated Percent Tree Cover Along Walkable Roads

    EPA Pesticide Factsheets

    This EnviroAtlas dataset estimates tree cover along walkable roads. The road width is estimated for each road and percent tree cover is calculated in an 8.5 meter strip beginning at the estimated road edge. Percent tree cover is calculated for each block between road intersections. Tree cover provides valuable benefits to neighborhood residents and walkers by providing shade, improved aesthetics, and outdoor gathering spaces. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).

  19. EnviroAtlas - New Bedford, MA - Estimated Percent Tree Cover Along Walkable Roads

    EPA Pesticide Factsheets

    This EnviroAtlas dataset estimates tree cover along walkable roads. The road width is estimated for each road and percent tree cover is calculated in an 8.5 meter strip beginning at the estimated road edge. Percent tree cover is calculated for each block between road intersections. Tree cover provides valuable benefits to neighborhood residents and walkers by providing shade, improved aesthetics, and outdoor gathering spaces. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).

  20. EnviroAtlas - Woodbine, IA - Estimated Percent Tree Cover Along Walkable Roads

    EPA Pesticide Factsheets

    This EnviroAtlas dataset estimates tree cover along walkable roads. The road width is estimated for each road and percent tree cover is calculated in an 8.5 meter strip beginning at the estimated road edge. Percent tree cover is calculated for each block between road intersections. Tree cover provides valuable benefits to neighborhood residents and walkers by providing shade, improved aesthetics, and outdoor gathering spaces. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).

  1. EnviroAtlas - Woodbine, IA - Ecosystem Services by Block Group

    EPA Pesticide Factsheets

    This EnviroAtlas dataset presents environmental benefits of the urban forest in 1 block group in Woodbine, Iowa. Carbon attributes, temperature reduction, pollution removal and value, and runoff effects are calculated for each block group using i-Tree models (www.itreetools.org), local weather data, pollution data, EPA provided city boundary and land cover data, and U.S. Census derived block group boundary data. This dataset was produced by the US Forest Service to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).

  2. EnviroAtlas - Pittsburgh, PA - Ecosystem Services by Block Group

    EPA Pesticide Factsheets

    This EnviroAtlas dataset presents environmental benefits of the urban forest in 1,089 block groups in Pittsburgh, Pennsylvania. Carbon attributes, temperature reduction, pollution removal and value, and runoff effects are calculated for each block group using i-Tree models (www.itreetools.org), local weather data, pollution data, EPA provided city boundary and land cover data, and U.S. Census derived block group boundary data. This dataset was produced by the US Forest Service to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).

  3. EnviroAtlas - Portland, OR - Ecosystem Services by Block Group

    EPA Pesticide Factsheets

    This EnviroAtlas dataset presents environmental benefits of the urban forest in 1,176 block groups in Portland, Oregon. Carbon attributes, temperature reduction, pollution removal and value, and runoff effects are calculated for each block group using i-Tree models (www.itreetools.org), local weather data, pollution data, EPA provided city boundary and land cover data, and U.S. Census derived block group boundary data. This dataset was produced by the US Forest Service to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).

  4. EnviroAtlas - Fresno, CA - Ecosystem Services by Block Group

    EPA Pesticide Factsheets

    This EnviroAtlas dataset presents environmental benefits of the urban forest in 405 block groups in Fresno, California. Carbon attributes, temperature reduction, pollution removal and value, and runoff effects are calculated for each block group using i-Tree models (www.itreetools.org), local weather data, pollution data, EPA provided city boundary and land cover data, and U.S. Census derived block group boundary data. This dataset was produced by the US Forest Service to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).

  5. EnviroAtlas - New Bedford, MA - Ecosystem Services by Block Group

    EPA Pesticide Factsheets

    This EnviroAtlas dataset presents environmental benefits of the urban forest in 128 block groups in New Bedford, Massachusetts. Carbon attributes, temperature reduction, pollution removal and value, and runoff effects are calculated for each block group using i-Tree models (www.itreetools.org), local weather data, pollution data, EPA provided city boundary and land cover data, and U.S. Census derived block group boundary data. This dataset was produced by the US Forest Service to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).

  6. EnviroAtlas - Tampa, FL - Ecosystem Services by Block Group

    EPA Pesticide Factsheets

    This EnviroAtlas dataset presents environmental benefits of the urban forest in 1,833 block groups in Tampa Bay, Florida. Carbon attributes, temperature reduction, pollution removal and value, and runoff effects are calculated for each block group using i-Tree models (www.itreetools.org), local weather data, pollution data, EPA provided city boundary and land cover data, and U.S. Census derived block group boundary data. This dataset was produced by the US Forest Service to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).

  7. EnviroAtlas - Minneapolis/St. Paul, MN - Ecosystem Services by Block Group

    EPA Pesticide Factsheets

    This EnviroAtlas dataset presents environmental benefits of the urban forest in 1,772 block groups in Minneapolis/St. Paul, Minnesota. Carbon attributes, temperature reduction, pollution removal and value, and runoff effects are calculated for each block group using i-Tree models (www.itreetools.org), local weather data, pollution data, EPA provided city boundary and land cover data, and U.S. Census derived block group boundary data. This dataset was produced by the US Forest Service to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).

  8. EnviroAtlas - Cleveland, OH - Ecosystem Services by Block Group

    EPA Pesticide Factsheets

    This EnviroAtlas dataset presents environmental benefits of the urban forest in 1,442 block groups in Cleveland, Ohio. Carbon attributes, temperature reduction, pollution removal and value, and runoff effects are calculated for each block group using i-Tree models (www.itreetools.org), local weather data, pollution data, EPA provided city boundary and land cover data, and U.S. Census derived block group boundary data. This dataset was produced by the US Forest Service to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).

  9. EnviroAtlas - Milwaukee, WI - Ecosystem Services by Block Group

    EPA Pesticide Factsheets

    This EnviroAtlas dataset presents environmental benefits of the urban forest in 1,175 block groups in Milwaukee, Wisconsin. Carbon attributes, temperature reduction, pollution removal and value, and runoff effects are calculated for each block group using i-Tree models (www.itreetools.org), local weather data, pollution data, EPA provided city boundary and land cover data, and U.S. Census derived block group boundary data. This dataset was produced by the US Forest Service to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).

  10. EnviroAtlas - Portland, ME - Ecosystem Services by Block Group

    EPA Pesticide Factsheets

    This EnviroAtlas dataset presents environmental benefits of the urban forest in 146 block groups in Portland, Maine. Carbon attributes, temperature reduction, pollution removal and value, and runoff effects are calculated for each block group using i-Tree models (www.itreetools.org), local weather data, pollution data, EPA provided city boundary and land cover data, and U.S. Census derived block group boundary data. This dataset was produced by the US Forest Service to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).

  11. EnviroAtlas - Memphis, TN - Ecosystem Services by Block Group

    EPA Pesticide Factsheets

    This EnviroAtlas dataset presents environmental benefits of the urban forest in 703 block groups in Memphis, Tennessee. Carbon attributes, temperature reduction, pollution removal and value, and runoff effects are calculated for each block group using i-Tree models (www.itreetools.org), local weather data, pollution data, EPA provided city boundary and land cover data, and U.S. Census derived block group boundary data. This dataset was produced by the US Forest Service to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).

  12. EnviroAtlas - Green Bay, WI - Ecosystem Services by Block Group

    EPA Pesticide Factsheets

    This EnviroAtlas dataset presents environmental benefits of the urban forest in 155 block groups in Green Bay, Wisconsin. Carbon attributes, temperature reduction, pollution removal and value, and runoff effects are calculated for each block group using i-Tree models (www.itreetools.org), local weather data, pollution data, EPA provided city boundary and land cover data, and U.S. Census derived block group boundary data. This dataset was produced by the US Forest Service to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).

  13. EnviroAtlas - New York, NY - Estimated Intersection Density of Walkable Roads

    EPA Pesticide Factsheets

    This EnviroAtlas dataset estimates the intersection density of walkable roads within a 750 meter radius of any given 10 meter pixel in the community. Intersections are defined as any point where 3 or more roads meet, and density is calculated using kernel density, where closer intersections are weighted higher than farther intersections. Intersection density is highly correlated with walking for transportation. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).
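    The kernel-density calculation described in this record can be sketched as follows. The specific kernel (biweight) and the per-square-kilometre normalisation are illustrative assumptions; the record states only that intersections within the 750 meter radius are weighted more heavily the closer they are to the pixel:

    ```python
    import math

    def intersection_density(px, py, intersections, radius=750.0):
        """
        Kernel-weighted intersection density at pixel centre (px, py).
        Intersections within `radius` metres contribute a weight that decays
        with distance; a biweight kernel is used here as a common choice.
        """
        total = 0.0
        for ix, iy in intersections:
            d = math.hypot(ix - px, iy - py)
            if d < radius:
                total += (1.0 - (d / radius) ** 2) ** 2  # weight is 1 at d = 0
        # Normalise by the search area to express the result per square km.
        area_km2 = math.pi * (radius / 1000.0) ** 2
        return total / area_km2

    # Toy example: three intersections; the third lies outside the 750 m radius
    # of the origin pixel and so contributes nothing.
    pts = [(0.0, 0.0), (300.0, 400.0), (900.0, 0.0)]
    density = intersection_density(0.0, 0.0, pts)
    ```

    Evaluating this at every 10 meter pixel centre would reproduce the kind of raster surface the dataset describes.
    
    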

  14. EnviroAtlas - Paterson, NJ - Estimated Intersection Density of Walkable Roads

    EPA Pesticide Factsheets

    This EnviroAtlas dataset estimates the intersection density of walkable roads within a 750 meter radius of any given 10 meter pixel in the community. Intersections are defined as any point where 3 or more roads meet and density is calculated using kernel density, where closer intersections are weighted higher than further intersections. Intersection density is highly correlated with walking for transportation. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).

  15. EnviroAtlas - Fresno, CA - Estimated Intersection Density of Walkable Roads

    EPA Pesticide Factsheets

    This EnviroAtlas dataset estimates the intersection density of walkable roads within a 750 meter radius of any given 10 meter pixel in the community. Intersections are defined as any point where 3 or more roads meet and density is calculated using kernel density, where closer intersections are weighted higher than further intersections. Intersection density is highly correlated with walking for transportation. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).

  16. EnviroAtlas - Green Bay, WI - Estimated Intersection Density of Walkable Roads

    EPA Pesticide Factsheets

    This EnviroAtlas dataset estimates the intersection density of walkable roads within a 750 meter radius of any given 10 meter pixel in the community. Intersections are defined as any point where 3 or more roads meet and density is calculated using kernel density, where closer intersections are weighted higher than further intersections. Intersection density is highly correlated with walking for transportation. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).

  17. EnviroAtlas - Des Moines, IA - Estimated Intersection Density of Walkable Roads

    EPA Pesticide Factsheets

    This EnviroAtlas dataset estimates the intersection density of walkable roads within a 750 meter radius of any given 10 meter pixel in the community. Intersections are defined as any point where 3 or more roads meet and density is calculated using kernel density, where closer intersections are weighted higher than further intersections. Intersection density is highly correlated with walking for transportation. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).

  18. EnviroAtlas - Minneapolis/St. Paul, MN - Estimated Intersection Density of Walkable Roads

    EPA Pesticide Factsheets

    This EnviroAtlas dataset estimates the intersection density of walkable roads within a 750 meter radius of any given 10 meter pixel in the community. Intersections are defined as any point where 3 or more roads meet and density is calculated using kernel density, where closer intersections are weighted higher than further intersections. Intersection density is highly correlated with walking for transportation. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).

  19. EnviroAtlas - Woodbine, IA - Estimated Intersection Density of Walkable Roads

    EPA Pesticide Factsheets

    This EnviroAtlas dataset estimates the intersection density of walkable roads within a 750 meter radius of any given 10 meter pixel in the community. Intersections are defined as any point where 3 or more roads meet and density is calculated using kernel density, where closer intersections are weighted higher than further intersections. Intersection density is highly correlated with walking for transportation. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).

  20. EnviroAtlas - Phoenix, AZ - Estimated Intersection Density of Walkable Roads

    EPA Pesticide Factsheets

    This EnviroAtlas dataset estimates the intersection density of walkable roads within a 750 meter radius of any given 10 meter pixel in the community. Intersections are defined as any point where 3 or more roads meet and density is calculated using kernel density, where closer intersections are weighted higher than further intersections. Intersection density is highly correlated with walking for transportation. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).

  1. EnviroAtlas - Pittsburgh, PA - Estimated Intersection Density of Walkable Roads

    EPA Pesticide Factsheets

    This EnviroAtlas dataset estimates the intersection density of walkable roads within a 750 meter radius of any given 10 meter pixel in the community. Intersections are defined as any point where 3 or more roads meet and density is calculated using kernel density, where closer intersections are weighted higher than further intersections. Intersection density is highly correlated with walking for transportation. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).

  2. EnviroAtlas - New Bedford, MA - Estimated Intersection Density of Walkable Roads

    EPA Pesticide Factsheets

    This EnviroAtlas dataset estimates the intersection density of walkable roads within a 750 meter radius of any given 10 meter pixel in the community. Intersections are defined as any point where 3 or more roads meet and density is calculated using kernel density, where closer intersections are weighted higher than further intersections. Intersection density is highly correlated with walking for transportation. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).

  3. EnviroAtlas - Milwaukee, WI - Estimated Intersection Density of Walkable Roads

    EPA Pesticide Factsheets

    This EnviroAtlas dataset estimates the intersection density of walkable roads within a 750 meter radius of any given 10 meter pixel in the community. Intersections are defined as any point where 3 or more roads meet and density is calculated using kernel density, where closer intersections are weighted higher than further intersections. Intersection density is highly correlated with walking for transportation. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).

  4. EnviroAtlas - Tampa, FL - 51m Riparian Buffer Vegetated Cover

    EPA Pesticide Factsheets

    This EnviroAtlas dataset describes the percentage of a 51-m riparian buffer that is vegetated. There is a potential for decreased water quality in areas where the riparian buffer is less vegetated. The displayed line represents the center of the analyzed riparian buffer. The water bodies analyzed include hydrologically connected streams, rivers, connectors, reservoirs, lakes/ponds, ice masses, washes, locks, and rapids within the Atlas Area. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).
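    The buffer metric above reduces to a zonal percentage: of the land-cover pixels within 51 m of the stream centerline, what share is vegetated? The sketch below is a minimal illustration under assumed inputs (a point-like stream location and hypothetical land-cover classes); the actual EnviroAtlas computation buffers the full centerline geometry against high-resolution land cover.

    ```python
    import math

    # Hypothetical vegetated classes for illustration only.
    VEGETATED = {"tree", "shrub", "grass"}

    def percent_buffer_vegetated(cells, stream_xy, buffer_m=51.0):
        """Percent of pixel centers within buffer_m of stream_xy that are vegetated.

        cells: list of ((x, y), land_cover_class), coordinates in meters.
        """
        sx, sy = stream_xy
        in_buffer = [cls for (x, y), cls in cells
                     if math.hypot(x - sx, y - sy) <= buffer_m]
        if not in_buffer:
            return 0.0
        veg = sum(1 for cls in in_buffer if cls in VEGETATED)
        return 100.0 * veg / len(in_buffer)
    ```

    A pixel 200 m from the stream is ignored entirely; only cover inside the 51-m buffer affects the percentage.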

  5. EnviroAtlas - Austin, TX - Proximity to Parks

    EPA Pesticide Factsheets

    This EnviroAtlas dataset shows the approximate walking distance from a park entrance at any given location within the EnviroAtlas community boundary. The zones are estimated in 1/4 km intervals up to 1km then in 1km intervals up to 5km. Park entrances were included in this analysis if they were within 5km of the community boundary. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).

  6. EnviroAtlas - Austin, TX - Estimated Percent Tree Cover Along Walkable Roads

    EPA Pesticide Factsheets

    This EnviroAtlas dataset estimates tree cover along walkable roads. The road width is estimated for each road and percent tree cover is calculated in an 8.5 meter strip beginning at the estimated road edge. Percent tree cover is calculated for each block between road intersections. Tree cover provides valuable benefits to neighborhood residents and walkers by providing shade, improved aesthetics, and outdoor gathering spaces. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).

  7. EnviroAtlas - Austin, TX - Estimated Intersection Density of Walkable Roads

    EPA Pesticide Factsheets

    This EnviroAtlas dataset estimates the intersection density of walkable roads within a 750 meter radius of any given 10 meter pixel in the community. Intersections are defined as any point where 3 or more roads meet and density is calculated using kernel density, where closer intersections are weighted higher than further intersections. Intersection density is highly correlated with walking for transportation. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).

  8. EnviroAtlas - Austin, TX - Ecosystem Services by Block Group

    EPA Pesticide Factsheets

    This EnviroAtlas dataset presents environmental benefits of the urban forest in 750 block groups in Austin, Texas. Carbon attributes, temperature reduction, pollution removal and value, and runoff effects are calculated for each block group using i-Tree models (www.itreetools.org), local weather data, pollution data, EPA provided city boundary and land cover data, and U.S. Census derived block group boundary data. This dataset was produced by the US Forest Service to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).

  9. Explore Earth Science Datasets for STEM with the NASA GES DISC Online Visualization and Analysis Tool, GIOVANNI

    NASA Astrophysics Data System (ADS)

    Liu, Z.; Acker, J. G.; Kempler, S. J.

    2016-12-01

    The NASA Goddard Earth Sciences (GES) Data and Information Services Center (DISC) is one of twelve NASA Science Mission Directorate (SMD) Data Centers that provide Earth science data, information, and services to research scientists, applications scientists, applications users, and students around the world. The GES DISC is the home (archive) of NASA Precipitation and Hydrology, as well as Atmospheric Composition and Dynamics remote sensing data and information. To facilitate Earth science data access, the GES DISC has been developing user-friendly data services for users at different levels. Among them, the Geospatial Interactive Online Visualization ANd aNalysis Infrastructure (GIOVANNI, http://giovanni.gsfc.nasa.gov/) allows users to explore satellite-based data using sophisticated analyses and visualizations without downloading data and software, which is particularly suitable for novices to use NASA datasets in STEM activities. In this presentation, we will briefly introduce GIOVANNI and recommend datasets for STEM. Examples of using these datasets in STEM activities will be presented as well.

  10. Explore Earth Science Datasets for STEM with the NASA GES DISC Online Visualization and Analysis Tool, Giovanni

    NASA Technical Reports Server (NTRS)

    Liu, Z.; Acker, J.; Kempler, S.

    2016-01-01

    The NASA Goddard Earth Sciences (GES) Data and Information Services Center (DISC) is one of twelve NASA Science Mission Directorate (SMD) Data Centers that provide Earth science data, information, and services to users around the world, including research and application scientists, students, and citizen scientists. The GES DISC is the home (archive) of remote sensing datasets for NASA Precipitation and Hydrology, Atmospheric Composition and Dynamics, and related disciplines. To facilitate Earth science data access, the GES DISC has been developing user-friendly data services for users at different levels in different countries. Among them, the Geospatial Interactive Online Visualization ANd aNalysis Infrastructure (Giovanni, http://giovanni.gsfc.nasa.gov/) allows users to explore satellite-based datasets using sophisticated analyses and visualization without downloading data and software, which is particularly suitable for novices (such as students) to use NASA datasets in STEM (science, technology, engineering and mathematics) activities. In this presentation, we will briefly introduce Giovanni along with examples for STEM activities.

  11. EnviroAtlas - Cleveland, OH - Estimated Intersection Density of Walkable Roads

    EPA Pesticide Factsheets

    This EnviroAtlas dataset estimates the intersection density of walkable roads within a 750 meter radius of any given 10 meter pixel in the community. Intersections are defined as any point where 3 or more roads meet and density is calculated using kernel density, where closer intersections are weighted higher than further intersections. Intersection density is highly correlated with walking for transportation. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).

  12. EnviroAtlas - Portland, ME - Estimated Intersection Density of Walkable Roads

    EPA Pesticide Factsheets

    This EnviroAtlas dataset estimates the intersection density of walkable roads within a 750 meter radius of any given 10 meter pixel in the community. Intersections are defined as any point where 3 or more roads meet and density is calculated using kernel density, where closer intersections are weighted higher than further intersections. Intersection density is highly correlated with walking for transportation. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).

  13. EnviroAtlas - Portland, OR - Estimated Intersection Density of Walkable Roads

    EPA Pesticide Factsheets

    This EnviroAtlas dataset estimates the intersection density of walkable roads within a 750 meter radius of any given 10 meter pixel in the community. Intersections are defined as any point where 3 or more roads meet and density is calculated using kernel density, where closer intersections are weighted higher than further intersections. Intersection density is highly correlated with walking for transportation. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).

  14. EnviroAtlas - Durham, NC - Estimated Intersection Density of Walkable Roads

    EPA Pesticide Factsheets

    This EnviroAtlas dataset estimates the intersection density of walkable roads within a 750 meter radius of any given 10 meter pixel in the community. Intersections are defined as any point where 3 or more roads meet and density is calculated using kernel density, where closer intersections are weighted higher than further intersections. Intersection density is highly correlated with walking for transportation. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).

  15. EnviroAtlas - Tampa, FL - Estimated Intersection Density of Walkable Roads

    EPA Pesticide Factsheets

    This EnviroAtlas dataset estimates the intersection density of walkable roads within a 750 meter radius of any given 10 meter pixel in the community. Intersections are defined as any point where 3 or more roads meet and density is calculated using kernel density, where closer intersections are weighted higher than further intersections. Intersection density is highly correlated with walking for transportation. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).

  16. EnviroAtlas - Memphis, TN - Estimated Intersection Density of Walkable Roads

    EPA Pesticide Factsheets

    This EnviroAtlas dataset estimates the intersection density of walkable roads within a 750 meter radius of any given 10 meter pixel in the community. Intersections are defined as any point where 3 or more roads meet and density is calculated using kernel density, where closer intersections are weighted higher than further intersections. Intersection density is highly correlated with walking for transportation. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).

  17. EnviroAtlas - Portland, OR - 51m Riparian Buffer Forest Cover

    EPA Pesticide Factsheets

    This EnviroAtlas dataset describes the percentage of a 51-m riparian buffer that is forested. There is a potential for decreased water quality in areas where the riparian buffer is less forested. The displayed line represents the center of the analyzed riparian buffer. The water bodies analyzed include hydrologically connected streams, rivers, connectors, reservoirs, lakes/ponds, ice masses, washes, locks, and rapids within the Atlas Area. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).

  18. EnviroAtlas - Woodbine, Iowa - 51m Riparian Buffer Forest Cover

    EPA Pesticide Factsheets

    This EnviroAtlas dataset describes the percentage of a 51-m riparian buffer that is forested. There is a potential for decreased water quality in areas where the riparian buffer is less forested. The displayed line represents the center of the analyzed riparian buffer. The water bodies analyzed include hydrologically connected streams, rivers, connectors, reservoirs, lakes/ponds, ice masses, washes, locks, and rapids within the Atlas Area. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).

  19. EnviroAtlas - Milwaukee, WI - 51m Riparian Buffer Forest Cover

    EPA Pesticide Factsheets

    This EnviroAtlas dataset describes the percentage of a 51-m riparian buffer that is forested. There is a potential for decreased water quality in areas where the riparian buffer is less forested. The displayed line represents the center of the analyzed riparian buffer. The water bodies analyzed include hydrologically connected streams, rivers, connectors, reservoirs, lakes/ponds, ice masses, washes, locks, and rapids within the Atlas Area. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).

  20. EnviroAtlas - Fresno, CA - 51m Riparian Buffer Vegetated Cover

    EPA Pesticide Factsheets

    This EnviroAtlas dataset describes the percentage of a 51-m riparian buffer that is vegetated. There is a potential for decreased water quality in areas where the riparian buffer is less vegetated. The displayed line represents the center of the analyzed riparian buffer. The water bodies analyzed include hydrologically connected streams, rivers, connectors, reservoirs, lakes/ponds, ice masses, washes, locks, and rapids within the Atlas Area. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).
