Science.gov

Sample records for exploring scientific datasets

  1. The new Planetary Science Archive (PSA): Exploration and discovery of scientific datasets from ESA's planetary missions

    NASA Astrophysics Data System (ADS)

    Martinez, Santa; Besse, Sebastien; Heather, Dave; Barbarisi, Isa; Arviset, Christophe; De Marchi, Guido; Barthelemy, Maud; Docasal, Ruben; Fraga, Diego; Grotheer, Emmanuel; Lim, Tanya; Macfarlane, Alan; Rios, Carlos; Vallejo, Fran; Saiz, Jaime; ESDC (European Space Data Centre) team

    2016-10-01

    The Planetary Science Archive (PSA) is the European Space Agency's (ESA) repository of science data from all planetary science and exploration missions. The PSA provides access to scientific datasets through various interfaces at http://archives.esac.esa.int/psa. All datasets are scientifically peer-reviewed by independent scientists, and are compliant with the Planetary Data System (PDS) standards. The PSA is currently implementing a number of significant improvements, mostly driven by the evolution of the PDS standard, and the growing need for better interfaces and advanced applications to support science exploitation. The newly designed PSA will enhance the user experience and will significantly reduce the complexity for users to find their data, promoting one-click access to the scientific datasets with more specialised views when needed. This includes better integration with Planetary GIS analysis tools and Planetary interoperability services (search and retrieve data, supporting e.g. PDAP, EPN-TAP). It will also be up to date with versions 3 and 4 of the PDS standards, as PDS4 will be used for ESA's ExoMars and upcoming BepiColombo missions. Users will have direct access to documentation, information and tools that are relevant to the scientific use of the dataset, including ancillary datasets, Software Interface Specification (SIS) documents, and any tools/help that the PSA team can provide. A login mechanism will provide additional functionalities to the users to aid/ease their searches (e.g. saving queries, managing default views). This contribution will introduce the new PSA, its key features and access interfaces.

  2. The new Planetary Science Archive: A tool for exploration and discovery of scientific datasets from ESA's planetary missions.

    NASA Astrophysics Data System (ADS)

    Heather, David; Besse, Sebastien; Barbarisi, Isa; Arviset, Christophe; de Marchi, Guido; Barthelemy, Maud; Docasal, Ruben; Fraga, Diego; Grotheer, Emmanuel; Lim, Tanya; Macfarlane, Alan; Martinez, Santa; Rios, Carlos

    2016-04-01

    Introduction: The Planetary Science Archive (PSA) is the European Space Agency's (ESA) repository of science data from all planetary science and exploration missions. The PSA provides access to scientific datasets through various interfaces (e.g. FTP browser, Map based, Advanced search, and Machine interface): http://archives.esac.esa.int/psa. All datasets are scientifically peer-reviewed by independent scientists, and are compliant with the Planetary Data System (PDS) standards. Updating the PSA: The PSA is currently implementing a number of significant changes, both to its web-based interface to the scientific community, and to its database structure. The new PSA will be up-to-date with versions 3 and 4 of the PDS standards, as PDS4 will be used for ESA's upcoming ExoMars and BepiColombo missions. The newly designed PSA homepage will provide direct access to scientific datasets via a text search for targets or missions. This will significantly reduce the complexity for users to find their data and will promote one-click access to the datasets. Additionally, the homepage will provide direct access to advanced views and searches of the datasets. Users will have direct access to documentation, information and tools that are relevant to the scientific use of the dataset, including ancillary datasets, Software Interface Specification (SIS) documents, and any tools/help that the PSA team can provide. A login mechanism will provide additional functionalities to the users to aid/ease their searches (e.g. saving queries, managing default views). Queries to the PSA database will be possible either via the homepage (for simple searches of missions or targets), or through a filter menu for more tailored queries. The filter menu will offer multiple options to search for a particular dataset or product, and will manage queries for both in-situ and remote sensing instruments. Parameters such as start-time, phase angle, and heliocentric distance will be emphasized. A further

  3. The new Planetary Science Archive: A tool for exploration and discovery of scientific datasets from ESA's planetary missions

    NASA Astrophysics Data System (ADS)

    Heather, David

    2016-07-01

    Introduction: The Planetary Science Archive (PSA) is the European Space Agency's (ESA) repository of science data from all planetary science and exploration missions. The PSA provides access to scientific datasets through various interfaces (e.g. FTP browser, Map based, Advanced search, and Machine interface): http://archives.esac.esa.int/psa. All datasets are scientifically peer-reviewed by independent scientists, and are compliant with the Planetary Data System (PDS) standards. Updating the PSA: The PSA is currently implementing a number of significant changes, both to its web-based interface to the scientific community, and to its database structure. The new PSA will be up-to-date with versions 3 and 4 of the PDS standards, as PDS4 will be used for ESA's upcoming ExoMars and BepiColombo missions. The newly designed PSA homepage will provide direct access to scientific datasets via a text search for targets or missions. This will significantly reduce the complexity for users to find their data and will promote one-click access to the datasets. Additionally, the homepage will provide direct access to advanced views and searches of the datasets. Users will have direct access to documentation, information and tools that are relevant to the scientific use of the dataset, including ancillary datasets, Software Interface Specification (SIS) documents, and any tools/help that the PSA team can provide. A login mechanism will provide additional functionalities to the users to aid/ease their searches (e.g. saving queries, managing default views). Queries to the PSA database will be possible either via the homepage (for simple searches of missions or targets), or through a filter menu for more tailored queries. The filter menu will offer multiple options to search for a particular dataset or product, and will manage queries for both in-situ and remote sensing instruments. Parameters such as start-time, phase angle, and heliocentric distance will be emphasized. A further

  4. Dataset of Scientific Inquiry Learning Environment

    ERIC Educational Resources Information Center

    Ting, Choo-Yee; Ho, Chiung Ching

    2015-01-01

    This paper presents the dataset collected from student interactions with INQPRO, a computer-based scientific inquiry learning environment. The dataset contains records of 100 students and is divided into two portions. The first portion comprises (1) "raw log data", capturing the student's name, interfaces visited, the interface…

  5. Scientific Resource EXplorer

    NASA Astrophysics Data System (ADS)

    Xing, Z.; Wormuth, A.; Smith, A.; Arca, J.; Lu, Y.; Sayfi, E.

    2014-12-01

    Inquisitive minds in our society are never satisfied with curated images released by a typical public affairs office. They always want to look deeper and play directly on original data. However, most scientific data products are notoriously hard to use. They are immensely large, highly distributed and diverse in format. In this presentation, we will demonstrate Resource EXplorer (REX), a novel webtop application that allows anyone to conveniently explore and visualize rich scientific data repositories, using only a standard web browser. This tool leverages on the power of Webification Science (w10n-sci), a powerful enabling technology that simplifies the use of scientific data on the web platform. W10n-sci is now being deployed at an increasing number of NASA data centers, some of which are the largest digital treasure troves in our nation. With REX, these wonderful scientific resources are open for teachers and students to learn and play.

  6. Proceedings: Fourth Workshop on Mining Scientific Datasets

    SciTech Connect

    Kamath, C

    2001-07-24

    Commercial applications of data mining in areas such as e-commerce, market-basket analysis, text-mining, and web-mining have taken on a central focus in the KDD community. However, there is a significant amount of innovative data mining work taking place in the context of scientific and engineering applications that is not well represented in the mainstream KDD conferences. For example, scientific data mining techniques are being developed and applied to diverse fields such as remote sensing, physics, chemistry, biology, astronomy, structural mechanics, computational fluid dynamics, etc. In these areas, data mining frequently complements and enhances existing analysis methods based on statistics, exploratory data analysis, and domain-specific approaches. On the surface, it may appear that data from one scientific field, say genomics, is very different from another field, such as physics. However, despite their diversity, there is much that is common across the mining of scientific and engineering data. For example, techniques used to identify objects in images are very similar, regardless of whether the images came from a remote sensing application, a physics experiment, an astronomy observation, or a medical study. Further, with data mining being applied to new types of data, such as mesh data from scientific simulations, there is the opportunity to apply and extend data mining to new scientific domains. This one-day workshop brings together data miners analyzing science data and scientists from diverse fields to share their experiences, learn how techniques developed in one field can be applied in another, and better understand some of the newer techniques being developed in the KDD community. This is the fourth workshop on the topic of Mining Scientific Datasets; for information on earlier workshops, see http://www.ahpcrc.org/conferences/. This workshop continues the tradition of addressing challenging problems in a field where the diversity of applications is

  7. Asteroids in the EXPLORE II Dataset

    NASA Astrophysics Data System (ADS)

    Schmoll, S.; Mallen-Ornelas, G.; Holman, M.

    2005-12-01

    The inner asteroid belt holds information about the solar system's history and future. The currently accepted theory of planet formation is that smaller rocky bodies collided and formed the planets of the inner solar system, and asteroids are relics of this past. Furthermore, near Earth objects that could potentially collide with us usually originate in the main belt. Determining the size distribution of the main-belt asteroids is key to unlocking the processes of planet formation and possible problems with near Earth objects. Here the EXtra Solar PLanet Occultation (EXPLORE) II data taken with the CFH12K mosaic CCD prime focus camera on the CFHT 3.6-m telescope are used to find the size distribution of main belt asteroids. The EXPLORE Project is an extrasolar planet detection survey that focuses on one patch of the sky per observing run. The resultant data have more observations per asteroid than any preceding deep asteroid search. Here a pipeline is presented to find the asteroids in this dataset, along with the other four EXPLORE datasets. This is done by processing the data with an image subtraction package called ISIS (Alard et al. 1997) and custom masking using IRAF. Asteroids are found using SExtractor (Bertin et al. 1996) and a set of custom C programs that detects moving objects in a series of images. Then light curves are created for each asteroid found. Sizes can be estimated based on the absolute magnitudes of the asteroids. We present absolute magnitudes and preliminary size distribution for the >52 asteroids found thus far. This research was made possible by the NSF and SAO REU Program.

  8. Using bitmap index for interactive exploration of large datasets

    SciTech Connect

    Wu, Kesheng; Koegler, Wendy; Chen, Jacqueline; Shoshani, Arie

    2003-04-24

    Many scientific applications generate large spatio-temporal datasets. A common way of exploring these datasets is to identify and track regions of interest. Usually these regions are defined as contiguous sets of points whose attributes satisfy some user defined conditions, e.g. high temperature regions in a combustion simulation. At each time step, the regions of interest may be identified by first searching for all points that satisfy the conditions and then grouping the points into connected regions. To speed up this process, the searching step may use a tree based indexing scheme, such as a kd-tree or an octree. However, these indices are efficient only if the searches are limited to one or a small number of selected attributes. Scientific datasets often contain hundreds of attributes and scientists frequently study these attributes in complex combinations, e.g. finding regions of high temperature yet low shear rate and pressure. Bitmap indexing is an efficient method for searching on multiple criteria simultaneously. We apply a bitmap compression scheme to reduce the size of the indices. In addition, we show that the compressed bitmaps can be used efficiently to perform the region growing and the region tracking operations. Analyses show that our approach scales well and our tests on two datasets from simulation of the auto ignition process show impressive performance.
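
    The multi-attribute search described above (one bitmap per condition, combined with bitwise AND before region growing) can be sketched in a few lines. The Python fragment below is only an illustration on synthetic data; the attribute names and thresholds are invented, and the paper's compressed bitmap scheme and connected-region tracking are not reproduced.

```python
import numpy as np

# Attribute values per grid point (hypothetical combustion-like fields).
n = 1_000_000
rng = np.random.default_rng(0)
temperature = rng.uniform(300.0, 2500.0, n)
pressure    = rng.uniform(1.0, 10.0, n)
shear_rate  = rng.uniform(0.0, 5.0, n)

# One (uncompressed) bitmap per condition; the approach in the paper would
# store these as compressed bitmaps to keep the index small.
bm_hot       = temperature > 2000.0
bm_low_shear = shear_rate < 1.0
bm_low_pres  = pressure < 3.0

# Searching on multiple criteria simultaneously is a bitwise AND.
selected = bm_hot & bm_low_shear & bm_low_pres
points_of_interest = np.flatnonzero(selected)

# Region growing would then group spatially adjacent selected points;
# here we only report how many points satisfy all three conditions.
print(f"{points_of_interest.size} points satisfy all three conditions")
```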

  9. Advanced Aerobots for Scientific Exploration

    NASA Technical Reports Server (NTRS)

    Behar, Alberto; Raymond, Carol A.; Matthews, Janet B.; Nicaise, Fabien; Jones, Jack A.

    2010-01-01

    The Picosat and Uninhabited Aerial Vehicle Systems Engineering (PAUSE) project is developing balloon-borne instrumentation systems as aerobots for scientific exploration of remote planets and for diverse terrestrial purposes that can include scientific exploration, mapping, and military surveillance. The underlying concept of balloon-borne gondolas housing outer-space-qualified scientific instruments and associated data-processing and radio-communication equipment is not new. Instead, the novelty lies in numerous design details that, taken together, make a PAUSE aerobot smaller, less expensive, and less massive, relative to prior aerobots developed for similar purposes: Whereas the gondola (including the instrumentation system housed in it) of a typical prior aerobot has a mass of hundreds of kilograms, the mass of the gondola (with instrumentation system) of a PAUSE aerobot is a few kilograms.

  10. Automatic run-time provenance capture for scientific dataset generation

    NASA Astrophysics Data System (ADS)

    Frew, J.; Slaughter, P.

    2008-12-01

    Provenance---the directed graph of a dataset's processing history---is difficult to capture effectively. Human-generated provenance, as narrative metadata, is labor-intensive and thus often incorrect, incomplete, or simply not recorded. Workflow systems capture some provenance implicitly in workflow specifications, but these systems are not ubiquitous or standardized, and a workflow specification may not capture all of the factors involved in a dataset's production. System audit trails capture potentially all processing activities, but not the relationships between them. We describe a system that transparently (i.e., without any modification to science codes) and automatically (i.e. without any human intervention) captures the low-level interactions (files read/written, parameters accessed, etc.) between scientific processes, and then synthesizes these relationships into a provenance graph. This system---the Earth System Science Server (ES3)---is sufficiently general that it can accommodate any combination of stand-alone programs, interpreted codes (e.g. IDL), and command-language scripts. Provenance in ES3 can be published in well-defined XML formats (including formats suitable for graphical visualization), and queried to determine the ancestors or descendants of any specific data file or process invocation. We demonstrate how ES3 can be used to capture the provenance of a large operational ocean color dataset.
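
    The core idea, synthesizing low-level read/write events into a queryable provenance graph, can be illustrated with a small sketch. The event list and file names below are hypothetical and this is not the ES3 interface; it only shows how such events become a directed graph whose ancestor queries answer provenance questions.

```python
import networkx as nx

# Hypothetical low-level events captured at run time: which process read
# or wrote which file (a system like ES3 observes these transparently).
events = [
    ("calibrate.py", "raw_scene.hdf",  "read"),
    ("calibrate.py", "calibrated.hdf", "write"),
    ("composite.py", "calibrated.hdf", "read"),
    ("composite.py", "ocean_color.nc", "write"),
]

# Synthesize the events into a provenance DAG: reads point file -> process,
# writes point process -> file, so paths follow the processing history.
g = nx.DiGraph()
for process, path, mode in events:
    if mode == "read":
        g.add_edge(path, process)
    else:
        g.add_edge(process, path)

# Querying ancestors answers "what did this product depend on?"
print(sorted(nx.ancestors(g, "ocean_color.nc")))
# -> ['calibrate.py', 'calibrated.hdf', 'composite.py', 'raw_scene.hdf']
```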

  11. Determining similarity of scientific entities in annotation datasets

    PubMed Central

    Palma, Guillermo; Vidal, Maria-Esther; Haag, Eric; Raschid, Louiqa; Thor, Andreas

    2015-01-01

    Linked Open Data initiatives have made available a diversity of scientific collections where scientists have annotated entities in the datasets with controlled vocabulary terms from ontologies. Annotations encode scientific knowledge, which is captured in annotation datasets. Determining relatedness between annotated entities becomes a building block for pattern mining, e.g. identifying drug–drug relationships may depend on the similarity of the targets that interact with each drug. A diversity of similarity measures has been proposed in the literature to compute relatedness between a pair of entities. Each measure exploits some knowledge including the name, function, relationships with other entities, taxonomic neighborhood and semantic knowledge. We propose a novel general-purpose annotation similarity measure called ‘AnnSim’ that measures the relatedness between two entities based on the similarity of their annotations. We model AnnSim as a 1–1 maximum weight bipartite match and exploit properties of existing solvers to provide an efficient solution. We empirically study the performance of AnnSim on real-world datasets of drugs and disease associations from clinical trials and relationships between drugs and (genomic) targets. Using baselines that include a variety of measures, we identify where AnnSim can provide a deeper understanding of the semantics underlying the relatedness of a pair of entities or where it could lead to predicting new links or identifying potential novel patterns. Although AnnSim does not exploit knowledge or properties of a particular domain, its performance compares well with a variety of state-of-the-art domain-specific measures. Database URL: http://www.yeastgenome.org/ PMID:25725057
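
    The abstract models AnnSim as a 1-1 maximum weight bipartite match between two annotation sets. As a rough illustration (not the authors' implementation), such a match can be computed with a standard assignment solver over a pairwise term-similarity matrix; the similarity values and the final normalization below are assumed for the sketch.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical pairwise similarities between the annotation terms of two
# entities (rows: terms annotating entity A, columns: terms annotating B).
sim = np.array([
    [0.9, 0.1, 0.3],
    [0.2, 0.8, 0.4],
    [0.1, 0.5, 0.7],
    [0.3, 0.2, 0.6],
])

# 1-1 maximum weight bipartite match: the solver minimizes cost, so we
# negate the similarities to maximize the total matched similarity.
rows, cols = linear_sum_assignment(-sim)
matched = sim[rows, cols]

# An AnnSim-style score could then normalize the matched weight, e.g. by
# the size of the larger annotation set (this normalization is assumed).
score = matched.sum() / max(sim.shape)
print(f"matched pairs: {list(zip(rows.tolist(), cols.tolist()))}, score={score:.3f}")
```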

  12. Determining similarity of scientific entities in annotation datasets.

    PubMed

    Palma, Guillermo; Vidal, Maria-Esther; Haag, Eric; Raschid, Louiqa; Thor, Andreas

    2015-01-01

    Linked Open Data initiatives have made available a diversity of scientific collections where scientists have annotated entities in the datasets with controlled vocabulary terms from ontologies. Annotations encode scientific knowledge, which is captured in annotation datasets. Determining relatedness between annotated entities becomes a building block for pattern mining, e.g. identifying drug-drug relationships may depend on the similarity of the targets that interact with each drug. A diversity of similarity measures has been proposed in the literature to compute relatedness between a pair of entities. Each measure exploits some knowledge including the name, function, relationships with other entities, taxonomic neighborhood and semantic knowledge. We propose a novel general-purpose annotation similarity measure called 'AnnSim' that measures the relatedness between two entities based on the similarity of their annotations. We model AnnSim as a 1-1 maximum weight bipartite match and exploit properties of existing solvers to provide an efficient solution. We empirically study the performance of AnnSim on real-world datasets of drugs and disease associations from clinical trials and relationships between drugs and (genomic) targets. Using baselines that include a variety of measures, we identify where AnnSim can provide a deeper understanding of the semantics underlying the relatedness of a pair of entities or where it could lead to predicting new links or identifying potential novel patterns. Although AnnSim does not exploit knowledge or properties of a particular domain, its performance compares well with a variety of state-of-the-art domain-specific measures. Database URL: http://www.yeastgenome.org/ PMID:25725057

  13. The Role of Datasets on Scientific Influence within Conflict Research.

    PubMed

    Van Holt, Tracy; Johnson, Jeffery C; Moates, Shiloh; Carley, Kathleen M

    2016-01-01

    We inductively tested if a coherent field of inquiry in human conflict research emerged in an analysis of published research involving "conflict" in the Web of Science (WoS) over a 66-year period (1945-2011). We created a citation network that linked the 62,504 WoS records and their cited literature. We performed a critical path analysis (CPA), a specialized social network analysis on this citation network (~1.5 million works), to highlight the main contributions in conflict research and to test if research on conflict has in fact evolved to represent a coherent field of inquiry. Out of this vast dataset, 49 academic works were highlighted by the CPA, suggesting a coherent field of inquiry, which means that researchers in the field acknowledge seminal contributions and share a common knowledge base. Other conflict concepts that were also analyzed, such as interpersonal conflict or conflict among pharmaceuticals, did not form their own CP. A single path formed, meaning that there was a cohesive set of ideas that built upon previous research. This is in contrast to a main path analysis of conflict from 1957-1971, where ideas did not persist: multiple paths existed and died or emerged, reflecting a lack of scientific coherence (Carley, Hummon, and Harty, 1993). The critical path consisted of a number of key features: 1) Concepts that built throughout include the notion that resource availability drives conflict, which emerged in the 1960s-1990s and continued on until 2011. More recent intrastate studies that focused on inequalities emerged from interstate studies on the democracy of peace earlier on the path. 2) Recent research on the path focused on forecasting conflict, which depends on well-developed metrics and theories to model. 3) We used keyword analysis to independently show how the CP was topically linked (i.e., through democracy, modeling, resources, and geography). Publicly available conflict datasets developed early on helped shape the

  14. The Role of Datasets on Scientific Influence within Conflict Research

    PubMed Central

    Van Holt, Tracy; Johnson, Jeffery C.; Moates, Shiloh; Carley, Kathleen M.

    2016-01-01

    We inductively tested if a coherent field of inquiry in human conflict research emerged in an analysis of published research involving “conflict” in the Web of Science (WoS) over a 66-year period (1945–2011). We created a citation network that linked the 62,504 WoS records and their cited literature. We performed a critical path analysis (CPA), a specialized social network analysis on this citation network (~1.5 million works), to highlight the main contributions in conflict research and to test if research on conflict has in fact evolved to represent a coherent field of inquiry. Out of this vast dataset, 49 academic works were highlighted by the CPA, suggesting a coherent field of inquiry, which means that researchers in the field acknowledge seminal contributions and share a common knowledge base. Other conflict concepts that were also analyzed, such as interpersonal conflict or conflict among pharmaceuticals, did not form their own CP. A single path formed, meaning that there was a cohesive set of ideas that built upon previous research. This is in contrast to a main path analysis of conflict from 1957–1971, where ideas did not persist: multiple paths existed and died or emerged, reflecting a lack of scientific coherence (Carley, Hummon, and Harty, 1993). The critical path consisted of a number of key features: 1) Concepts that built throughout include the notion that resource availability drives conflict, which emerged in the 1960s-1990s and continued on until 2011. More recent intrastate studies that focused on inequalities emerged from interstate studies on the democracy of peace earlier on the path. 2) Recent research on the path focused on forecasting conflict, which depends on well-developed metrics and theories to model. 3) We used keyword analysis to independently show how the CP was topically linked (i.e., through democracy, modeling, resources, and geography). Publicly available conflict datasets developed early on helped

  15. Scientific exploration of the moon

    NASA Technical Reports Server (NTRS)

    El-Baz, F.

    1979-01-01

    The paper reviews efforts undertaken to explore the moon and the results obtained, noting that such efforts have involved a successful interdisciplinary approach to solving a number of scientific problems. Attention is given to the interactions of astronomers, cartographers, geologists, geochemists, geophysicists, physicists, mathematicians and engineers. Earth based remote sensing and unmanned spacecraft such as the Ranger and Surveyor programs are discussed. Emphasis is given to the manned Apollo missions and the results obtained. Finally, the information gathered by these missions is reviewed with regard to how it has increased understanding of the moon, and future exploration is considered.

  16. Scientific Datasets: Discovery and Aggregation for Semantic Interpretation.

    NASA Astrophysics Data System (ADS)

    Lopez, L. A.; Scott, S.; Khalsa, S. J. S.; Duerr, R.

    2015-12-01

    One of the biggest challenges that interdisciplinary researchers face is finding suitable datasets in order to advance their science; this problem remains consistent across multiple disciplines. A surprising number of scientists, when asked what tool they use for data discovery, reply "Google", which is an acceptable solution in some cases, but not even Google can find (or cares to compile) all the data that is relevant for science, and particularly the geosciences. If a dataset is not discoverable through a well-known search provider it will remain dark data to the scientific world. For the past year, BCube, an EarthCube Building Block project, has been developing, testing and deploying a technology stack capable of data discovery at web-scale using the ultimate dataset: the Internet. This stack has two principal components, a web-scale crawling infrastructure and a semantic aggregator. The web-crawler is a modified version of Apache Nutch (the originator of Hadoop and other big data technologies) that has been improved and tailored for data and data service discovery. The second component is semantic aggregation, carried out by a python-based workflow that extracts valuable metadata and stores it in the form of triples through the use of semantic technologies. While implementing the BCube stack we have run into several challenges, such as a) scaling the project to cover big portions of the Internet at a reasonable cost, b) making sense of very diverse and non-homogeneous data, and lastly, c) extracting facts about these datasets using semantic technologies in order to make them usable for the geosciences community. Despite all these challenges we have proven that we can discover and characterize data that otherwise would have remained in the dark corners of the Internet. Having all this data indexed and 'triplelized' will enable scientists to access a trove of information relevant to their work in a more natural way. An important characteristic of the BCube stack is that all
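
    As a loose illustration of the "semantic aggregation" step, the sketch below stores crawled dataset metadata as RDF triples and queries them. The vocabulary, URLs and metadata fields are placeholders, not BCube's actual schema or workflow.

```python
from rdflib import Graph, Literal, Namespace, RDF, URIRef

# Placeholder namespace and resource URLs, invented for this sketch.
EX = Namespace("http://example.org/bcube/")
g = Graph()

# Metadata that a crawler might have extracted from a dataset landing page.
dataset = URIRef("http://example.org/datasets/sea-ice-extent")
g.add((dataset, RDF.type, EX.Dataset))
g.add((dataset, EX.title, Literal("Daily sea ice extent, 1979-2014")))
g.add((dataset, EX.format, Literal("NetCDF")))
g.add((dataset, EX.discoveredAt, Literal("http://example.org/crawl/page/42")))

# Once "triplelized", discovery becomes a graph query rather than a web search.
for s, p, o in g.triples((None, EX.format, Literal("NetCDF"))):
    print(s)
```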

  17. Clementine: Anticipated scientific datasets from the Moon and Geographos

    NASA Technical Reports Server (NTRS)

    Mcewen, A. S.

    1993-01-01

    The Clementine spacecraft mission is designed to test the performance of new lightweight and low-power detectors developed at the Lawrence Livermore National Laboratory (LLNL) for the Strategic Defense Initiative Office (SDIO). A secondary objective of the mission is to acquire useful scientific data, principally of the Moon and the near-Earth asteroid Geographos. The spacecraft will be in an elliptical polar orbit about the Moon for about 2 months beginning in February of 1994, and it will fly by Geographos on August 31. Clementine will carry seven detectors, each weighing less than about 1 kg: two Star Trackers (wide-angle), uv/vis (wide-angle), Short Wavelength IR (SWIR), Long-Wavelength IR (LWIR), and LIDAR (Laser Image Detection And Ranging) narrow-angle imaging and ranging. Additional presentations about the mission detectors and related science issues are in this volume. If fully successful, Clementine will return about 3 million lunar images, a dataset with nearly as many bits of data (uncompressed) as the first cycle of Magellan, and more than 5000 images of Geographos. The complete and efficient analysis of such large data sets requires systematic processing efforts. Described below are concepts for two such efforts for the Clementine mission: global multispectral imaging of the Moon and videos of the Geographos flyby. Other anticipated datasets for which systematic processing might be desirable include multispectral observations of Earth; LIDAR altimetry of the Moon with high-resolution imaging along each ground track; high-resolution LIDAR color along each lunar ground track, which could be used to identify potential titanium-rich deposits at scales of a few meters; and thermal IR imaging along each lunar ground track (including nighttime observations near the poles).

  18. Nanocubes for real-time exploration of spatiotemporal datasets.

    PubMed

    Lins, Lauro; Klosowski, James T; Scheidegger, Carlos

    2013-12-01

    Consider real-time exploration of large multidimensional spatiotemporal datasets with billions of entries, each defined by a location, a time, and other attributes. Are certain attributes correlated spatially or temporally? Are there trends or outliers in the data? Answering these questions requires aggregation over arbitrary regions of the domain and attributes of the data. Many relational databases implement the well-known data cube aggregation operation, which in a sense precomputes every possible aggregate query over the database. Data cubes are sometimes assumed to take a prohibitively large amount of space, and to consequently require disk storage. In contrast, we show how to construct a data cube that fits in a modern laptop's main memory, even for billions of entries; we call this data structure a nanocube. We present algorithms to compute and query a nanocube, and show how it can be used to generate well-known visual encodings such as heatmaps, histograms, and parallel coordinate plots. When compared to exact visualizations created by scanning an entire dataset, nanocube plots have bounded screen error across a variety of scales, thanks to a hierarchical structure in space and time. We demonstrate the effectiveness of our technique on a variety of real-world datasets, and present memory, timing, and network bandwidth measurements. We find that the timings for the queries in our examples are dominated by network and user-interaction latencies.
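
    The aggregation idea behind data cubes (and, in compressed hierarchical form, nanocubes) can be made concrete with a toy example: precompute counts over binned location, time and a categorical attribute, after which heatmap- or histogram-style queries reduce to lookups and sums. The sketch below uses flat pandas aggregation on synthetic data and does not reproduce the nanocube data structure itself.

```python
import numpy as np
import pandas as pd

# Toy spatiotemporal records: binned location, hour of day, device type.
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "lat_bin": rng.integers(30, 50, 10_000) // 5 * 5,
    "lon_bin": rng.integers(-120, -70, 10_000) // 5 * 5,
    "hour":    rng.integers(0, 24, 10_000),
    "device":  rng.choice(["android", "iphone"], 10_000),
})

# A flat data cube: counts precomputed for every combination of bins.
# A nanocube stores the same information in a hierarchical, shared
# structure so it stays small enough for main memory.
cube = df.groupby(["lat_bin", "lon_bin", "hour", "device"]).size().sort_index()

# Aggregate queries then reduce to index lookups plus sums, e.g. a
# heatmap of android activity between 18:00 and 23:00.
idx = pd.IndexSlice
heatmap = (cube.loc[idx[:, :, 18:23, "android"]]
               .groupby(level=["lat_bin", "lon_bin"]).sum())
print(heatmap.head())
```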

  19. The Scientific Exploration of Venus

    NASA Astrophysics Data System (ADS)

    Taylor, Fredric W.

    2014-12-01

    Part I. Views of Venus, from the Beginning to the Present Day: 1. The dawn of Venus exploration; 2. Mariner and Venera; 3. Pioneer Venus and Vega: orbiters, balloons and multi-probes; 4. Images of the surface; 5. The forgotten world; 6. Earth-based astronomy delivers a breakthrough; 7. Can't stop now; 8. Europe and Japan join in: Venus Express and Akatsuki; Part II. The Motivation to Continue the Quest: 9. Origin and evolution: the solid planet; 10. Atmosphere and ocean; 11. A volcanic world; 12. The mysterious clouds; 13. Superwinds and polar vortices; 14. The climate on Venus, past, present and future; 15. Could there be life on Venus?; Part III. Plans and Visions for the Future: 16. Solar system exploration; 17. Coming soon to a planet near you: planned Venus missions; 18. Towards the horizon: advanced technology; 19. Beyond the horizon: human expeditions; Epilogue; Appendix A. Chronology of space missions to Venus; Appendix B. Data about Venus.

  20. EpiExplorer: live exploration and global analysis of large epigenomic datasets

    PubMed Central

    2012-01-01

    Epigenome mapping consortia are generating resources of tremendous value for studying epigenetic regulation. To maximize their utility and impact, new tools are needed that facilitate interactive analysis of epigenome datasets. Here we describe EpiExplorer, a web tool for exploring genome and epigenome data on a genomic scale. We demonstrate EpiExplorer's utility by describing a hypothesis-generating analysis of DNA hydroxymethylation in relation to public reference maps of the human epigenome. All EpiExplorer analyses are performed dynamically within seconds, using an efficient and versatile text indexing scheme that we introduce to bioinformatics. EpiExplorer is available at http://epiexplorer.mpi-inf.mpg.de. PMID:23034089

  1. The Scientific Exploration of Mars

    NASA Astrophysics Data System (ADS)

    Taylor, Fredric W.

    2009-12-01

    Part I. Views of Mars, From the Beginning to the Present Day: 1. The dawn of Mars exploration; 2. The first space missions to Mars; 3. After Viking: the 20-year hiatus; 4. The modern era; Part II. The Big Science: Motivation to Continue the Quest: 5. The origin and evolution of planet Mars; 6. The changing climate of Mars; 7. The search for life; Part III. Plans and Visions for the Future: 8. The future of the unmanned programme; 9. Towards human expeditions; 10. The first footfall on Mars; Appendixes; Index.

  2. Smallsats, Cubesats and Scientific Exploration

    NASA Astrophysics Data System (ADS)

    Stofan, E. R.

    2015-12-01

    Smallsats (including Cubesats) have taken off in the aerospace research community - moving beyond simple tools for undergraduate and graduate students and into the mainstream of science research. Cubesats started the "smallsat" trend back in the late 1990s and early 2000s, with the first Cubesats launching in 2003. NASA anticipates a number of future benefits from small satellite missions, including lower costs, more rapid development, higher risk tolerance, and lower barriers to entry for universities and small businesses. The Agency's Space Technology Mission Directorate is currently addressing technology gaps in small satellite platforms, while the Science Mission Directorate pursues miniaturization of science instruments. Launch opportunities are managed through the Cubesat Launch Initiative, and the Agency manages these projects as sub-orbital payloads with little program overhead. In this session we bring together scientists and technologists to discuss the current state of the smallsat field. We explore ideas for new investments, new instruments, or new applications that NASA should be investing in to expand the utility of smallsats. We discuss the status of a NASA-directed NRC study on the utility of small satellites. Looking to the future, what does NASA need to invest in now, to enable high impact ("decadal survey" level) science with smallsats? How do we push the envelope? We anticipate smallsats will contribute significantly to a more robust exploration and science program for NASA and the country.

  3. Scientific Balloons for Venus Exploration

    NASA Astrophysics Data System (ADS)

    Cutts, James; Yavrouian, Andre; Nott, Julian; Baines, Kevin; Limaye, Sanjay; Wilson, Colin; Kerzhanovich, Viktor; Voss, Paul; Hall, Jeffery

    Almost 30 years ago, two balloons were successfully deployed into the atmosphere of Venus as an element of the VeGa - Venus Halley mission conducted by the Soviet Union. As interest in further Venus exploration grows among the established planetary exploration agencies - in Europe, Japan, Russia and the United States - use of balloons is emerging as an essential part of that investigative program. Venus balloons have been proposed in NASA's Discovery program and ESA's Cosmic Vision program and are a key element in NASA's strategic plan for Venus exploration. At JPL, the focus for the last decade has been on the development of a 7 m diameter superpressure balloon (twice the diameter of the VeGa balloons) capable of carrying a 100 kg payload (14 times that of the VeGa balloons), operating for more than 30 days (15 times the 2 day flight duration of the VeGa balloons) and transmitting up to 20 Mbit of data (300 times that of the VeGa balloons). This new generation of balloons must tolerate day-night transitions on Venus as well as extended exposure to the sulfuric acid environment. These constant altitude balloons, operating at an altitude of about 55 km on Venus where temperatures are benign, can also deploy sondes to sound the atmosphere beneath the probe and deliver deep sondes equipped to survive and operate down to the surface. The technology for these balloons is now maturing rapidly and we are now looking forward to the prospects for altitude control balloons that can cycle repeatedly through the Venus cloud region. One concept, which has been used for tropospheric profiling in Antarctica, is the pumped-helium balloon, which has heritage to the anchor balloon and would be best adapted for flight above the 55 km level. Phase change balloons, which use the atmosphere as a heat engine, can be used to investigate the lower cloud region down to 30 km. Progress in components for high temperature operation may also enable investigation of the deep atmosphere of Venus with metal-based balloons.

  4. Unified Access Architecture for Large-Scale Scientific Datasets

    NASA Astrophysics Data System (ADS)

    Karna, Risav

    2014-05-01

    Data-intensive sciences have to deploy diverse large-scale database technologies for data analytics, as scientists are now dealing with much larger volumes of data than ever before. While array databases have bridged many gaps between the needs of data-intensive research fields and DBMS technologies (Zhang 2011), invocation of other big data tools accompanying these databases is still manual and separate from the database management system's interface. We identify this as an architectural challenge that will increasingly complicate the user's workflow owing to the growing number of useful but isolated and niche database tools. Such use of data analysis tools in effect leaves the burden on the user to synchronize the results from other data manipulation and analysis tools with the database management system. To this end, we propose a unified access interface for using big data tools within a large-scale scientific array database, using the database queries themselves to embed foreign routines belonging to the big data tools. Such an invocation of foreign data manipulation routines inside a query into a database can be made possible through a user-defined function (UDF). UDFs that allow such levels of freedom as to call modules from another language and interface back and forth between the query body and the side-loaded functions would be needed for this purpose. For the purpose of this research we attempt to couple four widely used tools, Hadoop (hadoop1), Matlab (matlab1), R (r1) and ScaLAPACK (scalapack1), with the UDF feature of rasdaman (Baumann 98), an array-based data manager, to investigate this concept. The native array data model used by an array-based data manager provides compact data storage and high-performance operations on ordered data such as spatial data, temporal data, and matrix-based data for linear algebra operations (scidbusr1). Performance issues arising due to coupling of tools with different paradigms, niche functionalities, separate processes and output
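
    The pattern described above, embedding a foreign routine in a query through a user-defined function, can be illustrated with a deliberately simple analogy. The sketch below uses SQLite's UDF mechanism in Python rather than rasdaman's query/UDF interface, and the table, function and data are invented; it only shows how a query can call out to externally defined code.

```python
import sqlite3
import statistics

# Stand-in for a "foreign routine": any external library call (NumPy, an
# R bridge, a job submission to a cluster) could live inside this function.
def zscore(value, mean, stdev):
    return (value - mean) / stdev if stdev else 0.0

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE samples (t REAL, v REAL)")
con.executemany("INSERT INTO samples VALUES (?, ?)",
                [(i, float(i % 7)) for i in range(100)])

values = [row[0] for row in con.execute("SELECT v FROM samples")]
mu, sigma = statistics.fmean(values), statistics.pstdev(values)

# Register the routine and call it directly from the query text: the query
# itself embeds the foreign computation, which is the unified-access idea.
con.create_function("zscore", 3, zscore)
rows = con.execute("SELECT t, zscore(v, ?, ?) FROM samples LIMIT 5",
                   (mu, sigma)).fetchall()
print(rows)
```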

  5. Salt Crystals: Exploring the Scientific Method.

    ERIC Educational Resources Information Center

    McBride, John; Villanueva, Roy

    1997-01-01

    Describes an activity in which students apply the scientific method as they explore each step of crystal growing. Students select variables, record daily observations, and participate in discussions about the differences in crystal formation. Crystal recipe and procedures are provided. (DDR)

  6. Let's Find out! Preschoolers as Scientific Explorers

    ERIC Educational Resources Information Center

    Brenneman, Kimberly

    2009-01-01

    Scientific explorers, both tall and small, ask questions about objects, living things, and events that interest or puzzle them. They seek answers by examining the world in specific ways that allow them to understand more about it. Young children are often described as natural scientists. They earn this description because they engage in many of the…

  7. Apollo scientific exploration of the moon

    NASA Technical Reports Server (NTRS)

    Compton, W. D.

    1987-01-01

    The fundamental dichotomy of space exploration, unmanned versus manned projects, is discussed from an historical perspective. The integration of science into Apollo operations is examined with attention given to landing sites, extending the missions, and crew selection. A Science Working Group composed of scientists and Manned Spacecraft Center flight planners was formed in an attempt to produce the most scientific information possible within those operational limits that were considered absolutely inviolable.

  8. Manned flight and planetary scientific exploration.

    NASA Astrophysics Data System (ADS)

    Muller, Christian; Moreau, Didier

    2014-05-01

    Human explorers had a fundamental role in the success of the APOLLO moon programme: they were at the same time the indispensable pilots, the scientific operators and, on the last missions, the lead scientists. Since then, humans have neither returned to the moon nor landed on Mars, but manned operation centres on the earth are now conducting extensive telescience on both celestial bodies. Manned flights to the moon, Mars and asteroids are however still on the agenda, and even if the main drive of these projects is outside science, it falls to planetary scientists both to prepare the databases necessary for these flights and to ensure that the scientific advantage of conducting research in real time and in situ is exploited to the maximum. The current manned flight programme in Europe concentrates on the use of the International Space Station; the scientific activities can be roughly divided between the pressurized payloads and the external payloads. Technology developments also occur in parallel and prepare new exploration techniques. The current planning leads to exploitation up to 2020, but the space agencies are studying further extensions, the date of 2028 having already been considered. The relation of these programmes to future manned planetary exploration will be described from both the science and development points of view. The complementary role of astronauts and ground operation centres will be described on the basis of the current experience of operation centres managing the International Space Station. Finally, the NASA ORION project of exploration in the solar system will be described, with emphasis on its current European participation. The science opportunities presented by independent ventures such as Inspiration Mars or Mars One will be presented.

  9. Scientific field training for human planetary exploration

    NASA Astrophysics Data System (ADS)

    Lim, D. S. S.; Warman, G. L.; Gernhardt, M. L.; McKay, C. P.; Fong, T.; Marinova, M. M.; Davila, A. F.; Andersen, D.; Brady, A. L.; Cardman, Z.; Cowie, B.; Delaney, M. D.; Fairén, A. G.; Forrest, A. L.; Heaton, J.; Laval, B. E.; Arnold, R.; Nuytten, P.; Osinski, G.; Reay, M.; Reid, D.; Schulze-Makuch, D.; Shepard, R.; Slater, G. F.; Williams, D.

    2010-05-01

    Forthcoming human planetary exploration will require increased scientific return (both in real time and post-mission), longer surface stays, greater geographical coverage, longer and more frequent EVAs, and more operational complexities than during the Apollo missions. As such, there is a need to shift the nature of astronauts' scientific capabilities to something akin to an experienced terrestrial field scientist. To achieve this aim, the authors present a case that astronaut training should include an Apollo-style curriculum based on traditional field school experiences, as well as full immersion in field science programs. Herein we propose four Learning Design Principles (LDPs) focused on optimizing astronaut learning in field science settings. The LDPs are as follows: LDP#1: Provide multiple experiences: varied field science activities will hone astronauts' abilities to adapt to novel scientific opportunities LDP#2: Focus on the learner: fostering intrinsic motivation will orient astronauts towards continuous informal learning and a quest for mastery LDP#3: Provide a relevant experience - the field site: field sites that share features with future planetary missions will increase the likelihood that astronauts will successfully transfer learning LDP#4: Provide a social learning experience - the field science team and their activities: ensuring the field team includes members of varying levels of experience engaged in opportunities for discourse and joint problem solving will facilitate astronauts' abilities to think and perform like a field scientist. The proposed training program focuses on the intellectual and technical aspects of field science, as well as the cognitive manner in which field scientists experience, observe and synthesize their environment. The goal of the latter is to help astronauts develop the thought patterns and mechanics of an effective field scientist, thereby providing a broader base of experience and expertise than could be achieved

  10. Scientific rationale for Saturn's in situ exploration

    NASA Astrophysics Data System (ADS)

    Mousis, O.; Fletcher, L. N.; Lebreton, J.-P.; Wurz, P.; Cavalié, T.; Coustenis, A.; Courtin, R.; Gautier, D.; Helled, R.; Irwin, P. G. J.; Morse, A. D.; Nettelmann, N.; Marty, B.; Rousselot, P.; Venot, O.; Atkinson, D. H.; Waite, J. H.; Reh, K. R.; Simon, A. A.; Atreya, S.; André, N.; Blanc, M.; Daglis, I. A.; Fischer, G.; Geppert, W. D.; Guillot, T.; Hedman, M. M.; Hueso, R.; Lellouch, E.; Lunine, J. I.; Murray, C. D.; O`Donoghue, J.; Rengel, M.; Sánchez-Lavega, A.; Schmider, F.-X.; Spiga, A.; Spilker, T.; Petit, J.-M.; Tiscareno, M. S.; Ali-Dib, M.; Altwegg, K.; Bolton, S. J.; Bouquet, A.; Briois, C.; Fouchet, T.; Guerlet, S.; Kostiuk, T.; Lebleu, D.; Moreno, R.; Orton, G. S.; Poncy, J.

    2014-12-01

    Remote sensing observations meet some limitations when used to study the bulk atmospheric composition of the giant planets of our solar system. A remarkable example of the superiority of in situ probe measurements is illustrated by the exploration of Jupiter, where key measurements such as the determination of the noble gases' abundances and the precise measurement of the helium mixing ratio have only been made available through in situ measurements by the Galileo probe. This paper describes the main scientific goals to be addressed by the future in situ exploration of Saturn, placing the Galileo probe exploration of Jupiter in a broader context, before the future probe exploration of the more remote ice giants. In situ exploration of Saturn's atmosphere addresses two broad themes that are discussed throughout this paper: first, the formation history of our solar system and second, the processes at play in planetary atmospheres. In this context, we detail the reasons why measurements of Saturn's bulk elemental and isotopic composition would place important constraints on the volatile reservoirs in the protosolar nebula. We also show that the in situ measurement of CO (or any other disequilibrium species that is depleted by reaction with water) in Saturn's upper troposphere may help constrain its bulk O/H ratio. We compare predictions of Jupiter and Saturn's bulk compositions from different formation scenarios, and highlight the key measurements required to distinguish competing theories to shed light on giant planet formation as a common process in planetary systems with potential applications to most extrasolar systems. In situ measurements of Saturn's stratospheric and tropospheric dynamics, chemistry and cloud-forming processes will provide access to phenomena unreachable to remote sensing studies. Different mission architectures are envisaged, which would benefit from strong international collaborations, all based on an entry probe that would descend

  11. VisIVO: A Library and Integrated Tools for Large Astrophysical Dataset Exploration

    NASA Astrophysics Data System (ADS)

    Becciani, U.; Costa, A.; Ersotelos, N.; Krokos, M.; Massimino, P.; Petta, C.; Vitello, F.

    2012-09-01

    VisIVO provides an integrated suite of tools and services that can be used in many scientific fields. VisIVO development started in the Virtual Observatory framework. VisIVO allows users to meaningfully visualize highly complex, large-scale datasets and create movies of these visualizations based on distributed infrastructures. VisIVO supports high-performance, multi-dimensional visualization of large-scale astrophysical datasets. Users can rapidly obtain meaningful visualizations while preserving full and intuitive control of the relevant parameters. VisIVO consists of VisIVO Desktop - a stand-alone application for interactive visualization on standard PCs; VisIVO Server - a platform for high performance visualization; VisIVO Web - a custom designed web portal; VisIVOSmartphone - an application to exploit the VisIVO Server functionality; and the latest VisIVO feature: VisIVO Library, which allows a job running on a computational system (grid, HPC, etc.) to produce movies directly with the code's internal data arrays without the need to produce intermediate files. This is particularly important when running on large computational facilities, where the user wants to have a look at the results during the data production phase. For example, in grid computing facilities, images can be produced directly in the grid catalogue while the user code is running in a system that cannot be directly accessed by the user (a worker node). The deployment of VisIVO on the DG and gLite is carried out with the support of the EDGI and EGI-Inspire projects. Depending on the structure and size of datasets under consideration, the data exploration process could take several hours of CPU for creating customized views, and the production of movies could potentially last several days. For this reason an MPI parallel version of VisIVO could play a fundamental role in increasing performance, e.g. it could be automatically deployed on nodes that are MPI aware. A central concept in our development is thus to

  12. Accelerating the scientific exploration process with scientific workflows

    NASA Astrophysics Data System (ADS)

    Altintas, Ilkay; Barney, Oscar; Cheng, Zhengang; Critchlow, Terence; Ludaescher, Bertram; Parker, Steve; Shoshani, Arie; Vouk, Mladen

    2006-09-01

    Although an increasing amount of middleware has emerged in the last few years to achieve remote data access, distributed job execution, and data management, orchestrating these technologies with minimal overhead still remains a difficult task for scientists. Scientific workflow systems improve this situation by creating interfaces to a variety of technologies and automating the execution and monitoring of the workflows. Workflow systems provide domain-independent customizable interfaces and tools that combine different tools and technologies along with efficient methods for using them. As simulations and experiments move into the petascale regime, the orchestration of long running data and compute intensive tasks is becoming a major requirement for the successful steering and completion of scientific investigations. A scientific workflow is the process of combining data and processes into a configurable, structured set of steps that implement semi-automated computational solutions of a scientific problem. Kepler is a cross-project collaboration, co-founded by the SciDAC Scientific Data Management (SDM) Center, whose purpose is to develop a domain-independent scientific workflow system. It provides a workflow environment in which scientists design and execute scientific workflows by specifying the desired sequence of computational actions and the appropriate data flow, including required data transformations, between these steps. Currently deployed workflows range from local analytical pipelines to distributed, high-performance and high-throughput applications, which can be both data- and compute-intensive. The scientific workflow approach offers a number of advantages over traditional scripting-based approaches, including ease of configuration, improved reusability and maintenance of workflows and components (called actors), automated provenance management, 'smart' re-running of different versions of workflow instances, on-the-fly updateable parameters

  13. SciSpark's SRDD : A Scientific Resilient Distributed Dataset for Multidimensional Data

    NASA Astrophysics Data System (ADS)

    Palamuttam, R. S.; Wilson, B. D.; Mogrovejo, R. M.; Whitehall, K. D.; Mattmann, C. A.; McGibbney, L. J.; Ramirez, P.

    2015-12-01

    Remote sensing data and climate model output are multi-dimensional arrays of massive sizes locked away in heterogeneous file formats (HDF5/4, NetCDF 3/4) and metadata models (HDF-EOS, CF), making it difficult to perform multi-stage, iterative science processing since each stage requires writing and reading data to and from disk. We have developed SciSpark, a robust Big Data framework that extends Apache Spark for scaling scientific computations. Apache Spark improves the map-reduce implementation in Apache Hadoop for parallel computing on a cluster, by emphasizing in-memory computation, "spilling" to disk only as needed, and relying on lazy evaluation. Central to Spark is the Resilient Distributed Dataset (RDD), an in-memory distributed data structure that extends the functional paradigm provided by the Scala programming language. However, RDDs are ideal for tabular or unstructured data, and not for highly dimensional data. The SciSpark project introduces the Scientific Resilient Distributed Dataset (sRDD), a distributed-computing array structure which supports iterative scientific algorithms for multidimensional data. SciSpark processes data stored in NetCDF and HDF files by partitioning them across time or space and distributing the partitions among a cluster of compute nodes. We show usability and extensibility of SciSpark by implementing distributed algorithms for geospatial operations on large collections of multi-dimensional grids. In particular we address the problem of scaling an automated method for finding Mesoscale Convective Complexes. SciSpark provides a tensor interface to support the pluggability of different matrix libraries. We evaluate performance of the various matrix libraries in distributed pipelines, such as Nd4j and Breeze. We detail the architecture and design of SciSpark, our efforts to integrate climate science algorithms, parallel ingest and partitioning (sharding) of A-Train satellite observations from model grids. These
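
    The partitioning idea behind the sRDD, splitting a multidimensional dataset along time and distributing the pieces so per-timestep operations run in parallel, can be sketched with plain PySpark. The grids, threshold and operation below are placeholders; this is not SciSpark's sRDD API or its NetCDF/HDF ingest path.

```python
import numpy as np
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("srdd-sketch").getOrCreate()
sc = spark.sparkContext

# Placeholder for per-timestep grids; SciSpark's sRDD would load these
# slices from NetCDF/HDF variables instead of generating random arrays.
timesteps = 24
grids = [(t, np.random.default_rng(t).random((180, 360))) for t in range(timesteps)]

# Partition along time and distribute the partitions across the cluster.
rdd = sc.parallelize(grids, numSlices=8)

# Example per-timestep science operation: count grid cells above a
# threshold (a crude stand-in for detecting convective features).
flagged = rdd.mapValues(lambda grid: int((grid > 0.99).sum()))
print(sorted(flagged.collect())[:5])

spark.stop()
```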

  14. Leveraging Domain Knowledge to Facilitate Visual Exploration of Large Population Datasets

    PubMed Central

    Hsu, William; Bui, Alex A.T.

    2013-01-01

    Observational patient data provides an unprecedented opportunity to glean new insights into diseases and assess patient quality of care, but a challenge lies in matching our ability to collect data with a comparable ability to understand and apply this information. Visual analytic techniques are promising as they permit the exploration and manipulation of complex datasets through a graphical user interface. Nevertheless, current visualization tools rely on users to manually configure which aspects of the dataset are shown and how they are presented. In this paper, we describe an approach that utilizes characteristics of the data and domain knowledge to assist users with summarizing the information space of a large population. We present a representation that captures contextual information about the data and constructs that operate on this information to tailor the data’s presentation. We describe a use case of this approach in exploring a claims dataset of individuals with spinal dysraphism. PMID:24551363

  15. Ice-Penetrating Robot for Scientific Exploration

    NASA Technical Reports Server (NTRS)

    Zimmerman, Wayne; Carsey, Frank; French, Lloyd

    2007-01-01

    The cryo-hydro integrated robotic penetrator system (CHIRPS) is a partially developed instrumentation system that includes a probe designed to deeply penetrate the Europan ice sheet in a search for signs of life. The CHIRPS could also be used on Earth for similar exploration of the polar ice caps, especially at Lake Vostok in Antarctica. The CHIRPS probe advances downward either by simple melting of ice (typically for upper, non-compacted layers of an ice sheet) or by a combination of melting of ice and pumping of meltwater (typically, for deeper, compacted layers). The heat and electric power for melting, pumping, and operating all of the onboard instrumentation and electronic circuitry are supplied by radioisotope power sources (RPSs) and thermoelectric converters energized by the RPSs. The instrumentation and electronic circuitry includes miniature guidance and control sensors and an advanced autonomous control system that has fault-management capabilities. The CHIRPS probe is about 1 m long and 15 cm in diameter. The RPSs generate a total thermal power of 1.8 kW. Initially, as this power melts the surrounding ice, a meltwater jacket about 1 mm thick forms around the probe. The center of gravity of the probe is well forward (down), so that the probe is vertically stabilized like a pendulum. Heat is circulated to the nose by means of miniature pumps and heat pipes. The probe melts ice to advance in a step-wise manner: Heat is applied to the nose to open up a melt void, then heat is applied to the side to allow the probe to slip down into the melt void. The melt void behind the probe is allowed to re-freeze. Four quadrant heaters on the nose and another four quadrant heaters on the rear (upper) surface of the probe are individually controllable for steering: Turning on two adjacent heaters on the nose and two adjacent heaters on the opposite side at the rear causes melt voids to form on opposing sides, such that the probe descends at an angle from

  16. Sciologer: Visualizing and Exploring Scientific Communities

    ERIC Educational Resources Information Center

    Bales, Michael Eliot

    2009-01-01

    Despite the recognized need to increase interdisciplinary collaboration, there are few information resources available to provide researchers with an overview of scientific communities--topics under investigation by various groups, and patterns of collaboration among groups. The tools that are available are designed for expert social network…

  17. NOAA Ocean Exploration 2003: A Scientific Overview

    NASA Astrophysics Data System (ADS)

    Hammond, S. R.

    2003-12-01

    A little over three years ago, a panel of leading ocean scientists, explorers, and educators developed a national strategy for ocean exploration. Their report, "Discovering Earth's Final Frontier: A U.S. Strategy for Ocean Exploration," opened the door to a new way of thinking about ocean exploration and inspired the National Oceanic and Atmospheric Administration (NOAA) to embark on a mission to expand knowledge and appreciation of the ocean. This year, in collaboration with over 100 partners including university, international, federal, state and tribal science agencies, private research and outreach organizations, civic groups, aquariums and museums, NOAA engaged in major multidisciplinary expeditions and multiple projects around the world aimed at mapping the ocean in new ways, understanding ocean interactions, developing sensors and tools, and reaching out in new ways to stakeholders to communicate findings. Expeditions and projects undertaken this year continued to build on inaugural work in 2001 and 2002 and continue to set a precedent for high quality discovery-based ocean research and exploration. This presentation will focus on expedition highlights and future program directions.

  18. Space exploration - Scientific and technological aspects

    NASA Technical Reports Server (NTRS)

    Pilcher, Carl B.

    1993-01-01

    NASA's current plans for solar system exploration are summarized, focusing on the robotic missions planned for the next decade. The areas in which these missions engage the legal community as regards planetary protection, launch and use in space of nuclear materials, and possession of planetary resources by telepresence are considered.

  19. ConTour: Data-Driven Exploration of Multi-Relational Datasets for Drug Discovery.

    PubMed

    Partl, Christian; Lex, Alexander; Streit, Marc; Strobelt, Hendrik; Wassermann, Anne-Mai; Pfister, Hanspeter; Schmalstieg, Dieter

    2014-12-01

    Large scale data analysis is nowadays a crucial part of drug discovery. Biologists and chemists need to quickly explore and evaluate potentially effective yet safe compounds based on many datasets that are in relationship with each other. However, there is a lack of tools that support them in these processes. To remedy this, we developed ConTour, an interactive visual analytics technique that enables the exploration of these complex, multi-relational datasets. At its core ConTour lists all items of each dataset in a column. Relationships between the columns are revealed through interaction: selecting one or multiple items in one column highlights and re-sorts the items in other columns. Filters based on relationships enable drilling down into the large data space. To identify interesting items in the first place, ConTour employs advanced sorting strategies, including strategies based on connectivity strength and uniqueness, as well as sorting based on item attributes. ConTour also introduces interactive nesting of columns, a powerful method to show the related items of a child column for each item in the parent column. Within the columns, ConTour shows rich attribute data about the items as well as information about the connection strengths to other datasets. Finally, ConTour provides a number of detail views, which can show items from multiple datasets and their associated data at the same time. We demonstrate the utility of our system in case studies conducted with a team of chemical biologists, who investigate the effects of chemical compounds on cells and need to understand the underlying mechanisms.
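
    As an illustration of the sorting strategies mentioned above (not the ConTour implementation itself), the sketch below ranks items of one hypothetical dataset by connectivity strength and by a simple uniqueness score; the compound and gene identifiers are invented.

```python
# Illustrative sketch only: rank items in one "column" by connectivity strength
# (number of related items in another dataset) and by uniqueness (preferring
# items whose relations are shared by few others).
from collections import defaultdict

# Hypothetical compound -> {affected genes} relation table.
relations = {
    "compound_A": {"g1", "g2", "g3"},
    "compound_B": {"g2"},
    "compound_C": {"g1", "g4", "g5", "g6"},
}

# Connectivity strength: more related items ranks higher.
by_strength = sorted(relations, key=lambda c: len(relations[c]), reverse=True)

# Uniqueness: weight each relation by how few other items share it.
gene_degree = defaultdict(int)
for genes in relations.values():
    for g in genes:
        gene_degree[g] += 1
by_uniqueness = sorted(
    relations,
    key=lambda c: sum(1.0 / gene_degree[g] for g in relations[c]),
    reverse=True,
)
print(by_strength, by_uniqueness)
```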

  20. Space Exploration as a Human Enterprise: The Scientific Interest

    ERIC Educational Resources Information Center

    Sagan, Carl

    1973-01-01

    Presents examples which illustrate the importance of space exploration in diverse aspects of scientific knowledge. Indicates that human beings are today not wise enough to anticipate the practical benefits of planetary studies. (CC)

  1. The Role of the Spacecraft Operator in Scientific Exploration

    NASA Astrophysics Data System (ADS)

    Love, S. G.

    2011-03-01

    Pilot and flight engineer crew members can improve scientific exploration missions and effectively support field work that they may not understand by contributing leadership, teamwork, communication, and operational thinking skills.

  2. Exploring Venus: Major scientific issues and directions

    NASA Astrophysics Data System (ADS)

    Esposito, Larry W.; Stofan, Ellen R.; Cravens, Thomas E.

    Venus has been a prime target of space exploration since the launch of Venera-1 in 1961. In 1962, Mariner 2 determined that the surface of Venus is hot, providing the first confirmation of its immense greenhouse effect. Venus has now been visited by numerous flybys, orbiters, atmospheric probes, landers, and balloons! Magellan's radar pierced the planet-encircling clouds to provide a global map of the Venus surface. Table 1 lists the chronology of Venus missions. Despite the numerous missions, the Venus environment provides a difficult target, and many significant questions remain unanswered. The state of current knowledge, the open questions, and ways to address them are discussed in the following chapters.

  3. Scientific objectives of human exploration of Mars

    USGS Publications Warehouse

    Carr, M.H.

    1996-01-01

    While human exploration of Mars is unlikely to be undertaken for science reasons alone, science will be the main beneficiary. A wide range of science problems can be addressed at Mars. The planet formed in a different part of the solar system from the Earth and retains clues concerning compositional and environmental conditions in that part of the solar system when the planets formed. Mars has had a long and complex history that has involved almost as wide a range of processes as occurred on Earth. Elucidation of this history will require a comprehensive program of field mapping, geophysical sounding, in situ analyses, and return of samples to Earth that are representative of the planet's diversity. The origin and evolution of the Martian atmosphere are very different from the Earth's, Mars having experienced major secular and cyclical changes in climate. Clues as to precisely how the atmosphere has evolved are embedded in its present chemistry, possibly in surface sinks of former atmosphere-forming volatiles, and in the various products of interaction between the atmosphere and surface. The present atmosphere also provides a means of testing general circulation models applicable to all planets. Although life is unlikely to be still extant on Mars, life may have started early in the planet's history. A major goal of any future exploration will, therefore, be to search for evidence of indigenous life.

  4. Future scientific exploration of Taurus-Littrow

    NASA Technical Reports Server (NTRS)

    Taylor, G. Jeffrey

    1992-01-01

    The Apollo 17 site was surveyed with great skill and the collected samples have been studied thoroughly (but not completely) in the 20 years since. Ironically, the success of the field and sample studies makes the site an excellent candidate for a return mission. Rather than solving all the problems, the Apollo 17 mission provided a set of sophisticated questions that can be answered only by returning to the site and exploring further. This paper addresses the major unsolved problems in lunar science and points out the units at the Apollo 17 site that are most suitable for addressing each problem. It then discusses how crucial data can be obtained by robotic rovers and human field work. I conclude that, in general, the most important information can be obtained only by human exploration. The paper ends with some guesses about what we could have learned at the Apollo 17 site from a fairly sophisticated rover capable of in situ analyses, instead of sending people.

  5. Scientific preparations for lunar exploration with the European Lunar Lander

    NASA Astrophysics Data System (ADS)

    Carpenter, J. D.; Fisackerly, R.; De Rosa, D.; Houdou, B.

    2012-12-01

    Recent Lunar missions and new scientific results in multiple disciplines have shown that working and operating in the complex lunar environment and exploiting the Moon as a platform for scientific research and further exploration poses major challenges. Underlying these challenges are fundamental scientific unknowns regarding the Moon's surface, its environment, the effects of this environment and the availability of potential resources. The European Lunar Lander is a mission proposed by the European Space Agency to prepare for future exploration. The mission provides an opportunity to address some of these key unknowns and provide information of importance for future exploration activities. Areas of particular interest for investigation on the Lunar Lander include the integrated plasma, dust, charge and radiation environment and its effects, the properties of lunar dust and its physical effects on systems and physiological effects on humans, the availability, distribution and potential application of in situ resources for future exploration. A model payload has then been derived, taking these objectives into account and considering potential payloads proposed through a request for information, and the mission's boundary conditions. While exploration preparation has driven the definition, there is a significant synergy with investigations associated with fundamental scientific questions. This paper discusses the scientific objectives for the ESA Lunar Lander Mission, which emphasise human exploration preparatory science and introduces the model scientific payload considered as part of the on-going mission studies, in advance of a formal instrument selection.

  6. Future Visions for Scientific Human Exploration

    NASA Technical Reports Server (NTRS)

    Garvin, James

    2005-01-01

    Today, humans explore deep-space locations such as Mars, asteroids, and beyond, vicariously here on Earth, with noteworthy success. However, to achieve the revolutionary breakthroughs that have punctuated the history of science since the dawn of the Space Age has always required humans as "the discoverers," as Daniel Boorstin contends in his book of the same name. During Apollo 17, human explorers on the lunar surface discovered the "genesis rock," orange glass, and humans in space revamped the optically crippled Hubble Space Telescope to enable some of the greatest astronomical discoveries of all time. Science-driven human exploration is about developing the opportunities for such events, perhaps associated with challenging problems such as whether we can identify life beyond Earth within the universe. At issue, however, is how to safely insert humans and the spaceflight systems required to allow humans to operate as they do best in the hostile environment of deep space. The first issue is minimizing the problems associated with human adaptation to the most challenging aspects of deep space: space radiation and microgravity (or non-Earth gravity). One solution path is to develop technologies that allow for minimization of the exposure time of people to deep space, as was accomplished in Apollo. For a mission to the planet Mars, this might entail new technological solutions for in-space propulsion that would make possible time-minimized transfers to and from Mars. The problem of rapid, reliable in-space transportation is challenged by the celestial mechanics of moving in space and the so-called "rocket equation." To travel to Mars from Earth in less than the time fuel-minimizing trajectories allow (i.e., Hohmann transfers) requires an exponential increase in the amount of fuel. Thus, month-long transits would require a mass of fuel as large as the dry mass of the ISS, assuming the existence of continuous acceleration engines. This raises the largest technological

  7. The Nature and Assessment of Scientific Explorations in the Classroom.

    ERIC Educational Resources Information Center

    Swain, J. R. L.

    1991-01-01

    The development of scientific explorations in schools and a framework for their assessment within the context of Britain's Graded Assessment in Science Project (GASP) scheme is described. The criteria for assessing the planning, implementing, concluding, and evaluating of explorations are provided. (KR)

  8. PhotoCloud: Interactive remote exploration of joint 2D and 3D datasets.

    PubMed

    Brivio, Paolo; Benedetti, Luca; Tarini, Marco; Ponchio, Federico; Cignoni, Paolo; Scopigno, Roberto

    2013-01-01

    PhotoCloud is a real-time client-server system for interactive visualization and exploration of large datasets comprising thousands of calibrated 2D photographs of a scene and a complex 3D description of the scene. The system isn't tailored to any specific data acquisition process; it aims at generality and flexibility. PhotoCloud achieves scalability through a multiresolution dynamic hierarchical representation of the data, which is remotely stored and accessed by the client through an efficient cache system. The system includes a compact image browser and a multiresolution model renderer. PhotoCloud employs iconic visualization of the images in the 3D space and projects images onto the 3D scene on the fly. Users can navigate the 2D and 3D spaces with smooth, integrated, seamless transitions between them. A study with differently skilled users confirms PhotoCloud's effectiveness and communication power. The Web extras at http://www.youtube.com/playlist?list=PLHJB2bhmgB7cmYD0ST9CEDMRv1JlX4xPH are videos demonstrating PhotoCloud, a real-time client-server system for interactive exploration of large datasets comprising 2D photos and 3D models.

  9. Evaluating Techniques for Interactive Exploration and Visualization of Large Astronomical Datasets

    NASA Astrophysics Data System (ADS)

    Boch, T.; Pineau, F.-X.; Blegean, J.

    2015-09-01

    As large surveys of hundreds of millions of objects are now common, helping users locate their data subset of interest through interactive exploration and visualization is becoming a challenge of major concern. In this paper, we present two prototypes we developed to tackle this issue. Using Datavore and D3, we developed a pure-Javascript SPLOM (scatter plot matrix) visualizer taking a VOTable as an input. Linked views allow one to distinguish correlations between displayed attributes. This approach works well up to 50k-100k objects, but does not scale beyond that because of browser limitations. For larger datasets, we adapted the Nanocubes data structure initially created by AT&T Research for interactive visualization of spatial-temporal datasets. Our version, developed in Java, allows fast interactive visualization of a catalogue with a hundred million rows for a few attributes. HiPS (Hierarchical Progressive Surveys) heatmaps are dynamically generated according to selected criteria and displayed in Aladin Lite. Finally, we discuss the benefits and limitations of these approaches, explore possible improvements and describe how these techniques might be integrated in existing CDS services.
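
    For readers unfamiliar with SPLOMs, the following Python sketch draws a scatter plot matrix over a small synthetic catalogue; it stands in for neither the Datavore/D3 prototype nor the Nanocubes-based tool, and the column names are invented.

```python
# Minimal static SPLOM over a synthetic catalogue using pandas/matplotlib.
# The hundred-million-row case in the paper relies on pre-aggregation
# (Nanocubes/HiPS), which is not reproduced here.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import scatter_matrix

rng = np.random.default_rng(0)
n = 20_000  # roughly the regime where a browser-side SPLOM remains usable
catalogue = pd.DataFrame({
    "mag_g": rng.normal(20.0, 1.5, n),
    "mag_r": rng.normal(19.0, 1.5, n),
    "parallax": rng.exponential(1.0, n),
})
scatter_matrix(catalogue, alpha=0.05, s=1, figsize=(6, 6), diagonal="hist")
plt.savefig("splom.png", dpi=150)
```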

  10. Language, Space, Time: Anthropological Tools and Scientific Exploration on Mars

    NASA Technical Reports Server (NTRS)

    Wales, Roxana

    2005-01-01

    This viewgraph presentation reviews the importance of social science disciplines in the scientific exploration of Mars. The importance of language, workspace, and time differences is reviewed. The social scientist's perspective appears to have been useful in developing a completely new workspace, keeping track of new vocabulary, and coordinating across different time zones (i.e., terrestrial and Martian).

  11. Internet Activities Using Scientific Data. A Self-Guided Exploration.

    ERIC Educational Resources Information Center

    Froseth, Stan; Poppe, Barbara

    This guide is intended for the secondary school teacher (especially math or science) or the student who wants to access and learn about scientific data on the Internet. It is organized as a self-guided exploration. Nine exercises enable the user to access and analyze on-line information from the National Oceanic and Atmospheric Administration…

  12. Interactive visual exploration of a large spatio-temporal dataset: reflections on a geovisualization mashup.

    PubMed

    Wood, Jo; Dykes, Jason; Slingsby, Aidan; Clarke, Keith

    2007-01-01

    Exploratory visual analysis is useful for the preliminary investigation of large structured, multifaceted spatio-temporal datasets. This process requires the selection and aggregation of records by time, space and attribute, the ability to transform data and the flexibility to apply appropriate visual encodings and interactions. We propose an approach inspired by geographical 'mashups' in which freely-available functionality and data are loosely but flexibly combined using de facto exchange standards. Our case study combines MySQL, PHP and the LandSerf GIS to allow Google Earth to be used for visual synthesis and interaction with encodings described in KML. This approach is applied to the exploration of a log of 1.42 million requests made of a mobile directory service. Novel combinations of interaction and visual encoding are developed including spatial 'tag clouds', 'tag maps', 'data dials' and multi-scale density surfaces. Four aspects of the approach are informally evaluated: the visual encodings employed, their success in the visual exploration of the dataset, the specific tools used and the 'mashup' approach. Preliminary findings will be beneficial to others considering using mashups for visualization. The specific techniques developed may be more widely applied to offer insights into the structure of multifarious spatio-temporal data of the type explored here.
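
    The "mashup" idea of emitting a de facto exchange format for an external viewer can be sketched in a few lines. The example below is not the authors' MySQL/PHP/LandSerf code; it simply writes aggregated request counts as KML placemarks that a client such as Google Earth could display, and the place names, coordinates and counts are invented.

```python
# Sketch: aggregate records, then emit KML (a de facto exchange standard)
# for rendering in an external geobrowser. All values are placeholders.
from xml.sax.saxutils import escape

aggregated = [  # (place, lon, lat, number of directory requests) -- invented
    ("Leeds", -1.55, 53.80, 9214),
    ("York", -1.08, 53.96, 3127),
]

placemarks = []
for name, lon, lat, count in aggregated:
    placemarks.append(
        f"<Placemark><name>{escape(name)} ({count})</name>"
        f"<Point><coordinates>{lon},{lat},0</coordinates></Point></Placemark>"
    )

kml = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<kml xmlns="http://www.opengis.net/kml/2.2"><Document>'
    + "".join(placemarks) + "</Document></kml>"
)
with open("requests.kml", "w") as f:
    f.write(kml)
```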

  13. Adventures in supercomputing: Scientific exploration in an era of change

    SciTech Connect

    Gentry, E.; Helland, B.; Summers, B.

    1997-11-01

    Students deserve the opportunity to explore the world of science surrounding them. Therefore it is important that scientific exploration and investigation be a part of each student's educational career. The Department of Energy's Adventures in Supercomputing (AiS) takes students beyond mere scientific literacy to a rich embodiment of scientific exploration. AiS provides today's science and math students with a greater opportunity to investigate science problems, propose solutions, explore different methods of solving the problem, organize their work into a technical paper, and present their results. Students learn at different rates in different ways. Science classes with students having varying learning styles and levels of achievement have always been a challenge for teachers. The AiS "hands-on, minds-on" project-based method of teaching science meets the challenge of this diversity head on! AiS uses the development of student-chosen projects as the means of achieving a lifelong enthusiasm for scientific proficiency. One goal of AiS is to emulate the research that takes place in the everyday environment of scientists. Students work in teams and often collaborate with students nationwide. With the help of mentors from the academic and scientific community, students pose a problem in science, investigate possible solutions, design a mathematical and computational model for the problem, exercise the model to achieve results, and evaluate the implications of the results. The students then have the opportunity to present the project to their peers, teachers, and scientists. Using this inquiry-based technique, students learn more than science skills; they learn to reason and think -- going well beyond the National Science Education Standards. The teacher becomes a resource person actively working together with the students in their quest for scientific knowledge.

  14. Exploration of Antarctic Subglacial Aquatic Environments: Environmental and Scientific Stewardship

    NASA Astrophysics Data System (ADS)

    White, J. W.; Hobbie, J. E.; Baker, A.; Clarke, G.; Doran, P. T.; Karl, D.; Methe, B.; Miller, H.; Mukasa, S. B.; Race, M.; Vincent, W.; Walton, D.; Uhle, M.

    2007-12-01

    Antarctica is renowned for its extreme cold; yet surprisingly, there is liquid water at the base of the Antarctic ice sheet several kilometers beneath the surface. The exploration of these subglacial aquatic environments is in its initial stages, and many fundamental questions about these environments can only be answered by entering and sampling the water. Accordingly, the management of subglacial aquatic environments requires responsible environmental stewardship while allowing field research. As of early 2007, no one has yet drilled into a lake but entry within the next one or two years is likely. Thus, the challenge is to determine the best way of drilling into, extensively sampling, and monitoring these environments. While general guidelines for research in Antarctica are provided in the Antarctic Treaty, currently no clear protocols or standards for minimizing contamination have been established. At the request of the National Science Foundation (NSF), the National Research Council convened a committee to develop a set of environmental and scientific protection standards needed to responsibly explore the subglacial lake environments in Antarctica. Specifically, the committee was asked to define levels of cleanliness for equipment or devices entering subglacial aquatic environments, develop a sound scientific basis for contamination standards, and recommend the next steps needed to define an overall exploration strategy. This talk will present the findings of that committee. The committee included U.S. and international scientists, and gathered information from the global scientific community. Although a U.S. scientific advisory body produced this study, the committee hopes that its multinational makeup will be recognized and that the recommendations in this report will serve as a basis for broad international discussion about environmental stewardship for the exploration of subglacial aquatic environments.

  15. Exploring homogeneity of correlation structures of gene expression datasets within and between etiological disease categories.

    PubMed

    Jong, Victor L; Novianti, Putri W; Roes, Kit C B; Eijkemans, Marinus J C

    2014-12-01

    The literature shows that classifiers perform differently across datasets and that correlations within datasets affect the performance of classifiers. The question that arises is whether the correlation structures within datasets differ significantly across diseases. In this study, we evaluated the homogeneity of correlation structures within and between datasets of six etiological disease categories: inflammatory, immune, infectious, degenerative, hereditary and acute myeloid leukemia (AML). We also assessed the effect of two filtering methods, detection call and variance filtering, on correlation structures. We downloaded microarray datasets from ArrayExpress for experiments meeting predefined criteria and ended up with 12 datasets for non-cancerous diseases and six for AML. The datasets were preprocessed by a common procedure incorporating platform-specific recommendations and the two filtering methods mentioned above. Homogeneity of correlation matrices between and within datasets of etiological diseases was assessed using the Box's M statistic on permuted samples. We found that correlation structures significantly differ between datasets of the same and/or different etiological disease categories and that variance filtering eliminates more uncorrelated probesets than detection call filtering and thus renders the data highly correlated.
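
    The homogeneity test described can be illustrated with a small permutation version of Box's M for two groups. This sketch uses tiny synthetic data rather than expression matrices and is not the authors' pipeline.

```python
# Permutation test of Box's M for equality of covariance structure between
# two groups. Dimensions are deliberately tiny; real expression data would
# first need the filtering steps described in the abstract.
import numpy as np

def box_m(groups):
    k = len(groups)
    ns = [g.shape[0] for g in groups]
    N = sum(ns)
    covs = [np.cov(g, rowvar=False) for g in groups]
    pooled = sum((n - 1) * c for n, c in zip(ns, covs)) / (N - k)
    _, logdet_pooled = np.linalg.slogdet(pooled)
    m = (N - k) * logdet_pooled
    for n, c in zip(ns, covs):
        _, logdet = np.linalg.slogdet(c)
        m -= (n - 1) * logdet
    return m

rng = np.random.default_rng(1)
a = rng.multivariate_normal(np.zeros(4), np.eye(4), size=40)
b = rng.multivariate_normal(np.zeros(4), 0.5 * np.eye(4) + 0.5, size=40)

observed = box_m([a, b])
pooled_samples = np.vstack([a, b])
perm_stats = []
for _ in range(999):                      # permute group labels
    idx = rng.permutation(len(pooled_samples))
    perm_stats.append(box_m([pooled_samples[idx[:40]], pooled_samples[idx[40:]]]))
p_value = (1 + sum(s >= observed for s in perm_stats)) / (1 + len(perm_stats))
print(observed, p_value)
```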

  16. Computing Spatial Distance Histograms for Large Scientific Datasets On-the-Fly

    PubMed Central

    Kumar, Anand; Grupcev, Vladimir; Yuan, Yongke; Huang, Jin; Shen, Gang

    2014-01-01

    This paper focuses on an important query in scientific simulation data analysis: the Spatial Distance Histogram (SDH). The computation time of an SDH query using the brute force method is quadratic. Often, such queries are executed continuously over certain time periods, increasing the computation time. We propose a highly efficient approximate algorithm to compute SDH over consecutive time periods with provable error bounds. The key idea of our algorithm is to derive the statistical distribution of distances from the spatial and temporal characteristics of particles. Upon organizing the data into a Quad-tree based structure, the spatiotemporal characteristics of particles in each node of the tree are acquired to determine the particles’ spatial distribution as well as their temporal locality in consecutive time periods. We report our efforts in implementing and optimizing the above algorithm on Graphics Processing Units (GPUs) as a means to further improve the efficiency. The accuracy and efficiency of the proposed algorithm are backed by mathematical analysis and results of extensive experiments using data generated from real simulation studies. PMID:25264418
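
    For reference, the quadratic brute-force baseline that the paper improves upon can be written in a few lines of NumPy; the approximate Quad-tree/GPU algorithm itself is not reproduced here, and the particle data below are synthetic.

```python
# Brute-force Spatial Distance Histogram: O(n^2) in time and memory.
import numpy as np

def sdh_brute_force(points, bucket_width, n_buckets):
    """Histogram of all pairwise Euclidean distances."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(-1))
    iu = np.triu_indices(len(points), k=1)        # each unordered pair once
    edges = np.arange(n_buckets + 1) * bucket_width
    hist, _ = np.histogram(dists[iu], bins=edges)
    return hist

rng = np.random.default_rng(0)
atoms = rng.uniform(0.0, 10.0, size=(2000, 3))    # synthetic particle positions
print(sdh_brute_force(atoms, bucket_width=1.0, n_buckets=18))
```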

  17. Exploring Cloud Computing for Large-scale Scientific Applications

    SciTech Connect

    Lin, Guang; Han, Binh; Yin, Jian; Gorton, Ian

    2013-06-27

    This paper explores cloud computing for large-scale data-intensive scientific applications. Cloud computing is attractive because it provides hardware and software resources on demand, which relieves the burden of acquiring and maintaining a huge amount of resources that may be used only once by a scientific application. However, unlike typical commercial applications that often just require a moderate amount of ordinary resources, large-scale scientific applications often need to process enormous amounts of data in the terabyte or even petabyte range and require special high-performance hardware with low-latency connections to complete computation in a reasonable amount of time. To address these challenges, we build an infrastructure that can dynamically select high-performance computing hardware across institutions and dynamically adapt the computation to the selected resources to achieve high performance. We have also demonstrated the effectiveness of our infrastructure by building a systems biology application and an uncertainty quantification application for carbon sequestration, which can efficiently utilize data and computation resources across several institutions.

  18. Earth Exploration Toolbook Workshops: Helping Teachers and Students Analyze Web-based Scientific Data

    NASA Astrophysics Data System (ADS)

    McAuliffe, C.; Ledley, T.; Dahlman, L.; Haddad, N.

    2007-12-01

    One of the challenges faced by Earth science teachers, particularly in K-12 settings, is that of connecting scientific research to classroom experiences. Helping teachers and students analyze Web-based scientific data is one way to bring scientific research to the classroom. The Earth Exploration Toolbook (EET) was developed as an online resource to accomplish precisely that. The EET consists of chapters containing step-by-step instructions for accessing Web-based scientific data and for using a software analysis tool to explore issues or concepts in science, technology, and mathematics. For example, in one EET chapter, users download Earthquake data from the USGS and bring it into a geographic information system (GIS), analyzing factors affecting the distribution of earthquakes. The goal of the EET Workshops project is to provide professional development that enables teachers to incorporate Web-based scientific data and analysis tools in ways that meet their curricular needs. In the EET Workshops project, Earth science teachers participate in a pair of workshops that are conducted in a combined teleconference and Web-conference format. In the first workshop, the EET Data Analysis Workshop, participants are introduced to the National Science Digital Library (NSDL) and the Digital Library for Earth System Education (DLESE). They also walk through an Earth Exploration Toolbook (EET) chapter and discuss ways to use Earth science datasets and tools with their students. In a follow-up second workshop, the EET Implementation Workshop, teachers share how they used these materials in the classroom by describing the projects and activities that they carried out with students. The EET Workshops project offers unique and effective professional development. Participants work at their own Internet-connected computers, and dial into a toll-free group teleconference for step-by-step facilitation and interaction. They also receive support via Elluminate, a Web

  19. Small Explorer for Advanced Missions - cubesat for scientific mission

    NASA Astrophysics Data System (ADS)

    Pronenko, Vira; Ivchenko, Nickolay

    2015-04-01

    A class of nanosatellites is defined by the cubesat standard, primarily setting the interface to the launcher, which allows standardizing cubesat preparation and launch, thus making the projects more affordable. The majority of cubesats launched to date have been demonstration or educational missions. For scientific and other advanced missions to fully realize the potential offered by low-cost nanosatellites, there are challenges related to limitations of the existing cubesat platforms and to the availability of small yet sufficiently sensitive sensors. The new project SEAM (Small Explorer for Advanced Missions) was selected for realization within the European FP7 program to develop a set of improved critical subsystems and to construct a prototype nanosatellite in the 3U cubesat envelope for electromagnetic measurements in low Earth orbit. The SEAM consortium will develop and demonstrate in flight for the first time the concept of an electromagnetically clean nanosatellite with precision attitude determination, a flexible autonomous data acquisition system, high-bandwidth telemetry and an integrated solution for ground control and data handling. As the first demonstration, the satellite is planned to perform the Space Weather (SW) mission using novel miniature electric and magnetic sensors able to provide science-grade measurements. To enable sensitive magnetic measurements onboard, the sensors must be deployed on booms to bring them away from the spacecraft body. Other thorough yet efficient procedures will also be developed to ensure the electromagnetic cleanliness (EMC) of the spacecraft. This work is supported by EC Framework 7 funded project 607197.

  20. Crew roles and interactions in scientific space exploration

    NASA Astrophysics Data System (ADS)

    Love, Stanley G.; Bleacher, Jacob E.

    2013-10-01

    Future piloted space exploration missions will focus more on science than engineering, a change which will challenge existing concepts for flight crew tasking and demand that participants with contrasting skills, values, and backgrounds learn to cooperate as equals. In terrestrial space flight analogs such as Desert Research And Technology Studies, engineers, pilots, and scientists can practice working together, taking advantage of the full breadth of all team members' training to produce harmonious, effective missions that maximize the time and attention the crew can devote to science. This paper presents, in a format usable as a reference by participants in the field, a successfully tested crew interaction model for such missions. The model builds upon the basic framework of a scientific field expedition by adding proven concepts from aviation and human space flight, including expeditionary behavior and cockpit resource management, cooperative crew tasking and adaptive leadership and followership, formal techniques for radio communication, and increased attention to operational considerations. The crews of future space flight analogs can use this model to demonstrate effective techniques, learn from each other, develop positive working relationships, and make their expeditions more successful, even if they have limited time to train together beforehand. This model can also inform the preparation and execution of actual future space flights.

  1. Crew Roles and Interactions in Scientific Space Exploration

    NASA Technical Reports Server (NTRS)

    Love, Stanley G.; Bleacher, Jacob E.

    2013-01-01

    Future piloted space exploration missions will focus more on science than engineering, a change which will challenge existing concepts for flight crew tasking and demand that participants with contrasting skills, values, and backgrounds learn to cooperate as equals. In terrestrial space flight analogs such as Desert Research And Technology Studies, engineers, pilots, and scientists can practice working together, taking advantage of the full breadth of all team members training to produce harmonious, effective missions that maximize the time and attention the crew can devote to science. This paper presents, in a format usable as a reference by participants in the field, a successfully tested crew interaction model for such missions. The model builds upon the basic framework of a scientific field expedition by adding proven concepts from aviation and human spaceflight, including expeditionary behavior and cockpit resource management, cooperative crew tasking and adaptive leadership and followership, formal techniques for radio communication, and increased attention to operational considerations. The crews of future spaceflight analogs can use this model to demonstrate effective techniques, learn from each other, develop positive working relationships, and make their expeditions more successful, even if they have limited time to train together beforehand. This model can also inform the preparation and execution of actual future spaceflights.

  2. Scientific exploration of near-Earth objects via the Orion Crew Exploration Vehicle

    NASA Astrophysics Data System (ADS)

    Abell, P. A.; Korsmeyer, D. J.; Landis, R. R.; Jones, T. D.; Adamo, D. R.; Morrison, D. D.; Lemke, L. G.; Gonzales, A. A.; Gershman, R.; Sweetser, T. H.; Johnson, L. L.; Lu, E.

    2009-01-01

    A study in late 2006 was sponsored by the Advanced Projects Office within NASA’s Constellation Program to examine the feasibility of sending the Orion Crew Exploration Vehicle (CEV) to a near-Earth object (NEO). The ideal mission profile would involve two or three astronauts on a 90 to 180 day flight, which would include a 7 to 14 day stay for proximity operations at the target NEO. This mission would be the first human expedition to an interplanetary body beyond the Earth-Moon system and would prove useful for testing technologies required for human missions to Mars and other solar system destinations. Piloted missions to NEOs using the CEV would undoubtedly provide a great deal of technical and engineering data on spacecraft operations for future human space exploration while conducting in-depth scientific investigations of these primitive objects. The main scientific advantage of sending piloted missions to NEOs would be the flexibility of the crew to perform tasks and to adapt to situations in real time. A crewed vehicle would be able to test several different sample collection techniques and target specific areas of interest via extra-vehicular activities (EVAs) more efficiently than robotic spacecraft. Such capabilities greatly enhance the scientific return from these missions to NEOs, destinations vital to understanding the evolution and thermal histories of primitive bodies during the formation of the early solar system. Data collected from these missions would help constrain the suite of materials possibly delivered to the early Earth, and would identify potential source regions from which NEOs originate. In addition, the resulting scientific investigations would refine designs for future extraterrestrial resource extraction and utilization, and assist in the development of hazard mitigation techniques for planetary defense.

  3. A Physics MOSAIC: Scientific Skills and Explorations for Students

    NASA Astrophysics Data System (ADS)

    May, S.; Clements, C.; Erickson, P. J.; Rogers, A.

    2010-12-01

    The MOSAIC unit begins with a series of activities and lessons designed to take advantage of the large data sets MOSAIC collects continuously to teach students about measurement, uncertainty, and data analysis. The curriculum develops an intuitive approach to thinking about numbers in science, focusing on both implicit and explicit expressions of uncertainty. Our teaching unit concludes with a final research project to provide students with the opportunity to pursue an area of interest within mesospheric ozone. This project is conceived in such a way that it can be as self-directed as a teacher or student needs. Given current concern for the state of our atmosphere and ozone, MOSAIC provides a unique opportunity for student engagement in an area of scientific research that has not been extensively explored. MOSAIC data can be compared with online resources for other atmospheric, astronomical, or geophysical data, and have been analyzed for the effects of such variables as seasonal and solar flux variations, lunar phases, shuttle and rocket launches, and sudden stratospheric warming events.

  4. Exploration of Korean Students' Scientific Imagination Using the Scientific Imagination Inventory

    ERIC Educational Resources Information Center

    Mun, Jiyeong; Mun, Kongju; Kim, Sung-Won

    2015-01-01

    This article reports on the study of the components of scientific imagination and describes the scales used to measure scientific imagination in Korean elementary and secondary students. In this study, we developed an inventory, which we call the Scientific Imagination Inventory (SII), in order to examine aspects of scientific imagination. We…

  5. A dataset of metaphors from the Italian literature: exploring psycholinguistic variables and the role of context.

    PubMed

    Bambini, Valentina; Resta, Donatella; Grimaldi, Mirko

    2014-01-01

    Defining the specific role of the factors that affect metaphor processing is a fundamental step for fully understanding figurative language comprehension, either in discourse and conversation or in reading poems and novels. This study extends the currently available materials on everyday metaphorical expressions by providing the first dataset of metaphors extracted from literary texts and scored for the major psycholinguistic variables, considering also the effect of context. A set of 115 Italian literary metaphors presented in isolation (Experiment 1) and a subset of 65 literary metaphors embedded in their original texts (Experiment 2) were rated on several dimensions (word and phrase frequency, readability, cloze probability, familiarity, concreteness, difficulty and meaningfulness). Overall, literary metaphors scored around medium-low values on all dimensions in both experiments. Collected data were subjected to correlation analysis, which showed the presence of a strong cluster of variables-mainly familiarity, difficulty, and meaningfulness-when literary metaphors were presented in isolation. A weaker cluster was observed when literary metaphors were presented in the original contexts, with familiarity no longer correlating with meaningfulness. Context manipulation influenced familiarity, concreteness and difficulty ratings, which were lower in context than out of context, while meaningfulness increased. Throughout the different dimensions, the literary context seems to promote a global interpretative activity that enhances the open-endedness of the metaphor as a semantic structure constantly open to all possible interpretations intended by the author and driven by the text. This dataset will be useful for the design of future experimental studies both on literary metaphor and on the role of context in figurative meaning, combining ecological validity and aesthetic aspects of language. PMID:25244522
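
    The kind of correlation analysis reported can be illustrated as follows; the numbers are invented placeholders, not values from the dataset.

```python
# Pearson correlations among rating dimensions for metaphors presented in
# isolation (placeholder ratings only).
import pandas as pd

ratings = pd.DataFrame({
    "familiarity":    [2.1, 3.4, 1.8, 4.0, 2.9],
    "difficulty":     [5.2, 3.1, 5.8, 2.4, 4.0],
    "meaningfulness": [2.5, 4.1, 2.0, 4.6, 3.2],
})
print(ratings.corr(method="pearson").round(2))
```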

  6. Using modified self-organizing maps to explore hydrochemical and biological datasets

    NASA Astrophysics Data System (ADS)

    Pearce, A. R.; Mouser, P. J.; Stevens, L.; Watzin, M.; Druschel, G.; Hayden, N.; Rizzo, D. M.

    2010-12-01

    We present a clustering methodology that distinguishes management zones in a landfill leachate contaminated groundwater aquifer using only microbiological data for input rather than traditional physiochemical information. The self-organizing map (SOM), an artificial neural network (ANN), is commonly used as a K-means clustering method. The method outperforms many traditional clustering methods on noisy datasets (e.g. high dispersion, outliers, non-uniform cluster densities), and is appropriate when combining the multiple correlated and auto-correlated data associated with most hydrochemical research. We applied an SOM to a set of genome-based microbial community profiles created using terminal restriction fragment length polymorphism (T-RFLP) of the 16S rRNA gene sampled from groundwater monitoring wells in an aquifer contaminated with landfill leachate. We modified the existing algorithm to allow weighting of the input variables according to their relative importance, and added a post-processing radial basis function to estimate group membership between measurement locations auto-correlated in space. We statistically tested the SOM output clusters using a nonparametric MANOVA to identify an optimal number of clusters. The SOM methodology distinguished between tiers of contamination in this multi-contaminant environment using expert knowledge to guide data preprocessing and to weight the input variables. Results showed a composite delineation representative of overall groundwater contamination at the landfill based only on microbiological information. Using a small number of clusters, the SOM distinguished between background and leachate-contaminated sampling locations, whereas with a larger number of clusters it groups across a gradient of groundwater contamination. The landfill leachate application demonstrates that microbial community data can complement standard analytical analyses for the purpose of delineating spatial zones of groundwater contamination. The
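
    A highly simplified sketch of the modification described, namely a self-organizing map whose best-matching-unit search weights each input variable by an assumed relative importance, is given below; it is not the authors' code, and the data and weights are synthetic.

```python
# Minimal weighted SOM: the distance used to find the best-matching unit
# multiplies each feature's squared difference by a user-supplied weight.
import numpy as np

def train_weighted_som(X, grid=(4, 4), feature_weights=None,
                       epochs=50, lr0=0.5, sigma0=1.5, seed=0):
    rng = np.random.default_rng(seed)
    units = np.array([(i, j) for i in range(grid[0]) for j in range(grid[1])], float)
    W = rng.normal(size=(len(units), X.shape[1]))       # codebook vectors
    fw = np.ones(X.shape[1]) if feature_weights is None else np.asarray(feature_weights, float)
    for epoch in range(epochs):
        lr = lr0 * (1 - epoch / epochs)                 # decaying learning rate
        sigma = sigma0 * (1 - epoch / epochs) + 1e-3    # shrinking neighborhood
        for x in X[rng.permutation(len(X))]:
            dist = (fw * (W - x) ** 2).sum(axis=1)      # weighted distance to units
            bmu = units[np.argmin(dist)]
            h = np.exp(-((units - bmu) ** 2).sum(axis=1) / (2 * sigma ** 2))
            W += lr * h[:, None] * (x - W)              # pull units toward the sample
    return W

X = np.random.default_rng(1).normal(size=(100, 6))      # e.g. synthetic T-RFLP profiles
codebook = train_weighted_som(X, feature_weights=[2, 1, 1, 1, 0.5, 0.5])
```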

  7. A Dataset of Metaphors from the Italian Literature: Exploring Psycholinguistic Variables and the Role of Context

    PubMed Central

    Bambini, Valentina; Resta, Donatella; Grimaldi, Mirko

    2014-01-01

    Defining the specific role of the factors that affect metaphor processing is a fundamental step for fully understanding figurative language comprehension, either in discourse and conversation or in reading poems and novels. This study extends the currently available materials on everyday metaphorical expressions by providing the first dataset of metaphors extracted from literary texts and scored for the major psycholinguistic variables, considering also the effect of context. A set of 115 Italian literary metaphors presented in isolation (Experiment 1) and a subset of 65 literary metaphors embedded in their original texts (Experiment 2) were rated on several dimensions (word and phrase frequency, readability, cloze probability, familiarity, concreteness, difficulty and meaningfulness). Overall, literary metaphors scored around medium-low values on all dimensions in both experiments. Collected data were subjected to correlation analysis, which showed the presence of a strong cluster of variables—mainly familiarity, difficulty, and meaningfulness—when literary metaphors were presented in isolation. A weaker cluster was observed when literary metaphors were presented in the original contexts, with familiarity no longer correlating with meaningfulness. Context manipulation influenced familiarity, concreteness and difficulty ratings, which were lower in context than out of context, while meaningfulness increased. Throughout the different dimensions, the literary context seems to promote a global interpretative activity that enhances the open-endedness of the metaphor as a semantic structure constantly open to all possible interpretations intended by the author and driven by the text. This dataset will be useful for the design of future experimental studies both on literary metaphor and on the role of context in figurative meaning, combining ecological validity and aesthetic aspects of language. PMID:25244522

  8. A dataset of metaphors from the Italian literature: exploring psycholinguistic variables and the role of context.

    PubMed

    Bambini, Valentina; Resta, Donatella; Grimaldi, Mirko

    2014-01-01

    Defining the specific role of the factors that affect metaphor processing is a fundamental step for fully understanding figurative language comprehension, either in discourse and conversation or in reading poems and novels. This study extends the currently available materials on everyday metaphorical expressions by providing the first dataset of metaphors extracted from literary texts and scored for the major psycholinguistic variables, considering also the effect of context. A set of 115 Italian literary metaphors presented in isolation (Experiment 1) and a subset of 65 literary metaphors embedded in their original texts (Experiment 2) were rated on several dimensions (word and phrase frequency, readability, cloze probability, familiarity, concreteness, difficulty and meaningfulness). Overall, literary metaphors scored around medium-low values on all dimensions in both experiments. Collected data were subjected to correlation analysis, which showed the presence of a strong cluster of variables-mainly familiarity, difficulty, and meaningfulness-when literary metaphors were presented in isolation. A weaker cluster was observed when literary metaphors were presented in the original contexts, with familiarity no longer correlating with meaningfulness. Context manipulation influenced familiarity, concreteness and difficulty ratings, which were lower in context than out of context, while meaningfulness increased. Throughout the different dimensions, the literary context seems to promote a global interpretative activity that enhances the open-endedness of the metaphor as a semantic structure constantly open to all possible interpretations intended by the author and driven by the text. This dataset will be useful for the design of future experimental studies both on literary metaphor and on the role of context in figurative meaning, combining ecological validity and aesthetic aspects of language.

  9. Exploration of Korean Students' Scientific Imagination Using the Scientific Imagination Inventory

    NASA Astrophysics Data System (ADS)

    Mun, Jiyeong; Mun, Kongju; Kim, Sung-Won

    2015-09-01

    This article reports on the study of the components of scientific imagination and describes the scales used to measure scientific imagination in Korean elementary and secondary students. In this study, we developed an inventory, which we call the Scientific Imagination Inventory (SII), in order to examine aspects of scientific imagination. We identified three conceptual components of scientific imagination, which were composed of (1) scientific sensitivity, (2) scientific creativity, and (3) scientific productivity. We administered SII to 662 students (4th-8th grades) and confirmed validity and reliability using exploratory factor analysis and Cronbach α coefficient. The characteristics of Korean elementary and secondary students' overall scientific imagination and difference across gender and grade level are discussed in the results section.

  10. Hands-on and Online: Scientific Explorations through Distance Learning

    ERIC Educational Resources Information Center

    Mawn, Mary V.; Carrico, Pauline; Charuk, Ken; Stote, Kim S.; Lawrence, Betty

    2011-01-01

    Laboratory experiments are often considered the defining characteristic of science courses. Such activities provide students with real-world contexts for applying scientific concepts, while also allowing them to develop scientific ways of thinking and promoting an interest in science. In recent years, an increasing number of campuses have moved…

  11. The USA National Phenology Network's National Phenology Database: a multi-taxa, continental-scale dataset for scientific inquiry

    NASA Astrophysics Data System (ADS)

    Weltzin, J. F.

    2012-12-01

    The USA National Phenology Network (USA-NPN; www.usanpn.org) serves science and society by promoting a broad understanding of plant and animal phenology and the relationships among phenological patterns and all aspects of environmental change. The National Phenology Database, maintained by the USA-NPN, is experiencing steady growth in the number of data records it houses. As of August 2012, participants in the USA-NPN national-scale, multi-taxa phenology observation program Nature's Notebook had contributed over 1.3 million observation records (encompassing four and three years of observations for plants and for animals, respectively). Data are freely available at www.usanpn.org/results/data, and include FGDC-compliant metadata, data-use and data-attribution policies, vetted and documented methodologies and protocols, and version control. Quality assurance and quality control, and metadata associated with field observations (e.g., effort and method reporting, site and organism condition) are also documented. Data are also available for exploration, visualization and preliminary analysis at www.usanpn.org/results/visualizations. Participants in Nature's Notebook, who include both professional and volunteer scientists, follow vetted protocols that employ phenological "status" monitoring rather than "event" monitoring: when sampling, observers indicate the status of each phenophase (e.g., "breaking leaf buds" or "active individuals"). This approach has a number of advantages over event monitoring (including estimation of error, estimation of effort, "negative" or "absence" data, capture of multiple events and phenophase duration) and is especially well-suited for integrated multi-taxa monitoring. Further, protocols and a user interface to facilitate the description of development or abundance data (e.g., tree canopy development, animal abundance) create a robust ecological dataset. We demonstrate several types of questions that can be addressed with this observing
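
    One practical consequence of "status" monitoring is that onset dates can be bracketed by the last "no" and first "yes" observation; the toy sketch below illustrates this idea with invented observation dates, not USA-NPN data or code.

```python
# Bracketing a phenophase onset date from yes/no status observations.
from datetime import date

# (observation date, phenophase status) for one plant and one phenophase -- invented
obs = [(date(2012, 3, 1), "no"), (date(2012, 3, 9), "no"),
       (date(2012, 3, 16), "yes"), (date(2012, 3, 23), "yes")]

first_yes = next(d for d, s in obs if s == "yes")
last_no_before = max((d for d, s in obs if s == "no" and d < first_yes), default=None)
print(f"onset between {last_no_before} and {first_yes}")
```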

  12. Exploring predictive and reproducible modeling with the single-subject FIAC dataset.

    PubMed

    Chen, Xu; Pereira, Francisco; Lee, Wayne; Strother, Stephen; Mitchell, Tom

    2006-05-01

    Predictive modeling of functional magnetic resonance imaging (fMRI) has the potential to expand the amount of information extracted and to enhance our understanding of brain systems by predicting brain states, rather than emphasizing the standard spatial mapping. Based on the block datasets of Functional Imaging Analysis Contest (FIAC) Subject 3, we demonstrate the potential and pitfalls of predictive modeling in fMRI analysis by investigating the performance of five models (linear discriminant analysis, logistic regression, linear support vector machine, Gaussian naive Bayes, and a variant) as a function of preprocessing steps and feature selection methods. We found that: (1) independent of the model, temporal detrending and feature selection assisted in building a more accurate predictive model; (2) the linear support vector machine and logistic regression often performed better than either of the Gaussian naive Bayes models in terms of the optimal prediction accuracy; and (3) the optimal prediction accuracy obtained in a feature space using principal components was typically lower than that obtained in a voxel space, given the same model and same preprocessing. We show that due to the existence of artifacts from different sources, high prediction accuracy alone does not guarantee that a classifier is learning a pattern of brain activity that might be usefully visualized, although cross-validation methods do provide fairly unbiased estimates of true prediction accuracy. The trade-off between the prediction accuracy and the reproducibility of the spatial pattern should be carefully considered in predictive modeling of fMRI. We suggest that unless the experimental goal is brain-state classification of new scans on well-defined spatial features, prediction alone should not be used as an optimization procedure in fMRI data analysis.
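
    The style of comparison described, cross-validated accuracy for several classifiers with feature selection performed inside the pipeline, can be sketched with scikit-learn on synthetic data; this is not the FIAC analysis itself, and the data sizes and signal are invented.

```python
# Cross-validated comparison of linear and Gaussian naive Bayes classifiers,
# with univariate feature selection kept inside the pipeline to avoid leakage.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 500))        # 120 synthetic "scans" x 500 "voxels"
y = rng.integers(0, 2, size=120)
X[y == 1, :20] += 0.8                   # plant a weak signal in 20 voxels

models = {
    "logistic": LogisticRegression(max_iter=1000),
    "linear_svm": LinearSVC(),
    "gnb": GaussianNB(),
}
for name, model in models.items():
    pipe = make_pipeline(SelectKBest(f_classif, k=50), model)
    acc = cross_val_score(pipe, X, y, cv=5).mean()
    print(f"{name}: {acc:.2f}")
```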

  13. EEGVIS: A MATLAB Toolbox for Browsing, Exploring, and Viewing Large Datasets.

    PubMed

    Robbins, Kay A

    2012-01-01

    Recent advances in data monitoring and sensor technology have accelerated the acquisition of very large data sets. Streaming data sets from instrumentation such as multi-channel EEG recording usually must undergo substantial pre-processing and artifact removal. Even when using automated procedures, most scientists engage in laborious manual examination and processing to assure high quality data and to identify interesting or problematic data segments. Researchers also do not have a convenient method of visually assessing the effects of applying any stage in a processing pipeline. EEGVIS is a MATLAB toolbox that allows users to quickly explore multi-channel EEG and other large array-based data sets using multi-scale drill-down techniques. Customizable summary views reveal potentially interesting sections of data, which users can explore further by clicking to examine using detailed viewing components. The viewer and a companion browser are built on our MoBBED framework, which has a library of modular viewing components that can be mixed and matched to best reveal structure. Users can easily create new viewers for their specific data without any programming during the exploration process. These viewers automatically support pan, zoom, resizing of individual components, and cursor exploration. The toolbox can be used directly in MATLAB at any stage in a processing pipeline, as a plug-in for EEGLAB, or as a standalone precompiled application without MATLAB running. EEGVIS and its supporting packages are freely available under the GNU general public license at http://visual.cs.utsa.edu/eegvis.

  14. Exploring frontiers of the deep biosphere through scientific ocean drilling

    NASA Astrophysics Data System (ADS)

    Inagaki, F.; D'Hondt, S.; Hinrichs, K. U.

    2015-12-01

    Since the first deep biosphere-dedicated Ocean Drilling Program (ODP) Leg 201 using the US drill ship JOIDES Resolution in 2002, scientific ocean drilling has offered unique opportunities to expand our knowledge of the nature and extent of the deep biosphere. The latest estimate of the global subseafloor microbial biomass is ~10^29 cells, accounting for 4 Gt of carbon and ~1% of the Earth's total living biomass. The subseafloor microbial communities are evolutionarily diverse and their metabolic rates are extraordinarily slow. Nevertheless, their accumulating activity most likely plays a significant role in elemental cycles over geological time. In 2010, during Integrated Ocean Drilling Program (IODP) Expedition 329, the JOIDES Resolution explored the deep biosphere in the open-ocean South Pacific Gyre—the largest oligotrophic province on our planet. During Expedition 329, relatively high concentrations of dissolved oxygen and significantly low biomass of microbial populations were observed in the entire sediment column, indicating that (i) there is no limit to life in open-ocean sediment and (ii) a significant amount of oxygen reaches through the sediment to the upper oceanic crust. This "deep aerobic biosphere" inhabits the sediment throughout up to ~37 percent of the world's oceans. The remaining ~63 percent of the oceans consists of higher-productivity areas that contain the "deep anaerobic biosphere". In 2012, during IODP Expedition 337, the Japanese drill ship Chikyu explored coal-bearing sediments down to 2,466 meters below the seafloor off the Shimokita Peninsula, Japan. Geochemical and microbiological analyses consistently showed the occurrence of methane-producing communities associated with the coal beds. Cell concentrations in deep sediments were notably lower than those expected from the global regression line, implying that the bottom of the deep biosphere is approached in these beds. Taxonomic composition of the deep coal-bearing communities profoundly

  15. Hot Salsa: A Laboratory Exercise Exploring the Scientific Method.

    ERIC Educational Resources Information Center

    Levri, Edward P.; Levri, Maureen A.

    2003-01-01

    Presents a laboratory exercise on spicy food and body temperature that introduces the scientific method to introductory biology students. Suggests that when students perform their own experiments which they have developed, it helps with their understanding of and confidence in doing science. (Author/SOE)

  16. Ares V: Application to Solar System Scientific Exploration

    NASA Technical Reports Server (NTRS)

    Reh, Kim; Spilker, Tom; Elliott, John; Balint, Tibor; Donahue, Ben; McCormick, Dave; Smith, David B.; Tandon, Sunil; Woodcock, Gordon

    2008-01-01

    The following sections describe Ares V performance and its payoff to a wide array of potential solar system exploration missions. Application to potential Astrophysics missions is addressed in Reference 3.

  17. NASA's Mars Exploration Program: Scientific Strategy 1996 2020

    NASA Astrophysics Data System (ADS)

    Garvin, J. B.; McCleese, D. J.

    2003-07-01

    This paper describes a roadmap to the next ~20 years of Mars exploration from the NASA viewpoint. The design of the newly restructured strategy is attentive to risks and a major attempt to instill resiliency in the program.

  18. Mars scientific investigations as a precursor for human exploration.

    PubMed

    Ahlf, P; Cantwell, E; Ostrach, L; Pline, A

    2000-01-01

    In the past two years, NASA has begun to develop and implement plans for investigations on robotic Mars missions which are focused toward returning data critical for planning human missions to Mars. The Mars Surveyor Program 2001 Orbiter and Lander missions will mark the first time that experiments dedicated to preparation for human exploration will be carried out. Investigations on these missions and future missions range from characterization of the physical and chemical environment of Mars, to predicting the response of biology to the Mars environment. Planning for such missions must take into account existing data from previous Mars missions which were not necessarily focused on human exploration preparation. At the same time, plans for near term missions by the international community must be considered to avoid duplication of effort. This paper reviews data requirements for human exploration and applicability of existing data. It will also describe current plans for investigations and place them within the context of related international activities.

  19. Scientific Assessment of NASA's Solar System Exploration Roadmap

    NASA Technical Reports Server (NTRS)

    1996-01-01

    At its June 24-28, 1996, meeting, the Space Studies Board's Committee on Planetary and Lunar Exploration (COMPLEX), chaired by Ronald Greeley of Arizona State University, conducted an assessment of NASA's Mission to the Solar System Roadmap report. This assessment was made at the specific request of Dr. Jurgen Rahe, NASA's science program director for solar system exploration. The assessment includes consideration of the process by which the Roadmap was developed, comparison of the goals and objectives of the Roadmap with published National Research Council (NRC) recommendations, and suggestions for improving the Roadmap.

  20. Automatic Flushing Toilets: An Entertaining Platform for Exploring Scientific Thinking

    ERIC Educational Resources Information Center

    Blais, Brian S.

    2011-01-01

    It is often challenging, especially at the beginning of a course, to find good examples where students can actively explore and grapple with the methods of science. We want them to learn the connection between observation, theory, prediction, evidence, and falsification, but to really accomplish this we need platforms for which the students are…

  1. Freely-Available, True-Color Volume Rendering Software and Cryo-Histology Datasets for Virtual Exploration of the Temporal Bone Anatomy

    PubMed Central

    Kahrs, Lueder Alexander; Labadie, Robert Frederick

    2013-01-01

    Background Cadaveric dissection of temporal bone anatomy is not always possible or feasible in certain educational environments. Volume rendering using CT and/or MRI helps understanding of spatial relationships, but such renderings suffer from non-realistic depictions, especially regarding the color of anatomical structures. Freely-available, non-stained histological datasets and software which are able to render such datasets in realistic color could overcome this limitation and be a very effective teaching tool. Methods With recent availability of specialized public-domain software, volume rendering of true-color, histological datasets is now possible. We present both feasibility as well as step-by-step instructions to allow processing of publicly available datasets (Visible Female Human and Visible Ear) into easily navigable three-dimensional models using free software. Results Example renderings are shown to demonstrate the utility of these free methods in virtual exploration of the complex anatomy of the temporal bone. After exploring the datasets, the Visible Ear appears more natural than the Visible Human. Conclusion We provide directions for easy-to-use, open-source software in conjunction with freely available histological datasets. This work facilitates self-education of spatial relationships of anatomic structures inside the human temporal bone as well as allows exploration of surgical approaches prior to cadaveric testing and/or clinical implementation. PMID:23689270
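
    The paper's workflow relies on specialized public-domain rendering software and real histology data; as a minimal, hypothetical illustration of what exploring a true-color (RGB) image stack looks like, the sketch below slices and projects a synthetic volume with NumPy and matplotlib.

```python
# Minimal sketch of exploring a true-color image stack; the volume here is
# synthetic and this is not the rendering workflow described in the paper.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
volume = rng.integers(0, 256, size=(64, 256, 256, 3), dtype=np.uint8)  # z, y, x, RGB

fig, axes = plt.subplots(1, 3, figsize=(12, 4))
axes[0].imshow(volume[32])                 # axial slice, true color preserved
axes[1].imshow(volume[:, 128, :])          # coronal slice through the stack
axes[2].imshow(volume.max(axis=0))         # simple per-channel maximum projection
for ax, title in zip(axes, ["axial", "coronal", "max projection"]):
    ax.set_title(title)
    ax.axis("off")
plt.show()
```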

  2. Mars scientific investigations as a precursor for human exploration

    NASA Technical Reports Server (NTRS)

    Ahlf, P.; Cantwell, E.; Ostrach, L.; Pline, A.

    2000-01-01

    In the past two years, NASA has begun to develop and implement plans for investigations on robotic Mars missions which are focused toward returning data critical for planning human missions to Mars. The Mars Surveyor Program 2001 Orbiter and Lander missions will mark the first time that experiments dedicated to preparation for human exploration will be carried out. Investigations on these missions and future missions range from characterization of the physical and chemical environment of Mars, to predicting the response of biology to the Mars environment. Planning for such missions must take into account existing data from previous Mars missions which were not necessarily focused on human exploration preparation. At the same time, plans for near term missions by the international community must be considered to avoid duplication of effort. This paper reviews data requirements for human exploration and applicability of existing data. It will also describe current plans for investigations and place them within the context of related international activities. c 2000 International Astronautical Federation. Published by Elsevier Science Ltd. All rights reserved.

  3. Ares V: Application to solar system scientific exploration

    NASA Astrophysics Data System (ADS)

    Elliott, John; Spilker, Thomas; Reh, Kim; Balint, Tibor; Smith, David; Woodcock, Gordon

    2010-02-01

    The development of the Ares V launch vehicle will provide levels of performance unseen since the days of Apollo. This capability, like the Saturn V before it, is being developed primarily in support of lunar exploration missions. However, the tremendous jump in performance offered by the Ares V launch system has tremendous potential for the furtherance of robotic solar system exploration missions as well. Preliminary performance assessments indicate that Ares V could deliver 5 times the payload to Mars as compared to the most capable US expendable launch vehicle available today. Beyond Mars, the outer planets offer a number of high-priority investigations with compelling science. Presently, missions to these destinations are only achievable using indirect flights with gravity assist trajectories and, in many cases, suffer from long flight times. An Ares V with an upper stage could capture these missions using direct flights with shorter interplanetary transfer times that would enable extensive in situ investigations and possibly the return of samples to Earth. This paper lays out an estimate of Ares V performance for moderate and high C3 missions, and goes on to discuss a range of revolutionary mission concepts that could be enabled by this significant increase in launch capability.

  4. Ares V: Application to Solar System Scientific Exploration

    NASA Technical Reports Server (NTRS)

    Elliott, John; Spilker, Thomas; Reh, Kim; Smith, David; Woodcock, Gordon

    2008-01-01

    The development of the Ares V launch vehicle will provide levels of performance unseen since the days of Apollo. This capability, like the Saturn V before it, is being developed primarily for crewed lunar missions. However, the tremendous jump in performance offered by the Ares V launch system has tremendous potential for the furtherance of robotic solar system exploration missions as well. Preliminary performance assessments indicate that Ares V could deliver 5 times the payload to Mars as compared to the most capable US expendable launch vehicle available today. Beyond Mars, the outer planets offer a number of high-priority investigations with compelling science. Presently, missions to these destinations are only achievable using indirect flights with gravity assist trajectories and, in many cases, suffer from long flight times. An Ares V with an upper stage could capture these missions using direct flights with shorter interplanetary transfer times that would enable extensive in situ investigations and possibly the return of samples to Earth. This paper lays out an estimate of Ares V performance for moderate and high C3 missions, and goes on to discuss a range of revolutionary mission concepts that could be enabled by this significant increase in launch capability.

  5. Towards AN Integrated Scientific and Social Case for Human Space Exploration

    NASA Astrophysics Data System (ADS)

    Crawford, I. A.

    2004-06-01

    I will argue that an ambitious programme of human space exploration, involving a return to the Moon, and eventually human missions to Mars, will add greatly to human knowledge. Gathering such knowledge is the primary aim of science, but science’s compartmentalisation into isolated academic disciplines tends to obscure the overall strength of the scientific case. Any consideration of the scientific arguments for human space exploration must therefore take a holistic view, and integrate the potential benefits over the entire spectrum of human knowledge. Moreover, science is only one thread in a much larger overall case for human space exploration. Other threads include economic, industrial, educational, geopolitical and cultural benefits. Any responsibly formulated public space policy must weigh all of these factors before deciding whether or not an investment in human space activities is scientifically and socially desirable.

  6. Exploring the Changes in Students' Understanding of the Scientific Method Using Word Associations

    NASA Astrophysics Data System (ADS)

    Gulacar, Ozcan; Sinan, Olcay; Bowman, Charles R.; Yildirim, Yetkin

    2015-10-01

    A study is presented that explores how students' knowledge structures, as related to the scientific method, compare at different student ages. A word association test comprised of ten total stimulus words, among them experiment, science fair, and hypothesis, is used to probe the students' knowledge structures. Students from grades four, five, and eight, as well as first-year college students were tested to reveal their knowledge structures relating to the scientific method. Younger students were found to have a naïve view of the science process with little understanding of how science relates to the real world. However, students' conceptions about the scientific process appear to be malleable, with science fairs a potentially strong influencer. The strength of associations between words is observed to change from grade to grade, with younger students placing science fair near the center of their knowledge structure regarding the scientific method, whereas older students conceptualize the scientific method around experiment.

  7. Lost in space: design of experiments and scientific exploration in a Hogarth Universe.

    PubMed

    Lendrem, Dennis W; Lendrem, B Clare; Woods, David; Rowland-Jones, Ruth; Burke, Matthew; Chatfield, Marion; Isaacs, John D; Owen, Martin R

    2015-11-01

    A Hogarth, or 'wicked', universe is an irregular environment generating data to support erroneous beliefs. Here, we argue that development scientists often work in such a universe. We demonstrate that exploring these multidimensional spaces using small experiments guided by scientific intuition alone, gives rise to an illusion of validity and a misplaced confidence in that scientific intuition. By contrast, design of experiments (DOE) permits the efficient mapping of such complex, multidimensional spaces. We describe simulation tools that enable research scientists to explore these spaces in relative safety.
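
    A minimal sketch of the contrast the authors draw, assuming a hypothetical response function: a two-level full factorial design maps the space and recovers an interaction term from only eight runs, something one-factor-at-a-time experimentation would miss.

```python
# Sketch of a 2^3 full factorial design (DOE) fit with a linear model that
# includes two-factor interactions; the "true" response function is invented.
import itertools
import numpy as np

def response(x1, x2, x3, rng):
    # hidden "truth": main effects plus an x1*x2 interaction, with noise
    return 2 * x1 - x2 + 1.5 * x1 * x2 + 0.5 * x3 + rng.normal(scale=0.2)

rng = np.random.default_rng(3)
design = np.array(list(itertools.product([-1, 1], repeat=3)))   # 8 factorial runs
y = np.array([response(*run, rng) for run in design])

# model matrix: intercept, main effects, and two-factor interactions
X = np.column_stack([
    np.ones(len(design)),
    design,
    design[:, 0] * design[:, 1],
    design[:, 0] * design[:, 2],
    design[:, 1] * design[:, 2],
])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
names = ["1", "x1", "x2", "x3", "x1:x2", "x1:x3", "x2:x3"]
for n, c in zip(names, coef):
    print(f"{n:>5}: {c:+.2f}")   # the x1:x2 interaction is recovered from 8 runs
```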

  8. Scientific Exploration of Near-Earth Objects via the Crew Exploration Vehicle

    NASA Technical Reports Server (NTRS)

    Abell, Paul A.; Korsmeyer, D. J.; Landis, R. R.; Lu, E.; Adamo, D.; Jones, T.; Lemke, L.; Gonzales, A.; Gershman, B.; Morrison, D.; Sweetser, T.; Johnson, L.

    2007-01-01

    The concept of a crewed mission to a Near-Earth Object (NEO) was analyzed in depth in 1989 as part of the Space Exploration Initiative. Since that time two other studies have investigated the possibility of sending similar missions to NEOs. A more recent study has been sponsored by the Advanced Programs Office within NASA's Constellation Program. This study team has representatives from across NASA and is currently examining the feasibility of sending a Crew Exploration Vehicle (CEV) to a near-Earth object (NEO). The ideal mission profile would involve a crew of 2 or 3 astronauts on a 90 to 120 day flight, which would include a 7 to 14 day stay for proximity operations at the target NEO. One of the significant advantages of this type of mission is that it strengthens and validates the foundational infrastructure for the Vision for Space Exploration (VSE) and Exploration Systems Architecture Study (ESAS) in the run-up to the lunar sorties at the end of the next decade (approx. 2020). Sending a human expedition to a NEO, within the context of the VSE and ESAS, demonstrates the broad utility of the Constellation Program's Orion (CEV) crew capsule and Ares (CLV) launch systems. This mission would be the first human expedition to an interplanetary body outside of the cislunar system. Also, it will help NASA regain crucial operational experience conducting human exploration missions outside of low Earth orbit, which humanity has not attempted in nearly 40 years.

  9. The Texture of Educational Inquiry: An Exploration of George Herbert Mead's Concept of the Scientific.

    ERIC Educational Resources Information Center

    Franzosa, Susan Douglas

    1984-01-01

    Explores the implications of Mead's philosophic social psychology for current disputes concerning the nature of the scientific in educational studies. Mead's contextualization of the knower and the known are found to be compatible with a contemporary critique of positivist paradigms and a critical reconceptualization of educational inquiry.…

  10. Exploring the Changes in Students' Understanding of the Scientific Method Using Word Associations

    ERIC Educational Resources Information Center

    Gulacar, Ozcan; Sinan, Olcay; Bowman, Charles R.; Yildirim, Yetkin

    2015-01-01

    A study is presented that explores how students' knowledge structures, as related to the scientific method, compare at different student ages. A word association test comprised of ten total stimulus words, among them "experiment," "science fair," and "hypothesis," is used to probe the students' knowledge structures.…

  11. The Academy's Zeitgeist--Standards of Scientific Investigation: Exploring the Impact of Scholarly Work

    ERIC Educational Resources Information Center

    Mullen, Carol A.; Fauske, Janice

    2006-01-01

    The academy's zeitgeist--standards of scientific investigation--has recently come to the fore in the national arena as the dominant moral and intellectual framework for educational research. In this article, we explore the re-emergence of standards of scientific investigation as a significant shaping force in education and the scholarly culture,…

  12. Exploring Teachers' Informal Formative Assessment Practices and Students' Understanding in the Context of Scientific Inquiry

    ERIC Educational Resources Information Center

    Ruiz-Primo, Maria Araceli; Furtak, Erin Marie

    2007-01-01

    This study explores teachers' informal formative assessment practices in three middle school science classrooms. We present a model for examining these practices based on three components of formative assessment (eliciting, recognizing, and using information) and the three domains linked to scientific inquiry (epistemic frameworks, conceptual…

  13. Exploring English Language Learners (ELL) Experiences with Scientific Language and Inquiry within a Real Life Context

    ERIC Educational Resources Information Center

    Algee, Lisa M.

    2012-01-01

    English Language Learners (ELL) are often at a distinct disadvantage in receiving authentic science learning opportunities. This study explored English Language Learners (ELL) learning experiences with scientific language and inquiry within a real life context. This research was theoretically informed by sociocultural theory and literature on…

  14. Scientific Exploration of Near-Earth Objects via the Crew Exploration Vehicle

    NASA Technical Reports Server (NTRS)

    Abell, P. A.; Korsmeyer, D. J.; Landis, R. R.; Lu, E.; Adamo, D.; Jones, T.; Lemke, L.; Gonzales, A.; Gershman, B.; Morrison, D.; Sweetser, T.; Johnson, L.

    2007-01-01

    The concept of a crewed mission to a near-Earth object (NEO) has been analyzed several times in the past. A more in-depth feasibility study has been sponsored by the Advanced Projects Office within NASA's Constellation Program to examine the ability of a Crew Exploration Vehicle (CEV) to support a mission to a NEO. The nominal mission profile would involve a crew of 2 or 3 astronauts on a 90 to 120 day mission, which would include a 7 to 14 day stay for proximity operations at the target NEO.

  15. The Need for Analogue Missions in Scientific Human and Robotic Planetary Exploration

    NASA Technical Reports Server (NTRS)

    Snook, K. J.; Mendell, W. W.

    2004-01-01

    With the increasing challenges of planetary missions, and especially with the prospect of human exploration of the moon and Mars, the need for earth-based mission simulations has never been greater. The current focus on science as a major driver for planetary exploration introduces new constraints in mission design, planning, operations, and technology development. Analogue missions can be designed to address critical new integration issues arising from the new science-driven exploration paradigm. This next step builds on existing field studies and technology development at analogue sites, providing engineering, programmatic, and scientific lessons-learned in relatively low-cost and low-risk environments. One of the most important outstanding questions in planetary exploration is how to optimize the human and robotic interaction to achieve maximum science return with minimum cost and risk. To answer this question, researchers are faced with the task of defining scientific return and devising ways of measuring the benefit of scientific planetary exploration to humanity. Earth-based and space-based analogue missions are uniquely suited to answer this question. Moreover, they represent the only means for integrating science operations, mission operations, crew training, technology development, psychology and human factors, and all other mission elements prior to final mission design and launch. Eventually, success in future planetary exploration will depend on our ability to prepare adequately for missions, requiring improved quality and quantity of analogue activities. This effort demands more than simply developing new technologies needed for future missions and increasing our scientific understanding of our destinations. It requires a systematic approach to the identification and evaluation of the categories of analogue activities. This paper presents one possible approach to the classification and design of analogue missions based on their degree of fidelity in ten

  16. Lemont B. Kier: a bibliometric exploration of his scientific production and its use.

    PubMed

    Restrepo, Guillermo; Llanos, Eugenio J; Silva, Adriana E

    2013-12-01

    We thought an appropriate way to celebrate the seminal contribution of Kier is to explore his influence on science, looking for the impact of his research through the citation of his scientific production. From a bibliometric approach the impact of Kier's work is addressed as an individual within a community. Reviewing data from his curriculum vitae, as well as from the ISI Web of Knowledge (ISI), his role within the scientific community is established and the way his scientific results circulate is studied. His curriculum vitae is explored emphasising the approaches he used in his research activities and the social ties with other actors of the community. The circulation of Kier's publications in the ISI is studied as a means for spreading and installing his discourse within the community. The citation patterns found not only show the usage of Kier's scientific results, but also open the possibility to identify some characteristics of this discursive community, such as a common vocabulary and common research goals. The results show an interdisciplinary research work that consolidates a scientific community on the topic of drug discovery.

  17. Scientific Goals and Objectives for the Human Exploration of Mars: 1. Biology and Atmosphere/Climate

    NASA Technical Reports Server (NTRS)

    Levine, Joel S.; Garvin, J. B.; Anbar, A. D.; Beaty, D. W.; Bell, M. S.; Clancy, R. T.; Cockell, C. S.; Connerney, J. E.; Doran, P. T.; Delory, G.; Dickson, J. T.; Elphic, R. C.; Eppler, D. B.; Fernandez-Remolar, D. C.; Head, J. W.; Helper, M.; Gruener, J. E.; Heldmann, J.; Hipkin, V.; Lane, M. D.; Levy, J.; Moersch, J.; Ori, G. G.; Peach, L.; Poulet, F.

    2008-01-01

    To prepare for the exploration of Mars by humans, as outlined in the new national Vision for Space Exploration (VSE), the Mars Exploration Program Analysis Group (MEPAG), chartered by NASA's Mars Exploration Program (MEP), formed a Human Exploration of Mars Science Analysis Group (HEM-SAG) in March 2007. HEM-SAG was chartered to develop the scientific goals and objectives for the human exploration of Mars based on the Mars Scientific Goals, Objectives, Investigations, and Priorities [1]. The HEM-SAG is one of several humans-to-Mars scientific, engineering and mission architecture studies chartered in 2007 to support NASA's plans for the human exploration of Mars. The HEM-SAG is composed of about 30 Mars scientists representing the disciplines of Mars biology, climate/atmosphere, geology and geophysics from the U.S., Canada, England, France, Italy and Spain. MEPAG selected Drs. James B. Garvin (NASA Goddard Space Flight Center) and Joel S. Levine (NASA Langley Research Center) to serve as HEM-SAG co-chairs. The HEM-SAG team conducted 20 telecons and convened three face-to-face meetings from March through October 2007. The management of MEP and MEPAG were briefed on the HEM-SAG interim findings in May. The HEM-SAG final report was presented on-line to the full MEPAG membership and was presented at the MEPAG meeting on February 20-21, 2008. This presentation will outline the HEM-SAG biology and climate/atmosphere goals and objectives. A companion paper will outline the HEM-SAG geology and geophysics goals and objectives.

  18. Segmentation of Unstructured Datasets

    NASA Technical Reports Server (NTRS)

    Bhat, Smitha

    1996-01-01

    Datasets generated by computer simulations and experiments in Computational Fluid Dynamics tend to be extremely large and complex. It is difficult to visualize these datasets using standard techniques like Volume Rendering and Ray Casting. Object Segmentation provides a technique to extract and quantify regions of interest within these massive datasets. This thesis explores basic algorithms to extract coherent amorphous regions from two-dimensional and three-dimensional scalar unstructured grids. The techniques are applied to datasets from Computational Fluid Dynamics and from Finite Element Analysis.
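
    As a simplified analogue of the segmentation idea (the thesis targets unstructured CFD and FEA grids, whereas the sketch below uses a structured grid), regions of interest can be extracted and quantified by thresholding a scalar field and labeling its connected components.

```python
# Simplified structured-grid analogue of scalar-field segmentation:
# threshold the field, label connected regions, and quantify each region.
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(4)
x, y = np.meshgrid(np.linspace(-3, 3, 200), np.linspace(-3, 3, 200))
field = np.exp(-((x - 1) ** 2 + y ** 2)) + np.exp(-((x + 1.5) ** 2 + (y - 1) ** 2))
field += 0.05 * rng.normal(size=field.shape)            # noisy scalar field

mask = field > 0.5                                      # threshold regions of interest
labels, n_regions = ndimage.label(mask)                 # connected-component labeling
sizes = ndimage.sum(mask, labels, index=range(1, n_regions + 1))
for i, s in enumerate(sizes, start=1):
    print(f"region {i}: {int(s)} cells")                # quantify each extracted region
```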

  19. Exploring multiliteracies, student voice, and scientific practices in two elementary classrooms

    NASA Astrophysics Data System (ADS)

    Allison, Elizabeth Rowland

    This study explored the voices of children in a changing world with evolving needs and new opportunities. The workplaces of rapidly moving capitalist societies value creativity, collaboration, and critical thinking skills which are of growing importance and manifesting themselves in modern K-12 science classroom cultures (Gee, 2000; New London Group, 2000). This study explored issues of multiliteracies and student voice set within the context of teaching and learning in 4th and 5th grade science classrooms. The purpose of the study was to ascertain what and how multiliteracies and scientific practices (NGSS Lead States, 2013c) are implemented, explore how multiliteracies influence students' voices, and investigate teacher and student perceptions of multiliteracies, student voice, and scientific practices. Grounded in a constructivist framework, a multiple case study was employed in two elementary classrooms. Through observations, student focus groups and interviews, and teacher interviews, a detailed narrative was created to describe a range of multiliteracies, student voice, and scientific practices that occurred with the science classroom context. Using grounded theory analysis, data were coded and analyzed to reveal emergent themes. Data analysis revealed that these two classrooms were enriched with multiliteracies that serve metaphorically as breeding grounds for student voice. In the modern classroom, defined as a space where information is instantly accessible through the Internet, multiliteracies can be developed through inquiry-based, collaborative, and technology-rich experiences. Scientific literacy, cultivated through student communication and collaboration, is arguably a multiliteracy that has not been considered in the literature, and should be, as an integral component of overall individual literacy in the 21st century. Findings revealed four themes. Three themes suggest that teachers address several modes of multiliteracies in science, but identify

  20. PARLO: PArallel Run-Time Layout Optimization for Scientific Data Explorations with Heterogeneous Access Pattern

    SciTech Connect

    Gong, Zhenhuan; Boyuka, David; Zou, X; Liu, Gary; Podhorszki, Norbert; Klasky, Scott A; Ma, Xiaosong; Samatova, Nagiza F

    2013-01-01

    The size and scope of cutting-edge scientific simulations are growing much faster than the I/O and storage capabilities of their run-time environments. The growing gap is exacerbated by exploratory, data-intensive analytics, such as querying simulation data with multivariate, spatio-temporal constraints, which induces heterogeneous access patterns that stress the performance of the underlying storage system. Previous work addresses data layout and indexing techniques to improve query performance for a single access pattern, which is not sufficient for complex analytics jobs. We present PARLO, a parallel run-time layout optimization framework, to achieve multi-level data layout optimization for scientific applications at run-time, before data is written to storage. The layout schemes optimize for heterogeneous access patterns with user-specified priorities. PARLO is integrated with ADIOS, a high-performance parallel I/O middleware for large-scale HPC applications, to achieve user-transparent, light-weight layout optimization for scientific datasets. It offers simple XML-based configuration for users to achieve flexible layout optimization without the need to modify or recompile application codes. Experiments show that PARLO improves performance by 2 to 26 times for queries with heterogeneous access patterns compared to state-of-the-art scientific database management systems. Compared to traditional post-processing approaches, its underlying run-time layout optimization achieves a 56% savings in processing time and a reduction in storage overhead of up to 50%. PARLO exhibits a low run-time resource requirement, while limiting the performance impact on running applications to a reasonable level.
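
    PARLO and ADIOS are not shown here; the generic NumPy sketch below only illustrates why layout matters for heterogeneous access patterns: the same array is read much faster along the axis that matches its storage order.

```python
# Generic illustration of layout vs. access pattern: row reads favor C order,
# column reads favor Fortran order. Not PARLO or ADIOS.
import time
import numpy as np

a_rows = np.random.rand(4000, 4000)            # C order: rows contiguous
a_cols = np.asfortranarray(a_rows)             # F order: columns contiguous

def time_access(arr, axis):
    t0 = time.perf_counter()
    if axis == 0:
        total = sum(arr[i, :].sum() for i in range(0, arr.shape[0], 10))
    else:
        total = sum(arr[:, j].sum() for j in range(0, arr.shape[1], 10))
    return time.perf_counter() - t0

print("row reads,    C order:", time_access(a_rows, 0))
print("row reads,    F order:", time_access(a_cols, 0))
print("column reads, C order:", time_access(a_rows, 1))
print("column reads, F order:", time_access(a_cols, 1))
```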

  1. The ISECG Science White Paper - A Scientific Perspective on the Global Exploration Roadmap

    NASA Astrophysics Data System (ADS)

    Bussey, David B.; Worms, Jean-Claude; Spiero, Francois; Schlutz, Juergen; Ehrenfreund, Pascale

    2016-07-01

    Future space exploration goals call for sending humans and robots beyond low Earth orbit and establishing sustained access to destinations such as the Moon, asteroids and Mars. Space agencies participating in the International Space Exploration Coordination Group (ISECG) are discussing an international approach for achieving these goals, documented in ISECG's Global Exploration Roadmap (GER). The GER reference scenario reflects a step-wise evolution of critical capabilities from ISS to missions in the lunar vicinity in preparation for the journey of humans to Mars. As an element of this continued road mapping effort, the ISECG agencies are therefore soliciting input and coordinated discussion with the scientific community to better articulate and promote the scientific opportunities of the proposed mission themes. An improved understanding of the scientific drivers and the requirements to address priority science questions associated with the exploration destinations (Moon, Near Earth Asteroids, Mars and its moons) as well as the preparatory activities in cis-lunar space is beneficial to optimize the partnership of robotic assets and human presence beyond low Earth orbit. The interaction has resulted in the development of a Science White Paper to: • Identify and highlight the scientific opportunities in early exploration missions as the GER reference architecture matures, • Communicate overarching science themes and their relevance in the GER destinations, • Ensure international science communities' perspectives inform the future evolution of mission concepts considered in the GER The paper aims to capture the opportunities offered by the missions in the GER for a broad range of scientific disciplines. These include planetary and space sciences, astrobiology, life sciences, physical sciences, astronomy and Earth science. The paper is structured around grand science themes that draw together and connect research in the various disciplines, and it will focus on

  2. A sophisticated lander for scientific exploration of Mars: scientific objectives and implementation of the Mars-96 Small Station.

    PubMed

    Linkin, V; Harri, A M; Lipatov, A; Belostotskaja, K; Derbunovich, B; Ekonomov, A; Khloustova, L; Kremnev, R; Makarov, V; Martinov, B; Nenarokov, D; Prostov, M; Pustovalov, A; Shustko, G; Jarvinen, I; Kivilinna, H; Korpela, S; Kumpulainen, K; Lehto, A; Pellinen, R; Pirjola, R; Riihela, P; Salminen, A; Schmidt, W; McKay, C P

    1998-01-01

    A mission to Mars including two Small Stations, two Penetrators and an Orbiter was launched at Baikonur, Kazakhstan, on 16 November 1996. This was called the Mars-96 mission. The Small Stations were expected to land in September 1997 (Ls approximately 178 degrees), nominally in the Amazonis-Arcadia region at locations (33 N, 169.4 W) and (37.6 N, 161.9 W). The fourth stage of the Mars-96 launcher malfunctioned and hence the mission was lost. However, the state-of-the-art concept of the Small Station can be applied to future Martian lander missions. Also, from the manufacturing and performance point of view, the Mars-96 Small Station could be built as such at low cost, and be fairly easily accommodated on almost any forthcoming Martian mission. This is primarily due to the very simple interface between the Small Station and the spacecraft. The Small Station is a sophisticated piece of equipment. With the total available power of approximately 400 mW the Station successfully supports an ambitious scientific program. The Station accommodates a panoramic camera, an alpha-proton-x-ray spectrometer, a seismometer, a magnetometer, an oxidant instrument, equipment for meteorological observations, and sensors for atmospheric measurement during the descent phase, including images taken by a descent phase camera. The total mass of the Small Station with payload on the Martian surface, including the airbags, is only 32 kg. Lander observations on the surface of Mars combined with data from Orbiter instruments will shed light on contemporary Mars and its evolution. As in the Mars-96 mission, specific science goals could be exploration of the interior and surface of Mars, investigation of the structure and dynamics of the atmosphere, the role of water and other materials containing volatiles and in situ studies of the atmospheric boundary layer processes. To achieve the scientific goals of the mission the lander should carry a versatile set of instruments. The Small Station

  3. Back to the Moon: The scientific rationale for resuming lunar surface exploration

    NASA Astrophysics Data System (ADS)

    Crawford, I. A.; Anand, M.; Cockell, C. S.; Falcke, H.; Green, D. A.; Jaumann, R.; Wieczorek, M. A.

    2012-12-01

    The lunar geological record has much to tell us about the earliest history of the Solar System, the origin and evolution of the Earth-Moon system, the geological evolution of rocky planets, and the near-Earth cosmic environment throughout Solar System history. In addition, the lunar surface offers outstanding opportunities for research in astronomy, astrobiology, fundamental physics, life sciences and human physiology and medicine. This paper provides an interdisciplinary review of outstanding lunar science objectives in all of these different areas. It is concluded that addressing them satisfactorily will require an end to the 40-year hiatus of lunar surface exploration, and the placing of new scientific instruments on, and the return of additional samples from, the surface of the Moon. Some of these objectives can be achieved robotically (e.g., through targeted sample return, the deployment of geophysical networks, and the placing of antennas on the lunar surface to form radio telescopes). However, in the longer term, most of these scientific objectives would benefit significantly from renewed human operations on the lunar surface. For these reasons it is highly desirable that current plans for renewed robotic surface exploration of the Moon are developed in the context of a future human lunar exploration programme, such as that proposed by the recently formulated Global Exploration Roadmap.

  4. iSBatch: a batch-processing platform for data analysis and exploration of live-cell single-molecule microscopy images and other hierarchical datasets.

    PubMed

    Caldas, Victor E A; Punter, Christiaan M; Ghodke, Harshad; Robinson, Andrew; van Oijen, Antoine M

    2015-10-01

    Recent technical advances have made it possible to visualize single molecules inside live cells. Microscopes with single-molecule sensitivity enable the imaging of low-abundance proteins, allowing for a quantitative characterization of molecular properties. Such data sets contain information on a wide spectrum of important molecular properties, with different aspects highlighted in different imaging strategies. The time-lapsed acquisition of images provides information on protein dynamics over long time scales, giving insight into expression dynamics and localization properties. Rapid burst imaging reveals properties of individual molecules in real-time, informing on their diffusion characteristics, binding dynamics and stoichiometries within complexes. This richness of information, however, adds significant complexity to analysis protocols. In general, large datasets of images must be collected and processed in order to produce statistically robust results and identify rare events. More importantly, as live-cell single-molecule measurements remain on the cutting edge of imaging, few protocols for analysis have been established and thus analysis strategies often need to be explored for each individual scenario. Existing analysis packages are geared towards either single-cell imaging data or in vitro single-molecule data and typically operate with highly specific algorithms developed for particular situations. Our tool, iSBatch, instead allows users to exploit the inherent flexibility of the popular open-source package ImageJ, providing a hierarchical framework in which existing plugins or custom macros may be executed over entire datasets or portions thereof. This strategy affords users freedom to explore new analysis protocols within large imaging datasets, while maintaining hierarchical relationships between experiments, samples, fields of view, cells, and individual molecules.
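
    iSBatch itself is built on ImageJ; the Python sketch below, with a hypothetical experiment/sample/field-of-view directory layout, only illustrates the idea it describes: running one analysis over an entire hierarchical dataset while keeping the hierarchical context attached to each result.

```python
# Sketch of hierarchical batch processing in the spirit described above.
# The directory layout experiment/sample/field_of_view/*.tif is hypothetical.
from pathlib import Path

def analyze(image_path: Path) -> dict:
    # placeholder per-image analysis; a real pipeline would load the image
    # and run spot detection, tracking, etc.
    return {"file": image_path.name, "n_molecules": 0}

def run_batch(root: Path):
    results = []
    for experiment in sorted(p for p in root.iterdir() if p.is_dir()):
        for sample in sorted(p for p in experiment.iterdir() if p.is_dir()):
            for fov in sorted(p for p in sample.iterdir() if p.is_dir()):
                for image in sorted(fov.glob("*.tif")):
                    rec = analyze(image)
                    # keep the hierarchical context with every result
                    rec.update(experiment=experiment.name,
                               sample=sample.name,
                               field_of_view=fov.name)
                    results.append(rec)
    return results

if __name__ == "__main__":
    for row in run_batch(Path("data")):
        print(row)
```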

  5. Mission to the Moon: Europe's priorities for the scientific exploration and utilisation of the Moon

    NASA Astrophysics Data System (ADS)

    Battrick, Bruce; Barron, C.

    1992-06-01

    A study to determine Europe's potential role in the future exploration and utilization of the Moon is presented. To establish the scientific justifications the Lunar Study Steering Group (LSSG) was established, reflecting all scientific disciplines benefitting from a lunar base (Moon studies, astronomy, fusion, life sciences, etc.). Scientific issues were divided into three main areas: science of the Moon, including all investigations concerning the Moon as a planetary body; science from the Moon, using the Moon as a platform and therefore including observatories in the broadest sense; science on the Moon, including not only questions relating to human activities in space, but also the development of artificial ecosystems beyond the Earth. Science of the Moon focuses on geographical, geochemical and geological observations of the Earth-Moon system. Science from the Moon takes advantage of the stable lunar ground, its atmosphere-free sky and, on the far side, its radio-quiet environment. The Moon provides an attractive platform for the observation and study of the Universe. Two techniques that can make unique use of the lunar platform are ultraviolet to submillimeter interferometric imaging, and very low frequency astronomy. One of the goals of life sciences studies (Science on the Moon) is obviously to provide the prerequisite information for establishing a manned lunar base. This includes studies of human physiology under reduced gravity, radiation protection and life support systems, and feasibility studies based on existing hardware. The overall recommendations are essentially to set up specific study teams for those fields judged to be the most promising for Europe, with the aim of providing more detailed scientific and technological specifications. It is also suggested that the scope of the overall study activities be expanded in order to derive mission scenarios for a viable ESA lunar exploration program and to consider economic, legal and policy matters

  6. Virtual Exploration and Comparison of Linear Mastoid Drilling Trajectories with True-Color Volume Rendering and the Visible Ear Dataset

    PubMed Central

    KAHRS, Lueder A.; LABADIE, Robert F.

    2015-01-01

    This paper provides instructions for a virtual exploration and self-study of surgical approaches within the temporal bone. Linear drilling trajectories in the sense of keyhole accesses are compared with free true-color rendering techniques to introduce and evaluate new otologic approaches. On the basis of free cryo-histology image data from a temporal bone, six different drill canals are presented. Such a virtual method has the potential to be a first step of investigation of new surgical approaches before moving to cadaver testing. PMID:23400159

  7. Virtual exploration and comparison of linear mastoid drilling trajectories with true-color volume rendering and the visible ear dataset.

    PubMed

    Kahrs, Lueder A; Labadie, Robert F

    2013-01-01

    This paper provides instructions for a virtual exploration and self-study of surgical approaches within the temporal bone. Linear drilling trajectories in the sense of "keyhole" accesses are compared with true-color rendering techniques using freeware to introduce and evaluate new otologic approaches. On the basis of public-domain cryo-histology image data from a temporal bone, six different drill trajectories are presented. This virtual method has the potential to be a first step in investigation of new surgical approaches before moving to cadaver testing.

  8. Exploring English Language Learners (ELL) experiences with scientific language and inquiry within a real life context

    NASA Astrophysics Data System (ADS)

    Algee, Lisa M.

    English Language Learners (ELL) are often at a distinct disadvantage in receiving authentic science learning opportunities. This study explored English Language Learners (ELL) learning experiences with scientific language and inquiry within a real life context. This research was theoretically informed by sociocultural theory and literature on student learning and science teaching for ELL. A qualitative case study was used to explore students' learning experiences. Data from multiple sources were collected: student interviews, science letters, an assessment in another context, field-notes, student presentations, inquiry assessment, instructional group conversations, parent interviews, parent letters, parent homework, teacher-researcher evaluation, teacher-researcher reflective journal, and student ratings of learning activities. These data sources informed the following research questions: (1) Does participation in an out-of-school contextualized inquiry science project increase ELL use of scientific language? (2) Does participation in an out-of-school contextualized inquiry science project increase ELL understanding of scientific inquiry and their motivation to learn? (3) What are parents' funds of knowledge about the local ecology and does this inform students' experiences in the science project? All data sources concerning students were analyzed for similar patterns and trends and triangulation was sought through the use of these data sources. The remaining data sources concerning the teacher-researcher were used to inform and assess whether the pedagogical and research practices were in alignment with the proposed theoretical framework. Data sources concerning parental participation accessed funds of knowledge, which informed the curriculum in order to create continuity and connections between home and school. To ensure accuracy in the researchers' interpretations of student and parent responses during interviews, member checking was employed. The findings

  9. LISIRD 2: Applying Standards and Open Source Software in Exploring and Serving Scientific Data

    NASA Astrophysics Data System (ADS)

    Wilson, A.; Lindholm, D. M.; Ware Dewolfe, A.; Lindholm, C.; Pankratz, C. K.; Snow, M.; Woods, T. N.

    2009-12-01

    The LASP Interactive Solar IRradiance Datacenter (LISIRD), http://lasp.colorado.edu/lisird, seeks to provide exploration of and access to solar irradiance data, models and other related data. These irradiance datasets, from the SME, UARS, TIMED, and SORCE missions, are primarily a function of time and often also wavelength. Their measurements are typically made on a scale of seconds and derived products are provided at daily cadence. The first version of the LISIRD site was built using non-standard, proprietary software. The non-standard application structure and tight coupling to a variety of dataset representations made changes arduous and maintenance difficult. Eventually the software vendor decided to no longer support a critical software component, further decreasing the viability of the site. In LISIRD 2, through the application of the Java EE standard coupled with open source software to fetch and plot the data, the functionality of the original site is being improved while the code structure is being streamlined and simplified. With a relatively minimal effort, the new site can access and serve a greater variety of datasets in an easier fashion, and produce responsive, interactive plots of datasets overlaid and/or linked in time. And it does so using a significantly smaller code base that is, at the same time, much more flexible and extensible. In particular, LISIRD 2 heavily leverages powerful, flexible functionality provided by the Time Series Data Server (TSDS). The OPeNDAP compliant TSDS supports requests for any data that are a function of time. It can support scalar, vector, and spectra data types. Through the use of the Unidata NetCDF-Java library and NcML, the TSDS supports multiple input and output formats and is easily extended to support more. It also supports a variety of filters that can be chained and applied to the data on the server before being delivered. TSDS thinning capabilities make it easy for the clients to request appropriate data
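
    The URL and column names below are hypothetical placeholders, not the actual LISIRD or TSDS interface; the sketch only illustrates the client side of requesting a time series as CSV over HTTP and turning it into (time, value) pairs.

```python
# Client-side sketch of fetching a time series from an HTTP endpoint.
# The URL and CSV schema are hypothetical placeholders.
import csv
import io
import urllib.request

URL = "https://example.org/tsds/irradiance.csv?start=2003-01-01&end=2003-12-31"  # hypothetical

def fetch_series(url: str):
    with urllib.request.urlopen(url) as resp:
        text = resp.read().decode("utf-8")
    rows = list(csv.DictReader(io.StringIO(text)))
    # assumes columns "time" and "irradiance"; adjust to the server's actual schema
    return [(r["time"], float(r["irradiance"])) for r in rows]

if __name__ == "__main__":
    series = fetch_series(URL)
    print(f"{len(series)} samples; first: {series[0] if series else None}")
```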

  10. Exploring Two Approaches for an End-to-End Scientific Analysis Workflow

    SciTech Connect

    Dodelson, Scott; Kent, Steve; Kowalkowski, Jim; Paterno, Marc; Sehrish, Saba

    2015-01-01

    The advance of the scientific discovery process is accomplished by the integration of independently-developed programs run on disparate computing facilities into coherent workflows usable by scientists who are not experts in computing. For such advancement, we need a system which scientists can use to formulate analysis workflows, to integrate new components to these workflows, and to execute different components on resources that are best suited to run those components. In addition, we need to monitor the status of the workflow as components get scheduled and executed, and to access the intermediate and final output for visual exploration and analysis. Finally, it is important for scientists to be able to share their workflows with collaborators. We have explored two approaches for such an analysis framework for the Large Synoptic Survey Telescope (LSST) Dark Energy Science Collaboration (DESC); the first one is based on the use and extension of Galaxy, a web-based portal for biomedical research, and the second one is based on a programming language, Python. In our paper, we present a brief description of the two approaches, describe the kinds of extensions to the Galaxy system we have found necessary in order to support the wide variety of scientific analysis in the cosmology community, and discuss how similar efforts might be of benefit to the HEP community.

  11. Exploring Two Approaches for an End-to-End Scientific Analysis Workflow

    DOE PAGES Beta

    Dodelson, Scott; Kent, Steve; Kowalkowski, Jim; Paterno, Marc; Sehrish, Saba

    2015-01-01

    The advance of the scientific discovery process is accomplished by the integration of independently-developed programs run on disparate computing facilities into coherent workflows usable by scientists who are not experts in computing. For such advancement, we need a system which scientists can use to formulate analysis workflows, to integrate new components to these workflows, and to execute different components on resources that are best suited to run those components. In addition, we need to monitor the status of the workflow as components get scheduled and executed, and to access the intermediate and final output for visual exploration and analysis. Finally, it is important for scientists to be able to share their workflows with collaborators. We have explored two approaches for such an analysis framework for the Large Synoptic Survey Telescope (LSST) Dark Energy Science Collaboration (DESC); the first one is based on the use and extension of Galaxy, a web-based portal for biomedical research, and the second one is based on a programming language, Python. In our paper, we present a brief description of the two approaches, describe the kinds of extensions to the Galaxy system we have found necessary in order to support the wide variety of scientific analysis in the cosmology community, and discuss how similar efforts might be of benefit to the HEP community.

  12. Lunar scout missions: Galileo encounter results and application to scientific problems and exploration requirements

    NASA Technical Reports Server (NTRS)

    Head, J. W.; Belton, M.; Greeley, R.; Pieters, C.; Mcewen, A.; Neukum, G.; Mccord, T.

    1993-01-01

    The Lunar Scout Missions (payload: x-ray fluorescence spectrometer, high-resolution stereocamera, neutron spectrometer, gamma-ray spectrometer, imaging spectrometer, gravity experiment) will provide a global data set for the chemistry, mineralogy, geology, topography, and gravity of the Moon. These data will in turn provide an important baseline for the further scientific exploration of the Moon by all-purpose landers and micro-rovers, and sample return missions from sites shown to be of primary interest from the global orbital data. These data would clearly provide the basis for intelligent selection of sites for the establishment of lunar base sites for long-term scientific and resource exploration and engineering studies. The two recent Galileo encounters with the Moon (December, 1990 and December, 1992) illustrate how modern technology can be applied to significant lunar problems. We emphasize the regional results of the Galileo SSI to show the promise of geologic unit definition and characterization as an example of what can be done with the global coverage to be obtained by the Lunar Scout Missions.

  13. Exploring Two Approaches for an End-to-End Scientific Analysis Workflow

    NASA Astrophysics Data System (ADS)

    Dodelson, Scott; Kent, Steve; Kowalkowski, Jim; Paterno, Marc; Sehrish, Saba

    2015-12-01

    The scientific discovery process can be advanced by the integration of independently-developed programs run on disparate computing facilities into coherent workflows usable by scientists who are not experts in computing. For such advancement, we need a system which scientists can use to formulate analysis workflows, to integrate new components to these workflows, and to execute different components on resources that are best suited to run those components. In addition, we need to monitor the status of the workflow as components get scheduled and executed, and to access the intermediate and final output for visual exploration and analysis. Finally, it is important for scientists to be able to share their workflows with collaborators. We have explored two approaches for such an analysis framework for the Large Synoptic Survey Telescope (LSST) Dark Energy Science Collaboration (DESC); the first one is based on the use and extension of Galaxy, a web-based portal for biomedical research, and the second one is based on a programming language, Python. In this paper, we present a brief description of the two approaches, describe the kinds of extensions to the Galaxy system we have found necessary in order to support the wide variety of scientific analysis in the cosmology community, and discuss how similar efforts might be of benefit to the HEP community.
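
    As a minimal sketch in the spirit of the Python-based approach (the component functions and file names here are invented for illustration), independently developed components can be chained into a workflow that writes its intermediate outputs for later inspection.

```python
# Minimal workflow sketch: two placeholder components chained together, with
# intermediate and final outputs kept on disk for inspection.
import json
from pathlib import Path

def select_objects(catalog):                    # component 1: selection cut
    return [obj for obj in catalog if obj["mag"] < 22.0]

def estimate_density(objects, area_deg2):       # component 2: summary statistic
    return {"n": len(objects), "density_per_deg2": len(objects) / area_deg2}

def run_workflow(catalog, area_deg2, workdir=Path("work")):
    workdir.mkdir(exist_ok=True)
    selected = select_objects(catalog)
    (workdir / "selected.json").write_text(json.dumps(selected))      # intermediate output
    summary = estimate_density(selected, area_deg2)
    (workdir / "summary.json").write_text(json.dumps(summary))        # final output
    return summary

if __name__ == "__main__":
    fake_catalog = [{"id": i, "mag": 20.0 + 0.01 * i} for i in range(400)]
    print(run_workflow(fake_catalog, area_deg2=10.0))
```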

  14. In Situ Resource Utilization Technologies for Enhancing and Expanding Mars Scientific and Exploration Missions

    NASA Technical Reports Server (NTRS)

    Sridhar, K. R.; Finn, J. E.

    2000-01-01

    The primary objectives of the Mars exploration program are to collect data for planetary science in a quest to answer questions related to Origins, to search for evidence of extinct and extant life, and to expand the human presence in the solar system. The public and political engagement that is critical for support of a Mars exploration program is based on all of these objectives. In order to retain and to build public and political support, it is important for NASA to have an integrated Mars exploration plan, not separate robotic and human plans that exist in parallel or in sequence. The resolutions stemming from the current architectural review and prioritization of payloads may be pivotal in determining whether NASA will have such a unified plan and retain public support. There are several potential scientific and technological links between the robotic-only missions that have been flown and planned to date, and the combined robotic and human missions that will come in the future. Taking advantage of and leveraging those links are central to the idea of a unified Mars exploration plan. One such link is in situ resource utilization (ISRU) as an enabling technology to provide consumables such as fuels, oxygen, sweep and utility gases from the Mars atmosphere.

  15. Exploring the Use of Virtual Worlds as a Scientific Research Platform: The Meta-Institute for Computational Astrophysics (MICA)

    NASA Astrophysics Data System (ADS)

    Djorgovski, S. G.; Hut, P.; McMillan, S.; Vesperini, E.; Knop, R.; Farr, W.; Graham, M. J.

    We describe the Meta-Institute for Computational Astrophysics (MICA), the first professional scientific organization based exclusively in virtual worlds (VWs). The goals of MICA are to explore the utility of the emerging VR and VWs technologies for scientific and scholarly work in general, and to facilitate and accelerate their adoption by the scientific research community. MICA itself is an experiment in academic and scientific practices enabled by the immersive VR technologies. We describe the current and planned activities and research directions of MICA, and offer some thoughts as to what the future developments in this arena may be.

  16. Combining ECS Exploration and Scientific Interest: The Case of the Demerara Plateau Offshore French Guiana (Invited)

    NASA Astrophysics Data System (ADS)

    Roest, W. R.; Loncke, L.; Loubrieu, B.

    2013-12-01

    The French national program for the extension of the continental shelf, EXTRAPLAC, started in 2002 with funding from the French Government. It is led by Ifremer, with the SHOM (Hydrographic and Oceanographic Service of the Navy), IPEV (French Polar Institute), and IFP Energies Nouvelles as principal partners. Its aim is to make submissions for extended continental shelf beyond 200 nm to the Commission on the Limits of the Continental Shelf under the UN Convention on the Law of the Sea. Nine submissions, of which 3 are joint with neighboring states, have been made thus far, concerning areas off metropolitan France and its overseas territories. In total, over 360 days of ship time was needed to explore these vast and dispersed areas, in the Atlantic, Pacific and Indian Oceans. The data collected include multibeam bathymetry, seismic reflection and some rock sampling. In this presentation we will describe how the EXTRAPLAC cruise offshore French Guiana (GUYAPLAC, R/V L'Atalante, 2003) led to new scientific results for this transform-type margin, in particular in the area of the Demerara Plateau. These include the discovery of gigantic submarine landslides in the subsurface, and associated fluid escape features on the seafloor. A scientific collaboration between the EXTRAPLAC team and academia led to a follow-up cruise proposal to further explore this unique continental margin: the IGUANES cruise, led by the University of Perpignan, took place in April/May 2013, using a higher-resolution multibeam echosounder, high-resolution seismic reflection and sediment cores. In particular we were able to confirm and better map significant submarine landslide scarps, aligned pockmark fields and sediment waves that are likely associated with strong bottom currents and/or the submarine landslides. We will also briefly describe some of the highlight results of other ECS-related cruises to show how the EXTRAPLAC program has resulted in new knowledge in remote frontier areas that had very

  17. Using the mystery box as a means to explore the scientific method in an undergraduate lecture setting

    NASA Astrophysics Data System (ADS)

    Cook, H. M.; Cook, G. W.

    2015-12-01

    The mystery box is a well-known and well-loved teaching tool designed to encourage students to engage in making observations in order to draw conclusions. We have adapted this exercise, normally used in laboratory settings, for use in a lecture setting in introductory earth science classes. We have tied it to the scientific method such that students are engaging in mystery-box-based inquiry while exploring the steps of the scientific method. It is used in conjunction with a PowerPoint presentation that illustrates and discusses the steps and process integral to the scientific method, which is fundamental to science. Students are encouraged to explore the formal and informal use of the scientific method throughout their educational careers and in their daily lives. Furthermore, students are challenged to analyze the necessity of the scientific method as a means for conducting scientific inquiry and exploring the results of such inquiry. A follow-up assignment to the activity asks students to evaluate the efficacy of the activity and associated PowerPoint and discussion. Students consistently report having enjoyed and learned from the process.

  18. Building the Next Generation of Scientific Explorers through Active Engagement with STEM Experts and International Space Station Resources

    NASA Technical Reports Server (NTRS)

    Graff, P. V.; Vanderbloemen, L.; Higgins, M.; Stefanov, W. L.; Rampe, E.

    2015-01-01

    Connecting students and teachers in classrooms with science, technology, engineering, and mathematics (STEM) experts provides an invaluable opportunity for all. These experts can share the benefits and utilization of resources from the International Space Station (ISS) while sharing and "translating" exciting science being conducted by professional scientists. Active engagement with these STEM experts involves students in the journey of science and exploration in an enthralling and understandable manner. This active engagement, connecting classrooms with scientific experts, helps inspire and build the next generation of scientific explorers in academia, private industry, and government.

  19. Exploring group dynamics for integrating scientific and experiential knowledge in Community Advisory Boards for HIV research

    PubMed Central

    Pinto, Rogério M.; Spector, Anya Y.; Valera, Pamela A.

    2011-01-01

    To demonstrate how Community Advisory Boards (CABs) can best integrate community perspectives with scientific knowledge and involve community in disseminating HIV knowledge, this paper provides a case study exploring the structure and dynamic process of a “Community Collaborative Board” (CCB). We use the term CCB to emphasize collaboration over advisement. The CCB membership, structure and dynamics are informed by theory and research. The CCB is affiliated with Columbia University School of Social Work and its original membership included 30 members. CCB was built using six systematized steps meant to engage members in procedural and substantive research roles. Steps: (1) Engaging membership, (2) Developing relationships, (3) Exchanging information, (4) Negotiation and decision-making, (5) Retaining membership, and (6) Studying dynamic process. This model requires that all meetings be audio-taped to capture CCB dynamics. Using transcribed meeting data, we have identified group dynamics that help the CCB accomplish its objectives: 1) dialectic process helps exchange of information; 2) mutual support helps members work together despite social and professional differences; and 3) problem solving helps members achieve consensus. These dynamics also help members attain knowledge about HIV treatment and prevention and disseminate HIV-related knowledge. CABs can be purposeful in their use of group dynamics, narrow the knowledge gap between researchers and community partners, prepare members for procedural and substantive research roles, and retain community partners. PMID:21390878

  20. Does Anyone Really Know Anything? An Exploration of Constructivist Meaning and Identity in the Tension between Scientific and Religious Knowledge

    ERIC Educational Resources Information Center

    Starr, Lisa J.

    2010-01-01

    In this paper I discuss the tension created by religion and science in one student's understanding of knowledge and truth by exploring two questions: "How do individuals accommodate their religious beliefs with their understanding of science?" and "How does religious knowledge interact with scientific knowledge to construct meaning?" A…

  1. A Network Mission: Completing the Scientific Foundation for the Exploration of Mars

    NASA Technical Reports Server (NTRS)

    W. B. Banerdt

    2000-01-01

    Despite recent setbacks and vacillations in the Mars Surveyor Program, in many respects the exploration of Mars has historically followed a relatively logical path. Early fly-bys provided brief glimpses of the planet and paved the way for the initial orbital reconnaissance of Mariner 9. The Viking orbiters completed the initial survey, while the Viking landers provided our first close-up look at the surface. Essentially, Mars Pathfinder served a similar role, giving a brief look at another place on the surface. And finally, Mars Global Surveyor (and the upcoming orbital mission in 2001) are taking the next step in providing in-depth, global observations of many of the fundamental characteristics of the planet, as well as selected high-resolution views of the surface. With this last step we are well on our way to acquiring the global scientific context that is necessary both for understanding Mars in general, its origin and evolution, and for use as a basis to plan and execute the next level of focused investigations. However, even with the successful completion of these missions this context will be incomplete. Whereas we now know a great deal about the surface of Mars in a global sense, we know very little about its interior, even at depths of only a meter or so. Also, as most of this information has been acquired by remote sensing, we still lack much of the bridging knowledge between the global view and the processes and character of the surface environments themselves. Thus, in many ways we lack sufficient fundamental understanding to intelligently cast the critical investigations into important questions of the origins and evolution of Mars in general, and in particular, life. The next step in building our understanding of Mars has been identified by several previous groups who were charged with creating a strategy for Mars exploration (e.g., COMPLEX, MarSWG, Planetary Roadmap Team). This is a so-called "network" mission, which places a large number of science

  2. Cell-Phone Use and Cancer: A Case Study Exploring the Scientific Method

    ERIC Educational Resources Information Center

    Colon Parrilla, Wilma V.

    2007-01-01

    Designed for an introductory nonmajors biology course, this case study presents students with a series of short news stories describing a scientific study of cell-phone use and its health effects. Students read the news stories and then the scientific paper they are based on, comparing the information presented by the news media to the information…

  3. Exploring "The World around Us" in a Community of Scientific Enquiry

    ERIC Educational Resources Information Center

    Dunlop, Lynda; Compton, Kirsty; Clarke, Linda; McKelvey-Martin, Valerie

    2013-01-01

    The primary Communities of Scientific Enquiry project is one element of the outreach work in Science in Society in Biomedical Sciences in partnership with the School of Education at the University of Ulster. The project aims to develop scientific understanding and skills at key stage 2 and is a response to several contemporary issues in primary…

  4. Exploring the Assessment of and Relationship between Elementary Students' Scientific Creativity and Science Inquiry

    ERIC Educational Resources Information Center

    Yang, Kuay-Keng; Lin, Shu-Fen; Hong, Zuway-R; Lin, Huann-shyang

    2016-01-01

    The purposes of this study were to (a) develop and validate instruments to assess elementary students' scientific creativity and science inquiry, (b) investigate the relationship between the two competencies, and (c) compare the two competencies among different grade level students. The scientific creativity test was composed of 7 open-ended items…

  5. Exploring Turkish Upper Primary Level Science Textbooks' Coverage of Scientific Literacy Themes

    ERIC Educational Resources Information Center

    Çakici, Yilmaz

    2012-01-01

    Problem Statement: Since the 1970s, scientific literacy has been a major goal of national educational systems throughout the world, and thus reform movements in science education call for all students to be scientifically literate. Despite some good curricular changes and developments across the globe, much remains to be achieved. Given that…

  6. Asthma in the community: Designing instruction to help students explore scientific dilemmas that impact their lives

    NASA Astrophysics Data System (ADS)

    Tate, Erika Dawn

    School science instruction that connects to students' diverse home, cultural, or linguistic experiences can encourage lifelong participation in the scientific dilemmas that impact students' lives. This dissertation seeks effective ways to support high school students as they learn complex science topics and use their knowledge to transform their personal and community environments. Applying the knowledge integration perspective, I collaborated with education, science, and community partners to design a technology enhanced science module, Improving Your Community's Asthma Problem. This exemplar community science curriculum afforded students the opportunity to (a) investigate a local community health issue, (b) interact with relevant evidence related to physiology, clinical management, and environmental risks, and (c) construct an integrated understanding of the asthma problem in their community. To identify effective instructional scaffolds that engage students in the knowledge integration process and prepare them to participate in community science, I conducted 2 years of research that included 5 schools, 10 teachers, and over 500 students. This dissertation reports on four studies that analyzed student responses on pre-, post-, and embedded assessments. Researching across four design stages, the iterative design study investigated how to best embed the visualizations of the physiological processes breathing, asthma attack, and the allergic immune response in an inquiry activity and informed evidence-based revisions to the module. The evaluation study investigated the impact of this revised Asthma module across multiple classrooms and differences in students' prior knowledge. Combining evidence of student learning from the iterative and evaluation studies with classroom observations and teacher interviews, the longitudinal study explored the impact of teacher practices on student learning in years 1 and 2. In the final chapter, I studied how the Asthma module and

  7. Exploring Elongation-Inclination Relationships in Datasets from Plutons and Remagnetized Sediments: Examples from the North Cascades and the Blue Mountains

    NASA Astrophysics Data System (ADS)

    Housen, B. A.

    2014-12-01

    Tectonic applications of paleomagnetism rely upon establishment of paleohorizontal at the time of magnetization. Paleohorizontal can be established in sedimentary rocks and volcanics, but is poorly constrained in plutonic rocks and areas that have experienced regional remagnetizations. This study will explore another latitude-dependent property of the geomagnetic field, the elongation of elliptical distributions of directional data, to evaluate whether the combination of elongation and inclination can be used to constrain effects of tilt or other paleohorizontal uncertainties in paleomagnetic datasets. This work is inspired by the application of the E-I relationship proposed by Tauxe and Kent (2004) to evaluate effects of inclination error in sedimentary rocks. The first example is from the Blue Mountains of eastern OR. Remagnetized Permian-Jurassic sedimentary rocks (Hillhouse et al, 1982, Harbert et al, 1995, Housen, 2007, Kalk, 2008) have magnetizations that match those of Late Jurassic-Early Cretaceous plutons (Wilson and Cox, 1980, Housen, 2007). Directions from 64 sites of these rocks yield a mean of D = 33°, I = 64°, k = 26, α95 = 3.7°. The E-I method can be used to determine the effects of calculated paleohorizontal errors by finding an optimal paleohorizontal error that results in the best agreement between E and I for a set of data. For the Blue Mountains rocks, the optimal E-I relationship yields a corrected inclination of I = 65° (+7°/-4°), and estimated paleolatitude of 47°N (42° to 57°). The second example is from the Cretaceous Mt Stuart batholith in the North Cascades of central WA; these 95-88 Ma plutonic rocks have well-defined magnetizations (Housen et al, 2003). Directions from 89 samples have a mean of D = 350°, I = 44°, k = 50, α95 = 2.1°. The E-I relationship suggests a corrected mean inclination of I = 46° (+12°/-3°), and estimated paleolatitude of 27°N (25° to 39°). For the Blue Mountains, this comparison indicates that the
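
    The quantities quoted in the abstract (mean declination and inclination, k, α95, and the elongation E of the directional distribution) are standard directional statistics. The sketch below computes them for synthetic directions with NumPy; it is not the Tauxe and Kent (2004) E-I correction itself, and the data are invented purely for illustration.

```python
import numpy as np

def dir_to_xyz(dec, inc):
    """Convert declination/inclination (degrees) to unit vectors."""
    d, i = np.radians(dec), np.radians(inc)
    return np.column_stack((np.cos(i) * np.cos(d), np.cos(i) * np.sin(d), np.sin(i)))

def fisher_mean(dec, inc, p=0.05):
    """Fisher mean direction, precision parameter k, and alpha95 (standard formulas)."""
    xyz = dir_to_xyz(dec, inc)
    r_vec = xyz.sum(axis=0)
    n, R = len(dec), np.linalg.norm(r_vec)
    mx, my, mz = r_vec / R
    mean_dec = np.degrees(np.arctan2(my, mx)) % 360
    mean_inc = np.degrees(np.arcsin(mz))
    k = (n - 1) / (n - R)
    cos_a95 = 1 - ((n - R) / R) * ((1 / p) ** (1 / (n - 1)) - 1)
    return mean_dec, mean_inc, k, np.degrees(np.arccos(cos_a95))

def elongation(dec, inc):
    """Elongation tau2/tau3 of the orientation matrix of the unit vectors."""
    xyz = dir_to_xyz(dec, inc)
    T = xyz.T @ xyz / len(dec)
    tau = np.sort(np.linalg.eigvalsh(T))   # ascending: tau3 <= tau2 <= tau1
    return tau[1] / tau[0]

if __name__ == "__main__":
    # Synthetic directions scattered about D = 33, I = 64 (illustrative only).
    rng = np.random.default_rng(0)
    dec = 33 + rng.normal(0, 8, 64)
    inc = 64 + rng.normal(0, 4, 64)
    print(fisher_mean(dec, inc))
    print("E =", elongation(dec, inc))
```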

  8. Does anyone really know anything? An exploration of constructivist meaning and identity in the tension between scientific and religious knowledge

    NASA Astrophysics Data System (ADS)

    Starr, Lisa J.

    2010-03-01

    In this paper I discuss the tension created by religion and science in one student's understanding of knowledge and truth by exploring two questions: "How do individuals accommodate their religious beliefs with their understanding of science?" and "How does religious knowledge interact with scientific knowledge to construct meaning?" A constructivist framework sheds light on the answers to both questions in the context of process and product.

  9. Making Meaning of Scientific Practices: Exploring the Pathways and Variations of Classrooms Engaging in Science Practices

    ERIC Educational Resources Information Center

    Ko, Mon-Lin Monica

    2013-01-01

    A focus of reforms in standards, learning environments, teacher preparation programs and professional development is to support teachers' and students' engagement with scientific practices such as argumentation, modeling and generating explanations for real-world phenomena (NRC, 2011). Engaging in these practices in authentic ways…

  10. Asthma in the Community: Designing Instruction to Help Students Explore Scientific Dilemmas that Impact Their Lives

    ERIC Educational Resources Information Center

    Tate, Erika Dawn

    2009-01-01

    School science instruction that connects to students' diverse home, cultural, or linguistic experiences can encourage lifelong participation in the scientific dilemmas that impact students' lives. This dissertation seeks effective ways to support high school students as they learn complex science topics and use their knowledge to transform their…

  11. Early Science Education: Exploring Familiar Contexts To Improve the Understanding of Some Basic Scientific Concepts.

    ERIC Educational Resources Information Center

    Martins, Isabel P.; Veiga, Luisa

    2001-01-01

    Argues that science education is a fundamental tool for global education and that it must be introduced in early years as a first step to a scientific culture for all. Describes testing validity of a didactic strategy for developing the learning of concepts, which was based upon an experimental work approach using everyday life contexts. (Author)

  12. Exploring Student Identity in an Intercultural Web-Assisted Scientific Inquiry Project

    ERIC Educational Resources Information Center

    Liu, Yingjie; Hannafin, Robert D.

    2010-01-01

    This qualitative study based on Gee's (2001) identity theory examined and compared how American and Chinese middle school students develop identities towards science, culture and technology in an intercultural web-assisted scientific inquiry project. Through analysis of online discussions, videoconferences, interviews, surveys and fieldnotes, we…

  13. Computer Series, 52: Scientific Exploration with a Microcomputer: Simulations for Nonscientists.

    ERIC Educational Resources Information Center

    Whisnant, David M.

    1984-01-01

    Describes two simulations, written for Apple II microcomputers, focusing on scientific methodology. The first is based on the tendency of colloidal iron in high concentrations to stick to fish gills and cause breathing difficulties. The second, modeled after the dioxin controversy, examines a hypothetical chemical thought to cause cancer. (JN)

  14. The Earth Exploration Toolbook and DLESE Data Services Workshops: Facilitating the Use of Geoscience Data to Convey Scientific Concepts to Students

    NASA Astrophysics Data System (ADS)

    Ledley, T. S.; Dahlman, L.; McAuliffe, C.; Domenico, B.; Taber, M. R.

    2005-12-01

    Although Earth science data and tools are officially freely available to the public, specific data are generally difficult to find, and are often provided in formats that are difficult to use. The Earth Exploration Toolbook (EET, http://serc.carleton.edu/eet) and DLESE (Digital Library for Earth Systems Education) Data Services (http://www.dlese.org/cms/dataservices/) projects are working to facilitate the use of these data and analysis tools by teachers and students, and can serve as mechanisms, facilitated by eGY, for extending the reach of data resulting from the various I*Y scientific efforts. The EET gives educators and students an easy way to learn how to use Earth science data and data analysis tools for learning. Modules (called chapters) in the EET provide step-by-step instructions for accessing and analyzing Earth science datasets within the context of compelling case studies. Each chapter also provides pedagogical information to help the teacher use the data with their students. To introduce datasets and analysis tools to teachers, and to encourage them to use them with their students, the EET team provides telecon-online teacher professional development workshops. During these workshops teachers are guided through the use of a specific EET chapter. When a workshop is complete, participants have the software and data they have worked with installed and available on their own computers. We have run 17 of these workshops reaching over 230 teachers. New EET chapters can be developed through the use of an EET chapter template. The template provides a mechanism by which those outside the project can make their datasets and data analysis tools more accessible to teachers and students, and assures that new chapters are consistent with the EET format and that users have access to the support they need. The development of new EET chapters is facilitated through the DLESE Data Services Workshops. During these workshops data providers, tool developers, scientists

  15. Dynamics Explorer 2: Continued FPI and NACS instrument data analysis and associated scientific activity at the University of Michigan

    NASA Technical Reports Server (NTRS)

    Burns, Alan; Killeen, T. L.

    1993-01-01

    The grant entitled 'Dynamics Explorer 2 - continued FPI and NACS instrument data analysis and associated scientific activity at the University of Michigan' is a continuation of a grant that began with instrument development for the Dynamics Explorer 2 (DE 2) satellite. Over the years, many publications and presentations at scientific meetings have occurred under the aegis of this grant. This present report details the progress that has been made in the final three years of the grant. In these last 4 years of the grant 26 papers have been published or are in press and about 10 more are in preparation or have been submitted. A large number of presentations have been made in the same time span: 36 are listed in Appendix 2. Evidence of the high educational utility of this research is indicated by the list of Ph. D. and M. S. theses that have been completed in the last 3 years that have involved work connected with NAG5-465. The structure of this report is as follows: a brief synopsis of the aims of the grant NAG5-465 is given in the next section; then there is a summary of the scientific accomplishments that have occurred over the grant period; last, we make some brief concluding remarks. Reprints of articles that have recently appeared in refereed journals are appended to the end of this document.

  16. Exploring prospective secondary science teachers' understandings of scientific inquiry and Mendelian genetics concepts using computer simulation

    NASA Astrophysics Data System (ADS)

    Cakir, Mustafa

    The primary objective of this case study was to examine prospective secondary science teachers' developing understanding of scientific inquiry and Mendelian genetics. A computer simulation of basic Mendelian inheritance processes (Catlab) was used in combination with small-group discussions and other instructional scaffolds to enhance prospective science teachers' understandings. The theoretical background for this research is derived from a social constructivist perspective. Structuring scientific inquiry as investigation to develop explanations presents meaningful context for the enhancement of inquiry abilities and understanding of the science content. The context of the study was a teaching and learning course focused on inquiry and technology. Twelve prospective science teachers participated in this study. Multiple data sources included pre- and post-module questionnaires of participants' view of scientific inquiry, pre-posttests of understandings of Mendelian concepts, inquiry project reports, class presentations, process videotapes of participants interacting with the simulation, and semi-structured interviews. Seven selected prospective science teachers participated in in-depth interviews. Findings suggest that while studying important concepts in science, carefully designed inquiry experiences can help prospective science teachers to develop an understanding about the types of questions scientists in that field ask, the methodological and epistemological issues that constrain their pursuit of answers to those questions, and the ways in which they construct and share their explanations. Key findings included prospective teachers' initial limited abilities to create evidence-based arguments, their hesitancy to include inquiry in their future teaching, and the impact of collaboration on thinking. Prior to this experience the prospective teachers held uninformed views of scientific inquiry. After the module, participants demonstrated extended expertise in

  17. The NASA Scientific and Technical Information Program: Exploring challenges, creating opportunities

    NASA Technical Reports Server (NTRS)

    Sepic, Ronald P.

    1993-01-01

    The NASA Scientific and Technical Information (STI) Program offers researchers access to the world's largest collection of aerospace information. An overview of Program activities, products and services, and new directions is presented. The R&D information cycle is outlined and specific examples of the NASA STI Program in practice are given. Domestic and international operations and technology transfer activities are reviewed and an agenda for the STI Program NASA-wide is presented. Finally, the incorporation of Total Quality Management and evaluation metrics into the STI Program is discussed.

  18. Exploring the first scientific observations of lunar eclipses made in Siam

    NASA Astrophysics Data System (ADS)

    Orchiston, Wayne; Orchiston, Darunee Lingling; George, Martin; Soonthornthum, Boonrucksar

    2016-04-01

    The first great ruler to encourage the adoption of Western culture and technology throughout Siam (present-day Thailand) was King Narai, who also had a passion for astronomy. He showed this by encouraging French and other Jesuit missionaries, some with astronomical interests and training, to settle in Siam from the early 1660s. One of these was Father Antoine Thomas, and he was the first European known to have carried out scientific astronomical observations from Siam when he determined the latitude of Ayutthaya in 1681 and the following year observed the total lunar eclipse of 22 February. A later lunar eclipse also has an important place in the history of Thai astronomy. In 1685 a delegation of French missionary-astronomers settled in Ayutthaya, and on 10-11 December 1685 they joined King Narai and his court astrologers and observed a lunar eclipse from the King's 'country retreat' near Lop Buri. This event so impressed the King that he approved the erection of a large modern well-equipped astronomical observatory at Lop Buri. Construction of Wat San Paulo Observatory - as it was known - began in 1686 and was completed in 1687. In this paper we examine these two lunar eclipses and their association with the development of scientific astronomy in Siam.

  19. The Nature of What Teachers Know: Exploring Teacher Knowledge through Novel Scientific Metaphors

    ERIC Educational Resources Information Center

    Martin, Jill Voorhies

    2009-01-01

    This essay explores the nature of what teachers know by examining trends in teacher knowledge research, specifically the use of conventional metaphors to describe teacher knowledge. Contending that conventional metaphors fail to acknowledge the complex and multidimensional nature of teacher knowledge, the author argues that novel metaphors should…

  20. Using a Modeling Approach To Explore Scientific Epistemology with High School Biology Students. Research Report.

    ERIC Educational Resources Information Center

    Cartier, Jennifer

    This paper describes a study of high school students' participation in the construction and revision of explanatory models as they attempted to account for a variety of inheritance phenomena observed in computer-generated "fruit flies". Throughout the course students were encouraged to explore epistemological issues related to the assessment and…

  1. International space station accomplishments update: Scientific discovery, advancing future exploration, and benefits brought home to earth

    NASA Astrophysics Data System (ADS)

    Thumm, Tracy; Robinson, Julie A.; Alleyne, Camille; Hasbrook, Pete; Mayo, Susan; Buckley, Nicole; Johnson-Green, Perry; Karabadzhak, George; Kamigaichi, Shigeki; Umemura, Sayaka; Sorokin, Igor V.; Zell, Martin; Istasse, Eric; Sabbagh, Jean; Pignataro, Salvatore

    2014-10-01

    Throughout the history of the International Space Station (ISS), crews on board have conducted a variety of scientific research and educational activities. Well into the second year of full utilization of the ISS laboratory, the trend of scientific accomplishments and educational opportunities continues to grow. More than 1500 investigations have been conducted on the ISS since the first module launched in 1998, with over 700 scientific publications. The ISS provides a unique environment for research, international collaboration and educational activities that benefit humankind. This paper will provide an up-to-date summary of key investigations, facilities, publications, and benefits from ISS research that have developed over the past year. Discoveries in human physiology and nutrition have enabled astronauts to return from ISS with little bone loss, even as scientists seek to better understand the new puzzle of “ocular syndrome” affecting the vision of up to half of astronauts. The geneLAB campaign will unify life sciences investigations to seek genomic, proteomic, and metabolomic understanding of the effects of microgravity on life as a whole. Combustion scientists identified a new “cold flame” phenomenon that has the potential to improve models of efficient combustion back on Earth. A significant number of instruments in Earth remote sensing and astrophysics are providing new access to data or nearing completion for launch, making ISS a significant platform for understanding the Earth system and the universe. In addition to multidisciplinary research, the ISS partnership conducts a myriad of student-led research investigations and educational activities aimed at increasing student interest in science, technology, engineering and mathematics (STEM). Over the past year, the ISS partnership compiled new statistics of the educational impact of the ISS on students around the world. More than 43 million students, from kindergarten to graduate school, with more than 28

  2. International Space Station Accomplishments Update: Scientific Discovery, Advancing Future Exploration, and Benefits Brought Home to Earth

    NASA Technical Reports Server (NTRS)

    Thumm, Tracy; Robinson, Julie A.; Alleyne, Camille; Hasbrook, Pete; Mayo, Susan; Johnson-Green, Perry; Buckley, Nicole; Karabadzhak, George; Kamigaichi, Shigeki; Umemura, Sayaka; Sorokin, Igor V.; Zell, Martin; Istasse, Eric; Sabbagh, Jean; Pignataro, Salvatore

    2013-01-01

    Throughout the history of the International Space Station (ISS), crews on board have conducted a variety of scientific research and educational activities. Well into the second year of full utilization of the ISS laboratory, the trend of scientific accomplishments and educational opportunities continues to grow. More than 1500 investigations have been conducted on the ISS since the first module launched in 1998, with over 700 scientific publications. The ISS provides a unique environment for research, international collaboration and educational activities that benefit humankind. This paper will provide an up-to-date summary of key investigations, facilities, publications, and benefits from ISS research that have developed over the past year. Discoveries in human physiology and nutrition have enabled astronauts to return from ISS with little bone loss, even as scientists seek to better understand the new puzzle of "ocular syndrome" affecting the vision of up to half of astronauts. The geneLAB campaign will unify life sciences investigations to seek genomic, proteomic, and metabolomic understanding of the effects of microgravity on life as a whole. Combustion scientists identified a new "cold flame" phenomenon that has the potential to improve models of efficient combustion back on Earth. A significant number of instruments in Earth remote sensing and astrophysics are providing new access to data or nearing completion for launch, making ISS a significant platform for understanding the Earth system and the universe. In addition to multidisciplinary research, the ISS partnership conducts a myriad of student-led research investigations and educational activities aimed at increasing student interest in science, technology, engineering and mathematics (STEM). Over the past year, the ISS partnership compiled new statistics of the educational impact of the ISS on students around the world. More than 43 million students, from kindergarten to graduate school, with more than 28 million

  3. MiniGhost: a miniapp for exploring boundary exchange strategies using stencil computations in scientific parallel computing.

    SciTech Connect

    Barrett, Richard Frederick; Heroux, Michael Allen; Vaughan, Courtenay Thomas

    2012-04-01

    A broad range of scientific computation involves the use of difference stencils. In a parallel computing environment, this computation is typically implemented by decomposing the spatial domain, inducing a 'halo exchange' of process-owned boundary data. This approach adheres to the Bulk Synchronous Parallel (BSP) model. Because commonly available architectures provide strong inter-node bandwidth relative to latency costs, many codes 'bulk up' these messages by aggregating data into a message as a means of reducing the number of messages. A renewed focus on non-traditional architectures and architecture features provides new opportunities for exploring alternatives to this programming approach. In this report we describe miniGhost, a 'miniapp' designed for exploration of the capabilities of current as well as emerging and future architectures within the context of these sorts of applications. MiniGhost joins the suite of miniapps developed as part of the Mantevo project.
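
    To illustrate the halo-exchange pattern the abstract describes, here is a single-process NumPy sketch in which a domain is decomposed into strips, ghost columns are copied between neighbors, and a Jacobi-style stencil update is applied. It is a toy stand-in under those assumptions, not miniGhost itself, which performs the exchange with MPI messages and studies how those messages are aggregated.

```python
import numpy as np

def exchange_halos(subdomains):
    """Copy boundary columns between neighboring subdomains (the 'halo exchange').

    Each subdomain carries one ghost column on each side; in an MPI code these
    copies would be messages, possibly aggregated to reduce message counts.
    """
    for left, right in zip(subdomains[:-1], subdomains[1:]):
        right[:, 0] = left[:, -2]    # left neighbor's last interior column
        left[:, -1] = right[:, 1]    # right neighbor's first interior column

def stencil_step(sub):
    """Jacobi-style average of the four neighbors on the interior cells."""
    new = sub.copy()
    new[1:-1, 1:-1] = 0.25 * (sub[:-2, 1:-1] + sub[2:, 1:-1] +
                              sub[1:-1, :-2] + sub[1:-1, 2:])
    return new

if __name__ == "__main__":
    # Decompose a 16x16 grid into 4 strips, each padded with ghost columns.
    grid = np.random.default_rng(1).random((16, 16))
    strips = [np.pad(grid[:, i * 4:(i + 1) * 4], ((0, 0), (1, 1))) for i in range(4)]
    for _ in range(10):
        exchange_halos(strips)
        strips = [stencil_step(s) for s in strips]
    print(strips[0].shape, strips[0][1, 1])
```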

  4. Exploring access to scientific literature using content-based image retrieval

    NASA Astrophysics Data System (ADS)

    Deserno, Thomas M.; Antani, Sameer; Long, Rodney

    2007-03-01

    The number of articles published in the scientific medical literature is continuously increasing, and Web access to the journals is becoming common. Databases such as SPIE Digital Library, IEEE Xplore, indices such as PubMed, and search engines such as Google provide the user with sophisticated full-text search capabilities. However, information in images and graphs within these articles is entirely disregarded. In this paper, we quantify the potential impact of using content-based image retrieval (CBIR) to access this non-text data. Based on the Journal Citation Reports (JCR), the journal Radiology was selected for this study. In 2005, 734 articles were published electronically in this journal. This included 2,587 figures, which yields a rate of 3.52 figures per article. Furthermore, 56.4% of these figures are composed of several individual panels, i.e. the figure combines different images and/or graphs. According to the Image Cross-Language Evaluation Forum (ImageCLEF), the error rate of automatic identification of medical images is about 15%. Therefore, it is expected that, by applying ImageCLEF-like techniques, 95.5% of articles could already be retrieved by means of CBIR. The challenge for CBIR in scientific literature, however, is the use of local texture properties to analyze individual image panels in composite illustrations. Using local features for content-based image representation, 8.81 images per article are available, and the predicted correctness rate may increase to 98.3%. From this study, we conclude that CBIR may have a high impact in medical literature research and suggest that additional research in this area is warranted.
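
    As a hedged illustration of indexing figure panels by local texture, the sketch below builds local-binary-pattern histograms with scikit-image and ranks candidate panels by chi-square histogram distance. It is not the ImageCLEF systems or the method evaluated in the paper; the panels are random arrays standing in for figures extracted from articles.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def texture_descriptor(panel, points=8, radius=1):
    """Histogram of uniform local binary patterns for one grayscale panel."""
    lbp = local_binary_pattern(panel, points, radius, method="uniform")
    hist, _ = np.histogram(lbp, bins=points + 2, range=(0, points + 2), density=True)
    return hist

def rank_by_similarity(query, panels):
    """Rank candidate panels by chi-square distance of their LBP histograms."""
    q = texture_descriptor(query)
    def chi2(panel):
        d = texture_descriptor(panel)
        return 0.5 * np.sum((q - d) ** 2 / (q + d + 1e-12))
    return sorted(range(len(panels)), key=lambda i: chi2(panels[i]))

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    # Random uint8 arrays standing in for extracted figure panels.
    panels = [(rng.random((64, 64)) * 255).astype(np.uint8) for _ in range(5)]
    print(rank_by_similarity(panels[0], panels))
```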

  5. Key Recent Scientific Results from the Opportunity Rover's Exploration of Cape Tribulation, Endeavour Crater, Mars

    NASA Astrophysics Data System (ADS)

    Arvidson, R. E.; Squyres, S. W.; Gellert, R.; Herkenhoff, K. E.; Mittlefehldt, D. W.; Crumpler, L. S.; McLennan, S. M.; Farrand, W. H.; Jolliff, B. L.; Morris, R. V.

    2015-12-01

    The Opportunity Rover is in its 11th year of exploration, currently exploring the Cape Tribulation rim segment of the ~22 km wide Noachian Endeavour Crater and its tilted and fractured outcrops. A key target for Opportunity's measurements has been the Spirit of Saint Louis crater (SoSL), which is ~25 m wide, oval in plan view, shallow, flat-floored, and has a slightly raised rim. SoSL crater is surrounded by an apron of bright, polygonally-shaped outcrops and is superimposed on a gentle swale in Cape Tribulation. Rocks in a thin reddish zone on the rim are enriched in hematite, Si, and Ge, and depleted in Fe, relative to surrounding rocks. Apron rocks include an outcrop also enriched in Si and Ge, and slightly depleted in Fe. In general, rocks in the crater and apron have elevated S levels relative to Shoemaker formation breccias, tracking values observed in the Cook Haven (a gentle swale superimposed on Murray Ridge and the site of Opportunity's 5th winter) and the Hueytown fracture (running perpendicular to Cape Tribulation) outcrops. SoSL crater lies just to the west of Marathon Valley, a key target for exploration by Opportunity because five separate CRISM observations indicate the presence of Fe/Mg smectites on the upper valley floor. Opportunity data show that low relief, relatively bright, wind-scoured outcrops dominate the valley floor where not covered by scree and soil shed from surrounding walls. Initial reconnaissance shows that the outcrops are breccias with compositions similar to the typical SoSL crater apron and floor rocks, although only the very upper portion of the valley has been explored as of August 2015. Pervasive but modest aqueous alteration of Endeavour's rim is implied by the combination of CRISM and Opportunity data, providing insight into early aqueous processes dominated in this location by relatively low water to rock ratios, and at least in part associated with enhanced fluid flow along fractures.

  6. Scalable Machine Learning for Massive Astronomical Datasets

    NASA Astrophysics Data System (ADS)

    Ball, Nicholas M.; Astronomy Data Centre, Canadian

    2014-01-01

    We present the ability to perform data mining and machine learning operations on a catalog of half a billion astronomical objects. This is the result of the combination of robust, highly accurate machine learning algorithms with linear scalability that renders the applications of these algorithms to massive astronomical data tractable. We demonstrate the core algorithms kernel density estimation, K-means clustering, linear regression, nearest neighbors, random forest and gradient-boosted decision tree, singular value decomposition, support vector machine, and two-point correlation function. Each of these is relevant for astronomical applications such as finding novel astrophysical objects, characterizing artifacts in data, object classification (including for rare objects), object distances, finding the important features describing objects, density estimation of distributions, probabilistic quantities, and exploring the unknown structure of new data. The software, Skytree Server, runs on any UNIX-based machine, a virtual machine, or cloud-based and distributed systems including Hadoop. We have integrated it on the cloud computing system of the Canadian Astronomical Data Centre, the Canadian Advanced Network for Astronomical Research (CANFAR), creating the world's first cloud computing data mining system for astronomy. We demonstrate results showing the scaling of each of our major algorithms on large astronomical datasets, including the full 470,992,970 objects of the 2 Micron All-Sky Survey (2MASS) Point Source Catalog. We demonstrate the ability to find outliers in the full 2MASS dataset utilizing multiple methods, e.g., nearest neighbors, and the local outlier factor. 2MASS is used as a proof-of-concept dataset due to its convenience and availability. These results are of interest to any astronomical project with large and/or complex datasets that wishes to extract the full scientific value from its data.
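
    Skytree Server is proprietary software, so as a rough stand-in the following sketch performs one of the operations listed in the abstract, outlier detection with the local outlier factor, using scikit-learn on synthetic three-band photometry. The magnitudes, sample sizes, and column layout are invented for illustration; this is not the catalog analysis itself.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Synthetic stand-in for catalog photometry (e.g., three near-infrared bands);
# the analysis described in the abstract used Skytree Server, not scikit-learn.
rng = np.random.default_rng(3)
normal = rng.normal(loc=[16.0, 15.5, 15.2], scale=0.3, size=(10_000, 3))
weird = rng.uniform(low=8.0, high=20.0, size=(20, 3))   # a few anomalous sources
catalog = np.vstack([normal, weird])

lof = LocalOutlierFactor(n_neighbors=35)
labels = lof.fit_predict(catalog)          # -1 marks outliers
scores = -lof.negative_outlier_factor_     # larger means more anomalous

top = np.argsort(scores)[-10:][::-1]
print("Most anomalous rows:", top)
```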

  7. Scalable Machine Learning for Massive Astronomical Datasets

    NASA Astrophysics Data System (ADS)

    Ball, Nicholas M.; Gray, A.

    2014-04-01

    We present the ability to perform data mining and machine learning operations on a catalog of half a billion astronomical objects. This is the result of the combination of robust, highly accurate machine learning algorithms with linear scalability that renders the applications of these algorithms to massive astronomical data tractable. We demonstrate the core algorithms kernel density estimation, K-means clustering, linear regression, nearest neighbors, random forest and gradient-boosted decision tree, singular value decomposition, support vector machine, and two-point correlation function. Each of these is relevant for astronomical applications such as finding novel astrophysical objects, characterizing artifacts in data, object classification (including for rare objects), object distances, finding the important features describing objects, density estimation of distributions, probabilistic quantities, and exploring the unknown structure of new data. The software, Skytree Server, runs on any UNIX-based machine, a virtual machine, or cloud-based and distributed systems including Hadoop. We have integrated it on the cloud computing system of the Canadian Astronomical Data Centre, the Canadian Advanced Network for Astronomical Research (CANFAR), creating the world's first cloud computing data mining system for astronomy. We demonstrate results showing the scaling of each of our major algorithms on large astronomical datasets, including the full 470,992,970 objects of the 2 Micron All-Sky Survey (2MASS) Point Source Catalog. We demonstrate the ability to find outliers in the full 2MASS dataset utilizing multiple methods, e.g., nearest neighbors. This is likely of particular interest to the radio astronomy community given, for example, that survey projects contain groups dedicated to this topic. 2MASS is used as a proof-of-concept dataset due to its convenience and availability. These results are of interest to any astronomical project with large and/or complex
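
    Another of the listed algorithms, the two-point correlation function, can be sketched with a naive pair-count ("natural") estimator, xi(r) = DD/RR - 1, built on SciPy's k-d tree. This is an illustrative toy on synthetic points, not the scalable implementation described in the abstract, and the binning and catalogs are assumptions for the example.

```python
import numpy as np
from scipy.spatial import cKDTree

def two_point_correlation(data, randoms, bins):
    """Natural estimator xi(r) = DD/RR - 1 from cumulative pair counts per bin."""
    dd_tree, rr_tree = cKDTree(data), cKDTree(randoms)
    dd = np.diff(dd_tree.count_neighbors(dd_tree, bins))   # data-data pairs per annulus
    rr = np.diff(rr_tree.count_neighbors(rr_tree, bins))   # random-random pairs per annulus
    norm = (len(randoms) * (len(randoms) - 1)) / (len(data) * (len(data) - 1))
    return norm * dd / rr - 1.0

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    data = rng.random((2000, 2))      # toy positions in a unit square
    randoms = rng.random((2000, 2))   # comparison random catalog
    bins = np.linspace(0.01, 0.2, 8)
    print(two_point_correlation(data, randoms, bins))
```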

  8. Key Recent Scientific Results from the Opportunity Rover's Exploration of Endeavour Crater, Mars

    NASA Technical Reports Server (NTRS)

    Arvidson, R. E.; Squyres, S. W.; Gellert, R.; Herkenhoff, K.; Mittlefehldt, D.; Crumpler, L.; McLennan, S.; Farrand, W. H.; Joliff, B. L.; Morris, R. V.

    2015-01-01

    The Opportunity Rover is currently in its 11th year of operations, exploring the rim of the approximately 22 km wide Noachian-age Endeavour Crater. Opportunity spent its 5th winter season in Cook Haven, a gentle swale along Murray Ridge. Two small rocks serendipitously overturned by rover wheel motions show evidence for aqueous precipitation of sulfates, and interaction with a strong oxidant (e.g., O2) to form a thin, high valence state Mn oxide coating. After the winter, Opportunity headed south to Cape Tribulation and explored Shoemaker formation impact breccias, finding numerous Ca-sulfate veins cutting across outcrops. A key target for Opportunity's measurements has been the Spirit of Saint Louis crater (SoSL), which is approximately 25 m wide, oval in plan view, shallow, flat-floored, and has a slightly raised rim. SoSL crater is surrounded by an apron of bright, polygonally-shaped outcrops and is superimposed on a gentle swale in Cape Tribulation. Rocks in a thin reddish zone on the rim are enriched in hematite, Si, and Ge, and depleted in Fe, relative to surrounding rocks. Apron rocks include an outcrop also enriched in Si and Ge, and slightly depleted in Fe. In general rocks in the crater and apron have elevated S relative to Shoemaker formation breccias, tracking values observed in the Cook Haven and the Hueytown (fracture running perpendicular to Cape Tribulation) outcrops. SoSL crater lies just to the west of Marathon Valley, a key target for exploration by Opportunity because five separate CRISM observations indicate the presence of Fe/Mg smectites on the upper valley floor. Opportunity data show that low relief, relatively bright polygonal outcrops dominate the valley floor where not covered by scree and soil shed from surrounding walls. Initial reconnaissance shows that the outcrops are breccias with compositions similar to the typical SoSL crater apron and floor rocks, although only the very upper portion of the valley has been explored as of August

  9. Limited By Cost: The Case Against Humans In The Scientific Exploration Of Space

    NASA Astrophysics Data System (ADS)

    Coates, Andrew J.

    2001-11-01

    Human space flight represents a heady mix of bravery and drama which can be inspirational to nations and to humankind but at huge economic cost. Due to the current high launch costs only a handful of people have ventured beyond low Earth orbit and walked on the Moon, propelled by aspirations related more to the Cold War than to science. Problems with reusable launch vehicle development mean that severe launch cost limitations will exist for some time. Meanwhile, cheaper robotic probes have visited all the planets except Pluto, flown by comets, landed on Mars, Venus and an asteroid, have probed Jupiter's atmosphere and studied the Universe beyond our own solar system with telescopes. Using these data we are determining mankind's place in the Universe. Public interest in the historic Eros landing eclipsed a simultaneous space walk at the fledgling International Space Station and the Mars Pathfinder landing generated hundreds of millions of website hits in a few days. Given the fact that hundreds of Mars missions could be flown for the still-escalating cost of the International Space Station, the unsuitability of human bodies for deep space exploration, and the advances in 3-d and virtual reality techniques, we discuss whether human exploration needs a place in a realistic, useful and inspirational space programme.

  10. Exploring the stability of long intergenic non-coding RNA in K562 cells by comparative studies of RNA-Seq datasets

    PubMed Central

    2014-01-01

    Background: The stability of long intergenic non-coding RNAs (lincRNAs), which show tissue/cell-specific expression, might be closely related to their physiological functions. However, the mechanisms underlying lincRNA stability remain elusive. In this study, we examine the stability of lincRNAs in K562 cells, an important model cell line, by comparing two K562 transcriptomes, obtained from the ENCODE Consortium and from our own sequenced RNA-Seq dataset (PH), respectively. Results: Using a lincRNA analysis pipeline, 1804 high-confidence lincRNAs (1564 annotated and 240 putative novel lincRNAs) were identified in PH, and 1587 high-confidence lincRNAs (1429 annotated and 158 putative novel lincRNAs) in ENCODE. There were 1009 lincRNAs unique to PH, 792 unique to ENCODE, and 795 overlapping lincRNAs present in both datasets. Analysis of differences in minimum free energy distributions and lincRNA half-lives showed that a large proportion of overlapping lincRNAs were more stable than the unique lincRNAs. Comparison of minimum free energies also showed that most lincRNAs were less stable than protein-coding RNAs. Conclusions: Identifying overlapping and unique lincRNAs can help classify the stability of lincRNAs. Our results suggest that overlapping lincRNAs (relatively stable lincRNAs) and unique lincRNAs (relatively unstable lincRNAs) might be involved in different cellular processes. Reviewers: This article has been reviewed by Prof. Oliviero Carugo, Dr. Alistair Forrest, and Prof. Manju Bansal. PMID:24996425
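
    As an illustrative analysis (not the authors' pipeline), the sketch below compares the minimum-free-energy distributions of an "overlapping" and a "unique" lincRNA set with a one-sided rank-sum test. The MFE values are synthetic stand-ins; in practice they would come from a folding tool such as RNAfold, and only the set sizes are taken from the abstract.

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Synthetic MFE values (kcal/mol) standing in for per-transcript minimum free
# energies; in a real analysis these would come from a folding tool such as RNAfold.
rng = np.random.default_rng(5)
overlapping_mfe = rng.normal(loc=-120.0, scale=20.0, size=795)   # lincRNAs in both datasets
unique_mfe = rng.normal(loc=-100.0, scale=20.0, size=1009)       # dataset-specific lincRNAs

# Lower (more negative) MFE is taken here as a proxy for greater stability.
stat, p = mannwhitneyu(overlapping_mfe, unique_mfe, alternative="less")
print(f"Mann-Whitney U = {stat:.0f}, p = {p:.3g}")
```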

  11. U.S. Geological Survey scientific activities in the exploration of Antarctica: 1995-96 field season

    USGS Publications Warehouse

    Meunier, Tony K.; Williams, Richard S.; Ferrigno, Jane G.

    2007-01-01

    The U.S. Geological Survey (USGS) mapping program in Antarctica is one of the longest continuously funded projects in the United States Antarctic Program (USAP). This is the 46th U.S. expedition to Antarctica in which USGS scientists have participated. The financial support from the National Science Foundation, which extends back to the time of the International Geophysical Year (IGY) in 1956-57, can be attributed to the need for accurate maps of specific field areas or regions where NSF-funded science projects were planned. The epoch of Antarctic exploration during the IGY was being driven by science and, in a spirit of peaceful cooperation, the international scientific community wanted to limit military activities on the continent to logistical support. The USGS, a Federal civilian science agency in the Department of the Interior, had, since its founding in 1879, carried out numerous field-based national (and some international) programs in biology, geology, hydrology, and mapping. Therefore, the USGS was the obvious choice for these tasks, because it already had a professional staff of experienced mapmakers and program managers with the foresight, dedication, and understanding of the need for accurate maps to support the science programs in Antarctica when asked to do so by the U.S. National Academy of Sciences. Public Laws 85-743 and 87-626, signed in August 1958 and in September 1962, respectively, authorized the Secretary, U.S. Department of the Interior, through the USGS, to support mapping and scientific work in Antarctica. The USGS mapping and science programs still play a significant role in the advancement of science in Antarctica today. Antarctica is the planet's 5th largest continent (13.2 million km2 (5.1 million mi2)), it contains the larger of the world's two remaining ice sheets, and it is considered to be one of the most important scientific laboratories on Earth. This report provides documentation of USGS scientific activities in the exploration of

  12. U.S. Geological Survey scientific activities in the exploration of Antarctica: 2002-03 field season

    USGS Publications Warehouse

    Meunier, Tony K.; Williams, Richard S.; Ferrigno, Jane G.

    2007-01-01

    The U.S. Geological Survey (USGS) mapping program in Antarctica is one of the longest continuously funded projects in the United States Antarctic Program (USAP). This is the 53rd U.S. expedition to Antarctica in which USGS scientists have participated. The financial support from the National Science Foundation, which extends back to the time of the International Geophysical Year (IGY) in 1956–57, can be attributed to the need for accurate maps of specific field areas or regions where NSF-funded science projects were planned. The epoch of Antarctic exploration during the IGY was being driven by science, and, in a spirit of peaceful cooperation, the international scientific community wanted to limit military activities on the continent to logistical support. The USGS, a Federal civilian science agency in the Department of the Interior, had, since its founding in 1879, carried out numerous field-based national (and some international) programs in biology, geology, hydrology, and mapping. Therefore, the USGS was the obvious choice for these tasks, because it already had a professional staff of experienced mapmakers and program managers with the foresight, dedication, and understanding of the need for accurate maps to support the science programs in Antarctica when asked to do so by the U.S. National Academy of Sciences. Public Laws 85-743 and 87-626, signed in August 1958 and in September 1962, respectively, authorized the Secretary, U.S. Department of the Interior, through the USGS, to support mapping and scientific work in Antarctica. The USGS mapping and science programs still play a significant role in the advancement of science in Antarctica today. Antarctica is the planet's 5th largest continent [13.2 million km2 (5.1 million mi2)], it contains the larger of the world's two remaining ice sheets, and it is considered to be one of the most important scientific laboratories on Earth. This report provides documentation of USGS scientific activities in the

  13. Scientific Investigations To Prepare For The Potential Human Exploration Of Mars

    NASA Astrophysics Data System (ADS)

    Hays, Lindsay; Beaty, David; Whitley, Ryan

    2016-07-01

    In order for human missions to the martian system to be successful and safe, we need a certain minimum set of knowledge. Comparison of what we need to know with what we already know defines what we refer to as "Strategic Knowledge Gaps (SKGs)". The SKG list needs to be the driving force behind the robotic precursor program. The Mars SKG list was first constructed by the Precursor Strategy Analysis Group (P-SAG) in 2012. It consisted of 17 SKGs that could be addressed by about 60 gap-filling activities (GFAs). These GFAs were split into three groups based on where and how they could be carried out: those requiring a Mars flight/mission, those that can be addressed on Earth, and technology demonstrations. Those GFAs that require a Mars mission were incorporated into the revision of the 2012 Goals Document of the Mars Exploration Program Analysis Group (MEPAG) as "investigations" under Goal IV: Prepare for Human Exploration. In 2015, MEPAG updated the Goals Document, and comparison of the 2012 and 2015 versions shows that significant and encouraging overall progress has been made on a number of the investigations. We note three specific kinds of changes: 1) Complete retirement of several investigations, 2) Decreased investigation priority based on partial progress, and 3) Addition of a few new investigations. Some of these changes are detailed below.
    Retired:
    • Simultaneous spectra of solar energetic particles in space and on the surface
    • Spectra of galactic cosmic rays on the surface
    • Trace gas abundances
    • Determine traction/cohesion in martian regolith
    • Determine vertical variation in regolith
    • High spatial resolution maps of mineral composition and abundance
    • High spatial resolution maps of subsurface ice depth and concentration
    Decreased Priority:
    • Making long-term measurements of winds and wind directions (improvements in EDL technologies have decreased the importance of this measurement)
    • Profile the near-surface winds (improvements in EDL technologies have

  14. Application of scientific core drilling to geothermal exploration: Platanares, Honduras and Tecuamburro Volcano, Guatemala, Central America

    SciTech Connect

    Goff, S.J.; Goff, F.E.; Heiken, G.H.; Duffield, W.A.; Janik, C.J.

    1994-04-01

    Our efforts in Honduras and Guatemala were part of the Central America Energy Resource Project (CAERP) funded by the United States Agency for International Development (AID). Exploration core drilling operations at the Platanares, Honduras and Tecuamburro Volcano, Guatemala sites were part of a geothermal assessment for the national utility companies of these countries to locate and evaluate their geothermal resources for electrical power generation. In Honduras, country-wide assessment of all thermal areas determined that Platanares was the site with the greatest geothermal potential. In late 1986 to middle 1987, three slim core holes were drilled at Platanares to a maximum depth of 680 m and a maximum temperature of 165°C. The objectives were to obtain information on the geothermal gradient, hydrothermal alterations, fracturing, and possible inflows of hydrothermal fluids. Two holes produced copious amounts of water under artesian conditions and a total of 8 MW(t) of energy. Geothermal investigations in Guatemala focused on the Tecuamburro Volcano geothermal site. The results of surface geological, volcanological, hydrogeochemical, and geophysical studies at Tecuamburro Volcano indicated a substantial shallow heat source. In early 1990 we drilled one core hole, TCB-1, to 808 m depth. The measured bottom hole temperature was 238°C. Although the borehole did not flow, in-situ samples indicate the hole is completed in a vapor-zone above a probable 300°C geothermal reservoir.

  15. Earth Camp: Exploring Earth Change through the Use of Satellite Images and Scientific Practices

    NASA Astrophysics Data System (ADS)

    Baldridge, A.; Buxner, S.; Crown, D. A.; Colodner, D.; Orchard, A.; King, B.; Schwartz, K.; Prescott, A.; Prietto, J.; Titcomb, A.

    2014-07-01

    Earth Camp is a NASA-funded program that gives students and teachers opportunities to explore local, regional, and global earth change through a combination of hands-on investigations and the use of satellite images. Each summer, 20 middle school and 20 high school students participate in a two-week leadership program investigating contemporary issues (e.g., changes in river sheds, water quality, and land use management) through hands-on investigations, analyzing remote sensing data, and working with experts. Each year, 20 teachers participate in a year-long professional development program that includes monthly workshops, field investigations on Mt. Lemmon in Tucson, Arizona, and a week-long summer design workshop. Teachers conduct investigations of authentic questions using satellite images and create posters to present results of their study of earth change. In addition, teachers design lesson plans to expand their students' ability to investigate earth change with 21st Century tools. Lessons can be used as classroom exercises or for after-school club programs. Independent evaluation has been an integral part of program development and delivery for all three audiences, enabling the program staff and participants to reflect on and continually improve their practice and learning over the three-year period.

  16. Exploring the Philosophical Underpinnings of Research: Relating Ontology and Epistemology to the Methodology and Methods of the Scientific, Interpretive, and Critical Research Paradigms

    ERIC Educational Resources Information Center

    Scotland, James

    2012-01-01

    This paper explores the philosophical underpinnings of three major educational research paradigms: scientific, interpretive, and critical. The aim was to outline and explore the interrelationships between each paradigm's ontology, epistemology, methodology and methods. This paper reveals and then discusses some of the underlying assumptions of…

  17. Exploring the Impacts of Cognitive and Metacognitive Prompting on Students' Scientific Inquiry Practices Within an E-Learning Environment

    NASA Astrophysics Data System (ADS)

    Zhang, Wen-Xin; Hsu, Ying-Shao; Wang, Chia-Yu; Ho, Yu-Ting

    2015-02-01

    This study explores the effects of metacognitive and cognitive prompting on the scientific inquiry practices of students with various levels of initial metacognition. Two junior high school classes participated in this study. One class, the experimental group (n = 26), which received an inquiry-based curriculum with a combination of cognitive and metacognitive prompts, was compared to the other class, the comparison group (n = 25), which received only cognitive prompts in the same curriculum. Data sources included a test of inquiry practices, a questionnaire of metacognition, and worksheets. The results showed that the mixed cognitive and metacognitive prompts had significant impacts on the students' inquiry practices, especially their planning and analyzing abilities. Furthermore, the mixed prompts appeared to have a differential effect on those students with lower level metacognition, who showed significant improvement in their inquiry abilities. A combination of cognitive and metacognitive prompts during an inquiry cycle was found to promote students' inquiry practices.

  18. Final Scientific / Technical Report, Geothermal Resource Exploration Program, Truckhaven Area, Imperial County, California

    SciTech Connect

    Layman Energy Associates, Inc.

    2006-08-15

    With financial support from the U.S. Department of Energy (DOE), Layman Energy Associates, Inc. (LEA) has completed a program of geothermal exploration at the Truckhaven area in Imperial County, California. The exploratory work conducted by LEA included the following activities: compilation of public domain resource data (wells, seismic data, geologic maps); detailed field geologic mapping at the project site; acquisition and interpretation of remote sensing imagery such as aerial and satellite photographs; acquisition, quality control and interpretation of gravity data; and acquisition, quality control and interpretation of resistivity data using state-of-the-art magnetotelluric (MT) methods. The results of this exploratory program have allowed LEA to develop a structural and hydrologic interpretation of the Truckhaven geothermal resource which can be used to guide subsequent exploratory drilling and resource development. Of primary significance is the identification of an 8-kilometer-long, WNW-trending zone of low resistivity associated with geothermal activity in nearby wells. The long axis of this low resistivity zone is inferred to mark a zone of faulting which likely provides the primary control on the distribution of geothermal resources in the Truckhaven area. Abundant cross-faults cutting the main WNW-trending zone in its western half may indicate elevated fracture permeability in this region, possibly associated with thermal upwelling and higher resource temperatures. Regional groundwater flow is inferred to push thermal fluids from west to east along the trend of the main low resistivity zone, with resource temperatures likely declining from west to east away from the inferred upwelling zone. Resistivity mapping and well data have also shown that within the WNW-trending low resistivity zone, the thickness of the Plio-Pleistocene sedimentary section above granite basement ranges from 1,900 to 2,600 meters. Well data indicates the lower part of this

  19. Consequences of the scientific discoveries of the last decade on the techniques needed for Solar System exploration

    NASA Astrophysics Data System (ADS)

    Poncy, Joel; Kamoun, Paul

    2010-05-01

    The first decade of this century has seen a considerable acceleration in our knowledge of the Solar System thanks to the remarkable findings of both spacecraft and ground-based telescopes, combined with major advances in modelling. The number of solar system bodies known to mankind has been multiplied by more than seven, with the number of bodies larger than 500 km multiplied by two. The dynamical history has been retraced with scenarios such as the "Nice model", to quote only one. New classes of planetary bodies have been defined with the controversial demotion of Pluto. Many bodies thought dead have been inferred or demonstrated to be active, with Enceladus' geysers epitomizing this trend. The number of bodies thought to be differentiated has also rocketed thanks to probe measurements and increasingly comprehensive modelling of planetary evolution spanning the entire age of the Solar System. Water ice has been confirmed below the dry surfaces of the Moon and Mars. Sub-surface oceans of liquid water are now more and more often considered, thus providing a growing list of potential habitats for simple life forms "as-we-know-it". Exotic life forms could even be theorized on the newly discovered bodies of liquid hydrocarbons on Titan. Paradigm shifts are ongoing as consequences of these discoveries, whether on Solar System dynamics, on the population of its planetary bodies, on their activity or differentiation state, or on those new potential microbial habitats. For engineers involved in the exploration of our system, the resulting scientific expectations call for a closer look at the concrete implications for our future mission and spacecraft designs, whether for robotic or manned exploration. We first review in this presentation those recent discoveries and shifts for dynamics, astrogeology and astrobiology. We then address their impact on each type of space exploration mission: fly-bys, orbiters, atmospheric probes, landers, sample return and ultimately manned

  20. Statistical Reference Datasets

    National Institute of Standards and Technology Data Gateway

    Statistical Reference Datasets (Web, free access)   The Statistical Reference Datasets project is also supported by the Standard Reference Data Program. The purpose of this project is to improve the accuracy of statistical software by providing reference datasets with certified computational results that enable the objective evaluation of statistical software.
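
    To illustrate the evaluation pattern these reference datasets support, here is a minimal sketch: fit a model with the software under test and compare its estimates against the certified values. The data points and "certified" coefficients below are placeholders invented for illustration, not actual StRD entries.

```python
import numpy as np

# Placeholder data and "certified" coefficients -- stand-ins for a real StRD entry.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.2, 5.9, 8.1, 9.9])
certified = {"slope": 1.97, "intercept": 0.15}  # hypothetical certified values

# Result produced by the software under test (here, NumPy's least-squares fit).
slope, intercept = np.polyfit(x, y, 1)

# Agreement is often summarized as the number of digits matching the certified value.
for name, computed in (("slope", slope), ("intercept", intercept)):
    ref = certified[name]
    digits = np.inf if computed == ref else -np.log10(abs(computed - ref) / abs(ref))
    print(f"{name}: computed={computed:.4f}, certified={ref}, ~{digits:.1f} digits of agreement")
```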

  1. Exploring Science in the Studio: NSF-Funded Initiatives to Increase Scientific Literacy in Undergraduate Art and Design Students

    NASA Astrophysics Data System (ADS)

    Metzger, C. A.

    2015-12-01

    The project Exploring Science in the Studio at California College of the Arts (CCA), one of the oldest and most influential art and design schools in the country, pursues ways to enable undergraduate students to become scientifically literate problem-solvers in a variety of careers and to give content and context to their creative practices. The two main branches of this National Science Foundation-funded project are a series of courses called Science in the Studio (SitS) and the design of the Mobile Units for Science Exploration (MUSE) system, which allows instructors to bring science equipment directly into the studios. Since 2010, a series of interdisciplinary SitS courses has been offered each fall semester in the college's principal areas of study (architecture, design, fine arts, humanities and sciences, and diversity studies), thematically linked by Earth and environmental science topics such as water, waste, and sustainability. Each course receives funding to embed guest scientists from other colleges and universities, industry, or agriculture directly into the studio courses. These scientists worked in tandem with the studio faculty and gave lectures, led field trips, conducted studio visits, and advised the students' creative endeavors, culminating in an annual SitS exhibition of student work. The MUSE system, consisting of fillable carts and a storage and display unit, was designed by undergraduate students in a Furniture studio who explored, experimented, and researched various ways science materials and equipment are stored, collected, and displayed, for use in the current and future science and studio curricula at CCA. Sustainable practices and "smart design" underpinned all of the work completed in the studio. The materials selected for the new Science Collection at CCA include environmental monitoring equipment and test kits, a weather station, a stream table, a rock and fossil collection, and a vertebrate skull collection. The SitS courses and MUSE system

  2. Exploring recent and projected climate change in a steep monsoonal catchment in the middle Himalaya through innovative synthesis of local observations, gridded datasets and community engagement

    NASA Astrophysics Data System (ADS)

    Forsythe, Nathan; Pritchard, Davis; Tiwari, Prakash; Fowler, Hayley; Kumaun, Bhagwati

    2016-04-01

    Under the auspices of an "Innovation Partnerships" programme research exchange grant jointly funded by the India Department of Science and Technology and the British Council, Kumaun University and Newcastle University have been collaboratively exploring the recorded historical and projected future climate change implications for a case study catchment, the Ramgad river, in the Kumaon Lesser Himalaya (Uttarakhand state, India). This work weaves together diverse research strands with the aim of producing a coherent, thorough characterisation of the impacts of recent/on-going and likely climate evolution on local communities. Participatory research activities in multiple villages in the case study catchment have yielded a consistent narrative of changes posed by the increasingly erratic monsoonal rainfall as well as upward displacement and replacement of crops in their historical elevation ranges due to temperature change. Multi-decadal climate records from both local observations and global meteorological records reveal a more complex picture with strong seasonal asymmetry of changes in both temperature and precipitation: a) trend analysis shows mild weakening of the early phase (May, July) but strengthening in the later stages (August, September); b) temperature trends show much stronger warming in late winter and early spring (February to April) than the rest of the year, with additional asymmetry in both sign and magnitude of change between individual components (Tmax, Tmin) of the diurnal temperature cycle. On-going research seeks to associate this asymmetry with causal mechanisms (cloud radiative effect, atmospheric circulation). Analysis of historical records will provide the basis for validation and assessment of individual regional climate model projections from the CORDEX South Asia domain ensemble. For the terraced agricultural communities of the Kumaon Himalaya, the most directly consequential effects of climate variability and change are impacts on crop yields

  3. Exploration

    USGS Publications Warehouse

    Wilburn, D.R.

    2001-01-01

    Part of an annual review of mines and mineral resources in the U.S. An overview of nonfuel-mineral exploration in 2000 is presented. The principal exploration target was gold, with activity concentrated in Latin America, Australia, and the U.S. The exploration budget for gold decreased by 18 percent compared with the budget for 1999. Statistical information on nonfuel-mineral exploration worldwide is presented, analyzed, and interpreted.

  4. What the devil is in your phytomedicine? Exploring species substitution in Harpagophytum through chemometric modeling of 1H-NMR and UHPLC-MS datasets.

    PubMed

    Mncwangi, Nontobeko P; Viljoen, Alvaro M; Zhao, Jianping; Vermaak, Ilze; Chen, Wei; Khan, Ikhlas

    2014-10-01

    Harpagophytum procumbens (Pedaliaceae) and its close taxonomical ally Harpagophytum zeyheri, indigenous to southern Africa, are being harvested for exportation to Europe where phytomedicines are developed to treat inflammation-related disorders. The phytochemical variation within and between natural populations of H. procumbens (n=241) and H. zeyheri (n=107) was explored using proton nuclear magnetic resonance ((1)H-NMR) and ultra-high performance liquid chromatography coupled to mass spectrometry (UHPLC-MS) in combination with multivariate data analysis methods. The UHPLC-MS results revealed significant variation in the harpagoside content: H. procumbens (0.17-4.37%); H. zeyheri (0.00-3.07%). Only 41% of the H. procumbens samples and 17% of the H. zeyheri samples met the pharmacopoeial specification of ⩾1.2%. Both principal component analysis (PCA) and orthogonal projections to latent structures discriminant analysis (OPLS-DA) indicated separation based on species (UHPLC-MS data OPLS-DA model statistics: R(2)X=0.258, R(2)Y (cum)=0.957 and Q(2)(cum)=0.934; (1)H-NMR data OPLS-DA model statistics: R(2)X=0.830, R(2)Y=0.865 (cum) and Q(2)(cum)=0.829). It was concluded that two species are not chemically equivalent and should not be used interchangeably. PMID:25041697
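
    As a rough illustration of the unsupervised part of such a chemometric workflow, the sketch below runs PCA on an autoscaled feature matrix and compares group score means. The supervised OPLS-DA step usually requires a dedicated chemometrics package, and the matrix, group sizes, and labels here are invented placeholders rather than the study's NMR or UHPLC-MS data.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Placeholder intensity matrix: 20 samples x 50 spectral bins, two nominal species.
X = np.vstack([rng.normal(0.0, 1.0, (10, 50)),
               rng.normal(0.5, 1.0, (10, 50))])
species = ["H. procumbens"] * 10 + ["H. zeyheri"] * 10

# Autoscale the features, then project onto the first two principal components.
scores = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(X))
for label in sorted(set(species)):
    idx = [i for i, s in enumerate(species) if s == label]
    print(f"{label}: mean PC1 score = {scores[idx, 0].mean():.2f}")
```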

  5. What the devil is in your phytomedicine? Exploring species substitution in Harpagophytum through chemometric modeling of 1H-NMR and UHPLC-MS datasets.

    PubMed

    Mncwangi, Nontobeko P; Viljoen, Alvaro M; Zhao, Jianping; Vermaak, Ilze; Chen, Wei; Khan, Ikhlas

    2014-10-01

    Harpagophytum procumbens (Pedaliaceae) and its close taxonomical ally Harpagophytum zeyheri, indigenous to southern Africa, are being harvested for exportation to Europe where phytomedicines are developed to treat inflammation-related disorders. The phytochemical variation within and between natural populations of H. procumbens (n=241) and H. zeyheri (n=107) was explored using proton nuclear magnetic resonance ((1)H-NMR) and ultra-high performance liquid chromatography coupled to mass spectrometry (UHPLC-MS) in combination with multivariate data analysis methods. The UHPLC-MS results revealed significant variation in the harpagoside content: H. procumbens (0.17-4.37%); H. zeyheri (0.00-3.07%). Only 41% of the H. procumbens samples and 17% of the H. zeyheri samples met the pharmacopoeial specification of ⩾1.2%. Both principal component analysis (PCA) and orthogonal projections to latent structures discriminant analysis (OPLS-DA) indicated separation based on species (UHPLC-MS data OPLS-DA model statistics: R(2)X=0.258, R(2)Y (cum)=0.957 and Q(2)(cum)=0.934; (1)H-NMR data OPLS-DA model statistics: R(2)X=0.830, R(2)Y=0.865 (cum) and Q(2)(cum)=0.829). It was concluded that two species are not chemically equivalent and should not be used interchangeably.

  6. Dataset Lifecycle Policy

    NASA Technical Reports Server (NTRS)

    Armstrong, Edward; Tauer, Eric

    2013-01-01

    The presentation focused on describing a new dataset lifecycle policy that the NASA Physical Oceanography DAAC (PO.DAAC) has implemented for its new and current datasets to foster improved stewardship and consistency across its archive. The overarching goal is to implement this dataset lifecycle policy for all new GHRSST GDS2 datasets and bridge the mission statements from the GHRSST Project Office and PO.DAAC to provide the best quality SST data in a cost-effective, efficient manner, preserving its integrity so that it will be available and usable to a wide audience.

  7. Exploring learners' beliefs about science reading and scientific epistemic beliefs, and their relations with science text understanding

    NASA Astrophysics Data System (ADS)

    Yang, Fang-Ying; Chang, Cheng-Chieh; Chen, Li-Ling; Chen, Yi-Chun

    2016-07-01

    The main purpose of this study was to explore learners' beliefs about science reading and scientific epistemic beliefs, and how these beliefs were associated with their understanding of science texts. About 400 10th graders were involved in the development and validation of the Beliefs about Science Reading Inventory (BSRI). To find the effects of reader beliefs and epistemic beliefs, a new group of 65 10th-grade students, whose reader and epistemic beliefs were assessed by the newly developed BSRI and an existing SEB questionnaire, was invited to take part in a science reading task. Students' text understanding, in terms of concept gain and text interpretations, was collected and analyzed. Correlation analysis found that when students had stronger beliefs about meaning construction based on personal goals and experiences (i.e. transaction beliefs), they produced more thematic and critical interpretations of the content of the test article. The regression analysis suggested that students' SEBs could predict concept gain as a result of reading. Moreover, among all beliefs examined in the study, transaction beliefs stood out as the best predictor of overall science-text understanding.

  8. Exploration

    USGS Publications Warehouse

    Wilburn, D.R.; Porter, K.E.

    1999-01-01

    This summary of international nonfuel mineral exploration activities for 1998 draws on available data from literature, industry and US Geological Survey (USGS) specialists. Data on exploration budgets by region and commodity are reported, significant mineral discoveries and exploration target areas are identified and government programs affecting the mineral exploration industry are discussed. Inferences and observations on mineral industry direction are drawn from these data and discussions.

  9. Fixing Dataset Search

    NASA Technical Reports Server (NTRS)

    Lynnes, Chris

    2014-01-01

    Three current search engines are queried for ozone data at the GES DISC. The results range from sub-optimal to counter-intuitive. We propose a method to fix dataset search by implementing a robust relevancy ranking scheme. The relevancy ranking scheme is based on several heuristics culled from more than 20 years of helping users select datasets.
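
    The abstract does not enumerate the heuristics, so the sketch below only illustrates the general pattern of a weighted, heuristic relevancy score applied to candidate datasets; the feature names, weights, and sample records are invented for illustration.

```python
# Hypothetical weighted-heuristic relevancy ranking for dataset search results.
WEIGHTS = {"term_in_title": 3.0, "term_in_variables": 2.0,
           "term_in_abstract": 1.0, "is_latest_version": 0.5}

def relevancy(dataset, query):
    """Score one dataset record against a single query term."""
    q = query.lower()
    score = 0.0
    if q in dataset["title"].lower():
        score += WEIGHTS["term_in_title"]
    if any(q in v.lower() for v in dataset["variables"]):
        score += WEIGHTS["term_in_variables"]
    if q in dataset["abstract"].lower():
        score += WEIGHTS["term_in_abstract"]
    if dataset["latest"]:
        score += WEIGHTS["is_latest_version"]
    return score

# Invented sample records; a real index would hold full dataset metadata.
records = [
    {"title": "Ozone Profile Product", "variables": ["ozone"],
     "abstract": "Vertical ozone profiles ...", "latest": True},
    {"title": "Aerosol Index Product", "variables": ["aerosol index"],
     "abstract": "Mentions ozone in passing ...", "latest": False},
]
for rec in sorted(records, key=lambda r: relevancy(r, "ozone"), reverse=True):
    print(f"{relevancy(rec, 'ozone'):.1f}  {rec['title']}")
```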

  10. Exploring the Impacts of Cognitive and Metacognitive Prompting on Students' Scientific Inquiry Practices within an E-Learning Environment

    ERIC Educational Resources Information Center

    Zhang, Wen-Xin; Hsu, Ying-Shao; Wang, Chia-Yu; Ho, Yu-Ting

    2015-01-01

    This study explores the effects of metacognitive and cognitive prompting on the scientific inquiry practices of students with various levels of initial metacognition. Two junior high school classes participated in this study. One class, the experimental group (n?=?26), which received an inquiry-based curriculum with a combination of cognitive and…

  11. Exploring the Potential of Using Stories about Diverse Scientists and Reflective Activities to Enrich Primary Students' Images of Scientists and Scientific Work

    ERIC Educational Resources Information Center

    Sharkawy, Azza

    2012-01-01

    The purpose of this qualitative study was to explore the potential of using stories about diverse scientists to broaden primary students' images of scientists and scientific work. Stories featuring scientists from diverse socio-cultural backgrounds (i.e., physical ability, gender, ethnicity) were presented to 11 grade one students over a 15-week…

  12. Where Do the Sand-Dust Storms Come From?: Conversations with Specialists from the Exploring Sand-Dust Storms Scientific Expedition Team

    ERIC Educational Resources Information Center

    Shixin, Liu

    2004-01-01

    This article relates the different views from specialists of the scientific expedition team for the exploration of the origin of sand-dust storms. They observed and examined on-site the ecological environment of places of origin for sand-dust storms, and tried to find out causes of sand-dust storm and what harm it can cause in the hope of…

  13. Exploration

    USGS Publications Warehouse

    Wilburn, D.R.

    2005-01-01

    The worldwide budget for nonferrous, nonfuel mineral exploration was expected to increase by 58 percent in 2004 from the 2003 budget, according to Metals Economics Group (MEG) of Halifax, Nova Scotia. The increase comes two years after a five-year period of declining spending for mineral exploration (1998 to 2002). Figures suggest a subsequent 27 percent increase in budgeted expenditures from 2002 to 2003. For the second consecutive year, all regional exploration budget estimates were anticipated to increase.

  14. Dataset Modelability by QSAR

    PubMed Central

    Golbraikh, Alexander; Muratov, Eugene; Fourches, Denis; Tropsha, Alexander

    2014-01-01

    We introduce a simple MODelability Index (MODI) that estimates the feasibility of obtaining predictive QSAR models (Correct Classification Rate above 0.7) for a binary dataset of bioactive compounds. MODI is defined as an activity class-weighted ratio of the number of the nearest neighbor pairs of compounds with the same activity class versus the total number of pairs. The MODI values were calculated for more than 100 datasets and the threshold of 0.65 was found to separate non-modelable from the modelable datasets. PMID:24251851
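
    A minimal sketch of the modelability calculation as described above, assuming Euclidean distances between binary fingerprints for the nearest-neighbour search; the fingerprints and activity labels are a toy example, not data from the original study.

```python
import numpy as np

def modi(fingerprints, labels):
    """MODelability Index: class-weighted fraction of compounds whose
    nearest neighbour belongs to the same activity class."""
    X = np.asarray(fingerprints, dtype=float)
    y = np.asarray(labels)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)   # exclude self-matches
    nn = dist.argmin(axis=1)         # index of each compound's nearest neighbour
    classes = np.unique(y)
    # Average, over classes, of the fraction of same-class nearest-neighbour pairs.
    return sum(np.mean(y[nn[y == c]] == c) for c in classes) / len(classes)

# Toy example: six compounds, 4-bit fingerprints, binary activity labels.
fps = [[1, 0, 1, 0], [1, 1, 1, 0], [0, 0, 1, 1],
       [0, 1, 0, 1], [0, 1, 1, 1], [1, 0, 0, 0]]
act = [1, 1, 0, 0, 0, 1]
print(f"MODI = {modi(fps, act):.2f} (datasets above the 0.65 threshold were found modelable)")
```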

  15. Scientific Misconduct.

    ERIC Educational Resources Information Center

    Goodstein, David

    2002-01-01

    Explores scientific fraud, asserting that while few scientists actually falsify results, the field has become so competitive that many are misbehaving in other ways; an example would be unreasonable criticism by anonymous peer reviewers. (EV)

  16. Preparing Precipitation Data Access, Value-added Services and Scientific Exploration Tools for the Integrated Multi-satellitE Retrievals for GPM (IMERG)

    NASA Astrophysics Data System (ADS)

    Ostrenga, D.; Liu, Z.; Kempler, S. J.; Vollmer, B.; Teng, W. L.

    2013-12-01

    The Precipitation Data and Information Services Center (PDISC) (http://disc.gsfc.nasa.gov/precipitation or google: NASA PDISC), located at the NASA Goddard Space Flight Center (GSFC) Earth Sciences (GES) Data and Information Services Center (DISC), is home of the Tropical Rainfall Measuring Mission (TRMM) data archive. For over 15 years, the GES DISC has served not only TRMM, but also other space-based, airborne-based, field campaign and ground-based precipitation data products to the precipitation community and other disciplinary communities as well. The TRMM Multi-Satellite Precipitation Analysis (TMPA) products are the most popular products in the TRMM product family in terms of data download and access through Mirador, the GES-DISC Interactive Online Visualization ANd aNalysis Infrastructure (Giovanni) and other services. The next generation of TMPA, the Integrated Multi-satellitE Retrievals for GPM (IMERG) to be released in 2014 after the launch of GPM, will be significantly improved in terms of spatial and temporal resolutions. To better serve the user community, we are preparing data services and samples are listed below. To enable scientific exploration of Earth science data products without going through complicated and often time consuming processes, such as data downloading, data processing, etc., the GES DISC has developed Giovanni in consultation with members of the user community, requesting quick search, subset, analysis and display capabilities for their specific data of interest. For example, the TRMM Online Visualization and Analysis System (TOVAS, http://disc2.nascom.nasa.gov/Giovanni/tovas/) has proven extremely popular, especially as additional datasets have been added upon request. Giovanni will continue to evolve to accommodate GPM data and the multi-sensor data inter-comparisons that will be sure to follow. Additional PDISC tool and service capabilities being adapted for GPM data include: An on-line PDISC Portal (includes user guide, etc

  17. Exploration

    USGS Publications Warehouse

    Wilburn, D.R.

    2002-01-01

    Exploration budgets fell for a fourth successive year in 2001. These decreases reflected low mineral commodity prices, mineral-market investment reluctance, company failures and a continued trend of company mergers and takeovers.

  18. Scientific results and lessons learned from an integrated crewed Mars exploration simulation at the Rio Tinto Mars analogue site

    NASA Astrophysics Data System (ADS)

    Orgel, Csilla; Kereszturi, Ákos; Váczi, Tamás; Groemer, Gernot; Sattler, Birgit

    2014-02-01

    Between 15 and 25 April 2011, in the framework of the PolAres programme of the Austrian Space Forum, a five-day field test of the Aouda.X spacesuit simulator was conducted at the Rio Tinto Mars-analogue site in southern Spain. The field crew was supported by a full-scale Mission Control Center (MCC) in Innsbruck, Austria. The field telemetry data were relayed to the MCC, enabling a Remote Science Support (RSS) team to study field data in near-real-time and adjust the flight planning in a flexible manner. We report on the experiences in the fields of robotics, geophysics (Ground Penetrating Radar) and geology as well as life sciences in a simulated spaceflight operational environment. Extravehicular Activity (EVA) maps had been prepared using Google Earth and aerial images. The Rio Tinto mining area offers an excellent location for Mars analogue simulations. It is recognised as a terrestrial Mars analogue site because of the presence of jarosite and related sulphates, which have been identified by the NASA Mars Exploration Rover "Opportunity" in the El Capitan region of Meridiani Planum on Mars. The acidic, high ferric-sulphate content water of Rio Tinto is also considered a possible analogue in astrobiology regarding the analysis of ferric sulphate-related biochemical pathways and the biomarkers they produce. During our Mars simulation, 18 different types of soil and rock samples were collected by the spacesuit tester. The Raman results confirm the presence of the expected minerals, such as jarosite, different Fe oxides and oxy-hydroxides, pyrite and complex Mg and Ca sulphates. Eight science experiments were conducted in the field. In this contribution we first list the important findings from the management and realisation of the tests, followed by a first summary of the scientific results. Based on these experiences, suggestions for future analogue work are also summarised. We finish with recommendations for future field missions, including the preparation of the experiments

  19. Network Effects on Scientific Collaborations

    PubMed Central

    Uddin, Shahadat; Hossain, Liaquat; Rasmussen, Kim

    2013-01-01

    Background: The analysis of co-authorship networks aims at exploring the impact of network structure on the outcome of scientific collaborations and research publications. However, little is known about which network properties are associated with authors who have an increased number of joint publications and are highly cited. Methodology/Principal Findings: Measures of social network analysis, for example network centrality and tie strength, have been utilized extensively in the current co-authorship literature to explore different behavioural patterns of co-authorship networks. Using three SNA measures (i.e., degree centrality, closeness centrality and betweenness centrality), we explore scientific collaboration networks to understand factors influencing performance (i.e., citation count) and formation (tie strength between authors) of such networks. A citation count is the number of times an article is cited by other articles. We use a co-authorship dataset from the research field of 'steel structure' for the years 2005 to 2009. To measure the strength of scientific collaboration between two authors, we consider the number of articles co-authored by them. In this study, we examine how the citation count of a scientific publication is influenced by different centrality measures of its co-author(s) in a co-authorship network. We further analyze the impact of the network positions of authors on the strength of their scientific collaborations. We use both correlation and regression methods for data analysis leading to statistical validation. We identify that the citation count of a research article is positively correlated with the degree centrality and betweenness centrality values of its co-author(s). Also, we reveal that the degree centrality and betweenness centrality values of authors in a co-authorship network are positively correlated with the strength of their scientific collaborations. Conclusions/Significance: Authors' network positions in co-authorship networks influence
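
    As a brief illustration of the kind of analysis described above, the sketch below builds a small weighted co-authorship graph, computes the three SNA measures named in the abstract, and correlates one of them with per-author citation counts. The graph, edge weights, and citation counts are invented for illustration, not drawn from the 'steel structure' dataset.

```python
import networkx as nx
from scipy.stats import pearsonr

# Toy co-authorship network: edge weight = number of co-authored articles (invented data).
edges = [("A", "B", 3), ("A", "C", 1), ("B", "C", 2),
         ("C", "D", 4), ("D", "E", 1), ("B", "E", 2)]
G = nx.Graph()
G.add_weighted_edges_from(edges)

# The three SNA measures used in the study.
degree = nx.degree_centrality(G)
closeness = nx.closeness_centrality(G)
betweenness = nx.betweenness_centrality(G)

# Hypothetical per-author citation counts, only to show the correlation step.
citations = {"A": 12, "B": 30, "C": 25, "D": 18, "E": 9}
authors = sorted(G.nodes)
r, p = pearsonr([degree[a] for a in authors], [citations[a] for a in authors])
print(f"degree centrality vs. citations: r = {r:.2f}, p = {p:.2f}")
```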

  20. The scientific basis of rasa (taste) of a substance as a tool to explore its pharmacological behavior

    PubMed Central

    Rath, Sudipta Kumar; Panja, Asit Kumar; Nagar, Lalit; Shinde, Ashashri

    2014-01-01

    Background: A rational and well-developed pharmacological basis forms the foundation of therapeutics in Ayurveda. The principles and theories of Ayurveda need to be validated in the scientific context in order to harness the millennia-old knowledge. Rasa (taste) of a substance is the foremost tool in Ayurveda to assess and determine the pharmacological properties and actions of the substance. Similarity in rasa is said to signify similar structure and consequently similar pharmacological behavior. Depending on skills developed over the course of long-term clinical experience, one can register the minute variations in rasa of substances and accordingly the possible variations in pharmacological actions. Thus, rasa can be used as a scientific tool in the drug discovery process to limit and focus the target areas. Aim: To sensitize the scientific community to the utility of rasa as a tool in the process of drug discovery. Materials and Methods: All relevant ancient and contemporary literature was reviewed critically to form a scientific basis of the Ayurvedic concept of rasa as a tool to identify the pharmacological behavior of a substance. Conclusion: The review finds that rasa (taste) can be used as a guide to identify potential targets in drug discovery. PMID:25593398

  1. Student-Centered Use of Case Studies Incorporating Oral and Writing Skills to Explore Scientific Ethical Misconduct

    ERIC Educational Resources Information Center

    Montes, Ingrid; Padilla, Adriana; Maldonado, Atenaida; Negretti, Solymar

    2009-01-01

    Ethical misconduct has long been endured and difficult to address in the scientific community. To educate students about ethical misconduct in science, case studies were used in an ethics discussion board for a class group project. The objectives aimed to (i) familiarize students with the term "ethical misconduct", particularly in…

  2. Elementary School Students' Emotions When Exploring an Authentic Socio-Scientific Issue through the Use of Models

    ERIC Educational Resources Information Center

    Nicolaou, Chr. Th.; Evagorou, M.; Lymbouridou, Chr.

    2015-01-01

    Despite the belief that emotions are important in the learning process, research in the area of emotions and learning, especially in science, is scant. Modelling and SSI argumentation share an emphasis in recent science standards reports as core scientific practices that need to be part of science teaching and learning. Even…

  3. Data Integration for Heterogenous Datasets

    PubMed Central

    2014-01-01

    Abstract More and more, the needs of data analysts are requiring the use of data outside the control of their own organizations. The increasing amount of data available on the Web, the new technologies for linking data across datasets, and the increasing need to integrate structured and unstructured data are all driving this trend. In this article, we provide a technical overview of the emerging “broad data” area, in which the variety of heterogeneous data being used, rather than the scale of the data being analyzed, is the limiting factor in data analysis efforts. The article explores some of the emerging themes in data discovery, data integration, linked data, and the combination of structured and unstructured data. PMID:25553272

  4. The National Hydrography Dataset

    USGS Publications Warehouse

    ,

    1999-01-01

    The National Hydrography Dataset (NHD) is a newly combined dataset that provides hydrographic data for the United States. The NHD is the culmination of recent cooperative efforts of the U.S. Environmental Protection Agency (USEPA) and the U.S. Geological Survey (USGS). It combines elements of USGS digital line graph (DLG) hydrography files and the USEPA Reach File (RF3). The NHD supersedes RF3 and DLG files by incorporating them, not by replacing them. Users of RF3 or DLG files will find the same data in a new, more flexible format. They will find that the NHD is familiar but greatly expanded and refined. The DLG files contribute a national coverage of millions of features, including water bodies such as lakes and ponds, linear water features such as streams and rivers, and also point features such as springs and wells. These files provide standardized feature types, delineation, and spatial accuracy. From RF3, the NHD acquires hydrographic sequencing, upstream and downstream navigation for modeling applications, and reach codes. The reach codes provide a way to integrate data from organizations at all levels by linking the data to this nationally consistent hydrographic network. The feature names are from the Geographic Names Information System (GNIS). The NHD provides comprehensive coverage of hydrographic data for the United States. Some of the anticipated end-user applications of the NHD are multiuse hydrographic modeling and water-quality studies of fish habitats. Although based on 1:100,000-scale data, the NHD is planned so that it can incorporate and encourage the development of the higher resolution data that many users require. The NHD can be used to promote the exchange of data between users at the national, State, and local levels. Many users will benefit from the NHD and will want to contribute to the dataset as well.

  5. The garden as a laboratory: the role of domestic gardens as places of scientific exploration in the long 18th century

    PubMed Central

    HICKMAN, CLARE

    2014-01-01

    Eighteenth-century gardens have traditionally been viewed as spaces designed for leisure, and as representations of political status, power and taste. In contrast, this paper will explore the concept that gardens in this period could be seen as dynamic spaces where scientific experiment and medical practice could occur. Two examples have been explored in the pilot study which has led to this paper — the designed landscapes associated with John Hunter’s Earl’s Court residence, in London, and the garden at Edward Jenner’s house in Berkeley, Gloucestershire. Garden history methodologies have been implemented in order to consider the extent to which these domestic gardens can be viewed as experimental spaces. PMID:26052165

  6. Changes in orthodontic treatment modalities in the past 20 years: exploring the link between technology and scientific evidence.

    PubMed

    Bradley, T Gerard

    2013-01-01

    STATEMENT OF THE ISSUE: Is there a link between the many perceived advances in orthodontic techniques/therapy and science in the past 20 years? The purpose of this paper is to take five topics and match the perceptions with the scientific evidence. The variety of appliances and the swings in treatment philosophy have been dramatic, including the swing from extraction to non-extraction therapy, the introduction of space-age wires, appliances that grow mandibles, the introduction and extraordinary growth of Invisalign, and reduced friction brackets to reduce treatment time, all with claims by manufacturers of better results than ever before. The focus is on faster treatment, reduced visits/appointments and superior results. Most of these 'advancements' represent what has been the 'juggernaut of technology'. Five questions are posed, and an evidence-based approach is used to critically examine the literature in these selected topics.

  7. National Hydrography Dataset (NHD)

    USGS Publications Warehouse

    ,

    2001-01-01

    The National Hydrography Dataset (NHD) is a feature-based database that interconnects and uniquely identifies the stream segments or reaches that make up the nation's surface water drainage system. NHD data was originally developed at 1:100,000 scale and exists at that scale for the whole country. High resolution NHD adds detail to the original 1:100,000-scale NHD. (Data for Alaska, Puerto Rico and the Virgin Islands was developed at high-resolution, not 1:100,000 scale.) Like the 1:100,000-scale NHD, high resolution NHD contains reach codes for networked features and isolated lakes, flow direction, names, stream level, and centerline representations for areal water bodies. Reaches are also defined to represent waterbodies and the approximate shorelines of the Great Lakes, the Atlantic and Pacific Oceans and the Gulf of Mexico. The NHD also incorporates the National Spatial Data Infrastructure framework criteria set out by the Federal Geographic Data Committee.

  8. [The coast of Northeast Brazil as a Darwinian scientific object: the explorations of John Casper Branner, 1899-1911].

    PubMed

    de Oliveira, Almir Leal

    2014-01-01

    John Casper Branner, a US geologist, had a long history of research in Brazil. The article analyzes his exploration of the geology of the coast of Northeast Brazil during the Branner-Agassiz (1899) and Stanford (1911) expeditions. In the findings from both voyages, Branner characterized the geomorphology of sedimentary basins, sandstone reefs, and coral reefs from a Darwinian evolutionary perspective, blending natural history's model of field research with the practices of modern biology and dynamic geology. He based his interpretation of the evolution of the geological formation on physical and chemical factors. Zoological studies identified the place of evolutionary variation and adaptations of isolated marine species as an auxiliary factor in natural selection. PMID:25338034

  9. Publicly Releasing a Large Simulation Dataset with NDS Labs

    NASA Astrophysics Data System (ADS)

    Goldbaum, Nathan

    2016-03-01

    Optimally, all publicly funded research should be accompanied by the tools, code, and data necessary to fully reproduce the analysis performed in journal articles describing the research. This ideal can be difficult to attain, particularly when dealing with large (>10 TB) simulation datasets. In this lightning talk, we describe the process of publicly releasing a large simulation dataset to accompany the submission of a journal article. The simulation was performed using Enzo, an open source, community-developed N-body/hydrodynamics code and was analyzed using a wide range of community- developed tools in the scientific Python ecosystem. Although the simulation was performed and analyzed using an ecosystem of sustainably developed tools, we enable sustainable science using our data by making it publicly available. Combining the data release with the NDS Labs infrastructure allows a substantial amount of added value, including web-based access to analysis and visualization using the yt analysis package through an IPython notebook interface. In addition, we are able to accompany the paper submission to the arXiv preprint server with links to the raw simulation data as well as interactive real-time data visualizations that readers can explore on their own or share with colleagues during journal club discussions. It is our hope that the value added by these services will substantially increase the impact and readership of the paper.

  10. Genomics dataset of unidentified disclosed isolates.

    PubMed

    Rekadwad, Bhagwan N

    2016-09-01

    Analysis of DNA sequences is necessary for higher hierarchical classification of organisms. It gives clues about the characteristics of organisms and their taxonomic position. This dataset was chosen to explore complexities in the unidentified DNA sequences disclosed in patents. A total of 17 unidentified DNA sequences were thoroughly analyzed. Quick response (QR) codes were generated, and the AT/GC content of the DNA sequences was determined. The QR codes are helpful for quick identification of isolates, while the AT/GC content is helpful for studying their stability at different temperatures. Additionally, a dataset of cleavage codes and enzyme codes from the restriction digestion study, which is helpful for performing studies using short DNA sequences, was reported. The dataset disclosed here is new revelatory data for the exploration of unique DNA sequences for evaluation, identification, comparison and analysis.
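
    A small sketch of the AT/GC-content calculation mentioned above; the sequence is a made-up placeholder, not one of the 17 patent sequences.

```python
def at_gc_content(seq):
    """Return (AT%, GC%) of a DNA sequence; ambiguous bases are ignored."""
    seq = seq.upper()
    at = sum(seq.count(base) for base in "AT")
    gc = sum(seq.count(base) for base in "GC")
    total = at + gc
    return 100.0 * at / total, 100.0 * gc / total

# Placeholder sequence for illustration only.
at_pct, gc_pct = at_gc_content("ATGCGCGTTAACCGGATATAGC")
print(f"AT = {at_pct:.1f}%, GC = {gc_pct:.1f}%")
```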

  11. Genomics dataset of unidentified disclosed isolates.

    PubMed

    Rekadwad, Bhagwan N

    2016-09-01

    Analysis of DNA sequences is necessary for higher hierarchical classification of organisms. It gives clues about the characteristics of organisms and their taxonomic position. This dataset was chosen to explore complexities in the unidentified DNA sequences disclosed in patents. A total of 17 unidentified DNA sequences were thoroughly analyzed. Quick response (QR) codes were generated, and the AT/GC content of the DNA sequences was determined. The QR codes are helpful for quick identification of isolates, while the AT/GC content is helpful for studying their stability at different temperatures. Additionally, a dataset of cleavage codes and enzyme codes from the restriction digestion study, which is helpful for performing studies using short DNA sequences, was reported. The dataset disclosed here is new revelatory data for the exploration of unique DNA sequences for evaluation, identification, comparison and analysis. PMID:27408929

  12. The 3D widgets for exploratory scientific visualization

    NASA Technical Reports Server (NTRS)

    Herndon, Kenneth P.; Meyer, Tom

    1995-01-01

    Computational fluid dynamics (CFD) techniques are used to simulate flows of fluids like air or water around such objects as airplanes and automobiles. These techniques usually generate very large amounts of numerical data which are difficult to understand without using graphical scientific visualization techniques. There are a number of commercial scientific visualization applications available today which allow scientists to control visualization tools via textual and/or 2D user interfaces. However, these user interfaces are often difficult to use. We believe that 3D direct-manipulation techniques for interactively controlling visualization tools will provide opportunities for powerful and useful interfaces with which scientists can more effectively explore their datasets. A few systems have been developed which use these techniques. In this paper, we will present a variety of 3D interaction techniques for manipulating parameters of visualization tools used to explore CFD datasets, and discuss in detail various techniques for positioning tools in a 3D scene.

  13. Public Availability to ECS Collected Datasets

    NASA Astrophysics Data System (ADS)

    Henderson, J. F.; Warnken, R.; McLean, S. J.; Lim, E.; Varner, J. D.

    2013-12-01

    Coastal nations have spent considerable resources exploring the limits of their extended continental shelf (ECS) beyond 200 nm. Although these studies are funded to fulfill requirements of the UN Convention on the Law of the Sea, the investments are producing new data sets in frontier areas of Earth's oceans that will be used to understand, explore, and manage the seafloor and sub-seafloor for decades to come. Although many of these datasets are considered proprietary until a nation's potential ECS has become 'final and binding', an increasing amount of data are being released and utilized by the public. Data sets include multibeam, seismic reflection/refraction, bottom sampling, and geophysical data. The U.S. ECS Project, a multi-agency collaboration whose mission is to establish the full extent of the continental shelf of the United States consistent with international law, relies heavily on data and accurate, standard metadata. The United States has made it a priority to make available to the public all data collected with ECS funding as quickly as possible. The National Oceanic and Atmospheric Administration's (NOAA) National Geophysical Data Center (NGDC) supports this objective by partnering with academia and other federal government mapping agencies to archive, inventory, and deliver marine mapping data in a coordinated, consistent manner. This includes ensuring quality, standard metadata and developing and maintaining data delivery capabilities built on modern digital data archives. Other countries, such as Ireland, have submitted their ECS data for public availability and many others have made pledges to participate in the future. The data services provided by NGDC support the U.S. ECS effort as well as many developing nations' ECS efforts through the U.N. Environmental Program. Modern discovery, visualization, and delivery of scientific data and derived products that span national and international sources of data ensure the greatest re-use of data and

  14. 3DSEM: A 3D microscopy dataset.

    PubMed

    Tafti, Ahmad P; Kirkpatrick, Andrew B; Holz, Jessica D; Owen, Heather A; Yu, Zeyun

    2016-03-01

    The Scanning Electron Microscope (SEM) as a 2D imaging instrument has been widely used in many scientific disciplines including biological, mechanical, and materials sciences to determine the surface attributes of microscopic objects. However the SEM micrographs still remain 2D images. To effectively measure and visualize the surface properties, we need to truly restore the 3D shape model from 2D SEM images. Having 3D surfaces would provide anatomic shape of micro-samples which allows for quantitative measurements and informative visualization of the specimens being investigated. The 3DSEM is a dataset for 3D microscopy vision which is freely available at [1] for any academic, educational, and research purposes. The dataset includes both 2D images and 3D reconstructed surfaces of several real microscopic samples. PMID:26779561

  15. 3DSEM: A 3D microscopy dataset

    PubMed Central

    Tafti, Ahmad P.; Kirkpatrick, Andrew B.; Holz, Jessica D.; Owen, Heather A.; Yu, Zeyun

    2015-01-01

    The Scanning Electron Microscope (SEM) as a 2D imaging instrument has been widely used in many scientific disciplines including biological, mechanical, and materials sciences to determine the surface attributes of microscopic objects. However the SEM micrographs still remain 2D images. To effectively measure and visualize the surface properties, we need to truly restore the 3D shape model from 2D SEM images. Having 3D surfaces would provide anatomic shape of micro-samples which allows for quantitative measurements and informative visualization of the specimens being investigated. The 3DSEM is a dataset for 3D microscopy vision which is freely available at [1] for any academic, educational, and research purposes. The dataset includes both 2D images and 3D reconstructed surfaces of several real microscopic samples. PMID:26779561

  16. 3DSEM: A 3D microscopy dataset.

    PubMed

    Tafti, Ahmad P; Kirkpatrick, Andrew B; Holz, Jessica D; Owen, Heather A; Yu, Zeyun

    2016-03-01

    The Scanning Electron Microscope (SEM) as a 2D imaging instrument has been widely used in many scientific disciplines including biological, mechanical, and materials sciences to determine the surface attributes of microscopic objects. However the SEM micrographs still remain 2D images. To effectively measure and visualize the surface properties, we need to truly restore the 3D shape model from 2D SEM images. Having 3D surfaces would provide anatomic shape of micro-samples which allows for quantitative measurements and informative visualization of the specimens being investigated. The 3DSEM is a dataset for 3D microscopy vision which is freely available at [1] for any academic, educational, and research purposes. The dataset includes both 2D images and 3D reconstructed surfaces of several real microscopic samples.

  17. National Elevation Dataset

    USGS Publications Warehouse

    ,

    2002-01-01

    The National Elevation Dataset (NED) is a new raster product assembled by the U.S. Geological Survey. NED is designed to provide national elevation data in a seamless form with a consistent datum, elevation unit, and projection. Data corrections were made in the NED assembly process to minimize artifacts, perform edge matching, and fill sliver areas of missing data. NED has a resolution of one arc-second (approximately 30 meters) for the conterminous United States, Hawaii, Puerto Rico, and the island territories, and a resolution of two arc-seconds for Alaska. NED data sources have a variety of elevation units, horizontal datums, and map projections. In the NED assembly process the elevation values are converted to decimal meters as a consistent unit of measure, NAD83 is consistently used as the horizontal datum, and all the data are recast in a geographic projection. Older DEMs produced by methods that are now obsolete have been filtered during the NED assembly process to minimize artifacts that are commonly found in data produced by these methods. Artifact removal greatly improves the quality of the slope, shaded-relief, and synthetic drainage information that can be derived from the elevation data. Figure 2 illustrates the results of this artifact removal filtering. NED processing also includes steps to adjust values where adjacent DEMs do not match well, and to fill sliver areas of missing data between DEMs. These processing steps ensure that NED has no void areas and artificial discontinuities have been minimized. The artifact removal filtering process does not eliminate all of the artifacts. In areas where the only available DEM is produced by older methods, "striping" may still occur.

  18. Converting Static Image Datasets to Spiking Neuromorphic Datasets Using Saccades.

    PubMed

    Orchard, Garrick; Jayawant, Ajinkya; Cohen, Gregory K; Thakor, Nitish

    2015-01-01

    Creating datasets for Neuromorphic Vision is a challenging task. A lack of available recordings from Neuromorphic Vision sensors means that data must typically be recorded specifically for dataset creation rather than collecting and labeling existing data. The task is further complicated by a desire to simultaneously provide traditional frame-based recordings to allow for direct comparison with traditional Computer Vision algorithms. Here we propose a method for converting existing Computer Vision static image datasets into Neuromorphic Vision datasets using an actuated pan-tilt camera platform. Moving the sensor rather than the scene or image is a more biologically realistic approach to sensing and eliminates timing artifacts introduced by monitor updates when simulating motion on a computer monitor. We present conversion of two popular image datasets (MNIST and Caltech101) which have played important roles in the development of Computer Vision, and we provide performance metrics on these datasets using spike-based recognition algorithms. This work contributes datasets for future use in the field, as well as results from spike-based algorithms against which future works can compare. Furthermore, by converting datasets already popular in Computer Vision, we enable more direct comparison with frame-based approaches.

  19. Converting Static Image Datasets to Spiking Neuromorphic Datasets Using Saccades

    PubMed Central

    Orchard, Garrick; Jayawant, Ajinkya; Cohen, Gregory K.; Thakor, Nitish

    2015-01-01

    Creating datasets for Neuromorphic Vision is a challenging task. A lack of available recordings from Neuromorphic Vision sensors means that data must typically be recorded specifically for dataset creation rather than collecting and labeling existing data. The task is further complicated by a desire to simultaneously provide traditional frame-based recordings to allow for direct comparison with traditional Computer Vision algorithms. Here we propose a method for converting existing Computer Vision static image datasets into Neuromorphic Vision datasets using an actuated pan-tilt camera platform. Moving the sensor rather than the scene or image is a more biologically realistic approach to sensing and eliminates timing artifacts introduced by monitor updates when simulating motion on a computer monitor. We present conversion of two popular image datasets (MNIST and Caltech101) which have played important roles in the development of Computer Vision, and we provide performance metrics on these datasets using spike-based recognition algorithms. This work contributes datasets for future use in the field, as well as results from spike-based algorithms against which future works can compare. Furthermore, by converting datasets already popular in Computer Vision, we enable more direct comparison with frame-based approaches. PMID:26635513

  20. U.S. Geological Survey Scientific Activities in the Exploration of Antarctica: Introduction to Antarctica (Including USGS Field Personnel: 1946-59)

    USGS Publications Warehouse

    Meunier, Tony K.; edited by Williams, Richard S.; Ferrigno, Jane G.

    2007-01-01

    3) significant changes that have occurred in Antarctic exploration and research since World War II will be discussed at the end of this report. Subsequent Open-File Reports will provide a year-by-year documentation of USGS scientific activities and accomplishments in Antarctica beginning with the post-IGY, 1959-60 research team. One Open-File Report is planned to be written for each field-based season. For an example of the series format, see Open-File Reports 2006-1113 (Meunier, 2007a) and 2006-1114 (Meunier, 2007b). This report is a companion document to Open-File Report 2006-1116 (Meunier, 2007c). The USGS mapping and science programs in Antarctica are among the longest continuously funded projects in the United States Antarctic Program (USAP). The 2005-06 field season is the 56th consecutive U.S. expedition in which USGS scientists have been participants, starting in 1946. USGS and the National Science Foundation (NSF) cooperation began with the establishment by NSF of the U.S. Antarctic (Research) Program [USA(R)P] in 1958-59 under Operation Deep Freeze IV (DF IV) and was given the responsibility for the principal coordination and management of all U.S. scientific activities in Antarctica in Deep Freeze 60 (DF 60) (1959-60). Financial support from NSF, mostly in the form of Memorandum of Understandings (MOUs) and Cooperative Agreements, extends back to this period and can be attributed to the need for accurate geologic, geophysical, and topographic base maps of specific field areas or regions where NSF-funded science projects were planned. The epoch of Antarctic exploration during the IGY was driven by science and, in a spirit of peaceful cooperation, the international scientific community wanted to limit military activities on the continent to logistical support (Meunier, 1979 [2007], p. 38). The USGS, a Federal civilian science agency in the Department of the Interior, has, since its founding in 1879, carried out numerous field-based national (and some

  1. U.S. Geological Survey Scientific Activities in the Exploration of Antarctica: Introduction to Antarctica (Including USGS Field Personnel: 1946-59)

    USGS Publications Warehouse

    Tony K. Meunier Edited by Williams, Richard S.; Ferrigno, Jane G.

    2007-01-01

    3) significant changes that have occurred in Antarctic exploration and research since World War II will be discussed at the end of this report. Subsequent Open-File Reports will provide a year-by-year documentation of USGS scientific activities and accomplishments in Antarctica beginning with the post-IGY, 1959-60 research team. One Open-File Report is planned to be written for each field-based season. For an example of the series format, see Open-File Reports 2006-1113 (Meunier, 2007a) and 2006-1114 (Meunier, 2007b). This report is a companion document to Open-File Report 2006-1116 (Meunier, 2007c). The USGS mapping and science programs in Antarctica are among the longest continuously funded projects in the United States Antarctic Program (USAP). The 2005-06 field season is the 56th consecutive U.S. expedition in which USGS scientists have been participants, starting in 1946. USGS and the National Science Foundation (NSF) cooperation began with the establishment by NSF of the U.S. Antarctic (Research) Program [USA(R)P] in 1958-59 under Operation Deep Freeze IV (DF IV) and was given the responsibility for the principal coordination and management of all U.S. scientific activities in Antarctica in Deep Freeze 60 (DF 60) (1959-60). Financial support from NSF, mostly in the form of Memorandum of Understandings (MOUs) and Cooperative Agreements, extends back to this period and can be attributed to the need for accurate geologic, geophysical, and topographic base maps of specific field areas or regions where NSF-funded science projects were planned. The epoch of Antarctic exploration during the IGY was driven by science and, in a spirit of peaceful cooperation, the international scientific community wanted to limit military activities on the continent to logistical support (Meunier, 1979 [2007], p. 38). The USGS, a Federal civilian science agency in the Department of the Interior, has, since its founding in 1879, carried out numerous field-based national (and some

  2. Exploring scientifically proven herbal aphrodisiacs

    PubMed Central

    Kotta, Sabna; Ansari, Shahid H.; Ali, Javed

    2013-01-01

    Procreation was an important moral and religious issue, and aphrodisiacs were sought to ensure both male and female potency. Sexual dysfunction is an inability to achieve normal sexual intercourse, and includes premature ejaculation, retrograde, retarded or inhibited ejaculation, erectile dysfunction, arousal difficulties (reduced libido), compulsive sexual behavior, orgasmic disorder, and failure of detumescence. The introduction of the first pharmacologically approved remedy for impotence, Viagra (sildenafil), in the 1990s caused a wave of public attention, propelled in part by heavy advertising. The search for such substances dates back millennia. An aphrodisiac is an agent (food or drug) that arouses sexual desire. The hunt for natural supplements from medicinal plants is intensifying, mainly because of their fewer side effects. In this review, we cover aphrodisiac plants whose claimed uses have been pharmacologically tested (in man, in animals, or in both). PMID:23922450

  3. Exploring scientifically proven herbal aphrodisiacs.

    PubMed

    Kotta, Sabna; Ansari, Shahid H; Ali, Javed

    2013-01-01

    Procreation was an important moral and religious issue, and aphrodisiacs were sought to ensure both male and female potency. Sexual dysfunction is an inability to achieve normal sexual intercourse, and includes premature ejaculation, retrograde, retarded or inhibited ejaculation, erectile dysfunction, arousal difficulties (reduced libido), compulsive sexual behavior, orgasmic disorder, and failure of detumescence. The introduction of the first pharmacologically approved remedy for impotence, Viagra (sildenafil), in the 1990s caused a wave of public attention, propelled in part by heavy advertising. The search for such substances dates back millennia. An aphrodisiac is an agent (food or drug) that arouses sexual desire. The hunt for natural supplements from medicinal plants is intensifying, mainly because of their fewer side effects. In this review, we cover aphrodisiac plants whose claimed uses have been pharmacologically tested (in man, in animals, or in both). PMID:23922450

  4. Magic Termites: Exploring Scientific Inquiry

    ERIC Educational Resources Information Center

    Callis, Kristine; Henkel, Melissa; Lund, Rachael

    2010-01-01

    The objective of the termite experiment is to walk students through the process of designing and conducting an experiment while allowing them to use inquiry-based methods to infer why, in this lab, termites follow the line of blue Bic or Paper Mate brand ballpoint pens. This experiment also reinforces the concept of observation versus inference…

  5. Chemical gas sensor array dataset.

    PubMed

    Fonollosa, Jordi; Rodríguez-Luján, Irene; Huerta, Ramón

    2015-06-01

    To address drift in chemical sensing, an extensive dataset was collected over a period of three years. An array of 16 metal-oxide gas sensors was exposed to six different volatile organic compounds at different concentration levels under tightly controlled operating conditions. Moreover, the generated dataset is suitable for tackling a variety of challenges in chemical sensing, such as sensor drift, sensor failure, or system calibration. The data are related to "Chemical gas sensor drift compensation using classifier ensembles", by Vergara et al. [1], and "On the calibration of sensor arrays for pattern recognition using the minimal number of experiments", by Rodriguez-Lujan et al. [2]. The dataset can be accessed publicly at the UCI repository upon citation of: http://archive.ics.uci.edu/ml/datasets/Gas+Sensor+Array+Drift+Dataset+at+Different+Concentrations.
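
    As an illustration of how such a dataset might be read in practice, the sketch below parses one batch file, assuming the libsvm-style layout described on the UCI page (a "class;concentration" label followed by "index:value" feature pairs, 128 features per sample); the file name is hypothetical and this is not the authors' code.

        # Minimal loader sketch for one batch file of the drift dataset (format assumed as above).
        import numpy as np

        def load_batch(path, n_features=128):
            labels, concentrations, features = [], [], []
            with open(path) as fh:
                for line in fh:
                    parts = line.split()
                    if not parts:
                        continue
                    cls, conc = parts[0].split(";")          # label carries class and concentration
                    row = np.zeros(n_features)
                    for item in parts[1:]:
                        idx, val = item.split(":")
                        row[int(idx) - 1] = float(val)
                    labels.append(int(cls))
                    concentrations.append(float(conc))
                    features.append(row)
            return np.array(features), np.array(labels), np.array(concentrations)

        # X, y, conc = load_batch("batch1.dat")   # hypothetical file name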

  6. A new look at NASA datasets with clustering redshifts

    NASA Astrophysics Data System (ADS)

    Ménard, Brice

    Observations of the extragalactic sky are inherently flux measurements as a function of 2D angular coordinates. Astrophysical exploration, however, requires knowledge of distances or cosmological redshifts. Over the past few years, the new technique of 'clustering redshifts' has emerged. It allows one to obtain redshift information without any knowledge of the objects' spectral energy distribution. The PI has led the development of the first algorithms to successfully apply this to real data. While a number of interesting results have come out, in a number of cases the inference is not possible due to low signal-to-noise, for example for populations with a low number density on the sky, or because of sampling limitations when observations in several photometric bands are available. The proposed work shows how to alleviate these limitations with new numerical techniques that can substantially increase the statistical power and cleverly sample a high-dimensional photometric space. This will lead to a new tool which will allow us to estimate clustering redshifts for a substantial fraction of sources detected in NASA datasets: WISE, GALEX, ROSAT, Fermi, etc. The corresponding redshift catalogs will be released to the community and will enable a broad range of scientific explorations. The development of the clustering redshift technique will also benefit the preparation of future missions such as Euclid and WFIRST.
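
    The core of the technique can be sketched in a few lines: the redshift distribution of a sample with unknown redshifts is read off from the amplitude of its angular cross-correlation with a reference sample of known redshifts. The toy below (flat-sky positions on a unit square, brute-force pair counting; purely illustrative, not the PI's pipeline) conveys the idea.

        import numpy as np

        def clustering_dndz(unknown_xy, ref_xy, ref_z, z_edges, aperture=0.05, seed=0):
            """Relative dN/dz of the unknown sample from excess pair counts around reference objects."""
            rng = np.random.default_rng(seed)
            randoms = rng.uniform(0.0, 1.0, size=(len(unknown_xy) * 4, 2))   # randoms on the same unit footprint
            signal = []
            for lo, hi in zip(z_edges[:-1], z_edges[1:]):
                ref = ref_xy[(ref_z >= lo) & (ref_z < hi)]
                if len(ref) == 0:
                    signal.append(0.0)
                    continue
                dd = sum(int((((unknown_xy - r) ** 2).sum(axis=1) < aperture ** 2).sum()) for r in ref)
                rr = sum(int((((randoms - r) ** 2).sum(axis=1) < aperture ** 2).sum()) for r in ref)
                expected = rr * len(unknown_xy) / len(randoms)   # pairs expected for an unclustered sample
                signal.append(dd / max(expected, 1.0) - 1.0)     # cross-correlation amplitude ~ relative dN/dz
            return np.array(signal)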

  7. The plague of the Philistines and other pestilences in the Ancient World: exploring relations between the religious-literary tradition, artistic evidence and scientific proof.

    PubMed

    Sabbatani, Sergio; Fiorino, Sirio

    2010-09-01

    In ancient times the term pestilence referred not only to infectious disease caused by Yersinia pestis, but also to several different epidemics. We explore the relations between references in the Bible and recent scientific evidence concerning some infectious diseases, especially the so-called Plague of the Philistines and leprosy. In addition, some considerations regarding possible connections among likely infectious epidemic diseases and the Ten Plagues of Egypt are reported. Evidence suggesting the presence of the rat in the Nile Valley in the II millennium BC is shown; a possible role of the rat in the plague spreading already in this historical period should be confirmed by these data. While the biblical tale in the Book of Samuel may well report an epidemic event resembling the plague, as to date this infectious disease remains unknown, it is not conceivable to confirm the presence of leprosy in the same age, because the little palaeopathologic evidence of the latter disease, in the geographic area corresponding to Egypt and Palestine, is late, dating back only to the II century AD.

  8. Providing Geographic Datasets as Linked Data in Sdi

    NASA Astrophysics Data System (ADS)

    Hietanen, E.; Lehto, L.; Latvala, P.

    2016-06-01

    In this study, a prototype service to provide data from a Web Feature Service (WFS) as linked data is implemented. First, persistent and unique Uniform Resource Identifiers (URIs) are created for all spatial objects in the dataset. The objects are available from those URIs in the Resource Description Framework (RDF) data format. Next, a Web Ontology Language (OWL) ontology is created to describe the dataset information content using the Open Geospatial Consortium's (OGC) GeoSPARQL vocabulary. The existing data model is modified in order to take into account the linked data principles. The implemented service produces an HTTP response dynamically. The data for the response are first fetched from the existing WFS. Then the Geography Markup Language (GML) output of the WFS is transformed on the fly into the RDF format. Content negotiation is used to serve the data in different RDF serialization formats. This solution facilitates the use of a dataset in different applications without replicating the whole dataset. In addition, individual spatial objects in the dataset can be referred to with URIs. Furthermore, the needed information content of the objects can be easily extracted from the RDF serializations available from those URIs. A solution for linking data objects to the dataset URI is also introduced by using the Vocabulary of Interlinked Datasets (VoID). The dataset is divided into subsets, and each subset is given its own persistent and unique URI. This enables the whole dataset to be explored with a web browser and all individual objects to be indexed by search engines.
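
    The kind of transformation the service performs can be illustrated with the rdflib Python library: one spatial object is expressed as a GeoSPARQL feature and serialized in the format requested by the client. The URI pattern and identifiers below are hypothetical, and this is a sketch of the idea rather than the prototype's actual code.

        from rdflib import Graph, Literal, Namespace, URIRef
        from rdflib.namespace import RDF

        GEO = Namespace("http://www.opengis.net/ont/geosparql#")

        def feature_to_rdf(feature_id, wkt, fmt="turtle"):
            g = Graph()
            g.bind("geo", GEO)
            feature = URIRef(f"http://example.org/id/{feature_id}")            # hypothetical URI pattern
            geom = URIRef(f"http://example.org/id/{feature_id}/geometry")
            g.add((feature, RDF.type, GEO.Feature))
            g.add((feature, GEO.hasGeometry, geom))
            g.add((geom, RDF.type, GEO.Geometry))
            g.add((geom, GEO.asWKT, Literal(wkt, datatype=GEO.wktLiteral)))
            # Content negotiation ultimately amounts to choosing the serialization format here.
            return g.serialize(format=fmt)

        # print(feature_to_rdf("road.123", "LINESTRING(24.93 60.17, 24.95 60.18)"))
        # print(feature_to_rdf("road.123", "LINESTRING(24.93 60.17, 24.95 60.18)", fmt="xml"))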

  9. Detecting Novel Associations in Large Datasets

    PubMed Central

    Reshef, David N.; Reshef, Yakir A.; Finucane, Hilary K.; Grossman, Sharon R.; McVean, Gilean; Turnbaugh, Peter J.; Lander, Eric S.; Mitzenmacher, Michael; Sabeti, Pardis C.

    2012-01-01

    Identifying interesting relationships between pairs of variables in large datasets is increasingly important. Here, we present a measure of dependence for two-variable relationships: the maximal information coefficient (MIC). MIC captures a wide range of associations both functional and not, and for functional relationships provides a score that roughly equals the coefficient of determination (R2) of the data relative to the regression function. MIC belongs to a larger class of maximal information-based nonparametric exploration (MINE) statistics for identifying and classifying relationships. We apply MIC and MINE to datasets in global health, gene expression, major-league baseball, and the human gut microbiota, and identify known and novel relationships. PMID:22174245
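
    As a usage illustration (not the authors' reference implementation), the third-party minepy package exposes MIC through a small API; the snippet below scores a noisy parabolic relationship that has essentially zero linear correlation.

        import numpy as np
        from minepy import MINE            # third-party package implementing MIC/MINE

        rng = np.random.default_rng(0)
        x = rng.uniform(-1, 1, 1000)
        y = x ** 2 + rng.normal(0, 0.1, 1000)   # functional but non-monotonic relationship

        mine = MINE(alpha=0.6, c=15)            # commonly used default parameters
        mine.compute_score(x, y)
        print("MIC:", mine.mic())               # high, despite near-zero Pearson correlation
        print("Pearson r:", np.corrcoef(x, y)[0, 1])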

  10. Genomic Datasets for Cancer Research

    Cancer.gov

    A variety of datasets from genome-wide association studies of cancer and other genotype-phenotype studies, including sequencing and molecular diagnostic assays, are available to approved investigators through the Extramural National Cancer Institute Data Access Committee.

  11. Efficient genotype compression and analysis of large genetic variation datasets

    PubMed Central

    Layer, Ryan M.; Kindlon, Neil; Karczewski, Konrad J.; Quinlan, Aaron R.

    2015-01-01

    Genotype Query Tools (GQT) is a new indexing strategy that expedites analyses of genome variation datasets in VCF format based on sample genotypes, phenotypes and relationships. GQT’s compressed genotype index minimizes decompression for analysis, and performance relative to existing methods improves with cohort size. We show substantial (up to 443 fold) performance gains over existing methods and demonstrate GQT’s utility for exploring massive datasets involving thousands to millions of genomes. PMID:26550772
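
    The flavor of a genotype-oriented index can be conveyed with a toy bitmap sketch (purely conceptual; it does not reproduce GQT's compressed on-disk format): one boolean matrix per genotype class lets sample-centric queries be answered with bitwise operations rather than a scan of the full VCF.

        import numpy as np

        genotypes = np.array([        # rows = variants, columns = samples; values = alt-allele count
            [0, 1, 2, 0],
            [1, 1, 0, 2],
            [0, 0, 0, 1],
        ])

        index = {gt: genotypes == gt for gt in (0, 1, 2)}   # one boolean matrix per genotype class

        # Query: variants where sample 1 and sample 2 both carry at least one alternate allele.
        carriers = index[1] | index[2]
        hits = np.where(carriers[:, 1] & carriers[:, 2])[0]
        print(hits)                                          # -> [0] for the toy matrix above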

  12. Bulk Data Movement for Climate Dataset: Efficient Data Transfer Management with Dynamic Transfer Adjustment

    SciTech Connect

    Sim, Alexander; Balman, Mehmet; Williams, Dean N.; Shoshani, Arie; Natarajan, Vijaya

    2010-07-16

    Many scientific applications and experiments, such as high energy and nuclear physics, astrophysics, climate observation and modeling, combustion, nano-scale material sciences, and computational biology, generate extreme volumes of data with a large number of files. These data sources are distributed among national and international data repositories, and are shared by large numbers of geographically distributed scientists. A large portion of the data is frequently accessed, and a large volume of data is moved from one place to another for analysis and storage. One challenging issue in such efforts is the limited network capacity available for moving the large datasets that must be explored and managed. The Bulk Data Mover (BDM), a data transfer management tool in the Earth System Grid (ESG) community, has been managing massive dataset transfers efficiently with pre-configured transfer properties in environments where network bandwidth is limited. Dynamic transfer adjustment was studied to enhance the BDM to handle significant end-to-end performance changes in a dynamic network environment, as well as to control the data transfers to achieve the desired transfer performance. We describe the results from the BDM transfer management for the climate datasets. We also describe the transfer estimation model and results from the dynamic transfer adjustment.
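
    A dynamic adjustment policy of this kind can be sketched as a small feedback rule. The version below is illustrative only (parameter names and thresholds are assumptions, not the BDM implementation): concurrency is increased while measured throughput keeps improving and reduced when an additional stream no longer pays off.

        def adjust_concurrency(current, last_throughput, new_throughput,
                               min_streams=1, max_streams=32, gain_threshold=0.05):
            """Return the number of concurrent transfer streams to use in the next interval."""
            if last_throughput <= 0:
                return current                          # no baseline yet; keep probing
            gain = (new_throughput - last_throughput) / last_throughput
            if gain > gain_threshold:
                return min(current + 1, max_streams)    # still scaling: add a stream
            if gain < -gain_threshold:
                return max(current - 1, min_streams)    # congested or saturated: back off
            return current                              # within noise: hold steady

        # streams = adjust_concurrency(streams, prev_mbps, measured_mbps)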

  13. The Harvard organic photovoltaic dataset

    PubMed Central

    Lopez, Steven A.; Pyzer-Knapp, Edward O.; Simm, Gregor N.; Lutzow, Trevor; Li, Kewei; Seress, Laszlo R.; Hachmann, Johannes; Aspuru-Guzik, Alán

    2016-01-01

    The Harvard Organic Photovoltaic Dataset (HOPV15) presented in this work is a collation of experimental photovoltaic data from the literature, and corresponding quantum-chemical calculations performed over a range of conformers, each with quantum chemical results using a variety of density functionals and basis sets. It is anticipated that this dataset will be of use in both relating electronic structure calculations to experimental observations through the generation of calibration schemes, as well as for the creation of new semi-empirical methods and the benchmarking of current and future model chemistries for organic electronic applications. PMID:27676312

  14. Querying Large Biological Network Datasets

    ERIC Educational Resources Information Center

    Gulsoy, Gunhan

    2013-01-01

    New experimental methods have resulted in an increasing amount of genetic interaction data being generated every day. Biological networks are used to store the genetic interaction data gathered. The increasing amount of available data requires fast, large-scale analysis methods. Therefore, we address the problem of querying large biological network datasets.…

  15. Development of a Comprehensive Plan for Scientific Research, Exploration, and Design: Creation of an Underground Radioactive Waste Isolation Facility at the Nizhnekansky Rock Massif

    SciTech Connect

    Jardine, L J

    2005-06-15

    ISTC Partner Project No.2377, ''Development of a General Research and Survey Plan to Create an Underground RW Isolation Facility in Nizhnekansky Massif'', funded a group of key Russian experts in geologic disposal, primarily at Federal State Unitary Enterprise All-Russian Design and Research Institute of Engineering Production (VNIPIPT) and Mining Chemical Combine Krasnoyarsk-26 (MCC K-26) (Reference 1). The activities under the ISTC Partner Project were targeted at the creation of an underground research laboratory which was to justify the acceptability of the geologic conditions for ultimate isolation of high-level waste in Russia. In parallel to this project, work was also under way with Minatom's financial support to characterize alternative sections of the Nizhnekansky granitoid rock massif near the MCC K-26 site to justify the possibility of creating an underground facility for long-term or ultimate isolation of radioactive waste (RW) and spent nuclear fuel (SNF). (Reference 2) The result was a synergistic, integrated set of activities over several years that advanced geologic repository site characterization and the development of a proposed underground research laboratory further than could have been expected with only the limited funds from ISTC Partner Project No.2377 funded by the U.S. DOE-RW. There were four objectives of this ISTC Partner Project 2377 geologic disposal work: (1) Generalize and analyze all research work done previously at the Nizhnekansky granitoid massif by various organizations; (2) Prepare and issue a declaration of intent (DOI) for proceeding with an underground research laboratory in a granite massif near the MCC K-26 site. (The DOI is similar to a Record of Decision in U.S. terminology). (3) Proceeding from the data obtained as a result of scientific research and exploration and design activities, prepare a justification of investment (JOI) for an underground research laboratory in as much detail as the available site characterization

  16. An evaluation of the global 1-km AVHRR land dataset

    USGS Publications Warehouse

    Teillet, P.M.; El Saleous, N.; Hansen, M.C.; Eidenshink, Jeffery C.; Justice, C.O.; Townshend, J.R.G.

    2000-01-01

    This paper summarizes the steps taken in the generation of the global 1-km AVHRR land dataset, and it documents an evaluation of the data product with respect to the original specifications and its usefulness in research and applications to date. The evaluation addresses data characterization, processing, compositing and handling issues. Examples of the main scientific outputs are presented and options for improved processing are outlined and prioritized. The dataset has made a significant contribution, and a strong recommendation is made for its reprocessing and continuation to produce a long-term record for global change research.

  17. Scientific Communication.

    ERIC Educational Resources Information Center

    Abelson, Philip H.

    1980-01-01

    The value of communication in the preservation of scientific knowledge is described as it relates to the specialized scientific journals. The discipline of peer review is described as the major factor in keeping the scientific enterprise relatively honest. (SA)

  18. Rough Clustering for Cancer Datasets

    NASA Astrophysics Data System (ADS)

    Herawan, Tutut

    Cancer is becoming a leading cause of death worldwide. It is confirmed that early detection and accurate diagnosis of this disease can ensure long survival of patients. Expert systems and machine learning techniques are gaining popularity in this field because of their effective classification and high diagnostic capability. This paper presents the application of rough set theory for clustering two cancer datasets. These datasets are taken from the UCI ML repository. The method is based on the MDA technique proposed by [11]. To select a clustering attribute, the maximal degree of rough attribute dependency in categorical-valued information systems is used. Further, we use a divide-and-conquer method to partition/cluster the objects. The results show that the MDA technique can be used to cluster the data. Further, we present cluster visualization using a two-dimensional plot. The plots provide user-friendly navigation for understanding the clusters obtained.

  19. Source Detection with Interferometric Datasets

    NASA Astrophysics Data System (ADS)

    Trott, Cathryn M.; Wayth, Randall B.; Macquart, Jean-Pierre R.; Tingay, Steven J.

    2012-04-01

    The detection of sources in interferometric radio data typically relies on extracting information from images, formed by Fourier transform of the underlying visibility dataset, and CLEANed of contaminating sidelobes through iterative deconvolution. Variable and transient radio sources span a large range of variability timescales, and their study has the potential to enhance our knowledge of the dynamic universe. Their detection and classification involve large data rates and non-stationary PSFs, commensal observing programs and ambitious science goals, and will demand a paradigm shift in the deployment of next-generation instruments. Optimal source detection and classification in real time requires efficient and automated algorithms. On short time-scales variability can be probed with an optimal matched filter detector applied directly to the visibility dataset. This paper shows the design of such a detector, and some preliminary detection performance results.
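
    The underlying statistic is straightforward to write down. The sketch below (illustrative, not the authors' code) evaluates a matched-filter flux estimate and signal-to-noise ratio for a candidate point source at direction cosines (l, m) directly from the visibilities, without forming an image.

        import numpy as np

        def matched_filter_snr(vis, u, v, sigma, l, m):
            """vis: complex visibilities; u, v: baseline coordinates in wavelengths; sigma: per-visibility noise rms."""
            model = np.exp(-2j * np.pi * (u * l + v * m))    # unit-flux point-source visibility model
            w = 1.0 / sigma ** 2
            flux_hat = np.real(np.sum(w * vis * np.conj(model))) / np.sum(w)   # weighted least-squares flux estimate
            snr = flux_hat * np.sqrt(np.sum(w))                                # detection statistic vs. a threshold
            return flux_hat, snr

    A detection is declared when the signal-to-noise ratio exceeds a threshold set by the acceptable false-alarm rate.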

  20. Development of a SPARK Training Dataset

    SciTech Connect

    Sayre, Amanda M.; Olson, Jarrod R.

    2015-03-01

    In its first five years, the National Nuclear Security Administration’s (NNSA) Next Generation Safeguards Initiative (NGSI) sponsored more than 400 undergraduate, graduate, and post-doctoral students in internships and research positions (Wyse 2012). In the past seven years, the NGSI program has produced, and continues to produce, a large body of scientific, technical, and policy work in targeted core safeguards capabilities and human capital development activities. The NGSI program carries out activities not only across multiple disciplines, but also across all U.S. Department of Energy (DOE)/NNSA locations in the United States. However, products are not readily shared among disciplines and across locations, nor are they archived in a comprehensive library. Rather, knowledge of NGSI-produced literature is localized to the researchers, clients, and internal laboratory/facility publication systems such as the Electronic Records and Information Capture Architecture (ERICA) at the Pacific Northwest National Laboratory (PNNL). There is also no integrated way of analyzing existing NGSI literature to determine whether the larger NGSI program is achieving its core safeguards capabilities and activities. A complete library of NGSI literature could prove beneficial to a cohesive, sustainable, and more economical NGSI program. The Safeguards Platform for Automated Retrieval of Knowledge (SPARK) has been developed as a knowledge storage, retrieval, and analysis capability that captures safeguards knowledge so that it exists beyond the lifespan of NGSI. During the development process, it was necessary to build a SPARK training dataset (a corpus of documents) for initial entry into the system and for demonstration purposes. We manipulated these data to gain new information about the breadth of NGSI publications and evaluated the science-policy interface at PNNL as a practical demonstration of SPARK’s intended analysis capability. The analysis demonstration sought to answer the

  1. Design of FastQuery: How to Generalize Indexing and Querying System for Scientific Data

    SciTech Connect

    Wu, Jerry; Wu, Kesheng

    2011-04-18

    Modern scientific datasets present numerous data management and analysis challenges. State-of-the-art index and query technologies such as FastBit are critical for facilitating interactive exploration of large datasets. These technologies rely on adding auxiliary information to existing datasets to accelerate query processing. To use these indices, we need to match the relational data model used by the indexing systems with the array data model used by most scientific data, and to provide an efficient input and output layer for reading and writing the indices. In this work, we present a flexible design that can be easily applied to most scientific data formats. We demonstrate this flexibility by applying it to two of the most commonly used scientific data formats, HDF5 and NetCDF. We present two case studies using simulation data from the particle accelerator and climate simulation communities. To demonstrate the effectiveness of the new design, we also present a detailed performance study using both synthetic and real scientific workloads.
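
    The array-to-record mapping at the heart of this design can be illustrated with a small sketch (not FastQuery's API): an HDF5 array variable is flattened into one value column whose positions serve as record identifiers, and a precomputed ordering stands in for the FastBit bitmap index when answering a range query. The file and variable names are hypothetical.

        import h5py
        import numpy as np

        with h5py.File("simulation.h5", "r") as f:            # hypothetical file and variable names
            pressure = f["/fields/pressure"][...].ravel()      # array data model -> one flat value column

        order = np.argsort(pressure)                           # stand-in for a FastBit-style index
        sorted_vals = pressure[order]

        def range_query(lo, hi):
            """Return flat cell indices with lo <= value < hi; indices act as record identifiers."""
            start = np.searchsorted(sorted_vals, lo, side="left")
            stop = np.searchsorted(sorted_vals, hi, side="left")
            return order[start:stop]

        # hits = range_query(2.0e5, 3.0e5)
        # np.unravel_index(hits, original_shape) maps the hits back to multi-dimensional coordinates.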

  2. Lunar exploration

    NASA Astrophysics Data System (ADS)

    Crawford, I. A.; Joy, K. H.; Anand, M.

    The Moon has historically been at the forefront of solar system exploration. Building on early telescopic discoveries, over the past half century lunar exploration by spacecraft has taught us much about the Moon as a planetary body, the early history of the solar system (including the origin and evolution of the Earth-Moon system), the geological evolution of rocky planets more generally, and the near-Earth cosmic environment throughout solar system history. In this chapter, we review the rich history of lunar exploration and draw attention to the advances in scientific knowledge that have resulted from it. We also review the scientific arguments for continued lunar exploration and argue that these will be maximized in the context of a renewed program of human exploration of the Moon.

  3. Exploring Verbal, Visual and Schematic Learners' Static and Dynamic Mental Images of Scientific Species and Processes in Relation to Their Spatial Ability

    ERIC Educational Resources Information Center

    Al-Balushi, Sulaiman M.; Coll, Richard Kevin

    2013-01-01

    The current study compared different learners' static and dynamic mental images of unseen scientific species and processes in relation to their spatial ability. Learners were classified into verbal, visual and schematic. Dynamic images were classified into: appearing/disappearing, linear-movement, and rotation. Two types of scientific…

  4. Exploring High School Students' Use of Theory and Evidence in an Everyday Context: The Role of Scientific Thinking in Environmental Science Decision-Making. Research Report

    ERIC Educational Resources Information Center

    Yang, Fang-Ying

    2004-01-01

    This study examined 10th-grade students' use of theory and evidence in evaluating a socio-scientific issue: the use of underground water, after students had received a Science, Technology and Society-oriented instruction. Forty-five male and 45 female students from two intact, single-sex, classes participated in this study. A flow-map method was…

  5. Exploring High School Students' Use of Theory and Evidence in an Everyday Context: The Role of Scientific Thinking in Environmental Science Decision-Making

    ERIC Educational Resources Information Center

    Yang, Fang-Ying

    2004-01-01

    This study examined 10th-grade students' use of theory and evidence in evaluating a socio-scientific issue: the use of underground water, after students had received a Science, Technology and Society-oriented instruction. Forty-five male and 45 female students from two intact, single-sex, classes participated in this study. A flow-map method was…

  6. Exploring the Structural Relationships between High School Students' Scientific Epistemological Views and Their Utilization of Information Commitments toward Online Science Information

    ERIC Educational Resources Information Center

    Lin, Chia-Ching; Tsai, Chin-Chung

    2008-01-01

    The main purpose of this study was to examine the structural relationships between scientific epistemological views (SEVs) and information commitments (ICs) of high school students in Taiwan. Data were collected from 486 Taiwanese high school students via two self-reporting instruments: one was the SEV questionnaire, including five scales for…

  7. Using Real Datasets for Interdisciplinary Business/Economics Projects

    ERIC Educational Resources Information Center

    Goel, Rajni; Straight, Ronald L.

    2005-01-01

    The workplace's global and dynamic nature allows and requires improved approaches for providing business and economics education. In this article, the authors explore ways of enhancing students' understanding of course material by using nontraditional, real-world datasets of particular interest to them. Teaching at a historically Black university,…

  8. Using Exoplanet Models to Explore NGSS and the Nature of Science and as a Tool for Understanding the Scientific Results from NIRCam/JWST

    NASA Astrophysics Data System (ADS)

    Lebofsky, Larry A.; McCarthy, Donald W.; Higgins, Michelle L.; Lebofsky, Nancy R.

    2014-11-01

    Our Solar System is no longer unique. To date, about 1,800 planets are known to orbit over 1,100 other stars and nearly 50% are in multiple-planet systems. Planetary systems seem to be fairly common and astronomers are now finding Earth-sized planets in the Goldilocks Zone, suggesting there may be other habitable planets. To this end, characterizing the atmospheric chemistries of such planets is a major science goal of the NIRCam instrument on the James Webb Space Telescope. For NIRCam's E/PO program with the Girl Scouts of the USA, we have produced scale models and associated activities to compare the size, scale, and dynamics of the Solar System with several exoplanet systems. Our models illustrate the techniques used to investigate these systems: radial velocity, transits, direct observations, and gravitational microlensing. By comparing and contrasting these models, we place our Solar System in a more cosmic context and enable discussion of current questions within the scientific community: How do planetary systems form and evolve? Is our present definition of a planet a good definition in the context of other planetary systems? Are there other planets/moons that might harbor life as we know it? These models are appropriate for use in classrooms and conform to the Next Generation Science Standards (NGSS) through the Disciplinary Core Idea: Earth's Place in the Universe and the Crosscutting Concepts: Patterns; Scale, Proportion, and Quantity; and Systems and System Models. NGSS also states that the Nature of Science (NOS) should be an “essential part” of science education. NOS topics include, for example, understanding that scientific investigations use a variety of methods, that scientific knowledge is based on empirical evidence, that scientific explanations are open to revision in light of new evidence, and an understanding of the nature of scientific models.
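
    For context, two of the detection techniques the models illustrate rest on simple textbook relations; the worked numbers below for an Earth analogue around a Sun-like star (standard formulas, illustrative values only, not NIRCam results) show why such signals are small.

        import numpy as np

        R_sun, R_earth = 6.957e8, 6.371e6          # radii in meters
        M_sun, M_earth = 1.989e30, 5.972e24        # masses in kilograms
        G, AU = 6.674e-11, 1.496e11                # SI units

        # Transit depth of an Earth analogue: (R_planet / R_star)^2
        depth = (R_earth / R_sun) ** 2
        print(f"transit depth ~ {depth:.1e}")      # ~8e-5, i.e. roughly 80 parts per million

        # Radial-velocity semi-amplitude the same planet induces on its star (circular, edge-on orbit)
        v_star = (M_earth / M_sun) * np.sqrt(G * M_sun / AU)
        print(f"RV semi-amplitude ~ {v_star:.2f} m/s")   # ~0.09 m/s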

  9. Securely measuring the overlap between private datasets with cryptosets.

    PubMed

    Swamidass, S Joshua; Matlock, Matthew; Rozenblit, Leon

    2015-01-01

    Many scientific questions are best approached by sharing data--collected by different groups or across large collaborative networks--into a combined analysis. Unfortunately, some of the most interesting and powerful datasets--like health records, genetic data, and drug discovery data--cannot be freely shared because they contain sensitive information. In many situations, knowing if private datasets overlap determines if it is worthwhile to navigate the institutional, ethical, and legal barriers that govern access to sensitive, private data. We report the first method of publicly measuring the overlap between private datasets that is secure under a malicious model without relying on private protocols or message passing. This method uses a publicly shareable summary of a dataset's contents, its cryptoset, to estimate its overlap with other datasets. Cryptosets approach "information-theoretic" security, the strongest type of security possible in cryptography, which is not even crackable with infinite computing power. We empirically and theoretically assess both the accuracy of these estimates and the security of the approach, demonstrating that cryptosets are informative, with a stable accuracy, and secure.
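
    A simplified sketch of the idea (not the paper's exact construction or its security analysis): each site publishes only a fixed-length histogram of hashed record identifiers, and the overlap between two private sets is estimated from the two histograms with a correction for chance collisions.

        import hashlib
        import numpy as np

        def cryptoset(ids, length=1024):
            """Publicly shareable summary: a histogram of hashed record identifiers."""
            counts = np.zeros(length, dtype=int)
            for record_id in ids:
                digest = hashlib.sha256(record_id.encode()).hexdigest()
                counts[int(digest, 16) % length] += 1
            return counts

        def estimate_overlap(c_a, c_b):
            length = len(c_a)
            n_a, n_b = c_a.sum(), c_b.sum()
            raw = np.dot(c_a, c_b)
            return (raw - n_a * n_b / length) / (1.0 - 1.0 / length)   # corrects for expected collisions

        # a = cryptoset(f"patient-{i}" for i in range(5000))
        # b = cryptoset(f"patient-{i}" for i in range(3000, 9000))
        # print(estimate_overlap(a, b))   # close to the true overlap of 2000 for this synthetic example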

  10. Securely measuring the overlap between private datasets with cryptosets.

    PubMed

    Swamidass, S Joshua; Matlock, Matthew; Rozenblit, Leon

    2015-01-01

    Many scientific questions are best approached by sharing data--collected by different groups or across large collaborative networks--into a combined analysis. Unfortunately, some of the most interesting and powerful datasets--like health records, genetic data, and drug discovery data--cannot be freely shared because they contain sensitive information. In many situations, knowing if private datasets overlap determines if it is worthwhile to navigate the institutional, ethical, and legal barriers that govern access to sensitive, private data. We report the first method of publicly measuring the overlap between private datasets that is secure under a malicious model without relying on private protocols or message passing. This method uses a publicly shareable summary of a dataset's contents, its cryptoset, to estimate its overlap with other datasets. Cryptosets approach "information-theoretic" security, the strongest type of security possible in cryptography, which is not even crackable with infinite computing power. We empirically and theoretically assess both the accuracy of these estimates and the security of the approach, demonstrating that cryptosets are informative, with a stable accuracy, and secure. PMID:25714898

  11. Can we really make a difference? Exploring pre-service teachers' experience with socio-scientific issues aiming for democratic participation in science

    NASA Astrophysics Data System (ADS)

    Cook, Kristin Leigh

    Responding to calls for an empirical glimpse into a socioscientific issues (SSI)-based curriculum that aims to promote democratic participation, enhance students' connections to science, and empower students for the betterment of society (Dos Santos, 2008; Sadler, Barab, & Scott, 2007; Tal & Kedmi, 2006; Fusco & Barton, 2001; Hodson, 2003), this critical case study of 24 pre-service teachers (PSTs) enrolled in a scientific inquiry course offers curricular suggestions to empower learners to connect with the dynamic and socially-mediated process of science. In effect, incorporating nature-of-science-focused and place-based inquiry into a collaboration between PSTs and scientists was essential in enhancing students' connections to and feelings of inclusion in SSI. Propelled beyond a deficit model of public participation in science, the PSTs did indeed experience a public debate model and in some cases a knowledge production model in their collaborative efforts with scientists (Callon, 1999; Pouliot, 2009). While all of the PSTs engaged in rich discussion of their perspectives with scientists to enhance the investigation of their inquiry, some experienced a redistribution of the roles of participation in the production of scientific knowledge that was integrated into the scientists' decision-making processes. The materialization of these models depended on the structures of the student-scientist collaboration and the ways in which these malleable structures were flexed and negotiated. In effect, this study contributes to the literature on the potentials of SSI by providing an example of an educational approach that engages learners in a community practice as active participants in decision-making processes regarding socio-scientific issues, as well as focuses on empowering learners to be involved in the generation of scientific knowledge that contributes to their community.

  12. The CMS dataset bookkeeping service

    SciTech Connect

    Afaq, Anzar,; Dolgert, Andrew; Guo, Yuyi; Jones, Chris; Kosyakov, Sergey; Kuznetsov, Valentin; Lueking, Lee; Riley, Dan; Sekhri, Vijay; /Fermilab

    2007-10-01

    The CMS Dataset Bookkeeping Service (DBS) has been developed to catalog all CMS event data from Monte Carlo and Detector sources. It provides the ability to identify MC or trigger source, track data provenance, construct datasets for analysis, and discover interesting data. CMS requires processing and analysis activities at various service levels and the DBS system provides support for localized processing or private analysis, as well as global access for CMS users at large. Catalog entries can be moved among the various service levels with a simple set of migration tools, thus forming a loose federation of databases. DBS is available to CMS users via a Python API, Command Line, and a Discovery web page interfaces. The system is built as a multi-tier web application with Java servlets running under Tomcat, with connections via JDBC to Oracle or MySQL database backends. Clients connect to the service through HTTP or HTTPS with authentication provided by GRID certificates and authorization through VOMS. DBS is an integral part of the overall CMS Data Management and Workflow Management systems.

  13. Provenance Challenges for Earth Science Dataset Publication

    NASA Technical Reports Server (NTRS)

    Tilmes, Curt

    2011-01-01

    Modern science is increasingly dependent on computational analysis of very large data sets. Organizing, referencing, publishing those data has become a complex problem. Published research that depends on such data often fails to cite the data in sufficient detail to allow an independent scientist to reproduce the original experiments and analyses. This paper explores some of the challenges related to data identification, equivalence and reproducibility in the domain of data intensive scientific processing. It will use the example of Earth Science satellite data, but the challenges also apply to other domains.

  14. Using the National Datasets for Faculty Studies.

    ERIC Educational Resources Information Center

    Milam, John

    1999-01-01

    This paper examines 17 national datasets that are available for policy studies and research about college faculty. The datasets include 11 containing faculty information, two about student enrollment, two about degrees awarded, and two about institutional activity. Each of the following datasets is individually described: (1) National Science…

  15. Watershed Boundary Dataset for Mississippi

    USGS Publications Warehouse

    Wilson, K. Van; Clair, Michael G.; Turnipseed, D. Phil; Rebich, Richard A.

    2009-01-01

    The U.S. Geological Survey, in cooperation with the Mississippi Department of Environmental Quality, U.S. Department of Agriculture-Natural Resources Conservation Service, Mississippi Department of Transportation, U.S. Department of Agriculture-Forest Service, and the Mississippi Automated Resource Information System developed a 1:24,000-scale Watershed Boundary Dataset for Mississippi including watershed and subwatershed boundaries, codes, names, and areas. The Watershed Boundary Dataset for Mississippi provides a standard geographical framework for water-resources and selected land-resources planning. The original 8-digit subbasins (Hydrologic Unit Codes) were further subdivided into 10-digit watersheds (62.5 to 391 square miles (mi2)) and 12-digit subwatersheds (15.6 to 62.5 mi2) - the exceptions being the Delta part of Mississippi and the Mississippi River inside levees, which were subdivided into 10-digit watersheds only. Also, large water bodies in the Mississippi Sound along the coast were not delineated as small as a typical 12-digit subwatershed. All of the data - including watershed and subwatershed boundaries, subdivision codes and names, and drainage-area data - are stored in a Geographic Information System database, which are available at: http://ms.water.usgs.gov/. This map shows information on drainage and hydrography in the form of U.S. Geological Survey hydrologic unit boundaries for water-resource 2-digit regions, 4-digit subregions, 6-digit basins (formerly called accounting units), 8-digit subbasins (formerly called cataloging units), 10-digit watershed, and 12-digit subwatersheds in Mississippi. A description of the project study area, methods used in the development of watershed and subwatershed boundaries for Mississippi, and results are presented in Wilson and others (2008). The data presented in this map and by Wilson and others (2008) supersede the data presented for Mississippi by Seaber and others (1987) and U.S. Geological Survey (1977).
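
    Because the hydrologic unit codes are strictly nested, the coarser codes can be derived from a 12-digit subwatershed code by truncation; the small helper below (not part of the USGS product; the example code is made up) makes that hierarchy explicit.

        HUC_LEVELS = [
            ("region", 2), ("subregion", 4), ("basin", 6),
            ("subbasin", 8), ("watershed", 10), ("subwatershed", 12),
        ]

        def parse_huc(huc12):
            """Return the nested hydrologic-unit codes contained in a 12-digit HUC."""
            huc12 = huc12.strip()
            if len(huc12) != 12 or not huc12.isdigit():
                raise ValueError("expected a 12-digit hydrologic unit code")
            return {name: huc12[:digits] for name, digits in HUC_LEVELS}

        # parse_huc("031700091003")  (made-up code) -> {'region': '03', 'subregion': '0317', 'basin': '031700', ...}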

  16. Scientific Encounters of the Mysterious Sea. Reading Activities That Explore the Mysterious Creatures of the Deep Blue Sea. Grades 4-7.

    ERIC Educational Resources Information Center

    Embry, Lynn

    This activity book presents reading activities for grades 4-7 exploring the mysterious creatures of the deep sea. The creatures include: angel sharks; argonauts; barberfish; comb jelly; croakers; electric rays; flying fish; giganturid; lantern fish; narwhals; northern basket starfish; ocean sunfish; Portuguese man-of-war; sea cucumbers; sea…

  17. Internationally coordinated glacier monitoring: strategy and datasets

    NASA Astrophysics Data System (ADS)

    Hoelzle, Martin; Armstrong, Richard; Fetterer, Florence; Gärtner-Roer, Isabelle; Haeberli, Wilfried; Kääb, Andreas; Kargel, Jeff; Nussbaumer, Samuel; Paul, Frank; Raup, Bruce; Zemp, Michael

    2014-05-01

    Internationally coordinated monitoring of long-term glacier changes provides key indicator data about global climate change and began in 1894 as an internationally coordinated effort to establish standardized observations. Today, world-wide monitoring of glaciers and ice caps is embedded within the Global Climate Observing System (GCOS) in support of the United Nations Framework Convention on Climate Change (UNFCCC) as an important Essential Climate Variable (ECV). The Global Terrestrial Network for Glaciers (GTN-G) was established in 1999 with the task of coordinating measurements and ensuring the continuous development and adaptation of the international strategies to the long-term needs of users in science and policy. The basic monitoring principles must be relevant, feasible, comprehensive and understandable to a wider scientific community as well as to policy makers and the general public. Data access has to be free and unrestricted, the quality of the standardized and calibrated data must be high, and a combination of detailed process studies at selected field sites with global coverage by satellite remote sensing is envisaged. Recently a GTN-G Steering Committee was established to guide and advise the operational bodies responsible for the international glacier monitoring, which are the World Glacier Monitoring Service (WGMS), the US National Snow and Ice Data Center (NSIDC), and the Global Land Ice Measurements from Space (GLIMS) initiative. Several online databases containing a wealth of diverse data types, with different levels of detail and global coverage, provide fast access to continuously updated information on glacier fluctuation and inventory data. For world-wide inventories, data are now available through (a) the World Glacier Inventory containing tabular information of about 130,000 glaciers covering an area of around 240,000 km2, (b) the GLIMS-database containing digital outlines of around 118,000 glaciers with different time stamps and

  18. Scientific Misconduct.

    PubMed

    Gross, Charles

    2016-01-01

    Scientific misconduct has been defined as fabrication, falsification, and plagiarism. Scientific misconduct has occurred throughout the history of science. The US government began to take systematic interest in such misconduct in the 1980s. Since then, a number of studies have examined how frequently individual scientists have observed scientific misconduct or were involved in it. Although the studies vary considerably in their methodology and in the nature and size of their samples, in most studies at least 10% of the scientists sampled reported having observed scientific misconduct. In addition to studies of the incidence of scientific misconduct, this review considers the recent increase in paper retractions, the role of social media in scientific ethics, several instructional examples of egregious scientific misconduct, and potential methods to reduce research misconduct. PMID:26273897

  19. Scientific Misconduct.

    PubMed

    Gross, Charles

    2016-01-01

    Scientific misconduct has been defined as fabrication, falsification, and plagiarism. Scientific misconduct has occurred throughout the history of science. The US government began to take systematic interest in such misconduct in the 1980s. Since then, a number of studies have examined how frequently individual scientists have observed scientific misconduct or were involved in it. Although the studies vary considerably in their methodology and in the nature and size of their samples, in most studies at least 10% of the scientists sampled reported having observed scientific misconduct. In addition to studies of the incidence of scientific misconduct, this review considers the recent increase in paper retractions, the role of social media in scientific ethics, several instructional examples of egregious scientific misconduct, and potential methods to reduce research misconduct.

  20. Exploring high school students' use of theory and evidence in an everyday context: the role of scientific thinking in environmental science decision-making

    NASA Astrophysics Data System (ADS)

    Yang, Fang-Ying

    2004-11-01

    This study examined 10th-grade students' use of theory and evidence in evaluating a socio-scientific issue: the use of underground water, after students had received a Science, Technology and Society-oriented instruction. Forty-five male and 45 female students from two intact, single-sex, classes participated in this study. A flow-map method was used to assess the participants' conceptual knowledge. The reasoning mode was assessed using a questionnaire with open-ended questions. Results showed that, although some weak to moderate associations were found between conceptual organization in memory and reasoning modes, the students' ability to incorporate theory and evidence was in general inadequate. It was also found that students' reasoning modes were consistent with their epistemological perspectives. Moreover, male and female students appear to have different reasoning approaches.

  1. Interactive visualization and analysis of multimodal datasets for surgical applications.

    PubMed

    Kirmizibayrak, Can; Yim, Yeny; Wakid, Mike; Hahn, James

    2012-12-01

    Surgeons use information from multiple sources when making surgical decisions. These include volumetric datasets (such as CT, PET, MRI, and their variants), 2D datasets (such as endoscopic videos), and vector-valued datasets (such as computer simulations). Presenting all the information to the user in an effective manner is a challenging problem. In this paper, we present a visualization approach that displays the information from various sources in a single coherent view. The system allows the user to explore and manipulate volumetric datasets, display analysis of dataset values in local regions, combine 2D and 3D imaging modalities and display results of vector-based computer simulations. Several interaction methods are discussed: in addition to traditional interfaces including mouse and trackers, gesture-based natural interaction methods are shown to control these visualizations with real-time performance. An example of a medical application (medialization laryngoplasty) is presented to demonstrate how the combination of different modalities can be used in a surgical setting with our approach.

  2. A Dataset for Breast Cancer Histopathological Image Classification.

    PubMed

    Spanhol, Fabio A; Oliveira, Luiz S; Petitjean, Caroline; Heutte, Laurent

    2016-07-01

    Today, medical image analysis papers require solid experiments to prove the usefulness of proposed methods. However, experiments are often performed on data selected by the researchers, which may come from different institutions, scanners, and populations. Different evaluation measures may be used, making it difficult to compare the methods. In this paper, we introduce a dataset of 7909 breast cancer histopathology images acquired on 82 patients, which is now publicly available from http://web.inf.ufpr.br/vri/breast-cancer-database. The dataset includes both benign and malignant images. The task associated with this dataset is the automated classification of these images into two classes, which would be a valuable computer-aided diagnosis tool for the clinician. In order to assess the difficulty of this task, we show some preliminary results obtained with state-of-the-art image classification systems. The accuracy ranges from 80% to 85%, showing that there is still room for improvement. By providing this dataset and a standardized evaluation protocol to the scientific community, we hope to gather researchers in both the medical and machine learning fields to advance toward this clinical application.
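
    One practical point when evaluating classifiers on such data is to split by patient rather than by image, so that images from the same patient never appear in both the training and the test partition. The sketch below shows one way to do this with scikit-learn; it is an assumption about good practice, not necessarily the authors' exact protocol.

        from sklearn.model_selection import GroupShuffleSplit   # third-party scikit-learn

        def patient_split(image_paths, labels, patient_ids, test_size=0.3, seed=0):
            """Split image indices so that no patient contributes to both partitions."""
            splitter = GroupShuffleSplit(n_splits=1, test_size=test_size, random_state=seed)
            train_idx, test_idx = next(splitter.split(image_paths, labels, groups=patient_ids))
            return train_idx, test_idx

        # train_idx, test_idx = patient_split(paths, y, patients)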

  3. The garden as a laboratory: the role of domestic gardens as places of scientific exploration in the long 18th century.

    PubMed

    Hickman, Clare

    2014-06-01

    The garden as a laboratory: the role of domestic gardens as places of scientific exploration in the long 18th century. Gardens of the 18th century have traditionally been regarded as spaces designed for leisure, and as representations of taste, political power, and status. This article instead explores the concept of the garden as a dynamic space in which scientific experimentation and medical practice could be carried out. It reports a pilot study of two examples: the landscapes designed for John Hunter's residence at Earls Court, London, and the garden of Edward Jenner's house at Berkeley, Gloucestershire. Garden-history methodologies were employed to assess the extent to which these domestic gardens can be considered spaces of experimentation.

  4. Mario Bunge's Scientific Realism

    ERIC Educational Resources Information Center

    Cordero, Alberto

    2012-01-01

    This paper presents and comments on Mario Bunge's scientific realism. After a brief introduction in Sect. 1, Sect. 2 outlines Bunge's conception of realism. Focusing on the case of quantum mechanics, Sect. 3 explores how his approach plays out for problematic theories. Section 4 comments on Bunge's project against the background of the current…

  5. Mario Bunge's Scientific Realism

    NASA Astrophysics Data System (ADS)

    Cordero, Alberto

    2012-10-01

    This paper presents and comments on Mario Bunge's scientific realism. After a brief introduction in Sect. 1, Sect. 2 outlines Bunge's conception of realism. Focusing on the case of quantum mechanics, Sect. 3 explores how his approach plays out for problematic theories. Section 4 comments on Bunge's project against the background of the current debate on realism in contemporary analytic philosophy.

  6. The health care and life sciences community profile for dataset descriptions

    PubMed Central

    Alexiev, Vladimir; Ansell, Peter; Bader, Gary; Baran, Joachim; Bolleman, Jerven T.; Callahan, Alison; Cruz-Toledo, José; Gaudet, Pascale; Gombocz, Erich A.; Gonzalez-Beltran, Alejandra N.; Groth, Paul; Haendel, Melissa; Ito, Maori; Jupp, Simon; Juty, Nick; Katayama, Toshiaki; Kobayashi, Norio; Krishnaswami, Kalpana; Laibe, Camille; Le Novère, Nicolas; Lin, Simon; Malone, James; Miller, Michael; Mungall, Christopher J.; Rietveld, Laurens; Wimalaratne, Sarala M.; Yamaguchi, Atsuko

    2016-01-01

    Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. However, while there are many relevant vocabularies for the annotation of a dataset, none sufficiently captures all the necessary metadata. This prevents uniform indexing and querying of dataset repositories. Towards providing a practical guide for producing a high quality description of biomedical datasets, the W3C Semantic Web for Health Care and the Life Sciences Interest Group (HCLSIG) identified Resource Description Framework (RDF) vocabularies that could be used to specify common metadata elements and their value sets. The resulting guideline covers elements of description, identification, attribution, versioning, provenance, and content summarization. This guideline reuses existing vocabularies, and is intended to meet key functional requirements including indexing, discovery, exchange, query, and retrieval of datasets, thereby enabling the publication of FAIR data. The resulting metadata profile is generic and could be used by other domains with an interest in providing machine readable descriptions of versioned datasets. PMID:27602295
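
    A minimal machine-readable description along these lines can be produced with the rdflib Python library; the snippet below uses a few common Dublin Core, DCAT, and PAV terms as stand-ins for the profile's elements (the dataset IRI and values are hypothetical, and this is not a verbatim rendering of the HCLS profile).

        from rdflib import Graph, Literal, Namespace, URIRef
        from rdflib.namespace import RDF

        DCT = Namespace("http://purl.org/dc/terms/")
        DCAT = Namespace("http://www.w3.org/ns/dcat#")
        PAV = Namespace("http://purl.org/pav/")

        g = Graph()
        for prefix, ns in (("dct", DCT), ("dcat", DCAT), ("pav", PAV)):
            g.bind(prefix, ns)

        dataset = URIRef("http://example.org/dataset/exampledb/version/1.2")   # hypothetical IRI
        g.add((dataset, RDF.type, DCAT.Dataset))
        g.add((dataset, DCT.title, Literal("Example biomedical dataset", lang="en")))
        g.add((dataset, DCT.description, Literal("Toy description record for illustration.", lang="en")))
        g.add((dataset, DCT.license, URIRef("https://creativecommons.org/licenses/by/4.0/")))
        g.add((dataset, PAV.version, Literal("1.2")))

        print(g.serialize(format="turtle"))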

  7. The health care and life sciences community profile for dataset descriptions.

    PubMed

    Dumontier, Michel; Gray, Alasdair J G; Marshall, M Scott; Alexiev, Vladimir; Ansell, Peter; Bader, Gary; Baran, Joachim; Bolleman, Jerven T; Callahan, Alison; Cruz-Toledo, José; Gaudet, Pascale; Gombocz, Erich A; Gonzalez-Beltran, Alejandra N; Groth, Paul; Haendel, Melissa; Ito, Maori; Jupp, Simon; Juty, Nick; Katayama, Toshiaki; Kobayashi, Norio; Krishnaswami, Kalpana; Laibe, Camille; Le Novère, Nicolas; Lin, Simon; Malone, James; Miller, Michael; Mungall, Christopher J; Rietveld, Laurens; Wimalaratne, Sarala M; Yamaguchi, Atsuko

    2016-01-01

    Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. However, while there are many relevant vocabularies for the annotation of a dataset, none sufficiently captures all the necessary metadata. This prevents uniform indexing and querying of dataset repositories. Towards providing a practical guide for producing a high quality description of biomedical datasets, the W3C Semantic Web for Health Care and the Life Sciences Interest Group (HCLSIG) identified Resource Description Framework (RDF) vocabularies that could be used to specify common metadata elements and their value sets. The resulting guideline covers elements of description, identification, attribution, versioning, provenance, and content summarization. This guideline reuses existing vocabularies, and is intended to meet key functional requirements including indexing, discovery, exchange, query, and retrieval of datasets, thereby enabling the publication of FAIR data. The resulting metadata profile is generic and could be used by other domains with an interest in providing machine readable descriptions of versioned datasets.

  8. The health care and life sciences community profile for dataset descriptions.

    PubMed

    Dumontier, Michel; Gray, Alasdair J G; Marshall, M Scott; Alexiev, Vladimir; Ansell, Peter; Bader, Gary; Baran, Joachim; Bolleman, Jerven T; Callahan, Alison; Cruz-Toledo, José; Gaudet, Pascale; Gombocz, Erich A; Gonzalez-Beltran, Alejandra N; Groth, Paul; Haendel, Melissa; Ito, Maori; Jupp, Simon; Juty, Nick; Katayama, Toshiaki; Kobayashi, Norio; Krishnaswami, Kalpana; Laibe, Camille; Le Novère, Nicolas; Lin, Simon; Malone, James; Miller, Michael; Mungall, Christopher J; Rietveld, Laurens; Wimalaratne, Sarala M; Yamaguchi, Atsuko

    2016-01-01

    Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. However, while there are many relevant vocabularies for the annotation of a dataset, none sufficiently captures all the necessary metadata. This prevents uniform indexing and querying of dataset repositories. Towards providing a practical guide for producing a high quality description of biomedical datasets, the W3C Semantic Web for Health Care and the Life Sciences Interest Group (HCLSIG) identified Resource Description Framework (RDF) vocabularies that could be used to specify common metadata elements and their value sets. The resulting guideline covers elements of description, identification, attribution, versioning, provenance, and content summarization. This guideline reuses existing vocabularies, and is intended to meet key functional requirements including indexing, discovery, exchange, query, and retrieval of datasets, thereby enabling the publication of FAIR data. The resulting metadata profile is generic and could be used by other domains with an interest in providing machine readable descriptions of versioned datasets. PMID:27602295

  9. Use of Electronic Health-Related Datasets in Nursing and Health-Related Research.

    PubMed

    Al-Rawajfah, Omar M; Aloush, Sami; Hewitt, Jeanne Beauchamp

    2015-07-01

    Datasets of gigabyte size are common in medical sciences. There is increasing consensus that significant untapped knowledge lies hidden in these large datasets. This review article aims to discuss Electronic Health-Related Datasets (EHRDs) in terms of types, features, advantages, limitations, and possible uses in nursing and health-related research. Major scientific databases, MEDLINE, ScienceDirect, and Scopus, were searched for studies or review articles regarding the use of EHRDs in research. A total of 442 articles were located. After application of the study inclusion criteria, 113 articles were included in the final review. EHRDs were categorized into Electronic Administrative Health-Related Datasets and Electronic Clinical Health-Related Datasets. Subcategories of each major category were identified. EHRDs are invaluable assets for nursing and health-related research. Advanced research skills, such as using analytical software, applying advanced statistical procedures, and dealing with missing data and missing variables, will maximize the efficient utilization of EHRDs in research.

  10. The health care and life sciences community profile for dataset descriptions

    PubMed Central

    Alexiev, Vladimir; Ansell, Peter; Bader, Gary; Baran, Joachim; Bolleman, Jerven T.; Callahan, Alison; Cruz-Toledo, José; Gaudet, Pascale; Gombocz, Erich A.; Gonzalez-Beltran, Alejandra N.; Groth, Paul; Haendel, Melissa; Ito, Maori; Jupp, Simon; Juty, Nick; Katayama, Toshiaki; Kobayashi, Norio; Krishnaswami, Kalpana; Laibe, Camille; Le Novère, Nicolas; Lin, Simon; Malone, James; Miller, Michael; Mungall, Christopher J.; Rietveld, Laurens; Wimalaratne, Sarala M.; Yamaguchi, Atsuko

    2016-01-01

    Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. However, while there are many relevant vocabularies for the annotation of a dataset, none sufficiently captures all the necessary metadata. This prevents uniform indexing and querying of dataset repositories. Towards providing a practical guide for producing a high quality description of biomedical datasets, the W3C Semantic Web for Health Care and the Life Sciences Interest Group (HCLSIG) identified Resource Description Framework (RDF) vocabularies that could be used to specify common metadata elements and their value sets. The resulting guideline covers elements of description, identification, attribution, versioning, provenance, and content summarization. This guideline reuses existing vocabularies, and is intended to meet key functional requirements including indexing, discovery, exchange, query, and retrieval of datasets, thereby enabling the publication of FAIR data. The resulting metadata profile is generic and could be used by other domains with an interest in providing machine readable descriptions of versioned datasets.
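
    The profile summarized above centres on reusing existing RDF vocabularies for dataset-level metadata. The snippet below is a minimal, illustrative sketch of that style of description using rdflib; the dataset IRIs, the publisher, and the specific choice of DCAT, Dublin Core, and PAV properties are assumptions made for illustration, not the profile's normative element list.

    ```python
    from rdflib import Graph, Namespace, URIRef, Literal
    from rdflib.namespace import DCTERMS, RDF, XSD

    # Hypothetical namespaces and identifiers, used only for illustration.
    DCAT = Namespace("http://www.w3.org/ns/dcat#")
    PAV = Namespace("http://purl.org/pav/")
    EX = Namespace("http://example.org/datasets/")

    g = Graph()
    g.bind("dcat", DCAT)
    g.bind("dct", DCTERMS)
    g.bind("pav", PAV)

    dataset = EX["example-dataset-v2"]          # hypothetical dataset IRI
    g.add((dataset, RDF.type, DCAT.Dataset))
    g.add((dataset, DCTERMS.title, Literal("Example bioactivity dataset", lang="en")))
    g.add((dataset, DCTERMS.description, Literal("Illustrative summary-level description.", lang="en")))
    g.add((dataset, DCTERMS.publisher, URIRef("http://example.org/org/some-institute")))
    g.add((dataset, DCTERMS.issued, Literal("2016-01-01", datatype=XSD.date)))
    g.add((dataset, PAV.version, Literal("2")))                       # versioning
    g.add((dataset, PAV.previousVersion, EX["example-dataset-v1"]))   # provenance link

    print(g.serialize(format="turtle"))
    ```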

  11. Case study of polar scintillation modeling using DE (Dynamics Explorer) 2 irregularity measurements at 800 km. Scientific report No. 2, 5 October 1987-30 September 1988

    SciTech Connect

    Basu, S.; Basu, S.; Weber, E.J.; Coley, W.R.

    1988-08-01

    Although a satellite-borne irregularity sensor obviously cannot measure scintillations, the question of what contribution such a sensor can make to model or predict scintillations is addressed. To pursue the problem, the Dynamics Explorer 2 (DE 2) ionospheric electron density irregularity data obtained at approximately 800-km altitude in the winter polar cap during sunspot maximum conditions were utilized. During this period an all-sky imaging photometer located at Thule, Greenland, within the polar cap, detected the presence of convecting ionization patches, and polar beacon satellite measurements detected several discrete, intense scintillation structures associated with these patches (E. J. Weber et al., 1984). The electron-density deviation (delta N) obtained by combining irregularity amplitudes (delta N/N)rms processed over 8-s intervals and the in-situ density (N) data acquired by the DE 2 satellite also showed the presence of spatially discrete structures. These irregularity structures, both in N and delta N, had spatial extents of approx. 1000 km in the N-S direction. The density associated with these structures, even at 800 km, showed a twofold to threefold increase in comparison to the background, and irregularity amplitudes (delta N/N)rms as large as 20% were observed at the edges of the patches.

  12. What can we learn from the toughest animals of the Earth? Water bears (tardigrades) as multicellular model organisms in order to perform scientific preparations for lunar exploration

    NASA Astrophysics Data System (ADS)

    Guidetti, Roberto; Rizzo, Angela Maria; Altiero, Tiziana; Rebecchi, Lorena

    2012-12-01

    Space missions of long duration require a series of preliminary experiments on living organisms, validated by a substantial phase of ground-simulation experiments, in the fields of micro- and intermediate gravity, radiobiology, and, for planetary exploration, the risks deriving from regolith and dust exposure. In this review, we present the tardigrades, whose characteristics recommend them as an emerging model for space biology. They are microscopic animals, yet they are characterized by a complex structural organization similar to that of larger animals; their small size allows them to be cultured in the lab in small facilities; they are able to produce clonal lineages by means of parthenogenesis; they can completely suspend their metabolism when entering dormant states (anhydrobiosis induced by dehydration and cryobiosis induced by freezing); desiccated anhydrobiotic tardigrades are able to withstand chemical and physical extremes, and a large tolerance is also shown by active animals; and they can be stored in a dry state for many years without loss of viability. Tardigrades have already been exposed to space stressors in Low Earth Orbit several times. The relevance of ground-based and space studies on tardigrades rests on the presumption that the results could suggest strategies to protect organisms, including humans, when exposed to the space and lunar environments.

  13. Exploring interoperability: The advancements and challenges of improving data discovery, access, and visualization of scientific data through the NOAA Earth Information System (NEIS). (Invited)

    NASA Astrophysics Data System (ADS)

    Stewart, J.; Lynge, J.; Hackathorn, E.; MacDermaid, C.; Pierce, R.; Smith, J.

    2013-12-01

    Interoperability is a complex subject and often leads to different definitions in different environments. An interoperable framework of web services can improve the user experience by providing an interface for interaction with data regardless of its format or physical location. This in itself improves accessibility to data, fosters data exploration and use, and provides a framework for new tools and applications. With an interoperable system: -- Data are ready for action. A services model facilitates agile response to events; services can be combined or reused quickly, and upgraded or modified independently. -- Any data available through an interoperable framework can be operated on or combined with other data, integrating standardized formats and access. -- New and existing systems have access to a wide variety of data, and any new data added are easily incorporated with minimal changes required. The possibilities are limitless. The NOAA Earth Information System (NEIS) at the Earth System Research Laboratory (ESRL) is continuing research into an interoperable framework of layered services designed to facilitate the discovery, access, integration, visualization, and understanding of all NOAA (past, present, and future) data. An underlying philosophy of NEIS is to take advantage of existing off-the-shelf technologies and standards to minimize development of custom code, allowing everyone to take advantage of the framework to meet the goals above. This framework, while built by NOAA, is not limited to NOAA data or applications; any other data available through similar services, or applications that understand these standards, can work interchangeably. Two major challenges under active research at ESRL are data discoverability and fast access to big data. This presentation will provide an update on the development of NEIS, including these challenges, the findings, and recommendations on what is needed for an interoperable system, as well as ongoing research activities.

  14. Explore with Us

    NASA Technical Reports Server (NTRS)

    Morales, Lester

    2012-01-01

    The fundamental goal of this vision is to advance U.S. scientific, security, and economic interests through a robust space exploration program: implement a sustained and affordable human and robotic program to explore the solar system and beyond; extend human presence across the solar system, starting with a human return to the Moon by the year 2020, in preparation for human exploration of Mars and other destinations; develop the innovative technologies, knowledge, and infrastructures both to explore and to support decisions about the destinations for human exploration; and promote international and commercial participation in exploration to further U.S. scientific, security, and economic interests.

  15. Securely Measuring the Overlap between Private Datasets with Cryptosets

    PubMed Central

    Swamidass, S. Joshua; Matlock, Matthew; Rozenblit, Leon

    2015-01-01

    Many scientific questions are best approached by sharing data—collected by different groups or across large collaborative networks—into a combined analysis. Unfortunately, some of the most interesting and powerful datasets—like health records, genetic data, and drug discovery data—cannot be freely shared because they contain sensitive information. In many situations, knowing if private datasets overlap determines if it is worthwhile to navigate the institutional, ethical, and legal barriers that govern access to sensitive, private data. We report the first method of publicly measuring the overlap between private datasets that is secure under a malicious model without relying on private protocols or message passing. This method uses a publicly shareable summary of a dataset’s contents, its cryptoset, to estimate its overlap with other datasets. Cryptosets approach “information-theoretic” security, the strongest type of security possible in cryptography, which is not even crackable with infinite computing power. We empirically and theoretically assess both the accuracy of these estimates and the security of the approach, demonstrating that cryptosets are informative, with a stable accuracy, and secure. PMID:25714898
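
    The abstract describes publishing a fixed-length, hash-based summary of a private dataset and estimating overlap from pairs of such summaries. The following is a toy sketch of that general idea, not the authors' published construction or estimator: identifiers are hashed into a histogram of length L, and the overlap estimate subtracts the expected number of chance collisions.

    ```python
    import hashlib

    L = 2048  # length of the public summary; longer summaries give tighter estimates

    def cryptoset(ids):
        """Toy public summary: a histogram of hashed identifiers (not the published scheme)."""
        counts = [0] * L
        for s in ids:
            counts[int(hashlib.sha256(s.encode()).hexdigest(), 16) % L] += 1
        return counts

    def estimate_overlap(a, b, n_a, n_b):
        """Toy estimator: dot product of the summaries minus the expected chance collisions."""
        dot = sum(x * y for x, y in zip(a, b))
        return dot - n_a * n_b / L

    # Illustrative private datasets with a known overlap of 300 records.
    shared = [f"patient-{i}" for i in range(300)]
    ds_a = shared + [f"a-only-{i}" for i in range(700)]
    ds_b = shared + [f"b-only-{i}" for i in range(1200)]

    est = estimate_overlap(cryptoset(ds_a), cryptoset(ds_b), len(ds_a), len(ds_b))
    print(f"estimated overlap ~ {est:.0f} (true overlap = 300)")
    ```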

  16. PEViD: privacy evaluation video dataset

    NASA Astrophysics Data System (ADS)

    Korshunov, Pavel; Ebrahimi, Touradj

    2013-09-01

    Visual privacy protection, i.e., obfuscation of personal visual information in video surveillance, is an important and increasingly popular research topic. However, while many datasets are available for testing the performance of various video analytics, little to nothing exists for the evaluation of visual privacy tools. Since surveillance and privacy protection have contradictory objectives, the design principles of the corresponding evaluation datasets should differ too. In this paper, we outline principles that need to be considered when building a dataset for privacy evaluation. Following these principles, we present a new, and to our knowledge the first, Privacy Evaluation Video Dataset (PEViD). With the dataset, we provide XML-based annotations of various privacy regions, including face, accessories, skin regions, hair, body silhouette, and other personal information, and their descriptions. Via preliminary subjective tests, we demonstrate the flexibility and suitability of the dataset for privacy evaluations. The evaluation results also show the importance of secondary privacy regions that contain non-facial personal information for the privacy-intelligibility tradeoff. We believe that the PEViD dataset is equally suitable for evaluations of privacy protection tools using objective metrics and subjective assessments.

  17. Recording Scientific Knowledge

    SciTech Connect

    Bowker, Geof

    2006-01-09

    The way we record knowledge, and the web of technical, formal, and social practices that surrounds it, inevitably affects the knowledge that we record. The ways we hold knowledge about the past - in handwritten manuscripts, in printed books, in file folders, in databases - shape the kind of stories we tell about that past. In this talk, I look at how over the past two hundred years, information technology has affected the nature and production of scientific knowledge. Further, I explore ways in which the emergent new cyberinfrastructure is changing our relationship to scientific practice.

  18. Multivariate Data EXplorer (MDX)

    SciTech Connect

    Steed, Chad Allen

    2012-08-01

    The MDX toolkit facilitates exploratory data analysis and visualization of multivariate datasets. MDX provides an interactive graphical user interface to load, explore, and modify multivariate datasets stored in tabular form. MDX uses an extended version of the parallel coordinates plot and scatterplots to represent the data. The user can perform rapid visual queries using mouse gestures in the visualization panels to select rows or columns of interest. The visualization panel provides coordinated multiple views whereby selections made in one plot are propagated to the other plots. Users can also export selected data or reconfigure the visualization panel to explore relationships between columns and rows in the data.
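
    MDX itself is an interactive GUI toolkit; as a rough, generic illustration of the parallel-coordinates view it is built around (not MDX's own code or API), the sketch below draws a small multivariate table with pandas. The column names and values are invented.

    ```python
    import pandas as pd
    import matplotlib.pyplot as plt
    from pandas.plotting import parallel_coordinates

    # Invented multivariate table; "region" is the grouping column used for colour.
    df = pd.DataFrame({
        "temperature": [280, 295, 310, 305],
        "pressure":    [1.0, 0.9, 1.2, 1.1],
        "velocity":    [12.0, 8.5, 15.2, 14.1],
        "region":      ["A", "A", "B", "B"],
    })

    # Each row becomes a polyline across the variable axes.
    parallel_coordinates(df, class_column="region")
    plt.title("Parallel-coordinates view of a small multivariate dataset")
    plt.show()
    ```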

  19. Scientific millenarianism

    SciTech Connect

    Weinberg, A.M.

    1997-12-01

    Today, for the first time, scientific concerns are seriously being addressed that span future times--hundreds, even thousands, or more years in the future. One is witnessing what the author calls scientific millenarianism. Are such concerns for the distant future exercises in futility, or are they real issues that, to the everlasting gratitude of future generations, this generation has identified, warned about and even suggested how to cope with in the distant future? Can the four potential catastrophes--bolide impact, CO2 warming, radioactive wastes and thermonuclear war--be avoided by technical fixes, institutional responses, religion, or by doing nothing? These are the questions addressed in this paper.

  20. DEX: Increasing the Capability of Scientific Data Analysis Pipelines by Using Efficient Bitmap Indices to Accelerate Scientific Visualization

    SciTech Connect

    Stockinger, Kurt; Shalf, John; Bethel, Wes; Wu, Kesheng

    2005-02-04

    We describe a new approach to scalable data analysis that enables scientists to manage the explosion in size and complexity of scientific data produced by experiments and simulations. Our approach uses a novel combination of efficient query technology and visualization infrastructure. The combination of bitmap indexing, a data management technology that accelerates queries on large scientific datasets, with a visualization pipeline for generating images of abstract data results in a tool suitable for use by scientists in fields where data size and complexity pose a barrier to efficient analysis. Our architecture and implementation, which we call DEX (short for dexterous data explorer), directly address the problem of "too much data" by focusing analysis on data deemed to be "scientifically interesting" via user-specified selection criteria. The architectural concepts and implementation are applicable to a wide variety of scientific data analysis and visualization applications. This paper presents an architectural overview of the system along with an analysis showing substantial performance gains over traditional visualization pipelines. While the performance gains are a significant result, even more important is the new functionality not present in any visualization analysis software, namely the ability to perform interactive, multi-dimensional queries to refine regions of interest that are later used as input to analysis or visualization.
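
    As a rough illustration of the query-then-visualize pattern described here (not the DEX/FastBit implementation itself), the sketch below builds a simple equality-encoded bitmap index over a synthetic scalar field and uses it to restrict downstream visualization to a user-specified selection. The field values and thresholds are invented.

    ```python
    import numpy as np

    # Synthetic scalar field standing in for simulation output (values are invented).
    rng = np.random.default_rng(0)
    temperature = rng.uniform(0.0, 1000.0, size=1_000_000)

    # Equality-encoded bitmap index: one boolean bitmap per coarse value bin.
    bin_edges = np.linspace(0.0, 1000.0, 11)            # 10 bins of width 100
    bin_of = np.digitize(temperature, bin_edges) - 1
    bitmaps = [bin_of == b for b in range(10)]

    # Query: cells deemed "scientifically interesting", e.g. temperature >= 750.
    candidate = np.logical_or.reduce(bitmaps[7:])       # bins covering [700, 1000)
    selection = candidate & (temperature >= 750.0)      # refine the partially covered bin
    print(f"{selection.sum():,} of {temperature.size:,} cells selected for visualization")
    ```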

  1. The Role of Scientific Collections in Scientific Preparedness

    PubMed Central

    2015-01-01

    Building on the findings and recommendations of the Interagency Working Group on Scientific Collections, Scientific Collections International (SciColl) aims to improve rapid access to science collections across disciplines within the federal government and globally, between government agencies and private research institutions. SciColl offered a novel opportunity for the US Department of Health and Human Services, Office of the Assistant Secretary for Preparedness and Response, to explore the value of scientific research collections under the science preparedness initiative and to integrate them as a research resource at each stage of the emerging infectious disease cycle. Under the leadership of SciColl’s executive secretariat at the Smithsonian Institution, and with multiple federal and international partners, a workshop during October 2014 fully explored the intersections of the infectious disease cycle and the role scientific collections could play as an evidentiary scientific resource to mitigate risks associated with emerging infectious diseases. PMID:26380390

  2. Simulation of Smart Home Activity Datasets.

    PubMed

    Synnott, Jonathan; Nugent, Chris; Jeffers, Paul

    2015-01-01

    A globally ageing population is resulting in an increased prevalence of chronic conditions which affect older adults. Such conditions require long-term care and management to maximize quality of life, placing an increasing strain on healthcare resources. Intelligent environments such as smart homes facilitate long-term monitoring of activities in the home through the use of sensor technology. Access to sensor datasets is necessary for the development of novel activity monitoring and recognition approaches. Access to such datasets is limited due to issues such as sensor cost, availability and deployment time. The use of simulated environments and sensors may address these issues and facilitate the generation of comprehensive datasets. This paper provides a review of existing approaches for the generation of simulated smart home activity datasets, including model-based approaches and interactive approaches which implement virtual sensors, environments and avatars. The paper also provides recommendations for future work in intelligent environment simulation.

  3. Simulation of Smart Home Activity Datasets

    PubMed Central

    Synnott, Jonathan; Nugent, Chris; Jeffers, Paul

    2015-01-01

    A globally ageing population is resulting in an increased prevalence of chronic conditions which affect older adults. Such conditions require long-term care and management to maximize quality of life, placing an increasing strain on healthcare resources. Intelligent environments such as smart homes facilitate long-term monitoring of activities in the home through the use of sensor technology. Access to sensor datasets is necessary for the development of novel activity monitoring and recognition approaches. Access to such datasets is limited due to issues such as sensor cost, availability and deployment time. The use of simulated environments and sensors may address these issues and facilitate the generation of comprehensive datasets. This paper provides a review of existing approaches for the generation of simulated smart home activity datasets, including model-based approaches and interactive approaches which implement virtual sensors, environments and avatars. The paper also provides recommendations for future work in intelligent environment simulation. PMID:26087371

  4. Managing large SNP datasets with SNPpy.

    PubMed

    Mitha, Faheem

    2013-01-01

    Using relational databases to manage SNP datasets is a very useful technique that has significant advantages over alternative methods, including the ability to leverage the power of relational databases to perform data validation, and the use of the powerful SQL query language to export data. SNPpy is a Python program which uses the PostgreSQL database and the SQLAlchemy Python library to automate SNP data management. This chapter shows how to use SNPpy to store and manage large datasets.
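
    SNPpy builds on PostgreSQL and SQLAlchemy; the sketch below shows the general pattern of managing genotype records through SQLAlchemy, with an invented table schema and an in-memory SQLite engine so the example runs self-contained. It is not SNPpy's actual schema or API.

    ```python
    from sqlalchemy import create_engine, Column, Integer, String
    from sqlalchemy.orm import declarative_base, Session

    Base = declarative_base()

    class Genotype(Base):
        """Illustrative genotype table; SNPpy's real schema differs."""
        __tablename__ = "genotype"
        id = Column(Integer, primary_key=True)
        sample_id = Column(String, nullable=False)
        rsid = Column(String, nullable=False)   # SNP identifier, e.g. "rs12345"
        allele1 = Column(String(1))
        allele2 = Column(String(1))

    # SQLite keeps the sketch self-contained; SNPpy itself targets PostgreSQL,
    # e.g. create_engine("postgresql+psycopg2://user:password@host/snpdb").
    engine = create_engine("sqlite:///:memory:")
    Base.metadata.create_all(engine)

    with Session(engine) as session:
        session.add(Genotype(sample_id="S001", rsid="rs12345", allele1="A", allele2="G"))
        session.commit()
        n = session.query(Genotype).filter_by(rsid="rs12345").count()
        print(f"{n} genotype record(s) for rs12345")
    ```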

  5. Virtual Observatories: A Peer-to-Peer Semantic Approach for Efficient Sharing of Scientific Data

    NASA Astrophysics Data System (ADS)

    Saxena, A.; Papitashvili, V.

    2003-12-01

    A recent initiative on the electronic Geophysical Year (eGY) calls for the establishment of a series of virtual geophysical observatories now being "deployed" in cyberspace. The eGY concept creates the means for free and transparent access to remote, worldwide distributed databases through the Internet and World Wide Web. However, current forms of sharing scientific data are either highly centralized or require intensive personal communication. Centralized distribution schemes need continuing maintenance and support, which somewhat counteracts the rationale of free access to such data repositories. Here we argue for the development of a peer-to-peer semantic approach focused on efficient sharing of scientific data via the Internet. Our proposed data distribution model aims to simplify the process of data acquisition and storage by tapping the vast potential of end-user machines. At the same time, data integrity and publisher accountability should be an integral design goal of any platform that aims to provide distributed, yet reliable, means of sharing such high-implication data among a rapidly increasing and widespread scientific community. Current design philosophies of file-sharing networks model the data files as "once injected, never modified" objects, which are shared through a set of pre-specified interfaces. On the contrary, scientific datasets are often multidimensional, and the same object can be published through multiple dynamically changing interfaces. Furthermore, user-specific analyses of these data often generate additional summaries and even new datasets. Publishing these derived datasets may introduce more complexity and challenges, but appropriate authentication and signing of the "original" and "modified" datasets can resolve the problem. We explore the possibility of semantic-oriented information modeling of the scientific data objects for enhancing real-time, machine-to-machine search

  6. Speaking Scientific

    ERIC Educational Resources Information Center

    Mason, Peter

    1971-01-01

    Suggests changes for science curricula which will improve the understanding...of the scientific language in which the ideas of science and technology are expressed," including increasing the students' facility with numbers, and in the future, an interdisciplinary course demonstrating the approach of physical, biological and behavioral scientists,…

  7. Making Sense of Scientific Biographies: Scientific Achievement, Nature of Science, and Storylines in College Students' Essays

    ERIC Educational Resources Information Center

    Hwang, Seyoung

    2015-01-01

    In this article, the educative value of scientific biographies will be explored, especially for non-science major college students. During the "Scientist's life and thought" course, 66 college students read nine scientific biographies including five biologists, covering the canonical scientific achievements in Western scientific history.…

  8. Toddlers' Scientific Explorations: Encounters with Insects

    ERIC Educational Resources Information Center

    Shaffer, Lauren Foster; Hall, Ellen; Lynch, Mary

    2009-01-01

    This article features Boulder Journey School, located in Boulder, Colorado, a full-day, year-round school that welcomes over 200 young children, ages 6 weeks to 6 years, and their families. The school community is committed to a culture based on children as curious and competent individuals capable of coconstructing knowledge. In Boulder Journey…

  9. Galaxy Evolution Spectroscopic Explorer: Scientific Rationale

    NASA Technical Reports Server (NTRS)

    Heap, Sara; Ninkov, Zoran; Robberto, Massimo; Hull, Tony; Purves, Lloyd

    2016-01-01

    GESE is a mission concept consisting of a 1.5-m space telescope and UV multi-object slit spectrograph designed to help understand galaxy evolution in a critical era in the history of the universe, where the rate of star-formation stopped increasing and started to decline. To isolate and identify the various processes driving the evolution of these galaxies, GESE will obtain rest-frame far-UV spectra of 100,000 galaxies at redshifts, z approximately 1-2. To obtain such a large number of spectra, multiplexing over a wide field is an absolute necessity. A slit device such as a digital micro-mirror device (DMD) or a micro-shutter array (MSA) enables spectroscopy of a hundred or more sources in a single exposure while eliminating overlapping spectra of other sources and blocking unwanted background like zodiacal light. We find that a 1.5-m space telescope with a MSA slit device combined with a custom orbit enabling long, uninterrupted exposures (approximately 10 hr) are optimal for this spectroscopic survey. GESE will not be operating alone in this endeavor. Together with x-ray telescopes and optical/near-IR telescopes like Subaru/Prime Focus Spectrograph, GESE will detect "feedback" from young massive stars and massive black holes (AGN's), and other drivers of galaxy evolution.

  10. The GRENE-TEA Model Intercomparison Project (GTMIP) stage 1 forcing dataset

    NASA Astrophysics Data System (ADS)

    Sueyoshi, T.; Saito, K.; Miyazaki, S.; Mori, J.; Ise, T.; Arakida, H.; Suzuki, R.; Sato, A.; Iijima, Y.; Yabuki, H.; Ikawa, H.; Ohta, T.; Kotani, A.; Hajima, T.; Sato, H.; Yamazaki, T.; Sugimoto, A.

    2015-08-01

    Here, the authors describe the construction of a forcing dataset for Land Surface Models (including both physical and biogeochemical models; LSMs) with eight meteorological variables for the 35 year period from 1979 to 2013. The dataset is intended for use in a model intercomparison (MIP) study, called GTMIP, which is a part of the Japanese-funded Arctic Climate Change research project. In order to prepare a set of site-fitted forcing data for LSMs with realistic yet continuous entries (i.e. without missing data), four observational sites across the pan-Arctic region (Fairbanks, Tiksi, Yakutsk, and Kevo) were selected to construct a blended dataset using both global reanalysis and observational data. Marked improvements were found in the diurnal cycles of surface air temperature and humidity, wind speed, and precipitation. The datasets and participation in GTMIP are open to the scientific community (https://ads.nipr.ac.jp/gtmip/gtmip.html).

  11. Scientific Claims versus Scientific Knowledge.

    ERIC Educational Resources Information Center

    Ramsey, John

    1991-01-01

    Provides activities that help students to understand the importance of the scientific method. The activities include the science of fusion and cold fusion; a group activity that analyzes and interprets the events surrounding cold fusion; and an application research project concerning a current science issue. (ZWH)

  12. Web-based 2-d Visualization with Large Datasets

    NASA Astrophysics Data System (ADS)

    Goldina, T.; Roby, W.; Wu, X.; Ly, L.

    2015-09-01

    Modern astronomical surveys produce large catalogs. Modern archives are web-based. As the science becomes more and more data driven, the pressure on visualization tools to support large datasets increases. While tables can render one page at a time, image overlays showing the returned catalog entries or XY plots showing the relationship between table columns must cover all of the rows to be meaningful. A large dataset could easily overwhelm the browser's capabilities, therefore the amount of data to be transported or rendered must be reduced. IRSA's catalog visualization is based on the Firefly package, developed at IPAC (Roby 2013). Firefly is used by multiple web-based tools and archives maintained by IRSA: Catalog Search, Spitzer, WISE, Planck, etc. Its distinctive feature is the tri-view: table, image overlay, and XY plot. All three highly interactive components are integrated together. The tri-view presentation allows an astronomer to dissect a dataset in various ways and to detect underlying structure and anomalies in the data, which makes it a handy tool for data exploration. Many challenges are encountered when only a subset of the data is used in place of the full dataset. Preserving coherence and maintaining the ability to select and filter data become issues. This talk addresses how we have solved problems in large dataset visualization.
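
    One common way to keep browser-side XY plots responsive for millions of rows, in the spirit of the data-reduction problem described here (not Firefly's actual implementation), is to bin on the server and ship a fixed-size density grid. The column names and distributions below are invented.

    ```python
    import numpy as np

    # Invented catalog columns: two magnitudes for a colour-magnitude style XY plot.
    rng = np.random.default_rng(1)
    mag_g = rng.normal(20.0, 1.5, size=5_000_000)
    mag_r = mag_g - rng.normal(0.5, 0.2, size=mag_g.size)

    # Server-side reduction: ship a fixed-size 2D histogram instead of five million points.
    counts, xedges, yedges = np.histogram2d(mag_g - mag_r, mag_g, bins=256)
    payload = {
        "counts": counts.astype(np.uint32).tolist(),    # 256x256 grid, independent of row count
        "x_range": [float(xedges[0]), float(xedges[-1])],
        "y_range": [float(yedges[0]), float(yedges[-1])],
    }
    print(f"rows: {mag_g.size:,}  ->  payload cells: {counts.size:,}")
    ```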

  13. Interoperability of Multiple Datasets with JMARS

    NASA Astrophysics Data System (ADS)

    Smith, M. E.; Christensen, P. R.; Noss, D.; Anwar, S.; Dickenshied, S.

    2012-12-01

    Planetary Science includes all celestial bodies including Earth. However, when investigating Geographic Information System (GIS) applications, Earth and planetary bodies have the tendency to be separated. One reason is because we have been learning and investigating Earth's properties much longer than we have been studying the other planetary bodies, therefore, the archive of GCS and projections is much larger. The first latitude and longitude system of Earth was invented between 276 BC and 194 BC by Eratosthenes who was also the first to calculate the circumference of the Earth. As time went on, scientists continued to re-measure the Earth on both local and global scales which has created a large collection of projections and geographic coordinate systems (GCS) to choose from. The variety of options can create a time consuming task to determine which GCS or projection gets applied to each dataset and how to convert to the correct GCS or projection. Another issue is presented when determining if the dataset should be applied to a geocentric sphere or a geodetic spheroid. Both of which are measured and determine latitude values differently. This can lead to inconsistent results and frustration for the user. This is not the case with other planetary bodies. Although the existence of other planets have been known since the early Babylon times, the accuracy of the planets rotation, size and geologic properties weren't known for several hundreds of years later. Therefore, the options for projections or GCS's are much smaller than the options one has for Earth's data. Even then, the projection and GCS options for other celestial bodies are informal. So it can be hard for the user to determine which projection or GCS to apply to the other planets. JMARS (Java Mission Analysis for Remote Sensing) is an open source suite that was developed by Arizona State University's Mars Space Flight Facility. The beauty of JMARS is that the tool transforms all datasets behind the scenes

  14. Visualizing large geospatial datasets with KML Regions

    NASA Astrophysics Data System (ADS)

    Ilyushchenko, S.; Wheeler, D.; Ummel, K.; Hammer, D.; Kraft, R.

    2008-12-01

    Regions are a powerful KML feature that helps users view very large datasets in Google Earth without sacrificing performance. Data is loaded and drawn only when it falls within the user's view and occupies a certain portion of the screen. Using Regions, it is possible to supply separate levels of detail for the data, so that fine details are loaded only when the data fills a portion of the screen that is large enough for the details to be visible. It becomes easy to create compelling interactive presentations of geospatial datasets that are meaningful at both large and small scales. We present two example datasets: worldwide past, present and future carbon dioxide emissions by power plants provided by Carbon Monitoring for Action, Center for Global Development (http://carma.org), as well as 2007 US bridge safety ratings from the Federal Highway Administration (http://www.fhwa.dot.gov/BRIDGE/nbi/ascii.cfm).
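
    A minimal sketch of the mechanism described above: each Placemark carries a Region whose LatLonAltBox and Lod tell Google Earth to load and draw the feature only when it occupies enough screen pixels. The record names and coordinates are invented; this is generic KML, not the datasets referenced in the abstract.

    ```python
    from xml.sax.saxutils import escape

    def placemark_with_region(name, lat, lon, half_size_deg=0.5, min_pixels=128):
        """Return a KML Placemark whose Region defers loading/drawing until the
        feature covers at least `min_pixels` of the screen."""
        return f"""
        <Placemark>
          <name>{escape(name)}</name>
          <Region>
            <LatLonAltBox>
              <north>{lat + half_size_deg}</north><south>{lat - half_size_deg}</south>
              <east>{lon + half_size_deg}</east><west>{lon - half_size_deg}</west>
            </LatLonAltBox>
            <Lod><minLodPixels>{min_pixels}</minLodPixels><maxLodPixels>-1</maxLodPixels></Lod>
          </Region>
          <Point><coordinates>{lon},{lat},0</coordinates></Point>
        </Placemark>"""

    # Invented example records: (name, latitude, longitude).
    records = [("plant-001", 40.0, -105.2), ("plant-002", 51.5, -0.1)]
    body = "".join(placemark_with_region(*r) for r in records)
    kml = ('<?xml version="1.0" encoding="UTF-8"?>\n'
           '<kml xmlns="http://www.opengis.net/kml/2.2"><Document>'
           f'{body}\n</Document></kml>')
    with open("regions_demo.kml", "w") as f:
        f.write(kml)
    ```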

  15. Quality Visualization of Microarray Datasets Using Circos

    PubMed Central

    Koch, Martin; Wiese, Michael

    2012-01-01

    Quality control and normalization is considered the most important step in the analysis of microarray data. At present there are various methods available for quality assessments of microarray datasets. However there seems to be no standard visualization routine, which also depicts individual microarray quality. Here we present a convenient method for visualizing the results of standard quality control tests using Circos plots. In these plots various quality measurements are drawn in a circular fashion, thus allowing for visualization of the quality and all outliers of each distinct array within a microarray dataset. The proposed method is intended for use with the Affymetrix Human Genome platform (i.e., GPL 96, GPL570 and GPL571). Circos quality measurement plots are a convenient way for the initial quality estimate of Affymetrix datasets that are stored in publicly available databases.

  16. Maximising the value of hospital administrative datasets.

    PubMed

    Nadathur, Shyamala G

    2010-05-01

    Mandatory and standardised administrative data collections are prevalent in the largely public-funded acute sector. In these systems the data collections are used for financial, performance monitoring and reporting purposes. This paper comments on the infrastructure and standards that have been established to support data collection activities, audit and feedback. The routine, local and research uses of these datasets are described using examples from Australian and international literature. The advantages of hospital administrative datasets and opportunities for improvement are discussed under the following headings: accessibility, standardisation, coverage, completeness, cost of obtaining clinical data, recorded Diagnostic Related Groups and International Classification of Diseases codes, linkage and connectivity. In an era of diminishing resources better utilisation of these datasets should be encouraged. Increased study and scrutiny will enhance transparency and help identify issues in the collections. As electronic information systems are increasingly embraced, administrative data collections need to be managed as valuable assets and powerful operational and patient management tools.

  17. Development of a video tampering dataset for forensic investigation.

    PubMed

    Ismael Al-Sanjary, Omar; Ahmed, Ahmed Abdullah; Sulong, Ghazali

    2016-09-01

    Forgery is an act of modifying a document, product, image or video, among other media. Video tampering detection research requires an inclusive database of video modifications. This paper aims to discuss a comprehensive proposal to create a dataset composed of modified videos for forensic investigation, in order to standardize existing techniques for detecting video tampering. The primary purpose of developing and designing this new video library is for usage in video forensics, which can be consciously associated with reliable verification using dynamic and static camera recognition. To the best of the authors' knowledge, there exists no similar library in the research community. Videos were sourced from YouTube and by exploring social networking sites extensively, observing posted videos and rating their feedback. The video tampering dataset (VTD) comprises a total of 33 videos, divided among three categories of video tampering: (1) copy-move, (2) splicing, and (3) swapping-frames. Compared to existing datasets, this is a higher number of tampered videos, with longer durations. The duration of every video is 16 s, with a 1280×720 resolution and a frame rate of 30 frames per second. Moreover, all videos possess the same formatting quality (720p(HD).avi). Both temporal and spatial video features were considered carefully during selection of the videos, and complete information is provided on the doctored regions in every modified video in the VTD dataset. The database has been made publicly available for research on splicing, swapping-frames, and copy-move tampering and, as such, on various video tampering detection issues with ground truth. The database has been utilised by many international researchers and groups of researchers.

  18. Development of a video tampering dataset for forensic investigation.

    PubMed

    Ismael Al-Sanjary, Omar; Ahmed, Ahmed Abdullah; Sulong, Ghazali

    2016-09-01

    Forgery is an act of modifying a document, product, image or video, among other media. Video tampering detection research requires an inclusive database of video modifications. This paper aims to discuss a comprehensive proposal to create a dataset composed of modified videos for forensic investigation, in order to standardize existing techniques for detecting video tampering. The primary purpose of developing and designing this new video library is for usage in video forensics, which can be consciously associated with reliable verification using dynamic and static camera recognition. To the best of the authors' knowledge, there exists no similar library in the research community. Videos were sourced from YouTube and by exploring social networking sites extensively, observing posted videos and rating their feedback. The video tampering dataset (VTD) comprises a total of 33 videos, divided among three categories of video tampering: (1) copy-move, (2) splicing, and (3) swapping-frames. Compared to existing datasets, this is a higher number of tampered videos, with longer durations. The duration of every video is 16 s, with a 1280×720 resolution and a frame rate of 30 frames per second. Moreover, all videos possess the same formatting quality (720p(HD).avi). Both temporal and spatial video features were considered carefully during selection of the videos, and complete information is provided on the doctored regions in every modified video in the VTD dataset. The database has been made publicly available for research on splicing, swapping-frames, and copy-move tampering and, as such, on various video tampering detection issues with ground truth. The database has been utilised by many international researchers and groups of researchers. PMID:27574113

  19. 77 FR 15052 - Dataset Workshop-U.S. Billion Dollar Disasters Dataset (1980-2011): Assessing Dataset Strengths...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-03-14

    ... restrictions preclude attendance for those who do not RSVP by the deadline. Space is also limited to the first... individual basis once participation has been confirmed through RSVP. Workshop Date and Time: The workshop... will be placed on dataset accuracy and time-dependent biases. Pathways to overcome accuracy and...

  20. A call for virtual experiments: accelerating the scientific process.

    PubMed

    Cooper, Jonathan; Vik, Jon Olav; Waltemath, Dagmar

    2015-01-01

    Experimentation is fundamental to the scientific method, whether for exploration, description or explanation. We argue that promoting the reuse of virtual experiments (the in silico analogues of wet-lab or field experiments) would vastly improve the usefulness and relevance of computational models, encouraging critical scrutiny of models and serving as a common language between modellers and experimentalists. We review the benefits of reusable virtual experiments: in specifying, assaying, and comparing the behavioural repertoires of models; as prerequisites for reproducible research; to guide model reuse and composition; and for quality assurance in the translational application of models. A key step towards achieving this is that models and experimental protocols should be represented separately, but annotated so as to facilitate the linking of models to experiments and data. Lastly, we outline how the rigorous, streamlined confrontation between experimental datasets and candidate models would enable a "continuous integration" of biological knowledge, transforming our approach to systems biology.

  1. Comparison of recent SnIa datasets

    SciTech Connect

    Sanchez, J.C. Bueno; Perivolaropoulos, L.; Nesseris, S. E-mail: nesseris@nbi.ku.dk

    2009-11-01

    We rank the six latest Type Ia supernova (SnIa) datasets (Constitution (C), Union (U), ESSENCE (Davis) (E), Gold06 (G), SNLS 1yr (S) and SDSS-II (D)) in the context of the Chevallier-Polarski-Linder (CPL) parametrization w(a) = w0 + w1(1−a), according to their Figure of Merit (FoM), their consistency with the cosmological constant (ΛCDM), their consistency with standard rulers (Cosmic Microwave Background (CMB) and Baryon Acoustic Oscillations (BAO)), and their mutual consistency. We find a significant improvement of the FoM (defined as the inverse area of the 95.4% parameter contour) with the number of SnIa of these datasets ((C) highest FoM, (U), (G), (D), (E), (S) lowest FoM). Standard rulers (CMB+BAO) have a better FoM by about a factor of 3, compared to the highest-FoM SnIa dataset (C). We also find that the ranking sequence based on consistency with ΛCDM is identical with the corresponding ranking based on consistency with standard rulers ((S) most consistent, (D), (C), (E), (U), (G) least consistent). The ranking sequence of the datasets, however, changes when we consider the consistency with an expansion history corresponding to evolving dark energy (w0, w1) = (−1.4, 2) crossing the phantom divide line w = −1 (it is practically reversed to (G), (U), (E), (S), (D), (C)). The SALT2 and MLCS2k2 fitters are also compared, and some peculiar features of the SDSS-II dataset when standardized with the MLCS2k2 fitter are pointed out. Finally, we construct a statistic to estimate the internal consistency of a collection of SnIa datasets. We find that even though there is good consistency among most samples taken from the above datasets, this consistency decreases significantly when the Gold06 (G) dataset is included in the sample.
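
    For reference, the CPL equation of state quoted above, together with the flat-universe expansion history it implies (a standard textbook result, not taken from this abstract), can be written as:

    ```latex
    % CPL parametrization of the dark-energy equation of state
    w(a) = w_0 + w_1 \, (1 - a), \qquad a = \frac{1}{1 + z}

    % Corresponding expansion history for a flat universe
    \frac{H^2(z)}{H_0^2} = \Omega_m (1+z)^3
      + (1 - \Omega_m)\,(1+z)^{3(1 + w_0 + w_1)}
        \exp\!\left(-\frac{3\, w_1\, z}{1 + z}\right)
    ```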

  2. Ontology for Transforming Geo-Spatial Data for Discovery and Integration of Scientific Data

    NASA Astrophysics Data System (ADS)

    Nguyen, L.; Chee, T.; Minnis, P.

    2013-12-01

    Discovery of and access to geo-spatial scientific data across heterogeneous repositories and multi-discipline datasets can present challenges for scientists. We propose to build a workflow for transforming geo-spatial datasets into a semantic environment by using relationships to describe each resource with the OWL Web Ontology Language, RDF, and a proposed geo-spatial vocabulary. We will present methods for transforming traditional scientific datasets, the use of a semantic repository, and querying with SPARQL to integrate and access datasets. This unique repository will enable discovery of scientific data by geospatial bound or other criteria.
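
    A small self-contained sketch of the proposed pattern, using rdflib: a dataset resource is described with an invented geo-spatial vocabulary (the ex: property names are assumptions, not the authors' proposed vocabulary), and a SPARQL query retrieves datasets whose bounding box intersects a requested latitude band.

    ```python
    from rdflib import Graph, Namespace, Literal, URIRef
    from rdflib.namespace import RDF, XSD

    # Invented geo-spatial vocabulary; the ex: property names are illustrative only.
    EX = Namespace("http://example.org/geo#")

    g = Graph()
    d = URIRef("http://example.org/dataset/cloud-product-2013")
    g.add((d, RDF.type, EX.Dataset))
    g.add((d, EX.minLatitude, Literal(30.0, datatype=XSD.double)))
    g.add((d, EX.maxLatitude, Literal(45.0, datatype=XSD.double)))
    g.add((d, EX.minLongitude, Literal(-110.0, datatype=XSD.double)))
    g.add((d, EX.maxLongitude, Literal(-90.0, datatype=XSD.double)))

    # Discover datasets whose bounding box intersects a requested latitude band.
    query = """
    PREFIX ex: <http://example.org/geo#>
    SELECT ?dataset WHERE {
      ?dataset a ex:Dataset ;
               ex:minLatitude ?minLat ;
               ex:maxLatitude ?maxLat .
      FILTER (?maxLat >= 35.0 && ?minLat <= 40.0)
    }
    """
    for row in g.query(query):
        print(row.dataset)
    ```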

  3. The Interplay between Scientific Overlap and Cooperation and the Resulting Gain in Co-Authorship Interactions

    PubMed Central

    Mayrose, Itay; Freilich, Shiri

    2015-01-01

    Considering the importance of scientific interactions, understanding the principles that govern fruitful scientific research is crucial to policy makers and scientists alike. The outcome of an interaction is to a large extent dependent on the balancing of contradicting motivations accompanying the establishment of collaborations. Here, we assembled a dataset of nearly 20,000 publications authored by researchers affiliated with ten top universities. Based on this data collection, we estimated the extent of different interaction types between pairwise combinations of researchers. We explored the interplay between the overlap in scientific interests and the tendency to collaborate, and associated these estimates with measures of scientific quality and social accessibility aiming at studying the typical resulting gain of different interaction patterns. Our results show that scientists tend to collaborate more often with colleagues with whom they share moderate to high levels of mutual interests and knowledge while cooperative tendency declines at higher levels of research-interest overlap, suggesting fierce competition, and at the lower levels, suggesting communication gaps. Whereas the relative number of alliances dramatically differs across a gradient of research overlap, the scientific impact of the resulting articles remains similar. When considering social accessibility, we find that though collaborations between remote researchers are relatively rare, their quality is significantly higher than studies produced by close-circle scientists. Since current collaboration patterns do not necessarily overlap with gaining optimal scientific quality, these findings should encourage scientists to reconsider current collaboration strategies. PMID:26372643

  4. Parallel acoustic wave propagation and generation of a seismic dataset

    SciTech Connect

    Oldfield, R.; Dyke, J.V.; Semeraro, B.D.

    1995-12-01

    The ultimate goal of this work is to construct a large seismic dataset that will be used to calibrate industrial seismic analysis codes. Seismic analysis is used in oil and gas exploration to deduce subterranean geological formations based on the reflection of acoustic waves from a source to an array of receivers placed on or near the surface. This work deals with the generation of a test set of acoustic data based on a known, representative geological formation. Industrial users of the data will calibrate their codes by comparing their predicted geology to the known geology used to generate the test data. This is a cooperative effort involving Los Alamos, Sandia, Oak Ridge and Lawrence Livermore national labs, as well as the Institut Francais du Petrole and the Society of Exploration Geophysicists.

  5. Exploitation of a large COSMO-SkyMed interferometric dataset

    NASA Astrophysics Data System (ADS)

    Nutricato, Raffaele; Nitti, Davide O.; Bovenga, Fabio; Refice, Alberto; Chiaradia, Maria T.

    2014-10-01

    In this work we explored a dataset of more than 100 images acquired by the COSMO-SkyMed (CSK) constellation over the Port-au-Prince (Haiti) metropolitan and surrounding areas, which were severely hit by the January 12th, 2010 earthquake. The images were acquired along an ascending pass by all four sensors of the constellation with a mean rate of 1 acquisition/week. This consistent CSK dataset was fully exploited using the Persistent Scatterer Interferometry algorithm SPINUA with the aim of: i) providing a displacement map of the area; ii) assessing the use of CSK and PSI for ground elevation measurements; iii) exploring the CSK satellite orbital tube in terms of both precision and size. In particular, significant subsidence phenomena were detected affecting river deltas and coastal areas of the Port-au-Prince and Carrefour region, as well as very slow slope movements and local ground instabilities. Ground elevation was also measured on PS targets with a resolution of 3 m. The density of these measurable targets depends on the ground coverage, and reaches values higher than 4000 PS/km2 over urban areas, while it drops over vegetated areas or along slopes affected by layover and shadow. Height values were compared with LIDAR data at 1 m resolution collected soon after the 2010 earthquake. Furthermore, by using geocoding procedures and the precise LIDAR data as reference, the orbital errors affecting CSK records were investigated. The results are in line with other recent studies.

  6. Future weather dataset for fourteen UK sites.

    PubMed

    Liu, Chunde

    2016-09-01

    This future weather dataset is used for assessing the risk of overheating and thermal discomfort or heat stress in free-running buildings. The weather files are in the .epw format, which can be used in building simulation packages such as EnergyPlus, DesignBuilder, IES, etc. PMID:27570809

  7. Bacterial clinical infectious diseases ontology (BCIDO) dataset.

    PubMed

    Gordon, Claire L; Weng, Chunhua

    2016-09-01

    This article describes the Bacterial Clinical Infectious Diseases Ontology (BCIDO) dataset related to research published in http://dx.doi.org/10.1016/j.jbi.2015.07.014 [1], and contains the Protégé OWL files required to run BCIDO in the Protégé environment. BCIDO contains 1719 classes and 39 object properties. PMID:27508237

  8. Thesaurus Dataset of Educational Technology in Chinese

    ERIC Educational Resources Information Center

    Wu, Linjing; Liu, Qingtang; Zhao, Gang; Huang, Huan; Huang, Tao

    2015-01-01

    The thesaurus dataset of educational technology is a knowledge description of educational technology in Chinese. The aims of this thesaurus were to collect the subject terms in the domain of educational technology, facilitate the standardization of terminology and promote the communication between Chinese researchers and scholars from various…

  9. Efficiently Finding Individuals from Video Dataset

    NASA Astrophysics Data System (ADS)

    Hao, Pengyi; Kamata, Sei-Ichiro

    We are interested in retrieving video shots or videos containing particular people from a video dataset. Owing to the large variations in pose, illumination conditions, occlusions, hairstyles and facial expressions, face tracks have recently been researched in the fields of face recognition, face retrieval and name labeling from videos. However, when the number of face tracks is very large, conventional methods, which match all or some pairs of faces in face tracks, will not be effective. Therefore, in this paper, an efficient method for finding a given person from a video dataset is presented. In our study, in addition to performing research on face tracks in a single video, we also consider how to organize all the faces in the videos of a dataset and how to improve search quality in the query process. Different videos may include the same person; thus, the management of individuals across different videos is useful for their retrieval. The proposed method includes the following three points. (i) Face tracks of the same person appearing for a period in each video are first connected on the basis of scene information with a time constraint; then all the people in one video are organized by a proposed hierarchical clustering method. (ii) After obtaining the organizational structure of all the people in one video, the people are organized into an upper layer by affinity propagation. (iii) Finally, in the query process, a re-measuring method based on the index structure of the videos is performed to improve retrieval accuracy. We also build a video dataset that contains six types of videos: films, TV shows, educational videos, interviews, press conferences and domestic activities. The formation of face tracks in the six types of videos is first researched; then experiments are performed on this video dataset containing more than 1 million faces and 218,786 face tracks. The results show that the proposed approach has high search quality and a short search time.
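
    Step (ii) above groups per-video face-track clusters into individuals across videos using affinity propagation. The sketch below illustrates only that upper-layer grouping step with scikit-learn on invented descriptor vectors; it is not the authors' pipeline or their descriptors.

    ```python
    import numpy as np
    from sklearn.cluster import AffinityPropagation

    # Invented per-person face-track descriptors from different videos; in practice
    # these would be aggregated face features produced by the lower clustering layer.
    rng = np.random.default_rng(0)
    person_a = rng.normal(0.0, 0.05, size=(4, 16))    # same person seen in two videos
    person_b = rng.normal(1.0, 0.05, size=(3, 16))
    person_c = rng.normal(-1.0, 0.05, size=(2, 16))
    descriptors = np.vstack([person_a, person_b, person_c])

    # Upper-layer grouping across videos via affinity propagation.
    ap = AffinityPropagation(random_state=0).fit(descriptors)
    print("cluster labels:", ap.labels_)
    print("number of individuals found:", len(ap.cluster_centers_indices_))
    ```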

  10. Exploring Relationships in Big Data

    NASA Astrophysics Data System (ADS)

    Mahabal, A.; Djorgovski, S. G.; Crichton, D. J.; Cinquini, L.; Kelly, S.; Colbert, M. A.; Kincaid, H.

    2015-12-01

    Big Data are characterized by several different 'V's: Volume, Veracity, Volatility, Value, and so on. For many datasets, Volume inflated by redundant features often makes the data noisier and more difficult to extract Value from. This is especially true when comparing or combining different datasets whose metadata are diverse. We have been exploring ways to exploit such datasets through a variety of statistical machinery and visualization. We show how we have applied this methodology to time series from large astronomical sky surveys; this was done in the Virtual Observatory framework. More recently we have been doing similar work in a completely different domain, viz. biology/cancer. The methodology reuse involves application to diverse datasets gathered through the various centers associated with the Early Detection Research Network (EDRN) for cancer, an initiative of the National Cancer Institute (NCI). Application to Geo datasets is a natural extension.

  11. Agile data management for curation of genomes to watershed datasets

    NASA Astrophysics Data System (ADS)

    Varadharajan, C.; Agarwal, D.; Faybishenko, B.; Versteeg, R.

    2015-12-01

    A software platform is being developed for data management and assimilation [DMA] as part of the U.S. Department of Energy's Genomes to Watershed Sustainable Systems Science Focus Area 2.0. The DMA components and capabilities are driven by the project science priorities and the development is based on agile development techniques. The goal of the DMA software platform is to enable users to integrate and synthesize diverse and disparate field, laboratory, and simulation datasets, including geological, geochemical, geophysical, microbiological, hydrological, and meteorological data across a range of spatial and temporal scales. The DMA objectives are (a) developing an integrated interface to the datasets, (b) storing field monitoring data, laboratory analytical results of water and sediments samples collected into a database, (c) providing automated QA/QC analysis of data and (d) working with data providers to modify high-priority field and laboratory data collection and reporting procedures as needed. The first three objectives are driven by user needs, while the last objective is driven by data management needs. The project needs and priorities are reassessed regularly with the users. After each user session we identify development priorities to match the identified user priorities. For instance, data QA/QC and collection activities have focused on the data and products needed for on-going scientific analyses (e.g. water level and geochemistry). We have also developed, tested and released a broker and portal that integrates diverse datasets from two different databases used for curation of project data. The development of the user interface was based on a user-centered design process involving several user interviews and constant interaction with data providers. The initial version focuses on the most requested feature - i.e. finding the data needed for analyses through an intuitive interface. Once the data is found, the user can immediately plot and download data

  12. Method of generating features optimal to a dataset and classifier

    DOEpatents

    Bruillard, Paul J.; Gosink, Luke J.; Jarman, Kenneth D.

    2016-10-18

    A method of generating features optimal to a particular dataset and classifier is disclosed. A dataset of messages is inputted and a classifier is selected. An algebra of features is encoded. Computable features that are capable of describing the dataset from the algebra of features are selected. Irredundant features that are optimal for the classifier and the dataset are selected.
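
    As a generic stand-in for the selection of irredundant, classifier-optimal features described in the patent abstract (not the patented feature-algebra method itself), the sketch below greedily adds features only while they improve cross-validated classifier accuracy; candidates that add no measurable gain are treated as redundant. The dataset and classifier are illustrative.

    ```python
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    # Illustrative dataset and classifier; the greedy criterion below is a generic
    # stand-in for the patent's feature-algebra machinery, not its actual method.
    X, y = make_classification(n_samples=400, n_features=20, n_informative=5,
                               n_redundant=5, random_state=0)
    clf = LogisticRegression(max_iter=1000)

    selected, remaining, best_score = [], list(range(X.shape[1])), 0.0
    while remaining:
        # Score each candidate feature when added to the current selection.
        scores = {f: cross_val_score(clf, X[:, selected + [f]], y, cv=5).mean()
                  for f in remaining}
        f_best, s_best = max(scores.items(), key=lambda kv: kv[1])
        if s_best <= best_score + 1e-3:   # no real gain -> candidate is redundant
            break
        selected.append(f_best)
        remaining.remove(f_best)
        best_score = s_best

    print(f"selected features: {selected}  cv accuracy: {best_score:.3f}")
    ```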

  13. Provenance of Earth Science Datasets - How Deep Should One Go?

    NASA Astrophysics Data System (ADS)

    Ramapriyan, H.; Manipon, G. J. M.; Aulenbach, S.; Duggan, B.; Goldstein, J.; Hua, H.; Tan, D.; Tilmes, C.; Wilson, B. D.; Wolfe, R.; Zednik, S.

    2015-12-01

    For credibility of scientific research, transparency and reproducibility are essential. This fundamental tenet has been emphasized for centuries, and has been receiving increased attention in recent years. The Office of Management and Budget (2002) addressed reproducibility and other aspects of quality and utility of information from federal agencies. Specific guidelines from NASA (2002) are derived from the above. According to these guidelines, "NASA requires a higher standard of quality for information that is considered influential. Influential scientific, financial, or statistical information is defined as NASA information that, when disseminated, will have or does have clear and substantial impact on important public policies or important private sector decisions." For information to be compliant, "the information must be transparent and reproducible to the greatest possible extent." We present how the principles of transparency and reproducibility have been applied to NASA data supporting the Third National Climate Assessment (NCA3). The depth of trace needed of provenance of data used to derive conclusions in NCA3 depends on how the data were used (e.g., qualitatively or quantitatively). Given that the information is diligently maintained in the agency archives, it is possible to trace from a figure in the publication through the datasets, specific files, algorithm versions, instruments used for data collection, and satellites, as well as the individuals and organizations involved in each step. Such trace back permits transparency and reproducibility.

  14. Scientific Software

    NASA Technical Reports Server (NTRS)

    1995-01-01

    The Interactive Data Language (IDL), developed by Research Systems, Inc., is a tool for scientists to investigate their data without having to write a custom program for each study. IDL is based on the Mariner Mars spectral Editor (MMED) developed for studies from NASA's Mars spacecraft flights. The company has also developed the Environment for Visualizing Images (ENVI), an image processing system written in IDL for easily analyzing remotely sensed data. The Visible Human CD, another Research Systems product, is the first complete digital reference of photographic images for exploring human anatomy.

  15. Global Precipitation Measurement: Methods, Datasets and Applications

    NASA Technical Reports Server (NTRS)

    Tapiador, Francisco; Turk, Francis J.; Petersen, Walt; Hou, Arthur Y.; Garcia-Ortega, Eduardo; Machado, Luiz, A. T.; Angelis, Carlos F.; Salio, Paola; Kidd, Chris; Huffman, George J.; De Castro, Manuel

    2011-01-01

    This paper reviews the many aspects of precipitation measurement that are relevant to providing an accurate global assessment of this important environmental parameter. Methods discussed include ground data, satellite estimates and numerical models. First, the methods for measuring, estimating, and modeling precipitation are discussed. Then, the most relevant datasets gathering precipitation information from those three sources are presented. The third part of the paper illustrates a number of the many applications of those measurements and databases. The aim of the paper is to organize the many links and feedbacks between precipitation measurement, estimation and modeling, indicating the uncertainties and limitations of each technique in order to identify areas requiring further attention, and to show the limits within which datasets can be used.

  16. Data Assimilation and Model Evaluation Experiment Datasets.

    NASA Astrophysics Data System (ADS)

    Lai, Chung-Chieng A.; Qian, Wen; Glenn, Scott M.

    1994-05-01

    The Institute for Naval Oceanography, in cooperation with Naval Research Laboratories and universities, executed the Data Assimilation and Model Evaluation Experiment (DAMEE) for the Gulf Stream region during fiscal years 1991-1993. Enormous effort went into the preparation of several high-quality and consistent datasets for model initialization and verification. This paper describes the preparation process, the temporal and spatial scopes, the contents, and the structure of these datasets. The goal of DAMEE and the need for data in the four phases of the experiment are briefly stated. The preparation of DAMEE datasets consisted of a series of processes: 1) collection of observational data; 2) analysis and interpretation; 3) interpolation using the Optimum Thermal Interpolation System package; 4) quality control and re-analysis; and 5) data archiving and software documentation. The data products from these processes included a time series of 3D fields of temperature and salinity, 2D fields of surface dynamic height and mixed-layer depth, analysis of the Gulf Stream and rings system, and bathythermograph profiles. To date, these are the most detailed and high-quality data for mesoscale ocean modeling, data assimilation, and forecasting research. Feedback from ocean modeling groups who tested these data was incorporated into their refinement. Suggested DAMEE data usages include 1) ocean modeling and data assimilation studies, 2) diagnosis and theoretical studies, and 3) comparisons with locally detailed observations.

  17. Projecting global datasets to achieve equal areas

    USGS Publications Warehouse

    Usery, E.L.; Finn, M.P.; Cox, J.D.; Beard, T.; Ruhl, S.; Bearden, M.

    2003-01-01

    Scientists routinely accomplish global modeling in the raster domain, but recent research has indicated that the transformation of large areas through map projection equations leads to errors. This research attempts to gauge the extent of map projection and resampling effects on the tabulation of categorical areas by comparing the results of three datasets for seven common projections. The datasets, Global Land Cover, Holdridge Life Zones, and Global Vegetation, were compiled at resolutions of 30 arc-second, 1/2 degree, and 1 degree, respectively. These datasets were projected globally from spherical coordinates to plane representations. Results indicate significant problems in the implementation of global projection transformations in commercial software, as well as differences in areal accuracy across projections. The level of raster resolution directly affects the accuracy of areal tabulations, with higher resolution yielding higher accuracy. If the raster resolution is high enough for individual pixels to approximate points, the areal error tends to zero. The 30-arc-second cells appear to approximate this condition.
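
    The resolution effect noted above can be seen directly from the exact area of a latitude-longitude cell on a sphere, A = R² Δλ (sin φ₂ − sin φ₁); the short sketch below is a generic illustration (not code from the study) of how strongly cell area varies with latitude at the three raster resolutions mentioned.

      # Exact area of a latitude-longitude cell on a sphere:
      #   A = R^2 * (lon2 - lon1) * (sin(lat2) - sin(lat1))
      # Generic illustration of why coarse cells distort categorical area totals.
      import math

      R = 6371.0088  # mean Earth radius in km

      def cell_area_km2(lat_deg, dlat_deg, dlon_deg):
          phi1 = math.radians(lat_deg)
          phi2 = math.radians(lat_deg + dlat_deg)
          dlon = math.radians(dlon_deg)
          return R * R * dlon * (math.sin(phi2) - math.sin(phi1))

      for res in (1.0, 0.5, 30 / 3600):            # 1 degree, 1/2 degree, 30 arc-second
          a_eq = cell_area_km2(0.0, res, res)       # cell touching the equator
          a_60 = cell_area_km2(60.0, res, res)      # cell starting at 60 degrees north
          print(f"{res:>10.6f} deg  equator {a_eq:12.4f} km^2  60N {a_60:12.4f} km^2")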

  18. Spatially-based quality control for daily precipitation datasets

    NASA Astrophysics Data System (ADS)

    Serrano-Notivoli, Roberto; de Luis, Martín; Beguería, Santiago; Ángel Saz, Miguel

    2016-04-01

    There are many reasons why erroneous data can appear in original precipitation datasets, but their common characteristic is that they do not correspond to the natural variability of the climate variable. For this reason, a comprehensive analysis of the data of each station on each day is necessary to ensure that the final dataset will be consistent and reliable. Most quality control techniques applied to daily precipitation are based on comparing each observed value with the rest of the values in the same series or in reference series built from the nearest stations. These methods are inherited from monthly precipitation studies, but at the daily scale the variability is larger and the methods have to differ. A character shared by all of these approaches is that reconstructions are based on the best-correlated reference series, which could be a biased decision because, for example, an extreme precipitation event occurring on one day at more than one station could be flagged as erroneous. We propose a method based on the specific conditions of the day and location to determine the reliability of each observation. This method preserves the local variance of the variable and the independence of the time structure. To do so, individually for each daily value, we first compute the probability of precipitation occurrence through a multivariate logistic regression using the 10 nearest observations in binomial mode (0 = dry; 1 = wet), which produces a binomial prediction (PB) between 0 and 1. Then, we compute a prediction of precipitation magnitude (PM) with the raw data of the same 10 nearest observations. Through these predictions we examine the original data for each day and location using five criteria: 1) suspect data; 2) suspect zero; 3) suspect outlier; 4) suspect wet; and 5) suspect dry. Tests over different datasets show that the flagged data depend mainly on the number of available data and how homogeneously they are distributed.
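
    A simplified sketch of the neighbour-based predictions described above is given below: for one target station, a logistic regression on the wet/dry state of its 10 nearest neighbours yields the occurrence probability PB, and a regression on their raw amounts yields the magnitude prediction PM. The synthetic data, the regression models, and the thresholds in the example flag are illustrative stand-ins, not the authors' exact procedure.

      import numpy as np
      from sklearn.linear_model import LogisticRegression, LinearRegression

      rng = np.random.default_rng(0)
      neigh = rng.gamma(shape=0.4, scale=8.0, size=(365, 10))    # 10 nearest stations, mm/day
      neigh[rng.random(neigh.shape) < 0.6] = 0.0                 # make ~60% of station-days dry
      target = np.where((neigh > 0).sum(axis=1) >= 5,            # synthetic target series:
                        neigh.mean(axis=1), 0.0)                 # wet only if >= 5 neighbours are wet

      occ = LogisticRegression(max_iter=1000).fit((neigh > 0).astype(int), (target > 0).astype(int))
      pb = occ.predict_proba((neigh > 0).astype(int))[:, 1]      # PB: probability of a wet day

      wet = target > 0
      pm = np.clip(LinearRegression().fit(neigh[wet], target[wet]).predict(neigh), 0, None)

      suspect_zero = (target == 0) & (pb > 0.99) & (pm > 5.0)    # illustrative "suspect zero" flag
      print(int(suspect_zero.sum()), "candidate suspect zeros")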

  19. Dataset of milk whey proteins of three indigenous Greek sheep breeds.

    PubMed

    Anagnostopoulos, Athanasios K; Katsafadou, Angeliki I; Pierros, Vasileios; Kontopodis, Evangelos; Fthenakis, George C; Arsenos, George; Karkabounas, Spyridon Ch; Tzora, Athina; Skoufos, Ioannis; Tsangaris, George Th

    2016-09-01

    The importance and unique biological traits, as well as the growing financial value, of milk from small Greek ruminants are continuously attracting interest from both the scientific community and industry. In this regard, the construction of a reference dataset of the milk of the Greek sheep breeds is of great interest. In order to obtain such a dataset we employed cutting-edge proteomics methodologies to investigate and characterize the proteome of milk from the three indigenous Greek sheep breeds Mpoutsko, Karagouniko and Chios. In total, more than 1300 protein groups were identified in milk whey from these breeds, constituting the most detailed proteome dataset of this precious biological material reported to date. The present results are further discussed in the research paper "Milk of Greek sheep and goat breeds; characterization by means of proteomics" (Anagnostopoulos et al. 2016) [1]. PMID:27508236

  1. Quantifying uncertainty in observational rainfall datasets

    NASA Astrophysics Data System (ADS)

    Lennard, Chris; Dosio, Alessandro; Nikulin, Grigory; Pinto, Izidine; Seid, Hussen

    2015-04-01

    The CO-ordinated Regional Downscaling Experiment (CORDEX) has to date seen the publication of at least ten journal papers that examine the African domain during 2012 and 2013. Five of these papers consider Africa generally (Nikulin et al. 2012, Kim et al. 2013, Hernandes-Dias et al. 2013, Laprise et al. 2013, Panitz et al. 2013) and five have regional foci: Tramblay et al. (2013) on Northern Africa, Mariotti et al. (2014) and Gbobaniyi et al. (2013) on West Africa, Endris et al. (2013) on East Africa, and Kalagnoumou et al. (2013) on southern Africa. A further three papers known to the authors are under review. These papers all use observed rainfall and/or temperature data to evaluate or validate the regional model output and often proceed to assess projected changes in these variables due to climate change in the context of these observations. The most popular reference rainfall data used are the CRU, GPCP, GPCC, TRMM and UDEL datasets. However, as Kalagnoumou et al. (2013) point out, there are many other rainfall datasets available for consideration, for example CMORPH, FEWS, TAMSAT & RIANNAA, TAMORA and the WATCH & WATCH-DEI data. They, with others (Nikulin et al. 2012, Sylla et al. 2012), show that the observed datasets can have a very wide spread at a particular space-time coordinate. As more ground-, space- and reanalysis-based rainfall products become available, all of which use different methods to produce precipitation data, the selection of reference data is becoming an important factor in model evaluation. A number of factors can contribute to uncertainty in the reliability and validity of the datasets, such as radiance conversion algorithms, the quantity and quality of available station data, interpolation techniques, and the blending methods used to combine satellite and gauge based products. However, to date no comprehensive study has been performed to evaluate the uncertainty in these observational datasets. We assess 18 gridded

  2. Multiresolution comparison of precipitation datasets for large-scale models

    NASA Astrophysics Data System (ADS)

    Chun, K. P.; Sapriza Azuri, G.; Davison, B.; DeBeer, C. M.; Wheater, H. S.

    2014-12-01

    Gridded precipitation datasets are crucial for driving large-scale models used in weather forecasting and climate research. However, the quality of precipitation products is usually validated individually. Comparing gridded precipitation products against each other, along with ground observations, provides another avenue for investigating how precipitation uncertainty affects the performance of large-scale models. In this study, using data from a set of precipitation gauges over British Columbia and Alberta, we evaluate several widely used North American gridded products, including the Canadian Gridded Precipitation Anomalies (CANGRD), the National Center for Environmental Prediction (NCEP) reanalysis, the Water and Global Change (WATCH) project, the thin plate spline smoothing algorithm (ANUSPLIN) and the Canadian Precipitation Analysis (CaPA). Based on verification criteria for various temporal and spatial scales, the results provide an assessment of possible applications for the various precipitation datasets. For long-term climate variation studies (~100 years), CANGRD, NCEP, WATCH and ANUSPLIN have different comparative advantages in terms of their resolution and accuracy. For synoptic and mesoscale precipitation patterns, CaPA provides appealing spatial coherence. In addition to the product comparison, various downscaling methods are also surveyed to explore new verification and bias-reduction methods for improving gridded precipitation outputs for large-scale models.
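
    A generic sketch of such a multiresolution comparison follows: a gauge series and a co-located gridded-product series (both synthetic placeholders here) are aggregated to daily, monthly, and annual totals and scored with simple bias and correlation criteria.

      import numpy as np
      import pandas as pd

      idx = pd.date_range("1980-01-01", "1999-12-31", freq="D")
      rng = np.random.default_rng(1)
      gauge = pd.Series(rng.gamma(0.5, 6.0, len(idx)), index=idx)                    # synthetic gauge, mm/day
      product = gauge * rng.uniform(0.6, 1.4, len(idx)) + rng.normal(0, 0.5, len(idx))  # noisy gridded product

      for label, rule in [("daily", "D"), ("monthly", "MS"), ("annual", "YS")]:
          g = gauge.resample(rule).sum()
          p = product.resample(rule).sum()
          bias = (p - g).mean()
          corr = g.corr(p)
          print(f"{label:8s} bias {bias:8.2f} mm   r {corr:5.2f}")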

  3. Collaboration tools and techniques for large model datasets

    USGS Publications Warehouse

    Signell, R.P.; Carniel, S.; Chiggiato, J.; Janekovic, I.; Pullen, J.; Sherwood, C.R.

    2008-01-01

    In MREA and many other marine applications, it is common to have multiple models running with different grids, run by different institutions. Techniques and tools are described for low-bandwidth delivery of data from large multidimensional datasets, such as those from meteorological and oceanographic models, directly into generic analysis and visualization tools. Output is stored using the NetCDF CF Metadata Conventions, and then delivered to collaborators over the web via OPeNDAP. OPeNDAP datasets served by different institutions are then organized via THREDDS catalogs. Tools and procedures are then used which enable scientists to explore data on the original model grids using tools they are familiar with. It is also low-bandwidth, enabling users to extract just the data they require, an important feature for access from ship or remote areas. The entire implementation is simple enough to be handled by modelers working with their webmasters - no advanced programming support is necessary. © 2007 Elsevier B.V. All rights reserved.
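
    The access pattern described above can be sketched as follows with xarray, which opens an OPeNDAP endpoint lazily so that only the requested subset crosses the network; the URL, variable name, and coordinate names are placeholders, not a real service.

      # Sketch of low-bandwidth access to a CF-compliant model dataset over OPeNDAP.
      # The URL and variable/coordinate names below are hypothetical placeholders.
      import xarray as xr

      url = "http://example.org/thredds/dodsC/ocean_model/forecast.nc"   # hypothetical endpoint
      ds = xr.open_dataset(url)                       # lazy: no data transferred yet

      sst = (ds["temp"]                               # one variable ...
               .isel(depth=0)                         # ... surface level ...
               .sel(lat=slice(40, 46), lon=slice(12, 20),        # ... a regional box ...
                    time=slice("2008-01-01", "2008-01-07")))     # ... and one week
      sst.to_netcdf("subset.nc")                      # only this subset crosses the network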

  4. Comparing methods of analysing datasets with small clusters: case studies using four paediatric datasets.

    PubMed

    Marston, Louise; Peacock, Janet L; Yu, Keming; Brocklehurst, Peter; Calvert, Sandra A; Greenough, Anne; Marlow, Neil

    2009-07-01

    Studies of prematurely born infants contain a relatively large percentage of multiple births, so the resulting data have a hierarchical structure with small clusters of size 1, 2 or 3. Ignoring the clustering may lead to incorrect inferences. The aim of this study was to compare statistical methods which can be used to analyse such data: generalised estimating equations, multilevel models, multiple linear regression and logistic regression. Four datasets which differed in total size and in percentage of multiple births (n = 254, multiple 18%; n = 176, multiple 9%; n = 10 098, multiple 3%; n = 1585, multiple 8%) were analysed. With the continuous outcome, two-level models produced similar results in the larger dataset, while generalised least squares multilevel modelling (ML GLS 'xtreg' in Stata) and maximum likelihood multilevel modelling (ML MLE 'xtmixed' in Stata) produced divergent estimates using the smaller dataset. For the dichotomous outcome, most methods, except generalised least squares multilevel modelling (ML GH 'xtlogit' in Stata), gave similar odds ratios and 95% confidence intervals within datasets. For the continuous outcome, our results suggest using multilevel modelling. We conclude that generalised least squares multilevel modelling (ML GLS 'xtreg' in Stata) and maximum likelihood multilevel modelling (ML MLE 'xtmixed' in Stata) should be used with caution when the dataset is small. Where the outcome is dichotomous and there is a relatively large percentage of non-independent data, it is recommended that these are accounted for in analyses using logistic regression with adjusted standard errors or multilevel modelling. If, however, the dataset has a small percentage of clusters greater than size 1 (e.g. a population dataset of children in which there are few multiples), there appears to be less need to adjust for clustering.
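
    A minimal sketch of the two recommended approaches is given below, using a toy dataframe with hypothetical column names: a random-intercept multilevel model for the continuous outcome, and ordinary logistic regression with cluster-adjusted standard errors for the dichotomous outcome. It illustrates the model types discussed, not the study's actual analyses.

      import numpy as np
      import pandas as pd
      import statsmodels.formula.api as smf

      rng = np.random.default_rng(2)
      n = 300
      df = pd.DataFrame({
          "family": np.repeat(np.arange(250), rng.integers(1, 3, 250))[:n],   # clusters of size 1-2
          "gestation": rng.normal(30, 2.5, n),
      })
      df["birthweight"] = 150 * df["gestation"] - 3000 + rng.normal(0, 300, n)
      df["disability"] = (rng.random(n) < 1 / (1 + np.exp(0.5 * (df["gestation"] - 28)))).astype(int)

      # Continuous outcome: multilevel (random-intercept) model, as recommended above.
      ml = smf.mixedlm("birthweight ~ gestation", df, groups=np.asarray(df["family"])).fit()
      print(ml.params)

      # Dichotomous outcome: ordinary logistic regression with cluster-adjusted standard errors.
      logit = smf.logit("disability ~ gestation", df).fit(
          cov_type="cluster", cov_kwds={"groups": np.asarray(df["family"])})
      print(logit.summary().tables[1])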

  5. Recovering complete and draft population genomes from metagenome datasets

    DOE PAGESBeta

    Sangwan, Naseer; Xia, Fangfang; Gilbert, Jack A.

    2016-03-08

    Assembly of metagenomic sequence data into microbial genomes is of fundamental value to improving our understanding of microbial ecology and metabolism by elucidating the functional potential of hard-to-culture microorganisms. Here, we provide a synthesis of available methods to bin metagenomic contigs into species-level groups and highlight how genetic diversity, sequencing depth, and coverage influence binning success. Despite the computational cost on application to deeply sequenced complex metagenomes (e.g., soil), covarying patterns of contig coverage across multiple datasets significantly improve the binning process. We also discuss and compare current genome validation methods and reveal how these methods tackle the problem of chimeric genome bins, i.e., sequences from multiple species. Finally, we explore how population genome assembly can be used to uncover biogeographic trends and to characterize the effect of in situ functional constraints on genome-wide evolution.

  6. Computational models in the age of large datasets.

    PubMed

    O'Leary, Timothy; Sutton, Alexander C; Marder, Eve

    2015-06-01

    Technological advances in experimental neuroscience are generating vast quantities of data, from the dynamics of single molecules to the structure and activity patterns of large networks of neurons. How do we make sense of these voluminous, complex, disparate and often incomplete data? How do we find general principles in the morass of detail? Computational models are invaluable and necessary in this task and yield insights that cannot otherwise be obtained. However, building and interpreting good computational models is a substantial challenge, especially so in the era of large datasets. Fitting detailed models to experimental data is difficult and often requires onerous assumptions, while more loosely constrained conceptual models that explore broad hypotheses and principles can yield more useful insights.

  7. Recovering complete and draft population genomes from metagenome datasets.

    PubMed

    Sangwan, Naseer; Xia, Fangfang; Gilbert, Jack A

    2016-03-08

    Assembly of metagenomic sequence data into microbial genomes is of fundamental value to improving our understanding of microbial ecology and metabolism by elucidating the functional potential of hard-to-culture microorganisms. Here, we provide a synthesis of available methods to bin metagenomic contigs into species-level groups and highlight how genetic diversity, sequencing depth, and coverage influence binning success. Despite the computational cost on application to deeply sequenced complex metagenomes (e.g., soil), covarying patterns of contig coverage across multiple datasets significantly improve the binning process. We also discuss and compare current genome validation methods and reveal how these methods tackle the problem of chimeric genome bins, i.e., sequences from multiple species. Finally, we explore how population genome assembly can be used to uncover biogeographic trends and to characterize the effect of in situ functional constraints on genome-wide evolution.

  8. Accuracy assessment of the U.S. Geological Survey National Elevation Dataset, and comparison with other large-area elevation datasets: SRTM and ASTER

    USGS Publications Warehouse

    Gesch, Dean B.; Oimoen, Michael J.; Evans, Gayla A.

    2014-01-01

    The National Elevation Dataset (NED) is the primary elevation data product produced and distributed by the U.S. Geological Survey. The NED provides seamless raster elevation data of the conterminous United States, Alaska, Hawaii, U.S. island territories, Mexico, and Canada. The NED is derived from diverse source datasets that are processed to a specification with consistent resolutions, coordinate system, elevation units, and horizontal and vertical datums. The NED serves as the elevation layer of The National Map, and it provides basic elevation information for earth science studies and mapping applications in the United States and most of North America. An important part of supporting scientific and operational use of the NED is provision of thorough dataset documentation including data quality and accuracy metrics. The focus of this report is on the vertical accuracy of the NED and on comparison of the NED with other similar large-area elevation datasets, namely data from the Shuttle Radar Topography Mission (SRTM) and the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER).

  9. National Hydropower Plant Dataset, Version 1

    DOE Data Explorer

    Samu, Nicole; Kao, Shih-Chieh; O'Connor, Patrick

    2016-09-30

    The 2016 National Hydropower Plant Dataset, Version 1, includes geospatial point-level locations and key characteristics of online existing hydropower plants in the United States that are currently licensed, exempt, or awaiting relicensing. These data are a subset extracted from NHAAP’s Existing Hydropower Assets (EHA) internal database, which is a cornerstone of NHAAP’s EHA effort that has supported multiple U.S. hydropower R&D research initiatives related to market acceleration, environmental impact reduction, technology-to-market activities, and climate change impact assessment. For more information on NHAAP’s EHA effort, please visit the project web page at: http://nhaap.ornl.gov/existing-hydropower.

  10. Multivariate Data EXplorer (MDX)

    2012-08-01

    The MDX toolkit facilitates exploratory data analysis and visualization of multivariate datasets. MDX provides an interactive graphical user interface to load, explore, and modify multivariate datasets stored in tabular form. MDX uses an extended version of the parallel coordinates plot and scatterplots to represent the data. The user can perform rapid visual queries using mouse gestures in the visualization panels to select rows or columns of interest. The visualization panel provides coordinated multiple views whereby selections made in one plot are propagated to the other plots. Users can also export selected data or reconfigure the visualization panel to explore relationships between columns and rows in the data.

  11. Efficiently Exploring Multilevel Data with Recursive Partitioning

    ERIC Educational Resources Information Center

    Martin, Daniel P.; von Oertzen, Timo; Rimm-Kaufman, Sara E.

    2015-01-01

    There is an increasing number of datasets with many participants, variables, or both, in education and other fields that often deal with large, multilevel data structures. Once initial confirmatory hypotheses are exhausted, it can be difficult to determine how best to explore the dataset to discover hidden relationships that could help to inform…

  12. Revisiting the concept of a symmetric index of agreement for continuous datasets

    PubMed Central

    Duveiller, Gregory; Fasbender, Dominique; Meroni, Michele

    2016-01-01

    Quantifying how close two datasets are to each other is a common and necessary undertaking in scientific research. The Pearson product-moment correlation coefficient r is a widely used measure of the degree of linear dependence between two data series, but it gives no indication of how similar the values of these series are in magnitude. Although a number of indexes have been proposed to compare a dataset with a reference, only a few are available to compare two datasets of equivalent (or unknown) reliability. After a brief review and numerical tests of the metrics designed to accomplish this task, this paper shows how an index proposed by Mielke can, with a minor modification, satisfy a series of desired properties, namely to be adimensional, bounded, symmetric, easy to compute and directly interpretable with respect to r. We thus show that this index can be considered a natural extension to r that downregulates the value of r according to the bias between the analysed datasets. The paper also proposes an effective way to disentangle the systematic and unsystematic contributions to this agreement based on eigendecompositions. The use and value of the index is also illustrated on synthetic and real datasets. PMID:26762810
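
    In its basic (unmodified Mielke) form, the index discussed above can be written as λ = 2·s_xy / (s_x² + s_y² + (x̄ − ȳ)²), i.e. r downweighted by any difference in mean or variance between the two series; the sketch below implements this basic form on synthetic data and omits the paper's modification for negative correlations.

      # Basic Mielke-type agreement index: equals r when the two series share mean and
      # variance, and is pulled towards 0 by any systematic bias between them.
      import numpy as np

      def agreement_lambda(x, y):
          x, y = np.asarray(x, float), np.asarray(y, float)
          cov = ((x - x.mean()) * (y - y.mean())).mean()
          return 2 * cov / (x.var() + y.var() + (x.mean() - y.mean()) ** 2)

      rng = np.random.default_rng(3)
      x = rng.normal(10, 2, 500)
      y_unbiased = x + rng.normal(0, 1, 500)
      y_biased = y_unbiased + 5              # same correlation, shifted by a constant bias

      print("r        :", round(np.corrcoef(x, y_biased)[0, 1], 3))
      print("lambda   :", round(agreement_lambda(x, y_biased), 3))
      print("no bias  :", round(agreement_lambda(x, y_unbiased), 3))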

  13. CROSS DRIVE: A Collaborative and Distributed Virtual Environment for Exploitation of Atmospherical and Geological Datasets of Mars

    NASA Astrophysics Data System (ADS)

    Cencetti, Michele

    2016-07-01

    European space exploration missions have produced huge datasets of potentially immense value for research as well as for planning and operating future missions. For instance, Mars exploration programmes comprise a series of missions, with launches ranging from the past to beyond the present, that are anticipated to produce exceptional volumes of data, providing prospects for research breakthroughs and for advancing further activities in space. The collected data include a variety of information, such as imagery, topography, and atmospheric and geochemical datasets, which has resulted in, and still demands, databases, versatile visualisation tools and data reduction methods. Such a rate of valuable data acquisition requires scientists, researchers and computer scientists to coordinate storage, processing and the relevant tools to enable efficient data analysis. However, the current position is that the expert teams from various disciplines, the databases and the tools are fragmented, leaving little scope for unlocking their value through collaborative activities. The benefits of collaborative virtual environments have been demonstrated in various industrial fields, allowing real-time multi-user collaborative work among people from different disciplines. Exploiting the benefits of advanced immersive virtual environments (IVE) has been recognized as an important interaction paradigm to facilitate future space exploration. The current work presents preliminary results from the CROSS DRIVE project. This research received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement n° 607177 and is aimed at the implementation of a distributed virtual workspace for collaborative scientific discovery, mission planning and operations. The purpose of the CROSS DRIVE project is to lay the foundations of collaborative European workspaces for space science. It will demonstrate the feasibility and

  14. Evaluation of anomalies in GLDAS-1996 dataset.

    PubMed

    Zhou, Xinyao; Zhang, Yongqiang; Yang, Yonghui; Yang, Yanmin; Han, Shumin

    2013-01-01

    Global Land Data Assimilation System (GLDAS) data are widely used for land-surface flux simulations. Therefore, the simulation accuracy using the GLDAS dataset is largely contingent upon the accuracy of the GLDAS dataset itself. It is found that GLDAS land-surface model simulated runoff exhibits strong anomalies for 1996. These anomalies are investigated by evaluating four GLDAS meteorological forcing data (precipitation, air temperature, downward shortwave radiation and downward longwave radiation) in six large basins across the world (Danube, Mississippi, Yangtze, Congo, Amazon and Murray-Darling basins). Precipitation data from the Global Precipitation Climatology Centre (GPCC) are also compared with the GLDAS forcing precipitation data. Large errors and a lack of monthly variability in the GLDAS-1996 precipitation data are the main sources of the anomalies in the simulated runoff. The impact of the precipitation data on simulated runoff for 1996 is investigated with the Community Atmosphere Biosphere Land Exchange (CABLE) land-surface model in the Yangtze basin, an area for which high-quality local precipitation data are obtained from the China Meteorological Administration (CMA). The CABLE model is driven by GLDAS daily precipitation data and CMA daily precipitation data, respectively. The simulated daily and monthly runoffs obtained from the CMA data are noticeably better than those obtained from the GLDAS data, suggesting that the GLDAS-1996 precipitation data are not reliable enough for land-surface flux simulations. PMID:23579825

  15. Lifting Object Detection Datasets into 3D.

    PubMed

    Carreira, Joao; Vicente, Sara; Agapito, Lourdes; Batista, Jorge

    2016-07-01

    While data has certainly taken the center stage in computer vision in recent years, it can still be difficult to obtain in certain scenarios. In particular, acquiring ground truth 3D shapes of objects pictured in 2D images remains a challenging feat, and this has hampered progress in recognition-based object reconstruction from a single image. Here we propose to bypass previous solutions such as 3D scanning or manual design, which scale poorly, and instead populate object category detection datasets semi-automatically with dense, per-object 3D reconstructions, bootstrapped from: (i) class labels, (ii) ground truth figure-ground segmentations and (iii) a small set of keypoint annotations. Our proposed algorithm first estimates camera viewpoint using rigid structure-from-motion and then reconstructs object shapes by optimizing over visual hull proposals guided by loose within-class shape similarity assumptions. The visual hull sampling process attempts to intersect an object's projection cone with the cones of minimal subsets of other similar objects among those pictured from certain vantage points. We show that our method is able to produce convincing per-object 3D reconstructions and to accurately estimate camera viewpoints on one of the most challenging existing object-category detection datasets, PASCAL VOC. We hope that our results will re-stimulate interest in joint object recognition and 3D reconstruction from a single image. PMID:27295458

  16. Land cover trends dataset, 1973-2000

    USGS Publications Warehouse

    Soulard, Christopher E.; Acevedo, William; Auch, Roger F.; Sohl, Terry L.; Drummond, Mark A.; Sleeter, Benjamin M.; Sorenson, Daniel G.; Kambly, Steven; Wilson, Tamara S.; Taylor, Janis L.; Sayler, Kristi L.; Stier, Michael P.; Barnes, Christopher A.; Methven, Steven C.; Loveland, Thomas R.; Headley, Rachel; Brooks, Mark S.

    2014-01-01

    The U.S. Geological Survey Land Cover Trends Project is releasing a 1973–2000 time-series land-use/land-cover dataset for the conterminous United States. The dataset contains 5 dates of land-use/land-cover data for 2,688 sample blocks randomly selected within 84 ecological regions. The nominal dates of the land-use/land-cover maps are 1973, 1980, 1986, 1992, and 2000. The land-use/land-cover maps were classified manually from Landsat Multispectral Scanner, Thematic Mapper, and Enhanced Thematic Mapper Plus imagery using a modified Anderson Level I classification scheme. The resulting land-use/land-cover data has a 60-meter resolution and the projection is set to Albers Equal-Area Conic, North American Datum of 1983. The files are labeled using a standard file naming convention that contains the number of the ecoregion, sample block, and Landsat year. The downloadable files are organized by ecoregion, and are available in the ERDAS IMAGINE™ (.img) raster file format.

  18. Answering the right question - integration of InSAR with other datasets

    NASA Astrophysics Data System (ADS)

    Holley, Rachel; McCormack, Harry; Burren, Richard

    2014-05-01

    The capabilities of satellite Interferometric Synthetic Aperture Radar (InSAR) are well known and utilized across a wide range of academic and commercial applications. However, there is a tendency, particularly in commercial applications, for users to ask 'What can we study with InSAR?'. When establishing a new technique this approach is important, but InSAR has been possible for 20 years now and, even accounting for new and innovative algorithms, this ground has been thoroughly explored. Too many studies conclude 'We show the ground is moving here, by this much', and mention the wider context as an afterthought. The focus needs to shift towards first asking the right questions - in fields as diverse as hazard awareness, resource optimization, financial considerations and pure scientific enquiry - and then working out how to achieve the best possible answers. Depending on the question, InSAR (and ground deformation more generally) may provide a large or small contribution to the overall solution, and there are usually benefits to integrating a number of techniques to capitalize on their complementary capabilities and provide the most useful measurements. However, there is still a gap between measurements and answers, and unlocking the value of the data relies heavily on appropriate visualization, integrated analysis, communication between technique and application experts, and appropriate use of modelling. We present a number of application examples and demonstrate how their usefulness can be transformed by moving from a focus on data to a focus on answers - integrating complementary geodetic, geophysical and geological datasets and geophysical modelling with appropriate visualization to enable comprehensive, solution-focused interpretation. We will also discuss how forthcoming developments are likely to further advance realisation of the full potential that satellite InSAR holds.

  19. Toyz: A framework for scientific analysis of large datasets and astronomical images

    NASA Astrophysics Data System (ADS)

    Moolekamp, F.; Mamajek, E.

    2015-11-01

    As the size of images and data products derived from astronomical data continues to increase, new tools are needed to visualize and interact with those data in a meaningful way. Motivated by our own astronomical images taken with the Dark Energy Camera (DECam), we present Toyz, an open source Python package for viewing and analyzing images and data stored on a remote server or cluster. Users connect to the Toyz web application via a web browser, making it a convenient tool for students to visualize and interact with astronomical data without having to install any software on their local machines. In addition, it provides researchers with an easy-to-use tool that allows them to browse the files on a server, quickly view very large images (>2 Gb) taken with DECam and other cameras with a large FOV, and create their own visualization tools that can be added as extensions to the default Toyz framework.

  20. SAGE Research Methods Datasets: A Data Analysis Educational Tool.

    PubMed

    Vardell, Emily

    2016-01-01

    SAGE Research Methods Datasets (SRMD) is an educational tool designed to offer users the opportunity to obtain hands-on experience with data analysis. Users can search for and browse authentic datasets by method, discipline, and data type. Each of the datasets is supplemented with educational material on the research method and clear guidelines for how to approach data analysis. PMID:27391182

  2. Dataset-Driven Research to Support Learning and Knowledge Analytics

    ERIC Educational Resources Information Center

    Verbert, Katrien; Manouselis, Nikos; Drachsler, Hendrik; Duval, Erik

    2012-01-01

    In various research areas, the availability of open datasets is considered as key for research and application purposes. These datasets are used as benchmarks to develop new algorithms and to compare them to other algorithms in given settings. Finding such available datasets for experimentation can be a challenging task in technology enhanced…

  3. Scientific Inquiry: A Model for Online Searching.

    ERIC Educational Resources Information Center

    Harter, Stephen P.

    1984-01-01

    Explores scientific inquiry as philosophical and behavioral model for online search specialist and information retrieval process. Nature of scientific research is described and online analogs to research concepts of variable, hypothesis formulation and testing, operational definition, validity, reliability, assumption, and cyclical nature of…

  4. National hydrography dataset--linear referencing

    USGS Publications Warehouse

    Simley, Jeffrey; Doumbouya, Ariel

    2012-01-01

    Geospatial data normally have a certain set of standard attributes, such as an identification number, the type of feature, and name of the feature. These standard attributes are typically embedded into the default attribute table, which is directly linked to the geospatial features. However, it is impractical to embed too much information because it can create a complex, inflexible, and hard to maintain geospatial dataset. Many scientists prefer to create a modular, or relational, data design where the information about the features is stored and maintained separately, then linked to the geospatial data. For example, information about the water chemistry of a lake can be maintained in a separate file and linked to the lake. A Geographic Information System (GIS) can then relate the water chemistry to the lake and analyze it as one piece of information. For example, the GIS can select all lakes more than 50 acres, with turbidity greater than 1.5 milligrams per liter.
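
    The relational design described above amounts to a join between the geospatial features and a separately maintained attribute table; the toy sketch below (with hypothetical field names and values) reproduces the lake example, selecting lakes larger than 50 acres with turbidity greater than 1.5 milligrams per liter.

      import pandas as pd

      lakes = pd.DataFrame({                       # stand-in for the geospatial feature table
          "lake_id": [101, 102, 103],
          "name": ["Clear Lake", "Mud Lake", "Long Lake"],
          "area_acres": [120.0, 35.5, 64.0],
      })
      chemistry = pd.DataFrame({                   # separately maintained water-chemistry table
          "lake_id": [101, 102, 103],
          "turbidity_mg_per_l": [0.8, 2.4, 1.9],
      })

      joined = lakes.merge(chemistry, on="lake_id")
      selected = joined[(joined["area_acres"] > 50) & (joined["turbidity_mg_per_l"] > 1.5)]
      print(selected[["name", "area_acres", "turbidity_mg_per_l"]])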

  5. AMADA-Analysis of multidimensional astronomical datasets

    NASA Astrophysics Data System (ADS)

    de Souza, R. S.; Ciardi, B.

    2015-09-01

    We present AMADA, an interactive web application to analyze multidimensional datasets. The user uploads a simple ASCII file and AMADA performs a number of exploratory analyses together with contemporary visualization diagnostics. The package performs a hierarchical clustering in the parameter space, and the user can choose among linear, monotonic or non-linear correlation analysis. AMADA provides a number of clustering visualization diagnostics such as heatmaps, dendrograms, chord diagrams, and graphs. In addition, AMADA has the option to run a standard or robust principal components analysis, displaying the results as polar bar plots. The code is written in R and the web interface was created using the SHINY framework. The AMADA source code is freely available at https://goo.gl/KeSPue, and the shiny-app at http://goo.gl/UTnU7I.

  6. VAST Contest Dataset Use in Education

    SciTech Connect

    Whiting, Mark A.; North, Chris; Endert, Alexander; Scholtz, Jean; Haack, Jereme N.; Varley, Caroline F.; Thomas, James J.

    2009-12-13

    The IEEE Visual Analytics Science and Technology (VAST) Symposium has held a contest each year since its inception in 2006. These events are designed to provide visual analytics researchers and developers with analytic challenges similar to those encountered by professional information analysts. The VAST contest has had an extended life outside of the symposium, however, as materials are being used in universities and other educational settings, either to help teachers of visual analytics-related classes or for student projects. We describe how we develop VAST contest datasets, a process that results in products that can be used in different settings, and review some specific examples of the adoption of the VAST contest materials in the classroom. The examples are drawn from graduate and undergraduate courses at Virginia Tech and from the Visual Analytics "Summer Camp" run by the National Visualization and Analytics Center in 2008. We finish with a brief discussion of evaluation metrics for education.

  7. LIMS Version 6 Level 3 Dataset

    NASA Technical Reports Server (NTRS)

    Remsberg, Ellis E.; Lingenfelser, Gretchen

    2010-01-01

    This report describes the Limb Infrared Monitor of the Stratosphere (LIMS) Version 6 (V6) Level 3 data products and the assumptions used for their generation. A sequential estimation algorithm was used to obtain daily, zonal Fourier coefficients of the several parameters of the LIMS dataset for 216 days of 1978-79. The coefficients are available at up to 28 pressure levels and at every two degrees of latitude from 64 S to 84 N and at the synoptic time of 12 UT. Example plots were prepared and archived from the data at 10 hPa of January 1, 1979, to illustrate the overall coherence of the features obtained with the LIMS-retrieved parameters.
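
    A dataset of this form is typically used by reconstructing the field around a latitude circle from its zonal Fourier coefficients, f(λ) = a₀ + Σ_k [a_k cos(kλ) + b_k sin(kλ)]; the coefficient values in the sketch below are made up for illustration, whereas LIMS V6 provides them per day, pressure level, and latitude.

      import numpy as np

      a0 = 215.0                      # zonal mean (e.g. temperature in K); illustrative value
      a = [2.5, 1.0]                  # cosine coefficients for zonal waves 1..K
      b = [-1.5, 0.5]                 # sine coefficients for zonal waves 1..K

      lon = np.radians(np.arange(0, 360, 2.0))
      field = a0 + sum(ak * np.cos((k + 1) * lon) + bk * np.sin((k + 1) * lon)
                       for k, (ak, bk) in enumerate(zip(a, b)))
      print(field.min(), field.max())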

  8. Visualization of cosmological particle-based datasets.

    PubMed

    Navratil, Paul; Johnson, Jarrett; Bromm, Volker

    2007-01-01

    We describe our visualization process for a particle-based simulation of the formation of the first stars and their impact on cosmic history. The dataset consists of several hundred time-steps of point simulation data, with each time-step containing approximately two million point particles. For each time-step, we interpolate the point data onto a regular grid using a method taken from the radiance estimate of photon mapping. We import the resulting regular grid representation into ParaView, with which we extract isosurfaces across multiple variables. Our images provide insights into the evolution of the early universe, tracing the cosmic transition from an initially homogeneous state to one of increasing complexity. Specifically, our visualizations capture the build-up of regions of ionized gas around the first stars, their evolution, and their complex interactions with the surrounding matter. These observations will guide the upcoming James Webb Space Telescope, the key astronomy mission of the next decade.
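
    A rough sketch of a photon-mapping-style gridding step follows: the density at each grid point is estimated from the k nearest particles, divided by the volume of the sphere that encloses them, the analogue of the photon-map radiance estimate. This is a generic illustration with synthetic particles, not the production pipeline used for the paper.

      import numpy as np
      from scipy.spatial import cKDTree

      rng = np.random.default_rng(4)
      pts = rng.random((50_000, 3))                   # synthetic particle positions in a unit box
      mass = np.full(len(pts), 1.0 / len(pts))        # equal-mass particles, total mass 1

      n = 32
      axis = (np.arange(n) + 0.5) / n
      gx, gy, gz = np.meshgrid(axis, axis, axis, indexing="ij")
      grid = np.column_stack([gx.ravel(), gy.ravel(), gz.ravel()])

      k = 32
      dist, idx = cKDTree(pts).query(grid, k=k)       # k nearest particles per grid point
      r = dist[:, -1]                                 # radius of the enclosing sphere
      density = mass[idx].sum(axis=1) / (4.0 / 3.0 * np.pi * r ** 3)
      density = density.reshape(n, n, n)
      print(density.mean(), density.max())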

  9. Chemical datuments as scientific enablers.

    PubMed

    Rzepa, Henry S

    2013-01-01

    This article is an attempt to construct a chemical datument as a means of presenting insights into chemical phenomena in a scientific journal. An exploration of the interactions present in a small fragment of duplex Z-DNA and the nature of the catalytic centre of a carbon-dioxide/alkene epoxide alternating co-polymerisation is presented in this datument, with examples of the use of three software tools, one based on Java, the other two using Javascript and HTML5 technologies. The implications for the evolution of scientific journals are discussed. PMID:23343381

  10. The Effective Use of Scientific and Technical Information in Industrial and Non-Profit Settings: Explorations through Experimental Interventions in On-Going R & D Activities. Progress Report No. 2.

    ERIC Educational Resources Information Center

    Shapero, Albert

    This is a second report of a study of the use of scientific and technical information in industrial and nonprofit settings. It focuses on mapping the information-communication behavior of the engineering division of the Southwest Research Institute. Data include questionnaires, library records, travel records, telephone records, and contractual…

  11. Creating a global sub-daily precipitation dataset

    NASA Astrophysics Data System (ADS)

    Lewis, Elizabeth; Blenkinsop, Stephen; Fowler, Hayley

    2016-04-01

    Extremes of precipitation can cause flooding and droughts, which can lead to substantial damage to infrastructure and ecosystems and can result in loss of life. It is still uncertain how hydrological extremes will change with global warming, as we do not fully understand the processes that cause extreme precipitation under current climate variability. The INTENSE project is using a novel and fully-integrated data-modelling approach to provide a step-change in our understanding of the nature and drivers of global precipitation extremes and change on societally relevant timescales, leading to improved high-resolution climate model representation of extreme rainfall processes. The INTENSE project is run in conjunction with the World Climate Research Programme (WCRP) Grand Challenge on 'Understanding and Predicting Weather and Climate Extremes' and the Global Water and Energy Exchanges Project (GEWEX) science questions. The first step towards achieving this is to construct a new global sub-daily precipitation dataset. Data collection is ongoing and already covers North America, Europe, Asia and Australasia. Comprehensive, open source quality control software is being developed to set a new standard for verifying sub-daily precipitation data, and a set of global hydroclimatic indices will be produced based upon stakeholder recommendations. This will provide a unique global data resource on sub-daily precipitation whose derived indices, e.g. monthly/annual maxima, will be freely available to the wider scientific community.
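
    The kind of index derivation mentioned above can be sketched as follows: from an hourly gauge series (synthetic here), annual maxima of 1-hour and 24-hour accumulations are computed, two of the simpler indices such a dataset would support.

      import numpy as np
      import pandas as pd

      idx = pd.date_range("2000-01-01", "2009-12-31 23:00", freq="h")
      rng = np.random.default_rng(5)
      hourly = pd.Series(np.where(rng.random(len(idx)) < 0.05,          # ~5% of hours are wet
                                  rng.gamma(0.6, 4.0, len(idx)), 0.0),
                         index=idx)

      rx1h = hourly.resample("YS").max()                          # annual maximum 1-hour total
      rx24h = hourly.rolling(24).sum().resample("YS").max()       # annual maximum 24-hour total
      print(pd.DataFrame({"Rx1hour_mm": rx1h, "Rx24hour_mm": rx24h}).round(1))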

  12. The Centennial Trends Greater Horn of Africa precipitation dataset.

    PubMed

    Funk, Chris; Nicholson, Sharon E; Landsfeld, Martin; Klotter, Douglas; Peterson, Pete; Harrison, Laura

    2015-01-01

    East Africa is a drought prone, food and water insecure region with a highly variable climate. This complexity makes rainfall estimation challenging, and this challenge is compounded by low rain gauge densities and inhomogeneous monitoring networks. The dearth of observations is particularly problematic over the past decade, since the number of records in globally accessible archives has fallen precipitously. This lack of data coincides with an increasing scientific and humanitarian need to place recent seasonal and multi-annual East African precipitation extremes in a deep historic context. To serve this need, scientists from the UC Santa Barbara Climate Hazards Group and Florida State University have pooled their station archives and expertise to produce a high quality gridded 'Centennial Trends' precipitation dataset. Additional observations have been acquired from the national meteorological agencies and augmented with data provided by other universities. Extensive quality control of the data was carried out and seasonal anomalies interpolated using kriging. This paper documents the CenTrends methodology and data.

  13. The Centennial Trends Greater Horn of Africa precipitation dataset

    PubMed Central

    Funk, Chris; Nicholson, Sharon E.; Landsfeld, Martin; Klotter, Douglas; Peterson, Pete; Harrison, Laura

    2015-01-01

    East Africa is a drought prone, food and water insecure region with a highly variable climate. This complexity makes rainfall estimation challenging, and this challenge is compounded by low rain gauge densities and inhomogeneous monitoring networks. The dearth of observations is particularly problematic over the past decade, since the number of records in globally accessible archives has fallen precipitously. This lack of data coincides with an increasing scientific and humanitarian need to place recent seasonal and multi-annual East African precipitation extremes in a deep historic context. To serve this need, scientists from the UC Santa Barbara Climate Hazards Group and Florida State University have pooled their station archives and expertise to produce a high quality gridded ‘Centennial Trends’ precipitation dataset. Additional observations have been acquired from the national meteorological agencies and augmented with data provided by other universities. Extensive quality control of the data was carried out and seasonal anomalies interpolated using kriging. This paper documents the CenTrends methodology and data. PMID:26451250

  15. Privacy-preserving GWAS analysis on federated genomic datasets

    PubMed Central

    2015-01-01

    Background The biomedical community benefits from the increasing availability of genomic data to support meaningful scientific research, e.g., Genome-Wide Association Studies (GWAS). However, high quality GWAS usually requires a large number of samples, which can grow beyond the capability of a single institution. Federated genomic data analysis holds the promise of enabling cross-institution collaboration for effective GWAS, but it raises concerns about patient privacy and medical information confidentiality (as data are being exchanged across institutional boundaries), which becomes an inhibiting factor for practical use. Methods We present a privacy-preserving GWAS framework on federated genomic datasets. Our method is to layer the GWAS computations on top of secure multi-party computation (MPC) systems. This approach allows two parties in a distributed system to mutually perform secure GWAS computations without exposing their private data. Results We demonstrate our technique by implementing a framework for minor allele frequency counting and χ2 statistics calculation, one of the typical computations used in GWAS. For efficient prototyping, we use a state-of-the-art MPC framework, i.e., Portable Circuit Format (PCF) [1]. Our experimental results show promise in realizing both efficient and secure cross-institution GWAS computations. PMID:26733045
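
    The statistic being computed can be sketched in the clear as follows (in the paper it is evaluated inside a secure multi-party computation so that neither site reveals its genotypes): an allelic χ² test built from case and control allele counts at one SNP, using synthetic genotype data.

      import numpy as np
      from scipy.stats import chi2

      def allelic_chi2(case_geno, control_geno):
          """Genotypes coded 0/1/2 = copies of the minor allele."""
          case_geno = np.asarray(case_geno)
          control_geno = np.asarray(control_geno)
          # 2x2 table of allele counts: rows = case/control, columns = minor/major allele
          table = np.array([
              [case_geno.sum(),    2 * len(case_geno) - case_geno.sum()],
              [control_geno.sum(), 2 * len(control_geno) - control_geno.sum()],
          ], dtype=float)
          expected = table.sum(axis=1, keepdims=True) * table.sum(axis=0) / table.sum()
          stat = ((table - expected) ** 2 / expected).sum()
          return stat, chi2.sf(stat, df=1)

      rng = np.random.default_rng(6)
      cases = rng.integers(0, 3, 500)        # synthetic case genotypes
      controls = rng.integers(0, 3, 500)     # synthetic control genotypes
      print(allelic_chi2(cases, controls))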

  16. Feature Selection in Scientific Applications

    SciTech Connect

    Cantu-Paz, E; Newsam, S; Kamath, C

    2004-02-27

    Numerous applications of data mining to scientific data involve the induction of a classification model. In many cases, the collection of data is not performed with this task in mind, and therefore, the data might contain irrelevant or redundant features that affect negatively the accuracy of the induction algorithms. The size and dimensionality of typical scientific data make it difficult to use any available domain information to identify features that discriminate between the classes of interest. Similarly, exploratory data analysis techniques have limitations on the amount and dimensionality of the data that can be effectively processed. In this paper, we describe applications of efficient feature selection methods to data sets from astronomy, plasma physics, and remote sensing. We use variations of recently proposed filter methods as well as traditional wrapper approaches where practical. We discuss the importance of these applications, the general challenges of feature selection in scientific datasets, the strategies for success that were common among our diverse applications, and the lessons learned in solving these problems.
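
    A generic sketch of a filter-style selection step of the kind surveyed in the paper is shown below: features are ranked by a univariate relevance score and only the top k are kept before training a classifier; the dataset is synthetic and the particular score and classifier are illustrative choices.

      from sklearn.datasets import make_classification
      from sklearn.feature_selection import SelectKBest, mutual_info_classif
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.model_selection import cross_val_score

      # Synthetic data with many irrelevant and redundant features.
      X, y = make_classification(n_samples=500, n_features=60, n_informative=8,
                                 n_redundant=12, random_state=0)

      selector = SelectKBest(mutual_info_classif, k=10).fit(X, y)   # filter step: rank and keep top 10
      X_sel = selector.transform(X)

      clf = RandomForestClassifier(n_estimators=200, random_state=0)
      print("all features :", cross_val_score(clf, X, y, cv=5).mean().round(3))
      print("top 10 only  :", cross_val_score(clf, X_sel, y, cv=5).mean().round(3))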

  17. Dishonesty in scientific research.

    PubMed

    Mazar, Nina; Ariely, Dan

    2015-11-01

    Fraudulent business practices, such as those leading to the Enron scandal and the conviction of Bernard Madoff, evoke a strong sense of public outrage. But fraudulent or dishonest actions are not exclusive to the realm of big corporations or to evil individuals without consciences. Dishonest actions are all too prevalent in everyone's daily lives, because people are constantly encountering situations in which they can gain advantages by cutting corners. Whether it's adding a few dollars in value to the stolen items reported on an insurance claim form or dropping outlier data points from a figure to make a paper sound more interesting, dishonesty is part of the human condition. Here, we explore how people rationalize dishonesty, the implications for scientific research, and what can be done to foster a culture of research integrity. PMID:26524587

  18. The LIGO Scientific Collaboration

    NASA Astrophysics Data System (ADS)

    Gonzalez, Gabriela

    2015-04-01

    The LIGO Scientific Collaboration (LSC) is a self-governing collaboration seeking to detect gravitational waves, use them to explore the fundamental physics of gravity, and develop gravitational wave observations as a tool of astronomical discovery. The LSC works toward this goal through research on, and development of techniques for, gravitational wave detection, and through the development, commissioning and exploitation of gravitational wave detectors. The LSC, founded in 1997, now has many hundreds of scientists in 16 countries, with a diverse range of skills and backgrounds. The LSC is preparing for a discovery era with Advanced LIGO detectors starting in the next few years; we will describe the features and challenges of the LSC organization in such an exciting time.

  19. Dissecting the Space-Time Structure of Tree-Ring Datasets Using the Partial Triadic Analysis

    PubMed Central

    Rossi, Jean-Pierre; Nardin, Maxime; Godefroid, Martin; Ruiz-Diaz, Manuela; Sergent, Anne-Sophie; Martinez-Meier, Alejandro; Pâques, Luc; Rozenberg, Philippe

    2014-01-01

    Tree-ring datasets are used in a variety of circumstances, including archeology, climatology, forest ecology, and wood technology. These data are based on microdensity profiles and consist of a set of tree-ring descriptors, such as ring width or early/latewood density, measured for a set of individual trees. Because successive rings correspond to successive years, the resulting dataset is a ring variables × trees × time datacube. Multivariate statistical analyses, such as principal component analysis, have been widely used for extracting worthwhile information from ring datasets, but they typically address two-way matrices, such as ring variables × trees or ring variables × time. Here, we explore the potential of the partial triadic analysis (PTA), a multivariate method dedicated to the analysis of three-way datasets, to apprehend the space-time structure of tree-ring datasets. We analyzed a set of 11 tree-ring descriptors measured in 149 georeferenced individuals of European larch (Larix decidua Miller) during the period of 1967–2007. The processing of densitometry profiles led to a set of ring descriptors for each tree and for each year from 1967–2007. The resulting three-way data table was subjected to two distinct analyses in order to explore i) the temporal evolution of spatial structures and ii) the spatial structure of temporal dynamics. We report the presence of a spatial structure common to the different years, highlighting the inter-individual variability of the ring descriptors at the stand scale. We found a temporal trajectory common to the trees that could be separated into a high and low frequency signal, corresponding to inter-annual variations possibly related to defoliation events and a long-term trend possibly related to climate change. We conclude that PTA is a powerful tool to unravel and hierarchize the different sources of variation within tree-ring datasets. PMID:25247299

  20. Dissecting the space-time structure of tree-ring datasets using the partial triadic analysis.

    PubMed

    Rossi, Jean-Pierre; Nardin, Maxime; Godefroid, Martin; Ruiz-Diaz, Manuela; Sergent, Anne-Sophie; Martinez-Meier, Alejandro; Pâques, Luc; Rozenberg, Philippe

    2014-01-01

    Tree-ring datasets are used in a variety of circumstances, including archeology, climatology, forest ecology, and wood technology. These data are based on microdensity profiles and consist of a set of tree-ring descriptors, such as ring width or early/latewood density, measured for a set of individual trees. Because successive rings correspond to successive years, the resulting dataset is a ring variables × trees × time datacube. Multivariate statistical analyses, such as principal component analysis, have been widely used for extracting worthwhile information from ring datasets, but they typically address two-way matrices, such as ring variables × trees or ring variables × time. Here, we explore the potential of the partial triadic analysis (PTA), a multivariate method dedicated to the analysis of three-way datasets, to apprehend the space-time structure of tree-ring datasets. We analyzed a set of 11 tree-ring descriptors measured in 149 georeferenced individuals of European larch (Larix decidua Miller) during the period of 1967-2007. The processing of densitometry profiles led to a set of ring descriptors for each tree and for each year from 1967-2007. The resulting three-way data table was subjected to two distinct analyses in order to explore i) the temporal evolution of spatial structures and ii) the spatial structure of temporal dynamics. We report the presence of a spatial structure common to the different years, highlighting the inter-individual variability of the ring descriptors at the stand scale. We found a temporal trajectory common to the trees that could be separated into a high and low frequency signal, corresponding to inter-annual variations possibly related to defoliation events and a long-term trend possibly related to climate change. We conclude that PTA is a powerful tool to unravel and hierarchize the different sources of variation within tree-ring datasets.

  1. Resolving evolutionary relationships in lichen-forming fungi using diverse phylogenomic datasets and analytical approaches

    PubMed Central

    Leavitt, Steven D.; Grewe, Felix; Widhelm, Todd; Muggia, Lucia; Wray, Brian; Lumbsch, H. Thorsten

    2016-01-01

    Evolutionary histories are now being inferred from unprecedented, genome-scale datasets for a broad range of organismal groups. While phylogenomic data has helped in resolving a number of difficult, long-standing questions, constructing appropriate datasets from genomes is not straightforward, particularly in non-model groups. Here we explore the utility of phylogenomic data to infer robust phylogenies for a lineage of closely related lichen-forming fungal species. We assembled multiple, distinct nuclear phylogenomic datasets, ranging from ca. 25 Kb to 16.8 Mb and inferred topologies using both concatenated gene tree approaches and species tree methods based on the multispecies coalescent model. In spite of evidence for rampant incongruence among individual loci, these genome-scale datasets provide a consistent, well-supported phylogenetic hypothesis using both concatenation and multispecies coalescent approaches (ASTRAL-II and SVDquartets). However, the popular full hierarchical coalescent approach implemented in *BEAST provided inconsistent inferences, both in terms of nodal support and topology, with smaller subsets of the phylogenomic data. While comparable, well-supported topologies can be accurately inferred with only a small fraction of the overall genome, consistent results across a variety of datasets and methodological approaches provide reassurance that phylogenomic data can effectively be used to provide robust phylogenies for closely related lichen-forming fungal lineages. PMID:26915968

  2. Seasonal evaluation of evapotranspiration fluxes from MODIS satellite and mesoscale model downscaled global reanalysis datasets

    NASA Astrophysics Data System (ADS)

    Srivastava, Prashant K.; Han, Dawei; Islam, Tanvir; Petropoulos, George P.; Gupta, Manika; Dai, Qiang

    2016-04-01

    Reference evapotranspiration (ETo) is an important variable in hydrological modeling, which is not always available, especially for ungauged catchments. Satellite data, such as those available from the MODerate Resolution Imaging Spectroradiometer (MODIS), and global datasets via the European Centre for Medium Range Weather Forecasts (ECMWF) reanalysis (ERA) interim and National Centers for Environmental Prediction (NCEP) reanalysis are important sources of information for ETo. This study explored the seasonal performances of MODIS (MOD16) and Weather Research and Forecasting (WRF) model downscaled global reanalysis datasets, such as ERA interim and NCEP-derived ETo, against ground-based datasets. Overall, on the basis of the statistical metrics computed, ETo derived from ERA interim and MODIS were more accurate in comparison to the estimates from NCEP for all the seasons. The pooled datasets also revealed a similar performance to the seasonal assessment with higher agreement for the ERA interim (r = 0.96, RMSE = 2.76 mm/8 days; bias = 0.24 mm/8 days), followed by MODIS (r = 0.95, RMSE = 7.66 mm/8 days; bias = -7.17 mm/8 days) and NCEP (r = 0.76, RMSE = 11.81 mm/8 days; bias = -10.20 mm/8 days). The only limitation with downscaling ERA interim reanalysis datasets using WRF is that it is time-consuming in contrast to the readily available MODIS operational product for use in mesoscale studies and practical applications.
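
    The agreement statistics quoted above (Pearson r, RMSE and bias) are straightforward to reproduce for any pair of ETo series; a small Python sketch, with purely illustrative numbers rather than the study's data:

        import numpy as np

        def agreement_stats(obs, est):
            """Pearson r, RMSE and bias of an estimated ETo series against
            ground-based values (e.g. mm per 8-day composite)."""
            obs, est = np.asarray(obs, float), np.asarray(est, float)
            r = np.corrcoef(obs, est)[0, 1]
            rmse = np.sqrt(np.mean((est - obs) ** 2))
            bias = np.mean(est - obs)
            return r, rmse, bias

        obs = np.array([20.1, 25.4, 30.2, 18.7, 22.3])   # ground-based ETo
        era = np.array([20.5, 24.9, 29.8, 19.2, 22.0])   # downscaled reanalysis ETo
        print(agreement_stats(obs, era))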

  3. Resolving evolutionary relationships in lichen-forming fungi using diverse phylogenomic datasets and analytical approaches.

    PubMed

    Leavitt, Steven D; Grewe, Felix; Widhelm, Todd; Muggia, Lucia; Wray, Brian; Lumbsch, H Thorsten

    2016-01-01

    Evolutionary histories are now being inferred from unprecedented, genome-scale datasets for a broad range of organismal groups. While phylogenomic data has helped in resolving a number of difficult, long-standing questions, constructing appropriate datasets from genomes is not straightforward, particularly in non-model groups. Here we explore the utility of phylogenomic data to infer robust phylogenies for a lineage of closely related lichen-forming fungal species. We assembled multiple, distinct nuclear phylogenomic datasets, ranging from ca. 25 Kb to 16.8 Mb and inferred topologies using both concatenated gene tree approaches and species tree methods based on the multispecies coalescent model. In spite of evidence for rampant incongruence among individual loci, these genome-scale datasets provide a consistent, well-supported phylogenetic hypothesis using both concatenation and multispecies coalescent approaches (ASTRAL-II and SVDquartets). However, the popular full hierarchical coalescent approach implemented in *BEAST provided inconsistent inferences, both in terms of nodal support and topology, with smaller subsets of the phylogenomic data. While comparable, well-supported topologies can be accurately inferred with only a small fraction of the overall genome, consistent results across a variety of datasets and methodological approaches provide reassurance that phylogenomic data can effectively be used to provide robust phylogenies for closely related lichen-forming fungal lineages. PMID:26915968

  4. Multi-Resolution Modeling of Large Scale Scientific Simulation Data

    SciTech Connect

    Baldwin, C; Abdulla, G; Critchlow, T

    2002-02-25

    Data produced by large-scale scientific simulations, experiments, and observations can easily reach terabytes in size. The ability to examine datasets of this magnitude, even in moderate detail, is problematic at best. Generally, this scientific data consists of multivariate field quantities with complex inter-variable correlations and spatial-temporal structure. To provide scientists and engineers with the ability to explore and analyze such data sets, we are using a twofold approach. First, we model the data with the objective of creating a compressed yet manageable representation. Second, with that compressed representation, we provide the user with the ability to query the resulting approximation to obtain approximate yet sufficient answers; a process called ad hoc querying. This paper is concerned with a wavelet modeling technique that seeks to capture the important physical characteristics of the target scientific data. Our approach is driven by the compression, which is necessary for viable throughput, along with the end-user requirements from the discovery process. Our work contrasts with existing research, which applies wavelets to range querying, change detection, and clustering problems by working directly with a decomposition of the data. The difference in these procedures is due primarily to the nature of the data and the requirements of the scientists and engineers. Our approach directly uses the wavelet coefficients of the data to compress as well as query. We will provide some background on the problem, describe how the wavelet decomposition is used to facilitate data compression and how queries are posed on the resulting compressed model. Results of this process will be shown for several problems of interest and we will end with some observations and conclusions about this research.
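
    To illustrate the idea of compressing with wavelet coefficients and answering ad hoc queries against the compressed model, here is a one-dimensional sketch using the PyWavelets package (the package, the Haar basis and the 5% coefficient-retention rule are assumptions for illustration, not the paper's actual codec):

        import numpy as np
        import pywt

        rng = np.random.default_rng(1)
        field = np.cumsum(rng.normal(size=4096))        # stand-in 1-D field variable

        # Multi-level wavelet decomposition
        coeffs = pywt.wavedec(field, 'haar', level=6)

        # Compress: keep only the largest 5% of coefficients by magnitude
        flat, slices = pywt.coeffs_to_array(coeffs)
        thresh = np.quantile(np.abs(flat), 0.95)
        compressed = pywt.array_to_coeffs(np.where(np.abs(flat) >= thresh, flat, 0.0),
                                          slices, output_format='wavedec')

        # Ad hoc query (mean over a range), answered from the compressed model
        approx = pywt.waverec(compressed, 'haar')
        print("exact mean :", field[1000:2000].mean())
        print("approx mean:", approx[1000:2000].mean())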

  5. Reconstructing thawing quintessence with multiple datasets

    NASA Astrophysics Data System (ADS)

    Lima, Nelson A.; Liddle, Andrew R.; Sahlén, Martin; Parkinson, David

    2016-03-01

    In this work we model the quintessence potential in a Taylor series expansion, up to second order, around the present-day value of the scalar field. The field is evolved in a thawing regime assuming zero initial velocity. We use the latest data from the Planck satellite, baryon acoustic oscillation observations from the Sloan Digital Sky Survey, and supernova luminosity distance information from Union2.1 to constrain our model's parameters, and also include perturbation growth data from the WiggleZ, BOSS, and 6dF surveys. The supernova data provide the strongest individual constraint on the potential parameters. We show that the growth data performance is competitive with the other datasets in constraining the dark energy parameters we introduce. We also conclude that the combined constraints we obtain for our model parameters, when compared to previous works of nearly a decade ago, have shown only modest improvement, even with new growth of structure data added to previously existing types of data.
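
    For readers unfamiliar with the construction, the second-order Taylor expansion of the quintessence potential about the present-day field value can be written as below (generic notation; the paper's own symbols may differ):

        V(\phi) \simeq V_0 + V_1\,(\phi - \phi_0) + \tfrac{1}{2}\,V_2\,(\phi - \phi_0)^2

    where \phi_0 is the present-day value of the scalar field and V_0, V_1, V_2 are the potential and its first and second derivatives evaluated there; these are the model parameters constrained by the data.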

  6. Utilizing Multiple Datasets for Snow Cover Mapping

    NASA Technical Reports Server (NTRS)

    Tait, Andrew B.; Hall, Dorothy K.; Foster, James L.; Armstrong, Richard L.

    1999-01-01

    Snow-cover maps generated from surface data are based on direct measurements; however, they are prone to interpolation errors where climate stations are sparsely distributed. Snow cover is clearly discernible using satellite-attained optical data because of the high albedo of snow, yet the surface is often obscured by cloud cover. Passive microwave (PM) data are unaffected by clouds; however, the snow-cover signature is significantly affected by melting snow, and the microwaves may be transparent to thin snow (less than 3 cm). Both optical and microwave sensors have problems discerning snow beneath forest canopies. This paper describes a method that combines ground and satellite data to produce a Multiple-Dataset Snow-Cover Product (MDSCP). Comparisons with current snow-cover products show that the MDSCP draws together the advantages of each of its component products while minimizing their potential errors. Improved estimates of the snow-covered area are derived through the addition of two snow-cover classes ("thin or patchy" and "high elevation" snow cover) and from the analysis of the climate station data within each class. The compatibility of this method for use with Moderate Resolution Imaging Spectroradiometer (MODIS) data, which will be available in 2000, is also discussed. With the assimilation of these data, the resolution of the MDSCP would be improved both spatially and temporally and the analysis would become completely automated.

  7. Classification of antimicrobial peptides with imbalanced datasets

    NASA Astrophysics Data System (ADS)

    Camacho, Francy L.; Torres, Rodrigo; Ramos Pollán, Raúl

    2015-12-01

    In recent years, pattern recognition has been applied to several fields for solving multiple problems in science and technology, for example in protein prediction. This methodology can be useful for prediction of the activity of biological molecules, e.g. for determination of the antimicrobial activity of synthetic and natural peptides. In this work, we evaluate the performance of different physico-chemical properties of peptides (descriptor groups) in the presence of imbalanced data sets, when facing the task of detecting whether a peptide has antimicrobial activity. We evaluate undersampling and class weighting techniques to deal with the class imbalance with different classification methods and descriptor groups. Our classification model showed an estimated precision of 96%, showing that the descriptors used to encode the amino acid sequences contain enough information to correlate the peptide sequences with their antimicrobial activity by means of machine learning. Moreover, we show how certain descriptor groups (pseudo-amino acid composition type I) work better with imbalanced datasets while others (dipeptide composition) work better with balanced ones.
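
    A minimal scikit-learn sketch of the two imbalance-handling strategies mentioned, class weighting and undersampling, on synthetic descriptor data; the classifier, the 1:9 class ratio and the descriptors are illustrative assumptions, not the study's actual pipeline:

        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import train_test_split
        from sklearn.metrics import precision_score

        rng = np.random.default_rng(2)
        X = rng.normal(size=(1000, 20))                  # peptide descriptors
        y = (rng.random(1000) < 0.1).astype(int)         # 1 = antimicrobial (minority)
        X[y == 1] += 0.8                                 # give the minority class signal

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

        # Strategy 1: class weighting
        clf_w = LogisticRegression(class_weight='balanced', max_iter=1000).fit(X_tr, y_tr)

        # Strategy 2: random undersampling of the majority class
        pos = np.where(y_tr == 1)[0]
        neg = rng.choice(np.where(y_tr == 0)[0], size=len(pos), replace=False)
        idx = np.concatenate([pos, neg])
        clf_u = LogisticRegression(max_iter=1000).fit(X_tr[idx], y_tr[idx])

        for name, clf in [('class weights', clf_w), ('undersampling', clf_u)]:
            print(name, 'precision:', precision_score(y_te, clf.predict(X_te)))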

  8. NASA: Innovate, Explore, Discover, Inspire

    NASA Video Gallery

    The President's Fiscal Year 2014 budget ensures the United States will remain the world's leader in space exploration and scientific discovery for years to come, while making critical advances in a...

  9. Trajectory-Based Flow Feature Tracking in Joint Particle/Volume Datasets.

    PubMed

    Sauer, Franz; Yu, Hongfeng; Ma, Kwan-Liu

    2014-12-01

    Studying the dynamic evolution of time-varying volumetric data is essential in countless scientific endeavors. The ability to isolate and track features of interest allows domain scientists to better manage large complex datasets both in terms of visual understanding and computational efficiency. This work presents a new trajectory-based feature tracking technique for use in joint particle/volume datasets. While traditional feature tracking approaches generally require a high temporal resolution, this method utilizes the indexed trajectories of corresponding Lagrangian particle data to efficiently track features over large jumps in time. Such a technique is especially useful for situations where the volume dataset is either temporally sparse or too large to efficiently track a feature through all intermediate timesteps. In addition, this paper presents a few other applications of this approach, such as the ability to efficiently track the internal properties of volumetric features using variables from the particle data. We demonstrate the effectiveness of this technique using real world combustion and atmospheric datasets and compare it to existing tracking methods to justify its advantages and accuracy. PMID:26356970
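
    The core trick, tracking a volumetric feature over a large temporal jump via the persistent IDs of the particles it contains, can be sketched in a few lines of NumPy (the bounding-box feature, positions and displacement are invented for illustration and are not the authors' data structures):

        import numpy as np

        rng = np.random.default_rng(3)
        ids = np.arange(10000)
        pos_t0 = rng.uniform(0, 100, size=(10000, 3))            # particle positions at t0
        pos_t1 = pos_t0 + rng.normal(5, 1, size=(10000, 3))      # bulk advection + noise

        # Feature isolated in the volume data at t0 (here a simple bounding box)
        in_feature = np.all((pos_t0 > 20) & (pos_t0 < 30), axis=1)
        tracked_ids = ids[in_feature]

        # Jump directly to t1: the tracked particles outline the feature's new
        # location without needing the intermediate volume timesteps
        tracked_pos_t1 = pos_t1[np.isin(ids, tracked_ids)]
        print("feature centre at t0:", pos_t0[in_feature].mean(axis=0))
        print("predicted centre at t1:", tracked_pos_t1.mean(axis=0))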

  10. A digital archiving system and distributed server-side processing of large datasets

    NASA Astrophysics Data System (ADS)

    Jomier, Julien; Aylward, Stephen R.; Marion, Charles; Lee, Joowhi; Styner, Martin

    2009-02-01

    In this paper, we present MIDAS, a web-based digital archiving system that processes large collections of data. Medical imaging research often involves interdisciplinary teams, each performing a separate task, from acquiring datasets to analyzing the processing results. Moreover, the number and size of the datasets continue to increase every year due to recent advancements in acquisition technology. As a result, many research laboratories centralize their data and rely on distributed computing power. We created a web-based digital archiving repository based on open standards. The MIDAS repository is specifically tuned for medical and scientific datasets and provides a flexible data management facility, a search engine, and an online image viewer. MIDAS enables users to run a set of extensible image processing algorithms from the web on the selected datasets and to add new algorithms to the MIDAS system, facilitating the dissemination of users' work to different research partners. The MIDAS system is currently running in several research laboratories and has demonstrated its ability to streamline the full image processing workflow from data acquisition to image analysis and reports.

  11. Exploring Ensemble Visualization

    PubMed Central

    Phadke, Madhura N.; Pinto, Lifford; Alabi, Femi; Harter, Jonathan; Taylor, Russell M.; Wu, Xunlei; Petersen, Hannah; Bass, Steffen A.; Healey, Christopher G.

    2012-01-01

    An ensemble is a collection of related datasets. Each dataset, or member, of an ensemble is normally large, multidimensional, and spatio-temporal. Ensembles are used extensively by scientists and mathematicians, for example, by executing a simulation repeatedly with slightly different input parameters and saving the results in an ensemble to see how parameter choices affect the simulation. To draw inferences from an ensemble, scientists need to compare data both within and between ensemble members. We propose two techniques to support ensemble exploration and comparison: a pairwise sequential animation method that visualizes locally neighboring members simultaneously, and a screen door tinting method that visualizes subsets of members using screen space subdivision. We demonstrate the capabilities of both techniques, first using synthetic data, then with simulation data of heavy ion collisions in high-energy physics. Results show that both techniques are capable of supporting meaningful comparisons of ensemble data. PMID:22347540

  12. Explorers of the Universe

    NASA Technical Reports Server (NTRS)

    Alvarez, Marino C.; Busby, Michael R.; Sotoohi, Goli; Rodriguez, William J.; Hennig, Lee Ann; Berenty, Jerry; King, Terry; Grener, Doreen; Kruzan, John

    1998-01-01

    The Explorers of the Universe is a multifaceted scientific/literacy project that involves teachers and their students with problem-oriented situations using authentic materials. This paper presents examples of self-directed cases researched by high school students and the metacognitive tools they use in planning, carrying out, and finalizing their reports.

  13. Far Travelers: The Exploring Machines.

    ERIC Educational Resources Information Center

    Nicks, Oran W.

    The National Aeronautics and Space Administration (NASA) program of lunar and planetary exploration produced a flood of scientific information about the moon, planets and the environment of interplanetary space. This book is an account of the people, machines, and the events of this scientific enterprise. It is a story of organizations,…

  14. Exploring Venus: the Venus Exploration Analysis Group (VEXAG)

    NASA Astrophysics Data System (ADS)

    Ocampo, A.; Atreya, S.; Thompson, T.; Luhmann, J.; Mackwell, S.; Baines, K.; Cutts, J.; Robinson, J.; Saunders, S.

    In July 2005, NASA's Planetary Division established the Venus Exploration Analysis Group (VEXAG, http://www.lpi.usra.edu/vexag) in order to engage the scientific community at large in identifying scientific priorities and strategies for the exploration of Venus. VEXAG is a community-based forum open to all interested in the exploration of Venus. VEXAG was designed to provide scientific input and technology development plans for planning and prioritizing the study of Venus over the next several decades, including a Venus surface sample return. VEXAG regularly evaluates NASA's Venus exploration goals, scientific objectives, investigations and critical measurement requirements, including the recommendations in the National Research Council Decadal Survey and NASA's Solar System Exploration Strategic Roadmap. VEXAG will take into consideration the latest scientific results from ESA's Venus Express mission and the MESSENGER flybys, as well as the results anticipated from JAXA's Venus Climate Orbiter, together with science community inputs from venues such as the February 13-16, 2006 AGU Chapman Conference, to identify the scientific priorities and strategies for future NASA Venus exploration. VEXAG is composed of two co-chairs, Sushil Atreya (University of Michigan, Ann Arbor) and Janet Luhmann (University of California, Berkeley). VEXAG has formed three focus groups in the areas of (1) Planetary Formation and Evolution (Surface and Interior, Volcanism, Geodynamics, etc.; Focus Group Lead: Steve Mackwell, LPI), and (2) Atmospheric Evolution, Dynamics, Meteorology

  15. Lunar Daylight Exploration

    NASA Technical Reports Server (NTRS)

    Griffin, Brand Norman

    2010-01-01

    With 1 rover, 2 astronauts and 3 days, the Apollo 17 Mission covered over 30 km, set up 10 scientific experiments and returned 110 kg of samples. This is a lot of science in a short time and the inspiration for a barebones, return-to-the-Moon strategy called Daylight Exploration. The Daylight Exploration approach poses an answer to the question: What could the Apollo crew have done with more time and today's robotics? In contrast to more ambitious and expensive strategies that create outposts then rely on pressurized rovers to drive to the science sites, Daylight Exploration is a low-overhead approach conceived to land near the scientific site, conduct Apollo-like exploration, then leave before the sun goes down. A key motivation behind Daylight Exploration is cost reduction, but it does not come at the expense of scientific exploration. As a goal, Daylight Exploration provides access to the top 10 science sites by using the best capabilities of human and robotic exploration. Most science sites are within an equatorial band of 26 degrees latitude, and on the Moon, at the equator, the day is 14 Earth days long; even more important, the lunar night is 14 days long. Human missions are constrained to 12 days because the energy storage systems required to operate during the lunar night add mass, complexity and cost. In addition, short missions are beneficial because they require fewer consumables, do not require an airlock, reduce radiation exposure, minimize the dwell-time for the ascent and orbiting propulsion systems and allow low-mass, campout accommodations. Key to Daylight Exploration is the use of piloted rovers as tele-operated science platforms. Rovers are launched before or with the crew, and continue to operate between crew visits, analyzing and collecting samples during the lunar daylight.

  16. Mars exploration mission

    NASA Astrophysics Data System (ADS)

    Matsuda, Seiji

    1991-07-01

    Mars exploration scenarios are reviewed. An emphasis is placed on scientific exploration. The review and evaluation results are reported for the following items: (1) orbit plans for Mars surface exploration missions that begin in Low Earth Orbit (LEO); (2) powered and aerodynamic capture of payloads from the transfer orbit into a Mars revolving orbit; and (3) a penetrator system as a Mars landing vehicle. Proposed Mars transfer orbits have the following advantages over Hohmann orbits: (1) transfer time and angle are smaller; (2) the inclination between the orbital planes of Earth and Mars is considered; and (3) velocity variations are not required to change orbit plane.

  17. Synthesizing plant phenological indicators from multispecies datasets

    NASA Astrophysics Data System (ADS)

    Rutishauser, This; Peñuelas, Josep; Filella, Iolanda; Gehrig, Regula; Scherrer, Simon C.; Röthlisberger, Christian

    2014-05-01

    Changes in the seasonality of life cycles of plants from phenological observations are traditionally analysed at the species level. Trends and correlations with main environmental driving variables show a coherent picture across the globe. The question arises whether there is an integrated phenological signal across species that describes common interannual variability. Is there a way to express synthetic phenological indicators from multispecies datasets that serve decision makers as useful tools? Can these indicators be derived in such a robust way that systematic updates yield necessary information for adaptation measures? We address these questions by analysing multi-species phenological data sets with leaf-unfolding and flowering observations from 30 sites across Europe between 40° and 63°N including data from PEP725, the Swiss Plant Phenological Observation Network and one legacy data set. Starting in 1951 the data sets were synthesized by multivariate analysis (Principal Component Analysis). The representativeness of the site-specific indicator was tested against subsets including only leaf-unfolding or flowering phases, and by a comparison with a 50% random sample of the available phenophases for 500 time steps. Results show that a synthetic indicator explains up to 79% of the variance at each site - usually 40-50% or more. Robust linear trends over the common period 1971-2000 indicate an overall change of the indicator of -0.32 days/year with lower uncertainty than previous studies. Advances were more pronounced in southern and northern Europe. The indicator-based analysis provides a promising tool for synthesizing site-based plant phenological records and is a companion to, and validating data for, an increasing number of phenological measurements derived from phenological models and satellite sensors.
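
    A minimal sketch of deriving such a synthetic indicator at one site: standardize the phenophase series, take the first principal component as the indicator, and fit a linear trend over the common period (the synthetic data, the 25 phenophases and the SVD-based PCA are illustrative assumptions, not the study's dataset or software):

        import numpy as np

        rng = np.random.default_rng(4)
        years = np.arange(1951, 2014)
        common = -0.3 * (years - years.mean()) + rng.normal(0, 3, len(years))
        P = 120 + common[:, None] + rng.normal(0, 4, size=(len(years), 25))   # day-of-year dates

        # Standardize each phenophase, then PCA via SVD; PC1 is the indicator
        Z = (P - P.mean(axis=0)) / P.std(axis=0)
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        indicator = U[:, 0] * s[0]
        explained = s[0]**2 / np.sum(s**2)

        # Linear trend of the indicator over the common period (PC units per year)
        sel = (years >= 1971) & (years <= 2000)
        slope = np.polyfit(years[sel], indicator[sel], 1)[0]
        print(f"PC1 explains {explained:.0%} of the variance; trend {slope:+.2f} per year")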

  18. A reference GNSS tropospheric dataset over Europe.

    NASA Astrophysics Data System (ADS)

    Pacione, Rosa; Di Tomaso, Simona

    2016-04-01

    The present availability of 18 years of GNSS data belonging to the European Permanent Network (EPN, http://www.epncb.oma.be/) is a valuable database for the development of a climate data record of GNSS tropospheric products over Europe. This dataset has high potential for monitoring trend and variability in atmospheric water vapour, improving the knowledge of climatic trends of atmospheric water vapour and being useful for global and regional NWP reanalyses as well as climate model simulations. In the framework of the EPN-Repro2, a second reprocessing campaign of the EPN, five Analysis Centres have homogeneously reprocessed the EPN network for the period 1996-2013. Three Analysis Centres are providing homogeneously reprocessed solutions for the entire network, which are analyzed with three different software packages: Bernese, GAMIT and GIPSY-OASIS. Smaller subnetworks based on Bernese 5.2 are also provided. A huge effort is made to provide solutions that are the basis for deriving new coordinates, velocities and troposphere parameters, Zenith Tropospheric Delays and Horizontal Gradients, for the entire EPN. These individual contributions are combined in order to provide the official EPN reprocessed products. A preliminary tropospheric combined solution for the period 1996-2013 has been carried out. It is based on all the available homogeneously reprocessed solutions and it offers the possibility to assess each of them prior to the ongoing final combination. We will present the results of the EPN Repro2 tropospheric combined products and how the climate community will benefit from them. Acknowledgment: The EPN Repro2 working group is acknowledged for providing the EPN solutions used in this work. E-GEOS activity is carried out in the framework of ASI contract 2015-050-R.0.

  19. Integrating diverse datasets improves developmental enhancer prediction.

    PubMed

    Erwin, Genevieve D; Oksenberg, Nir; Truty, Rebecca M; Kostka, Dennis; Murphy, Karl K; Ahituv, Nadav; Pollard, Katherine S; Capra, John A

    2014-06-01

    Gene-regulatory enhancers have been identified using various approaches, including evolutionary conservation, regulatory protein binding, chromatin modifications, and DNA sequence motifs. To integrate these different approaches, we developed EnhancerFinder, a two-step method for distinguishing developmental enhancers from the genomic background and then predicting their tissue specificity. EnhancerFinder uses a multiple kernel learning approach to integrate DNA sequence motifs, evolutionary patterns, and diverse functional genomics datasets from a variety of cell types. In contrast with prediction approaches that define enhancers based on histone marks or p300 sites from a single cell line, we trained EnhancerFinder on hundreds of experimentally verified human developmental enhancers from the VISTA Enhancer Browser. We comprehensively evaluated EnhancerFinder using cross validation and found that our integrative method improves the identification of enhancers over approaches that consider a single type of data, such as sequence motifs, evolutionary conservation, or the binding of enhancer-associated proteins. We find that VISTA enhancers active in embryonic heart are easier to identify than enhancers active in several other embryonic tissues, likely due to their uniquely high GC content. We applied EnhancerFinder to the entire human genome and predicted 84,301 developmental enhancers and their tissue specificity. These predictions provide specific functional annotations for large amounts of human non-coding DNA, and are significantly enriched near genes with annotated roles in their predicted tissues and lead SNPs from genome-wide association studies. We demonstrate the utility of EnhancerFinder predictions through in vivo validation of novel embryonic gene regulatory enhancers from three developmental transcription factor loci. Our genome-wide developmental enhancer predictions are freely available as a UCSC Genome Browser track, which we hope will enable

  20. The need for a national LIDAR dataset

    USGS Publications Warehouse

    Stoker, Jason M.; Harding, David; Parrish, Jay

    2008-01-01

    On May 21st and 22nd 2008, the U.S. Geological Survey (USGS), the National Aeronautics and Space Administration (NASA), and the Association of American State Geologists (AASG) hosted the Second National Light Detection and Ranging (Lidar) Initiative Strategy Meeting at USGS Headquarters in Reston, Virginia. The USGS is taking the lead in cooperation with many partners to design and implement a future high-resolution National Lidar Dataset. Initial work is focused on determining viability, developing requirements and specifications, establishing what types of information contained in a lidar signal are most important, and identifying key stakeholders and their respective roles. In February 2007, USGS hosted the first National Lidar Initiative Strategy Meeting at USGS Headquarters in Virginia. The presentations and a published summary report from the first meeting can be found on the Center for Lidar Information Coordination and Knowledge (CLICK) Website: http://lidar.cr.usgs.gov. The first meeting demonstrated the public need for consistent lidar data at the national scale. The goals of the second meeting were to further expand on the ideas and information developed in the first meeting, to bring more stakeholders together, to both refine and expand on the requirements and capabilities needed, and to discuss an organizational and funding approach for an initiative of this magnitude. The approximately 200 participants represented Federal, State, local, commercial and academic interests. The second meeting included a public solicitation for presentations and posters to better democratize the workshop. All of the oral presentation abstracts that were submitted were accepted, and the 25 poster submissions augmented and expanded upon the oral presentations. The presentations from this second meeting, including audio, can be found on CLICK at http://lidar.cr.usgs.gov/national_lidar_2008.php. Based on the presentations and the discussion sessions, the following

  1. Feature isolation and quantification of evolving datasets

    NASA Technical Reports Server (NTRS)

    1994-01-01

    Identifying and isolating features is an important part of visualization and a crucial step for the analysis and understanding of large time-dependent data sets (either from observation or simulation). In this proposal, we address these concerns, namely the investigation and implementation of basic 2D and 3D feature based methods to enhance current visualization techniques and provide the building blocks for automatic feature recognition, tracking, and correlation. These methods incorporate ideas from scientific visualization, computer vision, image processing, and mathematical morphology. Our focus is in the area of fluid dynamics, and we show the applicability of these methods to the quantification and tracking of three-dimensional vortex and turbulence bursts.

  2. Publishing datasets with eSciDoc and panMetaDocs

    NASA Astrophysics Data System (ADS)

    Ulbricht, D.; Klump, J.; Bertelmann, R.

    2012-04-01

    Currently several research institutions worldwide undertake considerable efforts to have their scientific datasets published and to syndicate them to data portals as extensively described objects identified by a persistent identifier. This is done to foster the reuse of data, to make scientific work more transparent, and to create a citable entity that can be referenced unambiguously in written publications. GFZ Potsdam established a publishing workflow for file-based research datasets. Key software components are an eSciDoc infrastructure [1] and multiple instances of the data curation tool panMetaDocs [2]. The eSciDoc repository holds data objects and their associated metadata in container objects, called eSciDoc items. A key metadata element in this context is the publication status of the referenced data set. PanMetaDocs, which is based on PanMetaWorks [3], is a PHP-based web application that allows data to be described with any XML-based metadata schema. The metadata fields can be filled with static or dynamic content to reduce the number of fields that require manual entries to a minimum and make use of contextual information in a project setting. Access rights can be applied to set the visibility of datasets to other project members and allow collaboration on, and notification about, datasets (RSS) and interaction with the internal messaging system, which was inherited from panMetaWorks. When a dataset is to be published, panMetaDocs allows the publication status of the eSciDoc item to be changed from "private" to "submitted" and the dataset to be prepared for verification by an external reviewer. After quality checks, the item publication status can be changed to "published". This makes the data and metadata available through the internet worldwide. PanMetaDocs is developed as an eSciDoc application. It is an easy-to-use graphical user interface to eSciDoc items, their data and metadata. It is also an application supporting a DOI publication agent during the process of

  3. Application of Huang-Hilbert Transforms to Geophysical Datasets

    NASA Technical Reports Server (NTRS)

    Duffy, Dean G.

    2003-01-01

    The Huang-Hilbert transform is a promising new method for analyzing nonstationary and nonlinear datasets. In this talk I will apply this technique to several important geophysical datasets. To understand the strengths and weaknesses of this method, multi-year, hourly datasets of the sea level heights and solar radiation will be analyzed. Then we will apply this transform to the analysis of gravity waves observed in a mesoscale observational net.
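
    A hedged sketch of the two stages of the Huang-Hilbert approach, empirical mode decomposition followed by the Hilbert transform of each intrinsic mode function, assuming the third-party PyEMD package for the decomposition and SciPy for the transform (neither is named in the talk, and the hourly signal is synthetic):

        import numpy as np
        from scipy.signal import hilbert
        from PyEMD import EMD          # assumed EMD implementation (PyEMD package)

        t = np.arange(2000.0)          # hourly samples
        rng = np.random.default_rng(5)
        sig = np.sin(2 * np.pi * t / 12.42) + 0.001 * t + 0.3 * rng.normal(size=t.size)

        # Stage 1: empirical mode decomposition into intrinsic mode functions (IMFs)
        imfs = EMD().emd(sig)

        # Stage 2: Hilbert transform gives instantaneous amplitude and frequency
        for k, imf in enumerate(imfs):
            analytic = hilbert(imf)
            amp = np.abs(analytic)
            freq = np.diff(np.unwrap(np.angle(analytic))) / (2 * np.pi)   # cycles per hour
            print(f"IMF {k}: mean amplitude {amp.mean():.3f}, "
                  f"mean frequency {freq.mean():.4f} cycles per hour")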

  4. Five year global dataset: NMC operational analyses (1978 to 1982)

    NASA Technical Reports Server (NTRS)

    Straus, David; Ardizzone, Joseph

    1987-01-01

    This document describes procedures used in assembling a five year dataset (1978 to 1982) using NMC Operational Analysis data. These procedures entailed replacing missing and unacceptable data in order to arrive at a complete dataset that is continuous in time. In addition, a subjective assessment on the integrity of all data (both preliminary and final) is presented. Documentation on tapes comprising the Five Year Global Dataset is also included.

  5. SALTON SEA SCIENTIFIC DRILLING PROJECT: SCIENTIFIC PROGRAM.

    USGS Publications Warehouse

    Sass, J.H.; Elders, W.A.

    1986-01-01

    The Salton Sea Scientific Drilling Project was spudded on 24 October 1985 and reached a total depth of 10,564 ft (3.2 km) on 17 March 1986. There followed a period of logging, a flow test, and downhole scientific measurements. The scientific goals were integrated smoothly with the engineering and economic objectives of the program, and the ideal of 'science driving the drill' in continental scientific drilling projects was achieved in large measure. The principal scientific goals of the project were to study the physical and chemical processes involved in an active, magmatically driven hydrothermal system. To facilitate these studies, high priority was attached to four areas of sample and data collection, namely: (1) core and cuttings, (2) formation fluids, (3) geophysical logging, and (4) downhole physical measurements, particularly temperatures and pressures.

  6. Harnessing Connectivity in a Large-Scale Small-Molecule Sensitivity Dataset | Office of Cancer Genomics

    Cancer.gov

    Identifying genetic alterations that prime a cancer cell to respond to a particular therapeutic agent can facilitate the development of precision cancer medicines. Cancer cell-line (CCL) profiling of small-molecule sensitivity has emerged as an unbiased method to assess the relationships between genetic or cellular features of CCLs and small-molecule response. Here, we developed annotated cluster multidimensional enrichment analysis to explore the associations between groups of small molecules and groups of CCLs in a new, quantitative sensitivity dataset.

  7. Framework for Interactive Parallel Dataset Analysis on the Grid

    SciTech Connect

    Alexander, David A.; Ananthan, Balamurali; Johnson, Tony; Serbo, Victor; /SLAC

    2007-01-10

    We present a framework for use at a typical Grid site to facilitate custom interactive parallel dataset analysis targeting terabyte-scale datasets of the type typically produced by large multi-institutional science experiments. We summarize the needs for interactive analysis and show a prototype solution that satisfies those needs. The solution consists of a desktop client tool and a set of Web Services that allow scientists to sign onto a Grid site, compose analysis script code to carry out physics analysis on datasets, distribute the code and datasets to worker nodes, collect the results back to the client, and construct professional-quality visualizations of the results.

  8. Pgu-Face: A dataset of partially covered facial images.

    PubMed

    Salari, Seyed Reza; Rostami, Habib

    2016-12-01

    In this article we introduce a human face image dataset. Images were taken in close to real-world conditions using several cameras, often mobile phone cameras. The dataset contains 224 subjects imaged under four different figures (a nearly clean-shaven countenance, a nearly clean-shaven countenance with sunglasses, an unshaven or stubble face countenance, an unshaven or stubble face countenance with sunglasses) in up to two recording sessions. The existence of partially covered face images in this dataset could reveal the robustness and efficiency of several facial image processing algorithms. In this work we present the dataset and explain the recording method. PMID:27668275

  9. Pgu-Face: A dataset of partially covered facial images.

    PubMed

    Salari, Seyed Reza; Rostami, Habib

    2016-12-01

    In this article we introduce a human face image dataset. Images were taken in close to real-world conditions using several cameras, often mobile phone cameras. The dataset contains 224 subjects imaged under four different figures (a nearly clean-shaven countenance, a nearly clean-shaven countenance with sunglasses, an unshaven or stubble face countenance, an unshaven or stubble face countenance with sunglasses) in up to two recording sessions. The existence of partially covered face images in this dataset could reveal the robustness and efficiency of several facial image processing algorithms. In this work we present the dataset and explain the recording method.

  10. A test-retest dataset for assessing long-term reliability of brain morphology and resting-state brain activity

    PubMed Central

    Huang, Lijie; Huang, Taicheng; Zhen, Zonglei; Liu, Jia

    2016-01-01

    We present a test-retest dataset for evaluation of long-term reliability of measures from structural and resting-state functional magnetic resonance imaging (sMRI and rfMRI) scans. The repeated scan dataset was collected from 61 healthy adults in two sessions using highly similar imaging parameters at an interval of 103–189 days. However, as the imaging parameters were not completely identical, the reliability estimated from this dataset shall reflect the lower bounds of the true reliability of sMRI/rfMRI measures. Furthermore, in conjunction with other test-retest datasets, our dataset may help explore the impact of different imaging parameters on reliability of sMRI/rfMRI measures, which is especially critical for assessing datasets collected from multiple centers. In addition, intelligence quotient (IQ) was measured for each participant using Raven’s Advanced Progressive Matrices. The data can thus be used for purposes other than assessing reliability of sMRI/rfMRI alone. For example, data from each single session could be used to associate structural and functional measures of the brain with the IQ metrics to explore brain-IQ association. PMID:26978040

  11. A test-retest dataset for assessing long-term reliability of brain morphology and resting-state brain activity.

    PubMed

    Huang, Lijie; Huang, Taicheng; Zhen, Zonglei; Liu, Jia

    2016-03-15

    We present a test-retest dataset for evaluation of long-term reliability of measures from structural and resting-state functional magnetic resonance imaging (sMRI and rfMRI) scans. The repeated scan dataset was collected from 61 healthy adults in two sessions using highly similar imaging parameters at an interval of 103-189 days. However, as the imaging parameters were not completely identical, the reliability estimated from this dataset shall reflect the lower bounds of the true reliability of sMRI/rfMRI measures. Furthermore, in conjunction with other test-retest datasets, our dataset may help explore the impact of different imaging parameters on reliability of sMRI/rfMRI measures, which is especially critical for assessing datasets collected from multiple centers. In addition, intelligence quotient (IQ) was measured for each participant using Raven's Advanced Progressive Matrices. The data can thus be used for purposes other than assessing reliability of sMRI/rfMRI alone. For example, data from each single session could be used to associate structural and functional measures of the brain with the IQ metrics to explore brain-IQ association.

  12. Strategy for outer planets exploration

    NASA Technical Reports Server (NTRS)

    1975-01-01

    NASA's Planetary Programs Office formed a number of scientific working groups to study in depth the potential scientific return from the various candidate missions to the outer solar system. The results of these working group studies were brought together in a series of symposia to evaluate the potential outer planet missions and to discuss strategies for exploration of the outer solar system that were consistent with fiscal constraints and with anticipated spacecraft and launch vehicle capabilities. A logical, scientifically sound, and cost effective approach to exploration of the outer solar system is presented.

  13. Investigating gait recognition in the short-wave infrared (SWIR) spectrum: dataset and challenges

    NASA Astrophysics Data System (ADS)

    DeCann, Brian; Ross, Arun; Dawson, Jeremy

    2013-05-01

    In the biometrics community, challenge datasets are often released to determine the robustness of state-of-the- art algorithms to conditions that can confound recognition accuracy. In the context of automated human gait recognition, evaluation has predominantly been conducted on video data acquired in the active visible spectral band, although recent literature has explored recognition in the passive thermal band. The advent of sophisticated sensors has piqued interest in performing gait recognition in other spectral bands such as short-wave infrared (SWIR), due to their use in military-based tactical applications and the possibility of operating in nighttime environments. Further, in many operational scenarios, the environmental variables are not controlled, thereby posing several challenges to traditional recognition schemes. In this work, we discuss the possibility of performing gait recognition in the SWIR spectrum by first assembling a dataset, referred to as the WVU Outdoor SWIR Gait (WOSG) Dataset, and then evaluate the performance of three gait recognition algorithms on the dataset. The dataset consists of 155 subjects and represents gait information acquired under multiple walking paths in an uncontrolled, outdoor environment. Detailed experimental analysis suggests the benefits of distributing this new challenging dataset to the broader research community. In particular, the following observations were made: (a) the importance of SWIR imagery in acquiring data covertly for surveillance applications; (b) the difficulty in extracting human silhouettes in low-contrast SWIR imagery; (c) the impact of silhouette quality on overall recognition accuracy; (d) the possibility of matching gait sequences pertaining to different walking trajectories; and (e) the need for developing sophisticated gait recognition algorithms to handle data acquired in unconstrained environments.

  14. The Evolution of Information Management in Oceanographic Exploration

    NASA Astrophysics Data System (ADS)

    Reser, B.; Mesick, S.; Lobecker, E.

    2012-12-01

    Emerging technologies in data collection, processing, and telecommunication have made data management a vital and ever-evolving component of scientific research and exploration. Automation and streamlining of these technologies have enabled a paradigm shift in the approach to ocean research aboard the NOAA Ship Okeanos Explorer. Telepresence technologies allow the real-time collaboration of at-sea technicians and scientists with shore-side scientists, as well as near-real-time access to the datasets being collected. This approach has allowed a large team of shore-side experts in various fields to drive at-sea collection and exploration efforts, enabling the analysis and evaluation of the highest quality data possible. Automated and standardized data management efforts that have been integrated into this model allow for the rapid and efficient collection, processing, and archival of the information in national public archives. As new sampling and survey technologies are developed, they have been, and will continue to be, adapted into the existing end-to-end information management model. Most recently this was accomplished in a collaborative effort between NOAA, NSF, WHOI, Duke, URI, and the USGS in a cruise along the Blake Ridge and Cape Fear Diapirs. This cruise successfully integrated shipboard data collection with the Sentry AUV in the search for, and exploration of, cold seep communities along the US Atlantic Margin.

  15. Ontology-Driven Discovery of Scientific Computational Entities

    ERIC Educational Resources Information Center

    Brazier, Pearl W.

    2010-01-01

    Many geoscientists use modern computational resources, such as software applications, Web services, scientific workflows and datasets that are readily available on the Internet, to support their research and many common tasks. These resources are often shared via human contact and sometimes stored in data portals; however, they are not necessarily…

  16. Systematic Processing of Clementine Data for Scientific Analyses

    NASA Technical Reports Server (NTRS)

    Mcewen, A. S.

    1993-01-01

    If fully successful, the Clementine mission will return about 3,000,000 lunar images and more than 5000 images of Geographos. Effective scientific analyses of such large datasets require systematic processing efforts. Concepts for two such efforts are described: global multispectral imaging of the moon, and videos of Geographos.

  17. Development of a global historic monthly mean precipitation dataset

    NASA Astrophysics Data System (ADS)

    Yang, Su; Xu, Wenhui; Xu, Yan; Li, Qingxiang

    2016-04-01

    A global historic precipitation dataset is the basis for climate and water cycle research. There have been several global historic land surface precipitation datasets developed by international data centers such as the US National Climatic Data Center (NCDC), the European Climate Assessment & Dataset project team, the Met Office, etc., but so far there are no such datasets developed by any research institute in China. In addition, each dataset has its own focus of study region, and the existing global precipitation datasets only contain sparse observational stations over China, which may result in uncertainties in East Asian precipitation studies. In order to take into account comprehensive historic information, users might need to employ two or more datasets. However, the non-uniform data formats, data units, station IDs, and so on add extra difficulties for users to exploit these datasets. For this reason, a complete historic precipitation dataset that takes advantage of various datasets has been developed and produced in the National Meteorological Information Center of China. Precipitation observations from 12 sources are aggregated, and the data formats, data units, and station IDs are unified. Duplicated stations with the same ID are identified, with duplicated observations removed. A consistency test, correlation coefficient test, significance t-test at the 95% confidence level, and significance F-test at the 95% confidence level are conducted first to ensure data reliability. Only those datasets that satisfy all the above four criteria are integrated to produce the China Meteorological Administration global precipitation (CGP) historic precipitation dataset version 1.0. It contains observations at 31 thousand stations with 1.87 × 10^7 data records, among which 4152 time series of precipitation are longer than 100 yr. This dataset plays a critical role in climate research due to its advantages in large data volume and high density of station network, compared to
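
    A toy pandas sketch of the merge-and-deduplicate step described above, after formats, units and station IDs have been unified (the table contents, column names and the "first source wins" rule are illustrative assumptions, not the CGP production workflow):

        import pandas as pd
        from scipy import stats

        a = pd.DataFrame({'station': ['S1', 'S1', 'S2'], 'month': ['1951-01', '1951-02', '1951-01'],
                          'precip_mm': [30.2, 41.0, 12.5], 'source': 'A'})
        b = pd.DataFrame({'station': ['S1', 'S2', 'S3'], 'month': ['1951-01', '1951-01', '1951-01'],
                          'precip_mm': [30.5, 12.5, 55.1], 'source': 'B'})

        merged = pd.concat([a, b], ignore_index=True)

        # Correlation check on overlapping station-months before discarding duplicates
        overlap = a.merge(b, on=['station', 'month'], suffixes=('_a', '_b'))
        if len(overlap) > 2:
            r, p = stats.pearsonr(overlap['precip_mm_a'], overlap['precip_mm_b'])
            print('overlap correlation:', r, 'p-value:', p)

        # Keep a single record per station-month (here, the first source wins)
        deduped = merged.drop_duplicates(subset=['station', 'month'], keep='first')
        print(deduped)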

  18. Accuracy assessment of gridded precipitation datasets in the Himalayas

    NASA Astrophysics Data System (ADS)

    Khan, A.

    2015-12-01

    Accurate precipitation data are vital for hydro-climatic modelling and water resources assessments. Based on mass balance calculations and Turc-Budyko analysis, this study investigates the accuracy of twelve widely used precipitation gridded datasets for sub-basins in the Upper Indus Basin (UIB) in the Himalayas-Karakoram-Hindukush (HKH) region. These datasets are: 1) Global Precipitation Climatology Project (GPCP), 2) Climate Prediction Centre (CPC) Merged Analysis of Precipitation (CMAP), 3) NCEP / NCAR, 4) Global Precipitation Climatology Centre (GPCC), 5) Climatic Research Unit (CRU), 6) Asian Precipitation Highly Resolved Observational Data Integration Towards Evaluation of Water Resources (APHRODITE), 7) Tropical Rainfall Measuring Mission (TRMM), 8) European Reanalysis (ERA) interim data, 9) PRINCETON, 10) European Reanalysis-40 (ERA-40), 11) Willmott and Matsuura, and 12) WATCH Forcing Data based on ERA interim (WFDEI). Precipitation accuracy and consistency was assessed by physical mass balance involving sum of annual measured flow, estimated actual evapotranspiration (average of 4 datasets), estimated glacier mass balance melt contribution (average of 4 datasets), and ground water recharge (average of 3 datasets), during 1999-2010. Mass balance assessment was complemented by Turc-Budyko non-dimensional analysis, where annual precipitation, measured flow and potential evapotranspiration (average of 5 datasets) data were used for the same period. Both analyses suggest that all tested precipitation datasets significantly underestimate precipitation in the Karakoram sub-basins. For the Hindukush and Himalayan sub-basins most datasets underestimate precipitation, except ERA-interim and ERA-40. The analysis indicates that for this large region with complicated terrain features and stark spatial precipitation gradients the reanalysis datasets have better consistency with flow measurements than datasets derived from records of only sparsely distributed climatic
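
    The logic of the two consistency checks can be sketched with illustrative annual totals for a single glacierized sub-basin (all numbers are invented, and the balance below is a simplified form of the assessment described, not the study's exact formulation):

        # Annual water balance: P should roughly equal Q + AET + recharge minus the
        # part of flow supplied by glacier mass loss (all in mm per year)
        P_dataset, Q, AET, GW, glacier_melt = 650.0, 820.0, 180.0, 30.0, 250.0
        P_required = Q + AET + GW - glacier_melt
        print(f"dataset P = {P_dataset} mm; water balance needs ~{P_required} mm "
              f"({100 * (P_dataset - P_required) / P_required:+.0f}%)")

        # Turc-Budyko indices: aridity index PET/P and evaporative index (P - Q)/P;
        # a negative evaporative index flags precipitation underestimation
        PET = 700.0
        aridity = PET / P_dataset
        evaporative = (P_dataset - Q) / P_dataset
        print(f"aridity index {aridity:.2f}, evaporative index {evaporative:.2f}")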

  19. Inquiring with Geoscience Datasets: Instruction and Assessment

    NASA Astrophysics Data System (ADS)

    Zalles, D.; Quellmalz, E.; Gobert, J.

    2005-12-01

    This session will describe a new NSF-funded project in Geoscience education, Inquiring with Geoscience Data Sets. The goals of the project are to (1) Study the impacts on student learning of Web-based supplementary curriculum modules that engage secondary-level students in inquiry projects addressing important geoscience problems using an Earth System Science approach. Students will use technologies to access real data sets in the geosciences and to interpret, analyze, and communicate findings based on the data sets. The standards addressed will include geoscience concepts, inquiry abilities in NSES and Benchmarks for Science Literacy, data literacy, NCTM standards, and 21st-century skills and technology proficiencies (NETTS/ISTE). (2) Develop design principles, specification templates, and prototype exemplars for technology-based performance assessments that provide evidence of students' geoscientific knowledge and inquiry skills (including data literacy skills) and students' ability to access, use, analyze, and interpret technology-based geoscience data sets. (3) Develop scenarios based on the specification templates that describe curriculum modules and performance assessments that could be developed for other Earth Science standards and curriculum programs. Also to be described in the session are the project's efforts to differentiate among the dimensions of data literacy and scientific inquiry that are relevant for the geoscience disciplines, and how recognition and awareness of the differences can be effectively channelled for the betterment of geoscience education.

  20. Dataset of milk whey proteins of two indigenous greek goat breeds.

    PubMed

    Anagnostopoulos, Athanasios K; Katsafadou, Angeliki I; Pierros, Vasileios; Kontopodis, Evangelos; Fthenakis, George C; Arsenos, George; Karkabounas, Spyridon Ch; Tzora, Athina; Skoufos, Ioannis; Tsangaris, George Th

    2016-09-01

    Due to its rarity and unique biological traits, as well as its growing financial value, milk of dairy Greek small ruminants is continuously attracting interest from both the scientific community and industry. For the construction of the present dataset, cutting-edge proteomics methodologies were employed in order to investigate and characterize, for the first time, the milk whey proteome from the two indigenous Greek goat breeds, Capra prisca and Skopelos. In total, 822 protein groups were identified in the milk whey of the two breeds. The present data are further discussed in the research article "Milk of Greek sheep and goat breeds; characterization by means of proteomics" [1].

  1. Dataset of milk whey proteins of two indigenous greek goat breeds.

    PubMed

    Anagnostopoulos, Athanasios K; Katsafadou, Angeliki I; Pierros, Vasileios; Kontopodis, Evangelos; Fthenakis, George C; Arsenos, George; Karkabounas, Spyridon Ch; Tzora, Athina; Skoufos, Ioannis; Tsangaris, George Th

    2016-09-01

    Due to its rarity and unique biological traits, as well as its growing financial value, milk of dairy Greek small ruminants is continuously attracting interest from both the scientific community and industry. For the construction of the present dataset, cutting-edge proteomics methodologies were employed in order to investigate and characterize, for the first time, the milk whey proteome from the two indigenous Greek goat breeds, Capra prisca and Skopelos. In total, 822 protein groups were identified in the milk whey of the two breeds. The present data are further discussed in the research article "Milk of Greek sheep and goat breeds; characterization by means of proteomics" [1]. PMID:27508219

  2. Global Drought Assessment using a Multi-Model Dataset

    NASA Astrophysics Data System (ADS)

    VanLanen, H.; Huijgevoort, M. V.; Corzo Perez, G.; Wanders, N.; Hazenberg, P.; Loon, A. V.; Estifanos, S.; Melsen, L.

    2011-12-01

    Large-scale models are often applied to study past drought (forced with global reanalysis datasets) and to assess future drought (using downscaled, bias-corrected forcing from climate models). The EU project WATer and global CHange (WATCH) provides a 0.5° global dataset of meteorological forcing (i.e. the WATCH Forcing Data, WFD), which was used as input for a suite of global hydrological models (GHMs) and land surface models (LSMs). Ten GHMs and LSMs have been run for the second half of the 20th century and seven for the whole century. Spatio-temporal drought characteristics were derived from gridded time series of daily and monthly aggregated runoff using the threshold level method and both non-contiguous and contiguous approaches. GHMs and LSMs were intercompared and to some extent also tested against observations to explore to what extent these models capture past drought events. This paper will present an overview of results. Global maps showing drought summary statistics (e.g. average duration) and the distribution of drought clusters across the globe for major documented drought events will be presented. In addition, area in drought and the occurrence of the maximum drought cluster will be discussed. The main results from a number of studies are: (i) drought characteristics across the globe vary depending on the selected window of years, (ii) GHMs and LSMs broadly identified major drought events in a number of large river basins around the world, (iii) drought events obtained with individual GHMs and LSMs may substantially deviate from those derived with a catchment-scale hydrological model (selected EU WATCH river basins), but the multi-model ensemble mean agrees rather well, (iv) use of different calculation methods for reference evapotranspiration has little to substantial influence on drought characteristics depending on the climate region (Köppen-Geiger), (v) groundwater systems are as important as climate for the development of drought in runoff. Understanding of past
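
    The abstract above does not spell out the threshold level method, so a minimal sketch follows: a drought event lasts as long as runoff stays below a (possibly seasonally varying) threshold, and its duration and deficit volume are recorded. The variable names, the synthetic series and the 20th-percentile threshold are illustrative assumptions, not the WATCH configuration.

      import numpy as np

      def drought_events(runoff, threshold):
          """Threshold-level method: one event per uninterrupted below-threshold spell."""
          runoff = np.asarray(runoff, dtype=float)
          threshold = np.broadcast_to(np.asarray(threshold, dtype=float), runoff.shape)
          events, start = [], None
          for t, below in enumerate(runoff < threshold):
              if below and start is None:
                  start = t                                    # drought begins
              elif not below and start is not None:
                  deficit = float(np.sum(threshold[start:t] - runoff[start:t]))
                  events.append({"start": start, "duration": t - start, "deficit": deficit})
                  start = None                                 # drought ends
          if start is not None:                                # event still running at the series end
              deficit = float(np.sum(threshold[start:] - runoff[start:]))
              events.append({"start": start, "duration": runoff.size - start, "deficit": deficit})
          return events

      # Synthetic example: 20 years of monthly runoff with a seasonally varying
      # threshold taken as the 20th percentile of each calendar month.
      rng = np.random.default_rng(0)
      runoff = rng.gamma(shape=2.0, scale=10.0, size=240)
      monthly_q20 = np.array([np.percentile(runoff[m::12], 20) for m in range(12)])
      events = drought_events(runoff, np.tile(monthly_q20, 20))
      print(len(events), "events; mean duration",
            np.mean([e["duration"] for e in events]), "months")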

  3. Science in Writing: Learning Scientific Argument in Principle and Practice

    ERIC Educational Resources Information Center

    Cope, Bill; Kalantzis, Mary; Abd-El-Khalick, Fouad; Bagley, Elizabeth

    2013-01-01

    This article explores the processes of writing in science and in particular the "complex performance" of writing a scientific argument. The article explores in general terms the nature of scientific argumentation in which the author-scientist makes claims, provides evidence to support these claims, and develops chains of scientific…

  4. Really big data: Processing and analysis of large datasets

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Modern animal breeding datasets are large and getting larger, due in part to the recent availability of DNA data for many animals. Computational methods for efficiently storing and analyzing those data are under development. The amount of storage space required for such datasets is increasing rapidl...

  5. Primary Datasets for Case Studies of River-Water Quality

    ERIC Educational Resources Information Center

    Goulder, Raymond

    2008-01-01

    Level 6 (final-year BSc) students undertook case studies on between-site and temporal variation in river-water quality. They used professionally-collected datasets supplied by the Environment Agency. The exercise gave students the experience of working with large, real-world datasets and led to their understanding how the quality of river water is…

  6. Interface between astrophysical datasets and distributed database management systems (DAVID)

    NASA Technical Reports Server (NTRS)

    Iyengar, S. S.

    1988-01-01

    This is a status report on the progress of the DAVID (Distributed Access View Integrated Database Management System) project being carried out at Louisiana State University, Baton Rouge, Louisiana. The objective is to implement an interface between astrophysical datasets and DAVID. Design details and implementation specifics of the interface between DAVID and astrophysical datasets are discussed.

  7. Finding Spatio-Temporal Patterns in Large Sensor Datasets

    ERIC Educational Resources Information Center

    McGuire, Michael Patrick

    2010-01-01

    Spatial or temporal data mining tasks are performed in the context of the relevant space, defined by a spatial neighborhood, and the relevant time period, defined by a specific time interval. Furthermore, when mining large spatio-temporal datasets, interesting patterns typically emerge where the dataset is most dynamic. This dissertation is…

  8. Querying Patterns in High-Dimensional Heterogeneous Datasets

    ERIC Educational Resources Information Center

    Singh, Vishwakarma

    2012-01-01

    Recent technological advancements have led to the availability of a plethora of heterogeneous datasets, e.g., images tagged with geo-location and descriptive keywords. An object in these datasets is described by a set of high-dimensional feature vectors. For example, a keyword-tagged image is represented by a color-histogram and a…

  9. Mars exploration: follow the water

    NASA Technical Reports Server (NTRS)

    Park, Young Ho

    2004-01-01

    Over the centuries, the red planet Mars has been a subject of imagination as well as intense scientific interest. As the overwhelming success of two Mars Exploration Rovers unfolds before us, this article provides an overview of and rationale for NASA's Mars exploration program.

  10. Harmonized dataset of ozone profiles from satellite limb and occultation measurements

    NASA Astrophysics Data System (ADS)

    Sofieva, V. F.; Rahpoe, N.; Tamminen, J.; Kyrölä, E.; Kalakoski, N.; Weber, M.; Rozanov, A.; von Savigny, C.; Laeng, A.; von Clarmann, T.; Stiller, G.; Lossow, S.; Degenstein, D.; Bourassa, A.; Adams, C.; Roth, C.; Lloyd, N.; Bernath, P.; Hargreaves, R. J.; Urban, J.; Murtagh, D.; Hauchecorne, A.; Dalaudier, F.; van Roozendael, M.; Kalb, N.; Zehner, C.

    2013-12-01

    In this paper, we present a HARMonized dataset of OZone profiles (HARMOZ) based on limb and occultation measurements from Envisat (GOMOS, MIPAS and SCIAMACHY), Odin (OSIRIS, SMR) and SCISAT (ACE-FTS) satellite instruments. These measurements provide high-vertical-resolution ozone profiles covering the altitude range from the upper troposphere up to the mesosphere in the years 2001-2012. HARMOZ has been created in the framework of the European Space Agency Climate Change Initiative project. The harmonized dataset consists of original retrieved ozone profiles from each instrument, which are screened for invalid data by the instrument teams. While the original ozone profiles are presented in different units and on different vertical grids, the harmonized dataset is given on a common pressure grid in netCDF (network common data form)-4 format. The pressure grid corresponds to vertical sampling of ~ 1 km below 20 km and 2-3 km above 20 km. The vertical range of the ozone profiles is specific for each instrument, thus all information contained in the original data is preserved. The provided altitude and temperature profiles allow the representation of ozone profiles in number density or mixing ratio on a pressure or altitude vertical grid. Geolocation, uncertainty estimates and vertical resolution are provided for each profile. For each instrument, optional parameters, which are related to the data quality, are also included. For the convenience of users, tables of biases between each pair of instruments for each month, as well as bias uncertainties, are provided. These tables characterize the data consistency and can be used in various bias and drift analyses, which are needed, for instance, for combining several datasets to obtain a long-term climate dataset. This user-friendly dataset can be interesting and useful for various analyses and applications, such as data merging, data validation, assimilation and scientific research. The dataset is available at
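
    As an illustration of how such a netCDF-4 profile file might be read and converted between number density and mixing ratio, a short sketch follows. The file name and variable names are assumptions made for illustration only; the actual HARMOZ naming conventions should be taken from the product documentation.

      import numpy as np
      from netCDF4 import Dataset

      with Dataset("harmoz_gomos_profiles.nc") as nc:      # hypothetical file name
          pressure = nc.variables["pressure"][:]           # common pressure grid [hPa]
          temperature = nc.variables["temperature"][:]     # [K], per profile and level
          ozone = nc.variables["ozone"][:]                 # assumed number density [cm-3]
          lat = nc.variables["latitude"][:]                # geolocation per profile

      # Ideal-gas conversion from number density to volume mixing ratio:
      # vmr = n_O3 * k_B * T / p  (with consistent units)
      k_B = 1.380649e-23                                   # Boltzmann constant [J K-1]
      vmr = (ozone * 1e6) * k_B * temperature / (pressure * 100.0)
      print(vmr.shape, float(np.nanmax(vmr)))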

  11. Managing Large Datasets for Atmospheric Research

    NASA Technical Reports Server (NTRS)

    Chen, Gao

    2015-01-01

    Since the mid-1980s, airborne and ground measurements have been widely used to provide comprehensive characterization of atmospheric composition and processes. Field campaigns have generated a wealth of in situ data and have grown considerably over the years in terms of both the number of measured parameters and the data volume. This can largely be attributed to the rapid advances in instrument development and computing power. The users of field data may face a number of challenges spanning data access, understanding, and proper use in scientific analysis. This tutorial is designed to provide an introduction to using data sets, with a focus on airborne measurements, for atmospheric research. The first part of the tutorial provides an overview of airborne measurements and data discovery. This will be followed by a discussion on the understanding of airborne data files. An actual data file will be used to illustrate how data are reported, including the use of data flags to indicate missing data and limits of detection. Retrieving information from the file header will be discussed, which is essential to properly interpreting the data. Field measurements are typically reported as a function of sampling time, but different instruments often have different sampling intervals. To create a combined data set, the data merge process (interpolation of all data to a common time base) will be discussed in terms of the algorithm, data merge products available from airborne studies, and their application in research. Statistical treatment of missing data and data flagged for limit of detection will also be covered in this section. These basic data processing techniques are applicable to both airborne and ground-based observational data sets. Finally, the recently developed Toolsets for Airborne Data (TAD) will be introduced. TAD (tad.larc.nasa.gov) is an airborne data portal offering tools to create user-defined merged data products with the capability to provide descriptive
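
    A minimal sketch of the data merge step described above follows: two synthetic instrument series with different sampling intervals are placed on a common 60 s time base, after a -9999 missing-data flag is converted to NaN. The flag value, species names and intervals are illustrative assumptions, not those of any particular campaign or of the TAD tools.

      import numpy as np
      import pandas as pd

      def to_series(seconds, values, missing_flag=-9999.0, t0="2016-01-01"):
          """Turn flagged time/value arrays into a pandas Series with NaN for missing data."""
          clean = np.where(np.isclose(values, missing_flag), np.nan, values)
          return pd.Series(clean, index=pd.to_datetime(seconds, unit="s", origin=t0))

      rng = np.random.default_rng(1)
      co2 = to_series(np.arange(0, 600, 1), rng.normal(400.0, 2.0, 600))   # 1 s instrument
      o3 = to_series(np.arange(0, 600, 30), rng.normal(50.0, 10.0, 20))    # 30 s instrument

      common = pd.date_range("2016-01-01", periods=10, freq="60s")         # 60 s merge time base
      merged = pd.DataFrame({
          "co2_ppm": co2.resample("60s").mean().reindex(common),           # average the fast data
          "o3_ppb": o3.reindex(common, method="nearest",                   # match the slow data
                               tolerance=pd.Timedelta("30s")),
      })
      print(merged)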

  12. WWW: The Scientific Method

    ERIC Educational Resources Information Center

    Blystone, Robert V.; Blodgett, Kevin

    2006-01-01

    The scientific method is the principal methodology by which biological knowledge is gained and disseminated. As fundamental as the scientific method may be, its historical development is poorly understood, its definition is variable, and its deployment is uneven. Scientific progress may occur without the strictures imposed by the formal…

  13. Redefining the "Scientific Method".

    ERIC Educational Resources Information Center

    Spiece, Kelly R.; Colosi, Joseph

    2000-01-01

    Surveys 15 introductory biology textbooks for their presentation of the scientific method. Teaching the scientific method involves more than simplified steps and subjectivity--human politics, cultural influences, and chance are all a part of science. Presents an activity for students to experience the scientific method. (Contains 34 references.)…

  14. A joint dataset of fair-weather atmospheric electricity

    NASA Astrophysics Data System (ADS)

    Tammet, H.

    2009-02-01

    A new open access dataset ATMEL2007A (http://ael.physic.ut.ee/tammet/dd/) takes advantage of the diary-type data structure. The dataset comprises the measurements of atmospheric electric field, positive and negative conductivities, air ion concentrations and accompanying meteorological measurements at 13 stations, including 7 stations of the former World Data Centre network. The dataset incorporates more than half a million diurnal series of hourly averages and it can easily be expanded with additional data. The dataset is designed to be imported into a personal computer, which makes it possible to append private data while keeping it safely protected from public access. Freely available software allows data excerpts to be extracted in the form of traditional data tables or spreadsheets. Examples show how the dataset can be used in research on correlations and trends in atmospheric electricity and air pollution.

  15. New model for datasets citation and extraction reproducibility in VAMDC

    NASA Astrophysics Data System (ADS)

    Zwölf, Carlo Maria; Moreau, Nicolas; Dubernet, Marie-Lise

    2016-09-01

    In this paper we present a new paradigm for the identification of datasets extracted from the Virtual Atomic and Molecular Data Centre (VAMDC) e-science infrastructure. Such identification includes information on the origin and version of the datasets, references associated to individual data in the datasets, as well as timestamps linked to the extraction procedure. This paradigm is described through the modifications of the language used to exchange data within the VAMDC and through the services that will implement those modifications. This new paradigm should enforce traceability of datasets, favor reproducibility of datasets extraction, and facilitate the systematic citation of the authors having originally measured and/or calculated the extracted atomic and molecular data.

  16. Partition dataset according to amino acid type improves the prediction of deleterious non-synonymous SNPs

    SciTech Connect

    Yang, Jing; Li, Yuan-Yuan; Li, Yi-Xue; Ye, Zhi-Qiang

    2012-03-02

    Highlights: ► Proper dataset partition can improve the prediction of deleterious nsSNPs. ► Partition according to the original residue type at the nsSNP site is a good criterion. ► A similar strategy is expected to be promising in other machine learning problems. -- Abstract: Many non-synonymous SNPs (nsSNPs) are associated with diseases, and numerous machine learning methods have been applied to train classifiers for sorting disease-associated nsSNPs from neutral ones. The continuously accumulated nsSNP data allow us to further explore better prediction approaches. In this work, we partitioned the training data into 20 subsets according to either the original or the substituted amino acid type at the nsSNP site. Using a support vector machine (SVM), training classification models on each subset resulted in an overall accuracy of 76.3% or 74.9%, depending on the two partition criteria, while training on the whole dataset obtained an accuracy of only 72.6%. Moreover, when the dataset was instead randomly divided into 20 subsets, the corresponding accuracy was only 73.2%. Our results demonstrate that partitioning the whole training dataset into subsets properly, i.e., according to the residue type at the nsSNP site, significantly improves the performance of the trained classifiers, which should be valuable in developing better tools for predicting the disease association of nsSNPs.
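
    The partition-then-train idea can be sketched in scikit-learn form as below, assuming a feature matrix X, labels y (1 = disease-associated, 0 = neutral) and an array giving the original residue at each nsSNP site; the names, the RBF kernel and the minimum subset size are illustrative assumptions rather than the authors' exact setup.

      import numpy as np
      from sklearn.svm import SVC
      from sklearn.model_selection import cross_val_score

      def partitioned_cv_accuracy(X, y, residue, min_size=20):
          """Train one SVM per original amino acid type; return a size-weighted mean accuracy."""
          accs, sizes = [], []
          for aa in np.unique(residue):
              mask = residue == aa
              if mask.sum() < min_size or len(np.unique(y[mask])) < 2:
                  continue                                 # subset too small to train on
              scores = cross_val_score(SVC(kernel="rbf"), X[mask], y[mask], cv=5)
              accs.append(scores.mean())
              sizes.append(int(mask.sum()))
          return np.average(accs, weights=sizes)

      # Baseline for comparison, one model on the unpartitioned data:
      # cross_val_score(SVC(kernel="rbf"), X, y, cv=5).mean()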

  17. Scientific Assistant Virtual Laboratory (SAVL)

    NASA Astrophysics Data System (ADS)

    Alaghband, Gita; Fardi, Hamid; Gnabasik, David

    2007-03-01

    The Scientific Assistant Virtual Laboratory (SAVL) is a scientific discovery environment, an interactive simulated virtual laboratory, for learning physics and mathematics. The purpose of this computer-assisted intervention is to improve middle and high school student interest, insight and scores in physics and mathematics. SAVL develops scientific and mathematical imagination in a visual, symbolic, and experimental simulation environment. It directly addresses the issues of scientific and technological competency by providing critical thinking training through integrated modules. This on-going research provides a virtual laboratory environment in which the student directs the building of the experiment rather than observing a packaged simulation. SAVL: * Engages the persistent interest of young minds in physics and math by visually linking simulation objects and events with mathematical relations. * Teaches integrated concepts by the hands-on exploration and focused visualization of classic physics experiments within software. * Systematically and uniformly assesses and scores students by their ability to answer their own questions within the context of a Master Question Network. We will demonstrate how the Master Question Network uses polymorphic interfaces and C# lambda expressions to manage simulation objects.

  18. Constructing Scientific Applications from Heterogeneous Resources

    NASA Technical Reports Server (NTRS)

    Schlichting, Richard D.

    1995-01-01

    A new model for high-performance scientific applications in which such applications are implemented as heterogeneous distributed programs or, equivalently, meta-computations, is investigated. The specific focus of this grant was a collaborative effort with researchers at NASA and the University of Toledo to test and improve Schooner, a software interconnection system, and to explore the benefits of increased user interaction with existing scientific applications.

  19. Exploration review

    USGS Publications Warehouse

    Wilburn, D.R.; Vasil, R.L.; Nolting, A.

    2011-01-01

    This summary of international mineral exploration activities for the year 2010 draws upon available information from industry sources, published literature and U.S. Geological Survey (USGS) specialists. The summary provides data on exploration budgets by region and mineral commodity, identifies significant mineral discoveries and areas of mineral exploration, discusses government programs affecting the mineral exploration industry and presents analyses of exploration activities performed by the mineral industry.

  20. Exploration review

    USGS Publications Warehouse

    Wilburn, D.R.; Bourget, M.R.

    2010-01-01

    This summary of international mineral exploration activities for the year 2009 draws upon information from industry sources, published literature and U.S. Geological Survey (USGS) specialists. The summary provides data on industry exploration budgets by region and mineral commodity, identifies significant mineral discoveries and areas of mineral exploration, discusses government programs affecting the mineral exploration industry and presents analyses of exploration activities by the mineral industry based upon these data.

  1. Biomorphic Explorers

    NASA Technical Reports Server (NTRS)

    Thakoor, Sarita

    1999-01-01

    This paper presents, in viewgraph form, the first NASA/JPL workshop on Biomorphic Explorers for future missions. The topics include: 1) Biomorphic Explorers: Classification (Based on Mobility and Ambient Environment); 2) Biomorphic Flight Systems: Vision; 3) Biomorphic Explorer: Conceptual Design; 4) Biomorphic Gliders; 5) Summary and Roadmap; 6) Coordinated/Cooperative Exploration Scenario; and 7) Applications. This paper also presents illustrations of the various biomorphic explorers.

  2. Exploration review

    USGS Publications Warehouse

    Wilburn, D.R.

    2009-01-01

    This summary of international mineral exploration activities for 2008 draws upon available information from industry sources, published literature and U.S. Geological Survey (USGS) specialists. The summary provides data on exploration budgets by region and mineral commodity, identifies significant mineral discoveries and areas of mineral exploration, discusses government programs affecting the mineral exploration industry, and presents analyses of exploration activities by the mineral industry based upon these data.

  3. Web-based visualization of very large scientific astronomy imagery

    NASA Astrophysics Data System (ADS)

    Bertin, E.; Pillay, R.; Marmo, C.

    2015-04-01

    Visualizing and navigating through large astronomy images from a remote location with current astronomy display tools can be a frustrating experience in terms of speed and ergonomics, especially on mobile devices. In this paper, we present a high-performance, versatile and robust client-server system for remote visualization and analysis of extremely large scientific images. Applications of this work include survey image quality control, interactive data query and exploration, citizen science, as well as public outreach. The proposed software is entirely open source and is designed to be generic and applicable to a variety of datasets. It provides access to floating point data at terabyte scales, with the ability to precisely adjust image settings in real-time. The proposed clients are light-weight, platform-independent web applications built on standard HTML5 web technologies and compatible with both touch and mouse-based devices. We assess the performance of the system and show that a single server can comfortably handle more than a hundred simultaneous users accessing full-precision 32-bit astronomy data.

  4. Atmosphere Explorer set for launch

    NASA Technical Reports Server (NTRS)

    1975-01-01

    The Atmosphere Explorer-D (Explorer-54), which will explore in detail an area of the earth's outer atmosphere where important energy transfer, atomic and molecular processes, and chemical reactions occur that are critical to the heat balance of the atmosphere, is described. Data are presented on the mission facts, launch vehicle operations, AE-D/Delta flight events, spacecraft description, scientific instruments, tracking, and data acquisition.

  5. Promoting Science Learning and Scientific Identification through Contemporary Scientific Investigations

    NASA Astrophysics Data System (ADS)

    Van Horne, Katie

    This dissertation investigates the implementation issues and the educational opportunities associated with "taking the practice turn" in science education. This pedagogical shift focuses instructional experiences on engaging students in the epistemic practices of science both to learn the core ideas of the disciplines, as well as to gain an understanding of and personal connection to the scientific enterprise. In Chapter 2, I examine the teacher-researcher co-design collaboration that supported the classroom implementation of a year-long, project-based biology curriculum that was under development. This study explores the dilemmas that arose when teachers implemented a new intervention and how the dilemmas arose and were managed throughout the collaboration of researchers and teachers and between the teachers. In the design-based research of Chapter 3, I demonstrate how students' engagement in epistemic practices in contemporary science investigations supported their conceptual development about genetics. The analysis shows how this involved a complex interaction between the scientific, school and community practices in students' lives and how through varied participation in the practices students come to write about and recognize how contemporary investigations can give them leverage for science-based action outside of the school setting. Finally, Chapter 4 explores the characteristics of learning environments for supporting the development of scientific practice-linked identities. Specific features of the learning environment---access to the intellectual work of the domain, authentic roles and accountability, space to make meaningful contributions in relation to personal interests, and practice-linked identity resources that arose from interactions in the learning setting---supported learners in stabilizing practice-linked science identities through their engagement in contemporary scientific practices. This set of studies shows that providing students with the

  6. An Introduction to Scientific Masculinities.

    PubMed

    Milam, Erika Lorraine; Nye, Robert A

    2015-01-01

    This volume seeks to integrate gender analysis into the global history of science and medicine from the late Middle Ages to the present by focusing on masculinity, the part of the gender equation that has received the least attention from scholars. The premise of the volume is that social constructions of masculinity function simultaneously as foils for femininity and as methods of differentiating between "kinds" of men. In exploring scientific masculinities without taking the dominance of men and masculinity in the sciences for granted, we ask, What is masculinity and how does it operate in science? Our answers remind us that gender is at once an analytical category and a historical object. The essays are divided into three sections that in turn emphasize the importance of gender to the professionalization of scientific, technological, and medical practices, the spaces in which such labor is performed, and the ways that sex, gender, and sexual orientation are measured and serve as metaphors in society and culture.

  7. Virtual Environments in Scientific Visualization

    NASA Technical Reports Server (NTRS)

    Bryson, Steve; Cooper, D. M.

    1994-01-01

    Virtual environment technology is a new way of approaching the interface between computers and humans. Emphasizing display and user control that conforms to the user's natural ways of perceiving and thinking about space, virtual environment technologies enhance the ability to perceive and interact with computer-generated graphic information. This enhancement potentially has a major effect on the field of scientific visualization. Current examples of this technology include the Virtual Windtunnel being developed at NASA Ames Research Center. Other major institutions such as the National Center for Supercomputing Applications and SRI International are also exploring this technology. This talk will describe several implementations of virtual environments for use in scientific visualization. Examples include the visualization of unsteady fluid flows (the virtual windtunnel), the visualization of geodesics in curved spacetime, surface manipulation, and examples developed at various laboratories.

  8. Urban land cover thematic disaggregation, employing datasets from multiple sources and RandomForests modeling

    NASA Astrophysics Data System (ADS)

    Gounaridis, Dimitrios; Koukoulas, Sotirios

    2016-09-01

    Urban land cover mapping has lately attracted a vast amount of attention as it closely relates to a broad scope of scientific and management applications. Recent methodological and technological advancements facilitate the development of datasets with improved accuracy. However, thematic resolution of urban land cover has received much less attention so far, a fact that hampers the utility of the produced datasets. This paper seeks to provide insights towards the improvement of the thematic resolution of urban land cover classification. We integrate existing, readily available datasets of acceptable accuracy from multiple sources with remote sensing techniques. The study site is Greece, and the urban land cover is classified nationwide into five classes using the RandomForests algorithm. The results allowed us to quantify, for the first time with good accuracy, the proportion occupied by each urban land cover class. The total area covered by urban land cover is 2280 km² (1.76% of total terrestrial area), the dominant class is discontinuous dense urban fabric (50.71% of urban land cover) and the least occurring class is discontinuous very low density urban fabric (2.06% of urban land cover).
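
    A rough sketch of the classification step is given below, assuming a table of per-pixel predictors (spectral bands plus ancillary layers) and labels for the five urban land cover classes; the file name, column names and forest size are illustrative assumptions, not the study's actual configuration.

      import pandas as pd
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.metrics import classification_report
      from sklearn.model_selection import train_test_split

      df = pd.read_csv("training_pixels.csv")      # hypothetical per-pixel training table
      features = ["b02", "b03", "b04", "b08", "ndvi", "imperviousness", "pop_density"]
      X_tr, X_te, y_tr, y_te = train_test_split(
          df[features], df["urban_class"], test_size=0.3,
          stratify=df["urban_class"], random_state=0)

      rf = RandomForestClassifier(n_estimators=500, n_jobs=-1, random_state=0)
      rf.fit(X_tr, y_tr)
      print(classification_report(y_te, rf.predict(X_te)))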

  9. [The scientific entertainer in primary health care].

    PubMed

    Ortega-Calvo, Manuel; Santos, José Manuel; Lapetra, José

    2012-09-01

    The scientific method is capable of being applied in primary care. In this article we defend the role of the "scientific entertainer" as strategic and necessary in achieving this goal. The task has to include playful and light-hearted content. We explore some words in English that may help us to understand the concept of "scientific entertainer" from a semantic point of view (showman, master of ceremonies, entrepreneur, go-between), as well as in the Spanish language (counsellor, mediator, methodologist) and finally in Latin and Greek (tripalium, negotium, chronos, kairos). We define the clinical, manager or research health-worker who is skilled in primary care as a "primarylogist". PMID:22018794

  11. Identification of rogue datasets in serial crystallography

    PubMed Central

    Assmann, Greta; Brehm, Wolfgang; Diederichs, Kay

    2016-01-01

    Advances in beamline optics, detectors and X-ray sources allow new techniques of crystallographic data collection. In serial crystallography, a large number of partial datasets from crystals of small volume are measured. Merging of datasets from different crystals in order to enhance data completeness and accuracy is only valid if the crystals are isomorphous, i.e. sufficiently similar in cell parameters, unit-cell contents and molecular structure. Identification and exclusion of non-isomorphous datasets is therefore indispensable and must be done by means of suitable indicators. To identify rogue datasets, the influence of each dataset on CC1/2 [Karplus & Diederichs (2012). Science, 336, 1030–1033], the correlation coefficient between pairs of intensities averaged in two randomly assigned subsets of observations, is evaluated. The presented method employs a precise calculation of CC1/2 that avoids the random assignment, and instead of using an overall CC1/2, an average over resolution shells is employed to obtain sensible results. The selection procedure was verified by measuring the correlation of observed (merged) intensities and intensities calculated from a model. It is found that inclusion and merging of non-isomorphous datasets may bias the refined model towards those datasets, and measures to reduce this effect are suggested. PMID:27275144
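
    As a rough illustration of the quantities involved, the sketch below computes the textbook CC1/2 (random half-sets of the observations of each reflection) and performs a leave-one-dataset-out scan; the paper's actual method replaces the random split by an analytic calculation and averages CC1/2 over resolution shells, which is not reproduced here, and the data structures are illustrative.

      import numpy as np

      def cc_half(obs_by_reflection, rng):
          """Textbook CC1/2: correlate the averages of two randomly assigned half-sets."""
          h1, h2 = [], []
          for obs in obs_by_reflection:
              obs = np.asarray(obs, dtype=float)
              if obs.size < 2:
                  continue                               # need at least one observation per half
              perm = rng.permutation(obs.size)
              h1.append(obs[perm[: obs.size // 2]].mean())
              h2.append(obs[perm[obs.size // 2:]].mean())
          return np.corrcoef(h1, h2)[0, 1]

      def rogue_scan(datasets, seed=0):
          """datasets: list of dicts mapping a reflection index (hkl) to its intensities."""
          rng = np.random.default_rng(seed)
          def pooled_cc(exclude=None):
              merged = {}
              for k, ds in enumerate(datasets):
                  if k == exclude:
                      continue
                  for hkl, obs in ds.items():
                      merged.setdefault(hkl, []).extend(obs)
              return cc_half(merged.values(), rng)
          reference = pooled_cc()
          # Datasets whose removal raises CC1/2 noticeably are rogue candidates.
          return {k: pooled_cc(exclude=k) - reference for k in range(len(datasets))}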

  12. Solar system exploration

    NASA Technical Reports Server (NTRS)

    Chapman, Clark R.; Ramlose, Terri (Editor)

    1989-01-01

    The goal of planetary exploration is to understand the nature and development of the planets, as illustrated by pictures from the first two decades of spacecraft missions and by the imaginations of space artists. Planets, comets, asteroids, and moons are studied to discover the reasons for their similarities and differences and to find clues that contain information about the primordial process of planet origins. The scientific goals established by the National Academy of Sciences as the foundation of NASA's Solar System Exploration Program are covered: to determine the nature of the planetary system, to understand its origin and evolution, the development of life on Earth, and the principles that shape present day Earth.

  13. An Asteroseismology Explorer

    NASA Astrophysics Data System (ADS)

    Brown, T. M.; Cox, A. N.

    In response to a NASA opportunity, a proposal has been made to study the concept of an Asteroseismology Explorer (ASE). The goal of the ASE would be to measure solar-like oscillations on many (perhaps hundreds) of stars during a 1-year mission, including many members of open clusters. The authors describe this proposal's observational goals, a strawman technical approach, and likely scientific rewards.

  14. Map_plot and bgg_plot: software for integration of geoscience datasets

    NASA Astrophysics Data System (ADS)

    Gaillot, Philippe; Punongbayan, Jane T.; Rea, Brice

    2004-02-01

    Since 1985, the Ocean Drilling Program (ODP) has been supporting multidisciplinary research in exploring the structure and history of Earth beneath the oceans. After more than 200 Legs, complementary datasets covering different geological environments, periods and space scales have been obtained and distributed world-wide using the ODP-Janus and Lamont Doherty Earth Observatory-Borehole Research Group (LDEO-BRG) database servers. In Earth Sciences, more than in any other science, the ensemble of these data is characterized by heterogeneous formats and graphical representation modes. In order to fully and quickly assess this information, a set of Unix/Linux and Generic Mapping Tool-based C programs has been designed to convert and integrate datasets acquired during the present ODP and the future Integrated ODP (IODP) Legs. Using ODP Leg 199 datasets, we show examples of the capabilities of the proposed programs. The program map_plot is used to easily display datasets onto 2-D maps. The program bgg_plot (borehole geology and geophysics plot) displays data with respect to depth and/or time. The latter program includes depth shifting, filtering and plotting of core summary information, continuous and discrete-sample core measurements (e.g. physical properties, geochemistry, etc.), in situ continuous logs, magneto- and bio-stratigraphies, specific sedimentological analyses (lithology, grain size, texture, porosity, etc.), as well as core and borehole wall images. Outputs from both programs are initially produced in PostScript format, which can be easily converted to Portable Document Format (PDF) or standard image formats (GIF, JPEG, etc.) using widely distributed conversion programs. Based on command line operations and customization of parameter files, these programs can be included in other shell- or database-scripts, automating plotting procedures for data requests. As open source software, these programs can be customized and interfaced to fulfill any specific

  15. Multi-facetted Metadata - Describing datasets with different metadata schemas at the same time

    NASA Astrophysics Data System (ADS)

    Ulbricht, Damian; Klump, Jens; Bertelmann, Roland

    2013-04-01

    Inspired by the wish to re-use research data, a lot of work has been done to bring data systems of the earth sciences together. Discovery metadata is disseminated to data portals to allow the building of customized indexes of catalogued dataset items. Data that were once acquired in the context of a scientific project are open for reappraisal and can now be used by scientists who were not part of the original research team. To make data re-use easier, measurement methods and measurement parameters must be documented in an application metadata schema and described in a written publication. Linking datasets to publications - as DataCite [1] does - again requires a specific metadata schema, and every new use context of the measured data may require yet another metadata schema sharing only a subset of information with the metadata already present. To cope with the problem of metadata schema diversity in our common data repository at GFZ Potsdam, we established a solution to store file-based research data and describe it with an arbitrary number of metadata schemas. The core component of the data repository is an eSciDoc infrastructure that provides versioned container objects, called eSciDoc [2] "items". The eSciDoc content model allows assigning files to "items" and adding any number of metadata records to these "items". The eSciDoc items can be submitted, revised, and finally published, which makes the data and metadata available through the internet worldwide. GFZ Potsdam uses eSciDoc to support its scientific publishing workflow, including mechanisms for data review in peer review processes by providing temporary web links for external reviewers who do not have credentials to access the data. Based on the eSciDoc API, panMetaDocs [3] provides a web portal for data management in research projects. PanMetaDocs, which is based on panMetaWorks [4], is a PHP-based web application that allows data to be described with any XML-based schema. It uses the eSciDoc infrastructures

  16. Analysis and compression of six-dimensional gyrokinetic datasets using higher order singular value decomposition

    SciTech Connect

    Hatch, David R.; Del-Castillo-Negrete, Diego B; Terry, P.W.

    2012-01-01

    Higher order singular value decomposition (HOSVD) is explored as a tool for analyzing and compressing gyrokinetic data. An efficient numerical implementation of an HOSVD algorithm is described. HOSVD is used to analyze the full six-dimensional (three spatial, two velocity space, and time dimensions) gyrocenter distribution function from gyrokinetic simulations of ion temperature gradient, electron temperature gradient, and trapped electron mode driven turbulence. The HOSVD eigenvalues for the velocity space coordinates decay very rapidly, indicating that only a few structures in velocity space can capture the most important dynamics. In almost all of the cases studied, HOSVD extracts parallel velocity space structures which are very similar to orthogonal polynomials. HOSVD is also used to compress gyrokinetic datasets, an application in which it is shown to significantly outperform the more commonly used singular value decomposition. It is shown that the effectiveness of the HOSVD compression improves as the dimensionality of the dataset increases. (C) 2012 Elsevier Inc. All rights reserved.
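
    The core HOSVD construction (a truncated SVD of each mode unfolding, followed by projection of the core tensor) can be sketched compactly as below on a small synthetic 6-D array; the sizes and ranks are arbitrary, and a random array is of course far less compressible than real gyrokinetic data, so this only demonstrates the mechanics rather than the reported performance.

      import numpy as np

      def unfold(t, mode):
          """Mode-n unfolding: move the chosen axis first and flatten the rest."""
          return np.moveaxis(t, mode, 0).reshape(t.shape[mode], -1)

      def hosvd(t, ranks):
          """Truncated HOSVD: one orthonormal factor per mode plus a small core tensor."""
          factors = [np.linalg.svd(unfold(t, m), full_matrices=False)[0][:, :r]
                     for m, r in enumerate(ranks)]
          core = t
          for u in factors:              # contracting axis 0 each time cycles through the modes
              core = np.tensordot(core, u, axes=(0, 0))
          return core, factors

      def reconstruct(core, factors):
          out = core
          for u in factors:
              out = np.tensordot(out, u.T, axes=(0, 0))
          return out

      x = np.random.default_rng(0).normal(size=(12, 12, 12, 8, 8, 16))   # toy 6-D array
      core, factors = hosvd(x, ranks=(6, 6, 6, 4, 4, 8))
      x_hat = reconstruct(core, factors)
      print("compression ratio:", x.size / (core.size + sum(u.size for u in factors)))
      print("relative error:", np.linalg.norm(x - x_hat) / np.linalg.norm(x))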

  17. Digital Rocks Portal: a sustainable platform for imaged dataset sharing, translation and automated analysis

    NASA Astrophysics Data System (ADS)

    Prodanovic, M.; Esteva, M.; Hanlon, M.; Nanda, G.; Agarwal, P.

    2015-12-01

    Recent advances in imaging have provided a wealth of 3D datasets that reveal pore space microstructure (nm to cm length scale) and allow investigation of nonlinear flow and mechanical phenomena from first principles using numerical approaches. This framework has popularly been called "digital rock physics". Researchers, however, have trouble storing and sharing the datasets both due to their size and the lack of standardized image types and associated metadata for volumetric datasets. This impedes scientific cross-validation of the numerical approaches that characterize large scale porous media properties, as well as development of multiscale approaches required for correct upscaling. A single research group typically specializes in an imaging modality and/or related modeling on a single length scale, and lack of data-sharing infrastructure makes it difficult to integrate different length scales. We developed a sustainable, open and easy-to-use repository called the Digital Rocks Portal that (1) organizes images and related experimental measurements of different porous materials, (2) improves access to them for a wider community of geoscience or engineering researchers not necessarily trained in computer science or data analysis. Once widely accepted, the repository will jumpstart productivity and enable scientific inquiry and engineering decisions founded on a data-driven basis. This is the first repository of its kind. We show initial results on incorporating essential software tools and pipelines that make it easier for researchers to store and reuse data, and for educators to quickly visualize and illustrate concepts to a wide audience. For data sustainability and continuous access, the portal is implemented within the reliable, 24/7 maintained High Performance Computing Infrastructure supported by the Texas Advanced Computing Center (TACC) at the University of Texas at Austin. Long-term storage is provided through the University of Texas System Research

  18. Scalable and portable visualization of large atomistic datasets

    NASA Astrophysics Data System (ADS)

    Sharma, Ashish; Kalia, Rajiv K.; Nakano, Aiichiro; Vashishta, Priya

    2004-10-01

    A scalable and portable code named Atomsviewer has been developed to interactively visualize a large atomistic dataset consisting of up to a billion atoms. The code uses a hierarchical view frustum-culling algorithm based on the octree data structure to efficiently remove atoms outside of the user's field-of-view. Probabilistic and depth-based occlusion-culling algorithms then select atoms that have a high probability of being visible. Finally a multiresolution algorithm is used to render the selected subset of visible atoms at varying levels of detail. Atomsviewer is written in C++ and OpenGL, and it has been tested on a number of architectures including Windows, Macintosh, and SGI. Atomsviewer has been used to visualize tens of millions of atoms on a standard desktop computer and, in its parallel version, up to a billion atoms.
    Program summary:
    Title of program: Atomsviewer
    Catalogue identifier: ADUM
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADUM
    Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland
    Computer for which the program is designed and others on which it has been tested: 2.4 GHz Pentium 4/Xeon processor, professional graphics card; Apple G4 (867 MHz)/G5, professional graphics card
    Operating systems under which the program has been tested: Windows 2000/XP, Mac OS 10.2/10.3, SGI IRIX 6.5
    Programming languages used: C++, C and OpenGL
    Memory required to execute with typical data: 1 gigabyte of RAM
    High speed storage required: 60 gigabytes
    No. of lines in the distributed program including test data, etc.: 550 241
    No. of bytes in the distributed program including test data, etc.: 6 258 245
    Number of bits in a word: Arbitrary
    Number of processors used: 1
    Has the code been vectorized or parallelized: No
    Distribution format: tar gzip file
    Nature of physical problem: Scientific visualization of atomic systems
    Method of solution: Rendering of atoms using computer graphic techniques, culling algorithms for data

  19. Scientific Report (2002-2004)

    SciTech Connect

    Bedros Afeyan

    2004-05-11

    OAK-B135 An overview of our work as well as two recent publications are contained in this scientific report. The work reported here revolves around the discovery of new coherent nonlinear kinetic waves in laser-produced plasmas, which we call KEEN waves (kinetic, electrostatic electron nonlinear waves), and optical mixing experiments on the OMEGA laser system at LLE with blue-green light for the exploration of ways to suppress parametric instabilities in long-scale-length, long-pulse-width laser plasmas such as those which will be found on NIF or LMJ.

  20. Background qualitative analysis of the European reference life cycle database (ELCD) energy datasets - part II: electricity datasets.

    PubMed

    Garraín, Daniel; Fazio, Simone; de la Rúa, Cristina; Recchioni, Marco; Lechón, Yolanda; Mathieux, Fabrice

    2015-01-01

    The aim of this paper is to identify areas of potential improvement of the European Reference Life Cycle Database (ELCD) electricity datasets. The revision is based on the data quality indicators described by the International Life Cycle Data system (ILCD) Handbook, applied on a sectorial basis. These indicators evaluate the technological, geographical and time-related representativeness of the dataset and the appropriateness in terms of completeness, precision and methodology. Results show that the ELCD electricity datasets have very good quality in general terms; nevertheless, some findings and recommendations for improving the quality of the Life-Cycle Inventories have been derived. Moreover, these results attest to the quality of the electricity-related datasets for any LCA practitioner, and provide insights into the limitations and assumptions underlying the dataset modelling. Given this information, the LCA practitioner will be able to decide whether the use of the ELCD electricity datasets is appropriate based on the goal and scope of the analysis to be conducted. The methodological approach would also be useful for dataset developers and reviewers, in order to improve the overall Data Quality Requirements of databases.

  1. Scientific integrity memorandum

    NASA Astrophysics Data System (ADS)

    Showstack, Randy

    2009-03-01

    U.S. President Barack Obama signed a presidential memorandum on 9 March to help restore scientific integrity in government decision making. The memorandum directs the White House Office of Science and Technology Policy to develop a strategy within 120 days that ensures that "the selection of scientists and technology professionals for science and technology positions in the executive branch is based on those individuals' scientific and technological knowledge, credentials, and experience; agencies make available to the public the scientific or technological findings or conclusions considered or relied upon in policy decisions; agencies use scientific and technological information that has been subject to well-established scientific processes such as peer review; and agencies have appropriate rules and procedures to ensure the integrity of the scientific process within the agency, including whistleblower protection."

  2. Precipitation Datasets for the GPM Iowa Flood Studies (IFloodS) Field Experiment

    NASA Astrophysics Data System (ADS)

    Krajewski, Witold; Seo, Bong Chul; Goska, Radoslaw; Demir, Ibrahim; Elsaadani, Mohamed

    2013-04-01

    In the spring of 2013 NASA will launch a field experiment called Iowa Flood Studies (IFloodS) as a part of the Ground Validation (GV) program for the Global Precipitation Measurement (GPM) mission. The purpose of IFloodS is to enhance the understanding of flood-related precipitation processes in events worldwide. While a number of scientific instruments such as ground-based radars, rain gauges, and disdrometers will be deployed to monitor upcoming rainfall events in Iowa, various precipitation datasets from weather radars, satellites, and rain gauges have been collected over the past several years (up to eleven years) and processed to support validation and flood-related rainfall-runoff modeling studies. These historical datasets include TMPA (TRMM Multi-satellite Precipitation Analysis), PERSIANN (Precipitation Estimation from Remotely Sensed Information using Artificial Neural Network), and CMORPH (Climate Prediction Center morphing method) products for satellite estimates; Stage IV, Hydro-NEXRAD, NMQ (National Mosaic and Multi-Sensor QPE), and IFC (Iowa Flood Center) products for radar estimates; ASOS (Automated Surface Observing System), NWS COOP (Cooperative Observer Program), and the IFC research network for rain gauge data. These datasets all have different temporal and spatial resolutions as well as different uncertainty characteristics, which benefits both product validation using multi-scale data and hydrologic modeling, where different models require rainfall inputs at different scales. The datasets are organized in a database and available via a web browser-based interface, allowing users to specify the time and space domain of interest. The database connects users' requests with data storage and information and also assists them in finding significant rainfall events with ease and speed by showing basic rainfall statistics for the user-defined domain.

  3. Analysis of Precipitation Pattern from two Long-Term Hourly Dataset in Central Italy

    NASA Astrophysics Data System (ADS)

    Gozzini, B.; Bartolini, G.; Torrigiani, T.; Grifoni, D.

    2010-09-01

    In the last decades, many studies worldwide have revealed modifications in the precipitation regime and a general increase in the frequency and/or severity of extreme rainfall events that can have large impacts on society and the environment. In central-northern Europe, much evidence has shown a tendency towards an increase in mean annual rainfall and in the occurrence of extreme daily precipitation events during the last five decades. Concerning the Mediterranean area, despite a decrease in mean annual rainfall, an increase in the fraction of heavy events was also highlighted, in particular over Italy. In the last decades much effort has been expended by different research groups to provide regional to global datasets. Recently, a European high-quality daily rainfall dataset (ECA dataset), from a dense network of meteorological stations, has been made available to the public and the scientific community. Many studies have used this dataset to document the long-term trends of climatic and extreme climatic indices. However, very few studies have investigated precipitation patterns using long-term hourly datasets, because of the difficulty of finding data with sufficiently long measuring periods, few missing or unreliable values and, especially, in digital form. Investigating extreme precipitation events at an hourly rather than daily time scale is much more important in terms of social, agricultural and hydrological impacts. Long-term hourly rainfall datasets were derived from two stations located in Tuscany, central Italy: Viareggio (1948-2009) and Vallombrosa (1930-2009), on a coastal and a mountainous area respectively. The data were digitized and then used to investigate trends in the annual and seasonal precipitation regime and in extreme event occurrences. A detailed analysis of consistency and quality control has been performed: time series were subjected to logical filters, checked for internal consistency and for spatial coherency among suitable neighbouring stations. In order to analyze patterns of

  4. Participatory Exploration

    NASA Video Gallery

    Kathy Nado delivers a presentation on Participatory Exploration on May 25, 2010, at the NASA Exploration Enterprise Workshop held in Galveston, TX. The purpose of this workshop was to present NASA'...

  5. Exploration Review

    USGS Publications Warehouse

    Wilburn, D.R.; Stanley, K.A.

    2013-01-01

    This summary of international mineral exploration activities for 2012 draws upon information from industry sources, published literature and U.S. Geological Survey (USGS) specialists. The summary provides data on exploration budgets by region and mineral commodity, identifies significant mineral discoveries and areas of mineral exploration, discusses government programs affecting the mineral exploration industry and presents analyses of exploration activities performed by the mineral industry. Three sources of information are reported and analyzed in this annual review of international exploration for 2012: 1) budgetary statistics expressed in U.S. nominal dollars provided by SNL Metals Economics Group (MEG) of Halifax, Nova Scotia; 2) regional and site-specific exploration activities that took place in 2012 as compiled by the USGS and 3) regional events including economic, social and political conditions that affected exploration activities, which were derived from published sources and unpublished discussions with USGS and industry specialists.

  6. Exploration Geophysics

    ERIC Educational Resources Information Center

    Savit, Carl H.

    1978-01-01

    Expansion of activity and confirmation of new technological directions characterized several fields of exploration geophysics in 1977. Advances in seismic-reflection exploration have been especially important. (Author/MA)

  7. Artificial intelligence support for scientific model-building

    NASA Technical Reports Server (NTRS)

    Keller, Richard M.

    1992-01-01

    Scientific model-building can be a time-intensive and painstaking process, often involving the development of large and complex computer programs. Despite the effort involved, scientific models cannot easily be distributed and shared with other scientists. In general, implemented scientific models are complex, idiosyncratic, and difficult for anyone but the original scientific development team to understand. We believe that artificial intelligence techniques can facilitate both the model-building and model-sharing process. In this paper, we overview our effort to build a scientific modeling software tool that aids the scientist in developing and using models. This tool includes an interactive intelligent graphical interface, a high-level domain specific modeling language, a library of physics equations and experimental datasets, and a suite of data display facilities.

  8. Hadley cell dynamics in Japanese Reanalysis-55 dataset: evaluation using other reanalysis datasets and global radiosonde network observations

    NASA Astrophysics Data System (ADS)

    Mathew, Sneha Susan; Kumar, Karanam Kishore; Subrahmanyam, Kandula Venkata

    2016-02-01

    The Hadley circulation (HC) is a planetary-scale circulation spanning one-third of the globe, from the tropics to the sub-tropics. Recent changes in HC width and its temporal variability are a topic of paramount interest because of the climate implications they carry. The present study attempts to bring out the subtropical climate change indications in the comparatively new Japanese Re-analysis (JRA55) dataset by means of the mean meridional stream function (MSF). The observed features of the HC in JRA55 are found to be reproduced in the NCEP, MERRA and ECMWF datasets, with notable differences in the magnitudes of the MSF. The calculated annual cycle of HC edges, center and total width from this dataset closely resembles the annual cycle of the respective parameters derived from the rest of the datasets, with very little inter-annual variability. For the first time, the MSF estimated using four reanalysis datasets (JRA55, NCEP, MERRA and ECMWF) is verified against observations from Integrated Global Radiosonde Archive datasets, using a subsampling procedure. The features so estimated show a high degree of similarity amongst each other as well as with observations. The monthly trend in the total width of the HC is quantified to show a maximum of expansion during the month of July, which is significant at the 95% confidence level for all datasets. The present paper also discusses the presence of a `minor circulation' feature in the northern hemisphere which is centered on 34°N during the June and July months, but not in all years. The significance of the present study lies in evaluating the relatively new JRA55 dataset against widely used reanalysis datasets and radiosonde observations, and in revealing a minor circulation not discussed hitherto in the context of HC dynamics.
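
    For reference, the mean meridional mass stream function used in such analyses is conventionally defined (in LaTeX notation) as

      \Psi(\varphi, p) = \frac{2\pi a \cos\varphi}{g} \int_{0}^{p} [\bar{v}](\varphi, p')\,\mathrm{d}p'

    where a is the Earth's radius, g the gravitational acceleration and [\bar{v}] the zonal- and time-mean meridional wind. The abstract does not restate this definition, and locating the HC edges where \Psi at 500 hPa changes sign is a common convention rather than necessarily the authors' exact criterion.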

  9. A New Combinatorial Optimization Approach for Integrated Feature Selection Using Different Datasets: A Prostate Cancer Transcriptomic Study

    PubMed Central

    Puthiyedth, Nisha; Riveros, Carlos; Berretta, Regina; Moscato, Pablo

    2015-01-01

    Background: The joint study of multiple datasets has become a common technique for increasing statistical power in detecting biomarkers obtained from smaller studies. The approach generally followed is based on the fact that as the total number of samples increases, we expect to have greater power to detect associations of interest. This methodology has been applied to genome-wide association and transcriptomic studies due to the availability of datasets in the public domain. While this approach is well established in biostatistics, the introduction of new combinatorial optimization models to address this issue has not been explored in depth. In this study, we introduce a new model for the integration of multiple datasets and we show its application in transcriptomics. Methods: We propose a new combinatorial optimization problem that addresses the core issue of biomarker detection in integrated datasets. Optimal solutions for this model deliver a feature selection from a panel of prospective biomarkers. The model we propose is a generalised version of the (α,β)-k-Feature Set problem. We illustrate the performance of this new methodology via a challenging meta-analysis task involving six prostate cancer microarray datasets. The results are then compared to the popular RankProd meta-analysis tool and to what can be obtained by analysing the individual datasets by statistical and combinatorial methods alone. Results: Application of the integrated method resulted in a more informative signature than the rank-based meta-analysis or individual dataset results, and overcomes problems arising from real world datasets. The set of genes identified is highly significant in the context of prostate cancer. The method used does not rely on homogenisation or transformation of values to a common scale, and at the same time is able to capture markers associated with subgroups of the disease. PMID:26106884

  10. Exploring Scale Effect Using Geographically Weighted Regression on Mass Dataset of Urban Robbery

    NASA Astrophysics Data System (ADS)

    Yavuz, Ö.; Tecim, V.

    2013-05-01

    Urban geographers have studied the factors influencing crime in cases limited to their particular study areas. Researchers commonly accept that explanatory variables that model crime in one case may be irrelevant in another. However, none of these researchers tested the significance of those variables as the scale of the study area changes, because their data did not allow analysis at different scales. This research examines the scale effect using data drawn from a wide range of sources. The Geographically Weighted Regression (GWR) method is used to explain that effect, after the data are organized using Geographical Information System (GIS) technologies. The explanatory variables deduced at district scale differ from those at grid scale. Hence, the explanatory variables may change not only across different geographical areas but also across different scales of the same area.
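
    Not the authors' implementation, but a minimal sketch of the geographically weighted regression idea: a separate weighted least-squares fit at every location, with weights that decay with distance. The Gaussian kernel, the fixed bandwidth and all variable names are illustrative assumptions; a dedicated package such as mgwr could be used instead.

```python
import numpy as np

def gwr_coefficients(coords, X, y, bandwidth):
    """Local weighted least squares at each observation location (Gaussian kernel)."""
    n, p = X.shape
    betas = np.empty((n, p))
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)   # distances to location i
        w = np.exp(-0.5 * (d / bandwidth) ** 2)          # distance-decay weights
        W = np.diag(w)
        betas[i] = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)  # beta_i = (X'WX)^-1 X'Wy
    return betas

# Toy data: robbery counts explained by two invented covariates at 200 locations.
rng = np.random.default_rng(0)
coords = rng.uniform(0, 10, size=(200, 2))
X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(scale=0.1, size=200)

local_betas = gwr_coefficients(coords, X, y, bandwidth=2.0)
print(local_betas.mean(axis=0))      # coefficients are allowed to vary over space
```

    Re-running such a fit on district-level versus grid-level aggregations of the same data is what exposes the scale effect the abstract describes.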

  11. Exploring the new long-term (150 years) precipitation dataset in Azores archipelago

    NASA Astrophysics Data System (ADS)

    Hernández, Armand; Trigo, Ricardo M.; Kutiel, Haim; Valente, Maria A.; Sigró, Javier

    2015-04-01

    Within the scope of the two major international 20th-century long-term reanalysis projects coordinated by NOAA (Compo et al. 2011) and ECMWF (Hersbach et al. 2013), the IDL Institute of the University of Lisbon has digitized a large number of long-term station records from Portugal and the former Portuguese colonies (Stickler et al. 2014). Recently we finished the digitization of all precipitation values from Ponta Delgada (capital of the Azores archipelago), obtaining an uninterrupted monthly precipitation time series since 1864 and an almost complete corresponding daily precipitation series, with the exception of some years (1864/1872; 1878/1879; 1888/1905; 1931; 1936 and 1938) for which only monthly values are available. Here, we present a study of the rainfall regime in Ponta Delgada over the last 150 years at annual, seasonal and daily resolution, together with the influence of the North Atlantic Oscillation (NAO) on this precipitation regime. The distribution of precipitation shows an evident seasonal pattern, with a strong difference between the 'rainy season' (November/March) and the 'dry season' (June/August), which receives very little rainfall. April/May and September/October correspond to the transitional seasons. The mean annual rainfall in Ponta Delgada is approximately 910 mm, accumulated (on average) over about 120 rainy days. The precipitation regime in the Azores archipelago shows large inter-annual and intra-annual variability, and both have increased considerably in recent decades. The entire study period (1865-2012) shows an increase in rainfall between a drier earlier period (1865-1938) and a wetter recent period (1939-2012). At daily resolution, we used an approach based on different characteristics of rain spells (consecutive days with rainfall accumulation) that has proved satisfactory for analysing the different parameters of the rainfall regime (Kutiel and Trigo, 2014). This approach shows that the increase in precipitation is mainly due to more intense events, reflected in higher rain spell yields (amount of precipitation) and rain spell intensities (amount of precipitation per day) in recent decades. On the other hand, although one of the most widely used NAO definitions includes sea level pressure from the Ponta Delgada station, its long-term impact on the climate of the Azores archipelago is not yet well established. Here, we assessed the NAO influence on the precipitation regime using Spearman's rank correlation coefficients. Results show that the inter-annual variability of precipitation is largely modulated by the NAO mode. Correlation values of r=-0.90, r=-0.79 and r=-0.63 were obtained for years with positive (>1) or negative (
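
    A short, hedged illustration of the correlation step: Spearman's rank correlation between a seasonal NAO index and precipitation totals, here with synthetic placeholder series rather than the digitized Ponta Delgada record.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(1)
# Synthetic winter NAO index and winter precipitation totals (mm); the real series
# would come from the 150-year Ponta Delgada dataset described above.
nao = rng.normal(size=60)
precip = 300 - 80 * nao + rng.normal(scale=40, size=60)   # wetter winters under negative NAO

rho, pval = spearmanr(nao, precip)
print(f"Spearman rho = {rho:.2f}, p-value = {pval:.3g}")  # expect a strong negative rho
```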

  12. Exploring Gigabyte Datasets in Real Time: Architectures, Interfaces and Time-Critical Design

    NASA Technical Reports Server (NTRS)

    Bryson, Steve; Gerald-Yamasaki, Michael (Technical Monitor)

    1998-01-01

    Architectures and Interfaces: The implications of real-time interaction on software architecture design: decoupling of interaction/graphics and computation into asynchronous processes. The performance requirements of graphics and computation for interaction. Time management in such an architecture. Examples of how visualization algorithms must be modified for high performance. A brief survey of interaction techniques and design, including direct manipulation and manipulation via widgets. The talk also discusses how human factors considerations drove the design and implementation of the virtual wind tunnel. Time-Critical Design: A survey of time-critical techniques for both computation and rendering. Emphasis on the assignment of a time budget to both the overall visualization environment and to each individual visualization technique in the environment. The estimation of the benefit and cost of an individual technique. Examples of the modification of visualization algorithms to allow time-critical control.
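
    A minimal sketch of the decoupling idea listed first in the abstract: computation runs asynchronously (here in a background thread) feeding a queue, while the interaction/graphics loop consumes the latest result within a fixed per-frame time budget. Rates, queue size and names are illustrative assumptions.

```python
import queue
import threading
import time

results = queue.Queue(maxsize=4)          # latest computed visualization data

def compute_worker():
    """Asynchronous computation: never blocks the interaction/graphics loop."""
    step = 0
    while True:
        time.sleep(0.05)                  # stand-in for an expensive computation
        if results.full():
            results.get_nowait()          # drop stale results to keep latency low
        results.put(step)
        step += 1

threading.Thread(target=compute_worker, daemon=True).start()

FRAME_BUDGET = 1.0 / 30                   # time budget per frame (30 Hz interaction)
latest = None
for frame in range(10):
    start = time.perf_counter()
    try:
        latest = results.get_nowait()     # take new data only if it is ready
    except queue.Empty:
        pass                              # otherwise keep showing the previous data
    # ... render `latest` and handle user input here ...
    remaining = FRAME_BUDGET - (time.perf_counter() - start)
    if remaining > 0:
        time.sleep(remaining)             # stay within the frame's time budget
    print(f"frame {frame}: data step {latest}")
```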

  13. Characteristics of "Controversial" Children: An Exploration of Teacher and Parent Social Behavior Rating Scale Datasets

    ERIC Educational Resources Information Center

    Hill, Diane K.; Merrell, Kenneth W.

    2004-01-01

    The term "controversial" has been used in the professional literature to describe children and adolescents who have the seemingly paradoxical quality of being both socially skilled and antisocial. Although there have been some widely influential studies regarding controversial children and youth, there has been relatively little research in this…

  14. Characterizing scientific production and consumption in Physics

    PubMed Central

    Zhang, Qian; Perra, Nicola; Gonçalves, Bruno; Ciulla, Fabio; Vespignani, Alessandro

    2013-01-01

    We analyze the entire publication database of the American Physical Society generating longitudinal (50 years) citation networks geolocalized at the level of single urban areas. We define the knowledge diffusion proxy, and scientific production ranking algorithms to capture the spatio-temporal dynamics of Physics knowledge worldwide. By using the knowledge diffusion proxy we identify the key cities in the production and consumption of knowledge in Physics as a function of time. The results from the scientific production ranking algorithm allow us to characterize the top cities for scholarly research in Physics. Although we focus on a single dataset concerning a specific field, the methodology presented here opens the path to comparative studies of the dynamics of knowledge across disciplines and research areas. PMID:23571320

  15. Scientific Journalism in Armenia

    NASA Astrophysics Data System (ADS)

    Farmanyan, S. V.; Mickaelian, A. M.

    2015-07-01

    In the present study, the problems of scientific journalism and activities of Armenian science journalists are presented. Scientific journalism in the world, forms of its activities, Armenian Astronomical Society (ArAS) press-releases and their subjects, ArAS website "Mass Media News" section, annual and monthly calendars of astronomical events, and "Astghagitak" online journal are described. Most interesting astronomical subjects involved in scientific journalism, reasons for non-satisfactory science outreach and possible solutions are discussed.

  16. Exploration review

    USGS Publications Warehouse

    Wilburn, D.R.; Rapstine, T.D.; Lee, E.C.

    2012-01-01

    This summary of international mineral exploration activities for the year 2011 draws upon available information from industry sources, published literature and U.S. Geological Survey (USGS) specialists. This summary provides data on exploration budgets by region and mineral commodity, identifies significant mineral discoveries and areas of mineral exploration, discusses government programs affecting the mineral exploration industry and presents surveys returned by companies primarily focused on precious (gold, platinum-group metals and silver) and base (copper, lead, nickel and zinc) metals.

  17. A polymer dataset for accelerated property prediction and design.

    PubMed

    Huan, Tran Doan; Mannodi-Kanakkithodi, Arun; Kim, Chiho; Sharma, Vinit; Pilania, Ghanshyam; Ramprasad, Rampi

    2016-01-01

    Emerging computation- and data-driven approaches are particularly useful for rationally designing materials with targeted properties. Generally, these approaches rely on identifying structure-property relationships by learning from a dataset of sufficiently large number of relevant materials. The learned information can then be used to predict the properties of materials not already in the dataset, thus accelerating the materials design. Herein, we develop a dataset of 1,073 polymers and related materials and make it available at http://khazana.uconn.edu/. This dataset is uniformly prepared using first-principles calculations with structures obtained either from other sources or by using structure search methods. Because the immediate target of this work is to assist the design of high dielectric constant polymers, it is initially designed to include the optimized structures, atomization energies, band gaps, and dielectric constants. It will be progressively expanded by accumulating new materials and including additional properties calculated for the optimized structures provided. PMID:26927478

  18. Constructing Phylogenetic Networks Based on the Isomorphism of Datasets.

    PubMed

    Wang, Juan; Zhang, Zhibin; Li, Yanjuan

    2016-01-01

    Constructing rooted phylogenetic networks from rooted phylogenetic trees has become an important problem in molecular evolution. So far, many methods have been presented in this area, in which most efficient methods are based on the incompatible graph, such as the CASS, the LNETWORK, and the BIMLR. This paper will research the commonness of the methods based on the incompatible graph, the relationship between incompatible graph and the phylogenetic network, and the topologies of incompatible graphs. We can find out all the simplest datasets for a topology G and construct a network for every dataset. For any one dataset 𝒞, we can compute a network from the network representing the simplest dataset which is isomorphic to 𝒞. This process will save more time for the algorithms when constructing networks. PMID:27547759

  19. BMDExpress Data Viewer: A Visualization Tool to Analyze BMDExpress Datasets

    EPA Science Inventory

    Regulatory agencies increasingly apply benchmark dose (BMD) modeling to determine points of departure in human risk assessments. BMDExpress applies BMD modeling to transcriptomics datasets and groups genes to biological processes and pathways for rapid assessment of doses at whic...

  20. Comparison of Eight Different Precipitation Datasets for South America

    NASA Astrophysics Data System (ADS)

    Pinto, L. C.; Costa, M. H.; Diniz, L. F.

    2007-05-01

    Long and continuous meteorological data series for large areas are hard to obtain, so several groups have developed climate datasets generated through the combination of models with observed and remote sensing data, including reanalysis products. This study compares eight different precipitation datasets for South America (NCEP/NCAR-2, ERA-40, CMAP, GPCP, CRU, CPTEC, TRMM, Legates and Willmott, Leemans and Cramer). For each dataset, we analyze the four moments of the data distribution (mean, variance, skewness, kurtosis) for latitudinal variation, for the major river basins and for the major vegetation types on the continent, allowing us to identify the geographical variations in each dataset. We verified that significant differences exist among the precipitation products.
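
    The four moments named in the record can be computed directly; a small scipy/numpy sketch with a synthetic precipitation series standing in for any one of the gridded products.

```python
import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(2)
# Placeholder monthly precipitation series (mm) for one grid cell or basin.
precip = rng.gamma(shape=2.0, scale=60.0, size=480)

print("mean    :", precip.mean())
print("variance:", precip.var(ddof=1))
print("skewness:", skew(precip))
print("kurtosis:", kurtosis(precip))   # scipy returns excess kurtosis by default
```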

  1. A polymer dataset for accelerated property prediction and design.

    PubMed

    Huan, Tran Doan; Mannodi-Kanakkithodi, Arun; Kim, Chiho; Sharma, Vinit; Pilania, Ghanshyam; Ramprasad, Rampi

    2016-03-01

    Emerging computation- and data-driven approaches are particularly useful for rationally designing materials with targeted properties. Generally, these approaches rely on identifying structure-property relationships by learning from a dataset of sufficiently large number of relevant materials. The learned information can then be used to predict the properties of materials not already in the dataset, thus accelerating the materials design. Herein, we develop a dataset of 1,073 polymers and related materials and make it available at http://khazana.uconn.edu/. This dataset is uniformly prepared using first-principles calculations with structures obtained either from other sources or by using structure search methods. Because the immediate target of this work is to assist the design of high dielectric constant polymers, it is initially designed to include the optimized structures, atomization energies, band gaps, and dielectric constants. It will be progressively expanded by accumulating new materials and including additional properties calculated for the optimized structures provided.

  2. A daily global mesoscale ocean eddy dataset from satellite altimetry.

    PubMed

    Faghmous, James H; Frenger, Ivy; Yao, Yuanshun; Warmka, Robert; Lindell, Aron; Kumar, Vipin

    2015-01-01

    Mesoscale ocean eddies are ubiquitous coherent rotating structures of water with radial scales on the order of 100 kilometers. Eddies play a key role in the transport and mixing of momentum and tracers across the World Ocean. We present a global daily mesoscale ocean eddy dataset that contains ~45 million mesoscale features and 3.3 million eddy trajectories that persist at least two days, as identified in the AVISO dataset over the period 1993-2014. This dataset, along with the open-source eddy identification software, allows users to extract eddies with any chosen parameters (minimum size, lifetime, etc.), to study global eddy properties and dynamics, and to empirically estimate the impact eddies have on mass or heat transport. Furthermore, our open-source software may be used to identify mesoscale features in model simulations and compare them to observed features. Finally, this dataset can be used to study the interaction between mesoscale ocean eddies and other components of the Earth System.

  3. Constructing Phylogenetic Networks Based on the Isomorphism of Datasets

    PubMed Central

    Zhang, Zhibin; Li, Yanjuan

    2016-01-01

    Constructing rooted phylogenetic networks from rooted phylogenetic trees has become an important problem in molecular evolution. So far, many methods have been presented in this area, in which most efficient methods are based on the incompatible graph, such as the CASS, the LNETWORK, and the BIMLR. This paper will research the commonness of the methods based on the incompatible graph, the relationship between incompatible graph and the phylogenetic network, and the topologies of incompatible graphs. We can find out all the simplest datasets for a topology G and construct a network for every dataset. For any one dataset 𝒞, we can compute a network from the network representing the simplest dataset which is isomorphic to 𝒞. This process will save more time for the algorithms when constructing networks. PMID:27547759

  4. A polymer dataset for accelerated property prediction and design

    PubMed Central

    Huan, Tran Doan; Mannodi-Kanakkithodi, Arun; Kim, Chiho; Sharma, Vinit; Pilania, Ghanshyam; Ramprasad, Rampi

    2016-01-01

    Emerging computation- and data-driven approaches are particularly useful for rationally designing materials with targeted properties. Generally, these approaches rely on identifying structure-property relationships by learning from a dataset of sufficiently large number of relevant materials. The learned information can then be used to predict the properties of materials not already in the dataset, thus accelerating the materials design. Herein, we develop a dataset of 1,073 polymers and related materials and make it available at http://khazana.uconn.edu/. This dataset is uniformly prepared using first-principles calculations with structures obtained either from other sources or by using structure search methods. Because the immediate target of this work is to assist the design of high dielectric constant polymers, it is initially designed to include the optimized structures, atomization energies, band gaps, and dielectric constants. It will be progressively expanded by accumulating new materials and including additional properties calculated for the optimized structures provided. PMID:26927478

  5. International Ultraviolet Explorer Observatory operations

    NASA Technical Reports Server (NTRS)

    1985-01-01

    This volume contains the final report for the International Ultraviolet Explorer IUE Observatory Operations contract. The fundamental operational objective of the International Ultraviolet Explorer (IUE) program is to translate competitively selected observing programs into IUE observations, to reduce these observations into meaningful scientific data, and then to present these data to the Guest Observer in a form amenable to the pursuit of scientific research. The IUE Observatory is the key to this objective since it is the central control and support facility for all science operations functions within the IUE Project. In carrying out the operation of this facility, a number of complex functions were provided beginning with telescope scheduling and operation, proceeding to data processing, and ending with data distribution and scientific data analysis. In support of these critical-path functions, a number of other significant activities were also provided, including scientific instrument calibration, systems analysis, and software support. Routine activities have been summarized briefly whenever possible.

  6. ISRUC-Sleep: A comprehensive public dataset for sleep researchers.

    PubMed

    Khalighi, Sirvan; Sousa, Teresa; Santos, José Moutinho; Nunes, Urbano

    2016-02-01

    To facilitate the performance comparison of new methods for sleep pattern analysis, publicly available datasets with high-quality content are very important and useful. We introduce an open-access comprehensive sleep dataset, called ISRUC-Sleep. The data were obtained from human adults, including healthy subjects, subjects with sleep disorders, and subjects under the effect of sleep medication. Each recording was randomly selected from among PSG recordings acquired by the Sleep Medicine Centre of the Hospital of Coimbra University (CHUC). The dataset comprises three groups of data: (1) data concerning 100 subjects, with one recording session per subject; (2) data gathered from 8 subjects, with two recording sessions per subject; and (3) data collected from one recording session for each of 10 healthy subjects. The polysomnography (PSG) recordings associated with each subject were visually scored by two human experts. Compared with existing sleep-related public datasets, ISRUC-Sleep provides data from a reasonable number of subjects with different characteristics, such as data useful for studies involving changes in the PSG signals over time, and data from healthy subjects useful for studies comparing healthy subjects with patients suffering from sleep disorders. This dataset was created to complement existing datasets by providing easy-to-apply data collection with some characteristics not yet covered. ISRUC-Sleep can be useful for the analysis of new contributions: (i) in biomedical signal processing; (ii) in the development of ASSC methods; and (iii) in sleep physiology studies. To evaluate and compare new contributions that use this dataset as a benchmark, results of applying a subject-independent automatic sleep stage classification (ASSC) method on the ISRUC-Sleep dataset are presented.

  7. Sampling Within k-Means Algorithm to Cluster Large Datasets

    SciTech Connect

    Bejarano, Jeremy; Bose, Koushiki; Brannan, Tyler; Thomas, Anita; Adragni, Kofi; Neerchal, Nagaraj; Ostrouchov, George

    2011-08-01

    Due to current data collection technology, our ability to gather data has surpassed our ability to analyze it. In particular, k-means, one of the simplest and fastest clustering algorithms, is ill-equipped to handle extremely large datasets on even the most powerful machines. Our new algorithm uses a sample from a dataset to decrease runtime by reducing the amount of data analyzed. We perform a simulation study to compare our sampling-based k-means to the standard k-means algorithm by analyzing both the speed and accuracy of the two methods. Results show that our algorithm is significantly more efficient than the existing algorithm with comparable accuracy. Further work on this project might include a more comprehensive study on both more varied test datasets and real weather datasets; this is especially important considering that this preliminary study was performed on rather tame datasets. Future studies should also analyze the performance of the algorithm for varied values of k. Lastly, this paper showed that the algorithm was accurate for relatively low sample sizes. We would like to analyze this further to see how accurate the algorithm is for even lower sample sizes; we could find the lowest sample sizes, by manipulating width and confidence level, for which the algorithm would be acceptably accurate. In order for our algorithm to be a success, it needs to meet two benchmarks: match the accuracy of the standard k-means algorithm and significantly reduce runtime. Both goals are accomplished for all six datasets analyzed. However, on datasets of three and four dimensions, as the data become more difficult to cluster, both algorithms fail to obtain the correct classifications on some trials. Nevertheless, our algorithm consistently matches the performance of the standard algorithm while becoming remarkably more efficient with time. Therefore, we conclude that analysts can use our algorithm, expecting accurate results in considerably less time.
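
    A minimal sketch of the sampling idea, assuming scikit-learn: cluster only a random sample, then assign every point in the full dataset to the nearest sample-derived centroid. The dataset, sample fraction and k are invented for the example and are not the report's test cases.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic "large" dataset standing in for the report's test data.
X, _ = make_blobs(n_samples=200_000, centers=5, n_features=3, random_state=0)

rng = np.random.default_rng(0)
sample_idx = rng.choice(X.shape[0], size=5_000, replace=False)   # ~2.5% sample

# Run k-means on the sample only ...
km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X[sample_idx])

# ... then label the full dataset with the centroids learned from the sample.
labels_full = km.predict(X)
print(np.bincount(labels_full))
```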

  8. Toward computational cumulative biology by combining models of biological datasets.

    PubMed

    Faisal, Ali; Peltonen, Jaakko; Georgii, Elisabeth; Rung, Johan; Kaski, Samuel

    2014-01-01

    A main challenge of data-driven sciences is how to make maximal use of the progressively expanding databases of experimental datasets in order to keep research cumulative. We introduce the idea of a modeling-based dataset retrieval engine designed for relating a researcher's experimental dataset to earlier work in the field. The search is (i) data-driven to enable new findings, going beyond the state of the art of keyword searches in annotations, (ii) modeling-driven, to include both biological knowledge and insights learned from data, and (iii) scalable, as it is accomplished without building one unified grand model of all data. Assuming each dataset has been modeled beforehand, by the researchers or automatically by database managers, we apply a rapidly computable and optimizable combination model to decompose a new dataset into contributions from earlier relevant models. By using the data-driven decomposition, we identify a network of interrelated datasets from a large annotated human gene expression atlas. While tissue type and disease were major driving forces for determining relevant datasets, the found relationships were richer, and the model-based search was more accurate than the keyword search; moreover, it recovered biologically meaningful relationships that are not straightforwardly visible from annotations-for instance, between cells in different developmental stages such as thymocytes and T-cells. Data-driven links and citations matched to a large extent; the data-driven links even uncovered corrections to the publication data, as two of the most linked datasets were not highly cited and turned out to have wrong publication entries in the database.
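
    The authors' combination model is not reproduced here; the sketch below only illustrates the decomposition idea with a stand-in: a new dataset's summary profile is expressed as a non-negative combination of profiles from previously modeled datasets, and the largest weights point to the most related earlier work. The synthetic profiles, the sizes and the NNLS solver are assumptions of the example.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(3)
n_genes, n_archived = 1000, 12

# Columns: summary profiles of archived, already-modeled datasets (synthetic).
archive = rng.normal(size=(n_genes, n_archived))
# New dataset: a mixture of archived profiles 2 and 7 plus noise.
new_profile = 0.7 * archive[:, 2] + 0.3 * archive[:, 7] + rng.normal(scale=0.05, size=n_genes)

weights, _ = nnls(archive, new_profile)          # non-negative contributions
top = np.argsort(weights)[::-1][:3]
print("most related archived datasets:", top, weights[top].round(2))
```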

  9. Toward Computational Cumulative Biology by Combining Models of Biological Datasets

    PubMed Central

    Faisal, Ali; Peltonen, Jaakko; Georgii, Elisabeth; Rung, Johan; Kaski, Samuel

    2014-01-01

    A main challenge of data-driven sciences is how to make maximal use of the progressively expanding databases of experimental datasets in order to keep research cumulative. We introduce the idea of a modeling-based dataset retrieval engine designed for relating a researcher's experimental dataset to earlier work in the field. The search is (i) data-driven to enable new findings, going beyond the state of the art of keyword searches in annotations, (ii) modeling-driven, to include both biological knowledge and insights learned from data, and (iii) scalable, as it is accomplished without building one unified grand model of all data. Assuming each dataset has been modeled beforehand, by the researchers or automatically by database managers, we apply a rapidly computable and optimizable combination model to decompose a new dataset into contributions from earlier relevant models. By using the data-driven decomposition, we identify a network of interrelated datasets from a large annotated human gene expression atlas. While tissue type and disease were major driving forces for determining relevant datasets, the found relationships were richer, and the model-based search was more accurate than the keyword search; moreover, it recovered biologically meaningful relationships that are not straightforwardly visible from annotations—for instance, between cells in different developmental stages such as thymocytes and T-cells. Data-driven links and citations matched to a large extent; the data-driven links even uncovered corrections to the publication data, as two of the most linked datasets were not highly cited and turned out to have wrong publication entries in the database. PMID:25427176

  10. Measuring the effectiveness of scientific gatekeeping

    PubMed Central

    Siler, Kyle; Lee, Kirby; Bero, Lisa

    2015-01-01

    Peer review is the main institution responsible for the evaluation and gestation of scientific research. Although peer review is widely seen as vital to scientific evaluation, anecdotal evidence abounds of gatekeeping mistakes in leading journals, such as rejecting seminal contributions or accepting mediocre submissions. Systematic evidence regarding the effectiveness—or lack thereof—of scientific gatekeeping is scant, largely because access to rejected manuscripts from journals is rarely available. Using a dataset of 1,008 manuscripts submitted to three elite medical journals, we show differences in citation outcomes for articles that received different appraisals from editors and peer reviewers. Among rejected articles, desk-rejected manuscripts, deemed as unworthy of peer review by editors, received fewer citations than those sent for peer review. Among both rejected and accepted articles, manuscripts with lower scores from peer reviewers received relatively fewer citations when they were eventually published. However, hindsight reveals numerous questionable gatekeeping decisions. Of the 808 eventually published articles in our dataset, our three focal journals rejected many highly cited manuscripts, including the 14 most popular; roughly the top 2 percent. Of those 14 articles, 12 were desk-rejected. This finding raises concerns regarding whether peer review is ill-suited to recognize and gestate the most impactful ideas and research. Despite this finding, results show that in our case studies, on the whole, there was value added in peer review. Editors and peer reviewers generally—but not always—made good decisions regarding the identification and promotion of quality in scientific manuscripts. PMID:25535380

  11. A comprehensive polymer dataset for accelerated property prediction and design

    NASA Astrophysics Data System (ADS)

    Tran, Huan; Kumar Mannodi-Kanakkithodi, Arun; Kim, Chiho; Sharma, Vinit; Pilania, Ghanshyam; Ramprasad, Rampi

    Emerging computation- and data-driven approaches are particularly useful for rationally designing materials with targeted properties. In principle, these approaches rely on identifying structure-property relationships by learning from a dataset of sufficiently large number of relevant materials. The learned information can then be used to rapidly predict the properties of materials not already in the dataset, thus accelerating the design of materials with preferable properties. Here, we report the development of a dataset of 1,065 polymers and related materials, which is available at http://khazana.uconn.edu/. This dataset is uniformly prepared using first-principles calculations with structures obtained either from other sources or by using structure search methods. Because the immediate target of this work is to assist the design of high dielectric constant polymers, it is initially designed to include the optimized structures, atomization energies, band gaps, and dielectric constants. The dataset will be progressively expanded by accumulating new materials and including additional properties calculated for the optimized structures provided. We discuss some information "learned" from the dataset and suggest that it may be used as the playground for further data-mining work.

  12. Assessing global land cover reference datasets for different user communities

    NASA Astrophysics Data System (ADS)

    Tsendbazar, N. E.; de Bruin, S.; Herold, M.

    2015-05-01

    Global land cover (GLC) maps and assessments of their accuracy provide important information for different user communities. To date, there are several GLC reference datasets which are used for assessing the accuracy of specific maps. Despite significant efforts put into generating them, their availability and role in applications outside their intended use have been very limited. This study analyses metadata information from 12 existing and forthcoming GLC reference datasets and assesses their characteristics and potential uses in the context of 4 GLC user groups, i.e., climate modellers requiring data on Essential Climate Variables (ECV), global forest change analysts, the GEO Community of Practice for Global Agricultural Monitoring and GLC map producers. We assessed user requirements with respect to the sampling scheme, thematic coverage, spatial and temporal detail and quality control of the GLC reference datasets. Suitability of the datasets is highly dependent upon specific applications by the user communities considered. The LC-CCI, GOFC-GOLD, FAO-FRA and Geo-Wiki datasets had the broadest applicability for multiple uses. The re-usability of the GLC reference datasets would be greatly enhanced by making them publicly available in an expert framework that guides users on how to use them for specific applications.

  13. Identification of sample annotation errors in gene expression datasets.

    PubMed

    Lohr, Miriam; Hellwig, Birte; Edlund, Karolina; Mattsson, Johanna S M; Botling, Johan; Schmidt, Marcus; Hengstler, Jan G; Micke, Patrick; Rahnenführer, Jörg

    2015-12-01

    The comprehensive transcriptomic analysis of clinically annotated human tissue has found widespread use in oncology, cell biology, immunology, and toxicology. In cancer research, microarray-based gene expression profiling has successfully been applied to subclassify disease entities, predict therapy response, and identify cellular mechanisms. Public accessibility of raw data, together with corresponding information on clinicopathological parameters, offers the opportunity to reuse previously analyzed data and to gain statistical power by combining multiple datasets. However, results and conclusions obviously depend on the reliability of the available information. Here, we propose gene expression-based methods for identifying sample misannotations in public transcriptomic datasets. Sample mix-up can be detected by a classifier that differentiates between samples from male and female patients. Correlation analysis identifies multiple measurements of material from the same sample. The analysis of 45 datasets (including 4913 patients) revealed that erroneous sample annotation, affecting 40 % of the analyzed datasets, may be a more widespread phenomenon than previously thought. Removal of erroneously labelled samples may influence the results of the statistical evaluation in some datasets. Our methods may help to identify individual datasets that contain numerous discrepancies and could be routinely included into the statistical analysis of clinical gene expression data.
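
    A toy sketch of the two screens described above, with synthetic data: near-perfect pairwise correlation flags repeated material from the same sample, and unexpectedly high expression of a Y-linked marker in samples annotated as female flags possible sex mix-ups. The marker row, the 0.95 correlation cut-off and the quantile threshold are assumptions for the illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
n_genes, n_samples = 2000, 50
expr = rng.normal(size=(n_genes, n_samples))
expr[:, 13] = expr[:, 7] + rng.normal(scale=0.01, size=n_genes)   # planted duplicate pair

# 1) Duplicate screen: near-perfect correlation between two samples.
corr = np.corrcoef(expr.T)
dups = [(i, j) for i in range(n_samples) for j in range(i + 1, n_samples) if corr[i, j] > 0.95]
print("suspected repeated samples:", dups)

# 2) Sex check: pretend row 0 is a Y-linked marker (in real data, e.g. RPS4Y1);
#    high values in samples annotated as female would warrant re-checking labels.
y_marker = expr[0, :]
annotated_female = rng.random(n_samples) < 0.5                    # synthetic annotation
flagged = np.where(annotated_female & (y_marker > np.quantile(y_marker, 0.9)))[0]
print("samples to re-check:", flagged)
```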

  14. An Integrated Exploration Strategy

    NASA Technical Reports Server (NTRS)

    Conley, Mike

    2000-01-01

    Many new scientific findings in planetary science (potential life in a Mars meteorite, possible frozen water at the lunar poles, a permanently lit region at the lunar south pole, recent water flow on Mars) provide a compelling case for humans to once again leave low Earth orbit and explore. Robotic missions are capable of conducting science, but only humans can explore and discover. The challenge is to build an affordable exploration program. This can be done by aggressively evolving our current systems, developing high-payoff technologies, leveraging commercial and other agencies' programs, and developing a set of core capabilities to be used to explore a variety of destinations. NASA has a Strategic Roadmap based on detailed mission studies to develop these core technologies and capabilities to allow human space exploration. Advances in human support, launch vehicle improvements, development of in-space transportation systems, efficient power generation and distribution, and In-Situ Resource Utilization are all required to meet the exploration challenge. The groundwork today will provide the technology and tools needed to conduct safe, affordable, and efficient exploration of the Moon, asteroids, deep space observatories and Mars.

  15. Planetary Exploration Panel PEX: Support for lunar exploration

    NASA Astrophysics Data System (ADS)

    Ehrenfreund, Pascale

    2010-05-01

    The new era of space exploration will be international, human-centric, transdisciplinary and participatory. It will also provide an opportunity to inspire, motivate, and involve an ever increasing number of countries. The objective of the COSPAR Panel on Space Exploration (PEX) is to provide the best, independent, input to support the development of worldwide space exploration programs and to safeguard the scientific assets of solar system objects. The input will be drawn from expertise provided via the contacts maintained by COSPAR's various Associates within the international community and scientific entities. For lunar exploration, the International Lunar Exploration Working Group (ILEWG) and the Lunar Exploration Analysis Group (LEAG), as well as other committees, represent important foci for an even broader base of expertise. Seven NASA Lunar Science Institute nodes are actively supporting space exploration in the US. In addition, the International Space Exploration Coordination group ISECG was established to implement the Global Exploration Strategy GES, contained in a document that was elaborated by representatives of 14 space agencies. PEX provides synergies of existing documents and roadmaps of each of these bodies to support existing space exploration groups, foster transnational alliances and support joint research and education.

  16. Designing Tools for Ocean Exploration. Galapagos Rifts Expedition--Grades 9-12. Overview: Ocean Exploration.

    ERIC Educational Resources Information Center

    National Oceanic and Atmospheric Administration (DOC), Rockville, MD.

    This activity teaches about the complexity of ocean exploration, the technological applications and capabilities required for ocean exploration, the importance of teamwork in scientific research projects, and developing abilities necessary to do scientific inquiry. The activity provides learning objectives, a list of needed materials, key…

  17. FastQuery: A Parallel Indexing System for Scientific Data

    SciTech Connect

    Chou, Jerry; Wu, Kesheng; Prabhat,

    2011-07-29

    Modern scientific datasets present numerous data management and analysis challenges. State-of-the-art index and query technologies such as FastBit can significantly improve accesses to these datasets by augmenting the user data with indexes and other secondary information. However, a challenge is that the indexes assume the relational data model but the scientific data generally follows the array data model. To match the two data models, we design a generic mapping mechanism and implement an efficient input and output interface for reading and writing the data and their corresponding indexes. To take advantage of the emerging many-core architectures, we also develop a parallel strategy for indexing using threading technology. This approach complements our on-going MPI-based parallelization efforts. We demonstrate the flexibility of our software by applying it to two of the most commonly used scientific data formats, HDF5 and NetCDF. We present two case studies using data from a particle accelerator model and a global climate model. We also conducted a detailed performance study using these scientific datasets. The results show that FastQuery speeds up the query time by a factor of 2.5x to 50x, and it reduces the indexing time by a factor of 16 on 24 cores.
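
    FastQuery itself is not shown here; the sketch below, assuming h5py, only illustrates the kind of value-based selection over HDF5 array data that bitmap indexes accelerate, answered here by a brute-force scan. File and dataset names are invented.

```python
import h5py
import numpy as np

# A small HDF5 file standing in for simulation output (names are invented).
with h5py.File("particles.h5", "w") as f:
    f.create_dataset("energy",
                     data=np.random.default_rng(5).exponential(1.0, size=1_000_000))

# The value-based query "energy > 5" via a full scan; an indexing system such as
# FastBit/FastQuery answers the same question without touching every element,
# which is where the reported query speedups come from.
with h5py.File("particles.h5", "r") as f:
    energy = f["energy"][...]
    hits = np.nonzero(energy > 5.0)[0]
    print(f"{hits.size} records satisfy the query")
```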

  18. Scientific and non-scientific challenges for Operational Earthquake Forecasting

    NASA Astrophysics Data System (ADS)

    Marzocchi, W.

    2015-12-01

    Tracking the time evolution of seismic hazard in time windows shorter than the usual 50 years of long-term hazard models may offer additional opportunities to reduce seismic risk. This is the target of operational earthquake forecasting (OEF). During the development of OEF in Italy we identified several challenges that range from pure science to the more practical interface of science with society. From a scientific point of view, although earthquake clustering is the clearest empirical evidence about earthquake occurrence, and OEF clustering models are the most (successfully) tested hazard models in seismology, we note that some seismologists are still reluctant to accept their scientific reliability. After exploring the motivations behind these scientific doubts, we also look into an issue that is often overlooked in this discussion: in any kind of hazard analysis, we do not use a model because it is the true one, but because it is better than anything else we can think of. The non-scientific aspects are mostly related to the fact that OEF usually provides weekly probabilities of large earthquakes that are smaller than 1%. These probabilities are considered by some seismologists too small to be of interest or use. However, in a recent collaboration with engineers we show that such earthquake probabilities may lead to intolerable individual risk of death. Interestingly, this debate calls for a better definition of the still fuzzy boundaries among the different kinds of expertise required for the whole risk mitigation process. The last and probably most pressing challenge is related to communication with the public. In fact, a wrong message could be useless or even counterproductive. Here we show some of the progress that we have made in this field working with communication experts in Italy.

  19. Scientifically Based Research.

    ERIC Educational Resources Information Center

    Beghetto, Ron

    2003-01-01

    Most principals are aware that the No Child Left Behind Act of 2001 makes it mandatory for school leaders who depend on federal funding to select and implement programs that are based on scientific research. This publication reviews five documents that offer insights into what is meant by scientifically based research and help school leaders…

  20. Scientific rigor through videogames.

    PubMed

    Treuille, Adrien; Das, Rhiju

    2014-11-01

    Hypothesis-driven experimentation - the scientific method - can be subverted by fraud, irreproducibility, and lack of rigorous predictive tests. A robust solution to these problems may be the 'massive open laboratory' model, recently embodied in the internet-scale videogame EteRNA. Deploying similar platforms throughout biology could enforce the scientific method more broadly.

  1. 3 CFR - Scientific Integrity

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... Departments and Agencies Science and the scientific process must inform and guide decisions of my..., and protection of national security. The public must be able to trust the science and scientific..., and integrity. By this memorandum, I assign to the Director of the Office of Science and...

  2. Scientific Ability and Creativity

    ERIC Educational Resources Information Center

    Heller, Kurt A.

    2007-01-01

    Following an introductory definition of "scientific ability and creativity", product-oriented, personality and social psychological approaches to studying scientific ability are examined with reference to competence and performance. Studies in the psychometric versus cognitive psychological paradigms are dealt with in more detail. These two…

  3. Rekindling Scientific Curiosity.

    ERIC Educational Resources Information Center

    Coble, Charles R.; Rice, Dale R.

    1983-01-01

    Active involvement in society-related issues can elevate junior high school students' interest not only in the problem being solved but also in related scientific concepts. Examples of how scientific concepts and society-related issues can be taught in the same class are presented, focusing on genetic engineering, water shortage, and others.…

  4. Encouraging Balanced Scientific Research through Formal Debate

    ERIC Educational Resources Information Center

    Yurgelun, Nancy

    2007-01-01

    The new Connecticut science standards include a "Science, Technology, and Society" (STS) standard for each grade level. This standard encourages students to explore how scientific knowledge affects the quality of their lives. By relating science concepts to real-world decision making, STS investigations give students a framework through which they…

  5. Lighting the Way through Scientific Discourse

    ERIC Educational Resources Information Center

    Yang, Li-hsuan

    2008-01-01

    This article describes a thought-provoking lesson that compares various arrangements of lamp-battery circuits to help students develop the motivation and competence to participate in scientific discourse for knowledge construction. Through experimentation and discourse, students explore concepts about voltage, current, resistance, and Ohm's law.…

  6. Promoting Scientific and Technological Literacy: Teaching Biodiesel.

    ERIC Educational Resources Information Center

    Eilks, Ingo

    2000-01-01

    Describes a unit on biodiesel from a socio-critical chemistry teaching approach aimed at improving student participation and decision making. Explores the use of biodiesel (chemically changed vegetable oils), especially in Europe. The unit proved to be successful as students participated enthusiastically and social and scientific goals were…

  7. Image Attributes: A Study of Scientific Diagrams.

    ERIC Educational Resources Information Center

    Brunskill, Jeff; Jorgensen, Corinne

    2002-01-01

    Discusses advancements in imaging technology and increased user access to digital images, as well as efforts to develop adequate indexing and retrieval methods for image databases. Describes preliminary results of a study of undergraduates that explored the attributes naive subjects use to describe scientific diagrams. (Author/LRW)

  8. Classroom Critters and the Scientific Method.

    ERIC Educational Resources Information Center

    Kneidel, Sally

    This resource book presents 37 behavioral experiments that can be performed with commonly-found classroom animals including hamsters, gerbils, mice, goldfish, guppies, anolis lizards, kittens, and puppies. Each experiment explores the five steps of the scientific method: (1) Question; (2) Hypothesis; (3) Methods; (4) Result; and (5) Conclusion.…

  9. Chaos Theory, Philosophically Old, Scientifically New.

    ERIC Educational Resources Information Center

    Butz, Michael R.

    1995-01-01

    Chaos theory has recently become a central area of scientific interest in psychology. This article explores the psychological meaning and deeper philosophical issues and cultural roots surrounding various views of chaos and provides a multicultural perspective of origins and development of the idea of chaos and its relationship to chaos theory.…

  10. Asteroid exploration and utilization

    NASA Technical Reports Server (NTRS)

    Radovich, Brian M.; Carlson, Alan E.; Date, Medha D.; Duarte, Manny G.; Erian, Neil F.; Gafka, George K.; Kappler, Peter H.; Patano, Scott J.; Perez, Martin; Ponce, Edgar

    1992-01-01

    The Earth is nearing depletion of its natural resources at a time when human beings are rapidly expanding the frontiers of space. The resources possessed by asteroids have enormous potential for aiding and enhancing human space exploration as well as life on Earth. Project STONER (Systematic Transfer of Near Earth Resources) is based on mining an asteroid and transporting raw materials back to Earth. The asteroid explorer/sample return mission is designed in the context of both scenarios and is the first phase of a long range plan for humans to utilize asteroid resources. Project STONER is divided into two parts: asteroid selection and explorer spacecraft design. The spacecraft design team is responsible for the selection and integration of the subsystems: GNC, communications, automation, propulsion, power, structures, thermal systems, scientific instruments, and mechanisms used on the surface to retrieve and store asteroid regolith. The sample return mission scenario consists of eight primary phases that are critical to the mission.

  11. Politics and the Erosion of Federal Scientific Capacity: Restoring Scientific Integrity to Public Health Science

    PubMed Central

    Rest, Kathleen M.; Halpern, Michael H.

    2007-01-01

    Our nation’s health and prosperity are based on a foundation of independent scientific discovery. Yet in recent years, political interference in federal government science has become widespread, threatening this legacy. We explore the ways science has been misused, the attempts to measure the pervasiveness of this problem, and the effects on our long-term capacity to meet today’s most complex public health challenges. Good government and a functioning democracy require public policy decisions to be informed by independent science. The scientific and public health communities must speak out to defend taxpayer-funded science from political interference. Encouragingly, both the scientific community and Congress are exploring ways to restore scientific integrity to federal policymaking. PMID:17901422

  12. Politics and the erosion of federal scientific capacity: restoring scientific integrity to public health science.

    PubMed

    Rest, Kathleen M; Halpern, Michael H

    2007-11-01

    Our nation's health and prosperity are based on a foundation of independent scientific discovery. Yet in recent years, political interference in federal government science has become widespread, threatening this legacy. We explore the ways science has been misused, the attempts to measure the pervasiveness of this problem, and the effects on our long-term capacity to meet today's most complex public health challenges. Good government and a functioning democracy require public policy decisions to be informed by independent science. The scientific and public health communities must speak out to defend taxpayer-funded science from political interference. Encouragingly, both the scientific community and Congress are exploring ways to restore scientific integrity to federal policymaking.

  13. Mining the wealth of Phobos multispectral data contained in MEX-OMEGA and MRO-CRISM datasets

    NASA Astrophysics Data System (ADS)

    Pajola, Maurizio; Roush, Ted

    2016-04-01

    The origin of Phobos is still strongly debated between in situ and asteroid-capture formation scenarios. Growing interest in Phobos as a scientific destination is demonstrated by the multiple NASA Discovery missions presented in 2015, the proposed ESA M-class mission PhoDEx, and JAXA's Mars moons sample return mission (MMX, Mars Moons eXploration). These Phobos-dedicated missions, together with JAXA's planned mission, clearly illustrate the scientific interest in Phobos as a destination. We believe that there is still a wealth of data that can be mined in order to constrain Phobos' surface properties and possibly its origin. Several multispectral Phobos datasets are available on the NASA Planetary Data System. In this work we focus on the first steps of a new spectral analysis of the Mars satellite Phobos. We made use of the available Mars Express (MEX) OMEGA spectral cubes, obtained throughout the MEX mission, and Mars Reconnaissance Orbiter (MRO) CRISM hyperspectral data obtained during early MRO orbits around Mars. Previous analyses (Fraeman et al. 2012, 2014) mainly focused on three specific Regions of Interest (ROIs) located inside the so-called blue and red regions of Phobos, i.e. inside or close to Stickney crater (the biggest crater on Phobos, 8 km in diameter) and its eastern ejecta blanket. We extended this analysis by considering multiple ROIs located in both the leading and trailing hemispheres of the satellite, taking advantage of the broader coverage of the OMEGA and CRISM data. This makes it possible to detect the spectral properties of intermediate areas located in the transition region between the blue and the red spectral units. The analyses can enable i) documentation of how the spectral slope changes between these two units, ii) delineation of their boundaries and surface extent, and iii) identification of additional surface materials, if present. This work paves the way to a more thorough analysis, foreseen in the near future, where the

  14. Teaching Scientific Reasoning to Liberal Arts Students

    NASA Astrophysics Data System (ADS)

    Rubbo, Louis

    2014-03-01

    University courses in conceptual physics and astronomy typically serve as the terminal science experience for the liberal arts student. Within this population, significant content knowledge gains can be achieved by utilizing research-verified pedagogical methods. However, from the standpoint of the university, students are expected to complete these courses not necessarily for the content knowledge but for the development of scientific reasoning skills. Results from physics education studies indicate that unless scientific reasoning instruction is made explicit, students do not progress in their reasoning abilities. How do we complement successful content-based pedagogical methods with instruction that explicitly focuses on the development of scientific reasoning skills? This talk will explore methodologies that actively engage non-science students with the explicit intent of fostering their scientific reasoning abilities.

  15. Informing the ‘early years’ agenda in Scotland: understanding infant feeding patterns using linked datasets

    PubMed Central

    Ajetunmobi, Omotomilola; Whyte, Bruce; Chalmers, James; Fleming, Michael; Stockton, Diane; Wood, Rachel

    2014-01-01

    Background: Providing infants with the 'best possible start in life' is a priority for the Scottish Government. This is reflected in policy and health promotion strategies to increase breast feeding, which gives the best source of nutrients for healthy infant growth and development. However, the rate of breast feeding in Scotland remains one of the lowest in Europe. Information is needed to provide a better understanding of infant feeding and its impact on child health. This paper describes the development of a unique population-wide resource created to explore infant feeding and child health in Scotland. Methods: Descriptive and multivariate analyses of linked routine/administrative maternal and infant health records for 731 595 infants born in Scotland between 1997 and 2009. Results: A linked dataset was created containing a wide range of background, parental, maternal, birth and health service characteristics for a representative sample of infants born in Scotland over the study period. There was high coverage and completeness of infant feeding and other demographic, maternal and infant records. The results confirmed the importance of an enabling environment—cultural, family, health service and other maternal and infant health-related factors—in increasing the likelihood to breast feed. Conclusions: Using the linked dataset, it was possible to investigate the determinants of breast feeding for a representative sample of Scottish infants born between 1997 and 2009. The linked dataset is an important resource that has potential uses in research, policy design and targeting intervention programmes. PMID:24129609
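
    A toy sketch of the linkage-plus-modelling workflow, assuming pandas and scikit-learn: maternal and infant records joined on a shared key, then a simple multivariate model of feeding status. All column names and values are invented and bear no relation to the Scottish records.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Invented stand-ins for routine maternal and infant records.
mothers = pd.DataFrame({"mother_id": [1, 2, 3, 4],
                        "maternal_age": [24, 31, 28, 36],
                        "smoker": [1, 0, 0, 1]})
infants = pd.DataFrame({"mother_id": [1, 2, 3, 4],
                        "breastfed_at_review": [0, 1, 1, 0]})

linked = mothers.merge(infants, on="mother_id")   # record linkage on a shared key

X = linked[["maternal_age", "smoker"]]
y = linked["breastfed_at_review"]
model = LogisticRegression().fit(X, y)            # multivariate analysis, toy scale
print(model.coef_)
```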

  16. Geocoding large population-level administrative datasets at highly resolved spatial scales

    PubMed Central

    Edwards, Sharon E.; Strauss, Benjamin; Miranda, Marie Lynn

    2014-01-01

    Using geographic information systems to link administrative databases with demographic, social, and environmental data allows researchers to use spatial approaches to explore relationships between exposures and health. Traditionally, spatial analysis in public health has focused on the county, zip code, or tract level because of limitations to geocoding at highly resolved scales. Using 2005 birth and death data from North Carolina, we examine our ability to geocode population-level datasets at three spatial resolutions – zip code, street, and parcel. We achieve high geocoding rates at all three resolutions, with statewide street geocoding rates of 88.0% for births and 93.2% for deaths. We observe differences in geocoding rates across demographics and health outcomes, with lower geocoding rates in disadvantaged populations and the most dramatic differences occurring across the urban-rural spectrum. Our results suggest highly resolved spatial data architectures for population-level datasets are viable through geocoding individual street addresses. We recommend routinely geocoding administrative datasets to the highest spatial resolution feasible, allowing public health researchers to choose the spatial resolution used in analysis based on an understanding of the spatial dimensions of the health outcomes and exposures being investigated. Such research, however, must acknowledge how disparate geocoding success across subpopulations may affect findings. PMID:25383017
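
    A minimal street-level geocoding sketch, assuming the geopy package and the public Nominatim service; the address is illustrative, and batch geocoding of administrative datasets would additionally need rate limiting, a meaningful user_agent and attention to the service's usage policy.

```python
from geopy.geocoders import Nominatim

geolocator = Nominatim(user_agent="geocoding-example")   # identify your application
location = geolocator.geocode("1600 Pennsylvania Ave NW, Washington, DC")
if location is not None:
    print(location.latitude, location.longitude)         # street-level coordinates
```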

  17. Nine martian years of dust optical depth observations: A reference dataset

    NASA Astrophysics Data System (ADS)

    Montabone, Luca; Forget, Francois; Kleinboehl, Armin; Kass, David; Wilson, R. John; Millour, Ehouarn; Smith, Michael; Lewis, Stephen; Cantor, Bruce; Lemmon, Mark; Wolff, Michael

    2016-07-01

    We present a multi-annual reference dataset of the horizontal distribution of airborne dust from martian year 24 to 32 using observations of the martian atmosphere from April 1999 to June 2015 made by the Thermal Emission Spectrometer (TES) aboard Mars Global Surveyor, the Thermal Emission Imaging System (THEMIS) aboard Mars Odyssey, and the Mars Climate Sounder (MCS) aboard Mars Reconnaissance Orbiter (MRO). Our methodology to build the dataset works by gridding the available retrievals of column dust optical depth (CDOD) from TES and THEMIS nadir observations, as well as the estimates of this quantity from MCS limb observations. The resulting (irregularly) gridded maps (one per sol) were validated with independent observations of CDOD by PanCam cameras and Mini-TES spectrometers aboard the Mars Exploration Rovers "Spirit" and "Opportunity", by the Surface Stereo Imager aboard the Phoenix lander, and by the Compact Reconnaissance Imaging Spectrometer for Mars aboard MRO. Finally, regular maps of CDOD are produced by spatially interpolating the irregularly gridded maps using a kriging method. These latter maps are used as dust scenarios in the Mars Climate Database (MCD) version 5, and are useful in many modelling applications. The two datasets (daily irregularly gridded maps and regularly kriged maps) for the nine available martian years are publicly available as NetCDF files and can be downloaded from the MCD website at the URL: http://www-mars.lmd.jussieu.fr/mars/dust_climatology/index.html
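
    The kriging used to produce the regular MCD scenario maps is not reproduced here; the sketch below is only a simplified inverse-distance-weighting stand-in showing how irregularly located column dust optical depth (CDOD) retrievals can be turned into a regular latitude-longitude map. All values and the 5-degree grid are synthetic assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
# Irregularly located CDOD retrievals for one sol (synthetic values).
lon_obs = rng.uniform(-180, 180, 400)
lat_obs = rng.uniform(-90, 90, 400)
cdod_obs = 0.3 + 0.2 * np.cos(np.deg2rad(lat_obs)) + rng.normal(scale=0.05, size=400)

# Regular output grid; inverse-distance weighting stands in for the kriging step.
lon_g, lat_g = np.meshgrid(np.arange(-180, 181, 5), np.arange(-90, 91, 5))
grid = np.empty(lon_g.shape)
for idx in np.ndindex(lon_g.shape):
    d2 = (lon_obs - lon_g[idx]) ** 2 + (lat_obs - lat_g[idx]) ** 2
    w = 1.0 / (d2 + 1e-6)                       # closer retrievals weigh more
    grid[idx] = np.sum(w * cdod_obs) / np.sum(w)

print(grid.shape, float(grid.min()), float(grid.max()))
```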

  18. Experimental Investigation of Three Machine Learning Algorithms for ITS Dataset

    NASA Astrophysics Data System (ADS)

    Yearwood, J. L.; Kang, B. H.; Kelarev, A. V.

    The present article is devoted to an experimental investigation of the performance of three machine learning algorithms on the ITS dataset, assessing their ability to achieve agreement with classes previously published in the biological literature. The ITS dataset consists of nuclear ribosomal DNA sequences, where rather sophisticated alignment scores have to be used as a measure of distance. These scores do not form a Minkowski metric and the sequences cannot be regarded as points in a finite-dimensional space. This is why it is necessary to develop novel machine learning approaches to the analysis of datasets of this sort. This paper introduces a k-committees classifier and compares it with the discrete k-means and Nearest Neighbour classifiers. It turns out that all three machine learning algorithms are efficient and can be used to automate future biologically significant classifications for datasets of this kind. A simplified version of a synthetic dataset, where the k-committees classifier outperforms the k-means and Nearest Neighbour classifiers, is also presented.
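
    Because the alignment scores do not form a Minkowski metric, classifiers must work from a precomputed distance matrix rather than from coordinate vectors. A minimal sketch of nearest-neighbour classification under that constraint, using scikit-learn's precomputed-distance mode, is below; the distance matrices and labels are placeholders, and this is not the paper's own implementation.

      import numpy as np
      from sklearn.neighbors import KNeighborsClassifier

      # D_train: (n_train, n_train) pairwise alignment-based distances
      # D_test:  (n_test, n_train) distances from test sequences to training ones
      # y_train: class labels taken from the biological literature
      D_train = np.load("train_distances.npy")   # placeholder inputs
      D_test = np.load("test_distances.npy")
      y_train = np.load("train_labels.npy")

      knn = KNeighborsClassifier(n_neighbors=1, metric="precomputed")
      knn.fit(D_train, y_train)
      print(knn.predict(D_test))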

  19. Realistic computer network simulation for network intrusion detection dataset generation

    NASA Astrophysics Data System (ADS)

    Payer, Garrett

    2015-05-01

    The KDD-99 Cup dataset is dead. While it can continue to be used as a toy example, the age of this dataset makes it all but useless for intrusion detection research and data mining. Many of the attacks used within the dataset are obsolete and do not reflect the features important for intrusion detection in today's networks. Creating a new dataset encompassing a large cross section of the attacks found on the Internet today could be useful, but would eventually suffer from the same problem as the KDD-99 Cup: its usefulness would diminish after a period of time. To continue research into intrusion detection, the generation of new datasets needs to be as dynamic and as quick as the attacker. Simply examining existing network traffic and using domain experts such as intrusion analysts to label traffic is inefficient, expensive, and not scalable. The only viable methodology is simulation using technologies including virtualization, attack-toolsets such as Metasploit and Armitage, and sophisticated emulation of threat and user behavior. Simulating actual user behavior and network intrusion events dynamically not only allows researchers to vary scenarios quickly, but enables online testing of intrusion detection mechanisms by interacting with data as it is generated. As new threat behaviors are identified, they can be added to the simulation to make quicker determinations as to the effectiveness of existing and ongoing network intrusion technology, methodology and models.

  20. The LANDFIRE Refresh strategy: updating the national dataset

    USGS Publications Warehouse

    Nelson, Kurtis J.; Connot, Joel A.; Peterson, Birgit E.; Martin, Charley

    2013-01-01

    The LANDFIRE Program provides comprehensive vegetation and fuel datasets for the entire United States. As with many large-scale ecological datasets, vegetation and landscape conditions must be updated periodically to account for disturbances, growth, and natural succession. The LANDFIRE Refresh effort was the first attempt to consistently update these products nationwide. It incorporated a combination of specific systematic improvements to the original LANDFIRE National data, remote sensing based disturbance detection methods, field collected disturbance information, vegetation growth and succession modeling, and vegetation transition processes. This resulted in the creation of two complete datasets for all 50 states: LANDFIRE Refresh 2001, which includes the systematic improvements, and LANDFIRE Refresh 2008, which includes the disturbance and succession updates to the vegetation and fuel data. The new datasets are comparable for studying landscape changes in vegetation type and structure over a decadal period, and provide the most recent characterization of fuel conditions across the country. The applicability of the new layers is discussed and the effects of using the new fuel datasets are demonstrated through a fire behavior modeling exercise using the 2011 Wallow Fire in eastern Arizona as an example.

  1. Dataset from chemical gas sensor array in turbulent wind tunnel.

    PubMed

    Fonollosa, Jordi; Rodríguez-Luján, Irene; Trincavelli, Marco; Huerta, Ramón

    2015-06-01

    The dataset includes the acquired time series of a chemical detection platform exposed to different gas conditions in a turbulent wind tunnel. The chemo-sensory elements sampled the environment directly. In contrast to traditional approaches that include measurement chambers, open sampling systems are sensitive to dispersion mechanisms of gaseous chemical analytes, namely diffusion, turbulence, and advection, making the identification and monitoring of chemical substances more challenging. The sensing platform included 72 metal-oxide gas sensors that were positioned at 6 different locations of the wind tunnel. At each location, 10 distinct chemical gases were released in the wind tunnel, the sensors were evaluated at 5 different operating temperatures, and 3 different wind speeds were generated in the wind tunnel to induce different levels of turbulence. Moreover, each configuration was repeated 20 times, yielding a dataset of 18,000 measurements. The dataset was collected over a period of 16 months. The data is related to "On the performance of gas sensor arrays in open sampling systems using Inhibitory Support Vector Machines", by Vergara et al.[1]. The dataset can be accessed publicly at the UCI repository upon citation of [1]: http://archive.ics.uci.edu/ml/datasets/Gas+sensor+arrays+in+open+sampling+settings.
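
    The quoted dataset size follows directly from the experimental design; a quick check of the combinatorics:

      # 6 sensor-board locations x 10 gases x 5 operating temperatures
      # x 3 wind speeds x 20 repetitions = 18,000 recorded time series
      locations, gases, temperatures, wind_speeds, repetitions = 6, 10, 5, 3, 20
      n_measurements = locations * gases * temperatures * wind_speeds * repetitions
      assert n_measurements == 18_000
      print(n_measurements)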

  2. Internal Consistency of the NVAP Water Vapor Dataset

    NASA Technical Reports Server (NTRS)

    Suggs, Ronnie J.; Jedlovec, Gary J.; Arnold, James E. (Technical Monitor)

    2001-01-01

    The NVAP (NASA Water Vapor Project) dataset is a global dataset at 1 x 1 degree spatial resolution consisting of daily, pentad, and monthly atmospheric precipitable water (PW) products. The analysis blends measurements from the Television and Infrared Operational Satellite (TIROS) Operational Vertical Sounder (TOVS), the Special Sensor Microwave/Imager (SSM/I), and radiosonde observations into a daily collage of PW. The original dataset consisted of five years of data from 1988 to 1992. Recent updates have added three additional years (1993-1995) and incorporated procedural and algorithm changes from the original methodology. Since none of the PW sources (TOVS, SSM/I, and radiosonde) provides global coverage on its own, the sources complement one another by providing spatial coverage over regions and during times where the others are not available. For this type of spatial and temporal blending to be successful, each of the source components should have similar or compatible accuracies. If this is not the case, regional and time-varying biases may be manifested in the NVAP dataset. This study examines the consistency of the NVAP source data by comparing daily collocated TOVS and SSM/I PW retrievals with collocated radiosonde PW observations. The daily PW intercomparisons are performed over the time period of the dataset and for various regions.
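
    A consistency check of this kind reduces to computing the bias and RMSE of collocated satellite retrievals against the radiosonde values, stratified by region and period. A schematic version follows; the numerical values are placeholders, not NVAP data.

      import numpy as np

      def bias_rmse(retrieved, reference):
          """Bias and RMSE of collocated precipitable-water values (mm)."""
          diff = np.asarray(retrieved) - np.asarray(reference)
          return diff.mean(), np.sqrt((diff ** 2).mean())

      # Placeholder collocated daily PW values for one region and month.
      tovs_pw = np.array([21.3, 18.7, 25.1, 30.2])
      ssmi_pw = np.array([20.8, 19.5, 24.0, 29.6])
      raob_pw = np.array([20.5, 19.0, 24.6, 30.0])

      print("TOVS vs radiosonde:", bias_rmse(tovs_pw, raob_pw))
      print("SSM/I vs radiosonde:", bias_rmse(ssmi_pw, raob_pw))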

  3. The role of dataset selection in cloud microphysics parameterization development

    NASA Astrophysics Data System (ADS)

    Kogan, Y. L.

    2009-12-01

    A number of cloud microphysical parameterizations have been developed during the last decade using various datasets of cloud drop spectra. These datasets can be obtained from observations, produced artificially by a drop size spectra generator (e.g. by solving the coagulation equation under different input conditions), or taken from the output of an LES model that predicts cloud drop spectra explicitly. Each of these methods has its deficiencies; for example, in-situ aircraft observations are constrained to the flight path, and coagulation equation solutions depend on the input conditions. The ultimate aim is to create a cloud drop spectra dataset that realistically mimics drop parameters in real clouds. These parameters are closely related to the distribution of thermodynamical conditions, which are difficult, if not impossible, to obtain a priori. Using an LES model with explicit microphysics (SAMEX), we have demonstrated the high sensitivity of cloud parameterizations to the choice of a dataset. We emphasize that the development of accurate parameterizations should require the use of a dynamically balanced cloud drop spectra dataset. The accuracy of conversion rates can be increased by scaling them with precipitation intensity. We also demonstrate that the accuracy of the saturation adjustment scheme employed in calculations of latent heat release can be increased by accounting for the aerosol load. Finally, we show how to formulate the new saturation adjustment in the framework of a two-moment cloud physics parameterization.

  4. NSB endorses Explorer

    NASA Astrophysics Data System (ADS)

    Richman, Barbara T.

    The National Science Board (NSB) recently voted to endorse a program of scientific ocean drilling that would replace the 14-year-old Glomar Challenger with the Glomar Explorer as the pillar of scientific drilling. This vote by the policymaking arm of the National Science Foundation (NSF) gives another boost to the proposed drilling plan. The plan also has the blessings of the Conference on Scientific Ocean Drilling (COSOD) and of the National Academy of Sciences' Committee on Ocean Drilling. NSF will now approach the Office of Science and Technology Policy (OSTP) and the Office of Management and Budget (OMB) to request that $9 million of NSF funds be transferred to the Advanced Ocean Drilling (AOD) program in fiscal 1983 for the next design stages for the Explorer (Eos, March 2, 1982, p. 179). With the support of NSB, COSOD, and the NAS committee, the request goes to OSTP and OMB with strong backing. If OSTP and OMB give the transfer the green light, Congress will review the request.

  5. Assembling Large, Multi-Sensor Climate Datasets Using the SciFlo Grid Workflow System

    NASA Astrophysics Data System (ADS)

    Wilson, B. D.; Manipon, G.; Xing, Z.; Fetzer, E.

    2008-12-01

    NASA's Earth Observing System (EOS) is the world's most ambitious facility for studying global climate change. The mandate now is to combine measurements from the instruments on the A-Train platforms (AIRS, AMSR-E, MODIS, MISR, MLS, and CloudSat) and other Earth probes to enable large-scale studies of climate change over periods of years to decades. However, moving from predominantly single-instrument studies to a multi-sensor, measurement-based model for long-duration analysis of important climate variables presents serious challenges for large-scale data mining and data fusion. For example, one might want to compare temperature and water vapor retrievals from one instrument (AIRS) to another instrument (MODIS), and to a model (ECMWF), stratify the comparisons using a classification of the cloud scenes from CloudSat, and repeat the entire analysis over years of AIRS data. To perform such an analysis, one must discover & access multiple datasets from remote sites, find the space/time matchups between instrument swaths and model grids, understand the quality flags and uncertainties for retrieved physical variables, and assemble merged datasets for further scientific and statistical analysis. To meet these large-scale challenges, we are utilizing a Grid computing and dataflow framework, named SciFlo, in which we are deploying a set of versatile and reusable operators for data query, access, subsetting, co-registration, mining, fusion, and advanced statistical analysis. SciFlo is a semantically-enabled ("smart") Grid Workflow system that ties together a peer-to-peer network of computers into an efficient engine for distributed computation. The SciFlo workflow engine enables scientists to do multi-instrument Earth Science by assembling remotely-invokable Web Services (SOAP or http GET URLs), native executables, command-line scripts, and Python codes into a distributed computing flow. A scientist visually authors the graph of operation in the VizFlow GUI, or uses a
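
    A core operation in this kind of multi-instrument analysis is the space/time matchup between observation sets. The brute-force sketch below conveys the idea only; the tolerances are arbitrary and this is not one of SciFlo's actual operators.

      import numpy as np

      def matchups(lat1, lon1, t1, lat2, lon2, t2, max_deg=0.5, max_minutes=30.0):
          """Return index pairs (i, j) where observation i of set 1 and observation j
          of set 2 fall within the spatial (degrees) and temporal (minutes) tolerances."""
          pairs = []
          for i in range(len(lat1)):
              d_deg = np.hypot(lat2 - lat1[i], lon2 - lon1[i])  # crude planar distance
              d_min = np.abs(t2 - t1[i]) / 60.0                 # seconds -> minutes
              for j in np.where((d_deg <= max_deg) & (d_min <= max_minutes))[0]:
                  pairs.append((i, int(j)))
          return pairs

      # Tiny example with two observations per set (times in seconds).
      print(matchups(np.array([10.0, 20.0]), np.array([100.0, 110.0]), np.array([0.0, 3600.0]),
                     np.array([10.2, 50.0]), np.array([100.1, 100.0]), np.array([600.0, 0.0])))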

  6. Assembling Large, Multi-Sensor Climate Datasets Using the SciFlo Grid Workflow System

    NASA Astrophysics Data System (ADS)

    Wilson, B.; Manipon, G.; Xing, Z.; Fetzer, E.

    2009-04-01

    NASA's Earth Observing System (EOS) is an ambitious facility for studying global climate change. The mandate now is to combine measurements from the instruments on the "A-Train" platforms (AIRS, AMSR-E, MODIS, MISR, MLS, and CloudSat) and other Earth probes to enable large-scale studies of climate change over periods of years to decades. However, moving from predominantly single-instrument studies to a multi-sensor, measurement-based model for long-duration analysis of important climate variables presents serious challenges for large-scale data mining and data fusion. For example, one might want to compare temperature and water vapor retrievals from one instrument (AIRS) to another instrument (MODIS), and to a model (ECMWF), stratify the comparisons using a classification of the "cloud scenes" from CloudSat, and repeat the entire analysis over years of AIRS data. To perform such an analysis, one must discover & access multiple datasets from remote sites, find the space/time "matchups" between instrument swaths and model grids, understand the quality flags and uncertainties for retrieved physical variables, assemble merged datasets, and compute fused products for further scientific and statistical analysis. To meet these large-scale challenges, we are utilizing a Grid computing and dataflow framework, named SciFlo, in which we are deploying a set of versatile and reusable operators for data query, access, subsetting, co-registration, mining, fusion, and advanced statistical analysis. SciFlo is a semantically-enabled ("smart") Grid Workflow system that ties together a peer-to-peer network of computers into an efficient engine for distributed computation. The SciFlo workflow engine enables scientists to do multi-instrument Earth Science by assembling remotely-invokable Web Services (SOAP or http GET URLs), native executables, command-line scripts, and Python codes into a distributed computing flow. A scientist visually authors the graph of operation in the Viz

  7. Charactering the Surface Radiation Budget over the Tibetan Plateau Using Ground Observations, Reanalysis, and Satellite Datasets

    NASA Astrophysics Data System (ADS)

    Shi, Q.; Liang, S.

    2013-12-01

    The importance of the surface radiation budget (SRB) over the Tibetan Plateau (TP), which affects not only the local climate but also remote regions (e.g., drought and flood in China), is attracting increasing attention in the scientific community. Observed evidence supports a continuous dimming trend together with predominant warming, wind stilling, and moistening trends since the 1980s. Caution, however, needs to be exercised when using ground observations or satellite retrievals alone, which are limited by large errors and sparse distributions, respectively. This study aims to characterize the monthly SRB at 0.5° over the TP spanning two decades by incorporating multiple datasets, including ground-measured, reanalysis, and satellite datasets. The fused SRB was first generated using a multiple linear regression method to synthesize reanalysis and satellite datasets with ground observations from 1984 to 2007, and was then applied not only to analyze the characteristics (spatial distribution, temporal variation, and trend) of the SRB but also to compare them with selected atmospheric (cloud cover, precipitation, and water vapor) and surface (temperature, snow cover, and the Normalized Difference Vegetation Index (NDVI)) anomalies over the TP. The cross-validation results suggested that the fused data lowered the root mean square errors (RMSEs) at the monthly scale (<19 W/m2) by constraining uncertainties from multiple sources (i.e., inputs, preprocessing, and data fusion). The major finding is that the interaction of solar dimming with changes of surface albedo has dominated the marked decrease of all-wave net radiation since the mid-1980s, regardless of the increase of downward longwave radiation (which counteracts the increase of upward longwave radiation). Furthermore, the weakening and strengthening of the relationships between the components of the SRB and the correlated atmospheric or surface variables exhibit a seasonal dependency over the TP, where
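
    The fusion step described above is, at its core, a multiple linear regression of the ground measurements on the satellite and reanalysis estimates. A schematic sketch with scikit-learn is given below; the input files are placeholders and this is not the study's actual configuration.

      import numpy as np
      from sklearn.linear_model import LinearRegression
      from sklearn.model_selection import cross_val_score

      # X: one row per station-month, columns = candidate SRB estimates
      #    (e.g., satellite and reanalysis downward shortwave); y = ground value.
      X = np.load("srb_predictors.npy")   # placeholder inputs
      y = np.load("srb_ground_obs.npy")

      model = LinearRegression()
      rmse = -cross_val_score(model, X, y,
                              scoring="neg_root_mean_squared_error", cv=5)
      print("cross-validated RMSE (W/m2):", rmse.mean())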

  8. Hydrodynamic modelling and global datasets: Flow connectivity and SRTM data, a Bangkok case study.

    NASA Astrophysics Data System (ADS)

    Trigg, M. A.; Bates, P. B.; Michaelides, K.

    2012-04-01

    The rise of globally interconnected manufacturing supply chains requires an understanding and consistent quantification of flood risk at a global scale. Flood risk is often better quantified (or at least more precisely defined) in regions where there has been investment in comprehensive topographical data collection, such as LiDAR, coupled with detailed hydrodynamic modelling. Yet in regions where these data and modelling are unavailable, the implications of flooding and the knock-on effects for global industries can be dramatic, as evidenced by the recent floods in Bangkok, Thailand. There is growing momentum in global modelling initiatives to address this lack of a consistent understanding of flood risk, and they will rely heavily on the application of available global datasets relevant to hydrodynamic modelling, such as Shuttle Radar Topography Mission (SRTM) data and its derivatives. These global datasets bring opportunities to apply consistent methodologies on an automated basis in all regions, while the use of coarser-scale datasets also brings many challenges, such as sub-grid process representation and downscaled hydrology data from global climate models. There are significant opportunities for hydrological science in helping to define new, realistic and physically based methodologies that can be applied globally, as well as the possibility of gaining new insights into flood risk through analysis of the many large datasets that will be derived from this work. We use Bangkok as a case study to explore some of the issues related to using these available global datasets for hydrodynamic modelling, with particular focus on using SRTM data to represent topography. Research has shown that flow connectivity on the floodplain is an important component in the dynamics of flood flows on to and off the floodplain, and indeed within different areas of the floodplain. A lack of representation of flow connectivity, often due to data resolution limitations, means
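
    One simple way to see the connectivity issue is to count hydraulically separate inundated regions on a water-depth grid before and after coarsening it; the toy sketch below is illustrative only and is not the study's methodology (the depth file and thresholds are placeholders).

      import numpy as np
      from scipy import ndimage

      depth = np.load("flood_depth.npy")   # placeholder water-depth grid (m)
      wet = depth > 0.1                    # cells considered inundated

      # Connected wet regions at native resolution.
      _, n_regions = ndimage.label(wet)

      # Coarsen by a factor of 3 (crudely, by block-averaging depth) and re-label.
      h, w = (depth.shape[0] // 3) * 3, (depth.shape[1] // 3) * 3
      coarse = depth[:h, :w].reshape(h // 3, 3, w // 3, 3).mean(axis=(1, 3))
      _, n_regions_coarse = ndimage.label(coarse > 0.1)

      print(n_regions, "wet regions at native resolution,",
            n_regions_coarse, "after coarsening")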

  9. Novel Scientific Visualization Interfaces for Interactive Information Visualization and Sharing

    NASA Astrophysics Data System (ADS)

    Demir, I.; Krajewski, W. F.

    2012-12-01

    As geoscientists are confronted with increasingly massive datasets, from environmental observations to simulations, one of the biggest challenges is having the right tools to gain scientific insight from the data and communicate that understanding to stakeholders. Recent developments in web technologies make it easy to manage, visualize and share large datasets with the general public. Novel visualization techniques and dynamic user interfaces allow users to interact with data and modify parameters to create custom views, gaining insight from simulations and environmental observations. This requires developing new data models and intelligent knowledge-discovery techniques to explore and extract information from complex computational simulations or large data repositories. Scientific visualization will be an increasingly important component in building comprehensive environmental information platforms. This presentation provides an overview of the trends and challenges in the field of scientific visualization, and demonstrates the information visualization and communication tools in the Iowa Flood Information System (IFIS), developed in the light of these challenges. The IFIS is a web-based platform developed by the Iowa Flood Center (IFC) to provide access to and visualization of flood inundation maps, real-time flood conditions, short-term and seasonal flood forecasts, and other flood-related data for communities in Iowa. The key element of the system's architecture is the notion of community. Locations of the communities, those near streams and rivers, define basin boundaries. The IFIS provides community-centric watershed and river characteristics, weather (rainfall) conditions, and streamflow data and visualization tools. Interactive interfaces allow access to inundation maps for different stage and return-period values, and flooding scenarios with contributions from multiple rivers. Real-time and historical data of water levels, gauge heights, and

  10. Statistical Exploration of Electronic Structure of Molecules from Quantum Monte-Carlo Simulations

    SciTech Connect

    Prabhat, Mr; Zubarev, Dmitry; Lester, Jr., William A.

    2010-12-22

    In this report, we present results from analysis of Quantum Monte Carlo (QMC) simulation data with the goal of determining the internal structure of a 3N-dimensional phase space of an N-electron molecule. We are interested in mining the simulation data for patterns that might be indicative of the bond rearrangement as molecules change electronic states. We examined simulation output that tracks the positions of two coupled electrons in the singlet and triplet states of an H2 molecule. The electrons trace out a trajectory, which was analyzed with a number of statistical techniques. This project was intended to address the following scientific questions: (1) Do high-dimensional phase spaces characterizing electronic structure of molecules tend to cluster in any natural way? Do we see a change in clustering patterns as we explore different electronic states of the same molecule? (2) Since it is hard to understand the high-dimensional space of trajectories, can we project these trajectories to a lower dimensional subspace to gain a better understanding of patterns? (3) Do trajectories inherently lie in a lower-dimensional manifold? Can we recover that manifold? After extensive statistical analysis, we are now in a better position to respond to these questions. (1) We definitely see clustering patterns, and differences between the H2 and H2tri datasets. These are revealed by the pamk method in a fairly reliable manner and can potentially be used to distinguish bonded and non-bonded systems and get insight into the nature of bonding. (2) Projecting to a lower dimensional subspace (~4-5) using PCA or Kernel PCA reveals interesting patterns in the distribution of scalar values, which can be related to the existing descriptors of electronic structure of molecules. Also, these results can be immediately used to develop robust tools for analysis of noisy data obtained during QMC simulations. (3) All dimensionality reduction and estimation techniques that we tried seem to
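
    The projection mentioned in point (2) can be reproduced generically with a PCA of the walker trajectory; a minimal sketch is below (the input file is a hypothetical dump of the 6-dimensional electron positions, not the report's analysis scripts).

      import numpy as np
      from sklearn.decomposition import PCA

      # Trajectory of the two coupled electrons: shape (n_steps, 6),
      # i.e. (x, y, z) for each electron at every QMC step.
      traj = np.load("h2_singlet_walkers.npy")   # placeholder file name

      pca = PCA(n_components=5)
      scores = pca.fit_transform(traj)
      print("explained variance ratio:", np.round(pca.explained_variance_ratio_, 3))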

  11. Exploration review

    USGS Publications Warehouse

    Wilburn, D.R.

    2004-01-01

    The worldwide budget for nonfuel mineral exploration was expected to increase by 27 percent in 2003 from the 2002 budget, according to the Metals Economics Group (MEG) of Halifax, Nova Scotia. The increase comes after five years of declining spending for mineral exploration.

  12. Exploration Geochemistry.

    ERIC Educational Resources Information Center

    Closs, L. Graham

    1983-01-01

    Contributions in mineral-deposit model formulation, geochemical exploration in glaciated and arid environments, analytical and sampling problems, and bibliographic research were made in symposia held and proceedings volumes published during 1982. Highlights of these symposia and proceedings and comments on trends in exploration geochemistry are…

  13. obs4MIPS: Satellite Datasets for Model Evaluation

    NASA Astrophysics Data System (ADS)

    Ferraro, R.; Waliser, D. E.; Gleckler, P. J.

    2013-12-01

    This poster will review the current status of the obs4MIPs project, whose purpose is to provide a limited collection of well-established and documented datasets for comparison with Earth system models. These datasets have been reformatted to correspond with the CMIP5 model output requirements, and include technical documentation specifically targeted for their use in model output evaluation. There are currently over 50 datasets containing observations that directly correspond to CMIP5 model output variables. We will review the rationale and requirements for obs4MIPs contributions, and provide summary information on the current obs4MIPs holdings on the Earth System Grid Federation. We will also provide some usage statistics, an update on governance for the obs4MIPs project, and plans for supporting CMIP6.

  14. A high-resolution European dataset for hydrologic modeling

    NASA Astrophysics Data System (ADS)

    Ntegeka, Victor; Salamon, Peter; Gomes, Goncalo; Sint, Hadewij; Lorini, Valerio; Thielen, Jutta

    2013-04-01

    There is an increasing demand for large scale hydrological models not only in the field of modeling the impact of climate change on water resources but also for disaster risk assessments and flood or drought early warning systems. These large scale models need to be calibrated and verified against large amounts of observations in order to judge their capabilities to predict the future. However, the creation of large scale datasets is challenging, for it requires collection, harmonization, and quality checking of large amounts of observations. For this reason, only a limited number of such datasets exist. In this work, we present a pan-European, high-resolution gridded dataset of meteorological observations (EFAS-Meteo) designed with the aim of driving a large scale hydrological model. Similar European and global gridded datasets already exist, such as the HadGHCND (Caesar et al., 2006), the JRC MARS-STAT database (van der Goot and Orlandi, 2003) and the E-OBS gridded dataset (Haylock et al., 2008). However, none of those provide similarly high spatial resolution and/or a complete set of variables to force a hydrologic model. EFAS-Meteo contains daily maps of precipitation, surface temperature (mean, minimum and maximum), wind speed and vapour pressure at a spatial grid resolution of 5 x 5 km for the time period 1 January 1990 - 31 December 2011. It furthermore contains radiation, calculated using a staggered approach depending on the availability of sunshine duration, cloud cover, and minimum and maximum temperature, as well as evapotranspiration (potential, bare-soil and open-water evapotranspiration). The potential evapotranspiration was calculated using the Penman-Monteith equation with the above-mentioned meteorological variables. The dataset was created as part of the development of the European Flood Awareness System (EFAS) and has been continuously updated throughout the last years. The dataset variables are used as
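
    For reference, the FAO-56 form of the Penman-Monteith equation commonly used for reference (potential) evapotranspiration is sketched below. This is the standard textbook formulation, not the EFAS-Meteo code itself, and the dataset's implementation details may differ.

      import numpy as np

      def fao56_reference_et(t_mean, rn, g, u2, ea, pressure=101.3):
          """Daily reference evapotranspiration (mm/day), FAO-56 Penman-Monteith.

          t_mean : mean air temperature (degC)
          rn     : net radiation (MJ m-2 day-1)
          g      : soil heat flux (MJ m-2 day-1), ~0 at the daily scale
          u2     : wind speed at 2 m (m/s)
          ea     : actual vapour pressure (kPa)
          pressure : atmospheric pressure (kPa)
          """
          es = 0.6108 * np.exp(17.27 * t_mean / (t_mean + 237.3))  # saturation vapour pressure
          delta = 4098.0 * es / (t_mean + 237.3) ** 2              # slope of the es curve
          gamma = 0.000665 * pressure                              # psychrometric constant
          return (0.408 * delta * (rn - g)
                  + gamma * 900.0 / (t_mean + 273.0) * u2 * (es - ea)) / (
                  delta + gamma * (1.0 + 0.34 * u2))

      print(round(fao56_reference_et(t_mean=20.0, rn=13.0, g=0.0, u2=2.0, ea=1.4), 2))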

  15. The Global Precipitation Climatology Project (GPCP) Combined Precipitation Dataset

    NASA Technical Reports Server (NTRS)

    Huffman, George J.; Adler, Robert F.; Arkin, Philip; Chang, Alfred; Ferraro, Ralph; Gruber, Arnold; Janowiak, John; McNab, Alan; Rudolf, Bruno; Schneider, Udo

    1997-01-01

    The Global Precipitation Climatology Project (GPCP) has released the GPCP Version 1 Combined Precipitation Data Set, a global, monthly precipitation dataset covering the period July 1987 through December 1995. The primary product in the dataset is a merged analysis incorporating precipitation estimates from low-orbit-satellite microwave data, geosynchronous-orbit-satellite infrared data, and rain gauge observations. The dataset also contains the individual input fields, a combination of the microwave and infrared satellite estimates, and error estimates for each field. The data are provided on 2.5 deg x 2.5 deg latitude-longitude global grids. Preliminary analyses show general agreement with prior studies of global precipitation and extend prior studies of El Nino-Southern Oscillation precipitation patterns. At the regional scale there are systematic differences with standard climatologies.
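
    A common way to merge estimates that each carry error estimates is inverse-error-variance weighting. The toy sketch below illustrates that idea only; it is not the GPCP merge algorithm, and the numbers are placeholders.

      import numpy as np

      def merge_inverse_variance(estimates, errors):
          """Combine gridded precipitation estimates weighted by 1/error**2.

          estimates, errors : arrays of shape (n_sources, nlat, nlon)
          """
          weights = 1.0 / np.square(errors)
          return np.sum(weights * estimates, axis=0) / np.sum(weights, axis=0)

      # Placeholder 2x2 grids from microwave, infrared, and gauge analyses (mm/day).
      est = np.array([[[3.0, 5.0], [0.5, 2.0]],
                      [[2.5, 6.0], [0.8, 1.5]],
                      [[3.2, 4.5], [0.4, 2.2]]])
      err = np.array([[[1.0, 1.0], [0.5, 0.5]],
                      [[2.0, 2.0], [1.0, 1.0]],
                      [[0.5, 0.5], [0.3, 0.3]]])
      print(merge_inverse_variance(est, err))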

  16. Identifying Differentially Abundant Metabolic Pathways in Metagenomic Datasets

    NASA Astrophysics Data System (ADS)

    Liu, Bo; Pop, Mihai

    Enabled by rapid advances in sequencing technology, metagenomic studies aim to characterize entire communities of microbes, bypassing the need for culturing individual bacterial members. One major goal of such studies is to identify specific functional adaptations of microbial communities to their habitats. Here we describe a powerful analytical method (MetaPath) that can identify differentially abundant pathways in metagenomic datasets, relying on a combination of metagenomic sequence data and prior metabolic pathway knowledge. We show that MetaPath outperforms other common approaches when evaluated on simulated datasets. We also demonstrate the power of our methods in analyzing two publicly available metagenomic datasets: a comparison of the gut microbiome of obese and lean twins, and a comparison of the gut microbiome of infant and adult subjects. We demonstrate that the subpathways identified by our method provide valuable insights into the biological activities of the microbiome.
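
    A much-simplified per-pathway test of differential abundance (not MetaPath itself, which accounts for pathway structure and identifies subpathways) can be phrased as a Fisher exact test on the read counts assigned to a pathway versus all other reads in each group; the counts below are placeholders.

      from scipy.stats import fisher_exact

      # Placeholder read counts for one metabolic pathway in two sample groups.
      pathway_obese, other_obese = 420, 99_580
      pathway_lean, other_lean = 250, 99_750

      table = [[pathway_obese, other_obese],
               [pathway_lean, other_lean]]
      odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
      print(f"odds ratio = {odds_ratio:.2f}, p = {p_value:.3g}")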

  17. BMDExpress Data Viewer ‐ a visualization tool to analyze BMDExpress datasets

    PubMed Central

    Kuo, Byron; Francina Webster, A.; Thomas, Russell S.

    2015-01-01

    Abstract Regulatory agencies increasingly apply benchmark dose (BMD) modeling to determine points of departure for risk assessment. BMDExpress applies BMD modeling to transcriptomic datasets to identify transcriptional BMDs. However, graphing and analytical capabilities within BMDExpress are limited, and the analysis of output files is challenging. We developed a web‐based application, BMDExpress Data Viewer (http://apps.sciome.com:8082/BMDX_Viewer/), for visualizing and graphing BMDExpress output files. The application consists of “Summary Visualization” and “Dataset Exploratory” tools. Through analysis of transcriptomic datasets of the toxicants furan and 4,4′‐methylenebis(N,N‐dimethyl)benzenamine, we demonstrate that the “Summary Visualization Tools” can be used to examine distributions of gene and pathway BMD values, and to derive a potential point of departure value based on summary statistics. By applying filters on enrichment P‐values and minimum number of significant genes, the “Functional Enrichment Analysis” tool enables the user to select biological processes or pathways that are selectively perturbed by chemical exposure and identify the related BMD. The “Multiple Dataset Comparison” tool enables comparison of gene and pathway BMD values across multiple experiments (e.g., across timepoints or tissues). The “BMDL‐BMD Range Plotter” tool facilitates the observation of BMD trends across biological processes or pathways. Through our case studies, we demonstrate that BMDExpress Data Viewer is a useful tool to visualize, explore and analyze BMDExpress output files. Visualizing the data in this manner enables rapid assessment of data quality, model fit, doses of peak activity, most sensitive pathway perturbations and other metrics that will be useful in applying toxicogenomics in risk assessment. © 2015 Her Majesty the Queen in Right of Canada. Journal of Applied Toxicology published by John Wiley & Sons, Ltd. PMID:26671443

  18. Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning.

    PubMed

    Shin, Hoo-Chang; Roth, Holger R; Gao, Mingchen; Lu, Le; Xu, Ziyue; Nogues, Isabella; Yao, Jianhua; Mollura, Daniel; Summers, Ronald M

    2016-05-01

    Remarkable progress has been made in image recognition, primarily due to the availability of large-scale annotated datasets and deep convolutional neural networks (CNNs). CNNs enable learning data-driven, highly representative, hierarchical image features from sufficient training data. However, obtaining datasets as comprehensively annotated as ImageNet in the medical imaging domain remains a challenge. There are currently three major techniques that successfully employ CNNs to medical image classification: training the CNN from scratch, using off-the-shelf pre-trained CNN features, and conducting unsupervised CNN pre-training with supervised fine-tuning. Another effective method is transfer learning, i.e., fine-tuning CNN models pre-trained from natural image dataset to medical image tasks. In this paper, we exploit three important, but previously understudied factors of employing deep convolutional neural networks to computer-aided detection problems. We first explore and evaluate different CNN architectures. The studied models contain 5 thousand to 160 million parameters, and vary in numbers of layers. We then evaluate the influence of dataset scale and spatial image context on performance. Finally, we examine when and why transfer learning from pre-trained ImageNet (via fine-tuning) can be useful. We study two specific computer-aided detection (CADe) problems, namely thoraco-abdominal lymph node (LN) detection and interstitial lung disease (ILD) classification. We achieve the state-of-the-art performance on the mediastinal LN detection, and report the first five-fold cross-validation classification results on predicting axial CT slices with ILD categories. Our extensive empirical evaluation, CNN model analysis and valuable insights can be extended to the design of high performance CAD systems for other medical imaging tasks. PMID:26886976
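
    The transfer-learning recipe evaluated in the paper (fine-tuning an ImageNet-pretrained CNN on a medical-imaging task) looks roughly like the following PyTorch sketch. The backbone, layer freezing, class count, and dummy batch are illustrative choices, not the paper's exact setup.

      import torch
      import torch.nn as nn
      from torchvision import models

      # Start from an ImageNet-pretrained backbone (torchvision >= 0.13 weights API).
      model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

      # Freeze the pretrained layers and fine-tune only the new classification head.
      for param in model.parameters():
          param.requires_grad = False
      model.fc = nn.Linear(model.fc.in_features, 2)  # e.g., LN vs. non-LN candidate

      optimizer = torch.optim.SGD(model.fc.parameters(), lr=1e-3, momentum=0.9)
      criterion = nn.CrossEntropyLoss()

      # One illustrative training step on a dummy batch of 224x224 patches.
      images = torch.randn(8, 3, 224, 224)
      labels = torch.randint(0, 2, (8,))
      optimizer.zero_grad()
      loss = criterion(model(images), labels)
      loss.backward()
      optimizer.step()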

  20. Benchmark three-dimensional eye-tracking dataset for visual saliency prediction on stereoscopic three-dimensional video

    NASA Astrophysics Data System (ADS)

    Banitalebi-Dehkordi, Amin; Nasiopoulos, Eleni; Pourazad, Mahsa T.; Nasiopoulos, Panos

    2016-01-01

    Visual attention models (VAMs) predict the location of image or video regions that are most likely to attract human attention. Although saliency detection is well explored for two-dimensional (2-D) image and video content, only a few attempts have been made to design three-dimensional (3-D) saliency prediction models. Newly proposed 3-D VAMs have to be validated over large-scale video saliency prediction datasets, which also contain results of eye-tracking information. There are several publicly available eye-tracking datasets for 2-D image and video content. In the case of 3-D, however, there is still a need for large-scale video saliency datasets for the research community for validating different 3-D VAMs. We introduce a large-scale dataset containing eye-tracking data collected from 24 subjects who viewed 61 stereoscopic 3-D videos (and their 2-D versions) in a free-viewing test. We evaluate the performance of the existing saliency detection methods over the proposed dataset. In addition, we created an online benchmark for validating the performance of the existing 2-D and 3-D VAMs and facilitating the addition of new VAMs to the benchmark. Our benchmark currently contains 50 different VAMs.
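
    Validating a VAM against eye-tracking data usually reduces to scoring the predicted saliency map against recorded fixations, for example with an AUC-style metric. The simplified sketch below is illustrative and is not the benchmark's exact evaluation protocol.

      import numpy as np
      from sklearn.metrics import roc_auc_score

      def saliency_auc(saliency_map, fixation_mask):
          """AUC of a saliency map treated as a classifier of fixated pixels."""
          return roc_auc_score(fixation_mask.ravel().astype(int),
                               saliency_map.ravel())

      # Placeholder 4x4 example: one predicted map and a binary fixation mask.
      rng = np.random.default_rng(0)
      saliency = rng.random((4, 4))
      fixations = np.zeros((4, 4), dtype=bool)
      fixations[1, 2] = fixations[2, 2] = True
      print(f"AUC = {saliency_auc(saliency, fixations):.3f}")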