Sample records for national hydrography dataset

  1. NATIONAL HYDROGRAPHY DATASET

    EPA Science Inventory

    Resource Purpose: The National Hydrography Dataset (NHD) is a comprehensive set of digital spatial data that contains information about surface water features such as lakes, ponds, streams, rivers, springs and wells. Within the NHD, surface water features are combined to fo...

  2. The National Hydrography Dataset

    USGS Publications Warehouse

    1999-01-01

    The National Hydrography Dataset (NHD) is a newly combined dataset that provides hydrographic data for the United States. The NHD is the culmination of recent cooperative efforts of the U.S. Environmental Protection Agency (USEPA) and the U.S. Geological Survey (USGS). It combines elements of USGS digital line graph (DLG) hydrography files and the USEPA Reach File (RF3). The NHD supersedes RF3 and DLG files by incorporating them, not by replacing them. Users of RF3 or DLG files will find the same data in a new, more flexible format. They will find that the NHD is familiar but greatly expanded and refined. The DLG files contribute a national coverage of millions of features, including water bodies such as lakes and ponds, linear water features such as streams and rivers, and also point features such as springs and wells. These files provide standardized feature types, delineation, and spatial accuracy. From RF3, the NHD acquires hydrographic sequencing, upstream and downstream navigation for modeling applications, and reach codes. The reach codes provide a way to integrate data from organizations at all levels by linking the data to this nationally consistent hydrographic network. The feature names are from the Geographic Names Information System (GNIS). The NHD provides comprehensive coverage of hydrographic data for the United States. Some of the anticipated end-user applications of the NHD are multiuse hydrographic modeling and water-quality studies of fish habitats. Although based on 1:100,000-scale data, the NHD is planned so that it can incorporate and encourage the development of the higher resolution data that many users require. The NHD can be used to promote the exchange of data between users at the national, State, and local levels. Many users will benefit from the NHD and will want to contribute to the dataset as well.

  3. National Hydrography Dataset (NHD)

    USGS Publications Warehouse

    2001-01-01

    The National Hydrography Dataset (NHD) is a feature-based database that interconnects and uniquely identifies the stream segments or reaches that make up the nation's surface water drainage system. NHD data was originally developed at 1:100,000 scale and exists at that scale for the whole country. High resolution NHD adds detail to the original 1:100,000-scale NHD. (Data for Alaska, Puerto Rico and the Virgin Islands was developed at high-resolution, not 1:100,000 scale.) Like the 1:100,000-scale NHD, high resolution NHD contains reach codes for networked features and isolated lakes, flow direction, names, stream level, and centerline representations for areal water bodies. Reaches are also defined to represent waterbodies and the approximate shorelines of the Great Lakes, the Atlantic and Pacific Oceans and the Gulf of Mexico. The NHD also incorporates the National Spatial Data Infrastructure framework criteria set out by the Federal Geographic Data Committee.

  4. National Hydrography Dataset Plus (NHDPlus)

    EPA Pesticide Factsheets

    The NHDPlus Version 1.0 is an integrated suite of application-ready geospatial data sets that incorporate many of the best features of the National Hydrography Dataset (NHD) and the National Elevation Dataset (NED). The NHDPlus includes a stream network (based on the 1:100,000-scale NHD), improved networking, naming, and value-added attributes (VAAs). NHDPlus also includes elevation-derived catchments (drainage areas) produced using a drainage-enforcement technique first broadly applied in New England, and thus dubbed the New-England Method. This technique involves burning in the 1:100,000-scale NHD and, when available, building walls using the national Watershed Boundary Dataset (WBD). The resulting modified digital elevation model (HydroDEM) is used to produce hydrologic derivatives that agree with the NHD and WBD. An interdisciplinary team from the U.S. Geological Survey (USGS), U.S. Environmental Protection Agency (USEPA), and contractors has, over the last two years, found this method to produce the best quality NHD catchments using an automated process. The VAAs include greatly enhanced capabilities for upstream and downstream navigation, analysis, and modeling. Examples include: retrieve all flowlines (predominantly confluence-to-confluence stream segments) and catchments upstream of a given flowline using queries rather than by slower flowline-by-flowline navigation; retrieve flowlines by stream order; subset a stream level path sorted in hydrologic order for st...
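
The set-based upstream retrieval that the VAAs enable can be sketched with a toy table. The hydroseq/dnhydroseq field names mirror NHDPlus value-added attributes, but the rows and the code here are invented for illustration, not the NHDPlus implementation:

```python
from collections import defaultdict, deque

# Toy value-added-attribute table. dnhydroseq points at the
# downstream neighbor's hydroseq; 0 marks the outlet. Values are
# invented for illustration.
VAA = {
    10: 0,   # outlet flowline
    20: 10,
    30: 10,
    40: 20,
    50: 20,
}

def upstream(seed, vaa):
    """Return every hydroseq upstream of (and including) seed."""
    # Invert the downstream pointers once, then walk with a queue:
    # a set-based retrieval rather than flowline-by-flowline navigation.
    ups = defaultdict(list)
    for hseq, dn in vaa.items():
        ups[dn].append(hseq)
    found, queue = {seed}, deque([seed])
    while queue:
        for up in ups[queue.popleft()]:
            if up not in found:
                found.add(up)
                queue.append(up)
    return found

print(sorted(upstream(20, VAA)))  # [20, 40, 50]
```

In a real NHDPlus workflow the same idea is expressed as attribute queries against the VAA table rather than an in-memory traversal.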

  5. Production of a national 1:1,000,000-scale hydrography dataset for the United States: feature selection, simplification, and refinement

    USGS Publications Warehouse

    Gary, Robin H.; Wilson, Zachary D.; Archuleta, Christy-Ann M.; Thompson, Florence E.; Vrabel, Joseph

    2009-01-01

    During 2006-09, the U.S. Geological Survey, in cooperation with the National Atlas of the United States, produced a 1:1,000,000-scale (1:1M) hydrography dataset comprising streams and waterbodies for the entire United States, including Puerto Rico and the U.S. Virgin Islands, for inclusion in the recompiled National Atlas. This report documents the methods used to select, simplify, and refine features in the 1:100,000-scale (1:100K) (1:63,360-scale in Alaska) National Hydrography Dataset to create the national 1:1M hydrography dataset. Custom tools and semi-automated processes were created to facilitate generalization of the 1:100K National Hydrography Dataset (1:63,360-scale in Alaska) to 1:1M on the basis of existing small-scale hydrography datasets. The first step in creating the new 1:1M dataset was to address feature selection and optimal data density in the streams network. Several existing methods were evaluated. The production method that was established for selecting features for inclusion in the 1:1M dataset uses a combination of the existing attributes and network in the National Hydrography Dataset and several of the concepts from the methods evaluated. The process for creating the 1:1M waterbodies dataset required a similar approach to that used for the streams dataset. Geometric simplification of features was the next step. Stream reaches and waterbodies indicated in the feature selection process were exported as new feature classes and then simplified using a geographic information system tool. The final step was refinement of the 1:1M streams and waterbodies. Refinement was done through the use of additional geographic information system tools.

  6. Hydrography change detection: the usefulness of surface channels derived from LiDAR DEMs for updating mapped hydrography

    USGS Publications Warehouse

    Poppenga, Sandra K.; Gesch, Dean B.; Worstell, Bruce B.

    2013-01-01

    The 1:24,000-scale high-resolution National Hydrography Dataset (NHD) mapped hydrography flow lines require regular updating because land surface conditions that affect surface channel drainage change over time. Historically, NHD flow lines were created by digitizing surface water information from aerial photography and paper maps. Using these same methods to update nationwide NHD flow lines is costly and inefficient; furthermore, these methods result in hydrography that lacks the horizontal and vertical accuracy needed for fully integrated datasets useful for mapping and scientific investigations. Effective methods for improving mapped hydrography employ change detection analysis of surface channels derived from light detection and ranging (LiDAR) digital elevation models (DEMs) and NHD flow lines. In this article, we describe the usefulness of surface channels derived from LiDAR DEMs for hydrography change detection to derive spatially accurate and time-relevant mapped hydrography. The methods employ analyses of horizontal and vertical differences between LiDAR-derived surface channels and NHD flow lines to define candidate locations of hydrography change. These methods alleviate the need to analyze and update the nationwide NHD for time relevant hydrography, and provide an avenue for updating the dataset where change has occurred.
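
The horizontal-difference part of such an analysis can be sketched as flagging mapped flow-line vertices that lie too far from any LiDAR-derived channel. The coordinates and the 15 m tolerance below are invented example values; real workflows compare full geometries in a GIS:

```python
import math

# Hypothetical vertex samples for one NHD flow line and one
# LiDAR-derived surface channel.
nhd_vertices   = [(0.0, 0.0), (10.0, 0.0), (40.0, 0.0)]
lidar_vertices = [(0.0, 1.0), (10.0, 2.0)]

def change_candidates(nhd, lidar, tol_m):
    """Flag NHD vertices whose nearest LiDAR channel vertex lies
    farther away than tol_m: candidate locations of hydrography change."""
    flagged = []
    for x, y in nhd:
        nearest = min(math.hypot(x - lx, y - ly) for lx, ly in lidar)
        if nearest > tol_m:
            flagged.append((x, y))
    return flagged

print(change_candidates(nhd_vertices, lidar_vertices, 15.0))  # [(40.0, 0.0)]
```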

  7. Evaluation of catchment delineation methods for the medium-resolution National Hydrography Dataset

    USGS Publications Warehouse

    Johnston, Craig M.; Dewald, Thomas G.; Bondelid, Timothy R.; Worstell, Bruce B.; McKay, Lucinda D.; Rea, Alan; Moore, Richard B.; Goodall, Jonathan L.

    2009-01-01

    Different methods for determining catchments (incremental drainage areas) for stream segments of the medium-resolution (1:100,000-scale) National Hydrography Dataset (NHD) were evaluated by the U.S. Geological Survey (USGS), in cooperation with the U.S. Environmental Protection Agency (USEPA). The NHD is a comprehensive set of digital spatial data that contains information about surface-water features (such as lakes, ponds, streams, and rivers) of the United States. The need for NHD catchments was driven primarily by the goal to estimate NHD streamflow and velocity to support water-quality modeling. The application of catchments for this purpose also demonstrates the broader value of NHD catchments for supporting landscape characterization and analysis. Five catchment delineation methods were evaluated. Four of the methods use topographic information for the delineation of the NHD catchments. These methods include the Raster Seeding Method; two variants of a method first used in a USGS New England study (one used the Watershed Boundary Dataset (WBD) and the other did not), termed the 'New England Methods'; and the Outlet Matching Method. For these topographically based methods, the elevation data source was the 30-meter (m) resolution National Elevation Dataset (NED), as this was the highest resolution available for the conterminous United States and Hawaii. The fifth method evaluated, the Thiessen Polygon Method, uses distance to the nearest NHD stream segments to determine catchment boundaries. Catchments were generated using each method for NHD stream segments within six hydrologically and geographically distinct Subbasins to evaluate the applicability of the method across the United States. The five methods were evaluated by comparing the resulting catchments with the boundaries and the computed area measurements available from several verification datasets that were developed independently using manual methods. The results of the evaluation indicated that the two...
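
The Thiessen Polygon idea (assigning area to the nearest stream segment) can be sketched in a few lines. Here the segments are reduced to single representative points and the "catchment" is a handful of cell centers; all names and coordinates are invented, and real delineation measures distance to full segment geometries on a grid:

```python
import math

# Hypothetical stream segments and grid-cell centers.
segments = {"A": (0.0, 0.0), "B": (10.0, 0.0)}
cells = [(1.0, 1.0), (9.0, 2.0), (4.0, 0.0)]

def nearest_segment(pt, segs):
    """Assign a point to the closest stream segment by Euclidean distance."""
    return min(segs, key=lambda s: math.hypot(pt[0] - segs[s][0],
                                              pt[1] - segs[s][1]))

print([nearest_segment(c, segments) for c in cells])  # ['A', 'B', 'A']
```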

  8. The Great Lakes Hydrography Dataset: Consistent, binational ...

    EPA Pesticide Factsheets

    Ecosystem-based management of the Laurentian Great Lakes, which spans both the United States and Canada, is hampered by the lack of consistent binational watersheds for the entire Basin. Using comparable data sources and consistent methods we developed spatially equivalent watershed boundaries for the binational extent of the Basin to create the Great Lakes Hydrography Dataset (GLHD). The GLHD consists of 5,589 watersheds for the entire Basin, covering a total area of approximately 547,967 km2, or about twice the 247,003 km2 surface water area of the Great Lakes. The GLHD improves upon existing watershed efforts by delineating watersheds for the entire Basin using consistent methods; enhancing the precision of watershed delineation by using recently developed flow direction grids that have been hydrologically enforced and vetted by provincial and federal water resource agencies; and increasing the accuracy of watershed boundaries by enforcing embayments, delineating watersheds on islands, and delineating watersheds for all tributaries draining to connecting channels. In addition, the GLHD is packaged in a publicly available geodatabase that includes synthetic stream networks, reach catchments, watershed boundaries, a broad set of attribute data for each tributary, and metadata documenting methodology. The GLHD provides a common set of watersheds and associated hydrography data for the Basin that will enhance binational efforts to protect and restore the Great

  9. NHDPlus (National Hydrography Dataset Plus)

    EPA Pesticide Factsheets

    NHDPlus is a geospatial, hydrologic framework dataset that is intended for use by geospatial analysts and modelers to support water resources related applications. NHDPlus was developed by the USEPA in partnership with the U.S. Geological Survey.

  10. The National Map hydrography data stewardship: what is it and why is it important?

    USGS Publications Warehouse

    Arnold, Dave

    2014-01-01

    The National Hydrography Dataset (NHD) and Watershed Boundary Dataset (WBD) were designed and populated by a large consortium of agencies involved in hydrography across the United States. The effort was led by the U.S. Geological Survey (USGS), the U.S. Environmental Protection Agency (EPA), and the Natural Resources Conservation Service (NRCS). The high-resolution NHD dataset, completed in 2007, is based on the USGS 7.5-minute series topographic maps at a scale of 1:24,000. There are now 26 million features in the NHD representing a 7.5 million mile stream network with over 6.5 million waterbodies. The six-level WBD, completed in 2010, is based on 1:24,000-scale data and contains over 23,000 watershed polygons. The NHD’s flow network, attribution, and linear referencing are used to conduct extensive scientific analyses. The NHD is ideal for cartographic applications such as the US Topo topographic map series, and also is available on the Geospatial Platform, which provides shared and trusted geospatial data, services, and applications for use by government agencies, their partners, and the public. The WBD watersheds are used by scientists and managers to identify discrete drainage areas. The ongoing maintenance of the NHD and WBD is essential for improving these datasets to meet the ever-increasing demand for currency, additional detail, and more significant attribution. The best source of information about changes in local hydrography is users closest to the data, such as State and local governments, as well as Federal land management agencies, and other users of the data. The need for local knowledge has led to the creation of a collaborative data stewardship process to revise and maintain the NHD.

  11. Feature pruning by upstream drainage area to support automated generalization of the United States National Hydrography Dataset

    USGS Publications Warehouse

    Stanislawski, L.V.

    2009-01-01

    The United States Geological Survey has been researching generalization approaches to enable multiple-scale display and delivery of geographic data. This paper presents automated methods to prune network and polygon features of the United States high-resolution National Hydrography Dataset (NHD) to lower resolutions. Feature-pruning rules, data enrichment, and partitioning are derived from knowledge of surface water, the NHD model, and associated feature specification standards. Relative prominence of network features is estimated from upstream drainage area (UDA). Network and polygon features are pruned by UDA and NHD reach code to achieve a drainage density appropriate for any less detailed map scale. Data partitioning maintains local drainage density variations that characterize the terrain. For demonstration, a 48-subbasin area of 1:24 000-scale NHD was pruned to 1:100 000-scale (100 K) and compared to a benchmark, the 100 K NHD. The coefficient of line correspondence (CLC) is used to evaluate how well pruned network features match the benchmark network. CLC values of 0.82 and 0.77 result from pruning with and without partitioning, respectively. The number of polygons that remain after pruning is about seven times that of the benchmark, but the area covered by the polygons that remain after pruning is only about 10% greater than the area covered by benchmark polygons. © 2009.
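
A length-based coefficient of line correspondence can be sketched as matched length over the total of matched, omitted, and committed lengths. This is an assumed form (the paper's exact definition may differ), and the lengths below are invented values chosen to land at the 0.82 reported for pruning with partitioning:

```python
def clc(matched_km, omitted_km, committed_km):
    """Assumed form of the coefficient of line correspondence:
    length of lines matching the benchmark, divided by the total of
    matching, omitted (missing from the pruned data), and committed
    (extra) line lengths."""
    return matched_km / (matched_km + omitted_km + committed_km)

# Invented example lengths.
print(round(clc(82.0, 9.0, 9.0), 2))  # 0.82
```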

  12. National hydrography dataset--linear referencing

    USGS Publications Warehouse

    Simley, Jeffrey; Doumbouya, Ariel

    2012-01-01

    Geospatial data normally have a certain set of standard attributes, such as an identification number, the type of feature, and name of the feature. These standard attributes are typically embedded into the default attribute table, which is directly linked to the geospatial features. However, it is impractical to embed too much information because it can create a complex, inflexible, and hard to maintain geospatial dataset. Many scientists prefer to create a modular, or relational, data design where the information about the features is stored and maintained separately, then linked to the geospatial data. For example, information about the water chemistry of a lake can be maintained in a separate file and linked to the lake. A Geographic Information System (GIS) can then relate the water chemistry to the lake and analyze it as one piece of information. For example, the GIS can select all lakes more than 50 acres, with turbidity greater than 1.5 milligrams per liter.
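
The relational pattern described above can be sketched in a few lines. The table layout, field names, and values are hypothetical, standing in for a GIS attribute table linked to a separately maintained chemistry file:

```python
# Hypothetical tables: the geospatial layer keeps only standard
# attributes; water chemistry is maintained separately and linked
# by the lake's identifier, as in a relational design.
lakes = [
    {"lake_id": "L1", "name": "Mirror Lake", "acres": 120.0},
    {"lake_id": "L2", "name": "Crane Pond",  "acres": 35.5},
    {"lake_id": "L3", "name": "Bear Lake",   "acres": 64.2},
]
chemistry = {
    "L1": {"turbidity_mg_per_l": 2.1},
    "L2": {"turbidity_mg_per_l": 3.0},
    "L3": {"turbidity_mg_per_l": 0.8},
}

def select_lakes(lakes, chemistry, min_acres, min_turbidity):
    """Join each lake to its chemistry record, then filter on both tables."""
    return [
        lake["name"]
        for lake in lakes
        if lake["acres"] > min_acres
        and chemistry[lake["lake_id"]]["turbidity_mg_per_l"] > min_turbidity
    ]

print(select_lakes(lakes, chemistry, 50, 1.5))  # ['Mirror Lake']
```

In a GIS the same join is expressed through a relate or relational database view, so the chemistry table can be maintained without touching the geospatial features.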

  13. Adapting generalization tools to physiographic diversity for the united states national hydrography dataset

    USGS Publications Warehouse

    Buttenfield, B.P.; Stanislawski, L.V.; Brewer, C.A.

    2011-01-01

    This paper reports on generalization and data modeling to create reduced scale versions of the National Hydrography Dataset (NHD) for dissemination through The National Map, the primary data delivery portal for the USGS. Our approach distinguishes local differences in physiographic factors, to demonstrate that knowledge about varying terrain (mountainous, hilly or flat) and varying climate (dry or humid) can support decisions about algorithms, parameters, and processing sequences to create generalized, smaller scale data versions which preserve distinct hydrographic patterns in these regions. We work with multiple subbasins of the NHD that provide a range of terrain and climate characteristics. Specifically tailored generalization sequences are used to create simplified versions of the high resolution data, which was compiled for 1:24,000 scale mapping. Results are evaluated cartographically and metrically against a medium resolution benchmark version compiled for 1:100,000, developing coefficients of linear and areal correspondence.

  14. NHDPlusHR: A national geospatial framework for surface-water information

    USGS Publications Warehouse

    Viger, Roland; Rea, Alan H.; Simley, Jeffrey D.; Hanson, Karen M.

    2016-01-01

    The U.S. Geological Survey is developing a new geospatial hydrographic framework for the United States, called the National Hydrography Dataset Plus High Resolution (NHDPlusHR), that integrates a diversity of the best-available information, robustly supports ongoing dataset improvements, enables hydrographic generalization to derive alternate representations of the network while maintaining feature identity, and supports modern scientific computing and Internet accessibility needs. This framework is based on the High Resolution National Hydrography Dataset, the Watershed Boundaries Dataset, and elevation from the 3-D Elevation Program, and will provide an authoritative, high precision, and attribute-rich geospatial framework for surface-water information for the United States. Using this common geospatial framework will provide a consistent basis for indexing water information in the United States, eliminate redundancy, and harmonize access to, and exchange of water information.

  15. Geospatial datasets for watershed delineation and characterization used in the Hawaii StreamStats web application

    USGS Publications Warehouse

    Rea, Alan; Skinner, Kenneth D.

    2012-01-01

    The U.S. Geological Survey Hawaii StreamStats application uses an integrated suite of raster and vector geospatial datasets to delineate and characterize watersheds. The geospatial datasets used to delineate and characterize watersheds on the StreamStats website, and the methods used to develop the datasets are described in this report. The datasets for Hawaii were derived primarily from 10 meter resolution National Elevation Dataset (NED) elevation models, and the National Hydrography Dataset (NHD), using a set of procedures designed to enforce the drainage pattern from the NHD into the NED, resulting in an integrated suite of elevation-derived datasets. Additional sources of data used for computing basin characteristics include precipitation, land cover, soil permeability, and elevation-derivative datasets. The report also includes links for metadata and downloads of the geospatial datasets.

  16. EMODNet Hydrography - Seabed Mapping - Developing a higher resolution digital bathymetry for the European seas

    NASA Astrophysics Data System (ADS)

    Schaap, Dick M. A.; Moussat, Eric

    2013-04-01

    In December 2007 the European Parliament and Council adopted the Marine Strategy Framework Directive (MSFD), which aims to achieve environmentally healthy marine waters by 2020. This Directive includes an initiative for an overarching European Marine Observation and Data Network (EMODNet). The EMODNet Hydrography - Seabed Mapping projects made good progress in developing the EMODNet Hydrography portal to provide overview of and access to available bathymetric survey datasets and to generate a harmonised digital bathymetry for Europe's sea basins. Up to the end of 2012, more than 8,400 bathymetric survey datasets, managed by 14 data centres from 9 countries and originating from 118 institutes, had been gathered and populated in the EMODNet Hydrography Data Discovery and Access service, adopting SeaDataNet standards. These datasets have been used as input for analysing and generating the EMODNet digital terrain model (DTM), so far for the following sea basins: the Greater North Sea, including the Kattegat; the English Channel and Celtic Seas; the Western and Central Mediterranean Sea and Ionian Sea; the Bay of Biscay, Iberian coast and North-East Atlantic; the Adriatic Sea; the Aegean - Levantine Sea (Eastern Mediterranean); and the Azores - Madeira EEZ. The Hydrography Viewing service gives users wide functionality for viewing and downloading the EMODNet digital bathymetry: water depth in gridded form on a DTM grid of a quarter of a minute of longitude and latitude; an option to view QC parameters of individual DTM cells and references to source data; an option to download DTM tiles in different formats (ESRI ASCII, XYZ, CSV, NetCDF (CF), GeoTiff and SD for the Fledermaus 3D viewer software); and an option for users to create their Personal Layer and to upload multibeam survey ASCII datasets for automatic processing into personal DTMs following the EMODNet standards. The NetCDF (CF) DTM files are fit for use in a special 3D Viewer software package which is based on the existing open

  17. Secondary analysis of national survey datasets.

    PubMed

    Boo, Sunjoo; Froelicher, Erika Sivarajan

    2013-06-01

    This paper describes the methodological issues associated with secondary analysis of large national survey datasets. Issues about survey sampling, data collection, and non-response and missing data in terms of methodological validity and reliability are discussed. Although reanalyzing large national survey datasets is an expedient and cost-efficient way of producing nursing knowledge, successful investigations require a methodological consideration of the intrinsic limitations of secondary survey analysis. Nursing researchers using existing national survey datasets should understand potential sources of error associated with survey sampling, data collection, and non-response and missing data. Although it is impossible to eliminate all potential errors, researchers using existing national survey datasets must be aware of the possible influence of errors on the results of the analyses. © 2012 The Authors. Japan Journal of Nursing Science © 2012 Japan Academy of Nursing Science.

  18. High performance computing to support multiscale representation of hydrography for the conterminous United States

    USGS Publications Warehouse

    Stanislawski, Larry V.; Liu, Yan; Buttenfield, Barbara P.; Survila, Kornelijus; Wendel, Jeffrey; Okok, Abdurraouf

    2016-01-01

    The National Hydrography Dataset (NHD) for the United States furnishes a comprehensive set of vector features representing the surface waters in the country (U.S. Geological Survey 2000). The high-resolution (HR) layer of the NHD largely comprises hydrographic features originally derived from 1:24,000-scale (24K) U.S. Topographic maps. However, in recent years (2009 to present) densified hydrographic feature content, from sources as large as 1:2,400, has been incorporated into some watersheds of the HR NHD within the conterminous United States to better support the needs of various local and state organizations. As such, the HR NHD is a multiresolution dataset with obvious data density variations because of scale changes. In addition, data density variations exist within the HR NHD that are particularly evident in the surface-water flow network (NHD flowlines) because of natural variations of local geographic conditions, and also because of unintentional compilation inconsistencies due to variations in data collection standards and climate conditions over the many years of 24K hydrographic data collection (US Geological Survey 1955).

  19. National Elevation Dataset

    USGS Publications Warehouse

    1999-01-01

    The National Elevation Dataset (NED) is a new raster product assembled by the U.S. Geological Survey (USGS). The NED is designed to provide national elevation data in a seamless form with a consistent datum, elevation unit, and projection. Data corrections were made in the NED assembly process to minimize artifacts, permit edge matching, and fill sliver areas of missing data.

  20. Analyzing legacy U.S. Geological Survey geochemical databases using GIS: applications for a national mineral resource assessment

    USGS Publications Warehouse

    Yager, Douglas B.; Hofstra, Albert H.; Granitto, Matthew

    2012-01-01

    This report emphasizes geographic information system analysis and the display of data stored in the legacy U.S. Geological Survey National Geochemical Database for use in mineral resource investigations. Geochemical analyses of soils, stream sediments, and rocks that are archived in the National Geochemical Database provide an extensive data source for investigating geochemical anomalies. A study area in the Egan Range of east-central Nevada was used to develop a geographic information system analysis methodology for two different geochemical datasets involving detailed (Bureau of Land Management Wilderness) and reconnaissance-scale (National Uranium Resource Evaluation) investigations. ArcGIS was used to analyze and thematically map geochemical information at point locations. Watershed-boundary datasets served as a geographic reference to relate potentially anomalous sample sites with hydrologic unit codes at varying scales. The National Hydrography Dataset was analyzed with Hydrography Event Management and ArcGIS Utility Network Analyst tools to delineate potential sediment-sample provenance along a stream network. These tools can be used to track potential upstream-sediment-contributing areas to a sample site. This methodology identifies geochemically anomalous sample sites, watersheds, and streams that could help focus mineral resource investigations in the field.

  1. Wind Integration National Dataset Toolkit | Grid Modernization | NREL

    Science.gov Websites

    The Wind Integration National Dataset (WIND) Toolkit is an update and expansion of the Eastern Wind Integration Data Set. The WIND Toolkit includes meteorological conditions and turbine power for more than

  2. Channel Classification across Arid West Landscapes in Support of OHW Delineation

    DTIC Science & Technology

    2013-01-01

    Figure 5. National Hydrography Dataset for Chinle Creek, AZ. ...the OHW boundary is determined by observing recent physical evidence subsequent to flow. Channel morphology and physical features associated with the... data from the National Hydrography Dataset (NHD) (USGS 2010). The NHD digital stream data were downloaded as a line

  3. Evaluation, Calibration and Comparison of the Precipitation-Runoff Modeling System (PRMS) National Hydrologic Model (NHM) Using Moderate Resolution Imaging Spectroradiometer (MODIS) and Snow Data Assimilation System (SNODAS) Gridded Datasets

    NASA Astrophysics Data System (ADS)

    Norton, P. A., II; Haj, A. E., Jr.

    2014-12-01

    The United States Geological Survey is currently developing a National Hydrologic Model (NHM) to support and facilitate coordinated and consistent hydrologic modeling efforts at the scale of the continental United States. As part of this effort, the Geospatial Fabric (GF) for the NHM was created. The GF is a database that contains parameters derived from datasets that characterize the physical features of watersheds. The GF was used to aggregate catchments and flowlines defined in the National Hydrography Dataset Plus dataset for more than 100,000 hydrologic response units (HRUs), and to establish initial parameter values for input to the Precipitation-Runoff Modeling System (PRMS). Many parameter values are adjusted in PRMS using an automated calibration process. Using these adjusted parameter values, the PRMS model estimated variables such as evapotranspiration (ET), potential evapotranspiration (PET), snow-covered area (SCA), and snow water equivalent (SWE). In order to evaluate the effectiveness of parameter calibration, and model performance in general, several satellite-based Moderate Resolution Imaging Spectroradiometer (MODIS) and Snow Data Assimilation System (SNODAS) gridded datasets including ET, PET, SCA, and SWE were compared to PRMS-simulated values. The MODIS and SNODAS data were spatially averaged for each HRU, and compared to PRMS-simulated ET, PET, SCA, and SWE values for each HRU in the Upper Missouri River watershed. Default initial GF parameter values and PRMS calibration ranges were evaluated. Evaluation results, and the use of MODIS and SNODAS datasets to update GF parameter values and PRMS calibration ranges, are presented and discussed.

  4. Digital Mapping and Environmental Characterization of National Wild and Scenic River Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    McManamay, Ryan A; Bosnall, Peter; Hetrick, Shelaine L

    2013-09-01

    Spatially accurate geospatial information is required to support decision-making regarding sustainable future hydropower development. Under a memorandum of understanding among several federal agencies, a pilot study was conducted to map a subset of National Wild and Scenic Rivers (WSRs) at a higher resolution and provide a consistent methodology for mapping WSRs across the United States and across agency jurisdictions. A subset of rivers (segments falling under the jurisdiction of the National Park Service) were mapped at a high resolution using the National Hydrography Dataset (NHD). The spatial extent and representation of river segments mapped at NHD scale were compared with the prevailing geospatial coverage mapped at a coarser scale. Accurately digitized river segments were linked to environmental attribution datasets housed within the Oak Ridge National Laboratory's National Hydropower Asset Assessment Program database to characterize the environmental context of WSR segments. The results suggest that both the spatial scale of hydrography datasets and the adherence to written policy descriptions are critical to accurately mapping WSRs. The environmental characterization provided information to deduce generalized trends in either the uniqueness or the commonness of environmental variables associated with WSRs. Although WSRs occur in a wide range of human-modified landscapes, environmental data layers suggest that they provide habitats important to terrestrial and aquatic organisms and recreation important to humans. Ultimately, the research findings herein suggest that there is a need for accurate, consistent mapping of the National WSRs across the agencies responsible for administering each river. Geospatial applications examining potential landscape and energy development require accurate sources of information, such as data layers that portray realistic spatial representations.

  5. Alaska national hydrography dataset positional accuracy assessment study

    USGS Publications Warehouse

    Arundel, Samantha; Yamamoto, Kristina H.; Constance, Eric; Mantey, Kim; Vinyard-Houx, Jeremy

    2013-01-01

    Initial visual assessments showed a wide range in the quality of fit between features in the NHD and these new image sources. No statistical analysis has been performed to actually quantify accuracy. Determining absolute accuracy is cost prohibitive (independent, well-defined test points must be collected), but quantitative analysis of relative positional error is feasible.
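    The relative positional error analysis that the abstract calls feasible can be sketched as a horizontal RMSE over matched point pairs. This is an illustrative sketch only, not the study's actual procedure; the 1.7308 multiplier for the 95% circular accuracy statistic follows the NSSDA convention and assumes roughly equal error in x and y.

    ```python
    import math

    # Illustrative sketch only: relative horizontal positional error between
    # matched point pairs (e.g., features matched between NHD and imagery).
    def positional_rmse(pairs):
        """Horizontal RMSE over ((x_ref, y_ref), (x_test, y_test)) pairs."""
        if not pairs:
            raise ValueError("need at least one matched pair")
        sq_errors = [(xr - xt) ** 2 + (yr - yt) ** 2
                     for (xr, yr), (xt, yt) in pairs]
        return math.sqrt(sum(sq_errors) / len(sq_errors))

    def nssda_horizontal(rmse):
        # NSSDA 95% circular accuracy statistic, assuming RMSE_x == RMSE_y.
        return 1.7308 * rmse
    ```

    Because only the offsets between corresponding features are needed, this avoids the costly collection of independent ground truth that absolute accuracy assessment requires.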

  6. Topographic and hydrographic GIS dataset for the Afghanistan Geological Survey and U.S. Geological Survey 2010 Minerals Project

    USGS Publications Warehouse

    Chirico, P.G.; Moran, T.W.

    2011-01-01

    This dataset contains a collection of 24 folders, each representing a specific U.S. Geological Survey area of interest (AOI; fig. 1), as well as datasets for AOI subsets. Each folder includes the extent, contours, digital elevation model (DEM), and hydrography of the corresponding AOI, organized into vector feature and raster datasets. The dataset comprises a geographic information system (GIS), which is available upon request from the USGS Afghanistan program Web site (http://afghanistan.cr.usgs.gov/minerals.php), along with maps of the 24 USGS AOIs.

  7. A new, accurate, global hydrography data for remote sensing and modelling of river hydrodynamics

    NASA Astrophysics Data System (ADS)

    Yamazaki, D.

    2017-12-01

    High-resolution hydrography data are an important baseline for remote sensing and modelling of river hydrodynamics, given that the spatial scale of river networks is much smaller than that of land hydrology or atmosphere/ocean circulations. For about 10 years, HydroSHEDS, developed from the SRTM3 DEM, has been the only available global-scale hydrography dataset. However, the data available at the time of HydroSHEDS development limited the quality of the represented river networks. Here, we developed a new global hydrography dataset using the latest geodata, such as the multi-error-removed elevation data (MERIT DEM), Landsat-based global water body data (GSWO & G3WBM), and the crowd-sourced open geography database (OpenStreetMap). The new hydrography dataset covers the entire globe (including boreal regions above 60N), represents the structure of the world's river networks in more detail, and contains consistent supplementary data layers such as hydrologically adjusted elevations and river channel width. At the AGU meeting, the development methodology, assessed quality, and potential applications of the new global hydrography dataset will be introduced.

  8. Comparison of Surface Flow Features from Lidar-Derived Digital Elevation Models with Historical Elevation and Hydrography Data for Minnehaha County, South Dakota

    USGS Publications Warehouse

    Poppenga, Sandra K.; Worstell, Bruce B.; Stoker, Jason M.; Greenlee, Susan K.

    2009-01-01

    The U.S. Geological Survey (USGS) has taken the lead in the creation of a valuable remote sensing product by incorporating digital elevation models (DEMs) derived from Light Detection and Ranging (lidar) into the National Elevation Dataset (NED), the elevation layer of 'The National Map'. High-resolution lidar-derived DEMs provide the accuracy needed to systematically quantify and fully integrate surface flow including flow direction, flow accumulation, sinks, slope, and a dense drainage network. In 2008, 1-meter resolution lidar data were acquired in Minnehaha County, South Dakota. The acquisition was a collaborative effort between Minnehaha County, the city of Sioux Falls, and the USGS Earth Resources Observation and Science (EROS) Center. With the newly acquired lidar data, USGS scientists generated high-resolution DEMs and surface flow features. This report compares lidar-derived surface flow features in Minnehaha County to 30- and 10-meter elevation data previously incorporated in the NED and ancillary hydrography datasets. Surface flow features generated from lidar-derived DEMs are consistently integrated with elevation and are important in understanding surface-water movement to better detect surface-water runoff, flood inundation, and erosion. Many topographic and hydrologic applications will benefit from the increased availability of accurate, high-quality, and high-resolution surface-water data. The remotely sensed data provide topographic information and data integration capabilities needed for meeting current and future human and environmental needs.
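    The flow-direction derivation mentioned above is commonly computed with the D8 method, in which each DEM cell drains toward its steepest-downhill neighbor. The sketch below is a generic illustration of that standard technique under stated assumptions, not the USGS production workflow; the function name and grid layout are hypothetical.

    ```python
    import math

    # Generic D8 flow-direction sketch (not the USGS production workflow):
    # each DEM cell drains toward its steepest-downhill neighbor.
    D8_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
                  (0, 1), (1, -1), (1, 0), (1, 1)]

    def d8_direction(dem, r, c):
        """Return (dr, dc) of steepest descent from cell (r, c), or None for a sink."""
        z = dem[r][c]
        best, best_slope = None, 0.0
        for dr, dc in D8_OFFSETS:
            rr, cc = r + dr, c + dc
            if 0 <= rr < len(dem) and 0 <= cc < len(dem[0]):
                slope = (z - dem[rr][cc]) / math.hypot(dr, dc)  # diagonals are farther
                if slope > best_slope:
                    best, best_slope = (dr, dc), slope
        return best  # None: no lower neighbor, i.e., a sink to fill before routing
    ```

    Flow accumulation then follows by counting, for each cell, how many upstream cells drain through it along these directions; cells returning None correspond to the sinks the abstract mentions, which are typically filled before a drainage network is derived.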

  9. National Hydropower Plant Dataset, Version 2 (FY18Q3)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Samu, Nicole; Kao, Shih-Chieh; O'Connor, Patrick

    The National Hydropower Plant Dataset, Version 2 (FY18Q3) is a geospatially comprehensive point-level dataset containing locations and key characteristics of U.S. hydropower plants that are currently either in the hydropower development pipeline (pre-operational), operational, withdrawn, or retired. These data are provided in GIS and tabular formats with corresponding metadata for each. In addition, we include access to download two versions of the National Hydropower Map, which was produced with these data (i.e., Map 1 displays the geospatial distribution and characteristics of all operational hydropower plants; Map 2 displays the geospatial distribution and characteristics of operational hydropower plants with pumped storage and mixed capabilities only). This dataset is a subset of ORNL's Existing Hydropower Assets data series, updated quarterly as part of ORNL's National Hydropower Asset Assessment Program.

  10. NOAA's National Water Model - Integration of National Water Model with Geospatial Data creating Water Intelligence

    NASA Astrophysics Data System (ADS)

    Clark, E. P.; Cosgrove, B.; Salas, F.

    2016-12-01

    As a significant step forward to transform NOAA's water prediction services, NOAA plans to implement a new National Water Model (NWM), Version 1.0, in August 2016. A continental-scale water resources model, the NWM is an evolution of the WRF-Hydro architecture developed by the National Center for Atmospheric Research (NCAR). The NWM will provide analyses and forecasts of flow for the 2.7 million stream reaches nationwide in the National Hydrography Dataset Plus v2 (NHDPlusV2), jointly developed by the USGS and EPA. The NWM also produces high-resolution water budget variables of snow, soil moisture, and evapotranspiration on a 1-km grid. NOAA's stakeholders require additional decision support applications to be built on these data. The Geo-intelligence division of the Office of Water Prediction is building new products and services that integrate output from the NWM with geospatial datasets, such as infrastructure and demographics, to better estimate the impacts of dynamic water resource states on community resiliency. This presentation will detail the methods and underlying information used to produce prototype water resources intelligence that is timely, actionable, and credible. Moreover, it will explore the NWM's capability to support sector-specific decision support services.

  11. The Wind Integration National Dataset (WIND) toolkit (Presentation)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Draxl, Caroline; NREL

    2014-01-01

    Regional wind integration studies require detailed wind power output data at many locations to perform simulations of how the power system will operate under high penetration scenarios. The wind datasets that serve as inputs into the study must realistically reflect the ramping characteristics, spatial and temporal correlations, and capacity factors of the simulated wind plants, as well as being time synchronized with available load profiles. As described in this presentation, the WIND Toolkit fulfills these requirements by providing a state-of-the-art national (US) wind resource, power production, and forecast dataset.

  12. The LANDFIRE Refresh strategy: updating the national dataset

    USGS Publications Warehouse

    Nelson, Kurtis J.; Connot, Joel A.; Peterson, Birgit E.; Martin, Charley

    2013-01-01

    The LANDFIRE Program provides comprehensive vegetation and fuel datasets for the entire United States. As with many large-scale ecological datasets, vegetation and landscape conditions must be updated periodically to account for disturbances, growth, and natural succession. The LANDFIRE Refresh effort was the first attempt to consistently update these products nationwide. It incorporated a combination of specific systematic improvements to the original LANDFIRE National data, remote sensing based disturbance detection methods, field collected disturbance information, vegetation growth and succession modeling, and vegetation transition processes. This resulted in the creation of two complete datasets for all 50 states: LANDFIRE Refresh 2001, which includes the systematic improvements, and LANDFIRE Refresh 2008, which includes the disturbance and succession updates to the vegetation and fuel data. The new datasets are comparable for studying landscape changes in vegetation type and structure over a decadal period, and provide the most recent characterization of fuel conditions across the country. The applicability of the new layers is discussed and the effects of using the new fuel datasets are demonstrated through a fire behavior modeling exercise using the 2011 Wallow Fire in eastern Arizona as an example.

  13. Potential for using regional and global datasets for national scale ecosystem service modelling

    NASA Astrophysics Data System (ADS)

    Maxwell, Deborah; Jackson, Bethanna

    2016-04-01

    Ecosystem service models are increasingly being used by planners and policy makers to inform policy development and decisions about national-level resource management. Such models allow ecosystem services to be mapped and quantified, and subsequent changes to these services to be identified and monitored. In some cases, the impact of small scale changes can be modelled at a national scale, providing more detailed information to decision makers about where to best focus investment and management interventions, while moving toward national goals and/or targets. National scale modelling often uses national (or local) data (for example, soils, landcover and topographical information) as input. However, there are some places where fine resolution and/or high quality national datasets cannot be easily obtained, or do not even exist. In the absence of such detailed information, regional or global datasets could be used as input to such models. There are questions, however, about the usefulness of these coarser resolution datasets and the extent to which inaccuracies in this data may degrade predictions of existing and potential ecosystem service provision and subsequent decision making. Using LUCI (the Land Utilisation and Capability Indicator) as an example predictive model, we examine how the reliability of predictions changes when national datasets of soil, landcover and topography are substituted with coarser scale regional and global datasets. We specifically look at how LUCI's predictions of water services, such as flood risk, flood mitigation, erosion and water quality, change when national data inputs are replaced by regional and global datasets. Using the Conwy catchment, Wales, as a case study, the land cover products compared are the UK's Land Cover Map (2007), the European CORINE land cover map and the ESA global land cover map. Soils products include the National Soil Map of England and Wales (NatMap) and the European

  14. A conceptual prototype for the next-generation national elevation dataset

    USGS Publications Warehouse

    Stoker, Jason M.; Heidemann, Hans Karl; Evans, Gayla A.; Greenlee, Susan K.

    2013-01-01

    In 2012 the U.S. Geological Survey's (USGS) National Geospatial Program (NGP) funded a study to develop a conceptual prototype for a new National Elevation Dataset (NED) design with expanded capabilities to generate and deliver a suite of bare earth and above ground feature information over the United States. This report details the research on identifying operational requirements based on prior research, evaluation of what is needed for the USGS to meet these requirements, and development of a possible conceptual framework that could potentially deliver the kinds of information that are needed to support NGP's partners and constituents. This report provides an initial proof-of-concept demonstration using an existing dataset, and recommendations for the future, to inform NGP's ongoing and future elevation program planning and management decisions. The demonstration shows that this type of functional process can robustly create derivatives from lidar point cloud data; however, more research needs to be done to see how well it extends to multiple datasets.

  15. Data Sources for the Analyses

    EPA Pesticide Factsheets

    Links are provided for the National Wetlands Inventory, National Hydrography Dataset, and the WorldClim-Global Climate Data source data websites. This dataset is associated with the following publication: Lane, C., and E. D'Amico. Identification of Putative Geographically Isolated Wetlands of the Conterminous United States. JAWRA. American Water Resources Association, Middleburg, VA, USA, online, (2016).

  16. National Elevation Dataset

    USGS Publications Warehouse

    ,

    2002-01-01

    The National Elevation Dataset (NED) is a new raster product assembled by the U.S. Geological Survey. NED is designed to provide National elevation data in a seamless form with a consistent datum, elevation unit, and projection. Data corrections were made in the NED assembly process to minimize artifacts, perform edge matching, and fill sliver areas of missing data. NED has a resolution of one arc-second (approximately 30 meters) for the conterminous United States, Hawaii, Puerto Rico, and the island territories, and a resolution of two arc-seconds for Alaska. NED data sources have a variety of elevation units, horizontal datums, and map projections. In the NED assembly process the elevation values are converted to decimal meters as a consistent unit of measure, NAD83 is consistently used as the horizontal datum, and all the data are recast in a geographic projection. Older DEMs produced by methods that are now obsolete have been filtered during the NED assembly process to minimize artifacts that are commonly found in data produced by these methods. Artifact removal greatly improves the quality of the slope, shaded-relief, and synthetic drainage information that can be derived from the elevation data. Figure 2 illustrates the results of this artifact removal filtering. NED processing also includes steps to adjust values where adjacent DEMs do not match well, and to fill sliver areas of missing data between DEMs. These processing steps ensure that NED has no void areas and that artificial discontinuities are minimized. The artifact removal filtering does not eliminate all artifacts; in areas where the only available DEM was produced by older methods, "striping" may still occur.
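    The unit-normalization step described above (converting source elevations to decimal meters) can be illustrated with a small sketch. This is a hypothetical illustration, not the actual NED assembly code; the U.S. survey foot definition (1200/3937 m) and the -9999 no-data value are assumptions chosen for the example.

    ```python
    # Hypothetical illustration of NED-style unit normalization: convert a
    # source DEM whose elevations are in feet to decimal meters, passing
    # no-data cells through unchanged.
    US_SURVEY_FOOT_M = 1200 / 3937   # exact U.S. survey foot, ~0.3048006 m
    NODATA = -9999.0                 # assumed no-data sentinel

    def feet_to_meters(grid, nodata=NODATA):
        return [[v if v == nodata else round(v * US_SURVEY_FOOT_M, 2)
                 for v in row] for row in grid]
    ```

    Handling no-data cells explicitly matters here: scaling the sentinel value would silently turn void areas into spurious elevations, which is exactly the kind of artifact the assembly process is designed to avoid.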

  17. Wind Integration National Dataset (WIND) Toolkit; NREL (National Renewable Energy Laboratory)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Draxl, Caroline; Hodge, Bri-Mathias

    A webinar about the Wind Integration National Dataset (WIND) Toolkit was presented by Bri-Mathias Hodge and Caroline Draxl on July 14, 2015. It was hosted by the Southern Alliance for Clean Energy. The toolkit is a grid integration data set that contains meteorological and power data at a 5-minute resolution across the continental United States for 7 years and hourly power forecasts.

  18. The National Map - Hydrography

    USGS Publications Warehouse

    ,

    2002-01-01

    Governments depend on a common set of base geographic information as a tool for economic and community development, land and natural resource management, and health and safety services. Emergency management and homeland security applications rely on this information. Private industry, nongovernmental organizations, and individual citizens use the same geographic data. Geographic information underpins an increasingly large part of the Nation's economy.

  19. Description of the National Hydrologic Model for use with the Precipitation-Runoff Modeling System (PRMS)

    USGS Publications Warehouse

    Regan, R. Steven; Markstrom, Steven L.; Hay, Lauren E.; Viger, Roland J.; Norton, Parker A.; Driscoll, Jessica M.; LaFontaine, Jacob H.

    2018-01-08

    This report documents several components of the U.S. Geological Survey National Hydrologic Model of the conterminous United States for use with the Precipitation-Runoff Modeling System (PRMS). It provides descriptions of the (1) National Hydrologic Model, (2) Geospatial Fabric for National Hydrologic Modeling, (3) PRMS hydrologic simulation code, (4) parameters and estimation methods used to compute spatially and temporally distributed default values as required by PRMS, (5) National Hydrologic Model Parameter Database, and (6) model extraction tool named Bandit. The National Hydrologic Model Parameter Database contains values for all PRMS parameters used in the National Hydrologic Model. The methods and national datasets used to estimate all the PRMS parameters are described. Some parameter values are derived from characteristics of topography, land cover, soils, geology, and hydrography using traditional Geographic Information System methods. Other parameters are set to long-established default values or to computed initial values. Additionally, methods (statistical, sensitivity, calibration, and algebraic) were developed to compute parameter values on the basis of a variety of nationally consistent datasets. Values in the National Hydrologic Model Parameter Database can periodically be updated on the basis of new parameter estimation methods and as additional national datasets become available. A companion ScienceBase resource provides a set of static parameter values as well as images of spatially distributed parameters associated with PRMS states and fluxes for each Hydrologic Response Unit across the conterminous United States.

  20. Accuracy assessment of the U.S. Geological Survey National Elevation Dataset, and comparison with other large-area elevation datasets: SRTM and ASTER

    USGS Publications Warehouse

    Gesch, Dean B.; Oimoen, Michael J.; Evans, Gayla A.

    2014-01-01

    The National Elevation Dataset (NED) is the primary elevation data product produced and distributed by the U.S. Geological Survey. The NED provides seamless raster elevation data of the conterminous United States, Alaska, Hawaii, U.S. island territories, Mexico, and Canada. The NED is derived from diverse source datasets that are processed to a specification with consistent resolutions, coordinate system, elevation units, and horizontal and vertical datums. The NED serves as the elevation layer of The National Map, and it provides basic elevation information for earth science studies and mapping applications in the United States and most of North America. An important part of supporting scientific and operational use of the NED is provision of thorough dataset documentation including data quality and accuracy metrics. The focus of this report is on the vertical accuracy of the NED and on comparison of the NED with other similar large-area elevation datasets, namely data from the Shuttle Radar Topography Mission (SRTM) and the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER).

  1. National Hydropower Plant Dataset, Version 1 (Update FY18Q2)

    DOE Data Explorer

    Samu, Nicole; Kao, Shih-Chieh; O'Connor, Patrick; Johnson, Megan; Uria-Martinez, Rocio; McManamay, Ryan

    2016-09-30

    The National Hydropower Plant Dataset, Version 1, Update FY18Q2, includes geospatial point-level locations and key characteristics of existing hydropower plants in the United States that are currently online. These data are a subset extracted from NHAAP’s Existing Hydropower Assets (EHA) dataset, which is a cornerstone of NHAAP’s EHA effort that has supported multiple U.S. hydropower R&D research initiatives related to market acceleration, environmental impact reduction, technology-to-market activities, and climate change impact assessment.

  2. Can a national dataset generate a nomogram for necrotizing enterocolitis onset?

    PubMed

    Gordon, P V; Clark, R; Swanson, J R; Spitzer, A

    2014-10-01

    Mother's own milk and donor human milk use is increasing as a means of necrotizing enterocolitis (NEC) prevention. Early onset of enteral feeding has been associated with improvement of many outcomes but has not been shown to reduce the incidence of NEC. Better definition of the window of risk for NEC by gestational strata should improve resource management with respect to donor human milk and enhance our understanding of NEC timing and pathogenesis. Our objective was to establish an NEC dataset of sufficient size and quality, then build a generalizable model of NEC onset from the dataset across gestational strata. We used de-identified data from the Pediatrix national dataset and filtered out all diagnostic confounders that could be identified by either specific diagnoses or logical exclusions (for example, dual diagnoses), with a specific focus on NEC and spontaneous intestinal perforation (SIP) as the outcomes of interest. The median day of onset was plotted against the gestational age for each of these diagnoses and analyzed for similarities and differences in the day of diagnosis. Onset time of medical NEC was inversely proportional to gestation in a linear relationship across all gestational ages. We found the medical NEC dataset displayed characteristics most consistent with a homogeneous disease entity, whereas there was a skew towards early presentation in the youngest gestation groups of surgical NEC (suggesting probable SIP contamination). Our national dataset demonstrates that NEC onset occurs in an inverse, stereotypic, linear relationship with gestational age at birth. Medical NEC is the most reliable sub-cohort for the purpose of determining the temporal window of NEC risk.

  3. The French Muséum national d'histoire naturelle vascular plant herbarium collection dataset

    NASA Astrophysics Data System (ADS)

    Le Bras, Gwenaël; Pignal, Marc; Jeanson, Marc L.; Muller, Serge; Aupic, Cécile; Carré, Benoît; Flament, Grégoire; Gaudeul, Myriam; Gonçalves, Claudia; Invernón, Vanessa R.; Jabbour, Florian; Lerat, Elodie; Lowry, Porter P.; Offroy, Bérangère; Pimparé, Eva Pérez; Poncy, Odile; Rouhan, Germinal; Haevermans, Thomas

    2017-02-01

    We provide a quantitative description of the French national herbarium vascular plants collection dataset. Held at the Muséum national d'histoire naturelle, Paris, it currently comprises records for 5,400,000 specimens, representing 90% of the estimated total of specimens. Ninety nine percent of the specimen entries are linked to one or more images and 16% have field-collecting information available. This major botanical collection represents the results of over three centuries of exploration and study. The sources of the collection are global, with a strong representation for France, including overseas territories, and former French colonies. The compilation of this dataset was made possible through numerous national and international projects, the most important of which was linked to the renovation of the herbarium building. The vascular plant collection is actively expanding today, hence the continuous growth exhibited by the dataset, which can be fully accessed through the GBIF portal or the MNHN database portal (available at: https://science.mnhn.fr/institution/mnhn/collection/p/item/search/form). This dataset is a major source of data for systematics, global plants macroecological studies or conservation assessments.

  4. The French Muséum national d'histoire naturelle vascular plant herbarium collection dataset.

    PubMed

    Le Bras, Gwenaël; Pignal, Marc; Jeanson, Marc L; Muller, Serge; Aupic, Cécile; Carré, Benoît; Flament, Grégoire; Gaudeul, Myriam; Gonçalves, Claudia; Invernón, Vanessa R; Jabbour, Florian; Lerat, Elodie; Lowry, Porter P; Offroy, Bérangère; Pimparé, Eva Pérez; Poncy, Odile; Rouhan, Germinal; Haevermans, Thomas

    2017-02-14

    We provide a quantitative description of the French national herbarium vascular plants collection dataset. Held at the Muséum national d'histoire naturelle, Paris, it currently comprises records for 5,400,000 specimens, representing 90% of the estimated total of specimens. Ninety nine percent of the specimen entries are linked to one or more images and 16% have field-collecting information available. This major botanical collection represents the results of over three centuries of exploration and study. The sources of the collection are global, with a strong representation for France, including overseas territories, and former French colonies. The compilation of this dataset was made possible through numerous national and international projects, the most important of which was linked to the renovation of the herbarium building. The vascular plant collection is actively expanding today, hence the continuous growth exhibited by the dataset, which can be fully accessed through the GBIF portal or the MNHN database portal (available at: https://science.mnhn.fr/institution/mnhn/collection/p/item/search/form). This dataset is a major source of data for systematics, global plants macroecological studies or conservation assessments.

  5. The French Muséum national d’histoire naturelle vascular plant herbarium collection dataset

    PubMed Central

    Le Bras, Gwenaël; Pignal, Marc; Jeanson, Marc L.; Muller, Serge; Aupic, Cécile; Carré, Benoît; Flament, Grégoire; Gaudeul, Myriam; Gonçalves, Claudia; Invernón, Vanessa R.; Jabbour, Florian; Lerat, Elodie; Lowry, Porter P.; Offroy, Bérangère; Pimparé, Eva Pérez; Poncy, Odile; Rouhan, Germinal; Haevermans, Thomas

    2017-01-01

    We provide a quantitative description of the French national herbarium vascular plants collection dataset. Held at the Muséum national d’histoire naturelle, Paris, it currently comprises records for 5,400,000 specimens, representing 90% of the estimated total of specimens. Ninety nine percent of the specimen entries are linked to one or more images and 16% have field-collecting information available. This major botanical collection represents the results of over three centuries of exploration and study. The sources of the collection are global, with a strong representation for France, including overseas territories, and former French colonies. The compilation of this dataset was made possible through numerous national and international projects, the most important of which was linked to the renovation of the herbarium building. The vascular plant collection is actively expanding today, hence the continuous growth exhibited by the dataset, which can be fully accessed through the GBIF portal or the MNHN database portal (available at: https://science.mnhn.fr/institution/mnhn/collection/p/item/search/form). This dataset is a major source of data for systematics, global plants macroecological studies or conservation assessments. PMID:28195585

  6. Stream ecological condition modeling at the reach and the hydrologic unit (HUC) scale: A look at model performance and mapping

    EPA Science Inventory

    The National Hydrography and updated Watershed Boundary Datasets provide a ready-made framework for hydrographic modeling. Determining particular stream reaches or watersheds in poor ecological condition across large regions is an essential goal for monitoring and management. T...

  7. USGS Mineral Resources Program; national maps and datasets for research and land planning

    USGS Publications Warehouse

    Nicholson, S.W.; Stoeser, D.B.; Ludington, S.D.; Wilson, Frederic H.

    2001-01-01

    The U.S. Geological Survey, the Nation’s leader in producing and maintaining earth science data, serves as an advisor to Congress, the Department of the Interior, and many other Federal and State agencies. Nationwide datasets that are easily available and of high quality are critical for addressing a wide range of land-planning, resource, and environmental issues. Four types of digital databases (geological, geophysical, geochemical, and mineral occurrence) are being compiled and upgraded by the Mineral Resources Program on regional and national scales to meet these needs. Where existing data are incomplete, new data are being collected to ensure national coverage. Maps and analyses produced from these databases provide basic information essential for mineral resource assessments and environmental studies, as well as fundamental information for regional and national land-use studies. Maps and analyses produced from the databases are instrumental to ongoing basic research, such as the identification of mineral deposit origins, determination of regional background values of chemical elements with known environmental impact, and study of the relationships between toxic elements or mining practices to human health. As datasets are completed or revised, the information is made available through a variety of media, including the Internet. Much of the available information is the result of cooperative activities with State and other Federal agencies. The upgraded Mineral Resources Program datasets make geologic, geophysical, geochemical, and mineral occurrence information at the state, regional, and national scales available to members of Congress, State and Federal government agencies, researchers in academia, and the general public. The status of the Mineral Resources Program datasets is outlined below.

  8. Multi-scale interactions between local hydrography, seabed topography, and community assembly on cold-water coral reefs

    NASA Astrophysics Data System (ADS)

    Henry, L.-A.; Moreno Navas, J.; Roberts, J. M.

    2013-04-01

    We investigated how interactions between hydrography, topography and species ecology influence the assembly of species and functional traits across multiple spatial scales of a cold-water coral reef seascape. In a novel approach for these ecosystems, we used a spatially resolved complex three-dimensional flow model of hydrography to help explain assembly patterns. Forward-selection of distance-based Moran's eigenvector mapping (dbMEM) variables identified two submodels of spatial scales at which communities change: broad-scale (across reef) and fine-scale (within reef). Variance partitioning identified bathymetric and hydrographic gradients important in creating broad-scale assembly of species and traits. In contrast, fine-scale assembly was related more to processes that created spatially autocorrelated patches of fauna, such as philopatric recruitment in sessile fauna, and social interactions and food supply in scavenging detritivores and mobile predators. Our study shows how habitat modification of reef connectivity and hydrography by bottom fishing and renewable energy installations could alter the structure and function of an entire cold-water coral reef seascape.

  9. Landscape Metrics Arranged by Hydrological Proximity to Sites on Mississippi, Missouri, and Ohio Rivers

    EPA Science Inventory

    This work has been published to demonstrate an application that the authors made of the geospatial National Hydrography Dataset (NHDPlus) that was developed by Horizon Systems Corporation for the US EPA. NHDPlus was produced to enhance hydrological maps of the United States for ...

  10. Southwest Region Threatened, Endangered, and At-Risk Species Workshop: Managing Within Highly Variable Environments Hydrology and Ecology of Intermittent Stream and Dry Wash Ecosystems

    EPA Science Inventory

    Ephemeral (dry washes) and intermittent streams make up approximately 59% of all streams in the U.S. (excluding Alaska), and over 81% in the arid and semi-arid Southwest (Arizona, New Mexico, Nevada, Utah, Colorado and California) according to the National Hydrography Dataset. T...

  11. The need for a national LIDAR dataset

    USGS Publications Warehouse

    Stoker, Jason M.; Harding, David; Parrish, Jay

    2008-01-01

    On May 21st and 22nd 2008, the U.S. Geological Survey (USGS), the National Aeronautics and Space Administration (NASA), and the Association of American State Geologists (AASG) hosted the Second National Light Detection and Ranging (Lidar) Initiative Strategy Meeting at USGS Headquarters in Reston, Virginia. The USGS is taking the lead in cooperation with many partners to design and implement a future high-resolution National Lidar Dataset. Initial work is focused on determining viability, developing requirements and specifications, establishing what types of information contained in a lidar signal are most important, and identifying key stakeholders and their respective roles. In February 2007, USGS hosted the first National Lidar Initiative Strategy Meeting at USGS Headquarters in Virginia. The presentations and a published summary report from the first meeting can be found on the Center for Lidar Information Coordination and Knowledge (CLICK) Website: http://lidar.cr.usgs.gov. The first meeting demonstrated the public need for consistent lidar data at the national scale. The goals of the second meeting were to further expand on the ideas and information developed in the first meeting, to bring more stakeholders together, to both refine and expand on the requirements and capabilities needed, and to discuss an organizational and funding approach for an initiative of this magnitude. The approximately 200 participants represented Federal, State, local, commercial and academic interests. The second meeting included a public solicitation for presentations and posters to better democratize the workshop. All of the oral presentation abstracts that were submitted were accepted, and the 25 poster submissions augmented and expanded upon the oral presentations. The presentations from this second meeting, including audio, can be found on CLICK at http://lidar.cr.usgs.gov/national_lidar_2008.php. Based on the presentations and the discussion sessions, the following

  12. Hydrography and circulation west of Sardinia in June 2014

    NASA Astrophysics Data System (ADS)

    Knoll, Michaela; Borrione, Ines; Fiekas, Heinz-Volker; Funk, Andreas; Hemming, Michael P.; Kaiser, Jan; Onken, Reiner; Queste, Bastien; Russo, Aniello

    2017-11-01

    During the REP14-MED sea trial in June 2014, the hydrography and circulation west of Sardinia were observed by means of gliders, shipborne CTD (conductivity, temperature, depth) instruments, towed devices, and vessel-mounted ADCPs (acoustic Doppler current profilers); here they are presented and compared with previous knowledge. The circulation in this area has so far not been well known, and the hydrography is subject to long-term changes. Potential temperature, salinity, and potential density ranges as well as core values of the observed water masses were determined. Modified Atlantic Water (MAW), with potential density anomalies below 28.72 kg m⁻³, showed a salinity minimum of 37.93 at 50 dbar. Levantine Intermediate Water (LIW), with a salinity maximum of about 38.70 at 400 dbar, was observed within the range 28.72 < σΘ/(kg m⁻³) < 29.10. MAW and LIW showed slightly higher salinities than in previous investigations. During the trial, LIW covered the whole area from the Sardinian shelf to 7°15' E; only north of 40° N was it tied to the continental slope. Within the MAW, a cold and saline anticyclonic eddy was observed in the southern trial area. The strongest variability in temperature and salinity appeared around this eddy and in the southwestern part of the domain, where unusually low-salinity surface water entered the area towards the end of the experiment. An anticyclonic eddy of Winter Intermediate Water was recorded moving northward at 0.014 m s⁻¹. Geostrophic currents and water mass transports calculated across zonal and meridional transects showed good agreement with vessel-mounted ADCP measurements. Within the MAW, northward currents were observed over the shelf and offshore, while a southward transport of about 1.5 Sv occurred over the slope. A net northward transport of 0.38 Sv across the southern transect decreased to zero in the north. Within the LIW, northward transports of 0.6 Sv across the southern transects were mainly observed offshore, and decreased to
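    Transport figures in Sverdrups, like those quoted above, come from integrating velocity over a section's cross-sectional area. A minimal sketch for an idealized rectangular section (the velocity, width, and depth values below are illustrative, not from the cruise):

```python
def transport_sv(velocity_ms, width_km, depth_m):
    """Volume transport through a rectangular section, in Sverdrups (1 Sv = 1e6 m^3/s).

    Real calculations integrate velocity over depth- and
    distance-varying cells rather than a single rectangle.
    """
    area_m2 = width_km * 1e3 * depth_m  # section area in square metres
    return velocity_ms * area_m2 / 1e6

# e.g. a 0.1 m/s mean geostrophic flow through a 50 km wide, 300 m deep section
sv = transport_sv(0.1, 50, 300)  # about 1.5 Sv
```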

  13. Overview and Meteorological Validation of the Wind Integration National Dataset toolkit

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Draxl, C.; Hodge, B. M.; Clifton, A.

    2015-04-13

    The Wind Integration National Dataset (WIND) Toolkit described in this report fulfills these requirements, and constitutes a state-of-the-art national wind resource data set covering the contiguous United States from 2007 to 2013 for use in a variety of next-generation wind integration analyses and wind power planning. The toolkit is a wind resource data set, wind forecast data set, and wind power production and forecast data set derived from the Weather Research and Forecasting (WRF) numerical weather prediction model. WIND Toolkit data are available online for over 116,000 land-based and 10,000 offshore sites representing existing and potential wind facilities.

  14. Airborne Laser Hydrography II

    NASA Astrophysics Data System (ADS)

    Philpot, W.; Wozencraft, J.

    2016-02-01

    In 1985, Dr. Gary Guenther assembled the text "Airborne Laser Hydrography", which quickly became a heavily used manual and guide for scientists and engineers involved with airborne lidar bathymetry (ALB). It was a remarkable book that captured a snapshot of the state of the art of ALB, including historical developments, theoretical and modeling efforts as well as design characteristics and constraints, and ending with accuracy assessment and a discussion of design tradeoffs. Known familiarly as the "Blue Book", it served the community remarkably well for many years. At 30 years of age, it is still a valued reference, but unavoidably dated in a field that has developed rapidly and without pause over the intervening years. It is time for an update. The new text is an attempt by the ALB community to update and expand upon Guenther's text. Like the original, Blue Book II reviews the historical developments in ALB, extending them into the 21st century, and considers basic environmental water optical properties, theoretical developments, data processing, and performance evaluation. All have progressed dramatically in the past 30 years. This paper presents an outline of the new book and a description of its contents, with emphasis on the theoretical models of the lidar waveform and its propagation through, and interaction with, the water.

  15. Watershed Boundary Dataset for Mississippi

    USGS Publications Warehouse

    Wilson, K. Van; Clair, Michael G.; Turnipseed, D. Phil; Rebich, Richard A.

    2009-01-01

    The U.S. Geological Survey, in cooperation with the Mississippi Department of Environmental Quality, U.S. Department of Agriculture-Natural Resources Conservation Service, Mississippi Department of Transportation, U.S. Department of Agriculture-Forest Service, and the Mississippi Automated Resource Information System, developed a 1:24,000-scale Watershed Boundary Dataset for Mississippi, including watershed and subwatershed boundaries, codes, names, and areas. The Watershed Boundary Dataset for Mississippi provides a standard geographical framework for water-resources and selected land-resources planning. The original 8-digit subbasins (Hydrologic Unit Codes) were further subdivided into 10-digit watersheds (62.5 to 391 square miles (mi2)) and 12-digit subwatersheds (15.6 to 62.5 mi2); the exceptions were the Delta part of Mississippi and the Mississippi River inside levees, which were subdivided into 10-digit watersheds only. Also, large water bodies in the Mississippi Sound along the coast were not delineated as small as a typical 12-digit subwatershed. All of the data, including watershed and subwatershed boundaries, subdivision codes and names, and drainage-area data, are stored in a Geographic Information System database, which is available at: http://ms.water.usgs.gov/. This map shows information on drainage and hydrography in the form of U.S. Geological Survey hydrologic unit boundaries for water-resource 2-digit regions, 4-digit subregions, 6-digit basins (formerly called accounting units), 8-digit subbasins (formerly called cataloging units), 10-digit watersheds, and 12-digit subwatersheds in Mississippi. A description of the project study area, the methods used in the development of watershed and subwatershed boundaries for Mississippi, and results are presented in Wilson and others (2008). The data presented in this map and by Wilson and others (2008) supersede the data presented for Mississippi by Seaber and others (1987) and U.S. Geological Survey (1977).
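    Because the 2- through 12-digit hydrologic unit codes nest by prefix, every parent unit can be recovered by truncating a subwatershed code. A small helper illustrating the scheme (the sample code passed in is made up for illustration, not a real Mississippi HUC):

```python
def huc_levels(huc12: str) -> dict:
    """Map a 12-digit hydrologic unit code to its nested parent units.

    Each level is a prefix of the next: 2-digit region, 4-digit subregion,
    6-digit basin, 8-digit subbasin, 10-digit watershed, 12-digit subwatershed.
    """
    if len(huc12) != 12 or not huc12.isdigit():
        raise ValueError("expected a 12-digit numeric HUC")
    names = ["region", "subregion", "basin",
             "subbasin", "watershed", "subwatershed"]
    return {name: huc12[:2 * (i + 1)] for i, name in enumerate(names)}

levels = huc_levels("080302080102")  # hypothetical code for illustration
```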

  16. A dataset on the species composition of amphipods (Crustacea) in a Mexican marine national park: Alacranes Reef, Yucatan

    PubMed Central

    Simões, Nuno; Pech, Daniel

    2018-01-01

    Background: Alacranes Reef was declared a National Marine Park in 1994. Since then, many efforts have been made to inventory its biodiversity. However, groups such as amphipods have been underestimated or not considered when benthic invertebrates were inventoried. Here we present a dataset that contributes to the knowledge of benthic amphipods (Crustacea, Peracarida) from the inner lagoon habitats of the Alacranes Reef National Park, the largest coral reef ecosystem in the Gulf of Mexico. The dataset contains information on records collected from 2009 to 2011. Data are available through the Global Biodiversity Information Facility (GBIF). New information: A total of 110 amphipod species, distributed in 93 nominal species and 17 generic species belonging to 71 genera, 33 families, and three suborders, are presented here. This information represents the first online dataset of amphipods from the Alacranes Reef National Park. The biological material is currently deposited in the crustacean collection of the regional unit of the National Autonomous University of Mexico located at Sisal, Yucatan, Mexico (UAS-Sisal), and includes 588 data records with a total abundance of 6,551 organisms. The species inventory represents, until now, the richest fauna of benthic amphipods registered from any discrete coral reef ecosystem in Mexico. PMID:29416428

  17. A dataset on the species composition of amphipods (Crustacea) in a Mexican marine national park: Alacranes Reef, Yucatan.

    PubMed

    Paz-Ríos, Carlos E; Simões, Nuno; Pech, Daniel

    2018-01-01

    Alacranes Reef was declared as a National Marine Park in 1994. Since then, many efforts have been made to inventory its biodiversity. However, groups such as amphipods have been underestimated or not considered when benthic invertebrates were inventoried. Here we present a dataset that contributes to the knowledge of benthic amphipods (Crustacea, Peracarida) from the inner lagoon habitats from the Alacranes Reef National Park, the largest coral reef ecosystem in the Gulf of Mexico. The dataset contains information on records collected from 2009 to 2011. Data are available through Global Biodiversity Information Facility (GBIF). A total of 110 amphipod species distributed in 93 nominal species and 17 generic species, belonging to 71 genera, 33 families and three suborders are presented here. This information represents the first online dataset of amphipods from the Alacranes Reef National Park. The biological material is currently deposited in the crustacean collection from the regional unit of the National Autonomous University of Mexico located at Sisal, Yucatan, Mexico (UAS-Sisal). The biological material includes 588 data records with a total abundance of 6,551 organisms. The species inventory represents, until now, the richest fauna of benthic amphipods registered from any discrete coral reef ecosystem in Mexico.

  18. Dynamically consistent hydrography and absolute velocity in the eastern North Atlantic Ocean

    NASA Technical Reports Server (NTRS)

    Wunsch, Carl

    1994-01-01

    The problem of mapping a dynamically consistent hydrographic field and associated absolute geostrophic flow in the eastern North Atlantic between 24 deg and 36 deg N is related directly to the solution of the so-called thermocline equations. A nonlinear optimization problem involving Needler's P equation is solved to find the hydrography and resulting flow that minimizes the vertical mixing above about 1500 m in the ocean and is simultaneously consistent with the observations. A sharp minimum (at least in some dimensions) is found, apparently corresponding to a solution nearly conserving potential vorticity and with vertical eddy coefficient less than about 10(exp -5) sq m/s. Estimates of `residual' quantities such as eddy coefficients are extremely sensitive to slight modifications to the observed fields. Boundary conditions, vertical velocities, etc., are a product of the optimization and produce estimates differing quantitatively from prior ones relying directly upon observed hydrography. The results are generally insensitive to particular elements of the solution methodology, but many questions remain concerning the extent to which different synoptic sections can be asserted to represent the same ocean. The method can be regarded as a practical generalization of the beta spiral and geostrophic balance inverses for the estimate of absolute geostrophic flows. Numerous improvements to the methodology used in this preliminary attempt are possible.

  19. USDA National Nutrient Database for Standard Reference Dataset for What We Eat in America, NHANES (Survey-SR) 2013-2014

    USDA-ARS?s Scientific Manuscript database

    USDA National Nutrient Database for Standard Reference Dataset for What We Eat In America, NHANES (Survey-SR) provides the nutrient data for assessing dietary intakes from the national survey What We Eat In America, National Health and Nutrition Examination Survey (WWEIA, NHANES). The current versi...

  20. Development of National Map ontologies for organization and orchestration of hydrologic observations

    NASA Astrophysics Data System (ADS)

    Lieberman, J. E.

    2014-12-01

    Feature layers in the National Map program (TNM) are a fundamental context for much of the data collection and analysis conducted by the USGS and other governmental and nongovernmental organizations. Their computational usefulness, though, has been constrained by the lack of formal relationships besides superposition between TNM layers, as well as limited means of representing how TNM datasets relate to additional attributes, datasets, and activities. In the field of Geospatial Information Science, there has been a growing recognition of the value of semantic representation and technology for addressing these limitations, particularly in the face of burgeoning information volume and heterogeneity. Fundamental to this approach is the development of formal ontologies for concepts related to that information that can be processed computationally to enhance creation and discovery of new geospatial knowledge. They offer a means of making much of the presently innate knowledge about relationships in and between TNM features accessible for machine processing and distributed computation. A full and comprehensive ontology of all knowledge represented by TNM features is still impractical. The work reported here involves elaboration and integration of a number of small ontology design patterns (ODPs) that represent limited, discrete, but commonly accepted and broadly applicable physical theories for the behavior of TNM features representing surface water bodies and landscape surfaces and the connections between them. These ontology components are validated through use in applications for discovery and aggregation of water science observational data associated with National Hydrography Dataset features, and with features from the National Elevation Dataset (NED) and Watershed Boundary Dataset (WBD) that constrain water occurrence in the continental US. These applications emphasize workflows which are difficult or impossible to automate using existing data structures. Evaluation of the

  1. National Stream Quality Accounting Network and National Monitoring Network Basin Boundary Geospatial Dataset, 2008–13

    USGS Publications Warehouse

    Baker, Nancy T.

    2011-01-01

    This report and the accompanying geospatial data were created to assist in analysis and interpretation of water-quality data provided by the U.S. Geological Survey's National Stream Quality Accounting Network (NASQAN) and by the U.S. Coastal Waters and Tributaries National Monitoring Network (NMN), which is a cooperative monitoring program of Federal, regional, and State agencies. The report describes the methods used to develop the geospatial data, which was primarily derived from the National Watershed Boundary Dataset. The geospatial data contains polygon shapefiles of basin boundaries for 33 NASQAN and 5 NMN streamflow and water-quality monitoring stations. In addition, 30 polygon shapefiles of the closed and noncontributing basins contained within the NASQAN or NMN boundaries are included. Also included is a point shapefile of the NASQAN and NMN monitoring stations and associated basin and station attributes. Geospatial data for basin delineations, associated closed and noncontributing basins, and monitoring station locations are available at http://water.usgs.gov/GIS/metadata/usgswrd/XML/ds641_nasqan_wbd12.xml.

  2. Data layer integration for the national map of the united states

    USGS Publications Warehouse

    Usery, E.L.; Finn, M.P.; Starbuck, M.

    2009-01-01

    The integration of geographic data layers in multiple raster and vector formats, from many different organizations and at a variety of resolutions and scales, is a significant problem for The National Map of the United States being developed by the U.S. Geological Survey. Our research has examined data integration from a layer-based approach for five of The National Map data layers: digital orthoimages, elevation, land cover, hydrography, and transportation. An empirical approach has included visual assessment by a set of respondents with statistical analysis to establish the meaning of various types of integration. A separate theoretical approach with established hypotheses tested against actual data sets has resulted in an automated procedure for integration of specific layers and is being tested. The empirical analysis has established resolution bounds on meanings of integration with raster datasets and distance bounds for vector data. The theoretical approach has used a combination of theories on cartographic transformation and generalization, such as Töpfer's radical law, and additional research concerning optimum viewing scales for digital images to establish a set of guiding principles for integrating data of different resolutions.

  3. The Schema.org Datasets Schema: Experiences at the National Snow and Ice Data Center

    NASA Astrophysics Data System (ADS)

    Duerr, R.; Billingsley, B. W.; Harper, D.; Kovarik, J.

    2014-12-01

    Data discovery is still a major challenge for many users. Relevant data may be located anywhere, and there are currently no universal data registries. Often users start with a simple query through their web browser; but how do you get your data to actually show up near the top of the results? One relatively new way to accomplish this is to include schema.org dataset markup in your data pages. In theory, this provides web crawlers the additional information needed so that a query for data will preferentially return the pages that were marked up accordingly. The National Snow and Ice Data Center recently implemented an initial set of markup in the data set pages returned by its catalog. The schema.org Datasets data model, our process, the challenges encountered, and the results are described.
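    For context, schema.org dataset markup is JSON-LD embedded in a data set's landing page. A minimal snippet built as a Python dict (the name, description, and URL below are placeholders, not an actual NSIDC record):

```python
import json

# Placeholder schema.org/Dataset record; real markup would carry the
# catalog entry's actual title, abstract, landing URL, and keywords.
dataset_markup = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example Sea Ice Extent Dataset",
    "description": "Placeholder description so crawlers can index the data.",
    "url": "https://example.org/data/sea-ice-extent",
    "keywords": ["sea ice", "cryosphere"],
}

# Serialized form, ready to embed in a <script type="application/ld+json"> tag
jsonld = json.dumps(dataset_markup, indent=2)
```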

  4. Improvements in the spatial representation of lakes and reservoirs in the contiguous United States for the National Water Model

    NASA Astrophysics Data System (ADS)

    Khan, S.; Salas, F.; Sampson, K. M.; Read, L. K.; Cosgrove, B.; Li, Z.; Gochis, D. J.

    2017-12-01

    The representation of inland surface water bodies in distributed hydrologic models at the continental scale is a challenge. The National Water Model (NWM) utilizes the National Hydrography Dataset Plus Version 2 (NHDPlusV2) "waterbody" dataset to represent lakes and reservoirs. The "waterbody" layer is a comprehensive dataset that represents surface water bodies with common feature types such as lakes, ponds, reservoirs, estuaries, playas, and swamps/marshes. However, a major issue that remains unresolved even in the latest revision of NHDPlusV2 is inconsistency in waterbody digitization and delineation errors. Manually correcting the waterbody polygons is tedious and quickly becomes impossible for continental-scale hydrologic models such as the NWM. In this study, we improved the spatial representation of 6,802 lakes and reservoirs by analyzing 379,110 waterbodies in the contiguous United States (excluding the Laurentian Great Lakes). We performed a step-by-step process that integrates a set of geospatial analyses to identify, track, and correct the extent of lake and reservoir features that are larger than 0.75 km2. The following assumptions were applied while developing the new dataset: a) lakes and reservoirs cannot directly feed into each other; b) each waterbody must have one outlet; and c) a single lake or reservoir feature cannot have multiple parts. The majority of the NHDPlusV2 waterbody features in the original dataset are delineated correctly; however, approximately 3% of the lake and reservoir polygons were found to have topological errors and were corrected accordingly. It is important to fix these digitizing errors because the waterbody features are closely linked to the river topology. This new waterbody dataset will ensure that model-simulated water is directed into and through the lakes and reservoirs in a manner that supports the NWM code base and assumptions. The improved dataset will facilitate more effective integration of lakes
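    The consistency rules lend themselves to a simple screening pass over waterbody attributes. The sketch below uses made-up attribute records rather than real NHDPlusV2 geometry, and the field names are hypothetical:

```python
# Each record stands in for one waterbody polygon's attributes.
waterbodies = [
    {"comid": 1, "area_km2": 1.2, "n_parts": 1, "n_outlets": 1},
    {"comid": 2, "area_km2": 0.4, "n_parts": 1, "n_outlets": 1},
    {"comid": 3, "area_km2": 3.0, "n_parts": 2, "n_outlets": 1},
]

def screen(wbs, min_area=0.75):
    """Keep features >= min_area km^2; flag those violating the
    single-part / one-outlet rules for correction."""
    keep, flagged = [], []
    for wb in wbs:
        if wb["area_km2"] < min_area:
            continue  # below the size threshold used for the NWM dataset
        if wb["n_parts"] > 1 or wb["n_outlets"] != 1:
            flagged.append(wb["comid"])  # needs topological correction
        else:
            keep.append(wb["comid"])
    return keep, flagged

keep, flagged = screen(waterbodies)
```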

  5. Data-driven decision support for radiologists: re-using the National Lung Screening Trial dataset for pulmonary nodule management.

    PubMed

    Morrison, James J; Hostetter, Jason; Wang, Kenneth; Siegel, Eliot L

    2015-02-01

    Real-time mining of large research trial datasets enables the development of case-based clinical decision support tools. Several applicable research datasets exist, including the National Lung Screening Trial (NLST), a dataset unparalleled in size and scope for studying population-based lung cancer screening. Using these data, a clinical decision support tool was developed that matches patient demographics and lung nodule characteristics to a cohort of similar patients. The NLST dataset was converted into Structured Query Language (SQL) tables hosted on a web server, and a web-based JavaScript application was developed to perform real-time queries. JavaScript served as both the server-side and client-side language, allowing rapid development of a robust client interface and server-side data layer. Real-time data mining of user-specified patient cohorts achieved a rapid return of cohort cancer statistics and lung nodule distribution information. This system demonstrates the potential of individualized real-time data mining using large, high-quality clinical trial datasets to drive evidence-based clinical decision-making.
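    The cohort-matching idea can be sketched against an in-memory SQL table. The schema and tolerances below are invented for illustration; the actual NLST tables and the paper's JavaScript server are far richer:

```python
import sqlite3

# Hypothetical miniature of a trial table: age, smoking status,
# nodule diameter, and cancer outcome per screened patient.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodules (age INTEGER, smoker INTEGER, diam_mm REAL, cancer INTEGER)")
conn.executemany("INSERT INTO nodules VALUES (?, ?, ?, ?)",
                 [(62, 1, 8.0, 0), (65, 1, 9.5, 1), (58, 0, 4.0, 0)])

def cohort_stats(age, diam_mm, age_tol=5, diam_tol=2.0):
    """Match on age and nodule diameter within tolerances and return
    (cancer prevalence, cohort size) for the matched patients."""
    rate, n = conn.execute(
        "SELECT AVG(cancer), COUNT(*) FROM nodules "
        "WHERE ABS(age - ?) <= ? AND ABS(diam_mm - ?) <= ?",
        (age, age_tol, diam_mm, diam_tol)).fetchone()
    return rate, n
```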

  6. Carbon Dioxide, Hydrographic, and Chemical Data Obtained During the R/V Ronald H. Brown Repeat Hydrography Cruise in the Atlantic Ocean: CLIVAR CO2 Section A16S_2005 (11 January - 24 February, 2005)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kozyr, Alex

    This report presents methods, and analytical and quality control procedures, for the salinity, oxygen, nutrient, inorganic carbon, organic carbon, chlorofluorocarbon (CFC), and bomb 14C system parameters measured during the A16S_2005 cruise, which took place from January 11 to February 24, 2005, aboard the research vessel (R/V) Ronald H. Brown under the auspices of the National Oceanic and Atmospheric Administration (NOAA). The R/V Ronald H. Brown departed Punta Arenas, Chile, on January 11, 2005, and ended its cruise in Fortaleza, Brazil, on February 24, 2005. The research conducted was one of a series of repeat hydrography sections jointly funded by NOAA and the National Science Foundation as part of the CLIVAR/CO2/repeat hydrography/tracer program. Samples were taken from 36 depths at 121 stations. The data presented in this report include the analyses of water samples for total inorganic carbon (TCO2), fugacity of CO2 (fCO2), total alkalinity (TALK), pH, dissolved organic carbon (DOC), CFCs, and 14C, along with hydrographic and other chemical measurements. The R/V Ronald H. Brown A16S_2005 data set is available free of charge as a numeric data package (NDP) from the Carbon Dioxide Information Analysis Center (CDIAC). The NDP consists of the oceanographic data files and this printed documentation, which describes the procedures and methods used to obtain the data.

  7. Advancements in Wind Integration Study Data Modeling: The Wind Integration National Dataset (WIND) Toolkit; Preprint

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Draxl, C.; Hodge, B. M.; Orwig, K.

    2013-10-01

    Regional wind integration studies in the United States require detailed wind power output data at many locations to perform simulations of how the power system will operate under high-penetration scenarios. The wind data sets that serve as inputs into the study must realistically reflect the ramping characteristics, spatial and temporal correlations, and capacity factors of the simulated wind plants, as well as be time synchronized with available load profiles. The Wind Integration National Dataset (WIND) Toolkit described in this paper fulfills these requirements. A wind resource dataset, wind power production time series, and simulated forecasts from a numerical weather prediction model run on a nationwide 2-km grid at 5-min resolution will be made publicly available for more than 110,000 onshore and offshore wind power production sites.

  8. How robust is the pre-1931 National Climatic Data Center—climate divisional dataset? Examples from Georgia and Louisiana

    NASA Astrophysics Data System (ADS)

    Allard, Jason; Thompson, Clint; Keim, Barry D.

    2015-04-01

    The National Climatic Data Center's climate divisional dataset (CDD) is commonly used in climate change analyses. This dataset is a spatially continuous dataset for the conterminous USA from 1895 to the present. The CDD since 1931 is computed by averaging all available representative cooperative weather station data into a single monthly value for each of the 344 climate divisions of the conterminous USA, while pre-1931 data for climate divisions are derived from statewide averages using regression equations. This study examines the veracity of these pre-1931 data. All available Cooperative Observer Program (COOP) stations within each climate division in Georgia and Louisiana were averaged into a single monthly value for each month and each climate division from 1897 to 1930 to generate a divisional dataset (COOP DD), using similar methods to those used by the National Climatic Data Center to generate the post-1931 CDD. The reliability of the official CDD—derived from statewide averages—to produce temperature and precipitation means and trends prior to 1931 are then evaluated by comparing that dataset with the COOP DD with difference-of-means tests, correlations, and linear regression techniques. The CDD and the COOP DD are also compared to a divisional dataset derived from the United States Historical Climatology Network (USHCN) data (USHCN DD), with difference of means and correlation techniques, to demonstrate potential impacts of inhomogeneities within the CDD and the COOP DD. The statistical results, taken as a whole, not only indicate broad similarities between the CDD and COOP DD but also show that the CDD does not adequately portray pre-1931 temperature and precipitation in certain climate divisions within Georgia and Louisiana. In comparison with the USHCN DD, both the CDD and the COOP DD appear to be subject to biases that probably result from changing stations within climate divisions. As such, the CDD should be used judiciously for long-term studies

  9. Proceedings of the Fourth Laser Hydrography Symposium at Defence Research Centre and Royal Australian Navy Hydrographic Office

    NASA Astrophysics Data System (ADS)

    Penny, M. F.; Phillips, D. M.

    1981-03-01

    At this Symposium, research on laser hydrography and related development programs currently in progress in the United States of America, Canada, and Australia, were reported. The depth sounding systems described include the US Airborne Oceanographic Lidar and Hydrographic Airborne Laser Sounder, the Canadian Profiling Lidar Bathymeter, and the Australian Laser Airborne Depth Sounder. Other papers presented research on blue-green lasers, theoretical modelling, position fixing, and data processing.

  10. Hydrography and bottom boundary layer dynamics: Influence on inner shelf sediment mobility, Long Bay, North Carolina

    USGS Publications Warehouse

    Davis, L.A.; Leonard, L.A.; Snedden, G.A.

    2008-01-01

    This study examined the hydrography and bottom boundary-layer dynamics of two typical storm events affecting coastal North Carolina (NC): a hurricane, and the passage of two small consecutive extratropical storms during November 2005. Two upward-looking 1200-kHz Acoustic Doppler Current Profilers (ADCP) were deployed on the inner shelf in northern Long Bay, NC, at water depths of less than 15 m. Both instruments profiled the overlying water column in 0.35 m bins beginning at a height of 1.35 m above the bottom (mab). Simultaneous measurements of wind speed and direction, wave and current parameters, and acoustic backscatter were coupled with output from a bottom boundary layer (bbl) model to describe the hydrography and boundary layer conditions during each event. The bbl model also was used to quantify sediment transport in the boundary layer during each storm. Both study sites exhibited similar temporal variations in wave and current magnitude; however, wave heights during the November event were higher than waves associated with the hurricane. Near-bottom mean and subtidal currents, however, were of greater magnitude during the hurricane. Peak depth-integrated suspended sediment transport during the November event exceeded transport associated with the hurricane by 25-70%. Substantial spatial variations in sediment transport existed throughout both events. During both events, along-shelf sediment transport exceeded across-shelf transport and was related to the magnitude and direction of subtidal currents. Given the variations in sediment type across the bay, the complex shoreline configuration, and the local bathymetry, the sediment transport rates reported here are very site specific. However, the general hydrography associated with the two storms is representative of conditions across northern Long Bay. Since the beaches in the study area undergo frequent renourishment to counter the effects of beach erosion, the results of this study also are relevant to coastal

  11. Benchmark Dataset for Whole Genome Sequence Compression.

    PubMed

    C L, Biji; S Nair, Achuthsankar

    2017-01-01

    The research in DNA data compression lacks a standard dataset for testing compression tools specific to DNA. This paper argues that the current state of achievement in DNA compression cannot be benchmarked in the absence of a scientifically compiled whole-genome sequence dataset, and it proposes a benchmark dataset built using a multistage sampling procedure. Taking the genome sequences of organisms available in the National Center for Biotechnology Information (NCBI) as the universe, the proposed dataset selects 1,105 prokaryotes, 200 plasmids, 164 viruses, and 65 eukaryotes. This paper reports the results of running three established tools on the newly compiled dataset and shows that their strengths and weaknesses become evident only through comparison on a scientifically compiled benchmark dataset. The sample dataset and the respective links are available at https://sourceforge.net/projects/benchmarkdnacompressiondataset/.
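The kind of tool comparison the record describes can be sketched with Python's built-in general-purpose compressors standing in for DNA-specific tools; the sequence below is a made-up stand-in, not data from the benchmark dataset itself.

```python
import bz2
import lzma
import zlib

# Toy stand-in for a genome sequence; an actual benchmark run would use
# the compiled whole-genome dataset described in the record above.
sequence = ("ACGT" * 250 + "AAATTTCCCGGG" * 50).encode()

# General-purpose compressors stand in here for DNA-specific tools.
tools = {"zlib": zlib.compress, "bz2": bz2.compress, "lzma": lzma.compress}

# Compression ratio: original size divided by compressed size.
ratios = {name: len(sequence) / len(fn(sequence)) for name, fn in tools.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```

Ranking tools by such ratios is only meaningful when every tool runs on the same, representatively sampled input — which is exactly the gap the benchmark dataset is meant to close.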

  12. [German national consensus on wound documentation of leg ulcer : Part 1: Routine care - standard dataset and minimum dataset].

    PubMed

    Heyer, K; Herberger, K; Protz, K; Mayer, A; Dissemond, J; Debus, S; Augustin, M

    2017-09-01

    Standards for basic documentation and for documenting the course of treatment increase quality assurance and efficiency in health care. To date, no such standards for the treatment of patients with leg ulcers are available in Germany. The aim of the study was to develop standards for the routine documentation of patients with leg ulcers. This article presents the recommended variables of a "standard dataset" and a "minimum dataset". Consensus building took place among experts (n = 68) from 38 scientific societies, professional associations, insurers, and care networks. After a systematic international literature search, available standards were reviewed and supplemented with the expert group's own considerations. From 2012 to 2015, documentation standards were defined in multistage online voting rounds and in-person meetings. A consensus was achieved on 18 variables for the minimum dataset and 48 variables for the standard dataset over a total of seven meetings and nine online Delphi rounds. The datasets cover patient baseline data, general health status, wound characteristics, diagnostic and therapeutic interventions, patient-reported outcomes, nutrition, and education status. Based on this multistage, continuous decision-making process, a standard for measuring events in the routine care of patients with leg ulcers was developed.

  13. EnviroAtlas - Percent Stream Buffer Zone As Natural Land Cover for the Conterminous United States

    EPA Pesticide Factsheets

    This EnviroAtlas dataset shows, for each Watershed Boundary Dataset (WBD) 12-digit hydrologic unit (HUC) in the conterminous United States, the percentage of land area within a 30-meter buffer zone along the National Hydrography Dataset (NHD) high-resolution stream network, and along water bodies such as lakes and ponds that are connected via flow to the streams, that is classified as forest land cover, modified forest land cover, or natural land cover using the 2006 National Land Cover Dataset (NLCD). This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).
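The percent-of-buffer metric in this record reduces to a per-HUC zonal tally of land-cover cells inside the buffer. A minimal sketch, assuming made-up cell counts and an illustrative "natural" grouping of NLCD class codes (the dataset's actual grouping may differ):

```python
# Hypothetical tally of 30 m buffer-zone raster cells per HUC-12, keyed
# by NLCD class code. The codes and "natural" grouping are illustrative.
NATURAL = {41, 42, 43, 90, 95}  # e.g., forest and wetland classes

buffer_cells = {
    "051302060101": {41: 320, 42: 110, 82: 70},   # mostly forest
    "051302060102": {82: 400, 21: 100, 41: 50},   # mostly cultivated
}

def pct_natural(cells):
    """Percent of buffer-zone cells classed as natural land cover."""
    total = sum(cells.values())
    natural = sum(n for cls, n in cells.items() if cls in NATURAL)
    return 100.0 * natural / total

for huc, cells in buffer_cells.items():
    print(huc, round(pct_natural(cells), 1))
```

In production such tallies would come from intersecting the NLCD raster with the buffered NHD flowlines, but the summarization step is the same division of counts.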

  14. A logistic regression equation for estimating the probability of a stream in Vermont having intermittent flow

    USGS Publications Warehouse

    Olson, Scott A.; Brouillette, Michael C.

    2006-01-01

    A logistic regression equation was developed for estimating the probability of a stream flowing intermittently at unregulated, rural stream sites in Vermont. These determinations can be used for a wide variety of regulatory and planning efforts at the Federal, State, regional, county and town levels, including such applications as assessing fish and wildlife habitats, wetlands classifications, recreational opportunities, water-supply potential, waste-assimilation capacities, and sediment transport. The equation will be used to create a derived product for the Vermont Hydrography Dataset having the streamflow characteristic of 'intermittent' or 'perennial.' The Vermont Hydrography Dataset is Vermont's implementation of the National Hydrography Dataset and was created at a scale of 1:5,000 based on statewide digital orthophotos. The equation was developed by relating field-verified perennial or intermittent status of a stream site during normal summer low-streamflow conditions in the summer of 2005 to selected basin characteristics of naturally flowing streams in Vermont. The database used to develop the equation included 682 stream sites with drainage areas ranging from 0.05 to 5.0 square miles. When the 682 sites were observed, 126 were intermittent (had no flow at the time of the observation) and 556 were perennial (had flowing water at the time of the observation). The results of the logistic regression analysis indicate that the probability of a stream having intermittent flow in Vermont is a function of drainage area, elevation of the site, the ratio of basin relief to basin perimeter, and the areal percentage of well- and moderately well-drained soils in the basin. Using a probability cutpoint (a lower probability indicates the site has perennial flow and a higher probability indicates the site has intermittent flow) of 0.5, the logistic regression equation correctly predicted the perennial or intermittent status of 116 test sites 85 percent of the time.
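The classification scheme described above — a logistic probability compared against a 0.5 cutpoint — can be sketched as follows. The coefficients here are placeholders, not the report's fitted values, which are not reproduced in this abstract.

```python
import math

# Hypothetical coefficients; the report's fitted values are not given
# in this abstract, so these are illustrative only.
COEF = {
    "intercept": -0.5,
    "drainage_area_mi2": -1.2,     # larger basins tend toward perennial flow
    "elevation_ft": 0.001,
    "relief_to_perimeter": -0.8,
    "pct_well_drained_soil": 0.02,
}

def p_intermittent(site):
    """Logistic probability that a stream site has intermittent flow."""
    z = COEF["intercept"] + sum(COEF[k] * site[k] for k in site)
    return 1.0 / (1.0 + math.exp(-z))

def classify(site, cutpoint=0.5):
    # Probabilities above the cutpoint are classed as intermittent,
    # those below as perennial, mirroring the study's 0.5 cutpoint.
    return "intermittent" if p_intermittent(site) > cutpoint else "perennial"

site = {"drainage_area_mi2": 0.4, "elevation_ft": 1200.0,
        "relief_to_perimeter": 0.05, "pct_well_drained_soil": 60.0}
print(classify(site))
```

The choice of cutpoint trades false perennial calls against false intermittent calls; the study's reported 85-percent accuracy on 116 test sites is specific to the 0.5 cutpoint.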

  15. On the Utility of National Datasets and Resource Cost Models for Estimating Faculty Instructional Costs in Higher Education

    ERIC Educational Resources Information Center

    Morphew, Christopher; Baker, Bruce

    2007-01-01

    In this article, the authors present the results of a research study in which they used two national datasets to construct and examine a model that estimates relative faculty instructional costs for specific undergraduate degree programs and also identifies differences in these costs by region and institutional type. They conducted this research…

  16. Analysis of the uncertainty associated with national fossil fuel CO2 emissions datasets for use in the global Fossil Fuel Data Assimilation System (FFDAS) and carbon budgets

    NASA Astrophysics Data System (ADS)

    Song, Y.; Gurney, K. R.; Rayner, P. J.; Asefi-Najafabady, S.

    2012-12-01

    High resolution quantification of global fossil fuel CO2 emissions has become essential in research aimed at understanding the global carbon cycle and supporting the verification of international agreements on greenhouse gas emission reductions. The Fossil Fuel Data Assimilation System (FFDAS) was used to estimate global fossil fuel carbon emissions at 0.25-degree resolution from 1992 to 2010. FFDAS quantifies CO2 emissions based on areal population density, per capita economic activity, energy intensity, and carbon intensity. A critical constraint to this system is the estimation of national-scale fossil fuel CO2 emissions disaggregated into economic sectors. Furthermore, prior uncertainty estimation is an important aspect of the FFDAS, so objective techniques to quantify uncertainty for the national emissions are essential. There are several institutional datasets that quantify national carbon emissions, including British Petroleum (BP), the International Energy Agency (IEA), the Energy Information Administration (EIA), and the Carbon Dioxide Information and Analysis Center (CDIAC). These four datasets have been "harmonized" by Jordan Macknick for inter-comparison purposes (Macknick, Carbon Management, 2011). The harmonization attempted to generate consistency among the different institutional datasets via a variety of techniques such as reclassifying into consistent emitting categories, recalculating based on consistent emission factors, and converting into consistent units. These harmonized data form the basis of our uncertainty estimation. We summarized the maximum, minimum, and mean national carbon emissions for all the datasets from 1992 to 2010 and calculated key statistics highlighting the remaining differences among the harmonized datasets. We combine the span (max - min) of datasets for each country and year with the standard deviation of the national spans over time. We utilize the economic sectoral definitions from IEA to disaggregate the national total emission into
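The span-based uncertainty measure described in this record can be sketched directly. The national totals below are invented placeholders, not values from the harmonized datasets:

```python
import statistics

# Illustrative harmonized national totals for one country across four
# institutional datasets; all numbers are made up for demonstration.
emissions = {
    1992: {"BP": 100.0, "IEA": 103.0, "EIA": 98.0, "CDIAC": 101.0},
    1993: {"BP": 104.0, "IEA": 108.0, "EIA": 101.0, "CDIAC": 105.0},
    1994: {"BP": 107.0, "IEA": 112.0, "EIA": 104.0, "CDIAC": 109.0},
}

# Span (max - min) across the datasets for each year ...
spans = {yr: max(v.values()) - min(v.values()) for yr, v in emissions.items()}
# ... combined with the variability of that span over time.
span_sd = statistics.stdev(spans.values())

print(spans)    # per-year spread among the four datasets
print(span_sd)  # stability of the spread over the record
```

A country whose datasets disagree widely, or whose disagreement fluctuates year to year, would receive a larger prior uncertainty in the assimilation.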

  17. Genomic Datasets for Cancer Research

    Cancer.gov

    A variety of datasets from genome-wide association studies of cancer and other genotype-phenotype studies, including sequencing and molecular diagnostic assays, are available to approved investigators through the Extramural National Cancer Institute Data Access Committee.

  18. A hierarchical network-based algorithm for multi-scale watershed delineation

    NASA Astrophysics Data System (ADS)

    Castronova, Anthony M.; Goodall, Jonathan L.

    2014-11-01

    Watershed delineation is a process for defining a land area that contributes surface water flow to a single outlet point. It is commonly used in water resources analysis to define the domain in which hydrologic process calculations are applied. There has been a growing effort over the past decade to improve surface elevation measurements in the U.S., which has had a significant impact on the accuracy of hydrologic calculations. Traditional watershed processing on these elevation rasters, however, becomes more burdensome as data resolution increases. As a result, processing of these datasets can be troublesome on standard desktop computers. This challenge has resulted in numerous works that aim to provide high-performance computing solutions for large data, high-resolution data, or both. This work proposes an efficient watershed delineation algorithm for use in desktop computing environments that leverages existing data, the U.S. Geological Survey (USGS) National Hydrography Dataset Plus (NHD+), and open source software tools to construct watershed boundaries. This approach makes use of U.S. national-level hydrography data that has been precomputed using raster processing algorithms coupled with quality control routines. Our approach uses carefully arranged data and mathematical graph theory to traverse river networks and identify catchment boundaries. We demonstrate this new watershed delineation technique, compare its accuracy with traditional algorithms that derive watersheds solely from digital elevation models, and then extend our approach to address subwatershed delineation. Our findings suggest that the open-source hierarchical network-based delineation procedure presented in the work is a promising approach to watershed delineation that can be used to summarize publicly available datasets for hydrologic model input pre-processing. Through our analysis, we explore the benefits of reusing the NHD+ datasets for watershed delineation, and find that our technique
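The graph-traversal core of a network-based delineation like the one described — walk upstream from an outlet, collecting every contributing reach — can be sketched on a toy flow table. The reach names and edges below are invented; NHD+ stores comparable from/to relationships in its flow tables.

```python
from collections import defaultdict, deque

# Toy flowline network: each edge points downstream
# (from_reach -> to_reach), in the spirit of NHD+ flow tables.
flow = [("A", "C"), ("B", "C"), ("C", "E"), ("D", "E"), ("E", "OUT")]

# Invert the edges so we can look up what drains into each reach.
upstream = defaultdict(list)
for frm, to in flow:
    upstream[to].append(frm)

def watershed_reaches(outlet):
    """Collect every reach draining to `outlet` via breadth-first
    upstream traversal of the flow network."""
    seen, queue = {outlet}, deque([outlet])
    while queue:
        node = queue.popleft()
        for up in upstream[node]:
            if up not in seen:
                seen.add(up)
                queue.append(up)
    return seen

print(sorted(watershed_reaches("E")))
```

Because each reach already carries a precomputed local catchment polygon in NHD+, the watershed boundary is then just the union of the catchments of the collected reaches — no raster flow-direction processing is needed at query time.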

  19. The National Map - Orthoimagery Layer

    USGS Publications Warehouse

    ,

    2007-01-01

    Many Federal, State, and local agencies use a common set of framework geographic information databases as a tool for economic and community development, land and natural resource management, and health and safety services. Emergency management and homeland security applications rely on this information. Private industry, nongovernmental organizations, and individual citizens use the same geographic data. Geographic information underpins an increasingly large part of the Nation's economy. The U.S. Geological Survey (USGS) is developing The National Map to be a seamless, continually maintained, and nationally consistent set of online, public domain, framework geographic information databases. The National Map will serve as a foundation for integrating, sharing, and using data easily and consistently. The data will be the source of revised paper topographic maps. The National Map includes digital orthorectified imagery; elevation data; vector data for hydrography, transportation, boundary, and structure features; geographic names; and land cover information.

  20. A User's Guide to the Tsunami Datasets at NOAA's National Data Buoy Center

    NASA Astrophysics Data System (ADS)

    Bouchard, R. H.; O'Neil, K.; Grissom, K.; Garcia, M.; Bernard, L. J.; Kern, K. J.

    2013-12-01

    The National Data Buoy Center (NDBC) has maintained and operated the National Oceanic and Atmospheric Administration's (NOAA) tsunameter network since 2003. The tsunameters employ the NOAA-developed Deep-ocean Assessment and Reporting of Tsunamis (DART) technology. The technology measures the pressure and temperature every 15 seconds on the ocean floor and transforms them into equivalent water-column height observations. A complex series of subsampled observations is transmitted acoustically in real-time to a moored buoy or marine autonomous vehicle (MAV) at the ocean surface. The surface platform uses its satellite communications to relay the observations to NDBC. NDBC places the observations onto the Global Telecommunication System (GTS) for relay to NOAA's Tsunami Warning Centers (TWC) in Hawai'i and Alaska and to the international community. Observations travel from the ocean floor to the TWCs in less than three minutes. NDBC can retrieve limited amounts of the 15-s measurements from the instrumentation on the ocean floor using the technology's two-way communications. NDBC recovers the full-resolution 15-s measurements about every 2 years and forwards the datasets and metadata to the National Geophysical Data Center for permanent archive. Meanwhile, NDBC retains the real-time observations on its website. The type of real-time observation depends on the operating mode of the tsunameter. NDBC provides the observations in a variety of traditional and innovative methods and formats that include descriptors of the operating mode. Datasets, organized by station, are available from the NDBC website as text files and from the NDBC THREDDS server in netCDF format. The website provides alerts and lists of events that allow users to focus on the information relevant for tsunami hazard analysis. In addition, NDBC developed a basic web service to query station information and observations to support the Short-term Inundation Forecasting for Tsunamis (SIFT

  1. EnviroAtlas - 303(d) Impairments by 12-digit HUC for the Conterminous United States

    EPA Pesticide Factsheets

    This EnviroAtlas dataset depicts the total length of stream or river flowlines that have impairments submitted to the EPA by states under section 303(d) of the Clean Water Act. It also contains the total lengths of streams, rivers, and canals, total waterbody area, and stream density (stream length per area) from the US Geological Survey's high-resolution National Hydrography Dataset (NHD). This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).

  2. SIFlore, a dataset of geographical distribution of vascular plants covering five centuries of knowledge in France: Results of a collaborative project coordinated by the Federation of the National Botanical Conservatories.

    PubMed

    Just, Anaïs; Gourvil, Johan; Millet, Jérôme; Boullet, Vincent; Milon, Thomas; Mandon, Isabelle; Dutrève, Bruno

    2015-01-01

    More than 20 years ago, the French Muséum National d'Histoire Naturelle (MNHN, Secretariat of the Fauna and Flora) published the first part of an atlas of the flora of France at a 20km spatial resolution, accounting for 645 taxa (Dupont 1990). Since then, at the national level, there has not been any work on this scale relating to flora distribution, despite the obvious need for a better understanding. In 2011, in response to this need, the Federation des Conservatoires Botaniques Nationaux (FCBN, http://www.fcbn.fr) launched an ambitious collaborative project involving eleven national botanical conservatories of France. The project aims to establish a formal procedure and standardized system for data hosting, aggregation and publication for four areas: flora, fungi, vegetation and habitats. In 2014, the first phase of the project led to the development of the national flora dataset: SIFlore. As it includes about 21 million records of flora occurrences, this is currently the most comprehensive dataset on the distribution of vascular plants (Tracheophyta) in the French territory. SIFlore contains occurrence information for about 15,454 plant taxa (indigenous and alien) in metropolitan France and Reunion Island, from 1545 until 2014. The data records were originally collated from inventories, checklists, literature and herbarium records. SIFlore was developed by assembling flora datasets from the regional to the national level. At the regional level, source records are managed by the national botanical conservatories that are responsible for flora data collection and validation. In order to present our results, a geoportal was developed by the Fédération des conservatoires botaniques nationaux that allows the SIFlore dataset to be publicly viewed. This portal is available at: http://siflore.fcbn.fr. As the FCBN belongs to the Information System for Nature and Landscapes (SINP), a governmental program, the dataset is also accessible through the websites of

  3. SIFlore, a dataset of geographical distribution of vascular plants covering five centuries of knowledge in France: Results of a collaborative project coordinated by the Federation of the National Botanical Conservatories

    PubMed Central

    Just, Anaïs; Gourvil, Johan; Millet, Jérôme; Boullet, Vincent; Milon, Thomas; Mandon, Isabelle; Dutrève, Bruno

    2015-01-01

    More than 20 years ago, the French Muséum National d'Histoire Naturelle (MNHN, Secretariat of the Fauna and Flora) published the first part of an atlas of the flora of France at a 20km spatial resolution, accounting for 645 taxa (Dupont 1990). Since then, at the national level, there has not been any work on this scale relating to flora distribution, despite the obvious need for a better understanding. In 2011, in response to this need, the Federation des Conservatoires Botaniques Nationaux (FCBN, http://www.fcbn.fr) launched an ambitious collaborative project involving eleven national botanical conservatories of France. The project aims to establish a formal procedure and standardized system for data hosting, aggregation and publication for four areas: flora, fungi, vegetation and habitats. In 2014, the first phase of the project led to the development of the national flora dataset: SIFlore. As it includes about 21 million records of flora occurrences, this is currently the most comprehensive dataset on the distribution of vascular plants (Tracheophyta) in the French territory. SIFlore contains occurrence information for about 15,454 plant taxa (indigenous and alien) in metropolitan France and Reunion Island, from 1545 until 2014. The data records were originally collated from inventories, checklists, literature and herbarium records. SIFlore was developed by assembling flora datasets from the regional to the national level. At the regional level, source records are managed by the national botanical conservatories that are responsible for flora data collection and validation. In order to present our results, a geoportal was developed by the Fédération des conservatoires botaniques nationaux that allows the SIFlore dataset to be publicly viewed. This portal is available at: http://siflore.fcbn.fr. As the FCBN belongs to the Information System for Nature and Landscapes (SINP), a governmental program, the dataset is also accessible through

  4. Personalizing lung cancer risk prediction and imaging follow-up recommendations using the National Lung Screening Trial dataset.

    PubMed

    Hostetter, Jason M; Morrison, James J; Morris, Michael; Jeudy, Jean; Wang, Kenneth C; Siegel, Eliot

    2017-11-01

    To demonstrate a data-driven method for personalizing lung cancer risk prediction using a large clinical dataset. An algorithm was used to categorize nodules found in the first screening year of the National Lung Screening Trial as malignant or nonmalignant. Risk of malignancy for nodules was calculated based on size criteria according to the Fleischner Society recommendations from 2005, along with the additional discriminators of pack-years smoking history, sex, and nodule location. Imaging follow-up recommendations were assigned according to Fleischner size category malignancy risk. Nodule size correlated with malignancy risk as predicted by the Fleischner Society recommendations. With the additional discriminators of smoking history, sex, and nodule location, significant risk stratification was observed. For example, men with ≥60 pack-years smoking history and upper lobe nodules measuring >4 and ≤6 mm demonstrated significantly increased risk of malignancy at 12.4% compared to the mean of 3.81% for similarly sized nodules (P < .0001). Based on personalized malignancy risk, 54% of nodules >4 and ≤6 mm were reclassified to longer-term follow-up than recommended by Fleischner. Twenty-seven percent of nodules ≤4 mm were reclassified to shorter-term follow-up. Using available clinical datasets such as the National Lung Screening Trial in conjunction with locally collected datasets can help clinicians provide more personalized malignancy risk predictions and follow-up recommendations. By incorporating 3 demographic data points, the risk of lung nodule malignancy within the Fleischner categories can be considerably stratified and more personalized follow-up recommendations can be made. © The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com
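The stratified risk calculation described above amounts to filtering nodules by demographic discriminators and computing the malignancy rate within each stratum. A minimal sketch with invented nodule records (the NLST data themselves are access-controlled):

```python
# Hypothetical nodule records:
# (size_mm, pack_years, sex, lobe, malignant) -- all values invented.
nodules = [
    (5, 65, "M", "upper", True), (5, 70, "M", "upper", False),
    (5, 10, "F", "lower", False), (3, 0, "F", "lower", False),
    (5, 62, "M", "upper", True), (5, 30, "M", "lower", False),
]

def risk(records, predicate):
    """Percent malignant within the stratum selected by `predicate`."""
    stratum = [r for r in records if predicate(r)]
    return 100.0 * sum(r[4] for r in stratum) / len(stratum)

# Stratum analogous to the record's example: men, >=60 pack-years,
# upper-lobe nodules measuring >4 and <=6 mm.
high = risk(nodules, lambda r: 4 < r[0] <= 6 and r[1] >= 60
            and r[2] == "M" and r[3] == "upper")
print(f"{high:.1f}%")
```

Comparing such stratum-specific rates against the size-only mean is what drives the reclassification of follow-up intervals reported in the study.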

  5. Inorganic and Organic Carbon, Nutrient, and Oxygen Data from the R/V Ronald H. Brown Repeat Hydrography Cruise in the Atlantic Ocean: CLIVAR CO2 Section A16N_2003a (4 June-11 August, 2003)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kozyr, Alex

    2005-08-30

    This report presents methods and analytical and quality control procedures for nutrient, oxygen, and inorganic carbon system parameters performed during the A16N_2003a cruise, which took place from June 4 to August 11, 2003 aboard NOAA Ship R/V Ronald H. Brown under auspices of the National Oceanic and Atmospheric Administration (NOAA). The first hydrographic leg (June 19–July 10) was from Reykjavik, Iceland, to Funchal, Madeira, Portugal along the 20°W meridian, and the second leg (July 15–August 11) continued operations from Funchal, Portugal to Natal, Brazil, on a track southward and ending at 6°S, 25°W. The research was the first in a decadal series of repeat hydrography sections jointly funded by NOAA and the National Science Foundation (NSF) as part of the CLIVAR/CO2/hydrography/tracer program. Samples were taken from up to 34 depths at 150 stations. The data presented in this report include the analyses of water samples for total inorganic carbon (TCO2), fugacity of CO2 (fCO2), total alkalinity (TALK), pH, nitrate (NO3), nitrite (NO2), phosphate (PO4), silicate (SiO4), and dissolved oxygen (O2). The R/V Ronald H. Brown A16N_2003a data set is available free of charge as a numeric data package (NDP) from the Carbon Dioxide Information Analysis Center (CDIAC). The NDP consists of the oceanographic data files and this printed documentation, which describes the procedures and methods used to obtain the data.

  6. Federal standards and procedures for the National Watershed Boundary Dataset (WBD)

    USGS Publications Warehouse

    ,; ,; ,

    2013-01-01

    The Watershed Boundary Dataset (WBD) is a comprehensive aggregated collection of hydrologic unit data consistent with the national criteria for delineation and resolution. This document establishes Federal standards and procedures for creating the WBD as seamless and hierarchical hydrologic unit data, based on topographic and hydrologic features at a 1:24,000 scale in the United States, except for Alaska at 1:63,360 scale, and 1:25,000 scale in the Caribbean. The data within the WBD have been reviewed for certification through the 12-digit hydrologic unit for compliance with the criteria outlined in this document. Any edits to certified data will be reviewed against this standard prior to inclusion. Although not required as part of the framework WBD, the guidelines contain details for compiling and delineating the boundaries of two additional levels, the 14- and 16-digit hydrologic units, as well as the use of higher resolution base information to improve delineations. The guidelines presented herein are designed to enable local, regional, and national partners to delineate hydrologic units consistently and accurately. Such consistency improves watershed management through efficient sharing of information and resources and by ensuring that digital geographic data are usable with other related Geographic Information System (GIS) data.Terminology, definitions, and procedural information are provided to ensure uniformity in hydrologic unit boundaries, names, and numerical codes. Detailed standards and specifications for data are included. The document also includes discussion of objectives, communications required for revising the data resolution in the United States and the Caribbean, as well as final review and data-quality criteria. Instances of unusual landforms or artificial features that affect the hydrologic units are described with metadata standards. 
Up-to-date information and availability of the hydrologic units are listed at http://www.nrcs.usda.gov/wps/portal/nrcs/detail/national

  7. Federal standards and procedures for the National Watershed Boundary Dataset (WBD)

    USGS Publications Warehouse

    U.S. Geological Survey and U.S. Department of Agriculture, Natural Resources Conservation Service

    2012-01-01

    The Watershed Boundary Dataset (WBD) is a comprehensive aggregated collection of hydrologic unit data consistent with the national criteria for delineation and resolution. This document establishes Federal standards and procedures for creating the WBD as seamless and hierarchical hydrologic unit data, based on topographic and hydrologic features at a 1:24,000 scale in the United States, except for Alaska at 1:63,360 scale, and 1:25,000 scale in the Caribbean. The data within the WBD have been reviewed for certification through the 12-digit hydrologic unit for compliance with the criteria outlined in this document. Any edits to certified data will be reviewed against this standard prior to inclusion. Although not required as part of the framework WBD, the guidelines contain details for compiling and delineating the boundaries of two additional levels, the 14- and 16-digit hydrologic units, as well as the use of higher resolution base information to improve delineations. The guidelines presented herein are designed to enable local, regional, and national partners to delineate hydrologic units consistently and accurately. Such consistency improves watershed management through efficient sharing of information and resources and by ensuring that digital geographic data are usable with other related Geographic Information System (GIS) data. Terminology, definitions, and procedural information are provided to ensure uniformity in hydrologic unit boundaries, names, and numerical codes. Detailed standards and specifications for data are included. The document also includes discussion of objectives, communications required for revising the data resolution in the United States and the Caribbean, as well as final review and data-quality criteria. Instances of unusual landforms or artificial features that affect the hydrologic units are described with metadata standards. 
Up-to-date information and availability of the hydrologic units are listed at http://www.nrcs.usda.gov/wps/portal/nrcs/detail/national

  8. Lake Michigan Diversion Accounting land cover change estimation by use of the National Land Cover Dataset and raingage network partitioning analysis

    USGS Publications Warehouse

    Sharpe, Jennifer B.; Soong, David T.

    2015-01-01

    This study used the National Land Cover Dataset (NLCD) and developed an automated process for determining the area of the three land cover types, thereby allowing faster updating of future models, and for evaluating land cover changes by use of historical NLCD datasets. The study also carried out a raingage partitioning analysis so that the segmentation of land cover and rainfall in each modeled unit is directly applicable to the HSPF modeling. Historical and existing impervious, grass, and forest land acreages partitioned by percentages covered by two sets of raingages for the Lake Michigan diversion SCAs, gaged basins, and ungaged basins are presented.

  9. Final Report on the Creation of the Wind Integration National Dataset (WIND) Toolkit and API: October 1, 2013 - September 30, 2015

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hodge, Bri-Mathias

    2016-04-08

    The primary objective of this work was to create a state-of-the-art national wind resource data set and to provide detailed wind plant output data for specific sites based on that data set. Corresponding retrospective wind forecasts were also included at all selected locations. The combined information from these activities was used to create the Wind Integration National Dataset (WIND), and an extraction tool was developed to allow web-based data access.

  10. A multidimensional representation model of geographic features

    USGS Publications Warehouse

    Usery, E. Lynn; Timson, George; Coletti, Mark

    2016-01-28

    A multidimensional model of geographic features has been developed and implemented with data from The National Map of the U.S. Geological Survey. The model, programmed in C++ and implemented as a feature library, was tested with data from the National Hydrography Dataset demonstrating the capability to handle changes in feature attributes, such as increases in chlorine concentration in a stream, and feature geometry, such as the changing shoreline of barrier islands over time. Data can be entered directly, from a comma separated file, or features with attributes and relationships can be automatically populated in the model from data in the Spatial Data Transfer Standard format.
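The core idea in this record — a feature whose attributes (such as chlorine concentration) carry a time dimension — can be sketched in Python; the original implementation is a C++ feature library, so the class below is an illustrative analogue, not the USGS code.

```python
from bisect import bisect_right

# Minimal sketch of a feature with time-varying attributes, loosely
# inspired by the multidimensional model described above.
class Feature:
    def __init__(self, name):
        self.name = name
        self._history = {}  # attribute -> sorted [(time, value), ...]

    def set(self, attr, time, value):
        """Record an observation of `attr` at `time`."""
        self._history.setdefault(attr, []).append((time, value))
        self._history[attr].sort()

    def at(self, attr, time):
        """Value of `attr` as of `time` (most recent prior observation),
        or None if no observation exists yet."""
        hist = self._history[attr]
        i = bisect_right(hist, (time, float("inf")))
        return hist[i - 1][1] if i else None

stream = Feature("Big Creek")                   # hypothetical feature
stream.set("chlorine_mg_per_L", 2000, 0.2)
stream.set("chlorine_mg_per_L", 2010, 0.5)
print(stream.at("chlorine_mg_per_L", 2005))     # -> 0.2
```

Changing geometry over time (such as a migrating shoreline) fits the same pattern, with geometry snapshots stored in place of scalar values.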

  11. A hierarchical spatial framework and database for the national river fish habitat condition assessment

    USGS Publications Warehouse

    Wang, L.; Infante, D.; Esselman, P.; Cooper, A.; Wu, D.; Taylor, W.; Beard, D.; Whelan, G.; Ostroff, A.

    2011-01-01

    Fisheries management programs, such as the National Fish Habitat Action Plan (NFHAP), urgently need a nationwide spatial framework and database for health assessment and policy development to protect and improve riverine systems. To meet this need, we developed a spatial framework and database using the National Hydrography Dataset Plus (1:100,000 scale; http://www.horizon-systems.com/nhdplus). This framework uses interconfluence river reaches and their local and network catchments as fundamental spatial river units and a series of ecological and political spatial descriptors as hierarchy structures to allow users to extract or analyze information at spatial scales that they define. The database consists of variables describing channel characteristics, network position/connectivity, climate, elevation, gradient, and size. It contains a series of catchment natural and human-induced factors that are known to influence river characteristics. Our framework and database assemble all river reaches and their descriptors in one place for the first time for the conterminous United States. They provide users with the capability of adding data, conducting analyses, developing management scenarios and regulations, and tracking management progress at a variety of spatial scales. The database provides the essential data needed to achieve the objectives of NFHAP and other management programs. The downloadable beta version of the database is available at http://ec2-184-73-40-15.compute-1.amazonaws.com/nfhap/main/.
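
    The reach-based navigation that such a framework enables can be sketched with a toy network. This is an illustrative sketch only: the reach codes, topology, and table layout below are invented, not actual NHDPlus identifiers or schema.

```python
from collections import defaultdict

# Hypothetical reach table: reach code -> the reach immediately downstream
# (None at the network outlet). Codes and topology are made up.
DOWNSTREAM = {
    "0101000101": "0101000103",
    "0101000102": "0101000103",
    "0101000103": "0101000104",
    "0101000104": None,  # network outlet
}

# Invert the table once so the network can be walked upstream.
UPSTREAM = defaultdict(list)
for reach, down in DOWNSTREAM.items():
    if down is not None:
        UPSTREAM[down].append(reach)

def upstream_network(reach):
    """Return every reach draining, directly or indirectly, into `reach`."""
    found, stack = set(), list(UPSTREAM[reach])
    while stack:
        r = stack.pop()
        if r not in found:
            found.add(r)
            stack.extend(UPSTREAM[r])
    return found

print(sorted(upstream_network("0101000104")))
```

    Extracting information "at spatial scales that users define" then amounts to taking the union of such upstream sets over whatever descriptor (ecoregion, state, basin) bounds the query.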

  12. Mapping benefits from updated ifsar data in Alaska: improved source data enables better maps

    USGS Publications Warehouse

    Craun, Kari J.

    2015-08-06

    The U.S. Geological Survey (USGS) and partners in other Federal and State agencies are working collaboratively toward Statewide coverage of interferometric synthetic aperture radar (ifsar) elevation data in Alaska. These data will provide many benefits to a wide range of stakeholders and users. Some applications include development of more accurate and highly detailed topographic maps; improvement of surface water information included in the National Hydrography Dataset (NHD) and Watershed Boundary Dataset (WBD); and use in scientific modeling applications such as calculating glacier surface elevation differences over time and estimating tsunami inundation areas.

  13. Spatio-Temporal Data Model for Integrating Evolving Nation-Level Datasets

    NASA Astrophysics Data System (ADS)

    Sorokine, A.; Stewart, R. N.

    2017-10-01

    The ability to easily combine data from diverse sources in a single analytical workflow is one of the greatest promises of Big Data technologies. However, such integration is often challenging because datasets originate from different vendors, governments, and research communities, which results in multiple incompatibilities in data representations, formats, and semantics. Semantic differences are the hardest to handle: different communities often use different attribute definitions and associate their records with different sets of evolving geographic entities. Analysis of global socioeconomic variables across multiple datasets over prolonged time spans is often complicated by differences in how the boundaries and histories of countries or other geographic entities are represented. Here we propose an event-based data model for depicting and tracking the histories of evolving geographic units (countries, provinces, etc.) and their representations in disparate data. The model addresses the semantic challenge of preserving the identity of geographic entities over time by defining criteria for an entity's existence, a set of events that may affect its existence, and rules for mapping between different representations (datasets). The proposed model is used for maintaining an evolving compound database of global socioeconomic and environmental data harvested from multiple sources. A practical implementation of the model is demonstrated using a PostgreSQL object-relational database with temporal, geospatial, and NoSQL database extensions.
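
    One way to read the event-based idea: an entity's existence on a given date is derived by replaying the recorded events. The sketch below is a minimal interpretation with hypothetical event kinds, entity names, and dates; it is not the paper's actual schema.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Event:
    when: date
    kind: str      # e.g. "created", "dissolved", "split", "merged"
    sources: list  # entities that existed before the event
    targets: list  # entities that exist after the event

@dataclass
class History:
    events: list = field(default_factory=list)

    def exists(self, entity, on):
        """Replay events up to date `on` to decide whether `entity` exists."""
        alive = False
        for ev in sorted(self.events, key=lambda e: e.when):
            if ev.when > on:
                break
            if entity in ev.targets:
                alive = True
            elif entity in ev.sources:
                alive = False
        return alive

h = History([
    Event(date(1918, 10, 28), "created", [], ["Czechoslovakia"]),
    Event(date(1993, 1, 1), "split", ["Czechoslovakia"],
          ["Czech Republic", "Slovakia"]),
])
print(h.exists("Czechoslovakia", date(1990, 1, 1)))  # True
print(h.exists("Czechoslovakia", date(1994, 1, 1)))  # False
```

    Mapping rules between datasets would then attach, per event, how records keyed to the source entities are reapportioned to the target entities.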

  14. Development of a global historic monthly mean precipitation dataset

    NASA Astrophysics Data System (ADS)

    Yang, Su; Xu, Wenhui; Xu, Yan; Li, Qingxiang

    2016-04-01

    A global historic precipitation dataset is the basis for climate and water cycle research. Several global historic land-surface precipitation datasets have been developed by international data centers such as the US National Climatic Data Center (NCDC), the European Climate Assessment & Dataset project team, and the Met Office, but so far no such dataset has been developed by any research institute in China. In addition, each dataset has its own regional focus, and the existing global precipitation datasets contain only sparse observational stations over China, which may introduce uncertainties into East Asian precipitation studies. To take comprehensive historic information into account, users might need to employ two or more datasets; however, the non-uniform data formats, data units, station IDs, and so on add extra difficulties for users exploiting these datasets. For this reason, a complete historic precipitation dataset that takes advantage of the various datasets has been developed and produced at the National Meteorological Information Center of China. Precipitation observations from 12 sources are aggregated, and the data formats, data units, and station IDs are unified. Duplicated stations with the same ID are identified and duplicated observations removed. A consistency test, a correlation-coefficient test, a significance t-test at the 95% confidence level, and a significance F-test at the 95% confidence level are conducted first to ensure data reliability. Only those datasets that satisfy all four criteria are integrated to produce the China Meteorological Administration global precipitation (CGP) historic precipitation dataset, version 1.0. It contains observations at 31 thousand stations with 1.87 × 10⁷ data records, among which 4152 precipitation time series are longer than 100 yr. This dataset plays a critical role in climate research due to its large data volume and high density of station network, compared to

  15. The road to NHDPlus — Advancements in digital stream networks and associated catchments

    USGS Publications Warehouse

    Moore, Richard B.; Dewald, Thomas A.

    2016-01-01

    A progression of advancements in Geographic Information Systems techniques for hydrologic network and associated catchment delineation has led to the production of the National Hydrography Dataset Plus (NHDPlus). NHDPlus is a digital stream network for hydrologic modeling with catchments and a suite of related geospatial data. Digital stream networks with associated catchments provide a geospatial framework for linking and integrating water-related data. Advancements in the development of NHDPlus are expected to continue to improve the capabilities of this national geospatial hydrologic framework. NHDPlus is built upon the medium-resolution NHD and, like NHD, was developed by the U.S. Environmental Protection Agency and U.S. Geological Survey to support the estimation of streamflow and stream velocity used in fate-and-transport modeling. Catchments included with NHDPlus were created by integrating vector information from the NHD and from the Watershed Boundary Dataset with the gridded land surface elevation as represented by the National Elevation Dataset. NHDPlus is an actively used and continually improved dataset. Users recognize the importance of a reliable stream network and associated catchments. The NHDPlus spatial features and associated data tables will continue to be improved to support regional water quality and streamflow models and other user-defined applications.

  16. TopoLens: Building a cyberGIS community data service for enhancing the usability of high-resolution National Topographic datasets

    USGS Publications Warehouse

    Hu, Hao; Hong, Xingchen; Terstriep, Jeff; Liu, Yan; Finn, Michael P.; Rush, Johnathan; Wendel, Jeffrey; Wang, Shaowen

    2016-01-01

    Geospatial data, often embedded with geographic references, are important to many application and science domains and represent a major type of big data. The increased volume and diversity of geospatial data have caused serious usability issues for researchers in various scientific domains, which call for innovative cyberGIS solutions. To address these issues, this paper describes a cyberGIS community data service framework to facilitate geospatial big data access, processing, and sharing based on a hybrid supercomputer architecture. Through collaboration between the CyberGIS Center at the University of Illinois at Urbana-Champaign (UIUC) and the U.S. Geological Survey (USGS), a community data service named TopoLens, for accessing, customizing, and sharing digital elevation model (DEM) data and datasets derived from the 10-meter National Elevation Dataset, was created to demonstrate the workflow integration of geospatial big data sources, the computation and analysis needed to customize the original dataset for end-user needs, and a friendly online user environment. TopoLens provides online access to precomputed and on-demand computed high-resolution elevation data by exploiting the ROGER supercomputer. The usability of this prototype service has been acknowledged in community evaluation.

  17. GIEMS-D3: A new long-term, dynamical, high-spatial resolution inundation extent dataset at global scale

    NASA Astrophysics Data System (ADS)

    Aires, Filipe; Miolane, Léo; Prigent, Catherine; Pham Duc, Binh; Papa, Fabrice; Fluet-Chouinard, Etienne; Lehner, Bernhard

    2017-04-01

    The Global Inundation Extent from Multi-Satellites (GIEMS) provides multi-year monthly variations of the global surface water extent at 25 km × 25 km resolution, derived from multiple satellite observations. Its spatial resolution is usually compatible with climate model outputs and with global land surface model grids, but is clearly not adequate for local applications that require the characterization of small individual water bodies. There is today a strong demand for high-resolution inundation extent datasets for a large variety of applications, such as water management, regional hydrological modeling, or the analysis of mosquito-related diseases. A new procedure is introduced to downscale the low-spatial-resolution GIEMS inundation estimates to a 3 arc-second (90 m) dataset. The methodology is based on topography and hydrography information from the HydroSHEDS database. A new floodability index is adopted, and an innovative smoothing procedure is developed to ensure a smooth transition, in the high-resolution maps, between the low-resolution boxes from GIEMS. Topography information is relevant for natural hydrology environments controlled by elevation, but is more limited in human-modified basins. However, the proposed downscaling approach is compatible with forthcoming fusion with other, more pertinent satellite information in these difficult regions. The resulting GIEMS-D3 database is the only high-spatial-resolution inundation database available globally at the monthly time scale over the 1993-2007 period. GIEMS-D3 is assessed by analyzing its spatial and temporal variability, and evaluated by comparison to other independent satellite observations from visible (Google Earth and Landsat), infrared (MODIS), and active microwave (SAR) sensors.
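
    The allocation step of such a downscaling can be illustrated in a few lines: within one low-resolution cell, high-resolution pixels are flagged as inundated in decreasing order of floodability until the cell's observed inundated fraction is matched. The index values below are made up; the actual GIEMS-D3 index is built from HydroSHEDS topography and hydrography, and the published method additionally smooths across cell boundaries.

```python
def downscale_cell(floodability, inundated_fraction):
    """Return a 0/1 inundation flag per high-res pixel for one low-res cell.

    floodability: one (made-up) index value per high-resolution pixel.
    inundated_fraction: fraction of the low-res cell observed as inundated.
    """
    n_flooded = round(inundated_fraction * len(floodability))
    # Rank pixels from most to least floodable, flood the top n_flooded.
    order = sorted(range(len(floodability)),
                   key=lambda i: floodability[i], reverse=True)
    flags = [0] * len(floodability)
    for i in order[:n_flooded]:
        flags[i] = 1
    return flags

# A low-resolution cell observed as 25% inundated, split into 8 pixels:
print(downscale_cell([0.9, 0.1, 0.4, 0.8, 0.2, 0.3, 0.05, 0.6], 0.25))
# -> [1, 0, 0, 1, 0, 0, 0, 0]
```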

  18. LANDFIRE 2010—Updates to the national dataset to support improved fire and natural resource management

    USGS Publications Warehouse

    Nelson, Kurtis J.; Long, Donald G.; Connot, Joel A.

    2016-02-29

    The Landscape Fire and Resource Management Planning Tools (LANDFIRE) 2010 data release provides updated and enhanced vegetation, fuel, and fire regime layers consistently across the United States. The data represent landscape conditions from approximately 2010 and are the latest release in a series of planned updates to maintain currency of LANDFIRE data products. Enhancements to the data products included refinement of urban areas by incorporating the National Land Cover Database 2006 land cover product, refinement of agricultural lands by integrating the National Agriculture Statistics Service 2011 cropland data layer, and improved wetlands delineations using the National Land Cover Database 2006 land cover and the U.S. Fish and Wildlife Service National Wetlands Inventory data. Disturbance layers were generated for years 2008 through 2010 using remotely sensed imagery, polygons representing disturbance events submitted by local organizations, and fire mapping program data such as the Monitoring Trends in Burn Severity perimeters produced by the U.S. Geological Survey and the U.S. Forest Service. Existing vegetation data were updated to account for transitions in disturbed areas and to account for vegetation growth and succession in undisturbed areas. Surface and canopy fuel data were computed from the updated vegetation type, cover, and height and occasionally from potential vegetation. Historical fire frequency and succession classes were also updated. Revised topographic layers were created based on updated elevation data from the National Elevation Dataset. The LANDFIRE program also released a new Web site offering updated content, enhanced usability, and more efficient navigation.

  19. Finding the Maine Story in Huge Cumbersome National Monitoring Datasets

    EPA Science Inventory

    What’s a manager, analyst, or concerned citizen to do with the complex datasets generated by State and Federal monitoring efforts? Is it possible to use such information to address Maine’s environmental issues without having a degree in informatics and statistics? This presentati...

  20. Riparian Land Use/Land Cover Data for Five Study Units in the Nutrient Enrichment Effects Topical Study of the National Water-Quality Assessment Program

    USGS Publications Warehouse

    Johnson, Michaela R.; Buell, Gary R.; Kim, Moon H.; Nardi, Mark R.

    2007-01-01

    This dataset was developed as part of the National Water-Quality Assessment (NAWQA) Program, Nutrient Enrichment Effects Topical (NEET) study for five study units distributed across the United States: Apalachicola-Chattahoochee-Flint River Basin, Central Columbia Plateau-Yakima River Basin, Central Nebraska Basins, Potomac River Basin and Delmarva Peninsula, and White, Great, and Little Miami River Basins. One hundred forty-three stream reaches were examined as part of the NEET study conducted in 2003-04. Stream segments, with lengths equal to the logarithm of the basin area, were delineated upstream from the downstream ends of the stream reaches with the use of digital orthophoto quarter quadrangles (DOQQ) or selected from the high-resolution National Hydrography Dataset (NHD). Use of the NHD was necessary when the stream was not distinguishable in the DOQQ because of dense tree canopy. The analysis area for each stream segment was defined by a buffer beginning at the segment and extending 250 meters lateral to the stream segment. Delineation of land use/land cover (LULC) map units within stream segment buffers was conducted using on-screen digitizing of riparian LULC classes interpreted from the DOQQ. LULC units were mapped using a classification strategy consisting of nine classes. National Wetlands Inventory (NWI) data were used to aid in wetland classification. Longitudinal transect sampling lines offset from the stream segments were generated and partitioned into the underlying LULC types. These longitudinal samples yielded the relative linear extent and sequence of each LULC type within the riparian zone at the segment scale. The resulting areal and linear LULC data filled in the spatial-scale gap between the 30-meter resolution of the National Land Cover Dataset and the reach-level habitat assessment data collected onsite routinely for NAWQA ecological sampling. The final data consisted of 12 geospatial datasets: LULC within 25 meters of the stream reach

  1. The National Map product and services directory

    USGS Publications Warehouse

    Newell, Mark R.

    2008-01-01

    As one of the cornerstones of the U.S. Geological Survey's (USGS) National Geospatial Program (NGP), The National Map is a collaborative effort among the USGS and other Federal, state, and local partners to improve and deliver topographic information for the Nation. It has many uses ranging from recreation to scientific analysis to emergency response. The National Map is easily accessible for display on the Web, as products, and as downloadable data. The geographic information available from The National Map includes orthoimagery (aerial photographs), elevation, geographic names, hydrography, boundaries, transportation, structures, and land cover. Other types of geographic information can be added to create specific types of maps. Of major importance, The National Map currently is being transformed to better serve the geospatial community. The USGS National Geospatial Program Office (NGPO) was established to provide leadership for placing geographic knowledge at the fingertips of the Nation. The office supports The National Map, Geospatial One-Stop (GOS), National Atlas of the United States®, and the Federal Geographic Data Committee (FGDC). This integrated portfolio of geospatial information and data supports the essential components of delivering the National Spatial Data Infrastructure (NSDI) and capitalizing on the power of place.

  2. Topographic and hydrographic GIS datasets for the Afghan Geological Survey and U.S. Geological Survey 2013 mineral areas of interest

    USGS Publications Warehouse

    Casey, Brittany N.; Chirico, Peter G.

    2013-01-01

    Afghanistan is endowed with a vast amount of mineral resources, and it is believed that the current economic state of the country could be greatly improved through investment in the extraction and production of these resources. In 2007, the “Preliminary Non-Fuel Resource Assessment of Afghanistan 2007” was completed by members of the U.S. Geological Survey and Afghan Geological Survey (Peters and others, 2007). The assessment delineated 20 mineralized areas for further study using a geologic-based methodology. In 2011, a follow-on data product, “Summaries and Data Packages of Important Areas for Mineral Investment and Production Opportunities of Nonfuel Minerals in Afghanistan,” was released (Peters and others, 2011). As part of this more recent work, geologic, geohydrologic, and hyperspectral studies were carried out in the areas of interest (AOIs) to assess the location and characteristics of the mineral resources. The 2011 publication included a dataset of 24 identified AOIs containing subareas, a corresponding digital elevation model (DEM), elevation contours, areal extent, and hydrography for each AOI. In 2012, project scientists identified five new AOIs and two subareas in Afghanistan. These new areas are Ahankashan, Kandahar, Parwan, North Bamyan, and South Bamyan. The two identified subareas include Obatu-Shela and Sekhab-ZamtoKalay, both located within the larger Kandahar AOI. In addition, an extended Kandahar AOI is included in the project for water resource modeling purposes. The dataset presented in this publication consists of the areal extent of the five new AOIs, two subareas, and the extended Kandahar AOI, elevation contours at 100-, 50-, and 25-meter intervals, an enhanced DEM, and a hydrographic dataset covering the extent of the new study area. The resulting raster and vector layers are intended for use by government agencies, developmental organizations, and private companies in Afghanistan to assist with mineral assessments, monitoring

  3. Experiments with Interaction between the National Water Model and the Reservoir System Simulation Model: A Case Study of Russian River Basin

    NASA Astrophysics Data System (ADS)

    Kim, J.; Johnson, L.; Cifelli, R.; Chandra, C. V.; Gochis, D.; McCreight, J. L.; Yates, D. N.; Read, L.; Flowers, T.; Cosgrove, B.

    2017-12-01

    The NOAA National Water Center (NWC), in partnership with the National Centers for Environmental Prediction (NCEP), the National Center for Atmospheric Research (NCAR), and other academic partners, has produced operational hydrologic predictions for the nation since the summer of 2016 using a new National Water Model (NWM) based on the community WRF-Hydro modeling system (Gochis et al., 2015). The NWM produces a variety of hydrologic analysis and prediction products, including gridded fields of soil moisture, snowpack, shallow groundwater levels, inundated area depths, and evapotranspiration, as well as estimates of river flow and velocity for approximately 2.7 million river reaches. Also included in the NWM are representations of more than 1,200 reservoirs, which are linked into the national channel network defined by the USGS NHDPlusv2.0 hydrography dataset. Despite the unprecedented spatial and temporal coverage of the NWM, many known deficiencies exist, including in the representation of lakes and reservoirs. This study addresses the implementation of a reservoir assimilation scheme through coupling of a reservoir simulation model to represent the influence of managed flows. We examine the use of reservoir operations to dynamically update lake/reservoir storage volume states, characterize the flow of river reaches into and out of lakes and reservoirs, and incorporate enhanced reservoir operating rules for the reservoir model options within the NWM. Model experiments focus on a pilot reservoir domain, Lake Mendocino, CA, and its contributing watershed, the East Fork Russian River. This reservoir is modeled using the United States Army Corps of Engineers (USACE) HEC-ResSim model, developed to examine forecast-informed reservoir operations (FIRO) in the Russian River basin.

  4. Neodymium in the oceans: a global database, a regional comparison and implications for palaeoceanographic research

    PubMed Central

    Griffiths, Alexander M.; Lambelet, Myriam; Little, Susan H.; Stichel, Torben; Wilson, David J.

    2016-01-01

    The neodymium (Nd) isotopic composition of seawater has been used extensively to reconstruct ocean circulation on a variety of time scales. However, dissolved neodymium concentrations and isotopes do not always behave conservatively, and quantitative deconvolution of this non-conservative component can be used to detect trace metal inputs and isotopic exchange at ocean–sediment interfaces. In order to facilitate such comparisons for historical datasets, we here provide an extended global database for Nd isotopes and concentrations in the context of hydrography and nutrients. Since 2010, combined datasets for a large range of trace elements and isotopes are collected on international GEOTRACES section cruises, alongside classical nutrient and hydrography measurements. Here, we take a first step towards exploiting these datasets by comparing high-resolution Nd sections for the western and eastern North Atlantic in the context of hydrography, nutrients and aluminium (Al) concentrations. Evaluating those data in tracer–tracer space reveals that North Atlantic seawater Nd isotopes and concentrations generally follow the patterns of advection, as do Al concentrations. Deviations from water mass mixing are observed locally, associated with the addition or removal of trace metals in benthic nepheloid layers, exchange with ocean margins (i.e. boundary exchange) and/or exchange with particulate phases (i.e. reversible scavenging). We emphasize that the complexity of some of the new datasets cautions against a quantitative interpretation of individual palaeo Nd isotope records, and indicates the importance of spatial reconstructions for a more balanced approach to deciphering past ocean changes. This article is part of the themed issue ‘Biological and climatic impacts of ocean trace element chemistry’. PMID:29035258

  5. Converting Static Image Datasets to Spiking Neuromorphic Datasets Using Saccades.

    PubMed

    Orchard, Garrick; Jayawant, Ajinkya; Cohen, Gregory K; Thakor, Nitish

    2015-01-01

    Creating datasets for Neuromorphic Vision is a challenging task. A lack of available recordings from Neuromorphic Vision sensors means that data must typically be recorded specifically for dataset creation rather than collecting and labeling existing data. The task is further complicated by a desire to simultaneously provide traditional frame-based recordings to allow for direct comparison with traditional Computer Vision algorithms. Here we propose a method for converting existing Computer Vision static image datasets into Neuromorphic Vision datasets using an actuated pan-tilt camera platform. Moving the sensor rather than the scene or image is a more biologically realistic approach to sensing and eliminates timing artifacts introduced by monitor updates when simulating motion on a computer monitor. We present conversion of two popular image datasets (MNIST and Caltech101) which have played important roles in the development of Computer Vision, and we provide performance metrics on these datasets using spike-based recognition algorithms. This work contributes datasets for future use in the field, as well as results from spike-based algorithms against which future works can compare. Furthermore, by converting datasets already popular in Computer Vision, we enable more direct comparison with frame-based approaches.

  6. An Intercomparison of Large-Extent Tree Canopy Cover Geospatial Datasets

    NASA Astrophysics Data System (ADS)

    Bender, S.; Liknes, G.; Ruefenacht, B.; Reynolds, J.; Miller, W. P.

    2017-12-01

    As a member of the Multi-Resolution Land Characteristics Consortium (MRLC), the U.S. Forest Service (USFS) is responsible for producing and maintaining the tree canopy cover (TCC) component of the National Land Cover Database (NLCD). The NLCD-TCC data are available for the conterminous United States (CONUS), coastal Alaska, Hawai'i, Puerto Rico, and the U.S. Virgin Islands. The most recent official version of the NLCD-TCC data is based primarily on reference data from 2010-2011 and is part of the multi-component 2011 version of the NLCD. NLCD data are updated on a five-year cycle. The USFS is currently producing the next official version (2016) of the NLCD-TCC data for the United States, which will be made publicly available in early 2018. In this presentation, we describe the model inputs, modeling methods, and tools used to produce the 30-m NLCD-TCC data. Several tree cover datasets at 30-m resolution, as well as datasets at finer resolutions, have become available in recent years due to advancements in earth observation data availability, computing, and sensors. We compare multiple tree cover datasets that have a resolution similar to the NLCD-TCC data. We also aggregate the tree class from fine-resolution land cover datasets to a percent-canopy value on a 30-m pixel, in order to compare the fine-resolution datasets to the datasets created directly from 30-m Landsat data. The extent of the tree canopy cover datasets included in the study ranges from global and national to the state level. Preliminary investigation of multiple tree cover datasets over the CONUS indicates a high amount of spatial variability. For example, in a comparison of the NLCD-TCC and the Global Land Cover Facility's Landsat Tree Cover Continuous Fields (2010) data by MRLC mapping zone, the zone-level root-mean-square deviation ranges from 2% to 39% (mean=17%, median=15%). The analysis outcomes are expected to inform USFS decisions with regard to the next cycle (2021) of NLCD-TCC production.
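
    The aggregation step mentioned above, collapsing a fine-resolution tree class to a percent-canopy value on a coarser pixel, reduces to block averaging. The sketch below uses an illustrative binary mask and block size rather than any of the actual datasets, and assumes grid dimensions divisible by the block size.

```python
def percent_canopy(tree_mask, block):
    """Aggregate a 2-D 0/1 tree mask to percent canopy over block x block windows."""
    rows, cols = len(tree_mask), len(tree_mask[0])
    out = []
    for r in range(0, rows, block):
        row = []
        for c in range(0, cols, block):
            cells = [tree_mask[i][j]
                     for i in range(r, r + block)
                     for j in range(c, c + block)]
            row.append(100.0 * sum(cells) / len(cells))
        out.append(row)
    return out

# A 4x4 mask (think fine-resolution pixels) aggregated to 2x2 blocks:
mask = [[1, 1, 0, 0],
        [1, 0, 0, 0],
        [0, 0, 1, 1],
        [0, 0, 1, 1]]
print(percent_canopy(mask, 2))  # -> [[75.0, 0.0], [0.0, 100.0]]
```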

  7. Geochemistry of the Black Sea during the last 15 kyr: A protracted evolution of its hydrography and ecology

    USGS Publications Warehouse

    Piper, David Z.

    2016-01-01

    The Black Sea is a 2200 m deep anoxic, marine sea connected to the Mediterranean Sea via the Dardanelles Strait, Marmara Sea, and the 3 km wide, 35 m deep Bosphorus Strait. The biogeochemistry of sediment from the Anatolia slope has recorded changes to the hydrography leading up to and following the input of Mediterranean water at ~9.4 ka (10³ years B.P.), when global sea level rose to the level of the Bosphorus sill and high-salinity water from the Mediterranean began to spill into the then brackish lake. The water initially mixed little with the lake water but cascaded to the bottom, where it remained essentially isolated for ~1.6 kyr, the time required to fill the basin from the bottom up at its present input rate. The accumulation of Mo in the seafloor sediments, a proxy of bottom-water anoxia, increased sharply at ~8.6 ka, when bacterial respiration in the bottom water advanced to SO₄²⁻ reduction by the oxidation of organic detritus that settled out of the photic zone. Its accumulation remained elevated to ~5.6 ka, when it decreased 60%, only to increase again slightly at ~2.0 ka. The accumulation of Corg, a proxy of primary productivity, increased threefold to fourfold at ~7.8 ka, when upward mixing of the high-salinity bottom water replaced the then thin veneer of the brackish photic zone in less than 50 years. From that time onward, the accumulation of Corg, Mo, and additional trace metals has reflected the hydrography of the basin and Bosphorus Strait, controlled largely by climate.

  8. Development of a SPARK Training Dataset

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sayre, Amanda M.; Olson, Jarrod R.

    2015-03-01

    In its first five years, the National Nuclear Security Administration’s (NNSA) Next Generation Safeguards Initiative (NGSI) sponsored more than 400 undergraduate, graduate, and post-doctoral students in internships and research positions (Wyse 2012). In the past seven years, the NGSI program has produced, and continues to produce, a large body of scientific, technical, and policy work in targeted core safeguards capabilities and human capital development activities. Not only does the NGSI program carry out activities across multiple disciplines, but it does so across all U.S. Department of Energy (DOE)/NNSA locations in the United States. However, products are not readily shared among disciplines and across locations, nor are they archived in a comprehensive library. Rather, knowledge of NGSI-produced literature is localized to the researchers, clients, and internal laboratory/facility publication systems such as the Electronic Records and Information Capture Architecture (ERICA) at the Pacific Northwest National Laboratory (PNNL). There is also no integrated way of analyzing existing NGSI literature to determine whether the larger NGSI program is achieving its core safeguards capabilities and activities. A complete library of NGSI literature could prove beneficial to a cohesive, sustainable, and more economical NGSI program. The Safeguards Platform for Automated Retrieval of Knowledge (SPARK) has been developed as a knowledge storage, retrieval, and analysis capability to capture safeguards knowledge beyond the lifespan of NGSI. During the development process, it was necessary to build a SPARK training dataset (a corpus of documents) for initial entry into the system and for demonstration purposes. We manipulated these data to gain new information about the breadth of NGSI publications and evaluated the science-policy interface at PNNL as a practical demonstration of SPARK’s intended analysis capability. The analysis demonstration sought to answer

  9. Comparison of CORA and EN4 in-situ datasets validation methods, toward a better quality merged dataset.

    NASA Astrophysics Data System (ADS)

    Szekely, Tanguy; Killick, Rachel; Gourrion, Jerome; Reverdin, Gilles

    2017-04-01

    CORA and EN4 are both global, delayed-mode, validated in-situ ocean temperature and salinity datasets distributed by the Met Office (http://www.metoffice.gov.uk/) and Copernicus (www.marine.copernicus.eu). A large part of the profiles distributed by CORA and EN4 in recent years are Argo profiles from the Argo DAC, but profiles are also extracted from the World Ocean Database, along with TESAC profiles from GTSPP. In the case of CORA, data coming from the EuroGOOS Regional Operational Observing Systems (ROOS) operated by European institutes not managed by National Data Centres, as well as other profile datasets provided by scientific sources, can also be found (sea mammal profiles from MEOP, XBT datasets from cruises, ...). EN4 also takes data from the ASBO dataset to supplement observations in the Arctic. The first advantage of this new merged product is to enhance the space and time coverage at global and European scales for the period from 1950 until the year before the current year. This product is updated once a year, and T&S gridded fields are also generated for the period from 1990 to year n-1. The enhancement compared to the previous CORA product will be presented. Although the profiles distributed by both datasets are mostly the same, the quality control procedures developed by the Met Office and Copernicus teams differ, sometimes leading to different quality control flags for the same profile. In 2016, a new study began that aims to compare both validation procedures in order to move towards a Copernicus Marine Service dataset with the best features of CORA and EN4 validation. A reference dataset composed of the full set of in-situ temperature and salinity measurements collected by Coriolis during 2015 is used. These measurements were made with a wide range of instruments (XBTs, CTDs, Argo floats, instrumented sea mammals, ...), covering the global ocean. The reference dataset has been validated simultaneously by both teams. An exhaustive comparison of the

  10. Wide-Open: Accelerating public data release by automating detection of overdue datasets

    PubMed Central

    Poon, Hoifung; Howe, Bill

    2017-01-01

    Open data is a vital pillar of open science and a key enabler for reproducibility, data reuse, and novel discoveries. Enforcement of open-data policies, however, largely relies on manual efforts, which invariably lag behind the increasingly automated generation of biological data. To address this problem, we developed a general approach to automatically identify datasets overdue for public release by applying text mining to identify dataset references in published articles and parse query results from repositories to determine if the datasets remain private. We demonstrate the effectiveness of this approach on 2 popular National Center for Biotechnology Information (NCBI) repositories: Gene Expression Omnibus (GEO) and Sequence Read Archive (SRA). Our Wide-Open system identified a large number of overdue datasets, which spurred administrators to respond directly by releasing 400 datasets in one week. PMID:28594819
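
    The first step of the approach described above, identifying dataset references in article text, can be sketched with a simple accession-pattern scan (a minimal illustration, not the authors' Wide-Open implementation; the patterns cover only the most common GEO and SRA accession forms):

```python
import re

# Accession patterns for two NCBI repositories (GEO series, SRA studies).
# Illustrative only: real identifiers have more variants than shown here.
ACCESSION_RE = re.compile(r'\b(GSE\d{3,}|SRP\d{5,})\b')

def find_dataset_references(article_text):
    """Extract candidate GEO/SRA accessions mentioned in an article."""
    return sorted(set(ACCESSION_RE.findall(article_text)))

text = ("Raw reads were deposited in SRA under SRP012345; "
        "processed data are available from GEO as GSE54321.")
print(find_dataset_references(text))  # ['GSE54321', 'SRP012345']
```

    In the full system, each extracted accession would then be checked against the repository (for example via NCBI's E-utilities) to determine whether it resolves publicly or remains private.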

  11. Wide-Open: Accelerating public data release by automating detection of overdue datasets.

    PubMed

    Grechkin, Maxim; Poon, Hoifung; Howe, Bill

    2017-06-01

    Open data is a vital pillar of open science and a key enabler for reproducibility, data reuse, and novel discoveries. Enforcement of open-data policies, however, largely relies on manual efforts, which invariably lag behind the increasingly automated generation of biological data. To address this problem, we developed a general approach to automatically identify datasets overdue for public release by applying text mining to identify dataset references in published articles and parsing query results from repositories to determine if the datasets remain private. We demonstrate the effectiveness of this approach on two popular National Center for Biotechnology Information (NCBI) repositories: Gene Expression Omnibus (GEO) and Sequence Read Archive (SRA). Our Wide-Open system identified a large number of overdue datasets, which spurred administrators to respond directly by releasing 400 datasets in one week.

  12. Statistical Reference Datasets

    National Institute of Standards and Technology Data Gateway

    Statistical Reference Datasets (Web, free access)   The Statistical Reference Datasets project is also supported by the Standard Reference Data Program. The purpose of this project is to improve the accuracy of statistical software by providing reference datasets with certified computational results that enable the objective evaluation of statistical software.

  13. A gridded hourly rainfall dataset for the UK applied to a national physically-based modelling system

    NASA Astrophysics Data System (ADS)

    Lewis, Elizabeth; Blenkinsop, Stephen; Quinn, Niall; Freer, Jim; Coxon, Gemma; Woods, Ross; Bates, Paul; Fowler, Hayley

    2016-04-01

    An hourly gridded rainfall product has great potential for use in many hydrological applications that require high temporal resolution meteorological data. One important example of this is flood risk management, with flooding in the UK highly dependent on sub-daily rainfall intensities amongst other factors. Knowledge of sub-daily rainfall intensities is therefore critical to designing hydraulic structures or flood defences to appropriate levels of service. Sub-daily rainfall rates are also essential inputs for flood forecasting, allowing for estimates of peak flows and stage for flood warning and response. In addition, an hourly gridded rainfall dataset has significant potential for practical applications such as better representation of extremes and pluvial flash flooding, validation of high resolution climate models and improving the representation of sub-daily rainfall in weather generators. A new 1km gridded hourly rainfall dataset for the UK has been created by disaggregating the daily Gridded Estimates of Areal Rainfall (CEH-GEAR) dataset using comprehensively quality-controlled hourly rain gauge data from over 1300 observation stations across the country. Quality control measures include identification of frequent tips, daily accumulations and dry spells, comparison of daily totals against the CEH-GEAR daily dataset, and nearest neighbour checks. The quality control procedure was validated against historic extreme rainfall events and the UKCP09 5km daily rainfall dataset. General use of the dataset has been demonstrated by testing the sensitivity of a physically-based hydrological modelling system for Great Britain to the distribution and rates of rainfall and potential evapotranspiration. 
Of the sensitivity tests undertaken, the largest improvements in model performance were seen when an hourly gridded rainfall dataset was combined with potential evapotranspiration disaggregated to hourly intervals, with 61% of catchments showing an increase in NSE between
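
    The disaggregation idea, splitting a daily gridded total across 24 hours using the temporal pattern of a nearby hourly gauge, can be sketched as follows (an illustrative simplification with made-up values; the actual CEH-GEAR hourly procedure involves extensive quality control and gauge selection):

```python
def disaggregate_daily(daily_total, gauge_hourly):
    """Split a daily grid-cell rainfall total into 24 hourly values using
    the temporal pattern of a nearby hourly rain gauge.
    Falls back to a uniform split on days when the gauge recorded no rain."""
    assert len(gauge_hourly) == 24
    gauge_total = sum(gauge_hourly)
    if gauge_total == 0:
        return [daily_total / 24.0] * 24
    # Scale the gauge's hourly fractions so they sum to the cell's daily total.
    return [daily_total * h / gauge_total for h in gauge_hourly]

# Hypothetical example: 12 mm daily total; the gauge saw rain in 3 hours.
gauge = [0.0] * 21 + [2.0, 4.0, 2.0]
hourly = disaggregate_daily(12.0, gauge)
print(hourly[21:])  # [3.0, 6.0, 3.0]
```

    By construction the 24 hourly values sum back to the daily total, so the disaggregated product stays consistent with the underlying CEH-GEAR daily fields.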

  14. Dataset Lifecycle Policy

    NASA Technical Reports Server (NTRS)

    Armstrong, Edward; Tauer, Eric

    2013-01-01

    The presentation focused on describing a new dataset lifecycle policy that the NASA Physical Oceanography DAAC (PO.DAAC) has implemented for its new and current datasets to foster improved stewardship and consistency across its archive. The overarching goal is to implement this dataset lifecycle policy for all new GHRSST GDS2 datasets and bridge the mission statements from the GHRSST Project Office and PO.DAAC to provide the best quality SST data in a cost-effective, efficient manner, preserving its integrity so that it will be available and usable to a wide audience.

  15. Dataset on spatial distribution and location of universities in Nigeria.

    PubMed

    Adeyemi, G A; Edeki, S O

    2018-06-01

    Access to a quality educational system and the location of educational institutions are of great importance for the future prospects of youth in any nation. These, in turn, have great effects on the economic growth and development of any country. Thus, the dataset contained in this article examines and explains the spatial distribution of universities in the Nigerian system of education. Data from the university commission, Nigeria, as of December 2017 are used. These include all 40 federal universities, 44 state universities, and 69 private universities, making a total of 153 universities in the Nigerian system of education. The data were analyzed using Geographic Information System (GIS) software. The dataset contained in this article will be of immense assistance to national educational policy makers, parents, and potential students as regards smart and reliable academic decision making.

  16. Datasets2Tools, repository and search engine for bioinformatics datasets, tools and canned analyses

    PubMed Central

    Torre, Denis; Krawczuk, Patrycja; Jagodnik, Kathleen M.; Lachmann, Alexander; Wang, Zichen; Wang, Lily; Kuleshov, Maxim V.; Ma’ayan, Avi

    2018-01-01

    Biomedical data repositories such as the Gene Expression Omnibus (GEO) enable the search and discovery of relevant biomedical digital data objects. Similarly, resources such as OMICtools index bioinformatics tools that can extract knowledge from these digital data objects. However, systematic access to pre-generated ‘canned’ analyses applied by bioinformatics tools to biomedical digital data objects is currently not available. Datasets2Tools is a repository indexing 31,473 canned bioinformatics analyses applied to 6,431 datasets. The Datasets2Tools repository also indexes 4,901 published bioinformatics software tools and all the analyzed datasets. Datasets2Tools enables users to rapidly find datasets, tools, and canned analyses through an intuitive web interface, a Google Chrome extension, and an API. Furthermore, Datasets2Tools provides a platform for contributing canned analyses, datasets, and tools, as well as evaluating these digital objects according to their compliance with the findable, accessible, interoperable, and reusable (FAIR) principles. By incorporating community engagement, Datasets2Tools promotes sharing of digital resources to stimulate the extraction of knowledge from biomedical research data. Datasets2Tools is freely available from: http://amp.pharm.mssm.edu/datasets2tools. PMID:29485625

  17. Datasets2Tools, repository and search engine for bioinformatics datasets, tools and canned analyses.

    PubMed

    Torre, Denis; Krawczuk, Patrycja; Jagodnik, Kathleen M; Lachmann, Alexander; Wang, Zichen; Wang, Lily; Kuleshov, Maxim V; Ma'ayan, Avi

    2018-02-27

    Biomedical data repositories such as the Gene Expression Omnibus (GEO) enable the search and discovery of relevant biomedical digital data objects. Similarly, resources such as OMICtools index bioinformatics tools that can extract knowledge from these digital data objects. However, systematic access to pre-generated 'canned' analyses applied by bioinformatics tools to biomedical digital data objects is currently not available. Datasets2Tools is a repository indexing 31,473 canned bioinformatics analyses applied to 6,431 datasets. The Datasets2Tools repository also indexes 4,901 published bioinformatics software tools and all the analyzed datasets. Datasets2Tools enables users to rapidly find datasets, tools, and canned analyses through an intuitive web interface, a Google Chrome extension, and an API. Furthermore, Datasets2Tools provides a platform for contributing canned analyses, datasets, and tools, as well as evaluating these digital objects according to their compliance with the findable, accessible, interoperable, and reusable (FAIR) principles. By incorporating community engagement, Datasets2Tools promotes sharing of digital resources to stimulate the extraction of knowledge from biomedical research data. Datasets2Tools is freely available from: http://amp.pharm.mssm.edu/datasets2tools.

  18. Providing Access to a Diverse Set of Global Reanalysis Dataset Collections

    NASA Astrophysics Data System (ADS)

    Schuster, D.; Worley, S. J.

    2015-12-01

    The National Center for Atmospheric Research (NCAR) Research Data Archive (RDA, http://rda.ucar.edu) provides open access to a variety of global reanalysis dataset collections to support atmospheric and related sciences research worldwide. These include products from the European Centre for Medium-Range Weather Forecasts (ECMWF), Japan Meteorological Agency (JMA), National Centers for Environmental Prediction (NCEP), National Oceanic and Atmospheric Administration (NOAA), and NCAR. All RDA-hosted reanalysis collections are freely accessible to registered users through a variety of methods. Standard access methods include traditional browser-based and scripted HTTP file download. Enhanced downloads are available through the Globus GridFTP "fire and forget" data transfer service, which provides an efficient, reliable, and preferred alternative to traditional HTTP-based methods. For those who favor interoperable access using compatible tools, the Unidata THREDDS Data Server provides remote access to complete reanalysis collections through virtual dataset aggregation "files". Finally, users can request data subsets and format conversions to be prepared for them through web interface forms or web service API batch requests. This approach uses NCAR HPC and central file systems to efficiently prepare products from the high-resolution and very large reanalysis archives. The presentation will include a detailed inventory of all RDA reanalysis dataset collection holdings and highlight access capabilities to these collections through use case examples.

  19. Learning to recognize rat social behavior: Novel dataset and cross-dataset application.

    PubMed

    Lorbach, Malte; Kyriakou, Elisavet I; Poppe, Ronald; van Dam, Elsbeth A; Noldus, Lucas P J J; Veltkamp, Remco C

    2018-04-15

    Social behavior is an important aspect of rodent models. Automated measuring tools that make use of video analysis and machine learning are an increasingly attractive alternative to manual annotation. Because machine learning-based methods need to be trained, it is important that they are validated using data from different experiment settings. To develop and validate automated measuring tools, there is a need for annotated rodent interaction datasets. Currently, the availability of such datasets is limited to two mouse datasets. We introduce the first publicly available rat social interaction dataset, RatSI. We demonstrate the practical value of the novel dataset by using it as the training set for a rat interaction recognition method. We show that behavior variations induced by the experiment setting can lead to reduced performance, which illustrates the importance of cross-dataset validation. Consequently, we add a simple adaptation step to our method and improve the recognition performance. Most existing methods are trained and evaluated in one experimental setting, which limits the predictive power of the evaluation to that particular setting. We demonstrate that cross-dataset experiments provide more insight into the performance of classifiers. With our novel, public dataset we encourage the development and validation of automated recognition methods. We are convinced that cross-dataset validation enhances our understanding of rodent interactions and facilitates the development of more sophisticated recognition methods. Combining them with adaptation techniques may enable us to apply automated recognition methods to a variety of animals and experiment settings. Copyright © 2017 Elsevier B.V. All rights reserved.

  20. Utilizing the Antarctic Master Directory to find orphan datasets

    NASA Astrophysics Data System (ADS)

    Bonczkowski, J.; Carbotte, S. M.; Arko, R. A.; Grebas, S. K.

    2011-12-01

    While most Antarctic data are housed at an established disciplinary-specific data repository, there are data types for which no suitable repository exists. In some cases, these "orphan" data, without an appropriate national archive, are served from local servers by the principal investigators who produced the data. There are many pitfalls with data served privately, including the frequent lack of adequate documentation to ensure the data can be understood by others for re-use, and the impermanence of personal web sites. For example, if an investigator leaves an institution and the data moves, the published link is no longer accessible. To ensure continued availability of data, submission to long-term national data repositories is needed. As stated in the National Science Foundation Office of Polar Programs (NSF/OPP) Guidelines and Award Conditions for Scientific Data, investigators are obligated to submit their data for curation and long-term preservation; this includes the registration of a dataset description into the Antarctic Master Directory (AMD), http://gcmd.nasa.gov/Data/portals/amd/. The AMD is a Web-based, searchable directory of thousands of dataset descriptions, known as DIF records, submitted by scientists from over 20 countries. It serves as a node of the International Directory Network/Global Change Master Directory (IDN/GCMD). The US Antarctic Program Data Coordination Center (USAP-DCC), http://www.usap-data.org/, funded through NSF/OPP, was established in 2007 to help streamline the process of data submission and DIF record creation. When data do not quite fit within any existing disciplinary repository, they can be registered within the USAP-DCC as the fallback data repository. Within the scope of the USAP-DCC we undertook the challenge of discovering and "rescuing" orphan datasets currently registered within the AMD. In order to find which DIF records led to data served privately, all records relating to US data within the AMD were parsed. 
After

  1. Carbon dioxide, hydrographic, and chemical data obtained during the R/Vs Roger Revelle and Thomas Thompson repeat hydrography cruises in the Pacific Ocean: CLIVAR CO2 sections P16S-2005 (9 January - 19 February, 2005) and P16N-2006 (13 February - 30 March, 2006)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kozyr, Alex; Feely, R. A.; Sabine, C. L.

    2009-05-01

    This report presents methods, and analytical and quality control procedures for salinity, oxygen, nutrients, total carbon dioxide (TCO2), total alkalinity (TALK), pH, discrete CO2 partial pressure (pCO2), dissolved organic carbon (DOC), chlorofluorocarbons (CFCs), radiocarbon, δ13C, and underway carbon measurements performed during the P16S-2005 (9 January - 19 February 2005) and P16N-2006 (13 February - 30 March 2006) cruises in the Pacific Ocean. The research vessel (R/V) Roger Revelle departed Papeete, Tahiti, on January 9, 2005 for the Repeat Section P16S, nominally along 150°W, ending in Wellington, New Zealand, on February 19. During this cruise, samples were taken from 36 depths at 111 CTD stations between 16°S and 71°S. The Repeat Section P16N, nominally along 152°W, consisted of two legs. Leg 1 started on February 13, 2006 in Papeete, Tahiti, and finished on March 3 in Honolulu, Hawaii. The R/V Thomas G. Thompson departed Honolulu for Leg 2 on March 10, 2006 and arrived in Kodiak, Alaska, on March 30. During the P16N cruises, samples were taken from 34 or 36 depths at 84 stations between 17°S and 56.28°N. The research conducted on these cruises was part of a series of repeat hydrography sections jointly funded by the National Oceanic and Atmospheric Administration (NOAA) and the National Science Foundation (NSF) as part of the Climate Variability Program (CLIVAR)/CO2 Repeat Hydrography Program. The P16S and P16N datasets are available free of charge as a numeric data package (NDP) from the Carbon Dioxide Information Analysis Center (CDIAC). The NDP consists of the oceanographic data files and this printed documentation, which describes the procedures and methods used to obtain the data.

  2. DATS, the data tag suite to enable discoverability of datasets.

    PubMed

    Sansone, Susanna-Assunta; Gonzalez-Beltran, Alejandra; Rocca-Serra, Philippe; Alter, George; Grethe, Jeffrey S; Xu, Hua; Fore, Ian M; Lyle, Jared; Gururaj, Anupama E; Chen, Xiaoling; Kim, Hyeon-Eui; Zong, Nansu; Li, Yueling; Liu, Ruiling; Ozyurt, I Burak; Ohno-Machado, Lucila

    2017-06-06

    Today's science increasingly requires effective ways to find and access existing datasets that are distributed across a range of repositories. For researchers in the life sciences, discoverability of datasets may soon become as essential as identifying the latest publications via PubMed. Through an international collaborative effort funded by the National Institutes of Health (NIH)'s Big Data to Knowledge (BD2K) initiative, we have designed and implemented the DAta Tag Suite (DATS) model to support the DataMed data discovery index. DataMed's goal is to be for data what PubMed has been for the scientific literature. Akin to the Journal Article Tag Suite (JATS) used in PubMed, the DATS model enables submission of metadata on datasets to DataMed. DATS has a core set of elements, which are generic and applicable to any type of dataset, and an extended set that can accommodate more specialized data types. DATS is a platform-independent model also available as an annotated serialization in schema.org, which in turn is widely used by major search engines like Google, Microsoft, Yahoo and Yandex.
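
    As a rough illustration of the schema.org serialization mentioned above, a minimal dataset record might look like the following (all field values and the accession are hypothetical, and the names follow the generic schema.org Dataset vocabulary; the full DATS model defines a richer element set than shown here):

```python
import json

# Minimal, illustrative dataset metadata record using schema.org Dataset
# terms. Values are invented; DATS itself maps its elements onto this
# kind of annotated serialization.
record = {
    "@context": "http://schema.org",
    "@type": "Dataset",
    "name": "Example expression profiling dataset",
    "identifier": "GSE00000",  # hypothetical accession
    "creator": {"@type": "Person", "name": "A. Researcher"},
    "datePublished": "2017-06-06",
    "keywords": ["gene expression", "RNA-seq"],
}
print(json.dumps(record, indent=2))
```

    Because the serialization is plain schema.org JSON-LD, the same record is readable both by a discovery index such as DataMed and by general-purpose search engines.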

  3. National Water Model: Providing the Nation with Actionable Water Intelligence

    NASA Astrophysics Data System (ADS)

    Aggett, G. R.; Bates, B.

    2017-12-01

    The National Water Model (NWM) provides national, street-level detail of water movement through time and space. Operating hourly, the model produces a flood of information that offers enormous benefits for water resource management, natural disaster preparedness, and the protection of life and property. The Geo-Intelligence Division at the NOAA National Water Center supplies forecasters and decision-makers with timely, actionable water intelligence by processing billions of NWM data points every hour. These datasets include current streamflow estimates, short- and medium-range streamflow forecasts, and many other ancillary datasets. The sheer amount of NWM data produced yields a dataset too large for direct human comprehension. As such, it is necessary to post-process and filter the model output and to ingest the data into visualization web apps that use cartographic techniques to bring attention to the areas of highest urgency. This poster illustrates NWM output post-processing and cartographic visualization techniques being developed and employed by the Geo-Intelligence Division at the NOAA National Water Center to provide national, actionable water intelligence.

  4. Chemical elements in the environment: multi-element geochemical datasets from continental to national scale surveys on four continents

    USGS Publications Warehouse

    Caritat, Patrice de; Reimann, Clemens; Smith, David; Wang, Xueqiu

    2017-01-01

    During the last 10-20 years, Geological Surveys around the world have undertaken a major effort towards delivering fully harmonized and tightly quality-controlled low-density multi-element soil geochemical maps and datasets of vast regions including up to whole continents. Concentrations of between 45 and 60 elements commonly have been determined in a variety of different regolith types (e.g., sediment, soil). The multi-element datasets are published as complete geochemical atlases and made available to the general public. Several other geochemical datasets covering smaller areas but generally at a higher spatial density are also available. These datasets may, however, not be found by superficial internet-based searches because the elements are not mentioned individually either in the title or in the keyword lists of the original references. This publication attempts to increase the visibility and discoverability of these fundamental background datasets covering large areas up to whole continents.

  5. Web services in the U.S. geological survey streamstats web application

    USGS Publications Warehouse

    Guthrie, J.D.; Dartiguenave, C.; Ries, Kernell G.

    2009-01-01

    StreamStats is a U.S. Geological Survey Web-based GIS application developed as a tool for water-resources planning and management, engineering design, and other applications. StreamStats' primary functionality allows users to obtain drainage-basin boundaries, basin characteristics, and streamflow statistics for gaged and ungaged sites. Recently, Web services have been developed that provide remote users and applications with access to the comprehensive GIS tools available in StreamStats, including delineating drainage-basin boundaries, computing basin characteristics, estimating streamflow statistics for user-selected locations, and determining point features that coincide with a National Hydrography Dataset (NHD) reach address. For the state of Kentucky, a Web service also has been developed that gives users the ability to estimate daily time series of drainage-basin average values of daily precipitation and temperature. The use of Web services allows users to take full advantage of the datasets and processes behind the StreamStats application without having to develop and maintain them. © 2009 IEEE.

  6. Application of the Streamflow Prediction Tool to Estimate Sediment Dredging Volumes in Texas Coastal Waterways

    NASA Astrophysics Data System (ADS)

    Yeates, E.; Dreaper, G.; Afshari, S.; Tavakoly, A. A.

    2017-12-01

    Over the past six fiscal years, the United States Army Corps of Engineers (USACE) has contracted an average of about a billion dollars per year for navigation channel dredging. To execute these funds effectively, USACE Districts must determine which navigation channels need to be dredged in a given year. Improving this prioritization process results in more efficient waterway maintenance. This study uses the Streamflow Prediction Tool, a runoff routing model based on global weather forecast ensembles, to estimate dredged volumes. This study establishes regional linear relationships between cumulative flow and dredged volumes over a long-term simulation covering 30 years (1985-2015), using drainage area and shoaling parameters. The study framework integrates the National Hydrography Dataset (NHDPlus Dataset) with parameters from the Corps Shoaling Analysis Tool (CSAT) and dredging record data from USACE District records. Results in the test cases of the Houston Ship Channel and the Sabine and Port Arthur Harbor waterways in Texas indicate positive correlation between the simulated streamflows and actual dredging records.
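
    A regional linear relationship between cumulative flow and dredged volume, of the kind the study establishes, can be fit with ordinary least squares; the sketch below uses invented numbers purely for illustration (the actual study additionally conditions on drainage area and CSAT shoaling parameters):

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b (stdlib only)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return a, b

# Hypothetical records for one waterway: annual cumulative flow vs.
# dredged volume (arbitrary units, made up for illustration).
flow    = [10.0, 20.0, 30.0, 40.0]
dredged = [1.2, 2.1, 3.2, 3.9]
a, b = fit_line(flow, dredged)
print(round(a, 3), round(b, 3))  # slope and intercept: 0.092 0.3
```

    Once such a relationship is established per region, a streamflow forecast can be converted directly into an expected dredging volume for prioritization.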

  7. Scalable persistent identifier systems for dynamic datasets

    NASA Astrophysics Data System (ADS)

    Golodoniuc, P.; Cox, S. J. D.; Klump, J. F.

    2016-12-01

    Reliable and persistent identification of objects, whether tangible or not, is essential in information management. Many Internet-based systems have been developed to identify digital data objects, e.g., PURL, LSID, Handle, ARK. These were largely designed for identification of static digital objects. The amount of data made available online has grown exponentially over the last two decades, and fine-grained identification of dynamically generated data objects within large datasets using conventional systems (e.g., PURL) has become impractical. We have compared the capabilities of various technological solutions to enable resolvability of data objects in dynamic datasets, and developed a dataset-centric approach to resolution of identifiers. This is particularly important in Semantic Linked Data environments where dynamic, frequently changing data is delivered live via web services, so registration of individual data objects to obtain identifiers is impractical. We use identifier patterns and pattern hierarchies for identification of data objects, which allows relationships between identifiers to be expressed, and also provides means for resolving a single identifier into multiple forms (i.e. views or representations of an object). The latter can be implemented through (a) HTTP content negotiation, or (b) use of URI querystring parameters. The pattern and hierarchy approach has been implemented in the Linked Data API supporting the United Nations Spatial Data Infrastructure (UNSDI) initiative and later in the implementation of geoscientific data delivery for the Capricorn Distal Footprints project using International Geo Sample Numbers (IGSN). This enables flexible resolution of multi-view persistent identifiers and provides a scalable solution for large heterogeneous datasets.
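
    The pattern-based resolution idea can be illustrated with a small resolver that maps identifier patterns to media-type-specific URL templates (all patterns and URLs below are hypothetical; a real deployment would perform the negotiation over HTTP using the Accept header):

```python
import re

# An identifier pattern covers a whole family of object IDs, so individual
# objects never need to be registered one by one.
PATTERNS = [
    # (regex over the identifier, {media type: URL template})
    (re.compile(r"^sample/(?P<id>[A-Z0-9]+)$"), {
        "text/html":        "https://example.org/sample/{id}",
        "application/json": "https://example.org/api/sample/{id}.json",
    }),
]

def resolve(identifier, accept="text/html"):
    """Return the representation URL for an identifier, honouring the
    requested media type and falling back to HTML."""
    for pattern, views in PATTERNS:
        m = pattern.match(identifier)
        if m:
            template = views.get(accept, views["text/html"])
            return template.format(**m.groupdict())
    raise KeyError(identifier)

print(resolve("sample/AU1234", accept="application/json"))
# https://example.org/api/sample/AU1234.json
```

    Because resolution is driven by the pattern rather than a per-object registry, the same rule serves every object the pattern matches, which is what makes the approach scale to dynamic datasets.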

  8. Towards a High-Resolution Global Inundation Delineation Dataset

    NASA Astrophysics Data System (ADS)

    Fluet-Chouinard, E.; Lehner, B.

    2011-12-01

    Although their importance for biodiversity, flow regulation and ecosystem service provision is widely recognized, wetlands and temporarily inundated landscapes remain poorly mapped globally because of their inherently elusive nature. Inventorying of wetland resources has been identified in international agreements as an essential component of appropriate conservation efforts and management initiatives for these threatened ecosystems. However, despite recent advances in remote sensing surface water monitoring, current inventories of surface water variations remain incomplete at the regional-to-global scale due to methodological limitations restricting truly global application. Remote sensing wetland applications such as SAR L-band are particularly constrained by image availability and heterogeneity of acquisition dates, while coarse resolution passive microwave and multi-sensor methods cannot discriminate distinct surface water bodies. As a result, the most popular global wetland dataset remains to this day the Global Lake & Wetland Database (Lehner and Doll, 2004), a spatially inconsistent database assembled from various existing data sources. The approach taken in this project circumvents the limitations of current global wetland monitoring methods by combining globally available topographic and hydrographic data to downscale coarse resolution global inundation data (Prigent et al., 2007) and thus create a superior inundation delineation map product. The developed procedure downscales inundation data from the coarse resolution (~27km) of current passive microwave sensors to the finer spatial resolution (~500m) of the topographic and hydrographic layers of HydroSHEDS' data suite (Lehner et al., 2006), while retaining the high temporal resolution of the multi-sensor inundation dataset. From the downscaling process emerges new information on the specific location of inundation, but also on its frequency and duration. The downscaling algorithm employs a decision tree

  9. Segmentation of Unstructured Datasets

    NASA Technical Reports Server (NTRS)

    Bhat, Smitha

    1996-01-01

    Datasets generated by computer simulations and experiments in Computational Fluid Dynamics tend to be extremely large and complex. It is difficult to visualize these datasets using standard techniques like Volume Rendering and Ray Casting. Object Segmentation provides a technique to extract and quantify regions of interest within these massive datasets. This thesis explores basic algorithms to extract coherent amorphous regions from two-dimensional and three-dimensional scalar unstructured grids. The techniques are applied to datasets from Computational Fluid Dynamics and from Finite Element Analysis.
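
    The basic region-extraction step can be illustrated with a flood-fill labeling of cells above a threshold (a minimal sketch on a structured 2D grid for simplicity; the thesis targets unstructured grids, which would use explicit neighbor lists instead of fixed offsets):

```python
from collections import deque

def segment_regions(grid, threshold):
    """Label 4-connected regions of cells whose scalar value exceeds a
    threshold: a minimal sketch of extracting coherent regions of
    interest from a 2D scalar field."""
    rows, cols = len(grid), len(grid[0])
    labels = [[0] * cols for _ in range(rows)]
    current = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] > threshold and labels[r][c] == 0:
                current += 1                      # start a new region
                queue = deque([(r, c)])
                labels[r][c] = current
                while queue:                      # breadth-first flood fill
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and grid[ny][nx] > threshold
                                and labels[ny][nx] == 0):
                            labels[ny][nx] = current
                            queue.append((ny, nx))
    return current, labels

field = [[0.1, 0.9, 0.1],
         [0.1, 0.8, 0.1],
         [0.7, 0.1, 0.6]]
n, _ = segment_regions(field, 0.5)
print(n)  # 3 distinct regions of interest
```

    Each labeled region can then be quantified (area, mean value, extrema) independently, which is the "extract and quantify" step the abstract describes.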

  10. Eighteenth annual report of the United States Geological Survey to the Secretary of the Interior, 1896-1897: Part IV - Hydrography

    USGS Publications Warehouse

    Davis, Arthur Powell; Leverett, Frank; Darton, N.H.; Schuyler, J.D.

    1897-01-01

    The completion of this volume marks the revival of extended systematic investigation of the hydrography of the United States. This book is, in effect, the ninth annual report of what has been known as the Irrigation Survey. Its preparation and publication have been made possible by the act of June 11, 1896 (Stat. L., vol. 29, p. 436), which enlarged the scope of the work and authorized the preparation of reports upon the best methods of utilizing the water resources of arid and semiarid sections. For some years before this date the sums available for hydrographic work were so small that it was practicable merely to continue observations at previously established stations, compute discharges, and compile for publication the data accumulated in the office.

  11. Standardising trauma monitoring: the development of a minimum dataset for trauma registries in Australia and New Zealand.

    PubMed

    Palmer, Cameron S; Davey, Tamzyn M; Mok, Meng Tuck; McClure, Rod J; Farrow, Nathan C; Gruen, Russell L; Pollard, Cliff W

    2013-06-01

    Trauma registries are central to the implementation of effective trauma systems. However, differences between trauma registry datasets make comparisons between trauma systems difficult. In 2005, the collaborative Australian and New Zealand National Trauma Registry Consortium began a process to develop a bi-national minimum dataset (BMDS) for use in Australasian trauma registries. This study aims to describe the steps taken in the development and preliminary evaluation of the BMDS. A working party comprising sixteen representatives from across Australasia identified and discussed the collectability and utility of potential BMDS fields. This included evaluating existing national and international trauma registry datasets, as well as reviewing all quality indicators and audit filters in use in Australasian trauma centres. After the working party activities concluded, this process was continued by a number of interested individuals, with broader feedback sought from the Australasian trauma community on a number of occasions. Once the BMDS had reached a suitable stage of development, an email survey was conducted across Australasian trauma centres to assess whether BMDS fields met an ideal minimum standard of field collectability. The BMDS was also compared with three prominent international datasets to assess the extent of dataset overlap. Following this, the BMDS was encapsulated in a data dictionary, which was introduced in late 2010. The finalised BMDS contained 67 data fields. Forty-seven of these fields met a previously published criterion of 80% collectability across respondent trauma institutions; the majority of the remaining fields either could be collected without any change in resources, or could be calculated from other data fields in the BMDS. However, comparability with international registry datasets was poor. Only nine BMDS fields had corresponding, directly comparable fields in all the national and international-level registry datasets evaluated. A

  12. A method for mapping corn using the US Geological Survey 1992 National Land Cover Dataset

    USGS Publications Warehouse

    Maxwell, S.K.; Nuckols, J.R.; Ward, M.H.

    2006-01-01

    Long-term exposure to elevated nitrate levels in community drinking water supplies has been associated with an elevated risk of several cancers including non-Hodgkin's lymphoma, colon cancer, and bladder cancer. To estimate human exposure to nitrate, specific crop type information is needed as fertilizer application rates vary widely by crop type. Corn requires the highest application of nitrogen fertilizer of crops grown in the Midwest US. We developed a method to refine the US Geological Survey National Land Cover Dataset (NLCD) (including map and original Landsat images) to distinguish corn from other crops. Overall average agreement between the resulting corn and other row crops class and ground reference data was 0.79 kappa coefficient with individual Landsat images ranging from 0.46 to 0.93 kappa. The highest accuracies occurred in regions where corn was the single dominant crop (greater than 80.0%) and the crop vegetation conditions at the time of image acquisition were optimum for separation of corn from all other crops. Factors that resulted in lower accuracies included the accuracy of the NLCD map, accuracy of corn areal estimates, crop mixture, crop condition at the time of Landsat overpass, and Landsat scene anomalies.
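    The kappa coefficients reported above measure agreement between the mapped class and ground reference data beyond what chance would produce. A minimal sketch of the standard computation from a confusion matrix follows; the counts below are hypothetical, not from the study.

```python
import numpy as np

def cohens_kappa(confusion):
    """Cohen's kappa from a square confusion matrix
    (rows: reference class, columns: mapped class).
    kappa = (p_o - p_e) / (1 - p_e)."""
    confusion = np.asarray(confusion, dtype=float)
    n = confusion.sum()
    p_o = np.trace(confusion) / n                              # observed agreement
    p_e = (confusion.sum(0) * confusion.sum(1)).sum() / n**2   # chance agreement
    return (p_o - p_e) / (1 - p_e)

# Hypothetical 2-class example (corn vs. other row crops)
cm = [[40, 10],
      [ 5, 45]]
kappa = cohens_kappa(cm)
```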

  13. A Compilation of Spatial Datasets to Support a Preliminary Assessment of Pesticides and Pesticide Use on Tribal Lands in Oklahoma

    USGS Publications Warehouse

    Mashburn, Shana L.; Winton, Kimberly T.

    2010-01-01

    This CD-ROM contains spatial datasets that describe natural and anthropogenic features and county-level estimates of agricultural pesticide use and pesticide data for surface-water, groundwater, and biological specimens in the state of Oklahoma. County-level estimates of pesticide use were compiled from the Pesticide National Synthesis Project of the U.S. Geological Survey, National Water-Quality Assessment Program. Pesticide data for surface water, groundwater, and biological specimens were compiled from the U.S. Geological Survey National Water Information System database. The spatial datasets that describe natural and manmade features were compiled from several agencies and contain information collected by the U.S. Geological Survey. The U.S. Geological Survey datasets were not collected specifically for this compilation, but were previously collected for projects with various objectives. The spatial datasets were created by different agencies from sources with varied quality. As a result, features common to multiple layers may not overlay exactly. Users should check the metadata to determine proper use of these spatial datasets. These data were not checked for accuracy or completeness. If a question of accuracy or completeness arises, the user should contact the originator cited in the metadata.

  14. An integrated pan-tropical biomass map using multiple reference datasets.

    PubMed

    Avitabile, Valerio; Herold, Martin; Heuvelink, Gerard B M; Lewis, Simon L; Phillips, Oliver L; Asner, Gregory P; Armston, John; Ashton, Peter S; Banin, Lindsay; Bayol, Nicolas; Berry, Nicholas J; Boeckx, Pascal; de Jong, Bernardus H J; DeVries, Ben; Girardin, Cecile A J; Kearsley, Elizabeth; Lindsell, Jeremy A; Lopez-Gonzalez, Gabriela; Lucas, Richard; Malhi, Yadvinder; Morel, Alexandra; Mitchard, Edward T A; Nagy, Laszlo; Qie, Lan; Quinones, Marcela J; Ryan, Casey M; Ferry, Slik J W; Sunderland, Terry; Laurin, Gaia Vaglio; Gatti, Roberto Cazzolla; Valentini, Riccardo; Verbeeck, Hans; Wijaya, Arief; Willcock, Simon

    2016-04-01

    We combined two existing datasets of vegetation aboveground biomass (AGB) (Proceedings of the National Academy of Sciences of the United States of America, 108, 2011, 9899; Nature Climate Change, 2, 2012, 182) into a pan-tropical AGB map at 1-km resolution using an independent reference dataset of field observations and locally calibrated high-resolution biomass maps, harmonized and upscaled to 14 477 1-km AGB estimates. Our data fusion approach uses bias removal and weighted linear averaging that incorporates and spatializes the biomass patterns indicated by the reference data. The method was applied independently in areas (strata) with homogeneous error patterns of the input (Saatchi and Baccini) maps, which were estimated from the reference data and additional covariates. Based on the fused map, we estimated AGB stock for the tropics (23.4°N-23.4°S) of 375 Pg dry mass, 9-18% lower than the Saatchi and Baccini estimates. The fused map also showed differing spatial patterns of AGB over large areas, with higher AGB density in the dense forest areas in the Congo basin, Eastern Amazon and South-East Asia, and lower values in Central America and in most dry vegetation areas of Africa than either of the input maps. The validation exercise, based on 2118 estimates from the reference dataset not used in the fusion process, showed that the fused map had a RMSE 15-21% lower than that of the input maps and, most importantly, nearly unbiased estimates (mean bias 5 Mg dry mass ha⁻¹ vs. 21 and 28 Mg ha⁻¹ for the input maps). The fusion method can be applied at any scale including the policy-relevant national level, where it can provide improved biomass estimates by integrating existing regional biomass maps as input maps and additional, country-specific reference datasets. © 2015 John Wiley & Sons Ltd.
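    The fusion step described above combines bias removal with weighted linear averaging. A per-pixel sketch under simplifying assumptions: inverse-variance weights stand in for the stratified, reference-derived weighting of the paper, and all names are illustrative.

```python
import numpy as np

def fuse_maps(map_a, map_b, bias_a, bias_b, var_a, var_b):
    """Fuse two AGB maps: subtract each input's estimated bias, then
    combine with inverse-variance weights (a simplified stand-in for the
    stratified weighting estimated from reference data)."""
    a = map_a - bias_a          # debiased first input map
    b = map_b - bias_b          # debiased second input map
    w_a = 1.0 / var_a           # weight: more reliable map counts more
    w_b = 1.0 / var_b
    return (w_a * a + w_b * b) / (w_a + w_b)

# Toy example: two 1-pixel maps with opposite biases and equal error variance
a = np.array([110.0])
b = np.array([90.0])
fused = fuse_maps(a, b, bias_a=10.0, bias_b=-10.0, var_a=400.0, var_b=400.0)
```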

  15. Northern Hemisphere winter storm track trends since 1959 derived from multiple reanalysis datasets

    NASA Astrophysics Data System (ADS)

    Chang, Edmund K. M.; Yau, Albert M. W.

    2016-09-01

    In this study, a comprehensive comparison of Northern Hemisphere winter storm track trends since 1959 derived from multiple reanalysis datasets and rawinsonde observations has been conducted. In addition, trends in terms of variance and cyclone track statistics have been compared. Previous studies, based largely on the National Center for Environmental Prediction-National Center for Atmospheric Research Reanalysis (NNR), have suggested that both the Pacific and Atlantic storm tracks significantly intensified between the 1950s and 1990s. Comparison with trends derived from rawinsonde observations suggests that the trends derived from NNR are significantly biased high, while those from the European Center for Medium Range Weather Forecasts 40-year Reanalysis and the Japanese 55-year Reanalysis are much less biased but still too high. Those from the two twentieth century reanalysis datasets are most consistent with observations but may exhibit slight biases of opposite signs. Between 1959 and 2010, Pacific storm track activity has likely increased by 10 % or more, while Atlantic storm track activity has likely increased by <10 %. Our analysis suggests that trends in Pacific and Atlantic basin wide storm track activity prior to the 1950s derived from the two twentieth century reanalysis datasets are unlikely to be reliable due to changes in density of surface observations. Nevertheless, these datasets may provide useful information on interannual variability, especially over the Atlantic.

  16. Validation of the Hospital Episode Statistics Outpatient Dataset in England.

    PubMed

    Thorn, Joanna C; Turner, Emma; Hounsome, Luke; Walsh, Eleanor; Donovan, Jenny L; Verne, Julia; Neal, David E; Hamdy, Freddie C; Martin, Richard M; Noble, Sian M

    2016-02-01

    The Hospital Episode Statistics (HES) dataset is a source of administrative 'big data' with potential for costing purposes in economic evaluations alongside clinical trials. This study assesses the validity of coverage in the HES outpatient dataset. Men who died of, or with, prostate cancer were selected from a prostate-cancer screening trial (CAP, Cluster randomised triAl of PSA testing for Prostate cancer). Details of visits that took place after 1/4/2003 to hospital outpatient departments for conditions related to prostate cancer were extracted from medical records (MR); these appointments were sought in the HES outpatient dataset based on date. The matching procedure was repeated for periods before and after 1/4/2008, when the HES outpatient dataset was accredited as a national statistic. 4922 outpatient appointments were extracted from MR for 370 men. 4088 appointments recorded in MR were identified in the HES outpatient dataset (83.1%; 95% confidence interval [CI] 82.0-84.1). For appointments occurring prior to 1/4/2008, 2195/2755 (79.7%; 95% CI 78.2-81.2) matches were observed, while 1893/2167 (87.4%; 95% CI 86.0-88.9) appointments occurring after 1/4/2008 were identified (p for difference <0.001). 215/370 men (58.1%) had at least one appointment in the MR review that was unmatched in HES, 155 men (41.9%) had all their appointments identified, and 20 men (5.4%) had no appointments identified in HES. The HES outpatient dataset appears reasonably valid for research, particularly following accreditation. The dataset may be a suitable alternative to collecting MR data from hospital notes within a trial, although caution should be exercised with data collected prior to accreditation.
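    The match proportions and confidence intervals quoted above follow from standard binomial arithmetic; a short sketch reproducing the overall matching figure (the function name is illustrative):

```python
import math

def match_rate_ci(matched, total, z=1.96):
    """Proportion of appointments matched, with a normal-approximation
    95% confidence interval: p +/- z * sqrt(p(1-p)/n)."""
    p = matched / total
    half = z * math.sqrt(p * (1 - p) / total)
    return p, p - half, p + half

# 4088 of 4922 medical-record appointments found in the HES outpatient dataset
p, lo, hi = match_rate_ci(4088, 4922)
```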

  17. WIND Toolkit Offshore Summary Dataset

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Draxl, Caroline; Musial, Walt; Scott, George

    This dataset contains summary statistics for offshore wind resources for the continental United States derived from the Wind Integration National Dataset (WIND) Toolkit. These data are available in two formats: GDB - Compressed geodatabases containing statistical summaries aligned with lease blocks (aliquots) stored in a GIS format. These data are partitioned into Pacific, Atlantic, and Gulf resource regions. HDF5 - Statistical summaries of all points in the offshore Pacific, Atlantic, and Gulf offshore regions. These data are located on the original WIND Toolkit grid and have not been reassigned or downsampled to lease blocks. These data were developed under contract by NREL for the Bureau of Ocean Energy Management (BOEM).

  18. A rapid approach for automated comparison of independently derived stream networks

    USGS Publications Warehouse

    Stanislawski, Larry V.; Buttenfield, Barbara P.; Doumbouya, Ariel T.

    2015-01-01

    This paper presents an improved coefficient of line correspondence (CLC) metric for automatically assessing the similarity of two different sets of linear features. Elevation-derived channels at 1:24,000 scale (24K) are generated from a weighted flow-accumulation model and compared to 24K National Hydrography Dataset (NHD) flowlines. The CLC process conflates two vector datasets through a raster line-density differencing approach that is faster and more reliable than earlier methods. Methods are tested on 30 subbasins distributed across different terrain and climate conditions of the conterminous United States. CLC values for the 30 subbasins indicate 44–83% of the features match between the two datasets, with most of the mismatching features being first-order features. Relatively lower CLC values result from subbasins with less than about 1.5 degrees of slope. The primary difference between the two datasets may be explained by different data capture criteria. First-order, headwater tributaries derived from the flow-accumulation model are captured more comprehensively through drainage area and terrain conditions, whereas capture of headwater features in the NHD is cartographically constrained by tributary length. The addition of missing headwaters to the NHD, as guided by the elevation-derived channels, can substantially improve the scientific value of the NHD.
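    The published CLC is more involved than simple set overlap, but the underlying idea of rasterizing two line networks and comparing cell memberships can be sketched in toy form. The Jaccard-style ratio below is a simplified stand-in, not the published CLC formula.

```python
import numpy as np

def coefficient_of_line_correspondence(grid_a, grid_b):
    """Toy CLC-style comparison: given two boolean rasters of line
    features, report matched cells / all line cells in either raster."""
    both = np.logical_and(grid_a, grid_b).sum()
    either = np.logical_or(grid_a, grid_b).sum()
    return both / either

# Two hypothetical 1-cell-wide stream rasters differing in a headwater reach
a = np.zeros((4, 4), dtype=bool)
a[1, :] = True                    # stream spans the whole row
b = np.zeros((4, 4), dtype=bool)
b[1, 1:] = True                   # same stream, missing its headwater cell
b[0, 0] = True                    # plus an extra headwater elsewhere
clc = coefficient_of_line_correspondence(a, b)
```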

  19. Geospatial database of estimates of groundwater discharge to streams in the Upper Colorado River Basin

    USGS Publications Warehouse

    Garcia, Adriana; Masbruch, Melissa D.; Susong, David D.

    2014-01-01

    The U.S. Geological Survey, as part of the Department of the Interior’s WaterSMART (Sustain and Manage America’s Resources for Tomorrow) initiative, compiled published estimates of groundwater discharge to streams in the Upper Colorado River Basin as a geospatial database. For the purpose of this report, groundwater discharge to streams is the baseflow portion of streamflow that includes contributions of groundwater from various flow paths. Reported estimates of groundwater discharge were assigned as attributes to stream reaches derived from the high-resolution National Hydrography Dataset. A total of 235 estimates of groundwater discharge to streams were compiled and included in the dataset. Feature class attributes of the geospatial database include groundwater discharge (acre-feet per year), method of estimation, citation abbreviation, defined reach, and 8-digit hydrologic unit code(s). Baseflow index (BFI) estimates of groundwater discharge were calculated using an existing streamflow characteristics dataset and were included as an attribute in the geospatial database. A comparison of the BFI estimates to the compiled estimates of groundwater discharge found that the BFI estimates were greater than the reported groundwater discharge estimates.

  20. Fixing Dataset Search

    NASA Technical Reports Server (NTRS)

    Lynnes, Chris

    2014-01-01

    Three current search engines are queried for ozone data at the GES DISC. The results range from sub-optimal to counter-intuitive. We propose a method to fix dataset search by implementing a robust relevancy ranking scheme. The relevancy ranking scheme is based on several heuristics culled from more than 20 years of helping users select datasets.

  1. ASSESSING THE ACCURACY OF NATIONAL LAND COVER DATASET AREA ESTIMATES AT MULTIPLE SPATIAL EXTENTS

    EPA Science Inventory

    Site specific accuracy assessments provide fine-scale evaluation of the thematic accuracy of land use/land cover (LULC) datasets; however, they provide little insight into LULC accuracy across varying spatial extents. Additionally, LULC data are typically used to describe lands...

  2. 2016 National Census of Ferry Operators [supporting datasets]

    DOT National Transportation Integrated Search

    2018-03-06

    The Bureau of Transportation Statistics (BTS) conducted the National Census of Ferry Operators (NCFO) from April through November 2016, collecting the operational characteristics of the 2015 calendar year ferry operations. The supporting zip file c...

  3. NCAR's Research Data Archive: OPeNDAP Access for Complex Datasets

    NASA Astrophysics Data System (ADS)

    Dattore, R.; Worley, S. J.

    2014-12-01

    Many datasets have complex structures including hundreds of parameters and numerous vertical levels, grid resolutions, and temporal products. Making these data accessible is a challenge for a data provider. OPeNDAP is a powerful protocol for delivering multi-file datasets in real time so they can be ingested by many analysis and visualization tools, but for these datasets there are too many choices about how to aggregate. Simple aggregation schemes can fail to support many potential studies based on complex datasets, or at least make them very challenging. We address this issue by using a rich file content metadata collection to create a real-time customized OPeNDAP service that matches the full suite of access possibilities for complex datasets. The Climate Forecast System Reanalysis (CFSR) and its extension, the Climate Forecast System Version 2 (CFSv2), datasets produced by the National Centers for Environmental Prediction (NCEP) and hosted by the Research Data Archive (RDA) at the Computational and Information Systems Laboratory (CISL) at NCAR, are examples of complex datasets that are difficult to aggregate with existing data server software. CFSR and CFSv2 contain 141 distinct parameters on 152 vertical levels, six grid resolutions and 36 products (analyses, n-hour forecasts, multi-hour averages, etc.), where not all parameter/level combinations are available at all grid resolution/product combinations. These data are archived in the RDA with the data structure provided by the producer; no additional re-organization or aggregation has been applied. Since 2011, users have been able to request customized subsets (e.g., temporal, parameter, spatial) from the CFSR/CFSv2, which are processed in delayed mode and then downloaded to a user's system. Until now, the complexity has made it difficult to provide real-time OPeNDAP access to the data.
We have developed a service that leverages the already-existing subsetting interface and allows users to create a virtual dataset

  4. National Transportation Atlas Databases : 2013

    DOT National Transportation Integrated Search

    2013-01-01

    The National Transportation Atlas Databases 2013 (NTAD2013) is a set of nationwide geographic datasets of transportation facilities, transportation networks, associated infrastructure, and other political and administrative entities. These datasets i...

  5. Comparison of Four Precipitation Forcing Datasets in Land Information System Simulations over the Continental U.S.

    NASA Technical Reports Server (NTRS)

    Case, Jonathan L.; Kumar, Sujay V.; Kuligowski, Robert J.; Langston, Carrie

    2013-01-01

    The NASA Short-term Prediction Research and Transition (SPoRT) Center in Huntsville, AL is running a real-time configuration of the NASA Land Information System (LIS) with the Noah land surface model (LSM). Output from the SPoRT-LIS run is used to initialize land surface variables for local modeling applications at select National Weather Service (NWS) partner offices, and can be displayed in decision support systems for situational awareness and drought monitoring. The SPoRT-LIS is run over a domain covering the southern and eastern United States, fully nested within the National Centers for Environmental Prediction Stage IV precipitation analysis grid, which provides precipitation forcing to the offline LIS-Noah runs. The SPoRT Center seeks to expand the real-time LIS domain to the entire Continental U.S. (CONUS); however, geographical limitations with the Stage IV analysis product have inhibited this expansion. Therefore, a goal of this study is to test alternative precipitation forcing datasets that can enable the LIS expansion by improving upon the current geographical limitations of the Stage IV product. The four precipitation forcing datasets that are inter-compared on a 4-km resolution CONUS domain include the Stage IV, an experimental GOES quantitative precipitation estimate (QPE) from NESDIS/STAR, the National Mosaic and QPE (NMQ) product from the National Severe Storms Laboratory, and the North American Land Data Assimilation System phase 2 (NLDAS-2) analyses. The NLDAS-2 dataset is used as the control run, with each of the other three datasets considered experimental runs compared against the control. The regional strengths, weaknesses, and biases of each precipitation analysis are identified relative to the NLDAS-2 control in terms of accumulated precipitation pattern and amount, and the impacts on the subsequent LSM spin-up simulations.
The ultimate goal is to identify an alternative precipitation forcing dataset that can best support an

  6. FLUXNET2015 Dataset: Batteries included

    NASA Astrophysics Data System (ADS)

    Pastorello, G.; Papale, D.; Agarwal, D.; Trotta, C.; Chu, H.; Canfora, E.; Torn, M. S.; Baldocchi, D. D.

    2016-12-01

    The synthesis datasets have become one of the signature products of the FLUXNET global network. They are composed from contributions of individual site teams to regional networks, then compiled into uniform data products - now used in a wide variety of research efforts: from plant-scale microbiology to global-scale climate change. The FLUXNET Marconi Dataset in 2000 was the first in the series, followed by the FLUXNET LaThuile Dataset in 2007, with significant additions of data products and coverage, solidifying the adoption of the datasets as a research tool. The FLUXNET2015 Dataset brings another round of substantial improvements, including extended quality control processes and checks, use of downscaled reanalysis data for filling long gaps in micrometeorological variables, multiple methods for USTAR threshold estimation and flux partitioning, and uncertainty estimates - all accompanied by auxiliary flags. This "batteries included" approach provides a lot of information for someone who wants to explore the data (and the processing methods) in detail. This inevitably leads to a large number of data variables. Although dealing with all these variables might seem overwhelming at first, especially to someone looking at eddy covariance data for the first time, there is method to our madness. In this work we describe the data products and variables that are part of the FLUXNET2015 Dataset, and the rationale behind the organization of the dataset, covering the simplified version (labeled SUBSET), the complete version (labeled FULLSET), and the auxiliary products in the dataset.

  7. Isfahan MISP Dataset

    PubMed Central

    Kashefpur, Masoud; Kafieh, Rahele; Jorjandi, Sahar; Golmohammadi, Hadis; Khodabande, Zahra; Abbasi, Mohammadreza; Teifuri, Nilufar; Fakharzadeh, Ali Akbar; Kashefpoor, Maryam; Rabbani, Hossein

    2017-01-01

    An online depository was introduced to share clinical ground truth with the public and provide open access for researchers to evaluate their computer-aided algorithms. PHP was used for web programming and MySQL for database managing. The website was entitled “biosigdata.com.” It was a fast, secure, and easy-to-use online database for medical signals and images. Freely registered users could download the datasets and could also share their own supplementary materials while maintaining their privacy (citation and fee). Commenting was also available for all datasets, and automatic sitemap and semi-automatic SEO indexing have been set up for the site. A comprehensive list of available websites for medical datasets is also presented as a Supplementary (http://journalonweb.com/tempaccess/4800.584.JMSS_55_16I3253.pdf). PMID:28487832

  8. Isfahan MISP Dataset.

    PubMed

    Kashefpur, Masoud; Kafieh, Rahele; Jorjandi, Sahar; Golmohammadi, Hadis; Khodabande, Zahra; Abbasi, Mohammadreza; Teifuri, Nilufar; Fakharzadeh, Ali Akbar; Kashefpoor, Maryam; Rabbani, Hossein

    2017-01-01

    An online depository was introduced to share clinical ground truth with the public and provide open access for researchers to evaluate their computer-aided algorithms. PHP was used for web programming and MySQL for database managing. The website was entitled "biosigdata.com." It was a fast, secure, and easy-to-use online database for medical signals and images. Freely registered users could download the datasets and could also share their own supplementary materials while maintaining their privacy (citation and fee). Commenting was also available for all datasets, and automatic sitemap and semi-automatic SEO indexing have been set up for the site. A comprehensive list of available websites for medical datasets is also presented as a Supplementary (http://journalonweb.com/tempaccess/4800.584.JMSS_55_16I3253.pdf).

  9. Preprocessed Consortium for Neuropsychiatric Phenomics dataset.

    PubMed

    Gorgolewski, Krzysztof J; Durnez, Joke; Poldrack, Russell A

    2017-01-01

    Here we present preprocessed MRI data of 265 participants from the Consortium for Neuropsychiatric Phenomics (CNP) dataset. The preprocessed dataset includes minimally preprocessed data in the native, MNI and surface spaces accompanied with potential confound regressors, tissue probability masks, brain masks and transformations. In addition, the preprocessed dataset includes unthresholded group level and single subject statistical maps from all tasks included in the original dataset. We hope that availability of this dataset will greatly accelerate research.

  10. Overcoming boundaries of worldwide joint arthroplasty registers: the European Arthroplasty Register minimal dataset.

    PubMed

    Sadoghi, Patrick; Leithner, Andreas; Labek, Gerold

    2013-09-01

    Worldwide joint arthroplasty registers are instrumental to screen for complications or implant failures. In order to achieve comparable results, a common classification dataset is essential. The authors therefore present the European Federation of National Associations of Orthopaedics and Traumatology (EFORT) European Arthroplasty Register (EAR) minimal dataset for primary and revision joint arthroplasty. Main parameters include the following: date of operation, country, hospital ID-code, patient's surname and first name, birthday, identification code of the implant, gender, diagnosis, previous operations, type of prosthesis (partial, total), side, cementation technique, use of antibiotics in the cement, surgical approach, and others specifically related to the affected joint. The authors believe that using this minimal dataset will improve the chance for a worldwide comparison of arthroplasty registers and ask future countries for implementation. Copyright © 2013 Elsevier Inc. All rights reserved.

  11. Preliminary AirMSPI Datasets

    Atmospheric Science Data Center

    2018-02-26

    The data files available through this web page and ftp links are preliminary AirMSPI datasets from recent campaigns. ... and geometric corrections. Caution should be used for science analysis. At a later date, more qualified versions will be made public.

  12. Integrative Exploratory Analysis of Two or More Genomic Datasets.

    PubMed

    Meng, Chen; Culhane, Aedin

    2016-01-01

    Exploratory analysis is an essential step in the analysis of high-throughput data. Multivariate approaches such as correspondence analysis (CA), principal component analysis, and multidimensional scaling are widely used in the exploratory analysis of a single dataset. Modern biological studies often assay multiple types of biological molecules (e.g., mRNA, protein, phosphoproteins) on the same set of biological samples, thereby creating multiple different types of omics data or multiassay data. Integrative exploratory analysis of these multiple omics data is required to leverage the potential of multiple omics studies. In this chapter, we describe the application of co-inertia analysis (CIA; for analyzing two datasets) and multiple co-inertia analysis (MCIA; for three or more datasets) to address this problem. These methods are powerful yet simple multivariate approaches that represent samples using a lower number of variables, allowing easier identification of the correlated structure in and between multiple high-dimensional datasets. Graphical representations can be employed for this purpose. In addition, the methods simultaneously project samples and variables (genes, proteins) onto the same lower-dimensional space, so the most variant variables from each dataset can be selected and associated with samples, which can be further used to facilitate biological interpretation and pathway analysis. We applied CIA to explore the concordance between mRNA and protein expression in a panel of 60 tumor cell lines from the National Cancer Institute. In the same 60 cell lines, we used MCIA to perform a cross-platform comparison of mRNA gene expression profiles obtained on four different microarray platforms. Lastly, as an example of integrative analysis of multiassay or multi-omics data, we analyzed transcriptomic, proteomic, and phosphoproteomic data from pluripotent (iPS) and embryonic stem (ES) cell lines.
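    At its core, co-inertia analysis finds pairs of axes that maximize the covariance between two datasets measured on the same samples. A minimal sketch via the SVD of the cross-product matrix follows; it omits the row/column weighting used by full CIA implementations such as ade4 and omicade4, so it is an illustration of the principle rather than a drop-in replacement.

```python
import numpy as np

def co_inertia(x, y, n_axes=2):
    """Minimal co-inertia sketch: center both matrices (samples in rows),
    take the SVD of the cross-product X'Y, and project the samples of
    each dataset onto the leading co-inertia axes."""
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    u, s, vt = np.linalg.svd(xc.T @ yc, full_matrices=False)
    # Sample scores for each dataset on the shared axes, plus singular
    # values (the co-inertia of each axis pair, in decreasing order).
    return xc @ u[:, :n_axes], yc @ vt.T[:, :n_axes], s

# Toy data: 10 samples, two assays (5 and 4 variables), y derived from x
rng = np.random.default_rng(0)
x = rng.normal(size=(10, 5))
y = x @ rng.normal(size=(5, 4))
scores_x, scores_y, s = co_inertia(x, y)
```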

  13. Status and Preliminary Evaluation for Chinese Re-Analysis Datasets

    NASA Astrophysics Data System (ADS)

    bin, zhao; chunxiang, shi; tianbao, zhao; dong, si; jingwei, liu

    2016-04-01

    Based on the operational T639L60 spectral model combined with the Hybrid_GSI assimilation system, and using meteorological observations including radiosondes, buoys, and satellites, a set of Chinese Re-Analysis (CRA) datasets is being developed by the National Meteorological Information Center (NMIC) of the China Meteorological Administration (CMA). The datasets are run at 30-km (0.28° latitude/longitude) resolution, higher than that of most existing reanalysis datasets. The reanalysis is undertaken in an effort to enhance the accuracy of historical synoptic analysis and to aid detailed investigation of various weather and climate systems. The reanalysis is currently at the stage of preliminary experimental analysis. One year of forecast data, from June 2013 to May 2014, has been simulated and used in synoptic and climate evaluation. We first examine the model's prediction ability with the new assimilation system, and find that it shows significant improvement in the Northern and Southern hemispheres owing to the addition of new satellite data; compared with the operational T639L60 model, upper-level prediction is improved markedly and overall prediction stability is enhanced. In the climatological analysis, compared with the ERA-40, NCEP/NCAR, and NCEP/DOE reanalyses, the results show that surface temperature is simulated somewhat low over land and high over ocean, 850-hPa specific humidity reflects a weakened anomaly, and the zonal wind anomaly is concentrated in the equatorial tropics. Meanwhile, the reanalysis dataset shows good skill for various climate indices, such as the subtropical high index and the East Asian subtropical Summer Monsoon Index (ESMI), especially for the Indian and western North Pacific monsoon indices. Later we will further improve the assimilation system and dynamical simulation performance, and produce a 40-year (1979-2018) reanalysis dataset. It will provide a more comprehensive basis for synoptic and climate diagnosis.

  14. Open University Learning Analytics dataset.

    PubMed

    Kuzilek, Jakub; Hlosta, Martin; Zdrahal, Zdenek

    2017-11-28

    Learning Analytics focuses on the collection and analysis of learners' data to improve their learning experience by providing informed guidance and to optimise learning materials. To support the research in this area we have developed a dataset, containing data from courses presented at the Open University (OU). What makes the dataset unique is the fact that it contains demographic data together with aggregated clickstream data of students' interactions in the Virtual Learning Environment (VLE). This enables the analysis of student behaviour, represented by their actions. The dataset contains the information about 22 courses, 32,593 students, their assessment results, and logs of their interactions with the VLE represented by daily summaries of student clicks (10,655,280 entries). The dataset is freely available at https://analyse.kmi.open.ac.uk/open_dataset under a CC-BY 4.0 license.
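    The daily click summaries described above lend themselves to simple per-student aggregation. A minimal sketch with pandas, using synthetic rows whose column names follow the dataset's published clickstream table (treat the names as assumptions if your release differs):

```python
import pandas as pd

# Synthetic rows mimicking the dataset's daily click summaries
# (hypothetical values; column names follow the published table).
clicks = pd.DataFrame({
    "id_student": [11, 11, 11, 42, 42],
    "date": [0, 1, 2, 0, 5],          # day relative to course start
    "sum_click": [3, 7, 1, 10, 4],    # clicks recorded that day
})

# Total VLE activity per student
totals = clicks.groupby("id_student")["sum_click"].sum()
print(totals.to_dict())  # {11: 11, 42: 14}
```

    The same groupby pattern scales to the full 10,655,280-entry table, e.g. for joining activity totals to demographic records.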

  15. Open University Learning Analytics dataset

    PubMed Central

    Kuzilek, Jakub; Hlosta, Martin; Zdrahal, Zdenek

    2017-01-01

    Learning Analytics focuses on the collection and analysis of learners’ data to improve their learning experience by providing informed guidance and to optimise learning materials. To support the research in this area we have developed a dataset, containing data from courses presented at the Open University (OU). What makes the dataset unique is the fact that it contains demographic data together with aggregated clickstream data of students’ interactions in the Virtual Learning Environment (VLE). This enables the analysis of student behaviour, represented by their actions. The dataset contains the information about 22 courses, 32,593 students, their assessment results, and logs of their interactions with the VLE represented by daily summaries of student clicks (10,655,280 entries). The dataset is freely available at https://analyse.kmi.open.ac.uk/open_dataset under a CC-BY 4.0 license. PMID:29182599

  16. Development of South Australian-Victorian Prostate Cancer Health Outcomes Research Dataset.

    PubMed

    Ruseckaite, Rasa; Beckmann, Kerri; O'Callaghan, Michael; Roder, David; Moretti, Kim; Zalcberg, John; Millar, Jeremy; Evans, Sue

    2016-01-22

    Prostate cancer is the most commonly diagnosed and prevalent malignancy reported to Australian cancer registries, with numerous studies from single institutions summarizing patient outcomes at individual hospitals or States. In order to provide an overview of patterns of care of men with prostate cancer across multiple institutions in Australia, a specialized dataset was developed. This dataset, containing amalgamated data from South Australian and Victorian prostate cancer registries, is called the South Australian-Victorian Prostate Cancer Health Outcomes Research Dataset (SA-VIC PCHORD). A total of 13,598 de-identified records of men with prostate cancer diagnosed and consented between 2008 and 2013 in South Australia and Victoria were merged into the SA-VIC PCHORD. SA-VIC PCHORD contains detailed information about socio-demographic, diagnostic and treatment characteristics of patients with prostate cancer in South Australia and Victoria. Data from individual registries are available to researchers and can be accessed under individual data access policies in each State. The SA-VIC PCHORD will be used for numerous studies summarizing trends in diagnostic characteristics, survival and patterns of care in men with prostate cancer in Victoria and South Australia. It is expected that in the future the SA-VIC PCHORD will become a principal component of the recently developed bi-national Australian and New Zealand Prostate Cancer Outcomes Registry to collect and report patterns of care and standardised patient reported outcome measures of men nation-wide in Australia and New Zealand.

  17. A unified high-resolution wind and solar dataset from a rapidly updating numerical weather prediction model

    DOE PAGES

    James, Eric P.; Benjamin, Stanley G.; Marquis, Melinda

    2016-10-28

    A new gridded dataset for wind and solar resource estimation over the contiguous United States has been derived from hourly updated 1-h forecasts from the National Oceanic and Atmospheric Administration High-Resolution Rapid Refresh (HRRR) 3-km model composited over a three-year period (approximately 22 000 forecast model runs). The unique dataset features hourly data assimilation, and provides physically consistent wind and solar estimates for the renewable energy industry. The wind resource dataset shows strong similarity to that previously provided by a Department of Energy-funded study, and it includes estimates in southern Canada and northern Mexico. The solar resource dataset represents an initial step towards application-specific fields such as global horizontal and direct normal irradiance. This combined dataset will continue to be augmented with new forecast data from the advanced HRRR atmospheric/land-surface model.
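    Conceptually, a composite like this is a mean over many forecast runs on a fixed grid. A minimal sketch with hypothetical values (the real composite averages roughly 22,000 HRRR runs, not four):

```python
import numpy as np

# Hypothetical stack of hourly 1-h wind-speed forecasts (m/s) on a
# tiny 2x2 grid over 4 model runs, standing in for ~22,000 runs.
forecasts = np.array([
    [[5.0, 6.0], [7.0, 8.0]],
    [[5.5, 6.5], [7.5, 8.5]],
    [[4.5, 5.5], [6.5, 7.5]],
    [[5.0, 6.0], [7.0, 8.0]],
])

# Multi-year mean resource estimate: average over the run axis
mean_wind = forecasts.mean(axis=0)
print(mean_wind.tolist())  # [[5.0, 6.0], [7.0, 8.0]]
```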

  18. EPA Office of Water (OW): 2002 Impaired Waters Baseline NHDPlus Indexed Dataset

    EPA Pesticide Factsheets

    This dataset consists of geospatial and attribute data identifying the spatial extent of state-reported impaired waters (EPA's Integrated Reporting categories 4a, 4b, 4c and 5)* available in EPA's Reach Address Database (RAD) at the time of extraction. For the 2002 baseline reporting year, EPA compiled state-submitted GIS data to create a seamless and nationally consistent picture of the Nation's impaired waters for measuring progress. EPA's Assessment and TMDL Tracking and Implementation System (ATTAINS) is a national compilation of states' 303(d) listings and TMDL development information, spanning several years of tracking over 40,000 impaired waters.

  19. Large Scale Flood Risk Analysis using a New Hyper-resolution Population Dataset

    NASA Astrophysics Data System (ADS)

    Smith, A.; Neal, J. C.; Bates, P. D.; Quinn, N.; Wing, O.

    2017-12-01

    Here we present the first national scale flood risk analyses, using high resolution Facebook Connectivity Lab population data and data from a hyper resolution flood hazard model. In recent years the field of large scale hydraulic modelling has been transformed by new remotely sensed datasets, improved process representation, highly efficient flow algorithms and increases in computational power. These developments have allowed flood risk analysis to be undertaken in previously unmodeled territories and from continental to global scales. Flood risk analyses are typically conducted via the integration of modelled water depths with an exposure dataset. Over large scales and in data-poor areas, these exposure data typically take the form of a gridded population dataset, estimating population density using remotely sensed data and/or locally available census data. The local nature of flooding dictates that, for robust flood risk analysis, both hazard and exposure data should sufficiently resolve local scale features. Global flood frameworks are enabling flood hazard data to be produced at 90 m resolution, resulting in a mismatch with available population datasets, which are typically more coarsely resolved. Moreover, these exposure data are typically focused on urban areas and struggle to represent rural populations. In this study we integrate a new population dataset with a global flood hazard model. The population dataset was produced by the Connectivity Lab at Facebook, providing gridded population data at 5 m resolution, a resolution increase over previous countrywide datasets of multiple orders of magnitude. Flood risk analyses undertaken over a number of developing countries are presented, along with a comparison of flood risk analyses undertaken using pre-existing population datasets.
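    The core integration step, overlaying a modelled depth grid with a gridded population dataset, reduces to a masked sum. A minimal sketch with hypothetical toy grids and a nominal depth threshold:

```python
import numpy as np

# Toy 3x3 grids: flood depth (m) from a hazard model and gridded
# population counts (both hypothetical values for illustration).
depth = np.array([[0.0, 0.2, 1.5],
                  [0.0, 0.0, 0.8],
                  [0.1, 0.0, 0.0]])
population = np.array([[10, 20, 30],
                       [40, 50, 60],
                       [70, 80, 90]])

# Exposed population: people in cells where depth exceeds a
# nominal 0.3 m hazard threshold (threshold choice is an assumption).
exposed = population[depth > 0.3].sum()
print(exposed)  # 90
```

    In practice the two rasters must first be aligned to a common grid; the mismatch in resolution between hazard and population data discussed above is exactly what makes that resampling step consequential.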

  20. Evaluation of reanalysis datasets against observational soil temperature data over China

    NASA Astrophysics Data System (ADS)

    Yang, Kai; Zhang, Jingyong

    2018-01-01

    Soil temperature is a key land surface variable, and is a potential predictor for seasonal climate anomalies and extremes. Using observational soil temperature data in China for 1981-2005, we evaluate four reanalysis datasets, the land surface reanalysis of the European Centre for Medium-Range Weather Forecasts (ERA-Interim/Land), the second modern-era retrospective analysis for research and applications (MERRA-2), the National Center for Environmental Prediction Climate Forecast System Reanalysis (NCEP-CFSR), and version 2 of the Global Land Data Assimilation System (GLDAS-2.0), with a focus on the 40 cm soil layer. The results show that the reanalysis datasets broadly reproduce the spatial distributions of soil temperature in summer and winter, especially over eastern China, but generally underestimate their magnitudes. Owing to the influence of precipitation on soil temperature, the four datasets perform better in winter than in summer. The ERA-Interim/Land and GLDAS-2.0 produce spatial characteristics of the climatological mean that are similar to observations. The interannual variability of soil temperature is well reproduced by the ERA-Interim/Land dataset in summer and by the CFSR dataset in winter. The linear trend of soil temperature in summer is well reproduced by the reanalysis datasets. We demonstrate that soil heat fluxes in April-June and in winter are highly correlated with the soil temperature in summer and winter, respectively. Different estimations of surface energy balance components can contribute to different behaviors of the reanalysis products in estimating soil temperature. In addition, the reanalysis datasets broadly reproduce the northwest-southeast gradient of soil temperature memory over China.
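    An evaluation like this rests on two basic station-versus-reanalysis statistics: mean bias (here, whether magnitudes are underestimated) and interannual correlation. A minimal sketch with hypothetical annual-mean series:

```python
import numpy as np

# Hypothetical annual-mean 40 cm soil temperatures (deg C), five years,
# illustrating a reanalysis that tracks observations but runs cold.
obs        = np.array([14.2, 14.8, 14.5, 15.1, 14.9])
reanalysis = np.array([13.6, 14.1, 13.9, 14.6, 14.3])

bias = float((reanalysis - obs).mean())           # negative => cold bias
corr = float(np.corrcoef(obs, reanalysis)[0, 1])  # interannual agreement
print(round(bias, 2), round(corr, 3))
```

    A dataset can score well on one metric and poorly on the other, which is why the abstract reports climatological means and interannual variability separately.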

  1. Unified Ecoregions of Alaska: 2001

    USGS Publications Warehouse

    Nowacki, Gregory J.; Spencer, Page; Fleming, Michael; Brock, Terry; Jorgenson, Torre

    2003-01-01

    Major ecosystems have been mapped and described for the State of Alaska and nearby areas. Ecoregion units are based on newly available datasets and the field experience of ecologists, biologists, geologists and regional experts. Recently derived datasets for Alaska included climate parameters, vegetation, surficial geology and topography. Additional datasets incorporated in the mapping process were lithology, soils, permafrost, hydrography, fire regime and glaciation. Thirty-two units are mapped using a combination of the approaches of Bailey (hierarchical) and Omernik (integrated). The ecoregions are grouped into two higher levels using a 'tri-archy' based on climate parameters, vegetation response and disturbance processes. The ecoregions are described with text, photos and tables on the published map.

  2. Commentary: A cautionary tale regarding use of the National Land Cover Dataset 1992

    USGS Publications Warehouse

    Thogmartin, Wayne E.; Gallant, Alisa L.; Knutson, Melinda G.; Fox, Timothy J.; Suarez, Manuel J.

    2004-01-01

    Digital land-cover data are among the most popular data sources used in ecological research and natural resource management. However, processes for accurate land-cover classification over large regions are still evolving. We identified inconsistencies in the National Land Cover Dataset 1992, the most current and available representation of land cover for the conterminous United States. We also report means to address these inconsistencies in a bird-habitat model. We used a Geographic Information System (GIS) to position a regular grid (or lattice) over the upper midwestern United States and summarized the proportion of individual land covers in each cell within the lattice. These proportions were then mapped back onto the lattice, and the resultant lattice was compared to satellite paths, state borders, and regional map classification units. We observed mapping inconsistencies at the borders between mapping regions, states, and Thematic Mapper (TM) mapping paths in the upper midwestern United States, particularly related to grassland-herbaceous, emergent-herbaceous wetland, and small-grain land covers. We attributed these discrepancies to differences in image dates between mapping regions, suboptimal image dates for distinguishing certain land-cover types, lack of suitable ancillary data for improving discrimination for rare land covers, and possibly differences among image interpreters. To overcome these inconsistencies for the purpose of modeling regional populations of birds, we combined grassland-herbaceous and pasture-hay land-cover classes and excluded the use of emergent-herbaceous and small-grain land covers. We recommend that users of digital land-cover data conduct similar assessments for other regions before using these data for habitat evaluation. Further, caution is advised in using these data in the analysis of regional land-cover change because it is not likely that future digital land-cover maps will repeat the same problems, thus resulting in

  3. Comparing methods of analysing datasets with small clusters: case studies using four paediatric datasets.

    PubMed

    Marston, Louise; Peacock, Janet L; Yu, Keming; Brocklehurst, Peter; Calvert, Sandra A; Greenough, Anne; Marlow, Neil

    2009-07-01

    Studies of prematurely born infants contain a relatively large percentage of multiple births, so the resulting data have a hierarchical structure with small clusters of size 1, 2 or 3. Ignoring the clustering may lead to incorrect inferences. The aim of this study was to compare statistical methods which can be used to analyse such data: generalised estimating equations, multilevel models, multiple linear regression and logistic regression. Four datasets which differed in total size and in percentage of multiple births (n = 254, multiple 18%; n = 176, multiple 9%; n = 10 098, multiple 3%; n = 1585, multiple 8%) were analysed. With the continuous outcome, two-level models produced similar results in the larger dataset, while generalised least squares multilevel modelling (ML GLS 'xtreg' in Stata) and maximum likelihood multilevel modelling (ML MLE 'xtmixed' in Stata) produced divergent estimates using the smaller dataset. For the dichotomous outcome, most methods, except generalised least squares multilevel modelling (ML GH 'xtlogit' in Stata) gave similar odds ratios and 95% confidence intervals within datasets. For the continuous outcome, our results suggest using multilevel modelling. We conclude that generalised least squares multilevel modelling (ML GLS 'xtreg' in Stata) and maximum likelihood multilevel modelling (ML MLE 'xtmixed' in Stata) should be used with caution when the dataset is small. Where the outcome is dichotomous and there is a relatively large percentage of non-independent data, it is recommended that these are accounted for in analyses using logistic regression with adjusted standard errors or multilevel modelling. If, however, the dataset has a small percentage of clusters greater than size 1 (e.g. a population dataset of children where there are few multiples) there appears to be less need to adjust for clustering.
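    The statistical point at issue, that ignoring small clusters such as twin pairs biases inference, can be illustrated by comparing a naive standard error of a mean with a cluster-robust one, where residuals are summed within clusters before squaring. A minimal numpy sketch with hypothetical values (not the paper's datasets or its Stata estimators):

```python
import numpy as np

# Toy outcome data with clusters of size 1-2; twins share a cluster id
# and have similar residuals (positive within-cluster correlation).
values   = np.array([3.1, 3.2, 2.8, 3.4, 2.6, 2.7])
clusters = np.array([0,   0,   1,   2,   3,   3])

n = len(values)
resid = values - values.mean()

# Naive SE treats all six observations as independent
se_naive = values.std(ddof=1) / np.sqrt(n)

# Cluster-robust (sandwich-style) SE: sum residuals within each cluster
# first, so a correlated twin pair is not counted as two independent
# pieces of information.
cluster_sums = np.array([resid[clusters == c].sum()
                         for c in np.unique(clusters)])
se_cluster = np.sqrt((cluster_sums ** 2).sum()) / n

print(round(se_naive, 4), round(se_cluster, 4))
```

    With positively correlated twin pairs the cluster-robust SE comes out larger than the naive one, which is the direction of error the abstract warns about when clustering is ignored.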

  4. Comparison and validation of gridded precipitation datasets for Spain

    NASA Astrophysics Data System (ADS)

    Quintana-Seguí, Pere; Turco, Marco; Míguez-Macho, Gonzalo

    2016-04-01

    In this study, two gridded precipitation datasets are compared and validated in Spain: the recently developed SAFRAN dataset and the Spain02 dataset. These are validated using rain gauges and they are also compared to the low resolution ERA-Interim reanalysis. The SAFRAN precipitation dataset has been recently produced, using the SAFRAN meteorological analysis, which is extensively used in France (Durand et al. 1993, 1999; Quintana-Seguí et al. 2008; Vidal et al., 2010) and which has recently been applied to Spain (Quintana-Seguí et al., 2015). SAFRAN uses an optimal interpolation (OI) algorithm and uses all available rain gauges from the Spanish State Meteorological Agency (Agencia Estatal de Meteorología, AEMET). The product has a spatial resolution of 5 km and it spans from September 1979 to August 2014. This dataset has been produced mainly to be used in large scale hydrological applications. Spain02 (Herrera et al. 2012, 2015) is another high quality precipitation dataset for Spain based on a dense network of quality-controlled stations and it has different versions at different resolutions. In this study we used the version with a resolution of 0.11°. The product spans from 1971 to 2010. Spain02 is well tested and widely used, mainly, but not exclusively, for RCM model validation and statistical downscaling. ERA-Interim is a well known global reanalysis with a spatial resolution of ~79 km. It has been included in the comparison because it is a widely used product for continental and global scale studies and also in smaller scale studies in data-poor countries. Thus, its comparison with higher resolution products of a data-rich country, such as Spain, allows us to quantify the errors made when using such datasets for national scale studies, in line with some of the objectives of the EU-FP7 eartH2Observe project. The comparison shows that SAFRAN and Spain02 perform similarly, even though their underlying principles are different. Both products are largely
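    The optimal interpolation (OI) step at the heart of a SAFRAN-style analysis blends a background (first-guess) field with gauge observations, weighted by their error variances. A one-point sketch with illustrative numbers (the real analysis solves this jointly for many gauges with spatial error covariances):

```python
# One-point optimal-interpolation update: the analysis moves the
# background toward the observation in proportion to how much more
# trustworthy the observation is. All values here are hypothetical.
background, var_b = 4.0, 1.0   # first-guess precipitation (mm), error variance
obs, var_o        = 6.0, 0.5   # rain-gauge observation, error variance

gain = var_b / (var_b + var_o)           # optimal weight on the innovation
analysis = background + gain * (obs - background)
print(analysis)  # lies between 4.0 and 6.0, closer to the observation
```

    With var_o half of var_b, the gain is 2/3, so two thirds of the innovation (obs minus background) is applied.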

  5. Background qualitative analysis of the European Reference Life Cycle Database (ELCD) energy datasets - part I: fuel datasets.

    PubMed

    Garraín, Daniel; Fazio, Simone; de la Rúa, Cristina; Recchioni, Marco; Lechón, Yolanda; Mathieux, Fabrice

    2015-01-01

    The aim of this study is to identify areas of potential improvement of the European Reference Life Cycle Database (ELCD) fuel datasets. The revision is based on the data quality indicators described by the ILCD Handbook, applied on a sectorial basis. These indicators evaluate the technological, geographical and time-related representativeness of the dataset and its appropriateness in terms of completeness, precision and methodology. Results show that the ELCD fuel datasets are of good quality in general terms; nevertheless, some findings and recommendations for improving the quality of the Life Cycle Inventories have been derived. Moreover, these results confirm the quality of the fuel-related datasets for any LCA practitioner, and provide insights into the limitations and assumptions underlying the dataset modelling. Given this information, the LCA practitioner will be able to decide whether the use of the ELCD fuel datasets is appropriate for the goal and scope of the analysis to be conducted. The methodological approach would also be useful for dataset developers and reviewers, in order to improve the overall DQR of databases.

  6. A fully distributed implementation of mean annual streamflow regional regression equations

    USGS Publications Warehouse

    Verdin, K.L.; Worstell, B.

    2008-01-01

    Estimates of mean annual streamflow are needed for a variety of hydrologic assessments. Away from gage locations, regional regression equations that are a function of upstream area, precipitation, and temperature are commonly used. Geographic information systems technology has facilitated their use for projects, but traditional approaches using the polygon overlay operator have been too inefficient for national scale applications. As an alternative, the Elevation Derivatives for National Applications (EDNA) database was used as a framework for a fully distributed implementation of mean annual streamflow regional regression equations. The raster “flow accumulation” operator was used to efficiently achieve spatially continuous parameterization of the equations for every 30 m grid cell of the conterminous United States (U.S.). Results were confirmed by comparison with measured flows at stations of the Hydro-Climatic Data Network, and their application value was demonstrated in the development of a national geospatial hydropower assessment. Interactive tools at the EDNA website make possible the fast and efficient query of mean annual streamflow for any location in the conterminous U.S., providing a valuable complement to other national initiatives (StreamStats and the National Hydrography Dataset Plus).
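    The flow-accumulation idea can be sketched in one dimension: each cell drains to the next cell downstream, so accumulating a constant 1 gives upstream cell count (area), and accumulating precipitation gives the upstream total, from which a basin-mean precipitation for the regression equations follows at every cell. A minimal sketch with hypothetical values (the raster operator generalises this to D8 flow directions on a 2-D grid):

```python
import numpy as np

# Hypothetical 1-D flow path: cell i drains into cell i+1.
precip = np.array([800.0, 900.0, 1000.0, 700.0, 600.0])  # mm/yr, illustrative
area = np.ones_like(precip)       # each cell contributes itself

for i in range(1, len(precip)):   # walk downstream, accumulating
    area[i] += area[i - 1]        # upstream cell count
    precip[i] += precip[i - 1]    # upstream precipitation total

# Spatially continuous basin-mean precipitation at every cell,
# ready to feed a regional regression equation.
mean_precip = precip / area
print(area.tolist(), mean_precip.tolist())
```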

  7. Recent observations in the straits of the East/Japan Sea: A review of hydrography, currents and volume transports

    NASA Astrophysics Data System (ADS)

    Na, Hanna; Isoda, Yutaka; Kim, Kuh; Kim, Young Ho; Lyu, Sang Jin

    2009-09-01

    Recent observations of hydrography, currents and volume transports in the straits of the East/Japan Sea are reviewed. It is newly found that bottom cold water in the Korea/Tsushima Strait originating from the northern region of the East/Japan Sea appears not only in summer and autumn but also in winter. Intensive observations in the Korea/Tsushima Strait revealed two distinct cores of northeastward currents in the upper layer of the western and eastern channels. Mean volume transport through the Korea/Tsushima Strait is calculated as 2.5 ± 0.5 Sv from four-year direct and indirect measurements. As continuous monitoring has started in the Tsugaru and Soya Straits, understanding of temporal variability of currents and volume transports through the straits is in progress. For the first time, simultaneous time series of volume transports are available in the Korea/Tsushima and Tsugaru Straits during the winter of 1999-2000. Outflow through the Tsugaru Strait accounts for about 70% of inflow through the Korea/Tsushima Strait for this period.

  8. Design of an audio advertisement dataset

    NASA Astrophysics Data System (ADS)

    Fu, Yutao; Liu, Jihong; Zhang, Qi; Geng, Yuting

    2015-12-01

    Since more and more advertisements crowd into radio broadcasts, it is necessary to establish an audio advertising dataset that can be used to analyze and classify advertisements. A method for establishing a complete audio advertising dataset is presented in this paper. The dataset is divided into four kinds of advertisements. Each advertisement sample is given in *.wav file format and annotated with a txt file containing its file name, sampling frequency, channel number, broadcast time and class. The soundness of the classification is verified by clustering the different advertisements based on Principal Component Analysis (PCA). The experimental results show that this audio advertisement dataset offers a reliable set of samples for related audio advertisement studies.
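    The PCA-based check described above amounts to projecting per-advertisement feature vectors onto the leading principal component and seeing whether the classes separate. A minimal numpy sketch with hypothetical 2-D features (real audio features, e.g. MFCC summaries, would be higher-dimensional):

```python
import numpy as np

# Hypothetical feature vectors for four ads, two per class.
X = np.array([[2.0, 0.1],
              [2.1, 0.0],
              [8.0, 1.0],
              [8.2, 0.9]])

Xc = X - X.mean(axis=0)                  # centre the features
cov = np.cov(Xc.T)                       # feature covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
pc1 = eigvecs[:, -1]                     # direction of largest variance
scores = Xc @ pc1                        # project onto first PC

# The two ad groups fall on opposite sides of zero along PC1
# (the sign of an eigenvector is arbitrary, so which side is which may flip).
labels = scores > 0
print(labels.tolist())
```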

  9. Creation of the Naturalistic Engagement in Secondary Tasks (NEST) distracted driving dataset.

    PubMed

    Owens, Justin M; Angell, Linda; Hankey, Jonathan M; Foley, James; Ebe, Kazutoshi

    2015-09-01

    Distracted driving has become a topic of critical importance to driving safety research over the past several decades. Naturalistic driving data offer a unique opportunity to study how drivers engage with secondary tasks in real-world driving; however, the complexities involved with identifying and coding relevant epochs of naturalistic data have limited its accessibility to the general research community. This project was developed to help address this problem by creating an accessible dataset of driver behavior and situational factors observed during distraction-related safety-critical events and baseline driving epochs, using the Strategic Highway Research Program 2 (SHRP2) naturalistic dataset. The new NEST (Naturalistic Engagement in Secondary Tasks) dataset was created using crashes and near-crashes from the SHRP2 dataset that were identified as including secondary task engagement as a potential contributing factor. Data coding included frame-by-frame video analysis of secondary task and hands-on-wheel activity, as well as summary event information. In addition, information about each secondary task engagement within the trip prior to the crash/near-crash was coded at a higher level. Data were also coded for four baseline epochs and trips per safety-critical event. 1,180 events and baseline epochs were coded, and a dataset was constructed. The project team is currently working to determine the most useful way to allow broad public access to the dataset. We anticipate that the NEST dataset will be extraordinarily useful in allowing qualified researchers access to timely, real-world data concerning how drivers interact with secondary tasks during safety-critical events and baseline driving. The coded dataset developed for this project will allow future researchers to have access to detailed data on driver secondary task engagement in the real world. It will be useful for standalone research, as well as for integration with additional SHRP2 data to enable the

  10. The National Map - Texas Pilot Project

    USGS Publications Warehouse

    ,

    2001-01-01

    Governments depend on a common set of geographic base information as a tool for economic and community development, land and natural resource management, and health and safety services. Emergency management and defense operations rely on this information. Private industry, nongovernmental organizations, and individual citizens use the same geographic data. Geographic information underpins an increasingly large part of the Nation's economy. Available geographic data often have the following problems: * They do not align with each other because layers are frequently created or revised separately, * They do not match across administrative boundaries because each producing organization uses different methods and standards, and * They are not up to date because of the complexity and cost of revision. The U.S. Geological Survey (USGS) is developing The National Map to be a seamless, continuously maintained, and nationally consistent set of online, public domain, geographic base information to address these issues. The National Map will serve as a foundation for integrating, sharing, and using other data easily and consistently. In collaboration with other government agencies, the private sector, academia, and volunteer groups, the USGS will coordinate, integrate, and, where needed, produce and maintain base geographic data. The National Map will include digital orthorectified imagery; elevation data; vector data for hydrography, transportation, boundary, and structure features; geographic names; and land cover information. The data will be the source of revised paper topographic maps. Many technical and institutional issues must be resolved as The National Map is implemented. To begin the refinement of this new paradigm, pilot projects are being designed to identify and investigate these issues. The pilots are the foundation upon which future partnerships for data sharing and maintenance will be built.

  11. The National Map - Florida Pilot Project

    USGS Publications Warehouse

    ,

    2001-01-01

    Governments depend on a common set of geographic base information as a tool for economic and community development, land and natural resource management, and health and safety services. Emergency management and defense operations rely on this information. Private industry, nongovernmental organizations, and individual citizens use the same geographic data. Geographic information underpins an increasingly large part of the Nation's economy. Available geographic data often have the following problems: * They do not align with each other because layers are frequently created or revised separately, * They do not match across administrative boundaries because each producing organization uses different methods and standards, and * They are not up to date because of the complexity and cost of revision. The U.S. Geological Survey (USGS) is developing The National Map to be a seamless, continuously maintained, and nationally consistent set of online, public domain, geographic base information to address these issues. The National Map will serve as a foundation for integrating, sharing, and using other data easily and consistently. In collaboration with other government agencies, the private sector, academia, and volunteer groups, the USGS will coordinate, integrate, and, where needed, produce and maintain base geographic data. The National Map will include digital orthorectified imagery; elevation data; vector data for hydrography, transportation, boundary, and structure features; geographic names; and land cover information. The data will be the source of revised paper topographic maps. Many technical and institutional issues must be resolved as The National Map is implemented. To begin the refinement of this new paradigm, pilot projects are being designed to identify and investigate these issues. The pilots are the foundation upon which future partnerships for data sharing and maintenance will be built.

  12. The National Map - Pennsylvania Pilot Project

    USGS Publications Warehouse

    ,

    2001-01-01

    Governments depend on a common set of geographic base information as a tool for economic and community development, land and natural resource management, and health and safety services. Emergency management and defense operations rely on this information. Private industry, nongovernmental organizations, and individual citizens use the same geographic data. Geographic information underpins an increasingly large part of the Nation's economy. Available geographic data often have the following problems: * They do not align with each other because layers are frequently created or revised separately, * They do not match across administrative boundaries because each producing organization uses different methods and standards, and * They are not up to date because of the complexity and cost of revision. The U.S. Geological Survey (USGS) is developing The National Map to be a seamless, continuously maintained, and nationally consistent set of online, public domain, geographic base information to address these issues. The National Map will serve as a foundation for integrating, sharing, and using other data easily and consistently. In collaboration with other government agencies, the private sector, academia, and volunteer groups, the USGS will coordinate, integrate, and, where needed, produce and maintain base geographic data. The National Map will include digital orthorectified imagery; elevation data; vector data for hydrography, transportation, boundary, and structure features; geographic names; and land cover information. The data will be the source of revised paper topographic maps. Many technical and institutional issues must be resolved as The National Map is implemented. To begin the refinement of this new paradigm, pilot projects are being designed to identify and investigate these issues. The pilots are the foundation upon which future partnerships for data sharing and maintenance will be built.

  13. The National Map - Delaware Pilot Project

    USGS Publications Warehouse

    ,

    2001-01-01

    Governments depend on a common set of geographic base information as a tool for economic and community development, land and natural resource management, and health and safety services. Emergency management and defense operations rely on this information. Private industry, nongovernmental organizations, and individual citizens use the same geographic data. Geographic information underpins an increasingly large part of the Nation's economy. Available geographic data often have the following problems: * They do not align with each other because layers are frequently created or revised separately, * They do not match across administrative boundaries because each producing organization uses different methods and standards, and * They are not up to date because of the complexity and cost of revision. The U.S. Geological Survey (USGS) is developing The National Map to be a seamless, continuously maintained, and nationally consistent set of online, public domain, geographic base information to address these issues. The National Map will serve as a foundation for integrating, sharing, and using other data easily and consistently. In collaboration with other government agencies, the private sector, academia, and volunteer groups, the USGS will coordinate, integrate, and, where needed, produce and maintain base geographic data. The National Map will include digital orthorectified imagery; elevation data; vector data for hydrography, transportation, boundary, and structure features; geographic names; and land cover information. The data will be the source of revised paper topographic maps. Many technical and institutional issues must be resolved as The National Map is implemented. To begin the refinement of this new paradigm, pilot projects are being designed to identify and investigate these issues. The pilots are the foundation upon which future partnerships for data sharing and maintenance will be built.

  14. The National Map - Missouri Pilot Project

    USGS Publications Warehouse

    ,

    2001-01-01

    Governments depend on a common set of geographic base information as a tool for economic and community development, land and natural resource management, and health and safety services. Emergency management and defense operations rely on this information. Private industry, nongovernmental organizations, and individual citizens use the same geographic data. Geographic information underpins an increasingly large part of the Nation's economy. Available geographic data often have the following problems: * They do not align with each other because layers are frequently created or revised separately, * They do not match across administrative boundaries because each producing organization uses different methods and standards, and * They are not up to date because of the complexity and cost of revision. The U.S. Geological Survey (USGS) is developing The National Map to be a seamless, continuously maintained, and nationally consistent set of online, public domain, geographic base information to address these issues. The National Map will serve as a foundation for integrating, sharing, and using other data easily and consistently. In collaboration with other government agencies, the private sector, academia, and volunteer groups, the USGS will coordinate, integrate, and, where needed, produce and maintain base geographic data. The National Map will include digital orthorectified imagery; elevation data; vector data for hydrography, transportation, boundary, and structure features; geographic names; and land cover information. The data will be the source of revised paper topographic maps. Many technical and institutional issues must be resolved as The National Map is implemented. To begin the refinement of this new paradigm, pilot projects are being designed to identify and investigate these issues. The pilots are the foundation upon which future partnerships for data sharing and maintenance will be built.

  15. The transition from winter to early spring in the eastern Weddell Sea, Antarctica: Plankton biomass and composition in relation to hydrography and nutrients

    NASA Astrophysics Data System (ADS)

    Scharek, Renate; Smetacek, Victor; Fahrbach, Eberhard; Gordon, Louis I.; Rohardt, Gerd; Moore, Stanley

    1994-08-01

Hydrography and nutrient distribution in relation to plankton biomass and composition were studied during two transects (October and December) that crossed the ice-covered eastern Weddell Sea (approximately along the Greenwich Meridian) from the ice edge at 58°S to the continental margin at 70°30'S in 1986. Whereas the winter situation still prevailed under the intact ice cover during the October transect, extensive melting was underway by December. Despite the very low levels of plankton biomass encountered under sea ice in late winter (as low as 0.02 μg chlorophyll a l⁻¹), distinct differences, particularly in diatom abundance and species composition, were present between the northern, eastward-flowing and southern, westward-flowing limbs of the Weddell Gyre. On the basis of species composition and physiological state of diatom assemblages, the higher biomass of the northern limb is attributed to entrainment of plankton-rich water from the ice-free Circumpolar Current rather than to in situ growth. The pelagic community characteristic of the region under the pack ice throughout the study was dominated by nanoflagellates, ciliates and heterotrophic dinoflagellates. Biomass of the latter groups ranged between 12 and 119% of that of autotrophs, and microscopic observations suggested that grazing pressure was heavy. This winter and early spring community resembled the regenerating communities of nutrient-limited waters. Break-up and melt of the ice cover in early December occurred simultaneously over an extensive area yet did not elicit biomass build-up, not even at the northern ice edge where favorable growth conditions appeared to prevail. Apparently most of the diatoms sinking into the water from the rich stocks developing in melting ice are grazed by protozoa and krill, hence do not contribute to water column blooms in this region. This situation contrasts with those reported from the western Weddell and Ross Sea ice edges where blooms of ice diatoms were

  16. Background qualitative analysis of the European reference life cycle database (ELCD) energy datasets - part II: electricity datasets.

    PubMed

    Garraín, Daniel; Fazio, Simone; de la Rúa, Cristina; Recchioni, Marco; Lechón, Yolanda; Mathieux, Fabrice

    2015-01-01

The aim of this paper is to identify areas of potential improvement in the European Reference Life Cycle Database (ELCD) electricity datasets. The revision is based on the data quality indicators described by the International Life Cycle Data system (ILCD) Handbook, applied on a sectorial basis. These indicators evaluate the technological, geographical and time-related representativeness of the dataset and its appropriateness in terms of completeness, precision and methodology. Results show that the ELCD electricity datasets are of very good quality in general terms; nevertheless, some findings and recommendations for improving the quality of the Life-Cycle Inventories have been derived. Moreover, these results confirm the quality of the electricity-related datasets for any LCA practitioner and provide insights into the limitations and assumptions underlying the dataset modelling. Given this information, the LCA practitioner will be able to decide whether the use of the ELCD electricity datasets is appropriate to the goal and scope of the analysis to be conducted. The methodological approach would also be useful for dataset developers and reviewers seeking to improve the overall Data Quality Requirements of databases.

  17. CHAMBARA: The changing hydrography and man made biomass burning in Africa: a concept for earth observations from the International Space Station.

    NASA Astrophysics Data System (ADS)

    Muller, Christian

    2010-05-01

In parallel to vegetation mapping, exemplified by VEGETATION and spectral thematic instruments such as MERIS, other important natural and man-made phenomena characterize the equatorial and low-latitude regions covered especially well by the International Space Station orbit. The agreement between the space agencies now extends the lifetime of the ISS to 2025. Two themes can be proposed: hydrography and biomass burning. Hydrography is of extreme human importance, as human life, agriculture and transport all depend on water; moreover, the hydroelectric energy which could be harnessed from the hydrological network is tremendous and would allow a sustainable development of the entire region. The proposed CHAMBARA concept differs from other satellite observation programmes in the sense that images are taken either according to pre-planned scientific campaigns controlled from an operation centre or in response to unexpected real-time events or emergencies. For example, biomass burning imaging campaigns are organised at the end of the dry season, while deltas and lakes are monitored at specific points of the dry season and, if the cloud cover allows it, at periods of the wet season. In exceptional cases, such as natural disasters or rapidly varying scenes, the operation centre will reschedule the programme and may even ask for exceptional crew assistance. At this stage the project is aimed at the European and African scientific communities specialized in Sub-Saharan Africa, a region currently studied by several Belgian scientific institutions, but its techniques could also be extended to the Amazon basin, tropical Asia and Oceania. The proposed equipment is an advanced true-colour rapid camera; external mounting is desired in order to free the optical window, but nadir pointing should be the nominal position. An example of the concept is given by the serendipitous image ISS004E11 Central African observation (ISS photograph, May 16, 2002, centered near 8.6 degrees

  18. 78 FR 25095 - Notice of an Extension of an Information Collection (1028-0092)

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-04-29

    ... the development of The National Map and other national geospatial databases. In FY 2010, projects for... including elevation, orthoimagery, hydrography and other layers in the national databases may be possible. We will accept applications from State, local or tribal governments and academic institutions to...

  19. Subsampling for dataset optimisation

    NASA Astrophysics Data System (ADS)

    Ließ, Mareike

    2017-04-01

    Soil-landscapes have formed by the interaction of soil-forming factors and pedogenic processes. In modelling these landscapes in their pedodiversity and the underlying processes, a representative unbiased dataset is required. This concerns model input as well as output data. However, very often big datasets are available which are highly heterogeneous and were gathered for various purposes, but not to model a particular process or data space. As a first step, the overall data space and/or landscape section to be modelled needs to be identified including considerations regarding scale and resolution. Then the available dataset needs to be optimised via subsampling to well represent this n-dimensional data space. A couple of well-known sampling designs may be adapted to suit this purpose. The overall approach follows three main strategies: (1) the data space may be condensed and de-correlated by a factor analysis to facilitate the subsampling process. (2) Different methods of pattern recognition serve to structure the n-dimensional data space to be modelled into units which then form the basis for the optimisation of an existing dataset through a sensible selection of samples. Along the way, data units for which there is currently insufficient soil data available may be identified. And (3) random samples from the n-dimensional data space may be replaced by similar samples from the available dataset. While being a presupposition to develop data-driven statistical models, this approach may also help to develop universal process models and identify limitations in existing models.
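Strategy (3) above, replacing random draws from the data space with the most similar samples from the available dataset, can be sketched in a few lines. The function below is a hypothetical illustration, not the author's implementation: it assumes samples are numeric feature vectors and uses Euclidean distance as the similarity measure.

```python
import math
import random

def subsample_by_targets(dataset, n_targets, bounds, seed=0):
    """Draw random target points spanning the n-dimensional data space,
    then replace each target with the nearest sample from the dataset.
    bounds: one (low, high) pair per dimension."""
    rng = random.Random(seed)
    available = list(dataset)
    chosen = []
    for _ in range(n_targets):
        # Random point uniformly within the data-space bounds.
        target = [rng.uniform(lo, hi) for lo, hi in bounds]
        # Most similar available sample (Euclidean distance).
        best = min(available, key=lambda s: math.dist(s, target))
        chosen.append(best)
        available.remove(best)  # avoid duplicates in the subsample
    return chosen

# Example: pick 2 representative samples from a toy 2-D dataset.
data = [(0.1, 0.2), (0.9, 0.8), (0.5, 0.5), (0.2, 0.9)]
subsample = subsample_by_targets(data, 2, [(0.0, 1.0), (0.0, 1.0)], seed=1)
```

Because the targets are uniform over the data space rather than over the (possibly clumped) dataset, the selected subsample approximates coverage of the space instead of mirroring the sampling bias of the original collection.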

  20. Decibel: The Relational Dataset Branching System

    PubMed Central

    Maddox, Michael; Goehring, David; Elmore, Aaron J.; Madden, Samuel; Parameswaran, Aditya; Deshpande, Amol

    2017-01-01

    As scientific endeavors and data analysis become increasingly collaborative, there is a need for data management systems that natively support the versioning or branching of datasets to enable concurrent analysis, cleaning, integration, manipulation, or curation of data across teams of individuals. Common practice for sharing and collaborating on datasets involves creating or storing multiple copies of the dataset, one for each stage of analysis, with no provenance information tracking the relationships between these datasets. This results not only in wasted storage, but also makes it challenging to track and integrate modifications made by different users to the same dataset. In this paper, we introduce the Relational Dataset Branching System, Decibel, a new relational storage system with built-in version control designed to address these shortcomings. We present our initial design for Decibel and provide a thorough evaluation of three versioned storage engine designs that focus on efficient query processing with minimal storage overhead. We also develop an exhaustive benchmark to enable the rigorous testing of these and future versioned storage engine designs. PMID:28149668

  1. Comparing National Water Model Inundation Predictions with Hydrodynamic Modeling

    NASA Astrophysics Data System (ADS)

    Egbert, R. J.; Shastry, A.; Aristizabal, F.; Luo, C.

    2017-12-01

The National Water Model (NWM) simulates the hydrologic cycle and produces streamflow forecasts, runoff, and other variables for 2.7 million reaches along the National Hydrography Dataset for the continental United States. NWM applies Muskingum-Cunge channel routing which is based on the continuity equation. However, the momentum equation also needs to be considered to obtain better estimates of streamflow and stage in rivers, especially for applications such as flood inundation mapping. Simulation Program for River NeTworks (SPRNT) is a fully dynamic model for large scale river networks that solves the full nonlinear Saint-Venant equations for 1D flow and stage height in river channel networks with non-uniform bathymetry. For the current work, the steady-state version of the SPRNT model was leveraged. An evaluation of SPRNT's and NWM's abilities to predict inundation was conducted for the record flood of Hurricane Matthew in October 2016 along the Neuse River in North Carolina. This event was known to have been influenced by backwater effects from the Hurricane's storm surge. Retrospective NWM discharge predictions were converted to stage using synthetic rating curves. The stages from both models were utilized to produce flood inundation maps using the Height Above Nearest Drainage (HAND) method, which uses the local relative heights to provide a spatial representation of inundation depths. In order to validate the inundation produced by the models, Sentinel-1A synthetic aperture radar data in the VV and VH polarizations along with auxiliary data were used to produce a reference inundation map. A preliminary, binary comparison of the inundation maps to the reference, limited to the five HUC-12 areas of Goldsboro, NC, yielded flood inundation accuracies of 74.68% and 78.37% for NWM and SPRNT, respectively. The differences for all the relevant test statistics including accuracy, true positive rate, true negative rate, and positive predictive value were found
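The binary comparison statistics named above (accuracy, true positive rate, true negative rate, positive predictive value) follow directly from a cell-by-cell confusion matrix of the two inundation maps. The sketch below is illustrative only, not the authors' code; it assumes both maps are flattened sequences of 0/1 cells, with 1 meaning inundated.

```python
def binary_map_stats(predicted, reference):
    """Compare a predicted binary inundation map against a reference
    map cell by cell and return accuracy, true positive rate (TPR),
    true negative rate (TNR), and positive predictive value (PPV)."""
    tp = fp = tn = fn = 0
    for p, r in zip(predicted, reference):
        if p == 1 and r == 1:
            tp += 1          # correctly predicted wet
        elif p == 1 and r == 0:
            fp += 1          # predicted wet, actually dry
        elif p == 0 and r == 0:
            tn += 1          # correctly predicted dry
        else:
            fn += 1          # missed inundation
    n = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / n,
        "tpr": tp / (tp + fn) if tp + fn else 0.0,
        "tnr": tn / (tn + fp) if tn + fp else 0.0,
        "ppv": tp / (tp + fp) if tp + fp else 0.0,
    }

# Tiny hypothetical example: 4 cells, one hit, one false alarm, one miss.
stats = binary_map_stats([1, 1, 0, 0], [1, 0, 0, 1])
```

Reporting all four statistics matters for flood maps: accuracy alone can look high simply because most of a study area is dry in both maps.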

  2. ORBDA: An openEHR benchmark dataset for performance assessment of electronic health record servers.

    PubMed

    Teodoro, Douglas; Sundvall, Erik; João Junior, Mario; Ruch, Patrick; Miranda Freire, Sergio

    2018-01-01

The openEHR specifications are designed to support implementation of flexible and interoperable Electronic Health Record (EHR) systems. Despite the increasing number of solutions based on the openEHR specifications, it is difficult to find publicly available healthcare datasets in the openEHR format that can be used to test, compare and validate different data persistence mechanisms for openEHR. To foster research on openEHR servers, we present the openEHR Benchmark Dataset, ORBDA, a very large healthcare benchmark dataset encoded using the openEHR formalism. To construct ORBDA, we extracted and cleaned a de-identified dataset from the Brazilian National Healthcare System (SUS) containing hospitalisation and high complexity procedures information and formalised it using a set of openEHR archetypes and templates. Then, we implemented a tool to enrich the raw relational data and convert it into the openEHR model using the openEHR Java reference model library. The ORBDA dataset is available in composition, versioned composition and EHR openEHR representations in XML and JSON formats. In total, the dataset contains more than 150 million composition records. We describe the dataset and provide means to access it. Additionally, we demonstrate the usage of ORBDA for evaluating the insert throughput and query latency performance of some NoSQL database management systems. We believe that ORBDA is a valuable asset for assessing storage models for openEHR-based information systems during the software engineering process. It may also be a suitable component in future standardised benchmarking of available openEHR storage platforms.

  3. ORBDA: An openEHR benchmark dataset for performance assessment of electronic health record servers

    PubMed Central

Teodoro, Douglas; Sundvall, Erik; João Junior, Mario; Ruch, Patrick; Miranda Freire, Sergio

    2018-01-01

The openEHR specifications are designed to support implementation of flexible and interoperable Electronic Health Record (EHR) systems. Despite the increasing number of solutions based on the openEHR specifications, it is difficult to find publicly available healthcare datasets in the openEHR format that can be used to test, compare and validate different data persistence mechanisms for openEHR. To foster research on openEHR servers, we present the openEHR Benchmark Dataset, ORBDA, a very large healthcare benchmark dataset encoded using the openEHR formalism. To construct ORBDA, we extracted and cleaned a de-identified dataset from the Brazilian National Healthcare System (SUS) containing hospitalisation and high complexity procedures information and formalised it using a set of openEHR archetypes and templates. Then, we implemented a tool to enrich the raw relational data and convert it into the openEHR model using the openEHR Java reference model library. The ORBDA dataset is available in composition, versioned composition and EHR openEHR representations in XML and JSON formats. In total, the dataset contains more than 150 million composition records. We describe the dataset and provide means to access it. Additionally, we demonstrate the usage of ORBDA for evaluating the insert throughput and query latency performance of some NoSQL database management systems. We believe that ORBDA is a valuable asset for assessing storage models for openEHR-based information systems during the software engineering process. It may also be a suitable component in future standardised benchmarking of available openEHR storage platforms. PMID:29293556

  4. A global distributed basin morphometric dataset

    NASA Astrophysics Data System (ADS)

    Shen, Xinyi; Anagnostou, Emmanouil N.; Mei, Yiwen; Hong, Yang

    2017-01-01

Basin morphometry is vital information for relating storms to hydrologic hazards, such as landslides and floods. In this paper we present the first comprehensive global dataset of distributed basin morphometry at 30 arc-second resolution. The dataset includes nine prime morphometric variables; in addition, we present formulas for generating twenty-one additional morphometric variables from combinations of the prime variables. The dataset can aid different applications, including studies of land-atmosphere interaction and modelling of floods and droughts for sustainable water management. The validity of the dataset has been consolidated by successfully reproducing Hack's law.
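Hack's law relates a basin's longest stream length to its drainage area as a power law, L = C·A^h, with the exponent h typically near 0.6; reproducing it from a morphometric dataset amounts to fitting that power law across basins. A minimal, hypothetical sketch of such a fit via ordinary least squares in log-log space (not the authors' validation code):

```python
import math

def fit_hacks_law(areas, lengths):
    """Fit Hack's law L = C * A**h by least squares on log-transformed
    data. areas: drainage areas; lengths: longest stream lengths
    (consistent units). Returns (C, h)."""
    xs = [math.log(a) for a in areas]
    ys = [math.log(l) for l in lengths]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # Slope of the log-log regression line is the Hack exponent h.
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    h = num / den
    c = math.exp(my - h * mx)  # intercept back-transformed to C
    return c, h

# Synthetic check: data generated with C = 1.4, h = 0.6 should be recovered.
areas = [1.0, 10.0, 100.0, 1000.0]
lengths = [1.4 * a ** 0.6 for a in areas]
c, h = fit_hacks_law(areas, lengths)
```

A fitted exponent close to the expected range is a quick consistency check that derived stream lengths and drainage areas in a morphometric dataset are mutually coherent.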

  5. Soil chemistry in lithologically diverse datasets: the quartz dilution effect

    USGS Publications Warehouse

    Bern, Carleton R.

    2009-01-01

    National- and continental-scale soil geochemical datasets are likely to move our understanding of broad soil geochemistry patterns forward significantly. Patterns of chemistry and mineralogy delineated from these datasets are strongly influenced by the composition of the soil parent material, which itself is largely a function of lithology and particle size sorting. Such controls present a challenge by obscuring subtler patterns arising from subsequent pedogenic processes. Here the effect of quartz concentration is examined in moist-climate soils from a pilot dataset of the North American Soil Geochemical Landscapes Project. Due to variable and high quartz contents (6.2–81.7 wt.%), and its residual and inert nature in soil, quartz is demonstrated to influence broad patterns in soil chemistry. A dilution effect is observed whereby concentrations of various elements are significantly and strongly negatively correlated with quartz. Quartz content drives artificial positive correlations between concentrations of some elements and obscures negative correlations between others. Unadjusted soil data show the highly mobile base cations Ca, Mg, and Na to be often strongly positively correlated with intermediately mobile Al or Fe, and generally uncorrelated with the relatively immobile high-field-strength elements (HFS) Ti and Nb. Both patterns are contrary to broad expectations for soils being weathered and leached. After transforming bulk soil chemistry to a quartz-free basis, the base cations are generally uncorrelated with Al and Fe, and negative correlations generally emerge with the HFS elements. Quartz-free element data may be a useful tool for elucidating patterns of weathering or parent-material chemistry in large soil datasets.
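The quartz-free transformation described above can be expressed simply: remove the inert quartz diluent and renormalise the remaining constituents to 100%. The helper below is a hypothetical sketch of that rescaling with concentrations in wt.%, not the study's actual procedure:

```python
def quartz_free(concentration_wt, quartz_wt):
    """Rescale a bulk-soil element concentration (wt.%) to a
    quartz-free basis by renormalising the non-quartz fraction
    of the soil to 100%."""
    if not 0.0 <= quartz_wt < 100.0:
        raise ValueError("quartz content must be in [0, 100) wt.%")
    return concentration_wt * 100.0 / (100.0 - quartz_wt)

# Example: 2 wt.% CaO in a soil that is 60 wt.% quartz becomes
# 5 wt.% on a quartz-free basis (2 * 100 / 40).
cao_qf = quartz_free(2.0, 60.0)
```

The rescaling makes clear why the dilution effect produces artificial correlations: two soils with identical non-quartz chemistry but different quartz contents show proportionally different bulk concentrations for every other element at once.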

  6. Evaluation of Uncertainty in Precipitation Datasets for New Mexico, USA

    NASA Astrophysics Data System (ADS)

    Besha, A. A.; Steele, C. M.; Fernald, A.

    2014-12-01

Climate change, population growth and other factors are endangering water availability and sustainability in semiarid/arid areas, particularly in the southwestern United States. Wide spatial and temporal coverage of precipitation measurements is key for regional water budget analysis and hydrological operations, which are themselves valuable tools for water resource planning and management. Rain gauge measurements are usually reliable and accurate at a point. They measure rainfall continuously, but spatial sampling is limited. Ground-based radar and satellite remotely sensed precipitation have wide spatial and temporal coverage. However, these measurements are indirect and subject to errors because of equipment, meteorological variability, the heterogeneity of the land surface itself and lack of regular recording. This study seeks to understand precipitation uncertainty and, in doing so, lessen uncertainty propagation into hydrological applications and operations. We reviewed, compared and evaluated the TRMM (Tropical Rainfall Measuring Mission) precipitation products, NOAA's (National Oceanic and Atmospheric Administration) Global Precipitation Climatology Centre (GPCC) monthly precipitation dataset, PRISM (Parameter-elevation Regressions on Independent Slopes Model) data and data from individual climate stations, including Cooperative Observer Program (COOP), Remote Automated Weather Stations (RAWS), Soil Climate Analysis Network (SCAN) and Snowpack Telemetry (SNOTEL) stations. Though not yet finalized, this study finds that the uncertainty within precipitation estimate datasets is influenced by regional topography, season, climate and precipitation rate. Ongoing work aims to further evaluate the precipitation datasets based on the relative influence of these phenomena so that we can identify the optimum datasets for input to statewide water budget analysis.

  7. The USGS role in mapping the nation's submerged lands

    USGS Publications Warehouse

    Schwab, Bill; Haines, John

    2004-01-01

    The seabed provides habitat for a diverse marine life having commercial, recreational, and intrinsic value. The habitat value of the seabed is largely a function of the geological structure and related geological, biological, oceanologic, and geochemical processes. Of equal importance, the nation's submerged lands contain energy and mineral resources and are utilized for the siting of offshore infrastructure and waste disposal. Seabed character and processes influence the safety and viability of offshore operations. Seabed and subseabed characterization is a prerequisite for the assessment, protection, and utilization of both living and non-living marine resources. A comprehensive program to characterize and understand the nation's submerged lands requires scientific expertise in the fields of geology, biology, hydrography, and oceanography. The U.S. Geological Survey (USGS) has long experience as the Federal agency charged with conducting geologic research and mapping in both coastal and offshore regions. The USGS Coastal and Marine Geology Program (CMGP) leads the nation in expertise related to characterization of seabed and subseabed geology, geological processes, seabed dynamics, and (in collaboration with the National Oceanic and Atmospheric Administration (NOAA) and international partners) habitat geoscience. Numerous USGS studies show that sea-floor geology and processes determine the character and distribution of biological habitats, control coastal evolution, influence the coastal response to storm events and human alterations, and determine the occurrence and concentration of natural resources.

  8. Integrated remotely sensed datasets for disaster management

    NASA Astrophysics Data System (ADS)

    McCarthy, Timothy; Farrell, Ronan; Curtis, Andrew; Fotheringham, A. Stewart

    2008-10-01

Video imagery can be acquired from aerial, terrestrial and marine based platforms and has been exploited for a range of remote sensing applications over the past two decades. Examples include coastal surveys using aerial video, route-corridor infrastructure surveys using vehicle-mounted video cameras, aerial surveys over forestry and agriculture, underwater habitat mapping and disaster management. Many of these video systems are based on interlaced television standards such as North America's NTSC and the European SECAM and PAL television systems, which are then recorded using various video formats. This technology has recently been employed as a front-line remote sensing technology for damage assessment post-disaster. This paper traces the development of spatial video as a remote sensing tool from the early 1980s to the present day. The background to a new spatial-video research initiative based at the National University of Ireland, Maynooth (NUIM), is described. New improvements are proposed, including low-cost encoders, easy-to-use software decoders, resolution of timing issues and interoperability. These developments will enable specialists and non-specialists to collect, process and integrate these datasets with minimal support. This integrated approach will enable decision makers to access relevant remotely sensed datasets quickly and so carry out rapid damage assessment during and post-disaster.

  9. School Attendance Problems and Youth Psychopathology: Structural Cross-Lagged Regression Models in Three Longitudinal Datasets

    PubMed Central

    Wood, Jeffrey J.; Lynne, Sarah D.; Langer, David A.; Wood, Patricia A.; Clark, Shaunna L.; Eddy, J. Mark; Ialongo, Nicholas

    2011-01-01

This study tests a model of reciprocal influences between absenteeism and youth psychopathology using three longitudinal datasets (Ns = 20,745, 2,311, and 671). Participants in 1st through 12th grades were interviewed annually or biannually. Measures of psychopathology include self-, parent-, and teacher-report questionnaires. Structural cross-lagged regression models were tested. In a nationally representative dataset (Add Health), middle school students with relatively greater absenteeism at study year 1 tended towards increased depression and conduct problems in study year 2, over and above the effects of autoregressive associations and demographic covariates. The opposite direction of effects was found for both middle and high school students. Analyses with two regionally representative datasets were also partially supportive. Longitudinal links were more evident in adolescence than in childhood. PMID:22188462

  10. Bayesian correlated clustering to integrate multiple datasets

    PubMed Central

    Kirk, Paul; Griffin, Jim E.; Savage, Richard S.; Ghahramani, Zoubin; Wild, David L.

    2012-01-01

    Motivation: The integration of multiple datasets remains a key challenge in systems biology and genomic medicine. Modern high-throughput technologies generate a broad array of different data types, providing distinct—but often complementary—information. We present a Bayesian method for the unsupervised integrative modelling of multiple datasets, which we refer to as MDI (Multiple Dataset Integration). MDI can integrate information from a wide range of different datasets and data types simultaneously (including the ability to model time series data explicitly using Gaussian processes). Each dataset is modelled using a Dirichlet-multinomial allocation (DMA) mixture model, with dependencies between these models captured through parameters that describe the agreement among the datasets. Results: Using a set of six artificially constructed time series datasets, we show that MDI is able to integrate a significant number of datasets simultaneously, and that it successfully captures the underlying structural similarity between the datasets. We also analyse a variety of real Saccharomyces cerevisiae datasets. In the two-dataset case, we show that MDI’s performance is comparable with the present state-of-the-art. We then move beyond the capabilities of current approaches and integrate gene expression, chromatin immunoprecipitation–chip and protein–protein interaction data, to identify a set of protein complexes for which genes are co-regulated during the cell cycle. Comparisons to other unsupervised data integration techniques—as well as to non-integrative approaches—demonstrate that MDI is competitive, while also providing information that would be difficult or impossible to extract using other methods. Availability: A Matlab implementation of MDI is available from http://www2.warwick.ac.uk/fac/sci/systemsbiology/research/software/. Contact: D.L.Wild@warwick.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID

  11. CPTAC Releases Largest-Ever Ovarian Cancer Proteome Dataset from Previously Genome Characterized Tumors | Office of Cancer Clinical Proteomics Research

    Cancer.gov

    National Cancer Institute (NCI) Clinical Proteomic Tumor Analysis Consortium (CPTAC) scientists have just released a comprehensive dataset of the proteomic analysis of high grade serous ovarian tumor samples, previously genomically analyzed by The Cancer Genome Atlas (TCGA).  This is one of the largest public datasets covering the proteome, phosphoproteome and glycoproteome with complementary deep genomic sequencing data on the same tumor.

  12. The National Map - Utah Transportation Pilot Project

    USGS Publications Warehouse

    ,

    2001-01-01

    Governments depend on a common set of geographic base information as a tool for economic and community development, land and natural resource management, and health and safety services. Emergency management and defense operations rely on this information. Private industry, nongovernmental organizations, and individual citizens use the same geographic data. Geographic information underpins an increasingly large part of the Nation's economy. Available geographic data often have the following problems: * They do not align with each other because layers are frequently created or revised separately, * They do not match across administrative boundaries because each producing organization uses different methods and standards, and * They are not up to date because of the complexity and cost of revision. The U.S. Geological Survey (USGS) is developing The National Map to be a seamless, continuously maintained, and nationally consistent set of online, public domain, geographic base information to address these issues. The National Map will serve as a foundation for integrating, sharing, and using other data easily and consistently. In collaboration with other government agencies, the private sector, academia, and volunteer groups, the USGS will coordinate, integrate, and, where needed, produce and maintain base geographic data. The National Map will include digital orthorectified imagery; elevation data; vector data for hydrography, transportation, boundary, and structure features; geographic names; and land cover information. The data will be the source of revised paper topographic maps. Many technical and institutional issues must be resolved as The National Map is implemented. To begin the refinement of this new paradigm, pilot projects are being designed to identify and investigate these issues. The pilots are the foundation upon which future partnerships for data sharing and maintenance will be built.

  13. The National Map - Washington-Idaho Pilot Project

    USGS Publications Warehouse

    ,

    2001-01-01

    Governments depend on a common set of geographic base information as a tool for economic and community development, land and natural resource management, and health and safety services. Emergency management and defense operations rely on this information. Private industry, nongovernmental organizations, and individual citizens use the same geographic data. Geographic information underpins an increasingly large part of the Nation's economy. Available geographic data often have the following problems:
    * They do not align with each other because layers are frequently created or revised separately,
    * They do not match across administrative boundaries because each producing organization uses different methods and standards, and
    * They are not up to date because of the complexity and cost of revision.
    The U.S. Geological Survey (USGS) is developing The National Map to be a seamless, continuously maintained, and nationally consistent set of online, public domain, geographic base information to address these issues. The National Map will serve as a foundation for integrating, sharing, and using other data easily and consistently. In collaboration with other government agencies, the private sector, academia, and volunteer groups, the USGS will coordinate, integrate, and, where needed, produce and maintain base geographic data. The National Map will include digital orthorectified imagery; elevation data; vector data for hydrography, transportation, boundary, and structure features; geographic names; and land cover information. The data will be the source of revised paper topographic maps. Many technical and institutional issues must be resolved as The National Map is implemented. To begin the refinement of this new paradigm, pilot projects are being designed to identify and investigate these issues. The pilots are the foundation upon which future partnerships for data sharing and maintenance will be built.

  14. ACCURACY OF THE 1992 NATIONAL LAND COVER DATASET AREA ESTIMATES: AN ANALYSIS AT MULTIPLE SPATIAL EXTENTS

    EPA Science Inventory

    Abstract for poster presentation:

    Site-specific accuracy assessments evaluate fine-scale accuracy of land-use/land-cover(LULC) datasets but provide little insight into accuracy of area estimates of LULC

    classes derived from sampling units of varying size. Additiona...

  15. A new integrated and homogenized global monthly land surface air temperature dataset for the period since 1900

    NASA Astrophysics Data System (ADS)

    Xu, Wenhui; Li, Qingxiang; Jones, Phil; Wang, Xiaolan L.; Trewin, Blair; Yang, Su; Zhu, Chen; Zhai, Panmao; Wang, Jinfeng; Vincent, Lucie; Dai, Aiguo; Gao, Yun; Ding, Yihui

    2018-04-01

    A new dataset of integrated and homogenized monthly surface air temperature over global land for the period since 1900 [China Meteorological Administration global Land Surface Air Temperature (CMA-LSAT)] is developed. In total, 14 sources have been collected and integrated into the newly developed dataset, including three global (CRUTEM4, GHCN, and BEST), three regional, and eight national sources. Duplicate stations are identified, and those with the higher priority are chosen or spliced. Then, a consistency test and a climate outlier test are conducted to ensure that each station series is quality controlled. Next, two steps are adopted to assure the homogeneity of the station series: (1) homogenized station series in existing national datasets (by National Meteorological Services) are directly integrated into the dataset without any changes (50% of all stations), and (2) the inhomogeneities are detected and adjusted for in the remaining data series using a penalized maximal t test (50% of all stations). Based on the dataset, we re-assess the temperature changes in global and regional areas compared with GHCN-V3 and CRUTEM4, as well as the temperature changes during the three periods of 1900-2014, 1979-2014 and 1998-2014. The best estimates of warming trends and their 95% confidence ranges for 1900-2014 are approximately 0.102 ± 0.006 °C/decade for the whole year, and 0.104 ± 0.009, 0.112 ± 0.007, 0.090 ± 0.006, and 0.092 ± 0.007 °C/decade for the DJF (December, January, February), MAM, JJA, and SON seasons, respectively. MAM saw the most significant warming trend in both 1900-2014 and 1979-2014. For an even shorter and more recent period (1998-2014), MAM, JJA and SON show similar warming trends, while DJF shows an opposite trend. The results show that the ability of CMA-LSAT to describe global temperature changes is similar to that of other existing products, while there are some differences when describing regional temperature changes.
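
    The decadal trends quoted above are ordinary least-squares slopes of temperature anomalies against time, scaled to °C/decade. A minimal sketch on synthetic data (illustrative only, not the CMA-LSAT series):

    ```python
    def trend_per_decade(years, anomalies):
        """Ordinary least-squares slope of anomalies vs. time, in deg C / decade."""
        n = len(years)
        mean_y = sum(years) / n
        mean_a = sum(anomalies) / n
        cov = sum((y - mean_y) * (a - mean_a) for y, a in zip(years, anomalies))
        var = sum((y - mean_y) ** 2 for y in years)
        return 10.0 * cov / var  # per-year slope times 10

    # Synthetic annual anomalies with a known 0.1 deg C / decade warming trend.
    years = list(range(1900, 2015))
    anoms = [0.01 * (y - 1900) for y in years]
    print(round(trend_per_decade(years, anoms), 3))  # -> 0.1
    ```

    The reported confidence ranges additionally require the standard error of this slope, which the sketch omits.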

  16. Handwritten mathematical symbols dataset.

    PubMed

    Chajri, Yassine; Bouikhalene, Belaid

    2016-06-01

    Due to the technological advances in recent years, paper scientific documents are used less and less. Thus, the trend in the scientific community to use digital documents has increased considerably. Among these documents, there are scientific documents and more specifically mathematics documents. In this context, we present our own dataset of handwritten mathematical symbols composed of 10,379 images. This dataset gathers Arabic characters, Latin characters, Arabic numerals, Latin numerals, arithmetic operators, set-symbols, comparison symbols, delimiters, etc.

  17. The effects of spatial population dataset choice on estimates of population at risk of disease

    PubMed Central

    2011-01-01

    Background The spatial modeling of infectious disease distributions and dynamics is increasingly being undertaken for health services planning and disease control monitoring, implementation, and evaluation. Where risks are heterogeneous in space or dependent on person-to-person transmission, spatial data on human population distributions are required to estimate infectious disease risks, burdens, and dynamics. Several different modeled human population distribution datasets are available and widely used, but the disparities among them and the implications for enumerating disease burdens and populations at risk have not been considered systematically. Here, we quantify some of these effects using global estimates of populations at risk (PAR) of P. falciparum malaria as an example. Methods The recent construction of a global map of P. falciparum malaria endemicity enabled the testing of different gridded population datasets for providing estimates of PAR by endemicity class. The estimated population numbers within each class were calculated for each country using four different global gridded human population datasets: GRUMP (~1 km spatial resolution), LandScan (~1 km), UNEP Global Population Databases (~5 km), and GPW3 (~5 km). More detailed assessments of PAR variation and accuracy were conducted for three African countries where census data were available at a higher administrative-unit level than used by any of the four gridded population datasets. Results The estimates of PAR based on the datasets varied by more than 10 million people for some countries, even accounting for the fact that estimates of population totals made by different agencies are used to correct national totals in these datasets and can vary by more than 5% for many low-income countries. In many cases, these variations in PAR estimates comprised more than 10% of the total national population. The detailed country-level assessments suggested that none of the datasets was consistently more

  18. 77 FR 15052 - Dataset Workshop-U.S. Billion Dollar Disasters Dataset (1980-2011): Assessing Dataset Strengths...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-03-14

    ... and related methodology. Emphasis will be placed on dataset accuracy and time-dependent biases. Pathways to overcome accuracy and bias issues will be an important focus. Participants will consider...] Guidance for improving these methods. [cir] Recommendations for rectifying any known time-dependent biases...

  19. Indirectly Estimating International Net Migration Flows by Age and Gender: The Community Demographic Model International Migration (CDM-IM) Dataset

    PubMed Central

    Nawrotzki, Raphael J.; Jiang, Leiwen

    2015-01-01

    Although data on the total number of international migrant flows are now available, no global dataset exists concerning demographic characteristics, such as the age and gender composition of migrant flows. This paper reports on the methods used to generate the CDM-IM dataset of age- and gender-specific profiles of bilateral net (not gross) migrant flows. We employ raw data from the United Nations Global Migration Database and estimate net migrant flows by age and gender between two time points around the year 2000, accounting for various demographic processes (fertility, mortality). The dataset contains information on 3,713 net migrant flows. Validation analyses against existing datasets and the historical, geopolitical context demonstrate that the CDM-IM dataset is of reasonably high quality. PMID:26692590
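
    Estimating net migration between two time points while accounting for mortality is classically done with a residual method: observed cohort change minus the change expected from survival alone. A simplified sketch with made-up numbers (not the CDM-IM procedure, which works per age-gender cohort and also handles fertility for the youngest cohorts):

    ```python
    def residual_net_migration(pop_t1, pop_t2, survival_ratio):
        """Net migrants for one cohort: observed change minus expected natural change.

        `survival_ratio` is the fraction of the cohort expected to survive
        from time 1 to time 2 in the absence of migration (an assumed input,
        typically taken from a life table).
        """
        expected = pop_t1 * survival_ratio
        return pop_t2 - expected

    # A cohort of 100,000 with 98% expected survival, observed at 103,000:
    print(round(residual_net_migration(100_000, 103_000, 0.98)))  # -> 5000
    ```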

  20. The Similarity and Appropriate Usage of Three Honey Bee (Hymenoptera: Apidae) Datasets for Longitudinal Studies.

    PubMed

    Highland, Steven; James, R R

    2016-04-01

    Honey bee (Apis mellifera L., Hymenoptera: Apidae) colonies have experienced profound fluctuations, especially declines, in the past few decades. Long-term datasets on honey bees are needed to identify the most important environmental and cultural factors associated with these changes. While a few such datasets exist, scientists have been hesitant to use some of these due to perceived shortcomings in the data. We compared data and trends for three datasets. Two come from the US Department of Agriculture's National Agricultural Statistics Service (NASS), Agricultural Statistics Board: one is the annual survey of honey-producing colonies from the Annual Bee and Honey program (ABH), and the other is colony counts from the Census of Agriculture conducted every five years. The third dataset we developed from the number of colonies registered annually by some states. We compared the long-term patterns of change in colony numbers among the datasets on a state-by-state basis. The three datasets often showed similar hive numbers and trends, although results varied by state, with differences between datasets being greatest for those states receiving a large number of migratory colonies. Dataset comparisons provide a method to estimate the number of colonies in a state used for pollination versus honey production. Some states also had separate data for local and migratory colonies, allowing one to determine whether the migratory colonies were typically used for pollination or honey production. The Census of Agriculture should provide the most accurate long-term data on colony numbers, but only every five years. © The Authors 2016. Published by Oxford University Press on behalf of Entomological Society of America. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  1. DataMed - an open source discovery index for finding biomedical datasets.

    PubMed

    Chen, Xiaoling; Gururaj, Anupama E; Ozyurt, Burak; Liu, Ruiling; Soysal, Ergin; Cohen, Trevor; Tiryaki, Firat; Li, Yueling; Zong, Nansu; Jiang, Min; Rogith, Deevakar; Salimi, Mandana; Kim, Hyeon-Eui; Rocca-Serra, Philippe; Gonzalez-Beltran, Alejandra; Farcas, Claudiu; Johnson, Todd; Margolis, Ron; Alter, George; Sansone, Susanna-Assunta; Fore, Ian M; Ohno-Machado, Lucila; Grethe, Jeffrey S; Xu, Hua

    2018-01-13

    Finding relevant datasets is important for promoting data reuse in the biomedical domain, but it is challenging given the volume and complexity of biomedical data. Here we describe the development of an open source biomedical data discovery system called DataMed, with the goal of promoting the building of additional data indexes in the biomedical domain. DataMed, which can efficiently index and search diverse types of biomedical datasets across repositories, is developed through the National Institutes of Health-funded biomedical and healthCAre Data Discovery Index Ecosystem (bioCADDIE) consortium. It consists of 2 main components: (1) a data ingestion pipeline that collects and transforms original metadata information to a unified metadata model, called DatA Tag Suite (DATS), and (2) a search engine that finds relevant datasets based on user-entered queries. In addition to describing its architecture and techniques, we evaluated individual components within DataMed, including the accuracy of the ingestion pipeline, the prevalence of the DATS model across repositories, and the overall performance of the dataset retrieval engine. Our manual review shows that the ingestion pipeline could achieve an accuracy of 90% and core elements of DATS had varied frequency across repositories. On a manually curated benchmark dataset, the DataMed search engine achieved an inferred average precision of 0.2033 and a precision at 10 (P@10, the fraction of relevant results in the top 10 search results) of 0.6022, by implementing advanced natural language processing and terminology services. Currently, we have made the DataMed system publicly available as an open source package for the biomedical community. © The Author 2018. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com
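
    The P@10 figure quoted above is a standard retrieval metric: the fraction of the top 10 ranked results judged relevant. A minimal sketch over a ranked list of binary relevance judgments (the judgments here are made up):

    ```python
    def precision_at_k(relevance, k=10):
        """Fraction of the top-k ranked results judged relevant.

        `relevance` is a ranked list of 0/1 judgments, best result first.
        """
        top = relevance[:k]
        return sum(top) / k

    # 6 of the top 10 results are relevant -> P@10 = 0.6
    ranked = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1]
    print(precision_at_k(ranked))  # -> 0.6
    ```

    Inferred average precision, the other metric reported, extends this idea by averaging precision over sampled relevant ranks; it is not shown here.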

  2. A global gridded dataset of daily precipitation going back to 1950, ideal for analysing precipitation extremes

    NASA Astrophysics Data System (ADS)

    Contractor, S.; Donat, M.; Alexander, L. V.

    2017-12-01

    Reliable observations of precipitation are necessary to determine past changes in precipitation and to validate models, allowing for reliable future projections. Existing gauge-based gridded datasets of daily precipitation and satellite-based observations contain artefacts and have a short length of record, making them unsuitable for analysing precipitation extremes. The largest limiting factor for the gauge-based datasets is a dense and reliable station network. Currently, there are two major archives of global in situ daily rainfall data: the Global Historical Climatology Network (GHCN-Daily), hosted by the National Oceanic and Atmospheric Administration (NOAA), and the archive of the Global Precipitation Climatology Centre (GPCC), part of the Deutscher Wetterdienst (DWD). We combine the two data archives and use automated quality control techniques to create a reliable long-term network of raw station data, which we then interpolate using block kriging to create a global gridded dataset of daily precipitation going back to 1950. We compare our interpolated dataset with existing global gridded data of daily precipitation: NOAA Climate Prediction Center (CPC) Global V1.0 and GPCC Full Data Daily Version 1.0, as well as various regional datasets. We find that our raw station density is much higher than that of other datasets. To avoid artefacts due to station network variability, we provide multiple versions of our dataset based on various completeness criteria, and we provide the standard deviation, kriging error and number of stations for each grid cell and timestep to encourage responsible use of our dataset. Despite our efforts to increase the raw data density, the in situ station network remains sparse in India after the 1960s and in Africa throughout the timespan of the dataset. Our dataset would allow for more reliable global analyses of rainfall, including its extremes, and pave the way for better global precipitation observations with lower and more transparent uncertainties.
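
    The completeness criteria mentioned above amount to a per-station filter applied before interpolation. A hypothetical sketch (the 70% threshold and the missing-value convention are illustrative assumptions, not the authors' actual criteria):

    ```python
    def complete_enough(daily_values, min_fraction=0.7):
        """Keep a station record only if enough daily observations are non-missing.

        `daily_values` holds one value per day; None marks a missing report.
        `min_fraction` is an illustrative completeness threshold.
        """
        n_present = sum(v is not None for v in daily_values)
        return n_present / len(daily_values) >= min_fraction

    # A toy 10-day record with 2 missing days (8/10 = 80% complete):
    record = [1.2, None, 0.0, 3.4, None, 0.0, 0.0, 5.1, 0.3, 0.0]
    print(complete_enough(record))  # -> True
    ```

    Publishing several dataset versions at different thresholds, as the abstract describes, lets users trade station density against record stability.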

  3. NP-PAH Interaction Dataset

    EPA Pesticide Factsheets

    Dataset presents concentrations of organic pollutants, such as polyaromatic hydrocarbon compounds, in water samples. Water samples of known volume and concentration were allowed to equilibrate with a known mass of nanoparticles. The mixture was then ultracentrifuged and sampled for analysis. This dataset is associated with the following publication: Sahle-Demessie, E., A. Zhao, C. Han, B. Hann, and H. Grecsek. Interaction of engineered nanomaterials with hydrophobic organic pollutants. Journal of Nanotechnology. Hindawi Publishing Corporation, New York, NY, USA, 27(28): 284003, (2016).

  4. Handwritten mathematical symbols dataset

    PubMed Central

    Chajri, Yassine; Bouikhalene, Belaid

    2016-01-01

    Due to the technological advances in recent years, paper scientific documents are used less and less. Thus, the trend in the scientific community to use digital documents has increased considerably. Among these documents, there are scientific documents and more specifically mathematics documents. In this context, we present our own dataset of handwritten mathematical symbols composed of 10,379 images. This dataset gathers Arabic characters, Latin characters, Arabic numerals, Latin numerals, arithmetic operators, set-symbols, comparison symbols, delimiters, etc. PMID:27006975

  5. The Montage architecture for grid-enabled science processing of large, distributed datasets

    NASA Technical Reports Server (NTRS)

    Jacob, Joseph C.; Katz, Daniel S .; Prince, Thomas; Berriman, Bruce G.; Good, John C.; Laity, Anastasia C.; Deelman, Ewa; Singh, Gurmeet; Su, Mei-Hui

    2004-01-01

    Montage is an Earth Science Technology Office (ESTO) Computational Technologies (CT) Round III Grand Challenge investigation to deploy a portable, compute-intensive, custom astronomical image mosaicking service for the National Virtual Observatory (NVO). Although Montage is developing a compute- and data-intensive service for the astronomy community, we are also helping to address a problem that spans both Earth and Space science, namely how to efficiently access and process multi-terabyte, distributed datasets. In both communities, the datasets are massive, and are stored in distributed archives that are, in most cases, remote from the available computational resources. Therefore, state-of-the-art computational grid technologies are a key element of the Montage portal architecture. This paper describes the aspects of the Montage design that are applicable to both the Earth and Space science communities.

  6. Development of a consensus core dataset in juvenile dermatomyositis for clinical use to inform research

    PubMed Central

    McCann, Liza J; Pilkington, Clarissa A; Huber, Adam M; Ravelli, Angelo; Appelbe, Duncan; Kirkham, Jamie J; Williamson, Paula R; Aggarwal, Amita; Christopher-Stine, Lisa; Constantin, Tamas; Feldman, Brian M; Lundberg, Ingrid; Maillard, Sue; Mathiesen, Pernille; Murphy, Ruth; Pachman, Lauren M; Reed, Ann M; Rider, Lisa G; van Royen-Kerkof, Annet; Russo, Ricardo; Spinty, Stefan; Wedderburn, Lucy R

    2018-01-01

    Objectives This study aimed to develop consensus on an internationally agreed dataset for juvenile dermatomyositis (JDM), designed for clinical use, to enhance collaborative research and allow integration of data between centres. Methods A prototype dataset was developed through a formal process that included analysing items within existing databases of patients with idiopathic inflammatory myopathies. This template was used to aid a structured multistage consensus process. Exploiting Delphi methodology, two web-based questionnaires were distributed to healthcare professionals caring for patients with JDM identified through email distribution lists of international paediatric rheumatology and myositis research groups. A separate questionnaire was sent to parents of children with JDM and patients with JDM, identified through established research networks and patient support groups. The results of these parallel processes informed a face-to-face nominal group consensus meeting of international myositis experts, tasked with defining the content of the dataset. This developed dataset was tested in routine clinical practice before review and finalisation. Results A dataset containing 123 items was formulated with an accompanying glossary. Demographic and diagnostic data are contained within form A collected at baseline visit only, disease activity measures are included within form B collected at every visit and disease damage items within form C collected at baseline and annual visits thereafter. Conclusions Through a robust international process, a consensus dataset for JDM has been formulated that can capture disease activity and damage over time. This dataset can be incorporated into national and international collaborative efforts, including existing clinical research databases. PMID:29084729

  7. Clusternomics: Integrative context-dependent clustering for heterogeneous datasets

    PubMed Central

    Wernisch, Lorenz

    2017-01-01

    Integrative clustering is used to identify groups of samples by jointly analysing multiple datasets describing the same set of biological samples, such as gene expression, copy number, methylation etc. Most existing algorithms for integrative clustering assume that there is a shared consistent set of clusters across all datasets, and most of the data samples follow this structure. However in practice, the structure across heterogeneous datasets can be more varied, with clusters being joined in some datasets and separated in others. In this paper, we present a probabilistic clustering method to identify groups across datasets that do not share the same cluster structure. The proposed algorithm, Clusternomics, identifies groups of samples that share their global behaviour across heterogeneous datasets. The algorithm models clusters on the level of individual datasets, while also extracting global structure that arises from the local cluster assignments. Clusters on both the local and the global level are modelled using a hierarchical Dirichlet mixture model to identify structure on both levels. We evaluated the model both on simulated and on real-world datasets. The simulated data exemplifies datasets with varying degrees of common structure. In such a setting Clusternomics outperforms existing algorithms for integrative and consensus clustering. In a real-world application, we used the algorithm for cancer subtyping, identifying subtypes of cancer from heterogeneous datasets. We applied the algorithm to TCGA breast cancer dataset, integrating gene expression, miRNA expression, DNA methylation and proteomics. The algorithm extracted clinically meaningful clusters with significantly different survival probabilities. We also evaluated the algorithm on lung and kidney cancer TCGA datasets with high dimensionality, again showing clinically significant results and scalability of the algorithm. PMID:29036190

  8. Clusternomics: Integrative context-dependent clustering for heterogeneous datasets.

    PubMed

    Gabasova, Evelina; Reid, John; Wernisch, Lorenz

    2017-10-01

    Integrative clustering is used to identify groups of samples by jointly analysing multiple datasets describing the same set of biological samples, such as gene expression, copy number, methylation etc. Most existing algorithms for integrative clustering assume that there is a shared consistent set of clusters across all datasets, and most of the data samples follow this structure. However in practice, the structure across heterogeneous datasets can be more varied, with clusters being joined in some datasets and separated in others. In this paper, we present a probabilistic clustering method to identify groups across datasets that do not share the same cluster structure. The proposed algorithm, Clusternomics, identifies groups of samples that share their global behaviour across heterogeneous datasets. The algorithm models clusters on the level of individual datasets, while also extracting global structure that arises from the local cluster assignments. Clusters on both the local and the global level are modelled using a hierarchical Dirichlet mixture model to identify structure on both levels. We evaluated the model both on simulated and on real-world datasets. The simulated data exemplifies datasets with varying degrees of common structure. In such a setting Clusternomics outperforms existing algorithms for integrative and consensus clustering. In a real-world application, we used the algorithm for cancer subtyping, identifying subtypes of cancer from heterogeneous datasets. We applied the algorithm to TCGA breast cancer dataset, integrating gene expression, miRNA expression, DNA methylation and proteomics. The algorithm extracted clinically meaningful clusters with significantly different survival probabilities. We also evaluated the algorithm on lung and kidney cancer TCGA datasets with high dimensionality, again showing clinically significant results and scalability of the algorithm.
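
    The full Clusternomics model is a hierarchical Dirichlet mixture, but its core idea, that global clusters arise from combinations of local, per-dataset cluster assignments, can be illustrated with a toy sketch (the labels and sample counts below are made up):

    ```python
    from collections import Counter

    # Local cluster assignments for the same 6 samples in two datasets
    # (e.g. expression vs. methylation). Clusters joined in one dataset
    # may be separated in the other.
    expression  = ["A", "A", "A", "B", "B", "B"]
    methylation = ["x", "x", "y", "y", "y", "y"]

    # A global cluster corresponds to a combination of local assignments,
    # so only the combinations that actually occur are instantiated.
    global_labels = list(zip(expression, methylation))
    counts = Counter(global_labels)
    print(counts)  # three occupied global clusters: (A,x), (A,y), (B,y)
    ```

    The real model places Dirichlet priors over both levels so that the number of occupied global clusters is inferred from the data rather than enumerated exhaustively.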

  9. TRI Preliminary Dataset

    EPA Pesticide Factsheets

    The TRI preliminary dataset includes the most current TRI data available and reflects toxic chemical releases and pollution prevention activities that occurred at TRI facilities during each calendar year.

  10. Comparison of recent SnIa datasets

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sanchez, J.C. Bueno; Perivolaropoulos, L.; Nesseris, S., E-mail: jbueno@cc.uoi.gr, E-mail: nesseris@nbi.ku.dk, E-mail: leandros@uoi.gr

    2009-11-01

    We rank the six latest Type Ia supernova (SnIa) datasets (Constitution (C), Union (U), ESSENCE (Davis) (E), Gold06 (G), SNLS 1yr (S) and SDSS-II (D)) in the context of the Chevallier-Polarski-Linder (CPL) parametrization w(a) = w₀ + w₁(1 − a), according to their Figure of Merit (FoM), their consistency with the cosmological constant (ΛCDM), their consistency with standard rulers (Cosmic Microwave Background (CMB) and Baryon Acoustic Oscillations (BAO)) and their mutual consistency. We find a significant improvement of the FoM (defined as the inverse area of the 95.4% parameter contour) with the number of SnIa of these datasets ((C) highest FoM, (U), (G), (D), (E), (S) lowest FoM). Standard rulers (CMB+BAO) have a better FoM by about a factor of 3, compared to the highest FoM SnIa dataset (C). We also find that the ranking sequence based on consistency with ΛCDM is identical with the corresponding ranking based on consistency with standard rulers ((S) most consistent, (D), (C), (E), (U), (G) least consistent). The ranking sequence of the datasets however changes when we consider the consistency with an expansion history corresponding to evolving dark energy (w₀, w₁) = (−1.4, 2) crossing the phantom divide line w = −1 (it is practically reversed to (G), (U), (E), (S), (D), (C)). The SALT2 and MLCS2k2 fitters are also compared and some peculiar features of the SDSS-II dataset when standardized with the MLCS2k2 fitter are pointed out. Finally, we construct a statistic to estimate the internal consistency of a collection of SnIa datasets. We find that even though there is good consistency among most samples taken from the above datasets, this consistency decreases significantly when the Gold06 (G) dataset is included in the sample.
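
    The CPL equation of state used for the ranking is simple to evaluate; a minimal sketch, including the phantom-crossing parameter values quoted in the abstract:

    ```python
    def w_cpl(a, w0=-1.0, w1=0.0):
        """Chevallier-Polarski-Linder dark-energy equation of state:
        w(a) = w0 + w1 * (1 - a), where a is the cosmic scale factor (a = 1 today)."""
        return w0 + w1 * (1.0 - a)

    # Cosmological constant, (w0, w1) = (-1, 0): w = -1 at every scale factor.
    print(w_cpl(0.5))             # -> -1.0
    # The abstract's evolving dark energy example, (w0, w1) = (-1.4, 2):
    print(w_cpl(1.0, -1.4, 2.0))  # today (a = 1): -1.4
    print(w_cpl(0.8, -1.4, 2.0))  # crosses the phantom divide w = -1 at a = 0.8
    ```

    Crossing happens where w0 + w1(1 − a) = −1, i.e. a = 1 − (−1 − w0)/w1 = 0.8 for these values, which is why this history sits on the phantom divide and reverses the dataset ranking.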

  11. PERSIANN-CDR Daily Precipitation Dataset for Hydrologic Applications and Climate Studies.

    NASA Astrophysics Data System (ADS)

    Sorooshian, S.; Hsu, K. L.; Ashouri, H.; Braithwaite, D.; Nguyen, P.; Thorstensen, A. R.

    2015-12-01

    Precipitation Estimation from Remotely Sensed Information using Artificial Neural Network - Climate Data Record (PERSIANN-CDR) is a newly developed and released dataset that covers more than 3 decades (01/01/1983 - 03/31/2015 to date) of daily precipitation estimations at 0.25° resolution for the 60°S-60°N latitude band. PERSIANN-CDR is processed using the archive of the Gridded Satellite IRWIN CDR (GridSat-B1) from the International Satellite Cloud Climatology Project (ISCCP), and the Global Precipitation Climatology Project (GPCP) 2.5° monthly product for bias correction. The dataset has been released and made available for public access through NOAA's National Centers for Environmental Information (NCEI) (http://www1.ncdc.noaa.gov/pub/data/sds/cdr/CDRs/PERSIANN/Overview.pdf). PERSIANN-CDR has already shown its usefulness for a wide range of applications, including climate variability and change monitoring, hydrologic applications, and water resources system planning and management. This precipitation CDR has also been used in studying the behavior of historical extreme precipitation events. Demonstrations of the use of PERSIANN-CDR data to detect trends and variability in precipitation over the past 30 years, as well as the potential usefulness of the dataset for evaluating climate model performance relevant to precipitation in retrospective mode, will be presented.
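
    The GPCP monthly bias correction described above amounts, in essence, to scaling daily satellite estimates so that their monthly total matches the reference product. A deliberately simplified sketch (the actual processing works per 2.5° grid cell and handles edge cases such as zero monthly totals differently):

    ```python
    def bias_correct_month(daily_estimates, monthly_reference):
        """Scale daily precipitation estimates so the month sums to the reference total."""
        total = sum(daily_estimates)
        if total == 0.0:  # nothing to scale; a real product needs a fallback here
            return list(daily_estimates)
        scale = monthly_reference / total
        return [scale * d for d in daily_estimates]

    daily = [0.0, 5.0, 10.0, 5.0]  # toy "month" of daily estimates, 20 mm total
    corrected = bias_correct_month(daily, 25.0)  # reference says 25 mm fell
    print(corrected)       # -> [0.0, 6.25, 12.5, 6.25]
    print(sum(corrected))  # -> 25.0
    ```

    Multiplicative scaling preserves the relative day-to-day pattern from the satellite retrieval while anchoring the monthly total to the gauge-adjusted reference.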

  12. A publicly available benchmark for biomedical dataset retrieval: the reference standard for the 2016 bioCADDIE dataset retrieval challenge

    PubMed Central

    Gururaj, Anupama E.; Chen, Xiaoling; Pournejati, Saeid; Alter, George; Hersh, William R.; Demner-Fushman, Dina; Ohno-Machado, Lucila

    2017-01-01

    The rapid proliferation of publicly available biomedical datasets has provided abundant resources that are potentially of value as a means to reproduce prior experiments, and to generate and explore novel hypotheses. However, there are a number of barriers to the re-use of such datasets, which are distributed across a broad array of dataset repositories, focusing on different data types and indexed using different terminologies. New methods are needed to enable biomedical researchers to locate datasets of interest within this rapidly expanding information ecosystem, and new resources are needed for the formal evaluation of these methods as they emerge. In this paper, we describe the design and generation of a benchmark for information retrieval of biomedical datasets, which was developed and used for the 2016 bioCADDIE Dataset Retrieval Challenge. In the tradition of the seminal Cranfield experiments, and as exemplified by the Text Retrieval Conference (TREC), this benchmark includes a corpus (biomedical datasets), a set of queries, and relevance judgments relating these queries to elements of the corpus. This paper describes the process through which each of these elements was derived, with a focus on those aspects that distinguish this benchmark from typical information retrieval reference sets. Specifically, we discuss the origin of our queries in the context of a larger collaborative effort, the biomedical and healthCAre Data Discovery Index Ecosystem (bioCADDIE) consortium, and the distinguishing features of biomedical dataset retrieval as a task. The resulting benchmark set has been made publicly available to advance research in the area of biomedical dataset retrieval. Database URL: https://biocaddie.org/benchmark-data PMID:29220453

  13. Modelling surface-water depression storage in a Prairie Pothole Region

    USGS Publications Warehouse

    Hay, Lauren E.; Norton, Parker A.; Viger, Roland; Markstrom, Steven; Regan, R. Steven; Vanderhoof, Melanie

    2018-01-01

    In this study, the Precipitation-Runoff Modelling System (PRMS) was used to simulate changes in surface-water depression storage in the 1,126-km2 Upper Pipestem Creek basin located within the Prairie Pothole Region of North Dakota, USA. The Prairie Pothole Region is characterized by millions of small water bodies (or surface-water depressions) that provide numerous ecosystem services and are considered an important contribution to the hydrologic cycle. The Upper Pipestem PRMS model was extracted from the U.S. Geological Survey's (USGS) National Hydrologic Model (NHM), developed to support consistent hydrologic modelling across the conterminous United States. The Geospatial Fabric database, created for the USGS NHM, contains hydrologic model parameter values derived from datasets that characterize the physical features of the entire conterminous United States for 109,951 hydrologic response units. Each hydrologic response unit in the Geospatial Fabric was parameterized using aggregated surface-water depression area derived from the National Hydrography Dataset Plus, an integrated suite of application-ready geospatial datasets. This paper presents a calibration strategy for the Upper Pipestem PRMS model that uses normalized lake elevation measurements to calibrate the parameters influencing simulated fractional surface-water depression storage. Results indicate that including measurements indicative of the change in surface-water depression storage in the calibration procedure yielded accurately simulated changes in surface-water depression storage in the water balance. Regionalized parameterization of the USGS NHM will require a proxy for change in surface storage to accurately parameterize surface-water depression storage within the USGS NHM.
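
    Normalizing a lake elevation record so it can be compared against simulated fractional storage (both varying between 0 and 1) could be as simple as min-max scaling; a hypothetical sketch, not the study's actual procedure:

    ```python
    def normalize(elevations):
        """Min-max scale a series of lake elevation measurements to [0, 1]."""
        lo, hi = min(elevations), max(elevations)
        if hi == lo:
            return [0.0 for _ in elevations]  # a flat record carries no signal
        return [(e - lo) / (hi - lo) for e in elevations]

    # Toy elevation record in metres:
    print(normalize([431.0, 432.5, 434.0]))  # -> [0.0, 0.5, 1.0]
    ```

    Scaling both observed and simulated series to the same dimensionless range lets a calibration objective compare their temporal patterns without knowing the depression's absolute bathymetry.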

  14. The National Map - Lake Tahoe Area Pilot Project

    USGS Publications Warehouse

    ,

    2001-01-01

    Governments depend on a common set of geographic base information as a tool for economic and community development, land and natural resource management, and health and safety services. Emergency management and defense operations rely on this information. Private industry, nongovernmental organizations, and individual citizens use the same geographic data. Geographic information underpins an increasingly large part of the Nation's economy. Available geographic data often have the following problems:
    * They do not align with each other because layers are frequently created or revised separately,
    * They do not match across administrative boundaries because each producing organization uses different methods and standards, and
    * They are not up to date because of the complexity and cost of revision.
    The U.S. Geological Survey (USGS) is developing The National Map to be a seamless, continuously maintained, and nationally consistent set of online, public domain, geographic base information to address these issues. The National Map will serve as a foundation for integrating, sharing, and using other data easily and consistently. In collaboration with other government agencies, the private sector, academia, and volunteer groups, the USGS will coordinate, integrate, and, where needed, produce and maintain base geographic data. The National Map will include digital orthorectified imagery; elevation data; vector data for hydrography, transportation, boundary, and structure features; geographic names; and land cover information. The data will be the source of revised paper topographic maps. Many technical and institutional issues must be resolved as The National Map is implemented. To begin the refinement of this new paradigm, pilot projects are being designed to identify and investigate these issues. The pilots are the foundation upon which future partnerships for data sharing and maintenance will be built.

  15. [Spatial domain display for interference image dataset].

    PubMed

    Wang, Cai-Ling; Li, Yu-Shan; Liu, Xue-Bin; Hu, Bing-Liang; Jing, Juan-Juan; Wen, Jia

    2011-11-01

    The need for visualization of imaging interferometer data is pressing for users engaged in image interpretation and information extraction. However, conventional research on visualization focuses only on the spectral image dataset in the spectral domain. Quick display of the interference spectral image dataset is therefore a key step in interference image processing. The conventional visualization of an interference dataset applies classical spectral image display methods after a Fourier transformation. In the present paper, the problem of quickly viewing interferometer imagery in the spatial domain is addressed, and an algorithm that simplifies the task is proposed. The Fourier transformation is an obstacle because its computation time is large, and the situation deteriorates further as dataset size increases. The proposed algorithm, named interference weighted envelopes, frees the display from the transformation. The authors choose three interference weighted envelopes based respectively on the Fourier transformation, the features of interference data, and the human visual system. A comparison of the proposed and conventional methods shows a substantial difference in display time.

  16. Internationally coordinated glacier monitoring: strategy and datasets

    NASA Astrophysics Data System (ADS)

    Hoelzle, Martin; Armstrong, Richard; Fetterer, Florence; Gärtner-Roer, Isabelle; Haeberli, Wilfried; Kääb, Andreas; Kargel, Jeff; Nussbaumer, Samuel; Paul, Frank; Raup, Bruce; Zemp, Michael

    2014-05-01

    Internationally coordinated monitoring of long-term glacier changes provides key indicator data about global climate change; it began in 1894 as an internationally coordinated effort to establish standardized observations. Today, world-wide monitoring of glaciers and ice caps is embedded within the Global Climate Observing System (GCOS) in support of the United Nations Framework Convention on Climate Change (UNFCCC) as an important Essential Climate Variable (ECV). The Global Terrestrial Network for Glaciers (GTN-G) was established in 1999 with the task of coordinating measurements and ensuring the continuous development and adaptation of the international strategies to the long-term needs of users in science and policy. The basic monitoring principles must be relevant, feasible, comprehensive, and understandable to the wider scientific community as well as to policy makers and the general public. Data access has to be free and unrestricted, the quality of the standardized and calibrated data must be high, and a combination of detailed process studies at selected field sites with global coverage by satellite remote sensing is envisaged. Recently, a GTN-G Steering Committee was established to guide and advise the operational bodies responsible for international glacier monitoring: the World Glacier Monitoring Service (WGMS), the US National Snow and Ice Data Center (NSIDC), and the Global Land Ice Measurements from Space (GLIMS) initiative. Several online databases containing a wealth of diverse data types with different levels of detail and global coverage provide fast access to continuously updated information on glacier fluctuation and inventory data. 
For world-wide inventories, data are now available through (a) the World Glacier Inventory containing tabular information of about 130,000 glaciers covering an area of around 240,000 km2, (b) the GLIMS-database containing digital outlines of around 118,000 glaciers with different time stamps and

  17. Development of a consensus core dataset in juvenile dermatomyositis for clinical use to inform research.

    PubMed

    McCann, Liza J; Pilkington, Clarissa A; Huber, Adam M; Ravelli, Angelo; Appelbe, Duncan; Kirkham, Jamie J; Williamson, Paula R; Aggarwal, Amita; Christopher-Stine, Lisa; Constantin, Tamas; Feldman, Brian M; Lundberg, Ingrid; Maillard, Sue; Mathiesen, Pernille; Murphy, Ruth; Pachman, Lauren M; Reed, Ann M; Rider, Lisa G; van Royen-Kerkof, Annet; Russo, Ricardo; Spinty, Stefan; Wedderburn, Lucy R; Beresford, Michael W

    2018-02-01

    This study aimed to develop consensus on an internationally agreed dataset for juvenile dermatomyositis (JDM), designed for clinical use, to enhance collaborative research and allow integration of data between centres. A prototype dataset was developed through a formal process that included analysing items within existing databases of patients with idiopathic inflammatory myopathies. This template was used to aid a structured multistage consensus process. Exploiting Delphi methodology, two web-based questionnaires were distributed to healthcare professionals caring for patients with JDM identified through email distribution lists of international paediatric rheumatology and myositis research groups. A separate questionnaire was sent to parents of children with JDM and patients with JDM, identified through established research networks and patient support groups. The results of these parallel processes informed a face-to-face nominal group consensus meeting of international myositis experts, tasked with defining the content of the dataset. This developed dataset was tested in routine clinical practice before review and finalisation. A dataset containing 123 items was formulated with an accompanying glossary. Demographic and diagnostic data are contained within form A collected at baseline visit only, disease activity measures are included within form B collected at every visit and disease damage items within form C collected at baseline and annual visits thereafter. Through a robust international process, a consensus dataset for JDM has been formulated that can capture disease activity and damage over time. This dataset can be incorporated into national and international collaborative efforts, including existing clinical research databases. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.

  18. 05/04/2018: Articles citing Ag Data Commons datasets | National

    Science.gov Websites

    National Agricultural Library, United States Department of Agriculture.

  19. A case study of data integration for aquatic resources using semantic web technologies

    USGS Publications Warehouse

    Gordon, Janice M.; Chkhenkeli, Nina; Govoni, David L.; Lightsom, Frances L.; Ostroff, Andrea C.; Schweitzer, Peter N.; Thongsavanh, Phethala; Varanka, Dalia E.; Zednik, Stephan

    2015-01-01

    Use cases, information modeling, and linked data techniques are Semantic Web technologies used to develop a prototype system that integrates scientific observations from four independent USGS and cooperator data systems. The techniques were tested with a use case goal of creating a data set for use in exploring potential relationships among freshwater fish populations and environmental factors. The resulting prototype extracts data from the BioData Retrieval System, the Multistate Aquatic Resource Information System, the National Geochemical Survey, and the National Hydrography Dataset. A prototype user interface allows a scientist to select observations from these data systems and combine them into a single data set in RDF format that includes explicitly defined relationships and data definitions. The project was funded by the USGS Community for Data Integration and undertaken by the Community for Data Integration Semantic Web Working Group in order to demonstrate use of Semantic Web technologies by scientists. This allows scientists to simultaneously explore data that are available in multiple, disparate systems beyond those they traditionally have used.

  20. Seasonal evaluation of evapotranspiration fluxes from MODIS satellite and mesoscale model downscaled global reanalysis datasets

    NASA Astrophysics Data System (ADS)

    Srivastava, Prashant K.; Han, Dawei; Islam, Tanvir; Petropoulos, George P.; Gupta, Manika; Dai, Qiang

    2016-04-01

    Reference evapotranspiration (ETo) is an important variable in hydrological modeling that is not always available, especially for ungauged catchments. Satellite data, such as those available from the MODerate Resolution Imaging Spectroradiometer (MODIS), and global datasets via the European Centre for Medium Range Weather Forecasts (ECMWF) reanalysis (ERA) interim and National Centers for Environmental Prediction (NCEP) reanalysis are important sources of information for ETo. This study explored the seasonal performances of MODIS (MOD16) and Weather Research and Forecasting (WRF) model downscaled global reanalysis datasets, such as ERA interim and NCEP-derived ETo, against ground-based datasets. Overall, on the basis of the statistical metrics computed, ETo derived from ERA interim and MODIS were more accurate in comparison to the estimates from NCEP for all the seasons. The pooled datasets also revealed a similar performance to the seasonal assessment with higher agreement for the ERA interim (r = 0.96, RMSE = 2.76 mm/8 days; bias = 0.24 mm/8 days), followed by MODIS (r = 0.95, RMSE = 7.66 mm/8 days; bias = -7.17 mm/8 days) and NCEP (r = 0.76, RMSE = 11.81 mm/8 days; bias = -10.20 mm/8 days). The only limitation with downscaling ERA interim reanalysis datasets using WRF is that it is time-consuming, in contrast to the readily available MODIS operational product for use in mesoscale studies and practical applications.
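The agreement statistics quoted above (r, RMSE, bias) can be reproduced with a few lines of Python; the paired series below are illustrative, not the study's data.

```python
import math

def agreement_metrics(estimated, observed):
    """Return (r, rmse, bias) for paired ETo series (e.g., mm/8 days)."""
    n = len(observed)
    mean_e = sum(estimated) / n
    mean_o = sum(observed) / n
    # Pearson correlation between estimated and observed values
    cov = sum((e - mean_e) * (o - mean_o) for e, o in zip(estimated, observed))
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    r = cov / math.sqrt(var_e * var_o)
    # Root-mean-square error and mean bias (estimated minus observed)
    rmse = math.sqrt(sum((e - o) ** 2 for e, o in zip(estimated, observed)) / n)
    bias = sum(e - o for e, o in zip(estimated, observed)) / n
    return r, rmse, bias
```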

  1. Creation of digital contours that approach the characteristics of cartographic contours

    USGS Publications Warehouse

    Tyler, Dean J.; Greenlee, Susan K.

    2012-01-01

    The capability to easily create digital contours using commercial off-the-shelf (COTS) software has existed for decades. Out-of-the-box raw contours are suitable for many scientific applications without pre- or post-processing; however, cartographic applications typically require additional improvements. For example, raw contours generally require smoothing before placement on a map. Cartographic contours must also conform to certain spatial/logical rules; for example, contours may not cross waterbodies. The objective was to create contours that match as closely as possible the cartographic contours produced by manual methods on the 1:24,000-scale, 7.5-minute Topographic Map series. This report outlines the basic approach, describes a variety of problems that were encountered, and discusses solutions. Many of the challenges described herein were the result of imperfect input raster elevation data and the requirement to have the contours integrated with hydrographic features from the National Hydrography Dataset (NHD).

  2. U.S. Datasets

    Cancer.gov

    Datasets for U.S. mortality, U.S. populations, standard populations, county attributes, and expected survival. Plus SEER-linked databases (SEER-Medicare, SEER-Medicare Health Outcomes Survey [SEER-MHOS], SEER-Consumer Assessment of Healthcare Providers and Systems [SEER-CAHPS]).

  3. Dataset of Scientific Inquiry Learning Environment

    ERIC Educational Resources Information Center

    Ting, Choo-Yee; Ho, Chiung Ching

    2015-01-01

    This paper presents the dataset collected from student interactions with INQPRO, a computer-based scientific inquiry learning environment. The dataset contains records of 100 students and is divided into two portions. The first portion comprises (1) "raw log data", capturing the student's name, interfaces visited, the interface…

  4. Simulation of Smart Home Activity Datasets

    PubMed Central

    Synnott, Jonathan; Nugent, Chris; Jeffers, Paul

    2015-01-01

    A globally ageing population is resulting in an increased prevalence of chronic conditions which affect older adults. Such conditions require long-term care and management to maximize quality of life, placing an increasing strain on healthcare resources. Intelligent environments such as smart homes facilitate long-term monitoring of activities in the home through the use of sensor technology. Access to sensor datasets is necessary for the development of novel activity monitoring and recognition approaches. Access to such datasets is limited due to issues such as sensor cost, availability and deployment time. The use of simulated environments and sensors may address these issues and facilitate the generation of comprehensive datasets. This paper provides a review of existing approaches for the generation of simulated smart home activity datasets, including model-based approaches and interactive approaches which implement virtual sensors, environments and avatars. The paper also provides recommendations for future work in intelligent environment simulation. PMID:26087371

  5. Simulation of Smart Home Activity Datasets.

    PubMed

    Synnott, Jonathan; Nugent, Chris; Jeffers, Paul

    2015-06-16

    A globally ageing population is resulting in an increased prevalence of chronic conditions which affect older adults. Such conditions require long-term care and management to maximize quality of life, placing an increasing strain on healthcare resources. Intelligent environments such as smart homes facilitate long-term monitoring of activities in the home through the use of sensor technology. Access to sensor datasets is necessary for the development of novel activity monitoring and recognition approaches. Access to such datasets is limited due to issues such as sensor cost, availability and deployment time. The use of simulated environments and sensors may address these issues and facilitate the generation of comprehensive datasets. This paper provides a review of existing approaches for the generation of simulated smart home activity datasets, including model-based approaches and interactive approaches which implement virtual sensors, environments and avatars. The paper also provides recommendations for future work in intelligent environment simulation.

  6. The Great Lakes Hydrography Dataset: Consistent, binational watersheds for the Laurentian Great Lakes Basin

    EPA Science Inventory

    Ecosystem-based management of the Laurentian Great Lakes, which spans both the United States and Canada, is hampered by the lack of consistent binational watersheds for the entire Basin. Using comparable data sources and consistent methods we developed spatially equivalent waters...

  7. Providing Geographic Datasets as Linked Data in Sdi

    NASA Astrophysics Data System (ADS)

    Hietanen, E.; Lehto, L.; Latvala, P.

    2016-06-01

    In this study, a prototype service to provide data from a Web Feature Service (WFS) as linked data is implemented. First, persistent and unique Uniform Resource Identifiers (URIs) are created for all spatial objects in the dataset. The objects are available from those URIs in the Resource Description Framework (RDF) data format. Next, a Web Ontology Language (OWL) ontology is created to describe the dataset's information content using the Open Geospatial Consortium's (OGC) GeoSPARQL vocabulary. The existing data model is modified to take the linked data principles into account. The implemented service produces an HTTP response dynamically. The data for the response is first fetched from the existing WFS. The Geography Markup Language (GML) output of the WFS is then transformed on-the-fly to the RDF format. Content negotiation is used to serve the data in different RDF serialization formats. This solution facilitates the use of a dataset in different applications without replicating the whole dataset. In addition, individual spatial objects in the dataset can be referred to with URIs. Furthermore, the needed information content of the objects can be easily extracted from the RDF serializations available from those URIs. A solution for linking data objects to the dataset URI is also introduced using the Vocabulary of Interlinked Datasets (VoID). The dataset is divided into subsets, and each subset is given its own persistent and unique URI. This enables the whole dataset to be explored with a web browser and all individual objects to be indexed by search engines.
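The content-negotiation step can be sketched as a lookup from HTTP Accept media types to RDF serialization names. The media-type table and function below are assumptions for illustration, not the service's actual configuration.

```python
# Map of RDF media types to serialization names (illustrative subset)
RDF_FORMATS = {
    "text/turtle": "turtle",
    "application/rdf+xml": "xml",
    "application/ld+json": "json-ld",
}

def negotiate(accept_header, default="turtle"):
    """Return the serialization name for the first supported media type
    in an HTTP Accept header, falling back to a default."""
    for part in accept_header.split(","):
        # Drop quality parameters such as ";q=0.9" before matching
        media_type = part.split(";")[0].strip().lower()
        if media_type in RDF_FORMATS:
            return RDF_FORMATS[media_type]
    return default
```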

  8. Do the psychosocial risks associated with television viewing increase mortality? Evidence from the 2008 General Social Survey-National Death Index dataset.

    PubMed

    Muennig, Peter; Rosen, Zohn; Johnson, Gretchen

    2013-06-01

    Television viewing is associated with an increased risk of mortality, which could be caused by a sedentary lifestyle, the content of television programming (e.g., cigarette product placement or stress-inducing content), or both. We examined the relationship between self-reported hours of television viewing and mortality risk over 30 years in a representative sample of the American adult population using the 2008 General Social Survey-National Death Index dataset. We also explored the intervening variable effect of various emotional states (e.g., happiness) and beliefs (e.g., trust in government) on the relationship between television viewing and mortality. We find that, for each additional hour of viewing, mortality risks increased 4%. Given the mean duration of television viewing in our sample, this amounted to about 1.2 years of life expectancy in the United States. This association was tempered by a number of potential psychosocial mediators, including self-reported measures of happiness, social capital, or confidence in institutions. Although none of these were clinically significant, the combined mediation power was statistically significant (P < .001). Television viewing among healthy adults is correlated with premature mortality in a nationally representative sample of U.S. adults, and this association may be partially mediated by programming content related to beliefs or affective states. However, this mediation effect is the result of many small changes in psychosocial states rather than large effects from a few factors. Copyright © 2013 Elsevier Inc. All rights reserved.

  9. Do the psychosocial risks associated with television viewing increase mortality? Evidence from the 2008 General Social Survey-National Death Index Dataset

    PubMed Central

    Rosen, Zohn; Johnson, Gretchen

    2013-01-01

    Background Television viewing is associated with an increased risk of mortality, which could be caused by a sedentary lifestyle, the content of television programming (e.g., cigarette product placement or stress-inducing content), or both. Methods We examined the relationship between self-reported hours of television viewing and mortality risk over 30 years in a representative sample of the American adult population using the 2008 General Social Survey-National Death Index dataset. We also explored the intervening variable effect of various emotional states (e.g., happiness) and beliefs (e.g., trust in government) on the relationship between television viewing and mortality. Results We find that for each additional hour of viewing, mortality risks increased 4%. Given the mean duration of television viewing in our sample, this amounted to about 1.2 years of life expectancy in the US. This association was tempered by a number of potential psychosocial mediators, including self-reported measures of happiness, social capital, or confidence in institutions. While none of these were clinically significant, the combined mediation power was statistically significant (p < 0.001). Conclusions Television viewing among healthy adults is correlated with premature mortality in a nationally-representative sample of US adults, and this association may be partially mediated by programming content related to beliefs or affective states. However, this mediation effect is the result of many small changes in psychosocial states rather than large effects from a few factors. PMID:23683712

  10. The Optimum Dataset method - examples of the application

    NASA Astrophysics Data System (ADS)

    Błaszczak-Bąk, Wioleta; Sobieraj-Żłobińska, Anna; Wieczorek, Beata

    2018-01-01

    Data reduction is a procedure that decreases a dataset to make its analysis more effective and easier. Reduction of a dataset is an issue that requires proper planning, so that after reduction the dataset meets all the user's expectations. Evidently, it is better if the result is an optimal solution in terms of the adopted criteria. Among the reduction methods that provide an optimal solution is the Optimum Dataset method (OptD) proposed by Błaszczak-Bąk (2016). The paper presents the application of this method to different LiDAR datasets and the possibility of using the method for various purposes of study. The following reduced datasets are presented: (a) a measurement of Sielska street in Olsztyn (Airborne Laser Scanning data - ALS data), (b) a measurement of the bas-relief on a building in Gdańsk (Terrestrial Laser Scanning data - TLS data), and (c) a dataset from a measurement of the Biebrza River (TLS data).

  11. The ISLSCP initiative I global datasets: Surface boundary conditions and atmospheric forcings for land-atmosphere studies

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sellers, P.J.; Collatz, J.; Koster, R.

    1996-09-01

    A comprehensive series of global datasets for land-atmosphere models has been collected, formatted to a common grid, and released on a set of CD-ROMs. This paper describes the motivation for and the contents of the dataset. In June of 1992, an interdisciplinary earth science workshop was convened in Columbia, Maryland, to assess progress in land-atmosphere research, specifically in the areas of models, satellite data algorithms, and field experiments. At the workshop, representatives of the land-atmosphere modeling community defined a need for global datasets to prescribe boundary conditions, initialize state variables, and provide near-surface meteorological and radiative forcings for their models. The International Satellite Land Surface Climatology Project (ISLSCP), a part of the Global Energy and Water Cycle Experiment, worked with the Distributed Active Archive Center of the National Aeronautics and Space Administration Goddard Space Flight Center to bring the required datasets together in a usable format. The data have since been released on a collection of CD-ROMs. The datasets on the CD-ROMs are grouped under the following headings: vegetation; hydrology and soils; snow, ice, and oceans; radiation and clouds; and near-surface meteorology. All datasets cover the period 1987-88, and all but a few are spatially continuous over the earth's land surface. All have been mapped to a common 1° x 1° equal-angle grid. The temporal frequency for most of the datasets is monthly. A few of the near-surface meteorological parameters are available both as six-hourly values and as monthly means. 26 refs., 8 figs., 2 tabs.

  12. The Transition of NASA EOS Datasets to WFO Operations: A Model for Future Technology Transfer

    NASA Technical Reports Server (NTRS)

    Darden, C.; Burks, J.; Jedlovec, G.; Haines, S.

    2007-01-01

    The collocation of a National Weather Service (NWS) Forecast Office with atmospheric scientists from NASA/Marshall Space Flight Center (MSFC) in Huntsville, Alabama has afforded a unique opportunity for science sharing and technology transfer. Specifically, the NWS office in Huntsville has interacted closely with research scientists within the SPoRT (Short-term Prediction Research and Transition) Center at MSFC. One significant technology transfer that has reaped dividends is the transition of unique NASA EOS polar-orbiting datasets into NWS field operations. NWS forecasters rely primarily on the AWIPS (Advanced Weather Interactive Processing System) decision support system for their day-to-day forecast and warning decision making. Unfortunately, the transition of data from operational polar orbiters or low-inclination orbiting satellites into AWIPS has been relatively slow for a variety of reasons. The ability to integrate these high resolution NASA datasets into operations has yielded several benefits. The MODIS (MODerate-resolution Imaging Spectroradiometer) instrument flying on the Aqua and Terra satellites provides a broad spectrum of multispectral observations at resolutions as fine as 250 m. Forecasters routinely utilize these datasets to locate fine lines, boundaries, smoke plumes, locations of fog or haze fields, and other mesoscale features. In addition, these important datasets have been transitioned to other WFOs for a variety of local uses. For instance, WFO Great Falls, Montana utilizes the MODIS snow cover product for hydrologic planning purposes, while several coastal offices utilize the output from the MODIS and AMSR-E instruments to supplement observations in the data-sparse regions of the Gulf of Mexico and western Atlantic. In the short term, these datasets have benefited local WFOs in a variety of ways. 
In the longer term, the process by which these unique datasets were successfully transitioned to operations will benefit the planning and

  13. Relevancy Ranking of Satellite Dataset Search Results

    NASA Technical Reports Server (NTRS)

    Lynnes, Christopher; Quinn, Patrick; Norton, James

    2017-01-01

    As the variety of Earth science datasets increases, science researchers find it more challenging to discover and select the datasets that best fit their needs. The most common way for search providers to address this problem is to rank the datasets returned for a query by their likely relevance to the user. Large web page search engines typically use text matching supplemented with reverse link counts, semantic annotations, and user intent modeling. However, this produces uneven results when applied to dataset metadata records simply externalized as web pages. Fortunately, data and search providers have decades of experience in serving data user communities, allowing them to form heuristics that leverage the structure in the metadata together with knowledge about the user community. Some of these heuristics include specific ways of matching the user input to the essential measurements in the dataset and determining overlaps of time range and spatial areas. Heuristics based on the novelty of the datasets can prioritize later, better versions of data over similar predecessors. And knowledge of how different user types and communities use data can be brought to bear in cases where characteristics of the user (discipline, expertise) or their intent (applications, research) can be divined. The Earth Observing System Data and Information System has begun implementing some of these heuristics in the relevancy algorithm of its Common Metadata Repository search engine.
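A minimal sketch of how such heuristics might combine into a single relevance score: keyword matching against the title, overlap of the query time range with the dataset's coverage, and a bonus for newer versions. The field names, weights, and novelty bonus here are hypothetical, not the actual Common Metadata Repository algorithm.

```python
def temporal_overlap(q_start, q_end, d_start, d_end):
    """Fraction of the query time range covered by the dataset (0..1)."""
    span = max(q_end - q_start, 1)
    overlap = max(0, min(q_end, d_end) - max(q_start, d_start))
    return overlap / span

def relevance(query_terms, query_window, dataset):
    """Combine text-match, time-overlap, and version-novelty heuristics."""
    q_start, q_end = query_window
    title = dataset["title"].lower()
    # Fraction of query terms found in the dataset title
    text = sum(1 for t in query_terms if t in title) / len(query_terms)
    time = temporal_overlap(q_start, q_end, dataset["start"], dataset["end"])
    # Prefer later, better versions of similar data
    novelty = 0.1 * dataset["version"]
    return 0.6 * text + 0.3 * time + novelty
```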

  14. Development of large scale riverine terrain-bathymetry dataset by integrating NHDPlus HR with NED, CoNED and HAND data

    NASA Astrophysics Data System (ADS)

    Li, Z.; Clark, E. P.

    2017-12-01

    Large scale and fine resolution riverine bathymetry data is critical for flood inundation modeling but not available over the continental United States (CONUS). Previously we implemented bankfull hydraulic geometry based approaches to simulate bathymetry for individual rivers using NHDPlus v2.1 data and the 10 m National Elevation Dataset (NED). USGS has recently developed High Resolution NHD data (NHDPlus HR Beta) (USGS, 2017), and this enhanced dataset has a significant improvement in its spatial correspondence with the 10 m DEM. In this study, we used this high resolution data, specifically NHDFlowline and NHDArea, to create bathymetry/terrain for CONUS river channels and floodplains. A software package, NHDPlus Inundation Modeler v5.0 Beta, was developed for this project as an Esri ArcGIS hydrological analysis extension. With the updated tools, the raw 10 m DEM was first hydrologically treated to remove artificial blockages (e.g., overpasses, bridges and even roadways) using low pass moving window filters. Cross sections were then automatically constructed along each flowline to extract elevation from the hydrologically treated DEM. In this study, river channel shapes were approximated using quadratic curves to reduce uncertainties from commonly used trapezoids. We calculated underneath-water channel elevation at each cross section sampling point using bankfull channel dimensions that were estimated from physiographic province/division based regression equations (Bieger et al. 2015). These elevation points were then interpolated to generate a bathymetry raster. The simulated bathymetry raster was integrated with the USGS NED and the Coastal National Elevation Database (CoNED) (wherever available) to make a seamless terrain-bathymetry dataset. Channel bathymetry was also integrated into the HAND (Height Above Nearest Drainage) dataset to improve large scale inundation modeling. The generated terrain-bathymetry was processed at the Watershed Boundary Dataset Hydrologic Unit 4 (WBDHU4) level.
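The quadratic channel approximation can be sketched as a parabola whose minimum sits one bankfull depth below bank elevation at the channel centerline. The function and parameter names below are illustrative, not the tool's actual interface.

```python
def channel_bed_elevation(x, bank_elev, bankfull_width, bankfull_depth):
    """Bed elevation at lateral offset x (m) from the channel centerline,
    using a quadratic (parabolic) cross-section shape."""
    half = bankfull_width / 2.0
    if abs(x) >= half:
        return bank_elev  # outside the channel: stay at bank level
    # Parabola reaching (bank_elev - bankfull_depth) at the centerline
    # and bank_elev at the channel edges
    return bank_elev - bankfull_depth * (1.0 - (x / half) ** 2)
```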

  15. Datasets collected in general practice: an international comparison using the example of obesity.

    PubMed

    Sturgiss, Elizabeth; van Boven, Kees

    2018-06-04

    International datasets from general practice enable the comparison of how conditions are managed within consultations in different primary healthcare settings. The Australian Bettering the Evaluation and Care of Health (BEACH) and TransHIS from the Netherlands collect in-consultation general practice data that have been used extensively to inform local policy and practice. Obesity is a global health issue with different countries applying varying approaches to management. The objective of the present paper is to compare the primary care management of obesity in Australia and the Netherlands using data collected from consultations. Despite the different prevalence of obesity in the two countries, the number of patients with obesity seen per 1000 patient-years is similar. Patients in Australia with obesity are referred to allied health practitioners more often than Dutch patients. Without quality general practice data, primary care researchers will not have data about the management of conditions within consultations. We use obesity to highlight the strengths of these general practice data sources and to compare their differences. What is known about the topic? Australia had one of the longest-running consecutive datasets about general practice activity in the world, but it has recently lost government funding. The Netherlands has a longitudinal general practice dataset of information collected within consultations since 1985. What does this paper add? We discuss the benefits of general practice-collected data in two countries. Using obesity as a case example, we compare management in general practice between Australia and the Netherlands. This type of analysis could serve as the starting point for international collaborations on the primary care management of any health condition. Having a national general practice dataset allows international comparisons of the management of conditions within primary care. 
Without a current, quality general practice dataset, primary care researchers will not

  16. Changes in Ocean Heat, Carbon Content, and Ventilation: A Review of the First Decade of GO-SHIP Global Repeat Hydrography.

    PubMed

    Talley, L D; Feely, R A; Sloyan, B M; Wanninkhof, R; Baringer, M O; Bullister, J L; Carlson, C A; Doney, S C; Fine, R A; Firing, E; Gruber, N; Hansell, D A; Ishii, M; Johnson, G C; Katsumata, K; Key, R M; Kramp, M; Langdon, C; Macdonald, A M; Mathis, J T; McDonagh, E L; Mecking, S; Millero, F J; Mordy, C W; Nakano, T; Sabine, C L; Smethie, W M; Swift, J H; Tanhua, T; Thurnherr, A M; Warner, M J; Zhang, J-Z

    2016-01-01

    Global ship-based programs, with highly accurate, full water column physical and biogeochemical observations repeated decadally since the 1970s, provide a crucial resource for documenting ocean change. The ocean, a central component of Earth's climate system, is taking up most of Earth's excess anthropogenic heat, with about 19% of this excess in the abyssal ocean beneath 2,000 m, dominated by Southern Ocean warming. The ocean also has taken up about 27% of anthropogenic carbon, resulting in acidification of the upper ocean. Increased stratification has resulted in a decline in oxygen and increase in nutrients in the Northern Hemisphere thermocline and an expansion of tropical oxygen minimum zones. Southern Hemisphere thermocline oxygen increased in the 2000s owing to stronger wind forcing and ventilation. The most recent decade of global hydrography has mapped dissolved organic carbon, a large, bioactive reservoir, for the first time and quantified its contribution to export production (∼20%) and deep-ocean oxygen utilization. Ship-based measurements also show that vertical diffusivity increases from a minimum in the thermocline to a maximum within the bottom 1,500 m, shifting our physical paradigm of the ocean's overturning circulation.

  17. A dataset on tail risk of commodities markets.

    PubMed

    Powell, Robert J; Vo, Duc H; Pham, Thach N; Singh, Abhay K

    2017-12-01

This article contains the datasets related to the research article "The long and short of commodity tails and their relationship to Asian equity markets" (Powell et al., 2017) [1]. The datasets contain the daily prices (and price movements) of 24 different commodities decomposed from the S&P GSCI index and the daily prices (and price movements) of three share market indices covering World, Asia, and South East Asia for the period 2004-2015. The dataset is then divided into annual periods, showing the worst 5% of price movements for each year. The datasets are convenient for examining the tail risk of different commodities as measured by Conditional Value at Risk (CVaR), as well as its changes over time. The datasets can also be used to investigate the association between commodity markets and share markets.
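    The tail-risk measure described above is simple to sketch. The following is an illustrative computation on synthetic prices, not the authors' code or the actual S&P GSCI data: the worst 5% of daily price movements are isolated, and CVaR is their mean.

```python
import numpy as np

def cvar(returns, alpha=0.05):
    """Conditional Value at Risk: the mean of the worst `alpha` fraction
    of daily price movements (here, simple returns)."""
    cutoff = np.quantile(returns, alpha)   # the 5% Value-at-Risk threshold
    tail = returns[returns <= cutoff]      # the worst 5% of movements
    return tail.mean()

# Synthetic daily prices standing in for a commodity series
rng = np.random.default_rng(42)
prices = 100 * np.cumprod(1 + rng.normal(0.0, 0.01, 1000))
daily_moves = np.diff(prices) / prices[:-1]
print(f"5% CVaR: {cvar(daily_moves):.4f}")
```

    Grouping `daily_moves` by calendar year before calling `cvar` reproduces the annual-period structure the dataset uses.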

  18. Multiresolution comparison of precipitation datasets for large-scale models

    NASA Astrophysics Data System (ADS)

    Chun, K. P.; Sapriza Azuri, G.; Davison, B.; DeBeer, C. M.; Wheater, H. S.

    2014-12-01

Gridded precipitation datasets are crucial for driving the large-scale models used in weather forecasting and climate research. However, the quality of precipitation products is usually validated individually. Comparing gridded precipitation products against ground observations provides another avenue for investigating how precipitation uncertainty affects the performance of large-scale models. In this study, using data from a set of precipitation gauges over British Columbia and Alberta, we evaluate several widely used North American gridded products, including the Canadian Gridded Precipitation Anomalies (CANGRD), the National Center for Environmental Prediction (NCEP) reanalysis, the Water and Global Change (WATCH) project, the thin plate spline smoothing algorithm (ANUSPLIN) and the Canadian Precipitation Analysis (CaPA). Based on verification criteria for various temporal and spatial scales, the results provide an assessment of possible applications for the various precipitation datasets. For long-term climate variation studies (~100 years), CANGRD, NCEP, WATCH and ANUSPLIN have different comparative advantages in terms of their resolution and accuracy. For synoptic and mesoscale precipitation patterns, CaPA provides appealing spatial coherence. In addition to the product comparison, various downscaling methods are also surveyed to explore new verification and bias-reduction methods for improving gridded precipitation outputs for large-scale models.

  19. Social voting advice applications-definitions, challenges, datasets and evaluation.

    PubMed

    Katakis, Ioannis; Tsapatsoulis, Nicolas; Mendez, Fernando; Triga, Vasiliki; Djouvas, Constantinos

    2014-07-01

Voting advice applications (VAAs) are online tools that have become increasingly popular and purportedly aid users in deciding which party/candidate to vote for during an election. In this paper we present an innovation to current VAA design based on the introduction of a social network element. We refer to this new type of online tool as a social voting advice application (SVAA). SVAAs extend VAAs by providing (a) community-based recommendations, (b) comparison of users' political opinions, and (c) a channel of user communication. In addition, SVAAs, enriched with data mining modules, can operate as citizen sensors recording the sentiment of the electorate on issues and candidates. Drawing on VAA datasets generated by the Preference Matcher research consortium, we evaluate the results of the first VAA, Choose4Greece, which incorporated social voting features and was launched during the landmark Greek national elections of 2012. We demonstrate how an SVAA can provide community-based features and, at the same time, serve as a citizen sensor. Evaluation of the proposed techniques is realized on a series of datasets collected from various VAAs, including Choose4Greece. The collection is made available online in order to promote research in the field.

  20. Control Measure Dataset

    EPA Pesticide Factsheets

The EPA Control Measure Dataset is a collection of documents describing air pollution control measures available to regulated facilities for the control and abatement of air pollution emissions from a range of regulated source types, whether directly through the use of technical measures, or indirectly through economic or other measures.

  1. Developing a provisional, international minimal dataset for Juvenile Dermatomyositis: for use in clinical practice to inform research.

    PubMed

    McCann, Liza J; Arnold, Katie; Pilkington, Clarissa A; Huber, Adam M; Ravelli, Angelo; Beard, Laura; Beresford, Michael W; Wedderburn, Lucy R

    2014-01-01

Juvenile dermatomyositis (JDM) is a rare but severe autoimmune inflammatory myositis of childhood. International collaboration is essential in order to undertake clinical trials, understand the disease and improve long-term outcome. The aim of this study was to propose, from existing collaborative initiatives, a preliminary minimal dataset for JDM. This will form the basis of the future development of an international consensus-approved minimum core dataset to be used both in clinical care and to inform research, allowing integration of data between centres. A working group of internationally representative JDM experts was formed to develop a provisional minimal dataset. Clinical and laboratory variables contained within current national and international collaborative databases of patients with idiopathic inflammatory myopathies were scrutinised. Judgements were informed by published literature and a more detailed analysis of the Juvenile Dermatomyositis Cohort Biomarker Study and Repository, UK and Ireland. A provisional minimal JDM dataset has been produced, with an associated glossary of definitions. The provisional minimal dataset will request information at the time of patient diagnosis and during ongoing prospective follow-up. At the time of patient diagnosis, information will be requested on patient demographics, diagnostic criteria and treatments given prior to diagnosis. During ongoing prospective follow-up, variables will include the presence of active muscle or skin disease, major organ involvement or constitutional symptoms, investigations, treatment, physician global assessments and patient-reported outcome measures. An internationally agreed minimal dataset has the potential to significantly enhance collaboration, allow effective communication between groups, provide a minimal standard of care and enable analysis of the largest possible number of JDM patients, providing a greater understanding of this disease.
This preliminary dataset can now be developed into

  2. Developing a provisional, international Minimal Dataset for Juvenile Dermatomyositis: for use in clinical practice to inform research

    PubMed Central

    2014-01-01

Background Juvenile dermatomyositis (JDM) is a rare but severe autoimmune inflammatory myositis of childhood. International collaboration is essential in order to undertake clinical trials, understand the disease and improve long-term outcome. The aim of this study was to propose, from existing collaborative initiatives, a preliminary minimal dataset for JDM. This will form the basis of the future development of an international consensus-approved minimum core dataset to be used both in clinical care and to inform research, allowing integration of data between centres. Methods A working group of internationally representative JDM experts was formed to develop a provisional minimal dataset. Clinical and laboratory variables contained within current national and international collaborative databases of patients with idiopathic inflammatory myopathies were scrutinised. Judgements were informed by published literature and a more detailed analysis of the Juvenile Dermatomyositis Cohort Biomarker Study and Repository, UK and Ireland. Results A provisional minimal JDM dataset has been produced, with an associated glossary of definitions. The provisional minimal dataset will request information at the time of patient diagnosis and during ongoing prospective follow-up. At the time of patient diagnosis, information will be requested on patient demographics, diagnostic criteria and treatments given prior to diagnosis. During ongoing prospective follow-up, variables will include the presence of active muscle or skin disease, major organ involvement or constitutional symptoms, investigations, treatment, physician global assessments and patient-reported outcome measures. Conclusions An internationally agreed minimal dataset has the potential to significantly enhance collaboration, allow effective communication between groups, provide a minimal standard of care and enable analysis of the largest possible number of JDM patients, providing a greater understanding of this disease. This

  3. Comparison of Shallow Survey 2012 Multibeam Datasets

    NASA Astrophysics Data System (ADS)

    Ramirez, T. M.

    2012-12-01

The purpose of the Shallow Survey common dataset is a comparison of the different technologies utilized for data acquisition in the shallow survey marine environment. The common dataset consists of a series of surveys conducted over a common area of seabed using a variety of systems. It provides equipment manufacturers the opportunity to showcase their latest systems while giving hydrographic researchers and scientists a chance to test their latest algorithms on the dataset so that rigorous comparisons can be made. Five companies collected data for the common dataset in the Wellington Harbour area of New Zealand between May 2010 and May 2011: Kongsberg, Reson, R2Sonic, GeoAcoustics, and Applied Acoustics. The Wellington Harbour and surrounding coastal area was selected because it has a number of well-defined features, including the HMNZS South Seas and HMNZS Wellington wrecks, an armored seawall constructed of tetrapods and akmons, aquifers, wharves and marinas. The seabed inside the harbour basin is largely fine-grained sediment, with gravel and reefs around the coast. The area outside the harbour on the southern coast is an active environment, with moving sand and exposed reefs. A marine reserve is also in this area. For consistency between datasets, the coastal research vessel R/V Ikatere and crew were used for all surveys conducted for the common dataset. Using Triton's Perspective processing software, the multibeam datasets collected for the Shallow Survey were processed for detailed analysis. Datasets from each sonar manufacturer were processed using the CUBE algorithm developed by the Center for Coastal and Ocean Mapping/Joint Hydrographic Center (CCOM/JHC). Each dataset was gridded at 0.5 and 1.0 meter resolutions for cross comparison and compliance with International Hydrographic Organization (IHO) requirements. Detailed comparisons were made of equipment specifications (transmit frequency, number of beams, beam width), data density, total uncertainty, and

  4. Developing a data dictionary for the irish nursing minimum dataset.

    PubMed

    Henry, Pamela; Mac Neela, Pádraig; Clinton, Gerard; Scott, Anne; Treacy, Pearl; Butler, Michelle; Hyde, Abbey; Morris, Roisin; Irving, Kate; Byrne, Anne

    2006-01-01

One of the challenges in health care in Ireland is the relatively slow acceptance of standardised clinical information systems. Yet the national Irish health reform programme indicates that an Electronic Health Care Record (EHCR) will be implemented on a phased basis [3-5]. While nursing has a key role in ensuring the quality and comparability of health information, the so-called 'invisibility' of some nursing activities makes this a challenging aim to achieve [3-5]. Any integrated health care system requires the adoption of uniform standards for electronic data exchange [1-2]. One of the pre-requisites for uniform standards is the composition of a data dictionary. Inadequate definition of data elements in a particular dataset hinders the development of an integrated data repository or electronic health care record (EHCR). This paper outlines how work on the data dictionary for the Irish Nursing Minimum Dataset (INMDS) has addressed this issue. Dataset elements were devised on the basis of a large-scale empirical research programme. ISO 18104, the reference terminology for nursing [6], was used to cross-map the dataset elements with semantic domains, categories and links, and dataset items were dissected.

  5. Abandoned Uranium Mine (AUM) Points, Navajo Nation, 2016, US EPA Region 9

    EPA Pesticide Factsheets

This GIS dataset contains point features for all Abandoned Uranium Mines (AUMs) on or within one mile of the Navajo Nation. Points are centroids developed from the Navajo Nation production mines polygon dataset, which comprises productive and unproductive Abandoned Uranium Mines. Attributes include mine names, aliases, links to AUM reports, indicators of whether an AUM was mined above or below ground, indicators of whether an AUM was mined above or below the local water table, and the region in which an AUM is located. This dataset contains 608 features.

  6. Two ultraviolet radiation datasets that cover China

    NASA Astrophysics Data System (ADS)

    Liu, Hui; Hu, Bo; Wang, Yuesi; Liu, Guangren; Tang, Liqin; Ji, Dongsheng; Bai, Yongfei; Bao, Weikai; Chen, Xin; Chen, Yunming; Ding, Weixin; Han, Xiaozeng; He, Fei; Huang, Hui; Huang, Zhenying; Li, Xinrong; Li, Yan; Liu, Wenzhao; Lin, Luxiang; Ouyang, Zhu; Qin, Boqiang; Shen, Weijun; Shen, Yanjun; Su, Hongxin; Song, Changchun; Sun, Bo; Sun, Song; Wang, Anzhi; Wang, Genxu; Wang, Huimin; Wang, Silong; Wang, Youshao; Wei, Wenxue; Xie, Ping; Xie, Zongqiang; Yan, Xiaoyuan; Zeng, Fanjiang; Zhang, Fawei; Zhang, Yangjian; Zhang, Yiping; Zhao, Chengyi; Zhao, Wenzhi; Zhao, Xueyong; Zhou, Guoyi; Zhu, Bo

    2017-07-01

Ultraviolet (UV) radiation has significant effects on ecosystems, environments, and human health, as well as atmospheric processes and climate change. Two ultraviolet radiation datasets are described in this paper. One contains hourly observations of UV radiation measured at 40 Chinese Ecosystem Research Network stations from 2005 to 2015. CUV3 broadband radiometers were used to observe the UV radiation, with an accuracy of 5%, which meets the World Meteorological Organization's measurement standards. The extremum method was used to control the quality of the measured datasets. The other dataset contains daily cumulative UV radiation estimates that were calculated using an all-sky estimation model combined with a hybrid model. The reconstructed daily UV radiation data span the period from 1961 to 2014. The mean absolute bias error and root-mean-square error are smaller than 30% at most stations, and most of the mean bias error values are negative, which indicates underestimation of the UV radiation intensity. These datasets can improve our basic knowledge of the spatial and temporal variations in UV radiation. Additionally, these datasets can be used in studies of potential ozone formation and atmospheric oxidation, as well as simulations of ecological processes.
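    The validation statistics named in the abstract (mean bias error, mean absolute bias error, root-mean-square error) are standard formulas. The sketch below uses made-up daily UV values for illustration, not the actual station data:

```python
import numpy as np

def validation_metrics(estimated, observed):
    """MBE, MABE and RMSE between estimated and observed series.
    A negative MBE indicates underestimation, as reported in the abstract."""
    diff = estimated - observed
    mbe = diff.mean()                      # mean bias error
    mabe = np.abs(diff).mean()             # mean absolute bias error
    rmse = np.sqrt((diff ** 2).mean())     # root-mean-square error
    return mbe, mabe, rmse

# Hypothetical daily cumulative UV radiation values (arbitrary units)
observed = np.array([120.0, 150.0, 90.0, 200.0])
estimated = np.array([110.0, 145.0, 95.0, 190.0])
mbe, mabe, rmse = validation_metrics(estimated, observed)
print(f"MBE={mbe:.2f}  MABE={mabe:.2f}  RMSE={rmse:.2f}")
```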

  7. The Harvard organic photovoltaic dataset

    DOE PAGES

    Lopez, Steven A.; Pyzer-Knapp, Edward O.; Simm, Gregor N.; ...

    2016-09-27

    Presented in this work is the Harvard Organic Photovoltaic Dataset (HOPV15), a collation of experimental photovoltaic data from the literature, and corresponding quantum-chemical calculations performed over a range of conformers, each with quantum chemical results using a variety of density functionals and basis sets. It is anticipated that this dataset will be of use in both relating electronic structure calculations to experimental observations through the generation of calibration schemes, as well as for the creation of new semi-empirical methods and the benchmarking of current and future model chemistries for organic electronic applications.

  8. The Harvard organic photovoltaic dataset.

    PubMed

    Lopez, Steven A; Pyzer-Knapp, Edward O; Simm, Gregor N; Lutzow, Trevor; Li, Kewei; Seress, Laszlo R; Hachmann, Johannes; Aspuru-Guzik, Alán

    2016-09-27

    The Harvard Organic Photovoltaic Dataset (HOPV15) presented in this work is a collation of experimental photovoltaic data from the literature, and corresponding quantum-chemical calculations performed over a range of conformers, each with quantum chemical results using a variety of density functionals and basis sets. It is anticipated that this dataset will be of use in both relating electronic structure calculations to experimental observations through the generation of calibration schemes, as well as for the creation of new semi-empirical methods and the benchmarking of current and future model chemistries for organic electronic applications.

  9. A new global 1-km dataset of percentage tree cover derived from remote sensing

    USGS Publications Warehouse

    DeFries, R.S.; Hansen, M.C.; Townshend, J.R.G.; Janetos, A.C.; Loveland, Thomas R.

    2000-01-01

    Accurate assessment of the spatial extent of forest cover is a crucial requirement for quantifying the sources and sinks of carbon from the terrestrial biosphere. In the more immediate context of the United Nations Framework Convention on Climate Change, implementation of the Kyoto Protocol calls for estimates of carbon stocks for a baseline year as well as for subsequent years. Data sources from country level statistics and other ground-based information are based on varying definitions of 'forest' and are consequently problematic for obtaining spatially and temporally consistent carbon stock estimates. By combining two datasets previously derived from the Advanced Very High Resolution Radiometer (AVHRR) at 1 km spatial resolution, we have generated a prototype global map depicting percentage tree cover and associated proportions of trees with different leaf longevity (evergreen and deciduous) and leaf type (broadleaf and needleleaf). The product is intended for use in terrestrial carbon cycle models, in conjunction with other spatial datasets such as climate and soil type, to obtain more consistent and reliable estimates of carbon stocks. The percentage tree cover dataset is available through the Global Land Cover Facility at the University of Maryland at http://glcf.umiacs.umd.edu.

  10. National Transportation Atlas Databases : 2002

    DOT National Transportation Integrated Search

    2002-01-01

    The National Transportation Atlas Databases 2002 (NTAD2002) is a set of nationwide geographic databases of transportation facilities, transportation networks, and associated infrastructure. These datasets include spatial information for transportatio...

  11. National Transportation Atlas Databases : 2010

    DOT National Transportation Integrated Search

    2010-01-01

    The National Transportation Atlas Databases 2010 (NTAD2010) is a set of nationwide geographic databases of transportation facilities, transportation networks, and associated infrastructure. These datasets include spatial information for transportatio...

  12. National Transportation Atlas Databases : 2006

    DOT National Transportation Integrated Search

    2006-01-01

    The National Transportation Atlas Databases 2006 (NTAD2006) is a set of nationwide geographic databases of transportation facilities, transportation networks, and associated infrastructure. These datasets include spatial information for transportatio...

  13. National Transportation Atlas Databases : 2005

    DOT National Transportation Integrated Search

    2005-01-01

    The National Transportation Atlas Databases 2005 (NTAD2005) is a set of nationwide geographic databases of transportation facilities, transportation networks, and associated infrastructure. These datasets include spatial information for transportatio...

  14. National Transportation Atlas Databases : 2008

    DOT National Transportation Integrated Search

    2008-01-01

    The National Transportation Atlas Databases 2008 (NTAD2008) is a set of nationwide geographic databases of transportation facilities, transportation networks, and associated infrastructure. These datasets include spatial information for transportatio...

  15. National Transportation Atlas Databases : 2003

    DOT National Transportation Integrated Search

    2003-01-01

    The National Transportation Atlas Databases 2003 (NTAD2003) is a set of nationwide geographic databases of transportation facilities, transportation networks, and associated infrastructure. These datasets include spatial information for transportatio...

  16. National Transportation Atlas Databases : 2014

    DOT National Transportation Integrated Search

    2014-01-01

The National Transportation Atlas Databases 2014 (NTAD2014) is a set of nationwide geographic datasets of transportation facilities, transportation networks, associated infrastructure, and other political and administrative entities. These da...

  17. National Transportation Atlas Databases : 2004

    DOT National Transportation Integrated Search

    2004-01-01

    The National Transportation Atlas Databases 2004 (NTAD2004) is a set of nationwide geographic databases of transportation facilities, transportation networks, and associated infrastructure. These datasets include spatial information for transportatio...

  18. National Transportation Atlas Databases : 2009

    DOT National Transportation Integrated Search

    2009-01-01

    The National Transportation Atlas Databases 2009 (NTAD2009) is a set of nationwide geographic databases of transportation facilities, transportation networks, and associated infrastructure. These datasets include spatial information for transportatio...

  19. National Transportation Atlas Databases : 2007

    DOT National Transportation Integrated Search

    2007-01-01

    The National Transportation Atlas Databases 2007 (NTAD2007) is a set of nationwide geographic databases of transportation facilities, transportation networks, and associated infrastructure. These datasets include spatial information for transportatio...

  20. National Transportation Atlas Databases : 2012

    DOT National Transportation Integrated Search

    2012-01-01

    The National Transportation Atlas Databases 2012 (NTAD2012) is a set of nationwide geographic databases of transportation facilities, transportation networks, and associated infrastructure. These datasets include spatial information for transportatio...

  1. National Transportation Atlas Databases : 2015

    DOT National Transportation Integrated Search

    2015-01-01

The National Transportation Atlas Databases 2015 (NTAD2015) is a set of nationwide geographic datasets of transportation facilities, transportation networks, associated infrastructure, and other political and administrative entities. These da...

  2. National Transportation Atlas Databases : 2011

    DOT National Transportation Integrated Search

    2011-01-01

    The National Transportation Atlas Databases 2011 (NTAD2011) is a set of nationwide geographic databases of transportation facilities, transportation networks, and associated infrastructure. These datasets include spatial information for transportatio...

  3. An open source high-performance solution to extract surface water drainage networks from diverse terrain conditions

    USGS Publications Warehouse

    Stanislawski, Larry V.; Survila, Kornelijus; Wendel, Jeffrey; Liu, Yan; Buttenfield, Barbara P.

    2018-01-01

    This paper describes a workflow for automating the extraction of elevation-derived stream lines using open source tools with parallel computing support and testing the effectiveness of procedures in various terrain conditions within the conterminous United States. Drainage networks are extracted from the US Geological Survey 1/3 arc-second 3D Elevation Program elevation data having a nominal cell size of 10 m. This research demonstrates the utility of open source tools with parallel computing support for extracting connected drainage network patterns and handling depressions in 30 subbasins distributed across humid, dry, and transitional climate regions and in terrain conditions exhibiting a range of slopes. Special attention is given to low-slope terrain, where network connectivity is preserved by generating synthetic stream channels through lake and waterbody polygons. Conflation analysis compares the extracted streams with a 1:24,000-scale National Hydrography Dataset flowline network and shows that similarities are greatest for second- and higher-order tributaries.

  4. Improving clinical models based on knowledge extracted from current datasets: a new approach.

    PubMed

    Mendes, D; Paredes, S; Rocha, T; Carvalho, P; Henriques, J; Morais, J

    2016-08-01

Cardiovascular diseases (CVD) are the leading cause of death in the world, with prevention recognized as a key intervention to counter this reality. In this context, although there are several models and scores currently used in clinical practice to assess the risk of a new cardiovascular event, they present some limitations. The goal of this paper is to improve CVD risk prediction by taking into account the current models as well as information extracted from real and recent datasets. The approach is based on a decision tree scheme in order to assure the clinical interpretability of the model. An innovative optimization strategy is developed to adjust the decision tree thresholds (the rule structure is fixed) based on recent clinical datasets. A real dataset collected within the National Registry on Acute Coronary Syndromes of the Portuguese Society of Cardiology is applied to validate this work. To assess the performance of the new approach, the metrics sensitivity, specificity and accuracy are used. The new approach achieves sensitivity, specificity and accuracy values of 80.52%, 74.19% and 77.27%, respectively, which represents an improvement of about 26% relative to the accuracy of the original score.
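    For reference, the three evaluation metrics used above follow directly from a confusion matrix. The counts in this sketch are invented for illustration, not taken from the registry data:

```python
def classification_metrics(tp, fp, tn, fn):
    """Sensitivity (true positive rate), specificity (true negative rate)
    and overall accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy

# Hypothetical counts for illustration only
sens, spec, acc = classification_metrics(tp=80, fp=30, tn=70, fn=20)
print(f"sensitivity={sens:.2%}  specificity={spec:.2%}  accuracy={acc:.2%}")
```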

  5. Genomics dataset of unidentified disclosed isolates.

    PubMed

    Rekadwad, Bhagwan N

    2016-09-01

Analysis of DNA sequences is necessary for higher hierarchical classification of organisms. It gives clues about the characteristics of organisms and their taxonomic position. This dataset was chosen to find complexities in the unidentified DNA in the disclosed patents. A total of 17 unidentified DNA sequences were thoroughly analyzed. Quick response (QR) codes were generated, and AT/GC content analysis of the DNA sequences was carried out. The QR codes are helpful for quick identification of isolates, while the AT/GC content is helpful for studying their stability at different temperatures. Additionally, a dataset of cleavage codes and enzyme codes from the restriction digestion study, which is helpful for performing studies using short DNA sequences, was reported. The dataset disclosed here is new revelatory data for the exploration of unique DNA sequences for evaluation, identification, comparison and analysis.
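    The AT/GC content computation mentioned in the abstract is straightforward. Below is a minimal sketch on a made-up sequence, not one of the 17 patent sequences:

```python
def at_gc_content(seq):
    """Return (AT%, GC%) of a DNA sequence; higher GC content generally
    implies greater thermal stability of the duplex."""
    seq = seq.upper()
    at = sum(seq.count(base) for base in "AT")
    gc = sum(seq.count(base) for base in "GC")
    return 100.0 * at / len(seq), 100.0 * gc / len(seq)

at_pct, gc_pct = at_gc_content("ATGCGCGTATATTAGCGCAA")
print(f"AT: {at_pct:.1f}%  GC: {gc_pct:.1f}%")
```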

  6. Circulation, Hydrography, and Transport over the Summit of Axial-the Destination Node of OOI's Cabled Array

    NASA Astrophysics Data System (ADS)

    Xu, G.; Lavelle, J. W.

    2016-12-01

A numerical model of ocean flow and transport is used to extrapolate observations of currents and hydrography and infer patterns of material flux in the deep ocean around Axial Volcano, the destination node of the Ocean Observatories Initiative (OOI) Cabled Array. Using an inverse method, the model is made to approximate measured deep ocean flow around this site during a 35-day time period in 2002. The model is then used to extract month-long mean patterns and examine smaller-scale spatial and temporal variability around Axial. Like prior observations, model month-long mean currents flow anti-cyclonically (clockwise) around the volcano's summit in toroidal form at speeds of up to 7 cm/s. The mean vertical circulation has a net effect of pumping water out of the caldera. Temperature and salinity iso-surfaces sweep upward and downward on opposite sides of the volcano with vertical excursions of up to 70 m. As a time mean, the temperature (salinity) anomaly takes the form of a cold (briny) dome above the summit. Passive tracer material released at the location of the ASHES vent field exits the caldera through its southern open end and over the western bounding wall driven by vertical flow. Once outside the caldera, the tracer circles the summit in clockwise fashion, while gradually bleeding southwestward into the ambient ocean. Another tracer release experiment using a source of 2-day duration inside and near the northern end of the caldera suggests a residence time of the fluid at that locale of 5-6 days.

  7. High-resolution digital elevation dataset for Crater Lake National Park and vicinity, Oregon, based on LiDAR survey of August-September 2010 and bathymetric survey of July 2000

    USGS Publications Warehouse

    Robinson, Joel E.

    2012-01-01

    Crater Lake partially fills the caldera that formed approximately 7,700 years ago during the eruption of a 12,000-foot volcano known as Mount Mazama. The caldera-forming or climactic eruption of Mount Mazama devastated the surrounding landscape, left a thick deposit of pumice and ash in adjacent valleys, and spread a blanket of volcanic ash as far away as southern Canada. Because the Crater Lake region is potentially volcanically active, knowledge of past events is important to understanding hazards from future eruptions. Similarly, because the area is seismically active, documenting and evaluating geologic faults is critical to assessing hazards from earthquakes. As part of the American Recovery and Reinvestment Act (ARRA) of 2009, the U.S. Geological Survey was awarded funding for high-precision airborne LiDAR (Light Detection And Ranging) data collection at several volcanoes in the Cascade Range through the Oregon LiDAR Consortium, administered by the Oregon Department of Geology and Mineral Industries (DOGAMI). The Oregon LiDAR Consortium contracted with Watershed Sciences, Inc., to conduct the data collection surveys. Collaborating agencies participating with the Oregon LiDAR Consortium for data collection in the Crater Lake region include Crater Lake National Park (National Park Service) and the Federal Highway Administration. In the immediate vicinity of Crater Lake National Park, 798 square kilometers of LiDAR data were collected, providing a digital elevation dataset of the ground surface beneath forest cover with an average resolution of 1.6 laser returns/m2 and both vertical and horizontal accuracies of ±5 cm. The LiDAR data were mosaicked in this report with bathymetry of the lake floor of Crater Lake, collected in 2000 using high-resolution multibeam sonar in a collaborative effort between the U.S. Geological Survey, Crater Lake National Park, and the Center for Coastal and Ocean Mapping at the University of New Hampshire. The bathymetric survey

  8. Application of Huang-Hilbert Transforms to Geophysical Datasets

    NASA Technical Reports Server (NTRS)

    Duffy, Dean G.

    2003-01-01

    The Huang-Hilbert transform is a promising new method for analyzing nonstationary and nonlinear datasets. In this talk I will apply this technique to several important geophysical datasets. To understand the strengths and weaknesses of this method, multi-year, hourly datasets of the sea level heights and solar radiation will be analyzed. Then we will apply this transform to the analysis of gravity waves observed in a mesoscale observational net.
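    A minimal sketch of the Hilbert stage of the method, assuming an intrinsic mode function (IMF) has already been extracted by empirical mode decomposition; the signal below is synthetic, not one of the datasets discussed:

```python
import numpy as np
from scipy.signal import hilbert

def instantaneous_attributes(imf, fs):
    # Analytic signal x(t) + i*H[x](t) via the Hilbert transform
    analytic = hilbert(imf)
    amplitude = np.abs(analytic)                  # instantaneous amplitude
    phase = np.unwrap(np.angle(analytic))
    freq = np.diff(phase) * fs / (2.0 * np.pi)    # instantaneous frequency, Hz
    return amplitude, freq

# Synthetic stand-in for one IMF: a pure 0.5 Hz oscillation
fs = 100.0
t = np.arange(0, 10, 1.0 / fs)
imf = np.sin(2.0 * np.pi * 0.5 * t)
amp, freq = instantaneous_attributes(imf, fs)
```

    For a real sea-level or solar-radiation record, each IMF from the decomposition would be passed through the same function to build the Hilbert spectrum.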

  9. Technical note: Space-time analysis of rainfall extremes in Italy: clues from a reconciled dataset

    NASA Astrophysics Data System (ADS)

    Libertino, Andrea; Ganora, Daniele; Claps, Pierluigi

    2018-05-01

    Like other Mediterranean areas, Italy is prone to the development of events with significant rainfall intensity, lasting for several hours. The main triggering mechanisms of these events are quite well known, but the aim of developing rainstorm hazard maps compatible with their actual probability of occurrence is still far from being reached. A systematic frequency analysis of these occasional highly intense events would require a complete countrywide dataset of sub-daily rainfall records, but this kind of information has been lacking for the Italian territory. In this work, several sources of data are gathered to assemble the first comprehensive and updated dataset of extreme rainfall of short duration in Italy. The resulting dataset, referred to as the Italian Rainfall Extreme Dataset (I-RED), includes the annual maximum rainfalls recorded in 1 to 24 consecutive hours from more than 4500 stations across the country, spanning the period between 1916 and 2014. A detailed description of the spatial and temporal coverage of the I-RED is presented, together with an exploratory statistical analysis aimed at providing preliminary information on the climatology of extreme rainfall at the national scale. Due to some legal restrictions, the database can be provided only under certain conditions. In light of the potential applications emerging from the analysis, a description of the ongoing and planned future work on the database is provided.

  10. The Harvard organic photovoltaic dataset

    PubMed Central

    Lopez, Steven A.; Pyzer-Knapp, Edward O.; Simm, Gregor N.; Lutzow, Trevor; Li, Kewei; Seress, Laszlo R.; Hachmann, Johannes; Aspuru-Guzik, Alán

    2016-01-01

    The Harvard Organic Photovoltaic Dataset (HOPV15) presented in this work is a collation of experimental photovoltaic data from the literature, and corresponding quantum-chemical calculations performed over a range of conformers, each with quantum chemical results using a variety of density functionals and basis sets. It is anticipated that this dataset will be of use in both relating electronic structure calculations to experimental observations through the generation of calibration schemes, as well as for the creation of new semi-empirical methods and the benchmarking of current and future model chemistries for organic electronic applications. PMID:27676312

  11. Interpolation of diffusion weighted imaging datasets.

    PubMed

    Dyrby, Tim B; Lundell, Henrik; Burke, Mark W; Reislev, Nina L; Paulson, Olaf B; Ptito, Maurice; Siebner, Hartwig R

    2014-12-01

    Diffusion weighted imaging (DWI) is used to study white-matter fibre organisation, orientation and structural connectivity by means of fibre reconstruction algorithms and tractography. In clinical settings, limited scan time compromises the possibility of achieving high image resolution for finer anatomical details and sufficient signal-to-noise ratio for reliable fibre reconstruction. We assessed the potential benefits of interpolating DWI datasets to a higher image resolution before fibre reconstruction using a diffusion tensor model. Simulations of straight and curved crossing tracts smaller than or equal to the voxel size showed that conventional higher-order interpolation methods improved the geometrical representation of white-matter tracts with reduced partial-volume effect (PVE), except at tract boundaries. Simulations and interpolation of ex-vivo monkey brain DWI datasets revealed that conventional interpolation methods fail to disentangle fine anatomical details if PVE is too pronounced in the original data. For validation, we used ex-vivo DWI datasets acquired at various image resolutions as well as Nissl-stained sections. Increasing the image resolution by a factor of eight yielded finer geometrical resolution and more anatomical details in complex regions such as tract boundaries and cortical layers, which are normally only visualized at higher image resolutions. Similar results were found with a typical clinical human DWI dataset. However, a possible bias in quantitative values imposed by the interpolation method used should be considered. The results indicate that conventional interpolation methods can be successfully applied to DWI datasets for mining anatomical details that are normally seen only at higher resolutions, which will aid in tractography and microstructural mapping of tissue compartments. Copyright © 2014. Published by Elsevier Inc.
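    The upsampling step evaluated above can be sketched with conventional cubic B-spline interpolation, e.g. via scipy.ndimage.zoom; the volume here is synthetic, and the study's actual pipeline and interpolation order may differ:

```python
import numpy as np
from scipy.ndimage import zoom

# Synthetic stand-in for a single diffusion-weighted volume (a real DWI
# dataset has one such volume per gradient direction).
rng = np.random.default_rng(0)
dwi = rng.random((16, 16, 8))

# Doubling the resolution along each axis gives 8x more voxels,
# analogous to the factor-of-eight increase described in the abstract.
upsampled = zoom(dwi, zoom=2, order=3)   # order=3: cubic B-spline
```

    Tensor fitting and tractography would then operate on the upsampled volumes, keeping in mind the possible quantitative bias the interpolation introduces.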

  12. Progress towards Continental River Dynamics modeling

    NASA Astrophysics Data System (ADS)

    Yu, Cheng-Wei; Zheng, Xing; Liu, Frank; Maidment, David; Hodges, Ben

    2017-04-01

    The high-resolution National Water Model (NWM), launched by the U.S. National Oceanic and Atmospheric Administration (NOAA) in August 2016, has shown it is possible to provide real-time flow prediction in rivers and streams across the entire continental United States. The next step for continental-scale modeling is moving from reduced physics (e.g. Muskingum-Cunge) to full dynamic modeling with the Saint-Venant equations. The Simulation Program for River Networks (SPRNT) provides a computational approach for the Saint-Venant equations, but obtaining sufficient channel bathymetric data and hydraulic roughness is seen as a critical challenge. However, recent work has shown the Height Above Nearest Drainage (HAND) method can be applied with the National Elevation Dataset (NED) to provide automated estimation of effective channel bathymetry suitable for large-scale hydraulic simulations. The present work examines the use of SPRNT with the National Hydrography Dataset (NHD) and HAND-derived bathymetry for automated generation of rating curves that can be compared to existing data. The approach can, in theory, be applied to every stream reach in the NHD and thus provide flood guidance where none is available. To test this idea we generated 2000+ rating curves in two catchments in Texas and Alabama (USA). Field data from the USGS and flood records from an Austin, Texas flood in May 2015 were used as validation. Large-scale implementation of this idea requires addressing several critical difficulties associated with numerical instabilities, including ill-posed boundary conditions generated in automated model linkages and inconsistencies in the river geometry. A key to future progress is identifying efficient approaches to isolate numerical instability contributors in a large time-space varying solution. This research was supported in part by the National Science Foundation under grant number CCF-1331610.
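    A hedged sketch of how a synthetic rating curve can be generated from channel geometry with Manning's equation; the rectangular cross-section and all parameter values below are illustrative assumptions, not the HAND-derived geometry or the SPRNT solver used in the study:

```python
import math

def manning_discharge(area, wetted_perimeter, slope, n=0.05):
    """Manning's equation (SI units): Q = (1/n) * A * R^(2/3) * S^(1/2)."""
    r = area / wetted_perimeter          # hydraulic radius R = A / P
    return (1.0 / n) * area * r ** (2.0 / 3.0) * math.sqrt(slope)

def rating_curve(stages, width, slope, n=0.05):
    """Stage-discharge pairs for an idealized rectangular reach.
    HAND-derived geometry would supply stage-dependent A and P instead."""
    curve = []
    for h in stages:
        a = width * h                    # flow area
        p = width + 2.0 * h              # wetted perimeter
        curve.append((h, manning_discharge(a, p, slope, n)))
    return curve

stages = [0.5 * i for i in range(1, 11)]              # stage in metres
curve = rating_curve(stages, width=20.0, slope=0.001, n=0.05)
```

    Repeating this per NHD reach, with reach-averaged geometry from HAND, is the kind of automated rating-curve generation the abstract describes.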

  13. CPTAC Releases Largest-Ever Breast Cancer Proteome Dataset from Previously Genome Characterized Tumors | Office of Cancer Clinical Proteomics Research

    Cancer.gov

    National Cancer Institute (NCI) Clinical Proteomic Tumor Analysis Consortium (CPTAC) scientists have released a dataset of proteins and phosphopeptides identified through deep proteomic and phosphoproteomic analysis of breast tumor samples, previously genomically analyzed by The Cancer Genome Atlas (TCGA).

  14. Digital hydrologic networks supporting applications related to spatially referenced regression modeling

    USGS Publications Warehouse

    Brakebill, John W.; Wolock, David M.; Terziotti, Silvia

    2011-01-01

    Digital hydrologic networks depicting surface-water pathways and their associated drainage catchments provide a key component to hydrologic analysis and modeling. Collectively, they form common spatial units that can be used to frame the descriptions of aquatic and watershed processes. In addition, they provide the ability to simulate and route the movement of water and associated constituents throughout the landscape. Digital hydrologic networks have evolved from derivatives of mapping products to detailed, interconnected, spatially referenced networks of water pathways, drainage areas, and stream and watershed characteristics. These properties are important because they enhance the ability to spatially evaluate factors that affect the sources and transport of water-quality constituents at various scales. SPAtially Referenced Regressions On Watershed attributes (SPARROW), a process-based/statistical model, relies on a digital hydrologic network in order to establish relations between quantities of monitored contaminant flux, contaminant sources, and the associated physical characteristics affecting contaminant transport. Digital hydrologic networks modified from the River Reach File (RF1) and National Hydrography Dataset (NHD) geospatial datasets provided frameworks for SPARROW in six regions of the conterminous United States. In addition, characteristics of the modified RF1 were used to update estimates of mean-annual streamflow. This produced more current flow estimates for use in SPARROW modeling.

  15. Basin Assessment Spatial Planning Platform

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    The tool is intended to facilitate hydropower development and water resource planning by improving synthesis and interpretation of disparate spatial datasets that are considered in development actions (e.g., hydrological characteristics, environmentally and culturally sensitive areas, existing or proposed water power resources, climate-informed forecasts). The tool enables this capability by providing a unique framework for assimilating, relating, summarizing, and visualizing disparate spatial data through the use of spatial aggregation techniques, relational geodatabase platforms, and an interactive web-based Geographic Information System (GIS). Data are aggregated and related based on shared intersections with a common spatial unit; in this case, industry-standard hydrologic drainage areas for the U.S. (National Hydrography Dataset) are used as the spatial unit to associate planning data. This process is performed using all available scalar delineations of drainage areas (i.e., region, sub-region, basin, sub-basin, watershed, sub-watershed, catchment) to create spatially hierarchical relationships among planning data and drainages. These entity-relationships are stored in a relational geodatabase that provides back-end structure to the web GIS and its widgets. The full technology stack was built using all open-source software in modern programming languages. Interactive widgets that function within the viewport are also compatible with all modern browsers.

  16. Remote-sensing application for facilitating land resource assessment and monitoring for utility-scale solar energy development

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hamada, Yuki; Grippo, Mark A.

    2015-01-01

    A monitoring plan that incorporates regional datasets and integrates cost-effective data collection methods is necessary to sustain the long-term environmental monitoring of utility-scale solar energy development in expansive, environmentally sensitive desert environments. Using very high spatial resolution (VHSR; 15 cm) multispectral imagery collected in November 2012 and January 2014, an image processing routine was developed to characterize ephemeral streams, vegetation, and land surface in the southwestern United States where increased utility-scale solar development is anticipated. In addition to knowledge about desert landscapes, the methodology integrates existing spectral indices and transformations (e.g., visible atmospherically resistant index and principal components); a newly developed index, the erosion resistance index (ERI); and digital terrain and surface models, all of which were derived from a common VHSR image. The methodology identified fine-scale ephemeral streams with greater detail than the National Hydrography Dataset and accurately estimated vegetation distribution and fractional cover of various surface types. The ERI classified surface types that have a range of erosive potentials. The remote-sensing methodology could ultimately reduce uncertainty and monitoring costs for all stakeholders by providing a cost-effective monitoring approach that accurately characterizes the land resources at potential development sites.

  17. Does using different modern climate datasets impact pollen-based paleoclimate reconstructions in North America during the past 2,000 years

    NASA Astrophysics Data System (ADS)

    Ladd, Matthew; Viau, Andre

    2013-04-01

    Paleoclimate reconstructions rely on the accuracy of modern climate datasets for calibration of fossil records under the assumption of climate normality through time, which means that the modern climate operates in a similar manner as over the past 2,000 years. In this study, we show how the choice of modern climate dataset affects a pollen-based reconstruction of mean temperature of the warmest month (MTWA) during the past 2,000 years for North America. The modern climate datasets used to explore this research question include the Whitmore et al. (2005) modern climate dataset; the North American Regional Reanalysis (NARR); the National Centers for Environmental Prediction (NCEP); the European Centre for Medium-Range Weather Forecasts (ECMWF) ERA-40 reanalysis; WorldClim; the Global Historical Climate Network (GHCN); and New et al., which is derived from the CRU dataset. Results show that some caution is advised in using the reanalysis data for large-scale reconstructions. Station data appear to dampen out the variability of the reconstruction produced using station-based datasets. The reanalysis or model-based datasets are not recommended for large-scale North American paleoclimate reconstructions, as they appear to lack some of the dynamics observed in station datasets (CRU), which resulted in warm-biased reconstructions as compared to the station-based reconstructions. The Whitmore et al. (2005) modern climate dataset appears to be a compromise between CRU-based datasets and model-based datasets, except for the ERA-40. In addition, an ultra-high-resolution gridded climate dataset such as WorldClim may only be useful if the pollen calibration sites in North America have at least the same spatial precision. We reconstruct the MTWA to within ±0.01°C by using an average of all curves derived from the different modern climate datasets, demonstrating the robustness of the procedure used. It may be that the use of an average of different modern datasets may reduce the

  18. EEG datasets for motor imagery brain-computer interface.

    PubMed

    Cho, Hohyun; Ahn, Minkyu; Ahn, Sangtae; Kwon, Moonyoung; Jun, Sung Chan

    2017-07-01

    Most investigators of brain-computer interface (BCI) research believe that BCI can be achieved through induced neuronal activity from the cortex, but not by evoked neuronal activity. Motor imagery (MI)-based BCI is one of the standard concepts of BCI, in that the user can generate induced activity by imagining motor movements. However, variations in performance over sessions and subjects are too severe to overcome easily; therefore, a basic understanding and investigation of BCI performance variation is necessary to find critical evidence of performance variation. Here we present not only EEG datasets for MI BCI from 52 subjects, but also the results of a psychological and physiological questionnaire, EMG datasets, the locations of 3D EEG electrodes, and EEGs for non-task-related states. We validated our EEG datasets by using the percentage of bad trials, event-related desynchronization/synchronization (ERD/ERS) analysis, and classification analysis. After conventional rejection of bad trials, we showed contralateral ERD and ipsilateral ERS in the somatosensory area, which are well-known patterns of MI. Finally, we showed that 73.08% of datasets (38 subjects) included reasonably discriminative information. Our EEG datasets included the information necessary to determine statistical significance; they consisted of well-discriminated datasets (38 subjects) and less-discriminative datasets. These may provide researchers with opportunities to investigate human factors related to MI BCI performance variation, and may also achieve subject-to-subject transfer by using metadata, including a questionnaire, EEG coordinates, and EEGs for non-task-related states. © The Authors 2017. Published by Oxford University Press.
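    The ERD/ERS measure used in the validation is essentially the percent change of band power relative to a rest baseline; a minimal sketch with hypothetical power values:

```python
def erd_percent(trial_power, baseline_power):
    # Negative values indicate desynchronization (ERD),
    # positive values indicate synchronization (ERS).
    return 100.0 * (trial_power - baseline_power) / baseline_power

# Hypothetical mu-band (8-13 Hz) power for one contralateral channel
baseline = 4.0       # power during the rest interval
during_mi = 2.5      # power during motor imagery
erd = erd_percent(during_mi, baseline)   # -37.5, i.e. contralateral ERD
```

    In practice the band power would be estimated per trial and channel from the EEG (e.g. by band-pass filtering and squaring), then averaged before applying this ratio.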

  19. An Environmental Assessment of United States Drinking Water Watersheds

    EPA Science Inventory

    There is an emerging recognition that natural lands and their conservation are important elements of a sustainable drinking water infrastructure. We conducted a national, watershed-level environmental assessment of drinking water watersheds using data on land cover, hydrography a...

  20. A high-resolution European dataset for hydrologic modeling

    NASA Astrophysics Data System (ADS)

    Ntegeka, Victor; Salamon, Peter; Gomes, Goncalo; Sint, Hadewij; Lorini, Valerio; Thielen, Jutta

    2013-04-01

    There is an increasing demand for large-scale hydrological models, not only in the field of modeling the impact of climate change on water resources but also for disaster risk assessments and flood or drought early warning systems. These large-scale models need to be calibrated and verified against large amounts of observations in order to judge their capabilities to predict the future. However, the creation of large-scale datasets is challenging, for it requires collection, harmonization, and quality checking of large amounts of observations. For this reason, only a limited number of such datasets exist. In this work, we present a pan-European, high-resolution gridded dataset of meteorological observations (EFAS-Meteo) which was designed with the aim to drive a large-scale hydrological model. Similar European and global gridded datasets already exist, such as the HadGHCND (Caesar et al., 2006), the JRC MARS-STAT database (van der Goot and Orlandi, 2003) and the E-OBS gridded dataset (Haylock et al., 2008). However, none of those provide similarly high spatial resolution and/or a complete set of variables to force a hydrologic model. EFAS-Meteo contains daily maps of precipitation, surface temperature (mean, minimum and maximum), wind speed and vapour pressure at a spatial grid resolution of 5 x 5 km for the time period 1 January 1990 - 31 December 2011. It furthermore contains radiation, calculated with a staggered approach depending on the availability of sunshine duration, cloud cover and minimum and maximum temperature, as well as evapotranspiration (potential evapotranspiration, bare soil and open water evapotranspiration). The potential evapotranspiration was calculated using the Penman-Monteith equation with the above-mentioned meteorological variables. The dataset was created as part of the development of the European Flood Awareness System (EFAS) and has been continuously updated over recent years.
The dataset variables are used as
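    The potential evapotranspiration component mentioned above follows the Penman-Monteith formulation; below is a minimal daily FAO-56-style sketch, with the caveat that the exact EFAS-Meteo implementation and inputs will differ:

```python
import math

def fao56_reference_et(t_mean, rn, u2, ea, g=0.0, pressure=101.3):
    """Daily FAO-56 Penman-Monteith reference evapotranspiration (mm/day).
    t_mean: air temperature (degC); rn, g: net radiation and soil heat flux
    (MJ m-2 day-1); u2: wind speed at 2 m (m/s); ea: actual vapour pressure (kPa)."""
    es = 0.6108 * math.exp(17.27 * t_mean / (t_mean + 237.3))   # saturation vp, kPa
    delta = 4098.0 * es / (t_mean + 237.3) ** 2                 # slope of vp curve
    gamma = 0.000665 * pressure                                 # psychrometric constant
    num = 0.408 * delta * (rn - g) \
        + gamma * (900.0 / (t_mean + 273.0)) * u2 * (es - ea)
    den = delta + gamma * (1.0 + 0.34 * u2)
    return num / den

# Illustrative mid-latitude summer day (values assumed, not EFAS-Meteo data)
et0 = fao56_reference_et(t_mean=20.0, rn=15.0, u2=2.0, ea=1.4)
```

    The same daily grids of temperature, wind speed, vapour pressure and radiation listed above are exactly the inputs such a formula requires, which is why the dataset can force a hydrological model directly.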

  1. EAARL coastal topography-Assateague Island National Seashore, Maryland and Virginia, 2010

    USGS Publications Warehouse

    Bonisteel-Cormier, J.M.; Nayegandhi, Amar; Wright, C.W.; Brock, J.C.; Nagle, D.B.; Vivekanandan, Saisudha; Klipp, E.S.; Fredericks, Xan; Stevens, Sara

    2011-01-01

    This DVD contains lidar-derived bare-earth (BE) and first-surface (FS) topography GIS datasets of a portion of the Assateague Island National Seashore in Maryland and Virginia. These datasets were acquired on March 19 and 24, 2010.

  2. ASSISTments Dataset from Multiple Randomized Controlled Experiments

    ERIC Educational Resources Information Center

    Selent, Douglas; Patikorn, Thanaporn; Heffernan, Neil

    2016-01-01

    In this paper, we present a dataset consisting of data generated from 22 previously and currently running randomized controlled experiments inside the ASSISTments online learning platform. This dataset provides data mining opportunities for researchers to analyze ASSISTments data in a convenient format across multiple experiments at the same time.…

  3. Extension of research data repository system to support direct compute access to biomedical datasets: enhancing Dataverse to support large datasets.

    PubMed

    McKinney, Bill; Meyer, Peter A; Crosas, Mercè; Sliz, Piotr

    2017-01-01

    Access to experimental X-ray diffraction image data is important for validation and reproduction of macromolecular models and indispensable for the development of structural biology processing methods. In response to the evolving needs of the structural biology community, we recently established a diffraction data publication system, the Structural Biology Data Grid (SBDG, data.sbgrid.org), to preserve primary experimental datasets supporting scientific publications. All datasets published through the SBDG are freely available to the research community under a public domain dedication license, with metadata compliant with the DataCite Schema (schema.datacite.org). A proof-of-concept study demonstrated community interest and utility. Publication of large datasets is a challenge shared by several fields, and the SBDG has begun collaborating with the Institute for Quantitative Social Science at Harvard University to extend the Dataverse (dataverse.org) open-source data repository system to structural biology datasets. Several extensions are necessary to support the size and metadata requirements for structural biology datasets. In this paper, we describe one such extension-functionality supporting preservation of file system structure within Dataverse-which is essential for both in-place computation and supporting non-HTTP data transfers. © 2016 New York Academy of Sciences.

  4. Extension of research data repository system to support direct compute access to biomedical datasets: enhancing Dataverse to support large datasets

    PubMed Central

    McKinney, Bill; Meyer, Peter A.; Crosas, Mercè; Sliz, Piotr

    2016-01-01

    Access to experimental X-ray diffraction image data is important for validation and reproduction of macromolecular models and indispensable for the development of structural biology processing methods. In response to the evolving needs of the structural biology community, we recently established a diffraction data publication system, the Structural Biology Data Grid (SBDG, data.sbgrid.org), to preserve primary experimental datasets supporting scientific publications. All datasets published through the SBDG are freely available to the research community under a public domain dedication license, with metadata compliant with the DataCite Schema (schema.datacite.org). A proof-of-concept study demonstrated community interest and utility. Publication of large datasets is a challenge shared by several fields, and the SBDG has begun collaborating with the Institute for Quantitative Social Science at Harvard University to extend the Dataverse (dataverse.org) open-source data repository system to structural biology datasets. Several extensions are necessary to support the size and metadata requirements for structural biology datasets. In this paper, we describe one such extension—functionality supporting preservation of filesystem structure within Dataverse—which is essential for both in-place computation and supporting non-http data transfers. PMID:27862010

  5. Estimating parameters for probabilistic linkage of privacy-preserved datasets.

    PubMed

    Brown, Adrian P; Randall, Sean M; Ferrante, Anna M; Semmens, James B; Boyd, James H

    2017-07-10

    Probabilistic record linkage is a process used to bring together person-based records from within the same dataset (de-duplication) or from disparate datasets using pairwise comparisons and matching probabilities. The linkage strategy and associated match probabilities are often estimated through investigations into data quality and manual inspection. However, as privacy-preserved datasets comprise encrypted data, such methods are not possible. In this paper, we present a method for estimating the probabilities and threshold values for probabilistic privacy-preserved record linkage using Bloom filters. Our method was tested through a simulation study using synthetic data, followed by an application using real-world administrative data. Synthetic datasets were generated with error rates from zero to 20%. Our method was used to estimate parameters (probabilities and thresholds) for de-duplication linkages. Linkage quality was determined by F-measure. Each dataset was privacy-preserved using separate Bloom filters for each field. Match probabilities were estimated using the expectation-maximisation (EM) algorithm on the privacy-preserved data. Threshold cut-off values were determined by an extension to the EM algorithm allowing linkage quality to be estimated for each possible threshold. De-duplication linkages of each privacy-preserved dataset were performed using both estimated and calculated probabilities. Linkage quality using the F-measure at the estimated threshold values was also compared to the highest F-measure. Three large administrative datasets were used to demonstrate the applicability of the probability and threshold estimation technique on real-world data. Linkage of the synthetic datasets using the estimated probabilities produced an F-measure that was comparable to the F-measure using calculated probabilities, even with up to 20% error. Linkage of the administrative datasets using estimated probabilities produced an F-measure that was higher
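    A minimal sketch of the Bloom-filter encoding and pairwise comparison that underlies this kind of privacy-preserved linkage; the filter length m, hash count k, and the Dice-coefficient comparison are standard choices in this literature, not necessarily the paper's exact parameters:

```python
import hashlib

def bloom_encode(value, m=1000, k=20):
    """Encode a string field as a Bloom filter (set of bit positions) over its
    padded character bigrams, using double hashing for k hash functions."""
    padded = f"_{value.lower()}_"
    bigrams = {padded[i:i + 2] for i in range(len(padded) - 1)}
    bits = set()
    for gram in bigrams:
        h1 = int(hashlib.md5(gram.encode()).hexdigest(), 16)
        h2 = int(hashlib.sha1(gram.encode()).hexdigest(), 16)
        for i in range(k):
            bits.add((h1 + i * h2) % m)   # k-th hash via double hashing
    return bits

def dice_similarity(a, b):
    """Dice coefficient of two Bloom filters: the pairwise comparison score."""
    return 2.0 * len(a & b) / (len(a) + len(b))

exact = dice_similarity(bloom_encode("smith"), bloom_encode("smith"))
close = dice_similarity(bloom_encode("smith"), bloom_encode("smyth"))
far = dice_similarity(bloom_encode("smith"), bloom_encode("jones"))
```

    In a full linkage, similarities such as these feed the EM-estimated match probabilities and threshold selection described in the abstract.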

  6. Viking Seismometer PDS Archive Dataset

    NASA Astrophysics Data System (ADS)

    Lorenz, R. D.

    2016-12-01

    The Viking Lander 2 seismometer operated successfully for over 500 Sols on the Martian surface, recording at least one likely candidate Marsquake. The Viking mission, flown in an era when data handling hardware (both on board and on the ground) was limited in capability, predated modern planetary data archiving; the ad-hoc repositories of the data and the very low-level record at NSSDC were neither convenient to process nor well known. In an effort supported by the NASA Mars Data Analysis Program, we have converted the bulk of the Viking dataset (namely the 49,000 and 270,000 records made in High- and Event-modes at 20 and 1 Hz respectively) into a simple ASCII table format. Additionally, since wind-generated lander motion is a major component of the signal, contemporaneous meteorological data are included in summary records to facilitate correlation. These datasets are being archived at the PDS Geosciences Node. In addition to brief instrument and dataset descriptions, the archive includes code snippets in the freely available language 'R' to demonstrate plotting and analysis. Further, we present examples of lander-generated noise associated with the sampler arm, instrument dumps and other mechanical operations.

  7. A Benchmark Dataset for SSVEP-Based Brain-Computer Interfaces.

    PubMed

    Wang, Yijun; Chen, Xiaogang; Gao, Xiaorong; Gao, Shangkai

    2017-10-01

    This paper presents a benchmark steady-state visual evoked potential (SSVEP) dataset acquired with a 40-target brain-computer interface (BCI) speller. The dataset consists of 64-channel electroencephalogram (EEG) data from 35 healthy subjects (8 experienced and 27 naïve) while they performed a cue-guided target selecting task. The virtual keyboard of the speller was composed of 40 visual flickers, which were coded using a joint frequency and phase modulation (JFPM) approach. The stimulation frequencies ranged from 8 Hz to 15.8 Hz with an interval of 0.2 Hz. The phase difference between two adjacent frequencies was . For each subject, the data included six blocks of 40 trials corresponding to all 40 flickers indicated by a visual cue in a random order. The stimulation duration in each trial was five seconds. The dataset can be used as a benchmark dataset to compare the methods for stimulus coding and target identification in SSVEP-based BCIs. Through offline simulation, the dataset can be used to design new system diagrams and evaluate their BCI performance without collecting any new data. The dataset also provides high-quality data for computational modeling of SSVEPs. The dataset is freely available from http://bci.med.tsinghua.edu.cn/download.html.

  8. Dataset-Driven Research to Support Learning and Knowledge Analytics

    ERIC Educational Resources Information Center

    Verbert, Katrien; Manouselis, Nikos; Drachsler, Hendrik; Duval, Erik

    2012-01-01

    In various research areas, the availability of open datasets is considered as key for research and application purposes. These datasets are used as benchmarks to develop new algorithms and to compare them to other algorithms in given settings. Finding such available datasets for experimentation can be a challenging task in technology enhanced…

  9. Development and validation of a national data registry for midwife-led births: the Midwives Alliance of North America Statistics Project 2.0 dataset.

    PubMed

    Cheyney, Melissa; Bovbjerg, Marit; Everson, Courtney; Gordon, Wendy; Hannibal, Darcy; Vedam, Saraswathi

    2014-01-01

    In 2004, the Midwives Alliance of North America's (MANA's) Division of Research developed a Web-based data collection system to gather information on the practices and outcomes associated with midwife-led births in the United States. This system, called the MANA Statistics Project (MANA Stats), grew out of a widely acknowledged need for more reliable data on outcomes by intended place of birth. This article describes the history and development of the MANA Stats birth registry and provides an analysis of the 2.0 dataset's content, strengths, and limitations. Data collection and review procedures for the MANA Stats 2.0 dataset are described, along with methods for the assessment of data accuracy. We calculated descriptive statistics for client demographics and contributing midwife credentials, and assessed the quality of data by calculating point estimates, 95% confidence intervals, and kappa statistics for key outcomes on pre- and postreview samples of records. The MANA Stats 2.0 dataset (2004-2009) contains 24,848 courses of care, 20,893 of which are for women who planned a home or birth center birth at the onset of labor. The majority of these records were planned home births (81%). Births were attended primarily by certified professional midwives (73%), and clients were largely white (92%), married (87%), and college-educated (49%). Data quality analyses of 9932 records revealed no differences between pre- and postreviewed samples for 7 key benchmarking variables (kappa, 0.98-1.00). The MANA Stats 2.0 data were accurately entered by participants; any errors in this dataset are likely random and not systematic. The primary limitation of the 2.0 dataset is that the sample was captured through voluntary participation; thus, it may not accurately reflect population-based outcomes. The dataset's primary strength is that it will allow for the examination of research questions on normal physiologic birth and midwife-led birth outcomes by intended place of birth.
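    The kappa statistics used to compare the pre- and post-review samples can be computed as follows; the ratings shown are hypothetical, not MANA Stats records:

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two categorical codings of the same records:
    (observed agreement - chance agreement) / (1 - chance agreement)."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    po = sum(x == y for x, y in zip(rater_a, rater_b)) / n        # observed
    pe = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )                                                             # expected by chance
    return (po - pe) / (1.0 - pe)

# Pre- vs post-review coding of the same six (hypothetical) records
pre  = ["home", "home", "center", "home", "center", "home"]
post = ["home", "home", "center", "home", "home",   "home"]
kappa = cohens_kappa(pre, post)
```

    Kappa values near 1.0, like those reported for the seven benchmarking variables (0.98-1.00), indicate near-perfect agreement beyond chance between the pre- and post-review samples.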

  10. Method of generating features optimal to a dataset and classifier

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bruillard, Paul J.; Gosink, Luke J.; Jarman, Kenneth D.

    A method of generating features optimal to a particular dataset and classifier is disclosed. A dataset of messages is inputted and a classifier is selected. An algebra of features is encoded. Computable features that are capable of describing the dataset from the algebra of features are selected. Irredundant features that are optimal for the classifier and the dataset are selected.

  11. Querying Patterns in High-Dimensional Heterogenous Datasets

    ERIC Educational Resources Information Center

    Singh, Vishwakarma

    2012-01-01

Recent technological advancements have led to the availability of a plethora of heterogeneous datasets, e.g., images tagged with geo-location and descriptive keywords. An object in these datasets is described by a set of high-dimensional feature vectors. For example, a keyword-tagged image is represented by a color-histogram and a…

  12. Development of an internationally agreed minimal dataset for juvenile dermatomyositis (JDM) for clinical and research use.

    PubMed

    McCann, Liza J; Kirkham, Jamie J; Wedderburn, Lucy R; Pilkington, Clarissa; Huber, Adam M; Ravelli, Angelo; Appelbe, Duncan; Williamson, Paula R; Beresford, Michael W

    2015-06-12

Juvenile dermatomyositis (JDM) is a rare autoimmune inflammatory disorder associated with significant morbidity and mortality. International collaboration is necessary to better understand the pathogenesis of the disease, response to treatment and long-term outcome. To aid international collaboration, it is essential to have a core set of data that all researchers and clinicians collect in a standardised way for clinical purposes and for research. This should include demographic details, diagnostic data and measures of disease activity, investigations and treatment. Variables in existing clinical registries have been compared to produce a provisional data set for JDM. We now aim to develop this into a consensus-approved minimum core dataset, tested in a wider setting, with the objective of achieving international agreement. A two-stage bespoke Delphi process will engage the opinion of a large number of key stakeholders through email distribution via established international paediatric rheumatology and myositis organisations. This, together with a formalised patient/parent participation process, will help inform a consensus meeting of international experts that will utilise a nominal group technique (NGT). The resulting proposed minimal dataset will be tested for feasibility within existing database infrastructures. The developed minimal dataset will be sent to all internationally representative collaborators for final comment. The participants of the expert consensus group will be asked to draw together these comments, ratify and 'sign off' the final minimal dataset. An internationally agreed minimal dataset has the potential to significantly enhance collaboration, allow effective communication between groups, provide a minimal standard of care and enable analysis of the largest possible number of JDM patients to provide a greater understanding of this disease. The final approved minimum core dataset could be rapidly incorporated into national and international

  13. Suicide mortality and marital status for specific ages, genders, and education levels in South Korea: Using a virtually individualized dataset from national aggregate data.

    PubMed

    Park, Soo Kyung; Lee, Chung Kwon; Kim, Haeryun

    2018-09-01

Previous studies in Eastern as well as Western countries have shown a relationship between marital status and suicide mortality. However, to date, no Korean study has calculated national suicide rates by marital status for specific genders, ages, and education levels. This study investigated whether the relationship between marital status and suicide differs by age, gender, and educational attainment, and analyzed the effect of marital status on suicide risk after controlling for these socio-demographic variables. Using national mortality data from 2015, and aggregated census data from 2010 in South Korea, we created a virtually individualized dataset with multiple weighting algorithms, including individual socio-demographic characteristics and suicide rates across the entire population. The findings show that the following groups faced the highest relative suicide risks: 1) divorced men of all ages and men aged more than 75 years, particularly divorced men aged more than 75; and 2) never-married men aged 55-64 years, and never-married women of lower education status. We did not account for important variables such as mental health, substance abuse, employment insecurity, social integration, perceived loneliness, and family income, which we were unable to access. This current research extends prior theoretical and methodological work on suicide, aiding efforts to reduce suicide mortality in South Korea. Copyright © 2018 Elsevier B.V. All rights reserved.

  14. Attributes for NHDPlus Catchments (Version 1.1): Level 3 Nutrient Ecoregions, 2002

    USGS Publications Warehouse

    Wieczorek, Michael; LaMotte, Andrew E.

    2010-01-01

This data set represents the area of each level 3 nutrient ecoregion in square meters, compiled for every catchment of NHDPlus for the conterminous United States. The source data are from the 2002 version of the U.S. Environmental Protection Agency's (USEPA) Aggregations of Level III Ecoregions for National Nutrient Assessment & Management Strategy (USEPA, 2002). The NHDPlus Version 1.1 is an integrated suite of application-ready geospatial datasets that incorporates many of the best features of the National Hydrography Dataset (NHD) and the National Elevation Dataset (NED). The NHDPlus includes a stream network (based on the 1:100,000-scale NHD), improved networking, naming, and value-added attributes (VAAs). NHDPlus also includes elevation-derived catchments (drainage areas) produced using a drainage enforcement technique first widely used in New England, and thus referred to as "the New England Method." This technique involves "burning in" the 1:100,000-scale NHD and, when available, building "walls" using the National Watershed Boundary Dataset (WBD). The resulting modified digital elevation model (HydroDEM) is used to produce hydrologic derivatives that agree with the NHD and WBD. Over the past two years, an interdisciplinary team from the U.S. Geological Survey (USGS), the U.S. Environmental Protection Agency (USEPA), and contractors found that this method produces the best-quality NHD catchments using an automated process (USEPA, 2007). The NHDPlus dataset is organized by 18 Production Units that cover the conterminous United States. The NHDPlus version 1.1 data are grouped by the U.S. Geological Survey's Major River Basins (MRBs; Crawford and others, 2006). MRB1, covering the New England and Mid-Atlantic River basins, contains NHDPlus Production Units 1 and 2. MRB2, covering the South Atlantic-Gulf and Tennessee River basins, contains NHDPlus Production Units 3 and 6. 
MRB3, covering the Great Lakes, Ohio, Upper Mississippi, and Souris-Red-Rainy River basins

  15. A hybrid organic-inorganic perovskite dataset

    NASA Astrophysics Data System (ADS)

    Kim, Chiho; Huan, Tran Doan; Krishnan, Sridevi; Ramprasad, Rampi

    2017-05-01

    Hybrid organic-inorganic perovskites (HOIPs) have been attracting a great deal of attention due to their versatility of electronic properties and fabrication methods. We prepare a dataset of 1,346 HOIPs, which features 16 organic cations, 3 group-IV cations and 4 halide anions. Using a combination of an atomic structure search method and density functional theory calculations, the optimized structures, the bandgap, the dielectric constant, and the relative energies of the HOIPs are uniformly prepared and validated by comparing with relevant experimental and/or theoretical data. We make the dataset available at Dryad Digital Repository, NoMaD Repository, and Khazana Repository (http://khazana.uconn.edu/), hoping that it could be useful for future data-mining efforts that can explore possible structure-property relationships and phenomenological models. Progressive extension of the dataset is expected as new organic cations become appropriate within the HOIP framework, and as additional properties are calculated for the new compounds found.

  16. High-resolution precipitation mapping in a mountainous watershed: ground truth for evaluating uncertainty in a national precipitation dataset

    Treesearch

    Christopher Daly; Melissa E. Slater; Joshua A. Roberti; Stephanie H. Laseter; Lloyd W. Swift

    2017-01-01

    A 69-station, densely spaced rain gauge network was maintained over the period 1951–1958 in the Coweeta Hydrologic Laboratory, located in the southern Appalachians in western North Carolina, USA. This unique dataset was used to develop the first digital seasonal and annual precipitation maps for the Coweeta basin, using elevation regression functions and...

  17. Comparison of trends and abrupt changes of the South Asia high from 1979 to 2014 in reanalysis and radiosonde datasets

    NASA Astrophysics Data System (ADS)

    Shi, Chunhua; Huang, Ying; Guo, Dong; Zhou, Shunwu; Hu, Kaixi; Liu, Yu

    2018-05-01

The South Asian High (SAH) has an important influence on atmospheric circulation and the Asian climate in summer. However, current comparative analyses of the SAH are mostly between reanalysis datasets and there is a lack of sounding data. We therefore compared the climatology, trends and abrupt changes in the SAH in the Japanese 55-year Reanalysis (JRA-55) dataset, the National Centers for Environmental Prediction Climate Forecast System Reanalysis (NCEP-CFSR) dataset, the European Centre for Medium-Range Weather Forecasts Interim Reanalysis (ERA-Interim) dataset and radiosonde data from China using linear analysis and a sliding t-test. The trends in geopotential height in the control area of the SAH were positive in the JRA-55, NCEP-CFSR and ERA-Interim datasets, but negative in the radiosonde data in the time period 1979-2014. The negative trends for the SAH were significant at the 90% confidence level in the radiosonde data from May to September. The positive trends in the NCEP-CFSR dataset were significant at the 90% confidence level in May, July, August and September, but the positive trends in the JRA-55 and ERA-Interim were only significant at the 90% confidence level in September. The reasons for the differences in the trends of the SAH between the radiosonde data and the three reanalysis datasets in the time period 1979-2014 were updates to the sounding systems, changes in instrumentation, and improvements in the radiation correction method for calculations around the year 2000. We therefore analyzed the trends in the two time periods of 1979-2000 and 2001-2014 separately. From 1979 to 2000, the negative SAH trends in the radiosonde data mainly agreed with the negative trends in the NCEP-CFSR dataset, but were in contrast with the positive trends in the JRA-55 and ERA-Interim datasets. In 2001-2014, however, the trends in the SAH were positive in all four datasets and most of the trends in the radiosonde and NCEP-CFSR datasets were significant. It is

  18. 78 FR 9403 - National Institute on Aging; Notice of Closed Meetings

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-02-08

    ... DEPARTMENT OF HEALTH AND HUMAN SERVICES National Institutes of Health National Institute on Aging... personal privacy. Name of Committee: National Institute on Aging Special Emphasis Panel; CALERIE Dataset.... Place: National Institute on Aging, Gateway Building, Room 2C212, 7201 Wisconsin Avenue, Bethesda, MD...

  19. 78 FR 37232 - National Institute on Aging; Notice of Closed Meetings

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-06-20

    ....nih.gov . Name of Committee: National Institute on Aging Special Emphasis Panel; Treatment of Obesity... DEPARTMENT OF HEALTH AND HUMAN SERVICES National Institutes of Health National Institute on Aging... personal privacy. Name of Committee: National Institute on Aging Special Emphasis Panel; NIA DBSR DATASETS...

  20. Attributes for NHDPlus catchments (version 1.1) for the conterminous United States: 30-year average annual maximum temperature, 1971-2000

    USGS Publications Warehouse

    Wieczorek, Michael; LaMotte, Andrew E.

    2010-01-01

This data set represents the 30-year (1971-2000) average annual maximum temperature in Celsius multiplied by 100, compiled for every catchment of NHDPlus for the conterminous United States. The source data were the "United States Average Monthly or Annual Maximum Temperature, 1971 - 2000" raster dataset produced by the PRISM Group at Oregon State University. The NHDPlus Version 1.1 is an integrated suite of application-ready geospatial datasets that incorporates many of the best features of the National Hydrography Dataset (NHD) and the National Elevation Dataset (NED). The NHDPlus includes a stream network (based on the 1:100,000-scale NHD), improved networking, naming, and value-added attributes (VAAs). NHDPlus also includes elevation-derived catchments (drainage areas) produced using a drainage enforcement technique first widely used in New England, and thus referred to as "the New England Method." This technique involves "burning in" the 1:100,000-scale NHD and, when available, building "walls" using the National Watershed Boundary Dataset (WBD). The resulting modified digital elevation model (HydroDEM) is used to produce hydrologic derivatives that agree with the NHD and WBD. Over the past two years, an interdisciplinary team from the U.S. Geological Survey (USGS), the U.S. Environmental Protection Agency (USEPA), and contractors found that this method produces the best-quality NHD catchments using an automated process (USEPA, 2007). The NHDPlus dataset is organized by 18 Production Units that cover the conterminous United States. The NHDPlus version 1.1 data are grouped by the U.S. Geological Survey's Major River Basins (MRBs; Crawford and others, 2006). MRB1, covering the New England and Mid-Atlantic River basins, contains NHDPlus Production Units 1 and 2. MRB2, covering the South Atlantic-Gulf and Tennessee River basins, contains NHDPlus Production Units 3 and 6. MRB3, covering the Great Lakes, Ohio, Upper Mississippi, and Souris-Red-Rainy River basins

  1. Attributes for NHDPlus Catchments (Version 1.1) for the Conterminous United States: 30-Year Average Annual Precipitation, 1971-2000

    USGS Publications Warehouse

    Wieczorek, Michael; LaMotte, Andrew E.

    2010-01-01

This data set represents the 30-year (1971-2000) average annual precipitation in millimeters multiplied by 100, compiled for every catchment of NHDPlus for the conterminous United States. The source data were the "United States Average Monthly or Annual Precipitation, 1971 - 2000" raster dataset produced by the PRISM Group at Oregon State University. The NHDPlus Version 1.1 is an integrated suite of application-ready geospatial datasets that incorporates many of the best features of the National Hydrography Dataset (NHD) and the National Elevation Dataset (NED). The NHDPlus includes a stream network (based on the 1:100,000-scale NHD), improved networking, naming, and value-added attributes (VAAs). NHDPlus also includes elevation-derived catchments (drainage areas) produced using a drainage enforcement technique first widely used in New England, and thus referred to as "the New England Method." This technique involves "burning in" the 1:100,000-scale NHD and, when available, building "walls" using the National Watershed Boundary Dataset (WBD). The resulting modified digital elevation model (HydroDEM) is used to produce hydrologic derivatives that agree with the NHD and WBD. Over the past two years, an interdisciplinary team from the U.S. Geological Survey (USGS), the U.S. Environmental Protection Agency (USEPA), and contractors found that this method produces the best-quality NHD catchments using an automated process (USEPA, 2007). The NHDPlus dataset is organized by 18 Production Units that cover the conterminous United States. The NHDPlus version 1.1 data are grouped by the U.S. Geological Survey's Major River Basins (MRBs; Crawford and others, 2006). MRB1, covering the New England and Mid-Atlantic River basins, contains NHDPlus Production Units 1 and 2. MRB2, covering the South Atlantic-Gulf and Tennessee River basins, contains NHDPlus Production Units 3 and 6. MRB3, covering the Great Lakes, Ohio, Upper Mississippi, and Souris-Red-Rainy River basins, contains

  2. Attributes for NHDPlus Catchments (Version 1.1) for the Conterminous United States: 30-Year Average Annual Minimum Temperature, 1971-2000

    USGS Publications Warehouse

    Wieczorek, Michael; LaMotte, Andrew E.

    2010-01-01

This data set represents the 30-year (1971-2000) average annual minimum temperature in Celsius multiplied by 100, compiled for every catchment of NHDPlus for the conterminous United States. The source data were the "United States Average Monthly or Annual Minimum Temperature, 1971 - 2000" raster dataset produced by the PRISM Group at Oregon State University. The NHDPlus Version 1.1 is an integrated suite of application-ready geospatial datasets that incorporates many of the best features of the National Hydrography Dataset (NHD) and the National Elevation Dataset (NED). The NHDPlus includes a stream network (based on the 1:100,000-scale NHD), improved networking, naming, and value-added attributes (VAAs). NHDPlus also includes elevation-derived catchments (drainage areas) produced using a drainage enforcement technique first widely used in New England, and thus referred to as "the New England Method." This technique involves "burning in" the 1:100,000-scale NHD and, when available, building "walls" using the National Watershed Boundary Dataset (WBD). The resulting modified digital elevation model (HydroDEM) is used to produce hydrologic derivatives that agree with the NHD and WBD. Over the past two years, an interdisciplinary team from the U.S. Geological Survey (USGS), the U.S. Environmental Protection Agency (USEPA), and contractors found that this method produces the best-quality NHD catchments using an automated process (USEPA, 2007). The NHDPlus dataset is organized by 18 Production Units that cover the conterminous United States. The NHDPlus version 1.1 data are grouped by the U.S. Geological Survey's Major River Basins (MRBs; Crawford and others, 2006). MRB1, covering the New England and Mid-Atlantic River basins, contains NHDPlus Production Units 1 and 2. MRB2, covering the South Atlantic-Gulf and Tennessee River basins, contains NHDPlus Production Units 3 and 6. MRB3, covering the Great Lakes, Ohio, Upper Mississippi, and Souris-Red-Rainy River basins

  3. Omicseq: a web-based search engine for exploring omics datasets

    PubMed Central

    Sun, Xiaobo; Pittard, William S.; Xu, Tianlei; Chen, Li; Zwick, Michael E.; Jiang, Xiaoqian; Wang, Fusheng

    2017-01-01

The development and application of high-throughput genomics technologies has resulted in massive quantities of diverse omics data that continue to accumulate rapidly. These rich datasets offer unprecedented and exciting opportunities to address long-standing questions in biomedical research. However, our ability to explore and query the content of diverse omics data is very limited. Existing dataset search tools rely almost exclusively on the metadata. A text-based query for gene name(s) does not work well on datasets wherein the vast majority of their content is numeric. To overcome this barrier, we have developed Omicseq, a novel web-based platform that facilitates the easy interrogation of omics datasets holistically to improve ‘findability’ of relevant data. The core component of Omicseq is trackRank, a novel algorithm for ranking omics datasets that fully uses the numerical content of the dataset to determine relevance to the query entity. The Omicseq system is supported by a scalable and elastic NoSQL database that hosts a large collection of processed omics datasets. In the front end, a simple, web-based interface allows users to enter queries and instantly receive search results as a list of ranked datasets deemed to be the most relevant. Omicseq is freely available at http://www.omicseq.org. PMID:28402462

  4. Usefulness of DARPA dataset for intrusion detection system evaluation

    NASA Astrophysics Data System (ADS)

    Thomas, Ciza; Sharma, Vishwas; Balakrishnan, N.

    2008-03-01

The MIT Lincoln Laboratory IDS evaluation methodology is a practical solution for evaluating the performance of intrusion detection systems, and it has contributed tremendously to research progress in that field. The DARPA IDS evaluation dataset has been criticized and is considered by many to be a very outdated dataset, unable to accommodate the latest trends in attacks. The question then naturally arises as to whether detection systems have improved beyond detecting these older attacks. If not, is it worth considering this dataset obsolete? The paper presented here provides supporting facts for the use of the DARPA IDS evaluation dataset. Two commonly used signature-based IDSs, Snort and Cisco IDS, and two anomaly detectors, PHAD and ALAD, are used for this evaluation, and the results support the usefulness of the DARPA dataset for IDS evaluation.

  5. The SPoRT-WRF: Evaluating the Impact of NASA Datasets on Convective Forecasts

    NASA Technical Reports Server (NTRS)

    Zavodsky, Bradley; Case, Jonathan; Kozlowski, Danielle; Molthan, Andrew

    2012-01-01

The Short-term Prediction Research and Transition Center (SPoRT) is a collaborative partnership between NASA and operational forecasting entities, including a number of National Weather Service offices. SPoRT transitions real-time NASA products and capabilities to its partners to address specific operational forecast challenges. One challenge that forecasters face is applying convection-allowing numerical models to predict mesoscale convective weather. In order to address this specific forecast challenge, SPoRT produces real-time mesoscale model forecasts using the Weather Research and Forecasting (WRF) model that includes unique NASA products and capabilities. Currently, the SPoRT configuration of the WRF model (SPoRT-WRF) incorporates the 4-km Land Information System (LIS) land surface data, 1-km SPoRT sea surface temperature analysis and 1-km Moderate resolution Imaging Spectroradiometer (MODIS) greenness vegetation fraction (GVF) analysis, and retrieved thermodynamic profiles from the Atmospheric Infrared Sounder (AIRS). The LIS, SST, and GVF data are all integrated into the SPoRT-WRF through adjustments to the initial and boundary conditions, and the AIRS data are assimilated into a 9-hour SPoRT-WRF forecast each day at 0900 UTC. This study dissects the overall impact of the NASA datasets and the individual surface and atmospheric component datasets on daily mesoscale forecasts. A case study covering the super tornado outbreak across the Central and Southeastern United States during 25-27 April 2011 is examined. Three different forecasts are analyzed, including the SPoRT-WRF (NASA surface and atmospheric data), the SPoRT-WRF without AIRS (NASA surface data only), and the operational National Severe Storms Laboratory (NSSL) WRF (control with no NASA data). The forecasts are compared qualitatively by examining simulated versus observed radar reflectivity. Differences between the simulated reflectivity are further investigated using convective parameters along

  6. Geocoding and stereo display of tropical forest multisensor datasets

    NASA Technical Reports Server (NTRS)

    Welch, R.; Jordan, T. R.; Luvall, J. C.

    1990-01-01

Concern about the future of tropical forests has led to a demand for geocoded multisensor databases that can be used to assess forest structure, deforestation, thermal response, evapotranspiration, and other parameters linked to climate change. In response to studies being conducted at the Braulio Carrillo National Park, Costa Rica, digital satellite and aircraft images recorded by Landsat TM, SPOT HRV, Thermal Infrared Multispectral Scanner, and Calibrated Airborne Multispectral Scanner sensors were placed in register using the Landsat TM image as the reference map. Despite problems caused by relief, multitemporal datasets, and geometric distortions in the aircraft images, registration was accomplished to within + or - 20 m (+ or - 1 data pixel). A digital elevation model constructed from a multisensor Landsat TM/SPOT stereopair proved useful for generating perspective views of the rugged, forested terrain.

  7. Large-scale Labeled Datasets to Fuel Earth Science Deep Learning Applications

    NASA Astrophysics Data System (ADS)

    Maskey, M.; Ramachandran, R.; Miller, J.

    2017-12-01

    Deep learning has revolutionized computer vision and natural language processing with various algorithms scaled using high-performance computing. However, generic large-scale labeled datasets such as the ImageNet are the fuel that drives the impressive accuracy of deep learning results. Large-scale labeled datasets already exist in domains such as medical science, but creating them in the Earth science domain is a challenge. While there are ways to apply deep learning using limited labeled datasets, there is a need in the Earth sciences for creating large-scale labeled datasets for benchmarking and scaling deep learning applications. At the NASA Marshall Space Flight Center, we are using deep learning for a variety of Earth science applications where we have encountered the need for large-scale labeled datasets. We will discuss our approaches for creating such datasets and why these datasets are just as valuable as deep learning algorithms. We will also describe successful usage of these large-scale labeled datasets with our deep learning based applications.

  8. Challenges in Extracting Information From Large Hydrogeophysical-monitoring Datasets

    NASA Astrophysics Data System (ADS)

    Day-Lewis, F. D.; Slater, L. D.; Johnson, T.

    2012-12-01

    Over the last decade, new automated geophysical data-acquisition systems have enabled collection of increasingly large and information-rich geophysical datasets. Concurrent advances in field instrumentation, web services, and high-performance computing have made real-time processing, inversion, and visualization of large three-dimensional tomographic datasets practical. Geophysical-monitoring datasets have provided high-resolution insights into diverse hydrologic processes including groundwater/surface-water exchange, infiltration, solute transport, and bioremediation. Despite the high information content of such datasets, extraction of quantitative or diagnostic hydrologic information is challenging. Visual inspection and interpretation for specific hydrologic processes is difficult for datasets that are large, complex, and (or) affected by forcings (e.g., seasonal variations) unrelated to the target hydrologic process. New strategies are needed to identify salient features in spatially distributed time-series data and to relate temporal changes in geophysical properties to hydrologic processes of interest while effectively filtering unrelated changes. Here, we review recent work using time-series and digital-signal-processing approaches in hydrogeophysics. Examples include applications of cross-correlation, spectral, and time-frequency (e.g., wavelet and Stockwell transforms) approaches to (1) identify salient features in large geophysical time series; (2) examine correlation or coherence between geophysical and hydrologic signals, even in the presence of non-stationarity; and (3) condense large datasets while preserving information of interest. Examples demonstrate analysis of large time-lapse electrical tomography and fiber-optic temperature datasets to extract information about groundwater/surface-water exchange and contaminant transport.
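    As a toy sketch of the cross-correlation approach this record mentions (synthetic signals and a plain-Python implementation, not the authors' code), estimating the lag between a hydrologic forcing and a delayed geophysical response can look like:

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    sa = math.sqrt(sum((u - ma) ** 2 for u in a))
    sb = math.sqrt(sum((v - mb) ** 2 for v in b))
    return cov / (sa * sb) if sa and sb else 0.0

def best_lag(x, y, max_lag):
    """Lag (in samples) at which y best correlates with x; a positive lag
    means y trails x, as a tracer arrival trails its forcing."""
    def corr_at(lag):
        if lag >= 0:
            return pearson(x[:len(x) - lag], y[lag:])
        return pearson(x[-lag:], y[:len(y) + lag])
    return max(range(-max_lag, max_lag + 1), key=corr_at)

# Synthetic example: a geophysical time series that repeats the
# hydrologic forcing with a 3-sample delay.
x = [math.sin(0.3 * t) for t in range(100)]
y = [x[t - 3] if t >= 3 else 0.0 for t in range(100)]
print(best_lag(x, y, max_lag=10))  # → 3
```

    Real monitoring datasets would of course need the detrending and filtering of unrelated (e.g., seasonal) forcings that the record emphasizes before such a lag estimate is meaningful.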

  9. Using Multiple Big Datasets and Machine Learning to Produce a New Global Particulate Dataset: A Technology Challenge Case Study

    NASA Astrophysics Data System (ADS)

    Lary, D. J.

    2013-12-01

A BigData case study is described where multiple datasets from several satellites, high-resolution global meteorological data, social media and in-situ observations are combined using machine learning on a distributed cluster using an automated workflow. The global particulate dataset is relevant to global public health studies and would not be possible to produce without the use of the multiple big datasets, in-situ data and machine learning. To greatly reduce the development time and enhance the functionality, a high-level language capable of parallel processing has been used (Matlab). Key considerations for the system are high-speed access due to the large data volume, persistence of the large data volumes and a precise process time scheduling capability.

  10. BigNeuron dataset V.0.0

    DOE Data Explorer

    Ramanathan, Arvind

    2016-01-01

The cleaned bench-testing reconstructions for the gold166 datasets have been put online at GitHub (https://github.com/BigNeuron/Events-and-News/wiki/BigNeuron-Events-and-News and https://github.com/BigNeuron/Data/releases/tag/gold166_bt_v1.0). The corresponding image datasets were released earlier from other sites (the main pointer is also available at GitHub, https://github.com/BigNeuron/Data/releases/tag/Gold166_v1, but because the files were large, the actual downloads were distributed across three continents).

  11. Validating Variational Bayes Linear Regression Method With Multi-Central Datasets.

    PubMed

    Murata, Hiroshi; Zangwill, Linda M; Fujino, Yuri; Matsuura, Masato; Miki, Atsuya; Hirasawa, Kazunori; Tanito, Masaki; Mizoue, Shiro; Mori, Kazuhiko; Suzuki, Katsuyoshi; Yamashita, Takehiro; Kashiwagi, Kenji; Shoji, Nobuyuki; Asaoka, Ryo

    2018-04-01

To validate the prediction accuracy of variational Bayes linear regression (VBLR) with two datasets external to the training dataset. The training dataset consisted of 7268 eyes of 4278 subjects from the University of Tokyo Hospital. The Japanese Archive of Multicentral Databases in Glaucoma (JAMDIG) dataset consisted of 271 eyes of 177 patients, and the Diagnostic Innovations in Glaucoma Study (DIGS) dataset includes 248 eyes of 173 patients, which were used for validation. Prediction accuracy was compared between VBLR and ordinary least squares linear regression (OLSLR). First, OLSLR and VBLR were carried out using total deviation (TD) values at each of the 52 test points from the second to fourth visual fields (VFs) (VF2-4) to the second to tenth VF (VF2-10) of each patient in the JAMDIG and DIGS datasets, and the TD values of the 11th VF test were predicted every time. The predictive accuracy of each method was compared through the root mean squared error (RMSE) statistic. OLSLR RMSEs with the JAMDIG and DIGS datasets were between 31 and 4.3 dB and between 19.5 and 3.9 dB, respectively. On the other hand, VBLR RMSEs with the JAMDIG and DIGS datasets were between 5.0 and 3.7 dB and between 4.6 and 3.6 dB. There was a statistically significant difference between VBLR and OLSLR for both datasets at every series (VF2-4 to VF2-10) (P < 0.01 for all tests). However, there was no statistically significant difference in VBLR RMSEs between the JAMDIG and DIGS datasets at any series of VFs (VF2-4 to VF2-10) (P > 0.05). VBLR outperformed OLSLR in predicting future VF progression, and VBLR has the potential to be a helpful tool in clinical settings.
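    For context on the RMSE benchmark used in this record, here is a minimal sketch of the OLSLR baseline only (the VBLR method itself is not reproduced, and the total-deviation values below are hypothetical): fit an ordinary least-squares trend per test point over early visits, predict the next visit, and score with RMSE.

```python
import math

def ols_fit(ts, ys):
    """Ordinary least-squares fit y ≈ a + b*t; returns (intercept, slope)."""
    n = len(ts)
    mt, my = sum(ts) / n, sum(ys) / n
    b = (sum((t - mt) * (y - my) for t, y in zip(ts, ys))
         / sum((t - mt) ** 2 for t in ts))
    return my - b * mt, b

def rmse(pred, actual):
    """Root mean squared error between predictions and observations."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(pred))

# Hypothetical total-deviation (TD) series, one list per test point,
# measured at visits t = 0..4; the value at t = 5 is to be predicted.
series = [
    [-1.0, -1.5, -2.1, -2.4, -3.0],
    [-0.2, -0.5, -0.9, -1.1, -1.6],
]
actual_next = [-3.4, -1.8]

preds = []
for ys in series:
    a, b = ols_fit(list(range(5)), ys)
    preds.append(a + b * 5)  # extrapolate one visit ahead
print(round(rmse(preds, actual_next), 2))  # → 0.08
```

    The study's comparison amounts to computing this statistic for both methods over many eyes and series lengths; VBLR's advantage shows up as a consistently smaller RMSE, especially for short series.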

  12. Squish: Near-Optimal Compression for Archival of Relational Datasets

    PubMed Central

    Gao, Yihan; Parameswaran, Aditya

    2017-01-01

    Relational datasets are being generated at an alarmingly rapid rate across organizations and industries. Compressing these datasets could significantly reduce storage and archival costs. Traditional compression algorithms, e.g., gzip, are suboptimal for compressing relational datasets since they ignore the table structure and the relationships between attributes. We study compression algorithms that leverage the relational structure to compress datasets to a much greater extent. We develop Squish, a system that uses a combination of Bayesian Networks and Arithmetic Coding to capture multiple kinds of dependencies among attributes and achieve a near-entropy compression rate. Squish also supports user-defined attributes: users can instantiate new data types by simply implementing five functions for a new class interface. We prove the asymptotic optimality of our compression algorithm and conduct experiments to show the effectiveness of our system: Squish achieves a reduction of over 50% in storage size relative to systems developed in prior work on a variety of real datasets. PMID:28180028
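    The "near-entropy" target mentioned in this abstract rests on a standard fact: a column's Shannon entropy lower-bounds the bits per value any coder (arithmetic coding included) needs. The sketch below shows only this single-column bound, not Squish's Bayesian-network model of cross-attribute dependencies; the example column is invented.

    ```python
    # Single-column entropy bound: a skewed categorical column can be coded in
    # fewer bits per value than a fixed-width code suggests. This illustrates
    # the bound an arithmetic coder approaches; it is not the Squish system.
    import math
    from collections import Counter

    def entropy_bits_per_value(column):
        """Shannon entropy (bits/value) of a column's empirical distribution."""
        counts = Counter(column)
        n = len(column)
        return -sum((c / n) * math.log2(c / n) for c in counts.values())

    state = ["CA"] * 70 + ["NY"] * 20 + ["TX"] * 10   # invented, skewed column
    h = entropy_bits_per_value(state)
    naive = math.ceil(math.log2(3))                   # fixed-width: 2 bits/value
    print(round(h, 3), naive)
    ```

    Here the entropy is about 1.16 bits per value versus 2 bits for a fixed-width code; modeling dependencies between columns, as the abstract describes, lowers the joint entropy further.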

  13. Omicseq: a web-based search engine for exploring omics datasets.

    PubMed

    Sun, Xiaobo; Pittard, William S; Xu, Tianlei; Chen, Li; Zwick, Michael E; Jiang, Xiaoqian; Wang, Fusheng; Qin, Zhaohui S

    2017-07-03

    The development and application of high-throughput genomics technologies has resulted in massive quantities of diverse omics data that continue to accumulate rapidly. These rich datasets offer unprecedented and exciting opportunities to address long-standing questions in biomedical research. However, our ability to explore and query the content of diverse omics data is very limited. Existing dataset search tools rely almost exclusively on the metadata. A text-based query for gene name(s) does not work well on datasets wherein the vast majority of the content is numeric. To overcome this barrier, we have developed Omicseq, a novel web-based platform that facilitates the easy interrogation of omics datasets holistically to improve 'findability' of relevant data. The core component of Omicseq is trackRank, a novel algorithm for ranking omics datasets that fully uses the numerical content of the dataset to determine relevance to the query entity. The Omicseq system is supported by a scalable, elastic NoSQL database that hosts a large collection of processed omics datasets. In the front end, a simple, web-based interface allows users to enter queries and instantly receive search results as a list of ranked datasets deemed to be the most relevant. Omicseq is freely available at http://www.omicseq.org. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
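    Ranking datasets by their numerical content, rather than by metadata, can be illustrated with a toy scheme: score each dataset by how strongly the query gene's signal stands out within it. This is only a hypothetical sketch in the spirit of the idea; it does not reproduce the published trackRank algorithm, and the dataset names and values are invented.

    ```python
    # Toy content-based ranking: a dataset ranks higher when the query gene's
    # numeric signal is high relative to the other genes in that dataset.
    # Not the trackRank algorithm; names and values are invented.
    def percentile_rank(values, query_gene):
        """Fraction of other genes whose signal is below the query gene's."""
        q = values[query_gene]
        below = sum(1 for gene, v in values.items() if v < q)
        return below / (len(values) - 1)

    datasets = {
        "chipseq_run1": {"TP53": 9.1, "BRCA1": 2.0, "EGFR": 1.5, "MYC": 0.8},
        "rnaseq_run2":  {"TP53": 1.2, "BRCA1": 7.5, "EGFR": 6.9, "MYC": 5.0},
    }

    ranked = sorted(datasets,
                    key=lambda d: percentile_rank(datasets[d], "TP53"),
                    reverse=True)
    print(ranked)
    ```

    For the query "TP53", the dataset in which TP53 dominates ranks first, which a metadata-only search could not determine.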

  14. ISRUC-Sleep: A comprehensive public dataset for sleep researchers.

    PubMed

    Khalighi, Sirvan; Sousa, Teresa; Santos, José Moutinho; Nunes, Urbano

    2016-02-01

    To facilitate the performance comparison of new methods for sleep pattern analysis, publicly available datasets with quality content are very important and useful. We introduce an open-access comprehensive sleep dataset, called ISRUC-Sleep. The data were obtained from human adults, including healthy subjects, subjects with sleep disorders, and subjects under the effect of sleep medication. Each recording was randomly selected from among PSG recordings acquired by the Sleep Medicine Centre of the Hospital of Coimbra University (CHUC). The dataset comprises three groups of data: (1) data concerning 100 subjects, with one recording session per subject; (2) data gathered from 8 subjects, with two recording sessions per subject; and (3) data collected from one recording session for each of 10 healthy subjects. The polysomnography (PSG) recordings associated with each subject were visually scored by two human experts. Compared with existing sleep-related public datasets, ISRUC-Sleep provides data from a reasonable number of subjects with different characteristics, such as: data useful for studies involving changes in the PSG signals over time; and data from healthy subjects useful for studies comparing healthy subjects with patients suffering from sleep disorders. This dataset was created to complement existing datasets by providing easy-to-apply data collection with some characteristics not yet covered. ISRUC-Sleep can be useful for evaluating new contributions: (i) in biomedical signal processing; (ii) in development of ASSC methods; and (iii) in sleep physiology studies. To evaluate and compare new contributions that use this dataset as a benchmark, results of applying a subject-independent automatic sleep stage classification (ASSC) method on the ISRUC-Sleep dataset are presented. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

  15. Quantifying uncertainty in observational rainfall datasets

    NASA Astrophysics Data System (ADS)

    Lennard, Chris; Dosio, Alessandro; Nikulin, Grigory; Pinto, Izidine; Seid, Hussen

    2015-04-01

    The CO-ordinated Regional Downscaling Experiment (CORDEX) has to date seen the publication of at least ten journal papers that examine the African domain during 2012 and 2013. Five of these papers consider Africa generally (Nikulin et al. 2012, Kim et al. 2013, Hernandes-Dias et al. 2013, Laprise et al. 2013, Panitz et al. 2013) and five have regional foci: Tramblay et al. (2013) on Northern Africa, Mariotti et al. (2014) and Gbobaniyi et al. (2013) on West Africa, Endris et al. (2013) on East Africa, and Kalagnoumou et al. (2013) on southern Africa. A further three papers known to the authors are under review. These papers all use observed rainfall and/or temperature data to evaluate/validate the regional model output and often proceed to assess projected changes in these variables due to climate change in the context of these observations. The most popular reference rainfall data used are the CRU, GPCP, GPCC, TRMM and UDEL datasets. However, as Kalagnoumou et al. (2013) point out, there are many other rainfall datasets available for consideration, for example, CMORPH, FEWS, TAMSAT & RIANNAA, TAMORA and the WATCH & WATCH-DEI data. They, with others (Nikulin et al. 2012, Sylla et al. 2012), show that the observed datasets can have a very wide spread at a particular space-time coordinate. As more ground-, space- and reanalysis-based rainfall products become available, all of which use different methods to produce precipitation data, the selection of reference data is becoming an important factor in model evaluation. A number of factors can contribute to uncertainty in the reliability and validity of the datasets, such as radiance conversion algorithms, the quantity and quality of available station data, interpolation techniques, and the blending methods used to combine satellite and gauge based products. However, to date no comprehensive study has been performed to evaluate the uncertainty in these observational datasets. We assess 18 gridded

  16. Precipitation climatology over India: validation with observations and reanalysis datasets and spatial trends

    NASA Astrophysics Data System (ADS)

    Kishore, P.; Jyothi, S.; Basha, Ghouse; Rao, S. V. B.; Rajeevan, M.; Velicogna, Isabella; Sutterley, Tyler C.

    2016-01-01

    Changing rainfall patterns have a significant effect on water resources and agricultural output in many countries, especially in a country like India where the economy depends on rain-fed agriculture. Rainfall over India has large spatial as well as temporal variability. To understand this variability, spatial-temporal analyses of rainfall have been carried out using 107 years (1901-2007) of daily gridded India Meteorological Department (IMD) rainfall datasets. Further, validation of the IMD precipitation data is carried out against different observational and reanalysis datasets for the period 1989 to 2007. The Global Precipitation Climatology Project data show features similar to those of IMD with a high degree of agreement, whereas the Asian Precipitation-Highly-Resolved Observational Data Integration Towards Evaluation data show similar features but with large differences, especially over the northwest, the west coast, and the western Himalayas. Spatially, large deviations are observed in the interior peninsula during the monsoon season with the National Aeronautics and Space Administration-Modern Era Retrospective-analysis for Research and Applications (NASA-MERRA), in the pre-monsoon with the Japanese 25-year ReAnalysis (JRA-25), and in the post-monsoon with the Climate Forecast System Reanalysis (CFSR) reanalysis datasets. Among the reanalysis datasets, the European Centre for Medium-Range Weather Forecasts Interim Re-Analysis (ERA-Interim) shows the best agreement, followed by CFSR, NASA-MERRA, and JRA-25. Further, for the first time, with high-resolution and long-term IMD data, the spatial distribution of trends is estimated using a robust regression analysis technique on the annual and seasonal rainfall data with respect to different regions of India. Significant positive and negative trends are noticed in the whole time series of data during the monsoon season. The northeast and west coast of the Indian region show significant positive trends, and negative trends are seen over the western Himalayas and

  17. Internal Consistency of the NVAP Water Vapor Dataset

    NASA Technical Reports Server (NTRS)

    Suggs, Ronnie J.; Jedlovec, Gary J.; Arnold, James E. (Technical Monitor)

    2001-01-01

    The NVAP (NASA Water Vapor Project) dataset is a global dataset at 1 x 1 degree spatial resolution consisting of daily, pentad, and monthly atmospheric precipitable water (PW) products. The analysis blends measurements from the Television and Infrared Operational Satellite (TIROS) Operational Vertical Sounder (TOVS), the Special Sensor Microwave/Imager (SSM/I), and radiosonde observations into a daily collage of PW. The original dataset consisted of five years of data from 1988 to 1992. Recent updates have added three additional years (1993-1995) and incorporated procedural and algorithm changes from the original methodology. Since none of the PW sources (TOVS, SSM/I, and radiosonde) provides global coverage, the sources complement one another by providing spatial coverage over regions and at times where the others are not available. For this type of spatial and temporal blending to be successful, each of the source components should have similar or compatible accuracies. If this is not the case, regional and time-varying biases may be manifested in the NVAP dataset. This study examines the consistency of the NVAP source data by comparing daily collocated TOVS and SSM/I PW retrievals with collocated radiosonde PW observations. The daily PW intercomparisons are performed over the time period of the dataset and for various regions.

  18. Topic modeling for cluster analysis of large biological and medical datasets

    PubMed Central

    2014-01-01

    Background The big data moniker is nowhere better deserved than to describe the ever-increasing prodigiousness and complexity of biological and medical datasets. New methods are needed to generate and test hypotheses, foster biological interpretation, and build validated predictors. Although multivariate techniques such as cluster analysis may allow researchers to identify groups, or clusters, of related variables, the accuracy and effectiveness of traditional clustering methods diminish for large and high-dimensional datasets. Topic modeling is an active research field in machine learning and has mainly been used as an analytical tool to structure large textual corpora for data mining. Its ability to reduce high dimensionality to a small number of latent variables makes it suitable as a means for clustering or for overcoming clustering difficulties in large biological and medical datasets. Results In this study, three topic model-derived clustering methods, highest probable topic assignment, feature selection, and feature extraction, are proposed and tested on the cluster analysis of three large datasets: a Salmonella pulsed-field gel electrophoresis (PFGE) dataset, a lung cancer dataset, and a breast cancer dataset, which represent various types of large biological or medical datasets. All three methods are shown to improve the efficacy/effectiveness of clustering results on the three datasets in comparison to traditional methods. A preferable cluster analysis method emerged for each of the three datasets on the basis of replicating known biological truths. Conclusion Topic modeling could be advantageously applied to the large datasets of biological or medical research. The three proposed topic model-derived clustering methods, highest probable topic assignment, feature selection, and feature extraction, yield clustering improvements for the three different data types. Clusters more efficaciously represent truthful groupings and subgroupings in the data than

  19. Topic modeling for cluster analysis of large biological and medical datasets.

    PubMed

    Zhao, Weizhong; Zou, Wen; Chen, James J

    2014-01-01

    The big data moniker is nowhere better deserved than to describe the ever-increasing prodigiousness and complexity of biological and medical datasets. New methods are needed to generate and test hypotheses, foster biological interpretation, and build validated predictors. Although multivariate techniques such as cluster analysis may allow researchers to identify groups, or clusters, of related variables, the accuracy and effectiveness of traditional clustering methods diminish for large and high-dimensional datasets. Topic modeling is an active research field in machine learning and has mainly been used as an analytical tool to structure large textual corpora for data mining. Its ability to reduce high dimensionality to a small number of latent variables makes it suitable as a means for clustering or for overcoming clustering difficulties in large biological and medical datasets. In this study, three topic model-derived clustering methods, highest probable topic assignment, feature selection, and feature extraction, are proposed and tested on the cluster analysis of three large datasets: a Salmonella pulsed-field gel electrophoresis (PFGE) dataset, a lung cancer dataset, and a breast cancer dataset, which represent various types of large biological or medical datasets. All three methods are shown to improve the efficacy/effectiveness of clustering results on the three datasets in comparison to traditional methods. A preferable cluster analysis method emerged for each of the three datasets on the basis of replicating known biological truths. Topic modeling could be advantageously applied to the large datasets of biological or medical research. The three proposed topic model-derived clustering methods, highest probable topic assignment, feature selection, and feature extraction, yield clustering improvements for the three different data types. Clusters more efficaciously represent truthful groupings and subgroupings in the data than traditional methods, suggesting
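    The first of the three methods named in this abstract, highest probable topic assignment, can be sketched simply: once a topic model has produced per-sample topic proportions, each sample is assigned to its most probable topic, and topics serve as clusters. The topic-proportion matrix below is invented; a real pipeline would obtain it from a fitted topic model such as LDA.

    ```python
    # Sketch of "highest probable topic assignment" clustering: each sample
    # joins the cluster of its argmax topic. The per-sample topic proportions
    # are invented stand-ins for a fitted topic model's output.
    doc_topic = {
        "isolate_01": [0.80, 0.15, 0.05],
        "isolate_02": [0.10, 0.75, 0.15],
        "isolate_03": [0.70, 0.20, 0.10],
        "isolate_04": [0.05, 0.10, 0.85],
    }

    clusters = {}
    for sample, props in doc_topic.items():
        topic = max(range(len(props)), key=lambda k: props[k])  # argmax topic
        clusters.setdefault(topic, []).append(sample)

    print({t: members for t, members in sorted(clusters.items())})
    ```

    The dimensionality reduction does the heavy lifting here: clustering happens in the small latent topic space rather than in the original high-dimensional feature space.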

  20. Geospatial resources for the geologic community: The USGS National Map

    USGS Publications Warehouse

    Witt, Emitt C.

    2015-01-01

    Geospatial data are a key component of investigating, interpreting, and communicating the geological sciences. Locating geospatial data can be time-consuming, which detracts from time spent on a study because these data are not obviously placed in central locations or are served from many disparate databases. The National Map of the US Geological Survey is a publicly available resource for accessing the geospatial base map data needs of the geological community from a central location. The National Map data are available through a viewer and download platform providing access to eight primary data themes, plus the US Topo and scanned historical topographic maps. The eight themes are elevation, orthoimagery, hydrography, geographic names, boundaries, transportation, structures, and land cover, and they are being offered for download as predefined tiles in formats supported by leading geographic information system software. Data tiles are periodically refreshed to capture the most current content and are an efficient method for disseminating and receiving geospatial information. Elevation data, for example, are offered as a download from the National Map as 1° × 1° tiles for the 10- and 30-m products and as 15′ × 15′ tiles for the higher-resolution 3-m product. Vector data sets with smaller file sizes are offered at several tile sizes and formats. Partial tiles are not a download option: any prestaged data that intersect the requesting bounding box will be, in their entirety, part of the download order. While there are many options for accessing geospatial data via the Web, the National Map represents authoritative sources of data that are documented and can be referenced for citation and inclusion in scientific publications. Therefore, National Map products and services should be part of a geologist’s first stop for geospatial information and data.

  1. Coastal circulation and hydrography in the Gulf of Tehuantepec, Mexico, during winter

    NASA Astrophysics Data System (ADS)

    Barton, E. D.; Lavín, M. F.; Trasviña, A.

    2009-02-01

    Winter observations of shelf and slope hydrography and currents in the inner Gulf of Tehuantepec are analysed from two field studies in 1989 and 1996 to specify the variability of near-shore conditions under varying wind stress. During the winter period frequent outbursts of 'Norte' winds over the central Gulf result in persistent alongshore inflows along both its eastern and western coasts. Wind-induced variability on time scales of several days strongly influences the shelf currents, but has greater effect on its western coast because of the generation and separation of anticyclonic eddies there. The steadier inflow (~0.2 m s⁻¹) on the eastern shelf is evident in a strong down-bowing of shallow isosurfaces towards the coast within 100 km of shore, below a wedge of warmer, fresher and lighter water. This persistent entry of less saline (33.4-34.0), warmer water from the southeast clearly originates in buoyancy input by rivers along the Central American coast, but is augmented by a general shoreward tendency (0.2 m s⁻¹) in the southeastern Gulf. The resultant shallow tongue of anomalous water is generally swept offshore in the head of the Gulf and mixed away by the strong outflow and vertical overturning of the frequent 'Norte' events but during wind relaxations the warm, low-salinity coastal flow may briefly extend further west. In the head of the Gulf, flow is predominantly offshore (<0.2 m s⁻¹) as the alongshore component alternates eastward and westward in association with elevation or depression, respectively, of the pycnocline against the shore. More saline, open ocean water is introduced from the north-western side of the Gulf by the inflow along the west coast. During extended wind relaxations, the flow becomes predominantly eastward beyond the shelf while nearshore the coastally trapped buoyant inflow from the southeast penetrates across the entire head of the gulf at least as far as its western limit. On the basis of these and other recent

  2. Food Recognition: A New Dataset, Experiments, and Results.

    PubMed

    Ciocca, Gianluigi; Napoletano, Paolo; Schettini, Raimondo

    2017-05-01

    We propose a new dataset for the evaluation of food recognition algorithms that can be used in dietary monitoring applications. Each image depicts a real canteen tray with dishes and foods arranged in different ways. Each tray contains multiple instances of food classes. The dataset contains 1027 canteen trays for a total of 3616 food instances belonging to 73 food classes. The food on the tray images has been manually segmented using carefully drawn polygonal boundaries. We have benchmarked the dataset by designing an automatic tray analysis pipeline that takes a tray image as input, finds the regions of interest, and predicts for each region the corresponding food class. We have experimented with three different classification strategies, also using several visual descriptors. We achieve about 79% food and tray recognition accuracy using convolutional-neural-network-based features. The dataset, as well as the benchmark framework, is available to the research community.

  3. A reanalysis dataset of the South China Sea.

    PubMed

    Zeng, Xuezhi; Peng, Shiqiu; Li, Zhijin; Qi, Yiquan; Chen, Rongyu

    2014-01-01

    Ocean reanalysis provides a temporally continuous and spatially gridded four-dimensional estimate of the ocean state for a better understanding of the ocean dynamics and its spatial/temporal variability. Here we present a 19-year (1992-2010) high-resolution ocean reanalysis dataset of the upper ocean in the South China Sea (SCS) produced from an ocean data assimilation system. A wide variety of observations, including in-situ temperature/salinity profiles, ship-measured and satellite-derived sea surface temperatures, and sea surface height anomalies from satellite altimetry, are assimilated into the outputs of an ocean general circulation model using a multi-scale incremental three-dimensional variational data assimilation scheme, yielding a daily high-resolution reanalysis dataset of the SCS. Comparisons between the reanalysis and independent observations support the reliability of the dataset. The presented dataset provides the research community of the SCS an important data source for studying the thermodynamic processes of the ocean circulation and meso-scale features in the SCS, including their spatial and temporal variability.

  4. Dataset definition for CMS operations and physics analyses

    NASA Astrophysics Data System (ADS)

    Franzoni, Giovanni; Compact Muon Solenoid Collaboration

    2016-04-01

    Data recorded at the CMS experiment are funnelled into streams, integrated in the HLT menu, and further organised in a hierarchical structure of primary datasets and secondary datasets/dedicated skims. Datasets are defined according to the final-state particles reconstructed by the high level trigger, the data format, and the use case (physics analysis, alignment and calibration, performance studies). During the first LHC run, new workflows were added to this canonical scheme to best exploit the flexibility of the CMS trigger and data acquisition systems. The concepts of data parking and data scouting have been introduced to extend the physics reach of CMS, offering the opportunity to define physics triggers with extremely loose selections (e.g. a dijet resonance trigger collecting data at 1 kHz). In this presentation, we review the evolution of the dataset definition during LHC run I, and we discuss the plans for run II.

  5. Network Intrusion Dataset Assessment

    DTIC Science & Technology

    2013-03-01

    Security, 6(1):173–180, October 2009. abs/0911.0787. 70 • Jungsuk Song, Hiroki Takakura, Yasuo Okabe, and Koji Nakao. “Toward a more practical...Inoue, and Koji Nakao. “Statistical analysis of honeypot data and building of Kyoto 2006+ dataset for NIDS evaluation”. BADGERS ’11: Proceedings of

  6. Medical Image Data and Datasets in the Era of Machine Learning-Whitepaper from the 2016 C-MIMI Meeting Dataset Session.

    PubMed

    Kohli, Marc D; Summers, Ronald M; Geis, J Raymond

    2017-08-01

    At the first annual Conference on Machine Intelligence in Medical Imaging (C-MIMI), held in September 2016, a conference session on medical image data and datasets for machine learning identified multiple issues. The common theme from attendees was that everyone participating in medical image evaluation with machine learning is data-starved. There is an urgent need to find better ways to collect, annotate, and reuse medical imaging data. Unique domain issues with medical image datasets require further study, development, and dissemination of best practices and standards, and a coordinated effort among medical imaging domain experts, medical imaging informaticists, government and industry data scientists, and interested commercial, academic, and government entities. High-level attributes of reusable medical image datasets suitable to train, test, validate, verify, and regulate ML products should be better described. NIH and other government agencies should promote and, where applicable, enforce access to medical image datasets. We should improve communication among medical imaging domain experts, medical imaging informaticists, academic clinical and basic science researchers, government and industry data scientists, and interested commercial entities.

  7. Harmonization of forest disturbance datasets of the conterminous USA from 1986 to 2011

    USGS Publications Warehouse

    Soulard, Christopher E.; Acevedo, William; Cohen, Warren B.; Yang, Zhiqiang; Stehman, Stephen V.; Taylor, Janis L.

    2017-01-01

    Several spatial forest disturbance datasets exist for the conterminous USA. The major problem with forest disturbance mapping is that variability between map products leads to uncertainty regarding the actual rate of disturbance. In this article, harmonized maps were produced from multiple data sources (i.e., Global Forest Change, LANDFIRE Vegetation Disturbance, National Land Cover Database, Vegetation Change Tracker, and Web-Enabled Landsat Data). The harmonization process involved fitting common class ontologies and determining spatial congruency to produce forest disturbance maps for four time intervals (1986–1992, 1992–2001, 2001–2006, and 2006–2011). Pixels mapped as disturbed for two or more datasets were labeled as disturbed in the harmonized maps. The primary advantage gained by harmonization was improvement in commission error rates relative to the individual disturbance products. Disturbance omission errors were high for both harmonized and individual forest disturbance maps due to underlying limitations in mapping subtle disturbances with Landsat classification algorithms. To enhance the value of the harmonized disturbance products, we used fire perimeter maps to add information on the cause of disturbance.
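    The harmonization rule in this abstract (a pixel counts as disturbed when two or more input maps agree) amounts to a per-pixel vote over co-registered disturbance masks. The tiny 2x2 "maps" below are invented; real inputs would be the co-registered rasters from the five products named above.

    ```python
    # Sketch of the >= 2-product agreement rule described above: sum the
    # disturbed-pixel masks and threshold at two votes. Masks are invented.
    maps = [
        [[1, 0], [0, 1]],   # product A: 1 = pixel mapped as disturbed
        [[1, 1], [0, 0]],   # product B
        [[0, 1], [0, 1]],   # product C
    ]

    rows, cols = len(maps[0]), len(maps[0][0])
    harmonized = [[int(sum(m[r][c] for m in maps) >= 2) for c in range(cols)]
                  for r in range(rows)]
    print(harmonized)
    ```

    Requiring agreement from at least two products is what drives down the commission errors the abstract reports, at the cost of missing disturbances that only one product detects.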

  8. Harmonization of forest disturbance datasets of the conterminous USA from 1986 to 2011.

    PubMed

    Soulard, Christopher E; Acevedo, William; Cohen, Warren B; Yang, Zhiqiang; Stehman, Stephen V; Taylor, Janis L

    2017-04-01

    Several spatial forest disturbance datasets exist for the conterminous USA. The major problem with forest disturbance mapping is that variability between map products leads to uncertainty regarding the actual rate of disturbance. In this article, harmonized maps were produced from multiple data sources (i.e., Global Forest Change, LANDFIRE Vegetation Disturbance, National Land Cover Database, Vegetation Change Tracker, and Web-Enabled Landsat Data). The harmonization process involved fitting common class ontologies and determining spatial congruency to produce forest disturbance maps for four time intervals (1986-1992, 1992-2001, 2001-2006, and 2006-2011). Pixels mapped as disturbed for two or more datasets were labeled as disturbed in the harmonized maps. The primary advantage gained by harmonization was improvement in commission error rates relative to the individual disturbance products. Disturbance omission errors were high for both harmonized and individual forest disturbance maps due to underlying limitations in mapping subtle disturbances with Landsat classification algorithms. To enhance the value of the harmonized disturbance products, we used fire perimeter maps to add information on the cause of disturbance.

  9. Visualization of conserved structures by fusing highly variable datasets.

    PubMed

    Silverstein, Jonathan C; Chhadia, Ankur; Dech, Fred

    2002-01-01

    Skill, effort, and time are required to identify and visualize anatomic structures in three dimensions from radiological data. Fundamentally, automating these processes requires a technique that uses symbolic information not in the dynamic range of the voxel data. We have been developing such a technique based on mutual information for automatic multi-modality image fusion (MIAMI Fuse, University of Michigan). This system previously demonstrated the ability to fuse one voxel dataset with integrated symbolic structure information to a CT dataset (of different scale and resolution) from the same person. The next step in developing our technique aimed to accommodate the variability of anatomy from patient to patient by using warping to fuse our standard dataset to arbitrary patient CT datasets. A standard symbolic information dataset was created from the full color Visible Human Female by segmenting the liver parenchyma, portal veins, and hepatic veins and overwriting each set of voxels with a fixed color. Two arbitrarily selected patient CT scans of the abdomen were used as reference datasets. We used the warping functions in MIAMI Fuse to align the standard structure data to each patient scan. The key to successful fusion was the focused use of multiple warping control points that place themselves around the structure of interest automatically. The user assigns only a few initial control points to align the scans. Fusions 1 and 2 transformed the atlas with 27 points around the liver to CT1 and CT2, respectively. Fusion 3 transformed the atlas with 45 control points around the liver to CT1, and Fusion 4 transformed the atlas with 5 control points around the portal vein. The CT dataset is augmented with the transformed standard structure dataset, such that the warped structure masks are visualized in combination with the original patient dataset. This combined volume visualization is then rendered interactively in stereo on the ImmersaDesk in an immersive Virtual

  10. Review and Analysis of Algorithmic Approaches Developed for Prognostics on CMAPSS Dataset

    DTIC Science & Technology

    2014-12-23

    publications for benchmarking prognostics algorithms. The turbofan degradation datasets have received over seven thousand unique downloads in the last five...approaches that researchers have taken to implement prognostics using these turbofan datasets. Some unique characteristics of these datasets are also...Description of the five turbofan degradation datasets available from NASA repository. Datasets #Fault Modes #Conditions #Train Units #Test Units

  11. Specialized food composition dataset for vitamin D content in foods based on European standards: Application to dietary intake assessment.

    PubMed

    Milešević, Jelena; Samaniego, Lourdes; Kiely, Mairead; Glibetić, Maria; Roe, Mark; Finglas, Paul

    2018-02-01

    A review of national nutrition surveys from 2000 to date demonstrated a high prevalence of vitamin D intakes below the EFSA Adequate Intake (AI) (<15 μg/d vitamin D) in adults across Europe. Dietary assessment and modelling are required to monitor the efficacy and safety of ongoing strategic vitamin D fortification. To support these studies, a specialized vitamin D food composition dataset, based on EuroFIR standards, was compiled. The FoodEXplorer™ tool was used to retrieve well documented analytical data for vitamin D and arrange the data into two datasets - European (8 European countries, 981 data values) and US (1836 data values). Data were classified using the LanguaL™, FoodEX2 and ODIN classification systems and ranked according to quality criteria. Significant differences in the content, quality of data values, missing data on vitamin D2 and 25(OH)D3, and documentation of analytical methods were observed. The dataset is available through the EuroFIR platform. Copyright © 2017 Elsevier Ltd. All rights reserved.

  12. Total ozone trends from 1979 to 2016 derived from five merged observational datasets - the emergence into ozone recovery

    NASA Astrophysics Data System (ADS)

    Weber, Mark; Coldewey-Egbers, Melanie; Fioletov, Vitali E.; Frith, Stacey M.; Wild, Jeannette D.; Burrows, John P.; Long, Craig S.; Loyola, Diego

    2018-02-01

    We report on updated trends using different merged datasets from satellite and ground-based observations for the period from 1979 to 2016. Trends were determined by applying a multiple linear regression (MLR) to annual mean zonal mean data. Merged datasets used here include NASA MOD v8.6 and National Oceanic and Atmospheric Administration (NOAA) merge v8.6, both based on data from the series of Solar Backscatter UltraViolet (SBUV) and SBUV-2 satellite instruments (1978-present), as well as the Global Ozone Monitoring Experiment (GOME)-type Total Ozone (GTO) and GOME-SCIAMACHY-GOME-2 (GSG) merged datasets (1995-present), mainly comprising satellite data from GOME, the Scanning Imaging Absorption Spectrometer for Atmospheric Chartography (SCIAMACHY), and GOME-2A. The fifth dataset consists of the monthly mean zonal mean data from ground-based measurements collected at the World Ozone and UV Data Center (WOUDC). The addition of four more years of data since the last World Meteorological Organization (WMO) ozone assessment (2013-2016) shows that for most datasets and regions the trends since the stratospheric halogen reached its maximum (~1996 globally and ~2000 in polar regions) are mostly not significantly different from zero. However, for some latitudes, in particular the Southern Hemisphere extratropics and Northern Hemisphere subtropics, several datasets show small positive trends of slightly below +1% decade⁻¹ that are barely statistically significant at the 2σ uncertainty level. In the tropics, only two datasets show significant trends of +0.5 to +0.8% decade⁻¹, while the others show near-zero trends. Positive trends since 2000 have been observed over Antarctica in September, but near-zero trends are found in October as well as in March over the Arctic. Uncertainties due to possible drifts between the datasets, from the merging procedure used to combine satellite datasets, and related to the low sampling of ground-based data are not accounted for in the trend
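The headline trend numbers can be illustrated with a bare least-squares slope on annual mean values, expressed as percent per decade. This is a simplified sketch: the study's MLR additionally includes explanatory proxies (solar cycle, QBO, etc.), which are omitted here, and the function name is our own:

```python
import numpy as np

def trend_percent_per_decade(years, ozone):
    """Ordinary least-squares trend of annual mean total ozone, in % per decade."""
    years = np.asarray(years, dtype=float)
    ozone = np.asarray(ozone, dtype=float)
    slope, intercept = np.polyfit(years, ozone, 1)   # slope in DU per year
    mean = ozone.mean()
    return 10.0 * slope / mean * 100.0               # convert to percent per decade
```

Applied to a synthetic 2000-2016 series of 300 DU with a 0.15 DU/yr increase, this returns a +0.5% decade⁻¹ trend, the order of magnitude discussed in the abstract.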

  13. SisFall: A Fall and Movement Dataset

    PubMed Central

    Sucerquia, Angela; López, José David; Vargas-Bonilla, Jesús Francisco

    2017-01-01

    Research on fall and movement detection with wearable devices has witnessed promising growth. However, there are few publicly available datasets, all recorded with smartphones, which are insufficient for testing new proposals due to the lack of an appropriate target population, the limited range of performed activities, and the limited information recorded. Here, we present a dataset of falls and activities of daily living (ADLs) acquired with a self-developed device composed of two types of accelerometer and one gyroscope. It consists of 19 ADLs and 15 fall types performed by 23 young adults, 15 ADL types performed by 14 healthy and independent participants over 62 years old, and data from one 60-year-old participant who performed all ADLs and falls. These activities were selected based on a survey and a literature analysis. We test the dataset with widely used feature extraction and a simple-to-implement threshold-based classification, achieving up to 96% accuracy in fall detection. An individual activity analysis demonstrates that most errors are concentrated in a small number of activities, where new approaches could be focused. Finally, validation tests with elderly people significantly reduced the fall detection performance of the tested features. This validates findings of other authors and encourages developing new strategies with this new dataset as the benchmark. PMID:28117691

  14. SisFall: A Fall and Movement Dataset.

    PubMed

    Sucerquia, Angela; López, José David; Vargas-Bonilla, Jesús Francisco

    2017-01-20

    Research on fall and movement detection with wearable devices has witnessed promising growth. However, there are few publicly available datasets, all recorded with smartphones, which are insufficient for testing new proposals due to the lack of an appropriate target population, the limited range of performed activities, and the limited information recorded. Here, we present a dataset of falls and activities of daily living (ADLs) acquired with a self-developed device composed of two types of accelerometer and one gyroscope. It consists of 19 ADLs and 15 fall types performed by 23 young adults, 15 ADL types performed by 14 healthy and independent participants over 62 years old, and data from one 60-year-old participant who performed all ADLs and falls. These activities were selected based on a survey and a literature analysis. We test the dataset with widely used feature extraction and a simple-to-implement threshold-based classification, achieving up to 96% accuracy in fall detection. An individual activity analysis demonstrates that most errors are concentrated in a small number of activities, where new approaches could be focused. Finally, validation tests with elderly people significantly reduced the fall detection performance of the tested features. This validates findings of other authors and encourages developing new strategies with this new dataset as the benchmark.
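A threshold-based detector of the kind benchmarked on this dataset can be sketched by thresholding the signal vector magnitude of the triaxial accelerometer. The 1.8 g threshold and the function name are illustrative assumptions, not the paper's tuned values:

```python
import numpy as np

def detect_fall(accel_g, threshold_g=1.8):
    """Flag a fall if the signal vector magnitude exceeds a threshold.

    accel_g: (N, 3) array of triaxial accelerometer samples in units of g.
    The 1.8 g default is illustrative, not the SisFall paper's tuned value.
    """
    svm = np.linalg.norm(np.asarray(accel_g, dtype=float), axis=1)
    return bool(svm.max() > threshold_g)
```

During normal ADLs the magnitude stays near 1 g (gravity), while a fall impact produces a brief spike of several g, which is what the threshold catches.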

  15. Viability of Controlling Prosthetic Hand Utilizing Electroencephalograph (EEG) Dataset Signal

    NASA Astrophysics Data System (ADS)

    Miskon, Azizi; A/L Thanakodi, Suresh; Raihan Mazlan, Mohd; Mohd Haziq Azhar, Satria; Nooraya Mohd Tawil, Siti

    2016-11-01

    This project presents the development of an artificial hand controlled by Electroencephalograph (EEG) signal datasets for prosthetic applications. The EEG signal datasets were used to improve the way the prosthetic hand is controlled compared with the Electromyograph (EMG). EMG is disadvantageous for a person who has not used the muscles for a long time, and also for a person with degenerative issues due to age. Thus, EEG datasets were found to be an alternative to EMG. The datasets used in this work were taken from a Brain Computer Interface (BCI) Project. The datasets were already classified for open, close and combined movement operations. They served as input to control the prosthetic hand through an interface between Microsoft Visual Studio and Arduino. The obtained results reveal the prosthetic hand to be more efficient and faster in response to the EEG datasets with an additional LiPo (Lithium Polymer) battery attached to the prosthetic. Some limitations were also identified in terms of the hand movements and the weight of the prosthetic, and suggestions for improvement are given in this paper. Overall, the objective of this paper was achieved, as the prosthetic hand was found to be feasible in operation utilizing the EEG datasets.

  16. CIFAR10-DVS: An Event-Stream Dataset for Object Classification

    PubMed Central

    Li, Hongmin; Liu, Hanchao; Ji, Xiangyang; Li, Guoqi; Shi, Luping

    2017-01-01

    Neuromorphic vision research requires high-quality and appropriately challenging event-stream datasets to support continuous improvement of algorithms and methods. However, creating event-stream datasets is a time-consuming task, since they need to be recorded with neuromorphic cameras. Currently, there are limited event-stream datasets available. In this work, by utilizing the popular computer vision dataset CIFAR-10, we converted 10,000 frame-based images into 10,000 event streams using a dynamic vision sensor (DVS), providing an event-stream dataset of intermediate difficulty in 10 different classes, named “CIFAR10-DVS.” The conversion to an event-stream dataset was implemented by a repeated closed-loop smooth (RCLS) movement of the frame-based images. Unlike conversions performed by moving the camera, this image movement is more realistic with respect to practical applications. The repeated closed-loop image movement generates rich local intensity changes in continuous time, which are quantized by each pixel of the DVS camera to generate events. Furthermore, a performance benchmark for event-driven object classification is provided based on state-of-the-art classification algorithms. This work provides a large event-stream dataset and an initial benchmark for comparison, which may boost algorithm development in event-driven pattern recognition and object classification. PMID:28611582

  17. CIFAR10-DVS: An Event-Stream Dataset for Object Classification.

    PubMed

    Li, Hongmin; Liu, Hanchao; Ji, Xiangyang; Li, Guoqi; Shi, Luping

    2017-01-01

    Neuromorphic vision research requires high-quality and appropriately challenging event-stream datasets to support continuous improvement of algorithms and methods. However, creating event-stream datasets is a time-consuming task, since they need to be recorded with neuromorphic cameras. Currently, there are limited event-stream datasets available. In this work, by utilizing the popular computer vision dataset CIFAR-10, we converted 10,000 frame-based images into 10,000 event streams using a dynamic vision sensor (DVS), providing an event-stream dataset of intermediate difficulty in 10 different classes, named "CIFAR10-DVS." The conversion to an event-stream dataset was implemented by a repeated closed-loop smooth (RCLS) movement of the frame-based images. Unlike conversions performed by moving the camera, this image movement is more realistic with respect to practical applications. The repeated closed-loop image movement generates rich local intensity changes in continuous time, which are quantized by each pixel of the DVS camera to generate events. Furthermore, a performance benchmark for event-driven object classification is provided based on state-of-the-art classification algorithms. This work provides a large event-stream dataset and an initial benchmark for comparison, which may boost algorithm development in event-driven pattern recognition and object classification.
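DVS event streams are commonly handled as tuples of pixel coordinates, timestamp, and polarity. A minimal sketch of accumulating such events into a net-polarity frame, as a first step before classification (the tuple layout here is an assumption for illustration, not the CIFAR10-DVS file format):

```python
import numpy as np

def events_to_frame(events, height, width):
    """Accumulate DVS events into a net ON-minus-OFF count per pixel.

    events: iterable of (x, y, timestamp_us, polarity) with polarity in {+1, -1}.
    """
    frame = np.zeros((height, width), dtype=np.int32)
    for x, y, t, p in events:
        frame[y, x] += p          # +1 for an ON event, -1 for an OFF event
    return frame
```

Such accumulated frames discard the fine temporal structure that event-driven algorithms exploit, but they are a common baseline representation for benchmarking.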

  18. Finding Spatio-Temporal Patterns in Large Sensor Datasets

    ERIC Educational Resources Information Center

    McGuire, Michael Patrick

    2010-01-01

    Spatial or temporal data mining tasks are performed in the context of the relevant space, defined by a spatial neighborhood, and the relevant time period, defined by a specific time interval. Furthermore, when mining large spatio-temporal datasets, interesting patterns typically emerge where the dataset is most dynamic. This dissertation is…

  19. Impact of frailty on outcomes in geriatric femoral neck fracture management: An analysis of national surgical quality improvement program dataset.

    PubMed

    Dayama, Anand; Olorunfemi, Odunayo; Greenbaum, Simon; Stone, Melvin E; McNelis, John

    2016-04-01

    Frailty is a clinical state of increased vulnerability resulting from aging-associated decline in physiologic reserve. Hip fractures are serious fall injuries that affect our aging population. We retrospectively sought to study the effect of frailty on postoperative outcomes after Total Hip Arthroplasty (THA) and Hemiarthroplasty (HA) for femoral neck fracture in a national dataset. The National Surgical Quality Improvement Program (NSQIP) dataset was queried to identify THA and HA for a primary diagnosis of femoral neck fracture using ICD-9 codes. Frailty was assessed using the modified frailty index (mFI) derived from the Canadian Study of Health and Aging. The primary outcome was 30-day mortality, and the secondary outcomes were 30-day morbidity and failure to rescue (FTR). We used multivariate logistic regression to estimate odds ratios for outcomes while controlling for confounders. Of 3121 patients, the mean age was 77.34 ± 9.8 years. The overall 30-day mortality was 6.4% (3.2% THA and 7.2% HA). One or more severe complications (Clavien-Dindo class IV) occurred in 7.1% of patients (6.7% THA vs. 7.2% HA). Adjusted odds ratios (ORs) for mortality in the group with the higher-than-median frailty score were 2 (95% CI, 1.4-3.7) after HA and 3.9 (95% CI, 1.3-11.1) after THA. Similarly, in separate multivariate analyses for Clavien-Dindo class IV complications and failure to rescue, 1.6 times (95% CI 1.15-2.25) and 2.1 times (95% CI 1.12-3.93) higher odds were noted in the above-median frailty group. The mFI is an independent predictor of mortality among patients undergoing HA and THA for femoral neck fracture, beyond traditional risk factors such as age, ASA class, and other comorbidities. Level II. Copyright © 2016 IJS Publishing Group Limited. Published by Elsevier Ltd. All rights reserved.
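The odds ratios reported above follow the standard 2×2 construction. As a minimal, unadjusted sketch (the study itself used multivariate logistic regression to control for confounders; the counts below are invented for illustration):

```python
def odds_ratio(exposed_events, exposed_total, control_events, control_total):
    """Unadjusted odds ratio from a 2x2 table: OR = (a*d) / (b*c)."""
    a = exposed_events                      # events in the exposed group
    b = exposed_total - exposed_events      # non-events in the exposed group
    c = control_events                      # events in the control group
    d = control_total - control_events      # non-events in the control group
    return (a * d) / (b * c)
```

For example, 20 deaths among 120 high-frailty patients versus 10 among 110 low-frailty patients gives odds of 0.2 vs. 0.1, i.e. an unadjusted OR of 2.0.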

  20. Revised Land Use Characteristic Dataset for Asia and Southwest Asia for the Navy Aerosol Analysis and Prediction System (NAAPS)

    NASA Astrophysics Data System (ADS)

    Walker, A. L.; Richardson, K.; Westphal, D. L.

    2002-12-01

    Presently, the Navy Aerosol Analysis and Prediction System (NAAPS) uses the U.S. Geological Survey (USGS) land use characteristic dataset to determine global dust emission areas. The USGS dataset was developed from Advanced Very High-Resolution Radiometer 1-km data from April 1992 to March 1993. In the past decade, drastic changes in land and water use in Asia and Southwest Asia have quickly outdated this dataset. In China and Mongolia, age-old practices of farming and animal husbandry have been abandoned. Herders keep too many animals in one location, allowing the grassland to be eaten away and leaving vast areas of topsoil exposed and primed for removal by the wind. In the case of Southwest Asia, a four-year drought is in progress. Many of the wetlands and marshes in the river deltas are drying up from the lack of water runoff. To compound the problem, several new dams were and are being built along the major watersheds. In particular, Iraq's dam building in the 1990s and politically driven draining of the Mesopotamian marshes between the Tigris and Euphrates rivers has led to the near disappearance of this historical marshland. To incorporate these changes we are updating the USGS land use characteristic dataset using GIS-like software named ENVI (Environment for Visualizing Images), 1 km National Geophysical Data Center (NGDC) global topographical data, satellite imagery, and recently released governmental maps and reports. (For example, within the last two years the Chinese and Mongolian governments have released land degradation and desertification maps to satisfy the requirements set forth by the United Nations Convention to Combat Desertification.) The steps taken to create the new land use characteristic database will be described in detail. Before (non-dust producing areas) and after (dust producing areas) examples will be shown.

  1. Heuristics for Relevancy Ranking of Earth Dataset Search Results

    NASA Astrophysics Data System (ADS)

    Lynnes, C.; Quinn, P.; Norton, J.

    2016-12-01

    As the Variety of Earth science datasets increases, science researchers find it more challenging to discover and select the datasets that best fit their needs. The most common way for search providers to address this problem is to rank the datasets returned for a query by their likely relevance to the user. Large web page search engines typically use text matching supplemented with reverse link counts, semantic annotations and user intent modeling. However, this produces uneven results when applied to dataset metadata records simply externalized as a web page. Fortunately, data and search providers have decades of experience in serving data user communities, allowing them to form heuristics that leverage the structure in the metadata together with knowledge about the user community. Some of these heuristics include specific ways of matching the user input to the essential measurements in the dataset and determining overlaps of time range and spatial areas. Heuristics based on the novelty of the datasets can prioritize later, better versions of data over similar predecessors. And knowledge of how different user types and communities use data can be brought to bear in cases where characteristics of the user (discipline, expertise) or their intent (applications, research) can be divined. The Earth Observing System Data and Information System has begun implementing some of these heuristics in the relevancy algorithm of its Common Metadata Repository search engine.

  2. Heuristics for Relevancy Ranking of Earth Dataset Search Results

    NASA Technical Reports Server (NTRS)

    Lynnes, Christopher; Quinn, Patrick; Norton, James

    2016-01-01

    As the Variety of Earth science datasets increases, science researchers find it more challenging to discover and select the datasets that best fit their needs. The most common way for search providers to address this problem is to rank the datasets returned for a query by their likely relevance to the user. Large web page search engines typically use text matching supplemented with reverse link counts, semantic annotations and user intent modeling. However, this produces uneven results when applied to dataset metadata records simply externalized as a web page. Fortunately, data and search providers have decades of experience in serving data user communities, allowing them to form heuristics that leverage the structure in the metadata together with knowledge about the user community. Some of these heuristics include specific ways of matching the user input to the essential measurements in the dataset and determining overlaps of time range and spatial areas. Heuristics based on the novelty of the datasets can prioritize later, better versions of data over similar predecessors. And knowledge of how different user types and communities use data can be brought to bear in cases where characteristics of the user (discipline, expertise) or their intent (applications, research) can be divined. The Earth Observing System Data and Information System has begun implementing some of these heuristics in the relevancy algorithm of its Common Metadata Repository search engine.
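Heuristics like these are typically combined into a weighted score per dataset. The sketch below uses illustrative weights and field names of our own; it is not the actual Common Metadata Repository ranking algorithm:

```python
def time_overlap(q_start, q_end, d_start, d_end):
    """Fraction of the query interval covered by the dataset's temporal extent."""
    overlap = max(0.0, min(q_end, d_end) - max(q_start, d_start))
    span = q_end - q_start
    return overlap / span if span > 0 else 0.0

def score(dataset, query):
    """Weighted sum of heuristic relevance signals (weights are illustrative)."""
    kw = 1.0 if query["term"].lower() in dataset["title"].lower() else 0.0
    t = time_overlap(query["start"], query["end"], dataset["start"], dataset["end"])
    v = dataset.get("version", 1) / 10.0   # novelty: prefer later versions
    return 2.0 * kw + 1.0 * t + 0.5 * v
```

A dataset matching the query term and fully covering the requested time range then outranks one that matches neither, regardless of its version.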

  3. Towards interoperable and reproducible QSAR analyses: Exchange of datasets.

    PubMed

    Spjuth, Ola; Willighagen, Egon L; Guha, Rajarshi; Eklund, Martin; Wikberg, Jarl Es

    2010-06-30

    QSAR is a widely used method to relate chemical structures to responses or properties based on experimental observations. Much effort has been made to evaluate and validate the statistical modeling in QSAR, but these analyses treat the dataset as fixed. An overlooked but highly important issue is the validation of the setup of the dataset, which comprises the addition of chemical structures as well as the selection of descriptors and software implementations prior to calculations. This process is hampered by the lack of standards and exchange formats in the field, making it virtually impossible to reproduce and validate analyses and drastically constraining collaborations and re-use of data. We present a step towards standardizing QSAR analyses by defining interoperable and reproducible QSAR datasets, consisting of an open XML format (QSAR-ML) which builds on an open and extensible descriptor ontology. The ontology provides an extensible way of uniquely defining descriptors for use in QSAR experiments, and the exchange format supports multiple versioned implementations of these descriptors. Hence, a dataset described by QSAR-ML makes its setup completely reproducible. We also provide a reference implementation as a set of plugins for Bioclipse which simplifies the setup of QSAR datasets, and allows for exporting in QSAR-ML as well as old-fashioned CSV formats. The implementation facilitates the addition of new descriptor implementations from locally installed software and remote Web services; the latter is demonstrated with REST and XMPP Web services. Standardized QSAR datasets open up new ways to store, query, and exchange data for subsequent analyses. QSAR-ML supports completely reproducible creation of datasets, solving the problems of defining which software components were used and their versions, and the descriptor ontology eliminates confusion regarding descriptors by defining them crisply. This makes it easy to join, extend, and combine datasets and hence work collectively, but
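The idea of a reproducible dataset descriptor can be sketched as a small XML record pairing structures with versioned, ontology-referenced descriptors. The element and attribute names below are illustrative only, not the actual QSAR-ML schema:

```python
import xml.etree.ElementTree as ET

def qsar_dataset_xml(structures, descriptors):
    """Build a schematic QSAR dataset record (illustrative, not real QSAR-ML).

    structures:  list of SMILES strings.
    descriptors: list of (ontology_id, implementation, version) tuples.
    """
    root = ET.Element("qsarDataset")
    s_el = ET.SubElement(root, "structures")
    for smiles in structures:
        ET.SubElement(s_el, "structure", smiles=smiles)
    d_el = ET.SubElement(root, "descriptors")
    for ont_id, impl, version in descriptors:
        # Recording implementation AND version is what makes the setup reproducible
        ET.SubElement(d_el, "descriptor", ontologyRef=ont_id,
                      implementation=impl, version=version)
    return ET.tostring(root, encoding="unicode")
```

Because each descriptor entry pins a specific implementation and version, another group can regenerate exactly the same descriptor matrix from the same record.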

  4. Towards interoperable and reproducible QSAR analyses: Exchange of datasets

    PubMed Central

    2010-01-01

    Background QSAR is a widely used method to relate chemical structures to responses or properties based on experimental observations. Much effort has been made to evaluate and validate the statistical modeling in QSAR, but these analyses treat the dataset as fixed. An overlooked but highly important issue is the validation of the setup of the dataset, which comprises the addition of chemical structures as well as the selection of descriptors and software implementations prior to calculations. This process is hampered by the lack of standards and exchange formats in the field, making it virtually impossible to reproduce and validate analyses and drastically constraining collaborations and re-use of data. Results We present a step towards standardizing QSAR analyses by defining interoperable and reproducible QSAR datasets, consisting of an open XML format (QSAR-ML) which builds on an open and extensible descriptor ontology. The ontology provides an extensible way of uniquely defining descriptors for use in QSAR experiments, and the exchange format supports multiple versioned implementations of these descriptors. Hence, a dataset described by QSAR-ML makes its setup completely reproducible. We also provide a reference implementation as a set of plugins for Bioclipse which simplifies the setup of QSAR datasets, and allows for exporting in QSAR-ML as well as old-fashioned CSV formats. The implementation facilitates the addition of new descriptor implementations from locally installed software and remote Web services; the latter is demonstrated with REST and XMPP Web services. Conclusions Standardized QSAR datasets open up new ways to store, query, and exchange data for subsequent analyses. QSAR-ML supports completely reproducible creation of datasets, solving the problems of defining which software components were used and their versions, and the descriptor ontology eliminates confusion regarding descriptors by defining them crisply. This makes it easy to join, extend, and combine datasets

  5. The National Elevation Dataset

    USGS Publications Warehouse

    Gesch, Dean B.; Oimoen, Michael J.; Greenlee, Susan K.; Nelson, Charles A.; Steuck, Michael J.; Tyler, Dean J.

    2002-01-01

    The NED is a seamless raster dataset from the USGS that fulfills many of the concepts of framework geospatial data as envisioned for the NSDI, allowing users to focus on analysis rather than data preparation. It is regularly maintained and updated, and it provides basic elevation data for many GIS applications. The NED is one of several seamless datasets that the USGS is making available through the Web. The techniques and approaches developed for producing, maintaining, and distributing the NED are the type that will be used for implementing the USGS National Map (http://nationalmap.usgs.gov/).

  6. VideoWeb Dataset for Multi-camera Activities and Non-verbal Communication

    NASA Astrophysics Data System (ADS)

    Denina, Giovanni; Bhanu, Bir; Nguyen, Hoang Thanh; Ding, Chong; Kamal, Ahmed; Ravishankar, Chinya; Roy-Chowdhury, Amit; Ivers, Allen; Varda, Brenda

    Human-activity recognition is one of the most challenging problems in computer vision. Researchers from around the world have tried to solve this problem and have come a long way in recognizing simple motions and atomic activities. As the computer vision community heads toward fully recognizing human activities, a challenging and labeled dataset is needed. To respond to that need, we collected a dataset of realistic scenarios in a multi-camera network environment (VideoWeb) involving multiple persons performing dozens of different repetitive and non-repetitive activities. This chapter describes the details of the dataset. We believe that this VideoWeb Activities dataset is unique and it is one of the most challenging datasets available today. The dataset is publicly available online at http://vwdata.ee.ucr.edu/ along with the data annotation.

  7. Toward Computational Cumulative Biology by Combining Models of Biological Datasets

    PubMed Central

    Faisal, Ali; Peltonen, Jaakko; Georgii, Elisabeth; Rung, Johan; Kaski, Samuel

    2014-01-01

    A main challenge of data-driven sciences is how to make maximal use of the progressively expanding databases of experimental datasets in order to keep research cumulative. We introduce the idea of a modeling-based dataset retrieval engine designed for relating a researcher's experimental dataset to earlier work in the field. The search is (i) data-driven to enable new findings, going beyond the state of the art of keyword searches in annotations, (ii) modeling-driven, to include both biological knowledge and insights learned from data, and (iii) scalable, as it is accomplished without building one unified grand model of all data. Assuming each dataset has been modeled beforehand, by the researchers or automatically by database managers, we apply a rapidly computable and optimizable combination model to decompose a new dataset into contributions from earlier relevant models. By using the data-driven decomposition, we identify a network of interrelated datasets from a large annotated human gene expression atlas. While tissue type and disease were major driving forces for determining relevant datasets, the found relationships were richer, and the model-based search was more accurate than the keyword search; moreover, it recovered biologically meaningful relationships that are not straightforwardly visible from annotations—for instance, between cells in different developmental stages such as thymocytes and T-cells. Data-driven links and citations matched to a large extent; the data-driven links even uncovered corrections to the publication data, as two of the most linked datasets were not highly cited and turned out to have wrong publication entries in the database. PMID:25427176

  8. Toward computational cumulative biology by combining models of biological datasets.

    PubMed

    Faisal, Ali; Peltonen, Jaakko; Georgii, Elisabeth; Rung, Johan; Kaski, Samuel

    2014-01-01

    A main challenge of data-driven sciences is how to make maximal use of the progressively expanding databases of experimental datasets in order to keep research cumulative. We introduce the idea of a modeling-based dataset retrieval engine designed for relating a researcher's experimental dataset to earlier work in the field. The search is (i) data-driven to enable new findings, going beyond the state of the art of keyword searches in annotations, (ii) modeling-driven, to include both biological knowledge and insights learned from data, and (iii) scalable, as it is accomplished without building one unified grand model of all data. Assuming each dataset has been modeled beforehand, by the researchers or automatically by database managers, we apply a rapidly computable and optimizable combination model to decompose a new dataset into contributions from earlier relevant models. By using the data-driven decomposition, we identify a network of interrelated datasets from a large annotated human gene expression atlas. While tissue type and disease were major driving forces for determining relevant datasets, the found relationships were richer, and the model-based search was more accurate than the keyword search; moreover, it recovered biologically meaningful relationships that are not straightforwardly visible from annotations, for instance, between cells in different developmental stages such as thymocytes and T-cells. Data-driven links and citations matched to a large extent; the data-driven links even uncovered corrections to the publication data, as two of the most linked datasets were not highly cited and turned out to have wrong publication entries in the database.
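The decomposition idea, expressing a new dataset as a mixture of contributions from earlier models, can be sketched as a constrained least-squares fit. This is a linear toy version with names of our own; the paper's combination model is probabilistic:

```python
import numpy as np

def decompose(new_data, model_profiles):
    """Weights expressing a new dataset as a combination of earlier models.

    model_profiles: (n_features, n_models) array, one column per earlier model.
    Returns non-negative weights normalized to sum to 1 (a crude stand-in for
    the probabilistic mixture weights used in the actual method).
    """
    w, *_ = np.linalg.lstsq(model_profiles, new_data, rcond=None)
    w = np.clip(w, 0.0, None)          # keep contributions non-negative
    total = w.sum()
    return w / total if total > 0 else w
```

The largest weights then point at the most relevant earlier datasets, which is exactly the retrieval signal the engine ranks by.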

  9. Improving the discoverability, accessibility, and citability of omics datasets: a case report.

    PubMed

    Darlington, Yolanda F; Naumov, Alexey; McOwiti, Apollo; Kankanamge, Wasula H; Becnel, Lauren B; McKenna, Neil J

    2017-03-01

    Although omics datasets represent valuable assets for hypothesis generation, model testing, and data validation, the infrastructure supporting their reuse lacks organization and consistency. Using nuclear receptor signaling transcriptomic datasets as proof of principle, we developed a model to improve the discoverability, accessibility, and citability of published omics datasets. Primary datasets were retrieved from archives, processed to extract data points, then subjected to metadata enrichment and gap filling. The resulting secondary datasets were exposed on responsive web pages to support mining of gene lists, discovery of related datasets, and single-click citation integration with popular reference managers. Automated processes were established to embed digital object identifier-driven links to the secondary datasets in associated journal articles, small molecule and gene-centric databases, and a dataset search engine. Our model creates multiple points of access to reprocessed and reannotated derivative datasets across the digital biomedical research ecosystem, promoting their visibility and usability across disparate research communities. © The Author 2016. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  10. A dataset of forest biomass structure for Eurasia.

    PubMed

    Schepaschenko, Dmitry; Shvidenko, Anatoly; Usoltsev, Vladimir; Lakyda, Petro; Luo, Yunjian; Vasylyshyn, Roman; Lakyda, Ivan; Myklush, Yuriy; See, Linda; McCallum, Ian; Fritz, Steffen; Kraxner, Florian; Obersteiner, Michael

    2017-05-16

    The most comprehensive dataset of in situ destructive sampling measurements of forest biomass in Eurasia has been compiled from a combination of experiments undertaken by the authors and from scientific publications. Biomass is reported as four components: live trees (stem, bark, branches, foliage, roots); understory (above- and below ground); green forest floor (above- and below ground); and coarse woody debris (snags, logs, dead branches of living trees, and dead roots). The dataset consists of 10,351 unique records of sample plots and 9,613 sample trees from ca. 1,200 experiments for the period 1930-2014, where there is overlap between these two sets. The dataset also contains other forest stand parameters such as tree species composition, average age, tree height, growing stock volume, etc., when available. Such a dataset can be used for the development of models of biomass structure, biomass expansion factors, change detection in biomass structure, investigations into biodiversity and species distribution and the biodiversity-productivity relationship, as well as the assessment of the carbon pool and its dynamics, among many others.
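One typical quantity derived from such plot records is a biomass expansion factor relating total live biomass to growing stock volume. A minimal sketch with illustrative field names and invented numbers (the dataset's actual record layout may differ):

```python
def total_live_biomass(record):
    """Sum the live-tree biomass components of a plot record (t/ha)."""
    components = ("stem", "bark", "branches", "foliage", "roots")
    return sum(record.get(c, 0.0) for c in components)

def biomass_expansion_factor(total_biomass_t_ha, growing_stock_m3_ha):
    """BEF: total live biomass (t/ha) per unit growing stock volume (m3/ha)."""
    return total_biomass_t_ha / growing_stock_m3_ha
```

For a plot with 200 t/ha of live biomass and 250 m3/ha of growing stock, this yields a BEF of 0.8 t/m3, the kind of conversion factor used to scale inventory volumes to carbon pools.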

  11. A reanalysis dataset of the South China Sea

    PubMed Central

    Zeng, Xuezhi; Peng, Shiqiu; Li, Zhijin; Qi, Yiquan; Chen, Rongyu

    2014-01-01

    Ocean reanalysis provides a temporally continuous and spatially gridded four-dimensional estimate of the ocean state for a better understanding of the ocean dynamics and its spatial/temporal variability. Here we present a 19-year (1992–2010) high-resolution ocean reanalysis dataset of the upper ocean in the South China Sea (SCS) produced from an ocean data assimilation system. A wide variety of observations, including in-situ temperature/salinity profiles, ship-measured and satellite-derived sea surface temperatures, and sea surface height anomalies from satellite altimetry, are assimilated into the outputs of an ocean general circulation model using a multi-scale incremental three-dimensional variational data assimilation scheme, yielding a daily high-resolution reanalysis dataset of the SCS. Comparisons between the reanalysis and independent observations support the reliability of the dataset. The presented dataset provides the research community of the SCS an important data source for studying the thermodynamic processes of the ocean circulation and meso-scale features in the SCS, including their spatial and temporal variability. PMID:25977803
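The analysis step of such a variational assimilation can be sketched in its textbook single-increment form, x_a = x_b + B H^T (H B H^T + R)^(-1) (y - H x_b). The operational system uses a multi-scale incremental 3D-Var; this is only the core update, with all matrices explicit and dense:

```python
import numpy as np

def analysis_increment(xb, y, H, B, R):
    """One textbook 3D-Var / optimal-interpolation analysis step.

    xb: background state, y: observations, H: observation operator,
    B: background error covariance, R: observation error covariance.
    """
    innovation = y - H @ xb                            # observation-minus-background
    K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)       # gain matrix
    return xb + K @ innovation                         # analysis state
```

With equal background and observation error variances, a single observation pulls the analysis halfway toward it, which is the expected behavior of the update.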

  12. A dataset of forest biomass structure for Eurasia

    NASA Astrophysics Data System (ADS)

    Schepaschenko, Dmitry; Shvidenko, Anatoly; Usoltsev, Vladimir; Lakyda, Petro; Luo, Yunjian; Vasylyshyn, Roman; Lakyda, Ivan; Myklush, Yuriy; See, Linda; McCallum, Ian; Fritz, Steffen; Kraxner, Florian; Obersteiner, Michael

    2017-05-01

    The most comprehensive dataset of in situ destructive sampling measurements of forest biomass in Eurasia has been compiled from a combination of experiments undertaken by the authors and from scientific publications. Biomass is reported as four components: live trees (stem, bark, branches, foliage, roots); understory (above- and belowground); green forest floor (above- and belowground); and coarse woody debris (snags, logs, dead branches of living trees, and dead roots). The dataset consists of 10,351 unique records of sample plots and 9,613 sample trees from ca. 1,200 experiments for the period 1930-2014, with some overlap between the two datasets. The dataset also contains other forest stand parameters such as tree species composition, average age, tree height, and growing stock volume, when available. Such a dataset can be used for the development of models of biomass structure, biomass expansion factors, change detection in biomass structure, investigations into biodiversity and species distribution and the biodiversity-productivity relationship, as well as the assessment of the carbon pool and its dynamics, among many others.

  13. Optimizing tertiary storage organization and access for spatio-temporal datasets

    NASA Technical Reports Server (NTRS)

    Chen, Ling Tony; Rotem, Doron; Shoshani, Arie; Drach, Bob; Louis, Steve; Keating, Meridith

    1994-01-01

    We address in this paper data management techniques for efficiently retrieving requested subsets of large datasets stored on mass storage devices. This problem represents a major bottleneck that can negate the benefits of fast networks, because the time to access a subset from a large dataset stored on a mass storage system is much greater than the time to transmit that subset over a network. This paper focuses on very large spatial and temporal datasets generated by simulation programs in the area of climate modeling, but the techniques developed can be applied to other applications that deal with large multidimensional datasets. The main requirement we have addressed is the efficient access of subsets of information contained within much larger datasets, for the purpose of analysis and interactive visualization. We have developed data partitioning techniques that partition datasets into 'clusters' based on analysis of data access patterns and storage device characteristics. The goal is to minimize the number of clusters read from mass storage systems when subsets are requested. We emphasize in this paper proposed enhancements to current storage server protocols to permit control over physical placement of data on storage devices. We also discuss in some detail the aspects of the interface between the application programs and the mass storage system, as well as a workbench to help scientists design the best reorganization of a dataset for anticipated access patterns.
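
    Why physical layout matters for subset retrieval can be seen with a toy calculation of how many regularly shaped storage chunks a rectangular subset request touches (an illustrative sketch, not the paper's partitioning algorithm):

```python
def chunks_touched(row_range, col_range, chunk_shape):
    """Count how many regular chunks a rectangular subset request reads,
    for a 2-D dataset stored in chunks of shape (rows, cols).
    Ranges are half-open [start, stop)."""
    r0, r1 = row_range
    c0, c1 = col_range
    cr, cc = chunk_shape
    rows = (r1 - 1) // cr - r0 // cr + 1  # chunk rows spanned
    cols = (c1 - 1) // cc - c0 // cc + 1  # chunk columns spanned
    return rows * cols

# An aligned request reads one chunk; the same-sized request shifted
# across a chunk boundary reads two.
aligned = chunks_touched((0, 10), (0, 10), (10, 10))
shifted = chunks_touched((5, 15), (0, 10), (10, 10))
```

    Choosing chunk (cluster) boundaries to match anticipated access patterns minimizes such extra reads, which is the optimization goal described above.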

  14. Sparse Group Penalized Integrative Analysis of Multiple Cancer Prognosis Datasets

    PubMed Central

    Liu, Jin; Huang, Jian; Xie, Yang; Ma, Shuangge

    2014-01-01

    SUMMARY In cancer research, high-throughput profiling studies have been extensively conducted, searching for markers associated with prognosis. Because of the “large d, small n” characteristic, results generated from the analysis of a single dataset can be unsatisfactory. Recent studies have shown that integrative analysis, which simultaneously analyzes multiple datasets, can be more effective than single-dataset analysis and classic meta-analysis. Most existing integrative analyses assume the homogeneity model, which postulates that different datasets share the same set of markers. Several approaches have been designed to reinforce this assumption. In practice, different datasets may differ in terms of patient selection criteria, profiling techniques, and many other aspects. Such differences may make the homogeneity model too restrictive. In this study, we assume the heterogeneity model, under which different datasets are allowed to have different sets of markers. With multiple cancer prognosis datasets, we adopt the AFT (accelerated failure time) model to describe survival. This model may have the lowest computational cost among popular semiparametric survival models. For marker selection, we adopt a sparse group MCP (minimax concave penalty) approach. This approach has an intuitive formulation and can be computed using an effective group coordinate descent algorithm. Simulation studies show that it outperforms existing approaches under both the homogeneity and heterogeneity models. Data analysis further demonstrates the merit of the heterogeneity model and the proposed approach. PMID:23938111
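
    The scalar MCP that the sparse group approach builds on has a simple closed form: quadratic shrinkage near zero that levels off beyond gamma*lambda, so large effects are not over-penalized. A standalone illustration of the penalty function (not the authors' estimation code):

```python
import numpy as np

def mcp(beta, lam, gamma=3.0):
    """Minimax concave penalty (MCP) summed over a coefficient vector.
    For |b| <= gamma*lam: lam*|b| - b^2/(2*gamma); beyond that the
    penalty is flat at gamma*lam^2/2."""
    b = np.abs(np.asarray(beta, dtype=float))
    near_zero = b <= gamma * lam
    pen = np.where(near_zero,
                   lam * b - b**2 / (2 * gamma),   # concave ramp
                   0.5 * gamma * lam**2)           # flat tail
    return pen.sum()
```

    In the sparse group formulation, a penalty of this shape is applied both across groups (datasets) and within groups, which is what allows different datasets to select different markers.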

  15. Assessment of the NASA-USGS Global Land Survey (GLS) Datasets

    USGS Publications Warehouse

    Gutman, Garik; Huang, Chengquan; Chander, Gyanesh; Noojipady, Praveen; Masek, Jeffery G.

    2013-01-01

    The Global Land Survey (GLS) datasets are a collection of orthorectified, cloud-minimized Landsat-type satellite images, providing near complete coverage of the global land area decadally since the early 1970s. The global mosaics are centered on 1975, 1990, 2000, 2005, and 2010, and consist of data acquired from four sensors: Enhanced Thematic Mapper Plus, Thematic Mapper, Multispectral Scanner, and Advanced Land Imager. The GLS datasets have been widely used in land-cover and land-use change studies at local, regional, and global scales. This study evaluates the GLS datasets with respect to their spatial coverage, temporal consistency, geodetic accuracy, radiometric calibration consistency, image completeness, extent of cloud contamination, and residual gaps. In general, the three latest GLS datasets are of a better quality than the GLS-1990 and GLS-1975 datasets, with most of the imagery (85%) having cloud cover of less than 10%, the acquisition years clustered much more tightly around their target years, better co-registration relative to GLS-2000, and better radiometric absolute calibration. Probably the most significant impediment to scientific use of the datasets is the variability of image phenology (i.e., acquisition day of year). This paper provides end-users with an assessment of the quality of the GLS datasets for specific applications, and where possible, suggestions for mitigating their deficiencies.

  16. Brown CA et al 2016 Dataset

    EPA Pesticide Factsheets

    This dataset contains the research described in the following publication: Brown, C.A., D. Sharp, and T. Mochon Collura. 2016. Effect of Climate Change on Water Temperature and Attainment of Water Temperature Criteria in the Yaquina Estuary, Oregon (USA). Estuarine, Coastal and Shelf Science 169:136-146, doi: 10.1016/j.ecss.2015.11.006.

  17. Conducting high-value secondary dataset analysis: an introductory guide and resources.

    PubMed

    Smith, Alexander K; Ayanian, John Z; Covinsky, Kenneth E; Landon, Bruce E; McCarthy, Ellen P; Wee, Christina C; Steinman, Michael A

    2011-08-01

    Secondary analyses of large datasets provide a mechanism for researchers to address high-impact questions that would otherwise be prohibitively expensive and time-consuming to study. This paper presents a guide to assist investigators interested in conducting secondary data analysis, including advice on the process of successful secondary data analysis as well as a brief summary of high-value datasets and online resources for researchers, including the SGIM dataset compendium (www.sgim.org/go/datasets). The same basic research principles that apply to primary data analysis apply to secondary data analysis, including the development of a clear and clinically relevant research question, study sample, appropriate measures, and a thoughtful analytic approach. A real-world case description illustrates key steps: (1) define your research topic and question; (2) select a dataset; (3) get to know your dataset; and (4) structure your analysis and presentation of findings in a way that is clinically meaningful. Secondary dataset analysis is a well-established methodology. Secondary analysis is particularly valuable for junior investigators, who have limited time and resources to demonstrate expertise and productivity.

  18. Generation of openEHR Test Datasets for Benchmarking.

    PubMed

    El Helou, Samar; Karvonen, Tuukka; Yamamoto, Goshiro; Kume, Naoto; Kobayashi, Shinji; Kondo, Eiji; Hiragi, Shusuke; Okamoto, Kazuya; Tamura, Hiroshi; Kuroda, Tomohiro

    2017-01-01

    openEHR is a widely used EHR specification. Given its technology-independent nature, different approaches for implementing openEHR data repositories exist. Public openEHR datasets are needed to conduct benchmark analyses over different implementations. To address their current unavailability, we propose a method for generating openEHR test datasets that can be publicly shared and used.

  19. Parallel task processing of very large datasets

    NASA Astrophysics Data System (ADS)

    Romig, Phillip Richardson, III

    This research concerns the use of distributed computer technologies for the analysis and management of very large datasets. Improvements in sensor technology, an emphasis on global change research, and greater access to data warehouses all increase the number of non-traditional users of remotely sensed data. We present a framework for distributed solutions to the challenges of datasets which exceed the online storage capacity of individual workstations. This framework, called parallel task processing (PTP), incorporates both the task- and data-level parallelism exemplified by many image processing operations. An implementation based on the principles of PTP, called Tricky, is also presented. Additionally, we describe the challenges and practical issues in modeling the performance of parallel task processing with large datasets. We present a mechanism for estimating the running time of each unit of work within a system and an algorithm that uses these estimates to simulate the execution environment and produce estimated runtimes. Finally, we describe and discuss experimental results which validate the design. Specifically, the system (a) is able to perform computation on datasets which exceed the capacity of any one disk, (b) provides reduction of overall computation time as a result of the task distribution even with the additional cost of data transfer and management, and (c) in the simulation mode accurately predicts the performance of the real execution environment.
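
    The runtime-simulation idea (estimate per-task times, then simulate dispatch to produce an estimated overall runtime) can be sketched with a greedy list-scheduling simulator; this is a hypothetical illustration, not the Tricky system's actual performance model:

```python
import heapq

def estimate_makespan(task_times, n_workers):
    """Simulate dispatching each task to the first free worker and
    return the estimated completion time of the whole workload."""
    workers = [0.0] * n_workers       # next-free time per worker
    heapq.heapify(workers)
    for t in task_times:
        free_at = heapq.heappop(workers)   # earliest-available worker
        heapq.heappush(workers, free_at + t)
    return max(workers)

# Four equal tasks on two workers finish in two rounds.
m1 = estimate_makespan([3.0, 3.0, 3.0, 3.0], 2)
# One long task dominates: adding workers cannot beat its length.
m2 = estimate_makespan([5.0, 1.0, 1.0, 1.0], 2)
```

    A simulator of this kind lets per-task estimates (including data-transfer costs) be compared against measured runtimes before committing to a distribution strategy.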

  20. Wind and wave dataset for Matara, Sri Lanka

    NASA Astrophysics Data System (ADS)

    Luo, Yao; Wang, Dongxiao; Priyadarshana Gamage, Tilak; Zhou, Fenghua; Madusanka Widanage, Charith; Liu, Taiwei

    2018-01-01

    We present a continuous in situ hydro-meteorology observational dataset from a set of instruments first deployed in December 2012 in the south of Sri Lanka, facing toward the north Indian Ocean. In these waters, simultaneous records of wind and wave data are sparse due to difficulties in deploying measurement instruments, although the area hosts one of the busiest shipping lanes in the world. This study describes the survey, deployment, and measurements of wind and waves, with the aim of offering future users of the dataset as comprehensive a description as possible. This dataset advances our understanding of the nearshore hydrodynamic processes and wave climate, including sea waves and swells, in the north Indian Ocean. Moreover, it is a valuable resource for ocean model parameterization and validation. The archived dataset (Table 1) is examined in detail, including wave data at two locations with water depths of 20 and 10 m comprising synchronous time series of wind, ocean astronomical tide, air pressure, etc. In addition, we use these wave observations to evaluate the ERA-Interim reanalysis product. Based on Buoy 2 data, the swells are the main component of waves year-round, although monsoons can markedly alter the proportion between swell and wind sea. The dataset (Luo et al., 2017) is publicly available from Science Data Bank (https://doi.org/10.11922/sciencedb.447).

  1. Attributes for NHDPlus Catchments (Version 1.1) for the Conterminous United States: NLCD 2001 Imperviousness

    USGS Publications Warehouse

    Wieczorek, Michael; LaMotte, Andrew E.

    2010-01-01

    This data set represents the mean percent impervious surface from the Imperviousness Layer of the National Land Cover Dataset 2001 (LaMotte and Wieczorek, 2010), compiled for every catchment of NHDPlus for the conterminous United States. The source data set represents imperviousness for the conterminous United States for 2001. The Imperviousness Layer of the National Land Cover Data Set for 2001 was produced through a cooperative project conducted by the Multi-Resolution Land Characteristics (MRLC) Consortium. The MRLC Consortium is a partnership of Federal agencies (http://www.mrlc.gov), consisting of the U.S. Geological Survey (USGS), the National Oceanic and Atmospheric Administration (NOAA), the U.S. Environmental Protection Agency (USEPA), the U.S. Department of Agriculture (USDA), the U.S. Forest Service (USFS), the National Park Service (NPS), the U.S. Fish and Wildlife Service (USFWS), the Bureau of Land Management (BLM), and the USDA Natural Resources Conservation Service (NRCS). The NHDPlus Version 1.1 is an integrated suite of application-ready geospatial datasets that incorporates many of the best features of the National Hydrography Dataset (NHD) and the National Elevation Dataset (NED). The NHDPlus includes a stream network (based on the 1:100,000-scale NHD), improved networking, naming, and value-added attributes (VAAs). NHDPlus also includes elevation-derived catchments (drainage areas) produced using a drainage enforcement technique first widely used in New England, and thus referred to as "the New England Method." This technique involves "burning in" the 1:100,000-scale NHD and, when available, building "walls" using the National Watershed Boundary Dataset (WBD). The resulting modified digital elevation model (HydroDEM) is used to produce hydrologic derivatives that agree with the NHD and WBD. Over the past two years, an interdisciplinary team from the U.S. Geological Survey (USGS), the U.S. Environmental Protection Agency (USEPA), and contractors found that this method produces the best quality NHD catchments using an automated process (USEPA, 2007).

  2. Attributes for NHDPlus Catchments (Version 1.1) for the Conterminous United States: NLCD 2001 Land Use and Land Cover

    USGS Publications Warehouse

    Wieczorek, Michael; LaMotte, Andrew E.

    2010-01-01

    This data set represents the estimated area of land use and land cover from the National Land Cover Dataset 2001 (LaMotte, 2008), compiled for every catchment of NHDPlus for the conterminous United States. The source data set represents land use and land cover for the conterminous United States for 2001. The National Land Cover Data Set for 2001 was produced through a cooperative project conducted by the Multi-Resolution Land Characteristics (MRLC) Consortium. The MRLC Consortium is a partnership of Federal agencies (http://www.mrlc.gov), consisting of the U.S. Geological Survey (USGS), the National Oceanic and Atmospheric Administration (NOAA), the U.S. Environmental Protection Agency (USEPA), the U.S. Department of Agriculture (USDA), the U.S. Forest Service (USFS), the National Park Service (NPS), the U.S. Fish and Wildlife Service (USFWS), the Bureau of Land Management (BLM), and the USDA Natural Resources Conservation Service (NRCS). The NHDPlus Version 1.1 is an integrated suite of application-ready geospatial datasets that incorporates many of the best features of the National Hydrography Dataset (NHD) and the National Elevation Dataset (NED). The NHDPlus includes a stream network (based on the 1:100,000-scale NHD), improved networking, naming, and value-added attributes (VAAs). NHDPlus also includes elevation-derived catchments (drainage areas) produced using a drainage enforcement technique first widely used in New England, and thus referred to as "the New England Method." This technique involves "burning in" the 1:100,000-scale NHD and, when available, building "walls" using the National Watershed Boundary Dataset (WBD). The resulting modified digital elevation model (HydroDEM) is used to produce hydrologic derivatives that agree with the NHD and WBD. Over the past two years, an interdisciplinary team from the U.S. Geological Survey (USGS), the U.S. Environmental Protection Agency (USEPA), and contractors found that this method produces the best quality NHD catchments using an automated process (USEPA, 2007).

  3. Benchmark datasets for phylogenomic pipeline validation, applications for foodborne pathogen surveillance.

    PubMed

    Timme, Ruth E; Rand, Hugh; Shumway, Martin; Trees, Eija K; Simmons, Mustafa; Agarwala, Richa; Davis, Steven; Tillman, Glenn E; Defibaugh-Chavez, Stephanie; Carleton, Heather A; Klimke, William A; Katz, Lee S

    2017-01-01

    As next-generation sequencing technology has advanced, there have been parallel advances in genome-scale analysis programs for determining evolutionary relationships as proxies for epidemiological relationship in public health. Most new programs skip traditional steps of ortholog determination and multi-gene alignment, instead identifying variants across a set of genomes, then summarizing results in a matrix of single-nucleotide polymorphisms or alleles for standard phylogenetic analysis. However, public health authorities need to document the performance of these methods with appropriate and comprehensive datasets so they can be validated for specific purposes, e.g., outbreak surveillance. Here we propose a set of benchmark datasets to be used for comparison and validation of phylogenomic pipelines. We identified four well-documented foodborne pathogen events in which the epidemiology was concordant with routine phylogenomic analyses (reference-based SNP and wgMLST approaches). These are ideal benchmark datasets, as the trees, WGS data, and epidemiological data for each are all in agreement. We have placed these sequence data, sample metadata, and "known" phylogenetic trees in publicly-accessible databases and developed a standard descriptive spreadsheet format describing each dataset. To facilitate easy downloading of these benchmarks, we developed an automated script that uses the standard descriptive spreadsheet format. Our "outbreak" benchmark datasets represent the four major foodborne bacterial pathogens (Listeria monocytogenes, Salmonella enterica, Escherichia coli, and Campylobacter jejuni) and one simulated dataset where the "known tree" can be accurately called the "true tree". The downloading script and associated table files are available on GitHub: https://github.com/WGS-standards-and-analysis/datasets. These five benchmark datasets will help standardize comparison of current and future phylogenomic pipelines, and facilitate important cross

  4. Digital Hydrologic Networks Supporting Applications Related to Spatially Referenced Regression Modeling

    USGS Publications Warehouse

    Brakebill, J.W.; Wolock, D.M.; Terziotti, S.E.

    2011-01-01

    Digital hydrologic networks depicting surface-water pathways and their associated drainage catchments provide a key component to hydrologic analysis and modeling. Collectively, they form common spatial units that can be used to frame the descriptions of aquatic and watershed processes. In addition, they provide the ability to simulate and route the movement of water and associated constituents throughout the landscape. Digital hydrologic networks have evolved from derivatives of mapping products to detailed, interconnected, spatially referenced networks of water pathways, drainage areas, and stream and watershed characteristics. These properties are important because they enhance the ability to spatially evaluate factors that affect the sources and transport of water-quality constituents at various scales. SPAtially Referenced Regressions On Watershed attributes (SPARROW), a process-based/statistical model, relies on a digital hydrologic network in order to establish relations between quantities of monitored contaminant flux, contaminant sources, and the associated physical characteristics affecting contaminant transport. Digital hydrologic networks modified from the River Reach File (RF1) and National Hydrography Dataset (NHD) geospatial datasets provided frameworks for SPARROW in six regions of the conterminous United States. In addition, characteristics of the modified RF1 were used to update estimates of mean-annual streamflow. This produced more current flow estimates for use in SPARROW modeling. © 2011 American Water Resources Association. This article is a U.S. Government work and is in the public domain in the USA.

  5. Using Graph Indices for the Analysis and Comparison of Chemical Datasets.

    PubMed

    Fourches, Denis; Tropsha, Alexander

    2013-10-01

    In cheminformatics, compounds are represented as points in multidimensional space of chemical descriptors. When all pairs of points found within certain distance threshold in the original high dimensional chemistry space are connected by distance-labeled edges, the resulting data structure can be defined as Dataset Graph (DG). We show that, similarly to the conventional description of organic molecules, many graph indices can be computed for DGs as well. We demonstrate that chemical datasets can be effectively characterized and compared by computing simple graph indices such as the average vertex degree or Randic connectivity index. This approach is used to characterize and quantify the similarity between different datasets or subsets of the same dataset (e.g., training, test, and external validation sets used in QSAR modeling). The freely available ADDAGRA program has been implemented to build and visualize DGs. The approach proposed and discussed in this report could be further explored and utilized for different cheminformatics applications such as dataset diversification by acquiring external compounds, dataset processing prior to QSAR modeling, or (dis)similarity modeling of multiple datasets studied in chemical genomics applications. Copyright © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
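
    The Dataset Graph construction is simple enough to sketch directly: connect points whose descriptor-space distance falls below a threshold, then compute graph indices over the result. A minimal illustration with Euclidean distances (the ADDAGRA program itself is not shown here):

```python
import numpy as np

def dataset_graph_indices(points, threshold):
    """Build a Dataset Graph over compounds (rows of `points`) and
    return (average vertex degree, Randic connectivity index)."""
    n = len(points)
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))          # pairwise distances
    adj = (dist < threshold) & ~np.eye(n, dtype=bool)  # no self-loops
    degree = adj.sum(axis=1)
    avg_degree = degree.mean()
    # Randic index: sum over edges (i, j) of 1 / sqrt(deg_i * deg_j)
    randic = sum(1.0 / np.sqrt(degree[i] * degree[j])
                 for i in range(n) for j in range(i + 1, n) if adj[i, j])
    return avg_degree, randic

# Three compounds on a line in a 1-D descriptor space: a path graph.
avg, randic = dataset_graph_indices(np.array([[0.0], [1.0], [2.0]]), 1.5)
```

    Comparing such indices between, say, a training set and an external test set gives a single-number summary of how differently the two sets populate chemistry space.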

  6. Interactive visualization and analysis of multimodal datasets for surgical applications.

    PubMed

    Kirmizibayrak, Can; Yim, Yeny; Wakid, Mike; Hahn, James

    2012-12-01

    Surgeons use information from multiple sources when making surgical decisions. These include volumetric datasets (such as CT, PET, MRI, and their variants), 2D datasets (such as endoscopic videos), and vector-valued datasets (such as computer simulations). Presenting all the information to the user in an effective manner is a challenging problem. In this paper, we present a visualization approach that displays the information from various sources in a single coherent view. The system allows the user to explore and manipulate volumetric datasets, display analysis of dataset values in local regions, combine 2D and 3D imaging modalities and display results of vector-based computer simulations. Several interaction methods are discussed: in addition to traditional interfaces including mouse and trackers, gesture-based natural interaction methods are shown to control these visualizations with real-time performance. An example of a medical application (medialization laryngoplasty) is presented to demonstrate how the combination of different modalities can be used in a surgical setting with our approach.

  7. Five year global dataset: NMC operational analyses (1978 to 1982)

    NASA Technical Reports Server (NTRS)

    Straus, David; Ardizzone, Joseph

    1987-01-01

    This document describes procedures used in assembling a five-year dataset (1978 to 1982) using NMC Operational Analysis data. These procedures entailed replacing missing and unacceptable data in order to arrive at a complete dataset that is continuous in time. In addition, a subjective assessment of the integrity of all data (both preliminary and final) is presented. Documentation on tapes comprising the Five Year Global Dataset is also included.

  8. Exploring patterns enriched in a dataset with contrastive principal component analysis.

    PubMed

    Abid, Abubakar; Zhang, Martin J; Bagaria, Vivek K; Zou, James

    2018-05-30

    Visualization and exploration of high-dimensional data is a ubiquitous challenge across disciplines. Widely used techniques such as principal component analysis (PCA) aim to identify dominant trends in one dataset. However, in many settings we have datasets collected under different conditions, e.g., a treatment and a control experiment, and we are interested in visualizing and exploring patterns that are specific to one dataset. This paper proposes a method, contrastive principal component analysis (cPCA), which identifies low-dimensional structures that are enriched in a dataset relative to comparison data. In a wide variety of experiments, we demonstrate that cPCA with a background dataset enables us to visualize dataset-specific patterns missed by PCA and other standard methods. We further provide a geometric interpretation of cPCA and strong mathematical guarantees. An implementation of cPCA is publicly available, and can be used for exploratory data analysis in many applications where PCA is currently used.
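
    The core computation described above can be sketched in a few lines: take the top eigenvectors of the difference of the two covariance matrices (a minimal sketch of the basic formulation; a full implementation would also handle the choice of the contrast parameter alpha):

```python
import numpy as np

def cpca_directions(target, background, alpha=1.0, n_components=2):
    """Top eigenvectors of C_target - alpha * C_background: directions
    with high variance in the target but low variance in the background."""
    sigma = (np.cov(target, rowvar=False)
             - alpha * np.cov(background, rowvar=False))
    vals, vecs = np.linalg.eigh(sigma)              # ascending eigenvalues
    return vecs[:, np.argsort(vals)[::-1][:n_components]]

# Synthetic demo: feature 2 carries structure only in the target data.
rng = np.random.default_rng(0)
background = rng.normal(size=(500, 5))
target = rng.normal(size=(500, 5))
target[:, 2] += rng.choice([-3.0, 3.0], size=500)
dirs = cpca_directions(target, background, alpha=2.0)
projected = target @ dirs    # 2-D embedding for plotting
```

    Ordinary PCA on the target alone would also find this direction here; the contrastive term matters when the dominant variance in the target is shared with the background and would otherwise mask the dataset-specific signal.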

  9. Attributes for NHDPlus Catchments (Version 1.1) for the Conterminous United States: Level 3 Ecoregions

    USGS Publications Warehouse

    Wieczorek, Michael; LaMotte, Andrew E.

    2010-01-01

    This data set represents the estimated area of level 3 ecological landscape regions (ecoregions), as defined by Omernik (1987), compiled for every catchment of NHDPlus for the conterminous United States. The source data set is Level III Ecoregions of the Continental United States (U.S. Environmental Protection Agency, 2003). The NHDPlus Version 1.1 is an integrated suite of application-ready geospatial datasets that incorporates many of the best features of the National Hydrography Dataset (NHD) and the National Elevation Dataset (NED). The NHDPlus includes a stream network (based on the 1:100,000-scale NHD), improved networking, naming, and value-added attributes (VAAs). NHDPlus also includes elevation-derived catchments (drainage areas) produced using a drainage enforcement technique first widely used in New England, and thus referred to as "the New England Method." This technique involves "burning in" the 1:100,000-scale NHD and, when available, building "walls" using the National Watershed Boundary Dataset (WBD). The resulting modified digital elevation model (HydroDEM) is used to produce hydrologic derivatives that agree with the NHD and WBD. Over the past two years, an interdisciplinary team from the U.S. Geological Survey (USGS), the U.S. Environmental Protection Agency (USEPA), and contractors found that this method produces the best quality NHD catchments using an automated process (USEPA, 2007). The NHDPlus dataset is organized by 18 Production Units that cover the conterminous United States. The NHDPlus version 1.1 data are grouped by the U.S. Geological Survey's Major River Basins (MRBs, Crawford and others, 2006). MRB1, covering the New England and Mid-Atlantic River basins, contains NHDPlus Production Units 1 and 2. MRB2, covering the South Atlantic-Gulf and Tennessee River basins, contains NHDPlus Production Units 3 and 6. MRB3, covering the Great Lakes, Ohio, Upper Mississippi, and Souris-Red-Rainy River basins, contains NHDPlus Production Units 4

  10. Attributes for NHDPlus Catchments (Version 1.1) for the Conterminous United States: Hydrologic Landscape Regions

    USGS Publications Warehouse

    Wieczorek, Michael; LaMotte, Andrew E.

    2010-01-01

    This data set represents the area of Hydrologic Landscape Regions (HLR) compiled for every catchment of NHDPlus for the conterminous United States. The source data set is a 100-meter version of Hydrologic Landscape Regions of the United States (Wolock, 2003). HLR groups watersheds on the basis of similarities in land-surface form, geologic texture, and climate characteristics. The NHDPlus Version 1.1 is an integrated suite of application-ready geospatial datasets that incorporates many of the best features of the National Hydrography Dataset (NHD) and the National Elevation Dataset (NED). The NHDPlus includes a stream network (based on the 1:100,000-scale NHD), improved networking, naming, and value-added attributes (VAAs). NHDPlus also includes elevation-derived catchments (drainage areas) produced using a drainage enforcement technique first widely used in New England, and thus referred to as "the New England Method." This technique involves "burning in" the 1:100,000-scale NHD and, when available, building "walls" using the National Watershed Boundary Dataset (WBD). The resulting modified digital elevation model (HydroDEM) is used to produce hydrologic derivatives that agree with the NHD and WBD. Over the past two years, an interdisciplinary team from the U.S. Geological Survey (USGS), the U.S. Environmental Protection Agency (USEPA), and contractors found that this method produces the best quality NHD catchments using an automated process (USEPA, 2007). The NHDPlus dataset is organized by 18 Production Units that cover the conterminous United States. The NHDPlus version 1.1 data are grouped by the U.S. Geological Survey's Major River Basins (MRBs, Crawford and others, 2006). MRB1, covering the New England and Mid-Atlantic River basins, contains NHDPlus Production Units 1 and 2. MRB2, covering the South Atlantic-Gulf and Tennessee River basins, contains NHDPlus Production Units 3 and 6. MRB3, covering the Great Lakes, Ohio, Upper Mississippi, and Souris

  11. Attributes for NHDPlus Catchments (Version 1.1): Basin Characteristics, 2002

    USGS Publications Warehouse

    Wieczorek, Michael; LaMotte, Andrew E.

    2010-01-01

    This data set represents basin characteristics, compiled for every catchment in NHDPlus for the conterminous United States. These characteristics are basin shape index, stream density, sinuosity, mean elevation, mean slope, and number of road-stream crossings. The source data sets are the U.S. Environmental Protection Agency's NHDPlus and the U.S. Census Bureau's TIGER/Line Files. The NHDPlus Version 1.1 is an integrated suite of application-ready geospatial datasets that incorporates many of the best features of the National Hydrography Dataset (NHD) and the National Elevation Dataset (NED). The NHDPlus includes a stream network (based on the 1:100,000-scale NHD), improved networking, naming, and value-added attributes (VAAs). NHDPlus also includes elevation-derived catchments (drainage areas) produced using a drainage enforcement technique first widely used in New England, and thus referred to as "the New England Method." This technique involves "burning in" the 1:100,000-scale NHD and, when available, building "walls" using the National Watershed Boundary Dataset (WBD). The resulting modified digital elevation model (HydroDEM) is used to produce hydrologic derivatives that agree with the NHD and WBD. Over the past two years, an interdisciplinary team from the U.S. Geological Survey (USGS), the U.S. Environmental Protection Agency (USEPA), and contractors found that this method produces the best quality NHD catchments using an automated process (USEPA, 2007). The NHDPlus dataset is organized by 18 Production Units that cover the conterminous United States. The NHDPlus version 1.1 data are grouped by the U.S. Geological Survey's Major River Basins (MRBs, Crawford and others, 2006). MRB1, covering the New England and Mid-Atlantic River basins, contains NHDPlus Production Units 1 and 2. MRB2, covering the South Atlantic-Gulf and Tennessee River basins, contains NHDPlus Production Units 3 and 6. MRB3, covering the Great Lakes, Ohio, Upper Mississippi, and Souris
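
    One of the listed characteristics, sinuosity, has a simple definition that can be computed directly from a stream polyline: along-channel length divided by the straight-line distance between the endpoints. A generic sketch (not the procedure used to build these attribute tables; coordinates are assumed planar, e.g. projected meters):

```python
import math

def sinuosity(vertices):
    """Channel sinuosity of a polyline given as (x, y) vertex pairs:
    path length / straight-line distance between the endpoints."""
    path = sum(math.dist(a, b) for a, b in zip(vertices, vertices[1:]))
    straight = math.dist(vertices[0], vertices[-1])
    return path / straight

# A straight reach has sinuosity 1; a right-angle bend exceeds 1.
s_straight = sinuosity([(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)])
s_bend = sinuosity([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)])
```

    Catchment-level values like those in this data set would aggregate such per-reach measures over every flowline in the catchment.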

  12. Attributes for NHDPlus Catchments (Version 1.1) for the Conterminous United States: Base-Flow Index

    USGS Publications Warehouse

    Wieczorek, Michael; LaMotte, Andrew E.

    2010-01-01

    This tabular data set represents the mean base-flow index, expressed as a percent, compiled for every catchment in NHDPlus for the conterminous United States. Base flow is the component of streamflow that can be attributed to ground-water discharge into streams. The source data set is Base-Flow Index for the Conterminous United States (Wolock, 2003). The NHDPlus Version 1.1 is an integrated suite of application-ready geospatial datasets that incorporates many of the best features of the National Hydrography Dataset (NHD) and the National Elevation Dataset (NED). The NHDPlus includes a stream network (based on the 1:100,000-scale NHD), improved networking, naming, and value-added attributes (VAAs). NHDPlus also includes elevation-derived catchments (drainage areas) produced using a drainage enforcement technique first widely used in New England, and thus referred to as "the New England Method." This technique involves "burning in" the 1:100,000-scale NHD and, when available, building "walls" using the National Watershed Boundary Dataset (WBD). The resulting modified digital elevation model (HydroDEM) is used to produce hydrologic derivatives that agree with the NHD and WBD. Over the past two years, an interdisciplinary team from the U.S. Geological Survey (USGS), the U.S. Environmental Protection Agency (USEPA), and contractors found that this method produces the best quality NHD catchments using an automated process (USEPA, 2007). The NHDPlus dataset is organized by 18 Production Units that cover the conterminous United States. The NHDPlus version 1.1 data are grouped by the U.S. Geological Survey's Major River Basins (MRBs, Crawford and others, 2006). MRB1, covering the New England and Mid-Atlantic River basins, contains NHDPlus Production Units 1 and 2. MRB2, covering the South Atlantic-Gulf and Tennessee River basins, contains NHDPlus Production Units 3 and 6. MRB3, covering the Great Lakes, Ohio, Upper Mississippi, and Souris-Red-Rainy River basins, contains
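    Compiling a gridded attribute like the base-flow index into a mean value per NHDPlus catchment is a zonal-statistics pass over two aligned grids. The grid values and catchment identifiers below are hypothetical, chosen only to show the mechanics:

    ```python
    import numpy as np

    # Hypothetical grid of base-flow index values (percent) and a matching
    # grid of NHDPlus catchment identifiers (COMID-like integer labels).
    bfi = np.array([[40., 42., 55.],
                    [41., 50., 60.],
                    [45., 52., 58.]])
    catchment = np.array([[1, 1, 2],
                          [1, 2, 2],
                          [3, 3, 2]])

    # Zonal mean: sum of values per label divided by cell count per label.
    labels = np.unique(catchment)
    sums = np.bincount(catchment.ravel(), weights=bfi.ravel())
    counts = np.bincount(catchment.ravel())
    mean_bfi = {int(c): sums[c] / counts[c] for c in labels}
    print(mean_bfi)  # {1: 41.0, 2: 55.75, 3: 48.5}
    ```

    The same pattern (swap `bfi` for a deposition, recharge, or population-density grid) covers the other attribute tables in this series.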

  13. Attributes for NHDPlus Catchments (Version 1.1) for the Conterminous United States: Average Atmospheric (Wet) Deposition of Inorganic Nitrogen, 2002

    USGS Publications Warehouse

    Wieczorek, Michael; LaMotte, Andrew E.

    2010-01-01

    This data set represents the average atmospheric (wet) deposition, in kilograms per square kilometer, of inorganic nitrogen for the year 2002, compiled for every catchment of NHDPlus for the conterminous United States. The source data set for wet deposition was the USGS's raster data set of atmospheric (wet) deposition of inorganic nitrogen for 2002 (Gronberg, 2005). The NHDPlus Version 1.1 is an integrated suite of application-ready geospatial datasets that incorporates many of the best features of the National Hydrography Dataset (NHD) and the National Elevation Dataset (NED). The NHDPlus includes a stream network (based on the 1:100,000-scale NHD), improved networking, naming, and value-added attributes (VAAs). NHDPlus also includes elevation-derived catchments (drainage areas) produced using a drainage enforcement technique first widely used in New England, and thus referred to as "the New England Method." This technique involves "burning in" the 1:100,000-scale NHD and, when available, building "walls" using the National Watershed Boundary Dataset (WBD). The resulting modified digital elevation model (HydroDEM) is used to produce hydrologic derivatives that agree with the NHD and WBD. Over the past two years (2007-2008), an interdisciplinary team from the U.S. Geological Survey (USGS), the U.S. Environmental Protection Agency (USEPA), and contractors found that this method produces the best quality NHD catchments using an automated process (USEPA, 2007). The NHDPlus dataset is organized by 18 Production Units that cover the conterminous United States. The NHDPlus version 1.1 data are grouped by the U.S. Geological Survey's Major River Basins (MRBs, Crawford and others, 2006). MRB1, covering the New England and Mid-Atlantic River basins, contains NHDPlus Production Units 1 and 2. MRB2, covering the South Atlantic-Gulf and Tennessee River basins, contains NHDPlus Production Units 3 and 6. MRB3, covering the Great Lakes, Ohio, Upper Mississippi, and Souris

  14. Attributes for NHDPlus Catchments (Version 1.1) for the Conterminous United States: Nutrient Inputs from Fertilizer and Manure, Nitrogen and Phosphorus (N&P), 2002

    USGS Publications Warehouse

    Wieczorek, Michael; LaMotte, Andrew E.

    2010-01-01

    This data set represents the estimated amount of nitrogen and phosphorus, in kilograms, for the year 2002, compiled for every catchment of NHDPlus for the conterminous United States. The source data set is County-Level Estimates of Nutrient Inputs to the Land Surface of the Conterminous United States, 1982-2001 (Ruddy and others, 2006). The NHDPlus Version 1.1 is an integrated suite of application-ready geospatial datasets that incorporates many of the best features of the National Hydrography Dataset (NHD) and the National Elevation Dataset (NED). The NHDPlus includes a stream network (based on the 1:100,000-scale NHD), improved networking, naming, and value-added attributes (VAAs). NHDPlus also includes elevation-derived catchments (drainage areas) produced using a drainage enforcement technique first widely used in New England, and thus referred to as "the New England Method." This technique involves "burning in" the 1:100,000-scale NHD and, when available, building "walls" using the National Watershed Boundary Dataset (WBD). The resulting modified digital elevation model (HydroDEM) is used to produce hydrologic derivatives that agree with the NHD and WBD. Over the past two years, an interdisciplinary team from the U.S. Geological Survey (USGS), the U.S. Environmental Protection Agency (USEPA), and contractors found that this method produces the best quality NHD catchments using an automated process (USEPA, 2007). The NHDPlus dataset is organized by 18 Production Units that cover the conterminous United States. The NHDPlus version 1.1 data are grouped by the U.S. Geological Survey's Major River Basins (MRBs, Crawford and others, 2006). MRB1, covering the New England and Mid-Atlantic River basins, contains NHDPlus Production Units 1 and 2. MRB2, covering the South Atlantic-Gulf and Tennessee River basins, contains NHDPlus Production Units 3 and 6. MRB3, covering the Great Lakes, Ohio, Upper Mississippi, and Souris-Red-Rainy River basins, contains NHDPlus Production

  15. Attributes for NHDPlus catchments (version 1.1) for the conterminous United States: surficial geology

    USGS Publications Warehouse

    Wieczorek, Michael; LaMotte, Andrew E.

    2010-01-01

    This data set represents the area of surficial geology types, in square meters, compiled for every catchment of NHDPlus for the conterminous United States. The source data set is the "Digital data set describing surficial geology in the conterminous US" (Clawges and Price, 1999). The NHDPlus Version 1.1 is an integrated suite of application-ready geospatial datasets that incorporates many of the best features of the National Hydrography Dataset (NHD) and the National Elevation Dataset (NED). The NHDPlus includes a stream network (based on the 1:100,000-scale NHD), improved networking, naming, and value-added attributes (VAAs). NHDPlus also includes elevation-derived catchments (drainage areas) produced using a drainage enforcement technique first widely used in New England, and thus referred to as "the New England Method." This technique involves "burning in" the 1:100,000-scale NHD and, when available, building "walls" using the National Watershed Boundary Dataset (WBD). The resulting modified digital elevation model (HydroDEM) is used to produce hydrologic derivatives that agree with the NHD and WBD. Over the past two years, an interdisciplinary team from the U.S. Geological Survey (USGS), the U.S. Environmental Protection Agency (USEPA), and contractors found that this method produces the best quality NHD catchments using an automated process (USEPA, 2007). The NHDPlus dataset is organized by 18 Production Units that cover the conterminous United States. The NHDPlus version 1.1 data are grouped by the U.S. Geological Survey's Major River Basins (MRBs, Crawford and others, 2006). MRB1, covering the New England and Mid-Atlantic River basins, contains NHDPlus Production Units 1 and 2. MRB2, covering the South Atlantic-Gulf and Tennessee River basins, contains NHDPlus Production Units 3 and 6. MRB3, covering the Great Lakes, Ohio, Upper Mississippi, and Souris-Red-Rainy River basins, contains NHDPlus Production Units 4, 5, 7 and 9. MRB4, covering the Missouri River

  16. Attributes for NHDPlus Catchments (Version 1.1) for the Conterminous United States: Estimated Mean Annual Natural Groundwater Recharge, 2002

    USGS Publications Warehouse

    Wieczorek, Michael; LaMotte, Andrew E.

    2010-01-01

    This data set represents the mean annual natural groundwater recharge, in millimeters, compiled for every catchment of NHDPlus for the conterminous United States. The source data set is Estimated Mean Annual Natural Ground-Water Recharge in the Conterminous United States (Wolock, 2003). The NHDPlus Version 1.1 is an integrated suite of application-ready geospatial datasets that incorporates many of the best features of the National Hydrography Dataset (NHD) and the National Elevation Dataset (NED). The NHDPlus includes a stream network (based on the 1:100,000-scale NHD), improved networking, naming, and value-added attributes (VAAs). NHDPlus also includes elevation-derived catchments (drainage areas) produced using a drainage enforcement technique first widely used in New England, and thus referred to as "the New England Method." This technique involves "burning in" the 1:100,000-scale NHD and, when available, building "walls" using the National Watershed Boundary Dataset (WBD). The resulting modified digital elevation model (HydroDEM) is used to produce hydrologic derivatives that agree with the NHD and WBD. Over the past two years, an interdisciplinary team from the U.S. Geological Survey (USGS), the U.S. Environmental Protection Agency (USEPA), and contractors found that this method produces the best quality NHD catchments using an automated process (USEPA, 2007). The NHDPlus dataset is organized by 18 Production Units that cover the conterminous United States. The NHDPlus version 1.1 data are grouped by the U.S. Geological Survey's Major River Basins (MRBs, Crawford and others, 2006). MRB1, covering the New England and Mid-Atlantic River basins, contains NHDPlus Production Units 1 and 2. MRB2, covering the South Atlantic-Gulf and Tennessee River basins, contains NHDPlus Production Units 3 and 6. MRB3, covering the Great Lakes, Ohio, Upper Mississippi, and Souris-Red-Rainy River basins, contains NHDPlus Production Units 4, 5, 7 and 9. MRB4, covering the

  17. Attributes for NHDPlus Catchments (Version 1.1) for the Conterminous United States: Population Density, 2000

    USGS Publications Warehouse

    Wieczorek, Michael; LaMotte, Andrew E.

    2010-01-01

    This data set represents the average population density, in number of people per square kilometer multiplied by 10, for the year 2000, compiled for every catchment of NHDPlus for the conterminous United States. The source data set is the 2000 Population Density by Block Group for the Conterminous United States (Hitt, 2003). The NHDPlus Version 1.1 is an integrated suite of application-ready geospatial datasets that incorporates many of the best features of the National Hydrography Dataset (NHD) and the National Elevation Dataset (NED). The NHDPlus includes a stream network (based on the 1:100,000-scale NHD), improved networking, naming, and value-added attributes (VAAs). NHDPlus also includes elevation-derived catchments (drainage areas) produced using a drainage enforcement technique first widely used in New England, and thus referred to as "the New England Method." This technique involves "burning in" the 1:100,000-scale NHD and, when available, building "walls" using the National Watershed Boundary Dataset (WBD). The resulting modified digital elevation model (HydroDEM) is used to produce hydrologic derivatives that agree with the NHD and WBD. Over the past two years, an interdisciplinary team from the U.S. Geological Survey (USGS), the U.S. Environmental Protection Agency (USEPA), and contractors found that this method produces the best quality NHD catchments using an automated process (USEPA, 2007). The NHDPlus dataset is organized by 18 Production Units that cover the conterminous United States. The NHDPlus version 1.1 data are grouped by the U.S. Geological Survey's Major River Basins (MRBs, Crawford and others, 2006). MRB1, covering the New England and Mid-Atlantic River basins, contains NHDPlus Production Units 1 and 2. MRB2, covering the South Atlantic-Gulf and Tennessee River basins, contains NHDPlus Production Units 3 and 6. MRB3, covering the Great Lakes, Ohio, Upper Mississippi, and Souris-Red-Rainy River basins, contains NHDPlus Production Units 4, 5

  18. Attributes for NHDPlus Catchments (Version 1.1) for the Conterminous United States: Nutrient Application (Phosphorus and Nitrogen) for Fertilizer and Manure Applied to Crops (Cropsplit), 2002

    USGS Publications Warehouse

    Wieczorek, Michael; LaMotte, Andrew E.

    2010-01-01

    This data set represents the estimated amount of phosphorus and nitrogen fertilizers applied to selected crops for the year 2002, compiled for every catchment of NHDPlus for the conterminous United States. The source data set is based on 2002 fertilizer data (Ruddy and others, 2006) and tabulated by crop type per county (Alexander and others, 2007). The NHDPlus Version 1.1 is an integrated suite of application-ready geospatial datasets that incorporates many of the best features of the National Hydrography Dataset (NHD) and the National Elevation Dataset (NED). The NHDPlus includes a stream network (based on the 1:100,000-scale NHD), improved networking, naming, and value-added attributes (VAAs). NHDPlus also includes elevation-derived catchments (drainage areas) produced using a drainage enforcement technique first widely used in New England, and thus referred to as "the New England Method." This technique involves "burning in" the 1:100,000-scale NHD and, when available, building "walls" using the National Watershed Boundary Dataset (WBD). The resulting modified digital elevation model (HydroDEM) is used to produce hydrologic derivatives that agree with the NHD and WBD. Over the past two years, an interdisciplinary team from the U.S. Geological Survey (USGS), the U.S. Environmental Protection Agency (USEPA), and contractors found that this method produces the best quality NHD catchments using an automated process (USEPA, 2007). The NHDPlus dataset is organized by 18 Production Units that cover the conterminous United States. The NHDPlus version 1.1 data are grouped by the U.S. Geological Survey's Major River Basins (MRBs, Crawford and others, 2006). MRB1, covering the New England and Mid-Atlantic River basins, contains NHDPlus Production Units 1 and 2. MRB2, covering the South Atlantic-Gulf and Tennessee River basins, contains NHDPlus Production Units 3 and 6. MRB3, covering the Great Lakes, Ohio, Upper Mississippi, and Souris-Red-Rainy River basins, contains

  19. Total Ozone Trends from 1979 to 2016 Derived from Five Merged Observational Datasets - The Emergence into Ozone Recovery

    NASA Technical Reports Server (NTRS)

    Weber, Mark; Coldewey-Egbers, Melanie; Fioletov, Vitali E.; Frith, Stacey M.; Wild, Jeannette D.; Burrows, John P.; Loyola, Diego

    2018-01-01

    We report on updated trends using different merged datasets from satellite and ground-based observations for the period from 1979 to 2016. Trends were determined by applying a multiple linear regression (MLR) to annual mean zonal mean data. Merged datasets used here include NASA MOD v8.6 and National Oceanic and Atmospheric Administration (NOAA) merge v8.6, both based on data from the series of Solar Backscatter UltraViolet (SBUV) and SBUV-2 satellite instruments (1978–present), as well as the Global Ozone Monitoring Experiment (GOME)-type Total Ozone (GTO) and GOME-SCIAMACHY-GOME-2 (GSG) merged datasets (1995-present), mainly comprising satellite data from GOME, the Scanning Imaging Absorption Spectrometer for Atmospheric Chartography (SCIAMACHY), and GOME-2A. The fifth dataset consists of the monthly mean zonal mean data from ground-based measurements collected at the World Ozone and UV Data Center (WOUDC). The addition of four more years of data (2013-2016) since the last World Meteorological Organization (WMO) ozone assessment shows that for most datasets and regions the trends since stratospheric halogen loading reached its maximum (approximately 1996 globally and approximately 2000 in polar regions) are mostly not significantly different from zero. However, for some latitudes, in particular the Southern Hemisphere extratropics and Northern Hemisphere subtropics, several datasets show small positive trends of slightly below +1 percent decade(exp. -1) that are barely statistically significant at the 2-sigma uncertainty level. In the tropics, only two datasets show significant trends of +0.5 to +0.8 percent decade(exp. -1), while the others show near-zero trends. Positive trends since 2000 have been observed over Antarctica in September, but near-zero trends are found in October, as well as in March over the Arctic. Uncertainties due to possible drifts between the datasets, from the merging procedure used to combine satellite datasets and related to the low sampling of
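    The trend estimation described here can be sketched in miniature: fit a straight line to annual mean anomalies and apply the 2-sigma significance criterion. The data below are synthetic, and a real MLR would also include explanatory proxies (solar cycle, QBO, aerosol) that are omitted in this sketch:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    years = np.arange(2000, 2017)  # post-turnaround period
    # Synthetic annual-mean total ozone anomalies (percent): a small
    # +0.5 %/decade trend buried in year-to-year noise.
    ozone = 0.05 * (years - years[0]) + rng.normal(0.0, 0.8, years.size)

    # Ordinary least squares: anomaly = a + b * (t - mean(t))
    X = np.column_stack([np.ones(years.size), years - years.mean()])
    coef, res, *_ = np.linalg.lstsq(X, ozone, rcond=None)
    b = coef[1]                               # trend, percent per year
    dof = years.size - 2
    sigma2 = res[0] / dof                     # residual variance
    se_b = np.sqrt(sigma2 / np.sum(X[:, 1] ** 2))

    trend_per_decade = 10 * b
    significant = abs(b) > 2 * se_b           # the 2-sigma criterion
    print(f"trend = {trend_per_decade:+.2f} %/decade, significant: {significant}")
    ```

    With noise of this size and only 17 annual means, the fitted trend often fails the 2-sigma test even when the underlying slope is positive, which mirrors the "barely significant" language of the abstract.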

  20. GUDM: Automatic Generation of Unified Datasets for Learning and Reasoning in Healthcare.

    PubMed

    Ali, Rahman; Siddiqi, Muhammad Hameed; Idris, Muhammad; Ali, Taqdir; Hussain, Shujaat; Huh, Eui-Nam; Kang, Byeong Ho; Lee, Sungyoung

    2015-07-02

    A wide array of biomedical data are generated and made available to healthcare experts. However, due to the diverse nature of these data, it is difficult to predict outcomes from them. It is therefore necessary to combine these diverse data sources into a single unified dataset. This paper proposes a global unified data model (GUDM) to provide a global unified data structure for all data sources and generate a unified dataset by a "data modeler" tool. The proposed tool implements a user-centric, priority-based approach which can easily resolve the problems of unified data modeling and overlapping attributes across multiple datasets. The tool is illustrated using sample diabetes mellitus data. The diverse data sources used to generate the unified dataset for diabetes mellitus include clinical trial information, a social media interaction dataset, and physical activity data collected using different sensors. To realize the significance of the unified dataset, we adopted a well-known rough set theory based rules creation process to create rules from the unified dataset. The evaluation of the tool on six different sets of locally created diverse datasets shows that the tool, on average, reduces the time and effort of experts and knowledge engineers by 94.1% while creating unified datasets.
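    The user-centric, priority-based resolution of overlapping attributes can be sketched as follows. The source names, fields, and priority order are hypothetical illustrations, not the GUDM schema:

    ```python
    # Priority-based attribute resolution when merging records for one
    # patient from several sources; on overlap, the value from the
    # highest-priority source wins. All names here are hypothetical.
    SOURCE_PRIORITY = ["clinical_trial", "sensor", "social_media"]  # high -> low

    def unify(records):
        """Merge per-source dicts into one unified record."""
        unified = {}
        for source in reversed(SOURCE_PRIORITY):      # low priority first,
            unified.update(records.get(source, {}))   # higher sources overwrite
        return unified

    records = {
        "clinical_trial": {"glucose": 7.1, "diagnosis": "type 2 diabetes"},
        "sensor":         {"glucose": 6.8, "steps": 4200},
        "social_media":   {"mood": "tired", "steps": 0},
    }
    print(unify(records))
    # glucose comes from the clinical trial, steps from the sensor
    ```

    A user-centric tool would let the expert reorder `SOURCE_PRIORITY` per attribute rather than fixing one global order as this sketch does.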

  1. GUDM: Automatic Generation of Unified Datasets for Learning and Reasoning in Healthcare

    PubMed Central

    Ali, Rahman; Siddiqi, Muhammad Hameed; Idris, Muhammad; Ali, Taqdir; Hussain, Shujaat; Huh, Eui-Nam; Kang, Byeong Ho; Lee, Sungyoung

    2015-01-01

    A wide array of biomedical data are generated and made available to healthcare experts. However, due to the diverse nature of these data, it is difficult to predict outcomes from them. It is therefore necessary to combine these diverse data sources into a single unified dataset. This paper proposes a global unified data model (GUDM) to provide a global unified data structure for all data sources and generate a unified dataset by a “data modeler” tool. The proposed tool implements a user-centric, priority-based approach which can easily resolve the problems of unified data modeling and overlapping attributes across multiple datasets. The tool is illustrated using sample diabetes mellitus data. The diverse data sources used to generate the unified dataset for diabetes mellitus include clinical trial information, a social media interaction dataset, and physical activity data collected using different sensors. To realize the significance of the unified dataset, we adopted a well-known rough set theory based rules creation process to create rules from the unified dataset. The evaluation of the tool on six different sets of locally created diverse datasets shows that the tool, on average, reduces the time and effort of experts and knowledge engineers by 94.1% while creating unified datasets. PMID:26147731

  2. A database of georeferenced nutrient chemistry data for mountain lakes of the Western United States

    PubMed Central

    Williams, Jason; Labou, Stephanie G.

    2017-01-01

    Human activities have increased atmospheric nitrogen and phosphorus deposition rates relative to pre-industrial background. In the Western U.S., anthropogenic nutrient deposition has increased nutrient concentrations and stimulated algal growth in at least some remote mountain lakes. The Georeferenced Lake Nutrient Chemistry (GLNC) Database was constructed to create a spatially-extensive lake chemistry database needed to assess atmospheric nutrient deposition effects on Western U.S. mountain lakes. The database includes nitrogen and phosphorus water chemistry data spanning 1964–2015, with 148,336 chemistry results from 51,048 samples collected across 3,602 lakes in the Western U.S. Data were obtained from public databases, government agencies, scientific literature, and researchers, and were formatted into a consistent table structure. All data are georeferenced to a modified version of the National Hydrography Dataset Plus version 2. The database is transparent and reproducible; R code and input files used to format data are provided in an appendix. The database will likely be useful to those assessing spatial patterns of lake nutrient chemistry associated with atmospheric deposition or other environmental stressors. PMID:28509907

  3. Partial polygon pruning of hydrographic features in automated generalization

    USGS Publications Warehouse

    Stum, Alexander K.; Buttenfield, Barbara P.; Stanislawski, Larry V.

    2017-01-01

    This paper demonstrates a working method to automatically detect and prune portions of waterbody polygons to support creation of a multi-scale hydrographic database. Water features are known to be sensitive to scale change; and thus multiple representations are required to maintain visual and geographic logic at smaller scales. Partial pruning of polygonal features—such as long and sinuous reservoir arms, stream channels that are too narrow at the target scale, and islands that begin to coalesce—entails concurrent management of the length and width of polygonal features as well as integrating pruned polygons with other generalized point and linear hydrographic features to maintain stream network connectivity. The implementation follows data representation standards developed by the U.S. Geological Survey (USGS) for the National Hydrography Dataset (NHD). Portions of polygonal rivers, streams, and canals are automatically characterized for width, length, and connectivity. This paper describes an algorithm for automatic detection and subsequent processing, and shows results for a sample of NHD subbasins in different landscape conditions in the United States.
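    The width test at the heart of partial pruning can be sketched with a common proxy: mean polygon width estimated as area divided by centerline length, compared against the minimum width displayable at the target scale. The 0.4 mm minimum graphic width and the feature numbers below are illustrative assumptions, not the USGS production thresholds:

    ```python
    # Minimal width test for partial polygon pruning; parameters are
    # illustrative assumptions, not NHD generalization specifications.
    MIN_MM_AT_SCALE = 0.4  # smallest graphic width legible on the map

    def min_ground_width(scale_denominator):
        """Smallest river width (m) displayable at the target scale."""
        return MIN_MM_AT_SCALE / 1000.0 * scale_denominator

    def should_prune(area_m2, centerline_m, scale_denominator):
        """Collapse a polygon reach to a line if its estimated mean
        width falls below the displayable minimum."""
        est_width = area_m2 / centerline_m  # mean-width proxy
        return est_width < min_ground_width(scale_denominator)

    # A reservoir arm 3 km long with 60,000 m2 of area: ~20 m mean width.
    print(min_ground_width(100_000))             # 40.0 m at 1:100,000
    print(should_prune(60_000, 3_000, 100_000))  # True: collapse to line
    print(should_prune(60_000, 3_000, 24_000))   # False at 1:24,000 (9.6 m)
    ```

    In the full workflow, the segment collapsed to a line must then be reconnected to the surrounding generalized network so stream connectivity survives the pruning, which is the harder part the paper addresses.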

  4. Vulnerable transportation and utility assets near actively migrating streams in Indiana

    USGS Publications Warehouse

    Sperl, Benjamin J.

    2017-11-02

    An investigation was completed by the U.S. Geological Survey, in cooperation with the Indiana Office of Community and Rural Affairs, that found 1,132 transportation and utility assets in Indiana are vulnerable to fluvial erosion hazards due to close proximity to actively migrating streams. Locations of transportation assets (bridges, roadways, and railroad lines) and selected utility assets (high-capacity overhead power-transmission lines, underground pipelines, water treatment facilities, and in-channel dams) were determined using aerial imagery hosted by the Google Earth platform. Identified assets were aggregated by stream reach, county, and class. Accompanying the report is a polyline shapefile of the stream reaches documented by Robinson. The shapefile, derived from line work in the National Hydrography Dataset and attributed with channel migration rates, is released with complete Federal Geographic Data Committee metadata. The data presented in this report are intended to help stakeholders and others identify high-risk areas where transportation and utility assets may be threatened by fluvial erosion hazards, thus warranting consideration for mitigation strategies.

  5. Database for the geologic map of Upper Geyser Basin, Yellowstone National Park, Wyoming

    USGS Publications Warehouse

    Abendini, Atosa A.; Robinson, Joel E.; Muffler, L. J. Patrick; White, D. E.; Beeson, Melvin H.; Truesdell, A. H.

    2015-01-01

    This dataset contains contacts, geologic units, and map boundaries from Miscellaneous Investigations Series Map I-1371, "The Geologic Map of Upper Geyser Basin, Yellowstone National Park, Wyoming". This dataset was constructed to produce a digital geologic map as a basis for ongoing studies of hydrothermal processes.

  6. Geoseq: a tool for dissecting deep-sequencing datasets.

    PubMed

    Gurtowski, James; Cancio, Anthony; Shah, Hardik; Levovitz, Chaya; George, Ajish; Homann, Robert; Sachidanandam, Ravi

    2010-10-12

    Datasets generated on deep-sequencing platforms have been deposited in various public repositories such as the Gene Expression Omnibus (GEO) and Sequence Read Archive (SRA) hosted by the NCBI, or the DNA Data Bank of Japan (DDBJ). Despite being rich data sources, they have not been used much due to the difficulty in locating and analyzing datasets of interest. Geoseq http://geoseq.mssm.edu provides a new method of analyzing short reads from deep sequencing experiments. Instead of mapping the reads to reference genomes or sequences, Geoseq maps a reference sequence against the sequencing data. It is web-based, and holds pre-computed data from public libraries. The analysis reduces the input sequence to tiles and measures the coverage of each tile in a sequence library through the use of suffix arrays. The user can upload custom target sequences or use gene/miRNA names for the search and get back results as plots and spreadsheet files. Geoseq organizes the public sequencing data using a controlled vocabulary, allowing identification of relevant libraries by organism, tissue, and type of experiment. Analysis of small sets of sequences against deep-sequencing datasets, as well as identification of public datasets of interest, is simplified by Geoseq. We applied Geoseq to (a) identify differential isoform expression in mRNA-seq datasets, (b) identify miRNAs (microRNAs) in libraries and identify mature and star sequences in miRNAs, and (c) identify potentially mis-annotated miRNAs. The ease of using Geoseq for these analyses suggests its utility and uniqueness as an analysis tool.
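    The tile-coverage idea (reduce the reference to overlapping tiles, then count how many reads in a library contain each tile) can be shown on toy data. Geoseq does the lookup with suffix arrays over precomputed libraries; a plain substring scan gives the same counts here and is only meant to illustrate the output:

    ```python
    from collections import Counter

    def tile_coverage(reference, reads, tile=8):
        """Split `reference` into overlapping tiles and count how many
        reads contain each tile (a naive stand-in for suffix-array lookup)."""
        tiles = [reference[i:i + tile] for i in range(len(reference) - tile + 1)]
        cov = Counter()
        for t in tiles:
            cov[t] = sum(t in read for read in reads)
        return [(t, cov[t]) for t in tiles]

    # Toy read library and a short reference; sequences are made up.
    reads = ["ACGTACGTAC", "GTACGTTT", "TTTTACGT"]
    profile = tile_coverage("ACGTACGT", reads, tile=4)
    print(profile)
    ```

    Plotting the per-tile counts along the reference is what yields the coverage plots Geoseq returns to the user.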

  7. A Research Graph dataset for connecting research data repositories using RD-Switchboard.

    PubMed

    Aryani, Amir; Poblet, Marta; Unsworth, Kathryn; Wang, Jingbo; Evans, Ben; Devaraju, Anusuriya; Hausstein, Brigitte; Klas, Claus-Peter; Zapilko, Benjamin; Kaplun, Samuele

    2018-05-29

    This paper describes the open access graph dataset that shows the connections between Dryad, CERN, ANDS and other international data repositories to publications and grants across multiple research data infrastructures. The graph dataset was created using the Research Graph data model and the Research Data Switchboard (RD-Switchboard), a collaborative project by the Research Data Alliance DDRI Working Group (DDRI WG) with the aim to discover and connect the related research datasets based on publication co-authorship or jointly funded grants. The graph dataset allows researchers to trace and follow the paths to understanding a body of work. By mapping the links between research datasets and related resources, the graph dataset improves both their discovery and visibility, while avoiding duplicate efforts in data creation. Ultimately, the linked datasets may spur novel ideas, facilitate reproducibility and re-use in new applications, stimulate combinatorial creativity, and foster collaborations across institutions.

  8. Dataset for forensic analysis of B-tree file system.

    PubMed

    Wani, Mohamad Ahtisham; Bhat, Wasim Ahmad

    2018-06-01

    Since B-tree file system (Btrfs) is set to become de facto standard file system on Linux (and Linux based) operating systems, Btrfs dataset for forensic analysis is of great interest and immense value to forensic community. This article presents a novel dataset for forensic analysis of Btrfs that was collected using a proposed data-recovery procedure. The dataset identifies various generalized and common file system layouts and operations, specific node-balancing mechanisms triggered, logical addresses of various data structures, on-disk records, recovered-data as directory entries and extent data from leaf and internal nodes, and percentage of data recovered.

  9. Process mining in oncology using the MIMIC-III dataset

    NASA Astrophysics Data System (ADS)

    Prima Kurniati, Angelina; Hall, Geoff; Hogg, David; Johnson, Owen

    2018-03-01

    Process mining is a data analytics approach to discover and analyse process models based on the real activities captured in information systems. There is a growing body of literature on process mining in healthcare, including oncology, the study of cancer. In earlier work we found 37 peer-reviewed papers describing process mining research in oncology, with a regular complaint being the limited availability and accessibility of datasets with suitable information for process mining. Publicly available datasets are one option, and this paper describes the potential to use MIMIC-III for process mining in oncology. MIMIC-III is a large open access dataset of de-identified patient records. There are 134 publications listed as using the MIMIC dataset, but none of them have used process mining. The MIMIC-III dataset has 16 event tables which are potentially useful for process mining, and this paper demonstrates the opportunities to use MIMIC-III for process mining in oncology. Our research applied the L* lifecycle method to provide a worked example showing how process mining can be used to analyse cancer pathways. The results and data quality limitations are discussed along with opportunities for further work and reflection on the value of MIMIC-III for reproducible process mining research.
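    The discovery step in process mining typically starts from the directly-follows relation: how often activity b immediately follows activity a across cases. A minimal sketch on hypothetical oncology traces (not actual MIMIC-III events):

    ```python
    from collections import Counter

    def directly_follows(traces):
        """Count directly-follows pairs (a, b): activity b occurring
        immediately after activity a in some case. This relation is the
        starting point of many process-discovery algorithms."""
        df = Counter()
        for trace in traces:
            df.update(zip(trace, trace[1:]))
        return df

    # Hypothetical cancer-pathway event traces, one list per patient case.
    traces = [
        ["admit", "diagnose", "chemo", "discharge"],
        ["admit", "diagnose", "surgery", "chemo", "discharge"],
        ["admit", "diagnose", "chemo", "chemo", "discharge"],
    ]
    rel = directly_follows(traces)
    print(rel[("admit", "diagnose")], rel[("chemo", "discharge")])  # 3 3
    ```

    Real event logs add timestamps and case identifiers, and algorithms like the Alpha or Heuristics miner turn this counted relation into a process model.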

  10. Microarray Analysis Dataset

    EPA Pesticide Factsheets

    This file contains a link for Gene Expression Omnibus and the GSE designations for the publicly available gene expression data used in the study and reflected in Figures 6 and 7 for the Das et al., 2016 paper.This dataset is associated with the following publication:Das, K., C. Wood, M. Lin, A.A. Starkov, C. Lau, K.B. Wallace, C. Corton, and B. Abbott. Perfluoroalky acids-induced liver steatosis: Effects on genes controlling lipid homeostasis. TOXICOLOGY. Elsevier Science Ltd, New York, NY, USA, 378: 32-52, (2017).

  11. Deep neural networks show an equivalent and often superior performance to dermatologists in onychomycosis diagnosis: Automatic construction of onychomycosis datasets by region-based convolutional deep neural network.

    PubMed

    Han, Seung Seog; Park, Gyeong Hun; Lim, Woohyung; Kim, Myoung Shin; Na, Jung Im; Park, Ilwoo; Chang, Sung Eun

    2018-01-01

    Although there have been reports of the successful diagnosis of skin disorders using deep learning, unrealistically large clinical image datasets are required for artificial intelligence (AI) training. We created datasets of standardized nail images using a region-based convolutional neural network (R-CNN) trained to distinguish the nail from the background. We used R-CNN to generate training datasets of 49,567 images, which we then used to fine-tune the ResNet-152 and VGG-19 models. The validation datasets comprised 100 and 194 images from Inje University (B1 and B2 datasets, respectively), 125 images from Hallym University (C dataset), and 939 images from Seoul National University (D dataset). The AI (ensemble model; ResNet-152 + VGG-19 + feedforward neural networks) results showed test sensitivity/specificity/area under the curve values of (96.0 / 94.7 / 0.98), (82.7 / 96.7 / 0.95), (92.3 / 79.3 / 0.93), and (87.7 / 69.3 / 0.82) for the B1, B2, C, and D datasets, respectively. With a combination of the B1 and C datasets, the AI Youden index was significantly (p = 0.01) higher than that of 42 dermatologists doing the same assessment manually. For the B1+C and B2+D dataset combinations, almost none of the dermatologists performed as well as the AI. By training with a dataset comprising 49,567 images, we achieved a diagnostic accuracy for onychomycosis using deep learning that was superior to that of most of the dermatologists who participated in this study.
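    The Youden index the comparison relies on is simply sensitivity + specificity − 1; the sketch below recomputes it from the sensitivity/specificity pairs reported above (percentages converted to fractions).

```python
def youden_index(sensitivity, specificity):
    """Youden's J statistic: J = sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0

# Reported sensitivity/specificity pairs (as fractions) for the four test sets.
reported = {"B1": (0.960, 0.947), "B2": (0.827, 0.967),
            "C": (0.923, 0.793), "D": (0.877, 0.693)}
for name, (se, sp) in reported.items():
    print(name, round(youden_index(se, sp), 3))
```

    A useless classifier has J = 0 and a perfect one J = 1, which is why J is a convenient single number for comparing the AI against individual dermatologists.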

  12. A comparison of public datasets for acceleration-based fall detection.

    PubMed

    Igual, Raul; Medrano, Carlos; Plaza, Inmaculada

    2015-09-01

    Falls are one of the leading causes of mortality among the older population, and rapid detection of a fall is a key factor in mitigating its main adverse health consequences. In this context, several authors have conducted studies on acceleration-based fall detection using external accelerometers or smartphones. The published detection rates are diverse, sometimes close to those of a perfect detector. This divergence may be explained by the difficulty of comparing different fall detection studies fairly, since each study uses its own dataset obtained under different conditions. In this regard, several datasets have been made publicly available recently. This paper presents a comparison, to the best of our knowledge for the first time, of these public fall detection datasets in order to determine whether they have an influence on the declared performances. Using two different detection algorithms, the study shows that the performances of the fall detection techniques are affected, to a greater or lesser extent, by the specific datasets used to validate them. We have also found large differences in the generalization capability of a fall detector depending on the dataset used for training. In fact, the performance decreases dramatically when the algorithms are tested on a dataset different from the one used for training. Other characteristics of the datasets, such as the number of training samples, also have an influence on the performance, while the algorithms seem less sensitive to the sampling frequency or the acceleration range. Copyright © 2015 IPEM. Published by Elsevier Ltd. All rights reserved.
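    As a rough illustration of the acceleration-based detectors being compared, the sketch below implements a bare magnitude-threshold detector; the threshold value and the synthetic samples are assumptions for illustration, not taken from any of the public datasets.

```python
import math

def detect_fall(samples, threshold_g=2.5):
    """Flag a fall if acceleration magnitude exceeds a threshold (in g).

    samples: iterable of (ax, ay, az) accelerometer readings in g.
    A real detector would also check the post-impact rest period.
    """
    return any(math.sqrt(ax * ax + ay * ay + az * az) > threshold_g
               for ax, ay, az in samples)

still = [(0.0, 0.0, 1.0)] * 50          # device at rest: ~1 g of gravity
impact = still + [(2.1, 1.8, 2.0)]      # brief high-magnitude spike
print(detect_fall(still), detect_fall(impact))
```

    Because everything hinges on one threshold, it is easy to see how the same algorithm can score very differently on datasets recorded with different devices, placements, and acceleration ranges.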

  13. SAR image classification based on CNN in real and simulation datasets

    NASA Astrophysics Data System (ADS)

    Peng, Lijiang; Liu, Ming; Liu, Xiaohua; Dong, Liquan; Hui, Mei; Zhao, Yuejin

    2018-04-01

    Convolutional neural networks (CNNs) have achieved great success in image classification tasks. Even in the field of synthetic aperture radar automatic target recognition (SAR-ATR), state-of-the-art results have been obtained by learning deep feature representations on the MSTAR benchmark. However, the raw MSTAR data have shortcomings for training a SAR-ATR model because of the high similarity in background among the SAR images of each class. This indicates that a CNN may learn feature hierarchies of the backgrounds as well as of the targets. To validate the influence of the background, additional SAR image datasets were created containing simulated SAR images of 10 manufactured targets, such as tanks and fighter aircraft, with backgrounds sampled from the original MSTAR data. The simulated datasets include one in which the backgrounds of each image class correspond to one class of MSTAR target or clutter backgrounds, and one in which each image is given a random background drawn from all MSTAR targets or clutter. In addition, mixed datasets of MSTAR and simulated images were created for use in the experiments. The CNN architecture proposed in this paper was trained on all of the datasets mentioned above. The experimental results show that the architecture achieves high performance on all datasets even when the image backgrounds are miscellaneous, which indicates that it can learn a good representation of the targets despite drastic changes in background.
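    The building block such architectures stack is the 2-D convolution. The sketch below implements one valid-mode convolution by hand in plain NumPy (this is the generic operation, not the paper's actual network) to show how a kernel responds to image structure such as an edge.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2-D cross-correlation, the core operation of a CNN layer."""
    kh, kw = kernel.shape
    h = image.shape[0] - kh + 1
    w = image.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A horizontal difference kernel responds where intensity changes left to right,
# e.g. at a bright target's edge against a dark background.
image = np.array([[0, 0, 1, 1]] * 4, dtype=float)
kernel = np.array([[-1.0, 1.0]])
print(conv2d(image, kernel))
```

    A CNN learns many such kernels per layer; if the background is too uniform per class, kernels can lock onto background texture instead of the target, which is the concern the simulated datasets probe.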

  14. On sample size and different interpretations of snow stability datasets

    NASA Astrophysics Data System (ADS)

    Schirmer, M.; Mitterer, C.; Schweizer, J.

    2009-04-01

    Interpretations of snow stability variations need an assessment of the stability itself, independent of the scale investigated in the study. Studies on stability variations at a regional scale have often chosen stability tests such as the Rutschblock test, or combinations of various tests, in order to detect differences with aspect and elevation. The question arises: how capable are such stability interpretations of supporting conclusions? There are at least three possible error sources: (i) the variance of the stability test itself; (ii) the stability variance at an underlying slope scale; and (iii) the possibility that the stability interpretation is not directly related to the probability of skier triggering. Various stability interpretations have been proposed in the past that provide partly different results. We compared a subjective one based on expert knowledge with a more objective one based on a measure derived from comparing skier-triggered slopes vs. slopes that had been skied but not triggered. In this study, the uncertainties are discussed and their effects on regional-scale stability variations are quantified in a pragmatic way. An existing dataset with very large sample sizes was revisited. This dataset contained the variance of stability at a regional scale for several situations. The stability in this dataset was determined using the subjective interpretation scheme based on expert knowledge. The question to be answered was how many measurements were needed to obtain similar results (mainly stability differences with aspect or elevation) as with the complete dataset. The optimal sample size was obtained in several ways: (i) assuming a nominal data scale, the sample size was determined with a given test, significance level, and power, and by calculating the mean and standard deviation of the complete dataset. With this method it can also be determined whether the complete dataset has an appropriate sample size. (ii) Smaller subsets were created with similar
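    For the nominal-scale approach in (i), a standard two-sample power calculation gives the required sample size. The sketch below applies the textbook per-group formula n = 2(z₁₋α/₂ + z₁₋β)²σ²/δ²; assuming this particular formula is an illustration, not the study's own computation.

```python
from math import ceil
from statistics import NormalDist

def sample_size_two_means(sigma, delta, alpha=0.05, power=0.8):
    """Per-group n to detect a mean difference delta with the given power,
    assuming normally distributed data with common standard deviation sigma."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided significance level
    z_b = NormalDist().inv_cdf(power)           # desired power
    return ceil(2 * (z_a + z_b) ** 2 * sigma ** 2 / delta ** 2)

# Detecting a one-standard-deviation stability difference between aspects:
print(sample_size_two_means(sigma=1.0, delta=1.0))
# A half-standard-deviation difference needs roughly four times as many tests:
print(sample_size_two_means(sigma=1.0, delta=0.5))
```

    The quadratic dependence on δ is why subtle aspect or elevation differences demand dramatically larger stability-test campaigns.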

  15. Really big data: Processing and analysis of large datasets

    USDA-ARS?s Scientific Manuscript database

    Modern animal breeding datasets are large and getting larger, due in part to the recent availability of DNA data for many animals. Computational methods for efficiently storing and analyzing those data are under development. The amount of storage space required for such datasets is increasing rapidl...

  16. A polymer dataset for accelerated property prediction and design.

    PubMed

    Huan, Tran Doan; Mannodi-Kanakkithodi, Arun; Kim, Chiho; Sharma, Vinit; Pilania, Ghanshyam; Ramprasad, Rampi

    2016-03-01

    Emerging computation- and data-driven approaches are particularly useful for rationally designing materials with targeted properties. Generally, these approaches rely on identifying structure-property relationships by learning from a sufficiently large dataset of relevant materials. The learned information can then be used to predict the properties of materials not already in the dataset, thus accelerating the materials design. Herein, we develop a dataset of 1,073 polymers and related materials and make it available at http://khazana.uconn.edu/. This dataset is uniformly prepared using first-principles calculations with structures obtained either from other sources or by using structure search methods. Because the immediate target of this work is to assist the design of high dielectric constant polymers, it is initially designed to include the optimized structures, atomization energies, band gaps, and dielectric constants. It will be progressively expanded by accumulating new materials and including additional properties calculated for the optimized structures provided.

  17. A polymer dataset for accelerated property prediction and design

    DOE PAGES

    Huan, Tran Doan; Mannodi-Kanakkithodi, Arun; Kim, Chiho; ...

    2016-03-01

    Emerging computation- and data-driven approaches are particularly useful for rationally designing materials with targeted properties. Generally, these approaches rely on identifying structure-property relationships by learning from a sufficiently large dataset of relevant materials. The learned information can then be used to predict the properties of materials not already in the dataset, thus accelerating the materials design. Herein, we develop a dataset of 1,073 polymers and related materials and make it available at http://khazana.uconn.edu/. This dataset is uniformly prepared using first-principles calculations with structures obtained either from other sources or by using structure search methods. Because the immediate target of this work is to assist the design of high dielectric constant polymers, it is initially designed to include the optimized structures, atomization energies, band gaps, and dielectric constants. As a result, it will be progressively expanded by accumulating new materials and including additional properties calculated for the optimized structures provided.

  18. A robust dataset-agnostic heart disease classifier from Phonocardiogram.

    PubMed

    Banerjee, Rohan; Dutta Choudhury, Anirban; Deshpande, Parijat; Bhattacharya, Sakyajit; Pal, Arpan; Mandana, K M

    2017-07-01

    Automatic classification of normal and abnormal heart sounds is a popular area of research. However, building a robust algorithm unaffected by signal quality and patient demography is a challenge. In this paper we have analysed a wide range of Phonocardiogram (PCG) features in the time and frequency domains, along with morphological and statistical features, to construct a robust and discriminative feature set for dataset-agnostic classification of normal subjects and cardiac patients. The large open-access database made available in the PhysioNet 2016 challenge was used for feature selection, internal validation, and creation of training models. A second dataset of 41 PCG segments, collected using our in-house smartphone-based digital stethoscope at an Indian hospital, was used for performance evaluation. Our proposed methodology yielded sensitivity and specificity scores of 0.76 and 0.75, respectively, on the test dataset in classifying cardiovascular diseases. The methodology also outperformed three popular prior-art approaches when applied on the same dataset.
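    As a hedged illustration of the kind of time- and frequency-domain PCG features such a feature set might contain (the paper's actual feature list is not reproduced here), a minimal sketch:

```python
import numpy as np

def pcg_features(signal, fs):
    """A few illustrative time- and frequency-domain features of a PCG segment."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    return {
        "mean": float(np.mean(signal)),
        "std": float(np.std(signal)),
        "dominant_hz": float(freqs[np.argmax(spectrum[1:]) + 1]),  # skip DC bin
    }

fs = 1000
t = np.arange(fs) / fs                     # one second of samples
tone = np.sin(2 * np.pi * 50 * t)          # synthetic 50 Hz "heart sound"
print(pcg_features(tone, fs))
```

    Dataset-agnostic classifiers favour features like these that are comparatively stable across recording devices, which matters when the training and test data come from different stethoscopes.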

  19. Benchmark datasets for phylogenomic pipeline validation, applications for foodborne pathogen surveillance

    PubMed Central

    Rand, Hugh; Shumway, Martin; Trees, Eija K.; Simmons, Mustafa; Agarwala, Richa; Davis, Steven; Tillman, Glenn E.; Defibaugh-Chavez, Stephanie; Carleton, Heather A.; Klimke, William A.; Katz, Lee S.

    2017-01-01

    Background As next generation sequence technology has advanced, there have been parallel advances in genome-scale analysis programs for determining evolutionary relationships as proxies for epidemiological relationship in public health. Most new programs skip traditional steps of ortholog determination and multi-gene alignment, instead identifying variants across a set of genomes, then summarizing results in a matrix of single-nucleotide polymorphisms or alleles for standard phylogenetic analysis. However, public health authorities need to document the performance of these methods with appropriate and comprehensive datasets so they can be validated for specific purposes, e.g., outbreak surveillance. Here we propose a set of benchmark datasets to be used for comparison and validation of phylogenomic pipelines. Methods We identified four well-documented foodborne pathogen events in which the epidemiology was concordant with routine phylogenomic analyses (reference-based SNP and wgMLST approaches). These are ideal benchmark datasets, as the trees, WGS data, and epidemiological data for each are all in agreement. We have placed these sequence data, sample metadata, and “known” phylogenetic trees in publicly-accessible databases and developed a standard descriptive spreadsheet format describing each dataset. To facilitate easy downloading of these benchmarks, we developed an automated script that uses the standard descriptive spreadsheet format. Results Our “outbreak” benchmark datasets represent the four major foodborne bacterial pathogens (Listeria monocytogenes, Salmonella enterica, Escherichia coli, and Campylobacter jejuni) and one simulated dataset where the “known tree” can be accurately called the “true tree”. The downloading script and associated table files are available on GitHub: https://github.com/WGS-standards-and-analysis/datasets. Discussion These five benchmark datasets will help standardize comparison of current and future phylogenomic

  20. Determining Scale-dependent Patterns in Spatial and Temporal Datasets

    NASA Astrophysics Data System (ADS)

    Roy, A.; Perfect, E.; Mukerji, T.; Sylvester, L.

    2016-12-01

    Spatial and temporal datasets of interest to Earth scientists often contain plots of one variable against another, e.g., rainfall magnitude vs. time or fracture aperture vs. spacing. Such data, comprised of distributions of events along a transect / timeline along with their magnitudes, can display persistent or antipersistent trends, as well as random behavior, that may contain signatures of underlying physical processes. Lacunarity is a technique that was originally developed for multiscale analysis of data. In a recent study we showed that lacunarity can be used for revealing changes in scale-dependent patterns in fracture spacing data. Here we present a further improvement in our technique, with lacunarity applied to various non-binary datasets comprised of event spacings and magnitudes. We test our technique on a set of four synthetic datasets, three of which are based on an autoregressive model and have magnitudes at every point along the "timeline", thus representing antipersistent, persistent, and random trends. The fourth dataset is made up of five clusters of events, each containing a set of random magnitudes. The concept of lacunarity ratio, LR, is introduced; this is the lacunarity of a given dataset normalized to the lacunarity of its random counterpart. It is demonstrated that LR can successfully delineate scale-dependent changes in terms of antipersistence and persistence in the synthetic datasets. This technique is then applied to three different types of data: a hundred-year rainfall record from Knoxville, TN, USA, a set of varved sediments from Marca Shale, and a set of fracture aperture and spacing data from NE Mexico. While the rainfall data and varved sediments both appear to be persistent at small scales, at larger scales they both become random. On the other hand, the fracture data shows antipersistence at small scale (within cluster) and random behavior at large scales. Such differences in behavior with respect to scale-dependent changes in
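    A minimal sketch of gliding-box lacunarity and the lacunarity ratio LR described above, for a 1-D binary event series (the implementation details are an illustrative reconstruction, not the authors' code):

```python
import numpy as np

def lacunarity(series, box):
    """Gliding-box lacunarity: second moment of box masses divided by the
    squared first moment. Equals 1 for a perfectly uniform series."""
    masses = np.array([series[i:i + box].sum()
                       for i in range(len(series) - box + 1)])
    return float(np.mean(masses ** 2) / np.mean(masses) ** 2)

def lacunarity_ratio(series, box, seed=0):
    """LR: lacunarity normalized by that of a shuffled (random) counterpart."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(series)
    return lacunarity(series, box) / lacunarity(shuffled, box)

uniform = np.ones(100)
clustered = np.array([1] * 10 + [0] * 90)   # one tight cluster of events
print(lacunarity(uniform, 5), lacunarity(clustered, 5))
```

    Clustered (gappy) series yield lacunarity above 1, and scanning the box size traces out how gappiness, and hence persistence or antipersistence, changes with scale.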

  1. State of Texas - Highlighting low-lying areas derived from USGS Digital Elevation Data

    USGS Publications Warehouse

    Kosovich, John J.

    2008-01-01

    In support of U.S. Geological Survey (USGS) disaster preparedness efforts, this map depicts a color shaded relief representation of Texas and a grayscale relief of the surrounding areas. The first 30 feet of relief above mean sea level are displayed as brightly colored 5-foot elevation bands, which highlight low-elevation areas at a coarse spatial resolution. Standard USGS National Elevation Dataset (NED) 1 arc-second (nominally 30-meter) digital elevation model (DEM) data are the basis for the map, which is designed to be used at a broad scale and for informational purposes only. The NED data were derived from the original 1:24,000-scale USGS topographic map bare-earth contours, which were converted into gridded quadrangle-based DEM tiles at a constant post spacing (grid cell size) of either 30 meters (data before the mid-1990s) or 10 meters (mid-1990s and later data). These individual-quadrangle DEMs were then converted to spherical coordinates (latitude/longitude decimal degrees) and edge-matched to ensure seamlessness. The NED source data for this map consists of a mixture of 30-meter- and 10-meter-resolution DEMs. State and county boundary, hydrography, city, and road layers were modified from USGS National Atlas data downloaded in 2003. The NED data were downloaded in 2002. Shaded relief over Mexico was obtained from the USGS National Atlas.

  2. An assessment of differences in gridded precipitation datasets in complex terrain

    NASA Astrophysics Data System (ADS)

    Henn, Brian; Newman, Andrew J.; Livneh, Ben; Daly, Christopher; Lundquist, Jessica D.

    2018-01-01

    Hydrologic modeling and other geophysical applications are sensitive to precipitation forcing data quality, and there are known challenges in spatially distributing gauge-based precipitation over complex terrain. We conduct a comparison of six high-resolution, daily and monthly gridded precipitation datasets over the Western United States. We compare the long-term average spatial patterns and interannual variability of water-year total precipitation, as well as multi-year trends in precipitation, across the datasets. We find that the greatest absolute differences among datasets occur in high-elevation areas and in the maritime mountain ranges of the Western United States, while the greatest percent differences among datasets relative to annual total precipitation occur in arid and rain-shadowed areas. Differences between datasets in some high-elevation areas exceed 200 mm yr-1 on average, and relative differences range from 5 to 60% across the Western United States. In areas of high topographic relief, true uncertainties and biases are likely higher than the differences among the datasets; we present evidence of this based on streamflow observations. Precipitation trends in the datasets differ in magnitude and sign at smaller scales, and are sensitive to how temporal inhomogeneities in the underlying precipitation gauge data are handled.

  3. Benchmarking of Typical Meteorological Year datasets dedicated to Concentrated-PV systems

    NASA Astrophysics Data System (ADS)

    Realpe, Ana Maria; Vernay, Christophe; Pitaval, Sébastien; Blanc, Philippe; Wald, Lucien; Lenoir, Camille

    2016-04-01

    Accurate analysis of meteorological and pyranometric data for long-term analysis is the basis of decision-making for banks and investors regarding solar energy conversion systems. This has led to the development of methodologies for the generation of Typical Meteorological Year (TMY) datasets. The most widely used method for solar energy conversion systems was proposed in 1978 by the Sandia Laboratory (Hall et al., 1978), considering a specific weighted combination of different meteorological variables, notably global, diffuse horizontal, and direct normal irradiances, air temperature, wind speed, and relative humidity. In 2012, a new approach was proposed in the framework of the European project FP7 ENDORSE. It introduced the concept of a "driver", defined by the user as an explicit function of the relevant pyranometric and meteorological variables, to improve the representativeness of the TMY datasets with respect to the specific solar energy conversion system of interest. The present study aims at comparing and benchmarking different TMY datasets considering a specific Concentrated-PV (CPV) system as the solar energy conversion system of interest. Using long-term (15+ years) time series of high-quality meteorological and pyranometric ground measurements, three types of TMY datasets were generated by the following methods: the Sandia method, a simplified driver with DNI as the only representative variable, and a more sophisticated driver. The latter takes into account the sensitivities of the CPV system with respect to the spectral distribution of the solar irradiance and wind speed. Different TMY datasets from the three methods have been generated considering different numbers of years in the historical dataset, ranging from 5 to 15 years. The comparisons and benchmarking of these TMY datasets are conducted considering the long-term time series of simulated CPV electric production as a reference. The results of this benchmarking clearly show that the Sandia method is not
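    The Sandia method's core is the Finkelstein-Schafer (FS) statistic, which scores how closely a candidate month's empirical CDF tracks the long-term CDF of a weather variable; a simplified sketch (the weighting across multiple variables is omitted, and the sample values are hypothetical):

```python
import numpy as np

def fs_statistic(long_term, candidate):
    """Finkelstein-Schafer statistic: mean absolute difference between the
    candidate month's empirical CDF and the long-term CDF, evaluated at the
    candidate's daily values. Lower means a more 'typical' month."""
    long_term = np.sort(np.asarray(long_term, dtype=float))
    candidate = np.asarray(candidate, dtype=float)
    # Long-term empirical CDF evaluated at each candidate value.
    cdf_long = np.searchsorted(long_term, candidate, side="right") / len(long_term)
    # Candidate's own empirical CDF (rank / n).
    cdf_cand = (np.argsort(np.argsort(candidate)) + 1) / len(candidate)
    return float(np.mean(np.abs(cdf_long - cdf_cand)))

history = list(range(1, 101))        # hypothetical daily DNI values, all years
typical = list(range(1, 101, 10))    # candidate month tracking the history
print(fs_statistic(history, typical))
```

    In the full method, weighted FS scores over several variables select the most typical calendar month from the historical record; the driver-based approaches replace the weighted variables with one system-specific function.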

  4. Global distribution of urban parameters derived from high-resolution global datasets for weather modelling

    NASA Astrophysics Data System (ADS)

    Kawano, N.; Varquez, A. C. G.; Dong, Y.; Kanda, M.

    2016-12-01

    Numerical models such as the Weather Research and Forecasting model coupled with a single-layer Urban Canopy Model (WRF-UCM) are powerful tools to investigate the urban heat island. Urban parameters such as average building height (Have), plan area index (λp), and frontal area index (λf) are necessary inputs for the model. In general, these parameters are uniformly assumed in WRF-UCM, but this leads to unrealistic urban representation. Distributed urban parameters can also be incorporated into WRF-UCM to consider detailed urban effects. The problem is that distributed building information is not readily available for most megacities, especially in developing countries. Furthermore, acquiring real building parameters often requires a huge amount of time and money. In this study, we investigated the potential of using globally available satellite-captured datasets for the estimation of the parameters Have, λp, and λf. The global datasets comprised a high-spatial-resolution population dataset (LandScan, by Oak Ridge National Laboratory), nighttime lights (NOAA), and vegetation fraction (NASA). True samples of Have, λp, and λf were acquired from actual building footprints from satellite images and 3D building databases of Tokyo, New York, Paris, Melbourne, Istanbul, Jakarta, and so on. Regression equations were then derived from the block-averaging of spatial pairs of real parameters and global datasets. Results show that two regression curves to estimate Have and λf from the combination of population and nightlight are necessary, depending on the city's level of development. An index which can be used to decide which equation to use for a city is the Gross Domestic Product (GDP). On the other hand, λp has less dependence on GDP but indicated a negative relationship to vegetation fraction. Finally, a simplified but precise approximation of urban parameters through readily-available, high-resolution global datasets and our derived regressions can be utilized to estimate a
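    A hedged sketch of the regression step described above, fitting average building height against population and nightlight predictors by least squares; all sample values and the log-linear functional form are hypothetical, not the study's fitted equations:

```python
import numpy as np

# Hypothetical block-averaged samples: population density, nightlight
# intensity, and "true" average building height (m) from building footprints.
pop = np.array([100.0, 500.0, 2000.0, 8000.0])
light = np.array([5.0, 20.0, 40.0, 60.0])
have = np.array([4.0, 8.0, 15.0, 30.0])

# Fit Have ~ b0 + b1*log(pop) + b2*light by ordinary least squares.
X = np.column_stack([np.ones_like(pop), np.log(pop), light])
coeffs, *_ = np.linalg.lstsq(X, have, rcond=None)
predicted = X @ coeffs
print(coeffs, predicted)
```

    In the study's setting, separate coefficient sets would be fitted for different GDP classes, and the resulting equations applied wherever footprint data are unavailable.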

  5. Review and Analysis of Algorithmic Approaches Developed for Prognostics on CMAPSS Dataset

    NASA Technical Reports Server (NTRS)

    Ramasso, Emannuel; Saxena, Abhinav

    2014-01-01

    Benchmarking of prognostic algorithms has been challenging due to the limited availability of common datasets suitable for prognostics. In an attempt to alleviate this problem, several benchmarking datasets have been collected by NASA's Prognostics Center of Excellence and made available to the Prognostics and Health Management (PHM) community to allow evaluation and comparison of prognostics algorithms. Among those datasets are five C-MAPSS datasets that have been extremely popular due to their unique characteristics, which make them suitable for prognostics. The C-MAPSS datasets pose several challenges that have been tackled by different methods in the PHM literature. In particular, management of high variability due to sensor noise, effects of operating conditions, and presence of multiple simultaneous fault modes are some factors that have great impact on the generalization capabilities of prognostics algorithms. More than 70 publications have used the C-MAPSS datasets for developing data-driven prognostic algorithms. The C-MAPSS datasets are also shown to be well-suited for development of new machine learning and pattern recognition tools for several key preprocessing steps such as feature extraction and selection, failure mode assessment, operating conditions assessment, health status estimation, uncertainty management, and prognostics performance evaluation. This paper summarizes a comprehensive literature review of publications using C-MAPSS datasets and provides guidelines and references to further usage of these datasets in a manner that allows clear and consistent comparison between different approaches.

  6. Measurement Properties of the NIH-Minimal Dataset Dutch Language Version in Patients With Chronic Low Back Pain.

    PubMed

    Boer, Annemarie; Dutmer, Alisa L; Schiphorst Preuper, Henrica R; van der Woude, Lucas H V; Stewart, Roy E; Deyo, Richard A; Reneman, Michiel F; Soer, Remko

    2017-10-01

    Validation study with cross-sectional and longitudinal measurements. To translate the US National Institutes of Health (NIH) minimal dataset for clinical research on chronic low back pain into the Dutch language and to test its validity and reliability among people with chronic low back pain. The NIH developed a minimal dataset to encourage more complete and consistent reporting of clinical research and to enable comparison of studies across countries in patients with low back pain. In the Netherlands, the NIH minimal dataset had not been translated before, and its measurement properties were unknown. Cross-cultural validity was tested by a formal forward-backward translation. Structural validity was tested with exploratory factor analyses (comparative fit index, Tucker-Lewis index, and root mean square error of approximation). Hypothesis testing was performed to compare subscales of the NIH dataset with the Pain Disability Index and the EuroQol-5D (Pearson correlation coefficients). Internal consistency was tested with Cronbach α, and test-retest reliability at 2 weeks was calculated in a subsample of patients with intraclass correlation coefficients and weighted kappa (κω). In total, 452 patients were included, of which 52 were included in the test-retest study. Factor analysis for structural validity pointed in the direction of a seven-factor model (Cronbach α = 0.78). Factors and the total score of the NIH minimal dataset showed fair to good correlations with the Pain Disability Index (r = 0.43-0.70) and the EuroQol-5D (r = -0.41 to -0.64). Reliability: test-retest reliability per item showed substantial agreement (κω = 0.65); test-retest reliability per factor was moderate to good (intraclass correlation coefficient = 0.71). The measurement properties of the Dutch language version of the NIH minimal dataset were satisfactory. N/A.
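    The internal-consistency measure reported above, Cronbach's α, can be computed directly from an item-score matrix; a minimal sketch with hypothetical scores:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, k_items) score matrix:
    alpha = k/(k-1) * (1 - sum(item variances) / variance(total score))."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

# Perfectly consistent items (identical columns) give alpha = 1.
scores = np.array([[1, 1], [2, 2], [3, 3], [4, 4]])
print(cronbach_alpha(scores))
```

    Values around 0.78, as reported for the seven-factor model, indicate that responses to items within a factor covary strongly enough for the factor score to be treated as internally consistent.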

  7. Toxics Release Inventory Chemical Hazard Information Profiles (TRI-CHIP) Dataset

    EPA Pesticide Factsheets

    The Toxics Release Inventory (TRI) Chemical Hazard Information Profiles (TRI-CHIP) dataset contains hazard information about the chemicals reported in TRI. Users can use this XML-format dataset to create their own databases and hazard analyses of TRI chemicals. The hazard information is compiled from a series of authoritative sources including the Integrated Risk Information System (IRIS). The dataset is provided as a downloadable .zip file that when extracted provides XML files and schemas for the hazard information tables.

  8. Drought Early Warning and Agro-Meteorological Risk Assessment using Earth Observation Rainfall Datasets and Crop Water Budget Modelling

    NASA Astrophysics Data System (ADS)

    Tarnavsky, E.

    2016-12-01

    The water resources satisfaction index (WRSI) model is widely used in drought early warning and food security analyses, as well as in agro-meteorological risk management through weather index-based insurance. Key driving data for the model are provided by satellite-based rainfall estimates such as ARC2 and TAMSAT over Africa and CHIRPS globally. We evaluate the performance of these rainfall datasets for detecting onset and cessation of rainfall and estimating crop production conditions for the WRSI model. We also examine the sensitivity of the WRSI model to different satellite-based rainfall products over maize growing regions in Tanzania. Our study considers planting scenarios for short-, medium-, and long-growing-cycle maize, and we apply these for 'regular' and drought-resistant maize, as well as with two different methods for defining the start of season (SOS). Simulated maize production estimates are compared against available reported production figures at the national and sub-national (province) levels. Strengths and weaknesses of the driving rainfall data, insights into the role of the SOS definition method, and phenology-based crop yield coefficient and crop yield reduction functions are discussed in the context of space-time drought characteristics. We propose a way forward for selecting skilled rainfall datasets and discuss their implications for crop production monitoring and for the design and structure of weather index-based insurance products as risk transfer mechanisms implemented across scales, from smallholder farmers to national programmes.
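    A heavily simplified, illustrative crop water budget in the spirit of WRSI (the operational model tracks evapotranspiration and soil water in more detail; all parameter values here are assumptions):

```python
def wrsi(rainfall, requirement, soil_capacity=60.0):
    """Simplified seasonal WRSI: 100 * (1 - total deficit / total requirement).

    rainfall, requirement: per-dekad water supply and crop demand (mm).
    A single soil-moisture bucket carries surplus water between dekads.
    """
    soil = 0.0
    deficit = 0.0
    for rain, need in zip(rainfall, requirement):
        supply = rain + soil
        if supply >= need:
            soil = min(supply - need, soil_capacity)   # store the surplus
        else:
            deficit += need - supply                   # crop water stress
            soil = 0.0
    return 100.0 * (1.0 - deficit / sum(requirement))

print(wrsi([50, 50, 50], [40, 40, 40]))   # wet season: no stress
print(wrsi([40, 0, 40], [40, 40, 40]))    # mid-season dry spell
```

    Because the index accumulates deficits dekad by dekad, it is sensitive to exactly the rainfall-product differences in onset, cessation, and dry-spell timing that the study evaluates.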

  9. An environmental assessment of United States drinking water watersheds

    Treesearch

    James Wickham; Timothy Wade; Kurt Riitters

    2011-01-01

    Abstract There is an emerging recognition that natural lands and their conservation are important elements of a sustainable drinking water infrastructure. We conducted a national, watershed-level environmental assessment of 5,265 drinking water watersheds using data on land cover, hydrography and conservation status. Approximately 78% of the conterminous United States...

  10. Discovering New Global Climate Patterns: Curating a 21-Year High Temporal (Hourly) and Spatial (40km) Resolution Reanalysis Dataset

    NASA Astrophysics Data System (ADS)

    Hou, C. Y.; Dattore, R.; Peng, G. S.

    2014-12-01

    The National Center for Atmospheric Research's Global Climate Four-Dimensional Data Assimilation (CFDDA) Hourly 40km Reanalysis dataset is a dynamically downscaled dataset with high temporal and spatial resolution. The dataset contains three-dimensional hourly analyses in netCDF format for the global atmospheric state from 1985 to 2005 on a 40km horizontal grid (0.4° grid increment) with 28 vertical levels, providing good representation of local forcing and diurnal variation of processes in the planetary boundary layer. This project aimed to make the dataset publicly available, accessible, and usable in order to provide a unique resource to allow and promote studies of new climate characteristics. When the curation project started, it had been five years since the data files were generated. Also, although the Principal Investigator (PI) had generated a user document at the end of the project in 2009, the document had not been maintained. Furthermore, the PI had moved to a new institution, and the remaining team members were reassigned to other projects. These factors made data curation especially challenging in the areas of verifying data quality, harvesting metadata descriptions, and documenting provenance information. As a result, the project's curation process found that: the data curator's skill and knowledge helped in making decisions, such as on file format, structure, and workflow documentation, that had a significant, positive impact on the ease of the dataset's management and long-term preservation; use of data curation tools, such as the Data Curation Profiles Toolkit's guidelines, revealed important information for promoting the data's usability and enhancing preservation planning; and involving data curators during each stage of the data curation life cycle, instead of only at the end, could improve the curation process's efficiency. Overall, the project showed that proper resources invested in the curation process would give datasets the best chance to fulfill their potential to

  11. ESSG-based global spatial reference frame for datasets interrelation

    NASA Astrophysics Data System (ADS)

    Yu, J. Q.; Wu, L. X.; Jia, Y. J.

    2013-10-01

    To understand the highly complex Earth system, a large volume and variety of datasets on the planet Earth are being obtained, distributed, and shared worldwide every day. However, few existing systems concentrate on the distribution and interrelation of different datasets in a common Global Spatial Reference Frame (GSRF), which poses an invisible obstacle to data sharing and scientific collaboration. The Group on Earth Observations (GEO) has recently established a new GSRF, named the Earth System Spatial Grid (ESSG), for global dataset distribution, sharing, and interrelation in its 2012-2015 Work Plan. The ESSG may bridge the gap among different spatial datasets and hence overcome these obstacles. This paper presents the implementation of the ESSG-based GSRF. A reference spheroid, a grid subdivision scheme, and a suitable encoding system are required to implement it. The radius of the ESSG reference spheroid was set to double the approximate Earth radius so that datasets from different areas of Earth system science can be covered. The same positioning and orientation parameters as Earth-Centred Earth-Fixed (ECEF) were adopted for the ESSG reference spheroid so that any other GSRF can be freely transformed into the ESSG-based GSRF. The spheroid degenerated octree grid with radius refinement (SDOG-R) and its encoding method were taken as the grid subdivision and encoding scheme for their good performance in many aspects. A triple (C, T, A) model is introduced to represent and link different datasets based on the ESSG-based GSRF. Finally, methods of coordinate transformation between the ESSG-based GSRF and other GSRFs are presented to make the ESSG-based GSRF operable and propagable.
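    Because the abstract states that the ESSG reference spheroid shares ECEF's positioning and orientation parameters, the first half of any transform into the ESSG-based GSRF is the standard geodetic-to-ECEF conversion. A minimal sketch of that step on the WGS84 ellipsoid (the choice of WGS84 here is an assumption for illustration; the paper's own transform details are not given in the record):

    ```python
    import math

    # Geodetic (lat, lon, height) -> ECEF (X, Y, Z) on the WGS84 ellipsoid.
    A = 6378137.0                     # WGS84 semi-major axis (m)
    F = 1.0 / 298.257223563           # WGS84 flattening
    E2 = F * (2.0 - F)                # first eccentricity squared

    def geodetic_to_ecef(lat_deg, lon_deg, h):
        lat, lon = math.radians(lat_deg), math.radians(lon_deg)
        n = A / math.sqrt(1.0 - E2 * math.sin(lat) ** 2)  # prime vertical radius
        x = (n + h) * math.cos(lat) * math.cos(lon)
        y = (n + h) * math.cos(lat) * math.sin(lon)
        z = (n * (1.0 - E2) + h) * math.sin(lat)
        return x, y, z

    # A point on the equator at the prime meridian lies on the +X axis.
    print(geodetic_to_ecef(0.0, 0.0, 0.0))  # ~ (6378137.0, 0.0, 0.0)
    ```

    Mapping the resulting Cartesian coordinates into SDOG-R cells would then be the grid-specific second step.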

  12. Atlas-Guided Cluster Analysis of Large Tractography Datasets

    PubMed Central

    Ros, Christian; Güllmar, Daniel; Stenzel, Martin; Mentzel, Hans-Joachim; Reichenbach, Jürgen Rainer

    2013-01-01

    Diffusion Tensor Imaging (DTI) and fiber tractography are important tools to map the cerebral white matter microstructure in vivo and to model the underlying axonal pathways in the brain with three-dimensional fiber tracts. As the fast and consistent extraction of anatomically correct fiber bundles for multiple datasets is still challenging, we present a novel atlas-guided clustering framework for exploratory data analysis of large tractography datasets. The framework uses a hierarchical cluster analysis approach that exploits the inherent redundancy in large datasets to time-efficiently group fiber tracts. Structural information of a white matter atlas can be incorporated into the clustering to achieve an anatomically correct and reproducible grouping of fiber tracts. This approach facilitates not only the identification of the bundles corresponding to the classes of the atlas; it also enables the extraction of bundles that are not present in the atlas. The new technique was applied to cluster datasets of 46 healthy subjects. Prospects of automatic and anatomically correct as well as reproducible clustering are explored. Reconstructed clusters were well separated and showed good correspondence to anatomical bundles. Using the atlas-guided cluster approach, we observed consistent results across subjects with high reproducibility. In order to investigate the outlier elimination performance of the clustering algorithm, scenarios with varying amounts of noise were simulated and clustered with three different outlier elimination strategies. By exploiting the multithreading capabilities of modern multiprocessor systems in combination with novel algorithms, our toolkit clusters large datasets in a couple of minutes. Experiments were conducted to investigate the achievable speedup and to demonstrate the high performance of the clustering framework in a multiprocessing environment. PMID:24386292
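    The core of the framework is hierarchical (agglomerative) clustering: repeatedly merging the two closest clusters until a stopping criterion is met. A toy single-linkage version on 2-D points sketches the idea; plain Euclidean distance stands in here for the tract-to-tract similarity measure, which is an assumption for illustration only:

    ```python
    # Toy single-linkage agglomerative (hierarchical) clustering.
    import math

    def single_linkage(points, k):
        clusters = [[p] for p in points]          # start: every item is its own cluster
        while len(clusters) > k:
            best = None
            for i in range(len(clusters)):
                for j in range(i + 1, len(clusters)):
                    # single linkage: distance between closest members
                    d = min(math.dist(a, b) for a in clusters[i] for b in clusters[j])
                    if best is None or d < best[0]:
                        best = (d, i, j)
            _, i, j = best
            clusters[i] += clusters.pop(j)        # merge the closest pair
        return clusters

    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(sorted(len(c) for c in single_linkage(pts, 2)))  # [3, 3]
    ```

    The atlas guidance described in the paper would additionally constrain which merges are allowed, so that clusters respect known anatomical bundle labels.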

  13. Atlas-guided cluster analysis of large tractography datasets.

    PubMed

    Ros, Christian; Güllmar, Daniel; Stenzel, Martin; Mentzel, Hans-Joachim; Reichenbach, Jürgen Rainer

    2013-01-01

    Diffusion Tensor Imaging (DTI) and fiber tractography are important tools to map the cerebral white matter microstructure in vivo and to model the underlying axonal pathways in the brain with three-dimensional fiber tracts. As the fast and consistent extraction of anatomically correct fiber bundles for multiple datasets is still challenging, we present a novel atlas-guided clustering framework for exploratory data analysis of large tractography datasets. The framework uses a hierarchical cluster analysis approach that exploits the inherent redundancy in large datasets to time-efficiently group fiber tracts. Structural information of a white matter atlas can be incorporated into the clustering to achieve an anatomically correct and reproducible grouping of fiber tracts. This approach facilitates not only the identification of the bundles corresponding to the classes of the atlas; it also enables the extraction of bundles that are not present in the atlas. The new technique was applied to cluster datasets of 46 healthy subjects. Prospects of automatic and anatomically correct as well as reproducible clustering are explored. Reconstructed clusters were well separated and showed good correspondence to anatomical bundles. Using the atlas-guided cluster approach, we observed consistent results across subjects with high reproducibility. In order to investigate the outlier elimination performance of the clustering algorithm, scenarios with varying amounts of noise were simulated and clustered with three different outlier elimination strategies. By exploiting the multithreading capabilities of modern multiprocessor systems in combination with novel algorithms, our toolkit clusters large datasets in a couple of minutes. Experiments were conducted to investigate the achievable speedup and to demonstrate the high performance of the clustering framework in a multiprocessing environment.

  14. Management and assimilation of diverse, distributed watershed datasets

    NASA Astrophysics Data System (ADS)

    Varadharajan, C.; Faybishenko, B.; Versteeg, R.; Agarwal, D.; Hubbard, S. S.; Hendrix, V.

    2016-12-01

    The U.S. Department of Energy's (DOE) Watershed Function Scientific Focus Area (SFA) seeks to determine how perturbations to mountainous watersheds (e.g., floods, drought, early snowmelt) impact the downstream delivery of water, nutrients, carbon, and metals over seasonal to decadal timescales. We are building a software platform that enables integration of diverse and disparate field, laboratory, and simulation datasets, including hydrological, geological, meteorological, geophysical, geochemical, ecological, and genomic datasets, across a range of spatial and temporal scales within the Rifle floodplain and the East River watershed, Colorado. We are using agile data management and assimilation approaches to enable web-based integration of heterogeneous, multi-scale datasets. Sensor-based observations of water level, vadose zone and groundwater temperature, water quality, and meteorology, as well as biogeochemical analyses of soil and groundwater samples, have been curated and archived in federated databases. Quality assurance and quality control (QA/QC) are performed on priority datasets needed for ongoing scientific analyses and for hydrological and geochemical modeling. Automated QA/QC methods are used to identify and flag issues in the datasets. Data integration is achieved via a brokering service that dynamically integrates data from distributed databases via web services, based on user queries. The integrated results are presented to users in a portal that enables intuitive search, interactive visualization, and download of integrated datasets.
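    A minimal sketch of the kind of automated QA/QC flagging described: mark values that are missing, outside a plausible physical range, or an abrupt spike relative to the previous reading. The thresholds below are illustrative assumptions, not the project's actual criteria:

    ```python
    # Flag each value in a sensor time series as ok / missing / out_of_range / spike.
    def qaqc_flags(series, lo=-5.0, hi=40.0, max_step=5.0):
        flags = []
        for i, v in enumerate(series):
            if v is None:
                flags.append("missing")
            elif not (lo <= v <= hi):
                flags.append("out_of_range")
            elif i > 0 and series[i - 1] is not None and abs(v - series[i - 1]) > max_step:
                flags.append("spike")
            else:
                flags.append("ok")
        return flags

    temps = [12.1, 12.3, 45.0, 12.4, None, 25.0]  # e.g. water temperature (deg C)
    print(qaqc_flags(temps))
    # ['ok', 'ok', 'out_of_range', 'spike', 'missing', 'ok']
    ```

    In a federated setup like the one described, flags such as these would travel with the data so the brokering service can filter or annotate records at query time.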
The concepts, approaches and codes being used are shared across various data science components of various large DOE-funded projects such as the Watershed Function SFA, Next Generation Ecosystem Experiment (NGEE) Tropics, Ameriflux/FLUXNET, and Advanced Simulation Capability for Environmental Management (ASCEM), and together contribute towards DOE's cyberinfrastructure for data management and model-data integration.

  15. NP_PAH_interaction dataset

    EPA Pesticide Factsheets

    Concentrations of different polyaromatic hydrocarbons in water before and after interaction with nanomaterials. The results show the capacity of engineered nanomaterials to adsorb different organic pollutants. This dataset is associated with the following publication: Sahle-Demessie, E., A. Zhao, C. Han, B. Hann, and H. Grecsek. Interaction of engineered nanomaterials with hydrophobic organic pollutants. Journal of Nanotechnology. Hindawi Publishing Corporation, New York, NY, USA, 27(28): 284003, (2016).

  16. Exploring drivers of wetland hydrologic fluxes across parameters and space

    NASA Astrophysics Data System (ADS)

    Jones, C. N.; Cheng, F. Y.; Mclaughlin, D. L.; Basu, N. B.; Lang, M.; Alexander, L. C.

    2017-12-01

    Depressional wetlands provide diverse ecosystem services, ranging from critical habitat to the regulation of landscape hydrology. The latter is of particular interest, because while hydrologic connectivity between depressional wetlands and downstream waters has been a focus of both scientific research and policy, it remains difficult to quantify the mode, magnitude, and timing of this connectivity at varying spatial and temporal scales. To do so requires robust empirical and modeling tools that accurately represent surface and subsurface flowpaths between depressional wetlands and other landscape elements. Here, we utilize a parsimonious wetland hydrology model to explore drivers of wetland water fluxes in different archetypal wetland-rich landscapes. We validated the model using instrumented sites from regions that span North America: the Prairie Pothole Region (south-central Canada), the Delmarva Peninsula (Mid-Atlantic Coastal Plain), and Big Cypress Swamp (southern Florida). Then, using several national-scale datasets (e.g., National Wetlands Inventory, USFWS; National Hydrography Dataset, USGS; Soil Survey Geographic Database, NRCS), we conducted a global sensitivity analysis to elucidate dominant drivers of simulated fluxes. Finally, we simulated and compared wetland hydrology in five contrasting landscapes dominated by depressional wetlands: prairie potholes, Carolina and Delmarva bays, pocosins, western vernal pools, and Texas coastal prairie wetlands. Results highlight specific drivers that vary across these regions. Largely, hydroclimatic variables (e.g., PET/P ratios) controlled the timing and magnitude of wetland connectivity, whereas both wetland morphology (e.g., storage capacity and watershed size) and soil characteristics (e.g., ksat and confining layer depth) controlled the duration and mode (surface vs. subsurface) of wetland connectivity. Improved understanding of the drivers of wetland hydrologic connectivity supports enhanced, region…
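    A sensitivity analysis of the kind described ranks parameters by how strongly the model output responds to them. A one-at-a-time (OAT) sketch on a deliberately toy water-balance function illustrates the mechanics; the storage function and parameter values below are invented stand-ins, not the study's model (which used a global, not OAT, analysis):

    ```python
    # One-at-a-time (OAT) elasticity: % change in output per % change in input.
    def days_connected(pet_over_p, storage_cap, ksat):
        # Toy model: wetter climates (low PET/P) and smaller storage spill sooner.
        fill = max(0.0, 1.5 - pet_over_p) * 100.0   # seasonal net input (mm)
        surplus = fill - storage_cap
        return max(0.0, surplus) / max(ksat, 1e-6)  # "days" of surface outflow

    base = {"pet_over_p": 0.8, "storage_cap": 40.0, "ksat": 2.0}

    def oat_sensitivity(model, base, rel_step=0.10):
        y0 = model(**base)
        sens = {}
        for name, v in base.items():
            bumped = dict(base, **{name: v * (1 + rel_step)})
            sens[name] = (model(**bumped) - y0) / (y0 * rel_step)  # elasticity
        return sens

    print(oat_sensitivity(days_connected, base))
    ```

    In this toy all elasticities come out negative (drier climate, more storage, or faster drainage each shortens surface connection), which mirrors the sign of the controls the abstract reports.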

  17. BanglaLekha-Isolated: A multi-purpose comprehensive dataset of Handwritten Bangla Isolated characters.

    PubMed

    Biswas, Mithun; Islam, Rafiqul; Shom, Gautam Kumar; Shopon, Md; Mohammed, Nabeel; Momen, Sifat; Abedin, Anowarul

    2017-06-01

    BanglaLekha-Isolated, a Bangla handwritten isolated character dataset, is presented in this article. This dataset contains 84 different characters comprising 50 Bangla basic characters, 10 Bangla numerals, and 24 selected compound characters. 2,000 handwriting samples for each of the 84 characters were collected, digitized, and pre-processed. After discarding mistakes and scribbles, 166,105 handwritten character images were included in the final dataset. The dataset also includes labels indicating the age and the gender of the subjects from whom the samples were collected. This dataset could be used not only for optical handwriting recognition research but also to explore the influence of gender and age on handwriting. The dataset is publicly available at https://data.mendeley.com/datasets/hf6sf8zrkc/2.

  18. Federal standards and procedures for the National Watershed Boundary Dataset (WBD)

    USGS Publications Warehouse

    ,; ,; ,

    2009-03-11

    Terminology, definitions, and procedural information are provided to ensure uniformity in hydrologic unit boundaries, names, and numerical codes. Detailed standards and specifications for data are included. The document also includes discussion of objectives, communications required for revising the data resolution in the United States and the Caribbean, as well as final review and data-quality criteria. Instances of unusual landforms or artificial features that affect the hydrologic units are described with metadata standards. Up-to-date information and availability of the hydrologic units are listed at http://www.nrcs.usda.gov/wps/portal/nrcs/detail/national/technical/nra/dma/?&cid=nrcs143_021630/.
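    The WBD's numerical codes nest by pairs of digits: a 2-digit region contains 4-digit subregions, down to 12-digit subwatersheds. Splitting a code into its ancestor units is a common lookup step; the example HUC below is arbitrary:

    ```python
    # A 12-digit hydrologic unit code (HUC12) encodes its whole lineage:
    # region (2), subregion (4), basin (6), subbasin (8), watershed (10),
    # subwatershed (12). Each 2-digit prefix extension names one level.
    def huc_ancestors(huc12):
        return [huc12[:n] for n in range(2, len(huc12) + 1, 2)]

    print(huc_ancestors("020700100204"))
    # ['02', '0207', '020700', '02070010', '0207001002', '020700100204']
    ```

    This prefix structure is what lets hydrologic-unit data from different agencies be joined consistently at any level of the hierarchy.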

  19. Open and scalable analytics of large Earth observation datasets: From scenes to multidimensional arrays using SciDB and GDAL

    NASA Astrophysics Data System (ADS)

    Appel, Marius; Lahn, Florian; Buytaert, Wouter; Pebesma, Edzer

    2018-04-01

    Earth observation (EO) datasets are commonly provided as collections of scenes, where individual scenes represent a temporal snapshot and cover a particular region on the Earth's surface. Using these data in complex spatiotemporal modeling becomes difficult as soon as data volumes exceed a certain capacity or analyses include many scenes, which may spatially overlap and may have been recorded at different dates. In order to facilitate analytics on large EO datasets, we combine and extend the geospatial data abstraction library (GDAL) and the array-based data management and analytics system SciDB. We present an approach to automatically convert collections of scenes to multidimensional arrays and use SciDB to scale computationally intensive analytics. We evaluate the approach in three case studies: national-scale land use change monitoring with Landsat imagery, global empirical orthogonal function analysis of daily precipitation, and combining historical climate model projections with satellite-based observations. Results indicate that the approach can be used to represent various EO datasets and that analyses in SciDB scale well with available computational resources. To simplify analyses of higher-dimensional datasets as from climate model output, however, a generalization of the GDAL data model might be needed. All parts of this work have been implemented as open-source software and we discuss how this may facilitate open and reproducible EO analyses.
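    The central data-model move is turning a bag of possibly overlapping, per-date scenes into one dense (time, y, x) array. A minimal sketch, with made-up sparse "scenes" standing in for rasters read through GDAL (the overlap rule used, first scene wins, is an assumption for illustration):

    ```python
    # Place per-date scenes onto a common (time, y, x) grid.
    def scenes_to_array(scenes, dates, height, width):
        # scenes: list of {(row, col): value} sparse per-scene pixels
        cube = []
        for d in sorted(set(dates)):
            band = [[None] * width for _ in range(height)]
            for date, pixels in zip(dates, scenes):
                if date != d:
                    continue
                for (r, c), v in pixels.items():
                    if band[r][c] is None:      # first scene wins on overlap
                        band[r][c] = v
            cube.append(band)
        return cube                              # shape: (time, y, x)

    s1 = {(0, 0): 1.0, (0, 1): 2.0}
    s2 = {(0, 1): 9.0, (1, 1): 3.0}              # overlaps s1 at (0, 1)
    cube = scenes_to_array([s1, s2], ["2018-01-01", "2018-01-01"], 2, 2)
    print(cube[0])  # [[1.0, 2.0], [None, 3.0]]
    ```

    Once data are in this regular array form, chunked array databases like SciDB can parallelize per-chunk computations, which is what the paper exploits for scaling.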

  20. Scalable Visual Analytics of Massive Textual Datasets

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Krishnan, Manoj Kumar; Bohn, Shawn J.; Cowley, Wendy E.

    2007-04-01

    This paper describes the first scalable implementation of a text processing engine used in visual analytics tools. These tools aid information analysts in interacting with and understanding large textual information content through visual interfaces. By developing a parallel implementation of the text processing engine, we enabled visual analytics tools to exploit cluster architectures and handle massive datasets. The paper describes key elements of our parallelization approach and demonstrates virtually linear scaling when processing multi-gigabyte data sets such as PubMed. This approach enables interactive analysis of large datasets beyond the capabilities of existing state-of-the-art visual analytics tools.
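    Near-linear scaling of this kind typically comes from an embarrassingly parallel map step (per-document term counting) followed by a cheap reduce (merging counts). Shown serially here as a sketch; each map call is independent, so distributing chunks across processes or cluster nodes is what yields the scaling (the specific engine's internals are not described in the record):

    ```python
    # Map-reduce term counting: the parallelizable core of text processing.
    from collections import Counter

    def map_count(doc):
        # map step: independent per document, hence trivially parallel
        return Counter(doc.lower().split())

    def reduce_counts(counters):
        # reduce step: merge partial term counts
        total = Counter()
        for c in counters:
            total.update(c)
        return total

    docs = ["visual analytics of text", "text processing at scale",
            "scalable visual analytics"]
    counts = reduce_counts(map(map_count, docs))
    print(counts["visual"], counts["text"])  # 2 2
    ```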

  1. Harvard Aging Brain Study: Dataset and accessibility.

    PubMed

    Dagley, Alexander; LaPoint, Molly; Huijbers, Willem; Hedden, Trey; McLaren, Donald G; Chatwal, Jasmeer P; Papp, Kathryn V; Amariglio, Rebecca E; Blacker, Deborah; Rentz, Dorene M; Johnson, Keith A; Sperling, Reisa A; Schultz, Aaron P

    2017-01-01

    The Harvard Aging Brain Study is sharing its data with the global research community. The longitudinal dataset consists of a 284-subject cohort with the following modalities acquired: demographics, clinical assessment, comprehensive neuropsychological testing, clinical biomarkers, and neuroimaging. To promote more extensive analyses, imaging data was designed to be compatible with other publicly available datasets. A cloud-based system enables access to interested researchers with blinded data available contingent upon completion of a data usage agreement and administrative approval. Data collection is ongoing and currently in its fifth year. Copyright © 2015 Elsevier Inc. All rights reserved.

  2. Sensitivity of a numerical wave model on wind re-analysis datasets

    NASA Astrophysics Data System (ADS)

    Lavidas, George; Venugopal, Vengatesan; Friedrich, Daniel

    2017-03-01

    Wind is the dominant process for wave generation. Detailed evaluation of metocean conditions strengthens our understanding of issues concerning potential offshore applications. However, the scarcity of buoys and the high cost of monitoring systems pose a barrier to properly defining offshore conditions. Through use of numerical wave models, metocean conditions can be hindcasted and forecasted, providing reliable characterisations. This study reports the sensitivity of wind inputs on a numerical wave model for the Scottish region. Two re-analysis wind datasets with different spatio-temporal characteristics are used, the ERA-Interim Re-Analysis and the CFSR-NCEP Re-Analysis dataset. Different wind products alter results, affecting the accuracy obtained. The scope of this study is to assess different available wind databases and provide information concerning the most appropriate wind dataset for the specific region, based on temporal, spatial and geographic terms for wave modelling and offshore applications. Both wind input datasets delivered results from the numerical wave model with good correlation. Wave results from the 1-h dataset have higher peaks and lower biases, at the expense of a higher scatter index. On the other hand, the 6-h dataset has lower scatter but higher biases. The study shows how the wind dataset affects numerical wave modelling performance, and that depending on location and study needs, different wind inputs should be considered.
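    The comparison metrics named in the abstract have standard definitions: bias (mean error), RMSE, and scatter index (RMSE normalised by the mean observation). A small sketch with invented sample values, not the study's data:

    ```python
    import math

    def bias(model, obs):
        return sum(m - o for m, o in zip(model, obs)) / len(obs)

    def rmse(model, obs):
        return math.sqrt(sum((m - o) ** 2 for m, o in zip(model, obs)) / len(obs))

    def scatter_index(model, obs):
        # RMSE relative to the mean observed value (dimensionless)
        return rmse(model, obs) / (sum(obs) / len(obs))

    hs_model = [1.2, 2.0, 3.1, 1.8]   # modelled significant wave height (m)
    hs_obs   = [1.0, 2.2, 2.8, 1.9]   # buoy observations (m)
    print(round(bias(hs_model, hs_obs), 3), round(scatter_index(hs_model, hs_obs), 3))
    ```

    The 1-h versus 6-h trade-off the study reports is exactly a trade-off between these two numbers: lower bias but higher scatter index, or the reverse.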

  3. The Centennial Trends Greater Horn of Africa precipitation dataset.

    PubMed

    Funk, Chris; Nicholson, Sharon E; Landsfeld, Martin; Klotter, Douglas; Peterson, Pete; Harrison, Laura

    2015-01-01

    East Africa is a drought prone, food and water insecure region with a highly variable climate. This complexity makes rainfall estimation challenging, and this challenge is compounded by low rain gauge densities and inhomogeneous monitoring networks. The dearth of observations is particularly problematic over the past decade, since the number of records in globally accessible archives has fallen precipitously. This lack of data coincides with an increasing scientific and humanitarian need to place recent seasonal and multi-annual East African precipitation extremes in a deep historic context. To serve this need, scientists from the UC Santa Barbara Climate Hazards Group and Florida State University have pooled their station archives and expertise to produce a high quality gridded 'Centennial Trends' precipitation dataset. Additional observations have been acquired from the national meteorological agencies and augmented with data provided by other universities. Extensive quality control of the data was carried out and seasonal anomalies interpolated using kriging. This paper documents the CenTrends methodology and data.
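    The CenTrends workflow interpolates station anomalies onto a grid with kriging. A simple inverse-distance weighting (IDW) stand-in illustrates the gridding step without kriging's variogram machinery; the station values below are invented for the example:

    ```python
    import math

    # IDW: each grid point is a distance-weighted average of station values.
    def idw(stations, x, y, power=2.0):
        num = den = 0.0
        for (sx, sy, value) in stations:
            d = math.hypot(x - sx, y - sy)
            if d == 0:
                return value                   # exact hit on a station
            w = 1.0 / d ** power
            num += w * value
            den += w
        return num / den

    # (x, y, seasonal rainfall anomaly in mm)
    stations = [(0.0, 0.0, -20.0), (1.0, 0.0, 10.0), (0.0, 1.0, 30.0)]
    grid = [[round(idw(stations, gx / 2.0, gy / 2.0), 1) for gx in range(3)]
            for gy in range(3)]
    print(grid[0][0])  # -20.0 (grid point coincides with the first station)
    ```

    Kriging improves on this by weighting stations according to a fitted spatial covariance model rather than raw distance, which matters in networks as sparse and uneven as those described here.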

  4. The Centennial Trends Greater Horn of Africa precipitation dataset

    USGS Publications Warehouse

    Funk, Chris; Nicholson, Sharon E.; Landsfeld, Martin F.; Klotter, Douglas; Peterson, Pete J.; Harrison, Laura

    2015-01-01

    East Africa is a drought prone, food and water insecure region with a highly variable climate. This complexity makes rainfall estimation challenging, and this challenge is compounded by low rain gauge densities and inhomogeneous monitoring networks. The dearth of observations is particularly problematic over the past decade, since the number of records in globally accessible archives has fallen precipitously. This lack of data coincides with an increasing scientific and humanitarian need to place recent seasonal and multi-annual East African precipitation extremes in a deep historic context. To serve this need, scientists from the UC Santa Barbara Climate Hazards Group and Florida State University have pooled their station archives and expertise to produce a high quality gridded ‘Centennial Trends’ precipitation dataset. Additional observations have been acquired from the national meteorological agencies and augmented with data provided by other universities. Extensive quality control of the data was carried out and seasonal anomalies interpolated using kriging. This paper documents the CenTrends methodology and data.

  5. A comparison of NLCD 2011 and LANDFIRE EVT 2010: Regional and national summaries.

    USGS Publications Warehouse

    McKerrow, Alexa; Dewitz, Jon; Long, Donald G.; Nelson, Kurtis; Connot, Joel A.; Smith, Jim

    2016-01-01

    In order to provide the land cover user community with a summary of the similarities and differences between the 2011 National Land Cover Dataset (NLCD) and the Landscape Fire and Resource Management Planning Tools Program Existing Vegetation 2010 Data (LANDFIRE EVT), the two datasets were compared at national (conterminous U.S.) and regional (Eastern, Midwestern, and Western) extents (Figure 1). The comparisons were done by generalizing the LANDFIRE data to be consistent with the mapped land cover classes in the NLCD (i.e., crosswalked). Summaries of the comparisons were based on areal extent, including 1) the total extent of each land cover class, and 2) land cover classes in corresponding 900-m2 areas. The results from the comparisons provide the user community information regarding the utility of both datasets relative to their intended uses.

  6. Querying Large Biological Network Datasets

    ERIC Educational Resources Information Center

    Gulsoy, Gunhan

    2013-01-01

    New experimental methods have resulted in increasing amounts of genetic interaction data being generated every day. Biological networks are used to store the genetic interaction data gathered. The increasing amount of available data requires fast, large-scale analysis methods. Therefore, we address the problem of querying large biological network datasets.…

  7. Dataset used to improve liquid water absorption models in the microwave

    DOE Data Explorer

    Turner, David

    2015-12-14

    Two datasets, one a compilation of laboratory data and one a compilation from three field sites, are provided here. These datasets provide measurements of the real and imaginary refractive indices and absorption as a function of cloud temperature. These datasets were used in the development of the new liquid water absorption model that was published in Turner et al. 2015.

  8. Attributes for NHDPlus Catchments (Version 1.1) for the Conterminous United States: Physiographic Provinces

    USGS Publications Warehouse

    Wieczorek, Michael; LaMotte, Andrew E.

    2010-01-01

    This dataset represents the area of each physiographic province (Fenneman and Johnson, 1946) in square meters, compiled for every catchment of NHDPlus for the conterminous United States. The source data are from Fenneman and Johnson's Physiographic Provinces of the United States, which is based on 8 major divisions, 25 provinces, and 86 sections representing distinctive areas having common topography, rock type and structure, and geologic and geomorphic history (Fenneman and Johnson, 1946). The NHDPlus Version 1.1 is an integrated suite of application-ready geospatial datasets that incorporates many of the best features of the National Hydrography Dataset (NHD) and the National Elevation Dataset (NED). The NHDPlus includes a stream network (based on the 1:100,000-scale NHD), improved networking, naming, and value-added attributes (VAAs). NHDPlus also includes elevation-derived catchments (drainage areas) produced using a drainage enforcement technique first widely used in New England, and thus referred to as "the New England Method." This technique involves "burning in" the 1:100,000-scale NHD and, when available, building "walls" using the National Watershed Boundary Dataset (WBD). The resulting modified digital elevation model (HydroDEM) is used to produce hydrologic derivatives that agree with the NHD and WBD. Over the past two years, an interdisciplinary team from the U.S. Geological Survey (USGS), the U.S. Environmental Protection Agency (USEPA), and contractors found that this method produces the best quality NHD catchments using an automated process (USEPA, 2007). The NHDPlus dataset is organized by 18 Production Units that cover the conterminous United States. The NHDPlus version 1.1 data are grouped by the U.S. Geological Survey's Major River Basins (MRBs; Crawford and others, 2006). MRB1, covering the New England and Mid-Atlantic River basins, contains NHDPlus Production Units 1 and 2. MRB2, covering the South Atlantic-Gulf and Tennessee River basins…

  9. Attributes for NHDPlus Catchments (Version 1.1) for the Conterminous United States: Average Annual Daily Minimum Temperature, 2002

    USGS Publications Warehouse

    Wieczorek, Michael; LaMotte, Andrew E.

    2010-01-01

    This data set represents the average monthly minimum temperature in Celsius multiplied by 100 for 2002, compiled for every catchment of NHDPlus for the conterminous United States. The source data were the Near-Real-Time High-Resolution Monthly Average Maximum/Minimum Temperature for the Conterminous United States for 2002 raster dataset produced by the Spatial Climate Analysis Service at Oregon State University. The NHDPlus Version 1.1 is an integrated suite of application-ready geospatial datasets that incorporates many of the best features of the National Hydrography Dataset (NHD) and the National Elevation Dataset (NED). The NHDPlus includes a stream network (based on the 1:100,000-scale NHD), improved networking, naming, and value-added attributes (VAAs). NHDPlus also includes elevation-derived catchments (drainage areas) produced using a drainage enforcement technique first widely used in New England, and thus referred to as "the New England Method." This technique involves "burning in" the 1:100,000-scale NHD and, when available, building "walls" using the National Watershed Boundary Dataset (WBD). The resulting modified digital elevation model (HydroDEM) is used to produce hydrologic derivatives that agree with the NHD and WBD. Over the past two years, an interdisciplinary team from the U.S. Geological Survey (USGS), the U.S. Environmental Protection Agency (USEPA), and contractors found that this method produces the best quality NHD catchments using an automated process (USEPA, 2007). The NHDPlus dataset is organized by 18 Production Units that cover the conterminous United States. The NHDPlus version 1.1 data are grouped by the U.S. Geological Survey's Major River Basins (MRBs; Crawford and others, 2006). MRB1, covering the New England and Mid-Atlantic River basins, contains NHDPlus Production Units 1 and 2. MRB2, covering the South Atlantic-Gulf and Tennessee River basins, contains NHDPlus Production Units 3 and 6. MRB3, covering the Great Lakes, Ohio…
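    These catchment attributes store temperature as an integer equal to degrees Celsius multiplied by 100, so consumers must divide on read. A trivial decode sketch; the catchment names and stored values below are made-up examples of the stored form, not actual NHDPlus records:

    ```python
    # Undo the "Celsius x 100" integer scaling used by these attribute tables.
    def decode_scaled(value, scale=100.0):
        return value / scale

    stored = {"catchment_a": -1250, "catchment_b": 875}   # Celsius x 100
    decoded = {k: decode_scaled(v) for k, v in stored.items()}
    print(decoded)  # {'catchment_a': -12.5, 'catchment_b': 8.75}
    ```

    The same scaling applies to the companion maximum-temperature and precipitation attribute tables, so one decode step serves all three.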

  10. Attributes for NHDPlus catchments (version 1.1) for the conterminous United States: Average Annual Daily Maximum Temperature, 2002

    USGS Publications Warehouse

    Wieczorek, Michael; LaMotte, Andrew E.

    2010-01-01

    This data set represents the average monthly maximum temperature in Celsius multiplied by 100 for 2002, compiled for every catchment of NHDPlus for the conterminous United States. The source data were the Near-Real-Time High-Resolution Monthly Average Maximum/Minimum Temperature for the Conterminous United States for 2002 raster dataset produced by the Spatial Climate Analysis Service at Oregon State University. The NHDPlus Version 1.1 is an integrated suite of application-ready geospatial datasets that incorporates many of the best features of the National Hydrography Dataset (NHD) and the National Elevation Dataset (NED). The NHDPlus includes a stream network (based on the 1:100,000-scale NHD), improved networking, naming, and value-added attributes (VAAs). NHDPlus also includes elevation-derived catchments (drainage areas) produced using a drainage enforcement technique first widely used in New England, and thus referred to as "the New England Method." This technique involves "burning in" the 1:100,000-scale NHD and, when available, building "walls" using the National Watershed Boundary Dataset (WBD). The resulting modified digital elevation model (HydroDEM) is used to produce hydrologic derivatives that agree with the NHD and WBD. Over the past two years, an interdisciplinary team from the U.S. Geological Survey (USGS), the U.S. Environmental Protection Agency (USEPA), and contractors found that this method produces the best quality NHD catchments using an automated process (USEPA, 2007). The NHDPlus dataset is organized by 18 Production Units that cover the conterminous United States. The NHDPlus version 1.1 data are grouped by the U.S. Geological Survey's Major River Basins (MRBs; Crawford and others, 2006). MRB1, covering the New England and Mid-Atlantic River basins, contains NHDPlus Production Units 1 and 2. MRB2, covering the South Atlantic-Gulf and Tennessee River basins, contains NHDPlus Production Units 3 and 6. MRB3, covering the Great Lakes, Ohio…

  11. Attributes for NHDPlus Catchments (Version 1.1) for the Conterminous United States: Average Monthly Precipitation, 2002

    USGS Publications Warehouse

    Wieczorek, Michael; LaMotte, Andrew E.

    2010-01-01

    This data set represents the average monthly precipitation in millimeters multiplied by 100 for 2002, compiled for every catchment of NHDPlus for the conterminous United States. The source data were the Near-Real-Time Monthly High-Resolution Precipitation Climate Data Set for the Conterminous United States (2002) raster dataset produced by the Spatial Climate Analysis Service at Oregon State University. The NHDPlus Version 1.1 is an integrated suite of application-ready geospatial datasets that incorporates many of the best features of the National Hydrography Dataset (NHD) and the National Elevation Dataset (NED). The NHDPlus includes a stream network (based on the 1:100,000-scale NHD), improved networking, naming, and value-added attributes (VAAs). NHDPlus also includes elevation-derived catchments (drainage areas) produced using a drainage enforcement technique first widely used in New England, and thus referred to as "the New England Method." This technique involves "burning in" the 1:100,000-scale NHD and, when available, building "walls" using the National Watershed Boundary Dataset (WBD). The resulting modified digital elevation model (HydroDEM) is used to produce hydrologic derivatives that agree with the NHD and WBD. Over the past two years, an interdisciplinary team from the U.S. Geological Survey (USGS), the U.S. Environmental Protection Agency (USEPA), and contractors found that this method produces the best quality NHD catchments using an automated process (USEPA, 2007). The NHDPlus dataset is organized by 18 Production Units that cover the conterminous United States. The NHDPlus version 1.1 data are grouped by the U.S. Geological Survey's Major River Basins (MRBs; Crawford and others, 2006). MRB1, covering the New England and Mid-Atlantic River basins, contains NHDPlus Production Units 1 and 2. MRB2, covering the South Atlantic-Gulf and Tennessee River basins, contains NHDPlus Production Units 3 and 6. MRB3, covering the Great Lakes, Ohio, Upper…

  12. Primary Datasets for Case Studies of River-Water Quality

    ERIC Educational Resources Information Center

    Goulder, Raymond

    2008-01-01

    Level 6 (final-year BSc) students undertook case studies on between-site and temporal variation in river-water quality. They used professionally-collected datasets supplied by the Environment Agency. The exercise gave students the experience of working with large, real-world datasets and led to their understanding how the quality of river water is…

  13. A dataset of human decision-making in teamwork management.

    PubMed

    Yu, Han; Shen, Zhiqi; Miao, Chunyan; Leung, Cyril; Chen, Yiqiang; Fauvel, Simon; Lin, Jun; Cui, Lizhen; Pan, Zhengxiang; Yang, Qiang

    2017-01-17

    Today, most endeavours require teamwork by people with diverse skills and characteristics. In managing teamwork, decisions are often made under uncertainty and resource constraints. The strategies and the effectiveness of the strategies different people adopt to manage teamwork under different situations have not yet been fully explored, partially due to a lack of detailed large-scale data. In this paper, we describe a multi-faceted large-scale dataset to bridge this gap. It is derived from a game simulating complex project management processes. It presents the participants with different conditions in terms of team members' capabilities and task characteristics for them to exhibit their decision-making strategies. The dataset contains detailed data reflecting the decision situations, decision strategies, decision outcomes, and the emotional responses of 1,144 participants from diverse backgrounds. To our knowledge, this is the first dataset simultaneously covering these four facets of decision-making. With repeated measurements, the dataset may help establish baseline variability of decision-making in teamwork management, leading to more realistic decision theoretic models and more effective decision support approaches.

  14. A dataset of human decision-making in teamwork management

    PubMed Central

    Yu, Han; Shen, Zhiqi; Miao, Chunyan; Leung, Cyril; Chen, Yiqiang; Fauvel, Simon; Lin, Jun; Cui, Lizhen; Pan, Zhengxiang; Yang, Qiang

    2017-01-01

    Today, most endeavours require teamwork by people with diverse skills and characteristics. In managing teamwork, decisions are often made under uncertainty and resource constraints. The strategies and the effectiveness of the strategies different people adopt to manage teamwork under different situations have not yet been fully explored, partially due to a lack of detailed large-scale data. In this paper, we describe a multi-faceted large-scale dataset to bridge this gap. It is derived from a game simulating complex project management processes. It presents the participants with different conditions in terms of team members’ capabilities and task characteristics for them to exhibit their decision-making strategies. The dataset contains detailed data reflecting the decision situations, decision strategies, decision outcomes, and the emotional responses of 1,144 participants from diverse backgrounds. To our knowledge, this is the first dataset simultaneously covering these four facets of decision-making. With repeated measurements, the dataset may help establish baseline variability of decision-making in teamwork management, leading to more realistic decision theoretic models and more effective decision support approaches. PMID:28094787

  15. A global experimental dataset for assessing grain legume production

    PubMed Central

    Cernay, Charles; Pelzer, Elise; Makowski, David

    2016-01-01

    Grain legume crops are a significant component of the human diet and animal feed and have an important role in the environment, but the global diversity of agricultural legume species is currently underexploited. Experimental assessments of grain legume performances are required, to identify potential species with high yields. Here, we introduce a dataset including results of field experiments published in 173 articles. The selected experiments were carried out over five continents on 39 grain legume species. The dataset includes measurements of grain yield, aerial biomass, crop nitrogen content, residual soil nitrogen content and water use. When available, yields for cereals and oilseeds grown after grain legumes in the crop sequence are also included. The dataset is arranged into a relational database with nine structured tables and 198 standardized attributes. Tillage, fertilization, pest and irrigation management are systematically recorded for each of the 8,581 crop × field site × growing season × treatment combinations. The dataset is freely reusable and easy to update. We anticipate that it will provide valuable information for assessing grain legume production worldwide. PMID:27676125

  16. A dataset of human decision-making in teamwork management

    NASA Astrophysics Data System (ADS)

    Yu, Han; Shen, Zhiqi; Miao, Chunyan; Leung, Cyril; Chen, Yiqiang; Fauvel, Simon; Lin, Jun; Cui, Lizhen; Pan, Zhengxiang; Yang, Qiang

    2017-01-01

    Today, most endeavours require teamwork by people with diverse skills and characteristics. In managing teamwork, decisions are often made under uncertainty and resource constraints. The strategies and the effectiveness of the strategies different people adopt to manage teamwork under different situations have not yet been fully explored, partially due to a lack of detailed large-scale data. In this paper, we describe a multi-faceted large-scale dataset to bridge this gap. It is derived from a game simulating complex project management processes. It presents the participants with different conditions in terms of team members' capabilities and task characteristics for them to exhibit their decision-making strategies. The dataset contains detailed data reflecting the decision situations, decision strategies, decision outcomes, and the emotional responses of 1,144 participants from diverse backgrounds. To our knowledge, this is the first dataset simultaneously covering these four facets of decision-making. With repeated measurements, the dataset may help establish baseline variability of decision-making in teamwork management, leading to more realistic decision theoretic models and more effective decision support approaches.

  17. Abandoned Uranium Mine (AUM) Surface Areas, Navajo Nation, 2016, US EPA Region 9

    EPA Pesticide Factsheets

    This GIS dataset contains polygon features that represent all Abandoned Uranium Mines (AUMs) on or within one mile of the Navajo Nation. Attributes include mine names, aliases, Potentially Responsible Parties, reclamation status, EPA mine status, links to AUM reports, and the region in which an AUM is located. This dataset contains 608 features.

  18. Reference datasets for bioequivalence trials in a two-group parallel design.

    PubMed

    Fuglsang, Anders; Schütz, Helmut; Labes, Detlew

    2015-03-01

    In order to help companies qualify and validate the software used to evaluate bioequivalence trials with two parallel treatment groups, this work aims to define datasets with known results. This paper puts a total of 11 datasets into the public domain along with a proposed consensus obtained via evaluations from six different software packages (R, SAS, WinNonlin, OpenOffice Calc, Kinetica, EquivTest). Insofar as possible, datasets were evaluated with and without the assumption of equal variances for the construction of a 90% confidence interval. Not all software packages provide functionality for the assumption of unequal variances (EquivTest, Kinetica), and not all packages can handle datasets with more than 1000 subjects per group (WinNonlin). Where results could be obtained across all packages, one showed questionable results when datasets contained unequal group sizes (Kinetica). A proposal is made for the results that should be used as validation targets.
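For illustration only, here is a rough pure-Python sketch of a 90% confidence interval on the ratio of geometric means for two parallel groups without assuming equal variances: a Welch-type standard error on log-transformed data, with the t quantile replaced by z = 1.645 as a large-sample simplification. The sample values are invented; this is not the evaluation used by any of the packages named above.

```python
import math
import statistics

def be_ci_90(test, ref):
    """Approximate 90% CI for the test/reference ratio of geometric means.
    Uses a Welch-type standard error (no equal-variance assumption) on
    log-transformed data, and z = 1.645 in place of a t quantile -- a
    large-sample simplification, so this is only an illustrative sketch."""
    lt = [math.log(x) for x in test]
    lr = [math.log(x) for x in ref]
    diff = statistics.mean(lt) - statistics.mean(lr)
    se = math.sqrt(statistics.variance(lt) / len(lt) +
                   statistics.variance(lr) / len(lr))
    z = 1.645
    return math.exp(diff - z * se), math.exp(diff + z * se)

# Invented pharmacokinetic metric values for two parallel groups.
lo, hi = be_ci_90([95, 102, 98, 110, 90, 105], [100, 97, 104, 99, 108, 95])
print(round(lo, 3), round(hi, 3))
```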

  19. Identifying Differentially Abundant Metabolic Pathways in Metagenomic Datasets

    NASA Astrophysics Data System (ADS)

    Liu, Bo; Pop, Mihai

    Enabled by rapid advances in sequencing technology, metagenomic studies aim to characterize entire communities of microbes, bypassing the need for culturing individual bacterial members. One major goal of such studies is to identify specific functional adaptations of microbial communities to their habitats. Here we describe a powerful analytical method (MetaPath) that can identify differentially abundant pathways in metagenomic datasets, relying on a combination of metagenomic sequence data and prior metabolic pathway knowledge. We show that MetaPath outperforms other common approaches when evaluated on simulated datasets. We also demonstrate the power of our methods in analyzing two publicly available metagenomic datasets: a comparison of the gut microbiome of obese and lean twins; and a comparison of the gut microbiome of infant and adult subjects. We demonstrate that the subpathways identified by our method provide valuable insights into the biological activities of the microbiome.

  20. ClimateNet: A Machine Learning dataset for Climate Science Research

    NASA Astrophysics Data System (ADS)

    Prabhat, M.; Biard, J.; Ganguly, S.; Ames, S.; Kashinath, K.; Kim, S. K.; Kahou, S.; Maharaj, T.; Beckham, C.; O'Brien, T. A.; Wehner, M. F.; Williams, D. N.; Kunkel, K.; Collins, W. D.

    2017-12-01

    Deep Learning techniques have revolutionized commercial applications in computer vision, speech recognition and control systems. The key for all of these developments was the creation of a curated, labeled dataset, ImageNet, enabling multiple research groups around the world to develop methods, benchmark performance and compete with each other. The success of Deep Learning can be largely attributed to the broad availability of this dataset. Our empirical investigations have revealed that Deep Learning is similarly poised to benefit the task of pattern detection in climate science. Unfortunately, labeled datasets, a key prerequisite for training, are hard to find. Individual research groups are typically interested in specialized weather patterns, making it hard to unify and share datasets across groups and institutions. In this work, we are proposing ClimateNet: a labeled dataset that provides labeled instances of extreme weather patterns, as well as associated raw fields in model and observational output. We develop a schema in NetCDF to enumerate weather pattern classes/types, store bounding boxes, and pixel-masks. We are also working on a TensorFlow implementation to natively import such NetCDF datasets, and are providing a reference convolutional architecture for binary classification tasks. Our hope is that researchers in Climate Science, as well as ML/DL, will be able to use (and extend) ClimateNet to make rapid progress in the application of Deep Learning for Climate Science research.

  1. Regional climate change study requires new temperature datasets

    NASA Astrophysics Data System (ADS)

    Wang, K.; Zhou, C.

    2016-12-01

    Analyses of global mean air temperature (Ta), i.e., NCDC GHCN, GISS, and CRUTEM4, are the fundamental datasets for climate change study and provide key evidence for global warming. All of the global temperature analyses over land are primarily based on meteorological observations of the daily maximum and minimum temperatures (Tmax and Tmin) and their averages (T2) because in most weather stations, the measurements of Tmax and Tmin may be the only choice for a homogeneous century-long analysis of mean temperature. Our studies show that these datasets are suitable for long-term global warming studies. However, they may introduce substantial bias in quantifying local and regional warming rates, i.e., with a root mean square error of more than 25% at 5° × 5° grids. From 1973 to 1997, the current datasets tend to significantly underestimate the warming rate over the central U.S. and overestimate the warming rate over the northern high latitudes. Similar results during the period 1998-2013, the warming hiatus period, indicate that the use of T2 enlarges the spatial contrast of temperature trends. This is because T2 over land only samples air temperature twice daily and cannot accurately reflect land-atmosphere and incoming radiation variations in the temperature diurnal cycle. For better regional climate change detection and attribution, we suggest creating new global mean air temperature datasets based on the recently available high spatiotemporal resolution meteorological observations, i.e., the four-times-daily weather station observations available since the 1960s. These datasets will not only help investigate dynamical processes on temperature variances but also help better evaluate the reanalyzed and modeled simulations of temperature and make some substantial improvements for other related climate variables in models, especially over regional and seasonal aspects.
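The T2 series discussed above is simply the average of the daily extremes. A minimal sketch, with synthetic numbers rather than real station data, of forming T2 and estimating its linear trend by ordinary least squares:

```python
def t2_series(tmax, tmin):
    """Daily mean temperature as the average of daily max and min (T2)."""
    return [(a + b) / 2 for a, b in zip(tmax, tmin)]

def ols_slope(y):
    """Least-squares trend of y (units of y per time step) against 0..n-1."""
    n = len(y)
    xbar = (n - 1) / 2
    ybar = sum(y) / n
    num = sum((x - xbar) * (v - ybar) for x, v in enumerate(y))
    den = sum((x - xbar) ** 2 for x in range(n))
    return num / den

tmax = [10.0, 10.5, 11.0, 11.5, 12.0]   # synthetic values, not observations
tmin = [0.0, 0.5, 1.0, 1.5, 2.0]
t2 = t2_series(tmax, tmin)
print(ols_slope(t2))  # 0.5 degrees per step for this synthetic series
```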

  2. Regional climate change study requires new temperature datasets

    NASA Astrophysics Data System (ADS)

    Wang, Kaicun; Zhou, Chunlüe

    2017-04-01

    Analyses of global mean air temperature (Ta), i.e., NCDC GHCN, GISS, and CRUTEM4, are the fundamental datasets for climate change study and provide key evidence for global warming. All of the global temperature analyses over land are primarily based on meteorological observations of the daily maximum and minimum temperatures (Tmax and Tmin) and their averages (T2) because in most weather stations, the measurements of Tmax and Tmin may be the only choice for a homogeneous century-long analysis of mean temperature. Our studies show that these datasets are suitable for long-term global warming studies. However, they may have substantial biases in quantifying local and regional warming rates, i.e., with a root mean square error of more than 25% at 5° × 5° grids. From 1973 to 1997, the current datasets tend to significantly underestimate the warming rate over the central U.S. and overestimate the warming rate over the northern high latitudes. Similar results during the period 1998-2013, the warming hiatus period, indicate that the use of T2 enlarges the spatial contrast of temperature trends. This is because T2 over land only samples air temperature twice daily and cannot accurately reflect land-atmosphere and incoming radiation variations in the temperature diurnal cycle. For better regional climate change detection and attribution, we suggest creating new global mean air temperature datasets based on the recently available high spatiotemporal resolution meteorological observations, i.e., the four-times-daily weather station observations available since the 1960s. These datasets will not only help investigate dynamical processes on temperature variances but also help better evaluate the reanalyzed and modeled simulations of temperature and make some substantial improvements for other related climate variables in models, especially over regional and seasonal aspects.

  3. [Consideration of guidelines, recommendations and quality indicators for treatment of stroke in the dataset "Emergency Department" of DIVI].

    PubMed

    Kulla, M; Friess, M; Schellinger, P D; Harth, A; Busse, O; Walcher, F; Helm, M

    2015-12-01

    The dataset "Emergency Department" of the German Interdisciplinary Association of Critical Care and Emergency Medicine (DIVI) has been developed during several expert meetings. Its goal is an all-encompassing documentation of the early clinical treatment of patients in emergency departments. Using the example of the index disease acute ischemic stroke (stroke), the aim was to analyze how far this approach has been fulfilled. In this study, German, European, and US American guidelines were used to analyze the extent to which the dataset covers current emergency department guidelines and recommendations from professional societies. In addition, it was examined whether the dataset includes recommended quality indicators (QI) for quality management (QM), and in a third step it was examined to what extent national provisions for billing are included. In each case a differentiation was made as to whether the respective rationale was primary, i.e., directly apparent, or whether it was merely secondarily depicted by expertise. In the evaluation an additional differentiation was made between the level of recommendations and further quality-relevant criteria. The modular design of the emergency department dataset, comprising 676 data fields, is briefly described. A total of 401 individual fields, divided into basic documentation, monitoring, and specific neurological documentation of the treatment of stroke patients, were considered. For 247 data fields a rationale was found. Partially overlapping, 78.9% of the 214 medical recommendations in the 3 guidelines and 85.8% of the 106 identified quality indicators were primarily covered. Of the 67 requirements for billing of performance of services, 55.5% are primarily part of the emergency department dataset. Through appropriate expertise and documentation by a board-certified neurologist, the results can be improved to almost 100%. The index disease stroke illustrates that the emergency department dataset of the DIVI covers medical

  4. Schooling and National Income: How Large Are the Externalities?

    ERIC Educational Resources Information Center

    Breton, Theodore R.

    2010-01-01

    This paper uses a new dataset for cumulative national investment in formal schooling and a new instrument for schooling to estimate the national return on investment in 61 countries. These estimates are combined with data on the private rate of return on investment in schooling to estimate the external rate of return. In 1990 the external rate of…

  5. Image segmentation evaluation for very-large datasets

    NASA Astrophysics Data System (ADS)

    Reeves, Anthony P.; Liu, Shuang; Xie, Yiting

    2016-03-01

    With the advent of modern machine learning methods and fully automated image analysis there is a need for very large image datasets having documented segmentations for both computer algorithm training and evaluation. Current approaches of visual inspection and manual markings do not scale well to big data. We present a new approach that depends on fully automated algorithm outcomes for segmentation documentation, requires no manual marking, and provides quantitative evaluation for computer algorithms. The documentation of new image segmentations and new algorithm outcomes is achieved by visual inspection. The burden of visual inspection on large datasets is minimized by (a) customized visualizations for rapid review and (b) reducing the number of cases to be reviewed through analysis of quantitative segmentation evaluation. This method has been applied to a dataset of 7,440 whole-lung CT images for 6 different segmentation algorithms designed to fully automatically facilitate the measurement of a number of very important quantitative image biomarkers. The results indicate that we could achieve 93% to 99% successful segmentation for these algorithms on this relatively large image database. The presented evaluation method may be scaled to much larger image databases.
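Quantitative segmentation evaluation of the kind described above is commonly based on overlap measures. As an illustrative sketch only (the Dice coefficient is a standard choice, though the record does not specify the paper's exact metrics):

```python
def dice(mask_a, mask_b):
    """Dice overlap between two binary masks given as flat 0/1 lists:
    2 * |A intersect B| / (|A| + |B|)."""
    inter = sum(a * b for a, b in zip(mask_a, mask_b))
    size = sum(mask_a) + sum(mask_b)
    return 2 * inter / size if size else 1.0

# Toy 8-pixel masks, invented for illustration.
auto = [0, 1, 1, 1, 0, 0, 1, 0]   # automated segmentation
ref  = [0, 1, 1, 0, 0, 0, 1, 1]   # reference segmentation
print(dice(auto, ref))  # 0.75
```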

  6. Completion of the National Land Cover Database (NLCD) 1992-2001 Land Cover Change Retrofit Product

    EPA Science Inventory

    The Multi-Resolution Land Characteristics Consortium has supported the development of two national digital land cover products: the National Land Cover Dataset (NLCD) 1992 and National Land Cover Database (NLCD) 2001. Substantial differences in imagery, legends, and methods betwe...

  7. Attributes for NHDPlus Catchments (Version 1.1) in the Conterminous United States: Artificial Drainage (1992) and Irrigation Types (1997)

    USGS Publications Warehouse

    Wieczorek, Michael; LaMotte, Andrew E.

    2010-01-01

    incorporates many of the best features of the National Hydrography Dataset (NHD) and the National Elevation Dataset (NED). The NHDPlus includes a stream network (based on the 1:100,000-scale NHD), improved networking, naming, and value-added attributes (VAAs). NHDPlus also includes elevation-derived catchments (drainage areas) produced using a drainage enforcement technique first widely used in New England, and thus referred to as "the New England Method." This technique involves "burning in" the 1:100,000-scale NHD and, when available, building "walls" using the National Watershed Boundary Dataset (WBD). The resulting modified digital elevation model (HydroDEM) is used to produce hydrologic derivatives that agree with the NHD and WBD. Over the past two years, an interdisciplinary team from the U.S. Geological Survey (USGS), the U.S. Environmental Protection Agency (USEPA), and contractors found that this method produces the best quality NHD catchments using an automated process (USEPA, 2007). The NHDPlus dataset is organized by 18 Production Units that cover the conterminous United States. The NHDPlus version 1.1 data are grouped by the U.S. Geological Survey's Major River Basins (MRBs, Crawford and others, 2006). MRB1, covering the New England and Mid-Atlantic River basins, contains NHDPlus Production Units 1 and 2. MRB2, covering the South Atlantic-Gulf and Tennessee River basins, contains NHDPlus Production Units 3 and 6. MRB3, covering the Great Lakes, Ohio, Upper Mississippi, and Souris-Red-Rainy River basins, contains NHDPlus Production Units 4, 5, 7 and 9. MRB4, covering the Missouri River basins, contains NHDPlus Production Units 10-lower and 10-upper. MRB5, covering the Lower Mississippi, Arkansas-White-Red, and Texas-Gulf River basins, contains NHDPlus Production Units 8, 11 and 12. MRB6, covering the Rio Grande, Colorado and Great Basin River basins, contains NHDPlus Production Units 13, 14, 15 and 16. MRB7, covering the Pacific Northwest River basins, contains

  8. A modern vs. Permian black shale - the hydrography, primary productivity, and water-column chemistry of deposition

    USGS Publications Warehouse

    Piper, D.Z.; Perkins, R.B.

    2004-01-01

    The sediment currently accumulating in the Cariaco Basin, on the continental shelf of Venezuela, has an elevated organic-carbon content of approximately 5%; is accumulating under O2-depleted bottom-water conditions (SO42- reduction); is composed dominantly of foraminiferal calcite, diatomaceous silica, clay, and silt; and is dark greenish gray in color. Upon lithification, it will become a black shale. Recent studies have established the hydrography of the basin and the level of primary productivity and bottom-water redox conditions. These properties are used to model accumulation rates of Cd, Cr, Cu, Mo, Ni, V, and Zn on the seafloor. The model rates agree closely with measured rates for the uppermost surface sediment. The model is applied to the Meade Peak Phosphatic Shale Member of the Phosphoria Formation, a phosphate deposit of Permian age in the northwest United States. It too has all of the requisite properties of a black shale. Although the deposit is a world-class phosphorite, it is composed mostly of phosphatic mudstone and siltstone, chert, limestone, and dolomite. It has organic-carbon concentrations of up to 15%, is strongly enriched in several trace elements above a terrigenous contribution and is black. The trace-element accumulation defines a mean primary productivity in the photic zone of the Phosphoria Basin as moderate, at 500 g m-2 year-1 organic carbon, comparable to primary productivity in the Cariaco Basin. The source of nutrient-enriched water that was imported into the Phosphoria Basin, upwelled into the photic zone, and supported primary productivity was an O2 minimum zone of the open ocean. The depth range over which the water was imported would have been between approximately 100 and 600 m. The mean residence time of bottom water in the basin was approximately 4 years vs. 100 years in the Cariaco Basin. The bottom water was O2 depleted, but it was denitrifying, or NO3- reducing, rather than SO42- reducing. Published by Elsevier B.V.

  9. The health care and life sciences community profile for dataset descriptions

    PubMed Central

    Alexiev, Vladimir; Ansell, Peter; Bader, Gary; Baran, Joachim; Bolleman, Jerven T.; Callahan, Alison; Cruz-Toledo, José; Gaudet, Pascale; Gombocz, Erich A.; Gonzalez-Beltran, Alejandra N.; Groth, Paul; Haendel, Melissa; Ito, Maori; Jupp, Simon; Juty, Nick; Katayama, Toshiaki; Kobayashi, Norio; Krishnaswami, Kalpana; Laibe, Camille; Le Novère, Nicolas; Lin, Simon; Malone, James; Miller, Michael; Mungall, Christopher J.; Rietveld, Laurens; Wimalaratne, Sarala M.; Yamaguchi, Atsuko

    2016-01-01

    Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. However, while there are many relevant vocabularies for the annotation of a dataset, none sufficiently captures all the necessary metadata. This prevents uniform indexing and querying of dataset repositories. Towards providing a practical guide for producing a high quality description of biomedical datasets, the W3C Semantic Web for Health Care and the Life Sciences Interest Group (HCLSIG) identified Resource Description Framework (RDF) vocabularies that could be used to specify common metadata elements and their value sets. The resulting guideline covers elements of description, identification, attribution, versioning, provenance, and content summarization. This guideline reuses existing vocabularies, and is intended to meet key functional requirements including indexing, discovery, exchange, query, and retrieval of datasets, thereby enabling the publication of FAIR data. The resulting metadata profile is generic and could be used by other domains with an interest in providing machine readable descriptions of versioned datasets. PMID:27602295

  10. The CMS dataset bookkeeping service

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Afaq, Anzar (Fermilab); Dolgert, Andrew

    2007-10-01

    The CMS Dataset Bookkeeping Service (DBS) has been developed to catalog all CMS event data from Monte Carlo and Detector sources. It provides the ability to identify MC or trigger source, track data provenance, construct datasets for analysis, and discover interesting data. CMS requires processing and analysis activities at various service levels and the DBS system provides support for localized processing or private analysis, as well as global access for CMS users at large. Catalog entries can be moved among the various service levels with a simple set of migration tools, thus forming a loose federation of databases. DBS is available to CMS users via Python API, command-line, and Discovery web page interfaces. The system is built as a multi-tier web application with Java servlets running under Tomcat, with connections via JDBC to Oracle or MySQL database backends. Clients connect to the service through HTTP or HTTPS with authentication provided by GRID certificates and authorization through VOMS. DBS is an integral part of the overall CMS Data Management and Workflow Management systems.

  11. The CMS dataset bookkeeping service

    NASA Astrophysics Data System (ADS)

    Afaq, A.; Dolgert, A.; Guo, Y.; Jones, C.; Kosyakov, S.; Kuznetsov, V.; Lueking, L.; Riley, D.; Sekhri, V.

    2008-07-01

    The CMS Dataset Bookkeeping Service (DBS) has been developed to catalog all CMS event data from Monte Carlo and Detector sources. It provides the ability to identify MC or trigger source, track data provenance, construct datasets for analysis, and discover interesting data. CMS requires processing and analysis activities at various service levels and the DBS system provides support for localized processing or private analysis, as well as global access for CMS users at large. Catalog entries can be moved among the various service levels with a simple set of migration tools, thus forming a loose federation of databases. DBS is available to CMS users via Python API, command-line, and Discovery web page interfaces. The system is built as a multi-tier web application with Java servlets running under Tomcat, with connections via JDBC to Oracle or MySQL database backends. Clients connect to the service through HTTP or HTTPS with authentication provided by GRID certificates and authorization through VOMS. DBS is an integral part of the overall CMS Data Management and Workflow Management systems.

  12. seNorge2 daily precipitation, an observational gridded dataset over Norway from 1957 to the present day

    NASA Astrophysics Data System (ADS)

    Lussana, Cristian; Saloranta, Tuomo; Skaugen, Thomas; Magnusson, Jan; Tveito, Ole Einar; Andersen, Jess

    2018-02-01

    The conventional climate gridded datasets based on observations only are widely used in atmospheric sciences; our focus in this paper is on climate and hydrology. On the Norwegian mainland, seNorge2 provides high-resolution fields of daily total precipitation for applications requiring long-term datasets at regional or national level, where the challenge is to simulate small-scale processes often taking place in complex terrain. The dataset constitutes a valuable meteorological input for snow and hydrological simulations; it is updated daily and presented on a high-resolution grid (1 km of grid spacing). The climate archive goes back to 1957. The spatial interpolation scheme builds upon classical methods, such as optimal interpolation and successive-correction schemes. An original approach based on (spatial) scale-separation concepts has been implemented which uses geographical coordinates and elevation as complementary information in the interpolation. seNorge2 daily precipitation fields represent local precipitation features at spatial scales of a few kilometers, depending on the station network density. In the surroundings of a station or in dense station areas, the predictions are quite accurate even for intense precipitation. For most of the grid points, the performances are comparable to or better than a state-of-the-art pan-European dataset (E-OBS), because of the higher effective resolution of seNorge2. However, in very data-sparse areas, such as in the mountainous region of southern Norway, seNorge2 underestimates precipitation because it does not make use of enough geographical information to compensate for the lack of observations. The evaluation of seNorge2 as the meteorological forcing for the seNorge snow model and the DDD (Distance Distribution Dynamics) rainfall-runoff model shows that both models have been able to make profitable use of seNorge2, partly because of the automatic calibration procedure they incorporate for precipitation. 

  13. A Merged Dataset for Solar Probe Plus FIELDS Magnetometers

    NASA Astrophysics Data System (ADS)

    Bowen, T. A.; Dudok de Wit, T.; Bale, S. D.; Revillet, C.; MacDowall, R. J.; Sheppard, D.

    2016-12-01

    The Solar Probe Plus FIELDS experiment will observe turbulent magnetic fluctuations deep in the inner heliosphere. The FIELDS magnetometer suite implements a set of three magnetometers: two vector DC fluxgate magnetometers (MAGs), sensitive from DC to 100 Hz, as well as a vector search coil magnetometer (SCM), sensitive from 10 Hz to 50 kHz. Single-axis measurements are additionally made up to 1 MHz. To study the full range of observations, we propose merging data from the individual magnetometers into a single dataset. A merged dataset will improve the quality of observations in the range of frequencies observed by both magnetometers (approximately 10-100 Hz). Here we present updates on the individual MAG and SCM calibrations as well as our results on generating a cross-calibrated and merged dataset.

  14. A cross-country Exchange Market Pressure (EMP) dataset.

    PubMed

    Desai, Mohit; Patnaik, Ila; Felman, Joshua; Shah, Ajay

    2017-06-01

    The data presented in this article are related to the research article titled "An exchange market pressure measure for cross country analysis" (Patnaik et al. [1]). In this article, we present the dataset of Exchange Market Pressure (EMP) values for 139 countries along with their conversion factors, ρ (rho). Exchange Market Pressure, expressed as a percentage change in the exchange rate, measures the change in the exchange rate that would have taken place had the central bank not intervened. The conversion factor ρ can be interpreted as the change in the exchange rate associated with $1 billion of intervention. Estimates of the conversion factor ρ allow us to calculate a monthly time series of EMP for 139 countries. Additionally, the dataset contains the 68% confidence interval (high and low values) for the point estimates of ρ. Using the standard errors of the estimates of ρ, we obtain one-sigma intervals around the mean estimates of EMP values. These values are also reported in the dataset.
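Per the definitions above, a counterfactual EMP value combines the observed exchange-rate change with the intervention converted through ρ. The sign convention below is an assumption for illustration; the dataset itself estimates ρ econometrically.

```python
def emp(pct_change_exchange_rate, intervention_usd_bn, rho):
    """Exchange Market Pressure: the percentage exchange-rate change that
    would have occurred absent intervention. rho converts $1bn of
    intervention into an equivalent percentage exchange-rate change
    (sign convention assumed for illustration)."""
    return pct_change_exchange_rate + rho * intervention_usd_bn
```

For example, a 2% appreciation alongside $3bn of intervention with ρ = 0.5 implies the currency was under much less pressure than the raw exchange-rate move suggests.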

  15. Upper layer circulation, hydrography, and biological response of the Andaman waters during winter monsoon based on in situ and satellite observations

    NASA Astrophysics Data System (ADS)

    Chandran, Salini Thaliyakkattil; Raj, Smitha Bal; Ravindran, Sajeev; Narayana, Sanjeevan Vellorkirakathil

    2018-05-01

    Upper layer circulation, hydrography, and biological response of the Andaman waters during the winter monsoon are assessed based on observations carried out onboard FORV Sagar Sampada during January 2009 and November-December 2011. Cool and dry air carried by moderate winds (6 m/s) from the north and northeast indicates the influence of the northeast monsoon (NEM) in the area during the observation period. The characteristics of the physical parameters and the water mass indicate that the southeastern side is dominated by less saline water from the South China Sea intruding through the Malacca Strait, whereas the northeast is influenced by freshwater from the Ayeyarwady-Salween river system. The western side of the Andaman and Nicobar Islands exhibits properties similar to Bay of Bengal (BoB) water, as evidenced in the T-S relation. The circulation pattern is uniform over the upper 88 m and is found to be geostrophic rather than wind-driven. The magnitude of the current velocity varies between 100 and 900 mm/s in November-December 2011, with a strong current (900 mm/s) near the Katchal and Nancowry islands, and between 100 and 1000 mm/s in January 2009, with a strong current (1000 mm/s) near Little Nicobar Island. The Andaman waters are observed to be less productive during the season, based on satellite-derived surface chl-a (0.1-0.4 mg/m3) and column-integrated primary productivity (PP) (100-275 mgC/m2/d).

  16. Space-time clustering analysis of wildfires: The influence of dataset characteristics, fire prevention policy decisions, weather and climate.

    PubMed

    Parente, Joana; Pereira, Mário G; Tonini, Marj

    2016-07-15

    The present study focuses on (1) the dependence of the space-time permutation scan statistics (STPSS) on the input database's characteristics and (2) the use of this methodology to assess changes in the fire regime due to different types of climate and fire management activities. Based on the very strong relationship between weather and fire incidence in Portugal, the detected clusters are interpreted in terms of the atmospheric conditions. Apart from being the European country most affected by fires, Portugal meets all the conditions required to carry out this study, namely: (i) two long and comprehensive official datasets, i.e. the Portuguese Rural Fire Database (PRFD) and the National Mapping Burnt Areas (NMBA), based on ground and satellite measurements, respectively; (ii) the two types of climate (Csb in the north and Csa in the south) that characterize the Mediterranean basin regions most affected by fires also divide the mainland Portuguese area; and (iii) the national plan for the defence of forest against fires was approved a decade ago, and it is now reasonable to assess its impacts. Results confirmed (1) the influence of the dataset's characteristics on the detected clusters, (2) the existence of two different fire regimes in the country promoted by the different types of climate, (3) the positive impacts of the fire prevention policy decisions and (4) the ability of the STPSS to correctly identify clusters regarding their number, location, and space-time size in spite of eventual space and/or time splits of the datasets. Finally, the role of the weather on days when clustered fires were active was confirmed for the classes of small, medium and large fires. Copyright © 2016 Elsevier B.V. All rights reserved.

  17. The NASA Subsonic Jet Particle Image Velocimetry (PIV) Dataset

    NASA Technical Reports Server (NTRS)

    Bridges, James; Wernet, Mark P.

    2011-01-01

    Many tasks in fluids engineering require prediction of the turbulence of jet flows. The present document reports the single-point statistics of velocity, mean and variance, of cold and hot jet flows. The jet velocities ranged from 0.5 to 1.4 times the ambient speed of sound, and temperatures ranged from unheated to a static temperature ratio of 2.7. Further, the report assesses the accuracy of the data, i.e., it establishes uncertainties for the data. This paper covers the following five tasks: (1) Document the acquisition and processing procedures used to create the particle image velocimetry (PIV) datasets. (2) Compare PIV data with hotwire and laser Doppler velocimetry (LDV) data published in the open literature. (3) Compare different datasets acquired at the same flow conditions in multiple tests to establish uncertainties. (4) Create a consensus dataset for a range of hot jet flows, including uncertainty bands. (5) Analyze this consensus dataset for self-consistency and compare jet characteristics to those of the open literature. The final objective was fulfilled by using the potential core length and the spread rate of the half-velocity radius to collapse the mean and turbulent velocity fields over the first 20 jet diameters.

  18. EnviroAtlas National Layers Master Web Service

    EPA Pesticide Factsheets

    This EnviroAtlas web service supports research and online mapping activities related to EnviroAtlas (https://www.epa.gov/enviroatlas). This web service includes layers depicting EnviroAtlas national metrics mapped at the 12-digit HUC within the conterminous United States. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).

  19. A Large-scale Benchmark Dataset for Event Recognition in Surveillance Video

    DTIC Science & Technology

    2011-06-01

    orders of magnitude larger than existing datasets such as CAVIAR [7]. The TRECVID 2008 airport dataset [16] contains 100 hours of video, but it provides only…entire human figure (e.g., above shoulder), amounting to 500% human to video 2Some statistics are approximate, obtained from the CAVIAR 1st scene and…and diversity in both collection sites and viewpoints. In comparison to surveillance datasets such as CAVIAR [7] and TRECVID [16] shown in Fig. 3

  20. Animal Viruses Probe dataset (AVPDS) for microarray-based diagnosis and identification of viruses.

    PubMed

    Yadav, Brijesh S; Pokhriyal, Mayank; Vasishtha, Dinesh P; Sharma, Bhaskar

    2014-03-01

    AVPDS (Animal Viruses Probe dataset) is a dataset of virus-specific and conserved oligonucleotides for the identification and diagnosis of viruses infecting animals. The current dataset contains 20,619 virus-specific probes for 833 viruses and their subtypes and 3,988 conserved probes for 146 viral genera. The dataset of virus-specific probes has two fields, virus name and probe sequence. Similarly, the table of conserved probes for virus genera has fields for genus, subgroup within genus, and probe sequence. The subgroups within a genus are artificial divisions with no taxonomic significance; each contains probes that identify viruses in that specific subgroup of the genus. Using this dataset we have successfully diagnosed the first case of Newcastle disease virus in sheep and reported a mixed infection of Bovine viral diarrhea virus and Bovine herpesvirus in cattle. The dataset also contains probes which cross-react across species experimentally even though computationally they meet specifications; these probes have been marked. We hope that this dataset will be useful in microarray-based detection of viruses. The dataset can be accessed through the link https://dl.dropboxusercontent.com/u/94060831/avpds/HOME.html.

  1. Dataset from chemical gas sensor array in turbulent wind tunnel.

    PubMed

    Fonollosa, Jordi; Rodríguez-Luján, Irene; Trincavelli, Marco; Huerta, Ramón

    2015-06-01

    The dataset includes the acquired time series of a chemical detection platform exposed to different gas conditions in a turbulent wind tunnel. The chemo-sensory elements sampled the environment directly. In contrast to traditional approaches that include measurement chambers, open sampling systems are sensitive to the dispersion mechanisms of gaseous chemical analytes, namely diffusion, turbulence, and advection, making the identification and monitoring of chemical substances more challenging. The sensing platform included 72 metal-oxide gas sensors positioned at 6 different locations in the wind tunnel. At each location, 10 distinct chemical gases were released in the wind tunnel, the sensors were evaluated at 5 different operating temperatures, and 3 different wind speeds were generated in the wind tunnel to induce different levels of turbulence. Moreover, each configuration was repeated 20 times, yielding a dataset of 18,000 measurements. The dataset was collected over a period of 16 months. The data are related to "On the performance of gas sensor arrays in open sampling systems using Inhibitory Support Vector Machines", by Vergara et al. [1]. The dataset can be accessed publicly at the UCI repository upon citation of [1]: http://archive.ics.uci.edu/ml/datasets/Gas+sensor+arrays+in+open+sampling+settings.
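The 18,000-measurement figure follows directly from the experimental grid; a quick enumeration confirms the arithmetic:

```python
from itertools import product

# Enumerate the experimental grid described in the abstract:
# 6 sensor locations x 10 gases x 5 operating temperatures x 3 wind
# speeds, each configuration repeated 20 times.
configs = list(product(range(6), range(10), range(5), range(3), range(20)))
total = len(configs)
print(total)  # 18000, matching the stated dataset size
```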

  2. Improved statistical assessment of a long-term groundwater-quality dataset with a non-parametric permutation method

    NASA Astrophysics Data System (ADS)

    Thomas, M. A.

    2016-12-01

    The Waste Isolation Pilot Plant (WIPP) is the only deep geological repository for transuranic waste in the United States. As the Science Advisor for the WIPP, Sandia National Laboratories annually evaluates site data against trigger values (TVs), metrics whose violation is indicative of conditions that may impact long-term repository performance. This study focuses on a groundwater-quality dataset used to redesign a TV for the Culebra Dolomite Member (Culebra) of the Permian-age Rustler Formation. Prior to this study, a TV violation occurred if the concentration of a major ion fell outside a range defined as the mean ± two standard deviations. The ranges were thought to denote conditions that 95% of future values would fall within. Groundwater-quality data used in evaluating compliance, however, are rarely normally distributed. To create a more robust Culebra groundwater-quality TV, this study employed the randomization test, a non-parametric permutation method. Recent groundwater compositions considered TV violations under the original ion concentration ranges are now interpreted as false positives in light of the non-significant p-values calculated with the randomization test. This work highlights that the normality assumption can weaken as the size of a groundwater-quality dataset grows over time. Non-parametric permutation methods are an attractive option because no assumption about the statistical distribution is required and calculating all combinations of the data is an increasingly tractable problem with modern workstations. Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000. This research is funded by WIPP programs administered by the Office of Environmental Management (EM) of the U.S. Department of Energy. SAND2016-7306A
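A minimal two-sample randomization test of this kind can be sketched as follows. This illustrates the general method of relabeling pooled data to build a null distribution, not Sandia's exact TV procedure:

```python
import numpy as np

def randomization_test(historical, recent, n_perm=10000, seed=0):
    """Two-sample randomization test: p-value for the observed absolute
    difference in means under random relabeling of the pooled data.
    No normality assumption is needed, unlike a mean +/- 2-sigma range."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([historical, recent])
    n_recent = len(recent)
    observed = abs(np.mean(recent) - np.mean(historical))
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(pooled[:n_recent].mean() - pooled[n_recent:].mean())
        count += diff >= observed
    # add-one correction keeps the p-value strictly positive
    return (count + 1) / (n_perm + 1)
```

A recent sample drawn from the same distribution as the historical record yields a large p-value (no violation), while a genuinely shifted sample yields a small one.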

  3. Knowledge mining from clinical datasets using rough sets and backpropagation neural network.

    PubMed

    Nahato, Kindie Biredagn; Harichandran, Khanna Nehemiah; Arputharaj, Kannan

    2015-01-01

    The availability of clinical datasets and knowledge mining methodologies encourages researchers to pursue research in extracting knowledge from clinical datasets. Different data mining techniques have been used for mining rules, and mathematical models have been developed to assist the clinician in decision making. The objective of this research is to build a classifier that will predict the presence or absence of a disease by learning from a minimal set of attributes extracted from the clinical dataset. In this work, a rough set indiscernibility relation method with a backpropagation neural network (RS-BPNN) is used. This work has two stages. The first stage is the handling of missing values, to obtain a smooth dataset, and the selection of appropriate attributes from the clinical dataset by the indiscernibility relation method. The second stage is classification using a backpropagation neural network on the selected reducts of the dataset. The classifier has been tested with the hepatitis, Wisconsin breast cancer, and Statlog heart disease datasets obtained from the University of California at Irvine (UCI) machine learning repository. The accuracy obtained from the proposed method is 97.3%, 98.6%, and 90.4% for hepatitis, breast cancer, and heart disease, respectively. The proposed system provides an effective classification model for clinical datasets.
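The attribute-selection stage can be illustrated with a brute-force sketch of a rough-set reduct: the smallest attribute subset whose indiscernibility classes still determine the decision label. This is an illustration of the concept only; practical rough-set reducers (and the paper's method) avoid exhaustive search:

```python
from itertools import combinations

def indiscernibility_reduct(rows, decisions):
    """Return a smallest attribute subset whose indiscernibility classes
    (rows identical on those attributes) still determine the decision.
    Brute force over subsets, for illustration on tiny tables only."""
    n_attrs = len(rows[0])

    def consistent(attrs):
        seen = {}
        for row, dec in zip(rows, decisions):
            key = tuple(row[a] for a in attrs)
            if seen.setdefault(key, dec) != dec:
                return False  # two indiscernible rows disagree on the decision
        return True

    for k in range(1, n_attrs + 1):
        for attrs in combinations(range(n_attrs), k):
            if consistent(attrs):
                return attrs
    return tuple(range(n_attrs))
```

The selected attribute indices would then feed the second-stage classifier in place of the full attribute set.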

  4. A photogrammetric technique for generation of an accurate multispectral optical flow dataset

    NASA Astrophysics Data System (ADS)

    Kniaz, V. V.

    2017-06-01

    The availability of an accurate dataset is a key requirement for the successful development of an optical flow estimation algorithm. A large number of freely available optical flow datasets were developed in recent years and gave rise to many powerful algorithms. However, most of these datasets include only images captured in the visible spectrum. This paper is focused on the creation of a multispectral optical flow dataset with accurate ground truth. The generation of an accurate ground truth optical flow is a rather complex problem, as no device for error-free optical flow measurement has been developed to date. Existing methods for ground truth optical flow estimation are based on hidden textures, 3D modelling or laser scanning. Such techniques either work only with synthetic optical flow or provide only a sparse ground truth optical flow. In this paper a new photogrammetric method for the generation of an accurate ground truth optical flow is proposed. The method combines the accuracy and density of synthetic optical flow datasets with the flexibility of laser scanning based techniques. A multispectral dataset including various image sequences was generated using the developed method. The dataset is freely available on the accompanying web site.

  5. Down but Not Out: The National Education Association in Federal Politics

    ERIC Educational Resources Information Center

    Marianno, Bradley D.

    2018-01-01

    This research provides new evidence on the political activity and policy-setting agenda of the largest national teachers' union during a time of political change. Using a longitudinal dataset comprised of election outcomes and campaign contributions for all candidates for federal office and the National Education Association's (NEA) official…

  6. Scalable Machine Learning for Massive Astronomical Datasets

    NASA Astrophysics Data System (ADS)

    Ball, Nicholas M.; Gray, A.

    2014-04-01

    We present the ability to perform data mining and machine learning operations on a catalog of half a billion astronomical objects. This is the result of the combination of robust, highly accurate machine learning algorithms with linear scalability that renders the applications of these algorithms to massive astronomical data tractable. We demonstrate the core algorithms: kernel density estimation, K-means clustering, linear regression, nearest neighbors, random forest and gradient-boosted decision tree, singular value decomposition, support vector machine, and two-point correlation function. Each of these is relevant for astronomical applications such as finding novel astrophysical objects, characterizing artifacts in data, object classification (including for rare objects), object distances, finding the important features describing objects, density estimation of distributions, probabilistic quantities, and exploring the unknown structure of new data. The software, Skytree Server, runs on any UNIX-based machine, a virtual machine, or cloud-based and distributed systems including Hadoop. We have integrated it on the cloud computing system of the Canadian Astronomical Data Centre, the Canadian Advanced Network for Astronomical Research (CANFAR), creating the world's first cloud computing data mining system for astronomy. We demonstrate results showing the scaling of each of our major algorithms on large astronomical datasets, including the full 470,992,970 objects of the 2 Micron All-Sky Survey (2MASS) Point Source Catalog. We demonstrate the ability to find outliers in the full 2MASS dataset utilizing multiple methods, e.g., nearest neighbors. This is likely of particular interest to the radio astronomy community given, for example, that survey projects contain groups dedicated to this topic. 2MASS is used as a proof-of-concept dataset due to its convenience and availability. These results are of interest to any astronomical project with large and/or complex datasets that wishes to extract the full scientific value from its data.

  7. Scalable Machine Learning for Massive Astronomical Datasets

    NASA Astrophysics Data System (ADS)

    Ball, Nicholas M.; Astronomy Data Centre, Canadian

    2014-01-01

    We present the ability to perform data mining and machine learning operations on a catalog of half a billion astronomical objects. This is the result of the combination of robust, highly accurate machine learning algorithms with linear scalability that renders the applications of these algorithms to massive astronomical data tractable. We demonstrate the core algorithms: kernel density estimation, K-means clustering, linear regression, nearest neighbors, random forest and gradient-boosted decision tree, singular value decomposition, support vector machine, and two-point correlation function. Each of these is relevant for astronomical applications such as finding novel astrophysical objects, characterizing artifacts in data, object classification (including for rare objects), object distances, finding the important features describing objects, density estimation of distributions, probabilistic quantities, and exploring the unknown structure of new data. The software, Skytree Server, runs on any UNIX-based machine, a virtual machine, or cloud-based and distributed systems including Hadoop. We have integrated it on the cloud computing system of the Canadian Astronomical Data Centre, the Canadian Advanced Network for Astronomical Research (CANFAR), creating the world's first cloud computing data mining system for astronomy. We demonstrate results showing the scaling of each of our major algorithms on large astronomical datasets, including the full 470,992,970 objects of the 2 Micron All-Sky Survey (2MASS) Point Source Catalog. We demonstrate the ability to find outliers in the full 2MASS dataset utilizing multiple methods, e.g., nearest neighbors, and the local outlier factor. 2MASS is used as a proof-of-concept dataset due to its convenience and availability. These results are of interest to any astronomical project with large and/or complex datasets that wishes to extract the full scientific value from its data.
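The nearest-neighbor outlier idea mentioned above can be sketched with a brute-force k-NN distance score. Skytree Server's tree-based data structures are what make this scale to half a billion objects; this toy version is O(n²) and only suitable for small arrays:

```python
import numpy as np

def knn_outlier_scores(X, k=5):
    """Score each point by its distance to its k-th nearest neighbour;
    large scores flag points in sparse regions of feature space.
    Brute-force pairwise distances, for illustration only."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    d2.sort(axis=1)              # column 0 is the zero self-distance
    return np.sqrt(d2[:, k])
```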

  8. Evaluation of bulk heat fluxes from atmospheric datasets

    NASA Astrophysics Data System (ADS)

    Farmer, Benton

    Heat fluxes at the air-sea interface are an important component of the Earth's heat budget. In addition, they are an integral factor in determining the sea surface temperature (SST) evolution of the oceans. Different representations of these fluxes are used in both the atmospheric and oceanic communities for the purpose of heat budget studies and, in particular, for forcing oceanic models. It is currently difficult to quantify the potential impact varying heat flux representations have on the ocean response. In this study, a diagnostic tool is presented that allows for a straightforward comparison of surface heat flux formulations and atmospheric datasets. Two variables, relaxation time (RT) and apparent temperature (T*), are derived from the linearization of the bulk formulas. They are then calculated to compare three bulk formulae and five atmospheric datasets. Additionally, the linearization is expanded to second order to compare the amount of residual flux present. It is found that the use of a bulk formula employing a constant heat transfer coefficient produces longer relaxation times and contains a greater amount of residual flux in the higher-order terms of the linearization. Depending on the temperature difference, the residual flux remaining in the second-order and higher terms can reach as much as 40-50% of the total residual on a monthly time scale. This is certainly a non-negligible residual flux. In contrast, a bulk formula using a stability- and wind-dependent transfer coefficient retains much of the total flux in the first-order term, as only a few percent remain in the residual flux. Most of the difference displayed among the bulk formulas stems from the sensitivity to wind speed and the choice of a constant or spatially varying transfer coefficient. Comparing the representation of RT and T* provides insight into the differences among various atmospheric datasets.
In particular, the representations of the western boundary current, upwelling …
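The role of residual flux beyond the first-order term can be illustrated numerically: with a constant transfer coefficient the bulk flux is exactly linear in the air-sea temperature difference, while a temperature-dependent coefficient leaves a higher-order residual. The functional form of the varying coefficient below is an assumed illustration, not one of the formulae compared in the study:

```python
# Schematic bulk sensible-heat flux Q = rho*cp * C_H * U * dT,
# with dT the air-sea temperature difference.
rho_cp, U = 1.2 * 1004.0, 8.0   # illustrative air density*heat capacity, wind speed

def q_const(dT, C=1.2e-3):
    """Bulk flux with a constant transfer coefficient: exactly linear in dT."""
    return rho_cp * C * U * dT

def q_stab(dT, C0=1.2e-3, alpha=0.1):
    """Bulk flux with an assumed dT-dependent (stability-like) coefficient:
    quadratic in dT, so a first-order expansion leaves a residual."""
    return rho_cp * C0 * (1.0 + alpha * dT) * U * dT

dT, eps = 3.0, 1e-4
for q in (q_const, q_stab):
    slope = (q(eps) - q(-eps)) / (2 * eps)   # dQ/d(dT) at dT = 0
    residual = q(dT) - slope * dT            # flux not captured at first order
    print(q.__name__, residual)
```

The constant-coefficient flux has zero residual by construction; the varying-coefficient flux leaves a residual that grows with the square of the temperature difference, echoing the paper's point that where the residual sits depends on the coefficient's functional form.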

  9. Antibody-protein interactions: benchmark datasets and prediction tools evaluation

    PubMed Central

    Ponomarenko, Julia V; Bourne, Philip E

    2007-01-01

    Background The ability to predict antibody binding sites (also known as antigenic determinants or B-cell epitopes) for a given protein is a precursor to new vaccine design and diagnostics. Among the various methods of B-cell epitope identification, X-ray crystallography is one of the most reliable. Building on these experimental data, computational methods exist for B-cell epitope prediction. As the number of structures of antibody-protein complexes grows, further interest in prediction methods using 3D structure is anticipated. This work aims to establish a benchmark for 3D structure-based epitope prediction methods. Results Two B-cell epitope benchmark datasets inferred from the 3D structures of antibody-protein complexes were defined. The first is a dataset of 62 representative 3D structures of protein antigens with inferred structural epitopes. The second is a dataset of 82 structures of antibody-protein complexes containing different structural epitopes. Using these datasets, eight web-servers developed for the prediction of antibody and protein binding sites were evaluated. No method exceeded 40% precision and 46% recall. The values of the area under the receiver operating characteristic curve were about 0.6 for the ConSurf, DiscoTope, and PPI-PRED methods, and above 0.65 but not exceeding 0.70 for protein-protein docking methods when the best of the top ten models for the bound docking were considered; the remaining methods performed close to random. The benchmark datasets are included as a supplement to this paper. Conclusion It may be possible to improve epitope prediction methods through training on datasets which include only immune epitopes and through utilizing more features characterizing epitopes, for example, the evolutionary conservation score.
Notwithstanding, the overall poor performance may reflect the generality of antigenicity and hence the inability to decipher B-cell epitopes as an intrinsic feature of the protein. It …

  10. A daily global mesoscale ocean eddy dataset from satellite altimetry.

    PubMed

    Faghmous, James H; Frenger, Ivy; Yao, Yuanshun; Warmka, Robert; Lindell, Aron; Kumar, Vipin

    2015-01-01

    Mesoscale ocean eddies are ubiquitous coherent rotating structures of water with radial scales on the order of 100 kilometers. Eddies play a key role in the transport and mixing of momentum and tracers across the World Ocean. We present a global daily mesoscale ocean eddy dataset that contains ~45 million mesoscale features and 3.3 million eddy trajectories that persist at least two days, as identified in the AVISO dataset over the period 1993-2014. This dataset, along with the open-source eddy identification software, allows users to extract eddies with any parameters (minimum size, lifetime, etc.), to study global eddy properties and dynamics, and to empirically estimate the impact eddies have on mass or heat transport. Furthermore, our open-source software may be used to identify mesoscale features in model simulations and compare them to observed features. Finally, this dataset can be used to study the interaction between mesoscale ocean eddies and other components of the Earth System.
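Selecting trajectories by user-chosen parameters, as described above, might look like the following sketch. The per-day record schema with a 'radius_km' field is hypothetical, for illustration; the published software defines its own data format:

```python
def filter_trajectories(trajectories, min_lifetime_days=30, min_radius_km=50.0):
    """Keep only eddy trajectories meeting lifetime and mean-radius
    thresholds. Each trajectory is assumed (hypothetical schema) to be a
    list of daily records, one dict per day with a 'radius_km' field."""
    kept = []
    for traj in trajectories:
        mean_radius = sum(day["radius_km"] for day in traj) / len(traj)
        if len(traj) >= min_lifetime_days and mean_radius >= min_radius_km:
            kept.append(traj)
    return kept
```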

  11. A daily global mesoscale ocean eddy dataset from satellite altimetry

    PubMed Central

    Faghmous, James H.; Frenger, Ivy; Yao, Yuanshun; Warmka, Robert; Lindell, Aron; Kumar, Vipin

    2015-01-01

    Mesoscale ocean eddies are ubiquitous coherent rotating structures of water with radial scales on the order of 100 kilometers. Eddies play a key role in the transport and mixing of momentum and tracers across the World Ocean. We present a global daily mesoscale ocean eddy dataset that contains ~45 million mesoscale features and 3.3 million eddy trajectories that persist at least two days, as identified in the AVISO dataset over the period 1993–2014. This dataset, along with the open-source eddy identification software, allows users to extract eddies with any parameters (minimum size, lifetime, etc.), to study global eddy properties and dynamics, and to empirically estimate the impact eddies have on mass or heat transport. Furthermore, our open-source software may be used to identify mesoscale features in model simulations and compare them to observed features. Finally, this dataset can be used to study the interaction between mesoscale ocean eddies and other components of the Earth System. PMID:26097744

  12. Spatially-explicit estimation of geographical representation in large-scale species distribution datasets.

    PubMed

    Kalwij, Jesse M; Robertson, Mark P; Ronk, Argo; Zobel, Martin; Pärtel, Meelis

    2014-01-01

    Much ecological research relies on existing multispecies distribution datasets. Such datasets, however, can vary considerably in quality, extent, resolution or taxonomic coverage. We provide a framework for a spatially-explicit evaluation of geographical representation within large-scale species distribution datasets, using the comparison of an occurrence atlas with a range atlas dataset as a working example. Specifically, we compared occurrence maps for 3773 taxa from the widely-used Atlas Florae Europaeae (AFE) with digitised range maps for 2049 taxa of the lesser-known Atlas of North European Vascular Plants. We calculated the level of agreement at a 50-km spatial resolution using average latitudinal and longitudinal species range, and area of occupancy. Agreement in species distribution was calculated and mapped using the Jaccard similarity index and a reduced major axis (RMA) regression analysis of species richness between the entire atlases (5221 taxa in total) and between co-occurring species (601 taxa). We found no difference in distribution ranges or in the area of occupancy frequency distribution, indicating that the atlases were sufficiently overlapping for a valid comparison. The similarity index map showed high levels of agreement for central, western, and northern Europe. The RMA regression confirmed that the geographical representation of AFE was low in areas with a sparse data recording history (e.g., Russia, Belarus and Ukraine). For co-occurring species in south-eastern Europe, however, the Atlas of North European Vascular Plants showed remarkably higher richness estimates. Geographical representation of atlas data can be much more heterogeneous than often assumed. The level of agreement between datasets can be used to evaluate geographical representation within datasets. Merging atlases into a single dataset is worthwhile in spite of methodological differences, and helps to fill gaps in our knowledge of species distribution ranges. Species distribution …
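The cell-wise agreement measure used here, the Jaccard similarity index, is straightforward to compute for two species lists (e.g. the species each atlas records in the same 50-km grid cell):

```python
def jaccard(species_a, species_b):
    """Jaccard similarity between two species sets: intersection over
    union. Two empty sets are treated as perfectly agreeing."""
    a, b = set(species_a), set(species_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0
```

Mapping this value per grid cell, as the study does, highlights regions where the two atlases record very different floras.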

  13. The Global Precipitation Climatology Project (GPCP) Combined Precipitation Dataset

    NASA Technical Reports Server (NTRS)

    Huffman, George J.; Adler, Robert F.; Arkin, Philip; Chang, Alfred; Ferraro, Ralph; Gruber, Arnold; Janowiak, John; McNab, Alan; Rudolf, Bruno; Schneider, Udo

    1997-01-01

    The Global Precipitation Climatology Project (GPCP) has released the GPCP Version 1 Combined Precipitation Data Set, a global, monthly precipitation dataset covering the period July 1987 through December 1995. The primary product in the dataset is a merged analysis incorporating precipitation estimates from low-orbit-satellite microwave data, geosynchronous-orbit-satellite infrared data, and rain gauge observations. The dataset also contains the individual input fields, a combination of the microwave and infrared satellite estimates, and error estimates for each field. The data are provided on 2.5 deg x 2.5 deg latitude-longitude global grids. Preliminary analyses show general agreement with prior studies of global precipitation and extend prior studies of El Nino-Southern Oscillation precipitation patterns. At the regional scale there are systematic differences from standard climatologies.
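Since each input field carries an error estimate, a standard way to combine such estimates at a grid cell is inverse-error-variance weighting. This is an illustration of the general principle, not the actual GPCP combination algorithm, which is more involved:

```python
def merged_estimate(values, errors):
    """Combine independent estimates of the same quantity by
    inverse-error-variance weighting: low-error inputs dominate.
    `values` and `errors` are per-source estimates for one grid cell."""
    weights = [1.0 / e ** 2 for e in errors]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)
```

With equal errors this reduces to a plain average; a source with half the error of another gets four times the weight.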

  14. FieldSAFE: Dataset for Obstacle Detection in Agriculture.

    PubMed

    Kragh, Mikkel Fly; Christiansen, Peter; Laursen, Morten Stigaard; Larsen, Morten; Steen, Kim Arild; Green, Ole; Karstoft, Henrik; Jørgensen, Rasmus Nyholm

    2017-11-09

    In this paper, we present a multi-modal dataset for obstacle detection in agriculture. The dataset comprises approximately 2 h of raw sensor data from a tractor-mounted sensor system in a grass mowing scenario in Denmark, October 2016. Sensing modalities include stereo camera, thermal camera, web camera, 360° camera, LiDAR and radar, while precise localization is available from fused IMU and GNSS. Both static and moving obstacles are present, including humans, mannequin dolls, rocks, barrels, buildings, vehicles and vegetation. All obstacles have ground truth object labels and geographic coordinates.

  15. FieldSAFE: Dataset for Obstacle Detection in Agriculture

    PubMed Central

    Christiansen, Peter; Larsen, Morten; Steen, Kim Arild; Green, Ole; Karstoft, Henrik

    2017-01-01

    In this paper, we present a multi-modal dataset for obstacle detection in agriculture. The dataset comprises approximately 2 h of raw sensor data from a tractor-mounted sensor system in a grass mowing scenario in Denmark, October 2016. Sensing modalities include stereo camera, thermal camera, web camera, 360° camera, LiDAR and radar, while precise localization is available from fused IMU and GNSS. Both static and moving obstacles are present, including humans, mannequin dolls, rocks, barrels, buildings, vehicles and vegetation. All obstacles have ground truth object labels and geographic coordinates. PMID:29120383

  16. Fast randomization of large genomic datasets while preserving alteration counts.

    PubMed

    Gobbi, Andrea; Iorio, Francesco; Dawson, Kevin J; Wedge, David C; Tamborero, David; Alexandrov, Ludmil B; Lopez-Bigas, Nuria; Garnett, Mathew J; Jurman, Giuseppe; Saez-Rodriguez, Julio

    2014-09-01

    Studying combinatorial patterns in cancer genomic datasets has recently emerged as a tool for identifying novel cancer driver networks. Approaches have been devised to quantify, for example, the tendency of a set of genes to be mutated in a 'mutually exclusive' manner. The significance of the proposed metrics is usually evaluated by computing P-values under appropriate null models. To this end, a Monte Carlo method (the switching-algorithm) is used to sample simulated datasets under a null model that preserves patient- and gene-wise mutation rates. In this method, a genomic dataset is represented as a bipartite network, to which Markov chain updates (switching-steps) are applied. These steps modify the network topology, and a minimal number of them must be executed to draw simulated datasets independently under the null model. This number has previously been deduced empirically to be a linear function of the total number of variants, making this process computationally expensive. We present a novel approximate lower bound for the number of switching-steps, derived analytically. Additionally, we have developed the R package BiRewire, including new efficient implementations of the switching-algorithm. We illustrate the performance of BiRewire by applying it to large real cancer genomics datasets. We report vast reductions in time requirement, with respect to existing implementations/bounds and equivalent P-value computations. Thus, we propose BiRewire to study statistical properties in genomic datasets, and other data that can be modeled as bipartite networks. BiRewire is available on BioConductor at http://www.bioconductor.org/packages/2.13/bioc/html/BiRewire.html. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.
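
    The switching-step update described above can be sketched as a degree-preserving edge swap on the bipartite network of patient-gene alterations. The sketch below is our own illustration of the idea, not the BiRewire implementation; all function names are hypothetical.

```python
import random

def switching_step(edges, edge_set):
    """Attempt one switching step: pick two edges (r1,c1), (r2,c2) and
    rewire them to (r1,c2), (r2,c1) if that creates no duplicate edge.
    Row and column degrees (patient- and gene-wise alteration counts)
    are preserved. Returns True if the swap was applied."""
    (r1, c1), (r2, c2) = random.sample(edges, 2)
    if r1 == r2 or c1 == c2 or (r1, c2) in edge_set or (r2, c1) in edge_set:
        return False
    i, j = edges.index((r1, c1)), edges.index((r2, c2))
    edges[i], edges[j] = (r1, c2), (r2, c1)
    edge_set.difference_update({(r1, c1), (r2, c2)})
    edge_set.update({(r1, c2), (r2, c1)})
    return True

def randomize(edges, n_steps, seed=0):
    """Apply n_steps successful switching steps to a set of (row, col)
    edges and return the rewired edge set."""
    random.seed(seed)
    edges, edge_set = list(edges), set(edges)
    done = 0
    while done < n_steps:
        done += switching_step(edges, edge_set)
    return edge_set
```

Because each accepted swap leaves every row and column sum unchanged, sampling after enough steps yields simulated datasets under the null model the abstract describes.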

  17. Evaluating Soil Moisture Retrievals from ESA's SMOS and NASA's SMAP Brightness Temperature Datasets

    NASA Technical Reports Server (NTRS)

    Al-Yaari, A.; Wigneron, J.-P.; Kerr, Y.; Rodriguez-Fernandez, N.; O'Neill, P. E.; Jackson, T. J.; De Lannoy, G. J. M.; Al Bitar, A.; Mialon, A.; Richaume, P.

    2017-01-01

    Two satellites are currently monitoring surface soil moisture (SM) using L-band observations: SMOS (Soil Moisture and Ocean Salinity), a joint ESA (European Space Agency), CNES (Centre national d'études spatiales), and CDTI (the Spanish government agency with responsibility for space) satellite launched on November 2, 2009 and SMAP (Soil Moisture Active Passive), a National Aeronautics and Space Administration (NASA) satellite successfully launched in January 2015. In this study, we used a multilinear regression approach to retrieve SM from SMAP data to create a global dataset of SM, which is consistent with SM data retrieved from SMOS. This was achieved by calibrating coefficients of the regression model using the CATDS (Centre Aval de Traitement des Données) SMOS Level 3 SM and the horizontally and vertically polarized brightness temperatures (TB) at 40 deg incidence angle, over the 2013 - 2014 period. Next, this model was applied to SMAP L3 TB data from Apr 2015 to Jul 2016. The retrieved SM from SMAP (referred to here as SMAP_Reg) was compared to: (i) the operational SMAP L3 SM (SMAP_SCA), retrieved using the baseline Single Channel retrieval Algorithm (SCA); and (ii) the operational SMOSL3 SM, derived from the multiangular inversion of the L-MEB model (L-MEB algorithm) (SMOSL3). This inter-comparison was made against in situ soil moisture measurements from more than 400 sites spread over the globe, which are used here as a reference soil moisture dataset. The in situ observations were obtained from the International Soil Moisture Network (ISMN; https://ismn.geo.tuwien.ac.at/) in North America (PBO_H2O, SCAN, SNOTEL, iRON, and USCRN), in Australia (Oznet), Africa (DAHRA), and in Europe (REMEDHUS, SMOSMANIA, FMI, and RSMN). The agreement was analyzed in terms of four classical statistical criteria: Root Mean Squared Error (RMSE), Bias, Unbiased RMSE (UnbRMSE), and correlation coefficient (R). Results of the comparison of these various products with in situ

  18. Evaluating soil moisture retrievals from ESA's SMOS and NASA's SMAP brightness temperature datasets.

    PubMed

    Al-Yaari, A; Wigneron, J-P; Kerr, Y; Rodriguez-Fernandez, N; O'Neill, P E; Jackson, T J; De Lannoy, G J M; Al Bitar, A; Mialon, A; Richaume, P; Walker, J P; Mahmoodi, A; Yueh, S

    2017-05-01

    Two satellites are currently monitoring surface soil moisture (SM) using L-band observations: SMOS (Soil Moisture and Ocean Salinity), a joint ESA (European Space Agency), CNES (Centre national d'études spatiales), and CDTI (the Spanish government agency with responsibility for space) satellite launched on November 2, 2009 and SMAP (Soil Moisture Active Passive), a National Aeronautics and Space Administration (NASA) satellite successfully launched in January 2015. In this study, we used a multilinear regression approach to retrieve SM from SMAP data to create a global dataset of SM, which is consistent with SM data retrieved from SMOS. This was achieved by calibrating coefficients of the regression model using the CATDS (Centre Aval de Traitement des Données) SMOS Level 3 SM and the horizontally and vertically polarized brightness temperatures (TB) at 40° incidence angle, over the 2013 - 2014 period. Next, this model was applied to SMAP L3 TB data from Apr 2015 to Jul 2016. The retrieved SM from SMAP (referred to here as SMAP_Reg) was compared to: (i) the operational SMAP L3 SM (SMAP_SCA), retrieved using the baseline Single Channel retrieval Algorithm (SCA); and (ii) the operational SMOSL3 SM, derived from the multiangular inversion of the L-MEB model (L-MEB algorithm) (SMOSL3). This inter-comparison was made against in situ soil moisture measurements from more than 400 sites spread over the globe, which are used here as a reference soil moisture dataset. The in situ observations were obtained from the International Soil Moisture Network (ISMN; https://ismn.geo.tuwien.ac.at/) in North America (PBO_H2O, SCAN, SNOTEL, iRON, and USCRN), in Australia (Oznet), Africa (DAHRA), and in Europe (REMEDHUS, SMOSMANIA, FMI, and RSMN). The agreement was analyzed in terms of four classical statistical criteria: Root Mean Squared Error (RMSE), Bias, Unbiased RMSE (UnbRMSE), and correlation coefficient (R). Results of the comparison of these various products with in situ
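
    The four evaluation criteria named above (RMSE, Bias, UnbRMSE, R) can be computed directly from paired retrieved/in situ series. A minimal sketch, our own illustration rather than the authors' code, with a hypothetical function name:

```python
import math

def evaluate(retrieved, in_situ):
    """RMSE, Bias, Unbiased RMSE and correlation coefficient R between a
    retrieved soil-moisture series and an in situ reference series."""
    n = len(retrieved)
    diffs = [r - s for r, s in zip(retrieved, in_situ)]
    bias = sum(diffs) / n
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    # Unbiased RMSE removes the systematic offset: ubRMSE^2 = RMSE^2 - Bias^2
    # (clamped at zero to guard against floating-point round-off).
    ubrmse = math.sqrt(max(rmse ** 2 - bias ** 2, 0.0))
    mr, ms = sum(retrieved) / n, sum(in_situ) / n
    cov = sum((r - mr) * (s - ms) for r, s in zip(retrieved, in_situ))
    norm = math.sqrt(sum((r - mr) ** 2 for r in retrieved) *
                     sum((s - ms) ** 2 for s in in_situ))
    return {"RMSE": rmse, "Bias": bias, "UnbRMSE": ubrmse, "R": cov / norm}
```

For a series that differs from the reference only by a constant offset, Bias equals that offset, UnbRMSE is zero, and R is one.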

  19. Inter-comparison of multiple statistically downscaled climate datasets for the Pacific Northwest, USA

    PubMed Central

    Jiang, Yueyang; Kim, John B.; Still, Christopher J.; Kerns, Becky K.; Kline, Jeffrey D.; Cunningham, Patrick G.

    2018-01-01

    Statistically downscaled climate data have been widely used to explore possible impacts of climate change in various fields of study. Although many studies have focused on characterizing differences in the downscaling methods, few studies have evaluated actual downscaled datasets being distributed publicly. Spatially focusing on the Pacific Northwest, we compare five statistically downscaled climate datasets distributed publicly in the US: ClimateNA, NASA NEX-DCP30, MACAv2-METDATA, MACAv2-LIVNEH and WorldClim. We compare the downscaled projections of climate change, and the associated observational data used as training data for downscaling. We map and quantify the variability among the datasets and characterize the spatio-temporal patterns of agreement and disagreement among the datasets. Pair-wise comparisons of datasets identify the coast and high-elevation areas as areas of disagreement for temperature. For precipitation, high-elevation areas, rainshadows and the dry, eastern portion of the study area have high dissimilarity among the datasets. By spatially aggregating the variability measures into watersheds, we develop guidance for selecting datasets for climate change impact studies in the Pacific Northwest. PMID:29461513

  20. Inter-comparison of multiple statistically downscaled climate datasets for the Pacific Northwest, USA.

    PubMed

    Jiang, Yueyang; Kim, John B; Still, Christopher J; Kerns, Becky K; Kline, Jeffrey D; Cunningham, Patrick G

    2018-02-20

    Statistically downscaled climate data have been widely used to explore possible impacts of climate change in various fields of study. Although many studies have focused on characterizing differences in the downscaling methods, few studies have evaluated actual downscaled datasets being distributed publicly. Spatially focusing on the Pacific Northwest, we compare five statistically downscaled climate datasets distributed publicly in the US: ClimateNA, NASA NEX-DCP30, MACAv2-METDATA, MACAv2-LIVNEH and WorldClim. We compare the downscaled projections of climate change, and the associated observational data used as training data for downscaling. We map and quantify the variability among the datasets and characterize the spatio-temporal patterns of agreement and disagreement among the datasets. Pair-wise comparisons of datasets identify the coast and high-elevation areas as areas of disagreement for temperature. For precipitation, high-elevation areas, rainshadows and the dry, eastern portion of the study area have high dissimilarity among the datasets. By spatially aggregating the variability measures into watersheds, we develop guidance for selecting datasets for climate change impact studies in the Pacific Northwest.

  1. Unique Datasets Collected by NOAA Hurricane Hunter Aircraft during the 2017 Atlantic Hurricane Season

    NASA Astrophysics Data System (ADS)

    Zawislak, J.; Reasor, P.

    2017-12-01

    Each year, NOAA's Atlantic Oceanographic & Meteorological Laboratory (AOML) Hurricane Research Division (HRD), in partnership with the National Hurricane Center (NHC) and NOAA's Environmental Modeling Center (EMC), operates a hurricane field program, the Intensity Forecast Experiment (IFEX). The experiment leverages the NOAA P-3 and G-IV hurricane hunter aircraft, based at NOAA's Office of Marine and Aviation Operations (OMAO) Aircraft Operations Center (AOC). The goals of IFEX are to improve understanding of physical processes in tropical cyclones (TCs), to improve operational forecasts of TC intensity, structure, and rainfall by providing data to operational numerical modeling systems, and to develop and refine measurement technologies. This season the IFEX program, leveraging mainly operationally tasked EMC and NHC missions, extensively sampled Hurricanes Harvey, Irma, Jose, Maria, and Nate, as well as Tropical Storm Franklin. We will contribute to this important session by providing an overview of aircraft missions into these storms and guidance on the datasets made available from instruments onboard the P-3 and G-IV, and we will offer some perspective on the science that can be addressed with these unique datasets, such as the value of those datasets for model forecast improvement. NOAA aircraft sampled these storms during critical periods of intensification and, for Hurricanes Harvey and Irma, just prior to the devastating landfalls in the Caribbean and United States. The unique instrument suite on the P-3 offers inner-core observations of the three-dimensional precipitation and vortex structure, lower-troposphere (boundary layer) thermodynamic properties, and surface wind speed. In contrast, the G-IV flies at higher altitudes, sampling the environment surrounding the storms, and provides deep-tropospheric soundings from dropsondes.

  2. Passive Containment DataSet

    EPA Pesticide Factsheets

    This data is for Figures 6 and 7 in the journal article. The data also includes the two EPANET input files used for the analysis described in the paper, one for the looped system and one for the block system. This dataset is associated with the following publication: Grayman, W., R. Murray, and D. Savic. Redesign of Water Distribution Systems for Passive Containment of Contamination. JOURNAL OF THE AMERICAN WATER WORKS ASSOCIATION. American Water Works Association, Denver, CO, USA, 108(7): 381-391, (2016).

  3. The Lunar Source Disk: Old Lunar Datasets on a New CD-ROM

    NASA Astrophysics Data System (ADS)

    Hiesinger, H.

    1998-01-01

    A compilation of previously published datasets on CD-ROM is presented. This Lunar Source Disk is intended to be a first step in the improvement/expansion of the Lunar Consortium Disk, in order to create an "image-cube"-like data pool that can be easily accessed and might be useful for a variety of future lunar investigations. All datasets were transformed to a standard map projection that allows direct comparison of different types of information on a pixel-by-pixel basis. Lunar observations have a long history and have been important to mankind for centuries, notably since the work of Plutarch and Galileo. As a consequence of centuries of lunar investigations, knowledge of the characteristics and properties of the Moon has accumulated over time. However, a side effect of this accumulation is that it has become more and more complicated for scientists to review all the datasets obtained through different techniques, to interpret them properly, to recognize their weaknesses and strengths in detail, and to combine them synoptically in geologic interpretations. Such synoptic geologic interpretations are crucial for the study of planetary bodies through remote-sensing data in order to avoid misinterpretation. In addition, many of the modern datasets, derived from Earth-based telescopes as well as from spacecraft missions, are acquired at different geometric and radiometric conditions. These differences make it challenging to compare or combine datasets directly or to extract information from different datasets on a pixel-by-pixel basis. Also, as there is no convention for the presentation of lunar datasets, different authors choose different map projections, depending on the location of the investigated areas and their personal interests. Insufficient or incomplete information on the map parameters used by different authors further complicates the reprojection of these datasets to a standard geometry. The goal of our efforts was to transfer previously published lunar

  4. Lessons learned in the generation of biomedical research datasets using Semantic Open Data technologies.

    PubMed

    Legaz-García, María del Carmen; Miñarro-Giménez, José Antonio; Menárguez-Tortosa, Marcos; Fernández-Breis, Jesualdo Tomás

    2015-01-01

    Biomedical research usually requires combining large volumes of data from multiple heterogeneous sources. Such heterogeneity makes difficult not only the generation of research-oriented datasets but also their exploitation. In recent years, the Open Data paradigm has proposed new ways of making data available so that sharing and integration are facilitated. Open Data approaches may pursue the generation of content readable only by humans or readable by both humans and machines; the latter is the one of interest in our work. The Semantic Web provides a natural technological space for data integration and exploitation and offers a range of technologies for generating not only Open Datasets but also Linked Datasets, that is, open datasets linked to other open datasets. According to Berners-Lee's classification, each open dataset can be given a rating between one and five stars. In recent years, we have developed and applied our SWIT tool, which automates the generation of semantic datasets from heterogeneous data sources. SWIT produces four-star datasets; the fifth star can be obtained when the dataset is linked to from external ones. In this paper, we describe how we have applied the tool in two projects related to health care records and orthology data, as well as the major lessons learned from such efforts.

  5. Global Precipitation Measurement: Methods, Datasets and Applications

    NASA Technical Reports Server (NTRS)

    Tapiador, Francisco; Turk, Francis J.; Petersen, Walt; Hou, Arthur Y.; Garcia-Ortega, Eduardo; Machado, Luiz A. T.; Angelis, Carlos F.; Salio, Paola; Kidd, Chris; Huffman, George J.

    2011-01-01

    This paper reviews the many aspects of precipitation measurement that are relevant to providing an accurate global assessment of this important environmental parameter. Methods discussed include ground data, satellite estimates and numerical models. First, the methods for measuring, estimating, and modeling precipitation are discussed. Then, the most relevant datasets gathering precipitation information from those three sources are presented. The third part of the paper illustrates a number of the many applications of those measurements and databases. The aim of the paper is to organize the many links and feedbacks between precipitation measurement, estimation and modeling, indicating the uncertainties and limitations of each technique in order to identify areas requiring further attention, and to show the limits within which datasets can be used.

  6. CoINcIDE: A framework for discovery of patient subtypes across multiple datasets.

    PubMed

    Planey, Catherine R; Gevaert, Olivier

    2016-03-09

    Patient disease subtypes have the potential to transform personalized medicine. However, many patient subtypes derived from unsupervised clustering analyses on high-dimensional datasets are not replicable across multiple datasets, limiting their clinical utility. We present CoINcIDE, a novel methodological framework for the discovery of patient subtypes across multiple datasets that requires no between-dataset transformations. We also present a high-quality database collection, curatedBreastData, with over 2,500 breast cancer gene expression samples. We use CoINcIDE to discover novel breast and ovarian cancer subtypes with prognostic significance and novel hypothesized ovarian therapeutic targets across multiple datasets. CoINcIDE and curatedBreastData are available as R packages.

  7. Annotating spatio-temporal datasets for meaningful analysis in the Web

    NASA Astrophysics Data System (ADS)

    Stasch, Christoph; Pebesma, Edzer; Scheider, Simon

    2014-05-01

    More and more environmental datasets that vary in space and time are available on the Web. This brings the advantage that data can be used for purposes other than originally foreseen, but also the danger that users may apply inappropriate analysis procedures because they lack important assumptions made during the data collection process. In order to guide users towards a meaningful (statistical) analysis of spatio-temporal datasets available on the Web, we have developed a Higher-Order-Logic formalism that captures some relevant assumptions in our previous work [1]. It allows proofs about meaningful spatial prediction and aggregation to be carried out in a semi-automated fashion. In this poster presentation, we will present a concept for annotating spatio-temporal datasets available on the Web with concepts defined in our formalism. To this end, we have defined a subset of the formalism as a Web Ontology Language (OWL) pattern. It captures the distinction between the different spatio-temporal variable types, i.e. point patterns, fields, lattices and trajectories, which in turn determine whether a particular dataset can be interpolated or aggregated in a meaningful way using a certain procedure. The actual annotations that link spatio-temporal datasets with the concepts in the ontology pattern are provided as Linked Data. In order to allow data producers to add the annotations to their datasets, we have implemented a Web portal that uses a triple store at the backend to store the annotations and to make them available in the Linked Data cloud. Furthermore, we have implemented functions in the statistical environment R to retrieve the RDF annotations and, based on these annotations, to support a stronger typing of spatio-temporal datatypes, guiding towards a meaningful analysis in R. [1] Stasch, C., Scheider, S., Pebesma, E., Kuhn, W. (2014): "Meaningful spatial prediction and aggregation", Environmental Modelling & Software, 51, 149-165.

  8. Land cover trends dataset, 1973-2000

    USGS Publications Warehouse

    Soulard, Christopher E.; Acevedo, William; Auch, Roger F.; Sohl, Terry L.; Drummond, Mark A.; Sleeter, Benjamin M.; Sorenson, Daniel G.; Kambly, Steven; Wilson, Tamara S.; Taylor, Janis L.; Sayler, Kristi L.; Stier, Michael P.; Barnes, Christopher A.; Methven, Steven C.; Loveland, Thomas R.; Headley, Rachel; Brooks, Mark S.

    2014-01-01

    The U.S. Geological Survey Land Cover Trends Project is releasing a 1973–2000 time-series land-use/land-cover dataset for the conterminous United States. The dataset contains 5 dates of land-use/land-cover data for 2,688 sample blocks randomly selected within 84 ecological regions. The nominal dates of the land-use/land-cover maps are 1973, 1980, 1986, 1992, and 2000. The land-use/land-cover maps were classified manually from Landsat Multispectral Scanner, Thematic Mapper, and Enhanced Thematic Mapper Plus imagery using a modified Anderson Level I classification scheme. The resulting land-use/land-cover data has a 60-meter resolution and the projection is set to Albers Equal-Area Conic, North American Datum of 1983. The files are labeled using a standard file naming convention that contains the number of the ecoregion, sample block, and Landsat year. The downloadable files are organized by ecoregion, and are available in the ERDAS IMAGINE (.img) raster file format.

  9. Evolving hard problems: Generating human genetics datasets with a complex etiology.

    PubMed

    Himmelstein, Daniel S; Greene, Casey S; Moore, Jason H

    2011-07-07

    A goal of human genetics is to discover genetic factors that influence individuals' susceptibility to common diseases. Most common diseases are thought to result from the joint failure of two or more interacting components instead of single component failures. This greatly complicates both the task of selecting informative genetic variants and the task of modeling interactions between them. We and others have previously developed algorithms to detect and model the relationships between these genetic factors and disease. Previously these methods have been evaluated with datasets simulated according to pre-defined genetic models. Here we develop and evaluate a model-free evolution strategy to generate datasets which display a complex relationship between individual genotype and disease susceptibility. We show that this model-free approach is capable of generating a diverse array of datasets with distinct gene-disease relationships for an arbitrary interaction order and sample size. We specifically generate eight hundred Pareto fronts; one for each independent run of our algorithm. In each run the predictiveness of single genetic variants and pairs of genetic variants has been minimized, while the predictiveness of third-, fourth-, or fifth-order combinations is maximized. Two hundred runs of the algorithm are further dedicated to creating datasets with predictive fourth- or fifth-order interactions and minimized lower-level effects. This method and the resulting datasets will allow the capabilities of novel methods to be tested without pre-specified genetic models. This allows researchers to evaluate which methods will succeed on human genetics problems where the model is not known in advance. We further make freely available to the community the entire Pareto-optimal front of datasets from each run so that novel methods may be rigorously evaluated. These 76,600 datasets are available from http://discovery.dartmouth.edu/model_free_data/.

  10. A Dataset from TIMSS to Examine the Relationship between Computer Use and Mathematics Achievement

    ERIC Educational Resources Information Center

    Kadijevich, Djordje M.

    2015-01-01

    Because the relationship between computer use and achievement is still puzzling, there is a need to prepare and analyze good quality datasets on computer use and achievement. Such a dataset can be derived from TIMSS data. This paper describes how this dataset can be prepared. It also gives an example of how the dataset may be analyzed. The…

  11. The Development of a Noncontact Letter Input Interface “Fingual” Using Magnetic Dataset

    NASA Astrophysics Data System (ADS)

    Fukushima, Taishi; Miyazaki, Fumio; Nishikawa, Atsushi

    We have newly developed a noncontact letter input interface called “Fingual”. Fingual uses a glove mounted with inexpensive and small magnetic sensors. Using the glove, users can input letters by forming the finger alphabet, a kind of sign language. The proposed method uses a dataset which consists of magnetic field measurements and the corresponding letter information. In this paper, we show two recognition methods using the dataset. The first method uses the Euclidean norm; the second additionally uses a Gaussian function as a weighting function. We then conducted verification experiments on the recognition rate of each method in two situations: in one, subjects used their own dataset; in the other, they used another person's dataset. As a result, the proposed method could recognize letters at a high rate in both situations, even though it is better to use one's own dataset than another person's. Though Fingual needs to collect a magnetic dataset for each letter in advance, its feature is the ability to recognize letters without complicated calculations such as inverse problems. This paper shows the results of the recognition experiments and demonstrates the utility of the proposed system “Fingual”.
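
    The two recognition methods can be sketched as lookups over stored magnetic vectors: a nearest-neighbour match under the Euclidean norm, and a Gaussian-weighted vote. This is our own hedged illustration of the idea, not the authors' implementation; function names and the sigma parameter are assumptions.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two magnetic-sensor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def recognize_nn(sample, dataset):
    """Method 1: return the letter whose stored vector is closest
    to the sample under the Euclidean norm."""
    return min(dataset, key=lambda item: euclidean(sample, item[0]))[1]

def recognize_gaussian(sample, dataset, sigma=1.0):
    """Method 2: weight every stored vector by a Gaussian of its
    distance and return the letter with the largest total weight."""
    scores = {}
    for vec, letter in dataset:
        w = math.exp(-euclidean(sample, vec) ** 2 / (2 * sigma ** 2))
        scores[letter] = scores.get(letter, 0.0) + w
    return max(scores, key=scores.get)
```

Both methods avoid the inverse-problem calculations the abstract mentions: recognition reduces to distance comparisons against the pre-collected dataset.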

  12. Quantifying Spatially Integrated Floodplain and Wetland Systems for the Conterminous US

    NASA Astrophysics Data System (ADS)

    Lane, C.; D'Amico, E.; Wing, O.; Bates, P. D.

    2017-12-01

    Wetlands interact with other waters across a variable connectivity continuum, from permanent to transient, from fast to slow, and from primarily surface water to exclusively groundwater flows. Floodplain wetlands typically experience fast and frequent surface and near-surface groundwater interactions with their river networks, leading to an increasing effort to tailor management strategies for these wetlands. Management of floodplain wetlands is contingent on accurate floodplain delineation, and though this has proven challenging, multiple efforts are being made to alleviate this data gap at the conterminous scale using spatial, physical, and hydrological floodplain proxies. In this study, we derived and contrasted floodplain extents using the following nationally available approaches: 1) a geospatial-buffer floodplain proxy (Lane and D'Amico 2016, JAWRA 52(3):705-722), 2) a regionalized flood frequency analysis coupled to a 30m resolution continental-scale hydraulic model (RFFA; Smith et al. 2015, WRR 51:539-553), and 3) a soils-based floodplain analysis (Sangwan and Merwade 2015, JAWRA 51(5):1286-1304). The geospatial approach uses National Wetlands Inventory and buffered National Hydrography Datasets. RFFA estimates extreme flows based on catchment size, regional climatology and upstream annual rainfall and routes these flows through a hydraulic model built with data from USGS HydroSHEDS, NOAA, and the National Elevation Dataset. Soil-based analyses define floodplains based on attributes within the USDA soil-survey data (SSURGO). Nearly 30% (by count) of U.S. freshwater wetlands are located within floodplains according to the geospatial analysis, contrasted with 37% (soils-based) and 53% (RFFA-based). The dichotomies between approaches are mainly a function of input data-layer resolution, accuracy, coverage, and extent, further discussed in this presentation. Ultimately, these spatial analyses and findings will improve floodplain and integrated wetland system extent

  13. A new dataset validation system for the Planetary Science Archive

    NASA Astrophysics Data System (ADS)

    Manaud, N.; Zender, J.; Heather, D.; Martinez, S.

    2007-08-01

    The Planetary Science Archive is the official archive for the Mars Express mission. It received its first data by the end of 2004. These data are delivered by the PI teams to the PSA team as datasets, which are formatted in conformance with the Planetary Data System (PDS). The PI teams are responsible for analyzing and calibrating the instrument data as well as for the production of reduced and calibrated data. They are also responsible for the scientific validation of these data. ESA is responsible for the long-term data archiving and distribution to the scientific community and must ensure, in this regard, that all archived products meet quality standards. To do so, an archive peer review is used to control the quality of the Mars Express science data archiving process. However, a full validation of its content is missing. An independent review board recently recommended that the completeness of the archive as well as the consistency of the delivered data should be validated following well-defined procedures. A new validation software tool is being developed to complete the overall data quality control system functionality. This new tool aims to improve the quality of data and services provided to the scientific community through the PSA, and shall make it possible to track anomalies in datasets and to control their completeness. It shall ensure that the PSA end-users: (1) can rely on the results of their queries, (2) will get data products that are suitable for scientific analysis, and (3) can find all science data acquired during a mission. We defined dataset validation as the verification and assessment process that checks the dataset content against pre-defined top-level criteria, which represent the general characteristics of good quality datasets. The dataset content that is checked includes the data and all types of information that are essential in the process of deriving scientific results and those interfacing with the PSA database. The validation software tool is a multi-mission tool that

  14. Data Recommender: An Alternative Way to Discover Open Scientific Datasets

    NASA Astrophysics Data System (ADS)

    Klump, J. F.; Devaraju, A.; Williams, G.; Hogan, D.; Davy, R.; Page, J.; Singh, D.; Peterson, N.

    2017-12-01

    Over the past few years, institutions and government agencies have adopted policies to openly release their data, which has resulted in huge amounts of open data becoming available on the web. When trying to discover the data, users face two challenges: an overload of choice and the limitations of the existing data search tools. On the one hand, there are too many datasets to choose from, and therefore, users need to spend considerable effort to find the datasets most relevant to their research. On the other hand, data portals commonly offer keyword and faceted search, which depend fully on the user queries to search and rank relevant datasets. Consequently, keyword and faceted search may return loosely related or irrelevant results, even though the results contain the query terms. They may also return highly specific results that depend more on how well the metadata was authored. They do not account well for variance in metadata due to variance in author styles and preferences. The top-ranked results may also come from the same data collection, and users are unlikely to discover new and interesting datasets. These search modes mainly suit users who can express their information needs in terms of the structure and terminology of the data portals, but may pose a challenge otherwise. The above challenges reflect that we need a solution that delivers the most relevant (i.e., similar and serendipitous) datasets to users, beyond the existing search functionalities on the portals. A recommender system is an information filtering system that presents users with relevant and interesting contents based on users' context and preferences. Delivering data recommendations to users can make data discovery easier, and as a result may enhance user engagement with the portal. We developed a hybrid data recommendation approach for the CSIRO Data Access Portal. The approach leverages existing recommendation techniques (e.g., content-based filtering and item co-occurrence) to produce
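
    As one concrete instance of the content-based-filtering technique mentioned above, dataset metadata records can be compared by cosine similarity over bag-of-words vectors and the most similar records recommended. This sketch is our own simplification under that assumption, not the CSIRO portal's code:

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def recommend(target_id, metadata, k=3):
    """Rank all other datasets by metadata similarity to the target
    dataset and return the k most similar dataset identifiers."""
    vecs = {d: Counter(text.lower().split()) for d, text in metadata.items()}
    scores = [(cosine(vecs[target_id], v), d)
              for d, v in vecs.items() if d != target_id]
    return [d for _, d in sorted(scores, reverse=True)[:k]]
```

A hybrid recommender would combine such content scores with item co-occurrence signals (e.g., datasets downloaded in the same sessions) before ranking.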

  15. Data assimilation and model evaluation experiment datasets

    NASA Technical Reports Server (NTRS)

    Lai, Chung-Cheng A.; Qian, Wen; Glenn, Scott M.

    1994-01-01

    The Institute for Naval Oceanography, in cooperation with Naval Research Laboratories and universities, executed the Data Assimilation and Model Evaluation Experiment (DAMEE) for the Gulf Stream region during fiscal years 1991-1993. Enormous effort went into the preparation of several high-quality and consistent datasets for model initialization and verification. This paper describes the preparation process, the temporal and spatial scope, the contents, and the structure of these datasets. The goal of DAMEE and the need for data in the four phases of the experiment are briefly stated. The preparation of the DAMEE datasets consisted of a series of processes: (1) collection of observational data; (2) analysis and interpretation; (3) interpolation using the Optimum Thermal Interpolation System package; (4) quality control and re-analysis; and (5) data archiving and software documentation. The data products from these processes included a time series of 3D fields of temperature and salinity, 2D fields of surface dynamic height and mixed-layer depth, an analysis of the Gulf Stream and rings system, and bathythermograph profiles. To date, these are the most detailed and highest-quality data available for mesoscale ocean modeling, data assimilation, and forecasting research. Feedback from ocean modeling groups that tested the data was incorporated into its refinement. Suggested uses of the DAMEE data include (1) ocean modeling and data assimilation studies, (2) diagnostic and theoretical studies, and (3) comparisons with locally detailed observations.

  16. Artificial intelligence (AI) systems for interpreting complex medical datasets.

    PubMed

    Altman, R B

    2017-05-01

    Advances in machine intelligence have created powerful capabilities in algorithms that find hidden patterns in data, classify objects based on their measured characteristics, and associate similar patients, diseases, or drugs based on common features. However, artificial intelligence (AI) applications to medical data face several technical challenges: complex, heterogeneous, and noisy datasets, and the difficulty of explaining their output to users. There are also social challenges related to intellectual property, data provenance, regulatory issues, economics, and liability. © 2017 ASCPT.

  17. Use of Electronic Health-Related Datasets in Nursing and Health-Related Research.

    PubMed

    Al-Rawajfah, Omar M; Aloush, Sami; Hewitt, Jeanne Beauchamp

    2015-07-01

    Datasets of gigabyte size are common in the medical sciences. There is increasing consensus that significant untapped knowledge lies hidden in these large datasets. This review article discusses Electronic Health-Related Datasets (EHRDs) in terms of types, features, advantages, limitations, and possible uses in nursing and health-related research. Major scientific databases, MEDLINE, ScienceDirect, and Scopus, were searched for studies or review articles on using EHRDs in research. A total of 442 articles were located. After application of the study inclusion criteria, 113 articles were included in the final review. EHRDs were categorized into Electronic Administrative Health-Related Datasets and Electronic Clinical Health-Related Datasets, and subcategories of each major category were identified. EHRDs are invaluable assets for nursing and health-related research. Advanced research skills such as using analytical software, applying advanced statistical procedures, and handling missing data and missing variables will maximize the efficient use of EHRDs in research. © The Author(s) 2014.

  18. Recent Development on the NOAA's Global Surface Temperature Dataset

    NASA Astrophysics Data System (ADS)

    Zhang, H. M.; Huang, B.; Boyer, T.; Lawrimore, J. H.; Menne, M. J.; Rennie, J.

    2016-12-01

    Global Surface Temperature (GST) is one of the most widely used indicators for climate trend and extreme analyses. A widely used GST dataset is the NOAA merged land-ocean surface temperature dataset known as NOAAGlobalTemp (formerly MLOST). NOAAGlobalTemp was recently updated from version 3.5.4 to version 4. The update includes a significant improvement in the ocean surface component (Extended Reconstructed Sea Surface Temperature, or ERSST, from version 3b to version 4), which resulted in increased temperature trends in recent decades. Since then, advances in both the ocean component (ERSST) and the land component (GHCN-Monthly) have been made, including the inclusion of Argo float SSTs and expanded EOT modes in ERSST, and the use of the ISTI databank in GHCN-Monthly. In this presentation, we describe the impact of those improvements on the merged global temperature dataset, in terms of global trends and other aspects.
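    The core idea of a merged land-ocean product can be illustrated with a simple area-weighted blend of the two components. This is a schematic sketch only, not NOAA's actual merging procedure (which operates on gridded anomalies with bias adjustments); the ocean-area fraction and anomaly values below are illustrative assumptions.

```python
def merge_global_anomaly(land_anom, ocean_anom, ocean_fraction=0.71):
    """Blend land and ocean temperature anomalies (degrees C) into one
    global value, weighting by approximate surface-area fractions."""
    return ocean_fraction * ocean_anom + (1 - ocean_fraction) * land_anom

# Illustrative monthly anomalies: land often warms faster than ocean.
print(merge_global_anomaly(land_anom=1.2, ocean_anom=0.6))  # 0.774
```

    Because the ocean term carries roughly 71% of the weight, improvements to the SST component (such as the ERSST v3b-to-v4 update) dominate changes in the merged global trend.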

  19. Realistic computer network simulation for network intrusion detection dataset generation

    NASA Astrophysics Data System (ADS)

    Payer, Garrett

    2015-05-01

    The KDD-99 Cup dataset is dead. While it can continue to be used as a toy example, the age of this dataset makes it all but useless for intrusion detection research and data mining. Many of the attacks in the dataset are obsolete and do not reflect the features important for intrusion detection in today's networks. Creating a new dataset encompassing a large cross section of the attacks found on the Internet today could be useful, but it would eventually fall to the same problem as the KDD-99 Cup: its usefulness would diminish over time. To sustain intrusion detection research, the generation of new datasets needs to be as dynamic and as quick as the attacker. Simply examining existing network traffic and using domain experts such as intrusion analysts to label it is inefficient, expensive, and not scalable. The only viable methodology is simulation using technologies including virtualization, attack toolsets such as Metasploit and Armitage, and sophisticated emulation of threat and user behavior. Simulating actual user behavior and network intrusion events dynamically not only allows researchers to vary scenarios quickly, but also enables online testing of intrusion detection mechanisms by interacting with data as it is generated. As new threat behaviors are identified, they can be added to the simulation to make quicker determinations about the effectiveness of existing and ongoing network intrusion detection technology, methodology, and models.
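    The key advantage of simulation over analyst labeling can be sketched simply: because the generator decides each event's class before emitting it, every record comes pre-labeled at no cost. This is a toy illustration, not the author's simulation framework; the flow fields, port choices, and distributions are invented for the example.

```python
import random

def generate_flows(n, attack_rate=0.1, seed=42):
    """Emit synthetic, pre-labeled network flow records. Benign flows use
    common service ports and realistic sizes; attack flows mimic a port
    scan (random high ports, tiny payloads). Labels are free because the
    simulator chooses each flow's class up front."""
    rng = random.Random(seed)
    flows = []
    for _ in range(n):
        if rng.random() < attack_rate:
            flows.append({"dst_port": rng.randint(1, 65535),
                          "bytes": rng.randint(40, 120),
                          "label": "scan"})
        else:
            flows.append({"dst_port": rng.choice([80, 443, 22, 53]),
                          "bytes": rng.randint(200, 20000),
                          "label": "benign"})
    return flows

flows = generate_flows(1000)
print(sum(f["label"] == "scan" for f in flows))  # roughly 100 scans
```

    New threat behaviors can be added as additional branches, so the labeled dataset evolves as quickly as the simulated attacker does.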

  20. Synthesizing Global and Local Datasets to Estimate Jurisdictional Forest Carbon Fluxes in Berau, Indonesia

    PubMed Central

    Griscom, Bronson W.; Ellis, Peter W.; Baccini, Alessandro; Marthinus, Delon; Evans, Jeffrey S.; Ruslandi

    2016-01-01

    Background Forest conservation efforts are increasingly being implemented at the scale of sub-national jurisdictions in order to mitigate global climate change and provide other ecosystem services. We see an urgent need for robust estimates of historic forest carbon emissions at this scale, as the basis for credible measures of climate and other benefits achieved. Despite the arrival of a new generation of global datasets on forest area change and biomass, confusion remains about how to produce credible jurisdictional estimates of forest emissions. We demonstrate a method for estimating the relevant historic forest carbon fluxes within the Regency of Berau in eastern Borneo, Indonesia. Our method integrates the best available global and local datasets, and includes a comprehensive analysis of uncertainty at the regency scale. Principal Findings and Significance We find that Berau generated 8.91 ± 1.99 million tonnes of net CO2 emissions per year during 2000–2010. Berau is an early frontier landscape where gross emissions are 12 times higher than gross sequestration. Yet most (85%) of Berau’s original forests are still standing. The majority of net emissions were due to conversion of native forests to unspecified agriculture (43% of total), oil palm (28%), and fiber plantations (9%). Most of the remainder was due to legal commercial selective logging (17%). Our overall uncertainty estimate offers an independent basis for assessing three other estimates for Berau. Two other estimates were above the upper end of our uncertainty range. We emphasize the importance of including an uncertainty range for all parameters of the emissions equation to generate a comprehensive uncertainty estimate, which has not been done before. We believe comprehensive estimates of carbon flux uncertainty are increasingly important as national and international institutions are challenged with comparing alternative estimates and identifying a credible range of historic emissions values.
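    Propagating per-parameter uncertainty into a net-flux total, as the abstract advocates, can be approximated by Monte Carlo sampling. The sketch below uses invented source and sink terms (not Berau's actual values) and a simple Gaussian error model in place of the paper's full analysis.

```python
import random

def net_flux_uncertainty(terms, n=100_000, seed=1):
    """Propagate per-term uncertainty into a net-emissions total by Monte
    Carlo: each term is (mean, std) in Mt CO2/yr, with sources positive
    and sinks negative. Returns the (mean, std) of the summed flux."""
    rng = random.Random(seed)
    totals = [sum(rng.gauss(mu, sd) for mu, sd in terms) for _ in range(n)]
    mean = sum(totals) / n
    std = (sum((t - mean) ** 2 for t in totals) / n) ** 0.5
    return mean, std

# Illustrative (not Berau's actual) terms: two sources and one sink.
terms = [(6.0, 1.5), (4.0, 1.0), (-1.0, 0.5)]
mean, std = net_flux_uncertainty(terms)
print(f"{mean:.1f} +/- {std:.1f} Mt CO2/yr")  # close to 9.0 +/- 1.9
```

    For independent Gaussian terms the sampled spread converges to the analytic result, the root sum of squares of the individual standard deviations, which makes the sketch easy to sanity-check.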