Heliophysics Legacy Data Restoration
NASA Astrophysics Data System (ADS)
Candey, R. M.; Bell, E. V., II; Bilitza, D.; Chimiak, R.; Cooper, J. F.; Garcia, L. N.; Grayzeck, E. J.; Harris, B. T.; Hills, H. K.; Johnson, R. C.; Kovalick, T. J.; Lal, N.; Leckner, H. A.; Liu, M. H.; McCaslin, P. W.; McGuire, R. E.; Papitashvili, N. E.; Rhodes, S. A.; Roberts, D. A.; Yurow, R. E.
2016-12-01
The Space Physics Data Facility (SPDF)
A Web Accessible Framework for Discovery, Visualization and Dissemination of Polar Data
NASA Astrophysics Data System (ADS)
Kirsch, P. J.; Breen, P.; Barnes, T. D.
2007-12-01
A web accessible information framework, currently under development within the Physical Sciences Division of the British Antarctic Survey, is described. The datasets accessed are generally heterogeneous in nature, from fields including space physics, meteorology, atmospheric chemistry, ice physics, and oceanography. Many of these are returned in near real time over a 24/7 limited-bandwidth link from remote Antarctic stations and ships. The requirement is to provide various user groups, each with disparate interests and demands, a system incorporating a browsable and searchable catalogue; bespoke data summary visualization; metadata access facilities; and download utilities. The system allows timely access to raw and processed datasets through an easily navigable discovery interface. Once discovered, a summary of the dataset can be visualized in a manner prescribed by the particular projects and user communities, or the dataset may be downloaded, subject to any accessibility restrictions that may exist. In addition, access to related ancillary information, including software, documentation, related URLs, and information concerning non-electronic media (of particular relevance to some legacy datasets), is made directly available, having automatically been associated with a dataset during the discovery phase. Major components of the framework include the relational database containing the catalogue, the organizational structure of the systems holding the data (enabling automatic updates of the system catalogue and real-time access to data), the user interface design, and administrative and data management scripts allowing straightforward incorporation of utilities, datasets and system maintenance.
NASA Astrophysics Data System (ADS)
Akpinar, A.
2017-11-01
This study explores whether specific types of green spaces (i.e. urban green spaces, forests, agricultural lands, rangelands, and wetlands) are associated with physical activity, quality of life, and cardiovascular disease prevalence. A sample of 8,976 respondents from the Behavioral Risk Factor Surveillance System, conducted in 2006 in Washington State across 291 zip codes, was analyzed. Measures included physical activity status, quality of life, and cardiovascular disease prevalence (i.e. heart attack, angina, and stroke). The percentage of green space was derived from the National Land Cover Dataset and measured with a Geographic Information System. Multilevel regression analyses were conducted while controlling for age, sex, race, weight, marital status, occupation, income, education level, and zip-code population and socio-economic situation. The regression results reveal that no green space type was associated with physical activity, quality of life, or cardiovascular disease prevalence. On the other hand, the analysis shows that physical activity was associated with general health, quality of life, and cardiovascular disease prevalence. The findings suggest that factors other than green space type, such as the size, structure, distribution (sprawled or concentrated, large or small), quality, and characteristics of green space, might be what matters for general health, quality of life, and cardiovascular disease prevalence. Therefore, further investigations are needed.
Determining Scale-dependent Patterns in Spatial and Temporal Datasets
NASA Astrophysics Data System (ADS)
Roy, A.; Perfect, E.; Mukerji, T.; Sylvester, L.
2016-12-01
Spatial and temporal datasets of interest to Earth scientists often contain plots of one variable against another, e.g., rainfall magnitude vs. time or fracture aperture vs. spacing. Such data, comprised of distributions of events along a transect/timeline along with their magnitudes, can display persistent or antipersistent trends, as well as random behavior, that may contain signatures of underlying physical processes. Lacunarity is a technique that was originally developed for multiscale analysis of data. In a recent study we showed that lacunarity can be used for revealing changes in scale-dependent patterns in fracture spacing data. Here we present a further improvement of our technique, with lacunarity applied to various non-binary datasets comprised of event spacings and magnitudes. We test our technique on a set of four synthetic datasets, three of which are based on an autoregressive model and have magnitudes at every point along the "timeline", thus representing antipersistent, persistent, and random trends. The fourth dataset is made up of five clusters of events, each containing a set of random magnitudes. The concept of the lacunarity ratio, LR, is introduced; this is the lacunarity of a given dataset normalized to the lacunarity of its random counterpart. It is demonstrated that LR can successfully delineate scale-dependent changes in terms of antipersistence and persistence in the synthetic datasets. The technique is then applied to three different types of data: a hundred-year rainfall record from Knoxville, TN, USA, a set of varved sediments from the Marca Shale, and a set of fracture aperture and spacing data from NE Mexico. While the rainfall data and varved sediments both appear to be persistent at small scales, at larger scales they both become random. On the other hand, the fracture data show antipersistence at small scales (within clusters) and random behavior at large scales. Such differences in scale-dependent behavior, whether antipersistence to random, persistence to random, or otherwise, may be related to differences in the physicochemical properties and processes contributing to multiscale datasets.
The New LASP Interactive Solar IRradiance Datacenter (LISIRD)
NASA Astrophysics Data System (ADS)
Baltzer, T.; Wilson, A.; Lindholm, D. M.; Snow, M. A.; Woodraska, D.; Pankratz, C. K.
2017-12-01
The University of Colorado at Boulder's Laboratory for Atmospheric and Space Physics (LASP) has a long history of providing state-of-the-art solar instrumentation and datasets to the community. In 2005, LASP created a web interface called LISIRD which provided plotting of and access to a number of measured and modeled solar irradiance datasets, and it has been used extensively by members of the community both within and outside of LASP. In August of 2017, LASP is set to release a new version of LISIRD for use by anyone interested in viewing and downloading the datasets it serves. This talk will describe the new LISIRD with emphasis on the features it enables, including: new and more functional plotting interfaces; better dataset browse and search capabilities; more datasets; easier addition of datasets from a wider array of resources; a cleaner interface with better use of screen real estate; and much easier updating of the metadata describing each dataset. Much of this capability is leveraged off new infrastructure that will also be touched upon.
The PO.DAAC Portal and its use of the Drupal Framework
NASA Astrophysics Data System (ADS)
Alarcon, C.; Huang, T.; Bingham, A.; Cosic, S.
2011-12-01
The Physical Oceanography Distributed Active Archive Center portal (http://podaac.jpl.nasa.gov) is the primary interface for discovering and accessing oceanographic datasets collected from the vantage point of space. In addition, it provides information about NASA's satellite missions and operational activities at the data center. Recently the portal underwent a major redesign and deployment utilizing the Drupal framework, which was chosen as the platform due to its flexibility, open source community, and modular infrastructure. The portal features efficient content addition and management, mailing lists, forums, role-based access control, and a faceted dataset browse capability. The dataset browsing was built as a custom Drupal module and integrates with an Apache Solr search engine.
Climate Model Diagnostic Analyzer
NASA Technical Reports Server (NTRS)
Lee, Seungwon; Pan, Lei; Zhai, Chengxing; Tang, Benyang; Kubar, Terry; Zhang, Zia; Wang, Wei
2015-01-01
The comprehensive and innovative evaluation of climate models with newly available global observations is critically needed for the improvement of climate model current-state representation and future-state predictability. A climate model diagnostic evaluation process requires physics-based multi-variable analyses that typically involve large-volume and heterogeneous datasets, making them both computation- and data-intensive. Given the exploratory nature of climate data analyses and the explosive growth of datasets and service tools, scientists are struggling to keep track of their datasets, tools, and execution/study history, let alone share them with others. In response, we have developed a cloud-enabled, provenance-supported, web-service system called the Climate Model Diagnostic Analyzer (CMDA). CMDA enables physics-based, multi-variable model performance evaluations and diagnoses through the comprehensive and synergistic use of multiple observational data, reanalysis data, and model outputs. At the same time, CMDA provides a crowd-sourcing space where scientists can organize their work efficiently and share it with others. CMDA is empowered by many state-of-the-art software packages in web services, provenance, and semantic search.
Emerging Technologies for Assessing Physical Activity Behaviors in Space and Time
Hurvitz, Philip M.; Moudon, Anne Vernez; Kang, Bumjoon; Saelens, Brian E.; Duncan, Glen E.
2014-01-01
Precise measurement of physical activity is important for health research, providing a better understanding of activity location, type, duration, and intensity. This article describes a novel suite of tools to measure and analyze physical activity behaviors in spatial epidemiology research. We use individual-level, high-resolution, objective data collected in a space-time framework to investigate built and social environment influences on activity. First, we collect data with accelerometers, global positioning system units, and smartphone-based digital travel and photo diaries to overcome many limitations inherent in self-reported data. Behaviors are measured continuously over the full spectrum of environmental exposures in daily life, instead of focusing exclusively on the home neighborhood. Second, data streams are integrated using common timestamps into a single data structure, the “LifeLog.” A graphic interface tool, “LifeLog View,” enables simultaneous visualization of all LifeLog data streams. Third, we use geographic information system SmartMap rasters to measure spatially continuous environmental variables to capture exposures at the same spatial and temporal scale as in the LifeLog. These technologies enable precise measurement of behaviors in their spatial and temporal settings, but they also generate very large datasets; we discuss current limitations and promising methods for processing and analyzing such large datasets. Finally, we provide applications of these methods in spatially oriented research, including a natural experiment to evaluate the effects of new transportation infrastructure on activity levels, and a study of neighborhood environmental effects on activity using twins as quasi-causal controls to overcome self-selection and reverse causation problems. In summary, the integrative characteristics of large datasets contained in LifeLogs and SmartMaps hold great promise for advancing spatial epidemiologic research to promote healthy behaviors. PMID:24479113
VizieR Online Data Catalog: Outliers and similarity in APOGEE (Reis+, 2018)
NASA Astrophysics Data System (ADS)
Reis, I.; Poznanski, D.; Baron, D.; Zasowski, G.; Shahaf, S.
2017-11-01
t-SNE is a dimensionality reduction algorithm that is particularly well suited for the visualization of high-dimensional datasets. We use t-SNE to visualize our distance matrix. A priori, these distances could define a space with almost as many dimensions as objects, i.e., tens of thousands of dimensions. Obviously, since many stars are quite similar, and their spectra are defined by a few physical parameters, the minimal spanning space might be smaller. By using t-SNE we can examine the structure of our sample projected into 2D. We use our distance matrix as input to the t-SNE algorithm and in return get a 2D map of the objects in our dataset. For each star in a sample of 183232 APOGEE stars, we provide the APOGEE IDs of the 99 stars with the most similar spectra (according to the method described in the paper), ordered by similarity. (3 data files).
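For readers who want to reproduce this kind of map, the sketch below (Python with scikit-learn; the sample size and distance definition are placeholders, not the paper's spectral distances) shows how a precomputed distance matrix is fed to t-SNE to obtain a 2D projection.

```python
import numpy as np
from sklearn.manifold import TSNE

# Toy stand-in for the paper's spectral distance matrix: random points in a
# low-dimensional parameter space (sizes and parameters are illustrative only).
rng = np.random.default_rng(0)
params = rng.normal(size=(500, 4))            # e.g. a few stellar parameters
dists = np.linalg.norm(params[:, None, :] - params[None, :, :], axis=-1)

# With metric="precomputed", TSNE consumes the square distance matrix directly;
# init must then be "random" rather than the default PCA initialisation.
embedding = TSNE(n_components=2, metric="precomputed",
                 init="random", random_state=0).fit_transform(dists)
print(embedding.shape)                        # (500, 2) map of the sample
```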
Assimilation of nontraditional datasets to improve atmospheric compensation
NASA Astrophysics Data System (ADS)
Kelly, Michael A.; Osei-Wusu, Kwame; Spisz, Thomas S.; Strong, Shadrian; Setters, Nathan; Gibson, David M.
2012-06-01
Detection and characterization of space objects require the capability to derive physical properties such as brightness temperature and reflectance. These quantities, together with trajectory and position, are often used to correlate an object against a catalogue of known characteristics. However, retrieval of these physical quantities can be hampered by the radiative obscuration of the atmosphere. Atmospheric compensation must therefore be applied to remove the radiative signature of the atmosphere from electro-optical (EO) collections and enable object characterization. The JHU/APL Atmospheric Compensation System (ACS) was designed to perform atmospheric compensation for long, slant-range paths at wavelengths from the visible to the infrared. Atmospheric compensation is critically important for air- and ground-based sensors collecting at low elevations near the Earth's limb. It can be demonstrated that undetected thin, sub-visual cirrus clouds in the line of sight (LOS) can significantly alter retrieved target properties (temperature, irradiance). The ACS algorithm employs non-traditional cirrus datasets and slant-range atmospheric profiles to estimate and remove atmospheric radiative effects from EO/IR collections. Results are presented for a NASA-sponsored collection in the near-IR (NIR) during the hypersonic reentry of the Space Shuttle on STS-132.
Retrieving the aerosol lidar ratio profile by combining ground- and space-based elastic lidars.
Mao, Feiyue; Gong, Wei; Ma, Yingying
2012-02-15
The aerosol lidar ratio is a key parameter for the retrieval of aerosol optical properties from elastic lidar, and it varies widely among aerosols with different chemical and physical properties. We proposed a method for retrieving the aerosol lidar ratio profile by combining simultaneous ground- and space-based elastic lidars. The method was tested on a simulated case and a real case at a wavelength of 532 nm. The results demonstrated that our method is robust and can obtain accurate lidar ratio and extinction coefficient profiles. Our method can be useful for determining the local and global lidar ratio and for validating space-based lidar datasets.
New physics in the visible final states of B → D(*) τν
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ligeti, Zoltan; Papucci, Michele; Robinson, Dean J.
2017-01-18
We derive compact expressions for the helicity amplitudes of the many-body B → D(*)(→ DY)τ(→ Xν)ν decays, specifically for X = ℓν or π and Y = π or γ. We include contributions from all ten possible new physics four-Fermi operators with arbitrary couplings. Our results capture interference effects in the full phase space of the visible τ and D* decay products which are missed in analyses that treat the τ or D* or both as stable. The τ interference effects are sizable, formally of order m_τ/m_B in the standard model, and may be of order unity in the presence of new physics. Treating interference correctly is essential when considering kinematic distributions of the τ or D* decay products, and when including experimentally unavoidable phase space cuts. Our amplitude-level results also allow for efficient exploration of new physics effects in the fully differential phase space, by enabling experiments to perform such studies on fully simulated Monte Carlo datasets via efficient event reweighting. As an example, we explore a class of new physics interactions that can fit the observed R(D(*)) ratios, and show that analyses including more differential kinematic information can provide greater discriminating power for new physics than single kinematic variables alone.
NASA Astrophysics Data System (ADS)
Koblick, D. C.; Shankar, P.; Xu, S.
Previously, there have been many commercial proposals and extensive academic studies regarding ground- and space-based sensors to assist a space surveillance network in obtaining metric observations of satellites and debris near Geosynchronous Earth Orbit (GEO). Most use physics-based models for geometric constraints, lighting, and tasker/scheduler operations of sensor architectures. Even under similar physics modeling assumptions, the space object catalog often differs between studies due to proprietary standards and datasets. This lack of catalog commonality creates barriers and makes it difficult to compare the performance benefits of sensor trades. To solve this problem, we have constructed a future GEO space catalog from publicly available datasets and literature. The annual number of new payloads and rocket bodies is drawn from a Poisson distribution, while the growth of the current GEO catalog is bootstrapped from the historical payload, upper stage, and debris data. We adopt a spherically symmetric explosion model and couple it with the NASA standard breakup model to simulate explosions of payloads and rocket bodies, as they are the primary drivers of debris population growth. The cumulative number of fragments follows a power-law distribution. Results from 1,000 random catalog growth simulations indicate that the GEO space object population in the year 2050 will include over 3,600 objects, nearly half of which are debris greater than 10 cm in spherical diameter. The number of rocket bodies and dead payloads is projected to nearly double over the next 33 years. For comparison, the current Air Force Space Command catalog snapshot contains fewer than 50 pieces of debris and only coarse Radar Cross Section (RCS) estimates (small, medium, and large). The current catalog may be sufficient for conjunction studies, but not for analyzing future sensor system performance. The projected 2050 GEO catalog will be available online for commercial/academic research and development.
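A minimal Monte Carlo sketch of such a catalog-growth projection is given below; the launch rate, explosion probability, and fragment power-law index are invented for illustration and are not the calibrated values used in the study.

```python
import numpy as np

rng = np.random.default_rng(42)

def project_geo_catalog(n_start=1000, years=33, launch_rate=30.0,
                        p_explode=0.001, frag_alpha=1.6, max_frag=500):
    """One realisation of catalog growth; all rates are illustrative guesses."""
    n = n_start
    for _ in range(years):
        n += rng.poisson(launch_rate)                # new payloads/rocket bodies
        for _ in range(rng.binomial(n, p_explode)):  # rare breakup events
            # power-law (Pareto-like) fragment count, truncated at max_frag
            frags = 10.0 * (1.0 - rng.random()) ** (-1.0 / frag_alpha)
            n += int(min(max_frag, frags))
    return n

sizes = [project_geo_catalog() for _ in range(1000)]
print(int(np.mean(sizes)), np.percentile(sizes, [5, 95]))
```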
NASA Astrophysics Data System (ADS)
Génot, V.; André, N.; Cecconi, B.; Bouchemit, M.; Budnik, E.; Bourrel, N.; Gangloff, M.; Dufourg, N.; Hess, S.; Modolo, R.; Renard, B.; Lormant, N.; Beigbeder, L.; Popescu, D.; Toniutti, J.-P.
2014-11-01
The interest in data communication between analysis tools in planetary sciences and space physics is illustrated in this paper via several examples of the use of SAMP. The Simple Application Messaging Protocol was developed in the frame of the IVOA from an earlier protocol called PLASTIC. SAMP enables easy communication and interoperability between astronomy software, stand-alone and web-based; it is now increasingly adopted by the planetary sciences and space physics community. Its attractiveness rests, on one hand, on the use of common file formats for exchange and, on the other hand, on established messaging models. Examples of uses at the CDPP and elsewhere are presented. The CDPP (Centre de Données de la Physique des Plasmas, http://cdpp.eu/), the French data center for plasma physics, has been engaged for more than a decade in the archiving and dissemination of data products from space missions and ground observatories. Besides these activities, the CDPP developed services like AMDA (Automated Multi Dataset Analysis, http://amda.cdpp.eu/), which enables in-depth analysis of large amounts of data through dedicated functionalities such as visualization, conditional search, and cataloging. Besides AMDA, the 3DView (http://3dview.cdpp.eu/) tool provides immersive visualizations and is being further developed to include simulation and observational data. These tools and their interactions with each other, notably via SAMP, are presented via science cases of interest to the planetary sciences and space physics communities.
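A minimal sketch of this kind of SAMP interoperability, using the astropy.samp module, is shown below. It assumes a SAMP hub is already running (for instance one started by a connected desktop tool), and the file URL and table name are placeholders.

```python
from astropy.samp import SAMPIntegratedClient

# Broadcast a VOTable so that any connected SAMP client (e.g. AMDA, 3DView,
# TOPCAT) can load it. Requires a running SAMP hub on the local machine.
client = SAMPIntegratedClient(name="demo-sender")
client.connect()
try:
    client.notify_all({
        "samp.mtype": "table.load.votable",
        "samp.params": {"url": "file:///tmp/events.xml",  # placeholder file
                        "name": "example table"},
    })
finally:
    client.disconnect()
```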
The swiss army knife of job submission tools: grid-control
NASA Astrophysics Data System (ADS)
Stober, F.; Fischer, M.; Schleper, P.; Stadie, H.; Garbers, C.; Lange, J.; Kovalchuk, N.
2017-10-01
grid-control is a lightweight and highly portable open source submission tool that supports all common workflows in high energy physics (HEP). It has been used by a sizeable number of HEP analyses to process tasks that sometimes consist of up to 100k jobs. grid-control is built around a powerful plugin and configuration system that allows users to easily specify all aspects of the desired workflow. Job submission to a wide range of local or remote batch systems or grid middleware is supported. Tasks can be conveniently specified through the parameter space that will be processed, which can consist of any number of variables and data sources with complex dependencies on each other. Dataset information is processed through a configurable pipeline of dataset filters, partition plugins and partition filters. The partition plugins can take the number of files, the size of the work units, metadata, or combinations thereof into account. All changes to the input datasets or variables are propagated through the processing pipeline and can transparently trigger adjustments to the parameter space and the job submission. While the core functionality is completely experiment independent, full integration with the CMS computing environment is provided by a small set of plugins.
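The parameter-space idea can be illustrated with a short, purely conceptual Python sketch (this is not grid-control's configuration syntax; all variable and file names are invented): a task is the cross product of user variables with dataset partitions, one job per point in that space.

```python
from itertools import product

# Two user variables crossed with partitions of a (placeholder) dataset.
variables = {"ERA": ["2016", "2017"], "SELECTION": ["tight", "loose"]}
files = [f"file_{i}.root" for i in range(10)]
partitions = [files[i:i + 3] for i in range(0, len(files), 3)]

# One job per point of the parameter space: variables x dataset partitions.
jobs = [dict(zip(variables, values), FILES=part)
        for values in product(*variables.values())
        for part in partitions]
print(len(jobs), jobs[0])   # 2 eras x 2 selections x 4 partitions = 16 jobs
```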
Development and comparison of projection and image space 3D nodule insertion techniques
NASA Astrophysics Data System (ADS)
Robins, Marthony; Solomon, Justin; Sahbaee, Pooyan; Samei, Ehsan
2016-04-01
This study aimed to develop and compare two methods of inserting computerized virtual lesions into CT datasets. 24 physical (synthetic) nodules of three sizes and four morphologies were inserted into an anthropomorphic chest phantom (LUNGMAN, KYOTO KAGAKU). The phantom was scanned (Somatom Definition Flash, Siemens Healthcare) with and without nodules present, and images were reconstructed with filtered back projection and iterative reconstruction (SAFIRE) at 0.6 mm slice thickness using a standard thoracic CT protocol at multiple dose settings. Virtual 3D CAD models based on the physical nodules were then inserted (accounting for the system MTF) into the nodule-free CT data using two techniques: projection-based and image-based insertion. Nodule volumes were estimated using a commercial segmentation tool (iNtuition, TeraRecon, Inc.). Differences were tested using paired t-tests and the R2 goodness of fit between the virtually and physically inserted nodules. Both insertion techniques resulted in nodule volumes very similar to those of the real nodules (<3% difference), and in most cases the differences were not statistically significant. Also, R2 values were all ≥0.97 for both insertion techniques. These data imply that both techniques can confidently be used as a means of inserting virtual nodules into CT datasets. Such techniques can be instrumental in building hybrid CT datasets composed of patient images with virtually inserted nodules.
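As a rough illustration of the image-space flavour of insertion, the sketch below blurs a synthetic spherical nodule with a Gaussian point-spread function (a crude stand-in for accounting for the system MTF) before adding it to a background volume; all sizes and HU values are invented.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

vol = np.full((64, 64, 64), -800.0)           # lung-like background, in HU
zz, yy, xx = np.indices(vol.shape)
radius, center, nodule_hu = 5, (32, 32, 32), 40.0
sphere = ((zz - center[0])**2 + (yy - center[1])**2 +
          (xx - center[2])**2) <= radius**2
contrast = np.where(sphere, nodule_hu - (-800.0), 0.0)  # nodule minus background

# Blur the ideal nodule with an approximate PSF, then add it to the volume.
vol += gaussian_filter(contrast, sigma=1.2)
print(round(float(vol.max()), 1))             # peak HU after blurring
```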
HYDRA: Hyperspectral Data Research Application
NASA Astrophysics Data System (ADS)
Rink, T.; Whittaker, T.
2005-12-01
HYDRA is a freely available, easy to install tool for visualization and analysis of large local or remote hyper/multi-spectral datasets. HYDRA is implemented on top of the open source VisAD Java library via Jython, the Java implementation of the user-friendly Python programming language. VisAD provides data integration through its generalized data model, user-display interaction, and display rendering. Jython has an easy to read, concise, scripting-like syntax which eases software development. HYDRA allows data sharing of large datasets through its support of the OpenDAP and OpenADDE server-client protocols. Users can explore and interrogate data, and subset it in physical and/or spectral space to isolate key areas of interest for further analysis without having to download an entire dataset. It also has an extensible data input architecture to recognize new instruments and understand different local file formats; currently NetCDF and HDF4 are supported.
Geomagnetic and Solar Indices Data at NGDC
NASA Astrophysics Data System (ADS)
Mabie, J. J.
2012-12-01
The National Geophysical Data Center (NGDC) Solar and Terrestrial Physics Indices program is a central repository for global indices derived at numerous organizations around the world. These datasets are used by customers to drive models, evaluate the solar and geomagnetic environment, and understand space climate. Our goal is to obtain and disseminate these data in a timely and accurate manner, and to provide the short-term McNish-Lincoln sunspot number prediction. NGDC is in partnership with the NOAA Space Weather Prediction Center (SWPC), the University Corporation for Atmospheric Research (UCAR), the Helmholtz Centre Potsdam (GFZ), the Solar Indices Data Center (SIDC), the World Data Center for Geomagnetism Kyoto, and many other organizations. The large number of available indices and the complexity of how they are derived make understanding the data one of the biggest challenges for users of indices. Our data services include expertise in our indices and related datasets to provide feedback and analysis for our global customer base.
The Virtual Space Physics Observatory: Quick Access to Data and Tools
NASA Technical Reports Server (NTRS)
Cornwell, Carl; Roberts, D. Aaron; McGuire, Robert E.
2006-01-01
The Virtual Space Physics Observatory (VSPO; see http://vspo.gsfc.nasa.gov) has grown to provide a way to find and access about 375 data products and services from over 100 spacecraft/observatories in space and solar physics. The datasets are mainly chosen to be the most requested, and include most of the publicly available data products from operating NASA Heliophysics spacecraft as well as from solar observatories measuring across the frequency spectrum. Service links include a "quick orbits" page that uses SSCWeb Web Services to provide a rapid answer to questions such as "What spacecraft were in orbit in July 1992?" and "Where were Geotail, Cluster, and Polar on 2 June 2001?" These queries are linked back to the data search page. The VSPO interface provides many ways of looking for data based on terms used in a registry of resources using the SPASE Data Model that will be the standard for Heliophysics Virtual Observatories. VSPO itself is accessible via an API that allows other applications to use it as a Web Service; this has been implemented in one instance using the ViSBARD visualization program. The VSPO will become part of the Space Physics Data Facility, and will continue to expand its access to data. A challenge for all VOs will be to provide uniform access to data at the variable level, and we will be addressing this question in a number of ways.
Machine learning action parameters in lattice quantum chromodynamics
NASA Astrophysics Data System (ADS)
Shanahan, Phiala E.; Trewartha, Daniel; Detmold, William
2018-05-01
Numerical lattice quantum chromodynamics studies of the strong interaction are important in many aspects of particle and nuclear physics. Such studies require significant computing resources to undertake. A number of proposed methods promise improved efficiency of lattice calculations, and access to regions of parameter space that are currently computationally intractable, via multi-scale action-matching approaches that necessitate parametric regression of generated lattice datasets. The applicability of machine learning to this regression task is investigated, with deep neural networks found to provide an efficient solution even in cases where approaches such as principal component analysis fail. The high information content and complex symmetries inherent in lattice QCD datasets require custom neural network layers to be introduced and present opportunities for further development.
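A toy analogue of this parametric-regression task is sketched below on synthetic data (not lattice configurations; the encoding and ranges are invented): a small deep network recovers a nonlinearly encoded "coupling" where a linear baseline, standing in for a PCA-style approach, visibly struggles.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
beta = rng.uniform(0.2, 3.0, size=2000)                   # hidden "coupling"
X = np.c_[np.cos(2 * beta), np.sin(2 * beta)] + 0.01 * rng.normal(size=(2000, 2))

lin = LinearRegression().fit(X[:1500], beta[:1500])
net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000,
                   random_state=0).fit(X[:1500], beta[:1500])
print("linear  R^2:", round(lin.score(X[1500:], beta[1500:]), 3))
print("network R^2:", round(net.score(X[1500:], beta[1500:]), 3))
```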
The Application of the SPASE Metadata Standard in the U.S. and Worldwide
NASA Astrophysics Data System (ADS)
Thieman, J. R.; King, T. A.; Roberts, D.
2012-12-01
The Space Physics Archive Search and Extract (SPASE) metadata standard for Heliophysics and related data is now an established standard within the NASA-funded space and solar physics community and is spreading to the international groups within that community. Development of SPASE has involved a number of international partners, and the current version of the SPASE Metadata Model (version 2.2.2) has not needed any structural modifications since January 2011. The SPASE standard has been adopted by groups such as NASA's Heliophysics division, the Canadian Space Science Data Portal (CSSDP), Canada's AUTUMN network, Japan's Inter-university Upper atmosphere Global Observation NETwork (IUGONET), the Centre de Données de la Physique des Plasmas (CDPP), and the near-Earth space data infrastructure for e-Science (ESPAS). In addition, portions of the SPASE dictionary have been modeled in semantic web ontologies for use with reasoners and semantic searches. While we anticipate additional modifications to the model in the future to accommodate simulation and model data, these changes will not affect the data descriptions already generated for instrument-related datasets. Examples of SPASE descriptions can be viewed at
Sonification Prototype for Space Physics
NASA Astrophysics Data System (ADS)
Candey, R. M.; Schertenleib, A. M.; Diaz Merced, W. L.
2005-12-01
As an alternative and adjunct to visual displays, auditory exploration of data via sonification (data-controlled sound) and audification (audible playback of data samples) is promising for complex or rapidly/temporally changing visualizations, for data exploration of large datasets (particularly multi-dimensional datasets), and for exploring datasets in frequency rather than spatial dimensions (see also the International Conferences on Auditory Display).
EnviroAtlas - Austin, TX - Estimated Percent Green Space Along Walkable Roads
This EnviroAtlas dataset estimates green space along walkable roads. Green space within 25 meters of the road centerline is included and the percentage is based on the total area between street intersections. Green space provides valuable benefits to neighborhood residents and walkers by providing shade, improved aesthetics, and outdoor gathering spaces. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).
Machine learning action parameters in lattice quantum chromodynamics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shanahan, Phiala; Trewartha, Daniel; Detmold, William
2018-05-16
Numerical lattice quantum chromodynamics studies of the strong interaction underpin theoretical understanding of many aspects of particle and nuclear physics. Such studies require significant computing resources to undertake. A number of proposed methods promise improved efficiency of lattice calculations, and access to regions of parameter space that are currently computationally intractable, via multi-scale action-matching approaches that necessitate parametric regression of generated lattice datasets. The applicability of machine learning to this regression task is investigated, with deep neural networks found to provide an efficient solution even in cases where approaches such as principal component analysis fail. Finally, the high information content and complex symmetries inherent in lattice QCD datasets require custom neural network layers to be introduced and present opportunities for further development.
Precision Cosmology: The First Half Million Years
NASA Astrophysics Data System (ADS)
Jones, Bernard J. T.
2017-06-01
Cosmology seeks to characterise our Universe in terms of models based on well-understood and tested physics. Today we know our Universe with a precision that once would have been unthinkable. This book develops the entire mathematical, physical and statistical framework within which this has been achieved. It tells the story of how we arrive at our profound conclusions, starting from the early twentieth century and following developments up to the latest data analysis of big astronomical datasets. It provides an enlightening description of the mathematical, physical and statistical basis for understanding and interpreting the results of key space- and ground-based data. Subjects covered include general relativity, cosmological models, the inhomogeneous Universe, the physics of the cosmic background radiation, and methods and results of data analysis. Extensive online supplementary notes, teaching materials, and exercises in Python make this the perfect companion for researchers, teachers and students in physics, mathematics, and astrophysics.
EnviroAtlas - Cleveland, OH - Estimated Percent Green Space Along Walkable Roads
This EnviroAtlas dataset estimates green space along walkable roads. Green space within 25 meters of the road centerline is included and the percentage is based on the total area between street intersections. In this community, green space is defined as Trees & Forest, Grass & Herbaceous, Woody Wetlands, and Emergent Wetlands. In this metric, water is also included in green space. Green space provides valuable benefits to neighborhood residents and walkers by providing shade, improved aesthetics, and outdoor gathering spaces. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).
EnviroAtlas - Minneapolis/St. Paul, MN - Estimated Percent Green Space Along Walkable Roads
This EnviroAtlas dataset estimates green space along walkable roads. Green space within 25 meters of the road centerline is included and the percentage is based on the total area between street intersections. In this community, green space is defined as Trees and Forest, Grass and Herbaceous, Agriculture, Woody Wetlands, and Emergent Wetlands. In this metric, water is also included in green space. Green space provides valuable benefits to neighborhood residents and walkers by providing shade, improved aesthetics, and outdoor gathering spaces. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas/EnviroAtlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).
Evaluation of Application Space Expansion for the Sensor Fish
DOE Office of Scientific and Technical Information (OSTI.GOV)
DeRolph, Christopher R.; Bevelhimer, Mark S.
The Pacific Northwest National Laboratory has developed an instrument known as the sensor fish that can be released into downstream passage routes at hydropower facilities to collect data on the physical conditions that a fish might be exposed to during passage through a turbine. The US Department of Energy Wind and Water Power Program sees value in expanding the sensor fish application space beyond large Kaplan turbines in the northwest United States to evaluate conditions to which a greater variety of fish species are exposed. Development of fish-friendly turbines requires an understanding of both physical passage conditions and biological responses to those conditions. Expanding the use of sensor fish into other application spaces will add to the knowledge base of physical passage conditions and could also enhance the use of sensor fish as a site-specific tool in mitigating potential impacts to fish populations from hydropower. The Oak Ridge National Laboratory (ORNL) National Hydropower Assessment Program (NHAAP) database contains hydropower facility characteristics that, along with national fish distribution data, were used to evaluate potential interactions between fish species and project characteristics related to downstream passage issues. ORNL developed rankings for the turbine types in the NHAAP database in terms of their potential to impact fish through injury or mortality during downstream turbine passage. National-scale fish distributions for 31 key migratory species were spatially intersected with hydropower plant locations to identify facilities where turbines with a high threat of fish injury or mortality overlap with the potential range of a sensitive fish species. A dataset was produced that identifies hydropower facilities where deployment of the sensor fish technology might be beneficial in addressing issues related to downstream fish passage. The dataset can be queried to target specific geographic regions, fish species, license expiration dates, generation capacity levels, ownership characteristics, turbine characteristics, or any combination of these metrics.
NASA Astrophysics Data System (ADS)
Kellerman, A. C.; Shprits, Y.; McPherron, R. L.; Kondrashov, D. A.; Weygand, J. M.; Zhu, H.; Drozdov, A.
2017-12-01
Presented is an analysis of the phase-space density (PSD) response to stream interaction regions (SIRs), utilizing a reanalysis dataset principally composed of output from the data-assimilative Versatile Electron Radiation Belt (VERB) code together with Van Allen Probes and GOES observations. The dataset spans the period 2012-2017 and includes several SIR (and CIR) storms. The PSD is examined for evidence of injections, transport, acceleration, and loss by considering the instantaneous and time-averaged change at adiabatic invariant values that correspond to ring-current, relativistic, and ultra-relativistic energies. In the solar wind, the following variables in the slow and fast wind on either side of the stream interface (SI) are considered in each case: the coronal hole polarity, IMF, solar wind speed, density, pressure, and SI tilt angle. In the magnetosphere, the Dst and AE indices and the past PSD state are considered. Presented is an analysis of the dominant mechanisms, both external and internal to the magnetosphere, that cause non-adiabatic changes in radiation-belt electrons during the passage of these fascinating solar wind structures.
Dataset definition for CMS operations and physics analyses
NASA Astrophysics Data System (ADS)
Franzoni, Giovanni; Compact Muon Solenoid Collaboration
2016-04-01
Data recorded at the CMS experiment are funnelled into streams, integrated in the HLT menu, and further organised in a hierarchical structure of primary datasets and secondary datasets/dedicated skims. Datasets are defined according to the final-state particles reconstructed by the high level trigger, the data format, and the use case (physics analysis, alignment and calibration, performance studies). During the first LHC run, new workflows were added to this canonical scheme to best exploit the flexibility of the CMS trigger and data acquisition systems. The concepts of data parking and data scouting were introduced to extend the physics reach of CMS, offering the opportunity to define physics triggers with extremely loose selections (e.g. a dijet resonance trigger collecting data at 1 kHz). In this presentation, we review the evolution of the dataset definition during LHC run I, and we discuss the plans for run II.
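The trigger-based routing idea can be illustrated with a short sketch; the trigger and dataset names below are invented for illustration, and the real mapping lives in the HLT menu.

```python
# Hypothetical primary datasets keyed by the HLT paths that feed them.
PRIMARY_DATASETS = {
    "SingleMuon": {"HLT_IsoMu24", "HLT_Mu50"},
    "JetHT":      {"HLT_PFHT900", "HLT_DiJetAve400"},
}

def route(event_triggers):
    """Return every primary dataset whose triggers fired for this event;
    overlaps are allowed, as one event may enter several datasets."""
    fired = set(event_triggers)
    return [pd for pd, paths in PRIMARY_DATASETS.items() if paths & fired]

print(route({"HLT_IsoMu24", "HLT_PFHT900"}))  # ['SingleMuon', 'JetHT']
```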
Nuclear Potential Clustering As a New Tool to Detect Patterns in High Dimensional Datasets
NASA Astrophysics Data System (ADS)
Tonkova, V.; Paulus, D.; Neeb, H.
2013-02-01
We present a new approach for the clustering of high dimensional data without prior assumptions about the structure of the underlying distribution. The proposed algorithm is based on a concept adapted from nuclear physics. To partition the data, we model the dynamic behaviour of nucleons interacting in an N-dimensional space. An adaptive nuclear potential, comprising a short-range attractive term (strong interaction) and a long-range repulsive term (Coulomb force), is assigned to each data point. By modelling the dynamics, nucleons that are densely distributed in space fuse to build nuclei (clusters), whereas single-point clusters repel each other. The formation of clusters is complete when the system reaches the state of minimal potential energy. The data are then grouped according to the particles' final effective potential energy level. The performance of the algorithm is tested with several synthetic datasets, showing that the proposed method can robustly identify clusters even when complex configurations are present. Furthermore, quantitative MRI data from 43 multiple sclerosis patients were analyzed, showing a reasonable splitting into subgroups according to the individual patients' disease grade. The good performance of the algorithm on such highly correlated non-spherical datasets, which are typical of MRI-derived image features, shows that Nuclear Potential Clustering is a valuable tool for automated data analysis, not only in the MRI domain.
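A toy two-dimensional re-implementation of the idea is sketched below; the force law, constants, and the final grouping step are our own illustrative choices, not the authors' exact formulation.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(0)
# two dense groups of "nucleons" in 2-D (synthetic data)
pts = np.r_[rng.normal(0, 0.3, (30, 2)), rng.normal(5, 0.3, (30, 2))]

def step(pts, dt=0.02, r0=1.5, c=0.3):
    diff = pts[None, :, :] - pts[:, None, :]      # diff[i, j] = pts[j] - pts[i]
    d = np.linalg.norm(diff, axis=-1) + 1e-9
    # short-range attraction ("strong force") minus long-range repulsion ("Coulomb")
    strength = np.exp(-d / r0) - c / d
    force = (strength / d)[:, :, None] * diff     # forces along unit vectors
    return pts + dt * force.sum(axis=1)           # overdamped dynamics

for _ in range(300):
    pts = step(pts)

# points that fused are grouped by a single-linkage cut well below the gap
labels = fcluster(linkage(pts), t=1.0, criterion="distance")
print("clusters found:", len(np.unique(labels)))  # expected: 2
```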
APPLYING DATA MINING APPROACHES TO FURTHER ...
This dataset will be used to illustrate various data mining techniques to biologically profile the chemical space.
Subsampling for dataset optimisation
NASA Astrophysics Data System (ADS)
Ließ, Mareike
2017-04-01
Soil-landscapes have formed by the interaction of soil-forming factors and pedogenic processes. Modelling these landscapes in their pedodiversity and the underlying processes requires a representative, unbiased dataset. This concerns model input as well as output data. However, the available datasets are often big, highly heterogeneous, and gathered for various purposes rather than to model a particular process or data space. As a first step, the overall data space and/or landscape section to be modelled needs to be identified, including considerations regarding scale and resolution. Then the available dataset needs to be optimised via subsampling to represent this n-dimensional data space well. A couple of well-known sampling designs may be adapted to suit this purpose. The overall approach follows three main strategies: (1) the data space may be condensed and de-correlated by a factor analysis to facilitate the subsampling process; (2) different methods of pattern recognition serve to structure the n-dimensional data space to be modelled into units which then form the basis for the optimisation of an existing dataset through a sensible selection of samples (along the way, data units for which there is currently insufficient soil data may be identified); and (3) random samples from the n-dimensional data space may be replaced by similar samples from the available dataset. Besides being a prerequisite for developing data-driven statistical models, this approach may also help to develop universal process models and identify limitations in existing models.
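The sketch below strings the three strategies together on synthetic data using standard scikit-learn tools; the component counts, number of units, and sample sizes are arbitrary illustrative choices, and PCA stands in for the factor analysis.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
big = rng.normal(size=(5000, 8))                 # heterogeneous legacy dataset

# (1) condense and de-correlate the data space (PCA in place of factor analysis)
scores = PCA(n_components=3).fit_transform(big)

# (2) structure the data space into units and subsample each unit evenly
units = KMeans(n_clusters=20, n_init=10, random_state=0).fit(scores)
idx = np.concatenate([np.flatnonzero(units.labels_ == k)[:10] for k in range(20)])

# (3) replace ideal random points in the data space by the closest real samples
targets = rng.uniform(scores.min(0), scores.max(0), size=(50, 3))
_, nearest = NearestNeighbors(n_neighbors=1).fit(scores).kneighbors(targets)
subsample = big[np.unique(np.r_[idx, nearest.ravel()])]
print(subsample.shape)
```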
Advanced Methodologies for NASA Science Missions
NASA Astrophysics Data System (ADS)
Hurlburt, N. E.; Feigelson, E.; Mentzel, C.
2017-12-01
Most of NASA's commitment to computational space science involves the organization and processing of Big Data from space-based satellites, and the calculation of advanced physical models based on these datasets. But considerable thought is also needed about which computations are required. The science questions addressed by space data are so diverse and complex that traditional analysis procedures are often inadequate. The knowledge and skills of the statistician, applied mathematician, and algorithmic computer scientist must be incorporated into programs that currently emphasize engineering and physical science. NASA's culture and administrative mechanisms take full cognizance that major advances in space science are driven by improvements in instrumentation. But it is less well recognized that new instruments and science questions give rise to new challenges in the treatment of satellite data after they are telemetered to the ground. These issues can be divided into two stages: data reduction through software pipelines developed within NASA mission centers, and science analysis performed by hundreds of space scientists dispersed through NASA, U.S. universities, and abroad. Both stages benefit from the latest statistical and computational methods; in some cases, the science result is completely inaccessible using traditional procedures. This paper will review the current state of NASA and present example applications using modern methodologies.
geneLAB: Expanding the Impact of NASA's Biological Research in Space
NASA Technical Reports Server (NTRS)
Rayl, Nicole; Smith, Jeffrey D.
2014-01-01
The geneLAB project is designed to leverage the value of large 'omics' datasets from molecular biology projects conducted on the ISS by making these datasets available, citable, discoverable, interpretable, reusable, and reproducible. geneLAB will create a collaboration space with an integrated set of tools for depositing, accessing, analyzing, and modeling these diverse datasets from spaceflight and related terrestrial studies.
Status Update on the GPM Ground Validation Iowa Flood Studies (IFloodS) Field Experiment
NASA Astrophysics Data System (ADS)
Petersen, Walt; Krajewski, Witold
2013-04-01
The overarching objective of integrated hydrologic ground validation activities supporting the Global Precipitation Measurement Mission (GPM) is to provide better understanding of the strengths and limitations of the satellite products, in the context of hydrologic applications. To this end, the GPM Ground Validation (GV) program is conducting the first of several hydrology-oriented field efforts: the Iowa Flood Studies (IFloodS) experiment. IFloodS will be conducted in the central to northeastern part of Iowa in Midwestern United States during the months of April-June, 2013. Specific science objectives and related goals for the IFloodS experiment can be summarized as follows: 1. Quantify the physical characteristics and space/time variability of rain (rates, DSD, process/"regime") and map to satellite rainfall retrieval uncertainty. 2. Assess satellite rainfall retrieval uncertainties at instantaneous to daily time scales and evaluate propagation/impact of uncertainty in flood-prediction. 3. Assess hydrologic predictive skill as a function of space/time scales, basin morphology, and land use/cover. 4. Discern the relative roles of rainfall quantities such as rate and accumulation as compared to other factors (e.g. transport of water in the drainage network) in flood genesis. 5. Refine approaches to "integrated hydrologic GV" concept based on IFloodS experiences and apply to future GPM Integrated GV field efforts. These objectives will be achieved via the deployment of the NASA NPOL S-band and D3R Ka/Ku-band dual-polarimetric radars, University of Iowa X-band dual-polarimetric radars, a large network of paired rain gauge platforms with attendant soil moisture and temperature probes, a large network of both 2D Video and Parsivel disdrometers, and USDA-ARS gauge and soil-moisture measurements (in collaboration with the NASA SMAP mission). The aforementioned measurements will be used to complement existing operational WSR-88D S-band polarimetric radar measurements, USGS streamflow, and Iowa Flood Center stream monitoring measurements. Coincident satellite datasets will be archived from current microwave imaging and sounding radiometers flying on NOAA, DMSP, NASA, and EU (METOP) low-earth orbiters, and rapid-scanned IR datasets collected from geostationary (GOES) platforms. Collectively the observational assets will provide a means to create high quality (time and space sampling) ground "reference" rainfall and stream flow datasets. The ground reference radar and rainfall datasets will provide a means to assess uncertainties in both satellite algorithms (physics) and products. Subsequently, the impact of uncertainties in the satellite products can be evaluated in coupled weather, land-surface and distributed hydrologic modeling frameworks as related to flood prediction.
NASA Astrophysics Data System (ADS)
Loring, B.; Karimabadi, H.; Rortershteyn, V.
2015-10-01
The surface line integral convolution (LIC) visualization technique produces a dense visualization of vector fields on arbitrary surfaces. We present a screen space surface LIC algorithm for use in distributed memory data parallel sort last rendering infrastructures. The motivations for our work are to support analysis of datasets that are too large to fit in the main memory of a single computer and compatibility with prevalent parallel scientific visualization tools such as ParaView and VisIt. By working in screen space using OpenGL we can leverage the computational power of GPUs when they are available and run without them when they are not. We address efficiency and performance issues that arise from the transformation of data from physical to screen space by selecting an alternate screen space domain decomposition. We analyze the algorithm's scaling behavior with and without GPUs on two high performance computing systems using data from turbulent plasma simulations.
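For orientation, a minimal two-dimensional LIC kernel is sketched below: white noise is convolved along streamlines of a vector field. The full screen-space surface LIC adds surface projection, depth testing, and sort-last parallel compositing, none of which this toy attempts.

```python
import numpy as np

def lic(vx, vy, n_steps=20, h=0.5, seed=0):
    """Convolve white noise along field lines; a box filter of 2*n_steps taps."""
    ny, nx = vx.shape
    noise = np.random.default_rng(seed).random((ny, nx))
    out = np.zeros((ny, nx))
    for sign in (+1.0, -1.0):                     # integrate both directions
        x, y = np.meshgrid(np.arange(nx, dtype=float),
                           np.arange(ny, dtype=float))
        for _ in range(n_steps):
            i, j = y.astype(int) % ny, x.astype(int) % nx
            out += noise[i, j]                    # sample noise along the path
            norm = np.hypot(vx[i, j], vy[i, j]) + 1e-12
            x += sign * h * vx[i, j] / norm       # unit step along the field
            y += sign * h * vy[i, j] / norm
    return out / (2 * n_steps)

yy, xx = np.mgrid[0:128, 0:128]
img = lic(-(yy - 64.0), xx - 64.0)                # circulating test field
print(img.shape, round(float(img.min()), 2), round(float(img.max()), 2))
```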
DOE Office of Scientific and Technical Information (OSTI.GOV)
Loring, Burlen; Karimabadi, Homa; Rortershteyn, Vadim
2014-07-01
The surface line integral convolution (LIC) visualization technique produces a dense visualization of vector fields on arbitrary surfaces. We present a screen space surface LIC algorithm for use in distributed memory data parallel sort last rendering infrastructures. The motivations for our work are to support analysis of datasets that are too large to fit in the main memory of a single computer and compatibility with prevalent parallel scientific visualization tools such as ParaView and VisIt. By working in screen space using OpenGL we can leverage the computational power of GPUs when they are available and run without them when they are not. We address efficiency and performance issues that arise from the transformation of data from physical to screen space by selecting an alternate screen space domain decomposition. We analyze the algorithm's scaling behavior with and without GPUs on two high performance computing systems using data from turbulent plasma simulations.
Multisource Estimation of Long-term Global Terrestrial Surface Radiation
NASA Astrophysics Data System (ADS)
Peng, L.; Sheffield, J.
2017-12-01
Land surface net radiation is the essential energy source at the earth's surface. It determines the surface energy budget and its partitioning, drives the hydrological cycle by providing available energy, and supplies heat, light, and energy for biological processes. Individual components of net radiation have changed historically due to natural and anthropogenic climate change and land use change. Decadal variations in radiation, such as global dimming or brightening, have important implications for the hydrological and carbon cycles. In order to assess the trends and variability of net radiation and evapotranspiration, there is a need for accurate estimates of long-term terrestrial surface radiation. While large progress has been made in measuring the top-of-atmosphere energy budget, huge discrepancies exist among ground observations, satellite retrievals, and reanalysis fields of surface radiation, due to the sparseness of observational networks, the difficulty of measuring from space, and uncertainty in algorithm parameters. To overcome the weaknesses of single-source datasets, we propose a multi-source merging approach that fully utilizes and combines multiple datasets of the individual radiation components, as they are complementary in space and time. First, we conduct a diagnostic analysis of multiple satellite and reanalysis datasets based on in-situ measurements such as the Global Energy Balance Archive (GEBA), existing validation studies, and other information such as network density and consistency with other meteorological variables. Then, we calculate the optimal weighted average of the multiple datasets by minimizing the variance of the error between in-situ measurements and the other observations. Finally, we quantify the uncertainties in the estimates of surface net radiation and employ physical constraints based on the surface energy balance to reduce these uncertainties. The final dataset is evaluated in terms of its long-term variability and the attribution of that variability to changes in individual components. The goal of this study is to provide a merged observational benchmark for large-scale diagnostic analyses, remote sensing, and land surface modeling.
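The merging step can be made concrete with a small sketch on synthetic numbers: with independent errors, weighting each product by its inverse error variance minimizes the variance of the merged error, which is the criterion described above.

```python
import numpy as np

rng = np.random.default_rng(0)
truth = 150 + 20 * rng.standard_normal(1000)       # "in-situ" net radiation, W/m^2
products = [truth + s * rng.standard_normal(1000)  # satellite/reanalysis products
            for s in (5.0, 12.0, 8.0)]             # differing error levels

err_var = np.array([np.var(p - truth) for p in products])
w = (1 / err_var) / (1 / err_var).sum()            # inverse-variance weights
merged = sum(wi * p for wi, p in zip(w, products))
rmse = float(np.sqrt(np.mean((merged - truth) ** 2)))
print("weights:", w.round(3), " merged RMSE:", round(rmse, 2))
```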
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, Song
CFD (Computational Fluid Dynamics) is a widely used technique in the engineering design field. It uses mathematical methods to simulate and predict flow characteristics in a certain physical space. Since the numerical result of a CFD computation is very hard to understand, VR (virtual reality) and data visualization techniques are introduced into CFD post-processing to improve the understandability and functionality of CFD computation. In many cases CFD datasets are very large (multi-gigabyte), and more and more interaction between the user and the dataset is required. For traditional VR applications, limited computing power is a major factor preventing effective visualization of large datasets. This thesis presents a new system designed to speed up the traditional VR application by using parallel and distributed computing, along with the idea of using a handheld device to enhance the interaction between a user and a VR CFD application. Techniques from different research areas, including scientific visualization, parallel computing, distributed computing, and graphical user interface design, are used in the development of the final system. As a result, the new system can be flexibly built on a heterogeneous computing environment and dramatically shortens computation time.
Metadata improvements driving new tools and services at a NASA data center
NASA Astrophysics Data System (ADS)
Moroni, D. F.; Hausman, J.; Foti, G.; Armstrong, E. M.
2011-12-01
The NASA Physical Oceanography DAAC (PO.DAAC) is responsible for distributing and maintaining satellite-derived oceanographic data from a number of NASA and non-NASA missions for the physical disciplines of ocean winds, sea surface temperature, ocean topography and gravity. Currently its holdings consist of over 600 datasets with a data archive in excess of 200 terabytes. The PO.DAAC has recently embarked on a metadata quality and completeness project to migrate, update and improve metadata records for over 300 public datasets. An interactive database management tool has been developed to allow data scientists to enter, update and maintain metadata records. This tool communicates directly with PO.DAAC's Data Management and Archiving System (DMAS), which serves as the new archival and distribution backbone as well as a permanent repository of dataset and granule-level metadata. Although we will briefly discuss the tool, the more important ramifications are the ability to now expose, propagate and leverage the metadata in a number of ways. First, the metadata are exposed directly through a faceted and free-text search interface from Drupal-based PO.DAAC web pages, allowing for quick browsing and data discovery, especially by "drilling" through the various facet levels that organize datasets by time/space resolution, processing level, sensor, measurement type, etc. Furthermore, the metadata can now be exposed through web services to produce metadata records in a number of different formats such as FGDC and ISO 19115, or potentially propagated to visualization and subsetting tools, and other discovery interfaces. The fundamental concept is that the metadata forms the essential bridge between the user and the tool or discovery mechanism for a broad range of ocean earth science data records.
An MCMC determination of the primordial helium abundance
NASA Astrophysics Data System (ADS)
Aver, Erik; Olive, Keith A.; Skillman, Evan D.
2012-04-01
Spectroscopic observations of the chemical abundances in metal-poor H II regions provide an independent method for estimating the primordial helium abundance. H II regions are described by several physical parameters such as electron density, electron temperature, and reddening, in addition to y, the ratio of helium to hydrogen. It has been customary to estimate or determine these parameters self-consistently in calculating y. Frequentist analyses of the parameter space have been shown to be successful in these parameter determinations, and Markov Chain Monte Carlo (MCMC) techniques have proven to be very efficient in sampling this parameter space. Nevertheless, accurate determination of the primordial helium abundance from observations of H II regions is constrained by both systematic and statistical uncertainties. In an attempt to better reduce the latter, and continue to better characterize the former, we apply MCMC methods to the large dataset recently compiled by Izotov, Thuan, & Stasińska (2007). To improve the reliability of the determination, a high-quality dataset is needed; in pursuit of this, a variety of cuts are explored. The efficacy of the He I λ4026 emission line as a constraint on the solutions is first examined, revealing the introduction of systematic bias through its absence. As a clear measure of the quality of the physical solution, a χ2 analysis proves instrumental in the selection of data compatible with the theoretical model. Nearly two-thirds of the observations fall outside a standard 95% confidence level cut, which highlights the care necessary in selecting systems and warrants further investigation into potential deficiencies of the model or data. In addition, the method also allows us to exclude systems for which parameter estimations are statistical outliers. As a result, the final selected dataset gains in reliability and exhibits improved consistency. Regression to zero metallicity yields Yp = 0.2534 ± 0.0083, in broad agreement with the WMAP result. The inclusion of more observations shows promise for further reducing the uncertainty, but more high-quality spectra are required.
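As a rough illustration of the MCMC machinery, the toy random-walk Metropolis sampler below explores a two-parameter chi-squared posterior; the model and data are invented for illustration and do not reproduce the paper's physical parameterization:

```python
import numpy as np

rng = np.random.default_rng(0)

def chi2(theta, data, model, sigma):
    """Standard chi-squared misfit between observations and model."""
    return np.sum(((data - model(theta)) / sigma) ** 2)

def metropolis(logpost, theta0, steps=20000, scale=0.05):
    """Random-walk Metropolis sampler (a toy stand-in for the paper's MCMC)."""
    theta = np.array(theta0, float)
    lp = logpost(theta)
    chain = []
    for _ in range(steps):
        prop = theta + scale * rng.standard_normal(theta.size)
        lp_prop = logpost(prop)
        if np.log(rng.random()) < lp_prop - lp:   # accept with prob min(1, ratio)
            theta, lp = prop, lp_prop
        chain.append(theta.copy())
    return np.array(chain)

# invented 2-parameter "H II region": an abundance ratio and a scale factor
data = np.array([0.95, 1.02, 1.10]); sigma = np.array([0.05, 0.05, 0.05])
model = lambda th: th[0] * np.array([1.0, 1.05, 1.15]) * th[1]
logpost = lambda th: -0.5 * chi2(th, data, model, sigma)
chain = metropolis(logpost, [1.0, 1.0])
print(chain[10000:].mean(axis=0))   # posterior means after burn-in
```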
Transductive multi-view zero-shot learning.
Fu, Yanwei; Hospedales, Timothy M; Xiang, Tao; Gong, Shaogang
2015-11-01
Most existing zero-shot learning approaches exploit transfer learning via an intermediate semantic representation shared between an annotated auxiliary dataset and a target dataset with different classes and no annotation. A projection from a low-level feature space to the semantic representation space is learned from the auxiliary dataset and applied without adaptation to the target dataset. In this paper we identify two inherent limitations with these approaches. First, due to having disjoint and potentially unrelated classes, the projection functions learned from the auxiliary dataset/domain are biased when applied directly to the target dataset/domain. We call this problem the projection domain shift problem and propose a novel framework, transductive multi-view embedding, to solve it. The second limitation is the prototype sparsity problem which refers to the fact that for each target class, only a single prototype is available for zero-shot learning given a semantic representation. To overcome this problem, a novel heterogeneous multi-view hypergraph label propagation method is formulated for zero-shot learning in the transductive embedding space. It effectively exploits the complementary information offered by different semantic representations and takes advantage of the manifold structures of multiple representation spaces in a coherent manner. We demonstrate through extensive experiments that the proposed approach (1) rectifies the projection shift between the auxiliary and target domains, (2) exploits the complementarity of multiple semantic representations, (3) significantly outperforms existing methods for both zero-shot and N-shot recognition on three image and video benchmark datasets, and (4) enables novel cross-view annotation tasks.
Solar Irradiance Data Products at the LASP Interactive Solar IRradiance Datacenter (LISIRD)
NASA Astrophysics Data System (ADS)
Lindholm, D. M.; Ware DeWolfe, A.; Wilson, A.; Pankratz, C. K.; Snow, M. A.; Woods, T. N.
2011-12-01
The Laboratory for Atmospheric and Space Physics (LASP) has developed the LASP Interactive Solar IRradiance Datacenter (LISIRD, http://lasp.colorado.edu/lisird/) web site to provide access to a comprehensive set of solar irradiance measurements and related datasets. Current data holdings include products from NASA missions SORCE, UARS, SME, and TIMED-SEE. The data provided covers a wavelength range from soft X-ray (XUV) at 0.1 nm up to the near infrared (NIR) at 2400 nm, as well as Total Solar Irradiance (TSI). Other datasets include solar indices, spectral and flare models, solar images, and more. The LISIRD web site features updated plotting, browsing, and download capabilities enabled by dygraphs, JavaScript, and Ajax calls to the LASP Time Series Server (LaTiS). In addition to the web browser interface, most of the LISIRD datasets can be accessed via the LaTiS web service interface that supports the OPeNDAP standard. OPeNDAP clients and other programming APIs are available for making requests that subset, aggregate, or filter data on the server before it is transported to the user. This poster provides an overview of the LISIRD system, summarizes the datasets currently available, and provides details on how to access solar irradiance data products through LISIRD's interfaces.
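As an illustration of the server-side subsetting idea described above, the sketch below requests a time-bounded CSV slice from a LaTiS-style endpoint using only the Python standard library. The base URL layout and the dataset identifier are illustrative assumptions, not a documented LISIRD contract; consult the LISIRD site for actual dataset names:

```python
# Hedged sketch: fetch a server-side-subsetted CSV slice from a LaTiS-style
# endpoint. BASE and DATASET below are assumptions for illustration only.
from urllib.parse import quote
from urllib.request import urlopen

BASE = "https://lasp.colorado.edu/lisird/latis/dap"   # assumed endpoint layout
DATASET = "sorce_tsi_24hr_l3"                         # hypothetical identifier

# project two variables and bound the time range; the server does the subsetting
constraints = "time,tsi&time>=2010-01-01&time<2010-02-01"
url = f"{BASE}/{DATASET}.csv?{quote(constraints, safe='&=,')}"

with urlopen(url) as resp:
    print(resp.read().decode()[:300])                 # first few CSV rows
```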
Integrating Satellite, Radar and Surface Observation with Time and Space Matching
NASA Astrophysics Data System (ADS)
Ho, Y.; Weber, J.
2015-12-01
The Integrated Data Viewer (IDV) from Unidata is a Java™-based software framework for analyzing and visualizing geoscience data. It brings together the ability to display and work with satellite imagery, gridded data, surface observations, balloon soundings, NWS WSR-88D Level II and Level III RADAR data, and NOAA National Profiler Network data, all within a unified interface. Applying time and space matching to the satellite, radar and surface observation datasets automatically synchronizes the display from different data sources and spatially subsets them to match the display area in the view window. These features allow IDV users to effectively integrate these observations and provide three-dimensional views of the weather system to better understand the underlying dynamics and physics of weather phenomena.
Dataset of anomalies and malicious acts in a cyber-physical subsystem.
Laso, Pedro Merino; Brosset, David; Puentes, John
2017-10-01
This article presents a dataset produced to investigate how data and information quality estimations enable the detection of anomalies and malicious acts in cyber-physical systems. Data were acquired using a cyber-physical subsystem consisting of liquid containers for fuel or water, along with its automated control and data acquisition infrastructure. The described data consist of temporal series representing five operational scenarios - normal, anomalies, breakdown, sabotage, and cyber-attacks - corresponding to 15 different real situations. The dataset is publicly available in the .zip file published with the article, to investigate and compare faulty operation detection and characterization methods for cyber-physical systems.
NASA Astrophysics Data System (ADS)
Wilson, B. D.; Manipon, G.; Hua, H.; Fetzer, E.
2011-12-01
Under several NASA grants, we are generating multi-sensor merged atmospheric datasets to enable the detection of instrument biases and studies of climate trends over decades of data. For example, under a NASA MEASURES grant we are producing a water vapor climatology from the A-Train instruments, stratified by the Cloudsat cloud classification for each geophysical scene. The generation and proper use of such multi-sensor climate data records (CDRs) requires a high level of openness, transparency, and traceability. To make the datasets self-documenting and provide access to full metadata and traceability, we have implemented a set of capabilities and services using known, interoperable protocols. These protocols include OpenSearch, OPeNDAP, Open Provenance Model, service & data casting technologies using Atom feeds, and REST-callable analysis workflows implemented as SciFlo (XML) documents. We advocate that our approach can serve as a blueprint for how to openly "document and serve" complex, multi-sensor CDRs with full traceability. The capabilities and services provided include:
- Discovery of the collections by keyword search, exposed using the OpenSearch protocol;
- Space/time query across the CDRs' granules and all of the input datasets via OpenSearch;
- User-level configuration of the production workflows so that scientists can select additional physical variables from the A-Train to add to the next iteration of the merged datasets;
- Efficient data merging using on-the-fly OPeNDAP variable slicing & spatial subsetting of data out of input netCDF and HDF files (without moving the entire files);
- Self-documenting CDRs published in a highly usable netCDF4 format with groups used to organize the variables, CF-style attributes for each variable, numeric array compression, and links to OPM provenance;
- Recording of processing provenance and data lineage into a queryable provenance trail in Open Provenance Model (OPM) format, auto-captured by the workflow engine;
- Open publishing of all of the workflows used to generate products as machine-callable REST web services, using the capabilities of the SciFlo workflow engine;
- Advertising of the metadata (e.g. physical variables provided, space/time bounding box, etc.) for our prepared datasets as "datacasts" using the Atom feed format;
- Publishing of all datasets via our "DataDrop" service, which exploits the WebDAV protocol to enable scientists to access remote data directories as local files on their laptops;
- Rich "web browse" of the CDRs with full metadata and the provenance trail one click away;
- Advertising of all services as Google-discoverable "service casts" using the Atom format.
The presentation will describe our use of the interoperable protocols and demonstrate the capabilities and service GUIs.
EnviroAtlas - Austin, TX - Green Space Proximity Gradient
In any given 1-square meter point in this EnviroAtlas dataset, the value shown gives the percentage of square meters of greenspace within 1/4 square kilometer centered over the given point. Green space is defined as Trees & Forest, Grass & Herbaceous, and Agriculture. Water is shown as -99999 in this dataset to distinguish it from land areas with very low green space. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).
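The proximity gradient described above is, in effect, a focal percentage: for every 1 m cell, the share of green cells within a 500 m x 500 m (0.25 km^2) window centered on it. A minimal sketch of that computation (a plausible reading of the metric, not EPA's production code):

```python
import numpy as np
from scipy.ndimage import uniform_filter

# percentage of green space inside a 500 m x 500 m (0.25 km^2) moving window
# over a 1 m binary land-cover mask; water cells get the -99999 sentinel
green = (np.random.rand(2000, 2000) > 0.6).astype(float)   # 1 = greenspace
water = np.zeros_like(green, dtype=bool)                    # toy water mask

pct_green = 100.0 * uniform_filter(green, size=500, mode="nearest")
pct_green[water] = -99999                                   # flag water cells
print(pct_green[1000, 1000])
```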
EnviroAtlas - New York, NY - Green Space Proximity Gradient
In any given 1-square meter point in this EnviroAtlas dataset, the value shown gives the percentage of square meters of greenspace within 1/4 square kilometer centered over the given point. In this community, green space is defined as Trees & Forest and Grass & Herbaceous. Water is shown as -99999 in this dataset to distinguish it from land areas with very low green space. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).
EnviroAtlas - Des Moines, IA - Green Space Proximity Gradient
In any given 1-square meter point in this EnviroAtlas dataset, the value shown gives the percentage of square meters of greenspace within 1/4 square kilometer centered over the given point. Green space is defined as Trees & Forest, Grass & Herbaceous, and Agriculture. Water is shown as -99999 in this dataset to distinguish it from land areas with very low green space. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://enviroatlas.epa.gov/EnviroAtlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).
EnviroAtlas - Cleveland, OH - Green Space Proximity Gradient
In any given 1-square meter point in this EnviroAtlas dataset, the value shown gives the percentage of square meters of greenspace within 1/4 square kilometer centered over the given point. In this community, green space is defined as Trees & Forest, Grass & Herbaceous, Woody Wetlands, and Emergent Wetlands. Water is shown as -99999 in this dataset to distinguish it from land areas with very low green space. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).
EnviroAtlas - Memphis, TN - Green Space Proximity Gradient
In any given 1-square meter point in this EnviroAtlas dataset, the value shown gives the percentage of square meters of greenspace within 1/4 square kilometer centered over the given point. Green space is defined as Trees & Forest, Grass & Herbaceous, Agriculture, Woody Wetlands, and Emergent Wetlands. Water is shown as -99999 in this dataset to distinguish it from land areas with very low green space. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).
EnviroAtlas - Durham, NC - Land Cover Summaries by Block Group
This EnviroAtlas dataset describes the percentage of each block group that is classified as impervious, forest, green space, wetland, and agriculture. Impervious is a combination of dark and light impervious. Green space is a combination of trees and forest and grass and herbaceous. This dataset also includes the area per capita for each block group for impervious, forest, and green space land cover. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).
NASA Technical Reports Server (NTRS)
Gardner, Adrian
2010-01-01
National Aeronautics and Space Administration (NASA) weather and atmospheric environmental organizations are insatiable consumers of geophysical, hydrometeorological and solar weather statistics. The expanding array of internetworked sensors producing targeted physical measurements has generated an almost factorial explosion of near real-time inputs to topical statistical datasets. Normalizing and value-based parsing of such statistical datasets in support of time-constrained weather and environmental alerts and warnings is essential, even with dedicated high-performance computational capabilities. What are the optimal indicators for advanced decision making? How do we recognize the line between sufficient statistical sampling and excessive, mission-destructive sampling? How do we assure that the normalization and parsing process, when interpolated through numerical models, yields accurate and actionable alerts and warnings? This presentation will address the integrated means and methods to achieve the desired outputs for NASA and consumers of its data.
Preprocessed Consortium for Neuropsychiatric Phenomics dataset.
Gorgolewski, Krzysztof J; Durnez, Joke; Poldrack, Russell A
2017-01-01
Here we present preprocessed MRI data of 265 participants from the Consortium for Neuropsychiatric Phenomics (CNP) dataset. The preprocessed dataset includes minimally preprocessed data in the native, MNI and surface spaces, accompanied by potential confound regressors, tissue probability masks, brain masks and transformations. In addition, the preprocessed dataset includes unthresholded group-level and single-subject statistical maps from all tasks included in the original dataset. We hope that the availability of this dataset will greatly accelerate research.
A Semantically Enabled Metadata Repository for Solar Irradiance Data Products
NASA Astrophysics Data System (ADS)
Wilson, A.; Cox, M.; Lindholm, D. M.; Nadiadi, I.; Traver, T.
2014-12-01
The Laboratory for Atmospheric and Space Physics, LASP, has been conducting research in atmospheric and space science for over 60 years, and providing the associated data products to the public. LASP has a long history, in particular, of making space-based measurements of the solar irradiance, which serves as crucial input to several areas of scientific research, including solar-terrestrial interactions, atmospheric science, and climate studies. LISIRD, the LASP Interactive Solar Irradiance Data Center, serves these datasets to the public, including solar spectral irradiance (SSI) and total solar irradiance (TSI) data. The LASP extended metadata repository, LEMR, is a database of information about the datasets served by LASP, such as parameters, uncertainties, temporal and spectral ranges, current version, alerts, etc. It serves as the definitive, single source of truth for that information. The database is populated with information garnered via web forms and automated processes. Dataset owners keep the information current and verified for datasets under their purview. This information can be pulled dynamically for many purposes. Web sites such as LISIRD can include this information in web page content as it is rendered, ensuring users get current, accurate information. It can also be pulled to create metadata records in various metadata formats, such as SPASE (for heliophysics) and ISO 19115, or potentially propagated to visualization and subsetting tools, and other discovery interfaces. Once these records are made available to the appropriate registries, our data will be discoverable by users coming in via those organizations. The database is implemented as an RDF triplestore, a collection of subject-predicate-object data entities identifiable with URIs. This capability, coupled with SPARQL over HTTP read access, enables semantic queries over the repository contents. To create the repository we leveraged VIVO, an open source semantic web application, to manage and create new ontologies and populate repository content. A variety of ontologies were used in creating the triplestore, including ontologies that came with VIVO such as FOAF. Also, the W3C DCAT ontology was integrated and extended to describe properties of our data products that we needed to capture, such as spectral range. The presentation will describe the architecture, ontology issues, and tools used to create LEMR and plans for its evolution.
Comparative study of standard space and real space analysis of quantitative MR brain data.
Aribisala, Benjamin S; He, Jiabao; Blamire, Andrew M
2011-06-01
To compare the robustness of region of interest (ROI) analysis of magnetic resonance imaging (MRI) brain data in real space with analysis in standard space and to test the hypothesis that standard space image analysis introduces more partial volume effect errors compared to analysis of the same dataset in real space. Twenty healthy adults with no history or evidence of neurological diseases were recruited; high-resolution T1-weighted, quantitative T1, and B0 field-map measurements were collected. Algorithms were implemented to perform analysis in real and standard space and used to apply a simple standard ROI template to quantitative T1 datasets. Regional relaxation values and histograms for both gray and white matter tissue classes were then extracted and compared. Regional mean T1 values for both gray and white matter were significantly lower using real space compared to standard space analysis. Additionally, regional T1 histograms were more compact in real space, with smaller right-sided tails indicating lower partial volume errors compared to standard space analysis. Standard space analysis of quantitative MRI brain data introduces more partial volume effect errors, biasing the analysis of quantitative data compared to analysis of the same dataset in real space. Copyright © 2011 Wiley-Liss, Inc.
Emulation: A fast stochastic Bayesian method to eliminate model space
NASA Astrophysics Data System (ADS)
Roberts, Alan; Hobbs, Richard; Goldstein, Michael
2010-05-01
Joint inversion of large 3D datasets has been the goal of geophysicists ever since such datasets first started to be produced. There are two broad approaches to this kind of problem: traditional deterministic inversion schemes and more recently developed Bayesian search methods, such as MCMC (Markov Chain Monte Carlo). However, both kinds of scheme have proved prohibitively expensive, in both computing power and time, due to the normally very large model space which needs to be searched using forward model simulators that take considerable time to run. At the heart of strategies aimed at accomplishing this kind of inversion is the question of how to reliably and practicably reduce the size of the model space in which the inversion is to be carried out. Here we present a practical Bayesian method, known as emulation, which can address this issue. Emulation is a Bayesian technique used with considerable success in a number of technical fields, such as astronomy, where the evolution of the universe has been modelled using this technique, and the petroleum industry, where history matching of hydrocarbon reservoirs is carried out. The method of emulation involves building a fast-to-compute, uncertainty-calibrated approximation to a forward model simulator. We do this by modelling the output data from a number of forward simulator runs with a computationally cheap function, and then fitting the coefficients defining this function to the model parameters. By calibrating the error of the emulator output with respect to the full simulator output, we can use the emulator to screen out large areas of model space which contain only implausible models. For example, starting with what may be considered a geologically reasonable prior model space of 10000 models, using the emulator we can quickly show that only models which lie within 10% of that model space actually produce output data which is plausibly similar in character to an observed dataset. We can thus much more tightly constrain the input model space for a deterministic inversion or MCMC method. By using this technique jointly on several datasets (specifically seismic, gravity, and magnetotelluric (MT) data describing the same region), we can include in our modelling the uncertainties in the data measurements and in the relationships between the various physical parameters involved, as well as the model representation uncertainty, and at the same time further reduce the range of plausible models to several percent of the original model space. Being stochastic in nature, the output posterior parameter distributions also allow our understanding of, and beliefs about, a geological region to be objectively updated, with full assessment of uncertainties, and so the emulator is also an inversion-type tool in its own right, with the advantage (as with any Bayesian method) that our uncertainties from all sources (both data and model) can be fully evaluated.
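A toy sketch of the emulation-and-screening idea in the style of history matching (not the authors' implementation): fit a cheap surrogate to a few expensive simulator runs, calibrate its error against those runs, and discard candidate models whose implausibility exceeds a 3-sigma cutoff:

```python
import numpy as np

rng = np.random.default_rng(1)

def simulator(m):
    """Stand-in for an expensive forward model (e.g. a seismic response)."""
    return np.sin(3 * m) + 0.5 * m

# 1) a handful of expensive design runs
design = np.linspace(0, 2, 15)
runs = simulator(design)

# 2) cheap emulator: low-order polynomial fit, with a crude error calibration
coef = np.polyfit(design, runs, deg=5)
emulate = lambda m: np.polyval(coef, m)
emul_var = np.var(runs - emulate(design))        # emulator error estimate

# 3) screen a large candidate model space against observed data z
z, obs_var = simulator(1.37), 0.01 ** 2
candidates = rng.uniform(0, 2, 100_000)
implaus = np.abs(z - emulate(candidates)) / np.sqrt(emul_var + obs_var)
plausible = candidates[implaus < 3.0]            # 3-sigma implausibility cutoff
print(f"{100 * plausible.size / candidates.size:.1f}% of model space retained")
```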
NASA Technical Reports Server (NTRS)
Srivastava, V.; Rothermel, J.; Jarzembski, M. A.; Clarke, A. D.; Cutten, D. R.; Bowdle, D. A.; Spinhirne, J. D.; Menzies, R. T.
1999-01-01
Space-based and airborne coherent Doppler lidars designed for measuring global tropospheric wind profiles in cloud-free air rely on backscatter, beta, from aerosols acting as passive wind tracers. The vertical distribution of aerosol beta can vary over as much as 5-6 orders of magnitude. Thus, the design of a wavelength-specific, space-borne or airborne lidar must account for the magnitude of beta in the region or features of interest. The SPAce Readiness Coherent Lidar Experiment (SPARCLE), under development by the National Aeronautics and Space Administration (NASA) and scheduled for launch on the Space Shuttle in 2001, will demonstrate wind measurements from space using a solid-state 2 micrometer coherent Doppler lidar. Consequently, there is a critical need to understand the variability of aerosol beta at 2.1 micrometers, to evaluate signal detection under varying aerosol loading conditions. Although few direct measurements of beta at 2.1 micrometers exist, extensive datasets, including climatologies at widely-separated locations, do exist for other wavelengths based on CO2 and Nd:YAG lidars. Datasets also exist for the associated microphysical and chemical properties. An example of a multi-parametric dataset is that of the NASA GLObal Backscatter Experiment (GLOBE) in 1990, in which aerosol chemistry and size distributions were measured concurrently with multi-wavelength lidar backscatter observations. More recently, continuous-wave (CW) lidar backscatter measurements at mid-infrared wavelengths have been made during the Multicenter Airborne Coherent Atmospheric Wind Sensor (MACAWS) experiment in 1995. Using Lorenz-Mie theory, these datasets have been used to develop a method to convert lidar backscatter to the 2.1 micrometer wavelength. This paper presents a comparison of modeled backscatter at wavelengths for which backscatter measurements exist, including converted beta at 2.1 micrometers.
Boosting association rule mining in large datasets via Gibbs sampling.
Qian, Guoqi; Rao, Calyampudi Radhakrishna; Sun, Xiaoying; Wu, Yuehua
2016-05-03
Current algorithms for association rule mining from transaction data are mostly deterministic and enumerative. They can be computationally intractable even for mining a dataset containing just a few hundred transaction items, if no action is taken to constrain the search space. In this paper, we develop a Gibbs-sampling-induced stochastic search procedure to randomly sample association rules from the itemset space, and perform rule mining from the reduced transaction dataset generated by the sample. Also, a general rule importance measure is proposed to direct the stochastic search, so that, as a result of the randomly generated association rules constituting an ergodic Markov chain, the overall most important rules in the itemset space can be uncovered from the reduced dataset with probability 1 in the limit. In a simulation study and a real genomic data example, we show how to boost association rule mining by an integrated use of the stochastic search and the Apriori algorithm.
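A simplified sketch of the stochastic search idea (a single-flip Metropolis sampler over item-inclusion vectors, with raw support standing in for the paper's rule importance measure, which is not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(2)

# transactions: binary matrix, rows = transactions, columns = items
T = (rng.random((500, 40)) < 0.15).astype(np.uint8)

def support(mask):
    """Fraction of transactions containing every item in the itemset."""
    if not mask.any():
        return 0.0
    return float(np.all(T[:, mask] == 1, axis=1).mean())

mask = np.zeros(T.shape[1], dtype=bool)
mask[rng.integers(T.shape[1])] = True        # start from a random single item
score = support(mask)
samples = []
for _ in range(5000):
    j = rng.integers(T.shape[1])             # propose flipping one item in/out
    mask[j] ^= True
    new = support(mask)
    # accept in proportion to the importance score (here: raw support)
    if new > 0 and rng.random() < min(1.0, new / max(score, 1e-12)):
        score = new
    else:
        mask[j] ^= True                      # reject: undo the flip
    samples.append(frozenset(np.flatnonzero(mask)))
# frequently sampled itemsets define a reduced search space for Apriori
```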
NASA Astrophysics Data System (ADS)
Varouchakis, Emmanouil; Hristopulos, Dionissios
2015-04-01
Space-time geostatistical approaches can improve the reliability of dynamic groundwater level models in areas with limited spatial and temporal data. Space-time residual Kriging (STRK) is a reliable method for spatiotemporal interpolation that can incorporate auxiliary information. The method usually leads to an underestimation of the prediction uncertainty. The uncertainty of spatiotemporal models is usually estimated by determining the space-time Kriging variance or by means of cross-validation analysis. For de-trended data the former is not usually applied when complex spatiotemporal trend functions are assigned. A Bayesian approach based on the bootstrap idea and sequential Gaussian simulation are employed to determine the uncertainty of the spatiotemporal model (trend and covariance) parameters. These stochastic modelling approaches produce multiple realizations, rank the prediction results on the basis of specified criteria and capture the range of the uncertainty. The correlation of the spatiotemporal residuals is modeled using a non-separable space-time variogram based on the Spartan covariance family (Hristopulos and Elogne 2007, Varouchakis and Hristopulos 2013). We apply these simulation methods to investigate the uncertainty of groundwater level variations. The available dataset consists of bi-annual (dry and wet hydrological period) groundwater level measurements at 15 monitoring locations for the period 1981 to 2010. The space-time trend function is approximated using a physical law that governs groundwater flow in the aquifer in the presence of pumping. The main objective of this research is to compare the performance of two simulation methods for prediction uncertainty estimation. In addition, we investigate the performance of the Spartan spatiotemporal covariance function for spatiotemporal geostatistical analysis. Hristopulos, D.T. and Elogne, S.N. 2007. Analytic properties and covariance functions for a new class of generalized Gibbs random fields. IEEE Transactions on Information Theory, 53:4667-4679. Varouchakis, E.A. and Hristopulos, D.T. 2013. Improvement of groundwater level prediction in sparsely gauged basins using physical laws and local geographic features as auxiliary variables. Advances in Water Resources, 52:34-49. Research supported by the project SPARTA 1591: "Development of Space-Time Random Fields based on Local Interaction Models and Applications in the Processing of Spatiotemporal Datasets". "SPARTA" is implemented under the "ARISTEIA" Action of the operational programme Education and Lifelong Learning and is co-funded by the European Social Fund (ESF) and National Resources.
Towards a National Space Weather Predictive Capability
NASA Astrophysics Data System (ADS)
Fox, N. J.; Lindstrom, K. L.; Ryschkewitsch, M. G.; Anderson, B. J.; Gjerloev, J. W.; Merkin, V. G.; Kelly, M. A.; Miller, E. S.; Sitnov, M. I.; Ukhorskiy, A. Y.; Erlandson, R. E.; Barnes, R. J.; Paxton, L. J.; Sotirelis, T.; Stephens, G.; Comberiate, J.
2014-12-01
National needs in the area of space weather informational and predictive tools are growing rapidly. Adverse conditions in the space environment can cause disruption of satellite operations, communications, navigation, and electric power distribution grids, leading to a variety of socio-economic losses and impacts on our security. Future space exploration and most modern human endeavors will require major advances in physical understanding and improved transition of space research to operations. At present, only a small fraction of the latest research and development results from NASA, NOAA, NSF and DoD investments are being used to improve space weather forecasting and to develop operational tools. The power of modern research and space weather model development needs to be better utilized to enable comprehensive, timely, and accurate operational space weather tools. The mere production of space weather information is not sufficient to address the needs of those who are affected by space weather. A coordinated effort is needed to support research-to-applications transition and to develop the tools required by those who rely on this information. In this presentation we will review datasets, tools and models that have resulted from research by scientists at JHU/APL, and examine how they could be applied to support space weather applications in coordination with other community assets and capabilities.
EnviroAtlas - Portland, ME - Land Cover by Block Group
This EnviroAtlas dataset describes the percentage of each block group that is classified as impervious, forest, green space, wetland, and agriculture. Impervious is a combination of dark and light impervious. Forest is a combination of trees and forest and woody wetlands. Green space is a combination of trees and forest, grass and herbaceous, agriculture, woody wetlands, and emergent wetlands. Wetlands includes both Woody and Emergent Wetlands. This dataset also includes the area per capita for each block group for impervious, forest, and green space land cover. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).
NASA Astrophysics Data System (ADS)
Karlsson, K.
2010-12-01
The EUMETSAT CMSAF project (www.cmsaf.eu) compiles climatological datasets from various satellite sources with emphasis on the use of EUMETSAT-operated satellites. However, since climate monitoring primarily has a global scope, datasets merging data from various satellites and satellite operators are also prepared. One such dataset is the CMSAF historic GAC (Global Area Coverage) dataset, which is based on AVHRR data from the full historic series of NOAA satellites and the European METOP satellite in mid-morning orbit launched in October 2006. The CMSAF GAC dataset consists of three groups of products: macroscopic cloud products (cloud amount, cloud type and cloud top), cloud physical products (cloud phase, cloud optical thickness and cloud liquid water path) and surface radiation products (including surface albedo). Results will be presented and discussed for all product groups, including some preliminary inter-comparisons with other datasets (e.g., the PATMOS-X, MODIS and CloudSat/CALIPSO datasets). A background will also be given describing the basic methodology behind the derivation of all products, including a short historical review of AVHRR cloud processing and resulting AVHRR applications at SMHI. Historic GAC processing is one of five pilot projects selected by the SCOPE-CM (Sustained Co-Ordinated Processing of Environmental Satellite data for Climate Monitoring) project organised by the WMO Space Programme. The pilot project is carried out jointly between CMSAF and NOAA with the purpose of finding an optimal GAC processing approach. The initial activity is to inter-compare results of the CMSAF GAC dataset and the NOAA PATMOS-X dataset for the case when both datasets have been derived using the same inter-calibrated AVHRR radiance dataset. The aim is to gain further knowledge of, e.g., the most useful multispectral methods and the impact of ancillary datasets (for example, meteorological reanalysis datasets from NCEP and ECMWF). The CMSAF project is currently defining plans for another five years (2012-2017) of operations and development. New GAC reprocessing efforts are planned and new methodologies will be tested. Central questions here will be how to increase the quantitative use of the products through improved error and uncertainty estimates, and how to compile the information in a way that allows meaningful and efficient use of the data for, e.g., validation of climate model information.
Evolution of the Southern Oscillation as observed by the Nimbus-7 ERB experiment
NASA Technical Reports Server (NTRS)
Ardanuy, Philip E.; Kyle, H. Lee; Chang, Hyo-Duck
1987-01-01
The Nimbus-7 satellite has been in a 955-km, sun-synchronous orbit since October 1978. The Earth Radiation Budget (ERB) experiment has taken approximately 8 years of high-quality data during this time, of which seven complete years have been archived at the National Space Science Data Center. A final reprocessing of the wide-field-of-view channel dataset is underway. Error analyses indicate a long-term stability of 1 percent or better over the length of the data record. As part of the validation of the ERB measurements, the archived 7-year Nimbus-7 ERB dataset is examined for the presence and accuracy of interannual variations, including the Southern Oscillation signal. Zonal averages of broadband outgoing longwave radiation indicate a terrestrial response of more than 2 years to the oceanic and atmospheric manifestations of the 1982-83 El Nino/Southern Oscillation (ENSO) event, especially in the tropics. This signal is present in monthly and seasonal averages and is shown here to derive primarily from atmospheric responses to adjustments in the Pacific Ocean. The calibration stability of this dataset thus provides a powerful new tool to examine the physics of the ENSO phenomena.
Brown, Samuel M; Wilson, Emily L; Presson, Angela P; Dinglas, Victor D; Greene, Tom; Hopkins, Ramona O; Needham, Dale M
2017-12-01
With improving short-term mortality in acute respiratory distress syndrome (ARDS), understanding survivors' posthospitalisation outcomes is increasingly important. However, little is known regarding associations among physical, cognitive and mental health outcomes. Identification of outcome subtypes may advance understanding of post-ARDS morbidities. We analysed baseline variables and 6-month health status for participants in the ARDS Network Long-Term Outcomes Study. After division into derivation and validation datasets, we used weighted network analysis to identify subtypes from predictors and outcomes in the derivation dataset. We then used recursive partitioning to develop a subtype classification rule and assessed adequacy of the classification rule using a kappa statistic with the validation dataset. Among 645 ARDS survivors, 430 were in the derivation and 215 in the validation datasets. Physical and mental health status, but not cognitive status, were closely associated. Four distinct subtypes were apparent (percentages in the derivation cohort): (1) mildly impaired physical and mental health (22% of patients), (2) moderately impaired physical and mental health (39%), (3) severely impaired physical health with moderately impaired mental health (15%) and (4) severely impaired physical and mental health (24%). The classification rule had high agreement (kappa=0.89 in validation dataset). Female Latino smokers had the poorest status, while male, non-Latino non-smokers had the best status. We identified four post-ARDS outcome subtypes that were predicted by sex, ethnicity, pre-ARDS smoking status and other baseline factors. These subtypes may help develop tailored rehabilitation strategies, including investigation of combined physical and mental health interventions, and distinct interventions to improve cognitive outcomes. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
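The classification-rule step can be sketched with off-the-shelf recursive partitioning and a kappa check; the data below are synthetic stand-ins, not the study's variables:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(3)

# synthetic stand-ins for baseline predictors and network-derived subtypes 1-4
X = rng.normal(size=(645, 6))                       # sex, smoking status, ...
y = (X[:, 0] > 0).astype(int) + 2 * (X[:, 1] > 0)   # four synthetic subtypes

# derivation / validation split mirroring the study design (430 / 215)
Xd, yd, Xv, yv = X[:430], y[:430], X[430:], y[430:]

# recursive partitioning -> a shallow, interpretable classification rule
rule = DecisionTreeClassifier(max_depth=3, random_state=0).fit(Xd, yd)

# agreement of the rule with the subtype labels in the validation set
print("kappa =", round(cohen_kappa_score(yv, rule.predict(Xv)), 2))
```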
Evaluating the uniformity of color spaces and performance of color difference formulae
NASA Astrophysics Data System (ADS)
Lian, Yusheng; Liao, Ningfang; Wang, Jiajia; Tan, Boneng; Liu, Zilong
2010-11-01
Using small color difference datasets (the MacAdam ellipses dataset and the RIT-DuPont suprathreshold color difference ellipses dataset) and large color difference datasets (the Munsell Renovation Data and the OSA Uniform Color Scales dataset), the uniformity of several color spaces and the performance of color difference formulae based on these color spaces are evaluated. The color spaces used are CIELAB, DIN99d, IPT, and CIECAM02-UCS. It is found that the uniformity of lightness is better than that of saturation and hue. Overall, for all these color spaces, the uniformity in the blue area is inferior to that in other areas. The uniformity of CIECAM02-UCS is superior to the other color spaces over the whole color-difference range from small to large. The uniformity of CIELAB and IPT for the large color difference datasets is better than for the small color difference datasets, whereas DIN99d shows the opposite behavior. Two common performance factors (PF/3 and STRESS) and the statistical F-test are calculated to test the performance of the color difference formulae. The results show that the performance of the color difference formulae based on these four color spaces is consistent with the uniformity of the spaces.
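For reference, the CIELAB (1976) color difference and the STRESS index used in evaluations of this kind can be computed as follows; a generic sketch with toy numbers, following the STRESS definition of Garcia et al. (2007):

```python
import numpy as np

def delta_e76(lab1, lab2):
    """CIELAB color difference: Euclidean distance in L*a*b* space."""
    return np.linalg.norm(np.asarray(lab1) - np.asarray(lab2), axis=-1)

def stress(dE, dV):
    """STRESS index (Garcia et al. 2007): agreement between computed
    differences dE and visual differences dV; lower is better, 0 = perfect."""
    dE, dV = np.asarray(dE, float), np.asarray(dV, float)
    F1 = (dE * dV).sum() / (dV ** 2).sum()       # optimal scaling factor
    return 100 * np.sqrt(((dE - F1 * dV) ** 2).sum() / ((F1 * dV) ** 2).sum())

pairs1 = np.array([[50, 2, 3], [60, -5, 10]])    # toy L*a*b* sample pairs
pairs2 = np.array([[51, 3, 2], [58, -4, 12]])
dE = delta_e76(pairs1, pairs2)
print(dE, stress(dE, dV=[1.2, 2.5]))             # dV: toy visual differences
```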
Using Graph Indices for the Analysis and Comparison of Chemical Datasets.
Fourches, Denis; Tropsha, Alexander
2013-10-01
In cheminformatics, compounds are represented as points in a multidimensional space of chemical descriptors. When all pairs of points found within a certain distance threshold in the original high-dimensional chemistry space are connected by distance-labeled edges, the resulting data structure can be defined as a Dataset Graph (DG). We show that, similarly to the conventional description of organic molecules, many graph indices can be computed for DGs as well. We demonstrate that chemical datasets can be effectively characterized and compared by computing simple graph indices such as the average vertex degree or the Randić connectivity index. This approach is used to characterize and quantify the similarity between different datasets or subsets of the same dataset (e.g., training, test, and external validation sets used in QSAR modeling). The freely available ADDAGRA program has been implemented to build and visualize DGs. The approach proposed and discussed in this report could be further explored and utilized for different cheminformatics applications such as dataset diversification by acquiring external compounds, dataset processing prior to QSAR modeling, or (dis)similarity modeling of multiple datasets studied in chemical genomics applications. Copyright © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
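A minimal sketch of building a Dataset Graph and computing the two indices named above (random descriptors and an illustrative threshold, not the ADDAGRA implementation):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def dataset_graph_indices(X, threshold):
    """Build the Dataset Graph (edges between compounds closer than the
    distance threshold in descriptor space) and return two simple indices."""
    D = squareform(pdist(X))                     # pairwise descriptor distances
    A = (D < threshold) & ~np.eye(len(X), dtype=bool)
    deg = A.sum(axis=1)
    avg_degree = deg.mean()
    # Randic connectivity index: sum over edges of 1/sqrt(deg_i * deg_j)
    i, j = np.triu_indices_from(A, k=1)
    edges = A[i, j]
    randic = (1.0 / np.sqrt(deg[i[edges]] * deg[j[edges]])).sum()
    return avg_degree, randic

X = np.random.rand(100, 8)                       # 100 compounds, 8 descriptors
print(dataset_graph_indices(X, threshold=0.9))
```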
The Application and Future Direction of the SPASE Metadata Standard in the U.S. and Worldwide
NASA Astrophysics Data System (ADS)
King, Todd; Thieman, James; Roberts, D. Aaron
2013-04-01
The Space Physics Archive Search and Extract (SPASE) metadata standard for Heliophysics and related data is now an established standard within the NASA-funded space and solar physics community and is spreading to international groups within that community. Development of SPASE has involved a number of international partners, and the current version of the SPASE Metadata Model (version 2.2.2) has been stable since January 2011. The SPASE standard has been adopted by groups such as NASA's Heliophysics division, the Canadian Space Science Data Portal (CSSDP), Canada's AUTUMN network, Japan's Inter-university Upper atmosphere Global Observation NETwork (IUGONET), the Centre de Données de la Physique des Plasmas (CDPP), and the near-Earth space data infrastructure for e-Science (ESPAS). In addition, portions of the SPASE dictionary have been modeled in semantic web ontologies for use with reasoners and semantic searches. In development are modifications to accommodate simulation and model data, as well as enhancements to describe data accessibility. These additions will add features to describe a broader range of data types. In keeping with the SPASE principle of backward compatibility, these changes will not affect the data descriptions already generated for instrument-related datasets. We also look at the long-term commitment by NASA to support the SPASE effort and how SPASE metadata can enable value-added services.
Sampling algorithms for validation of supervised learning models for Ising-like systems
NASA Astrophysics Data System (ADS)
Portman, Nataliya; Tamblyn, Isaac
2017-12-01
In this paper, we build and explore supervised learning models of ferromagnetic system behavior, using Monte-Carlo sampling of the spin configuration space generated by the 2D Ising model. Given the enormous size of the space of all possible Ising model realizations, the question arises as to how to choose a reasonable number of samples that will form physically meaningful and non-intersecting training and testing datasets. Here, we propose a sampling technique called "ID-MH" that uses the Metropolis-Hastings algorithm to create a Markov process across energy levels within the predefined configuration subspace. We show that application of this method retains phase transitions in both training and testing datasets and serves the purpose of validating a machine learning algorithm. For larger lattice dimensions, ID-MH is not feasible as it requires knowledge of the complete configuration space. As such, we develop a new "block-ID" sampling strategy: it decomposes the given structure into square blocks with lattice dimension N ≤ 5 and uses ID-MH sampling of candidate blocks. Further comparison of the performance of commonly used machine learning methods such as random forests, decision trees, k nearest neighbors and artificial neural networks shows that the PCA-based Decision Tree regressor is the most accurate predictor of magnetizations of the Ising model. For energies, however, the accuracy of prediction is not satisfactory, highlighting the need to consider more algorithmically complex methods (e.g., deep learning).
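For orientation, plain Metropolis sampling of the 2D Ising model — the baseline that the paper's ID-MH and block-ID strategies build on — can be sketched as follows (illustrative lattice size and temperature):

```python
import numpy as np

rng = np.random.default_rng(4)

def metropolis_ising(L=16, T=2.27, sweeps=200):
    """Metropolis sampling of the 2D Ising model (periodic boundaries);
    configurations drawn this way can seed training/testing datasets."""
    s = rng.choice([-1, 1], size=(L, L))
    for _ in range(sweeps * L * L):
        i, j = rng.integers(L, size=2)
        nb = s[(i+1) % L, j] + s[(i-1) % L, j] + s[i, (j+1) % L] + s[i, (j-1) % L]
        dE = 2 * s[i, j] * nb                      # energy change of the flip
        if dE <= 0 or rng.random() < np.exp(-dE / T):
            s[i, j] *= -1
    return s

config = metropolis_ising()
print("magnetization per spin:", config.mean())
```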
Approximating the Generalized Voronoi Diagram of Closely Spaced Objects
DOE Office of Scientific and Technical Information (OSTI.GOV)
Edwards, John; Daniel, Eric; Pascucci, Valerio
2015-06-22
We present an algorithm to compute an approximation of the generalized Voronoi diagram (GVD) on arbitrary collections of 2D or 3D geometric objects. In particular, we focus on datasets with closely spaced objects; GVD approximation is expensive and sometimes intractable on these datasets using previous algorithms. With our approach, the GVD can be computed using commodity hardware even on datasets with many, extremely tightly packed objects. Our approach is to subdivide the space with an octree that is represented with an adjacency structure. We then use a novel adaptive distance transform to compute the distance function on octree vertices. The computed distance field is sampled more densely in areas of close object spacing, enabling robust and parallelizable GVD surface generation. We demonstrate our method on a variety of data and show example applications of the GVD in 2D and 3D.
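A grid-based sketch of the underlying idea (not the paper's adaptive octree method): label every cell with its nearest object via a distance transform, then mark cells where the nearest-object label changes between neighbors; those cells approximate the GVD:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

# three point "objects" on a uniform 2D grid (illustrative stand-ins for
# the arbitrary geometric objects handled by the paper's method)
grid = np.zeros((200, 200), dtype=int)
grid[50, 40] = 1; grid[60, 160] = 2; grid[150, 100] = 3

# indices of the nearest labeled cell for every grid cell
_, (iy, ix) = distance_transform_edt(grid == 0, return_indices=True)
nearest = grid[iy, ix]

# GVD boundary: cells whose right or lower neighbor belongs to another object
gvd = (np.diff(nearest, axis=0, prepend=nearest[:1]) != 0) | \
      (np.diff(nearest, axis=1, prepend=nearest[:, :1]) != 0)
print(gvd.sum(), "boundary cells")
```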
Yang, Jie; McArdle, Conor; Daniels, Stephen
2014-01-01
A new data dimension-reduction method, called Internal Information Redundancy Reduction (IIRR), is proposed for application to Optical Emission Spectroscopy (OES) datasets obtained from industrial plasma processes. For example, in a semiconductor manufacturing environment, real-time spectral emission data is potentially very useful for inferring information about critical process parameters such as wafer etch rates; however, the relationship between the spectral sensor data gathered over the duration of an etching process step and the target process output parameters is complex. OES sensor data has high dimensionality (fine wavelength resolution is required in spectral emission measurements in order to capture data on all chemical species involved in plasma reactions) and full-spectrum samples are taken at frequent time points, so that dynamic process changes can be captured. To maximise the utility of the gathered dataset, it is essential that information redundancy is minimised, but with the important requirement that the resulting reduced dataset remains in a form that is amenable to direct interpretation of the physical process. To meet this requirement and to achieve a high reduction in dimension with little information loss, the IIRR method proposed in this paper operates directly in the original variable space, identifying peak wavelength emissions and the correlative relationships between them. A new statistic, the Mean Determination Ratio (MDR), is proposed to quantify the information loss after dimension reduction, and the effectiveness of IIRR is demonstrated using an actual semiconductor manufacturing dataset. As an example of the application of IIRR in process monitoring/control, we also show how etch rates can be accurately predicted from IIRR dimension-reduced spectral data. PMID:24451453
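A simplified sketch in the spirit of IIRR (not the published algorithm): keep peak-wavelength channels and greedily drop peaks whose time series are nearly redundant with an already-kept peak, so the reduced set stays in the original, physically interpretable wavelength space:

```python
import numpy as np
from scipy.signal import find_peaks

def reduce_oes(spectra, wavelengths, corr_cut=0.98):
    """Keep peak channels, dropping those highly correlated in time with an
    already-kept (stronger) peak; returns the retained peak wavelengths."""
    mean_spec = spectra.mean(axis=0)                 # time-averaged spectrum
    peaks, _ = find_peaks(mean_spec, prominence=mean_spec.max() * 0.02)
    C = np.corrcoef(spectra[:, peaks].T)             # peak-to-peak correlations
    kept = []
    for a in np.argsort(mean_spec[peaks])[::-1]:     # strongest emission first
        if all(abs(C[a, b]) < corr_cut for b in kept):
            kept.append(a)
    return wavelengths[peaks[kept]]

# toy data: 300 time samples, 2048 channels, one synthetic emission line
t, wl = 300, np.linspace(200, 900, 2048)
spectra = np.random.rand(t, wl.size) + 5 * np.exp(-((wl - 656) / 2) ** 2)
print(reduce_oes(spectra, wl))
```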
EnviroAtlas - Austin, TX - Land Cover by Block Group
This EnviroAtlas dataset describes the percentage of each block group that is classified as impervious, forest, green space, and agriculture. Forest is defined as Trees & Forest. Green space is defined as Trees & Forest, Grass & Herbaceous, and Agriculture. This dataset also includes the area per capita for each block group for some land cover types. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).
Really big data: Processing and analysis of large datasets
USDA-ARS?s Scientific Manuscript database
Modern animal breeding datasets are large and getting larger, due in part to the recent availability of DNA data for many animals. Computational methods for efficiently storing and analyzing those data are under development. The amount of storage space required for such datasets is increasing rapidly...
Finding Spatio-Temporal Patterns in Large Sensor Datasets
ERIC Educational Resources Information Center
McGuire, Michael Patrick
2010-01-01
Spatial or temporal data mining tasks are performed in the context of the relevant space, defined by a spatial neighborhood, and the relevant time period, defined by a specific time interval. Furthermore, when mining large spatio-temporal datasets, interesting patterns typically emerge where the dataset is most dynamic. This dissertation is…
MASER: Measuring, Analysing, Simulating low frequency Radio Emissions.
NASA Astrophysics Data System (ADS)
Cecconi, B.; Le Sidaner, P.; Savalle, R.; Bonnin, X.; Zarka, P. M.; Louis, C.; Coffre, A.; Lamy, L.; Denis, L.; Griessmeier, J. M.; Faden, J.; Piker, C.; André, N.; Genot, V. N.; Erard, S.; King, T. A.; Mafi, J. N.; Sharlow, M.; Sky, J.; Demleitner, M.
2017-12-01
The MASER (Measuring, Analysing and Simulating Radio Emissions) project provides a comprehensive infrastructure dedicated to low frequency radio emissions (typically < 50 to 100 MHz). The four main radio sources observed in this frequency range are the Earth, the Sun, Jupiter and Saturn. They are observed either from the ground (down to 10 MHz) or from space. Ground observatories are more sensitive than space observatories and capture high resolution data streams (up to a few TB per day for modern instruments). Conversely, space-borne instruments can observe below the ionospheric cut-off (10 MHz) and can be placed closer to the studied object. Several tools have been developed in the last decade for sharing space physics data. Data visualization tools developed by the CDPP (http://cdpp.eu, Centre de Données de la Physique des Plasmas, in Toulouse, France) and the University of Iowa (Autoplot, http://autoplot.org) are available to display and analyse space physics time series and spectrograms. A planetary radio emission simulation software is developed at LESIA (ExPRES: Exoplanetary and Planetary Radio Emission Simulator). VESPA (Virtual European Solar and Planetary Access) provides a search interface that allows users to discover data of interest, and is based on IVOA standards (the astronomical International Virtual Observatory Alliance). The University of Iowa also develops Das2server, which allows data to be distributed with adjustable temporal resolution. MASER makes use of all these tools and standards to distribute datasets from space and ground radio instruments available from the Observatoire de Paris, the Station de Radioastronomie de Nançay and the CDPP deep archive. These datasets include Cassini/RPWS, STEREO/Waves, WIND/Waves, Ulysses/URAP, ISEE3/SBH, Voyager/PRA, Nançay Decameter Array (Routine, NewRoutine, JunoN), the RadioJove archive, the Swedish Viking mission, Interball/POLRAD... MASER also includes a Python software library for reading raw data.
Effect of the time window on the heat-conduction information filtering model
NASA Astrophysics Data System (ADS)
Guo, Qiang; Song, Wen-Jun; Hou, Lei; Zhang, Yi-Lu; Liu, Jian-Guo
2014-05-01
Recommendation systems have been proposed to filter out the potential tastes and preferences of users online; however, the physics of the time-window effect on performance is missing, which is critical for saving memory and decreasing computational complexity. In this paper, by gradually expanding the time window, we investigate the impact of the time window on the heat-conduction information filtering model with ten similarity measures. The experimental results on the benchmark dataset Netflix indicate that by using only approximately 11.11% of the most recent rating records, the accuracy could be improved by an average of 33.16% and the diversity by 30.62%. In addition, the recommendation performance on the MovieLens dataset could be preserved by considering only approximately 10.91% of the most recent records. Since they improve recommendation performance while largely reducing computational time and data storage space, our discoveries possess significant practical value.
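To make the time-window mechanics concrete, here is a minimal sketch (not the authors' code) of heat-conduction scoring restricted to recent ratings. The (user, item, timestamp) layout, the binary rating matrix and the masking of known items are illustrative assumptions; only the windowing and the heat-conduction transfer step follow the description above.

```python
# Hedged sketch: heat-conduction recommendation over a time window.
# Assumes 'ratings' is an array of (user_id, item_id, timestamp) rows.
import numpy as np

def heat_conduction_scores(ratings, n_users, n_items, t_now, window):
    """Score items for every user using only ratings inside the window."""
    recent = ratings[ratings[:, 2] >= t_now - window]
    A = np.zeros((n_users, n_items))
    A[recent[:, 0].astype(int), recent[:, 1].astype(int)] = 1.0
    k_user = np.maximum(A.sum(axis=1), 1.0)    # user degrees
    k_item = np.maximum(A.sum(axis=0), 1.0)    # item degrees
    # Heat conduction: each item averages the "temperature" of its users,
    # and each user averages the temperature of their items.
    W = (A / k_item).T @ (A / k_user[:, None])  # item-to-item transfer
    scores = A @ W.T                            # propagate user profiles
    scores[A > 0] = -np.inf                     # skip already-rated items
    return scores
```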
A fragmentation model of earthquake-like behavior in internet access activity
NASA Astrophysics Data System (ADS)
Paguirigan, Antonino A.; Angco, Marc Jordan G.; Bantang, Johnrob Y.
We present a fragmentation model that generates almost any inverse power-law size distribution, including dual-scaled versions, consistent with the underlying dynamics of systems with earthquake-like behavior. We apply the model to explain the dual-scaled power-law statistics observed in an Internet access dataset that covers more than 32 million requests. The non-Poissonian statistics of the requested data sizes m and the amount of time τ needed for complete processing are consistent with the Gutenberg-Richter law. Inter-event times δt between subsequent requests are also shown to exhibit power-law distributions consistent with the generalized Omori law. Thus, the dataset is similar to earthquake data except that two power-law regimes are observed. Using the proposed model, we are able to identify the underlying dynamics responsible for generating the observed dual power-law distributions. The model is universal enough to be applicable to any physical or human dynamics limited by finite resources such as space, energy, time or opportunity.
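As a concrete illustration of characterizing such dual-scaled distributions, the following sketch estimates a power-law exponent by maximum likelihood separately on each side of a crossover point. The continuous-tail MLE form and the break-point parameter are assumptions for illustration, not the authors' fitting procedure.

```python
# Hedged sketch: MLE exponents for a dual-scaled power law.
import numpy as np

def powerlaw_alpha(x, xmin):
    """MLE exponent for P(x) ~ x^(-alpha), x >= xmin (continuous form)."""
    tail = np.asarray(x, dtype=float)
    tail = tail[tail >= xmin]
    return 1.0 + tail.size / np.log(tail / xmin).sum()

def dual_alphas(x, xmin, xbreak):
    """One exponent per regime; 'xbreak' is a hypothetical crossover.
    The lower-regime fit ignores truncation at xbreak (sketch only)."""
    x = np.asarray(x, dtype=float)
    return (powerlaw_alpha(x[(x >= xmin) & (x < xbreak)], xmin),
            powerlaw_alpha(x[x >= xbreak], xbreak))
```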
A large-scale solar dynamics observatory image dataset for computer vision applications.
Kucuk, Ahmet; Banda, Juan M; Angryk, Rafal A
2017-01-01
The National Aeronautics and Space Administration (NASA) Solar Dynamics Observatory (SDO) mission has given us unprecedented insight into the Sun's activity. By capturing approximately 70,000 images a day, this mission has created one of the richest and biggest repositories of solar image data available to mankind. With such massive amounts of information, researchers have been able to produce great advances in detecting solar events. In this resource, we compile SDO solar data into a single repository in order to provide the computer vision community with a standardized and curated large-scale dataset of several hundred thousand solar events found on high-resolution solar images. This publicly available resource, along with the generation source code, will accelerate computer vision research on NASA's solar image data by reducing the amount of time spent performing data acquisition and curation from the multiple sources we have compiled. By improving the quality of the data with thorough curation, we anticipate wider adoption and interest from both the computer vision and solar physics communities.
Anderson, Jamie
2015-01-01
The extent to which novel land-efficient neighborhood design can promote key health behaviors is examined, concentrating on communal outdoor space provision (COSP). The aim was to test whether a neighborhood (Accordia) with a higher ratio of communal to private outdoor space is associated with higher levels of residents' (a) self-reported local health behaviors and (b) observed engagement in local health behaviors, compared to a matched neighborhood with a lower proportion of COSP. Health behaviors were examined via direct observation and postal survey. Bespoke observation codes and survey items represented key well-being behaviors including "connecting," "keeping active," "taking notice," "keep learning," and "giving." The questionnaire was validated using psychometric analyses and observed behaviors were mapped in real time. General pursuit of health behaviors was very similar in both areas, but Accordia residents reported substantially greater levels of local activity. Validated testing of the survey dataset (n = 256) showed support for a stronger Attitude to Neighborhood Life (connecting and giving locally) in Accordia and partial support for greater physical activity. Analyses of the behavior observation dataset (n = 7,298) support the self-reported findings. Mapped observations revealed a proliferation of activity within Accordia's innovative outdoor hard spaces. Representation is limited to upper-middle-class UK groups. However, Accordia was found to promote health behaviors compared to a traditional neighborhood that demands considerably more land area. The positive role of home zone streets, hard-standing and semi-civic space highlights the principle of quality as well as quantity. The findings should be considered as part of three forthcoming locally led UK garden cities, to be built before 2020.
Channel Classification across Arid West Landscapes in Support of OHW Delineation
2013-01-01
Figure 5. National Hydrography Dataset for Chinle Creek, AZ... the OHW boundary is determined by observing recent physical evidence subsequent to flow. Channel morphology and physical features associated with the... data from the National Hydrography Dataset (NHD) (USGS 2010). The NHD digital stream data were downloaded as a line
Naveja, J. Jesús; Medina-Franco, José L.
2017-01-01
We present a novel approach called ChemMaps for visualizing chemical space based on the similarity matrix of compound datasets generated with molecular fingerprint similarity. The method uses a 'satellites' approach, where satellites are, in principle, molecules whose similarity to the rest of the molecules in the database provides sufficient information for generating a visualization of the chemical space. Such an approach could help make chemical space visualizations more efficient. We hereby describe a proof-of-principle application of the method to various databases that have different diversity measures. Unsurprisingly, we found the method works better with databases that have low 2D diversity. 3D diversity played a secondary role, although it seems to be more relevant as 2D diversity increases. For less diverse datasets, taking as few as 25% of the molecules as satellites seems to be sufficient for a fair depiction of the chemical space. We propose to iteratively increase the number of satellites by 5% of the whole database, and stop when the new and the prior chemical space correlate highly. This Research Note represents a first exploratory step, prior to the full application of this method to several datasets. PMID:28794856
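The iterative satellite-selection loop lends itself to a compact sketch. The following is a hedged reading of the procedure described above, with assumed ingredients: binary fingerprints as numpy rows, a 2-D PCA of similarities to the satellites standing in for the chemical-space map, and correlation of pairwise distances as the stopping criterion.

```python
# Hedged sketch of the ChemMaps 'satellites' loop (illustrative choices).
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import pearsonr
from sklearn.decomposition import PCA

def tanimoto(fps, sat):
    """Tanimoto similarity between all fingerprints and the satellites."""
    inter = fps @ sat.T
    counts = fps.sum(1)[:, None] + sat.sum(1)[None, :] - inter
    return inter / np.maximum(counts, 1)

def chemmaps_embedding(fps, start=0.25, step=0.05, tol=0.99, seed=0):
    rng = np.random.default_rng(seed)
    n = len(fps)
    order = rng.permutation(n)
    k = int(start * n)                            # begin with 25% satellites
    prev = None
    while k <= n:
        sims = tanimoto(fps, fps[order[:k]])      # similarity to satellites
        emb = PCA(n_components=2).fit_transform(sims)
        if prev is not None:
            r, _ = pearsonr(pdist(emb), pdist(prev))
            if r >= tol:                          # maps agree: stop
                return emb
        prev = emb
        k += max(1, int(step * n))                # add 5% more satellites
    return prev
```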
Gravity, aeromagnetic and rock-property data of the central California Coast Ranges
Langenheim, V.E.
2014-01-01
Gravity, aeromagnetic, and rock-property data were collected to support geologic-mapping, water-resource, and seismic-hazard studies for the central California Coast Ranges. These data are combined with existing data to provide gravity, aeromagnetic, and physical-property datasets for this region. The gravity dataset consists of approximately 18,000 measurements. The aeromagnetic dataset consists of total-field anomaly values from several detailed surveys that have been merged and gridded at an interval of 200 m. The physical property dataset consists of approximately 800 density measurements and 1,100 magnetic-susceptibility measurements from rock samples, in addition to previously published borehole gravity surveys from Santa Maria Basin, density logs from Salinas Valley, and intensities of natural remanent magnetization.
NASA Astrophysics Data System (ADS)
Kuppel, S.; Soulsby, C.; Maneta, M. P.; Tetzlaff, D.
2017-12-01
The utility of field measurements to help constrain the model solution space and identify feasible model configurations has become an increasingly central issue in hydrological model calibration. Sufficiently informative observations are necessary to ensure that the goodness of model-data fit attained effectively translates into more physically sound information for the internal model parameters, as a basis for model structure evaluation. Here we assess to what extent the diversity of information content can inform on the suitability of a complex, process-based ecohydrological model to simulate key water flux and storage dynamics at a long-term research catchment in the Scottish Highlands. We use the fully distributed ecohydrological model EcH2O, calibrated against long-term datasets that encompass hydrologic and energy exchanges and ecological measurements: stream discharge, soil moisture, net radiation above canopy, and pine stand transpiration. Diverse combinations of these constraints were applied using a multi-objective cost function specifically designed to avoid compensatory effects between model-data metrics. Results revealed that calibration against virtually all datasets enabled the model to reproduce streamflow reasonably well. However, parameterizing the model to adequately capture local flux and storage dynamics, such as soil moisture or transpiration, required calibration with specific observations. This indicates that the footprint of the information contained in observations varies for each type of dataset, and that a diverse database informing on the different compartments of the domain is critical to test hypotheses of catchment function and identify a consistent model parameterization. The results foster confidence in using EcH2O to help understand current and future ecohydrological couplings in Northern catchments.
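The non-compensatory idea in the multi-objective cost function can be illustrated with a small sketch. The maximum-of-normalized-errors aggregation below is a hypothetical choice to show the principle (a good fit to one dataset cannot buy off a poor fit to another); it is not the cost function actually used with EcH2O.

```python
# Hypothetical sketch of a non-compensatory multi-objective cost.
import numpy as np

def non_compensatory_cost(sim: dict, obs: dict) -> float:
    """Each dataset contributes a normalised error; the aggregate takes
    the worst (maximum) component, so no metric can compensate another."""
    components = []
    for name, o in obs.items():
        s = sim[name]
        rmse = np.sqrt(np.nanmean((s - o) ** 2))
        components.append(rmse / np.nanstd(o))   # normalise per dataset
    return max(components)                        # worst-case aggregation
```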
NASA Astrophysics Data System (ADS)
Mandolesi, E.; Jones, A. G.; Roux, E.; Lebedev, S.
2009-12-01
Recently, different studies have been undertaken on the correlation between diverse geophysical datasets. Magnetotelluric (MT) data are used to map the electrical conductivity structure of the Earth, but one of the problems with the MT method is its lack of resolution in mapping zones beneath a region of high conductivity. Joint inversion of different datasets in which a common structure is recognizable reduces non-uniqueness and may improve the quality of interpretation when the different datasets are sensitive to different physical properties with an underlying common structure. A common structure is recognized if the changes in physical properties occur at the same spatial locations. Common structure may be recognized in 1D inversion of seismic and MT datasets, and numerous authors have shown that a 2D common structure may also lead to an improvement in inversion quality when datasets are jointly inverted. In this presentation, a tool to constrain MT 2D inversion with phase velocities of surface-wave seismic data (SW) is proposed, which is being developed and tested on synthetic data. Results obtained suggest that a joint inversion scheme could be applied with success along a profile for which data are compatible with a 2D MT model.
LiDAR Vegetation Investigation and Signature Analysis System (LVISA)
NASA Astrophysics Data System (ADS)
Höfle, Bernhard; Koenig, Kristina; Griesbaum, Luisa; Kiefer, Andreas; Hämmerle, Martin; Eitel, Jan; Koma, Zsófia
2015-04-01
Our physical environment undergoes constant changes in space and time with strongly varying triggers, frequencies, and magnitudes. Monitoring these environmental changes is crucial to improve our scientific understanding of complex human-environmental interactions and helps us to respond to environmental change by adaptation or mitigation. The three-dimensional (3D) description of Earth surface features and the detailed monitoring of surface processes using 3D spatial data have gained increasing attention within the last decades, such as in climate change research (e.g., glacier retreat), carbon sequestration (e.g., forest biomass monitoring), precision agriculture and natural hazard management. In all those areas, 3D data have helped to improve our process understanding by allowing us to quantify the structural properties of earth surface features and their changes over time. This advancement has been fostered by technological developments and the increased availability of 3D sensing systems. In particular, LiDAR (light detection and ranging) technology, also referred to as laser scanning, has made significant progress and has evolved into an operational tool in environmental research and geosciences. The main result of LiDAR measurements is a highly spatially resolved 3D point cloud. Each point within the LiDAR point cloud has an XYZ coordinate associated with it and often additional information such as the strength of the returned backscatter. The point cloud provided by LiDAR contains rich geospatial, structural, and potentially biochemical information about the surveyed objects. To deal with the inherently unorganized datasets and the large data volume (frequently millions of XYZ coordinates) of LiDAR datasets, a multitude of algorithms for automatic 3D object detection (e.g., of single trees) and physical surface description (e.g., biomass) have been developed. However, so far the exchange of datasets and approaches (i.e., extraction algorithms) among LiDAR users lags behind. We propose a novel concept, the LiDAR Vegetation Investigation and Signature Analysis System (LVISA), which shall enhance sharing of i) reference datasets of single vegetation objects with rich reference data (e.g., plant species, basic plant morphometric information) and ii) approaches for information extraction (e.g., single tree detection, tree species classification based on waveform LiDAR features). We will build an extensive LiDAR data repository to support the development and benchmarking of LiDAR-based object information extraction. LVISA uses international web service standards (Open Geospatial Consortium, OGC) for geospatial data access and also analysis (e.g., OGC Web Processing Services). This will allow the research community to identify plant-object-specific vegetation features from LiDAR data, while accounting for differences in LiDAR systems (e.g., beam divergence), settings (e.g., point spacing), and calibration techniques. It is the goal of LVISA to develop generic 3D information extraction approaches which can be seamlessly transferred to other datasets, timestamps and extraction tasks. The current prototype of LVISA can be visited and tested online via http://uni-heidelberg.de/lvisa. Video tutorials provide a quick overview and entry into the functionality of LVISA.
We will present the current advances of LVISA and highlight future research and extensions, such as integrating low-cost LiDAR data and datasets acquired by high-frequency temporal scanning of vegetation (e.g., continuous measurements). Everybody is invited to join the LVISA development and share datasets and analysis approaches in an interoperable way via the web-based LVISA geoportal.
2D data-space cross-gradient joint inversion of MT, gravity and magnetic data
NASA Astrophysics Data System (ADS)
Pak, Yong-Chol; Li, Tonglin; Kim, Gang-Sop
2017-08-01
We have developed a data-space multiple cross-gradient joint inversion algorithm, validated it through synthetic tests, and applied it to magnetotelluric (MT), gravity and magnetic datasets acquired along a 95 km profile in the Benxi-Ji'an area of northeastern China. To begin, we discuss a generalized cross-gradient joint inversion for multiple datasets and sets of model parameters, and formulate it in data space. The Lagrange multiplier required for the structural coupling in the data-space method is determined using an iterative solver to avoid calculating the inverse matrix when solving the large system of equations. Next, using the model-space and data-space methods, we inverted the synthetic data and field data. Based on our results, the joint inversion in data space not only delineates geological bodies more clearly than separate inversion, but also yields results nearly equal to those in model space while consuming much less memory.
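The memory advantage of the data-space formulation hinges on solving for the Lagrange multipliers iteratively rather than forming an explicit inverse. Below is a minimal matrix-free sketch of that idea, with an illustrative system (B Bᵀ)λ = r and conjugate gradients; the paper's actual operators and notation are not reproduced.

```python
# Hedged sketch: iterative solve for Lagrange multipliers, matrix-free.
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def solve_multipliers(B, r):
    """Solve (B B^T) lam = r with conjugate gradients, never forming
    B B^T or its inverse explicitly."""
    m = B.shape[0]
    op = LinearOperator((m, m), matvec=lambda v: B @ (B.T @ v))
    lam, info = cg(op, r, atol=1e-10)
    if info != 0:
        raise RuntimeError("CG did not converge")
    return lam
```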
Towards a National Space Weather Predictive Capability
NASA Astrophysics Data System (ADS)
Fox, N. J.; Ryschkewitsch, M. G.; Merkin, V. G.; Stephens, G. K.; Gjerloev, J. W.; Barnes, R. J.; Anderson, B. J.; Paxton, L. J.; Ukhorskiy, A. Y.; Kelly, M. A.; Berger, T. E.; Bonadonna, L. C. M. F.; Hesse, M.; Sharma, S.
2015-12-01
National needs in the area of space weather informational and predictive tools are growing rapidly. Adverse conditions in the space environment can cause disruption of satellite operations, communications, navigation, and electric power distribution grids, leading to a variety of socio-economic losses and impacts on our security. Future space exploration and most modern human endeavors will require major advances in physical understanding and improved transition of space research to operations. At present, only a small fraction of the latest research and development results from NASA, NOAA, NSF and DoD investments are being used to improve space weather forecasting and to develop operational tools. The power of modern research and space weather model development needs to be better utilized to enable comprehensive, timely, and accurate operational space weather tools. The mere production of space weather information is not sufficient to address the needs of those who are affected by space weather. A coordinated effort is required to support research-to-applications transition and to develop the tools required by those who rely on this information. In this presentation we will review the space weather system developed for the Van Allen Probes mission, together with other datasets, tools and models that have resulted from research by scientists at JHU/APL. We will look at how these, and results from future missions such as Solar Probe Plus, could be applied to support space weather applications in coordination with other community assets and capabilities.
Anchoring and ordering NGS contig assemblies by population sequencing (POPSEQ)
Mascher, Martin; Muehlbauer, Gary J; Rokhsar, Daniel S; Chapman, Jarrod; Schmutz, Jeremy; Barry, Kerrie; Muñoz-Amatriaín, María; Close, Timothy J; Wise, Roger P; Schulman, Alan H; Himmelbach, Axel; Mayer, Klaus FX; Scholz, Uwe; Poland, Jesse A; Stein, Nils; Waugh, Robbie
2013-01-01
Next-generation whole-genome shotgun assemblies of complex genomes are highly useful, but fail to link nearby sequence contigs with each other or provide a linear order of contigs along individual chromosomes. Here, we introduce a strategy based on sequencing progeny of a segregating population that allows de novo production of a genetically anchored linear assembly of the gene space of an organism. We demonstrate the power of the approach by reconstructing the chromosomal organization of the gene space of barley, a large, complex and highly repetitive 5.1 Gb genome. We evaluate the robustness of the new assembly by comparison to a recently released physical and genetic framework of the barley genome, and to various genetically ordered sequence-based genotypic datasets. The method is independent of the need for any prior sequence resources, and will enable rapid and cost-efficient establishment of powerful genomic information for many species. PMID:23998490
Map_plot and bgg_plot: software for integration of geoscience datasets
NASA Astrophysics Data System (ADS)
Gaillot, Philippe; Punongbayan, Jane T.; Rea, Brice
2004-02-01
Since 1985, the Ocean Drilling Program (ODP) has been supporting multidisciplinary research in exploring the structure and history of Earth beneath the oceans. After more than 200 Legs, complementary datasets covering different geological environments, periods and space scales have been obtained and distributed world-wide using the ODP-Janus and Lamont Doherty Earth Observatory-Borehole Research Group (LDEO-BRG) database servers. In Earth Sciences, more than in any other science, the ensemble of these data is characterized by heterogeneous formats and graphical representation modes. In order to fully and quickly assess this information, a set of Unix/Linux and Generic Mapping Tool-based C programs has been designed to convert and integrate datasets acquired during the present ODP and the future Integrated ODP (IODP) Legs. Using ODP Leg 199 datasets, we show examples of the capabilities of the proposed programs. The program map_plot is used to easily display datasets on 2-D maps. The program bgg_plot (borehole geology and geophysics plot) displays data with respect to depth and/or time. The latter program includes depth shifting, filtering and plotting of core summary information, continuous and discrete-sample core measurements (e.g. physical properties, geochemistry, etc.), in situ continuous logs, magneto- and bio-stratigraphies, specific sedimentological analyses (lithology, grain size, texture, porosity, etc.), as well as core and borehole wall images. Outputs from both programs are initially produced in PostScript format, which can be easily converted to Portable Document Format (PDF) or standard image formats (GIF, JPEG, etc.) using widely distributed conversion programs. Based on command-line operations and customization of parameter files, these programs can be included in other shell or database scripts, automating plotting procedures for data requests. As open source software, these programs can be customized and interfaced to fulfill any specific plotting need of geoscientists using ODP-like datasets.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mitrani, J
Bayesian networks (BNs) are an excellent tool for modeling uncertainties in systems with several interdependent variables. A BN is a directed acyclic graph, and consists of a structure, i.e. the set of directional links between variables that depend on other variables, and conditional probabilities (CPs) for each variable. In this project, we apply BNs to understand uncertainties in NIF ignition experiments. One can represent various physical properties of National Ignition Facility (NIF) capsule implosions as variables in a BN. A dataset containing simulations of NIF capsule implosions was provided. The dataset was generated from a radiation hydrodynamics code, and it contained 120 simulations of 16 variables. Relevant knowledge about the physics of NIF capsule implosions and greedy search algorithms were used to search for hypothetical structures for a BN. Our preliminary results found 6 links between variables in the dataset. However, we expected more links between the dataset variables based on the physics of NIF capsule implosions. Important reasons for the paucity of links are the relatively small size of the dataset and the sampling of the values of the dataset variables. Another factor that might have caused the paucity of links is the fact that in the dataset, 20% of the simulations represented successful fusion and 80% didn't (simulations of unsuccessful fusion are useful for measuring certain diagnostics), which skewed the distributions of several variables and possibly reduced the number of links. Nevertheless, by illustrating the interdependencies and conditional probabilities of several parameters and diagnostics, an accurate and complete BN built from an appropriate simulation set would provide uncertainty quantification for NIF capsule implosions.
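For readers who want to reproduce the flavor of this analysis, here is a hedged sketch of greedy hill-climbing over BN structures with a linear-Gaussian BIC score on a (120 × 16) simulation table. The scoring form and search are illustrative assumptions; the project's actual scoring and physics-based constraints are not reproduced here.

```python
# Hedged sketch: greedy BN structure search with linear-Gaussian BIC.
import numpy as np
from itertools import permutations

def node_bic(X, j, parents):
    """BIC of variable j regressed linearly on its parents."""
    n = X.shape[0]
    A = np.column_stack([X[:, list(parents)], np.ones(n)])
    resid = X[:, j] - A @ np.linalg.lstsq(A, X[:, j], rcond=None)[0]
    rss = max(resid @ resid, 1e-12)
    return n * np.log(rss / n) + A.shape[1] * np.log(n)

def creates_cycle(parents, i, j):
    """Would adding edge i -> j create a directed cycle?
    True iff j is already an ancestor of i."""
    stack, seen = [i], set()
    while stack:
        u = stack.pop()
        if u == j:
            return True
        if u not in seen:
            seen.add(u)
            stack.extend(parents[u])
    return False

def greedy_structure(X):
    d = X.shape[1]
    parents = {j: set() for j in range(d)}
    improved = True
    while improved:
        improved, best = False, None
        for i, j in permutations(range(d), 2):
            if i in parents[j] or creates_cycle(parents, i, j):
                continue
            gain = (node_bic(X, j, parents[j])
                    - node_bic(X, j, parents[j] | {i}))  # BIC decrease
            if gain > 0 and (best is None or gain > best[0]):
                best = (gain, i, j)
        if best:
            parents[best[2]].add(best[1])   # add the best-scoring edge
            improved = True
    return parents                           # learned directed links
```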
Zeng, Lili; Wang, Dongxiao; Chen, Ju; Wang, Weiqiang; Chen, Rongyu
2016-04-26
In addition to the oceanographic data available for the South China Sea (SCS) from the World Ocean Database (WOD) and Array for Real-time Geostrophic Oceanography (Argo) floats, a suite of observations has been made by the South China Sea Institute of Oceanology (SCSIO) starting from the 1970s. Here, we assemble a SCS Physical Oceanographic Dataset (SCSPOD14) based on 51,392 validated temperature and salinity profiles collected from these three datasets for the period 1919-2014. A gridded dataset of climatological monthly mean temperature, salinity, and mixed and isothermal layer depth derived from an objective analysis of profiles is also presented. Comparisons with the World Ocean Atlas (WOA) and IFREMER/LOS Mixed Layer Depth Climatology confirm the reliability of the new dataset. This unique dataset offers an invaluable baseline perspective on the thermodynamic processes, spatial and temporal variability of water masses, and basin-scale and mesoscale oceanic structures in the SCS. We anticipate improvements and regular updates to this product as more observations become available from existing and future in situ networks.
q-Space Upsampling Using x-q Space Regularization.
Chen, Geng; Dong, Bin; Zhang, Yong; Shen, Dinggang; Yap, Pew-Thian
2017-09-01
Acquisition time in diffusion MRI increases with the number of diffusion-weighted images that need to be acquired. Particularly in clinical settings, scan time is limited and only a sparse coverage of the vast q-space is possible. In this paper, we show how non-local self-similar information in the x-q space of diffusion MRI data can be harnessed for q-space upsampling. More specifically, we establish the relationships between signal measurements in x-q space using a patch-matching mechanism that caters to unstructured data. We then encode these relationships in a graph and use it to regularize an inverse problem associated with recovering a high q-space resolution dataset from its low-resolution counterpart. Experimental results indicate that the high-resolution datasets reconstructed using the proposed method exhibit greater quality, both quantitatively and qualitatively, than those obtained using conventional methods, such as interpolation using spherical radial basis functions (SRBFs).
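The graph-regularized recovery step can be written schematically as a linear inverse problem. The sketch below assumes a sparse sampling operator A, a symmetric weight matrix W from x-q patch matching, and a quadratic Laplacian penalty; the actual method's operators and solver are not specified here.

```python
# Hedged sketch: graph-regularised least squares via a sparse solve.
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

def graph_regularised_upsample(A, b, W, lam=0.1):
    """Solve (A^T A + lam * L) x = A^T b with L = D - W, where W holds
    patch-matching similarity weights between measurements."""
    L = sparse.diags(np.asarray(W.sum(axis=1)).ravel()) - W
    lhs = (A.T @ A + lam * L).tocsc()
    return spsolve(lhs, A.T @ b)
```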
Scalable learning method for feedforward neural networks using minimal-enclosing-ball approximation.
Wang, Jun; Deng, Zhaohong; Luo, Xiaoqing; Jiang, Yizhang; Wang, Shitong
2016-06-01
Training feedforward neural networks (FNNs) is one of the most critical issues in FNN studies. However, most FNN training methods cannot be directly applied to very large datasets because of their high computational and space complexity. In order to tackle this problem, the CCMEB (Center-Constrained Minimum Enclosing Ball) problem in the hidden feature space of FNNs is discussed and a novel learning algorithm called HFSR-GCVM (hidden-feature-space regression using generalized core vector machine) is developed accordingly. In HFSR-GCVM, a novel learning criterion using an L2-norm penalty-based ε-insensitive function is formulated, and the parameters of the hidden nodes are generated randomly, independent of the training sets. Moreover, the learning of parameters in its output layer is proved equivalent to a special CCMEB problem in the FNN hidden feature space. Like most CCMEB-approximation-based machine learning algorithms, the proposed HFSR-GCVM training algorithm has the following merits: the maximal training time is linear in the size of the training dataset, and the maximal space consumption is independent of that size. Experiments on regression tasks confirm these conclusions.
NASA Astrophysics Data System (ADS)
Gelati, Emiliano; Decharme, Bertrand; Calvet, Jean-Christophe; Minvielle, Marie; Polcher, Jan; Fairbairn, David; Weedon, Graham P.
2018-04-01
Physically consistent descriptions of land surface hydrology are crucial for planning human activities that involve freshwater resources, especially in light of expected climate change scenarios. We assess how atmospheric forcing data uncertainties affect land surface model (LSM) simulations by means of an extensive evaluation exercise using a number of state-of-the-art remote sensing and station-based datasets. For this purpose, we use the CO2-responsive ISBA-A-gs LSM coupled with the CNRM version of the Total Runoff Integrated Pathways (CTRIP) river routing model. We perform multi-forcing simulations over the Euro-Mediterranean area (25-75.5° N, 11.5° W-62.5° E, at 0.5° resolution) from 1979 to 2012. The model is forced using four atmospheric datasets. Three of them are based on the ERA-Interim reanalysis (ERA-I). The fourth dataset, PGF, developed at Princeton University, is independent of ERA-Interim. The hydrological impacts of atmospheric forcing uncertainties are assessed by comparing simulated surface soil moisture (SSM), leaf area index (LAI) and river discharge against observation-based datasets: SSM from the European Space Agency's Water Cycle Multi-mission Observation Strategy and Climate Change Initiative projects (ESA-CCI), LAI of the Global Inventory Modeling and Mapping Studies (GIMMS), and Global Runoff Data Centre (GRDC) river discharge. The atmospheric forcing data are also compared to reference datasets. Precipitation is the most uncertain forcing variable across datasets, while the most consistent are air temperature and SW and LW radiation. At the monthly timescale, SSM and LAI simulations are relatively insensitive to forcing uncertainties. Some discrepancies with ESA-CCI appear to be forcing-independent and may be due to different assumptions underlying the LSM and the remote sensing retrieval algorithm. All simulations overestimate average summer and early-autumn LAI. Forcing uncertainties have a larger impact on the mean values and standard deviations of simulated river discharge than on its correlation with GRDC data. Anomaly correlation coefficients are not inferior to those computed from raw monthly discharge time series, indicating that the model reproduces inter-annual variability fairly well. However, simulated river discharge time series generally feature larger variability compared to measurements. They also tend to overestimate winter-spring high flows and underestimate summer-autumn low flows. Considering that several differences emerge between simulations and reference data which may not be completely explained by forcing uncertainty, we suggest several research directions. These range from further investigating the discrepancies between LSMs and remote sensing retrievals to developing new model components to represent physical and anthropogenic processes.
NASA Technical Reports Server (NTRS)
Armstrong, Edward; Tauer, Eric
2013-01-01
The presentation focused on describing a new dataset lifecycle policy that the NASA Physical Oceanography DAAC (PO.DAAC) has implemented for its new and current datasets to foster improved stewardship and consistency across its archive. The overarching goal is to implement this dataset lifecycle policy for all new GHRSST GDS2 datasets and bridge the mission statements from the GHRSST Project Office and PO.DAAC to provide the best quality SST data in a cost-effective, efficient manner, preserving its integrity so that it will be available and usable to a wide audience.
Progeny Clustering: A Method to Identify Biological Phenotypes
Hu, Chenyue W.; Kornblau, Steven M.; Slater, John H.; Qutub, Amina A.
2015-01-01
Estimating the optimal number of clusters is a major challenge in applying cluster analysis to any type of dataset, especially biomedical datasets, which are high-dimensional and complex. Here, we introduce an improved method, Progeny Clustering, which is stability-based and exceptionally efficient in computing, to find the ideal number of clusters. The algorithm employs a novel Progeny Sampling method to reconstruct cluster identity, a co-occurrence probability matrix to assess clustering stability, and a set of reference datasets to overcome inherent biases in the algorithm and data space. Our method was shown to be successful and robust when applied to two synthetic datasets (a two-dimensional dataset and a ten-dimensional dataset containing eight dimensions of pure noise), two standard biological datasets (the Iris dataset and the Rat CNS dataset) and two biological datasets (a cell phenotype dataset and an acute myeloid leukemia (AML) reverse-phase protein array (RPPA) dataset). Progeny Clustering outperformed some popular clustering evaluation methods in the ten-dimensional synthetic dataset as well as in the cell phenotype dataset, and it was the only method that successfully discovered clinically meaningful patient groupings in the AML RPPA dataset. PMID:26267476
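A simplified sketch of the stability idea (not the authors' implementation): for a candidate k, cluster the data, generate synthetic 'progeny' by resampling feature values within each cluster, re-cluster them, and measure how often progeny of the same parent cluster land together. The k with the highest score would be preferred; the reference-dataset bias correction described above is omitted here.

```python
# Hedged sketch: progeny-style stability score for a candidate k.
import numpy as np
from sklearn.cluster import KMeans

def progeny_stability(X, k, n_progeny=20, n_iter=10, seed=0):
    rng = np.random.default_rng(seed)
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
    scores = []
    for _ in range(n_iter):
        progeny, parent = [], []
        for c in range(k):
            members = X[labels == c]
            # resample each feature independently within the cluster
            cols = [rng.choice(members[:, j], n_progeny)
                    for j in range(X.shape[1])]
            progeny.append(np.column_stack(cols))
            parent.extend([c] * n_progeny)
        P = np.vstack(progeny)
        new = KMeans(n_clusters=k, n_init=10).fit_predict(P)
        parent = np.array(parent)
        same_parent = parent[:, None] == parent[None, :]
        co_occur = new[:, None] == new[None, :]
        # fraction of same-parent progeny pairs that co-cluster again
        scores.append((co_occur & same_parent).sum() / same_parent.sum())
    return float(np.mean(scores))
```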
Comparative analysis and assessment of M. tuberculosis H37Rv protein-protein interaction datasets
2011-01-01
Background M. tuberculosis is a formidable bacterial pathogen. There is thus an increasing demand for understanding the function and relationship of proteins in various strains of M. tuberculosis. Protein-protein interaction (PPI) data are crucial for this kind of knowledge. However, the quality of the main available M. tuberculosis PPI datasets is unclear. This hampers the effectiveness of research works that rely on these PPI datasets. Here, we analyze the two main available M. tuberculosis H37Rv PPI datasets. The first dataset is the high-throughput B2H PPI dataset from Wang et al.'s recent paper in the Journal of Proteome Research. The second dataset is from the STRING database, version 8.3, comprising entirely of H37Rv PPIs predicted using various methods. We find that these two datasets have a surprisingly low level of agreement. We postulate the following causes for this low level of agreement: (i) the H37Rv B2H PPI dataset is of low quality; (ii) the H37Rv STRING PPI dataset is of low quality; and/or (iii) the H37Rv STRING PPIs are predictions of other forms of functional associations rather than direct physical interactions. Results To test the quality of these two datasets, we evaluate them based on correlated gene expression profiles, coherent informative GO term annotations, and conservation in other organisms. We observe a significantly greater portion of PPIs in the H37Rv STRING PPI dataset (with score ≥ 770) having correlated gene expression profiles and coherent informative GO term annotations in both interaction partners than in the H37Rv B2H PPI dataset. Predicted H37Rv interologs derived from non-M. tuberculosis experimental PPIs are much more similar to the H37Rv STRING functional associations dataset (with score ≥ 770) than to the H37Rv B2H PPI dataset. H37Rv predicted physical interologs from IntAct also show extremely low similarity with the H37Rv B2H PPI dataset; and this similarity level is much lower than that between the S. aureus MRSA252 predicted physical interologs from IntAct and S. aureus MRSA252 pull-down PPIs. Comparative analysis with several representative two-hybrid PPI datasets in other species further confirms that the H37Rv B2H PPI dataset is of low quality. Next, to test the possibility that the H37Rv STRING PPIs are not purely direct physical interactions, we compare M. tuberculosis H37Rv protein pairs that catalyze adjacent steps in enzymatic reactions to B2H PPIs and predicted PPIs in STRING; this set of protein pairs has much lower similarity with the B2H PPIs than with the STRING PPIs. This result strongly suggests that the H37Rv STRING PPIs more likely correspond to indirect relationships between protein pairs than to B2H PPIs. For more precise support, we turn to S. cerevisiae for its comprehensively studied interactome. We compare S. cerevisiae predicted PPIs in STRING to three independent protein relationship datasets which respectively comprise PPIs reported in Y2H assays, protein pairs reported to be in the same protein complexes, and protein pairs that catalyze successive reaction steps in enzymatic reactions. Our analysis reveals that S. cerevisiae predicted STRING PPIs have much higher similarity to the latter two types of protein pairs than to two-hybrid PPIs. As H37Rv STRING PPIs are predicted using similar methods to S. cerevisiae predicted STRING PPIs, this suggests that these H37Rv STRING PPIs are more likely to correspond to the latter two types of protein pairs than to two-hybrid PPIs as well.
Conclusions The H37Rv B2H PPI dataset has low quality. It should not be used as the gold standard to assess the quality of other (possibly predicted) H37Rv PPI datasets. The H37Rv STRING PPI dataset also has low quality; nevertheless, a subset consisting of STRING PPIs with score ≥770 has satisfactory quality. However, these STRING “PPIs” should be interpreted as functional associations, which include a substantial portion of indirect protein interactions, rather than direct physical interactions. These two factors cause the strikingly low similarity between these two main H37Rv PPI datasets. The results and conclusions from this comparative analysis provide valuable guidance in using these M. tuberculosis H37Rv PPI datasets in subsequent studies for a wide range of purposes. PMID:22369691
Song, Jiajia; Li, Dan; Ma, Xiaoyuan; Teng, Guowei; Wei, Jianming
2017-01-01
Accurate dynamic heart-rate (HR) estimation using a photoplethysmogram (PPG) during intense physical activity is challenging due to corruption by motion artifacts (MAs). It is difficult to reconstruct a clean signal and extract HR from contaminated PPG. This paper proposes a robust HR-estimation algorithm framework that uses one-channel PPG and tri-axis acceleration data to reconstruct the PPG and calculate the HR based on features of the PPG and spectral analysis. Firstly, the signal is checked for the presence of MAs. Then, when MAs exist, the spectral peaks corresponding to the acceleration data are filtered from the periodogram of the PPG. Different signal-processing methods are applied based on the number of remaining PPG spectral peaks. The main MA-removal algorithm (NFEEMD) includes a repeated single-notch filter and ensemble empirical mode decomposition. Finally, HR calibration is designed to ensure the accuracy of HR tracking. The NFEEMD algorithm was evaluated on the 23 datasets from the 2015 IEEE Signal Processing Cup Database. The average estimation errors were 1.12 BPM (12 training datasets), 2.63 BPM (10 testing datasets) and 1.87 BPM (all 23 datasets), respectively. The Pearson correlation was 0.992. The experimental results illustrate that the proposed algorithm is suitable not only for HR estimation during continuous activities, like slow running (13 training datasets), but also for intense physical activities with acceleration, like arm exercise (10 testing datasets). PMID:29068403
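The spectral-filtering step lends itself to a short sketch: PPG spectral peaks that coincide with accelerometer peaks are treated as motion artifacts and suppressed before picking the heart-rate peak. The sampling rate, thresholds and HR band below are illustrative assumptions, and the notch-filter/EEMD reconstruction stages of NFEEMD are not shown.

```python
# Hedged sketch: suppress accelerometer-matched peaks in a PPG spectrum.
import numpy as np
from scipy.signal import periodogram, find_peaks

def hr_estimate(ppg, acc, fs=125.0, tol_hz=0.15):
    f, p_ppg = periodogram(ppg, fs)
    band = (f >= 0.7) & (f <= 3.5)            # plausible HR: 42-210 BPM
    mask = np.zeros_like(f, dtype=bool)
    for axis in acc:                           # acc: iterable of 3 axes
        fa, p_acc = periodogram(axis, fs)
        peaks, _ = find_peaks(p_acc, height=p_acc.max() * 0.2)
        for fp in fa[peaks]:
            mask |= np.abs(f - fp) < tol_hz    # blank MA frequencies
    p = np.where(mask, 0.0, p_ppg)
    return 60.0 * f[band][np.argmax(p[band])]  # HR in BPM
```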
NASA Astrophysics Data System (ADS)
Shrestha, S. R.; Collow, T. W.; Rose, B.
2016-12-01
Scientific datasets are generated from various sources and platforms, but they are typically produced either by earth observation systems or by modelling systems. These are widely used for monitoring, simulating, or analyzing measurements that are associated with physical, chemical, and biological phenomena over the ocean, atmosphere, or land. A significant subset of scientific datasets stores values directly as rasters or in a form that can be rasterized, where a value exists at every cell in a regular grid spanning the spatial extent of the dataset. Government agencies like NOAA, NASA, EPA, and USGS produce large volumes of near real-time, forecast, and historical data that drive climatological and meteorological studies, and underpin operations ranging from weather prediction to monitoring sea ice loss. Modern science is computationally intensive because of the availability of an enormous amount of scientific data, the adoption of data-driven analysis, and the need to share these datasets and research results with the public. ArcGIS as a platform is sophisticated and capable of handling such complex domains. We'll discuss constructs and capabilities applicable to multidimensional gridded data that can be conceptualized as a multivariate space-time cube. Building on the concept of a two-dimensional raster, a typical multidimensional raster dataset could contain several "slices" within the same spatial extent. We will share a case from the NOAA Climate Forecast System Reanalysis (CFSR) multidimensional data as an example of how large collections of rasters can be efficiently organized and managed through a data model within a geodatabase called a "mosaic dataset" and dynamically transformed and analyzed using raster functions. A raster function is a lightweight, raster-valued transformation defined over a mixed set of raster and scalar input. That means, just like any tool, you can provide a raster function with input parameters. It enables dynamic processing of only the data that's being displayed on the screen or requested by an application. We will present the dynamic processing and analysis of CFSR data using chains of raster functions and share it as a dynamic multidimensional image service. This workflow and these capabilities can easily be applied to any scientific data format supported in a mosaic dataset.
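The raster-function concept, i.e. lightweight transformations that compose lazily and evaluate only over the window actually requested, can be illustrated in a few lines of plain Python (this is a toy model of the idea, not the ArcGIS API):

```python
# Toy sketch: lazily composed raster functions over a numpy-backed raster.
import numpy as np

class ArrayRaster:
    def __init__(self, data):
        self.data = data
    def read(self, window):
        return self.data[window]       # only the requested subset

class RasterFunction:
    def __init__(self, func, source):
        self.func, self.source = func, source
    def read(self, window):
        # evaluate the chain only over the requested window
        return self.func(self.source.read(window))

# Example chain: Kelvin -> Celsius, then a frost mask.
base = ArrayRaster(np.random.uniform(250, 310, (4, 1000, 1000)))
celsius = RasterFunction(lambda a: a - 273.15, base)
frost = RasterFunction(lambda a: a < 0.0, celsius)
tile = frost.read((0, slice(0, 256), slice(0, 256)))  # computed on demand
```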
NASA Astrophysics Data System (ADS)
Ott, L.; Sellers, P. J.; Schimel, D.; Moore, B., III; O'Dell, C.; Crowell, S.; Kawa, S. R.; Pawson, S.; Chatterjee, A.; Baker, D. F.; Schuh, A. E.
2017-12-01
Satellite observations of carbon dioxide (CO2) and methane (CH4) are critically needed to improve understanding of the contemporary carbon budget and carbon-climate feedbacks. Though current carbon observing satellites have provided valuable data in regions not covered by surface in situ measurements, limited sampling of key regions and small but spatially coherent biases have limited the ability to estimate fluxes at the time and space scales needed for improved process-level understanding and informed decision-making. Next generation satellites will improve coverage in data sparse regions, either through use of active remote sensing, a geostationary vantage point, or increased swath width, but all techniques have limitations. The relative strengths and weaknesses of these approaches and their synergism have not previously been examined. To address these needs, a significant subset of the US carbon modeling community has come together with support from NASA to conduct a series of coordinated observing system simulation experiments (OSSEs), with close collaboration in framing the experiments and in analyzing the results. Here, we report on the initial phase of this initiative, which focused on creating realistic, physically consistent synthetic CO2 and CH4 observational datasets for use in inversion and signal detection experiments. These datasets have been created using NASA's Goddard Earth Observing System Model (GEOS) to represent the current state of atmospheric carbon as well as best available estimates of expected flux changes. Scenarios represented include changes in urban emissions, release of permafrost soil carbon, changes in carbon uptake in tropical and mid-latitude forests, changes in the Southern Ocean sink, and changes in both anthropogenic and natural methane emissions. This GEOS carbon 'nature run' was sampled by instrument simulators representing the most prominent observing strategies with a focus on consistently representing the impacts of random errors and limitations in viewing due to clouds and aerosols. Statistical analyses of these synthetic datasets provide a simple, objective method for evaluating mission design choices. These datasets will also be made publicly available for use by the international carbon modeling community and in mission planning activities.
Space-time modeling of soil moisture
NASA Astrophysics Data System (ADS)
Chen, Zijuan; Mohanty, Binayak P.; Rodriguez-Iturbe, Ignacio
2017-11-01
A physically derived space-time mathematical representation of the soil moisture field is carried out via the soil moisture balance equation driven by stochastic rainfall forcing. The model incorporates spatial diffusion and, in its original version, is shown to be unable to reproduce the relatively fast decay in the spatial correlation functions observed in empirical data. This decay, resulting from variations in local topography as well as in local soil and vegetation conditions, is well reproduced via a jitter process acting multiplicatively over the space-time soil moisture field. The jitter is multiplicative noise acting on the soil moisture dynamics with the objective of deflating its correlation structure at small spatial scales that are not embedded in the probabilistic structure of the rainfall process driving the dynamics. These scales, of the order of several meters to several hundred meters, are of great importance in ecohydrologic dynamics. Properties of space-time correlation functions and spectral densities of the model with jitter are explored analytically, and the influence of the jitter parameters, reflecting variabilities of soil moisture at different spatial and temporal scales, is investigated. A case study fitting the derived model to a soil moisture dataset is presented in detail.
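A schematic sketch of the jitter mechanism, under assumed forms rather than the paper's equations: a smooth soil-moisture field is multiplied by lognormal noise whose strength controls how quickly spatial correlation decays at small scales.

```python
# Hedged sketch: multiplicative jitter on a smooth soil-moisture field.
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(1)
smooth = gaussian_filter(rng.standard_normal((256, 256)), sigma=15)
s = 0.3 + 0.05 * smooth / smooth.std()       # smooth soil-moisture field

sigma_j = 0.1                                 # jitter strength (assumed)
jitter = np.exp(sigma_j * rng.standard_normal(s.shape) - sigma_j**2 / 2)
s_jittered = np.clip(s * jitter, 0.0, 1.0)    # deflates small-scale correlation
```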
The df: A proposed data format standard
NASA Technical Reports Server (NTRS)
Lait, Leslie R.; Nash, Eric R.; Newman, Paul A.
1993-01-01
A standard is proposed describing a portable format for electronic exchange of data in the physical sciences. Writing scientific data in a standard format has three basic advantages: portability; the ability to use metadata to aid in interpretation of the data (understandability); and reusability. An improperly formulated standard format tends towards four disadvantages: (1) it can be inflexible and fail to allow the user to express his data as needed; (2) reading and writing such datasets can involve high overhead in computing time and storage space; (3) the format may be accessible only on certain machines using certain languages; and (4) under some circumstances it may be uncertain whether a given dataset actually conforms to the standard. A format was designed which enhances these advantages and lessens the disadvantages. The fundamental approach is to allow the user to make her own choices regarding strategic tradeoffs to achieve the performance desired in her local environment. The choices made are encoded in a specific and portable way in a set of records. A fully detailed description and specification of the format is given, and examples are used to illustrate various concepts. Implementation is discussed.
Yang, Chihae; Barlow, Susan M; Muldoon Jacobs, Kristi L; Vitcheva, Vessela; Boobis, Alan R; Felter, Susan P; Arvidson, Kirk B; Keller, Detlef; Cronin, Mark T D; Enoch, Steven; Worth, Andrew; Hollnagel, Heli M
2017-11-01
A new dataset of cosmetics-related chemicals for the Threshold of Toxicological Concern (TTC) approach has been compiled, comprising 552 chemicals with 219, 40, and 293 chemicals in Cramer Classes I, II, and III, respectively. Data were integrated and curated to create a database of No-/Lowest-Observed-Adverse-Effect Level (NOAEL/LOAEL) values, from which the final COSMOS TTC dataset was developed. Criteria for study inclusion and NOAEL decisions were defined, and rigorous quality control was performed for study details and assignment of Cramer classes. From the final COSMOS TTC dataset, human exposure thresholds of 42 and 7.9 μg/kg-bw/day were derived for Cramer Classes I and III, respectively. The size of Cramer Class II was insufficient for derivation of a TTC value. The COSMOS TTC dataset was then federated with the dataset of Munro and colleagues, previously published in 1996, after updating the latter using the quality control processes for this project. This federated dataset expands the chemical space and provides more robust thresholds. The 966 substances in the federated database comprise 245, 49 and 672 chemicals in Cramer Classes I, II and III, respectively. The corresponding TTC values of 46, 6.2 and 2.3 μg/kg-bw/day are broadly similar to those of the original Munro dataset.
The Empirical Canadian High Arctic Ionospheric Model (E-CHAIM): NmF2 and hmF2 specification
NASA Astrophysics Data System (ADS)
Themens, David; Thayyil Jayachandran, P.
2017-04-01
It is well known that the International Reference Ionosphere (IRI) suffers reduced accuracy in its representation of monthly median ionospheric electron density at high latitudes (Themens et al. 2014, Themens et al. 2016). These inaccuracies are believed to stem from a historical lack of data from these regions. Now, roughly thirty and forty years after the development of the original URSI and CCIR foF2 maps, respectively, there exists a much larger dataset of high latitude observations of ionospheric electron density. These new measurements come in the form of new ionosonde deployments, such as those of the Canadian High Arctic Ionospheric Network, the CHAMP, GRACE, and COSMIC radio occultation missions, and the construction of the Poker Flat, Resolute, and EISCAT Incoherent Scatter Radar systems. These new datasets afford an opportunity to revise the IRI's representation of the high latitude ionosphere. For this purpose, we here introduce the Empirical Canadian High Arctic Ionospheric Model (E-CHAIM), which incorporates all of the above datasets, as well as the older observation records, into a new climatological representation of the high latitude ionosphere. In this presentation, we introduce the NmF2 and hmF2 portions of the model, focusing on both climatological and storm-time representations, and present a validation of the new model with respect to ionosonde observations from four high latitude stations. A comparison with respect to IRI performance is also presented, where we see improvements of up to 70% in the representation of peak electron density using the new E-CHAIM model. In terms of RMS errors, the E-CHAIM model represents a near-universal improvement over the IRI, sometimes by more than 1 MHz in foF2. For peak height, the E-CHAIM model demonstrates overall RMS errors of 13 km at each test site, compared to values of 18-35 km for the IRI, depending on location. Themens, D.R., P. T. Jayachandran, et al. (2014). J. Geophys. Res. Space Physics, 119, 6689-6703, doi:10.1002/2014JA020052. Themens, D.R., and P.T. Jayachandran (2016). J. Geophys. Res. Space Physics, 121, doi:10.1002/2016JA022664.
Analyzing how we do Analysis and Consume Data, Results from the SciDAC-Data Project
NASA Astrophysics Data System (ADS)
Ding, P.; Aliaga, L.; Mubarak, M.; Tsaris, A.; Norman, A.; Lyon, A.; Ross, R.
2017-10-01
One of the main goals of the Dept. of Energy funded SciDAC-Data project is to analyze the more than 410,000 high energy physics datasets that have been collected, generated and defined over the past two decades by experiments using the Fermilab storage facilities. These datasets have been used as the input to over 5.6 million recorded analysis projects, for which detailed analytics have been gathered. The analytics and meta information for these datasets and analysis projects are being combined with knowledge of their part of the HEP analysis chains for major experiments to understand how modern computing and data delivery is being used. We present the first results of this project, which examine in detail how the CDF, D0, NOvA, MINERvA and MicroBooNE experiments have organized, classified and consumed petascale datasets to produce their physics results. The results include analysis of the correlations in dataset/file overlap, data usage patterns, data popularity, dataset dependency and temporary dataset consumption. The results provide critical insight into how workflows and data delivery schemes can be combined with different caching strategies to more efficiently perform the work required to mine these large HEP data volumes and to understand the physics analysis requirements for the next generation of HEP computing facilities. In particular we present a detailed analysis of the NOvA data organization and consumption model corresponding to their first and second oscillation results (2014-2016) and the first look at the analysis of the Tevatron Run II experiments. We present statistical distributions for the characterization of these data and data driven models describing their consumption.
Farr, W. M.; Mandel, I.; Stevens, D.
2015-01-01
Selection among alternative theoretical models given an observed dataset is an important challenge in many areas of physics and astronomy. Reversible-jump Markov chain Monte Carlo (RJMCMC) is an extremely powerful technique for performing Bayesian model selection, but it suffers from a fundamental difficulty: it requires jumps between model parameter spaces, but cannot efficiently explore both parameter spaces at once. Thus, a naive jump between parameter spaces is unlikely to be accepted in the Markov chain Monte Carlo (MCMC) algorithm and convergence is correspondingly slow. Here, we demonstrate an interpolation technique that uses samples from single-model MCMCs to propose intermodel jumps from an approximation to the single-model posterior of the target parameter space. The interpolation technique, based on a kD-tree data structure, is adaptive and efficient in modest dimensionality. We show that our technique leads to improved convergence over naive jumps in an RJMCMC, and compare it to other proposals in the literature to improve the convergence of RJMCMCs. We also demonstrate the use of the same interpolation technique as a way to construct efficient 'global' proposal distributions for single-model MCMCs without prior knowledge of the structure of the posterior distribution, and discuss improvements that permit the method to be used in higher dimensional spaces efficiently. PMID:26543580
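To make the proposal mechanism concrete, the following minimal Python sketch shows one way a kD-tree over stored single-model MCMC samples could drive jump proposals; the function names and the uniform draw within a neighbour-spanning box are illustrative assumptions, not the authors' implementation.

    # Hypothetical sketch: kD-tree jump proposal built from stored MCMC samples.
    import numpy as np
    from scipy.spatial import cKDTree

    def make_kdtree_proposal(samples, k=10, rng=None):
        # samples: (n, d) array of single-model MCMC samples (the target space).
        rng = rng or np.random.default_rng()
        tree = cKDTree(samples)

        def propose():
            # Pick a stored sample at random, find its k nearest neighbours,
            # then draw uniformly from the axis-aligned box spanning them.
            centre = samples[rng.integers(len(samples))]
            _, idx = tree.query(centre, k=k)
            box = samples[idx]
            return rng.uniform(box.min(axis=0), box.max(axis=0))

        return propose

    # Usage: propose an inter-model jump into a 3-parameter target model.
    samples = np.random.default_rng(0).normal(size=(5000, 3))
    theta_new = make_kdtree_proposal(samples)()

In a full RJMCMC one would also need the proposal density at the drawn point to keep the acceptance ratio correct; that bookkeeping is omitted here.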
#AltPlanets: Exploring the Exoplanet Catalogue with Neural Networks
NASA Astrophysics Data System (ADS)
Laneuville, M.; Tasker, E. J.; Guttenberg, N.
2017-12-01
The launch of Kepler in 2009 brought the number of known exoplanets into the thousands, in a growth explosion that shows no sign of abating. While the data available for individual planets are at present typically restricted to orbital and bulk properties, the quantity of data points allows the potential for meaningful statistical analysis. It is not clear how planet mass, radius, orbital path, stellar properties and neighbouring planets influence one another, so it seems inevitable that patterns will be missed simply due to the difficulty of including so many dimensions. Even simple trends may be overlooked if they fall outside our expectation of planet formation; a strong risk in a field where new discoveries have overturned theories ever since the first observations of hot Jupiters. A possible way forward is to take advantage of the capabilities of neural network autoencoders. The idea of such algorithms is to learn a representation (encoding) of the data in a lower dimension space, without a priori knowledge about links between the elements. This encoding space can then be used to discover the strongest correlations in the original dataset. The key point is that trends identified by a neural network are independent of any previous analysis and pre-conceived ideas about physical processes. Results can reveal new relationships between planet properties and verify existing trends. We applied this concept to data from the NASA Exoplanet Archive, and while we have begun to explore the potential use of neural networks for exoplanet data, there are many possible extensions. For example, the network can produce a large number of 'alternative planets' whose statistics should match the current distribution. This larger dataset could highlight gaps in the parameter space or indicate that observations are missing particular regimes. This could guide instrument proposals towards objects liable to yield the most information.
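As an illustration of the autoencoder idea described above, here is a generic PyTorch sketch; the network size, feature count, and training details are invented for illustration and are not the authors' architecture.

    # Illustrative autoencoder for tabular planet features (mass, radius, ...).
    import torch
    import torch.nn as nn

    class Autoencoder(nn.Module):
        def __init__(self, n_features, n_latent=2):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Linear(n_features, 16), nn.ReLU(), nn.Linear(16, n_latent))
            self.decoder = nn.Sequential(
                nn.Linear(n_latent, 16), nn.ReLU(), nn.Linear(16, n_features))

        def forward(self, x):
            return self.decoder(self.encoder(x))

    X = torch.randn(1000, 5)              # stand-in for standardized catalogue rows
    model = Autoencoder(n_features=5)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(200):                  # minimize reconstruction error
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(X), X)
        loss.backward()
        opt.step()
    latent = model.encoder(X)             # low-dimensional encoding to inspect for trends

Sampling points in the learned latent space and decoding them is also one way the 'alternative planets' mentioned above could be generated.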
Functional exploratory data analysis for high-resolution measurements of urban particulate matter.
Ranalli, M Giovanna; Rocco, Giorgia; Jona Lasinio, Giovanna; Moroni, Beatrice; Castellini, Silvia; Crocchianti, Stefano; Cappelletti, David
2016-09-01
In this work we propose the use of functional data analysis (FDA) to deal with a very large dataset of atmospheric aerosol size distributions resolved in both space and time. Data come from a mobile measurement platform in the town of Perugia (Central Italy). An OPC (Optical Particle Counter) is integrated into a cabin of the Minimetrò, an urban transportation system that moves along a monorail on a line transect of the town. The OPC takes a sample of air every six seconds, counts the number of particles of urban aerosols with a diameter between 0.28 μm and 10 μm, and classifies such particles into 21 size bins according to their diameter. Here, we adopt a 2D functional data representation for each of the 21 spatiotemporal series. In fact, space is unidimensional since it is measured as the distance on the monorail from the base station of the Minimetrò. FDA allows for a reduction of the dimensionality of each dataset and accounts for the high space-time resolution of the data. Functional cluster analysis is then performed to search for similarities among the 21 size channels in terms of their spatiotemporal pattern. Results provide a good classification of the 21 size bins into a relatively small number of groups (between three and four) according to the season of the year. Groups including coarser particles have more similar patterns, while those including finer particles show more distinct behavior depending on the period of the year. Such features are consistent with the physics of atmospheric aerosol, and the highlighted patterns provide very useful ground for prospective model-based studies. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
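A rough Python sketch of the functional-clustering workflow follows, under the simplifying assumption that each size-bin series is summarized by least-squares B-spline coefficients along a single normalized coordinate before clustering; the basis, knots, and cluster count are illustrative, not the authors' choices.

    # Represent each size-bin series by smooth basis coefficients, then cluster.
    import numpy as np
    from scipy.interpolate import make_lsq_spline
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(1)
    t = np.linspace(0, 1, 500)                          # normalized space-time coordinate
    series = rng.normal(size=(21, 500)).cumsum(axis=1)  # stand-in for the 21 size bins

    # Cubic B-spline basis: boundary knots repeated, a few interior knots.
    knots = np.r_[[0.0] * 4, np.linspace(0.1, 0.9, 8), [1.0] * 4]
    coeffs = np.array([make_lsq_spline(t, y, knots, k=3).c for y in series])

    labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(coeffs)
    print(labels)                                       # group membership of the 21 channels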
An Ensemble Multilabel Classification for Disease Risk Prediction
Liu, Wei; Zhao, Hongling; Zhang, Chaoyang
2017-01-01
It is important to identify and prevent disease risk as early as possible through regular physical examinations. We formulate disease risk prediction as a multilabel classification problem. A novel Ensemble Label Power-set Pruned datasets Joint Decomposition (ELPPJD) method is proposed in this work. First, we transform the multilabel classification into a multiclass classification. Then, we propose pruned datasets and joint decomposition methods to deal with the imbalanced learning problem. Two strategies, size balanced (SB) and label similarity (LS), are designed to decompose the training dataset. In the experiments, the dataset is from real physical examination records. We contrast the performance of the ELPPJD method with the two different decomposition strategies. Moreover, a comparison between ELPPJD and the classic multilabel classification methods RAkEL and HOMER is carried out. The experimental results show that the ELPPJD method with the label similarity strategy has outstanding performance. PMID:29065647
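The core label power-set step can be sketched as follows; the pruning and joint-decomposition parts of ELPPJD and the choice of base classifier are not reproduced here, and all names and data are illustrative.

    # Label power-set: each distinct label combination becomes one class.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    Y = np.array([[1, 0, 1], [0, 1, 0], [1, 0, 1], [0, 0, 1]])  # multilabel targets
    X = np.random.default_rng(0).normal(size=(4, 6))            # examination features

    # Encode each row of Y (a label set) as a single class id.
    classes, y_powerset = np.unique(Y, axis=0, return_inverse=True)

    clf = RandomForestClassifier(random_state=0).fit(X, y_powerset)
    pred_labels = classes[clf.predict(X)]   # decode class ids back to label vectors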
New Hubble Space Telescope Multi-Wavelength Imaging of the Eagle Nebula
NASA Astrophysics Data System (ADS)
Levay, Zoltan G.; Christian, Carol A.; Mack, Jennifer; Frattare, Lisa M.; Livio, Mario; Meyett, Michele L.; Mutchler, Maximilian J.; Noll, Keith S.; Hubble Heritage
2015-01-01
One of the most iconic images from the Hubble Space Telescope has been the 1995 WFPC2 image of the Eagle Nebula (M16, sometimes known as the "Pillars of Creation"). Nineteen years after those original observations, new images have been obtained with HST's current instrumentation: a small mosaic in visible-light narrow-band filters with WFC3/UVIS, infrared broad-band filters with WFC3/IR, and parallel Hα imaging with ACS/WFC. The wider field of view, higher resolution, and broader wavelength coverage of the new images highlight the improved capabilities of HST over its long operational life, made possible by the upgraded instrumentation installed during Space Shuttle servicing missions. Carefully combined, aligned and calibrated datasets from the primary WFC3 fields are available as High-Level Science Products in MAST (http://archive.stsci.edu/prepds/heritage/). Color composite images from these datasets are presented to commemorate the 25th anniversary of HST's launch.
Toward a Physical Characterization of Raindrop Collision Outcome Regimes
NASA Technical Reports Server (NTRS)
Testik, F. Y.; Barros, Ana P.; Bilven, Francis L.
2011-01-01
A comprehensive raindrop collision outcome regime diagram that delineates the physical conditions associated with the outcome regimes (i.e., bounce, coalescence, and different breakup types) of binary raindrop collisions is proposed. The proposed diagram builds on a theoretical regime diagram defined in the phase space of collision Weber number We and drop diameter ratio p, by including critical angle of impact considerations. In this study, the theoretical regime diagram is first evaluated against a comprehensive dataset of drop collision experiments representative of raindrop collisions in nature. Subsequently, the theoretical regime diagram is modified to explicitly describe the dominant regimes of raindrop interactions in (We, p) space by delineating the physical conditions necessary for the occurrence of distinct types of collision-induced breakup (neck/filament, sheet, disk, and crown breakups) based on critical angle of impact considerations. Crown breakup is a subtype of disk breakup at lower collision kinetic energy that exhibits a distinctive morphology. Finally, the experimental results are analyzed in the context of the comprehensive collision regime diagram, and conditional probabilities that can be used in the parameterization of breakup kernels in stochastic models of raindrop dynamics are provided.
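As a worked example of locating a collision in the (We, p) diagram, assuming the common convention We = ρ d_s U²/σ with d_s the smaller drop diameter (the paper's exact definition may differ), the numbers below are illustrative:

    # Place a binary raindrop collision in the (We, p) phase space.
    rho = 1000.0                        # water density, kg m^-3
    sigma = 0.0728                      # surface tension of water, N m^-1
    d_small, d_large = 1.0e-3, 3.0e-3   # drop diameters, m
    U = 2.0                             # relative speed at impact, m s^-1

    p = d_small / d_large               # diameter ratio
    We = rho * d_small * U**2 / sigma   # collision Weber number
    print(f"p = {p:.2f}, We = {We:.1f}")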
Asteroid Family Physical Properties
NASA Astrophysics Data System (ADS)
Masiero, J. R.; DeMeo, F. E.; Kasuga, T.; Parker, A. H.
An asteroid family is typically formed when a larger parent body undergoes a catastrophic collisional disruption, and as such, family members are expected to show physical properties that closely trace the composition and mineralogical evolution of the parent. Recently, a number of new datasets have been released that probe the physical properties of large numbers of asteroids, many of which are members of identified families. We review these datasets and the composite properties of asteroid families derived from this plethora of new data. We also discuss the limitations of the current data, as well as the open questions in the field.
Hu, Weiming; Hu, Ruiguang; Xie, Nianhua; Ling, Haibin; Maybank, Stephen
2014-04-01
In this paper, we propose saliency-driven image multiscale nonlinear diffusion filtering. The resulting scale space in general preserves or even enhances semantically important structures such as edges, lines, or flow-like structures in the foreground, and inhibits and smoothes clutter in the background. The image is classified using multiscale information fusion based on the original image, the image at the final scale at which the diffusion process converges, and the image at a midscale. Our algorithm emphasizes the foreground features, which are important for image classification. The background image regions, whether considered as context for the foreground or noise to the foreground, can be globally handled by fusing information from different scales. Experimental tests of the effectiveness of the multiscale space for image classification are conducted on the following publicly available datasets: 1) the PASCAL 2005 dataset; 2) the Oxford 102 flowers dataset; and 3) the Oxford 17 flowers dataset, with high classification rates.
EnviroAtlas - Tampa, FL - Land Cover by Block Group
This EnviroAtlas dataset describes the percentage of each block group that is classified as impervious, forest, green space, wetland, and agriculture. Impervious is a combination of dark and light impervious. Forest is a combination of trees and forest and woody wetlands. Green space is a combination of trees and forest, grass and herbaceous, agriculture, woody wetlands, and emergent wetlands. Wetlands includes both Woody and Emergent Wetlands. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).
A program for handling map projections of small-scale geospatial raster data
Finn, Michael P.; Steinwand, Daniel R.; Trent, Jason R.; Buehler, Robert A.; Mattli, David M.; Yamamoto, Kristina H.
2012-01-01
Scientists routinely accomplish small-scale geospatial modeling using raster datasets of global extent. Such use often requires the projection of global raster datasets onto a map or the reprojection from a given map projection associated with a dataset. The distortion characteristics of these projection transformations can have significant effects on modeling results. Distortions associated with the reprojection of global data are generally greater than distortions associated with reprojections of larger-scale, localized areas. The accuracy of areas in projected raster datasets of global extent is dependent on spatial resolution. To address these problems of projection and the associated resampling that accompanies it, methods for framing the transformation space, direct point-to-point transformations rather than gridded transformation spaces, a solution to the wrap-around problem, and an approach to alternative resampling methods are presented. The implementations of these methods are provided in an open-source software package called MapImage (or mapIMG, for short), which is designed to function on a variety of computer architectures.
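A minimal Python sketch of the direct point-to-point idea, projecting raster cell centres individually with pyproj, is shown below; it mirrors the concept only and is not the mapIMG implementation.

    # Forward-project global raster cell centres point-by-point (no gridded
    # transformation space), here into a Mollweide projection.
    import numpy as np
    from pyproj import Transformer

    lon, lat = np.meshgrid(np.arange(-179.5, 180, 1.0),
                           np.arange(-89.5, 90, 1.0))   # 1-degree cell centres

    fwd = Transformer.from_crs("EPSG:4326", "+proj=moll +lon_0=0", always_xy=True)
    x, y = fwd.transform(lon, lat)      # projected coordinates in metres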
A sub-space greedy search method for efficient Bayesian Network inference.
Zhang, Qing; Cao, Yong; Li, Yong; Zhu, Yanming; Sun, Samuel S M; Guo, Dianjing
2011-09-01
Bayesian network (BN) has been successfully used to infer the regulatory relationships of genes from microarray datasets. However, one major limitation of the BN approach is its computational cost, because the calculation time grows more than exponentially with the dimension of the dataset. In this paper, we propose a sub-space greedy search method for efficient Bayesian network inference. In particular, this method limits the greedy search space by selecting only gene pairs with higher partial correlation coefficients. Using both synthetic and real data, we demonstrate that the proposed method achieved results comparable to the standard greedy search method while saving ∼50% of the computational time. We believe that the sub-space search method can be widely used for efficient BN inference in systems biology. Copyright © 2011 Elsevier Ltd. All rights reserved.
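The pair pre-filtering step might look like the following sketch, assuming first-order partial correlations and an illustrative threshold; the paper's actual selection rule may differ.

    # Keep only gene pairs whose association survives conditioning on any
    # third gene, then restrict the greedy structure search to those pairs.
    import numpy as np
    from itertools import combinations

    def partial_corr(r_xy, r_xz, r_yz):
        # First-order partial correlation of x and y given z.
        return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))

    expr = np.random.default_rng(0).normal(size=(100, 20))  # samples x genes
    R = np.corrcoef(expr, rowvar=False)
    n_genes = R.shape[1]

    candidate_pairs = []
    for i, j in combinations(range(n_genes), 2):
        pc = min(abs(partial_corr(R[i, j], R[i, z], R[j, z]))
                 for z in range(n_genes) if z not in (i, j))
        if pc > 0.1:                    # illustrative cutoff
            candidate_pairs.append((i, j))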
Distributing and storing data efficiently by means of special datasets in the ATLAS collaboration
NASA Astrophysics Data System (ADS)
Köneke, Karsten; ATLAS Collaboration
2011-12-01
With the start of the LHC physics program, the ATLAS experiment started to record vast amounts of data. This data has to be distributed and stored on the world-wide computing grid in a smart way in order to enable an effective and efficient analysis by physicists. This article describes how the ATLAS collaboration chose to create specialized reduced datasets in order to efficiently use computing resources and facilitate physics analyses.
Field Research Facility Data Integration Framework Data Management Plan: Survey Lines Dataset
2016-08-01
CHL and its District partners. The beach morphology surveys on which this report focuses provide quantitative measures of the dynamic nature of beach topography and volume change. The morphology surveys are conducted over a series of 26 shore-perpendicular profile lines. Table 1 of the report, "FRF survey lines dataset input data and products," describes the input data and corresponding FDIF products, including ASCII LARC survey text files.
Caruso, Geoffrey; Cavailhès, Jean; Peeters, Dominique; Thomas, Isabelle; Frankhauser, Pierre; Vuidel, Gilles
2015-01-01
This paper describes a dataset of 6,284 land transaction prices and plot surfaces in three medium-sized cities in France (Besançon, Dijon and Brest). The dataset includes road accessibility as obtained from a minimization algorithm, and the amount of green space available to households in the neighborhood of the transactions, as evaluated from a land cover dataset. Beyond the data presentation, the paper describes how these variables can be used to estimate the non-observable parameters of a residential choice function explicitly derived from a microeconomic model. The estimates are used by Caruso et al. (2015) to run a calibrated microeconomic urban growth simulation model in which households are assumed to trade off accessibility and local green space amenities. PMID:26958606
Sun, Xiaodian; Jin, Li; Xiong, Momiao
2008-01-01
It is system dynamics that determines the function of cells, tissues and organisms. Developing mathematical models and estimating their parameters are essential steps in studying the dynamic behavior of biological systems, including metabolic networks, genetic regulatory networks and signal transduction pathways, under perturbation by external stimuli. In general, biological dynamic systems are partially observed. Therefore, a natural way to model dynamic biological systems is to employ nonlinear state-space equations. Although statistical methods for parameter estimation of linear models in biological dynamic systems have been developed intensively in recent years, the estimation of both states and parameters of nonlinear dynamic systems remains a challenging task. In this report, we apply the extended Kalman filter (EKF) to the estimation of both states and parameters of nonlinear state-space models. To evaluate the performance of the EKF for parameter estimation, we apply the EKF to a simulation dataset and two real datasets: the JAK-STAT signal transduction pathway and Ras/Raf/MEK/ERK signaling transduction pathway datasets. The preliminary results show that the EKF can accurately estimate the parameters and predict states in nonlinear state-space equations for modeling dynamic biochemical networks. PMID:19018286
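A minimal EKF sketch for joint state/parameter estimation in a toy nonlinear state-space model is given below; the decay model, noise covariances, and synthetic data are invented for illustration and are far simpler than the JAK-STAT or Ras/Raf/MEK/ERK systems.

    # Augment the state with the unknown rate parameter and filter both jointly.
    import numpy as np

    dt = 0.1

    def f(x):
        s, k = x                                  # concentration s, rate k
        return np.array([s - dt * k * s, k])      # k follows a random walk

    def F_jac(x):
        s, k = x
        return np.array([[1 - dt * k, -dt * s],
                         [0.0, 1.0]])

    H = np.array([[1.0, 0.0]])                    # only s is observed
    Q = np.diag([1e-4, 1e-5])
    R = np.array([[1e-2]])

    rng = np.random.default_rng(0)
    obs = np.exp(-0.8 * dt * np.arange(50)) + 0.1 * rng.normal(size=50)

    x, P = np.array([1.0, 0.5]), np.eye(2)        # true rate is 0.8; start at 0.5
    for y in obs:
        F = F_jac(x); x = f(x); P = F @ P @ F.T + Q          # predict
        K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)         # Kalman gain
        x = x + K @ (np.array([y]) - H @ x)                  # update
        P = (np.eye(2) - K @ H) @ P
    print(x)   # filtered concentration and estimated rate parameter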
NASA Astrophysics Data System (ADS)
Merchant, C. J.; Hulley, G. C.
2013-12-01
There are many datasets describing the evolution of global sea surface temperature (SST) over recent decades -- so why make another one? Answer: to provide observations of SST that have particular qualities relevant to climate applications: independence, accuracy and stability. This has been done within the European Space Agency (ESA) Climate Change Initiative (CCI) project on SST. Independence refers to the fact that the new SST CCI dataset is not derived from or tuned to in situ observations. This matters for climate because the in situ observing network used to assess marine climate change (1) was not designed to monitor small changes over decadal timescales, and (2) has evolved significantly in its technology and mix of types of observation, even during the past 40 years. The potential for significant artefacts in our picture of global ocean surface warming is clear. Only by having an independent record can we confirm (or refute) that the work done to remove biases/trend artefacts in in situ datasets has been successful. Accuracy is the degree to which SSTs are unbiased. For climate applications, a common accuracy target is 0.1 K for all regions of the ocean. Stability is the degree to which the bias, if any, in a dataset is constant over time. Long-term instability introduces trend artefacts. To observe trends of the magnitude of 'global warming', SST datasets need to be stable to <5 mK/year. The SST CCI project has produced a satellite-based dataset that addresses these characteristics relevant to climate applications. Satellite radiances (brightness temperatures) have been harmonised by exploiting periods of overlapping observations between sensors. Less well-characterised sensors have had their calibration tuned to that of better-characterised sensors (at radiance level). Non-conventional retrieval methods (optimal estimation) have been employed to reduce regional biases to the 0.1 K level, a target violated in most satellite SST datasets. Models for quantifying uncertainty have been developed to attach uncertainty to SST across a range of space-time scales. The stability of the data has been validated.
Large-scale Labeled Datasets to Fuel Earth Science Deep Learning Applications
NASA Astrophysics Data System (ADS)
Maskey, M.; Ramachandran, R.; Miller, J.
2017-12-01
Deep learning has revolutionized computer vision and natural language processing with various algorithms scaled using high-performance computing. However, generic large-scale labeled datasets such as the ImageNet are the fuel that drives the impressive accuracy of deep learning results. Large-scale labeled datasets already exist in domains such as medical science, but creating them in the Earth science domain is a challenge. While there are ways to apply deep learning using limited labeled datasets, there is a need in the Earth sciences for creating large-scale labeled datasets for benchmarking and scaling deep learning applications. At the NASA Marshall Space Flight Center, we are using deep learning for a variety of Earth science applications where we have encountered the need for large-scale labeled datasets. We will discuss our approaches for creating such datasets and why these datasets are just as valuable as deep learning algorithms. We will also describe successful usage of these large-scale labeled datasets with our deep learning based applications.
Lee, Danny; Greer, Peter B; Pollock, Sean; Kim, Taeho; Keall, Paul
2016-05-01
The dynamic keyhole is a new MR image reconstruction method for thoracic and abdominal MR imaging. To date, this method has not been investigated with cancer patient magnetic resonance imaging (MRI) data. The goal of this study was to assess the dynamic keyhole method for the task of lung tumor localization using cine-MR images reconstructed in the presence of respiratory motion. The dynamic keyhole method utilizes a previously acquired library of peripheral k-space datasets, binned by respiratory displacement and phase (phase here simply indicates whether breathing is inhale-to-exhale or exhale-to-inhale), in conjunction with newly acquired central k-space datasets (the keyhole). External respiratory signals drive the process of sorting, matching, and combining the two k-space streams for each respiratory bin, thereby achieving faster image acquisition without substantial motion artifacts. This study is the first to investigate the impact of k-space undersampling on lung tumor motion and area assessment across clinically available techniques (zero-filling and conventional keyhole). In this study, the dynamic keyhole, conventional keyhole and zero-filling methods were compared to full k-space dataset acquisition by quantifying (1) the keyhole size required for central k-space datasets to maintain constant image quality across sixty-four cine-MRI datasets from nine lung cancer patients, (2) the intensity difference between the original and reconstructed images at a constant keyhole size, and (3) the accuracy of tumor motion and area directly measured by tumor autocontouring. For constant image quality, the dynamic keyhole, conventional keyhole, and zero-filling methods required 22%, 34%, and 49% of the full k-space (P < 0.0001), respectively. Compared to the conventional keyhole and zero-filling reconstructed images at the keyhole size utilized in the dynamic keyhole method, the average intensity difference of the dynamic keyhole reconstructed images was minimal (P < 0.0001), resulting in tumor-motion accuracy within 99.6% (P < 0.0001) and tumor-area accuracy within 98.0% (P < 0.0001) for lung tumor monitoring applications. This study demonstrates that the dynamic keyhole method is a promising technique for clinical applications such as image-guided radiation therapy requiring MR monitoring of thoracic tumors. Based on the results from this study, the dynamic keyhole method could increase the imaging frequency by up to a factor of five compared with full k-space methods for real-time lung tumor MRI.
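A conceptual numpy sketch of the keyhole merge is shown below; the array sizes and the ~22% fraction are taken loosely from the study for illustration, and the respiratory matching step is assumed already done.

    # Replace the central phase-encode lines of a matched library k-space
    # frame with newly acquired keyhole data, then inverse-FFT to an image.
    import numpy as np

    def keyhole_reconstruct(k_center_new, k_library, keyhole_frac=0.22):
        k = k_library.copy()
        n = k.shape[0]
        half = int(n * keyhole_frac / 2)
        c = n // 2
        k[c - half:c + half, :] = k_center_new   # insert the fresh keyhole
        return np.abs(np.fft.ifft2(np.fft.ifftshift(k)))

    # Stand-in data: a 256x256 library frame and a matching central band.
    library = np.fft.fftshift(np.fft.fft2(np.random.rand(256, 256)))
    keyhole = library[128 - 28:128 + 28, :]      # ~22% of the phase-encode lines
    image = keyhole_reconstruct(keyhole, library)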
NASA Technical Reports Server (NTRS)
Gottschalck, Jon; Meng, Jesse; Rodell, Matt; Houser, Paul
2005-01-01
Land surface models (LSMs) are computer programs, similar to weather and climate prediction models, which simulate the stocks and fluxes of water (including soil moisture, snow, evaporation, and runoff) and energy (including the temperature of and sensible heat released from the soil) after they arrive on the land surface as precipitation and sunlight. It is not currently possible to measure all of the variables of interest everywhere on Earth with sufficient accuracy and space-time resolution. Hence LSMs have been developed to integrate the available observations with our understanding of the physical processes involved, using powerful computers, in order to map these stocks and fluxes as they change in time. The maps are used to improve weather forecasts, support water resources and agricultural applications, and study the Earth's water cycle and climate variability. NASA's Global Land Data Assimilation System (GLDAS) project facilitates testing of several different LSMs with a variety of input datasets (e.g., precipitation, plant type). Precipitation is arguably the most important input to LSMs. Many precipitation datasets have been produced using satellite and rain gauge observations and weather forecast models. In this study, seven different global precipitation datasets were evaluated over the United States, where dense rain gauge networks contribute to reliable precipitation maps. We then used the seven datasets as inputs to GLDAS simulations, so that we could diagnose their impacts on output stocks and fluxes of water. In terms of totals, the Climate Prediction Center (CPC) Merged Analysis of Precipitation (CMAP) had the closest agreement with the US rain gauge dataset for all seasons except winter. The CMAP precipitation was also the most closely correlated in time with the rain gauge data during spring, fall, and winter, while the satellite-based estimates performed best in summer. The GLDAS simulations revealed that modeled soil moisture is highly sensitive to precipitation, with differences in spring and summer as large as 45% depending on the choice of precipitation input.
On the merging of optical and SAR satellite imagery for surface water mapping applications
NASA Astrophysics Data System (ADS)
Markert, Kel N.; Chishtie, Farrukh; Anderson, Eric R.; Saah, David; Griffin, Robert E.
2018-06-01
Optical and Synthetic Aperture Radar (SAR) imagery from satellite platforms provide a means to discretely map surface water; however, the application of the two data sources in tandem has been inhibited by inconsistent data availability, the distinct physical properties that optical and SAR instruments sense, and dissimilar data delivery platforms. In this paper, we describe a preliminary methodology for merging optical and SAR data into a common data space. We apply our approach over a portion of the Mekong Basin, a region with highly variable surface water cover and persistent cloud cover, for surface water applications requiring dense time series analysis. The methods include the derivation of a representative index from both sensors that transforms data from disparate physical units (reflectance and backscatter) into a comparable dimensionless space, allowing a consistent water extraction approach to be applied to both datasets. The merging of optical and SAR data allows for increased observations in cloud-prone regions that can be used to gain additional insight into surface water dynamics or flood mapping applications. This preliminary methodology shows promise for a common optical-SAR water extraction; however, data ranges and thresholding values can vary depending on the data source, yielding classification errors in the resulting surface water maps. We discuss some potential future approaches to address these inconsistencies.
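One way to sketch the common-space step in Python, assuming MNDWI as the optical index, backscatter in dB for SAR, z-score standardization as the common dimensionless space, and an illustrative threshold (none of these choices are confirmed by the paper):

    # Map each sensor's native units onto a comparable dimensionless space
    # and apply one water-extraction rule to both.
    import numpy as np

    def to_common_space(index_image):
        return (index_image - np.nanmean(index_image)) / np.nanstd(index_image)

    green, swir = np.random.rand(2, 100, 100)        # stand-in optical bands
    mndwi = (green - swir) / (green + swir)          # optical water index
    sigma0_db = -15 + 5 * np.random.randn(100, 100)  # stand-in SAR backscatter, dB

    water_optical = to_common_space(mndwi) > 1.0     # shared threshold
    water_sar = to_common_space(-sigma0_db) > 1.0    # water is dark in SAR

As the abstract notes, a single threshold will not hold across data sources; in practice it would be tuned or derived per scene.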
NASA Astrophysics Data System (ADS)
Prodanovic, M.; Esteva, M.; Hanlon, M.; Nanda, G.; Agarwal, P.
2015-12-01
Recent advances in imaging have provided a wealth of 3D datasets that reveal pore space microstructure (nm to cm length scale) and allow investigation of nonlinear flow and mechanical phenomena from first principles using numerical approaches. This framework has popularly been called "digital rock physics". Researchers, however, have trouble storing and sharing the datasets, both due to their size and due to the lack of standardized image types and associated metadata for volumetric datasets. This impedes scientific cross-validation of the numerical approaches that characterize large-scale porous media properties, as well as the development of the multiscale approaches required for correct upscaling. A single research group typically specializes in an imaging modality and/or related modeling on a single length scale, and the lack of data-sharing infrastructure makes it difficult to integrate different length scales. We developed a sustainable, open and easy-to-use repository called the Digital Rocks Portal, which (1) organizes images and related experimental measurements of different porous materials, and (2) improves access to them for a wider community of geoscience and engineering researchers not necessarily trained in computer science or data analysis. Once widely accepted, the repository will jumpstart productivity and enable scientific inquiry and engineering decisions founded on a data-driven basis. This is the first repository of its kind. We show initial results on incorporating essential software tools and pipelines that make it easier for researchers to store and reuse data, and for educators to quickly visualize and illustrate concepts to a wide audience. For data sustainability and continuous access, the portal is implemented within the reliable, 24/7-maintained High Performance Computing Infrastructure supported by the Texas Advanced Computing Center (TACC) at the University of Texas at Austin. Long-term storage is provided through the University of Texas System Research Cyber-infrastructure initiative.
Spectral gamuts and spectral gamut mapping
NASA Astrophysics Data System (ADS)
Rosen, Mitchell R.; Derhak, Maxim W.
2006-01-01
All imaging devices have two gamuts: the stimulus gamut and the response gamut. The response gamut of a print engine is typically described in CIE colorimetry units, a system derived to quantify human color response. More fundamental than colorimetric gamuts are spectral gamuts, based on radiance, reflectance or transmittance units. Spectral gamuts depend on the physics of light and on how materials interact with light, and do not involve human photoreceptor integration or brain processing. Methods for visualizing a spectral gamut raise challenges, as do considerations of how to utilize such a dataset for producing superior color reproductions. Recent work has described a transformation of spectra, reduced to six dimensions, called LabPQR. LabPQR was designed as a hybrid space with three explicit colorimetric axes and three additional spectral-reconstruction axes. In this paper spectral gamuts are discussed making use of LabPQR. Also, spectral gamut mapping is considered in light of the colorimetric-spectral duality of the LabPQR space.
NASA Astrophysics Data System (ADS)
Lu, M.; Hao, X.; Devineni, N.
2017-12-01
Extreme floods have a long history as an important cause of death and destruction worldwide. It is estimated by Munich RE and Swiss RE that floods and severe storms dominate all other natural hazards globally in terms of average annual property loss and human fatalities. The top 5 most disastrous floods in the period from 1900 to 2015, ranked by economic damage, all occurred in the Asian monsoon region. This study presents an interdisciplinary approach integrating hydrometeorology, atmospheric science and state-of-the-art space-time statistics and modeling to investigate the association between the space-time characteristics of floods, precipitation and atmospheric moisture transport in a statistical and physical framework. A tropical moisture export dataset and a curve-clustering algorithm are used to study source-to-destination features, to explore teleconnected climate controls on the moisture formation process at different timescales (PDO, ENSO and MJO), and to study the role of synoptic-to-large-scale atmospheric steering in moisture transport and convergence.
Assembling Large, Multi-Sensor Climate Datasets Using the SciFlo Grid Workflow System
NASA Astrophysics Data System (ADS)
Wilson, B. D.; Manipon, G.; Xing, Z.; Fetzer, E.
2008-12-01
NASA's Earth Observing System (EOS) is the world's most ambitious facility for studying global climate change. The mandate now is to combine measurements from the instruments on the A-Train platforms (AIRS, AMSR-E, MODIS, MISR, MLS, and CloudSat) and other Earth probes to enable large-scale studies of climate change over periods of years to decades. However, moving from predominantly single-instrument studies to a multi-sensor, measurement-based model for long-duration analysis of important climate variables presents serious challenges for large-scale data mining and data fusion. For example, one might want to compare temperature and water vapor retrievals from one instrument (AIRS) to another instrument (MODIS), and to a model (ECMWF), stratify the comparisons using a classification of the cloud scenes from CloudSat, and repeat the entire analysis over years of AIRS data. To perform such an analysis, one must discover & access multiple datasets from remote sites, find the space/time matchups between instrument swaths and model grids, understand the quality flags and uncertainties for retrieved physical variables, and assemble merged datasets for further scientific and statistical analysis. To meet these large-scale challenges, we are utilizing a Grid computing and dataflow framework, named SciFlo, in which we are deploying a set of versatile and reusable operators for data query, access, subsetting, co-registration, mining, fusion, and advanced statistical analysis. SciFlo is a semantically-enabled ("smart") Grid Workflow system that ties together a peer-to-peer network of computers into an efficient engine for distributed computation. The SciFlo workflow engine enables scientists to do multi-instrument Earth Science by assembling remotely-invokable Web Services (SOAP or http GET URLs), native executables, command-line scripts, and Python codes into a distributed computing flow. A scientist visually authors the graph of operations in the VizFlow GUI, or uses a text editor to modify the simple XML workflow documents. The SciFlo client & server engines optimize the execution of such distributed workflows and allow the user to transparently find and use datasets and operators without worrying about the actual location of the Grid resources. The engine transparently moves data to the operators, and moves operators to the data (on the dozen trusted SciFlo nodes). SciFlo also deploys a variety of Data Grid services to: query datasets in space and time, locate & retrieve on-line data granules, provide on-the-fly variable and spatial subsetting, and perform pairwise instrument matchups for A-Train datasets. These services are combined into efficient workflows to assemble the desired large-scale, merged climate datasets. SciFlo is currently being applied in several large climate studies: comparisons of aerosol optical depth between MODIS, MISR, the AERONET ground network, and U. Michigan's IMPACT aerosol transport model; characterization of long-term biases in microwave and infrared instruments (AIRS, MLS) by comparisons to GPS temperature retrievals accurate to 0.1 degrees Kelvin; and construction of a decade-long, multi-sensor water vapor climatology stratified by classified cloud scene by bringing together datasets from AIRS/AMSU, AMSR-E, MLS, MODIS, and CloudSat (NASA MEASUREs grant, Fetzer PI). The presentation will discuss the SciFlo technologies, their application in these distributed workflows, and the many challenges encountered in assembling and analyzing these massive datasets.
Assembling Large, Multi-Sensor Climate Datasets Using the SciFlo Grid Workflow System
NASA Astrophysics Data System (ADS)
Wilson, B.; Manipon, G.; Xing, Z.; Fetzer, E.
2009-04-01
NASA's Earth Observing System (EOS) is an ambitious facility for studying global climate change. The mandate now is to combine measurements from the instruments on the "A-Train" platforms (AIRS, AMSR-E, MODIS, MISR, MLS, and CloudSat) and other Earth probes to enable large-scale studies of climate change over periods of years to decades. However, moving from predominantly single-instrument studies to a multi-sensor, measurement-based model for long-duration analysis of important climate variables presents serious challenges for large-scale data mining and data fusion. For example, one might want to compare temperature and water vapor retrievals from one instrument (AIRS) to another instrument (MODIS), and to a model (ECMWF), stratify the comparisons using a classification of the "cloud scenes" from CloudSat, and repeat the entire analysis over years of AIRS data. To perform such an analysis, one must discover & access multiple datasets from remote sites, find the space/time "matchups" between instrument swaths and model grids, understand the quality flags and uncertainties for retrieved physical variables, assemble merged datasets, and compute fused products for further scientific and statistical analysis. To meet these large-scale challenges, we are utilizing a Grid computing and dataflow framework, named SciFlo, in which we are deploying a set of versatile and reusable operators for data query, access, subsetting, co-registration, mining, fusion, and advanced statistical analysis. SciFlo is a semantically-enabled ("smart") Grid Workflow system that ties together a peer-to-peer network of computers into an efficient engine for distributed computation. The SciFlo workflow engine enables scientists to do multi-instrument Earth Science by assembling remotely-invokable Web Services (SOAP or http GET URLs), native executables, command-line scripts, and Python codes into a distributed computing flow. A scientist visually authors the graph of operations in the VizFlow GUI, or uses a text editor to modify the simple XML workflow documents. The SciFlo client & server engines optimize the execution of such distributed workflows and allow the user to transparently find and use datasets and operators without worrying about the actual location of the Grid resources. The engine transparently moves data to the operators, and moves operators to the data (on the dozen trusted SciFlo nodes). SciFlo also deploys a variety of Data Grid services to: query datasets in space and time, locate & retrieve on-line data granules, provide on-the-fly variable and spatial subsetting, perform pairwise instrument matchups for A-Train datasets, and compute fused products. These services are combined into efficient workflows to assemble the desired large-scale, merged climate datasets. SciFlo is currently being applied in several large climate studies: comparisons of aerosol optical depth between MODIS, MISR, the AERONET ground network, and U. Michigan's IMPACT aerosol transport model; characterization of long-term biases in microwave and infrared instruments (AIRS, MLS) by comparisons to GPS temperature retrievals accurate to 0.1 degrees Kelvin; and construction of a decade-long, multi-sensor water vapor climatology stratified by classified cloud scene by bringing together datasets from AIRS/AMSU, AMSR-E, MLS, MODIS, and CloudSat (NASA MEASUREs grant, Fetzer PI). The presentation will discuss the SciFlo technologies, their application in these distributed workflows, and the many challenges encountered in assembling and analyzing these massive datasets.
Saturn ring spokes: an overview of their near-infrared spectral properties from Cassini/VIMS data
NASA Astrophysics Data System (ADS)
D'Aversa, E.; Bellucci, G.; Nicholson, P. D.; Brown, R. H.; Altieri, F.; Carrozzo, F. G.
2013-09-01
The B ring of Saturn is known to periodically host weak elongated features called spokes. They have been clearly detected by the Voyagers, by the Hubble Space Telescope and by the Cassini instruments ISS and VIMS. These observations were conducted during three different Saturn equinoxes, in 1980, 1995, and 2009 respectively, leading to the current view of the spokes' physical nature: thin clouds of fine electrically-charged grains levitating over the larger ring boulders. Compared with previously available datasets, the VIMS dataset has for the first time widened our view of spokes beyond the visible spectral range (longward of 1 micron). On the other hand, the VIMS spatial resolution is often comparable with the typical sizes of spokes, and considerable image processing is needed in order to enhance the spoke images and extract their spectra. Here we will report on advances in the spoke spectral analysis with VIMS data and will discuss the possible physical interpretations under the assumption of low spoke optical thickness.
Short-Term Forecasts Using NU-WRF for the Winter Olympics 2018
NASA Technical Reports Server (NTRS)
Srikishen, Jayanthi; Case, Jonathan L.; Petersen, Walter A.; Iguchi, Takamichi; Tao, Wei-Kuo; Zavodsky, Bradley T.; Molthan, Andrew
2017-01-01
The NASA Unified-Weather Research and Forecasting model (NU-WRF) will be included for testing and evaluation in the forecast demonstration project (FDP) of the International Collaborative Experiment - PyeongChang 2018 Olympic and Paralympic Winter Games (ICE-POP). An international array of radar and supporting ground-based observations together with various forecast and nowcast models will be operational during ICE-POP. In conjunction with personnel from NASA's Goddard Space Flight Center, the NASA Short-term Prediction Research and Transition (SPoRT) Center is developing benchmark simulations for a real-time NU-WRF configuration to run during the FDP. ICE-POP observational datasets will be used to validate model simulations and investigate improved model physics and performance for the prediction of snow events during the research phase (RDP) of the project. The NU-WRF model simulations will also support NASA Global Precipitation Measurement (GPM) Mission ground-validation activities, both physical and direct, in relation to verifying, testing and improving satellite-based snowfall retrieval algorithms over complex terrain.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sellers, P.J.; Collatz, J.; Koster, R.
1996-09-01
A comprehensive series of global datasets for land-atmosphere models has been collected, formatted to a common grid, and released on a set of CD-ROMs. This paper describes the motivation for and the contents of the dataset. In June of 1992, an interdisciplinary earth science workshop was convened in Columbia, Maryland, to assess progress in land-atmosphere research, specifically in the areas of models, satellite data algorithms, and field experiments. At the workshop, representatives of the land-atmosphere modeling community defined a need for global datasets to prescribe boundary conditions, initialize state variables, and provide near-surface meteorological and radiative forcings for their models. The International Satellite Land Surface Climatology Project (ISLSCP), a part of the Global Energy and Water Cycle Experiment, worked with the Distributed Active Archive Center of the National Aeronautics and Space Administration Goddard Space Flight Center to bring the required datasets together in a usable format. The data have since been released on a collection of CD-ROMs. The datasets on the CD-ROMs are grouped under the following headings: vegetation; hydrology and soils; snow, ice, and oceans; radiation and clouds; and near-surface meteorology. All datasets cover the period 1987-88, and all but a few are spatially continuous over the earth's land surface. All have been mapped to a common 1° x 1° equal-angle grid. The temporal frequency for most of the datasets is monthly. A few of the near-surface meteorological parameters are available both as six-hourly values and as monthly means. 26 refs., 8 figs., 2 tabs.
Generation of the 30 M-Mesh Global Digital Surface Model by Alos Prism
NASA Astrophysics Data System (ADS)
Tadono, T.; Nagai, H.; Ishida, H.; Oda, F.; Naito, S.; Minakawa, K.; Iwamoto, H.
2016-06-01
Topographical information is fundamental to many geospatial applications on Earth. Remote sensing satellites have an advantage in such fields because they are capable of repeated, global observation. Several satellite-based digital elevation datasets have been provided to examine global terrain at medium resolution, e.g. the Shuttle Radar Topography Mission (SRTM) and the global digital elevation model from the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER GDEM). A new global digital surface model (DSM) dataset using the archived data of the Panchromatic Remote-sensing Instrument for Stereo Mapping (PRISM) onboard the Advanced Land Observing Satellite (ALOS, nicknamed "Daichi") was completed in March 2016 by the Japan Aerospace Exploration Agency (JAXA) in collaboration with NTT DATA Corp. and the Remote Sensing Technology Center, Japan. This project is called "ALOS World 3D" (AW3D), and its dataset consists of a global DSM with 0.15 arcsec pixel spacing (approx. 5 m mesh) and ortho-rectified PRISM imagery with 2.5 m resolution. JAXA is also processing a global DSM with 1 arcsec spacing (approx. 30 m mesh) based on the AW3D DSM dataset, and is partially releasing it free of charge as "ALOS World 3D 30 m mesh" (AW3D30). The global AW3D30 dataset will be released in May 2016. This paper describes the processing status, a preliminary validation result for the AW3D30 DSM dataset, and its public release status. In the preliminary validation of the AW3D30 DSM, a height accuracy of 4.40 m (RMSE) was confirmed using 5,121 independent check points distributed around the world.
The Montage architecture for grid-enabled science processing of large, distributed datasets
NASA Technical Reports Server (NTRS)
Jacob, Joseph C.; Katz, Daniel S .; Prince, Thomas; Berriman, Bruce G.; Good, John C.; Laity, Anastasia C.; Deelman, Ewa; Singh, Gurmeet; Su, Mei-Hui
2004-01-01
Montage is an Earth Science Technology Office (ESTO) Computational Technologies (CT) Round III Grand Challenge investigation to deploy a portable, compute-intensive, custom astronomical image mosaicking service for the National Virtual Observatory (NVO). Although Montage is developing a compute- and data-intensive service for the astronomy community, we are also helping to address a problem that spans both Earth and space science, namely how to efficiently access and process multi-terabyte, distributed datasets. In both communities, the datasets are massive and are stored in distributed archives that are, in most cases, remote from the available computational resources. Therefore, state-of-the-art computational grid technologies are a key element of the Montage portal architecture. This paper describes the aspects of the Montage design that are applicable to both the Earth and space science communities.
Sampling errors for a nadir viewing instrument on the International Space Station
NASA Astrophysics Data System (ADS)
Berger, H. I.; Pincus, R.; Evans, F.; Santek, D.; Ackerman, S.; Ackerman, S.
2001-12-01
In an effort to improve the observational characterization of ice clouds in the earth's atmosphere, we are developing a sub-millimeter wavelength radiometer which we propose to fly on the International Space Station for two years. Our goal is to accurately measure the ice water path and mass-weighted particle size at the finest possible temporal and spatial resolution. The ISS orbit precesses, sampling through the diurnal cycle every 16 days, but technological constraints limit our instrument to a single pixel viewed near nadir. We discuss sampling errors associated with this instrument/platform configuration. We use as "truth" the ISCCP dataset of pixel-level cloud optical retrievals, which acts as a proxy for ice water path; this dataset is sampled according to the orbital characteristics of the space station, and the statistics computed from the sub-sampled population are compared with those from the full dataset. We explore the tradeoffs in average sampling error as a function of the averaging time and spatial scale, and explore the possibility of resolving the diurnal cycle.
Canessa, Andrea; Gibaldi, Agostino; Chessa, Manuela; Fato, Marco; Solari, Fabio; Sabatini, Silvio P.
2017-01-01
Binocular stereopsis is the ability of a visual system, belonging to a live being or a machine, to interpret the different visual information deriving from two eyes/cameras for depth perception. From this perspective, the ground-truth information about three-dimensional visual space, which is hardly available, is an ideal tool both for evaluating human performance and for benchmarking machine vision algorithms. In the present work, we implemented a rendering methodology in which the camera pose mimics realistic eye pose for a fixating observer, thus including convergent eye geometry and cyclotorsion. The virtual environment we developed relies on highly accurate 3D virtual models, and its full controllability allows us to obtain the stereoscopic pairs together with the ground-truth depth and camera pose information. We thus created a stereoscopic dataset: GENUA PESTO—GENoa hUman Active fixation database: PEripersonal space STereoscopic images and grOund truth disparity. The dataset aims to provide a unified framework useful for a number of problems relevant to human and computer vision, from scene exploration and eye movement studies to 3D scene reconstruction. PMID:28350382
CheS-Mapper - Chemical Space Mapping and Visualization in 3D.
Gütlein, Martin; Karwath, Andreas; Kramer, Stefan
2012-03-17
Analyzing chemical datasets is a challenging task for scientific researchers in the field of chemoinformatics. It is important, yet difficult, to understand the relationship between the structure of chemical compounds, their physico-chemical properties, and biological or toxic effects. In that respect, visualization tools can help to better comprehend the underlying correlations. Our recently developed 3D molecular viewer CheS-Mapper (Chemical Space Mapper) divides large datasets into clusters of similar compounds and consequently arranges them in 3D space, such that their spatial proximity reflects their similarity. The user can indirectly determine similarity by selecting which features to employ in the process. The tool can use and calculate different kinds of features, like structural fragments as well as quantitative chemical descriptors. These features can be highlighted within CheS-Mapper, which aids the chemist to better understand patterns and regularities and relate the observations to established scientific knowledge. As a final function, the tool can also be used to select and export specific subsets of a given dataset for further analysis.
CheS-Mapper - Chemical Space Mapping and Visualization in 3D
2012-01-01
Analyzing chemical datasets is a challenging task for scientific researchers in the field of chemoinformatics. It is important, yet difficult, to understand the relationship between the structure of chemical compounds, their physico-chemical properties, and biological or toxic effects. In that respect, visualization tools can help to better comprehend the underlying correlations. Our recently developed 3D molecular viewer CheS-Mapper (Chemical Space Mapper) divides large datasets into clusters of similar compounds and consequently arranges them in 3D space, such that their spatial proximity reflects their similarity. The user can indirectly determine similarity by selecting which features to employ in the process. The tool can use and calculate different kinds of features, like structural fragments as well as quantitative chemical descriptors. These features can be highlighted within CheS-Mapper, which aids the chemist to better understand patterns and regularities and relate the observations to established scientific knowledge. As a final function, the tool can also be used to select and export specific subsets of a given dataset for further analysis. PMID:22424447
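The two-step layout that both records describe (cluster, then arrange compounds in 3D so proximity reflects similarity) can be imitated with generic scikit-learn components; this is an analogy, not CheS-Mapper's own algorithms.

    # Cluster compounds on their feature vectors, then embed them in 3D.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.manifold import MDS

    features = np.random.default_rng(0).normal(size=(300, 40))  # descriptor matrix

    clusters = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(features)
    xyz = MDS(n_components=3, random_state=0).fit_transform(features)  # 3D layout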
The global coastline dataset: the observed relation between erosion and sea-level rise
NASA Astrophysics Data System (ADS)
Donchyts, G.; Baart, F.; Luijendijk, A.; Hagenaars, G.
2017-12-01
Erosion of sandy coasts is considered one of the key risks of sea-level rise. Because the sandy coastlines of the world are often highly populated, erosive coastline trends result in risk to populations and infrastructure. Most of our understanding of the relation between sea-level rise and coastal erosion is based on local or regional observations and generalizations of numerical and physical experiments. Until recently there was no reliable global-scale assessment of the location of sandy coasts and their rates of erosion and accretion. Here we present the global coastline dataset, which covers erosion indicators on a local scale with global coverage. The dataset uses our global coastline transects grid, defined with an alongshore spacing of 250 m and a cross-shore length extending 1 km seaward and 1 km landward. This grid matches up with pre-existing local grids where available. We present the latest results on validation of coastal-erosion trends (based on optical satellites) and classification of sandy versus non-sandy coasts. We show the relation between sea-level rise (based on both tide gauges and multi-mission satellite altimetry) and observed erosion trends over the last decades, taking into account broken-coastline trends (for example due to nourishments). An interactive web application presents the publicly accessible results using a backend based on Google Earth Engine. It allows both researchers and stakeholders to use objective estimates of coastline trends, particularly when authoritative sources are not available.
NASA Technical Reports Server (NTRS)
Larson, Jay W.
1998-01-01
Atmospheric data assimilation is a method of combining actual observations with model forecasts to produce a more accurate description of the earth system than the observations or forecast alone can provide. The output of data assimilation, sometimes called the analysis, consists of regular, gridded datasets of observed and unobserved variables. Analysis plays a key role in numerical weather prediction and is becoming increasingly important for climate research. These applications, and the need for timely validation of scientific enhancements to the data assimilation system, pose computational demands that are best met by distributed parallel software. The mission of the NASA Data Assimilation Office (DAO) is to provide datasets for climate research and to support NASA satellite and aircraft missions. The system used to create these datasets is the Goddard Earth Observing System Data Assimilation System (GEOS DAS). The core components of the GEOS DAS are: the GEOS General Circulation Model (GCM), the Physical-space Statistical Analysis System (PSAS), the Observer, the on-line Quality Control (QC) system, the Coupler (which feeds analysis increments back to the GCM), and an I/O package for processing the large amounts of data the system produces (which will be described in another presentation in this session). The discussion will center on the following issues: the computational complexity of the whole GEOS DAS, assessment of the performance of the individual elements of GEOS DAS, and the parallelization strategy for some of the components of the system.
AstroGrid: the UK's Virtual Observatory Initiative
NASA Astrophysics Data System (ADS)
Mann, Robert G.; Astrogrid Consortium; Lawrence, Andy; Davenhall, Clive; Mann, Bob; McMahon, Richard; Irwin, Mike; Walton, Nic; Rixon, Guy; Watson, Mike; Osborne, Julian; Page, Clive; Allan, Peter; Giaretta, David; Perry, Chris; Pike, Dave; Sherman, John; Murtagh, Fionn; Harra, Louise; Bentley, Bob; Mason, Keith; Garrington, Simon
AstroGrid is the UK's Virtual Observatory (VO) initiative. It brings together the principal astronomical data centres in the UK, and has been funded to the tune of approximately £5M over the next three years, via PPARC, as part of the UK e-science programme. Its twin goals are the provision of the infrastructure and tools for the federation and exploitation of large astronomical (X-ray to radio), solar and space plasma physics datasets, and the delivery of federations of current datasets for its user communities to exploit using those tools. Whilst AstroGrid's work will be centred on existing and future (e.g. VISTA) UK datasets, it will seek solutions to generic VO problems and will contribute to the developing international virtual observatory framework: AstroGrid is a member of the EU-funded Astrophysical Virtual Observatory project, has close links to a second EU Grid initiative, the European Grid of Solar Observations (EGSO), and will seek an active role in the development of the common standards on which the international virtual observatory will rely. In this paper we primarily describe the concrete plans for AstroGrid's one-year Phase A study, which will centre on: (i) the definition of detailed science requirements through community consultation; (ii) the undertaking of a "functionality market survey" to test the utility of existing technologies for the VO; and (iii) a pilot programme of database federations, each addressing different aspects of the general database federation problem. Further information can be found on the AstroGrid website.
NASA Astrophysics Data System (ADS)
Daliakopoulos, Ioannis; Tsanis, Ioannis
2017-04-01
Mitigating the vulnerability of Mediterranean rangelands against degradation is limited by our ability to understand and accurately characterize those impacts in space and time. The Normalized Difference Vegetation Index (NDVI) is a radiometric measure of the photosynthetically active radiation absorbed by green vegetation canopy chlorophyll and is therefore a good surrogate measure of vegetation dynamics. On the other hand, meteorological indices such as the drought-assessing Standardised Precipitation Index (SPI) can be easily estimated from historical and projected datasets at the global scale. This work investigates the potential of driving Random Forest (RF) models with meteorological indices to approximate NDVI-based vegetation dynamics. A sufficiently large number of RF models is trained using random subsets of the dataset as predictors, in a bootstrapping approach that accounts for the uncertainty introduced by the subset selection. The updated E-OBS v13.1 dataset of the ENSEMBLES EU FP6 program provides observed monthly meteorological input to estimate SPI over the Mediterranean rangelands. RF models are trained to depict vegetation dynamics using the latest version (3g.v1) of the third-generation GIMMS NDVI generated from NOAA's Advanced Very High Resolution Radiometer (AVHRR) sensors. Analysis is conducted for the period 1981-2015 at a gridded spatial resolution of 25 km. Preliminary results demonstrate the potential of machine learning algorithms to effectively mimic the underlying physical relationship between drought and Earth Observation vegetation indices and to provide estimates based on precipitation variability.
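A minimal sketch of the bootstrapped RF scheme described above, with synthetic stand-ins for the SPI predictors and NDVI target; the array shapes, the 50-model ensemble size, and the half-sample subsets are assumptions for illustration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Hypothetical inputs: rows = (grid cell, month) samples; columns = SPI at
# several accumulation periods (e.g. 3-, 6-, 12-month); target = GIMMS NDVI.
X = rng.normal(size=(5000, 3))                                # stand-in SPI
y = 0.3 + 0.1 * X[:, 1] + rng.normal(scale=0.05, size=5000)   # stand-in NDVI

# Bootstrap over random subsets to expose the uncertainty introduced by
# subset selection, as described in the abstract.
predictions = []
for _ in range(50):
    idx = rng.choice(len(X), size=len(X) // 2, replace=False)
    rf = RandomForestRegressor(n_estimators=100, n_jobs=-1, random_state=0)
    rf.fit(X[idx], y[idx])
    predictions.append(rf.predict(X))

spread = np.std(predictions, axis=0)    # per-sample ensemble uncertainty
print(spread.mean())
```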
NASA Astrophysics Data System (ADS)
Moritz, R. E.; Rigor, I.
2006-12-01
The Arctic Buoy Program was initiated in 1978 to measure surface air pressure, surface temperature and sea-ice motion in the Arctic Ocean, on the space and time scales of synoptic weather systems, and to make the data available for research, forecasting and operations. The program, subsequently renamed the International Arctic Buoy Programme (IABP), has endured and expanded over the past 28 years. A hallmark of the IABP is the production, dissemination and archival of research-quality datasets and analyses. These datasets have been used by the authors of over 500 papers on meteorology, sea-ice physics, oceanography, air-sea interactions, climate, remote sensing and other topics. Elements of the IABP are described briefly, including measurements, analysis, data dissemination and data archival. Selected highlights of the research applications are reviewed, including ice dynamics, ocean-ice modeling, low-frequency variability of Arctic air-sea-ice circulation, and recent changes in the age, thickness and extent of Arctic sea ice. The extended temporal coverage of the data disseminated on the Environmental Working Group CDs is important for interpreting results in the context of climate.
Big Data Analytics for Demand Response: Clustering Over Space and Time
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chelmis, Charalampos; Kolte, Jahanvi; Prasanna, Viktor K.
The pervasive deployment of advanced sensing infrastructure in Cyber-Physical systems, such as the Smart Grid, has resulted in an unprecedented data explosion. Such data exhibit both large volume and high velocity, two of the three pillars of Big Data, and have a time-series notion, as datasets in this context typically consist of successive measurements made over a time interval. Time-series data can be valuable for data mining and analytics tasks such as identifying the "right" customers among a diverse population to target for Demand Response programs. However, time series are challenging to mine due to their high dimensionality. In this paper, we motivate this problem using a real application from the smart grid domain. We explore novel representations of time-series data for Big Data analytics, and propose a clustering technique for determining natural segmentation of customers and identification of temporal consumption patterns. Our method is generalizable to large-scale, real-world scenarios, without making any assumptions about the data. We evaluate our technique using real datasets from smart meters, totaling ~18,200,000 data points, and show its efficacy in efficiently detecting the optimal number of clusters.
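As an illustration of the kind of customer segmentation described, the sketch below clusters normalized daily load profiles with k-means and picks the cluster count by silhouette score. The 15-minute layout and the synthetic data are assumptions; this is a generic baseline, not the paper's representation or technique.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Hypothetical smart-meter matrix: one row per customer-day,
# 96 columns = consumption in 15-minute intervals.
rng = np.random.default_rng(1)
loads = rng.random((2000, 96))

# Normalize shape, not magnitude, so clusters reflect *when* energy is used.
profiles = loads / loads.sum(axis=1, keepdims=True)

best_k, best_score = None, -1.0
for k in range(2, 10):
    labels = KMeans(n_clusters=k, n_init=10, random_state=1).fit_predict(profiles)
    score = silhouette_score(profiles, labels)
    if score > best_score:
        best_k, best_score = k, score
print(best_k, best_score)
```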
NASA Astrophysics Data System (ADS)
Taylor, Stephen R.; Simon, Joseph; Sampson, Laura
2017-01-01
The final parsec of supermassive black-hole binary evolution is subject to the complex interplay of stellar loss-cone scattering, circumbinary disk accretion, and gravitational-wave emission, with binary eccentricity affected by all of these. The strain spectrum of gravitational waves in the pulsar-timing band thus encodes rich information about the binary population's response to these various environmental mechanisms. Current spectral models have heretofore followed basic analytic prescriptions, and attempt to investigate these final-parsec mechanisms only indirectly. Here we describe a new technique to directly probe the environmental properties of supermassive black-hole binaries through "Bayesian model-emulation". We perform black-hole binary population synthesis simulations at a restricted set of environmental parameter combinations, compute the strain spectra from these, then train a Gaussian process to learn the shape of the spectrum at any point in parameter space. We describe this technique, demonstrate its efficacy with a program of simulated datasets, then illustrate its power by directly constraining final-parsec physics in a Bayesian analysis of the NANOGrav 5-year dataset. The technique is fast, flexible, and robust.
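The emulation idea can be sketched as follows: train one Gaussian process per frequency bin on a small design of population-synthesis runs, then query the trained emulators inside a sampler. Everything here (the two-parameter design, the toy spectra, the kernel choice) is an illustrative assumption, not the authors' pipeline.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

rng = np.random.default_rng(2)

# Hypothetical training design: environmental parameters (e.g. stellar
# density, eccentricity) -> log strain spectrum over 30 frequency bins,
# each row standing in for one population-synthesis run.
theta = rng.uniform(0.0, 1.0, size=(40, 2))
log_spec = np.stack([-(2.0 / 3.0) * np.linspace(0, 1, 30)
                     - 0.3 * t[0] + 0.1 * t[1] for t in theta])

# One GP per frequency bin learns the spectrum shape across parameter space.
emulators = []
for j in range(log_spec.shape[1]):
    gp = GaussianProcessRegressor(kernel=ConstantKernel() * RBF([0.2, 0.2]),
                                  normalize_y=True).fit(theta, log_spec[:, j])
    emulators.append(gp)

# Fast prediction at a new parameter point, usable inside a Bayesian sampler.
spec_hat = np.array([gp.predict([[0.5, 0.5]])[0] for gp in emulators])
print(spec_hat[:5])
```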
Predictability of the California Current System
NASA Technical Reports Server (NTRS)
Miller, Arthur J.; Chereskin, T.; Cornuelle, B. D.; Niiler, P. P.; Moisan, J. R.; Lindstrom, Eric (Technical Monitor)
2001-01-01
The physical and biological oceanography of the Southern California Bight (SCB), a highly productive subregion of the California Current System (CCS) that extends from Point Conception, California, south to Ensenada, Mexico, continues to be extensively studied. For example, the California Cooperative Oceanic Fisheries Investigations (CalCOFI) program has sampled this region for over 50 years, providing an unparalleled time series of physical and biological data. However, our understanding of what physical processes control the large-scale and mesoscale variations in these properties is incomplete. In particular, the non-synoptic and relatively coarse spatial sampling (70 km) of the hydrographic grid does not completely resolve the mesoscale eddy field (Figure 1a). Moreover, these unresolved physical variations exert a dominant influence on the evolution of the ecosystem. In recent years, additional datasets that partially sample the SCB have become available. Acoustic Doppler Current Profiler (ADCP) measurements, which now sample upper-ocean velocity between stations, and sea level observations along TOPEX tracks give a more complete picture of the mesoscale variability. However, both TOPEX and ADCP data are well sampled only along the cruise or orbit tracks and coarsely sampled in time and between tracks. Surface Lagrangian drifters also sample the region, although irregularly in time and space. SeaWiFS provides estimates of upper-ocean chlorophyll-a (chl-a), usually giving nearly complete coverage for week-long intervals, depending on cloud coverage. Historical ocean color data from the Coastal Zone Color Scanner (CZCS) have been used extensively to determine phytoplankton patterns and variability, characterize the primary production across the SCB coastal fronts, and describe the seasonal and interannual variability in pigment concentrations. As in CalCOFI, these studies described much of the observed structure and its variability over relatively large space and time scales.
Islam, Md Rabiul; Tanaka, Toshihisa; Molla, Md Khademul Islam
2018-05-08
When designing a multiclass motor imagery-based brain-computer interface (MI-BCI), the so-called tangent space mapping (TSM) method, which utilizes the geometric structure of covariance matrices, is an effective technique. This paper introduces a method that uses TSM to find accurate operational frequency bands related to the brain activities associated with MI tasks. A multichannel electroencephalogram (EEG) signal is decomposed into multiple subbands, and tangent features are then estimated on each subband. An effective algorithm based on mutual information analysis is implemented to select subbands containing features capable of improving motor imagery classification accuracy. The features thus obtained from the selected subbands are combined to form the feature space. A principal component analysis-based approach is employed to reduce the feature dimension, and classification is then accomplished by a support vector machine (SVM). Offline analysis demonstrates that the proposed multiband tangent space mapping with subband selection (MTSMS) approach outperforms state-of-the-art methods. It achieves the highest average classification accuracy for all datasets (BCI competition datasets 2a, IIIa, IIIb, and dataset JK-HH1). The increased classification accuracy of MI tasks with the proposed MTSMS approach can yield effective implementation of BCI. The mutual information-based subband selection method is implemented to tune operational frequency bands to represent actual motor imagery tasks.
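A compact sketch of the multiband tangent-space idea, using the pyriemann library for covariance estimation and tangent-space mapping; the band list, the data shapes, and the omission of the mutual-information subband selection and PCA steps are simplifying assumptions rather than the published MTSMS method.

```python
import numpy as np
from scipy.signal import butter, filtfilt
from pyriemann.estimation import Covariances
from pyriemann.tangentspace import TangentSpace
from sklearn.svm import SVC

def tangent_features(eeg, fs, band):
    """Tangent-space features for one subband.

    eeg  -- array (n_trials, n_channels, n_samples)
    band -- (low, high) passband in Hz
    """
    b, a = butter(4, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
    filtered = filtfilt(b, a, eeg, axis=-1)
    covs = Covariances(estimator="oas").fit_transform(filtered)
    return TangentSpace(metric="riemann").fit_transform(covs)

# Hypothetical 4-class motor-imagery trials (shapes echo BCI IV-2a: 22
# channels, 2 s at 250 Hz); real data would replace the random arrays.
rng = np.random.default_rng(3)
eeg = rng.normal(size=(100, 22, 500))
labels = rng.integers(0, 4, size=100)

# Concatenate tangent features from several subbands (selection step omitted).
bands = [(4, 8), (8, 12), (12, 16), (16, 24), (24, 32)]
features = np.hstack([tangent_features(eeg, 250, b) for b in bands])
clf = SVC(kernel="linear").fit(features, labels)
```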
The DataBridge: A System For Optimizing The Use Of Dark Data From The Long Tail Of Science
NASA Astrophysics Data System (ADS)
Lander, H.; Rajasekar, A.
2015-12-01
The DataBridge is a National Science Foundation funded collaborative project (OCI-1247652, OCI-1247602, OCI-1247663) designed to assist in the discovery of dark datasets from the long tail of science. The DataBridge aims to build queryable communities of datasets using sociometric network analysis. This approach is being tested to evaluate the ability to leverage various forms of metadata to facilitate discovery of new knowledge. Each dataset in the DataBridge has an associated name space used as a first-level partitioning. In addition to testing known algorithms for SNA community building, the DataBridge project has built a message-based platform that allows users to provide their own algorithms for each of the stages in the community-building process. The stages are: Signature Generation (SG): an SG algorithm creates a metadata signature for a dataset; signature algorithms might use text metadata provided by the dataset creator or derive metadata. Relevance Algorithm (RA): an RA compares a pair of datasets and produces a similarity value between 0 and 1 for the two datasets. Sociometric Network Analysis (SNA): the SNA stage operates on a similarity matrix produced by an RA to partition all of the datasets in the name space into a set of clusters; these clusters represent communities of closely related datasets. The DataBridge also includes a web application that produces a visual representation of the clustering. Future work includes a more complete application that will allow different types of searching of the network of datasets. The DataBridge approach is relevant to geoscience research and informatics. In this presentation we will outline the project, illustrate the deployment of the approach, and discuss other potential applications and next steps for the research, such as applying this approach to models. In addition we will explore the relevance of DataBridge to other geoscience projects such as various EarthCube Building Blocks and DIBBS projects.
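A toy end-to-end pass through the three stages might look like the following, with TF-IDF as a stand-in SG, cosine similarity as a stand-in RA, and spectral clustering as a stand-in SNA; the project's actual pluggable algorithms may differ.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.cluster import SpectralClustering

# Hypothetical stage implementations for one name space.
# SG: TF-IDF signature from each dataset's text metadata.
metadata = ["sea ice motion buoy arctic", "buoy drift arctic ocean",
            "gene expression yeast", "yeast transcriptome assay"]
signatures = TfidfVectorizer().fit_transform(metadata)

# RA: pairwise similarity in [0, 1] between dataset signatures.
similarity = cosine_similarity(signatures)

# SNA: partition the similarity matrix into communities of related datasets.
labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                            random_state=0).fit_predict(similarity)
print(labels)   # e.g. [0 0 1 1]: two dataset communities
```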
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lee, Danny; Pollock, Sean; Keall, Paul, E-mail: paul.keall@sydney.edu.au
2016-05-15
Purpose: The dynamic keyhole is a new MR image reconstruction method for thoracic and abdominal MR imaging. To date, this method has not been investigated with cancer patient magnetic resonance imaging (MRI) data. The goal of this study was to assess the dynamic keyhole method for the task of lung tumor localization using cine-MR images reconstructed in the presence of respiratory motion. Methods: The dynamic keyhole method utilizes a previously acquired library of peripheral k-space datasets, binned at similar displacement and phase (where phase is simply used to determine whether breathing is inhale-to-exhale or exhale-to-inhale), in conjunction with newly acquired central k-space (keyhole) datasets. External respiratory signals drive the process of sorting, matching, and combining the two k-space streams for each respiratory bin, thereby achieving faster image acquisition without substantial motion artifacts. This study is the first to investigate the impact of k-space undersampling on lung tumor motion and area assessment across clinically available techniques (zero-filling and conventional keyhole). In this study, the dynamic keyhole, conventional keyhole and zero-filling methods were compared to full k-space dataset acquisition by quantifying (1) the keyhole size required for central k-space datasets for constant image quality across sixty-four cine-MRI datasets from nine lung cancer patients, (2) the intensity difference between the original and reconstructed images at a constant keyhole size, and (3) the accuracy of tumor motion and area directly measured by tumor autocontouring. Results: For constant image quality, the dynamic keyhole, conventional keyhole, and zero-filling methods required 22%, 34%, and 49% of the full keyhole size (P < 0.0001), respectively, compared to the full k-space image acquisition method. Compared to the conventional keyhole and zero-filling reconstructed images at the keyhole size utilized in the dynamic keyhole method, the average intensity difference of the dynamic keyhole reconstructed images was minimal (P < 0.0001), resulting in tumor-motion accuracy within 99.6% (P < 0.0001) and tumor-area accuracy within 98.0% (P < 0.0001) for lung tumor monitoring applications. Conclusions: This study demonstrates that the dynamic keyhole method is a promising technique for clinical applications such as image-guided radiation therapy requiring MR monitoring of thoracic tumors. Based on the results from this study, the dynamic keyhole method could increase the imaging frequency by up to a factor of five compared with full k-space methods for real-time lung tumor MRI.
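The core reconstruction step can be sketched as below: take the library frame from the matching respiratory bin and overwrite its central phase-encode lines with the freshly acquired keyhole. The 22% default echoes the keyhole fraction reported above; the data structures and function name are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def dynamic_keyhole_recon(keyhole_kspace, library, displacement, phase,
                          keyhole_frac=0.22):
    """Combine a freshly acquired central-k-space 'keyhole' with matched
    peripheral k-space from a previously acquired library.

    keyhole_kspace -- (ny, nx) array; only the central rows are trusted
    library        -- dict mapping (displacement_bin, phase) -> full k-space
    keyhole_frac   -- fraction of central phase-encode lines acquired live
    """
    ny = keyhole_kspace.shape[0]
    half = int(ny * keyhole_frac) // 2
    centre = slice(ny // 2 - half, ny // 2 + half)

    # Start from the library frame in the same respiratory bin ...
    combined = library[(displacement, phase)].copy()
    # ... and overwrite its centre with the live keyhole data.
    combined[centre, :] = keyhole_kspace[centre, :]

    image = np.fft.ifft2(np.fft.ifftshift(combined))
    return np.abs(image)
```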
Graham, David; Kholodov, Alexander; Wilson, Cathy; Moon, Ji-Won; Romanovsky, Vladimir; Busey, Bob
2018-02-05
This dataset provides the results of physical, chemical, and thermal characterization of soils at the Teller Road Site, Seward Peninsula, Alaska. Soil pits were dug from 7-14 September 2016 at designated Intensive Stations 2 through 9 at the Teller Road MM 27 Site. This dataset includes field observations and descriptions of soil layers or horizons, field measurements of soil volumetric water content, soil temperature, thermal conductivity, and heat capacity. Laboratory measurements of soil properties include gravimetric water content, bulk density, volumetric water content, and total carbon and nitrogen.
Alexander Kholodov; David Graham; Ji-Won Moon
2018-01-22
This dataset provides the results of physical, chemical, and thermal characterization of soils at the Council Road Site at MM71, Seward Peninsula, Alaska. Soil pits were dug on 11 September 2016 at three sites. This dataset includes field observations and descriptions of soil layers or horizons, field measurements of soil volumetric water content, soil temperature, thermal conductivity, and heat capacity. Laboratory measurements of soil properties include gravimetric water content, bulk density, volumetric water content, total carbon and nitrogen, and elemental composition from X-ray fluorescence for some elements.
Wang, Xueyi
2012-02-08
The k-nearest neighbors (k-NN) algorithm is a widely used machine learning method that finds the nearest neighbors of a test object in a feature space. We present a new exact k-NN algorithm called kMkNN (k-Means for k-Nearest Neighbors) that uses k-means clustering and the triangle inequality to accelerate the search for nearest neighbors in a high-dimensional space. The kMkNN algorithm has two stages. In the buildup stage, instead of using complex tree structures such as metric trees, kd-trees, or ball-trees, kMkNN uses a simple k-means clustering method to preprocess the training dataset. In the searching stage, given a query object, kMkNN finds the nearest training objects starting from the cluster nearest to the query object and uses the triangle inequality to reduce the number of distance calculations. Experiments show that the performance of kMkNN is surprisingly good compared to the traditional k-NN algorithm and tree-based k-NN algorithms such as kd-trees and ball-trees. On a collection of 20 datasets with up to 10^6 records and 10^4 dimensions, kMkNN shows a 2- to 80-fold reduction in distance calculations and a 2- to 60-fold speedup over the traditional k-NN algorithm for 16 datasets. Furthermore, kMkNN performs significantly better than a kd-tree based k-NN algorithm for all datasets and better than a ball-tree based k-NN algorithm for most datasets. The results show that kMkNN is effective for searching nearest neighbors in high-dimensional spaces.
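The following simplified sketch captures the two stages: k-means preprocessing, then nearest-cluster-first search with the triangle-inequality lower bound |d(q,c) - d(x,c)| used to skip distance calculations. It follows the paper's description loosely; the cluster count, data, and class layout are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

class KMkNNSketch:
    """Illustrative kMkNN-style search: k-means preprocessing plus
    triangle-inequality pruning (simplified from the paper's description)."""

    def __init__(self, data, n_clusters=32):
        self.data = data
        km = KMeans(n_clusters=n_clusters, n_init=4, random_state=0).fit(data)
        self.centroids = km.cluster_centers_
        self.members = [np.where(km.labels_ == c)[0] for c in range(n_clusters)]
        # Distance of every point to its own centroid, for the lower bound.
        self.r = np.linalg.norm(data - self.centroids[km.labels_], axis=1)

    def query(self, q, k=5):
        d_qc = np.linalg.norm(self.centroids - q, axis=1)
        best = []                           # (distance, index), size <= k
        for c in np.argsort(d_qc):          # nearest cluster first
            for i in self.members[c]:
                # Triangle inequality: d(q, x_i) >= |d(q, c) - d(x_i, c)|,
                # so this point can be skipped without a distance calc.
                if len(best) == k and abs(d_qc[c] - self.r[i]) >= best[-1][0]:
                    continue
                d = np.linalg.norm(self.data[i] - q)
                if len(best) < k or d < best[-1][0]:
                    best = sorted(best + [(d, i)])[:k]
        return best

rng = np.random.default_rng(4)
index = KMkNNSketch(rng.normal(size=(10000, 50)))
print(index.query(rng.normal(size=50), k=5))
```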
Feasibility of approaches combining sensor and source features in brain-computer interface.
Ahn, Minkyu; Hong, Jun Hee; Jun, Sung Chan
2012-02-15
Brain-computer interface (BCI) provides a new channel for communication between the brain and computers through brain signals. Cost-effective EEG provides good temporal resolution, but its spatial resolution is poor and sensor information is blurred by inherent noise. To overcome these issues, spatial filtering and feature extraction techniques have been developed. Source imaging, the transformation of sensor signals into the source space through a source localizer, has gained attention as a new approach for BCI. It has been reported that source imaging yields some improvement in BCI performance. However, there has been no thorough investigation of how source imaging information overlaps with, and is complementary to, sensor information. We hypothesize that information from the source space may overlap with, as well as be exclusive of, information from the sensor space. If this hypothesis is true, we can extract more information from the sensor and source spaces together, thereby contributing to more accurate BCI systems. In this work, features from each space (sensor or source), and two strategies combining sensor and source features, are assessed. The information distribution among the sensor, source, and combined spaces is discussed through a Venn diagram for 18 motor imagery datasets. An additional 5 motor imagery datasets from the BCI Competition III site were examined. The results showed that the addition of source information yielded about 3.8% classification improvement for the 18 motor imagery datasets and an average accuracy of 75.56% for the BCI Competition data. Our proposed approach is promising, and improved performance may be possible with a better head model.
Kittel, T.G.F.; Rosenbloom, N.A.; Royle, J. Andrew; Daly, Christopher; Gibson, W.P.; Fisher, H.H.; Thornton, P.; Yates, D.N.; Aulenbach, S.; Kaufman, C.; McKeown, R.; Bachelet, D.; Schimel, D.S.; Neilson, R.; Lenihan, J.; Drapek, R.; Ojima, D.S.; Parton, W.J.; Melillo, J.M.; Kicklighter, D.W.; Tian, H.; McGuire, A.D.; Sykes, M.T.; Smith, B.; Cowling, S.; Hickler, T.; Prentice, I.C.; Running, S.; Hibbard, K.A.; Post, W.M.; King, A.W.; Smith, T.; Rizzo, B.; Woodward, F.I.
2004-01-01
Analysis and simulation of biospheric responses to historical forcing require surface climate data that capture those aspects of climate that control ecological processes, including key spatial gradients and modes of temporal variability. We developed a multivariate, gridded historical climate dataset for the conterminous USA as a common input database for the Vegetation/Ecosystem Modeling and Analysis Project (VEMAP), a biogeochemical and dynamic vegetation model intercomparison. The dataset covers the period 1895-1993 on a 0.5° latitude/longitude grid. Climate is represented at both monthly and daily timesteps. Variables are: precipitation, minimum and maximum temperature, total incident solar radiation, daylight-period irradiance, vapor pressure, and daylight-period relative humidity. The dataset was derived from US Historical Climate Network (HCN), cooperative network, and snowpack telemetry (SNOTEL) monthly precipitation and mean minimum and maximum temperature station data. We employed techniques that rely on geostatistical and physical relationships to create the temporally and spatially complete dataset. We developed a local kriging prediction model to infill discontinuous and limited-length station records based on the spatial autocorrelation structure of climate anomalies. A spatial interpolation model (PRISM) that accounts for physiographic controls was used to grid the infilled monthly station data. We implemented a stochastic weather generator (modified WGEN) to disaggregate the gridded monthly series to dailies. Radiation and humidity variables were estimated from the dailies using a physically-based empirical surface climate model (MTCLIM3). Derived datasets include a 100 yr model spin-up climate and a historical Palmer Drought Severity Index (PDSI) dataset. The VEMAP dataset exhibits statistically significant trends in temperature, precipitation, solar radiation, vapor pressure, and PDSI for US National Assessment regions. The historical climate and companion datasets are available online at data archive centers.
Photoresist and stochastic modeling
NASA Astrophysics Data System (ADS)
Hansen, Steven G.
2018-01-01
Analysis of physical modeling results can provide unique insights into extreme ultraviolet stochastic variation, which augment, and sometimes refute, conclusions based on physical intuition and even wafer experiments. Simulations verify the primacy of "imaging critical" counting statistics (photons, electrons, and net acids) and the image/blur-dependent dose sensitivity in describing the local edge or critical dimension variation. But the failure of simple counting when resist thickness is varied highlights a limitation of this exact analytical approach, so a calibratable empirical model offers useful simplicity and convenience. Results presented here show that a wide range of physical simulation results can be well matched by an empirical two-parameter model based on blurred image log-slope (ILS) for lines/spaces and normalized ILS for holes. These results are largely consistent with a wide range of published experimental results; however, there is some disagreement with the recently published dataset of De Bisschop. The present analysis suggests that the origin of this model failure is an unexpected blurred ILS:dose-sensitivity relationship failure in that resist process. It is shown that a photoresist mechanism based on high photodecomposable quencher loading and high quencher diffusivity can give rise to pitch-dependent blur, which may explain the discrepancy.
Parente, Joana; Pereira, Mário G; Tonini, Marj
2016-07-15
The present study focuses on the dependence of the space-time permutation scan statistics (STPSS) on (1) the characteristics of the input database and (2) the use of this methodology to assess changes in the fire regime due to different types of climate and fire management activities. Based on the very strong relationship between weather and fire incidence in Portugal, the detected clusters will be interpreted in terms of the atmospheric conditions. Apart from being the country most affected by fires in the European context, Portugal meets all the conditions required to carry out this study, namely: (i) two long and comprehensive official datasets, i.e. the Portuguese Rural Fire Database (PRFD) and the National Mapping Burnt Areas (NMBA), respectively based on ground and satellite measurements; (ii) the two types of climate (Csb in the north and Csa in the south) that characterize the Mediterranean basin regions most affected by fires also divide the mainland Portuguese area; and (iii) the national plan for the defence of forest against fires was approved a decade ago and it is now reasonable to assess its impacts. Results confirmed (1) the influence of the dataset's characteristics on the detected clusters, (2) the existence of two different fire regimes in the country promoted by the different types of climate, (3) the positive impacts of the fire prevention policy decisions and (4) the ability of the STPSS to correctly identify clusters, regarding their number, location, and space-time size, in spite of possible space and/or time splits of the datasets. Finally, the role of the weather on days when clustered fires were active was confirmed for the classes of small, medium and large fires.
Physical properties of biological entities: an introduction to the ontology of physics for biology.
Cook, Daniel L; Bookstein, Fred L; Gennari, John H
2011-01-01
As biomedical investigators strive to integrate data and analyses across spatiotemporal scales and biomedical domains, they have recognized the benefits of formalizing languages and terminologies via computational ontologies. Although ontologies for biological entities (molecules, cells, organs) are well established, there are no principled ontologies of the physical properties (energies, volumes, flow rates) of those entities. In this paper, we introduce the Ontology of Physics for Biology (OPB), a reference ontology of classical physics designed for annotating the biophysical content of growing repositories of biomedical datasets and analytical models. The OPB's semantic framework, traceable to James Clerk Maxwell, encompasses modern theories of system dynamics and thermodynamics, and is implemented as a computational ontology that references available upper ontologies. In this paper we focus on the OPB classes that are designed for annotating physical properties encoded in biomedical datasets and computational models, and we discuss how the OPB framework will facilitate biomedical knowledge integration.
NASA Astrophysics Data System (ADS)
Bereau, Tristan; DiStasio, Robert A.; Tkatchenko, Alexandre; von Lilienfeld, O. Anatole
2018-06-01
Classical intermolecular potentials typically require an extensive parametrization procedure for any new compound considered. To do away with prior parametrization, we propose a combination of physics-based potentials with machine learning (ML), coined IPML, which is transferable across small neutral organic and biologically relevant molecules. ML models provide on-the-fly predictions for environment-dependent local atomic properties: electrostatic multipole coefficients (with significant error reduction compared to previously reported models), the population and decay rate of valence atomic densities, and polarizabilities across conformations and chemical compositions of H, C, N, and O atoms. These parameters enable accurate calculations of intermolecular contributions—electrostatics, charge penetration, repulsion, induction/polarization, and many-body dispersion. Unlike other potentials, this model is transferable in its ability to handle new molecules and conformations without explicit prior parametrization: All local atomic properties are predicted from ML, leaving only eight global parameters—optimized once and for all across compounds. We validate IPML on various gas-phase dimers at and away from equilibrium separation, where we obtain mean absolute errors between 0.4 and 0.7 kcal/mol for several chemically and conformationally diverse datasets representative of non-covalent interactions in biologically relevant molecules. We further focus on hydrogen-bonded complexes—essential but challenging due to their directional nature—where datasets of DNA base pairs and amino acids yield an extremely encouraging 1.4 kcal/mol error. Finally, and as a first look, we consider IPML for denser systems: water clusters, supramolecular host-guest complexes, and the benzene crystal.
Modernizing Earth and Space Science Modeling Workflows in the Big Data Era
NASA Astrophysics Data System (ADS)
Kinter, J. L.; Feigelson, E.; Walker, R. J.; Tino, C.
2017-12-01
Modeling is a major aspect of Earth and space science research. The development of numerical models of the Earth system, planetary systems or astrophysical systems is essential to linking theory with observations. Optimal use of observations that are quite expensive to obtain and maintain typically requires data assimilation, which involves numerical models. In the Earth sciences, models of the physical climate system are typically used for data assimilation, climate projection, and inter-disciplinary research, spanning applications from analysis of multi-sensor datasets to decision-making in climate-sensitive sectors, with applications to ecosystems, hazards, and various biogeochemical processes. In space physics, most models are built from first principles, require considerable expertise to run, and are frequently modified significantly for each case study. The volume and variety of model output data from modeling Earth and space systems are rapidly increasing and have reached a scale where human interaction with the data is prohibitively inefficient. A major barrier to progress is that practitioners do not treat modeling workflows as a design problem. Existing workflows have been created by a slow accretion of software, typically based on undocumented, inflexible scripts haphazardly modified by a succession of scientists and students not trained in modern software engineering methods. As a result, existing modeling workflows suffer from an inability to onboard new datasets into models, an inability to keep pace with accelerating data production rates, and irreproducibility, among other problems. These factors are creating an untenable situation for those conducting and supporting Earth system and space science. Improving modeling workflows requires investments in hardware, software and human resources. This paper describes the critical-path issues that must be targeted to accelerate modeling workflows, including script modularization, parallelization, and automation in the near term, and longer-term investments in virtualized environments for improved scalability, tolerance for lossy data compression, novel data-centric memory and storage technologies, and tools for peer reviewing, preserving and sharing workflows, as well as fundamental statistical and machine learning algorithms.
Big Data in HEP: A comprehensive use case study
NASA Astrophysics Data System (ADS)
Gutsche, Oliver; Cremonesi, Matteo; Elmer, Peter; Jayatilaka, Bo; Kowalkowski, Jim; Pivarski, Jim; Sehrish, Saba; Mantilla Suárez, Cristina; Svyatkovskiy, Alexey; Tran, Nhan
2017-10-01
Experimental Particle Physics has been at the forefront of analyzing the world's largest datasets for decades. The HEP community was the first to develop suitable software and computing tools for this task. In recent times, new toolkits and systems, collectively called Big Data technologies, have emerged to support the analysis of Petabyte and Exabyte datasets in industry. While the principles of data analysis in HEP have not changed (filtering and transforming experiment-specific data formats), these new technologies use different approaches, promise a fresh look at the analysis of very large datasets, and could potentially reduce the time-to-physics with increased interactivity. In this talk, we present an active LHC Run 2 analysis, searching for dark matter with the CMS detector, as a testbed for Big Data technologies. We directly compare the traditional NTuple-based analysis with an equivalent analysis using Apache Spark on the Hadoop ecosystem and beyond. In both cases, we start the analysis with the official experiment data formats and produce publication physics plots. We discuss the advantages and disadvantages of each approach and give an outlook on further studies needed.
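To make the comparison concrete, a Spark version of the classic filter-then-histogram analysis pattern might look like the sketch below; the file path, column names, and cuts are hypothetical, not the CMS data format or the analysis selection.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("cms-darkmatter-sketch").getOrCreate()

# Hypothetical flat-ntuple-style input converted to Parquet; the columns
# (met_pt, jet_pt, n_jets) are illustrative stand-ins.
events = spark.read.parquet("hdfs:///data/run2/events.parquet")

# The analysis pattern is unchanged: filter events, then reduce to a
# publication-style histogram of missing transverse energy.
selected = events.filter((F.col("n_jets") >= 2) & (F.col("jet_pt") > 40.0))
hist = (selected
        .withColumn("met_bin", F.floor(F.col("met_pt") / 25.0) * 25.0)
        .groupBy("met_bin").count()
        .orderBy("met_bin"))
hist.show()
```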
GUDM: Automatic Generation of Unified Datasets for Learning and Reasoning in Healthcare.
Ali, Rahman; Siddiqi, Muhammad Hameed; Idris, Muhammad; Ali, Taqdir; Hussain, Shujaat; Huh, Eui-Nam; Kang, Byeong Ho; Lee, Sungyoung
2015-07-02
A wide array of biomedical data are generated and made available to healthcare experts. However, due to the diverse nature of the data, it is difficult to predict outcomes from them. It is therefore necessary to combine these diverse data sources into a single unified dataset. This paper proposes a global unified data model (GUDM) to provide a global unified data structure for all data sources and to generate a unified dataset with a "data modeler" tool. The proposed tool implements a user-centric, priority-based approach which can easily resolve the problems of unified data modeling and overlapping attributes across multiple datasets. The tool is illustrated using sample diabetes mellitus data. The diverse data sources used to generate the unified dataset for diabetes mellitus include clinical trial information, a social media interaction dataset, and physical activity data collected using different sensors. To realize the significance of the unified dataset, we adopted a well-known rough set theory based rule-creation process to create rules from the unified dataset. The evaluation of the tool on six different sets of locally created diverse datasets shows that the tool, on average, reduces the time and effort of the experts and knowledge engineer in creating unified datasets by 94.1%.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lydia Vaughn; Margaret Torn; Rachel Porras
This dataset includes Δ14C measurements made from CO2 that was collected and purified in 2012-2014 from surface soil chambers, soil pore space, and the background atmosphere. In addition to the 14CO2 data, the dataset includes co-located measurements of CO2 and CH4 flux, soil and air temperature, and soil moisture. Measurements and field samples were taken from intensive study site 1 areas A, B, and C, and from the site 0 and AB transects, at specified positions in high-centered, flat-centered, and low-centered polygons.
Deep neural networks for texture classification-A theoretical analysis.
Basu, Saikat; Mukhopadhyay, Supratik; Karki, Manohar; DiBiano, Robert; Ganguly, Sangram; Nemani, Ramakrishna; Gayaka, Shreekant
2018-01-01
We investigate the use of Deep Neural Networks for the classification of image datasets where texture features are important for generating class-conditional discriminative representations. To this end, we first derive the size of the feature space for some standard textural features extracted from the input dataset and then use the theory of Vapnik-Chervonenkis dimension to show that hand-crafted feature extraction creates low-dimensional representations which help in reducing the overall excess error rate. As a corollary to this analysis, we derive for the first time upper bounds on the VC dimension of Convolutional Neural Networks as well as Dropout and DropConnect networks, and the relation between the excess error rates of Dropout and DropConnect networks. The concept of intrinsic dimension is used to validate the intuition that texture-based datasets are inherently higher dimensional than handwritten digit or other object recognition datasets, and hence more difficult for neural networks to shatter. We then derive the mean distance from the centroid to the nearest and farthest sampling points in an n-dimensional manifold and show that the Relative Contrast of the sample data vanishes as the dimensionality of the underlying vector space tends to infinity.
James, Eric P.; Benjamin, Stanley G.; Marquis, Melinda
2016-10-28
A new gridded dataset for wind and solar resource estimation over the contiguous United States has been derived from hourly updated 1-h forecasts from the National Oceanic and Atmospheric Administration High-Resolution Rapid Refresh (HRRR) 3-km model, composited over a three-year period (approximately 22,000 forecast model runs). The unique dataset features hourly data assimilation and provides physically consistent wind and solar estimates for the renewable energy industry. The wind resource dataset shows strong similarity to that previously provided by a Department of Energy-funded study, and it includes estimates in southern Canada and northern Mexico. The solar resource dataset represents an initial step towards application-specific fields such as global horizontal and direct normal irradiance. This combined dataset will continue to be augmented with new forecast data from the advanced HRRR atmospheric/land-surface model.
Theory of impossible worlds: Toward a physics of information.
Buscema, Paolo Massimo; Sacco, Pier Luigi; Della Torre, Francesca; Massini, Giulia; Breda, Marco; Ferilli, Guido
2018-05-01
In this paper, we introduce an innovative approach to fusion between datasets in terms of attributes and observations, even when they are not related at all. With our technique, starting from datasets representing independent worlds, it is possible to analyze a single global dataset, and each dataset can always be transferred onto the others. This procedure allows a deeper perspective on a problem, by offering the chance of looking into it from other, independent points of view. Even unrelated datasets create a metaphoric representation of the problem, useful in terms of speed of convergence and predictive results, while preserving the fundamental relationships in the data. In order to extract such knowledge, we propose a new learning rule named double backpropagation, by which an auto-encoder concurrently codifies all the different worlds. We test our methodology on different datasets and different issues, to underline the power and flexibility of the Theory of Impossible Worlds.
NASA Technical Reports Server (NTRS)
Kaplan, Michael L.; Lin, Yuh-Lang
2004-01-01
During the research project, sounding datasets were generated for the regions surrounding 9 major airports: Dallas, TX; Boston, MA; New York, NY; Chicago, IL; St. Louis, MO; Atlanta, GA; Miami, FL; San Francisco, CA; and Los Angeles, CA. The numerical simulation of winter and summer environments during which no instrument-flight-rule impact was occurring at these 9 terminals was performed using the most contemporary version of the Terminal Area PBL Prediction System (TAPPS) model, nested from 36 km to 6 km to 1 km horizontal resolution with very detailed vertical resolution in the planetary boundary layer. The soundings from the 1 km model were archived at 30-minute intervals over a 24-hour period, and the vertical dependent variables as well as derived quantities, i.e., 3-dimensional wind components, temperatures, pressures, mixing ratios, turbulence kinetic energy and eddy dissipation rates, were then interpolated to 5 m vertical resolution up to 1000 m above ground level. After partial validation against field experiment datasets for Dallas, as well as larger-scale and much coarser resolution observations at the other 8 airports, these sounding datasets were sent to NASA for use in the Virtual Air Space and Modeling program. The application of these datasets is to provide representative airport weather environments for diagnosing the response of simulated wake vortices to realistic atmospheric environments. These virtual datasets are based on large-scale observed atmospheric initial conditions that are dynamically interpolated in space and time. The 1 km nested-grid simulated datasets provide a coarse and highly smoothed representation of airport-environment meteorological conditions, and details of the airport surface forcing are virtually absent. Nevertheless, where verification data were available, both at coarse scales and at airport scale, the simulated fields were found to accurately replicate the observed background atmospheric processes and the flows surrounding the airports.
The Cluster Science Archive: from Time Period to Physics Based Search
NASA Astrophysics Data System (ADS)
Masson, A.; Escoubet, C. P.; Laakso, H. E.; Perry, C. H.
2015-12-01
Since 2000, the Cluster spacecraft have relayed the most detailed information on how the solar wind affects our geospace in three dimensions. Science output from Cluster is a leap forward in our knowledge of space plasma physics: the science behind space weather. It has been key in improving the modeling of the magnetosphere and understanding its various physical processes. Cluster data have enabled the publication of more than 2000 refereed papers and counting. This substantial scientific return is often attributed to the online availability of the Cluster data archive, now called the Cluster Science Archive (CSA). It is being developed by the ESAC Science Data Center (ESDC) team and maintained alongside other ESA science archives at ESAC (the European Space Astronomy Centre, Madrid, Spain). The CSA is a public archive which contains the entire set of Cluster high-resolution data and other related products in a standard format and with a complete set of metadata. Since May 2015, it also contains data from the CNSA/ESA Double Star mission (2003-2008), a mission operated in conjunction with Cluster. The total data volume now exceeds 100 TB. Accessing the CSA requires registration to enable user profiles; the archive now counts more than 1,500 registered users. The CSA provides unique tools for visualizing its data, including on-demand visualization of particle distribution functions, fast data browsing with more than 15 TB of pre-generated plots, and inventory plots. It also offers command-line capabilities (e.g. data access via Matlab or IDL software, data streaming). For all its reliability, users can currently only request data for a specific time period, while scientists often focus on specific regions or data signatures. For these reasons, a data-mining tool is being developed to do just that. It offers an interface to select data based not only on a time period but on various criteria including key physical parameters, regions of space, and spacecraft constellation geometry. The output of this tool is a list of time periods that fit the criteria imposed by the user. Such a list enables users to download the datasets for all these time periods in one go. We propose to present the state of development of this tool and to interact with the scientific community to better fit its needs.
A Three-Step Spatial-Temporal Clustering Method for Human Activity Pattern Analysis
NASA Astrophysics Data System (ADS)
Huang, W.; Li, S.; Xu, S.
2016-06-01
How people move in cities and what they do in various locations at different times form human activity patterns. Human activity patterns play a key role in urban planning, traffic forecasting, public health and safety, emergency response, friend recommendation, and so on. Therefore, scholars from different fields, such as social science, geography, transportation, physics and computer science, have made great efforts in modelling and analysing human activity patterns or human mobility patterns. One of the essential tasks in such studies is to find the locations or places where individuals stay to perform some kind of activity, before further activity pattern analysis. In the era of Big Data, the emergence of social media along with wearable devices enables human activity data to be collected more easily and efficiently. Furthermore, the dimension of the accessible human activity data has been extended from two (space) or three (space-time) to four dimensions (space, time and semantics). More specifically, not only the location and time at which people stay are collected, but also what people "say" at a location and time can be obtained. The characteristics of these datasets shed new light on the analysis of human mobility, and new methodologies should accordingly be developed to handle them. Traditional methods such as neural networks, statistics and clustering have been applied to study human activity patterns using geosocial media data. Among them, clustering methods have been widely used to analyse spatiotemporal patterns. However, to the best of our knowledge, few clustering algorithms are specifically developed for handling datasets that contain spatial, temporal and semantic aspects all together. In this work, we propose a three-step human activity clustering method based on space, time and semantics to fill this gap. One year of Twitter data, posted in Toronto, Canada, is used to test the clustering-based method. The results show that approximately 55% of the spatiotemporal clusters, distributed in different locations, can eventually be grouped as the same type of clusters once the semantic aspect is taken into consideration.
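A minimal sketch of a three-step space-time-semantics clustering pass, using DBSCAN for the spatial step, coarse time-of-day bins for the temporal step, and TF-IDF text similarity for the semantic step; the specific algorithms, thresholds, and toy posts are assumptions rather than the paper's exact method.

```python
import numpy as np
from sklearn.cluster import DBSCAN, AgglomerativeClustering
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_distances

# Hypothetical geotagged posts: (lat, lon), hour of day, and text.
lat_lon = np.array([[43.650, -79.380], [43.651, -79.381], [43.770, -79.410]])
hours = np.array([9, 18, 9])
texts = ["coffee downtown", "dinner downtown", "campus lecture"]

# Step 1 (space): density-based spatial clusters; eps in degrees (~550 m).
space_ids = DBSCAN(eps=0.005, min_samples=1).fit_predict(lat_lon)

# Step 2 (time): split each spatial cluster by coarse time-of-day bins.
st_ids = [(s, h // 6) for s, h in zip(space_ids, hours)]

# Step 3 (semantics): group records whose texts are similar, so clusters in
# different places can be recognized as the same activity type.
# (metric= is named affinity= in scikit-learn versions before 1.2.)
tfidf = TfidfVectorizer().fit_transform(texts).toarray()
sem_ids = AgglomerativeClustering(
    n_clusters=2, metric="precomputed", linkage="average"
).fit_predict(cosine_distances(tfidf))
print(list(zip(st_ids, sem_ids)))
```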
Chi, Baofang; Tao, Shiheng; Liu, Yanlin
2015-01-01
Sampling the solution space of genome-scale models is generally conducted to determine the feasible region for metabolic flux distribution. Because the region of actual metabolic states resides in only a small fraction of the entire space, it is necessary to shrink the solution space to improve the predictive power of a model. A common strategy is to constrain models by integrating extra datasets such as high-throughput datasets and 13C-labeled flux datasets. However, studies refining these approaches by performing a meta-analysis of massive experimental metabolic flux measurements, which are closely linked to cellular phenotypes, are limited. In the present study, experimentally identified metabolic flux data from 96 published reports were systematically reviewed. Several strong associations among metabolic flux phenotypes were observed. These phenotype-phenotype associations at the flux level were quantified and integrated into a Saccharomyces cerevisiae genome-scale model as extra physiological constraints. By sampling the shrunken solution space of the model, the metabolic flux fluctuation level, which is an intrinsic trait of metabolic reactions determined by the network, was estimated and used to explore its relationship to gene expression noise. Although no correlation was observed across all enzyme-coding genes, a relationship between metabolic flux fluctuation and the expression noise of genes associated with enzyme-dosage sensitive reactions was detected, suggesting that the metabolic network plays a role in shaping gene expression noise. Such correlation was mainly attributed to the genes corresponding to non-essential reactions, rather than essential ones. This was, at least partially, due to regulation underlying the flux phenotype-phenotype associations. Altogether, this study proposes a new approach to shrinking the solution space of a genome-scale model, whose sampling provides new insights into gene expression noise.
Modelling the Burstiness of Complex Space Plasmas Using Linear Fractional Stable Motion
NASA Astrophysics Data System (ADS)
Watkins, N. W.; Rosenberg, S. J.; Chapman, S. C.; Sanchez, R.; Credgington, D.
2009-12-01
The Earth's magnetosphere is quite clearly "complex" in the everyday sense of the word. However, in the last 15 to 20 years there has been a growing thread in space physics (e.g. Freeman & Watkins [Science, 2002], Chapman & Watkins [Space Science Reviews, 2001]) using and developing some of the emerging science of complex systems (e.g. Sornette, 2nd Edition, 2004). A particularly well-studied set of system properties has been derived from those used in the study of critical phenomena, notably correlation functions, power spectra, distributions of bursts above a threshold, and so on (e.g. Watkins [Nonlinear Processes in Geophysics, 2002]). These have revealed behaviours familiar from many other complex systems, such as burstiness, long-range dependence, heavy-tailed probability distributions and so forth. The results of these studies are typically interpreted within existing paradigms, most notably self-organised criticality. However, just as in other developing areas of complexity science (Sornette, op. cit.; Watkins & Freeman [Science, 2008]), it is increasingly being realised that the diagnostics in use have not been extensively studied outside the context in which they were originally proposed. This means that, for example, it is not well established what the expected distribution of bursts above a fixed threshold will be for time series other than Brownian (or fractional Brownian) motion. We will describe some preliminary investigations (Watkins et al [Physical Review E, 2009]) into the burst distribution problem, using Linear Fractional Stable Motion as a controllable toy model of a process exhibiting both long-range dependence and heavy tails. A by-product of the work was a differential equation for LFSM (Watkins et al, op. cit.), which we also briefly discuss. Current and future work will also focus on the thorny problem of distinguishing turbulence from SOC in natural datasets with limited dynamic range (Watkins et al; Uritsky et al [Physical Review Letters, 2009]), an area which will also be briefly discussed.
Dynamical Networks Characterization of Space Weather Events
NASA Astrophysics Data System (ADS)
Orr, L.; Chapman, S. C.; Dods, J.; Gjerloev, J. W.
2017-12-01
Space weather can cause disturbances to satellite systems, impacting navigation technology and telecommunications; it can cause power loss and aviation disruption. A central aspect of the earth's magnetospheric response to space weather events is large-scale and rapid change in ionospheric current patterns. Space weather is highly dynamic and there are still many controversies about how the current system evolves in time. The recent SuperMAG initiative collates ground-based vector magnetic field time series from over 200 magnetometers with 1-minute temporal resolution. In principle this combined dataset is an ideal candidate for quantification using dynamical networks. Network properties and parameters allow us to characterize the time dynamics of the full spatiotemporal pattern of the ionospheric current system. However, applying network methodologies to physical data presents new challenges. We establish whether a given pair of magnetometers is connected in the network by calculating their canonical cross correlation. The magnetometers are connected if their cross correlation exceeds a threshold. In our physical time series this threshold needs to be both station specific, as it varies with (non-linear) individual station sensitivity and location, and able to vary with season, which affects ground conductivity. Additionally, the earth rotates, and therefore the ground stations move significantly on the timescales of geomagnetic disturbances. The magnetometers are also non-uniformly distributed in space. We will present new methodology which addresses these problems and, in particular, achieves dynamic normalization of the physical time series in order to form the network. Correlated disturbances across the magnetometers capture transient currents. Once the dynamical network has been obtained [1][2] from the full magnetometer dataset, it can be used to directly identify detailed inferred transient ionospheric current patterns and track their dynamics. We will show our first results that use network properties such as cliques and clustering coefficients to map these highly dynamic changes in ionospheric current patterns. [1] Dods et al, J. Geophys. Res. 120, doi:10.1002/2015JA02 (2015). [2] Dods et al, J. Geophys. Res. 122, doi:10.1002/2016JA02 (2017).
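A bare-bones version of the network construction step is sketched below, with plain Pearson correlation standing in for canonical cross correlation and fixed per-station thresholds standing in for the dynamic normalization discussed above; station count, window length, and threshold values are assumptions.

```python
import numpy as np
import networkx as nx

rng = np.random.default_rng(5)

# Hypothetical 1-minute magnetometer series: (n_stations, n_minutes).
series = rng.normal(size=(20, 600))
n = series.shape[0]

# Station-specific thresholds stand in for per-station sensitivity and
# seasonally varying ground conductivity (fixed here for illustration).
threshold = np.full(n, 0.2)

G = nx.Graph()
G.add_nodes_from(range(n))
corr = np.corrcoef(series)      # simple stand-in for canonical correlation
for i in range(n):
    for j in range(i + 1, n):
        if abs(corr[i, j]) > max(threshold[i], threshold[j]):
            G.add_edge(i, j)

# Network parameters tracked through time characterize the current system.
print(nx.average_clustering(G), len(list(nx.find_cliques(G))))
```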
A Bayesian network approach for modeling local failure in lung cancer
NASA Astrophysics Data System (ADS)
Oh, Jung Hun; Craft, Jeffrey; Lozi, Rawan Al; Vaidya, Manushka; Meng, Yifan; Deasy, Joseph O.; Bradley, Jeffrey D.; El Naqa, Issam
2011-03-01
Locally advanced non-small cell lung cancer (NSCLC) patients suffer from a high local failure rate following radiotherapy. Despite many efforts to develop new dose-volume models for early detection of tumor local failure, no significant improvement has been reported when such models are applied prospectively. Based on recent studies of the role of biomarker proteins in hypoxia and inflammation in predicting tumor response to radiotherapy, we hypothesize that combining physical and biological factors within a suitable framework could improve the overall prediction. To test this hypothesis, we propose a graphical Bayesian network framework for predicting local failure in lung cancer. The proposed approach was tested using two different datasets of locally advanced NSCLC patients treated with radiotherapy. The first dataset was collected retrospectively and comprises clinical and dosimetric variables only. The second dataset was collected prospectively; in addition to clinical and dosimetric information, blood was drawn from the patients at various time points to extract candidate biomarkers as well. Our preliminary results show that the proposed method can be used as an efficient method to develop predictive models of local failure in these patients and to interpret relationships among the different variables in the models. We also demonstrate the potential use of heterogeneous physical and biological variables to improve the model prediction. With the first dataset, we achieved better performance compared with competing Bayesian-based classifiers. With the second dataset, the combined model had a slightly higher performance compared to individual physical and biological models, with the biological variables making the largest contribution. Our preliminary results highlight the potential of the proposed integrated approach for predicting post-radiotherapy local failure in NSCLC patients.
Legaz-García, María del Carmen; Miñarro-Giménez, José Antonio; Menárguez-Tortosa, Marcos; Fernández-Breis, Jesualdo Tomás
2015-01-01
Biomedical research usually requires combining large volumes of data from multiple heterogeneous sources. Such heterogeneity complicates not only the generation of research-oriented datasets but also their exploitation. In recent years, the Open Data paradigm has proposed new ways of making data available so that sharing and integration are facilitated. Open Data approaches may pursue the generation of content readable only by humans or by both humans and machines; the latter are the ones of interest in our work. The Semantic Web provides a natural technological space for data integration and exploitation and offers a range of technologies for generating not only Open Datasets but also Linked Datasets, that is, open datasets linked to other open datasets. According to Berners-Lee's classification, each open dataset can be given a rating of between one and five stars. In recent years, we have developed and applied our SWIT tool, which automates the generation of semantic datasets from heterogeneous data sources. SWIT produces four-star datasets; the fifth star can be obtained by having the dataset linked from external ones. In this paper, we describe how we have applied the tool in two projects related to health care records and orthology data, as well as the major lessons learned from such efforts.
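For a flavour of what a four-star (machine-readable, open-format, RDF-based) dataset looks like, the snippet below emits a few triples with rdflib; the vocabulary, URIs, and record content are invented for illustration and are not SWIT output.

```python
from rdflib import Graph, Literal, Namespace, RDF, URIRef

EX = Namespace("http://example.org/vocab#")  # hypothetical vocabulary
g = Graph()
patient = URIRef("http://example.org/patient/p001")
g.add((patient, RDF.type, EX.Patient))
g.add((patient, EX.hasDiagnosis, Literal("J45.9")))  # a coded diagnosis
# the fifth star is earned when triples here point at, or are pointed at
# by, resources in external linked datasets
print(g.serialize(format="turtle"))
```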
Finding Chemical Structures Corresponding to a Set of Coordinates in Chemical Descriptor Space.
Miyao, Tomoyuki; Funatsu, Kimito
2017-08-01
When chemical structures are searched for on the basis of descriptor values, or descriptors are interpreted in terms of those values, it is important that corresponding chemical structures actually exist. In order to consider the existence of chemical structures located in a specific region of chemical space, we propose to search for them inside training data domains (TDDs), which are dense areas of a training dataset in the chemical space. We investigated the features of TDDs using diverse and local datasets, assuming that GDB11 is the chemical universe. These two analyses showed that considering TDDs gives a higher chance of finding chemical structures than a random search-based method, and that novel chemical structures actually exist inside TDDs. In addition to those findings, we tested the hypothesis that chemical structures are distributed over limited areas of chemical space. This hypothesis was confirmed by the fact that distances among chemical structures in several descriptor spaces were much shorter than those among randomly generated coordinates in the training data range. © 2017 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
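One crude way to operationalize membership of a dense training-data region is a k-nearest-neighbour distance test, sketched below; the cutoff rule and all parameter values are assumptions for illustration rather than the TDD definition used in the paper.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def inside_dense_region(X_train, x_query, k=5, quantile=0.95):
    """Accept a query descriptor vector if its mean distance to its k
    nearest training structures is no larger than a cutoff calibrated
    on the training set itself (assumed criterion)."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_train)
    d_train, _ = nn.kneighbors(X_train)          # col 0 is the point itself
    cutoff = np.quantile(d_train[:, 1:].mean(axis=1), quantile)
    d_query, _ = nn.kneighbors(np.atleast_2d(x_query), n_neighbors=k)
    return d_query.mean() <= cutoff
```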
Detecting and Quantifying Forest Change: The Potential of Existing C- and X-Band Radar Datasets.
Tanase, Mihai A; Ismail, Ismail; Lowell, Kim; Karyanto, Oka; Santoro, Maurizio
2015-01-01
This paper evaluates the opportunity provided by global interferometric radar datasets for monitoring deforestation, degradation and forest regrowth in tropical and semi-arid environments. The paper describes an easy-to-implement method for detecting forest spatial changes and estimating their magnitude. The datasets were acquired by space-borne high-spatial-resolution radar missions at near-global scales, and are thus significant for monitoring systems developed under the United Nations Framework Convention on Climate Change (UNFCCC). The approach presented in this paper was tested in two areas located in Indonesia and Australia. Forest change estimation was based on differences between a reference dataset acquired in February 2000 by the Shuttle Radar Topography Mission (SRTM) and TanDEM-X mission (TDM) datasets acquired in 2011 and 2013. The synergy between the SRTM and TDM datasets allowed not only identifying changes in forest extent but also estimating their magnitude with respect to the reference through variations in forest height.
The Status of the NASA MEaSUREs Combined ASTER and MODIS Emissivity Over Land (CAMEL) Products
NASA Astrophysics Data System (ADS)
Borbas, E. E.; Feltz, M.; Hulley, G. C.; Knuteson, R. O.; Hook, S. J.
2017-12-01
As part of a NASA MEaSUREs Land Surface Temperature and Emissivity project, the University of Wisconsin-Madison Space Science and Engineering Center and NASA's Jet Propulsion Laboratory have developed a global monthly mean emissivity Earth System Data Record (ESDR). The CAMEL ESDR was produced by merging two current state-of-the-art emissivity datasets: the UW-Madison MODIS Infrared emissivity dataset (UWIREMIS) and the JPL ASTER Global Emissivity Dataset v4 (GEDv4). The dataset includes monthly global records of emissivity and uncertainty at 13 hinge points between 3.6 and 14.3 µm, together with Principal Components Analysis (PCA) coefficients, at 5-kilometer resolution for the years 2003 to 2015. A high spectral resolution algorithm is also provided for HSR applications. The dataset is currently being tested in sounder retrieval algorithms (e.g. for CrIS and IASI) and has already been implemented in RTTOV-12 for immediate use in numerical weather modeling and data assimilation. This poster will present the current status of the dataset.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Balke, Nina; Kalinin, Sergei V.; Jesse, Stephen
Kelvin probe force microscopy (KPFM) has provided deep insights into the role local electronic, ionic and electrochemical processes play on the global functionality of materials and devices, even down to the atomic scale. Conventional KPFM utilizes heterodyne detection and bias feedback to measure the contact potential difference (CPD) between tip and sample. This measurement paradigm, however, permits only partial recovery of the information encoded in bias- and time-dependent electrostatic interactions between the tip and sample and effectively down-samples the cantilever response to a single measurement of CPD per pixel. This level of detail is insufficient for electroactive materials, devices, or solid-liquid interfaces, where non-linear dielectrics are present or spurious electrostatic events are possible. Here, we simulate and experimentally validate a novel approach for spatially resolved KPFM capable of a full information transfer of the dynamic electric processes occurring between tip and sample. General acquisition mode, or G-Mode, adopts a big data approach utilising high speed detection, compression, and storage of the raw cantilever deflection signal in its entirety at high sampling rates (> 4 MHz), providing a permanent record of the tip trajectory. We develop a range of methodologies for analysing the resultant large multidimensional datasets involving classical, physics-based and information-based approaches. Physics-based analysis of G-Mode KPFM data recovers the parabolic bias dependence of the electrostatic force for each cycle of the excitation voltage, leading to a multidimensional dataset containing spatial and temporal dependence of the CPD and capacitance channels. We use multivariate statistical methods to reduce data volume and separate the complex multidimensional data sets into statistically significant components that can then be mapped onto separate physical mechanisms. Overall, G-Mode KPFM offers a new paradigm to study dynamic electric phenomena in electroactive interfaces, as well as offering a promising approach to extend KPFM to solid-liquid interfaces.
NASA Astrophysics Data System (ADS)
Oware, E. K.; Moysey, S. M.
2016-12-01
Regularization stabilizes the geophysical imaging problem, in which sparse and noisy measurements render solutions unstable and non-unique. Conventional regularization constraints are, however, independent of the physics of the underlying process and often produce smoothed-out tomograms with mass underestimation. Cascaded time-lapse (CTL) is a widely used reconstruction technique for monitoring, wherein a tomogram obtained from the background dataset is employed as the starting model for the inversion of subsequent time-lapse datasets. In contrast, a proper orthogonal decomposition (POD)-constrained inversion framework enforces physics-based regularization based upon prior understanding of the expected evolution of state variables. The physics-based constraints are represented in the form of POD basis vectors. The basis vectors are constructed from numerically generated training images (TIs) that mimic the desired process. The target can be reconstructed from a small number of selected basis vectors; hence, there is a reduction in the number of inversion parameters compared to the full-dimensional space. The inversion involves finding the optimal combination of the selected basis vectors conditioned on the geophysical measurements. We apply the algorithm to 2-D lab-scale saline transport experiments with electrical resistivity (ER) monitoring. We consider two transport scenarios with one and two mass injection points evolving into unimodal and bimodal plume morphologies, respectively. The unimodal plume is consistent with the assumptions underlying the generation of the TIs, whereas bimodality in plume morphology was not conceptualized. We compare difference tomograms retrieved from POD with those obtained from CTL. Qualitative comparisons of the difference tomograms with images of their corresponding dye plumes suggest that POD recovered more compact plumes than CTL. While mass recovery generally deteriorated with an increasing number of time-steps, POD outperformed CTL in terms of mass recovery accuracy. POD is also computationally superior, requiring only 2.5 minutes to complete each inversion compared to 3 hours for CTL.
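The construction of the POD basis from training images reduces, in essence, to an SVD, as in the sketch below (array names assumed); the inversion then estimates only the few basis coefficients rather than every model cell.

```python
import numpy as np

def pod_basis(training_images, rank):
    """Leading POD basis vectors from numerically generated training
    images, each flattened into one column of the snapshot matrix."""
    A = np.column_stack([t.ravel() for t in training_images])
    A = A - A.mean(axis=1, keepdims=True)
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    return U[:, :rank]

# with Phi = pod_basis(tis, r), the model is parameterized as
# m = m0 + Phi @ c, and the inversion searches over the r coefficients c
```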
Implementing DOIs for Oceanographic Satellite Data at PO.DAAC
NASA Astrophysics Data System (ADS)
Hausman, J.; Tauer, E.; Chung, N.; Chen, C.; Moroni, D. F.
2013-12-01
The Physical Oceanographic Distributed Active Archive Center (PO.DAAC) is NASA's archive for physical oceanographic satellite data. It distributes over 500 datasets from gravity, ocean wind, sea surface topography, sea ice, ocean currents, salinity, and sea surface temperature satellite missions. A dataset is a collection of granules/files that share the same mission/project, versioning, processing level, and spatial and temporal characteristics. The large number of datasets is partially due to the number of satellite missions, but mostly because a single satellite mission typically has multiple versions, or even multiple temporal and spatial resolutions, of data. As a result, a user might mistake one dataset for a different dataset from the same satellite mission. Due to PO.DAAC's vast variety and volume of data and growing requirements to report dataset usage, it has begun implementing DOIs for the datasets it archives and distributes. However, this was not as simple as registering a name for a DOI and providing a URL. Before implementing DOIs, multiple questions needed to be answered. What are the sponsor and end-user expectations regarding DOIs? At what level does a DOI get assigned (dataset, file/granule)? Do all data get a DOI, or only selected data? How do we create a DOI? How do we create landing pages and manage them? What changes need to be made to the data archive, life cycle policy and web portal to accommodate DOIs? What if the data also exist at another archive and a DOI already exists? How is a DOI included if the data were obtained via a subsetting tool? How does a researcher or author provide a unique, definitive reference (standard citation) for a given dataset? This presentation will discuss how these questions were answered through changes in policy, process, and system design. Implementing DOIs is not a trivial undertaking, but as DOIs are rapidly becoming the de facto approach, it is worth the effort. Researchers have historically referenced the source satellite and data center (or archive), but scientific writings do not typically provide enough detail to point to a singular, uniquely identifiable dataset. DOIs provide the means to help researchers be precise in their data citations and provide needed clarity, standardization and permanence.
NASA Astrophysics Data System (ADS)
Vallat, C.; Besse, S.; Barbarisi, I.; Arviset, C.; De Marchi, G.; Barthelemy, M.; Coia, D.; Costa, M.; Docasal, R.; Fraga, D.; Heather, D. J.; Lim, T.; Macfarlane, A.; Martinez, S.; Rios, C.; Vallejo, F.; Said, J.
2017-09-01
The Planetary Science Archive (PSA) is the European Space Agency's (ESA) repository of science data from all planetary science and exploration missions. The PSA provides access to scientific datasets through various interfaces at http://psa.esa.int. All datasets are scientifically peer-reviewed by independent scientists, and are compliant with the Planetary Data System (PDS) standards. The PSA has started to implement a number of significant improvements, mostly driven by the evolution of the PDS standards and the growing need for better interfaces and advanced applications to support science exploitation.
A Hierarchical Framework for State-Space Matrix Inference and Clustering.
Zuo, Chandler; Chen, Kailei; Hewitt, Kyle J; Bresnick, Emery H; Keleş, Sündüz
2016-09-01
In recent years, a large number of genomic and epigenomic studies have focused on the integrative analysis of multiple experimental datasets measured over a large number of observational units. The objectives of such studies include not only inferring a hidden state of activity for each unit over individual experiments, but also detecting highly associated clusters of units based on their inferred states. Although there are a number of methods tailored for specific datasets, there is currently no state-of-the-art modeling framework for this general class of problems. In this paper, we develop the MBASIC (Matrix Based Analysis for State-space Inference and Clustering) framework. MBASIC consists of two parts: state-space mapping and state-space clustering. In state-space mapping, it maps observations onto a finite state-space, representing the activation states of units across conditions. In state-space clustering, MBASIC incorporates a finite mixture model to cluster the units based on their inferred state-space profiles across all conditions. Both the state-space mapping and clustering can be simultaneously estimated through an Expectation-Maximization algorithm. MBASIC flexibly adapts to a large number of parametric distributions for the observed data, as well as the heterogeneity in replicate experiments. It allows for imposing structural assumptions on each cluster, and enables model selection using information criteria. In our data-driven simulation studies, MBASIC showed considerable accuracy in recovering both the underlying state-space variables and clustering structures. We applied MBASIC to two genome research problems using large numbers of datasets from the ENCODE project. The first application grouped genes based on transcription factor occupancy profiles of their promoter regions in two different cell types. The second application focused on identifying groups of loci that are similar to a GATA2 binding site that is functional at its endogenous locus, by utilizing transcription factor occupancy data, and illustrated the applicability of MBASIC to a wide variety of problems. In both studies, MBASIC showed higher levels of raw data fidelity than analyzing these data with a two-step approach using ENCODE results on transcription factor occupancy data.
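The E-step/M-step alternation at the heart of such a framework can be illustrated on the simplest case, a one-dimensional Gaussian mixture; MBASIC's actual model (joint state-space mapping and clustering, general parametric families, replicate structure) is considerably richer than this toy.

```python
import numpy as np

def em_gaussian_mixture(x, K=2, iters=100, seed=0):
    """Toy EM for a 1-D Gaussian mixture: the E-step computes component
    responsibilities, the M-step re-estimates weights, means, variances."""
    rng = np.random.default_rng(seed)
    mu = rng.choice(x, K)
    sig = np.full(K, x.std())
    pi = np.full(K, 1.0 / K)
    for _ in range(iters):
        dens = pi * np.exp(-0.5 * ((x[:, None] - mu) / sig) ** 2) / sig
        r = dens / dens.sum(axis=1, keepdims=True)      # E-step
        nk = r.sum(axis=0)                              # M-step
        pi, mu = nk / len(x), (r * x[:, None]).sum(axis=0) / nk
        sig = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    return pi, mu, sig
```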
Sousa, Daniel; Small, Christopher
2018-02-14
Planned hyperspectral satellite missions and the decreased revisit time of multispectral imaging offer the potential for data fusion to leverage both the spectral resolution of hyperspectral sensors and the temporal resolution of multispectral constellations. Hyperspectral imagery can also be used to better understand fundamental properties of multispectral data. In this analysis, we use five flight lines from the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) archive with coincident Landsat 8 acquisitions over a spectrally diverse region of California to address the following questions: (1) How much of the spectral dimensionality of hyperspectral data is captured in multispectral data?; (2) Is the characteristic pyramidal structure of the multispectral feature space also present in the low order dimensions of the hyperspectral feature space at comparable spatial scales?; (3) How much variability in rock and soil substrate endmembers (EMs) present in hyperspectral data is captured by multispectral sensors? We find nearly identical partitions of variance, low-order feature space topologies, and EM spectra for hyperspectral and multispectral image composites. The resulting feature spaces and EMs are also very similar to those from previous global multispectral analyses, implying that the fundamental structure of the global feature space is present in our relatively small spatial subset of California. Finally, we find that the multispectral dataset well represents the substrate EM variability present in the study area - despite its inability to resolve narrow band absorptions. We observe a tentative but consistent physical relationship between the gradation of substrate reflectance in the feature space and the gradation of sand versus clay content in the soil classification system.
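The partition of variance compared between the two sensors can be read off the singular values of the centered pixel-by-band matrix, as in this generic sketch (array names assumed):

```python
import numpy as np

def variance_partition(X, k=3):
    """Fraction of total variance captured by the k leading principal
    components of an (n_pixels, n_bands) reflectance matrix."""
    s = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)
    return (s[:k] ** 2).sum() / (s ** 2).sum()

# e.g. compare variance_partition(hyperspectral_pixels) against
# variance_partition(multispectral_pixels) at the same spatial scale
```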
Data-driven Applications for the Sun-Earth System
NASA Astrophysics Data System (ADS)
Kondrashov, D. A.
2016-12-01
Advances in observational and data mining techniques allow extracting information from the large volume of Sun-Earth observational data that can be assimilated into first-principles physical models. However, equations governing Sun-Earth phenomena are typically nonlinear, complex, and high-dimensional. The high computational demand of solving the full governing equations over a large range of scales precludes the use of a variety of useful assimilative tools that rely on applied mathematical and statistical techniques for quantifying uncertainty and predictability. Effective use of such tools requires the development of computationally efficient methods to facilitate fusion of data with models. This presentation will provide an overview of various existing as well as newly developed data-driven techniques adopted from the atmospheric and oceanic sciences that have proved useful for space physics applications, such as a computationally efficient implementation of the Kalman filter in radiation belt modeling, solar wind gap-filling by Singular Spectrum Analysis, and a low-rank procedure for assimilation of low-altitude ionospheric magnetic perturbations into the Lyon-Fedder-Mobarry (LFM) global magnetospheric model. Reduced-order non-Markovian inverse modeling and novel data-adaptive decompositions of Sun-Earth datasets will also be demonstrated.
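As an illustration of SSA-based gap filling of the kind mentioned for the solar wind, the sketch below alternates between a low-rank reconstruction of the trajectory (Hankel) matrix and re-imposing the observed samples; the window length, rank, and iteration count are illustrative choices, not the operational settings.

```python
import numpy as np

def ssa_gap_fill(y, window=50, rank=3, iters=25):
    """Iterative SSA gap filling: initialize gaps with the mean, then
    repeatedly rebuild the series from the leading singular components
    of its Hankel matrix, updating only the gap positions."""
    y = y.copy()
    gaps = np.isnan(y)
    y[gaps] = np.nanmean(y)
    n = len(y)
    for _ in range(iters):
        H = np.column_stack([y[i:i + window] for i in range(n - window + 1)])
        U, s, Vt = np.linalg.svd(H, full_matrices=False)
        Hr = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        rec, cnt = np.zeros(n), np.zeros(n)
        for j in range(Hr.shape[1]):       # diagonal averaging
            rec[j:j + window] += Hr[:, j]
            cnt[j:j + window] += 1
        y[gaps] = (rec / cnt)[gaps]
    return y
```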
NASA Astrophysics Data System (ADS)
Leka, K. D.; Barnes, Graham; Wagner, Eric
2018-04-01
A classification infrastructure built upon Discriminant Analysis (DA) has been developed at NorthWest Research Associates for examining the statistical differences between samples of two known populations. The infrastructure originated in the study of the physical differences between flare-quiet and flare-imminent solar active regions; we describe herein some of its details, including the parametrization of large datasets, schemes for handling "null" and "bad" data in multi-parameter analysis, the application of non-parametric multi-dimensional DA, an extension through Bayes' theorem to probabilistic classification, and methods invoked for evaluating classifier success. The classifier infrastructure is applicable to a wide range of scientific questions in solar physics. We demonstrate its application to the question of distinguishing flare-imminent from flare-quiet solar active regions, updating results from the original publications that were based on different data and much smaller sample sizes. Finally, as a demonstration of "Research to Operations" efforts in the space-weather forecasting context, we present the Discriminant Analysis Flare Forecasting System (DAFFS), a near-real-time, operationally running solar flare forecasting tool that was developed from the research-directed infrastructure.
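The extension from a discriminant function to a probabilistic forecast via Bayes' theorem can be sketched for the two-class Gaussian (parametric) case as follows; the infrastructure described above also supports non-parametric density estimates, and the variable names and climatological prior here are placeholders.

```python
import numpy as np
from scipy.stats import multivariate_normal

def flare_probability(X_quiet, X_flare, x_new, prior_flare=0.1):
    """Posterior probability that a region with parameter vector x_new
    belongs to the flare-imminent population, assuming Gaussian class
    densities and a climatological prior (illustrative stand-in)."""
    f_q = multivariate_normal(X_quiet.mean(0), np.cov(X_quiet.T)).pdf(x_new)
    f_f = multivariate_normal(X_flare.mean(0), np.cov(X_flare.T)).pdf(x_new)
    return f_f * prior_flare / (f_f * prior_flare + f_q * (1 - prior_flare))
```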
Modal Analysis Using the Singular Value Decomposition and Rational Fraction Polynomials
2017-04-06
The programs are designed for experimental datasets with multiple drive and response points and have proven effective even for systems with numerous closely-spaced modes.
Christopher Daly; Melissa E. Slater; Joshua A. Roberti; Stephanie H. Laseter; Lloyd W. Swift
2017-01-01
A 69-station, densely spaced rain gauge network was maintained over the period 1951–1958 in the Coweeta Hydrologic Laboratory, located in the southern Appalachians in western North Carolina, USA. This unique dataset was used to develop the first digital seasonal and annual precipitation maps for the Coweeta basin, using elevation regression functions and...
Prediction of Solvent Physical Properties using the Hierarchical Clustering Method
Recently a QSAR (Quantitative Structure Activity Relationship) method, the hierarchical clustering method, was developed to estimate acute toxicity values for large, diverse datasets. This methodology has now been applied to estimate solvent physical properties including sur...
The discovery of structural form
Kemp, Charles; Tenenbaum, Joshua B.
2008-01-01
Algorithms for finding structure in data have become increasingly important both as tools for scientific data analysis and as models of human learning, yet they suffer from a critical limitation. Scientists discover qualitatively new forms of structure in observed data: For instance, Linnaeus recognized the hierarchical organization of biological species, and Mendeleev recognized the periodic structure of the chemical elements. Analogous insights play a pivotal role in cognitive development: Children discover that object category labels can be organized into hierarchies, friendship networks are organized into cliques, and comparative relations (e.g., “bigger than” or “better than”) respect a transitive order. Standard algorithms, however, can only learn structures of a single form that must be specified in advance: For instance, algorithms for hierarchical clustering create tree structures, whereas algorithms for dimensionality-reduction create low-dimensional spaces. Here, we present a computational model that learns structures of many different forms and that discovers which form is best for a given dataset. The model makes probabilistic inferences over a space of graph grammars representing trees, linear orders, multidimensional spaces, rings, dominance hierarchies, cliques, and other forms and successfully discovers the underlying structure of a variety of physical, biological, and social domains. Our approach brings structure learning methods closer to human abilities and may lead to a deeper computational understanding of cognitive development. PMID:18669663
NASA Astrophysics Data System (ADS)
Pham, Tien-Lam; Nguyen, Nguyen-Duong; Nguyen, Van-Doan; Kino, Hiori; Miyake, Takashi; Dam, Hieu-Chi
2018-05-01
We have developed a descriptor named Orbital Field Matrix (OFM) for representing material structures in datasets of multi-element materials. The descriptor is based on information about the atomic valence shell electrons and their coordination. In this work, we develop an extension of OFM called OFM1. We have shown that these descriptors are highly applicable in predicting the physical properties of materials and in providing insights into the materials space by mapping it into a low-dimensional embedded space. Our experiments with transition metal/lanthanide metal alloys show that the local magnetic moments and formation energies can be accurately reproduced using simple nearest-neighbor regression, thus confirming the relevance of our descriptors. Using kernel ridge regression, we could accurately reproduce local magnetic moments and formation energies calculated from first principles, with mean absolute errors of 0.03 μB and 0.10 eV/atom, respectively. We show that meaningful low-dimensional representations can be extracted from the original descriptor using descriptive learning algorithms. An intuitive grasp of the materials space, qualitative evaluation of the similarities in local structures or crystalline materials, and inference in the design of new materials by element substitution can be performed effectively based on these low-dimensional representations.
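In outline, a kernel ridge regression of the sort quoted above looks like the sketch below; the hyperparameter values are placeholders, and the descriptor matrix X (OFM rows) and target vector y are assumed to have been computed elsewhere.

```python
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import cross_val_score

def krr_cv_mae(X, y, alpha=1e-3, gamma=1e-2):
    """10-fold cross-validated mean absolute error of an RBF kernel
    ridge regression from descriptors X to a property y (e.g. the
    formation energy in eV/atom); hyperparameters are placeholders."""
    model = KernelRidge(kernel="rbf", alpha=alpha, gamma=gamma)
    scores = cross_val_score(model, X, y, cv=10,
                             scoring="neg_mean_absolute_error")
    return -scores.mean()
```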
Application of constrained k-means clustering in ground motion simulation validation
NASA Astrophysics Data System (ADS)
Khoshnevis, N.; Taborda, R.
2017-12-01
The validation of ground motion synthetics has received increased attention over the last few years due to the advances in physics-based deterministic and hybrid simulation methods. Unlike for low-frequency simulations (f ≤ 0.5 Hz), for which it has become reasonable to expect a good match between synthetics and data, in the case of high-frequency simulations (f ≥ 1 Hz) it is not possible to match results on a wiggle-by-wiggle basis. This is mostly due to the various complexities and uncertainties involved in earthquake ground motion modeling. Therefore, in order to compare synthetics with data we turn to different time series metrics, which are used as a means to characterize how well the synthetics match the data in a qualitative and statistical sense. In general, these metrics provide goodness-of-fit (GOF) scores that measure the level of similarity in the time and frequency domains. It is common for these scores to be scaled from 0 to 10, with 10 representing a perfect match. Although using individual metrics for particular applications is considered more adequate, there is no consensus or unified method to classify the comparison between a set of synthetic and recorded seismograms when the various metrics offer different scores. We study the relationship among these metrics through a constrained k-means clustering approach. We define 4 hypothetical stations with scores 3, 5, 7, and 9 for all metrics. We place these stations under cannot-link constraints. We generate the dataset through the validation of the results from a deterministic (physics-based) ground motion simulation for a moderate-magnitude earthquake in the greater Los Angeles basin using three velocity models. The maximum frequency of the simulation is 4 Hz. The dataset involves over 300 stations and 11 metrics, or features, as they are understood in the clustering process, where the metrics form a multi-dimensional space. We address high-dimensional feature effects with a subspace-clustering analysis, generate a final labeled dataset of stations, and discuss the within-class statistical characteristics of each metric. Labeling these stations is the first step towards developing a unified metric to evaluate ground motion simulations in an application-independent manner.
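A minimal version of k-means with cannot-link constraints, in the spirit of COP-k-means, is sketched below: a point may not join a cluster that already holds one of its cannot-link partners. The visiting order and tie-breaking are illustrative simplifications of the published algorithm.

```python
import numpy as np

def cop_kmeans(X, k, cannot_link, iters=50, seed=0):
    """cannot_link maps a point index to the indices it may not share a
    cluster with (e.g. the four hypothetical anchor stations)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    labels = np.full(len(X), -1)
    for _ in range(iters):
        labels[:] = -1
        for i in rng.permutation(len(X)):
            d = np.linalg.norm(X[i] - centers, axis=1)
            for c in np.argsort(d):          # nearest feasible cluster
                if all(labels[j] != c for j in cannot_link.get(i, ())):
                    labels[i] = c
                    break
        centers = np.array([X[labels == c].mean(axis=0)
                            if (labels == c).any() else centers[c]
                            for c in range(k)])
    return labels, centers
```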
A gridded hourly rainfall dataset for the UK applied to a national physically-based modelling system
NASA Astrophysics Data System (ADS)
Lewis, Elizabeth; Blenkinsop, Stephen; Quinn, Niall; Freer, Jim; Coxon, Gemma; Woods, Ross; Bates, Paul; Fowler, Hayley
2016-04-01
An hourly gridded rainfall product has great potential for use in many hydrological applications that require high temporal resolution meteorological data. One important example of this is flood risk management, with flooding in the UK highly dependent on sub-daily rainfall intensities amongst other factors. Knowledge of sub-daily rainfall intensities is therefore critical to designing hydraulic structures or flood defences to appropriate levels of service. Sub-daily rainfall rates are also essential inputs for flood forecasting, allowing for estimates of peak flows and stage for flood warning and response. In addition, an hourly gridded rainfall dataset has significant potential for practical applications such as better representation of extremes and pluvial flash flooding, validation of high resolution climate models and improving the representation of sub-daily rainfall in weather generators. A new 1 km gridded hourly rainfall dataset for the UK has been created by disaggregating the daily Gridded Estimates of Areal Rainfall (CEH-GEAR) dataset using comprehensively quality-controlled hourly rain gauge data from over 1300 observation stations across the country. Quality control measures include identification of frequent tips, daily accumulations and dry spells, comparison of daily totals against the CEH-GEAR daily dataset, and nearest neighbour checks. The quality control procedure was validated against historic extreme rainfall events and the UKCP09 5 km daily rainfall dataset. General use of the dataset has been demonstrated by testing the sensitivity of a physically-based hydrological modelling system for Great Britain to the distribution and rates of rainfall and potential evapotranspiration. Of the sensitivity tests undertaken, the largest improvements in model performance were seen when the hourly gridded rainfall dataset was combined with potential evapotranspiration disaggregated to hourly intervals, with 61% of catchments showing an increase in Nash-Sutcliffe efficiency (NSE) between observed and simulated streamflows as a result of more realistic sub-daily meteorological forcing.
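One simple disaggregation rule consistent with this description is to spread each grid cell's daily total according to the hourly fractions recorded at the nearest quality-controlled gauge, as sketched below; the operational CEH-GEAR-based procedure involves more careful gauge selection and quality control than this.

```python
import numpy as np

def disaggregate_day(daily_total, nearest_gauge_hourly):
    """Split one cell's daily rainfall total into 24 hourly values using
    the hourly fractions at the nearest gauge (assumed scheme)."""
    if nearest_gauge_hourly.sum() > 0:
        fractions = nearest_gauge_hourly / nearest_gauge_hourly.sum()
    else:
        fractions = np.full(24, 1.0 / 24)   # gauge dry: spread uniformly
    return daily_total * fractions
```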
Extremes and bursts in complex multi-scale plasmas
NASA Astrophysics Data System (ADS)
Watkins, N. W.; Chapman, S. C.; Hnat, B.
2012-04-01
Quantifying the spectrum of sizes and durations of large and/or long-lived fluctuations in complex, multi-scale space plasmas is a topic of both theoretical and practical importance. The predictions of inherently multi-scale physical theories such as MHD turbulence have given one direct stimulus for its investigation. There are also space weather implications to an improved ability to assess the likelihood of an extreme fluctuation of a given size. Our intuition as scientists tends to be formed on the familiar Gaussian "normal" distribution, which has a very low likelihood of extreme fluctuations. Perhaps surprisingly, there is both theoretical and observational evidence that favours non-Gaussian, heavier-tailed probability distributions for some space physics datasets. Additionally there is evidence for the existence of long-ranged memory between the values of fluctuations. In this talk I will show how such properties can be captured in a preliminary way by a self-similar, fractal model. I will show how such a fractal model can be used to make predictions for experimentally accessible quantities like the size and duration of a burst (a sequence of values that exceed a given threshold), or the survival probability of a burst [cf. preliminary results in Watkins et al, PRE, 2009]. In real-world time series scaling behaviour need not be "mild" enough to be captured by a single self-similarity exponent H, but might instead require a "wild" multifractal spectrum of scaling exponents [e.g. Rypdal and Rypdal, JGR, 2011; Moloney and Davidsen, JGR, 2011] to give a complete description. I will discuss preliminary work on extending the burst approach into the multifractal domain [see also Watkins et al, chapter in press for AGU Chapman Conference on Complexity and Extreme Events in the Geosciences, Hyderabad].
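The burst quantities referred to here, the size and duration of excursions above a fixed threshold, are straightforward to extract from a series, e.g.:

```python
import numpy as np

def bursts(x, threshold):
    """Size (integrated excess above the threshold) and duration of each
    burst, i.e. each maximal run of samples exceeding the threshold."""
    above = (x > threshold).astype(int)
    edges = np.diff(above, prepend=0, append=0)
    starts = np.where(edges == 1)[0]     # upcrossings
    ends = np.where(edges == -1)[0]      # downcrossings (exclusive)
    sizes = np.array([np.sum(x[a:b] - threshold)
                      for a, b in zip(starts, ends)])
    return sizes, ends - starts
```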
NASA Astrophysics Data System (ADS)
Kamide, Y.; Balan, Nanan
2016-12-01
In the history of geomagnetism, geoelectricity and space science, including solar-terrestrial physics, ground magnetic records have been demonstrated to be a powerful tool for monitoring the levels of overall geomagnetic activity. For example, the Kp and ap indices, perhaps the geomagnetic indices with the longest history, have been and are still being used as space weather parameters, where "p" stands for "planetary", implying that these indices express average geomagnetic disturbances over the entire Earth on a planetary scale. To quantify the intensity level of geomagnetic storms, however, it is common to rely on the Dst index, which is supposed to show the magnitude of the storm-time ring current. Efforts were also made to inter-calibrate various activity indices. Different indices were proposed to express different aspects of a phenomenon in the near-Earth space. In the early 1980s, several research groups in Japan, Russia, Europe and the US developed the so-called magnetogram-inversion techniques, all proposed independently. Subsequent improvements of the magnetogram-inversion algorithms allowed them to be applied to a number of different datasets for magnetospheric convection and substorms. In the present review, we demonstrate how important it was to make full use of ground magnetic data covering a large extent in both the latitudinal and longitudinal directions. It is now possible to map a number of electrodynamic parameters in the polar ionosphere on an instantaneous basis. By applying these new inverse methods to a number of ground-based geomagnetic observations, it was found that two basic elements in the spatial patterns can be viewed as two physical processes for solar wind-magnetosphere energy coupling.
Improving the use of environmental diversity as a surrogate for species representation.
Albuquerque, Fabio; Beier, Paul
2018-01-01
The continuous p-median approach to environmental diversity (ED) is a reliable way to identify sites that efficiently represent species. A recently developed maximum dispersion (maxdisp) approach to ED is computationally simpler, does not require the user to reduce environmental space to two dimensions, and performed better than continuous p-median for datasets of South African animals. We tested whether maxdisp performs as well as continuous p-median for 12 datasets that include plants and span additional continents, and whether particular types of environmental variables produce consistently better models of ED. We selected 12 species inventories and atlases to span a broad range of taxa (plants, birds, mammals, reptiles, and amphibians), spatial extents, and resolutions. For each dataset, we used continuous p-median ED and maxdisp ED in combination with five sets of environmental variables (five combinations of temperature, precipitation, insolation, NDVI, and topographic variables) to select environmentally diverse sites. We used the species accumulation index (SAI) to evaluate the efficiency of ED in representing species for each approach and set of environmental variables. Maxdisp ED represented species better than continuous p-median ED in five of the 12 biodiversity datasets, and about the same in the other seven. The efficiency of ED also varied with the type of variables used to define environmental space, but no particular combination of variables consistently performed best. We conclude that maxdisp ED performs at least as well as continuous p-median ED, and has the advantage of faster and simpler computation. Surprisingly, using all 38 environmental variables was not consistently better than using subsets of variables, nor did any subset emerge as consistently best or worst; further work is needed to identify the best variables for defining environmental space. These results can help ecologists and conservationists select sites for species representation and assist in conservation planning.
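A plausible greedy reading of the maxdisp idea, repeatedly adding the site whose minimum separation from the already-chosen sites in environmental space is largest, is sketched below; the published maxdisp formulation may differ in its exact objective and solver.

```python
import numpy as np

def maxdisp_select(env, n_sites):
    """Greedy maximum-dispersion selection: env is an (n_cells, n_vars)
    matrix of standardized environmental variables; returns row indices
    of the selected, environmentally diverse sites."""
    chosen = [int(np.argmax(np.linalg.norm(env - env.mean(0), axis=1)))]
    for _ in range(n_sites - 1):
        d = np.linalg.norm(env[:, None] - env[chosen], axis=2).min(axis=1)
        d[chosen] = -1.0                  # never re-pick a chosen site
        chosen.append(int(np.argmax(d)))
    return chosen
```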
Big Data in HEP: A comprehensive use case study
Gutsche, Oliver; Cremonesi, Matteo; Elmer, Peter; ...
2017-11-23
Experimental Particle Physics has been at the forefront of analyzing the world's largest datasets for decades. The HEP community was the first to develop suitable software and computing tools for this task. In recent times, new toolkits and systems, collectively called Big Data technologies, have emerged to support the analysis of Petabyte and Exabyte datasets in industry. While the principles of data analysis in HEP have not changed (filtering and transforming experiment-specific data formats), these new technologies use different approaches, promise a fresh look at the analysis of very large datasets, and could potentially reduce the time-to-physics with increased interactivity. In this talk, we present an active LHC Run 2 analysis, searching for dark matter with the CMS detector, as a testbed for Big Data technologies. We directly compare the traditional NTuple-based analysis with an equivalent analysis using Apache Spark on the Hadoop ecosystem and beyond. In both cases, we start the analysis with the official experiment data formats and produce publication physics plots. Lastly, we discuss the advantages and disadvantages of each approach and give an outlook on further studies needed.
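For a flavour of the Spark side of such a comparison, an event-level selection followed by a histogram-style aggregation over a columnar dataset can be written as below; the file name, column names, and cut values are invented for illustration, and the actual CMS analysis starts from experiment-specific formats.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, floor

spark = SparkSession.builder.appName("dm-search-sketch").getOrCreate()
events = spark.read.parquet("events.parquet")   # flattened event table

# event selection roughly analogous to NTuple-level analysis cuts
selected = events.filter((col("met") > 200.0) & (col("njets") >= 2))

# 25 GeV-wide histogram bins of missing transverse energy
hist = (selected.withColumn("bin", floor(col("met") / 25.0))
                .groupBy("bin").count().orderBy("bin"))
hist.toPandas()   # small aggregate pulled to the driver for plotting
```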
Quantifying the tibiofemoral joint space using x-ray tomosynthesis.
Kalinosky, Benjamin; Sabol, John M; Piacsek, Kelly; Heckel, Beth; Gilat Schmidt, Taly
2011-12-01
Digital x-ray tomosynthesis (DTS) has the potential to provide 3D information about the knee joint in a load-bearing posture, which may improve diagnosis and monitoring of knee osteoarthritis compared with projection radiography, the current standard of care. Manually quantifying and visualizing the joint space width (JSW) from 3D tomosynthesis datasets may be challenging. This work developed a semiautomated algorithm for quantifying the 3D tibiofemoral JSW from reconstructed DTS images. The algorithm was validated through anthropomorphic phantom experiments and applied to three clinical datasets. A user-selected volume of interest within the reconstructed DTS volume was enhanced with 1D multiscale gradient kernels. The edge-enhanced volumes were divided by polarity into tibial and femoral edge maps and combined across kernel scales. A 2D connected components algorithm was performed to determine candidate tibial and femoral edges. A 2D joint space width map (JSW) was constructed to represent the 3D tibiofemoral joint space. To quantify the algorithm accuracy, an adjustable knee phantom was constructed, and eleven posterior-anterior (PA) and lateral DTS scans were acquired with the medial minimum JSW of the phantom set to 0-5 mm in 0.5 mm increments (VolumeRad™, GE Healthcare, Chalfont St. Giles, United Kingdom). The accuracy of the algorithm was quantified by comparing the minimum JSW in a region of interest in the medial compartment of the JSW map to the measured phantom setting for each trial. In addition, the algorithm was applied to DTS scans of a static knee phantom and the JSW map compared to values estimated from a manually segmented computed tomography (CT) dataset. The algorithm was also applied to three clinical DTS datasets of osteoarthritic patients. The algorithm segmented the JSW and generated a JSW map for all phantom and clinical datasets. For the adjustable phantom, the estimated minimum JSW values were plotted against the measured values for all trials. A linear fit estimated a slope of 0.887 (R² = 0.962) and a mean error across all trials of 0.34 mm for the PA phantom data. The estimated minimum JSW values for the lateral adjustable phantom acquisitions were found to have low correlation to the measured values (R² = 0.377), with a mean error of 2.13 mm. The error in the lateral adjustable-phantom datasets appeared to be caused by artifacts due to unrealistic features in the phantom bones. JSW maps generated by DTS and CT varied by a mean of 0.6 mm and 0.8 mm across the knee joint, for PA and lateral scans. The tibial and femoral edges were successfully segmented and JSW maps determined for PA and lateral clinical DTS datasets. A semiautomated method is presented for quantifying the 3D joint space in a 2D JSW map using tomosynthesis images. The proposed algorithm quantified the JSW across the knee joint to sub-millimeter accuracy for PA tomosynthesis acquisitions. Overall, the results suggest that x-ray tomosynthesis may be beneficial for diagnosing and monitoring disease progression or treatment of osteoarthritis by providing quantitative images of JSW in the load-bearing knee.
Hayat, Maqsood; Tahir, Muhammad
2015-08-01
Membrane proteins are central components of the cell that manage intra- and extracellular processes. They execute a diversity of functions that are vital for the survival of organisms. The topology of a transmembrane protein describes the number of transmembrane (TM) helix segments and their orientation. However, owing to the lack of solved structures, identifying TM helices and their topology through experimental methods is laborious and offers low throughput. In order to identify TM helix segments reliably, accurately, and effectively from topogenic sequences, we propose the PSOFuzzySVM-TMH model. In this model, evolutionary information, in the form of a position-specific scoring matrix, and discrete information, in the form of 6-letter exchange groups, are used to formulate transmembrane protein sequences. Noisy and extraneous attributes are removed from both feature spaces using particle swarm optimization as a feature selection technique. Finally, the selected feature spaces are combined to form an ensemble feature space. A fuzzy support vector machine is used as the classification algorithm. Two benchmark datasets, comprising low- and high-resolution data, are used. The performance of the PSOFuzzySVM-TMH model is assessed at various levels through a 10-fold cross-validation test. The empirical results reveal that the proposed framework outperforms competing methods in classification performance on the examined datasets. The proposed model might thus be a useful, high-throughput tool for the academic and research community for further structural and functional studies of transmembrane proteins.
Steerable Principal Components for Space-Frequency Localized Images
Landa, Boris; Shkolnisky, Yoel
2017-01-01
As modern scientific image datasets typically consist of a large number of images of high resolution, devising methods for their accurate and efficient processing is a central research task. In this paper, we consider the problem of obtaining the steerable principal components of a dataset, a procedure termed “steerable PCA” (steerable principal component analysis). The output of the procedure is the set of orthonormal basis functions which best approximate the images in the dataset and all of their planar rotations. To derive such basis functions, we first expand the images in an appropriate basis, for which the steerable PCA reduces to the eigen-decomposition of a block-diagonal matrix. If we assume that the images are well localized in space and frequency, then such an appropriate basis is the prolate spheroidal wave functions (PSWFs). We derive a fast method for computing the PSWF expansion coefficients from the images' equally spaced samples, via a specialized quadrature integration scheme, and show that the number of required quadrature nodes is similar to the number of pixels in each image. We then establish that our PSWF-based steerable PCA is both faster and more accurate than existing methods, and more importantly, provides us with rigorous error bounds on the entire procedure. PMID:29081879
Ferrari, Ulisse
2016-08-01
Maximum entropy models provide the least constrained probability distributions that reproduce statistical properties of experimental datasets. In this work we characterize the learning dynamics that maximizes the log-likelihood in the case of large but finite datasets. We first show how the steepest descent dynamics is not optimal, as it is slowed down by the inhomogeneous curvature of the model parameters' space. We then provide a way of rectifying this space which relies only on dataset properties and does not require large computational efforts. We conclude by solving the long-time limit of the parameters' dynamics, including the randomness generated by the systematic use of Gibbs sampling. In this stochastic framework, rather than converging to a fixed point, the dynamics reaches a stationary distribution, which for the rectified dynamics reproduces the posterior distribution of the parameters. We sum up all these insights in a "rectified" data-driven algorithm that is fast and, by sampling from the parameters' posterior, avoids both under- and overfitting along all directions of the parameters' space. Through the learning of pairwise Ising models from the recordings of a large population of retina neurons, we show how our algorithm outperforms the steepest descent method.
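A toy version of the underlying learning problem, a pairwise Ising model fit by plain gradient ascent on the log-likelihood with an exactly enumerated partition function, is sketched below; it exhibits the moment-matching updates whose ill-conditioning the paper's rectification addresses, but it is feasible only for a handful of units and omits both the rectification and the Gibbs sampling.

```python
import itertools
import numpy as np

def fit_ising(data, iters=500, lr=0.5):
    """data: (n_samples, n) array of +/-1 spins, with n small enough to
    enumerate all 2**n states exactly (toy setting only)."""
    n = data.shape[1]
    S = np.array(list(itertools.product([-1.0, 1.0], repeat=n)))
    iu = np.triu_indices(n, 1)
    def moments(s, w=None):
        return (np.average(s, axis=0, weights=w),
                np.average(s[:, iu[0]] * s[:, iu[1]], axis=0, weights=w))
    m1_data, m2_data = moments(data)
    h, J = np.zeros(n), np.zeros(len(iu[0]))
    for _ in range(iters):
        logp = S @ h + (S[:, iu[0]] * S[:, iu[1]]) @ J
        w = np.exp(logp - logp.max())
        m1, m2 = moments(S, w)             # model moments
        h += lr * (m1_data - m1)           # steepest ascent on <s_i>
        J += lr * (m2_data - m2)           # ... and on <s_i s_j>
    return h, J
```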
Evaluating soil moisture retrievals from ESA's SMOS and NASA's SMAP brightness temperature datasets
USDA-ARS?s Scientific Manuscript database
Two satellites are currently monitoring surface soil moisture (SM) from L-band observations: SMOS (Soil Moisture and Ocean Salinity), a European Space Agency (ESA) satellite that was launched on November 2, 2009 and SMAP (Soil Moisture Active Passive), a National Aeronautics and Space Administration...
NASA Astrophysics Data System (ADS)
Besse, S.; Vallat, C.; Geiger, B.; Grieger, B.; Costa, M.; Barbarisi, I.
2017-06-01
The Planetary Science Archive (PSA) is the European Space Agency’s (ESA) repository of science data from all planetary science and exploration missions. The PSA provides access to scientific datasets through various interfaces at http://psa.esa.int.
NASA Astrophysics Data System (ADS)
Bellugi, D. G.; Tennant, C.; Larsen, L.
2016-12-01
Catchment and climate heterogeneity complicate the prediction of runoff across time and space, and the resulting parameter uncertainty can lead to large accumulated errors in hydrologic models, particularly in ungauged basins. Recently, data-driven modeling approaches have been shown to avoid the accumulated uncertainty associated with many physically-based models, providing an appealing alternative for hydrologic prediction. However, the effectiveness of different methods in hydrologically and geomorphically distinct catchments, and the robustness of these methods to changing climate and changing hydrologic processes, remain to be tested. Here, we evaluate the use of machine learning techniques to predict daily runoff across time and space using only essential climatic forcing (e.g. precipitation, temperature, and potential evapotranspiration) time series as model input. Model training and testing were done using a high-quality dataset of daily runoff and climate forcing data spanning 25+ years for 600+ minimally-disturbed catchments (drainage area range 5-25,000 km2, median size 336 km2) that cover a wide range of climatic and physical characteristics. Preliminary results using Support Vector Regression (SVR) suggest that in some catchments this nonlinear regression technique can accurately predict daily runoff, while the same approach fails in other catchments, indicating that the representation of climate inputs and/or catchment filter characteristics in the model structure needs further refinement to increase performance. We bolster this analysis by using Sparse Identification of Nonlinear Dynamics (a sparse symbolic regression technique) to uncover the governing equations that describe runoff processes in catchments where SVR performed well and in ones where it performed poorly, thereby enabling inference about governing processes. This provides a robust means of examining how catchment complexity influences runoff prediction skill, and represents a contribution towards the integration of data-driven inference and physically-based models.
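In outline, an SVR experiment of the kind described (climate forcings in, daily runoff out) might look like the sketch below; the feature construction, names, and hyperparameters are placeholders rather than the study's configuration.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

def fit_runoff_svr(X_train, y_train, X_test, y_test):
    """X columns: daily forcings (precipitation, temperature, PET),
    possibly with lagged copies appended so the kernel can see
    antecedent conditions; y: daily runoff."""
    model = make_pipeline(StandardScaler(),
                          SVR(kernel="rbf", C=10.0, epsilon=0.1))
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    # Nash-Sutcliffe efficiency, a standard runoff skill score
    nse = 1.0 - np.sum((y_test - pred) ** 2) / \
                np.sum((y_test - y_test.mean()) ** 2)
    return model, nse
```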
Range and Panoramic Image Fusion Into a Textured Range Image for Culture Heritage Documentation
NASA Astrophysics Data System (ADS)
Bila, Z.; Reznicek, J.; Pavelka, K.
2013-07-01
This paper deals with the fusion of range and panoramic images, where the range image is acquired by a 3D laser scanner and the panoramic image is acquired with a digital still camera mounted on a panoramic head and tripod. The resulting fused dataset, called a "textured range image", provides conservators and historians with more reliable information about the investigated object than either dataset used separately. A simple example of the fusion of range and panoramic images, both obtained in St. Francis Xavier Church in the town of Opařany, is given here. Firstly, we describe the process of data acquisition, then the processing of both datasets into a format suitable for fusion, and finally the fusion itself. The process of fusion can be divided into two main parts: transformation and remapping. In the transformation part, the two images are related by matching similar features detected in both images with a suitable detector, which results in a transformation matrix enabling transformation of the range image onto the panoramic image. Then, the range data are remapped from the range image space into the panoramic image space and stored as an additional "range" channel. The process of image fusion is validated by comparing similar features extracted from both datasets.
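With a library such as OpenCV, the transformation-and-remapping pipeline can be sketched as below; the paper does not specify this tool chain, the detector choice is arbitrary, the file names are hypothetical, and a plane-projective homography is only a crude stand-in for the true mapping into panoramic image space.

```python
import cv2
import numpy as np

range_img = cv2.imread("range_intensity.png", cv2.IMREAD_GRAYSCALE)
pano = cv2.imread("panorama.png", cv2.IMREAD_GRAYSCALE)

# detect and match features in both images
orb = cv2.ORB_create(5000)
k1, d1 = orb.detectAndCompute(range_img, None)
k2, d2 = orb.detectAndCompute(pano, None)
matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)

# robust transformation matrix from the matched feature coordinates
src = np.float32([k1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([k2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

# remap the range data into panoramic image space as an extra channel
range_in_pano = cv2.warpPerspective(range_img, H, pano.shape[::-1])
```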
Reagan, Matthew T.; Moridis, George J.; Seim, Katie S.
2017-03-27
A recent Department of Energy field test on the Alaska North Slope has increased interest in the ability to simulate systems of mixed CO2-CH4 hydrates. However, the physically realistic simulation of mixed-hydrate systems is not yet a fully solved problem. Limited quantitative laboratory data leads to the use of various ab initio, statistical mechanical, or other mathematical representations of mixed-hydrate phase behavior. Few of these methods are suitable for inclusion in reservoir simulations, particularly for systems with large numbers of grid elements, 3D systems, or systems with complex geometric configurations. In this paper, we present a set of fast parametric relationships describing the thermodynamic properties and phase behavior of a mixed methane-carbon dioxide hydrate system. We use well-known, off-the-shelf hydrate physical properties packages to generate a sufficiently large dataset, select the most convenient and efficient mathematical forms, and fit the data to those forms to create a physical properties package suitable for inclusion in the TOUGH+ family of codes. Finally, the mapping of the phase and thermodynamic space reveals the complexity of the mixed-hydrate system and allows understanding of the thermodynamics at a level beyond what much of the existing laboratory data and literature currently offer.
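The fitting strategy described, generating a dense property table from an off-the-shelf package and fitting a fast closed form to it, can be sketched as follows; the generator function below is a synthetic stand-in, and the basis functions and ranges are invented for illustration.

```python
import numpy as np

def reference_lnP(T, x):
    """Synthetic stand-in for an off-the-shelf hydrate properties
    package (Clausius-Clapeyron-like surface, NOT real data)."""
    return 35.0 - 8500.0 / T - 1.2 * x + 900.0 * x / T

T = np.linspace(274.0, 290.0, 30)        # temperature grid, K
x = np.linspace(0.0, 1.0, 11)            # CO2 fraction grid
TT, XX = np.meshgrid(T, x)
lnP = reference_lnP(TT, XX)

# least-squares fit of the fast form ln P = a + b/T + c*x + d*x/T
A = np.column_stack([np.ones(TT.size), 1.0 / TT.ravel(),
                     XX.ravel(), XX.ravel() / TT.ravel()])
coef, *_ = np.linalg.lstsq(A, lnP.ravel(), rcond=None)

def lnP_fast(T, x):
    """Cheap surrogate of the kind a reservoir simulator can call."""
    return coef[0] + coef[1] / T + coef[2] * x + coef[3] * x / T
```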
NASA Astrophysics Data System (ADS)
Reagan, Matthew T.; Moridis, George J.; Seim, Katie S.
2017-06-01
A recent Department of Energy field test on the Alaska North Slope has increased interest in the ability to simulate systems of mixed CO2-CH4 hydrates. However, the physically realistic simulation of mixed-hydrate systems is not yet a fully solved problem. Limited quantitative laboratory data leads to the use of various ab initio, statistical mechanical, or other mathematical representations of mixed-hydrate phase behavior. Few of these methods are suitable for inclusion in reservoir simulations, particularly for systems with large numbers of grid elements, 3D systems, or systems with complex geometric configurations. In this work, we present a set of fast parametric relationships describing the thermodynamic properties and phase behavior of a mixed methane-carbon dioxide hydrate system. We use well-known, off-the-shelf hydrate physical properties packages to generate a sufficiently large dataset, select the most convenient and efficient mathematical forms, and fit the data to those forms to create a physical properties package suitable for inclusion in the TOUGH+ family of codes. The mapping of the phase and thermodynamic space reveals the complexity of the mixed-hydrate system and allows understanding of the thermodynamics at a level beyond what much of the existing laboratory data and literature currently offer.
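The generate-select-fit strategy can be illustrated with a toy surrogate; the functional form, coefficients, and synthetic data below are illustrative assumptions, not values from the paper:

```python
# Illustrative sketch only: fit a fast parametric surrogate of the form
# ln P_eq = a0 + a1/T + a2*x + a3*x/T to (synthetic) equilibrium data, in the
# general spirit of the strategy described above.
import numpy as np

rng = np.random.default_rng(1)
T = rng.uniform(274.0, 290.0, 500)   # temperature [K]
x = rng.uniform(0.0, 1.0, 500)       # CO2 fraction in the hydrate former
# Synthetic "reference package" output, standing in for tabulated data.
lnP = 40.0 - 9500.0 / T - 0.8 * x + 600.0 * x / T + rng.normal(0, 0.01, 500)

# Linear least squares in the coefficients: fast to fit and fast to evaluate.
A = np.column_stack([np.ones_like(T), 1.0 / T, x, x / T])
coef, *_ = np.linalg.lstsq(A, lnP, rcond=None)

def lnP_eq(T, x, c=coef):
    return c[0] + c[1] / T + c[2] * x + c[3] * x / T

print("fit coefficients:", coef)
print("residual std:", np.std(lnP - A @ coef))
```

Forms that are linear in their coefficients, like this one, are attractive for reservoir codes because each evaluation reduces to a handful of multiplications per grid element.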
NASA Astrophysics Data System (ADS)
Vionnet, Vincent; Six, Delphine; Auger, Ludovic; Lafaysse, Matthieu; Quéno, Louis; Réveillet, Marion; Dombrowski-Etchevers, Ingrid; Thibert, Emmanuel; Dumont, Marie
2017-04-01
Capturing spatial and temporal variabilities of meteorological conditions at fine scale is necessary for modelling snowpack and glacier winter mass balance in alpine terrain. In particular, precipitation amount and phase are strongly influenced by the complex topography. In this study, we assess the impact of three sub-kilometer precipitation datasets (rainfall and snowfall) on distributed simulations of snowpack and glacier winter mass balance with the detailed snowpack model Crocus for winter 2011-2012. The different precipitation datasets at 500-m grid spacing over part of the French Alps (200 x 200 km2 area) come either from (i) the SAFRAN precipitation analysis specially developed for alpine terrain, (ii) operational outputs of the atmospheric model AROME at 2.5-km grid spacing downscaled to 500 m with a fixed lapse rate, or (iii) a version of the atmospheric model AROME at 500-m grid spacing. The other atmospheric forcings (air temperature and humidity, incoming longwave and shortwave radiation, wind speed) are taken from the AROME simulations at 500-m grid spacing. These atmospheric forcings are first compared against a network of automatic weather stations. Results are analysed with respect to station location (valley, mid- and high-altitude). The spatial pattern of seasonal snowfall and its dependence on elevation is then analysed for the different precipitation datasets. Large differences between SAFRAN and the two versions of AROME are found at high altitude. Finally, results of Crocus snowpack simulations are evaluated against (i) point in-situ measurements of snow depth and snow water equivalent, and (ii) maps of snow covered areas retrieved from optical satellite data (MODIS). Measurements of winter accumulation on six glaciers of the French Alps are also used and provide very valuable information on precipitation at high altitude, where the conventional observation network is scarce. This study illustrates the potential and limitations of high-resolution atmospheric models to drive simulations of snowpack and glacier winter mass balance in alpine terrain.
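A fixed-lapse-rate downscaling step like that of option (ii) amounts to a simple elevation correction; the multiplicative form and lapse-rate value below are illustrative assumptions, not the operational configuration:

```python
# Hedged sketch of fixed-lapse-rate precipitation downscaling: expand the
# coarse field to the fine grid, then apply a linear elevation correction.
import numpy as np

def downscale_precip(p_coarse, z_coarse, z_fine, factor, gamma=0.0005):
    """p_coarse: coarse precip [mm]; z_*: elevations [m];
    factor: refinement ratio (e.g. 5 for 2.5 km -> 500 m);
    gamma: assumed fixed precipitation lapse rate [fraction per m]."""
    # Nearest-neighbour expansion of the coarse fields to the fine grid.
    p0 = np.kron(p_coarse, np.ones((factor, factor)))
    z0 = np.kron(z_coarse, np.ones((factor, factor)))
    # Linear elevation correction, kept non-negative.
    return np.clip(p0 * (1.0 + gamma * (z_fine - z0)), 0.0, None)

p_coarse = np.array([[10.0, 12.0], [8.0, 15.0]])          # 2 x 2 cells at 2.5 km
z_coarse = np.array([[900.0, 1400.0], [700.0, 2100.0]])
z_fine = z_coarse.repeat(5, 0).repeat(5, 1) \
         + np.random.default_rng(2).normal(0, 150, (10, 10))
p_fine = downscale_precip(p_coarse, z_coarse, z_fine, factor=5)
print(p_fine.shape)  # (10, 10): the 500-m grid
```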
NASA Technical Reports Server (NTRS)
Jagge, Amy
2016-01-01
With ever-changing landscapes and environmental conditions due to human-induced climate change, adaptability is imperative for the long-term success of facilities and Federal agency missions. To mitigate the effects of climate change, indicators such as above-ground biomass change must be identified to establish a comprehensive monitoring effort. Researching the varying effects of climate change on ecosystems can provide a scientific framework that will help produce informative, strategic and tactical policies for environmental adaptation. As a proactive approach to climate change mitigation, NASA tasked the Climate Change Adaptation Science Investigators Workgroup (CASI) to provide climate change expertise and data to Center facility managers and planners in order to ensure sustainability based on predictive models and current research. Generation of historical datasets that will be used in an agency-wide effort to establish strategies for climate change mitigation and adaptation at NASA facilities is part of the CASI strategy. Using time series of historical remotely sensed data is a well-established means of measuring change over time. CASI investigators have acquired multispectral and hyperspectral optical and LiDAR remotely sensed datasets from NASA Earth Observation Satellites (including the International Space Station), airborne sensors, and astronaut photography using hand-held digital cameras to create a historical dataset for the Johnson Space Center, as well as the Houston and Galveston area. The raster imagery within each dataset has been georectified, and the multispectral and hyperspectral imagery has been atmospherically corrected. Using ArcGIS for Server, the CASI-Regional Remote Sensing data have been published as an image service, and can be visualized through a basic web mapping application. Future work will include a customized web mapping application created using a JavaScript Application Programming Interface (API), and inclusion of the CASI data for the NASA Johnson Space Center into a NASA-wide GIS Institutional Portal.
Golla, Gowtham Kumar; Carlson, Jordan A; Huan, Jun; Kerr, Jacqueline; Mitchell, Tarrah; Borner, Kelsey
2016-10-01
Sedentary behavior of youth is an important determinant of health. However, better measures are needed to improve understanding of this relationship and the mechanisms at play, as well as to evaluate health promotion interventions. Wearable accelerometers are considered the standard for assessing physical activity in research, but do not perform well for assessing posture (i.e., sitting vs. standing), a critical component of sedentary behavior. The machine learning algorithms that we propose for assessing sedentary behavior will allow us to re-examine existing accelerometer data to better understand the association between sedentary time and health in various populations. We collected two datasets, a laboratory-controlled dataset and a free-living dataset. We trained machine learning classifiers separately on each dataset and compared performance across datasets. The classifiers predict five postures: sit, stand, sit-stand, stand-sit, and stand/walk. We compared a manually constructed Hidden Markov model (HMM) with an automated HMM from existing software. The manually constructed HMM achieved a higher macro-averaged F1 score on both datasets.
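The posture-smoothing role of a manually constructed HMM can be sketched with a small Viterbi decoder; the transition probabilities below are illustrative assumptions, not the study's values:

```python
# Sketch of the manually constructed HMM idea: smooth per-window posture
# scores with Viterbi decoding so the predicted sequence respects plausible
# transitions (e.g. sit -> sit-stand -> stand). Parameters are invented.
import numpy as np

states = ["sit", "stand", "sit-stand", "stand-sit", "stand/walk"]
n = len(states)
# Transitions: postures persist, and changes pass through transition states.
A = np.full((n, n), 0.01)
np.fill_diagonal(A, 0.9)
A[0, 2] = A[2, 1] = A[1, 3] = A[3, 0] = A[1, 4] = A[4, 1] = 0.06
A /= A.sum(axis=1, keepdims=True)

def viterbi(log_emit, log_A, log_pi):
    """log_emit: (T, n) per-window classifier log-scores used as emissions."""
    T = log_emit.shape[0]
    delta = log_pi + log_emit[0]
    back = np.zeros((T, n), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_A      # scores[i, j]: best path i -> j
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_emit[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):            # backtrack the best sequence
        path.append(back[t, path[-1]])
    return path[::-1]

rng = np.random.default_rng(3)
probs = rng.dirichlet(np.ones(n), size=50)   # stand-in for classifier outputs
seq = viterbi(np.log(probs), np.log(A), np.log(np.full(n, 1.0 / n)))
print([states[s] for s in seq[:10]])
```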
Dissecting the space-time structure of tree-ring datasets using the partial triadic analysis.
Rossi, Jean-Pierre; Nardin, Maxime; Godefroid, Martin; Ruiz-Diaz, Manuela; Sergent, Anne-Sophie; Martinez-Meier, Alejandro; Pâques, Luc; Rozenberg, Philippe
2014-01-01
Tree-ring datasets are used in a variety of circumstances, including archeology, climatology, forest ecology, and wood technology. These data are based on microdensity profiles and consist of a set of tree-ring descriptors, such as ring width or early/latewood density, measured for a set of individual trees. Because successive rings correspond to successive years, the resulting dataset is a ring variables × trees × time datacube. Multivariate statistical analyses, such as principal component analysis, have been widely used for extracting useful information from ring datasets, but they typically address two-way matrices, such as ring variables × trees or ring variables × time. Here, we explore the potential of the partial triadic analysis (PTA), a multivariate method dedicated to the analysis of three-way datasets, to characterize the space-time structure of tree-ring datasets. We analyzed a set of 11 tree-ring descriptors measured in 149 georeferenced individuals of European larch (Larix decidua Miller) during the period 1967-2007. The processing of densitometry profiles led to a set of ring descriptors for each tree and for each year from 1967 to 2007. The resulting three-way data table was subjected to two distinct analyses in order to explore i) the temporal evolution of spatial structures and ii) the spatial structure of temporal dynamics. We report the presence of a spatial structure common to the different years, highlighting the inter-individual variability of the ring descriptors at the stand scale. We found a temporal trajectory common to the trees that could be separated into a high- and a low-frequency signal, corresponding to inter-annual variations possibly related to defoliation events and a long-term trend possibly related to climate change. We conclude that PTA is a powerful tool to disentangle and rank the different sources of variation within tree-ring datasets.
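The core of a PTA can be sketched in a few generic steps (this is an illustration on random data, not the authors' code): weight the yearly tables by the first eigenvector of their RV-coefficient matrix, build the compromise table, and run a PCA on it.

```python
# Hedged sketch of the heart of a partial triadic analysis on a
# years x trees x variables cube filled with random stand-in data.
import numpy as np

rng = np.random.default_rng(4)
n_years, n_trees, n_vars = 41, 149, 11
cube = rng.normal(size=(n_years, n_trees, n_vars))

# Center/scale each yearly table (trees x variables).
tables = [(X - X.mean(0)) / X.std(0) for X in cube]

def rv(X, Y):
    """RV coefficient: a matrix correlation between two configurations."""
    Sx, Sy = X @ X.T, Y @ Y.T
    return np.sum(Sx * Sy) / np.sqrt(np.sum(Sx * Sx) * np.sum(Sy * Sy))

K = len(tables)
C = np.array([[rv(tables[i], tables[j]) for j in range(K)] for i in range(K)])
vals, vecs = np.linalg.eigh(C)
weights = np.abs(vecs[:, -1])        # first eigenvector -> table weights
weights /= weights.sum()

# Compromise table: the weighted average configuration shared by all years.
compromise = sum(w * X for w, X in zip(weights, tables))
U, s, Vt = np.linalg.svd(compromise, full_matrices=False)
scores = U[:, :2] * s[:2]            # tree scores on the first two axes
print("variance explained:", s[:2] ** 2 / np.sum(s ** 2))
```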
NASA GeneLab Project: Bridging Space Radiation Omics with Ground Studies.
Beheshti, Afshin; Miller, Jack; Kidane, Yared; Berrios, Daniel; Gebre, Samrawit G; Costes, Sylvain V
2018-06-01
Accurate assessment of risks of long-term space missions is critical for human space exploration. It is essential to have a detailed understanding of the biological effects on humans living and working in deep space. Ionizing radiation from galactic cosmic rays (GCR) is a major health risk factor for astronauts on extended missions outside the protective effects of the Earth's magnetic field. Currently, there are gaps in our knowledge of the health risks associated with chronic low-dose, low-dose-rate ionizing radiation, specifically ions associated with high (H) atomic number (Z) and energy (E). The NASA GeneLab project (https://genelab.nasa.gov/) aims to provide a detailed library of omics datasets associated with biological samples exposed to HZE. The GeneLab Data System (GLDS) includes datasets from both spaceflight and ground-based studies, a majority of which involve exposure to ionizing radiation. In addition to detailed information on radiation exposure for ground-based studies, GeneLab is adding detailed, curated dosimetry information for spaceflight experiments. GeneLab is the first comprehensive omics database for space-related research from which an investigator can generate hypotheses to direct future experiments, utilizing both ground and space biological radiation data. The GLDS is continually expanding as omics-related data are generated by the space life sciences community. Here we provide a brief summary of the space radiation-related data available at GeneLab.
A Spatial-Temporal Comparison of Lake Mendota CO2 Fluxes and Collection Methods
NASA Astrophysics Data System (ADS)
Baldocchi, A. K.; Reed, D. E.; Desai, A. R.; Loken, L. C.; Schramm, P.; Stanley, E. H.
2017-12-01
Monitoring of carbon fluxes at the lake/atmosphere interface can help us determine baselines from which to understand responses in both space and time that may result from our warming climate or increasing nutrient inputs. Since recent research has shown lakes to be hotspots of global carbon cycling, it is important to quantify carbon sink and source dynamics as well as to verify observations between multiple methods in the context of long-term data collection efforts. Here we evaluate a new method for measuring space and time variation in CO2 fluxes, based on a novel speedboat-based method for collecting aquatic greenhouse gas concentrations combined with a flux computation and interpolation algorithm. Two hundred and forty-nine consecutive days of spatial flux maps over the 2016 open-water period were compared to ongoing eddy covariance tower flux measurements on the shore of Lake Mendota, Wisconsin, USA, using a flux footprint analysis. Spatial and temporal alignments of the fluxes from these two observational datasets revealed both similar trends from daily to seasonal timescales and biases between methods. For example, throughout the spring the carbon fluxes from the two methods were strongly correlated, although offset by an order of magnitude. Isolating the physical patterns of agreement between the two methods' lake/atmosphere CO2 fluxes allows us to pinpoint where biological and physical drivers contribute to the global carbon cycle, helps improve the modelling of lakes, and supports the use of lakes as leading indicators of climate change.
Non-gaussianity versus nonlinearity of cosmological perturbations.
Verde, L
2001-06-01
Following the discovery of the cosmic microwave background, the hot big-bang model has become the standard cosmological model. In this theory, small primordial fluctuations are subsequently amplified by gravity to form the large-scale structure seen today. Different theories for unified models of particle physics lead to different predictions for the statistical properties of the primordial fluctuations, which can be divided into two classes: gaussian and non-gaussian. Convincing evidence for or against gaussian initial conditions would rule out many scenarios and point us toward a physical theory for the origin of structures. The statistical distribution of cosmological perturbations, as we observe them, can deviate from the gaussian distribution in several different ways. Even if perturbations start off gaussian, nonlinear gravitational evolution can introduce non-gaussian features. Additionally, our knowledge of the Universe comes principally from the study of luminous material such as galaxies, but galaxies might not be faithful tracers of the underlying mass distribution. The relationship between fluctuations in the mass and in the galaxy distribution (bias) is often assumed to be local, but could well be nonlinear. Moreover, galaxy catalogues use the redshift as the third spatial coordinate: the resulting redshift-space map of the galaxy distribution is nonlinearly distorted by peculiar velocities. Nonlinear gravitational evolution, biasing, and redshift-space distortion introduce non-gaussianity even in an initially gaussian fluctuation field. I investigate the statistical tools that allow us, in principle, to disentangle the above different effects, and the observational datasets we require to do so in practice.
A Comparison Study of Classifier Algorithms for Cross-Person Physical Activity Recognition.
Saez, Yago; Baldominos, Alejandro; Isasi, Pedro
2016-12-30
Physical activity is widely known to be one of the key elements of a healthy life. The many benefits of physical activity described in the medical literature include weight loss and reductions in the risk factors for chronic diseases. With the recent advances in wearable devices, such as smartwatches or physical activity wristbands, motion tracking sensors are becoming pervasive, which has led to an impressive growth in the amount of physical activity data available and an increasing interest in recognizing which specific activity a user is performing. Moreover, big data and machine learning are now cross-fertilizing each other in an approach called "deep learning", which consists of massive artificial neural networks able to detect complicated patterns from enormous amounts of input data to learn classification models. This work compares various state-of-the-art classification techniques for automatic cross-person activity recognition under different scenarios that vary widely in how much information is available for analysis. We have incorporated deep learning by using Google's TensorFlow framework. The data used in this study were acquired from PAMAP2 (Physical Activity Monitoring in the Ageing Population), a publicly available dataset containing physical activity data. To perform cross-person prediction, we used the leave-one-subject-out (LOSO) cross-validation technique. When working with large training sets, the best classifiers obtain very high average accuracies (e.g., 96% using extra randomized trees). However, when the data volume is drastically reduced (where available data are only 0.001% of the continuous data), deep neural networks performed the best, achieving 60% in overall prediction accuracy. We found that even when working with only approximately 22.67% of the full dataset, we can statistically obtain the same results as when working with the full dataset. This finding enables the design of more energy-efficient devices and facilitates cold starts and big data processing of physical activity records.
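Leave-one-subject-out evaluation maps directly onto scikit-learn's LeaveOneGroupOut; the features, labels, and classifier below are stand-ins rather than the PAMAP2 pipeline:

```python
# Minimal sketch of leave-one-subject-out (LOSO) cross-validation: each fold
# trains on all subjects but one and tests on the held-out subject.
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(5)
X = rng.normal(size=(900, 20))            # windowed sensor features (stand-in)
y = rng.integers(0, 6, 900)               # activity labels (stand-in)
subjects = np.repeat(np.arange(9), 100)   # 9 subjects, 100 windows each

clf = ExtraTreesClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, X, y, cv=LeaveOneGroupOut(), groups=subjects)
print("per-subject accuracy:", np.round(scores, 3))
```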
Space physics and policy for contemporary society
NASA Astrophysics Data System (ADS)
Cassak, P. A.; Emslie, A. G.; Halford, A. J.; Baker, D. N.; Spence, H. E.; Avery, S. K.; Fisk, L. A.
2017-04-01
Space physics is the study of Earth's home in space. Elements of space physics include how the Sun works from its interior to its atmosphere, the environment between the Sun and planets out to the interstellar medium, and the physics of the magnetic barriers surrounding Earth and other planets. Space physics is highly relevant to society. Space weather, with its goal of predicting how Earth's technological infrastructure responds to activity on the Sun, is an oft-cited example, but there are many more. Space physics has important impacts in formulating public policy.
Global evaluation of ammonia bidirectional exchange and livestock diurnal variation schemes
There is no EPA generated dataset in this study. This dataset is associated with the following publication: Zhu, L., D. Henze, J. Bash, G. Jeong, K. Cady-Pereira, M. Shephard, M. Luo, F. Poulot, and S. Capps. Global evaluation of ammonia bidirectional exchange and livestock diurnal variation schemes. Atmospheric Chemistry and Physics. Copernicus Publications, Katlenburg-Lindau, GERMANY, 15: 12823-12843, (2015).
Self-reported physical activity among blacks: estimates from national surveys.
Whitt-Glover, Melicia C; Taylor, Wendell C; Heath, Gregory W; Macera, Caroline A
2007-11-01
National surveillance data provide population-level estimates of physical activity participation, but generally do not include detailed subgroup analyses, which could provide a better understanding of physical activity among subgroups. This paper presents a descriptive analysis of self-reported regular physical activity among black adults using data from the 2003 Behavioral Risk Factor Surveillance System (n=19,189), the 2004 National Health Interview Survey (n=4263), and the 1999-2004 National Health and Nutrition Examination Survey (n=3407). Analyses were conducted between January and March 2006. Datasets were analyzed separately to estimate the proportion of black adults meeting national physical activity recommendations, overall and stratified by gender and other demographic subgroups. The proportion of black adults reporting regular physical activity ranged from 24% to 36%. Regular physical activity was highest among men; younger age groups; the highest education and income groups; those who were employed and married; overweight, but not obese, men; and normal-weight women. This pattern was consistent across surveys. The observed physical activity patterns were consistent with national trends. The data suggest that older black adults and those with low education and income levels are at greatest risk for inactive lifestyles and may require additional attention in efforts to increase physical activity in black adults. The variability across datasets reinforces the need for objective measures in national surveys.
Natural texture retrieval based on perceptual similarity measurement
NASA Astrophysics Data System (ADS)
Gao, Ying; Dong, Junyu; Lou, Jianwen; Qi, Lin; Liu, Jun
2018-04-01
A typical texture retrieval system performs feature comparison and might not be able to make human-like judgments of image similarity. Meanwhile, it is commonly known that perceptual texture similarity is difficult to describe with traditional image features. In this paper, we propose a new texture retrieval scheme based on perceptual texture similarity. The key to the proposed scheme is that the prediction of perceptual similarity is performed by learning a non-linear mapping from image feature space to perceptual texture space using Random Forest. We test the method on a natural texture dataset and apply it to a new wallpaper dataset. Experimental results demonstrate that the proposed texture retrieval scheme with perceptual similarity improves retrieval performance over traditional image features.
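A hedged sketch of the stated idea, learning a non-linear map from an (assumed) image-feature space to an (assumed) perceptual space with a Random Forest and retrieving by distance in that space:

```python
# Illustrative sketch, not the authors' implementation: regress perceptual
# coordinates from texture descriptors, then rank database items by distance
# in the learned perceptual space.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(6)
features = rng.normal(size=(300, 64))   # stand-in texture descriptors
percep = rng.normal(size=(300, 3))      # stand-in perceptual coordinates

rf = RandomForestRegressor(n_estimators=200, random_state=0)
rf.fit(features, percep)                # multi-output regression

def retrieve(query_feat, k=5):
    """Map the query into perceptual space, rank database items by distance."""
    q = rf.predict(query_feat[None, :])[0]
    db = rf.predict(features)
    order = np.argsort(np.linalg.norm(db - q, axis=1))
    return order[:k]

print(retrieve(features[0]))
```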
NASA Technical Reports Server (NTRS)
Zhang, Taiping; Stackhouse, Paul W., Jr.; Chandler, William S.; Westberg, David J.
2014-01-01
The DIRINDEX model was designed to estimate hourly solar beam irradiances from hourly global horizontal irradiances. This model was applied to the NASA GEWEX SRB (Rel. 3.0) 3-hourly global horizontal irradiance data to derive 3-hourly global maps of beam, or direct normal, irradiance for the period from January 2000 to December 2005 at 1 deg. x 1 deg. resolution. The DIRINDEX model is a combination of the DIRINT model, a quasi-physical global-to-beam irradiance model based on regression of hourly observed data, and a broadband simplified version of the SOLIS clear-sky beam irradiance model. In this study, the input variables of the DIRINDEX model are 3-hourly global horizontal irradiance, solar zenith angle, dew-point temperature, surface elevation, surface pressure, sea-level pressure, aerosol optical depth at 700 nm, and column water vapor. The resulting values of the 3-hourly direct normal irradiance are then used to compute daily and monthly means. The results are validated against ground-based BSRN data. The monthly means show better agreement with the BSRN data than the results from an earlier endeavor, which empirically derived the monthly mean direct normal irradiance from the GEWEX SRB monthly mean global horizontal irradiance. To assimilate the observed information into the final results, the direct normal fluxes from the DIRINDEX model are adjusted according to the comparison statistics in the latitude-longitude-cosine of solar zenith angle phase space, in which inverse-distance interpolation is used for the adjustment. Since the NASA Surface meteorology and Solar Energy project derives its data from the GEWEX SRB datasets, the results discussed herein will serve to extend the former.
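The inverse-distance adjustment step can be sketched generically; the phase-space coordinates and bias values below are invented for illustration:

```python
# Hedged sketch of inverse-distance-weighted interpolation of bias statistics
# in a (latitude, longitude, cos(solar zenith)) phase space; values invented.
import numpy as np

def idw(query, points, values, power=2.0, eps=1e-12):
    """Inverse-distance-weighted estimate at `query` from scattered samples."""
    d = np.linalg.norm(points - query, axis=1)
    w = 1.0 / (d ** power + eps)
    return np.sum(w * values) / np.sum(w)

rng = np.random.default_rng(12)
# Anchor points where model-vs-observation comparison statistics exist.
anchors = rng.uniform([-60.0, -180.0, 0.0], [60.0, 180.0, 1.0], (200, 3))
bias = rng.normal(0, 20, 200)                # stand-in bias values [W/m^2]

query = np.array([35.0, -100.0, 0.6])
print("interpolated correction: %.1f W/m^2" % idw(query, anchors, bias))
```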
Understanding The Time Evolution Of Luminosity And Associated Accretion Structures In X-Ray Pulsars
NASA Astrophysics Data System (ADS)
Laycock, Silas
We propose to analyze the large archive of RXTE, XMM-Newton and Chandra observations of X-ray Binary Pulsars in the Magellanic Clouds and Milky Way. There are some 2000 individual RXTE PCA pointings on the SMC spanning 15 years, and a smaller number on the LMC. Each PCA observation covers a large fraction of the whole SMC (or LMC) population, and we are able to deconvolve the sometimes simultaneous signals to create an unrivaled record of pulsar temporal behavior. More than 200 XMM-Newton and Chandra observations of the SMC/LMC and individual Galactic pulsars provide information at lower luminosity levels. Together, these datasets cover the entire range of variability timescales and accretion regimes in High Mass X-ray Binaries. We will produce a comprehensive library of energy-resolved pulse profiles covering the entire luminosity and spin-period parameter space, and make this available to the community. We will then model these pulse profiles using state of the art techniques to parameterize the morphology, and publish the resulting data-cube. This result will include for example the distribution of offsets between magnetic and spin axes. These products are needed for the next generation of advances in neutron star theory and modeling. The unique dataset will also enable us to determine the upper and lower limits of accretion powered luminosity in a large statistically complete sample of neutron stars, and hence make several direct tests of fundamental NS parameters and accretion physics. In addition the long-duration of the dataset and "whole-galaxy" nature of the SMC sample make possible a new statistical approach to uncover the duty-cycle distribution and hence population demographics of transient High Mass X-ray Binary (HMXB) populations.
Modeling nonstationarity in space and time.
Shand, Lyndsay; Li, Bo
2017-09-01
We propose to model a spatio-temporal random field that has nonstationary covariance structure in both space and time domains by applying the concept of the dimension expansion method in Bornn et al. (2012). Simulations are conducted for both separable and nonseparable space-time covariance models, and the model is also illustrated with a streamflow dataset. Both simulation and data analyses show that modeling nonstationarity in both space and time can improve the predictive performance over stationary covariance models or models that are nonstationary in space but stationary in time.
Connecting the Public to Scientific Research Data - Science On a Sphere®
NASA Astrophysics Data System (ADS)
Henderson, M. A.; Russell, E. L.; Science on a Sphere Datasets
2011-12-01
Science On a Sphere® is a six-foot animated globe developed by the National Oceanic and Atmospheric Administration (NOAA) as a means to display global scientific research data in an intuitive, engaging format in public forums. With over 70 permanent installations of SOS around the world in science museums, visitor's centers and universities, the audience that enjoys SOS yearly is substantial, wide-ranging, and diverse. Through partnerships with the National Aeronautics and Space Administration (NASA), the SOS Data Catalog (http://sos.noaa.gov/datasets/) has grown to a collection of over 350 datasets from NOAA, NASA, and many others. Using an external projection system, these datasets are displayed onto the sphere, creating a seamless global image. In a cross-site evaluation of Science On a Sphere®, 82% of participants said that seeing information displayed on a sphere changed their understanding of the information. This unique technology captivates viewers and exposes them to scientific research data in a way that is accessible, presentable, and understandable. The datasets that comprise the SOS Data Catalog are scientific research data that have been formatted for display on SOS. By formatting research data into visualizations that can be used on SOS, NOAA and NASA are able to turn research data into educational materials that are easily accessible for users. In many cases, visualizations do not need to be modified because SOS uses a common map projection. The SOS Data Catalog has become a "one-stop shop" for a broad range of global datasets from across NOAA and NASA, and as a result, the traffic on the site is more than just SOS users. While the target audience for this site is SOS users, many inquiries come from teachers, book editors, film producers and students interested in using the available datasets. The SOS Data Catalog online includes a written description of each dataset, rendered images of the data, animated movies of the data, links to more information, details on the data source and creator, and a link to an FTP server where each dataset can be downloaded. Many of the datasets are also displayed on the SOS YouTube Channel and Facebook page. In addition, NASA has developed NASA Earth Observations (NEO), a collection of global satellite datasets. The NEO website allows users to layer multiple datasets and perform basic analysis. Through a new iPad application, the NASA Earth Observations datasets can be exported to SOS and analyzed on the sphere. This new capability greatly expands the number of datasets that can be shown on SOS and adds a new element of interactivity with the datasets.
Operational use of spaceborne lidar datasets
NASA Astrophysics Data System (ADS)
Marenco, Franco; Halloran, Gemma; Forsythe, Mary
2018-04-01
The Met Office plans to use space lidar datasets from CALIPSO, CATS, Aeolus and EarthCARE operationally in near real time (NRT) for the detection of aerosols. The first step is the development of NRT imagery for nowcasting of volcanic events, air quality, and mineral dust episodes. Model verification and possibly assimilation will be explored. Assimilation trials of Aeolus winds are also planned. Here we will present our first in-house imagery and our operational requirements.
Space Environmental Effects Knowledgebase
NASA Technical Reports Server (NTRS)
Wood, B. E.
2007-01-01
This report describes the results of a program entitled Space Environmental Effects Knowledgebase that was funded through a NASA NRA (NRA8-31) and monitored by personnel in the NASA Space Environmental Effects (SEE) Program. The NASA project number was 02029. The Satellite Contamination and Materials Outgassing Knowledgebase (SCMOK) was created as a part of the earlier NRA8-20. One of the previous tasks, and part of the previously developed Knowledgebase, was to accumulate data from facilities using quartz crystal microbalances (QCMs) to measure the outgassing of satellite materials. The main objective of the current program was to increase the number of material outgassing datasets from 250 up to approximately 500. As a part of this effort, a round-robin series of materials outgassing measurements was also executed, allowing comparison of the results for the same materials tested in 10 different test facilities. Other program tasks included obtaining datasets or information packages for 1) optical effects of contaminants on optical surfaces, thermal radiators, and sensor systems and 2) space environmental effects data, and incorporating these data into the already existing NASA/SEE Knowledgebase.
NASA Technical Reports Server (NTRS)
1991-01-01
Space physics is defined as the study of the heliosphere as one system; that is, of the Sun and solar wind, and their interactions with the upper atmospheres, ionospheres, and magnetospheres of the planets and comets, with energetic particles, and with the interstellar medium. This report contains a number of reports by different panels on the major topics in the space physics program including: (1) the cosmic and heliospheric physics program for the years 1995 to 2010; (2) ionosphere, thermosphere, and mesosphere studies; (3) magnetospheric physics; (4) solar physics; and (5) space physics theory.
Spatio-temporal Eigenvector Filtering: Application on Bioenergy Crop Impacts
NASA Astrophysics Data System (ADS)
Wang, M.; Kamarianakis, Y.; Georgescu, M.
2017-12-01
A suite of 10-year ensemble-based simulations was conducted to investigate the hydroclimatic impacts of large-scale deployment of perennial bioenergy crops across the continental United States. Given the large size of the simulated dataset (about 60 TB), traditional hierarchical spatio-temporal statistical modelling cannot be implemented for the evaluation of physics parameterizations and biofuel impacts. In this work, we propose a filtering algorithm that takes into account the spatio-temporal autocorrelation structure of the data while avoiding spatial confounding. This method is used to quantify the robustness of simulated hydroclimatic impacts associated with bioenergy crops to alternative physics parameterizations and observational datasets. Results are evaluated against those obtained from three alternative Bayesian spatio-temporal specifications.
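The abstract does not spell out the filter's construction, so the sketch below shows one standard ingredient of spatial eigenvector filtering, a Moran-basis regression, with all data and parameters invented; read it as a loosely related illustration rather than the proposed algorithm:

```python
# Hedged sketch of Moran-eigenvector (spatial) filtering: include the leading
# eigenvectors of the doubly centered spatial weight matrix as covariates so
# an effect estimate is obtained net of smooth spatial structure.
import numpy as np

rng = np.random.default_rng(13)
n = 150
coords = rng.uniform(0, 1, (n, 2))                 # site locations
d = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
W = (d < 0.15).astype(float)                       # binary neighbour weights
np.fill_diagonal(W, 0.0)

M = np.eye(n) - np.ones((n, n)) / n                # centering operator
vals, vecs = np.linalg.eigh(M @ W @ M)
E = vecs[:, np.argsort(-vals)[:10]]                # leading spatial patterns

x = rng.normal(size=n)                             # e.g. a deployment covariate
y = 0.5 * x + 2.0 * E[:, 0] + rng.normal(0, 0.3, n)
X = np.column_stack([np.ones(n), x, E])            # intercept + covariate + filters
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print("effect estimate: %.3f" % beta[1])
```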
Evolving Deep Networks Using HPC
DOE Office of Scientific and Technical Information (OSTI.GOV)
Young, Steven R.; Rose, Derek C.; Johnston, Travis
While a large number of deep learning networks have been studied and published that produce outstanding results on natural image datasets, these datasets only make up a fraction of those to which deep learning can be applied. These datasets include text data, audio data, and arrays of sensors that have very different characteristics than natural images. As these "best" networks for natural images have been largely discovered through experimentation and cannot be proven optimal on some theoretical basis, there is no reason to believe that they are the optimal network for these drastically different datasets. Hyperparameter search is thus often a very important process when applying deep learning to a new problem. In this work we present an evolutionary approach to searching the possible space of network hyperparameters and construction that can scale to 18,000 nodes. This approach is applied to datasets of varying types and characteristics, where we demonstrate the ability to rapidly find the best hyperparameters in order to enable practitioners to quickly iterate between idea and result.
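A toy serial version of such an evolutionary search over a hyperparameter space can illustrate the loop; on the HPC system candidates would be evaluated in parallel across nodes, and the search space and fitness stand-in below are assumptions:

```python
# Toy evolutionary hyperparameter search: select the fittest genomes, refill
# the population with mutated copies, repeat. The fitness function is a
# stand-in for "train the candidate network, return validation accuracy".
import random

SPACE = {"layers": [2, 3, 4, 5], "filters": [16, 32, 64, 128],
         "lr": [1e-2, 1e-3, 1e-4], "batch": [32, 64, 128]}

def random_genome():
    return {k: random.choice(v) for k, v in SPACE.items()}

def mutate(g):
    g = dict(g)
    k = random.choice(list(SPACE))
    g[k] = random.choice(SPACE[k])
    return g

def fitness(g):
    # Invented surrogate: peaks at 4 layers, 64 filters, small learning rate.
    return -abs(g["layers"] - 4) - abs(g["filters"] - 64) / 64.0 - 100.0 * g["lr"]

random.seed(0)
pop = [random_genome() for _ in range(20)]
for generation in range(10):
    pop.sort(key=fitness, reverse=True)                    # rank by fitness
    pop = pop[:10] + [mutate(random.choice(pop[:10])) for _ in range(10)]
pop.sort(key=fitness, reverse=True)
print("best hyperparameters:", pop[0])
```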
Disk storage management for LHCb based on Data Popularity estimator
NASA Astrophysics Data System (ADS)
Hushchyn, Mikhail; Charpentier, Philippe; Ustyuzhanin, Andrey
2015-12-01
This paper presents an algorithm providing recommendations for optimizing the LHCb data storage. The LHCb data storage system is a hybrid system: all datasets are kept as archives on magnetic tape, and the most popular datasets are also kept on disk. The algorithm takes the dataset usage history and metadata (size, type, configuration, etc.) to generate a recommendation report. This article describes how we use machine learning algorithms to predict future data popularity. Using these predictions it is possible to estimate which datasets should be removed from disk. We use regression algorithms and time series analysis to find the optimal number of replicas for datasets that are kept on disk. Based on the data popularity and the number-of-replicas optimization, the algorithm minimizes a loss function to find the optimal data distribution. The loss function represents all requirements for data distribution in the data storage system. We demonstrate how our algorithm helps to save disk space and to reduce waiting times for jobs using this data.
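The popularity-prediction step might look like the following sketch; the regressor choice, feature window, and size-weighted removal ranking are illustrative assumptions, not LHCb's production logic:

```python
# Hedged sketch: predict near-future dataset accesses from usage history,
# then rank datasets for removal from disk where predicted use per TB is low.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(7)
n_datasets, n_weeks = 400, 52
usage = rng.poisson(rng.gamma(1.0, 2.0, (n_datasets, 1)), (n_datasets, n_weeks))

X = usage[:, :-4]                  # features: access counts up to 4 weeks ago
y = usage[:, -4:].sum(axis=1)      # target: accesses over the last 4 weeks

model = GradientBoostingRegressor().fit(X, y)
pred = model.predict(X)

sizes_tb = rng.uniform(0.1, 5.0, n_datasets)   # hypothetical dataset sizes
order = np.argsort(pred / sizes_tb)            # least popular per TB first
to_remove = order[:50]
print("disk space freed if removed: %.1f TB" % sizes_tb[to_remove].sum())
```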
Data-driven probability concentration and sampling on manifold
DOE Office of Scientific and Technical Information (OSTI.GOV)
Soize, C., E-mail: christian.soize@univ-paris-est.fr; Ghanem, R., E-mail: ghanem@usc.edu
2016-09-15
A new methodology is proposed for generating realizations of a random vector with values in a finite-dimensional Euclidean space that are statistically consistent with a dataset of observations of this vector. The probability distribution of this random vector, while a priori not known, is presumed to be concentrated on an unknown subset of the Euclidean space. A random matrix is introduced whose columns are independent copies of the random vector and for which the number of columns is the number of data points in the dataset. The approach is based on the use of (i) the multidimensional kernel-density estimation method for estimating the probability distribution of the random matrix, (ii) an MCMC method for generating realizations for the random matrix, (iii) the diffusion-maps approach for discovering and characterizing the geometry and the structure of the dataset, and (iv) a reduced-order representation of the random matrix, which is constructed using the diffusion-maps vectors associated with the first eigenvalues of the transition matrix relative to the given dataset. The convergence aspects of the proposed methodology are analyzed and a numerical validation is explored through three applications of increasing complexity. The proposed method is found to be robust to noise levels and data complexity as well as to the intrinsic dimension of data and the size of experimental datasets. Both the methodology and the underlying mathematical framework presented in this paper contribute new capabilities and perspectives at the interface of uncertainty quantification, statistical data analysis, stochastic modeling and associated statistical inverse problems.
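Step (iii), the diffusion-maps construction, can be sketched compactly; the kernel bandwidth choice and the number of retained coordinates below are assumptions for illustration:

```python
# Compact sketch of diffusion maps: Gaussian affinities, row-normalization
# into a Markov transition matrix, and eigen-coordinates of that matrix.
import numpy as np

def diffusion_maps(X, n_coords=2, eps=None):
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise sq. distances
    if eps is None:
        eps = np.median(d2)                              # heuristic bandwidth
    K = np.exp(-d2 / eps)
    P = K / K.sum(axis=1, keepdims=True)                 # Markov transition matrix
    vals, vecs = np.linalg.eig(P)
    idx = np.argsort(-vals.real)
    # Skip the trivial eigenvalue 1; scale eigenvectors by their eigenvalues.
    return vecs.real[:, idx[1:n_coords + 1]] * vals.real[idx[1:n_coords + 1]]

rng = np.random.default_rng(8)
theta = rng.uniform(0, 2 * np.pi, 200)
X = np.column_stack([np.cos(theta), np.sin(theta)]) + rng.normal(0, 0.05, (200, 2))
coords = diffusion_maps(X)   # data concentrated near a circle -> 2 coordinates
print(coords.shape)
```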
Schubert, Nicole; Axer, Markus; Schober, Martin; Huynh, Anh-Minh; Huysegoms, Marcel; Palomero-Gallagher, Nicola; Bjaalie, Jan G; Leergaard, Trygve B; Kirlangic, Mehmet E; Amunts, Katrin; Zilles, Karl
2016-01-01
High-resolution multiscale and multimodal 3D models of the brain are essential tools to understand its complex structural and functional organization. Neuroimaging techniques addressing different aspects of brain organization should be integrated in a reference space to enable topographically correct alignment and subsequent analysis of the various datasets and their modalities. The Waxholm Space (http://software.incf.org/software/waxholm-space) is a publicly available 3D coordinate-based standard reference space for the mapping and registration of neuroanatomical data in rodent brains. This paper provides a newly developed pipeline combining imaging and reconstruction steps with a novel registration strategy to integrate new neuroimaging modalities into the Waxholm Space atlas. As a proof of principle, we incorporated large scale high-resolution cyto-, muscarinic M2 receptor, and fiber architectonic images of rat brains into the 3D digital MRI based atlas of the Sprague Dawley rat in Waxholm Space. We describe the whole workflow, from image acquisition to reconstruction and registration of these three modalities into the Waxholm Space rat atlas. The registration of the brain sections into the atlas is performed by using both linear and non-linear transformations. The validity of the procedure is qualitatively demonstrated by visual inspection, and a quantitative evaluation is performed by measurement of the concordance between representative atlas-delineated regions and the same regions based on receptor or fiber architectonic data. This novel approach enables for the first time the generation of 3D reconstructed volumes of nerve fibers and fiber tracts, or of muscarinic M2 receptor density distributions, in an entire rat brain. Additionally, our pipeline facilitates the inclusion of further neuroimaging datasets, e.g., 3D reconstructed volumes of histochemical stainings or of the regional distributions of multiple other receptor types, into the Waxholm Space. Thereby, a multiscale and multimodal rat brain model was created in the Waxholm Space atlas of the rat brain. Since the registration of these multimodal high-resolution datasets into the same coordinate system is an indispensable requisite for multi-parameter analyses, this approach enables combined studies on receptor and cell distributions as well as fiber densities in the same anatomical structures at microscopic scales for the first time.
Upper ankle joint space detection on low contrast intraoperative fluoroscopic C-arm projections
NASA Astrophysics Data System (ADS)
Thomas, Sarina; Schnetzke, Marc; Brehler, Michael; Swartman, Benedict; Vetter, Sven; Franke, Jochen; Grützner, Paul A.; Meinzer, Hans-Peter; Nolden, Marco
2017-03-01
Intraoperative mobile C-arm fluoroscopy is widely used for interventional verification in trauma surgery, high flexibility combined with low cost being the main advantages of the method. However, the lack of global device-to-patient orientation is challenging when comparing the acquired data to other intra-patient datasets. In upper ankle joint fracture reduction accompanied by an unstable syndesmosis, a comparison to the unfractured contralateral side is helpful for verification of the reduction result. To reduce dose and operation time, our approach aims at the comparison of single projections of the unfractured ankle with volumetric images of the reduced fracture. For precise assessment, a pre-alignment of both datasets is a crucial step. We propose a contour extraction pipeline to estimate the joint space location for a pre-alignment of fluoroscopic C-arm projections containing the upper ankle joint. A quadtree-based hierarchical variance comparison extracts potential feature points, and a Hough transform is applied to identify bone shaft lines together with the tibiotalar joint space. By using this information we can define the coarse orientation of the projections independently of the ankle pose during acquisition, in order to align those images to the volume of the fractured ankle. The proposed method was evaluated on thirteen cadaveric datasets consisting of 100 projections each, with image planes manually adjusted by three trauma surgeons. The results show that the method can be used to detect the joint space orientation. The correlation between angle deviation and anatomical projection direction gives valuable input on the acquisition direction for future clinical experiments.
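In the spirit of the described pipeline (with assumed preprocessing details and a hypothetical input file, and skipping the quadtree variance step), the Hough step could be sketched with OpenCV as follows:

```python
# Hedged sketch: equalize the low-contrast projection, extract edges, and use
# a Hough transform to recover dominant bone-shaft line orientations.
import cv2
import numpy as np

img = cv2.imread("carm_projection.png", cv2.IMREAD_GRAYSCALE)  # hypothetical file
img = cv2.equalizeHist(img)            # boost contrast of the projection
edges = cv2.Canny(img, 30, 90)

lines = cv2.HoughLines(edges, 1, np.pi / 180, 120)
if lines is not None:
    # Near-parallel long lines suggest the tibial shaft; the joint space is
    # then searched roughly perpendicular to that direction.
    angles = np.degrees(lines[:, 0, 1])
    shaft_angle = np.median(angles)
    print("estimated shaft orientation: %.1f deg" % shaft_angle)
```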
The 3D Reference Earth Model: Status and Preliminary Results
NASA Astrophysics Data System (ADS)
Moulik, P.; Lekic, V.; Romanowicz, B. A.
2017-12-01
In the 20th century, seismologists constructed models of how average physical properties (e.g. density, rigidity, compressibility, anisotropy) vary with depth in the Earth's interior. These one-dimensional (1D) reference Earth models (e.g. PREM) have proven indispensable in earthquake location, imaging of interior structure, understanding material properties under extreme conditions, and as a reference in other fields, such as particle physics and astronomy. Over the past three decades, new datasets motivated more sophisticated efforts that yielded models of how properties vary both laterally and with depth in the Earth's interior. Though these three-dimensional (3D) models exhibit compelling similarities at large scales, differences in the methodology, representation of structure, and dataset upon which they are based have prevented the creation of 3D community reference models. As part of the REM-3D project, we are compiling and reconciling reference seismic datasets of body wave travel-time measurements, fundamental mode and overtone surface wave dispersion measurements, and normal mode frequencies and splitting functions. These reference datasets are being inverted for a long-wavelength, 3D reference Earth model that describes the robust long-wavelength features of mantle heterogeneity. As a community reference model with fully quantified uncertainties and tradeoffs and an associated publicly available dataset, REM-3D will facilitate Earth imaging studies, earthquake characterization, inferences on temperature and composition in the deep interior, and be of improved utility to emerging scientific endeavors, such as neutrino geoscience. Here, we summarize progress made in the construction of the reference long period dataset and present a preliminary version of REM-3D in the upper mantle. In order to determine the level of detail warranted for inclusion in REM-3D, we analyze the spectrum of discrepancies between models inverted with different subsets of the reference dataset. This procedure allows us to evaluate the extent of consistency in imaging heterogeneity at various depths and between spatial scales.
The Human and Physical Determinants of Wildfires and Burnt Areas in Israel
NASA Astrophysics Data System (ADS)
Levin, Noam; Tessler, Naama; Smith, Andrew; McAlpine, Clive
2016-09-01
Wildfires are expected to increase in Mediterranean landscapes as a result of climate change and changes in land-use practices. In order to advance our understanding of human and physical factors shaping spatial patterns of wildfires in the region, we compared two independently generated datasets of wildfires for Israel that cover approximately the same study period. We generated a site-based dataset containing the location of 10,879 wildfires (1991-2011), and compared it to a dataset of burnt areas derived from MODIS imagery (2000-2011). We hypothesized that the physical and human factors explaining the spatial distribution of burnt areas derived from remote sensing (mostly large fires, >100 ha) will differ from those explaining site-based wildfires recorded by national agencies (mostly small fires, <10 ha). Small wildfires recorded by forestry agencies were concentrated within planted forests and near built-up areas, whereas the largest wildfires were located in more remote regions, often associated with military training areas and herbaceous vegetation. We conclude that to better understand wildfire dynamics, consolidation of wildfire databases should be achieved, combining field reports and remote sensing. As nearly all wildfires in Mediterranean landscapes are caused by human activities, improving the management of forest areas and raising public awareness to fire risk are key considerations in reducing fire danger.
NASA Astrophysics Data System (ADS)
Alvarez-Garreton, C. D.; Mendoza, P. A.; Zambrano-Bigiarini, M.; Galleguillos, M. H.; Boisier, J. P.; Lara, A.; Cortés, G.; Garreaud, R.; McPhee, J. P.; Addor, N.; Puelma, C.
2017-12-01
We provide the first catchment-based hydrometeorological, vegetation and physical dataset covering 531 catchments in Chile (17.8° S - 55.0° S). We compiled publicly available streamflow records at daily time steps for the period 1980-2015, and generated basin-averaged time series of the following hydrometeorological variables: 1) daily precipitation from three different gridded sources (re-analysis and satellite-based); 2) daily maximum and minimum temperature; 3) 8-day potential evapotranspiration (PET) based on MODIS imagery and daily PET based on the Hargreaves formula; and 4) daily snow water equivalent. Additionally, catchments are characterized by their main physical (area, mean elevation, mean slope) and land cover characteristics. We synthesized these datasets with several indices characterizing the spatial distribution of climatic, hydrological, topographic and vegetation attributes. The new catchment-based dataset is unprecedented in the region and provides information that can be used in a myriad of applications, including catchment classification and regionalization studies, impacts of different land cover types on catchment response, characterization of drought history and projections, and climate change impacts on hydrological processes. Derived practical applications include water management and allocation strategies, decision making and adaptation planning to climate change. This dataset will be publicly available and we encourage the community to use it.
OLYMPEX Data Workshop: GPM View
NASA Technical Reports Server (NTRS)
Petersen, W.
2017-01-01
OLYMPEX Primary Objectives: Datasets to enable: (1) Direct validation over complex terrain at multiple scales, liquid and frozen precip types, (a) Do we capture terrain and synoptic regime transitions, orographic enhancements/structure, full range of precipitation intensity (e.g., very light to heavy) and types, spatial variability? (b) How well can we estimate space/time-accumulated precipitation over terrain (liquid + frozen)? (2) Physical validation of algorithms in mid-latitude cold season frontal systems over ocean and complex terrain, (a) What are the column properties of frozen, melting, liquid hydrometeors-their relative contributions to estimated surface precipitation, transition under the influence of terrain gradients, and systematic variability as a function of synoptic regime? (3) Integrated hydrologic validation in complex terrain, (a) Can satellite estimates be combined with modeling over complex topography to drive improved products (assimilation, downscaling) [Level IV products] (b) What are capabilities and limitations for use of satellite-based precipitation estimates in stream/river flow forecasting?
Optical and Physical Methods for Mapping Flooding with Satellite Imagery
NASA Technical Reports Server (NTRS)
Fayne, Jessica Fayne; Bolten, John; Lakshmi, Venkat; Ahamed, Aakash
2016-01-01
Flood and surface water mapping is becoming increasingly necessary, as extreme flooding events worldwide can damage crop yields and contribute to billions of dollars in economic damages, as well as social effects including fatalities and destroyed communities (Xaio et al. 2004; Kwak et al. 2015; Mueller et al. 2016). Utilizing earth observing satellite data to map standing water from space is indispensable to flood mapping for disaster response, mitigation, prevention, and warning (McFeeters 1996; Brakenridge and Anderson 2006). Since the early 1970s (Landsat, USGS 2013), researchers have been able to remotely sense surface processes such as extreme flood events to help offset some of these problems. Researchers have demonstrated countless methods, and modifications of those methods, to help increase knowledge of areas at risk and areas that are flooded, using remote sensing data from optical and radar systems, as well as freely available public and costly commercial datasets.
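The simplest optical method cited above (McFeeters 1996) is the Normalized Difference Water Index; here is a minimal version with stand-in reflectance arrays and an illustrative threshold:

```python
# NDWI (McFeeters 1996): (Green - NIR) / (Green + NIR); open water tends to
# have positive values. Band arrays and the threshold here are stand-ins.
import numpy as np

def ndwi(green, nir, eps=1e-9):
    green = green.astype(np.float64)
    nir = nir.astype(np.float64)
    return (green - nir) / (green + nir + eps)

rng = np.random.default_rng(9)
green = rng.uniform(0.02, 0.3, (100, 100))   # stand-in surface reflectances
nir = rng.uniform(0.02, 0.4, (100, 100))
water_mask = ndwi(green, nir) > 0.0          # positive NDWI flags open water
print("flooded fraction:", water_mask.mean())
```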
A Framework to Debug Diagnostic Matrices
NASA Technical Reports Server (NTRS)
Kodal, Anuradha; Robinson, Peter; Patterson-Hine, Ann
2013-01-01
Diagnostics is an important concept in system health and the monitoring of space operations. Many existing diagnostic algorithms utilize system knowledge in the form of a diagnostic matrix (D-matrix, also known as a diagnostic dictionary, fault signature matrix, or reachability matrix) gleaned from physical models. Sometimes, however, this matrix is not sufficient to obtain high diagnostic performance. In such a case, it is important to modify the D-matrix based on knowledge obtained from other sources, such as time-series data streams (simulated or maintenance data), within the context of a framework that includes the diagnostic/inference algorithm. A systematic and sequential update procedure, the diagnostic modeling evaluator (DME), is proposed to modify the D-matrix and wrapper logic, considering the least expensive solution first. This iterative procedure covers modifications ranging from flipping 0s and 1s in the matrix to adding or removing rows (failure sources) and columns (tests). We evaluate this framework on datasets from the DX Challenge 2009.
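A D-matrix lends itself to a compact illustration. The sketch below is a hypothetical, minimal rendering of the cheapest update class (flipping single entries), not the DME procedure itself; all names are invented. It holds the matrix as a 0/1 array over failure sources and tests and flips the entries of a fault's row that an observed test pattern contradicts:

```python
import numpy as np

# Hypothetical illustration of the cheapest D-matrix repair step:
# flip individual 0/1 entries that disagree with observed test outcomes.

def flip_inconsistent_entries(d_matrix, fault, observed_tests):
    """d_matrix[i, j] = 1 if failure source i should trip test j.
    Given a known injected fault and the observed test outcomes,
    flip the entries of that fault's row that were contradicted."""
    d = d_matrix.copy()
    mismatches = d[fault] != observed_tests        # predicted vs. actual
    d[fault, mismatches] = observed_tests[mismatches]
    return d, int(mismatches.sum())                # repaired matrix, edit cost

d = np.array([[1, 0, 1],
              [0, 1, 0]])
observed = np.array([1, 1, 1])                     # fault 0 also tripped test 1
repaired, cost = flip_inconsistent_entries(d, fault=0, observed_tests=observed)
print(repaired, cost)                              # row 0 becomes [1, 1, 1], cost 1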
NASA Astrophysics Data System (ADS)
Xu, Y.; Sun, Z.; Boerner, R.; Koch, T.; Hoegner, L.; Stilla, U.
2018-04-01
In this work, we report a novel way of generating ground truth datasets for analyzing point clouds from different sensors and for the validation of algorithms. Instead of directly labeling a large number of 3D points, which requires time-consuming manual work, a multi-resolution 3D voxel grid for the testing site is generated. Then, with the help of a set of labeled points from the reference dataset, we can generate a 3D labeled space of the entire testing site at different resolutions. Specifically, an octree-based voxel structure is applied to voxelize the annotated reference point cloud, by which all the points are organized into 3D grids of multiple resolutions. When automatically annotating new testing point clouds, a voting-based approach is applied to the labeled points within the multi-resolution voxels, in order to assign a semantic label to the 3D space represented by each voxel. Lastly, robust line- and plane-based fast registration methods are developed for aligning point clouds obtained by various sensors. Benefiting from the labeled 3D spatial information, we can easily create new annotated 3D point clouds from different sensors of the same scene directly, by considering the labels of the 3D spaces in which the points are located, which is convenient for the validation and evaluation of algorithms for point cloud interpretation and semantic segmentation.
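The voxel voting step can be illustrated compactly. The following sketch is an assumption-laden simplification (a single fixed resolution instead of the paper's octree, invented function names): it bins labeled reference points into voxels, takes the majority label per voxel, and lets new points inherit their voxel's label:

```python
import numpy as np
from collections import Counter

# Sketch of the voting idea: labeled reference points are binned into
# voxels of one chosen resolution, each voxel takes the majority label,
# and new points inherit the label of the voxel they fall in.
# Octree multi-resolution handling is omitted for brevity.

def build_label_grid(ref_xyz, ref_labels, voxel_size):
    keys = np.floor(ref_xyz / voxel_size).astype(int)
    votes = {}
    for key, lab in zip(map(tuple, keys), ref_labels):
        votes.setdefault(key, Counter())[lab] += 1
    return {k: c.most_common(1)[0][0] for k, c in votes.items()}

def annotate(new_xyz, grid, voxel_size, unknown=-1):
    keys = np.floor(new_xyz / voxel_size).astype(int)
    return np.array([grid.get(tuple(k), unknown) for k in keys])

ref = np.random.rand(1000, 3)
labels = np.random.randint(0, 3, 1000)
grid = build_label_grid(ref, labels, voxel_size=0.1)
print(annotate(np.random.rand(5, 3), grid, 0.1))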
NASA Astrophysics Data System (ADS)
Ferrari, Ulisse
A maximum entropy model provides the least constrained probability distribution that reproduces experimental averages of a set of observables. In this work we characterize the learning dynamics that maximizes the log-likelihood in the case of large but finite datasets. We first show how the steepest descent dynamics is not optimal, as it is slowed down by the inhomogeneous curvature of the model parameter space. We then provide a way of rectifying this space that relies only on dataset properties and does not require large computational effort. We conclude by solving the long-time limit of the parameter dynamics, including the randomness generated by the systematic use of Gibbs sampling. In this stochastic framework, rather than converging to a fixed point, the dynamics reaches a stationary distribution, which for the rectified dynamics reproduces the posterior distribution of the parameters. We sum up all these insights in a "rectified" data-driven algorithm that is fast and, by sampling from the parameter posterior, avoids both under- and over-fitting along all directions of the parameter space. Through the learning of pairwise Ising models from recordings of a large population of retina neurons, we show how our algorithm outperforms the steepest descent method. This research was supported by a grant from the Human Brain Project (HBP CLAP).
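A minimal sketch of the underlying learning rule, for a toy model small enough to enumerate: the log-likelihood gradient is the difference between data and model averages of the observables, and dividing each component by the empirical observable variance serves here as a crude diagonal stand-in for the paper's rectification. All specifics below are illustrative assumptions, not the published algorithm:

```python
import numpy as np

# Maximum-entropy learning for a fully enumerable model p(s) ~ exp(theta . O(s)).
# The log-likelihood gradient is <O>_data - <O>_model; dividing by the
# empirical variance of each observable is a rough diagonal "rectification".

def model_means(theta, O):
    logw = O @ theta
    w = np.exp(logw - logw.max())
    p = w / w.sum()
    return p @ O

def fit(data_means, data_vars, O, eta=0.1, steps=2000):
    theta = np.zeros(O.shape[1])
    for _ in range(steps):
        grad = data_means - model_means(theta, O)
        theta += eta * grad / data_vars        # rectified ascent step
    return theta

# Two-spin Ising example: observables (s1, s2, s1*s2).
states = np.array([[s1, s2] for s1 in (-1, 1) for s2 in (-1, 1)])
O = np.column_stack([states[:, 0], states[:, 1], states[:, 0] * states[:, 1]])
data_means = np.array([0.2, -0.1, 0.3])
data_vars = 1.0 - data_means**2                # variance of +/-1 observables
print(fit(data_means, data_vars, O))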
Liao, Pei-An; Chang, Hung-Hao; Wang, Jiun-Hao; Wu, Min-Chen
2013-06-01
This study examined the relationship between changes in physical fitness across the 3-year span of senior high school study and academic performance measured by standardized tests in Taiwan. A unique dataset of 149,240 university-bound senior high school students from 2009 to 2011 was constructed by merging two nationwide administrative datasets of physical fitness test performance and university entrance exam scores. Hierarchical linear regression models were used. All regressions included controls for students' baseline physical fitness status, changes in physical fitness performance over time, age, and family economic status. Some notable findings were revealed. An increase of 1 SD in students' overall physical fitness from the first to the third school year is associated with an increase in university entrance exam scores of 0.007 and 0.010 SD for male and female students, respectively. An increase of 1 SD in anaerobic power (flexibility) from the first to the third school year is positively associated with an increase in university entrance exam scores of 0.018 (0.010) SD among female students. We suggest that education and school health policymakers consider and design policies to improve physical fitness as part of their overall strategy for improving academic performance.
DiscoverySpace: an interactive data analysis application
Robertson, Neil; Oveisi-Fordorei, Mehrdad; Zuyderduyn, Scott D; Varhol, Richard J; Fjell, Christopher; Marra, Marco; Jones, Steven; Siddiqui, Asim
2007-01-01
DiscoverySpace is a graphical application for bioinformatics data analysis. Users can seamlessly traverse references between biological databases and draw together annotations in an intuitive tabular interface. Datasets can be compared using a suite of novel tools to aid in the identification of significant patterns. DiscoverySpace is of broad utility and its particular strength is in the analysis of serial analysis of gene expression (SAGE) data. The application is freely available online. PMID:17210078
NASA GeneLab Project: Bridging Space Radiation Omics with Ground Studies
NASA Technical Reports Server (NTRS)
Beheshti, Afshin; Miller, Jack; Kidane, Yared H.; Berrios, Daniel; Gebre, Samrawit G.; Costes, Sylvain V.
2018-01-01
Accurate assessment of risk factors for long-term space missions is critical for human space exploration; therefore it is essential to have a detailed understanding of the biological effects on humans living and working in deep space. Ionizing radiation from Galactic Cosmic Rays (GCR) is one of the major risk factors that will impact the health of astronauts on extended missions outside the protective effects of the Earth's magnetic field. Currently there are gaps in our knowledge of the health risks associated with chronic low-dose, low-dose-rate ionizing radiation, specifically ions with high (H) atomic number (Z) and energy (E), i.e. HZE ions. The GeneLab project (genelab.nasa.gov) aims to provide a detailed library of omics datasets associated with biological samples exposed to HZE. The GeneLab Data System (GLDS) currently includes datasets from both spaceflight and ground-based studies, the majority of which involve exposure to ionizing radiation. In addition to detailed information for ground-based studies, we are in the process of adding detailed, curated dosimetry information for spaceflight missions. GeneLab is the first comprehensive omics database for space-related research from which an investigator can generate hypotheses to direct future experiments, utilizing both ground and space biological radiation data. In addition to previously acquired data, the GLDS is continually expanding as omics-related data are generated by the space life sciences community. Here we provide a brief summary of the space radiation related data available at GeneLab.
State Space Model with hidden variables for reconstruction of gene regulatory networks.
Wu, Xi; Li, Peng; Wang, Nan; Gong, Ping; Perkins, Edward J; Deng, Youping; Zhang, Chaoyang
2011-01-01
The State Space Model (SSM) is a relatively new approach to inferring gene regulatory networks, requiring less computational time than Dynamic Bayesian Networks (DBN). There are two types of variables in the linear SSM: observed variables and hidden variables. SSM uses an iterative method, namely Expectation-Maximization, to infer regulatory relationships from microarray datasets. The hidden variables cannot be directly observed from experiments, and how their number is determined has a significant impact on the accuracy of network inference. In this study, we used SSM to infer gene regulatory networks (GRNs) from synthetic time series datasets, investigated Bayesian Information Criterion (BIC) and Principal Component Analysis (PCA) approaches to determining the number of hidden variables in SSM, and evaluated the performance of SSM in comparison with DBN. True GRNs and synthetic gene expression datasets were generated using GeneNetWeaver. Both DBN and linear SSM were used to infer GRNs from the synthetic datasets, and the inferred networks were compared with the true networks. Our results show that inference precision varied with the number of hidden variables. For some regulatory networks the inference precision of DBN was higher, but SSM performed better in other cases. Although the overall performance of the two approaches is comparable, SSM is much faster and capable of inferring much larger networks than DBN. This study provides useful information for handling the hidden variables and improving inference precision.
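As an illustration of the BIC route to choosing the number of hidden variables, the sketch below scores candidate dimensions under a probabilistic-PCA likelihood, a common proxy rather than the paper's SSM likelihood; the parameter count is a rough accounting and the data are synthetic:

```python
import numpy as np
from sklearn.decomposition import PCA

# Hedged sketch: choose a hidden dimension k by minimizing BIC under a
# probabilistic-PCA likelihood (stand-in for the SSM likelihood).

def pca_bic(X, k):
    n, d = X.shape
    pca = PCA(n_components=k).fit(X)
    total_loglik = pca.score(X) * n            # score() is mean log-likelihood
    n_params = d * k + 1                        # loadings + noise variance (rough)
    return -2 * total_loglik + n_params * np.log(n)

rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))              # 3 true hidden variables
X = latent @ rng.normal(size=(3, 10)) + 0.1 * rng.normal(size=(200, 10))
best_k = min(range(1, 8), key=lambda k: pca_bic(X, k))
print("hidden variables selected:", best_k)     # typically 3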
Nanocubes for real-time exploration of spatiotemporal datasets.
Lins, Lauro; Klosowski, James T; Scheidegger, Carlos
2013-12-01
Consider real-time exploration of large multidimensional spatiotemporal datasets with billions of entries, each defined by a location, a time, and other attributes. Are certain attributes correlated spatially or temporally? Are there trends or outliers in the data? Answering these questions requires aggregation over arbitrary regions of the domain and attributes of the data. Many relational databases implement the well-known data cube aggregation operation, which in a sense precomputes every possible aggregate query over the database. Data cubes are sometimes assumed to take a prohibitively large amount of space, and to consequently require disk storage. In contrast, we show how to construct a data cube that fits in a modern laptop's main memory, even for billions of entries; we call this data structure a nanocube. We present algorithms to compute and query a nanocube, and show how it can be used to generate well-known visual encodings such as heatmaps, histograms, and parallel coordinate plots. When compared to exact visualizations created by scanning an entire dataset, nanocube plots have bounded screen error across a variety of scales, thanks to a hierarchical structure in space and time. We demonstrate the effectiveness of our technique on a variety of real-world datasets, and present memory, timing, and network bandwidth measurements. We find that the timings for the queries in our examples are dominated by network and user-interaction latencies.
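The flavor of data-cube preaggregation can be conveyed with a deliberately naive sketch: bin events on coarse spatial and temporal keys so that heatmap-style queries reduce to group lookups. A real nanocube instead shares index prefixes hierarchically to stay within main memory at billions of rows; the names and bin sizes below are illustrative only:

```python
import numpy as np
import pandas as pd

# Flat data-cube-style preaggregation with pandas (illustrative sketch).
rng = np.random.default_rng(0)
events = pd.DataFrame({
    "lat": rng.uniform(30, 50, 100_000),
    "lon": rng.uniform(-120, -70, 100_000),
    "hour": rng.integers(0, 24, 100_000),
    "device": rng.choice(["ios", "android"], 100_000),
})
events["lat_bin"] = (events["lat"] // 1).astype(int)   # 1-degree grid
events["lon_bin"] = (events["lon"] // 1).astype(int)

cube = events.groupby(["lat_bin", "lon_bin", "hour", "device"]).size()

# A heatmap-style query: counts per cell for android events at hour 12.
print(cube.xs((12, "android"), level=("hour", "device")).head())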
A hybrid personalized data recommendation approach for geoscience data sharing
NASA Astrophysics Data System (ADS)
WANG, M.; Wang, J.
2016-12-01
Recommender systems are effective tools that help Internet users overcome information overload. The two most widely used recommendation algorithms are collaborative filtering (CF) and content-based filtering (CBF). A number of recommender systems based on these two algorithms have been developed for multimedia, online sales, and other domains. Each of the two algorithms has its advantages and shortcomings, and hybrid approaches that combine them are better choices in many cases. In the geoscience data sharing domain, where the items (datasets) are more informative (in space and time) and domain-specific, no recommender system has been specialized for data users. This paper reports a dynamic weighted hybrid recommendation algorithm that combines CF and CBF for a geoscience data sharing portal. We first derive users' ratings on items from their historical visiting times using Jenks natural breaks. In the CBF part, we incorporate the space, time, and subject information of geoscience datasets to compute item similarity. Predicted ratings were computed with the k-NN method separately using CBF and CF, and then combined with weights. With a training dataset we sought the best model relating the ideal weights to users' co-rating numbers; a logarithmic function was confirmed to be the best model. The model was then used to tune the weights of CF and CBF on a per user-item basis with a test dataset. Evaluation results show that the dynamic weighted approach outperforms either the solo CF or the solo CBF approach in terms of precision and recall.
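The dynamic weighting idea reduces to a simple blend. In the sketch below only the logarithmic functional form is taken from the abstract; the coefficients are invented. The CF weight grows with the user's co-rating count, so sparse users lean on CBF and well-profiled users lean on CF:

```python
import numpy as np

# Blend CF and CBF rating predictions per user-item pair; the CF weight
# grows logarithmically in the user's co-rating count (coefficients made up).

def hybrid_score(cf_pred, cbf_pred, n_corated, a=0.2, b=0.15, w_max=0.9):
    w_cf = np.clip(a + b * np.log1p(n_corated), 0.0, w_max)
    return w_cf * cf_pred + (1.0 - w_cf) * cbf_pred

print(hybrid_score(cf_pred=4.0, cbf_pred=3.0, n_corated=1))    # CBF-leaning
print(hybrid_score(cf_pred=4.0, cbf_pred=3.0, n_corated=100))  # CF-leaning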
NASA Astrophysics Data System (ADS)
Macher, H.; Landes, T.; Grussenmeyer, P.
2016-06-01
Laser scanners are widely used for the modelling of existing buildings, particularly in the creation of as-built BIM (Building Information Modelling). However, the generation of as-built BIM from point clouds involves mainly manual steps and is consequently time-consuming and error-prone. Along the path to automation, a three-step segmentation approach has been developed, organized in two phases: a segmentation into sub-spaces, namely floors and rooms, and a plane segmentation combined with the identification of building elements. In order to assess and validate the developed approach, different case studies are considered. Indeed, it is essential to apply algorithms to several datasets rather than developing them against a single dataset whose particularities could bias the development. Indoor point clouds of different types of buildings are used as input for the developed algorithms, ranging from an individual house of almost one hundred square metres to larger buildings of several thousand square metres. The datasets provide various space configurations and present numerous occluding objects, for example desks, computer equipment, home furnishings and even wine barrels. For each dataset, the results are illustrated. The analysis of the results provides insight into the transferability of the developed approach for the indoor modelling of several types of buildings.
NASA Astrophysics Data System (ADS)
Nisha, N.; Punia, M.
2016-12-01
Mountain systems cannot be claimed to be extraordinarily fragile, but a greater range of vulnerability to disturbance than in many landscapes, in which disturbance in physical space leads to disturbance in social space, makes them special eco-sensitive zones with a greater degree of fragility. The present study furnishes socio-economic vulnerability mapping of the Bhagirathi basin through computation of the Social Vulnerability Index (SoVI). SoVI correlates vulnerability to natural or anthropogenic disasters with socio-economic development and illustrates how developmental parameters alter the equation of potential effect and recovery in the event of a natural catastrophe in the study region. Time-series datasets from different sources, including optical remote sensing data, together with social and/or economic data, are used to quantify vulnerabilities during extreme events. The analysis shows that areas with a high social vulnerability index are more prone to disaster than low-index areas. The analysis of social vulnerability not only helps to identify flood risk areas but also raises the question of how the key drivers that trigger floods are controlled by governments and local authorities.
Scaling Relations between Gas and Star Formation in Nearby Galaxies
NASA Astrophysics Data System (ADS)
Bigiel, Frank; Leroy, Adam; Walter, Fabian
2011-04-01
High-resolution, multi-wavelength maps of a sizeable set of nearby galaxies have made it possible to study how the surface densities of H I, H2 and star formation rate (ΣHI, ΣH2, ΣSFR) relate on scales of a few hundred parsecs. At these scales, individual galaxy disks are comfortably resolved, making it possible to assess gas-SFR relations with respect to environment within galaxies. ΣH2, traced by CO intensity, shows a strong correlation with ΣSFR, and the ratio between these two quantities, the molecular gas depletion time, appears to be constant at about 2 Gyr in large spiral galaxies. Within the star-forming disks of galaxies, ΣSFR shows almost no correlation with ΣHI. In the outer parts of galaxies, however, ΣSFR does scale with ΣHI, though with large scatter. Combining data from these different environments yields a distribution with multiple regimes in Σgas-ΣSFR space. If the underlying assumptions used to convert observables to physical quantities are matched, even combined datasets based on different SFR tracers, methodologies and spatial scales occupy a well-defined locus in Σgas-ΣSFR space.
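The depletion time referred to above is simply the ratio of the two mapped surface densities; restated in formula form using the text's notation:

```latex
\tau_{\mathrm{dep}} \equiv \frac{\Sigma_{\mathrm{H_2}}}{\Sigma_{\mathrm{SFR}}} \approx 2\ \mathrm{Gyr}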
POCS-enhanced correction of motion artifacts in parallel MRI.
Samsonov, Alexey A; Velikina, Julia; Jung, Youngkyoo; Kholmovski, Eugene G; Johnson, Chris R; Block, Walter F
2010-04-01
A new method is presented for the correction of MRI motion artifacts induced by corrupted k-space data acquired with multiple receiver coils, such as phased arrays. In our approach, a projections onto convex sets (POCS)-based method for reconstruction of sensitivity-encoded MRI data (POCSENSE) is employed to identify corrupted k-space samples. After the erroneous data are discarded from the dataset, artifact-free images are restored from the remaining data using coil sensitivity profiles. The error detection and data restoration are based on the informational redundancy of phased-array data and may be applied to full and reduced datasets. An important advantage of the new POCS-based method is that, in addition to multicoil data redundancy, it can use a priori known properties of the imaged object for improved MR image artifact correction. The use of such information was shown to significantly improve k-space error detection and image artifact correction. The method was validated on data corrupted by simulated and real motion, such as head motion and pulsatile flow.
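A heavily simplified single-coil sketch conveys the alternating-projection skeleton of POCS: project onto consistency with the k-space samples flagged as uncorrupted, then onto an image-domain convex constraint (here, a known object support). The actual POCSENSE method works on multicoil data through sensitivity profiles; everything below is an illustrative assumption:

```python
import numpy as np

# Alternating projections: (1) consistency with uncorrupted k-space
# samples, (2) a convex image-domain constraint (object support).

def pocs_restore(kspace, good_mask, support, n_iter=50):
    img = np.zeros_like(kspace)
    for _ in range(n_iter):
        k = np.fft.fft2(img)
        k[good_mask] = kspace[good_mask]        # project onto measured data
        img = np.fft.ifft2(k)
        img *= support                          # project onto object support
    return img

rng = np.random.default_rng(1)
truth = np.zeros((64, 64)); truth[16:48, 16:48] = 1.0
k_full = np.fft.fft2(truth)
good = rng.random((64, 64)) > 0.3               # 30% of samples discarded
restored = pocs_restore(k_full.astype(complex), good, support=(truth > 0))
print(np.abs(restored - truth).max())           # small residual error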
NASA Astrophysics Data System (ADS)
Tamminen, J.; Sofieva, V.; Kyrölä, E.; Laine, M.; Degenstein, D. A.; Bourassa, A. E.; Roth, C.; Zawada, D.; Weber, M.; Rozanov, A.; Rahpoe, N.; Stiller, G. P.; Laeng, A.; von Clarmann, T.; Walker, K. A.; Sheese, P.; Hubert, D.; Van Roozendael, M.; Zehner, C.; Damadeo, R. P.; Zawodny, J. M.; Kramarova, N. A.; Bhartia, P. K.
2017-12-01
We present a merged dataset of ozone profiles from several satellite instruments: SAGE II on ERBS; GOMOS, SCIAMACHY and MIPAS on Envisat; OSIRIS on Odin; ACE-FTS on SCISAT; and OMPS on Suomi-NPP. The merged dataset is created in the framework of the European Space Agency Climate Change Initiative (Ozone_cci) with the aim of analyzing stratospheric ozone trends. For the merged dataset, we used the latest versions of the original ozone datasets. The datasets from the individual instruments have been extensively validated and inter-compared; only those datasets which are in good agreement, and do not exhibit significant drifts with respect to collocated ground-based observations and with respect to each other, are used for merging. The long-term SAGE-CCI-OMPS dataset is created by the computation and merging of deseasonalized anomalies from individual instruments. The merged SAGE-CCI-OMPS dataset consists of deseasonalized anomalies of ozone in 10° latitude bands from 90°S to 90°N and from 10 to 50 km in steps of 1 km, covering the period from October 1984 to July 2016. This newly created dataset is used for evaluating ozone trends in the stratosphere through multiple linear regression. Negative ozone trends in the upper stratosphere are observed before 1997 and positive trends are found after 1997. The post-1997 trends are statistically significant at mid-latitudes in the upper stratosphere and indicate ozone recovery, as expected from the decrease of stratospheric halogens that started in the middle of the 1990s.
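The anomaly-merging recipe is straightforward to sketch. Assuming one monthly series per instrument (for a given latitude band and altitude), each series is deseasonalized against its own monthly climatology, which also removes constant instrument offsets, and the anomalies are then averaged where instruments overlap:

```python
import numpy as np

# Deseasonalize each instrument's series against its own monthly
# climatology, then average anomalies across instruments.

def deseasonalize(series, months):
    anom = np.full_like(series, np.nan, dtype=float)
    for m in range(1, 13):
        sel = months == m
        anom[sel] = series[sel] - np.nanmean(series[sel])
    return anom

months = np.tile(np.arange(1, 13), 10)                  # 10 years, monthly
inst_a = 5 + np.sin(2 * np.pi * months / 12) + np.random.randn(120) * 0.1
inst_b = 6 + np.sin(2 * np.pi * months / 12) + np.random.randn(120) * 0.1  # biased high

merged = np.nanmean(np.vstack([deseasonalize(inst_a, months),
                               deseasonalize(inst_b, months)]), axis=0)
print(merged[:6])   # offsets and seasonal cycle are gone; anomalies remain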
In-vehicle group activity modeling and simulation in sensor-based virtual environment
NASA Astrophysics Data System (ADS)
Shirkhodaie, Amir; Telagamsetti, Durga; Poshtyar, Azin; Chan, Alex; Hu, Shuowen
2016-05-01
Human group activity recognition is a very complex and challenging task, especially for Partially Observable Group Activities (POGA) that occur in confined spaces with limited visual observability and often under severe occlusion. In this paper, we present the IRIS Virtual Environment Simulation Model (VESM) for the modeling and simulation of dynamic POGA. More specifically, we address sensor-based modeling and simulation of a specific category of POGA, called In-Vehicle Group Activities (IVGA). In VESM, human-like animated characters, called humanoids, are employed to simulate complex in-vehicle group activities within the confined space of a modeled vehicle. Each articulated humanoid is kinematically modeled with physical attributes and appearances comparable to its human counterpart. Each humanoid exhibits harmonious full-body motion, simulating human-like gestures and postures, facial expressions, and hand motions for coordinated dexterity. VESM facilitates the creation of interactive scenarios consisting of multiple humanoids with different personalities and intentions, capable of performing complicated human activities within the confined space of a typical vehicle. In this paper, we demonstrate the efficiency and effectiveness of VESM in terms of its ability to seamlessly generate time-synchronized, multi-source, and correlated imagery datasets of IVGA, which are useful for the training and testing of multi-source full-motion video processing and annotation. Furthermore, we demonstrate full-motion video processing of such simulated scenarios under different operational contextual constraints.
Learning Spatio-Temporal Representations for Action Recognition: A Genetic Programming Approach.
Liu, Li; Shao, Ling; Li, Xuelong; Lu, Ke
2016-01-01
Extracting discriminative and robust features from video sequences is the first and most critical step in human action recognition. In this paper, instead of using handcrafted features, we automatically learn spatio-temporal motion features for action recognition. This is achieved via an evolutionary method, i.e., genetic programming (GP), which evolves the motion feature descriptor on a population of primitive 3D operators (e.g., 3D-Gabor and wavelet). In this way, scale- and shift-invariant features can be effectively extracted from both color and optical flow sequences. We intend to learn data-adaptive descriptors for different datasets with multiple layers, which makes full use of knowledge mimicking the physical structure of the human visual cortex for action recognition, and simultaneously reduces the GP search space to effectively accelerate the convergence to optimal solutions. In our evolutionary architecture, the average cross-validation classification error, calculated by a support-vector-machine classifier on the training set, is adopted as the evaluation criterion for the GP fitness function. After the entire evolution procedure finishes, the best-so-far solution selected by GP is regarded as the (near-)optimal action descriptor. The GP-evolved feature extraction method is evaluated on four popular action datasets, namely KTH, HMDB51, UCF YouTube, and Hollywood2. Experimental results show that our method significantly outperforms other types of features, either hand-designed or machine-learned.
Yadav, Mukesh; Joshi, Shobha; Nayarisseri, Anuraj; Jain, Anuja; Hussain, Aabid; Dubey, Tushar
2013-06-01
Global QSAR models predict the biological response of molecular structures that are generic within a particular class. A global QSAR dataset admits structural features drawn from a larger chemical space; it is harder to model but more applicable in medicinal chemistry. The present work is global in both senses: structural diversity of the QSAR dataset and a large number of descriptor inputs. Forty phenethylamine derivatives were selected from a large pool (904) of similar phenethylamines available in the PubChem database. LogP values of the selected candidates were collected from the Physical Properties Database (PHYSPROP), determined under an identical set of conditions. Attempts to model the logP value produced significant QSAR models. MLR-aided linear one-variable and two-variable QSAR models, with respective R² (0.866, 0.937), adjusted R² (0.862, 0.932), F-statistics (181.936, 199.812) and standard errors (0.365, 0.255), are statistically fit and were found predictive after internal and external validation. The descriptors chosen after improvisation and optimization reveal the mechanistic part of the work: the Verhaar model of fish baseline toxicity from MLOGP (BLTF96), and the 3D-MoRSE signal 15 unweighted molecular descriptor, calculated by summing atom weights viewed by a different angular scattering function (Mor15u), are crucial in the regulation of logP values of phenethylamines.
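A sketch of the reported MLR modelling, with made-up descriptor values standing in for the real BLTF96 and Mor15u columns: fit logP against one or two descriptors and report R² and adjusted R², the statistics quoted above:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# One- and two-variable MLR fits with R-squared and adjusted R-squared.
def fit_stats(X, y):
    model = LinearRegression().fit(X, y)
    r2 = model.score(X, y)
    n, p = X.shape
    r2_adj = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    return r2, r2_adj

rng = np.random.default_rng(0)
bltf96 = rng.normal(size=40)                 # stand-in for the BLTF96 descriptor
mor15u = rng.normal(size=40)                 # stand-in for the Mor15u descriptor
logp = 1.2 * bltf96 + 0.5 * mor15u + rng.normal(scale=0.3, size=40)

print(fit_stats(bltf96.reshape(-1, 1), logp))               # one-variable model
print(fit_stats(np.column_stack([bltf96, mor15u]), logp))   # two-variable model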
Effects of Distant Green Space on Physical Activity in Sydney, Australia.
Chong, Shanley; Byun, Roy; Mazumdar, Soumya; Bauman, Adrian; Jalaludin, Bin
2017-01-01
The aim was to investigate the association between distant green space and physical activity, as modified by local green space. Information about physical activity and demographic and socioeconomic background at the individual level was extracted from the New South Wales Population Health Survey. The proportion of a postcode that was parkland was used as a proxy measure for access to parklands and was calculated for each individual. There was a significant relationship between distant green space and engaging in moderate-to-vigorous physical activity (MVPA) at least once a week. No significant relationship was found between adequate physical activity and distant green space, and no significant relationships were found between adequate physical activity, engaging in MVPA, and local green space. However, for respondents living with greater local green space (≥25%), there was a significant relationship between engaging in MVPA at least once a week and distant green space of ≥20%. This study highlights the important effect of distant green space on physical activity. Our findings also suggest that a moderate amount of local green space together with a moderate amount of distant green space is an important lever for participation in physical activity.
NASA Astrophysics Data System (ADS)
Pariser, O.; Calef, F.; Manning, E. M.; Ardulov, V.
2017-12-01
We present the implementation and study of several use cases of utilizing Virtual Reality (VR) for immersive display, interaction, and analysis of large and complex 3D datasets. These datasets have been acquired by instruments across several Earth, planetary, and solar space robotics missions. First, we describe the architecture of the common application framework that was developed to input data and to interface with VR display devices and input controllers in various computing environments. Tethered and portable VR technologies are contrasted and the advantages of each highlighted. We then present experimental immersive-analytics visual constructs that enable the augmentation of 3D datasets with 2D ones, such as images and statistical and abstract data. We conclude with a comparative analysis against traditional visualization applications and share the feedback provided by our users: scientists and engineers.
ESPAS: the European e-science platform to access near-Earth space data (Invited)
NASA Astrophysics Data System (ADS)
Belehaki, A.; Hapgood, M. A.; Ritschel, B.; Manola, N.
2013-12-01
The aim of the ESPAS platform is to integrate heterogeneous data from the Earth's thermosphere, ionosphere, plasmasphere and magnetosphere. ESPAS supports the systematic exploration of multipoint measurements from near-Earth space through homogenised access to multi-instrument data. It provides access to more than 40 datasets: Cluster, EISCAT, GIRO, DIAS, SWACI, CHAMP, SuperDARN, FPI, magnetometers (INGV, SGO, DTU, IMAGE, TGO), IMAGE/RPI, ACE, SOHO, PROBA2, NOAA/POES, etc. The concept of extensibility to new data sets is an important element of the ESPAS architecture. Within the first year of the project, the main components of the system have been developed, namely the data model, the XML schemas for the metadata exchange format, the ontology, the wrapper installed at the data nodes so that the main platform can harvest the metadata, the main platform built on the D-NET framework, and the GUI with its designed workflows. The first working prototype supports the search for datasets among a selected number of databases (i.e., EDAM, DIAS, Cluster, and SWACI data). The next immediate step is the implementation of search for characteristics within the datasets. For the second release we plan to deploy tools for ground-space and space-space conjunctions and for coincidences. For the final phase of the project, the ESPAS infrastructure will be extensively tested through the application of several use cases designed to serve the needs of wide interdisciplinary user and producer communities, such as the ionospheric, thermospheric, magnetospheric, space weather and space climate communities, the geophysics community, space communications engineering, HF users, satellite operators, navigation and surveillance systems, and space agencies. The final ESPAS platform is expected to be delivered in 2015. The abstract is submitted on behalf of the ESPAS-FP7EU team (http://www.espas-fp7.eu): Mike Hapgood, Anna Belehaki, Spiros Ventouras, Natalia Manola, Antonis Lebesis, Bruno Zolesi, Tatjana Gerzen, Ingemar Häggström, Anna Charisi, Ivan Galkin, Jurgen Watermann, Matthew Angling, Timo Asikainen, Alan Aylward, Henrike Barkmann, Peter Bergqvist, Andrew Bushell, Fabien Darrouzet, Dimitris Dialetis, Carl-Fredrik Enell, Daniel Heynderickx, Norbert Jakowski, Magnar Johnsen, Jean Lilensten, Ian McCrea, Kalevi Mursula, Bogdan Nicula, Michael Pezzopane, Viviane Pierrard, Bodo Reinisch, Bernd Ritschel, Luca Spogli, Iwona Stanislawska, Claudia Stolle, Eija Tanskanen, Ioanna Tsagouri, Esa Turunen, Thomas Ulich, Matthew Wild, Tim Yeoman
Scalable, Secure Analysis of Social Sciences Data on the Azure Platform
DOE Office of Scientific and Technical Information (OSTI.GOV)
Simmhan, Yogesh; Deng, Litao; Kumbhare, Alok
2012-05-07
Human activity and interaction data is beginning to be collected at population scales through the pervasiveness of social media and the willingness of people to volunteer information. This can allow social science researchers to understand and model human behavior with better accuracy and predictive power. Political and social scientists are starting to correlate such large-scale social media datasets with events that impact society, as evidence abounds of the virtual and physical public spaces intersecting and influencing each other [1,2]. Managers of cyber-physical systems such as smart power grid utilities are investigating the impact of consumer behavior on power consumption, and the possibility of influencing the usage profile [3]. Data collection is also made easier through technology such as mobile apps, social media sites and search engines that directly collect data, and sensors such as smart meters and room occupancy sensors that indirectly measure human activity. These technology platforms also provide a convenient framework for "human sensors" to record and broadcast data for behavioral studies, as a form of crowd-sourced citizen science. This has the added advantage of engaging the broader public in STEM activities and helping to influence public policy.
Copes, Lynn E.; Lucas, Lynn M.; Thostenson, James O.; Hoekstra, Hopi E.; Boyer, Doug M.
2016-01-01
A dataset of high-resolution microCT scans of primate skulls (crania and mandibles) and certain postcranial elements was collected to address questions about primate skull morphology. The sample consists of 489 scans taken from 431 specimens, representing 59 species from most primate families. These data have transformative reuse potential, as such datasets are necessary for conducting high-powered research into primate evolution but require significant time and funding to collect; similar datasets were previously available only to select research groups across the world. The physical specimens are vouchered at Harvard's Museum of Comparative Zoology. The data collection took place at the Center for Nanoscale Systems at Harvard, and the dataset is archived on MorphoSource.org. Though this is the largest high-fidelity comparative dataset yet available, its provisioning on a web archive that allows unlimited researcher contributions promises a future with vastly increased digital collections available at researchers' fingertips. PMID:26836025
SU-E-J-161: Inverse Problems for Optical Parameters in Laser Induced Thermal Therapy
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fahrenholtz, SJ; Stafford, RJ; Fuentes, DT
Purpose: Magnetic resonance-guided laser-induced thermal therapy (MRgLITT) is being investigated as a neurosurgical intervention for oncological applications throughout the body in active post-market studies. Real-time MR temperature imaging is used to monitor ablative thermal delivery in the clinic. Additionally, brain MRgLITT could improve through effective planning of laser fiber placement. Mathematical bioheat models have been extensively investigated but require reliable patient-specific physical parameter data, e.g. optical parameters. This abstract applies an inverse problem algorithm to characterize optical parameter data obtained from previous MRgLITT interventions. Methods: The implemented inverse problem has three primary components: a parameter-space search algorithm, a physics model, and training data. First, the parameter-space search algorithm uses a gradient-based quasi-Newton method to optimize the effective optical attenuation coefficient, μ_eff. A parameter reduction reduces the amount of optical parameter space the algorithm must search. Second, the physics model is a simplified bioheat model for homogeneous tissue in which closed-form Green's functions represent the exact solution. Third, the training data were temperature imaging data from 23 MRgLITT oncological brain ablations (980 nm wavelength) from seven different patients. Results: To three significant figures, the descriptive statistics for μ_eff were: mean 1470 m⁻¹, median 1360 m⁻¹, standard deviation 369 m⁻¹, minimum 933 m⁻¹ and maximum 2260 m⁻¹. The standard deviation normalized by the mean was 25.0%. The inverse problem took <30 minutes to optimize all 23 datasets. Conclusion: As expected, the inferred average is biased by the underlying physics model. However, the standard deviation normalized by the mean is smaller than literature values and indicates an increased precision in the characterization of the optical parameters needed to plan MRgLITT procedures. This investigation demonstrates the potential for the optimization and validation of more sophisticated bioheat models that incorporate the uncertainty of the data into the predictions, e.g. stochastic finite element methods.
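The inverse problem has a compact skeleton: minimize the misfit between measured temperature rise and a closed-form kernel with a quasi-Newton method. The sketch below uses a generic point-source attenuation kernel T(r) ∝ exp(−μ_eff·r)/r as a stand-in for the paper's Green's-function bioheat model; all numbers and names are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import minimize

# Fit mu_eff by matching a closed-form point-source kernel to measured
# temperature-rise data with a quasi-Newton (BFGS) search.

def model(r, mu_eff, amplitude):
    return amplitude * np.exp(-mu_eff * r) / r

def misfit(params, r, t_meas):
    mu_eff = params[0] * 1000.0          # scale mu_eff for better conditioning
    amplitude = params[1]
    return np.sum((model(r, mu_eff, amplitude) - t_meas) ** 2)

r = np.linspace(0.002, 0.02, 30)                     # radii from fiber [m]
t_meas = model(r, 1400.0, 0.05) + np.random.randn(30) * 1e-4   # synthetic data
res = minimize(misfit, x0=[0.5, 0.01], args=(r, t_meas), method="BFGS")
print("mu_eff ~ %.0f 1/m" % (res.x[0] * 1000.0))     # ~1400 for this synthetic case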
NASA Technical Reports Server (NTRS)
Roberts, W. T.; Kropp, J.; Taylor, W. W. L.
1986-01-01
This paper outlines the currently planned utilization of the Space Station to perform investigations in solar physics, solar terrestrial physics, and plasma physics. The investigations and instrumentation planned for the Solar Terrestrial Observatory (STO) and its associated Space Station accommodation requirements are discussed as well as the planned placement of the STO instruments and typical operational scenarios. In the area of plasma physics, some preliminary plans for scientific investigations and for the accommodation of a plasma physics facility attached to the Space Station are outlined. These preliminary experiment concepts use the space environment around the Space Station as an unconfined plasma laboratory. In solar physics, the initial instrument complement and associated accommodation requirements of the Advanced Solar Observatory are described. The planned evolutionary development of this observatory is outlined, making use of the Space Station capabilities for servicing and instrument reconfiguration.
3D Reconstruction of Space Objects from Multi-Views by a Visible Sensor
Zhang, Haopeng; Wei, Quanmao; Jiang, Zhiguo
2017-01-01
In this paper, a novel 3D reconstruction framework is proposed to recover the 3D structural model of a space object from multi-view images captured by a visible sensor. Given an image sequence, this framework first estimates the relative camera poses and recovers the depths of surface points by the structure from motion (SFM) method, and the patch-based multi-view stereo (PMVS) algorithm is then utilized to generate a dense 3D point cloud. To resolve the wrong matches arising from the symmetric structure and repeated textures of space objects, a new strategy is introduced in which images are added to SFM in imaging order. Meanwhile, a refining process exploiting the structural prior knowledge that most sub-components of artificial space objects are composed of basic geometric shapes is proposed and applied to the recovered point cloud. The proposed reconstruction framework is tested on both simulated and real image datasets. Experimental results illustrate that the recovered point cloud models of space objects are accurate and have complete coverage of the surface. Moreover, outliers and points with severe noise are effectively filtered out by the refinement, resulting in a distinct improvement of the structure and visualization of the recovered points. PMID:28737675
Salehizadeh, Seyed M. A.; Dao, Duy; Bolkhovsky, Jeffrey; Cho, Chae; Mendelson, Yitzhak; Chon, Ki H.
2015-01-01
Accurate estimation of heart rates from photoplethysmogram (PPG) signals during intense physical activity is a very challenging problem. This is because strenuous and high intensity exercise can result in severe motion artifacts in PPG signals, making accurate heart rate (HR) estimation difficult. In this study we investigated a novel technique to accurately reconstruct motion-corrupted PPG signals and HR based on time-varying spectral analysis. The algorithm is called Spectral filter algorithm for Motion Artifacts and heart rate reconstruction (SpaMA). The idea is to calculate the power spectral density of both PPG and accelerometer signals for each time shift of a windowed data segment. By comparing time-varying spectra of PPG and accelerometer data, those frequency peaks resulting from motion artifacts can be distinguished from the PPG spectrum. The SpaMA approach was applied to three different datasets and four types of activities: (1) training datasets from the 2015 IEEE Signal Process. Cup Database recorded from 12 subjects while performing treadmill exercise from 1 km/h to 15 km/h; (2) test datasets from the 2015 IEEE Signal Process. Cup Database recorded from 11 subjects while performing forearm and upper arm exercise. (3) Chon Lab dataset including 10 min recordings from 10 subjects during treadmill exercise. The ECG signals from all three datasets provided the reference HRs which were used to determine the accuracy of our SpaMA algorithm. The performance of the SpaMA approach was calculated by computing the mean absolute error between the estimated HR from the PPG and the reference HR from the ECG. The average estimation errors using our method on the first, second and third datasets are 0.89, 1.93 and 1.38 beats/min respectively, while the overall error on all 33 subjects is 1.86 beats/min and the performance on only treadmill experiment datasets (22 subjects) is 1.11 beats/min. Moreover, it was found that dynamics of heart rate variability can be accurately captured using the algorithm where the mean Pearson’s correlation coefficient between the power spectral densities of the reference and the reconstructed heart rate time series was found to be 0.98. These results show that the SpaMA method has a potential for PPG-based HR monitoring in wearable devices for fitness tracking and health monitoring during intense physical activities. PMID:26703618
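The core of the SpaMA idea for a single window can be sketched as follows; sliding-window tracking and history-based smoothing are omitted, and the parameter values are assumed. Compute PPG and accelerometer power spectra, mask PPG frequencies that coincide with the dominant motion peak, and read the heart rate off the strongest remaining PPG peak:

```python
import numpy as np
from scipy.signal import welch

# Single-window sketch: compare PPG and accelerometer spectra and pick
# the strongest PPG peak that does not coincide with a motion peak.

def hr_estimate(ppg, acc, fs, guard_hz=0.1):
    f, p_ppg = welch(ppg, fs, nperseg=len(ppg))
    _, p_acc = welch(acc, fs, nperseg=len(acc))
    band = (f >= 0.7) & (f <= 3.5)                 # 42-210 beats/min
    acc_peak = f[band][np.argmax(p_acc[band])]
    ok = band & (np.abs(f - acc_peak) > guard_hz)  # mask the motion frequency
    return 60.0 * f[ok][np.argmax(p_ppg[ok])]      # beats/min

fs = 125
t = np.arange(0, 8, 1 / fs)
ppg = np.sin(2 * np.pi * 1.5 * t) + 0.8 * np.sin(2 * np.pi * 2.4 * t)  # HR + motion
acc = np.sin(2 * np.pi * 2.4 * t)                                       # motion only
print(hr_estimate(ppg, acc, fs))                   # ~90 beats/min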
Schure, Mark R; Davis, Joe M
2017-11-10
Orthogonality metrics (OMs) for three and higher dimensional separations are proposed as extensions of previously developed OMs, which were used to evaluate the zone utilization of two-dimensional (2D) separations. These OMs include correlation coefficients, dimensionality, information theory metrics and convex-hull metrics. In a number of these cases, lower dimensional subspace metrics exist and can be readily calculated. The metrics are used to interpret previously generated experimental data. The experimental datasets are derived from Gilar's peptide data, now modified to be three dimensional (3D), and a comprehensive 3D chromatogram from Moore and Jorgenson. The Moore and Jorgenson chromatogram, which has 25 identifiable 3D volume elements or peaks, displayed good orthogonality values over all dimensions. However, OMs based on discretization of the 3D space changed substantially with changes in binning parameters. This example highlights the importance in higher dimensions of having an abundant number of retention times as data points, especially for methods that use discretization. The Gilar data, which in a previous study produced 21 2D datasets by the pairing of 7 one-dimensional separations, was reinterpreted to produce 35 3D datasets. These datasets show a number of interesting properties, one of which is that geometric and harmonic means of lower dimensional subspace (i.e., 2D) OMs correlate well with the higher dimensional (i.e., 3D) OMs. The space utilization of the Gilar 3D datasets was ranked using OMs, with the retention times of the datasets having the largest and smallest OMs presented as graphs. A discussion concerning the orthogonality of higher dimensional techniques is given with emphasis on molecular diversity in chromatographic separations. In the information theory work, an inconsistency is found in previous studies of orthogonality using the 2D metric often identified as %O. A new choice of metric is proposed, extended to higher dimensions, characterized by mixes of ordered and random retention times, and applied to the experimental datasets. In 2D, the new metric always equals or exceeds the original one. However, results from both the original and new methods are given.
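One member of the convex-hull family of OMs extends naturally to 3D and is easy to sketch: normalize each retention-time axis to [0, 1] and take the hull volume of the peak coordinates as the fraction of the separation space used. The data below are illustrative, and this is only one of the several metrics the paper compares:

```python
import numpy as np
from scipy.spatial import ConvexHull

# Convex-hull orthogonality: fraction of the normalized unit cube
# occupied by the hull of the peak retention-time coordinates.

def hull_fraction(retention_times):
    x = np.asarray(retention_times, dtype=float)
    x = (x - x.min(axis=0)) / np.ptp(x, axis=0)    # normalize each dimension
    return ConvexHull(x).volume                     # volume of unit cube used

rng = np.random.default_rng(2)
spread = rng.random((25, 3))                        # well-spread peaks
v = rng.random(25)                                  # strongly correlated peaks
correlated = np.column_stack([v, v + 0.05 * rng.random(25),
                              v + 0.05 * rng.random(25)])
print(hull_fraction(spread), hull_fraction(correlated))  # large vs. small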
Laboratory space physics: Investigating the physics of space plasmas in the laboratory
NASA Astrophysics Data System (ADS)
Howes, Gregory G.
2018-05-01
Laboratory experiments provide a valuable complement to explore the fundamental physics of space plasmas without the limitations inherent to spacecraft measurements. Specifically, experiments overcome the restriction that spacecraft measurements are made at only one (or a few) points in space, enable greater control of the plasma conditions and applied perturbations, can be reproducible, and are orders of magnitude less expensive than launching spacecraft. Here, I highlight key open questions about the physics of space plasmas and identify the aspects of these problems that can potentially be tackled in laboratory experiments. Several past successes in laboratory space physics provide concrete examples of how complementary experiments can contribute to our understanding of physical processes at play in the solar corona, solar wind, planetary magnetospheres, and the outer boundary of the heliosphere. I present developments on the horizon of laboratory space physics, identifying velocity space as a key new frontier, highlighting new and enhanced experimental facilities, and showcasing anticipated developments to produce improved diagnostics and innovative analysis methods. A strategy for future laboratory space physics investigations will be outlined, with explicit connections to specific fundamental plasma phenomena of interest.
JPL/USC GAIM: Using COSMIC Occultations in a Real-Time Global Ionospheric Data Assimilation Model
NASA Astrophysics Data System (ADS)
Mandrake, L.; Komjathy, A.; Wilson, B. D.; Pi, X.; Hajj, G.; Iijima, B.; Wang, C.
2006-12-01
We are in the midst of a revolution in ionospheric remote sensing driven by the illuminating powers of ground and space-based GPS receivers, new UV remote sensing satellites, and the advent of data assimilation techniques for space weather. In particular, the COSMIC 6-satellite constellation launched in April 2006. COSMIC will provide unprecedented global coverage of GPS occultations (~5000 per day), each of which yields electron density information with unprecedented ~1 km vertical resolution. Calibrated measurements of ionospheric delay (total electron content or TEC) suitable for input into assimilation models will be available in near real-time (NRT) from the COSMIC project with a latency of 30 to 120 minutes. Similarly, NRT TEC data are available from two worldwide NRT networks of ground GPS receivers (~75 5-minute sites and ~125 more hourly sites, operated by JPL and others). The combined NRT ground and space-based GPS datasets provide a new opportunity to more accurately specify the 3-dimensional ionospheric density with a time lag of only 15 to 120 minutes. With the addition of the vertically-resolved NRT occultation data, the retrieved profile shapes will model the hour-to-hour ionospheric "weather" much more accurately. The University of Southern California (USC) and the Jet Propulsion Laboratory (JPL) have jointly developed a real-time Global Assimilative Ionospheric Model (GAIM) to monitor space weather, study storm effects, and provide ionospheric calibration for DoD customers and NASA flight projects. JPL/USC GAIM is a physics- based 3D data assimilation model that uses both 4DVAR and Kalman filter techniques to solve for the ion & electron density state and key drivers such as equatorial electrodynamics, neutral winds, and production terms. Daily (delayed) GAIM runs can accept as input ground GPS TEC data from 1000+ sites, occultation links from CHAMP, SAC-C, and the COSMIC constellation, UV limb and nadir scans from the TIMED and DMSP satellites, and in situ data from a variety of satellites (DMSP and C/NOFS). RTGAIM ingests multiple data sources in real time, updates the 3D electron density grid every 5 minutes, and solves for improved drivers every 1-2 hours. Since our forward physics model and the adjoint model were expressly designed for data assimilation and computational efficiency, all of this can be accomplished on a single dual-processor Unix workstation. Customers are currently evaluating the accuracy of JPL/USC GAIM "nowcasts" for ray tracing applications and trans-ionospheric path delay calibration. In the talk, we will discuss the expected impact of COSMIC occultation data; show first results for ingest of COSMIC data using the GAIM Kalman filter; present validation of the GAIM electron density grid by comparisons to Abel profiles and independent datasets; describe recent improvements to the JPL/USC GAIM model; and describe our plans for NRT ingest of COSMIC data into RTGAIM.
Book Review: Delores Knipp’s Understanding Space Weather and the Physics Behind It
NASA Astrophysics Data System (ADS)
Moldwin, Mark
2012-08-01
Delores Knipp's textbook Understanding Space Weather and the Physics Behind It provides a comprehensive resource for space physicists teaching in a variety of academic departments to introduce space weather to advanced undergraduates. The book benefits from Knipp's extensive experience teaching introductory and advanced undergraduate physics courses at the U.S. Air Force Academy. The fundamental physics concepts are clearly explained and are connected directly to the space physics concepts being discussed. To expand upon the relevant basic physics, current research areas and new observations are highlighted, with many of the chapters including contributions from a number of leading space physicists.
Platnick, Steven; Meyer, Kerry G.; King, Michael D.; Wind, Galina; Amarasinghe, Nandana; Marchant, Benjamin; Arnold, G. Thomas; Zhang, Zhibo; Hubanks, Paul A.; Holz, Robert E.; Yang, Ping; Ridgway, William L.; Riedi, Jérôme
2018-01-01
The MODIS Level-2 cloud product (Earth Science Data Set names MOD06 and MYD06 for Terra and Aqua MODIS, respectively) provides pixel-level retrievals of cloud-top properties (day and night pressure, temperature, and height) and cloud optical properties (optical thickness, effective particle radius, and water path for both liquid water and ice cloud thermodynamic phases–daytime only). Collection 6 (C6) reprocessing of the product was completed in May 2014 and March 2015 for MODIS Aqua and Terra, respectively. Here we provide an overview of major C6 optical property algorithm changes relative to the previous Collection 5 (C5) product. Notable C6 optical and microphysical algorithm changes include: (i) new ice cloud optical property models and a more extensive cloud radiative transfer code lookup table (LUT) approach, (ii) improvement in the skill of the shortwave-derived cloud thermodynamic phase, (iii) separate cloud effective radius retrieval datasets for each spectral combination used in previous collections, (iv) separate retrievals for partly cloudy pixels and those associated with cloud edges, (v) failure metrics that provide diagnostic information for pixels having observations that fall outside the LUT solution space, and (vi) enhanced pixel-level retrieval uncertainty calculations. The C6 algorithm changes collectively can result in significant changes relative to C5, though the magnitude depends on the dataset and the pixel’s retrieval location in the cloud parameter space. Example Level-2 granule and Level-3 gridded dataset differences between the two collections are shown. While the emphasis is on the suite of cloud optical property datasets, other MODIS cloud datasets are discussed when relevant. PMID:29657349
NASA Astrophysics Data System (ADS)
Dungan, J. L.; Wang, W.; Hashimoto, H.; Michaelis, A.; Milesi, C.; Ichii, K.; Nemani, R. R.
2009-12-01
In support of NACP, we are conducting an ensemble modeling exercise using the Terrestrial Observation and Prediction System (TOPS) to evaluate uncertainties among ecosystem models, satellite datasets, and in-situ measurements. The models used in the experiment include public-domain versions of Biome-BGC, LPJ, TOPS-BGC, and CASA, driven by a consistent set of climate fields for North America at 8 km resolution and daily/monthly time steps over the period 1982-2006. The reference datasets include MODIS Gross Primary Production (GPP) and Net Primary Production (NPP) products, Fluxnet measurements, and other observational data. The simulation results and the reference datasets are consistently processed and systematically compared in the climate (temperature-precipitation) space; in particular, an alternative to the Taylor diagram is developed to facilitate model-data intercomparisons in multi-dimensional space. The key findings of this study indicate that: the simulated GPP/NPP fluxes are in general agreement with observations over forests, but are biased low (underestimated) over non-forest types; large uncertainties in biomass and soil carbon stocks are found among the models (and reference datasets), often induced by seemingly “small” differences in model parameters and implementation details; the simulated Net Ecosystem Production (NEP) mainly responds to non-respiratory disturbances (e.g. fire) in the models and is therefore difficult to compare with flux data; and the seasonality and interannual variability of NEP vary significantly among models and reference datasets. These findings highlight the problem inherent in relying on only one modeling approach to map surface carbon fluxes and emphasize the pressing need for expanded and enhanced monitoring systems to narrow critical structural and parametric uncertainties among ecosystem models.
Version 2 of the IASI NH3 neural network retrieval algorithm: near-real-time and reanalysed datasets
NASA Astrophysics Data System (ADS)
Van Damme, Martin; Whitburn, Simon; Clarisse, Lieven; Clerbaux, Cathy; Hurtmans, Daniel; Coheur, Pierre-François
2017-12-01
Recently, Whitburn et al. (2016) presented a neural-network-based algorithm for retrieving atmospheric ammonia (NH3) columns from Infrared Atmospheric Sounding Interferometer (IASI) satellite observations. In the past year, several improvements have been introduced, and the resulting new baseline version, Artificial Neural Network for IASI (ANNI)-NH3-v2.1, is documented here. One of the main changes to the algorithm is that separate neural networks were trained for land and sea observations, resulting in a better training performance for both groups. By reducing and transforming the input parameter space, performance is now also better for observations associated with favourable sounding conditions (i.e. enhanced thermal contrasts). Other changes relate to the introduction of a bias correction over land and sea and the treatment of the satellite zenith angle. In addition to these algorithmic changes, new recommendations for post-filtering the data and for averaging data in time or space are formulated. We also introduce a second dataset (ANNI-NH3-v2.1R-I) which relies on ERA-Interim ECMWF meteorological input data, along with surface temperature retrieved from a dedicated network, rather than the operationally provided Eumetsat IASI Level 2 (L2) data used for the standard near-real-time version. The need for such a dataset emerged after a series of sharp discontinuities were identified in the NH3 time series, which could be traced back to incremental changes in the IASI L2 algorithms for temperature and clouds. The reanalysed dataset is coherent in time and can therefore be used to study trends. Furthermore, both datasets agree reasonably well in the mean on recent data, after the date when the IASI meteorological L2 version 6 became operational (30 September 2014).
Charging of Space Debris and Their Dynamical Consequences
2016-01-08
Subject terms: space plasma physics; space debris. Only fragments of the report abstract are recoverable: the work "...opens up potential new areas of fundamental and applied research in the field of plasmas and space physics...", and a paper ("...object in a plasma") has been accepted for publication in Physics of Plasmas (attached as Annexure III).
Physical environment virtualization for human activities recognition
NASA Astrophysics Data System (ADS)
Poshtkar, Azin; Elangovan, Vinayak; Shirkhodaie, Amir; Chan, Alex; Hu, Shuowen
2015-05-01
Human activity recognition research relies heavily on extensive datasets to verify and validate the performance of activity recognition algorithms. However, obtaining real datasets is expensive and highly time consuming. A physics-based virtual simulation can accelerate the development of context-based human activity recognition algorithms and techniques by generating relevant training and testing videos simulating diverse operational scenarios. In this paper, we discuss in detail the requisite capabilities of a virtual environment to serve as a test bed for evaluating and enhancing activity recognition algorithms. To demonstrate the numerous advantages of virtual environment development, a newly developed virtual environment simulation modeling (VESM) environment is presented here to generate calibrated multisource imagery datasets suitable for the development and testing of recognition algorithms for context-based human activities. The VESM environment serves as a versatile test bed to generate a vast amount of realistic data for training and testing of sensor processing algorithms. To demonstrate the effectiveness of the VESM environment, we present various simulated scenarios and processed results to infer proper semantic annotations from the high-fidelity imagery data for human-vehicle activity recognition under different operational contexts.
NASA Technical Reports Server (NTRS)
Beheshti, Afshin
2018-01-01
GeneLab as a general tool for the scientific community; utilizing GeneLab datasets to generate hypotheses and determine potential biological targets against health risks due to long-term space missions; and how OpenTarget can be used to discover novel drugs to test as countermeasures for use by astronauts.
Large-Scale, Parallel, Multi-Sensor Data Fusion in the Cloud
NASA Astrophysics Data System (ADS)
Wilson, B. D.; Manipon, G.; Hua, H.
2012-12-01
NASA's Earth Observing System (EOS) is an ambitious facility for studying global climate change. The mandate now is to combine measurements from the instruments on the "A-Train" platforms (AIRS, AMSR-E, MODIS, MISR, MLS, and CloudSat) and other Earth probes to enable large-scale studies of climate change over periods of years to decades. However, moving from predominantly single-instrument studies to a multi-sensor, measurement-based model for long-duration analysis of important climate variables presents serious challenges for large-scale data mining and data fusion. For example, one might want to compare temperature and water vapor retrievals from one instrument (AIRS) to another instrument (MODIS), and to a model (ECMWF), stratify the comparisons using a classification of the "cloud scenes" from CloudSat, and repeat the entire analysis over years of AIRS data. To perform such an analysis, one must discover & access multiple datasets from remote sites, find the space/time "matchups" between instrument swaths and model grids, understand the quality flags and uncertainties for retrieved physical variables, assemble merged datasets, and compute fused products for further scientific and statistical analysis. To efficiently assemble such decade-scale datasets in a timely manner, we are utilizing Elastic Computing in the Cloud and parallel map/reduce-based algorithms. "SciReduce" is a Hadoop-like parallel analysis system, programmed in parallel python, that is designed from the ground up for Earth science. SciReduce executes inside VMWare images and scales to any number of nodes in the Cloud. Unlike Hadoop, in which simple tuples (keys & values) are passed between the map and reduce functions, SciReduce operates on bundles of named numeric arrays, which can be passed in memory or serialized to disk in netCDF4 or HDF5. Thus, SciReduce uses the native datatypes (geolocated grids, swaths, and points) that geo-scientists are familiar with. We are deploying within SciReduce a versatile set of python operators for data lookup, access, subsetting, co-registration, mining, fusion, and statistical analysis. All operators take in sets of geo-located arrays and generate more arrays. Large, multi-year satellite and model datasets are automatically "sharded" by time and space across a cluster of nodes so that years of data (millions of granules) can be compared or fused in a massively parallel way. Input variables (arrays) are pulled on-demand into the Cloud using OPeNDAP or webification URLs, thereby minimizing the size of the stored input and intermediate datasets. A typical map function might assemble and quality control AIRS Level-2 water vapor profiles for a year of data in parallel, then a reduce function would average the profiles in lat/lon bins (again, in parallel), and a final reduce would aggregate the climatology and write it to output files. We are using SciReduce to automate the production of multiple versions of a multi-year water vapor climatology (AIRS & MODIS), stratified by CloudSat cloud classification, and compare it to models (ECMWF & MERRA reanalysis). We will present the architecture of SciReduce, describe the achieved "clock time" speedups in fusing huge datasets on our own nodes and in the Amazon Cloud, and discuss the Cloud cost tradeoffs for storage, compute, and data transfer.
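SciReduce itself is not publicly documented here, so the following is a minimal stand-in for the described map/reduce pattern in plain Python/numpy: a map step that quality-filters one granule of water vapor retrievals into partial lat/lon-bin sums, and a reduce step that aggregates the partial sums into a mean climatology. The granule reader is a synthetic stub standing in for a real netCDF4/HDF5 reader.

    import numpy as np
    from multiprocessing import Pool

    NLAT, NLON = 180, 360  # 1-degree climatology grid

    def read_granule(path):
        # Synthetic stand-in for a real granule reader (netCDF4/HDF5 in practice).
        rng = np.random.default_rng(abs(hash(path)) % 2**32)
        lat = rng.uniform(-90, 90, 1000)
        lon = rng.uniform(-180, 180, 1000)
        wv = rng.uniform(0, 60, 1000)      # water vapor values
        qc = rng.integers(0, 2, 1000)      # quality flags (0 = good)
        return lat, lon, wv, qc

    def map_granule(path):
        """Map step: read one granule, quality-filter, return partial (sum, count) grids."""
        lat, lon, wv, qc = read_granule(path)
        good = qc == 0
        s = np.zeros((NLAT, NLON))
        n = np.zeros((NLAT, NLON))
        i = np.clip((lat[good] + 90).astype(int), 0, NLAT - 1)
        j = np.clip((lon[good] + 180).astype(int), 0, NLON - 1)
        np.add.at(s, (i, j), wv[good])
        np.add.at(n, (i, j), 1)
        return s, n

    def reduce_grids(partials):
        """Reduce step: aggregate partial sums into a mean climatology grid."""
        s = sum(p[0] for p in partials)
        n = sum(p[1] for p in partials)
        return np.where(n > 0, s / np.maximum(n, 1), np.nan)

    if __name__ == "__main__":
        granule_paths = [f"granule_{k}" for k in range(12)]  # hypothetical inputs
        with Pool() as pool:
            partials = pool.map(map_granule, granule_paths)  # parallel map
        climatology = reduce_grids(partials)                 # final reduce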
Large-Scale, Parallel, Multi-Sensor Data Fusion in the Cloud
NASA Astrophysics Data System (ADS)
Wilson, B.; Manipon, G.; Hua, H.
2012-04-01
NASA's Earth Observing System (EOS) is an ambitious facility for studying global climate change. The mandate now is to combine measurements from the instruments on the "A-Train" platforms (AIRS, AMSR-E, MODIS, MISR, MLS, and CloudSat) and other Earth probes to enable large-scale studies of climate change over periods of years to decades. However, moving from predominantly single-instrument studies to a multi-sensor, measurement-based model for long-duration analysis of important climate variables presents serious challenges for large-scale data mining and data fusion. For example, one might want to compare temperature and water vapor retrievals from one instrument (AIRS) to another instrument (MODIS), and to a model (ECMWF), stratify the comparisons using a classification of the "cloud scenes" from CloudSat, and repeat the entire analysis over years of AIRS data. To perform such an analysis, one must discover & access multiple datasets from remote sites, find the space/time "matchups" between instrument swaths and model grids, understand the quality flags and uncertainties for retrieved physical variables, assemble merged datasets, and compute fused products for further scientific and statistical analysis. To efficiently assemble such decade-scale datasets in a timely manner, we are utilizing Elastic Computing in the Cloud and parallel map/reduce-based algorithms. "SciReduce" is a Hadoop-like parallel analysis system, programmed in parallel python, that is designed from the ground up for Earth science. SciReduce executes inside VMWare images and scales to any number of nodes in the Cloud. Unlike Hadoop, in which simple tuples (keys & values) are passed between the map and reduce functions, SciReduce operates on bundles of named numeric arrays, which can be passed in memory or serialized to disk in netCDF4 or HDF5. Thus, SciReduce uses the native datatypes (geolocated grids, swaths, and points) that geo-scientists are familiar with. We are deploying within SciReduce a versatile set of python operators for data lookup, access, subsetting, co-registration, mining, fusion, and statistical analysis. All operators take in sets of geo-arrays and generate more arrays. Large, multi-year satellite and model datasets are automatically "sharded" by time and space across a cluster of nodes so that years of data (millions of granules) can be compared or fused in a massively parallel way. Input variables (arrays) are pulled on-demand into the Cloud using OPeNDAP or webification URLs, thereby minimizing the size of the stored input and intermediate datasets. A typical map function might assemble and quality control AIRS Level-2 water vapor profiles for a year of data in parallel, then a reduce function would average the profiles in bins (again, in parallel), and a final reduce would aggregate the climatology and write it to output files. We are using SciReduce to automate the production of multiple versions of a multi-year water vapor climatology (AIRS & MODIS), stratified by CloudSat cloud classification, and compare it to models (ECMWF & MERRA reanalysis). We will present the architecture of SciReduce, describe the achieved "clock time" speedups in fusing huge datasets on our own nodes and in the Amazon Cloud, and discuss the Cloud cost tradeoffs for storage, compute, and data transfer.
Hand, foot and mouth disease: spatiotemporal transmission and climate.
Wang, Jin-feng; Guo, Yan-Sha; Christakos, George; Yang, Wei-Zhong; Liao, Yi-Lan; Li, Zhong-Jie; Li, Xiao-Zhou; Lai, Sheng-Jie; Chen, Hong-Yan
2011-04-05
The Hand-Foot-Mouth Disease (HFMD) is the most common infectious disease in China, with a total incidence of around 500,000~1,000,000 cases per year. The composite space-time disease variation is the result of underlying attribute mechanisms that could provide clues about the physiologic and demographic determinants of disease transmission and also guide the appropriate allocation of medical resources to control the disease. HFMD cases were aggregated into 1456 counties over a period of 11 months. Climate attributes suspected of influencing HFMD were recorded monthly at 674 stations throughout the country and subsequently interpolated within 1456 × 11 cells across space-time (the same as the number of HFMD cases) using the Bayesian Maximum Entropy (BME) method, while taking the relevant uncertainty sources into consideration. The dimensionalities of the two datasets, together with the integrated dataset combining the two, are very high when the topologies of the space-time relationships between cells are taken into account. Using a self-organizing map (SOM) algorithm, the dataset dimensionality was effectively reduced to 2 dimensions while the spatiotemporal attribute structure was maintained. Sixteen types of spatiotemporal HFMD transmission were identified, and 3-4 high spatial incidence clusters of the HFMD types were found throughout China, which are basically within the scope of the monthly climate (precipitation) types. HFMD propagates in a composite space-time domain rather than showing a purely spatial or purely temporal variation. There is a clear relationship between HFMD occurrence and climate. HFMD cases are geographically clustered and closely linked to the monthly precipitation types of the region. The occurrence of the former depends on the latter.
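As a hedged illustration of the SOM step only (not the authors' code), the sketch below uses the third-party minisom package on a synthetic 1456 x 11 space-time incidence matrix and a 4 x 4 node grid, so that each county is assigned one of 16 candidate transmission types.

    import numpy as np
    from minisom import MiniSom  # third-party package, assumed installed

    rng = np.random.default_rng(0)
    X = rng.poisson(5.0, size=(1456, 11)).astype(float)   # counties x months (synthetic)
    X = (X - X.mean(axis=0)) / X.std(axis=0)              # standardize features

    som = MiniSom(4, 4, X.shape[1], sigma=1.0, learning_rate=0.5, random_seed=0)
    som.train_random(X, 5000)

    # Each county gets the type (node) of its best-matching unit: 16 types total.
    winners = np.array([som.winner(x) for x in X])        # (row, col) per county
    type_id = winners[:, 0] * 4 + winners[:, 1]           # 0..15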
Optimizing tertiary storage organization and access for spatio-temporal datasets
NASA Technical Reports Server (NTRS)
Chen, Ling Tony; Rotem, Doron; Shoshani, Arie; Drach, Bob; Louis, Steve; Keating, Meridith
1994-01-01
We address in this paper data management techniques for efficiently retrieving requested subsets of large datasets stored on mass storage devices. This problem represents a major bottleneck that can negate the benefits of fast networks, because the time to access a subset from a large dataset stored on a mass storage system is much greater than the time to transmit that subset over a network. This paper focuses on very large spatial and temporal datasets generated by simulation programs in the area of climate modeling, but the techniques developed can be applied to other applications that deal with large multidimensional datasets. The main requirement we have addressed is the efficient access of subsets of information contained within much larger datasets, for the purpose of analysis and interactive visualization. We have developed data partitioning techniques that partition datasets into 'clusters' based on analysis of data access patterns and storage device characteristics. The goal is to minimize the number of clusters read from mass storage systems when subsets are requested. We emphasize in this paper proposed enhancements to current storage server protocols to permit control over the physical placement of data on storage devices. We also discuss in some detail the interface between the application programs and the mass storage system, as well as a workbench to help scientists design the best reorganization of a dataset for anticipated access patterns.
NASA Astrophysics Data System (ADS)
Vandegriff, J. D.; King, T. A.; Weigel, R. S.; Faden, J.; Roberts, D. A.; Harris, B. T.; Lal, N.; Boardsen, S. A.; Candey, R. M.; Lindholm, D. M.
2017-12-01
We present the Heliophysics Application Programmers Interface (HAPI), a new interface specification that both large and small data centers can use to expose time series data holdings in a standard way. HAPI was inspired by the similarity of existing services at many Heliophysics data centers, and these data centers have collaborated to define a single interface that captures best practices and represents what everyone considers the essential, lowest common denominator for basic data access. This low level access can serve as infrastructure to support greatly enhanced interoperability among analysis tools, with the goal being simplified analysis and comparison of data from any instrument, model, mission or data center. The three main services a HAPI server must perform are 1. list a catalog of datasets (one unique ID per dataset), 2. describe the content of one dataset (JSON metadata), and 3. retrieve numerical content for one dataset (stream the actual data). HAPI defines both the format of the query to the server, and the response from the server. The metadata is lightweight, focusing on use rather than discovery, and the data format is a streaming one, with Comma Separated Values (CSV) being required and binary or JSON streaming being optional. The HAPI specification is available at GitHub, where projects are also underway to develop reference implementation servers that data providers can adapt and use at their own sites. Also in the works are data analysis clients in multiple languages (IDL, Python, Matlab, and Java). Institutions that have agreed to adopt HAPI include Goddard (CDAWeb for data and CCMC for models), LASP at the University of Colorado Boulder, the Particles and Plasma Interactions node of the Planetary Data System (PPI/PDS) at UCLA, the Plasma Wave Group at the University of Iowa, the Space Sector at the Johns Hopkins Applied Physics Lab (APL), and the tsds.org site maintained at George Mason University. Over the next year, the adoption of a uniform way to access time series data is expected to significantly enhance interoperability within the Heliophysics data environment. https://github.com/hapi-server/data-specification
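As an illustration, the three services map onto three HTTP endpoints. The sketch below (Python, requests) is a minimal client assuming a HAPI 2.x-style server; the server URL is an example, and the parameter names (id, time.min, time.max) should be checked against the specification at the GitHub link above.

    import requests

    SERVER = "https://cdaweb.gsfc.nasa.gov/hapi"  # assumed example endpoint

    # 1. list the catalog of datasets
    catalog = requests.get(f"{SERVER}/catalog").json()
    dataset_id = catalog["catalog"][0]["id"]

    # 2. describe one dataset (JSON metadata)
    info = requests.get(f"{SERVER}/info", params={"id": dataset_id}).json()

    # 3. retrieve numerical content as a CSV stream
    resp = requests.get(f"{SERVER}/data", params={
        "id": dataset_id,
        "time.min": "2017-01-01T00:00:00Z",
        "time.max": "2017-01-02T00:00:00Z",
    })
    for line in resp.text.splitlines()[:5]:
        print(line)  # first records of the CSV stream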
Deiana, Antonio; Giansanti, Andrea
2010-04-21
Natively unfolded proteins lack a well-defined three-dimensional structure but have important biological functions, suggesting a re-assignment of the structure-function paradigm. Assessing that a given protein is natively unfolded requires laborious experimental investigation, so reliable sequence-only methods for predicting whether a sequence corresponds to a folded or an unfolded protein are of interest in fundamental and applied studies. Many proteins have amino acid compositions compatible with both the folded and the unfolded status, and belong to a twilight zone between order and disorder. This makes a dichotomic classification of protein sequences into folded and natively unfolded ones difficult. In this work we propose an operational method to identify proteins belonging to the twilight zone by combining well-performing single predictors of folding into a consensus score. In this methodological paper, five dichotomic folding indexes are considered: hydrophobicity-charge, mean packing, mean pairwise energy, Poodle-W, and a new global index, called here gVSL2, based on the local disorder predictor VSL2. The performance of these indexes is evaluated on different datasets, in particular on a new dataset composed of 2369 folded and 81 natively unfolded proteins. Poodle-W, gVSL2, and mean pairwise energy have good performance and stability on all the datasets considered and are combined into a strictly unanimous combination score SSU, which leaves proteins unclassified when the consensus of all combined indexes is not reached. The unclassified proteins: i) belong to an overlap region in the vector space of amino acid compositions occupied by both folded and unfolded proteins; ii) are composed of approximately the same number of order-promoting and disorder-promoting amino acids; and iii) have a mean flexibility intermediate between that of folded and that of unfolded proteins. Our results show that proteins unclassified by SSU belong to a twilight zone. Proteins left unclassified by the consensus score SSU have physical properties intermediate between those of folded and those of natively unfolded proteins, and their structural properties and evolutionary history are worth investigating.
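A minimal sketch of the strictly unanimous consensus idea follows (Python); the single-index votes are stand-ins, not the actual Poodle-W, gVSL2, or mean pairwise energy implementations.

    def ssu(votes):
        """Strictly unanimous consensus: classify only when all indexes agree."""
        if all(v == "folded" for v in votes):
            return "folded"
        if all(v == "unfolded" for v in votes):
            return "unfolded"
        return "unclassified"  # twilight zone: consensus not reached

    print(ssu(["folded", "folded", "folded"]))    # -> folded
    print(ssu(["folded", "unfolded", "folded"]))  # -> unclassified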
Maitland, Clover; Stratton, Gareth; Foster, Sarah; Braham, Rebecca; Rosenberg, Michael
2014-12-24
Recent changes in home physical environments, such as decreasing outdoor space and increasing electronic media, may negatively affect health by facilitating sedentariness and reducing physical activity. As children spend much of their time at home they are particularly vulnerable. This study qualitatively explored family perceptions of physical environmental influences on sedentary behaviour and physical activity within the home space. Home based interviews were conducted with 28 families with children aged 9-13 years (total n = 74 individuals), living in Perth, Australia. Families were stratified by socioeconomic status and selected to provide variation in housing. Qualitative methods included a family interview, observation and home tour where families guided the researcher through their home, enabling discussion while in the physical home space. Audio recordings were transcribed verbatim and thematically analysed. Emergent themes related to children's sedentariness and physical activity included overall size, space and design of the home; allocation of home space; equipment within the home space; perceived safety of the home space; and the changing nature of the home space. Families reported that children's activity options were limited when houses and yards were small. In larger homes, multiple indoor living rooms usually housed additional sedentary entertainment options, although parents reported that open plan home layouts could facilitate monitoring of children's electronic media use. Most families reported changing the allocation and contents of their home space in response to changing priorities and circumstances. The physical home environment can enhance or limit opportunities for children's sedentary behaviour and physical activity. However, the home space is a dynamic ecological setting that is amenable to change and is largely shaped by the family living within it, thus differentiating it from other settings. While size and space were considered important, how families prioritise the use of their home space and overcome the challenges posed by the physical environment may be of equal or greater importance in establishing supportive home environments. Further research is required to tease out how physical, social and individual factors interact within the family home space to influence children's sedentary behaviour and physical activity at home.
Contention Modeling for Multithreaded Distributed Shared Memory Machines: The Cray XMT
DOE Office of Scientific and Technical Information (OSTI.GOV)
Secchi, Simone; Tumeo, Antonino; Villa, Oreste
Distributed Shared Memory (DSM) machines are a wide class of multi-processor computing systems where a large virtually-shared address space is mapped onto a network of physically distributed memories. High memory latency and network contention are two of the main factors that limit performance scaling of such architectures. Modern high-performance computing DSM systems have evolved toward exploitation of massive hardware multi-threading and fine-grained memory hashing to tolerate irregular latencies, avoid network hot-spots, and enable high scaling. In order to model the performance of such large-scale machines, parallel simulation has proved to be a promising approach to achieve good accuracy in reasonable times. One of the most critical factors in solving the simulation speed-accuracy trade-off is network modeling. The Cray XMT is a massively multi-threaded supercomputing architecture that belongs to the DSM class, since it implements a globally-shared address space abstraction on top of a physically distributed memory substrate. In this paper, we discuss the development of a contention-aware network model intended to be integrated in a full-system XMT simulator. We start by measuring the effects of network contention in a 128-processor XMT machine and then investigate the trade-off that exists between simulation accuracy and speed, by comparing three network models which operate at different levels of accuracy. The comparison and model validation is performed by executing a string-matching algorithm on the full-system simulator and on the XMT, using three datasets that generate noticeably different contention patterns.
Shi, Yingzhong; Chung, Fu-Lai; Wang, Shitong
2015-09-01
Recently, a time-adaptive support vector machine (TA-SVM) was proposed for handling nonstationary datasets. While attractive performance has been reported, and the new classifier is distinctive in simultaneously solving several SVM subclassifiers locally and globally by using an elegant SVM formulation in an alternative kernel space, the coupling of subclassifiers brings in the computation of a matrix inversion, resulting in a high computational burden in large nonstationary dataset applications. To overcome this shortcoming, an improved TA-SVM (ITA-SVM) is proposed using a common vector shared by all the SVM subclassifiers involved. ITA-SVM not only keeps an SVM formulation, but also avoids the computation of matrix inversion. Thus, we can realize its fast version, that is, the improved time-adaptive core vector machine (ITA-CVM), for large nonstationary datasets by using the CVM technique. ITA-CVM has the merit of asymptotically linear time complexity for large nonstationary datasets and inherits the advantages of TA-SVM. The effectiveness of the proposed classifiers ITA-SVM and ITA-CVM is also experimentally confirmed.
NASA Astrophysics Data System (ADS)
McDonald, S. E.; Emmert, J. T.; Krall, J.; Mannucci, A. J.; Vergados, P.
2017-12-01
To understand how and why the distribution of geospace plasma in the ionosphere/plasmasphere is evolving over multi-decadal time scales in response to solar, heliospheric, and atmospheric forcing, it is critically important to have long-term, stable datasets. In this study, we use a newly constructed dataset of GPS-based total electron content (TEC) developed by JPL. The JPL Global Ionosphere Mapping (GIM) algorithm was used to generate a 35-station dataset spanning two solar minimum periods (1993-2014). We also use altimeter-derived TEC measurements from TOPEX-Poseidon and Jason-1 to construct a continuous dataset for the 1995-2014 time period. Both long-term datasets are compared to each other to study inter-minimum changes in the global TEC (during 1995-1996 and 2008-2009). We use the SAMI3 physics-based model of the ionosphere to compare simulations of 1995-2014 with the JPL TEC and TOPEX/Jason-1 datasets. To drive SAMI3, we use the Naval Research Laboratory Solar Spectral Irradiance (NRLSSI) model to specify the EUV irradiances, and NRLMSIS to specify the thermosphere. We adjust the EUV irradiances and thermospheric constituents to match the TEC datasets and draw conclusions regarding sources of the differences between the two solar minimum periods.
Parallel Visualization of Large-Scale Aerodynamics Calculations: A Case Study on the Cray T3E
NASA Technical Reports Server (NTRS)
Ma, Kwan-Liu; Crockett, Thomas W.
1999-01-01
This paper reports the performance of a parallel volume rendering algorithm for visualizing a large-scale, unstructured-grid dataset produced by a three-dimensional aerodynamics simulation. This dataset, containing over 18 million tetrahedra, allows us to extend our performance results to a problem which is more than 30 times larger than the one we examined previously. This high resolution dataset also allows us to see fine, three-dimensional features in the flow field. All our tests were performed on the Silicon Graphics Inc. (SGI)/Cray T3E operated by NASA's Goddard Space Flight Center. Using 511 processors, a rendering rate of almost 9 million tetrahedra/second was achieved with a parallel overhead of 26%.
GODIVA2: interactive visualization of environmental data on the Web.
Blower, J D; Haines, K; Santokhee, A; Liu, C L
2009-03-13
GODIVA2 is a dynamic website that provides visual access to several terabytes of physically distributed, four-dimensional environmental data. It allows users to explore large datasets interactively without the need to install new software or download and understand complex data. Through the use of open international standards, GODIVA2 maintains a high level of interoperability with third-party systems, allowing diverse datasets to be mutually compared. Scientists can use the system to search for features in large datasets and to diagnose the output from numerical simulations and data processing algorithms. Data providers around Europe have adopted GODIVA2 as an INSPIRE-compliant dynamic quick-view system for providing visual access to their data.
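GODIVA2's interoperability rests on open OGC standards; as a hedged illustration, a standard WMS 1.3.0 GetMap request like the sketch below (Python, requests) retrieves a rendered map image from any compliant server. The server URL and layer name here are invented placeholders, not actual GODIVA2 endpoints.

    import requests

    params = {
        "SERVICE": "WMS", "VERSION": "1.3.0", "REQUEST": "GetMap",
        "LAYERS": "OCEAN/sea_surface_temperature",   # hypothetical layer name
        "CRS": "CRS:84", "BBOX": "-180,-90,180,90",
        "WIDTH": 512, "HEIGHT": 256, "FORMAT": "image/png",
        "TIME": "2008-01-01T00:00:00Z",
    }
    resp = requests.get("https://example.org/ncWMS/wms", params=params)
    with open("sst.png", "wb") as f:
        f.write(resp.content)   # rendered map tile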
NASA GeneLab Concept of Operations
NASA Technical Reports Server (NTRS)
Thompson, Terri; Gibbs, Kristina; Rask, Jon; Coughlan, Joseph; Smith, Jeffrey
2014-01-01
NASA's GeneLab aims to greatly increase the number of scientists using data from space biology investigations on board the ISS, emphasizing a systems biology approach to the science. When completed, GeneLab will provide the integrated software and hardware infrastructure, analytical tools, and reference datasets for an assortment of model organisms. GeneLab will also provide an environment for scientists to collaborate, thereby increasing the possibility for data to be reused for future experimentation. To maximize the value of data from life science experiments performed in space and to make the most advantageous use of the remaining ISS research window, GeneLab will apply an open access approach to conducting spaceflight experiments by generating and sharing the datasets derived from these biological studies in space. Onboard the ISS, a wide variety of model organisms will be studied and returned to Earth for analysis. Laboratories on the ground will analyze these samples and provide genomic, transcriptomic, metabolomic, and proteomic data. Upon receipt, NASA will conduct data quality control tasks and format raw data returned from the omics centers into standardized, annotated information sets that can be readily searched and linked to spaceflight metadata. Once prepared, the biological datasets, as well as any completed analyses, will be made public through the GeneLab Space Bioinformatics System web-based portal. These efforts will support a collaborative research environment for spaceflight studies that will closely resemble environments created by the Department of Energy (DOE), the National Center for Biotechnology Information (NCBI), and other institutions in additional areas of study, such as cancer and environmental biology. The results will allow for comparative analyses that will help scientists around the world take a major leap forward in understanding the effects of microgravity, radiation, and other aspects of the space environment on model organisms. These efforts will speed the process of scientific sharing, iteration, and discovery.
Computational Physics for Space Flight Applications
NASA Technical Reports Server (NTRS)
Reed, Robert A.
2004-01-01
This paper presents viewgraphs on computational physics for space flight applications. The topics include: 1) Introduction to space radiation effects in microelectronics; 2) Using applied physics to help NASA meet mission objectives; 3) Example of applied computational physics; and 4) Future directions in applied computational physics.
Interoperable Solar Data and Metadata via LISIRD 3
NASA Astrophysics Data System (ADS)
Wilson, A.; Lindholm, D. M.; Pankratz, C. K.; Snow, M. A.; Woods, T. N.
2015-12-01
LISIRD 3 is a major upgrade of the LASP Interactive Solar Irradiance Data Center (LISIRD), which serves several dozen space based solar irradiance and related data products to the public. Through interactive plots, LISIRD 3 provides data browsing supported by data subsetting and aggregation. Incorporating a semantically enabled metadata repository, LISIRD 3 users see current, vetted, consistent information about the datasets offered. Users can now also search for datasets based on metadata fields such as dataset type and/or spectral or temporal range. This semantic database enables metadata browsing, so users can discover the relationships between datasets, instruments, spacecraft, mission and PI. The database also enables creation and publication of metadata records in a variety of formats, such as SPASE or ISO, making these datasets more discoverable. The database also enables the possibility of a public SPARQL endpoint, making the metadata browsable in an automated fashion. LISIRD 3's data access middleware, LaTiS, provides dynamic, on demand reformatting of data and timestamps, subsetting and aggregation, and other server side functionality via a RESTful OPeNDAP compliant API, enabling interoperability between LASP datasets and many common tools. LISIRD 3's templated front end design, coupled with the uniform data interface offered by LaTiS, allows easy integration of new datasets. Consequently the number and variety of datasets offered by LISIRD has grown to encompass several dozen, with many more to come. This poster will discuss design and implementation of LISIRD 3, including tools used, capabilities enabled, and issues encountered.
Segmentation-less Digital Rock Physics
NASA Astrophysics Data System (ADS)
Tisato, N.; Ikeda, K.; Goldfarb, E. J.; Spikes, K. T.
2017-12-01
In the last decade, Digital Rock Physics (DRP) has become an avenue to investigate physical and mechanical properties of geomaterials. DRP offers the advantage of simulating laboratory experiments on numerical samples that are obtained from analytical methods. Potentially, DRP could save part of the time and resources that are allocated to performing complicated laboratory tests. Like classic laboratory tests, the goal of DRP is to accurately estimate physical properties of rocks, such as hydraulic permeability or elastic moduli. Nevertheless, the physical properties of samples imaged using micro-computed tomography (μCT) are typically estimated through segmentation of the μCT dataset. Segmentation proves to be a challenging and arbitrary procedure that typically leads to inaccurate estimates of physical properties. Here we present a novel technique to extract physical properties from a μCT dataset without the use of segmentation. We show examples in which we use the segmentation-less method to simulate elastic wave propagation and pressure wave diffusion to estimate elastic properties and permeability, respectively. The proposed method takes advantage of effective medium theories and uses the density and the porosity that are measured in the laboratory to constrain the results. We discuss the results and highlight that segmentation-less DRP is more accurate than segmentation-based DRP approaches and theoretical modeling for the studied rock. In conclusion, the segmentation-less approach presented here seems to be a promising method to improve accuracy and to ease the overall workflow of DRP.
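A minimal sketch of the segmentation-less idea, under simplifying assumptions of our own (a linear grayscale-to-porosity calibration rescaled to match the laboratory porosity, and simple grain/fluid density mixing), not the authors' implementation:

    import numpy as np

    def voxel_porosity(ct, lab_porosity):
        """Map grayscale to per-voxel porosity; no binary solid/pore threshold."""
        g = (ct.max() - ct) / (ct.max() - ct.min())     # brighter = denser = less porous
        return np.clip(g * lab_porosity / g.mean(), 0.0, 1.0)  # mean matched before clipping

    def voxel_density(phi, rho_grain=2650.0, rho_fluid=1000.0):
        """Effective-medium mixing of grain and pore-fluid density (kg/m^3)."""
        return (1.0 - phi) * rho_grain + phi * rho_fluid

    ct = np.random.default_rng(0).uniform(0, 255, (64, 64, 64))  # synthetic muCT volume
    phi = voxel_porosity(ct, lab_porosity=0.20)
    rho = voxel_density(phi)   # continuous density field for wave simulations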
Fault Tolerant Frequent Pattern Mining
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shohdy, Sameh; Vishnu, Abhinav; Agrawal, Gagan
The FP-Growth algorithm is a Frequent Pattern Mining (FPM) algorithm that has been extensively used to study correlations and patterns in large-scale datasets. While several researchers have designed distributed-memory FP-Growth algorithms, it is pivotal to consider fault-tolerant FP-Growth, which can address the increasing fault rates in large-scale systems. In this work, we propose a novel parallel, algorithm-level fault-tolerant FP-Growth algorithm. We leverage algorithmic properties and advanced MPI features to guarantee an O(1) space complexity, achieved by using the dataset memory space itself for checkpointing. We also propose a recovery algorithm that can use in-memory and disk-based checkpointing, though in many cases the recovery can be completed without any disk access, incurring no memory overhead for checkpointing. We evaluate our fault-tolerant algorithm on a large-scale InfiniBand cluster with several large datasets using up to 2K cores. Our evaluation demonstrates excellent efficiency for checkpointing and recovery in comparison to the disk-based approach. We have also observed a 20x average speed-up in comparison to Spark, establishing that a well-designed algorithm can easily outperform a solution based on a general fault-tolerant programming model.
Recognition of skin melanoma through dermoscopic image analysis
NASA Astrophysics Data System (ADS)
Gómez, Catalina; Herrera, Diana Sofia
2017-11-01
Melanoma skin cancer diagnosis can be challenging due to the similarities of early-stage symptoms with regular moles. Standardized visual parameters can be determined and characterized to suspect a melanoma cancer type. The automation of this diagnosis could have an impact in the medical field by providing a high-accuracy tool to support specialists. The objective of this study is to develop an algorithm trained to distinguish a highly probable melanoma from a non-dangerous mole by the segmentation and classification of dermoscopic mole images. We evaluate our approach on the dataset provided by the International Skin Imaging Collaboration used in the International Challenge Skin Lesion Analysis Towards Melanoma Detection. For the segmentation task, we apply a preprocessing algorithm and use Otsu's thresholding in the best-performing color space; the average Jaccard Index on the test dataset is 70.05%. For the subsequent classification stage, we use joint histograms in the YCbCr color space, an RBF Gaussian SVM trained with five features concerning circularity and irregularity of the segmented lesion, and gray-level co-occurrence matrix features for texture analysis. These features are combined to obtain an average classification accuracy of 63.3% on the test dataset.
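A hedged sketch of the two-stage pipeline (Python, scikit-image and scikit-learn) on synthetic images and labels; the color-space selection, shape features, and joint histograms of the full method are omitted here.

    import numpy as np
    from skimage.filters import threshold_otsu
    from skimage.feature import graycomatrix, graycoprops
    from sklearn.svm import SVC

    def glcm_features(gray_u8):
        """Four gray-level co-occurrence matrix texture features."""
        glcm = graycomatrix(gray_u8, distances=[1], angles=[0], levels=256,
                            symmetric=True, normed=True)
        return [graycoprops(glcm, p)[0, 0]
                for p in ("contrast", "homogeneity", "energy", "correlation")]

    rng = np.random.default_rng(0)
    images = rng.integers(0, 256, size=(40, 64, 64), dtype=np.uint8)  # synthetic
    labels = rng.integers(0, 2, size=40)                              # 1 = melanoma (toy)

    # Stage 1: Otsu segmentation; Stage 2: texture features from the masked lesion.
    masks = [img > threshold_otsu(img) for img in images]
    X = np.array([glcm_features(np.where(m, img, 0).astype(np.uint8))
                  for img, m in zip(images, masks)])
    clf = SVC(kernel="rbf", gamma="scale").fit(X[:30], labels[:30])
    print("toy accuracy:", clf.score(X[30:], labels[30:]))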
Human action classification using procrustes shape theory
NASA Astrophysics Data System (ADS)
Cho, Wanhyun; Kim, Sangkyoon; Park, Soonyoung; Lee, Myungeun
2015-02-01
In this paper, we propose a new method that classifies human actions using Procrustes shape theory. First, we extract a pre-shape configuration vector of landmarks from each frame of an image sequence representing an arbitrary human action, and then derive the Procrustes fit vector for the pre-shape configuration vector. Second, we extract a set of pre-shape vectors from training samples stored in a database and compute a Procrustes mean shape vector for these pre-shape vectors. Third, we extract a sequence of pre-shape vectors from the input video and project this sequence onto the tangent space with respect to the pole, taken as the sequence of mean shape vectors corresponding to a target video. We then calculate the Procrustes distance between the sequence of projected pre-shape vectors on the tangent space and the mean shape vectors. Finally, we classify the input video into the human action class with the minimum Procrustes distance. We assess the performance of the proposed method using one public dataset, namely the Weizmann human action dataset. Experimental results reveal that the proposed method performs very well on this dataset.
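A minimal sketch of classification by Procrustes distance, using SciPy's procrustes() for the shape-alignment step; the landmark sequences below are synthetic stand-ins for the pre-shape vectors extracted from video frames.

    import numpy as np
    from scipy.spatial import procrustes

    def sequence_distance(seq_a, seq_b):
        """Mean Procrustes disparity over corresponding frames (k landmarks x 2)."""
        return np.mean([procrustes(a, b)[2] for a, b in zip(seq_a, seq_b)])

    rng = np.random.default_rng(0)
    walk_mean = [rng.normal(size=(15, 2)) for _ in range(20)]  # 20 frames, 15 landmarks
    jump_mean = [rng.normal(size=(15, 2)) for _ in range(20)]
    query = [f + 0.05 * rng.normal(size=(15, 2)) for f in walk_mean]  # noisy "walk"

    dists = {"walk": sequence_distance(query, walk_mean),
             "jump": sequence_distance(query, jump_mean)}
    print("predicted action:", min(dists, key=dists.get))   # -> walk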
Karnik, Rahul; Beer, Michael A.
2015-01-01
The generation of genomic binding or accessibility data from massively parallel sequencing technologies such as ChIP-seq and DNase-seq continues to accelerate. Yet state-of-the-art computational approaches for the identification of DNA binding motifs often yield motifs of weak predictive power. Here we present a novel computational algorithm called MotifSpec, designed to find predictive motifs, in contrast to over-represented sequence elements. The key distinguishing feature of this algorithm is that it uses a dynamic search space and a learned threshold to find discriminative motifs in combination with the modeling of motifs using a full PWM (position weight matrix) rather than k-mer words or regular expressions. We demonstrate that our approach finds motifs corresponding to known binding specificities in several mammalian ChIP-seq datasets, and that our PWMs classify the ChIP-seq signals with accuracy comparable to, or marginally better than motifs from the best existing algorithms. In other datasets, our algorithm identifies novel motifs where other methods fail. Finally, we apply this algorithm to detect motifs from expression datasets in C. elegans using a dynamic expression similarity metric rather than fixed expression clusters, and find novel predictive motifs. PMID:26465884
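A hedged sketch of the core scoring ingredient (not MotifSpec itself): scanning sequences with a position weight matrix and applying a learned threshold to call discriminative matches. The PWM, sequences, and threshold rule here are synthetic illustrations.

    import numpy as np

    BASES = {"A": 0, "C": 1, "G": 2, "T": 3}

    def best_pwm_score(seq, pwm):
        """Best log-odds PWM match over all windows of the sequence."""
        w = pwm.shape[1]
        return max(sum(pwm[BASES[seq[i + j]], j] for j in range(w))
                   for i in range(len(seq) - w + 1))

    rng = np.random.default_rng(0)
    pwm = np.log2((rng.dirichlet(np.ones(4), size=6).T + 0.01) / 0.25)  # 4 x 6 log-odds
    seqs = ["".join(rng.choice(list("ACGT"), 30)) for _ in range(5)]
    scores = [best_pwm_score(s, pwm) for s in seqs]
    threshold = np.percentile(scores, 80)   # stand-in for the learned threshold
    print([s >= threshold for s in scores]) # which sequences are called as matches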
Human mobility in space from three modes of public transportation
NASA Astrophysics Data System (ADS)
Jiang, Shixiong; Guan, Wei; Zhang, Wenyi; Chen, Xu; Yang, Liu
2017-10-01
Human mobility patterns have drawn much attention from researchers for decades, given their importance for urban planning and traffic management. In this study, taxi GPS trajectories and smart-card transaction data of subway and bus trips from Beijing are utilized to model human mobility in space. The original datasets are cleaned and processed to obtain the displacement of each trip according to the origin and destination locations. Then, the Akaike information criterion is adopted to select the best-fitting distribution for each mode from the candidate ones. The results indicate that displacements of taxi trips follow the exponential distribution. Displacements of bus trips are also well fitted by the exponential distribution, although their exponents are significantly different. Displacements of subway trips show distinctive characteristics and are well fitted by the gamma distribution. Clearly, human mobility differs across the three modes. To explore overall human mobility, the three datasets are combined into a fused dataset according to the annual ridership proportions. Finally, the fused displacements follow a power-law distribution with an exponential cutoff. Combining different transportation modes in this way offers a novel approach to modeling human mobility in the city.
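A minimal sketch of the AIC-based model selection step (Python, SciPy), applied to synthetic displacements rather than the Beijing data:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    displacements = rng.exponential(scale=5.0, size=5000)   # km, synthetic trips

    candidates = {"exponential": stats.expon, "gamma": stats.gamma,
                  "lognormal": stats.lognorm}
    aic = {}
    for name, dist in candidates.items():
        params = dist.fit(displacements)                          # maximum likelihood fit
        loglik = np.sum(dist.logpdf(displacements, *params))
        aic[name] = 2 * len(params) - 2 * loglik                  # AIC = 2k - 2 ln L
    print("best-fitting distribution:", min(aic, key=aic.get))    # -> exponential here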
Exploring relationship between human mobility and social ties: Physical distance is not dead
NASA Astrophysics Data System (ADS)
Jin, Bo; Liao, Binbing; Yuan, Ning; Wang, Wenjun
2015-06-01
Partly due to the difficulty of accessing a worldwide dataset that simultaneously captures location history and social networks, our understanding of the relationship between human mobility and social ties has been limited. However, this topic is essential for deeper study from both human dynamics and social network perspectives. In this paper, we examine the location history data and social network data of 712 email users and 399 offline-events users from a map-editing-based social network website. Based on these data, we conduct all our experiments from both the individual and the community perspective. We find that physical distance is still the most influential factor on social ties among the nine representative human mobility features extracted from our GPS trajectory dataset, even though the Internet revolution has made long-distance communication dramatically faster, easier, and cheaper than ever before and has, in turn, partly expanded the physical scope of social networks. Furthermore, we find that, to a certain extent, proximity in the south-north direction is more influential on social ties than proximity in the east-west direction. To the best of our knowledge, this difference between south-north and east-west is raised and quantitatively supported by a large dataset here for the first time. We believe our findings on the interplay of human mobility and social ties offer a new perspective on this field of study.
Ren, Kai; Xu, Leiqing
2017-10-01
The data presented in this paper are related to "Environmental-behavior studies of sustainable construction of the third place - based on outdoor environment-behavior cross-feed symbiotic analysis and verification of selective activities" (Ren, 2017) [1]. The dataset comes from a field investigation, extended across sub-periods of time, of children in the Hohhot West Inner Mongolia Electric Power Community Residential Area in Inner Mongolia, China, which belongs to the cold region (ID area) according to the Chinese design code for buildings. The field data provide descriptive statistics on outdoor time, behavior scale specificity, age exclusivity, and self-centeredness for children of different ages (babies, preschool children, and school-age children), as well as five measurement elements of child-friendly space and their weight ratios. The field dataset is made publicly available to enable critical or extended analyses.
A modified active appearance model based on an adaptive artificial bee colony.
Abdulameer, Mohammed Hasan; Sheikh Abdullah, Siti Norul Huda; Othman, Zulaiha Ali
2014-01-01
The active appearance model (AAM) is one of the most popular model-based approaches and has been extensively used to extract features by highly accurate modeling of human faces under various physical and environmental conditions. However, fitting the model to an original image is a challenging task. The state of the art shows that optimization methods can resolve this problem, although applying optimization introduces its own difficulties. Hence, in this paper we propose an AAM-based face recognition technique that resolves the AAM fitting problem by introducing a new adaptive artificial bee colony (ABC) algorithm. The adaptation increases fitting efficiency compared with the conventional ABC algorithm. We used three datasets in our experiments: the CASIA dataset, the property 2.5D face dataset, and the UBIRIS v1 image dataset. The results reveal that the proposed face recognition technique performs effectively in terms of face recognition accuracy.
Long-term dataset on aquatic responses to concurrent climate change and recovery from acidification
NASA Astrophysics Data System (ADS)
Leach, Taylor H.; Winslow, Luke A.; Acker, Frank W.; Bloomfield, Jay A.; Boylen, Charles W.; Bukaveckas, Paul A.; Charles, Donald F.; Daniels, Robert A.; Driscoll, Charles T.; Eichler, Lawrence W.; Farrell, Jeremy L.; Funk, Clara S.; Goodrich, Christine A.; Michelena, Toby M.; Nierzwicki-Bauer, Sandra A.; Roy, Karen M.; Shaw, William H.; Sutherland, James W.; Swinton, Mark W.; Winkler, David A.; Rose, Kevin C.
2018-04-01
Concurrent regional and global environmental changes are affecting freshwater ecosystems. Decadal-scale data on lake ecosystems that can describe processes affected by these changes are important, as multiple stressors often interact to alter the trajectory of key ecological phenomena in complex ways. Due to the practical challenges associated with long-term data collection, the majority of existing long-term datasets focus on only a small number of lakes or few response variables. Here we present physical, chemical, and biological data from 28 lakes in the Adirondack Mountains of northern New York State. These data span the period from 1994 to 2012 and harmonize multiple open and as-yet unpublished data sources. The dataset creation is reproducible and transparent; R code and all original files used to create the dataset are provided in an appendix. This dataset will be useful for examining ecological change in lakes undergoing multiple stressors.
[Reflections on physical spaces and mental spaces].
Chen, Hung-Yi
2013-08-01
This article analyzes certain reciprocal impacts from physical spaces to mental spaces. If the epistemological construction and the spatial imagination of the subject of cogito or of social collectivities are able to influence the construction and creation of the physical spaces of that subject, then the context of that physical space may also affect the cognitive or social subject's mental cognition. This article applies the methodology of iconology from art history (E. Panofsky) and sociology (P. Bourdieu) to explore correlations between the creation of imaginative and physical spaces from collective consciousness and mental cognition. The author uses Gilles Deleuze's opinion regarding the 17th-century Baroque style and contemporary social collective symptoms as an explanation. From these theoretical studies, the author analyzes the differences in spatial epistemology generated by Taiwan's special geological context. Finally, the author applies Michel Foucault's studies on spatial context to assess the possible application of this thesis of reciprocal impacts from mental spaces to physical spaces in a nursing context.
Koohsari, Mohammad Javad; Mavoa, Suzanne; Villanueva, Karen; Sugiyama, Takemi; Badland, Hannah; Kaczynski, Andrew T; Owen, Neville; Giles-Corti, Billie
2015-05-01
Public open spaces such as parks and green spaces are key built environment elements within neighbourhoods for encouraging a variety of physical activity behaviours. Over the past decade, there has been a burgeoning number of active living research studies examining the influence of public open space on physical activity. However, the evidence shows mixed associations between different aspects of public open space (e.g., proximity, size, quality) and physical activity. These inconsistencies hinder the development of specific evidence-based guidelines for urban designers and policy-makers for (re)designing public open space to encourage physical activity. This paper aims to move this research agenda forward, by identifying key conceptual and methodological issues that may contribute to inconsistencies in research examining relations between public open space and physical activity.
AVNM: A Voting based Novel Mathematical Rule for Image Classification.
Vidyarthi, Ankit; Mittal, Namita
2016-12-01
In machine learning, system accuracy depends upon the classification result, and classification accuracy plays an imperative role in various domains. Non-parametric classifiers like K-Nearest Neighbor (KNN) are among the most widely used classifiers for pattern analysis. Despite its simplicity and effectiveness, the main problem associated with the KNN classifier is the selection of the number of nearest neighbors, i.e. "k", used in the computation. At present, it is hard to find an optimal value of "k" using any statistical algorithm that gives perfect accuracy in terms of a low misclassification error rate. Motivated by this problem, a new sample-space-reduction weighted voting mathematical rule (AVNM) is proposed for classification in machine learning. The proposed AVNM rule is, like KNN, non-parametric in nature. AVNM uses a weighted voting mechanism with sample space reduction to learn and examine the predicted class label for an unidentified sample. AVNM is free from any initial selection of a predefined variable or neighbor selection as found in the KNN algorithm. The proposed classifier also reduces the effect of outliers. To verify the performance of the proposed AVNM classifier, experiments were made on 10 standard datasets taken from the UCI database and one manually created dataset. The experimental results show that the proposed AVNM rule outperforms the KNN classifier and its variants. Experimental results based on the confusion-matrix accuracy measure show higher accuracy for the AVNM rule. The proposed AVNM rule is based on a sample space reduction mechanism for identifying an optimal number of nearest neighbors. AVNM results in better classification accuracy and a minimum error rate compared with the state-of-the-art algorithm, KNN, and its variants. The proposed rule automates the selection of nearest neighbors and improves the classification rate for the UCI datasets and the manually created dataset.
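The exact AVNM rule is not reproduced here; the sketch below illustrates only its generic ingredient, distance-weighted voting over training samples, which removes the hard choice of a single "k" by letting every candidate vote with a distance-dependent weight.

    import numpy as np

    def weighted_vote_predict(X_train, y_train, x, eps=1e-9):
        """Distance-weighted voting: closer training samples vote more strongly."""
        d = np.linalg.norm(X_train - x, axis=1)
        weights = 1.0 / (d + eps)
        classes = np.unique(y_train)
        scores = [weights[y_train == c].sum() for c in classes]
        return classes[int(np.argmax(scores))]

    X = np.array([[0, 0], [0, 1], [5, 5], [6, 5]], dtype=float)
    y = np.array([0, 0, 1, 1])
    print(weighted_vote_predict(X, y, np.array([5.5, 5.2])))   # -> 1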
Standardization Process for Space Radiation Models Used for Space System Design
NASA Technical Reports Server (NTRS)
Barth, Janet; Daly, Eamonn; Brautigam, Donald
2005-01-01
The space system design community has three concerns related to models of the radiation belts and plasma: 1) AP-8 and AE-8 models are not adequate for modern applications; 2) Data that have become available since the creation of AP-8 and AE-8 are not being fully exploited for modeling purposes; 3) When new models are produced, there is no authorizing organization identified to evaluate the models or their datasets for accuracy and robustness. This viewgraph presentation provided an overview of the roadmap adopted by the Working Group Meeting on New Standard Radiation Belt and Space Plasma Models.
Dong, Yingying; Luo, Ruisen; Feng, Haikuan; Wang, Jihua; Zhao, Jinling; Zhu, Yining; Yang, Guijun
2014-01-01
Differences exist among analysis results for agricultural monitoring and crop production based on remote sensing observations that are obtained at different spatial scales from multiple remote sensors over the same time period and processed by the same algorithms, models, or methods. These differences can be quantitatively described from three aspects: the multiple remote sensing observations themselves, the crop parameter estimation models, and the spatial scale effects of surface parameters. Our research proposes a new method to analyse and correct the differences between multi-source, multi-scale remote sensing surface reflectance datasets, aiming to provide a reference for further agricultural applications that combine remotely sensed observations from different sources. The method is built on the physical and mathematical properties of multi-source, multi-scale reflectance datasets. Statistical theory is used to extract characteristics of the multiple surface reflectance datasets and to quantitatively analyse the spatial variation of these characteristics across spatial scales. Then, taking the surface reflectance at the small spatial scale as the baseline, Gaussian distribution theory is applied to correct the multiple surface reflectance datasets based on the obtained physical characteristics, mathematical distribution properties, and their spatial variations. The proposed method was verified with two sets of multiple satellite images acquired over two experimental fields, located in Inner Mongolia and Beijing, China, with different degrees of underlying surface homogeneity. Experimental results indicate that differences between surface reflectance datasets at multiple spatial scales can be effectively corrected over non-homogeneous underlying surfaces, providing a database for further multi-source, multi-scale crop growth monitoring and yield prediction and for the corresponding consistency evaluations. PMID:25405760
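A minimal sketch of the Gaussian-statistics correction idea, assuming each dataset's reflectance is approximately normal so that matching its first two moments to the fine-scale baseline suffices; the paper's treatment of spatial variation across scales is omitted, and the function name is illustrative:

```python
import numpy as np

def match_to_baseline(coarse, baseline):
    """Map a coarse-scale reflectance array onto the baseline's statistics.

    Assumes both datasets are approximately Gaussian, so correction
    reduces to standardizing the coarse data and re-expressing it with
    the baseline's mean and standard deviation.
    """
    z = (coarse - coarse.mean()) / coarse.std()   # standardize coarse data
    return z * baseline.std() + baseline.mean()   # re-express in baseline terms
```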
Trace Gas/Aerosol Interactions and GMI Modeling Support
NASA Technical Reports Server (NTRS)
Penner, Joyce E.; Liu, Xiaohong; Das, Bigyani; Bergmann, Dan; Rodriquez, Jose M.; Strahan, Susan; Wang, Minghuai; Feng, Yan
2005-01-01
Current global aerosol models use different physical and chemical schemes and parameters, different meteorological fields, and often different emission sources. Since the physical and chemical parameterization schemes are often tuned to obtain results consistent with observations, it is difficult to assess the true uncertainty due to meteorology alone. Under the framework of the NASA Global Modeling Initiative (GMI), the differences and uncertainties in aerosol simulations (for sulfate, organic carbon, black carbon, dust, and sea salt) due solely to different meteorological fields are analyzed and quantified. Three meteorological datasets, from the NASA DAO GCM, the GISS-II' GCM, and the NASA finite volume GCM (FVGCM), are used to drive the same aerosol model. The global sulfate and mineral dust burdens with FVGCM fields are 40% and 20% less than those with DAO and GISS fields, respectively, owing to its heavier rainfall. Meanwhile, the sea salt burden predicted with FVGCM fields is 56% and 43% higher than those with DAO and GISS, respectively, due to its stronger convection, especially over the Southern Hemisphere ocean. Sulfate concentrations at the surface in the Northern Hemisphere extratropics and in the middle to upper troposphere differ by more than a factor of 3 among the three meteorological datasets. The agreement between model-calculated and observed aerosol concentrations in industrial regions (e.g., North America and Europe) is quite similar for all three meteorological datasets. Away from the source regions, however, the comparisons with observations differ greatly for DAO, FVGCM, and GISS, and the performance of the model with different datasets varies widely depending on sites and species. The global annual average aerosol optical depth at 550 nm is 0.120-0.131 for the three meteorological datasets.
Spatial contexts for temporal variability in alpine vegetation under ongoing climate change
Fagre, Daniel B.; Malanson, George P.
2013-01-01
A framework to monitor mountain summit vegetation (The Global Observation Research Initiative in Alpine Environments, GLORIA) was initiated in 1997. GLORIA results should be taken within a regional context of the spatial variability of alpine tundra. Changes observed at GLORIA sites in Glacier National Park, Montana, USA are quantified within the context of the range of variability observed in alpine tundra across much of western North America. Dissimilarity is calculated and used in nonmetric multidimensional scaling for repeated measures of vascular species cover at 14 GLORIA sites with 525 nearby sites and with 436 sites in western North America. The lengths of the trajectories of the GLORIA sites in ordination space are compared to the dimensions of the space created by the larger datasets. The absolute amount of change on the GLORIA summits over 5 years is high, but the degree of change is small relative to the geographical context. The GLORIA sites are on the margin of the ordination volumes with the large datasets. The GLORIA summit vegetation appears to be specialized, arguing for the intrinsic value of early observed change in limited niche space.
A Cluster-then-label Semi-supervised Learning Approach for Pathology Image Classification.
Peikari, Mohammad; Salama, Sherine; Nofech-Mozes, Sharon; Martel, Anne L
2018-05-08
Completely labeled pathology datasets are often challenging and time-consuming to obtain. Semi-supervised learning (SSL) methods are able to learn from fewer labeled data points with the help of a large number of unlabeled data points. In this paper, we investigated the possibility of using clustering analysis to identify the underlying structure of the data space for SSL. A cluster-then-label method was proposed: high-density regions identified in the data space were used to help a supervised SVM find the decision boundary. We compared our method with other supervised and semi-supervised state-of-the-art techniques on two different classification tasks applied to breast pathology datasets. We found that, compared with other state-of-the-art supervised and semi-supervised methods, our SSL method improves classification performance when a limited number of labeled data instances are available. We also showed that it is important to examine the underlying distribution of the data space before applying SSL techniques, to ensure that the semi-supervised learning assumptions are not violated by the data.
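A minimal sketch of the cluster-then-label idea, assuming non-negative integer class labels and using k-means as the clustering step; the paper's exact clustering method and SVM configuration are not reproduced, and all names are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def cluster_then_label(X_labeled, y_labeled, X_unlabeled, n_clusters=10):
    """Cluster the whole data space, propagate the majority label of each
    cluster's labeled members to the rest of the cluster, then train an SVM
    on the enlarged labeled set. Assumes integer labels >= 0."""
    X_all = np.vstack([X_labeled, X_unlabeled])
    assign = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X_all)
    lab_assign = assign[: len(X_labeled)]
    y_aug = np.full(len(X_all), -1, dtype=int)
    for c in range(n_clusters):
        members = lab_assign == c
        if members.any():                       # majority label among labeled members
            vals, counts = np.unique(y_labeled[members], return_counts=True)
            y_aug[assign == c] = vals[np.argmax(counts)]
    keep = y_aug != -1                          # drop clusters with no labeled evidence
    return SVC().fit(X_all[keep], y_aug[keep])
```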
PolyCheck: Dynamic Verification of Iteration Space Transformations on Affine Programs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bao, Wenlei; Krishnamoorthy, Sriram; Pouchet, Louis-noel
2016-01-11
High-level compiler transformations, especially loop transformations, are widely recognized as critical optimizations to restructure programs to improve data locality and expose parallelism. Guaranteeing the correctness of program transformations is essential, and to date three main approaches have been developed: proof of equivalence of affine programs, matching the execution traces of programs, and checking bit-by-bit equivalence of the outputs of the programs. Each technique suffers from limitations in either the kind of transformations supported, space complexity, or the sensitivity to the testing dataset. In this paper, we take a novel approach addressing all three limitations to provide an automatic bug checker to verify any iteration reordering transformations on affine programs, including non-affine transformations, with space consumption proportional to the original program data, and robust to arbitrary datasets of a given size. We achieve this by exploiting the structure of affine program control- and data-flow to generate at compile-time lightweight checker code to be executed within the transformed program. Experimental results assess the correctness and effectiveness of our method, and its increased coverage over previous approaches.
Thermodynamic Data Rescue and Informatics for Deep Carbon Science
NASA Astrophysics Data System (ADS)
Zhong, H.; Ma, X.; Prabhu, A.; Eleish, A.; Pan, F.; Parsons, M. A.; Ghiorso, M. S.; West, P.; Zednik, S.; Erickson, J. S.; Chen, Y.; Wang, H.; Fox, P. A.
2017-12-01
A large number of legacy datasets are contained in geoscience literature published between 1930 and 1980 and not expressed external to the publication text in digitized formats. Extracting, organizing, and reusing these "dark" datasets is highly valuable for many within the Earth and planetary science community. As a part of the Deep Carbon Observatory (DCO) data legacy missions, the DCO Data Science Team and Extreme Physics and Chemistry community identified thermodynamic datasets related to carbon, or more specifically datasets about the enthalpy and entropy of chemicals, as a proof-of-principle analysis. The data science team endeavored to develop a semi-automatic workflow, which includes identifying relevant publications, extracting contained datasets using OCR methods, collaborative reviewing, and registering the datasets via the DCO Data Portal, where the 'Linked Data' feature of the data portal provides a mechanism for connecting rescued datasets beyond their individual data sources, to research domains, DCO Communities, and more, making data discovery and retrieval more effective. To date, the team has successfully rescued, deposited and registered additional datasets from publications with thermodynamic sources. These datasets contain 3 main types of data: (1) heat content or enthalpy data determined for a given compound as a function of temperature using high-temperature calorimetry, (2) heat content or enthalpy data determined for a given compound as a function of temperature using adiabatic calorimetry, and (3) direct determination of heat capacity of a compound as a function of temperature using differential scanning calorimetry. The data science team integrated these datasets and delivered a spectrum of data analytics including visualizations, which will lead to a comprehensive characterization of the thermodynamics of carbon and carbon-related materials.
Dataset on the energy performance of atrium type hotel buildings.
Vujosevic, Milica; Krstic-Furundzic, Aleksandra
2018-04-01
The data presented in this article are related to the research article entitled "The Influence of Atrium on Energy Performance of Hotel Building" (Vujosevic and Krstic-Furundzic, 2017) [1], which describes the annual energy performance of an atrium-type hotel building under Belgrade climate conditions, with the objective of presenting the impact of the atrium on the hotel building's energy demands for space heating and cooling. This dataset is made publicly available to show the energy performance of the selected hotel design alternatives, enabling extended analyses of these data by other researchers.
Semi-automated surface mapping via unsupervised classification
NASA Astrophysics Data System (ADS)
D'Amore, M.; Le Scaon, R.; Helbert, J.; Maturilli, A.
2017-09-01
Due to the increasing volume of data returned from space missions, human searches for correlations and interesting features become more and more unfeasible. Statistical extraction of features via machine learning methods will increase the scientific output of remote sensing missions and aid the discovery of as-yet-unknown features hidden in datasets. These methods exploit algorithms trained on features from multiple instruments, returning classification maps that expose intra-dataset correlations and allow the discovery of unknown features. We present two applications, one for Mercury and one for Vesta.
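As a concrete illustration of such unsupervised classification, here is a minimal sketch that clusters co-registered multi-instrument pixels into a class map; the instrument data, feature layout, and names are all assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans

def classify_surface(cube, n_classes=6):
    """Unsupervised surface classification sketch.

    cube: (rows, cols, n_features) array of co-registered observations,
          e.g. spectral bands from several instruments stacked per pixel.
    Returns a (rows, cols) map of cluster labels ('spectral classes').
    """
    rows, cols, n_feat = cube.shape
    X = cube.reshape(-1, n_feat)                    # one row per pixel
    labels = KMeans(n_clusters=n_classes, n_init=10).fit_predict(X)
    return labels.reshape(rows, cols)               # back to map geometry
```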
Analysis models for the estimation of oceanic fields
NASA Technical Reports Server (NTRS)
Carter, E. F.; Robinson, A. R.
1987-01-01
A general model for statistically optimal estimates is presented for dealing with scalar, vector, and multivariate datasets. The method handles anisotropic fields and treats space and time dependence equivalently. Problems addressed include the analysis, or production, of synoptic time series of regularly gridded fields from irregular and gappy datasets, and the estimation of fields by compositing observations from several different instruments and sampling schemes. Technical issues are discussed, including the convergence of statistical estimates, the choice of representation of the correlations, the influential domain of an observation, and the efficiency of numerical computations.
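The estimator underlying such statistically optimal analyses is the Gauss-Markov formula. A minimal 1-D sketch with an assumed isotropic Gaussian covariance follows (the paper treats anisotropic fields and space-time jointly, which this sketch does not attempt; names are illustrative):

```python
import numpy as np

def optimal_interpolation(x_grid, x_obs, y_obs, length=1.0, noise=0.1):
    """Gauss-Markov (objective analysis) estimate on a regular grid.

    The gridded estimate is C_go (C_oo + R)^{-1} y, with a Gaussian
    signal covariance of scale `length` and observation noise `noise`.
    """
    def cov(a, b):
        return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length**2)
    C_oo = cov(x_obs, x_obs) + noise**2 * np.eye(len(x_obs))  # obs-obs + noise
    C_go = cov(x_grid, x_obs)                                 # grid-obs
    return C_go @ np.linalg.solve(C_oo, y_obs)
```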
Pattern detection in stream networks: Quantifying spatial variability in fish distribution
Torgersen, Christian E.; Gresswell, Robert E.; Bateman, Douglas S.
2004-01-01
Biological and physical properties of rivers and streams are inherently difficult to sample and visualize at the resolution and extent necessary to detect fine-scale distributional patterns over large areas. Satellite imagery and broad-scale fish survey methods are effective for quantifying spatial variability in biological and physical variables over a range of scales in marine environments but are often too coarse in resolution to address conservation needs in inland fisheries management. We present methods for sampling and analyzing multiscale, spatially continuous patterns of stream fishes and physical habitat in small- to medium-size watersheds (500–1000 hectares). Geospatial tools, including geographic information system (GIS) software such as ArcInfo dynamic segmentation and ArcScene 3D analyst modules, were used to display complex biological and physical datasets. These tools also provided spatial referencing information (e.g. Cartesian and route-measure coordinates) necessary for conducting geostatistical analyses of spatial patterns (empirical semivariograms and wavelet analysis) in linear stream networks. Graphical depiction of fish distribution along a one-dimensional longitudinal profile and throughout the stream network (superimposed on a 10-metre digital elevation model) provided the spatial context necessary for describing and interpreting the relationship between landscape pattern and the distribution of coastal cutthroat trout (Oncorhynchus clarki clarki) in western Oregon, U.S.A. The distribution of coastal cutthroat trout was highly autocorrelated and exhibited a spherical semivariogram with a defined nugget, sill, and range. Wavelet analysis of the main-stem longitudinal profile revealed periodicity in trout distribution at three nested spatial scales corresponding ostensibly to landscape disturbances and the spacing of tributary junctions.
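The geostatistical core of the analysis above is the empirical semivariogram. A minimal sketch along a 1-D route-measure coordinate follows (network topology and the wavelet analysis are omitted; names are illustrative):

```python
import numpy as np

def empirical_semivariogram(s, z, bin_edges):
    """Empirical semivariogram gamma(h) along a stream route coordinate.

    s: route-measure positions along the stream
    z: trout density (or any attribute) at those positions
    gamma(h) = 0.5 * mean[(z_i - z_j)^2] over pairs whose separation
    falls in each lag bin.
    """
    i, j = np.triu_indices(len(s), k=1)            # every pair once
    lags = np.abs(s[i] - s[j])
    sqdiff = 0.5 * (z[i] - z[j]) ** 2
    which = np.digitize(lags, bin_edges)
    return np.array([sqdiff[which == b].mean() if np.any(which == b) else np.nan
                     for b in range(1, len(bin_edges))])
```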
EnviroAtlas - Memphis, TN - Estimated Percent Tree Cover Along Walkable Roads
This EnviroAtlas dataset estimates tree cover along walkable roads. The road width is estimated for each road, and percent tree cover is calculated in an 8.5-meter strip beginning at the estimated road edge. Percent tree cover is calculated for each block between road intersections. Tree cover provides valuable benefits to neighborhood residents and walkers by providing shade, improved aesthetics, and outdoor gathering spaces. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).
EnviroAtlas - Portland, ME - Estimated Percent Tree Cover Along Walkable Roads
This EnviroAtlas dataset estimates tree cover along walkable roads. The road width is estimated for each road, and percent tree cover is calculated in an 8.5-meter strip beginning at the estimated road edge. Percent tree cover is calculated for each block between road intersections. Tree cover provides valuable benefits to neighborhood residents and walkers by providing shade, improved aesthetics, and outdoor gathering spaces. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).
EnviroAtlas - New York, NY - Estimated Percent Tree Cover Along Walkable Roads
This EnviroAtlas dataset estimates tree cover along walkable roads. The road width is estimated for each road, and percent tree cover is calculated in an 8.5-meter strip beginning at the estimated road edge. Percent tree cover is calculated for each block between road intersections. Tree cover provides valuable benefits to neighborhood residents and walkers by providing shade, improved aesthetics, and outdoor gathering spaces. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).
EnviroAtlas - Green Bay, WI - Estimated Percent Tree Cover Along Walkable Roads
This EnviroAtlas dataset estimates tree cover along walkable roads. The road width is estimated for each road, and percent tree cover is calculated in an 8.5-meter strip beginning at the estimated road edge. Percent tree cover is calculated for each block between road intersections. Tree cover provides valuable benefits to neighborhood residents and walkers by providing shade, improved aesthetics, and outdoor gathering spaces. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).
EnviroAtlas - Pittsburgh, PA - Estimated Percent Tree Cover Along Walkable Roads
This EnviroAtlas dataset estimates tree cover along walkable roads. The road width is estimated for each road, and percent tree cover is calculated in an 8.5-meter strip beginning at the estimated road edge. Percent tree cover is calculated for each block between road intersections. Tree cover provides valuable benefits to neighborhood residents and walkers by providing shade, improved aesthetics, and outdoor gathering spaces. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).
EnviroAtlas - Portland, OR - Estimated Percent Tree Cover Along Walkable Roads
This EnviroAtlas dataset estimates tree cover along walkable roads. The road width is estimated for each road, and percent tree cover is calculated in an 8.5-meter strip beginning at the estimated road edge. Percent tree cover is calculated for each block between road intersections. Tree cover provides valuable benefits to neighborhood residents and walkers by providing shade, improved aesthetics, and outdoor gathering spaces. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).
EnviroAtlas - Paterson, NJ - Estimated Percent Tree Cover Along Walkable Roads
This EnviroAtlas dataset estimates tree cover along walkable roads. The road width is estimated for each road, and percent tree cover is calculated in an 8.5-meter strip beginning at the estimated road edge. Percent tree cover is calculated for each block between road intersections. Tree cover provides valuable benefits to neighborhood residents and walkers by providing shade, improved aesthetics, and outdoor gathering spaces. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).
EnviroAtlas - Des Moines, IA - Estimated Percent Tree Cover Along Walkable Roads
This EnviroAtlas dataset estimates tree cover along walkable roads. The road width is estimated for each road, and percent tree cover is calculated in an 8.5-meter strip beginning at the estimated road edge. Percent tree cover is calculated for each block between road intersections. Tree cover provides valuable benefits to neighborhood residents and walkers by providing shade, improved aesthetics, and outdoor gathering spaces. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).
EnviroAtlas - Phoenix, AZ - Estimated Percent Tree Cover Along Walkable Roads
This EnviroAtlas dataset estimates tree cover along walkable roads. The road width is estimated for each road, and percent tree cover is calculated in an 8.5-meter strip beginning at the estimated road edge. Percent tree cover is calculated for each block between road intersections. Tree cover provides valuable benefits to neighborhood residents and walkers by providing shade, improved aesthetics, and outdoor gathering spaces. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).
EnviroAtlas - Milwaukee, WI - Estimated Percent Tree Cover Along Walkable Roads
This EnviroAtlas dataset estimates tree cover along walkable roads. The road width is estimated for each road, and percent tree cover is calculated in an 8.5-meter strip beginning at the estimated road edge. Percent tree cover is calculated for each block between road intersections. Tree cover provides valuable benefits to neighborhood residents and walkers by providing shade, improved aesthetics, and outdoor gathering spaces. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).
EnviroAtlas - Tampa, FL - Estimated Percent Tree Cover Along Walkable Roads
This EnviroAtlas dataset estimates tree cover along walkable roads. The road width is estimated for each road, and percent tree cover is calculated in an 8.5-meter strip beginning at the estimated road edge. Percent tree cover is calculated for each block between road intersections. Tree cover provides valuable benefits to neighborhood residents and walkers by providing shade, improved aesthetics, and outdoor gathering spaces. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).
EnviroAtlas - Durham, NC - Estimated Percent Tree Cover Along Walkable Roads
This EnviroAtlas dataset estimates tree cover along walkable roads. The road width is estimated for each road, and percent tree cover is calculated in an 8.5-meter strip beginning at the estimated road edge. Percent tree cover is calculated for each block between road intersections. Tree cover provides valuable benefits to neighborhood residents and walkers by providing shade, improved aesthetics, and outdoor gathering spaces. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).
EnviroAtlas - Fresno, CA - Estimated Percent Tree Cover Along Walkable Roads
This EnviroAtlas dataset estimates tree cover along walkable roads. The road width is estimated for each road, and percent tree cover is calculated in an 8.5-meter strip beginning at the estimated road edge. Percent tree cover is calculated for each block between road intersections. Tree cover provides valuable benefits to neighborhood residents and walkers by providing shade, improved aesthetics, and outdoor gathering spaces. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).
EnviroAtlas - New Bedford, MA - Estimated Percent Tree Cover Along Walkable Roads
This EnviroAtlas dataset estimates tree cover along walkable roads. The road width is estimated for each road, and percent tree cover is calculated in an 8.5-meter strip beginning at the estimated road edge. Percent tree cover is calculated for each block between road intersections. Tree cover provides valuable benefits to neighborhood residents and walkers by providing shade, improved aesthetics, and outdoor gathering spaces. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).
EnviroAtlas - Woodbine, IA - Estimated Percent Tree Cover Along Walkable Roads
This EnviroAtlas dataset estimates tree cover along walkable roads. The road width is estimated for each road, and percent tree cover is calculated in an 8.5-meter strip beginning at the estimated road edge. Percent tree cover is calculated for each block between road intersections. Tree cover provides valuable benefits to neighborhood residents and walkers by providing shade, improved aesthetics, and outdoor gathering spaces. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).
EnviroAtlas - Austin, TX - Estimated Percent Tree Cover Along Walkable Roads
This EnviroAtlas dataset estimates tree cover along walkable roads. The road width is estimated for each road, and percent tree cover is calculated in an 8.5-meter strip beginning at the estimated road edge. Percent tree cover is calculated for each block between road intersections. Tree cover provides valuable benefits to neighborhood residents and walkers by providing shade, improved aesthetics, and outdoor gathering spaces. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).
Discriminating Projections for Estimating Face Age in Wild Images
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tokola, Ryan A; Bolme, David S; Ricanek, Karl
2014-01-01
We introduce a novel approach to estimating the age of a human from a single uncontrolled image. Current face age estimation algorithms work well in highly controlled images, and some are robust to changes in illumination, but it is usually assumed that images are close to frontal. This bias is clearly seen in the datasets that are commonly used to evaluate age estimation, which either entirely or mostly consist of frontal images. Using pose-specific projections, our algorithm maps image features into a pose-insensitive latent space that is discriminative with respect to age. Age estimation is then performed using a multi-class SVM. We show that our approach outperforms other published results on the Images of Groups dataset, which is the only age-related dataset with a non-trivial number of off-axis face images, and that we are competitive with recent age estimation algorithms on the mostly-frontal FG-NET dataset. We also experimentally demonstrate that our feature projections introduce insensitivity to pose.
Shah, Sohil Atul
2017-01-01
Clustering is a fundamental procedure in the analysis of scientific data. It is used ubiquitously across the sciences. Despite decades of research, existing clustering algorithms have limited effectiveness in high dimensions and often require tuning parameters for different domains and datasets. We present a clustering algorithm that achieves high accuracy across multiple domains and scales efficiently to high dimensions and large datasets. The presented algorithm optimizes a smooth continuous objective, which is based on robust statistics and allows heavily mixed clusters to be untangled. The continuous nature of the objective also allows clustering to be integrated as a module in end-to-end feature learning pipelines. We demonstrate this by extending the algorithm to perform joint clustering and dimensionality reduction by efficiently optimizing a continuous global objective. The presented approach is evaluated on large datasets of faces, hand-written digits, objects, newswire articles, sensor readings from the Space Shuttle, and protein expression levels. Our method achieves high accuracy across all datasets, outperforming the best prior algorithm by a factor of 3 in average rank. PMID:28851838
NASA Astrophysics Data System (ADS)
Dong, J. Y.; Cheng, W.; Ma, C. P.; Tan, Y. T.; Xin, L. S.
2017-04-01
Residential public space is an important element in ecological residence design, and a sound physical environment in public space is of great significance for urban residents in China. Applying computer-aided design software to residential design can effectively avoid mismatches between design intent and actual conditions of use, as well as the negative impacts on users of a poor architectural physics environment. This paper largely adopts a design method based on analyzing the architectural physics environment of residential public space. By analyzing and evaluating the various physical environments, a suitability assessment is obtained for residential public space, thereby guiding the space design.
A Boundary-Layer Scaling Analysis Comparing Complex And Flat Terrain
NASA Astrophysics Data System (ADS)
Fitton, George; Tchiguirinskaia, Ioulia; Schertzer, Daniel; Lovejoy, Shaun
2013-04-01
A comparison of two boundary-layer (at approximately 50 m) wind datasets shows the existence of reproducible scaling behaviour at two topographically very different sites. The first test site was in Corsica, an island in the south of France, subject to both orographic and convective effects owing to its mountainous terrain and close proximity to the sea. The data recorded in Corsica consisted of 10 Hz sonic anemometer velocities measured over a six-month period. The second site consists of measurements from the Growian experiment; this test site was also in close proximity to the sea, but the surrounding terrain is very flat, and the data were recorded using propeller anemometers at 2.5 Hz. Although the resolution of the sonics was better, we found in both cases, using spectral methods, that the data were unusable at time-scales below one second. The scales we discuss therefore run from one second to fourteen hours. In both cases three scaling subranges are observed. Starting from the lower frequencies, both datasets have a spectral exponent of approximately two from six hours to fourteen hours. Our first scaling analyses were performed only on the Corsica dataset, so we initially proposed that this change in scaling was due to the orography: the steep slope of the hill on which the mast was positioned was directing the wind vertically, implying that the vertical shears of the horizontal wind might scale as Bolgiano-Obukhov's 11/5 power law. Further analysis of the second (Growian) dataset revealed the same behaviour over the same time-scales; since the Growian experiment was performed over nearly homogeneous terrain, our first hypothesis is questionable. Alternatively, we propose that for time-scales above six hours Taylor's hypothesis is no longer valid, which implies that in order to observe the scaling properties of structures with eddy turnover times larger than six hours, direct measurements in space are necessary. Again in both cases, for time-scales from six hours down to an hour, we observed a scaling power law somewhere between Kolmogorov's 5/3 and a -1 energy-production power law (a spectral exponent of about 1.3). Finally, from one hour down to one second, two very different scaling behaviours occurred. For the Corsica dataset we observe a (close to) pure Kolmogorov 5/3 scaling subrange, suggesting surface-layer mixing is the dominant process; for the Growian dataset we observe a scaling subrange close to Bolgiano-Obukhov's 11/5, suggesting temperature plays a dominant role. Additionally, for the Growian dataset we found that temperature is an active scalar at time-scales above an hour, unlike for the Corsica dataset. This suggests that orographic effects may suppress convective forces at the large scales, resulting in different small-scale shear profiles in the cascade process. Given that we can reproduce this scaling behaviour within a multifractal framework, it will be of great interest to stochastically simulate the corresponding vector fields for the two situations in order to properly understand the physical meaning of our observations.
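The quoted subranges (exponents of roughly 2, 5/3, and 11/5) come from fitting power laws to wind spectra. A minimal sketch of such a fit, assuming a 1-D velocity record sampled at a known rate; the function name, segment length, and frequency band are illustrative:

```python
import numpy as np
from scipy.signal import welch

def spectral_exponent(u, fs, fmin, fmax):
    """Estimate the spectral exponent beta in E(f) ~ f**(-beta).

    A Welch periodogram is fit with a straight line in log-log space over
    [fmin, fmax]; the subrange limits (e.g. 1 s to 1 h) are the analyst's
    choice, as in the study above.
    """
    f, E = welch(u, fs=fs, nperseg=4096)
    sel = (f >= fmin) & (f <= fmax)
    slope, _ = np.polyfit(np.log(f[sel]), np.log(E[sel]), 1)
    return -slope   # ~5/3 (Kolmogorov) or ~11/5 (Bolgiano-Obukhov)
```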
DOE Office of Scientific and Technical Information (OSTI.GOV)
Van Benthem, Mark H.
2016-05-04
This software is employed for 3D visualization of X-ray diffraction (XRD) data, with functionality for slicing, reorienting, isolating, and plotting 2D color contour maps and 3D renderings of large datasets. The program makes use of the multidimensionality of textured XRD data, where diffracted intensity is not constant over a given set of angular positions (as dictated by the three defined dimensional angles of phi, chi, and two-theta). Datasets are rendered in 3D with intensity as a scalar, represented with a rainbow color scale. A GUI and scrolling tools, along with interactive functions via the mouse, allow for fast manipulation of these large datasets so as to perform detailed analysis of diffraction results with the full dimensionality of the diffraction space.
Big geo data surface approximation using radial basis functions: A comparative study
NASA Astrophysics Data System (ADS)
Majdisova, Zuzana; Skala, Vaclav
2017-12-01
Approximation of scattered data is a common task in many engineering problems. The Radial Basis Function (RBF) approximation is appropriate for big scattered datasets in n-dimensional space: it is a non-separable approximation, as it is based on the distance between two points, and it leads to the solution of an overdetermined linear system of equations. In this paper the RBF approximation methods are briefly described, a new approach to the RBF approximation of big datasets is presented, and different Compactly Supported RBFs (CS-RBFs) are compared with respect to computational accuracy. The proposed approach exploits the symmetry of the matrix, partitions the matrix into blocks, and uses data structures for storage of the sparse matrix. The experiments are performed on synthetic and real datasets.
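A minimal sketch of the overdetermined CS-RBF approximation described above, using a Wendland C2 function; the paper's block partitioning and symmetry exploitation are omitted here, and all names are illustrative:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import lsqr

def csrbf_wendland(r):
    """Wendland C^2 compactly supported RBF; zero for r >= 1."""
    return np.where(r < 1.0, (1 - r) ** 4 * (4 * r + 1), 0.0)

def rbf_approximate(points, values, centers, support):
    """Least-squares CS-RBF approximation of scattered data.

    Builds the collocation matrix A[i, j] = phi(|x_i - c_j| / support)
    (sparse, since small supports leave most entries zero) and solves the
    overdetermined system A w = f for the RBF weights w.
    """
    r = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2) / support
    A = csr_matrix(csrbf_wendland(r))
    return lsqr(A, values)[0]
```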
Carrea, Laura; Embury, Owen; Merchant, Christopher J
2015-11-01
Datasets containing information to locate and identify water bodies have been generated from data locating static water bodies with a resolution of about 300 m (1/360°), recently released by the Land Cover Climate Change Initiative (LC CCI) of the European Space Agency. The LC CCI water-bodies dataset was obtained from multi-temporal metrics based on time series of the backscattered intensity recorded by ASAR on Envisat between 2005 and 2010. The new derived datasets coherently provide: distance to land, distance to water, water-body identifiers, and lake-centre locations. The water-body identifier dataset locates the water bodies using the identifiers of the Global Lakes and Wetlands Database (GLWD), and lake centres are defined for inland waters for which GLWD IDs were determined. The new datasets therefore link recent lake/reservoir/wetland extents to the GLWD, together with a set of coordinates that locates each water body in the database unambiguously. Information on the distance to land for each water cell and the distance to water for each land cell has many potential applications in remote sensing, where the applicability of geophysical retrieval algorithms may be affected by the presence of water or land within a satellite field of view (image pixel). During the generation and validation of the datasets, some limitations of the GLWD database and of the LC CCI water-bodies mask were found; some examples of these inaccuracies and limitations are presented and discussed. Temporal change in water-body extent is common. Future versions of the LC CCI dataset are planned to represent temporal variation, which will permit these derived datasets to be updated.
X-ray Pulsars Across the Parameter Space of Luminosity, Accretion Mode, and Spin
NASA Astrophysics Data System (ADS)
Laycock, Silas
We propose to expand the scope of our successful project providing a multi-satellite library of X-ray pulsar observations to the community. The library provides high-level products, activity monitoring, pulse profiles, phased event files, spectra, and a unique pulse-profile modeling interface. The library's scientific footprint will expand in four key directions: (1) Update, by processing all new XMM-Newton and Chandra observations (2015-2017) of X-ray binary pulsars in the Magellanic Clouds. (2) Expand, by including all archival Suzaku, Swift, and NuSTAR observations, and by including Galactic pulsars. (3) Improve, by offering innovative data products that provide deeper insight. (4) Advance, by implementing a new generation of physically motivated emission and pulse-profile models. The library currently includes some 2000 individual RXTE-PCA, 200 Chandra ACIS-I, and 120 XMM-PN observations of the SMC spanning 15 years, creating an unrivaled record of pulsar temporal behavior. In Phase 2, additional observations of SMC pulsars will be added: 221 Chandra (ACIS-S and ACIS-I), 22 XMM-PN, 142 XMM-MOS, 92 Suzaku, 25 NuSTAR, and >10,000 Swift, leveraging the pipeline and analysis techniques already developed. With the addition of 7 Galactic pulsars, each having several hundred multi-satellite observations, these datasets cover the entire range of variability timescales and accretion regimes. We will model the pulse profiles using state-of-the-art techniques to parameterize their morphology, obtain the distribution of offsets between magnetic and spin axes, and create samples of profiles under specific accretion modes (whether pencil-beam or fan-beam dominated). These products are needed for the next generation of advances in neutron star theory and modeling. The long duration of the dataset and the "whole-galaxy" nature of the SMC sample make possible a new statistical approach to uncover the duty-cycle distribution, and hence the population demographics, of transient High Mass X-ray Binary (HMXB) populations. Our unique library is already fueling progress on fundamental NS parameters and accretion physics.
Assessing the Impact of Land Use and Land Cover Change on Global Water Resources
NASA Astrophysics Data System (ADS)
Batra, N.; Yang, Y. E.; Choi, H. I.; Islam, A.; Charlotte, D. F.; Cai, X.; Kumar, P.
2007-12-01
Land use and land cover changes (LULCC) significantly modify the hydrological regime of watersheds, affecting water resources and the environment from regional to global scales. This study seeks to integrate water and energy cycle observations, scientific understanding, and human impacts to assess future water availability. To achieve this objective, we integrate and interpret past and current space-based and in situ observations within a global hydrologic model (GHM). The GHM is developed with enhanced spatial and temporal resolution, physical complexity, and hydrologic theory and processes to quantify the impact of LULCC on physical variables: surface runoff, subsurface flow, groundwater, infiltration, evapotranspiration (ET), soil moisture, etc. Coupled with the Common Land Model (CLM), a 3-dimensional volume-averaged soil-moisture transport (VAST) model is expanded to incorporate lateral flow and subgrid heterogeneity. The model consists of 11 soil-hydrology layers and predicts lateral as well as vertical moisture flux transport based on Richards' equation. The primary surface boundary conditions (SBCs) include surface elevation and its derivatives, land cover category, sand and clay fraction profiles, bedrock depth, and fractional vegetation cover. A consistent global GIS-based dataset is constructed for the SBCs of the model from existing observational datasets of various resolutions, map projections, and data formats. Global ECMWF data at 6-hour time steps for the period 1971 through 2000 are processed to produce the forcing data, which include incoming longwave and shortwave radiation, precipitation, air temperature, pressure, wind components, boundary layer height, and specific humidity. Land use and land cover data generated using IPCC scenarios for every 10 years from 2000 to 2100 are used for future assessments of water resources. Alterations due to LULCC in the surface water balance components (ET, groundwater recharge, and runoff) are then addressed in the study. Land use change disrupts the hydrological cycle, increasing water yield at some places, leading to floods, while diminishing or even eliminating low flows at other places.
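For reference, the Richards equation that governs this moisture transport can be written in its mixed form; this is the standard statement, not necessarily the model's exact discretization:

```latex
% theta: volumetric water content, K: hydraulic conductivity,
% psi: pressure head, z: elevation head
\frac{\partial \theta}{\partial t}
  = \nabla \cdot \left[ K(\theta)\, \nabla \big( \psi(\theta) + z \big) \right]
```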
GRASS GIS: The first Open Source Temporal GIS
NASA Astrophysics Data System (ADS)
Gebbert, Sören; Leppelt, Thomas
2015-04-01
GRASS GIS is a full-featured, general-purpose Open Source geographic information system (GIS) with raster, 3D raster and vector processing support [1]. Recently, time was introduced as a new dimension that transformed GRASS GIS into the first Open Source temporal GIS with comprehensive spatio-temporal analysis, processing and visualization capabilities [2]. New spatio-temporal data types were introduced in GRASS GIS version 7 to manage raster, 3D raster and vector time series. These new data types are called space time datasets. They are designed to efficiently handle hundreds of thousands of time-stamped raster, 3D raster and vector map layers of any size. Time stamps can be defined as time intervals or time instances, in Gregorian calendar time or relative time. Space time datasets simplify the processing and analysis of large time series in GRASS GIS, since these new data types are used as input and output parameters in temporal modules; the handling of space time datasets is therefore analogous to the handling of raster, 3D raster and vector map layers in GRASS GIS. A new dedicated Python library, the GRASS GIS Temporal Framework, was designed to implement the spatio-temporal data types and their management. The framework provides the functionality to efficiently handle hundreds of thousands of time-stamped map layers and their spatio-temporal topological relations. The framework supports reasoning based on the temporal granularity of space time datasets as well as their temporal topology. It was designed in conjunction with the PyGRASS library [3] to support parallel processing of large datasets, which has a long tradition in GRASS GIS [4,5]. We will present a subset of more than 40 temporal modules that were implemented based on the GRASS GIS Temporal Framework, PyGRASS and the GRASS GIS Python scripting library. These modules provide a comprehensive temporal GIS tool set. The functionality ranges from space time dataset and time-stamped map layer management, temporal aggregation, temporal accumulation, spatio-temporal statistics, spatio-temporal sampling, temporal algebra, temporal topology analysis, time series animation and temporal topology visualization, to time series import and export capabilities with support for the NetCDF and VTK data formats. We will present several temporal modules that support parallel processing of raster and 3D raster time series. [1] Neteler, M., Beaudette, D., Cavallini, P., Lami, L., Cepicky, J., 2008. GRASS GIS. In: Hall, G.B., Leahy, M.G. (Eds.), Open Source Approaches in Spatial Data Handling, Vol. 2, pp. 171-199. doi:10.1007/978-3-540-74831-1_9. [2] Gebbert, S., Pebesma, E., 2014. A temporal GIS for field based environmental modeling. Environ. Model. Softw. 53, 1-12. [3] Zambelli, P., Gebbert, S., Ciolli, M., 2013. Pygrass: An Object Oriented Python Application Programming Interface (API) for Geographic Resources Analysis Support System (GRASS) Geographic Information System (GIS). ISPRS International Journal of Geo-Information 2, 201-219. [4] Löwe, P., Klump, J., Thaler, J., 2012. The FOSS GIS Workbench on the GFZ Load Sharing Facility compute cluster. Geophysical Research Abstracts Vol. 14, EGU2012-4491, General Assembly European Geosciences Union, Vienna, Austria. [5] Akhter, S., Aida, K., Chemin, Y., 2010. GRASS GIS on High Performance Computing with MPI, OpenMP and Ninf-G Programming Framework. ISPRS Conference, Kyoto, 9-12 August 2010.
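For readers unfamiliar with the temporal modules, a brief sketch of how a space time raster dataset might be created, registered, and aggregated from Python follows. Module and parameter names are taken from the GRASS GIS 7 manuals as best recalled and should be verified against an installed version; the dataset and map names are invented for illustration:

```python
# Hedged sketch: driving GRASS GIS 7 temporal modules via grass.script.
import grass.script as gs

# create a space time raster dataset (STRDS) ...
gs.run_command("t.create", output="temp_monthly", type="strds",
               temporaltype="absolute", title="Monthly temperature",
               description="Example STRDS")
# ... register time-stamped raster maps into it as monthly intervals ...
gs.run_command("t.register", input="temp_monthly",
               maps="t_jan,t_feb,t_mar",
               start="2014-01-01", increment="1 month", flags="i")
# ... and aggregate the series to one map per year
gs.run_command("t.rast.aggregate", input="temp_monthly",
               output="temp_yearly", basename="temp_year",
               granularity="1 year", method="average")
```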
NASA Astrophysics Data System (ADS)
Carmona, A.; Poveda, G.; Sivapalan, M.; Vallejo-Bernal, S. M.; Bustamante, E.
2015-12-01
We study a 3-D generalization of Budyko's framework that involves the complementary relationship between long-term mean actual evapotranspiration (E) and potential evapotranspiration (Ep), and that captures the mutual interdependence among E, Ep, and mean annual precipitation (P). For this purpose we use three dimensionless and dependent quantities: Ψ = E/P, Φ = Ep/P and Ω = E/Ep. We demonstrate analytically that Budyko-type equations are unable to capture the physical limit of the relation between Ω and Φ in humid environments, owing to the infeasibility of Ep/P → 0 at E/Ep = 1. Using independent datasets from 146 sub-catchments in the Amazon River basin, we overcome this physical inconsistency by proposing a physically consistent power law Ψ = kΦ^e with pre-factor k = 0.66 and scaling exponent e = 0.83 (R² = 0.93). The proposed power law is compared with other Budyko-type equations, namely those by Yang et al. (2008) and Cheng et al. (2011). Taking into account the goodness of fits, with confidence bounds set at the 95% level, and the ability to comply with the physical limits of the 3-D space, our results show that the power law works better to model the long-term water and energy balances within the Amazon River basin. At the interannual time scale, parameters from the three studied equations are estimated for each catchment using 27 years of information, and interesting regional patterns emerge, as well as evidence of space-time symmetry. In addition, results show that within individual catchments the parameters from the linear relationship by Cheng et al. (2011) and from the power law resemble, and are related to, the partitioning of energy via evapotranspiration in terms of Ω. Finally, signs of co-evolution of catchments are explored by linking the emerging spatial patterns of the parameters with landscape properties that represent some of the main features of the Amazon River basin, including topography, water in soils, and vegetation.
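A quick numeric illustration of the fitted power law follows; k = 0.66 and e = 0.83 come from the text, while the Φ range chosen here is an assumption:

```python
import numpy as np

# Power law Psi = k * Phi**e with Psi = E/P, Phi = Ep/P, Omega = E/Ep
k, e = 0.66, 0.83
Phi = np.linspace(0.2, 1.2, 6)      # aridity index Ep/P (humid to moderate)
Psi = k * Phi**e                    # evaporative index E/P
Omega = Psi / Phi                   # fraction of potential ET realized, E/Ep
for phi, psi, om in zip(Phi, Psi, Omega):
    print(f"Phi={phi:4.2f}  Psi={psi:4.2f}  Omega={om:4.2f}")
# Within this humid range Omega stays below 1 (E <= Ep), while nothing
# forces E/Ep -> 1 as Ep/P shrinks, which is the inconsistency the text
# identifies in Budyko-type curves.
```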
LEAP: biomarker inference through learning and evaluating association patterns.
Jiang, Xia; Neapolitan, Richard E
2015-03-01
Single nucleotide polymorphism (SNP) high-dimensional datasets are available from Genome Wide Association Studies (GWAS). Such data provide researchers opportunities to investigate the complex genetic basis of diseases. Much of genetic risk might be due to undiscovered epistatic interactions, which are interactions in which combinations of several genes affect disease. Research aimed at discovering interacting SNPs from GWAS datasets has proceeded in two directions. First, tools were developed to evaluate candidate interactions. Second, algorithms were developed to search over the space of candidate interactions. Another problem when learning interacting SNPs, which has not received much attention, is evaluating how likely it is that the learned SNPs are associated with the disease. A complete system should provide this information as well. We develop such a system. Our system, called LEAP, includes a new heuristic search algorithm for learning interacting SNPs, and a Bayesian network based algorithm for computing the probability of their association. We evaluated the performance of LEAP using 100 1,000-SNP simulated datasets, each of which contains 15 SNPs involved in interactions. When learning interacting SNPs from these datasets, LEAP outperformed seven other methods. Furthermore, only SNPs involved in interactions were found to be probable. We also used LEAP to analyze real Alzheimer's disease and breast cancer GWAS datasets. We obtained interesting and new results from the Alzheimer's dataset, but limited results from the breast cancer dataset. We conclude that our results support that LEAP is a useful tool for extracting candidate interacting SNPs from high-dimensional datasets and determining their probability. © 2015 The Authors. Genetic Epidemiology published by Wiley Periodicals, Inc.
Manifold Learning by Preserving Distance Orders.
Ataer-Cansizoglu, Esra; Akcakaya, Murat; Orhan, Umut; Erdogmus, Deniz
2014-03-01
Nonlinear dimensionality reduction is essential for the analysis and interpretation of high-dimensional data sets. In this manuscript, we propose a distance order preserving manifold learning algorithm that extends the basic mean-squared error cost function used mainly in multidimensional scaling (MDS)-based methods. We develop a constrained optimization problem by assuming explicit constraints on the order of distances in the low-dimensional space. In this optimization problem, as a generalization of MDS, instead of forcing a linear relationship between the distances in the high-dimensional original space and the low-dimensional projection space, we learn a non-decreasing relation approximated by radial basis functions. We compare the proposed method with existing manifold learning algorithms using synthetic datasets, based on the commonly used residual variance metric and our proposed percentage of violated distance orders metric. We also perform experiments on a retinal image dataset used in Retinopathy of Prematurity (ROP) diagnosis.
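As an analogy only (not the authors' RBF-based algorithm), scikit-learn's non-metric MDS is a readily available order-preserving embedding: it likewise fits a monotone, non-decreasing relation between input and output distances rather than a linear one.

    # Illustrative sketch: non-metric MDS preserves only the rank order of
    # pairwise distances, the same design goal as the order-preserving
    # method described above (the paper's RBF formulation differs).
    from sklearn.datasets import make_swiss_roll
    from sklearn.manifold import MDS

    X, _ = make_swiss_roll(n_samples=300, random_state=0)

    # metric=False switches MDS from fitting absolute distances to fitting
    # a monotone (non-decreasing) transformation of them.
    embedding = MDS(n_components=2, metric=False, n_init=4, random_state=0)
    Y = embedding.fit_transform(X)
    print(Y.shape)  # (300, 2)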
Acton, Charles; Slavney, Susan; Arvidson, Raymond E.; Gaddis, Lisa R.; Gordon, Mitchell; Lavoie, Susan
2017-01-01
In the early 1980s, the Space Science Board (SSB) of the National Research Council was concerned about the poor and inconsistent treatment of scientific information returned from NASA’s space science missions. The SSB formed a panel [The Committee on Data Management and Computation (CODMAC)] to assess the situation and make recommendations to NASA for improvements. The CODMAC panel issued a report [1,2] that led to a number of actions, one of which was the convening of a Planetary Data Workshop in November 1983 [3]. The key findings of that workshop were that (1) important datasets were being irretrievably lost, and (2) the use of planetary data by the wider community was constrained by inaccessibility and a lack of commonality in format and documentation. The report further stated, “Most participants felt the present system (of data archiving and access) is inadequate and immediate changes are necessary to insure retention of and access to these and future datasets.”
Global Optimization Ensemble Model for Classification Methods
Anwar, Hina; Qamar, Usman; Muzaffar Qureshi, Abdul Wahab
2014-01-01
Supervised learning is the process of data mining for deducing rules from training datasets. A broad array of supervised learning algorithms exists, each with its own advantages and drawbacks. Some basic issues affect the accuracy of a classifier in any supervised learning problem, such as the bias-variance tradeoff, the dimensionality of the input space, and noise in the input data. All of these problems affect the accuracy of a classifier and are the reason that there is no globally optimal method for classification. Nor is there a generalized improvement method that can increase the accuracy of any classifier while addressing all the problems stated above. This paper proposes a global optimization ensemble model for classification methods (GMC) that can improve the overall accuracy for supervised learning problems. The experimental results on various public datasets showed that the proposed model improved the accuracy of the classification models from 1% to 30%, depending upon the algorithm complexity. PMID:24883382
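For flavor only, a generic soft-voting ensemble in scikit-learn shows the basic mechanics of combining heterogeneous classifiers; GMC's own weighting and optimization scheme is described in the paper, not here.

    # Hedged sketch: a soft-voting ensemble of heterogeneous classifiers.
    # This illustrates the generic ensemble idea, not the GMC model itself.
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import VotingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import GaussianNB
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)
    ensemble = VotingClassifier(
        estimators=[("lr", LogisticRegression(max_iter=5000)),
                    ("dt", DecisionTreeClassifier(max_depth=5)),
                    ("nb", GaussianNB())],
        voting="soft")  # average the predicted class probabilities

    print(cross_val_score(ensemble, X, y, cv=5).mean())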
Kireeva, N; Baskin, I I; Gaspar, H A; Horvath, D; Marcou, G; Varnek, A
2012-04-01
Here, the utility of Generative Topographic Maps (GTM) for data visualization, structure-activity modeling and database comparison is evaluated using subsets of the Database of Useful Decoys (DUD). Unlike other popular dimensionality reduction approaches such as Principal Component Analysis, Sammon Mapping or Self-Organizing Maps, GTMs have the great advantage of providing data probability distribution functions (PDF), both in the high-dimensional space defined by molecular descriptors and in the 2D latent space. PDFs for the molecules of different activity classes were successfully used to build classification models in the framework of the Bayesian approach. Because the PDFs are represented by a mixture of Gaussian functions, the Bhattacharyya kernel has been proposed as a measure of the overlap of datasets, which leads to an elegant method of global comparison of chemical libraries. Copyright © 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
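The overlap measure reduces to a simple integral. Below is a minimal numerical sketch of a Bhattacharyya coefficient between two 1-D Gaussian mixtures; the GTM latent-space PDFs in the paper are 2-D mixtures and the kernel can be handled analytically, so all values here are illustrative assumptions.

    # Sketch of a Bhattacharyya-style overlap: approximate two mixture PDFs
    # on a grid and integrate sqrt(p * q). A value of 1 means identical PDFs.
    import numpy as np
    from scipy.stats import norm

    x = np.linspace(-10.0, 10.0, 4001)
    dx = x[1] - x[0]

    # Two "libraries" as equal-weight Gaussian mixtures (assumed parameters).
    p = 0.5 * norm.pdf(x, -2.0, 1.0) + 0.5 * norm.pdf(x, 1.0, 0.8)
    q = 0.5 * norm.pdf(x, -1.0, 1.2) + 0.5 * norm.pdf(x, 3.0, 1.0)

    bc = np.sum(np.sqrt(p * q)) * dx  # Bhattacharyya coefficient
    print(f"overlap = {bc:.3f}")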
NASA Astrophysics Data System (ADS)
Nedimovic, M. R.; Mountain, G. S.; Austin, J. A., Jr.; Fulthorpe, C.; Aali, M.; Baldwin, K.; Bhatnagar, T.; Johnson, C.; Küçük, H. M.; Newton, A.; Stanley, J.
2015-12-01
In June-July 2015, we acquired the first 3D/2D hybrid (short/long streamer) multichannel seismic (MCS) reflection dataset. These data were collected simultaneously across IODP Exp. 313 drillsites, off New Jersey, using R/V Langseth, and cover ~95% of the planned 12x50 km box. Despite the large survey area, the lateral and vertical resolution of the 3D dataset is almost an order of magnitude higher than for data gathered for standard petroleum exploration. This high resolution was made possible by collecting common midpoint (CMP) lines whose combined length is ~3 times the Earth's circumference (~120,000 profile km) and by a source rich in high frequencies. We present details on the data acquisition, ongoing data analysis, and preliminary results. The science driving this project is presented by Mountain et al. The 3D component of this innovative survey used an athwartship cross cable, extended laterally by 2 barovanes roughly 357.5 m apart and trailed by 24 50-m P-Cables spaced ~12.5 m apart with a near-trace offset of 53 m. Each P-Cable had 8 single-hydrophone groups spaced at 6.25 m, for a total of 192 channels. Record length was 4 s and sample rate 0.5 ms, with no low cut and an 824 Hz high cut filter. We ran 77 sail lines spaced ~150 m apart. Receiver locations were determined using 2 GPS receivers mounted on floats and 2 compasses and depth sensors per streamer. Streamer depths varied from 2.1 to 3.7 m. The 2D component used a single 3 km streamer, with 240 9-hydrophone groups spaced at 12.5 m, towed astern with a near-trace offset of 229 m. The record length was 4 s and sample rate 0.5 ms, with a low cut filter at 2 Hz and high cut at 412 Hz. Receiver locations were recorded using GPS at the head float and tail buoy, combined with 12 bird compasses spaced ~300 m apart. Nominal streamer depth was 4.5 m. The source for both systems was a 700 in³ linear array of 4 Bolt air guns suspended at 4.5 m towing depth, 271.5 m behind the ship's stern. Shot spacing was 12.5 m. Data analysis to prestack time migration is being carried out by Absolute Imaging, a commercial company. The shipboard QC analysis and brute stacks indicate that the final product will be superb. Key advantages of the hybrid 3D/2D dataset are: (1) velocity control from the 2D long-streamer data combined with the ultra-high resolution of the P-Cable 3D dataset; (2) the opportunity for prestack and poststack attribute analysis.
Research and technology: Fiscal year 1984 report
NASA Technical Reports Server (NTRS)
1985-01-01
Topics covered include extraterrestrial physics, high energy astrophysics, astronomy, solar physics, atmospheres, oceans, terrestrial physics, space technology, sensors, techniques, user space data systems, space communications and navigation, and system and software engineering.
Ocean Carbon States: Data Mining in Observations and Numerical Simulations Results
NASA Astrophysics Data System (ADS)
Latto, R.; Romanou, A.
2017-12-01
Advanced data mining techniques are rapidly becoming widely used in Climate and Earth Sciences with the purpose of extracting new meaningful information from increasingly large and complex datasets. This is particularly important in studies of the global carbon cycle, where any lack of understanding of its combined physical and biogeochemical drivers is detrimental to our ability to accurately describe, understand, and predict CO2 concentrations and their changes in the major carbon reservoirs. The analysis presented here evaluates the use of cluster analysis as a means of identifying and comparing spatial and temporal patterns extracted from observational and model datasets. As the observational data are organized into various regimes, which we call "ocean carbon states", we gain insight into the physical and/or biogeochemical processes controlling the ocean carbon cycle, as well as into how well these processes are simulated by a state-of-the-art climate model. We find that cluster analysis effectively produces realistic, dynamic regimes that can be associated with specific processes at different temporal scales for both the observations and the model. In addition, we show how these regimes can be used to illustrate and characterize the model's biases in the air-sea flux of CO2. These biases are attributed to biases in salinity, sea surface temperature, wind speed, and nitrate, which are then used to identify the physical processes that are inaccurately reproduced by the model. In this presentation, we provide a proof-of-concept application using simple datasets and then expand to more complex ones, using several physical and biogeochemical variable pairs, thus providing considerable insight into the mechanisms and phases of the ocean carbon cycle over different temporal and spatial scales.
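As a proof-of-concept in the same spirit (the variable names and data below are invented stand-ins, not the authors' datasets), k-means clustering of a single variable pair is enough to show how "states" fall out of the data:

    # Toy sketch: cluster a variable pair into candidate regimes.
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    sst_anom = rng.normal(size=5000)   # placeholder SST anomaly
    flux_anom = 0.6 * sst_anom + rng.normal(scale=0.5, size=5000)  # placeholder CO2 flux anomaly

    pairs = np.column_stack([sst_anom, flux_anom])
    labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(pairs)

    # Each cluster (an "ocean carbon state") would then be mapped back onto
    # the grid points and time steps it came from to study its processes.
    for k in range(4):
        print(k, (labels == k).mean())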
Ziatdinov, Maxim; Dyck, Ondrej; Maksov, Artem; ...
2017-12-07
Recent advances in scanning transmission electron and scanning probe microscopies have opened unprecedented opportunities for probing materials structural parameters and various functional properties in real space with angstrom-level precision. This progress has been accompanied by an exponential increase in the size and quality of datasets produced by microscopic and spectroscopic experimental techniques. These developments necessitate adequate methods for extracting relevant physical and chemical information from large datasets for which a priori information on the structures of various atomic configurations and lattice defects is limited or absent. Here we demonstrate an application of deep neural networks to extracting information from atomically resolved images, including the location of the atomic species and the type of defects. We develop a "weakly supervised" approach that uses information on the coordinates of all atomic species in the image, extracted via a deep neural network, to identify a rich variety of defects that are not part of the initial training set. We further apply our approach to interpret complex atomic and defect transformations, including switching between different coordinations of silicon dopants in graphene as a function of time, the formation of a peculiar silicon dimer with mixed 3-fold and 4-fold coordination, and the motion of a molecular "rotor". In conclusion, this deep learning based approach resembles the logic of a human operator, but can be scaled up, leading to a significant shift in the way information is extracted and analyzed from raw experimental data.
Leveraging External Sensor Data for Enhanced Space Situational Awareness
2015-09-17
[Only fragments of this report survive extraction: acronym-list entries (NN, Nearest Neighbor; NOMAD, Naval Observatory Merged Astrometric Dataset) and a statement of assumptions noting that the research relies on stars in NOMAD, developed and maintained by the U.S. Naval Observatory (USNO), and uses NOMAD rather than the UCAC because the NOMAD catalog is much easier to obtain.]
A Framework for Mining Actionable Navigation Patterns from In-Store RFID Datasets via Indoor Mapping
Shen, Bin; Zheng, Qiuhua; Li, Xingsen; Xu, Libo
2015-01-01
With the rapid development of RFID technology and the decreasing prices of RFID devices, RFID is becoming widely used in various intelligent services. Especially in the retail domain, RFID is increasingly adopted to capture the shopping tracks and behavior of in-store customers. To further enhance the potential of this promising application, in this paper we propose a unified framework for RFID-based path analytics, which uses both in-store shopping paths and RFID-based purchasing data to mine actionable navigation patterns. Four modules of this framework are discussed: (1) mapping from the physical space to the cyber space, (2) data preprocessing, (3) pattern mining and (4) knowledge understanding and utilization. In the data preprocessing module, the critical problem of how to capture the mainstream shopping path sequences while filtering out redundant and repeated details is addressed in detail. To solve this problem, two types of redundant patterns, i.e., the loop repeat pattern and the palindrome-contained pattern, are recognized, and the corresponding processing algorithms are proposed (a generic sketch of this filtering step follows below). The experimental results show that the redundant pattern filtering functions are effective and scalable. Overall, this work builds a bridge between indoor positioning and advanced data mining technologies, and provides a feasible way to study customers' shopping behaviors via multi-source RFID data. PMID:25751076
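As a hedged illustration of the flavor of such filtering, generic loop removal on a zone sequence (cutting each cycle back to its first visit) can be written in a few lines; the paper's actual loop-repeat and palindrome-contained pattern definitions are more specific than this toy.

    # Generic loop removal on a shopping path given as a sequence of zones.
    def remove_loops(path):
        """Collapse A-B-C-B -> A-B by cutting each cycle back to its first visit."""
        out, seen = [], {}
        for zone in path:
            if zone in seen:
                # Zone revisited: drop the loop that started at its first visit.
                cut = seen[zone] + 1
                for dropped in out[cut:]:
                    del seen[dropped]
                out = out[:cut]
            else:
                seen[zone] = len(out)
                out.append(zone)
        return out

    print(remove_loops(list("ABCBCD")))  # ['A', 'B', 'C', 'D']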
NASA Astrophysics Data System (ADS)
Pasquato, Mario; Chung, Chul
2016-05-01
Context. Machine learning (ML) solves problems by learning patterns from data with limited or no human guidance. In astronomy, ML is mainly applied to large observational datasets, e.g. for morphological galaxy classification. Aims: We apply ML to gravitational N-body simulations of star clusters that are either formed by merging two progenitors or evolved in isolation, planning to later identify globular clusters (GCs) that may have a history of merging from observational data. Methods: We create mock observations from simulated GCs, from which we measure a set of parameters (also called features in the machine-learning field). After carrying out dimensionality reduction on the feature space, the resulting datapoints are fed into various classification algorithms. Using repeated random subsampling validation, we check whether the groups identified by the algorithms correspond to the underlying physical distinction between mergers and monolithically evolved simulations. Results: The three algorithms we considered (C5.0 trees, k-nearest neighbour, and support-vector machines) all achieve a test misclassification rate of about 10% without parameter tuning, with support-vector machines slightly outperforming the others. The first principal component of feature space correlates with cluster concentration. If we exclude it from the regression, the performance of the algorithms is only slightly reduced.
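The analysis pattern (dimensionality reduction, classification, repeated random subsampling validation) can be sketched generically; everything below uses synthetic stand-in features, not the authors' mock observations.

    # Sketch of the pipeline described above, on synthetic data.
    from sklearn.datasets import make_classification
    from sklearn.decomposition import PCA
    from sklearn.model_selection import ShuffleSplit, cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    # Stand-in for merger vs. monolithic labels and measured features.
    X, y = make_classification(n_samples=400, n_features=20,
                               n_informative=6, random_state=0)
    clf = make_pipeline(StandardScaler(), PCA(n_components=5),
                        SVC(kernel="rbf"))

    # Repeated random subsampling validation, as in the abstract.
    cv = ShuffleSplit(n_splits=50, test_size=0.25, random_state=0)
    scores = cross_val_score(clf, X, y, cv=cv)
    print(f"misclassification rate ~ {1 - scores.mean():.2f}")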
NASA Technical Reports Server (NTRS)
Cullather, Richard; Bosilovich, Michael
2017-01-01
The Modern-Era Retrospective analysis for Research and Applications, version 2 (MERRA-2) is a global atmospheric reanalysis produced by the NASA Global Modeling and Assimilation Office (GMAO). It spans the satellite observing era from 1980 to the present. The goals of MERRA-2 are to provide a regularly gridded, homogeneous record of the global atmosphere, and to incorporate additional aspects of the climate system, including trace gas constituents (stratospheric ozone), improved land surface representation, and cryospheric processes. MERRA-2 is also the first satellite-era global reanalysis to assimilate space-based observations of aerosols and represent their interactions with other physical processes in the climate system. The inclusion of these additional components is consistent with the overall objectives of an Integrated Earth System Analysis (IESA). MERRA-2 is intended to replace the original MERRA product, and reflects recent advances in atmospheric modeling and data assimilation. Modern hyperspectral radiance and microwave observations, along with GPS radio occultation and NASA ozone datasets, are now assimilated in MERRA-2. Much of the structure of the data files remains the same in MERRA-2. While the original MERRA data format was HDF-EOS, the MERRA-2 binary data format is now NetCDF4 (with lossy compression to save space).
NASA Astrophysics Data System (ADS)
Pulinets, Sergey; Ouzounov, Dimitar; Hernandez-Pajares, Manuel; Hattori, Katsumi; Garcia-Rigo, Alberto
2014-05-01
Our approach of using multiple geo-space observations is based on the LAIC (Lithosphere-Atmosphere-Ionosphere Coupling) model and on the experience gained during similar analyses of the Three Mile Island and Chernobyl accidents. We collected a unique set of geophysical data for the period around the most active phase of the Fukushima explosions (12-31 March, DOY 71-90). We analyzed the following data sets: (i) ground temperature and relative humidity data from the JMA network of Japan; (ii) satellite meteorological data and assimilative models to obtain the integrated water vapor chemical potential; (iii) the infrared emission at the top of the atmosphere measured by NOAA and GOES satellites, estimated as Outgoing Longwave Radiation; and (iv) multiple ionospheric measurements, including ground based ionosondes, GPS vTEC from the GEONET network, COSMIC/FORMOSAT constellation occultation data, JASON satellite TEC measurements, and a tomography reconstruction technique to obtain the 3D distribution of electron concentration around the Fukushima power plant. As a result we were able to detect anomalies in different geophysical parameters representing the dynamics of the Fukushima nuclear accident and its effects on the atmospheric environment. Their temporal evolution demonstrates the synergy in the development of the different atmospheric anomalies, which implies the existence of a common physical mechanism described by the LAIC model.
Prediction of AL and Dst Indices from ACE Measurements Using Hybrid Physics/Black-Box Techniques
NASA Astrophysics Data System (ADS)
Spencer, E.; Rao, A.; Horton, W.; Mays, L.
2008-12-01
ACE measurements of the solar wind velocity, IMF and proton density are used to drive a hybrid physics/black-box model of the nightside magnetosphere. The core physics is contained in a low order nonlinear dynamical model of the nightside magnetosphere called WINDMI. The model is augmented by wavelet based nonlinear mappings between the solar wind quantities and the input into the physics model, followed by further wavelet based mappings of the model output field aligned currents onto the ground based magnetometer measurements of the AL index and Dst index. The black box mappings are introduced at the input stage to account for uncertainties in the way the solar wind quantities are transported from the ACE spacecraft at L1 to the magnetopause. Similar mappings are introduced at the output stage to account for a spatially and temporally varying westward auroral electrojet geometry. The parameters of the model are tuned using a genetic algorithm, and trained using the large geomagnetic storm dataset of October 3-7, 2000. Its predictive performance is then evaluated on subsequent storm datasets, in particular the April 15-24, 2002 storm. This work is supported by grant NSF 7020201.
2D/3D fetal cardiac dataset segmentation using a deformable model.
Dindoyal, Irving; Lambrou, Tryphon; Deng, Jing; Todd-Pokropek, Andrew
2011-07-01
To segment the fetal heart in order to facilitate the 3D assessment of cardiac function and structure. Ultrasound acquisition typically results in drop-out artifacts of the chamber walls. The authors outline a level set deformable model to automatically delineate the small fetal cardiac chambers. The level set is penalized from growing into an adjacent cardiac compartment using a novel collision detection term. The region based model allows simultaneous segmentation of all four cardiac chambers from a user defined seed point placed in each chamber. The segmented boundaries are automatically penalized from intersecting at walls with signal dropout. Root mean square errors of the perpendicular distances between the algorithm's delineation and manual tracings are within 2 mm, which is less than 10% of the length of a typical fetal heart. The ejection fractions were determined from the 3D datasets. We validated the algorithm using a physical phantom and obtained volumes comparable to the physically determined values, with an error within 13%. Our original work in fetal cardiac segmentation compares automatic and manual tracings to a physical phantom and also measures inter-observer variation.
Spatiotemporal Domain Decomposition for Massive Parallel Computation of Space-Time Kernel Density
NASA Astrophysics Data System (ADS)
Hohl, A.; Delmelle, E. M.; Tang, W.
2015-07-01
Accelerated processing capabilities are deemed critical when conducting analysis on spatiotemporal datasets of increasing size, diversity and availability. High-performance parallel computing offers the capacity to solve computationally demanding problems in a limited timeframe, but likewise poses the challenge of preventing processing inefficiency due to workload imbalance between computing resources. Therefore, when designing new algorithms capable of implementing parallel strategies, careful spatiotemporal domain decomposition is necessary to account for heterogeneity in the data. In this study, we perform octree-based adaptive decomposition of the spatiotemporal domain for parallel computation of space-time kernel density. In order to avoid edge effects near subdomain boundaries, we establish spatiotemporal buffers to include adjacent data points that are within the spatial and temporal kernel bandwidths. Then, we quantify the computational intensity of each subdomain to balance workloads among processors. We illustrate the benefits of our methodology using a space-time epidemiological dataset of Dengue fever, an infectious vector-borne disease that poses a severe threat to communities in tropical climates. Our parallel implementation of kernel density reaches substantial speedup compared to sequential processing, and achieves high levels of workload balance among processors owing to the accuracy of quantifying computational intensity. Our approach is portable to other space-time analytical tests.
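The core decomposition step can be sketched compactly. This is a minimal sketch under simplifying assumptions (a unit-cube (x, y, t) domain and a fixed leaf capacity as the split criterion); the bandwidth buffers and computational-intensity balancing described above would be layered on top of these leaves.

    # Octree-style decomposition of a spatiotemporal (x, y, t) domain:
    # recursively split a box into 8 octants until each leaf holds at most
    # `capacity` points.
    import numpy as np

    def octree_leaves(points, lo, hi, capacity=500):
        """points: (n, 3) array of (x, y, t); lo/hi: box corners. Yields leaves."""
        if len(points) <= capacity:
            yield lo, hi, points
            return
        mid = (lo + hi) / 2.0
        for octant in range(8):  # pick lower/upper half in each dimension
            sel = np.ones(len(points), dtype=bool)
            child_lo, child_hi = lo.copy(), hi.copy()
            for dim in range(3):
                if (octant >> dim) & 1:
                    sel &= points[:, dim] >= mid[dim]
                    child_lo[dim] = mid[dim]
                else:
                    sel &= points[:, dim] < mid[dim]
                    child_hi[dim] = mid[dim]
            if sel.any():
                yield from octree_leaves(points[sel], child_lo, child_hi, capacity)

    pts = np.random.default_rng(0).random((10000, 3))
    leaves = list(octree_leaves(pts, np.zeros(3), np.ones(3)))
    print(len(leaves), "leaves")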
Space physics education via examples in the undergraduate physics curriculum
NASA Astrophysics Data System (ADS)
Martin, R.; Holland, D. L.
2011-12-01
The field of space physics is rich with examples of basic physics and analysis techniques, yet it is rarely seen in physics courses or textbooks. As space physicists in an undergraduate physics department we like to use research to inform teaching, and we find that students respond well to examples from magnetospheric science. While we integrate examples into general education courses as well, this talk will focus on physics major courses. Space physics examples are typically selected to illustrate a particular concept or method taught in the course. Four examples will be discussed, from an introductory electricity and magnetism course, a mechanics/nonlinear dynamics course, a computational physics course, and a plasma physics course. Space physics provides examples of many concepts from introductory E&M, including the application of Faraday's law to terrestrial magnetic storm effects and the use of the basic motion of charged particles as a springboard to discussion of the inner magnetosphere and the aurora. In the mechanics and nonlinear dynamics courses, the motion of charged particles in a magnetotail current sheet magnetic field is treated as a Newtonian dynamical system, illustrating the Poincaré surface-of-section technique, the partitioning of phase space, and the KAM theorem. Neural network time series analysis of AE data is used as an example in the computational physics course. Finally, among several examples, current sheet particle dynamics is utilized in the plasma physics course to illustrate the notion of adiabatic/guiding center motion and the breakdown of the adiabatic approximation. We will present short descriptions of our pedagogy and student assignments in this "backdoor" method of space physics education.
NASA Astrophysics Data System (ADS)
Moldwin, M.; Morrow, C. A.; Moldwin, L. A.; Torrence, J.
2012-12-01
To assess the state of health of the field of Solar and Space Physics, an analysis of the number of Ph.D.s produced and the number of job postings each year was done for the decade 2001-2010. To determine the number of Ph.D.s produced in the field, the University of Michigan Ph.D. Dissertation Archive (ProQuest) was queried for Solar and Space Physics dissertations produced in North America. The field generated about 30 Ph.D.s per year from 2001 to 2006, but then saw the number increase to 50 to 70 per year for the rest of the decade. Only 14 institutions account for the majority of Solar and Space Physics Ph.D.s. To estimate the number of jobs available each year in the field, a compilation was made of the job advertisements listed in the American Astronomical Society's Solar Physics Division (SPD) and the American Geophysical Union's Space Physics and Aeronomy (SPA) electronic newsletters. The positions were sorted into four types (Faculty, Post-doctoral Researcher, Scientist/Researcher, or Staff), by institution type (academic, government lab, or industry), and by whether the position was located inside or outside the United States. Overall, 943 Solar and Space Physics positions were advertised worldwide over the decade. Of this total, 52% were for positions outside the US. Within Solar Physics, 44% of the positions were in the US, while in Space Physics 57% of the positions were at US institutions. The annual average for positions in the US was 26.9 for Solar Physics and 31.5 for Space Physics, though there is much variability year to year, particularly in Solar Physics positions outside the US. A disconcerting trend is a decline in job advertisements in the last two years for Solar Physics positions and between 2009 and 2010 for Space Physics positions. For both communities within the US in 2010, the total job ads reached their lowest levels of the decade (14), approximately half the decadal average.
Large-Scale Pattern Discovery in Music
NASA Astrophysics Data System (ADS)
Bertin-Mahieux, Thierry
This work focuses on extracting patterns in musical data from very large collections. The problem is split into two parts. First, we build such a large collection, the Million Song Dataset, to provide researchers access to commercial-size datasets. Second, we use this collection to study cover song recognition, which involves finding harmonic patterns from audio features. Regarding the Million Song Dataset, we detail how we built the original collection from an online API, and how we encouraged other organizations to participate in the project. The result is the largest research dataset with heterogeneous sources of data available to music technology researchers. We demonstrate some of its potential and discuss the impact it already has on the field. On cover song recognition, we must revisit the existing literature since there are no publicly available results on a dataset of more than a few thousand entries. We present two solutions to tackle the problem, one using a hashing method, and one using a higher-level feature computed from the chromagram (dubbed the 2DFTM). We further investigate the 2DFTM since it has the potential to be a relevant representation for any task involving audio harmonic content. Finally, we discuss the future of the dataset and the hope of seeing more work making use of the different sources of data that are linked in the Million Song Dataset. Regarding cover songs, we explain how this might be a first step towards defining a harmonic manifold of music, a space where harmonic similarities between songs would be more apparent.
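A brief sketch of why a 2-D Fourier transform magnitude (2DFTM) of a chromagram patch suits cover-song matching: the magnitude is unchanged under circular shifts, so a transposition (pitch-class rotation) or a time rotation does not change the feature. The patch size and data below are invented toy values, not the thesis settings.

    # Toy 2DFTM: |FFT2| of a chromagram patch is invariant to circular
    # shifts along the pitch-class axis (i.e., to transposition).
    import numpy as np

    rng = np.random.default_rng(0)
    chroma = rng.random((12, 75))        # stand-in: 12 pitch classes x 75 beats

    ftm = np.abs(np.fft.fft2(chroma))    # 2-D Fourier transform magnitude

    # Transposition check: rotating pitch classes only changes the phase.
    ftm_shifted = np.abs(np.fft.fft2(np.roll(chroma, 3, axis=0)))
    print(np.allclose(ftm, ftm_shifted))  # True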
NASA Technical Reports Server (NTRS)
Doherty, Michael P.
2002-01-01
The Physics of Colloids in Space (PCS) experiment is a Microgravity Fluids Physics investigation that is presently located in an Expedite the Process of Experiments to Space Station (EXPRESS) Rack on the International Space Station. PCS was launched to the International Space Station on April 19, 2001, activated on May 31, 2001, and will continue to operate about 90 hr per week through May 2002.
NASA Astrophysics Data System (ADS)
van Eck, C. M.; Morfopoulos, C.; Betts, R. A.; Chang, J.; Ciais, P.; Friedlingstein, P.; Regnier, P. A. G.
2016-12-01
The frequency and severity of extreme climate events such as droughts, extreme precipitation and heatwaves are expected to increase in our changing climate. These extreme climate events will affect vegetation by enhancing or reducing productivity. Subsequently, this can have a substantial impact on the terrestrial carbon sink and thus the global carbon cycle. Connecting observational datasets with modelling studies provides new insights into these climate-vegetation interactions. This study aims to compare extremes in vegetation productivity as derived from observations with those of Dynamic Global Vegetation Models (DGVMs). In this case GIMMS-NDVI 3g is selected as the observational dataset, and JULES (Joint UK Land Environment Simulator) and ORCHIDEE (Organising Carbon and Hydrology In Dynamic Ecosystems) as the DGVMs. Both models are forced with the PGFv2 Global Meteorological Forcing Dataset according to the ISI-MIP2 protocol for historical runs. Extremes in vegetation productivity are the focal point; these are identified as NDVI anomalies below the 10th percentile or above the 90th percentile during the growing season, referred to as browning or greening events respectively. The monthly NDVI dataset GIMMS-NDVI 3g is used to obtain the location in time and space of the vegetation extremes. The global GIMMS-NDVI 3g dataset has been subdivided into the IPCC's SREX regions, for which the NDVI anomalies are calculated and the extreme thresholds are determined. With this information we can identify the location in time and space of the browning and greening events in remotely sensed vegetation productivity. The same procedure is applied to the modelled Gross Primary Productivity (GPP), allowing a comparison between the spatial and temporal occurrence of the browning and greening events in the observational dataset and in the models' output. The capacity of the models to capture observed extremes in vegetation productivity is assessed and compared, and factors contributing to observed and modelled vegetation browning/greening extremes are analysed. The results of this study provide a stepping stone to modelling future extremes in vegetation productivity.
Landschoff, Jannes; Du Plessis, Anton; Griffiths, Charles L
2018-04-01
Along with the conventional deposition of physical types at natural history museums, the deposition of 3-dimensional (3D) image data has been proposed for rare and valuable museum specimens, such as irreplaceable type material. Micro computed tomography (μCT) scan data of 5 hermit crab species from South Africa, including rare specimens and type material, depicted the main identification characteristics of calcified body parts. However, low image contrast, especially in larger (>50 mm total length) specimens, did not allow sufficient 3D reconstruction of weakly calcified and fine characteristics, such as soft tissue of the pleon, mouthparts, gills, and setation. Reconstructions of soft tissue were sometimes possible, depending on individual sample and scanning characteristics. The raw data of seven scans are publicly available for download from the GigaDB repository. Calcified body parts visualized from μCT data can aid taxonomic validation and provide an additional, virtual deposition of rare specimens. The usefulness of a nondestructive, nonstaining μCT approach for taxonomy, and for reconstructions of soft tissue structures, microscopic spines, and setae, depends on species characteristics. Within these limitations, the presented dataset can be used for future morphological studies. Our virtual specimens will be most valuable to taxonomists, who can download a digital avatar for 3D examination. Simultaneously, in the event of physical damage to or loss of the original physical specimen, this dataset serves as a vital insurance policy.
NASA Astrophysics Data System (ADS)
Dukas, Georg
Though research in emerging technologies is vital to fulfilling their incredible potential for educational applications, it is often fraught with analytic challenges related to large datasets. This thesis explores these challenges in researching multiuser virtual environments (MUVEs). In a MUVE, users assume a persona and traverse a virtual space often depicted as a physical world, interacting with other users and digital artifacts. As students participate in MUVE-based curricula, detailed records of their paths through the virtual world are typically collected in event logs. Although many studies have demonstrated the instructional power of MUVEs (e.g., Barab, Hay, Barnett, & Squire, 2001; Ketelhut, Dede, Clarke, Nelson, & Bowman, 2008), none have successfully quantified these student paths for analysis in the aggregate. This thesis constructs several frameworks for conducting research involving student navigational choices in MUVEs based on a case study of data generated from the River City project. After providing a context for the research and an introduction to the River City dataset, the first part of this thesis explores the issues associated with data compression and presents a grounded theory approach (Glaser & Strauss, 1967) to the cleaning, compacting, and coding of MUVE datasets. To close this section, I discuss the implications of preparation choices for further analysis. Second, two conceptually different approaches to analyzing behavioral sequences are investigated. For each approach, a theoretical context, a description of possible exploratory and confirmatory methods, and illustrative examples from River City are provided. The thesis then situates these specific analytic approaches within the constellation of possible research utilizing MUVE event log data. Finally, based on the lessons of River City and the investigation of a spectrum of possible event logs, a set of design heuristics for data collection in MUVEs is constructed and a possible future for research in these environments is envisioned.
Large Survey Database: A Distributed Framework for Storage and Analysis of Large Datasets
NASA Astrophysics Data System (ADS)
Juric, Mario
2011-01-01
The Large Survey Database (LSD) is a Python framework and DBMS for distributed storage, cross-matching and querying of large survey catalogs (>10^9 rows, >1 TB). The primary driver behind its development is the analysis of Pan-STARRS PS1 data. It is specifically optimized for fast queries and parallel sweeps of positionally and temporally indexed datasets. It transparently scales to >10^2 nodes, and can be made to function in "shared nothing" architectures. An LSD database consists of a set of vertically and horizontally partitioned tables, physically stored as compressed HDF5 files. Vertically, we partition the tables into groups of related columns ("column groups"), storing together logically related data (e.g., astrometry, photometry). Horizontally, the tables are partitioned into partially overlapping "cells" by position in space (lon, lat) and time (t). This organization allows for fast lookups based on spatial and temporal coordinates, as well as data and task distribution. The design was inspired by the success of Google BigTable (Chang et al., 2006). Our programming model is a pipelined extension of MapReduce (Dean and Ghemawat, 2004). An SQL-like query language is used to access data. For complex tasks, map-reduce "kernels" that operate on query results on a per-cell basis can be written, with the framework taking care of scheduling and execution. The combination leverages users' familiarity with SQL, while offering a fully distributed computing environment. LSD adds little overhead compared to direct Python file I/O. In tests, we swept through 1.1 gigarows of Pan-STARRS+SDSS data (220 GB) in less than 15 minutes on a dual-CPU machine. In a cluster environment, we achieved bandwidths of 17 Gbit/s (I/O limited). Based on current experience, we believe LSD should scale to be useful for analysis and storage of LSST-scale datasets. It can be downloaded from http://mwscience.net/lsd.
NASA Astrophysics Data System (ADS)
Fredriksen, H. B.; Løvsletten, O.; Rypdal, M.; Rypdal, K.
2014-12-01
Several research groups around the world collect instrumental temperature data and combine them in different ways to obtain global gridded temperature fields. The three best-known datasets are HadCRUT4, produced by the Climatic Research Unit and the Met Office Hadley Centre in the UK, one produced by NASA GISS, and one produced by NOAA. Recently Berkeley Earth has also developed a gridded dataset. All four will be compared in our analysis. The statistical properties we focus on are the standard deviation and the Hurst exponent. These two parameters are sufficient to describe the temperatures as long-range memory stochastic processes: the standard deviation describes the general fluctuation level, while the Hurst exponent relates the strength of the long-term variability to the strength of the short-term variability. A higher Hurst exponent means that the slow variations are stronger compared to the fast ones, and that the autocovariance function has a heavier tail. Hence the Hurst exponent gives us information about the persistence, or memory, of the process. We make use of these data to show that data averaged over a larger area exhibit higher Hurst exponents and lower variance than data averaged over a smaller area, which provides information about the relationship between the temporal and spatial correlations of the temperature fluctuations. Interpolation in space has some similarities with averaging over space, although interpolation is weighted more towards the measurement locations. We demonstrate that the degree of spatial interpolation used can explain some of the differences observed between the variances and memory exponents computed from the various datasets.
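For readers unfamiliar with the quantity, here is a minimal sketch of one standard Hurst estimator, the aggregated-variance method; the groups above use more careful estimators, so treat this purely as an illustration of the scaling idea.

    # Aggregated-variance Hurst estimate: for a long-memory process, the
    # variance of block means scales ~ m^(2H-2) with block size m.
    import numpy as np

    def hurst_aggvar(x, scales=(1, 2, 4, 8, 16, 32)):
        x = np.asarray(x, dtype=float)
        variances = []
        for m in scales:
            n = (len(x) // m) * m
            block_means = x[:n].reshape(-1, m).mean(axis=1)
            variances.append(block_means.var())
        slope = np.polyfit(np.log(scales), np.log(variances), 1)[0]
        return 1.0 + slope / 2.0

    # White noise has H ~ 0.5; persistent series give H > 0.5.
    print(hurst_aggvar(np.random.default_rng(0).normal(size=2**14)))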
Cannistraci, Carlo Vittorio; Ravasi, Timothy; Montevecchi, Franco Maria; Ideker, Trey; Alessio, Massimo
2010-09-15
Nonlinear small datasets, which are characterized by low numbers of samples and very high numbers of measures, occur frequently in computational biology, and pose problems in their investigation. Unsupervised hybrid two-phase (H2P) procedures, specifically dimension reduction (DR) coupled with clustering, provide valuable assistance, not only for unsupervised data classification, but also for visualization of the patterns hidden in high-dimensional feature space. 'Minimum Curvilinearity' (MC) is a principle that, for small datasets, suggests the approximation of curvilinear sample distances in the feature space by pair-wise distances over their minimum spanning tree (MST), and thus avoids the introduction of any tuning parameter. MC is used to design two novel forms of nonlinear machine learning (NML): Minimum Curvilinear embedding (MCE) for DR, and Minimum Curvilinear affinity propagation (MCAP) for clustering. Compared with several other unsupervised and supervised algorithms, MCE and MCAP, whether individually or combined in H2P, overcome the limits of classical approaches. High performance was attained in the visualization and classification of: (i) pain patients (proteomic measurements) in peripheral neuropathy; (ii) human organ tissues (genomic transcription factor measurements) on the basis of their embryological origin. MC provides a valuable framework to estimate nonlinear distances in small datasets. Its extension to large datasets is prefigured for novel NMLs. Classification of neuropathic pain by proteomic profiles offers new insights for future molecular and systems biology characterization of pain. Improvements in tissue embryological classification refine results obtained in an earlier study, and suggest a possible reinterpretation of skin attribution as mesodermal. https://sites.google.com/site/carlovittoriocannistraci/home.
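The MC principle as stated is easy to realize with standard tools. Here is a minimal sketch (random data as a stand-in for a small-sample, high-dimensional dataset) that builds the parameter-free MC distance matrix with SciPy:

    # Minimum Curvilinearity sketch: approximate curvilinear distances by
    # pairwise path lengths over the minimum spanning tree of the samples.
    import numpy as np
    from scipy.sparse.csgraph import minimum_spanning_tree, shortest_path
    from scipy.spatial.distance import pdist, squareform

    rng = np.random.default_rng(0)
    X = rng.random((40, 500))            # low n, very high dimensionality

    D = squareform(pdist(X))             # Euclidean distances in feature space
    mst = minimum_spanning_tree(D)       # sparse MST of the complete graph

    # Pairwise distances measured along the tree: the MC distance matrix,
    # which can then feed an MDS-like embedding (as in MCE) or affinity
    # propagation (as in MCAP).
    mc_dist = shortest_path(mst, method="D", directed=False)
    print(mc_dist.shape)                 # (40, 40)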
Quest for Value in Big Earth Data
NASA Astrophysics Data System (ADS)
Kuo, Kwo-Sen; Oloso, Amidu O.; Rilee, Mike L.; Doan, Khoa; Clune, Thomas L.; Yu, Hongfeng
2017-04-01
Among all the V's of Big Data challenges, such as Volume, Variety, Velocity, Veracity, etc., we believe Value is the ultimate determinant, because a system delivering better value has a competitive edge over others. Although it is not straightforward to assess the value of scientific endeavors, we believe the ratio of scientific productivity increase to investment is a reasonable measure. Our research in Big Data approaches to data-intensive analysis for Earth Science has yielded some insights, as well as evidence, as to how optimal value might be attained. The first insight is that we should avoid, as much as possible, moving data through connections with relatively low bandwidth. That is, we recognize that moving data is expensive, albeit inevitable: data must at least be moved from the storage device into computer main memory and then to CPU registers for computation. When data must be moved, it is better to move them via relatively high-bandwidth connections and to avoid low-bandwidth ones. For this reason, a technology that can best exploit data locality will have an advantage over others. Data locality is easy to achieve and exploit with only one dataset. With multiple datasets, data co-location becomes important in addition to data locality. However, datasets can only be co-located for certain types of analyses; it is impossible for them to be co-located for all analyses. Therefore, our second insight is that we need to co-locate the datasets for the most commonly used analyses. In Earth Science, we believe the most common analysis requirement is "spatiotemporal coincidence". For example, when we analyze precipitation systems, we often would like to know the environmental conditions "where and when" (i.e. at the same location and time) there is precipitation. This "where and when" expresses the spatiotemporal coincidence requirement. An associated insight is that datasets need to be partitioned along the physical dimensions, i.e. space and time, rather than their array index dimensions, to achieve co-location for spatiotemporal coincidence. This leads further to the insight that, in terms of optimizing Value, achieving good scalability in Variety is more crucial than good scalability in Volume. Therefore, we will discuss our innovative approach to improving productivity by homogenizing the daunting varieties in Earth Science data to enable data co-location systematically. In addition, a Big Data system incorporating the capabilities described above has the potential to drastically shorten the data preparation period of machine learning, better facilitate automated machine learning operations, and further boost scientific productivity.
Fine-Scale Environmental Indicators of Public Health and Well ...
Urban ecosystem services contribute to public health and well-being by buffering natural and man-made hazards, and by promoting healthful lifestyles that include physical activity, social interaction, and engagement with nature. As part of the EnviroAtlas online mapping tool, EPA and its research partners have identified urban environmental features that have been linked in the scientific literature to specific aspects of public health and well-being. Examples of these features include tree cover along walkable roads, overall neighborhood green space, green window views, and proximity to parks. Associated aspects of health and well-being include physical fitness, social capital, school performance, and longevity. In many previous studies, stronger associations were observed in disproportionately vulnerable populations such as children, the elderly, and those of lower socioeconomic status.EnviroAtlas researchers have estimated and mapped a suite of urban environmental features by synthesizing newly-generated one-meter resolution landcover data, downscaled census population data, and existing datasets such as roads and waterways. Resulting geospatial metrics represent health-related indicators of urban ecosystem services supply and demand at the census block-group and finer. They have been developed using consistent methods to facilitate comparisons between neighborhoods and across multiple U.S. communities. Demographic overlays, also available in EnviroAtl
EnviroAtlas Connects Urban Ecosystem Services and Human ...
Ecosystem services in urban areas can improve public health and well-being by mitigating natural and anthropogenic pollution, and by promoting healthy lifestyles that include engagement with nature and enhanced opportunities for physical activity and social interaction. EPA’s EnviroAtlas online mapping tool identifies urban environmental features linked in the scientific and medical literature to specific aspects of public health and well-being. EnviroAtlas researchers have synthesized newly-generated one-meter resolution landcover data, downscaled census population data, and other existing datasets such as roads and parks. Resulting geospatial metrics represent health-related indicators of urban ecosystem services supply and demand by census block-group and finer scales. EnviroAtlas maps include percent of the population with limited window views of trees, tree cover along walkable roads, overall neighborhood green space, and proximity to parks. Demographic data can be overlaid to perform analyses of disproportionate distribution of urban ecosystem services across population groups. Together with the Eco-Health Relationship Browser, EnviroAtlas data can be linked to numerous aspects of public health and well-being including school performance, physical fitness, social capital, and longevity. EnviroAtlas maps have been developed using consistent methods to allow for comparisons between neighborhoods and across multiple U.S. communities. To feature eco-heal
Solar physics in the space age
NASA Technical Reports Server (NTRS)
1989-01-01
A brief review is given of the domain of solar physics, and of how its study has been affected by NASA space programs, which have enabled space-based observations. The observations have greatly increased the knowledge of solar physics by proving some theories and challenging others. Many questions remain unanswered. To exploit coming opportunities like the Space Station, solar physics must continue its advances in instrument development, observational techniques, and basic theory. Even with the Advanced Solar Observatory, other space-based observations will still be required to address the questions that are sure to follow.
Evaluation of Ten Methods for Initializing a Land Surface Model
NASA Technical Reports Server (NTRS)
Rodell, M.; Houser, P. R.; Berg, A. A.; Famiglietti, J. S.
2005-01-01
Land surface models (LSMs) are computer programs, similar to weather and climate prediction models, which simulate the stocks and fluxes of water (including soil moisture, snow, evaporation, and runoff) and energy (including the temperature of and sensible heat released from the soil) after they arrive on the land surface as precipitation and sunlight. It is not currently possible to measure all of the variables of interest everywhere on Earth with sufficient accuracy and space-time resolution. Hence LSMs have been developed to integrate the available observations with our understanding of the physical processes involved, using powerful computers, in order to map these stocks and fluxes as they change in time. The maps are used to improve weather forecasts, support water resources and agricultural applications, and study the Earth's water cycle and climate variability. NASA's Global Land Data Assimilation System (GLDAS) project facilitates testing of several different LSMs with a variety of input datasets (e.g., precipitation, plant type).
A Functional Data Model Realized: LaTiS Deployments
NASA Astrophysics Data System (ADS)
Baltzer, T.; Lindholm, D. M.; Wilson, A.; Putnam, B.; Christofferson, R.; Flores, N.; Roughton, S.
2016-12-01
At prior AGU annual meetings, members of the University of Colorado Laboratory for Atmospheric and Space Physics (LASP) Web Team have described work being done on a functional data model and the software framework called LaTiS, which implements it. This presentation describes the evolution of LaTiS and presents several instances of LaTiS in operation today that demonstrate its various capabilities. With LaTiS, serving a new dataset can be as simple as adding a small descriptor file. From providing access to spacecraft telemetry data in a variety of forms for the LASP mission operations group, to providing access to scientific data for the MMS and MAVEN science teams, to server-side functionality such as fusing satellite visible and infrared data along with forecast model data into a GeoTIFF image for situational awareness purposes, LaTiS has demonstrated itself to be a highly flexible, standards-based framework that provides easy data access, dynamic reformatting, and customizable server-side functionality.
NASA Technical Reports Server (NTRS)
1988-01-01
This report presents the on-going research activities at the NASA Marshall Space Flight Center for the year 1988. The subjects presented are space transportation systems, shuttle cargo vehicle, materials processing in space, environmental data base management, microgravity science, astronomy, astrophysics, solar physics, magnetospheric physics, aeronomy, atomic physics, rocket propulsion, materials and processes, telerobotics, and space systems.
Space physics strategy: Implementation study. Volume 2: Program plan
NASA Technical Reports Server (NTRS)
1991-01-01
In June 1989, the Space Science and Applications Advisory Committee (SSAAC) authorized its Space Physics Subcommittee (SPS) to prepare a plan specifying the future missions, launch sequence, and encompassing themes of the Space Physics Division. The plan, now complete, is the product of a year-long study comprising two week-long workshops, held in January and June 1990, assisted by pre-workshop, inter-workshop, and post-workshop preparation and assessment activities. The workshops engaged about seventy participants, drawn equally from the Division's four science disciplines: cosmic and heliospheric physics, solar physics, magnetospheric physics, and ionosphere-thermosphere-mesosphere physics. An earlier report records the outcome of the first workshop; this is the report of the final workshop.
Building a better search engine for earth science data
NASA Astrophysics Data System (ADS)
Armstrong, E. M.; Yang, C. P.; Moroni, D. F.; McGibbney, L. J.; Jiang, Y.; Huang, T.; Greguska, F. R., III; Li, Y.; Finch, C. J.
2017-12-01
Free text searching of earth science datasets has been implemented with varying degrees of success and completeness across the spectrum of the 12 NASA earth science data centers. At the JPL Physical Oceanography Distributed Active Archive Center (PO.DAAC) the search engine has been developed around the Solr/Lucene platform. Others have chosen other popular enterprise search platforms such as Elasticsearch. Regardless, the default implementations of these search engines, which leverage factors such as dataset popularity, term frequency and inverse document frequency, do not fully meet the needs of precise relevancy and ranking of earth science search results. For the PO.DAAC, this shortcoming has been identified for several years by its external User Working Group, which has issued several recommendations to improve the relevancy and discoverability of datasets related to remotely sensed sea surface temperature, ocean wind, waves, salinity, height and gravity, comprising over 500 publicly available datasets in total. Recently, the PO.DAAC has teamed with an effort led by George Mason University to improve the search and relevancy ranking of oceanographic data via a simple search interface and powerful backend services called MUDROD (Mining and Utilizing Dataset Relevancy from Oceanographic Datasets to Improve Data Discovery), funded by the NASA AIST program. MUDROD has mined and utilized the combination of PO.DAAC earth science dataset metadata, usage metrics, and user feedback and search history to objectively extract relevance for improved data discovery and access. In addition to improved dataset relevance and ranking, the MUDROD search engine also returns recommendations to related datasets and related user queries. This presentation will report on the use cases that drove the architecture and development, and on the success metrics and improvements in search precision and recall that MUDROD has demonstrated over the existing PO.DAAC search interfaces.
Monitoring and long-term assessment of the Mediterranean Sea physical state
NASA Astrophysics Data System (ADS)
Simoncelli, Simona; Fratianni, Claudia; Clementi, Emanuela; Drudi, Massimiliano; Pistoia, Jenny; Grandi, Alessandro; Del Rosso, Damiano
2017-04-01
The near-real-time monitoring and long-term assessment of the physical state of the ocean are crucial for the wide CMEMS user community, providing a continuous and up-to-date overview of key indicators computed from operational analysis and reanalysis datasets. This constitutes an operational warning system for particular events, stimulating research towards a deeper understanding of them and consequently increasing the uptake of CMEMS products. Ocean Monitoring Indicators (OMIs) of some Essential Ocean Variables have been identified and developed by the Mediterranean Monitoring and Forecasting Centre (MED-MFC) under the umbrella of the CMEMS MYP WG (Multi Year Products Working Group). These OMIs were first implemented operationally for the physical reanalysis products and then applied to the operational analyses product. Sea surface temperature, salinity, and height, as well as heat, water, and momentum fluxes at the air-sea interface, have been monitored operationally since the development of the reanalysis system, as a real-time check on data production. Their consistency analysis against available observational products, and against budget values reported in the literature, supports the high quality of the numerical dataset. The results of the reanalysis validation procedures have been published yearly since 2014 in the Quality Information Document, available through the CMEMS catalogue (http://marine.copernicus.eu), together with the yearly dataset extension. New OMIs of the winter mixed layer depth, the eddy kinetic energy, and the heat content will be presented. In particular, we will analyze their time evolution and trends from 1987 onwards, and then focus on the recent period 2013-2016, when the reanalysis and analysis datasets overlap, to show their consistency despite their different system implementations (i.e. atmospheric forcing, wave coupling, nesting). Finally, the focus will be on the 2016 sea state and circulation of the Mediterranean Sea and its anomaly with respect to the climatological fields, to detect the peculiarities of 2016 early.
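As a worked illustration of one of the OMIs discussed above, the sketch below computes an ocean heat content indicator from a single temperature profile. The profile values, reference density, and heat capacity are assumptions for the sketch, not MED-MFC settings.

```python
# Illustrative 0-700 m ocean heat content (OHC) from a temperature profile:
# OHC = rho * cp * integral(T dz), relative to 0 degC.
import numpy as np

rho = 1025.0   # seawater density, kg m^-3 (assumed)
cp = 3985.0    # specific heat capacity, J kg^-1 K^-1 (assumed)

depth = np.array([0.0, 10.0, 50.0, 100.0, 300.0, 700.0])   # m
temp = np.array([19.5, 19.2, 16.8, 14.9, 13.9, 13.5])      # deg C, made up

ohc = rho * cp * np.trapz(temp, depth)   # J m^-2
print(f"0-700 m heat content: {ohc:.3e} J m^-2")
```

A time series of such values, computed from each reanalysis year, is what yields the trends discussed in the abstract.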
CMS Analysis and Data Reduction with Apache Spark
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gutsche, Oliver; Canali, Luca; Cremer, Illia
Experimental particle physics has been at the forefront of analyzing the world's largest datasets for decades. The HEP community was among the first to develop suitable software and computing tools for this task. In recent times, new toolkits and systems for distributed data processing, collectively called "Big Data" technologies, have emerged from industry and open source projects to support the analysis of Petabyte and Exabyte datasets in industry. While the principles of data analysis in HEP have not changed (filtering and transforming experiment-specific data formats), these new technologies use different approaches and tools, promising a fresh look at the analysis of very large datasets that could potentially reduce the time-to-physics with increased interactivity. Moreover, these new tools are typically actively developed by large communities, often profiting from industry resources, and released under open source licenses. These factors boost the adoption and maturity of the tools and of the communities supporting them, while helping to reduce the cost of ownership for the end users. In this talk, we present studies of using Apache Spark for end-user data analysis. We study the HEP analysis workflow separated into two thrusts: the reduction of centrally produced experiment datasets, and the end analysis up to the publication plot. For the first thrust, CMS is working together with CERN openlab and Intel on the CMS Big Data Reduction Facility, whose goal is to reduce 1 PB of official CMS data to 1 TB of ntuple output for analysis. We present the progress of this 2-year project with first results of scaling up Spark-based HEP analysis. For the second thrust, we present studies on using Apache Spark for a CMS dark matter physics search, comparing Spark's feasibility, usability, and performance to the ROOT-based analysis.
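The following is a minimal PySpark sketch of the "data reduction" thrust described above: filter events and keep a slim set of columns, producing an ntuple-like output. The input path, column names, and cuts are hypothetical, not the CMS facility's actual schema.

```python
# Hedged sketch: Spark-based reduction of a large event dataset.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("cms-reduction-sketch").getOrCreate()

events = spark.read.parquet("/data/cms/official.parquet")  # hypothetical path

reduced = (
    events
    .filter((F.col("n_muons") >= 2) & (F.col("met") > 200.0))  # example cuts
    .select("run", "lumi", "event", "met", "muon_pt")          # slim columns
)

reduced.write.mode("overwrite").parquet("/data/cms/ntuple.parquet")
spark.stop()
```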
Science Enabling Roles and Services of SPDF
NASA Technical Reports Server (NTRS)
McGuire, Robert E.; Bilitza, Dieter; Candey, Robert M.; Chimiak, Reine A.; Cooper, John F.; Garcia, Leonard N.; Harris, Bernard T.; Johnson, Rita C.; King, Joseph H.; Kovalick, Tamara J.;
2011-01-01
The current Heliophysics Science Data Management Policy defines the roles of the Space Physics Data Facility (SPDF) project as a heliophysics active Final Archive, a focus for critical data infrastructure services, and a center of excellence for data and ancillary information services. This presentation will highlight some of our current activities and our understanding of why and how our services are useful to researchers, as well as SPDF's programmatic emphasis in the coming year. We will discuss how, in cooperation with the Heliophysics Virtual discipline Observatories (VxOs), we are working closely with the RBSP and MMS mission teams to support their decisions to use CDF as a primary format for their public data products, to leverage the ongoing data flows and capabilities of CDAWeb (directly and through external clients such as Autoplot) to serve their data in a multi-mission context, and to use SSCWeb to assist community science planning and analysis. Among other current activities, we will also discuss and demonstrate our continuing effort to make the Virtual Space Physics Observatory (VSPO) service comprehensive across all significant and NASA-relevant heliophysics data. The OMNI and OMNI High Resolution datasets remain current and heavily cited in publications. We are expanding our FTP file services to include online archived non-CDF data from all active missions, re-hosting this function from NSSDC's FTP site. We have extended the definitions of time in CDF to unambiguously and consistently handle leap seconds. We are improving SSCWeb for much faster performance, more capabilities, and a web services interface to query functionality. We will also review how CDAWeb data can be easily accessed within IDL, and new features in CDAWeb.
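For scripted access of the kind mentioned above, the CDAS web services also have a community-maintained Python client, cdasws. The following is a hedged sketch only; the dataset and variable identifiers are assumed for illustration, and the client's interface may differ between versions.

```python
# Hedged sketch: fetch CDAWeb data via the cdasws client (pip install cdasws).
from cdasws import CdasWs

cdas = CdasWs()

# Assumed identifiers (ACE magnetic field data); check CDAWeb for valid ones.
status, data = cdas.get_data(
    "AC_H1_MFI",
    ["Magnitude", "BGSEc"],
    "2009-06-01T00:00:00Z",
    "2009-06-01T00:10:00Z",
)
if data is not None:
    print(data)
```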
AVHRR composite period selection for land cover classification
Maxwell, S.K.; Hoffer, R.M.; Chapman, P.L.
2002-01-01
Multitemporal satellite image datasets provide valuable information on the phenological characteristics of vegetation, thereby significantly increasing the accuracy of cover type classifications compared to single-date classifications. However, the processing of these datasets can become very complex when dealing with multitemporal data combined with multispectral data. Advanced Very High Resolution Radiometer (AVHRR) biweekly composite data are commonly used to classify land cover over large regions. Selecting a subset of these biweekly composite periods may be required to reduce the complexity and cost of land cover mapping. The objective of our research was to evaluate the effect of reducing the number of composite periods, and of altering the spacing of those composite periods, on classification accuracy. Because inter-annual variability can have a major impact on classification results, 5 years of AVHRR data were evaluated. AVHRR biweekly composite images for spectral channels 1-4 (visible, near-infrared, and two thermal bands) covering the entire growing season were used to classify 14 cover types over the entire state of Colorado for each of five different years. A supervised classification method was applied to maintain consistent procedures for each case tested. Results indicate that the number of composite periods can be halved (reduced from 14 composite dates to seven) without significantly reducing overall classification accuracy (80.4% Kappa accuracy for the 14-composite dataset as compared to 80.0% for a seven-composite dataset). At least seven composite periods were required to ensure the classification accuracy was not affected by inter-annual variability due to climate fluctuations. Concentrating more composites near the beginning and end of the growing season, as compared to using evenly spaced time periods, consistently produced slightly higher classification values over the 5 years tested (average Kappa of 80.3% for the heavy early/late case as compared to 79.0% for the evenly spaced case).
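Since the results above are reported as Kappa accuracies, here is a worked example of computing Cohen's kappa from a confusion matrix. The 3-class matrix is made up for illustration; the study used 14 cover types.

```python
# Cohen's kappa from a square confusion matrix (rows: truth, cols: predicted).
import numpy as np

def cohens_kappa(cm: np.ndarray) -> float:
    n = cm.sum()
    p_observed = np.trace(cm) / n                                  # agreement
    p_expected = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n**2   # chance
    return (p_observed - p_expected) / (1.0 - p_expected)

cm = np.array([
    [50, 3, 2],
    [4, 40, 6],
    [1, 5, 39],
])
print(f"kappa = {cohens_kappa(cm):.3f}")
```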
Freire, Sergio Miranda; Teodoro, Douglas; Wei-Kleiner, Fang; Sundvall, Erik; Karlsson, Daniel; Lambrix, Patrick
2016-01-01
This study provides an experimental performance evaluation on population-based queries of NoSQL databases storing archetype-based Electronic Health Record (EHR) data. There are few published studies regarding the performance of persistence mechanisms for systems that use multilevel modelling approaches, especially when the focus is on population-based queries. A healthcare dataset with 4.2 million records stored in a relational database (MySQL) was used to generate XML and JSON documents based on the openEHR reference model. Six datasets with different sizes were created from these documents and imported into three single machine XML databases (BaseX, eXistdb and Berkeley DB XML) and into a distributed NoSQL database system based on the MapReduce approach, Couchbase, deployed in different cluster configurations of 1, 2, 4, 8 and 12 machines. Population-based queries were submitted to those databases and to the original relational database. Database size and query response times are presented. The XML databases were considerably slower and required much more space than Couchbase. Overall, Couchbase had better response times than MySQL, especially for larger datasets. However, Couchbase requires indexing for each differently formulated query and the indexing time increases with the size of the datasets. The performances of the clusters with 2, 4, 8 and 12 nodes were not better than the single node cluster in relation to the query response time, but the indexing time was reduced proportionally to the number of nodes. The tested XML databases had acceptable performance for openEHR-based data in some querying use cases and small datasets, but were generally much slower than Couchbase. Couchbase also outperformed the response times of the relational database, but required more disk space and had a much longer indexing time. Systems like Couchbase are thus interesting research targets for scalable storage and querying of archetype-based EHR data when population-based use cases are of interest. PMID:26958859
A Spatially Distinct History of the Development of California Groundfish Fisheries
Miller, Rebecca R.; Field, John C.; Santora, Jarrod A.; Schroeder, Isaac D.; Huff, David D.; Key, Meisha; Pearson, Don E.; MacCall, Alec D.
2014-01-01
During the past century, commercial fisheries have expanded from small vessels fishing in shallow, coastal habitats to a broad suite of vessels and gears that fish virtually every marine habitat on the globe. Understanding how fisheries have developed in space and time is critical for interpreting and managing the response of ecosystems to the effects of fishing; however, time series of spatially explicit data are typically rare. Recently, the 1933–1968 portion of the commercial catch dataset from the California Department of Fish and Wildlife was recovered and digitized, completing the full historical series for both commercial and recreational datasets from 1933–2010. These unique datasets include landing estimates at a coarse 10 by 10 minute “grid-block” spatial resolution and extend the entire length of coastal California up to 180 kilometers from shore. In this study, we focus on the catch history of groundfish, which was mapped for each grid-block using the year at 50% cumulative catch and the total historical catch per habitat area. We then constructed generalized linear models to quantify the relationship between spatiotemporal trends in groundfish catches and distance from ports, depth, percentage of days with wind speed over 15 knots, SST, and ocean productivity. Our results indicate that over the history of these fisheries, catches have taken place in increasingly deeper habitat, at a greater distance from ports, and in increasingly inclement weather conditions. Understanding the spatial development of groundfish fisheries and catches in California is critical for improving population models and for evaluating whether implicit stock assessment model assumptions of relative homogeneity of fisheries removals over time and space are reasonable. This newly reconstructed catch dataset and analysis provide a comprehensive appreciation for the development of groundfish fisheries with respect to commonly assumed trends of global fisheries patterns that are typically constrained by a lack of long-term spatial datasets. PMID:24967973
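The sketch below shows the general shape of such a generalized linear model in Python. The column names, toy data, and Gaussian family are assumptions for illustration; the paper's exact model specification may differ.

```python
# Hedged sketch: GLM relating a catch-derived response to physical covariates.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical per-grid-block records
df = pd.DataFrame({
    "year_at_50pct_catch": [1951, 1962, 1975, 1983, 1990],
    "depth_m": [80, 150, 320, 450, 600],
    "dist_to_port_km": [12, 35, 60, 95, 140],
    "pct_windy_days": [10, 14, 18, 22, 25],
})

model = smf.glm(
    "year_at_50pct_catch ~ depth_m + dist_to_port_km + pct_windy_days",
    data=df,
    family=sm.families.Gaussian(),
).fit()
print(model.summary())
```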
Gieder, Katherina D.; Karpanty, Sarah M.; Fraser, James D.; Catlin, Daniel H.; Gutierrez, Benjamin T.; Plant, Nathaniel G.; Turecek, Aaron M.; Thieler, E. Robert
2014-01-01
Sea-level rise and human development pose significant threats to shorebirds, particularly for species that utilize barrier island habitat. The piping plover (Charadrius melodus) is a federally-listed shorebird that nests on barrier islands and rapidly responds to changes in its physical environment, making it an excellent species with which to model how shorebird species may respond to habitat change related to sea-level rise and human development. The uncertainty and complexity in predicting sea-level rise, the responses of barrier island habitats to sea-level rise, and the responses of species to sea-level rise and human development necessitate a modelling approach that can link species to the physical habitat features that will be altered by changes in sea level and human development. We used a Bayesian network framework to develop a model that links piping plover nest presence to the physical features of their nesting habitat on a barrier island that is impacted by sea-level rise and human development, using three years of data (1999, 2002, and 2008) from Assateague Island National Seashore in Maryland. Our model performance results showed that we were able to successfully predict nest presence across a wide range of physical conditions within the model’s dataset. We found that model predictions were more successful when model development included a varied range of physical conditions than when that range was narrow. We also found that all model predictions had fewer false negatives (nests predicted to be absent when they were actually present in the dataset) than false positives (nests predicted to be present when they were actually absent in the dataset), indicating that our model correctly predicted nest presence better than nest absence. These results indicated that our approach of using a Bayesian network to link specific physical features to nest presence will be useful for modelling impacts of sea-level rise- or human-related habitat change on barrier islands. We recommend that potential users of this method utilize multiple years of data that represent a wide range of physical conditions in model development, because the model performed less well when constructed using a narrow range of physical conditions. Further, given that there will always be some uncertainty in predictions of future physical habitat conditions related to sea-level rise and/or human development, predictive models will perform best when developed using multiple, varied years of data input.
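To make the Bayesian network idea concrete, here is a minimal sketch using the pgmpy library. The network structure, variable names, and conditional probability values are illustrative assumptions, not the study's fitted model (and the class name varies across pgmpy versions).

```python
# Hedged sketch: a tiny discrete Bayesian network for nest presence.
from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

model = BayesianNetwork([("beach_width", "nest"), ("elevation", "nest")])

cpd_width = TabularCPD("beach_width", 2, [[0.6], [0.4]])   # narrow / wide
cpd_elev = TabularCPD("elevation", 2, [[0.5], [0.5]])      # low / high
cpd_nest = TabularCPD(
    "nest", 2,
    # P(nest | beach_width, elevation); columns over parent state combinations
    [[0.95, 0.8, 0.7, 0.3],   # absent
     [0.05, 0.2, 0.3, 0.7]],  # present
    evidence=["beach_width", "elevation"], evidence_card=[2, 2],
)
model.add_cpds(cpd_width, cpd_elev, cpd_nest)

infer = VariableElimination(model)
print(infer.query(["nest"], evidence={"beach_width": 1, "elevation": 1}))
```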
Probabilistic and machine learning-based retrieval approaches for biomedical dataset retrieval
Karisani, Payam; Qin, Zhaohui S; Agichtein, Eugene
2018-01-01
The bioCADDIE dataset retrieval challenge brought together different approaches to retrieval of biomedical datasets relevant to a user’s query, expressed as a text description of a needed dataset. We describe experiments in applying a data-driven, machine learning-based approach to biomedical dataset retrieval as part of this challenge. We report on a series of experiments carried out to evaluate the performance of both probabilistic and machine learning-driven techniques from information retrieval, as applied to this challenge. Our experiments with probabilistic information retrieval methods, such as query term weight optimization, automatic query expansion and simulated user relevance feedback, demonstrate that automatically boosting the weights of important keywords in a verbose query is more effective than other methods. We also show that although there is a rich space of potential representations and features available in this domain, machine learning-based re-ranking models are not able to improve on probabilistic information retrieval techniques with the currently available training data. The models and algorithms presented in this paper can serve as a viable implementation of a search engine to provide access to biomedical datasets. The retrieval performance is expected to be further improved by using additional training data that is created by expert annotation, or gathered through usage logs, clicks and other processes during natural operation of the system. Database URL: https://github.com/emory-irlab/biocaddie PMID:29688379
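One crude way to realize the keyword-boosting finding above is to repeat boosted terms before BM25 scoring, as in this sketch using the rank_bm25 package. The corpus, query, and boost weights are illustrative, and the paper's actual weighting scheme is not reproduced here.

```python
# Hedged sketch: integer-weight keyword boosting with BM25 scoring.
from rank_bm25 import BM25Okapi

corpus = [
    "RNA-seq expression dataset for human liver tissue",
    "proteomics dataset of mouse brain samples",
    "clinical trial dataset on liver disease outcomes",
]
tokenized = [doc.lower().split() for doc in corpus]
bm25 = BM25Okapi(tokenized)

verbose_query = "looking for a dataset with gene expression in liver"
boosts = {"expression": 3, "liver": 2}  # assumed important keywords

tokens = []
for term in verbose_query.lower().split():
    tokens.extend([term] * boosts.get(term, 1))  # repeat to boost weight

print(bm25.get_scores(tokens))
```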
A Modified Active Appearance Model Based on an Adaptive Artificial Bee Colony
Othman, Zulaiha Ali
2014-01-01
Active appearance model (AAM) is one of the most popular model-based approaches and has been extensively used to extract features by highly accurate modeling of human faces under various physical and environmental circumstances. However, fitting such a model to an original image is a challenging task. The state of the art shows that optimization methods can resolve this problem, although applying them introduces difficulties of its own. Hence, in this paper we propose an AAM-based face recognition technique that resolves the fitting problem of AAM by introducing a new adaptive artificial bee colony (ABC) algorithm. The adaptation increases the efficiency of fitting compared with the conventional ABC algorithm. We have used three datasets in our experiments: the CASIA dataset, a proprietary 2.5D face dataset, and the UBIRIS v1 images dataset. The results have revealed that the proposed face recognition technique performs effectively in terms of accuracy of face recognition. PMID:25165748
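For readers unfamiliar with ABC, the sketch below shows the employed/onlooker/scout phases of a conventional artificial bee colony optimizer on a toy objective. All parameter values are illustrative; the paper's adaptive variant and its AAM fitness function are not reproduced here.

```python
# Minimal conventional ABC sketch (toy objective: minimize squared norm).
import numpy as np

rng = np.random.default_rng(0)

def objective(x):
    return np.sum(x**2)

n_food, dim, limit, iters = 10, 4, 20, 100
foods = rng.uniform(-5, 5, (n_food, dim))   # food sources = candidate solutions
trials = np.zeros(n_food)

def try_neighbor(i):
    k = i
    while k == i:                            # pick a different food source
        k = int(rng.integers(n_food))
    j = int(rng.integers(dim))
    cand = foods[i].copy()
    cand[j] += rng.uniform(-1, 1) * (foods[i, j] - foods[k, j])
    if objective(cand) < objective(foods[i]):
        foods[i], trials[i] = cand, 0        # greedy selection
    else:
        trials[i] += 1

for _ in range(iters):
    for i in range(n_food):                  # employed bee phase
        try_neighbor(i)
    fitness = 1.0 / (1.0 + np.array([objective(f) for f in foods]))
    probs = fitness / fitness.sum()
    for i in rng.choice(n_food, n_food, p=probs):   # onlooker bee phase
        try_neighbor(i)
    for i in np.where(trials > limit)[0]:    # scout phase: replace stale sources
        foods[i] = rng.uniform(-5, 5, dim)
        trials[i] = 0

print(foods[np.argmin([objective(f) for f in foods])])
```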
Page, William R.; Berry, Margaret E.; VanSistine, D. Paco; Snyders, Scott R.
2009-01-01
The purpose of this map is to provide an integrated, bi-national geologic map dataset for display and analyses on an Arc Internet Map Service (IMS) dedicated to environmental health studies in the United States-Mexico border region. The IMS web site was designed by the US-Mexico Border Environmental Health Initiative project and collaborators, and the IMS and project web site address is http://borderhealth.cr.usgs.gov/. The objective of the project is to acquire, evaluate, analyze, and provide earth, biologic, and human health resources data within a GIS framework (IMS) to further our understanding of possible linkages between the physical environment and public health issues. The geologic map dataset is just one of many datasets included in the web site; other datasets include biologic, hydrologic, geographic, and human health themes.
A dataset on human navigation strategies in foreign networked systems.
Kőrösi, Attila; Csoma, Attila; Rétvári, Gábor; Heszberger, Zalán; Bíró, József; Tapolcai, János; Pelle, István; Klajbár, Dávid; Novák, Márton; Halasi, Valentina; Gulyás, András
2018-03-13
Humans are involved in various real-life networked systems. The most obvious examples are social and collaboration networks, but the language and related mental lexicon they use, or the physical map of their territory, can also be interpreted as networks. How do they find paths between endpoints in these networks? How do they obtain information about a foreign networked world in which they find themselves, how do they build a mental model of it, and how well do they succeed in using it? Large, open datasets allowing the exploration of such questions are hard to find. Here we report a dataset collected by a smartphone application, in which players navigate between fixed-length source and destination English words step by step, by changing only one letter at a time. The paths reflect how the players master their navigation skills in such a foreign networked world. The dataset can be used in the study of human mental models of the world around us, or in a broader scope to investigate navigation strategies in complex networked systems.
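The underlying word-morph network is easy to reconstruct: words of equal length are nodes, and an edge connects words differing by one letter. The sketch below builds such a graph with networkx and finds a shortest path; the tiny word list is illustrative, whereas the dataset uses a full English lexicon.

```python
# Word-ladder graph and shortest path between two words.
import networkx as nx
from itertools import combinations

words = ["cold", "cord", "card", "ward", "warm", "word", "wart"]

def one_letter_apart(a, b):
    return sum(x != y for x, y in zip(a, b)) == 1

G = nx.Graph()
G.add_nodes_from(words)
G.add_edges_from((a, b) for a, b in combinations(words, 2)
                 if one_letter_apart(a, b))

print(nx.shortest_path(G, "cold", "warm"))  # e.g. cold -> cord -> card -> ward -> warm
```

Comparing human paths against such shortest paths is one way to quantify how well players "master" the foreign network.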
Klein, Max; Sharma, Rati; Bohrer, Chris H; Avelis, Cameron M; Roberts, Elijah
2017-01-15
Data-parallel programming techniques can dramatically decrease the time needed to analyze large datasets. While these methods have provided significant improvements for sequencing-based analyses, other areas of biological informatics have not yet adopted them. Here, we introduce Biospark, a new framework for performing data-parallel analysis on large numerical datasets. Biospark builds upon the open source Hadoop and Spark projects, bringing domain-specific features for biology. Source code is licensed under the Apache 2.0 open source license and is available at the project website: https://www.assembla.com/spaces/roberts-lab-public/wiki/Biospark. Contact: eroberts@jhu.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
NASA Astrophysics Data System (ADS)
Polverari, F.; Talone, M.; Crapolicchio, R.; Levy, G.; Marzano, F.
2013-12-01
The European Remote-sensing Satellite (ERS)-2 scatterometer provides wind retrievals over the ocean. To satisfy the need for a high-quality and homogeneous set of scatterometer measurements, the European Space Agency (ESA) has developed the Advanced Scatterometer Processing System (ASPS) project, with which a long-term dataset of new ERS-2 wind products, with an enhanced resolution of 25 km, has been generated by reprocessing the entire ERS mission. This paper presents the main results of the validation of this new dataset using in situ measurements provided by the Prediction and Research Moored Array in the Tropical Atlantic (PIRATA). The comparison indicates that, on average, the scatterometer data agree well with buoy measurements; however, the scatterometer tends to overestimate low winds and underestimate high winds.
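Validation statistics of the kind used above typically start from bias and RMSE of collocated satellite and buoy wind speeds, as in this sketch. The sample values are made up for illustration.

```python
# Bias and RMSE of scatterometer vs. buoy wind speeds (toy collocations).
import numpy as np

buoy = np.array([3.2, 5.1, 7.4, 10.2, 13.6])   # m/s, in situ reference
scat = np.array([3.9, 5.5, 7.3, 9.6, 12.4])    # m/s, satellite retrieval

bias = np.mean(scat - buoy)
rmse = np.sqrt(np.mean((scat - buoy) ** 2))
print(f"bias = {bias:+.2f} m/s, rmse = {rmse:.2f} m/s")
# Note the toy sample mimics the reported pattern: overestimation at low
# winds, underestimation at high winds.
```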
A Science Strategy for Space Physics
NASA Technical Reports Server (NTRS)
1995-01-01
This report by the Committee on Solar and Space Physics and the Committee on Solar-Terrestrial Research recommends the major directions for scientific research in space physics for the coming decade. As a field of science, space physics has passed through the stage of simply looking to see what is out beyond Earth's atmosphere. It has become a 'hard' science, focusing on understanding the fundamental interactions between charged particles, electromagnetic fields, and gases in the natural laboratory consisting of the galaxy, the Sun, the heliosphere, and planetary magnetospheres, ionospheres, and upper atmospheres. The motivation for space physics research goes far beyond basic physics and intellectual curiosity, however, because long-term variations in the brightness of the Sun affect the very habitability of the Earth, while sudden rearrangements of magnetic fields above the solar surface can have profound effects on the delicate balance of the forces that shape our environment in space and on the human technology that is sensitive to that balance. The several subfields of space physics share the following objectives: to understand the fundamental laws or processes of nature as they apply to space plasmas and rarefied gases, both on the microscale and in the larger complex systems that constitute the domain of space physics; to understand the links between changes in the Sun and the resulting effects at the Earth, with the eventual goal of predicting the significant effects on the terrestrial environment; and to continue the exploration and description of the plasmas and rarefied gases in the solar system.
Book Review: Physics of the Space Environment
NASA Technical Reports Server (NTRS)
Holman, Gordon D.
1998-01-01
Space physics, narrowly defined as the study of Earth's plasma environment, has had an identity crisis throughout its relatively brief existence as a discipline. The limited and often serendipitous nature of the data requires the research style of an astrophysicist. However, the in situ observations and instrumentation that are central to the field are quite different from the remote observations and instrumentation of astronomy. Compared to neutral gases, the wealth of additional phenomena and the complexity associated with magnetized plasmas and their interactions leave the field with little in common with atmospheric science. Although the phenomena studied in space physics are ultimately important to astrophysics, the intimate measurements of plasma properties provide a greater commonality with plasma physics. Space physics has experienced something of a renaissance in the past few years. The interdisciplinary umbrella of "Solar-Terrestrial Physics" or the "Sun-Earth Connection" has stimulated increasing interaction among space physicists, solar physicists, and atmospheric scientists. Spectacular images of the Sun from Yohkoh and SOHO, and solar-activity-related damage to communications satellites, have increased the public's awareness of and interest in "space weather". The dangers that energetic particles and currents in space pose to technological systems and to future space exploration have elevated space physics observations from interesting scientific measurements that can be included on a space probe to critically important measurements that must be made.
Xu, Haotong; Li, Xiaoxiao; Zhang, Zhengzhi; Qiu, Mingguo; Mu, Qiwen; Wu, Yi; Tan, Liwen; Zhang, Shaoxiang; Zhang, Xiaoming
2011-01-01
Background: The major hindrance to multidetector CT imaging of the left extraperitoneal space (LES), and of its detailed spatial relationships to related spaces, is that there is no obvious density difference between them. Traditional gross anatomy and thick-slice sectional anatomy imagery are also insufficient to show the anatomic features of this narrow space in three dimensions (3D). To overcome these obstacles, we used a new method to visualize the anatomic features of the LES and its spatial associations with related spaces, in arbitrary sections and in 3D. Methods: In conjunction with Mimics® and Amira® software, we used thin-slice cross-sectional images of the upper abdomen, retrieved from the Chinese and American Visible Human datasets and the Chinese Virtual Human dataset, to display the anatomic features of the LES and the spatial relationships of the LES to its related spaces, especially the gastric bare area. The anatomic location of the LES was presented on 3D sections reconstructed from CVH2 images and CT images. Principal Findings: Of particular note, our results show that the LES consists of the left sub-diaphragmatic fat space and the gastric bare area. The appearance of the fat pad at the cardiac notch contributes to converting the shape of the anteroexternal surface of the LES from triangular to trapezoidal. Moreover, the LES is adjacent to the lesser omentum and the hepatic bare area in the anterointernal and right rear directions, respectively. Conclusion: The LES and its related spaces were imaged in 3D using visualization techniques for the first time. This technique is a promising new method for exploring detailed communication relationships among other abdominal spaces, and will promote research on the dynamic extension of abdominal diseases, such as acute pancreatitis and intra-abdominal carcinomatosis. PMID:22087259
REU Solar and Space Physics Summer School
NASA Astrophysics Data System (ADS)
Snow, M. A.; Wood, E. L.
2011-12-01
The Research Experiences for Undergraduates (REU) program in Solar and Space Physics at the University of Colorado begins with a week of lectures and labs on solar and space physics. The students in our program come from a variety of majors (physics, engineering, meteorology, etc.) and from a wide range of schools (from small liberal arts colleges up through large research universities). The majority of the students have never been exposed to solar and space physics before arriving in Boulder to begin their research projects. We have developed a week-long crash course in the field, using the expertise of scientists in Boulder and labs designed by the Center for Integrated Space Weather Modeling (CISM).
Data Bookkeeping Service 3 - Providing Event Metadata in CMS
DOE Office of Scientific and Technical Information (OSTI.GOV)
Giffels, Manuel; Guo, Y.; Riley, Daniel
The Data Bookkeeping Service 3 provides a catalog of event metadata for Monte Carlo and recorded data of the Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider (LHC) at CERN, Geneva. It comprises all necessary information for tracking datasets, their processing history, and the associations between runs, files, and datasets, on a large scale of about 200,000 datasets and more than 40 million files, which adds up to around 700 GB of metadata. The DBS is an essential part of the CMS Data Management and Workload Management (DMWM) systems [1]; all kinds of data processing, such as Monte Carlo production, the processing of recorded event data, and physics analysis done by the users, rely heavily on the information stored in DBS.
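The sketch below illustrates the bookkeeping idea in miniature: datasets, their files, and a dataset-level parentage link that records processing history. It uses SQLite for illustration; the table and column names are assumptions, not the actual DBS schema.

```python
# Hedged sketch of a DBS-style catalog: datasets, files, processing parentage.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE datasets (
    id INTEGER PRIMARY KEY,
    name TEXT UNIQUE,          -- e.g. /Primary/Processed/TIER
    parent_id INTEGER REFERENCES datasets(id)
);
CREATE TABLE files (
    id INTEGER PRIMARY KEY,
    dataset_id INTEGER REFERENCES datasets(id),
    lfn TEXT UNIQUE,           -- logical file name
    n_events INTEGER
);
""")
con.execute("INSERT INTO datasets (id, name) VALUES (1, '/Demo/Raw/RAW')")
con.execute("INSERT INTO datasets (id, name, parent_id) "
            "VALUES (2, '/Demo/Reco/AOD', 1)")
con.execute("INSERT INTO files (dataset_id, lfn, n_events) "
            "VALUES (2, '/store/demo/aod_1.root', 5000)")

# Processing history: look up the parent of a derived dataset
row = con.execute("""
    SELECT d.name, p.name FROM datasets d
    LEFT JOIN datasets p ON d.parent_id = p.id
    WHERE d.name = '/Demo/Reco/AOD'
""").fetchone()
print(row)
```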
Overview of NASA's Carbon Monitoring System Flux-Pilot Project
NASA Technical Reports Server (NTRS)
Pawson, Steven; Gunson, Michael R.; Jucks, Kenneth
2011-01-01
NASA's space-based observations of physical, chemical, and biological parameters in the Earth system, along with state-of-the-art modeling capabilities, provide unique capabilities for analyses of the carbon cycle. The Carbon Monitoring System is developing an exploratory framework for detecting carbon in the environment and its changes, with a view towards contributing to national and international monitoring activities. The Flux-Pilot Project aims to provide a unified view of land-atmosphere and ocean-atmosphere carbon exchange, using observation-constrained models. Central to the project is the application of NASA's satellite observations (especially MODIS), the ACOS retrievals of the JAXA-GOSAT observations, and the "MERRA" meteorological reanalysis produced with GEOS-5. With a primary objective of estimating uncertainty in computed fluxes, two land and two ocean systems are run for 2009-2010 and compared with existing flux estimates. A transport model is used to evaluate simulated CO2 concentrations against in-situ and space-based observations, in order to assess the realism of the fluxes and how uncertainties in fluxes propagate into atmospheric concentrations that can be more readily evaluated. Finally, the atmospheric partial CO2 columns observed from space are inverted to give new estimates of surface fluxes, which are evaluated using the bottom-up estimates and independent datasets. The focus of this presentation will be on the science goals and current achievements of the pilot project, with emphasis on how policy-relevant questions help focus the scientific direction. Examples include the question of what spatio-temporal resolution of fluxes can be detected from polar-orbiting satellites, and whether it is possible to use space-based observations to separate the contributions to atmospheric concentrations of, say, fossil-fuel and biological activity.
Govindhan, R; Karthikeyan, B
2017-10-01
The data presented in this article relate to research on UV-A stable nanotubes. The nanotubes have been prepared from a 3,5-bis(trifluoromethyl)benzylamine derivative of tyrosine (BTTP). XRD data reveal the size of the nanotubes. The as-synthesized nanotubes (BTTPNTs) are characterized by UV-vis optical absorption studies [1] and photophysical degradation kinetics. The resulting dataset is made available to enable critical or extended analyses of BTTPNTs as excellent light-resistant materials.
Green Space Visits among Adolescents: Frequency and Predictors in the PIAMA Birth Cohort Study.
Bloemsma, Lizan D; Gehring, Ulrike; Klompmaker, Jochem O; Hoek, Gerard; Janssen, Nicole A H; Smit, Henriëtte A; Vonk, Judith M; Brunekreef, Bert; Lebret, Erik; Wijga, Alet H
2018-04-30
Green space may influence health through several pathways, for example, increased physical activity, enhanced social cohesion, reduced stress, and improved air quality. For green space to increase physical activity and social cohesion, spending time in green spaces is likely to be important. We examined whether adolescents visit green spaces and for what purposes. Furthermore, we assessed the predictors of green space visits. In this cross-sectional study, data for 1911 participants of the Dutch PIAMA (Prevention and Incidence of Asthma and Mite Allergy) birth cohort were analyzed. At age 17, adolescents reported how often they visited green spaces for physical activities, social activities, relaxation, and to experience nature and quietness. We assessed the predictors of green space visits altogether and for different purposes by log-binomial regression. Fifty-three percent of the adolescents visited green spaces at least once a week in summer, mostly for physical and social activities. Adolescents reporting that a green environment was (very) important to them visited green spaces most frequently {adjusted prevalence ratio (PR) [95% confidence interval (CI)] very vs. not important: 6.84 (5.10, 9.17) for physical activities and 4.76 (3.72, 6.09) for social activities}. Boys and adolescents with highly educated fathers visited green spaces more often for physical and social activities. Adolescents who own a dog visited green spaces more often to experience nature and quietness. Green space visits were not associated with the objectively measured quantity of residential green space, i.e., the average normalized difference vegetation index (NDVI) and percentages of urban, agricultural, and natural green space in circular buffers around the adolescents' homes. Subjective variables are stronger predictors of green space visits in adolescents than the objectively measured quantity of residential green space. https://doi.org/10.1289/EHP2429.
Crowdsourcing Physical Network Topology Mapping With Net.Tagger
2016-03-01
Net.Tagger is a novel approach to network infrastructure mapping that combines smartphone apps with crowdsourced collection to gather data for offline aggregation and analysis. The project aims to build a map of physical network infrastructure such as fiber-optic cables, facilities, and access points. Further work covers the backend server infrastructure, including a full security audit, better web services handling, and integration with the OSM stack and dataset.
Twenty years of space radiation physics at the BNL AGS and NASA Space Radiation Laboratory.
Miller, J; Zeitlin, C
2016-06-01
Highly ionizing atomic nuclei (HZE) in the galactic cosmic rays (GCR) will be a significant source of radiation exposure for humans on extended missions outside low Earth orbit. Accelerators such as the LBNL Bevalac and the BNL AGS, designed decades ago for fundamental nuclear and particle physics research, subsequently found use as sources of GCR-like particles for ground-based physics and biology research relevant to space flight. The NASA Space Radiation Laboratory at BNL was constructed specifically for space radiation research. Here we review some of the space-related physics results obtained over the first 20 years of NASA-sponsored research at Brookhaven.
NASA Technical Reports Server (NTRS)
1986-01-01
Activities of the Goddard Space Flight Center are described in the areas of planets and interplanetary media, comets, astronomy and high-energy physics, solar physics, atmospheres, terrestrial physics, ocean science, sensors and space technology, techniques, user space data systems, space communications and navigation, and system and software engineering. Flight projects and mission definition studies are presented, and institutional technology is described.
HIGH-RESOLUTION DATASET OF URBAN CANOPY PARAMETERS FOR HOUSTON, TEXAS
Urban dispersion and air quality simulation models applied at various horizontal scales require different levels of fidelity for specifying the characteristics of the underlying surfaces. As the modeling scales approach the neighborhood level (~1 km horizontal grid spacing), the...
Designing Solar Data Archives: Practical Considerations
NASA Astrophysics Data System (ADS)
Messerotti, M.
The variety of new solar observatories in space and on the ground poses the stringent problem of an efficient storage and archiving of huge datasets. We briefly address some typical architectures and consider the key point of data access and distribution through networking.
Gong, Yi; Gallacher, John; Palmer, Stephen; Fone, David
2014-03-19
The built environment in which older people live plays an important role in promoting or inhibiting physical activity. Most work on this complex relationship between physical activity and the environment has excluded people with reduced physical function or ignored the difference between groups with different levels of physical function. This study aims to explore the role of neighbourhood green space in determining levels of participation in physical activity among elderly men with different levels of lower extremity physical function. Using data collected from the Caerphilly Prospective Study (CaPS) and green space data collected from high resolution Landmap true colour aerial photography, we first investigated the effect of the quantity of neighbourhood green space and the variation in neighbourhood vegetation on participation in physical activity for 1,010 men aged 66 and over in Caerphilly county borough, Wales, UK. Second, we explored whether neighbourhood green space affects groups with different levels of lower extremity physical function in different ways. Increasing percentage of green space within a 400 meters radius buffer around the home was significantly associated with more participation in physical activity after adjusting for lower extremity physical function, psychological distress, general health, car ownership, age group, marital status, social class, education level and other environmental factors (OR = 1.21, 95% CI 1.05, 1.41). A statistically significant interaction between the variation in neighbourhood vegetation and lower extremity physical function was observed (OR = 1.92, 95% CI 1.12, 3.28). Elderly men living in neighbourhoods with more green space have higher levels of participation in regular physical activity. The association between variation in neighbourhood vegetation and regular physical activity varied according to lower extremity physical function. Subjects reporting poor lower extremity physical function living in neighbourhoods with more homogeneous vegetation (i.e. low variation) were more likely to participate in regular physical activity than those living in neighbourhoods with less homogeneous vegetation (i.e. high variation). Good lower extremity physical function reduced the adverse effect of high variation vegetation on participation in regular physical activity. This provides a basis for the future development of novel interventions that aim to increase levels of physical activity in later life, and has implications for planning policy to design, preserve, facilitate and encourage the use of green space near home. PMID:24646136
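The odds ratios above come from a logistic-type regression of activity on green space exposure plus covariates. The sketch below shows the general shape of such a model; the data values and column names are fabricated for illustration only, and the study's full covariate set is omitted.

```python
# Hedged sketch: logistic regression yielding odds ratios for activity
# vs. percentage of green space in a 400 m buffer.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "active": rng.integers(0, 2, 200),           # regular physical activity
    "pct_green_400m": rng.uniform(5, 80, 200),   # % green space in buffer
    "age_group": rng.integers(0, 3, 200),
})

model = smf.glm(
    "active ~ pct_green_400m + C(age_group)",
    data=df,
    family=sm.families.Binomial(),               # logit link by default
).fit()
print(np.exp(model.params))                      # exp(coef) = odds ratios
```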
Carmen Legaz-García, María Del; Miñarro-Giménez, José Antonio; Menárguez-Tortosa, Marcos; Fernández-Breis, Jesualdo Tomás
2016-06-03
Biomedical research usually requires combining large volumes of data from multiple heterogeneous sources, which makes the integrated exploitation of such data difficult. The Semantic Web paradigm offers a natural technological space for data integration and exploitation by generating content readable by machines. Linked Open Data is a Semantic Web initiative that promotes the publication and sharing of data in machine-readable semantic formats. We present an approach for the transformation and integration of heterogeneous biomedical data with the objective of generating open biomedical datasets in Semantic Web formats. The transformation of the data is based on mappings between the entities of the data schema and the ontological infrastructure that provides meaning to the content. Our approach permits different types of mappings and includes the possibility of defining complex transformation patterns. Once the mappings are defined, they can be automatically applied to datasets to generate logically consistent content, and the mappings can be reused in further transformation processes. The results of our research are (1) a common transformation and integration process for heterogeneous biomedical data; (2) the application of Linked Open Data principles to generate interoperable, open, biomedical datasets; (3) a software tool, called SWIT, that implements the approach. In this paper we also describe how we have applied SWIT in different biomedical scenarios and the lessons we have learned. We have presented an approach that is able to generate open biomedical repositories in Semantic Web formats. SWIT is able to apply the Linked Open Data principles in the generation of the datasets, thereby allowing their content to be linked to external repositories and creating linked open datasets. SWIT datasets may contain data from multiple sources and schemas, thus becoming integrated datasets.
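The following is a minimal sketch of the kind of transformation such a tool performs: mapping rows of a source schema onto ontology-backed RDF triples with rdflib. The namespace, class, and property IRIs are illustrative placeholders, not SWIT's actual mappings.

```python
# Hedged sketch: tabular records -> RDF triples -> Turtle serialization.
from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/biomed#")   # placeholder ontology namespace
g = Graph()
g.bind("ex", EX)

patients = [{"id": "p1", "diagnosis": "C50", "age": 63}]  # source records

for rec in patients:
    subject = EX[rec["id"]]
    g.add((subject, RDF.type, EX.Patient))
    g.add((subject, EX.hasDiagnosisCode, Literal(rec["diagnosis"])))
    g.add((subject, EX.hasAge, Literal(rec["age"])))

print(g.serialize(format="turtle"))
```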
Pubface: Celebrity face identification based on deep learning
NASA Astrophysics Data System (ADS)
Ouanan, H.; Ouanan, M.; Aksasse, B.
2018-05-01
In this paper, we describe a new real-time application called PubFace, which recognizes celebrities in public spaces by employing a new pose-invariant face recognition deep neural network with an extremely low error rate. To build this application, we make the following contributions: firstly, we build a novel dataset with over five million labelled faces. Secondly, we fine-tune the deep convolutional neural network (CNN) VGG-16 architecture on this new dataset. Finally, we deploy the model on the Raspberry Pi 3 Model B using the OpenCV dnn module (OpenCV 3.3).
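Deployment with the OpenCV dnn module follows the pattern sketched below. The model files, input size, and mean values are placeholders for a VGG-style network; PubFace's actual weights and labels are not reproduced here.

```python
# Hedged sketch: run a fine-tuned CNN with OpenCV's dnn module.
import cv2
import numpy as np

net = cv2.dnn.readNetFromCaffe("deploy.prototxt", "vgg16_finetuned.caffemodel")

image = cv2.imread("face.jpg")
# VGG-style preprocessing: fixed input size and mean subtraction (assumed values)
blob = cv2.dnn.blobFromImage(image, scalefactor=1.0, size=(224, 224),
                             mean=(104.0, 117.0, 123.0))
net.setInput(blob)
probs = net.forward().ravel()

print("predicted class id:", int(np.argmax(probs)))
```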
Volcanoes Distribution in Linear Segmentation of Mariana Arc
NASA Astrophysics Data System (ADS)
Andikagumi, H.; Macpherson, C.; McCaffrey, K. J. W.
2016-12-01
A new method has been developed to better describe the distribution pattern of volcanoes within the Mariana Arc. A previous study assumed the distribution of volcanoes in the Mariana Arc is described by a small-circle distribution, which reflects the melting processes in a curved subduction zone. The small-circle fit to the dataset used in that study, comprising 12 (mainly subaerial) volcanoes from the Smithsonian Institution Global Volcanism Program, was reassessed by us to have a root-mean-square misfit of 2.5 km. The same method applied to a more complete dataset from Baker et al. (2008), consisting of 37 subaerial and submarine volcanoes, resulted in an 8.4 km misfit. However, using the Hough Transform method on the larger dataset, lower misfits of great-circle segments were achieved (3.1 and 3.0 km) for two possible segment combinations. The results indicate that the distribution of volcanoes in the Mariana Arc is better described by a great-circle pattern than a small-circle one. Variogram and cross-variogram analysis of volcano spacing and volume shows that there is spatial correlation between volcanoes at 420 to 500 km, which corresponds to the maximum segmentation lengths from the Hough Transform (320 km). Further analysis of volcano spacing using the coefficient of variation (Cv) shows a tendency toward non-random distribution, as the Cv values are closer to zero than to one. These distributions are inferred to be associated with the development of normal faults at the back arc, as their Cv values also tend towards zero. To analyse whether volcano spacing is random or not, Cv values were simulated using a Monte Carlo method with random input. Only for the southernmost segment could we reject, at the 95% confidence level, the null hypothesis that volcanoes are randomly spaced (estimated probability 0.007). This result indicates that regularity in volcano spacing rarely arises by chance, so lithospheric-scale controlling factors should be analysed with a different approach (not with a random number generator). The Sunda Arc, which has been reported to have en echelon segmentation and a larger number of volcanoes, will be studied next to understand the particular influence of the upper plate on volcano distribution.
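The spacing-regularity test described above can be sketched as follows: compute the coefficient of variation (Cv) of inter-volcano spacings and compare it to a Monte Carlo null of randomly placed volcanoes. The along-arc positions below are illustrative, not the Mariana data.

```python
# Cv of spacings with a Monte Carlo test against random placement.
import numpy as np

rng = np.random.default_rng(42)

def cv_of_spacings(positions):
    spacings = np.diff(np.sort(positions))
    return spacings.std() / spacings.mean()

# Hypothetical along-arc positions (km) of volcanoes on one segment
observed = np.array([0, 38, 75, 118, 152, 195, 230])
cv_obs = cv_of_spacings(observed)

# Null: same number of volcanoes placed uniformly at random on the segment
n_sim = 10000
seg_len = observed.max()
sims = np.array([cv_of_spacings(rng.uniform(0, seg_len, observed.size))
                 for _ in range(n_sim)])
p_value = np.mean(sims <= cv_obs)   # low Cv = more regular than random

print(f"Cv = {cv_obs:.3f}, Monte Carlo p = {p_value:.4f}")
```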
Removal of nuisance signals from limited and sparse 1H MRSI data using a union-of-subspaces model.
Ma, Chao; Lam, Fan; Johnson, Curtis L; Liang, Zhi-Pei
2016-02-01
To remove nuisance signals (e.g., water and lipid signals) from 1H MRSI data collected from the brain with limited and/or sparse (k, t)-space coverage. A union-of-subspaces model is proposed for removing nuisance signals. The model exploits the partial separability of both the nuisance signals and the metabolite signal, and decomposes an MRSI dataset into several sets of generalized voxels that share the same spectral distributions. This model enables the estimation of the nuisance signals from an MRSI dataset that has limited and/or sparse (k, t)-space coverage. The proposed method has been evaluated using in vivo MRSI data. For conventional chemical shift imaging data with limited k-space coverage, the proposed method produced "lipid-free" spectra without lipid suppression during data acquisition at 130 ms echo time. For sparse (k, t)-space data acquired with conventional pulses for water and lipid suppression, the proposed method was also able to remove the remaining water and lipid signals with negligible residuals. Nuisance signals in 1H MRSI data reside in low-dimensional subspaces. This property can be utilized for estimation and removal of nuisance signals from 1H MRSI data even when they have limited and/or sparse coverage of (k, t)-space. The proposed method should prove useful especially for accelerated high-resolution 1H MRSI of the brain.
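The core subspace idea can be sketched in a few lines of numpy: if the nuisance signals lie in a low-dimensional temporal subspace, estimate that subspace (here via SVD of nuisance-dominated training data) and project it out. The dimensions, rank, and synthetic data are illustrative; the paper's actual estimation from limited (k, t)-space data is more involved.

```python
# Hedged sketch: low-rank nuisance subspace estimation and removal.
import numpy as np

rng = np.random.default_rng(0)

n_voxels, n_t, rank = 500, 256, 4

# Synthetic training data dominated by nuisance (water/lipid) signals
nuisance_train = (rng.standard_normal((n_voxels, rank))
                  @ rng.standard_normal((rank, n_t)))

# Estimate the nuisance temporal subspace from the right singular vectors
_, _, vt = np.linalg.svd(nuisance_train, full_matrices=False)
v_nuis = vt[:rank]                       # (rank, n_t) orthonormal basis

data = rng.standard_normal((n_voxels, n_t))   # stand-in for acquired data
projector = np.eye(n_t) - v_nuis.conj().T @ v_nuis
cleaned = data @ projector               # remove the nuisance component

print(np.linalg.norm(cleaned @ v_nuis.conj().T))  # ~0 along nuisance subspace
```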
Persistent identifiers for CMIP6 data in the Earth System Grid Federation
NASA Astrophysics Data System (ADS)
Buurman, Merret; Weigel, Tobias; Juckes, Martin; Lautenschlager, Michael; Kindermann, Stephan
2016-04-01
The Earth System Grid Federation (ESGF) is a distributed data infrastructure that will provide access to the CMIP6 experiment data. The data consist of thousands of datasets composed of millions of files. Over the course of the CMIP6 operational phase, datasets may be retracted and replaced by newer versions that consist of completely or partly new files. Each dataset is hosted at a single data centre, but can have one or several backups (replicas) at other data centres. To keep track of the different data entities and the relationships between them, to ensure their consistency, and to improve the exchange of information about them, Persistent Identifiers (PIDs) are used. These are unique identifiers that are registered at a globally accessible server, along with some metadata (the PID record). While usually providing access to the data object they refer to, as long as it exists, the metadata record will remain available even beyond the object's lifetime. Besides providing access to data and metadata, PIDs will allow scientists to communicate effectively and at a fine granularity about CMIP6 data. The initiative to introduce PIDs in the ESGF infrastructure has been described and agreed upon through a series of white papers governed by the WGCM Infrastructure Panel (WIP). In CMIP6, each dataset and each file is assigned a PID that keeps track of the data object's physical copies throughout the object's lifetime. In addition to this, its relationship with other data objects is stored in the PID record. A human-readable version of this information is available on an information page that is also linked from the PID record. A possible application that exploits the information available from the PID records is a smart information tool, which a scientific user can call to find out whether his/her version was replaced by a new one, to view and browse the related datasets and files, and to get access to the various copies or to additional metadata on a dedicated website. The PID registration process is embedded in the ESGF data publication process. During their first publication, the PID records are populated with metadata including the parent dataset(s), other existing versions, and the physical location. Every subsequent publication, un-publication, or replica publication of a dataset or file then updates the PID records to keep track of changing physical locations of the data (or lack thereof) and of reported errors in the data. Assembling the metadata records and registering the PIDs on a central server is a potential performance bottleneck, as millions of data objects may be published in a short timeframe when the CMIP6 experiment phase begins. For this reason, the PID registration and metadata update tasks are pushed to a message queueing system, facilitating high availability and scalability, and are then processed asynchronously. This will lead to a slight delay in PID registration but will avoid blocking resources at the data centres and slowing down the publication of the data so eagerly awaited by the scientists.
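The asynchronous registration pattern described above looks roughly like the following sketch: instead of registering a PID synchronously, the publisher enqueues a task for later processing. It uses RabbitMQ via the pika client; the queue name and task fields are assumptions, not the actual ESGF implementation.

```python
# Hedged sketch: enqueue a PID registration task for asynchronous processing.
import json
import pika

task = {
    "action": "register",
    "handle": "21.14100/abc-123",                 # hypothetical PID
    "dataset": "CMIP6.CMIP.MODEL.experiment.v1",  # hypothetical dataset id
    "replica": False,
}

connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()
channel.queue_declare(queue="pid_tasks", durable=True)  # survive broker restarts
channel.basic_publish(exchange="", routing_key="pid_tasks",
                      body=json.dumps(task))
connection.close()
```

A consumer process would then drain the queue and talk to the central PID server, so slow registrations never block publication at the data centres.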
Recently amplified arctic warming has contributed to a continual global warming trend
NASA Astrophysics Data System (ADS)
Huang, Jianbin; Zhang, Xiangdong; Zhang, Qiyi; Lin, Yanluan; Hao, Mingju; Luo, Yong; Zhao, Zongci; Yao, Yao; Chen, Xin; Wang, Lei; Nie, Suping; Yin, Yizhou; Xu, Ying; Zhang, Jiansong
2017-12-01
The existence and magnitude of the recently suggested global warming hiatus, or slowdown, have been strongly debated [1-3]. Although various physical processes [4-8] have been examined to elucidate this phenomenon, the accuracy and completeness of the observational data that comprise global average surface air temperature (SAT) datasets is a concern [9,10]. In particular, these datasets lack either complete geographic coverage or in situ observations over the Arctic, owing to the sparse observational network in this area [9]. As a consequence, the contribution of Arctic warming to global SAT changes may have been underestimated, leading to an uncertainty in the hiatus debate. Here, we constructed a new Arctic SAT dataset using the most recently updated global SATs [2] and a drifting-buoy-based Arctic SAT dataset [11], by employing the 'data interpolating empirical orthogonal functions' method [12]. Our estimate of the global SAT rate of increase for 1998-2012 is around 0.112 °C per decade, instead of the 0.05 °C per decade from IPCC AR5 [1]. Analysis of this dataset shows that the amplified Arctic warming over the past decade has significantly contributed to a continual global warming trend, rather than a hiatus or slowdown.
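A decadal trend like the one quoted above is just a linear least-squares fit to annual temperature anomalies, as in this sketch. The anomaly series is synthetic, constructed to have a known trend plus noise.

```python
# Worked example: fitting a decadal temperature trend.
import numpy as np

rng = np.random.default_rng(3)
years = np.arange(1998, 2013)
true_trend = 0.0112                       # deg C per year (0.112 per decade)
anomaly = true_trend * (years - years[0]) + rng.normal(0, 0.05, years.size)

slope, intercept = np.polyfit(years, anomaly, 1)
print(f"fitted trend: {slope * 10:.3f} deg C per decade")
```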
Moving through Life-Space Areas and Objectively Measured Physical Activity of Older People.
Portegijs, Erja; Tsai, Li-Tang; Rantanen, Taina; Rantakokko, Merja
2015-01-01
Physical activity, an important determinant of health and function in old age, may vary according to the life-space area reached. Our aim was to study how moving through greater life-space areas is associated with greater physical activity of community-dwelling older people. The association between objectively measured physical activity and the life-space area reached on different days by the same individual was studied using one-week longitudinal data, to provide insight into causal relationships. One-week surveillance of objectively assessed physical activity of community-dwelling 70-90-year-old people in central Finland from the "Life-space mobility in old age" cohort substudy (N = 174). In spring 2012, participants wore an accelerometer for 7 days and completed a daily diary including the largest life-space area reached (inside home, outside home, neighborhood, town, and beyond town). The daily step count and the time in moderate (incl. walking) and low activity and sedentary behavior were assessed. Differences in physical activity between days on which different life-space areas were reached were tested using generalized estimating equation models (within-group comparison). Participants' mean age was 80.4±4.2 years and 63.5% were female. Participants had higher average step counts (p < .001) and greater moderate and low activity time (p < .001) on days when greater life-space areas were reached, from the home to the town area. Only low activity time continued to increase when moving beyond the town. Community-dwelling older people were more physically active on days when they moved through greater life-space areas. While it is unknown whether physical activity was a motivator to leave the home, intervention studies are needed to determine whether facilitation of daily outdoor mobility, regardless of the purpose, may be beneficial in terms of promoting physical activity.
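A generalized estimating equation analysis of this kind, with repeated daily observations per person and the life-space area as a within-person predictor, can be sketched as follows. The data are fabricated and the model family and correlation structure are assumptions, not the study's exact specification.

```python
# Hedged sketch: GEE with repeated daily measures clustered by person.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n_people, n_days = 50, 7
df = pd.DataFrame({
    "id": np.repeat(np.arange(n_people), n_days),
    "life_space": rng.integers(1, 6, n_people * n_days),  # 1=home ... 5=beyond town
})
df["steps"] = rng.poisson(1500 + 600 * df["life_space"])

model = smf.gee(
    "steps ~ life_space",
    groups="id",                                  # cluster by participant
    data=df,
    cov_struct=sm.cov_struct.Exchangeable(),      # within-person correlation
).fit()
print(model.summary())
```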
Audigier, Chloé; Mansi, Tommaso; Delingette, Hervé; Rapaka, Saikiran; Passerini, Tiziano; Mihalef, Viorel; Jolly, Marie-Pierre; Pop, Raoul; Diana, Michele; Soler, Luc; Kamen, Ali; Comaniciu, Dorin; Ayache, Nicholas
2017-09-01
We aim to develop a framework for the validation of a subject-specific multi-physics model of liver tumor radiofrequency ablation (RFA). The RFA computation becomes subject-specific after several levels of personalization: geometrical and biophysical (hemodynamics, heat transfer and an extended cellular necrosis model). We present a comprehensive experimental setup combining multimodal, pre- and postoperative anatomical and functional images, as well as the interventional monitoring of intra-operative signals: the temperature and delivered power. To exploit this dataset, an efficient processing pipeline is introduced, which copes with image noise, variable resolution and anisotropy. The validation study includes twelve ablations from five healthy pig livers: a mean point-to-mesh error between predicted and actual ablation extent of 5.3 ± 3.6 mm is achieved. This enables an end-to-end preclinical validation framework that considers the available dataset.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dhou, S; Cai, W; Hurwitz, M
2015-06-15
Purpose: Respiratory-correlated cone-beam CT (4DCBCT) images acquired immediately prior to treatment have the potential to represent patient motion patterns and anatomy during treatment, including both intra- and inter-fractional changes. We develop a method to generate patient-specific motion models based on 4DCBCT images acquired with existing clinical equipment, and use them to generate time-varying volumetric images (3D fluoroscopic images) representing motion during treatment delivery. Methods: Motion models are derived by deformably registering each 4DCBCT phase to a reference phase and performing principal component analysis (PCA) on the resulting displacement vector fields. 3D fluoroscopic images are estimated by iteratively optimizing the resulting PCA coefficients through comparison of cone-beam projections simulating kV treatment imaging with digitally reconstructed radiographs generated from the motion model. Patient and physical phantom datasets are used to evaluate the method in terms of tumor localization error compared to manually defined ground-truth positions. Results: 4DCBCT-based motion models were derived and used to generate 3D fluoroscopic images at treatment time. For the patient datasets, the average tumor localization error and the 95th percentile were 1.57 and 3.13, respectively, in subsets of four patient datasets. For the physical phantom datasets, the average tumor localization error and the 95th percentile were 1.14 and 2.78, respectively, in two datasets. 4DCBCT motion models are shown to perform well in the context of generating 3D fluoroscopic images due to their ability to reproduce anatomical changes at treatment time. Conclusion: This study showed the feasibility of deriving 4DCBCT-based motion models and using them to generate 3D fluoroscopic images at treatment time in real clinical settings. 4DCBCT-based motion models were found to account for the 3D non-rigid motion of the patient anatomy during treatment and have the potential to localize tumors and other patient anatomical structures at treatment time even when inter-fractional changes occur. This project was supported, in part, through a Master Research Agreement with Varian Medical Systems, Inc., Palo Alto, CA. The project was also supported, in part, by Award Number R21CA156068 from the National Cancer Institute.
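The motion model here is a PCA over deformable-registration displacement fields. A minimal sketch of how such a model is built (array sizes are toy values, and `dvf_from_coeffs` is a hypothetical helper whose coefficients would be optimized against measured projections in the actual method):

```python
import numpy as np
from sklearn.decomposition import PCA

# displacement vector fields from registering each 4DCBCT phase to a
# reference phase, flattened to one row per phase
n_phases, n_voxels = 10, 64 * 64 * 32
rng = np.random.default_rng(0)
dvfs = rng.normal(size=(n_phases, 3 * n_voxels))   # (x, y, z) per voxel

pca = PCA(n_components=3)
coeffs = pca.fit_transform(dvfs)                   # per-phase PCA coefficients
mean_dvf = pca.mean_

def dvf_from_coeffs(w):
    """Reconstruct a displacement field from PCA coefficients w; in the
    method above, w is optimized so that simulated projections match
    the measured kV projections at treatment time."""
    return mean_dvf + w @ pca.components_
```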
NASA Astrophysics Data System (ADS)
Hiebl, Johann; Frei, Christoph
2018-04-01
Spatial precipitation datasets that are long-term consistent, highly resolved and extend over several decades are an increasingly popular basis for modelling and monitoring environmental processes and planning tasks in hydrology, agriculture, energy resources management, etc. Here, we present a grid dataset of daily precipitation for Austria meant to promote such applications. It has a grid spacing of 1 km, extends back to 1961 and is continuously updated. It is constructed with the classical two-tier analysis, involving separate interpolations for mean monthly precipitation and daily relative anomalies. The former was accomplished by kriging with topographic predictors as external drift, utilising 1249 stations. The latter is based on angular distance weighting and uses 523 stations. The input station network was kept largely stationary over time to avoid artefacts in long-term consistency. Example cases suggest that the new analysis is at least as plausible as previously existing datasets. Cross-validation and comparison against experimental high-resolution observations (WegenerNet) suggest that the accuracy of the dataset depends on interpretation. Users interpreting grid point values as point estimates must expect systematic overestimates for light precipitation and underestimates for heavy precipitation, as well as substantial random errors. Grid point estimates are typically within a factor of 1.5 of in situ observations. When grid point values are interpreted as area mean values, conditional biases are reduced and the magnitude of random errors is considerably smaller. Together with a similar dataset of temperature, the new dataset (SPARTACUS) is an interesting basis for modelling environmental processes, studying climate change impacts and monitoring the climate of Austria.
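Angular distance weighting interpolates a daily anomaly as a weighted mean of nearby stations, down-weighting stations that cluster in the same direction as each other. A simplified single-point sketch after New et al. (2000); the `m` and `cdd` parameters are illustrative, not the SPARTACUS settings:

```python
import numpy as np

def adw_interpolate(xy_obs, values, xy_target, m=4, cdd=60.0):
    """Angular distance weighting at one target point: an inverse-distance
    term with correlation decay distance `cdd` (same units as coordinates),
    multiplied by an isolation factor that boosts directionally isolated
    stations and damps clustered ones."""
    d = np.linalg.norm(xy_obs - xy_target, axis=1)
    dist_w = np.exp(-d / cdd) ** m
    theta = np.arctan2(xy_obs[:, 1] - xy_target[1], xy_obs[:, 0] - xy_target[0])
    w = np.empty_like(dist_w)
    for k in range(len(d)):
        others = np.arange(len(d)) != k
        a = np.sum(dist_w[others] * (1 - np.cos(theta[k] - theta[others])))
        w[k] = dist_w[k] * (1 + a / np.sum(dist_w[others]))
    return np.sum(w * values) / np.sum(w)

rng = np.random.default_rng(0)
stations = rng.uniform(0, 100, size=(20, 2))        # station coordinates, km
anoms = rng.gamma(2.0, 0.5, size=20)                # daily relative anomalies
print(adw_interpolate(stations, anoms, np.array([50.0, 50.0])))
```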
Khan, Arshad M.; Perez, Jose G.; Wells, Claire E.; Fuentes, Olac
2018-01-01
The rat has arguably the most widely studied brain among all animals, with numerous reference atlases for the rat brain having been published since 1946. For example, many neuroscientists have used the atlases of Paxinos and Watson (PW, first published in 1982) or Swanson (S, first published in 1992) as guides to probe or map specific rat brain structures and their connections. Despite nearly three decades of contemporaneous publication, no independent attempt has been made to establish a basic framework that allows data mapped in PW to be placed in register with S, or vice versa. Such data migration would allow scientists to accurately contextualize neuroanatomical data mapped exclusively in one atlas with data mapped in the other. Here, we provide a tool that allows levels from any of the seven published editions of atlases comprising three distinct PW reference spaces to be aligned to atlas levels from any of the four published editions representing S reference space. This alignment is based on registration of the anteroposterior stereotaxic coordinate (z) measured from the skull landmark, Bregma (β). Atlas level alignments performed along the z axis using one-dimensional Cleveland dot plots were in general agreement with alignments obtained independently using a custom-made computer vision application that utilized the scale-invariant feature transform (SIFT) and random sample consensus (RANSAC) operations to compare regions of interest in photomicrographs of Nissl-stained tissue sections from the PW and S reference spaces. We show that z-aligned point source data (unpublished hypothalamic microinjection sites) can be migrated from PW to S space to a first-order approximation in the mediolateral and dorsoventral dimensions using anisotropic scaling of the vector-formatted atlas templates, together with expert-guided relocation of obvious outliers in the migrated datasets. The migrated data can be contextualized with other datasets mapped in S space, including neuronal cell bodies, axons, and chemoarchitecture, to generate data-constrained hypotheses difficult to formulate otherwise. The alignment strategies provided in this study constitute a basic starting point for first-order, user-guided data migration between PW and S reference spaces along three dimensions that is potentially extensible to other spatial reference systems for the rat brain. PMID:29765309
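The computer-vision alignment described above pairs SIFT keypoints between sections and rejects outlier correspondences with RANSAC. A minimal OpenCV sketch of that pipeline (the file names are placeholders; the study's application adds region-of-interest handling around this core):

```python
import cv2
import numpy as np

# Nissl-stained sections from the two atlas reference spaces (placeholders)
img1 = cv2.imread("pw_level.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("s_level.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Lowe ratio-test matching, then RANSAC to reject outlier correspondences
matcher = cv2.BFMatcher()
good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
        if m.distance < 0.75 * n.distance]
src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
print("inlier matches:", int(inliers.sum()))
```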
EnviroAtlas - Cleveland, OH - Estimated Percent Tree Cover Along Walkable Roads
This EnviroAtlas dataset estimates tree cover along walkable roads. The road width is estimated for each road, and percent tree cover is calculated in an 8.5 meter strip beginning at the estimated road edge. Percent tree cover is calculated for each block between road intersections. In this community, tree cover is defined as Trees & Forest and Woody Wetlands. Tree cover provides valuable benefits to neighborhood residents and walkers by providing shade, improved aesthetics, and outdoor gathering spaces. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets)
EnviroAtlas - Minneapolis/St. Paul, MN - Estimated Percent Tree Cover Along Walkable Roads
This EnviroAtlas dataset estimates tree cover along walkable roads. The road width is estimated for each road, and percent tree cover is calculated in an 8.5 meter strip beginning at the estimated road edge. Percent tree cover is calculated for each block between road intersections. In this community, tree cover is defined as Trees and Forest and Woody Wetlands. Tree cover provides valuable benefits to neighborhood residents and walkers by providing shade, improved aesthetics, and outdoor gathering spaces. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas/EnviroAtlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets)
Optimal SVM parameter selection for non-separable and unbalanced datasets.
Jiang, Peng; Missoum, Samy; Chen, Zhao
2014-10-01
This article presents a study of three validation metrics used for the selection of optimal parameters of a support vector machine (SVM) classifier in the case of non-separable and unbalanced datasets. This situation is often encountered when the data is obtained experimentally or clinically. The three metrics selected in this work are the area under the ROC curve (AUC), accuracy, and balanced accuracy. These validation metrics are tested using computational data only, which enables the creation of fully separable sets of data. This way, non-separable datasets, representative of a real-world problem, can be created by projection onto a lower dimensional sub-space. The knowledge of the separable dataset, unknown in real-world problems, provides a reference to compare the three validation metrics using a quantity referred to as the "weighted likelihood". As an application example, the study investigates a classification model for hip fracture prediction. The data is obtained from a parameterized finite element model of a femur. The performance of the various validation metrics is studied for several levels of separability, ratios of unbalance, and training set sizes.
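For reference, the metric-dependent parameter selection the study compares can be sketched with scikit-learn's grid search on a deliberately unbalanced toy dataset; the grid values and data parameters below are illustrative only:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# unbalanced, partially non-separable two-class problem
X, y = make_classification(n_samples=400, weights=[0.9, 0.1],
                           flip_y=0.1, random_state=0)

grid = {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1]}
for metric in ("roc_auc", "accuracy", "balanced_accuracy"):
    search = GridSearchCV(SVC(kernel="rbf"), grid, scoring=metric, cv=5).fit(X, y)
    # different validation metrics can favor different (C, gamma) pairs
    print(metric, search.best_params_, round(search.best_score_, 3))
```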
Statistical Exploration of Electronic Structure of Molecules from Quantum Monte-Carlo Simulations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Prabhat, Mr; Zubarev, Dmitry; Lester, Jr., William A.
In this report, we present results from analysis of Quantum Monte Carlo (QMC) simulation data with the goal of determining the internal structure of the 3N-dimensional phase space of an N-electron molecule. We are interested in mining the simulation data for patterns that might be indicative of bond rearrangement as molecules change electronic states. We examined simulation output that tracks the positions of two coupled electrons in the singlet and triplet states of an H2 molecule. The electrons trace out a trajectory, which was analyzed with a number of statistical techniques. This project was intended to address the following scientific questions: (1) Do high-dimensional phase spaces characterizing the electronic structure of molecules tend to cluster in any natural way? Do we see a change in clustering patterns as we explore different electronic states of the same molecule? (2) Since it is hard to understand the high-dimensional space of trajectories, can we project these trajectories to a lower-dimensional subspace to gain a better understanding of patterns? (3) Do trajectories inherently lie in a lower-dimensional manifold? Can we recover that manifold? After extensive statistical analysis, we are now in a better position to respond to these questions. (1) We definitely see clustering patterns, and differences between the H2 and H2tri datasets. These are revealed by the pamk method in a fairly reliable manner and can potentially be used to distinguish bonded and non-bonded systems and gain insight into the nature of bonding. (2) Projecting to a lower-dimensional subspace (≈4-5 components) using PCA or kernel PCA reveals interesting patterns in the distribution of scalar values, which can be related to existing descriptors of the electronic structure of molecules. These results can also be used immediately to develop robust tools for analysis of noisy data obtained during QMC simulations. (3) All dimensionality reduction and estimation techniques that we tried seem to indicate that one needs 4 or 5 components to account for most of the variance in the data, hence the dataset does not necessarily lie on a well-defined, low-dimensional manifold. In terms of specific clustering techniques, K-means was generally useful in exploring the dataset. The partition around medoids (PAM) technique produced the most definitive results for our data, showing distinctive patterns for both a sample of the complete data and time series. The gap statistic with the Tibshirani criterion did not provide any distinction across the two datasets. The gap statistic with the DandF criterion, model-based clustering, and hierarchical modeling simply failed to run on our datasets. Thankfully, the vanilla PCA technique was successful in handling our entire dataset. PCA revealed some interesting patterns in the scalar value distribution. Kernel PCA techniques (vanilladot, RBF, polynomial) and MDS failed to run on the entire dataset, or even a significant fraction of it, so we resorted to creating an explicit feature map followed by conventional PCA. Clustering using K-means and PAM in the new basis set seems to produce promising results. Understanding the new basis set in the scientific context of the problem is challenging, and we are currently working to further examine and interpret the results.
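The fallback described above, an explicit feature map followed by conventional PCA when exact kernel PCA will not scale, can be sketched with scikit-learn's Nystroem approximation; the stand-in data and parameters below are illustrative, not the report's actual setup:

```python
import numpy as np
from sklearn.kernel_approximation import Nystroem
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# rows = time steps of a trajectory, columns = walker coordinates (stand-in)
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 6))

# approximate explicit RBF feature map, then ordinary PCA -- avoids
# holding the full N x N kernel matrix that exact kernel PCA requires
features = Nystroem(kernel="rbf", gamma=0.5, n_components=200,
                    random_state=0).fit_transform(X)
Z = PCA(n_components=5).fit_transform(features)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(Z)
```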
NASA Astrophysics Data System (ADS)
Heather, David; Besse, Sebastien; Vallat, Claire; Barbarisi, Isa; Arviset, Christophe; De Marchi, Guido; Barthelemy, Maud; Coia, Daniela; Costa, Marc; Docasal, Ruben; Fraga, Diego; Grotheer, Emmanuel; Lim, Tanya; MacFarlane, Alan; Martinez, Santa; Rios, Carlos; Vallejo, Fran; Saiz, Jaime
2017-04-01
The Planetary Science Archive (PSA) is the European Space Agency's (ESA) repository of science data from all planetary science and exploration missions. The PSA provides access to scientific datasets through various interfaces at http://psa.esa.int. All datasets are scientifically peer-reviewed by independent scientists, and are compliant with the Planetary Data System (PDS) standards. The PSA is currently implementing a number of significant improvements, mostly driven by the evolution of the PDS standard and the growing need for better interfaces and advanced applications to support science exploitation. As of the end of 2016, the PSA is hosting data from all of ESA's planetary missions. This includes ESA's first planetary mission, Giotto, which encountered comet 1P/Halley in 1986 with a flyby at 800 km. Science data from Venus Express, Mars Express, Huygens and the SMART-1 mission are also all available at the PSA. The PSA also contains all science data from Rosetta, which explored comet 67P/Churyumov-Gerasimenko and asteroids Steins and Lutetia. The year 2016 saw the arrival of the ExoMars 2016 data in the archive. In the upcoming years, at least three new projects are foreseen to be fully archived at the PSA. The BepiColombo mission is scheduled for launch in 2018, followed by the ExoMars Rover Surface Platform (RSP) in 2020 and then the JUpiter ICy moons Explorer (JUICE). All of these will archive their data in the PSA. In addition, a few ground-based support programmes are also available, especially for the Venus Express and Rosetta missions. The newly designed PSA will enhance the user experience and significantly reduce the complexity for users to find their data, promoting one-click access to the scientific datasets with more customized views when needed. This includes better integration with planetary GIS analysis tools and planetary interoperability services (to search and retrieve data, supporting e.g. PDAP and EPN-TAP). It will also be up to date with versions 3 and 4 of the PDS standards, as PDS4 will be used for ESA's ExoMars and upcoming BepiColombo missions. Users will have direct access to documentation, information and tools that are relevant to the scientific use of the dataset, including ancillary datasets, Software Interface Specification (SIS) documents, and any tools/help that the PSA team can provide. The new PSA interface was released in January 2017. The home page provides direct and simple access to the scientific data, aiming to help scientists discover and explore its content. The archive can be explored through a set of parameters that allow the selection of products in space and time. Quick views provide the information needed for the selection of appropriate scientific products. During 2017, the PSA team will focus their efforts on developing a map search interface using GIS technologies to display ESA planetary datasets, an image gallery providing navigation through images to explore the datasets, and interoperability with international partners. This will be done in parallel with making additional metadata (e.g., geometry) searchable through the interface, and with dedicated work to improve the content of 20 years of space exploration.
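EPN-TAP interoperability means an archive like this can be queried with standard ADQL through any TAP client. A hedged sketch using pyvo; the endpoint URL, table name, and target value below are illustrative assumptions, not confirmed PSA values:

```python
import pyvo

# illustrative endpoint -- consult the PSA documentation for the real one
service = pyvo.dal.TAPService("https://archives.esac.esa.int/psa/epn-tap/tap")

# EPN-TAP exposes per-mission tables conventionally named <schema>.epn_core
query = """
SELECT TOP 10 granule_uid, dataproduct_type, time_min, time_max
FROM rosetta.epn_core
WHERE target_name = '67P'
"""
for row in service.search(query):
    print(row["granule_uid"])
```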
Physics Model-Based Scatter Correction in Multi-Source Interior Computed Tomography.
Gong, Hao; Li, Bin; Jia, Xun; Cao, Guohua
2018-02-01
Multi-source interior computed tomography (CT) has great potential to provide ultra-fast and organ-oriented imaging at low radiation dose. However, X-ray cross scattering from multiple simultaneously activated X-ray imaging chains compromises imaging quality. Previously, we published two hardware-based scatter correction methods for multi-source interior CT. Here, we propose a software-based scatter correction method, with the benefit of requiring no hardware modifications. The new method is based on a physics model and an iterative framework. The physics model was derived analytically and was used to calculate X-ray scattering signals in both the forward direction and cross directions in multi-source interior CT. The physics model was integrated into an iterative scatter correction framework to reduce scatter artifacts. The method was applied to phantom data from both Monte Carlo simulations and physical experiments that were designed to emulate the image acquisition in a multi-source interior CT architecture recently proposed by our team. The proposed scatter correction method reduced scatter artifacts significantly, even with only one iteration. Within a few iterations, the reconstructed images converged quickly toward the "scatter-free" reference images. After applying the scatter correction method, the maximum CT number error at the regions of interest (ROIs) was reduced to 46 HU in the numerical phantom dataset and 48 HU in the physical phantom dataset, respectively, and the contrast-to-noise ratio at those ROIs increased by up to 44.3% and up to 19.7%, respectively. The proposed physics model-based iterative scatter correction method could be useful for scatter correction in dual-source or multi-source CT.
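The iterative framework couples an analytic scatter model with repeated reconstruction. A skeleton of that fixed-point loop; the three operator arguments are placeholders for the paper's reconstruction operator, forward projector, and analytic forward/cross-scatter model, not actual implementations:

```python
import numpy as np

def correct_scatter(measured, reconstruct, forward, scatter_model, n_iter=3):
    """Fixed-point iteration: estimate scatter from the current image
    estimate, subtract it from the measured projections, and
    reconstruct again. Converges toward scatter-free projections when
    the scatter model is accurate."""
    projections = measured.copy()
    image = reconstruct(projections)
    for _ in range(n_iter):
        scatter = scatter_model(forward(image))   # predicted scatter signal
        projections = measured - scatter          # scatter-corrected data
        image = reconstruct(projections)
    return image
```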
NASA Astrophysics Data System (ADS)
Wilson, B. D.; Manipon, G.; Xing, Z.
2007-12-01
The General Earth Science Investigation Suite (GENESIS) project is a NASA-sponsored partnership between the Jet Propulsion Laboratory, academia, and NASA data centers to develop a new suite of Web Services tools to facilitate multi-sensor investigations in Earth System Science. The goal of GENESIS is to enable large-scale, multi-instrument atmospheric science using combined datasets from the AIRS, MODIS, MISR, and GPS sensors. Investigations include cross-comparison of spaceborne climate sensors, cloud spectral analysis, study of upper troposphere-stratosphere water transport, study of the aerosol indirect cloud effect, and global climate model validation. The challenges are to bring together very large datasets, reformat and understand the individual instrument retrievals, co-register or re-grid the retrieved physical parameters, perform computationally-intensive data fusion and data mining operations, and accumulate complex statistics over months to years of data. To meet these challenges, we have developed a Grid computing and dataflow framework, named SciFlo, in which we are deploying a set of versatile and reusable operators for data access, subsetting, registration, mining, fusion, compression, and advanced statistical analysis. SciFlo leverages remote Web Services, called via Simple Object Access Protocol (SOAP) or REST (one-line) URLs, and the Grid Computing standards (WS-* & Globus Alliance toolkits), and enables scientists to do multi-instrument Earth Science by assembling reusable Web Services and native executables into a distributed computing flow (tree of operators). The SciFlo client & server engines optimize the execution of such distributed data flows and allow the user to transparently find and use datasets and operators without worrying about the actual location of the Grid resources. In particular, SciFlo exploits the wealth of datasets accessible by OpenGIS Consortium (OGC) Web Mapping Servers & Web Coverage Servers (WMS/WCS), and by Open Data Access Protocol (OpenDAP) servers. SciFlo also publishes its own SOAP services for space/time query and subsetting of Earth Science datasets, and automated access to large datasets via lists of (FTP, HTTP, or DAP) URLs which point to on-line HDF or netCDF files. Typical distributed workflows obtain datasets by calling standard WMS/WCS servers or discovering and fetching data granules from ftp sites; invoke remote analysis operators available as SOAP services (interface described by a WSDL document); and merge results into binary containers (netCDF or HDF files) for further analysis using local executable operators. Naming conventions (HDFEOS and CF-1.0 for netCDF) are exploited to automatically understand and read on-line datasets. More interoperable conventions, and broader adoption of existing conventions, are vital if we are to "scale up" automated choreography of Web Services beyond toy applications. Recently, the ESIP Federation sponsored a collaborative activity in which several ESIP members developed some collaborative science scenarios for atmospheric and aerosol science, and then choreographed services from multiple groups into demonstration workflows using the SciFlo engine and a Business Process Execution Language (BPEL) workflow engine.
We will discuss the lessons learned from this activity, the need for standardized interfaces (like WMS/WCS), the difficulty in agreeing on even simple XML formats and interfaces, the benefits of doing collaborative science analysis at the "touch of a button" once services are connected, and further collaborations that are being pursued.
Data-Oriented Astrophysics at NOAO: The Science Archive & The Data Lab
NASA Astrophysics Data System (ADS)
Juneau, Stephanie; NOAO Data Lab, NOAO Science Archive
2018-06-01
As we keep progressing into an era of increasingly large astronomy datasets, NOAO’s data-oriented mission is growing in prominence. The NOAO Science Archive, which captures and processes the pixel data from mountaintops in Chile and Arizona, now contains holdings at Petabyte scales. Working at the intersection of astronomy and data science, the main goal of the NOAO Data Lab is to provide users with a suite of tools for working close to these data, the catalogs derived from them, and externally provided datasets, and thus to optimize the scientific productivity of the astronomy community. These tools and services include databases, query tools, virtual storage space, workflows through our Jupyter Notebook server, and scripted analysis. We currently host datasets from NOAO facilities such as the Dark Energy Survey (DES), the DESI imaging Legacy Surveys (LS), the Dark Energy Camera Plane Survey (DECaPS), and the nearly all-sky NOAO Source Catalog (NSC). We are further preparing for large spectroscopy datasets such as DESI. After a brief overview of the Science Archive, the Data Lab and the datasets, I will briefly showcase scientific applications that use our data holdings. Lastly, I will describe our vision for future developments as we tackle the next technical and scientific challenges.
Design and implementation of space physics multi-model application integration based on web
NASA Astrophysics Data System (ADS)
Jiang, Wenping; Zou, Ziming
With the development of research on the space environment and space science, providing a networked online computing environment for space weather, space environment and space physics models has become increasingly important for the Chinese scientific community in recent years. Currently, there are two software modes for a space physics multi-model application integrated system (SPMAIS): C/S and B/S. The traditional, stand-alone C/S mode demands that a team or workshop drawn from many disciplines and specialties build its own multi-model application integrated system, and it requires the client to be deployed in different physical regions when users visit the integrated system. This requirement brings two shortcomings: it reduces the efficiency of researchers who use the models to compute, and it makes accessing the data inconvenient. Therefore, it is necessary to create a shared network resource environment that helps users reach the computing resources of space physics models quickly from a terminal, for conducting space science research and forecasting the space environment. The SPMAIS deploys high-performance, first-principles computational models of the space environment in B/S mode and uses these models to predict "space weather", to understand space mission data, and to further our understanding of the solar system. The main goal of the SPMAIS is to provide an easy and convenient user-driven online model operating environment. Up to now, the SPMAIS contains dozens of space environment models, including the international AP8/AE8, IGRF and T96 models, as well as a solar proton prediction model, a geomagnetic transmission model, and others developed by Chinese scientists. Another function of the SPMAIS is to integrate space observation datasets, which provide input data for online high-speed model computation. In this paper, the service-oriented architecture (SOA) concept, which divides a system into independent modules according to different business needs, is applied to solve the problem of physical independence between multiple models. The classic MVC (Model View Controller) software design pattern is used to build the architecture of the system. JSP + servlet + JavaBean technology is used to integrate the web application programs of the models; it solves the problem of multiple users requesting the same model-computing job and effectively balances the computing tasks across servers. In addition, we also completed the following tasks: establishing a standard graphical user interface based on Java Applet application programs; designing the interface between model computation and the visualization of model results; realizing three-dimensional visualization in the browser without plug-ins; using Java3D technology to achieve interaction with three-dimensional scenes; and improving the ability to interact with web pages and dynamic execution capabilities, including rendering of three-dimensional graphics and control of fonts and colors. Through the design and implementation of the web-based SPMAIS, we provide an online computing and application runtime environment for space physics multi-models. Practical application shows that researchers benefit from our system in space physics research and engineering applications.
Yu, Dongjun; Wu, Xiaowei; Shen, Hongbin; Yang, Jian; Tang, Zhenmin; Qi, Yong; Yang, Jingyu
2012-12-01
Membrane proteins are encoded by ~30% of the genome and play important roles in living organisms. Previous studies have revealed that membrane protein structures and functions show obvious cell organelle-specific properties. Hence, it is highly desirable to predict a membrane protein's subcellular location from its primary sequence, considering the extreme difficulty of membrane protein wet-lab studies. Although many models have been developed for predicting protein subcellular locations, only a few are specific to membrane proteins. Existing prediction approaches were constructed based on statistical machine learning algorithms with serial combination of multi-view features, i.e., different feature vectors are simply serially combined to form a super feature vector. However, such simple combination of features simultaneously increases information redundancy, which can, in turn, deteriorate the final prediction accuracy. That is why prediction success rates in the serial super space were often found to be even lower than those in a single-view space. The purpose of this paper is to investigate a proper method for fusing multiple multi-view protein sequential features for subcellular location prediction. Instead of the serial strategy, we propose a novel parallel framework for fusing multiple membrane protein multi-view attributes that represents protein samples in complex spaces. We also propose generalized principal component analysis (GPCA) for feature reduction in the complex geometry. All the experimental results, obtained with different machine learning algorithms on benchmark membrane protein subcellular localization datasets, demonstrate that the newly proposed parallel strategy outperforms the traditional serial approach. We also demonstrate the efficacy of the parallel strategy on a soluble protein subcellular localization dataset, indicating that the parallel technique is flexible enough to suit other computational biology problems. The software and datasets are available at: http://www.csbio.sjtu.edu.cn/bioinf/mpsp.
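Parallel fusion represents two feature views as the real and imaginary parts of one complex vector rather than concatenating them. A minimal numpy sketch of the idea; the complex-domain PCA below is a simplified stand-in for the paper's GPCA, and all data are toy values:

```python
import numpy as np

def parallel_fuse(A, B):
    """Fuse two feature views into one complex-valued matrix,
    zero-padding the shorter view so the dimensions agree."""
    d = max(A.shape[1], B.shape[1])
    A = np.pad(A, ((0, 0), (0, d - A.shape[1])))
    B = np.pad(B, ((0, 0), (0, d - B.shape[1])))
    return A + 1j * B

def complex_pca(Z, n_components):
    """PCA in the complex domain via the Hermitian covariance matrix
    (a simplified stand-in for the paper's GPCA)."""
    Zc = Z - Z.mean(axis=0)
    C = Zc.conj().T @ Zc / (len(Zc) - 1)   # Hermitian covariance
    vals, vecs = np.linalg.eigh(C)         # eigenvalues in ascending order
    return Zc @ vecs[:, ::-1][:, :n_components]

rng = np.random.default_rng(0)
view1, view2 = rng.normal(size=(100, 20)), rng.normal(size=(100, 15))
reduced = complex_pca(parallel_fuse(view1, view2), n_components=5)
```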
Kumar, Shiu; Mamun, Kabir; Sharma, Alok
2017-12-01
Classification of electroencephalography (EEG) signals for motor imagery based brain-computer interfaces (MI-BCI) is an exigent task, and the common spatial pattern (CSP) method has been extensively explored for this purpose. In this work, we focused on developing a new framework for the classification of EEG signals for MI-BCI. We propose a single-band CSP framework for MI-BCI that utilizes the concept of tangent space mapping (TSM) in the manifold of covariance matrices. The proposed method is named CSP-TSM. Spatial filtering is performed on the bandpass-filtered MI EEG signal. The Riemannian tangent space is utilized for extracting features from the spatially filtered signal. The TSM features are then fused with the CSP variance-based features, and feature selection is performed using Lasso. Linear discriminant analysis (LDA) is then applied to the selected features, and finally classification is done using a support vector machine (SVM) classifier. The proposed framework gives improved performance for MI EEG signal classification in comparison with several competing methods. Experiments conducted show that the proposed framework reduces the overall classification error rate for MI-BCI by 3.16%, 5.10% and 1.70% (for BCI Competition III dataset IVa, BCI Competition IV Dataset I and BCI Competition IV Dataset IIb, respectively) compared to the conventional CSP method under the same experimental settings. The proposed CSP-TSM method produces promising results when compared with several competing methods in this paper. In addition, the computational complexity is lower than that of the TSM method. Our proposed CSP-TSM framework can potentially be used for developing improved MI-BCI systems.
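A hedged sketch of a CSP-plus-tangent-space feature pipeline in the spirit of CSP-TSM, using pyriemann and scikit-learn on toy data; the paper's Lasso selection step is elided here, and hyperparameters are illustrative:

```python
import numpy as np
from pyriemann.estimation import Covariances
from pyriemann.tangentspace import TangentSpace
from pyriemann.spatialfilters import CSP
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 8, 250))   # trials x channels x samples (band-pass filtered)
y = rng.integers(0, 2, 80)

covs = Covariances(estimator="lwf").fit_transform(X)   # trial covariance matrices
csp = CSP(nfilter=4, log=True).fit(covs, y)            # CSP log-variance features
tsm = TangentSpace().fit(covs, y)                       # Riemannian tangent-space features
feats = np.hstack([csp.transform(covs), tsm.transform(covs)])

# fused features -> LDA -> SVM (the paper inserts Lasso selection before LDA)
clf = make_pipeline(LinearDiscriminantAnalysis(), SVC(kernel="linear")).fit(feats, y)
```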
Space environment data storage and access: lessons learned and recommendations for the future
NASA Astrophysics Data System (ADS)
Evans, Hugh; Heynderickx, Daniel
2012-07-01
With the ever increasing volume of space environment data available at present and planned for the near future, the demands on data storage and access methods are increasing as well. In addition, continued access to historical, archived data remains crucial. On the basis of many years of experience, the authors identify the following issues as important for continued and efficient handling of datasets now and in the future: The huge data volumes currently or very soon available from a number of space missions will limit direct Internet download access to even relatively short epoch ranges of data. Therefore, data providers should establish or extend standardised data (post-)processing services so that only data query results need be downloaded. Although a single standardised data format will in all likelihood remain utopia, data providers should at least include extensive metadata with their data products, according to established standards and practices (e.g. ISTP, SPASE). Standardisation of (sets of) metadata greatly facilitates data mining and querying. The use of SQL database storage should be considered instead of, or in parallel with, classic storage of data files. The use of SQL does away with having to handle file parsing and processing, while at the same time standard access protocols can be used to (remotely) connect to such data repositories. Many data holdings are still lacking in extensive descriptions of data provenance (e.g. instrument description), content and format. Unfortunately, detailed data information is usually rejected by scientific and technical journals. Re-processing of historical archived datasets into modern formats, making them easily available and usable, is urgently required, as knowledge is being lost. A global data directory has still not been achieved; policy makers should enforce stricter rules for "broadcasting" dataset information.
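The SQL recommendation above amounts to querying data in place instead of downloading and parsing files. A toy sqlite3 sketch of the pattern; the table layout, instrument name, and values are purely illustrative:

```python
import sqlite3

con = sqlite3.connect("space_env.db")
con.execute("""CREATE TABLE IF NOT EXISTS flux (
    epoch TEXT, instrument TEXT, channel TEXT, value REAL)""")
con.execute("INSERT INTO flux VALUES "
            "('2012-07-14T12:00:00', 'EPT', 'e_0.8MeV', 1.2e4)")
con.commit()

# server-side query: only the result set travels over the network,
# not whole data files
rows = con.execute("""SELECT epoch, value FROM flux
    WHERE instrument = 'EPT'
      AND epoch BETWEEN '2012-07-01' AND '2012-08-01'""").fetchall()
print(rows)
```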
Latent Space Tracking from Heterogeneous Data with an Application for Anomaly Detection
2015-11-01
[Only extraction fragments of this record's abstract survive. They indicate that the paper, by Jiaji Huang and Xia Ning, introduces three types of synthetic anomalies, all sudden outliers (points after which the data stream returns to its normal state), and reports AUC results on a synthetic dataset.]
NASA Astrophysics Data System (ADS)
Alston, E. J.; Sokolik, I. N.; Kalashnikova, O. V.
2011-12-01
This study examines how aerosols measured from the ground and space over the US Southeast change temporally over a regional scale during the past decade. PM2.5 data consist of two datasets that represent the measurements used for regulatory purposes by the US EPA and continuous measurements used for quickly disseminating air quality information. AOD data come from three NASA sensors: the MODIS sensors onboard the Terra and Aqua satellites and the MISR sensor onboard the Terra satellite. We analyze all available data of both types over the state of Georgia from 2000-2009. The analysis reveals that during the summer the large metropolitan area of Atlanta has average PM2.5 concentrations that are 50% higher than the remainder of the state. Strong seasonality is detected in both the AOD and PM2.5 datasets, as evidenced by a threefold increase of AOD from mean winter values to mean summer values and an almost twofold increase in PM2.5 concentrations over the same period. Additionally, there is good agreement between MODIS and MISR onboard the Terra satellite during the spring and summer, with correlation coefficients of 0.64 and 0.71, respectively. Monthly anomalies were used to determine the presence of a trend in all considered aerosol datasets. We found negative linear trends in both the monthly AOD anomalies from MODIS onboard Terra and the PM2.5 datasets, which are statistically significant for α = 0.05. Decreasing trends were also found for MISR onboard Terra and MODIS onboard Aqua, but those trends were not statistically significant.
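Monthly anomalies remove the strong seasonal cycle before trend fitting. A compact sketch of that procedure on an illustrative series (not the study's data):

```python
import pandas as pd
from scipy import stats

# illustrative monthly mean AOD series, 2000-2009, with a summer peak
idx = pd.date_range("2000-01", "2009-12", freq="MS")
aod = pd.Series(0.2 + 0.1 * idx.month.isin([6, 7, 8]), index=idx)

# subtract the month-of-year climatology to form anomalies
climatology = aod.groupby(aod.index.month).transform("mean")
anomaly = aod - climatology

# ordinary least-squares trend on the anomalies
t = (anomaly.index - anomaly.index[0]).days / 365.25
res = stats.linregress(t, anomaly.values)
print(f"trend = {res.slope:.4f} per year, p = {res.pvalue:.3f}")
```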
Latency-Information Theory: The Mathematical-Physical Theory of Communication-Observation
2010-01-01
[Only extraction fragments of this report's abstract survive. They reference the information-uncertainty principle of Werner Heisenberg of quantum mechanics; the source-entropy and channel-capacity lossless performance bounds of Claude Shannon; communication through noisy intel-space channels whose physical time-dislocations exhibit a passing-of-time Heisenberg information-uncertainty; and a life-space sensor characterized in the same terms.]
Instructional computing in space physics moves ahead
NASA Astrophysics Data System (ADS)
Russell, C. T.; Omidi, N.
As the number of spacecraft stationed in the Earth's magnetosphere grows exponentially and society becomes more technologically sophisticated and dependent on these space-based resources, both the importance of space physics and the need to train people in this field will increase. Space physics is a very difficult subject for students to master. Both mechanical and electromagnetic forces are important. The treatment of problems can be very mathematical, and the scale sizes of phenomena are usually such that laboratory studies become impossible, and experimentation, when possible at all, must be carried out in deep space. Fortunately, computers have evolved to the point that they are able to greatly facilitate instruction in space physics.
Global analysis of b → sℓℓ anomalies
NASA Astrophysics Data System (ADS)
Descotes-Genon, Sébastien; Hofer, Lars; Matias, Joaquim; Virto, Javier
2016-06-01
We present a detailed discussion of the current theoretical and experimental situation of the anomaly in the angular distribution of B → K*(→ Kπ)μ⁺μ⁻, observed at LHCb in the 1 fb⁻¹ dataset and recently confirmed by the 3 fb⁻¹ dataset. The impact of this data and other recent measurements on b → sℓ⁺ℓ⁻ transitions (ℓ = e, μ) is considered. We review the observables of interest, focusing on their theoretical uncertainties and their sensitivity to New Physics, based on an analysis employing the QCD factorisation approach including several sources of hadronic uncertainties (form factors, power corrections, charm-loop effects). We perform fits to New Physics contributions including experimental and theoretical correlations. The solution that we proposed in 2013 to solve the B → K*μ⁺μ⁻ anomaly, with a contribution C_9^NP ≃ −1, is confirmed and reinforced. A wider range of New-Physics scenarios with high significances (between 4 and 5σ) emerges from the fit, some of them being particularly relevant for model building. More data are needed to discriminate among them conclusively. The inclusion of b → se⁺e⁻ observables increases the significance of the favoured scenarios under the hypothesis of New Physics breaking lepton flavour universality. Several tests illustrate the robustness of our conclusions.
Procedures to develop a computerized adaptive test to assess patient-reported physical functioning.
McCabe, Erin; Gross, Douglas P; Bulut, Okan
2018-06-07
The purpose of this paper is to demonstrate the procedures to develop and implement a computerized adaptive patient-reported outcome (PRO) measure using secondary analysis of a dataset and items from fixed-format legacy measures. We conducted secondary analysis of a dataset of responses from 1429 persons with work-related lower extremity impairment. We calibrated three measures of physical functioning on the same metric, based on item response theory (IRT). We evaluated efficiency and measurement precision of various computerized adaptive test (CAT) designs using computer simulations. IRT and confirmatory factor analyses support combining the items from the three scales for a CAT item bank of 31 items. The item parameters for IRT were calculated using the generalized partial credit model. CAT simulations show that reducing the test length from the full 31 items to a maximum test length of 8 items, or 20 items is possible without a significant loss of information (95, 99% correlation with legacy measure scores). We demonstrated feasibility and efficiency of using CAT for PRO measurement of physical functioning. The procedures we outlined are straightforward, and can be applied to other PRO measures. Additionally, we have included all the information necessary to implement the CAT of physical functioning in the electronic supplementary material of this paper.
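Maximum-information item selection is the core mechanism of a CAT. A toy sketch with a 31-item bank; the paper calibrates items with the generalized partial credit model, while a dichotomous 2PL keeps this sketch short, and in practice ability would be re-estimated after each response:

```python
import numpy as np

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta (the paper
    uses the generalized partial credit model; 2PL is a stand-in)."""
    p = 1 / (1 + np.exp(-a * (theta - b)))
    return a**2 * p * (1 - p)

def next_item(theta, a, b, administered):
    info = item_information(theta, a, b)
    info[list(administered)] = -np.inf     # never repeat an item
    return int(np.argmax(info))            # maximum-information selection

rng = np.random.default_rng(0)
a, b = rng.uniform(0.8, 2.0, 31), rng.normal(0, 1, 31)  # toy bank of 31 items
theta, used = 0.0, set()
for _ in range(8):                          # an 8-item adaptive test
    used.add(next_item(theta, a, b, used))
    # (here theta would be re-estimated from the response before the next pick)
print(sorted(used))
```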
Nie, Zhi; Vairavan, Srinivasan; Narayan, Vaibhav A; Ye, Jieping; Li, Qingqin S
2018-01-01
Identification of risk factors of treatment resistance may be useful to guide treatment selection, avoid inefficient trial-and-error, and improve major depressive disorder (MDD) care. We extended the work in predictive modeling of treatment resistant depression (TRD) by partitioning the data from the Sequenced Treatment Alternatives to Relieve Depression (STAR*D) cohort into training and testing datasets. We also included data from a small yet completely independent cohort, RIS-INT-93, as an external test dataset. We used features from enrollment and level 1 treatment (up to week 2 response only) of STAR*D to explore the feature space comprehensively and applied machine learning methods to model TRD outcome at level 2. For TRD defined using QIDS-C16 remission criteria, multiple machine learning models were internally cross-validated in the STAR*D training dataset and externally validated in both the STAR*D testing dataset and the RIS-INT-93 independent dataset, with areas under the receiver operating characteristic curve (AUC) of 0.70-0.78 and 0.72-0.77, respectively. The upper bound for the AUC achievable with the full set of features could be as high as 0.78 in the STAR*D testing dataset. A model developed using the top 30 features identified with a feature selection technique (k-means clustering followed by a χ2 test) achieved an AUC of 0.77 in the STAR*D testing dataset. In addition, the model developed using overlapping features between STAR*D and RIS-INT-93 achieved an AUC of > 0.70 in both the STAR*D testing and RIS-INT-93 datasets. Among all the features explored in the STAR*D and RIS-INT-93 datasets, the most important feature was early or initial treatment response or symptom severity at week 2. These results indicate that prediction of TRD prior to undergoing a second round of antidepressant treatment could be feasible even in the absence of biomarker data.
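A hedged sketch of the select-then-classify pattern evaluated by AUC, on synthetic data; the paper's pipeline also applies k-means clustering before the χ² test, which this sketch omits:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

X, y = make_classification(n_samples=1000, n_features=100, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# chi2 requires non-negative inputs, hence the scaling step
model = make_pipeline(MinMaxScaler(),
                      SelectKBest(chi2, k=30),
                      LogisticRegression(max_iter=1000)).fit(X_tr, y_tr)
print("AUC:", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))
```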
NASA Astrophysics Data System (ADS)
Yu, Francis T. S.
2017-08-01
In this article we draw on the laws of physics to illustrate the enigma of time as the creator of our physical space (i.e., the universe). We have shown that without time there would be no physical substances, no space, and no life. In reference to Einstein's energy equation, we see that energy and mass can be traded, and every mass can be treated as an energy reservoir. We have further shown that physical space cannot be embedded in absolute empty space and cannot have any absolutely empty subspace in it. Since all physical substances exist with time, our cosmos is created by time, and every substance, including our universe, coexists with time. Although time initiates the creation, it is the physical substances that present to us the existence of time. With almost absolute certainty, we are not alone. Someday we may find the right planet that, once upon a time, had harbored a civilization for a short period of light years.
Sahan, Muhammet Ikbal; Verguts, Tom; Boehler, Carsten Nicolas; Pourtois, Gilles; Fias, Wim
2016-08-01
Selective attention is not limited to information that is physically present in the external world, but can also operate on mental representations in the internal world. However, it is not known whether the mechanisms of attentional selection operate in similar fashions in physical and mental space. We studied the spatial distributions of attention for items in physical and mental space by comparing how successfully distractors were rejected at varying distances from the attended location. The results indicated very similar distribution characteristics of spatial attention in physical and mental space. Specifically, we found that performance monotonically improved with increasing distractor distance relative to the attended location, suggesting that distractor confusability is particularly pronounced for nearby distractors, relative to distractors farther away. The present findings suggest that mental representations preserve their spatial configuration in working memory, and that similar mechanistic principles underlie selective attention in physical and in mental space.
Plasma Physics of the Subauroral Space Weather
2016-03-20
[Only front-matter fragments of this report survive extraction. Recoverable details: report number AFRL-RV-PS-TR-2016-0068; "Plasma Physics of the Subauroral Space Weather" by Evgeny V. Mishin, et al.; final report dated 20 March 2016, covering Oct 2013 to 30 Sep 2015; a table-of-contents fragment mentions a physics-based hybrid model with finite Larmor radius effects.]
Spark and HPC for High Energy Physics Data Analyses
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sehrish, Saba; Kowalkowski, Jim; Paterno, Marc
A full High Energy Physics (HEP) data analysis is divided into multiple data reduction phases. Processing within these phases is extremely time consuming, therefore intermediate results are stored in files held in mass storage systems and referenced as part of large datasets. This processing model limits what can be done with interactive data analytics. Growth in the size and complexity of experimental datasets, along with emerging big data tools, is beginning to change the traditional ways of doing data analyses. Use of big data tools for HEP analysis looks promising, mainly because extremely large HEP datasets can be represented and held in memory across a system, and accessed interactively by encoding an analysis using high-level programming abstractions. The mainstream tools, however, are not designed for scientific computing or for exploiting the available HPC platform features. We use an example from the Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider (LHC) in Geneva, Switzerland. The LHC is the highest energy particle collider in the world. Our use case focuses on searching for new types of elementary particles explaining Dark Matter in the universe. We use HDF5 as our input data format, and Spark to implement the use case. We show the benefits and limitations of using Spark with HDF5 on Edison at NERSC.
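One way to get HDF5 event data into Spark is to parallelize index ranges and let each task read only its own slice of the file. A hedged sketch with PySpark and h5py; the file path, dataset name, and the missing-energy cut are illustrative assumptions, not the paper's actual configuration:

```python
import h5py
import numpy as np
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hep-hdf5").getOrCreate()
sc = spark.sparkContext

def read_chunk(bounds, path="events.h5", dataset="events/missing_et"):
    """Each Spark task opens the HDF5 file and reads only its slice."""
    lo, hi = bounds
    with h5py.File(path, "r") as f:
        return f[dataset][lo:hi].tolist()

n_events, n_parts = 10_000_000, 64
edges = np.linspace(0, n_events, n_parts + 1, dtype=int)
rdd = sc.parallelize(list(zip(edges[:-1], edges[1:])), n_parts).flatMap(read_chunk)

# e.g. count candidate events with large missing transverse energy
print(rdd.filter(lambda met: met > 200.0).count())
```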
An Approach to Integrate a Space-Time GIS Data Model with High Performance Computers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang, Dali; Zhao, Ziliang; Shaw, Shih-Lung
2011-01-01
In this paper, we describe an approach to integrating a Space-Time GIS data model on a high performance computing platform. The Space-Time GIS data model was developed in a desktop computing environment. We use the Space-Time GIS data model to generate a GIS module, which organizes a series of remote sensing data. We are in the process of porting the GIS module into an HPC environment, in which the GIS modules handle large datasets directly via a parallel file system. Although it is an ongoing project, the authors hope this effort can inspire further discussions on the integration of GIS on high performance computing platforms.
Liu, Yuanchao; Liu, Ming; Wang, Xin
2015-01-01
The objective of text clustering is to divide document collections into clusters based on the similarity between documents. In this paper, an extension-based feature modeling approach towards semantically sensitive text clustering is proposed along with the corresponding feature space construction and similarity computation method. By combining the similarity in traditional feature space and that in extension space, the adverse effects of the complexity and diversity of natural language can be addressed and clustering semantic sensitivity can be improved correspondingly. The generated clusters can be organized using different granularities. The experimental evaluations on well-known clustering algorithms and datasets have verified the effectiveness of our approach. PMID:25794172
NASA Astrophysics Data System (ADS)
Chegwidden, O.; Nijssen, B.; Pytlak, E.
2017-12-01
Any model simulation has errors, including errors in meteorological data, process understanding, model structure, and model parameters. These errors may express themselves as bias, timing lags, and differences in sensitivity between the model and the physical world. The evaluation and handling of these errors can greatly affect the legitimacy, validity and usefulness of the resulting scientific product. In this presentation we will discuss a case study of handling and communicating model errors during the development of a hydrologic climate change dataset for the Pacific Northwestern United States. The dataset was the result of a four-year collaboration between the University of Washington, Oregon State University, the Bonneville Power Administration, the United States Army Corps of Engineers and the Bureau of Reclamation. Along the way, the partnership facilitated the discovery of multiple systematic errors in the streamflow dataset. Through an iterative review process, some of those errors could be resolved. For the errors that remained, honest communication of the shortcomings promoted the dataset's legitimacy. Thoroughly explaining errors also improved ways in which the dataset would be used in follow-on impact studies. Finally, we will discuss the development of the "streamflow bias-correction" step often applied to climate change datasets that will be used in impact modeling contexts. We will describe the development of a series of bias-correction techniques through close collaboration among universities and stakeholders. Through that process, both universities and stakeholders learned about the others' expectations and workflows. This mutual learning process allowed for the development of methods that accommodated the stakeholders' specific engineering requirements. The iterative revision process also produced a functional and actionable dataset while preserving its scientific merit. We will describe how encountering earlier techniques' pitfalls allowed us to develop improved methods for scientists and practitioners alike.
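Streamflow bias correction of the kind discussed above is often done by quantile mapping, which replaces each simulated value with the observed value at the same quantile of the historical distribution. A minimal empirical sketch on synthetic flows; real applications typically correct month-by-month and handle values beyond the calibration range more carefully:

```python
import numpy as np

def quantile_map(sim, obs, sim_future):
    """Empirical quantile mapping: map simulated values onto the
    observed distribution via matching historical quantiles."""
    quantiles = np.linspace(0, 1, 101)
    sim_q = np.quantile(sim, quantiles)
    obs_q = np.quantile(obs, quantiles)
    return np.interp(sim_future, sim_q, obs_q)

rng = np.random.default_rng(0)
obs = rng.gamma(2.0, 500.0, 3650)          # observed daily streamflow
sim = 0.8 * rng.gamma(2.0, 500.0, 3650)    # systematically biased simulation
corrected = quantile_map(sim, obs, sim)
print(obs.mean(), sim.mean(), corrected.mean())
```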
EnviroAtlas - Austin, TX - Greenspace Around Schools by Block Group
This EnviroAtlas data set shows the number of schools in each block group in the EnviroAtlas community boundary as well as the number of schools where less than 25% of the area within 100 meters of the school is classified as greenspace. Green space is defined as Trees & Forest, Grass & Herbaceous, and Agriculture. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).
Functional CAR models for large spatially correlated functional datasets.
Zhang, Lin; Baladandayuthapani, Veerabhadran; Zhu, Hongxiao; Baggerly, Keith A; Majewski, Tadeusz; Czerniak, Bogdan A; Morris, Jeffrey S
2016-01-01
We develop a functional conditional autoregressive (CAR) model for spatially correlated data for which functions are collected on areal units of a lattice. Our model performs functional response regression while accounting for spatial correlations with potentially nonseparable and nonstationary covariance structure, in both the space and functional domains. We show theoretically that our construction leads to a CAR model at each functional location, with spatial covariance parameters varying and borrowing strength across the functional domain. Using basis transformation strategies, the nonseparable spatial-functional model is computationally scalable to enormous functional datasets, generalizable to different basis functions, and can be used on functions defined on higher dimensional domains such as images. Through simulation studies, we demonstrate that accounting for the spatial correlation in our modeling leads to improved functional regression performance. Applied to a high-throughput spatially correlated copy number dataset, the model identifies genetic markers not identified by comparable methods that ignore spatial correlations.
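At each functional location the construction above reduces to an ordinary CAR model on the lattice. A toy sketch of the proper-CAR precision matrix Q = τ(D − ρW) and a draw from the implied Gaussian; the adjacency and parameters are illustrative only:

```python
import numpy as np

def car_precision(W, rho=0.9, tau=1.0):
    """Precision matrix of a proper CAR model on a lattice with
    adjacency matrix W: Q = tau * (D - rho * W), D = diag(row sums)."""
    D = np.diag(W.sum(axis=1))
    return tau * (D - rho * W)

# toy 2x2 lattice with rook adjacency
W = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
Q = car_precision(W)
sample = np.random.default_rng(0).multivariate_normal(
    np.zeros(4), np.linalg.inv(Q))   # one draw from the CAR prior
print(sample)
```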
TriageTools: tools for partitioning and prioritizing analysis of high-throughput sequencing data.
Fimereli, Danai; Detours, Vincent; Konopka, Tomasz
2013-04-01
High-throughput sequencing is becoming a popular research tool but carries with it considerable costs in terms of computation time, data storage and bandwidth. Meanwhile, some research applications focusing on individual genes or pathways do not necessitate processing of a full sequencing dataset. Thus, it is desirable to partition a large dataset into smaller, manageable, but relevant pieces. We present a toolkit for partitioning raw sequencing data that includes a method for extracting reads that are likely to map onto pre-defined regions of interest. We show the method can be used to extract information about genes of interest from DNA or RNA sequencing samples in a fraction of the time and disk space required to process and store a full dataset. We report speedup factors between 2.6 and 96, depending on settings and samples used. The software is available at http://www.sourceforge.net/projects/triagetools/.
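The read-extraction step can be pictured as k-mer matching against a region of interest; a toy sketch of that filter (TriageTools' actual implementation is considerably more elaborate and efficient):

```python
def kmers(seq, k=25):
    """All k-length substrings of a read or reference sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def triage(reads, region_seq, k=25):
    """Keep only reads sharing at least one k-mer with the region of
    interest -- the core idea behind read-extraction triage."""
    target = kmers(region_seq, k)
    return [r for r in reads if kmers(r, k) & target]

region = "ACGT" * 30                      # stand-in for a gene of interest
reads = ["ACGT" * 15, "TTTT" * 15]
print(len(triage(reads, region)))         # -> 1 (only the matching read)
```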
Minutes of the CD-ROM Workshop
NASA Technical Reports Server (NTRS)
King, Joseph H.; Grayzeck, Edwin J.
1989-01-01
The workshop described in this document had two goals: (1) to establish guidelines for the CD-ROM as a tool to distribute datasets; and (2) to evaluate current scientific CD-ROM projects as an archive. Workshop attendees were urged to coordinate with European groups to develop CD-ROM, which is already available at low cost in the U.S., as a distribution medium for astronomical datasets. It was noted that NASA has made the CD Publisher at the National Space Science Data Center (NSSDC) available to the scientific community when the Publisher is not needed for NASA work. NSSDC's goal is to provide the Publisher's user with the hardware and software tools needed to design a user's dataset for distribution. This includes producing a master CD and copies. The prerequisite premastering process is described, as well as guidelines for CD-ROM construction. The production of discs was evaluated. CD-ROM projects, guidelines, and problems of the technology were discussed.
InSAR Maps of Deformation Covering Raft River, Idaho from 2007 to 2010
Reinisch, Elena C. (ORCID:0000000252211921)
2007-03-11
This dataset contains maps of deformation covering Raft River, Idaho from 2007 to 2010, calculated from interferometric synthetic aperture radar (InSAR) data. This dataset is used in the study entitled "Inferring geothermal reservoir processes at the Raft River Geothermal Field, Idaho, USA through modeling InSAR-measured surface deformation" by F. Liu, et al. This dataset was derived from raw SAR data from the Envisat satellite mission operated by the European Space Agency (ESA); the data are copyrighted by ESA and were provided through the WInSAR consortium at the UNAVCO facility. All pair directories use the image acquired on 3/11/2007 as the reference image. To view specific information for each grd file, use the GMT command "grdinfo"; e.g., for the grd file In20070311_20071111/drho_utm.grd, run the terminal command: grdinfo In20070311_20071111/drho_utm.grd
The space shuttle payload planning working groups. Volume 2: Atmospheric and space physics
NASA Technical Reports Server (NTRS)
1973-01-01
The findings of the Atmospheric and Space Physics working group of the space shuttle mission planning activity are presented. The principal objectives defined by the group are: (1) to investigate the detailed mechanisms which control the near-space environment of the earth, (2) to perform plasma physics investigations not feasible in ground-based laboratories, and (3) to conduct investigations which are important in understanding planetary and cometary phenomena. The core instrumentation and laboratory configurations for conducting the investigations are defined.
Efficient and effective pruning strategies for health data de-identification.
Prasser, Fabian; Kohlmayer, Florian; Kuhn, Klaus A
2016-04-30
Privacy must be protected when sensitive biomedical data is shared, e.g., for research purposes. Data de-identification is an important safeguard, where datasets are transformed to meet two conflicting objectives: minimizing re-identification risks while maximizing data quality. Typically, de-identification methods search a solution space of possible data transformations to find a good solution to a given de-identification problem. In this process, parts of the search space must be excluded to maintain scalability. The set of transformations which are solution candidates is typically narrowed down by storing the results obtained during the search process and then using them to predict properties of the output of other transformations in terms of privacy (first objective) and data quality (second objective). However, due to the exponential growth of the size of the search space, previous implementations of this method are not well suited when datasets contain many attributes which need to be protected. As this is often the case with biomedical research data, e.g., as a result of longitudinal collection, we have developed a novel method. Our approach combines the mathematical concept of antichains with a data structure inspired by prefix trees to represent properties of a large number of data transformations while requiring only a minimal amount of information to be stored. To analyze the improvements which can be achieved by adopting our method, we have integrated it into an existing algorithm, and we have also implemented a simple best-first branch and bound search (BFS) algorithm as a first step towards methods which fully exploit our approach. We have evaluated these implementations with several real-world datasets and the k-anonymity privacy model. When integrated into existing de-identification algorithms for low-dimensional data, our approach reduced memory requirements by up to one order of magnitude and execution times by up to 25%. This allowed us to increase the size of solution spaces which could be processed by almost a factor of 10. When using the simple BFS method, we were able to further increase the size of the solution space by a factor of three. When used as a heuristic strategy for high-dimensional data, the BFS approach outperformed a state-of-the-art algorithm by up to 12% in terms of the quality of output data. This work shows that implementing methods of data de-identification for real-world applications is a challenging task. Our approach solves a problem often faced by data custodians: a lack of scalability of de-identification software when used with datasets having realistic schemas and volumes. The method described in this article has been implemented into ARX, an open source de-identification software for biomedical data.
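For readers unfamiliar with the privacy model used in the evaluation: a dataset is k-anonymous when every combination of quasi-identifier values occurs at least k times. A minimal check of this property, with hypothetical attribute names (the expensive part that the paper optimizes is the search over candidate transformations built on top of such a primitive):

    from collections import Counter

    def is_k_anonymous(records, quasi_identifiers, k):
        """True if every quasi-identifier combination occurs at least k times."""
        groups = Counter(tuple(rec[q] for q in quasi_identifiers) for rec in records)
        return min(groups.values()) >= k

    records = [
        {"age": "30-40", "zip": "941**", "diagnosis": "flu"},
        {"age": "30-40", "zip": "941**", "diagnosis": "asthma"},
        {"age": "50-60", "zip": "100**", "diagnosis": "flu"},
    ]
    print(is_k_anonymous(records, ["age", "zip"], k=2))  # False: one group has size 1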
Green space definition affects associations of green space with overweight and physical activity.
Klompmaker, Jochem O; Hoek, Gerard; Bloemsma, Lizan D; Gehring, Ulrike; Strak, Maciej; Wijga, Alet H; van den Brink, Carolien; Brunekreef, Bert; Lebret, Erik; Janssen, Nicole A H
2018-01-01
In epidemiological studies, exposure to green space is inconsistently associated with being overweight and physical activity, possibly because studies differ widely in their definition of green space exposure, inclusion of important confounders, study population and data analysis. We evaluated whether the association of green space with being overweight and physical activity depended on the definition of green space. We conducted a cross-sectional study using data from a Dutch national health survey of 387,195 adults. Distance to the nearest park entrance and surrounding green space, based on the Normalized Difference Vegetation Index (NDVI) or a detailed Dutch land-use database (TOP10NL), was calculated for each residential address. We used logistic regression analyses to study the association of green space exposure with being overweight and with being moderately or vigorously physically active outdoors at least 150 min/week (self-reported). To study the shape of the association, we specified natural splines and quintiles. The distance to the nearest park entrance was not associated with being overweight or outdoor physical activity. Associations of surrounding green space with being overweight or outdoor physical activity were highly non-linear. For NDVI surrounding greenness, we observed significantly decreased odds of being overweight [300 m buffer, odds ratio (OR) = 0.88; 95% CI: 0.86, 0.91] and increased odds of outdoor physical activity [300 m buffer, OR = 1.14; 95% CI: 1.10, 1.17] in the highest quintile compared to the lowest quintile. For TOP10NL surrounding green space, associations were mostly non-significant. Associations were generally stronger for subjects living in less urban areas and for the smaller buffers. Associations of green space with being overweight and outdoor physical activity differed considerably between different green space definitions. Associations were strongest for NDVI surrounding greenness.
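For reference, the NDVI greenness measure used in the study is computed from red and near-infrared (NIR) surface reflectances,

\[ \mathrm{NDVI} = \frac{\rho_{\mathrm{NIR}} - \rho_{\mathrm{Red}}}{\rho_{\mathrm{NIR}} + \rho_{\mathrm{Red}}}, \]

so values approach 1 over dense vegetation and fall toward 0 (or below) over built surfaces and water; "surrounding greenness" is the average of this quantity within a buffer around the residential address.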
Fujimori, Shigeo; Hirai, Naoya; Ohashi, Hiroyuki; Masuoka, Kazuyo; Nishikimi, Akihiko; Fukui, Yoshinori; Washio, Takanori; Oshikubo, Tomohiro; Yamashita, Tatsuhiro; Miyamoto-Sato, Etsuko
2012-01-01
Next-generation sequencing (NGS) has been applied to various kinds of omics studies, resulting in many biological and medical discoveries. However, high-throughput protein-protein interactome datasets derived from detection by sequencing are scarce, because protein-protein interaction analysis requires many cell manipulations to examine the interactions. The low reliability of such high-throughput data is also a problem. Here, we describe a cell-free display technology combined with NGS that can improve both the coverage and reliability of interactome datasets. The completely cell-free method offers high throughput and a large detection space, testing the interactions without using clones. The quantitative information provided by NGS reduces the number of false positives. The method is suitable for the in vitro detection of proteins that interact not only with the bait protein, but also with DNA, RNA and chemical compounds. Thus, it could become a universal approach for exploring the large space of protein sequences and interactome networks. PMID:23056904
Dataset on daytime outdoor thermal comfort for Belo Horizonte, Brazil.
Hirashima, Simone Queiroz da Silveira; Assis, Eleonora Sad de; Nikolopoulou, Marialena
2016-12-01
This dataset describes the microclimatic parameters of two urban open public spaces in the city of Belo Horizonte, Brazil; physiological equivalent temperature (PET) index values; and the related subjective responses of interviewees regarding thermal sensation perception and preference and thermal comfort evaluation. Individual and behavioral characteristics of respondents are also presented. Data were collected during the daytime, in summer and winter, 2013. Statistical treatment of this data was first presented in a PhD thesis ("Percepção sonora e térmica e avaliação de conforto em espaços urbanos abertos do município de Belo Horizonte - MG, Brasil" (Hirashima, 2014) [1]), providing relevant information on thermal conditions in these locations and on thermal comfort assessment. This data was further explored in the article "Daytime Thermal Comfort in Urban Spaces: A Field Study in Brazil" (Hirashima et al., in press) [2]. These references are recommended for further interpretation and discussion.
YummyData: providing high-quality open life science data
Yamaguchi, Atsuko; Splendiani, Andrea
2018-01-01
Many life science datasets are now available via Linked Data technologies, meaning that they are represented in a common format (the Resource Description Framework), and are accessible via standard APIs (SPARQL endpoints). While this is an important step toward developing an interoperable bioinformatics data landscape, it also creates a new set of obstacles, as it is often difficult for researchers to find the datasets they need. Different providers frequently offer the same datasets, with different levels of support: as well as having more or less up-to-date data, some providers add metadata to describe the content, structures, and ontologies of the stored datasets while others do not. We currently lack a place where researchers can go to easily assess datasets from different providers in terms of metrics such as service stability or metadata richness. We also lack a space for collecting feedback and improving data providers’ awareness of user needs. To address this issue, we have developed YummyData, which consists of two components. One periodically polls a curated list of SPARQL endpoints, monitoring the states of their Linked Data implementations and content. The other presents the information measured for the endpoints and provides a forum for discussion and feedback. YummyData is designed to improve the findability and reusability of life science datasets provided as Linked Data and to foster its adoption. It is freely accessible at http://yummydata.org/. Database URL: http://yummydata.org/ PMID:29688370
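As a minimal sketch of the kind of endpoint monitoring described above (not YummyData's actual code; it assumes the third-party SPARQLWrapper library, and the endpoint URL is a placeholder), one can poll a SPARQL endpoint with a trivial ASK query and record availability and response time:

    import time
    from SPARQLWrapper import SPARQLWrapper, JSON

    def poll_endpoint(url, timeout=30):
        """Return (alive, seconds) for a SPARQL endpoint via a trivial ASK query."""
        sparql = SPARQLWrapper(url)
        sparql.setQuery("ASK { ?s ?p ?o }")
        sparql.setReturnFormat(JSON)
        sparql.setTimeout(timeout)
        start = time.time()
        try:
            result = sparql.query().convert()
            return bool(result.get("boolean", False)), time.time() - start
        except Exception:
            return False, time.time() - start

    # Hypothetical endpoint from a curated list:
    print(poll_endpoint("https://sparql.example.org/endpoint"))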
NASA Astrophysics Data System (ADS)
Malandraki, Olga; Klein, Karl-Ludwig; Vainio, Rami; Agueda, Neus; Nunez, Marlon; Heber, Bernd; Buetikofer, Rolf; Sarlanis, Christos; Crosby, Norma
2017-04-01
High-energy solar energetic particles (SEPs) emitted from the Sun are a major space weather hazard motivating the development of predictive capabilities. In this work, the current state of knowledge on the origin and forecasting of SEP events will be reviewed. Subsequently, we will present the EU HORIZON2020 HESPERIA (High Energy Solar Particle Events foRecastIng and Analysis) project, its structure, its main scientific objectives and forecasting operational tools, as well as its added value to SEP research from both the observational and the SEP modelling perspectives. The project addresses, through multi-frequency observations and simulations, the chain of processes from particle acceleration in the corona, through particle transport in the magnetically complex corona and interplanetary space, to detection near 1 AU. Furthermore, publicly available software to invert neutron monitor observations of relativistic SEPs to physical parameters that can be compared with space-borne measurements at lower energies is provided for the first time by HESPERIA. In order to achieve these goals, HESPERIA is exploiting already available large datasets stored in databases such as the neutron monitor database (NMDB) and SEPServer, which were developed under EU FP7 projects from 2008 to 2013. Forecasting results of the two novel SEP operational forecasting tools published via the HESPERIA consortium server will be presented, as well as some key scientific results on the acceleration, transport and impact on Earth of high-energy particles. Acknowledgement: This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 637324.
NASA Astrophysics Data System (ADS)
Cencetti, Michele
2016-07-01
European space exploration missions have produced huge datasets of potentially immense value for research as well as for planning and operating future missions. For instance, the Mars Exploration programs comprise a series of missions with launches ranging from the past to beyond the present, which are anticipated to produce exceptional volumes of data providing prospects for research breakthroughs and for advancing further activities in space. These collected data include a variety of information, such as imagery, topography, atmospheric and geochemical datasets and more, which has resulted in, and still demands, databases, versatile visualisation tools and data reduction methods. Such a rate of valuable data acquisition requires scientists, researchers and computer scientists to coordinate data storage, processing and the relevant tools to enable efficient data analysis. However, the current position is that the expert teams from various disciplines, the databases and the tools are fragmented, leaving little scope for unlocking the data's value through collaborative activities. The benefits of collaborative virtual environments have been demonstrated in various industrial fields, allowing real-time multi-user collaborative work among people from different disciplines. Exploiting the benefits of advanced immersive virtual environments (IVEs) has been recognized as an important interaction paradigm to facilitate future space exploration. The current work is mainly aimed at presenting the preliminary results of the CROSS DRIVE project. This research received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement n° 607177 and is mainly aimed at the implementation of a distributed virtual workspace for collaborative scientific discovery, mission planning and operations. The purpose of the CROSS DRIVE project is to lay the foundations of collaborative European workspaces for space science. It will demonstrate the feasibility, and begin to standardize the integration, of space datasets, simulators, analytical modules, remote scientific centers and experts working together to conduct space science activities as well as to support the planning and operations of space missions. The development of this collaborative workspace infrastructure is focused on the preparation of the ExoMars 2016 TGO and 2018 rover missions. Three use-case scenarios with increasing levels of complexity have been considered to exercise the remote and collaborative workspace as it would be used during science mission design or real-time operations: rover landing site characterization; Mars atmospheric data analysis and comparison among datasets; and rover target selection and motion planning during real-time operations. A brief overview of the traditional approaches used in the operations domain is provided in the first part of the paper, focusing mainly on the drawbacks that arise during actual missions. Examples of the design, execution and management of operational activities are introduced in this section, highlighting the main issues and the tools that are currently used. The current needs and possible solutions are introduced in the following section, providing details on how the CROSS DRIVE environment can be used to improve space operations. The developed prototype and the related approach are assessed to show the improvements that can be achieved with respect to data exchange and users' interactions.
The project results are also intended to show how the same operational philosophy can be extended from robotic exploration missions to human-rated ones.
Future Directions in Medical Physics: Models, Technology, and Translation to Medicine
NASA Astrophysics Data System (ADS)
Siewerdsen, Jeffrey
The application of physics in medicine has been integral to major advances in diagnostic and therapeutic medicine. Two primary areas represent the mainstay of medical physics research in the last century: in radiation therapy, physicists have propelled advances in conformal radiation treatment and high-precision image guidance; and in diagnostic imaging, physicists have advanced an arsenal of multi-modality imaging that includes CT, MRI, ultrasound, and PET as indispensable tools for noninvasive screening, diagnosis, and assessment of treatment response. In addition to their role in building such technologically rich fields of medicine, physicists have also become integral to daily clinical practice in these areas. The future suggests new opportunities for multi-disciplinary research bridging physics, biology, engineering, and computer science, and collaboration in medical physics carries a strong capacity for identification of significant clinical needs, access to clinical data, and translation of technologies to clinical studies. In radiation therapy, for example, the extraction of knowledge from large datasets on treatment delivery, image-based phenotypes, genomic profile, and treatment outcome will require innovation in computational modeling and connection with medical physics for the curation of large datasets. Similarly in imaging physics, the demand for new imaging technology capable of measuring physical and biological processes over orders of magnitude in scale (from molecules to whole organ systems) and exploiting new contrast mechanisms for greater sensitivity to molecular agents and subtle functional/morphological change will benefit from multi-disciplinary collaboration in physics, biology, and engineering. Also in surgery and interventional radiology, where needs for increased precision and patient safety meet constraints in cost and workflow, development of new technologies for imaging, image registration, and robotic assistance can leverage collaboration in physics, biomedical engineering, and computer science. In each area, there is major opportunity for multi-disciplinary collaboration with medical physics to accelerate the translation of such technologies to clinical use. Research supported by the National Institutes of Health, Siemens Healthcare, and Carestream Health.
NASA Astrophysics Data System (ADS)
Chow, L.; Fai, S.
2017-08-01
The digitization and abstraction of existing buildings into building information models requires the translation of heterogeneous datasets that may include CAD, technical reports, historic texts, archival drawings, terrestrial laser scanning, and photogrammetry into model elements. In this paper, we discuss a project undertaken by the Carleton Immersive Media Studio (CIMS) that explored the synthesis of heterogeneous datasets for the development of a building information model (BIM) for one of Canada's most significant heritage assets - the Centre Block of the Parliament Hill National Historic Site. The scope of the project included the development of an as-found model of the century-old, six-story building in anticipation of specific model uses for an extensive rehabilitation program. The as-found Centre Block model was developed in Revit using primarily point cloud data from terrestrial laser scanning. The data was captured by CIMS in partnership with Heritage Conservation Services (HCS), Public Services and Procurement Canada (PSPC), using a Leica C10 and P40 (exterior and large interior spaces) and a Faro Focus (small to mid-sized interior spaces). Secondary sources such as archival drawings, photographs, and technical reports were referenced in cases where point cloud data was not available. As a result of working with heterogeneous datasets, a verification system was introduced in order to communicate to model users/viewers the source of information for each building element within the model.
Urban agriculture: a global analysis of the space constraint to meet urban vegetable demand
NASA Astrophysics Data System (ADS)
Martellozzo, F.; Landry, J.-S.; Plouffe, D.; Seufert, V.; Rowhani, P.; Ramankutty, N.
2014-05-01
Urban agriculture (UA) has been drawing a lot of attention recently for several reasons: the majority of the world population has shifted from living in rural to urban areas; the environmental impact of agriculture is a matter of rising concern; and food insecurity, especially the accessibility of food, remains a major challenge. UA has often been proposed as a solution to some of these issues, for example by producing food in places where population density is highest, reducing transportation costs, connecting people directly to food systems and using urban areas efficiently. However, to date no study has examined how much food could actually be produced in urban areas at the global scale. Here we use a simple approach, based on different global-scale datasets, to assess to what extent UA is constrained by the existing amount of urban space. Our results suggest that UA would require roughly one third of the total global urban area to meet the global vegetable consumption of urban dwellers. This estimate does not consider how much urban area may actually be suitable and available for UA, which likely varies substantially around the world and according to the type of UA performed. Further, this global average value masks variations of more than two orders of magnitude among individual countries. The variations in the space required across countries derive mostly from variations in urban population density, and much less from variations in yields or per capita consumption. Overall, the space required is regrettably the highest where UA is most needed, i.e., in more food-insecure countries. We also show that smaller urban clusters (i.e., <100 km² each) together represent about two thirds of the global urban extent; thus UA discourse and policies should not focus on large cities exclusively, but should also target smaller urban areas that offer the greatest potential in terms of physical space.
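To first order, the space constraint assessed in a study of this kind reduces to a simple identity,

\[ A_{\mathrm{req}} = \frac{P_{\mathrm{urban}} \cdot c}{y}, \]

where P_urban is the urban population, c the per-capita vegetable consumption, and y the vegetable yield per unit area. With purely illustrative numbers (not the paper's datasets): 4 x 10^9 urban dwellers consuming 100 kg per person per year at a yield of 20,000 kg per hectare per year would require 2 x 10^7 ha, i.e., 2 x 10^5 km² of growing space, a figure to be compared against estimates of the global urban extent.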
Altermatt, Anna; Gaetano, Laura; Magon, Stefano; Häring, Dieter A; Tomic, Davorka; Wuerfel, Jens; Radue, Ernst-Wilhelm; Kappos, Ludwig; Sprenger, Till
2018-05-29
There is a limited correlation between white matter (WM) lesion load as determined by magnetic resonance imaging and disability in multiple sclerosis (MS). The reasons for this so-called clinico-radiological paradox are diverse and may, at least partly, relate to the fact that not just the overall lesion burden, but also the exact anatomical location of lesions predict the severity and type of disability. We aimed at studying the relationship between lesion distribution and disability using a voxel-based lesion probability mapping approach in a very large dataset of MS patients. T2-weighted lesion masks of 2348 relapsing-remitting MS patients were spatially normalized to standard stereotaxic space by non-linear registration. Relations between supratentorial WM lesion locations and disability measures were assessed using a non-parametric ANCOVA (Expanded Disability Status Scale [EDSS]; Multiple Sclerosis Functional Composite, and subscores; Modified Fatigue Impact Scale) or multinomial ordinal logistic regression (EDSS functional subscores). Data from 1907 (81%) patients were included in the analysis because of successful registration. The lesion mapping showed similar areas to be associated with the different disability scales: periventricular regions in temporal, frontal, and limbic lobes were predictive, mainly affecting the posterior thalamic radiation, the anterior, posterior, and superior parts of the corona radiata. In summary, significant associations between lesion location and clinical scores were found in periventricular areas. Such lesion clusters appear to be associated with impairment of different physical and cognitive abilities, probably because they affect commissural and long projection fibers, which are relevant WM pathways supporting many different brain functions.
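A minimal sketch of the voxelwise lesion-frequency step underlying such mapping (assuming binary lesion masks already registered to standard space and stored as NIfTI files; the file names are hypothetical, and this is not the authors' pipeline):

    import numpy as np
    import nibabel as nib

    mask_paths = [f"normalized_mask_{i:04d}.nii.gz" for i in range(1907)]

    # Accumulate a voxelwise lesion frequency map from spatially normalized
    # binary masks; each voxel ends up with the fraction of patients lesioned there.
    first = nib.load(mask_paths[0])
    freq = np.zeros(first.shape, dtype=np.float64)
    for path in mask_paths:
        freq += nib.load(path).get_fdata() > 0
    freq /= len(mask_paths)

    nib.save(nib.Nifti1Image(freq, first.affine), "lesion_probability_map.nii.gz")

The statistical step then relates these voxelwise lesion indicators to the clinical scores while adjusting for covariates.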
NASA physics and chemistry experiments in-space program
NASA Technical Reports Server (NTRS)
Gabris, E. A.
1981-01-01
The Physics and Chemistry Experiments Program (PACE) is part of the Office of Aeronautics and Space Technology (OAST) research and technology effort in understanding the fundamental characteristics of physical and chemical phenomena. This program seeks to increase basic knowledge in these areas through well-planned research efforts which include in-space experiments when the limitations of ground-based activities preclude or restrict the achievement of research goals. Overview study areas are concerned with molecular beam experiments for the Space Shuttle, experiments on drops and bubbles in a manned earth-orbiting laboratory, the study of combustion experiments in space, combustion experiments in orbiting spacecraft, gravitation experiments in space, and fluid physics, thermodynamics, and heat-transfer experiments. The study program proceeds in four phases; an overview study was conducted in the area of materials science.
Applied Physics Lab Kennedy Space Center: Recent Contributions
NASA Technical Reports Server (NTRS)
Starr, Stan; Youngquist, Robert
2006-01-01
The mission of the Applied Physics Lab is: (1) Develop and deliver novel sensors and devices to support KSC mission operations. (2) Analyze operational issues and recommend or deliver practical solutions. (3) Apply physics to the resolution of long term space flight issues that affect space port operation on Earth or on other planets.
Online Visualization and Analysis of Merged Global Geostationary Satellite Infrared Dataset
NASA Technical Reports Server (NTRS)
Liu, Zhong; Ostrenga, D.; Leptoukh, G.; Mehta, A.
2008-01-01
The NASA Goddard Earth Sciences Data and Information Services Center (GES DISC) is home of the Tropical Rainfall Measuring Mission (TRMM) data archive. The global merged IR product, also known as the NCEP/CPC 4-km Global (60 degrees N - 60 degrees S) IR Dataset, is one of the TRMM ancillary datasets. It consists of globally merged (60 degrees N - 60 degrees S) pixel-resolution (4 km) IR brightness temperature data (equivalent blackbody temperatures), merged from all available geostationary satellites (GOES-8/10, METEOSAT-7/5 and GMS). The availability of data from METEOSAT-5, which is located at 63 degrees E at the present time, yields a unique opportunity for total global (60 degrees N - 60 degrees S) coverage. The GES DISC has collected over 8 years of the data, beginning in February of 2000. This high temporal resolution dataset can not only provide additional background information to TRMM and other satellite missions, but also allow observation of a wide range of meteorological phenomena from space, such as mesoscale convective systems, tropical cyclones, and hurricanes. The dataset can also be used to verify model simulations. Although the data can be downloaded via FTP, its large volume poses a challenge for many users. A single file occupies about 70 MB of disk space, and there is a total of approximately 73,000 files (approximately 4.5 TB) for the past 8 years. In order to facilitate data access, we have developed a web prototype to allow users to conduct online visualization and analysis of this dataset. With a web browser and a few mouse clicks, users can have full access to over 8 years and over 4.5 TB of data and generate black-and-white IR imagery and animations without downloading any software or data. In short, you can make your own images! Basic functions include selection of an area of interest, single imagery or animation, a time-skip capability for different temporal resolutions, and image size. Users can save an animation as a file (animated GIF) and import it into other presentation software, such as Microsoft PowerPoint. The prototype will be integrated into GIOVANNI, and existing GIOVANNI capabilities, such as data download and Google Earth KMZ, will be available. Users will also be able to access other data products in the GIOVANNI family.
West Africa land use and land cover time series
Cotillon, Suzanne E.
2017-02-16
Started in 1999, the West Africa Land Use Dynamics project represents an effort to map land use and land cover, characterize the trends in time and space, and understand their effects on the environment across West Africa. The outcome of the project is a land use and land cover dataset for three time periods (1975, 2000, and 2013) covering the Sub-Saharan region of West Africa, including the Cabo Verde archipelago. The West Africa Land Use Land Cover Time Series dataset offers a unique basis for characterizing and analyzing land changes across the region, systematically and at an unprecedented level of detail.
The space physics analysis network
NASA Astrophysics Data System (ADS)
Green, James L.
1988-04-01
The Space Physics Analysis Network, or SPAN, is emerging as a viable method for solving an immediate communication problem for space and Earth scientists and has been operational for nearly 7 years. SPAN, together with its extension into Europe, utilizes computer-to-computer communications allowing mail, binary and text file transfer, and remote logon capability to over 1000 space science computer systems. The network has been used to successfully transfer real-time data to remote researchers for rapid data analysis, but its primary function is non-real-time applications. One of the major advantages of using SPAN is its spacecraft mission independence. Space science researchers using SPAN are located in universities, industries and government institutions all across the United States and Europe. These researchers work in such fields as magnetospheric physics, astrophysics, ionospheric physics, atmospheric physics, climatology, meteorology, oceanography, planetary physics and solar physics. SPAN users have access to space and Earth science databases, mission planning and information systems, and computational facilities for the purposes of facilitating correlative space data exchange, data analysis and space research. For example, the National Space Science Data Center (NSSDC), which manages the network, provides facilities on SPAN such as the Network Information Center (SPAN NIC). SPAN has interconnections with several national and international networks such as HEPNET and TEXNET, forming a transparent DECnet network. The combined total number of computers now reachable over these networks is about 2000. In addition, SPAN supports full-function capabilities over the international public packet-switched networks (e.g. TELENET) and has mail gateways to ARPANET, BITNET and JANET.
A Framework to Learn Physics from Atomically Resolved Images
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vlcek, L.; Maksov, A.; Pan, M.
Here, we present a generalized framework for extracting physics, i.e., knowledge, from atomically resolved images, and show its utility by applying it to a model system of segregation of chalcogen atoms in an FeSe0.45Te0.55 superconductor system. We emphasize that the framework can be used for any imaging data for which a generative physical model exists. Consider that a generative physical model can produce a very large number of configurations, not all of which are observable. By applying a microscope function to a subset of this generated data, we form a simulated dataset on which statistics can be computed.
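A toy sketch of the generative-model-plus-microscope-function idea (illustrative only; the actual framework uses a physical model of chalcogen segregation, not this random lattice):

    import numpy as np
    from scipy.ndimage import gaussian_filter

    rng = np.random.default_rng(0)

    def generate_configuration(n=64, p=0.45):
        """Toy generative model: binary occupancy of one species on a lattice."""
        return rng.random((n, n)) < p

    def microscope(config, sigma=1.5, noise=0.05):
        """Toy microscope function: blur the configuration, add detector noise."""
        image = gaussian_filter(config.astype(float), sigma)
        return image + noise * rng.standard_normal(config.shape)

    # Simulated dataset: many generated configurations passed through the
    # microscope function, on which statistics can be computed and compared
    # against the experimentally observed images.
    images = [microscope(generate_configuration()) for _ in range(100)]
    print(np.mean([img.mean() for img in images]))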
Space physics educational outreach
NASA Technical Reports Server (NTRS)
Copeland, Richard A.
1995-01-01
The goal of this Space Physics Educational Outreach project was to develop a laboratory experiment and classroom lecture on Earth's aurora for use in lower division college physics courses, with the particular aim of implementing the experiment and lecture at Saint Mary's College of California. The strategy is to teach physics in the context of an interesting natural phenomenon by investigating the physical principles that are important in Earth's aurora, including motion of charged particles in electric and magnetic fields, particle collisions and chemical reactions, and atomic and molecular spectroscopy. As a by-product, the undergraduate students would develop an appreciation for naturally occurring space physics phenomena.
Estimating the number of people in crowded scenes
NASA Astrophysics Data System (ADS)
Kim, Minjin; Kim, Wonjun; Kim, Changick
2011-01-01
This paper presents a method to estimate the number of people in crowded scenes without using explicit object segmentation or tracking. The proposed method consists of three steps: (1) extracting space-time interest points using eigenvalues of the local spatio-temporal gradient matrix, (2) generating crowd regions based on the space-time interest points, and (3) estimating the crowd density based on multiple regression. In experimental results, the efficiency and robustness of the proposed method are demonstrated using the PETS 2009 dataset.
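Step (1) can be sketched with the structure tensor of the video volume (a minimal illustration under assumed parameter values, not the authors' implementation):

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def space_time_interest_points(video, sigma=2.0, thresh=1e-3):
        """Flag voxels whose local spatio-temporal gradient matrix has a large
        smallest eigenvalue. `video` is a (T, H, W) array."""
        It, Iy, Ix = np.gradient(video.astype(float))
        # Locally averaged products of gradients form the 3x3 structure tensor.
        Ixx = gaussian_filter(Ix * Ix, sigma); Ixy = gaussian_filter(Ix * Iy, sigma)
        Ixt = gaussian_filter(Ix * It, sigma); Iyy = gaussian_filter(Iy * Iy, sigma)
        Iyt = gaussian_filter(Iy * It, sigma); Itt = gaussian_filter(It * It, sigma)
        M = np.stack([np.stack([Ixx, Ixy, Ixt], -1),
                      np.stack([Ixy, Iyy, Iyt], -1),
                      np.stack([Ixt, Iyt, Itt], -1)], -2)
        eig = np.linalg.eigvalsh(M)      # ascending eigenvalues per voxel
        return eig[..., 0] > thresh      # large smallest eigenvalue => motion corner

    video = np.random.rand(10, 64, 64)   # stand-in for a surveillance clip
    print(space_time_interest_points(video).sum())

Steps (2) and (3) then group the flagged voxels into crowd regions and regress region features against ground-truth counts.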
Xray: N-dimensional, labeled arrays for analyzing physical datasets in Python
NASA Astrophysics Data System (ADS)
Hoyer, S.
2015-12-01
Efficient analysis of geophysical datasets requires tools that both preserve and utilize metadata, and that transparently scale to process large datasets. Xray is such a tool, in the form of an open source Python library for analyzing the labeled, multi-dimensional array (tensor) datasets that are ubiquitous in the Earth sciences. Xray's approach pairs Python data structures based on the data model of the netCDF file format with the proven design and user interface of pandas, the popular Python data analysis library for labeled tabular data. On top of the NumPy array, xray adds labeled dimensions (e.g., "time") and coordinate values (e.g., "2015-04-10"), which it uses to enable a host of operations powered by these labels: selection, aggregation, alignment, broadcasting, split-apply-combine, interoperability with pandas and serialization to netCDF/HDF5. Many of these operations are enabled by xray's tight integration with pandas. Finally, to allow for easy parallelism and to enable its labeled data operations to scale to datasets that do not fit into memory, xray integrates with the parallel processing library dask.
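A small example of the label-powered operations listed above (the project was later renamed xarray, so the import below uses the modern name; the coordinates are made up for the sketch):

    import numpy as np
    import pandas as pd
    import xarray as xr  # the xray project was subsequently renamed xarray

    temps = xr.DataArray(
        np.random.rand(365, 3, 4),
        dims=("time", "lat", "lon"),
        coords={"time": pd.date_range("2015-01-01", periods=365),
                "lat": [10.0, 20.0, 30.0],
                "lon": [100.0, 110.0, 120.0, 130.0]},
        name="temperature",
    )

    print(temps.sel(time="2015-04-10").mean().item())  # label-based selection
    print(temps.groupby("time.month").mean("time"))    # split-apply-combine
    temps.to_netcdf("temps.nc")  # serialization (needs a netCDF backend installed)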
A dataset on human navigation strategies in foreign networked systems
Kőrösi, Attila; Csoma, Attila; Rétvári, Gábor; Heszberger, Zalán; Bíró, József; Tapolcai, János; Pelle, István; Klajbár, Dávid; Novák, Márton; Halasi, Valentina; Gulyás, András
2018-01-01
Humans are involved in various real-life networked systems. The most obvious examples are social and collaboration networks, but the language and related mental lexicon they use, or the physical map of their territory, can also be interpreted as networks. How do they find paths between endpoints in these networks? How do they obtain information about a foreign networked world they find themselves in, how do they build a mental model of it, and how well do they succeed in using it? Large, open datasets allowing the exploration of such questions are hard to find. Here we report a dataset collected by a smartphone application, in which players navigate between fixed-length source and destination English words step by step by changing only one letter at a time. The paths reflect how the players master their navigation skills in such a foreign networked world. The dataset can be used in the study of human mental models for the world around us, or in a broader scope to investigate navigation strategies in complex networked systems. PMID:29533391
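The game described above is a shortest-path search on the word graph whose edges connect words differing in one letter. A breadth-first baseline against which human navigation can be compared (toy vocabulary; not the authors' analysis code):

    from collections import deque
    from string import ascii_lowercase

    def word_ladder(source, target, vocabulary):
        """Shortest ladder between equal-length words, one letter change per step."""
        words, queue, seen = set(vocabulary), deque([[source]]), {source}
        while queue:
            path = queue.popleft()
            if path[-1] == target:
                return path
            for i in range(len(source)):
                for c in ascii_lowercase:
                    nxt = path[-1][:i] + c + path[-1][i + 1:]
                    if nxt in words and nxt not in seen:
                        seen.add(nxt)
                        queue.append(path + [nxt])
        return None

    vocab = {"cold", "cord", "card", "ward", "warm", "word", "wood"}
    print(word_ladder("cold", "warm", vocab))  # one shortest ladder of 5 words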
NASA Astrophysics Data System (ADS)
Koskinen, H. E.
2008-12-01
Plasma physics, as the backbone of space physics, is difficult, and thus space physics students need strong foundations in general physics, in particular in classical electrodynamics and thermodynamics, and must master the basic mathematical tools for physicists. In many universities the number of students specializing in space physics at the Master's and Doctoral levels is rather small, and the students may have quite different preferences, ranging from an experimental approach to hard-core space plasma theory. This poses challenges in building up a study program that has both the variety and depth needed to motivate the best students to choose this field. At the University of Helsinki we require all beginning space physics students, regardless of whether they enter the field as Master's or Doctoral degree students, to take a one-semester package consisting of plasma physics and its space applications. However, some compromises are necessary. For example, it is not at all clear how thoroughly Landau damping should be taught at the first run, or how deeply the intricacies of collisionless reconnection should be discussed. In both cases we have left the details to an optional course in advanced space physics, even at the risk that the student's appreciation of, e.g., reconnection may remain at the level of a magic wand. For learning experimental work, data analysis or computer simulations, we have actively pursued arrangements for Master's degree students to obtain summer employment in active research groups, which usually leads to the Master's thesis. All doctoral students are members of research groups and participate in experimental work, data analysis, simulation studies or theory development, or any combination of these. We strongly emphasize "learning by doing" all the way from the weekly home exercises during the lecture courses to the PhD theses, which in Finland typically consist of 4-6 peer-reviewed articles with a comprehensive introductory part.
Activating Public Space: How to Promote Physical Activity in Urban Environment
NASA Astrophysics Data System (ADS)
Kostrzewska, Małgorzata
2017-10-01
Physical activity is an essential component of a healthy lifestyle. The quality and equipment of urban public space play an important role in promoting physical activity among people (residents, tourists). In order for recreation and sports activities to be undertaken willingly, in a safe and comprehensive manner, certain spatial conditions and requirements must be met. The distinctive feature of contemporary large cities is the disappearance of local, neighbourly relations, and the consequent loneliness, alienation, and atomization of the residents. Thus, the design of public spaces should be an expression of the values of social inclusion and integration. A properly designed urban space would encourage people to leave their homes and integrate, also by undertaking different forms of physical activity. This, in turn, can lead to raising the quality of the space, especially in the context of its “familiarization” and “domestication”. The aim of the research was to identify the architectural and urban features of the public spaces of contemporary cities that can contribute to the promotion of physical activity. The paper presents the research results and case studies of such spatial solutions and examples of good practices which invite residents to undertake different forms of physical activity in public spaces. The issue of the integrating, inclusionary, and social function of physical recreation and sport is discussed as well, and so are the possibilities of translating these values into the physical characteristics of an urban space. The main conclusion is that the most important spatial determinants of a properly designed, physically activating public space are: taking into account the diverse needs of different social groups; participation in the design and construction process; aesthetic and interesting design; vicinity to residences; and open access for all age groups and for the disabled. Strategies for planning sports and recreation infrastructure should also ensure its multifunctionality and variability over time, to adjust it to the changing needs of residents.
Candidates for office 2004-2006
NASA Astrophysics Data System (ADS)
Timothy L. Killeen. AGU member since 1981. Director of the National Center for Atmospheric Research (NCAR); Senior Scientist, High Altitude Observatory; Adjunct Professor, University of Michigan. Major areas of interest include space physics and aeronomy, remote sensing, and interdisciplinary science education. B.S., Physics and Astronomy (first class honors), 1972, University College London; Ph.D., Atomic and Molecular Physics, 1975, University College London. University of Michigan: Researcher and Professor of Atmospheric, Oceanic, and Space Sciences, 1978-2000; Director of the Space Physics Research Laboratory, 1993-1998; Associate Vice-President for Research, 1997-2000. Visiting senior scientist at NASA Goddard Space Flight Center, 1992. Program Committee, American Association for the Advancement of Science; Council Member, American Meteorological Society; Editor-in-Chief, Journal of Atmospheric and Solar-Terrestrial Physics; Chair, Jerome K. Weisner National Policy Symposium on the Integration of Research and Education, 1999. Authored over 140 publications, 57 in AGU journals. Significant publications include: Interaction of low energy positrons with gaseous atoms and molecules, Atomic Physics, 4, 1975; Energetics and dynamics of the thermosphere, Reviews of Geophysics, 1987; The upper mesosphere and lower thermosphere, AGU Geophysical Monograph, 1995. Excellence in Teaching and Research awards, College of Engineering, University of Michigan; recipient of two NASA Achievement Awards; former chair, NASA Space Physics Subcommittee; former chair, National Science Foundation (NSF) Coupling, Energetics and Dynamics of Atmospheric Regions (CEDAR) program; former member, NSF Advisory Committee for Geosciences, and chair of NSF's Atmospheric Sciences Subcommittee, 1999-2002; member, NASA Earth Science Enterprise Advisory Committee; member of various National Academy of Science/National Research Council committees; cochair, American Association for the Advancement of Science National Meeting, 2003. AGU service includes: term as associate editor of Journal of Geophysical Research-Space Physics; chair, Panel on International Space Station; Global Climate Change Panel; Federal Budget Review Committee; member of AGU Program, Public Information, Awards, and Public Affairs committees; Chapman Conference convener and monograph editor; Section Secretary and Program Chair, Space and Planetary Relations Section; President of Space Physics and Aeronomy Section; AGU Council Member.
NASA Astrophysics Data System (ADS)
Stall, S.
2017-12-01
Integrity and transparency within research are solidified by a complete set of research products that are findable, accessible, interoperable, and reusable; in other words, they follow the FAIR Guidelines developed by FORCE11.org. Your datasets, images, video, software, scripts, models, physical samples, and other tools and technology are an integral part of the narrative you tell about your research. These research products increasingly are being captured through workflow tools and preserved and connected through persistent identifiers across multiple repositories that keep them safe. Together with your publications, they help secure the supporting evidence and integrity of the scientific record. This is the direction in which Earth and space science, as well as other disciplines, are moving. Within our community, some science domains are further along, and others are taking more measured steps. AGU as a publisher is working to support the full scientific record with peer-reviewed publications. Working with our community and all the Earth and space science journals, AGU is developing new policies to encourage researchers to plan for proper data preservation and to provide data citations along with their research submissions, and to encourage adoption of best practices throughout the research workflow and data life cycle. Providing incentives, community standards, and easy-to-use tools are some of the important factors in helping researchers embrace the FAIR Guidelines and support transparency and integrity.
Detecting space-time cancer clusters using residential histories
NASA Astrophysics Data System (ADS)
Jacquez, Geoffrey M.; Meliker, Jaymie R.
2007-04-01
Methods for analyzing geographic clusters of disease typically ignore the space-time variability inherent in epidemiologic datasets, do not adequately account for known risk factors (e.g., smoking and education) or covariates (e.g., age, gender, and race), and do not permit investigation of the latency window between exposure and disease. Our research group recently developed Q-statistics for evaluating space-time clustering in cancer case-control studies with residential histories. This technique relies on time-dependent nearest neighbor relationships to examine clustering at any moment in the life-course of the residential histories of cases relative to that of controls. In addition, in place of the widely used null hypothesis of spatial randomness, each individual's probability of being a case is instead based on his/her risk factors and covariates. Case-control clusters will be presented using residential histories of 220 bladder cancer cases and 440 controls in Michigan. In preliminary analyses of this dataset, smoking, age, gender, race and education were sufficient to explain the majority of the clustering of residential histories of the cases. Clusters of unexplained risk, however, were identified surrounding the business address histories of 10 industries that emit known or suspected bladder cancer carcinogens. The clustering of 5 of these industries began in the 1970s and persisted through the 1990s. This systematic approach for evaluating space-time clustering has the potential to generate novel hypotheses about environmental risk factors. These methods may be extended to detect differences in space-time patterns of any two groups of people, making them valuable for security intelligence and surveillance operations.
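In one common formulation of these statistics (a hedged sketch; see the authors' methodological papers for the exact definitions), the local statistic for individual i at time t counts case nearest neighbors of a case,

\[ Q_{i,t} = c_i \sum_{j=1}^{n} \eta_{i,j,k}(t)\, c_j, \]

where c_i = 1 if individual i is a case and 0 otherwise, and \eta_{i,j,k}(t) = 1 if j is among the k nearest neighbors of i's residence at time t. Large values flag local space-time clustering of cases, and significance is assessed by randomization in which each individual's case probability is set from his/her risk factors and covariates rather than assumed spatially uniform.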
NASA Astrophysics Data System (ADS)
Lamarche, G.; Le Gonidec, Y.; Lucieer, V.; Lurton, X.; Greinert, J.; Dupré, S.; Nau, A.; Heffron, E.; Roche, M.; Ladroit, Y.; Urban, P.
2017-12-01
Detecting liquid, solid or gaseous features in the ocean is generating considerable interest in the geoscience community, because of their potentially high economic value (oil & gas, mining), their significance for environmental management (oil/gas leakage, biodiversity mapping, greenhouse gas monitoring), as well as their potential cultural and traditional values (food, freshwater). Enhancing people's capability to quantify and manage the natural capital present in the ocean water goes hand in hand with the development of marine acoustic technology, as marine echosounders provide the most reliable and technologically advanced means to develop quantitative studies of water column backscatter data. This capability is not yet developed to its full potential because of (i) the complexity of the physics involved, in relation to the constantly changing marine environment, and (ii) the rapid technological evolution of high resolution multibeam echosounder (MBES) water-column imaging systems. The Water Column Imaging Working Group is working on a series of MBES water column datasets acquired in a variety of environments, using a range of frequencies, and imaging a number of water-column features such as gas seeps, oil leaks, suspended particulate matter, vegetation and freshwater springs. Access to data from different acoustic frequencies and ocean dynamics enables us to discuss and test multifrequency approaches, which are the most promising means to develop a quantitative analysis of the physical properties of acoustic scatterers, providing rigorous cross-calibration of the acoustic devices. In addition, the high redundancy of multibeam data, such as is available for some datasets, will allow us to develop data processing techniques leading to quantitative estimates of water column gas seeps. Each of the datasets has supporting ground-truthing data (underwater videos and photos, physical oceanography measurements) which provide information on the origin and chemistry of the seep content. This is of primary importance when assessing the physical properties of water column scatterers from backscatter acoustic measurements.
20th National Solar Physics Meeting
NASA Astrophysics Data System (ADS)
Dorotovic, Ivan
2010-12-01
These proceedings (ISBN: 978-80-85221-68-8) provide an overview of current research on solar physics, geophysics and space weather in the astronomical, geophysical and space physics institutions of the Slovak Republic and the Czech Republic. Several researchers from other countries participated in the meeting as well. The different parts address: the solar interior, solar photosphere, chromosphere, corona, total solar eclipses, space weather, and instrumentation. Most of the papers are published in Slovak or Czech. The proceedings are intended for researchers, graduate and PhD students, and staff of astronomical observatories interested in solar physics, geophysics and space weather.
A large-scale dataset of solar event reports from automated feature recognition modules
NASA Astrophysics Data System (ADS)
Schuh, Michael A.; Angryk, Rafal A.; Martens, Petrus C.
2016-05-01
The massive repository of images of the Sun captured by the Solar Dynamics Observatory (SDO) mission has ushered in the era of Big Data for Solar Physics. In this work, we investigate the entire public collection of events reported to the Heliophysics Event Knowledgebase (HEK) from automated solar feature recognition modules operated by the SDO Feature Finding Team (FFT). With the SDO mission recently surpassing five years of operations, and over 280,000 event reports for seven types of solar phenomena, we present the broadest and most comprehensive large-scale dataset of the SDO FFT modules to date. We also present numerous statistics on these modules, providing valuable contextual information for better understanding and validation of the individual event reports and the entire dataset as a whole. After extensive data cleaning through exploratory data analysis, we highlight several opportunities for knowledge discovery from data (KDD). Through the important prerequisite analyses presented here, the results of KDD from Solar Big Data will be more reliable overall and better understood. As the SDO mission remains operational over the coming years, these datasets will continue to grow in size and value. Future versions of this dataset will be analyzed in the general framework established in this work and maintained publicly online for easy access by the community.
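The HEK event reports described above can be retrieved programmatically; a minimal sketch using the SunPy HEK client (the query window and printed columns are illustrative, and sunpy must be installed):

    from sunpy.net import hek

    # Query the Heliophysics Event Knowledgebase for flare reports produced
    # by automated feature recognition modules over one day of SDO operations.
    client = hek.HEKClient()
    events = client.search(
        hek.attrs.Time("2011-08-09", "2011-08-10"),
        hek.attrs.EventType("FL"),  # 'FL' = flare; other codes cover other phenomena
    )
    print(len(events))
    # frm_name identifies which feature recognition module reported the event.
    print(events["event_starttime", "frm_name"][:5])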