Phase 1 Free Air CO2 Enrichment Model-Data Synthesis (FACE-MDS): Model Output Data (2015)
Walker, A. P.; De Kauwe, M. G.; Medlyn, B. E.; Zaehle, S.; Asao, S.; Dietze, M.; El-Masri, B.; Hanson, P. J.; Hickler, T.; Jain, A.; Luo, Y.; Parton, W. J.; Prentice, I. C.; Ricciuto, D. M.; Thornton, P. E.; Wang, S.; Wang, Y.-P.; Warlind, D.; Weng, E.; Oren, R.; Norby, R. J.
2015-01-01
These datasets comprise the model output from phase 1 of the FACE-MDS. These include simulations of the Duke and Oak Ridge experiments and also idealised long-term (300 year) simulations at both sites (please see the modelling protocol for details). Included as part of this dataset are modelling and output protocols. The model datasets are formatted according to the output protocols. Phase 1 datasets are reproduced here for posterity and reproducibility, although the model output for the experimental period has been somewhat superseded by the Phase 2 datasets.
Obs4MIPS: Satellite Observations for Model Evaluation
NASA Astrophysics Data System (ADS)
Ferraro, R.; Waliser, D. E.; Gleckler, P. J.
2017-12-01
This poster will review the current status of the obs4MIPs project, whose purpose is to provide a limited collection of well-established and documented datasets for comparison with Earth system models (https://www.earthsystemcog.org/projects/obs4mips/). These datasets have been reformatted to correspond with the CMIP5 model output requirements, and include technical documentation specifically targeted for their use in model output evaluation. The project holdings now exceed 120 datasets with observations that directly correspond to CMIP5 model output variables, with new additions in response to the CMIP6 experiments. With the growth in climate model output data volume, it is increasingly difficult to bring the model output and the observations together for evaluation. The positioning of the obs4MIPs datasets within the Earth System Grid Federation (ESGF) allows for the use of currently available and planned online tools within the ESGF to perform analysis using model output and observational datasets without necessarily downloading everything to a local workstation. This past year, obs4MIPs has updated its submission guidelines to align closely with changes in the CMIP6 experiments, and is implementing additional indicators and ancillary data to allow users to more easily determine the efficacy of an obs4MIPs dataset for specific evaluation purposes. This poster will present the new guidelines and indicators, and update the list of current obs4MIPs holdings and their connection to the ESGF evaluation and analysis tools currently available and being developed for the CMIP6 experiments.
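Because obs4MIPs files follow the same output conventions as the CMIP model output they are meant to evaluate, a side-by-side comparison is largely a bookkeeping exercise. Below is a minimal sketch with xarray; the file names, the variable ("pr"), and the nearest-neighbour regridding are illustrative assumptions, not prescriptions from the project.

```python
# Minimal sketch: compare a CMIP model field against an obs4MIPs dataset.
# File names and the regridding choice are assumptions for illustration.
import xarray as xr

model = xr.open_dataset("pr_Amon_SomeModel_historical_r1i1p1f1_200001-200912.nc")
obs = xr.open_dataset("pr_TRMM-3B43_obs4MIPs_200001-200912.nc")

model_clim = model["pr"].mean("time")                 # model climatology
obs_clim = obs["pr"].mean("time")                     # observed climatology

# Put the observations on the model grid (nearest-neighbour here).
obs_on_model_grid = obs_clim.interp_like(model_clim, method="nearest")

# Climatological mean bias, model minus observations.
bias = model_clim - obs_on_model_grid
print(float(bias.mean()))
```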
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kosovic, Branko
This dataset includes large-eddy simulation (LES) output from a convective atmospheric boundary layer (ABL) simulation of observations at the SWIFT tower near Lubbock, Texas on July 4, 2012. The dataset was used to assess the LES models for simulation of canonical convective ABL. The dataset can be used for comparison with other LES and computational fluid dynamics model outputs.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kosovic, Branko
This dataset includes large-eddy simulation (LES) output from a neutrally stratified atmospheric boundary layer (ABL) simulation of observations at the SWIFT tower near Lubbock, Texas on Aug. 17, 2012. The dataset was used to assess LES models for simulation of canonical neutral ABL. The dataset can be used for comparison with other LES and computational fluid dynamics model outputs.
PNNL - WRF-LES - Convective - TTU
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kosovic, Branko
This dataset includes large-eddy simulation (LES) output from a convective atmospheric boundary layer (ABL) simulation of observations at the SWIFT tower near Lubbock, Texas on July 4, 2012. The dataset was used to assess the LES models for simulation of canonical convective ABL. The dataset can be used for comparison with other LES and computational fluid dynamics model outputs.
ANL - WRF-LES - Convective - TTU
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kosovic, Branko
This dataset includes large-eddy simulation (LES) output from a convective atmospheric boundary layer (ABL) simulation of observations at the SWIFT tower near Lubbock, Texas on July 4, 2012. The dataset was used to assess the LES models for simulation of canonical convective ABL. The dataset can be used for comparison with other LES and computational fluid dynamics model outputs.
LLNL - WRF-LES - Neutral - TTU
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kosovic, Branko
This dataset includes large-eddy simulation (LES) output from a neutrally stratified atmospheric boundary layer (ABL) simulation of observations at the SWIFT tower near Lubbock, Texas on Aug. 17, 2012. The dataset was used to assess LES models for simulation of canonical neutral ABL. The dataset can be used for comparison with other LES and computational fluid dynamics model outputs.
Kosovic, Branko
2018-06-20
This dataset includes large-eddy simulation (LES) output from a neutrally stratified atmospheric boundary layer (ABL) simulation of observations at the SWIFT tower near Lubbock, Texas on Aug. 17, 2012. The dataset was used to assess LES models for simulation of canonical neutral ABL. The dataset can be used for comparison with other LES and computational fluid dynamics model outputs.
LANL - WRF-LES - Neutral - TTU
Kosovic, Branko
2018-06-20
This dataset includes large-eddy simulation (LES) output from a neutrally stratified atmospheric boundary layer (ABL) simulation of observations at the SWIFT tower near Lubbock, Texas on Aug. 17, 2012. The dataset was used to assess LES models for simulation of canonical neutral ABL. The dataset can be used for comparison with other LES and computational fluid dynamics model outputs.
LANL - WRF-LES - Convective - TTU
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kosovic, Branko
This dataset includes large-eddy simulation (LES) output from a convective atmospheric boundary layer (ABL) simulation of observations at the SWIFT tower near Lubbock, Texas on July 4, 2012. The dataset was used to assess the LES models for simulation of canonical convective ABL. The dataset can be used for comparison with other LES and computational fluid dynamics model outputs.
Structural identifiability analysis of a cardiovascular system model.
Pironet, Antoine; Dauby, Pierre C; Chase, J Geoffrey; Docherty, Paul D; Revie, James A; Desaive, Thomas
2016-05-01
The six-chamber cardiovascular system model of Burkhoff and Tyberg has been used in several theoretical and experimental studies. However, this cardiovascular system model (and others derived from it) is not identifiable from every output set. In this work, two such cases of structural non-identifiability are first presented. These cases occur when the model output set only contains a single type of information (pressure or volume). A specific output set is thus chosen, mixing pressure and volume information and containing only a limited number of clinically available measurements. Then, by manipulating the model equations involving these outputs, it is demonstrated that the six-chamber cardiovascular system model is structurally globally identifiable. A further simplification is made by assuming known cardiac valve resistances. This assumption is usual because of the poor practical identifiability of these four parameters. Under this hypothesis, the six-chamber cardiovascular system model is structurally identifiable from an even smaller dataset. As a consequence, parameter values computed from limited but well-chosen datasets are theoretically unique. This means that the parameter identification procedure can safely be performed on the model from such a well-chosen dataset. Thus, the model may be considered suitable for use in diagnosis. Copyright © 2016 IPEM. Published by Elsevier Ltd. All rights reserved.
Validation project. This report describes the procedure used to generate the noise models' output dataset, and then it compares that dataset to the...benchmark, the Engineer Research and Development Center's Long-Range Sound Propagation dataset. It was found that the models consistently underpredict the
Jeon, Soyoung; Paciorek, Christopher J.; Wehner, Michael F.
2016-02-16
Extreme event attribution characterizes how anthropogenic climate change may have influenced the probability and magnitude of selected individual extreme weather and climate events. Attribution statements often involve quantification of the fraction of attributable risk (FAR) or the risk ratio (RR) and associated confidence intervals. Many such analyses use climate model output to characterize extreme event behavior with and without anthropogenic influence. However, such climate models may have biases in their representation of extreme events. To account for discrepancies in the probabilities of extreme events between observational datasets and model datasets, we demonstrate an appropriate rescaling of the model output based on the quantiles of the datasets to estimate an adjusted risk ratio. Our methodology accounts for various components of uncertainty in estimation of the risk ratio. In particular, we present an approach to construct a one-sided confidence interval on the lower bound of the risk ratio when the estimated risk ratio is infinity. We demonstrate the methodology using the summer 2011 central US heatwave and output from the Community Earth System Model. In this example, we find that the lower bound of the risk ratio is relatively insensitive to the magnitude and probability of the actual event.
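A minimal numerical sketch of the general idea follows: the observed event is mapped onto the model's climatology through quantiles, exceedance probabilities are compared between all-forcings and natural-only ensembles, and a one-sided bootstrap lower bound is taken so that an infinite point estimate still yields a finite bound. The synthetic data, function names, and bootstrap scheme are illustrative assumptions, not the authors' exact estimator.

```python
# Sketch of a quantile-based threshold adjustment and risk-ratio estimate.
import numpy as np

rng = np.random.default_rng(0)

def adjusted_risk_ratio(obs, model_hist, model_all, model_nat, event_value):
    """Map the observed event onto the model climatology via quantiles, then
    compare exceedance probabilities with and without anthropogenic forcing."""
    q = (obs < event_value).mean()            # quantile of the event in the observations
    threshold = np.quantile(model_hist, q)    # equivalent threshold in the model world
    p_all = (model_all > threshold).mean()    # probability with all forcings
    p_nat = (model_nat > threshold).mean()    # counterfactual (natural-only) probability
    return np.inf if p_nat == 0 else p_all / p_nat

def rr_lower_bound(obs, model_hist, model_all, model_nat, event_value,
                   n_boot=1000, alpha=0.05):
    """One-sided bootstrap lower bound, finite even when the point estimate is infinite."""
    rrs = [adjusted_risk_ratio(rng.choice(obs, obs.size),
                               rng.choice(model_hist, model_hist.size),
                               rng.choice(model_all, model_all.size),
                               rng.choice(model_nat, model_nat.size),
                               event_value)
           for _ in range(n_boot)]
    return np.quantile(np.asarray(rrs), alpha)

# Synthetic demonstration: 60 observed summers and 600-member model ensembles.
obs = rng.normal(28.0, 2.0, 60)
hist = rng.normal(27.0, 2.0, 600)
allf = rng.normal(28.0, 2.0, 600)
nat = rng.normal(26.5, 2.0, 600)
print("RR:", adjusted_risk_ratio(obs, hist, allf, nat, event_value=33.0))
print("5% lower bound:", rr_lower_bound(obs, hist, allf, nat, event_value=33.0))
```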
Real-time quality monitoring in debutanizer column with regression tree and ANFIS
NASA Astrophysics Data System (ADS)
Siddharth, Kumar; Pathak, Amey; Pani, Ajaya Kumar
2018-05-01
A debutanizer column is an integral part of any petroleum refinery. Online composition monitoring of debutanizer column outlet streams is highly desirable in order to maximize the production of liquefied petroleum gas. In this article, data-driven models for a debutanizer column are developed for real-time composition monitoring. The dataset used has seven process variables as inputs, and the output is the butane concentration in the debutanizer column bottom product. The input-output dataset is divided equally into a training (calibration) set and a validation (testing) set. The training set data were used to develop fuzzy inference, adaptive neuro-fuzzy inference system (ANFIS), and regression tree models for the debutanizer column. The accuracy of the developed models was evaluated by simulating the models with the validation dataset. It is observed that the ANFIS model has better estimation accuracy than the other models developed in this work and many data-driven models proposed so far in the literature for the debutanizer column.
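For the tree-based branch of such a soft sensor, the workflow is a plain supervised-regression loop: equal split, fit, score on the held-out half. A sketch with scikit-learn is given below; the synthetic data and the tree depth are placeholders for the real seven-variable refinery dataset.

```python
# Sketch of the regression-tree soft sensor: seven process variables in,
# bottom-product butane concentration out, with a 50/50 calibration/validation split.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 7))                                      # stand-in process variables
y = 0.3 * X[:, 0] - 0.2 * X[:, 3] + 0.05 * rng.normal(size=1000)    # stand-in butane concentration

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)
tree = DecisionTreeRegressor(max_depth=6).fit(X_train, y_train)
print("validation R^2:", r2_score(y_test, tree.predict(X_test)))
```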
CMAQ and CMAQ-VBS model output. This dataset is associated with the following publication: Woody, M., K. Baker, P. Hayes, J. Jimenez, B. Koo, and H. Pye. Understanding sources of organic aerosol during CalNex-2010 using the CMAQ-VBS. Atmospheric Chemistry and Physics. Copernicus Publications, Katlenburg-Lindau, GERMANY, 16: 4081-4100, (2016).
Hydrologic extremes - an intercomparison of multiple gridded statistical downscaling methods
NASA Astrophysics Data System (ADS)
Werner, A. T.; Cannon, A. J.
2015-06-01
Gridded statistical downscaling methods are the main means of preparing climate model data to drive distributed hydrological models. Past work on the validation of climate downscaling methods has focused on temperature and precipitation, with less attention paid to the ultimate outputs from hydrological models. Also, as attention shifts towards projections of extreme events, downscaling comparisons now commonly assess methods in terms of climate extremes, but hydrologic extremes are less well explored. Here, we test the ability of gridded downscaling models to replicate historical properties of climate and hydrologic extremes, as measured in terms of temporal sequencing (i.e., correlation tests) and distributional properties (i.e., tests for equality of probability distributions). Outputs from seven downscaling methods - bias correction constructed analogues (BCCA), double BCCA (DBCCA), BCCA with quantile mapping reordering (BCCAQ), bias correction spatial disaggregation (BCSD), BCSD using minimum/maximum temperature (BCSDX), climate imprint delta method (CI), and bias corrected CI (BCCI) - are used to drive the Variable Infiltration Capacity (VIC) model over the snow-dominated Peace River basin, British Columbia. Outputs are tested using split-sample validation on 26 climate extremes indices (ClimDEX) and two hydrologic extremes indices (3 day peak flow and 7 day peak flow). To characterize observational uncertainty, four atmospheric reanalyses are used as climate model surrogates and two gridded observational datasets are used as downscaling target data. The skill of the downscaling methods generally depended on reanalysis and gridded observational dataset. However, CI failed to reproduce the distribution and BCSD and BCSDX the timing of winter 7 day low flow events, regardless of reanalysis or observational dataset. Overall, DBCCA passed the greatest number of tests for the ClimDEX indices, while BCCAQ, which is designed to more accurately resolve event-scale spatial gradients, passed the greatest number of tests for hydrologic extremes. Non-stationarity in the observational/reanalysis datasets complicated the evaluation of downscaling performance. Comparing temporal homogeneity and trends in climate indices and hydrological model outputs calculated from downscaled reanalyses and gridded observations was useful for diagnosing the reliability of the various historical datasets. We recommend that such analyses be conducted before such data are used to construct future hydro-climatic change scenarios.
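The two families of tests described above reduce to standard statistics on paired index series: a rank correlation for temporal sequencing and a two-sample test for distributional equality. The sketch below applies both to one synthetic annual-maximum series; the index, record length, and noise model are assumptions for illustration.

```python
# Sketch: evaluate a downscaled-driven extremes index against an observed one.
import numpy as np
from scipy.stats import spearmanr, ks_2samp

rng = np.random.default_rng(2)
observed = rng.gamma(shape=4.0, scale=100.0, size=40)          # 40 years of annual maxima
downscaled = observed * rng.normal(1.0, 0.15, size=40) + 20.0  # imperfect reproduction

rho, p_corr = spearmanr(observed, downscaled)   # temporal sequencing (correlation test)
ks, p_dist = ks_2samp(observed, downscaled)     # distributional agreement (equality test)
print(f"Spearman rho={rho:.2f} (p={p_corr:.3f}), KS={ks:.2f} (p={p_dist:.3f})")
```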
Rebaudo, François; Faye, Emile; Dangles, Olivier
2016-01-01
A large body of literature has recently recognized the role of microclimates in controlling the physiology and ecology of species, yet the relevance of fine-scale climatic data for modeling species performance and distribution remains a matter of debate. Using a 6-year monitoring of three potato moth species, major crop pests in the tropical Andes, we asked whether the spatiotemporal resolution of temperature data affects the predictions of models of moth performance and distribution. For this, we used three different climatic data sets: (i) the WorldClim dataset (global dataset), (ii) air temperature recorded using data loggers (weather station dataset), and (iii) air crop canopy temperature (microclimate dataset). We developed a statistical procedure to calibrate all datasets to monthly and yearly variation in temperatures, while keeping both spatial and temporal variances (air monthly temperature at 1 km² for the WorldClim dataset, air hourly temperature for the weather station, and air minute temperature over 250 m radius disks for the microclimate dataset). Then, we computed pest performances based on these three datasets. Results for temperatures ranging from 9 to 11°C revealed discrepancies in the simulation outputs in both survival and development rates depending on the spatiotemporal resolution of the temperature dataset. Temperature and simulated pest performances were then combined into multiple linear regression models to compare predicted vs. field data. We used an additional set of study sites to test the ability of the results of our model to be extrapolated over larger scales. Results showed that the model implemented with microclimatic data best predicted observed pest abundances for our study sites, but was less accurate than the global dataset model when performed at larger scales. Our simulations therefore stress the importance of considering different temperature datasets, depending on the issue to be solved, in order to accurately predict species abundances. In conclusion, keeping in mind that the mismatch between the size of organisms and the scale at which climate data are collected and modeled remains a key issue, temperature dataset selection should be balanced against the desired output spatiotemporal scale for better predicting pest dynamics and developing efficient pest management strategies. PMID:27148077
Evaluation of precipitation extremes over the Asian domain: observation and modelling studies
NASA Astrophysics Data System (ADS)
Kim, In-Won; Oh, Jaiho; Woo, Sumin; Kripalani, R. H.
2018-04-01
In this study, a comparison of the precipitation extremes exhibited by seven reference datasets is made to ascertain whether the inferences based on these datasets agree or differ. These seven datasets, roughly grouped into three categories, i.e. rain-gauge based (APHRODITE, CPC-UNI), satellite-based (TRMM, GPCP1DD) and reanalysis based (ERA-Interim, MERRA, and JRA55), and having a common data period of 1998-2007, are considered. The focus is to examine precipitation extremes in the summer monsoon rainfall over South Asia, East Asia and Southeast Asia. Measures of extreme precipitation include percentile thresholds, the frequency of extreme precipitation events and other quantities. Results reveal that the differences in displaying extremes among the datasets are small over South Asia and East Asia, but large differences among the datasets are displayed over the Southeast Asian region including the maritime continent. Furthermore, precipitation data appear to be more consistent over East Asia among the seven datasets. Decadal trends in extreme precipitation are consistent with known results over South and East Asia. No trends in extreme precipitation events are exhibited over Southeast Asia. Outputs of the Coupled Model Intercomparison Project Phase 5 (CMIP5) simulation data are categorized as high-, medium- and low-resolution models. The regions displaying maximum intensity of extreme precipitation appear to be dependent on model resolution. High-resolution models simulate maximum intensity of extreme precipitation over the Indian sub-continent, medium-resolution models over northeast India and South China, and the low-resolution models over Bangladesh, Myanmar and Thailand. In summary, there are differences in displaying extreme precipitation statistics among the seven datasets considered here and among the 29 CMIP5 model data outputs.
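The percentile-threshold and frequency measures referred to above are simple to compute from a daily series; the sketch below shows one common convention (wet-day 95th percentile and exceedance counts). The series and the specific thresholds are illustrative assumptions rather than the study's exact definitions.

```python
# Sketch of two extreme-precipitation measures: a percentile threshold over wet
# days and the count of days exceeding it.  The daily series is synthetic.
import numpy as np

rng = np.random.default_rng(3)
daily_precip = rng.gamma(shape=0.6, scale=8.0, size=3652)  # ~10 years of daily totals (mm)

wet_days = daily_precip[daily_precip >= 1.0]      # 1 mm wet-day cutoff (assumed)
p95 = np.percentile(wet_days, 95)                 # percentile threshold
n_extreme = int((daily_precip > p95).sum())       # frequency of extreme events
print(f"95th-percentile wet-day threshold: {p95:.1f} mm; exceedances: {n_extreme}")
```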
The Radiological Physics Center's standard dataset for small field size output factors.
Followill, David S; Kry, Stephen F; Qin, Lihong; Lowenstein, Jessica; Molineu, Andrea; Alvarez, Paola; Aguirre, Jose Francisco; Ibbott, Geoffrey S
2012-08-08
Delivery of accurate intensity-modulated radiation therapy (IMRT) or stereotactic radiotherapy depends on a multitude of steps in the treatment delivery process. These steps range from imaging of the patient to dose calculation to machine delivery of the treatment plan. Within the treatment planning system's (TPS) dose calculation algorithm, various unique small field dosimetry parameters are essential, such as multileaf collimator modeling and field size dependence of the output. One of the largest challenges in this process is determining accurate small field size output factors. The Radiological Physics Center (RPC), as part of its mission to ensure that institutions deliver comparable and consistent radiation doses to their patients, conducts on-site dosimetry review visits to institutions. As a part of the on-site audit, the RPC measures the small field size output factors as might be used in IMRT treatments, and compares the resulting field size dependent output factors to values calculated by the institution's treatment planning system (TPS). The RPC has gathered multiple small field size output factor datasets for X-ray energies ranging from 6 to 18 MV from Varian, Siemens and Elekta linear accelerators. These datasets were measured at 10 cm depth and ranged from 10 × 10 cm² to 2 × 2 cm². The field sizes were defined by the MLC, and for the Varian machines the secondary jaws were maintained at 10 × 10 cm². The RPC measurements were made with a micro-ion chamber whose volume was small enough to gather a full ionization reading even for the 2 × 2 cm² field size. The RPC-measured output factors are tabulated and are reproducible with standard deviations (SD) ranging from 0.1% to 1.5%, while the institutions' calculated values had a much larger SD range, ranging up to 7.9% [corrected]. The absolute average percent differences were greater for the 2 × 2 cm² than for the other field sizes. The RPC's measured small field output factors provide institutions with a standard dataset against which to compare their TPS calculated values. Any discrepancies noted between the standard dataset and calculated values should be investigated with careful measurements and with attention to the specific beam model.
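The quantity being audited is straightforward to compute once readings exist: each field's reading is normalised to the 10 × 10 cm² reference field, and the result is compared with the TPS-calculated value. The sketch below uses invented numbers purely to show the bookkeeping, not RPC data.

```python
# Sketch: field-size-dependent output factors and percent difference vs. TPS values.
readings = {10: 1.000, 6: 0.972, 4: 0.951, 3: 0.932, 2: 0.905}  # micro-chamber readings (relative, invented)
tps      = {10: 1.000, 6: 0.975, 4: 0.958, 3: 0.945, 2: 0.930}  # TPS-calculated output factors (invented)

for size in sorted(readings, reverse=True):
    measured = readings[size] / readings[10]          # output factor relative to the 10 x 10 reference
    diff = 100.0 * (tps[size] - measured) / measured  # percent difference, TPS vs. measured
    print(f"{size} x {size} cm^2: OF={measured:.3f}, TPS-measured difference={diff:+.1f}%")
```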
Exploring the potential of machine learning to break deadlock in convection parameterization
NASA Astrophysics Data System (ADS)
Pritchard, M. S.; Gentine, P.
2017-12-01
We explore the potential of modern machine learning tools (via TensorFlow) to replace parameterization of deep convection in climate models. Our strategy begins by generating a large (≈1 TB) training dataset from time-step level (30-min) output harvested from a one-year integration of a zonally symmetric, uniform-SST aquaplanet integration of the SuperParameterized Community Atmosphere Model (SPCAM). We harvest the inputs and outputs connecting each of SPCAM's 8,192 embedded cloud-resolving model (CRM) arrays to its host climate model's arterial thermodynamic state variables to afford 143M independent training instances. We demonstrate that this dataset is sufficiently large to induce preliminary convergence for neural network prediction of desired outputs of SP, i.e. CRM-mean convective heating and moistening profiles. Sensitivity of the machine learning convergence to the nuances of the TensorFlow implementation is discussed, as well as results from pilot tests of the neural network operating inline within SPCAM as a replacement for the (super)parameterization of convection.
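The learning problem itself is a standard supervised regression from a column state vector to heating and moistening profiles. The sketch below sets up such a network in Keras; the layer sizes, the 30-level grid, and the random stand-in data are assumptions for illustration, not the configuration used in the study.

```python
# Sketch: a dense network mapping the host model's column state to CRM-mean
# heating and moistening tendency profiles.  Data here are synthetic stand-ins.
import numpy as np
import tensorflow as tf

n_levels = 30
n_inputs = 2 * n_levels + 4        # e.g. T and q profiles plus a few scalar inputs (assumed)
n_outputs = 2 * n_levels           # heating and moistening tendency profiles

model = tf.keras.Sequential([
    tf.keras.Input(shape=(n_inputs,)),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(n_outputs),
])
model.compile(optimizer="adam", loss="mse")

rng = np.random.default_rng(4)
X = rng.normal(size=(10000, n_inputs)).astype("float32")   # stand-in training instances
Y = rng.normal(size=(10000, n_outputs)).astype("float32")  # stand-in CRM-mean tendencies
model.fit(X, Y, batch_size=1024, epochs=2, verbose=0)
```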
Caliver: An R package for CALIbration and VERification of forest fire gridded model outputs.
Vitolo, Claudia; Di Giuseppe, Francesca; D'Andrea, Mirko
2018-01-01
The name caliver stands for CALIbration and VERification of forest fire gridded model outputs. This is a package developed for the R programming language and available under an APACHE-2 license from a public repository. In this paper we describe the functionalities of the package and give examples using publicly available datasets. Fire danger model outputs are taken from the modeling components of the European Forest Fire Information System (EFFIS) and observed burned areas from the Global Fire Emission Database (GFED). Complete documentation, including a vignette, is also available within the package.
Improving Snow Modeling by Assimilating Observational Data Collected by Citizen Scientists
NASA Astrophysics Data System (ADS)
Crumley, R. L.; Hill, D. F.; Arendt, A. A.; Wikstrom Jones, K.; Wolken, G. J.; Setiawan, L.
2017-12-01
Modeling seasonal snow pack in alpine environments includes a multiplicity of challenges caused by a lack of spatially extensive and temporally continuous observational datasets. This is partially due to the difficulty of collecting measurements in harsh, remote environments where extreme gradients in topography exist, accompanied by large model domains and inclement weather. Engaging snow enthusiasts, snow professionals, and community members to participate in the process of data collection may address some of these challenges. In this study, we use SnowModel to estimate seasonal snow water equivalence (SWE) in the Thompson Pass region of Alaska while incorporating snow depth measurements collected by citizen scientists. We develop a modeling approach to assimilate hundreds of snow depth measurements from participants in the Community Snow Observations (CSO) project (www.communitysnowobs.org). The CSO project includes a mobile application where participants record and submit geo-located snow depth measurements while working and recreating in the study area. These snow depth measurements are randomly located within the model grid at irregular time intervals over the span of four months in the 2017 water year. This snow depth observation dataset is converted into a SWE dataset by employing an empirically-based, bulk density and SWE estimation method. We then assimilate this data using SnowAssim, a sub-model within SnowModel, to constrain the SWE output by the observed data. Multiple model runs are designed to represent an array of output scenarios during the assimilation process. An effort to present model output uncertainties is included, as well as quantification of the pre- and post-assimilation divergence in modeled SWE. Early results reveal pre-assimilation SWE estimations are consistently greater than the post-assimilation estimations, and the magnitude of divergence increases throughout the snow pack evolution period. This research has implications beyond the Alaskan context because it increases our ability to constrain snow modeling outputs by making use of snow measurements collected by non-expert, citizen scientists.
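The depth-to-SWE conversion and the pull toward observations can be illustrated in a few lines. The bulk density value and the simple whole-field bias removal below are placeholder assumptions; the project uses an empirically based density and SWE estimation method and SnowAssim's own assimilation scheme.

```python
# Sketch: convert citizen-science snow depths to SWE with an assumed bulk
# density, then crudely pull the modelled SWE field toward the observations.
import numpy as np

BULK_DENSITY = 300.0     # kg m^-3, assumed seasonal bulk snow density (placeholder)
WATER_DENSITY = 1000.0   # kg m^-3

def depth_to_swe(depth_m):
    """SWE (m water equivalent) from snow depth (m) under the assumed bulk density."""
    return depth_m * BULK_DENSITY / WATER_DENSITY

obs_depth = np.array([1.2, 0.8, 2.1])        # observed depths at three sites (m, invented)
model_swe = np.array([0.45, 0.30, 0.70])     # modelled SWE at the same grid cells (m w.e., invented)

obs_swe = depth_to_swe(obs_depth)
bias = (model_swe - obs_swe).mean()          # pre-assimilation divergence
model_swe_adjusted = model_swe - bias        # crude whole-field correction
print(f"mean model-minus-obs SWE before adjustment: {bias:+.3f} m w.e.")
```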
A Hierarchical multi-input and output Bi-GRU Model for Sentiment Analysis on Customer Reviews
NASA Astrophysics Data System (ADS)
Zhang, Liujie; Zhou, Yanquan; Duan, Xiuyu; Chen, Ruiqi
2018-03-01
Multi-label sentiment classification on customer reviews is a practical and challenging task in Natural Language Processing. In this paper, we propose a hierarchical multi-input and output model based on bi-directional recurrent neural networks, which considers both the semantic and lexical information of emotional expression. Our model applies two independent Bi-GRU layers to generate part-of-speech and sentence representations. The lexical information is then incorporated via attention over the output of a softmax activation on the part-of-speech representation. In addition, we combine the probabilities of auxiliary labels as features with the hidden layer to capture crucial correlations between output labels. The experimental results show that our model is computationally efficient and achieves breakthrough improvements on a customer reviews dataset.
A Hybrid Neuro-Fuzzy Model For Integrating Large Earth-Science Datasets
NASA Astrophysics Data System (ADS)
Porwal, A.; Carranza, J.; Hale, M.
2004-12-01
A GIS-based hybrid neuro-fuzzy approach to integration of large earth-science datasets for mineral prospectivity mapping is described. It implements a Takagi-Sugeno type fuzzy inference system in the framework of a four-layered feed-forward adaptive neural network. Each unique combination of the datasets is considered a feature vector whose components are derived by knowledge-based ordinal encoding of the constituent datasets. A subset of feature vectors with a known output target vector (i.e., unique conditions known to be associated with either a mineralized or a barren location) is used for the training of an adaptive neuro-fuzzy inference system. Training involves iterative adjustment of parameters of the adaptive neuro-fuzzy inference system using a hybrid learning procedure for mapping each training vector to its output target vector with minimum sum of squared error. The trained adaptive neuro-fuzzy inference system is used to process all feature vectors. The output for each feature vector is a value that indicates the extent to which a feature vector belongs to the mineralized class or the barren class. These values are used to generate a prospectivity map. The procedure is demonstrated by an application to regional-scale base metal prospectivity mapping in a study area located in the Aravalli metallogenic province (western India). A comparison of the hybrid neuro-fuzzy approach with pure knowledge-driven fuzzy and pure data-driven neural network approaches indicates that the former offers a superior method for integrating large earth-science datasets for predictive spatial mathematical modelling.
Kim, Ki Hwan; Park, Sung-Hong
2017-04-01
The balanced steady-state free precession (bSSFP) MR sequence is frequently used in clinics, but is sensitive to off-resonance effects, which can cause banding artifacts. Often multiple bSSFP datasets are acquired at different phase cycling (PC) angles and then combined in a special way for banding artifact suppression. Many strategies for combining the datasets have been suggested for banding artifact suppression, but there are still limitations in their performance, especially when the number of phase-cycled bSSFP datasets is small. The purpose of this study is to develop a learning-based model to combine the multiple phase-cycled bSSFP datasets for better banding artifact suppression. A multilayer perceptron (MLP) is a feedforward artificial neural network consisting of input, hidden, and output layers. MLP models were trained on input bSSFP datasets acquired from human brain and knee at 3T, separately for two and four PC angles. Banding-free bSSFP images were generated by maximum-intensity projection (MIP) of 8 or 12 phase-cycled datasets and were used as targets for training the output layer. The trained MLP models were applied to other brain and knee datasets acquired with different scan parameters and also to multiple phase-cycled bSSFP functional MRI datasets acquired on rat brain at 9.4T, in comparison with the conventional MIP method. Simulations were also performed to validate the MLP approach. Both the simulations and human experiments demonstrated that MLP suppressed banding artifacts significantly, superior to MIP in both banding artifact suppression and SNR efficiency. MLP demonstrated superior performance over MIP for the 9.4T fMRI data as well, which were not used for training the models, while visually preserving the fMRI maps very well. Artificial neural networks are a promising technique for combining multiple phase-cycled bSSFP datasets for banding artifact suppression. Copyright © 2016 Elsevier Inc. All rights reserved.
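Conceptually, the combination step is pixel-wise regression: the intensities from the few acquired phase cycles are the inputs, and a banding-suppressed target (here, a MIP over many cycles) is the output. A small sketch follows; the synthetic data and network size are placeholders, not the study's configuration.

```python
# Sketch: an MLP that maps 4 phase-cycled bSSFP intensities per pixel to a
# banding-suppressed intensity, trained against an 8-cycle MIP target.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(5)
n_pixels, n_pc_small, n_pc_full = 20000, 4, 8

full = rng.random((n_pixels, n_pc_full))       # stand-in for 8 phase-cycled intensities
X = full[:, :n_pc_small]                       # the 4 acquisitions actually available
y = full.max(axis=1)                           # MIP over all 8 cycles as the training target

mlp = MLPRegressor(hidden_layer_sizes=(64,), max_iter=300, random_state=0)
mlp.fit(X[:15000], y[:15000])
print("held-out R^2:", mlp.score(X[15000:], y[15000:]))
```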
NASA Astrophysics Data System (ADS)
Van Pelt, S.; Kohfeld, K. E.; Allen, D. M.
2015-12-01
The decline of the Mayan Civilization is thought to have been caused by a series of droughts that affected the Yucatan Peninsula during the Terminal Classic Period (T.C.P.), 800-1000 AD. The goals of this study are two-fold: (a) to compare paleo-model simulations of the past 1000 years with a compilation of multiple proxies of changes in moisture conditions for the Yucatan Peninsula during the T.C.P., and (b) to use this comparison to inform the modeling of groundwater recharge in this region, with a focus on generating the daily climate data series needed as input to a groundwater recharge model. To achieve the first objective, we compiled a dataset of 5 proxies from seven locations across the Yucatan Peninsula, to be compared with temperature and precipitation output from the Community Climate System Model Version 4 (CCSM4), which is part of the Coupled Model Intercomparison Project Phase 5 (CMIP5) past1000 experiment. The proxy dataset includes oxygen isotopes from speleothems and gastropod/ostracod shells (11 records), and sediment density, mineralogy, and magnetic susceptibility records from lake sediment cores (3 records). The proxy dataset is supplemented by a compilation of reconstructed temperatures using pollen and tree ring records for North America (archived in the PAGES2k global network data). Our preliminary analysis suggests that many of these datasets show evidence of a drier and warmer climate on the Yucatan Peninsula around the T.C.P. when compared to modern conditions, although the amplitude and timing of individual warming and drying events vary between sites. This comparison with modeled output will ultimately be used to inform backward shift factors that will be input to a stochastic weather generator. These shift factors will be based on monthly changes in temperature and precipitation and applied to a modern daily climate time series for the Yucatan Peninsula to produce a daily climate time series for the T.C.P.
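The shift-factor (delta) step at the end is a simple transformation once monthly changes are in hand; the sketch below applies assumed monthly temperature offsets to a synthetic modern daily series. The numbers and window labels are placeholders, and the study's full approach additionally feeds the factors through a stochastic weather generator.

```python
# Sketch: apply monthly delta shift factors to a modern daily series to obtain
# a paleo-like (T.C.P.) daily series.  Values are synthetic placeholders.
import numpy as np
import pandas as pd

rng = np.random.default_rng(6)
days = pd.date_range("2001-01-01", "2010-12-31", freq="D")
modern_daily_t = pd.Series(25 + 3 * rng.standard_normal(len(days)), index=days)

# Monthly shift factors: paleo-window mean minus modern-window mean (degrees C, assumed).
shift = pd.Series([0.8, 0.9, 1.1, 1.2, 1.0, 0.7, 0.6, 0.6, 0.8, 1.0, 1.1, 0.9],
                  index=range(1, 13))

paleo_like_daily_t = modern_daily_t + modern_daily_t.index.month.map(shift).values
print(paleo_like_daily_t.head())
```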
The link provided access to all the datasets and metadata used in this manuscript for the model development and evaluation per Geoscientific Model Development's publication guidelines, with the exception of the model output due to its size. This dataset is associated with the following publication: Bash, J., K. Baker, and M. Beaver. Evaluation of improved land use and canopy representation in BEIS v3.61 with biogenic VOC measurements in California. Geoscientific Model Development. Copernicus Publications, Katlenburg-Lindau, GERMANY, 9: 2191-2207, (2016).
NASA Technical Reports Server (NTRS)
Case, Jonathan L.; Kumar, Sujay V.; Kuligowski, Robert J.; Langston, Carrie
2013-01-01
This paper and poster presented a description of the current real-time SPoRT-LIS run over the southeastern CONUS to provide high-resolution land surface initialization grids for local numerical model forecasts at NWS forecast offices. The LIS hourly output also offers a supplemental dataset to aid in situational awareness for convective initiation forecasts, assessing flood potential, and monitoring drought at fine scales. It is a goal of SPoRT and several NWS forecast offices to expand the LIS to an entire CONUS domain, so that LIS output can be utilized by NWS Western Region offices, among others. To make this expansion clean and to provide high-quality land surface output, SPoRT tested new precipitation datasets in LIS as alternative forcing to the current radar+gauge Stage IV product. Similar to the Stage IV product, the NMQ product showed comparable patterns of precipitation and soil moisture distribution, but suffered from radar gaps in the intermountain West and incorrectly set values to zero instead of missing in the data-void regions of Mexico and Canada. The other dataset tested was the next-generation GOES-R QPE algorithm, which exhibited a high bias in both coverage and intensity of accumulated precipitation relative to the control (NLDAS2), Stage IV, and NMQ simulations. The resulting root zone soil moisture was substantially higher in most areas.
User's Guide for the Agricultural Non-Point Source (AGNPS) Pollution Model Data Generator
Finn, Michael P.; Scheidt, Douglas J.; Jaromack, Gregory M.
2003-01-01
BACKGROUND Throughout this user guide, we refer to datasets that we used in conjunction with the development of this software to support cartographic research and to produce the datasets needed to conduct that research. However, this software can be used with these datasets or with more 'generic' versions of data of the appropriate type. For example, throughout the guide, we refer to national land cover data (NLCD) and digital elevation model (DEM) data from the U.S. Geological Survey (USGS) at a 30-m resolution, but any digital terrain model or land cover data at any appropriate resolution will produce results. Another key point to keep in mind is to use a consistent data resolution for all the datasets per model run. The U.S. Department of Agriculture (USDA) developed the Agricultural Nonpoint Source (AGNPS) pollution model of watershed hydrology in response to the complex problem of managing nonpoint sources of pollution. AGNPS simulates the behavior of runoff, sediment, and nutrient transport from watersheds that have agriculture as their prime use. The model operates on a cell basis and is a distributed-parameter, event-based model. The model requires 22 input parameters. Output parameters are grouped primarily by hydrology, sediment, and chemical output (Young and others, 1995). Elevation, land cover, and soil are the base data from which to extract the 22 input parameters required by the AGNPS. For automatic parameter extraction, follow the general process described in this guide of extraction from the geospatial data through the AGNPS Data Generator to generate the input parameters required by the pollution model (Finn and others, 2002).
BMDExpress Data Viewer: A Visualization Tool to Analyze ...
Regulatory agencies increasingly apply benchmark dose (BMD) modeling to determine points of departure in human risk assessments. BMDExpress applies BMD modeling to transcriptomics datasets and groups genes to biological processes and pathways for rapid assessment of doses at which biological perturbations occur. However, graphing and analytical capabilities within BMDExpress are limited, and the analysis of output files is challenging. We developed a web-based application, BMDExpress Data Viewer, for visualization and graphical analyses of BMDExpress output files. The software application consists of two main components: ‘Summary Visualization Tools’ and ‘Dataset Exploratory Tools’. We demonstrate through two case studies that the ‘Summary Visualization Tools’ can be used to examine and assess the distributions of probe and pathway BMD outputs, as well as derive a potential regulatory BMD through the modes or means of the distributions. The ‘Functional Enrichment Analysis’ tool presents biological processes in a two-dimensional bubble chart view. By applying filters of pathway enrichment p-value and minimum number of significant genes, we showed that the Functional Enrichment Analysis tool can be applied to select pathways that are potentially sensitive to chemical perturbations. The ‘Multiple Dataset Comparison’ tool enables comparison of BMDs across multiple experiments (e.g., across time points, tissues, or organisms, etc.). The ‘BMDL-BM
Paleoclimate reconstruction through Bayesian data assimilation
NASA Astrophysics Data System (ADS)
Fer, I.; Raiho, A.; Rollinson, C.; Dietze, M.
2017-12-01
Methods of paleoclimate reconstruction from plant-based proxy data rely on the assumption of a static vegetation-climate link, which is often established between modern climate and vegetation. This approach might result in biased climate reconstructions, as it does not account for vegetation dynamics. Predictive tools such as process-based dynamic vegetation models (DVM) and their Bayesian inversion could be used to construct the link between plant-based proxy data and palaeoclimate more realistically. In other words, given the proxy data, it is possible to infer the climate that could result in that particular vegetation composition by comparing the DVM outputs to the proxy data within a Bayesian state data assimilation framework. In this study, using fossil pollen data from five sites across the northern hardwood region of the US, we assimilate fractional composition and aboveground biomass into dynamic vegetation models, LINKAGES, LPJ-GUESS and ED2. To do this, starting from 4 Global Climate Model outputs, we generate an ensemble of downscaled meteorological drivers for the period 850-2015. Then, as a first pass, we weigh these ensembles based on their fidelity with independent paleoclimate proxies. Next, we run the models with this ensemble of drivers and, comparing the ensemble model output to the vegetation data, adjust the model state estimates towards the data. At each iteration, we also reweight the climate values that make the model and data consistent, producing a reconstructed climate time-series dataset. We validated the method using present-day datasets, as well as a synthetic dataset, and then assessed the consistency of results across ecosystem models. Our method allows the combination of multiple data types to reconstruct the paleoclimate, with associated uncertainty estimates, based on ecophysiological and ecological processes rather than phenomenological correlations with proxy data.
Emulation: A fast stochastic Bayesian method to eliminate model space
NASA Astrophysics Data System (ADS)
Roberts, Alan; Hobbs, Richard; Goldstein, Michael
2010-05-01
Joint inversion of large 3D datasets has been the goal of geophysicists ever since the datasets first started to be produced. There are two broad approaches to this kind of problem: traditional deterministic inversion schemes and more recently developed Bayesian search methods, such as MCMC (Markov Chain Monte Carlo). However, using both these kinds of schemes has proved prohibitively expensive, both in computing power and time cost, due to the normally very large model space which needs to be searched using forward model simulators which take considerable time to run. At the heart of strategies aimed at accomplishing this kind of inversion is the question of how to reliably and practicably reduce the size of the model space in which the inversion is to be carried out. Here we present a practical Bayesian method, known as emulation, which can address this issue. Emulation is a Bayesian technique used with considerable success in a number of technical fields, such as in astronomy, where the evolution of the universe has been modelled using this technique, and in the petroleum industry, where history matching of hydrocarbon reservoirs is carried out. The method of emulation involves building a fast-to-compute, uncertainty-calibrated approximation to a forward model simulator. We do this by modelling the output data from a number of forward simulator runs with a computationally cheap function, and then fitting the coefficients defining this function to the model parameters. By calibrating the error of the emulator output with respect to the full simulator output, we can use this to screen out large areas of model space which contain only implausible models. For example, starting with what may be considered a geologically reasonable prior model space of 10000 models, using the emulator we can quickly show that only models which lie within 10% of that model space actually produce output data which is plausibly similar in character to an observed dataset. We can thus much more tightly constrain the input model space for a deterministic inversion or MCMC method. By using this technique jointly on several datasets (specifically seismic, gravity, and magnetotelluric (MT) data describing the same region), we can include in our modelling uncertainties in the data measurements, the relationships between the various physical parameters involved, and the model representation uncertainty, and at the same time further reduce the range of plausible models to several percent of the original model space. Being stochastic in nature, the output posterior parameter distributions also allow our understanding of, and beliefs about, a geological region to be objectively updated, with full assessment of uncertainties, and so the emulator is also an inversion-type tool in its own right, with the advantage (as with any Bayesian method) that our uncertainties from all sources (both data and model) can be fully evaluated.
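The core loop of emulation and implausibility screening can be shown in miniature: fit a cheap, uncertainty-aware surrogate to a few expensive simulator runs, then discard candidate models whose emulated output is implausibly far from the observation. The sketch below uses a toy one-parameter "simulator", a Gaussian process surrogate, and a conventional cutoff of 3 standard deviations; all of these are illustrative assumptions rather than the authors' setup.

```python
# Sketch: emulate an expensive forward model and screen out implausible models.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def expensive_simulator(m):                 # toy stand-in for a seismic/gravity/MT forward model
    return np.sin(3 * m) + 0.5 * m

rng = np.random.default_rng(7)
design = rng.uniform(0, 2, size=(20, 1))            # a handful of simulator runs
runs = expensive_simulator(design).ravel()

emulator = GaussianProcessRegressor(kernel=RBF(length_scale=0.3)).fit(design, runs)

candidates = np.linspace(0, 2, 10000).reshape(-1, 1)  # the full prior model space
mean, std = emulator.predict(candidates, return_std=True)

observed, obs_error = 1.2, 0.05                       # invented observation and its error
implausibility = np.abs(mean - observed) / np.sqrt(std**2 + obs_error**2)
plausible = candidates[implausibility < 3.0]          # keep models within 3 sigma
print(f"{100 * plausible.size / candidates.size:.1f}% of model space remains plausible")
```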
Multisite Evaluation of a Data Quality Tool for Patient-Level Clinical Data Sets
Huser, Vojtech; DeFalco, Frank J.; Schuemie, Martijn; Ryan, Patrick B.; Shang, Ning; Velez, Mark; Park, Rae Woong; Boyce, Richard D.; Duke, Jon; Khare, Ritu; Utidjian, Levon; Bailey, Charles
2016-01-01
Introduction: Data quality and fitness for analysis are crucial if outputs of analyses of electronic health record data or administrative claims data are to be trusted by the public and the research community. Methods: We describe a data quality analysis tool (called Achilles Heel) developed by the Observational Health Data Sciences and Informatics Collaborative (OHDSI) and compare outputs from this tool as it was applied to 24 large healthcare datasets across seven different organizations. Results: We highlight 12 data quality rules that identified issues in at least 10 of the 24 datasets and provide a full set of 71 rules identified in at least one dataset. Achilles Heel is freely available software that provides a useful starter set of data quality rules with the ability to add additional rules. We also present results of a structured email-based interview of all participating sites that collected qualitative comments about the value of Achilles Heel for data quality evaluation. Discussion: Our analysis represents the first comparison of outputs from a data quality tool that implements a fixed (but extensible) set of data quality rules. Thanks to a common data model, we were able to quickly compare multiple datasets originating from several countries in America, Europe and Asia. PMID:28154833
Use of regional climate model output for hydrologic simulations
Hay, L.E.; Clark, M.P.; Wilby, R.L.; Gutowski, W.J.; Leavesley, G.H.; Pan, Z.; Arritt, R.W.; Takle, E.S.
2002-01-01
Daily precipitation and maximum and minimum temperature time series from a regional climate model (RegCM2), configured using the continental United States as a domain and run at approximately 52-km spatial resolution, were used as input to a distributed hydrologic model for one rainfall-dominated basin (Alapaha River at Statenville, Georgia) and three snowmelt-dominated basins (Animas River at Durango, Colorado; east fork of the Carson River near Gardnerville, Nevada; and Cle Elum River near Roslyn, Washington). For comparison purposes, spatially averaged daily datasets of precipitation and maximum and minimum temperature were developed from measured data for each basin. These datasets included precipitation and temperature data for all stations (hereafter, All-Sta) located within the area of the RegCM2 output used for each basin, but excluded station data used to calibrate the hydrologic model. Both the RegCM2 output and All-Sta data capture the gross aspects of the seasonal cycles of precipitation and temperature. However, in all four basins, the RegCM2- and All-Sta-based simulations of runoff show little skill on a daily basis [Nash-Sutcliffe (NS) values range from 0.05 to 0.37 for RegCM2 and -0.08 to 0.65 for All-Sta]. When the precipitation and temperature biases are corrected in the RegCM2 output and All-Sta data (Bias-RegCM2 and Bias-All, respectively), the accuracy of the daily runoff simulations improves dramatically for the snowmelt-dominated basins (NS values range from 0.41 to 0.66 for RegCM2 and 0.60 to 0.76 for All-Sta). In the rainfall-dominated basin, runoff simulations based on the Bias-RegCM2 output show no skill (NS value of 0.09), whereas Bias-All simulated runoff improves (NS value improved from -0.08 to 0.72). These results indicate that measured data at the coarse resolution of the RegCM2 output can be made appropriate for basin-scale modeling through bias correction (essentially a magnitude correction). However, RegCM2 output, even when bias corrected, does not contain the day-to-day variability present in the All-Sta dataset that is necessary for basin-scale modeling. Future work is warranted to identify the causes of systematic biases in RegCM2 simulations, develop methods to remove the biases, and improve RegCM2 simulations of daily variability in local climate.
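The "magnitude correction" described above amounts to deriving monthly correction factors against station data and applying them to the climate-model series. A small sketch follows, using multiplicative monthly factors for precipitation on synthetic data; the factors, the series, and the choice of multiplicative rather than additive adjustment are assumptions for illustration.

```python
# Sketch: monthly multiplicative bias correction of an RCM precipitation series.
import numpy as np
import pandas as pd

rng = np.random.default_rng(8)
days = pd.date_range("1981-01-01", "1990-12-31", freq="D")
obs_precip = pd.Series(rng.gamma(0.5, 6.0, len(days)), index=days)          # station-based series
rcm_precip = obs_precip * 1.4 + rng.gamma(0.2, 2.0, len(days))              # wet-biased RCM series

obs_monthly = obs_precip.groupby(obs_precip.index.month).mean()
rcm_monthly = rcm_precip.groupby(rcm_precip.index.month).mean()
factor = obs_monthly / rcm_monthly                                          # monthly scaling factors

rcm_corrected = rcm_precip * rcm_precip.index.month.map(factor).values
print(f"mean daily bias before: {rcm_precip.mean() - obs_precip.mean():+.2f} mm, "
      f"after: {rcm_corrected.mean() - obs_precip.mean():+.2f} mm")
```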
NASA Technical Reports Server (NTRS)
Alarcon, Vladimir J.; Nigro, Joseph D.; McAnally, William H.; O'Hara, Charles G.; Engman, Edwin Ted; Toll, David
2011-01-01
This paper documents the use of simulated Moderate Resolution Imaging Spectroradiometer land use/land cover (MODIS-LULC), NASA-LIS generated precipitation and evapo-transpiration (ET), and Shuttle Radar Topography Mission (SRTM) datasets (in conjunction with standard land use, topographical and meteorological datasets) as input to hydrological models routinely used by the watershed hydrology modeling community. The study is focused on coastal watersheds in the Mississippi Gulf Coast, although one of the test cases focuses on an inland watershed located in the northeastern State of Mississippi, USA. The decision support tools (DSTs) into which the NASA datasets were assimilated were the Soil and Water Assessment Tool (SWAT) and the Hydrological Simulation Program FORTRAN (HSPF). These DSTs are endorsed by several US government agencies (EPA, FEMA, USGS) for water resources management strategies. These models use physiographic and meteorological data extensively. Precipitation gages and USGS gage stations in the region were used to calibrate several HSPF and SWAT model applications. Land use and topographical datasets were swapped to assess model output sensitivities. NASA-LIS meteorological data were introduced in the calibrated model applications for simulation of watershed hydrology for a time period in which no weather data were available (1997-2006). The performance of the NASA datasets in the context of hydrological modeling was assessed through comparison of measured and model-simulated hydrographs. Overall, NASA datasets were as useful as standard land use, topographical, and meteorological datasets. Moreover, NASA datasets made possible analyses that the standard datasets could not, e.g., the introduction of land use dynamics into hydrological simulations.
Evaluation of Statistical Downscaling Skill at Reproducing Extreme Events
NASA Astrophysics Data System (ADS)
McGinnis, S. A.; Tye, M. R.; Nychka, D. W.; Mearns, L. O.
2015-12-01
Climate model outputs usually have much coarser spatial resolution than is needed by impacts models. Although higher resolution can be achieved using regional climate models for dynamical downscaling, further downscaling is often required. The final resolution gap is often closed with a combination of spatial interpolation and bias correction, which constitutes a form of statistical downscaling. We use this technique to downscale regional climate model data and evaluate its skill in reproducing extreme events. We downscale output from the North American Regional Climate Change Assessment Program (NARCCAP) dataset from its native 50-km spatial resolution to the 4-km resolution of the University of Idaho's METDATA gridded surface meteorological dataset, which derives from the PRISM and NLDAS-2 observational datasets. We operate on the major variables used in impacts analysis at a daily timescale: daily minimum and maximum temperature, precipitation, humidity, pressure, solar radiation, and winds. To interpolate the data, we use the patch recovery method from the Earth System Modeling Framework (ESMF) regridding package. We then bias correct the data using Kernel Density Distribution Mapping (KDDM), which has been shown to exhibit superior overall performance across multiple metrics. Finally, we evaluate the skill of this technique in reproducing extreme events by comparing raw and downscaled output with meteorological station data in different bioclimatic regions according to the skill scores defined by Perkins et al. in 2013 for evaluation of AR4 climate models. We also investigate techniques for improving bias correction of values in the tails of the distributions. These techniques include binned kernel density estimation, logspline kernel density estimation, and transfer functions constructed by fitting the tails with a generalized Pareto distribution.
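Distribution-mapping bias correction of the kind described here transforms each model value through the model CDF and back through the observed inverse CDF. KDDM estimates both distributions with kernel densities; the sketch below uses empirical quantiles instead, which is the same mapping in cruder form, and all data are synthetic.

```python
# Sketch of distribution-mapping bias correction (empirical-quantile variant).
import numpy as np

rng = np.random.default_rng(9)
obs = rng.gamma(2.0, 5.0, 5000)            # observed training sample
mod = rng.gamma(2.5, 6.0, 5000)            # biased model training sample
mod_future = rng.gamma(2.5, 6.5, 5000)     # model values to be corrected

probs = np.linspace(0.001, 0.999, 999)
mod_q = np.quantile(mod, probs)            # model quantile function
obs_q = np.quantile(obs, probs)            # observed quantile function

# F_mod(x) via interpolation, then F_obs^-1 of that probability.
cdf_vals = np.interp(mod_future, mod_q, probs)
corrected = np.interp(cdf_vals, probs, obs_q)
print(f"mean before: {mod_future.mean():.1f}, after: {corrected.mean():.1f}, obs: {obs.mean():.1f}")
```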
Black Carbon Concentration from Worldwide Aerosol Robotic Network (AERONET) Measurements
NASA Technical Reports Server (NTRS)
Schuster, Gregory L.; Dubovik, Oleg; Holben, Brent N.; Clothiaux, Eugene E.
2006-01-01
The carbon emissions inventories used to initialize transport models and general circulation models are highly parameterized, and created on the basis of multiple sparse datasets (such as fuel use inventories and emission factors). The resulting inventories are uncertain by at least a factor of 2, and this uncertainty is carried forward to the model output. [Bond et al., 1998, Bond et al., 2004, Cooke et al., 1999, Streets et al., 2001] Worldwide black carbon concentration measurements are needed to assess the efficacy of the carbon emissions inventory and transport model output on a continuous basis.
A Generalized Mixture Framework for Multi-label Classification
Hong, Charmgil; Batal, Iyad; Hauskrecht, Milos
2015-01-01
We develop a novel probabilistic ensemble framework for multi-label classification that is based on the mixtures-of-experts architecture. In this framework, we combine multi-label classification models in the classifier chains family that decompose the class posterior distribution P(Y1, …, Yd|X) using a product of posterior distributions over components of the output space. Our approach captures different input–output and output–output relations that tend to change across data. As a result, we can recover a rich set of dependency relations among inputs and outputs that a single multi-label classification model cannot capture due to its modeling simplifications. We develop and present algorithms for learning the mixtures-of-experts models from data and for performing multi-label predictions on unseen data instances. Experiments on multiple benchmark datasets demonstrate that our approach achieves highly competitive results and outperforms the existing state-of-the-art multi-label classification methods. PMID:26613069
Application of support vector machines for copper potential mapping in Kerman region, Iran
NASA Astrophysics Data System (ADS)
Shabankareh, Mahdi; Hezarkhani, Ardeshir
2017-04-01
The first step in systematic exploration studies is mineral potential mapping, which involves classifying the study area into favorable and unfavorable parts. Support vector machines (SVM) are designed for supervised classification based on statistical learning theory; this variant is called support vector classification (SVC). This paper describes an SVC model that combines regional-scale exploration data for copper potential mapping in the Kerman copper-bearing belt in southern Iran. The data layers, or evidential maps, comprised six datasets: lithology, tectonics, airborne geophysics, ferric alteration, hydroxide alteration and geochemistry. The SVC modeling selected 2220 pixels as favorable zones, approximately 25 percent of the study area. In addition, 66 out of 86 copper indices, approximately 78.6% of the total, were located in favorable zones. The other main goal of this study was to determine how each input affects the favorable output. For this purpose, the histogram of each normalized input dataset over the favorable output was drawn. These histograms showed that each information layer had a certain pattern, and these patterns in the SVC results could be considered regional copper exploration characteristics.
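A minimal sketch of probabilistic support vector classification over stacked evidential layers, assuming the six layers are provided as per-pixel feature columns; the synthetic data, kernel settings and threshold are illustrative rather than the study's actual configuration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Each row is one pixel; columns are the six evidential layers
# (lithology, tectonics, airborne geophysics, ferric alteration,
#  hydroxide alteration, geochemistry) -- values here are synthetic.
rng = np.random.default_rng(1)
X = rng.normal(size=(9000, 6))
y = (X[:, 5] + 0.5 * X[:, 3] + rng.normal(scale=0.5, size=9000) > 1.5).astype(int)

X_std = StandardScaler().fit_transform(X)
clf = SVC(kernel="rbf", C=10.0, gamma="scale", probability=True).fit(X_std, y)

# "Favourability" per pixel; thresholding yields the favourable zones
favourability = clf.predict_proba(X_std)[:, 1]
favourable_pixels = favourability > 0.5
print(f"{favourable_pixels.sum()} pixels flagged favourable")
```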
Impact of length of calibration period on the APEX model output simulation performance
USDA-ARS?s Scientific Manuscript database
Datasets from long-term monitoring sites that can be used for calibration and validation of hydrologic and water quality models are rare due to resource constraints. As a result, hydrologic and water quality models are calibrated and, when possible, validated using short-term measured data. A previo...
NASA Astrophysics Data System (ADS)
Ma, X.; Yoshikane, T.; Hara, M.; Adachi, S. A.; Wakazuki, Y.; Kawase, H.; Kimura, F.
2014-12-01
To examine the influence of boundary input data on the modeling results, we performed a numerical investigation of river discharge using runoff data derived from a regional climate model at 4.5-km resolution as input to a hydrological model. A hindcast experiment to reproduce the current climate was carried out for two decades, the 1980s and 1990s. We used the Advanced Research WRF (ARW) model (ver. 3.2.1) with a two-way nesting technique and the WRF single-moment 6-class microphysics scheme. The Noah land surface model (Noah-LSM) was adopted to simulate land surface processes. The NCEP/NCAR and ERA-Interim 6-hourly reanalysis datasets were used as the lateral boundary conditions for the respective runs. The output variables from the WRF model used for the river discharge simulation were underground runoff and surface runoff. Four rivers (Mogami, Agano, Jinzu and Tone) were selected for this study. The results showed that the seasonal variation of river discharge was reproduced, although discharge was overestimated compared with the measured values.
Use of Advanced Meteorological Model Output for Coastal Ocean Modeling in Puget Sound
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yang, Zhaoqing; Khangaonkar, Tarang; Wang, Taiping
2011-06-01
It is a great challenge to specify meteorological forcing in estuarine and coastal circulation modeling using observed data because of the lack of complete datasets. As a result of this limitation, water temperature is often not simulated in estuarine and coastal modeling, with the assumption that density-induced currents are generally dominated by salinity gradients. However, in many situations, temperature gradients can be sufficiently large to influence the baroclinic motion. In this paper, we present an approach to simulate water temperature using outputs from advanced meteorological models. This modeling approach was applied to simulate annual variations of water temperature in Puget Sound, a fjordal estuary in the Pacific Northwest of the USA. Meteorological parameters from North American Regional Reanalysis (NARR) model outputs were evaluated with comparisons to observed data at real-time meteorological stations. Model results demonstrated that NARR outputs can be used to drive coastal ocean models for realistic simulations of long-term water-temperature distributions in Puget Sound. Model results also indicated that the net flux from NARR can be further improved with additional information from real-time observations.
Scaling of global input-output networks
NASA Astrophysics Data System (ADS)
Liang, Sai; Qi, Zhengling; Qu, Shen; Zhu, Ji; Chiu, Anthony S. F.; Jia, Xiaoping; Xu, Ming
2016-06-01
Examining scaling patterns of networks can help understand how structural features relate to the behavior of the networks. Input-output networks consist of industries as nodes and inter-industrial exchanges of products as links. Previous studies consider limited measures for node strengths and link weights, and also ignore the impact of dataset choice. We consider a comprehensive set of indicators in this study that are important in economic analysis, and also examine the impact of dataset choice, by studying input-output networks in individual countries and the entire world. Results show that Burr, Log-Logistic, Log-normal, and Weibull distributions can better describe scaling patterns of global input-output networks. We also find that dataset choice has limited impacts on the observed scaling patterns. Our findings can help examine the quality of economic statistics, estimate missing data in economic statistics, and identify key nodes and links in input-output networks to support economic policymaking.
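A minimal sketch of fitting and comparing the candidate distributions named above to a node-strength sample with scipy; the synthetic data and the choice of the Burr XII variant are assumptions.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for node strengths (e.g. total industry output)
rng = np.random.default_rng(2)
strengths = rng.lognormal(mean=2.0, sigma=1.2, size=2000)

candidates = {
    "Burr":         stats.burr12,      # Burr XII is assumed here
    "Log-Logistic": stats.fisk,        # scipy's name for the log-logistic
    "Log-normal":   stats.lognorm,
    "Weibull":      stats.weibull_min,
}

for name, dist in candidates.items():
    params = dist.fit(strengths)
    loglik = np.sum(dist.logpdf(strengths, *params))
    aic = 2 * len(params) - 2 * loglik
    ks = stats.kstest(strengths, dist.cdf, args=params).statistic
    print(f"{name:12s}  AIC={aic:10.1f}  KS={ks:.3f}")
```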
Knijnenburg, Theo A; Klau, Gunnar W; Iorio, Francesco; Garnett, Mathew J; McDermott, Ultan; Shmulevich, Ilya; Wessels, Lodewyk F A
2016-11-23
Mining large datasets using machine learning approaches often leads to models that are hard to interpret and not amenable to the generation of hypotheses that can be experimentally tested. We present 'Logic Optimization for Binary Input to Continuous Output' (LOBICO), a computational approach that infers small and easily interpretable logic models of binary input features that explain a continuous output variable. Applying LOBICO to a large cancer cell line panel, we find that logic combinations of multiple mutations are more predictive of drug response than single gene predictors. Importantly, we show that the use of the continuous information leads to robust and more accurate logic models. LOBICO implements the ability to uncover logic models around predefined operating points in terms of sensitivity and specificity. As such, it represents an important step towards practical application of interpretable logic models.
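As a rough illustration of inferring a small, interpretable logic formula of binary features that explains a continuous response, the toy sketch below brute-forces OR-of-AND combinations and scores them by how strongly they separate the response; LOBICO itself casts this inference as a formal optimization problem, which this sketch does not reproduce.

```python
import itertools
import numpy as np

def fit_small_logic_model(X_bin, y, max_terms=2, max_literals=2):
    """Brute-force search for a small OR-of-ANDs formula over binary
    features that best separates low from high values of a continuous
    response y (illustration only)."""
    n_features = X_bin.shape[1]
    conjunctions = [c for r in range(1, max_literals + 1)
                    for c in itertools.combinations(range(n_features), r)]
    best, best_score = None, -np.inf
    for n_terms in range(1, max_terms + 1):
        for terms in itertools.combinations(conjunctions, n_terms):
            pred = np.zeros(len(y), dtype=bool)
            for combo in terms:                            # OR over terms
                pred |= X_bin[:, list(combo)].all(axis=1)  # AND within a term
            if 0 < pred.sum() < len(y):
                # Score a formula by the gap in mean response between groups
                score = abs(y[pred].mean() - y[~pred].mean())
                if score > best_score:
                    best, best_score = terms, score
    return best, best_score

rng = np.random.default_rng(3)
X_bin = rng.integers(0, 2, size=(200, 6)).astype(bool)
y = 2.0 * (X_bin[:, 0] & X_bin[:, 3]) + rng.normal(scale=0.5, size=200)
print(fit_small_logic_model(X_bin, y))
```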
EpiK: A Knowledge Base for Epidemiological Modeling and Analytics of Infectious Diseases
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hasan, S. M. Shamimul; Fox, Edward A.; Bisset, Keith
2017-11-06
Computational epidemiology seeks to develop computational methods to study the distribution and determinants of health-related states or events (including disease), and the application of this study to the control of diseases and other health problems. Recent advances in computing and data sciences have led to the development of innovative modeling environments to support this important goal. The datasets used to drive the dynamic models, as well as the data produced by these models, present unique challenges owing to their size, heterogeneity and diversity. These datasets form the basis of effective and easy-to-use decision support and analytical environments. As a result, it is important to develop scalable data management systems to store, manage and integrate these datasets. In this paper, we develop EpiK—a knowledge base that facilitates the development of decision support and analytical environments to support epidemic science. An important goal is to develop a framework that links the input as well as output datasets to facilitate effective spatio-temporal and social reasoning that is critical in planning and intervention analysis before and during an epidemic. The data management framework links modeling workflow data and its metadata using a controlled vocabulary. The metadata capture information about storage, the mapping between the linked model and the physical layout, and relationships to support services. EpiK is designed to support agent-based modeling and analytics frameworks—aggregate models can be seen as special cases and are thus supported. We use semantic web technologies to create a representation of the datasets that encapsulates both the location and the schema heterogeneity. The choice of RDF as a representation language is motivated by the diversity and growth of the datasets that need to be integrated. A query bank is developed—the queries capture a broad range of questions that can be posed and answered during a typical case study pertaining to disease outbreaks. The queries are constructed using the SPARQL Protocol and RDF Query Language (SPARQL) over EpiK. EpiK can hide schema and location heterogeneity while efficiently supporting queries that span the computational epidemiology modeling pipeline, from model construction to simulation output. We also show that the performance of benchmark queries varies significantly with the choice of hardware underlying the database and resource description framework (RDF) engine.
This dataset contains the output for modeling runs that were performed to investigate the effectiveness of various technologies and lay the groundwork for the formulation of policies for reducing methane emissions. See the full report at http://www.epa.gov/methane/projections.html.
Figures 1-10 and Table 1. This dataset is associated with the following publication: Chang, S.Y., S. Arunachalam, A. Valencia, B. Naess, V. Isakov, M. Breen, T. Palma, and W. Vizuete. A modeling framework for characterizing near-road air pollutant concentration at community scales. SCIENCE OF THE TOTAL ENVIRONMENT. Elsevier BV, AMSTERDAM, NETHERLANDS, 538: 905-921, (2015).
NASA Astrophysics Data System (ADS)
Sun, L. Qing; Feng, Feng X.
2014-11-01
In this study, we first built and compared two different climate datasets for the Wuling mountainous area in 2010: one that considered topographical effects during the ANUSPLIN interpolation, referred to as the terrain-based climate dataset, and one that did not, called the ordinary climate dataset. We then quantified the topographical effects of climatic inputs on NPP estimation by supplying the two climate datasets to the same ecosystem model, the Boreal Ecosystem Productivity Simulator (BEPS), to evaluate the importance of considering relief when estimating NPP. Finally, we identified the variables contributing most to the topographical effects through a series of experiments, given an overall accuracy of the model output for NPP. The results showed that: (1) the terrain-based climate dataset presented more reliable topographic information and agreed more closely with the station dataset than the ordinary climate dataset over the continuous 365-day series of daily mean values; (2) on average, the ordinary climate dataset underestimated NPP by 12.5% compared with the terrain-based climate dataset over the whole study area; and (3) the primary climate variables contributing to the topographical effects of climatic inputs in the Wuling mountainous area were temperatures, which suggests that temperature differences must be corrected to estimate NPP accurately in such complex terrain.
Multiclass Posterior Probability Twin SVM for Motor Imagery EEG Classification.
She, Qingshan; Ma, Yuliang; Meng, Ming; Luo, Zhizeng
2015-01-01
Motor imagery electroencephalography is widely used in brain-computer interface systems. Due to the inherent characteristics of electroencephalography signals, accurate and real-time multiclass classification is always challenging. To address this problem, this paper proposes a multiclass posterior probability solution for the twin SVM based on ranking continuous outputs and pairwise coupling. First, a two-class posterior probability model is constructed to approximate the posterior probability using ranking continuous output techniques and Platt's estimation method. Second, multiclass probabilistic outputs for the twin SVM are obtained by combining every pair of class probabilities through pairwise coupling. Finally, the proposed method is compared with multiclass SVM and twin SVM via voting, and with multiclass posterior probability SVM using different coupling approaches. The efficacy of the proposed method in terms of classification accuracy and time complexity is demonstrated on the UCI benchmark datasets and on real-world EEG data from BCI Competition IV Dataset 2a.
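A minimal sketch of one simple pairwise-coupling rule for turning pairwise class-probability estimates into multiclass probabilities; the coupling scheme and the toy four-class motor-imagery probabilities are illustrative and may differ from the scheme used in the paper.

```python
import numpy as np

def couple_pairwise(pairwise_probs, n_classes):
    """Combine pairwise estimates r[(i, j)] = P(class i | class i or j)
    into multiclass probabilities by simple averaging and normalisation
    (one common coupling rule; other schemes exist)."""
    p = np.zeros(n_classes)
    for i in range(n_classes):
        p[i] = sum(pairwise_probs[(i, j)] for j in range(n_classes) if j != i)
    p /= (n_classes - 1)
    return p / p.sum()

# Toy pairwise estimates for a 4-class motor-imagery problem
# (e.g. left hand, right hand, feet, tongue); values are illustrative.
r = {(0, 1): 0.80, (1, 0): 0.20,
     (0, 2): 0.70, (2, 0): 0.30,
     (0, 3): 0.60, (3, 0): 0.40,
     (1, 2): 0.40, (2, 1): 0.60,
     (1, 3): 0.50, (3, 1): 0.50,
     (2, 3): 0.55, (3, 2): 0.45}
print(couple_pairwise(r, n_classes=4))
```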
Towards systematic evaluation of crop model outputs for global land-use models
NASA Astrophysics Data System (ADS)
Leclere, David; Azevedo, Ligia B.; Skalský, Rastislav; Balkovič, Juraj; Havlík, Petr
2016-04-01
Land provides vital socioeconomic resources to society, however at the cost of large environmental degradation. Global integrated models combining high-resolution global gridded crop models (GGCMs) and global economic models (GEMs) are increasingly being used to inform sustainable solutions for agricultural land use. However, little effort has yet been made to evaluate and compare the accuracy of GGCM outputs. In addition, GGCM datasets require a large number of parameters whose values and variability across space are weakly constrained, and increasing the accuracy of such datasets carries a very high computing cost. Innovative evaluation methods are required both to lend credibility to the global integrated models and to allow efficient parameter specification of GGCMs. We propose an evaluation strategy for GGCM datasets in the perspective of their use in GEMs, illustrated with preliminary results from a novel dataset (the Hypercube) generated by the EPIC GGCM and used in the GLOBIOM land-use GEM to inform on present-day crop yield, water and nutrient input needs for 16 crops x 15 management intensities, at a spatial resolution of 5 arc-minutes. We adopt the following principle: evaluation should provide a transparent diagnosis of model adequacy for its intended use. We briefly describe how the Hypercube data are generated and how they articulate with GLOBIOM in order to transparently identify the performances to be evaluated, as well as the main assumptions and data processing involved. Expected performances include adequately representing the sub-national heterogeneity in crop yield and input needs: i) in space, ii) across crop species, and iii) across management intensities. We will present and discuss measures of these expected performances and weight the relative contributions of the crop model, input data and data processing steps. We will also compare the obtained yield gaps and main yield-limiting factors against the M3 dataset. Next steps include iterative improvement of parameter assumptions and evaluation of the implications of GGCM performance for the intended use in the IIASA EPIC-GLOBIOM model cluster. Our approach helps target future efforts at improving GGCM accuracy and would achieve the highest efficiency if combined with traditional field-scale evaluation and sensitivity analysis.
Pandey, Daya Shankar; Das, Saptarshi; Pan, Indranil; Leahy, James J; Kwapinski, Witold
2016-12-01
In this paper, multi-layer feed-forward neural networks are used to predict the lower heating value of gas (LHV), the lower heating value of gasification products including tars and entrained char (LHVp), and the syngas yield during gasification of municipal solid waste (MSW) in a fluidized bed reactor. These artificial neural networks (ANNs) with different architectures are trained using the Levenberg-Marquardt (LM) back-propagation algorithm, and cross validation is performed to ensure that the results generalise to other unseen datasets. A rigorous study is carried out on optimally choosing the number of hidden layers, the number of neurons in the hidden layer and the activation function using multiple Monte Carlo runs. Nine input and three output parameters are used to train and test various neural network architectures in both multiple-output and single-output prediction paradigms using the available experimental datasets. The model selection procedure is carried out to ascertain the best network architecture in terms of predictive accuracy. The simulation results show that the ANN-based methodology is a viable alternative that can be used to predict the performance of a fluidized bed gasifier. Copyright © 2016 Elsevier Ltd. All rights reserved.
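A minimal sketch of the 9-input, 3-output feed-forward setup with cross-validated selection over hidden-layer sizes, using scikit-learn's MLPRegressor on synthetic data; note that the solver here is L-BFGS, not the Levenberg-Marquardt algorithm used in the paper, and all data are illustrative.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the gasification dataset: 9 operating/feedstock
# inputs, 3 targets (LHV of gas, LHV of products, syngas yield).
rng = np.random.default_rng(4)
X = rng.normal(size=(200, 9))
Y = X @ rng.normal(size=(9, 3)) + rng.normal(scale=0.1, size=(200, 3))

# Candidate hidden-layer sizes; cross-validation guides model selection.
for hidden in [(5,), (10,), (10, 5)]:
    model = make_pipeline(StandardScaler(),
                          MLPRegressor(hidden_layer_sizes=hidden,
                                       activation="tanh",
                                       solver="lbfgs",   # not Levenberg-Marquardt
                                       max_iter=5000,
                                       random_state=0))
    scores = cross_val_score(model, X, Y, cv=5, scoring="r2")
    print(hidden, round(scores.mean(), 3))
```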
Using Weather Data and Climate Model Output in Economic Analyses of Climate Change
DOE Office of Scientific and Technical Information (OSTI.GOV)
Auffhammer, M.; Hsiang, S. M.; Schlenker, W.
2013-06-28
Economists are increasingly using weather data and climate model output in analyses of the economic impacts of climate change. This article introduces a set of weather data sets and climate models that are frequently used, discusses the most common mistakes economists make in using these products, and identifies ways to avoid these pitfalls. We first provide an introduction to weather data, including a summary of the types of datasets available, and then discuss five common pitfalls that empirical researchers should be aware of when using historical weather data as explanatory variables in econometric applications. We then provide a brief overview of climate models and discuss two common and significant errors often made by economists when climate model output is used to simulate the future impacts of climate change on an economic outcome of interest.
Prediction of AL and Dst Indices from ACE Measurements Using Hybrid Physics/Black-Box Techniques
NASA Astrophysics Data System (ADS)
Spencer, E.; Rao, A.; Horton, W.; Mays, L.
2008-12-01
ACE measurements of the solar wind velocity, IMF and proton density are used to drive a hybrid physics/black-box model of the nightside magnetosphere. The core physics is contained in a low-order nonlinear dynamical model of the nightside magnetosphere called WINDMI. The model is augmented by wavelet-based nonlinear mappings between the solar wind quantities and the input into the physics model, followed by further wavelet-based mappings of the model output field-aligned currents onto the ground-based magnetometer measurements of the AL index and Dst index. The black-box mappings are introduced at the input stage to account for uncertainties in the way the solar wind quantities are transported from the ACE spacecraft at L1 to the magnetopause. Similar mappings are introduced at the output stage to account for a spatially and temporally varying westward auroral electrojet geometry. The parameters of the model are tuned using a genetic algorithm and trained on the large geomagnetic storm dataset of October 3-7, 2000. Its predictive performance is then evaluated on subsequent storm datasets, in particular the April 15-24, 2002 storm. This work is supported by grant NSF 7020201.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Saaban, Azizan; Zainudin, Lutfi; Bakar, Mohd Nazari Abu
This paper examines the ability of the linear interpolation method to predict missing values in solar radiation time series. A reliable dataset likewise depends on a complete observed time series: missing radiation data alter the long-term variation of the measured solar radiation values and increase the chance of biased outputs in modelling and validation. The completeness of the observed dataset is therefore important for data analysis. Gaps and unreliable values in solar radiation time series are widespread and have become a major problem, yet only a limited amount of research has focused on estimating missing values in solar radiation datasets.
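A minimal sketch of the gap-filling approach assessed above, applying linear interpolation to a synthetic hourly solar radiation series with missing values.

```python
import numpy as np
import pandas as pd

# Hourly global solar radiation with missing values (synthetic example)
idx = pd.date_range("2015-06-01", periods=24, freq="H")
ghi = pd.Series([0, 0, 0, 5, 60, 150, 280, np.nan, np.nan, 560, 610, 640,
                 650, 620, np.nan, 470, 350, 210, 90, 20, 2, 0, 0, 0],
                index=idx)

# Linear interpolation across the gaps, as assessed in the study above
ghi_filled = ghi.interpolate(method="linear", limit_direction="both")
print(ghi_filled.round(1))
```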
Paiton, Dylan M.; Kenyon, Garrett T.; Brumby, Steven P.; Schultz, Peter F.; George, John S.
2015-07-28
An approach to detecting objects in an image dataset may combine texture/color detection, shape/contour detection, and/or motion detection using sparse, generative, hierarchical models with lateral and top-down connections. A first independent representation of objects in an image dataset may be produced using a color/texture detection algorithm. A second independent representation of objects in the image dataset may be produced using a shape/contour detection algorithm. A third independent representation of objects in the image dataset may be produced using a motion detection algorithm. The first, second, and third independent representations may then be combined into a single coherent output using a combinatorial algorithm.
Subsampling for dataset optimisation
NASA Astrophysics Data System (ADS)
Ließ, Mareike
2017-04-01
Soil-landscapes have formed by the interaction of soil-forming factors and pedogenic processes. In modelling these landscapes in their pedodiversity and the underlying processes, a representative unbiased dataset is required. This concerns model input as well as output data. However, very often big datasets are available which are highly heterogeneous and were gathered for various purposes, but not to model a particular process or data space. As a first step, the overall data space and/or landscape section to be modelled needs to be identified including considerations regarding scale and resolution. Then the available dataset needs to be optimised via subsampling to well represent this n-dimensional data space. A couple of well-known sampling designs may be adapted to suit this purpose. The overall approach follows three main strategies: (1) the data space may be condensed and de-correlated by a factor analysis to facilitate the subsampling process. (2) Different methods of pattern recognition serve to structure the n-dimensional data space to be modelled into units which then form the basis for the optimisation of an existing dataset through a sensible selection of samples. Along the way, data units for which there is currently insufficient soil data available may be identified. And (3) random samples from the n-dimensional data space may be replaced by similar samples from the available dataset. While being a presupposition to develop data-driven statistical models, this approach may also help to develop universal process models and identify limitations in existing models.
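A minimal sketch of the strategy outlined above, with PCA standing in for the factor analysis, k-means for the pattern-recognition step, and nearest-to-centroid selection for drawing representative samples from the existing dataset; all data and parameter choices are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(5)
X = rng.normal(size=(5000, 12))        # heterogeneous legacy dataset (synthetic)

# (1) Condense and de-correlate the data space
#     (PCA as a stand-in for factor analysis).
Z = PCA(n_components=4).fit_transform(StandardScaler().fit_transform(X))

# (2) Structure the data space into units via k-means.
km = KMeans(n_clusters=50, n_init=10, random_state=0).fit(Z)

# (3) Pick the existing sample closest to each cluster centre,
#     yielding a subsample that covers the data space.
subsample_idx = []
for k, centre in enumerate(km.cluster_centers_):
    members = np.where(km.labels_ == k)[0]
    if members.size:                   # empty units would indicate data gaps
        nearest = members[np.argmin(np.linalg.norm(Z[members] - centre, axis=1))]
        subsample_idx.append(nearest)
print(len(subsample_idx), "representative samples selected")
```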
Spear, Timothy T; Nishimura, Michael I; Simms, Patricia E
2017-08-01
Advancement in flow cytometry reagents and instrumentation has allowed for simultaneous analysis of large numbers of lineage/functional immune cell markers. Highly complex datasets generated by polychromatic flow cytometry require proper analytical software to answer investigators' questions. A problem among many investigators and flow cytometry Shared Resource Laboratories (SRLs), including our own, is a lack of access to a flow cytometry-knowledgeable bioinformatics team, making it difficult to learn and choose appropriate analysis tool(s). Here, we comparatively assess various multidimensional flow cytometry software packages for their ability to answer a specific biologic question and provide graphical representation output suitable for publication, as well as their ease of use and cost. We assessed polyfunctional potential of TCR-transduced T cells, serving as a model evaluation, using multidimensional flow cytometry to analyze 6 intracellular cytokines and degranulation on a per-cell basis. Analysis of 7 parameters resulted in 128 possible combinations of positivity/negativity, far too complex for basic flow cytometry software to analyze fully. Various software packages were used, analysis methods used in each described, and representative output displayed. Of the tools investigated, automated classification of cellular expression by nonlinear stochastic embedding (ACCENSE) and coupled analysis in Pestle/simplified presentation of incredibly complex evaluations (SPICE) provided the most user-friendly manipulations and readable output, evaluating effects of altered antigen-specific stimulation on T cell polyfunctionality. This detailed approach may serve as a model for other investigators/SRLs in selecting the most appropriate software to analyze complex flow cytometry datasets. Further development and awareness of available tools will help guide proper data analysis to answer difficult biologic questions arising from incredibly complex datasets. © Society for Leukocyte Biology.
Sorting protein decoys by machine-learning-to-rank
Jing, Xiaoyang; Wang, Kai; Lu, Ruqian; Dong, Qiwen
2016-01-01
Much progress has been made in protein structure prediction during the last few decades. As the predicted models can span a broad accuracy spectrum, the accuracy of quality estimation has become one of the key elements of successful protein structure prediction. Over the past years, a number of methods have been developed to address this issue, and these methods can be roughly divided into three categories: single-model methods, clustering-based methods and quasi-single-model methods. In this study, we first develop a single-model method, MQAPRank, based on a learning-to-rank algorithm, and then implement a quasi-single-model method, Quasi-MQAPRank. The proposed methods are benchmarked on the 3DRobot and CASP11 datasets. Five-fold cross-validation on the 3DRobot dataset shows that the proposed single-model method outperforms other methods whose outputs are taken as features of the proposed method, and that the quasi-single-model method can further enhance performance. On the CASP11 dataset, the proposed methods also perform well compared with other leading methods in the corresponding categories. In particular, the Quasi-MQAPRank method achieves considerable performance on the CASP11 Best150 dataset. PMID:27530967
Music Signal Processing Using Vector Product Neural Networks
NASA Astrophysics Data System (ADS)
Fan, Z. C.; Chan, T. S.; Yang, Y. H.; Jang, J. S. R.
2017-05-01
We propose a novel neural network model for music signal processing using vector product neurons and dimensionality transformations. Here, the inputs are first mapped from real values into three-dimensional vectors then fed into a three-dimensional vector product neural network where the inputs, outputs, and weights are all three-dimensional values. Next, the final outputs are mapped back to the reals. Two methods for dimensionality transformation are proposed, one via context windows and the other via spectral coloring. Experimental results on the iKala dataset for blind singing voice separation confirm the efficacy of our model.
NASA Astrophysics Data System (ADS)
Zhu, X.; Wen, X.; Zheng, Z.
2017-12-01
For better prediction and understanding of land-atmosphere interactions, in-situ observed meteorological data acquired from the China Meteorological Administration (CMA) were assimilated into the Weather Research and Forecasting (WRF) model, together with monthly Green Vegetation Coverage (GVF) data calculated from the Normalized Difference Vegetation Index (NDVI) of the Earth Observing System Moderate-Resolution Imaging Spectroradiometer (EOS-MODIS) and Digital Elevation Model (DEM) data from the Shuttle Radar Topography Mission (SRTM). The WRF model was then used to produce a High-Resolution Assimilation Dataset of the water-energy cycle in China (HRADC). This dataset has a horizontal resolution of 25 km for near-surface meteorological data, such as air temperature, humidity, wind vectors and pressure (19 levels); soil temperature and moisture (four levels); surface temperature; downward/upward short/long-wave radiation; and 3-h latent heat flux, sensible heat flux and ground heat flux. In this study, we 1) briefly introduce the cycling 3D-Var assimilation method and 2) compare results for meteorological elements, such as 2-m temperature and precipitation generated by the HRADC, with the gridded observation data from CMA, and surface temperature and specific humidity with Global Land Data Assimilation System (GLDAS) output data from the National Aeronautics and Space Administration (NASA). We found that the satellite-derived GVF from MODIS was higher over southeast China than in the default model throughout the year. The simulated soil temperature, net radiation and surface energy fluxes from the HRADC are improved compared with the control simulation and are close to the GLDAS outputs. The values of net radiation from the HRADC are higher than the GLDAS outputs, and the differences are large in the eastern region but smaller in northwest China and on the Qinghai-Tibet Plateau. The spatial distributions of the sensible heat flux and the ground heat flux from the HRADC are consistent with the GLDAS outputs in summer. In general, the simulated results from the HRADC improve on the control simulation and can represent the characteristics of the spatial and temporal variation of the water-energy cycle in China.
Enhancing e-waste estimates: Improving data quality by multivariate Input–Output Analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang, Feng, E-mail: fwang@unu.edu; Design for Sustainability Lab, Faculty of Industrial Design Engineering, Delft University of Technology, Landbergstraat 15, 2628CE Delft; Huisman, Jaco
2013-11-15
Highlights: • A multivariate Input–Output Analysis method for e-waste estimates is proposed. • Applying multivariate analysis to consolidate data can enhance e-waste estimates. • We examine the influence of model selection and data quality on e-waste estimates. • Datasets of all e-waste related variables in a Dutch case study are provided. • Accurate modeling of time-variant lifespan distributions is critical for the estimates. - Abstract: Waste electrical and electronic equipment (or e-waste) is one of the fastest growing waste streams, encompassing a wide and increasing spectrum of products. Accurate estimation of e-waste generation is difficult, mainly due to a lack of high-quality data on market and socio-economic dynamics. This paper addresses how to enhance e-waste estimates by providing techniques to increase data quality. An advanced, flexible and multivariate Input–Output Analysis (IOA) method is proposed. It links all three pillars in IOA (product sales, stock and lifespan profiles) to construct mathematical relationships between various data points. By applying this method, the data consolidation steps can generate more accurate time-series datasets from the available data pool. This can consequently increase the reliability of e-waste estimates compared with approaches without data processing. A case study in the Netherlands is used to apply the advanced IOA model. As a result, for the first time, complete datasets of all three variables for estimating all types of e-waste have been obtained. The results of this study also demonstrate significant disparity between various estimation models, arising from the use of data under different conditions. They show the importance of applying a multivariate approach and multiple sources to improve data quality for modelling, specifically using appropriate time-varying lifespan parameters. Following the case study, a roadmap with a procedural guideline is provided to enhance e-waste estimation studies.
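A minimal single-variable sketch of the sales-stock-lifespan relationship underlying the IOA approach, estimating waste generation by convolving past sales with a discrete Weibull lifespan distribution; the sales series and Weibull parameters are synthetic, not the Dutch case-study values.

```python
import numpy as np
from scipy.stats import weibull_min

years = np.arange(1995, 2013)
sales = np.linspace(50, 300, len(years))   # units put on market per year (synthetic)

# Discrete lifespan distribution: probability that a unit sold k years ago
# is discarded this year (Weibull shape/scale are illustrative only).
k_max = 25
ages = np.arange(1, k_max + 1)
lifespan_pmf = (weibull_min.cdf(ages, c=2.0, scale=8.0)
                - weibull_min.cdf(ages - 1, c=2.0, scale=8.0))

# Waste generated in year t = sum over past sales weighted by the
# probability of reaching end-of-life in year t.
waste = np.zeros(len(years))
for i in range(len(years)):
    for k, p in enumerate(lifespan_pmf, start=1):
        if i - k >= 0:
            waste[i] += sales[i - k] * p

for y, w in zip(years, waste):
    print(y, round(w, 1))
```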
CTAG model inputs and outputs. This dataset is associated with the following publication: Tong, Z., R. Baldauf, V. Isakov, P.J. Deshmukh, and M. Zhang. Roadside vegetation barrier designs to mitigate near-road air pollution impacts. SCIENCE OF THE TOTAL ENVIRONMENT. Elsevier BV, AMSTERDAM, NETHERLANDS, 541: 920-927, (2016).
Hamdan, Sadeque; Cheaitou, Ali
2017-08-01
This data article provides detailed optimization input and output datasets and the optimization code for the published research work titled "Dynamic green supplier selection and order allocation with quantity discounts and varying supplier availability" (Hamdan and Cheaitou, 2017, in press) [1]. Researchers may use these datasets as a baseline for future comparison and extensive analysis of the green supplier selection and order allocation problem with all-unit quantity discounts and a varying number of suppliers. More particularly, the datasets presented in this article allow researchers to generate the exact optimization outputs obtained by the authors of Hamdan and Cheaitou (2017, in press) [1] using the provided optimization code, and then to use them for comparison with the outputs of other techniques or methodologies, such as heuristic approaches. Moreover, this article includes the randomly generated optimization input data and the related outputs that are used as input data for the statistical analysis presented in Hamdan and Cheaitou (2017, in press) [1], in which two different approaches for ranking potential suppliers are compared. This article also provides the time analysis data used in Hamdan and Cheaitou (2017, in press) [1] to study the effect of the problem size on the computation time, as well as an additional time analysis dataset. The input data for the time study are generated randomly with varying problem sizes and are then used by the optimization problem to obtain the corresponding optimal outputs and computation times.
Collaboration tools and techniques for large model datasets
Signell, R.P.; Carniel, S.; Chiggiato, J.; Janekovic, I.; Pullen, J.; Sherwood, C.R.
2008-01-01
In MREA and many other marine applications, it is common to have multiple models running with different grids, run by different institutions. Techniques and tools are described for low-bandwidth delivery of data from large multidimensional datasets, such as those from meteorological and oceanographic models, directly into generic analysis and visualization tools. Output is stored using the NetCDF CF Metadata Conventions, and then delivered to collaborators over the web via OPeNDAP. OPeNDAP datasets served by different institutions are then organized via THREDDS catalogs. Tools and procedures are then used which enable scientists to explore data on the original model grids using tools they are familiar with. It is also low-bandwidth, enabling users to extract just the data they require, an important feature for access from ship or remote areas. The entire implementation is simple enough to be handled by modelers working with their webmasters - no advanced programming support is necessary. © 2007 Elsevier B.V. All rights reserved.
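A minimal sketch of the low-bandwidth access pattern described above, opening a remote CF/NetCDF dataset over OPeNDAP with the netCDF4-python library and retrieving only a subset; the URL and variable name are placeholders, and netCDF4 must be built with DAP support.

```python
import numpy as np
from netCDF4 import Dataset

# Placeholder OPeNDAP endpoint for a CF-compliant model dataset;
# substitute a real THREDDS/OPeNDAP URL from a catalog.
url = "https://example.org/thredds/dodsC/ocean_model/output.nc"

nc = Dataset(url)                 # requires netCDF4 built with DAP support
temp = nc.variables["temp"]       # variable name and 4-D layout are assumptions

# Only the requested slice crosses the network -- the low-bandwidth
# access pattern described above.
surface_last_step = temp[-1, 0, :, :]
print(surface_last_step.shape, float(np.nanmean(surface_last_step)))
nc.close()
```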
Impact of Land Cover Characterization and Properties on Snow Albedo in Climate Models
NASA Astrophysics Data System (ADS)
Wang, L.; Bartlett, P. A.; Chan, E.; Montesano, P.
2017-12-01
The simulation of winter albedo in boreal and northern environments has been a particular challenge for land surface modellers. Assessments of output from CMIP3 and CMIP5 climate models have revealed that many simulations overestimate albedo in the boreal forest. Recent studies suggest that inaccurate representation of vegetation distribution, improper simulation of leaf area index, and poor treatment of canopy-snow processes are the primary causes of albedo errors. While several land cover datasets are commonly used to derive plant functional types (PFTs) for use in climate models, new land cover and vegetation datasets with higher spatial resolution have become available in recent years. In this study, we compare the spatial distributions of the dominant PFTs and canopy cover fractions based on different land cover datasets, and present results from offline simulations of the latest version of the Canadian Land Surface Scheme (CLASS) over Northern Hemisphere land. We discuss the impact of land cover representation and surface properties on winter albedo simulations in climate models.
Preliminary results and assessment of the MAR outputs over High Mountain Asia
NASA Astrophysics Data System (ADS)
Linares, M.; Tedesco, M.; Margulis, S. A.; Cortés, G.; Fettweis, X.
2017-12-01
Lack of ground measurements has made the use of regional climate models (RCMs) over High Mountain Asia (HMA) pivotal for understanding the impact of climate change on the hydrological cycle and on the cryosphere. Here, we present an assessment of the outputs of the Modèle Atmosphérique Régional (MAR) RCM over the HMA region as part of the NASA-funded project 'Understanding and forecasting changes in High Mountain Asia snow hydrology via a novel Bayesian reanalysis and modeling approach'. The first step was to evaluate the impact of different forcings on the MAR outputs. To this aim, we performed simulations for the 2007-2008 and 2014-2015 years, forcing MAR at its boundaries either with reanalysis data from the European Centre for Medium-Range Weather Forecasts (ECMWF) or with the Modern-Era Retrospective Analysis for Research and Applications, version 2 (MERRA-2). The comparison between the outputs obtained with the two forcings indicates that the impact on MAR simulations depends on the specific parameter: for example, surface pressure has a maximum percentage error of 0.09%, while 2-m air temperature has a maximum percentage error of 103.7%. Next, we compared the MAR outputs with reanalysis data fields over the region of interest. In particular, we evaluated the following parameters: surface pressure, snow depth, total cloud cover, 2-m temperature, horizontal wind speed, vertical wind speed, wind speed, surface net solar radiation, skin temperature, surface sensible heat flux, and surface latent heat flux. Lastly, we report results on the assessment of MAR surface albedo and surface temperature over the region using MODIS remote sensing products. The next steps are to determine whether RCMs and reanalysis datasets capture snow and snowmelt runoff processes in the HMA region through a comparison with in situ datasets; this will help determine what refinements are necessary to improve RCM outputs.
Advances in a distributed approach for ocean model data interoperability
Signell, Richard P.; Snowden, Derrick P.
2014-01-01
An infrastructure for earth science data is emerging across the globe based on common data models and web services. As we evolve from custom file formats and web sites to standards-based web services and tools, data is becoming easier to distribute, find and retrieve, leaving more time for science. We describe recent advances that make it easier for ocean model providers to share their data, and for users to search, access, analyze and visualize ocean data using MATLAB® and Python®. These include a technique for modelers to create aggregated, Climate and Forecast (CF) metadata convention datasets from collections of non-standard Network Common Data Form (NetCDF) output files, the capability to remotely access data from CF-1.6-compliant NetCDF files using the Open Geospatial Consortium (OGC) Sensor Observation Service (SOS), a metadata standard for unstructured grid model output (UGRID), and tools that utilize both CF and UGRID standards to allow interoperable data search, browse and access. We use examples from the U.S. Integrated Ocean Observing System (IOOS®) Coastal and Ocean Modeling Testbed, a project in which modelers using both structured and unstructured grid model output needed to share their results, to compare their results with other models, and to compare models with observed data. The same techniques used here for ocean modeling output can be applied to atmospheric and climate model output, remote sensing data, digital terrain and bathymetric data.
This EnviroAtlas dataset includes annual nitrogen and sulfur deposition within each 12-digit HUC subwatershed for the year 2002. Values are provided for total oxidized nitrogen (HNO3, NO, NO2, N2O5, NH3, HONO, PAN, organic nitrogen, and particulate NO3), oxidized nitrogen wet deposition, oxidized nitrogen dry deposition, total reduced nitrogen (NH3 and particulate NH4), reduced nitrogen dry deposition, reduced nitrogen wet deposition, total dry nitrogen deposition, total wet nitrogen deposition, total nitrogen deposition (wet+dry), total sulfur (SO2 + particulate SO4) dry deposition, total sulfur wet deposition, and total sulfur deposition. The dataset is based on output from the Community Multiscale Air Quality modeling system (CMAQ) v5.0.2 run using the bidirectional flux option for the 12-km grid size for the US, Canada, and Mexico. The CMAQ output has been post-processed to adjust the wet deposition for errors in the location and amount of precipitation and for regional biases in the TNO3 (HNO3 + NO3), NHx (NH4 + NH3), and sulfate wet deposition. Model predicted values of dry deposition were not adjusted. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadab
This EnviroAtlas dataset includes annual nitrogen and sulfur deposition within each 12-digit HUC subwatershed for the year 2011. Values are provided for total oxidized nitrogen (HNO3, NO, NO2, N2O5, NH3, HONO, PAN, organic nitrogen, and particulate NO3), oxidized nitrogen wet deposition, oxidized nitrogen dry deposition, total reduced nitrogen (NH3 and particulate NH4), reduced nitrogen dry deposition, reduced nitrogen wet deposition, total dry nitrogen deposition, total wet nitrogen deposition, total nitrogen deposition (wet+dry), total sulfur (SO2 + particulate SO4) dry deposition, total sulfur wet deposition, and total sulfur deposition. The dataset is based on output from the Community Multiscale Air Quality modeling system (CMAQ) run using the bidirectional flux option for the 12-km grid size for the US, Canada, and Mexico. The CMAQ output has been post-processed to adjust the wet deposition for errors in the location and amount of precipitation and for regional biases in the TNO3 (HNO3 + NO3), NHx (NH4 + NH3), and sulfate wet deposition. Model predicted values of dry deposition were not adjusted. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data
This EnviroAtlas dataset includes annual nitrogen and sulfur deposition within each 12-digit HUC subwatershed for the year 2006. Values are provided for total oxidized nitrogen (HNO3, NO, NO2, N2O5, NH3, HONO, PAN, organic nitrogen, and particulate NO3), oxidized nitrogen wet deposition, oxidized nitrogen dry deposition, total reduced nitrogen (NH3 and particulate NH4), reduced nitrogen dry deposition, reduced nitrogen wet deposition, total dry nitrogen deposition, total wet nitrogen deposition, total nitrogen deposition (wet+dry), total sulfur (SO2 + particulate SO4) dry deposition, total sulfur wet deposition, and total sulfur deposition. The dataset is based on output from the Community Multiscale Air Quality modeling system (CMAQ) run using the bidirectional flux option for the 12-km grid size for the US, Canada, and Mexico. The CMAQ output has been post-processed to adjust the wet deposition for errors in the location and amount of precipitation and for regional biases in the TNO3 (HNO3 + NO3), NHx (NH4 + NH3), and sulfate wet deposition. Model predicted values of dry deposition were not adjusted. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable dat
Climate Model Diagnostic Analyzer Web Service System
NASA Astrophysics Data System (ADS)
Lee, S.; Pan, L.; Zhai, C.; Tang, B.; Jiang, J. H.
2014-12-01
We have developed a cloud-enabled web-service system that empowers physics-based, multi-variable model performance evaluations and diagnoses through the comprehensive and synergistic use of multiple observational data, reanalysis data, and model outputs. We have developed a methodology to transform an existing science application code into a web service using a Python wrapper interface and Python web service frameworks. The web-service system, called Climate Model Diagnostic Analyzer (CMDA), currently supports (1) all the observational datasets from Obs4MIPs and a few ocean datasets from NOAA and Argo, which can serve as observation-based reference data for model evaluation, (2) many of CMIP5 model outputs covering a broad range of atmosphere, ocean, and land variables from the CMIP5 specific historical runs and AMIP runs, and (3) ECMWF reanalysis outputs for several environmental variables in order to supplement observational datasets. Analysis capabilities currently supported by CMDA are (1) the calculation of annual and seasonal means of physical variables, (2) the calculation of time evolution of the means in any specified geographical region, (3) the calculation of correlation between two variables, (4) the calculation of difference between two variables, and (5) the conditional sampling of one physical variable with respect to another variable. A web user interface is chosen for CMDA because it not only lowers the learning curve and removes the adoption barrier of the tool but also enables instantaneous use, avoiding the hassle of local software installation and environment incompatibility. CMDA will be used as an educational tool for the summer school organized by JPL's Center for Climate Science in 2014. In order to support 30+ simultaneous users during the school, we have deployed CMDA to the Amazon cloud environment. The cloud-enabled CMDA will provide each student with a virtual machine while the user interaction with the system will remain the same through web-browser interfaces. The summer school will serve as a valuable testbed for the tool development, preparing CMDA to serve its target community: Earth-science modeling and model-analysis community.
Quantification of downscaled precipitation uncertainties via Bayesian inference
NASA Astrophysics Data System (ADS)
Nury, A. H.; Sharma, A.; Marshall, L. A.
2017-12-01
Prediction of precipitation from global climate model (GCM) outputs remains critical to decision-making in water-stressed regions. In this regard, downscaling of GCM output has been a useful tool for analysing future hydro-climatological states. Several downscaling approaches have been developed for precipitation, including dynamical and statistical downscaling methods. Frequently, outputs from dynamical downscaling are not readily transferable across regions because of significant methodological and computational difficulties. Statistical downscaling approaches provide a flexible and efficient alternative, producing hydro-climatological outputs across multiple temporal and spatial scales in many locations. However, these approaches are subject to significant uncertainty, arising from uncertainty in the downscaled model parameters and from the use of different reanalysis products for inferring appropriate model parameters. Consequently, these uncertainties affect the performance of simulations at the catchment scale. This study develops a Bayesian framework for modelling downscaled daily precipitation from GCM outputs and accounts for uncertainties in downscaling by evaluating reanalysis datasets against observational rainfall data over Australia. A consistent technique for quantifying downscaling uncertainties by means of a Bayesian downscaling framework is proposed. The results suggest that there are differences in downscaled precipitation occurrences and extremes.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Draxl, C.; Hodge, B. M.; Orwig, K.
2013-10-01
Regional wind integration studies in the United States require detailed wind power output data at many locations to perform simulations of how the power system will operate under high-penetration scenarios. The wind data sets that serve as inputs into the study must realistically reflect the ramping characteristics, spatial and temporal correlations, and capacity factors of the simulated wind plants, as well as be time synchronized with available load profiles. The Wind Integration National Dataset (WIND) Toolkit described in this paper fulfills these requirements. A wind resource dataset, wind power production time series, and simulated forecasts from a numerical weather prediction model run on a nationwide 2-km grid at 5-min resolution will be made publicly available for more than 110,000 onshore and offshore wind power production sites.
Establishment and analysis of High-Resolution Assimilation Dataset of water-energy cycle over China
NASA Astrophysics Data System (ADS)
Wen, Xiaohang; Liao, Xiaohan; Dong, Wenjie; Yuan, Wenping
2015-04-01
For better prediction and understanding of the water-energy exchange process and land-atmosphere interaction, in-situ observed meteorological data acquired from the China Meteorological Administration (CMA) were assimilated into the Weather Research and Forecasting (WRF) model over China, and monthly Green Vegetation Coverage (GVF) data, calculated from the Normalized Difference Vegetation Index (NDVI) of the Earth Observing System Moderate-Resolution Imaging Spectroradiometer (EOS-MODIS), together with Digital Elevation Model (DEM) data of the Shuttle Radar Topography Mission (SRTM) system, were also integrated in the WRF model. The WRF model was then used to produce the High-Resolution Assimilation Dataset of the water-energy cycle over China (HRADC). This dataset includes, at 25-km horizontal resolution, near-surface meteorological data such as air temperature, humidity, ground temperature and pressure at 19 levels; soil temperature and soil moisture at 4 levels; green vegetation coverage; and 3-hourly latent heat flux, sensible heat flux and ground heat flux. In this study, we 1) briefly introduce the cycling 3D-Var assimilation method, and 2) compare results for meteorological elements such as 2-m temperature, precipitation and ground temperature generated by the HRADC with the gridded observation data from CMA and with Global Land Data Assimilation System (GLDAS) output data from the National Aeronautics and Space Administration (NASA). It is found that the results for 2-m temperature are improved compared with the control simulation and effectively reproduce the observed patterns, and that the simulated ground temperature, 0-10 cm soil temperature and specific humidity are much closer to the GLDAS outputs. Root mean square errors are reduced in the assimilation run compared with the control run, and the assimilated ground temperature, 0-10 cm soil temperature, radiation and surface fluxes agree well with the GLDAS outputs over China. The HRADC can be used in further research on long-period climatic effects and the characteristics of the water-energy cycle over China.
Expected results and outputs include: extensive dataset of in-field and laboratory emissions data for traditional and improved cookstoves; parameterization to predict cookstove emissions from drive cycle data; indoor and personal exposure data for traditional and improved cook...
Unleashing spatially distributed ecohydrology modeling using Big Data tools
NASA Astrophysics Data System (ADS)
Miles, B.; Idaszak, R.
2015-12-01
Physically based spatially distributed ecohydrology models are useful for answering science and management questions related to the hydrology and biogeochemistry of prairie, savanna, forested, as well as urbanized ecosystems. However, these models can produce hundreds of gigabytes of spatial output for a single model run over decadal time scales when run at regional spatial scales and moderate spatial resolutions (~100-km2+ at 30-m spatial resolution) or when run for small watersheds at high spatial resolutions (~1-km2 at 3-m spatial resolution). Numerical data formats such as HDF5 can store arbitrarily large datasets. However, even in HPC environments, there are practical limits on the size of single files that can be stored and reliably backed up. Even when such large datasets can be stored, querying and analyzing these data can suffer from poor performance due to memory limitations and I/O bottlenecks, for example on single workstations where memory and bandwidth are limited, or in HPC environments where data are stored separately from computational nodes. The difficulty of storing and analyzing spatial data from ecohydrology models limits our ability to harness these powerful tools. Big Data tools such as distributed databases have the potential to surmount the data storage and analysis challenges inherent to large spatial datasets. Distributed databases solve these problems by storing data close to computational nodes while enabling horizontal scalability and fault tolerance. Here we present the architecture of and preliminary results from PatchDB, a distributed datastore for managing spatial output from the Regional Hydro-Ecological Simulation System (RHESSys). The initial version of PatchDB uses message queueing to asynchronously write RHESSys model output to an Apache Cassandra cluster. Once stored in the cluster, these data can be efficiently queried to quickly produce both spatial visualizations for a particular variable (e.g. maps and animations) and point time series of arbitrary variables at arbitrary locations within a watershed or river basin. By treating ecohydrology modeling as a Big Data problem, we hope to provide a platform for answering transformative science and management questions related to water quantity and quality in a world of non-stationary climate.
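The asynchronous write path can be pictured with the minimal sketch below; the keyspace, table, and column names are hypothetical (PatchDB's actual schema is not described here), and only standard cassandra-driver calls are used.

```python
from cassandra.cluster import Cluster  # pip install cassandra-driver

# Hypothetical schema: one row per (patch, variable, timestep), e.g.
# CREATE TABLE rhessys.patch_output (
#     patch_id int, variable text, t timestamp, value double,
#     PRIMARY KEY ((patch_id, variable), t));

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("rhessys")
insert = session.prepare(
    "INSERT INTO patch_output (patch_id, variable, t, value) VALUES (?, ?, ?, ?)")

def write_records(records):
    """Write an iterable of (patch_id, variable, timestamp, value) tuples.

    execute_async keeps the writer from blocking on each insert, mimicking
    the queue-based, asynchronous write path described above.
    """
    futures = [session.execute_async(insert, rec) for rec in records]
    for f in futures:  # wait so failures surface before the next batch
        f.result()
```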
Climate Model Diagnostic Analyzer
NASA Technical Reports Server (NTRS)
Lee, Seungwon; Pan, Lei; Zhai, Chengxing; Tang, Benyang; Kubar, Terry; Zhang, Zia; Wang, Wei
2015-01-01
The comprehensive and innovative evaluation of climate models with newly available global observations is critically needed for the improvement of climate model current-state representation and future-state predictability. A climate model diagnostic evaluation process requires physics-based multi-variable analyses that typically involve large-volume and heterogeneous datasets, making them both computation- and data-intensive. Given the exploratory nature of climate data analyses and the explosive growth of datasets and service tools, scientists are struggling to keep track of their datasets, tools, and execution/study history, let alone sharing them with others. In response, we have developed a cloud-enabled, provenance-supported, web-service system called Climate Model Diagnostic Analyzer (CMDA). CMDA enables physics-based, multi-variable model performance evaluation and diagnosis through the comprehensive and synergistic use of multiple observational data, reanalysis data, and model outputs. At the same time, CMDA provides a crowd-sourcing space where scientists can organize their work efficiently and share their work with others. CMDA is empowered by many current state-of-the-art software packages in web service, provenance, and semantic search.
Using weighted power mean for equivalent square estimation.
Zhou, Sumin; Wu, Qiuwen; Li, Xiaobo; Ma, Rongtao; Zheng, Dandan; Wang, Shuo; Zhang, Mutian; Li, Sicong; Lei, Yu; Fan, Qiyong; Hyun, Megan; Diener, Tyler; Enke, Charles
2017-11-01
Equivalent Square (ES) enables the calculation of many radiation quantities for rectangular treatment fields, based only on measurements from square fields. While it is widely applied in radiotherapy, its accuracy, especially for extremely elongated fields, still leaves room for improvement. In this study, we introduce a novel explicit ES formula based on the Weighted Power Mean (WPM) function and compare its performance with the Sterling formula and Vadash/Bjärngard's formula. The proposed WPM formula is ES_WPM(a,b) = [w·a^α + (1 − w)·b^α]^(1/α) for a rectangular photon field with sides a and b. The formula performance was evaluated by three methods: standard deviation of model fitting residual error, maximum relative model prediction error, and the model's Akaike Information Criterion (AIC). Testing datasets included the ES table from the British Journal of Radiology (BJR), photon output factors (Scp) from the Varian TrueBeam Representative Beam Data (Med Phys. 2012;39:6981-7018), and published Scp data for the Varian TrueBeam Edge (J Appl Clin Med Phys. 2015;16:125-148). For the BJR dataset, the best-fit parameter value α = -1.25 achieved a 20% reduction in the standard deviation of the ES estimation residual error compared with the two established formulae. For the two Varian datasets, employing WPM reduced the maximum relative error from 3.5% (Sterling) or 2% (Vadash/Bjärngard) to 0.7% for open field sizes ranging from 3 cm to 40 cm, and the reduction was even more prominent for 1 cm field sizes on Edge (J Appl Clin Med Phys. 2015;16:125-148). The AIC value of the WPM formula was consistently lower than its counterparts from the traditional formulae on photon output factors, most prominently on very elongated small fields. The WPM formula outperformed the traditional formulae on the three testing datasets. With the increasing utilization of very elongated, small rectangular fields in modern radiotherapy, improved photon output factor estimation is expected by adopting the WPM formula in treatment planning and secondary MU checks. © 2017 The Authors. Journal of Applied Clinical Medical Physics published by Wiley Periodicals, Inc. on behalf of American Association of Physicists in Medicine.
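A small sketch of the proposed formula next to Sterling's 2ab/(a+b); the weight w = 0.5 is an assumption made here for symmetry in a and b (the abstract quotes only α = -1.25 for the BJR fit):

```python
def es_sterling(a, b):
    """Sterling equivalent square: side of the square with the same area/perimeter ratio."""
    return 2.0 * a * b / (a + b)

def es_wpm(a, b, w=0.5, alpha=-1.25):
    """Weighted-power-mean equivalent square, ES = [w*a^alpha + (1-w)*b^alpha]^(1/alpha).

    alpha = -1.25 is the best-fit value reported for the BJR dataset; w = 0.5 is
    assumed here for symmetry in a and b (the abstract does not quote w).
    With w = 0.5 and alpha = -1 the expression reduces to Sterling's 2ab/(a+b).
    """
    return (w * a ** alpha + (1.0 - w) * b ** alpha) ** (1.0 / alpha)

for a, b in [(10, 10), (5, 40), (3, 40), (1, 20)]:
    print(f"{a:>2} x {b:>2} cm:  Sterling {es_sterling(a, b):5.2f}  WPM {es_wpm(a, b):5.2f}")
```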
NASA Astrophysics Data System (ADS)
Simon, E.; Nowicki, S.; Neumann, T.; Tyahla, L.; Saba, J. L.; Guerber, J. R.; Bonin, J. A.; DiMarzio, J. P.
2017-12-01
The Cryosphere model Comparison tool (CmCt) is a web-based ice sheet model validation tool that is being developed by NASA to facilitate direct comparison between observational data and various ice sheet models. The CmCt allows the user to take advantage of several decades worth of observations from Greenland and Antarctica. Currently, the CmCt can be used to compare ice sheet models provided by the user with remotely sensed satellite data from ICESat (Ice, Cloud, and land Elevation Satellite) laser altimetry, the GRACE (Gravity Recovery and Climate Experiment) satellite, and radar altimetry (ERS-1, ERS-2, and Envisat). One or more models can be uploaded through the CmCt website and compared with observational data, or compared to each other or to other models. The CmCt calculates statistics on the differences between the model and observations, and other quantitative and qualitative metrics, which can be used to evaluate the different model simulations against the observations. The qualitative metrics consist of a range of visual outputs and the quantitative metrics consist of several whole-ice-sheet scalar values that can be used to assign an overall score to a particular simulation. The comparison results from CmCt are useful in quantifying improvements within a specific model (or within a class of models) as a result of differences in model dynamics (e.g., shallow vs. higher-order dynamics approximations), model physics (e.g., representations of ice sheet rheological or basal processes), or model resolution (mesh resolution and/or changes in the spatial resolution of input datasets). The framework and metrics could also be used as a model-to-model intercomparison tool, simply by substituting outputs from another model for the observational datasets. Future versions of the tool will include comparisons with other datasets that are of interest to the modeling community, such as ice velocity, ice thickness, and surface mass balance.
Tang, G.; Andre, B.; Hoffman, F. M.; Painter, S. L.; Thornton, P. E.; Yuan, F.; Bisht, G.; Hammond, G. E.; Lichtner, P. C.; Kumar, J.; Mills, R. T.; Xu, X.
2016-04-19
This Modeling Archive is in support of an NGEE Arctic discussion paper under review and available at doi:10.5194/gmd-9-927-2016. The purpose is to document the simulations to allow verification, reproducibility, and follow-up studies. This dataset contains shell scripts to create the CLM-PFLOTRAN cases, specific input files for PFLOTRAN and CLM, outputs, and python scripts to make the figures using the outputs in the publication. Through these results, we demonstrate that CLM-PFLOTRAN can approximately reproduce CLM results in selected cases for Arctic, temperate, and tropical sites. In addition, the new framework facilitates mechanistic representations of soil biogeochemistry processes in the land surface model.
Variability of Upper-Tropospheric Precipitable Water from Satellite and Model Reanalysis Datasets
NASA Technical Reports Server (NTRS)
Jedlovec, Gary J.; Iwai, Hisaki
1999-01-01
Numerous datasets have been used to quantify water vapor and its variability in the upper-troposphere from satellite and model reanalysis data. These investigations have shown some usefulness in monitoring seasonal and inter-annual variations in moisture either globally, with polar orbiting satellite data or global model output analysis, or regionally, with the higher spatial and temporal resolution geostationary measurements. The datasets are not without limitations, however, due to coverage or limited temporal sampling, and may also contain bias in their representation of moisture processes. The research presented in this conference paper inter-compares the NVAP, NCEP/NCAR and DAO reanalysis models, and GOES satellite measurements of upper-tropospheric precipitable water for the period from 1988-1994. This period captures several dramatic swings in climate events associated with ENSO events. The data are evaluated for temporal and spatial continuity, inter-compared to assess reliability and potential bias, and analyzed in light of expected trends due to changes in precipitation and synoptic-scale weather features. This work is the follow-on to previous research which evaluated total precipitable water over the same period. The relationship between total and upper-level precipitable water in the datasets will be discussed as well.
A software tool for determination of breast cancer treatment methods using data mining approach.
Cakır, Abdülkadir; Demirel, Burçin
2011-12-01
In this work, breast cancer treatment methods are determined using data mining. For this purpose, software was developed to help oncology doctors suggest treatment methods for breast cancer patients. Data from 462 breast cancer patients, obtained from Ankara Oncology Hospital, were used to determine treatment methods for new patients. This dataset was processed with the Weka data mining tool. Classification algorithms were applied to the dataset one by one and the results compared to find the most suitable treatment method. The developed software, called "Treatment Assistant", uses different algorithms (IB1, Multilayer Perceptron, and Decision Table) to find out which gives the best result for each attribute to be predicted, through a Java NetBeans interface. Treatment methods are determined for breast cancer patients after surgical operation using this software tool. At the modeling step of the data mining process, different Weka algorithms are used for the output attributes: IB1 showed the best accuracy for the hormonotherapy output, the Multilayer Perceptron for the tamoxifen and radiotherapy outputs, and the Decision Table algorithm for the chemotherapy output. In conclusion, this work shows that data mining can be a useful tool for medical applications, particularly at the treatment decision step, and can help the doctor decide in a short time.
Synchrony between reanalysis-driven RCM simulations and observations: variation with time scale
NASA Astrophysics Data System (ADS)
de Elía, Ramón; Laprise, René; Biner, Sébastien; Merleau, James
2017-04-01
Unlike coupled global climate models (CGCMs) that run in a stand-alone mode, nested regional climate models (RCMs) are driven by either a CGCM or a reanalysis dataset. This feature makes high correlations between the RCM simulation and its driver possible. When the driving dataset is a reanalysis, time correlations between RCM output and observations are also common and to be expected. In certain situations the time correlation between driver and driven RCM is of particular interest, and techniques have been developed to increase it (e.g. large-scale spectral nudging). For such cases, a question that remains open is whether aggregating in time increases the correlation between RCM output and observations. That is, although the RCM may be unable to reproduce a given daily event, whether it will still be able to satisfactorily simulate an anomaly on a monthly or annual basis. This is a preconception that the authors of this work and others in the community have held, perhaps as a natural extension of the properties of upscaling or aggregating other statistics such as the mean squared error. Here we explore analytically four particular cases that help us partially answer this question. In addition, we use observational datasets and RCM-simulated data to illustrate our findings. Results indicate that time upscaling does not necessarily increase time correlations, and that those interested in achieving high monthly or annual time correlations between RCM output and observations may have to do so by increasing correlation as much as possible at the shortest time scale. This may indicate that even when only concerned with time correlations at large temporal scales, large-scale spectral nudging acting at the time-step level may have to be used.
The virtual enhancements - solar proton event radiation (VESPER) model
NASA Astrophysics Data System (ADS)
Aminalragia-Giamini, Sigiava; Sandberg, Ingmar; Papadimitriou, Constantinos; Daglis, Ioannis A.; Jiggens, Piers
2018-02-01
A new probabilistic model introducing a novel paradigm for the modelling of the solar proton environment at 1 AU is presented. The virtual enhancements - solar proton event radiation (VESPER) model uses the European Space Agency's Solar Energetic Particle Environment Modelling (SEPEM) Reference Dataset and produces virtual time-series of proton differential fluxes. In this regard it fundamentally diverges from the approach of existing SPE models, which are based on probabilistic descriptions of macroscopic SPE characteristics such as peak flux and cumulative fluence. It is shown that VESPER reproduces well the characteristics of the dataset it uses, and further comparisons with existing models are made with respect to their results. The production of time-series as the main output of the model opens a straightforward way for the calculation of solar proton radiation effects in terms of time-series and for pairing them with effects caused by trapped radiation and galactic cosmic rays.
The Mpi-M Aerosol Climatology (MAC)
NASA Astrophysics Data System (ADS)
Kinne, S.
2014-12-01
Monthly gridded global data-sets for aerosol optical properties (AOD, SSA and g) and for aerosol microphysical properties (CCN and IN) offer a (less complex) alternate path to include aerosol radiative effects and aerosol impacts on cloud microphysics in global simulations. Based on merging AERONET sun-/sky-photometer data onto background maps provided by AeroCom phase 1 modeling output, the MPI-M Aerosol Climatology (MAC) version 1 was developed and applied in IPCC simulations with ECHAM and as an ancillary data-set in satellite-based global data-sets. An updated version 2 of this climatology will be presented, now applying central values from the more recent AeroCom phase 2 modeling and utilizing the better global coverage of trusted sun-photometer data, including statistics from the Maritime Aerosol Network (MAN). Applications include spatial distributions of estimates for aerosol direct and aerosol indirect radiative effects.
NASA Astrophysics Data System (ADS)
Li, L.; Yang, C.
2017-12-01
Climate extremes often manifest as rare events in terms of surface air temperature and precipitation with an annual reoccurrence period. In order to represent the manifold characteristics of climate extremes for monitoring and analysis, the Expert Team on Climate Change Detection and Indices (ETCCDI) has worked out a set of 27 core indices based on daily temperature and precipitation data, describing extreme weather and climate events on an annual basis. The CLIMDEX project (http://www.climdex.org) has produced public domain datasets of such indices for data from a variety of sources, including output from global climate models (GCM) participating in the Coupled Model Intercomparison Project Phase 5 (CMIP5). Among the 27 ETCCDI indices, there are six percentile-based temperature extremes indices that fall into two groups: exceedance rates (ER) (TN10p, TN90p, TX10p and TX90p) and durations (CSDI and WSDI). Percentiles must be estimated prior to the calculation of the indices, and can be biased to varying degrees by the adopted estimation algorithm. Such biases are in turn propagated to the final index values. CLIMDEX used an empirical quantile estimator combined with a bootstrap resampling procedure to reduce the inhomogeneity in the annual series of the ER indices. However, some problems remain in the CLIMDEX datasets, namely overestimated climate variability due to unaccounted-for autocorrelation in the daily temperature data, seasonally varying biases, and inconsistency between the algorithms applied to the ER indices and to the duration indices. We now present new results for the six indices through a semiparametric quantile regression approach applied to the CMIP5 model output. By using the base-period data as a whole and taking seasonality and autocorrelation into account, this approach successfully addresses the aforementioned issues and yields consistent results. The new datasets cover the historical and three projected (RCP2.6, RCP4.5 and RCP8.5) emission scenarios, run with a multimodel ensemble of 19 members. We analyze changes in the six indices on global and regional scales over the 21st century relative to either the base period 1961-1990 or the reference period 1981-2000, and compare the results with those based on the CLIMDEX datasets.
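For orientation, a deliberately simplified sketch of one ER index (TX90p) is given below; it pools all base-period days for the percentile and omits the calendar-day windows, bootstrap resampling, and autocorrelation handling that distinguish the CLIMDEX and semiparametric approaches discussed above.

```python
import numpy as np

def tx90p(tmax, years, base=(1961, 1990)):
    """Annual TX90p: percentage of days with Tmax above the base-period 90th percentile.

    Simplified sketch: the percentile is taken over all base-period days pooled
    together, ignoring the 5-day calendar windows, bootstrap resampling, and
    autocorrelation corrections used by CLIMDEX and the semiparametric approach.
    """
    tmax = np.asarray(tmax, dtype=float)
    years = np.asarray(years)
    in_base = (years >= base[0]) & (years <= base[1])
    threshold = np.nanpercentile(tmax[in_base], 90.0)
    return {int(y): 100.0 * np.nanmean(tmax[years == y] > threshold)
            for y in np.unique(years)}

# Tiny synthetic example: 60 years of daily Tmax with a warming trend.
rng = np.random.default_rng(1)
yrs = np.repeat(np.arange(1951, 2011), 365)
t = 25 + 0.02 * (yrs - 1951) + rng.normal(0, 3, yrs.size)
print({y: round(v, 1) for y, v in list(tx90p(t, yrs).items())[-3:]})
```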
NASA Astrophysics Data System (ADS)
Sommer, Philipp S.; Kaplan, Jed O.
2017-10-01
While a wide range of Earth system processes occur at daily and even subdaily timescales, many global vegetation and other terrestrial dynamics models historically used monthly meteorological forcing both to reduce computational demand and because global datasets were lacking. Recently, dynamic land surface modeling has moved towards resolving daily and subdaily processes, and global datasets containing daily and subdaily meteorology have become available. These meteorological datasets, however, cover only the instrumental era of the last approximately 120 years at best, are subject to considerable uncertainty, and represent extremely large data files with associated computational costs of data input/output and file transfer. For periods before the recent past or in the future, global meteorological forcing can be provided by climate model output, but the quality of these data at high temporal resolution is low, particularly for daily precipitation frequency and amount. Here, we present GWGEN, a globally applicable statistical weather generator for the temporal downscaling of monthly climatology to daily meteorology. Our weather generator is parameterized using a global meteorological database and simulates daily values of five common variables: minimum and maximum temperature, precipitation, cloud cover, and wind speed. GWGEN is lightweight, modular, and requires a minimal set of monthly mean variables as input. The weather generator may be used in a range of applications, for example, in global vegetation, crop, soil erosion, or hydrological models. While GWGEN does not currently perform spatially autocorrelated multi-point downscaling of daily weather, this additional functionality could be implemented in future versions.
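As a generic illustration of what such temporal downscaling involves (this is not GWGEN's actual parameterization; the transition probabilities and gamma shape below are placeholders), a minimal two-state Markov-chain precipitation generator might look like:

```python
import numpy as np

rng = np.random.default_rng(42)

def daily_precip(monthly_total, ndays, p_wd=0.3, p_ww=0.6, shape=0.8):
    """Disaggregate a monthly precipitation total into daily values.

    Generic illustration: wet/dry occurrence follows a first-order Markov
    chain (p_wd = P(wet|dry), p_ww = P(wet|wet)); wet-day amounts are drawn
    from a gamma distribution and rescaled to conserve the monthly total.
    The parameter values are placeholders, not GWGEN's fitted values.
    """
    wet = np.zeros(ndays, dtype=bool)
    for d in range(1, ndays):
        p = p_ww if wet[d - 1] else p_wd
        wet[d] = rng.random() < p
    amounts = np.where(wet, rng.gamma(shape, 1.0, ndays), 0.0)
    if amounts.sum() > 0:
        amounts *= monthly_total / amounts.sum()
    return amounts

print(daily_precip(monthly_total=120.0, ndays=30).round(1))
```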
Extended output phasor representation of multi-spectral fluorescence lifetime imaging microscopy
Campos-Delgado, Daniel U.; Navarro, O. Gutiérrez; Arce-Santana, E. R.; Jo, Javier A.
2015-01-01
In this paper, we investigate novel low-dimensional and model-free representations for multi-spectral fluorescence lifetime imaging microscopy (m-FLIM) data. We depart from the classical definition of the phasor in the complex plane to propose the extended output phasor (EOP) and extended phasor (EP) for multi-spectral information. The frequency domain properties of the EOP and EP are analytically studied based on a multiexponential model for the impulse response of the imaged tissue. For practical implementations, the EOP is more appealing since there is no need to perform deconvolution of the instrument response from the measured m-FLIM data, as in the case of EP. Our synthetic and experimental evaluations with m-FLIM datasets of human coronary atherosclerotic plaques show that low frequency indexes have to be employed for a distinctive representation of the EOP and EP, and to reduce noise distortion. The tissue classification of the m-FLIM datasets by EOP and EP also improves with low frequency indexes, and does not present significant differences by using either phasor. PMID:26114031
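For context, the classical single-channel phasor that the EOP and EP generalize can be computed from a decay trace as in the sketch below; the multi-spectral EOP/EP definitions themselves are not reproduced here.

```python
import numpy as np

def phasor(decay, n=1):
    """Classical phasor coordinates (g, s) of a fluorescence decay trace.

    g and s are the normalized cosine and sine Fourier components at the
    n-th harmonic of the measurement window; a low harmonic (low frequency
    index) is less sensitive to noise, as noted in the abstract.
    """
    decay = np.asarray(decay, dtype=float)
    t = np.arange(decay.size)
    w = 2.0 * np.pi * n / decay.size
    total = decay.sum()
    g = np.sum(decay * np.cos(w * t)) / total
    s = np.sum(decay * np.sin(w * t)) / total
    return g, s

# Mono-exponential decay example: points of constant lifetime lie on the
# universal semicircle g^2 + s^2 = g in the classical phasor plot.
trace = np.exp(-np.arange(256) / 40.0)
print(phasor(trace, n=1))
```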
Wootten, Adrienne; Smith, Kara; Boyles, Ryan; Terando, Adam; Stefanova, Lydia; Misra, Vasru; Smith, Tom; Blodgett, David L.; Semazzi, Fredrick
2014-01-01
Climate change is likely to have many effects on natural ecosystems in the Southeast U.S. The National Climate Assessment Southeast Technical Report (SETR) indicates that natural ecosystems in the Southeast are likely to be affected by warming temperatures, ocean acidification, sea-level rise, and changes in rainfall and evapotranspiration. To better assess how these climate changes could affect multiple sectors, including ecosystems, climatologists have created several downscaled climate projections (or downscaled datasets) that contain information from the global climate models (GCMs) translated to regional or local scales. The process of creating these downscaled datasets, known as downscaling, can be carried out using a broad range of statistical or numerical modeling techniques. The rapid proliferation of techniques that can be used for downscaling and the number of downscaled datasets produced in recent years present many challenges for scientists and decisionmakers in assessing the impact or vulnerability of a given species or ecosystem to climate change. Given the number of available downscaled datasets, how do these model outputs compare to each other? Which variables are available, and are certain downscaled datasets more appropriate for assessing vulnerability of a particular species? Given the desire to use these datasets for impact and vulnerability assessments and the lack of comparison between these datasets, the goal of this report is to synthesize the information available in these downscaled datasets and provide guidance to scientists and natural resource managers with specific interests in ecological modeling and conservation planning related to climate change in the Southeast U.S. This report enables the Southeast Climate Science Center (SECSC) to address an important strategic goal of providing scientific information and guidance that will enable resource managers and other participants in Landscape Conservation Cooperatives to make science-based climate change adaptation decisions.
Evaluation of Global Observations-Based Evapotranspiration Datasets and IPCC AR4 Simulations
NASA Technical Reports Server (NTRS)
Mueller, B.; Seneviratne, S. I.; Jimenez, C.; Corti, T.; Hirschi, M.; Balsamo, G.; Ciais, P.; Dirmeyer, P.; Fisher, J. B.; Guo, Z.;
2011-01-01
Quantification of global land evapotranspiration (ET) has long been associated with large uncertainties due to the lack of reference observations. Several recently developed products now provide the capacity to estimate ET at global scales. These products, partly based on observational data, include satellite-based products, land surface model (LSM) simulations, atmospheric reanalysis output, estimates based on empirical upscaling of eddy-covariance flux measurements, and atmospheric water balance datasets. The LandFlux-EVAL project aims to evaluate and compare these newly developed datasets. Additionally, an evaluation of IPCC AR4 global climate model (GCM) simulations is presented, providing an assessment of their capacity to reproduce flux behavior relative to the observations-based products. Though differently constrained with observations, the analyzed reference datasets display similar large-scale ET patterns. ET from the IPCC AR4 simulations was significantly smaller than that from the other products for India (up to 1 mm/d) and parts of eastern South America, and larger in the western USA, Australia and China. The inter-product variance is lower across the IPCC AR4 simulations than across the reference datasets in several regions, which indicates that uncertainties may be underestimated in the IPCC AR4 models due to shared biases of these simulations.
NASA Astrophysics Data System (ADS)
Appel, Marius; Lahn, Florian; Buytaert, Wouter; Pebesma, Edzer
2018-04-01
Earth observation (EO) datasets are commonly provided as collection of scenes, where individual scenes represent a temporal snapshot and cover a particular region on the Earth's surface. Using these data in complex spatiotemporal modeling becomes difficult as soon as data volumes exceed a certain capacity or analyses include many scenes, which may spatially overlap and may have been recorded at different dates. In order to facilitate analytics on large EO datasets, we combine and extend the geospatial data abstraction library (GDAL) and the array-based data management and analytics system SciDB. We present an approach to automatically convert collections of scenes to multidimensional arrays and use SciDB to scale computationally intensive analytics. We evaluate the approach in three study cases on national scale land use change monitoring with Landsat imagery, global empirical orthogonal function analysis of daily precipitation, and combining historical climate model projections with satellite-based observations. Results indicate that the approach can be used to represent various EO datasets and that analyses in SciDB scale well with available computational resources. To simplify analyses of higher-dimensional datasets as from climate model output, however, a generalization of the GDAL data model might be needed. All parts of this work have been implemented as open-source software and we discuss how this may facilitate open and reproducible EO analyses.
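A minimal sketch of the scene-reading side of such a pipeline, using only standard GDAL Python bindings (the SciDB ingestion and the paper's scene-to-array mapping are not shown; the file path is a placeholder):

```python
from osgeo import gdal, osr  # GDAL Python bindings

gdal.UseExceptions()

def read_scene(path):
    """Read a single raster scene: the pixel array plus the metadata needed to
    place it into a common multidimensional (space x time) array."""
    ds = gdal.Open(path)
    geotransform = ds.GetGeoTransform()            # origin and pixel size
    srs = osr.SpatialReference(wkt=ds.GetProjection())
    data = ds.ReadAsArray()                        # (bands, rows, cols) or (rows, cols)
    return data, geotransform, srs

# Example usage (path is a placeholder):
# arr, gt, srs = read_scene("LC08_scene.tif")
```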
NASA Astrophysics Data System (ADS)
Yang, P.; Fekete, B. M.; Rosenzweig, B.; Lengyel, F.; Vorosmarty, C. J.
2012-12-01
Atmospheric dynamics are essential inputs to Regional-scale Earth System Models (RESMs). Variables including surface air temperature, total precipitation, solar radiation, wind speed and humidity must be downscaled from coarse-resolution, global General Circulation Models (GCMs) to the high temporal and spatial resolution required for regional modeling. However, this downscaling procedure can be challenging due to the need to correct for bias from the GCM and to capture the spatiotemporal heterogeneity of the regional dynamics. In this study, the results obtained using several downscaling techniques and observational datasets were compared for a RESM of the Northeast Corridor of the United States. Previous efforts have enhanced GCM model outputs through bias correction using novel techniques. For example, the Climate Impact Research group at the Potsdam Institute developed a series of bias-corrected GCMs towards the next generation of climate change scenarios (Schiermeier, 2012; Moss et al., 2010). Techniques to better represent the heterogeneity of climate variables have also been improved using statistical approaches (Maurer, 2008; Abatzoglou, 2011). For this study, four downscaling approaches to transform bias-corrected HADGEM2-ES model output (daily at 0.5 x 0.5 degree) to the 3' x 3' (longitude x latitude) daily and monthly resolution required for the Northeast RESM were compared: 1) bilinear interpolation, 2) daily bias-corrected spatial downscaling (D-BCSD) with gridded meteorological datasets (developed by Abatzoglou, 2011), 3) monthly bias-corrected spatial disaggregation (M-BCSD) with CRU (Climatic Research Unit) data, and 4) dynamic downscaling based on the Weather Research and Forecasting (WRF) model. Spatio-temporal analysis of the variability in precipitation was conducted over the study domain. Validation of the variables from the different downscaling methods against observational datasets was carried out to assess the downscaled climate model outputs. The effects of using the different approaches to downscale atmospheric variables (specifically air temperature and precipitation) as inputs to the Water Balance Model (WBMPlus; Vorosmarty et al., 1998; Wisser et al., 2008) for simulation of daily discharge and monthly stream flow in the Northeast US for a 100-year period in the 21st century were also assessed. Statistical techniques, especially monthly bias-corrected spatial disaggregation (M-BCSD), showed a potential advantage over the other methods for the daily discharge and monthly stream flow simulation. However, dynamic downscaling will provide important complements to the statistical approaches tested.
Recent Upgrades to NASA SPoRT Initialization Datasets for the Environmental Modeling System
NASA Technical Reports Server (NTRS)
Case, Jonathan L.; Lafontaine, Frank J.; Molthan, Andrew L.; Zavodsky, Bradley T.; Rozumalski, Robert A.
2012-01-01
The NASA Short-term Prediction Research and Transition (SPoRT) Center has developed several products for its NOAA/National Weather Service (NWS) partners that can initialize specific fields for local model runs within the NOAA/NWS Science and Training Resource Center Environmental Modeling System (EMS). The suite of SPoRT products for use in the EMS consists of a Sea Surface Temperature (SST) composite that includes a Lake Surface Temperature (LST) analysis over the Great Lakes, a Great Lakes sea-ice extent within the SST composite, a real-time Green Vegetation Fraction (GVF) composite, and NASA Land Information System (LIS) gridded output. This paper and companion poster describe each dataset and the recent upgrades made to the SST composite, Great Lakes LST, GVF composite, and the real-time LIS runs.
Ray Drapek; John B. Kim; Ronald P. Neilson
2015-01-01
Land managers need to include climate change in their decisionmaking, but the climate models that project future climates operate at spatial scales that are too coarse to be of direct use. To create a dataset more useful to managers, soil and historical climate were assembled for the United States and Canada at a 5-arcminute grid resolution. Nine CMIP3 future climate...
A reanalysis dataset of the South China Sea.
Zeng, Xuezhi; Peng, Shiqiu; Li, Zhijin; Qi, Yiquan; Chen, Rongyu
2014-01-01
Ocean reanalysis provides a temporally continuous and spatially gridded four-dimensional estimate of the ocean state for a better understanding of the ocean dynamics and its spatial/temporal variability. Here we present a 19-year (1992-2010) high-resolution ocean reanalysis dataset of the upper ocean in the South China Sea (SCS) produced from an ocean data assimilation system. A wide variety of observations, including in-situ temperature/salinity profiles, ship-measured and satellite-derived sea surface temperatures, and sea surface height anomalies from satellite altimetry, are assimilated into the outputs of an ocean general circulation model using a multi-scale incremental three-dimensional variational data assimilation scheme, yielding a daily high-resolution reanalysis dataset of the SCS. Comparisons between the reanalysis and independent observations support the reliability of the dataset. The presented dataset provides the research community of the SCS an important data source for studying the thermodynamic processes of the ocean circulation and meso-scale features in the SCS, including their spatial and temporal variability.
NASA Astrophysics Data System (ADS)
Brissebrat, Guillaume; Mastrorillo, Laurence; Ramage, Karim; Boichard, Jean-Luc; Cloché, Sophie; Fleury, Laurence; Klenov, Ludmila; Labatut, Laurent; Mière, Arnaud
2013-04-01
The international HyMeX (HYdrological cycle in the Mediterranean EXperiment) project aims at a better understanding and quantification of the hydrological cycle and related processes in the Mediterranean, with emphasis on high-impact weather events, inter-annual to decadal variability of the Mediterranean coupled system, and associated trends in the context of global change. The project includes long-term monitoring of environmental parameters, intensive field campaigns, use of satellite data, modelling studies, as well as post-event field surveys and value-added products processing. Therefore the HyMeX database incorporates various dataset types from different disciplines, either operational or research. The database relies on a strong collaboration between the OMP and IPSL data centres. Field data, which are 1D time series, maps or pictures, are managed by the OMP team while gridded data (satellite products, model outputs, radar data...) are managed by the IPSL team. At present, the HyMeX database contains about 150 datasets, including 80 hydrological, meteorological, ocean and soil in situ datasets, 30 radar datasets, 15 satellite products, 15 atmosphere, ocean and land surface model outputs from operational (re-)analysis or forecasts and from research simulations, and 5 post-event survey datasets. The data catalogue complies with international standards (ISO 19115; INSPIRE; Directory Interchange Format; Global Change Master Directory Thesaurus). It includes all the datasets stored in the HyMeX database, as well as external datasets relevant for the project. All the data, whatever the type, are accessible through a single gateway. The database website http://mistrals.sedoo.fr/HyMeX offers different tools: - A registration procedure which enables any scientist to accept the data policy and apply for a user database account. - A search tool to browse the catalogue using thematic, geographic and/or temporal criteria. - Sorted lists of the datasets by thematic keywords, by measured parameters, by instruments or by platform type. - Forms to document observations or products that will be provided to the database. - A shopping-cart web interface to order in situ data files. - Ftp facilities to access gridded data. The website will soon offer new facilities. Many in situ datasets have already been homogenized and inserted into a relational database, in order to enable more accurate data selection and the download of different datasets in a shared format. Interoperability between the two data centres will be enhanced by the OpenDAP communication protocol associated with the Thredds catalogue software, which may also be implemented in other data centres that manage data of interest for the HyMeX project. In order to meet the operational needs of the HyMeX 2012 campaigns, a day-to-day quick-look and report display website has also been developed: http://sop.hymex.org. It offers a convenient way to browse meteorological conditions and data during the campaign periods.
Efficient Lane Boundary Detection with Spatial-Temporal Knowledge Filtering
Nan, Zhixiong; Wei, Ping; Xu, Linhai; Zheng, Nanning
2016-01-01
Lane boundary detection technology has progressed rapidly over the past few decades. However, many challenges that often lead to lane detection unavailability remain to be solved. In this paper, we propose a spatial-temporal knowledge filtering model to detect lane boundaries in videos. To address the challenges of structure variation, large noise and complex illumination, this model incorporates prior spatial-temporal knowledge with lane appearance features to jointly identify lane boundaries. The model first extracts line segments in video frames. Two novel filters, the Crossing Point Filter (CPF) and the Structure Triangle Filter (STF), are proposed to filter out noisy line segments. The two filters introduce spatial structure constraints and temporal location constraints into lane detection, which represent the spatial-temporal knowledge about lanes. A straight line or curve model determined by a state machine is used to fit the line segments and finally output the lane boundaries. We collected a challenging realistic traffic scene dataset. The experimental results on this dataset and other standard datasets demonstrate the strength of our method. The proposed method has been successfully applied to our autonomous experimental vehicle. PMID:27529248
Validation of individual and aggregate global flood hazard models for two major floods in Africa.
NASA Astrophysics Data System (ADS)
Trigg, M.; Bernhofen, M.; Whyman, C.
2017-12-01
A recent intercomparison of global flood hazard models undertaken by the Global Flood Partnership shows that there is an urgent requirement to undertake more validation of the models against flood observations. As part of the intercomparison, the aggregated model dataset resulting from the project was provided as open access data. We compare the individual and aggregated flood extent output from the six global models and test these against two major floods on the African continent within the last decade, namely severe flooding on the Niger River in Nigeria in 2012, and on the Zambezi River in Mozambique in 2007. We test whether aggregating different numbers and combinations of models increases model fit to the observations compared with the individual model outputs. We present results that illustrate some of the challenges of comparing imperfect models with imperfect observations, and also of defining the probability of a real event in order to test standard model output probabilities. Finally, we propose a collective set of open access validation flood events, with associated observational data and descriptions, that provide a standard set of tests across different climates and hydraulic conditions.
Pandey, Daya Shankar; Pan, Indranil; Das, Saptarshi; Leahy, James J; Kwapinski, Witold
2015-03-01
A multi-gene genetic programming technique is proposed as a new method to predict syngas yield and the lower heating value for municipal solid waste gasification in a fluidized bed gasifier. The study shows that the predicted outputs of the municipal solid waste gasification process are in good agreement with the experimental dataset and also generalise well to validation (untrained) data. Published experimental datasets are used for model training and validation purposes. The results show the effectiveness of the genetic programming technique for solving complex nonlinear regression problems. The multi-gene genetic programming model is also compared with a single-gene genetic programming model to show the relative merits and demerits of the technique. This study demonstrates that the genetic programming based data-driven modelling strategy can be a good candidate for developing models for other types of fuels as well. Copyright © 2014 Elsevier Ltd. All rights reserved.
Guinotte, J.M.; Bartley, J.D.; Iqbal, A.; Fautin, D.G.; Buddemeier, R.W.
2006-01-01
We demonstrate the KGSMapper (Kansas Geological Survey Mapper), a straightforward, web-based biogeographic tool that uses environmental conditions of places where members of a taxon are known to occur to find other places containing suitable habitat for them. Using occurrence data for anemonefishes or their host sea anemones, and data for environmental parameters, we generated maps of suitable habitat for the organisms. The fact that the fishes are obligate symbionts of the anemones allowed us to validate the KGSMapper output: we were able to compare the inferred occurrence of the organism to that of the actual occurrence of its symbiont. Characterizing suitable habitat for these organisms in the Indo-West Pacific, the region where they naturally occur, can be used to guide conservation efforts, field work, etc.; defining suitable habitat for them in the Atlantic and eastern Pacific is relevant to identifying areas vulnerable to biological invasions. We advocate distinguishing between these 2 sorts of model output, terming the former maps of realized habitat and the latter maps of potential habitat. Creation of a niche model requires adding biotic data to the environmental data used for habitat maps: we included data on fish occurrences to infer anemone distribution and vice versa. Altering the selection of environmental variables allowed us to investigate which variables may exert the most influence on organism distribution. Adding variables does not necessarily improve precision of the model output. KGSMapper output distinguishes areas that fall within 1 standard deviation (SD) of the mean environmental variable values for places where members of the taxon occur, within 2 SD, and within the entire range of values; eliminating outliers or data known to be imprecise or inaccurate improved output precision mainly in the 2 SD range and beyond. Thus, KGSMapper is robust in the face of questionable data, offering the user a way to recognize and clean such data. It also functions well with sparse datasets. These features make it useful for biogeographic meta-analyses with the diverse, distributed datasets that are typical for marine organisms lacking direct commercial value. © Inter-Research 2006.
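The 1 SD / 2 SD / full-range banding described above can be sketched as follows; the integer class coding and the toy occurrence values are inventions for illustration, not KGSMapper's implementation.

```python
import numpy as np

def habitat_classes(env_grid, env_at_occurrences):
    """Classify grid cells by how far their environmental values sit from the
    conditions observed where the taxon occurs, echoing the banding above.

    Returns 3 where every variable is within 1 SD of the occurrence mean,
    2 within 2 SD, 1 within the full observed range, 0 otherwise.
    env_grid: (n_cells, n_vars); env_at_occurrences: (n_occurrences, n_vars).
    """
    occ = np.asarray(env_at_occurrences, dtype=float)
    grid = np.asarray(env_grid, dtype=float)
    mean, sd = occ.mean(axis=0), occ.std(axis=0)
    lo, hi = occ.min(axis=0), occ.max(axis=0)
    within = lambda k: np.all(np.abs(grid - mean) <= k * sd, axis=1)
    in_range = np.all((grid >= lo) & (grid <= hi), axis=1)
    return np.select([within(1), within(2), in_range], [3, 2, 1], default=0)

# Toy example with two environmental variables (e.g. SST and depth).
occurrences = np.array([[27.0, 10.0], [28.5, 15.0], [26.5, 8.0]])
cells = np.array([[27.5, 12.0], [30.0, 40.0], [22.0, 5.0]])
print(habitat_classes(cells, occurrences))   # [3 0 0]
```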
Fisher, Joshua B.; Sikka, Munish; Huntzinger, Deborah N.; ...
2016-07-29
Here, the land surface provides a boundary condition to atmospheric forward and flux inversion models. These models require prior estimates of CO2 fluxes at relatively high temporal resolutions (e.g., 3-hourly) because of the high frequency of atmospheric mixing and wind heterogeneity. However, land surface model CO2 fluxes are often provided at monthly time steps, typically because the land surface modeling community focuses more on time steps associated with plant phenology (e.g., seasonal) than on sub-daily phenomena. Here, we describe a new dataset created from 15 global land surface models and 4 ensemble products in the Multi-scale Synthesis and Terrestrial Model Intercomparison Project (MsTMIP), temporally downscaled from monthly to 3-hourly output. We provide 3-hourly output for each individual model over 7 years (2004–2010), as well as an ensemble mean, a weighted ensemble mean, and the multi-model standard deviation. Output is provided in three different spatial resolutions for user preferences: 0.5° × 0.5°, 2.0° × 2.5°, and 4.0° × 5.0° (latitude × longitude).
Colors of attraction: Modeling insect flight to light behavior.
Donners, Maurice; van Grunsven, Roy H A; Groenendijk, Dick; van Langevelde, Frank; Bikker, Jan Willem; Longcore, Travis; Veenendaal, Elmar
2018-06-26
Light sources attract nocturnal flying insects, but some lamps attract more insects than others. The relation between the properties of a light source and the number of attracted insects is, however, poorly understood. We developed a model to quantify the attractiveness of light sources based on the spectral output. This model is fitted using data from field experiments that compare a large number of different light sources. We validated this model using two additional datasets, one for all insects and one excluding the numerous Diptera. Our model facilitates the development and application of light sources that attract fewer insects without the need for extensive field tests and it can be used to correct for spectral composition when formulating hypotheses on the ecological impact of artificial light. In addition, we present a tool allowing the conversion of the spectral output of light sources to their relative insect attraction based on this model. © 2018 Wiley Periodicals, Inc.
Knijnenburg, Theo A.; Klau, Gunnar W.; Iorio, Francesco; Garnett, Mathew J.; McDermott, Ultan; Shmulevich, Ilya; Wessels, Lodewyk F. A.
2016-01-01
Mining large datasets using machine learning approaches often leads to models that are hard to interpret and not amenable to the generation of hypotheses that can be experimentally tested. We present ‘Logic Optimization for Binary Input to Continuous Output’ (LOBICO), a computational approach that infers small and easily interpretable logic models of binary input features that explain a continuous output variable. Applying LOBICO to a large cancer cell line panel, we find that logic combinations of multiple mutations are more predictive of drug response than single gene predictors. Importantly, we show that the use of the continuous information leads to robust and more accurate logic models. LOBICO implements the ability to uncover logic models around predefined operating points in terms of sensitivity and specificity. As such, it represents an important step towards practical application of interpretable logic models. PMID:27876821
NASA Astrophysics Data System (ADS)
Cohn, A.; Bragança, A.; Jeffries, G. R.
2017-12-01
An increasing share of global agricultural production comes from the humid tropics. Therefore, an improved understanding of the mechanisms governing variability in the output of tropical agricultural systems is of increasing importance for food security, including through climate change adaptation. Yet the long window over which many tropical crops can be sown and the diversity of crop varieties and management practices combine to challenge inference into climate risk to cropping output in analyses of tropical crop-climate sensitivity employing administrative data. In this paper, we leverage a newly developed spatially explicit dataset of soybean yields in Brazil to combat this problem. The dataset was built by training a model of remotely sensed vegetation index data and land cover classification data on a rich in situ dataset of soybean yield and management variables collected over the period 2006 to 2016. The dataset contains soybean yields by planting date, cropping frequency, and maturity group for each 5 km grid cell in Brazil. We model variation in these yields using an approach that enables estimation of the influence of management factors on the sensitivity of soybean yields to variability in cumulative solar radiation, extreme degree days, growing degree days, flooding rain in the harvest period, and dry spells in the rainy season. We find strong variation in climate sensitivity by management class. Planting date and maturity group each explained a great deal more variation in yield sensitivity than did cropping frequency. Brazil collects comparatively fine spatial resolution yield data, but our attempt to replicate our results using administrative soy yield data revealed substantially lower crop-climate sensitivity, suggesting that previous analyses employing administrative data may have underestimated climate risk to tropical soy production.
Statistical Compression for Climate Model Output
NASA Astrophysics Data System (ADS)
Hammerling, D.; Guinness, J.; Soh, Y. J.
2017-12-01
Numerical climate model simulations run at high spatial and temporal resolutions generate massive quantities of data. As our computing capabilities continue to increase, storing all of the data is not sustainable, and thus it is important to develop methods for representing the full datasets by smaller compressed versions. We propose a statistical compression and decompression algorithm based on storing a set of summary statistics as well as a statistical model describing the conditional distribution of the full dataset given the summary statistics. We decompress the data by computing conditional expectations and conditional simulations from the model given the summary statistics. Conditional expectations represent our best estimate of the original data but are subject to oversmoothing in space and time. Conditional simulations introduce realistic small-scale noise so that the decompressed fields are neither too smooth nor too rough compared with the original data. Considerable attention is paid to accurately modeling the original dataset (one year of daily mean temperature data), particularly with regard to the inherent spatial nonstationarity in global fields, and to determining the statistics to be stored, so that the variation in the original data can be closely captured, while allowing for fast decompression and conditional emulation on modest computers.
Brokering technologies to realize the hydrology scenario in NSF BCube
NASA Astrophysics Data System (ADS)
Boldrini, Enrico; Easton, Zachary; Fuka, Daniel; Pearlman, Jay; Nativi, Stefano
2015-04-01
In the National Science Foundation (NSF) BCube project an international team composed of cyberinfrastructure experts, geoscientists, social scientists and educators are working together to explore the use of brokering technologies, initially focusing on four domains: hydrology, oceans, polar, and weather. In the hydrology domain, environmental models are fundamental to understanding the behaviour of hydrological systems. A specific model usually requires datasets coming from different disciplines for its initialization (e.g. elevation models from Earth observation, weather data from atmospheric sciences, etc.). Scientific datasets are usually available on heterogeneous publishing services, such as inventory and access services (e.g. OGC Web Coverage Service, THREDDS Data Server, etc.). Indeed, datasets are published according to different protocols, and they usually come in different formats, resolutions, and Coordinate Reference Systems (CRSs): in short, different grid environments depending on the original data and the publishing service processing capabilities. Scientists can thus be impeded by the burden of discovering, accessing, and normalizing the desired datasets to the grid environment required by the model. These technological tasks of course divert scientists from their main, scientific goals. The GI-axe brokering framework has been tested in a hydrology scenario where scientists needed to compare a particular hydrological model with two different input datasets (digital elevation models): - the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) dataset, v.2. - the Shuttle Radar Topography Mission (SRTM) dataset, v.3. These datasets were published by means of Hyrax Server technology, which can provide NetCDF files at their original resolution and CRS. The scientists had their model running on ArcGIS, so the main goal was to import the datasets using the available ArcPy library, with EPSG:4326 and a common resolution grid as the reference system, so that model outputs could be compared. ArcPy, however, is able to access only GeoTIFF datasets that are published by an OGC Web Coverage Service (WCS). The GI-axe broker was therefore deployed between the client application and the data providers. It was configured to broker the two different Hyrax service endpoints and republish the data content through a WCS interface for use by the ArcPy library. Finally, the scientists were able to easily run the model, and to concentrate on the comparison of the different results obtained according to the selected input dataset. The use of a third-party broker to perform such technological tasks has also been shown to have the potential advantage of increasing the repeatability of a study among different researchers.
National Water Model: Providing the Nation with Actionable Water Intelligence
NASA Astrophysics Data System (ADS)
Aggett, G. R.; Bates, B.
2017-12-01
The National Water Model (NWM) provides national, street-level detail of water movement through time and space. Operating hourly, the model produces a flood of information that offers enormous benefits in the form of water resource management, natural disaster preparedness, and the protection of life and property. The Geo-Intelligence Division at the NOAA National Water Center supplies forecasters and decision-makers with timely, actionable water intelligence through the processing of billions of NWM data points every hour. These datasets include current streamflow estimates, short and medium range streamflow forecasts, and many other ancillary datasets. The sheer amount of NWM data produced yields a dataset too large to allow for direct human comprehension. As such, model data post-processing, filtering, and ingestion by visualization web apps are necessary, using cartographic techniques to bring attention to the areas of highest urgency. This poster illustrates NWM output post-processing and cartographic visualization techniques being developed and employed by the Geo-Intelligence Division at the NOAA National Water Center to provide national actionable water intelligence.
Cournane, S; Sheehy, N; Cooke, J
2014-06-01
Benford's law is an empirical observation which predicts the expected frequency of digits in naturally occurring datasets spanning multiple orders of magnitude, and it has been most successfully applied as an audit tool in accountancy. This study investigated the sensitivity of the technique in identifying system output changes using simulated changes in interventional radiology Dose-Area-Product (DAP) data, with any deviations from Benford's distribution identified using z-statistics. The radiation output of interventional radiology X-ray equipment is monitored annually during quality control testing; however, for a considerable portion of the year an increased output of the system, potentially caused by engineering adjustments or spontaneous system faults, may go unnoticed, leading to a potential increase in the radiation dose to patients. In normal operation, recorded examination radiation outputs vary over multiple orders of magnitude, rendering the application of normal statistics ineffective for detecting systematic changes in the output. In this work, the annual DAP datasets complied with Benford's first-order law for the first, second, and combinations of the first and second digits. Further, a continuous 'rolling' second-order technique was devised for trending simulated changes over shorter timescales. This distribution analysis, the first employment of the method for radiation output trending, detected significant changes simulated on the original data, proving the technique useful in this case. The potential is demonstrated for implementing this novel analysis to monitor and identify change in suitable datasets for the purpose of system process control. Copyright © 2013 Associazione Italiana di Fisica Medica. Published by Elsevier Ltd. All rights reserved.
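A minimal sketch of the first-digit test with z-statistics (synthetic log-uniform data stand in for DAP values; the study's rolling second-order variant is not shown):

```python
import numpy as np

def benford_first_digit_z(values):
    """z-statistics for first-digit frequencies of a dataset against Benford's law.

    The expected proportion for digit d is log10(1 + 1/d); the z-statistic uses
    the usual normal approximation with a continuity correction of 1/(2N).
    """
    values = np.asarray(values, dtype=float)
    values = values[values > 0]
    first = np.array([int(f"{v:e}"[0]) for v in values])   # leading digit
    n = first.size
    z = {}
    for d in range(1, 10):
        p_exp = np.log10(1.0 + 1.0 / d)
        p_obs = np.mean(first == d)
        se = np.sqrt(p_exp * (1.0 - p_exp) / n)
        z[d] = (abs(p_obs - p_exp) - 1.0 / (2.0 * n)) / se
    return z

# Synthetic DAP-like data spanning several orders of magnitude (log-uniform),
# which should broadly conform to Benford's first-digit distribution.
rng = np.random.default_rng(3)
dap = 10 ** rng.uniform(0, 4, size=5000)
print({d: round(v, 2) for d, v in benford_first_digit_z(dap).items()})
```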
Hawaii Regional Sediment Management: Regional Sediment Budget for the Kekaha Region of Kauai, HI
2013-06-01
Waimea River. Some sediment passes from the Waimea cell to the west and is deposited in the Kikiaola Harbor entrance channel and basin. Upland... study regions, have been developed by the University of Hawaii Coastal Geology Group (UH CGG) (Fletcher et al. 2012) for the US Geological Survey... Study (WIS) (Hubertz 1992) hindcast dataset were used as input to the model STeady WAVE (STWAVE) (Smith et al. 2001). The model output provides
NASA AVOSS Fast-Time Models for Aircraft Wake Prediction: User's Guide (APA3.8 and TDP2.1)
NASA Technical Reports Server (NTRS)
Ahmad, Nash'at N.; VanValkenburg, Randal L.; Pruis, Matthew J.; Limon Duparcmeur, Fanny M.
2016-01-01
NASA's current distribution of fast-time wake vortex decay and transport models includes APA (Version 3.8) and TDP (Version 2.1). This User's Guide provides detailed information on the model inputs, file formats, and model outputs. A brief description of the Memphis 1995, Dallas/Fort Worth 1997, and the Denver 2003 wake vortex datasets is given along with the evaluation of models. A detailed bibliography is provided which includes publications on model development, wake field experiment descriptions, and applications of the fast-time wake vortex models.
The uploaded data consists of the BRACE Na aerosol observations paired with CMAQ model output, the updated model's parameterization of sea salt aerosol emission size distribution, and the model's parameterization of the sea salt emission factor as a function of sea surface temperature. This dataset is associated with the following publication: Gantt, B., J. Kelly, and J. Bash. Updating sea spray aerosol emissions in the Community Multiscale Air Quality (CMAQ) model version 5.0.2. Geoscientific Model Development. Copernicus Publications, Katlenburg-Lindau, GERMANY, 8: 3733-3746, (2015).
Fienen, Michael N.; Nolan, Bernard T.; Feinstein, Daniel T.
2016-01-01
For decision support, the insights and predictive power of numerical process models can be hampered when the expertise and computational resources required to evaluate system response to new stresses are lacking. An alternative is to emulate the process model with a statistical "metamodel." Built on a dataset of collocated numerical model input and output, a groundwater flow model was emulated using a Bayesian network, an artificial neural network, and a gradient boosted regression tree. The response of interest was surface water depletion, expressed as the source of water to wells. The results have application for managing the allocation of groundwater. Each technique was tuned using cross validation and further evaluated using a held-out dataset. A numerical MODFLOW-USG model of the Lake Michigan Basin, USA, was used for the evaluation. The performance and interpretability of the techniques were compared, pointing to the advantages of each. The metamodel can extend to unmodeled areas.
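A minimal sketch of one of the metamodelling approaches named above (a gradient boosted regression tree emulating collocated model input-output pairs, tuned by cross validation and scored on a held-out set), using scikit-learn on synthetic data; it is not the authors' Lake Michigan Basin setup.

```python
# Sketch: emulate a numerical model's response (e.g. source of water to wells)
# from collocated inputs using a gradient boosted regression tree metamodel.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(42)
X = rng.uniform(size=(2000, 4))                                       # stand-in model inputs
y = 2.0 * X[:, 0] + np.sin(3.0 * X[:, 1]) + 0.1 * rng.normal(size=2000)  # stand-in response

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Tune via cross validation, then evaluate on a held-out set (mirroring the study design).
search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid={"n_estimators": [100, 300], "max_depth": [2, 3], "learning_rate": [0.05, 0.1]},
    cv=5,
)
search.fit(X_train, y_train)
print("held-out R^2:", search.score(X_test, y_test))
```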
Multiresolution comparison of precipitation datasets for large-scale models
NASA Astrophysics Data System (ADS)
Chun, K. P.; Sapriza Azuri, G.; Davison, B.; DeBeer, C. M.; Wheater, H. S.
2014-12-01
Gridded precipitation datasets are crucial for driving large-scale models used in weather forecasting and climate research. However, the quality of precipitation products is usually validated individually. Comparing gridded precipitation products along with ground observations provides another avenue for investigating how precipitation uncertainty affects the performance of large-scale models. In this study, using data from a set of precipitation gauges over British Columbia and Alberta, we evaluate several widely used North American gridded products, including the Canadian Gridded Precipitation Anomalies (CANGRD), the National Centers for Environmental Prediction (NCEP) reanalysis, the Water and Global Change (WATCH) project, the thin plate spline smoothing algorithm (ANUSPLIN) and the Canadian Precipitation Analysis (CaPA). Based on verification criteria for various temporal and spatial scales, the results provide an assessment of possible applications for the various precipitation datasets. For long-term climate variation studies (~100 years), CANGRD, NCEP, WATCH and ANUSPLIN have different comparative advantages in terms of their resolution and accuracy. For synoptic and mesoscale precipitation patterns, CaPA provides appealing spatial coherence. In addition to the product comparison, various downscaling methods are also surveyed to explore new verification and bias-reduction methods for improving gridded precipitation outputs for large-scale models.
Semantic Image Segmentation with Contextual Hierarchical Models.
Seyedhosseini, Mojtaba; Tasdizen, Tolga
2016-05-01
Semantic segmentation is the problem of assigning an object label to each pixel. It unifies the image segmentation and object recognition problems. The importance of using contextual information in semantic segmentation frameworks has been widely realized in the field. We propose a contextual framework, called the contextual hierarchical model (CHM), which learns contextual information in a hierarchical framework for semantic segmentation. At each level of the hierarchy, a classifier is trained based on downsampled input images and outputs of previous levels. Our model then incorporates the resulting multi-resolution contextual information into a classifier to segment the input image at the original resolution. This training strategy allows for optimization of a joint posterior probability at multiple resolutions through the hierarchy. The contextual hierarchical model is purely based on the input image patches and does not make use of any fragments or shape examples. Hence, it is applicable to a variety of problems such as object segmentation and edge detection. We demonstrate that CHM performs on par with the state-of-the-art on the Stanford background and Weizmann horse datasets. It also outperforms state-of-the-art edge detection methods on the NYU depth dataset and achieves state-of-the-art results on the Berkeley segmentation dataset (BSDS 500).
NASA Astrophysics Data System (ADS)
Newman, A. J.; Clark, M. P.; Nijssen, B.; Wood, A.; Gutmann, E. D.; Mizukami, N.; Longman, R. J.; Giambelluca, T. W.; Cherry, J.; Nowak, K.; Arnold, J.; Prein, A. F.
2016-12-01
Gridded precipitation and temperature products are inherently uncertain due to myriad factors. These include interpolation from a sparse observation network, measurement representativeness, and measurement errors. Despite this inherent uncertainty, uncertainty estimates are typically not included, or are added to a specific dataset without much general applicability across different datasets. A lack of quantitative uncertainty estimates for hydrometeorological forcing fields limits their utility to support land surface and hydrologic modeling techniques such as data assimilation, probabilistic forecasting and verification. To address this gap, we have developed a first-of-its-kind gridded, observation-based ensemble of precipitation and temperature at a daily increment for the period 1980-2012 over the United States (including Alaska and Hawaii). A longer, higher-resolution version (1970-present, 1/16th degree) has also been implemented to support real-time hydrologic monitoring and prediction in several regional US domains. We will present the development and evaluation of the dataset, along with initial applications of the dataset for ensemble data assimilation and probabilistic evaluation of high-resolution regional climate model simulations. We will also present results on the new high-resolution products for Alaska and Hawaii (2 km and 250 m respectively), completing the first ensemble observation-based product suite for the entire 50 states. Finally, we will present plans to improve the ensemble dataset, focusing on efforts to improve the methods used for station interpolation and ensemble generation, as well as methods to fuse station data with numerical weather prediction model output.
NASA Astrophysics Data System (ADS)
Van Den Broeke, Matthew S.; Kalin, Andrew; Alavez, Jose Abraham Torres; Oglesby, Robert; Hu, Qi
2017-11-01
In climate modeling studies, there is a need to choose a suitable land surface model (LSM) while adhering to available resources. In this study, the viability of three LSM options (Community Land Model version 4.0 [CLM4.0], Noah-MP, and the five-layer thermal diffusion [Bucket] scheme) in the Weather Research and Forecasting model version 3.6 (WRF3.6) was examined for the warm season in a domain centered on the central USA. Model output was compared to Parameter-elevation Relationships on Independent Slopes Model (PRISM) data, a gridded observational dataset including mean monthly temperature and total monthly precipitation. Model output temperature, precipitation, latent heat (LH) flux, sensible heat (SH) flux, and soil water content (SWC) were also compared to observations from sites in the Central and Southern Great Plains region. An overall warm bias was found in CLM4.0 and Noah-MP, with a cool bias of larger magnitude in the Bucket model. The three LSMs produced similar patterns of wet and dry biases. Modelled SWC and LH/SH fluxes did not show a consistent bias relative to observations. Both sophisticated LSMs appear to be viable options for simulating the effects of land use change in the central USA.
Independent validation of Swarm Level 2 magnetic field products and `Quick Look' for Level 1b data
NASA Astrophysics Data System (ADS)
Beggan, Ciarán D.; Macmillan, Susan; Hamilton, Brian; Thomson, Alan W. P.
2013-11-01
Magnetic field models are produced on behalf of the European Space Agency (ESA) by an independent scientific consortium known as the Swarm Satellite Constellation Application and Research Facility (SCARF), through the Level 2 Processor (L2PS). The consortium primarily produces magnetic field models for the core, lithosphere, ionosphere and magnetosphere. Typically, for each magnetic product, two magnetic field models are produced in separate chains using complementary data selection and processing techniques. Hence, the magnetic field models from the complementary processing chains will be similar but not identical. The final step in the overall L2PS therefore involves inspection and validation of the magnetic field models against each other and against data from (semi-) independent sources (e.g. ground observatories). We describe the validation steps for each magnetic field product and the comparison against independent datasets, and we show examples of the output of the validation. In addition, the L2PS also produces a daily set of `Quick Look' output graphics and statistics to monitor the overall quality of Level 1b data issued by ESA. We describe the outputs of the `Quick Look' chain.
Providing Geographic Datasets as Linked Data in Sdi
NASA Astrophysics Data System (ADS)
Hietanen, E.; Lehto, L.; Latvala, P.
2016-06-01
In this study, a prototype service to provide data from a Web Feature Service (WFS) as linked data is implemented. First, persistent and unique Uniform Resource Identifiers (URIs) are created for all spatial objects in the dataset. The objects are available from those URIs in the Resource Description Framework (RDF) data format. Next, a Web Ontology Language (OWL) ontology is created to describe the dataset information content using the Open Geospatial Consortium's (OGC) GeoSPARQL vocabulary. The existing data model is modified in order to take into account the linked data principles. The implemented service produces an HTTP response dynamically. The data for the response is first fetched from the existing WFS. Then the Geography Markup Language (GML) output of the WFS is transformed on-the-fly to the RDF format. Content negotiation is used to serve the data in different RDF serialization formats. This solution facilitates the use of a dataset in different applications without replicating the whole dataset. In addition, individual spatial objects in the dataset can be referred to with URIs. Furthermore, the needed information content of the objects can be easily extracted from the RDF serializations available from those URIs. A solution for linking data objects to the dataset URI is also introduced by using the Vocabulary of Interlinked Datasets (VoID). The dataset is divided into subsets and each subset is given its own persistent and unique URI. This enables the whole dataset to be explored with a web browser and all individual objects to be indexed by search engines.
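A minimal sketch (not the prototype's code) of the feature-to-RDF step and of choosing a serialization format during content negotiation, built with the rdflib library; the URIs and the particular GeoSPARQL statements are illustrative assumptions.

```python
# Sketch: publish one spatial object as RDF under a persistent URI, with the
# serialization format chosen to match an HTTP Accept header.
from rdflib import Graph, Literal, Namespace, RDF, URIRef

GEO = Namespace("http://www.opengis.net/ont/geosparql#")
base = "http://example.org/dataset/feature/"          # hypothetical persistent URI base

g = Graph()
feature = URIRef(base + "12345")
geometry = URIRef(base + "12345/geometry")
g.add((feature, RDF.type, GEO.Feature))
g.add((feature, GEO.hasGeometry, geometry))
g.add((geometry, RDF.type, GEO.Geometry))
g.add((geometry, GEO.asWKT, Literal("POINT(24.94 60.17)", datatype=GEO.wktLiteral)))

# Content negotiation: map Accept headers to rdflib serialization formats.
accept_to_format = {
    "text/turtle": "turtle",
    "application/rdf+xml": "xml",
    "application/ld+json": "json-ld",
}
accept = "text/turtle"                                 # would come from the HTTP request
print(g.serialize(format=accept_to_format[accept]))
```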
NASA Astrophysics Data System (ADS)
Shukla, Shraddhanand; Funk, Chris; Peterson, Pete; McNally, Amy; Dinku, Tufa; Barbosa, Humberto; Paredes-Trejo, Franklin; Pedreros, Diego; Husak, Greg
2017-04-01
A high-quality, long-term, high-resolution precipitation dataset is key for supporting drought-related risk management and food security early warning. Here, we present the Climate Hazards group InfraRed Precipitation with Stations (CHIRPS) v2.0, developed by scientists at the University of California, Santa Barbara and the U.S. Geological Survey Earth Resources Observation and Science Center under the direction of the Famine Early Warning Systems Network (FEWS NET). CHIRPS is a quasi-global precipitation product and is made available at daily to seasonal time scales with a spatial resolution of 0.05° and a 1981 to near real-time period of record. We begin by describing the three main components of CHIRPS (a high-resolution climatology, time-varying cold cloud duration precipitation estimates, and in situ precipitation estimates) and how they are combined. We then present a validation of this dataset and describe how CHIRPS is being disseminated and used in different applications, such as large-scale hydrologic models and crop water balance models. Validation of CHIRPS has focused on comparisons with precipitation products with global coverage, long periods of record and near real-time availability, such as the CPC-Unified, CFS Reanalysis and ECMWF datasets, and with datasets such as GPCC and GPCP that incorporate high-quality in situ datasets from places such as Uganda, Colombia, and the Sahel. CHIRPS is shown to have low systematic errors (bias) and low mean absolute errors. We find that CHIRPS performance is quite similar to research-quality products like GPCC and GPCP, but with higher resolution and lower latency. We also present results from independent validation studies focused on South America and East Africa. CHIRPS is currently being used to drive the FEWS NET Land Data Assimilation System (FLDAS), which incorporates multiple hydrologic models, and the Water Requirement Satisfaction Index (WRSI), a widely used crop water balance model. The outputs (such as soil moisture and runoff) from these models are being used for real-time drought monitoring in Africa. With support from USAID FEWS NET, CHG/USGS has developed a two-way strategy to disseminate CHIRPS and related products (e.g. FLDAS, WRSI) and to incorporate contributed station data. For example, we are currently working with partners in Mexico (Conagua), Southern Africa (SASSCAL), Colombia (IDEAM), Nigeria (Kukua), Somalia (SWALIM) and Ethiopia (NMA). These institutions provide in situ observations which enhance CHIRPS, and CHIRPS provides feedback on data quality. CHIRPS is then placed in a web-accessible geospatial database. Partners in these countries can then access CHIRPS and other outputs and display this information using web-based mapping tools. This is a win-win collaboration, leading to improved globally accessible precipitation estimates and improved climate services in developing nations.
A three-dimensional multivariate representation of atmospheric variability
NASA Astrophysics Data System (ADS)
Žagar, Nedjeljka; Jelić, Damjan; Blaauw, Marten; Jesenko, Blaž
2016-04-01
The recently developed MODES software has been applied to the ECMWF analyses and forecasts and to several reanalysis datasets to describe the global variability of the balanced and inertio-gravity (IG) circulation across many scales, considering both the mass and wind fields and the whole model depth. In particular, the IG spectrum, which has only recently become observable in global datasets, can be studied simultaneously in the mass and wind fields. MODES is open-access software that performs the normal-mode function decomposition of 3D global datasets. Its application to the ERA Interim dataset reveals several aspects of the large-scale circulation after it has been partitioned into the linearly balanced and IG components. The global energy distribution is dominated by the balanced energy, while the IG modes contribute around 8% of the total wave energy. However, on subsynoptic scales IG energy dominates, and it is associated with the main features of tropical variability on all scales. The presented energy distribution and features of the zonally-averaged and equatorial circulation provide a reference for the intercomparison of several reanalysis datasets and for the validation of climate models. Features of the global IG circulation are compared in the ERA Interim, MERRA and JRA reanalysis datasets and in several CMIP5 models. Since October 2014 the operational medium-range forecasts of the European Centre for Medium-Range Weather Forecasts (ECMWF) have been analyzed by MODES daily, and an online archive of all the outputs is available at http://meteo.fmf.uni-lj.si/MODES. New outputs are made available daily, based on the 00 UTC run and subsequent 12-hourly forecasts up to the 240-hour forecast. In addition to the energy spectra and horizontal circulation on selected levels for the balanced and IG components, the equatorial Kelvin waves are presented in time and space as the most energetic tropical IG modes, propagating vertically and along the equator from their main generation regions in the upper troposphere over the Indian and Pacific region. The validation of the 10-day ECMWF forecasts against analyses in the modal space suggests a lack of variability in the tropics in the medium range. References: Žagar, N. et al., 2015: Normal-mode function representation of global 3-D data sets: open-access software for the atmospheric research community. Geosci. Model Dev., 8, 1169-1195, doi:10.5194/gmd-8-1169-2015. Žagar, N., R. Buizza, and J. Tribbia, 2015: A three-dimensional multivariate modal analysis of atmospheric predictability with application to the ECMWF ensemble. J. Atmos. Sci., 72, 4423-4444. The MODES software is available from http://meteo.fmf.uni-lj.si/MODES.
Affective State Level Recognition in Naturalistic Facial and Vocal Expressions.
Meng, Hongying; Bianchi-Berthouze, Nadia
2014-03-01
Naturalistic affective expressions change at a rate much slower than the typical rate at which video or audio is recorded. This increases the probability that consecutive recorded instants of expressions represent the same affective content. In this paper, we exploit such a relationship to improve the recognition performance of continuous naturalistic affective expressions. Using datasets of naturalistic affective expressions (AVEC 2011 audio and video dataset, PAINFUL video dataset) continuously labeled over time and over different dimensions, we analyze the transitions between levels of those dimensions (e.g., transitions in pain intensity level). We use an information theory approach to show that the transitions occur very slowly and hence suggest modeling them as first-order Markov models. The dimension levels are considered to be the hidden states in the Hidden Markov Model (HMM) framework. Their discrete transition and emission matrices are trained by using the labels provided with the training set. The recognition problem is converted into a best path-finding problem to obtain the best hidden states sequence in HMMs. This is a key difference from previous use of HMMs as classifiers. Modeling of the transitions between dimension levels is integrated in a multistage approach, where the first level performs a mapping between the affective expression features and a soft decision value (e.g., an affective dimension level), and further classification stages are modeled as HMMs that refine that mapping by taking into account the temporal relationships between the output decision labels. The experimental results for each of the unimodal datasets show overall performance to be significantly above that of a standard classification system that does not take into account temporal relationships. In particular, the results on the AVEC 2011 audio dataset outperform all other systems presented at the international competition.
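A rough sketch of the decoding idea described above (affect-dimension levels as hidden states, with discrete transition and emission matrices learned from labels, and the best hidden-state path found by Viterbi decoding); the matrices below are toy values, not those learned from the AVEC or PAINFUL data.

```python
# Sketch: Viterbi decoding of a sequence of discretized soft-decision outputs
# into affect-dimension levels, given trained transition/emission matrices.
import numpy as np

def viterbi(obs, start_p, trans_p, emit_p):
    """Return the most likely hidden-state path for an observation sequence."""
    n_states = trans_p.shape[0]
    T = len(obs)
    logp = np.full((T, n_states), -np.inf)
    back = np.zeros((T, n_states), dtype=int)
    logp[0] = np.log(start_p) + np.log(emit_p[:, obs[0]])
    for t in range(1, T):
        for s in range(n_states):
            scores = logp[t - 1] + np.log(trans_p[:, s]) + np.log(emit_p[s, obs[t]])
            back[t, s] = np.argmax(scores)
            logp[t, s] = np.max(scores)
    path = [int(np.argmax(logp[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Toy 3-level example: slow transitions (heavy diagonal), noisy observations.
start = np.array([0.34, 0.33, 0.33])
trans = np.array([[0.90, 0.08, 0.02], [0.05, 0.90, 0.05], [0.02, 0.08, 0.90]])
emit = np.array([[0.7, 0.2, 0.1], [0.2, 0.6, 0.2], [0.1, 0.2, 0.7]])
observed = [0, 0, 1, 1, 1, 2, 2, 1, 2, 2]
print(viterbi(observed, start, trans, emit))
```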
A downscaling method for the assessment of local climate change
NASA Astrophysics Data System (ADS)
Bruno, E.; Portoghese, I.; Vurro, M.
2009-04-01
The use of complementary models is necessary to study the impact of climate change scenarios on the hydrological response at different space-time scales. However, the structure of GCMs is such that their spatial resolution (hundreds of kilometres) is too coarse to describe the variability of extreme events at the basin scale (Burlando and Rosso, 2002). Bridging the space-time gap between climate scenarios and the usual scale of the inputs for hydrological prediction models is a fundamental requisite for the evaluation of climate change impacts on water resources. Since models operate a simplification of a complex reality, their results cannot be expected to fit the climate observations. Identifying local climate scenarios for impact analysis implies the definition of more detailed local scenarios by downscaling GCM or RCM results. Among the output correction methods we consider the statistical approach of Déqué (2007), reported as a 'variable correction method', in which the correction of model outputs is obtained by a function built from the observation dataset that operates a quantile-quantile transformation (Q-Q transform). However, in the case of daily precipitation fields the Q-Q transform is not able to correct the temporal properties of the model output concerning the dry-wet lacunarity process. An alternative correction method is proposed, based on a stochastic description of the arrival-duration-intensity processes in coherence with the Poissonian Rectangular Pulse scheme (PRP) (Eagleson, 1972). In this proposed approach, the Q-Q transform is applied to the PRP variables derived from the daily rainfall datasets. Consequently, the corrected PRP parameters are used for the synthetic generation of statistically homogeneous rainfall time series that mimic the persistency of daily observations for the reference period. Then the PRP parameters are forced through the GCM scenarios to generate local-scale rainfall records for the 21st century. The statistical parameters characterizing daily storm occurrence, storm intensity and duration needed to apply the PRP scheme are taken from the STARDEX collection of extreme indices.
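A minimal sketch of the generic quantile-quantile correction mentioned above (the 'variable correction method' of Déqué, 2007), applied here to a synthetic daily series rather than to the PRP variables of the proposed approach.

```python
# Sketch: empirical quantile-quantile (Q-Q) correction of a model variable
# against an observed reference for a common baseline period.
import numpy as np

def qq_correct(model_baseline, obs_baseline, model_future):
    """Map each future model value through the baseline empirical quantiles."""
    quantiles = np.linspace(0.0, 1.0, 101)
    model_q = np.quantile(model_baseline, quantiles)
    obs_q = np.quantile(obs_baseline, quantiles)
    # For each future value: find its position in the model baseline distribution,
    # then return the observed value at that same quantile.
    return np.interp(model_future, model_q, obs_q)

rng = np.random.default_rng(1)
obs = rng.gamma(shape=2.0, scale=3.0, size=3650)         # observed daily amounts
model_hist = rng.gamma(shape=2.0, scale=4.0, size=3650)  # biased model baseline
model_scen = rng.gamma(shape=2.0, scale=4.5, size=3650)  # scenario output to correct
corrected = qq_correct(model_hist, obs, model_scen)
print("raw mean:", model_scen.mean(), "corrected mean:", corrected.mean())
```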
FastQuery: A Parallel Indexing System for Scientific Data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chou, Jerry; Wu, Kesheng; Prabhat,
2011-07-29
Modern scientific datasets present numerous data management and analysis challenges. State-of-the-art index and query technologies such as FastBit can significantly improve accesses to these datasets by augmenting the user data with indexes and other secondary information. However, a challenge is that the indexes assume the relational data model but the scientific data generally follows the array data model. To match the two data models, we design a generic mapping mechanism and implement an efficient input and output interface for reading and writing the data and their corresponding indexes. To take advantage of the emerging many-core architectures, we also develop a parallel strategy for indexing using threading technology. This approach complements our on-going MPI-based parallelization efforts. We demonstrate the flexibility of our software by applying it to two of the most commonly used scientific data formats, HDF5 and NetCDF. We present two case studies using data from a particle accelerator model and a global climate model. We also conducted a detailed performance study using these scientific datasets. The results show that FastQuery speeds up the query time by a factor of 2.5x to 50x, and it reduces the indexing time by a factor of 16 on 24 cores.
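A toy sketch of the core mapping idea (treating each array variable as a column of a flat table so that a value-based index can return array coordinates); it uses h5py and a simple sorted index as a stand-in, not FastBit's bitmap indexes or the FastQuery interface itself.

```python
# Sketch: map an HDF5 array variable to a flat "column", index it by value,
# and answer a range query with the original array coordinates.
import h5py
import numpy as np

# Create a small example file (stand-in for simulation output).
with h5py.File("example.h5", "w") as f:
    f.create_dataset("energy", data=np.random.default_rng(0).normal(size=(100, 200)))

with h5py.File("example.h5", "r") as f:
    energy = f["energy"][...]

flat = energy.ravel()                      # array model -> one relational-style column
order = np.argsort(flat)                   # auxiliary index stored alongside the data

# Range query: 1.5 <= energy <= 2.0, answered from the sorted index.
lo, hi = np.searchsorted(flat[order], [1.5, 2.0])
hits = order[lo:hi]                        # row ids in the flat table
rows, cols = np.unravel_index(hits, energy.shape)  # back to array coordinates
print(f"{hits.size} cells match; first coordinates:", list(zip(rows[:3], cols[:3])))
```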
Keshavarz, M; Mojra, A
2015-05-01
Geometrical features of a cancerous tumor embedded in biological soft tissue, including tumor size and depth, are a necessity in the follow-up procedure and in making suitable therapeutic decisions. In this paper, a new socio-politically motivated global search strategy called the imperialist competitive algorithm (ICA) is implemented to train a feed-forward neural network (FFNN) to estimate the tumor's geometrical characteristics (FFNNICA). First, a viscoelastic model of liver tissue is constructed by using a series of in vitro uniaxial and relaxation test data. Then, 163 samples of the tissue including a tumor with different depths and diameters are generated by using PYTHON programming to link ABAQUS and MATLAB together. Next, the samples are divided into 123 samples as the training dataset and 40 samples as the testing dataset. Training inputs of the network are mechanical parameters extracted from palpation of the tissue through a developing noninvasive technology called artificial tactile sensing (ATS). Last, to evaluate the FFNNICA performance, outputs of the network, including tumor depth and diameter, are compared with the desired values for both the training and testing datasets. Deviations of the outputs from the desired values are calculated by a regression analysis. Statistical analysis is also performed by measuring the Root Mean Square Error (RMSE) and Efficiency (E). RMSE in diameter and depth estimations are 0.50 mm and 1.49, respectively, for the testing dataset. Results affirm that the proposed optimization algorithm for training the neural network can be useful to characterize soft tissue tumors accurately by employing an artificial palpation approach. Copyright © 2015 John Wiley & Sons, Ltd.
NASA Astrophysics Data System (ADS)
Whitehall, K. D.; Jenkins, G. S.; Mattmann, C. A.; Waliser, D. E.; Kim, J.; Goodale, C. E.; Hart, A. F.; Ramirez, P.; Whittell, J.; Zimdars, P. A.
2012-12-01
Mesoscale convective complexes (MCCs) are large (2-3 x 10^5 km^2) nocturnal convectively-driven weather systems that are generally associated with high precipitation in short durations (less than 12 hours) in various locations throughout the tropics and midlatitudes (Maddox 1980). These systems are particularly important for climate in the West Sahel region, where the precipitation associated with them is a principal component of the rainfall season (Laing and Fritsch 1993). These systems occur on weather timescales and are historically identified from weather data analysis via manual and, more recently, automated processes (Miller and Fritsch 1991, Nesbett 2006, Blamey and Reason 2012). The Regional Climate Model Evaluation System (RCMES) is an open-source tool designed for easy evaluation of climate and Earth system data through access to standardized datasets and intrinsic tools that perform common analysis and visualization tasks (Hart et al. 2011). The RCMES toolkit also provides the flexibility of user-defined subroutines for further metrics, visualization and even dataset manipulation. The purpose of this study is to present a methodology for identifying MCCs in observation datasets using the RCMES framework. TRMM 3-hourly datasets will be used to demonstrate the methodology for the 2005 boreal summer. This method promotes the use of open-source software for scientific data systems, a concern to multiple stakeholders in the earth sciences. A historical MCC dataset provides a platform for further studies of the variability of MCC frequency on various timescales, which is important to many users, including climate scientists, meteorologists, water resource managers, and agriculturalists. The methodology of using RCMES for searching and clipping datasets will engender a new realm of studies, as users of the system will no longer be restricted to using only the datasets residing in their own local systems; instead they will be afforded rapid, effective, and transparent access, processing and visualization of the wealth of remote sensing datasets and climate model outputs available.
Simulating Freshwater Availability under Future Climate Conditions
NASA Astrophysics Data System (ADS)
Zhao, F.; Zeng, N.; Motesharrei, S.; Gustafson, K. C.; Rivas, J.; Miralles-Wilhelm, F.; Kalnay, E.
2013-12-01
Freshwater availability is a key factor for regional development. Precipitation, evaporation, river inflow and outflow are the major terms in the estimate of regional water supply. In this study, we aim to obtain a realistic estimate of these variables from 1901 to 2100. First we calculated the ensemble mean precipitation using the 2011-2100 RCP4.5 output (re-sampled to half-degree spatial resolution) from 16 General Circulation Models (GCMs) participating in the Coupled Model Intercomparison Project Phase 5 (CMIP5). The projections are then combined with the half-degree 1901-2010 Climatic Research Unit (CRU) TS3.2 dataset after bias correction. We then used the combined data to drive our UMD Earth System Model (ESM) in order to generate evaporation and runoff. We also developed a river-routing scheme, based on the approach of Taikan Oki, as part of the ESM. It is capable of calculating river inflow and outflow for any region, driven by the gridded runoff output. River direction and slope information from the Global Dominant River Tracing (DRT) dataset are included in our scheme. The effects of reservoirs/dams are parameterized based on a few simple factors such as soil moisture, population density and geographic region. Simulated river flow is validated against river gauge measurements for the world's major rivers. We have applied our river flow calculation to two data-rich watersheds in the United States: the Phoenix AMA watershed and the Potomac River Basin. The results are used in our SImple WAter model (SIWA) to explore water management options.
Interoperability challenges in river discharge modelling: A cross domain application scenario
NASA Astrophysics Data System (ADS)
Santoro, Mattia; Andres, Volker; Jirka, Simon; Koike, Toshio; Looser, Ulrich; Nativi, Stefano; Pappenberger, Florian; Schlummer, Manuela; Strauch, Adrian; Utech, Michael; Zsoter, Ervin
2018-06-01
River discharge is a critical water cycle variable, as it integrates all the processes (e.g. runoff and evapotranspiration) occurring within a river basin and provides a hydrological output variable that can be readily measured. Its prediction is of invaluable help for many water-related tasks including water resources assessment and management, flood protection, and disaster mitigation. Observations of river discharge are important to calibrate and validate hydrological or coupled land, atmosphere and ocean models. This requires using datasets from different scientific domains (Water, Weather, etc.). Typically, such datasets are provided using different technological solutions. This complicates the integration of new hydrological data sources into application systems. Therefore, a considerable effort is often spent on data access issues instead of the actual scientific question. This paper describes the work performed to address multidisciplinary interoperability challenges related to river discharge modeling and validation. This includes definition and standardization of domain specific interoperability standards for hydrological data sharing and their support in global frameworks such as the Global Earth Observation System of Systems (GEOSS). The research was developed in the context of the EU FP7-funded project GEOWOW (GEOSS Interoperability for Weather, Ocean and Water), which implemented a "River Discharge" application scenario. This scenario demonstrates the combination of river discharge observations data from the Global Runoff Data Centre (GRDC) database and model outputs produced by the European Centre for Medium-Range Weather Forecasts (ECMWF) predicting river discharge based on weather forecast information in the context of the GEOSS.
Cerveau, Nicolas; Jackson, Daniel J
2016-12-09
Next-generation sequencing (NGS) technologies are arguably the most revolutionary technical development to join the list of tools available to molecular biologists since PCR. For researchers working with nonconventional model organisms, one major problem with the currently dominant NGS platform (Illumina) stems from the obligatory fragmentation of nucleic acid material that occurs prior to sequencing during library preparation. This step creates a significant bioinformatic challenge for accurate de novo assembly of novel transcriptome data. This challenge becomes apparent when a variety of modern assembly tools (of which there is no shortage) are applied to the same raw NGS dataset. With the same assembly parameters these tools can generate markedly different assembly outputs. In this study we present an approach that generates an optimized consensus de novo assembly of eukaryotic coding transcriptomes. This approach does not represent a new assembler; rather, it combines the outputs of a variety of established assembly packages and removes redundancy via a series of clustering steps. We test and validate our approach using Illumina datasets from six phylogenetically diverse eukaryotes (three metazoans, two plants and a yeast) and two simulated datasets derived from metazoan reference genome annotations. All of these datasets were assembled using three currently popular assembly packages (CLC, Trinity and IDBA-tran). In addition, we experimentally demonstrate that transcripts unique to one particular assembly package are likely to be bioinformatic artefacts. For all eight datasets our pipeline generates more concise transcriptomes that in fact possess more unique annotatable protein domains than any of the three individual assemblers we employed. Another measure of assembly completeness (using the purpose-built BUSCO databases) also confirmed that our approach yields more information. Our approach yields coding transcriptome assemblies that are more likely to be closer to biological reality than those of any of the three individual assembly packages we investigated. This approach (freely available as a simple perl script) will be of use to researchers working with species for which there is little or no reference data against which the assembly of a transcriptome can be performed.
Design of FastQuery: How to Generalize Indexing and Querying System for Scientific Data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wu, Jerry; Wu, Kesheng
2011-04-18
Modern scientific datasets present numerous data management and analysis challenges. State-of-the-art index and query technologies such as FastBit are critical for facilitating interactive exploration of large datasets. These technologies rely on adding auxiliary information to existing datasets to accelerate query processing. To use these indices, we need to match the relational data model used by the indexing systems with the array data model used by most scientific data, and to provide an efficient input and output layer for reading and writing the indices. In this work, we present a flexible design that can be easily applied to most scientific data formats. We demonstrate this flexibility by applying it to two of the most commonly used scientific data formats, HDF5 and NetCDF. We present two case studies using simulation data from the particle accelerator and climate simulation communities. To demonstrate the effectiveness of the new design, we also present a detailed performance study using both synthetic and real scientific workloads.
NASA Astrophysics Data System (ADS)
van Eck, C. M.; Morfopoulos, C.; Betts, R. A.; Chang, J.; Ciais, P.; Friedlingstein, P.; Regnier, P. A. G.
2016-12-01
The frequency and severity of extreme climate events such as droughts, extreme precipitation and heatwaves are expected to increase in our changing climate. These extreme climate events will affect vegetation through either enhanced or reduced productivity. This, in turn, can have a substantial impact on the terrestrial carbon sink and thus on the global carbon cycle. Connecting observational datasets with modelling studies provides new insights into these climate-vegetation interactions. This study aims to compare extremes in vegetation productivity as derived from observations with those of Dynamic Global Vegetation Models (DGVMs). In this case GIMMS-NDVI 3g is selected as the observational dataset, and JULES (Joint UK Land Environment Simulator) and ORCHIDEE (Organising Carbon and Hydrology In Dynamic Ecosystems) as the DGVMs. Both models are forced with the PGFv2 Global Meteorological Forcing Dataset according to the ISI-MIP2 protocol for historical runs. Extremes in vegetation productivity are the focal point; they are identified as NDVI anomalies below the 10th percentile or above the 90th percentile during the growing season, referred to as browning or greening events respectively. The monthly GIMMS-NDVI 3g dataset is used to obtain the location in time and space of the vegetation extremes. The global GIMMS-NDVI 3g dataset has been subdivided into the IPCC SREX regions, for which the NDVI anomalies are calculated and the extreme thresholds are determined. With this information we can identify the location in time and space of the browning and greening events in remotely-sensed vegetation productivity. The same procedure is applied to the modelled Gross Primary Productivity (GPP), allowing a comparison between the spatial and temporal occurrence of the browning and greening events in the observational dataset and the models' output. The capacity of the models to capture observed extremes in vegetation productivity is assessed and compared. Factors contributing to observed and modelled vegetation browning/greening extremes are analysed. The results of this study provide a stepping stone towards modelling future extremes in vegetation productivity.
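A minimal sketch of the extreme-detection step described above (growing-season anomalies flagged below the 10th or above the 90th percentile), applied to synthetic monthly NDVI rather than to GIMMS-NDVI 3g; the growing-season months are illustrative.

```python
# Sketch: flag browning/greening events as anomalies outside the 10th/90th percentiles.
import numpy as np

rng = np.random.default_rng(2)
years, months = 30, 12
seasonal_cycle = 0.5 + 0.2 * np.sin(2 * np.pi * np.arange(months) / 12)
ndvi = seasonal_cycle + 0.05 * rng.normal(size=(years, months))

climatology = ndvi.mean(axis=0)                 # mean seasonal cycle
anomaly = ndvi - climatology                    # monthly anomalies

growing_season = slice(4, 9)                    # e.g. May-September (illustrative choice)
gs_anom = anomaly[:, growing_season].ravel()
low, high = np.percentile(gs_anom, [10, 90])

browning = gs_anom < low                        # extreme low productivity
greening = gs_anom > high                       # extreme high productivity
print("browning events:", browning.sum(), "greening events:", greening.sum())
```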
NASA Astrophysics Data System (ADS)
Bellos, V.; Mahmoodian, M.; Leopold, U.; Torres-Matallana, J. A.; Schutz, G.; Clemens, F.
2017-12-01
Surrogate models help to decrease the run-time of computationally expensive, detailed models. Recent studies show that Gaussian Process Emulators (GPE) are a promising technique in the field of urban drainage modelling. This study focuses on developing a GPE-based surrogate model for later application in Real Time Control (RTC), using input and output time series of a complex simulator. The case study is an urban drainage catchment in Luxembourg. A detailed simulator, implemented in InfoWorks ICM, is used to generate 120 input-output ensembles, of which 100 are used for training the emulator and 20 for validation of the results. An ensemble of historical rainfall events with 2-hour duration and 10-minute time steps is used as the input data. Two example outputs are selected: wastewater volume and total COD concentration in a storage tank in the network. The results of the emulator are tested with unseen random rainfall events from the ensemble dataset. The emulator is approximately 1000 times faster than the original simulator for this small case study. Whereas the overall patterns of the simulator are matched by the emulator, in some cases the emulator deviates from the simulator. To quantify the accuracy of the emulator in comparison with the original simulator, the Nash-Sutcliffe efficiency (NSE) between the emulator and simulator is calculated for unseen rainfall scenarios. The range of NSE for tank volume is from 0.88 to 0.99 with a mean value of 0.95, whereas for COD it is from 0.71 to 0.99 with a mean value of 0.92. The emulator is able to predict the tank volume with higher accuracy because the relationship between rainfall intensity and tank volume is linear. For COD, which has non-linear behaviour, the predictions are less accurate and more uncertain, in particular when rainfall intensity increases. These predictions were improved by including a larger amount of training data for the higher rainfall intensities. It was observed that the accuracy of the emulator predictions depends on the design of the ensemble training dataset and the amount of data provided. Finally, more investigation is required to test the possibility of applying this type of fast emulator for model-based RTC applications in which a limited number of inputs and outputs are considered over a short prediction horizon.
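A minimal sketch of the emulation idea (a Gaussian Process regression trained on simulator input-output pairs and scored with Nash-Sutcliffe efficiency), using scikit-learn on synthetic data; the toy "simulator" stands in for InfoWorks ICM, which is not reproduced here.

```python
# Sketch: Gaussian Process emulator of a slow simulator, evaluated with NSE.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

def nse(sim, obs):
    """Nash-Sutcliffe efficiency between emulator output and simulator output."""
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - np.mean(obs)) ** 2)

def slow_simulator(x):
    """Stand-in for an expensive model: maps rainfall features to tank volume."""
    return 50.0 * x[:, 0] + 10.0 * np.sqrt(x[:, 1])

rng = np.random.default_rng(3)
X_train = rng.uniform(size=(100, 2))            # 100 training ensembles
X_test = rng.uniform(size=(20, 2))              # 20 validation ensembles
y_train = slow_simulator(X_train)

gpe = GaussianProcessRegressor(kernel=ConstantKernel() * RBF(), normalize_y=True)
gpe.fit(X_train, y_train)

y_pred, y_std = gpe.predict(X_test, return_std=True)   # mean prediction and uncertainty
print("NSE:", nse(y_pred, slow_simulator(X_test)))
```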
Developing Snow Model Forcing Data From WRF Model Output to Aid in Water Resource Forecasting
NASA Astrophysics Data System (ADS)
Havens, S.; Marks, D. G.; Watson, K. A.; Masarik, M.; Flores, A. N.; Kormos, P.; Hedrick, A. R.
2015-12-01
Traditional operational modeling tools used by water managers in the western US are challenged by more frequently occurring uncharacteristic stream flow patterns caused by climate change. Water managers are now turning to new models based on the physical processes within a watershed to address the increasing number of events that do not follow historical patterns. The USDA-ARS has provided near-real-time snow water equivalent (SWE) maps using iSnobal since WY2012 for the Boise River Basin in southwest Idaho and since WY2013 for the Tuolumne Basin in California that feeds the Hetch Hetchy reservoir. The goal of these projects is not only to provide current snowpack estimates but also to use the Weather Research and Forecasting (WRF) model to drive iSnobal in order to produce a forecasted stream flow when coupled to a hydrology model. The first step is to develop methods for creating snow model forcing data from WRF outputs. Using a 1 km WRF reanalysis dataset from WY2009 over the Boise River Basin, WRF model results such as surface air temperature, relative humidity, wind, precipitation, cloud cover, and incoming long-wave radiation must be downscaled for use in iSnobal. iSnobal results forced with WRF output are validated at point locations throughout the basin and compared with iSnobal results forced with traditional weather station data. The presentation will explore the differences in forcing data derived from WRF outputs and weather stations and how this affects the snowpack distribution.
Potential for using regional and global datasets for national scale ecosystem service modelling
NASA Astrophysics Data System (ADS)
Maxwell, Deborah; Jackson, Bethanna
2016-04-01
Ecosystem service models are increasingly being used by planners and policy makers to inform policy development and decisions about national-level resource management. Such models allow ecosystem services to be mapped and quantified, and subsequent changes to these services to be identified and monitored. In some cases, the impact of small-scale changes can be modelled at a national scale, providing more detailed information to decision makers about where to best focus investment and management interventions that could address these issues, while moving toward national goals and/or targets. National-scale modelling often uses national (or local) data (for example, soils, land cover and topographical information) as input. However, there are some places where fine-resolution and/or high-quality national datasets cannot be easily obtained, or do not even exist. In the absence of such detailed information, regional or global datasets could be used as input to such models. There are questions, however, about the usefulness of these coarser-resolution datasets and the extent to which inaccuracies in the data may degrade predictions of existing and potential ecosystem service provision and subsequent decision making. Using LUCI (the Land Utilisation and Capability Indicator) as an example predictive model, we examine how the reliability of predictions changes when national datasets of soil, land cover and topography are substituted with coarser-scale regional and global datasets. We specifically look at how LUCI's predictions of water services, such as flood risk, flood mitigation, erosion and water quality, change when national data inputs are replaced by regional and global datasets. Using the Conwy catchment, Wales, as a case study, the land cover products compared are the UK's Land Cover Map (2007), the European CORINE land cover map and the ESA global land cover map. Soils products include the National Soil Map of England and Wales (NatMap) and the European Soils Database. NEXTmap elevation data, which cover the UK and parts of continental Europe, are compared to the global AsterDEM and SRTM30 topographical products. While the regional and global datasets can be used to fill gaps in data requirements, their coarser resolution means that there is greater aggregation of information over larger areas. This loss of detail affects the reliability of model output, particularly where significant discrepancies between datasets exist. The implications of this loss of detail for spatial planning and decision making are discussed. Finally, in the context of broader development, the need for better nationally and globally available data, to allow LUCI and other ecosystem models to become more globally applicable, is highlighted.
Fourcade, Yoan; Engler, Jan O; Rödder, Dennis; Secondi, Jean
2014-01-01
MAXENT is now a common species distribution modeling (SDM) tool used by conservation practitioners for predicting the distribution of a species from a set of records and environmental predictors. However, datasets of species occurrence used to train the model are often biased in the geographical space because of unequal sampling effort across the study area. This bias may be a source of strong inaccuracy in the resulting model and could lead to incorrect predictions. Although a number of sampling bias correction methods have been proposed, there is no consensual guideline to account for it. We compared here the performance of five methods of bias correction on three datasets of species occurrence: one "virtual" derived from a land cover map, and two actual datasets for a turtle (Chrysemys picta) and a salamander (Plethodon cylindraceus). We subjected these datasets to four types of sampling biases corresponding to potential types of empirical biases. We applied five correction methods to the biased samples and compared the outputs of distribution models to unbiased datasets to assess the overall correction performance of each method. The results revealed that the ability of methods to correct the initial sampling bias varied greatly depending on bias type, bias intensity and species. However, the simple systematic sampling of records consistently ranked among the best performing across the range of conditions tested, whereas other methods performed more poorly in most cases. The strong effect of initial conditions on correction performance highlights the need for further research to develop a step-by-step guideline to account for sampling bias. However, this method seems to be the most efficient in correcting sampling bias and should be advised in most cases.
Use of Regional Climate Model Output for Hydrologic Simulations
NASA Astrophysics Data System (ADS)
Hay, L. E.; Clark, M. P.; Wilby, R. L.; Gutowski, W. J.; Leavesley, G. H.; Pan, Z.; Arritt, R. W.; Takle, E. S.
2001-12-01
Daily precipitation and maximum and minimum temperature time series from a Regional Climate Model (RegCM2) were used as input to a distributed hydrologic model for a rainfall-dominated basin (Alapaha River at Statenville, Georgia) and three snowmelt-dominated basins (Animas River at Durango, Colorado; East Fork of the Carson River near Gardnerville, Nevada; and Cle Elum River near Roslyn, Washington). For comparison purposes, spatially averaged daily datasets of precipitation and maximum and minimum temperature were developed from measured data. These datasets included precipitation and temperature data for all stations located within the area of the RegCM2 model output used for each basin, but excluded station data used to calibrate the hydrologic model. Both the RegCM2 output and the station data capture the gross aspects of the seasonal cycles of precipitation and temperature. However, in all four basins, the RegCM2- and station-based simulations of runoff show little skill on a daily basis (Nash-Sutcliffe (NS) values ranging from 0.05 to 0.37 for RegCM2 and from -0.08 to 0.65 for the station data). When the precipitation and temperature biases are corrected in the RegCM2 output and station datasets (Bias-RegCM2 and Bias-station, respectively), the accuracy of the daily runoff simulations improves dramatically for the snowmelt-dominated basins. In the rainfall-dominated basin, runoff simulations based on the Bias-RegCM2 output show no skill (NS value of 0.09), whereas Bias-station-based simulated runoff improves (NS value improved from -0.08 to 0.72). These results indicate that the resolution of the RegCM2 output is appropriate for basin-scale modeling, but the RegCM2 output does not contain the day-to-day variability needed for basin-scale modeling in rainfall-dominated basins. Future work is warranted to identify the causes of systematic biases in RegCM2 simulations, develop methods to remove the biases, and improve RegCM2 simulations of daily variability in local climate.
ClimateNet: A Machine Learning dataset for Climate Science Research
NASA Astrophysics Data System (ADS)
Prabhat, M.; Biard, J.; Ganguly, S.; Ames, S.; Kashinath, K.; Kim, S. K.; Kahou, S.; Maharaj, T.; Beckham, C.; O'Brien, T. A.; Wehner, M. F.; Williams, D. N.; Kunkel, K.; Collins, W. D.
2017-12-01
Deep Learning techniques have revolutionized commercial applications in computer vision, speech recognition and control systems. The key to all of these developments was the creation of a curated, labeled dataset, ImageNet, which enabled multiple research groups around the world to develop methods, benchmark performance and compete with each other. The success of Deep Learning can be largely attributed to the broad availability of this dataset. Our empirical investigations have revealed that Deep Learning is similarly poised to benefit the task of pattern detection in climate science. Unfortunately, labeled datasets, a key pre-requisite for training, are hard to find. Individual research groups are typically interested in specialized weather patterns, making it hard to unify and share datasets across groups and institutions. In this work, we propose ClimateNet: a labeled dataset that provides labeled instances of extreme weather patterns, as well as the associated raw fields in model and observational output. We develop a schema in NetCDF to enumerate weather pattern classes/types and to store bounding boxes and pixel masks. We are also working on a TensorFlow implementation to natively import such NetCDF datasets, and we provide a reference convolutional architecture for binary classification tasks. Our hope is that researchers in climate science, as well as in ML/DL, will be able to use (and extend) ClimateNet to make rapid progress in the application of Deep Learning to climate science research.
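A rough sketch of how labeled weather-pattern instances with class flags and bounding boxes might be stored in NetCDF using the netCDF4 library; the dimension and variable names are assumptions for illustration, not the actual ClimateNet schema.

```python
# Sketch: store class labels and bounding boxes for labeled weather patterns in NetCDF.
# Variable and dimension names here are illustrative, not the ClimateNet schema.
import numpy as np
from netCDF4 import Dataset

with Dataset("labels.nc", "w") as nc:
    nc.createDimension("instance", 2)
    nc.createDimension("corner", 4)           # (lat_min, lon_min, lat_max, lon_max)

    cls = nc.createVariable("pattern_class", "i4", ("instance",))
    box = nc.createVariable("bounding_box", "f4", ("instance", "corner"))
    cls.flag_values = np.array([0, 1], dtype="i4")
    cls.flag_meanings = "tropical_cyclone atmospheric_river"

    cls[:] = [0, 1]
    box[:] = [[10.0, 120.0, 25.0, 140.0],
              [30.0, -150.0, 50.0, -120.0]]

with Dataset("labels.nc") as nc:
    print(nc.variables["pattern_class"][:], nc.variables["bounding_box"][:])
```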
NASA Astrophysics Data System (ADS)
Xu, Z.; Rhoades, A.; Johansen, H.; Ullrich, P. A.; Collins, W. D.
2017-12-01
Dynamical downscaling is widely used to properly characterize the regional surface heterogeneities that shape the local hydroclimatology. However, the factors in dynamical downscaling, including the refinement of model horizontal resolution, the large-scale forcing datasets and the dynamical cores, have not been fully evaluated. Two cutting-edge global-to-regional downscaling methods are used to assess these factors, specifically the variable-resolution Community Earth System Model (VR-CESM) and the Weather Research and Forecasting (WRF) regional climate model, under different horizontal resolutions (28, 14, and 7 km). Two groups of WRF simulations are driven by either the NCEP reanalysis dataset (WRF_NCEP) or VR-CESM outputs (WRF_VRCESM) to evaluate the effects of the large-scale forcing datasets. The impacts of the dynamical core are assessed by comparing the VR-CESM simulations to the coupled WRF_VRCESM simulations with the same physical parameterizations and similar grid domains. The simulated hydroclimatology (i.e., total precipitation, snow cover, snow water equivalent and surface temperature) is compared with the reference datasets. The large-scale forcing datasets are critical to the WRF simulations for accurately simulating total precipitation, SWE and snow cover, but not surface temperature. Both the WRF and VR-CESM results highlight that no significant benefit is found in the simulated hydroclimatology by simply refining the horizontal resolution from 28 to 7 km. Simulated surface temperature is sensitive to the choice of dynamical core. WRF generally simulates higher temperatures than VR-CESM, alleviating the systematic cold bias of DJF temperatures over the California mountain region, but overestimating the JJA temperature in California's Central Valley.
Radar Reflectivity in Wingtip-Generated Wake Vortices
NASA Technical Reports Server (NTRS)
Marshall, Robert E.; Mudukutore, Ashok; Wissel, Vicki
1997-01-01
This report documents new predictive models of radar reflectivity, with meter-scale resolution, for aircraft wakes in clear air and fog. The models result from a radar design program to locate and quantify wake vortices from commercial aircraft in support of the NASA Aircraft Vortex Spacing System (AVOSS). The radar reflectivity model for clear air assumes: 1) turbulent eddies in the wake produce small discontinuities in the radar refractive index; and 2) these turbulent eddies are in the 'inertial subrange' of turbulence. From these assumptions, the maximum radar frequency for detecting a particular aircraft wake, as well as the refractive index structure constant and radar volume reflectivity in the wake, can be obtained from the NASA Terminal Area Simulation System (TASS) output. For fog conditions, an empirical relationship is used to calculate the radar reflectivity factor from TASS output of bulk liquid water. Currently, two such models exist: 1) Atlas, based on observations of liquid water and radar reflectivity factor in clouds; and 2) de Wolf, tailored to a specific measured dataset (1992 Vandenberg Air Force Base).
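For reference, a commonly cited relation (e.g. Ottersten, 1969) linking clear-air radar volume reflectivity to the refractive index structure constant in the inertial subrange is reproduced below; it is assumed here only as background to the clear-air modelling approach described above, not as the report's specific formulation.

```latex
% Clear-air (Bragg scatter) volume reflectivity from the refractive index structure constant
\[
  \eta = 0.38\, C_n^{2}\, \lambda^{-1/3},
\]
% where \eta is the radar volume reflectivity (m^{-1}), C_n^{2} is the refractive index
% structure constant (m^{-2/3}), and \lambda is the radar wavelength (m).
```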
PROC IRT: A SAS Procedure for Item Response Theory
Matlock Cole, Ki; Paek, Insu
2017-01-01
This article reviews the item response theory procedure (PROC IRT) in SAS/STAT 14.1 for conducting item response theory (IRT) analyses of dichotomous and polytomous datasets that are unidimensional or multidimensional. The review provides an overview of available features, including models, estimation procedures, interfacing, input, and output files. A small-scale simulation study evaluates the IRT model parameter recovery of the PROC IRT procedure. The IRT procedure in Statistical Analysis Software (SAS) may be useful for researchers who frequently utilize SAS for analyses, research, and teaching.
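As a minimal illustration of the kind of model PROC IRT fits for dichotomous items (a two-parameter logistic item response function), the probability of a correct response and a simulated response matrix can be computed as follows; this sketch is written in Python rather than SAS, and the item parameters are arbitrary examples.

```python
# Sketch: two-parameter logistic (2PL) item characteristic curve and simulated responses.
import numpy as np

def p_correct(theta, a, b):
    """2PL probability of a correct response for ability theta, discrimination a, difficulty b."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

rng = np.random.default_rng(4)
theta = rng.normal(size=500)                 # examinee abilities
a = np.array([1.2, 0.8, 1.5])                # item discriminations
b = np.array([-0.5, 0.0, 1.0])               # item difficulties

prob = p_correct(theta[:, None], a, b)       # 500 x 3 probability matrix
responses = rng.uniform(size=prob.shape) < prob   # simulated dichotomous dataset
print("observed proportions correct:", responses.mean(axis=0))
```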
Earth-Science Data Co-Locating Tool
NASA Technical Reports Server (NTRS)
Lee, Seungwon; Pan, Lei; Block, Gary L.
2012-01-01
This software is used to co-locate Earth-science satellite data and climate-model analysis outputs in space and time. This enables the direct comparison of any set of data with different spatial and temporal resolutions. It is written in three separate modules that are clearly separated by functionality and interface with one another, which enables rapid development of support for any new dataset. In this updated version of the tool, several new front ends were developed for new products. This software finds co-locatable data pairs for given sets of data products and creates new data products that share the same spatial and temporal coordinates. This facilitates the direct comparison between two heterogeneous datasets and the comprehensive and synergistic use of the datasets.
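The matching step such a tool performs can be approximated as a nearest-neighbour search in a combined space-time metric; the sketch below assumes plain latitude/longitude/time arrays and an arbitrary hours-to-degrees scaling, not the tool's actual modules or front ends.

```python
import numpy as np
from scipy.spatial import cKDTree

def colocate(obs_lat, obs_lon, obs_t, mod_lat, mod_lon, mod_t,
             hours_to_deg=0.5, max_dist=2.0):
    """Pair each observation with the nearest model sample in (lat, lon, scaled
    time) space; hours_to_deg converts time offsets into 'equivalent degrees'
    so a single distance covers both space and time."""
    mod_pts = np.column_stack([mod_lat, mod_lon, np.asarray(mod_t) * hours_to_deg])
    obs_pts = np.column_stack([obs_lat, obs_lon, np.asarray(obs_t) * hours_to_deg])
    dist, idx = cKDTree(mod_pts).query(obs_pts)
    keep = dist <= max_dist                  # discard pairs that are too far apart
    return np.flatnonzero(keep), idx[keep]

# Example: 5 synthetic observations matched against 200 model samples
rng = np.random.default_rng(0)
obs_i, mod_i = colocate(rng.uniform(-10, 10, 5), rng.uniform(100, 120, 5), rng.uniform(0, 24, 5),
                        rng.uniform(-10, 10, 200), rng.uniform(100, 120, 200), rng.uniform(0, 24, 200))
print(list(zip(obs_i, mod_i)))
```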
Climatological Impact of Atmospheric River Based on NARCCAP and DRI-RCM Datasets
NASA Astrophysics Data System (ADS)
Mejia, J. F.; Perryman, N. M.
2012-12-01
This study evaluates spatial responses of extreme precipitation environments, typically associated with Atmospheric River events, using Regional Climate Model (RCM) output from the NARCCAP dataset (50 km grid size) and the Desert Research Institute-RCM simulations (36 and 12 km grid size). For this study, a pattern-detection algorithm was developed to characterize Atmospheric River (AR)-like features in climate models. Topological analysis of the enhanced elongated moisture flux (500-300 hPa; daily means) cores is used to objectively characterize such AR features in two distinct groups: (i) zonal, north Pacific ARs, and (ii) subtropical ARs, also known as "Pineapple Express" events. We computed the climatological responses of the different RCMs to these two AR groups, from which intricate differences among RCMs stand out. This study presents these climatological responses from historical and scenario-driven simulations, as well as implications for precipitation extreme-value analyses.
Innovative use of self-organising maps (SOMs) in model validation.
NASA Astrophysics Data System (ADS)
Jolly, Ben; McDonald, Adrian; Coggins, Jack
2016-04-01
We present an innovative combination of techniques for validation of numerical weather prediction (NWP) output against both observations and reanalyses using two classification schemes, demonstrated by a validation of the operational NWP 'AMPS' (the Antarctic Mesoscale Prediction System). Historically, model validation techniques have centred on case studies or statistics at various time scales (yearly/seasonal/monthly). Within the past decade the latter technique has been expanded by the addition of classification schemes in place of time scales, allowing more precise analysis. Classifications are typically generated for either the model or the observations, then used to create composites for both which are compared. Our method creates and trains a single self-organising map (SOM) on both the model output and observations, which is then used to classify both datasets using the same class definitions. In addition to the standard statistics on class composites, we compare the classifications themselves between the model and the observations. To add further context to the area studied, we use the same techniques to compare the SOM classifications with regimes developed for another study to great effect. The AMPS validation study compares model output against surface observations from SNOWWEB and existing University of Wisconsin-Madison Antarctic Automatic Weather Stations (AWS) during two months over the austral summer of 2014-15. Twelve SOM classes were defined in a '4 x 3' pattern, trained on both model output and observations of 2 m wind components, then used to classify both training datasets. Simple statistics (correlation, bias and normalised root-mean-square-difference) computed for SOM class composites showed that AMPS performed well during extreme weather events, but less well during lighter winds and poorly during the more changeable conditions between either extreme. Comparison of the classification time-series showed that, while correlations were lower during lighter wind periods, AMPS actually forecast the existence of those periods well suggesting that the correlations may be unfairly low. Further investigation showed poor temporal alignment during more changeable conditions, highlighting problems AMPS has around the exact timing of events. There was also a tendency for AMPS to over-predict certain wind flow patterns at the expense of others. In order to gain a larger scale perspective, we compared our mesoscale SOM classification time-series with synoptic scale regimes developed by another study using ERA-Interim reanalysis output and k-means clustering. There was good alignment between the regimes and the observations classifications (observations/regimes), highlighting the effect of synoptic scale forcing on the area. However, comparing the alignment between observations/regimes and AMPS/regimes showed that AMPS may have problems accurately resolving the strength and location of cyclones in the Ross Sea to the north of the target area.
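A minimal sketch of the central idea, training one SOM on the pooled model and observation wind components so that both are classified with the same class definitions, is given below; it assumes the third-party minisom package and synthetic 2 m wind data rather than the AMPS and AWS records.

```python
import numpy as np
from minisom import MiniSom   # third-party package, assumed to be installed

rng = np.random.default_rng(1)
obs = rng.normal(0, 5, size=(500, 2))              # synthetic (u, v) observations
mod = obs + rng.normal(0, 1, size=obs.shape)       # synthetic model counterpart

# Train a single 4 x 3 SOM on the pooled data so both datasets share class definitions
som = MiniSom(4, 3, 2, sigma=1.0, learning_rate=0.5, random_seed=1)
som.train_random(np.vstack([obs, mod]), 5000)

# Classify every time step of both datasets with the same map
obs_class = [som.winner(x) for x in obs]
mod_class = [som.winner(x) for x in mod]

# Fraction of time steps where the model falls in the same class as the observations
agreement = np.mean([o == m for o, m in zip(obs_class, mod_class)])
print(f"class agreement: {agreement:.2f}")
```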
An application of hybrid downscaling model to forecast summer precipitation at stations in China
NASA Astrophysics Data System (ADS)
Liu, Ying; Fan, Ke
2014-06-01
A pattern prediction hybrid downscaling method was applied to predict summer (June-July-August) precipitation at 160 stations in China. The predicted precipitation from the downscaling scheme is available one month in advance. Four predictors were chosen to establish the hybrid downscaling scheme. The 500-hPa geopotential height (GH5) and 850-hPa specific humidity (q85) were taken from the skillful predicted output of three DEMETER (Development of a European Multi-model Ensemble System for Seasonal to Interannual Prediction) general circulation models (GCMs). The 700-hPa geopotential height (GH7) and sea level pressure (SLP) were taken from reanalysis datasets. The hybrid downscaling scheme (HD-4P) has better prediction skill than a conventional statistical downscaling model (SD-2P) containing two predictors derived from the output of GCMs, although both downscaling schemes improved the seasonal prediction of summer rainfall in comparison with the original output of the DEMETER GCMs. In particular, HD-4P downscaling predictions showed lower root mean square errors than those based on the SD-2P model. Furthermore, the HD-4P downscaling model reproduced the summer precipitation anomaly centers over China in 1998 more accurately than the SD-2P model. A hybrid downscaling prediction should thus be effective in improving the prediction skill of summer rainfall at stations in China.
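The pattern-prediction step in such schemes is often implemented as a principal-component regression from large-scale predictor fields to station precipitation; the sketch below is a generic illustration with synthetic predictors and does not reproduce the HD-4P configuration or the DEMETER hindcasts.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(2)
# Synthetic stand-ins: 30 summers of 4 predictor fields (e.g. GH5, q85, GH7, SLP)
# flattened to 200 grid values each, plus precipitation at 160 stations
X = rng.normal(size=(30, 4 * 200))
y = rng.normal(size=(30, 160))

# Principal-component regression: compress the predictors, then regress station rainfall
model = make_pipeline(PCA(n_components=5), LinearRegression())
model.fit(X[:25], y[:25])            # train on the first 25 years
pred = model.predict(X[25:])         # hindcast the last 5 years
print(f"toy hindcast RMSE: {np.sqrt(np.mean((pred - y[25:]) ** 2)):.2f}")
```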
On representation of temporal variability in electricity capacity planning models
Merrick, James H.
2016-08-23
This study systematically investigates how to represent intra-annual temporal variability in models of optimum electricity capacity investment. Inappropriate aggregation of temporal resolution can introduce substantial error into model outputs and associated economic insight. The mechanisms underlying the introduction of this error are shown. How many representative periods are needed to fully capture the variability is then investigated. For a sample dataset, a scenario-robust aggregation of hourly (8760) resolution is possible in the order of 10 representative hours when electricity demand is the only source of variability. The inclusion of wind and solar supply variability increases the resolution of the robust aggregation to the order of 1000. A similar scale of expansion is shown for representative days and weeks. These concepts can be applied to any such temporal dataset, providing, at the least, a benchmark that any other aggregation method can aim to emulate. Finally, how prior information about peak pricing hours can potentially reduce resolution further is also discussed.
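One common way to build representative periods is to cluster the hourly demand, wind and solar profiles and weight each cluster centre by the number of hours it represents; the sketch below illustrates that aggregation idea with synthetic data and plain k-means, not the robust-aggregation procedure analysed in the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)
hours = 8760
# Synthetic hourly demand and renewable capacity factors
data = np.column_stack([
    rng.uniform(0.4, 1.0, hours),                     # normalised demand
    rng.uniform(0.0, 1.0, hours),                     # wind capacity factor
    np.clip(rng.normal(0.3, 0.25, hours), 0, 1),      # solar capacity factor
])

k = 100                                               # number of representative hours
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(data)
weights = np.bincount(km.labels_, minlength=k)        # hours represented by each centre

# Each representative hour enters the capacity planning model with its weight (hours/year)
rep_hours = km.cluster_centers_
print(rep_hours.shape, int(weights.sum()))            # (100, 3) 8760
```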
Enhancing e-waste estimates: improving data quality by multivariate Input-Output Analysis.
Wang, Feng; Huisman, Jaco; Stevels, Ab; Baldé, Cornelis Peter
2013-11-01
Waste electrical and electronic equipment (or e-waste) is one of the fastest growing waste streams, encompassing a wide and increasing spectrum of products. Accurate estimation of e-waste generation is difficult, mainly due to a lack of high-quality data on market and socio-economic dynamics. This paper addresses how to enhance e-waste estimates by providing techniques to increase data quality. An advanced, flexible and multivariate Input-Output Analysis (IOA) method is proposed. It links all three pillars in IOA (product sales, stock and lifespan profiles) to construct mathematical relationships between various data points. By applying this method, the data consolidation steps can generate more accurate time-series datasets from the available data pool. This can consequently increase the reliability of e-waste estimates compared to the approach without data processing. A case study in the Netherlands is used to apply the advanced IOA model. As a result, for the first time, complete datasets of all three variables for estimating all types of e-waste have been obtained. The results of this study also demonstrate significant disparity between various estimation models, arising from the use of data under different conditions. This shows the importance of applying a multivariate approach and multiple sources to improve data quality for modelling, specifically using appropriate time-varying lifespan parameters. Following the case study, a roadmap with a procedural guideline is provided to enhance e-waste estimation studies. Copyright © 2013 Elsevier Ltd. All rights reserved.
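The core of a sales-stock-lifespan IOA is a convolution of historical sales with a lifespan (discard) profile, often modelled as a Weibull distribution; the sketch below shows that calculation for a single product type with illustrative parameters, not the Dutch case-study data.

```python
import numpy as np
from scipy.stats import weibull_min

years = np.arange(1990, 2021)
sales = np.linspace(50, 400, years.size)        # units put on market per year (illustrative)

# Weibull lifespan profile: probability a unit is discarded at each age (years)
shape, scale = 2.0, 8.0                          # illustrative lifespan parameters
ages = np.arange(0, 31)
discard_prob = (weibull_min.cdf(ages + 1, shape, scale=scale)
                - weibull_min.cdf(ages, shape, scale=scale))

# e-waste generated per year = sum over sales cohorts weighted by discard probability
ewaste = np.zeros_like(sales)
for i in range(years.size):
    for age, p in enumerate(discard_prob):
        if i - age >= 0:
            ewaste[i] += sales[i - age] * p

print(dict(zip(years[-3:].tolist(), np.round(ewaste[-3:], 1))))
```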
A machine learning pipeline for automated registration and classification of 3D lidar data
NASA Astrophysics Data System (ADS)
Rajagopal, Abhejit; Chellappan, Karthik; Chandrasekaran, Shivkumar; Brown, Andrew P.
2017-05-01
Despite the large availability of geospatial data, registration and exploitation of these datasets remains a persistent challenge in geoinformatics. Popular signal processing and machine learning algorithms, such as non-linear SVMs and neural networks, rely on well-formatted input models as well as reliable output labels, which are not always immediately available. In this paper we outline a pipeline for gathering, registering, and classifying initially unlabeled wide-area geospatial data. As an illustrative example, we demonstrate the training and testing of a convolutional neural network to recognize 3D models in the OGRIP 2007 LiDAR dataset using fuzzy labels derived from OpenStreetMap as well as other datasets available on OpenTopography.org. When auxiliary label information is required, various text and natural language processing filters are used to extract and cluster keywords useful for identifying potential target classes. A subset of these keywords is subsequently used to form multi-class labels, with no assumption of independence. Finally, we employ class-dependent geometry extraction routines to identify candidates from both training and testing datasets. Our regression networks are able to identify the presence of 6 structural classes, including roads, walls, and buildings, in volumes as big as 8000 m3 in as little as 1.2 seconds on a commodity 4-core Intel CPU. The presented framework is neither dataset nor sensor-modality limited due to the registration process, and is capable of multi-sensor data fusion.
NASA Technical Reports Server (NTRS)
Elshorbany, Yasin F.; Duncan, Bryan N.; Strode, Sarah A.; Wang, James S.; Kouatchou, Jules
2015-01-01
We present the Efficient CH4-CO-OH Module (ECCOH) that allows for the simulation of the methane, carbon monoxide and hydroxyl radical (CH4-CO-OH) cycle within a chemistry climate model, carbon cycle model, or earth system model. The computational efficiency of the module allows many multi-decadal, sensitivity simulations of the CH4-CO-OH cycle, which primarily determines the global tropospheric oxidizing capacity. This capability is important for capturing the nonlinear feedbacks of the CH4-CO-OH system and understanding the perturbations to relatively long-lived methane and the concomitant impacts on climate. We implemented the ECCOH module into the NASA GEOS-5 Atmospheric Global Circulation Model (AGCM), performed multiple sensitivity simulations of the CH4-CO-OH system over two decades, and evaluated the model output with surface and satellite datasets of methane and CO. The favorable comparison of output from the ECCOH module (as configured in the GEOS-5 AGCM) with observations demonstrates the fidelity of the module for use in scientific research.
Scaling up: What coupled land-atmosphere models can tell us about critical zone processes
NASA Astrophysics Data System (ADS)
FitzGerald, K. A.; Masarik, M. T.; Rudisill, W. J.; Gelb, L.; Flores, A. N.
2017-12-01
A significant limitation to extending our knowledge of critical zone (CZ) evolution and function is a lack of hydrometeorological information at sufficiently fine spatial and temporal resolutions to resolve topo-climatic gradients and with adequate spatial and temporal extent to capture a range of climatic conditions across ecoregions. Research at critical zone observatories (CZOs) suggests hydrometeorological stores and fluxes exert key controls on processes such as hydrologic partitioning and runoff generation, landscape evolution, soil formation, biogeochemical cycling, and vegetation dynamics. However, advancing fundamental understanding of CZ processes necessitates understanding how hydrometeorological drivers vary across space and time. As a result of recent advances in computational capabilities, it has become possible, although still computationally expensive, to simulate hydrometeorological conditions via high resolution coupled land-atmosphere models. Using the Weather Research and Forecasting (WRF) model, we developed a high spatiotemporal resolution dataset extending from water year 1987 to present for the Snake River Basin in the northwestern USA, including the Reynolds Creek and Dry Creek Experimental Watersheds, both part of the Reynolds Creek CZO, as well as a range of other ecosystems including shrubland desert, montane forests, and alpine tundra. Drawing from hypotheses generated by work at these sites and across the CZO network, we use the resulting dataset in combination with CZO observations and publicly available datasets to provide insights regarding hydrologic partitioning, vegetation distribution, and erosional processes. This dataset provides key context in interpreting and reconciling what observations obtained at particular sites reveal about underlying CZ structure and function. While this dataset does not extend to future climates, the same modeling framework can be used to dynamically downscale coarse global climate model output to scales relevant to CZ processes. This presents an opportunity to better characterize the impact of climate change on the CZ. We also argue that opportunities exist beyond the one-way flow of information and that what we learn at CZOs has the potential to contribute significantly to improved Earth system models.
"One-Stop Shopping" for Ocean Remote-Sensing and Model Data
NASA Technical Reports Server (NTRS)
Li, P. Peggy; Vu, Quoc; Chao, Yi; Li, Zhi-Jin; Choi, Jei-Kook
2006-01-01
OurOcean Portal 2.0 (http://ourocean.jpl.nasa.gov) is a software system designed to enable users to easily gain access to ocean observation data, both remote-sensing and in-situ, configure and run an ocean model with observation data assimilated on a remote computer, and visualize both the observation data and the model outputs. At present, the observation data and models focus on the California coastal regions and Prince William Sound in Alaska. This system can be used to perform both real-time and retrospective analyses of remote-sensing data and model outputs. OurOcean Portal 2.0 incorporates state-of-the-art information technologies (IT) such as a MySQL database, a Java Web Server (Apache/Tomcat), the Live Access Server (LAS), interactive graphics with a Java Applet on the client side and MatLab/GMT on the server side, and distributed computing. OurOcean currently serves over 20 real-time or historical ocean data products. The data are served as pre-generated plots or in their native data format. For some of the datasets, users can choose different plotting parameters and produce customized graphics. OurOcean also serves 3D ocean model outputs generated by ROMS (Regional Ocean Model System) using LAS. The Live Access Server (LAS) software, developed by the Pacific Marine Environmental Laboratory (PMEL) of the National Oceanic and Atmospheric Administration (NOAA), is a configurable Web-server program designed to provide flexible access to geo-referenced scientific data. The model output can be viewed as plots in horizontal slices, depth profiles or time sequences, or can be downloaded as raw data in different data formats, such as NetCDF, ASCII, Binary, etc. The interactive visualization is provided by the graphics software Ferret, also developed by PMEL. In addition, OurOcean allows users with minimal computing resources to configure and run an ocean model with data assimilation on a remote computer. Users may select the forcing input, the data to be assimilated, the simulation period, and the output variables and submit the model to run on a backend parallel computer. When the run is complete, the output will be added to the LAS server for
NASA Astrophysics Data System (ADS)
Wen, Xiaohang; Dong, Wenjie; Yuan, Wenping; Zheng, Zhiyuan
For better prediction and understanding of land-atmosphere interaction, in-situ observed meteorological data acquired from the China Meteorological Administration (CMA) were assimilated into the Weather Research and Forecasting (WRF) model, together with monthly Green Vegetation Coverage (GVF) data calculated from the Normalized Difference Vegetation Index (NDVI) of the Earth Observing System Moderate-Resolution Imaging Spectroradiometer (EOS-MODIS) and Digital Elevation Model (DEM) data from the Shuttle Radar Topography Mission (SRTM) system. The WRF model was then used to produce a High-Resolution Assimilation Dataset of the water-energy cycle in China (HRADC). This dataset has a horizontal resolution of 25 km for near-surface meteorological data, such as air temperature, humidity, wind vectors and pressure (19 levels); soil temperature and moisture (four levels); surface temperature; downward/upward short- and long-wave radiation; 3-h latent heat flux; sensible heat flux; and ground heat flux. In this study, we 1) briefly introduce the cycling 3D-Var assimilation method and 2) compare meteorological elements, such as 2 m temperature and precipitation generated by the HRADC, with the gridded observation data from CMA, and surface temperature and specific humidity with Global Land Data Assimilation System (GLDAS) output data from the National Aeronautics and Space Administration (NASA). We find that the simulated monthly 2 m temperature from the HRADC is improved compared with the control simulation and effectively reproduces the observed patterns. The simulated spatial distributions of ground surface temperature and specific humidity from the HRADC are much closer to the GLDAS outputs. The spatial distribution of the root mean square error (RMSE) and bias of 2 m temperature between observations and the HRADC is reduced compared with the bias between observations and the control run. The monthly spatial distributions of surface temperature and specific humidity from the HRADC are consistent with the GLDAS outputs over China. This study shows that land surface parameters can be improved by utilizing remote sensing data, and that atmospheric elements can be further improved with a data assimilation system. This work provides an effective attempt at combining multi-source data with different spatial and temporal scales into numerical simulations, and the simulated results could be used in further research on the long-term climatic effects and characteristics of the water-energy cycle over China.
NASA Astrophysics Data System (ADS)
Ichii, K.; Suzuki, T.; Kato, T.; Ito, A.; Hajima, T.; Ueyama, M.; Sasai, T.; Hirata, R.; Saigusa, N.; Ohtani, Y.; Takagi, K.
2010-07-01
Terrestrial biosphere models show large differences when simulating carbon and water cycles, and reducing these differences is a priority for developing more accurate estimates of the condition of terrestrial ecosystems and future climate change. To reduce uncertainties and improve the understanding of their carbon budgets, we investigated the utility of eddy flux datasets for improving model simulations and reducing the variability among multi-model outputs of terrestrial biosphere models in Japan. Using 9 terrestrial biosphere models (Support Vector Machine-based regressions, TOPS, CASA, VISIT, Biome-BGC, DAYCENT, SEIB, LPJ, and TRIFFID), we conducted two simulations: (1) point simulations at four eddy flux sites in Japan and (2) spatial simulations for Japan with a default model (based on original settings) and a modified model (based on model parameter tuning using eddy flux data). Generally, models using default settings showed large deviations from observations, with large model-by-model variability. However, after we calibrated the model parameters using eddy flux data (GPP, RE and NEP), most models successfully simulated seasonal variations in the carbon cycle, with less variability among models. We also found that interannual variations in the carbon cycle are mostly consistent among models and observations. Spatial analysis also showed a large reduction in the variability among model outputs. This study demonstrated that careful validation and calibration of models with available eddy flux data reduces model-by-model differences. Yet, site history, analysis of model structure changes, and a more objective procedure for model calibration should be included in further analyses.
EFEHR - the European Facilities for Earthquake Hazard and Risk: beyond the web-platform
NASA Astrophysics Data System (ADS)
Danciu, Laurentiu; Wiemer, Stefan; Haslinger, Florian; Kastli, Philipp; Giardini, Domenico
2017-04-01
European Facilities for Earthquake Hazard and Risk (EFEHR) represents the sustainable community resource for seismic hazard and risk in Europe. The EFEHR web platform is the main gateway to access data, models and tools, as well as to provide expertise relevant for the assessment of seismic hazard and risk. The main services (databases and web platform) are hosted at ETH Zurich and operated by the Swiss Seismological Service (Schweizerischer Erdbebendienst, SED). The EFEHR web portal (www.efehr.org) collects and displays (i) harmonized datasets necessary for hazard and risk modeling, e.g. seismic catalogues, fault compilations, site amplifications, vulnerabilities, inventories; (ii) extensive seismic hazard products, namely hazard curves, uniform hazard spectra and maps for national and regional assessments; (iii) standardized configuration files for re-computing the regional seismic hazard models; and (iv) relevant documentation of harmonized datasets, models and web services. Today, EFEHR distributes the full output of the 2013 European Seismic Hazard Model, ESHM13, as developed within the SHARE project (http://www.share-eu.org/); the latest results of the 2014 Earthquake Model of the Middle East (EMME14), derived within the EMME Project (www.emme-gem.org); the 2001 Global Seismic Hazard Assessment Project (GSHAP) results; and the 2015 updates of the Swiss Seismic Hazard. New datasets related to either seismic hazard or risk will be incorporated as they become available. We present the current status of the EFEHR platform, with focus on the challenges, summaries of the up-to-date datasets, user experience and feedback, as well as the roadmap to future technological innovation beyond the web platform development. We also show the new services foreseen to fully integrate with the seismological core services of the European Plate Observing System (EPOS).
Educational and Scientific Applications of Climate Model Diagnostic Analyzer
NASA Astrophysics Data System (ADS)
Lee, S.; Pan, L.; Zhai, C.; Tang, B.; Kubar, T. L.; Zhang, J.; Bao, Q.
2016-12-01
Climate Model Diagnostic Analyzer (CMDA) is a web-based information system designed for the climate modeling and model analysis community to analyze climate data from models and observations. CMDA provides tools to diagnostically analyze climate data for model validation and improvement, and to systematically manage analysis provenance for sharing results with other investigators. CMDA utilizes cloud computing resources, multi-threading computing, machine-learning algorithms, web service technologies, and provenance-supporting technologies to address technical challenges that the Earth science modeling and model analysis community faces in evaluating and diagnosing climate models. As CMDA infrastructure and technology have matured, we have developed educational and scientific applications of CMDA. Educationally, CMDA has supported the summer school of the JPL Center for Climate Sciences for three years, beginning in 2014. In the summer school, the students work on group research projects for which CMDA provides datasets and analysis tools. Each student is assigned a virtual machine with CMDA installed in Amazon Web Services. A provenance management system for CMDA was developed to keep track of students' usage of CMDA and to recommend datasets and analysis tools for their research topics. The provenance system also allows students to revisit their analysis results and share them with their group. Scientifically, we have developed several science use cases of CMDA covering various topics, datasets, and analysis types. Each use case is described and listed in terms of a scientific goal, the datasets used, the analysis tools used, scientific results discovered from the use case, an analysis result such as output plots and data files, and a link to the exact analysis service call with all the input arguments filled in. For example, one science use case is the evaluation of the NCAR CAM5 model with MODIS total cloud fraction. The analysis service used is the Difference Plot Service of Two Variables, and the datasets used are NCAR CAM total cloud fraction and MODIS total cloud fraction. The scientific highlight of the use case is that the CAM5 model overall does a fairly decent job of simulating total cloud cover, though it simulates too few clouds, especially near and offshore of the eastern ocean basins where low clouds are dominant.
Artificial intelligence (AI) systems for interpreting complex medical datasets.
Altman, R B
2017-05-01
Advances in machine intelligence have created powerful capabilities in algorithms that find hidden patterns in data, classify objects based on their measured characteristics, and associate similar patients/diseases/drugs based on common features. However, artificial intelligence (AI) applications to medical data face several technical challenges: complex and heterogeneous datasets, noisy medical data, and the need to explain their output to users. There are also social challenges related to intellectual property, data provenance, regulatory issues, economics, and liability. © 2017 ASCPT.
Water Balance in the Amazon Basin from a Land Surface Model Ensemble
NASA Technical Reports Server (NTRS)
Getirana, Augusto C. V.; Dutra, Emanuel; Guimberteau, Matthieu; Kam, Jonghun; Li, Hong-Yi; Decharme, Bertrand; Zhang, Zhengqiu; Ducharne, Agnes; Boone, Aaron; Balsamo, Gianpaolo;
2014-01-01
Despite recent advances in land surface modeling and remote sensing, estimates of the global water budget are still fairly uncertain. This study aims to evaluate the water budget of the Amazon basin based on several state-of-the-art land surface model (LSM) outputs. Water budget variables (terrestrial water storage, TWS; evapotranspiration, ET; surface runoff, R; and base flow, B) are evaluated at the basin scale using both remote sensing and in situ data. Meteorological forcings at a 3-hourly time step and 1° spatial resolution were used to run 14 LSMs. Precipitation datasets that have been rescaled to match monthly Global Precipitation Climatology Project (GPCP) and Global Precipitation Climatology Centre (GPCC) datasets and the daily Hydrologie du Bassin de l'Amazone (HYBAM) dataset were used to perform three experiments. The Hydrological Modeling and Analysis Platform (HyMAP) river routing scheme was forced with R and B, and simulated discharges are compared against observations at 165 gauges. Simulated ET and TWS are compared against FLUXNET and MOD16A2 evapotranspiration datasets and Gravity Recovery and Climate Experiment (GRACE) TWS estimates in two subcatchments of main tributaries (Madeira and Negro Rivers). At the basin scale, simulated ET ranges from 2.39 to 3.26 mm day-1, and a low spatial correlation between ET and precipitation indicates that evapotranspiration does not depend on water availability over most of the basin. Results also show that other simulated water budget components vary significantly as a function of both the LSM and the precipitation dataset, but simulated TWS generally agrees with GRACE estimates at the basin scale. The best water budget simulations resulted from experiments using HYBAM, mostly explained by a denser rainfall gauge network and the rescaling at a finer temporal scale.
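The budget terms fit together through the closure relation dS/dt = P - ET - R - B; the short sketch below checks that closure for synthetic basin-mean values (the published analysis uses the LSM outputs and GRACE, not these numbers).

```python
import numpy as np

rng = np.random.default_rng(4)
months = 12
P  = rng.uniform(3.0, 10.0, months)   # precipitation, mm/day
ET = rng.uniform(2.4, 3.3, months)    # evapotranspiration, mm/day
R  = rng.uniform(0.5, 3.0, months)    # surface runoff, mm/day
B  = rng.uniform(0.5, 2.0, months)    # base flow, mm/day

# Storage change implied by the budget; in the study this term is compared with GRACE TWS
dS_dt = P - ET - R - B
print(np.round(dS_dt, 2))
```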
How sensitive are estimates of carbon fixation in agricultural models to input data?
2012-01-01
Background: Process-based vegetation models are central to understanding the hydrological and carbon cycles. To achieve useful results at regional to global scales, such models require various input data from a wide range of earth observations. Since the geographical extent of these datasets varies from local to global scale, data quality and validity are of major interest when they are chosen for use. It is important to assess how the quality of different input datasets affects model outputs. In this article, we reflect on both the uncertainty in input data and the reliability of model results. For our case study analysis we selected the Marchfeld region in Austria. We used independent meteorological datasets from the Central Institute for Meteorology and Geodynamics and the European Centre for Medium-Range Weather Forecasts (ECMWF). Land cover / land use information was taken from the GLC2000 and the CORINE 2000 products. Results: For our case study analysis we selected two different process-based models: the Environmental Policy Integrated Climate (EPIC) model and the Biosphere Energy Transfer Hydrology (BETHY/DLR) model. Both process models show a congruent pattern in response to changes in input data. The annual variability of NPP reaches 36% for BETHY/DLR and 39% for EPIC when changing major input datasets. However, EPIC is less sensitive to meteorological input data than BETHY/DLR. The ECMWF maximum temperatures show a systematic pattern: temperatures above 20°C are overestimated, whereas temperatures below 20°C are underestimated, resulting in an overall underestimation of NPP in both models. In addition, BETHY/DLR is sensitive to the choice and accuracy of the land cover product. Discussion: This study shows that the impact of input data uncertainty on modelling results needs to be assessed: whenever the models are applied under new conditions, local data should be used for both input and result comparison. PMID:22296931
NASA Technical Reports Server (NTRS)
Quattrochi, Dale A.; Estes, Maurice G., Jr.; Crosson, William L.; Khan, Maudood N.
2006-01-01
The Atlanta Urban Heat Island and Air Quality Project had its genesis in Project ATLANTA (ATlanta Land use Analysis: Temperature and Air quality), which began in 1996. Project ATLANTA examined how high-spatial-resolution thermal remote sensing data could be used to derive better measurements of the Urban Heat Island effect over Atlanta. We have explored how these thermal remote sensing data, as well as other image datasets, can be used to better characterize the urban landscape for improved air quality modeling over the Atlanta area. For the air quality modeling project, the National Land Cover Dataset and the local-scale Landpro99 dataset, both at 30 m spatial resolution, have been used to derive land use/land cover characteristics for input into the MM5 mesoscale meteorological model that is one of the foundations for the Community Multiscale Air Quality (CMAQ) model, to assess how these data can improve output from CMAQ. Additionally, land use changes to 2030 have been predicted using a Spatial Growth Model (SGM). SGM simulates growth around a region using population, employment and travel demand forecasts. Air quality modeling simulations were conducted using both current and future land cover. Meteorological modeling simulations indicate a 0.5°C increase in daily maximum air temperatures by 2030. Air quality modeling simulations show substantial differences in relative contributions of individual atmospheric pollutant constituents as a result of land cover change. Enhanced boundary layer mixing over the city tends to offset the increase in ozone concentration expected due to higher surface temperatures as a result of urbanization.
Development and application of GIS-based PRISM integration through a plugin approach
NASA Astrophysics Data System (ADS)
Lee, Woo-Seop; Chun, Jong Ahn; Kang, Kwangmin
2014-05-01
A PRISM (Parameter-elevation Regressions on Independent Slopes Model) QGIS plugin was developed on the Quantum GIS platform in this study. This QGIS plugin provides user-friendly graphical user interfaces (GUIs) so that users can obtain gridded meteorological data at high resolution (1 km × 1 km). The software is designed to run on a personal computer, so it requires neither internet access nor a sophisticated computer system, and a user can generate PRISM data with ease. The proposed PRISM QGIS plugin is a hybrid statistical-geographic model system that uses coarse-resolution datasets (APHRODITE datasets in this study) together with digital elevation data to generate fine-resolution gridded precipitation. To validate the performance of the software, the Prek Thnot River Basin in Kandal, Cambodia, was selected for application. Overall, statistical analysis shows promising outputs generated by the proposed plugin. Error measures such as RMSE (Root Mean Square Error) and MAPE (Mean Absolute Percentage Error) were used to evaluate the performance of the developed PRISM QGIS plugin; the RMSE and MAPE were 2.76 mm and 4.2%, respectively. This study suggests that the plugin can be used to generate high-resolution precipitation datasets for hydrological and climatological studies in watersheds where observed weather datasets are limited.
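The two error measures quoted in the validation can be computed directly from paired plugin and gauge values; the sketch below defines both, with illustrative numbers rather than the Cambodian validation data.

```python
import numpy as np

def rmse(pred, obs):
    """Root Mean Square Error of predictions against observations."""
    pred, obs = np.asarray(pred), np.asarray(obs)
    return np.sqrt(np.mean((pred - obs) ** 2))

def mape(pred, obs):
    """Mean Absolute Percentage Error, in percent."""
    pred, obs = np.asarray(pred), np.asarray(obs)
    return 100.0 * np.mean(np.abs((pred - obs) / obs))

gauge   = np.array([12.0, 30.5, 8.2, 55.0])   # observed precipitation, mm (illustrative)
gridded = np.array([11.0, 33.0, 9.0, 52.0])   # plugin output at the same points, mm
print(f"RMSE = {rmse(gridded, gauge):.2f} mm, MAPE = {mape(gridded, gauge):.1f}%")
```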
Application Perspective of 2D+SCALE Dimension
NASA Astrophysics Data System (ADS)
Karim, H.; Rahman, A. Abdul
2016-09-01
Different applications or users need different abstractions of spatial models, dimensionalities and specifications of their datasets due to variations in the required analysis and output. Various approaches, data models and data structures are now available to support most current application models in Geographic Information Systems (GIS). One of the trending focuses in the GIS multi-dimensional research community is the implementation of a scale dimension with spatial datasets to suit various scale-dependent application needs. In this paper, 2D spatial datasets that have been scaled up along a third dimension are addressed as 2D+scale (or 3D-scale) datasets. Nowadays, various data structures, data models, approaches, schemas, and formats have been proposed as the best approaches to support a variety of applications and dimensionalities in 3D topology. However, only a few of them consider the element of scale as their targeted dimension. As far as the scale dimension is concerned, the implementation approach can be either multi-scale or vario-scale (with any available data structure and format) depending on application requirements (topology, semantics and function). This paper discusses the current and potential new applications that could be integrated with the 3D-scale dimension approach. The previous and current work on the scale dimension, the requirements to be preserved for any given application, implementation issues, and future potential applications form the major discussion of this paper.
Distributive On-line Processing, Visualization and Analysis System for Gridded Remote Sensing Data
NASA Technical Reports Server (NTRS)
Leptoukh, G.; Berrick, S.; Liu, Z.; Pham, L.; Rui, H.; Shen, S.; Teng, W.; Zhu, T.
2004-01-01
The ability to use data stored in the current Earth Observing System (EOS) archives for studying regional or global phenomena is highly dependent on having a detailed understanding of the data's internal structure and physical implementation. Gaining this understanding and applying it to data reduction is a time-consuming task that must be undertaken before the core investigation can begin. This is an especially difficult challenge when science objectives require users to deal with large multi-sensor data sets that are usually of different formats, structures, and resolutions, for example, when preparing data for input into modeling systems. The NASA Goddard Earth Sciences Data and Information Services Center (GES DISC) has taken a major step towards meeting this challenge by developing an infrastructure with a Web interface that allows users to perform interactive analysis online without downloading any data: the GES-DISC Interactive Online Visualization and Analysis Infrastructure, or "Giovanni." Giovanni provides interactive, online analysis tools for data users to facilitate their research. Several instances of this interface have been created to serve TRMM users, aerosol scientists, and Ocean Color and agriculture applications users. The first generation of these tools supports gridded data only. The user selects geophysical parameters, an area of interest, and a time period, and the system generates an output on screen in a matter of seconds. The currently available output options are: an area plot (averaged or accumulated over any available data period for any rectangular area); a time plot (time series averaged over any rectangular area); time plots as image views of any longitude-time and latitude-time cross sections; ASCII output for all plot types; and image animation for the area plot. In the future, we will add correlation plots, GIS-compatible outputs, etc. This allows users to focus on data content (i.e. science parameters) and eliminates the need for expensive learning, development and processing tasks that are redundantly incurred by an archive's user community. The current implementation utilizes the GrADS-DODS Server (GDS), a stable, secure data server that provides subsetting and analysis services across the Internet for any GrADS-readable dataset. The subsetting capability allows users to retrieve a specified temporal and/or spatial subdomain from a large dataset, eliminating the need to download everything simply to access a small relevant portion of a dataset. The analysis capability allows users to retrieve the results of an operation applied to one or more datasets on the server. In our case, we use this approach to read pre-processed binary files and/or to read and extract the needed parts from HDF or HDF-EOS files. These subsets then serve as inputs into GrADS processing and analysis scripts. It can be used in a wide variety of Earth science applications: climate and weather event study and monitoring, and modeling. It can be easily configured for new applications.
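The area-plot and time-plot options amount to subsetting a gridded variable over a box and averaging over time or space; a minimal xarray sketch of those two operations is below, built on a synthetic dataset with placeholder names rather than any Giovanni internals.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for a gridded rainfall product (all names are placeholders)
rng = np.random.default_rng(7)
ds = xr.Dataset(
    {"precip": (("time", "lat", "lon"), rng.gamma(2.0, 2.0, size=(24, 40, 60)))},
    coords={"time": pd.date_range("2004-01-01", periods=24, freq="MS"),
            "lat": np.linspace(-20.0, 19.0, 40),
            "lon": np.linspace(90.0, 149.0, 60)},
)

box = ds["precip"].sel(lat=slice(-10, 10), lon=slice(100, 150))
area_plot = box.mean(dim="time")                  # "area plot": map averaged over the period
time_plot = box.mean(dim=["lat", "lon"])          # "time plot": series averaged over the box
time_plot.to_dataframe().to_csv("box_mean.csv")   # ASCII-style output
print(time_plot.values.round(2))
```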
Skimming Digits: Neuromorphic Classification of Spike-Encoded Images
Cohen, Gregory K.; Orchard, Garrick; Leng, Sio-Hoi; Tapson, Jonathan; Benosman, Ryad B.; van Schaik, André
2016-01-01
The growing demands placed upon the field of computer vision have renewed the focus on alternative visual scene representations and processing paradigms. Silicon retinae provide an alternative means of imaging the visual environment, and produce frame-free spatio-temporal data. This paper presents an investigation into event-based digit classification using N-MNIST, a neuromorphic dataset created with a silicon retina, and the Synaptic Kernel Inverse Method (SKIM), a learning method based on principles of dendritic computation. As this work represents the first large-scale and multi-class classification task performed using the SKIM network, it explores different training patterns and output determination methods necessary to extend the original SKIM method to support multi-class problems. Making use of SKIM networks applied to real-world datasets, implementing the largest hidden layer sizes and simultaneously training the largest number of output neurons, the classification system achieved a best-case accuracy of 92.87% for a network containing 10,000 hidden layer neurons. These results represent the highest accuracies achieved against the dataset to date and serve to validate the application of the SKIM method to event-based visual classification tasks. Additionally, the study found that using a square pulse as the supervisory training signal produced the highest accuracy for most output determination methods, but the results also demonstrate that an exponential pattern is better suited to hardware implementations as it makes use of the simplest output determination method based on the maximum value. PMID:27199646
Forecast first: An argument for groundwater modeling in reverse
White, Jeremy
2017-01-01
Numerical groundwater models are important components of groundwater analyses that are used for making critical decisions related to the management of groundwater resources. In this support role, models are often constructed to serve a specific purpose, that is, to provide insights, through simulation, related to a specific function of a complex aquifer system that cannot be observed directly (Anderson et al. 2015). For any given modeling analysis, several model input datasets must be prepared. Herein, the datasets required to simulate the historical conditions are referred to as the calibration model, and the datasets required to simulate the model's purpose are referred to as the forecast model. Future groundwater conditions or other unobserved aspects of the groundwater system may be simulated by the forecast model; the outputs of interest from the forecast model represent the purpose of the modeling analysis. Unfortunately, the forecast model, needed to simulate the purpose of the modeling analysis, is seemingly an afterthought: calibration is where the majority of time and effort are expended, and calibration is usually completed before the forecast model is even constructed. Herein, I am proposing a new groundwater modeling workflow, referred to as the "forecast first" workflow, where the forecast model is constructed at an earlier stage in the modeling analysis and the outputs of interest from the forecast model are evaluated during subsequent tasks in the workflow.
OceanNOMADS: A New Distribution Node for Operational Ocean Model Output
NASA Astrophysics Data System (ADS)
Cross, S.; Vance, T.; Breckenridge, T.
2009-12-01
The NOAA National Operational Model Archive and Distribution System (NOMADS) is a distributed, web-services based project providing real-time and retrospective access to climate and weather model data and related datasets. OceanNOMADS is a new NOMADS node dedicated to ocean model and related data, with an initial focus on operational ocean models from NOAA and the U.S. Navy. The node offers data access through a Thematic Real-time Environmental Distributed Data Services (THREDDS) server via the commonly used OPeNDAP protocol. The primary server is operated by the National Coastal Data Development Center and hosted by the Northern Gulf Institute at Stennis Space Center, MS. In cooperation with the National Marine Fisheries Service and Mississippi State University (MSU), a duplicate server is being installed at MSU with a 1-gigabit connection to the National Lambda Rail. This setup will allow us to begin to quantify the benefit of high-speed data connections to scientists needing remote access to these large datasets. Work is also underway on the next generation of services from OceanNOMADS, including user-requested server-side data reformatting, regridding, and aggregation, as well as tools for model-data comparison.
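Remote access of this kind typically means opening a dataset by its OPeNDAP URL and subsetting lazily so only the requested slab crosses the network; the sketch below uses xarray with a placeholder endpoint and variable name, not an actual OceanNOMADS URL.

```python
import xarray as xr

# Placeholder OPeNDAP endpoint; a real THREDDS/OPeNDAP URL would go here.
url = "https://example.org/thredds/dodsC/ocean_model/latest"
ds = xr.open_dataset(url)            # reads metadata only; the data stay on the server

# Subset a variable (name assumed here) over a small box and a single time step,
# so only that slab is actually downloaded.
sst = ds["sst"].sel(lat=slice(25, 30), lon=slice(-92, -85)).isel(time=0)
sst_local = sst.load()
print(sst_local.shape)
```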
Differentially private distributed logistic regression using private and public data.
Ji, Zhanglong; Jiang, Xiaoqian; Wang, Shuang; Xiong, Li; Ohno-Machado, Lucila
2014-01-01
Privacy protection is an important issue in medical informatics, and differential privacy is a state-of-the-art framework for data privacy research. Differential privacy offers provable privacy against attackers who have auxiliary information, and can be applied to data mining models (for example, logistic regression). However, differentially private methods sometimes introduce too much noise and make outputs less useful. Given available public data in medical research (e.g. from patients who sign open-consent agreements), we can design algorithms that use both public and private data sets to decrease the amount of noise that is introduced. In this paper, we modify the update step in the Newton-Raphson method to propose a differentially private distributed logistic regression model based on both public and private data. We try our algorithm on three different data sets, and show its advantage over: (1) a logistic regression model based solely on public data, and (2) a differentially private distributed logistic regression model based on private data, under various scenarios. Logistic regression models built with our new algorithm using both private and public datasets demonstrate better utility than models trained on private or public datasets alone, without sacrificing the rigorous privacy guarantee.
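The general flavour of mixing exact public-data contributions with noised private-data contributions can be sketched with a simple gradient update; this is an illustration only, not the modified Newton-Raphson scheme of the paper, and the per-iteration noise below ignores privacy composition across iterations.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dp_logistic_regression(X_pub, y_pub, X_priv, y_priv,
                           eps_per_iter=0.1, clip=1.0, lr=0.1, iters=200, seed=0):
    """Gradient ascent for logistic regression in which per-record gradients from
    the private set are clipped and perturbed with Laplace noise, while the
    public-set gradient is used exactly. Illustrative sketch only."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X_pub.shape[1])
    n = len(y_pub) + len(y_priv)
    for _ in range(iters):
        g_pub = X_pub.T @ (y_pub - sigmoid(X_pub @ w))
        per_rec = X_priv * (y_priv - sigmoid(X_priv @ w))[:, None]
        norms = np.maximum(np.linalg.norm(per_rec, axis=1, keepdims=True) / clip, 1.0)
        g_priv = (per_rec / norms).sum(axis=0)        # clipped private gradient
        noise = rng.laplace(0.0, clip / eps_per_iter, size=w.shape)
        w += lr * (g_pub + g_priv + noise) / n
    return w

# Toy data: two features, half treated as public and half as private
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = (X @ np.array([1.5, -2.0]) + rng.normal(size=200) > 0).astype(float)
print(np.round(dp_logistic_regression(X[:100], y[:100], X[100:], y[100:]), 2))
```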
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kang, Shujiang; Kline, Keith L; Nair, S. Surendran
A global energy crop productivity model that provides geospatially explicit quantitative details on biomass potential and factors affecting sustainability would be useful, but does not exist now. This study describes a modeling platform capable of meeting many challenges associated with global-scale agro-ecosystem modeling. We designed an analytical framework for bioenergy crops consisting of six major components: (i) standardized natural resources datasets, (ii) global field-trial data and crop management practices, (iii) simulation units and management scenarios, (iv) model calibration and validation, (v) high-performance computing (HPC) simulation, and (vi) simulation output processing and analysis. The HPC-Environmental Policy Integrated Climate (HPC-EPIC) model simulated a perennial bioenergy crop, switchgrass (Panicum virgatum L.), estimating feedstock production potentials and effects across the globe. This modeling platform can assess soil C sequestration, net greenhouse gas (GHG) emissions, nonpoint source pollution (e.g., nutrient and pesticide loss), and energy exchange with the atmosphere. It can be expanded to include additional bioenergy crops (e.g., miscanthus, energy cane, and agave) and food crops under different management scenarios. The platform and switchgrass field-trial dataset are available to support global analysis of biomass feedstock production potential and corresponding metrics of sustainability.
A multifactor approach to forecasting Romanian gross domestic product (GDP) in the short run.
Armeanu, Daniel; Andrei, Jean Vasile; Lache, Leonard; Panait, Mirela
2017-01-01
The purpose of this paper is to investigate the application of a generalized dynamic factor model (GDFM) based on dynamic principal components analysis to forecasting short-term economic growth in Romania. We have used a generalized principal components approach to estimate a dynamic model based on a dataset comprising 86 economic and non-economic variables that are linked to economic output. The model exploits the dynamic correlations between these variables and uses three common components that account for roughly 72% of the information contained in the original space. We show that it is possible to generate reliable forecasts of quarterly real gross domestic product (GDP) using just the common components while also assessing the contribution of the individual variables to the dynamics of real GDP. In order to assess the relative performance of the GDFM to standard models based on principal components analysis, we have also estimated two Stock-Watson (SW) models that were used to perform the same out-of-sample forecasts as the GDFM. The results indicate significantly better performance of the GDFM compared with the competing SW models, which empirically confirms our expectations that the GDFM produces more accurate forecasts when dealing with large datasets.
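A simplified static-factor analogue of the approach, PCA factors extracted from a standardized panel followed by a regression of GDP growth on those factors, is sketched below with synthetic data; it does not reproduce the generalized dynamic principal components of the GDFM or the Stock-Watson benchmarks.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(5)
T, N = 60, 86                        # 60 quarters, 86 indicator series
panel = rng.normal(size=(T, N))      # synthetic stand-in for the indicator panel
gdp = panel[:, :5].mean(axis=1) + 0.3 * rng.normal(size=T)   # synthetic GDP growth

Z = StandardScaler().fit_transform(panel)
factors = PCA(n_components=3).fit_transform(Z)   # three common components

reg = LinearRegression().fit(factors[:-8], gdp[:-8])   # estimate on the early sample
forecast = reg.predict(factors[-8:])                    # pseudo out-of-sample forecasts
print(np.round(forecast, 2))
```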
Smith, Molly B.; Mahowald, Natalie M.; Albani, Samuel; ...
2017-03-07
Interannual variability in desert dust is widely observed and simulated, yet the sensitivity of these desert dust simulations to a particular meteorological dataset, as well as a particular model construction, is not well known. Here we use version 4 of the Community Atmospheric Model (CAM4) with the Community Earth System Model (CESM) to simulate dust forced by three different reanalysis meteorological datasets for the period 1990–2005. We then contrast the results of these simulations with dust simulated using online winds dynamically generated from sea surface temperatures, as well as with simulations conducted using other modeling frameworks but the same meteorological forcings, in order to determine the sensitivity of climate model output to the specific reanalysis dataset used. For the seven cases considered in our study, the different model configurations are able to simulate the annual mean of the global dust cycle, seasonality and interannual variability approximately equally well (or poorly) at the limited observational sites available. Altogether, aerosol dust-source strength has remained fairly constant during the time period from 1990 to 2005, although there is strong seasonal and some interannual variability simulated in the models and seen in the observations over this time period. Model interannual variability comparisons to observations, as well as comparisons between models, suggest that interannual variability in dust is still difficult to simulate accurately, with averaged correlation coefficients of 0.1 to 0.6. Because of the large variability, at least 1 year of observations is needed at most sites to correctly observe the mean, but in some regions, particularly the remote oceans of the Southern Hemisphere, where interannual variability may be larger than in the Northern Hemisphere, 2–3 years of data are likely to be needed.
On the uncertainties associated with using gridded rainfall data as a proxy for observed
NASA Astrophysics Data System (ADS)
Tozer, C. R.; Kiem, A. S.; Verdon-Kidd, D. C.
2011-09-01
Gridded rainfall datasets are used in many hydrological and climatological studies, in Australia and elsewhere, including for hydroclimatic forecasting, climate attribution studies and climate model performance assessments. The attraction of the spatial coverage provided by gridded data is clear, particularly in Australia where the spatial and temporal resolution of the rainfall gauge network is sparse. However, the question that must be asked is whether it is suitable to use gridded data as a proxy for observed point data, given that gridded data is inherently "smoothed" and may not necessarily capture the temporal and spatial variability of Australian rainfall which leads to hydroclimatic extremes (i.e. droughts, floods)? This study investigates this question through a statistical analysis of three monthly gridded Australian rainfall datasets - the Bureau of Meteorology (BOM) dataset, the Australian Water Availability Project (AWAP) and the SILO dataset. To demonstrate the hydrological implications of using gridded data as a proxy for gauged data, a rainfall-runoff model is applied to one catchment in South Australia (SA) initially using gridded data as the source of rainfall input and then gauged rainfall data. The results indicate a markedly different runoff response associated with each of the different sources of rainfall data. It should be noted that this study does not seek to identify which gridded dataset is the "best" for Australia, as each gridded data source has its pros and cons, as does gauged or point data. Rather the intention is to quantify differences between various gridded data sources and how they compare with gauged data so that these differences can be considered and accounted for in studies that utilise these gridded datasets. Ultimately, if key decisions are going to be based on the outputs of models that use gridded data, an estimate (or at least an understanding) of the uncertainties relating to the assumptions made in the development of gridded data and how that gridded data compares with reality should be made.
Abdullah-Al-Shafi, Md; Bahar, Ali Newaz; Bhuiyan, Mohammad Maksudur Rahman; Shamim, S M; Ahmed, Kawser
2018-08-01
Quantum-dot cellular automata (QCA) is a promising nanotechnology candidate to replace complementary metal-oxide-semiconductor (CMOS) technology because of features such as extremely high device density, minimal power consumption and rapid operating speed. In this study, the dataset of average output polarization (AOP) for fundamental reversible logic circuits is organized as presented in (Abdullah-Al-Shafi and Bahar, 2017; Bahar et al., 2016; Abdullah-Al-Shafi et al., 2015; Abdullah-Al-Shafi, 2016) [1-4]. QCADesigner version 2.0.3 has been utilized to survey the AOP of the reversible circuits at separate temperature points in Kelvin (K).
Antarctic ice shelf thickness from CryoSat-2 radar altimetry
NASA Astrophysics Data System (ADS)
Chuter, Stephen; Bamber, Jonathan
2016-04-01
The Antarctic ice shelves provide buttressing to the inland grounded ice sheet, and therefore play a controlling role in regulating ice dynamics and mass imbalance. Accurate knowledge of ice shelf thickness is essential for input-output method mass balance calculations, sub-ice shelf ocean models and buttressing parameterisations in ice sheet models. Ice shelf thickness has previously been inferred from satellite altimetry elevation measurements using the assumption of hydrostatic equilibrium, as direct measurements of ice thickness do not provide the spatial coverage necessary for these applications. The sensor limitations of previous radar altimeters have led to poor data coverage and a lack of accuracy, particularly in the grounding zone where a break in slope exists. We present a new ice shelf thickness dataset using four years (2011-2014) of CryoSat-2 elevation measurements, with its dual-antenna SARIn mode of operation alleviating the issues affecting previous sensors. These improvements and the dense across-track spacing of the satellite have resulted in ~92% coverage of the ice shelves, with substantial improvements, for example, of over 50% across the Venable and Totten Ice Shelves in comparison to the previous dataset. Significant improvements in coverage and accuracy are also seen south of 81.5° for the Ross and Filchner-Ronne Ice Shelves. Validation of the surface elevation measurements, used to derive ice thickness, against NASA ICESat laser altimetry data shows a mean bias of less than 1 m (equivalent to less than 9 m in ice thickness) and a fourfold decrease in standard deviation in comparison to the previous continental dataset. Importantly, the most substantial improvements are found in the grounding zone. Validation of the derived thickness data has been carried out using multiple Radio Echo Sounding (RES) campaigns across the continent. Over the Amery Ice Shelf, where extensive RES measurements exist, the mean difference between the datasets is 3.3% across the whole shelf and 4.7% within 10 km of the grounding line. These figures represent a two- to threefold improvement in accuracy when compared to the previous data product. The impact of these improvements on Input-Output estimates of mass balance is illustrated for the Abbot Ice Shelf. Our new product shows a mean reduction of 29% in thickness at the grounding line when compared to the previous dataset, as well as the elimination of non-physical 'data spikes' that were prevalent in the previous product in areas of complex terrain. The reduction in grounding line thickness equates to a change in mass balance for the area from -14±9 Gt yr-1 to -4±9 Gt yr-1. We show examples from other sectors including the Getz and George VI Ice Shelves. The updated estimate is more consistent with the positive surface elevation rate in this region obtained from satellite altimetry. The new thickness dataset will greatly reduce the uncertainty in Input-Output estimates of mass balance for the ~30% of the grounding line of Antarctica where direct ice thickness measurements do not exist.
Multilevel Modeling in Psychosomatic Medicine Research
Myers, Nicholas D.; Brincks, Ahnalee M.; Ames, Allison J.; Prado, Guillermo J.; Penedo, Frank J.; Benedict, Catherine
2012-01-01
The primary purpose of this manuscript is to provide an overview of multilevel modeling for Psychosomatic Medicine readers and contributors. The manuscript begins with a general introduction to multilevel modeling. Multilevel regression modeling at two-levels is emphasized because of its prevalence in psychosomatic medicine research. Simulated datasets based on some core ideas from the Familias Unidas effectiveness study are used to illustrate key concepts including: communication of model specification, parameter interpretation, sample size and power, and missing data. Input and key output files from Mplus and SAS are provided. A cluster randomized trial with repeated measures (i.e., three-level regression model) is then briefly presented with simulated data based on some core ideas from a cognitive behavioral stress management intervention in prostate cancer. PMID:23107843
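To make the two-level regression idea concrete, here is a minimal illustrative sketch in Python (the paper itself provides Mplus and SAS input and output files); the variable names and the simulated data below are hypothetical, not the Familias Unidas examples.

```python
# Minimal two-level (random-intercept) regression sketch using statsmodels.
# Hypothetical variables and simulated data; structurally analogous to (but not
# taken from) the simulated examples described in the abstract above.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_groups, n_per_group = 50, 8
group = np.repeat(np.arange(n_groups), n_per_group)
group_effect = rng.normal(0, 1.0, n_groups)[group]   # level-2 (cluster) variation
x = rng.normal(size=group.size)                      # level-1 predictor
y = 2.0 + 0.5 * x + group_effect + rng.normal(0, 1.0, group.size)

df = pd.DataFrame({"y": y, "x": x, "group": group})

# Random intercept for each cluster; fixed slope for x.
model = smf.mixedlm("y ~ x", df, groups=df["group"])
result = model.fit()
print(result.summary())   # fixed effects and random-intercept variance
```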
NASA Astrophysics Data System (ADS)
Amesbury, Matthew J.; Swindles, Graeme T.; Bobrov, Anatoly; Charman, Dan J.; Holden, Joseph; Lamentowicz, Mariusz; Mallon, Gunnar; Mazei, Yuri; Mitchell, Edward A. D.; Payne, Richard J.; Roland, Thomas P.; Turner, T. Edward; Warner, Barry G.
2016-11-01
In the decade since the first pan-European testate amoeba-based transfer function for peatland palaeohydrological reconstruction was published, a vast amount of additional data collection has been undertaken by the research community. Here, we expand the pan-European dataset from 128 to 1799 samples, spanning 35° of latitude and 55° of longitude. After the development of a new taxonomic scheme to permit compilation of data from a wide range of contributors and the removal of samples with high pH values, we developed ecological transfer functions using a range of model types and a dataset of ∼1300 samples. We rigorously tested the efficacy of these models using both statistical validation and independent test sets with associated instrumental data. Model performance measured by statistical indicators was comparable to other published models. Comparison to test sets showed that taxonomic resolution did not impair model performance and that the new pan-European model can therefore be used as an effective tool for palaeohydrological reconstruction. Our results question the efficacy of relying on statistical validation of transfer functions alone and support a multi-faceted approach to the assessment of new models. We substantiated recent advice that model outputs should be standardised and presented as residual values in order to focus interpretation on secure directional shifts, avoiding potentially inaccurate conclusions relating to specific water-table depths. The extent and diversity of the dataset highlighted that, at the taxonomic resolution applied, a majority of taxa had broad geographic distributions, though some morphotypes appeared to have restricted ranges.
The Elephant in the Room: Spatial Heterogeneity and the Uncertainty of Measurements and Models
NASA Astrophysics Data System (ADS)
Alfieri, J. G.; Kustas, W. P.; Prueger, J. H.; Agam, N.; Neale, C. M. U.; Evett, S. R.
2014-12-01
Variations in surface conditions can significantly influence the exchange of heat and moisture between the land and atmosphere. As a result, measurements of surface fluxes made using disparate methods not only may differ from one another, they may fail to represent the surrounding landscape due to localized differences in surface conditions. To illustrate this, data collected over adjacent cotton fields during the Bushland Evapotranspiration and Agricultural Remote Sensing Experiment (BEAREX08) will be used. The evapotranspiration (ET) within each field was determined via lysimetry (LY), mass balance using neutron probe (NP) data, and a pair of eddy covariance (EC) systems. A comparison of the cumulative ET from each field showed that ET from LY was 20% to 25% greater than that derived from NP and 10% to 15% greater than that from EC. Additionally, the cumulative fluxes for the two fields collected using the same approach differed by 5% to 10%. These discrepancies can be explained, in large part, by the variations in vegetation density within the two fields. Not only were there substantial variations in the leaf area index (LAI) within the source areas of the different measurement systems - for example, the LAI within LY was, on average, 0.4 m2 m-2 greater than the LAI within the source area of NP - there were also significant differences in the LAI between the fields as a whole. The cumulative ET output by the remote sensing-based Two-Source Energy Balance (TSEB) model was also compared to the cumulative ET from each of the three measurement approaches. Depending on which measurement technique is used, the model either underestimated the moisture flux by approximately 5%, in the case of LY, or overestimated the flux by nearly 20%, in the case of NP. Comparison of the model output with EC data also indicated that the model overestimated ET, in this case by approximately 10%. Clearly, the choice of which dataset is used to validate the model significantly affects the conclusions drawn regarding the model's accuracy and utility in estimating ET. The results of this study also underscore the limitations of each of these measurement techniques and the need to understand those limitations when using observational datasets to draw general conclusions about field-scale ET and to validate model output.
Climate Model Diagnostic Analyzer Web Service System
NASA Astrophysics Data System (ADS)
Lee, S.; Pan, L.; Zhai, C.; Tang, B.; Jiang, J. H.
2013-12-01
The latest Intergovernmental Panel on Climate Change (IPCC) Fourth Assessment Report stressed the need for the comprehensive and innovative evaluation of climate models with newly available global observations. The traditional approach to climate model evaluation, which compares a single parameter at a time, identifies symptomatic model biases and errors but fails to diagnose the model problems. The model diagnosis process requires physics-based multi-variable comparisons that typically involve large-volume and heterogeneous datasets, making them both computationally- and data-intensive. To address these challenges, we are developing a parallel, distributed web-service system that enables the physics-based multi-variable model performance evaluations and diagnoses through the comprehensive and synergistic use of multiple observational data, reanalysis data, and model outputs. We have developed a methodology to transform an existing science application code into a web service using a Python wrapper interface and Python web service frameworks (i.e., Flask, Gunicorn, and Tornado). The web-service system, called Climate Model Diagnostic Analyzer (CMDA), currently supports (1) all the datasets from Obs4MIPs and a few ocean datasets from NOAA and Argo, which can serve as observation-based reference data for model evaluation and (2) many of CMIP5 model outputs covering a broad range of atmosphere, ocean, and land variables from the CMIP5 specific historical runs and AMIP runs. Analysis capabilities currently supported by CMDA are (1) the calculation of annual and seasonal means of physical variables, (2) the calculation of time evolution of the means in any specified geographical region, (3) the calculation of correlation between two variables, and (4) the calculation of difference between two variables. A web user interface is chosen for CMDA because it not only lowers the learning curve and removes the adoption barrier of the tool but also enables instantaneous use, avoiding the hassle of local software installation and environment incompatibility. CMDA is planned to be used as an educational tool for the summer school organized by JPL's Center for Climate Science in 2014. The requirements of the educational tool are defined with the interaction with the school organizers, and CMDA is customized to meet the requirements accordingly. The tool needs to be production quality for 30+ simultaneous users. The summer school will thus serve as a valuable testbed for the tool development, preparing CMDA to serve the Earth-science modeling and model-analysis community at the end of the project. This work was funded by the NASA Earth Science Program called Computational Modeling Algorithms and Cyberinfrastructure (CMAC).
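As a rough illustration of the wrapping approach described above (existing science code exposed through a Python web framework), the sketch below uses Flask to expose a toy "seasonal mean" calculation as an HTTP endpoint; the route, parameters, and the calculation are hypothetical and are not the actual CMDA interface.

```python
# Illustrative sketch only: wrapping a simple analysis function as a web service
# with Flask, in the spirit of the CMDA approach. The route, parameters, and the
# toy computation are hypothetical, not the real CMDA API.
import numpy as np
from flask import Flask, jsonify, request

app = Flask(__name__)

def seasonal_mean(data, months):
    """Toy 'science code': mean of a monthly series over selected months."""
    monthly = data.reshape(-1, 12)          # years x months
    return float(monthly[:, [m - 1 for m in months]].mean())

@app.route("/seasonal_mean")
def seasonal_mean_endpoint():
    # e.g. /seasonal_mean?months=6,7,8 -> JJA mean of a synthetic series
    months = [int(m) for m in request.args.get("months", "6,7,8").split(",")]
    data = np.sin(np.linspace(0, 20 * np.pi, 240)) + 15.0   # stand-in for model output
    return jsonify({"months": months, "mean": seasonal_mean(data, months)})

if __name__ == "__main__":
    app.run(port=8080)   # a production deployment would sit behind Gunicorn
```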
Applying deep bidirectional LSTM and mixture density network for basketball trajectory prediction
NASA Astrophysics Data System (ADS)
Zhao, Yu; Yang, Rennong; Chevalier, Guillaume; Shah, Rajiv C.; Romijnders, Rob
2018-04-01
Data analytics helps basketball teams to create tactics. However, manual data collection and analytics are costly and ineffective. Therefore, we applied a deep bidirectional long short-term memory (BLSTM) and mixture density network (MDN) approach. This model is not only capable of predicting a basketball trajectory based on real data, but it also can generate new trajectory samples. It is an excellent application to help coaches and players decide when and where to shoot. Its structure is particularly suitable for dealing with time series problems. BLSTM receives forward and backward information at the same time, while stacking multiple BLSTMs further increases the learning ability of the model. Combined with BLSTMs, MDN is used to generate a multi-modal distribution of outputs. Thus, the proposed model can, in principle, represent arbitrary conditional probability distributions of output variables. We tested our model with two experiments on three-pointer datasets from NBA SportVu data. In the hit-or-miss classification experiment, the proposed model outperformed other models in terms of the convergence speed and accuracy. In the trajectory generation experiment, eight model-generated trajectories at a given time closely matched real trajectories.
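A compact sketch of the model family described here (bidirectional LSTM layers feeding a mixture density output) is given below in PyTorch; the layer sizes, number of mixture components, and 3-D output are illustrative assumptions, not the authors' exact architecture or training setup.

```python
# Illustrative BLSTM + mixture density network (MDN) sketch in PyTorch.
# Layer sizes and the number of Gaussian components are assumptions.
import torch
import torch.nn as nn

class BLSTM_MDN(nn.Module):
    def __init__(self, in_dim=3, hidden=64, layers=2, n_mix=5, out_dim=3):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, num_layers=layers,
                            batch_first=True, bidirectional=True)
        self.n_mix, self.out_dim = n_mix, out_dim
        # For each mixture component: weight (pi), mean vector, and std vector.
        self.head = nn.Linear(2 * hidden, n_mix * (1 + 2 * out_dim))

    def forward(self, x):                         # x: (batch, time, in_dim)
        h, _ = self.lstm(x)
        params = self.head(h[:, -1])              # use the last time step
        pi, mu, log_sigma = params.split(
            [self.n_mix, self.n_mix * self.out_dim, self.n_mix * self.out_dim], dim=-1)
        pi = torch.log_softmax(pi, dim=-1)        # log mixture weights
        mu = mu.view(-1, self.n_mix, self.out_dim)
        sigma = log_sigma.view(-1, self.n_mix, self.out_dim).exp()
        return pi, mu, sigma

def mdn_nll(pi, mu, sigma, y):
    """Negative log-likelihood of y under the predicted Gaussian mixture."""
    dist = torch.distributions.Normal(mu, sigma)
    log_prob = dist.log_prob(y.unsqueeze(1)).sum(-1)   # (batch, n_mix)
    return -torch.logsumexp(pi + log_prob, dim=-1).mean()

# Usage: predict the next 3-D ball position from a short trajectory segment.
model = BLSTM_MDN()
x = torch.randn(8, 20, 3)     # 8 segments, 20 time steps, (x, y, z)
y = torch.randn(8, 3)         # next position
pi, mu, sigma = model(x)
loss = mdn_nll(pi, mu, sigma, y)
loss.backward()
```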
NASA Astrophysics Data System (ADS)
Curci, Gabriele; Falasca, Serena
2017-04-01
Deterministic air quality forecast is routinely carried out at many local Environmental Agencies in Europe and throughout the world by means of eulerian chemistry-transport models. The skill of these models in predicting the ground-level concentrations of relevant pollutants (ozone, nitrogen dioxide, particulate matter) a few days ahead has greatly improved in recent years, but it is not yet always compliant with the required quality level for decision making (e.g. the European Commission has set a maximum uncertainty of 50% on daily values of relevant pollutants). Post-processing of deterministic model output is thus still regarded as a useful tool to make the forecast more reliable. In this work, we test several bias correction techniques applied to a long-term dataset of air quality forecasts over Europe and Italy. We used the WRF-CHIMERE modelling system, which provides operational experimental chemical weather forecast at CETEMPS (http://pumpkin.aquila.infn.it/forechem/), to simulate the years 2008-2012 at low resolution over Europe (0.5° x 0.5°) and moderate resolution over Italy (0.15° x 0.15°). We compared the simulated dataset with available observation from the European Environmental Agency database (AirBase) and characterized model skill and compliance with EU legislation using the Delta tool from FAIRMODE project (http://fairmode.jrc.ec.europa.eu/). The bias correction techniques adopted are, in order of complexity: (1) application of multiplicative factors calculated as the ratio of model-to-observed concentrations averaged over the previous days; (2) correction of the statistical distribution of model forecasts, in order to make it similar to that of the observations; (3) development and application of Model Output Statistics (MOS) regression equations. We illustrate differences and advantages/disadvantages of the three approaches. All the methods are relatively easy to implement for other modelling systems.
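To make the first two correction techniques concrete, a minimal numpy sketch is given below on synthetic data; the averaging window, variables, and distribution-matching details are illustrative assumptions and may differ from the settings used in the study.

```python
# Minimal sketch of two of the bias-correction ideas described above, on synthetic
# data. Windows, variables, and details are illustrative, not the study's settings.
import numpy as np

rng = np.random.default_rng(1)
obs = rng.gamma(2.0, 20.0, size=365)            # synthetic observed daily PM10
fcst = 0.7 * obs + rng.normal(0, 5, size=365)   # synthetic biased forecast

# (1) Multiplicative factor: ratio of observed to forecast concentrations
#     averaged over the previous N days, applied to today's forecast.
def multiplicative_correction(fcst, obs, day, n_days=10):
    sl = slice(max(0, day - n_days), day)
    factor = obs[sl].mean() / fcst[sl].mean()
    return fcst[day] * factor

# (2) Quantile mapping: adjust the forecast distribution so that it matches the
#     empirical distribution of the observations over a training period.
def quantile_mapping(fcst_new, fcst_train, obs_train):
    quantiles = np.searchsorted(np.sort(fcst_train), fcst_new) / len(fcst_train)
    return np.quantile(obs_train, np.clip(quantiles, 0.0, 1.0))

corrected_today = multiplicative_correction(fcst, obs, day=200)
corrected_series = quantile_mapping(fcst[300:], fcst[:300], obs[:300])
```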
NASA Astrophysics Data System (ADS)
Ball, William; Rozanov, Eugene; Shapiro, Anna
2015-04-01
Ozone plays a key role in the temperature structure of the Earth's atmosphere and absorbs damaging ultraviolet (UV) solar radiation. Evidence suggests that variations in stratospheric ozone resulting from changes in solar UV output may have an important role to play in weather over the North Atlantic and Europe on decadal timescales through a "top-down" coupling with the troposphere. However, the magnitude of the stratospheric response to the Sun over the 11-year solar cycle (SC) depends primarily on how much the UV changes. SC UV changes differ significantly between observational instruments, and between the observations and models. The substantial disagreements between existing spectral solar irradiance (SSI) datasets lead to different atmospheric responses when they are used in climate models and, therefore, we still cannot fully understand and simulate the ozone variability. We use the SOCOL chemistry-climate model, in specified dynamics mode, to calculate the atmospheric response using spectral irradiance from the SATIRE-S and NRLSSI models, from SORCE observations, and from a constant Sun. We compare the ozone and hydroxyl results from these runs with observations to try to determine which SSI dataset is most likely to be correct. This is important for a better understanding of which SSI dataset should be used in climate modelling and of what magnitude of UV variability the Sun has. This will lead to a better understanding of the Sun's influence upon our climate and weather.
Earth System Grid II (ESG): Turning Climate Model Datasets Into Community Resources
NASA Astrophysics Data System (ADS)
Williams, D.; Middleton, D.; Foster, I.; Nevedova, V.; Kesselman, C.; Chervenak, A.; Bharathi, S.; Drach, B.; Cinquni, L.; Brown, D.; Strand, G.; Fox, P.; Garcia, J.; Bernholdte, D.; Chanchio, K.; Pouchard, L.; Chen, M.; Shoshani, A.; Sim, A.
2003-12-01
High-resolution, long-duration simulations performed with advanced DOE SciDAC/NCAR climate models will produce tens of petabytes of output. To be useful, this output must be made available to global change impacts researchers nationwide, both at national laboratories and at universities, other research laboratories, and other institutions. To this end, we propose to create a new Earth System Grid, ESG-II - a virtual collaborative environment that links distributed centers, users, models, and data. ESG-II will provide scientists with virtual proximity to the distributed data and resources that they require to perform their research. The creation of this environment will significantly increase the scientific productivity of U.S. climate researchers by turning climate datasets into community resources. In creating ESG-II, we will integrate and extend a range of Grid and collaboratory technologies, including the DODS remote access protocols for environmental data, Globus Toolkit technologies for authentication, resource discovery, and resource access, and Data Grid technologies developed in other projects. We will develop new technologies for (1) creating and operating "filtering servers" capable of performing sophisticated analyses, and (2) delivering results to users. In so doing, we will simultaneously contribute to climate science and advance the state of the art in collaboratory technology. We expect our results to be useful to numerous other DOE projects. The three-year R&D program will be undertaken by a talented and experienced team of computer scientists at five laboratories (ANL, LBNL, LLNL, NCAR, ORNL) and one university (ISI), working in close collaboration with climate scientists at several sites.
Shi, Xiaohu; Zhang, Jingfen; He, Zhiquan; Shang, Yi; Xu, Dong
2011-09-01
One of the major challenges in protein tertiary structure prediction is structure quality assessment. In many cases, protein structure prediction tools generate good structural models, but fail to select the best models from a huge number of candidates as the final output. In this study, we developed a sampling-based machine-learning method to rank protein structural models by integrating multiple scores and features. First, features such as predicted secondary structure, solvent accessibility and residue-residue contact information are integrated by two Radial Basis Function (RBF) models trained from different datasets. Then, the two RBF scores and five selected scoring functions developed by others, i.e., Opus-CA, Opus-PSP, DFIRE, RAPDF, and Cheng Score, are synthesized by a sampling method. Finally, another integrated RBF model ranks the structural models according to the features of the sampling distribution. We tested the proposed method using two different datasets, including the CASP server prediction models of all CASP8 targets and a set of models generated by our in-house software MUFOLD. The test results show that our method outperforms any individual scoring function on both best-model selection and overall correlation between the predicted ranking and the actual ranking of structural quality.
NASA Astrophysics Data System (ADS)
Ybanez, R. L.; Lagmay, A. M. A.; David, C. P.
2016-12-01
With climatological hazards increasing globally, the Philippines is listed as one of the most vulnerable countries in the world due to its location in the Western Pacific. Flood hazard mapping and modelling is one of the responses by local government and research institutions to help prepare for and mitigate the effects of the flood hazards that constantly threaten towns and cities in floodplains during the 6-month rainy season. Digital elevation models, which serve as the most important dataset used in 2D flood modelling, are limited in the Philippines, and testing is needed to determine which of the few available would work best for flood hazard mapping and modelling. Two-dimensional GIS-based flood modelling with the flood-routing software FLO-2D was conducted using three different available DEMs: the ASTER GDEM, the SRTM GDEM, and the locally available IfSAR DTM. With all other parameters kept uniform (resolution, soil parameters, rainfall amount, and surface roughness), the three models were run over a 129 km2 watershed with only the base DEM varying. The output flood hazard maps were compared on the basis of their flood distribution, extent, and depth. The ASTER and SRTM GDEMs contained too much error and noise, which manifested as dissipated and dissolved hazard areas in the lower watershed where clearly delineated flood hazards should be present. Noise in the two datasets is clearly visible as erratic mounds in the floodplain. The only dataset that produced a feasible flood hazard map is the IfSAR DTM, which delineates flood hazard areas clearly and properly. Despite the published resolution and accuracy of ASTER and SRTM, their use in GIS-based flood modelling would be unreliable. Although not as accessible, only IfSAR or better datasets should be used for creating secondary products from these base DEM datasets. For developing countries, which are most prone to hazards but have limited choices of basemaps for hazard studies, caution must be taken in the use of globally available GDEMs, and higher-resolution DEMs should always be sought.
Feature Representations for Neuromorphic Audio Spike Streams.
Anumula, Jithendar; Neil, Daniel; Delbruck, Tobi; Liu, Shih-Chii
2018-01-01
Event-driven neuromorphic spiking sensors such as the silicon retina and the silicon cochlea encode the external sensory stimuli as asynchronous streams of spikes across different channels or pixels. Combining state-of-the-art deep neural networks with the asynchronous outputs of these sensors has produced encouraging results on some datasets but remains challenging. While the lack of effective spiking networks to process the spike streams is one reason, the other reason is that the pre-processing methods required to convert the spike streams to the frame-based features needed by the deep networks still require further investigation. This work investigates the effectiveness of synchronous and asynchronous frame-based features generated using spike count and constant event binning, in combination with a recurrent neural network, for solving a classification task on the N-TIDIGITS18 dataset. This spike-based dataset consists of recordings from the Dynamic Audio Sensor, a spiking silicon cochlea sensor, in response to the TIDIGITS audio dataset. We also propose a new pre-processing method which applies an exponential kernel on the output cochlea spikes so that the interspike timing information is better preserved. The results from the N-TIDIGITS18 dataset show that the exponential features perform better than the spike count features, with over 91% accuracy on the digit classification task. This accuracy corresponds to an improvement of at least 2.5% over the use of spike count features, establishing a new state of the art for this dataset.
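A minimal sketch of the exponential-kernel idea (each spike contributes an exponentially decaying trace, sampled at frame boundaries) is shown below with numpy; the channel count, frame length, and time constant are illustrative assumptions, not the paper's parameters.

```python
# Illustrative exponential-kernel framing of a spike stream (numpy sketch).
# Channel count, frame size, and decay time constant are assumptions.
import numpy as np

def exponential_features(spike_times, spike_channels, n_channels=64,
                         frame_dt=0.005, tau=0.010, duration=1.0):
    """Sample, at each frame boundary, an exponentially decaying trace per channel."""
    n_frames = int(duration / frame_dt)
    frame_edges = (np.arange(n_frames) + 1) * frame_dt
    feats = np.zeros((n_frames, n_channels))
    for t, ch in zip(spike_times, spike_channels):
        # every frame boundary after the spike sees an exponentially decayed trace
        later = frame_edges >= t
        feats[later, ch] += np.exp(-(frame_edges[later] - t) / tau)
    return feats

# Synthetic spike stream: random spike times on random cochlea channels.
rng = np.random.default_rng(0)
times = np.sort(rng.uniform(0, 1.0, size=2000))
channels = rng.integers(0, 64, size=2000)
features = exponential_features(times, channels)   # (n_frames, n_channels)
print(features.shape)
```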
NASA Technical Reports Server (NTRS)
Gottschalck, Jon; Meng, Jesse; Rodel, Matt; Houser, Paul
2005-01-01
Land surface models (LSMs) are computer programs, similar to weather and climate prediction models, which simulate the stocks and fluxes of water (including soil moisture, snow, evaporation, and runoff) and energy (including the temperature of and sensible heat released from the soil) after they arrive on the land surface as precipitation and sunlight. It is not currently possible to measure all of the variables of interest everywhere on Earth with sufficient accuracy and space-time resolution. Hence LSMs have been developed to integrate the available observations with our understanding of the physical processes involved, using powerful computers, in order to map these stocks and fluxes as they change in time. The maps are used to improve weather forecasts, support water resources and agricultural applications, and study the Earth's water cycle and climate variability. NASA's Global Land Data Assimilation System (GLDAS) project facilitates testing of several different LSMs with a variety of input datasets (e.g., precipitation, plant type). Precipitation is arguably the most important input to LSMs. Many precipitation datasets have been produced using satellite and rain gauge observations and weather forecast models. In this study, seven different global precipitation datasets were evaluated over the United States, where dense rain gauge networks contribute to reliable precipitation maps. We then used the seven datasets as inputs to GLDAS simulations, so that we could diagnose their impacts on output stocks and fluxes of water. In terms of totals, the Climate Prediction Center (CPC) Merged Analysis of Precipitation (CMAP) had the closest agreement with the US rain gauge dataset for all seasons except winter. The CMAP precipitation was also the most closely correlated in time with the rain gauge data during spring, fall, and winter, while the satellite-based estimates performed best in summer. The GLDAS simulations revealed that modeled soil moisture is highly sensitive to precipitation, with differences in spring and summer as large as 45% depending on the choice of precipitation input.
Shi, X. [Oak Ridge National Laboratory, U.S. Department of Energy, Oak Ridge, Tennessee, U.S.A.; Thornton, P. E. [Oak Ridge National Laboratory, U.S. Department of Energy, Oak Ridge, Tennessee, U.S.A.; Ricciuto, D. M. [Oak Ridge National Laboratory, U.S. Department of Energy, Oak Ridge, Tennessee, U.S.A.; Hanson, P. J. [Oak Ridge National Laboratory, U.S. Department of Energy, Oak Ridge, Tennessee, U.S.A.; Mao, J. [Oak Ridge National Laboratory, U.S. Department of Energy, Oak Ridge, Tennessee, U.S.A.; Sebestyen, S. [Oak Ridge National Laboratory, U.S. Department of Energy, Oak Ridge, Tennessee, U.S.A.; Griffiths, N. A. [Oak Ridge National Laboratory, U.S. Department of Energy, Oak Ridge, Tennessee, U.S.A.; Bisht, G. [Oak Ridge National Laboratory, U.S. Department of Energy, Oak Ridge, Tennessee, U.S.A.
2016-09-01
Here we provide model code, inputs, outputs and evaluation datasets for a new configuration of the Community Land Model (CLM) for SPRUCE, which includes a fully prognostic water table calculation. Our structural and process changes to CLM focus on the modifications needed to represent the hydrologic cycle of bog environments with perched water tables, as well as the distinct hydrologic dynamics and vegetation communities of the raised hummock and sunken hollow microtopography characteristic of SPRUCE and other peatland bogs. The modified model was parameterized and independently evaluated against observations from an ombrotrophic raised-dome bog in northern Minnesota (S1-Bog), the site for the Spruce and Peatland Responses Under Climatic and Environmental Change (SPRUCE) experiment.
NASA Astrophysics Data System (ADS)
Hermann, A. J.; Moore, C.; Soreide, N. N.
2002-12-01
Ocean circulation is irrefutably three dimensional, and powerful new measurement technologies and numerical models promise to expand our three-dimensional knowledge of the dynamics further each year. Yet, most ocean data and model output is still viewed using two-dimensional maps. Immersive visualization techniques allow the investigator to view their data as a three dimensional world of surfaces and vectors which evolves through time. The experience is not unlike holding a part of the ocean basin in one's hand, turning and examining it from different angles. While immersive, three dimensional visualization has been possible for at least a decade, the technology was until recently inaccessible (both physically and financially) for most researchers. It is not yet fully appreciated by practicing oceanographers how new, inexpensive computing hardware and software (e.g. graphics cards and controllers designed for the huge PC gaming market) can be employed for immersive, three dimensional, color visualization of their increasingly huge datasets and model output. In fact, the latest developments allow immersive visualization through web servers, giving scientists the ability to "fly through" three-dimensional data stored half a world away. Here we explore what additional insight is gained through immersive visualization, describe how scientists of very modest means can easily avail themselves of the latest technology, and demonstrate its implementation on a web server for Pacific Ocean model output.
NASA Astrophysics Data System (ADS)
Tang, U. W.; Wang, Z. S.
2008-10-01
Each city has its unique urban form. The importance of urban form on sustainable development has been recognized in recent years. Traditionally, air quality modelling in a city is in a mesoscale with grid resolution of kilometers, regardless of its urban form. This paper introduces a GIS-based air quality and noise model system developed to study the built environment of highly compact urban forms. Compared with traditional mesoscale air quality model system, the present model system has a higher spatial resolution down to individual buildings along both sides of the street. Applying the developed model system in the Macao Peninsula with highly compact urban forms, the average spatial resolution of input and output data is as high as 174 receptor points per km2. Based on this input/output dataset with a high spatial resolution, this study shows that even the highly compact urban forms can be fragmented into a very small geographic scale of less than 3 km2. This is due to the significant temporal variation of urban development. The variation of urban form in each fragment in turn affects air dispersion, traffic condition, and thus air quality and noise in a measurable scale.
ERIC Educational Resources Information Center
Limniou, Maria; Downes, John J.; Maskell, Simon
2015-01-01
Nowadays, the use of datasets is of crucial importance for the advancement of educational research. Specifically in the field of Higher Education, many researchers might share through online data repositories their research outputs in order for data to be reusable, accessible and accountable to educational community. The aim of this paper is to…
Cesari, Daniela; Amato, F; Pandolfi, M; Alastuey, A; Querol, X; Contini, D
2016-08-01
Source apportionment of aerosol is an important approach for investigating aerosol formation and transformation processes, for assessing appropriate mitigation strategies, and for investigating causes of non-compliance with air quality standards (Directive 2008/50/CE). Receptor models (RMs), based on the chemical composition of aerosol measured at specific sites, are a useful and widely used tool for performing source apportionment. However, an analysis of available studies in the scientific literature reveals heterogeneities in the approaches used, in terms of "working variables" such as the number of samples in the dataset and the number of chemical species used, as well as in the modeling tools used. In this work, an inter-comparison of PM10 source apportionment results obtained at three European measurement sites is presented, using two receptor models: principal component analysis coupled with multi-linear regression analysis (PCA-MLRA) and positive matrix factorization (PMF). The inter-comparison focuses on source identification, quantification of source contributions to PM10, robustness of the results, and how these are influenced by the number of chemical species available in the datasets. Results show very similar component/factor profiles identified by PCA and PMF, with some discrepancies in the number of factors. The PMF model appears to be more suitable than PCA for separating secondary sulfate and secondary nitrate, at least in the datasets analyzed. Further, some difficulties were observed with PCA in separating industrial and heavy oil combustion contributions. Common to all sites, the crustal contributions found with PCA were larger than those found with PMF, and the secondary inorganic aerosol contributions found by PCA were lower than those found by PMF. Site-dependent differences were also observed for traffic and marine contributions. The inter-comparison of source apportionment performed on complete datasets (using the full range of available chemical species) and incomplete datasets (with a reduced number of chemical species) made it possible to investigate the sensitivity of source apportionment (SA) results to the working variables used in the RMs. Results show that, at both sites, the profiles and the contributions of the different sources calculated with PMF are comparable within the estimated uncertainties, indicating a good stability and robustness of the PMF results. In contrast, PCA outputs are more sensitive to the chemical species present in the datasets: in PCA, the crustal contributions are higher and the traffic contributions significantly lower for the incomplete datasets.
Benchmarking Spike-Based Visual Recognition: A Dataset and Evaluation
Liu, Qian; Pineda-García, Garibaldi; Stromatias, Evangelos; Serrano-Gotarredona, Teresa; Furber, Steve B.
2016-01-01
Today, increasing attention is being paid to research into spike-based neural computation both to gain a better understanding of the brain and to explore biologically-inspired computation. Within this field, the primate visual pathway and its hierarchical organization have been extensively studied. Spiking Neural Networks (SNNs), inspired by the understanding of observed biological structure and function, have been successfully applied to visual recognition and classification tasks. In addition, implementations on neuromorphic hardware have enabled large-scale networks to run in (or even faster than) real time, making spike-based neural vision processing accessible on mobile robots. Neuromorphic sensors such as silicon retinas are able to feed such mobile systems with real-time visual stimuli. A new set of vision benchmarks for spike-based neural processing is now needed to measure progress quantitatively within this rapidly advancing field. We propose that a large dataset of spike-based visual stimuli is needed to provide meaningful comparisons between different systems, and a corresponding evaluation methodology is also required to measure the performance of SNN models and their hardware implementations. In this paper we first propose an initial NE (Neuromorphic Engineering) dataset based on standard computer vision benchmarks and uses digits from the MNIST database. This dataset is compatible with the state of current research on spike-based image recognition. The corresponding spike trains are produced using a range of techniques: rate-based Poisson spike generation, rank order encoding, and recorded output from a silicon retina with both flashing and oscillating input stimuli. In addition, a complementary evaluation methodology is presented to assess both model-level and hardware-level performance. Finally, we demonstrate the use of the dataset and the evaluation methodology using two SNN models to validate the performance of the models and their hardware implementations. With this dataset we hope to (1) promote meaningful comparison between algorithms in the field of neural computation, (2) allow comparison with conventional image recognition methods, (3) provide an assessment of the state of the art in spike-based visual recognition, and (4) help researchers identify future directions and advance the field. PMID:27853419
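A minimal sketch of rate-based Poisson spike generation (one of the encoding techniques listed above) applied to image pixel intensities is shown below; the maximum firing rate, time step, and duration are illustrative assumptions.

```python
# Illustrative rate-based Poisson spike generation for an image (numpy sketch).
# The maximum firing rate, time step, and duration are assumptions.
import numpy as np

def poisson_spike_trains(image, max_rate_hz=100.0, dt=0.001, duration=1.0, seed=0):
    """Each pixel fires as an independent Poisson process with a rate
    proportional to its intensity (intensity in [0, 1])."""
    rng = np.random.default_rng(seed)
    n_steps = int(duration / dt)
    rates = image.ravel() * max_rate_hz                   # spikes per second
    p_spike = rates * dt                                  # probability per time step
    return rng.random((n_steps, rates.size)) < p_spike    # boolean spike raster

# Synthetic 28x28 "digit" standing in for an MNIST image.
image = np.random.default_rng(1).random((28, 28))
raster = poisson_spike_trains(image)
print(raster.shape, raster.sum(), "spikes in total")
```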
Extra-Tropical Cyclones at Climate Scales: Comparing Models to Observations
NASA Astrophysics Data System (ADS)
Tselioudis, G.; Bauer, M.; Rossow, W.
2009-04-01
Climate is often defined as the accumulation of weather, and weather is not the concern of climate models. Justification for this latter sentiment has long been hidden behind coarse model resolutions and blunt validation tools based on climatological maps. The spatial-temporal resolutions of today's climate models and observations are converging onto meteorological scales, however, which means that with the correct tools we can test the largely unproven assumption that climate model weather is correct enough that its accumulation results in a robust climate simulation. Towards this effort we introduce a new tool for extracting detailed cyclone statistics from observations and climate model output. These include the usual cyclone characteristics (centers, tracks), but also adaptive cyclone-centric composites. We have created a novel dataset, the MAP Climatology of Mid-latitude Storminess (MCMS), which provides a detailed 6 hourly assessment of the areas under the influence of mid-latitude cyclones, using a search algorithm that delimits the boundaries of each system from the outer-most closed SLP contour. Using this we then extract composites of cloud, radiation, and precipitation properties from sources such as ISCCP and GPCP to create a large comparative dataset for climate model validation. A demonstration of the potential usefulness of these tools in process-based climate model evaluation studies will be shown.
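As a rough illustration of the first step in building such cyclone statistics (locating candidate cyclone centers as local minima of sea-level pressure), a small numpy/scipy sketch is given below; the neighbourhood size and depth threshold are illustrative assumptions, and this is not the MCMS contour-based search algorithm.

```python
# Illustrative detection of candidate cyclone centers as local SLP minima.
# Neighbourhood size and depth threshold are assumptions; the MCMS algorithm
# additionally delimits each system by its outermost closed SLP contour.
import numpy as np
from scipy.ndimage import minimum_filter

def find_slp_minima(slp, size=9, max_center_pressure=1005.0):
    """Return (row, col) indices of grid points that are local minima of SLP."""
    local_min = slp == minimum_filter(slp, size=size, mode="nearest")
    deep_enough = slp < max_center_pressure      # hPa, crude depth criterion
    return np.argwhere(local_min & deep_enough)

# Synthetic SLP field (hPa) with two idealized lows.
lat = np.linspace(-80, 80, 160)[:, None]
lon = np.linspace(0, 360, 360)[None, :]
slp = (1013.0
       - 25.0 * np.exp(-(((lat - 50) / 8) ** 2 + ((lon - 200) / 12) ** 2))
       - 18.0 * np.exp(-(((lat + 55) / 8) ** 2 + ((lon - 80) / 12) ** 2)))
print(find_slp_minima(slp))
```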
Zajac, Zuzanna; Stith, Bradley M.; Bowling, Andrea C.; Langtimm, Catherine A.; Swain, Eric D.
2015-01-01
Habitat suitability index (HSI) models are commonly used to predict habitat quality and species distributions and are used to develop biological surveys, assess reserve and management priorities, and anticipate possible change under different management or climate change scenarios. Important management decisions may be based on model results, often without a clear understanding of the level of uncertainty associated with model outputs. We present an integrated methodology to assess the propagation of uncertainty from both inputs and structure of the HSI models on model outputs (uncertainty analysis: UA) and relative importance of uncertain model inputs and their interactions on the model output uncertainty (global sensitivity analysis: GSA). We illustrate the GSA/UA framework using simulated hydrology input data from a hydrodynamic model representing sea level changes and HSI models for two species of submerged aquatic vegetation (SAV) in southwest Everglades National Park: Vallisneria americana (tape grass) and Halodule wrightii (shoal grass). We found considerable spatial variation in uncertainty for both species, but distributions of HSI scores still allowed discrimination of sites with good versus poor conditions. Ranking of input parameter sensitivities also varied spatially for both species, with high habitat quality sites showing higher sensitivity to different parameters than low-quality sites. HSI models may be especially useful when species distribution data are unavailable, providing means of exploiting widely available environmental datasets to model past, current, and future habitat conditions. The GSA/UA approach provides a general method for better understanding HSI model dynamics, the spatial and temporal variation in uncertainties, and the parameters that contribute most to model uncertainty. Including an uncertainty and sensitivity analysis in modeling efforts as part of the decision-making framework will result in better-informed, more robust decisions.
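To illustrate the variance-based global sensitivity analysis component in a generic way, the sketch below uses the SALib package (an assumption; the paper does not specify this library) with a toy two-parameter habitat suitability function; the parameter names, ranges, and toy model are not the study's actual HSI models or inputs.

```python
# Illustrative Sobol global sensitivity analysis with SALib on a toy HSI-like
# function. Library choice, parameter ranges, and the toy model are assumptions.
import numpy as np
from SALib.sample import saltelli
from SALib.analyze import sobol

problem = {
    "num_vars": 2,
    "names": ["salinity", "depth"],
    "bounds": [[0.0, 35.0], [0.0, 3.0]],
}

def toy_hsi(x):
    salinity, depth = x[:, 0], x[:, 1]
    # suitability highest at low salinity and intermediate depth (toy shape)
    return np.exp(-salinity / 15.0) * np.exp(-((depth - 1.0) ** 2) / 0.5)

params = saltelli.sample(problem, 1024)       # Saltelli sampling design
y = toy_hsi(params)
indices = sobol.analyze(problem, y)
print(indices["S1"])   # first-order sensitivity indices
print(indices["ST"])   # total-order indices (include interactions)
```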
van der Krieke, Lian; Emerencia, Ando C; Bos, Elisabeth H; Rosmalen, Judith Gm; Riese, Harriëtte; Aiello, Marco; Sytema, Sjoerd; de Jonge, Peter
2015-08-07
Health promotion can be tailored by combining ecological momentary assessments (EMA) with time series analysis. This combined method allows for studying the temporal order of dynamic relationships among variables, which may provide concrete indications for intervention. However, application of this method in health care practice is hampered because analyses are conducted manually and advanced statistical expertise is required. This study aims to show how this limitation can be overcome by introducing automated vector autoregressive modeling (VAR) of EMA data and to evaluate its feasibility through comparisons with results of previously published manual analyses. We developed a Web-based open source application, called AutoVAR, which automates time series analyses of EMA data and provides output that is intended to be interpretable by nonexperts. The statistical technique we used was VAR. AutoVAR tests and evaluates all possible VAR models within a given combinatorial search space and summarizes their results, thereby replacing the researcher's tasks of conducting the analysis, making an informed selection of models, and choosing the best model. We compared the output of AutoVAR to the output of a previously published manual analysis (n=4). An illustrative example consisting of 4 analyses was provided. Compared to the manual output, the AutoVAR output presents similar model characteristics and statistical results in terms of the Akaike information criterion, the Bayesian information criterion, and the test statistic of the Granger causality test. Results suggest that automated analysis and interpretation of time series is feasible. Compared to a manual procedure, the automated procedure is more robust and can save days of time. These findings may pave the way for using time series analysis for health promotion on a larger scale. AutoVAR was evaluated using the results of a previously conducted manual analysis. Analysis of additional datasets is needed in order to validate and refine the application for general use.
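The sketch below shows, with the statsmodels library (an assumption; AutoVAR is a separate open source application), the kind of VAR fitting, information-criterion-based model selection, and Granger causality testing that the automated procedure performs; the variable names are hypothetical EMA items and the data are synthetic.

```python
# Illustrative VAR workflow with statsmodels: fit, select lag order by AIC,
# and run a Granger causality test. Variable names are hypothetical EMA items;
# this is not the AutoVAR application itself.
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(0)
n = 120                                  # e.g. 120 momentary assessments
mood = np.zeros(n)
activity = np.zeros(n)
for t in range(1, n):                    # synthetic coupled series
    activity[t] = 0.5 * activity[t - 1] + rng.normal(scale=1.0)
    mood[t] = 0.4 * mood[t - 1] + 0.3 * activity[t - 1] + rng.normal(scale=1.0)

data = pd.DataFrame({"mood": mood, "activity": activity})

model = VAR(data)
results = model.fit(maxlags=5, ic="aic")   # lag order chosen by AIC
print(results.summary())
print("AIC:", results.aic, "BIC:", results.bic)

# Does 'activity' Granger-cause 'mood'?
gc = results.test_causality("mood", ["activity"], kind="f")
print(gc.summary())
```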
Emerencia, Ando C; Bos, Elisabeth H; Rosmalen, Judith GM; Riese, Harriëtte; Aiello, Marco; Sytema, Sjoerd; de Jonge, Peter
2015-01-01
Background Health promotion can be tailored by combining ecological momentary assessments (EMA) with time series analysis. This combined method allows for studying the temporal order of dynamic relationships among variables, which may provide concrete indications for intervention. However, application of this method in health care practice is hampered because analyses are conducted manually and advanced statistical expertise is required. Objective This study aims to show how this limitation can be overcome by introducing automated vector autoregressive modeling (VAR) of EMA data and to evaluate its feasibility through comparisons with results of previously published manual analyses. Methods We developed a Web-based open source application, called AutoVAR, which automates time series analyses of EMA data and provides output that is intended to be interpretable by nonexperts. The statistical technique we used was VAR. AutoVAR tests and evaluates all possible VAR models within a given combinatorial search space and summarizes their results, thereby replacing the researcher’s tasks of conducting the analysis, making an informed selection of models, and choosing the best model. We compared the output of AutoVAR to the output of a previously published manual analysis (n=4). Results An illustrative example consisting of 4 analyses was provided. Compared to the manual output, the AutoVAR output presents similar model characteristics and statistical results in terms of the Akaike information criterion, the Bayesian information criterion, and the test statistic of the Granger causality test. Conclusions Results suggest that automated analysis and interpretation of time series is feasible. Compared to a manual procedure, the automated procedure is more robust and can save days of time. These findings may pave the way for using time series analysis for health promotion on a larger scale. AutoVAR was evaluated using the results of a previously conducted manual analysis. Analysis of additional datasets is needed in order to validate and refine the application for general use. PMID:26254160
NASA Technical Reports Server (NTRS)
Peters-Lidard, Christa D.; Mocko, David; Kumar, Sujay; Ek, Michael; Xia, Youlong; Dong, Jiarui
2012-01-01
Both NLDAS Phase 1 (1996-2007) and Phase 2 (1979-present) datasets have been evaluated against in situ observational datasets, and NLDAS forcings and outputs are used by a wide variety of users. Drought indices and drought monitoring from NLDAS were recently examined by Mo et al. (2010) and Sheffield et al. (2010). In this poster, we will present results analyzing NLDAS Phase 2 forcings and outputs for 3 North American Case studies being analyzed as part of the NOAA MAPP Drought Task Force: (1) Western US drought (1998- 2004); (2) plains/southeast US drought (2006-2007); and (3) Current Texas-Mexico drought (2011-). We will examine percentiles of soil moisture consistent with the NLDAS drought monitor.
The Wind Integration National Dataset (WIND) toolkit (Presentation)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Caroline Draxl: NREL
2014-01-01
Regional wind integration studies require detailed wind power output data at many locations to perform simulations of how the power system will operate under high penetration scenarios. The wind datasets that serve as inputs into the study must realistically reflect the ramping characteristics, spatial and temporal correlations, and capacity factors of the simulated wind plants, as well as be time-synchronized with available load profiles. As described in this presentation, the WIND Toolkit fulfills these requirements by providing a state-of-the-art national (US) wind resource, power production and forecast dataset.
3D fully convolutional networks for subcortical segmentation in MRI: A large-scale study.
Dolz, Jose; Desrosiers, Christian; Ben Ayed, Ismail
2018-04-15
This study investigates a 3D and fully convolutional neural network (CNN) for subcortical brain structure segmentation in MRI. 3D CNN architectures have been generally avoided due to their computational and memory requirements during inference. We address the problem via small kernels, allowing deeper architectures. We further model both local and global context by embedding intermediate-layer outputs in the final prediction, which encourages consistency between features extracted at different scales and embeds fine-grained information directly in the segmentation process. Our model is efficiently trained end-to-end on a graphics processing unit (GPU), in a single stage, exploiting the dense inference capabilities of fully CNNs. We performed comprehensive experiments over two publicly available datasets. First, we demonstrate a state-of-the-art performance on the ISBR dataset. Then, we report a large-scale multi-site evaluation over 1112 unregistered subject datasets acquired from 17 different sites (ABIDE dataset), with ages ranging from 7 to 64 years, showing that our method is robust to various acquisition protocols, demographics and clinical factors. Our method yielded segmentations that are highly consistent with a standard atlas-based approach, while running in a fraction of the time needed by atlas-based methods and avoiding registration/normalization steps. This makes it convenient for massive multi-site neuroanatomical imaging studies. To the best of our knowledge, our work is the first to study subcortical structure segmentation on such large-scale and heterogeneous data. Copyright © 2017 Elsevier Inc. All rights reserved.
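A minimal PyTorch sketch of the core idea (stacking 3D convolutions with small kernels to build a deep, fully convolutional network producing dense voxel-wise predictions) is given below; the channel counts, depth, and number of output classes are illustrative assumptions, not the authors' architecture.

```python
# Illustrative fully convolutional 3D segmentation network with small (3x3x3)
# kernels, in PyTorch. Channel counts, depth, and class count are assumptions.
import torch
import torch.nn as nn

class Small3DFCN(nn.Module):
    def __init__(self, in_channels=1, n_classes=15):
        super().__init__()
        def block(cin, cout):
            # small kernels keep memory manageable and allow deeper stacks
            return nn.Sequential(
                nn.Conv3d(cin, cout, kernel_size=3, padding=1),
                nn.BatchNorm3d(cout),
                nn.ReLU(inplace=True),
            )
        self.features = nn.Sequential(
            block(in_channels, 16), block(16, 32), block(32, 64), block(64, 64),
        )
        self.classifier = nn.Conv3d(64, n_classes, kernel_size=1)  # voxel-wise output

    def forward(self, x):                # x: (batch, channels, D, H, W)
        return self.classifier(self.features(x))   # (batch, n_classes, D, H, W)

# Dense inference on a small sub-volume (synthetic tensor for illustration).
net = Small3DFCN()
vol = torch.randn(1, 1, 32, 32, 32)
logits = net(vol)
print(logits.shape)      # torch.Size([1, 15, 32, 32, 32])
```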
NASA Astrophysics Data System (ADS)
Shin, Seulki; Moon, Yong-Jae; Chu, Hyoungseok
2017-08-01
As the application of deep-learning methods has succeeded in various fields, they have a high potential to be applied to space weather forecasting. The convolutional neural network, one of the deep-learning methods, is specialized in image recognition. In this study, we apply the AlexNet architecture, the winner of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2012, to the forecast of daily solar flare occurrence using the MatConvNet software of MATLAB. Our input images are SOHO/MDI, EIT 195Å, and 304Å from January 1996 to December 2010, and our outputs are yes or no of flare occurrence. We select the training dataset from Jan 1996 to Dec 2000 and from Jan 2003 to Dec 2008. The testing dataset is chosen from Jan 2001 to Dec 2002 and from Jan 2009 to Dec 2010 in order to consider the solar cycle effect. Within the training dataset, we randomly select one fifth of the training data as a validation dataset to avoid the overfitting problem. Our model successfully forecasts flare occurrence with a probability of detection (POD) of about 0.90 for common flares (C-, M-, and X-class). While the POD for major flares (M- and X-class) is 0.96, the false alarm rate (FAR) also scores relatively high (0.60). We also present several statistical parameters such as the critical success index (CSI) and true skill statistics (TSS). Our model can be applied immediately to an automatic forecasting service when image data are available.
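For reference, the categorical skill scores quoted here (POD, FAR, CSI, TSS) can be computed from a 2x2 contingency table as in the short sketch below; the example counts are hypothetical, not the paper's verification data.

```python
# Categorical forecast skill scores from a 2x2 contingency table.
# hits: flare forecast and observed; false_alarms: forecast but not observed;
# misses: observed but not forecast; correct_negatives: neither.
def skill_scores(hits, false_alarms, misses, correct_negatives):
    pod = hits / (hits + misses)                              # probability of detection
    far = false_alarms / (hits + false_alarms)                # false alarm ratio
    csi = hits / (hits + false_alarms + misses)               # critical success index
    pofd = false_alarms / (false_alarms + correct_negatives)  # prob. of false detection
    tss = pod - pofd                                          # true skill statistic
    return {"POD": pod, "FAR": far, "CSI": csi, "TSS": tss}

# Hypothetical counts for illustration only.
print(skill_scores(hits=180, false_alarms=270, misses=20, correct_negatives=260))
```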
NASA Astrophysics Data System (ADS)
Yiannikopoulou, I.; Philippopoulos, K.; Deligiorgi, D.
2012-04-01
The vertical thermal structure of the atmosphere is defined by a combination of dynamic and radiation transfer processes and plays an important role in describing the meteorological conditions at local scales. The scope of this work is to develop and quantify the predictive ability of a hybrid dynamic-statistical downscaling procedure to estimate the vertical profile of ambient temperature at finer spatial scales. The study focuses on the warm period of the year (June - August) and the method is applied to an urban coastal site (Hellinikon), located in the eastern Mediterranean. The two-step methodology initially involves the dynamic downscaling of coarse resolution climate data via the RegCM4.0 regional climate model and subsequently the statistical downscaling of the modeled outputs by developing and training site-specific artificial neural networks (ANN). The 2.5° x 2.5° gridded NCEP-DOE Reanalysis 2 dataset is used as initial and boundary conditions for the dynamic downscaling element of the methodology, which enhances the regional representativeness of the dataset to 20 km and provides modeled fields at 18 vertical levels. The regional climate modeling results are compared against the upper-air Hellinikon radiosonde observations, and the mean absolute error (MAE) is calculated between the four grid point values nearest to the station and the ambient temperature at the standard and significant pressure levels. The statistical downscaling element of the methodology consists of an ensemble of ANN models, one for each pressure level, which are trained separately and employ the regional scale RegCM4.0 output. The ANN models are theoretically capable of estimating any measurable input-output function to any desired degree of accuracy. In this study they are used as non-linear function approximators for identifying the relationship between a number of predictor variables and the ambient temperature at the various vertical levels. Insight into the statistically derived input-output transfer functions is obtained by utilizing the ANN weights method, which quantifies the relative importance of the predictor variables in the estimation procedure. The overall downscaling performance evaluation incorporates a set of correlation and statistical measures along with appropriate statistical tests. The hybrid downscaling method presented in this work can be extended to other locations by training different site-specific ANN models, and the results, depending on the application, can be used to assist the understanding of past, present and future climatology. This research has been co-financed by the European Union and Greek national funds through the Operational Program "Education and Lifelong Learning" of the National Strategic Reference Framework (NSRF) - Research Funding Program: Heracleitus II: Investing in knowledge society through the European Social Fund.
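A minimal sketch of the statistical element (a per-pressure-level neural network mapping regional-model predictors to observed temperature at that level) is shown below using scikit-learn (an assumption; the study does not specify its ANN software), with synthetic stand-in data and a hypothetical predictor set.

```python
# Illustrative per-pressure-level ANN downscaling step with scikit-learn.
# Predictor set, network size, and the synthetic data are assumptions; one such
# model would be trained for each pressure level, as described above.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
n_days = 500
# Hypothetical predictors from the regional model at one pressure level,
# e.g. grid-point temperature, wind components, geopotential height.
X = rng.normal(size=(n_days, 4))
y = (15.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5 * X[:, 2] * X[:, 3]
     + rng.normal(scale=0.5, size=n_days))        # "observed" sounding temperature

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
ann = MLPRegressor(hidden_layer_sizes=(16,), max_iter=5000, random_state=0)
ann.fit(X_train, y_train)
print("MAE:", mean_absolute_error(y_test, ann.predict(X_test)))
```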
Development of Regional Wind Resource and Wind Plant Output Datasets for the Hawaiian Islands
DOE Office of Scientific and Technical Information (OSTI.GOV)
Manobianco, J.; Alonge, C.; Frank, J.
In March 2009, AWS Truepower was engaged by the National Renewable Energy Laboratory (NREL) to develop a set of wind resource and plant output data for the Hawaiian Islands. The objective of this project was to expand the methods and techniques employed in the Eastern Wind Integration and Transmission Study (EWITS) to include the state of Hawaii.
POWERLIB: SAS/IML Software for Computing Power in Multivariate Linear Models
Johnson, Jacqueline L.; Muller, Keith E.; Slaughter, James C.; Gurka, Matthew J.; Gribbin, Matthew J.; Simpson, Sean L.
2014-01-01
The POWERLIB SAS/IML software provides convenient power calculations for a wide range of multivariate linear models with Gaussian errors. The software includes the Box, Geisser-Greenhouse, Huynh-Feldt, and uncorrected tests in the “univariate” approach to repeated measures (UNIREP), the Hotelling-Lawley Trace, Pillai-Bartlett Trace, and Wilks Lambda tests in the “multivariate” approach (MULTIREP), as well as a limited but useful range of mixed models. The familiar univariate linear model with Gaussian errors is an important special case. For estimated covariance, the software provides confidence limits for the resulting estimated power. All power and confidence limit values can be output to a SAS dataset, which can be used to easily produce plots and tables for manuscripts. PMID:25400516
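POWERLIB itself is SAS/IML, but the basic calculation behind linear-model power (the probability that the test statistic exceeds its critical value under a noncentral F distribution) can be sketched generically. This Python/scipy fragment illustrates that idea only; it is not POWERLIB's interface.

```python
from scipy.stats import f, ncf

def f_test_power(ndf, ddf, noncentrality, alpha=0.05):
    """Power of an F test given numerator/denominator df and a noncentrality parameter."""
    f_crit = f.ppf(1.0 - alpha, ndf, ddf)                 # critical value under the null
    return 1.0 - ncf.cdf(f_crit, ndf, ddf, noncentrality)

# Example: a 2-df hypothesis with 27 error df and noncentrality 12
print(round(f_test_power(2, 27, 12.0), 3))
```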
A Field Guide to Extra-Tropical Cyclones: Comparing Models to Observations
NASA Astrophysics Data System (ADS)
Bauer, M.
2008-12-01
Climate, it is said, is the accumulation of weather, and weather is not the concern of climate models. Justification for the latter sentiment has long hidden behind coarse model resolutions and blunt validation tools based on climatological maps and the like. However, the spatial-temporal resolutions of today's models and observations are converging onto meteorological scales, which means that with the correct tools we can test the largely unproven assumption that climate model weather is correct enough, or at least lacks perverting biases, such that its accumulation does in fact result in a robust climate prediction. Towards this effort we introduce a new tool for extracting detailed cyclone statistics from climate model output. These include the usual cyclone distribution statistics (maps, histograms), but also adaptive cyclone-centric composites. We have also created a complementary dataset, The MAP Climatology of Mid-latitude Storminess (MCMS), which provides a detailed 6-hourly assessment of the areas under the influence of mid-latitude cyclones based on reanalysis products. Using this we then extract complementary composites from sources such as ISCCP and GPCP to create a large comparative dataset for climate model validation. A demonstration of the potential usefulness of these tools will be shown. dime.giss.nasa.gov/mcms/mcms.html
NASA GES DISC On-line Visualization and Analysis System for Gridded Remote Sensing Data
NASA Technical Reports Server (NTRS)
Leptoukh, Gregory G.; Berrick, S.; Rui, H.; Liu, Z.; Zhu, T.; Teng, W.; Shen, S.; Qin, J.
2005-01-01
The ability to use data stored in the current NASA Earth Observing System (EOS) archives for studying regional or global phenomena is highly dependent on having a detailed understanding of the data's internal structure and physical implementation. Gaining this understanding and applying it to data reduction is a time-consuming task that must be undertaken before the core investigation can begin. This is an especially difficult challenge when science objectives require users to deal with large multi-sensor data sets that are usually of different formats, structures, and resolutions. The NASA Goddard Earth Sciences Data and Information Services Center (GES DISC) has taken a major step towards meeting this challenge by developing an infrastructure with a Web interface that allows users to perform interactive analysis online without downloading any data, the GES-DISC Interactive Online Visualization and Analysis Infrastructure or "Giovanni." Giovanni provides interactive, online analysis tools for data users to facilitate their research. Several instances of this interface have been created to serve TRMM users, aerosol scientists, and ocean color and agriculture applications users. The first generation of these tools supports gridded data only. The user selects geophysical parameters, an area of interest, and a time period, and the system generates an output on screen in a matter of seconds. The currently available output options are: area plots averaged or accumulated over any available data period for any rectangular area; time plots of time series averaged over any rectangular area; Hovmoller plots (image views of any longitude-time and latitude-time cross sections); ASCII output for all plot types; and image animation for area plots. Another analysis suite deals with parameter intercomparison: scatter plots, temporal correlation maps, GIS-compatible outputs, etc. This allows users to focus on data content (i.e. science parameters) and eliminates the need for expensive learning, development and processing tasks that are redundantly incurred by an archive's user community. The current implementation utilizes the GrADS-DODS Server (GDS), and provides subsetting and analysis services across the Internet for any GrADS-readable dataset. The subsetting capability allows users to retrieve a specified temporal and/or spatial subdomain from a large dataset, eliminating the need to download everything simply to access a small relevant portion of a dataset. The analysis capability allows users to retrieve the results of an operation applied to one or more datasets on the server. We use this approach to read pre-processed binary files and/or to read and extract the needed parts directly from HDF or HDF-EOS files. These subsets then serve as inputs into GrADS analysis scripts. The system can be used in a wide variety of Earth science applications, including climate and weather event study and monitoring, and modeling, and it can be easily configured for new applications.
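For readers who want the equivalent of Giovanni's area-averaged time series or Hovmoller view on a local gridded file, the following xarray sketch shows the kind of operation performed server-side; the file name, variable name, and region are hypothetical.

```python
import numpy as np
import xarray as xr

# Hypothetical gridded file with dims (time, lat, lon) and a variable "precip"
ds = xr.open_dataset("gridded_precip.nc")

# Rectangular area and period of interest
region = ds["precip"].sel(lat=slice(-10, 10), lon=slice(20, 60),
                          time=slice("2003-01-01", "2003-12-31"))

# Time series averaged over the area (cosine-of-latitude weights for equal-area averaging)
weights = np.cos(np.deg2rad(region["lat"]))
area_mean = region.weighted(weights).mean(dim=("lat", "lon"))

# Hovmoller-style view: average over longitude only, leaving a (time, lat) section
hovmoller = region.mean(dim="lon")
```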
Hagopian, Raffi; Davidson, John R; Datta, Ruchira S; Samad, Bushra; Jarvis, Glen R; Sjölander, Kimmen
2010-07-01
We present the jump-start simultaneous alignment and tree construction using hidden Markov models (SATCHMO-JS) web server for simultaneous estimation of protein multiple sequence alignments (MSAs) and phylogenetic trees. The server takes as input a set of sequences in FASTA format, and outputs a phylogenetic tree and MSA; these can be viewed online or downloaded from the website. SATCHMO-JS is an extension of the SATCHMO algorithm, and employs a divide-and-conquer strategy to jump-start SATCHMO at a higher point in the phylogenetic tree, reducing the computational complexity of the progressive all-versus-all HMM-HMM scoring and alignment. Results on a benchmark dataset of 983 structurally aligned pairs from the PREFAB benchmark dataset show that SATCHMO-JS provides a statistically significant improvement in alignment accuracy over MUSCLE, Multiple Alignment using Fast Fourier Transform (MAFFT), ClustalW and the original SATCHMO algorithm. The SATCHMO-JS webserver is available at http://phylogenomics.berkeley.edu/satchmo-js. The datasets used in these experiments are available for download at http://phylogenomics.berkeley.edu/satchmo-js/supplementary/.
NASA Astrophysics Data System (ADS)
Abiri, Olufunminiyi; Twala, Bhekisipho
2017-08-01
In this paper, a multilayer feedforward neural network constitutive model with Bayesian regularization is developed for alloy 316L during high-strain-rate and high-temperature plastic deformation. The input variables are strain rate, temperature and strain, while the output is the flow stress of the material. The results show that the use of the Bayesian regularization technique reduces the potential for overfitting and overtraining, thereby improving the prediction quality of the model. The model predictions are in good agreement with experimental measurements. The measurement data used for network training and model comparison were taken from the relevant literature. The developed model is robust, as it generalizes to deformation conditions slightly below or above the training dataset.
Tropical Cyclone Information System
NASA Technical Reports Server (NTRS)
Li, P. Peggy; Knosp, Brian W.; Vu, Quoc A.; Yi, Chao; Hristova-Veleva, Svetla M.
2009-01-01
The JPL Tropical Cyclone Information System (TCIS) is a Web portal (http://tropicalcyclone.jpl.nasa.gov) that provides researchers with an extensive set of observed hurricane parameters together with large-scale and convection-resolving model outputs. It provides a comprehensive set of high-resolution satellite, airborne, and in-situ observations in both image and data formats. Large-scale datasets depict the surrounding environmental parameters such as SST (Sea Surface Temperature) and aerosol loading. Model outputs and analysis tools are provided to evaluate model performance and compare observations from different platforms. The system pertains to the thermodynamic and microphysical structure of the storm, the air-sea interaction processes, and the larger-scale environment as depicted by ocean heat content and the aerosol loading of the environment. Currently, the TCIS is populated with satellite observations of all tropical cyclones observed globally during 2005. There is a plan to extend the database both forward in time to the present and backward to 1998. The portal is powered by a MySQL database and an Apache/Tomcat Web server on a Linux system. The interactive graphical user interface is provided by Google Map.
NASA Astrophysics Data System (ADS)
Clark, E. P.; Cosgrove, B.; Salas, F.
2016-12-01
As a significant step forward to transform NOAA's water prediction services, NOAA plans to implement a new National Water Model (NWM) Version 1.0 in August 2016. A continental-scale water resources model, the NWM is an evolution of the WRF-Hydro architecture developed by the National Center for Atmospheric Research (NCAR). The NWM will provide analyses and forecasts of flow for the 2.7 million stream reaches nationwide in the National Hydrography Dataset Plus v2 (NHDPlusV2) jointly developed by the USGS and EPA. The NWM also produces high-resolution water budget variables of snow, soil moisture, and evapotranspiration on a 1-km grid. NOAA's stakeholders require additional decision support applications to be built on these data. The Geo-intelligence division of the Office of Water Prediction is building new products and services that integrate output from the NWM with geospatial datasets such as infrastructure and demographics to better estimate the impacts of dynamic water resource states on community resiliency. This presentation will detail the methods and underlying information used to produce prototype water resources intelligence that is timely, actionable and credible. Moreover, it will explore the NWM's capability to support sector-specific decision support services.
Assessing Ecosystem Model Performance in Semiarid Systems
NASA Astrophysics Data System (ADS)
Thomas, A.; Dietze, M.; Scott, R. L.; Biederman, J. A.
2017-12-01
In ecosystem process modelling, comparing outputs to benchmark datasets observed in the field is an important way to validate models, allowing the modelling community to track model performance over time and compare models at specific sites. Multi-model comparison projects as well as models themselves have largely been focused on temperate forests and similar biomes. Semiarid regions, on the other hand, are underrepresented in land surface and ecosystem modelling efforts, and yet will be disproportionately impacted by disturbances such as climate change due to their sensitivity to changes in the water balance. Benchmarking models at semiarid sites is an important step in assessing and improving models' suitability for predicting the impact of disturbance on semiarid ecosystems. In this study, several ecosystem models were compared at a semiarid grassland in southwestern Arizona using PEcAn, or the Predictive Ecosystem Analyzer, an open-source eco-informatics toolbox ideal for creating the repeatable model workflows necessary for benchmarking. Models included SIPNET, DALEC, JULES, ED2, GDAY, LPJ-GUESS, MAESPA, CLM, CABLE, and FATES. Comparison between model output and benchmarks such as net ecosystem exchange (NEE) tended to produce high root mean square error and low correlation coefficients, reflecting poor simulation of seasonality and the tendency for models to create much higher carbon sources than observed. These results indicate that ecosystem models do not currently adequately represent semiarid ecosystem processes.
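A minimal sketch of the benchmark comparison described above (RMSE, correlation and bias between modelled and observed NEE), assuming plain NumPy arrays rather than PEcAn's actual workflow objects:

```python
import numpy as np

def benchmark_stats(nee_model, nee_obs):
    """RMSE, Pearson correlation and mean bias between modelled and tower NEE."""
    nee_model = np.asarray(nee_model, dtype=float)
    nee_obs = np.asarray(nee_obs, dtype=float)
    ok = np.isfinite(nee_model) & np.isfinite(nee_obs)   # skip gaps in the flux record
    rmse = np.sqrt(np.mean((nee_model[ok] - nee_obs[ok]) ** 2))
    corr = np.corrcoef(nee_model[ok], nee_obs[ok])[0, 1]
    # Positive bias = model releases more carbon than observed (usual NEE sign convention)
    bias = np.mean(nee_model[ok] - nee_obs[ok])
    return rmse, corr, bias
```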
NASA Astrophysics Data System (ADS)
Jandt, Simon; Laagemaa, Priidik; Janssen, Frank
2014-05-01
The systematic and objective comparison between output from a numerical ocean model and a set of observations, called validation in the context of this presentation, is a beneficial activity at several stages, starting from early steps in model development and ending at the quality control of model-based products delivered to customers. Even though the importance of this kind of validation work is widely acknowledged, it is often not among the most popular tasks in ocean modelling. In order to ease the validation work, a comprehensive toolbox has been developed in the framework of the MyOcean-2 project. The objective of this toolbox is to carry out validation integrating different data sources, e.g. time-series at stations, vertical profiles, surface fields or along-track satellite data, with one single program call. The validation toolbox, implemented in MATLAB, features all parts of the validation process - ranging from read-in procedures of datasets to the graphical and numerical output of statistical metrics of the comparison. The basic idea is to have only one well-defined validation schedule for all applications, in which all parts of the validation process are executed. Each part, e.g. read-in procedures, forms a module in which all available functions of this particular part are collected. The interface between the functions, the module and the validation schedule is highly standardized. Functions of a module are set up for certain validation tasks; new functions can be implemented into the appropriate module without affecting the functionality of the toolbox. The functions are assigned for each validation task in user-specific settings, which are externally stored in so-called namelists and gather all information of the used datasets as well as paths and metadata. In the framework of the MyOcean-2 project the toolbox is frequently used to validate the forecast products of the Baltic Sea Marine Forecasting Centre. In this way, the performance of each new product version is compared with that of the previous version. Although the toolbox has so far been tested mainly for the Baltic Sea, it can easily be adapted to different datasets and parameters, regardless of the geographic region. In this presentation the usability of the toolbox is demonstrated along with several results of the validation process.
Comparative Analysis of Vertebrate Diurnal/Circadian Transcriptomes
Boyle, Greg; Richter, Kerstin; Priest, Henry D.; Traver, David; Mockler, Todd C.; Chang, Jeffrey T.; Kay, Steve A.
2017-01-01
From photosynthetic bacteria to mammals, the circadian clock evolved to track diurnal rhythms and enable organisms to anticipate daily recurring changes such as temperature and light. It orchestrates a broad spectrum of physiology such as the sleep/wake and eating/fasting cycles. While we have made tremendous advances in our understanding of the molecular details of the circadian clock mechanism and how it is synchronized with the environment, we still have rudimentary knowledge of how it connects to and regulates diurnal physiology. One potential reason is the sheer size of the output network. Diurnal/circadian transcriptomic studies report that around 10% of the expressed genome is rhythmically controlled. Zebrafish is an important model system for the study of the core circadian mechanism in vertebrates. As zebrafish share more than 70% of their genes with humans, they could serve as a model complementary to rodents for exploring the diurnal/circadian output, with potential for translational relevance. Here we performed comparative diurnal/circadian transcriptome analysis with established mouse liver and other tissue datasets. First, by combining liver tissue sampling in a 48-h time series, transcription profiling using oligonucleotide arrays and bioinformatics analysis, we profiled rhythmic transcripts and identified 2609 rhythmic genes. The comparative analysis revealed interesting features of the output network regarding the number of rhythmic genes, the proportion of tissue-specific genes and the extent of transcription factor family expression. Undoubtedly, the zebrafish model system will help identify new vertebrate outputs and their regulators and provide leads for further characterization of the diurnal cis-regulatory network. PMID:28076377
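The abstract does not spell out the rhythmicity test used; a common approach for 48-h time series of this kind is cosinor (harmonic) regression, fitting a 24-h cosine to each gene by least squares. The sketch below is that generic approach, offered only as an illustration, not as the authors' pipeline.

```python
import numpy as np

def cosinor_fit(times_h, expression, period=24.0):
    """Fit mesor + A*cos(omega*t - phi) to one gene's expression time course."""
    times_h = np.asarray(times_h, dtype=float)
    expression = np.asarray(expression, dtype=float)
    omega = 2.0 * np.pi / period
    X = np.column_stack([np.ones_like(times_h),
                         np.cos(omega * times_h),
                         np.sin(omega * times_h)])
    beta, *_ = np.linalg.lstsq(X, expression, rcond=None)
    mesor, a, b = beta
    amplitude = np.hypot(a, b)
    phase_h = (np.arctan2(b, a) % (2 * np.pi)) / omega   # time of peak, in hours
    fitted = X @ beta
    r2 = 1.0 - np.sum((expression - fitted) ** 2) / np.sum((expression - expression.mean()) ** 2)
    return mesor, amplitude, phase_h, r2
```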
NASA SPoRT Initialization Datasets for Local Model Runs in the Environmental Modeling System
NASA Technical Reports Server (NTRS)
Case, Jonathan L.; LaFontaine, Frank J.; Molthan, Andrew L.; Carcione, Brian; Wood, Lance; Maloney, Joseph; Estupinan, Jeral; Medlin, Jeffrey M.; Blottman, Peter; Rozumalski, Robert A.
2011-01-01
The NASA Short-term Prediction Research and Transition (SPoRT) Center has developed several products for its National Weather Service (NWS) partners that can be used to initialize local model runs within the Weather Research and Forecasting (WRF) Environmental Modeling System (EMS). These real-time datasets consist of surface-based information updated at least once per day, and produced in a composite or gridded product that is easily incorporated into the WRF EMS. The primary goal for making these NASA datasets available to the WRF EMS community is to provide timely and high-quality information at a spatial resolution comparable to that used in the local model configurations (i.e., convection-allowing scales). The current suite of SPoRT products supported in the WRF EMS includes a Sea Surface Temperature (SST) composite, a Great Lakes sea-ice extent, a Greenness Vegetation Fraction (GVF) composite, and Land Information System (LIS) gridded output. The SPoRT SST composite is a blend of primarily the Moderate Resolution Imaging Spectroradiometer (MODIS) infrared and Advanced Microwave Scanning Radiometer for Earth Observing System data for non-precipitation coverage over the oceans at 2-km resolution. The composite includes a special lake surface temperature analysis over the Great Lakes using contributions from the Remote Sensing Systems temperature data. The Great Lakes Environmental Research Laboratory Ice Percentage product is used to create a sea-ice mask in the SPoRT SST composite. The sea-ice mask is produced daily (in-season) at 1.8-km resolution and identifies ice percentage from 0 to 100% in 10% increments, with values above 90% flagged as ice.
Differentially private distributed logistic regression using private and public data
2014-01-01
Background Privacy protection is an important issue in medical informatics and differential privacy is a state-of-the-art framework for data privacy research. Differential privacy offers provable privacy against attackers who have auxiliary information, and can be applied to data mining models (for example, logistic regression). However, differentially private methods sometimes introduce too much noise and make outputs less useful. Given available public data in medical research (e.g. from patients who sign open-consent agreements), we can design algorithms that use both public and private data sets to decrease the amount of noise that is introduced. Methodology In this paper, we modify the update step in the Newton-Raphson method to propose a differentially private distributed logistic regression model based on both public and private data. Experiments and results We try our algorithm on three different data sets, and show its advantage over: (1) a logistic regression model based solely on public data, and (2) a differentially private distributed logistic regression model based on private data under various scenarios. Conclusion Logistic regression models built with our new algorithm based on both private and public datasets demonstrate better utility than models trained on private or public datasets alone, without sacrificing the rigorous privacy guarantee. PMID:25079786
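The paper's contribution is a modified Newton-Raphson update; that exact mechanism is not reproduced here, but the general idea of combining a clean gradient from public data with a noise-perturbed gradient from private data can be sketched as follows. The noise scale is a placeholder, not a calibrated privacy budget, and the update shown is a plain gradient step rather than the authors' Newton step.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dp_logreg_step(w, X_pub, y_pub, X_priv, y_priv, lr=0.1, noise_scale=0.5):
    """One gradient step mixing a clean public gradient with a noised private gradient."""
    g_pub = X_pub.T @ (sigmoid(X_pub @ w) - y_pub) / len(y_pub)
    g_priv = X_priv.T @ (sigmoid(X_priv @ w) - y_priv) / len(y_priv)
    g_priv += np.random.laplace(scale=noise_scale, size=g_priv.shape)  # privacy noise (placeholder scale)
    n_pub, n_priv = len(y_pub), len(y_priv)
    g = (n_pub * g_pub + n_priv * g_priv) / (n_pub + n_priv)           # sample-size-weighted combination
    return w - lr * g
```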
Integrating UAV Flight outputs in Esri's CityEngine for semi-urban areas
NASA Astrophysics Data System (ADS)
Anca, Paula; Vasile, Alexandru; Sandric, Ionut
2016-04-01
One of the most pervasive technologies of recent years, which has crossed over into consumer products due to its falling price, is the UAV, commonly known as the drone. Besides its ever more accessible prices and growing functionality, what is truly impressive is the drastic reduction in processing time, from days to hours, from the initial flight preparation to the final output. This paper presents such a workflow and goes further by integrating the outputs into another growing technology: 3D. The software used for this purpose is Esri's CityEngine, which was developed for modeling 3D urban environments using existing 2D GIS data and computer generated architecture (CGA) rules, instead of modeling each feature individually. A semi-urban area was selected for this study and captured using the E-Bee from Parrot. The point cloud elevation output from the E-Bee flight was transformed into a raster in order to be used as an elevation surface in CityEngine, and the mosaic raster dataset was draped over this surface. In order to model the buildings in this area, CGA rules were written using the building footprints, as inputs, in the form of feature classes. The extrusion heights for the buildings were also extracted from the point cloud, and realistic textures were draped over the 3D building models. Finally, the scene was shared as a 3D web-scene which can be accessed by anyone through a link, without any software besides an internet browser. This can serve as input for Smart City development through further analysis for urban ecology. Keywords: 3D, drone, CityEngine, E-Bee, Esri, scene, web-scene
Chanel, Guillaume; Pichon, Swann; Conty, Laurence; Berthoz, Sylvie; Chevallier, Coralie; Grèzes, Julie
2015-01-01
Multivariate pattern analysis (MVPA) has been applied successfully to task-based and resting-based fMRI recordings to investigate which neural markers distinguish individuals with autistic spectrum disorders (ASD) from controls. While most studies have focused on brain connectivity during resting-state episodes and region-of-interest (ROI) approaches, a wealth of task-based fMRI datasets have been acquired in these populations in the last decade. This calls for techniques that can leverage information not only from a single dataset, but from several existing datasets that might share some common features and biomarkers. We propose a fully data-driven (voxel-based) approach that we apply to two different fMRI experiments with social stimuli (faces and bodies). The method, based on Support Vector Machines (SVMs) and Recursive Feature Elimination (RFE), is first trained for each experiment independently and each output is then combined to obtain a final classification output. Second, this RFE output is used to determine which voxels are most often selected for classification to generate maps of significant discriminative activity. Finally, to further explore the clinical validity of the approach, we correlate phenotypic information with obtained classifier scores. The results reveal good classification accuracy (range between 69% and 92.3%). Moreover, we were able to identify discriminative activity patterns pertaining to the social brain without relying on a priori ROI definitions. Finally, social motivation was the only dimension which correlated with classifier scores, suggesting that it is the main dimension captured by the classifiers. Altogether, we believe that the present RFE method proves to be efficient and may help identify relevant biomarkers by taking advantage of acquired task-based fMRI datasets in psychiatric populations. PMID:26793434
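The core SVM-plus-RFE step can be reproduced with scikit-learn; the fragment below uses random stand-in data in place of the voxel-wise fMRI features and sketches only the within-experiment training stage (the cross-experiment combination and phenotype correlation are not shown).

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score

# Stand-in for (subjects x voxels) features and ASD/control labels
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2000))
y = rng.integers(0, 2, size=60)

svm = LinearSVC(C=1.0, max_iter=20000)
rfe = RFE(estimator=svm, n_features_to_select=200, step=0.1)   # drop 10% of voxels per pass
rfe.fit(X, y)

voxel_mask = rfe.support_                          # which voxels survive elimination
acc = cross_val_score(rfe, X, y, cv=5).mean()      # rough accuracy estimate
```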
2016-01-01
Although heavy-tailed fluctuations are ubiquitous in complex systems, a good understanding of the mechanisms that generate them is still lacking. Optical complex systems are ideal candidates for investigating heavy-tailed fluctuations, as they allow recording large datasets under controllable experimental conditions. A dynamical regime that has attracted a lot of attention over the years is the so-called low-frequency fluctuations (LFFs) of semiconductor lasers with optical feedback. In this regime, the laser output intensity is characterized by abrupt and apparently random dropouts. The statistical analysis of the inter-dropout-intervals (IDIs) has provided many useful insights into the underlying dynamics. However, the presence of large temporal fluctuations in the IDI sequence has not yet been investigated. Here, by applying fluctuation analysis we show that the experimental distribution of IDI fluctuations is heavy-tailed, and specifically, is well-modeled by a non-Gaussian stable distribution. We find a good qualitative agreement with simulations of the Lang-Kobayashi model. Moreover, we uncover a transition from a less-heavy-tailed state at low pump current to a more-heavy-tailed state at higher pump current. Our results indicate that fluctuation analysis can be a useful tool for investigating the output signals of complex optical systems; it can be used for detecting underlying regime shifts, for model validation and parameter estimation. PMID:26901346
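As a rough illustration of the fitting step (with synthetic heavy-tailed data standing in for the measured IDI fluctuations), scipy's stable distribution can be fitted directly; note that maximum-likelihood fitting of levy_stable is numerically expensive and can be slow.

```python
import numpy as np
from scipy.stats import levy_stable

rng = np.random.default_rng(1)
fluctuations = rng.standard_t(df=2, size=300)     # synthetic heavy-tailed stand-in data

# Fit the four stable-law parameters; alpha < 2 indicates heavier-than-Gaussian tails
alpha, beta, loc, scale = levy_stable.fit(fluctuations)
print(f"stability index alpha = {alpha:.2f}")
```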
Statistical link between external climate forcings and modes of ocean variability
NASA Astrophysics Data System (ADS)
Malik, Abdul; Brönnimann, Stefan; Perona, Paolo
2017-07-01
In this study we investigate the statistical link between external climate forcings and modes of ocean variability on inter-annual (3-year) to centennial (100-year) timescales using a de-trended semi-partial cross-correlation analysis technique. To investigate this link we employ observations (AD 1854-1999), climate proxies (AD 1600-1999), and coupled Atmosphere-Ocean-Chemistry Climate Model simulations with SOCOL-MPIOM (AD 1600-1999). We find robust statistical evidence that the Atlantic multi-decadal oscillation (AMO) has an intrinsic positive correlation with solar activity in all datasets employed. The strength of the relationship between AMO and solar activity is modulated by volcanic eruptions and by complex interaction among modes of ocean variability. The observational dataset reveals that the El Niño-Southern Oscillation (ENSO) has a statistically significant negative intrinsic correlation with solar activity on decadal to multi-decadal timescales (16-27-year), whereas there is no evidence of a link on a typical ENSO timescale (2-7-year). In the observational dataset, volcanic eruptions do not have a link with AMO on a typical AMO timescale (55-80-year); however, the long-term datasets (proxies and SOCOL-MPIOM output) show that volcanic eruptions have an intrinsic negative correlation with AMO on inter-annual to multi-decadal timescales. The Pacific decadal oscillation has no link with solar activity; however, it has a positive intrinsic correlation with volcanic eruptions on multi-decadal timescales (47-54-year) in the reconstruction and on decadal to multi-decadal timescales (16-32-year) in the climate model simulations. We also find evidence of a link between volcanic eruptions and ENSO; however, the sign of the relationship is not consistent between observations/proxies and climate model simulations.
LIME: 3D visualisation and interpretation of virtual geoscience models
NASA Astrophysics Data System (ADS)
Buckley, Simon; Ringdal, Kari; Dolva, Benjamin; Naumann, Nicole; Kurz, Tobias
2017-04-01
Three-dimensional and photorealistic acquisition of surface topography, using methods such as laser scanning and photogrammetry, has become widespread across the geosciences over the last decade. With recent innovations in photogrammetric processing software, robust and automated data capture hardware, and novel sensor platforms, including unmanned aerial vehicles, obtaining 3D representations of exposed topography has never been easier. In addition to 3D datasets, fusion of surface geometry with imaging sensors, such as multi/hyperspectral, thermal and ground-based InSAR, and with geophysical methods, creates novel and highly visual datasets that provide a fundamental spatial framework to address open geoscience research questions. Although data capture and processing routines are becoming well-established and widely reported in the scientific literature, challenges remain related to the analysis, co-visualisation and presentation of 3D photorealistic models, especially for new users (e.g. students and scientists new to geomatics methods). Interpretation and measurement are essential for quantitative analysis of 3D datasets, and qualitative methods are valuable for presentation purposes, for planning and in education. Motivated by this background, the current contribution presents LIME, a lightweight and high-performance 3D software package for interpreting and co-visualising 3D models and related image data in geoscience applications. The software focuses on novel data integration and visualisation of 3D topography with image sources such as hyperspectral imagery, logs and interpretation panels, geophysical datasets and georeferenced maps and images. High quality visual output can be generated for dissemination purposes, to aid researchers with communication of their research results. The background of the software is described and case studies from outcrop geology, hyperspectral mineral mapping and geophysical-geospatial data integration are used to showcase the novel methods developed.
The credibility challenge for global fluvial flood risk analysis
NASA Astrophysics Data System (ADS)
Trigg, M. A.; Birch, C. E.; Neal, J. C.; Bates, P. D.; Smith, A.; Sampson, C. C.; Yamazaki, D.; Hirabayashi, Y.; Pappenberger, F.; Dutra, E.; Ward, P. J.; Winsemius, H. C.; Salamon, P.; Dottori, F.; Rudari, R.; Kappes, M. S.; Simpson, A. L.; Hadzilacos, G.; Fewtrell, T. J.
2016-09-01
Quantifying flood hazard is an essential component of resilience planning, emergency response, and mitigation, including insurance. Traditionally undertaken at catchment and national scales, recently, efforts have intensified to estimate flood risk globally to better allow consistent and equitable decision making. Global flood hazard models are now a practical reality, thanks to improvements in numerical algorithms, global datasets, computing power, and coupled modelling frameworks. Outputs of these models are vital for consistent quantification of global flood risk and in projecting the impacts of climate change. However, the urgency of these tasks means that outputs are being used as soon as they are made available and before such methods have been adequately tested. To address this, we compare multi-probability flood hazard maps for Africa from six global models and show wide variation in their flood hazard, economic loss and exposed population estimates, which has serious implications for model credibility. While there is around 30%-40% agreement in flood extent, our results show that even at continental scales, there are significant differences in hazard magnitude and spatial pattern between models, notably in deltas, arid/semi-arid zones and wetlands. This study is an important step towards a better understanding of modelling global flood hazard, which is urgently required for both current risk and climate change projections.
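One simple way to express pairwise flood-extent agreement between models is the intersection-over-union of binary flood masks; the study's exact agreement metric may differ, so the fragment below is only indicative.

```python
import numpy as np

def extent_agreement(mask_a, mask_b):
    """Fraction of the combined flooded area that both models flag as flooded."""
    mask_a = np.asarray(mask_a, dtype=bool)
    mask_b = np.asarray(mask_b, dtype=bool)
    union = np.logical_or(mask_a, mask_b).sum()
    if union == 0:
        return np.nan
    return np.logical_and(mask_a, mask_b).sum() / union

def pairwise_agreement(masks):
    """Agreement for every pair in a list of model flood masks (hypothetical inputs)."""
    return [(i, j, extent_agreement(masks[i], masks[j]))
            for i in range(len(masks)) for j in range(i + 1, len(masks))]
```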
NASA Technical Reports Server (NTRS)
Gardner, Adrian
2010-01-01
National Aeronautics and Space Administration (NASA) weather and atmospheric environmental organizations are insatiable consumers of geophysical, hydrometeorological and solar weather statistics. The expanding array of internetworked sensors producing targeted physical measurements has generated an almost factorial explosion of near real-time inputs to topical statistical datasets. Normalizing and value-based parsing of such statistical datasets in support of time-constrained weather and environmental alerts and warnings is essential, even with dedicated high-performance computational capabilities. What are the optimal indicators for advanced decision making? How do we recognize the line between sufficient statistical sampling and excessive, mission-destructive sampling? How do we assure that the normalization and parsing process, when interpolated through numerical models, yields accurate and actionable alerts and warnings? This presentation will address the integrated means and methods to achieve desired outputs for NASA and consumers of its data.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mendoza, Paul Michael
2016-08-31
The project seeks to develop applications that automate MCNP criticality benchmark execution; create a dataset containing static benchmark information; combine MCNP output with benchmark information; and fit and visually represent the data.
NASA Astrophysics Data System (ADS)
Zhou, S.; Tao, W. K.; Li, X.; Matsui, T.; Sun, X. H.; Yang, X.
2015-12-01
A cloud-resolving model (CRM) is an atmospheric numerical model that can numerically resolve clouds and cloud systems at 0.25-5 km horizontal grid spacings. The main advantage of the CRM is that it allows explicit interactive processes between microphysics, radiation, turbulence, the surface, and aerosols without subgrid cloud fraction, cloud overlap, or convective parameterization. Because of their fine resolution and complex physical processes, it is challenging for the CRM community to i) visualize/inter-compare CRM simulations, ii) diagnose key processes for cloud-precipitation formation and intensity, and iii) evaluate against NASA's field campaign data and L1/L2 satellite data products, due to the large data volume (~10 TB) and the complexity of CRM physical processes. We have been building the Super Cloud Library (SCL) upon a Hadoop framework, capable of CRM database management, distribution, visualization, subsetting, and evaluation in a scalable way. The current SCL capability includes (1) an SCL data model that enables various CRM simulation outputs in NetCDF, including the NASA-Unified Weather Research and Forecasting (NU-WRF) and Goddard Cumulus Ensemble (GCE) models, to be accessed and processed by Hadoop, (2) a parallel NetCDF-to-CSV converter that supports NU-WRF and GCE model outputs, (3) a technique that visualizes Hadoop-resident data with IDL, (4) a technique that subsets Hadoop-resident data, compliant with the SCL data model, with HIVE or Impala via HUE's Web interface, and (5) a prototype that enables a Hadoop MapReduce application to dynamically access and process data residing in a parallel file system, PVFS2 or CephFS, where high performance computing (HPC) simulation outputs such as NU-WRF's and GCE's are located. We are testing Apache Spark to speed up SCL data processing and analysis. With the SCL capabilities, SCL users can conduct large-domain on-demand tasks without downloading voluminous CRM datasets and various observations from NASA field campaigns and satellite data to a local computer, and can inter-compare output and data from GCE and NU-WRF.
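The SCL's converter is parallel and Hadoop-oriented; as a minimal serial illustration of the NetCDF-to-CSV idea, the fragment below flattens selected variables of a CRM output file into a long-format table (file and variable names are hypothetical).

```python
import xarray as xr

def netcdf_to_csv(nc_path, csv_path, variables):
    """Flatten selected CRM output variables into a long-format CSV table."""
    ds = xr.open_dataset(nc_path)
    df = ds[variables].to_dataframe().reset_index()   # one row per (time, z, y, x) cell
    df.to_csv(csv_path, index=False)
    return len(df)

# e.g. netcdf_to_csv("gce_run001.nc", "gce_run001.csv", ["qc", "qr", "w"])
```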
NASA AVOSS Fast-Time Wake Prediction Models: User's Guide
NASA Technical Reports Server (NTRS)
Ahmad, Nash'at N.; VanValkenburg, Randal L.; Pruis, Matthew
2014-01-01
The National Aeronautics and Space Administration (NASA) is developing and testing fast-time wake transport and decay models to safely enhance the capacity of the National Airspace System (NAS). The fast-time wake models are empirical algorithms used for real-time predictions of wake transport and decay based on aircraft parameters and ambient weather conditions. The aircraft dependent parameters include the initial vortex descent velocity and the vortex pair separation distance. The atmospheric initial conditions include vertical profiles of temperature or potential temperature, eddy dissipation rate, and crosswind. The current distribution includes the latest versions of the APA (3.4) and the TDP (2.1) models. This User's Guide provides detailed information on the model inputs, file formats, and the model output. An example of a model run and a brief description of the Memphis 1995 Wake Vortex Dataset is also provided.
NASA Astrophysics Data System (ADS)
Vionnet, Vincent; Six, Delphine; Auger, Ludovic; Lafaysse, Matthieu; Quéno, Louis; Réveillet, Marion; Dombrowski-Etchevers, Ingrid; Thibert, Emmanuel; Dumont, Marie
2017-04-01
Capturing the spatial and temporal variability of meteorological conditions at fine scale is necessary for modelling snowpack and glacier winter mass balance in alpine terrain. In particular, precipitation amount and phase are strongly influenced by the complex topography. In this study, we assess the impact of three sub-kilometer precipitation datasets (rainfall and snowfall) on distributed simulations of snowpack and glacier winter mass balance with the detailed snowpack model Crocus for winter 2011-2012. The different precipitation datasets at 500-m grid spacing over part of the French Alps (a 200 km x 200 km area) come either from (i) the SAFRAN precipitation analysis specially developed for alpine terrain, (ii) operational outputs of the atmospheric model AROME at 2.5-km grid spacing downscaled to 500 m with a fixed lapse rate, or (iii) a version of the atmospheric model AROME at 500-m grid spacing. Other atmospheric forcings (air temperature and humidity, incoming longwave and shortwave radiation, wind speed) are taken from the AROME simulations at 500-m grid spacing. These atmospheric forcings are first compared against a network of automatic weather stations. Results are analysed with respect to station location (valley, mid- and high-altitude). The spatial pattern of seasonal snowfall and its dependence on elevation is then analysed for the different precipitation datasets. Large differences between SAFRAN and the two versions of AROME are found at high altitude. Finally, results of the Crocus snowpack simulations are evaluated against (i) point in-situ measurements of snow depth and snow water equivalent, and (ii) maps of snow-covered areas retrieved from optical satellite data (MODIS). Measurements of winter accumulation on six glaciers of the French Alps are also used and provide very valuable information on precipitation at high altitude, where the conventional observation network is scarce. This study illustrates the potential and limitations of high-resolution atmospheric models for driving simulations of snowpack and glacier winter mass balance in alpine terrain.
Modelling the distribution of chickens, ducks, and geese in China
Prosser, Diann J.; Wu, Junxi; Ellis, Erle C.; Gale, Fred; Van Boeckel, Thomas P.; Wint, William; Robinson, Tim; Xiao, Xiangming; Gilbert, Marius
2011-01-01
Global concerns over the emergence of zoonotic pandemics emphasize the need for high-resolution population distribution mapping and spatial modelling. Ongoing efforts to model disease risk in China have been hindered by a lack of available species level distribution maps for poultry. The goal of this study was to develop 1 km resolution population density models for China’s chickens, ducks, and geese. We used an information theoretic approach to predict poultry densities based on statistical relationships between poultry census data and high-resolution agro-ecological predictor variables. Model predictions were validated by comparing goodness of fit measures (root mean square error and correlation coefficient) for observed and predicted values for ¼ of the sample data which was not used for model training. Final output included mean and coefficient of variation maps for each species. We tested the quality of models produced using three predictor datasets and 4 regional stratification methods. For predictor variables, a combination of traditional predictors for livestock mapping and land use predictors produced the best goodness of fit scores. Comparison of regional stratifications indicated that for chickens and ducks, a stratification based on livestock production systems produced the best results; for geese, an agro-ecological stratification produced best results. However, for all species, each method of regional stratification produced significantly better goodness of fit scores than the global model. Here we provide descriptive methods, analytical comparisons, and model output for China’s first high resolution, species level poultry distribution maps. Output will be made available to the scientific and public community for use in a wide range of applications from epidemiological studies to livestock policy and management initiatives. PMID:21765567
A first approach to the distortion analysis of nonlinear analog circuits utilizing X-parameters
NASA Astrophysics Data System (ADS)
Weber, H.; Widemann, C.; Mathis, W.
2013-07-01
In this contribution a first approach to the distortion analysis of nonlinear 2-port networks with X-parameters is presented (X-parameter is a registered trademark of Agilent Technologies, Inc.). The X-parameters introduced by Verspecht and Root (2006) offer the possibility of describing nonlinear microwave 2-port networks under large-signal conditions. On the basis of X-parameter measurements with a nonlinear network analyzer (NVNA), behavioral models can be extracted for the networks. These models can be used to account for the nonlinear behavior during the design process of microwave circuits. The idea of the present work is to extract behavioral models in order to describe the influence of interfering signals on the output behavior of nonlinear circuits. Here, a simulator is used instead of an NVNA to extract the X-parameters. Assuming that the interfering signals are relatively small compared to the nominal input signal, the output signal can be described as a superposition of the effects of each input signal. In order to determine the functional correlation between the scattering variables, a polynomial dependency is assumed. The required datasets for the approximation of the describing functions are simulated with a directional coupler model in the Cadence Design Framework. The polynomial coefficients are obtained by a least-squares method. The resulting describing functions can be used to predict the system's behavior under certain conditions as well as the effects of the interfering signal on the output signal.
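The least-squares step can be illustrated with a single-input, single-output polynomial stand-in for the describing functions (the real fit involves several scattering variables); the variable names below are hypothetical.

```python
import numpy as np

def fit_describing_polynomial(a_in, b_out, degree=3):
    """Least-squares polynomial fit b_out ~ p(a_in) for sampled scattering variables."""
    V = np.vander(np.asarray(a_in, dtype=float), degree + 1)   # Vandermonde design matrix
    coeffs, *_ = np.linalg.lstsq(V, np.asarray(b_out, dtype=float), rcond=None)
    return coeffs

def evaluate(coeffs, a_new):
    """Predict the output scattering variable for a new drive level."""
    return np.polyval(coeffs, a_new)
```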
NASA Technical Reports Server (NTRS)
Case, Jonathan L.; Kumar, Sujay V.; Kuligowski, Robert J.; Langston, Carrie
2013-01-01
The NASA Short-term Prediction Research and Transition (SPoRT) Center in Huntsville, AL is running a real-time configuration of the NASA Land Information System (LIS) with the Noah land surface model (LSM). Output from the SPoRT-LIS run is used to initialize land surface variables for local modeling applications at select National Weather Service (NWS) partner offices, and can be displayed in decision support systems for situational awareness and drought monitoring. The SPoRT-LIS is run over a domain covering the southern and eastern United States, fully nested within the National Centers for Environmental Prediction Stage IV precipitation analysis grid, which provides precipitation forcing to the offline LIS-Noah runs. The SPoRT Center seeks to expand the real-time LIS domain to the entire Continental U.S. (CONUS); however, geographical limitations with the Stage IV analysis product have inhibited this expansion. Therefore, a goal of this study is to test alternative precipitation forcing datasets that can enable the LIS expansion by improving upon the current geographical limitations of the Stage IV product. The four precipitation forcing datasets that are inter-compared on a 4-km resolution CONUS domain include the Stage IV, an experimental GOES quantitative precipitation estimate (QPE) from NESDIS/STAR, the National Mosaic and QPE (NMQ) product from the National Severe Storms Laboratory, and the North American Land Data Assimilation System phase 2 (NLDAS-2) analyses. The NLDAS-2 dataset is used as the control run, with each of the other three datasets considered experimental runs compared against the control. The regional strengths, weaknesses, and biases of each precipitation analysis are identified relative to the NLDAS-2 control in terms of accumulated precipitation pattern and amount, and the impacts on the subsequent LSM spin-up simulations. The ultimate goal is to identify an alternative precipitation forcing dataset that can best support an expansion of the real-time SPoRT-LIS to a domain covering the entire CONUS.
Multi-Target Regression via Robust Low-Rank Learning.
Zhen, Xiantong; Yu, Mengyang; He, Xiaofei; Li, Shuo
2018-02-01
Multi-target regression has recently regained great popularity due to its capability of simultaneously learning multiple relevant regression tasks and its wide applications in data mining, computer vision and medical image analysis, while great challenges arise from jointly handling inter-target correlations and input-output relationships. In this paper, we propose Multi-layer Multi-target Regression (MMR) which enables simultaneously modeling intrinsic inter-target correlations and nonlinear input-output relationships in a general framework via robust low-rank learning. Specifically, the MMR can explicitly encode inter-target correlations in a structure matrix by matrix elastic nets (MEN); the MMR can work in conjunction with the kernel trick to effectively disentangle highly complex nonlinear input-output relationships; the MMR can be efficiently solved by a new alternating optimization algorithm with guaranteed convergence. The MMR leverages the strength of kernel methods for nonlinear feature learning and the structural advantage of multi-layer learning architectures for inter-target correlation modeling. More importantly, it offers a new multi-layer learning paradigm for multi-target regression which is endowed with high generality, flexibility and expressive ability. Extensive experimental evaluation on 18 diverse real-world datasets demonstrates that our MMR can achieve consistently high performance and outperforms representative state-of-the-art algorithms, which shows its great effectiveness and generality for multivariate prediction.
ParaView visualization of Abaqus output on the mechanical deformation of complex microstructures
NASA Astrophysics Data System (ADS)
Liu, Qingbin; Li, Jiang; Liu, Jie
2017-02-01
Abaqus® is a popular software suite for finite element analysis. It delivers linear and nonlinear analyses of mechanical and fluid dynamics, including multi-body systems and multi-physics coupling. However, the visualization capability of Abaqus through its CAE module is limited. Models from microtomography have extremely complicated structures, and Abaqus output datasets are huge, requiring a visualization tool more powerful than Abaqus/CAE. We convert Abaqus output into the XML-based VTK format with a Python script and then use ParaView to visualize the results. Capabilities such as volume rendering, tensor glyphs, superior animation and other filters allow ParaView to offer excellent visualizations. ParaView's parallel visualization makes it possible to handle very large datasets. To support fully parallel visualization, the Python script partitions the data by reorganizing all nodes, elements and the corresponding results on those nodes and elements. The data partition scheme minimizes data redundancy and works efficiently. Given its good readability and extensibility, the script can be extended to other types of Abaqus problems. We share the script with Abaqus users on GitHub.
AMModels: An R package for storing models, data, and metadata to facilitate adaptive management
Donovan, Therese M.; Katz, Jonathan E.
2018-01-01
Agencies are increasingly called upon to implement their natural resource management programs within an adaptive management (AM) framework. This article provides the background and motivation for the R package, AMModels. AMModels was developed under R version 3.2.2. The overall goal of AMModels is simple: To codify knowledge in the form of models and to store it, along with models generated from numerous analyses and datasets that may come our way, so that it can be used or recalled in the future. AMModels facilitates this process by storing all models and datasets in a single object that can be saved to an .RData file and routinely augmented to track changes in knowledge through time. Through this process, AMModels allows the capture, development, sharing, and use of knowledge that may help organizations achieve their mission. While AMModels was designed to facilitate adaptive management, its utility is far more general. Many R packages exist for creating and summarizing models, but to our knowledge, AMModels is the only package dedicated not to the mechanics of analysis but to organizing analysis inputs, analysis outputs, and preserving descriptive metadata. We anticipate that this package will assist users hoping to preserve the key elements of an analysis so they may be more confidently revisited at a later date. PMID:29489825
Hay, Lauren E.; LaFontaine, Jacob H.; Markstrom, Steven
2014-01-01
The accuracy of statistically downscaled general circulation model (GCM) simulations of daily surface climate for historical conditions (1961–99) and the implications when they are used to drive hydrologic and stream temperature models were assessed for the Apalachicola–Chattahoochee–Flint River basin (ACFB). The ACFB is a 50 000 km2 basin located in the southeastern United States. Three GCMs were statistically downscaled, using an asynchronous regional regression model (ARRM), to ⅛° grids of daily precipitation and minimum and maximum air temperature. These ARRM-based climate datasets were used as input to the Precipitation-Runoff Modeling System (PRMS), a deterministic, distributed-parameter, physical-process watershed model used to simulate and evaluate the effects of various combinations of climate and land use on watershed response. The ACFB was divided into 258 hydrologic response units (HRUs) in which the components of flow (groundwater, subsurface, and surface) are computed in response to climate, land surface, and subsurface characteristics of the basin. Daily simulations of flow components from PRMS were used with the climate to simulate in-stream water temperatures using the Stream Network Temperature (SNTemp) model, a mechanistic, one-dimensional heat transport model for branched stream networks. The climate, hydrology, and stream temperature for historical conditions were evaluated by comparing model outputs produced from historical climate forcings developed from gridded station data (GSD) versus those produced from the three statistically downscaled GCMs using the ARRM methodology. The PRMS and SNTemp models were forced with the GSD and the outputs produced were treated as “truth.” This allowed for a spatial comparison by HRU of the GSD-based output with ARRM-based output. Distributional similarities between GSD- and ARRM-based model outputs were compared using the two-sample Kolmogorov–Smirnov (KS) test in combination with descriptive metrics such as the mean and variance and an evaluation of rare and sustained events. In general, precipitation and streamflow quantities were negatively biased in the downscaled GCM outputs, and results indicate that the downscaled GCM simulations consistently underestimate the largest precipitation events relative to the GSD. The KS test results indicate that ARRM-based air temperatures are similar to GSD at the daily time step for the majority of the ACFB, with perhaps subweekly averaging for stream temperature. Depending on GCM and spatial location, ARRM-based precipitation and streamflow require averaging of up to 30 days to become similar to the GSD-based output. Evaluation of the model skill for historical conditions suggests some guidelines for use of future projections; while it seems correct to place greater confidence in evaluation metrics which perform well historically, this does not necessarily mean those metrics will accurately reflect model outputs for future climatic conditions. Results from this study indicate no “best” overall model, but the breadth of analysis can be used to give the product users an indication of the applicability of the results to address their particular problem. Since results for historical conditions indicate that model outputs can have significant biases associated with them, the range in future projections examined in terms of change relative to historical conditions for each individual GCM may be more appropriate.
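A minimal sketch of the per-HRU distributional comparison described above (two-sample KS tests after block-averaging the daily output), assuming plain (n_days, n_HRU) arrays rather than the actual PRMS/SNTemp output files:

```python
import numpy as np
from scipy.stats import ks_2samp

def ks_by_hru(gsd_out, arrm_out, window=1, alpha=0.05):
    """True where GSD- and ARRM-driven outputs are NOT significantly different, per HRU."""
    def block_mean(x, w):
        x = np.asarray(x, dtype=float)
        n = (x.shape[0] // w) * w               # drop the trailing partial block
        return x[:n].reshape(-1, w, x.shape[1]).mean(axis=1)

    g, a = block_mean(gsd_out, window), block_mean(arrm_out, window)
    pvals = np.array([ks_2samp(g[:, i], a[:, i]).pvalue for i in range(g.shape[1])])
    return pvals >= alpha
```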
Automated Knowledge Discovery From Simulators
NASA Technical Reports Server (NTRS)
Burl, Michael; DeCoste, Dennis; Mazzoni, Dominic; Scharenbroich, Lucas; Enke, Brian; Merline, William
2007-01-01
A computational method, SimLearn, has been devised to facilitate efficient knowledge discovery from simulators. Simulators are complex computer programs used in science and engineering to model diverse phenomena such as fluid flow, gravitational interactions, coupled mechanical systems, and nuclear, chemical, and biological processes. SimLearn uses active-learning techniques to efficiently address the "landscape characterization problem." In particular, SimLearn tries to determine which regions in "input space" lead to a given output from the simulator, where "input space" refers to an abstraction of all the variables going into the simulator, e.g., initial conditions, parameters, and interaction equations. Landscape characterization can be viewed as an attempt to invert the forward mapping of the simulator and recover the inputs that produce a particular output. Given that a single simulation run can take days or weeks to complete even on a large computing cluster, SimLearn attempts to reduce costs by reducing the number of simulations needed to effect discoveries. Unlike conventional data-mining methods that are applied to static predefined datasets, SimLearn involves an iterative process in which a most informative dataset is constructed dynamically by using the simulator as an oracle. On each iteration, the algorithm models the knowledge it has gained through previous simulation trials and then chooses which simulation trials to run next. Running these trials through the simulator produces new data in the form of input-output pairs. The overall process is embodied in an algorithm that combines support vector machines (SVMs) with active learning. SVMs use learning from examples (the examples are the input-output pairs generated by running the simulator) and a principle called maximum margin to derive predictors that generalize well to new inputs. In SimLearn, the SVM plays the role of modeling the knowledge that has been gained through previous simulation trials. Active learning is used to determine which new input points would be most informative if their output were known. The selected input points are run through the simulator to generate new information that can be used to refine the SVM. The process is then repeated. SimLearn carefully balances exploration (semi-randomly searching around the input space) versus exploitation (using the current state of knowledge to conduct a tightly focused search). During each iteration, SimLearn uses not one, but an ensemble of SVMs. Each SVM in the ensemble is characterized by different hyper-parameters that control various aspects of the learned predictor - for example, whether the predictor is constrained to be very smooth (nearby points in input space lead to similar output predictions) or whether the predictor is allowed to be "bumpy." The various SVMs will have different preferences about which input points they would like to run through the simulator next. SimLearn includes a formal mechanism for balancing the ensemble SVM preferences so that a single choice can be made for the next set of trials.
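The iterative loop SimLearn describes can be sketched in a few lines: fit an ensemble of SVMs with different hyperparameters, let their disagreement choose the next simulator runs, and repeat. The toy below uses scikit-learn in place of the original implementation, and a cheap analytic function stands in for the expensive simulator; it illustrates the query-by-disagreement idea rather than SimLearn's exact selection criterion.

    import numpy as np
    from sklearn.svm import SVR

    def simulator(x):                      # stand-in for an expensive simulation run
        return np.sin(3 * x[:, 0]) * np.exp(-x[:, 1] ** 2)

    rng = np.random.default_rng(0)
    X = rng.uniform(-2, 2, size=(10, 2))   # initial seed trials in "input space"
    y = simulator(X)

    for iteration in range(5):
        # Ensemble of SVMs with different smoothness assumptions (hyper-parameters)
        ensemble = [SVR(kernel="rbf", gamma=g, C=10.0).fit(X, y) for g in (0.1, 1.0, 10.0)]

        # Candidate inputs; pick the one the ensemble disagrees about most
        candidates = rng.uniform(-2, 2, size=(500, 2))
        preds = np.stack([m.predict(candidates) for m in ensemble])
        next_x = candidates[preds.std(axis=0).argmax()][None, :]

        # Run the "simulator" on the chosen input and add the new input-output pair
        X = np.vstack([X, next_x])
        y = np.append(y, simulator(next_x))

    print(f"{len(X)} simulator runs used; last query point: {X[-1]}")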
Enhancing GIS Capabilities for High Resolution Earth Science Grids
NASA Astrophysics Data System (ADS)
Koziol, B. W.; Oehmke, R.; Li, P.; O'Kuinghttons, R.; Theurich, G.; DeLuca, C.
2017-12-01
Applications for high performance GIS will continue to increase as Earth system models pursue more realistic representations of Earth system processes. Finer spatial resolution model input and output, unstructured or irregular modeling grids, data assimilation, and regional coordinate systems present novel challenges for GIS frameworks operating in the Earth system modeling domain. This presentation provides an overview of two GIS-driven applications that combine high performance software with big geospatial datasets to produce value-added tools for the modeling and geoscientific community. First, a large-scale interpolation experiment using National Hydrography Dataset (NHD) catchments, a high resolution rectilinear CONUS grid, and the Earth System Modeling Framework's (ESMF) conservative interpolation capability will be described. ESMF is a parallel, high-performance software toolkit that provides capabilities (e.g. interpolation) for building and coupling Earth science applications. ESMF is developed primarily by the NOAA Environmental Software Infrastructure and Interoperability (NESII) group. The purpose of this experiment was to test and demonstrate the utility of high performance scientific software in traditional GIS domains. Special attention will be paid to the nuanced requirements for dealing with high resolution, unstructured grids in scientific data formats. Second, a chunked interpolation application using ESMF and OpenClimateGIS (OCGIS) will demonstrate how spatial subsetting can virtually remove computing resource ceilings for very high spatial resolution interpolation operations. OCGIS is a NESII-developed Python software package designed for the geospatial manipulation of high-dimensional scientific datasets. An overview of the data processing workflow, why a chunked approach is required, and how the application could be adapted to meet operational requirements will be discussed here. In addition, we'll provide a general overview of OCGIS's parallel subsetting capabilities including challenges in the design and implementation of a scientific data subsetter.
SutraPrep, a pre-processor for SUTRA, a model for ground-water flow with solute or energy transport
Provost, Alden M.
2002-01-01
SutraPrep facilitates the creation of three-dimensional (3D) input datasets for the USGS ground-water flow and transport model SUTRA Version 2D3D.1. It is most useful for applications in which the geometry of the 3D model domain and the spatial distribution of physical properties and boundary conditions are relatively simple. SutraPrep can be used to create a SUTRA main input (.inp) file, an initial conditions (.ics) file, and a 3D plot of the finite-element mesh in Virtual Reality Modeling Language (VRML) format. Input and output are text-based. The code can be run on any platform that has a standard Fortran 90 compiler. Executable code is available for Microsoft Windows.
A model of traffic signs recognition with convolutional neural network
NASA Astrophysics Data System (ADS)
Hu, Haihe; Li, Yujian; Zhang, Ting; Huo, Yi; Kuang, Wenqing
2016-10-01
In real traffic scenes, the quality of captured images is generally low due to factors such as lighting conditions and occlusion. All of these factors are challenging for automated traffic sign recognition algorithms. Deep learning has recently provided a new way to solve this kind of problem. A deep network can automatically learn features from a large number of data samples and obtain excellent recognition performance. We therefore approach the task of traffic sign recognition as a general vision problem, with few assumptions specific to road signs. We propose a Convolutional Neural Network (CNN) model and apply it to the task of traffic sign recognition. The proposed model adopts a deep CNN as the supervised learning model, directly takes the collected traffic sign images as input, alternates convolutional and subsampling layers, and automatically extracts the features used to recognize the traffic sign images. The proposed model includes an input layer, three convolutional layers, three subsampling layers, a fully connected layer, and an output layer. To validate the proposed model, experiments were conducted using the public dataset of the China competition on fuzzy image processing. Experimental results show that the proposed model achieves a recognition accuracy of 99.01% on the training dataset and 92% in the preliminary contest, ranking among the four best entries.
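A compact sketch of the layer pattern described above (three convolution + subsampling stages, a fully connected layer, and an output layer), written in PyTorch; the filter counts, input size, and number of classes are placeholders, not the values used in the paper.

    import torch
    import torch.nn as nn

    class TrafficSignCNN(nn.Module):
        def __init__(self, n_classes=58):                # placeholder class count
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 16, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),   # conv + subsample 1
                nn.Conv2d(16, 32, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),  # conv + subsample 2
                nn.Conv2d(32, 64, kernel_size=3), nn.ReLU(), nn.MaxPool2d(2),  # conv + subsample 3
            )
            self.classifier = nn.Sequential(
                nn.Flatten(),
                nn.Linear(64 * 3 * 3, 128), nn.ReLU(),   # fully connected layer
                nn.Linear(128, n_classes),               # output layer
            )

        def forward(self, x):
            return self.classifier(self.features(x))

    # One forward pass on a batch of 48x48 RGB sign images
    model = TrafficSignCNN()
    logits = model(torch.randn(8, 3, 48, 48))
    print(logits.shape)   # torch.Size([8, 58])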
Recent Greenland Thinning from Operation IceBridge ATM and LVIS Data
NASA Astrophysics Data System (ADS)
Sutterley, T. C.; Velicogna, I.
2015-12-01
We investigate regional thinning rates in Greenland using two Operation IceBridge lidar instruments, the Airborne Topographic Mapper (ATM) and the Land, Vegetation and Ice Sensor (LVIS). IceBridge and Pre-IceBridge ATM data are available from 1993 to present, and IceBridge and Pre-IceBridge LVIS data are available from 2007 to present. We compare different techniques for combining the two datasets: overlapping footprints, triangulated irregular network meshing, and radial basis functions. We validate the combination for periods with near-term overlap of the two instruments. By combining the two lidar datasets, we are able to investigate intra-annual, annual, and interannual surface elevation change. We investigate both the high melt season of 2012 and the low melt season of 2013. In addition, the major 2015 IceBridge Arctic campaign provides crucial new data for determining seasonal ice sheet thinning rates. We compare our LVIS/ATM results with surface mass balance outputs from two regional climate models: the Regional Atmospheric Climate Model (RACMO) and the Modèle Atmosphérique Régional (MAR). We also investigate the thinning rates of major outlet glaciers.
Towards estimates of future rainfall erosivity in Europe based on REDES and WorldClim datasets
NASA Astrophysics Data System (ADS)
Panagos, Panos; Ballabio, Cristiano; Meusburger, Katrin; Spinoni, Jonathan; Alewell, Christine; Borrelli, Pasquale
2017-05-01
Policy requests for trends in soil erosion can be addressed by developing modelling scenarios of the two most dynamic factors in soil erosion, i.e. rainfall erosivity and land cover change. The recently developed Rainfall Erosivity Database at European Scale (REDES) and a statistical approach used to spatially interpolate rainfall erosivity data have the potential to become useful knowledge for predicting future rainfall erosivity based on climate scenarios. A thorough statistical modelling approach (Gaussian Process Regression), with the selection of the most appropriate covariates (monthly precipitation, temperature datasets and bioclimatic layers), allowed rainfall erosivity to be predicted under climate change scenarios. The mean rainfall erosivity for the European Union and Switzerland is projected to be 857 MJ mm ha-1 h-1 yr-1 by 2050, a relative increase of 18% compared to baseline data (2010). The changes are heterogeneous across the European continent, depending on the future projections of the most erosive months (hot period: April-September). The results report a pan-European projection of future rainfall erosivity taking into account the uncertainties of the climatic models.
The Handling of Hazard Data on a National Scale: A Case Study from the British Geological Survey
NASA Astrophysics Data System (ADS)
Royse, Katherine R.
2011-11-01
This paper reviews how hazard data and geological map data have been combined by the British Geological Survey (BGS) to produce a set of GIS-based national-scale hazard susceptibility maps for the UK. This work has been carried out over the last 9 years and as such reflects the combined outputs of a large number of researchers at BGS. The paper details the inception of these datasets from the development of the seamless digital geological map in 2001 through to the deterministic 2D hazard models produced today. These datasets currently include landslides, shrink-swell, soluble rocks, compressible and collapsible deposits, groundwater flooding, geological indicators of flooding, radon potential and potentially harmful elements in soil. These models have been created using a combination of expert knowledge (from both within BGS and from outside bodies such as the Health Protection Agency), national databases (which contain data collected over the past 175 years), multi-criteria analysis within geographical information systems and a flexible rule-based approach for each individual geohazard. By using GIS in this way, it has been possible to model the distribution and degree of geohazards across the whole of Britain.
A Parallel Adaboost-Backpropagation Neural Network for Massive Image Dataset Classification
NASA Astrophysics Data System (ADS)
Cao, Jianfang; Chen, Lichao; Wang, Min; Shi, Hao; Tian, Yun
2016-12-01
Image classification uses computers to simulate human understanding and cognition of images by automatically categorizing images. This study proposes a faster image classification approach that parallelizes the traditional Adaboost-Backpropagation (BP) neural network using the MapReduce parallel programming model. First, we construct a strong classifier by assembling the outputs of 15 BP neural networks (which are individually regarded as weak classifiers) based on the Adaboost algorithm. Second, we design Map and Reduce tasks for both the parallel Adaboost-BP neural network and the feature extraction algorithm. Finally, we establish an automated classification model by building a Hadoop cluster. We use the Pascal VOC2007 and Caltech256 datasets to train and test the classification model. The results are superior to those obtained using traditional Adaboost-BP neural network or parallel BP neural network approaches. Our approach increased the average classification accuracy rate by approximately 14.5% and 26.0% compared to the traditional Adaboost-BP neural network and parallel BP neural network, respectively. Furthermore, the proposed approach requires less computation time and scales very well as evaluated by speedup, sizeup and scaleup. The proposed approach may provide a foundation for automated large-scale image classification and demonstrates practical value.
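A rough, single-machine sketch of the strong-classifier construction described above, boosting a set of small backpropagation networks and combining them by weighted vote, is shown below. It uses SAMME-style boosting by weighted resampling of scikit-learn MLP classifiers on a toy digits dataset; the MapReduce parallelization, feature extraction, and the datasets used in the paper are omitted entirely.

    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier

    # Toy stand-in data; the paper uses features extracted from Pascal VOC2007 / Caltech256.
    X, y = load_digits(return_X_y=True)
    Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

    rng = np.random.default_rng(0)
    w = np.full(len(Xtr), 1 / len(Xtr))            # sample weights
    learners, alphas = [], []

    for _ in range(15):                            # 15 BP networks as weak classifiers
        idx = rng.choice(len(Xtr), size=len(Xtr), p=w)   # boosting via weighted resampling
        clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=300,
                            random_state=0).fit(Xtr[idx], ytr[idx])
        pred = clf.predict(Xtr)
        err = np.clip(w[pred != ytr].sum(), 1e-10, 1 - 1e-10)
        alpha = np.log((1 - err) / err) + np.log(9)      # SAMME weight, K - 1 = 9 classes
        w *= np.exp(alpha * (pred != ytr))
        w /= w.sum()
        learners.append(clf)
        alphas.append(alpha)

    # Strong classifier: weighted vote over the 15 weak networks
    votes = np.zeros((len(Xte), 10))
    for clf, a in zip(learners, alphas):
        votes[np.arange(len(Xte)), clf.predict(Xte)] += a
    print("ensemble accuracy:", (votes.argmax(axis=1) == yte).mean())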
Enabling Cross-Discipline Collaboration Via a Functional Data Model
NASA Astrophysics Data System (ADS)
Lindholm, D. M.; Wilson, A.; Baltzer, T.
2016-12-01
Many research disciplines have very specialized data models that are used to express the detailed semantics that are meaningful to that community and easily utilized by their data analysis tools. While invaluable to members of that community, such expressive data structures and metadata are of little value to potential collaborators from other scientific disciplines. Many data interoperability efforts focus on the difficult task of computationally mapping concepts from one domain to another to facilitate discovery and use of data. Although these efforts are important and promising, we have found that a great deal of discovery and dataset understanding still happens at the level of less formal, personal communication. However, a significant barrier to inter-disciplinary data sharing that remains is one of data access. Scientists and data analysts continue to spend inordinate amounts of time simply trying to get data into their analysis tools. Providing data in a standard file format is often not sufficient since data can be structured in many ways. Adhering to more explicit community standards for data structure and metadata does little to help those in other communities. The Functional Data Model specializes the Relational Data Model (used by many database systems) by defining relations as functions between independent (domain) and dependent (codomain) variables. Given that arrays of data in many scientific data formats generally represent functionally related parameters (e.g. temperature as a function of space and time), the Functional Data Model is quite relevant for these datasets as well. The LaTiS software framework implements the Functional Data Model and provides a mechanism to expose an existing data source as a LaTiS dataset. LaTiS datasets can be manipulated using a Functional Algebra and output in any number of formats. LASP has successfully used the Functional Data Model and its implementation in the LaTiS software framework to bridge the gap between disparate data sources and communities. This presentation will demonstrate the utility of the Functional Data Model and how it can be used to facilitate cross-discipline collaboration.
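The functional view itself, a dataset as a mapping from independent (domain) variables to dependent (codomain) variables, can be illustrated independently of the LaTiS implementation. The Python sketch below is a minimal illustration of the concept; the class and method names are invented and are not part of LaTiS.

    from dataclasses import dataclass
    from typing import Callable, Tuple

    @dataclass
    class FunctionalDataset:
        """A dataset modeled as a function: domain samples -> codomain values."""
        domain: Tuple[str, ...]            # e.g. ("time", "lat", "lon")
        codomain: Tuple[str, ...]          # e.g. ("temperature",)
        samples: dict                      # domain tuple -> codomain tuple

        def __call__(self, *key):
            return self.samples[key]

        def select(self, predicate: Callable) -> "FunctionalDataset":
            """Functional-algebra style operation: filter the domain."""
            kept = {k: v for k, v in self.samples.items() if predicate(*k)}
            return FunctionalDataset(self.domain, self.codomain, kept)

    # Temperature expressed as a function of (time, lat, lon)
    ds = FunctionalDataset(("time", "lat", "lon"), ("temperature",),
                           {(0, 40.0, -105.0): (271.3,), (1, 40.0, -105.0): (272.1,)})
    print(ds(1, 40.0, -105.0))             # (272.1,)
    print(ds.select(lambda t, lat, lon: t >= 1).samples)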
Ducheyne, Els; Tran Minh, Nhu Nguyen; Haddad, Nabil; Bryssinckx, Ward; Buliva, Evans; Simard, Frédéric; Malik, Mamunur Rahman; Charlier, Johannes; De Waele, Valérie; Mahmoud, Osama; Mukhtar, Muhammad; Bouattour, Ali; Hussain, Abdulhafid; Hendrickx, Guy; Roiz, David
2018-02-14
Aedes-borne diseases such as dengue, Zika, chikungunya and yellow fever are an emerging problem worldwide, transmitted by Aedes aegypti and Aedes albopictus. Lack of up-to-date information about the distribution of Aedes species hampers surveillance and control. Global databases have been compiled, but these did not capture data in the WHO Eastern Mediterranean Region (EMR), and any models built using these datasets fail to identify highly suitable areas where one or both species may occur. The first objective of this study was therefore to update the existing Ae. aegypti (Linnaeus, 1762) and Ae. albopictus (Skuse, 1895) compendia, and the second objective was to generate species distribution models targeted to the EMR. A final objective was to engage the WHO points of contact within the region to provide feedback and hence validate all model outputs. The Ae. aegypti and Ae. albopictus compendia provided by Kraemer et al. (Sci Data 2:150035, 2015; Dryad Digit Repos, 2015) were used as starting points. These datasets were extended with more recent species and disease data. In the next step, these sets were filtered using the Köppen-Geiger classification and the Mahalanobis distance. The occurrence data were supplemented with pseudo-absence data as input to Random Forests. The resulting suitability and maximum risk of establishment maps were combined into hard-classified maps per country for expert validation. The EMR datasets consisted of 1995 presence locations for Ae. aegypti and 2868 presence locations for Ae. albopictus. The resulting suitability maps indicated that there exist areas with high suitability and/or maximum risk of establishment for these disease vectors, in contrast with previous model output. Precipitation and host availability, expressed as population density and night-time lights, were the most important variables for Ae. aegypti. Host availability was the most important predictor in the case of Ae. albopictus. Internal validation was assessed geographically. External validation showed high agreement between the predicted maps and the experts' extensive knowledge of the terrain. Maps of distribution and maximum risk of establishment were created for Ae. aegypti and Ae. albopictus for the WHO EMR. These region-specific maps highlighted data gaps and these gaps will be filled using targeted monitoring and surveillance. This will increase the awareness and preparedness of the different countries for Aedes-borne diseases.
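A schematic of the presence/pseudo-absence Random Forest step described above, using scikit-learn; the covariates and their distributions are synthetic stand-ins, not the project's data or variable set.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(3)
    n_pres, n_abs = 300, 300

    # Synthetic covariates: precipitation, population density, night-time lights
    presence = np.column_stack([rng.normal(900, 150, n_pres),
                                rng.lognormal(5, 1, n_pres),
                                rng.uniform(20, 60, n_pres)])
    pseudo_absence = np.column_stack([rng.normal(400, 200, n_abs),
                                      rng.lognormal(3, 1, n_abs),
                                      rng.uniform(0, 30, n_abs)])

    X = np.vstack([presence, pseudo_absence])
    y = np.r_[np.ones(n_pres), np.zeros(n_abs)]

    rf = RandomForestClassifier(n_estimators=300, oob_score=True, random_state=0).fit(X, y)
    print("out-of-bag accuracy:", round(rf.oob_score_, 3))
    print("importances (precip, pop. density, lights):", rf.feature_importances_.round(3))

    # Suitability for a new location = predicted probability of the presence class
    new_site = [[850.0, 180.0, 45.0]]
    print("suitability:", rf.predict_proba(new_site)[0, 1])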
On the uncertainties associated with using gridded rainfall data as a proxy for observed
NASA Astrophysics Data System (ADS)
Tozer, C. R.; Kiem, A. S.; Verdon-Kidd, D. C.
2012-05-01
Gridded rainfall datasets are used in many hydrological and climatological studies, in Australia and elsewhere, including for hydroclimatic forecasting, climate attribution studies and climate model performance assessments. The attraction of the spatial coverage provided by gridded data is clear, particularly in Australia where the spatial and temporal resolution of the rainfall gauge network is sparse. However, the question that must be asked is whether it is suitable to use gridded data as a proxy for observed point data, given that gridded data is inherently "smoothed" and may not necessarily capture the temporal and spatial variability of Australian rainfall which leads to hydroclimatic extremes (i.e. droughts, floods). This study investigates this question through a statistical analysis of three monthly gridded Australian rainfall datasets - the Bureau of Meteorology (BOM) dataset, the Australian Water Availability Project (AWAP) and the SILO dataset. The results of the monthly, seasonal and annual comparisons show that not only are the three gridded datasets different relative to each other, there are also marked differences between the gridded rainfall data and the rainfall observed at gauges within the corresponding grids - particularly for extremely wet or extremely dry conditions. Also important is that the differences observed appear to be non-systematic. To demonstrate the hydrological implications of using gridded data as a proxy for gauged data, a rainfall-runoff model is applied to one catchment in South Australia initially using gauged data as the source of rainfall input and then gridded rainfall data. The results indicate a markedly different runoff response associated with each of the different sources of rainfall data. It should be noted that this study does not seek to identify which gridded dataset is the "best" for Australia, as each gridded data source has its pros and cons, as does gauged data. Rather, the intention is to quantify differences between various gridded data sources and how they compare with gauged data so that these differences can be considered and accounted for in studies that utilise these gridded datasets. Ultimately, if key decisions are going to be based on the outputs of models that use gridded data, an estimate (or at least an understanding) of the uncertainties relating to the assumptions made in the development of gridded data and how that gridded data compares with reality should be made.
Using iMCFA to Perform the CFA, Multilevel CFA, and Maximum Model for Analyzing Complex Survey Data.
Wu, Jiun-Yu; Lee, Yuan-Hsuan; Lin, John J H
2018-01-01
To construct CFA, MCFA, and maximum MCFA with LISREL v.8 and below, we provide iMCFA (integrated Multilevel Confirmatory Analysis) to examine the potential multilevel factorial structure in the complex survey data. Modeling multilevel structure for complex survey data is complicated because building a multilevel model is not an infallible statistical strategy unless the hypothesized model is close to the real data structure. Methodologists have suggested using different modeling techniques to investigate potential multilevel structure of survey data. Using iMCFA, researchers can visually set the between- and within-level factorial structure to fit MCFA, CFA and/or MAX MCFA models for complex survey data. iMCFA can then yield between- and within-level variance-covariance matrices, calculate intraclass correlations, perform the analyses and generate the outputs for respective models. The summary of the analytical outputs from LISREL is gathered and tabulated for further model comparison and interpretation. iMCFA also provides LISREL syntax of different models for researchers' future use. An empirical and a simulated multilevel dataset with complex and simple structures in the within or between level was used to illustrate the usability and the effectiveness of the iMCFA procedure on analyzing complex survey data. The analytic results of iMCFA using Muthen's limited information estimator were compared with those of Mplus using Full Information Maximum Likelihood regarding the effectiveness of different estimation methods.
Ecological Assimilation of Land and Climate Observations - the EALCO model
NASA Astrophysics Data System (ADS)
Wang, S.; Zhang, Y.; Trishchenko, A.
2004-05-01
Ecosystems are intrinsically dynamic and interact with climate at a highly integrated level. Climate variables are the main driving factors controlling ecosystem physical, physiological, and biogeochemical processes including energy balance, water balance, photosynthesis, respiration, and nutrient cycling. On the other hand, ecosystems function as an integrated whole and feed back on the climate system through their control on surface radiation balance, energy partitioning, and greenhouse gas exchange. To improve our capability in climate change impact assessment, a comprehensive ecosystem model is required to address the many interactions between climate change and ecosystems. In addition, different ecosystems can have very different responses to climate change and its variability. To provide more scientific support for ecosystem impact assessment at the national scale, it is imperative that ecosystem models have the capability of assimilating large-scale geospatial information including satellite observations, GIS datasets, and climate model outputs or reanalysis. The EALCO model (Ecological Assimilation of Land and Climate Observations) is developed for such purposes. EALCO includes the comprehensive interactions among ecosystem processes and climate, and assimilates a variety of remote sensing products and GIS databases. It provides both national- and local-scale model outputs for ecosystem responses to climate change, including radiation and energy balances, water conditions and hydrological cycles, carbon sequestration and greenhouse gas exchange, and nutrient (N) cycling. These results form the foundation for the assessment of climate change impact on ecosystems, their services, and adaptation options. In this poster, the main algorithms for the radiation, energy, water, carbon, and nitrogen simulations were diagrammed. Sample input data layers at the Canada national scale were illustrated. Model outputs including the Canada-wide spatial distributions of net radiation, evapotranspiration, gross primary production, net primary production, and net ecosystem production were discussed.
True beam commissioning experience at Nordland Hospital Trust, Norway
NASA Astrophysics Data System (ADS)
Daci, Lulzime; Malkaj, Partizan
2016-03-01
To evaluate the measured photon beam data of the first Varian True Beam version 2.0 slim model, recently commissioned at Nordland Hospital Trust, Bodø, and to assess the possibility of beam matching with the Clinac 2300 for the 6 MV and 15 MV energies. Materials/Methods: Measurements of PDD, OAR, and output factors were performed with the IBA Blue phantom using different detectors and compared for all photon energies: 6 MV, 15 MV, 6 MV FFF and 10 MV FFF. The ionization chambers used were the Pin Point CC01, CC04 and Semiflex CC13, together with a photon diode from IBA Dosimetry. The data were processed using a Bézier algorithm with a resolution of 1 mm. The measured depth dose curves, diagonals, OAR, and output factors were imported into Eclipse to calculate beam data for the anisotropic analytical algorithm (AAA version 10.0.28) for both the CC04 and CC13 datasets, which were then compared. The 23EX head model was selected as the closest available match to the True Beam, a restriction of our version of Aria. Better results were achieved with the CC04 data as a result of its better resolution. For the largest field, beyond 10 cm depth, a larger difference is seen between measured and calculated values for both datasets, but it is within the acceptance criteria. Results: The beam analysis criterion of 2 mm at 50% dose is met for all fields except 40x40, which is within 3%. The depth of maximum dose agrees within 1 mm for all fields, and the dose difference at 100 mm and 200 mm is lower than 1% for all fields. The PDDs of the two machines differ beyond Dmax by less than 1% for all fields. For profiles inside and outside the field, the difference is within 1% for all fields. In the penumbra region the difference ranges from 2% up to 12% for large fields. The diagonals differ as a result of the head construction at the field edge and penumbra region. The output factors differ within 5% for large fields and within 3% for small fields. MU and dose distributions do not change for plans recalculated with the newly modeled machine.
Better ILP models for haplotype assembly.
Etemadi, Maryam; Bagherian, Mehri; Chen, Zhi-Zhong; Wang, Lusheng
2018-02-19
The haplotype assembly problem for diploid is to find a pair of haplotypes from a given set of aligned Single Nucleotide Polymorphism (SNP) fragments (reads). It has many applications in association studies, drug design, and genetic research. Since this problem is computationally hard, both heuristic and exact algorithms have been designed for it. Although exact algorithms are much slower, they are still of great interest because they usually output significantly better solutions than heuristic algorithms in terms of popular measures such as the Minimum Error Correction (MEC) score, the number of switch errors, and the QAN50 score. Exact algorithms are also valuable because they can be used to witness how good a heuristic algorithm is. The best known exact algorithm is based on integer linear programming (ILP) and it is known that ILP can also be used to improve the output quality of every heuristic algorithm with a little decline in speed. Therefore, faster ILP models for the problem are highly demanded. As in previous studies, we consider not only the general case of the problem but also its all-heterozygous case where we assume that if a column of the input read matrix contains at least one 0 and one 1, then it corresponds to a heterozygous SNP site. For both cases, we design new ILP models for the haplotype assembly problem which aim at minimizing the MEC score. The new models are theoretically better because they contain significantly fewer constraints. More importantly, our experimental results show that for both simulated and real datasets, the new model for the all-heterozygous (respectively, general) case can usually be solved via CPLEX (an ILP solver) at least 5 times (respectively, twice) faster than the previous bests. Indeed, the running time can sometimes be 41 times better. This paper proposes a new ILP model for the haplotype assembly problem and its all-heterozygous case, respectively. Experiments with both real and simulated datasets show that the new models can be solved within much shorter time by CPLEX than the previous bests. We believe that the models can be used to improve heuristic algorithms as well.
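For reference, the MEC objective the ILP minimizes can be stated compactly: each read is charged the minimum number of corrections needed to make it consistent with one of the two haplotypes, and the score is the sum over reads. A small sketch computing that score for a candidate haplotype pair (toy data; '-' marks SNP sites not covered by a read); this illustrates the objective only, not the ILP formulation itself.

    def mec_score(reads, h1, h2):
        """Minimum Error Correction score of a haplotype pair for a set of reads."""
        def mismatches(read, hap):
            return sum(1 for r, h in zip(read, hap) if r != '-' and r != h)
        return sum(min(mismatches(r, h1), mismatches(r, h2)) for r in reads)

    reads = ["01-0", "-110", "0100", "1011"]
    print(mec_score(reads, h1="0100", h2="1011"))   # 1: only the read "-110" needs a correction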
NASA Astrophysics Data System (ADS)
Habarulema, J. B.; McKinnell, L.-A.
2012-05-01
In this work, results obtained by investigating the application of different neural network backpropagation training algorithms are presented. This was done to assess the performance accuracy of each training algorithm in total electron content (TEC) estimations using identical datasets in models development and verification processes. Investigated training algorithms are standard backpropagation (SBP), backpropagation with weight delay (BPWD), backpropagation with momentum (BPM) term, backpropagation with chunkwise weight update (BPC) and backpropagation for batch (BPB) training. These five algorithms are inbuilt functions within the Stuttgart Neural Network Simulator (SNNS) and the main objective was to find out the training algorithm that generates the minimum error between the TEC derived from Global Positioning System (GPS) observations and the modelled TEC data. Another investigated algorithm is the MatLab based Levenberg-Marquardt backpropagation (L-MBP), which achieves convergence after the least number of iterations during training. In this paper, neural network (NN) models were developed using hourly TEC data (for 8 years: 2000-2007) derived from GPS observations over a receiver station located at Sutherland (SUTH) (32.38° S, 20.81° E), South Africa. Verification of the NN models for all algorithms considered was performed on both "seen" and "unseen" data. Hourly TEC values over SUTH for 2003 formed the "seen" dataset. The "unseen" dataset consisted of hourly TEC data for 2002 and 2008 over Cape Town (CPTN) (33.95° S, 18.47° E) and SUTH, respectively. The models' verification showed that all algorithms investigated provide comparable results statistically, but differ significantly in terms of time required to achieve convergence during input-output data training/learning. This paper therefore provides a guide to neural network users for choosing appropriate algorithms based on the availability of computation capabilities used for research.
NASA Astrophysics Data System (ADS)
Chegwidden, O.; Nijssen, B.; Rupp, D. E.; Kao, S. C.; Clark, M. P.
2017-12-01
We describe results from a large hydrologic climate change dataset developed across the Pacific Northwestern United States and discuss how the analysis of those results can be seen as a framework for other large hydrologic ensemble investigations. This investigation will better inform future modeling efforts and large ensemble analyses across domains within and beyond the Pacific Northwest. Using outputs from the Coupled Model Intercomparison Project Phase 5 (CMIP5), we provide projections of hydrologic change for the domain through the end of the 21st century. The dataset is based upon permutations of four methodological choices: (1) ten global climate models (2) two representative concentration pathways (3) three meteorological downscaling methods and (4) four unique hydrologic model set-ups (three of which entail the same hydrologic model using independently calibrated parameter sets). All simulations were conducted across the Columbia River Basin and Pacific coastal drainages at a 1/16th ( 6 km) resolution and at a daily timestep. In total, the 172 distinct simulations offer an updated, comprehensive view of climate change projections through the end of the 21st century. The results consist of routed streamflow at 400 sites throughout the domain as well as distributed spatial fields of relevant hydrologic variables like snow water equivalent and soil moisture. In this presentation, we discuss the level of agreement with previous hydrologic projections for the study area and how these projections differ with specific methodological choices. By controlling for some methodological choices we can show how each choice affects key climatic change metrics. We discuss how the spread in results varies across hydroclimatic regimes. We will use this large dataset as a case study for distilling a wide range of hydroclimatological projections into useful climate change assessments.
NASA Astrophysics Data System (ADS)
Scherstjanoi, M.; Kaplan, J. O.; Thürig, E.; Lischke, H.
2013-09-01
Models of vegetation dynamics that are designed for application at spatial scales larger than individual forest gaps suffer from several limitations. Typically, either a population average approximation is used that results in unrealistic tree allometry and forest stand structure, or models have a high computational demand because they need to simulate both a series of age-based cohorts and a number of replicate patches to account for stochastic gap-scale disturbances. The detail required by the latter method increases the number of calculations by two to three orders of magnitude compared to the less realistic population average approach. In an effort to increase the efficiency of dynamic vegetation models without sacrificing realism, we developed a new method for simulating stand-replacing disturbances that is both accurate and faster than approaches that use replicate patches. The GAPPARD (approximating GAP model results with a Probabilistic Approach to account for stand Replacing Disturbances) method works by postprocessing the output of deterministic, undisturbed simulations of a cohort-based vegetation model by deriving the distribution of patch ages at any point in time on the basis of a disturbance probability. With this distribution, the expected value of any output variable can be calculated from the output values of the deterministic undisturbed run at the time corresponding to the patch age. To account for temporal changes in model forcing (e.g., as a result of climate change), GAPPARD performs a series of deterministic simulations and interpolates between the results in the postprocessing step. We integrated the GAPPARD method in the vegetation model LPJ-GUESS, and evaluated it in a series of simulations along an altitudinal transect of an inner-Alpine valley. We obtained results very similar to the output of the original LPJ-GUESS model that uses 100 replicate patches, but simulation time was reduced by approximately the factor 10. Our new method is therefore highly suited for rapidly approximating LPJ-GUESS results, and provides the opportunity for future studies over large spatial domains, allows easier parameterization of tree species, faster identification of areas of interesting simulation results, and comparisons with large-scale datasets and results of other forest models.
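The post-processing idea can be sketched directly: assuming a constant annual probability p of stand-replacing disturbance, the patch-age distribution is geometric, and the landscape-level expectation of any output variable is a weighted average of the undisturbed run's values at the corresponding stand ages. A minimal Python illustration of that weighting (the disturbance probability and the output series are invented; this is not the GAPPARD code and ignores changing climate forcing):

    import numpy as np

    p = 0.01                                   # annual probability of stand-replacing disturbance
    ages = np.arange(400)                      # patch ages (years since last disturbance)
    weights = p * (1 - p) ** ages              # geometric patch-age distribution
    weights /= weights.sum()                   # renormalise over the truncated age range

    # Undisturbed cohort-model output indexed by stand age, e.g. biomass recovering after disturbance
    undisturbed_biomass = 20.0 * (1 - np.exp(-ages / 60.0))

    expected_biomass = np.sum(weights * undisturbed_biomass)
    print(f"landscape expectation with disturbances: {expected_biomass:.1f} "
          f"(undisturbed old-growth value: {undisturbed_biomass[-1]:.1f})")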
Bedmap2; Mapping, visualizing and communicating the Antarctic sub-glacial environment.
NASA Astrophysics Data System (ADS)
Fretwell, Peter; Pritchard, Hamish
2013-04-01
The Bedmap2 project has been a large cooperative effort to compile, model, map and visualize the ice-rock interface beneath the Antarctic ice sheet. Here we present the final output of that project: the Bedmap2 printed map. The map is an A1, double-sided print showing 2D and 3D visualizations of the dataset. It includes scientific interpretations, cross sections and comparisons with other areas. Paper copies of the colour double-sided map will be freely distributed at this session.
Deep classification hashing for person re-identification
NASA Astrophysics Data System (ADS)
Wang, Jiabao; Li, Yang; Zhang, Xiancai; Miao, Zhuang; Tao, Gang
2018-04-01
With the growth of surveillance in public spaces, person re-identification becomes more and more important. Large-scale databases call for efficient computation and storage, and hashing is one of the most important techniques. In this paper, we propose a new deep classification hashing network by introducing a new binary appropriation layer into traditional ImageNet pre-trained CNN models. It outputs binary-appropriate features, which can be easily quantized into binary hash codes for Hamming similarity comparison. Experiments show that our deep hashing method can outperform the state-of-the-art methods on the public CUHK03 and Market1501 datasets.
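The retrieval step this approach relies on, thresholding near-binary features into hash codes and ranking gallery images by Hamming distance, can be sketched with NumPy; the feature values below are random stand-ins for CNN outputs, not features from the paper's network.

    import numpy as np

    rng = np.random.default_rng(1)
    gallery_feats = rng.uniform(0, 1, size=(1000, 128))     # near-binary CNN outputs for the gallery
    query_feat = rng.uniform(0, 1, size=128)

    gallery_codes = (gallery_feats > 0.5).astype(np.uint8)  # quantize to binary hash codes
    query_code = (query_feat > 0.5).astype(np.uint8)

    hamming = (gallery_codes != query_code).sum(axis=1)     # Hamming distance to every gallery code
    ranking = np.argsort(hamming)
    print("closest gallery entries:", ranking[:5], "distances:", hamming[ranking[:5]])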
Web-GIS visualisation of permafrost-related Remote Sensing products for ESA GlobPermafrost
NASA Astrophysics Data System (ADS)
Haas, A.; Heim, B.; Schaefer-Neth, C.; Laboor, S.; Nitze, I.; Grosse, G.; Bartsch, A.; Kaab, A.; Strozzi, T.; Wiesmann, A.; Seifert, F. M.
2016-12-01
The ESA GlobPermafrost (www.globpermafrost.info) provides a remote sensing service for permafrost research and applications. The service comprises of data product generation for various sites and regions as well as specific infrastructure allowing overview and access to datasets. Based on an online user survey conducted within the project, the user community extensively applies GIS software to handle remote sensing-derived datasets and requires preview functionalities before accessing them. In response, we develop the Permafrost Information System PerSys which is conceptualized as an open access geospatial data dissemination and visualization portal. PerSys will allow visualisation of GlobPermafrost raster and vector products such as land cover classifications, Landsat multispectral index trend datasets, lake and wetland extents, InSAR-based land surface deformation maps, rock glacier velocity fields, spatially distributed permafrost model outputs, and land surface temperature datasets. The datasets will be published as WebGIS services relying on OGC-standardized Web Mapping Service (WMS) and Web Feature Service (WFS) technologies for data display and visualization. The WebGIS environment will be hosted at the AWI computing centre where a geodata infrastructure has been implemented comprising of ArcGIS for Server 10.4, PostgreSQL 9.2 and a browser-driven data viewer based on Leaflet (http://leafletjs.com). Independently, we will provide an `Access - Restricted Data Dissemination Service', which will be available to registered users for testing frequently updated versions of project datasets. PerSys will become a core project of the Arctic Permafrost Geospatial Centre (APGC) within the ERC-funded PETA-CARB project (www.awi.de/petacarb). The APGC Data Catalogue will contain all final products of GlobPermafrost, allow in-depth dataset search via keywords, spatial and temporal coverage, data type, etc., and will provide DOI-based links to the datasets archived in the long-term, open access PANGAEA data repository.
A Historical Forcing Ice Sheet Model Validation Framework for Greenland
NASA Astrophysics Data System (ADS)
Price, S. F.; Hoffman, M. J.; Howat, I. M.; Bonin, J. A.; Chambers, D. P.; Kalashnikova, I.; Neumann, T.; Nowicki, S.; Perego, M.; Salinger, A.
2014-12-01
We propose an ice sheet model testing and validation framework for Greenland for the years 2000 to the present. Following Perego et al. (2014), we start with a realistic ice sheet initial condition that is in quasi-equilibrium with climate forcing from the late 1990's. This initial condition is integrated forward in time while simultaneously applying (1) surface mass balance forcing (van Angelen et al., 2013) and (2) outlet glacier flux anomalies, defined using a new dataset of Greenland outlet glacier flux for the past decade (Enderlin et al., 2014). Modeled rates of mass and elevation change are compared directly to remote sensing observations obtained from GRACE and ICESat. Here, we present a detailed description of the proposed validation framework including the ice sheet model and model forcing approach, the model-to-observation comparison process, and initial results comparing model output and observations for the time period 2000-2013.
Simulating seasonal tropical cyclone intensities at landfall along the South China coast
NASA Astrophysics Data System (ADS)
Lok, Charlie C. F.; Chan, Johnny C. L.
2018-04-01
A numerical method is developed using a regional climate model (RegCM3) and the Weather Research and Forecasting (WRF) model to predict seasonal tropical cyclone (TC) intensities at landfall for the South China region. In designing the model system, three sensitivity tests have been performed to identify the optimal choice of the RegCM3 model domain, WRF horizontal resolution, and WRF physics packages. Driven by the National Centers for Environmental Prediction Climate Forecast System Reanalysis dataset, the model system can produce a reasonable distribution of TC intensities at landfall on a seasonal scale. Analyses of the model output suggest that the strength and extent of the subtropical ridge in the East China Sea are crucial to simulating TC landfalls in the Guangdong and Hainan provinces. This study demonstrates the potential for predicting TC intensities at landfall on a seasonal basis as well as projecting future climate changes using numerical models.
Applications of information theory, genetic algorithms, and neural models to predict oil flow
NASA Astrophysics Data System (ADS)
Ludwig, Oswaldo; Nunes, Urbano; Araújo, Rui; Schnitman, Leizer; Lepikson, Herman Augusto
2009-07-01
This work introduces a new information-theoretic methodology for choosing variables and their time lags in a prediction setting, particularly when neural networks are used in non-linear modeling. The first contribution of this work is the Cross Entropy Function (XEF) proposed to select input variables and their lags in order to compose the input vector of black-box prediction models. The proposed XEF method is more appropriate than the usually applied Cross Correlation Function (XCF) when the relationship among the input and output signals comes from a non-linear dynamic system. The second contribution is a method that minimizes the Joint Conditional Entropy (JCE) between the input and output variables by means of a Genetic Algorithm (GA). The aim is to take into account the dependence among the input variables when selecting the most appropriate set of inputs for a prediction problem. In short, these methods can be used to assist the selection of input training data that have the necessary information to predict the target data. The proposed methods are applied to a petroleum engineering problem: predicting oil production. Experimental results obtained with a real-world dataset are presented, demonstrating the feasibility and effectiveness of the method.
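A sketch of the same lag-selection idea, using scikit-learn's mutual information estimator in place of the paper's cross-entropy function (XEF): score each candidate input at each lag by how much information it carries about the target, then keep the best-scoring (variable, lag) pairs. The variables and lags below are synthetic, chosen only to make the example self-checking.

    import numpy as np
    from sklearn.feature_selection import mutual_info_regression

    rng = np.random.default_rng(0)
    n = 500
    pressure = rng.normal(size=n)
    choke = rng.normal(size=n)
    # Synthetic target: depends on pressure lagged by 2 and choke lagged by 5
    oil_rate = np.roll(pressure, 2) + 0.5 * np.roll(choke, 5) + 0.1 * rng.normal(size=n)

    max_lag = 8
    scores = {}
    for name, series in [("pressure", pressure), ("choke", choke)]:
        for lag in range(1, max_lag + 1):
            x = series[:-lag].reshape(-1, 1)          # candidate lagged input
            y = oil_rate[lag:]                        # target aligned with that lag
            scores[(name, lag)] = mutual_info_regression(x, y, random_state=0)[0]

    best = sorted(scores, key=scores.get, reverse=True)[:3]
    print("highest-information (variable, lag) pairs:", best)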
Beta Hebbian Learning as a New Method for Exploratory Projection Pursuit.
Quintián, Héctor; Corchado, Emilio
2017-09-01
In this research, a novel family of learning rules called Beta Hebbian Learning (BHL) is thoroughly investigated to extract information from high-dimensional datasets by projecting the data onto low-dimensional (typically two dimensional) subspaces, improving the existing exploratory methods by providing a clear representation of data's internal structure. BHL applies a family of learning rules derived from the Probability Density Function (PDF) of the residual based on the beta distribution. This family of rules may be called Hebbian in that all use a simple multiplication of the output of the neural network with some function of the residuals after feedback. The derived learning rules can be linked to an adaptive form of Exploratory Projection Pursuit and with artificial distributions, the networks perform as the theory suggests they should: the use of different learning rules derived from different PDFs allows the identification of "interesting" dimensions (as far from the Gaussian distribution as possible) in high-dimensional datasets. This novel algorithm, BHL, has been tested over seven artificial datasets to study the behavior of BHL parameters, and was later applied successfully over four real datasets, comparing its results, in terms of performance, with other well-known Exploratory and projection models such as Maximum Likelihood Hebbian Learning (MLHL), Locally-Linear Embedding (LLE), Curvilinear Component Analysis (CCA), Isomap and Neural Principal Component Analysis (Neural PCA).
Quantifying Interannual Variability for Photovoltaic Systems in PVWatts
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ryberg, David Severin; Freeman, Janine; Blair, Nate
2015-10-01
The National Renewable Energy Laboratory's (NREL's) PVWatts is a relatively simple tool used by industry and individuals alike to easily estimate the amount of energy a photovoltaic (PV) system will produce throughout the course of a typical year. PVWatts Version 5 has previously been shown to be able to reasonably represent an operating system's output when provided with concurrent weather data; however, this type of data is not available when estimating system output during future time frames. For this purpose PVWatts uses weather data from typical meteorological year (TMY) datasets which are available on the NREL website. The TMY files represent a statistically 'typical' year which by definition excludes anomalous weather patterns and as a result may not provide sufficient quantification of project risk to the financial community. It was therefore desired to quantify the interannual variability associated with TMY files in order to improve the understanding of risk associated with these projects. To begin to understand the interannual variability of a PV project, we simulated two archetypal PV system designs, which are common in the PV industry, in PVWatts using the NSRDB's 1961-1990 historical dataset. This dataset contains measured hourly weather data and spans the thirty years from 1961-1990 for 239 locations in the United States. Of note, this historical dataset was used to compose the TMY2 dataset. Using the results of these simulations we computed several statistical metrics which may be of interest to the financial community and normalized the results with respect to the TMY energy prediction at each location, so that these results could be easily translated to similar systems. This report briefly describes the simulation process used and the statistical methodology employed for this project, but otherwise focuses mainly on a sample of our results. A short discussion of these results is also provided. It is our hope that this quantification of the interannual variability of PV systems will provide a starting point for variability considerations in future PV system designs and investigations.
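The normalization step described above, expressing each historical year's simulated output relative to the TMY prediction, is simple to sketch. The annual energy values below are synthetic placeholders, not PVWatts results, and the metrics shown (mean, standard deviation, exceedance level, worst year) are illustrative of the kind of statistics a financial reviewer might want.

    import numpy as np

    rng = np.random.default_rng(7)
    annual_kwh = rng.normal(loc=15000, scale=600, size=30)   # simulated output for 30 historical years
    tmy_kwh = 15200.0                                        # prediction from the TMY file

    normalized = annual_kwh / tmy_kwh
    print(f"mean/TMY:   {normalized.mean():.3f}")
    print(f"std/TMY:    {normalized.std(ddof=1):.3f}")
    print(f"P90/TMY:    {np.percentile(normalized, 10):.3f}")   # level exceeded in 90% of years
    print(f"worst year: {normalized.min():.3f}")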
NASA Astrophysics Data System (ADS)
Blower, Jon; Lawrence, Bryan; Kershaw, Philip; Nagni, Maurizio
2014-05-01
The research process can be thought of as an iterative activity, initiated on the basis of prior domain knowledge as well as a number of external inputs, and producing a range of outputs including datasets, studies and peer-reviewed publications. These outputs may describe the problem under study, the methodology used, the results obtained, etc. In any new publication, the author may cite or comment on other papers or datasets in order to support their research hypothesis. However, as their work progresses, the researcher may draw from many other latent channels of information. These could include, for example, a private conversation following a lecture or during a social dinner, or an opinion expressed concerning some significant event such as an earthquake or a satellite failure. In addition, other public sources of grey literature are important, such as informal papers (e.g. arXiv deposits), reports and studies. The climate science community is not an exception to this pattern; the CHARMe project, funded under the European FP7 framework, is developing an online system for collecting and sharing user feedback on climate datasets. This is to help users judge how suitable such climate data are for an intended application. The user feedback could be comments about assessments, citations, or provenance of the dataset, or other information such as descriptions of uncertainty or data quality. We define this as a distinct category of metadata called Commentary or C-metadata. We link C-metadata with target climate datasets using a Linked Data approach via the Open Annotation data model. In the context of Linked Data, C-metadata plays the role of a resource which, depending on its nature, may be accessed as simple text or as more structured content. The project is implementing a range of software tools to create, search or visualize C-metadata, including a JavaScript plugin enabling this functionality to be integrated in situ with data provider portals. Since commentary metadata may originate from a range of sources, moderation of this information will become a crucial issue. If the project is successful, expert human moderation (analogous to peer review) will become impracticable as annotation numbers increase, and some combination of algorithmic and crowd-sourced evaluation of commentary metadata will be necessary. To that end, future work will need to extend the work under development to enable access control and input checking in order to deal with scale.
Neural Network Machine Learning and Dimension Reduction for Data Visualization
NASA Technical Reports Server (NTRS)
Liles, Charles A.
2014-01-01
Neural network machine learning in computer science is a continuously developing field of study. Although neural network models have been developed which can accurately predict a numeric value or nominal classification, a general purpose method for constructing neural network architecture has yet to be developed. Computer scientists are often forced to rely on a trial-and-error process of developing and improving accurate neural network models. In many cases, models are constructed from a large number of input parameters. Which input parameters have the greatest impact on the model's prediction is often difficult to surmise, especially when the number of input variables is very high. This challenge is often labeled the "curse of dimensionality" in scientific fields. However, techniques exist for reducing the dimensionality of problems to just two dimensions. Once a problem's dimensions have been mapped to two dimensions, it can be easily plotted and understood by humans. The ability to visualize a multi-dimensional dataset can provide a means of identifying which input variables have the highest effect on determining a nominal or numeric output. Identifying these variables can provide a better means of training neural network models; models can be more easily and quickly trained using only input variables which appear to affect the outcome variable. The purpose of this project is to explore varying means of training neural networks and to utilize dimensional reduction for visualizing and understanding complex datasets.
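A short sketch of the workflow described above, mapping a many-input dataset to two dimensions and plotting it to see which inputs separate the outcome classes, using PCA from scikit-learn on a bundled example dataset; the project may well use a different reduction technique or data.

    import matplotlib.pyplot as plt
    from sklearn.datasets import load_wine
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    X, y = load_wine(return_X_y=True)              # 13 input variables, 3 outcome classes
    X2 = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(X))

    plt.scatter(X2[:, 0], X2[:, 1], c=y, cmap="viridis", s=20)
    plt.xlabel("component 1")
    plt.ylabel("component 2")
    plt.title("13-dimensional dataset mapped to two dimensions")
    plt.show()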
Semi-supervised tracking of extreme weather events in global spatio-temporal climate datasets
NASA Astrophysics Data System (ADS)
Kim, S. K.; Prabhat, M.; Williams, D. N.
2017-12-01
Deep neural networks have been successfully applied to the problem of detecting extreme weather events in large-scale climate datasets, attaining performance that overshadows all previous hand-crafted methods. Recent work has shown that a multichannel spatiotemporal encoder-decoder CNN architecture is able to localize events with semi-supervised bounding-box prediction. Motivated by this work, we propose a new learning framework based on Variational Auto-Encoders (VAE) and Long Short-Term Memory (LSTM) networks to track extreme weather events in spatio-temporal datasets. We treat spatio-temporal object tracking as learning the probabilistic distribution of continuous latent features of an auto-encoder using stochastic variational inference. For this, we assume that our datasets are i.i.d. and that the latent features can be modeled by a Gaussian distribution. In the proposed framework, we first train a VAE to generate the approximate posterior given multichannel climate input containing an extreme climate event at a fixed time. Then, we predict the bounding box, location and class of extreme climate events using convolutional layers, given an input concatenating three features: the embedding, the sampled mean and the standard deviation. Lastly, we train an LSTM on the concatenated input to learn the temporal behaviour of the dataset by recurrently feeding the output back into the next time step's input of the VAE. Our contribution is two-fold. First, we present the first semi-supervised end-to-end architecture based on a VAE for tracking extreme weather events, which can be applied to massive unlabeled climate datasets. Second, the temporal movement of events is taken into account for bounding-box prediction using the LSTM, which can improve localization accuracy. To our knowledge, this technique has not been explored in either the climate or the machine learning community.
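A minimal PyTorch sketch of the two ingredients described above: a convolutional VAE encoder that produces a sampled latent vector together with its mean and standard deviation for each frame, and an LSTM that consumes the concatenated features over time to emit per-time-step bounding-box and class outputs. Channel counts, latent size and the box parameterization are assumptions, and the training losses (reconstruction, KL and box regression) are omitted.

import torch
import torch.nn as nn

class FrameEncoder(nn.Module):
    def __init__(self, in_channels=16, latent_dim=64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.to_mu = nn.Linear(64, latent_dim)
        self.to_logvar = nn.Linear(64, latent_dim)

    def forward(self, x):
        h = self.features(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        std = torch.exp(0.5 * logvar)
        z = mu + std * torch.randn_like(std)        # reparameterization trick
        return torch.cat([z, mu, std], dim=1)       # embedding + mean + std

class EventTracker(nn.Module):
    def __init__(self, in_channels=16, latent_dim=64, n_classes=1):
        super().__init__()
        self.encoder = FrameEncoder(in_channels, latent_dim)
        self.lstm = nn.LSTM(3 * latent_dim, 128, batch_first=True)
        self.head = nn.Linear(128, 4 + n_classes)   # box (x, y, w, h) + class logits

    def forward(self, frames):                      # frames: (batch, time, C, H, W)
        b, t = frames.shape[:2]
        feats = self.encoder(frames.flatten(0, 1)).view(b, t, -1)
        out, _ = self.lstm(feats)
        return self.head(out)                       # per-time-step box + class outputs

# Smoke test on random data: 2 sequences, 5 time steps, 16 channels, 64x64 grid cells.
pred = EventTracker()(torch.randn(2, 5, 16, 64, 64))
print(pred.shape)                                   # torch.Size([2, 5, 5])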
Reproducibility and Transparency in Ocean-Climate Modeling
NASA Astrophysics Data System (ADS)
Hannah, N.; Adcroft, A.; Hallberg, R.; Griffies, S. M.
2015-12-01
Reproducibility is a cornerstone of the scientific method. Within geophysical modeling and simulation, achieving reproducibility can be difficult, especially given the complexity of numerical codes, enormous and disparate datasets, and the variety of supercomputing technology. We have made progress on this problem in the context of a large project - the development of new ocean and sea ice models, MOM6 and SIS2. Here we present useful techniques and experience. We use version control not only for code but also for the entire experiment working directory, including configuration (run-time parameters, component versions), input data and checksums on experiment output. This allows us to document when the solutions to experiments change, whether due to code updates or changes in input data. To avoid distributing large input datasets we provide the tools for generating these from the sources, rather than providing raw input data. Bugs can be a source of non-determinism and hence irreproducibility, e.g. reading from or branching on uninitialized memory. To expose these we routinely run system tests, using a memory debugger, multiple compilers and different machines. Additional confidence in the code comes from specialised tests, for example automated dimensional analysis and domain transformations. This has entailed adopting a code style in which we deliberately restrict what a compiler can do when re-arranging mathematical expressions. In the spirit of open science, all development is in the public domain. This leads to a positive feedback, where increased transparency and reproducibility make using the model easier for external collaborators, who in turn provide valuable contributions. To facilitate users installing and running the model we provide (version-controlled) digital notebooks that illustrate and record analysis of output. These play the dual role of providing a gross, platform-independent testing capability and a means to document model output and analysis.
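One of the practices mentioned above, keeping checksums of experiment output under version control so that solution changes are detectable, might look roughly like the following Python sketch; the directory layout and file names are hypothetical.

import hashlib
import json
from pathlib import Path

def sha256sum(path, chunk=1 << 20):
    """Hash a file in chunks so large output files are handled without loading them whole."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

output_dir = Path("experiment/output")               # hypothetical experiment directory
output_dir.mkdir(parents=True, exist_ok=True)        # only so this sketch runs standalone
manifest = {p.name: sha256sum(p) for p in sorted(output_dir.glob("*.nc"))}

# Commit this manifest with the experiment; a changed hash flags a changed solution.
Path("experiment/checksums.json").write_text(json.dumps(manifest, indent=2))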
NASA Astrophysics Data System (ADS)
Drapek, R. J.; Kim, J. B.
2013-12-01
We simulated ecosystem response to climate change in the USA and Canada at a 5 arc-minute grid resolution using the MC1 dynamic global vegetation model and nine CMIP3 future climate projections as input. The climate projections were produced by 3 GCMs simulating 3 SRES emissions scenarios. We examined MC1 outputs for the conterminous USA by summarizing them by EPA level II and III ecoregions to characterize model skill and evaluate the magnitude and uncertainties of simulated ecosystem response to climate change. First, we evaluated model skill by comparing outputs from the recent historical period with benchmark datasets. Distribution of potential natural vegetation simulated by MC1 was compared with Kuchler's map. Above ground live carbon simulated by MC1 was compared with the National Biomass and Carbon Dataset. Fire return intervals calculated by MC1 were compared with maximum and minimum values compiled for the United States. Each EPA Level III Ecoregion was scored for average agreement with corresponding benchmark data and an average score was calculated for all three types of output. Greatest agreement with benchmark data happened in the Western Cordillera, the Ozark / Ouachita-Appalachian Forests, and the Southeastern USA Plains (EPA Level II Ecoregions). The lowest agreement happened in the Everglades and the Tamaulipas-Texas Semiarid Plain. For simulated ecosystem response to future climate projections we examined MC1 output for shifts in vegetation type, vegetation carbon, runoff, and biomass consumed by fire. Each ecoregion was scored for the amount of change from historical conditions for each variable and an average score was calculated. Smallest changes were forecast for Western Cordillera and Marine West Coast Forest ecosystems. Largest changes were forecast for the Cold Deserts, the Mixed Wood Plains, and the Central USA Plains. By combining scores of model skill for the historical period for each EPA Level III Ecoregion with scores representing the magnitude of ecosystem changes in the future, we identified high- and low-uncertainty ecoregions. The largest anticipated changes and the lowest measures of model skill coincide in the Central USA Plains and the Mixed Wood Plains. The combination of low model skill and a high degree of ecosystem change elevates the importance of our uncertainty in this ecoregion. The highest projected changes coincide with relatively high model skill in the Cold Deserts. Climate adaptation efforts are the most likely to pay off in these regions. Finally, highest model skill and lowest anticipated changes coincide in the Western Cordillera and the Marine West Coast Forests. These regions may be relatively low-risk for climate change impacts when compared to the other ecoregions. These results represent only the first step in this type of analysis; there exist many ways to strengthen it. One, MC1 calibrations can be optimized using a structured optimization technique. Two, a larger set of climate projections can be used to capture a fuller range of GCMs and emissions scenarios. And three, employing an ensemble of vegetation models would make the analysis more robust.
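The scoring logic described above, combining a historical skill score with a projected-change score per ecoregion, can be illustrated with a toy Python snippet; the ecoregion names are taken from the abstract but the scores and the 0.5 thresholds are invented.

import pandas as pd

scores = pd.DataFrame({
    "ecoregion": ["Western Cordillera", "Cold Deserts", "Mixed Wood Plains"],
    "skill":     [0.9, 0.7, 0.4],    # agreement with benchmark data (0-1), invented
    "change":    [0.2, 0.8, 0.7],    # magnitude of simulated future change (0-1), invented
})
# Flag ecoregions where large projected changes coincide with low historical skill.
scores["high_uncertainty"] = (scores["skill"] < 0.5) & (scores["change"] > 0.5)
print(scores.sort_values("change", ascending=False))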
NASA Technical Reports Server (NTRS)
da Silva, Arlindo M.; Putman, William; Nattala, J.
2014-01-01
This document describes the gridded output files produced by a two-year global, non-hydrostatic mesoscale simulation for the period 2005-2006 produced with the non-hydrostatic version of the GEOS-5 Atmospheric Global Climate Model (AGCM). In addition to standard meteorological parameters (wind, temperature, moisture, surface pressure), this simulation includes 15 aerosol tracers (dust, sea-salt, sulfate, black and organic carbon), O3, CO and CO2. This model simulation is driven by prescribed sea-surface temperature and sea-ice, daily volcanic and biomass burning emissions, as well as high-resolution inventories of anthropogenic sources. A description of the GEOS-5 model configuration used for this simulation can be found in Putman et al. (2014). The simulation is performed at a horizontal resolution of 7 km using a cubed-sphere horizontal grid with 72 vertical levels, extending up to 0.01 hPa (approximately 80 km). For user convenience, all data products are generated on two logically rectangular longitude-latitude grids: a full-resolution 0.0625 deg grid that approximately matches the native cubed-sphere resolution, and another 0.5 deg reduced-resolution grid. The majority of the full-resolution data products are instantaneous, with some fields being time-averaged. The reduced-resolution datasets are mostly time-averaged, with some fields being instantaneous. Hourly data intervals are used for the reduced-resolution datasets, while 30-minute intervals are used for the full-resolution products. All full-resolution output is on the model's native 72-layer hybrid sigma-pressure vertical grid, while the reduced-resolution output is given on native vertical levels and on 48 pressure surfaces extending up to 0.02 hPa. Section 4 presents additional details on horizontal and vertical grids. Information on the model surface representation can be found in Appendix B. The GEOS-5 product is organized into file collections that are described in detail in Appendix C. Additional details about variables listed in this file specification can be found in a separate document, the GEOS-5 File Specification Variable Definition Glossary. Documentation about the current access methods for products described in this document can be found on the GEOS-5 Nature Run portal: http://gmao.gsfc.nasa.gov/projects/G5NR. Information on the scientific quality of this simulation will appear in a forthcoming NASA Technical Report Series on Global Modeling and Data Assimilation to be available from http://gmao.gsfc.nasa.gov/pubs/tm/.
de Lusignan, S; Krause, P; Michalakidis, G; Vicente, M Tristan; Thompson, S; McGilchrist, M; Sullivan, F; van Royen, P; Agreus, L; Desombre, T; Taweel, A; Delaney, B
2012-01-01
To perform a requirements analysis of the barriers to conducting research linking primary care, genetic and cancer data. We extended our initial data-centric approach to include socio-cultural and business requirements. We created reference models of core data requirements common to most studies using unified modelling language (UML), dataflow diagrams (DFD) and business process modelling notation (BPMN). We conducted a stakeholder analysis and constructed DFD and UML diagrams for use cases based on simulated research studies. We used research output as a sensitivity analysis. Differences between the reference model and use cases identified study-specific data requirements. The stakeholder analysis identified tensions, changes in specification, some indifference from data providers, and enthusiastic informaticians urging inclusion of socio-cultural context. We identified requirements to collect information at three levels: micro (data items, which need to be semantically interoperable), meso (the medical record and data extraction), and macro (the health system and socio-cultural issues). BPMN clarified complex business requirements among data providers and vendors, and additional geographical requirements for patients to be represented in both linked datasets. High-quality research output was the norm for most repositories. Reference models provide high-level schemata of the core data requirements. However, business requirements modelling identifies stakeholder issues and what needs to be addressed to enable participation.
Hostetler, S.W.; Alder, J.R.; Allan, A.M.
2011-01-01
We have completed an array of high-resolution simulations of present and future climate over Western North America (WNA) and Eastern North America (ENA) by dynamically downscaling global climate simulations using a regional climate model, RegCM3. The simulations are intended to provide long time series of internally consistent surface and atmospheric variables for use in climate-related research. In addition to providing high-resolution weather and climate data for the past, present, and future, we have developed an integrated data flow and methodology for processing, summarizing, viewing, and delivering the climate datasets to a wide range of potential users. Our simulations were run over 50- and 15-kilometer model grids in an attempt to capture more of the climatic detail associated with processes such as topographic forcing than can be captured by general circulation models (GCMs). The simulations were run using output from four GCMs. All simulations span the present (for example, 1968-1999), common periods of the future (2040-2069), and two simulations continuously cover 2010-2099. The trace gas concentrations in our simulations were the same as those of the GCMs: the IPCC 20th century time series for 1968-1999 and the A2 time series for simulations of the future. We demonstrate that RegCM3 is capable of producing present day annual and seasonal climatologies of air temperature and precipitation that are in good agreement with observations. Important features of the high-resolution climatology of temperature, precipitation, snow water equivalent (SWE), and soil moisture are consistently reproduced in all model runs over WNA and ENA. The simulations provide a potential range of future climate change for selected decades and display common patterns of the direction and magnitude of changes. As expected, there are some model to model differences that limit interpretability and give rise to uncertainties. Here, we provide background information about the GCMs and the RegCM3, a basic evaluation of the model output and examples of simulated future climate. We also provide information needed to access the web applications for visualizing and downloading the data, and give complete metadata that describe the variables in the datasets.
Assessing privacy risks in population health publications using a checklist-based approach.
O'Keefe, Christine M; Ickowicz, Adrien; Churches, Tim; Westcott, Mark; O'Sullivan, Maree; Khan, Atikur
2017-11-10
Recent growth in the number of population health researchers accessing detailed datasets, either on their own computers or through virtual data centers, has the potential to increase privacy risks. In response, a checklist for identifying and reducing privacy risks in population health analysis outputs has been proposed for use by researchers themselves. In this study we explore the usability and reliability of such an approach by investigating whether different users identify the same privacy risks on applying the checklist to a sample of publications. The checklist was applied to a sample of 100 academic population health publications distributed among 5 readers. Cohen's κ was used to measure interrater agreement. Of the 566 instances of statistical output types found in the 100 publications, the most frequently occurring were counts, summary statistics, plots, and model outputs. Application of the checklist identified 128 outputs (22.6%) with potential privacy concerns. Most of these were associated with the reporting of small counts. Among these identified outputs, the readers found no substantial actual privacy concerns when context was taken into account. Interrater agreement for identifying potential privacy concerns was generally good. This study has demonstrated that a checklist can be a reliable tool to assist researchers with anonymizing analysis outputs in population health research. This further suggests that such an approach may have the potential to be developed into a broadly applicable standard providing consistent confidentiality protection across multiple analyses of the same data.
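The interrater-agreement statistic used above, Cohen's kappa, can be computed directly with scikit-learn, as in the following sketch; the two readers' example labels are invented.

from sklearn.metrics import cohen_kappa_score

# 1 = potential privacy concern flagged for an output, 0 = no concern (example labels).
reader_a = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
reader_b = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]

print("Cohen's kappa:", cohen_kappa_score(reader_a, reader_b))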
NASA Astrophysics Data System (ADS)
Williams, C.; Kniveton, D.; Layberry, R.
2007-12-01
It is increasingly accepted that any possible climate change will not only have an influence on mean climate but may also significantly alter climatic variability. This issue is of particular importance for environmentally vulnerable regions such as southern Africa. The subcontinent is considered especially vulnerable to extreme events, due to a number of factors including extensive poverty, disease and political instability. Rainfall variability and the identification of rainfall extremes is a function of scale, so high spatial and temporal resolution data are preferred to identify extreme events and accurately predict future variability. The majority of previous climate model verification studies have compared model output with observational data at monthly timescales. In this research, the ability of a state-of-the-art climate model to simulate climate at daily timescales is assessed using satellite-derived rainfall data from the Microwave Infra-Red Algorithm (MIRA). This dataset covers the period from 1993-2002 and the whole of southern Africa at a spatial resolution of 0.1 degree longitude/latitude. Once the model's ability to reproduce extremes has been assessed, idealised regions of SST anomalies are used to force the model, with the overall aim of investigating the ways in which SST anomalies influence rainfall extremes over southern Africa. In this paper, results from sensitivity testing of the UK Meteorological Office Hadley Centre climate model's domain size are firstly presented. Then simulations of current climate from the model, operating in both regional and global mode, are compared to the MIRA dataset at daily timescales. Thirdly, the ability of the model to reproduce daily rainfall extremes will be assessed, again by a comparison with extremes from the MIRA dataset. Finally, the results from the idealised SST experiments are briefly presented, suggesting associations between rainfall extremes and both local and remote SST anomalies.
An efficient surrogate-based simulation-optimization method for calibrating a regional MODFLOW model
NASA Astrophysics Data System (ADS)
Chen, Mingjie; Izady, Azizallah; Abdalla, Osman A.
2017-01-01
The simulation-optimization method entails a large number of model simulations, which is computationally intensive or even prohibitive if the model simulation is extremely time-consuming. Statistical models have been examined as surrogates of the high-fidelity physical model during the simulation-optimization process to tackle this problem. Among them, Multivariate Adaptive Regression Splines (MARS), a non-parametric adaptive regression method, is superior in overcoming problems of high dimensionality and discontinuities in the data. Furthermore, the stability and accuracy of the MARS model can be improved by bootstrap aggregating methods, namely bagging. In this paper, the Bagging MARS (BMARS) method is integrated into a surrogate-based simulation-optimization framework to calibrate a three-dimensional MODFLOW model, which is developed to simulate the groundwater flow in an arid hardrock-alluvium region in northwestern Oman. The physical MODFLOW model is surrogated by the statistical model developed using the BMARS algorithm. The surrogate model, which is fitted and validated using a training dataset generated by the physical model, can approximate solutions rapidly. An efficient Sobol' method is employed to calculate global sensitivities of head outputs to input parameters, which are used to analyze their importance for the model outputs spatiotemporally. Only sensitive parameters are included in the calibration process to further improve computational efficiency. The normalized root mean square error (NRMSE) between measured and simulated heads at observation wells is used as the objective function to be minimized during optimization. The reasonable history match between the simulated and observed heads demonstrated the feasibility of this highly efficient calibration framework.
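The calibration objective named above, the normalized root-mean-square error between observed and simulated heads, might be written as in the Python sketch below; normalizing by the observed range is one common convention and an assumption here, as are the sample head values.

import numpy as np

def nrmse(observed, simulated):
    """RMSE between observed and simulated values, normalized by the observed range."""
    observed, simulated = np.asarray(observed, float), np.asarray(simulated, float)
    rmse = np.sqrt(np.mean((observed - simulated) ** 2))
    return rmse / (observed.max() - observed.min())

obs = [102.3, 101.8, 99.5, 98.7]     # hypothetical observed heads (m)
sim = [101.9, 101.2, 100.1, 98.9]    # hypothetical simulated heads (m)
print("NRMSE:", nrmse(obs, sim))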
NASA Astrophysics Data System (ADS)
Darmenova, K.; Higgins, G.; Kiley, H.; Apling, D.
2010-12-01
Current General Circulation Models (GCMs) provide a valuable estimate of both natural and anthropogenic climate changes and variability on global scales. At the same time, future climate projections calculated with GCMs are not of sufficient spatial resolution to address regional needs. Many climate impact models require information at scales of 50 km or less, so dynamical downscaling is often used to estimate the smaller-scale information based on larger-scale GCM output. To address current deficiencies in local planning and decision making with respect to regional climate change, our research is focused on performing a dynamical downscaling with the Weather Research and Forecasting (WRF) model and developing decision aids that translate the regional climate data into actionable information for users. Our methodology involves development of climatological indices of extreme weather and heating/cooling degree days based on WRF ensemble runs initialized with the NCEP-NCAR reanalysis and the European Center/Hamburg Model (ECHAM5). Results indicate that the downscaled simulations provide the necessary detailed output required by state and local governments and the private sector to develop climate adaptation plans. In addition, we evaluated the WRF performance in long-term climate simulations over the Southwestern US and validated it against observational datasets.
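One of the decision-aid indices mentioned above, heating and cooling degree days, can be computed from daily-mean temperature as in the following sketch; the 18.3 °C (65 °F) base temperature is the conventional choice and an assumption here, and the temperature series is invented.

import numpy as np

daily_mean_temp_c = np.array([2.0, 5.5, 11.0, 18.0, 24.5, 30.0])  # hypothetical series
base_c = 18.3

hdd = np.sum(np.clip(base_c - daily_mean_temp_c, 0, None))  # heating degree days
cdd = np.sum(np.clip(daily_mean_temp_c - base_c, 0, None))  # cooling degree days
print(f"HDD = {hdd:.1f}, CDD = {cdd:.1f}")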
Xu, Huayong; Yu, Hui; Tu, Kang; Shi, Qianqian; Wei, Chaochun; Li, Yuan-Yuan; Li, Yi-Xue
2013-01-01
We are witnessing rapid progress in the development of methodologies for building combinatorial gene regulatory networks involving both TFs (Transcription Factors) and miRNAs (microRNAs). There are a few tools available to do these jobs but most of them are not easy to use and not accessible online. A web server is especially needed in order to allow users to upload experimental expression datasets and build combinatorial regulatory networks corresponding to their particular contexts. In this work, we compiled putative TF-gene, miRNA-gene and TF-miRNA regulatory relationships from forward-engineering pipelines and curated them as built-in data libraries. We streamlined the R codes of our two separate forward- and reverse-engineering algorithms for combinatorial gene regulatory network construction and formalized them as two major functional modules. As a result, we released the cGRNB (combinatorial Gene Regulatory Networks Builder): a web server for constructing combinatorial gene regulatory networks through integrated engineering of seed-matching sequence information and gene expression datasets. The cGRNB enables two major network-building modules, one for MPGE (miRNA-perturbed gene expression) datasets and the other for parallel miRNA/mRNA expression datasets. A miRNA-centered two-layer combinatorial regulatory cascade is the output of the first module, and a comprehensive genome-wide network involving all three types of combinatorial regulation (TF-gene, TF-miRNA, and miRNA-gene) is the output of the second module. In this article we propose cGRNB, a web server for building combinatorial gene regulatory networks through integrated engineering of seed-matching sequence information and gene expression datasets. Since parallel miRNA/mRNA expression datasets are accumulating rapidly with the advance of next-generation sequencing techniques, cGRNB will be a very useful tool for researchers to build combinatorial gene regulatory networks based on expression datasets. The cGRNB web server is free and available online at http://www.scbit.org/cgrnb.
pySeismicDQA: open source post experiment data quality assessment and processing
NASA Astrophysics Data System (ADS)
Polkowski, Marcin
2017-04-01
pySeismicDQA (Seismic Data Quality Assessment) is a Python-based, open-source set of tools dedicated to data processing after passive seismic experiments. The primary goal of this toolset is the unification of data types and formats from different dataloggers, which is necessary for further processing. This process requires additional data checks for errors, equipment malfunction, data format errors, abnormal noise levels, etc. In all such cases the user needs to decide (manually or by an automatic threshold) whether the data are removed from the output dataset. Additionally, the output dataset can be visualized in the form of a website with data availability charts and waveform visualization with an (external) earthquake catalog. Data processing can be extended with simple STA/LTA event detection. pySeismicDQA is designed and tested for two passive seismic experiments in central Europe: PASSEQ 2006-2008 and "13 BB Star" (2013-2016). National Science Centre Poland provided financial support for this work via NCN grant DEC-2011/02/A/ST10/00284.
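The STA/LTA detection mentioned above can be sketched in a few lines of Python: the ratio of a short-term to a long-term average of the signal energy is compared against a trigger threshold. Window lengths, the threshold and the synthetic trace are assumptions, not the package's actual defaults.

import numpy as np

def sta_lta(trace, n_sta, n_lta):
    """Short-term-average / long-term-average ratio of the signal energy."""
    energy = trace.astype(float) ** 2
    sta = np.convolve(energy, np.ones(n_sta) / n_sta, mode="same")
    lta = np.convolve(energy, np.ones(n_lta) / n_lta, mode="same")
    return sta / np.maximum(lta, 1e-12)

rng = np.random.default_rng(0)
trace = rng.normal(0, 1, 6000)
trace[3000:3200] += rng.normal(0, 8, 200)          # injected synthetic "event"

ratio = sta_lta(trace, n_sta=50, n_lta=1000)
triggers = np.flatnonzero(ratio > 4.0)             # samples above the trigger threshold
print("first trigger sample:", triggers[0] if triggers.size else None)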
Seabird drift as a proxy to estimate surface currents in the western Mediterranean?
NASA Astrophysics Data System (ADS)
Gomez-Navarro, Laura; Sánchez-Román, Antonio; Pascual, Ananda; Fablet, Ronan; Hernandez-Carrasco, Ismael; Mason, Evan; Arcos, José Manuel; Oro, Daniel
2017-04-01
Seabird trajectories can be used as proxies to investigate the dynamics of marine systems and their spatiotemporal evolution. Previous studies have mainly been based on analyses of long-range flights, where birds are travelling at high velocities over long time periods. Such data have been used to study wind patterns, and areas of avian feeding and foraging have also been used to study oceanic fronts. Here we focus on "slow moving" periods (which we associate with times when birds appear to be drifting on the sea surface), in order to investigate bird drift as a proxy for sea surface currents in the western Mediterranean Sea. We analyse trajectories corresponding to "slow moving" periods recorded by GPSs attached to individuals of the species Calonectris diomedea (Scopoli's shearwater) from mid August to mid September 2012. The trajectories are compared with sea level anomaly (SLA), sea surface temperature (SST), Finite Size Lyapunov Exponents (FSLE), wind fields, and the outputs from an automated sea-surface-height based eddy tracker. The SLA and SST datasets were obtained from the Copernicus Marine Environment Monitoring Service (CMEMS) with spatial resolutions of 1/8° and 1/100° respectively, while the FSLEs were computed from the SLA dataset. Finally, the wind data come from the outputs of the CCMPv2 numerical model. This model has global coverage with a spatial resolution of 1/4°. Interesting relationships between the trajectories and SLA fields are found. According to the angle between the SLA gradient and the trajectories of birds, we classify drifts into three scenarios: perpendicular, parallel and other, which are associated with different driving forces. The first scenario implies that bird drift is driven by geostrophic sea surface currents. The second we associate with wind drag as the main driving force; this is validated through the wind dataset. Moreover, from the SST, FSLEs and the eddy tracker, we obtain supplementary information on the presence of oceanic structures (such as eddies or fronts) not observed in the SLA field due to its limited spatial and temporal resolutions. These data therefore help to explain some of the third-scenario trajectories.
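The classification by the angle between the SLA gradient and the drift trajectory could be sketched as follows; the angular tolerance and the sample vectors are assumptions.

import numpy as np

def classify_drift(drift_uv, sla_grad, tol_deg=20.0):
    """Label a drift segment by its angle to the local SLA gradient."""
    cosang = np.dot(drift_uv, sla_grad) / (np.linalg.norm(drift_uv) * np.linalg.norm(sla_grad))
    angle = np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))
    if abs(angle - 90.0) <= tol_deg:
        return "perpendicular"          # consistent with geostrophic surface currents
    if angle <= tol_deg or angle >= 180.0 - tol_deg:
        return "parallel"               # consistent with wind drag along the gradient
    return "other"

# Eastward drift against a northward SLA gradient -> "perpendicular" (geostrophic-like).
print(classify_drift(np.array([0.1, 0.0]), np.array([0.0, 1.0e-6])))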
NASA Astrophysics Data System (ADS)
Faqih, A.
2017-03-01
Providing information regarding future climate scenarios is very important in climate change studies. The climate scenario can be used as basic information to support adaptation and mitigation studies. In order to deliver future climate scenarios over a specific region, baseline and projection data from the outputs of global climate models (GCMs) are needed. However, due to its coarse resolution, the data have to be downscaled and bias-corrected in order to obtain scenario data with better spatial resolution that match the characteristics of the observed data. Generating this downscaled data is often difficult for scientists who do not have a specific background, experience and skill in dealing with the complex data from GCM outputs. In this regard, it is necessary to develop a tool that can simplify the downscaling process in order to help scientists, especially in Indonesia, generate future climate scenario data that can be used for their climate change-related studies. In this paper, we introduce a tool called “Statistical Bias Correction for Climate Scenarios (SiBiaS)”. The tool is specially designed to facilitate the use of CMIP5 GCM data outputs and to process their statistical bias corrections relative to reference data from observations. It is prepared for supporting capacity building in climate modeling in Indonesia as part of the Indonesia 3rd National Communication (TNC) project activities.
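As a generic illustration of statistical bias correction (not necessarily the method implemented in SiBiaS), the following Python sketch applies empirical quantile mapping of a GCM series onto an observed reference distribution; the synthetic gamma-distributed series stand in for daily rainfall.

import numpy as np

def quantile_map(model_hist, obs_ref, model_future):
    """Map model values onto the observed distribution via matched empirical quantiles."""
    quantiles = np.linspace(0.0, 1.0, 101)
    model_q = np.quantile(model_hist, quantiles)
    obs_q = np.quantile(obs_ref, quantiles)
    return np.interp(model_future, model_q, obs_q)

rng = np.random.default_rng(1)
obs = rng.gamma(2.0, 5.0, 3000)            # hypothetical observed daily rainfall
gcm_hist = rng.gamma(2.0, 7.0, 3000)       # biased model baseline
gcm_future = rng.gamma(2.0, 7.5, 3000)     # model projection

corrected = quantile_map(gcm_hist, obs, gcm_future)
print("raw mean %.2f -> corrected mean %.2f" % (gcm_future.mean(), corrected.mean()))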
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hodge, Bri-Mathias
2016-04-08
The primary objective of this work was to create a state-of-the-art national wind resource data set and to provide detailed wind plant output data for specific sites based on that data set. Corresponding retrospective wind forecasts were also included at all selected locations. The combined information from these activities was used to create the Wind Integration National Dataset (WIND), and an extraction tool was developed to allow web-based data access.
Vanderhoof, Melanie; Distler, Hayley; Mendiola, Di Ana; Lang, Megan
2017-01-01
Natural variability in surface-water extent and associated characteristics presents a challenge to gathering timely, accurate information, particularly in environments that are dominated by small and/or forested wetlands. This study mapped inundation extent across the Upper Choptank River Watershed on the Delmarva Peninsula, occurring within both Maryland and Delaware. We integrated six quad-polarized Radarsat-2 images, Worldview-3 imagery, and an enhanced topographic wetness index in a random forest model. Output maps were filtered using light detection and ranging (lidar)-derived depressions to maximize the accuracy of forested inundation extent. Overall accuracy within the integrated and filtered model was 94.3%, with 5.5% and 6.0% errors of omission and commission for inundation, respectively. Accuracy of inundation maps obtained using Radarsat-2 alone were likely detrimentally affected by less than ideal angles of incidence and recent precipitation, but were likely improved by targeting the period between snowmelt and leaf-out for imagery collection. Across the six Radarsat-2 dates, filtering inundation outputs by lidar-derived depressions slightly elevated errors of omission for water (+1.0%), but decreased errors of commission (−7.8%), resulting in an average increase of 5.4% in overall accuracy. Depressions were derived from lidar datasets collected under both dry and average wetness conditions. Although antecedent wetness conditions influenced the abundance and total area mapped as depression, the two versions of the depression datasets showed a similar ability to reduce error in the inundation maps. Accurate mapping of surface water is critical to predicting and monitoring the effect of human-induced change and interannual variability on water quantity and quality.
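A rough sketch of the classification step described above, a random forest predicting inundation from per-pixel predictors, is given below; the predictor set, its statistics and the synthetic labelling rule are invented stand-ins for the Radarsat-2, Worldview-3 and wetness-index inputs.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([
    rng.normal(-12, 3, n),    # HH backscatter (dB), hypothetical
    rng.normal(-18, 3, n),    # HV backscatter (dB), hypothetical
    rng.uniform(0, 20, n),    # enhanced topographic wetness index, hypothetical
])
y = (X[:, 2] + rng.normal(0, 2, n) > 12).astype(int)   # 1 = inundated (synthetic rule)

clf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0).fit(X, y)
print("out-of-bag accuracy:", round(clf.oob_score_, 3))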
NASA Astrophysics Data System (ADS)
Aires, Filipe; Miolane, Léo; Prigent, Catherine; Pham Duc, Binh; Papa, Fabrice; Fluet-Chouinard, Etienne; Lehner, Bernhard
2017-04-01
The Global Inundation Extent from Multi-Satellites (GIEMS) provides multi-year monthly variations of the global surface water extent at 25 km x 25 km resolution. It is derived from multiple satellite observations. Its spatial resolution is usually compatible with climate model outputs and with global land surface model grids, but is clearly not adequate for local applications that require the characterization of small individual water bodies. There is today a strong demand for high-resolution inundation extent datasets, for a large variety of applications such as water management, regional hydrological modeling, or the analysis of mosquito-related diseases. A new procedure is introduced to downscale the GIEMS low spatial resolution inundations to a 3 arc second (90 m) dataset. The methodology is based on topography and hydrography information from the HydroSHEDS database. A new floodability index is adopted and an innovative smoothing procedure is developed to ensure the smooth transition, in the high-resolution maps, between the low-resolution boxes from GIEMS. Topography information is relevant for natural hydrology environments controlled by elevation, but is more limited in human-modified basins. However, the proposed downscaling approach is compatible with forthcoming fusion with other, more pertinent satellite information in these difficult regions. The resulting GIEMS-D3 database is the only high spatial resolution inundation database available globally at the monthly time scale over the 1993-2007 period. GIEMS-D3 is assessed by analyzing its spatial and temporal variability, and evaluated by comparisons to other independent satellite observations from visible (Google Earth and Landsat), infrared (MODIS) and active microwave (SAR) sensors.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Getirana, Augusto; Dutra, Emanuel; Guimberteau, Matthieu
Despite recent advances in modeling and remote sensing of land surfaces, estimates of the global water budget are still fairly uncertain. The objective of this study is to evaluate the water budget of the Amazon basin based on several state-of-the-art land surface model (LSM) outputs. Water budget variables [total water storage (TWS), evapotranspiration (ET), surface runoff (R) and baseflow (B)] are evaluated at the basin scale using both remote sensing and in situ data. Fourteen LSMs were run using meteorological forcings at a 3-hourly time step and 1-degree spatial resolution. Three experiments are performed using precipitation which has been rescaled to match the monthly global GPCP and GPCC datasets and the daily HYBAM dataset for the Amazon basin. R and B are used to force the Hydrological Modeling and Analysis Platform (HyMAP) river routing scheme and simulated discharges are compared against observations at 165 gauges. Simulated ET and TWS are compared against FLUXNET and MOD16A2 evapotranspiration, and GRACE TWS estimates in different catchments. At the basin scale, simulated ET ranges from 2.39 mm d-1 to 3.26 mm d-1, and a low spatial correlation between ET and P indicates that evapotranspiration does not depend on water availability over most of the basin. Results also show that other simulated water budget variables vary significantly as a function of both the LSM and the precipitation used, but simulated TWS generally agrees at the basin scale. The best water budget simulations resulted from experiments using the HYBAM dataset, mostly explained by its denser rainfall gauge network and the daily rescaling.
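The basin-scale budget being evaluated above can be summarized by the simple closure check below; the monthly values are invented and serve only to show the bookkeeping.

# Toy water-budget closure: precipitation P balancing evapotranspiration ET, surface
# runoff R, baseflow B and the change in total water storage dTWS. Values in mm/month.
P, ET, R, B, dTWS = 210.0, 95.0, 70.0, 30.0, 12.0   # hypothetical monthly values

residual = P - ET - R - B - dTWS
print(f"budget residual: {residual:.1f} mm/month")   # ~0 for a closed budget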
Semi-Supervised Multi-View Learning for Gene Network Reconstruction
Ceci, Michelangelo; Pio, Gianvito; Kuzmanovski, Vladimir; Džeroski, Sašo
2015-01-01
The task of gene regulatory network reconstruction from high-throughput data is receiving increasing attention in recent years. As a consequence, many inference methods for solving this task have been proposed in the literature. It has been recently observed, however, that no single inference method performs optimally across all datasets. It has also been shown that the integration of predictions from multiple inference methods is more robust and shows high performance across diverse datasets. Inspired by this research, in this paper, we propose a machine learning solution which learns to combine predictions from multiple inference methods. While this approach adds additional complexity to the inference process, we expect it would also carry substantial benefits. These would come from the automatic adaptation to patterns on the outputs of individual inference methods, so that it is possible to identify regulatory interactions more reliably when these patterns occur. This article demonstrates the benefits (in terms of accuracy of the reconstructed networks) of the proposed method, which exploits an iterative, semi-supervised ensemble-based algorithm. The algorithm learns to combine the interactions predicted by many different inference methods in the multi-view learning setting. The empirical evaluation of the proposed algorithm on a prokaryotic model organism (E. coli) and on a eukaryotic model organism (S. cerevisiae) clearly shows improved performance over the state of the art methods. The results indicate that gene regulatory network reconstruction for the real datasets is more difficult for S. cerevisiae than for E. coli. The software, all the datasets used in the experiments and all the results are available for download at the following link: http://figshare.com/articles/Semi_supervised_Multi_View_Learning_for_Gene_Network_Reconstruction/1604827. PMID:26641091
Gogoshin, Grigoriy; Boerwinkle, Eric; Rodin, Andrei S
2017-04-01
Bayesian network (BN) reconstruction is a prototypical systems biology data analysis approach that has been successfully used to reverse engineer and model networks reflecting different layers of biological organization (ranging from genetic to epigenetic to cellular pathway to metabolomic). It is especially relevant in the context of modern (ongoing and prospective) studies that generate heterogeneous high-throughput omics datasets. However, there are both theoretical and practical obstacles to the seamless application of BN modeling to such big data, including computational inefficiency of optimal BN structure search algorithms, ambiguity in data discretization, mixing data types, imputation and validation, and, in general, limited scalability in both reconstruction and visualization of BNs. To overcome these and other obstacles, we present BNOmics, an improved algorithm and software toolkit for inferring and analyzing BNs from omics datasets. BNOmics aims at comprehensive systems biology-type data exploration, including both generating new biological hypothesis and testing and validating the existing ones. Novel aspects of the algorithm center around increasing scalability and applicability to varying data types (with different explicit and implicit distributional assumptions) within the same analysis framework. An output and visualization interface to widely available graph-rendering software is also included. Three diverse applications are detailed. BNOmics was originally developed in the context of genetic epidemiology data and is being continuously optimized to keep pace with the ever-increasing inflow of available large-scale omics datasets. As such, the software scalability and usability on the less than exotic computer hardware are a priority, as well as the applicability of the algorithm and software to the heterogeneous datasets containing many data types-single-nucleotide polymorphisms and other genetic/epigenetic/transcriptome variables, metabolite levels, epidemiological variables, endpoints, and phenotypes, etc.
Chapple, Christopher R; Cardozo, Linda; Snijder, Robert; Siddiqui, Emad; Herschorn, Sender
2016-12-15
Patient-level data are available for 11 randomized, controlled, Phase III/Phase IV solifenacin clinical trials. Meta-analyses were conducted to interrogate the data and to broaden knowledge about solifenacin and overactive bladder (OAB) in general. Before integrating data, datasets from individual studies were mapped to a single format using methodology developed by the Clinical Data Interchange Standards Consortium (CDISC). Initially, the data structure was harmonized, to ensure identical categorization, using the CDISC Study Data Tabulation Model (SDTM). To allow for patient-level meta-analysis, data were integrated and mapped to analysis datasets. Mapping included adding derived and categorical variables and followed the standards described in the Analysis Data Model (ADaM). Mapping to both SDTM and ADaM was performed twice by two independent programming teams, the results compared, and inconsistencies corrected in the final output. ADaM analysis sets included assignments of patients to the Safety Analysis Set and the Full Analysis Set. There were three analysis groupings: Analysis group 1 (placebo-controlled, monotherapy, fixed-dose studies, n = 3011); Analysis group 2 (placebo-controlled, monotherapy, pooled, fixed- and flexible-dose, n = 5379); Analysis group 3 (all solifenacin monotherapy-treated patients, n = 6539). Treatment groups were: solifenacin 5 mg fixed dose, solifenacin 5/10 mg flexible dose, solifenacin 10 mg fixed dose and overall solifenacin. Patients were similar enough for data pooling to be acceptable. Creating ADaM datasets provided significant information about individual studies and the derivation decisions made in each study; validated ADaM datasets now exist for medical history, efficacy and adverse events (AEs). Results from these meta-analyses were similar over time.
NASA Astrophysics Data System (ADS)
Williams, C.; Kniveton, D.; Layberry, R.
2009-04-01
It is increasingly accepted that any possible climate change will not only have an influence on mean climate but may also significantly alter climatic variability. A change in the distribution and magnitude of extreme rainfall events (associated with changing variability), such as droughts or flooding, may have a far greater impact on human and natural systems than a changing mean. This issue is of particular importance for environmentally vulnerable regions such as southern Africa. The subcontinent is considered especially vulnerable to, and ill-equipped (in terms of adaptation) for, extreme events, due to a number of factors including extensive poverty, famine, disease and political instability. Rainfall variability and the identification of rainfall extremes is a function of scale, so high spatial and temporal resolution data are preferred to identify extreme events and accurately predict future variability. The majority of previous climate model verification studies have compared model output with observational data at monthly timescales. In this research, the ability of a state-of-the-art climate model to simulate climate at daily timescales is assessed using satellite-derived rainfall data from the Microwave Infra-Red Algorithm (MIRA). This dataset covers the period from 1993-2002 and the whole of southern Africa at a spatial resolution of 0.1 degree longitude/latitude. The ability of a climate model to simulate current climate provides some indication of how much confidence can be placed in its future predictions. In this paper, simulations of current climate from the UK Meteorological Office Hadley Centre's climate model, in both regional and global mode, are firstly compared to the MIRA dataset at daily timescales. This concentrates primarily on the ability of the model to simulate the spatial and temporal patterns of rainfall variability over southern Africa. Secondly, the ability of the model to reproduce daily rainfall extremes will be assessed, again by a comparison with extremes from the MIRA dataset.
AMModels: An R package for storing models, data, and metadata to facilitate adaptive management
Donovan, Therese M.; Katz, Jonathan
2018-01-01
Agencies are increasingly called upon to implement their natural resource management programs within an adaptive management (AM) framework. This article provides the background and motivation for the R package, AMModels. AMModels was developed under R version 3.2.2. The overall goal of AMModels is simple: To codify knowledge in the form of models and to store it, along with models generated from numerous analyses and datasets that may come our way, so that it can be used or recalled in the future. AMModels facilitates this process by storing all models and datasets in a single object that can be saved to an .RData file and routinely augmented to track changes in knowledge through time. Through this process, AMModels allows the capture, development, sharing, and use of knowledge that may help organizations achieve their mission. While AMModels was designed to facilitate adaptive management, its utility is far more general. Many R packages exist for creating and summarizing models, but to our knowledge, AMModels is the only package dedicated not to the mechanics of analysis but to organizing analysis inputs, analysis outputs, and preserving descriptive metadata. We anticipate that this package will assist users hoping to preserve the key elements of an analysis so they may be more confidently revisited at a later date.
Yilmaz, Banu; Aras, Egemen; Nacar, Sinan; Kankal, Murat
2018-05-23
The functional life of a dam is often determined by the rate of sediment delivery to its reservoir. Therefore, an accurate estimate of the sediment load in rivers with dams is essential for designing and predicting a dam's useful lifespan. The most credible method is direct measurement of sediment input, but this can be very costly and cannot always be implemented at all gauging stations. In this study, we tested various regression models to estimate suspended sediment load (SSL) at two gauging stations on the Çoruh River in Turkey, including the artificial bee colony (ABC) algorithm, the teaching-learning-based optimization (TLBO) algorithm, and multivariate adaptive regression splines (MARS). These models were also compared with one another and with classical regression analyses (CRA). Streamflow values and previously collected SSL data were used as model inputs, with predicted SSL data as output. Two different training and testing dataset configurations were used to reinforce the model accuracy. For the MARS method, the root mean square error value was found to range between 35% and 39% for the test data at the two gauging stations, which was lower than the errors for the other models. Error values were even lower (7% to 15%) using another dataset. Our results indicate that simultaneous measurements of streamflow with SSL provide the most effective parameter for obtaining accurate predictive models and that MARS is the most accurate model for predicting SSL.
Dunea, Daniel; Pohoata, Alin; Iordache, Stefania
2015-07-01
The paper presents the screening of various feedforward neural networks (FANN) and wavelet-feedforward neural networks (WFANN) applied to time series of ground-level ozone (O3), nitrogen dioxide (NO2), and particulate matter (PM10 and PM2.5 fractions) recorded at four monitoring stations located in various urban areas of Romania, to identify common configurations with optimal generalization performance. Two distinct model runs were performed as follows: data processing using hourly-recorded time series of airborne pollutants during cold months (O3, NO2, and PM10), when residential heating increases the local emissions, and data processing using 24-h daily averaged concentrations (PM2.5) recorded between 2009 and 2012. Dataset variability was assessed using statistical analysis. Time series were passed through various FANNs. Each time series was also decomposed into four time-scale components using three-level wavelets; these components were passed through FANNs and recomposed into a single time series. The agreement between observed and modelled output was evaluated based on statistical significance (the r coefficient and the correlation between errors and data). Use of a Daubechies db3 wavelet with a Rprop FANN (6-4-1) gave positive results for the O3 time series, improving on the exclusive use of the FANN for hourly-recorded time series. NO2 was difficult to model due to the specificity of its time series, but wavelet integration improved FANN performance. The Daubechies db3 wavelet did not improve the FANN outputs for the PM10 time series. Both models (FANN/WFANN) overestimated PM2.5 forecasted values in the last quarter of the time series. A potential improvement of the forecasted values could be the integration of a smoothing algorithm to adjust the PM2.5 model outputs.
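The wavelet step described above, a three-level Daubechies db3 decomposition into four time-scale components, might look like the following sketch using the PyWavelets library; the synthetic hourly series stands in for a pollutant record.

import numpy as np
import pywt

rng = np.random.default_rng(0)
hours = np.arange(24 * 60)
series = 40 + 15 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 5, hours.size)  # synthetic "O3"

# Three decomposition levels -> four coefficient arrays: [cA3, cD3, cD2, cD1].
coeffs = pywt.wavedec(series, "db3", level=3)
print([c.size for c in coeffs])

# After modelling each component separately, predictions can be recombined with waverec.
reconstructed = pywt.waverec(coeffs, "db3")
print("max reconstruction error:", np.max(np.abs(reconstructed[:series.size] - series)))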
Modelling the spatial distribution of ammonia emissions in the UK.
Hellsten, S; Dragosits, U; Place, C J; Vieno, M; Dore, A J; Misselbrook, T H; Tang, Y S; Sutton, M A
2008-08-01
Ammonia emissions (NH3) are characterised by a high spatial variability at a local scale. When modelling the spatial distribution of NH3 emissions, it is important to provide robust emission estimates, since the model output is used to assess potential environmental impacts, e.g. exceedance of critical loads. The aim of this study was to provide a new, updated spatial NH3 emission inventory for the UK for the year 2000, based on an improved modelling approach and the use of updated input datasets. The AENEID model distributes NH3 emissions from a range of agricultural activities, such as grazing and housing of livestock, storage and spreading of manures, and fertilizer application, at a 1-km grid resolution over the most suitable landcover types. The results of the emission calculation for the year 2000 are analysed and the methodology is compared with a previous spatial emission inventory for 1996.
Ellis, Katherine; Godbole, Suneeta; Marshall, Simon; Lanckriet, Gert; Staudenmayer, John; Kerr, Jacqueline
2014-01-01
Active travel is an important area in physical activity research, but objective measurement of active travel is still difficult. Automated methods to measure travel behaviors will improve research in this area. In this paper, we present a supervised machine learning method for transportation mode prediction from global positioning system (GPS) and accelerometer data. We collected a dataset of about 150 h of GPS and accelerometer data from two research assistants following a protocol of prescribed trips consisting of five activities: bicycling, riding in a vehicle, walking, sitting, and standing. We extracted 49 features from 1-min windows of this data. We compared the performance of several machine learning algorithms and chose a random forest algorithm to classify the transportation mode. We used a moving average output filter to smooth the output predictions over time. The random forest algorithm achieved 89.8% cross-validated accuracy on this dataset. Adding the moving average filter to smooth output predictions increased the cross-validated accuracy to 91.9%. Machine learning methods are a viable approach for automating measurement of active travel, particularly for measuring travel activities that traditional accelerometer data processing methods misclassify, such as bicycling and vehicle travel.
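A rough sketch of the pipeline described above, a random forest over per-window features followed by smoothing of the predicted label sequence, is shown below; the synthetic features and labels are placeholders, and a majority-vote window is used here as a stand-in for the paper's moving-average output filter.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 49))                  # 600 one-minute windows, 49 features
y = rng.integers(0, 5, 600)                     # 5 modes: bike, vehicle, walk, sit, stand

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
raw_pred = clf.predict(X)

def smooth_labels(labels, window=5):
    """Replace each prediction by the most frequent label in a centred window."""
    half = window // 2
    padded = np.pad(labels, half, mode="edge")
    return np.array([np.bincount(padded[i:i + window]).argmax()
                     for i in range(labels.size)])

smoothed = smooth_labels(raw_pred)
print("labels changed by smoothing:", int(np.sum(smoothed != raw_pred)))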
Training Data Requirement for a Neural Network to Predict Aerodynamic Coefficients
NASA Technical Reports Server (NTRS)
Korsmeyer, David (Technical Monitor); Rajkumar, T.; Bardina, Jorge
2003-01-01
Basic aerodynamic coefficients are modeled as functions of angle of attack, speed brake deflection angle, Mach number, and side slip angle. Most of the aerodynamic parameters can be well fitted using polynomial functions. We previously demonstrated that a neural network is a fast, reliable way of predicting aerodynamic coefficients. We encountered a few underfitted and/or overfitted results during prediction. The training data for the neural network are derived from wind tunnel test measurements and numerical simulations. The basic questions that arise are: how many training data points are required to produce an efficient neural network prediction, and which type of transfer function should be used between the input-hidden and hidden-output layers. In this paper, a comparative study of the efficiency of neural network prediction based on different transfer functions and training dataset sizes is presented. The results of the neural network prediction reflect the sensitivity of the architecture, transfer functions, and training dataset size.
NASA Astrophysics Data System (ADS)
Grieco, G.; Nirchio, F.; Montuori, A.; Migliaccio, M.; Lin, W.; Portabella, M.
2016-08-01
The dependency of the azimuth wavelength cut-off on wind speed has been studied using a dataset of Sentinel-1 multi-look SAR images co-located with wind speed measurements, significant wave height and mean wave direction from ECMWF operational output. A Geophysical Model Function (GMF) has been fitted and a retrieval exercise has been performed, comparing the results to a set of independent wind speed scatterometer measurements from the Chinese mission HY-2A. The preliminary results show that the dependency of the azimuth cut-off on wind speed is linear only for fully developed sea states and that the agreement between the retrieved values and the measurements is good, especially for high wind speeds. A similar approach has been used to assess the dependency of the azimuth cut-off for X-band COSMO-SkyMed data. The dataset is still incomplete but the preliminary results show a similar trend.
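Fitting a simple empirical relation between azimuth cut-off and wind speed and inverting it for retrieval could be sketched as below; the linear form is suggested by the abstract for fully developed seas, but the sample values and the inversion are illustrative only.

import numpy as np

wind_speed = np.array([3.0, 5.0, 7.0, 9.0, 11.0, 13.0])         # m/s, hypothetical
azimuth_cutoff = np.array([95., 120., 150., 175., 205., 230.])  # m, hypothetical

slope, intercept = np.polyfit(wind_speed, azimuth_cutoff, 1)    # cutoff ~ a*U10 + b
print(f"cutoff ~ {slope:.1f} * U10 + {intercept:.1f}")

measured_cutoff = 160.0                                          # m, hypothetical
retrieved_wind = (measured_cutoff - intercept) / slope
print(f"retrieved wind speed: {retrieved_wind:.1f} m/s")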
Processing and population genetic analysis of multigenic datasets with ProSeq3 software.
Filatov, Dmitry A
2009-12-01
The current tendency in molecular population genetics is to use increasing numbers of genes in the analysis. Here I describe a program for handling and population genetic analysis of DNA polymorphism data collected from multiple genes. The program includes a sequence/alignment editor and an internal relational database that simplify the preparation and manipulation of multigenic DNA polymorphism datasets. The most commonly used DNA polymorphism analyses are implemented in ProSeq3, facilitating population genetic analysis of large multigenic datasets. Extensive input/output options make ProSeq3 a convenient hub for sequence data processing and analysis. The program is available free of charge from http://dps.plants.ox.ac.uk/sequencing/proseq.htm.
geoknife: Reproducible web-processing of large gridded datasets
Read, Jordan S.; Walker, Jordan I.; Appling, Alison P.; Blodgett, David L.; Read, Emily K.; Winslow, Luke A.
2016-01-01
Geoprocessing of large gridded data according to overlap with irregular landscape features is common to many large-scale ecological analyses. The geoknife R package was created to facilitate reproducible analyses of gridded datasets found on the U.S. Geological Survey Geo Data Portal web application or elsewhere, using a web-enabled workflow that eliminates the need to download and store large datasets that are reliably hosted on the Internet. The package provides access to several data subset and summarization algorithms that are available on remote web processing servers. Outputs from geoknife include spatial and temporal data subsets, spatially-averaged time series values filtered by user-specified areas of interest, and categorical coverage fractions for various land-use types.
GODIVA2: interactive visualization of environmental data on the Web.
Blower, J D; Haines, K; Santokhee, A; Liu, C L
2009-03-13
GODIVA2 is a dynamic website that provides visual access to several terabytes of physically distributed, four-dimensional environmental data. It allows users to explore large datasets interactively without the need to install new software or download and understand complex data. Through the use of open international standards, GODIVA2 maintains a high level of interoperability with third-party systems, allowing diverse datasets to be mutually compared. Scientists can use the system to search for features in large datasets and to diagnose the output from numerical simulations and data processing algorithms. Data providers around Europe have adopted GODIVA2 as an INSPIRE-compliant dynamic quick-view system for providing visual access to their data.
The Community Intercomparison Suite (CIS)
NASA Astrophysics Data System (ADS)
Watson-Parris, Duncan; Schutgens, Nick; Cook, Nick; Kipling, Zak; Kershaw, Phil; Gryspeerdt, Ed; Lawrence, Bryan; Stier, Philip
2017-04-01
Earth observations (both remote and in-situ) create vast amounts of data providing invaluable constraints for the climate science community. Efficient exploitation of these complex and highly heterogeneous datasets has been limited, however, by the lack of suitable software tools, particularly for comparison of gridded and ungridded data, thus reducing scientific productivity. CIS (http://cistools.net) is an open-source, command line tool and Python library which allows the straightforward quantitative analysis, intercomparison and visualisation of remote sensing, in-situ and model data. CIS can read gridded and ungridded remote sensing, in-situ and model data from many sources 'out-of-the-box', such as the ESA Aerosol and Cloud CCI products, MODIS, CloudSat and AERONET. Perhaps most importantly, however, CIS also employs a modular plugin architecture to allow the reading of an essentially unlimited range of data types. Users are able to write their own plugins for reading the data sources with which they are familiar, and share them within the community, allowing all to benefit from their expertise. To enable the intercomparison of these data, CIS provides a number of operations including: the aggregation of ungridded and gridded datasets to coarser representations using a number of different built-in averaging kernels; the subsetting of data to reduce its extent or dimensionality; the co-location of two distinct datasets onto a single set of co-ordinates; the visualisation of the input or output data through a number of different plots and graphs; the evaluation of arbitrary mathematical expressions against any number of datasets; and a number of other supporting functions such as a statistical comparison of two co-located datasets. These operations can be performed efficiently on local machines or large computing clusters, and CIS is already available on the JASMIN computing facility. A case study using the GASSP collection of in-situ aerosol observations will demonstrate the power of using CIS to perform model evaluations. The use of an open-source, community-developed tool in this way opens up a huge amount of data which would previously have been inaccessible to many users, while also providing replicable, repeatable analysis which scientists and policy-makers alike can trust and understand.
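To give a feel for what a co-location operation does (this is a generic nearest-neighbour illustration with synthetic data, not the CIS API or its kernels):

```python
# Generic sketch: co-locate ungridded observations onto model grid points by
# assigning each observation to its nearest grid point and averaging.
import numpy as np
from scipy.spatial import cKDTree

grid_lon, grid_lat = np.meshgrid(np.arange(0, 10, 0.5), np.arange(40, 50, 0.5))
grid_points = np.column_stack([grid_lon.ravel(), grid_lat.ravel()])

obs_points = np.random.uniform([0, 40], [10, 50], size=(500, 2))  # obs locations
obs_values = np.random.normal(size=500)                           # obs values

tree = cKDTree(grid_points)
_, nearest = tree.query(obs_points)              # nearest grid point for each obs
colocated = np.full(len(grid_points), np.nan)
for i in np.unique(nearest):
    colocated[i] = obs_values[nearest == i].mean()   # simple bin-averaging kernel
```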
The Transition of NASA EOS Datasets to WFO Operations: A Model for Future Technology Transfer
NASA Technical Reports Server (NTRS)
Darden, C.; Burks, J.; Jedlovec, G.; Haines, S.
2007-01-01
The collocation of a National Weather Service (NWS) Forecast Office with atmospheric scientists from NASA/Marshall Space Flight Center (MSFC) in Huntsville, Alabama has afforded a unique opportunity for science sharing and technology transfer. Specifically, the NWS office in Huntsville has interacted closely with research scientists within the SPoRT (Short-term Prediction Research and Transition) Center at MSFC. One significant technology transfer that has reaped dividends is the transition of unique NASA EOS polar orbiting datasets into NWS field operations. NWS forecasters primarily rely on the AWIPS (Advanced Weather Interactive Processing System) decision support system for their day-to-day forecast and warning decision making. Unfortunately, the transition of data from operational polar orbiters or low-inclination orbiting satellites into AWIPS has been relatively slow for a variety of reasons. The ability to integrate these high resolution NASA datasets into operations has yielded several benefits. The MODIS (MODerate resolution Imaging Spectroradiometer) instrument flying on the Aqua and Terra satellites provides a broad spectrum of multispectral observations at resolutions as fine as 250 m. Forecasters routinely utilize these datasets to locate fine lines, boundaries, smoke plumes, locations of fog or haze fields, and other mesoscale features. In addition, these important datasets have been transitioned to other WFOs for a variety of local uses. For instance, WFO Great Falls, Montana utilizes the MODIS snow cover product for hydrologic planning purposes, while several coastal offices utilize the output from the MODIS and AMSR-E instruments to supplement observations in the data-sparse regions of the Gulf of Mexico and western Atlantic. In the short term, these datasets have benefited local WFOs in a variety of ways. In the longer term, the process by which these unique datasets were successfully transitioned to operations will benefit the planning and implementation of products and datasets derived from both NPP and NPOESS. This presentation will provide a brief overview of current WFO usage of satellite data, the transition of datasets between SPoRT and the NWS, and lessons learned for future transition efforts.
NASA Astrophysics Data System (ADS)
Cammalleri, Carmelo; Vogt, Jürgen V.; Bisselink, Bernard; de Roo, Ad
2017-12-01
Agricultural drought events can affect large regions across the world, implying the need for a suitable global tool for accurate monitoring of this phenomenon. Soil moisture anomalies are considered a good metric for capturing the occurrence of agricultural drought events, and they have become an important component of several operational drought monitoring systems. In the framework of the JRC Global Drought Observatory (GDO, http://edo.jrc.ec.europa.eu/gdo/), the suitability of three datasets as possible representations of root zone soil moisture anomalies has been evaluated: (1) the soil moisture from the Lisflood distributed hydrological model (namely LIS), (2) the remotely sensed Land Surface Temperature data from the MODIS satellite (namely LST), and (3) the ESA Climate Change Initiative combined passive/active microwave skin soil moisture dataset (namely CCI). Given the independence of these three datasets, the triple collocation (TC) technique has been applied, aiming at quantifying the likely error associated with each dataset relative to the unknown true status of the system. TC analysis was performed on five macro-regions (namely North America, Europe, India, southern Africa and Australia) detected as suitable for the experiment, providing insight into the mutual relationship between these datasets as well as an assessment of the accuracy of each method. Although no definitive statement on the spatial distribution of errors can be provided, a clear outcome of the TC analysis is the good performance of the remote sensing datasets, especially CCI, over dry regions such as Australia and southern Africa, whereas the outputs of LIS seem to be more reliable over areas that are well monitored through meteorological ground station networks, such as North America and Europe. In a global drought monitoring system, the results of the error analysis are used to design a weighted-average ensemble system that exploits the advantages of each dataset.
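The covariance-based form of the triple collocation estimator can be sketched as follows (variable names and synthetic error levels are illustrative only; the study's actual anomaly processing, rescaling and masking are not reproduced here):

```python
# Sketch: estimate the random error variance of three independent, co-located
# soil moisture anomaly datasets with covariance-based triple collocation.
import numpy as np

def triple_collocation_errors(x, y, z):
    """Return estimated error variances of x, y and z."""
    c = np.cov(np.vstack([x, y, z]))   # 3x3 sample covariance matrix
    var_x = c[0, 0] - c[0, 1] * c[0, 2] / c[1, 2]
    var_y = c[1, 1] - c[0, 1] * c[1, 2] / c[0, 2]
    var_z = c[2, 2] - c[0, 2] * c[1, 2] / c[0, 1]
    return var_x, var_y, var_z

rng = np.random.default_rng(1)
truth = rng.normal(size=2000)
lis = truth + rng.normal(scale=0.3, size=2000)   # hydrological-model anomalies
lst = truth + rng.normal(scale=0.5, size=2000)   # LST-based anomalies
cci = truth + rng.normal(scale=0.4, size=2000)   # microwave-based anomalies
print(triple_collocation_errors(lis, lst, cci))  # roughly (0.09, 0.25, 0.16)
```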
The sensitivity of ecosystem service models to choices of input data and spatial resolution
Bagstad, Kenneth J.; Cohen, Erika; Ancona, Zachary H.; McNulty, Steven; Sun, Ge
2018-01-01
Although ecosystem service (ES) modeling has progressed rapidly in the last 10–15 years, comparative studies on data and model selection effects have become more common only recently. Such studies have drawn mixed conclusions about whether different data and model choices yield divergent results. In this study, we compared the results of different models to address these questions at national, provincial, and subwatershed scales in Rwanda. We compared results for carbon, water, and sediment as modeled using InVEST and WaSSI using (1) land cover data at 30 and 300 m resolution and (2) three different input land cover datasets. WaSSI and simpler InVEST models (carbon storage and annual water yield) were relatively insensitive to the choice of spatial resolution, but more complex InVEST models (seasonal water yield and sediment regulation) produced large differences when applied at differing resolution. Six out of nine ES metrics (InVEST annual and seasonal water yield and WaSSI) gave similar predictions for at least two different input land cover datasets. Despite differences in mean values when using different data sources and resolution, we found significant and highly correlated results when using Spearman's rank correlation, indicating consistent spatial patterns of high and low values. Our results confirm and extend conclusions of past studies, showing that in certain cases (e.g., simpler models and national-scale analyses), results can be robust to data and modeling choices. For more complex models, those with different output metrics, and subnational to site-based analyses in heterogeneous environments, data and model choices may strongly influence study findings.
Multi-modal gesture recognition using integrated model of motion, audio and video
NASA Astrophysics Data System (ADS)
Goutsu, Yusuke; Kobayashi, Takaki; Obara, Junya; Kusajima, Ikuo; Takeichi, Kazunari; Takano, Wataru; Nakamura, Yoshihiko
2015-07-01
Gesture recognition is used in many practical applications such as human-robot interaction, medical rehabilitation and sign language. With increasing motion sensor development, multiple data sources have become available, which has led to the rise of multi-modal gesture recognition. Since our previous approach to gesture recognition depends on a unimodal system, it is difficult to classify similar motion patterns. In order to solve this problem, a novel approach which integrates motion, audio and video models is proposed, using a dataset captured by Kinect. The proposed system recognizes observed gestures using the three models. The recognition results of the three models are integrated using the proposed framework, and the integrated output becomes the final result. The motion and audio models are learned using Hidden Markov Models. A Random Forest classifier is used to learn the video model. In the experiments testing the performance of the proposed system, the motion and audio models most suitable for gesture recognition are chosen by varying the feature vectors and learning methods. Additionally, the unimodal and multi-modal models are compared with respect to recognition accuracy. All the experiments are conducted on the dataset provided by the organizers of MMGRC, a workshop for the Multi-Modal Gesture Recognition Challenge. The comparison results show that the multi-modal model composed of the three models achieves the highest recognition rate. This improvement in recognition accuracy means that the complementary relationship among the three models improves the accuracy of gesture recognition. The proposed system provides application technology for understanding human actions of daily life more precisely.
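A much-simplified view of the integration step is late fusion of the three unimodal outputs (the weights, class count and probabilities below are invented; the paper's actual integration framework may differ):

```python
# Sketch: fuse per-class probabilities from motion, audio and video models.
import numpy as np

def fuse(p_motion, p_audio, p_video, weights=(1.0, 1.0, 1.0)):
    """Weighted average of class probabilities; returns the fused class index."""
    w = np.asarray(weights) / np.sum(weights)
    fused = w[0] * p_motion + w[1] * p_audio + w[2] * p_video
    return int(np.argmax(fused)), fused

p_motion = np.array([0.10, 0.55, 0.20, 0.10, 0.05])   # 5 gesture classes
p_audio  = np.array([0.20, 0.30, 0.35, 0.10, 0.05])
p_video  = np.array([0.05, 0.60, 0.20, 0.10, 0.05])
label, fused = fuse(p_motion, p_audio, p_video)
print(label, fused)    # the complementary models agree on class 1
```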
NASA Technical Reports Server (NTRS)
Larson, Jay W.
1998-01-01
Atmospheric data assimilation is a method of combining actual observations with model forecasts to produce a more accurate description of the earth system than the observations or forecast alone can provide. The output of data assimilation, sometimes called the analysis, is a set of regular, gridded datasets of observed and unobserved variables. Analysis plays a key role in numerical weather prediction and is becoming increasingly important for climate research. These applications, and the need for timely validation of scientific enhancements to the data assimilation system, pose computational demands that are best met by distributed parallel software. The mission of the NASA Data Assimilation Office (DAO) is to provide datasets for climate research and to support NASA satellite and aircraft missions. The system used to create these datasets is the Goddard Earth Observing System Data Assimilation System (GEOS DAS). The core components of the GEOS DAS are: the GEOS General Circulation Model (GCM), the Physical-space Statistical Analysis System (PSAS), the Observer, the on-line Quality Control (QC) system, the Coupler (which feeds analysis increments back to the GCM), and an I/O package for processing the large amounts of data the system produces (which will be described in another presentation in this session). The discussion will center on the following issues: the computational complexity of the whole GEOS DAS, assessment of the performance of the individual elements of GEOS DAS, and the parallelization strategy for some of the components of the system.
Dem Local Accuracy Patterns in Land-Use/Land-Cover Classification
NASA Astrophysics Data System (ADS)
Katerji, Wassim; Farjas Abadia, Mercedes; Morillo Balsera, Maria del Carmen
2016-01-01
Global and nation-wide DEMs do not preserve the same height accuracy throughout the area of study. Instead of assuming a single RMSE value for the whole area, this study proposes a vario-model that divides the area into sub-regions depending on the land-use / land-cover (LULC) classification and assigns a local accuracy to each zone, as these areas share similar terrain formation and roughness and tend to have similar DEM accuracies. A pilot study over Lebanon using the SRTM and ASTER DEMs, combined with a set of 1,105 randomly distributed ground control points (GCPs), showed that even though the input DEMs have different spatial and temporal resolution and were collected using different techniques, their accuracy varied similarly when changing over different LULC classes. Furthermore, validating the generated vario-models proved that they provide a closer representation of the accuracy to the validating GCPs than the conventional RMSE, by 94% and 86% for the SRTM and ASTER respectively. Geostatistical analysis of the input and output datasets showed that the results have a normal distribution, which supports the generalization of the proven hypothesis, making this finding applicable to other input datasets anywhere around the world.
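The core of the vario-model idea, assigning a local accuracy per LULC class rather than one global RMSE, can be sketched like this (column names and values are hypothetical, not the Lebanese GCP data):

```python
# Sketch: compute a per-LULC-class RMSE from DEM-minus-GCP height differences.
import numpy as np
import pandas as pd

gcps = pd.DataFrame({
    "lulc":  ["urban", "urban", "forest", "forest", "bare", "bare", "crop"],
    "dem_h": [812.0, 640.5, 1203.0, 998.2, 455.0, 471.3, 702.8],
    "gcp_h": [810.2, 642.1, 1195.5, 1006.0, 454.1, 470.0, 701.5],
})
gcps["err"] = gcps["dem_h"] - gcps["gcp_h"]

rmse_per_class = gcps.groupby("lulc")["err"].apply(lambda e: np.sqrt(np.mean(e**2)))
print(rmse_per_class)    # local accuracy assigned to each LULC zone
```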
Developing an Automated Method for Detection of Operationally Relevant Ocean Fronts and Eddies
NASA Astrophysics Data System (ADS)
Rogers-Cotrone, J. D.; Cadden, D. D. H.; Rivera, P.; Wynn, L. L.
2016-02-01
Since the early 1990s, the U.S. Navy has utilized an observation-based process for identification of frontal systems and eddies. These Ocean Feature Assessments (OFA) rely on trained analysts to identify and position ocean features using satellite-observed sea surface temperatures. Meanwhile, as enhancements and expansion of the Navy's HYbrid Coordinate Ocean Model (HYCOM) and Regional Navy Coastal Ocean Model (RNCOM) domains have proceeded, the Naval Oceanographic Office (NAVO) has provided Tactical Oceanographic Feature Assessments (TOFA) that are based on data-validated model output but also rely on analyst identification of significant features. A recently completed project has migrated OFA production to the ArcGIS-based Acoustic Reach-back Cell Ocean Analysis Suite (ARCOAS), enabling use of additional observational datasets and significantly decreasing production time; however, it has highlighted inconsistencies inherent to this analyst-based identification process. Current efforts are focused on development of an automated method for detecting operationally significant fronts and eddies that integrates model output and observational data on a global scale. Previous attempts to employ techniques from the scientific community have been unable to meet the production tempo at NAVO. Thus, a system that incorporates existing techniques (Marr-Hildreth, Okubo-Weiss, etc.) with internally developed feature identification methods (based on model-derived physical and acoustic properties) is required. Ongoing expansions to the ARCOAS toolset have shown promising early results.
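One of the named building blocks, the Okubo-Weiss criterion for eddy detection, can be sketched from gridded surface currents (the grid spacing, threshold and synthetic vortex below are assumptions; operational implementations differ):

```python
# Sketch: Okubo-Weiss parameter W from gridded currents; eddy cores are often
# flagged where W is strongly negative (e.g. below -0.2 * std(W)).
import numpy as np

def okubo_weiss(u, v, dx, dy):
    dudy, dudx = np.gradient(u, dy, dx)
    dvdy, dvdx = np.gradient(v, dy, dx)
    s_n = dudx - dvdy            # normal strain
    s_s = dvdx + dudy            # shear strain
    omega = dvdx - dudy          # relative vorticity
    return s_n**2 + s_s**2 - omega**2

x, y = np.meshgrid(np.linspace(-1, 1, 101), np.linspace(-1, 1, 101))
u, v = -y * np.exp(-(x**2 + y**2)), x * np.exp(-(x**2 + y**2))   # synthetic vortex
W = okubo_weiss(u, v, dx=0.02, dy=0.02)
eddy_mask = W < -0.2 * W.std()
```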
NASA Astrophysics Data System (ADS)
Wang, W.; Hashimoto, H.; Milesi, C.; Nemani, R. R.; Myneni, R.
2011-12-01
Terrestrial ecosystem models are primary scientific tools for extrapolating our understanding of ecosystem functioning from point observations to global scales, as well as from past climatic conditions into the future. However, no model is perfect, and considerable structural uncertainties often exist between different models. Ensemble model experiments have thus become a mainstream approach for evaluating the current status of the global carbon cycle and predicting its future changes. A key task in such applications is to quantify the sensitivity of the simulated carbon fluxes to climate variations and changes. Here we develop a systematic framework to address this question solely by analyzing the inputs and outputs of the models. The principle of our approach is to treat the long-term (~30 year) average of the inputs/outputs as a quasi-equilibrium of the climate-vegetation system, while treating the anomalies of carbon fluxes as responses to climatic disturbances. In this way, the corresponding relationships can be largely linearized and analyzed using conventional time-series techniques. This method is used to characterize three major aspects of the vegetation models that are most important to the global carbon cycle, namely primary production, biomass dynamics, and ecosystem respiration. We apply this analytical framework to quantify the climatic sensitivity of an ensemble of models including CASA, Biome-BGC, LPJ as well as several other DGVMs from previous studies, all driven by the CRU-NCEP climate dataset. The detailed analysis results are reported in this study.
The 3D elevation program - Precision agriculture and other farm practices
Sugarbaker, Larry J.; Carswell, Jr., William J.
2016-12-27
A founding motto of the Natural Resources Conservation Service (NRCS), originally the Soil Conservation Service (SCS), explains that “If we take care of the land, it will take care of us.” Digital elevation models (DEMs) are derived from light detection and ranging (lidar) data and can be processed to derive values such as slope angle, aspect, and topographic curvature. These three measurements are the principal parameters of the NRCS Lidar-Enhanced Soil Survey (LESS) model, which improves the precision of soil surveys by more accurately displaying slope and soil patterns, while increasing the objectivity and science in line placement. As combined resources, DEMs, LESS model outputs, and similar derived datasets are essential for conserving soil, wetlands, and other natural resources managed and overseen by the NRCS and other Federal and State agencies.
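Two of the LESS inputs named above, slope angle and aspect, can be derived from a DEM array roughly as follows (grid spacing, array values and the aspect convention are assumptions; GIS packages differ in the details):

```python
# Sketch: slope and aspect from a lidar-derived DEM stored as a numpy array.
import numpy as np

def slope_aspect(dem, cellsize):
    dzdy, dzdx = np.gradient(dem, cellsize)                # height gradients (m/m)
    slope_deg = np.degrees(np.arctan(np.hypot(dzdx, dzdy)))
    aspect_deg = (np.degrees(np.arctan2(-dzdx, dzdy)) + 360.0) % 360.0  # one common convention
    return slope_deg, aspect_deg

dem = np.array([[10.0, 10.5, 11.0],
                [10.2, 10.8, 11.4],
                [10.4, 11.1, 11.8]])
slope, aspect = slope_aspect(dem, cellsize=1.0)
```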
NASA Astrophysics Data System (ADS)
Waliser, D. E.; Kim, J.; Mattman, C.; Goodale, C.; Hart, A.; Zimdars, P.; Lean, P.
2011-12-01
Evaluation of climate models against observations is an essential part of assessing the impact of climate variations and change on regionally important sectors and improving climate models. Regional climate models (RCMs) are of particular concern. RCMs provide the fine-scale climate information needed by the assessment community by downscaling global climate model projections, such as those contributing to the Coupled Model Intercomparison Project (CMIP) that form one aspect of the quantitative basis of the IPCC Assessment Reports. The lack of reliable fine-resolution observational data and formal tools and metrics has represented a challenge in evaluating RCMs. Recent satellite observations are particularly useful as they provide a wealth of information and constraints on many different processes within the climate system. Due to their large volume and the difficulties associated with accessing and using contemporary observations, however, these datasets have been generally underutilized in model evaluation studies. Recognizing this problem, NASA JPL and UCLA have developed the Regional Climate Model Evaluation System (RCMES) to help make satellite observations, in conjunction with in-situ and reanalysis datasets, more readily accessible to the regional modeling community. The system includes a central database (Regional Climate Model Evaluation Database: RCMED) to store multiple datasets in a common format and codes for calculating and plotting statistical metrics to assess model performance (Regional Climate Model Evaluation Tool: RCMET). This allows the time taken to compare model data with satellite observations to be reduced from weeks to days. RCMES is a component of the recent ExArch project, an international effort to facilitate the archiving of and access to massive amounts of data using cloud-based infrastructure, in this case as applied to the study of climate and climate change. This presentation will describe RCMES and demonstrate its utility using examples from RCMs applied to the southwest US as well as to Africa based on output from the CORDEX activity. Application of RCMES to the evaluation of multi-RCM hindcasts for CORDEX-Africa will be presented in a companion paper in A41.
Hlusko, Leslea J; Schmitt, Christopher A; Monson, Tesla A; Brasil, Marianne F; Mahaney, Michael C
2016-08-16
Developmental genetics research on mice provides a relatively sound understanding of the genes necessary and sufficient to make mammalian teeth. However, mouse dentitions are highly derived compared with human dentitions, complicating the application of these insights to human biology. We used quantitative genetic analyses of data from living nonhuman primates and extensive osteological and paleontological collections to refine our assessment of dental phenotypes so that they better represent how the underlying genetic mechanisms actually influence anatomical variation. We identify ratios that better characterize the output of two dental genetic patterning mechanisms for primate dentitions. These two newly defined phenotypes are heritable with no measurable pleiotropic effects. When we consider how these two phenotypes vary across neontological and paleontological datasets, we find that the major Middle Miocene taxonomic shift in primate diversity is characterized by a shift in these two genetic outputs. Our results build on the mouse model by combining quantitative genetics and paleontology, and thereby elucidate how genetic mechanisms likely underlie major events in primate evolution.
Signal analysis of accelerometry data using gravity-based modeling
NASA Astrophysics Data System (ADS)
Davey, Neil P.; James, Daniel A.; Anderson, Megan E.
2004-03-01
Triaxial accelerometers have been used to measure human movement parameters in swimming. Interpretation of the data is difficult due to interference sources, including the interaction of external bodies. In this investigation the authors developed a model to simulate the physical movement of the lower back. Theoretical accelerometry outputs were derived, thus giving an ideal, or noiseless, dataset. An experimental data collection apparatus was developed by adapting a system to the aquatic environment for the investigation of swimming. Model data were compared against recorded data and showed strong correlation. Comparison of recorded and modeled data can be used to identify changes in body movement; this is especially useful when cyclic patterns are present in the activity. The strong correlation between datasets allowed the development of signal processing algorithms for swimming stroke analysis, which were first developed on the pure noiseless dataset and then applied to performance data. Video analysis was also used to validate the study and has shown potential to provide acceptable results.
Evaluation of LIS-based Soil Moisture and Evapotranspiration in the Korean Peninsula
NASA Astrophysics Data System (ADS)
Jung, H. C.; Kang, D. H.; Kim, E. J.; Yoon, Y.; Kumar, S.; Peters-Lidard, C. D.; Baeck, S. H.; Hwang, E.; Chae, H.
2017-12-01
K-water is the South Korean national water agency. It is the government-funded private agency for water resource development that provides both civil and industrial water in S. Korea. K-water is interested in exploring how earth remote sensing and modeling can support its tasks. In this context, the NASA Land Information System (LIS) is implemented to simulate land surface processes in the Korean Peninsula. The Noah land surface model with Multi-Parameterization, version 3.6 (Noah-MP) is used to reproduce the water budget variables on a 1 km spatial resolution grid with a daily temporal resolution. The Modern-Era Retrospective analysis for Research and Applications, version 2 (MERRA-2) dataset is used to force the system. The rainfall data are spatially downscaled from the high-resolution WorldClim precipitation climatology. The other meteorological inputs (i.e. air temperature, humidity, pressure, winds, radiation) are also downscaled by statistical methods (i.e. lapse-rate, slope-aspect). Additional model experiments are conducted with local rainfall datasets and soil maps to replace the downscaled MERRA-2 precipitation field and the hybrid STATSGO/FAO soil texture, respectively. For the evaluation of model performance, daily soil moisture and evapotranspiration measurements at several stations are compared to the LIS-based outputs. This study demonstrates that application of NASA's LIS can enhance drought and flood prediction capabilities in South Asia and Korea.
NASA Astrophysics Data System (ADS)
Switzer, A.; Yap, W.; Lauro, F.; Gouramanis, C.; Dominey-Howes, D.; Labbate, M.
2016-12-01
NASA Astrophysics Data System (ADS)
Sorooshian, S.; Nguyen, P.; Hsu, K. L.
2017-12-01
This presentation provides an overview of the PERSIANN precipitation products from the near real time high-resolution (4km, 30 min) PERSIANN-CCS to the most recent 34+-year PERSIANN-CDR (25km, daily). It is widely believed that the hydrologic cycle has been intensifying due to global warming and the frequency and the intensity of hydrologic extremes has also been increasing. Using the long-term historical global high resolution (daily, 0.25 degree) PERSIANN-CDR dataset covering over three decades from 1983 to the present day, we assess changes in global precipitation across different spatial scales. Our results show differences in trends, depending on which spatial scale is used, highlighting the importance of spatial scale in trend analysis. In addition, while there is an easily observable increasing global temperature trend, the global precipitation trend results created by the PERSIANN-CDR dataset used in this study are inconclusive. In addition, we use PERSIANN-CDR to assess the performance of the 32 CMIP5 models in terms of extreme precipitation indices in various continent-climate zones. The assessment can provide a guide for both model developers to target regions and processes that are not yet fully captured in certain climate types, and for climate model output users to be able to select the models and/or the study areas that may best fit their applications of interest.
The U.S. Geological Survey Monthly Water Balance Model Futures Portal
Bock, Andrew R.; Hay, Lauren E.; Markstrom, Steven L.; Emmerich, Christopher; Talbert, Marian
2017-05-03
The U.S. Geological Survey Monthly Water Balance Model Futures Portal (https://my.usgs.gov/mows/) is a user-friendly interface that summarizes monthly historical and simulated future conditions for seven hydrologic and meteorological variables (actual evapotranspiration, potential evapotranspiration, precipitation, runoff, snow water equivalent, atmospheric temperature, and streamflow) at locations across the conterminous United States (CONUS). The estimates of these hydrologic and meteorological variables were derived using a Monthly Water Balance Model (MWBM), a modular system that simulates monthly estimates of components of the hydrologic cycle using monthly precipitation and atmospheric temperature inputs. Precipitation and atmospheric temperature from 222 climate datasets spanning historical conditions (1952 through 2005) and simulated future conditions (2020 through 2099) were summarized for hydrographic features and used to drive the MWBM for the CONUS. The MWBM input and output variables were organized into an open-access database. An Open Geospatial Consortium, Inc., Web Feature Service allows the querying and identification of hydrographic features across the CONUS. To connect the Web Feature Service to the open-access database, a user interface, the Monthly Water Balance Model Futures Portal, was developed to allow the dynamic generation of summary files and plots based on plot type, geographic location, specific climate datasets, period of record, MWBM variable, and other options. Both the plots and the data files are made available to the user for download.
Halsteinli, Vidar; Kittelsen, Sverre A; Magnussen, Jon
2010-02-01
The performance of health service providers may be monitored by measuring productivity. However, the policy value of such measures may depend crucially on the accuracy of input and output measures. In particular, an important question is how to adjust adequately for case-mix in the production of health care. In this study, we assess productivity growth in Norwegian outpatient child and adolescent mental health service units (CAMHS) over a period characterized by governmental utilization of simple productivity indices, a substantial increase in capacity and a concurrent change in case-mix. We analyze the sensitivity of the productivity growth estimates using different specifications of output to adjust for case-mix differences. Case-mix adjustment is achieved by distributing patients into eight groups depending on reason for referral, age and gender, as well as correcting for the number of consultations. We utilize the nonparametric Data Envelopment Analysis (DEA) method to implicitly calculate weights that maximize each unit's efficiency. Malmquist indices of technical productivity growth are estimated, and bootstrap procedures are performed to calculate confidence intervals and to test alternative specifications of outputs. The dataset consists of an unbalanced panel of 48-60 CAMHS in the period 1998-2006. The mean productivity growth estimate from a simple unadjusted patient model (one single output) is 35%; adjusting for case-mix (eight outputs) reduces the growth estimate to 15%. Adding consultations increases the estimate to 28%; the latter reflects an increase in the number of consultations per patient. We find that the governmental productivity indices strongly tend to overestimate productivity growth. Case-mix adjustment is of major importance, and governmental utilization of performance indicators necessitates careful consideration of output specifications. Copyright 2009 Elsevier Ltd. All rights reserved.
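An input-oriented, constant-returns-to-scale DEA score of the kind underlying such Malmquist analyses can be written as a small linear program (this is a generic textbook formulation with toy case-mix data, not the authors' model or dataset):

```python
# Sketch: input-oriented CCR DEA efficiency for one unit via scipy's linprog.
import numpy as np
from scipy.optimize import linprog

def dea_efficiency(X, Y, o):
    """X: (units, inputs), Y: (units, outputs); returns theta for unit o."""
    n, m = X.shape
    s = Y.shape[1]
    c = np.zeros(n + 1)           # decision variables: [theta, lambda_1..lambda_n]
    c[0] = 1.0                    # minimise theta
    A_ub, b_ub = [], []
    for i in range(m):            # sum_j lambda_j * x_ij <= theta * x_io
        A_ub.append(np.concatenate([[-X[o, i]], X[:, i]]))
        b_ub.append(0.0)
    for r in range(s):            # sum_j lambda_j * y_rj >= y_ro
        A_ub.append(np.concatenate([[0.0], -Y[:, r]]))
        b_ub.append(-Y[o, r])
    bounds = [(None, None)] + [(0, None)] * n
    res = linprog(c, A_ub=np.array(A_ub), b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[0]

# Toy data: 4 units, 1 input (staff hours), 2 outputs (patients in two case-mix groups)
X = np.array([[100.0], [120.0], [90.0], [110.0]])
Y = np.array([[80.0, 20.0], [70.0, 40.0], [60.0, 25.0], [90.0, 10.0]])
print([round(dea_efficiency(X, Y, o), 3) for o in range(4)])
```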
Lynch, Chip M; Abdollahi, Behnaz; Fuqua, Joshua D; de Carlo, Alexandra R; Bartholomai, James A; Balgemann, Rayeanne N; van Berkel, Victor H; Frieboes, Hermann B
2017-12-01
Outcomes for cancer patients have been previously estimated by applying various machine learning techniques to large datasets such as the Surveillance, Epidemiology, and End Results (SEER) program database. In particular for lung cancer, it is not well understood which types of techniques would yield more predictive information, and which data attributes should be used in order to determine this information. In this study, a number of supervised learning techniques are applied to the SEER database to classify lung cancer patients in terms of survival, including linear regression, Decision Trees, Gradient Boosting Machines (GBM), Support Vector Machines (SVM), and a custom ensemble. Key data attributes in applying these methods include tumor grade, tumor size, gender, age, stage, and number of primaries, with the goal of enabling comparison of predictive power among the various methods. The prediction is treated as a continuous target, rather than a classification into categories, as a first step towards improving survival prediction. The results show that the predicted values agree with actual values for low to moderate survival times, which constitute the majority of the data. The best performing technique was the custom ensemble with a Root Mean Square Error (RMSE) value of 15.05. The most influential model within the custom ensemble was GBM, while Decision Trees may be inapplicable as they had too few discrete outputs. The results further show that among the five individual models generated, the most accurate was GBM with an RMSE value of 15.32. Although SVM underperformed with an RMSE value of 15.82, statistical analysis singles out the SVM as the only model that generated a distinctive output. The results of the models are consistent with a classical Cox proportional hazards model used as a reference technique. We conclude that application of these supervised learning techniques to lung cancer data in the SEER database may be of use to estimate patient survival time, with the ultimate goal of informing patient care decisions, and that the performance of these techniques with this particular dataset may be on par with that of classical methods. Copyright © 2017 Elsevier B.V. All rights reserved.
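One of the listed techniques, gradient-boosted regression of survival time scored by RMSE, can be sketched as below (the features mirror the attributes named in the abstract, but the data are synthetic, not SEER records):

```python
# Sketch: GBM regression of survival months and RMSE on a held-out split.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000
X = np.column_stack([
    rng.integers(1, 5, n),      # tumor grade
    rng.uniform(5, 80, n),      # tumor size (mm)
    rng.integers(0, 2, n),      # gender
    rng.integers(40, 90, n),    # age
    rng.integers(1, 5, n),      # stage
    rng.integers(1, 4, n),      # number of primaries
])
y = np.clip(60 - 8 * X[:, 4] - 0.3 * X[:, 3] + rng.normal(0, 5, n), 0, None)  # toy months

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
gbm = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)
rmse = mean_squared_error(y_te, gbm.predict(X_te)) ** 0.5
print(f"GBM RMSE: {rmse:.2f} months")
```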
Mathur, P K; Herrero-Medrano, J M; Alexandri, P; Knol, E F; ten Napel, J; Rashidi, H; Mulder, H A
2014-12-01
A method was developed and tested to estimate challenge load due to disease outbreaks and other challenges in sows using reproduction records. The method was based on reproduction records from a farm with known disease outbreaks. It was assumed that the reduction in weekly reproductive output within a farm is proportional to the magnitude of the challenge; as the challenge increases beyond a certain threshold, it is manifested as an outbreak. The reproduction records were divided into 3 datasets. The first dataset, called the Training dataset, consisted of 57,135 reproduction records from 10,901 sows from 1 farm in Canada with several outbreaks of porcine reproductive and respiratory syndrome (PRRS). The known disease status of sows was regressed on the traits number born alive, number of losses as a combination of stillborn and mummified piglets, and number of weaned piglets. The regression coefficients from this analysis were then used as weighting factors for derivation of an index measure called the challenge load indicator. These weighting factors were derived with i) a two-step approach using residuals or year-week solutions estimated from a previous step, and ii) a single-step approach using the trait values directly. Two types of models were used for each approach: a logistic regression model and a generalized additive model. The estimates of the challenge load indicator were then compared based on their ability to detect PRRS outbreaks in a Test dataset consisting of records from 65,826 sows from 15 farms in the Netherlands. These farms differed from the Canadian farm with respect to PRRS virus strains, and the severity and frequency of outbreaks. The single-step approach using a generalized additive model performed best and detected 14 out of the 15 outbreaks. This approach was then further validated using the third dataset, consisting of reproduction records of 831,855 sows in 431 farms located in different countries in Europe and America. A total of 41 out of 48 outbreaks detected by the data analysis were confirmed based on diagnostic information received from the farms. Among these, 30 outbreaks were due to PRRS while 11 were due to other diseases and challenging conditions. The results suggest that the proposed method could be useful for estimation of challenge load and detection of challenge phases such as disease outbreaks.
Validation of a 30 m resolution flood hazard model of the conterminous United States
NASA Astrophysics Data System (ADS)
Wing, Oliver E. J.; Bates, Paul D.; Sampson, Christopher C.; Smith, Andrew M.; Johnson, Kris A.; Erickson, Tyler A.
2017-09-01
This paper reports the development of a ˜30 m resolution two-dimensional hydrodynamic model of the conterminous U.S. using only publicly available data. The model employs a highly efficient numerical solution of the local inertial form of the shallow water equations which simulates fluvial flooding in catchments down to 50 km2 and pluvial flooding in all catchments. Importantly, we use the U.S. Geological Survey (USGS) National Elevation Dataset to determine topography; the U.S. Army Corps of Engineers National Levee Dataset to explicitly represent known flood defenses; and global regionalized flood frequency analysis to characterize return period flows and rainfalls. We validate these simulations against the complete catalogue of Federal Emergency Management Agency (FEMA) Special Flood Hazard Area (SFHA) maps and detailed local hydraulic models developed by the USGS. Where the FEMA SFHAs are based on high-quality local models, the continental-scale model attains a hit rate of 86%. This correspondence improves in temperate areas and for basins above 400 km2. Against the higher quality USGS data, the average hit rate reaches 92% for the 1 in 100 year flood, and 90% for all flood return periods. Given typical hydraulic modeling uncertainties in the FEMA maps and USGS model outputs (e.g., errors in estimating return period flows), it is probable that the continental-scale model can replicate both to within error. The results show that continental-scale models may now offer sufficient rigor to inform some decision-making needs with dramatically lower cost and greater coverage than approaches based on a patchwork of local studies.
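The hit-rate statistic quoted above can be computed directly from two binary flood extent grids (the toy grids below and this particular definition, hits relative to benchmark wet cells, are illustrative; the paper defines its metrics precisely):

```python
# Sketch: hit rate between a benchmark flood map and a model flood map.
import numpy as np

def hit_rate(benchmark_wet, model_wet):
    """Fraction of benchmark wet cells that the model also marks as wet."""
    hits = np.logical_and(benchmark_wet, model_wet).sum()
    return hits / benchmark_wet.sum()

benchmark = np.array([[1, 1, 0], [1, 0, 0], [1, 1, 1]], dtype=bool)  # e.g. FEMA SFHA
model     = np.array([[1, 1, 0], [1, 1, 0], [0, 1, 1]], dtype=bool)  # continental model
print(hit_rate(benchmark, model))   # 5 of 6 benchmark wet cells matched, about 0.83
```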
Application of web-GIS approach for climate change study
NASA Astrophysics Data System (ADS)
Okladnikov, Igor; Gordov, Evgeny; Titov, Alexander; Bogomolov, Vasily; Martynova, Yuliya; Shulgina, Tamara
2013-04-01
Georeferenced datasets are currently actively used in numerous applications, including modeling, interpretation and forecasting of climatic and ecosystem changes at various spatial and temporal scales. Due to the inherent heterogeneity of environmental datasets, as well as their huge size, which may reach tens of terabytes for a single dataset, present-day studies of climate and environmental change require special software support. A dedicated web-GIS information-computational system for analysis of georeferenced climatological and meteorological data has been created. It is based on OGC standards and involves many modern solutions such as an object-oriented programming model, modular composition, and JavaScript libraries based on the GeoExt library, the ExtJS framework and OpenLayers software. The main advantage of the system lies in the possibility of performing mathematical and statistical data analysis and graphical visualization of results with GIS functionality, and of preparing binary output files, with just a modern graphical web browser installed on a common desktop computer connected to the Internet. Several geophysical datasets represented by two editions of NCEP/NCAR Reanalysis, JMA/CRIEPI JRA-25 Reanalysis, ECMWF ERA-40 Reanalysis, ECMWF ERA Interim Reanalysis, MRI/JMA APHRODITE's Water Resources Project Reanalysis, DWD Global Precipitation Climatology Centre's data, GMAO Modern Era-Retrospective analysis for Research and Applications, meteorological observational data for the territory of the former USSR for the 20th century, and results of modeling by global and regional climatological models, among others, are available for processing by the system, and this list is growing. Functionality to run the WRF and "Planet Simulator" models has also been implemented in the system. Owing to the many preset parameters and the limited time and spatial ranges set in the system, these models have low computational power requirements and can be used in educational workflows for a better understanding of basic climatological and meteorological processes. The web-GIS information-computational system for geophysical data analysis provides specialists involved in multidisciplinary research projects with reliable and practical instruments for complex analysis of climate and ecosystem changes on global and regional scales. Using it, even an unskilled user without specific knowledge can perform computational processing and visualization of large meteorological, climatological and satellite monitoring datasets through a unified web interface in a common graphical web browser. This work is partially supported by the Ministry of Education and Science of the Russian Federation (contract #8345), SB RAS project VIII.80.2.1, RFBR grant #11-05-01190a, and integrated project SB RAS #131.
NASA Astrophysics Data System (ADS)
Huff, A. K.; Weber, S.; Braggio, J.; Talbot, T.; Hall, E.
2012-12-01
Fine particulate matter (PM2.5) is a criterion air pollutant, and its adverse impacts on human health are well established. Traditionally, studies that analyze the health effects of human exposure to PM2.5 use concentration measurements from ground-based monitors and predicted PM2.5 concentrations from air quality models, such as the U.S. EPA's Community Multi-scale Air Quality (CMAQ) model. There are shortcomings associated with these datasets, however. Monitors are not distributed uniformly across the U.S., which causes spatially inhomogeneous measurements of pollutant concentrations. There are often temporal variations as well, since not all monitors make daily measurements. Air quality model output, while spatially and temporally uniform, represents predictions of PM2.5 concentrations, not actual measurements. This study is exploring the potential of combining Aerosol Optical Depth (AOD) data from the MODIS instrument on NASA's Terra and Aqua satellites with PM2.5 monitor data and CMAQ predictions to create PM2.5 datasets that more accurately reflect the spatial and temporal variations in ambient PM2.5 concentrations on the metropolitan scale, with the overall goal of enhancing capabilities for environmental public health decision-making. AOD data provide regional information about particulate concentrations that can fill in the spatial and temporal gaps in the national PM2.5 monitor network. Furthermore, AOD is a measurement, so it reflects actual concentrations of particulates in the atmosphere, in contrast to PM2.5 predictions from air quality models. Results will be presented from the Battelle/U.S. EPA statistical Hierarchical Bayesian Model (HBM), which was used to combine three PM2.5 concentration datasets: monitor measurements, AOD data, and CMAQ model predictions. The study is focusing on the Baltimore, MD and New York City, NY metropolitan regions for the period 2004-2006. For each region, combined monitor/AOD/CMAQ PM2.5 datasets generated by the HBM are being correlated with data on inpatient hospitalizations and emergency room visits for seven respiratory and cardiovascular diseases using statistical case-crossover analyses. Preliminary results will be discussed regarding the potential for the addition of AOD data to increase the correlation between PM2.5 concentrations and health outcomes. Environmental public health tracking programs associated with the Maryland Department of Health and Mental Hygiene, the New York State Department of Health, the CDC, and the U.S. EPA have expressed interest in using the results of this study to enhance their existing environmental health surveillance activities.
Semi-automated surface mapping via unsupervised classification
NASA Astrophysics Data System (ADS)
D'Amore, M.; Le Scaon, R.; Helbert, J.; Maturilli, A.
2017-09-01
Due to the increasing volume of data returned from space missions, the human search for correlations and the identification of interesting features are becoming more and more infeasible. Statistical extraction of features via machine learning methods will increase the scientific output of remote sensing missions and aid the discovery of as yet unknown features hidden in datasets. These methods exploit algorithms trained on features from multiple instruments, returning classification maps that explore intra-dataset correlations and allow for the discovery of unknown features. We present two applications, one for Mercury and one for Vesta.
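A minimal version of such an unsupervised-classification workflow might look like the following (k-means with synthetic per-pixel features is assumed here; the actual algorithms and instrument channels used for the Mercury and Vesta maps are not specified in the abstract):

```python
# Sketch: cluster per-pixel feature vectors from multiple instruments into
# surface classes with k-means, then reshape labels back to map geometry.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
features = np.column_stack([
    rng.normal(0.20, 0.05, 10000),    # e.g. reflectance band 1
    rng.normal(0.35, 0.08, 10000),    # reflectance band 2
    rng.normal(230.0, 15.0, 10000),   # thermal channel
])

X = StandardScaler().fit_transform(features)             # put channels on one scale
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)
class_map = labels.reshape(100, 100)                      # back to image geometry
```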
Automatic Residential/Commercial Classification of Parcels with Solar Panel Detections
DOE Office of Scientific and Technical Information (OSTI.GOV)
Morton, April M; Omitaomu, Olufemi A; Kotikot, Susan
This software provides a computational method to automatically detect solar panels on rooftops to aid policy and financial assessment of solar distributed generation. The code automatically classifies parcels containing solar panels in the U.S. as residential or commercial: the user specifies an input dataset containing parcels and detected solar panels, and the code then uses information about the parcels and solar panels to classify the rooftops as residential or commercial using machine learning techniques. The zip file containing the code includes sample input and output datasets for the Boston and DC areas.
NASA Astrophysics Data System (ADS)
Taylor, M. B.
2009-09-01
The new plotting functionality in version 2.0 of STILTS is described. STILTS is a mature and powerful package for all kinds of table manipulation, and this version adds facilities for generating plots from one or more tables to its existing wide range of non-graphical capabilities. 2- and 3-dimensional scatter plots and 1-dimensional histograms may be generated using highly configurable style parameters. Features include multiple dataset overplotting, variable transparency, 1-, 2- or 3-dimensional symmetric or asymmetric error bars, higher-dimensional visualization using color, and textual point labeling. Vector and bitmapped output formats are supported. The plotting options provide enough flexibility to perform meaningful visualization on datasets from a few points up to tens of millions. Arbitrarily large datasets can be plotted without heavy memory usage.
The EPA Control Strategy Tool (CoST) is a software tool for projecting potential future control scenarios, their effects on emissions and estimated costs. This tool uses the NEI and the Control Measures Dataset as key inputs. CoST outputs are projections of future control scenarios.
Deep Learning: A Primer for Radiologists.
Chartrand, Gabriel; Cheng, Phillip M; Vorontsov, Eugene; Drozdzal, Michal; Turcotte, Simon; Pal, Christopher J; Kadoury, Samuel; Tang, An
2017-01-01
Deep learning is a class of machine learning methods that are gaining success and attracting interest in many domains, including computer vision, speech recognition, natural language processing, and playing games. Deep learning methods produce a mapping from raw inputs to desired outputs (eg, image classes). Unlike traditional machine learning methods, which require hand-engineered feature extraction from inputs, deep learning methods learn these features directly from data. With the advent of large datasets and increased computing power, these methods can produce models with exceptional performance. These models are multilayer artificial neural networks, loosely inspired by biologic neural systems. Weighted connections between nodes (neurons) in the network are iteratively adjusted based on example pairs of inputs and target outputs by back-propagating a corrective error signal through the network. For computer vision tasks, convolutional neural networks (CNNs) have proven to be effective. Recently, several clinical applications of CNNs have been proposed and studied in radiology for classification, detection, and segmentation tasks. This article reviews the key concepts of deep learning for clinical radiologists, discusses technical requirements, describes emerging applications in clinical radiology, and outlines limitations and future directions in this field. Radiologists should become familiar with the principles and potential applications of deep learning in medical imaging. © RSNA, 2017.
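The weight-update loop described above can be made concrete with a toy two-layer network trained by backpropagation (numpy only, with made-up inputs; real radiology models are deep CNNs trained in GPU frameworks):

```python
# Sketch: iteratively adjust weights by back-propagating the output error.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))                          # example inputs
y = (X[:, 0] + X[:, 1] > 0).astype(float)[:, None]     # target outputs (binary)

W1 = rng.normal(scale=0.5, size=(8, 16))               # input-to-hidden weights
W2 = rng.normal(scale=0.5, size=(16, 1))               # hidden-to-output weights
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for step in range(2000):
    h = np.tanh(X @ W1)                                # hidden activations
    p = sigmoid(h @ W2)                                # predicted probability
    err = p - y                                        # corrective error signal
    grad_W2 = h.T @ err / len(X)                       # backprop to output weights
    grad_W1 = X.T @ ((err @ W2.T) * (1 - h**2)) / len(X)   # backprop to hidden weights
    W2 -= 0.5 * grad_W2                                # iterative weight updates
    W1 -= 0.5 * grad_W1
```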
Simulation of extreme rainfall and projection of future changes using the GLIMCLIM model
NASA Astrophysics Data System (ADS)
Rashid, Md. Mamunur; Beecham, Simon; Chowdhury, Rezaul Kabir
2017-10-01
In this study, the performance of the Generalized LInear Modelling of daily CLImate sequence (GLIMCLIM) statistical downscaling model was assessed for simulating extreme rainfall indices and annual maximum daily rainfall (AMDR) when downscaling daily rainfall from National Centers for Environmental Prediction (NCEP) reanalysis and Coupled Model Intercomparison Project Phase 5 (CMIP5) general circulation model (GCM) output datasets (four GCMs and two scenarios); changes in these quantities were then estimated for the future period 2041-2060. The model was able to reproduce the monthly variations in the extreme rainfall indices reasonably well when forced by the NCEP reanalysis datasets. Frequency Adapted Quantile Mapping (FAQM) was used to remove bias in the simulated daily rainfall when the model was forced by the CMIP5 GCMs, which reduced the discrepancy between observed and simulated extreme rainfall indices. Although the observed AMDR were within the 2.5th and 97.5th percentiles of the simulated AMDR, the model consistently under-predicted the inter-annual variability of AMDR. A non-stationary model was developed using the generalized linear model for location, scale and shape to estimate the AMDR with an annual exceedance probability of 0.01. The study shows that, in general, AMDR is likely to decrease in the future. The Onkaparinga catchment will also experience drier conditions due to an increase in consecutive dry days coinciding with decreases in heavy (greater than the long-term 90th percentile) rainfall days, the empirical 90th quantile of rainfall and the maximum 5-day consecutive total rainfall for the future period (2041-2060) compared to the base period (1961-2000).
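The quantile-mapping step can be illustrated with plain empirical quantile mapping (a simplification of the Frequency Adapted Quantile Mapping named above, applied to synthetic rainfall):

```python
# Sketch: map model rainfall onto the observed distribution via matched quantiles.
import numpy as np

def quantile_map(model_hist, obs_hist, model_values):
    q = np.linspace(0.01, 0.99, 99)
    mod_q = np.quantile(model_hist, q)
    obs_q = np.quantile(obs_hist, q)
    return np.interp(model_values, mod_q, obs_q)   # values outside the range are clipped

rng = np.random.default_rng(0)
obs = rng.gamma(shape=2.0, scale=5.0, size=3000)          # observed daily rainfall
mod = 0.7 * rng.gamma(shape=2.0, scale=5.0, size=3000)    # biased model rainfall
corrected = quantile_map(mod, obs, mod)                   # bias-corrected series
```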
NASA Astrophysics Data System (ADS)
Scherstjanoi, M.; Kaplan, J. O.; Thürig, E.; Lischke, H.
2013-02-01
Models of vegetation dynamics that are designed for application at spatial scales larger than individual forest gaps suffer from several limitations. Typically, either a population average approximation is used that results in unrealistic tree allometry and forest stand structure, or models have a high computational demand because they need to simulate both a series of age-based cohorts and a number of replicate patches to account for stochastic gap-scale disturbances. The detail required by the latter method increases the number of calculations by two to three orders of magnitude compared to the less realistic population average approach. In an effort to increase the efficiency of dynamic vegetation models without sacrificing realism, and to explore patterns of spatial scaling in forests, we developed a new method for simulating stand-replacing disturbances that is both accurate and 10-50x faster than approaches that use replicate patches. The GAPPARD (approximating GAP model results with a Probabilistic Approach to account for stand Replacing Disturbances) method works by postprocessing the output of deterministic, undisturbed simulations of a cohort-based vegetation model, deriving the distribution of patch ages at any point in time on the basis of a disturbance probability. With this distribution, the expected value of any output variable can be calculated from the output values of the deterministic undisturbed run at the time corresponding to the patch age. To account for temporal changes in model forcing, e.g., as a result of climate change, GAPPARD performs a series of deterministic simulations and interpolates between the results in the postprocessing step. We integrated the GAPPARD method in the forest models LPJ-GUESS and TreeM-LPJ, and evaluated these in a series of simulations along an altitudinal transect of an inner-alpine valley. With GAPPARD applied to LPJ-GUESS, results were not significantly different from the output of the original LPJ-GUESS model using 100 replicate patches, but simulation time was reduced by approximately a factor of 10. Our new method is therefore highly suited to rapidly approximating LPJ-GUESS results, and it provides the opportunity for future studies over large spatial domains, allows easier parameterization of tree species, faster identification of areas with interesting simulation results, and comparisons with large-scale datasets and forest models.
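One reading of the GAPPARD post-processing step is a weighted average of the undisturbed run over a geometric patch-age distribution implied by an annual disturbance probability (this is a conceptual sketch of the description above, not the published code, and the biomass curve is invented):

```python
# Sketch: expected landscape-mean output from an undisturbed cohort run,
# weighted by the patch-age distribution for disturbance probability p.
import numpy as np

def gappard_expectation(undisturbed, p):
    """undisturbed[a] = model output at patch age a; returns the landscape mean."""
    ages = np.arange(len(undisturbed))
    weights = p * (1.0 - p) ** ages          # geometric patch-age distribution
    weights[-1] = (1.0 - p) ** ages[-1]      # lump all older patches into the last age
    return np.sum(weights * undisturbed) / np.sum(weights)

age = np.arange(300)
biomass = 250.0 * (1.0 - np.exp(-age / 60.0))       # toy undisturbed biomass trajectory
print(gappard_expectation(biomass, p=0.01))         # expectation with 1%/yr disturbance
```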
INFOMAR - Ireland's National Seabed Mapping Programme: A Tool For Marine Spatial Planning
NASA Astrophysics Data System (ADS)
Furey, T. M.
2016-02-01
INFOMAR is Ireland's national seabed mapping programme and is a key action in the national integrated marine plan, Harnessing Our Ocean Wealth. It comprises a multi-platform approach to delivering marine integrated mapping in 2 phases, over a projected 20 year timeline (2006-2026). The programme has three work strands; Data Acquisition, Data Exchange and Integration, and Value Added Exploitation. The Data Acquisition strand includes collection of hydrographic, oceanographic, geological, habitat and heritage datasets that will underpin future sustainable development and management of Ireland's marine resource. INFOMAR outputs are delivered through the Data Exchange and Integration strand. Uses of these outputs are wide ranging and multipurpose, from management plans for fisheries, aquaculture and coastal protection works, to environmental impact assessments, ocean renewable development and integrated coastal zone management. In order to address the evolution and diversification of maritime user requirements, the programme has realigned and developed outputs and new products, in part, through an innovative research funding initiative. Development is also fostered through the Value Added Exploitation strand. INFOMAR outputs and products serve to underpin delivery of Ireland's statutory obligations and enhance compliance with EU and national legislation. This is achieved through co-operation with the agencies responsible for supporting Ireland's international obligations and for the implementation of marine spatial planning. A strategic national seabed mapping programme such as INFOMAR, provides a critical baseline dataset which underpins development of the marine economy, and improves our understanding of the response of marine systems to pressures, and the effect of cumulative impacts. This paper will focus on the evolution and scope of INFOMAR, and look at examples of outputs being harnessed to serve approaches to the management of activities having an impact on the marine environment.
Predicting protein function and other biomedical characteristics with heterogeneous ensembles
Whalen, Sean; Pandey, Om Prakash
2015-01-01
Prediction problems in biomedical sciences, including protein function prediction (PFP), are generally quite difficult. This is due in part to incomplete knowledge of the cellular phenomenon of interest, the appropriateness and data quality of the variables and measurements used for prediction, as well as a lack of consensus regarding the ideal predictor for specific problems. In such scenarios, a powerful approach to improving prediction performance is to construct heterogeneous ensemble predictors that combine the output of diverse individual predictors that capture complementary aspects of the problems and/or datasets. In this paper, we demonstrate the potential of such heterogeneous ensembles, derived from stacking and ensemble selection methods, for addressing PFP and other similar biomedical prediction problems. Deeper analysis of these results shows that the superior predictive ability of these methods, especially stacking, can be attributed to their attention to the following aspects of the ensemble learning process: (i) better balance of diversity and performance, (ii) more effective calibration of outputs and (iii) more robust incorporation of additional base predictors. Finally, to make the effective application of heterogeneous ensembles to large complex datasets (big data) feasible, we present DataSink, a distributed ensemble learning framework, and demonstrate its sound scalability using the examined datasets. DataSink is publicly available from https://github.com/shwhalen/datasink. PMID:26342255
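A small stacked heterogeneous ensemble of the kind discussed above can be assembled with scikit-learn (this illustrates the general stacking idea on synthetic data; it is not the DataSink framework):

```python
# Sketch: stack diverse base predictors and combine them with a meta-learner.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=30, random_state=0)

base = [
    ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
    ("svm", SVC(probability=True, random_state=0)),
    ("lr", LogisticRegression(max_iter=1000)),
]
stack = StackingClassifier(estimators=base, final_estimator=LogisticRegression(),
                           stack_method="predict_proba", cv=5)
print(cross_val_score(stack, X, y, cv=5, scoring="roc_auc").mean())
```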
Smith, Stephen A; Moore, Michael J; Brown, Joseph W; Yang, Ya
2015-08-05
The use of transcriptomic and genomic datasets for phylogenetic reconstruction has become increasingly common as researchers attempt to resolve recalcitrant nodes with increasing amounts of data. The large size and complexity of these datasets introduce significant phylogenetic noise and conflict into subsequent analyses. The sources of conflict may include hybridization, incomplete lineage sorting, or horizontal gene transfer, and may vary across the phylogeny. For phylogenetic analysis, this noise and conflict have been accommodated in one of several ways: by binning gene regions into subsets to isolate consistent phylogenetic signal; by using gene-tree methods for reconstruction, where conflict is presumed to be explained by incomplete lineage sorting (ILS); or through concatenation, where noise is presumed to be the dominant source of conflict. The results provided herein emphasize that analysis of individual homologous gene regions can greatly improve our understanding of the underlying conflict within these datasets. Here we examined two published transcriptomic datasets, the angiosperm group Caryophyllales and the aculeate Hymenoptera, for the presence of conflict, concordance, and gene duplications in individual homologs across the phylogeny. We found significant conflict throughout the phylogeny in both datasets and in particular along the backbone. While some nodes in each phylogeny showed patterns of conflict similar to what might be expected with ILS alone, the backbone nodes also exhibited low levels of phylogenetic signal. In addition, certain nodes, especially in the Caryophyllales, had highly elevated levels of strongly supported conflict that cannot be explained by ILS alone. This study demonstrates that phylogenetic signal is highly variable in phylogenomic data sampled across related species and poses challenges when conducting species tree analyses on large genomic and transcriptomic datasets. Further insight into the conflict and processes underlying these complex datasets is necessary to improve and develop adequate models for sequence analysis and downstream applications. To aid this effort, we developed the open source software phyparts ( https://bitbucket.org/blackrim/phyparts ), which calculates unique, conflicting, and concordant bipartitions, maps gene duplications, and outputs summary statistics such as internode certainty (ICA) scores and node-specific counts of gene duplications.
USEEIO Elementary Flows and Life Cycle Impact Assessment (LCIA) Characterization Factors
This file contains all the elementary flows (defined in ISO 14044) used in the USEEIO model. The elementary flows come from a draft master list used by USEPA, modified from the openLCA 1.4 software master list with original flows added. The characterization factors come from the openLCA 1.5.4 method pack or directly from TRACI 2.1 for the TRACI categories; for the non-TRACI categories, they are original factors used simply to sum all resource uses of a given type. This dataset is associated with the following publication: Yang, Y., W. Ingwersen, T. Hawkins, and D. Meyer. USEEIO: a New and Transparent United States Environmentally Extended Input-Output Model. JOURNAL OF CLEANER PRODUCTION. Elsevier Science Ltd, New York, NY, USA,
True beam commissioning experience at Nordland Hospital Trust, Norway
DOE Office of Scientific and Technical Information (OSTI.GOV)
Daci, Lulzime, E-mail: lulzime.daci@nodlandssykehuset.no; Malkaj, Partizan, E-mail: malkaj-p@hotmail.com
To evaluate the measured photon beam data of the first Varian TrueBeam version 2.0 slim model, recently commissioned at Nordland Hospital Trust, Bodø, and to compare and evaluate the possibility of beam matching with the Clinac 2300 for the 6 MV and 15 MV energies. Materials/Methods: Measurements of PDD, OAR, and output factors were performed with the IBA Blue Phantom using different detectors and compared for all photon energies: 6 MV, 15 MV, 6 MV FFF and 10 MV FFF. The ionization chambers used were the PinPoint CC01, CC04 and Semiflex CC13, together with a photon diode by IBA Dosimetry. The data were processed using a Bezier algorithm with a resolution of 1 mm. The measured depth dose curves, diagonals, OAR, and output factors were imported into Eclipse in order to calculate beam data for the anisotropic analytical algorithm (AAA version 10.0.28) for both the dataset measured with CC04 and that measured with CC13, and the results were compared. The Clinac 23EX head model was selected as the closest available model to the TrueBeam, a restriction of our version of Aria. Better results were achieved with the CC04-measured data as a result of its better resolution. For the largest field, beyond 10 cm depth a larger difference is seen between measured and calculated values for both datasets, but it remains within the acceptance criteria. Results: The beam analysis criterion of 2 mm at 50% dose is achieved for all fields except 40x40, which is within 3%. The depth difference at maximum dose is within 1 mm for all fields, and the dose difference at 100 mm and 200 mm is lower than 1% for all fields. The PDDs of the two machines differ by less than 1% beyond Dmax for all fields. For profiles, the difference inside and outside the field is within 1% for all fields. In the penumbra region the difference ranges from 2% up to 12% for large fields. The diagonals differ as a result of the head construction at the edge of the field and in the penumbra region. The output factors differ within 5% for large fields and within 3% for small fields. MU and dose distributions do not change for plans recalculated with the newly modeled machine.
NASA Astrophysics Data System (ADS)
Golay, Jean; Kanevski, Mikhaïl
2013-04-01
The present research deals with the exploration and modeling of a complex dataset of 200 measurement points of sediment pollution by heavy metals in Lake Geneva. The fundamental idea was to use multivariate Artificial Neural Networks (ANN) along with geostatistical models and tools in order to improve the accuracy and the interpretability of data modeling. The results obtained with ANN were compared to those of traditional geostatistical algorithms like ordinary (co)kriging and (co)kriging with an external drift. Exploratory data analysis highlighted a great variety of relationships (i.e. linear, non-linear, independence) between the 11 variables of the dataset (i.e. Cadmium, Mercury, Zinc, Copper, Titanium, Chromium, Vanadium and Nickel, as well as the spatial coordinates of the measurement points and their depth). Then, exploratory spatial data analysis (i.e. anisotropic variography, local spatial correlations and moving window statistics) was carried out. It was shown that the different phenomena to be modeled were characterized by high spatial anisotropies, complex spatial correlation structures and heteroscedasticity. A feature selection procedure based on General Regression Neural Networks (GRNN) was also applied to create subsets of variables that improve the predictions during the modeling phase. The basic modeling was conducted using a Multilayer Perceptron (MLP), a workhorse of ANNs. MLP models are robust and highly flexible tools which can incorporate different kinds of high-dimensional information in a nonlinear manner. In the present research, the input layer was made of either two neurons (spatial coordinates) or three neurons (when depth as auxiliary information could possibly capture an underlying trend), and the output layer was composed of one (univariate MLP) to eight neurons corresponding to the heavy metals of the dataset (multivariate MLP). MLP models with three input neurons can be referred to as Artificial Neural Networks with EXternal drift (ANNEX). Moreover, the exact number of output neurons and the selection of the corresponding variables were based on the subsets created during the exploratory phase. Concerning hidden layers, no restrictions were made and multiple architectures were tested. For each MLP model, the quality of the modeling procedure was assessed by variograms: if the variogram of the residuals demonstrates a pure nugget effect and if the level of the nugget exactly corresponds to the nugget value of the theoretical variogram of the corresponding variable, all the structured information has been correctly extracted without overfitting. It is also worth mentioning that simple MLP models are not always able to remove all the spatial correlation structure from the data. In that case, Neural Network Residual Kriging (NNRK) can be carried out and risk assessment can be conducted with Neural Network Residual Simulations (NNRS). Finally, the results of the ANNEX models were compared to those of ordinary (co)kriging and (co)kriging with an external drift. It was shown that the ANNEX models performed better than traditional geostatistical algorithms when the relationship between the variable of interest and the auxiliary predictor was not linear.
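A minimal sketch of the ANNEX-style setup described above is given below: an MLP regressor takes the two spatial coordinates plus depth as an external drift and predicts a single contaminant value, after which the residuals would be inspected (in the study, via their variogram). The data here are synthetic placeholders, not the Lake Geneva measurements, and scikit-learn's MLPRegressor stands in for the authors' network.

```python
# Hedged sketch of an ANNEX-style MLP: coordinates + depth in, one metal out.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n = 200
coords = rng.uniform(0, 10, size=(n, 2))       # x, y coordinates
depth = rng.uniform(0, 300, size=(n, 1))       # auxiliary drift variable
X = np.hstack([coords, depth])                  # three input neurons
y = 0.05 * depth[:, 0] + np.sin(coords[:, 0]) + rng.normal(0, 0.2, n)

mlp = make_pipeline(StandardScaler(),
                    MLPRegressor(hidden_layer_sizes=(20,), max_iter=5000,
                                 random_state=0))
mlp.fit(X, y)
residuals = y - mlp.predict(X)
# In the study, the residual variogram is inspected: a pure nugget effect at the
# theoretical nugget level indicates the structure was fully extracted.
print("residual std:", residuals.std())
```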
Webb, Samuel J; Hanser, Thierry; Howlin, Brendan; Krause, Paul; Vessey, Jonathan D
2014-03-25
A new algorithm has been developed to enable the interpretation of black box models. The developed algorithm is agnostic to the learning algorithm and open to all structure-based descriptors such as fragments, keys and hashed fingerprints. The algorithm has provided meaningful interpretation of Ames mutagenicity predictions from both random forest and support vector machine models built on a variety of structural fingerprints. A fragmentation algorithm is utilised to investigate the model's behaviour on specific substructures present in the query. An output is formulated summarising causes of activation and deactivation. The algorithm is able to identify multiple causes of activation or deactivation, in addition to identifying localised deactivations where the prediction for the query is active overall. No loss in performance is seen as there is no change in the prediction; the interpretation is produced directly from the model's behaviour for the specific query. Models have been built using multiple learning algorithms including support vector machine and random forest. The models were built on public Ames mutagenicity data and a variety of fingerprint descriptors were used. These models performed well in both internal and external validation, with accuracies around 82%. The models were used to evaluate the interpretation algorithm. The interpretation revealed links that agree closely with understood mechanisms for Ames mutagenicity. This methodology allows for greater utilisation of the predictions made by black box models and can expedite further study based on the output of a (quantitative) structure-activity model. Additionally, the algorithm could be utilised for chemical dataset investigation and knowledge extraction/human SAR development.
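The sketch below illustrates the masking idea behind such an interpretation: the fixed model is queried with the fingerprint bits of a putative substructure switched off, and the change in output is attributed to that substructure. A real application would derive the bit groups from an actual fragmentation of the molecule (e.g. with a cheminformatics toolkit); here random binary fingerprints and an arbitrary bit group stand in for that, so this is not the authors' algorithm, only the general principle.

```python
# Hedged sketch: attribute activation to a fragment by masking its bits.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(300, 128))      # hashed fingerprints (placeholder)
y = (X[:, :8].sum(axis=1) > 4).astype(int)   # toy "mutagenic" label

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

query = X[0].copy()
p_full = model.predict_proba([query])[0, 1]
fragment_bits = [0, 1, 2, 3]                 # bits set by a hypothetical fragment
masked = query.copy()
masked[fragment_bits] = 0
p_masked = model.predict_proba([masked])[0, 1]
# A large drop suggests the fragment is a cause of activation for this query.
print(f"active prob: full={p_full:.2f}, fragment removed={p_masked:.2f}")
```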
Enhancing PTFs with remotely sensed data for multi-scale soil water retention estimation
NASA Astrophysics Data System (ADS)
Jana, Raghavendra B.; Mohanty, Binayak P.
2011-03-01
Use of remotely sensed data products in the earth science and water resources fields is growing due to the increasingly easy availability of the data. Traditionally, pedotransfer functions (PTFs) employed for soil hydraulic parameter estimation from other easily available data have used basic soil texture and structure information as inputs. Inclusion of surrogate/supplementary data such as topography and vegetation information has shown some improvement in the PTFs' ability to estimate more accurate soil hydraulic parameters. Artificial neural networks (ANNs) are a popular tool for PTF development, and are usually applied across matching spatial scales of inputs and outputs. However, different hydrologic, hydro-climatic, and contaminant transport models require input data at different scales, all of which may not be easily available from existing databases. In such a scenario, it becomes necessary to scale the soil hydraulic parameter values estimated by PTFs to suit the model requirements. Also, uncertainties in the predictions need to be quantified to enable users to gauge the suitability of a particular dataset in their applications. Bayesian Neural Networks (BNNs) inherently provide uncertainty estimates for their outputs due to their utilization of Markov Chain Monte Carlo (MCMC) techniques. In this paper, we present a PTF methodology to estimate soil water retention characteristics built on a Bayesian framework for training of neural networks and utilizing several in situ and remotely sensed datasets jointly. The BNN is also applied across spatial scales to provide fine scale outputs when trained with coarse scale data. Our training data inputs include ground/remotely sensed soil texture, bulk density, elevation, and Leaf Area Index (LAI) at 1 km resolution, while similar properties measured at a point scale are used as fine scale inputs. The methodology was tested at two different hydro-climatic regions. We also tested the effect of varying the support scale of the training data for the BNNs by sequentially aggregating finer resolution training data to coarser resolutions, and the applicability of the technique to upscaling problems. The BNN outputs are corrected for bias using a non-linear CDF-matching technique. Final results show that this Bayesian Neural Network approach is well suited for soil hydraulic parameter estimation across spatial scales using ground-, air-, or space-based remotely sensed geophysical parameters. Inclusion of remotely sensed data such as elevation and LAI in addition to in situ soil physical properties improved the estimation capabilities of the BNN-based PTF in certain conditions.
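The non-linear CDF-matching correction mentioned above amounts to quantile mapping of the model outputs onto a reference distribution. The sketch below shows a generic version of that step with synthetic placeholder arrays, not the BNN outputs or observations of the study.

```python
# Minimal sketch of CDF-matching (quantile mapping) bias correction.
import numpy as np

rng = np.random.default_rng(0)
reference = rng.normal(0.30, 0.05, 1000)     # e.g. observed water contents
biased = rng.normal(0.35, 0.07, 1000)        # e.g. raw model outputs

def cdf_match(values, model_sample, ref_sample, n_q=100):
    """Map 'values' through the empirical CDFs of the model and reference samples."""
    q = np.linspace(0, 100, n_q)
    model_q = np.percentile(model_sample, q)
    ref_q = np.percentile(ref_sample, q)
    return np.interp(values, model_q, ref_q)

corrected = cdf_match(biased, biased, reference)
print("mean bias before:", biased.mean() - reference.mean(),
      "after:", corrected.mean() - reference.mean())
```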
PSO-MISMO modeling strategy for multistep-ahead time series prediction.
Bao, Yukun; Xiong, Tao; Hu, Zhongyi
2014-05-01
Multistep-ahead time series prediction is one of the most challenging research topics in the field of time series modeling and prediction, and remains under active research. Recently, the multiple-input several multiple-outputs (MISMO) modeling strategy has been proposed as a promising alternative for multistep-ahead time series prediction, exhibiting advantages compared with the two currently dominant strategies, the iterated and the direct strategies. Building on the established MISMO strategy, this paper proposes a particle swarm optimization (PSO)-based MISMO modeling strategy, which is capable of determining the number of sub-models in a self-adaptive mode, with varying prediction horizons. Rather than deriving crisp divides with equal-sized prediction horizons as in the established MISMO, the proposed PSO-MISMO strategy, implemented with neural networks, employs a heuristic to create flexible divides with varying sizes of prediction horizons and to generate the corresponding sub-models, providing considerable flexibility in model construction. The approach has been validated with simulated and real datasets.
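For orientation, the sketch below shows the basic MISMO idea of splitting an H-step horizon into blocks and fitting one multi-output model per block. The block sizes are fixed here and a random forest stands in for the neural networks; in the PSO-MISMO strategy the divides would instead be chosen adaptively by particle swarm optimization. The series and parameters are illustrative only.

```python
# Minimal sketch of the MISMO horizon decomposition for multistep prediction.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
series = np.sin(np.arange(600) * 0.1) + rng.normal(0, 0.1, 600)
lag, H = 12, 6                                   # input window and horizon

# Build supervised samples: X = last 'lag' values, Y = next H values.
X = np.array([series[i:i + lag] for i in range(len(series) - lag - H)])
Y = np.array([series[i + lag:i + lag + H] for i in range(len(series) - lag - H)])

blocks = [(0, 3), (3, 6)]                        # two sub-models of size 3
models = []
for lo, hi in blocks:
    m = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, Y[:, lo:hi])
    models.append(m)

forecast = np.hstack([m.predict(X[-1:]) for m in models])[0]
print("6-step-ahead forecast:", np.round(forecast, 3))
```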
NASA Astrophysics Data System (ADS)
Arendt, A. A.; Houser, P.; Kapnick, S. B.; Kargel, J. S.; Kirschbaum, D.; Kumar, S.; Margulis, S. A.; McDonald, K. C.; Osmanoglu, B.; Painter, T. H.; Raup, B. H.; Rupper, S.; Tsay, S. C.; Velicogna, I.
2017-12-01
The High Mountain Asia Team (HiMAT) is an assembly of 13 research groups funded by NASA to improve understanding of cryospheric and hydrological changes in High Mountain Asia (HMA). Our project goals are to quantify historical and future variability in weather and climate over the HMA, partition the components of the water budget across HMA watersheds, explore physical processes driving changes, and predict couplings and feedbacks between physical and human systems through assessment of hazards and downstream impacts. These objectives are being addressed through analysis of remote sensing datasets combined with modeling and assimilation methods to enable data integration across multiple spatial and temporal scales. Our work to date has focused on developing improved high resolution precipitation, snow cover and snow water equivalence products through a variety of statistical uncertainty analysis, dynamical downscaling and assimilation techniques. These and other high resolution climate products are being used as input and validation for an assembly of land surface and General Circulation Models. To quantify glacier change in the region we have calculated multidecadal mass balances of a subset of HMA glaciers by comparing commercial satellite imagery with earlier elevation datasets. HiMAT is using these tools and datasets to explore the impact of atmospheric aerosols and surface impurities on surface energy exchanges, to determine drivers of glacier and snowpack melt rates, and to improve our capacity to predict future hydrological variability. Outputs from the climate and land surface assessments are being combined with landslide and glacier lake inventories to refine our ability to predict hazards in the region. Economic valuation models are also being used to assess impacts on water resources and hydropower. Field data of atmospheric aerosol, radiative flux and glacier lake conditions are being collected to provide ground validation for models and remote sensing products. In this presentation we will discuss initial results and outline plans for a scheduled release of our datasets and findings to the broader community. We will also describe our methods for cross-team collaboration through the adoption of cloud computing and data integration tools.
Joint Labeling Of Multiple Regions of Interest (Rois) By Enhanced Auto Context Models.
Kim, Minjeong; Wu, Guorong; Guo, Yanrong; Shen, Dinggang
2015-04-01
Accurate segmentation of a set of regions of interest (ROIs) in brain images is a key step in many neuroscience studies. Due to the complexity of image patterns, many learning-based segmentation methods have been proposed, including the auto-context model (ACM) that can capture high-level contextual information for guiding segmentation. However, since the current ACM can only handle one ROI at a time, neighboring ROIs have to be labeled separately with different ACMs that are trained independently without communicating with each other. To address this, we enhance the current single-ROI learning ACM to a multi-ROI learning ACM for joint labeling of multiple neighboring ROIs (called eACM). First, we extend the current independently-trained single-ROI ACMs to a set of jointly-trained cross-ROI ACMs, by simultaneously training the ACMs of all spatially-connected ROIs and letting them share their respective intermediate outputs for coordinated labeling of each image point. Then, the context features in each ACM can capture cross-ROI dependence information from the outputs of the other ACMs designed for neighboring ROIs. Second, we upgrade the output labeling map of each ACM with a multi-scale representation, so that both local and global context information can be used effectively to increase robustness in characterizing the geometric relationship among neighboring ROIs. Third, we integrate the eACM into a multi-atlas segmentation paradigm to accommodate the high variation among subjects. Experiments on the LONI LPBA40 dataset show much better performance by our eACM compared to the conventional ACM.
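The sketch below shows the generic auto-context principle that eACM builds on: the probability output of one classification stage is appended to the features of the next stage, so later stages can exploit contextual information. It uses random placeholder features rather than brain MR patches, and it does not include the cross-ROI sharing or multi-scale maps of the eACM itself.

```python
# Hedged sketch of a two-stage auto-context classifier.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))                 # appearance features per voxel
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # toy ROI label

stage1 = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
context = stage1.predict_proba(X)[:, [1]]       # intermediate output map

# Stage 2 sees the appearance features plus the context feature from stage 1.
stage2 = RandomForestClassifier(n_estimators=100, random_state=0).fit(
    np.hstack([X, context]), y)
print("stage-2 training accuracy:", stage2.score(np.hstack([X, context]), y))
```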
NASA Astrophysics Data System (ADS)
Liu, L.; Du, L.; Liao, Y.
2017-12-01
Based on the ensemble hindcast dataset of CSM1.1m from NCC, CMA, Bayesian merging models and a two-step statistical model are developed and employed to predict monthly grid/station precipitation in the Huaihe River basin, China, during summer at lead times of 1 to 3 months. The hindcast datasets span the period 1991 to 2014. The skill of the two models is evaluated using the area under the ROC curve (AUC) in a leave-one-out cross-validation framework, and is compared to the skill of CSM1.1m. CSM1.1m has the highest skill for summer precipitation when initialized in April and the lowest when initialized in May, and has the highest skill for precipitation in June but the lowest for precipitation in July. Compared with the raw outputs of the climate model, some schemes of the two approaches have higher skill for predictions initialized in March and May, but almost all schemes have lower skill for predictions initialized in April. Compared to the two-step approach, one sampling scheme of the Bayesian merging approach has higher skill for predictions initialized in March, but lower skill for those initialized in May. The results suggest that there is potential to apply the two statistical models to monthly summer precipitation forecasts initialized in March and May over the Huaihe River basin, whereas the CSM1.1m forecast is preferable when initialized in April. Finally, the summer runoff during 1991 to 2014 is simulated with a hydrological model using the climate hindcasts of CSM1.1m and the two statistical models.
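A minimal sketch of the skill evaluation described above is given below: the area under the ROC curve (AUC) computed in a leave-one-out cross-validation framework. The predictor matrix and the binary target are synthetic placeholders, not the CSM1.1m hindcasts or the statistical merging schemes.

```python
# Minimal sketch of AUC under leave-one-out cross-validation.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(24, 3))                              # e.g. 24 hindcast years
y = (X[:, 0] + rng.normal(0, 0.5, 24) > 0).astype(int)    # above-normal rainfall?

probs = np.empty(len(y))
for train, test in LeaveOneOut().split(X):
    model = LogisticRegression().fit(X[train], y[train])
    probs[test] = model.predict_proba(X[test])[:, 1]

print("leave-one-out AUC:", roc_auc_score(y, probs))
```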
Ellis, Katherine; Godbole, Suneeta; Marshall, Simon; Lanckriet, Gert; Staudenmayer, John; Kerr, Jacqueline
2014-01-01
Background: Active travel is an important area in physical activity research, but objective measurement of active travel is still difficult. Automated methods to measure travel behaviors will improve research in this area. In this paper, we present a supervised machine learning method for transportation mode prediction from global positioning system (GPS) and accelerometer data. Methods: We collected a dataset of about 150 h of GPS and accelerometer data from two research assistants following a protocol of prescribed trips consisting of five activities: bicycling, riding in a vehicle, walking, sitting, and standing. We extracted 49 features from 1-min windows of this data. We compared the performance of several machine learning algorithms and chose a random forest algorithm to classify the transportation mode. We used a moving average output filter to smooth the output predictions over time. Results: The random forest algorithm achieved 89.8% cross-validated accuracy on this dataset. Adding the moving average filter to smooth output predictions increased the cross-validated accuracy to 91.9%. Conclusion: Machine learning methods are a viable approach for automating measurement of active travel, particularly for measuring travel activities that traditional accelerometer data processing methods misclassify, such as bicycling and vehicle travel. PMID:24795875
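The classification-plus-smoothing pipeline above can be sketched as follows: a random forest predicts a transportation mode for each 1-minute window, then a moving-average filter over the class probabilities smooths the label sequence. The features are random placeholders standing in for the 49 GPS/accelerometer features, and the window length is an arbitrary assumption.

```python
# Hedged sketch: per-window classification followed by moving-average smoothing.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 49))                   # placeholder window features
y = rng.integers(0, 5, size=600)                 # 5 activity classes

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
proba = clf.predict_proba(X)                     # per-window class probabilities

def moving_average(p, window=5):
    """Centred moving average of class probabilities along the time axis."""
    kernel = np.ones(window) / window
    return np.column_stack([np.convolve(p[:, k], kernel, mode="same")
                            for k in range(p.shape[1])])

smoothed_labels = moving_average(proba).argmax(axis=1)
print("labels changed by smoothing:",
      int((smoothed_labels != proba.argmax(axis=1)).sum()))
```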
Design of double fuzzy clustering-driven context neural networks.
Kim, Eun-Hu; Oh, Sung-Kwun; Pedrycz, Witold
2018-08-01
In this study, we introduce a novel category of double fuzzy clustering-driven context neural networks (DFCCNNs). The study is focused on the development of advanced design methodologies for redesigning the structure of conventional fuzzy clustering-based neural networks. Conventional fuzzy clustering-based neural networks typically focus on dividing the input space into several local spaces (implied by clusters). In contrast, the proposed DFCCNNs take into account two distinct local spaces called context and cluster spaces, respectively. Cluster space refers to a local space positioned in the input space, whereas context space concerns a local space formed in the output space. By partitioning the output space into several local spaces, each context space is used as the desired (target) local output to construct local models. To accomplish this, the proposed network includes a new context layer for reasoning about context space in the output space. In this sense, Fuzzy C-Means (FCM) clustering is used to form local spaces in both the input and output spaces: the first is used to form clusters and train the weights positioned between the input and hidden layers, whereas the second is applied to the output space to form context spaces. The key features of the proposed DFCCNNs can be enumerated as follows: (i) the parameters between the input layer and hidden layer are built through FCM clustering. The connections (weights) are specified as constant terms that are in fact the centers of the clusters. The membership functions (represented through the partition matrix) produced by the FCM are used as activation functions located at the hidden layer of the "conventional" neural networks. (ii) Following the hidden layer, a context layer is formed to approximate the context space of the output variable, and each node in the context layer corresponds to an individual local model. The outputs of the context layer are specified as a combination of weights formed as linear functions and the outputs of the hidden layer. The weights are updated using a least squares estimation (LSE)-based method. (iii) At the output layer, the outputs of the context layer are decoded to produce the corresponding numeric output. Here, a weighted average is used, and the weights are also adjusted with the use of the LSE scheme. From the viewpoint of performance improvement, the proposed design methodologies are discussed and experimented with the aid of benchmark machine learning datasets. The experiments show that the generalization abilities of the proposed DFCCNNs are better than those of the conventional FCNNs reported in the literature. Copyright © 2018 Elsevier Ltd. All rights reserved.
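Since Fuzzy C-Means clustering is the building block used above to form local spaces, a minimal numpy implementation is sketched below. It is a generic FCM on toy 2-D data, not the full DFCCNN architecture; the rows of the returned partition matrix are the membership functions that would serve as hidden-layer activations.

```python
# Minimal numpy sketch of Fuzzy C-Means (FCM) clustering.
import numpy as np

def fcm(X, c=3, m=2.0, iters=100, seed=0):
    """Return cluster centres and the fuzzy partition (membership) matrix."""
    rng = np.random.default_rng(seed)
    U = rng.random((c, len(X)))
    U /= U.sum(axis=0)                           # memberships sum to 1 per point
    for _ in range(iters):
        Um = U ** m
        centres = Um @ X / Um.sum(axis=1, keepdims=True)
        d = np.linalg.norm(X[None, :, :] - centres[:, None, :], axis=2) + 1e-12
        U = 1.0 / (d ** (2 / (m - 1)))
        U /= U.sum(axis=0)
    return centres, U

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
centres, U = fcm(X, c=2)
print("centres:\n", np.round(centres, 2))
```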
Multi-output decision trees for lesion segmentation in multiple sclerosis
NASA Astrophysics Data System (ADS)
Jog, Amod; Carass, Aaron; Pham, Dzung L.; Prince, Jerry L.
2015-03-01
Multiple Sclerosis (MS) is a disease of the central nervous system in which the protective myelin sheath of the neurons is damaged. MS leads to the formation of lesions, predominantly in the white matter of the brain and the spinal cord. The number and volume of lesions visible in magnetic resonance (MR) imaging (MRI) are important criteria for diagnosing and tracking the progression of MS. Locating and delineating lesions manually requires the tedious and expensive efforts of highly trained raters. In this paper, we propose an automated algorithm to segment lesions in MR images using multi-output decision trees. We evaluated our algorithm on the publicly available MICCAI 2008 MS Lesion Segmentation Challenge training dataset of 20 subjects, and showed improved results in comparison to state-of-the-art methods. We also evaluated our algorithm on an in-house dataset of 49 subjects, with a true positive rate of 0.41 and a positive predictive value of 0.36.
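The sketch below illustrates the multi-output decision tree idea in its simplest form: a single tree predicts several output labels at once (for example, the labels of all voxels in a small patch). scikit-learn trees support multi-output targets natively; the random features and 3x3 "patch" labels are placeholders, not MR image data or the authors' feature set.

```python
# Minimal sketch of a multi-output decision tree predicting a patch of labels.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 30))                  # placeholder per-patch features
Y = (X[:, :9] > 0).astype(int)                  # 3x3 patch of lesion/non-lesion labels

tree = DecisionTreeClassifier(max_depth=8, random_state=0).fit(X, Y)
patch_prediction = tree.predict(X[:1]).reshape(3, 3)
print(patch_prediction)
```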
NASA Technical Reports Server (NTRS)
Ott, L.; Putman, B.; Collatz, J.; Gregg, W.
2012-01-01
Column CO2 observations from current and future remote sensing missions represent a major advancement in our understanding of the carbon cycle and are expected to help constrain source and sink distributions. However, data assimilation and inversion methods are challenged by the difference in scale between models and observations. OCO-2 footprints represent an area of several square kilometers while NASA's future ASCENDS lidar mission is likely to have an even smaller footprint. In contrast, the resolution of models used in global inversions is typically hundreds of kilometers, and grid cells often cover areas that include combinations of land, ocean and coastal areas and areas of significant topographic, land cover, and population density variations. To improve understanding of the scales of atmospheric CO2 variability and the representativeness of satellite observations, we will present results from a global, 10-km simulation of meteorology and atmospheric CO2 distributions performed using NASA's GEOS-5 general circulation model. This resolution, typical of mesoscale atmospheric models, represents an order of magnitude increase in resolution over typical global simulations of atmospheric composition, allowing new insight into small scale CO2 variations across a wide range of surface flux and meteorological conditions. The simulation includes high resolution flux datasets provided by NASA's Carbon Monitoring System Flux Pilot Project at half degree resolution that have been down-scaled to 10 km using remote sensing datasets. Probability distribution functions are calculated over larger areas more typical of global models (100-400 km) to characterize subgrid-scale variability in these models. Particular emphasis is placed on coastal regions and regions containing megacities and fires to evaluate the ability of coarse resolution models to represent these small scale features. Additionally, model output is sampled using averaging kernels characteristic of the OCO-2 and ASCENDS measurement concepts to create realistic pseudo-datasets. Pseudo-data are averaged over coarse model grid cell areas to better understand the ability of measurements to characterize CO2 distributions and spatial gradients on both short (daily to weekly) and long (monthly to seasonal) time scales.
Using open source data for flood risk mapping and management in Brazil
NASA Astrophysics Data System (ADS)
Whitley, Alison; Malloy, James; Chirouze, Manuel
2013-04-01
Worldwide, the frequency and severity of major natural disasters, particularly flooding, have increased. Concurrently, countries such as Brazil are experiencing rapid socio-economic development with growing and increasingly concentrated populations, particularly in urban areas. Hence, it is unsurprising that Brazil has experienced a number of major floods in the past 30 years, such as the January 2011 floods which killed 900 people and resulted in significant economic losses of approximately 1 billion US dollars. Understanding, mitigating against and even preventing flood risk is a high priority. There is a demand for flood models in many developing economies worldwide for a range of uses including risk management, emergency planning and provision of insurance solutions. However, developing them can be expensive. With an increasing supply of freely-available, open source data, the costs can be significantly reduced, making the tools required for natural hazard risk assessment more accessible. By presenting a flood model developed for eight urban areas of Brazil as part of a collaboration between JBA Risk Management and Guy Carpenter, we explore the value of open source data and demonstrate its usability in a business context within the insurance industry. We begin by detailing the open source data available and compare its suitability to commercially-available equivalents for datasets including digital terrain models and river gauge records. We present flood simulation outputs in order to demonstrate the impact of the choice of dataset on the results obtained and their use in a business context. Via use of the 2D hydraulic model JFlow+, our examples also show how advanced modelling techniques can be used on relatively crude datasets to obtain robust and good quality results. In combination with accessible, standard specification GPU technology and open source data, use of JFlow+ has enabled us to produce large-scale hazard maps suitable for business use and emergency planning, such as those we show for Brazil.
One-way coupling of an atmospheric and a hydrologic model in Colorado
Hay, L.E.; Clark, M.P.; Pagowski, M.; Leavesley, G.H.; Gutowski, W.J.
2006-01-01
This paper examines the accuracy of high-resolution nested mesoscale model simulations of surface climate. The nesting capabilities of the atmospheric fifth-generation Pennsylvania State University (PSU)-National Center for Atmospheric Research (NCAR) Mesoscale Model (MM5) were used to create high-resolution, 5-yr climate simulations (from 1 October 1994 through 30 September 1999), starting with a coarse nest of 20 km for the western United States. During this 5-yr period, two finer-resolution nests (5 and 1.7 km) were run over the Yampa River basin in northwestern Colorado. Raw and bias-corrected daily precipitation and maximum and minimum temperature time series from the three MM5 nests were used as input to the U.S. Geological Survey's distributed hydrologic model [the Precipitation Runoff Modeling System (PRMS)] and were compared with PRMS results using measured climate station data. The distributed capabilities of PRMS were provided by partitioning the Yampa River basin into hydrologic response units (HRUs). In addition to the classic polygon method of HRU definition, HRUs for PRMS were defined based on the three MM5 nests. This resulted in 16 datasets being tested using PRMS. The input datasets were derived using measured station data and raw and bias-corrected MM5 20-, 5-, and 1.7-km output distributed to 1) polygon HRUs and 2) 20-, 5-, and 1.7-km-gridded HRUs, respectively. Each dataset was calibrated independently, using a multiobjective, stepwise automated procedure. Final results showed a general increase in the accuracy of simulated runoff with an increase in HRU resolution. In all steps of the calibration procedure, the station-based simulations of runoff showed higher accuracy than the MM5-based simulations, although the accuracy of MM5 simulations was close to station data for the high-resolution nests. Further work is warranted in identifying the causes of the biases in MM5 local climate simulations and developing methods to remove them. © 2006 American Meteorological Society.
Kohlmayer, Florian; Prasser, Fabian; Kuhn, Klaus A
2015-12-01
With the ARX data anonymization tool, structured biomedical data can be de-identified using syntactic privacy models, such as k-anonymity. Data is transformed with two methods: (a) generalization of attribute values, followed by (b) suppression of data records. The former method results in data that is well suited for analyses by epidemiologists, while the latter method significantly reduces loss of information. Our tool uses an optimal anonymization algorithm that maximizes output utility according to a given measure. To achieve scalability, existing optimal anonymization algorithms exclude parts of the search space by predicting the outcome of data transformations regarding privacy and utility without explicitly applying them to the input dataset. These optimizations cannot be used if data is transformed with generalization and suppression. As optimal data utility and scalability are important for anonymizing biomedical data, we had to develop a novel method. In this article, we first confirm experimentally that combining generalization with suppression significantly increases data utility. Next, we prove that, within this coding model, the outcome of data transformations regarding privacy and utility cannot be predicted. As a consequence, existing algorithms fail to deliver optimal data utility. We confirm this finding experimentally. The limitation of previous work can be overcome at the cost of increased computational complexity. However, scalability is important for anonymizing data with user feedback. Consequently, we identify properties of datasets that may be predicted in our context and propose a novel and efficient algorithm. Finally, we evaluate our solution with multiple datasets and privacy models. This work presents the first thorough investigation of which properties of datasets can be predicted when data is anonymized with generalization and suppression. Our novel approach adapts existing optimization strategies to our context and combines different search methods. The experiments show that our method is able to efficiently solve a broad spectrum of anonymization problems. Our work shows that implementing syntactic privacy models is challenging and that existing algorithms are not well suited for anonymizing data with transformation models which are more complex than generalization alone. As such models have been recommended for use in the biomedical domain, our results are of general relevance for de-identifying structured biomedical data. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
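The two transformation methods discussed above can be sketched on a toy table as follows: quasi-identifiers are first generalized (age to decades, ZIP to a prefix), then records whose equivalence class is still smaller than k are suppressed. This illustrates the coding model only; ARX itself searches over generalization schemes for an optimal solution, and the table, column names and k value here are assumptions.

```python
# Hedged sketch of k-anonymity via generalization followed by record suppression.
import pandas as pd

df = pd.DataFrame({
    "age":  [34, 37, 52, 53, 29, 31],
    "zip":  ["81675", "81677", "81925", "81931", "53111", "81679"],
    "diag": ["A", "B", "A", "C", "B", "A"],
})

# (a) Generalization of the quasi-identifiers.
gen = df.assign(age=(df.age // 10) * 10, zip=df.zip.str[:3] + "**")

# (b) Suppression of records in equivalence classes smaller than k.
k = 2
sizes = gen.groupby(["age", "zip"])["diag"].transform("size")
anonymized = gen[sizes >= k]
print(anonymized)
```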
Identifying human influences on atmospheric temperature
Santer, Benjamin D.; Painter, Jeffrey F.; Mears, Carl A.; Doutriaux, Charles; Caldwell, Peter; Arblaster, Julie M.; Cameron-Smith, Philip J.; Gillett, Nathan P.; Gleckler, Peter J.; Lanzante, John; Perlwitz, Judith; Solomon, Susan; Stott, Peter A.; Taylor, Karl E.; Terray, Laurent; Thorne, Peter W.; Wehner, Michael F.; Wentz, Frank J.; Wigley, Tom M. L.; Wilcox, Laura J.; Zou, Cheng-Zhi
2013-01-01
We perform a multimodel detection and attribution study with climate model simulation output and satellite-based measurements of tropospheric and stratospheric temperature change. We use simulation output from 20 climate models participating in phase 5 of the Coupled Model Intercomparison Project. This multimodel archive provides estimates of the signal pattern in response to combined anthropogenic and natural external forcing (the fingerprint) and the noise of internally generated variability. Using these estimates, we calculate signal-to-noise (S/N) ratios to quantify the strength of the fingerprint in the observations relative to fingerprint strength in natural climate noise. For changes in lower stratospheric temperature between 1979 and 2011, S/N ratios vary from 26 to 36, depending on the choice of observational dataset. In the lower troposphere, the fingerprint strength in observations is smaller, but S/N ratios are still significant at the 1% level or better, and range from three to eight. We find no evidence that these ratios are spuriously inflated by model variability errors. After removing all global mean signals, model fingerprints remain identifiable in 70% of the tests involving tropospheric temperature changes. Despite such agreement in the large-scale features of model and observed geographical patterns of atmospheric temperature change, most models do not replicate the size of the observed changes. On average, the models analyzed underestimate the observed cooling of the lower stratosphere and overestimate the warming of the troposphere. Although the precise causes of such differences are unclear, model biases in lower stratospheric temperature trends are likely to be reduced by more realistic treatment of stratospheric ozone depletion and volcanic aerosol forcing. PMID:23197824
Sediment fingerprinting experiments to test the sensitivity of multivariate mixing models
NASA Astrophysics Data System (ADS)
Gaspar, Leticia; Blake, Will; Smith, Hugh; Navas, Ana
2014-05-01
Sediment fingerprinting techniques provide insight into the dynamics of sediment transfer processes and support catchment management decisions. As the questions being asked of fingerprinting datasets become increasingly complex, validation of model output and sensitivity tests are increasingly important. This study adopts an experimental approach to explore the validity and sensitivity of mixing model outputs for materials with contrasting geochemical and particle size composition. The experiments reported here focused on (i) the sensitivity of model output to different fingerprint selection procedures and (ii) the influence of source material particle size distributions on model output. Five soils with significantly different geochemistry, soil organic matter and particle size distributions were selected as experimental source materials. A total of twelve sediment mixtures were prepared in the laboratory by combining different quantified proportions of the < 63 µm fraction of the five source soils, i.e. assuming no fluvial sorting of the mixture. The geochemistry of all source and mixture samples (5 source soils and 12 mixed soils) was analysed using X-ray fluorescence (XRF). Tracer properties were selected from 18 elements for which mass concentrations were found to be significantly different between sources. Sets of fingerprint properties that discriminate target sources were selected using a range of different independent statistical approaches (e.g. Kruskal-Wallis test, Discriminant Function Analysis (DFA), Principal Component Analysis (PCA), or correlation matrix). Summary results for the use of the mixing model with the different sets of fingerprint properties for the twelve mixed soils were reasonably consistent with the known initial mixing percentages. Given the experimental nature of the work and the dry mixing of materials, conservative geochemical behavior was assumed for all elements, even for those that might be disregarded in aquatic systems (e.g. P). In general, the best fits between actual and modeled proportions were found using a set of nine tracer properties (Sr, Rb, Fe, Ti, Ca, Al, P, Si, K) derived using DFA coupled with a multivariate stepwise algorithm, with errors between real and estimated values that did not exceed 6.7% and goodness-of-fit (GOF) values above 94.5%. The second set of experiments aimed to explore the sensitivity of model output to variability in the particle size of source materials, assuming that a degree of fluvial sorting of the resulting mixture took place. Most particle size correction procedures assume grain size effects are consistent across sources and tracer properties, which is not always the case. Consequently, the < 40 µm fraction of selected soil mixtures was analysed to simulate the effect of selective fluvial transport of finer particles, and the results were compared to those for the source materials. Preliminary findings from this experiment demonstrate the sensitivity of the numerical mixing model outputs to different particle size distributions of source material and the variable impact of fluvial sorting on the end member signatures used in mixing models. The results suggest that particle size correction procedures require careful scrutiny in the context of variable source characteristics.
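A minimal sketch of the multivariate mixing model being tested is shown below: source proportions are estimated by minimizing the misfit between the tracer concentrations of the mixture and a weighted combination of the source signatures, subject to the proportions being non-negative and summing to one. The concentrations are synthetic placeholders, not the XRF measurements of the study.

```python
# Hedged sketch of un-mixing source proportions with constrained least squares.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n_sources, n_tracers = 5, 9
sources = rng.uniform(10, 100, size=(n_sources, n_tracers))   # source signatures
true_p = np.array([0.4, 0.3, 0.15, 0.1, 0.05])
mixture = true_p @ sources + rng.normal(0, 1, n_tracers)      # measured mixture

def objective(p):
    return np.sum((p @ sources - mixture) ** 2)

result = minimize(objective, x0=np.full(n_sources, 1 / n_sources),
                  bounds=[(0, 1)] * n_sources,
                  constraints=[{"type": "eq", "fun": lambda p: p.sum() - 1}])
print("true  :", true_p)
print("fitted:", np.round(result.x, 3))
```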
Impact of Climate Change on Water Resources in the Guadalquivir River Basin
NASA Astrophysics Data System (ADS)
Yeste Donaire, P.; García-Valdecasas-Ojeda, M.; Góngora García, T. M.; Gámiz-Fortis, S. R.; Castro-Diez, Y.; Esteban-Parra, M. J.
2017-12-01
Climate change has led to a decrease of precipitation and an increase of temperature in the Mediterranean Basin during the last fifty years. These changes will become more intense over the course of the 21st century according to global climate projections. As a consequence, water resources are expected to decrease, particularly in the Guadalquivir River Basin. This study focuses on the hydrological response of the Guadalquivir River Basin to climate change. To this end, firstly, the Variable Infiltration Capacity (VIC) model was implemented in the basin. The VIC model was calibrated with a dataset of daily precipitation, temperature and streamflow for the period 1990-2000. Precipitation and temperature data were extracted from SPAIN02, a dataset that covers peninsular Spain at 0.11° spatial resolution. Streamflow data were gathered for a representative subset of gauging stations in the basin. These data were provided by the Spanish Center for Public Work Experimentation and Study (CEDEX). Subsequently, the VIC model was validated for the period 2000-2005 in order to verify that the model outputs fit the observational data well. After the validation of the VIC model for the present climate, secondly, the effect of climate change on the Guadalquivir River Basin will be analyzed by developing several streamflow simulations for the future climate. Precipitation and temperature data will in this case be obtained from future projections coming from high resolution (0.088°) simulations carried out with the Weather Research and Forecasting (WRF) model for the Iberian Peninsula. These simulations will be driven under two different Representative Concentration Pathway (RCP) scenarios, RCP 4.5 and RCP 8.5, for the periods 2021-2050 and 2071-2100. The first results of this work show that the VIC model outputs are in good agreement with the observed streamflow for both the calibration and validation periods. In the context of climate change, a generalized decrease in surface and subsurface water resources is expected in the Guadalquivir River Basin. All these results will be of interest to water policy makers and practitioners in the coming decades. ACKNOWLEDGEMENTS: This work has been financed by the projects P11-RNM-7941 (Junta de Andalucía) and CGL2013-48539-R (MINECO-Spain, FEDER).
NASA Astrophysics Data System (ADS)
Perna, L.; Pezzopane, M.; Pietrella, M.; Zolesi, B.; Cander, L. R.
2017-09-01
The SIRM model proposed by Zolesi et al. (1993, 1996) is an ionospheric regional model for predicting the vertical-sounding characteristics that has been frequently used in developing ionospheric web prediction services (Zolesi and Cander, 2014). Recently the model and its outputs were implemented in the framework of two European projects: DIAS (DIgital upper Atmosphere Server; http://www.iono.noa.gr/DIAS/) (Belehaki et al., 2005, 2015) and ESPAS (Near-Earth Space Data Infrastructure for e-Science; http://www.espas-fp7.eu/) (Belehaki et al., 2016). In this paper an updated version of the SIRM model, called SIRMPol, is described and the corresponding outputs in terms of the F2-layer critical frequency (foF2) are compared with values recorded at the mid-latitude station of Rome (41.8°N, 12.5°E), for extremely high (year 1958) and low (years 2008 and 2009) solar activity. The main novelties introduced in the SIRMPol model are: (1) an extension of the Rome ionosonde input dataset that, besides data from 1957 to 1987, also includes data from 1988 to 2007; (2) the use of second-order polynomial regressions, instead of linear ones, to fit the relation foF2 vs. the solar activity index R12; (3) the use of polynomial relations, instead of linear ones, to fit the relations A0 vs. R12, An vs. R12 and Yn vs. R12, where A0, An and Yn are the coefficients of the Fourier analysis performed by the SIRM model to reproduce the values calculated by using the relations in (2). The obtained results show that SIRMPol outputs are better than those of the SIRM model. As the SIRMPol model represents only a partial updating of the SIRM model, based on inputs from only Rome ionosonde data, it can be considered a particular case of a single-station model. Nevertheless, the development of the SIRMPol model provided some useful guidelines for a future complete and more accurate updating of the SIRM model, from which both DIAS and ESPAS could benefit.
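The second-order polynomial regression of foF2 against R12 used by SIRMPol in place of a linear fit can be sketched as below. The data pairs are synthetic placeholders, not the Rome ionosonde records, and the comparison simply contrasts the residual error of the two fits.

```python
# Minimal sketch: linear vs. second-order polynomial fit of foF2 against R12.
import numpy as np

rng = np.random.default_rng(0)
R12 = rng.uniform(0, 200, 300)                   # smoothed sunspot number
foF2 = 4.0 + 0.05 * R12 - 8e-5 * R12**2 + rng.normal(0, 0.3, 300)  # MHz (toy)

linear = np.polynomial.Polynomial.fit(R12, foF2, 1)
quadratic = np.polynomial.Polynomial.fit(R12, foF2, 2)

for name, model in [("linear", linear), ("quadratic", quadratic)]:
    rmse = np.sqrt(np.mean((model(R12) - foF2) ** 2))
    print(f"{name} fit RMSE: {rmse:.3f} MHz")
```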
A Global Landslide Nowcasting System using Remotely Sensed Information
NASA Astrophysics Data System (ADS)
Kirschbaum, Dalia; Stanely, Thomas
2017-04-01
A global Landslide Hazard Assessment model for Situational Awareness (LHASA) has been developed that combines susceptibility information with satellite-based precipitation to provide an indication of potential landslide activity at the global scale every 30 minutes. This model utilizes a 1-km global susceptibility map derived from information on slope, geology, road networks, fault zones, and forest loss. A multi-satellite dataset from the Global Precipitation Measurement (GPM) mission is used to identify the current and antecedent rainfall conditions from the past 7 days. When both rainfall and susceptibility are high, a "nowcast" is issued to indicate areas where a landslide may be likely. The global LHASA model is currently being run in near real-time every 30 minutes and the outputs are available in several different formats at https://pmm.nasa.gov/precip-apps. This talk outlines the LHASA system, discusses the performance metrics and potential applications of the LHASA system.
A Stochastic Climate Generator for Agriculture in Southeast Asian Domains
NASA Astrophysics Data System (ADS)
Greene, A. M.; Allis, E. C.
2014-12-01
We extend a previously-described method for generating future climate scenarios, suitable for driving agricultural models, to selected domains in Lao PDR, Bangladesh and Indonesia. There are notable differences in climatology among the study regions, most importantly the inverse seasonal relationship of southeast Asian and Australian monsoons. These differences necessitate a partially-differentiated modeling approach, utilizing common features for better estimation while allowing independent modeling of divergent attributes. The method attempts to constrain uncertainty due to both anthropogenic and natural influences, providing a measure of how these effects may combine during specified future decades. Seasonal climate fields are downscaled to the daily time step by resampling the AgMERRA dataset, providing a full suite of agriculturally relevant variables and enabling the propagation of climate uncertainty to agricultural outputs. The role of this research in a broader project, conducted under the auspices of the International Fund for Agricultural Development (IFAD), is discussed.
NASA Astrophysics Data System (ADS)
Tellman, B.; Sullivan, J.; Kettner, A.; Brakenridge, G. R.; Slayback, D. A.; Kuhn, C.; Doyle, C.
2016-12-01
There is an increasing need to understand flood vulnerability as the societal and economic effects of flooding increase. Risk models from insurance companies and flood models from hydrologists must be calibrated based on flood observations in order to make future predictions that can improve planning and help societies reduce future disasters. Specifically, to improve these models, both traditional methods of flood prediction from physically based models and data-driven techniques, such as machine learning, require spatial flood observations to validate model outputs and quantify uncertainty. A key dataset that is missing for flood model validation is a global historical geo-database of flood event extents. Currently, the most advanced database of historical flood extents is hosted and maintained at the Dartmouth Flood Observatory (DFO), which has catalogued 4320 floods (1985-2015) but has only mapped 5% of these floods. We are addressing this data gap by mapping the inventory of floods in the DFO database to create a first-of-its-kind, comprehensive, global and historical geospatial database of flood events. To do so, we combine water detection algorithms on MODIS and Landsat 5, 7 and 8 imagery in Google Earth Engine to map discrete flood events. The resulting database will be available in the Earth Engine Catalogue for download by country, region, or time period. This dataset can be leveraged for new data-driven hydrologic modeling using machine learning algorithms in Earth Engine's highly parallelized computing environment, and we will show examples for New York and Senegal.
Climatology of convective showers dynamics in a convection-permitting model
NASA Astrophysics Data System (ADS)
Brisson, Erwan; Brendel, Christoph; Ahrens, Bodo
2017-04-01
Convection-permitting simulations have proven their usefulness in improving both the representation of convective rain and the uncertainty range of climate projections. However, most studies have focused on temporal scales greater than or equal to the convective cell lifetime. A large knowledge gap remains regarding the model's performance in representing the temporal dynamics of convective showers and how these dynamics could be altered in a warmer climate. In this study, we propose to fill this gap by analyzing 5-minute convection-permitting model (CPM) outputs. In total, more than 1200 one-day cases are simulated at a resolution of 0.01° using the regional climate model COSMO-CLM over central Europe. The analysis follows a Lagrangian approach and consists of tracking showers characterized by five-minute intensities greater than 20 mm/hour. The different features of these showers (e.g., temporal evolution, horizontal speed, lifetime) are investigated. These features, as modeled by an ERA-Interim-forced simulation, are evaluated using a radar dataset for the period 2004-2010. The model shows good performance in representing most features observed in the radar dataset. In addition, the observed relation between the temporal evolution of precipitation and temperature is well reproduced by the CPM. In a second modeling experiment, the impact of climate change on convective cell features is analyzed based on an EC-Earth-forced RCP8.5 simulation for the period 2071-2100. First results show only minor changes in the temporal structure and size of showers. The increase in convective precipitation found in previous studies therefore seems to be mainly due to an increase in the number of convective cells.
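One way to picture the Lagrangian tracking step described above is sketched below: grid cells where the 5-minute rain rate exceeds 20 mm/h are grouped into showers with connected-component labelling, and showers in consecutive time steps are matched by spatial overlap. The rain fields are synthetic placeholders, not COSMO-CLM output, and the overlap matching is only one plausible tracking rule, not necessarily the one used in the study.

```python
# Hedged sketch: threshold, label connected shower objects, match by overlap.
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
y, x = np.mgrid[0:100, 0:100]

def blob(cx, cy, amp=40.0, width=5.0):
    """Synthetic shower: a Gaussian rain-rate maximum (mm/h)."""
    return amp * np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * width ** 2))

rain_t0 = blob(20, 30) + blob(70, 60) + rng.gamma(0.5, 2, (100, 100))
rain_t1 = np.roll(rain_t0, shift=3, axis=1)          # showers advected eastwards

def label_showers(field, threshold=20.0):
    labels, n = ndimage.label(field > threshold)
    return labels, n

lab0, n0 = label_showers(rain_t0)
lab1, n1 = label_showers(rain_t1)
# Match showers by spatial overlap between the two 5-minute time steps.
overlaps = {(a, b) for a, b in zip(lab0.ravel(), lab1.ravel()) if a and b}
print(f"{n0} showers at t0, {n1} at t1, {len(overlaps)} overlap-based matches")
```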
NASA Astrophysics Data System (ADS)
Williams, C. J. R.; Kniveton, D. R.; Layberry, R.
2009-04-01
It is increasingly accepted that any possible climate change will not only have an influence on mean climate but may also significantly alter climatic variability. A change in the distribution and magnitude of extreme rainfall events (associated with changing variability), such as droughts or flooding, may have a far greater impact on human and natural systems than a changing mean. This issue is of particular importance for environmentally vulnerable regions such as southern Africa. The subcontinent is considered especially vulnerable to, and ill-equipped (in terms of adaptation) for, extreme events, due to a number of factors including extensive poverty, famine, disease and political instability. Rainfall variability and the identification of rainfall extremes are a function of scale, so high spatial and temporal resolution data are preferred to identify extreme events and accurately predict future variability. The majority of previous climate model verification studies have compared model output with observational data at monthly timescales. In this research, the assessment of the ability of a state-of-the-art climate model to simulate climate at daily timescales is carried out using satellite-derived rainfall data from the Microwave Infra-Red Algorithm (MIRA). This dataset covers the period 1993-2002 and the whole of southern Africa at a spatial resolution of 0.1 degree longitude/latitude. The ability of a climate model to simulate current climate provides some indication of how much confidence can be placed in its future predictions. In this paper, simulations of current climate from the UK Meteorological Office Hadley Centre's climate model, in both regional and global mode, are firstly compared to the MIRA dataset at daily timescales. This concentrates primarily on the ability of the model to simulate the spatial and temporal patterns of rainfall variability over southern Africa. Secondly, the ability of the model to reproduce daily rainfall extremes will be assessed, again by comparison with extremes from the MIRA dataset. The paper will conclude by discussing the user needs of satellite rainfall retrievals from a climate change modelling perspective.
Assessing microscope image focus quality with deep learning.
Yang, Samuel J; Berndl, Marc; Michael Ando, D; Barch, Mariya; Narayanaswamy, Arunachalam; Christiansen, Eric; Hoyer, Stephan; Roat, Chris; Hung, Jane; Rueden, Curtis T; Shankar, Asim; Finkbeiner, Steven; Nelson, Philip
2018-03-15
Large image datasets acquired on automated microscopes typically have some fraction of low quality, out-of-focus images, despite the use of hardware autofocus systems. Identification of these images using automated image analysis with high accuracy is important for obtaining a clean, unbiased image dataset. Complicating this task is the fact that image focus quality is only well-defined in foreground regions of images, and as a result, most previous approaches only enable a computation of the relative difference in quality between two or more images, rather than an absolute measure of quality. We present a deep neural network model capable of predicting an absolute measure of image focus on a single image in isolation, without any user-specified parameters. The model operates at the image-patch level, and also outputs a measure of prediction certainty, enabling interpretable predictions. The model was trained on only 384 in-focus Hoechst (nuclei) stain images of U2OS cells, which were synthetically defocused to one of 11 absolute defocus levels during training. The trained model can generalize to previously unseen real Hoechst stain images, identifying the absolute image focus to within one defocus level (approximately 3 pixel blur diameter difference) with 95% accuracy. On a simpler binary in/out-of-focus classification task, the trained model outperforms previous approaches on both Hoechst and Phalloidin (actin) stain images (F-scores of 0.89 and 0.86, respectively, versus 0.84 and 0.83), despite only having been presented with Hoechst stain images during training. Lastly, we observe qualitatively that the model generalizes to two additional stains, Hoechst and Tubulin, of an unseen cell type (Human MCF-7) acquired on a different instrument. Our deep neural network enables classification of out-of-focus microscope images with both higher accuracy and greater precision than previous approaches via interpretable patch-level focus and certainty predictions. The use of synthetically defocused images precludes the need for a manually annotated training dataset. The model also generalizes to different image and cell types. The framework for model training and image prediction is available as a free software library, and the pre-trained model is available for immediate use in Fiji (ImageJ) and CellProfiler.
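The synthetic-defocus idea used above to build the training set can be sketched as follows: an in-focus image is blurred to one of several discrete defocus levels, and those levels become absolute focus labels. A random array stands in for a Hoechst stain micrograph, and Gaussian blur stands in for the true defocus kernel, so this is an assumption-laden illustration rather than the authors' data generation pipeline.

```python
# Hedged sketch: generate labelled training pairs by synthetic defocusing.
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)
in_focus = rng.random((256, 256))                # placeholder nuclei image

n_levels = 11
defocused = [gaussian_filter(in_focus, sigma=level * 0.5) for level in range(n_levels)]
labels = list(range(n_levels))                   # 0 = in focus ... 10 = most defocused
print("training pairs:", len(defocused), "images with labels", labels)
```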
Diouf, Ibrahima; Rodriguez-Fonseca, Belen; Deme, Abdoulaye; Caminade, Cyril; Morse, Andrew P; Cisse, Moustapha; Sy, Ibrahima; Dia, Ibrahima; Ermert, Volker; Ndione, Jacques-André; Gaye, Amadou Thierno
2017-09-25
The analysis of the spatial and temporal variability of climate parameters is crucial for studying the impact of climate-sensitive vector-borne diseases such as malaria. The use of malaria models is an alternative way of producing potential malaria historical data for Senegal due to the lack of reliable observations for malaria outbreaks over a long time period. Consequently, here we use the Liverpool Malaria Model (LMM), driven by different climatic datasets, in order to study and validate simulated malaria parameters over Senegal. The findings confirm that the risk of malaria transmission is mainly linked to climate variables such as rainfall and temperature as well as specific landscape characteristics. For the whole of Senegal, a lag of two months is generally observed between the peak of rainfall in August and the maximum number of reported malaria cases in October. The malaria transmission season usually takes place from September to November, corresponding to the second peak of temperature occurring in October. Observed malaria data from the Programme National de Lutte contre le Paludisme (PNLP, National Malaria Control Programme in Senegal) and outputs from the meteorological data used in this study were compared. The malaria model outputs present some consistencies with observed malaria dynamics over Senegal, and further allow the exploration of simulations performed with reanalysis data sets over a longer time period. The simulated malaria risk significantly decreased during the 1970s and 1980s over Senegal. This result is consistent with the observed decrease of malaria vectors and malaria cases reported by field entomologists and clinicians in the literature. The main differences between model outputs and observations concern amplitude, and can be related not only to reanalysis deficiencies but also to other environmental and socio-economic factors that are not included in this mechanistic malaria model framework. The present study can be considered a validation of the reliability of reanalyses as inputs for the calculation of malaria parameters in the Sahel using dynamical malaria models.
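A minimal sketch of how the rainfall-to-cases lag described above could be estimated from monthly series via lagged correlation; the series here are synthetic placeholders, not PNLP or reanalysis data.

```python
import numpy as np

def best_lag(rainfall, cases, max_lag=6):
    """Return the lag (in months) at which rainfall correlates most strongly
    with reported malaria cases; a positive lag means cases follow rainfall."""
    best, best_r = 0, -np.inf
    for lag in range(max_lag + 1):
        r = np.corrcoef(rainfall[:len(rainfall) - lag], cases[lag:])[0, 1]
        if r > best_r:
            best, best_r = lag, r
    return best, best_r

# Synthetic monthly series: cases peak two months after the rainfall peak.
months = np.arange(120)
rainfall = np.maximum(0, np.sin(2 * np.pi * (months - 7) / 12)) + 0.1 * np.random.rand(120)
cases = np.roll(rainfall, 2) + 0.1 * np.random.rand(120)

print(best_lag(rainfall, cases))   # expected lag of about 2 months
```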
NASA Astrophysics Data System (ADS)
Beriro, D. J.; Abrahart, R. J.; Nathanail, C. P.
2012-04-01
Data-driven modelling is most commonly used to develop predictive models that will simulate natural processes. This paper, in contrast, uses Gene Expression Programming (GEP) to construct two alternative models of different pan evaporation estimations by means of symbolic regression: a simulator, a model of a real-world process developed on observed records, and an emulator, an imitator of some other model developed on predicted outputs calculated by that source model. The solutions are compared and contrasted for the purposes of determining whether any substantial differences exist between the two options. This analysis will address recent arguments over the impact of using downloaded hydrological modelling datasets originating from different initial sources, i.e. observed or calculated. These differences can easily be overlooked by modellers, resulting in a model of a model developed on estimations derived from deterministic empirical equations and producing exceptionally high goodness-of-fit. This paper uses different lines-of-evidence to evaluate model output and in so doing paves the way for a new protocol in machine learning applications. Transparent modelling tools such as symbolic regression offer huge potential for explaining stochastic processes; however, the basic tenets of data quality and recourse to first principles with regard to problem understanding should not be trivialised. GEP is found to be an effective tool for the prediction of observed and calculated pan evaporation, with results supported by an understanding of the records, and of the natural processes concerned, evaluated using one-at-a-time response function sensitivity analysis. The results show that both architectures and response functions are very similar, implying that previously observed differences in goodness-of-fit can be explained by whether models are applied to observed or calculated data.
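A minimal sketch of one-at-a-time response-function sensitivity analysis of the kind mentioned above, applied to a placeholder evaporation-style function rather than the evolved GEP expressions from the paper.

```python
import numpy as np

def pan_evap(T, RH, wind):
    """Placeholder response function standing in for an evolved GEP model."""
    return 0.05 * T * (1.0 - RH) * (1.0 + 0.2 * wind)

def oat_sensitivity(model, base, spans, n=21):
    """One-at-a-time analysis: sweep each input over its range while holding
    the others at their base values, and report the output range it induces."""
    out = {}
    for name, (lo, hi) in spans.items():
        values = []
        for x in np.linspace(lo, hi, n):
            args = dict(base, **{name: x})
            values.append(model(**args))
        out[name] = max(values) - min(values)
    return out

base = {"T": 20.0, "RH": 0.5, "wind": 2.0}
spans = {"T": (5, 35), "RH": (0.2, 0.9), "wind": (0, 8)}
print(oat_sensitivity(pan_evap, base, spans))
```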
Leaf Area Index Estimation Using Chinese GF-1 Wide Field View Data in an Agriculture Region.
Wei, Xiangqin; Gu, Xingfa; Meng, Qingyan; Yu, Tao; Zhou, Xiang; Wei, Zheng; Jia, Kun; Wang, Chunmei
2017-07-08
Leaf area index (LAI) is an important vegetation parameter that characterizes leaf density and canopy structure, and plays an important role in global change studies, land surface process simulation and agriculture monitoring. The wide field view (WFV) sensor on board the Chinese GF-1 satellite can acquire multi-spectral data with decametric spatial resolution, high temporal resolution and wide coverage, which are valuable data sources for dynamic monitoring of LAI. Therefore, an automatic LAI estimation algorithm for GF-1 WFV data was developed based on the radiative transfer model, and the LAI estimation accuracy of the developed algorithm was assessed in an agriculture region with maize as the dominant crop type. The radiative transfer model was first used to simulate the physical relationship between canopy reflectance and LAI under different soil and vegetation conditions, and then the training sample dataset was formed. Then, neural networks (NNs) were used to develop the LAI estimation algorithm using the training sample dataset. Green, red and near-infrared band reflectances of GF-1 WFV data were used as the input variables of the NNs, and the corresponding LAI was the output variable. The validation results using field LAI measurements in the agriculture region indicated that the LAI estimation algorithm could achieve satisfactory results (R² = 0.818, RMSE = 0.50). In addition, the developed LAI estimation algorithm had the potential to operationally generate LAI datasets using GF-1 WFV land surface reflectance data, which could provide high spatial and temporal resolution LAI data for agriculture, ecosystem and environmental management research.
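A hedged sketch of the NN inversion step described above: train a small network on reflectances simulated for known LAI values, then invert observed reflectances. The toy reflectance generator stands in for the actual radiative transfer simulation, which is an assumption.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

def toy_canopy_reflectance(lai):
    """Toy stand-in for the radiative transfer simulation: green, red and
    NIR reflectance as simple saturating functions of LAI plus noise."""
    green = 0.10 + 0.02 * np.exp(-0.5 * lai)
    red = 0.08 * np.exp(-0.6 * lai) + 0.02
    nir = 0.45 * (1 - np.exp(-0.5 * lai)) + 0.05
    return np.array([green, red, nir]) + rng.normal(0, 0.005, 3)

# Build the training sample dataset from simulated (reflectance, LAI) pairs.
lai_train = rng.uniform(0, 7, 5000)
X_train = np.array([toy_canopy_reflectance(l) for l in lai_train])

model = MLPRegressor(hidden_layer_sizes=(16, 16), max_iter=2000, random_state=0)
model.fit(X_train, lai_train)

# Invert a new observation (e.g., green/red/NIR surface reflectance).
obs = toy_canopy_reflectance(3.0)
print("estimated LAI:", model.predict(obs.reshape(1, -1))[0])
```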
DOMstudio: an integrated workflow for Digital Outcrop Model reconstruction and interpretation
NASA Astrophysics Data System (ADS)
Bistacchi, Andrea
2015-04-01
Different Remote Sensing technologies, including photogrammetry and LIDAR, allow collecting 3D datasets that can be used to create 3D digital representations of outcrop surfaces, called Digital Outcrop Models (DOM), or sometimes Virtual Outcrop Models (VOM). Irrespective of the Remote Sensing technique used, DOMs can be represented either by photorealistic point clouds (PC-DOM) or textured surfaces (TS-DOM). The former are datasets composed of millions of points with XYZ coordinates and RGB colour, whilst the latter are triangulated surfaces onto which images of the outcrop have been mapped or "textured" (applying a technology originally developed for movies and videogames). Here we present a workflow that allows both kinds of dataset, PC-DOMs and TS-DOMs, to be exploited in an integrated and efficient, yet flexible way. The workflow is composed of three main steps: (1) data collection and processing, (2) interpretation, and (3) modelling. Data collection can be performed with photogrammetry, LIDAR, or other techniques. The quality of photogrammetric datasets obtained with Structure From Motion (SFM) techniques has shown a tremendous improvement over the past few years, and this is becoming the most effective way to collect DOM datasets. The main advantages of photogrammetry over LIDAR are represented by the very simple and lightweight field equipment (a digital camera), and by the arbitrary spatial resolution, which can be increased simply by getting closer to the outcrop or by using a different lens. It must be noted that concerns about the precision of close-range photogrammetric surveys, which were justified in the past, are no longer a problem if modern software and acquisition schemes are applied. In any case, LIDAR is a well-tested technology and it is still very common. Irrespective of the data collection technology, the output will be a photorealistic point cloud and a collection of oriented photos, plus additional imagery in special projects (e.g. infrared images). This dataset can be used as-is (PC-DOM), or a 3D triangulated surface can be interpolated from the point cloud, and images can be used to associate a texture to this surface (TS-DOM). In the DOMstudio workflow we use both PC-DOMs and TS-DOMs. In particular, the latter are obtained by projecting the original images onto the triangulated surface, without any downsampling, thus retaining the original resolution and quality of images collected in the field. In the DOMstudio interpretation step, PC-DOM is considered the best option for fracture analysis in outcrops where facets corresponding to fractures are present. This allows obtaining orientation statistics (e.g. stereoplots, Fisher statistics, etc.) directly from a point cloud where, for each point, the unit vector normal to the outcrop surface has been calculated. A recent development in this kind of processing is represented by the possibility to automatically select (segment) subset point clouds representing single fracture surfaces, which can be used for studies on fracture length, spacing, etc., making it possible to obtain parameters like the frequency-length distribution, P21, etc. PC-DOM interpretation can be combined or complemented, depending on the outcrop morphology, with an interpretation carried out on a TS-DOM in terms of traces, which are the linear intersection of "geological" surfaces (fractures, faults, bedding, etc.) with the outcrop surface.
This kind of interpretation is very well suited for outcrops with smooth surfaces, and can be performed either by manual picking, or by applying image analysis techniques on the images associated with the DOM. In this case, a huge mass of data, with very high resolution, can be collected very effectively. If we consider applications like lithological or mineral mapping, TS-DOM datasets are the only suitable support. Finally, the DOMstudio workflow produces output in formats that are compatible with all common geomodelling packages (e.g. Gocad/Skua, Petrel, Move), allowing the quantitative data collected on DOMs to be used directly to generate and calibrate geological, structural, or geostatistical models. I will present examples of applications including hydrocarbon reservoir analogue studies, studies on fault zone architecture, lithological mapping on sedimentary and metamorphic rocks, and studies on the surface of planets and small bodies in the Solar System.
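A minimal sketch of the PC-DOM orientation step described above: estimate the local normal at each point by PCA on its neighbourhood, then convert normals to dip / dip-direction for stereoplot-style statistics. The synthetic planar "outcrop", neighbourhood size and conventions are assumptions, not DOMstudio internals.

```python
import numpy as np
from scipy.spatial import cKDTree

def point_normals(xyz, k=12):
    """Estimate unit normals by PCA on the k nearest neighbours of each point."""
    tree = cKDTree(xyz)
    normals = np.empty_like(xyz)
    for i, p in enumerate(xyz):
        _, idx = tree.query(p, k=k)
        nbrs = xyz[idx] - xyz[idx].mean(axis=0)
        # The normal is the direction of least variance (smallest singular value).
        _, _, vt = np.linalg.svd(nbrs, full_matrices=False)
        n = vt[-1]
        normals[i] = n if n[2] >= 0 else -n   # make all normals point upward
    return normals

def dip_dipdir(normals):
    """Convert upward unit normals to dip and dip-direction angles in degrees."""
    dip = np.degrees(np.arccos(np.clip(normals[:, 2], -1, 1)))
    dipdir = np.degrees(np.arctan2(normals[:, 0], normals[:, 1])) % 360
    return dip, dipdir

# Synthetic fracture facet dipping ~45 degrees towards east (x), with noise.
rng = np.random.default_rng(1)
x, y = rng.uniform(0, 10, (2, 500))
z = -x + rng.normal(0, 0.05, 500)
dip, dipdir = dip_dipdir(point_normals(np.column_stack([x, y, z])))
print("mean dip:", dip.mean(), "mean dip direction:", dipdir.mean())
```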
Comparative Microbial Modules Resource: Generation and Visualization of Multi-species Biclusters
Bate, Ashley; Eichenberger, Patrick; Bonneau, Richard
2011-01-01
The increasing abundance of large-scale, high-throughput datasets for many closely related organisms provides opportunities for comparative analysis via the simultaneous biclustering of datasets from multiple species. These analyses require a reformulation of how to organize multi-species datasets and visualize comparative genomics data analyses results. Recently, we developed a method, multi-species cMonkey, which integrates heterogeneous high-throughput datatypes from multiple species to identify conserved regulatory modules. Here we present an integrated data visualization system, built upon the Gaggle, enabling exploration of our method's results (available at http://meatwad.bio.nyu.edu/cmmr.html). The system can also be used to explore other comparative genomics datasets and outputs from other data analysis procedures – results from other multiple-species clustering programs or from independent clustering of different single-species datasets. We provide an example use of our system for two bacteria, Escherichia coli and Salmonella Typhimurium. We illustrate the use of our system by exploring conserved biclusters involved in nitrogen metabolism, uncovering a putative function for yjjI, a currently uncharacterized gene that we predict to be involved in nitrogen assimilation. PMID:22144874
Comparative microbial modules resource: generation and visualization of multi-species biclusters.
Kacmarczyk, Thadeous; Waltman, Peter; Bate, Ashley; Eichenberger, Patrick; Bonneau, Richard
2011-12-01
The increasing abundance of large-scale, high-throughput datasets for many closely related organisms provides opportunities for comparative analysis via the simultaneous biclustering of datasets from multiple species. These analyses require a reformulation of how to organize multi-species datasets and visualize comparative genomics data analyses results. Recently, we developed a method, multi-species cMonkey, which integrates heterogeneous high-throughput datatypes from multiple species to identify conserved regulatory modules. Here we present an integrated data visualization system, built upon the Gaggle, enabling exploration of our method's results (available at http://meatwad.bio.nyu.edu/cmmr.html). The system can also be used to explore other comparative genomics datasets and outputs from other data analysis procedures - results from other multiple-species clustering programs or from independent clustering of different single-species datasets. We provide an example use of our system for two bacteria, Escherichia coli and Salmonella Typhimurium. We illustrate the use of our system by exploring conserved biclusters involved in nitrogen metabolism, uncovering a putative function for yjjI, a currently uncharacterized gene that we predict to be involved in nitrogen assimilation. © 2011 Kacmarczyk et al.
NASA Technical Reports Server (NTRS)
Ruane, Alex C.; Teichmann, Claas; Arnell, Nigel W.; Carter, Timothy R.; Ebi, Kristie L.; Frieler, Katja; Goodess, Clare M.; Hewitson, Bruce; Horton, Radley; Kovats, R. Sari;
2016-01-01
This paper describes the motivation for the creation of the Vulnerability, Impacts, Adaptation and Climate Services (VIACS) Advisory Board for the Sixth Phase of the Coupled Model Intercomparison Project (CMIP6), its initial activities, and its plans to serve as a bridge between climate change applications experts and climate modelers. The climate change application community comprises researchers and other specialists who use climate information (alongside socioeconomic and other environmental information) to analyze vulnerability, impacts, and adaptation of natural systems and society in relation to past, ongoing, and projected future climate change. Much of this activity is directed toward the co-development of information needed by decisionmakers for managing projected risks. CMIP6 provides a unique opportunity to facilitate a two-way dialog between climate modelers and VIACS experts who are looking to apply CMIP6 results for a wide array of research and climate services objectives. The VIACS Advisory Board convenes leaders of major impact sectors, international programs, and climate services to solicit community feedback that increases the applications relevance of the CMIP6-Endorsed Model Intercomparison Projects (MIPs). As an illustration of its potential, the VIACS community provided CMIP6 leadership with a list of prioritized climate model variables and MIP experiments of the greatest interest to the climate model applications community, indicating the applicability and societal relevance of climate model simulation outputs. The VIACS Advisory Board also recommended an impacts version of Obs4MIPs (observational datasets) and indicated user needs for the gridding and processing of model output.
Concept for Future Data Services at the Long-Term Archive of WDCC combining DOIs with common PIDs
NASA Astrophysics Data System (ADS)
Stockhause, Martina; Weigel, Tobias; Toussaint, Frank; Höck, Heinke; Thiemann, Hannes; Lautenschlager, Michael
2013-04-01
The World Data Center for Climate (WDCC) hosted at the German Climate Computing Center (DKRZ) maintains a long-term archive (LTA) of climate model data as well as observational data. WDCC distinguishes between two types of LTA data: Structured data: Data output of an instrument or of a climate model run consists of numerous, highly structured individual datasets in a uniform format. Part of these data is also published on an ESGF (Earth System Grid Federation) data node. Detailed metadata are available, allowing for fine-grained user-defined data access. Unstructured data: LTA data of finished scientific projects are in general unstructured and consist of datasets of different formats, different sizes, and different contents. For these data, compact metadata are available as content information. The structured data are suitable for WDCC's DataCite DOI process; the project data only in exceptional cases. The DOI process includes a thorough quality control process of technical as well as scientific aspects by the publication agent and the data creator. DOIs are assigned to data collections appropriate to be cited in scientific publications, like a simulation run. The data collection is defined in agreement with the data creator. At the moment there is no way to identify and cite individual datasets within this DOI data collection, analogous to the citation of chapters in a book. Also missing is a compact citation regulation for a user-specified collection of data. WDCC therefore complements its existing LTA/DOI concept by Persistent Identifier (PID) assignment to datasets using Handles. In addition to data identification for internal and external use, the concept of PIDs allows relations among PIDs to be defined. Such structural information is stored as key-value pairs directly in the Handles. Thus, relations provide basic provenance or lineage information, even if part of the data like intermediate results are lost. WDCC intends to use additional PIDs on metadata entities with a relation to the data PID(s). These add background information on the data creation process (e.g. descriptions of experiment, model, model set-up, and platform for the model run etc.) to the data. These pieces of additional information significantly increase the re-usability of the archived model data. Other valuable additional information for scientific collaboration could be added by the same mechanism, like quality information and annotations. Apart from relations among data and metadata entities, PIDs on collections are advantageous for model data: collections allow for persistent references to single datasets or subsets of data assigned a DOI, and data objects and additional information objects can be consistently connected via relations (provenance, creation, and quality information for data).
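An illustrative sketch (not the WDCC implementation) of how relations and provenance hints could be stored as key-value pairs directly in Handle/PID records; the prefixes, field names and identifiers are all hypothetical.

```python
# Hypothetical Handle/PID records: each PID carries key-value pairs that point
# to the data location and to related PIDs (collection membership, metadata,
# provenance), so basic lineage survives even if intermediate data are lost.
pid_registry = {
    "21.T12345/dataset-0001": {
        "URL": "https://example.org/data/run42/file_0001.nc",
        "IS_PART_OF": "21.T12345/collection-run42",     # collection PID
        "HAS_METADATA": "21.T12345/meta-exp-setup",     # experiment/model set-up
        "DERIVED_FROM": "21.T12345/dataset-raw-0001",   # provenance relation
    },
    "21.T12345/collection-run42": {
        "CITED_AS_DOI": "10.1594/WDCC/EXAMPLE_RUN42",   # citable DOI collection
        "MEMBERS": ["21.T12345/dataset-0001"],
    },
}

def resolve(pid, key):
    """Follow a single key-value relation stored in a PID record."""
    return pid_registry.get(pid, {}).get(key)

print(resolve("21.T12345/dataset-0001", "IS_PART_OF"))
```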
Top-Down Visual Saliency via Joint CRF and Dictionary Learning.
Yang, Jimei; Yang, Ming-Hsuan
2017-03-01
Top-down visual saliency is an important module of visual attention. In this work, we propose a novel top-down saliency model that jointly learns a Conditional Random Field (CRF) and a visual dictionary. The proposed model incorporates a layered structure from top to bottom: CRF, sparse coding and image patches. With sparse coding as an intermediate layer, CRF is learned in a feature-adaptive manner; meanwhile with CRF as the output layer, the dictionary is learned under structured supervision. For efficient and effective joint learning, we develop a max-margin approach via a stochastic gradient descent algorithm. Experimental results on the Graz-02 and PASCAL VOC datasets show that our model performs favorably against state-of-the-art top-down saliency methods for target object localization. In addition, the dictionary update significantly improves the performance of our model. We demonstrate the merits of the proposed top-down saliency model by applying it to prioritizing object proposals for detection and predicting human fixations.
Liu, Da; Xu, Ming; Niu, Dongxiao; Wang, Shoukai; Liang, Sai
2016-01-01
Traditional forecasting models fit a function approximation from independent variables to dependent variables. However, they usually get into trouble when data are presented in various formats, such as text, voice and image. This study proposes a novel image-encoded forecasting method in which input and output binary digital two-dimensional (2D) images are transformed from decimal data. Omitting any data analysis or cleansing steps for simplicity, all raw variables were selected and converted to binary digital images as the input of a deep learning model, a convolutional neural network (CNN). Using shared weights, pooling and multiple-layer back-propagation techniques, the CNN was adopted to locate the nexus among variations in local binary digital images. Due to the computing capability that was originally developed for binary digital bitmap manipulation, this model has significant potential for forecasting with vast volumes of data. The model was validated on a power load forecasting dataset from the Global Energy Forecasting Competition 2012.
Xu, Ming; Niu, Dongxiao; Wang, Shoukai; Liang, Sai
2016-01-01
Traditional forecasting models fit a function approximation from independent variables to dependent variables. However, they usually get into trouble when data are presented in various formats, such as text, voice and image. This study proposes a novel image-encoded forecasting method in which input and output binary digital two-dimensional (2D) images are transformed from decimal data. Omitting any data analysis or cleansing steps for simplicity, all raw variables were selected and converted to binary digital images as the input of a deep learning model, a convolutional neural network (CNN). Using shared weights, pooling and multiple-layer back-propagation techniques, the CNN was adopted to locate the nexus among variations in local binary digital images. Due to the computing capability that was originally developed for binary digital bitmap manipulation, this model has significant potential for forecasting with vast volumes of data. The model was validated on a power load forecasting dataset from the Global Energy Forecasting Competition 2012. PMID:27281032
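A minimal sketch of the encoding step described above: decimal values converted to a binary 2D bitmap that a CNN could consume. The bit depth, fixed-point scaling and row-per-variable layout are assumptions, not the paper's exact scheme.

```python
import numpy as np

def encode_as_binary_image(values, bits=16, scale=100):
    """Encode a vector of decimal values as a binary 2D image: one row per
    value, one column per bit (fixed-point scaling is an assumption here)."""
    ints = np.round(np.asarray(values) * scale).astype(np.int64)
    image = np.zeros((len(ints), bits), dtype=np.uint8)
    for row, v in enumerate(ints):
        for bit in range(bits):
            image[row, bits - 1 - bit] = (v >> bit) & 1
    return image

# Example: encode a few hourly loads (MW) plus a temperature reading.
raw = [532.1, 528.4, 519.9, 25.3]
img = encode_as_binary_image(raw)
print(img.shape)   # (4, 16) binary image, ready to be stacked into CNN input
print(img[0])
```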
What is the effect of LiDAR-derived DEM resolution on large-scale watershed model results?
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ping Yang; Daniel B. Ames; Andre Fonseca
This paper examines the effect of raster cell size on hydrographic feature extraction and hydrological modeling using LiDAR-derived DEMs. LiDAR datasets for three experimental watersheds were converted to DEMs at various cell sizes. Watershed boundaries and stream networks were delineated from each DEM and were compared to reference data. Hydrological simulations were conducted and the outputs were compared. Smaller cell size DEMs consistently resulted in less difference between DEM-delineated features and reference data. However, only minor differences were found between streamflow simulations from a lumped watershed model run at a daily time step and aggregated to annual averages. These findings indicate that while higher resolution DEM grids may result in more accurate representation of terrain characteristics, such variations do not necessarily improve watershed scale simulation modeling. Hence the additional expense of generating high-resolution DEMs for the purpose of watershed modeling at daily or longer time steps may not be warranted.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Luo, Yiqi; Ahlström, Anders; Allison, Steven D.
Soil carbon (C) is a critical component of Earth system models (ESMs) and its diverse representations are a major source of the large spread across models in the terrestrial C sink from the 3rd to 5th assessment reports of the Intergovernmental Panel on Climate Change (IPCC). Improving soil C projections is a high priority for Earth system modeling in the future IPCC and other assessments. To achieve this goal, we suggest that (1) model structures should reflect real-world processes, (2) parameters should be calibrated to match model outputs with observations, and (3) external forcing variables should accurately prescribe the environmental conditions that soils experience. Firstly, most soil C cycle models simulate C input from litter production and C release through decomposition. The latter process has traditionally been represented by 1st-order decay functions, regulated primarily by temperature, moisture, litter quality, and soil texture. While this formulation captures macroscopic SOC dynamics well, a better understanding is needed of the underlying mechanisms as related to microbial processes, depth-dependent environmental controls, and other processes that strongly affect soil C dynamics. Secondly, incomplete use of observations in model parameterization is a major cause of bias in soil C projections from ESMs. Optimal parameter calibration with both pool- and flux-based datasets through data assimilation is among the highest priorities for near-term research to reduce biases among ESMs. Thirdly, external variables are represented inconsistently among ESMs, leading to differences in modeled soil C dynamics. We recommend the implementation of traceability analyses to identify how external variables and model parameterizations influence SOC dynamics in different ESMs. Overall, projections of the terrestrial C sink can be substantially improved when reliable datasets are available to select the most representative model structure, constrain parameters, and prescribe forcing fields.
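A minimal sketch of the traditional first-order decay formulation described above, with temperature and moisture rate modifiers; parameter values and the Q10 form are illustrative assumptions only.

```python
import numpy as np

def soil_carbon_run(years=200, litter_in=0.3, k_base=0.05, q10=2.0,
                    t_ref=15.0, temp=20.0, moisture=0.7):
    """First-order soil C pool: dC/dt = I - k * f(T) * f(W) * C.
    k_base is the decay rate (1/yr) at the reference temperature."""
    f_t = q10 ** ((temp - t_ref) / 10.0)        # Q10 temperature modifier
    f_w = np.clip(moisture, 0.0, 1.0)           # simple moisture scalar
    k = k_base * f_t * f_w
    c = 0.0
    trajectory = []
    for _ in range(years):
        c += litter_in - k * c                  # annual explicit time step
        trajectory.append(c)
    return np.array(trajectory), litter_in / k  # simulated pool and steady state

traj, c_eq = soil_carbon_run()
print("pool after 200 yr:", traj[-1], " analytical steady state:", c_eq)
```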
Assessment of a climate model to reproduce rainfall variability and extremes over Southern Africa
NASA Astrophysics Data System (ADS)
Williams, C. J. R.; Kniveton, D. R.; Layberry, R.
2010-01-01
It is increasingly accepted that any possible climate change will not only have an influence on mean climate but may also significantly alter climatic variability. A change in the distribution and magnitude of extreme rainfall events (associated with changing variability), such as droughts or flooding, may have a far greater impact on human and natural systems than a changing mean. This issue is of particular importance for environmentally vulnerable regions such as southern Africa. The sub-continent is considered especially vulnerable to and ill-equipped (in terms of adaptation) for extreme events, due to a number of factors including extensive poverty, famine, disease and political instability. Rainfall variability and the identification of rainfall extremes is a function of scale, so high spatial and temporal resolution data are preferred to identify extreme events and accurately predict future variability. The majority of previous climate model verification studies have compared model output with observational data at monthly timescales. In this research, the assessment of the ability of a state-of-the-art climate model to simulate climate at daily timescales is carried out using satellite-derived rainfall data from the Microwave Infrared Rainfall Algorithm (MIRA). This dataset covers the period from 1993 to 2002 and the whole of southern Africa at a spatial resolution of 0.1° longitude/latitude. This paper concentrates primarily on the ability of the model to simulate the spatial and temporal patterns of present-day rainfall variability over southern Africa and is not intended to discuss possible future changes in climate as these have been documented elsewhere. Simulations of current climate from the UK Meteorological Office Hadley Centre's climate model, in both regional and global mode, are first compared to the MIRA dataset at daily timescales. Secondly, the ability of the model to reproduce daily rainfall extremes is assessed, again by a comparison with extremes from the MIRA dataset. The results suggest that the model reproduces the number and spatial distribution of rainfall extremes with some accuracy, but that mean rainfall and rainfall variability are under-estimated (over-estimated) over wet (dry) regions of southern Africa.
NASA Astrophysics Data System (ADS)
Zhang, Jun; Cain, Elizabeth Hope; Saha, Ashirbani; Zhu, Zhe; Mazurowski, Maciej A.
2018-02-01
Breast mass detection in mammography and digital breast tomosynthesis (DBT) is an essential step in computerized breast cancer analysis. Deep learning-based methods incorporate feature extraction and model learning into a unified framework and have achieved impressive performance in various medical applications (e.g., disease diagnosis, tumor detection, and landmark detection). However, these methods require large-scale accurately annotated data. Unfortunately, it is challenging to get precise annotations of breast masses. To address this issue, we propose a fully convolutional network (FCN) based heatmap regression method for breast mass detection, using only weakly annotated mass regions in mammography images. Specifically, we first generate heat maps of masses based on human-annotated rough regions for breast masses. We then develop an FCN model for end-to-end heatmap regression with an F-score loss function, where the mammography images are regarded as the input and heatmaps for breast masses are used as the output. Finally, the probability map of mass locations can be estimated with the trained model. Experimental results on a mammography dataset with 439 subjects demonstrate the effectiveness of our method. Furthermore, we evaluate whether we can use mammography data to improve detection models for DBT, since mammography shares similar structure with tomosynthesis. We propose a transfer learning strategy by fine-tuning the learned FCN model from mammography images. We test this approach on a small tomosynthesis dataset with only 40 subjects, and we show an improvement in the detection performance as compared to training the model from scratch.
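A minimal sketch of a differentiable (soft) F-score loss of the kind described above, computed between a predicted heatmap and a target heatmap; written with numpy for clarity rather than the authors' training framework, and the beta and epsilon values are assumptions.

```python
import numpy as np

def soft_fscore_loss(pred, target, beta=1.0, eps=1e-7):
    """Soft F-score between a predicted heatmap (values in [0, 1]) and a
    target heatmap; returned as 1 - F so that lower is better."""
    tp = np.sum(pred * target)
    fp = np.sum(pred * (1.0 - target))
    fn = np.sum((1.0 - pred) * target)
    f = (1 + beta**2) * tp / ((1 + beta**2) * tp + beta**2 * fn + fp + eps)
    return 1.0 - f

target = np.zeros((64, 64)); target[20:30, 20:30] = 1.0   # rough mass annotation
good = np.clip(target + 0.05 * np.random.rand(64, 64), 0, 1)
bad = np.random.rand(64, 64)
print(soft_fscore_loss(good, target), soft_fscore_loss(bad, target))
```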
Evolutionary-based approaches for determining the deviatoric stress of calcareous sands
NASA Astrophysics Data System (ADS)
Shahnazari, Habib; Tutunchian, Mohammad A.; Rezvani, Reza; Valizadeh, Fatemeh
2013-01-01
Many hydrocarbon reservoirs are located near oceans which are covered by calcareous deposits. These sediments consist mainly of the remains of marine plants or animals, so calcareous soils can have a wide variety of engineering properties. Due to their local expansion and considerable differences from terrigenous soils, the evaluation of engineering behaviors of calcareous sediments has been a major concern for geotechnical engineers in recent years. Deviatoric stress is one of the most important parameters directly affecting important shearing characteristics of soils. In this study, a dataset of experimental triaxial tests was gathered from two sources. First, the data of previous experimental studies from the literature were gathered. Then, a series of triaxial tests was performed on calcareous sands of the Persian Gulf to develop the dataset. This work resulted in a large database of experimental results on the maximum deviatoric stress of different calcareous sands. To demonstrate the capabilities of evolutionary-based approaches in modeling the deviatoric stress of calcareous sands, two promising variants of genetic programming (GP), multigene genetic programming (MGP) and gene expression programming (GEP), were applied to propose new predictive models. The models' input parameters were the physical and in-situ condition properties of soil and the output was the maximum deviatoric stress (i.e., the axial-deviator stress). The results of statistical analyses indicated the robustness of these models, and a parametric study was also conducted for further verification of the models, in which the resulting trends were consistent with the results of the experimental study. Finally, the proposed models were further simplified by applying a practical geotechnical correlation.
Context-Sensitive Markov Models for Peptide Scoring and Identification from Tandem Mass Spectrometry
Grover, Himanshu; Wallstrom, Garrick; Wu, Christine C.
2013-01-01
Peptide and protein identification via tandem mass spectrometry (MS/MS) lies at the heart of proteomic characterization of biological samples. Several algorithms are able to search, score, and assign peptides to large MS/MS datasets. Most popular methods, however, underutilize the intensity information available in the tandem mass spectrum due to the complex nature of the peptide fragmentation process, thus contributing to loss of potential identifications. We present a novel probabilistic scoring algorithm called Context-Sensitive Peptide Identification (CSPI) based on highly flexible Input-Output Hidden Markov Models (IO-HMM) that capture the influence of peptide physicochemical properties on their observed MS/MS spectra. We use several local and global properties of peptides and their fragment ions from the literature. Comparison with two popular algorithms, Crux (re-implementation of SEQUEST) and X!Tandem, on multiple datasets of varying complexity, shows that peptide identification scores from our models are able to achieve greater discrimination between true and false peptides, identifying up to ∼25% more peptides at a False Discovery Rate (FDR) of 1%. We evaluated two alternative normalization schemes for fragment ion intensities, a global rank-based scheme and a local window-based scheme. Our results indicate the importance of appropriate normalization methods for learning superior models. Further, combining our scores with Crux using a state-of-the-art procedure, Percolator, we demonstrate the utility of using scoring features from intensity-based models, yielding ∼4-8% additional identifications over Percolator at 1% FDR. IO-HMMs offer a scalable and flexible framework with several modeling choices to learn complex patterns embedded in MS/MS data. PMID:23289783
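A minimal sketch of a global rank-based fragment-intensity normalization of the kind mentioned above (the exact scheme used in the paper may differ).

```python
import numpy as np

def rank_normalize(intensities):
    """Map fragment ion intensities to (0, 1] by their global rank, so that
    downstream models see scale-free intensity features."""
    intensities = np.asarray(intensities, dtype=float)
    ranks = intensities.argsort().argsort()          # 0-based rank of each peak
    return (ranks + 1) / len(intensities)

spectrum = [1520.0, 35.2, 310.7, 8800.5, 95.0]
print(rank_normalize(spectrum))   # the highest-intensity peak maps to 1.0
```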
Interactive Visualization and Analysis of Geospatial Data Sets - TrikeND-iGlobe
NASA Astrophysics Data System (ADS)
Rosebrock, Uwe; Hogan, Patrick; Chandola, Varun
2013-04-01
The visualization of scientific datasets is becoming an ever-increasing challenge as advances in computing technologies have enabled scientists to build high-resolution climate models that have produced petabytes of climate data. To interrogate and analyze these large datasets in real-time is a task that pushes the boundaries of computing hardware and software. But integration of climate datasets with geospatial data requires a considerable amount of effort and close familiarity with various data formats and projection systems, which has prevented widespread utilization outside of the climate community. TrikeND-iGlobe is a sophisticated software tool that bridges this gap, allows easy integration of climate datasets with geospatial datasets and provides sophisticated visualization and analysis capabilities. The objective for TrikeND-iGlobe is the continued building of an open source 4D virtual globe application using NASA World Wind technology that integrates analysis of climate model outputs with remote sensing observations as well as demographic and environmental data sets. This will facilitate a better understanding of global and regional phenomena, and the impact analysis of climate extreme events. The critical aim is real-time interactive interrogation. At the data centric level the primary aim is to enable the user to interact with the data in real-time for the purpose of analysis - locally or remotely. TrikeND-iGlobe provides the basis for the incorporation of modular tools that provide extended interactions with the data, including sub-setting, aggregation, re-shaping, time series analysis methods and animation to produce publication-quality imagery. TrikeND-iGlobe may be run locally or can be accessed via a web interface supported by high-performance visualization compute nodes placed close to the data. It supports visualizing heterogeneous data formats: traditional geospatial datasets along with scientific data sets with geographic coordinates (NetCDF, HDF, etc.). It also supports multiple data access mechanisms, including HTTP, FTP, WMS, WCS, and THREDDS Data Server (for NetCDF data). For scientific data, TrikeND-iGlobe supports various visualization capabilities, including animations, vector field visualization, etc. TrikeND-iGlobe is a collaborative open-source project; contributors include NASA (ARC-PX), ORNL (Oak Ridge National Laboratory), Unidata, Kansas University, CSIRO CMAR Australia and Geoscience Australia.
NASA Astrophysics Data System (ADS)
Hempelmann, Nils; Ehbrecht, Carsten; Alvarez-Castro, Carmen; Brockmann, Patrick; Falk, Wolfgang; Hoffmann, Jörg; Kindermann, Stephan; Koziol, Ben; Nangini, Cathy; Radanovics, Sabine; Vautard, Robert; Yiou, Pascal
2018-01-01
Analyses of extreme weather events and their impacts often require big data processing of ensembles of climate model simulations. Researchers generally proceed by downloading the data from the providers and processing the data files 'at home' with their own analysis processes. However, the growing amount of available climate model and observation data makes this procedure quite awkward. In addition, data processing knowledge is kept local, instead of being consolidated into a common resource of reusable code. These drawbacks can be mitigated by using a web processing service (WPS). A WPS hosts services such as data analysis processes that are accessible over the web, and can be installed close to the data archives. We developed a WPS named 'flyingpigeon' that communicates over an HTTP network protocol based on standards defined by the Open Geospatial Consortium (OGC), to be used by climatologists and impact modelers as a tool for analyzing large datasets remotely. Here, we present the current processes we developed in flyingpigeon relating to commonly-used processes (preprocessing steps, spatial subsets at continent, country or region level, and climate indices) as well as methods for specific climate data analysis (weather regimes, analogues of circulation, segetal flora distribution, and species distribution models). We also developed a novel, browser-based interactive data visualization for circulation analogues, illustrating the flexibility of WPS in designing custom outputs. Bringing the software to the data instead of transferring the data to the code is becoming increasingly necessary, especially with the upcoming massive climate datasets.
NASA Astrophysics Data System (ADS)
Dinsmore, P.; Prepas, E.; Putz, G.; Smith, D.
2008-12-01
The Forest Watershed and Riparian Disturbance (FORWARD) Project has collected data on weather, soils, vegetation, streamflow and stream water quality under relatively undisturbed conditions, as well as after experimental forest harvest, in partnership with industrial forest operations within the Boreal Plain and Boreal Shield ecozones of Canada. Research-based contributions from FORWARD were integrated into our Boreal Plain industry partner's 2007-2016 Detailed Forest Management Plan. These contributions consisted of three components: 1) A GIS watershed and stream layer that included a hydrological network, a Digital Elevation Model, and Strahler classified streams and watersheds for 1st- and 3rd-order watersheds; 2) a combined soil and wetland GIS layer that included maps and associated datasets for relatively coarse mineral soils (which drain quickly) and wetlands (which retain water), which were the key features that needed to be identified for the FORWARD modelling effort; and 3) a lookup table was developed that permits planners to determine runoff coefficients (the variable selected for hydrological modelling) for 1st-order watersheds, based upon slope, vegetation and soil attributes in forest polygons. The lookup table was populated with output from the deterministic Soil and Water Assessment Tool (SWAT), adapted for boreal forest vegetation with a version of the plant growth model, ALMANAC. The runoff coefficient lookup table facilitated integration of predictions of hydrologic impacts of forest harvest into planning. This pilot-scale effort will ultimately be extended to the Boreal Shield study area.
Online Phase Detection Using Wearable Sensors for Walking with a Robotic Prosthesis
Goršič, Maja; Kamnik, Roman; Ambrožič, Luka; Vitiello, Nicola; Lefeber, Dirk; Pasquini, Guido; Munih, Marko
2014-01-01
This paper presents a gait phase detection algorithm for providing feedback in walking with a robotic prosthesis. The algorithm utilizes the output signals of a wearable wireless sensory system incorporating sensorized shoe insoles and inertial measurement units attached to body segments. The principle of detecting transitions between gait phases is based on heuristic threshold rules, dividing a steady-state walking stride into four phases. For the evaluation of the algorithm, experiments with three amputees, walking with the robotic prosthesis and wearable sensors, were performed. Results show a high rate of successful detection for all four phases (the average success rate across all subjects >90%). A comparison of the proposed method to an off-line trained algorithm using hidden Markov models reveals a similar performance achieved without the need for learning dataset acquisition and previous model training. PMID:24521944
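A minimal sketch of heuristic threshold rules of the kind described above, assigning one of four steady-state walking phases from insole loads and shank angular velocity; the thresholds, phase names and sensor channels are assumptions, not the values used with the robotic prosthesis.

```python
def detect_phase(heel_load, toe_load, shank_gyro, load_on=50.0):
    """Very simple rule set assigning one of four steady-state walking phases
    from insole loads (N) and shank angular velocity (rad/s). Thresholds are
    illustrative only."""
    if heel_load > load_on and toe_load <= load_on:
        return "initial_contact"       # heel strike / loading response
    if heel_load > load_on and toe_load > load_on:
        return "mid_stance"            # foot flat, both insole regions loaded
    if heel_load <= load_on and toe_load > load_on:
        return "pre_swing"             # push-off, heel already unloaded
    return "swing"                     # foot off the ground, shank rotating

# (heel_load, toe_load, shank_gyro) samples over one stride
samples = [(120, 10, 0.2), (110, 140, 0.1), (5, 130, 0.4), (0, 0, 2.5)]
print([detect_phase(*s) for s in samples])
```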
A High-Resolution Merged Wind Dataset for DYNAMO: Progress and Future Plans
NASA Technical Reports Server (NTRS)
Lang, Timothy J.; Mecikalski, John; Li, Xuanli; Chronis, Themis; Castillo, Tyler; Hoover, Kacie; Brewer, Alan; Churnside, James; McCarty, Brandi; Hein, Paul;
2015-01-01
In order to support research on optimal data assimilation methods for the Cyclone Global Navigation Satellite System (CYGNSS), launching in 2016, work has been ongoing to produce a high-resolution merged wind dataset for the Dynamics of the Madden Julian Oscillation (DYNAMO) field campaign, which took place during late 2011/early 2012. The winds are produced by assimilating DYNAMO observations into the Weather Research and Forecasting (WRF) three-dimensional variational (3DVAR) system. Data sources from the DYNAMO campaign include the upper-air sounding network, radial velocities from the radar network, vector winds from the Advanced Scatterometer (ASCAT) and Oceansat-2 Scatterometer (OSCAT) satellite instruments, the NOAA High Resolution Doppler Lidar (HRDL), and several others. In order the prep them for 3DVAR, significant additional quality control work is being done for the currently available TOGA and SMART-R radar datasets, including automatically dealiasing radial velocities and correcting for intermittent TOGA antenna azimuth angle errors. The assimilated winds are being made available as model output fields from WRF on two separate grids with different horizontal resolutions - a 3-km grid focusing on the main DYNAMO quadrilateral (i.e., Gan Island, the R/V Revelle, the R/V Mirai, and Diego Garcia), and a 1-km grid focusing on the Revelle. The wind dataset is focused on three separate approximately 2-week periods during the Madden Julian Oscillation (MJO) onsets that occurred in October, November, and December 2011. Work is ongoing to convert the 10-m surface winds from these model fields to simulated CYGNSS observations using the CYGNSS End-To-End Simulator (E2ES), and these simulated satellite observations are being compared to radar observations of DYNAMO precipitation systems to document the anticipated ability of CYGNSS to provide information on the relationships between surface winds and oceanic precipitation at the mesoscale level. This research will improve our understanding of the future utility of CYGNSS for documenting key MJO processes.
HCP: A Flexible CNN Framework for Multi-label Image Classification.
Wei, Yunchao; Xia, Wei; Lin, Min; Huang, Junshi; Ni, Bingbing; Dong, Jian; Zhao, Yao; Yan, Shuicheng
2015-10-26
Convolutional Neural Network (CNN) has demonstrated promising performance in single-label image classification tasks. However, how CNN best copes with multi-label images still remains an open problem, mainly due to the complex underlying object layouts and insufficient multi-label training images. In this work, we propose a flexible deep CNN infrastructure, called Hypotheses-CNN-Pooling (HCP), where an arbitrary number of object segment hypotheses are taken as the inputs, then a shared CNN is connected with each hypothesis, and finally the CNN output results from different hypotheses are aggregated with max pooling to produce the ultimate multi-label predictions. Some unique characteristics of this flexible deep CNN infrastructure include: 1) no ground-truth bounding box information is required for training; 2) the whole HCP infrastructure is robust to possibly noisy and/or redundant hypotheses; 3) the shared CNN is flexible and can be well pre-trained with a large-scale single-label image dataset, e.g., ImageNet; and 4) it may naturally output multi-label prediction results. Experimental results on Pascal VOC 2007 and VOC 2012 multi-label image datasets well demonstrate the superiority of the proposed HCP infrastructure over other state-of-the-arts. In particular, the mAP reaches 90.5% by HCP only and 93.2% after the fusion with our complementary result in [44] based on hand-crafted features on the VOC 2012 dataset.
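A minimal sketch of the cross-hypothesis max pooling step described above: per-hypothesis class scores aggregated into an image-level multi-label prediction. The scores here are placeholders standing in for the shared CNN's outputs.

```python
import numpy as np

def hcp_aggregate(hypothesis_scores, threshold=0.5):
    """Max-pool class scores across object segment hypotheses to obtain the
    image-level multi-label prediction (one score per class)."""
    scores = np.asarray(hypothesis_scores)      # shape: (n_hypotheses, n_classes)
    image_scores = scores.max(axis=0)           # cross-hypothesis max pooling
    return image_scores, image_scores >= threshold

# Placeholder scores for 4 hypotheses over 5 classes (e.g. from a shared CNN).
scores = np.array([
    [0.9, 0.1, 0.2, 0.1, 0.0],
    [0.2, 0.1, 0.7, 0.2, 0.1],
    [0.1, 0.0, 0.1, 0.1, 0.1],
    [0.8, 0.2, 0.3, 0.6, 0.1],
])
print(hcp_aggregate(scores))
```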
Garcia, L; Bedos, C; Génermont, S; Braud, I; Cellier, P
2011-09-01
Ammonia and pesticide volatilization in the field is a surface phenomenon involving physical and chemical processes that depend on the soil surface temperature and water content. The water transfer, heat transfer and energy budget sub models of volatilization models are adapted from the most commonly accepted formalisms and parameterizations. They are less detailed than the dedicated models describing water and heat transfers and surface status. The aim of this work was to assess the ability of one of the available mechanistic volatilization models, Volt'Air, to accurately describe the pedo-climatic conditions of a soil surface at the required time and space resolution. The assessment involves: (i) a sensitivity analysis, (ii) an evaluation of Volt'Air outputs in the light of outputs from a reference Soil-Vegetation-Atmosphere Transfer model (SiSPAT) and three experimental datasets, and (iii) the study of three tests based on modifications of SiSPAT to establish the potential impact of the simplifying assumptions used in Volt'Air. The analysis confirmed that a 5 mm surface layer was well suited, and that Volt'Air surface temperature correlated well with the experimental measurements as well as with SiSPAT outputs. In terms of liquid water transfers, Volt'Air was overall consistent with SiSPAT, with discrepancies only during major rainfall events and dry weather conditions. The tests enabled us to identify the main source of the discrepancies between Volt'Air and SiSPAT: the lack of gaseous water transfer description in Volt'Air. They also helped to explain why neither Volt'Air nor SiSPAT was able to represent lower values of surface water content: current classical water retention and hydraulic conductivity models are not yet adapted to cases of very dry conditions. Given the outcomes of this study, we discuss to what extent the volatilization models can be improved and the questions they pose for current research in water transfer modeling and parameterization. Copyright © 2011 Elsevier B.V. All rights reserved.
Privacy-Preserving Integration of Medical Data : A Practical Multiparty Private Set Intersection.
Miyaji, Atsuko; Nakasho, Kazuhisa; Nishida, Shohei
2017-03-01
Medical data are often maintained by different organizations. However, detailed analyses sometimes require these datasets to be integrated without violating patient or commercial privacy. Multiparty Private Set Intersection (MPSI), which is an important privacy-preserving protocol, computes an intersection of multiple private datasets. This approach ensures that only designated parties can identify the intersection. In this paper, we propose a practical MPSI that satisfies the following requirements: The size of the datasets maintained by the different parties is independent of the others, and the computational complexity of the dataset held by each party is independent of the number of parties. Our MPSI is based on the use of an outsourcing provider, who has no knowledge of the data inputs or outputs. This reduces the computational complexity. The performance of the proposed MPSI is evaluated by implementing a prototype on a virtual private network to enable parallel computation in multiple threads. Our protocol is confirmed to be more efficient than comparable existing approaches.
NASA Astrophysics Data System (ADS)
Mitchell, M. J.; Pichugina, Y. L.; Banta, R. M.
2015-12-01
Models are important tools for assessing the potential of wind energy sites, but the accuracy of these projections has not been properly validated. In this study, High Resolution Doppler Lidar (HRDL) data obtained with high temporal and spatial resolution at heights of modern turbine rotors were compared to output from the WRF-chem model in order to help improve the performance of the model in producing accurate wind forecasts for the industry. HRDL data were collected from January 23 to March 1, 2012 during the Uintah Basin Winter Ozone Study (UBWOS) field campaign. The model validation method was based on the qualitative comparison of the wind field images, time-series analysis and statistical analysis of the observed and modeled wind speed and direction, both for case studies and for the whole experiment. To compare the WRF-chem model output to the HRDL observations, the model heights and forecast times were interpolated to match the observed times and heights. Then, time-height cross-sections of the HRDL and WRF-Chem wind speed and directions were plotted to select case studies. Cross-sections of the differences between the observed and forecasted wind speed and directions were also plotted to visually analyze the model performance in different wind flow conditions. The statistical analysis includes the calculation of vertical profiles and time series of bias, correlation coefficient, root mean squared error, and coefficient of determination between the two datasets. The results from this analysis reveal where and when the model typically struggles in forecasting winds at heights of modern turbine rotors so that in the future the model can be improved for the industry.
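A minimal sketch of the comparison statistics described above: interpolate model winds to the observation heights, then compute bias, RMSE and correlation. The profiles here are synthetic placeholders, not HRDL or WRF-Chem output.

```python
import numpy as np

def compare(obs, model):
    """Bias, RMSE and correlation between observed and modelled wind speed."""
    diff = model - obs
    return {
        "bias": float(diff.mean()),
        "rmse": float(np.sqrt((diff ** 2).mean())),
        "r": float(np.corrcoef(obs, model)[0, 1]),
    }

# Synthetic profiles: observations at lidar range gates, model on its own levels.
obs_heights = np.arange(40, 200, 20.0)
obs_speed = 6 + 0.02 * obs_heights + np.random.normal(0, 0.3, obs_heights.size)
model_levels = np.array([10.0, 80.0, 160.0, 240.0])
model_speed = 5.5 + 0.022 * model_levels

# Interpolate the model profile onto the observation heights before comparing.
model_at_obs = np.interp(obs_heights, model_levels, model_speed)
print(compare(obs_speed, model_at_obs))
```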
NASA Astrophysics Data System (ADS)
Kemp, C.; Car, N. J.
2016-12-01
Geoscience Australia (GA) is a government agency that provides advice on the geology and geography of Australia. It is the custodian of many digital and physical datasets of national significance. For several years GA has been implementing an enterprise approach to provenance management. The goal is transparency and reproducibility for all of GA's information products, an objective supported at the highest levels and explicitly listed in its Science Principles. Currently GA is finalising a set of enterprise tools to assist with provenance management and rolling out provenance reporting to different science areas. GA has adopted or developed: provenance storage systems; provenance collection code libraries (for use within automated systems); reporting interfaces (for manual use) and provenance representation capability within legacy catalogues. Using these tools within GA's science areas involves modelling the scenario first and then assessing whether the area has its data managed in such a way that allows links to data within provenance to be resolvable in perpetuity. We don't just want to represent provenance (demonstrating transparency); we want to access data via provenance (allowing for reproducibility). A subtask of GA's current work is to link physical samples to information products (datasets, reports, papers) by uniquely and persistently identifying samples using International GeoSample Numbers (IGSNs) and then modelling automated and manual laboratory workflows and associated tasks, such as data delivery to corporate databases, using the W3C's PROV Data Model. We use PROV DM throughout our modelling and systems. We are also moving to deliver all sample and digital dataset metadata across the agency in the Web Ontology Language (OWL) and exposing it via Linked Data methods in order to allow Semantic Web querying of multiple systems, so that provenance can be leveraged using a single method and query point. Through the Science First Transformation Program GA is undergoing a significant rethinking of its data architecture, curation and access to support the Digital Science capability, of which provenance management is an output.
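A small sketch of a W3C PROV Data Model record of the kind described above, here built with the `prov` Python package (the tool choice is an assumption); the namespaces and identifiers are hypothetical, linking a physical sample, a laboratory activity and a derived dataset.

```python
from prov.model import ProvDocument

doc = ProvDocument()
doc.add_namespace("ga", "http://example.org/ga/")        # hypothetical namespace
doc.add_namespace("igsn", "http://igsn.org/")

# A physical sample identified by an IGSN, a lab workflow, and its output dataset.
sample = doc.entity("igsn:AU1234", {"prov:label": "rock sample"})
dataset = doc.entity("ga:geochem-dataset-42", {"prov:label": "geochemistry results"})
analysis = doc.activity("ga:lab-workflow-7")

doc.used(analysis, sample)                     # the workflow consumed the sample
doc.wasGeneratedBy(dataset, analysis)          # and generated the dataset
doc.wasDerivedFrom(dataset, sample)            # dataset traceable to the sample

print(doc.get_provn())                         # serialize in PROV-N notation
```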
NASA Astrophysics Data System (ADS)
Mahesh, A.; Mudigonda, M.; Kim, S. K.; Kashinath, K.; Kahou, S.; Michalski, V.; Williams, D. N.; Liu, Y.; Prabhat, M.; Loring, B.; O'Brien, T. A.; Collins, W. D.
2017-12-01
Atmospheric rivers (ARs) can be the difference between California facing drought or hurricane-level storms. ARs are a form of extreme weather defined as long, narrow columns of moisture which transport water vapor outside the tropics. When they make landfall, they release the vapor as rain or snow. Convolutional neural networks (CNNs), a machine learning technique that uses filters to recognize features, are the leading computer vision mechanism for classifying multichannel images. CNNs have been proven to be effective in identifying extreme weather events in climate simulation output (Liu et al. 2016, ABDA'16, http://bit.ly/2hlrFNV). Here, we compare three different CNN architectures, tuned with different hyperparameters and training schemes. We compare two-layer, three-layer, four-layer, and sixteen-layer CNNs' ability to recognize ARs in Community Atmospheric Model version 5 output, and we explore the ability of data augmentation and pre-trained models to increase the accuracy of the classifier. Because pre-training the model with regular images (i.e. benches, stoves, and dogs) yielded the highest accuracy rate, this strategy, also known as transfer learning, may be vital in future scientific CNNs, which likely will not have access to a large labelled training dataset. By choosing the most effective CNN architecture, climate scientists can build an accurate historical database of ARs, which can be used to develop a predictive understanding of these phenomena.
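A minimal sketch of the transfer-learning strategy described above: start from a CNN pre-trained on everyday images and train only a small head for an AR / no-AR task. The framework (Keras) and backbone (VGG16) are assumptions; the study used its own architectures and CAM5 fields rather than three-channel photos.

```python
# A transfer-learning sketch with Keras: reuse ImageNet features, train only
# a small classification head on labelled climate-model patches.
import tensorflow as tf

base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   input_shape=(224, 224, 3))
base.trainable = False                      # freeze the pre-trained filters

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),   # AR present / absent
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
# model.fit(ar_patches, ar_labels, epochs=5)   # labelled CAM5 patches would go here
```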
NASA Astrophysics Data System (ADS)
Jennings, E.; Madigan, M.
2017-04-01
Given the complexity of modern cosmological parameter inference where we are faced with non-Gaussian data and noise, correlated systematics and multi-probe correlated datasets, the Approximate Bayesian Computation (ABC) method is a promising alternative to traditional Markov Chain Monte Carlo approaches in the case where the Likelihood is intractable or unknown. The ABC method is called "Likelihood free" as it avoids explicit evaluation of the Likelihood by using a forward model simulation of the data which can include systematics. We introduce astroABC, an open source ABC Sequential Monte Carlo (SMC) sampler for parameter estimation. A key challenge in astrophysics is the efficient use of large multi-probe datasets to constrain high dimensional, possibly correlated parameter spaces. With this in mind astroABC allows for massive parallelization using MPI, a framework that handles spawning of processes across multiple nodes. A key new feature of astroABC is the ability to create MPI groups with different communicators, one for the sampler and several others for the forward model simulation, which speeds up sampling time considerably. For smaller jobs the Python multiprocessing option is also available. Other key features of this new sampler include: a Sequential Monte Carlo sampler; a method for iteratively adapting tolerance levels; local covariance estimate using scikit-learn's KDTree; modules for specifying optimal covariance matrix for a component-wise or multivariate normal perturbation kernel and a weighted covariance metric; restart files output frequently so an interrupted sampling run can be resumed at any iteration; output and restart files are backed up at every iteration; user-defined distance metric and simulation methods; a module for specifying heterogeneous parameter priors including non-standard prior PDFs; a module for specifying a constant, linear, log or exponential tolerance level; well-documented examples and sample scripts. This code is hosted online at https://github.com/EliseJ/astroABC.
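A generic ABC rejection-sampling sketch illustrating the likelihood-free idea (not astroABC's SMC sampler or its API): draw parameters from the prior, forward-simulate, and keep draws whose simulated summaries lie within a tolerance of the observations. Priors, distance metric and tolerance are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_model(mu, sigma, n=200):
    """Forward simulation of the data, which can include systematics."""
    return rng.normal(mu, sigma, n)

def distance(sim, obs):
    """User-defined distance metric between simulated and observed summaries."""
    return np.hypot(sim.mean() - obs.mean(), sim.std() - obs.std())

observed = rng.normal(1.5, 0.7, 200)

# ABC rejection: sample from the prior, simulate, accept if within tolerance.
accepted = []
for _ in range(20000):
    mu, sigma = rng.uniform(-3, 3), rng.uniform(0.1, 2.0)   # prior draws
    if distance(forward_model(mu, sigma), observed) < 0.1:   # tolerance level
        accepted.append((mu, sigma))

accepted = np.array(accepted)
print("approximate posterior mean:", accepted.mean(axis=0),
      " accepted draws:", len(accepted))
```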
Automated image based prominent nucleoli detection
Yap, Choon K.; Kalaw, Emarene M.; Singh, Malay; Chong, Kian T.; Giron, Danilo M.; Huang, Chao-Hui; Cheng, Li; Law, Yan N.; Lee, Hwee Kuan
2015-01-01
Introduction: Nucleolar changes in cancer cells are one of the cytologic features important to the tumor pathologist in cancer assessments of tissue biopsies. However, inter-observer variability and the manual approach to this work hamper the accuracy of the assessment by pathologists. In this paper, we propose a computational method for prominent nucleoli pattern detection. Materials and Methods: Thirty-five hematoxylin and eosin stained images were acquired from prostate cancer, breast cancer, renal clear cell cancer and renal papillary cell cancer tissues. Prostate cancer images were used for the development of a computer-based automated prominent nucleoli pattern detector built on a cascade farm. An ensemble of approximately 1000 cascades was constructed by permuting different combinations of classifiers such as support vector machines, eXclusive component analysis, boosting, and logistic regression. The output of cascades was then combined using the RankBoost algorithm. The output of our prominent nucleoli pattern detector is a ranked set of detected image patches of patterns of prominent nucleoli. Results: The mean number of detected prominent nucleoli patterns in the top 100 ranked detected objects was 58 in the prostate cancer dataset, 68 in the breast cancer dataset, 86 in the renal clear cell cancer dataset, and 76 in the renal papillary cell cancer dataset. The proposed cascade farm performs twice as good as the use of a single cascade proposed in the seminal paper by Viola and Jones. For comparison, a naive algorithm that randomly chooses a pixel as a nucleoli pattern would detect five correct patterns in the first 100 ranked objects. Conclusions: Detection of sparse nucleoli patterns in a large background of highly variable tissue patterns is a difficult challenge our method has overcome. This study developed an accurate prominent nucleoli pattern detector with the potential to be used in the clinical settings. PMID:26167383
Automated image based prominent nucleoli detection.
Yap, Choon K; Kalaw, Emarene M; Singh, Malay; Chong, Kian T; Giron, Danilo M; Huang, Chao-Hui; Cheng, Li; Law, Yan N; Lee, Hwee Kuan
2015-01-01
Nucleolar changes in cancer cells are one of the cytologic features important to the tumor pathologist in cancer assessments of tissue biopsies. However, inter-observer variability and the manual approach to this work hamper the accuracy of the assessment by pathologists. In this paper, we propose a computational method for prominent nucleoli pattern detection. Thirty-five hematoxylin and eosin stained images were acquired from prostate cancer, breast cancer, renal clear cell cancer and renal papillary cell cancer tissues. Prostate cancer images were used for the development of a computer-based automated prominent nucleoli pattern detector built on a cascade farm. An ensemble of approximately 1000 cascades was constructed by permuting different combinations of classifiers such as support vector machines, eXclusive component analysis, boosting, and logistic regression. The outputs of the cascades were then combined using the RankBoost algorithm. The output of our prominent nucleoli pattern detector is a ranked set of detected image patches of patterns of prominent nucleoli. The mean number of detected prominent nucleoli patterns in the top 100 ranked detected objects was 58 in the prostate cancer dataset, 68 in the breast cancer dataset, 86 in the renal clear cell cancer dataset, and 76 in the renal papillary cell cancer dataset. The proposed cascade farm performs twice as well as a single cascade as proposed in the seminal paper by Viola and Jones. For comparison, a naive algorithm that randomly chooses a pixel as a nucleoli pattern would detect five correct patterns in the first 100 ranked objects. Detection of sparse nucleoli patterns in a large background of highly variable tissue patterns is a difficult challenge that our method has overcome. This study developed an accurate prominent nucleoli pattern detector with the potential to be used in clinical settings.
Ozçift, Akin
2011-05-01
Supervised classification algorithms are commonly used in the design of computer-aided diagnosis systems. In this study, we present a resampling-strategy-based Random Forests (RF) ensemble classifier to improve the diagnosis of cardiac arrhythmia. Random Forests is an ensemble classifier that consists of many decision trees and outputs the class that is the mode of the classes output by the individual trees. In this way, an RF ensemble classifier performs better than a single tree from a classification performance point of view. In general, multiclass datasets having an unbalanced distribution of sample sizes are difficult to analyze in terms of class discrimination. The cardiac arrhythmia dataset has multiple classes with small sample sizes and is therefore well suited to testing our resampling-based training strategy. The dataset contains 452 samples in fourteen types of arrhythmias, and eleven of these classes have sample sizes of less than 15. Our diagnosis strategy consists of two parts: (i) a correlation-based feature selection algorithm is used to select relevant features from the cardiac arrhythmia dataset; (ii) the RF machine learning algorithm is used to evaluate the performance of the selected features with and without simple random sampling, to evaluate the efficiency of the proposed training strategy. The resultant accuracy of the classifier is found to be 90.0%, which is quite a high diagnostic performance for cardiac arrhythmia. Furthermore, three case studies, i.e., thyroid, cardiotocography and audiology, are used to benchmark the effectiveness of the proposed method. The results of the experiments demonstrated the efficiency of the random sampling strategy in training the RF ensemble classification algorithm. Copyright © 2011 Elsevier Ltd. All rights reserved.
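As a rough illustration of the two-part strategy (correlation-based feature selection followed by Random Forest training on a resampled, class-balanced set), the sketch below uses scikit-learn. The file path, column name and correlation threshold are placeholders, and the simple feature-class correlation filter stands in for the full correlation-based feature selection algorithm.

```python
# Illustrative sketch of the two-part strategy described above (not the original code):
# (i) a simple correlation-based feature filter, (ii) a Random Forest trained on a
# resampled (class-balanced) training set. Dataset path and column names are placeholders.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

df = pd.read_csv("arrhythmia.csv")               # placeholder path to the UCI dataset
X, y = df.drop(columns=["class"]), df["class"]

# (i) keep features whose absolute correlation with the class label exceeds a threshold
corr = X.apply(lambda col: abs(np.corrcoef(col, y)[0, 1]))
X = X[corr[corr > 0.1].index]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# (ii) simple random resampling: oversample each minority class up to the majority size
train = pd.concat([X_tr, y_tr], axis=1)
largest = train["class"].value_counts().max()
balanced = pd.concat(
    resample(grp, replace=True, n_samples=largest, random_state=0)
    for _, grp in train.groupby("class")
)
rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(balanced.drop(columns=["class"]), balanced["class"])
print("test accuracy:", rf.score(X_te, y_te))
```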
Artan, G.A.; Verdin, J.P.; Lietzow, R.
2013-01-01
We illustrate the ability to monitor the status of snowpack over large areas by using a spatially distributed snow accumulation and ablation model in the Upper Colorado Basin. The model was forced with precipitation fields from the National Weather Service (NWS) Multi-sensor Precipitation Estimator (MPE) and the Tropical Rainfall Measuring Mission (TRMM) datasets; the remaining meteorological model input data were from NOAA's Global Forecast System (GFS) model output fields. The simulated snow water equivalent (SWE) was compared to SWEs from the Snow Data Assimilation System (SNODAS) and SNOwpack TELemetry system (SNOTEL) over a region of the Western United States that covers parts of the Upper Colorado Basin. We also compared the SWE product estimated from the Special Sensor Microwave Imager (SSM/I) and Scanning Multichannel Microwave Radiometer (SMMR) to the SNODAS and SNOTEL SWE datasets. Agreement between the spatial distribution of the simulated SWE with both SNODAS and SNOTEL was high for the two model runs for the entire snow accumulation period. Model-simulated SWEs, both with MPE and TRMM, were significantly correlated spatially on average with the SNODAS (r = 0.81 and r = 0.54; d.f. = 543) and the SNOTEL SWE (r = 0.85 and r = 0.55; d.f. = 543); for monthly basin-wide simulated average SWE the correlation was also highly significant (r = 0.95 and r = 0.73; d.f. = 12). The SWE estimated from the passive microwave imagery was not correlated with either the SNODAS SWE (r = 0.14, d.f. = 7) or the SNOTEL-reported SWE values (r = 0.08, d.f. = 7). The agreement between modeled SWE and the SWE recorded by SNODAS and SNOTEL weakened during the snowmelt period due to an underestimation bias of the air temperature that was used as model input forcing.
Evaluation of coral reef carbonate production models at a global scale
NASA Astrophysics Data System (ADS)
Jones, N. S.; Ridgwell, A.; Hendy, E. J.
2014-09-01
Calcification by coral reef communities is estimated to account for half of all carbonate produced in shallow water environments and more than 25% of the total carbonate buried in marine sediments globally. Production of calcium carbonate by coral reefs is therefore an important component of the global carbon cycle. It is also threatened by future global warming and other global change pressures. Numerical models of reefal carbonate production are essential for understanding how carbonate deposition responds to environmental conditions including future atmospheric CO2 concentrations, but these models must first be evaluated in terms of their skill in recreating present-day calcification rates. Here we evaluate four published model descriptions of reef carbonate production in terms of their predictive power, at both local and global scales, by comparing carbonate budget outputs with independent estimates. We also compile available global data on reef calcification to produce an observation-based dataset for the model evaluation. The four calcification models are based on functions sensitive to combinations of light availability, aragonite saturation (Ωa) and temperature and were implemented within a specifically developed global framework, the Global Reef Accretion Model (GRAM). None of the four models correlated with independent rate estimates of whole reef calcification. The temperature-only approach was the only model output to significantly correlate with coral-calcification rate observations. The absence of any predictive power for whole reef systems, even when consistent at the scale of individual corals, points to the overriding importance of coral cover estimates in the calculations. Our work highlights the need for an ecosystem modeling approach, accounting for population dynamics in terms of mortality and recruitment and hence coral cover, in estimating global reef carbonate budgets. In addition, validation of reef carbonate budgets is severely hampered by limited and inconsistent methodology in reef-scale observations.
Sreenivasa, Manish; Ayusawa, Ko; Nakamura, Yoshihiko
2016-05-01
This study develops a multi-level neuromuscular model consisting of topological pools of spiking motor, sensory and interneurons controlling a bi-muscular model of the human arm. The spiking output of motor neuron pools was used to drive muscle actions and skeletal movement via neuromuscular junctions. Feedback information from muscle spindles was relayed via monosynaptic excitatory and disynaptic inhibitory connections, to simulate spinal afferent pathways. Subject-specific model parameters were identified from human experiments by using inverse dynamics computations and optimization methods. The identified neuromuscular model was used to simulate the biceps stretch reflex and the results were compared to an independent dataset. The proposed model was able to track the recorded data and produce dynamically consistent neural spiking patterns, muscle forces and movement kinematics under varying conditions of external forces and co-contraction levels. This additional layer of detail in neuromuscular models has important relevance to the research communities of rehabilitation and clinical movement analysis by providing a mathematical approach to studying neuromuscular pathology.
Cahill, James A; Soares, André E R; Green, Richard E; Shapiro, Beth
2016-07-19
Understanding when species diverged aids in identifying the drivers of speciation, but the end of gene flow between populations can be difficult to ascertain from genetic data. We explore the use of pairwise sequential Markovian coalescent (PSMC) modelling to infer the timing of divergence between species and populations. PSMC plots generated using artificial hybrid genomes show rapid increases in effective population size at the time when the two parent lineages diverge, and this approach has been used previously to infer divergence between human lineages. We show that, even without high coverage or phased input data, PSMC can detect the end of significant gene flow between populations by comparing the PSMC output from artificial hybrids to the output of simulations with known demographic histories. We then apply PSMC to detect divergence times among lineages within two real datasets: great apes, and bears within the genus Ursus. Our results confirm most previously proposed divergence times for these lineages, and suggest that gene flow between recently diverged lineages may have been common among bears and great apes, including up to one million years of continued gene flow between chimpanzees and bonobos after the formation of the Congo River. This article is part of the themed issue 'Dating species divergences using rocks and clocks'. © 2016 The Author(s).
A Comparison of the Forecast Skills among Three Numerical Models
NASA Astrophysics Data System (ADS)
Lu, D.; Reddy, S. R.; White, L. J.
2003-12-01
Three numerical weather forecast models, MM5, COAMPS and WRF, operated through a joint effort of the NOAA HU-NCAS and Jackson State University (JSU) during summer 2003, have been chosen to study their forecast skill against observations. The models forecast over the same region with the same initialization, boundary conditions, forecast length and spatial resolution. The AVN global dataset has been ingested as initial conditions. A grid resolution of 27 km is chosen, representative of current mesoscale models. Forecasts of 36 h length are performed, with output at 12 h intervals. The key parameters used to evaluate the forecast skill include 12 h accumulated precipitation, sea level pressure, wind, surface temperature and dew point. Precipitation is evaluated statistically using conventional skill scores, Threat Score (TS) and Bias Score (BS), for different threshold values based on 12 h rainfall observations, whereas other statistical methods such as Mean Error (ME), Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) are applied to the other forecast parameters.
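For reference, the verification measures named above can be computed as in the following sketch, assuming paired arrays of forecast and observed values at matched points; the sample numbers are illustrative.

```python
# Illustrative computation of the verification scores named above, assuming paired
# arrays of forecast and observed 12 h precipitation (or temperature, SLP, etc.).
import numpy as np

def categorical_scores(fcst, obs, threshold):
    """Threat Score (TS) and Bias Score (BS) for one precipitation threshold."""
    f, o = fcst >= threshold, obs >= threshold
    hits = np.sum(f & o)
    false_alarms = np.sum(f & ~o)
    misses = np.sum(~f & o)
    ts = hits / (hits + misses + false_alarms)    # 1 = perfect, 0 = no skill
    bs = (hits + false_alarms) / (hits + misses)  # >1 over-forecast, <1 under-forecast
    return ts, bs

def continuous_scores(fcst, obs):
    """ME, MAE and RMSE for continuous variables such as temperature or SLP."""
    err = fcst - obs
    return err.mean(), np.abs(err).mean(), np.sqrt((err ** 2).mean())

fcst = np.array([0.0, 3.2, 10.5, 1.1, 25.0])      # toy 12 h rainfall forecasts (mm)
obs = np.array([0.2, 2.0, 12.0, 0.0, 18.0])       # matched observations (mm)
print(categorical_scores(fcst, obs, threshold=2.5))
print(continuous_scores(fcst, obs))
```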
Cordeiro, Taynara Cristina; Barrella, Walter; Butturi-Gomes, Davi; Petrere Júnior, Miguel
2018-03-01
Given the complexity of the dynamics in litter reposition, our objective was to model the possible main and interaction effects of tidal oscillations, seasons of the year and the moon phases on the solid waste on Santos beaches. A total of 80 collections were carried out using quadrat sampling, from which we classified, counted and weighed all residue items. We fitted mixed Hurdle models to the output datasets and performed hypothesis tests based on this framework. We found plastic to be the most abundant residue in all seasons, moon phases and tides, followed by Styrofoam and wood. Our models suggest the strongest effect was due to seasonal variations, which, in turn, may be related to different human activities. Although the dynamics of the different components showed independence from all interaction structures, plastics depended on the interaction of tide and season, whose impact on estuarine life and ecosystem services should be further investigated. Copyright © 2018 Elsevier Ltd. All rights reserved.
NASA Technical Reports Server (NTRS)
Gill, Roger; Schnase, John L.
2012-01-01
The Invasive Species Forecasting System (ISFS) is an online decision support system that allows users to load point occurrence field sample data for a plant species of interest and quickly generate habitat suitability maps for geographic regions of interest, such as a national park, monument, forest, or refuge. Target customers for ISFS are natural resource managers and decision makers who have a need for scientifically valid, model-based predictions of the habitat suitability of plant species of management concern. In a joint project involving NASA and the Maryland Department of Natural Resources, ISFS has been used to model the potential distribution of Wavyleaf Basketgrass in Maryland's Chesapeake Bay Watershed. Maximum entropy techniques are used to generate predictive maps using predictor datasets derived from remotely sensed data and climate simulation outputs. The workflow to run a model is implemented in an iRODS microservice using a custom ISFS file driver that clips and re-projects data to geographic regions of interest, then shells out to perform MaxEnt processing on the input data. When the model completes, all output files and maps from the model run are registered in iRODS and made accessible to the user. The ISFS user interface is a web browser that uses the iRODS PHP client to interact with the ISFS/iRODS server. ISFS is designed to reside in a VMware virtual machine running SLES 11 and iRODS 3.0. The ISFS virtual machine is hosted in a VMware vSphere private cloud infrastructure to deliver the online service.
Coupled catastrophes: sudden shifts cascade and hop among interdependent systems
Barnett, George; D'Souza, Raissa M.
2015-01-01
An important challenge in several disciplines is to understand how sudden changes can propagate among coupled systems. Examples include the synchronization of business cycles, population collapse in patchy ecosystems, markets shifting to a new technology platform, collapses in prices and in confidence in financial markets, and protests erupting in multiple countries. A number of mathematical models of these phenomena have multiple equilibria separated by saddle-node bifurcations. We study this behaviour in its normal form as fast–slow ordinary differential equations. In our model, a system consists of multiple subsystems, such as countries in the global economy or patches of an ecosystem. Each subsystem is described by a scalar quantity, such as economic output or population, that undergoes sudden changes via saddle-node bifurcations. The subsystems are coupled via their scalar quantity (e.g. trade couples economic output; diffusion couples populations); that coupling moves the locations of their bifurcations. The model demonstrates two ways in which sudden changes can propagate: they can cascade (one causing the next), or they can hop over subsystems. The latter is absent from classic models of cascades. For an application, we study the Arab Spring protests. After connecting the model to sociological theories that have bistability, we use socioeconomic data to estimate relative proximities to tipping points and Facebook data to estimate couplings among countries. We find that although protests tend to spread locally, they also seem to 'hop' over countries, as in the stylized model; this result highlights a new class of temporal motifs in longitudinal network datasets. PMID:26559684
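The following is a minimal sketch of the kind of coupled fast-slow system described above: each subsystem is a scalar with a fold (saddle-node) nonlinearity, the coupling shifts its effective bifurcation parameter, and one subsystem is slowly driven toward its tipping point. The specific equations, coupling matrix and parameter values are illustrative assumptions, not the paper's model.

```python
# A minimal sketch of coupled subsystems with saddle-node (fold) nonlinearities.
# Subsystem i has a scalar state x_i; its control parameter r_i drifts slowly and is
# shifted by the states of its neighbours through the coupling matrix K.
import numpy as np
from scipy.integrate import solve_ivp

K = np.array([[0.0, 0.3, 0.0],
              [0.3, 0.0, 0.3],
              [0.0, 0.3, 0.0]])        # illustrative coupling matrix
r = np.array([-1.0, -1.2, -1.5])       # distance below the tipping threshold
drift = np.array([0.02, 0.0, 0.0])     # only subsystem 0 drifts toward its tipping point

def rhs(t, x):
    # dx_i/dt = (r_i + drift_i * t) + sum_j K_ij x_j + x_i - x_i^3  (fold-type bistability)
    return (r + drift * t) + K @ x + x - x ** 3

x0 = -np.ones(3)                       # all subsystems start in the "low" state
sol = solve_ivp(rhs, (0, 200), x0, max_step=0.1)
print("final states:", sol.y[:, -1])   # which subsystems have tipped to the "high" state?
```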
Regional Climate Models Downscaling in the Alpine Area with Multimodel SuperEnsemble
NASA Astrophysics Data System (ADS)
Cane, D.; Barbarino, S.; Renier, L.; Ronchi, C.
2012-04-01
The climatic scenarios show a strong signal of warming in the Alpine area already by the mid-21st century. The climate simulations, however, even when obtained with Regional Climate Models (RCMs), are affected by strong errors when compared with observations in the control period, due to their difficulties in representing the complex orography of the Alps and limitations in their physical parametrizations. In this work we use a selection of RCM runs from the ENSEMBLES project, carefully chosen in order to maximise the variety of leading Global Climate Models and of the RCMs themselves, calculated on the SRES scenario A1B. The reference observations for the Greater Alpine Area are extracted from the European dataset E-OBS produced by the project ENSEMBLES with an available resolution of 25 km. For the study area of Piemonte, daily temperature and precipitation observations (1957-present) were carefully gridded onto a 14-km grid over the Piemonte Region with an Optimal Interpolation technique. We applied the Multimodel SuperEnsemble technique to temperature fields, reducing the large biases of the RCM temperature fields compared to observations in the control period. We also propose the first application to RCMs of a new probabilistic Multimodel SuperEnsemble Dressing technique to estimate precipitation fields, already applied successfully to weather forecast models, with a careful description of precipitation probability density functions conditioned on the model outputs. This technique reduces the strong precipitation overestimation by RCMs over the Alpine chain and reproduces the monthly behaviour of observed precipitation in the control period far better than the direct model outputs.
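A minimal sketch of the Multimodel SuperEnsemble idea for a temperature series is given below: regression coefficients are fitted against observations over a training (control) period and then applied to model anomalies outside it. The synthetic data, training split and single-point setup are assumptions, and the probabilistic dressing variant is not shown.

```python
# Minimal Multimodel SuperEnsemble sketch for one grid point (illustrative data only):
# fit weights on a training period, apply them to model anomalies afterwards.
import numpy as np

rng = np.random.default_rng(1)
n_time, n_models = 1000, 5
obs = rng.normal(10, 5, n_time)                               # observed temperature series
models = obs[:, None] + rng.normal(2, 3, (n_time, n_models))  # biased model outputs

train = slice(0, 700)                                         # control (training) period
anom = models - models[train].mean(axis=0)                    # anomalies w.r.t. training mean
A = np.linalg.lstsq(anom[train], obs[train] - obs[train].mean(), rcond=None)[0]

superensemble = obs[train].mean() + anom @ A                  # bias-corrected weighted estimate
rmse_raw = np.sqrt(((models.mean(axis=1) - obs) ** 2)[700:].mean())
rmse_se = np.sqrt(((superensemble - obs) ** 2)[700:].mean())
print(f"RMSE multi-model mean: {rmse_raw:.2f}  superensemble: {rmse_se:.2f}")
```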
High resolution global climate modelling; the UPSCALE project, a large simulation campaign
NASA Astrophysics Data System (ADS)
Mizielinski, M. S.; Roberts, M. J.; Vidale, P. L.; Schiemann, R.; Demory, M.-E.; Strachan, J.; Edwards, T.; Stephens, A.; Lawrence, B. N.; Pritchard, M.; Chiu, P.; Iwi, A.; Churchill, J.; del Cano Novales, C.; Kettleborough, J.; Roseblade, W.; Selwood, P.; Foster, M.; Glover, M.; Malcolm, A.
2014-01-01
The UPSCALE (UK on PRACE: weather-resolving Simulations of Climate for globAL Environmental risk) project constructed and ran an ensemble of HadGEM3 (Hadley centre Global Environment Model 3) atmosphere-only global climate simulations over the period 1985-2011, at resolutions of N512 (25 km), N216 (60 km) and N96 (130 km), as used in current global weather forecasting, seasonal prediction and climate modelling respectively. Alongside these present climate simulations a parallel ensemble looking at extremes of future climate was run, using a time-slice methodology to consider conditions at the end of this century. These simulations were primarily performed using a 144 million core hour, single-year grant of computing time from PRACE (the Partnership for Advanced Computing in Europe) in 2012, with additional resources supplied by the Natural Environmental Research Council (NERC) and the Met Office. Almost 400 terabytes of simulation data were generated on the HERMIT supercomputer at the High Performance Computing Center Stuttgart (HLRS), and transferred to the JASMIN super-data cluster provided by the Science and Technology Facilities Council Centre for Environmental Data Archival (STFC CEDA) for analysis and storage. In this paper we describe the implementation of the project, present the technical challenges in terms of optimisation, data output, transfer and storage that such a project involves and include details of the model configuration and the composition of the UPSCALE dataset. This dataset is available for scientific analysis to allow assessment of the value of model resolution in both present and potential future climate conditions.
Bridging Archival Standards: Building Software to Translate Metadata Between PDS3 and PDS4
NASA Astrophysics Data System (ADS)
De Cesare, C. M.; Padams, J. H.
2018-04-01
Transitioning datasets from PDS3 to PDS4 requires manual and detail-oriented work. To increase efficiency and reduce human error, we've built the Label Mapping Tool, which compares a PDS3 label to a PDS4 label template and outputs mappings between the two.
NASA Astrophysics Data System (ADS)
Klehmet, K.; Rockel, B.
2012-04-01
The analysis of long-term changes and variability of climate variables for the large areal extent of Siberia - covering arctic, subarctic and temperate northern latitudes - is hampered by the sparseness of in-situ observations. To counteract this deficiency we aimed to provide a reconstruction of regional climate for the period 1948-2010, obtaining homogeneous, consistent fields of various terrestrial and atmospheric parameters for Siberia. In order to additionally obtain a higher temporal and spatial resolution than global datasets can provide, we performed the reconstruction using the regional climate model COSMO-CLM (the climate mode of the limited-area model COSMO developed by the German Weather Service). However, the question arises whether the dynamically downscaled reanalysis data can improve the representation of recent climate conditions. As global forcing for the initialization and the regional boundaries we use the NCEP-1 reanalysis of the National Centers for Environmental Prediction, since it has the longest temporal data coverage among the reanalysis products. Additionally, spectral nudging is applied to prevent the regional model from deviating from the prescribed large-scale circulation within the whole simulation domain. The area of interest covers a region in Siberia, spanning from the Laptev Sea and Kara Sea to Northern Mongolia and from the West Siberian Lowland to the border of the Sea of Okhotsk. The current horizontal resolution is about 50 km, which is planned to be increased to 25 km. To answer the question, we investigate spatial and temporal characteristics of temperature and precipitation in the model output in comparison to global reanalysis data (NCEP-1, ERA40, ERA-Interim). As a reference, Russian station data from the "Global Summary of the Day" dataset, provided by NCDC, are used. Temperature is analyzed with respect to its climatological spatial patterns across the model domain and its variability of extremes based on climate indices derived from daily mean, maximum and minimum temperature (e.g. frost days) for different subregions. The decreasing number of frost days from north to south of the region, calculated from the reanalysis datasets and COSMO-CLM output, indicates the temperature gradient from the arctic to temperate latitudes. For most of the considered subregions NCEP-1 shows more frost days than ERA-Interim and COSMO-CLM.
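As an example of the climate indices mentioned above, the frost-days index (number of days with daily minimum temperature below 0 °C) can be computed from a gridded daily series as in this short sketch; the array shape and toy data are assumptions.

```python
# Illustrative computation of the "frost days" climate index from a gridded daily
# minimum-temperature array; the array shape and random data are placeholders.
import numpy as np

def frost_days(tmin_daily_celsius):
    """tmin_daily_celsius: array of shape (n_days, n_lat, n_lon); returns a 2-D count map."""
    return np.sum(tmin_daily_celsius < 0.0, axis=0)

# toy example: one year of daily minima over a 3 x 4 grid
tmin = np.random.default_rng(0).normal(loc=2.0, scale=10.0, size=(365, 3, 4))
print(frost_days(tmin))
```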
Figueroa-Torres, Gonzalo M; Pittman, Jon K; Theodoropoulos, Constantinos
2017-10-01
Microalgal starch and lipids, carbon-based storage molecules, are useful as potential biofuel feedstocks. In this work, cultivation strategies maximising starch and lipid formation were established by developing a multi-parameter kinetic model describing microalgal growth as well as starch and lipid formation, in conjunction with laboratory-scale experiments. Growth dynamics are driven by nitrogen-limited mixotrophic conditions, known to increase cellular starch and lipid contents whilst enhancing biomass growth. Model parameters were computed by fitting model outputs to a range of experimental datasets from batch cultures of Chlamydomonas reinhardtii. Predictive capabilities of the model were established against different experimental data. The model was subsequently used to compute optimal nutrient-based cultivation strategies in terms of initial nitrogen and carbon concentrations. Model-based optimal strategies yielded a significant increase of 261% for starch (0.065 g C L-1) and 66% for lipid (0.08 g C L-1) production compared to base-case conditions (0.018 g C L-1 starch, 0.048 g C L-1 lipids). Copyright © 2017 Elsevier Ltd. All rights reserved.
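The parameter-fitting step can be illustrated with the following sketch, in which a deliberately simplified growth/storage ODE system is fitted to synthetic batch-culture data by least squares; the equations, parameter names and data are placeholders, not the published multi-parameter model.

```python
# A minimal sketch of fitting an ODE growth/storage model to batch-culture data.
# Equations, parameters and "data" are illustrative placeholders only.
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import least_squares

t_data = np.linspace(0, 10, 11)
x_data = np.array([0.05, 0.09, 0.16, 0.27, 0.42, 0.60, 0.75, 0.85, 0.91, 0.94, 0.96])
s_data = np.array([0.00, 0.00, 0.01, 0.02, 0.03, 0.05, 0.08, 0.11, 0.13, 0.14, 0.15])

def model(t, y, mu, k_s):
    x, s = y                        # normalised biomass and storage product
    dx = mu * x * (1 - x)           # logistic growth toward a normalised capacity
    ds = k_s * x * (1 - x)          # storage formation increases as growth saturates
    return [dx, ds]

def residuals(p):
    sol = solve_ivp(model, (0, 10), [0.05, 0.0], t_eval=t_data, args=tuple(p))
    return np.concatenate([sol.y[0] - x_data, sol.y[1] - s_data])

fit = least_squares(residuals, x0=[0.5, 0.05])
print("fitted mu, k_s:", fit.x)
```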
Xu, Yun; Muhamadali, Howbeer; Sayqal, Ali; Dixon, Neil; Goodacre, Royston
2016-10-28
Partial least squares (PLS) is one of the most commonly used supervised modelling approaches for analysing multivariate metabolomics data. PLS is typically employed as either a regression model (PLS-R) or a classification model (PLS-DA). However, in metabolomics studies it is common to investigate multiple, potentially interacting, factors simultaneously following a specific experimental design. Such data often cannot be considered as a "pure" regression or a classification problem. Nevertheless, these data have often still been treated as a regression or classification problem and this could lead to ambiguous results. In this study, we investigated the feasibility of designing a hybrid target matrix Y that better reflects the experimental design than simple regression or binary class membership coding commonly used in PLS modelling. The new design of Y coding was based on the same principle used by structural modelling in machine learning techniques. Two real metabolomics datasets were used as examples to illustrate how the new Y coding can improve the interpretability of the PLS model compared to classic regression/classification coding.
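One way to picture the hybrid Y coding is sketched below for a synthetic two-factor design (a quantitative dose factor and a categorical strain factor): the target matrix combines a continuous column with dummy columns rather than using a single regression vector or one-hot class membership alone. The design, data and column choices are illustrative assumptions.

```python
# Illustrative sketch of encoding a two-factor experimental design into the PLS target
# matrix Y; the factors, data and component count are synthetic placeholders.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
dose = np.repeat([0.0, 0.5, 1.0], 20)               # quantitative factor (60 samples)
strain = np.tile(np.repeat([0, 1], 10), 3)          # categorical factor

# hybrid Y: one continuous column for dose plus one dummy column per strain level
Y = np.column_stack([dose, strain == 0, strain == 1]).astype(float)

X = rng.normal(size=(60, 200))                      # synthetic "metabolite" matrix
X[:, 0] += 2.0 * dose                               # a feature responding to dose
X[:, 1] += 1.5 * strain                             # a feature differing by strain

pls = PLSRegression(n_components=3).fit(X, Y)
print("R^2 of the multi-column fit:", pls.score(X, Y))
```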
Kar, Subrata; Majumder, D Dutta
2017-08-01
Investigation of brain cancer can detect the abnormal growth of tissue in the brain using computed tomography (CT) scans and magnetic resonance (MR) images of patients. The proposed method classifies brain tumors as either benign or malignant based on shape-based feature extraction. The authors used input variables such as shape distance (SD) and shape similarity measure (SSM) in fuzzy tools, and used fuzzy rules to evaluate the risk status as an output variable. We presented a classifier neural network system (NNS), namely Levenberg-Marquardt (LM), which is a feed-forward back-propagation learning algorithm used to train the NN for the status of brain cancer, if any, and which achieved satisfactory performance with 100% accuracy. The proposed methodology is divided into three phases. First, we find the region of interest (ROI) in the brain to detect the tumors using CT and MR images. Second, we extract the shape-based features, like SD and SSM, and grade the brain tumors as benign or malignant with the concept of the SD function and SSM as shape-based parameters. Third, we classify the brain cancers using neuro-fuzzy tools. In this experiment, we used a 16-sample database with SSM (μ) values and classified the benignancy or malignancy of the brain tumor lesions using the neuro-fuzzy system (NFS). We have developed a fuzzy expert system (FES) and NFS for early detection of brain cancer from CT and MR images. In this experiment, shape-based features, such as SD and SSM, were extracted from the ROI of brain tumor lesions. These shape-based features were considered as input variables and, using fuzzy rules, we were able to evaluate brain cancer risk values for each case. We used an NNS with LM, a feed-forward back-propagation learning algorithm, as a classifier for the diagnosis of brain cancer and achieved satisfactory performance with 100% accuracy. The proposed network was trained with MR image datasets of 16 cases. The 16 cases were fed to the ANN with 2 input neurons, one hidden layer of 10 neurons and 2 output neurons. Of the 16-sample database, 10 datasets were used for training, 3 for validation, and 3 for testing in the ANN classification system. From the SSM (μ) confusion matrix, the numbers of true positive, false positive, true negative and false negative outputs were 6, 0, 10, and 0, respectively. The sensitivity, specificity and accuracy were each equal to 100%. The method of diagnosing brain cancer presented in this study is a successful model to assist doctors in the screening and treatment of brain cancer patients. The presented FES successfully identified the presence of brain cancer in CT and MR images using the extracted shape-based features, with the NFS used for the identification of brain cancer in the early stages. From the analysis and diagnosis of the disease, the doctors can decide the stage of cancer and take the necessary steps for more accurate treatment. Here, we have presented an investigation and comparison study of the shape-based feature extraction method with the use of NFS for classifying brain tumors as showing normal or abnormal patterns. The results have proved that the shape-based features with the use of NFS can achieve a satisfactory performance with 100% accuracy. We intend to extend this methodology for the early detection of cancer in other regions such as the prostate region and human cervix.
The use of hierarchical clustering for the design of optimized monitoring networks
NASA Astrophysics Data System (ADS)
Soares, Joana; Makar, Paul Andrew; Aklilu, Yayne; Akingunola, Ayodeji
2018-05-01
Associativity analysis is a powerful tool to deal with large-scale datasets by clustering the data on the basis of (dis)similarity and can be used to assess the efficacy and design of air quality monitoring networks. We describe here our use of Kolmogorov-Zurbenko filtering and hierarchical clustering of NO2 and SO2 passive and continuous monitoring data to analyse and optimize air quality networks for these species in the province of Alberta, Canada. The methodology applied in this study assesses dissimilarity between monitoring station time series based on two metrics: 1 - R, R being the Pearson correlation coefficient, and the Euclidean distance; we find that both should be used in evaluating monitoring site similarity. We have combined the analytic power of hierarchical clustering with the spatial information provided by deterministic air quality model results, using the gridded time series of model output as potential station locations, as a proxy for assessing monitoring network design and for network optimization. We demonstrate that clustering results depend on the air contaminant analysed, reflecting the difference in the respective emission sources of SO2 and NO2 in the region under study. Our work shows that much of the signal identifying the sources of NO2 and SO2 emissions resides in shorter timescales (hourly to daily) due to short-term variation of concentrations and that longer-term averages in data collection may lose the information needed to identify local sources. However, the methodology identifies stations mainly influenced by seasonality, if larger timescales (weekly to monthly) are considered. We have performed the first dissimilarity analysis based on gridded air quality model output and have shown that the methodology is capable of generating maps of subregions within which a single station will represent the entire subregion, to a given level of dissimilarity. We have also shown that our approach is capable of identifying different sampling methodologies as well as outliers (stations' time series which are markedly different from all others in a given dataset).
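The clustering step can be illustrated as follows: hierarchical clustering of station time series under the two dissimilarity metrics named above (1 - R via SciPy's "correlation" distance, and Euclidean distance). The synthetic station matrix and the number of clusters are assumptions, and the Kolmogorov-Zurbenko filtering step is omitted.

```python
# Illustrative hierarchical clustering of station time series under the two
# dissimilarity metrics discussed above; the station matrix is a synthetic placeholder.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
stations = rng.normal(size=(12, 24 * 365))             # 12 stations, one year of hourly data
stations[:4] += np.sin(np.linspace(0, 50, 24 * 365))   # give 4 stations a shared signal

for metric in ("correlation", "euclidean"):            # "correlation" distance = 1 - R
    d = pdist(stations, metric=metric)
    tree = linkage(d, method="average")
    labels = fcluster(tree, t=4, criterion="maxclust")
    print(metric, "->", labels)
```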
NASA Technical Reports Server (NTRS)
Jonathan L. Case; Kumar, Sujay V.; Srikishen, Jayanthi; Jedlovec, Gary J.
2010-01-01
One of the most challenging weather forecast problems in the southeastern U.S. is daily summertime pulse-type convection. During the summer, atmospheric flow and forcing are generally weak in this region; thus, convection typically initiates in response to local forcing along sea/lake breezes, and other discontinuities often related to horizontal gradients in surface heating rates. Numerical simulations of pulse convection usually have low skill, even in local predictions at high resolution, due to the inherent chaotic nature of these precipitation systems. Forecast errors can arise from assumptions within parameterization schemes, model resolution limitations, and uncertainties in both the initial state of the atmosphere and land surface variables such as soil moisture and temperature. For this study, it is hypothesized that high-resolution, consistent representations of surface properties such as soil moisture, soil temperature, and sea surface temperature (SST) are necessary to better simulate the interactions between the surface and atmosphere, and ultimately improve predictions of summertime pulse convection. This paper describes a sensitivity experiment using the Weather Research and Forecasting (WRF) model. Interpolated land and ocean surface fields from a large-scale model are replaced with high-resolution datasets provided by unique NASA assets in an experimental simulation: the Land Information System (LIS) and Moderate Resolution Imaging Spectroradiometer (MODIS) SSTs. The LIS is run in an offline mode for several years at the same grid resolution as the WRF model to provide compatible land surface initial conditions in an equilibrium state. The MODIS SSTs provide detailed analyses of SSTs over the oceans and large lakes compared to current operational products. The WRF model runs initialized with the LIS+MODIS datasets result in a reduction in the overprediction of rainfall areas; however, the skill is almost equally as low in both experiments using traditional verification methodologies. Output from object-based verification within NCAR's Meteorological Evaluation Tools reveals that the WRF runs initialized with LIS+MODIS data consistently generated precipitation objects that better matched observed precipitation objects, especially at higher precipitation intensities. The LIS+MODIS runs produced on average a 4% increase in matched precipitation areas and a simultaneous 4% decrease in unmatched areas during three months of daily simulations.
An automated system to simulate the River discharge in Kyushu Island using the H08 model
NASA Astrophysics Data System (ADS)
Maji, A.; Jeon, J.; Seto, S.
2015-12-01
Kyushu Island is located in the southwestern part of Japan, and it is often affected by typhoons and a Baiu front. Severe water-related disasters have been recorded in Kyushu Island. On the other hand, because of the high population density and the needs of crop growth, water resources are an important issue for Kyushu Island. The simulation of river discharge is important for water resource management and early warning of water-related disasters. This study attempts to apply the H08 model to simulate river discharge in Kyushu Island. Geospatial meteorological and topographical data were obtained from the Japanese Ministry of Land, Infrastructure, Transport and Tourism (MLIT) and the Automated Meteorological Data Acquisition System (AMeDAS) of the Japan Meteorological Agency (JMA). The number of AMeDAS observation stations is limited and is not quite satisfactory for the application of water resources models in Kyushu. It is necessary to spatially interpolate the point data to produce gridded datasets. The meteorological grid dataset is produced by considering elevation dependence. Solar radiation is estimated from hourly sunshine duration by a conventional formula. We successfully improved the accuracy of the interpolated data just by considering elevation dependence and found that the bias is related to geographical location. The rain/snow classification is done by the H08 model and is validated by comparing estimated and observed snow rates. The estimates tend to be larger than the corresponding observed values. A system to automatically produce a daily meteorological grid dataset is being constructed. The geospatial river network data were produced by ArcGIS and utilized in the H08 model to simulate the river discharge. First, this study compares simulated and measured specific discharge, which is the ratio of discharge to watershed area. Significant errors between simulated and measured data were seen in some rivers. Second, the outputs of the coupled model, including the crop growth and reservoir operation modules, were analyzed. However, there are differences between the simulated and measured values. We need to improve the dam operation, artificial water intake, and parameters such as soil depth and flow velocity in the river.
Real-time data for estimating a forward-looking interest rate rule of the ECB.
Bletzinger, Tilman; Wieland, Volker
2017-12-01
The purpose of the data presented in this article is to use it in ex post estimations of interest rate decisions by the European Central Bank (ECB), as it is done by Bletzinger and Wieland (2017) [1]. The data is of quarterly frequency from 1999 Q1 until 2013 Q2 and consists of the ECB's policy rate, inflation rate, real output growth and potential output growth in the euro area. To account for forward-looking decision making in the interest rate rule, the data consists of expectations about future inflation and output dynamics. While potential output is constructed based on data from the European Commission's annual macro-economic database, inflation and real output growth are taken from two different sources both provided by the ECB: the Survey of Professional Forecasters and projections made by ECB staff. Careful attention was given to the publication date of the collected data to ensure a real-time dataset only consisting of information which was available to the decision makers at the time of the decision.
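A forward-looking rule of the kind estimated from these data can be written, in a stylized form, as an interest-rate smoothing rule that responds to expected inflation and the expected growth gap. The sketch below evaluates one such rule; all coefficient values are placeholders rather than the estimates of Bletzinger and Wieland (2017).

```python
# Illustrative forward-looking interest-rate rule:
# i_t = rho * i_{t-1} + (1 - rho) * (r* + pi* + a*(E_t pi - pi*) + b*(E_t growth - potential))
# All coefficient values below are placeholder assumptions, not estimated parameters.
def policy_rate(i_prev, exp_infl, exp_growth, pot_growth,
                rho=0.85, a=1.5, b=0.5, r_star=1.0, pi_star=1.9):
    target = r_star + pi_star + a * (exp_infl - pi_star) + b * (exp_growth - pot_growth)
    return rho * i_prev + (1.0 - rho) * target

# one illustrative quarter: previous rate 1.0%, expected inflation 1.6%,
# expected real growth 1.2% against potential growth of 1.5%
print(policy_rate(1.0, 1.6, 1.2, 1.5))
```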
NASA Astrophysics Data System (ADS)
Ota, Shunsuke; Deguchi, Daisuke; Kitasaka, Takayuki; Mori, Kensaku; Suenaga, Yasuhito; Hasegawa, Yoshinori; Imaizumi, Kazuyoshi; Takabatake, Hirotsugu; Mori, Masaki; Natori, Hiroshi
2008-03-01
This paper presents a method for automated anatomical labeling of bronchial branches (ALBB) extracted from 3D CT datasets. The proposed method constructs classifiers that output anatomical names of bronchial branches by employing a machine-learning approach. We also present its application to a bronchoscopy guidance system. Since the bronchus has a complex tree structure, bronchoscopists easily become disoriented and lose their way to a target location. A bronchoscopy guidance system is therefore strongly desired to assist bronchoscopists. In such a guidance system, automated presentation of anatomical names provides quite useful information for bronchoscopy. Although several methods for automated ALBB have been reported, most of them constructed models taking only variations of branching patterns into account and did not consider those of running directions. Since the running directions of bronchial branches differ greatly between individuals, they could not perform ALBB accurately when the running directions of bronchial branches were different from those of the models. Our method addresses these problems by utilizing a machine-learning approach. The actual procedure consists of three steps: (a) extraction of bronchial tree structures from 3D CT datasets, (b) construction of classifiers using the multi-class AdaBoost technique, and (c) automated classification of bronchial branches by using the constructed classifiers. We applied the proposed method to 51 cases of 3D CT datasets. The constructed classifiers were evaluated by a leave-one-out scheme. The experimental results showed that the proposed method could assign correct anatomical names to 89.1% of bronchial branches, up to segmental lobe branches. Also, we confirmed that presenting the anatomical names of bronchial branches on real bronchoscopic views is quite useful for assisting bronchoscopy.
ClimateSpark: An In-memory Distributed Computing Framework for Big Climate Data Analytics
NASA Astrophysics Data System (ADS)
Hu, F.; Yang, C. P.; Duffy, D.; Schnase, J. L.; Li, Z.
2016-12-01
Massive array-based climate data is being generated from global surveillance systems and model simulations. These data are widely used to analyze environmental problems such as climate change, natural hazards, and public health. However, extracting the underlying information from these big climate datasets is challenging due to both data- and computing-intensive issues in data processing and analysis. To tackle these challenges, this paper proposes ClimateSpark, an in-memory distributed computing framework to support big climate data processing. In ClimateSpark, a spatiotemporal index is developed to enable Apache Spark to treat array-based climate data (e.g. netCDF4, HDF4) as native formats, which are stored in the Hadoop Distributed File System (HDFS) without any preprocessing. Based on the index, spatiotemporal query services are provided to retrieve datasets according to a defined geospatial and temporal bounding box. The data subsets are read out, and a data partition strategy is applied to split the queried data equally across the computing nodes and store them in memory as climateRDDs for processing. By leveraging Spark SQL and User Defined Functions (UDFs), climate data analysis operations can be conducted using intuitive SQL. ClimateSpark is evaluated by two use cases using the NASA Modern-Era Retrospective Analysis for Research and Applications (MERRA) climate reanalysis dataset. One use case is to conduct the spatiotemporal query and visualize the subset results in animation; the other is to compare different climate model outputs using a Taylor-diagram service. Experimental results show that ClimateSpark can significantly accelerate data query and processing, and enable complex analysis services in an SQL-style fashion.
Feature weighting using particle swarm optimization for learning vector quantization classifier
NASA Astrophysics Data System (ADS)
Dongoran, A.; Rahmadani, S.; Zarlis, M.; Zakarias
2018-03-01
This paper proposes a method of feature weighting for classification tasks with the competitive-learning artificial neural network LVQ. The feature weighting method searches for attribute weights using PSO so as to improve the resulting output. This method is then applied to the LVQ classifier and tested on 3 datasets obtained from the UCI Machine Learning Repository. An accuracy analysis is then carried out for two approaches: the first uses LVQ1 and is referred to as LVQ-Classifier; the second, referred to as PSOFW-LVQ, is the proposed model. The results show that the PSO algorithm is capable of finding attribute weights that increase LVQ-classifier accuracy.
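A rough sketch of the idea follows: PSO searches for per-feature weights that maximise the accuracy of a simple nearest-prototype classifier (class means standing in for LVQ codebook vectors). The dataset, swarm settings and the use of class means instead of trained LVQ1 prototypes are all illustrative assumptions.

```python
# Illustrative PSO feature weighting for a nearest-prototype (LVQ-style) classifier.
# Class means stand in for LVQ codebook vectors; all PSO settings are placeholders.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
prototypes = np.array([X_tr[y_tr == c].mean(axis=0) for c in np.unique(y_tr)])

def accuracy(weights, X_eval, y_eval):
    # weighted Euclidean distance to each prototype, then nearest-prototype prediction
    d = np.linalg.norm((X_eval[:, None, :] - prototypes) * weights, axis=2)
    return np.mean(d.argmin(axis=1) == y_eval)

rng = np.random.default_rng(0)
n_particles, n_feat = 20, X.shape[1]
pos = rng.uniform(0, 1, (n_particles, n_feat))
vel = np.zeros_like(pos)
pbest, pbest_fit = pos.copy(), np.array([accuracy(p, X_tr, y_tr) for p in pos])
gbest = pbest[pbest_fit.argmax()]

for _ in range(50):                                   # standard PSO update loop
    r1, r2 = rng.uniform(size=pos.shape), rng.uniform(size=pos.shape)
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, 0, 1)
    fit = np.array([accuracy(p, X_tr, y_tr) for p in pos])
    improved = fit > pbest_fit
    pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
    gbest = pbest[pbest_fit.argmax()]

print("weighted accuracy:", accuracy(gbest, X_te, y_te),
      "unweighted:", accuracy(np.ones(n_feat), X_te, y_te))
```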
Lamberto, Giuliano; Martelli, Saulo; Cappozzo, Aurelio; Mazzà, Claudia
2017-09-06
Musculoskeletal models are widely used to estimate joint kinematics, intersegmental loads, and muscle and joint contact forces during movement. These estimates can be heavily affected by the soft tissue artefact (STA) when input positional data are obtained using stereophotogrammetry, but this aspect has not yet been fully characterised for muscle and joint forces. This study aims to assess the sensitivity to the STA of three open-source musculoskeletal models, implemented in OpenSim. A baseline dataset of marker trajectories was created for each model from experimental data of one healthy volunteer. Five hundred STA realizations were then statistically generated using a marker-dependent model of the pelvis and lower limb artefact and added to the baseline data. The STA's impact on the musculoskeletal model estimates was finally quantified using a Monte Carlo analysis. The modelled STA distributions were in line with the literature. Observed output variations were comparable across the three models, and sensitivity to the STA was evident for most investigated quantities. Shape, magnitude and timing of the joint angle and moment time histories were not significantly affected throughout the entire gait cycle, whereas magnitude variations were observed for muscle and joint forces. Ranges of contact force variations differed between joints, with hip variations up to 1.8 times body weight observed. Variations of more than 30% were observed for some of the muscle forces. In conclusion, musculoskeletal simulations using stereophotogrammetry may be safely run when only interested in overall output patterns. Caution should be paid when more accurate estimated values are needed. Copyright © 2016 The Authors. Published by Elsevier Ltd. All rights reserved.
Hg concentrations in fish from coastal waters of California and Western North America
Davis, Jay; Ross, John; Bezalel, Shira; Sim, Lawrence; Bonnema, Autumn; Ichikawa, Gary; Heim, Wes; Schiff, Kenneth C; Eagles-Smith, Collin A.; Ackerman, Joshua T.
2016-01-01
The State of California conducted an extensive and systematic survey of mercury (Hg) in fish from the California coast in 2009 and 2010. The California survey sampled 3483 fish representing 46 species at 68 locations, and demonstrated that methylHg in fish presents a widespread exposure risk to fish consumers. Most of the locations sampled (37 of 68) had a species with an average concentration above 0.3 μg/g wet weight (ww), and 10 locations an average above 1.0 μg/g ww. The recent and robust dataset from California provided a basis for a broader examination of spatial and temporal patterns in fish Hg in coastal waters of Western North America. There is a striking lack of data in publicly accessible databases on Hg and other contaminants in coastal fish. An assessment of the raw data from these databases suggested the presence of relatively high concentrations along the California coast and in Puget Sound, and relatively low concentrations along the coasts of Alaska and Oregon, and the outer coast of Washington. The dataset suggests that Hg concentrations of public health concern can be observed at any location on the coast of Western North America where long-lived predator species are sampled. Output from a linear mixed-effects model resembled the spatial pattern observed for the raw data and suggested, based on the limited dataset, a lack of trend in fish Hg over the nearly 30-year period covered by the dataset. Expanded and continued monitoring, accompanied by rigorous data management procedures, would be of great value in characterizing methylHg exposure, and tracking changes in contamination of coastal fish in response to possible increases in atmospheric Hg emissions in Asia, climate change, and terrestrial Hg control efforts in coastal watersheds.
NASA Astrophysics Data System (ADS)
Rock, Gilles; Fischer, Kim; Schlerf, Martin; Gerhards, Max; Udelhoven, Thomas
2017-04-01
The development and optimization of image processing algorithms requires the availability of datasets depicting every step from the Earth's surface to the sensor's detector. The lack of ground truth data makes it necessary to develop algorithms on simulated data. The simulation of hyperspectral remote sensing data is a useful tool for a variety of tasks such as the design of systems, the understanding of the image formation process, and the development and validation of data processing algorithms. An end-to-end simulator has been set up consisting of a forward simulator, a backward simulator and a validation module. The forward simulator derives radiance datasets based on laboratory sample spectra, applies atmospheric contributions using radiative transfer equations, and simulates the instrument response using configurable sensor models. This is followed by the backward simulation branch, consisting of an atmospheric correction (AC), a temperature and emissivity separation (TES) or a hybrid AC and TES algorithm. An independent validation module allows the comparison between input and output datasets and the benchmarking of different processing algorithms. In this study, hyperspectral thermal infrared scenes of a variety of surfaces have been simulated to analyze existing AC and TES algorithms. The ARTEMISS algorithm was optimized and benchmarked against the original implementations. The errors in TES were found to be related to incorrect water vapor retrieval. The atmospheric characterization could be optimized, resulting in increased accuracy in temperature and emissivity retrieval. Airborne datasets of different spectral resolutions were simulated from terrestrial HyperCam-LW measurements. The simulated airborne radiance spectra were subjected to atmospheric correction and TES and further used for a plant species classification study analyzing effects related to noise and mixed pixels.
NASA Technical Reports Server (NTRS)
Case, Jonathan L.; White, Kristopher D.
2014-01-01
The NASA Short-term Prediction Research and Transition (SPoRT) Center in Huntsville, AL (Jedlovec 2013; Ralph et al. 2013; Merceret et al. 2013) is running a real-time configuration of the Noah land surface model (LSM) within the NASA Land Information System (LIS) framework (hereafter referred to as the "SPoRT-LIS"). Output from the real-time SPoRT-LIS is used for (1) initializing land surface variables for local modeling applications, and (2) displaying in decision support systems for situational awareness and drought monitoring at select NOAA/National Weather Service (NWS) partner offices. The SPoRT-LIS is currently run over a domain covering the southeastern half of the Continental United States (CONUS), with an additional experimental real-time run over the entire CONUS and surrounding portions of southern Canada and northern Mexico. The experimental CONUS run incorporates hourly quantitative precipitation estimation (QPE) from the National Severe Storms Laboratory Multi-Radar Multi-Sensor (MRMS) product (Zhang et al. 2011, 2014), which will be transitioned into operations at the National Centers for Environmental Prediction (NCEP) in Fall 2014. This paper describes the current and experimental SPoRT-LIS configurations, and documents some of the limitations that remain even with the advent of MRMS precipitation analyses in the SPoRT-LIS land surface model (LSM) simulations. Section 2 gives background information on the NASA LIS and describes the real-time SPoRT-LIS configurations being compared. Section 3 presents recent work done to develop a training module on situational awareness applications of real-time SPoRT-LIS output. Comparisons between output from the two SPoRT-LIS runs are shown in Section 4, including documentation of issues encountered in using the MRMS precipitation dataset. A summary and future work are given in Section 5, followed by acknowledgements and references.
NASA Technical Reports Server (NTRS)
Coy, James; Schultz, Christopher J.; Case, Jonathan L.
2017-01-01
Can we use modeled information about the land surface, together with lightning characteristics beyond flash occurrence, to improve the identification and prediction of wildfires? The approach combines observed cloud-to-ground (CG) flashes with real-time land surface model output and compares these data with areas where lightning did not start a wildfire, to determine which land surface conditions and lightning characteristics were responsible for causing wildfires. Statistical differences between suspected fire-starters and non-fire-starters were peak-current dependent. Comparisons of 0-10 cm volumetric and relative soil moisture were statistically dependent to at least the p = 0.05 independence level for both flash polarities, and suspected fire-starters typically occurred in areas of lower soil moisture than non-fire-starters. GVF value comparisons were only found to be statistically dependent for -CG flashes; however, random sampling of the -CG non-fire-starter dataset revealed that this relationship may not always hold.
These files contain the environmental data as particular emissions or resources associated with BEA sectors that are used in the USEEIO model. They are organized by the emission or resource type, as described in the manuscript. The main files (without SI) show the final satellite tables in the 'Exchanges' sheet, which have emissions or resource use per USD for 2013. The other sheets in these files provide metadata for the creation of the tables, including general information, sources, etc. The 'export' sheet is used for saving the satellite table for CSV export. The data dictionary describes the fields in this sheet. The supporting files provide all the details of the data transformation and organization for the development of the satellite tables. This dataset is associated with the following publication: Yang, Y., W. Ingwersen, T. Hawkins, and D. Meyer. USEEIO: a New and Transparent United States Environmentally Extended Input-Output Model. JOURNAL OF CLEANER PRODUCTION. Elsevier Science Ltd, New York, NY, USA,
Mei, Suyu
2012-10-07
Recent years have witnessed much progress in computational modeling for protein subcellular localization. However, there are far fewer computational models for predicting plant protein subcellular multi-localization. In this paper, we propose a multi-label multi-kernel transfer learning model for predicting multiple subcellular locations of plant proteins (MLMK-TLM). The method proposes a multi-label confusion matrix and adapts one-against-all multi-class probabilistic outputs to the multi-label learning scenario, based on which we further extend our published work MK-TLM (multi-kernel transfer learning based on Chou's PseAAC formulation for protein submitochondria localization) for plant protein subcellular multi-localization. By proper homolog knowledge transfer, MLMK-TLM is applicable to novel plant protein subcellular localization in the multi-label learning scenario. The experiments on the plant protein benchmark dataset show that MLMK-TLM outperforms the baseline model. Unlike the existing models, MLMK-TLM also reports its misleading tendency, which is important for a comprehensive survey of the model's multi-labeling performance. Copyright © 2012 Elsevier Ltd. All rights reserved.
Bayesian stock assessment of Pacific herring in Prince William Sound, Alaska.
Muradian, Melissa L; Branch, Trevor A; Moffitt, Steven D; Hulson, Peter-John F
2017-01-01
The Pacific herring (Clupea pallasii) population in Prince William Sound, Alaska crashed in 1993 and has yet to recover, affecting food web dynamics in the Sound and impacting Alaskan communities. To help researchers design and implement the most effective monitoring, management, and recovery programs, a Bayesian assessment of Prince William Sound herring was developed by reformulating the current model used by the Alaska Department of Fish and Game. The Bayesian model estimated pre-fishery spawning biomass of herring age-3 and older in 2013 to be a median of 19,410 mt (95% credibility interval 12,150-31,740 mt), with a 54% probability that biomass in 2013 was below the management limit used to regulate fisheries in Prince William Sound. The main advantages of the Bayesian model are that it can more objectively weight different datasets and provide estimates of uncertainty for model parameters and outputs, unlike the weighted sum-of-squares used in the original model. In addition, the revised model could be used to manage herring stocks with a decision rule that considers both stock status and the uncertainty in stock status.
DeltaSA tool for source apportionment benchmarking, description and sensitivity analysis
NASA Astrophysics Data System (ADS)
Pernigotti, D.; Belis, C. A.
2018-05-01
DeltaSA is an R-package and a Java on-line tool developed at the EC-Joint Research Centre to assist and benchmark source apportionment applications. Its key functionalities support two critical tasks in this kind of study: the assignment of a factor to a source in factor analytical models (source identification) and the model performance evaluation. The source identification is based on the similarity between a given factor and source chemical profiles from public databases. The model performance evaluation is based on statistical indicators used to compare model output with reference values generated in intercomparison exercises. The reference values are calculated as the ensemble average of the results reported by participants that have passed a set of testing criteria based on chemical profiles and time series similarity. In this study, a sensitivity analysis of the model performance criteria is carried out using the results of a synthetic dataset where "a priori" references are available. The consensus-modulated standard deviation p_unc gives the best choice for the model performance evaluation when a conservative approach is adopted.
Bayesian stock assessment of Pacific herring in Prince William Sound, Alaska
Moffitt, Steven D.; Hulson, Peter-John F.
2017-01-01
The Pacific herring (Clupea pallasii) population in Prince William Sound, Alaska crashed in 1993 and has yet to recover, affecting food web dynamics in the Sound and impacting Alaskan communities. To help researchers design and implement the most effective monitoring, management, and recovery programs, a Bayesian assessment of Prince William Sound herring was developed by reformulating the current model used by the Alaska Department of Fish and Game. The Bayesian model estimated pre-fishery spawning biomass of herring age-3 and older in 2013 to be a median of 19,410 mt (95% credibility interval 12,150–31,740 mt), with a 54% probability that biomass in 2013 was below the management limit used to regulate fisheries in Prince William Sound. The main advantages of the Bayesian model are that it can more objectively weight different datasets and provide estimates of uncertainty for model parameters and outputs, unlike the weighted sum-of-squares used in the original model. In addition, the revised model could be used to manage herring stocks with a decision rule that considers both stock status and the uncertainty in stock status. PMID:28222151
NASA Astrophysics Data System (ADS)
Maeda, Takuto; Takemura, Shunsuke; Furumura, Takashi
2017-07-01
We have developed an open-source software package, Open-source Seismic Wave Propagation Code (OpenSWPC), for parallel numerical simulations of seismic wave propagation in 3D and 2D (P-SV and SH) viscoelastic media, based on the finite difference method at local to regional scales. The code is equipped with a frequency-independent attenuation model based on the generalized Zener body and an efficient perfectly matched layer as the absorbing boundary condition. A hybrid-style programming model using OpenMP and the Message Passing Interface (MPI) is adopted for efficient parallel computation. OpenSWPC has wide applicability for seismological studies and great portability, allowing excellent performance on systems ranging from PC clusters to supercomputers. Without modifying the code, users can conduct seismic wave propagation simulations using their own velocity structure models and the necessary source representations by specifying them in an input parameter file. The code has various modes for different types of velocity structure model input and different source representations, such as single force, moment tensor and plane-wave incidence, which can easily be selected via the input parameters. Widely used binary data formats, the Network Common Data Form (NetCDF) and the Seismic Analysis Code (SAC), are adopted for the input of the heterogeneous structure model and for the outputs of the simulation results, so users can easily handle the input/output datasets. All codes are written in Fortran 2003 and are available with detailed documentation in a public repository.
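For readers unfamiliar with the scheme family OpenSWPC belongs to, the sketch below implements a minimal 1D SH-wave finite-difference time stepper in Python. It is purely illustrative: the grid, velocity model, source function and boundary treatment are all invented, and it does not reflect OpenSWPC's actual 3D viscoelastic formulation, attenuation model or PML.

```python
import numpy as np

# Minimal 1D SH-wave finite-difference sketch (second order in space and time)
nx, nt = 400, 800
dx, dt = 10.0, 1.0e-3          # grid spacing (m), time step (s)
vs = np.full(nx, 3000.0)       # shear-wave velocity model (m/s)
c2 = (vs * dt / dx) ** 2       # squared Courant number per cell (stability: < 1)

u_prev = np.zeros(nx)
u_curr = np.zeros(nx)
src_ix, f0 = nx // 2, 25.0     # source location and dominant frequency (Hz)

for it in range(nt):
    t = it * dt
    # Ricker-like source time function injected at one grid point
    arg = (np.pi * f0 * (t - 1.2 / f0)) ** 2
    src = (1.0 - 2.0 * arg) * np.exp(-arg)

    u_next = np.zeros(nx)
    u_next[1:-1] = (2.0 * u_curr[1:-1] - u_prev[1:-1]
                    + c2[1:-1] * (u_curr[2:] - 2.0 * u_curr[1:-1] + u_curr[:-2]))
    u_next[src_ix] += src * dt ** 2
    u_prev, u_curr = u_curr, u_next

print("max displacement after", nt, "steps:", float(np.abs(u_curr).max()))
```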
Quantile Mapping Bias correction for daily precipitation over Vietnam in a regional climate model
NASA Astrophysics Data System (ADS)
Trinh, L. T.; Matsumoto, J.; Ngo-Duc, T.
2017-12-01
In the past decades, Regional Climate Models (RCMs) have developed significantly, allowing climate simulations to be conducted at higher resolution. However, RCMs often contain biases when compared with observations; therefore, statistical correction methods are commonly employed to reduce or minimize the model biases. In this study, outputs of the Regional Climate Model (RegCM) version 4.3 driven by the CNRM-CM5 global products were evaluated with and without the Quantile Mapping (QM) bias correction method. The model domain covered the area from 90°E to 145°E and from 15°S to 40°N with a horizontal resolution of 25 km. The QM bias correction was implemented using the Vietnam Gridded precipitation dataset (VnGP) and the outputs of the RegCM historical run for the period 1986-1995, and then validated for the period 1996-2005. Based on statistical measures of spatial correlation and intensity distribution, the QM method showed a significant improvement in rainfall compared with the uncorrected output. The improvements in both time and space were found in all seasons and all climatic sub-regions of Vietnam. Moreover, not only the rainfall amount but also extreme indices such as R10mm, R20mm, R50mm, CDD, CWD, R95pTOT and R99pTOT were much better reproduced after the correction. The results suggest that the QM correction method should be applied in practice for projections of future precipitation over Vietnam.
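A minimal empirical quantile-mapping sketch in Python is shown below. The gamma-distributed "observed" and "model" series are synthetic stand-ins for the VnGP and RegCM daily precipitation data; the mapping itself (matching model quantiles to observed quantiles over a calibration period) is the core idea of the QM correction described above.

```python
import numpy as np

def quantile_map(model_hist, obs_hist, model_new):
    """Empirical quantile mapping: replace each model value with the observed
    value at the same quantile of the historical (calibration) period."""
    quantiles = np.linspace(0.0, 1.0, 101)
    model_q = np.quantile(model_hist, quantiles)
    obs_q = np.quantile(obs_hist, quantiles)
    # Find each new model value's quantile in the model climatology,
    # then map that quantile onto the observed climatology.
    ranks = np.interp(model_new, model_q, quantiles)
    return np.interp(ranks, quantiles, obs_q)

rng = np.random.default_rng(0)
obs_hist = rng.gamma(shape=0.8, scale=10.0, size=3650)    # stand-in for observed daily rainfall
model_hist = rng.gamma(shape=0.6, scale=7.0, size=3650)   # stand-in for biased RCM rainfall
model_valid = rng.gamma(shape=0.6, scale=7.0, size=3650)  # validation-period model output

corrected = quantile_map(model_hist, obs_hist, model_valid)
print("raw model mean:", model_valid.mean().round(2),
      "corrected mean:", corrected.mean().round(2),
      "observed mean:", obs_hist.mean().round(2))
```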
Performance of the CORDEX regional climate models in simulating offshore wind and wind potential
NASA Astrophysics Data System (ADS)
Kulkarni, Sumeet; Deo, M. C.; Ghosh, Subimal
2018-03-01
This study quantifies the skill added by regional climate models (RCMs) to their parent general circulation models (GCMs) in simulating wind speed and wind potential, with particular reference to the Indian offshore region. To arrive at a suitable reference dataset, the performance of wind outputs from three different reanalysis datasets is evaluated. The comparison across the RCMs and their corresponding parent GCMs is done on the basis of annual/seasonal wind statistics, intermodel bias, wind climatology, and classes of wind potential. It was observed that while the RCMs could simulate the spatial variability of winds well for certain subregions, they generally failed to replicate the overall spatial pattern, especially in monsoon and winter. Various causes of biases in the RCMs were determined by assessing corresponding maps of wind vectors, surface temperature, and sea-level pressure. The results highlight the necessity of carefully assessing RCM-derived winds before using them for sensitive applications such as coastal vulnerability and hazard assessment. A supplementary outcome of this study is a wind potential atlas based on the spatial distribution of wind classes, which could help identify viable subregions for developing offshore wind farms by intercomparing the RCM and GCM outcomes. It is encouraging that most of the RCMs and GCMs indicate that around 70% of Indian offshore locations would experience a mean wind potential greater than 200 W/m2 during the monsoon.
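Wind potential in W/m2 is conventionally computed as the mean of 0.5 * rho * v^3 over the wind-speed record. The short Python sketch below shows that calculation and the comparison against the 200 W/m2 class threshold mentioned above; the Weibull-distributed speeds and the air-density value are assumptions for illustration only.

```python
import numpy as np

RHO = 1.225  # air density (kg/m3), standard sea-level value (assumption)

def wind_power_density(speeds):
    """Mean wind power density (W/m2) from a series of wind speeds (m/s):
    P = 0.5 * rho * v^3, averaged over the record."""
    return 0.5 * RHO * np.mean(np.asarray(speeds) ** 3)

rng = np.random.default_rng(1)
# Hypothetical monsoon-season wind speeds at one offshore grid point (Weibull draw)
speeds = rng.weibull(2.0, size=2000) * 8.0

wpd = wind_power_density(speeds)
print(f"Mean wind power density: {wpd:.0f} W/m2")
print("Exceeds the 200 W/m2 class threshold:", wpd > 200.0)
```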
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lekadir, Karim, E-mail: karim.lekadir@upf.edu; Hoogendoorn, Corné; Armitage, Paul
Purpose: This paper presents a statistical approach for the prediction of trabecular bone parameters from low-resolution multisequence magnetic resonance imaging (MRI) in children, thus addressing the limitations of high-resolution modalities such as HR-pQCT, including the significant exposure of young patients to radiation and the limited applicability of such modalities to peripheral bones in vivo. Methods: A statistical predictive model is constructed from a database of MRI and HR-pQCT datasets to relate the low-resolution MRI appearance in the cancellous bone to the trabecular parameters extracted from the high-resolution images. The description of the MRI appearance is achieved between subjects by using a collection of feature descriptors, which describe the texture properties inside the cancellous bone and which are invariant to the geometry and size of the trabecular areas. The predictive model is built by fitting to the training data a nonlinear partial least squares regression between the input MRI features and the output trabecular parameters. Results: Detailed validation based on a sample of 96 datasets shows correlations >0.7 between the trabecular parameters predicted from low-resolution multisequence MRI with the proposed statistical model and the values extracted from high-resolution HR-pQCT. Conclusions: The obtained results indicate the promise of the proposed predictive technique for the estimation of trabecular parameters in children from multisequence MRI, thus reducing the need for high-resolution radiation-based scans for a fragile population that is still developing and growing.
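The following Python sketch shows the general shape of such a predictive pipeline using scikit-learn's linear PLS regression on synthetic data (96 subjects, 20 texture-style features, 3 trabecular parameters), evaluated by per-parameter correlation. It is a simplified stand-in: the study uses a nonlinear PLS variant and real MRI/HR-pQCT features.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)

# Stand-ins: 96 subjects, 20 MRI texture features, 3 trabecular parameters
X = rng.normal(size=(96, 20))
true_coef = rng.normal(size=(20, 3))
Y = X @ true_coef + 0.5 * rng.normal(size=(96, 3))

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.25, random_state=0)

pls = PLSRegression(n_components=5)
pls.fit(X_tr, Y_tr)
Y_hat = pls.predict(X_te)

# Per-parameter correlation between predicted and "measured" values
for j in range(Y.shape[1]):
    r = np.corrcoef(Y_te[:, j], Y_hat[:, j])[0, 1]
    print(f"parameter {j}: r = {r:.2f}")
```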
NASA Astrophysics Data System (ADS)
Mendoza, Pablo A.; Mizukami, Naoki; Ikeda, Kyoko; Clark, Martyn P.; Gutmann, Ethan D.; Arnold, Jeffrey R.; Brekke, Levi D.; Rajagopalan, Balaji
2016-10-01
We examine the effects of regional climate model (RCM) horizontal resolution and forcing scaling (i.e., spatial aggregation of meteorological datasets) on the portrayal of climate change impacts. Specifically, we assess how these choices affect: (i) historical simulation of signature measures of hydrologic behavior, and (ii) projected changes in terms of annual water balance and hydrologic signature measures. To this end, we conduct our study in three catchments located in the headwaters of the Colorado River basin. Meteorological forcings for the current climate and a future climate projection are obtained at three spatial resolutions (4, 12 and 36 km) from dynamical downscaling with the Weather Research and Forecasting (WRF) regional climate model, and hydrologic changes are computed using four different hydrologic model structures. These projected changes are compared to those obtained from running hydrologic simulations with current and future 4-km WRF climate outputs re-scaled to 12 and 36 km. The results show that the horizontal resolution of the WRF simulations heavily affects basin-averaged precipitation amounts, propagating into large differences in simulated signature measures across model structures. The effects of re-scaled forcing datasets on historical performance were primarily observed in simulated runoff seasonality. We also found that the effects of WRF grid resolution on projected changes in mean annual runoff and evapotranspiration may be larger than the effects of hydrologic model choice, which in turn surpass the effects of re-scaled forcings. Scaling effects on projected variations in hydrologic signature measures were found to be generally smaller than those coming from WRF resolution; however, forcing aggregation in many cases reversed the direction of projected changes in hydrologic behavior.
A prediction model of short-term ionospheric foF2 based on AdaBoost
NASA Astrophysics Data System (ADS)
Zhao, Xiukuan; Ning, Baiqi; Liu, Libo; Song, Gangbing
2014-02-01
In this paper, the AdaBoost-BP algorithm is used to construct a new model to predict the critical frequency of the ionospheric F2-layer (foF2) one hour ahead. Different indices were used to characterize ionospheric diurnal and seasonal variations and their dependence on solar and geomagnetic activity. These indices, together with the currently observed foF2 value, were input into the prediction model, and the foF2 value one hour ahead was output. We analyzed twenty-two years of foF2 data from nine ionosonde stations in the East Asian sector. The first eleven years of data were used as a training dataset and the second eleven years as a testing dataset. The results show that the performance of AdaBoost-BP is better than those of the BP Neural Network (BPNN), Support Vector Regression (SVR) and the IRI model. For example, the AdaBoost-BP absolute prediction error of foF2 at Irkutsk station (a middle-latitude station) is 0.32 MHz, which is better than 0.34 MHz from BPNN and 0.35 MHz from SVR, and also significantly outperforms the IRI model, whose absolute error is 0.64 MHz. Meanwhile, the AdaBoost-BP absolute prediction error at Taipei station (a low-latitude station) is 0.78 MHz, which is better than 0.81 MHz from BPNN, 0.81 MHz from SVR and 1.37 MHz from the IRI model. Finally, the variation of the AdaBoost-BP prediction error with season, solar activity and latitude is also discussed.
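A minimal Python sketch of an AdaBoost-style regression for one-hour-ahead foF2 prediction is given below, with synthetic indices and a synthetic target. Scikit-learn's default tree weak learners are used as a stand-in for the BP neural networks boosted in the paper, so it illustrates only the ensemble idea and the MAE-style evaluation.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)

# Hypothetical predictor table: diurnal/seasonal indices, solar and geomagnetic
# proxies, and the currently observed foF2 (all made-up data)
n = 5000
X = np.column_stack([
    np.sin(2 * np.pi * rng.uniform(0, 1, n)),   # diurnal index
    np.cos(2 * np.pi * rng.uniform(0, 1, n)),   # seasonal index
    rng.uniform(60, 200, n),                    # solar activity proxy (e.g. F10.7)
    rng.uniform(0, 9, n),                       # geomagnetic proxy (e.g. Kp)
    rng.uniform(2, 14, n),                      # current observed foF2 (MHz)
])
# Synthetic "foF2 one hour ahead": mostly persistence plus index effects and noise
y = 0.9 * X[:, 4] + 0.01 * X[:, 2] + 0.3 * X[:, 0] + rng.normal(0, 0.3, n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

# Default tree weak learners stand in for the BP networks used in the paper
model = AdaBoostRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print(f"MAE on held-out data: {mean_absolute_error(y_te, model.predict(X_te)):.2f} MHz")
```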
Two statistics for evaluating parameter identifiability and error reduction
Doherty, John; Hunt, Randall J.
2009-01-01
Two statistics are presented that can be used to rank input parameters utilized by a model in terms of their relative identifiability based on a given or possible future calibration dataset. Identifiability is defined here as the capability of model calibration to constrain parameters used by a model. Both statistics require that the sensitivity of each model parameter be calculated for each model output for which there are actual or presumed field measurements. Singular value decomposition (SVD) of the weighted sensitivity matrix is then undertaken to quantify the relation between the parameters and observations that, in turn, allows selection of calibration solution and null spaces spanned by unit orthogonal vectors. The first statistic presented, "parameter identifiability", is quantitatively defined as the direction cosine between a parameter and its projection onto the calibration solution space. This varies between zero and one, with zero indicating complete non-identifiability and one indicating complete identifiability. The second statistic, "relative error reduction", indicates the extent to which the calibration process reduces error in the estimation of a parameter from its pre-calibration level, where its value must be assigned purely on the basis of prior expert knowledge. This is more sophisticated than identifiability, in that it takes greater account of the noise associated with the calibration dataset. Like identifiability, it has a maximum value of one (which can only be achieved if there is no measurement noise). Conceptually it can fall to zero, and even below zero if a calibration problem is poorly posed. An example, based on a coupled groundwater/surface-water model, is included that demonstrates the utility of the statistics. © 2009 Elsevier B.V.
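The first statistic can be computed directly from the SVD of the weighted sensitivity matrix: the identifiability of parameter i is the length of the projection of its unit vector onto the space spanned by the first k right singular vectors. The Python sketch below does this for an invented sensitivity matrix; the truncation level k and the matrix itself are assumptions for illustration.

```python
import numpy as np

# Weighted sensitivity matrix J: one row per observation, one column per parameter
# (hypothetical values; in practice J holds d(output)/d(parameter) times weights)
rng = np.random.default_rng(4)
J = rng.normal(size=(50, 6))
J[:, 5] *= 1e-3          # make the last parameter nearly insensitive

U, s, Vt = np.linalg.svd(J, full_matrices=False)

# Truncate at k singular values to define the calibration solution space
k = 4
V_sol = Vt[:k, :].T      # columns span the solution space in parameter space

# Identifiability of parameter i = length of the projection of the unit vector
# e_i onto the solution space = direction cosine between e_i and its projection
identifiability = np.sqrt((V_sol ** 2).sum(axis=1))
for i, d in enumerate(identifiability):
    print(f"parameter {i}: identifiability = {d:.2f}")
```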
Kahmann, A; Anzanello, M J; Fogliatto, F S; Marcelo, M C A; Ferrão, M F; Ortiz, R S; Mariotti, K C
2018-04-15
Street cocaine is typically adulterated with several compounds that increase its harmful health-related side effects, most notably depression, convulsions, and severe damage to the cardiovascular system, lungs, and brain. Thus, determining the concentration of cocaine and adulterants in seized drug samples is important from both health and forensic perspectives. Although FTIR has been widely used to identify the fingerprint and concentration of chemical compounds, spectroscopy datasets usually comprise thousands of highly correlated wavenumbers which, when used as predictors in regression models, tend to undermine the predictive performance of multivariate techniques. In this paper, we propose an FTIR wavenumber selection method aimed at identifying the FTIR spectral intervals that best predict the concentration of cocaine and adulterants (e.g. caffeine, phenacetin, levamisole, and lidocaine) in cocaine samples. To that end, the Mutual Information measure is integrated into a Quadratic Programming problem with the objective of minimizing the probability of retaining redundant wavenumbers while maximizing the relationship between retained wavenumbers and compound concentrations. The optimization outputs guide the order of inclusion of wavenumbers in a predictive model, using a forward wavenumber selection method. After the inclusion of each wavenumber, the parameters of three alternative regression models are estimated, and each model's prediction error is assessed through the Mean Absolute Error (MAE); the recommended subset of retained wavenumbers is the one that minimizes the prediction error with maximum parsimony. Applying our propositions to a dataset of 115 cocaine samples, we obtained a best prediction model with an average MAE of 0.0502 while retaining only 2.29% of the original wavenumbers, increasing the predictive precision by 0.0359 when compared to a model using the complete set of wavenumbers as predictors. Copyright © 2018 Elsevier B.V. All rights reserved.
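The sketch below illustrates the forward-inclusion part of such a workflow in Python: wavenumbers are ranked (here simply by mutual information with the concentration, standing in for the paper's Quadratic Programming step, which additionally penalizes redundancy) and added one at a time, keeping the subset with the lowest held-out MAE. The spectra and concentrations are synthetic.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(5)

# Stand-in FTIR data: 115 samples x 600 wavenumbers, with a known concentration
n_samples, n_wn = 115, 600
X = rng.normal(size=(n_samples, n_wn))
y = 3.0 * X[:, 50] - 2.0 * X[:, 200] + 1.5 * X[:, 410] + 0.2 * rng.normal(size=n_samples)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Rank wavenumbers by mutual information with the concentration
order = np.argsort(mutual_info_regression(X_tr, y_tr, random_state=0))[::-1]

# Forward inclusion: add wavenumbers in ranked order, keep the subset with lowest MAE
best_mae, best_k = np.inf, 0
for k in range(1, 21):
    cols = order[:k]
    model = LinearRegression().fit(X_tr[:, cols], y_tr)
    mae = mean_absolute_error(y_te, model.predict(X_te[:, cols]))
    if mae < best_mae:
        best_mae, best_k = mae, k

print(f"best subset size: {best_k} wavenumbers, MAE = {best_mae:.3f}")
```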
Fottrell, Edward; Byass, Peter; Berhane, Yemane
2008-03-25
As in any measurement process, a certain amount of error may be expected in routine population surveillance operations such as those in demographic surveillance sites (DSSs). Vital events are likely to be missed and errors made no matter what method of data capture is used or what quality control procedures are in place. The extent to which random errors in large, longitudinal datasets affect overall health and demographic profiles has important implications for the role of DSSs as platforms for public health research and clinical trials. Such knowledge is also of particular importance if the outputs of DSSs are to be extrapolated and aggregated with realistic margins of error and validity. This study uses the first 10-year dataset from the Butajira Rural Health Project (BRHP) DSS, Ethiopia, covering approximately 336,000 person-years of data. Simple programmes were written to introduce random errors and omissions into new versions of the definitive 10-year Butajira dataset. Key parameters of sex, age, death, literacy and roof material (an indicator of poverty) were selected for the introduction of errors based on their obvious importance in demographic and health surveillance and their established significant associations with mortality. Defining the original 10-year dataset as the 'gold standard' for the purposes of this investigation, population, age and sex compositions and Poisson regression models of mortality rate ratios were compared between each of the intentionally erroneous datasets and the original 'gold standard' 10-year data. The composition of the Butajira population was well represented despite introducing random errors, and differences between population pyramids based on the derived datasets were subtle. Regression analyses of well-established mortality risk factors were largely unaffected even by relatively high levels of random errors in the data. The low sensitivity of parameter estimates and regression analyses to significant amounts of randomly introduced errors indicates a high level of robustness of the dataset. This apparent inertia of population parameter estimates to simulated errors is largely due to the size of the dataset. Tolerable margins of random error in DSS data may exceed 20%. While this is not an argument in favour of poor quality data, reducing the time and valuable resources spent on detecting and correcting random errors in routine DSS operations may be justifiable as the returns from such procedures diminish with increasing overall accuracy. The money and effort currently spent on endlessly correcting DSS datasets would perhaps be better spent on increasing the surveillance population size and geographic spread of DSSs and analysing and disseminating research findings.
Climate Model Diagnostic Analyzer Web Service System
NASA Astrophysics Data System (ADS)
Lee, S.; Pan, L.; Zhai, C.; Tang, B.; Kubar, T. L.; Li, J.; Zhang, J.; Wang, W.
2015-12-01
Both the National Research Council Decadal Survey and the latest Intergovernmental Panel on Climate Change Assessment Report stressed the need for comprehensive and innovative evaluation of climate models, with the synergistic use of global satellite observations, in order to improve our weather and climate simulation and prediction capabilities. The abundance of satellite observations for fundamental climate parameters and the availability of coordinated model outputs from CMIP5 for the same parameters offer a great opportunity to understand and diagnose model biases in climate models. In addition, the Obs4MIPs efforts have created several key global observational datasets that are readily usable for model evaluations. However, a model diagnostic evaluation process requires physics-based multi-variable comparisons that typically involve large-volume and heterogeneous datasets, making them both computationally and data intensive. In response, we have developed a novel methodology to diagnose model biases in contemporary climate models and have implemented the methodology as a web-service-based, cloud-enabled, provenance-supported climate-model evaluation system. The evaluation system is named the Climate Model Diagnostic Analyzer (CMDA), which is the product of the research and technology development investments of several current and past NASA ROSES programs. The current technologies and infrastructure of CMDA are designed and selected to address several technical challenges that the Earth science modeling and model analysis community faces in evaluating and diagnosing climate models. In particular, we have three key technology components: (1) a diagnostic analysis methodology; (2) web-service-based, cloud-enabled technology; and (3) provenance-supported technology. The diagnostic analysis methodology includes random forest feature importance ranking, conditional probability distribution functions, conditional sampling, and time-lagged correlation maps. We have implemented the new methodology as web services and incorporated the system into the cloud. We have also developed a provenance management system for CMDA in which CMDA service semantics modeling, service search and recommendation, and service execution history management are designed and implemented.
Evaluation of nine popular de novo assemblers in microbial genome assembly.
Forouzan, Esmaeil; Maleki, Masoumeh Sadat Mousavi; Karkhane, Ali Asghar; Yakhchali, Bagher
2017-12-01
Next generation sequencing (NGS) technologies are revolutionizing biology, with Illumina being the most popular NGS platform. Short read assembly is a critical part of most genome studies using NGS. Hence, in this study, the performance of nine well-known assemblers was evaluated in the assembly of seven different microbial genomes. The effects of different read coverages and k-mer parameters on assembly quality were also evaluated on both simulated and actual read datasets. Our results show that the performance of assemblers on real and simulated datasets can differ significantly, mainly because of coverage bias. According to the outputs on actual read datasets, for all studied read coverages (7×, 25× and 100×), SPAdes and IDBA-UD clearly outperformed the other assemblers based on NGA50 and accuracy metrics. Velvet is the most conservative assembler, with the lowest NGA50 and error rate. Copyright © 2017. Published by Elsevier B.V.
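Contiguity metrics of the NGA50 family are simple to compute from contig lengths once a reference length is fixed; NGA50 additionally breaks contigs at misassembly points identified by alignment. A small Python sketch of the (alignment-free) NG50 calculation, with made-up contig lengths, is shown below.

```python
def ng50(contig_lengths, reference_length):
    """NG50: length of the contig at which the cumulative sum of sorted contig
    lengths first reaches half of the *reference* genome length. NGA50 additionally
    breaks contigs at misassembly points before this calculation."""
    total = 0
    for length in sorted(contig_lengths, reverse=True):
        total += length
        if total >= reference_length / 2:
            return length
    return 0  # assembly covers less than half of the reference

# Hypothetical contig lengths (bp) from two assemblies of a 5 Mb genome
assembly_a = [1_200_000, 900_000, 700_000, 400_000, 300_000, 150_000]
assembly_b = [400_000] * 10 + [100_000] * 20

print("assembly A NG50:", ng50(assembly_a, 5_000_000))
print("assembly B NG50:", ng50(assembly_b, 5_000_000))
```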
Murugaiyan, Jayaseelan; Eravci, Murat; Weise, Christoph; Roesler, Uwe
2017-06-01
Here, we provide the dataset associated with our research article 'Label-free quantitative proteomic analysis of harmless and pathogenic strains of infectious microalgae, Prototheca spp.' (Murugaiyan et al., 2017) [1]. This dataset describes liquid chromatography-mass spectrometry (LC-MS)-based protein identification and quantification of a non-infectious strain, Prototheca zopfii genotype 1, and two strains associated with severe and mild infections, respectively, P. zopfii genotype 2 and Prototheca blaschkeae. Protein identification and label-free quantification were carried out by analysing MS raw data using the MaxQuant-Andromeda software suite. The expression-level differences of the identified proteins among the strains were computed using Perseus software, and the results are presented in [1]. This Data in Brief article provides the MaxQuant output file and the raw data deposited in the PRIDE repository with the dataset identifier PXD005305.
Phase 1 Free Air CO2 Enrichment Model-Data Synthesis (FACE-MDS): Meteorological Data
Norby, R. J.; Oren, R.; Boden, T. A. [Carbon Dioxide Information Analysis Center (CDIAC), Oak Ridge National Laboratory (ORNL); De Kauwe, M. G.; Kim, D.; Medlyn, B. E.; Riggs, J. S.; Tharp, M. L.; Walker, A. P.; Yang, B.; Zaehle, S.
2015-01-01
These datasets comprise the meteorological, CO2 and N deposition data used to run models for the Duke and Oak Ridge FACE experiments. Phase 1 datasets are reproduced here for posterity and reproducibility, although these meteorological datasets are superseded by the Phase 2 datasets. If you would like to use the meteorological datasets to run your own model or for any other purpose, please use the Phase 2 datasets.
NASA Astrophysics Data System (ADS)
Seamon, E.; Gessler, P. E.; Flathers, E.; Walden, V. P.
2014-12-01
As climate change and weather variability raise issues regarding agricultural production, agricultural sustainability has become an increasingly important component of farmland management (Fisher, 2005; Akinci, 2013). Yet with changes in soil quality, agricultural practices, weather, topography, land use, and hydrology, accurately modeling such agricultural outcomes has proven difficult (Gassman et al., 2007; Williams et al., 1995). This study examined agricultural sustainability and soil health over a heterogeneous multi-watershed area within the Inland Pacific Northwest of the United States (IPNW), as part of a five-year, USDA-funded effort to explore the sustainability of cereal production systems (Regional Approaches to Climate Change for Pacific Northwest Agriculture, award #2011-68002-30191). In particular, crop growth and soil erosion were simulated across a spectrum of variables and time periods, using the CropSyst crop growth model (Stockle et al., 2002) and the Water Erosion Prediction Project model (WEPP; Flanagan and Livingston, 1995), respectively. A preliminary range of historical scenarios was run using a high-resolution, 4-km gridded dataset of surface meteorological variables from 1979-2010 (Abatzoglou, 2012). In addition, Coupled Model Intercomparison Project (CMIP5) global climate model (GCM) outputs were used as input to run crop growth and erosion future scenarios (Abatzoglou and Brown, 2011). To facilitate our integrated data analysis efforts, an agricultural sustainability web service architecture (THREDDS/Java/Python based) is under development to allow for the programmatic uploading, sharing and processing of variable input data, the running of model simulations, and the downloading and visualization of output results. The results of this study will assist in better understanding agricultural sustainability and erosion relationships in the IPNW, as well as provide a tangible server-based tool for use by researchers and farmers, for both small-scale field examination and more regionalized scenarios.
NASA Astrophysics Data System (ADS)
Wetterhall, F.; He, Y.; Cloke, H.; Pappenberger, F.; Freer, J.; Wilson, M.; McGregor, G.
2009-04-01
Local flooding events are often triggered by high-intensity rainfall, and it is important that these events can be correctly modelled by Regional Climate Models (RCMs) if the results are to be used in climate impact assessment. In this study, daily precipitation from 16 RCMs was compared with observations over a meso-scale catchment in the Midlands Region of England. The RCM data were provided by the European research project ENSEMBLES and the precipitation data by the UK Met Office. The RCMs were all driven by reanalysis data from the ERA40 dataset over the period 1961-2000. The ENSEMBLES data are on a spatial scale of 25 x 25 km and were disaggregated onto a 5 x 5 km grid over the catchment and compared with interpolated observational data at the same resolution. The mean precipitation was generally underestimated by the ENSEMBLES data, and the maximum and persistence of high-intensity rainfall were even more strongly underestimated. The inter-annual variability was not fully captured by the RCMs, and there was a systematic underestimation of precipitation during the autumn months. The spatial pattern in the modelled precipitation data was too smooth in comparison with the observed data, especially at the high altitudes in the western part of the catchment where the highest precipitation usually occurs. The RCM outputs cannot reproduce the current high-intensity precipitation events that need to be captured to adequately model extreme flood events. The results point out the discrepancy between climate model output and the high-intensity precipitation inputs needed for hydrological impact modelling.
He, Yan-Lin; Xu, Yuan; Geng, Zhi-Qiang; Zhu, Qun-Xiong
2016-03-01
In this paper, a hybrid robust model based on an improved functional link neural network integrated with partial least squares (IFLNN-PLS) is proposed. Firstly, an improved functional link neural network with a small norm of expanded weights and high input-output correlation (SNEWHIOC-FLNN) was proposed to enhance the generalization performance of the FLNN. Unlike the traditional FLNN, the expanded variables of the original inputs are not directly used as inputs in the proposed SNEWHIOC-FLNN model. Instead, the original inputs are attached to expanded weights of small norm. As a result, the correlation coefficient between some of the expanded variables and the outputs is enhanced. The larger the correlation coefficient, the more relevant the expanded variables tend to be. In the end, the expanded variables with larger correlation coefficients are selected as inputs to improve the performance of the traditional FLNN. In order to test the proposed SNEWHIOC-FLNN model, three UCI (University of California, Irvine) regression datasets, named Housing, Concrete Compressive Strength (CCS), and Yacht Hydrodynamics (YHD), were selected. Then a hybrid model based on the improved FLNN integrated with partial least squares (IFLNN-PLS) was built. In the IFLNN-PLS model, the connection weights are calculated using the partial least squares method rather than the error back-propagation algorithm. Lastly, IFLNN-PLS was developed as an intelligent measurement model for accurately predicting the key variables in the Purified Terephthalic Acid (PTA) process and the High Density Polyethylene (HDPE) process. Simulation results illustrated that IFLNN-PLS could significantly improve the prediction performance. Copyright © 2015 ISA. Published by Elsevier Ltd. All rights reserved.
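As a loose illustration of the two ingredients described above, expansion of the original inputs and estimation of connection weights by partial least squares instead of backpropagation, the following Python sketch expands synthetic inputs with polynomial terms, keeps the expansions most correlated with the output, and fits a PLS regression. It is not the SNEWHIOC weighting scheme itself.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(9)

# Stand-in process data: 300 samples, 4 original inputs, one key output variable
X = rng.normal(size=(300, 4))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] * X[:, 2] + 0.1 * rng.normal(size=300)

# Functional-link-style expansion of the original inputs (degree-2 polynomial terms);
# the paper's SNEWHIOC scheme instead attaches small-norm weights before screening
expansion = PolynomialFeatures(degree=2, include_bias=False)
X_exp = expansion.fit_transform(X)

# Keep only the expanded variables most correlated with the output
corr = np.abs([np.corrcoef(X_exp[:, j], y)[0, 1] for j in range(X_exp.shape[1])])
keep = np.argsort(corr)[::-1][:8]

# Connection weights estimated by partial least squares instead of backpropagation
pls = PLSRegression(n_components=4).fit(X_exp[:, keep], y)
pred = pls.predict(X_exp[:, keep]).ravel()
print("correlation between prediction and output:",
      round(np.corrcoef(pred, y)[0, 1], 3))
```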
Abdullah, Kamarul A; McEntee, Mark F; Reed, Warren; Kench, Peter L
2018-04-30
An ideal organ-specific insert phantom should be able to simulate the anatomical features with appropriate appearances in the resultant computed tomography (CT) images. This study investigated a 3D printing technology to develop a novel and cost-effective cardiac insert phantom derived from volumetric CT image datasets of an anthropomorphic chest phantom. Cardiac insert volumes were segmented from CT image datasets derived from an anthropomorphic chest phantom, the Lungman N-01 (Kyoto Kagaku, Japan). These segmented datasets were converted to a virtual 3D isosurface of a heart-shaped shell, while two other removable inserts were included using a computer-aided design (CAD) software program. This newly designed cardiac insert phantom was then printed using a fused deposition modelling (FDM) process on a Creatbot DM Plus 3D printer. Then, several selected filling materials, such as contrast media, oil, water and jelly, were loaded into designated spaces in the 3D-printed phantom. The 3D-printed cardiac insert phantom was positioned within the anthropomorphic chest phantom and 30 repeated CT acquisitions were performed using a multi-detector scanner at a 120-kVp tube potential. Attenuation (Hounsfield unit, HU) values were measured and compared to image datasets of a real patient and a Catphan® 500 phantom. The 3D-printed cardiac insert phantom was made of a solid acrylic plastic material, which was strong, lightweight and cost-effective. HU values of the filling materials were comparable to those of the real-patient and Catphan® 500 phantom image datasets. A novel and cost-effective cardiac insert phantom for an anthropomorphic chest phantom was developed using volumetric CT image datasets with a 3D printer. Hence, this suggests that the printing methodology could be applied to generate other phantoms for CT imaging studies. © 2018 The Authors. Journal of Medical Radiation Sciences published by John Wiley & Sons Australia, Ltd on behalf of Australian Society of Medical Imaging and Radiation Therapy and New Zealand Institute of Medical Radiation Technology.
A statistical approach to nuclear fuel design and performance
NASA Astrophysics Data System (ADS)
Cunning, Travis Andrew
As CANDU fuel failures can have significant economic and operational consequences for the Canadian nuclear power industry, it is essential that factors impacting fuel performance are adequately understood. Current industrial practice relies on deterministic safety analysis and the highly conservative "limit of operating envelope" approach, where all parameters are assumed to be at their limits simultaneously. This results in a conservative prediction of event consequences, with little consideration given to the high quality and precision of current manufacturing processes. This study employs a novel approach to the prediction of CANDU fuel reliability. Probability distributions are fitted to actual fuel manufacturing datasets provided by Cameco Fuel Manufacturing, Inc. They are used to form input for two industry-standard fuel performance codes: ELESTRES for the steady-state case and ELOCA for the transient case, a hypothesized 80% reactor outlet header break loss-of-coolant accident. Using a Monte Carlo technique for input generation, 10^5 independent trials are conducted and probability distributions are fitted to key model output quantities. Comparing model output against recognized industrial acceptance criteria, no fuel failures are predicted for either case. Output distributions are well removed from failure limit values, implying that margin exists in current fuel manufacturing and design. To validate the results and attempt to reduce the simulation burden of the methodology, two dimensional-reduction methods are assessed. Using just 36 trials, both methods are able to produce output distributions that agree strongly with those obtained via the brute-force Monte Carlo method, often to a relative discrepancy of less than 0.3% when predicting the first statistical moment, and a relative discrepancy of less than 5% when predicting the second statistical moment. In terms of global sensitivity, pellet density proves to have the greatest impact on fuel performance, with an average sensitivity index of 48.93% on key output quantities. Pellet grain size and dish depth are also significant contributors, at 31.53% and 13.46%, respectively. A traditional limit of operating envelope case is also evaluated. This case produces output values that exceed the maximum values observed during the 10^5 Monte Carlo trials for all output quantities of interest. In many cases the difference between the predictions of the two methods is very pronounced, and the highly conservative nature of the deterministic approach is demonstrated. A reliability analysis of CANDU fuel manufacturing parametric data, specifically pertaining to the quantification of fuel performance margins, has not been conducted previously. Key Words: CANDU, nuclear fuel, Cameco, fuel manufacturing, fuel modelling, fuel performance, fuel reliability, ELESTRES, ELOCA, dimensional reduction methods, global sensitivity analysis, deterministic safety analysis, probabilistic safety analysis.
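The overall Monte Carlo workflow, sampling manufacturing parameters from fitted distributions, propagating them through a performance model, and contrasting the resulting output distribution with a limit-of-operating-envelope point, can be sketched as below. The input distributions, the surrogate output function and the acceptance limit are all invented; the real study runs ELESTRES/ELOCA for each sampled input vector.

```python
import numpy as np

rng = np.random.default_rng(6)
n_trials = 100_000

# Hypothetical manufacturing-parameter distributions (not Cameco data):
# pellet density (g/cm3), grain size (um), dish depth (mm)
density = rng.normal(10.6, 0.05, n_trials)
grain_size = rng.normal(12.0, 1.5, n_trials)
dish_depth = rng.normal(0.30, 0.02, n_trials)

# Toy surrogate for a fuel-performance output quantity (arbitrary units)
output = (2.0
          + 1.5 * (density - 10.6) / 0.05
          + 0.9 * (grain_size - 12.0) / 1.5
          + 0.4 * (dish_depth - 0.30) / 0.02
          + rng.normal(0, 0.2, n_trials))

limit = 12.0  # hypothetical acceptance criterion
print("mean output:", output.mean().round(2), "std:", output.std().round(2))
print("probability of exceeding the limit:", np.mean(output > limit))

# Deterministic "limit of operating envelope": every input at its +3-sigma bound
loe = 2.0 + 1.5 * 3 + 0.9 * 3 + 0.4 * 3
print("limit-of-operating-envelope output:", loe)
```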
NASA Astrophysics Data System (ADS)
Alonso-González, Esteban; López-Moreno, J. Ignacio; Gascoin, Simon; García-Valdecasas Ojeda, Matilde; Sanmiguel-Vallelado, Alba; Navarro-Serrano, Francisco; Revuelto, Jesús; Ceballos, Antonio; Jesús Esteban-Parra, María; Essery, Richard
2018-02-01
We present snow observations and a validated daily gridded snowpack dataset that was simulated from downscaled reanalysis data for the Iberian Peninsula. The Iberian Peninsula has long-lasting seasonal snowpacks in its different mountain ranges, and winter snowfall occurs over most of its area. However, there are only limited direct observations of snow depth (SD) and snow water equivalent (SWE), making it difficult to analyze snow dynamics and the spatiotemporal patterns of snowfall. We used meteorological data from downscaled reanalyses as input to a physically based snow energy balance model to simulate SWE and SD over the Iberian Peninsula from 1980 to 2014. More specifically, the ERA-Interim reanalysis was downscaled to 10 km × 10 km resolution using the Weather Research and Forecasting (WRF) model. The WRF outputs were used directly, or as input to other submodels, to obtain the data needed to drive the Factorial Snow Model (FSM). We used lapse rate coefficients and hygrobarometric adjustments to simulate snow series in 100 m elevation bands for each 10 km × 10 km grid cell in the Iberian Peninsula. The snow series were validated using data from the MODIS satellite sensor and ground observations. Overall, the simulated snow series accurately reproduced the interannual variability of the snowpack and the spatial variability of snow accumulation and melting, even in very complex topographic terrain. Thus, the presented dataset may be useful for many applications, including land management, hydrometeorological studies, phenology of flora and fauna, winter tourism, and risk management. The data presented here are freely available for download from Zenodo (https://doi.org/10.5281/zenodo.854618). This paper fully describes the work flow, data validation, uncertainty assessment, and possible applications and limitations of the database.
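A minimal sketch of the elevation-band forcing idea, shifting a grid-cell temperature to 100 m bands with a constant lapse rate, is given below in Python. The lapse-rate value, cell elevation and temperature are assumptions; the actual dataset also applies hygrobarometric adjustments and drives the full FSM energy balance.

```python
import numpy as np

LAPSE_RATE = -6.5e-3  # K per metre, a typical free-air lapse rate (assumption)

def temperature_bands(t_grid, z_grid, band_elevations):
    """Shift the grid-cell temperature to a set of elevation bands using a
    constant lapse rate; a stand-in for the band-wise forcing adjustment."""
    return t_grid + LAPSE_RATE * (np.asarray(band_elevations) - z_grid)

# Hypothetical 10 km cell: mean elevation 1200 m, daily mean temperature 2.0 degC
bands = np.arange(800, 2101, 100)          # 100 m elevation bands within the cell
t_bands = temperature_bands(2.0, 1200.0, bands)

for z, t in zip(bands, t_bands):
    print(f"{z:4d} m: {t:5.2f} degC")
```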
Methods for mapping and monitoring global glaciovolcanism
NASA Astrophysics Data System (ADS)
Curtis, Aaron; Kyle, Philip
2017-03-01
The most deadly (Nevado del Ruiz, 1985) and the most costly (Eyjafjallajökull, 2010) eruptions of the last 100 years were both glaciovolcanic. Considering its great importance to studies of volcanic hazards, global climate, and even astrobiology, the global distribution of glaciovolcanism is insufficiently understood. We present and assess three algorithms for mapping, monitoring, and predicting likely centers of glaciovolcanic activity worldwide. Each algorithm intersects buffer zones representing known Holocene-active volcanic centers with existing datasets of snow, ice, and permafrost. Two detection algorithms, RGGA and PZGA, are simple spatial join operations computed from the Randolph Glacier Inventory and the Permafrost Zonation Index, respectively. The third, MDGA, is an algorithm run on all 15 available years of the MOD10A2 weekly snow cover product from the Terra MODIS satellite radiometer. Shortcomings and advantages of the three methods are discussed, including previously unreported blunders in the MOD10A2 dataset. Comparison of the results leads to an effective approach for integrating the three methods. We show that 20.4% of known Holocene volcanic centers host glaciers or areas of permanent snow. A further 10.9% potentially interact with permafrost. MDGA and PZGA do not rely on any human input, rendering them useful for investigations of change over time. An intermediate step in MDGA involves estimating the snow-covered area at every Holocene volcanic center. These estimations can be updated weekly with no human intervention. To investigate the feasibility of an automatic ice-loss alert system, we consider three examples of glaciovolcanism in the MDGA weekly dataset. We also discuss the potential use of PZGA to model past and future glaciovolcanism based on global circulation model outputs. Combined, the three algorithms provide an automated system for understanding the geographic and temporal patterns of global glaciovolcanism which should be of use for hazard assessment, the search for extreme microbiomes, climate models, and implementation of ice-cover-based volcano monitoring systems.
4D very high-resolution topography monitoring of surface deformation using UAV-SfM framework.
NASA Astrophysics Data System (ADS)
Clapuyt, François; Vanacker, Veerle; Schlunegger, Fritz; Van Oost, Kristof
2016-04-01
In recent years, exploratory research has shown that UAV-based image acquisition is suitable for environmental remote sensing and monitoring. Image acquisition with cameras mounted on a UAV can be performed at very high spatial resolution and high temporal frequency in the most dynamic environments. Combined with the Structure-from-Motion algorithm, the UAV-SfM framework is capable of providing digital surface models (DSMs) that are highly accurate when compared with other very high resolution topographic datasets and highly reproducible for repeated measurements over the same study area. In this study, we aim to assess (1) differential movement of the Earth's surface and (2) the sediment budget of a complex earthflow located in the Central Swiss Alps, based on three topographic datasets acquired over a period of 2 years. For three time steps, we acquired aerial photographs with a standard reflex camera mounted on a low-cost, lightweight UAV. The image datasets were then processed with the Structure-from-Motion algorithm in order to reconstruct a 3D dense point cloud representing the topography. Georeferencing of the outputs was achieved using ground control points (GCPs) previously surveyed in the field with an RTK GPS. Finally, a digital elevation model of differences (DOD) was computed to assess the topographic changes between the three acquisition dates, while surface displacements were quantified using image correlation techniques. Our results show that the digital elevation model of differences is able to capture surface deformation at cm-scale resolution. The mean annual displacement of the earthflow is about 3.6 m, while the forefront of the landslide has advanced by ca. 30 m over a period of 18 months. The 4D analysis permits identification of the direction and velocity of Earth-surface movement. Stable topographic ridges condition the direction of the flow, with the highest downslope movement on steep slopes and diffuse movement due to lateral sediment flux in the central part of the earthflow.
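The DOD and sediment-budget step can be illustrated with a few lines of Python: subtract the two DSMs, mask changes below a level of detection, and convert the remaining elevation changes to volumes using the cell area. The synthetic DSMs, cell size and level of detection below are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(7)
cell_size = 0.10  # m, assumed UAV-SfM DSM resolution

# Two hypothetical 200 x 200 cell DSMs of the earthflow at different dates
dsm_t0 = rng.normal(1500.0, 2.0, size=(200, 200))
dsm_t1 = dsm_t0 + rng.normal(0.0, 0.05, size=(200, 200))
dsm_t1[80:120, 60:140] -= 0.8   # imposed surface lowering (erosion patch)
dsm_t1[140:170, 60:140] += 0.6  # imposed surface raising (deposition lobe)

dod = dsm_t1 - dsm_t0                       # DEM of differences
lod = 0.10                                  # level of detection (m), from survey error
dod_sig = np.where(np.abs(dod) > lod, dod, 0.0)

cell_area = cell_size ** 2
erosion = dod_sig[dod_sig < 0].sum() * cell_area      # m3 (negative)
deposition = dod_sig[dod_sig > 0].sum() * cell_area   # m3
print(f"erosion: {erosion:.1f} m3, deposition: {deposition:.1f} m3, "
      f"net budget: {erosion + deposition:.1f} m3")
```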
SciSpark's SRDD : A Scientific Resilient Distributed Dataset for Multidimensional Data
NASA Astrophysics Data System (ADS)
Palamuttam, R. S.; Wilson, B. D.; Mogrovejo, R. M.; Whitehall, K. D.; Mattmann, C. A.; McGibbney, L. J.; Ramirez, P.
2015-12-01
Remote sensing data and climate model output are multi-dimensional arrays of massive sizes locked away in heterogeneous file formats (HDF5/4, NetCDF 3/4) and metadata models (HDF-EOS, CF) making it difficult to perform multi-stage, iterative science processing since each stage requires writing and reading data to and from disk. We have developed SciSpark, a robust Big Data framework, that extends Apache™ Spark for scaling scientific computations. Apache Spark improves the map-reduce implementation in Apache™ Hadoop for parallel computing on a cluster, by emphasizing in-memory computation, "spilling" to disk only as needed, and relying on lazy evaluation. Central to Spark is the Resilient Distributed Dataset (RDD), an in-memory distributed data structure that extends the functional paradigm provided by the Scala programming language. However, RDDs are ideal for tabular or unstructured data, and not for highly dimensional data. The SciSpark project introduces the Scientific Resilient Distributed Dataset (sRDD), a distributed-computing array structure which supports iterative scientific algorithms for multidimensional data. SciSpark processes data stored in NetCDF and HDF files by partitioning them across time or space and distributing the partitions among a cluster of compute nodes. We show usability and extensibility of SciSpark by implementing distributed algorithms for geospatial operations on large collections of multi-dimensional grids. In particular we address the problem of scaling an automated method for finding Mesoscale Convective Complexes. SciSpark provides a tensor interface to support the pluggability of different matrix libraries. We evaluate performance of the various matrix libraries in distributed pipelines, such as Nd4j™ and Breeze™. We detail the architecture and design of SciSpark, our efforts to integrate climate science algorithms, parallel ingest and partitioning (sharding) of A-Train satellite observations from model grids. These solutions are encompassed in SciSpark, an open-source software framework for distributed computing on scientific data.
A Global Drought and Flood Catalogue for the past 100 years
NASA Astrophysics Data System (ADS)
Sheffield, J.; He, X.; Peng, L.; Pan, M.; Fisher, C. K.; Wood, E. F.
2017-12-01
Extreme hydrological events cause the greatest natural-hazard impacts globally, affecting a wide range of sectors including, most prominently, agriculture, food security, and water availability and quality, but also energy production, forestry, health, transportation and fisheries. Understanding how floods and droughts intersect and have changed in the past provides the basis for understanding current risk and how it may change in the future. This requires an understanding of the mechanisms associated with events and therefore their predictability, attribution of long-term changes in risk, and quantification of projections of changes in the future. Of key importance are long-term records of relevant variables so that risk can be quantified more accurately, given the growing acknowledgement that risk is not stationary under long-term climate variability and climate change. To address this, we develop a catalogue of drought and flood events based on land surface and hydrodynamic modeling, forced by a hybrid meteorological dataset that draws on the continuity and coverage of reanalysis and satellite datasets, merged with global gauge databases. The meteorological dataset is corrected for temporal inhomogeneities, spurious trends and variable inter-dependencies to ensure long-term consistency, as well as realistic representation of short-term variability and extremes. The VIC land surface model is run for the past 100 years at 0.25-degree resolution for global land areas. The VIC runoff is then used to drive the CaMa-Flood hydrodynamic model to obtain information on flood inundation risk. The model outputs are compared to satellite-based estimates of flood and drought conditions and the observational flood record. The data are analyzed in terms of the spatio-temporal characteristics of large-scale flood and drought events, with a particular focus on characterizing the long-term variability in risk. Significant changes in risk occur on multi-decadal time scales and are mostly associated with variability in the North Atlantic and Pacific. The catalogue can be used for analysis of extreme events, risk assessment, and as a benchmark for model evaluation.
The Last Millennium Reanalysis: Improvements to proxies and proxy modeling
NASA Astrophysics Data System (ADS)
Tardif, R.; Hakim, G. J.; Emile-Geay, J.; Noone, D.; Anderson, D. M.
2017-12-01
The Last Millennium Reanalysis (LMR) employs a paleoclimate data assimilation (PDA) approach to produce climate field reconstructions (CFRs). Here, we focus on two key factors in PDA generated CFRs: the set of assimilated proxy records and forward models (FMs) used to estimate proxies from climate model output. In the initial configuration of the LMR [Hakim et al., 2016], the proxy dataset of [PAGES2k Consortium, 2013] was used, along with univariate linear FMs calibrated against annually-averaged 20th century temperature datasets. In an updated configuration, proxy records from the recent dataset [PAGES2k Consortium, 2017] are used, while a hierarchy of statistical FMs are tested: (1) univariate calibrated on annual temperature as in the initial configuration, (2) univariate against temperature as in (1) but calibration performed using expert-derived seasonality for individual proxy records, (3) as in (2) but expert proxy seasonality replaced by seasonal averaging determined objectively as part of the calibration process, (4) linear objective seasonal FMs as in (3) but objectively selecting relationships calibrated either on temperature or precipitation, and (5) bivariate linear models calibrated on temperature and precipitation with objectively-derived seasonality. (4) and (5) specifically aim at better representing the physical drivers of tree ring width proxies. Reconstructions generated using the CCSM4 Last Millennium simulation as an uninformed prior are evaluated against various 20th century data products. Results show the benefits of using the new proxy collection, particularly on the detrended global mean temperature and spatial patterns. The positive impact of using proper seasonality and temperature/moisture sensitivities for tree ring width records is also notable. This updated configuration will be used for the first generation of LMR-generated CFRs to be publicly released. These also provide a benchmark for future efforts aimed at evaluating the impact of additional proxy records and/or more sophisticated physically-based forward models. References: Hakim, G. J., and co-authors (2016), J. Geophys. Res. Atmos., doi:10.1002/2016JD024751 PAGES2K Consortium (2013), Nat. Geosci., doi:10.1038/ngeo1797 PAGES2k Consortium (2017), Sci. Data. doi:10.1038/sdata.2017.88
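A univariate linear forward model of type (1)-(2) amounts to regressing the proxy on a seasonal temperature average from the calibration product and then using the fitted slope, intercept and error variance to map model-state temperatures to proxy estimates during assimilation. The Python sketch below shows that calibration on synthetic tree-ring and temperature series; all values are invented.

```python
import numpy as np

rng = np.random.default_rng(8)

# Hypothetical calibration data: annual tree-ring width index and the corresponding
# growing-season (JJA) temperature anomaly from an instrumental product
years = np.arange(1900, 2001)
temp_jja = 0.01 * (years - 1950) + rng.normal(0, 0.3, years.size)   # warming + noise
trw = 0.8 * temp_jja + rng.normal(0, 0.2, years.size)               # synthetic proxy

# Univariate linear forward model: proxy = a * T + b + error
a, b = np.polyfit(temp_jja, trw, deg=1)
resid = trw - (a * temp_jja + b)
r = np.corrcoef(temp_jja, trw)[0, 1]
print(f"slope a = {a:.2f}, intercept b = {b:.2f}, r = {r:.2f}, "
      f"error variance = {resid.var():.3f}")

# In the assimilation step the forward model maps a model-state temperature
# to a proxy estimate, which is compared against the actual proxy record:
t_from_prior = 0.4   # hypothetical prior-ensemble JJA temperature anomaly (K)
print("proxy estimate from prior:", round(a * t_from_prior + b, 2))
```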
TerraSAR-X/TanDEM-X data for natural hazards research in mountainous regions of Uzbekistan
NASA Astrophysics Data System (ADS)
Semakova, Eleonora; Bühler, Yves
2017-07-01
Accurate and up-to-date digital elevation models (DEMs) are important tools for studying mountain hazards. We considered natural hazards related to glacier retreat, debris flows, and snow avalanches in two study areas of the Western Tien-Shan mountains, Uzbekistan. High-resolution DEMs were generated using single TerraSAR-X/TanDEM-X datasets. The high quality and currency of the DEMs were demonstrated through a comparison with Shuttle Radar Topography Mission, Advanced Spaceborne Thermal Emission and Reflection Radiometer, and Topo DEMs, using Ice, Cloud, and Land Elevation Satellite data as the reference dataset. For the first study area, which had high levels of economic activity, we applied the generated TanDEM-X DEM to an avalanche dynamics simulation using the RAMMS software. Verification of the output results showed good agreement with field observations. For the second study area, with a wide spatial distribution of glaciers, we applied the TanDEM-X DEM to an assessment of glacier surface elevation changes. The results can be used to calculate the local mass balance in glacier ablation zones in other areas. Models were applied to estimate the probability of moraine-dammed lake formation and the area affected by a possible debris flow resulting from a glacial lake outburst. The natural hazard research methods considered here will minimize costly ground observations in poorly accessible mountains and help mitigate the impacts of hazards on the environment of Uzbekistan.
Analysis of Neuronal Spike Trains, Deconstructed
Aljadeff, Johnatan; Lansdell, Benjamin J.; Fairhall, Adrienne L.; Kleinfeld, David
2016-01-01
As information flows through the brain, neuronal firing progresses from encoding the world as sensed by the animal to driving the motor output of subsequent behavior. One of the more tractable goals of quantitative neuroscience is to develop predictive models that relate the sensory or motor streams with neuronal firing. Here we review and contrast analytical tools used to accomplish this task. We focus on classes of models in which the external variable is compared with one or more feature vectors to extract a low-dimensional representation, the history of spiking and other variables are potentially incorporated, and these factors are nonlinearly transformed to predict the occurrences of spikes. We illustrate these techniques in application to datasets of different degrees of complexity. In particular, we address the fitting of models in the presence of strong correlations in the external variable, as occurs in natural sensory stimuli and in movement. Spectral correlation between predicted and measured spike trains is introduced to contrast the relative success of different methods. PMID:27477016
Genetic Network Inference: From Co-Expression Clustering to Reverse Engineering
NASA Technical Reports Server (NTRS)
Dhaeseleer, Patrik; Liang, Shoudan; Somogyi, Roland
2000-01-01
Advances in molecular biological, analytical, and computational technologies are enabling us to systematically investigate the complex molecular processes underlying biological systems. In particular, using high-throughput gene expression assays, we are able to measure the output of the gene regulatory network. We aim here to review data-mining and modeling approaches for conceptualizing and unraveling the functional relationships implicit in these datasets. Clustering of co-expression profiles allows us to infer shared regulatory inputs and functional pathways. We discuss various aspects of clustering, ranging from distance measures to clustering algorithms and multiple-cluster memberships. More advanced analysis aims to infer causal connections between genes directly, i.e., who is regulating whom and how. We discuss several approaches to the problem of reverse engineering of genetic networks, from discrete Boolean networks to continuous linear and non-linear models. We conclude that the combination of predictive modeling with systematic experimental verification will be required to gain a deeper insight into living organisms, therapeutic targeting, and bioengineering.
The Kalman Filter and High Performance Computing at NASA's Data Assimilation Office (DAO)
NASA Technical Reports Server (NTRS)
Lyster, Peter M.
1999-01-01
Atmospheric data assimilation is a method of combining actual observations with model simulations to produce a more accurate description of the Earth system than the observations alone provide. The output of data assimilation, sometimes called "the analysis", consists of accurate, regular, gridded datasets of observed and unobserved variables. This is used not only for weather forecasting but is becoming increasingly important for climate research. For example, these datasets may be used to retrospectively assess energy budgets or the effects of trace gases such as ozone. This allows researchers to understand the processes driving weather and climate, which have important scientific and policy implications. The primary goal of NASA's Data Assimilation Office (DAO) is to provide datasets for climate research and to support NASA satellite and aircraft missions. This presentation will: (1) describe ongoing work on the advanced Kalman/Lagrangian filter parallel algorithm for the assimilation of trace gases in the stratosphere; and (2) discuss the Kalman filter in relation to other presentations from the DAO on Four-Dimensional Data Assimilation at this meeting. Although the designation "Kalman filter" is often used to describe the overarching work, the series of talks will show that the scientific software and the kinds of parallelization techniques being developed at the DAO are very different depending on the type of problem being considered, the extent to which the problem is mission critical, and the degree of software engineering that has to be applied.
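The analysis step at the heart of such a filter combines a background state and its error covariance with observations through the Kalman gain. The Python sketch below shows a single textbook update for a toy two-variable state with one observation; it is illustrative only and unrelated to the DAO's actual parallel Kalman/Lagrangian filter implementation.

```python
import numpy as np

def kalman_update(x_b, P_b, y, H, R):
    """Single Kalman filter analysis step: combine a background (model) state
    x_b with covariance P_b and observations y with error covariance R,
    using the observation operator H."""
    S = H @ P_b @ H.T + R                     # innovation covariance
    K = P_b @ H.T @ np.linalg.inv(S)          # Kalman gain
    x_a = x_b + K @ (y - H @ x_b)             # analysis state
    P_a = (np.eye(len(x_b)) - K @ H) @ P_b    # analysis covariance
    return x_a, P_a

# Toy two-variable state (e.g. a tracer at two grid points), one observation of
# the first variable only
x_b = np.array([1.0, 2.0])
P_b = np.array([[0.5, 0.2],
                [0.2, 0.4]])
H = np.array([[1.0, 0.0]])
R = np.array([[0.1]])
y = np.array([1.4])

x_a, P_a = kalman_update(x_b, P_b, y, H, R)
print("analysis state:", np.round(x_a, 3))
print("analysis covariance:\n", np.round(P_a, 3))
```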
WEGO 2.0: a web tool for analyzing and plotting GO annotations, 2018 update.
Ye, Jia; Zhang, Yong; Cui, Huihai; Liu, Jiawei; Wu, Yuqing; Cheng, Yun; Xu, Huixing; Huang, Xingxin; Li, Shengting; Zhou, An; Zhang, Xiuqing; Bolund, Lars; Chen, Qiang; Wang, Jian; Yang, Huanming; Fang, Lin; Shi, Chunmei
2018-05-18
WEGO (Web Gene Ontology Annotation Plot), created in 2006, is a simple but useful tool for visualizing, comparing and plotting GO (Gene Ontology) annotation results. Owing largely to the rapid development of high-throughput sequencing and the increasing acceptance of GO, WEGO has seen strong growth in users and citations in recent years, which motivated us to update it to version 2.0. WEGO uses GO annotation results as input. Based on GO's standardized DAG (Directed Acyclic Graph) structured vocabulary system, the number of genes corresponding to each GO ID is calculated and shown in a graphical format. The WEGO 2.0 updates target four aspects, aiming to provide a more efficient and up-to-date approach for comparative genomic analyses. First, the number of input files, previously limited to three, is now unlimited, allowing WEGO to analyze multiple datasets. Also added in this version are reference datasets of nine model species that can be adopted as baselines in genomic comparative analyses. Furthermore, in the analysis process each Chi-square test is carried out across multiple datasets rather than between every pair of samples. Finally, WEGO 2.0 provides an additional output graph along with the traditional WEGO histogram, displaying the sorted P-values of GO terms and indicating their significant differences. At the same time, WEGO 2.0 features an entirely new user interface. WEGO is available for free at http://wego.genomics.org.cn.
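The multi-dataset Chi-square comparison can be illustrated with SciPy: for a given GO term, build a contingency table of annotated versus non-annotated gene counts across all input datasets and test it in one go rather than pairwise. The counts in the sketch below are invented.

```python
from scipy.stats import chi2_contingency

# Hypothetical gene counts for one GO term in three datasets, versus the
# remaining annotated genes in each dataset
annotated_in_term = [120, 95, 210]
total_genes = [8000, 7500, 12000]
not_in_term = [t - a for t, a in zip(total_genes, annotated_in_term)]

# Chi-square test across all datasets at once (rather than pairwise)
table = [annotated_in_term, not_in_term]
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.3g}")
```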
Dreano, Denis; Raitsos, Dionysios E; Gittings, John; Krokos, George; Hoteit, Ibrahim
2016-01-01
Knowledge of large-scale biological processes in the southern Red Sea is relatively limited, primarily due to the scarcity of in situ and satellite-derived chlorophyll-a (Chl-a) datasets. During summer, adverse atmospheric conditions in the southern Red Sea (haze and clouds) have long severely limited the retrieval of satellite ocean colour observations. Recently, a new merged ocean colour product developed by the European Space Agency (ESA), the Ocean Colour Climate Change Initiative (OC-CCI), has substantially improved the Chl-a coverage of the southern Red Sea, allowing the discovery of unexpectedly intense summer blooms. Here we provide the first detailed description of their spatiotemporal distribution and report the mechanisms regulating them. During summer, the monsoon-driven wind reversal modifies the circulation dynamics at the Bab-el-Mandeb strait, leading to a subsurface influx of colder, fresher, nutrient-rich water from the Indian Ocean. Using satellite observations, model simulation outputs, and in situ datasets, we track the pathway of this intrusion into the extensive shallow areas and coral reef complexes along the basin's shores. We also provide statistical evidence that the subsurface intrusion plays a key role in the development of the southern Red Sea phytoplankton blooms.
Development of web-GIS system for analysis of georeferenced geophysical data
NASA Astrophysics Data System (ADS)
Okladnikov, I.; Gordov, E. P.; Titov, A. G.; Bogomolov, V. Y.; Genina, E.; Martynova, Y.; Shulgina, T. M.
2012-12-01
Georeferenced datasets (meteorological databases, modeling and reanalysis results, remote sensing products, etc.) are currently actively used in numerous applications, including modeling, interpretation and forecast of climatic and ecosystem changes at various spatial and temporal scales. Due to the inherent heterogeneity of environmental datasets, as well as their huge size (up to tens of terabytes for a single dataset), studies of climate and environmental change currently require special software support. A dedicated web-GIS information-computational system for analysis of georeferenced climatological and meteorological data has been created. The information-computational system consists of 4 basic parts: a computational kernel developed using GNU Data Language (GDL), a set of PHP controllers run within a specialized web portal, JavaScript class libraries for development of typical components of a web mapping application graphical user interface (GUI) based on AJAX technology, and an archive of geophysical datasets. The computational kernel comprises a number of dedicated modules for querying and extraction of data, mathematical and statistical data analysis, visualization, and preparation of output files in geoTIFF and netCDF format containing processing results. The specialized web portal consists of an Apache web server, OGC-compliant GeoServer software used as a base for presenting cartographical information over the Web, and a set of PHP controllers implementing the web-mapping application logic and governing the computational kernel. The JavaScript libraries for graphical user interface development are based on the GeoExt library, combining the ExtJS framework and OpenLayers software. The archive of geophysical data consists of a number of structured environmental datasets represented by data files in netCDF, HDF, GRIB and ESRI Shapefile formats. The following datasets are available for processing by the system: two editions of NCEP/NCAR Reanalysis, JMA/CRIEPI JRA-25 Reanalysis, ECMWF ERA-40 Reanalysis, ECMWF ERA Interim Reanalysis, MRI/JMA APHRODITE's Water Resources Project Reanalysis, DWD Global Precipitation Climatology Centre's data, GMAO Modern Era-Retrospective analysis for Research and Applications, meteorological observational data for the territory of the former USSR for the 20th century, results of modeling by global and regional climatological models, and others. The system is already used in scientific research; in particular, it was recently applied to the analysis of climate change in Siberia and its regional impacts. The web-GIS information-computational system for geophysical data analysis provides specialists involved in multidisciplinary research projects with reliable and practical instruments for complex analysis of climate and ecosystem changes on global and regional scales. Using it, even a user without specific expertise can perform computational processing and visualization of large meteorological, climatological and satellite monitoring datasets through a unified web interface in a common graphical web browser. This work is partially supported by the Ministry of education and science of the Russian Federation (contract #07.514.114044), projects IV.31.1.5, IV.31.2.7, RFBR grants #10-07-00547a, #11-05-01190a, and integrated project SB RAS #131.
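As a rough sketch of the query-extract-export step such a computational kernel performs (the actual kernel is written in GDL; here Python with xarray and rasterio is used instead, and the file name, variable name and grid assumptions are placeholders), one might read a reanalysis variable from netCDF, average it over time, and write the result as a geoTIFF.

```python
import numpy as np
import xarray as xr
import rasterio
from rasterio.transform import from_origin

# Placeholder input: a reanalysis file with dimensions (time, lat, lon).
ds = xr.open_dataset("reanalysis_t2m.nc")        # hypothetical file name
field = ds["t2m"].mean(dim="time")               # hypothetical variable name

lat = field["lat"].values
lon = field["lon"].values
data = field.values.astype("float32")

# Ensure north-up orientation before building the affine transform.
if lat[0] < lat[-1]:
    data = data[::-1, :]
    lat = lat[::-1]
dlon = float(abs(lon[1] - lon[0]))
dlat = float(abs(lat[1] - lat[0]))
transform = from_origin(lon.min() - dlon / 2, lat.max() + dlat / 2, dlon, dlat)

with rasterio.open(
    "t2m_mean.tif", "w", driver="GTiff",
    height=data.shape[0], width=data.shape[1],
    count=1, dtype="float32", crs="EPSG:4326", transform=transform,
) as dst:
    dst.write(data, 1)
```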
DOE Office of Scientific and Technical Information (OSTI.GOV)
Di Vittorio, Alan V.; Kyle, Page; Collins, William D.
Understanding potential impacts of climate change is complicated by spatially mismatched land representations between gridded datasets and models, and land use models with larger regions defined by geopolitical and/or biophysical criteria. Here, we quantify the sensitivity of Global Change Assessment Model (GCAM) outputs to the delineation of Agro-Ecological Zones (AEZs), which are normally based on historical (1961–1990) climate. We reconstruct GCAM's land regions using projected (2071–2100) climate, and find large differences in estimated future land use that correspond with differences in agricultural commodity prices and production volumes. Importantly, historically delineated AEZs experience spatially heterogeneous climate impacts over time, and do not necessarily provide more homogeneous initial land productivity than projected AEZs. Finally, we conclude that non-climatic criteria for land use region delineation are likely preferable for modeling land use change in the context of climate change, and that uncertainty associated with land delineation needs to be quantified.
Dynamical influences on thermospheric composition: implications for semi-empirical models
NASA Astrophysics Data System (ADS)
Sutton, E. K.; Solomon, S. C.
2014-12-01
The TIE-GCM was recently augmented to include helium and argon, two approximately inert species that can be used as tracers of dynamics in the thermosphere. The former species is treated as a major species due to its large abundance near the upper boundary. The effects of exospheric transport are also included in order to simulate realistic seasonal and latitudinal helium distributions. The latter species is treated as a classical minor species, exerting no forces on the background atmosphere. In this study, we examine the interplay of the various dynamical terms - i.e. background circulation, molecular and eddy diffusion - as they drive departures from the distributions that would be expected under the assumption of diffusive equilibrium. As this has implications for the formulation of all empirical thermospheric models, we use this understanding to address the following questions: (1) how do errors caused by the assumption of diffusive equilibrium manifest within empirical models of the thermosphere? and (2) where and when does an empirical model's output disagree with its underlying datasets due to the inherent limitations of the model's formulation?
Di Vittorio, Alan V.; Kyle, Page; Collins, William D.
2016-09-03
Understanding potential impacts of climate change is complicated by spatially mismatched land representations between gridded datasets and models, and land use models with larger regions defined by geopolitical and/or biophysical criteria. Here, we quantify the sensitivity of Global Change Assessment Model (GCAM) outputs to the delineation of Agro-Ecological Zones (AEZs), which are normally based on historical (1961–1990) climate. We reconstruct GCAM's land regions using projected (2071–2100) climate, and find large differences in estimated future land use that correspond with differences in agricultural commodity prices and production volumes. Importantly, historically delineated AEZs experience spatially heterogeneous climate impacts over time, and do not necessarily provide more homogeneous initial land productivity than projected AEZs. Finally, we conclude that non-climatic criteria for land use region delineation are likely preferable for modeling land use change in the context of climate change, and that uncertainty associated with land delineation needs to be quantified.
NASA Astrophysics Data System (ADS)
Vannametee, E.; Karssenberg, D.; Hendriks, M. R.; de Jong, S. M.; Bierkens, M. F. P.
2010-05-01
We propose a modelling framework for distributed hydrological modelling of 10³-10⁵ km² catchments by discretizing the catchment in geomorphologic units. Each of these units is modelled using a lumped model representative for the processes in the unit. Here, we focus on the development and parameterization of this lumped model as a component of our framework. The development of the lumped model requires rainfall-runoff data for an extensive set of geomorphological units. Because such large observational data sets do not exist, we create artificial data. With a high-resolution, physically-based, rainfall-runoff model, we create artificial rainfall events and resulting hydrographs for an extensive set of different geomorphological units. This data set is used to identify the lumped model of geomorphologic units. The advantage of this approach is that it results in a lumped model with a physical basis, with representative parameters that can be derived from point-scale measurable physical parameters. The approach starts with the development of the high-resolution rainfall-runoff model that generates an artificial discharge dataset from rainfall inputs as a surrogate of a real-world dataset. The model is run for approximately 10⁵ scenarios that describe different characteristics of rainfall, properties of the geomorphologic units (i.e. slope gradient, unit length and regolith properties), antecedent moisture conditions and flow patterns. For each scenario-run, the results of the high-resolution model (i.e. runoff and state variables) at selected simulation time steps are stored in a database. The second step is to develop the lumped model of a geomorphological unit. This forward model consists of a set of simple equations that calculate Hortonian runoff and state variables of the geomorphologic unit over time. The lumped model contains only three parameters: a ponding factor, a linear reservoir parameter, and a lag time. The model is capable of giving an appropriate representation of the transient rainfall-runoff relations that exist in the artificial data set generated with the high-resolution model. The third step is to find the values of empirical parameters in the lumped forward model using the artificial dataset. For each scenario of the high-resolution model run, a set of lumped model parameters is determined with a fitting method using the corresponding time series of state variables and outputs retrieved from the database. Thus, the parameters in the lumped model can be estimated by using the artificial data set. The fourth step is to develop an approach to assign lumped model parameters based upon the properties of the geomorphological unit. This is done by finding relationships between the measurable physical properties of geomorphologic units (i.e. slope gradient, unit length, and regolith properties) and the lumped forward model parameters using multiple regression techniques. In this way, a set of lumped forward model parameters can be estimated as a function of morphology and physical properties of the geomorphologic units. The lumped forward model can then be applied to different geomorphologic units. Finally, the performance of the lumped forward model is evaluated; the outputs of the lumped forward model are compared with the results of the high-resolution model.
Our results show that the lumped forward model gives the best estimates of total discharge volumes and peak discharges when rain intensities are not significantly larger than the infiltration capacities of the units and when the units are small with a flat gradient. Hydrograph shapes are fairly well reproduced for most cases except for flat and elongated units with large runoff volumes. The results of this study provide a first step towards developing low-dimensional models for large ungauged basins.
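The three-parameter lumped forward model is described only qualitatively above; the sketch below is one possible reading of it (a ponding factor scaling effective rainfall, a linear reservoir, and a lag applied to the hydrograph), offered purely as an illustration and not as the authors' actual equations.

```python
import numpy as np

def lumped_unit_runoff(rain, ponding_factor, k_reservoir, lag_steps, dt=1.0):
    """Illustrative three-parameter lumped runoff model for one geomorphologic unit.

    rain           : rainfall intensity per time step (e.g. mm/h), 1-D array
    ponding_factor : fraction of rainfall that becomes effective runoff input (0..1)
    k_reservoir    : linear reservoir constant (1/h); outflow = k_reservoir * storage
    lag_steps      : integer delay (time steps) applied to the simulated hydrograph
    """
    storage = 0.0
    q = np.zeros(len(rain))
    for t, p in enumerate(rain):
        storage += ponding_factor * p * dt      # effective rainfall entering storage
        q[t] = k_reservoir * storage            # linear reservoir outflow
        storage = max(storage - q[t] * dt, 0.0)
    if lag_steps > 0:                           # shift the hydrograph by the lag time
        q = np.concatenate([np.zeros(lag_steps), q[:-lag_steps]])
    return q

# Example: a 2-hour storm of 10 mm/h within a 24-hour (hourly) series
rain = np.zeros(24)
rain[2:4] = 10.0
print(lumped_unit_runoff(rain, ponding_factor=0.6, k_reservoir=0.3, lag_steps=2).round(2))
```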
NASA Astrophysics Data System (ADS)
Ferré, Hélène; Belmahfoud, Nizar; Boichard, Jean-Luc; Brissebrat, Guillaume; Cloché, Sophie; Descloitres, Jacques; Fleury, Laurence; Focsa, Loredana; Henriot, Nicolas; Mière, Arnaud; Ramage, Karim; Vermeulen, Anne; Boulanger, Damien
2015-04-01
The Chemistry-Aerosol Mediterranean Experiment (ChArMEx, http://charmex.lsce.ipsl.fr/) aims at a scientific assessment of the present and future state of the atmospheric environment in the Mediterranean Basin, and of its impacts on the regional climate, air quality, and marine biogeochemistry. The project includes long-term monitoring of environmental parameters, intensive field campaigns, use of satellite data and modelling studies. Therefore ChArMEx scientists produce and need to access a wide diversity of data. In this context, the objective of the database task is to organize data management, the distribution system and services, such as facilitating the exchange of information and stimulating collaboration between researchers within the ChArMEx community, and beyond. The database relies on a strong collaboration between the ICARE, IPSL and OMP data centers and has been set up in the framework of the Mediterranean Integrated Studies at Regional And Local Scales (MISTRALS) program data portal. ChArMEx data, either produced or used by the project, are documented and accessible through the database website: http://mistrals.sedoo.fr/ChArMEx. The website offers the usual but user-friendly functionalities: data catalog, user registration procedure, search tool to select and access data... The metadata (data descriptions) are standardized, and comply with international standards (ISO 19115-19139; INSPIRE European Directive; Global Change Master Directory Thesaurus). A Digital Object Identifier (DOI) assignment procedure automatically registers the datasets, in order to make them easier to access, cite, reuse and verify. At present, the ChArMEx database contains about 120 datasets, including more than 80 in situ datasets (2012, 2013 and 2014 summer campaigns, background monitoring station of Ersa...), 25 model output sets (dust model intercomparison, MEDCORDEX scenarios...), a high resolution emission inventory over the Mediterranean... Many in situ datasets have been inserted in a relational database, in order to enable more accurate selection and download of different datasets in a shared format. Many dedicated satellite products (SEVIRI, TRMM, PARASOL...) are processed and will soon be accessible through the database website. In order to meet the operational needs of the airborne and ground-based observational teams during the ChArMEx campaigns, a day-to-day chart display website has been developed and operated: http://choc.sedoo.org. It offers a convenient way to browse weather conditions and chemical composition during the campaign periods. Every scientist is invited to visit the ChArMEx websites, to register and to request data. Feel free to contact charmex-database@sedoo.fr with any questions.
Exact algorithms for haplotype assembly from whole-genome sequence data.
Chen, Zhi-Zhong; Deng, Fei; Wang, Lusheng
2013-08-15
Haplotypes play a crucial role in genetic analysis and have many applications such as gene-disease diagnosis, association studies, ancestry inference and so forth. The development of DNA sequencing technologies makes it possible to obtain haplotypes from a set of aligned reads originating from both copies of a chromosome of a single individual. This approach is often known as haplotype assembly. Exact algorithms that can give optimal solutions to the haplotype assembly problem are in high demand. Unfortunately, previous algorithms for this problem either fail to output optimal solutions or take too long, even when executed on a PC cluster. We develop an approach to finding optimal solutions for the haplotype assembly problem under the minimum-error-correction (MEC) model. Most previous approaches assume that the columns in the input matrix correspond to (putative) heterozygous sites. This all-heterozygous assumption is correct for most columns, but it may be incorrect for a small number of columns. In this article, we consider the MEC model with or without the all-heterozygous assumption. In our approach, we first use new methods to decompose the input read matrix into small independent blocks and then model the problem for each block as an integer linear programming problem, which is then solved by an integer linear programming solver. We have tested our program on a single PC [a Linux (x64) desktop PC with an i7-3960X CPU], using the filtered HuRef and NA12878 datasets (after applying some variant calling methods). With the all-heterozygous assumption, our approach can optimally solve the whole HuRef data set within a total time of 31 h (26 h for the most difficult block of the 15th chromosome and only 5 h for the other blocks). To our knowledge, this is the first time that MEC optimal solutions have been completely obtained for the filtered HuRef dataset. Moreover, in the general case (without the all-heterozygous assumption), for the HuRef dataset our approach can optimally solve all the chromosomes except the most difficult block in chromosome 15 within a total time of 12 days. For both the HuRef and NA12878 datasets, the optimal costs in the general case are sometimes much smaller than those in the all-heterozygous case. This implies that some columns in the input matrix (after applying certain variant calling methods) still correspond to false-heterozygous sites. Our program and the optimal solutions found for the HuRef dataset are available at http://rnc.r.dendai.ac.jp/hapAssembly.html.
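To make the MEC objective concrete, the following sketch solves a toy block by brute force: it enumerates candidate haplotypes, assigns each read to the closer of the haplotype and its complement, and counts the required corrections. This illustrates the criterion only; it is not the authors' ILP-based solver and would not scale to real blocks.

```python
from itertools import product

def mec_cost(reads, haplotype):
    """Corrections needed if each read is assigned to haplotype or its complement."""
    complement = [1 - a for a in haplotype]
    total = 0
    for read in reads:                      # read entries: 0, 1, or None (no coverage)
        d_h = sum(1 for r, a in zip(read, haplotype) if r is not None and r != a)
        d_c = sum(1 for r, a in zip(read, complement) if r is not None and r != a)
        total += min(d_h, d_c)
    return total

def solve_mec_brute_force(reads, n_sites):
    """Exhaustive MEC for a small block (first allele fixed to 0 by symmetry)."""
    best_h, best_cost = None, float("inf")
    for tail in product((0, 1), repeat=n_sites - 1):
        h = (0,) + tail
        cost = mec_cost(reads, h)
        if cost < best_cost:
            best_h, best_cost = h, cost
    return best_h, best_cost

# Toy block: 4 reads over 4 putative heterozygous sites (None = site not covered)
reads = [
    (0, 0, 1, None),
    (0, 0, 1, 1),
    (1, 1, None, 0),
    (1, 1, 0, 0),
]
print(solve_mec_brute_force(reads, n_sites=4))   # -> ((0, 0, 1, 1), 0)
```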
Diouf, Ibrahima; Rodriguez-Fonseca, Belen; Deme, Abdoulaye; Caminade, Cyril; Morse, Andrew P.; Cisse, Moustapha; Sy, Ibrahima; Dia, Ibrahima; Ermert, Volker; Ndione, Jacques-André; Gaye, Amadou Thierno
2017-01-01
The analysis of the spatial and temporal variability of climate parameters is crucial to study the impact of climate-sensitive vector-borne diseases such as malaria. The use of malaria models is an alternative way of producing potential malaria historical data for Senegal due to the lack of reliable observations for malaria outbreaks over a long time period. Consequently, here we use the Liverpool Malaria Model (LMM), driven by different climatic datasets, in order to study and validate simulated malaria parameters over Senegal. The findings confirm that the risk of malaria transmission is mainly linked to climate variables such as rainfall and temperature as well as specific landscape characteristics. For the whole of Senegal, a lag of two months is generally observed between the peak of rainfall in August and the maximum number of reported malaria cases in October. The malaria transmission season usually takes place from September to November, corresponding to the second peak of temperature occurring in October. Observed malaria data from the Programme National de Lutte contre le Paludisme (PNLP, National Malaria control Programme in Senegal) and outputs from the meteorological data used in this study were compared. The malaria model outputs present some consistencies with observed malaria dynamics over Senegal, and further allow the exploration of simulations performed with reanalysis data sets over a longer time period. The simulated malaria risk significantly decreased during the 1970s and 1980s over Senegal. This result is consistent with the observed decrease of malaria vectors and malaria cases reported by field entomologists and clinicians in the literature. The main differences between model outputs and observations regard amplitude, but can be related not only to reanalysis deficiencies but also to other environmental and socio-economic factors that are not included in this mechanistic malaria model framework. The present study can be considered as a validation of the reliability of reanalysis to be used as inputs for the calculation of malaria parameters in the Sahel using dynamical malaria models. PMID:28946705
The distance function effect on k-nearest neighbor classification for medical datasets.
Hu, Li-Yu; Huang, Min-Wei; Ke, Shih-Wen; Tsai, Chih-Fong
2016-01-01
K-nearest neighbor (k-NN) classification is a conventional non-parametric classifier that has been used as the baseline classifier in many pattern classification problems. It is based on measuring the distances between the test data and each of the training data to decide the final classification output. Although the Euclidean distance function is the most widely used distance metric in k-NN, no study has examined the classification performance of k-NN with different distance functions, especially for various medical domain problems. Therefore, the aim of this paper is to investigate whether the distance function can affect k-NN performance over different medical datasets. Our experiments are based on three different types of medical datasets containing categorical, numerical, and mixed types of data, and four different distance functions, including Euclidean, cosine, Chi-square, and Minkowski, are used during k-NN classification individually. The experimental results show that using the Chi-square distance function is the best choice for the three different types of datasets. However, the cosine and Euclidean (and Minkowski) distance functions perform the worst over the mixed type of datasets. In this paper, we demonstrate that the chosen distance function can affect the classification accuracy of the k-NN classifier. For the medical domain datasets including the categorical, numerical, and mixed types of data, k-NN based on the Chi-square distance function performs best.
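A hedged sketch of such a comparison is shown below: scikit-learn's KNeighborsClassifier accepts a user-defined metric, so a Chi-square distance can be evaluated alongside Euclidean and cosine. The dataset used here is a generic stand-in, not the medical datasets of the paper, and the Chi-square distance assumes non-negative features (hence the min-max scaling).

```python
import numpy as np
from sklearn.datasets import load_breast_cancer      # stand-in for a medical dataset
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import MinMaxScaler

def chi_square_distance(x, y, eps=1e-10):
    # Chi-square distance; assumes non-negative features
    return np.sum((x - y) ** 2 / (x + y + eps))

X, y = load_breast_cancer(return_X_y=True)
X = MinMaxScaler().fit_transform(X)                    # keep features non-negative

for name, metric in [("euclidean", "euclidean"),
                     ("cosine", "cosine"),
                     ("chi-square", chi_square_distance)]:
    knn = KNeighborsClassifier(n_neighbors=5, metric=metric, algorithm="brute")
    scores = cross_val_score(knn, X, y, cv=5)
    print(f"{name:10s} accuracy = {scores.mean():.3f}")
```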
Multi-Party Privacy-Preserving Set Intersection with Quasi-Linear Complexity
NASA Astrophysics Data System (ADS)
Cheon, Jung Hee; Jarecki, Stanislaw; Seo, Jae Hong
Secure computation of the set intersection functionality allows n parties to find the intersection between their datasets without revealing anything else about them. An efficient protocol for such a task could have multiple potential applications in commerce, health care, and security. However, all currently known secure set intersection protocols for n>2 parties have computational costs that are quadratic in the (maximum) number of entries in the dataset contributed by each party, making secure computation of the set intersection only practical for small datasets. In this paper, we describe the first multi-party protocol for securely computing the set intersection functionality with both the communication and the computation costs that are quasi-linear in the size of the datasets. For a fixed security parameter, our protocols require O(n²k) bits of communication and Õ(n²k) group multiplications per player in the malicious adversary setting, where k is the size of each dataset. Our protocol follows the basic idea of the protocol proposed by Kissner and Song, but we gain efficiency by using different representations of the polynomials associated with users' datasets and careful employment of algorithms that interpolate or evaluate polynomials on multiple points more efficiently. Moreover, the proposed protocol is robust. This means that the protocol outputs the desired result even if some corrupted players leave during the execution of the protocol.
Black carbon in the Arctic: How well is it captured by models?
NASA Astrophysics Data System (ADS)
Eckhardt, Sabine
2015-04-01
A correct representation of the spatial distribution of aerosols in atmospheric models is essential for realistic simulations of deposition and calculations of radiative forcing. It has been observed that transport of black carbon (BC) into the Arctic and its scavenging are sometimes not captured accurately enough in chemistry transport models (CTMs) as well as global circulation models (GCMs). In this study we determine the discrepancies between measured equivalent BC (EBC) and modeled BC for several Arctic measurement stations as well as for Arctic aircraft campaigns. For this, we use the output of a set of 11 models based on the same emission dataset (ECLIPSE emissions, see eclipse.nilu.no) and evaluate the simulated concentrations at the measurement locations and times. Emissions are separated into different sources such as biomass burning, domestic heating, gas flaring, industry and the transport sector. We focus on the years 2008 and 2009, when many campaigns took place in the framework of the International Polar Year. Arctic stations like Barrow, Alert, Station Nord in Greenland and Zeppelin show a very pronounced winter/spring maximum in BC. While monthly averaged measured EBC values are around 80 ng/m3, the models severely underestimate this, with some models simulating only a small percentage of the observed values. During summer, measured concentrations are an order of magnitude lower, and are still underestimated by almost an order of magnitude in some models. However, the best models are within a factor of 2 in winter/spring and simulate realistic concentrations in summer. In order to get information on the vertical profile we used measurements from aircraft campaigns like ARCTAS, ARCPAC and HIPPO. It is found that BC at latitudes below 60 degrees is better captured by the models than BC at higher latitudes, even though it is overestimated at high altitudes. A systematic analysis of the performance of the different models is presented. With this dataset we capture remote, polluted and fire-influenced conditions.
Pelone, Ferruccio; Kringos, Dionne S; Spreeuwenberg, Peter; De Belvis, Antonio G; Groenewegen, Peter P
2013-09-01
To measure the relative efficiency of primary care (PC) in turning their structures into services delivery and turning their services delivery into quality outcomes. Cross-sectional study based on the dataset of the Primary Healthcare Activity Monitor for Europe project. Two Data Envelopment models were run to compare the relative technical efficiency. A sensitivity analysis of the resulting efficiency scores was performed. PC systems in 22 European countries in 2009/2010. Model 1 included data on PC governance, workforce development and economic conditions as inputs and access, coordination, continuity and comprehensiveness of care as outputs. Model 2 included the previous process dimensions as inputs and quality indicators as outputs. There is relatively reasonable efficiency in all countries at delivering as many as possible PC processes at a given level of PC structure. It is particularly important to invest in economic conditions to achieve an efficient structure-process balance. Only five countries have fully efficient PC systems in turning their services delivery into high quality outcomes, using a similar combination of access, continuity and comprehensiveness, although they differ on the adoption of coordination of services. There is a large variation in efficiency levels obtained by countries with inefficient PC in turning their services delivery into quality outcomes. Maximizing the individual functions of PC without taking into account the coherence within the health-care system is not sufficient from a policymaker's point of view when aiming to achieve efficiency.
NASA Astrophysics Data System (ADS)
Keppel-Aleks, G.; Hoffman, F. M.
2014-12-01
Feedbacks between the global carbon cycle and climate represent one of the largest uncertainties in climate prediction. A promising method for reducing uncertainty in predictions of carbon-climate feedbacks is based on identifying an "emergent constraint" that leverages correlations between mechanistically linked long-term feedbacks and short-term variations within the model ensemble. By applying contemporary observations to evaluate model skill in simulating short-term variations, we may be able to better assess the probability of simulated long-term feedbacks. We probed the constraint on long-term terrestrial carbon stocks provided by climate-driven fluctuations in the atmospheric CO2 growth rate at contemporary timescales. We considered the impact of both temperature and precipitation anomalies on terrestrial ecosystem exchange and further separated the direct influence of fire where possible. When we explicitly considered the role of atmospheric transport in smoothing the imprint of climate-driven flux anomalies on atmospheric CO2 patterns, we found that the extent of temporal averaging of both the observations and ESM output leads to estimates for the long-term climate sensitivity of tropical land carbon storage that are different by a factor of two. In the context of these results, we discuss strategies for applying emergent constraints for benchmarking biogeochemical feedbacks in ESMs. Specifically, our results underscore the importance of selecting appropriate observational benchmarks and, for future model intercomparison projects, outputting fields that most closely correspond to available observational datasets.
A downscaled 1 km dataset of daily Greenland ice sheet surface mass balance components (1958-2014)
NASA Astrophysics Data System (ADS)
Noel, B.; Van De Berg, W. J.; Fettweis, X.; Machguth, H.; Howat, I. M.; van den Broeke, M. R.
2015-12-01
The current spatial resolution of regional climate models (RCMs), typically around 5 to 20 km, remains too coarse to accurately reproduce the spatial variability in surface mass balance (SMB) components over the narrow ablation zones, marginal outlet glaciers and neighbouring ice caps of the Greenland ice sheet (GrIS). In these topographically rough terrains, the SMB components are highly dependent on local variations in topography. However, the relatively low-resolution elevation and ice mask prescribed in RCMs contribute to a significant underestimation of melt and runoff in these regions due to unresolved valley glaciers and fjords. Therefore, near-km resolution topography is essential to better capture SMB variability in these spatially restricted regions. We present a 1 km resolution dataset of daily GrIS SMB covering the period 1958-2014, which is statistically downscaled from data of the polar regional climate model RACMO2.3 at 11 km, using an elevation dependence. The dataset includes all individual SMB components projected on the elevation and ice mask from the GIMP DEM, down-sampled to 1 km. Daily runoff and sublimation are interpolated to the 1 km topography using a local regression to elevation valid for each day specifically; daily precipitation is bi-linearly downscaled without elevation corrections. The daily SMB dataset is then reconstructed by summing downscaled precipitation, sublimation and runoff. The high-resolution elevation and ice mask allow the narrow ablation zones and valley glaciers at the GrIS margins to be properly resolved, leading to a significant increase in runoff estimates. In these regions, and especially over narrow glacier tongues, the downscaled products improve on the original RACMO2.3 outputs by better representing local SMB patterns through a gradual ablation increase towards the GrIS margins. We discuss the impact of downscaling on the SMB components in a case study for a spatially restricted region, where large elevation discrepancies are observed between the two resolutions. Owing to the generally enhanced runoff in the GrIS ablation zone, the evaluation of daily downscaled SMB against ablation measurements, collected at in situ measuring sites and compiled in a new ablation dataset, shows better agreement with observations than the native RACMO2.3 SMB at 11 km.
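A minimal sketch of the elevation-based downscaling of a daily runoff field is given below; it is simplified to a single least-squares regression against elevation, with no ice-mask handling or locality, so it only illustrates the idea, and the toy grids are invented.

```python
import numpy as np

def downscale_daily_runoff(runoff_11km, elev_11km, elev_1km):
    """Downscale one day of runoff via a regression against elevation.

    runoff_11km : daily runoff on the coarse grid (2-D array, mm w.e.)
    elev_11km   : model topography on the coarse grid (2-D array, m)
    elev_1km    : high-resolution DEM (2-D array, m)
    Returns the runoff field evaluated on the 1 km DEM.
    """
    x = elev_11km.ravel()
    y = runoff_11km.ravel()
    valid = np.isfinite(x) & np.isfinite(y)
    slope, intercept = np.polyfit(x[valid], y[valid], deg=1)   # runoff-elevation gradient
    runoff_1km = intercept + slope * elev_1km
    return np.clip(runoff_1km, 0.0, None)                      # runoff cannot be negative

# Toy example: runoff decreasing with elevation on a coarse grid
elev_11km = np.array([[200.0, 800.0], [1400.0, 2000.0]])
runoff_11km = np.array([[12.0, 8.0], [4.0, 0.5]])
elev_1km = np.linspace(100.0, 2100.0, 9).reshape(3, 3)
print(downscale_daily_runoff(runoff_11km, elev_11km, elev_1km).round(1))
```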
How well does your model capture the terrestrial ecosystem dynamics of the Arctic-Boreal Region?
NASA Astrophysics Data System (ADS)
Stofferahn, E.; Fisher, J. B.; Hayes, D. J.; Huntzinger, D. N.; Schwalm, C.
2016-12-01
The Arctic-Boreal Region (ABR) is a major source of uncertainty for terrestrial biosphere model (TBM) simulations. These uncertainties are precipitated by a lack of observational data from the region, affecting the parameterizations of cold-environment processes in the models. Addressing these uncertainties requires a coordinated effort of data collection and integration of the following key indicators of the ABR ecosystem: disturbance, flora/fauna and related ecosystem function, carbon pools and biogeochemistry, permafrost, and hydrology. We are developing a model-data integration framework for NASA's Arctic Boreal Vulnerability Experiment (ABoVE), wherein data collection for the key ABoVE indicators is driven by matching observations and model outputs to the ABoVE indicators. The data are used as reference datasets for a benchmarking system which evaluates TBM performance with respect to ABR processes. The benchmarking system utilizes performance metrics to identify intra-model and inter-model strengths and weaknesses, which in turn provides guidance to model development teams for reducing uncertainties in TBM simulations of the ABR. The system is directly connected to the International Land Model Benchmarking (ILAMB) system, as an ABR-focused application.
Towards 250 m mapping of terrestrial primary productivity over Canada
NASA Astrophysics Data System (ADS)
Gonsamo, A.; Chen, J. M.
2011-12-01
Terrestrial ecosystems are an important part of the climate and global change systems. Their role in climate change and in the global carbon cycle is yet to be well understood. Datasets from satellite Earth observation, coupled with numerical models, provide unique tools for monitoring the spatial and temporal dynamics of the terrestrial carbon cycle. The Boreal Ecosystems Productivity Simulator (BEPS) is a remote sensing based approach to quantifying the terrestrial carbon cycle, estimating gross and net primary productivity (GPP and NPP) and terrestrial carbon sinks and sources expressed as net ecosystem productivity (NEP). We have implemented a scheme to map GPP, NPP and NEP at 250 m for the first time over Canada using the BEPS model. This is supplemented by improved mapping of land cover and leaf area index (LAI) at 250 m over Canada from the MODIS satellite dataset. The results from BEPS are compared with the MODIS GPP product and further evaluated with LAI estimates from various sources to assess whether the results capture trends in the distribution of photosynthetic biomass. The final evaluation will validate both BEPS and MODIS primary productivity estimates at Fluxnet sites across Canada. The preliminary evaluation indicates that BEPS GPP estimates capture overstorey LAI variations over Canada well compared to MODIS GPP estimates. There is a large offset in MODIS GPP, which over-estimates lower GPP values compared to BEPS GPP estimates. These variations will be further validated against measured values from Fluxnet tower measurements over Canada. The high-resolution GPP (NPP) products at 250 m will further be used to scale the outputs between different ecosystem productivity models, in our case the carbon budget model of the Canadian forest sector (CBM-CFS) and the Integrated Terrestrial Ecosystem Carbon model (InTEC).
NASA Astrophysics Data System (ADS)
Chen, Xin; Xing, Pei; Luo, Yong; Nie, Suping; Zhao, Zongci; Huang, Jianbin; Wang, Shaowu; Tian, Qinhua
2017-02-01
A new dataset of surface temperature over North America has been constructed by merging climate model results and empirical tree-ring data through the application of an optimal interpolation algorithm. Errors of both the Community Climate System Model version 4 (CCSM4) simulation and the tree-ring reconstruction were considered to optimize the combination of the two elements. Variance matching was used to reconstruct the surface temperature series. The model simulation provided the background field, and the error covariance matrix was estimated statistically using samples from the simulation results with a running 31-year window for each grid cell. Thus, the merging process could proceed with a time-varying gain matrix. This merging method (MM) was tested using two types of experiment, and the results indicated that the standard deviation of errors was about 0.4 °C lower than that of the tree-ring reconstructions and about 0.5 °C lower than that of the model simulation. Because of internal variability and uncertainties in the external forcing data, the simulated decadal warm-cool periods were readjusted by the MM such that the decadal variability was more reliable (e.g., the 1940-1960s cooling). During the two centuries (1601-1800 AD) of the preindustrial period, the MM results revealed a compromised spatial pattern of the linear trend of surface temperature, which is in accordance with the phase transitions of the Pacific decadal oscillation and Atlantic multidecadal oscillation. Compared with pure CCSM4 simulations, it was demonstrated that the MM brought a significant improvement to the decadal variability of the gridded temperature via the merging of temperature-sensitive tree-ring records.
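In its simplest scalar form (per grid cell, with error variances assumed known rather than estimated from a 31-year running window, and no spatial covariance), the optimal-interpolation merge reduces to a variance-weighted combination of the model background and the tree-ring reconstruction, as in this illustrative sketch with invented numbers.

```python
import numpy as np

def oi_merge(background, reconstruction, var_background, var_reconstruction):
    """Scalar optimal interpolation: merge two estimates weighted by their error variances."""
    gain = var_background / (var_background + var_reconstruction)
    analysis = background + gain * (reconstruction - background)
    var_analysis = (1.0 - gain) * var_background
    return analysis, var_analysis

# Toy example: yearly surface temperature anomalies (degrees C) for one grid cell
model_anomaly = np.array([0.30, -0.10, 0.05, 0.40])       # model background
treering_anomaly = np.array([0.10, -0.25, 0.00, 0.20])    # proxy reconstruction
merged, merged_var = oi_merge(model_anomaly, treering_anomaly,
                              var_background=0.25, var_reconstruction=0.16)
print(merged.round(3), round(merged_var, 3))
```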
2012-01-01
Introduction Traditionally, genomic or transcriptomic data have been restricted to a few model or emerging model organisms, and to a handful of species of medical and/or environmental importance. Next-generation sequencing techniques have the capability of yielding massive amounts of gene sequence data for virtually any species at a modest cost. Here we provide a comparative analysis of de novo assembled transcriptomic data for ten non-model species of previously understudied animal taxa. Results cDNA libraries of ten species belonging to five animal phyla (2 Annelida [including Sipuncula], 2 Arthropoda, 2 Mollusca, 2 Nemertea, and 2 Porifera) were sequenced in different batches with an Illumina Genome Analyzer II (read length 100 or 150 bp), rendering between ca. 25 and 52 million reads per species. Read thinning, trimming, and de novo assembly were performed under different parameters to optimize output. Between 67,423 and 207,559 contigs were obtained across the ten species, post-optimization. Of those, 9,069 to 25,681 contigs retrieved blast hits against the NCBI non-redundant database, and approximately 50% of these were assigned with Gene Ontology terms, covering all major categories, and with similar percentages in all species. Local blasts against our datasets, using selected genes from major signaling pathways and housekeeping genes, revealed high efficiency in gene recovery compared to available genomes of closely related species. Intriguingly, our transcriptomic datasets detected multiple paralogues in all phyla and in nearly all gene pathways, including housekeeping genes that are traditionally used in phylogenetic applications for their purported single-copy nature. Conclusions We generated the first study of comparative transcriptomics across multiple animal phyla (comparing two species per phylum in most cases), established the first Illumina-based transcriptomic datasets for sponge, nemertean, and sipunculan species, and generated a tractable catalogue of annotated genes (or gene fragments) and protein families for ten newly sequenced non-model organisms, some of commercial importance (i.e., Octopus vulgaris). These comprehensive sets of genes can be readily used for phylogenetic analysis, gene expression profiling, developmental analysis, and can also be a powerful resource for gene discovery. The characterization of the transcriptomes of such a diverse array of animal species permitted the comparison of sequencing depth, functional annotation, and efficiency of genomic sampling using the same pipelines, which proved to be similar for all considered species. In addition, the datasets revealed their potential as a resource for paralogue detection, a recurrent concern in various aspects of biological inquiry, including phylogenetics, molecular evolution, development, and cellular biochemistry. PMID:23190771
Shao, Wei; Liu, Mingxia; Zhang, Daoqiang
2016-01-01
The systematic study of subcellular location pattern is very important for fully characterizing the human proteome. Nowadays, with the great advances in automated microscopic imaging, accurate bioimage-based classification methods to predict protein subcellular locations are highly desired. All existing models were constructed on the independent parallel hypothesis, where the cellular component classes are positioned independently in a multi-class classification engine. The important structural information of cellular compartments is missed. To deal with this problem for developing more accurate models, we proposed a novel cell structure-driven classifier construction approach (SC-PSorter) by employing the prior biological structural information in the learning model. Specifically, the structural relationship among the cellular components is reflected by a new codeword matrix under the error correcting output coding framework. Then, we construct multiple SC-PSorter-based classifiers corresponding to the columns of the error correcting output coding codeword matrix using a multi-kernel support vector machine classification approach. Finally, we perform the classifier ensemble by combining those multiple SC-PSorter-based classifiers via majority voting. We evaluate our method on a collection of 1636 immunohistochemistry images from the Human Protein Atlas database. The experimental results show that our method achieves an overall accuracy of 89.0%, which is 6.4% higher than the state-of-the-art method. The dataset and code can be downloaded from https://github.com/shaoweinuaa/. dqzhang@nuaa.edu.cn Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
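The error-correcting output coding idea can be illustrated with a generic scikit-learn sketch: each class receives a binary codeword, one binary classifier is trained per codeword bit, and a sample is assigned to the class with the nearest codeword. Note that this uses a random codeword matrix, a single SVM kernel and a stand-in dataset, not the structure-driven codeword matrix, multi-kernel SVMs or bioimages of SC-PSorter.

```python
from sklearn.datasets import load_digits              # stand-in for bioimage features
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OutputCodeClassifier
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each class gets a binary codeword; one SVM is trained per codeword bit, and a
# test sample is assigned to the class whose codeword is closest to the bit outputs.
ecoc = OutputCodeClassifier(estimator=SVC(kernel="rbf", gamma="scale"),
                            code_size=2.0, random_state=0)
ecoc.fit(X_train, y_train)
print("ECOC accuracy:", ecoc.score(X_test, y_test))
```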
Flexible Environmental Modeling with Python and Open - GIS
NASA Astrophysics Data System (ADS)
Pryet, Alexandre; Atteia, Olivier; Delottier, Hugo; Cousquer, Yohann
2015-04-01
Numerical modeling now represents a prominent task in environmental studies. During the last decades, numerous commercial programs have been made available to environmental modelers. These software applications offer user-friendly graphical user interfaces that allow efficient management of many case studies. However, they suffer from a lack of flexibility, and closed-source policies impede source code review and enhancement for original studies. Advanced modeling studies require flexible tools capable of managing thousands of model runs for parameter optimization, uncertainty and sensitivity analysis. In addition, there is a growing need for the coupling of various numerical models, associating, for instance, groundwater flow modeling with multi-species geochemical reactions. Researchers have produced hundreds of powerful open-source command-line programs. However, there is a need for a flexible graphical user interface allowing efficient processing of the geospatial data that accompanies any environmental study. Here, we present the advantages of using the free and open-source QGIS platform and the Python scripting language for conducting environmental modeling studies. The interactive graphical user interface is first used for the visualization and pre-processing of input geospatial datasets. The Python scripting language is then employed for further input data processing, calls to one or several models, and post-processing of model outputs. Model results are eventually sent back to the GIS program, processed and visualized. This approach combines the advantages of interactive graphical interfaces with the flexibility of the Python scripting language for data processing and model calls. The numerous Python modules available facilitate geospatial data processing and numerical analysis of model outputs. Once input data have been prepared with the graphical user interface, models may be run thousands of times from the command line with sequential or parallel calls. We illustrate this approach with several case studies in groundwater hydrology and geochemistry and provide links to several Python libraries that facilitate pre- and post-processing operations.
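A hedged sketch of the scripting pattern described above is given below; the model executable name, its command-line flags and its output file layout are placeholders, and only the Python standard library is used to launch the runs in parallel and collect a summary statistic from each.

```python
import csv
import subprocess
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path

MODEL_EXE = "./groundwater_model"        # hypothetical command-line model

def run_one(params):
    """Run the external model once for a given parameter set and read its output."""
    run_id, conductivity, recharge = params
    workdir = Path(f"runs/run_{run_id:04d}")
    workdir.mkdir(parents=True, exist_ok=True)
    # Hypothetical CLI: the model reads two flags and writes head.csv in the work dir
    subprocess.run(
        [MODEL_EXE, "--k", str(conductivity), "--recharge", str(recharge),
         "--outdir", str(workdir)],
        check=True,
    )
    with open(workdir / "head.csv", newline="") as f:
        values = [float(row[0]) for row in csv.reader(f)]
    return run_id, sum(values) / len(values)   # e.g. mean simulated head

if __name__ == "__main__":
    # Simple parameter sweep; in practice these might come from a sampling design
    parameter_sets = [(i, 1e-4 * (1 + i), 0.2) for i in range(1000)]
    with ProcessPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(run_one, parameter_sets))
    print(f"collected {len(results)} model runs")
```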
Baumes, Laurent A
2006-01-01
One of the main problems in high-throughput research for materials is still the design of experiments. At early stages of discovery programs, purely exploratory methodologies coupled with fast screening tools should be employed. This should lead to opportunities to find unexpected catalytic results and to identify the "groups" of catalyst outputs, providing well-defined boundaries for future optimizations. However, very few recent papers deal with strategies that guide exploratory studies. Mostly, traditional designs, homogeneous covering, or simple random samplings are exploited. Typical catalytic output distributions exhibit unbalanced datasets for which efficient learning is difficult, and interesting but rare classes usually go unrecognized. Here, a new iterative algorithm is suggested for characterizing the structure of the search space, working independently of the learning process. It enhances recognition rates by transferring catalysts to be screened from "performance-stable" zones of the space to "unsteady" ones, which require more experiments to be well modeled. Benchmark evaluation of new algorithms is essential, given the lack of prior evidence about their efficiency. The method is detailed and thoroughly tested with mathematical functions exhibiting different levels of complexity. The strategy is not only evaluated empirically; the effect of the sampling on subsequent machine learning performance is also quantified. The minimum sample size required for the algorithm to be statistically distinguishable from simple random sampling is investigated.
Selection-Fusion Approach for Classification of Datasets with Missing Values
Ghannad-Rezaie, Mostafa; Soltanian-Zadeh, Hamid; Ying, Hao; Dong, Ming
2010-01-01
This paper proposes a new approach based on missing value pattern discovery for classifying incomplete data. This approach is particularly designed for classification of datasets with a small number of samples and a high percentage of missing values, where available missing value treatment approaches do not usually work well. Based on the pattern of the missing values, the proposed approach finds subsets of samples for which most of the features are available and trains a classifier for each subset. Then, it combines the outputs of the classifiers. Subset selection is translated into a clustering problem, allowing derivation of a mathematical framework for it. A trade-off is established between the computational complexity (number of subsets) and the accuracy of the overall classifier. To deal with this trade-off, a numerical criterion is proposed for predicting the overall performance. The proposed method is applied to seven datasets from the popular University of California, Irvine data mining archive and an epilepsy dataset from Henry Ford Hospital, Detroit, Michigan (a total of eight datasets). Experimental results show that the classification accuracy of the proposed method is superior to those of the widely used multiple imputation method and four other methods. They also show that the level of superiority depends on the pattern and percentage of missing values. PMID:20212921
Plant Species Identification by Bi-channel Deep Convolutional Networks
NASA Astrophysics Data System (ADS)
He, Guiqing; Xia, Zhaoqiang; Zhang, Qiqi; Zhang, Haixi; Fan, Jianping
2018-04-01
Plant species identification has attracted much attention recently, as it has potential applications in environmental protection and human life. Although deep learning techniques can be directly applied to plant species identification, they still need to be tailored to this specific task to obtain state-of-the-art performance. In this paper, a bi-channel deep learning framework is developed for identifying plant species. In this framework, two different sub-networks are fine-tuned from their respective pretrained models, and a stacking layer is then used to fuse the outputs of the two sub-networks. We construct a plant dataset of the Orchidaceae family for algorithm evaluation. Our experimental results demonstrate that our bi-channel deep network can achieve very competitive accuracy compared to existing deep learning algorithms.
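A minimal PyTorch sketch of a bi-channel architecture of this general kind is shown below; it uses two generic torchvision backbones and a single linear fusion layer, which stands in for, but is not, the authors' specific sub-networks and stacking layer.

```python
import torch
import torch.nn as nn
from torchvision import models

class BiChannelNet(nn.Module):
    """Two CNN branches whose pooled features are concatenated and classified."""

    def __init__(self, num_classes):
        super().__init__()
        # Two different backbones (pretrained weights would be loaded in practice)
        self.branch_a = nn.Sequential(*list(models.resnet18(weights=None).children())[:-1])
        self.branch_b = nn.Sequential(*list(models.resnet34(weights=None).children())[:-1])
        self.fusion = nn.Linear(512 + 512, num_classes)   # simple fusion layer

    def forward(self, x):
        fa = torch.flatten(self.branch_a(x), 1)   # (batch, 512)
        fb = torch.flatten(self.branch_b(x), 1)   # (batch, 512)
        return self.fusion(torch.cat([fa, fb], dim=1))

model = BiChannelNet(num_classes=50)               # e.g. 50 hypothetical orchid species
dummy = torch.randn(2, 3, 224, 224)
print(model(dummy).shape)                          # torch.Size([2, 50])
```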
The BAOBAB data portal and DACCIWA database
NASA Astrophysics Data System (ADS)
Brissebrat, Guillaume; Belmahfoud, Nizar; Cloché, Sophie; Ferré, Hélène; Fleury, Laurence; Mière, Arnaud; Ramage, Karim
2017-04-01
In the framework of the African Monsoon Multidisciplinary Analyses (AMMA) programme, several tools have been developed in order to boost the data and information exchange between researchers from different disciplines: a user-friendly data management and dissemination system, quasi real-time display websites and a scientific paper exchange collaborative tool. The information system is enriched by past and ongoing projects (IMPETUS, FENNEC, ESCAPE, QweCI, ACASIS, DACCIWA...) addressing meteorology, atmospheric chemistry, hydrology, extreme events, health, adaptation of human societies... It is becoming a reference information system on environmental issues in West Africa: BAOBAB (Base Afrique de l'Ouest beyond AMMA Base). The projects include airborne, ground-based and ocean measurements, social science surveys, satellite data use, modelling studies and value-added product development. Therefore, the BAOBAB data portal enables access to a great amount and a large variety of data: - 250 local observation datasets that have been collected by operational networks since 1850, long-term monitoring research networks and intensive scientific campaigns; - 1350 outputs of a socio-economics questionnaire; - 60 operational satellite products and several research products; - 10 output sets of meteorological and ocean operational models and 15 of research simulations. Data documentation complies with metadata international standards, and data are delivered in standard formats. The data request interface takes full advantage of the database relational structure and enables users to elaborate multicriteria requests (period, area, property…). The BAOBAB data portal counts about 900 registered users, and 50 data requests every month. The databases and data portal have been developed and are operated jointly by SEDOO and ESPRI in France: http://baoab.sedoo.fr. The ongoing DACCIWA (Dynamics-Aerosol-Chemistry-Cloud Interactions over West Africa) project uses the BAOBAB portal to distribute its data: http://baobab.sedoo.fr/DACCIWA/. 30 datasets are already available: - Local observations from DACCIWA supersites at Savé (Benin), Kumasi (Ghana), and Ile-Ife (Nigeria); - Radiosonde data from stations in Benin, Cameroon, Côte d'Ivoire, Ghana and Nigeria. During the June-July 2016 DACCIWA campaign, a day-to-day chart display software was designed and operated in order to monitor meteorological and environmental information and to meet the observational teams' needs: - Quicklooks from DACCIWA supersite instruments; - Atmospheric and chemical model outputs; - Satellite products (Eumetsat, TERRA-MODIS...). This website (http://dacciwa.sedoo.fr) now constitutes a synthetic view of the campaign and a preliminary investigation tool for researchers. Similar websites are still online for past campaigns: AMMA 2006 (http://aoc.amma-international.org) and FENNEC 2011 (http://fenoc.sedoo.fr). Since 2011, the same software has enabled a group of French and Senegalese researchers and forecasters to exchange, in near real time, physical indices and diagnostics calculated from operational numerical weather forecasts, satellite products and in situ operational observations throughout the monsoon season, in order to better assess, understand and anticipate the monsoon intraseasonal variability (http://misva.sedoo.fr). Another similar website is dedicated to heat wave diagnosis and monitoring (http://acasis.sedoo.fr). It aims at becoming an operational component of national early warning systems.
Every scientist is invited to make use of the BAOBAB online tools and data. Scientists or project leaders who have management needs for existing or future datasets concerning West Africa are welcome to use the BAOBAB framework and to contact baobab@sedoo.fr.
Research on Zheng Classification Fusing Pulse Parameters in Coronary Heart Disease
Guo, Rui; Wang, Yi-Qin; Xu, Jin; Yan, Hai-Xia; Yan, Jian-Jun; Li, Fu-Feng; Xu, Zhao-Xia; Xu, Wen-Jie
2013-01-01
This study was conducted to illustrate that nonlinear dynamic variables of Traditional Chinese Medicine (TCM) pulse can improve the performances of TCM Zheng classification models. Pulse recordings of 334 coronary heart disease (CHD) patients and 117 normal subjects were collected in this study. Recurrence quantification analysis (RQA) was employed to acquire nonlinear dynamic variables of pulse. TCM Zheng models in CHD were constructed, and predictions using a novel multilabel learning algorithm based on different datasets were carried out. Datasets were designed as follows: dataset1, TCM inquiry information including inspection information; dataset2, time-domain variables of pulse and dataset1; dataset3, RQA variables of pulse and dataset1; and dataset4, major principal components of RQA variables and dataset1. The performances of the different models for Zheng differentiation were compared. The model for Zheng differentiation based on RQA variables integrated with inquiry information had the best performance, whereas that based only on inquiry had the worst performance. Meanwhile, the model based on time-domain variables of pulse integrated with inquiry fell between the above two. This result showed that RQA variables of pulse can be used to construct models of TCM Zheng and improve the performance of Zheng differentiation models. PMID:23737839
USDA-ARS?s Scientific Manuscript database
The Fort Cobb Reservoir, which is within the Fort Cobb Reservoir Experimental watershed (FCREW) in Oklahoma, is on the Oklahoma 303(d) list (list of water bodies that do not meet the water quality standards as given in the Clean Water Act) based on sedimentation and trophic level of the lake associa...
NASA Astrophysics Data System (ADS)
Behera, Kishore Kumar; Pal, Snehanshu
2018-03-01
This paper describes a new approach towards optimum utilisation of the ferrochrome added during stainless steel making in an AOD converter. The objective of the optimisation is to enhance the end blow chromium content of the steel and reduce the ferrochrome addition during refining. By developing a thermodynamics-based mathematical model, a study has been conducted to compute the optimum trade-off between ferrochrome addition and end blow chromium content of stainless steel using a predator-prey genetic algorithm, trained on 100 datasets considering different input and output variables such as oxygen, argon and nitrogen blowing rates, duration of blowing, initial bath temperature, chromium and carbon content, and weight of ferrochrome added during refining. Optimisation is performed within constraints imposed on the input parameters, whose values fall within certain ranges. The analysis of the Pareto fronts generates a set of feasible optimal solutions between the two conflicting objectives, which provides an effective guideline for better ferrochrome utilisation. It is found that beyond a certain critical range, further addition of ferrochrome does not affect the chromium percentage of the steel. A single-variable response analysis is performed to study the variation and interaction of all individual input parameters on the output variables.
Cao, Hui; Yan, Xingyu; Li, Yaojiang; Wang, Yanxia; Zhou, Yan; Yang, Sanchun
2014-01-01
Quantitative analysis of the flue gas of a natural gas-fired generator is significant for energy conservation and emission reduction. The traditional partial least squares (PLS) method may not deal with nonlinear problems effectively. In this paper, a nonlinear partial least squares method with extended input based on a radial basis function neural network (RBFNN) is used for component prediction of flue gas. In the proposed method, the original independent input matrix is the input of the RBFNN, and the outputs of the hidden layer nodes of the RBFNN are the extension terms of the original independent input matrix. Then, partial least squares regression is performed on the extended input matrix and the output matrix to establish the component prediction model of the flue gas. A near-infrared spectral dataset of flue gas from natural gas combustion is used to estimate the effectiveness of the proposed method compared with PLS. The experimental results show that the root-mean-square errors of the prediction values of the proposed method for methane, carbon monoxide, and carbon dioxide are, respectively, reduced by 4.74%, 21.76%, and 5.32% compared to those of PLS. Hence, the proposed method has higher predictive capability and better robustness.
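The extended-input idea can be sketched as follows with synthetic data: RBF hidden-layer activations (centres chosen by k-means, a common heuristic) are appended to the original inputs before fitting scikit-learn's PLSRegression. The data, centre count and kernel width are placeholders, not the paper's NIR spectra or network settings.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.cross_decomposition import PLSRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(300, 20))                                # stand-in for spectra
y = np.sin(X[:, :3].sum(axis=1)) + 0.05 * rng.standard_normal(300)   # nonlinear target

def rbf_features(X, centers, gamma=1.0):
    """Hidden-layer outputs of an RBF network: Gaussian kernels around the centres."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
centers = KMeans(n_clusters=10, n_init=10, random_state=0).fit(X_tr).cluster_centers_

for name, (tr, te) in {
    "plain PLS": (X_tr, X_te),
    "RBF-extended PLS": (np.hstack([X_tr, rbf_features(X_tr, centers)]),
                         np.hstack([X_te, rbf_features(X_te, centers)])),
}.items():
    pls = PLSRegression(n_components=5).fit(tr, y_tr)
    rmse = np.sqrt(mean_squared_error(y_te, pls.predict(te).ravel()))
    print(f"{name}: RMSE = {rmse:.4f}")
```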
KAnalyze: a fast versatile pipelined K-mer toolkit
Audano, Peter; Vannberg, Fredrik
2014-01-01
Motivation: Converting nucleotide sequences into short overlapping fragments of uniform length, k-mers, is a common step in many bioinformatics applications. While existing software packages count k-mers, few are optimized for speed, offer an application programming interface (API), a graphical interface or contain features that make it extensible and maintainable. We designed KAnalyze to compete with the fastest k-mer counters, to produce reliable output and to support future development efforts through well-architected, documented and testable code. Currently, KAnalyze can output k-mer counts in a sorted tab-delimited file or stream k-mers as they are read. KAnalyze can process large datasets with 2 GB of memory. This project is implemented in Java 7, and the command line interface (CLI) is designed to integrate into pipelines written in any language. Results: As a k-mer counter, KAnalyze outperforms Jellyfish, DSK and a pipeline built on Perl and Linux utilities. Through extensive unit and system testing, we have verified that KAnalyze produces the correct k-mer counts over multiple datasets and k-mer sizes. Availability and implementation: KAnalyze is available on SourceForge: https://sourceforge.net/projects/kanalyze/ Contact: fredrik.vannberg@biology.gatech.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24642064
KAnalyze: a fast versatile pipelined k-mer toolkit.
Audano, Peter; Vannberg, Fredrik
2014-07-15
Converting nucleotide sequences into short overlapping fragments of uniform length, k-mers, is a common step in many bioinformatics applications. While existing software packages count k-mers, few are optimized for speed, offer an application programming interface (API), a graphical interface or contain features that make it extensible and maintainable. We designed KAnalyze to compete with the fastest k-mer counters, to produce reliable output and to support future development efforts through well-architected, documented and testable code. Currently, KAnalyze can output k-mer counts in a sorted tab-delimited file or stream k-mers as they are read. KAnalyze can process large datasets with 2 GB of memory. This project is implemented in Java 7, and the command line interface (CLI) is designed to integrate into pipelines written in any language. As a k-mer counter, KAnalyze outperforms Jellyfish, DSK and a pipeline built on Perl and Linux utilities. Through extensive unit and system testing, we have verified that KAnalyze produces the correct k-mer counts over multiple datasets and k-mer sizes. KAnalyze is available on SourceForge: https://sourceforge.net/projects/kanalyze/. © The Author 2014. Published by Oxford University Press.
Barreiros, Willian; Teodoro, George; Kurc, Tahsin; Kong, Jun; Melo, Alba C. M. A.; Saltz, Joel
2017-01-01
We investigate efficient sensitivity analysis (SA) of algorithms that segment and classify image features in a large dataset of high-resolution images. Algorithm SA is the process of evaluating variations of methods and parameter values to quantify differences in the output. An SA can be very compute-demanding because it requires re-processing the input dataset several times with different parameters to assess variations in output. In this work, we introduce strategies to efficiently speed up SA via runtime optimizations targeting distributed hybrid systems and reuse of computations from runs with different parameters. We evaluate our approach using a cancer image analysis workflow on a hybrid cluster with 256 nodes, each with an Intel Phi and a dual-socket CPU. The SA attained a parallel efficiency of over 90% on 256 nodes. The cooperative execution using the CPUs and the Phi available in each node with smart task assignment strategies resulted in an additional speedup of about 2×. Finally, multi-level computation reuse led to an additional speedup of up to 2.46× in the parallel version. The level of performance attained with the proposed optimizations will allow the use of SA in large-scale studies. PMID:29081725
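The computation-reuse idea can be illustrated with a toy two-stage workflow in which runs that share the first-stage parameters reuse a cached result; the stage functions and parameter names below are placeholders, not the authors' workflow or runtime system.

```python
# Sketch of multi-level computation reuse in a two-stage (segmentation -> classification)
# parameter study: runs sharing segmentation parameters reuse the cached segmentation.
from functools import lru_cache

@lru_cache(maxsize=None)
def segment(image_id, blur_sigma, threshold):
    print(f"  segmenting {image_id} (sigma={blur_sigma}, thr={threshold})")
    return f"mask[{image_id},{blur_sigma},{threshold}]"      # stand-in for a label mask

def classify(mask, min_area):
    return f"features({mask}, min_area={min_area})"

parameter_sets = [(1.0, 0.5, 20), (1.0, 0.5, 40), (2.0, 0.4, 20)]
for sigma, thr, min_area in parameter_sets:
    mask = segment("slide_001", sigma, thr)    # cached when (sigma, thr) repeats
    print(classify(mask, min_area))
```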
Exploring 4D Flow Data in an Immersive Virtual Environment
NASA Astrophysics Data System (ADS)
Stevens, A. H.; Butkiewicz, T.
2017-12-01
Ocean models help us to understand and predict a wide range of intricate physical processes which comprise the atmospheric and oceanic systems of the Earth. Because these models output an abundance of complex time-varying three-dimensional (i.e., 4D) data, effectively conveying the myriad information from a given model poses a significant visualization challenge. The majority of the research effort into this problem has concentrated around synthesizing and examining methods for representing the data itself; by comparison, relatively few studies have looked into the potential merits of various viewing conditions and virtual environments. We seek to improve our understanding of the benefits offered by current consumer-grade virtual reality (VR) systems through an immersive, interactive 4D flow visualization system. Our dataset is a Regional Ocean Modeling System (ROMS) model representing a 12-hour tidal cycle of the currents within New Hampshire's Great Bay estuary. The model data was loaded into a custom VR particle system application using the OpenVR software library and the HTC Vive hardware, which tracks a headset and two six-degree-of-freedom (6DOF) controllers within a 5m-by-5m area. The resulting visualization system allows the user to coexist in the same virtual space as the data, enabling rapid and intuitive analysis of the flow model through natural interactions with the dataset and within the virtual environment. Whereas a traditional computer screen typically requires the user to reposition a virtual camera in the scene to obtain the desired view of the data, in virtual reality the user can simply move their head to the desired viewpoint, completely eliminating the mental context switches from data exploration/analysis to view adjustment and back. The tracked controllers become tools to quickly manipulate (reposition, reorient, and rescale) the dataset and to interrogate it by, e.g., releasing dye particles into the flow field, probing scalar velocities, placing a cutting plane through a region of interest, etc. It is hypothesized that the advantages afforded by head-tracked viewing and 6DOF interaction devices will lead to faster and more efficient examination of 4D flow data. A human factors study is currently being prepared to empirically evaluate this method of visualization and interaction.
Computational Tools for Parsimony Phylogenetic Analysis of Omics Data
Salazar, Jose; Amri, Hakima; Noursi, David
2015-01-01
High-throughput assays from genomics, proteomics, metabolomics, and next generation sequencing produce massive omics datasets that are challenging to analyze in biological or clinical contexts. Thus far, there is no publicly available program for converting quantitative omics data into input formats to be used in off-the-shelf robust phylogenetic programs. To the best of our knowledge, this is the first report on the creation of two Windows-based programs, OmicsTract and SynpExtractor, to address this gap. We note, as a way of introduction and development of these programs, that one particularly useful bioinformatics inferential modeling tool is the phylogenetic cladogram. Cladograms are multidimensional tools that show the relatedness between subgroups of healthy and diseased individuals and the latter's shared aberrations; they also reveal some characteristics of a disease that would not otherwise be apparent by other analytical methods. OmicsTract and SynpExtractor were written for the respective tasks of (1) accommodating advanced phylogenetic parsimony analysis (through the standard programs MIX [from PHYLIP] and TNT), and (2) extracting shared aberrations at the cladogram nodes. OmicsTract converts comma-delimited data tables by assigning each data point a binary value ("0" for normal states and "1" for abnormal states) and then outputs the converted data tables in the proper input file formats for MIX, or with embedded commands for TNT. SynpExtractor uses outfiles from MIX and TNT to extract the shared aberrations at each node of the cladogram, matching them with identifying labels from the dataset and exporting them to a comma-delimited file. Labels may be gene identifiers in gene-expression datasets or m/z values in mass spectrometry datasets. By automating these steps, OmicsTract and SynpExtractor offer a veritable opportunity for rapid and standardized phylogenetic analyses of omics data; their model can also be extended to next generation sequencing (NGS) data. We make OmicsTract and SynpExtractor publicly and freely available for non-commercial use in order to strengthen and build capacity for the phylogenetic paradigm of omics analysis. PMID:26230532
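A minimal sketch of the conversion step described above, assuming a simple absolute-value threshold and a PHYLIP-style matrix layout; the thresholding rule, file names and header conventions are illustrative and not OmicsTract's actual implementation.

```python
# Sketch: threshold a quantitative omics table into binary character states
# (0 = normal, 1 = aberrant) and write a PHYLIP-style matrix for a parsimony
# program such as MIX. Threshold and input file are hypothetical.
import csv

def write_phylip_binary(rows, out_path, threshold=2.0):
    """rows: list of (sample_name, [values]); state is 1 if |value| exceeds threshold."""
    n_chars = len(rows[0][1])
    with open(out_path, "w") as fh:
        fh.write(f"{len(rows)} {n_chars}\n")
        for name, values in rows:
            states = "".join("1" if abs(v) > threshold else "0" for v in values)
            fh.write(f"{name[:10]:<10}{states}\n")   # PHYLIP-style 10-character name field

with open("expression.csv") as fh:                   # hypothetical comma-delimited input
    table = [(row[0], [float(x) for x in row[1:]]) for row in csv.reader(fh)]
write_phylip_binary(table, "infile")
```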
Large-Scale Image Analytics Using Deep Learning
NASA Astrophysics Data System (ADS)
Ganguly, S.; Nemani, R. R.; Basu, S.; Mukhopadhyay, S.; Michaelis, A.; Votava, P.
2014-12-01
High-resolution land cover classification maps are needed to increase the accuracy of current land ecosystem and climate model outputs. Few studies demonstrate the state of the art in deriving very high resolution (VHR) land cover products. In addition, most methods rely heavily on commercial software that is difficult to scale given the region of study (e.g. continents to globe). Complexities in present approaches relate to (a) scalability of the algorithm, (b) large image data processing (compute- and memory-intensive), (c) computational cost, (d) massively parallel architecture, and (e) machine learning automation. In addition, VHR satellite datasets are of the order of terabytes and features extracted from these datasets are of the order of petabytes. In our present study, we have acquired the National Agricultural Imaging Program (NAIP) dataset for the Continental United States at a spatial resolution of 1 m. These data come as image tiles (a total of a quarter million image scenes with ~60 million pixels) and have a total size of ~100 terabytes for a single acquisition. Features extracted from the entire dataset would amount to ~8-10 petabytes. In our proposed approach, we have implemented a novel semi-automated machine learning algorithm rooted in the principles of "deep learning" to delineate the percentage of tree cover. In order to perform image analytics in such a granular system, it is necessary to devise an intelligent archiving and query system for image retrieval, file structuring, metadata processing and filtering of all available image scenes. Using the Open NASA Earth Exchange (NEX) initiative, a partnership with Amazon Web Services (AWS), we have developed an end-to-end architecture for designing the database and the deep belief network (following the DistBelief computing model) to solve the grand challenge of scaling this process across the quarter million NAIP tiles that cover the entire Continental United States. The AWS core components that we use to solve this problem are DynamoDB along with S3 for database query and storage, the ElastiCache shared memory architecture for image segmentation, Elastic MapReduce (EMR) for image feature extraction, and memory-optimized Elastic Compute Cloud (EC2) instances for the learning algorithm.
NASA Astrophysics Data System (ADS)
Erickson, T.
2016-12-01
Deriving actionable information from Earth observation data obtained from sensors or models can be quite complicated, and sharing those insights with others in a form that they can understand, reproduce, and improve upon is equally difficult. Journal articles, even if digital, commonly present just a summary of an analysis that cannot be understood in depth or reproduced without major effort on the part of the reader. Here we show a method of improving scientific literacy by pairing a recently developed scientific presentation technology (Jupyter Notebooks) with a petabyte-scale platform for accessing and analyzing Earth observation and model data (Google Earth Engine). Jupyter Notebooks are interactive web documents that mix live code with annotations such as rich-text markup, equations, images, videos, hyperlinks and dynamic output. Notebooks were first introduced as part of the IPython project in 2011, and have since gained wide acceptance in the scientific programming community, initially among Python programmers but later across a wide range of scientific programming languages. While Jupyter Notebooks have been widely adopted for general data analysis, data visualization, and machine learning, to date there have been relatively few examples of using Jupyter Notebooks to analyze geospatial datasets. Google Earth Engine is a cloud-based platform for analyzing geospatial data, such as satellite remote sensing imagery and/or Earth system model output. Through its Python API, Earth Engine makes petabytes of Earth observation data accessible, and provides hundreds of algorithmic building blocks that can be chained together to produce high-level algorithms and outputs in real time. We anticipate that this technology pairing will facilitate a better way of creating, documenting, and sharing complex analyses that derive information on our Earth, which can be used to promote broader understanding of the complex issues that it faces. http://jupyter.org https://earthengine.google.com
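As an illustration of the pairing, a notebook cell using the Earth Engine Python API might look like the following; the image collection, dates, region and scale are arbitrary examples and assume an authenticated earthengine-api installation.

```python
# Minimal sketch of a server-side Earth Engine analysis run from a notebook cell.
import ee
ee.Initialize()

# Mean NDVI over a small region for one year, computed server-side in Earth Engine.
region = ee.Geometry.Rectangle([-122.6, 37.0, -121.8, 37.8])
ndvi = (ee.ImageCollection('MODIS/006/MOD13A2')
          .filterDate('2016-01-01', '2016-12-31')
          .select('NDVI')
          .mean())
stats = ndvi.reduceRegion(reducer=ee.Reducer.mean(), geometry=region, scale=1000)
print(stats.getInfo())    # only this call pulls the (tiny) result into the notebook
```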
NASA Astrophysics Data System (ADS)
Dafonte, C.; Fustes, D.; Manteiga, M.; Garabato, D.; Álvarez, M. A.; Ulla, A.; Allende Prieto, C.
2016-10-01
Aims: We present an innovative artificial neural network (ANN) architecture, called the Generative ANN (GANN), that computes the forward model; that is, it learns the function that relates the unknown outputs (stellar atmospheric parameters, in this case) to the given inputs (spectra). Such a model can be integrated into a Bayesian framework to estimate the posterior distribution of the outputs. Methods: The architecture of the GANN follows the same scheme as a normal ANN, but with the inputs and outputs inverted. We train the network with the set of atmospheric parameters (Teff, log g, [Fe/H] and [α/Fe]), obtaining the stellar spectra for such inputs. The residuals between the spectra in the grid and the estimated spectra are minimized using a validation dataset to keep the solutions as general as possible. Results: The performance of both conventional ANNs and GANNs in estimating the stellar parameters as a function of star brightness is presented and compared for different Galactic populations. GANNs provide significantly improved parameterizations for early and intermediate spectral types with rich and intermediate metallicities. The behaviour of both algorithms is very similar for our sample of late-type stars, with residuals in the derivation of [Fe/H] and [α/Fe] below 0.1 dex for stars with Gaia magnitude Grvs < 12, which corresponds to on the order of four million stars to be observed by the Radial Velocity Spectrograph of the Gaia satellite. Conclusions: Uncertainty estimation of computed astrophysical parameters is crucial for the validation of the parameterization itself and for the subsequent exploitation by the astronomical community. GANNs produce not only the parameters for a given spectrum, but also a goodness-of-fit between the observed spectrum and the one predicted for a given set of parameters. Moreover, they allow us to obtain the full posterior distribution over the astrophysical parameter space once a noise model is assumed. This can be used for novelty detection and quality assessment.
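The generative idea can be sketched with a generic regressor standing in for the GANN: train a forward model from parameters to spectra, then evaluate a grid posterior under an assumed Gaussian noise model. The toy spectra, network size and noise level below are illustrative and are not the paper's architecture or data.

```python
# Sketch: forward model (parameters -> spectrum) plus a grid posterior under Gaussian noise.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(2)
theta = rng.uniform(-1, 1, size=(2000, 2))                # stand-ins for (Teff, log g), rescaled
wave = np.linspace(0, 1, 64)
spectra = np.array([np.exp(-((wave - 0.5 - 0.3 * a) ** 2) / (0.01 + 0.02 * (b + 1)))
                    for a, b in theta])                    # toy "spectra"

forward = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000,
                       random_state=0).fit(theta, spectra)

# Observed spectrum with noise; posterior proportional to exp(-chi^2/2) on a coarse grid.
obs = spectra[0] + rng.normal(0, 0.02, size=64)
grid = np.array(np.meshgrid(np.linspace(-1, 1, 41),
                            np.linspace(-1, 1, 41))).reshape(2, -1).T
chi2 = ((forward.predict(grid) - obs) ** 2).sum(axis=1) / 0.02 ** 2
post = np.exp(-0.5 * (chi2 - chi2.min()))
print("true:", theta[0], " MAP estimate:", grid[post.argmax()])
```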
Comparative Evaluation of Five Fire Emissions Datasets Using the GEOS-5 Model
NASA Astrophysics Data System (ADS)
Ichoku, C. M.; Pan, X.; Chin, M.; Bian, H.; Darmenov, A.; Ellison, L.; Kucsera, T. L.; da Silva, A. M., Jr.; Petrenko, M. M.; Wang, J.; Ge, C.; Wiedinmyer, C.
2017-12-01
Wildfires and other types of biomass burning affect most vegetated parts of the globe, contributing 40% of the annual global atmospheric loading of carbonaceous aerosols, as well as significant amounts of numerous trace gases, such as carbon dioxide, carbon monoxide, and methane. Many of these smoke constituents affect air quality and/or the climate system directly or through their interactions with solar radiation and cloud properties. However, fire emissions are poorly constrained in global and regional models, resulting in high levels of uncertainty in understanding their real impacts. With the advent of satellite remote sensing of fires and burned areas in the last couple of decades, a number of fire emissions products have become available for use in relevant research and applications. In this study, we evaluated five global biomass burning emissions datasets, namely: (1) GFEDv3.1 (Global Fire Emissions Database version 3.1); (2) GFEDv4s (Global Fire Emissions Database version 4 with small fires); (3) FEERv1 (Fire Energetics and Emissions Research version 1.0); (4) QFEDv2.4 (Quick Fire Emissions Dataset version 2.4); and (5) FINNv1.5 (Fire INventory from NCAR, version 1.5). Overall, the spatial patterns of biomass burning emissions from these inventories are similar, although the magnitudes of the emissions can be noticeably different. The inventories derived using top-down approaches (QFEDv2.4 and FEERv1) are larger than those based on bottom-up approaches. For example, global organic carbon (OC) emissions in 2008 are: QFEDv2.4 (51.93 Tg), FEERv1 (28.48 Tg), FINNv1.5 (19.48 Tg), GFEDv3.1 (15.65 Tg) and GFEDv4s (13.76 Tg), representing a factor of 3.7 difference between the largest and the smallest. We also used all five biomass burning emissions datasets to conduct aerosol simulations with the NASA Goddard Earth Observing System Model, Version 5 (GEOS-5), and compared the resulting aerosol optical depth (AOD) output with the corresponding retrievals from MODIS and AERONET. Simulated AOD based on all five emissions inventories shows significant underestimation in biomass burning dominated regions. Possible factors responsible for the differences among the inventories were further explored over Southern Africa and South America, two of the major biomass burning regions of the world.
Deep SOMs for automated feature extraction and classification from big data streaming
NASA Astrophysics Data System (ADS)
Sakkari, Mohamed; Ejbali, Ridha; Zaied, Mourad
2017-03-01
In this paper, we propose a deep self-organizing map model (Deep-SOMs) for automated feature extraction and learning from big data streams, which benefits from the Spark framework for real-time stream handling and highly parallel data processing. The deep SOM architecture is based on the notion of abstraction (patterns are automatically extracted from the raw data, from less to more abstract). The proposed model consists of three hidden self-organizing layers, an input layer and an output layer. Each layer is made up of a multitude of SOMs, with each map focusing only on a local sub-region of the input image. Each layer then combines the local information to generate more global information in the higher layer. The proposed Deep-SOMs model is unique in terms of its layer architecture, SOM sampling method and learning. During the learning stage we use a set of unsupervised SOMs for feature extraction. We validate the effectiveness of our approach on large datasets such as the Leukemia and SRBCT datasets. Comparative results show that the Deep-SOMs model performs better than many existing algorithms for image classification.
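The building block that such an architecture stacks, a single self-organizing map, can be sketched in plain NumPy; the map size, learning rate and decay schedule below are illustrative, and the sketch does not reproduce the Deep-SOMs sampling scheme or its Spark-based streaming.

```python
# Minimal NumPy SOM training loop: the unit prototypes of one trained map can feed
# the next layer in a stacked (deep) SOM architecture.
import numpy as np

def train_som(data, rows=8, cols=8, epochs=20, lr0=0.5, sigma0=2.0, seed=0):
    rng = np.random.default_rng(seed)
    weights = rng.normal(size=(rows, cols, data.shape[1]))
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)
    n_steps = epochs * len(data)
    order = rng.permutation(n_steps) % len(data)          # shuffled presentation order
    for t, x in enumerate(data[order]):
        lr = lr0 * np.exp(-t / n_steps)                   # decaying learning rate
        sigma = sigma0 * np.exp(-t / n_steps)             # shrinking neighbourhood
        bmu = np.unravel_index(np.argmin(((weights - x) ** 2).sum(-1)), (rows, cols))
        h = np.exp(-((grid - np.array(bmu)) ** 2).sum(-1) / (2 * sigma ** 2))
        weights += lr * h[..., None] * (x - weights)      # pull neighbourhood towards sample
    return weights

features = train_som(np.random.default_rng(1).normal(size=(500, 16)))
print(features.shape)   # (8, 8, 16): each unit's prototype is a learned feature
```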
CalFitter: a web server for analysis of protein thermal denaturation data.
Mazurenko, Stanislav; Stourac, Jan; Kunka, Antonin; Nedeljkovic, Sava; Bednar, David; Prokop, Zbynek; Damborsky, Jiri
2018-05-14
Despite significant advances in the understanding of protein structure-function relationships, revealing protein folding pathways still poses a challenge due to a limited number of relevant experimental tools. Widely used experimental techniques, such as calorimetry or spectroscopy, critically depend on proper data analysis. Currently, only separate data analysis tools are available for each type of experiment, with a limited model selection. To address this problem, we have developed the CalFitter web server to be a unified platform for comprehensive data fitting and analysis of protein thermal denaturation data. The server allows simultaneous global data fitting using any combination of input data types and offers 12 protein unfolding pathway models for selection, including irreversible transitions often missing from other tools. The data fitting produces optimal parameter values, their confidence intervals, and statistical information to define unfolding pathways. The server provides an interactive and easy-to-use interface that allows users to directly analyse input datasets and simulate modelled output based on the model parameters. The CalFitter web server is freely available at https://loschmidt.chemi.muni.cz/calfitter/.
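As an illustration of the kind of fit such a tool automates, the simplest reversible two-state unfolding model can be fitted with SciPy as below; the model form, flat baselines and starting guesses are illustrative and do not correspond to the server's twelve pathway models.

```python
# Sketch: fit a reversible two-state thermal unfolding curve (van 't Hoff, flat baselines).
import numpy as np
from scipy.optimize import curve_fit

R = 8.314e-3  # gas constant, kJ/(mol*K)

def two_state(T, Tm, dH, yf, yu):
    """Observed signal for reversible two-state unfolding with flat baselines."""
    K = np.exp(-dH / R * (1.0 / T - 1.0 / Tm))   # van 't Hoff equilibrium constant
    fu = K / (1.0 + K)                            # fraction unfolded
    return yf + (yu - yf) * fu

T = np.linspace(293, 363, 80)
data = two_state(T, 328.0, 400.0, 0.1, 0.9) + np.random.default_rng(3).normal(0, 0.01, T.size)

popt, pcov = curve_fit(two_state, T, data, p0=(330.0, 300.0, 0.0, 1.0))
perr = np.sqrt(np.diag(pcov))                     # 1-sigma parameter uncertainties
print(f"Tm = {popt[0]:.1f} +/- {perr[0]:.1f} K, dH = {popt[1]:.0f} +/- {perr[1]:.0f} kJ/mol")
```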
A 30m resolution hydrodynamic model of the entire conterminous United States.
NASA Astrophysics Data System (ADS)
Bates, P. D.; Neal, J. C.; Smith, A.; Sampson, C.; Johnson, K.; Wing, O.
2016-12-01
In this paper we describe the development and validation of a 30m resolution hydrodynamic model covering the entire conterminous United States. The model can be used to simulate inundation extent and water depths resulting from return period flows (equivalent to FEMA Flood Insurance Rate Maps), hindcasts of historic events, or forecasts of future river flow from a rainfall-runoff or land surface model. As topographic data the model uses the U.S. Geological Survey National Elevation Dataset (NED), and return period flows are generated using a regional flood frequency analysis methodology (Smith et al., 2015. Worldwide flood frequency estimation. Water Resources Research, 51, 539-553). Flood defences nationwide are represented using data from the US Army Corps of Engineers. Using these data, flows are simulated with an explicit and highly efficient finite difference solution of the local inertial form of the shallow water equations, identical to that implemented in the LISFLOOD-FP model. Even with this efficient numerical solution, a simulation at this resolution over a whole continent is a huge undertaking, and a variety of high-performance computing technologies therefore need to be employed to make these simulations possible. The size of the output datasets is also challenging, and to address this we use the GIS and graphical display functions of Google Earth Engine to facilitate easy visualisation and interrogation of the results. The model is validated against the return period flood extents contained in FEMA Flood Insurance Rate Maps and against real flood event data from the 2015 Texas flood, which was hindcast using the model. Finally, we present an application of the model to the Upper Mississippi river basin, where simulations both with and without flood defences are used to determine floodplain areas benefitting from protection in order to quantify the benefits of flood defence spending.
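The local inertial scheme referred to above can be sketched for a single 1D reach using the semi-implicit friction update of Bates et al. (2010); the grid, roughness, boundary conditions and timestep below are illustrative and are not the continental model's configuration.

```python
# Sketch of the explicit local inertial (LISFLOOD-FP style) update for a 1D reach.
import numpy as np

g, n_mann, dx, dt = 9.81, 0.035, 30.0, 1.0        # 30 m cells, as in the model's grid
z = np.linspace(1.0, 0.0, 200)                    # bed elevation sloping downstream
h = np.full_like(z, 0.5)                          # initial water depth
q = np.zeros(len(z) - 1)                          # unit-width discharge at cell faces

for step in range(3600):                          # one hour of simulation
    eta = z + h                                   # water surface elevation
    h_face = np.maximum(eta[:-1], eta[1:]) - np.maximum(z[:-1], z[1:])  # flow depth at faces
    h_face = np.maximum(h_face, 1e-6)
    slope = (eta[1:] - eta[:-1]) / dx
    # Semi-implicit friction: numerator is the inertial/pressure terms, denominator Manning friction.
    q = (q - g * h_face * dt * slope) / (1.0 + g * dt * n_mann**2 * np.abs(q) / h_face**(7.0 / 3.0))
    h[1:-1] += dt * (q[:-1] - q[1:]) / dx         # mass update in interior cells
    h[0] = 0.5                                    # simple fixed-depth upstream boundary
print("downstream depth after 1 h:", round(float(h[-2]), 3), "m")
```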