Sample records for extremely large datasets

  1. TECA: A Parallel Toolkit for Extreme Climate Analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Prabhat, Mr; Ruebel, Oliver; Byna, Surendra

    2012-03-12

    We present TECA, a parallel toolkit for detecting extreme events in large climate datasets. Modern climate datasets expose parallelism across a number of dimensions: spatial locations, timesteps and ensemble members. We design TECA to exploit these modes of parallelism and demonstrate a prototype implementation for detecting and tracking three classes of extreme events: tropical cyclones, extra-tropical cyclones and atmospheric rivers. We process a modern TB-sized CAM5 simulation dataset with TECA, and demonstrate good runtime performance for the three case studies.
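
    The abstract contains no code; the sketch below only illustrates the kind of timestep-level parallelism TECA exploits, using mpi4py. The functions load_timestep and detect_events are hypothetical placeholders, not TECA's actual API.

    ```python
    # Hypothetical sketch of TECA-style timestep parallelism: each MPI rank
    # processes a disjoint slice of timesteps; results are gathered at rank 0.
    from mpi4py import MPI

    def load_timestep(t):
        # Placeholder for reading one timestep from a large climate dataset.
        return None

    def detect_events(field):
        # Placeholder for a per-timestep detector (e.g., cyclone candidates).
        return []

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
    n_timesteps = 1460  # e.g., one year of 6-hourly output

    # Round-robin assignment of timesteps to ranks.
    local_results = [detect_events(load_timestep(t))
                     for t in range(rank, n_timesteps, size)]

    # Gather per-rank candidate lists at rank 0 for stitching into tracks.
    all_results = comm.gather(local_results, root=0)
    ```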

  2. Evaluation of precipitation extremes over the Asian domain: observation and modelling studies

    NASA Astrophysics Data System (ADS)

    Kim, In-Won; Oh, Jaiho; Woo, Sumin; Kripalani, R. H.

    2018-04-01

    In this study, a comparison of the precipitation extremes exhibited by seven reference datasets is made to ascertain whether inferences based on these datasets agree or differ. These seven datasets, roughly grouped into three categories, i.e. rain-gauge based (APHRODITE, CPC-UNI), satellite-based (TRMM, GPCP1DD) and reanalysis-based (ERA-Interim, MERRA, and JRA55), and having the common data period 1998-2007, are considered. The focus is on examining precipitation extremes in the summer monsoon rainfall over South Asia, East Asia and Southeast Asia. Measures of extreme precipitation include percentile thresholds, the frequency of extreme precipitation events and other quantities. Results reveal that the differences in displaying extremes among the datasets are small over South Asia and East Asia, but large differences among the datasets appear over the Southeast Asian region including the maritime continent. Furthermore, precipitation data appear to be most consistent over East Asia among the seven datasets. Decadal trends in extreme precipitation are consistent with known results over South and East Asia. No trends in extreme precipitation events are exhibited over Southeast Asia. Outputs of the Coupled Model Intercomparison Project Phase 5 (CMIP5) simulations are categorized as high-, medium- and low-resolution models. The regions displaying the maximum intensity of extreme precipitation appear to depend on model resolution. High-resolution models simulate the maximum intensity of extreme precipitation over the Indian sub-continent, medium-resolution models over northeast India and South China, and low-resolution models over Bangladesh, Myanmar and Thailand. In summary, there are differences in the extreme precipitation statistics among the seven datasets considered here and among the 29 CMIP5 model outputs.
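
    As a concrete illustration of the percentile-threshold measures mentioned above, here is a minimal NumPy sketch (synthetic data standing in for any of the seven datasets) that computes a per-grid-cell 95th-percentile wet-day threshold and counts exceedances:

    ```python
    import numpy as np

    # Minimal sketch (not the paper's code): percentile-threshold extremes for
    # a daily precipitation array `pr` of shape (time, lat, lon), in mm/day.
    rng = np.random.default_rng(0)
    pr = rng.gamma(shape=0.5, scale=8.0, size=(3652, 40, 60))  # synthetic

    wet = np.where(pr >= 1.0, pr, np.nan)       # wet days only (>= 1 mm)
    p95 = np.nanpercentile(wet, 95, axis=0)     # per-cell 95th percentile

    # Frequency of extremes: count of days exceeding the local threshold.
    extreme_days = (pr > p95).sum(axis=0)
    ```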

  3. Approaching the exa-scale: a real-world evaluation of rendering extremely large data sets

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Patchett, John M; Ahrens, James P; Lo, Li - Ta

    2010-10-15

    Extremely large scale analysis is becoming increasingly important as supercomputers and their simulations move from petascale to exascale. The lack of dedicated hardware acceleration for rendering on today's supercomputing platforms motivates our detailed evaluation of the possibility of interactive rendering on the supercomputer. In order to facilitate our understanding of rendering on the supercomputing platform, we focus on scalability of rendering algorithms and architecture envisioned for exascale datasets. To understand tradeoffs for dealing with extremely large datasets, we compare three different rendering algorithms for large polygonal data: software-based ray tracing, software-based rasterization and hardware-accelerated rasterization. We present a case study of strong and weak scaling of rendering extremely large data on both GPU- and CPU-based parallel supercomputers using ParaView, a parallel visualization tool. We use three different datasets: two synthetic and one from a scientific application. At an extreme scale, algorithmic rendering choices make a difference and should be considered while approaching exascale computing, visualization, and analysis. We find that software-based ray tracing offers a viable approach for scalable rendering of the projected future massive data sizes.

  4. Segmentation of Unstructured Datasets

    NASA Technical Reports Server (NTRS)

    Bhat, Smitha

    1996-01-01

    Datasets generated by computer simulations and experiments in Computational Fluid Dynamics tend to be extremely large and complex. It is difficult to visualize these datasets using standard techniques like Volume Rendering and Ray Casting. Object Segmentation provides a technique to extract and quantify regions of interest within these massive datasets. This thesis explores basic algorithms to extract coherent amorphous regions from two-dimensional and three-dimensional scalar unstructured grids. The techniques are applied to datasets from Computational Fluid Dynamics and from Finite Element Analysis.

  5. Semi-supervised tracking of extreme weather events in global spatio-temporal climate datasets

    NASA Astrophysics Data System (ADS)

    Kim, S. K.; Prabhat, M.; Williams, D. N.

    2017-12-01

    Deep neural networks have been successfully applied to the problem of detecting extreme weather events in large-scale climate datasets, attaining performance that overshadows all previous hand-crafted methods. Recent work has shown that a multichannel spatiotemporal encoder-decoder CNN architecture can localize events with semi-supervised bounding boxes. Motivated by this work, we propose a new learning method based on Variational Auto-Encoders (VAE) and Long Short-Term Memory (LSTM) networks to track extreme weather events in spatio-temporal datasets. We treat spatio-temporal object tracking as learning the probabilistic distribution of continuous latent features of an auto-encoder using stochastic variational inference. For this, we assume that our datasets are i.i.d. and that the latent features can be modeled by a Gaussian distribution. In the proposed method, we first train a VAE to generate an approximate posterior given multichannel climate input containing an extreme climate event at a fixed time. Then, we predict the bounding box, location and class of extreme climate events using convolutional layers whose input concatenates three features: the embedding, the sampled mean and the standard deviation. Lastly, we train an LSTM on the concatenated input to learn the temporal structure of the dataset by recurrently feeding the output back into the next timestep's input of the VAE. Our contribution is two-fold. First, we show the first semi-supervised end-to-end architecture based on a VAE for tracking extreme weather events, applicable to massive unlabeled climate datasets. Second, the temporal movement of events is incorporated into bounding box prediction using the LSTM, which can improve localization accuracy. To our knowledge, this technique has been explored in neither the climate community nor the machine learning community.
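
    A faithful implementation is beyond the scope of an abstract, but the heavily simplified PyTorch sketch below (all layer sizes and names are our assumptions, not the authors' architecture) shows the core flow: a convolutional encoder produces the latent mean and standard deviation, the sampled latent is concatenated with both, and an LSTM carries temporal state into a bounding-box head.

    ```python
    # Simplified sketch only; sizes, names and structure are assumptions.
    import torch
    import torch.nn as nn

    class VAETracker(nn.Module):
        def __init__(self, channels=16, latent=32):
            super().__init__()
            self.enc = nn.Sequential(
                nn.Conv2d(channels, 8, 3, stride=2, padding=1),
                nn.ReLU(), nn.Flatten())
            feat = 8 * 16 * 16                     # for 32x32 inputs
            self.mu = nn.Linear(feat, latent)
            self.logvar = nn.Linear(feat, latent)
            self.lstm = nn.LSTM(3 * latent, 64, batch_first=True)
            self.box = nn.Linear(64, 4)            # (x, y, w, h) per step

        def forward(self, x):                      # x: (batch, time, C, 32, 32)
            b, t = x.shape[:2]
            h = self.enc(x.reshape(b * t, *x.shape[2:]))
            mu, logvar = self.mu(h), self.logvar(h)
            std = torch.exp(0.5 * logvar)
            z = mu + std * torch.randn_like(std)   # reparameterization trick
            seq = torch.cat([z, mu, std], dim=-1).reshape(b, t, -1)
            out, _ = self.lstm(seq)                # temporal state across steps
            return self.box(out)                   # predicted boxes per step

    boxes = VAETracker()(torch.randn(2, 5, 16, 32, 32))  # shape (2, 5, 4)
    ```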

  6. Large uncertainties in observed daily precipitation extremes over land

    NASA Astrophysics Data System (ADS)

    Herold, Nicholas; Behrangi, Ali; Alexander, Lisa V.

    2017-01-01

    We explore uncertainties in observed daily precipitation extremes over the terrestrial tropics and subtropics (50°S-50°N) based on five commonly used products: the Climate Hazards Group InfraRed Precipitation with Stations (CHIRPS) dataset, the Global Precipitation Climatology Centre-Full Data Daily (GPCC-FDD) dataset, the Tropical Rainfall Measuring Mission (TRMM) multi-satellite research product (T3B42 v7), the Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks-Climate Data Record (PERSIANN-CDR), and the Global Precipitation Climatology Project's One-Degree Daily (GPCP-1DD) dataset. We use the precipitation indices R10mm and Rx1day, developed by the Expert Team on Climate Change Detection and Indices, to explore the behavior of "moderate" and "extreme" extremes, respectively. In order to assess the sensitivity of extreme precipitation to different grid sizes we perform our calculations on four common spatial resolutions (0.25° × 0.25°, 1° × 1°, 2.5° × 2.5°, and 3.75° × 2.5°). The impact of the chosen "order of operation" in calculating these indices is also determined. Our results show that moderate extremes are relatively insensitive to product and resolution choice, while extreme extremes can be very sensitive. For example, at 0.25° × 0.25° quasi-global mean Rx1day values vary from 37 mm in PERSIANN-CDR to 62 mm in T3B42. We find that the interproduct spread becomes prominent at resolutions of 1° × 1° and finer, thus establishing a minimum effective resolution at which observational products agree. Without improvements in interproduct spread, these exceedingly large observational uncertainties at high spatial resolution may limit the usefulness of model evaluations. As has been found previously, resolution sensitivity can be largely eliminated by applying an order of operation where indices are calculated prior to regridding. However, this approach is not appropriate when true area averages are desired (e.g., for model evaluations).
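
    The order-of-operation effect the authors describe can be demonstrated with a toy NumPy example (synthetic data, and a simple 4x4 block mean standing in for a real regridding scheme): computing Rx1day before coarsening preserves the peaks that coarsening first smooths away.

    ```python
    import numpy as np

    # Toy illustration of the "order of operation" effect on Rx1day.
    rng = np.random.default_rng(1)
    pr = rng.gamma(0.5, 8.0, size=(365, 40, 40))   # synthetic daily field

    def block_mean(a, k=4):
        # Coarsen a (time, lat, lon) array by k x k block averaging.
        t, ny, nx = a.shape
        return a.reshape(t, ny // k, k, nx // k, k).mean(axis=(2, 4))

    rx1day_then_regrid = block_mean(pr.max(axis=0)[None])[0]  # index first
    regrid_then_rx1day = block_mean(pr).max(axis=0)           # regrid first

    # The index-first field is larger: a mean of maxima always bounds the
    # maximum of means from above.
    print(rx1day_then_regrid.mean(), regrid_then_rx1day.mean())
    ```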

  7. Hydrological Retrospective of floods and droughts: Case study in the Amazon

    NASA Astrophysics Data System (ADS)

    Wongchuig Correa, Sly; Cauduro Dias de Paiva, Rodrigo; Carlo Espinoza Villar, Jhan; Collischonn, Walter

    2017-04-01

    Recent studies have reported an increase in the intensity and frequency of hydrological extreme events in many regions of the Amazon basin over recent decades; events such as seasonal floods and droughts have had a significant impact on human and natural systems. Methodologies such as climate reanalysis are being developed to create a coherent record of climate systems. Building on this notion, this research proposes a methodology called Hydrological Retrospective (HR), which essentially runs large rainfall datasets through hydrological models in order to develop a record of past hydrology, enabling the analysis of past floods and droughts. We developed the methodology for the Amazon basin, using eight large precipitation datasets (more than 30 years) as input to a large-scale hydrological and hydrodynamic model (MGB-IPH). HR products were then validated against several in situ discharge gauges dispersed throughout the Amazon basin, with a focus on maximum and minimum events. For the HR products that scored best on performance metrics, we assessed the forecast skill in detecting floods and droughts against in-situ observations. Furthermore, a statistical trend analysis of the intensity of seasonal floods and droughts was performed for the whole Amazon basin. Results indicate that the best HR products represented most past extreme events registered by in-situ observed data well and were coherent with many events cited in the literature; we therefore consider it viable to use some large precipitation datasets, such as climate reanalyses mainly based on a land surface component and datasets based on merged products, to represent past regional hydrology and seasonal hydrological extreme events. In addition, an increasing trend in intensity was found for maximum annual discharges (related to floods) in north-western regions and for minimum annual discharges (related to droughts) in central-south regions of the Amazon basin; these features were previously detected by other researchers. For the basin as a whole, we estimated an upward trend in maximum annual discharges on the Amazon River. Given the global coverage of rainfall datasets, HR could be used as a methodology to understand the occurrence of past extreme events in many places, and so to better estimate future hydrological behavior and its impacts on society.

  8. Deep learning in the small sample size setting: cascaded feed forward neural networks for medical image segmentation

    NASA Astrophysics Data System (ADS)

    Gaonkar, Bilwaj; Hovda, David; Martin, Neil; Macyszyn, Luke

    2016-03-01

    Deep learning refers to a large set of neural network-based algorithms that have emerged as promising machine-learning tools in the general imaging and computer vision domains. Convolutional neural networks (CNNs), a specific class of deep learning algorithms, have been extremely effective in object recognition and localization in natural images. A characteristic feature of CNNs is the use of a locally connected multi-layer topology inspired by the animal visual cortex (the most powerful vision system in existence). While CNNs perform admirably in object identification and localization tasks, they typically require training on extremely large datasets. Unfortunately, in medical image analysis, large datasets are either unavailable or extremely expensive to obtain. Further, the primary tasks in medical imaging are organ identification and segmentation from 3D scans, which differ from the standard computer vision tasks of object recognition. Thus, in order to translate the advantages of deep learning to medical image analysis, there is a need to develop deep network topologies and training methodologies that are geared towards medical imaging tasks and can work in a setting where dataset sizes are relatively small. In this paper, we present a technique for stacked supervised training of deep feed-forward neural networks for segmenting organs from medical scans. Each 'neural network layer' in the stack is trained to identify a subregion of the original image that contains the organ of interest. By layering several such stacks together, a very deep neural network is constructed. Such a network can be used to identify extremely small regions of interest in extremely large images, in spite of a lack of clear contrast in the signal or easily identifiable shape characteristics. What is even more intriguing is that the network stack achieves accurate segmentation even when it is trained on a single image with manually labelled ground truth. We validate this approach using a publicly available head and neck CT dataset. We also show that a deep neural network of similar depth, if trained directly using backpropagation, cannot achieve the tasks accomplished with our layer-wise training paradigm.
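
    The cascade itself is a stack of trained networks, but its control flow can be caricatured in a few lines of toy Python (a threshold heuristic stands in for each trained stage): every stage narrows the region of interest handed to the next.

    ```python
    import numpy as np

    # Toy sketch of the cascaded-ROI idea; not the paper's networks. Each
    # "stage" is mimicked by cropping to the bounding box of bright pixels.
    def stage_crop(img, frac=0.6):
        thr = img.min() + frac * (img.max() - img.min())
        rows, cols = np.where(img >= thr)
        return img[rows.min():rows.max() + 1, cols.min():cols.max() + 1]

    img = np.zeros((256, 256))
    img[100:140, 90:150] = 1.0                 # synthetic "organ"
    roi = img
    for _ in range(3):                         # three cascaded stages
        roi = stage_crop(roi)                  # zoom into predicted subregion
    print(roi.shape)                           # shrinks toward the organ
    ```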

  9. Multi-decadal Hydrological Retrospective: Case study of Amazon floods and droughts

    NASA Astrophysics Data System (ADS)

    Wongchuig Correa, Sly; Paiva, Rodrigo Cauduro Dias de; Espinoza, Jhan Carlo; Collischonn, Walter

    2017-06-01

    Recently developed methodologies such as climate reanalysis make it possible to create a historical record of climate systems. This paper proposes a methodology called Hydrological Retrospective (HR), which essentially simulates large rainfall datasets, using them as input to hydrological models to develop a record of past hydrology, making it possible to analyze past floods and droughts. We developed the methodology for the Amazon basin, where studies have shown an increase in the intensity and frequency of hydrological extreme events in recent decades. We used eight large precipitation datasets (more than 30 years) as input for a large-scale hydrological and hydrodynamic model (MGB-IPH). HR products were then validated against several in situ discharge gauges controlling the main Amazon sub-basins, focusing on maximum and minimum events. For the most accurate HR, based on performance metrics, we assessed the forecast skill of HR in detecting floods and droughts, comparing the results with in-situ observations. A statistical trend analysis was performed on the intensity of seasonal floods and droughts in the entire Amazon basin. Results indicate that HR could represent most past extreme events well, compared with in-situ observed data, and was consistent with many events reported in the literature. Because of their short duration, some minor regional events were not reported in the literature but were captured by HR. To represent past regional hydrology and seasonal hydrological extreme events, we believe it is feasible to use some large precipitation datasets such as i) climate reanalyses, which are mainly based on a land surface component, and ii) datasets based on merged products. A significant upward trend in intensity was seen in maximum annual discharge (related to floods) in western and northwestern regions and in minimum annual discharge (related to droughts) in south and central-south regions of the Amazon basin. Because of the global coverage of rainfall datasets, this methodology can be transferred to other regions for better estimation of future hydrological behavior and its impact on society.
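
    The abstract does not name its trend test. A common choice for this kind of analysis, sketched below on synthetic annual maxima, is Kendall's tau for significance together with a Theil-Sen slope; this is our assumption, not necessarily the authors' method.

    ```python
    import numpy as np
    from scipy import stats

    # Trend test on synthetic annual maximum discharge (m3/s).
    years = np.arange(1980, 2011)
    qmax = (10000 + 25 * (years - years[0])
            + np.random.default_rng(2).normal(0, 300, years.size))

    tau, p_value = stats.kendalltau(years, qmax)       # monotonic trend test
    slope, intercept, lo, hi = stats.theilslopes(qmax, years)  # robust slope
    print(f"tau={tau:.2f}, p={p_value:.3f}, slope={slope:.1f} m3/s per year")
    ```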

  10. [Parallel virtual reality visualization of extreme large medical datasets].

    PubMed

    Tang, Min

    2010-04-01

    On the basis of a brief description of grid computing, the essence and critical techniques of parallel visualization of extremely large medical datasets are discussed in connection with the Intranet and commonly configured computers of hospitals. Several kernel techniques are introduced, including the hardware structure, software framework, load balancing and virtual reality visualization. The Maximum Intensity Projection algorithm is parallelized on a common PC cluster. In the virtual reality world, three-dimensional models can be rotated, zoomed, translated and cut interactively and conveniently through a control panel built on the Virtual Reality Modeling Language (VRML). Experimental results demonstrate that this method provides promising, real-time results, serving as a good assistant in clinical diagnosis.
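
    The Maximum Intensity Projection itself is simple; the parallel part reduces per-node partial projections, which works because the maximum is associative. A minimal NumPy sketch (synthetic volume, with a serial split standing in for the PC cluster):

    ```python
    import numpy as np

    # Maximum Intensity Projection: brightest voxel along one axis.
    volume = np.random.default_rng(3).random((128, 256, 256))  # stand-in CT
    mip_axial = volume.max(axis=0)            # one 256x256 projection image

    # Data-parallel version: each "node" projects its slab, then partial
    # projections are combined with an elementwise max.
    parts = np.array_split(volume, 4, axis=0)
    mip_combined = np.maximum.reduce([p.max(axis=0) for p in parts])
    assert np.array_equal(mip_axial, mip_combined)
    ```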

  11. Climate Change and Hydrological Extreme Events - Risks and Perspectives for Water Management in Bavaria and Québec

    NASA Astrophysics Data System (ADS)

    Ludwig, R.

    2017-12-01

    There is as yet no confirmed knowledge of whether and how climate change contributes to the magnitude and frequency of hydrological extreme events, or of how regional water management could adapt to the corresponding risks. The ClimEx project (2015-2019) investigates the effects of climate change on meteorological and hydrological extreme events and their implications for water management in Bavaria and Québec. High Performance Computing is employed to enable the complex simulations in a hydro-climatological model processing chain, resulting in a unique high-resolution and transient (1950-2100) dataset of climatological and meteorological forcing and hydrological response: (1) The climate module has developed a large ensemble of high-resolution (12 km) data from the CRCM5 RCM for Central Europe and North-Eastern North America, downscaled from 50 members of the CanESM2 GCM. The dataset is complemented by all available data from the Euro-CORDEX project to account for the assessment of both natural climate variability and climate change. The large ensemble, with several thousand model years, provides the potential to catch rare extreme events and thus improves the process understanding of extreme events with return periods of 1000+ years. (2) The hydrology module comprises process-based and spatially explicit model setups (e.g. WaSiM) for all major catchments in Bavaria and Southern Québec at high temporal (3 h) and spatial (500 m) resolution. The simulations form the basis for in-depth analysis of hydrological extreme events based on the inputs from the large climate model dataset. This specific data situation makes it possible to establish a new method of `virtual perfect prediction', which assesses climate change impacts on flood risk and water resources management by identifying patterns in the data that reveal preferential triggers of hydrological extreme events. The presentation will highlight first results from the analysis of the large-scale ClimEx model ensemble, showing the current and future ratio of natural variability and climate change impacts on meteorological extreme events. Selected data from the ensemble are used to drive a hydrological model experiment to illustrate the capacity to better determine the recurrence periods of hydrological extreme events under conditions of climate change.

  12. ClimEx - Climate change and hydrological extreme events - risks and perspectives for water management in Bavaria and Québec

    NASA Astrophysics Data System (ADS)

    Ludwig, Ralf; Baese, Frank; Braun, Marco; Brietzke, Gilbert; Brissette, Francois; Frigon, Anne; Giguère, Michel; Komischke, Holger; Kranzlmueller, Dieter; Leduc, Martin; Martel, Jean-Luc; Ricard, Simon; Schmid, Josef; von Trentini, Fabian; Turcotte, Richard; Weismueller, Jens; Willkofer, Florian; Wood, Raul

    2017-04-01

    The recent accumulation of extreme hydrological events in Bavaria and Québec has stimulated scientific and also societal interest. In addition to the challenges of an improved prediction of such situations and the implications for the associated risk management, there is, as yet, no confirmed knowledge of whether and how climate change contributes to the magnitude and frequency of hydrological extreme events, or of how regional water management could adapt to the corresponding risks. The ClimEx project (2015-2019) investigates the effects of climate change on meteorological and hydrological extreme events and their implications for water management in Bavaria and Québec. High Performance Computing is employed to enable the complex simulations in a hydro-climatological model processing chain, resulting in a unique high-resolution and transient (1950-2100) dataset of climatological and meteorological forcing and hydrological response: (1) The climate module has developed a large ensemble of high-resolution (12 km) data from the CRCM5 RCM for Central Europe and North-Eastern North America, downscaled from 50 members of the CanESM2 GCM. The dataset is complemented by all available data from the Euro-CORDEX project to account for the assessment of both natural climate variability and climate change. The large ensemble, with several thousand model years, provides the potential to catch rare extreme events and thus improves the process understanding of extreme events with return periods of 1000+ years. (2) The hydrology module comprises process-based and spatially explicit model setups (e.g. WaSiM) for all major catchments in Bavaria and Southern Québec at high temporal (3 h) and spatial (500 m) resolution. The simulations form the basis for in-depth analysis of hydrological extreme events based on the inputs from the large climate model dataset. This specific data situation makes it possible to establish a new method of 'virtual perfect prediction', which assesses climate change impacts on flood risk and water resources management by identifying patterns in the data that reveal preferential triggers of hydrological extreme events. The presentation will highlight first results from the analysis of the large-scale ClimEx model ensemble, showing the current and future ratio of natural variability and climate change impacts on meteorological extreme events. Selected data from the ensemble are used to drive a hydrological model experiment to illustrate the capacity to better determine the recurrence periods of hydrological extreme events under conditions of climate change. [The authors acknowledge funding for the project from the Bavarian State Ministry for the Environment and Consumer Protection.]

  13. Spark and HPC for High Energy Physics Data Analyses

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sehrish, Saba; Kowalkowski, Jim; Paterno, Marc

    A full High Energy Physics (HEP) data analysis is divided into multiple data reduction phases. Processing within these phases is extremely time consuming, therefore intermediate results are stored in files held in mass storage systems and referenced as part of large datasets. This processing model limits what can be done with interactive data analytics. Growth in the size and complexity of experimental datasets, along with emerging big data tools, is beginning to change the traditional ways of doing data analyses. The use of big data tools for HEP analysis looks promising, mainly because extremely large HEP datasets can be represented and held in memory across a system, and accessed interactively by encoding an analysis using high-level programming abstractions. The mainstream tools, however, are not designed for scientific computing or for exploiting the available HPC platform features. We use an example from the Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider (LHC) in Geneva, Switzerland. The LHC is the highest-energy particle collider in the world. Our use case focuses on searching for new types of elementary particles explaining Dark Matter in the universe. We use HDF5 as our input data format, and Spark to implement the use case. We show the benefits and limitations of using Spark with HDF5 on Edison at NERSC.
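
    The paper's exact code is not shown here; the sketch below illustrates the general Spark-plus-HDF5 pattern on hypothetical file paths and a hypothetical dataset name (events/pt), distributing file reads across executors with h5py.

    ```python
    # General pattern only; paths and the dataset name are hypothetical.
    from pyspark.sql import SparkSession
    import h5py

    def read_pt(path):
        # Read one HDF5 file on an executor and return its values.
        with h5py.File(path, "r") as f:
            return f["events/pt"][:].tolist()   # e.g., transverse momenta

    spark = SparkSession.builder.appName("hep-hdf5").getOrCreate()
    paths = ["/data/cms/file_000.h5", "/data/cms/file_001.h5"]  # hypothetical

    pt = spark.sparkContext.parallelize(paths).flatMap(read_pt)
    n_selected = pt.filter(lambda x: x > 200.0).count()  # simple in-memory cut
    ```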

  14. Variability of hydrological extreme events in East Asia and their dynamical control: a comparison between observations and two high-resolution global climate models

    NASA Astrophysics Data System (ADS)

    Freychet, N.; Duchez, A.; Wu, C.-H.; Chen, C.-A.; Hsu, H.-H.; Hirschi, J.; Forryan, A.; Sinha, B.; New, A. L.; Graham, T.; Andrews, M. B.; Tu, C.-Y.; Lin, S.-J.

    2017-02-01

    This work investigates the variability of extreme weather events (drought spells, DS15, and daily heavy rainfall, PR99) over East Asia. It particularly focuses on the large-scale atmospheric circulation associated with a high occurrence of these extreme events. Two observational datasets (APHRODITE and PERSIANN) are compared with two high-resolution global climate models (HiRAM and HadGEM3-GC2) and an ensemble of other lower-resolution climate models from CMIP5. We first evaluate the performance of the high-resolution models. They both exhibit good skill in reproducing extreme events, especially when compared with CMIP5 results. Significant differences exist between the two observational datasets, highlighting the difficulty of obtaining a clear estimate of extreme events. The link between the variability of the extremes and the large-scale circulation is investigated, on monthly and interannual timescales, using composite and correlation analyses. Both extreme indices, DS15 and PR99, are significantly linked to the low-level wind intensity over East Asia, i.e. the monsoon circulation. It is also found that DS15 events are strongly linked to the surface temperature over the Siberian region and to the land-sea pressure contrast, while PR99 events are linked to sea surface temperature anomalies over the West North Pacific. These results illustrate the importance of the monsoon circulation for extremes over East Asia. The dependence on the surface temperature over the continent and on the sea surface temperature raises the question of to what extent these could affect the occurrence of extremes over tropical regions in future projections.

  15. TLEM 2.0 - a comprehensive musculoskeletal geometry dataset for subject-specific modeling of lower extremity.

    PubMed

    Carbone, V; Fluit, R; Pellikaan, P; van der Krogt, M M; Janssen, D; Damsgaard, M; Vigneron, L; Feilkas, T; Koopman, H F J M; Verdonschot, N

    2015-03-18

    When analyzing complex biomechanical problems such as predicting the effects of orthopedic surgery, subject-specific musculoskeletal models are essential to achieve reliable predictions. The aim of this paper is to present the Twente Lower Extremity Model 2.0, a new comprehensive dataset of the musculoskeletal geometry of the lower extremity, based on medical imaging data and dissection performed on the right lower extremity of a fresh male cadaver. Bone, muscle and subcutaneous fat (including skin) volumes were segmented from computed tomography and magnetic resonance image scans. Inertial parameters were estimated from the image-based segmented volumes. A complete cadaver dissection was performed, in which bony landmarks, attachment sites and lines of action of 55 muscle actuators and 12 ligaments, bony wrapping surfaces, and joint geometry were measured. The obtained musculoskeletal geometry dataset was finally implemented in the AnyBody Modeling System (AnyBody Technology A/S, Aalborg, Denmark), resulting in a model consisting of 12 segments, 11 joints and 21 degrees of freedom, and including 166 muscle-tendon elements for each leg. The new TLEM 2.0 dataset was purposely built to be easily combined with novel image-based scaling techniques, such as bone surface morphing, muscle volume registration and muscle-tendon path identification, in order to obtain subject-specific musculoskeletal models quickly and accurately. The complete dataset, including CT and MRI scans and segmented volumes and surfaces, is made available at http://www.utwente.nl/ctw/bw/research/projects/TLEMsafe for the biomechanical community, in order to accelerate the development and adoption of subject-specific models on a large scale. TLEM 2.0 is freely shared for non-commercial use only, under acceptance of the TLEMsafe Research License Agreement. Copyright © 2014 Elsevier Ltd. All rights reserved.

  16. A Comparison of Latent Heat Fluxes over Global Oceans for Four Flux Products

    NASA Technical Reports Server (NTRS)

    Chou, Shu-Hsien; Nelkin, Eric; Ardizzone, Joe; Atlas, Robert M.

    2003-01-01

    To improve our understanding of global energy and water cycle variability, and to improve model simulations of climate variations, it is vital to have accurate latent heat fluxes (LHF) over the global oceans. Monthly LHF, 10-m wind speed (U10m), 10-m specific humidity (Q10m), and sea-air humidity difference (Qs-Q10m) of GSSTF2 (version 2 Goddard Satellite-based Surface Turbulent Fluxes) over the global oceans during 1992-93 are compared with those of HOAPS (Hamburg Ocean Atmosphere Parameters and Fluxes from Satellite Data), NCEP (NCEP/NCAR reanalysis), and da Silva (a ship-based surface marine dataset). The mean differences, standard deviations of differences, and temporal correlations of these monthly variables over the global oceans during 1992-93 between GSSTF2 and each of the three datasets are analyzed. The large-scale patterns of the 2-yr-mean fields for these variables are similar among the four datasets, but significant quantitative differences are found. The temporal correlation is higher in the northern extratropics than in the south for all variables, with the contrast being especially large for da Silva as a result of more missing ship data in the south. The da Silva dataset has extremely low temporal correlation and large differences with GSSTF2 for all variables in the southern extratropics, indicating that da Silva hardly produces a realistic variability in these variables. The NCEP dataset has extremely low temporal correlation (0.27) and large spatial variations of differences with GSSTF2 for Qs-Q10m in the tropics, which causes the low correlation for LHF. Over the tropics, the HOAPS LHF is significantly smaller than GSSTF2 by approx. 31% (37 W/sq m), whereas the other two datasets are comparable to GSSTF2. This is because HOAPS has systematically smaller LHF than GSSTF2 in space, while the other two datasets have very large spatial variations of large positive and negative LHF differences with GSSTF2 that cancel to produce smaller regional-mean differences. Our analyses suggest that the GSSTF2 latent heat flux, surface air humidity, and winds are likely to be more realistic than those of the other three flux datasets examined, although those of GSSTF2 are still subject to regional biases.

  17. Influence of spatial and temporal scales in identifying temperature extremes

    NASA Astrophysics Data System (ADS)

    van Eck, Christel M.; Friedlingstein, Pierre; Mulder, Vera L.; Regnier, Pierre A. G.

    2016-04-01

    Extreme heat events are becoming more frequent. Notable are severe heatwaves such as the European heatwave of 2003, the Russian heatwave of 2010 and the Australian heatwave of 2013. Surface temperature is attaining new maxima not only during the summer but also during the winter. The year 2015 is reported to be a record-breaking year for both summer and winter temperatures. These extreme temperatures are taking their human and environmental toll, emphasizing the need for an accurate method of defining a heat extreme in order to fully understand the spatial and temporal spread of an extreme and its impact. This research aims to explore how the use of different spatial and temporal scales influences the identification of a heat extreme. For this purpose, two near-surface temperature datasets of different temporal and spatial scale are used. First, the daily ERA-Interim dataset at 0.25 degree with a time span of 32 years (1979-2010). Second, the daily Princeton Meteorological Forcing Dataset at 0.5 degree with a time span of 63 years (1948-2010). A temperature is considered extremely anomalous when it surpasses the 90th, 95th, or 99th percentile threshold based on the aforementioned pre-processed datasets. The analysis is conducted on a global scale, dividing the world into IPCC's so-called SREX regions developed for the analysis of extreme climate events. Pre-processing is done by detrending and/or subtracting the monthly climatology based on 32 years of data for both datasets and on 63 years of data for only the Princeton Meteorological Forcing Dataset. This results in 6 datasets of temperature anomalies from which the location in time and space of the anomalously warm days is identified. Comparison of the differences between these 6 datasets, in terms of absolute threshold temperatures for extremes and the temporal and spatial spread of the extremely anomalous warm days, shows a dependence of the results on the datasets and methodology used. This stresses the need for a careful selection of data and methodology when identifying heat extremes.

  18. HadISD: a quality-controlled global synoptic report database for selected variables at long-term stations from 1973-2011

    NASA Astrophysics Data System (ADS)

    Dunn, R. J. H.; Willett, K. M.; Thorne, P. W.; Woolley, E. V.; Durre, I.; Dai, A.; Parker, D. E.; Vose, R. S.

    2012-10-01

    This paper describes the creation of HadISD: an automatically quality-controlled synoptic resolution dataset of temperature, dewpoint temperature, sea-level pressure, wind speed, wind direction and cloud cover from global weather stations for 1973-2011. The full dataset consists of over 6000 stations, with 3427 long-term stations deemed to have sufficient sampling and quality for climate applications requiring sub-daily resolution. As with other surface datasets, coverage is heavily skewed towards Northern Hemisphere mid-latitudes. The dataset is constructed from a large pre-existing ASCII flatfile data bank that represents over a decade of substantial effort at data retrieval, reformatting and provision. These raw data have had varying levels of quality control applied to them by individual data providers. The work proceeded in several steps: merging stations with multiple reporting identifiers; reformatting to netCDF; quality control; and then filtering to form a final dataset. Particular attention has been paid to maintaining true extreme values where possible within an automated, objective process. Detailed validation has been performed on a subset of global stations and also on UK data using known extreme events to help finalise the QC tests. Further validation was performed on a selection of extreme events world-wide (Hurricane Katrina in 2005, the cold snap in Alaska in 1989 and heat waves in SE Australia in 2009). Some very initial analyses are performed to illustrate some of the types of problems to which the final data could be applied. Although the filtering has removed the poorest station records, no attempt has been made to homogenise the data thus far, due to the complexity of retaining the true distribution of high-resolution data when applying adjustments. Hence non-climatic, time-varying errors may still exist in many of the individual station records and care is needed in inferring long-term trends from these data. This dataset will allow the study of high frequency variations of temperature, pressure and humidity on a global basis over the last four decades. Both individual extremes and the overall population of extreme events could be investigated in detail to allow for comparison with past and projected climate. A version-control system has been constructed for this dataset to allow for the clear documentation of any updates and corrections in the future.

  19. Resolution testing and limitations of geodetic and tsunami datasets for finite fault inversions along subduction zones

    NASA Astrophysics Data System (ADS)

    Williamson, A.; Newman, A. V.

    2017-12-01

    Finite fault inversions utilizing multiple datasets have become commonplace for large earthquakes when data are available. The mixture of geodetic datasets such as Global Navigational Satellite Systems (GNSS) and InSAR, seismic waveforms, and, when applicable, tsunami waveforms from Deep-Ocean Assessment and Reporting of Tsunami (DART) gauges provides slightly different observations that, when incorporated together, lead to a more robust model of the fault slip distribution. The merging of different datasets is of particular importance along subduction zones, where direct observations of seafloor deformation over the rupture area are extremely limited. Instead, instrumentation measures related ground motion from tens to hundreds of kilometers away. The distance from the event and the dataset type can lead to a variable degree of resolution, affecting the ability to accurately model the spatial distribution of slip. This study analyzes the spatial resolution attained from geodetic and tsunami datasets individually as well as in a combined dataset. We constrain the importance of the distance between estimated parameters and observed data and how it varies between land-based and open-ocean datasets. The analysis focuses on accurately scaled subduction zone synthetic models as well as on the relationship between slip and data in recent large subduction zone earthquakes. This study shows that datasets sensitive to seafloor deformation, like open-ocean tsunami waveforms or seafloor geodetic instrumentation, can provide unique offshore resolution for understanding most large and particularly tsunamigenic megathrust earthquake activity. In most environments, we simply lack the capability to resolve static displacements using land-based geodetic observations.

  20. Characterizing the Spatial Contiguity of Extreme Precipitation over the US in the Recent Past

    NASA Astrophysics Data System (ADS)

    Touma, D. E.; Swain, D. L.; Diffenbaugh, N. S.

    2016-12-01

    The spatial characteristics of extreme precipitation over an area can define the hydrologic response in a basin, subsequently affecting the flood risk in the region. Here, we examine the spatial extent of extreme precipitation in the US by defining its "footprint": a contiguous area of rainfall exceeding a certain threshold (e.g., the 90th percentile) on a given day. We first characterize the climatology of extreme rainfall footprint sizes across the US from 1980 to 2015 using Daymet, a high-resolution observational gridded rainfall dataset. We find that there are distinct regional and seasonal differences in the average footprint sizes of extreme daily rainfall. In the winter, the Midwest shows footprints exceeding 500,000 sq. km while the Front Range exhibits footprints of 10,000 sq. km. In contrast, the summer average footprint size is generally smaller and more uniform across the US, ranging from 10,000 sq. km in the Southwest to 100,000 sq. km in Montana and North Dakota. Moreover, we find some significant increasing trends in average footprint size over 1980-2015, specifically in the Southwest in the winter and the Northeast in the spring. While gridded daily rainfall datasets provide a practical framework for calculating footprint size, the calculation depends heavily on the interpolation methods used in creating the dataset. Therefore, we also assess footprint size using the GHCN-Daily station network and use geostatistical methods to define footprints of extreme rainfall directly from station data. Compared to the findings from Daymet, preliminary results using this method show fewer small daily footprints over the US, while large footprints are similar in number and magnitude to Daymet. Overall, defining the spatial characteristics of extreme rainfall, as well as observed and expected changes in these characteristics, allows us to better understand the hydrologic response to extreme rainfall and to better characterize flood risks.
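
    A footprint as defined here, a contiguous area exceeding a percentile threshold on a given day, maps directly onto connected-component labeling. A minimal sketch with synthetic data follows; grid-cell area and 4-connectivity are assumptions, not details from the abstract.

    ```python
    import numpy as np
    from scipy import ndimage

    rng = np.random.default_rng(4)
    pr = rng.gamma(0.5, 8.0, size=(365, 100, 100))   # synthetic daily field
    p90 = np.percentile(pr, 90, axis=0)              # per-cell 90th percentile

    day = pr[200]                                    # one day's rainfall
    labels, n = ndimage.label(day > p90)             # contiguous exceedances
    cell_area_km2 = 1.0                              # depends on the grid
    sizes = ndimage.sum(day > p90, labels, range(1, n + 1)) * cell_area_km2
    print(f"{n} footprints, largest = {sizes.max():.0f} km2")
    ```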

  21. Very Large Graphs for Information Extraction (VLG) Detection and Inference in the Presence of Uncertainty

    DTIC Science & Technology

    2015-09-21

    this framework, MIT LL carried out a one-year proof-of-concept study to determine the capabilities and challenges in the detection of anomalies in...extremely large graphs [5]. Under this effort, two real datasets were considered, and algorithms for data modeling and anomaly detection were developed...is required in a well-defined experimental framework for the detection of anomalies in very large graphs. This study is intended to inform future

  22. Improved Hourly and Sub-Hourly Gauge Data for Assessing Precipitation Extremes in the U.S.

    NASA Astrophysics Data System (ADS)

    Lawrimore, J. H.; Wuertz, D.; Palecki, M. A.; Kim, D.; Stevens, S. E.; Leeper, R.; Korzeniewski, B.

    2017-12-01

    The NOAA/National Weather Service (NWS) Fischer-Porter (F&P) weighing bucket precipitation gauge network consists of approximately 2000 stations that comprise a subset of the NWS Cooperative Observers Program network. This network has operated since the mid-20th century, providing one of the longest records of hourly and 15-minute precipitation observations in the U.S. The lengthy record of this dataset, combined with its relatively high spatial density, provides an important source of data for many hydrological applications, including understanding trends and variability in the frequency and intensity of extreme precipitation events. In recent years, NOAA's National Centers for Environmental Information initiated an upgrade of its end-to-end processing and quality control system for these data. This involved a change from a largely manual review and edit process to a fully automated system that removes the subjectivity that was previously a necessary part of dataset quality control and processing. An overview of improvements to this dataset is provided, along with the results of an analysis of observed variability and trends in U.S. precipitation extremes since the mid-20th century. Multi-decadal trends in many parts of the nation are consistent with model projections of an increase in the frequency and intensity of heavy precipitation in a warming world.

  23. A comparison of two global datasets of extreme sea levels and resulting flood exposure

    NASA Astrophysics Data System (ADS)

    Muis, Sanne; Verlaan, Martin; Nicholls, Robert J.; Brown, Sally; Hinkel, Jochen; Lincke, Daniel; Vafeidis, Athanasios T.; Scussolini, Paolo; Winsemius, Hessel C.; Ward, Philip J.

    2017-04-01

    Estimating the current risk of coastal flooding requires adequate information on extreme sea levels. For over a decade, the only global dataset available was the DINAS-COAST Extreme Sea Levels (DCESL) dataset, which applies a static approximation to estimate extreme sea levels. Recently, a dynamically derived dataset was developed: the Global Tide and Surge Reanalysis (GTSR) dataset. Here, we compare the two datasets. The differences between DCESL and GTSR are generally larger than the confidence intervals of GTSR. Compared to observed extremes, DCESL generally overestimates extremes, with a mean bias of 0.6 m. With a mean bias of -0.2 m, GTSR generally underestimates extremes, particularly in the tropics. The Dynamic Interactive Vulnerability Assessment model is applied to calculate present-day flood exposure in terms of the land area and the population below the 1-in-100-year sea levels. Global exposed population is 28% lower when based on GTSR instead of DCESL. Considering the limited data available at the time, DCESL provides a good estimate of the spatial variation in extremes around the world. However, GTSR allows for an improved assessment of the impacts of coastal floods, including confidence bounds. We further improve the assessment of coastal impacts by correcting for the conflicting vertical datums of sea-level extremes and land elevation, which have not been accounted for in previous global assessments. Converting the extreme sea levels to the same vertical reference used for the elevation data is shown to be a critical step, resulting in a 39-59% higher estimate of population exposure.

  24. Rapid and accurate species tree estimation for phylogeographic investigations using replicated subsampling.

    PubMed

    Hird, Sarah; Kubatko, Laura; Carstens, Bryan

    2010-11-01

    We describe a method for estimating species trees that relies on replicated subsampling of large data matrices. One application of this method is phylogeographic research, which has long depended on large datasets that sample intensively from the geographic range of the focal species; these datasets allow systematicists to identify cryptic diversity and understand how contemporary and historical landscape forces influence genetic diversity. However, analyzing any large dataset can be computationally difficult, particularly when newly developed methods for species tree estimation are used. Here we explore the use of replicated subsampling, a potential solution to the problem posed by large datasets, with both a simulation study and an empirical analysis. In the simulations, we sample different numbers of alleles and loci, estimate species trees using STEM, and compare the estimated to the actual species tree. Our results indicate that subsampling three alleles per species for eight loci nearly always results in an accurate species tree topology, even in cases where the species tree was characterized by extremely rapid divergence. Even more modest subsampling effort, for example one allele per species and two loci, was more likely than not (>50%) to identify the correct species tree topology, indicating that in nearly all cases, computing the majority-rule consensus tree from replicated subsampling provides a good estimate of topology. These results were supported by estimating the correct species tree topology and reasonable branch lengths for an empirical 10-locus great ape dataset. Copyright © 2010 Elsevier Inc. All rights reserved.
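
    The replicated-subsampling loop is easy to sketch. In the toy Python below the species-tree inference step is a random placeholder (the paper uses STEM), and the majority-rule consensus is simply the most frequent topology across replicates.

    ```python
    from collections import Counter
    import random

    def infer_topology(subsample):
        # Placeholder for a species-tree method such as STEM; returns a
        # topology string for illustration only.
        return random.choice(["((A,B),C)", "((A,C),B)", "((B,C),A)"])

    def subsample(data, n_alleles=3, n_loci=8):
        # Draw a few loci, and a few alleles per locus, at random.
        loci = random.sample(list(data), n_loci)
        return {locus: random.sample(data[locus], n_alleles) for locus in loci}

    data = {f"locus{i}": [f"allele{j}" for j in range(10)] for i in range(20)}
    votes = Counter(infer_topology(subsample(data)) for _ in range(100))
    print(votes.most_common(1))   # majority-rule consensus topology
    ```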

  25. Evolution of precipitation extremes in two large ensembles of climate simulations

    NASA Astrophysics Data System (ADS)

    Martel, Jean-Luc; Mailhot, Alain; Talbot, Guillaume; Brissette, François; Ludwig, Ralf; Frigon, Anne; Leduc, Martin; Turcotte, Richard

    2017-04-01

    Recent studies project significant changes in the future distribution of precipitation extremes due to global warming. It is likely that extreme precipitation intensity will increase in a future climate and that extreme events will be more frequent. In this work, annual maxima daily precipitation series from the Canadian Earth System Model (CanESM2) 50-member large ensemble (spatial resolution of 2.8°x2.8°) and the Community Earth System Model (CESM1) 40-member large ensemble (spatial resolution of 1°x1°) are used to investigate extreme precipitation over the historical (1980-2010) and future (2070-2100) periods. The use of these ensembles results in 1500 (30 years x 50 members) and 1200 (30 years x 40 members) simulated years, respectively, over both the historical and future periods. These large datasets allow the computation of empirical daily extreme precipitation quantiles for large return periods. Using the CanESM2 and CESM1 large ensembles, daily precipitation extremes with return periods ranging from 2 to 100 years are computed in the historical and future periods to assess the impact of climate change. Results indicate that daily precipitation extremes generally increase in the future over most land grid points and that these increases will also affect the 100-year extreme daily precipitation. Considering that many public infrastructures have lifespans exceeding 75 years, the increase in extremes has important implications for service levels of water infrastructures and public safety. Estimated increases associated with very extreme precipitation events (e.g. 100-year events) will drastically change the likelihood of flooding and its extent in a future climate. These results, although interesting, need to be extended to sub-daily durations, which are relevant for urban flooding protection and urban infrastructure design (e.g. sewer networks, culverts). Models and simulations at finer spatial and temporal resolution are therefore needed.
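
    The key computational point, that 1500 pooled simulated years turn a 100-year event into an empirical quantile rather than an extrapolation, can be shown in a few lines of NumPy (Gumbel-distributed synthetic annual maxima standing in for model output):

    ```python
    import numpy as np

    rng = np.random.default_rng(5)
    # 50 members x 30 years of synthetic annual-maximum daily precipitation.
    annual_max = rng.gumbel(loc=40.0, scale=10.0, size=(50, 30))  # mm/day

    pooled = annual_max.ravel()                 # 1500 simulated years
    T = 100                                     # return period in years
    q = np.quantile(pooled, 1.0 - 1.0 / T)      # empirical T-year extreme
    print(f"empirical {T}-year precipitation: {q:.1f} mm/day")
    ```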

  26. Topological data analyses and machine learning for detection, classification and characterization of atmospheric rivers

    NASA Astrophysics Data System (ADS)

    Muszynski, G.; Kashinath, K.; Wehner, M. F.; Prabhat, M.; Kurlin, V.

    2017-12-01

    We investigate novel approaches to detecting, classifying and characterizing extreme weather events, such as atmospheric rivers (ARs), in large high-dimensional climate datasets. ARs are narrow filaments of concentrated water vapour in the atmosphere that bring much of the precipitation in many mid-latitude regions. The precipitation associated with ARs is also responsible for major flooding events in many coastal regions of the world, including the west coast of the United States and western Europe. In this study we combine ideas from Topological Data Analysis (TDA) with Machine Learning (ML) for detecting, classifying and characterizing extreme weather events like ARs. TDA is a new field at the interface between topology and computer science that studies "shape", i.e. hidden topological structure, in raw data. It has been applied successfully in many areas of applied science, including complex networks, signal processing and image recognition. Using TDA, we provide ARs with a shape characteristic as a new feature descriptor for the task of AR classification. In particular, we track the change in topology of precipitable water (integrated water vapour) fields using the Union-Find algorithm. We use the generated feature descriptors with ML classifiers to establish the reliability and classification performance of our approach. We utilize the parallel toolkit for extreme climate events analysis (TECA: Petascale Pattern Recognition for Climate Science, Prabhat et al., Computer Analysis of Images and Patterns, 2015) for comparison (events identified by TECA are assumed to be ground truth). Preliminary results indicate that our approach brings new insight into the study of ARs and provides quantitative information about the relevance of topological feature descriptors in analyses of large climate datasets. We illustrate the method on climate model output and NCEP reanalysis datasets. Further, our method outperforms existing methods on detection and classification of ARs. This work illustrates that TDA combined with ML may provide a uniquely powerful approach for the detection, classification and characterization of extreme weather phenomena.
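
    As a concrete reference for the Union-Find step, here is a minimal disjoint-set structure of the kind used to track components of a thresholded precipitable-water field merging as the threshold changes (illustrative only, not the authors' implementation):

    ```python
    # Minimal union-find (disjoint set) with path halving.
    class UnionFind:
        def __init__(self, n):
            self.parent = list(range(n))

        def find(self, i):
            while self.parent[i] != i:
                self.parent[i] = self.parent[self.parent[i]]  # path halving
                i = self.parent[i]
            return i

        def union(self, i, j):
            ri, rj = self.find(i), self.find(j)
            if ri != rj:
                self.parent[ri] = rj    # two components merge

    uf = UnionFind(5)
    uf.union(0, 1); uf.union(3, 4)
    print(uf.find(0) == uf.find(1), uf.find(0) == uf.find(3))  # True False
    ```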

  27. Assessment of Observational Uncertainty in Extreme Precipitation Events over the Continental United States

    NASA Astrophysics Data System (ADS)

    Slinskey, E. A.; Loikith, P. C.; Waliser, D. E.; Goodman, A.

    2017-12-01

    Extreme precipitation events are associated with numerous societal and environmental impacts. Furthermore, anthropogenic climate change is projected to alter precipitation intensity across portions of the Continental United States (CONUS). Therefore, a spatial understanding and an intuitive means of monitoring extreme precipitation over time are critical. Towards this end, we apply an event-based indicator, developed as part of NASA's support of the ongoing efforts of the US National Climate Assessment, which assigns categories to extreme precipitation events based on 3-day storm totals as a basis for dataset intercomparison. To assess observational uncertainty across a wide range of historical precipitation measurement approaches, we intercompare in situ station data from the Global Historical Climatology Network (GHCN), satellite-derived precipitation data from NASA's Tropical Rainfall Measuring Mission (TRMM), gridded in situ station data from the Parameter-elevation Regressions on Independent Slopes Model (PRISM), global reanalysis from NASA's Modern-Era Retrospective analysis for Research and Applications, Version 2 (MERRA-2), and regional reanalysis with gauge data assimilation from NCEP's North American Regional Reanalysis (NARR). Results suggest considerable variability across the five-dataset suite in the frequency, spatial extent, and magnitude of extreme precipitation events. Consistent with expectations, higher-resolution datasets were found to resemble station data best and to capture a greater frequency of high-end extreme events relative to lower-resolution datasets. The degree of dataset agreement varies regionally; however, all datasets successfully capture the seasonal cycle of precipitation extremes across the CONUS. These intercomparison results provide additional insight about observational uncertainty and the ability of a range of precipitation measurement and analysis products to capture extreme precipitation event climatology. While the event category threshold is fixed in this analysis, preliminary results from the development of a flexible categorization scheme that scales with grid resolution are presented.

  28. Evolution of Precipitation Extremes in Three Large Ensembles of Climate Simulations - Impact of Spatial and Temporal Resolutions

    NASA Astrophysics Data System (ADS)

    Martel, J. L.; Brissette, F.; Mailhot, A.; Wood, R. R.; Ludwig, R.; Frigon, A.; Leduc, M.; Turcotte, R.

    2017-12-01

    Recent studies indicate that the frequency and intensity of extreme precipitation will increase in a future climate due to global warming. In this study, we compare annual maxima precipitation series from three large ensembles of climate simulations at various spatial and temporal resolutions. The first two are at the global scale: the Canadian Earth System Model (CanESM2) 50-member large ensemble (CanESM2-LE) at a 2.8° resolution and the Community Earth System Model (CESM1) 40-member large ensemble (CESM1-LE) at a 1° resolution. The third ensemble is at the regional scale over both Eastern North America and Europe: the Canadian Regional Climate Model (CRCM5) 50-member large ensemble (CRCM5-LE) at a 0.11° resolution, driven at its boundaries by the CanESM2-LE. The CRCM5-LE is a new ensemble produced within the ClimEx project (http://www.climex-project.org), a Québec-Bavaria collaboration. Using these three large ensembles, changes in extreme precipitation between the historical (1980-2010) and future (2070-2100) periods are investigated. This results in 1500 (30 years x 50 members for CanESM2-LE and CRCM5-LE) and 1200 (30 years x 40 members for CESM1-LE) simulated years over both the historical and future periods. Using these large datasets, the empirical daily (and, for CRCM5-LE, sub-daily) extreme precipitation quantiles for large return periods ranging from 2 to 100 years are computed. Results indicate that daily extreme precipitation will generally increase over most land grid points of both domains according to the three large ensembles. For the CRCM5-LE, the increase in sub-daily extreme precipitation is even larger than that in daily extreme precipitation. Considering that many public infrastructures have lifespans exceeding 75 years, the increase in extremes has important implications for service levels of water infrastructures and public safety.

  29. Ensemble reconstruction of spatio-temporal extreme low-flow events in France since 1871

    NASA Astrophysics Data System (ADS)

    Caillouet, Laurie; Vidal, Jean-Philippe; Sauquet, Eric; Devers, Alexandre; Graff, Benjamin

    2017-06-01

    The length of streamflow observations is generally limited to the last 50 years even in data-rich countries like France. It therefore offers too small a sample of extreme low-flow events to properly explore the long-term evolution of their characteristics and associated impacts. To overcome this limit, this work first presents a daily 140-year ensemble reconstructed streamflow dataset for a reference network of near-natural catchments in France. This dataset, called SCOPE Hydro (Spatially COherent Probabilistic Extended Hydrological dataset), is based on (1) a probabilistic precipitation, temperature, and reference evapotranspiration downscaling of the Twentieth Century Reanalysis over France, called SCOPE Climate, and (2) continuous hydrological modelling using SCOPE Climate as forcings over the whole period. This work then introduces tools for defining spatio-temporal extreme low-flow events. Extreme low-flow events are first locally defined through the sequent peak algorithm using a novel combination of a fixed threshold and a daily variable threshold. A dedicated spatial matching procedure is then established to identify spatio-temporal events across France. This procedure is furthermore adapted to the SCOPE Hydro 25-member ensemble to characterize in a probabilistic way unrecorded historical events at the national scale. Extreme low-flow events are described and compared in a spatially and temporally homogeneous way over 140 years on a large set of catchments. Results highlight well-known recent events like 1976 or 1989-1990, but also older and relatively forgotten ones like the 1878 and 1893 events. These results contribute to improving our knowledge of historical events and provide a selection of benchmark events for climate change adaptation purposes. Moreover, this study allows for further detailed analyses of the effect of climate variability and anthropogenic climate change on low-flow hydrology at the scale of France.
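
    A minimal sketch of the sequent peak idea for low flows follows (our simplified formulation with a single fixed threshold; the paper combines a fixed and a daily variable threshold): a deficit accumulates while flow stays below the threshold, and contiguous runs of positive deficit delimit events.

    ```python
    import numpy as np

    def sequent_peak(flow, threshold):
        # Deficit grows by (threshold - flow) and drains when flow recovers,
        # but never drops below zero.
        deficit = np.zeros_like(flow)
        prev = 0.0
        for t, q in enumerate(flow):
            prev = max(0.0, prev + (threshold - q))
            deficit[t] = prev
        return deficit

    q = np.array([12, 10, 7, 5, 4, 6, 9, 13, 11, 8, 6, 10], dtype=float)
    d = sequent_peak(q, threshold=8.0)
    events = d > 0          # boolean mask of days inside a low-flow event
    print(d.round(1), events.sum())
    ```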

  10. Can Dynamic Global Vegetation Models Reproduce Satellite Observed Extreme Browning and Greening Events in Vegetation Productivity?

    NASA Astrophysics Data System (ADS)

    van Eck, C. M.; Morfopoulos, C.; Betts, R. A.; Chang, J.; Ciais, P.; Friedlingstein, P.; Regnier, P. A. G.

    2016-12-01

    The frequency and severity of extreme climate events such as droughts, extreme precipitation and heatwaves are expected to increase in our changing climate. These extreme climate events will affect vegetation through either enhanced or reduced productivity, which in turn can have a substantial impact on the terrestrial carbon sink and thus the global carbon cycle. Connecting observational datasets with modelling studies provides new insights into these climate-vegetation interactions. This study aims to compare extremes in vegetation productivity as derived from observations with those of Dynamic Global Vegetation Models (DGVMs). Here, GIMMS-NDVI 3g is selected as the observational dataset, and both JULES (Joint UK Land Environment Simulator) and ORCHIDEE (Organising Carbon and Hydrology In Dynamic Ecosystems) as the DGVMs. Both models are forced with the PGFv2 Global Meteorological Forcing Dataset according to the ISI-MIP2 protocol for historical runs. The focal point is extremes in vegetation productivity, identified as NDVI anomalies below the 10th percentile or above the 90th percentile during the growing season, referred to as browning and greening events respectively. The monthly GIMMS-NDVI 3g dataset is used to obtain the location in time and space of the vegetation extremes. The global GIMMS-NDVI 3g dataset has been subdivided into the IPCC's SREX regions, for which the NDVI anomalies are calculated and the extreme thresholds are determined. With this information we can identify the location in time and space of the browning and greening events in remotely sensed vegetation productivity. The same procedure is applied to the modelled Gross Primary Productivity (GPP), allowing a comparison between the spatial and temporal occurrence of the browning and greening events in the observational dataset and the models' output. The capacity of the models to capture observed extremes in vegetation productivity is assessed and compared, and factors contributing to observed and modelled browning/greening extremes are analysed. The results of this study provide a stepping stone to modelling future extremes in vegetation productivity.
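
    A minimal sketch of the percentile-based detection for one region's monthly anomaly series; the synthetic data and the growing-season month indices are assumptions:

        import numpy as np

        rng = np.random.default_rng(2)
        ndvi_anomaly = rng.normal(size=33 * 12)          # monthly anomalies, 33 years
        # Growing season assumed May-September (month indices 4-8).
        growing_season = np.tile(np.isin(np.arange(12), [4, 5, 6, 7, 8]), 33)

        gs_values = ndvi_anomaly[growing_season]
        lo, hi = np.percentile(gs_values, [10, 90])      # region-specific thresholds

        browning = growing_season & (ndvi_anomaly < lo)  # extreme low productivity
        greening = growing_season & (ndvi_anomaly > hi)  # extreme high productivity
        print(browning.sum(), "browning months,", greening.sum(), "greening months")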

  11. A Fast SVD-Hidden-nodes based Extreme Learning Machine for Large-Scale Data Analytics.

    PubMed

    Deng, Wan-Yu; Bai, Zuo; Huang, Guang-Bin; Zheng, Qing-Hua

    2016-05-01

    Big dimensional data is a growing trend that is emerging in many real-world contexts, ranging from web mining, gene expression analysis and protein-protein interaction to high-frequency financial data. Nowadays, there is a growing consensus that increasing dimensionality poses impeding effects on the performance of classifiers, which is termed the "peaking phenomenon" in the field of machine intelligence. To address the issue, dimensionality reduction is commonly employed as a preprocessing step on Big dimensional data before building the classifiers. In this paper, we propose an Extreme Learning Machine (ELM) approach for large-scale data analytics. In contrast to existing approaches, we embed hidden nodes that are designed using singular value decomposition (SVD) into the classical ELM. These SVD nodes in the hidden layer are shown to capture the underlying characteristics of Big dimensional data well, exhibiting excellent generalization performance. The drawback of using SVD on the entire dataset, however, is the high computational complexity involved. To address this, a fast divide-and-conquer approximation scheme is introduced to maintain computational tractability on high-volume data. The resultant algorithm is labeled here as Fast Singular Value Decomposition-Hidden-nodes based Extreme Learning Machine, or FSVD-H-ELM for short. In FSVD-H-ELM, instead of identifying the SVD hidden nodes directly from the entire dataset, SVD hidden nodes are derived from multiple random subsets of data sampled from the original dataset. Comprehensive experiments and comparisons are conducted to assess FSVD-H-ELM against other state-of-the-art algorithms. The results obtained demonstrate the superior generalization performance and efficiency of FSVD-H-ELM. Copyright © 2016 Elsevier Ltd. All rights reserved.
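
    A hedged sketch of the core idea, an ELM whose hidden-node weights come from the SVD of a random data subset; the divide-and-conquer aggregation over multiple subsets and all tuning details of FSVD-H-ELM are omitted, and every name below is illustrative:

        import numpy as np

        def svd_elm_sketch(X, y, n_hidden=50, subset_size=200, seed=0):
            """ELM regression with SVD-derived hidden weights (simplified)."""
            rng = np.random.default_rng(seed)
            idx = rng.choice(len(X), size=min(subset_size, len(X)), replace=False)
            # Right-singular vectors of the subset serve as hidden-node weights.
            _, _, vt = np.linalg.svd(X[idx], full_matrices=False)
            w = vt[:n_hidden].T                     # (n_features, n_hidden)
            b = rng.normal(size=n_hidden)
            h = np.tanh(X @ w + b)                  # hidden-layer activations
            beta = np.linalg.pinv(h) @ y            # least-squares output weights
            return lambda X_new: np.tanh(X_new @ w + b) @ beta

        # Toy regression example with synthetic data.
        rng = np.random.default_rng(1)
        X = rng.normal(size=(500, 100))
        y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=500)
        model = svd_elm_sketch(X, y)
        print("train RMSE:", np.sqrt(np.mean((model(X) - y) ** 2)))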

  12. Curious or spurious correlations within a national-scale forest inventory?

    Treesearch

    Christopher W. Woodall; James A. Westfall

    2012-01-01

    Foresters are increasingly required to assess trends not only in traditional forest attributes (e.g., growing-stock volumes), but also across suites of forest health indicators and site/climate variables. Given the tenuous relationship between correlation and causality within extremely large datasets, the goal of this study was to use a nationwide annual forest...

  13. Contribution of large-scale circulation anomalies to changes in extreme precipitation frequency in the United States

    NASA Astrophysics Data System (ADS)

    Yu, Lejiang; Zhong, Shiyuan; Pei, Lisi; Bian, Xindi; Heilman, Warren E.

    2016-04-01

    The mean global climate has warmed as a result of the increasing emission of greenhouse gases induced by human activities. This warming is considered the main reason for the increasing number of extreme precipitation events in the US. While much attention has been given to extreme precipitation events lasting several days, which are usually responsible for severe flooding over a large region, little is known about how extreme precipitation events that cause flash flooding and occur at sub-daily time scales have changed over time. Here we use the observed hourly precipitation from the North American Land Data Assimilation System Phase 2 forcing datasets to determine trends in the frequency of extreme precipitation events of short (1 h, 3 h, 6 h, 12 h and 24 h) duration for the period 1979-2013. The results indicate an increasing trend in the frequency of these events in the central and eastern US. Over most of the western US, especially the Southwest and the Intermountain West, the trends are generally negative. These trends can be largely explained by the interdecadal variability of the Pacific Decadal Oscillation (PDO) and the Atlantic Multidecadal Oscillation (AMO), with the AMO making a greater contribution to the trends in both warm and cold seasons.
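
    A minimal sketch of one way to estimate such frequency trends at a single grid point; the fixed-percentile event definition and the synthetic hourly series are illustrative, not the paper's exact protocol:

        import numpy as np

        rng = np.random.default_rng(3)
        years = np.arange(1979, 2014)
        hourly = rng.gamma(shape=0.1, scale=2.0, size=(len(years), 8760))  # mm/h

        threshold = np.percentile(hourly, 99.9)          # fixed extreme threshold
        counts = (hourly > threshold).sum(axis=1)        # exceedances per year

        slope, intercept = np.polyfit(years, counts, 1)  # linear trend
        print(f"trend: {slope:+.2f} exceedances per year")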

  14. Quality-control of an hourly rainfall dataset and climatology of extremes for the UK.

    PubMed

    Blenkinsop, Stephen; Lewis, Elizabeth; Chan, Steven C; Fowler, Hayley J

    2017-02-01

    Sub-daily rainfall extremes may be associated with flash flooding, particularly in urban areas but, compared with extremes on daily timescales, have been relatively little studied in many regions. This paper describes a new, hourly rainfall dataset for the UK based on ∼1600 rain gauges from three different data sources. This includes tipping bucket rain gauge data from the UK Environment Agency (EA), which have been collected for operational purposes, principally flood forecasting. Significant problems in the use of such data for the analysis of extreme events include the recording of accumulated totals, high-frequency bucket tips, rain gauge recording errors and the non-operation of gauges. Given the prospect of an intensification of short-duration rainfall in a warming climate, the identification of such errors is essential if sub-daily datasets are to be used to better understand extreme events. We therefore first describe a series of procedures developed to quality control this new dataset. We then analyse ∼380 gauges with near-complete hourly records for 1992-2011 and map the seasonal climatology of intense rainfall based on UK hourly extremes using annual maxima, n-largest events and fixed threshold approaches. We find that the highest frequencies and intensities of hourly extreme rainfall occur during summer, when the usual orographically defined pattern of extreme rainfall is replaced by a weaker, north-south pattern. A strong diurnal cycle in hourly extremes, peaking in late afternoon to early evening, is also identified in summer and, for some areas, in spring. This likely reflects the different mechanisms that generate sub-daily rainfall, with convection dominating during summer. The resulting quality-controlled hourly rainfall dataset will provide considerable value in several contexts, including the development of standard, globally applicable quality-control procedures for sub-daily data, the validation of the new generation of very high-resolution climate models and improved understanding of the drivers of extreme rainfall.
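
    Two illustrative quality-control checks in the spirit of those described above; the thresholds and rules are assumptions for demonstration, not the paper's actual procedures:

        import numpy as np

        def qc_flags(rain_mm, max_plausible=100.0, zero_run=23, spike_mm=50.0):
            """Flag implausible hourly totals and suspected accumulated dumps.

            Flags: (1) physically implausible hourly totals; (2) a large value
            immediately after a long run of zeros, a classic signature of an
            accumulated total being recorded in a single hour.
            """
            rain = np.asarray(rain_mm, dtype=float)
            implausible = rain > max_plausible
            accumulated = np.zeros_like(rain, dtype=bool)
            run = 0
            for t, v in enumerate(rain):
                if v == 0:
                    run += 1
                else:
                    if run >= zero_run and v > spike_mm:
                        accumulated[t] = True
                    run = 0
            return implausible, accumulated

        series = np.zeros(48); series[30] = 75.0   # 30 dry hours, then 75 mm at once
        print(qc_flags(series))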

  15. Automatic identification of bird targets with radar via patterns produced by wing flapping.

    PubMed

    Zaugg, Serge; Saporta, Gilbert; van Loon, Emiel; Schmaljohann, Heiko; Liechti, Felix

    2008-09-06

    Bird identification with radar is important for bird migration research, environmental impact assessments (e.g. wind farms), aircraft security and radar meteorology. In a study on bird migration, radar signals from birds, insects and ground clutter were recorded. Signals from birds show a typical pattern due to wing flapping. The data were labelled by experts into the four classes BIRD, INSECT, CLUTTER and UFO (unidentifiable signals). We present a classification algorithm aimed at automatic recognition of bird targets. Variables related to signal intensity and wing-flapping pattern were extracted (via continuous wavelet transform). We used support vector classifiers to build predictive models and estimated classification performance via cross-validation on four datasets. When data from the same dataset were used for training and testing the classifier, classification performance ranged from moderately to extremely high. When data from one dataset were used for training and the three remaining datasets were used as test sets, performance was lower but still moderately to extremely high. This shows that the method generalizes well across different locations and times. Our method provides a substantial saving of time when birds must be identified in large collections of radar signals, and it represents a first substantial step towards a real-time bird identification radar system. We provide some guidelines and ideas for future research.
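
    A minimal sketch of the cross-dataset evaluation setup described above, with synthetic stand-ins for the wavelet-derived features; on real features the classifier would be tuned rather than used with defaults:

        import numpy as np
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.svm import SVC

        # Synthetic stand-ins for wing-flapping features from four campaigns.
        rng = np.random.default_rng(4)
        datasets = [(rng.normal(size=(200, 16)), rng.integers(0, 2, 200))
                    for _ in range(4)]

        # Train on one dataset, test on the remaining three (cross-dataset setup).
        X_train, y_train = datasets[0]
        clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
        clf.fit(X_train, y_train)
        for i, (X_test, y_test) in enumerate(datasets[1:], start=1):
            print(f"dataset {i}: accuracy = {clf.score(X_test, y_test):.2f}")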

  16. The Generation of a Stochastic Flood Event Catalogue for Continental USA

    NASA Astrophysics Data System (ADS)

    Quinn, N.; Wing, O.; Smith, A.; Sampson, C. C.; Neal, J. C.; Bates, P. D.

    2017-12-01

    Recent advances in the acquisition of spatiotemporal environmental data and improvements in computational capabilities have enabled the generation of large-scale, even global, flood hazard layers which serve as a critical decision-making tool for a range of end users. However, these datasets are designed to indicate only the probability and depth of inundation at a given location and are unable to describe the likelihood of concurrent flooding across multiple sites. Recent research has highlighted that although the estimation of large, widespread flood events is of great value to the flood mitigation and insurance industries, to date it has been difficult to deal with this spatial dependence structure in flood risk over relatively large scales. Many existing approaches have been restricted to empirical estimates of risk based on historic events, limiting their capability of assessing risk over the full range of plausible scenarios. Therefore, this research utilises a recently developed model-based approach to describe the multisite joint distribution of extreme river flows across continental USA river gauges. Given an extreme event at a site, the model characterises the likelihood that neighbouring sites are also impacted. This information is used to simulate an ensemble of plausible synthetic extreme event footprints, from which flood depths are extracted from an existing global flood hazard catalogue. Expected economic losses are then estimated by overlaying flood depths with national datasets defining asset locations, characteristics and depth-damage functions. The ability of this approach to quantify probabilistic economic risk and rare threshold-exceeding events is expected to be of value to those interested in the flood mitigation and insurance sectors. This work describes the methodological steps taken to create the flood loss catalogue over a national scale; highlights the uncertainty in the expected annual economic vulnerability within the USA from extreme river flows; and presents future developments to the modelling approach.

  17. Global patterns of extreme drought-induced loss in land primary production: Identifying ecological extremes from rain-use efficiency.

    PubMed

    Du, Ling; Mikle, Nathaniel; Zou, Zhenhua; Huang, Yuanyuan; Shi, Zheng; Jiang, Lifen; McCarthy, Heather R; Liang, Junyi; Luo, Yiqi

    2018-07-01

    Quantifying the ecological patterns of loss of ecosystem function in extreme drought is important to understand the carbon exchange between the land and atmosphere. Rain-use efficiency [RUE; gross primary production (GPP)/precipitation] acts as a typical indicator of ecosystem function. In this study, a novel method based on maximum rain-use efficiency (RUEmax) was developed to detect losses of ecosystem function globally. Three global GPP datasets from the MODIS remote sensing data (MOD17), ground-upscaled FLUXNET observations (MPI-BGC), and process-based model simulations (BESS), and a global gridded precipitation product (CRU) were used to develop annual global RUE datasets for 2001-2011. Large, well-known extreme drought events were detected, e.g. the 2003 drought in Europe, the 2002 and 2011 droughts in the U.S., and the 2010 drought in Russia. Our results show that extreme drought-induced loss of ecosystem function could impact 0.9% ± 0.1% of Earth's vegetated land per year and was mainly distributed in semi-arid regions. The reduced carbon uptake caused by functional loss (0.14 ± 0.03 PgC/yr) could explain >70% of the interannual variation in GPP in drought-affected areas (p ≤ 0.001). Our results highlight the impact of ecosystem function loss in semi-arid regions, with increasing precipitation variability and dryland expansion expected in the future. Copyright © 2018 Elsevier B.V. All rights reserved.

  18. LITHO1.0: An Updated Crust and Lithosphere Model of the Earth

    NASA Astrophysics Data System (ADS)

    Masters, G.; Ma, Z.; Laske, G.; Pasyanos, M. E.

    2011-12-01

    We are developing LITHO1.0: an updated crust and lithosphere model of the Earth. The overall plan is to take the popular CRUST2.0 model - a global model of crustal structure with a relatively poor representation of the uppermost mantle - and improve its nominal resolution to 1 degree and extend the model to include lithospheric structure. The new model, LITHO1.0, will be constrained by many different datasets, including extremely large new datasets of relatively short-period group velocity data. Other datasets include (but are not limited to) compilations of receiver-function constraints and active-source studies. To date, we have completed the compilation of extremely large global datasets of group velocity for Rayleigh and Love waves from 10 mHz to 40 mHz using a cluster analysis technique. We have also extended the method to measure phase velocity and are complementing the group velocity with global datasets of longer-period phase data that help to constrain deep lithosphere properties. To model these data, we require a starting model for the crust at a nominal resolution of 1 degree. This has been developed by constructing a map of crustal thickness using data from receiver-function and active-source experiments where available, and by using CRUST2.0 where other constraints are not available. Particular care has been taken to make sure that the locations of sharp changes in crustal thickness are accurately represented. This map is then used as a template to extend CRUST2.0 to 1 degree nominal resolution and to develop starting maps of all crustal properties. We are currently modeling the data using two techniques. The first is a linearized inversion about the 3D crustal starting model; note that it is important to use local eigenfunctions to compute Fréchet derivatives due to the extreme variations in crustal structure. The second is a targeted grid-search method. A preliminary model for the crustal part of the model will be presented.

  19. Evaluation of CORDEX-Arctic daily precipitation and temperature-based climate indices over Canadian Arctic land areas

    NASA Astrophysics Data System (ADS)

    Diaconescu, Emilia Paula; Mailhot, Alain; Brown, Ross; Chaumont, Diane

    2018-03-01

    This study focuses on the evaluation of daily precipitation and temperature climate indices and extremes simulated by an ensemble of 12 Regional Climate Model (RCM) simulations from the ARCTIC-CORDEX experiment against surface observations in the Canadian Arctic from the Adjusted Historical Canadian Climate Dataset. Five global reanalysis products (ERA-Interim, JRA55, MERRA, CFSR and GMFD) are also included in the evaluation to assess their potential for RCM evaluation in data-sparse regions. The study evaluated the means and annual anomaly distributions of indices over the 1980-2004 dataset overlap period. The results showed that RCM and reanalysis performance varied with the climate variable being evaluated. Most RCMs and reanalyses were able to simulate climate indices related to mean air temperature and hot extremes well over most of the Canadian Arctic, with the exception of the Yukon region, where models displayed the largest biases, related to topographic effects. Overall performance was generally poor for indices related to cold extremes. Likewise, only a few RCM simulations and reanalyses were able to provide realistic simulations of precipitation extreme indicators. The multi-reanalysis ensemble provided superior results to individual datasets for climate indicators related to mean air temperature and hot extremes, but not for other indicators. These results support the use of reanalyses as reference datasets for the evaluation of RCM mean air temperature and hot extremes over northern Canada, but not for cold extremes and precipitation indices.

  1. Tempest: Tools for Addressing the Needs of Next-Generation Climate Models

    NASA Astrophysics Data System (ADS)

    Ullrich, P. A.; Guerra, J. E.; Pinheiro, M. C.; Fong, J.

    2015-12-01

    Tempest is a comprehensive simulation-to-science infrastructure that tackles the needs of next-generation, high-resolution, data-intensive climate modeling activities. This project incorporates three key components: TempestDynamics, a global modeling framework for experimental numerical methods and high-performance computing; TempestRemap, a toolset for arbitrary-order conservative and consistent remapping between unstructured grids; and TempestExtremes, a suite of detection and characterization tools for identifying weather extremes in large climate datasets. In this presentation, the latest advances in the implementation of this framework will be discussed, and a number of projects now utilizing these tools will be featured.

  2. Analysis of extreme summers and prior late winter/spring conditions in central Europe

    NASA Astrophysics Data System (ADS)

    Träger-Chatterjee, C.; Müller, R. W.; Bendix, J.

    2013-05-01

    Drought and heat waves during summer in the mid-latitudes are a serious threat to human health and agriculture and have negative impacts on infrastructure, such as problems in energy supply. The frequency of such extreme events is expected to increase as global warming progresses. A better understanding of the development of extremely hot and dry summers and the identification of possible precursors could help improve existing seasonal forecasts in this regard, and could possibly lead to the development of early-warning methods. The development of extremely hot and dry summer seasons in central Europe is attributed to a combined effect of the dominance of anticyclonic weather regimes and soil moisture-atmosphere interactions. The atmospheric circulation largely determines the amount of solar irradiation and the amount of precipitation in an area; these two variables are themselves major factors controlling soil moisture. Thus, solar irradiation and precipitation are used as proxies to analyse extremely sunny and dry late winter/spring and summer seasons for the period 1958-2011 in Germany and adjacent areas. For this purpose, solar irradiation data from the European Centre for Medium-Range Weather Forecasts (ECMWF) 40-yr and interim re-analysis datasets, as well as remote sensing data, are used. Precipitation data are taken from the Global Precipitation Climatology Project. To analyse the atmospheric circulation, geopotential data at 850 hPa are also taken from the ECMWF 40-yr and interim re-analysis datasets. For the years in which extreme summers in terms of high solar irradiation and low precipitation are identified, the previous late winter/spring conditions of solar irradiation and precipitation in Germany and adjacent areas are analysed. Results show that, if the El Niño-Southern Oscillation (ENSO) is not very intensely developed, extremely high solar irradiation together with extremely low precipitation during late winter/spring might serve as a precursor of extremely sunny and dry summer months.

  3. Observations of Stratiform Lightning Flashes and Their Microphysical and Kinematic Environments

    NASA Technical Reports Server (NTRS)

    Lang, Timothy J.; Williams, Earle

    2016-01-01

    During the Midlatitude Continental Convective Clouds Experiment (MC3E), combined observations of clouds and precipitation were made from airborne and ground-based in situ and remote sensing platforms. These observations were coordinated for multiple mesoscale convective systems (MCSs) that passed over the MC3E domain in northern Oklahoma. Notably, during a storm on 20 May 2011 in situ and remote sensing airborne observations were made near the times and locations of stratiform positive cloud-to-ground (+CG) lightning flashes. These +CGs resulted from extremely large stratiform lightning flashes that were hundreds of km in length and lasted several seconds. This dataset provides an unprecedented look at kinematic and microphysical environments in the vicinity of large, powerful, and long-lived stratiform lightning flashes. We will use this dataset to understand the influence of low liquid water contents (LWCs) in the electrical charging of MCS stratiform regions.

  4. Integrated Strategy Improves the Prediction Accuracy of miRNA in Large Dataset

    PubMed Central

    Lipps, David; Devineni, Sree

    2016-01-01

    MiRNAs are short non-coding RNAs of about 22 nucleotides, which play critical roles in gene expression regulation. The biogenesis of miRNAs is largely determined by the sequence and structural features of their parental RNA molecules. Based on these features, multiple computational tools have been developed to predict whether RNA transcripts contain miRNAs or not. Although very successful, these predictors have started to face multiple challenges in recent years. Many predictors were optimized using datasets of hundreds of miRNA samples, far smaller than the number of known miRNAs. Consequently, the prediction accuracy of these predictors on large datasets becomes unknown and needs to be re-tested. In addition, many predictors were optimized for either high sensitivity or high specificity, optimization strategies that may bring serious limitations in applications. Moreover, to meet continuously rising expectations on these computational tools, improving the prediction accuracy becomes extremely important. In this study, a meta-predictor, mirMeta, was developed by integrating a set of non-linear transformations with a meta-strategy. More specifically, the outputs of five individual predictors are first preprocessed using non-linear transformations and then fed into an artificial neural network to make the meta-prediction. The prediction accuracy of the meta-predictor was validated using both multi-fold cross-validation and an independent dataset. The final accuracy of the meta-predictor on a newly designed large dataset is improved by 7%, to 93%. The meta-predictor also proves to be less dependent on datasets and to have a more refined balance between sensitivity and specificity. The importance of this study is twofold: first, it shows that the combination of non-linear transformations and artificial neural networks improves the prediction accuracy of individual predictors. Second, a new miRNA predictor with significantly improved prediction accuracy is developed for the community for identifying novel miRNAs and the complete set of miRNAs. Source code is available at: https://github.com/xueLab/mirMeta PMID:28002428
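
    A hedged sketch of the meta-prediction idea, non-linearly transforming base-predictor scores before feeding them to a small neural network; the logit transform and all data are illustrative, not mirMeta's exact design:

        import numpy as np
        from sklearn.neural_network import MLPClassifier

        # Synthetic stand-ins: scores of five individual miRNA predictors for
        # 1000 candidate hairpins (real scores would come from the base tools).
        rng = np.random.default_rng(5)
        scores = rng.uniform(size=(1000, 5))
        labels = (scores.mean(axis=1) + rng.normal(scale=0.1, size=1000)) > 0.5

        # Non-linear preprocessing of each predictor's output (illustrative logit).
        eps = 1e-6
        features = np.log((scores + eps) / (1 - scores + eps))

        meta = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
        meta.fit(features[:800], labels[:800])
        print("held-out accuracy:", meta.score(features[800:], labels[800:]))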

  5. Evaluation of a new satellite-based precipitation dataset for climate studies in the Xiang River basin, Southern China

    NASA Astrophysics Data System (ADS)

    Zhu, Q.; Xu, Y. P.; Hsu, K. L.

    2017-12-01

    A new satellite-based precipitation dataset, Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks-Climate Data Record (PERSIANN-CDR), with a long-term time series dating back to 1983, can be a valuable dataset for climate studies. This study investigates the feasibility of using PERSIANN-CDR as a reference dataset for climate studies. Sixteen CMIP5 models are evaluated over the Xiang River basin, southern China, by comparing their performance in precipitation projection and streamflow simulation, particularly for extreme precipitation and streamflow events. The results show that PERSIANN-CDR is a valuable dataset for climate studies, even for extreme precipitation events. The precipitation estimates and their extreme events from the CMIP5 models are improved significantly relative to rain gauge observations after bias correction against the PERSIANN-CDR precipitation estimates. Of the streamflows simulated with raw and bias-corrected precipitation estimates from the 16 CMIP5 models, 10 out of 16 are improved after bias correction. The impact of bias correction on extreme streamflow events is less stable: only eight of the 16 models can clearly be claimed to improve after bias correction. Concerning the performance of the raw CMIP5 models on precipitation, IPSL-CM5A-MR outperforms the other CMIP5 models, while MRI-CGCM3 excels on extreme events with its better performance on six extreme precipitation metrics. Case studies also show that raw CCSM4, CESM1-CAM5 and MRI-CGCM3 outperform the other models on streamflow simulation, while MIROC-ESM-CHEM, MIROC-ESM and IPSL-CM5A-MR behave better than the other models after bias correction.
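
    A minimal sketch of empirical quantile-mapping bias correction, the generic technique this kind of study relies on; whether the authors used exactly this variant is not stated, and all data are synthetic:

        import numpy as np

        def quantile_map(model, reference):
            """Map each model value onto the reference distribution via CDFs.

            `reference` would play the role of PERSIANN-CDR, `model` a CMIP5
            daily precipitation series for the same location and period.
            """
            model = np.asarray(model, float)
            ranks = np.searchsorted(np.sort(model), model, side="right") / len(model)
            return np.quantile(reference, np.clip(ranks, 0.0, 1.0))

        rng = np.random.default_rng(6)
        reference = rng.gamma(2.0, 5.0, size=5000)   # "observed" daily precip
        model = rng.gamma(2.0, 3.5, size=5000)       # biased model precip
        corrected = quantile_map(model, reference)
        print(model.mean(), corrected.mean(), reference.mean())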

  6. Rainfall extremes from TRMM data and the Metastatistical Extreme Value Distribution

    NASA Astrophysics Data System (ADS)

    Zorzetto, Enrico; Marani, Marco

    2017-04-01

    A reliable quantification of the probability of occurrence of weather extremes is essential for designing resilient water infrastructures and hazard mitigation measures. However, it is increasingly clear that the presence of inter-annual climatic fluctuations determines a substantial long-term variability in the frequency of occurrence of extreme events. This circumstance questions the foundation of traditional extreme value theory, hinged on stationary Poisson processes or on asymptotic assumptions to derive the Generalized Extreme Value (GEV) distribution. We illustrate here, with application to daily rainfall, a new approach to extreme value analysis, the Metastatistical Extreme Value Distribution (MEVD). The MEVD relaxes the above assumptions and is based on the whole distribution of daily rainfall events, thus allowing optimal use of all available observations. Using a global dataset of rain gauge observations, we show that the MEVD significantly outperforms the Generalized Extreme Value distribution, particularly for long average recurrence intervals and when small samples are available. The latter property suggests that the MEVD is particularly suited for applications to satellite rainfall estimates, which only cover two decades, making extreme value estimation extremely challenging. Here we apply the MEVD to the TRMM TMPA 3B42 product, an 18-year dataset of remotely sensed daily rainfall providing quasi-global coverage. Our analyses yield a global-scale mapping of daily rainfall extremes and of their distributional tail properties, bridging the existing large gaps in ground-based networks. Finally, we illustrate how our global-scale analysis can provide insight into how properties of local rainfall regimes affect tail estimation uncertainty under the GEV and MEVD approaches. We find a dependence of the estimation uncertainty, for both the GEV- and MEV-based approaches, on the average annual number and on the inter-annual variability of rainy days. In particular, estimation uncertainty decreases 1) as the mean annual number of wet days increases, and 2) as the variability in the number of rainy days, expressed by its coefficient of variation, decreases. We tentatively explain this behavior in terms of the assumptions underlying the two approaches.
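
    A hedged sketch of the MEVD as commonly formulated: fit a Weibull to each year's wet-day totals, raise each yearly CDF to that year's number of wet days, and average across years; the synthetic record and parameter choices are illustrative:

        import numpy as np
        from scipy.optimize import brentq
        from scipy.stats import weibull_min

        def mev_cdf(x, shapes, scales, n_wet):
            """MEVD CDF of the annual maximum: mean over years of the yearly
            Weibull CDF raised to the number of wet days in that year."""
            yearly = [weibull_min.cdf(x, c, scale=s) ** n
                      for c, s, n in zip(shapes, scales, n_wet)]
            return np.mean(yearly)

        # Fit a Weibull to each year's wet-day totals (synthetic 20-year record).
        rng = np.random.default_rng(7)
        shapes, scales, n_wet = [], [], []
        for _ in range(20):
            wet = rng.weibull(0.8, size=rng.integers(80, 140)) * 12.0  # mm/day
            c, _, s = weibull_min.fit(wet, floc=0)
            shapes.append(c); scales.append(s); n_wet.append(len(wet))

        # T-year quantile: solve F(x) = 1 - 1/T by root-finding.
        T = 50
        q = brentq(lambda x: mev_cdf(x, shapes, scales, n_wet) - (1 - 1 / T),
                   1.0, 1000.0)
        print(f"{T}-yr daily rainfall (MEVD): {q:.1f} mm")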

  7. Temporal Clustering of Regional-Scale Extreme Precipitation Events in Southern Switzerland

    NASA Astrophysics Data System (ADS)

    Barton, Yannick; Giannakaki, Paraskevi; Von Waldow, Harald; Chevalier, Clément; Pfahl, Stephan; Martius, Olivia

    2017-04-01

    Temporal clustering of extreme precipitation events on subseasonal time scales is a form of compound extremes and is of crucial importance for the formation of large-scale flood events. Here, the temporal clustering of regional-scale extreme precipitation events in southern Switzerland is studied. These precipitation events are relevant for the flooding of lakes in southern Switzerland and northern Italy. This research determines whether temporal clustering is present and then identifies the dynamics responsible for it. An observation-based gridded precipitation dataset of Swiss daily rainfall sums and ECMWF reanalysis datasets are used. To analyse clustering in the precipitation time series, a modified version of Ripley's K function is used. It measures the average number of extreme events within a given time window, characterizing temporal clustering on subseasonal time scales, and is used to determine the statistical significance of the clustering. Significant clustering of regional-scale precipitation extremes is found on subseasonal time scales during the fall season. Four high-impact clustering episodes are then selected and the dynamics responsible for the clustering are examined. During the four clustering episodes, all heavy precipitation events were associated with an upper-level breaking Rossby wave over western Europe, and in most cases strong diabatic processes upstream over the Atlantic played a role in the amplification of these breaking waves. Atmospheric blocking downstream over eastern Europe supported this wave breaking during two of the clustering episodes. During one of the clustering periods, several extratropical transitions of tropical cyclones in the Atlantic contributed to the formation of high-amplitude ridges over the Atlantic basin and downstream wave breaking. During another episode, blocking over Alaska assisted the phase locking of the Rossby waves downstream over the Atlantic.
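
    A minimal one-dimensional Ripley's K for event times, without the edge corrections a modified version would include; the clustered event series is synthetic:

        import numpy as np

        def ripley_k_1d(times, t_window, total_length):
            """1D Ripley's K for event times (no edge correction).

            Under complete temporal randomness E[K(t)] = 2t, so K(t) > 2t
            indicates temporal clustering at scale t.
            """
            times = np.sort(np.asarray(times, float))
            n = len(times)
            lam = n / total_length
            pairs = np.abs(times[:, None] - times[None, :])
            counts = (pairs <= t_window).sum() - n     # exclude self-pairs
            return counts / (lam * n)

        # Synthetic clustered event times within a 90-day season.
        rng = np.random.default_rng(8)
        centers = rng.uniform(0, 90, size=4)
        events = np.concatenate([c + rng.normal(0, 1.5, 5) for c in centers])
        for t in (2, 5, 10):
            print(f"K({t}) = {ripley_k_1d(events, t, 90):.1f}  (CSR: {2 * t})")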

  8. Abundance Ratios in a Large Sample of EMPs with VLT+UVES

    NASA Astrophysics Data System (ADS)

    Hill, Vanessa; Cayrel, Roger; Spite, Monique; Bonifacio, Piercarlo; Depagne, Eric; François, Patrick; Beers, Timothy C.; Andersen, Johannes; Barbuy, Beatriz; Nordström, Birgitta

    Constraints on Early Galactic Enrichment from a Large Sample of Extremely Metal-Poor Stars. I will present the overall results from a large effort conducted at ESO-VLT+UVES to measure abundances in a sample of extremely metal-poor stars (EMPS) from high-resolution and high signal-to-noise spectra. More than 70 EMPS with [Fe/H]<-2.7 were observed, equally distributed between turnoff and giant stars, and very precise abundance ratios could be derived thanks to the high quality of the data. Among the results, those of specific interest are lithium measurements in unevolved EMPS, the much-debated abundance of oxygen in the early Galaxy (we present [OI] line measurements down to [O/Fe]=-3.5), and the trends of alpha elements, iron-group elements and zinc. The scatter around these trends will also be discussed, taking advantage of the small observational error bars of this dataset. The implications for early Galactic enrichment will be reviewed, while more specific topics covered by this large effort (and large team) will be addressed in dedicated posters.

  9. Large scale variability, long-term trends and extreme events in total ozone over the northern mid-latitudes based on satellite time series

    NASA Astrophysics Data System (ADS)

    Rieder, H. E.; Staehelin, J.; Maeder, J. A.; Ribatet, M.; Davison, A. C.

    2009-04-01

    Various generations of satellites (e.g. TOMS, GOME, OMI) have made spatial datasets of column ozone available to the scientific community. This study has a special focus on column ozone over the northern mid-latitudes. Tools from geostatistics and extreme value theory are applied to analyze variability, long-term trends and frequency distributions of extreme events in total ozone. In a recent case study (Rieder et al., 2009), new tools from extreme value theory (Coles, 2001; Ribatet, 2007) were applied to the world's longest total ozone record, from Arosa, Switzerland (e.g. Staehelin 1998a,b), in order to describe extreme events in low and high total ozone. In the current study this analysis is extended to satellite datasets for the northern mid-latitudes. Special emphasis is further given to patterns and spatial correlations and to the influence of changes in atmospheric dynamics (e.g. tropospheric and lower-stratospheric pressure systems) on column ozone. References: Coles, S.: An Introduction to Statistical Modeling of Extreme Values, Springer Series in Statistics, ISBN:1852334592, Springer, Berlin, 2001. Ribatet, M.: POT: Modelling peaks over a threshold, R News, 7, 34-36, 2007. Rieder, H.E., Staehelin, J., Maeder, J.A., Ribatet, M., Stübi, R., Weihs, P., Holawe, F., Peter, T., and Davison, A.C.: From ozone mini holes and mini highs towards extreme value theory: New insights from extreme events and non stationarity, submitted to J. Geophys. Res., 2009. Staehelin, J., Kegel, R., and Harris, N. R.: Trend analysis of the homogenized total ozone series of Arosa (Switzerland), 1929-1996, J. Geophys. Res., 103(D7), 8389-8400, doi:10.1029/97JD03650, 1998a. Staehelin, J., Renaud, A., Bader, J., McPeters, R., Viatte, P., Hoegger, B., Bugnion, V., Giroud, M., and Schill, H.: Total ozone series at Arosa (Switzerland): Homogenization and data comparison, J. Geophys. Res., 103(D5), 5827-5842, doi:10.1029/97JD02402, 1998b.
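
    A minimal peaks-over-threshold sketch with a Generalized Pareto fit, one of the standard extreme-value tools referenced above; the series, threshold choice and return-level definition are illustrative:

        import numpy as np
        from scipy.stats import genpareto

        # Synthetic daily total-ozone values; real input would be the satellite
        # column-ozone series discussed above.
        rng = np.random.default_rng(9)
        ozone = rng.normal(330, 25, size=15000)          # Dobson units

        # Peaks-over-threshold: model exceedances of a high quantile with a GPD.
        u = np.percentile(ozone, 98)
        exceedances = ozone[ozone > u] - u
        shape, loc, scale = genpareto.fit(exceedances, floc=0)

        # Return level for a given return period (in observations).
        rate = len(exceedances) / len(ozone)             # exceedance rate
        ret_period = 10000
        level = u + genpareto.ppf(1 - 1 / (ret_period * rate), shape, scale=scale)
        print(f"{ret_period}-observation return level: {level:.0f} DU")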

  10. Technical note: Space-time analysis of rainfall extremes in Italy: clues from a reconciled dataset

    NASA Astrophysics Data System (ADS)

    Libertino, Andrea; Ganora, Daniele; Claps, Pierluigi

    2018-05-01

    Like other Mediterranean areas, Italy is prone to the development of events with significant rainfall intensity, lasting for several hours. The main triggering mechanisms of these events are quite well known, but the aim of developing rainstorm hazard maps compatible with their actual probability of occurrence is still far from being reached. A systematic frequency analysis of these occasional highly intense events would require a complete countrywide dataset of sub-daily rainfall records, but this kind of information has so far been lacking for the Italian territory. In this work several sources of data are gathered to assemble the first comprehensive and updated dataset of short-duration extreme rainfall in Italy. The resulting dataset, referred to as the Italian Rainfall Extreme Dataset (I-RED), includes the annual maximum rainfall recorded over 1 to 24 consecutive hours from more than 4500 stations across the country, spanning the period between 1916 and 2014. A detailed description of the spatial and temporal coverage of the I-RED is presented, together with an exploratory statistical analysis aimed at providing preliminary information on the climatology of extreme rainfall at the national scale. Due to some legal restrictions, the database can be provided only under certain conditions. Taking into account the potential emerging from the analysis, a description of ongoing and planned future work on the database is provided.

  11. Quantile-based bias correction and uncertainty quantification of extreme event attribution statements

    DOE PAGES

    Jeon, Soyoung; Paciorek, Christopher J.; Wehner, Michael F.

    2016-02-16

    Extreme event attribution characterizes how anthropogenic climate change may have influenced the probability and magnitude of selected individual extreme weather and climate events. Attribution statements often involve quantification of the fraction of attributable risk (FAR) or the risk ratio (RR) and associated confidence intervals. Many such analyses use climate model output to characterize extreme event behavior with and without anthropogenic influence. However, such climate models may have biases in their representation of extreme events. To account for discrepancies in the probabilities of extreme events between observational datasets and model datasets, we demonstrate an appropriate rescaling of the model output based on the quantiles of the datasets to estimate an adjusted risk ratio. Our methodology accounts for various components of uncertainty in estimation of the risk ratio. In particular, we present an approach to construct a one-sided confidence interval on the lower bound of the risk ratio when the estimated risk ratio is infinity. We demonstrate the methodology using the summer 2011 central US heatwave and output from the Community Earth System Model. In this example, we find that the lower bound of the risk ratio is relatively insensitive to the magnitude and probability of the actual event.
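
    A minimal sketch of a risk-ratio estimate with a one-sided bootstrap lower bound, illustrating the general idea rather than the paper's quantile-rescaling estimator; all data and names are synthetic:

        import numpy as np

        def risk_ratio_lower_bound(exc_actual, exc_natural, n_boot=10000,
                                   alpha=0.05, seed=0):
            """RR = p1/p0 with a one-sided bootstrap lower confidence bound.

            exc_*: boolean arrays marking threshold exceedances in the 'actual'
            (with anthropogenic forcing) and 'natural' ensembles.
            """
            rng = np.random.default_rng(seed)
            rrs = []
            for _ in range(n_boot):
                p1 = rng.choice(exc_actual, len(exc_actual)).mean()
                p0 = rng.choice(exc_natural, len(exc_natural)).mean()
                rrs.append(np.inf if p0 == 0 else p1 / p0)
            return np.quantile(rrs, alpha)   # one-sided lower bound on RR

        rng = np.random.default_rng(1)
        actual = rng.random(400) < 0.08      # 8% exceedance with forcing
        natural = rng.random(400) < 0.02     # 2% without
        print("RR lower bound:", risk_ratio_lower_bound(actual, natural))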

  12. Hydrologic extremes - an intercomparison of multiple gridded statistical downscaling methods

    NASA Astrophysics Data System (ADS)

    Werner, A. T.; Cannon, A. J.

    2015-06-01

    Gridded statistical downscaling methods are the main means of preparing climate model data to drive distributed hydrological models. Past work on the validation of climate downscaling methods has focused on temperature and precipitation, with less attention paid to the ultimate outputs from hydrological models. Also, as attention shifts towards projections of extreme events, downscaling comparisons now commonly assess methods in terms of climate extremes, but hydrologic extremes are less well explored. Here, we test the ability of gridded downscaling methods to replicate historical properties of climate and hydrologic extremes, as measured in terms of temporal sequencing (i.e., correlation tests) and distributional properties (i.e., tests for equality of probability distributions). Outputs from seven downscaling methods - bias correction constructed analogues (BCCA), double BCCA (DBCCA), BCCA with quantile mapping reordering (BCCAQ), bias correction spatial disaggregation (BCSD), BCSD using minimum/maximum temperature (BCSDX), the climate imprint delta method (CI), and bias-corrected CI (BCCI) - are used to drive the Variable Infiltration Capacity (VIC) model over the snow-dominated Peace River basin, British Columbia. Outputs are tested using split-sample validation on 26 climate extremes indices (ClimDEX) and two hydrologic extremes indices (3-day and 7-day peak flow). To characterize observational uncertainty, four atmospheric reanalyses are used as climate model surrogates and two gridded observational datasets are used as downscaling target data. The skill of the downscaling methods generally depended on the reanalysis and gridded observational dataset used. However, CI failed to reproduce the distribution, and BCSD and BCSDX the timing, of winter 7-day low-flow events, regardless of reanalysis or observational dataset. Overall, DBCCA passed the greatest number of tests for the ClimDEX indices, while BCCAQ, which is designed to more accurately resolve event-scale spatial gradients, passed the greatest number of tests for hydrologic extremes. Non-stationarity in the observational/reanalysis datasets complicated the evaluation of downscaling performance. Comparing temporal homogeneity and trends in climate indices and hydrological model outputs calculated from downscaled reanalyses and gridded observations was useful for diagnosing the reliability of the various historical datasets. We recommend that such analyses be conducted before such data are used to construct future hydro-climatic change scenarios.

  13. Automatic Beam Path Analysis of Laser Wakefield Particle Acceleration Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rubel, Oliver; Geddes, Cameron G.R.; Cormier-Michel, Estelle

    2009-10-19

    Numerical simulations of laser wakefield particle accelerators play a key role in the understanding of the complex acceleration process and in the design of expensive experimental facilities. As the size and complexity of simulation output grows, an increasingly acute challenge is the practical need for computational techniques that aid in scientific knowledge discovery. To that end, we present a set of data-understanding algorithms that work in concert in a pipeline fashion to automatically locate and analyze high-energy particle bunches undergoing acceleration in very large simulation datasets. These techniques work cooperatively by first identifying features of interest in individual timesteps, then integrating features across timesteps, and, based on the information derived, performing analysis of temporally dynamic features. This combination of techniques supports accurate detection of particle beams, enabling a deeper level of scientific understanding of physical phenomena than has been possible before. By combining efficient data analysis algorithms and state-of-the-art data management we enable high-performance analysis of extremely large particle datasets in 3D. We demonstrate the usefulness of our methods for a variety of 2D and 3D datasets and discuss the performance of our analysis pipeline.

  14. shinyheatmap: Ultra fast low memory heatmap web interface for big data genomics.

    PubMed

    Khomtchouk, Bohdan B; Hennessy, James R; Wahlestedt, Claes

    2017-01-01

    Transcriptomics, metabolomics, metagenomics, and various other next-generation sequencing (-omics) fields are known for their production of large datasets, especially across single-cell sequencing studies. Visualizing such big data has posed technical challenges in biology, both in terms of available computational resources and programming acumen. Since heatmaps are used to depict high-dimensional numerical data as a colored grid of cells, efficiency and speed have often proven to be critical considerations in the process of successfully converting data into graphics. For example, rendering interactive heatmaps from large input datasets (e.g., 100k+ rows) has been computationally infeasible on both desktop computers and web browsers. In addition to memory requirements, programming skills and knowledge have frequently been barriers to entry for creating highly customizable heatmaps. We propose shinyheatmap: an advanced user-friendly heatmap software suite capable of efficiently creating highly customizable static and interactive biological heatmaps in a web browser. shinyheatmap is a low-memory-footprint program, making it particularly well suited for the interactive visualization of extremely large datasets that cannot typically be computed in-memory due to size restrictions. Also, shinyheatmap features a built-in high-performance web plug-in, fastheatmap, for rapidly plotting interactive heatmaps of datasets as large as 10^5-10^7 rows within seconds, effectively shattering previous performance benchmarks of heatmap rendering speed. shinyheatmap is hosted online as a freely available web server with an intuitive graphical user interface: http://shinyheatmap.com. The methods are implemented in R, and are available as part of the shinyheatmap project at: https://github.com/Bohdan-Khomtchouk/shinyheatmap. Users can access fastheatmap directly from within the shinyheatmap web interface, and all source code has been made publicly available on Github: https://github.com/Bohdan-Khomtchouk/fastheatmap.

  15. How much do different global GPP products agree in distribution and magnitude of GPP extremes?

    NASA Astrophysics Data System (ADS)

    Kim, S.; Ryu, Y.; Jiang, C.

    2016-12-01

    To evaluate the uncertainty of global Gross Primary Productivity (GPP) extremes, we compare three global GPP datasets derived from different data processing methods (MPI-BGC: machine learning; MODIS GPP (MOD17): semi-empirical; Breathing Earth System Simulator (BESS): process-based). We preprocess the datasets following the method of Zscheischler et al. (2012) to detect GPP extremes which occur in less than 1% of the number of whole pixels, and to identify 3D-connected spatiotemporal GPP extremes. We first analyze global patterns and the magnitude of GPP extremes with MPI-BGC, MOD17 and BESS over 2001-2011. For a consistent analysis of the three products, spatial and temporal resolution were set at 50 km and a monthly scale, respectively. Our results indicate that the global patterns of GPP extremes derived from MPI-BGC and BESS agree with each other, showing hotspots in Northeastern Brazil and Eastern Texas. However, the extreme events detected from MOD17 are concentrated in tropical forests (e.g. Southeast Asia and South America). The amount of GPP reduction caused by climate extremes differs considerably across the products. For example, the Russian heatwave in 2010 led to 100 Tg C of uncertainty (198.7 Tg C in MPI-BGC, 305.6 Tg C in MOD17, and 237.8 Tg C in BESS). Moreover, the duration of the extreme event differs among the three GPP datasets for the Russian heatwave (MPI-BGC: May-Sep, MOD17: Jun-Aug, and BESS: May-Aug). To test whether Sun-induced fluorescence (SiF), a proxy of GPP, can capture GPP extremes, we investigate the global distribution of GPP extreme events in BESS, MOD17 and GOME-2 SiF between 2008 and 2014, when SiF data are available. We find that extreme GPP events in GOME-2 SiF and MOD17 appear in tropical forests, whereas those in BESS emerge in Northeastern Brazil and Eastern Texas. The GPP extremes caused by the severe 2011 US drought were detected by BESS and MODIS, but not by SiF. Our findings highlight that different GPP datasets can yield differing durations, intensities and hotspot distributions of GPP extremes, and this study could contribute to quantifying uncertainties in GPP extremes.

  16. ClimateNet: A Machine Learning dataset for Climate Science Research

    NASA Astrophysics Data System (ADS)

    Prabhat, M.; Biard, J.; Ganguly, S.; Ames, S.; Kashinath, K.; Kim, S. K.; Kahou, S.; Maharaj, T.; Beckham, C.; O'Brien, T. A.; Wehner, M. F.; Williams, D. N.; Kunkel, K.; Collins, W. D.

    2017-12-01

    Deep Learning techniques have revolutionized commercial applications in computer vision, speech recognition and control systems. The key to all of these developments was the creation of a curated, labeled dataset, ImageNet, which enabled multiple research groups around the world to develop methods, benchmark performance and compete with each other. The success of Deep Learning can be largely attributed to the broad availability of this dataset. Our empirical investigations have revealed that Deep Learning is similarly poised to benefit the task of pattern detection in climate science. Unfortunately, labeled datasets, a key prerequisite for training, are hard to find. Individual research groups are typically interested in specialized weather patterns, making it hard to unify and share datasets across groups and institutions. In this work, we propose ClimateNet: a labeled dataset that provides labeled instances of extreme weather patterns, as well as associated raw fields in model and observational output. We develop a schema in NetCDF to enumerate weather pattern classes/types and to store bounding boxes and pixel masks. We are also working on a TensorFlow implementation to natively import such NetCDF datasets, and are providing a reference convolutional architecture for binary classification tasks. Our hope is that researchers in climate science, as well as ML/DL, will be able to use (and extend) ClimateNet to make rapid progress in the application of Deep Learning to climate science research.

  17. HAlign-II: efficient ultra-large multiple sequence alignment and phylogenetic tree reconstruction with distributed and parallel computing.

    PubMed

    Wan, Shixiang; Zou, Quan

    2017-01-01

    Multiple sequence alignment (MSA) plays a key role in biological sequence analyses, especially in phylogenetic tree construction. The extreme increase in next-generation sequencing output has resulted in a shortage of efficient alignment approaches for ultra-large sets of biological sequences of different types. Distributed and parallel computing represents a crucial technique for accelerating ultra-large (e.g., files larger than 1 GB) sequence analyses. Based on HAlign and the Spark distributed computing system, we implement a highly cost-efficient and time-efficient HAlign-II tool to address ultra-large multiple biological sequence alignment and phylogenetic tree construction. Experiments on large-scale DNA and protein datasets (files larger than 1 GB) showed that HAlign-II saves both time and space and outperforms current software tools. HAlign-II can efficiently carry out MSA and construct phylogenetic trees with ultra-large numbers of biological sequences, shows extremely high memory efficiency, and scales well with increases in computing resources. HAlign-II provides a user-friendly web server based on our distributed computing infrastructure. HAlign-II, with open-source code and datasets, is available at http://lab.malab.cn/soft/halign.

  18. Improved Statistical Method For Hydrographic Climatic Records Quality Control

    NASA Astrophysics Data System (ADS)

    Gourrion, J.; Szekely, T.

    2016-02-01

    Climate research benefits from the continuous development of global in-situ hydrographic networks over the last decades. Apart from the increasing volume of observations available on a large range of temporal and spatial scales, a critical aspect concerns the ability to constantly improve the quality of the datasets. In the context of the Coriolis Dataset for ReAnalysis (CORA) version 4.2, a new quality-control method based on a local comparison to the historical extreme values ever observed is developed, implemented and validated. Temperature, salinity and potential density validity intervals are directly estimated from the minimum and maximum values of an historical reference dataset, rather than from traditional mean and standard deviation estimates. Such an approach avoids strong statistical assumptions on the data distributions, such as unimodality, absence of skewness and spatially homogeneous kurtosis. As a new feature, it also allows the two main objectives of a quality-control strategy to be addressed simultaneously, i.e. maximizing the number of good detections while minimizing the number of false alarms. The reference dataset is presently built from the fusion of 1) all Argo profiles up to early 2014, 2) three historical CTD datasets and 3) the sea-mammal CTD profiles from the MEOP database. All datasets are extensively and manually quality controlled. In this communication, the latest method validation results are also presented. The method has been implemented in the latest version of the CORA dataset and will benefit the next version of the Copernicus CMEMS dataset.
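
    A toy version of the min/max validity-interval idea described above; the reference values, margins and flags are placeholders, not CORA's actual implementation:

        import numpy as np

        def validity_flags(obs, ref, margin=0.0):
            """Flag observations outside the historical min/max envelope.

            obs: candidate profile values; ref: historical reference values
            for the same region/depth. Validity intervals come from extremes
            ever observed, not from mean/standard-deviation assumptions.
            """
            lo, hi = ref.min() - margin, ref.max() + margin
            return (obs < lo) | (obs > hi)

        rng = np.random.default_rng(2)
        reference_temps = rng.normal(10.0, 2.0, size=5000)   # historical values
        new_profile = np.array([9.5, 11.2, 25.0, -3.0])      # two outliers
        print(validity_flags(new_profile, reference_temps))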

  19. Capturing spatial and temporal patterns of widespread, extreme flooding across Europe

    NASA Astrophysics Data System (ADS)

    Busby, Kathryn; Raven, Emma; Liu, Ye

    2013-04-01

    Statistical characterisation of physical hazards is an integral part of probabilistic catastrophe models used by the reinsurance industry to estimate losses from large-scale events. Extreme flood events are not restricted by country boundaries, which poses an issue for reinsurance companies, as their exposures often extend beyond them. We discuss challenges and solutions that allow us to appropriately capture the spatial and temporal dependence of extreme hydrological events on a continental scale, which in turn enables us to generate an industry-standard stochastic event set for estimating financial losses from widespread flooding. In presenting our event-set methodology, we focus on explaining how extreme value theory (EVT) and dependence modelling are used to account for short, inconsistent hydrological data from different countries, and how to make appropriate statistical decisions that best characterise the nature of flooding across Europe. The consistency of input data is of vital importance when identifying historical flood patterns. Collating data from numerous sources inherently causes inconsistencies, and we demonstrate our robust approach to assessing the data and refining it to compile a single consistent dataset. This dataset is then extrapolated using a parameterised EVT distribution to estimate extremes. Our method then captures the dependence of flood events across countries using an advanced multivariate extreme value model. Throughout, important statistical decisions are explored, including: (1) the choice of distribution; (2) the threshold to apply for extracting extreme data points; (3) a regional analysis; (4) the definition of a flood event, which is often linked to the reinsurance industry's hours clause; and (5) the handling of missing values. Finally, having modelled the historical patterns of flooding across Europe, we sample from this model to generate our stochastic event set comprising thousands of events over thousands of years. We then briefly illustrate how this is applied within a probabilistic model to estimate catastrophic loss curves used by the reinsurance industry.

  1. A global gridded dataset of daily precipitation going back to 1950, ideal for analysing precipitation extremes

    NASA Astrophysics Data System (ADS)

    Contractor, S.; Donat, M.; Alexander, L. V.

    2017-12-01

    Reliable observations of precipitation are necessary to determine past changes in precipitation and to validate models, allowing for reliable future projections. Existing gauge-based gridded datasets of daily precipitation and satellite-based observations contain artefacts and have a short length of record, making them unsuitable for analysing precipitation extremes. The largest limiting factor for gauge-based datasets is the availability of a dense and reliable station network. Currently, there are two major archives of global in situ daily rainfall data: the Global Historical Climatology Network (GHCN-Daily), hosted by the National Oceanic and Atmospheric Administration (NOAA), and the archive of the Global Precipitation Climatology Centre (GPCC), part of the Deutsche Wetterdienst (DWD). We combine the two data archives and use automated quality control techniques to create a reliable long-term network of raw station data, which we then interpolate using block kriging to create a global gridded dataset of daily precipitation going back to 1950. We compare our interpolated dataset with existing global gridded data of daily precipitation, NOAA Climate Prediction Center (CPC) Global V1.0 and GPCC Full Data Daily Version 1.0, as well as various regional datasets. We find that our raw station density is much higher than that of other datasets. To avoid artefacts due to station network variability, we provide multiple versions of our dataset based on various completeness criteria, and we provide the standard deviation, kriging error and number of stations for each grid cell and timestep to encourage responsible use of our dataset. Despite our efforts to increase the raw data density, the in situ station network remains sparse in India after the 1960s and in Africa throughout the timespan of the dataset. Our dataset will allow for more reliable global analyses of rainfall, including its extremes, and pave the way for better global precipitation observations with lower and more transparent uncertainties.

  2. Evaluating the ClimEx Single Model Large Ensemble in Comparison with EURO-CORDEX Results of Seasonal Means and Extreme Precipitation Indicators

    NASA Astrophysics Data System (ADS)

    von Trentini, F.; Schmid, F. J.; Braun, M.; Brisette, F.; Frigon, A.; Leduc, M.; Martel, J. L.; Willkofer, F.; Wood, R. R.; Ludwig, R.

    2017-12-01

    Meteorological extreme events seem to be becoming more frequent in the present and future, and the separation of natural climate variability from a clear climate change effect on these extreme events is gaining more and more interest. Since there is only one realisation of the historical climate, observational data cannot provide the very long time series needed for a robust statistical analysis of natural variability. A new single model large ensemble (SMLE), developed for the ClimEx project (Climate change and hydrological extreme events - risks and perspectives for water management in Bavaria and Québec), is designed to overcome this lack of data by downscaling 50 members of the CanESM2 (RCP 8.5) with the Canadian CRCM5 regional model (using the EURO-CORDEX grid specifications) for the period 1950-2099 each, resulting in 7500 years of simulated climate. This allows for a better probabilistic analysis of rare and extreme events than any preceding dataset. Besides seasonal sums, several extreme indicators such as R95pTOT, RX5day and others are calculated for the ClimEx ensemble and several EURO-CORDEX runs (see the sketch below). This enables us to investigate the interaction between natural variability (as it appears in the CanESM2-CRCM5 members) and the climate change signal of those members for past, present and future conditions. Adding the EURO-CORDEX results, we can also assess the role of internal model variability (or natural variability) in climate change simulations. A first comparison shows similar magnitudes of variability of climate change signals between the ClimEx large ensemble and the CORDEX runs for some indicators, while for most indicators the spread of the SMLE is smaller than the spread of the different CORDEX models.
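
    The two indicators named above are simple functions of a daily precipitation series. A small sketch, assuming one year of synthetic daily data and the common ETCCDI-style definitions:

        import numpy as np

        def rx5day(pr):
            """Maximum consecutive 5-day precipitation total (mm)."""
            return np.convolve(np.asarray(pr, float), np.ones(5), mode="valid").max()

        def r95ptot(pr, base):
            """Total precipitation on very wet days: sum of daily amounts above
            the 95th percentile of wet days (>= 1 mm) in a base period."""
            pr, base = np.asarray(pr, float), np.asarray(base, float)
            p95 = np.percentile(base[base >= 1.0], 95)
            return pr[pr > p95].sum()

        rng = np.random.default_rng(1)
        daily = rng.gamma(0.6, 8.0, size=365)  # synthetic daily precipitation (mm)
        print(rx5day(daily), r95ptot(daily, base=daily))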

  3. A practical tool for maximal information coefficient analysis.

    PubMed

    Albanese, Davide; Riccadonna, Samantha; Donati, Claudio; Franceschi, Pietro

    2018-04-01

    The ability to find complex associations in large omics datasets, assess their significance, and prioritize them according to their strength can be of great help in the data exploration phase. Mutual information-based measures of association are particularly promising, especially after the recent introduction of the TICe and MICe estimators, which combine computational efficiency with superior bias/variance properties. An open-source software implementation of these two measures, providing a complete procedure to test their significance, would be extremely useful. Here, we present MICtools, a comprehensive and effective pipeline that combines TICe and MICe into a multistep procedure allowing the identification of relationships of various degrees of complexity. MICtools estimates association strength and assesses statistical significance using a permutation-based strategy. The performance of the proposed approach is assessed by an extensive investigation on synthetic datasets, and an example of a potential application to a metagenomic dataset is also illustrated. We show that MICtools, combining TICe and MICe, is able to highlight associations that would not be captured by conventional strategies.
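
    The permutation strategy generalizes to any association statistic: shuffle one variable to break the association, recompute the statistic, and compare against the observed value. A sketch with a plug-in mutual information estimate standing in for TICe/MICe (names and estimator are illustrative, not the MICtools code):

        import numpy as np

        def binned_mi(a, b, bins=10):
            """Plug-in mutual information estimate from a 2D histogram (nats)."""
            pxy, _, _ = np.histogram2d(a, b, bins=bins)
            pxy = pxy / pxy.sum()
            px, py = pxy.sum(axis=1), pxy.sum(axis=0)
            nz = pxy > 0
            return np.sum(pxy[nz] * np.log(pxy[nz] / (px[:, None] * py[None, :])[nz]))

        def permutation_pvalue(x, y, stat, n_perm=1000, seed=0):
            """Permutation-based p-value for an association statistic."""
            rng = np.random.default_rng(seed)
            observed = stat(x, y)
            null = np.array([stat(x, rng.permutation(y)) for _ in range(n_perm)])
            return (1 + np.sum(null >= observed)) / (n_perm + 1)

        rng = np.random.default_rng(42)
        x = rng.normal(size=500)
        y = x**2 + rng.normal(scale=0.5, size=500)  # nonlinear association
        print(permutation_pvalue(x, y, binned_mi))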

  4. Thermodynamic Data Rescue and Informatics for Deep Carbon Science

    NASA Astrophysics Data System (ADS)

    Zhong, H.; Ma, X.; Prabhu, A.; Eleish, A.; Pan, F.; Parsons, M. A.; Ghiorso, M. S.; West, P.; Zednik, S.; Erickson, J. S.; Chen, Y.; Wang, H.; Fox, P. A.

    2017-12-01

    A large number of legacy datasets are contained in geoscience literature published between 1930 and 1980 and are not available in digitized form outside the publication text. Extracting, organizing, and reusing these "dark" datasets is highly valuable for many within the Earth and planetary science community. As part of the Deep Carbon Observatory (DCO) data legacy missions, the DCO Data Science Team and the Extreme Physics and Chemistry community identified thermodynamic datasets related to carbon, or more specifically datasets on the enthalpy and entropy of chemicals, for a proof-of-principle analysis. The data science team developed a semi-automatic workflow, which includes identifying relevant publications, extracting the contained datasets using OCR methods, collaborative reviewing, and registering the datasets via the DCO Data Portal, whose 'Linked Data' feature connects rescued datasets beyond their individual data sources to research domains, DCO Communities, and more, making data discovery and retrieval more effective. To date, the team has successfully rescued, deposited and registered additional datasets from publications with thermodynamic sources. These datasets contain three main types of data: (1) heat content or enthalpy data determined for a given compound as a function of temperature using high-temperature calorimetry, (2) heat content or enthalpy data determined for a given compound as a function of temperature using adiabatic calorimetry, and (3) direct determinations of the heat capacity of a compound as a function of temperature using differential scanning calorimetry. The data science team integrated these datasets and delivered a spectrum of data analytics, including visualizations, which will lead to a comprehensive characterization of the thermodynamics of carbon and carbon-related materials.

  5. A Metastatistical Approach to Satellite Estimates of Extreme Rainfall Events

    NASA Astrophysics Data System (ADS)

    Zorzetto, E.; Marani, M.

    2017-12-01

    The estimation of the average recurrence interval of intense rainfall events is a central issue for both hydrologic modeling and engineering design. These estimates require inference of the properties of the right tail of the statistical distribution of precipitation, a task often performed using the Generalized Extreme Value (GEV) distribution, estimated either from samples of annual maxima (AM) or with a peaks-over-threshold (POT) approach. However, these approaches require long and homogeneous rainfall records, which often are not available, especially in the case of remotely sensed rainfall datasets. Here we use an alternative approach, tailored to remotely sensed rainfall estimates, based on the metastatistical extreme value distribution (MEVD), which produces estimates of rainfall extremes from the probability distribution function (pdf) of all measured 'ordinary' rainfall events. This methodology also accounts for the interannual variations observed in the pdf of daily rainfall by integrating over the sample space of its random parameters. We illustrate the application of this framework to the TRMM Multi-satellite Precipitation Analysis rainfall dataset, where the MEVD optimally exploits the relatively short records of satellite-sensed rainfall while taking full advantage of their high spatial resolution and quasi-global coverage. The accuracy of TRMM precipitation estimates and scale issues are investigated for a case study located in the Little Washita watershed, Oklahoma, using a dense network of rain gauges for independent ground validation. The methodology contributes to our understanding of the risk of extreme rainfall events, as it allows (i) an optimal use of the TRMM datasets in estimating the tail of the probability distribution of daily rainfall, and (ii) a global mapping of daily rainfall extremes and distributional tail properties, bridging the existing gaps in rain gauge networks.
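
    For contrast, the conventional annual-maxima baseline that the MEVD is designed to improve upon fits a GEV to one value per year. A minimal sketch with synthetic data (16 years, mirroring a short satellite record):

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(7)
        daily = rng.gamma(0.5, 9.0, size=(16, 365))   # synthetic daily rainfall (mm)
        annual_maxima = daily.max(axis=1)

        # Classical AM approach: fit a GEV to the 16 yearly maxima
        c, loc, scale = stats.genextreme.fit(annual_maxima)

        # 50-year return level = the (1 - 1/50) quantile of the fitted GEV
        x50 = stats.genextreme.ppf(1 - 1 / 50, c, loc=loc, scale=scale)
        print(f"50-year daily rainfall estimate: {x50:.1f} mm")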

  6. The added value of convection permitting simulations of extreme precipitation events over the eastern Mediterranean

    NASA Astrophysics Data System (ADS)

    Zittis, G.; Bruggeman, A.; Camera, C.; Hadjinicolaou, P.; Lelieveld, J.

    2017-07-01

    Climate change is expected to substantially influence precipitation amounts and distribution. To improve simulations of extreme rainfall events, we analyzed the performance of different convection and microphysics parameterizations of the WRF (Weather Research and Forecasting) model at very high horizontal resolutions (12, 4 and 1 km). Our study focused on the eastern Mediterranean climate change hot-spot. Five extreme rainfall events over Cyprus were identified from observations and were dynamically downscaled from the ERA-Interim (EI) dataset with WRF. We applied an objective ranking scheme, using a 1-km gridded observational dataset over Cyprus and six different performance metrics, to investigate the skill of the WRF configurations. We evaluated the rainfall timing and amounts for the different resolutions, and discuss the observational uncertainty for the particular extreme events by comparing three gridded precipitation datasets (E-OBS, APHRODITE and CHIRPS). Simulations with WRF capture rainfall over the eastern Mediterranean reasonably well for three of the five selected extreme events. For these three cases, the WRF simulations improve on the ERA-Interim data, which strongly underestimate the rainfall extremes over Cyprus. The best model performance is obtained for the January 1989 event, simulated with an average bias of 4% and a modified Nash-Sutcliffe efficiency of 0.72 for the 5-member ensemble of the 1-km simulations. We found overall added value for the convection-permitting simulations, especially over high-elevation regions. Interestingly, for some cases the intermediate 4-km nest was found to outperform the 1-km simulations for the low-elevation coastal parts of Cyprus. Finally, we identified significant and inconsistent discrepancies between the three state-of-the-art gridded precipitation datasets for the tested events, highlighting the observational uncertainty in the region.
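
    The two scores quoted for the January 1989 event are standard and easy to reproduce. A small sketch of both metrics in their textbook forms (the paper's "modified" Nash-Sutcliffe variant is not specified here):

        import numpy as np

        def percent_bias(sim, obs):
            """Mean bias of simulated rainfall relative to observations (%)."""
            sim, obs = np.asarray(sim, float), np.asarray(obs, float)
            return 100.0 * (sim.sum() - obs.sum()) / obs.sum()

        def nse(sim, obs):
            """Nash-Sutcliffe efficiency: 1 is perfect, 0 matches the obs mean."""
            sim, obs = np.asarray(sim, float), np.asarray(obs, float)
            return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

        obs = np.array([0.0, 5.0, 22.0, 41.0, 13.0, 2.0])
        sim = np.array([0.5, 6.0, 19.0, 44.0, 11.0, 1.0])
        print(percent_bias(sim, obs), nse(sim, obs))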

  7. Improved statistical method for temperature and salinity quality control

    NASA Astrophysics Data System (ADS)

    Gourrion, Jérôme; Szekely, Tanguy

    2017-04-01

    Climate research and ocean monitoring benefit from the continuous development of global in-situ hydrographic networks over recent decades. Apart from the increasing volume of observations available on a large range of temporal and spatial scales, a critical aspect concerns the ability to constantly improve the quality of the datasets. In the context of the Coriolis Dataset for ReAnalysis (CORA) version 4.2, a new quality control method based on a local comparison to the historical extreme values ever observed is developed, implemented and validated. Temperature, salinity and potential density validity intervals are directly estimated from minimum and maximum values of a historical reference dataset, rather than from traditional mean and standard deviation estimates. Such an approach avoids strong statistical assumptions about the data distributions, such as unimodality, absence of skewness and spatially homogeneous kurtosis. As a new feature, it also allows the two main objectives of an automatic quality control strategy to be addressed simultaneously, i.e. maximizing the number of good detections while minimizing the number of false alarms. The reference dataset is presently built from the fusion of (1) all Argo profiles up to late 2015, (2) three historical CTD datasets and (3) the sea-mammal CTD profiles from the MEOP database. All datasets are extensively and manually quality controlled. In this communication, the latest method validation results are also presented. The method has already been implemented in the latest version of the delayed-time CMEMS in-situ dataset and will soon be deployed in the equivalent near-real-time products.

  8. Online sparse Gaussian process based human motion intent learning for an electrically actuated lower extremity exoskeleton.

    PubMed

    Long, Yi; Du, Zhi-Jiang; Chen, Chao-Feng; Dong, Wei; Wang, Wei-Dong

    2017-07-01

    The most important step for a lower extremity exoskeleton is to infer human motion intent (HMI), which contributes to achieving human-exoskeleton collaboration. Since the user is in the control loop, the relationship between human-robot interaction (HRI) information and HMI is nonlinear and complicated, and difficult to model analytically; the nonlinear mapping can instead be learned using machine learning approaches. Gaussian Process (GP) regression is suitable for high-dimensional, small-sample nonlinear regression problems, but is restrictive for large datasets due to its computational complexity. In this paper, an online sparse GP algorithm is constructed to learn the HMI. The original training dataset is collected while the user wears the exoskeleton system, with friction compensation, and performs movement as unconstrained as possible. The dataset contains two kinds of data: (1) physical HRI, collected by torque sensors placed at the interaction cuffs of the active joints, i.e., the knee joints; and (2) joint angular position, measured by optical position sensors. To reduce the computational complexity of GP, grey relational analysis (GRA) is utilized to reduce the original dataset and provide the final training dataset. The hyper-parameters are optimized offline by maximizing the marginal likelihood and are then applied in the online GP regression algorithm. The HMI, i.e., the angular position of the human joints, is regarded as the reference trajectory for the mechanical legs. To verify the effectiveness of the proposed algorithm, experiments are performed on a subject at a natural speed. The experimental results show that the HMI can be obtained in real time, and the approach can be extended and employed in similar exoskeleton systems.
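
    The offline step described above, tuning kernel hyper-parameters by maximizing the marginal likelihood, is what a standard GP library does during fitting. A sketch with scikit-learn, using a full (non-sparse) GP and hypothetical torque/angle stand-ins rather than the paper's online sparse algorithm:

        import numpy as np
        from sklearn.gaussian_process import GaussianProcessRegressor
        from sklearn.gaussian_process.kernels import RBF, WhiteKernel

        rng = np.random.default_rng(3)
        torque = rng.uniform(-5, 5, size=(150, 1))   # interaction torque (input)
        angle = np.sin(0.6 * torque[:, 0]) + rng.normal(scale=0.05, size=150)

        # fit() optimizes the kernel hyper-parameters via the marginal likelihood
        kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.01)
        gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
        gp.fit(torque, angle)

        mean, std = gp.predict(np.array([[1.5]]), return_std=True)
        print(f"predicted joint angle: {mean[0]:.3f} +/- {std[0]:.3f}")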

  9. Linking Automated Data Analysis and Visualization with Applications in Developmental Biology and High-Energy Physics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ruebel, Oliver

    2009-11-20

    Knowledge discovery from large and complex collections of today's scientific datasets is a challenging task. With the ability to measure and simulate more processes at increasingly finer spatial and temporal scales, the growing number of data dimensions and data objects presents tremendous challenges for data analysis and effective data exploration methods and tools. Researchers are overwhelmed with data, and standard tools are often insufficient to enable effective data analysis and knowledge discovery. The main objective of this thesis is to provide important new capabilities to accelerate scientific knowledge discovery from large, complex, and multivariate scientific data. The research covered in this thesis addresses these scientific challenges using a combination of scientific visualization, information visualization, automated data analysis, and other enabling technologies, such as efficient data management. The effectiveness of the proposed analysis methods is demonstrated via applications in two distinct scientific research fields, namely developmental biology and high-energy physics. Advances in microscopy, image analysis, and embryo registration enable, for the first time, measurement of gene expression at cellular resolution for entire organisms. Analysis of high-dimensional spatial gene expression datasets is a challenging task. By integrating data clustering and visualization, analysis of complex, time-varying, spatial gene expression patterns and their formation becomes possible. The analysis framework has been integrated with MATLAB and the visualization, making advanced analysis tools accessible to biologists and enabling bioinformatics researchers to directly integrate their analyses with the visualization. Laser wakefield particle accelerators (LWFAs) promise to be a new compact source of high-energy particles and radiation, with wide applications ranging from medicine to physics. To gain insight into the complex physical processes of particle acceleration, physicists model LWFAs computationally. The datasets produced by LWFA simulations are (i) extremely large, (ii) of varying spatial and temporal resolution, (iii) heterogeneous, and (iv) high-dimensional, making analysis and knowledge discovery from complex LWFA simulation data a challenging task. To address these challenges, this thesis describes the integration of the visualization system VisIt and the state-of-the-art index/query system FastBit, enabling interactive visual exploration of extremely large three-dimensional particle datasets. Researchers are especially interested in beams of high-energy particles formed during the course of a simulation. This thesis describes novel methods for automatic detection and analysis of particle beams, enabling a more accurate and efficient data analysis process. By integrating these automated analysis methods with visualization, this research enables more accurate, efficient, and effective analysis of LWFA simulation data than previously possible.

  10. Improving Discoverability of Geophysical Data using Location Based Services

    NASA Astrophysics Data System (ADS)

    Morrison, D.; Barnes, R. J.; Potter, M.; Nylund, S. R.; Patrone, D.; Weiss, M.; Talaat, E. R.; Sarris, T. E.; Smith, D.

    2014-12-01

    The great promise of Virtual Observatories is the ability to perform complex search operations across the metadata of a large variety of different datasets, allowing the researcher to isolate and select the relevant measurements for their topic of study. The Virtual ITM Observatory (VITMO) has many diverse geophysical datasets covering a large temporal and spatial range, which presents a unique search problem. VITMO provides many methods by which the user can search for and select data of interest, including restricting selections based on geophysical conditions (solar wind speed, Kp, etc.) as well as finding datasets that overlap in time. One of the key challenges in improving discoverability is the ability to identify portions of datasets that overlap in both time and location. The difficulty is that location data are not contained in the metadata for datasets produced by satellites, and would be extremely large in volume if they were, making searching for overlapping data very time-consuming. To solve this problem we have developed a series of light-weight web services that provide a new data search capability for VITMO and others. The services consist of a database of spacecraft ephemerides and instrument fields of view; an overlap calculator to find times when the fields of view of different instruments intersect; and a magnetic field line tracing service that maps in situ and ground-based measurements to the equatorial plane in magnetic coordinates for a number of field models and geophysical conditions. These services run in real time when the user queries for data. They allow the non-specialist user to select data that they were previously unable to locate, opening up analysis opportunities beyond the instrument teams and specialists and making it easier for future students entering the field.
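
    At its simplest, the overlap calculator reduces to pairwise interval intersection. A toy sketch (interval data hypothetical; the VITMO services add fields of view and coordinate mappings on top of this):

        from datetime import datetime

        def overlaps(a, b):
            """Return the windows during which two lists of (start, end)
            observation intervals intersect."""
            out = []
            for a0, a1 in a:
                for b0, b1 in b:
                    lo, hi = max(a0, b0), min(a1, b1)
                    if lo < hi:
                        out.append((lo, hi))
            return out

        t = datetime.fromisoformat
        sat_a = [(t("2014-01-01T00:00"), t("2014-01-01T06:00"))]
        sat_b = [(t("2014-01-01T04:00"), t("2014-01-01T10:00"))]
        print(overlaps(sat_a, sat_b))  # one 2-hour window of common coverage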

  11. Five centuries of Central European temperature extremes reconstructed from tree-ring density and documentary evidence

    NASA Astrophysics Data System (ADS)

    Battipaglia, G.; Frank, D.; Buentgen, U.; Dobrovolný, P.; Brázdil, R.; Pfister, C.; Esper, J.

    2009-09-01

    In this project, three different summer-temperature-sensitive tree-ring chronologies across the European Alpine region were compiled and analyzed to build a calendar of extremely warm and cold summers. We identified 100 extreme events during the past millennium from the tree-ring data, and 44 extreme years during the 1550-2003 period based upon tree-ring, documentary and instrumental evidence. Comparisons with long instrumental series and documentary evidence verify the tree-ring extremes and indicate that this dataset can be used to better understand the characteristics of extremes prior to the instrumental period. Potential links between the occurrence of extreme events over the Alps and anomalous large-scale patterns were explored: the average pattern of the 20 warmest summers (over the 1700-2002 period) shows maximum positive anomalies over Central Europe, whereas the average pattern of the 20 coldest summers shows maximum negative anomalies over Western Europe. Challenges with the present approach included determining an appropriate classification scheme for extreme events and developing a methodology able to identify and characterize the occurrence of extreme episodes back in time. As a future step, our approach will be extended to help verify the sparse documentary data from the beginning of the past millennium and will be used in conjunction with climate models to assess model capabilities in reproducing the characteristics of temperature extremes.

  12. Evaluation of precipitation estimates over CONUS derived from satellite, radar, and rain gauge datasets (2002-2012)

    NASA Astrophysics Data System (ADS)

    Prat, O. P.; Nelson, B. R.

    2014-10-01

    We use a suite of quantitative precipitation estimates (QPEs) derived from satellite, radar, and surface observations to derive precipitation characteristics over CONUS for the period 2002-2012. This comparison effort includes satellite multi-sensor datasets (bias-adjusted TMPA 3B42, near-real-time 3B42RT), radar estimates (NCEP Stage IV), and rain gauge observations. The remotely sensed precipitation datasets are compared with surface observations from the Global Historical Climatology Network (GHCN-Daily) and from PRISM (Parameter-elevation Regressions on Independent Slopes Model). The comparisons are performed at the annual, seasonal, and daily scales over the River Forecast Centers (RFCs) of CONUS. Annual average rain rates show satisfactory agreement with GHCN-D for all products over CONUS (±6%). However, differences at the RFC scale are larger, in particular for the near-real-time 3B42RT estimates (-33% to +49%). At the annual and seasonal scales, the bias-adjusted 3B42 shows substantial improvement over its near-real-time counterpart 3B42RT. However, large biases remain for 3B42 over the western US for higher average accumulations (≥5 mm/day) with respect to GHCN-D surface observations. At the daily scale, 3B42RT performs poorly in capturing extreme daily precipitation (>4 in/day) over the Northwest. Furthermore, the conditional and contingency analyses conducted illustrate the challenge of retrieving extreme precipitation from remote sensing estimates.
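
    A contingency analysis of this kind counts hits, misses, and false alarms for exceedances of a daily threshold. A sketch of the standard scores on synthetic data (threshold and noise model illustrative only):

        import numpy as np

        def contingency_scores(sat, gauge, threshold):
            """POD, FAR and CSI for daily exceedances of a rainfall threshold."""
            sat_hit = np.asarray(sat) >= threshold
            obs_hit = np.asarray(gauge) >= threshold
            hits = np.sum(sat_hit & obs_hit)
            misses = np.sum(~sat_hit & obs_hit)
            false_alarms = np.sum(sat_hit & ~obs_hit)
            pod = hits / (hits + misses)                 # probability of detection
            far = false_alarms / (hits + false_alarms)   # false alarm ratio
            csi = hits / (hits + misses + false_alarms)  # critical success index
            return pod, far, csi

        rng = np.random.default_rng(5)
        gauge = rng.gamma(0.5, 10.0, size=3650)           # ten years of daily obs
        sat = gauge * rng.lognormal(0.0, 0.4, size=3650)  # noisy satellite QPE
        print(contingency_scores(sat, gauge, threshold=30.0))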

  13. HBLAST: Parallelised sequence similarity--A Hadoop MapReducable basic local alignment search tool.

    PubMed

    O'Driscoll, Aisling; Belogrudov, Vladislav; Carroll, John; Kropp, Kai; Walsh, Paul; Ghazal, Peter; Sleator, Roy D

    2015-04-01

    The recent exponential growth of genomic databases has resulted in the common task of sequence alignment becoming one of the major bottlenecks in the field of computational biology. It is typical for these large datasets and complex computations to require cost-prohibitive High Performance Computing (HPC). As such, parallelised solutions have been proposed, but many exhibit scalability limitations and are incapable of effectively processing "Big Data" - the name attributed to datasets that are extremely large, complex and require rapid processing. The Hadoop framework, comprised of distributed storage and a parallelised programming framework known as MapReduce, is specifically designed to work with such datasets, but it is not trivial to efficiently redesign and implement bioinformatics algorithms according to this paradigm. The parallelisation strategy of "divide and conquer" for alignment algorithms can be applied to both databases and input query sequences. However, scalability is still an issue due to memory constraints and large databases, with very large database segmentation leading to additional performance decline. Herein, we present Hadoop Blast (HBlast), a parallelised BLAST algorithm that proposes a flexible method to partition both databases and input query sequences using "virtual partitioning". HBlast presents improved scalability over existing solutions and a well-balanced computational workload, while keeping database segmentation and recompilation to a minimum. Enhanced BLAST search performance on cheap, memory-constrained hardware has significant implications for in-field clinical diagnostic testing, enabling faster and more accurate identification of pathogenic DNA in human blood or tissue samples.
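
    One way to picture "virtual partitioning" is a grid of (database slice, query batch) map tasks built by indexing rather than by physically splitting files. The sketch below is an illustrative reading of that idea, not HBlast's actual implementation:

        from itertools import product

        def virtual_partitions(db_chunks, queries, n_db, n_q):
            """Map work onto an n_db x n_q grid of (db slice, query batch)
            pairs via round-robin index selection, leaving the data in place."""
            db_parts = [db_chunks[i::n_db] for i in range(n_db)]
            q_parts = [queries[j::n_q] for j in range(n_q)]
            # each (i, j) pair would become an independent MapReduce task
            return {(i, j): (db_parts[i], q_parts[j])
                    for i, j in product(range(n_db), range(n_q))}

        tasks = virtual_partitions([f"db{k}" for k in range(6)],
                                   [f"q{k}" for k in range(4)], n_db=3, n_q=2)
        for key, (db, qs) in sorted(tasks.items()):
            print(key, db, qs)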

  14. Calculating p-values and their significances with the Energy Test for large datasets

    NASA Astrophysics Data System (ADS)

    Barter, W.; Burr, C.; Parkes, C.

    2018-04-01

    The energy test method is a multi-dimensional test of whether two samples are consistent with arising from the same underlying population, through the calculation of a single test statistic (called the T-value). The method has recently been used in particle physics to search for samples that differ due to CP violation. The generalised extreme value function has previously been used to describe the distribution of T-values under the null hypothesis that the two samples are drawn from the same underlying population. We show that, in a simple test case, the distribution is not sufficiently well described by the generalised extreme value function. We present a new method, where the distribution of T-values under the null hypothesis when comparing two large samples can be found by scaling the distribution found when comparing small samples drawn from the same population. This method can then be used to quickly calculate the p-values associated with the results of the test.
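
    For reference, one common convention for the T-value computes pairwise Gaussian-weighted distances within and between the two samples (normalisation details vary between papers; this sketch follows one standard form on synthetic data):

        import numpy as np
        from scipy.spatial.distance import cdist

        def t_value(x, y, sigma=1.0):
            """Energy-test statistic for two samples of d-dimensional points."""
            psi = lambda d: np.exp(-d**2 / (2 * sigma**2))
            n, m = len(x), len(y)
            dxx, dyy, dxy = psi(cdist(x, x)), psi(cdist(y, y)), psi(cdist(x, y))
            np.fill_diagonal(dxx, 0.0)   # exclude self-pairs
            np.fill_diagonal(dyy, 0.0)
            return (dxx.sum() / (2 * n * (n - 1))
                    + dyy.sum() / (2 * m * (m - 1))
                    - dxy.sum() / (n * m))

        rng = np.random.default_rng(2)
        a = rng.normal(0.0, 1.0, size=(500, 2))
        b = rng.normal(0.2, 1.0, size=(500, 2))  # slightly shifted population
        print(t_value(a, b))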

  15. Evaluation of seabed mapping methods for fine-scale classification of extremely shallow benthic habitats - Application to the Venice Lagoon, Italy

    NASA Astrophysics Data System (ADS)

    Montereale Gavazzi, G.; Madricardo, F.; Janowski, L.; Kruss, A.; Blondel, P.; Sigovini, M.; Foglini, F.

    2016-03-01

    Recent technological developments in multibeam echosounder systems (MBES) allow mapping of benthic habitats with unprecedented detail. MBES can now be employed in extremely shallow waters, challenging both data acquisition (as these instruments were often designed for deeper waters) and data interpretation (honed on datasets with resolutions sometimes orders of magnitude lower). With extremely high-resolution bathymetry and co-located backscatter data, it is now possible to map the spatial distribution of fine-scale benthic habitats, even identifying the acoustic signatures of single sponges. In this context, it is necessary to understand which of the commonly used segmentation methods is best suited to such a level of detail. At the same time, new sampling protocols for precisely geo-referenced ground-truth data need to be developed to validate the benthic environmental classification. This study focuses on a dataset collected in a shallow (2-10 m deep) tidal channel of the Lagoon of Venice, Italy. Using 0.05-m and 0.2-m raster grids, we compared a range of classifications, both pixel-based and object-based approaches, including manual classification, the Maximum Likelihood Classifier, Jenks Optimization clustering, textural analysis and Object-Based Image Analysis. Through a comprehensive and accurately geo-referenced ground-truth dataset, we were able to identify five different classes of substrate composition, including sponges, mixed submerged aquatic vegetation, mixed detritic bottom (fine and coarse) and unconsolidated bare sediment. We computed estimates of accuracy (namely Overall, User's and Producer's Accuracies and the Kappa statistic) by cross-tabulating predicted and reference instances. Overall, pixel-based segmentations produced the highest accuracies, and the accuracy assessment is strongly dependent on the number of classes chosen for the thematic output. Tidal channels in the Venice Lagoon are extremely important in terms of habitats and sediment distribution, particularly within the context of the new tidal barrier being built. However, they had remained largely unexplored until now because of the surveying challenges. The application of this remote sensing approach, combined with targeted sampling, opens a new perspective on the monitoring of benthic habitats in view of a knowledge-based management of natural resources in shallow coastal areas.
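
    All four accuracy estimates derive from the same cross-tabulation. A compact sketch, assuming rows of the confusion matrix are predicted classes and columns are reference classes (the matrix values are illustrative):

        import numpy as np

        def accuracy_metrics(confusion):
            """Overall, user's and producer's accuracy and Cohen's kappa."""
            c = np.asarray(confusion, dtype=float)
            total = c.sum()
            overall = np.trace(c) / total
            users = np.diag(c) / c.sum(axis=1)      # per predicted class
            producers = np.diag(c) / c.sum(axis=0)  # per reference class
            expected = (c.sum(axis=1) * c.sum(axis=0)).sum() / total**2
            kappa = (overall - expected) / (1 - expected)
            return overall, users, producers, kappa

        conf = np.array([[50,  3,  2],
                         [ 4, 40,  6],
                         [ 1,  5, 39]])
        print(accuracy_metrics(conf))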

  16. Analysis of the precipitation and streamflow extremes in Northern Italy using high resolution reanalysis dataset Express-Hydro

    NASA Astrophysics Data System (ADS)

    Silvestro, Francesco; Parodi, Antonio; Campo, Lorenzo

    2017-04-01

    The characterization of hydrometeorological extremes, both in terms of rainfall and streamflow, in a given region plays a key role in the environmental monitoring provided by flood alert services. In recent years, meteorological simulations (both near-real-time and historical reanalyses) have become available at increasing spatial and temporal resolutions, making possible long-period hydrological reanalyses in which the meteorological dataset is used as input to distributed hydrological models. In this work, a very high resolution meteorological reanalysis dataset, namely Express-Hydro (CIMA, ISAC-CNR, GAUSS Special Project PR45DE), was employed as input to the hydrological model Continuum in order to produce long time series of streamflow for the Liguria region, located in the northern part of Italy. The original dataset covers the whole of Europe for the 1979-2008 period, at 4-km spatial resolution and 3-hour temporal resolution. The rainfall estimated by the dataset was compared with observations (available from the local rain-gauge network), and a bias correction was performed in order to better match the observed climatology. An extreme-value analysis was then carried out on the streamflow time series obtained from the simulations, comparing them with the results of the same hydrological model fed with the observed rainfall time series. The results of the analysis are shown and discussed.
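
    Empirical quantile mapping is one common way to perform such a bias correction; the record does not specify the exact scheme used, so the sketch below is a generic illustration on synthetic data:

        import numpy as np

        def quantile_map(model, obs, values):
            """Map each model value to the observed value at the same
            empirical quantile."""
            q = np.linspace(0.0, 1.0, 101)
            return np.interp(values, np.quantile(model, q), np.quantile(obs, q))

        rng = np.random.default_rng(11)
        obs = rng.gamma(0.6, 10.0, size=10000)    # observed daily rainfall (mm)
        model = rng.gamma(0.6, 7.0, size=10000)   # dry-biased model rainfall
        corrected = quantile_map(model, obs, model)
        print(model.mean(), corrected.mean(), obs.mean())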

  17. The use of historical information for regional frequency analysis of extreme skew surge

    NASA Astrophysics Data System (ADS)

    Frau, Roberto; Andreewsky, Marc; Bernardara, Pietro

    2018-03-01

    The design of effective coastal protections requires an adequate estimation of the annual occurrence probability of rare events, with return periods of up to 10^3 years. Regional frequency analysis (RFA) has proven to be a viable way to estimate extreme events by pooling regional data into large, spatially distributed datasets. Nowadays, historical data are available to provide new insight into past events. The use of historical information increases the precision and reliability of regional extreme-quantile estimation. However, historical data come from significant extreme events that were not recorded by tide gauges; they are typically isolated values, unlike the continuous records produced by systematic tide-gauge measurements. This complicates the definition of the duration of the observation period, which is crucial for estimating the frequency of extreme occurrences. For this reason, we introduce here the concept of credible duration. The proposed RFA method (hereinafter referenced as FAB, from the names of the authors) allows the use of historical data together with systematic data, a result of the use of the credible duration concept.

  18. Utilizing high throughput screening data for predictive toxicology models: protocols and application to MLSCN assays

    NASA Astrophysics Data System (ADS)

    Guha, Rajarshi; Schürer, Stephan C.

    2008-06-01

    Computational toxicology is emerging as an encouraging alternative to experimental testing. The Molecular Libraries Screening Center Network (MLSCN), as part of the NIH Molecular Libraries Roadmap, has recently started generating large and diverse screening datasets, which are publicly available in PubChem. In this report, we investigate various aspects of developing computational models to predict cell toxicity based on cell proliferation screening data generated in the MLSCN. By capturing feature-based information in those datasets, such predictive models would be useful in evaluating cell-based screening results in general (for example from reporter assays) and could be used as an aid to identify and eliminate potentially undesired compounds. Specifically, we present the results of random forest ensemble models developed using different cell proliferation datasets and highlight protocols to take into account their extremely imbalanced nature. Depending on the nature of the datasets and the descriptors employed, we were able to achieve percentage correct classification rates between 70% and 85% on the prediction set, though the accuracy rate dropped significantly when the models were applied to in vivo data. In this context we also compare the MLSCN cell proliferation results with animal acute toxicity data to investigate to what extent animal toxicity can be correlated with, and potentially predicted by, proliferation results. Finally, we present a visualization technique that allows one to compare a new dataset to the training set of the models to decide whether the new dataset may be reliably predicted.
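
    Class weighting is one standard protocol for the kind of imbalance described above. A sketch with scikit-learn on a synthetic stand-in where actives are rare (roughly 3% of samples):

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.metrics import balanced_accuracy_score
        from sklearn.model_selection import train_test_split

        X, y = make_classification(n_samples=5000, n_features=30,
                                   weights=[0.97], random_state=0)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y,
                                                  random_state=0)

        # 'balanced' reweights classes inversely to their frequency
        rf = RandomForestClassifier(n_estimators=300, class_weight="balanced",
                                    random_state=0).fit(X_tr, y_tr)
        print(balanced_accuracy_score(y_te, rf.predict(X_te)))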

  19. Future changes in summer mean and extreme precipitation frequency in Japan by d4PDF regional climate simulations

    NASA Astrophysics Data System (ADS)

    Okada, Y.; Ishii, M.; Endo, H.; Kawase, H.; Sasaki, H.; Takayabu, I.; Watanabe, S.; Fujita, M.; Sugimoto, S.; Kawazoe, S.

    2017-12-01

    Precipitation in summer plays a vital role in sustaining life across East Asia, but the heavy rain that is often generated during this period can also cause serious damage. Developing a better understanding of the features and occurrence frequency of this heavy rain is an important element of disaster prevention. We investigated future changes in summer mean and extreme precipitation frequency in Japan using a large ensemble dataset simulated by the Non-Hydrostatic Regional Climate Model with a horizontal resolution of 20 km (NHRCM20). This dataset, the database for Policy Decision making for Future climate changes (d4PDF), is intended to be used for impact assessment studies and adaptation planning for global warming. The future climate experiments assume global mean surface air temperature rises of 2 K and 4 K from the pre-industrial period. Using this dataset, we investigated future changes in summer precipitation over the Japanese archipelago based on observational locations. For mean precipitation in the present-day climate, the bias of the rainfall for each month is within 25% even when considering all 30 members. The bias at individual locations is found to exceed 50% on the Pacific Ocean side of eastern Japan and at interior locations of western Japan; the result in western Japan depends on how elevation is represented in the model. The future changes in mean precipitation show a contrast between northern and southern Japan, with the north showing a slight increase but the south a decrease. In the national average for Japan, the frequency of extreme precipitation increases in both the 2 K and 4 K simulations relative to the present-day climate. The authors were supported by the Social Implementation Program on Climate Change Adaptation Technology (SI-CAT) of the Ministry of Education, Culture, Sports, Science, and Technology (MEXT), Japan.

  20. Extreme Rainfall Events Over Southern Africa: Assessment of a Climate Model to Reproduce Daily Extremes

    NASA Astrophysics Data System (ADS)

    Williams, C.; Kniveton, D.; Layberry, R.

    2007-12-01

    It is increasingly accepted that any possible climate change will not only influence mean climate but may also significantly alter climatic variability. This issue is of particular importance for environmentally vulnerable regions such as southern Africa. The subcontinent is considered especially vulnerable to extreme events, due to a number of factors including extensive poverty, disease and political instability. Rainfall variability and the identification of rainfall extremes are functions of scale, so high spatial and temporal resolution data are preferred to identify extreme events and accurately predict future variability. The majority of previous climate model verification studies have compared model output with observational data at monthly timescales. In this research, the ability of a state-of-the-art climate model to simulate climate at daily timescales is assessed using satellite-derived rainfall data from the Microwave Infra-Red Algorithm (MIRA). This dataset covers the period 1993-2002 and the whole of southern Africa at a spatial resolution of 0.1 degree longitude/latitude. Once the model's ability to reproduce extremes has been assessed, idealised regions of SST anomalies are used to force the model, with the overall aim of investigating the ways in which SST anomalies influence rainfall extremes over southern Africa. In this paper, results from sensitivity testing of the domain size of the UK Meteorological Office Hadley Centre's climate model are first presented. Then simulations of current climate from the model, operating in both regional and global mode, are compared to the MIRA dataset at daily timescales. Thirdly, the ability of the model to reproduce daily rainfall extremes is assessed, again by comparison with extremes from the MIRA dataset. Finally, the results from the idealised SST experiments are briefly presented, suggesting associations between rainfall extremes and both local and remote SST anomalies.

  1. MRIQC: Advancing the automatic prediction of image quality in MRI from unseen sites

    PubMed Central

    2017-01-01

    Quality control of MRI is essential for excluding problematic acquisitions and avoiding bias in subsequent image processing and analysis. Visual inspection is subjective and impractical for large-scale datasets. Although automated quality assessments have been demonstrated on single-site datasets, it is unclear whether such solutions can generalize to unseen data acquired at new sites. Here, we introduce the MRI Quality Control tool (MRIQC), a tool for extracting quality measures and fitting a binary (accept/exclude) classifier. Our tool can be run both locally and as a free online service via the OpenNeuro.org portal. The classifier is trained on a publicly available, multi-site dataset (17 sites, N = 1102). We perform model selection, evaluating different normalization and feature-exclusion approaches aimed at maximizing across-site generalization, and estimate an accuracy of 76%±13% on new sites using leave-one-site-out cross-validation. We confirm that result on a held-out dataset (2 sites, N = 265), also obtaining 76% accuracy. Even though the performance of the trained classifier is statistically above chance, we show that it is susceptible to site effects and unable to account for artifacts specific to new sites. MRIQC performs with high accuracy in intra-site prediction, but performance on unseen sites leaves room for improvement, which might require more labeled data and new approaches to the between-site variability. Overcoming these limitations is crucial for a more objective quality assessment of neuroimaging data, and to enable the analysis of extremely large and multi-site samples. PMID:28945803
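
    Leave-one-site-out cross-validation holds out every scan from one site per fold, so each score estimates generalization to an unseen site rather than to unseen scans. A sketch with scikit-learn on synthetic stand-in features and labels (not the MRIQC code or data):

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

        rng = np.random.default_rng(0)
        n = 600
        X = rng.normal(size=(n, 20))         # image quality measures
        y = rng.integers(0, 2, size=n)       # accept/exclude labels (synthetic)
        sites = rng.integers(0, 17, size=n)  # acquisition site of each scan

        scores = cross_val_score(RandomForestClassifier(random_state=0),
                                 X, y, groups=sites, cv=LeaveOneGroupOut())
        print(f"accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")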

  2. Regularised extreme learning machine with misclassification cost and rejection cost for gene expression data classification.

    PubMed

    Lu, Huijuan; Wei, Shasha; Zhou, Zili; Miao, Yanzi; Lu, Yi

    2015-01-01

    The main purpose of traditional classification algorithms in bioinformatics applications is to achieve better classification accuracy. However, these algorithms cannot meet the requirement of minimising the average misclassification cost. In this paper, a new cost-sensitive regularised extreme learning machine (CS-RELM) algorithm is proposed, using probability estimation and misclassification costs to reconstruct the classification results. By improving the classification accuracy on a small-sample group with higher misclassification cost, the new CS-RELM can minimise the overall classification cost. A 'rejection cost' is integrated into the CS-RELM algorithm to further reduce the average misclassification cost. Using the Colon Tumour dataset and the SRBCT (Small Round Blue Cell Tumour) dataset, CS-RELM was compared with other algorithms such as the extreme learning machine (ELM), cost-sensitive ELM, regularised ELM and the cost-sensitive support vector machine (SVM). The results of the experiments show that CS-RELM with an embedded rejection cost can reduce the average cost of misclassification and make more credible classification decisions than the alternatives.
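
    The decision rule behind cost-sensitive classification with a reject option is compact: pick the class minimising expected cost, unless rejecting is cheaper. A sketch with hypothetical cost values (not the paper's):

        import numpy as np

        def decide(probs, cost_matrix, rejection_cost):
            """Return the class index minimising expected cost, or -1 to
            reject. cost_matrix[i, j] = cost of predicting j when the
            true class is i."""
            expected = np.asarray(probs, float) @ np.asarray(cost_matrix, float)
            j = int(np.argmin(expected))
            return -1 if rejection_cost < expected[j] else j

        # False negatives (true class 1 predicted as 0) cost 10x false positives
        costs = np.array([[0.0, 1.0],
                          [10.0, 0.0]])
        print(decide([0.90, 0.10], costs, rejection_cost=0.5))  # -1: reject
        print(decide([0.99, 0.01], costs, rejection_cost=0.5))  # 0: confident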

  3. Processing of the WLCG monitoring data using NoSQL

    NASA Astrophysics Data System (ADS)

    Andreeva, J.; Beche, A.; Belov, S.; Dzhunov, I.; Kadochnikov, I.; Karavakis, E.; Saiz, P.; Schovancova, J.; Tuckett, D.

    2014-06-01

    The Worldwide LHC Computing Grid (WLCG) today includes more than 150 computing centres where more than 2 million jobs are being executed daily and petabytes of data are transferred between sites. Monitoring the computing activities of the LHC experiments, over such a huge heterogeneous infrastructure, is extremely demanding in terms of computation, performance and reliability. Furthermore, the generated monitoring flow is constantly increasing, which represents another challenge for the monitoring systems. While existing solutions are traditionally based on Oracle for data storage and processing, recent developments evaluate NoSQL for processing large-scale monitoring datasets. NoSQL databases are getting increasingly popular for processing datasets at the terabyte and petabyte scale using commodity hardware. In this contribution, the integration of NoSQL data processing in the Experiment Dashboard framework is described along with first experiences of using this technology for monitoring the LHC computing activities.

  4. Revisiting the synoptic-scale predictability of severe European winter storms using ECMWF ensemble reforecasts

    NASA Astrophysics Data System (ADS)

    Pantillon, Florian; Knippertz, Peter; Corsmeier, Ulrich

    2017-10-01

    New insights into the synoptic-scale predictability of 25 severe European winter storms of the 1995-2015 period are obtained using the homogeneous ensemble reforecast dataset from the European Centre for Medium-Range Weather Forecasts. The predictability of the storms is assessed with different metrics, including (a) the track and intensity, to investigate the storms' dynamics, and (b) the Storm Severity Index, to estimate the impact of the associated wind gusts. The storms are well predicted by the whole ensemble up to 2-4 days ahead. At longer lead times, the number of members predicting the observed storms decreases, and the ensemble average is no longer clearly defined for the track and intensity. The Extreme Forecast Index and Shift of Tails are therefore computed from the deviation of the ensemble from the model climate. Based on these indices, the model has some skill in forecasting the area covered by extreme wind gusts up to 10 days ahead, which indicates a clear potential for early warnings. However, large variability is found between the individual storms. The poor predictability of outliers appears related to their physical characteristics, such as explosive intensification or small size. Longer datasets with more cases would be needed to further substantiate these points.
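
    The Extreme Forecast Index integrates the gap between the ensemble forecast CDF and the model-climate quantiles. A numerical sketch following the convention of Lalaurette (2003); operational details differ, and the data here are synthetic:

        import numpy as np

        def extreme_forecast_index(ensemble, climate):
            """EFI in [-1, 1]; positive values flag forecasts shifted towards
            the extreme high end of the model climate."""
            p = np.linspace(0.01, 0.99, 99)
            clim_q = np.quantile(climate, p)
            # F(p): fraction of ensemble members below each climate quantile
            F = np.array([(ensemble < q).mean() for q in clim_q])
            integrand = (p - F) / np.sqrt(p * (1 - p))
            # trapezoidal integration over p
            return (2.0 / np.pi) * np.sum(
                0.5 * (integrand[1:] + integrand[:-1]) * np.diff(p))

        rng = np.random.default_rng(8)
        climate = rng.gamma(2.0, 5.0, size=20000)  # model climate of gusts
        ensemble = rng.gamma(2.0, 7.0, size=51)    # forecast shifted to extremes
        print(extreme_forecast_index(ensemble, climate))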

  5. Moisture source classification of heavy precipitation events in Switzerland in the last 130 years (1871-2011)

    NASA Astrophysics Data System (ADS)

    Aemisegger, Franziska; Piaget, Nicolas

    2017-04-01

    A new weather-system-oriented classification framework for extreme precipitation events leading to large-scale floods in Switzerland is presented in this poster. Thirty-six high-impact floods in the last 130 years are assigned to three representative categories of atmospheric moisture origin and transport patterns. The methodology underlying this moisture source classification combines information on the air-mass history in the twenty days preceding the precipitation event with humidity variations along the large-scale atmospheric transport systems in a Lagrangian approach. The classification scheme is defined using the 33-year ERA-Interim reanalysis dataset (1979-2011) and is then applied to the Twentieth Century Reanalysis (1871-2011) extreme precipitation events as well as to the 36 selected floods. The three defined categories are characterised by different dominant moisture uptake regions, including the North Atlantic, the Mediterranean and continental Europe. Furthermore, distinct anomalies in the large-scale atmospheric flow are associated with the different categories. The temporal variations in the relative importance of the three categories over the last 130 years provide new insights into the impact of changing climate conditions on the dynamical mechanisms leading to heavy precipitation in Switzerland.

  6. Topological Analysis and Gaussian Decision Tree: Effective Representation and Classification of Biosignals of Small Sample Size.

    PubMed

    Zhang, Zhifei; Song, Yang; Cui, Haochen; Wu, Jayne; Schwartz, Fernando; Qi, Hairong

    2017-09-01

    Bucking the trend of big data, in microdevice engineering small sample sizes are common, especially when the device is still at the proof-of-concept stage. The small sample size, small interclass variation and large intraclass variation have brought new challenges to biosignal analysis. Novel representation and classification approaches need to be developed to effectively recognize targets of interest in the absence of a large training set. Moving away from traditional signal analysis in the spatiotemporal domain, we exploit the biosignal representation in the topological domain, which reveals the intrinsic structure of point clouds generated from the biosignal. Additionally, we propose a Gaussian-based decision tree (GDT), which can efficiently classify biosignals even when the sample size is extremely small. This study is motivated by the application of mastitis detection using low-voltage alternating-current electrokinetics (ACEK), where five categories of biosignals need to be recognized with only two samples in each class. Experimental results demonstrate the robustness of the topological features as well as the advantage of GDT over some conventional classifiers in handling small datasets. Our method reduces the voltage of ACEK to a safe level and still yields high-fidelity results with a short assay time. This paper makes two distinctive contributions to the field of biosignal analysis: performing signal processing in the topological domain and handling extremely small datasets. Currently, there have been no related works that can efficiently tackle the dilemma between avoiding electrochemical reactions and accelerating the assay process using ACEK.

  7. Efficient segmentation of 3D fluoroscopic datasets from mobile C-arm

    NASA Astrophysics Data System (ADS)

    Styner, Martin A.; Talib, Haydar; Singh, Digvijay; Nolte, Lutz-Peter

    2004-05-01

    The emerging mobile fluoroscopic 3D technology, linked with a navigation system, combines the advantages of CT-based and C-arm-based navigation. The intra-operative, automatic segmentation of 3D fluoroscopy datasets enables the combined visualization of surgical instruments and anatomical structures for enhanced planning, surgical eye-navigation and landmark digitization. We performed a thorough evaluation of several segmentation algorithms using a large set of data from different anatomical regions and man-made phantom objects. The analyzed segmentation methods include automatic thresholding, morphological operations, an adapted region-growing method and an implicit 3D geodesic snake method. In regard to computational efficiency, all methods performed within acceptable limits on a standard desktop PC (30 s to 5 min). In general, the best results were obtained with datasets from long bones, followed by extremities. The segmentations of spine, pelvis and shoulder datasets were generally of poorer quality. As expected, the threshold-based methods produced the worst results. The combined thresholding and morphological operations methods were considered appropriate for a smaller set of clean images. The region-growing method generally performed much better in regard to computational efficiency and segmentation correctness, especially for datasets of joints and of the lumbar and cervical spine regions. The less efficient implicit snake method was additionally able to remove wrongly segmented skin tissue regions. This study presents a step towards efficient intra-operative segmentation of 3D fluoroscopy datasets, but there is room for improvement. Next, we plan to study model-based approaches for datasets from the knee and hip joint regions, which would then be applied to all anatomical regions in our continuing development of an ideal segmentation procedure for 3D fluoroscopic images.
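
    The thresholding-plus-morphology baseline evaluated in the study can be sketched in a few lines with scipy.ndimage (parameters and the synthetic volume are illustrative, not the study's pipeline):

        import numpy as np
        from scipy import ndimage

        def threshold_morph_segment(volume, threshold, min_voxels=50):
            """Threshold a 3D volume, clean it with binary opening/closing,
            and keep only connected components above a minimum size."""
            mask = volume > threshold
            mask = ndimage.binary_opening(mask)   # remove speckle
            mask = ndimage.binary_closing(mask)   # fill small holes
            labels, n = ndimage.label(mask)
            sizes = ndimage.sum(mask, labels, range(1, n + 1))
            return np.isin(labels, 1 + np.flatnonzero(sizes >= min_voxels))

        rng = np.random.default_rng(4)
        vol = rng.normal(0.0, 1.0, size=(64, 64, 64))
        vol[20:40, 20:40, 20:40] += 4.0           # bright synthetic "bone" block
        seg = threshold_morph_segment(vol, threshold=2.0)
        print(seg.sum(), "voxels segmented")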

  8. Openwebglobe 2: Visualization of Complex 3D-GEODATA in the (mobile) Webbrowser

    NASA Astrophysics Data System (ADS)

    Christen, M.

    2016-06-01

    Providing worldwide high-resolution data for virtual globes involves compute- and storage-intensive data processing tasks. Furthermore, rendering complex 3D geodata, such as 3D city models with an extremely high polygon count and a vast number of textures, at interactive frame rates is still a very challenging task, especially on mobile devices. This paper presents an approach for processing, caching and serving massive geospatial data in a cloud-based environment for large-scale, out-of-core, highly scalable 3D scene rendering in a web-based virtual globe. Cloud computing is used for processing large amounts of geospatial data and also for providing 2D and 3D map data to a large number of (mobile) web clients. In this paper the approach for processing, rendering and caching very large datasets in the currently developed virtual globe "OpenWebGlobe 2", which displays 3D geodata on nearly every device, is shown.

  9. Intensity-Duration-Frequency curves from remote sensing datasets: direct comparison of weather radar and CMORPH over the Eastern Mediterranean

    NASA Astrophysics Data System (ADS)

    Morin, Efrat; Marra, Francesco; Peleg, Nadav; Mei, Yiwen; Anagnostou, Emmanouil N.

    2017-04-01

    Rainfall frequency analysis is used to quantify the probability of occurrence of extreme rainfall and is traditionally based on rain gauge records. The limited spatial coverage of rain gauges is insufficient to sample the spatiotemporal variability of extreme rainfall and to provide the areal information required by management and design applications. Conversely, remote sensing instruments, even if quantitatively uncertain, offer the coverage and spatiotemporal detail needed to overcome these issues. In recent years, remote sensing datasets began to be used for frequency analyses, taking advantage of increased record lengths and quantitative adjustments of the data. However, the studies so far made use of concepts and techniques developed for rain gauge (i.e. point or multiple-point) data and have been validated by comparison with gauge-derived analyses. These procedures add further sources of uncertainty, prevent the separation of data uncertainties from methodological ones, and prevent full exploitation of the available information. In this study, we step out of the gauge-centered concept, presenting a direct comparison between at-site Intensity-Duration-Frequency (IDF) curves derived from different remote sensing datasets at corresponding spatial scales, temporal resolutions and records. We analyzed 16 years of homogeneously corrected and gauge-adjusted C-band weather radar estimates, high-resolution CMORPH, and gauge-adjusted high-resolution CMORPH over the Eastern Mediterranean. Results of this study include: (a) good spatial correlation between radar and satellite IDFs (~0.7 for 2-5-year return periods); (b) consistent correlation and dispersion in the raw and gauge-adjusted CMORPH; (c) a bias that is almost uniform with return period for 12-24 h durations; and (d) radar identifying thicker-tailed distributions than CMORPH, with the tail of the distributions depending on the spatial and temporal scales. These results demonstrate the potential of remote sensing datasets for rainfall frequency analysis for management (e.g. warning and early-warning systems) and design (e.g. sewer design, large-scale drainage planning) applications.

  10. Reference data on muscle volumes of healthy human pelvis and lower extremity muscles: an in vivo magnetic resonance imaging feasibility study.

    PubMed

    Lube, Juliane; Cotofana, Sebastian; Bechmann, Ingo; Milani, Thomas L; Özkurtul, Orkun; Sakai, Tatsuo; Steinke, Hanno; Hammer, Niels

    2016-01-01

    Muscle volumes are of crucial interest when attempting to analyze individual physical performance and disease- or age-related alterations in muscle morphology. However, very little reference data on pelvis and lower extremity muscle volumes originating from healthy, young individuals are available in the literature. Furthermore, it is of interest whether representative muscle volumes, covering large anatomical regions, can be obtained using magnetic resonance imaging (MRI) in a setting similar to the clinical routine. Our objective was therefore to provide encompassing, bilateral, 3-T MRI-based datasets on muscle volumes of the pelvis and the lower limb muscles. T1-weighted 3-T MRI records were obtained bilaterally from six young and healthy participants. Three-dimensional volumes were compiled from 28 muscles and muscle groups of each participant before the muscle volumes were computed. Muscle volumes were obtained from 28 muscles and muscle groups of the pelvis and lower extremity. Volumes were larger in male than in female participants, and volumes of the dominant and non-dominant sides were similar in both genders. The obtained results were in line with volumetric data obtained from smaller anatomical areas, thus extending the available datasets. This study provides an encompassing and feasible approach to obtaining data on the muscle volumes of pelvic and limb muscles of healthy, young, and physically active individuals. The respective data form a basis for determining the effects of therapeutic approaches or the progression of diseases, and for technical applications such as automated segmentation algorithms applied to different populations.

  11. Task Dependence, Tissue Specificity, and Spatial Distribution of Widespread Activations in Large Single-Subject Functional MRI Datasets at 7T

    PubMed Central

    Gonzalez-Castillo, Javier; Hoy, Colin W.; Handwerker, Daniel A.; Roopchansingh, Vinai; Inati, Souheil J.; Saad, Ziad S.; Cox, Robert W.; Bandettini, Peter A.

    2015-01-01

    It was recently shown that when large amounts of task-based blood oxygen level–dependent (BOLD) data are combined to increase contrast- and temporal signal-to-noise ratios, the majority of the brain shows significant hemodynamic responses time-locked with the experimental paradigm. Here, we investigate the biological significance of such widespread activations. First, the relationship between activation extent and task demands was investigated by varying cognitive load across participants. Second, the tissue specificity of responses was probed using the better BOLD signal localization capabilities of a 7T scanner. Finally, the spatial distribution of 3 primary response types—namely positively sustained (pSUS), negatively sustained (nSUS), and transient—was evaluated using a newly defined voxel-wise waveshape index that permits separation of responses based on their temporal signature. About 86% of gray matter (GM) became significantly active when all data entered the analysis for the most complex task. Activation extent scaled with task load and largely followed the GM contour. The most common response type was nSUS BOLD, irrespective of the task. Our results suggest that widespread activations associated with extremely large single-subject functional magnetic resonance imaging datasets can provide valuable information about the functional organization of the brain that goes undetected in smaller sample sizes. PMID:25405938
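
    The abstract does not define the waveshape index, so the sketch below is a hypothetical illustration only: it classifies a voxel's trial-averaged time course by its correlation with simple sustained (boxcar) and transient (onset/offset) templates. All template shapes, names, and thresholds are assumptions.

    ```python
    # Hypothetical waveshape classification: assign a voxel response to pSUS,
    # nSUS, or transient by template correlation. This is NOT the paper's
    # index, merely an illustration of separating responses by temporal shape.
    import numpy as np

    def classify_response(tc):
        """tc: trial-averaged BOLD time course over one task block (1 s samples)."""
        n = tc.size
        sustained = np.ones(n)
        sustained[: n // 8] = 0; sustained[-(n // 8):] = 0   # plateau during block
        transient = np.zeros(n)
        transient[: n // 8] = 1; transient[-(n // 8):] = 1   # onset/offset bursts
        r_sus = np.corrcoef(tc, sustained)[0, 1]
        r_tra = np.corrcoef(tc, transient)[0, 1]
        if abs(r_sus) >= abs(r_tra):
            return "pSUS" if tc.mean() > 0 else "nSUS"
        return "transient"

    block = np.concatenate([np.zeros(5), np.ones(30), np.zeros(5)])
    noise = 0.1 * np.random.default_rng(1).normal(size=40)
    print(classify_response(block + noise))    # expected: "pSUS"
    ```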

  12. Large-scale Labeled Datasets to Fuel Earth Science Deep Learning Applications

    NASA Astrophysics Data System (ADS)

    Maskey, M.; Ramachandran, R.; Miller, J.

    2017-12-01

    Deep learning has revolutionized computer vision and natural language processing with various algorithms scaled using high-performance computing. However, generic large-scale labeled datasets such as ImageNet are the fuel that drives the impressive accuracy of deep learning results. Large-scale labeled datasets already exist in domains such as medical science, but creating them in the Earth science domain is a challenge. While there are ways to apply deep learning using limited labeled data, there is a need in the Earth sciences to create large-scale labeled datasets for benchmarking and scaling deep learning applications. At the NASA Marshall Space Flight Center, we are using deep learning for a variety of Earth science applications where we have encountered the need for large-scale labeled datasets. We will discuss our approaches for creating such datasets and why these datasets are just as valuable as deep learning algorithms. We will also describe successful usage of these large-scale labeled datasets with our deep learning based applications.

  13. The Chennai extreme rainfall event in 2015: The Bay of Bengal connection

    NASA Astrophysics Data System (ADS)

    Boyaj, Alugula; Ashok, Karumuri; Ghosh, Subimal; Devanand, Anjana; Dandu, Govardhan

    2018-04-01

    Southeast India experienced heavy rainfall during 30 Nov-2 Dec 2015. In particular, Chennai, the fourth-largest metropolitan city in India with a population of 5 million, experienced extreme flooding and casualties. Using various observed/reanalysed datasets, we find that the concurrent southern Bay of Bengal (BoB) sea surface temperatures (SST) were anomalously warm. Our analysis shows that BoB sea surface temperature anomalies (SSTA) are indeed positively and significantly correlated with northeastern Indian monsoonal rainfall during this season. Our sensitivity experiments carried out with the Weather Research and Forecasting (WRF) model at 25 km resolution suggest that, while the strong concurrent El Niño conditions contributed about 21.5% of the intensity of the extreme Chennai rainfall through their signal in the local SST mentioned above, the warming trend in BoB SST contributed equally to the extremity of the event. Further, the El Niño-Southern Oscillation (ENSO) impacts on the intensity of synoptic events in the BoB during the northeast monsoon are manifested largely through the local SST in the BoB rather than through its signature in the atmospheric circulation over the BoB.

  14. High-Level Location Based Search Services That Improve Discoverability of Geophysical Data in the Virtual ITM Observatory

    NASA Astrophysics Data System (ADS)

    Schaefer, R. K.; Morrison, D.; Potter, M.; Barnes, R. J.; Nylund, S. R.; Patrone, D.; Aiello, J.; Talaat, E. R.; Sarris, T.

    2015-12-01

    The great promise of Virtual Observatories is the ability to perform complex search operations across the metadata of a large variety of different data sets. This allows researchers to isolate and select the relevant measurements for their topic of study. The Virtual ITM Observatory (VITMO) has many diverse geophysical datasets covering a large temporal and spatial range that present a unique search problem. VITMO provides many methods by which the user can search for and select data of interest, including restricting selections based on geophysical conditions (solar wind speed, Kp, etc.) as well as finding datasets that overlap in time. One of the key challenges in improving discoverability is the ability to identify portions of datasets that overlap in both time and location. The difficulty is that location data are not contained in the metadata for datasets produced by satellites, and would be extremely large in volume if they were, making searches for overlapping data very time-consuming. To solve this problem we have developed a series of lightweight web services that can provide a new data search capability for VITMO and others. The services consist of a database of spacecraft ephemerides and instrument fields of view; an overlap calculator to find times when the fields of view of different instruments intersect; and a magnetic field line tracing service that maps in situ and ground-based measurements to the equatorial plane in magnetic coordinates for a number of field models and geophysical conditions. These services run in real time when the user queries for data. They will allow the non-specialist user to select data that they were previously unable to locate, opening up analysis opportunities beyond the instrument teams and specialists and easing entry for future students coming into the field.
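
    The core of an overlap calculator like the one described is interval intersection. The sketch below shows that step in isolation: given the time intervals during which two instruments' fields of view cover a region, it returns the periods where both overlap. The interval data are invented placeholders; the real service also handles the geometry that produces these intervals.

    ```python
    # Sketch of the overlap-calculator core: two-pointer intersection of two
    # sorted lists of (start, end) coverage intervals.
    from datetime import datetime as dt

    def overlaps(a, b):
        """a, b: sorted lists of (start, end) tuples; returns their intersections."""
        out, i, j = [], 0, 0
        while i < len(a) and j < len(b):
            start = max(a[i][0], b[j][0])
            end = min(a[i][1], b[j][1])
            if start < end:                      # non-empty intersection
                out.append((start, end))
            # advance whichever interval ends first
            if a[i][1] < b[j][1]:
                i += 1
            else:
                j += 1
        return out

    sat = [(dt(2015, 1, 1, 0), dt(2015, 1, 1, 6)),
           (dt(2015, 1, 1, 12), dt(2015, 1, 1, 18))]
    radar = [(dt(2015, 1, 1, 4), dt(2015, 1, 1, 14))]
    print(overlaps(sat, radar))   # expect 04:00-06:00 and 12:00-14:00
    ```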

  15. Performance of Machine Learning Algorithms for Qualitative and Quantitative Prediction of Drug Blockade of the hERG1 Channel.

    PubMed

    Wacker, Soren; Noskov, Sergei Yu

    2018-05-01

    Drug-induced abnormal heart rhythm known as Torsades de Pointes (TdP) is a potentially lethal ventricular tachycardia found in many patients. Even newly released anti-arrhythmic drugs, like ivabradine with the HCN channel as a primary target, block the hERG potassium current in an overlapping concentration interval. Promiscuous drug block of the hERG channel may potentially lead to perturbation of the action potential duration (APD) and TdP, especially when combined with polypharmacy and/or electrolyte disturbances. The example of the novel anti-arrhythmic ivabradine illustrates a clinically important and ongoing deficit in drug design and warrants better screening methods. There is an urgent need to develop new approaches for rapid and accurate assessment of how drugs with complex interactions and multiple subcellular targets can predispose to or protect from drug-induced TdP. One unexpected outcome of the compulsory hERG screening implemented in the USA and the European Union is large datasets of IC50 values for various molecules entering the market. These abundant data now allow the construction of predictive machine-learning (ML) models. Novel ML algorithms and techniques promise accuracy in determining IC50 values of hERG blockade that is comparable to or surpasses that of earlier QSAR or molecular modeling techniques. To test the performance of modern ML techniques, we developed a computational platform integrating various workflows for quantitative structure-activity relationship (QSAR) models using data from the ChEMBL database. To establish the predictive power of ML-based algorithms, we computed IC50 values for a large dataset of molecules and compared them to automated patch-clamp measurements for a large dataset of hERG-blocking and non-blocking drugs, an industry gold standard in studies of cardiotoxicity. The optimal protocol with high sensitivity and predictive power is based on the novel eXtreme gradient boosting (XGBoost) algorithm. The ML platform with XGBoost displays excellent performance, with a coefficient of determination of up to R² ~0.8 for pIC50 values in evaluation datasets, surpassing other metrics and approaches available in the literature. Ultimately, the ML-based platform developed in our work is a scalable framework with the potential to be automated and to interact with other developing technologies in the cardiotoxicity field, including high-throughput electrophysiology measurements delivering large datasets of profiled drugs, and rapid synthesis and drug development via progress in synthetic biology.
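
    The abstract names XGBoost regression on QSAR features as the best-performing protocol. Below is a hedged sketch of that kind of workflow: the descriptor matrix is a random placeholder (a real pipeline would first compute molecular descriptors, e.g. from ChEMBL structures), and hyperparameter values are illustrative rather than those of the published platform.

    ```python
    # Sketch of an XGBoost QSAR regressor predicting pIC50 from molecular
    # descriptors. Requires the xgboost and scikit-learn packages.
    import numpy as np
    from sklearn.metrics import r2_score
    from sklearn.model_selection import train_test_split
    from xgboost import XGBRegressor

    rng = np.random.default_rng(42)
    X = rng.normal(size=(2000, 50))                         # placeholder descriptors
    y = X[:, :5].sum(axis=1) + 0.5 * rng.normal(size=2000)  # synthetic pIC50

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    model = XGBRegressor(n_estimators=400, max_depth=5, learning_rate=0.05,
                         subsample=0.8, colsample_bytree=0.8)
    model.fit(X_tr, y_tr)
    print("R^2 on held-out molecules:", round(r2_score(y_te, model.predict(X_te)), 3))
    ```

    In practice the hyperparameters would be tuned by cross-validation against the patch-clamp reference data rather than fixed as above.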

  16. Rainfall variability and extremes over southern Africa: assessment of a climate model to reproduce daily extremes

    NASA Astrophysics Data System (ADS)

    Williams, C.; Kniveton, D.; Layberry, R.

    2009-04-01

    It is increasingly accepted that any possible climate change will not only have an influence on mean climate but may also significantly alter climatic variability. A change in the distribution and magnitude of extreme rainfall events (associated with changing variability), such as droughts or flooding, may have a far greater impact on human and natural systems than a changing mean. This issue is of particular importance for environmentally vulnerable regions such as southern Africa. The subcontinent is considered especially vulnerable to and ill-equipped (in terms of adaptation) for extreme events, due to a number of factors including extensive poverty, famine, disease and political instability. Rainfall variability and the identification of rainfall extremes is a function of scale, so high spatial and temporal resolution data are preferred to identify extreme events and accurately predict future variability. The majority of previous climate model verification studies have compared model output with observational data at monthly timescales. In this research, the assessment of the ability of a state-of-the-art climate model to simulate climate at daily timescales is carried out using satellite-derived rainfall data from the Microwave Infra-Red Algorithm (MIRA). This dataset covers the period from 1993-2002 and the whole of southern Africa at a spatial resolution of 0.1 degree longitude/latitude. The ability of a climate model to simulate current climate provides some indication of how much confidence can be applied to its future predictions. In this paper, simulations of current climate from the UK Meteorological Office Hadley Centre's climate model, in both regional and global mode, are firstly compared to the MIRA dataset at daily timescales. This concentrates primarily on the ability of the model to simulate the spatial and temporal patterns of rainfall variability over southern Africa. Secondly, the ability of the model to reproduce daily rainfall extremes will be assessed, again by a comparison with extremes from the MIRA dataset.

  17. Assessment of the Long Term Trends in Extreme Heat Events and the Associated Health Impacts in the United States

    NASA Astrophysics Data System (ADS)

    Bell, J.; Rennie, J.; Kunkel, K.; Herring, S.; Cullen, H. M.

    2017-12-01

    Land surface air temperature products have been essential for monitoring the evolution of the climate system. Before a temperature dataset is used in such monitoring, it is important that non-climatic influences be removed or adjusted so the dataset can be considered homogeneous. These inhomogeneities include changes in station location, instrumentation and observing practices. While many homogenized products exist at the monthly time scale, few daily products exist, owing to the difficulty of distinguishing breakpoints that are truly inhomogeneous from those occurring solely by chance (for example, sharp changes due to synoptic conditions). Recently, a sub-monthly homogenized dataset has been developed using data and software provided by NOAA's National Centers for Environmental Information (NCEI). Homogeneous daily data are useful for the identification and attribution of extreme heat events over a period of time. Projections of increasing temperatures are expected to result in corresponding increases in the frequency, duration, and intensity of extreme heat events. It is also established that extreme heat events can have significant public health impacts, including short-term increases in mortality and morbidity. In addition, extreme heat can exacerbate chronic health conditions in vulnerable populations, including renal and cardiovascular conditions. To understand how heat events impact a specific population, it will be important to connect observations of the duration and intensity of extreme heat events with health impacts data, including insurance claims and hospital admissions data. This presentation will explain the methodology used to identify extreme heat events, provide a climatology of heat event onset, length and severity, and explore a case study of an anomalous heat event with available health data.

  18. From daily to sub-daily time steps - Creating a high temporal and spatial resolution climate reference data set for hydrological modeling and bias-correction of RCM data

    NASA Astrophysics Data System (ADS)

    Willkofer, Florian; Wood, Raul R.; Schmid, Josef; von Trentini, Fabian; Ludwig, Ralf

    2016-04-01

    The ClimEx project (Climate change and hydrological extreme events - risks and perspectives for water management in Bavaria and Québec) focuses on the effects of climate change on hydro-meteorological extreme events and their implications for water management in Bavaria and Québec. It builds on the conjoint analysis of a large ensemble of the CRCM5, driven by 50 members of the CanESM2, and the latest information provided through the CORDEX initiative, to better assess the influence of natural climate variability and climatic change on the dynamics of extreme events. A critical point in the entire project is the preparation of a meteorological reference dataset with the required temporal (1-6 h) and spatial (500 m) resolution to be able to better evaluate hydrological extreme events in mesoscale river basins. For Bavaria, a first reference dataset (daily, 1 km) used for bias correction of RCM data was created by combining raster-based data (E-OBS [1], HYRAS [2], MARS [3]) and interpolated station data using the meteorological interpolation schemes of the hydrological model WaSiM [4]. Apart from its coarse temporal and spatial resolution, this mosaic of different data sources is considered rather inconsistent and hence not applicable to the modeling of hydrological extreme events. Thus, the objective is to create a dataset with hourly data of temperature, precipitation, radiation, relative humidity and wind speed, which is then used for bias correction of the RCM data driving the hydrological models in the river basins. To this end, daily data are disaggregated to hourly time steps using the 'method of fragments' approach [5], based on available training stations; see the sketch after the references. The disaggregation chooses fragments of daily values from observed hourly datasets, based on similarities in magnitude and in the behavior of previous and subsequent events. The choice of a reference station (hourly data, provision of fragments) for disaggregating daily station data (application of fragments) is crucial, and several methods will be tested to achieve a sound spatial interpolation. The methodology is intended to be applicable to existing or newly developed datasets. References [1] Haylock, M.R., N. Hofstra, A.M.G. Klein Tank, E.J. Klok, P.D. Jones and M. New. A European daily high-resolution gridded dataset of surface temperature and precipitation. J. Geophys. Res. (Atmospheres) (2008), 113, D20119, doi:10.1029/2008JD10201. [2] Rauthe, M., Steiner, H., Riediger, U., Mazurkiewicz, A. and A. Gratzki. A Central European precipitation climatology - Part I: Generation and validation of a high-resolution gridded daily data set (HYRAS). Meteorologische Zeitschrift (2013), 22/3, p.238-256. [3] MARS-AGRI4CAST. AGRI4CAST Interpolated Meteorological Data. http://mars.jrc.ec.europa.eu/mars/About-us/AGRI4CAST/Data-distribution/AGRI4CAST-Interpolated-Meteorological-Data. 2007, last accessed May 10th, 2013. [4] Schulla, J. Model Description WaSiM - Water balance Simulation Model. 2015, available at: http://wasim.ch/en/products/wasim_description.htm. [5] Sharma, A. and S. Srikanthan. Continuous Rainfall Simulation: A Nonparametric Alternative. 30th Hydrology and Water Resources Symposium, Launceston, Tasmania, 4-7 December, 2006.
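
    The sketch below illustrates the method-of-fragments step in its simplest form: for each daily total at a target station, pick the observed day at an hourly reference station with the most similar daily total and reuse its hourly fractions. Station data are synthetic, and the project's actual similarity criteria are richer (season, preceding and subsequent events), so treat this as a minimal sketch.

    ```python
    # Minimal method-of-fragments disaggregation of daily rainfall to hourly.
    import numpy as np

    rng = np.random.default_rng(3)
    ref_hourly = rng.gamma(0.3, 2.0, size=(365, 24))     # reference station (mm/h)
    ref_daily = ref_hourly.sum(axis=1)
    target_daily = rng.gamma(0.5, 8.0, size=30)          # daily totals to disaggregate

    def disaggregate(daily_value):
        k = np.argmin(np.abs(ref_daily - daily_value))   # most similar observed day
        frac = ref_hourly[k] / ref_hourly[k].sum()       # that day's hourly fragments
        return daily_value * frac

    hourly = np.array([disaggregate(v) for v in target_daily])
    assert np.allclose(hourly.sum(axis=1), target_daily)  # daily mass is conserved
    ```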

  19. Beyond Traditional Extreme Value Theory Through a Metastatistical Approach: Lessons Learned from Precipitation, Hurricanes, and Storm Surges

    NASA Astrophysics Data System (ADS)

    Marani, M.; Zorzetto, E.; Hosseini, S. R.; Miniussi, A.; Scaioni, M.

    2017-12-01

    The Generalized Extreme Value (GEV) distribution is widely adopted irrespective of the properties of the stochastic process generating the extreme events. However, GEV presents several limitations, both theoretical (asymptotic validity for a large number of events per year, or the hypothesis of Poisson occurrences of Generalized Pareto events) and practical (fitting uses just yearly maxima or a few values above a high threshold). Here we describe the Metastatistical Extreme Value Distribution (MEVD, Marani & Ignaccolo, 2015), which relaxes the asymptotic and Poisson/GPD assumptions and makes use of all available observations. We then illustrate the flexibility of the MEVD by applying it to daily precipitation, hurricane intensity, and storm surge magnitude. Application to daily rainfall from a global raingauge network shows that MEVD estimates are 50% more accurate than those from GEV when the recurrence interval of interest is much greater than the observational period. This makes MEVD suited for application to satellite rainfall observations (~20 yr record length). Use of MEVD on TRMM data yields extreme event patterns that are in better agreement with surface observations than corresponding GEV estimates. Applied to the HURDAT2 Atlantic hurricane intensity dataset, MEVD significantly outperforms GEV estimates of extreme hurricanes. Interestingly, the Generalized Pareto distribution used for "ordinary" hurricane intensity points to the existence of a maximum limit wind speed that is significantly smaller than corresponding physically-based estimates. Finally, we applied the MEVD approach to water levels generated by tidal fluctuations and storm surges at a set of coastal sites spanning different storm-surge regimes. MEVD yields accurate estimates of large quantiles and inferences on the tail thickness (fat vs. thin) of the underlying distribution of "ordinary" surges. In summary, the MEVD approach presents a number of theoretical and practical advantages and outperforms traditional approaches in several applications. We conclude that the MEVD is a significant contribution to further generalizing extreme value theory, with implications for a broad range of Earth sciences.
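
    A minimal sketch of the MEVD idea, assuming Weibull "ordinary" events as in Marani & Ignaccolo (2015): fit a Weibull to each year's daily values, then average the yearly compound distributions, F(x) = (1/T) Σ_j [F_j(x)]^(n_j). Data are synthetic and estimation details of the published method are omitted.

    ```python
    # MEVD sketch: per-year Weibull fits to all daily events, compound CDF,
    # and return levels by root-finding on the annual non-exceedance probability.
    import numpy as np
    from scipy.optimize import brentq
    from scipy.stats import weibull_min

    rng = np.random.default_rng(7)
    # 20 years of synthetic daily rainfall events (mm), 80-120 wet days per year
    years = [12.0 * rng.weibull(0.8, size=rng.integers(80, 120)) for _ in range(20)]
    fits = [(len(y),) + tuple(weibull_min.fit(y, floc=0)) for y in years]

    def mev_cdf(x):
        # F(x) = mean over years of [Weibull CDF(x)]^(number of events that year)
        return np.mean([weibull_min.cdf(x, c, loc, s) ** n for n, c, loc, s in fits])

    def return_level(T_years):
        p = 1.0 - 1.0 / T_years              # annual non-exceedance probability
        return brentq(lambda x: mev_cdf(x) - p, 1e-6, 1e5)

    for T in (10, 50, 100):
        print(f"T = {T:>3} yr -> {return_level(T):7.1f} mm/day")
    ```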

  20. Phylotranscriptomic consolidation of the jawed vertebrate timetree.

    PubMed

    Irisarri, Iker; Baurain, Denis; Brinkmann, Henner; Delsuc, Frédéric; Sire, Jean-Yves; Kupfer, Alexander; Petersen, Jörn; Jarek, Michael; Meyer, Axel; Vences, Miguel; Philippe, Hervé

    2017-09-01

    Phylogenomics is extremely powerful but introduces new challenges, as no agreement exists on "standards" for data selection, curation and tree inference. We use jawed vertebrates (Gnathostomata) as a model to address these issues. Despite considerable efforts to resolve their evolutionary history and macroevolution, few studies have included a full phylogenetic diversity of gnathostomes, and some relationships remain controversial. We tested a novel bioinformatic pipeline to assemble large and accurate phylogenomic datasets from RNA sequencing and find this phylotranscriptomic approach successful and highly cost-effective. Increased sequencing effort up to ca. 10 Gbp allows more genes to be recovered, but shallower sequencing (1.5 Gbp) is sufficient to obtain thousands of full-length orthologous transcripts. We reconstruct a robust and strongly supported timetree of jawed vertebrates using 7,189 nuclear genes from 100 taxa, including 23 new transcriptomes from previously unsampled key species. Gene jackknifing of genomic data corroborates the robustness of our tree and allows genome-wide divergence times to be calculated by overcoming gene sampling bias. Mitochondrial genomes prove insufficient to resolve the deepest relationships because of limited signal and among-lineage rate heterogeneity. Our analyses emphasize the importance of large curated nuclear datasets for increasing the accuracy of phylogenomics and provide a reference framework for the evolutionary history of jawed vertebrates.

  1. Bayesian selection of misspecified models is overconfident and may cause spurious posterior probabilities for phylogenetic trees.

    PubMed

    Yang, Ziheng; Zhu, Tianqi

    2018-02-20

    The Bayesian method is noted to produce spuriously high posterior probabilities for phylogenetic trees in analyses of large datasets, but the precise reasons for this overconfidence are unknown. In general, the performance of Bayesian selection of misspecified models is poorly understood, even though this is of great scientific interest, since models are never true in real data analysis. Here we characterize the asymptotic behavior of Bayesian model selection and show that when the competing models are equally wrong, Bayesian model selection exhibits surprising and polarized behaviors in large datasets, supporting one model with full force while rejecting the others. If one model is slightly less wrong than the other, the less wrong model will eventually win as the amount of data increases, but the method may become overconfident before it becomes reliable. We suggest that this extreme behavior may be a major factor behind the spuriously high posterior probabilities for evolutionary trees. The philosophical implications of our results for the application of Bayesian model selection to the evaluation of opposing scientific hypotheses are yet to be explored, as are the behaviors of non-Bayesian methods in similar situations.
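
    The polarized behavior described above can be reproduced in a toy simulation (not from the paper): data come from N(0,1) while the two candidate, parameter-free models are N(-0.5,1) and N(+0.5,1), equally wrong by symmetry. The log Bayes factor is then a zero-mean random walk, so the posterior probability lurches toward 0 or 1 as the sample size grows.

    ```python
    # Toy illustration of polarized Bayesian selection between equally wrong models.
    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(1)
    x = rng.normal(0.0, 1.0, size=100_000)              # data from the true model
    # Cumulative log Bayes factor for M1 = N(-0.5, 1) vs M2 = N(+0.5, 1)
    log_bf = np.cumsum(norm.logpdf(x, -0.5) - norm.logpdf(x, 0.5))
    post_m1 = 1.0 / (1.0 + np.exp(-log_bf))             # equal prior odds

    for n in (100, 1_000, 10_000, 100_000):
        print(f"n = {n:>6}:  P(M1 | data) = {post_m1[n - 1]:.6f}")
    ```

    Because the walk's standard deviation grows like the square root of n, the posterior rarely settles near 0.5; it is driven to an extreme even though neither model is better.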

  2. Modeling extreme sea levels due to tropical and extra-tropical cyclones at the global-scale

    NASA Astrophysics Data System (ADS)

    Muis, S.; Lin, N.; Verlaan, M.; Winsemius, H.; Ward, P.; Aerts, J.

    2017-12-01

    Extreme sea levels, a combination of storm surges and astronomical tides, can cause catastrophic floods. Due to their intense wind speeds and low pressure, tropical cyclones (TCs) typically cause higher storm surges than extra-tropical cyclones (ETCs), but ETCs may still contribute significantly to the overall flood risk. In this contribution, we present a novel approach to modeling extreme sea levels due to both tropical and extra-tropical cyclones at the global scale. Using a global hydrodynamic model, we have developed the Global Tide and Surge Reanalysis (GTSR) dataset (Muis et al., 2016), which provides daily maximum time series of storm tide from 1979 to 2014. GTSR is based on wind and pressure fields from the ERA-Interim climate reanalysis (Dee et al., 2011). A severe limitation of the GTSR dataset is its underrepresentation of TCs. This is due to the relatively coarse grid resolution of ERA-Interim, which means that the strong intensities of TCs are not fully resolved. Furthermore, the length of ERA-Interim is too short to estimate the probabilities of extreme TCs reliably. We will discuss potential ways to address this limitation and demonstrate how to improve the global GTSR framework. We will apply the improved framework to the east coast of the United States. First, we improve our meteorological forcing by applying a parametric hurricane model (Holland, 1980), and we improve the tide and surge reanalysis dataset (Muis et al., 2016) by explicitly modeling the historical TCs in the Extended Best Track dataset (Demuth et al., 2006). Second, we improve our sampling by statistically extending the observed TC record to many thousands of years (Emanuel et al., 2006). The improved framework allows for the mapping of probabilities of extreme sea levels, including extreme TC events, for the east coast of the United States. References: Dee et al. (2011). The ERA-Interim reanalysis: configuration and performance of the data assimilation system. Q. J. R. Meteorol. Soc. 137, 553-97. Emanuel et al. (2006). A Statistical Deterministic Approach to Hurricane Risk Assessment. Bull. Am. Meteorol. Soc. 87, 299-314. Holland (1980). An analytic model of the wind and pressure profiles in hurricanes. Mon. Weather Rev. 108, 1212-1218. Muis et al. (2016). A global reanalysis of storm surge and extreme sea levels. Nat. Commun. 7, 1-11.
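
    The Holland (1980) model cited above has a standard closed-form gradient-wind profile, sketched below. The parameter values (B, radius of maximum wind, pressures, latitude) are illustrative only and not taken from the study.

    ```python
    # Holland (1980) parametric gradient-wind profile for a tropical cyclone.
    import numpy as np

    def holland_wind(r_km, p_c=950e2, p_env=1010e2, r_max_km=40.0, B=1.5,
                     lat_deg=30.0, rho=1.15):
        """Gradient wind speed (m/s) at radius r (km) from the storm center."""
        f = 2 * 7.292e-5 * np.sin(np.radians(lat_deg))   # Coriolis parameter
        r = np.asarray(r_km, dtype=float) * 1e3
        a = (r_max_km * 1e3 / r) ** B
        cyclostrophic = B * (p_env - p_c) / rho * a * np.exp(-a)
        return np.sqrt(cyclostrophic + (r * f / 2) ** 2) - r * f / 2

    radii = np.array([10, 20, 40, 80, 160, 320])
    print(np.round(holland_wind(radii), 1))              # peaks near r_max (40 km)
    ```

    Such a profile, centered on best-track positions, supplies the wind and pressure forcing that a coarse reanalysis cannot resolve.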

  3. Human contribution to the United States extreme heatwaves in the coming decades

    NASA Astrophysics Data System (ADS)

    Russo, E.; Marchese, A. F.; Immè, G.; Russo, S.

    2015-12-01

    In the past decades many intense and long heatwaves have hit large areas across the United States, producing notable impacts on human mortality, regional economies, and natural ecosystems. Evidence indicates that anthropogenic climate change will alter the magnitude and frequency of these events. Here, by means of the Heat Wave Magnitude Index daily (HWMId) applied to daily maximum temperature from the United States reanalysis dataset (NLDAS-2), we grade the heat waves that occurred in the U.S. since 1980, demonstrating that the two worst events within the studied period occurred in the summers of 1980 and 2011. Moreover, by taking these two events as reference extremes, we show that model predictions from the North American COordinated Regional climate Downscaling EXperiment (CORDEX) under different IPCC AR5 scenarios suggest an increased risk of occurrence of extreme heat waves in the near future (2021-2050). In particular, under the most severe scenario, events of the same severity as the 1980 and 2011 U.S. heat waves will become more likely in the studied region.

  4. Climate Change Impacts on the Upper Indus Hydrology: Sources, Shifts and Extremes

    PubMed Central

    Immerzeel, W. W.; Kraaijenbrink, P. D. A.; Shrestha, A. B.; Bierkens, M. F. P.

    2016-01-01

    The Indus basin heavily depends on its upstream mountainous part for the downstream supply of water while downstream demands are high. Since downstream demands will likely continue to increase, accurate hydrological projections for the future supply are important. We use an ensemble of statistically downscaled CMIP5 General Circulation Model outputs for RCP4.5 and RCP8.5 to force a cryospheric-hydrological model and generate transient hydrological projections for the entire 21st century for the upper Indus basin. Three methodological advances are introduced: (i) A new precipitation dataset that corrects for the underestimation of high-altitude precipitation is used. (ii) The model is calibrated using data on river runoff, snow cover and geodetic glacier mass balance. (iii) An advanced statistical downscaling technique is used that accounts for changes in precipitation extremes. The analysis of the results focuses on changes in sources of runoff, seasonality and hydrological extremes. We conclude that the future of the upper Indus basin's water availability is highly uncertain in the long run, mainly due to the large spread in the future precipitation projections. Despite large uncertainties in the future climate and long-term water availability, basin-wide patterns and trends of seasonal shifts in water availability are consistent across climate change scenarios. Most prominent is the attenuation of the annual hydrograph and shift from summer peak flow towards the other seasons for most ensemble members. In addition there are distinct spatial patterns in the response that relate to monsoon influence and the importance of meltwater. Analysis of future hydrological extremes reveals that increases in intensity and frequency of extreme discharges are very likely for most of the upper Indus basin and most ensemble members. PMID:27828994

  5. Climate Change Impacts on the Upper Indus Hydrology: Sources, Shifts and Extremes.

    PubMed

    Lutz, A F; Immerzeel, W W; Kraaijenbrink, P D A; Shrestha, A B; Bierkens, M F P

    2016-01-01

    The Indus basin heavily depends on its upstream mountainous part for the downstream supply of water while downstream demands are high. Since downstream demands will likely continue to increase, accurate hydrological projections for the future supply are important. We use an ensemble of statistically downscaled CMIP5 General Circulation Model outputs for RCP4.5 and RCP8.5 to force a cryospheric-hydrological model and generate transient hydrological projections for the entire 21st century for the upper Indus basin. Three methodological advances are introduced: (i) A new precipitation dataset that corrects for the underestimation of high-altitude precipitation is used. (ii) The model is calibrated using data on river runoff, snow cover and geodetic glacier mass balance. (iii) An advanced statistical downscaling technique is used that accounts for changes in precipitation extremes. The analysis of the results focuses on changes in sources of runoff, seasonality and hydrological extremes. We conclude that the future of the upper Indus basin's water availability is highly uncertain in the long run, mainly due to the large spread in the future precipitation projections. Despite large uncertainties in the future climate and long-term water availability, basin-wide patterns and trends of seasonal shifts in water availability are consistent across climate change scenarios. Most prominent is the attenuation of the annual hydrograph and shift from summer peak flow towards the other seasons for most ensemble members. In addition there are distinct spatial patterns in the response that relate to monsoon influence and the importance of meltwater. Analysis of future hydrological extremes reveals that increases in intensity and frequency of extreme discharges are very likely for most of the upper Indus basin and most ensemble members.

  6. Filtering Raw Terrestrial Laser Scanning Data for Efficient and Accurate Use in Geomorphologic Modeling

    NASA Astrophysics Data System (ADS)

    Gleason, M. J.; Pitlick, J.; Buttenfield, B. P.

    2011-12-01

    Terrestrial laser scanning (TLS) represents a new and particularly effective remote sensing technique for investigating geomorphologic processes. Unfortunately, TLS data are commonly characterized by extremely large volume, heterogeneous point distribution, and erroneous measurements, raising challenges for applied researchers. To facilitate efficient and accurate use of TLS in geomorphology, and to improve accessibility for TLS processing in commercial software environments, we are developing a filtering method for raw TLS data to: eliminate data redundancy; produce a more uniformly spaced dataset; remove erroneous measurements; and maintain the ability of the TLS dataset to accurately model terrain. Our method conducts local aggregation of raw TLS data using a 3-D search algorithm based on the geometrical expression of expected random errors in the data. This approach accounts for the estimated accuracy and precision limitations of the instruments and procedures used in data collection, thereby allowing identification and removal of potentially erroneous measurements prior to data aggregation. Initial tests of the proposed technique on a sample TLS point cloud required a modest processing time of approximately 100 minutes to reduce dataset volume by over 90 percent (from 12,380,074 to 1,145,705 points). Preliminary analysis of the filtered point cloud revealed substantial improvement in the homogeneity of the point distribution and minimal degradation of derived terrain models. We will test the method on two independent TLS datasets collected in consecutive years along a non-vegetated reach of the North Fork Toutle River in Washington. We will evaluate the tool using various quantitative, qualitative, and statistical methods. The crux of this evaluation will be a bootstrapping analysis to test the ability of the filtered datasets to model the terrain at roughly the same accuracy as the raw datasets.
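
    A minimal sketch of the local-aggregation idea, under simplifying assumptions: bin points into 3-D cells whose edge length reflects the expected range error and replace each cell's points by their centroid. The published filter additionally screens erroneous returns; the cell size and input cloud here are placeholders.

    ```python
    # Voxel-style 3-D aggregation of a point cloud: one centroid per occupied cell.
    import numpy as np

    def voxel_aggregate(points, cell=0.05):
        """points: (N, 3) array in metres; returns centroids of occupied cells."""
        keys = np.floor(points / cell).astype(np.int64)
        _, inverse, counts = np.unique(keys, axis=0, return_inverse=True,
                                       return_counts=True)
        inverse = inverse.reshape(-1)               # guard against numpy quirks
        sums = np.zeros((counts.size, 3))
        np.add.at(sums, inverse, points)            # accumulate points per cell
        return sums / counts[:, None]

    cloud = np.random.default_rng(0).uniform(0, 1, size=(200_000, 3))
    thinned = voxel_aggregate(cloud, cell=0.05)
    print(cloud.shape[0], "->", thinned.shape[0], "points")
    ```

    A cell count per voxel also makes a simple error screen possible: isolated single-return cells can be flagged as suspect before aggregation.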

  7. Characterizing hydrological hazards and trends with the NASA South Asia Land Data Assimilation System

    NASA Astrophysics Data System (ADS)

    Ghatak, D.; Zaitchik, B. F.; Limaye, A. S.; Searby, N. D.; Doorn, B.; Bolten, J. D.; Toll, D. L.; Lee, S.; Mourad, B.; Narula, K.; Nischal, S.; Iceland, C.; Bajracharya, B.; Kumar, S.; Shrestha, B. R.; Murthy, M.; Hain, C.; Anderson, M. C.

    2015-12-01

    South Asia faces severe challenges in meeting the need for water for agricultural, domestic and industrial purposes while coping with the threats posed by climate and land use/cover changes to regional hydrology. South Asia is also characterized by extreme climate contrasts, remote and poorly monitored headwaters regions, and large uncertainties in estimates of consumptive water withdrawals. Here, we present results from the South Asia Land Data Assimilation System (South Asia LDAS) that apply multiple simulations involving different combinations of forcing datasets, land surface models, and satellite-derived parameter datasets to characterize the distributed water balance of the subcontinent. The South Asia LDAS ensemble of simulations provides a range of uncertainty associated with model products. The system includes customized irrigation schemes to capture water use and HYMAP streamflow routing for application to floods. This presentation focuses on two key application areas for South Asia LDAS: the representation of extreme floods in transboundary rivers, and the estimation of water use in irrigated agriculture. We show that South Asia LDAS captures important features of both phenomena, address opportunities and barriers for the use of South Asia LDAS in decision support, and review uncertainties and limitations. This work is being performed by an interdisciplinary team of scientists and decision makers to ensure that the modeling system meets the needs of decision makers at national and regional levels.

  8. Nonlinear responses of southern African rainfall to forcing from Atlantic SST in a high-resolution regional climate model

    NASA Astrophysics Data System (ADS)

    Williams, C.; Kniveton, D.; Layberry, R.

    2009-04-01

    It is increasingly accepted that any possible climate change will not only have an influence on mean climate but may also significantly alter climatic variability. A change in the distribution and magnitude of extreme rainfall events (associated with changing variability), such as droughts or flooding, may have a far greater impact on human and natural systems than a changing mean. This issue is of particular importance for environmentally vulnerable regions such as southern Africa. The subcontinent is considered especially vulnerable to and ill-equipped (in terms of adaptation) for extreme events, due to a number of factors including extensive poverty, famine, disease and political instability. Rainfall variability is a function of scale, so high spatial and temporal resolution data are preferred to identify extreme events and accurately predict future variability. In this research, high-resolution satellite-derived rainfall data from the Microwave Infra-Red Algorithm (MIRA) are used as a basis for undertaking model experiments using a state-of-the-art regional climate model. The MIRA dataset covers the period from 1993-2002 and the whole of southern Africa at a spatial resolution of 0.1 degree longitude/latitude. Once the model's ability to reproduce extremes has been assessed, idealised regions of sea surface temperature (SST) anomalies are used to force the model, with the overall aim of investigating the ways in which SST anomalies influence rainfall extremes over southern Africa. In this paper, results from sensitivity testing of the regional climate model's domain size are briefly presented, before a comparison of simulated daily rainfall from the model with the satellite-derived dataset. Secondly, simulations of current climate and rainfall extremes from the model are compared to the MIRA dataset at daily timescales. Finally, the results from the idealised SST experiments are presented, suggesting highly nonlinear associations between rainfall extremes and remote SST anomalies.

  9. A Global Drought and Flood Catalogue for the past 100 years

    NASA Astrophysics Data System (ADS)

    Sheffield, J.; He, X.; Peng, L.; Pan, M.; Fisher, C. K.; Wood, E. F.

    2017-12-01

    Extreme hydrological events cause the greatest impacts of any natural hazard globally, affecting a wide range of sectors including, most prominently, agriculture, food security and water availability and quality, but also energy production, forestry, health, transportation and fisheries. Understanding how floods and droughts intersect and how they have changed in the past provides the basis for understanding current risk and how it may change in the future. Doing so requires an understanding of the mechanisms associated with events and therefore their predictability, attribution of long-term changes in risk, and quantification of projections of changes in the future. Of key importance are long-term records of relevant variables so that risk can be quantified more accurately, given the growing acknowledgement that risk is not stationary under long-term climate variability and climate change. To address this, we develop a catalogue of drought and flood events based on land surface and hydrodynamic modeling, forced by a hybrid meteorological dataset that draws on the continuity and coverage of reanalysis and satellite datasets, merged with global gauge databases. The meteorological dataset is corrected for temporal inhomogeneities, spurious trends and variable inter-dependencies to ensure long-term consistency, as well as realistic representation of short-term variability and extremes. The VIC land surface model is run for the past 100 years at 0.25-degree resolution for global land areas. The VIC runoff is then used to drive the CaMa-Flood hydrodynamic model to obtain information on flood inundation risk. The model outputs are compared to satellite-based estimates of flood and drought conditions and to the observational flood record. The data are analyzed in terms of the spatio-temporal characteristics of large-scale flood and drought events, with a particular focus on characterizing the long-term variability in risk. Significant changes in risk occur on multi-decadal time scales and are mostly associated with variability in the North Atlantic and Pacific. The catalogue can be used for analysis of extreme events, risk assessment, and as a benchmark for model evaluation.

  10. Topic modeling for cluster analysis of large biological and medical datasets

    PubMed Central

    2014-01-01

    Background: The big data moniker is nowhere better deserved than in describing the ever-increasing prodigiousness and complexity of biological and medical datasets. New methods are needed to generate and test hypotheses, foster biological interpretation, and build validated predictors. Although multivariate techniques such as cluster analysis may allow researchers to identify groups, or clusters, of related variables, the accuracy and effectiveness of traditional clustering methods diminish for large and high-dimensional datasets. Topic modeling is an active research field in machine learning and has mainly been used as an analytical tool to structure large textual corpora for data mining. Its ability to reduce high dimensionality to a small number of latent variables makes it suitable as a means for clustering or for overcoming clustering difficulties in large biological and medical datasets. Results: In this study, three topic model-derived clustering methods, highest probable topic assignment, feature selection and feature extraction, are proposed and tested on the cluster analysis of three large datasets: a Salmonella pulsed-field gel electrophoresis (PFGE) dataset, a lung cancer dataset, and a breast cancer dataset, which represent various types of large biological or medical datasets. All three methods are shown to improve the efficacy and effectiveness of clustering results on the three datasets in comparison to traditional methods. A preferable cluster analysis method emerged for each of the three datasets on the basis of replicating known biological truths. Conclusion: Topic modeling could be advantageously applied to the large datasets of biological or medical research. The three proposed topic model-derived clustering methods, highest probable topic assignment, feature selection and feature extraction, yield clustering improvements for the three different data types. Clusters more efficaciously represent truthful groupings and subgroupings in the data than traditional methods, suggesting that topic model-based methods could provide an analytic advancement in the analysis of large biological or medical datasets. PMID:25350106

  11. Topic modeling for cluster analysis of large biological and medical datasets.

    PubMed

    Zhao, Weizhong; Zou, Wen; Chen, James J

    2014-01-01

    The big data moniker is nowhere better deserved than in describing the ever-increasing prodigiousness and complexity of biological and medical datasets. New methods are needed to generate and test hypotheses, foster biological interpretation, and build validated predictors. Although multivariate techniques such as cluster analysis may allow researchers to identify groups, or clusters, of related variables, the accuracy and effectiveness of traditional clustering methods diminish for large and high-dimensional datasets. Topic modeling is an active research field in machine learning and has mainly been used as an analytical tool to structure large textual corpora for data mining. Its ability to reduce high dimensionality to a small number of latent variables makes it suitable as a means for clustering or for overcoming clustering difficulties in large biological and medical datasets. In this study, three topic model-derived clustering methods, highest probable topic assignment, feature selection and feature extraction, are proposed and tested on the cluster analysis of three large datasets: a Salmonella pulsed-field gel electrophoresis (PFGE) dataset, a lung cancer dataset, and a breast cancer dataset, which represent various types of large biological or medical datasets. All three methods are shown to improve the efficacy and effectiveness of clustering results on the three datasets in comparison to traditional methods. A preferable cluster analysis method emerged for each of the three datasets on the basis of replicating known biological truths. Topic modeling could be advantageously applied to the large datasets of biological or medical research. The three proposed topic model-derived clustering methods, highest probable topic assignment, feature selection and feature extraction, yield clustering improvements for the three different data types. Clusters more efficaciously represent truthful groupings and subgroupings in the data than traditional methods, suggesting that topic model-based methods could provide an analytic advancement in the analysis of large biological or medical datasets.
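
    The simplest of the three methods, highest probable topic assignment, can be sketched directly: fit a topic model to a count matrix (samples by features) and assign each sample to its most probable topic. The count data below are synthetic stand-ins for, e.g., PFGE band patterns; the study's preprocessing and model choices are not reproduced.

    ```python
    # Highest-probable-topic clustering via LDA on a samples-by-features count matrix.
    import numpy as np
    from sklearn.decomposition import LatentDirichletAllocation

    rng = np.random.default_rng(0)
    # Two planted groups of 50 samples with different feature-usage profiles
    X = np.vstack([rng.poisson(lam=[8] * 10 + [1] * 10, size=(50, 20)),
                   rng.poisson(lam=[1] * 10 + [8] * 10, size=(50, 20))])

    lda = LatentDirichletAllocation(n_components=2, random_state=0)
    theta = lda.fit_transform(X)            # per-sample topic proportions
    clusters = theta.argmax(axis=1)         # highest probable topic = cluster label
    print(np.bincount(clusters[:50], minlength=2),
          np.bincount(clusters[50:], minlength=2))
    ```

    The other two methods reuse the same fitted model: feature selection keeps the top features per topic, while feature extraction clusters samples in the reduced topic-proportion space (theta) instead of the raw counts.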

  12. Trends and variability of daily precipitation extremes during 1960-2012 in the Yangtze River Basin, China

    USDA-ARS?s Scientific Manuscript database

    Trends and variability of extreme precipitation events are important for water-related disaster prevention and mitigation as well as water resource management. Based on daily precipitation dataset from 143 meteorological stations in the Yangtze River Basin (YRB), a suite of precipitation indices rec...

  13. IMPACTS OF CLIMATE-INDUCED CHANGES IN EXTREME EVENTS ON OZONE AND PARTICULATE MATTER AIR QUALITY

    EPA Science Inventory

    Historical data records of air pollution meteorology from multiple datasets will be compiled and analyzed to identify possible trends in extreme events. Changes in climate and air quality between 2010 and 2050 will be simulated with a suite of models. The consequential effe...

  14. Towards a monitoring system of temperature extremes in Europe

    NASA Astrophysics Data System (ADS)

    Lavaysse, Christophe; Cammalleri, Carmelo; Dosio, Alessandro; van der Schrier, Gerard; Toreti, Andrea; Vogt, Jürgen

    2018-01-01

    Extreme-temperature anomalies such as heat and cold waves may have strong impacts on human activities and health. The heat waves in western Europe in 2003 and in Russia in 2010, and the cold wave in southeastern Europe in 2012, generated considerable economic loss and resulted in the deaths of several thousand people. Providing an operational system to monitor extreme-temperature anomalies in Europe is thus of prime importance to help decision makers and emergency services respond to an unfolding extreme event. In this study, the development and validation of a monitoring system for extreme-temperature anomalies are presented. The first part of the study describes the methodology, which is based on the persistence of events exceeding a percentile threshold. The method is applied to three different observational datasets in order to assess its robustness and highlight uncertainties in the observations. The climatology of extreme events over the last 21 years is then analysed to highlight the spatial and temporal variability of the hazard, and discrepancies amongst the observational datasets are discussed. In the last part of the study, the products derived from this work are presented and discussed with respect to previous studies. The results highlight the accuracy of the developed index and the statistical robustness of the distribution used to calculate the return periods.
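
    A minimal sketch of persistence-above-percentile detection as described: flag days exceeding a high percentile and keep runs of at least three consecutive flagged days as events. The 90th-percentile threshold and minimum duration are assumptions; an operational index would also use calendar-day percentile windows.

    ```python
    # Detect extreme-temperature events as persistent exceedances of a percentile.
    import numpy as np

    def detect_events(t_series, pct=90.0, min_len=3):
        thresh = np.percentile(t_series, pct)
        hot = t_series > thresh
        events, start = [], None
        for i, h in enumerate(np.append(hot, False)):   # sentinel closes last run
            if h and start is None:
                start = i
            elif not h and start is not None:
                if i - start >= min_len:
                    events.append((start, i - 1))       # inclusive day indices
                start = None
        return thresh, events

    temps = np.random.default_rng(5).normal(20, 4, size=365)
    temps[200:206] += 12                                # implant a heat wave
    print(detect_events(temps))
    ```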

  15. Analyses of Sporocarps, Morphotyped Ectomycorrhizae, Environmental ITS and LSU Sequences Identify Common Genera that Occur at a Periglacial Site

    PubMed Central

    Jumpponen, Ari; Brown, Shawn P.; Trappe, James M.; Cázares, Efrén; Strömmer, Rauni

    2015-01-01

    Periglacial substrates exposed by retreating glaciers represent extreme and sensitive environments defined by a variety of abiotic stressors that challenge organismal establishment and survival. The simple communities that often reside at these sites permit in-depth analyses. We utilized existing data, mining published sporocarp and morphotyped ectomycorrhizae (ECM) records as well as environmental sequence data of the internal transcribed spacer (ITS) and large subunit (LSU) regions of the ribosomal RNA gene, to identify taxa that occur at a glacier forefront in the North Cascades Mountains in Washington State, USA. The discrete data types consistently identified several common and widely distributed genera, perhaps best exemplified by Inocybe and Laccaria. Although we expected low diversity and richness, our environmental sequence data included 37 ITS and 26 LSU operational taxonomic units (OTUs) that likely form ECM. While environmental surveys of metabarcode markers detected large numbers of targeted ECM taxa, both the fruiting body and the morphotype datasets included genera that were undetected in either of the metabarcode datasets. These included hypogeous (Hymenogaster) and epigeous (Lactarius) taxa, some of which may produce large sporocarps but may possess small and/or spatially patchy genets. We highlight the importance of combining various data types to provide a comprehensive view of a fungal community, even in an environment assumed to host communities of low species richness and diversity. PMID:29376900

  16. A global dataset of sub-daily rainfall indices

    NASA Astrophysics Data System (ADS)

    Fowler, H. J.; Lewis, E.; Blenkinsop, S.; Guerreiro, S.; Li, X.; Barbero, R.; Chan, S.; Lenderink, G.; Westra, S.

    2017-12-01

    It is still uncertain how hydrological extremes will change with global warming, as we do not fully understand the processes that cause extreme precipitation under current climate variability. The INTENSE project is using a novel and fully integrated data-modelling approach to provide a step change in our understanding of the nature and drivers of global precipitation extremes and change on societally relevant timescales, leading to improved high-resolution climate model representation of extreme rainfall processes. The INTENSE project is run in conjunction with the World Climate Research Programme (WCRP)'s Grand Challenge on 'Understanding and Predicting Weather and Climate Extremes' and the Global Water and Energy Exchanges Project (GEWEX) science questions. A new global sub-daily precipitation dataset has been constructed (data collection is ongoing). Metadata have been calculated for each station, detailing record lengths, missing data, and station locations. A set of global hydroclimatic indices has been produced based upon stakeholder recommendations, including indices that describe maximum rainfall totals and timing; the intensity, duration and frequency of storms; the frequency of storms above specific thresholds; and information about the diurnal cycle. This will provide a unique global data resource on sub-daily precipitation whose derived indices will be freely available to the wider scientific community.
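
    A few of the named index families can be computed directly from a station's hourly series, as sketched below: maximum 1 h and running 24 h totals, a count of hours above a threshold, and the peak hour of the mean diurnal cycle. The threshold and the synthetic series are placeholders, and the project's exact index definitions may differ.

    ```python
    # Example sub-daily rainfall indices from one year of hourly data.
    import numpy as np

    rng = np.random.default_rng(9)
    hourly = rng.gamma(0.1, 3.0, size=365 * 24)          # mm per hour

    rx1h = hourly.max()                                           # max 1 h total
    rx24h = np.convolve(hourly, np.ones(24), mode="valid").max()  # max 24 h total
    hours_over_10mm = int((hourly > 10.0).sum())                  # threshold count
    diurnal_peak_hour = int(hourly.reshape(365, 24).mean(axis=0).argmax())

    print(f"Rx1h={rx1h:.1f} mm  Rx24h={rx24h:.1f} mm  "
          f"hours>10mm={hours_over_10mm}  peak hour={diurnal_peak_hour}")
    ```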

  17. Rainfall variability and extremes over southern Africa: Assessment of a climate model to reproduce daily extremes

    NASA Astrophysics Data System (ADS)

    Williams, C. J. R.; Kniveton, D. R.; Layberry, R.

    2009-04-01

    It is increasingly accepted that any possible climate change will not only have an influence on mean climate but may also significantly alter climatic variability. A change in the distribution and magnitude of extreme rainfall events (associated with changing variability), such as droughts or flooding, may have a far greater impact on human and natural systems than a changing mean. This issue is of particular importance for environmentally vulnerable regions such as southern Africa. The subcontinent is considered especially vulnerable to and ill-equipped (in terms of adaptation) for extreme events, due to a number of factors including extensive poverty, famine, disease and political instability. Rainfall variability and the identification of rainfall extremes is a function of scale, so high spatial and temporal resolution data are preferred to identify extreme events and accurately predict future variability. The majority of previous climate model verification studies have compared model output with observational data at monthly timescales. In this research, the assessment of the ability of a state-of-the-art climate model to simulate climate at daily timescales is carried out using satellite-derived rainfall data from the Microwave Infra-Red Algorithm (MIRA). This dataset covers the period from 1993-2002 and the whole of southern Africa at a spatial resolution of 0.1 degree longitude/latitude. The ability of a climate model to simulate current climate provides some indication of how much confidence can be applied to its future predictions. In this paper, simulations of current climate from the UK Meteorological Office Hadley Centre's climate model, in both regional and global mode, are firstly compared to the MIRA dataset at daily timescales. This concentrates primarily on the ability of the model to simulate the spatial and temporal patterns of rainfall variability over southern Africa. Secondly, the ability of the model to reproduce daily rainfall extremes will be assessed, again by a comparison with extremes from the MIRA dataset. The paper will conclude by discussing the user needs of satellite rainfall retrievals from a climate change modelling perspective.

  18. Assessment of a climate model to reproduce rainfall variability and extremes over Southern Africa

    NASA Astrophysics Data System (ADS)

    Williams, C. J. R.; Kniveton, D. R.; Layberry, R.

    2010-01-01

    It is increasingly accepted that any possible climate change will not only have an influence on mean climate but may also significantly alter climatic variability. A change in the distribution and magnitude of extreme rainfall events (associated with changing variability), such as droughts or flooding, may have a far greater impact on human and natural systems than a changing mean. This issue is of particular importance for environmentally vulnerable regions such as southern Africa. The sub-continent is considered especially vulnerable to and ill-equipped (in terms of adaptation) for extreme events, due to a number of factors including extensive poverty, famine, disease and political instability. Rainfall variability and the identification of rainfall extremes is a function of scale, so high spatial and temporal resolution data are preferred to identify extreme events and accurately predict future variability. The majority of previous climate model verification studies have compared model output with observational data at monthly timescales. In this research, the assessment of the ability of a state-of-the-art climate model to simulate climate at daily timescales is carried out using satellite-derived rainfall data from the Microwave Infrared Rainfall Algorithm (MIRA). This dataset covers the period from 1993 to 2002 and the whole of southern Africa at a spatial resolution of 0.1° longitude/latitude. This paper concentrates primarily on the ability of the model to simulate the spatial and temporal patterns of present-day rainfall variability over southern Africa and is not intended to discuss possible future changes in climate as these have been documented elsewhere. Simulations of current climate from the UK Meteorological Office Hadley Centre's climate model, in both regional and global mode, are firstly compared to the MIRA dataset at daily timescales. Secondly, the ability of the model to reproduce daily rainfall extremes is assessed, again by a comparison with extremes from the MIRA dataset. The results suggest that the model reproduces the number and spatial distribution of rainfall extremes with some accuracy, but that mean rainfall and rainfall variability are under-estimated (over-estimated) over wet (dry) regions of southern Africa.

  19. Representing Extremes in Agricultural Models

    NASA Technical Reports Server (NTRS)

    Ruane, Alex

    2015-01-01

    AgMIP and related projects are conducting several activities to understand and improve crop model response to extreme events. This involves crop model studies as well as the generation of climate datasets and scenarios more capable of capturing extremes. Models are typically less responsive to extreme events than observations indicate, and they miss several forms of extreme events. Models can also capture interactive effects between climate change and climate extremes. Additional work is needed to understand the response of markets and economic systems to food shocks. AgMIP is planning a Coordinated Global and Regional Assessment of Climate Change Impacts on Agricultural Production and Food Security, with the aim of informing the IPCC Sixth Assessment Report.

  20. A large set of potential past, present and future hydro-meteorological time series for the UK

    NASA Astrophysics Data System (ADS)

    Guillod, Benoit P.; Jones, Richard G.; Dadson, Simon J.; Coxon, Gemma; Bussi, Gianbattista; Freer, James; Kay, Alison L.; Massey, Neil R.; Sparrow, Sarah N.; Wallom, David C. H.; Allen, Myles R.; Hall, Jim W.

    2018-01-01

    Hydro-meteorological extremes such as drought and heavy precipitation can have large impacts on society and the economy. With potentially increasing risks associated with such events due to climate change, properly assessing the associated impacts and uncertainties is critical for adequate adaptation. However, the application of risk-based approaches often requires large sets of extreme events, which are not commonly available. Here, we present such a large set of hydro-meteorological time series for recent past and future conditions for the United Kingdom based on weather@home 2, a modelling framework consisting of a global climate model (GCM) driven by observed or projected sea surface temperature (SST) and sea ice, which is downscaled to 25 km over the European domain by a regional climate model (RCM). Sets of 100 time series are generated for each of (i) a historical baseline (1900-2006), (ii) five near-future scenarios (2020-2049) and (iii) five far-future scenarios (2070-2099). The five scenarios in each future time slice all follow the Representative Concentration Pathway 8.5 (RCP8.5) and sample the range of sea surface temperature and sea ice changes from CMIP5 (Coupled Model Intercomparison Project Phase 5) models. Validation of the historical baseline highlights good performance for temperature and potential evaporation, but substantial seasonal biases in mean precipitation, which are corrected using a linear approach. For extremes in low precipitation over long accumulation periods (> 3 months) and shorter-duration high precipitation (1-30 days), the time series generally represent past statistics well. Future projections show small precipitation increases in winter but large decreases in summer on average, leading to an overall drying, consistent with the most recent UK Climate Projections (UKCP09) but larger in magnitude. Both drought and high-precipitation events are projected to increase in frequency and intensity in most regions, highlighting the need for appropriate adaptation measures. Overall, the presented dataset is a useful tool for assessing the risk associated with drought and, more generally, with hydro-meteorological extremes in the UK.
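
    The "linear approach" to precipitation bias correction is not detailed in the abstract; a common minimal form is a monthly multiplicative factor that maps the model's climatological monthly mean onto the observed one, as sketched below with placeholder arrays.

    ```python
    # Monthly multiplicative (linear scaling) bias correction of precipitation.
    import numpy as np

    rng = np.random.default_rng(2)
    months = np.tile(np.arange(12), 30)                 # 30 years of monthly values
    model = rng.gamma(2.0, 40.0, size=months.size)      # modelled precip (mm/month)
    obs = rng.gamma(2.0, 50.0, size=months.size)        # observational reference

    corrected = model.copy()
    for m in range(12):
        sel = months == m
        factor = obs[sel].mean() / model[sel].mean()    # one factor per calendar month
        corrected[sel] *= factor

    print(np.round([model.mean(), obs.mean(), corrected.mean()], 1))
    ```

    By construction this preserves the model's relative variability within each month while matching the observed monthly climatology, which is why it leaves extremes comparatively untouched.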

  1. Refining multi-model projections of temperature extremes by evaluation against land-atmosphere coupling diagnostics

    NASA Astrophysics Data System (ADS)

    Sippel, Sebastian; Zscheischler, Jakob; Mahecha, Miguel D.; Orth, Rene; Reichstein, Markus; Vogel, Martha; Seneviratne, Sonia I.

    2017-05-01

    The Earth's land surface and the atmosphere are strongly interlinked through the exchange of energy and matter. This coupled behaviour causes various land-atmosphere feedbacks, and an insufficient understanding of these feedbacks contributes to uncertain global climate model projections. For example, a crucial role of the land surface in exacerbating summer heat waves in midlatitude regions has been identified empirically for high-impact heat waves, but individual climate models differ widely in their respective representation of land-atmosphere coupling. Here, we compile an ensemble of 54 combinations of observations-based temperature (T) and evapotranspiration (ET) benchmarking datasets and investigate coincidences of T anomalies with ET anomalies as a proxy for land-atmosphere interactions during periods of anomalously warm temperatures. First, we demonstrate that a large fraction of state-of-the-art climate models from the Coupled Model Intercomparison Project (CMIP5) archive produces systematically too frequent coincidences of high T anomalies with negative ET anomalies in midlatitude regions during the warm season and in several tropical regions year-round. These coincidences (high T, low ET) are closely related to the representation of temperature variability and extremes across the multi-model ensemble. Second, we derive a land-coupling constraint based on the spread of the T-ET datasets and consequently retain only a subset of CMIP5 models that produce a land-coupling behaviour that is compatible with these benchmark estimates. The constrained multi-model simulations exhibit more realistic temperature extremes of reduced magnitude in present climate in regions where models show substantial spread in T-ET coupling, i.e. biases in the model ensemble are consistently reduced. The multi-model simulations for the coming decades also display decreased absolute temperature extremes in the constrained ensemble. On the other hand, the differences between projected and present-day climate extremes are affected to a lesser extent by the applied constraint, i.e. projected changes are reduced locally by around 0.5 to 1 °C - but this remains a local effect in regions that are highly sensitive to land-atmosphere coupling. In summary, our approach offers a physically consistent, diagnostic-based avenue to evaluate multi-model ensembles and subsequently reduce model biases in simulated and projected extreme temperatures.
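
    As a concrete illustration of the coincidence diagnostic described above, a minimal sketch (with assumed monthly anomaly series for a single region; not the authors' code) might look as follows.

        import numpy as np

        def coincidence_rate(t_anom, et_anom, t_quantile=0.9):
            """Fraction of anomalously warm periods (T anomaly above its
            t_quantile) that coincide with a negative ET anomaly."""
            t_anom, et_anom = np.asarray(t_anom), np.asarray(et_anom)
            hot = t_anom >= np.quantile(t_anom, t_quantile)
            return float(np.mean(et_anom[hot] < 0.0))

        # Models whose rate falls outside the spread of the 54 observation-based
        # T-ET combinations would be excluded from the constrained ensemble.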

  2. The 3D Reference Earth Model: Status and Preliminary Results

    NASA Astrophysics Data System (ADS)

    Moulik, P.; Lekic, V.; Romanowicz, B. A.

    2017-12-01

    In the 20th century, seismologists constructed models of how average physical properties (e.g. density, rigidity, compressibility, anisotropy) vary with depth in the Earth's interior. These one-dimensional (1D) reference Earth models (e.g. PREM) have proven indispensable in earthquake location, imaging of interior structure, understanding material properties under extreme conditions, and as a reference in other fields, such as particle physics and astronomy. Over the past three decades, new datasets motivated more sophisticated efforts that yielded models of how properties vary both laterally and with depth in the Earth's interior. Though these three-dimensional (3D) models exhibit compelling similarities at large scales, differences in the methodology, representation of structure, and datasets upon which they are based have prevented the creation of 3D community reference models. As part of the REM-3D project, we are compiling and reconciling reference seismic datasets of body wave travel-time measurements, fundamental mode and overtone surface wave dispersion measurements, and normal mode frequencies and splitting functions. These reference datasets are being inverted for a long-wavelength 3D reference Earth model that describes the robust long-wavelength features of mantle heterogeneity. As a community reference model with fully quantified uncertainties and tradeoffs and an associated publicly available dataset, REM-3D will facilitate Earth imaging studies, earthquake characterization, and inferences on temperature and composition in the deep interior, and will be of improved utility to emerging scientific endeavors, such as neutrino geoscience. Here, we summarize progress made in the construction of the long-period reference dataset and present a preliminary version of REM-3D in the upper mantle. In order to determine the level of detail warranted for inclusion in REM-3D, we analyze the spectrum of discrepancies between models inverted with different subsets of the reference dataset. This procedure allows us to evaluate the extent of consistency in imaging heterogeneity at various depths and between spatial scales.

  3. Comparison and validation of gridded precipitation datasets for Spain

    NASA Astrophysics Data System (ADS)

    Quintana-Seguí, Pere; Turco, Marco; Míguez-Macho, Gonzalo

    2016-04-01

    In this study, two gridded precipitation datasets are compared and validated in Spain: the recently developed SAFRAN dataset and the Spain02 dataset. These are validated using rain gauges and they are also compared to the low-resolution ERA-Interim reanalysis. The SAFRAN precipitation dataset has been recently produced using the SAFRAN meteorological analysis, which is extensively used in France (Durand et al. 1993, 1999; Quintana-Seguí et al. 2008; Vidal et al., 2010) and which has recently been applied to Spain (Quintana-Seguí et al., 2015). SAFRAN uses an optimal interpolation (OI) algorithm and all available rain gauges from the Spanish State Meteorological Agency (Agencia Estatal de Meteorología, AEMET). The product has a spatial resolution of 5 km and spans from September 1979 to August 2014. This dataset has been produced mainly for use in large-scale hydrological applications. Spain02 (Herrera et al. 2012, 2015) is another high-quality precipitation dataset for Spain based on a dense network of quality-controlled stations, with different versions at different resolutions. In this study we used the version with a resolution of 0.11°. The product spans from 1971 to 2010. Spain02 is well tested and widely used, mainly, but not exclusively, for RCM model validation and statistical downscaling. ERA-Interim is a well-known global reanalysis with a spatial resolution of ~79 km. It has been included in the comparison because it is a widely used product for continental and global scale studies, and also in smaller-scale studies in data-poor countries. Thus, its comparison with higher-resolution products of a data-rich country, such as Spain, allows us to quantify the errors made when using such datasets for national-scale studies, in line with some of the objectives of the EU-FP7 eartH2Observe project. The comparison shows that SAFRAN and Spain02 perform similarly, even though their underlying principles are different. Both products are substantially better than ERA-Interim, which has a much coarser representation of the relief, which is crucial for precipitation. These results are a contribution to the Spanish Case Study of the eartH2Observe project, which is focused on the simulation of drought processes in Spain using Land-Surface Models (LSM). This study will also be helpful in the Spanish MARCO project, which aims at improving the ability of RCMs to simulate hydrometeorological extremes.
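
    As a minimal illustration of this kind of gauge-based validation, the sketch below extracts the grid cell nearest a gauge and computes simple skill metrics; array names and shapes are assumptions, not the study's actual pipeline.

        import numpy as np

        def nearest_cell_series(grid, lats, lons, g_lat, g_lon):
            """grid: (time, lat, lon); returns the time series of the cell
            nearest to a gauge at (g_lat, g_lon)."""
            i = np.argmin(np.abs(lats - g_lat))
            j = np.argmin(np.abs(lons - g_lon))
            return grid[:, i, j]

        def skill(sim, obs):
            sim, obs = np.asarray(sim), np.asarray(obs)
            bias = np.mean(sim - obs)
            rmse = np.sqrt(np.mean((sim - obs) ** 2))
            corr = np.corrcoef(sim, obs)[0, 1]
            return bias, rmse, corr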

  4. Application of a fast skyline computation algorithm for serendipitous searching problems

    NASA Astrophysics Data System (ADS)

    Koizumi, Kenichi; Hiraki, Kei; Inaba, Mary

    2018-02-01

    Skyline computation is a method of extracting interesting entries from a large population with multiple attributes. These entries, called skyline or Pareto-optimal entries, are known to have extreme characteristics that cannot be found by outlier detection methods. Skyline computation is an important task for characterizing large amounts of data and selecting interesting entries with extreme features. When the population changes dynamically, the task of calculating a sequence of skyline sets is called continuous skyline computation. This task is known to be difficult to perform for the following reasons: (1) information on non-skyline entries must be stored, since they may join the skyline in the future; (2) the appearance or disappearance of even a single entry can change the skyline drastically; (3) it is difficult to adopt a geometric acceleration algorithm for skyline computation tasks with high-dimensional datasets. Our new algorithm, called jointed rooted-tree (JR-tree), manages entries using a rooted tree structure. JR-tree delays extending the tree to deep levels in order to accelerate tree construction and traversal. In this study, we present the difficulties in extracting entries tagged with a rare label in high-dimensional space and the potential of fast skyline computation in low-latency cell identification technology.
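
    To make the notion of a skyline concrete, the following is a minimal quadratic-time sketch of skyline extraction (the JR-tree algorithm of the paper is far more elaborate); it assumes larger attribute values are preferred in every dimension.

        import numpy as np

        def dominates(a, b):
            """a dominates b if a >= b in every attribute and > in at least one."""
            return bool(np.all(a >= b) and np.any(a > b))

        def skyline(points):
            pts = np.asarray(points, dtype=float)
            return [list(p) for i, p in enumerate(pts)
                    if not any(dominates(q, p) for j, q in enumerate(pts) if j != i)]

        # skyline([[1, 5], [3, 3], [2, 4], [4, 1], [2, 2]])
        # -> [[1.0, 5.0], [3.0, 3.0], [2.0, 4.0], [4.0, 1.0]]  ([2, 2] is dominated)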

  5. Characterization of extreme years in Central Europe between 2000 and 2016 according to specific vegetation characteristics based on Earth Observation data

    NASA Astrophysics Data System (ADS)

    Kern, Anikó; Marjanović, Hrvoje; Barcza, Zoltán

    2017-04-01

    Extreme weather events frequently occur in Central Europe, affecting the state of the vegetation in large areas. Droughts and heat-waves affect all plant functional types, but the response of the vegetation is not uniform and depends on other parameters, plant strategies and the antecedent meteorological conditions as well. Meteorologists struggle with the definition of extreme events and the selection of years that can be considered extreme in terms of meteorological conditions, due to the large variability of the meteorological parameters in both time and space. One way to overcome this problem is to define extreme weather based on its observed effect on plant state. The Normalized Difference Vegetation Index (NDVI), the Enhanced Vegetation Index (EVI), the Leaf Area Index (LAI), the Fraction of Photosynthetically Active Radiation (FPAR) and the Gross Primary Production (GPP) are different measures of the land vegetation derived from remote sensing data, providing information about the plant state, but it is less well known how weather anomalies affect these measures. We used the official vegetation products created from the measurements of the MODerate resolution Imaging Spectroradiometer (MODIS) on board the Terra satellite to select and characterize the extreme years in Central European countries during the 2000-2016 time period. The applied Collection-6 MOD13 NDVI/EVI, MOD15 LAI/FPAR and MOD17 GPP datasets have 500 m × 500 m spatial resolution covering the region of the Carpathian Basin. After quality and noise filtering (and temporal interpolation in the case of MOD13), 8-day anomaly values were derived to investigate the different years. The freely available FORESEE meteorological database was used to study climate variability in the region. Daily precipitation and maximum/minimum temperature fields on a 1/12° × 1/12° grid were resampled to the 8-day temporal and 500 m × 500 m spatial resolution of the MODIS products. To discriminate the different behavior of the various plant functional types, the MODIS (MCD12) and CORINE (CLC2012) land cover datasets were applied and handled together. Based on the determination of reliable pixels for the different plant types, the responses of broadleaf forests, coniferous forests, grasslands and croplands were discriminated and investigated. Characteristic time periods were selected based on the remote sensing data to define anomalies, and the meteorological data were then used to define the critical time periods within the year that have the strongest effect on the observed anomalies. Similarities/dissimilarities between the behaviors of the different remotely sensed measures are also studied to elucidate the consistency of the indices. The results indicate that the diverse remote sensing indices typically co-vary but reveal a strong plant functional type dependency. The study suggests that the selection of extreme years based on annual data is not the best choice, as shorter time periods within the years explain the anomalies to a higher degree than annual data. The results can be used to select anomalous years outside of the satellite era as well. Keywords: remote sensing; meteorology; extreme years; MODIS; NDVI; EVI; LAI; FPAR; GPP; phenology
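
    A minimal sketch of the 8-day anomaly step described above, with a hypothetical per-pixel array (not the study's code): anomalies are departures from the multi-year mean of each of the 46 8-day compositing periods in a year.

        import numpy as np

        def eight_day_anomalies(series):
            """series: (n_years, 46) quality-filtered 8-day values for one pixel;
            returns departures from the multi-year mean of each 8-day period."""
            clim = np.nanmean(series, axis=0)      # mean seasonal cycle, 46 values
            return series - clim[None, :]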

  6. Widespread extreme drought events in Iberia and their relationship with North Atlantic moisture flux deficit

    NASA Astrophysics Data System (ADS)

    Liberato, Margarida L. R.; Montero, Irene; Russo, Ana; Gouveia, Célia; Ramos, Alexandre M.; Trigo, Ricardo M.

    2015-04-01

    Droughts represent one of the most frequent climatic extreme events on the Iberian Peninsula, often with widespread negative ecological and environmental impacts, resulting in major socio-economic damages such as large decreases in hydroelectric and agricultural production or increased forest fire risk. Unlike other weather-driven extreme events, droughts can last from a few months to several years. Here we employ a recently developed climatic drought index, the Standardized Precipitation Evapotranspiration Index (SPEI; Vicente-Serrano et al., 2010a), based on the simultaneous use of precipitation and temperature fields. This index holds the advantage of combining a multi-scalar character with the capacity to include the effects of temperature variability on drought assessment (Vicente-Serrano et al., 2010a). In this study the SPEI was computed using the Climatic Research Unit (CRU) TS3.21 High Resolution Gridded Data (0.5°) for the period 1901-2012. At this resolution the study region of the Iberian Peninsula corresponds to a square of 30x30 grid pixels. The CRU Potential Evapotranspiration (PET), obtained through the Penman-Monteith equation, was used together with the log-logistic probability distribution, which provides a very good fit to the series of differences between precipitation and PET (Vicente-Serrano et al., 2010b), using monthly averages of daily maximum and minimum temperature data and monthly precipitation records. The parameters were estimated by means of the L-moment method. The application of multi-scalar indices to the high-resolution datasets allows identifying whether the Iberian Peninsula is under hydric stress and also whether drought has set in. Based on the gridded SPEI datasets, spanning from 1901 to 2012 and obtained for timescales of 6, 12, 18 and 24 months, an objective method is applied for ranking the most extensive extreme drought events that occurred on the Iberian Peninsula. This objective method is based on the evaluation of the drought's magnitude, which is obtained by considering the area affected - defined by SPEI values below a certain threshold (in this case SPEI < -1.28) - as well as its intensity at each grid point. Different rankings are presented for the different timescales, considering both the entire Iberian Peninsula and Portugal. Furthermore, we used the NCEP/NCAR reanalysis for the 1948-2012 period, namely the geopotential height, temperature, wind and specific humidity fields at all pressure levels, together with mean sea level pressure (MSLP) and total column water vapour (TCWV), for the Euro-Atlantic sector (60° W to 40° E, 20° N to 70° N) at the full available temporal (six-hourly) and spatial (2.5° regular horizontal grid) resolutions, as well as the globally gridded monthly precipitation products of the Global Precipitation Climatology Centre (GPCC), to analyse the large-scale conditions associated with the most extreme droughts in Iberia. Results show that during these drought periods there is a clear moisture deficit over the region, with permanent negative anomalies of TCWV. Additionally, on these occasions, the zonal moisture transport is more intense over the northern Atlantic and less intense over the subtropics, while the meridional moisture transport is intensified, in accordance with the barotropic structure of the geopotential height anomalies.
    References: Vicente-Serrano, S.M., Beguería, S., and López-Moreno, J.I. (2010a). A multi-scalar drought index sensitive to global warming: The Standardized Precipitation Evapotranspiration Index - SPEI. Journal of Climate, 23, 1696-1718. Vicente-Serrano, S.M., Beguería, S., López-Moreno, J.I., Angulo, M., and El Kenawy, A. (2010b). A new global 0.5° gridded dataset (1901-2006) of a multiscalar drought index: comparison with current drought index datasets based on the Palmer Drought Severity Index. Journal of Hydrometeorology, 11, 1033-1043.
    Acknowledgements: This work was partially supported by national funds through FCT (Fundação para a Ciência e a Tecnologia, Portugal) under project QSECA (PTDC/AAGGLO/4155/2012).
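
    A hedged sketch of the ranking step described above: the magnitude of a monthly SPEI field is taken as the summed intensity over all grid points below the -1.28 threshold, and events are ordered by that magnitude. The array layout is an assumption.

        import numpy as np

        def drought_magnitude(spei_field, threshold=-1.28):
            """Summed drought intensity of one monthly SPEI grid (lat, lon)."""
            affected = spei_field < threshold
            return np.sum(threshold - spei_field[affected])   # positive magnitude

        # spei: assumed (time, lat, lon) array of monthly SPEI values
        # ranked = sorted(range(spei.shape[0]),
        #                 key=lambda t: drought_magnitude(spei[t]), reverse=True)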

  7. Streamflow model of the six-country transboundary Ganges-Brahmaputra and Meghna river basin

    NASA Astrophysics Data System (ADS)

    Rahman, K.; Lehmann, A.; Dennedy-Frank, P. J.; Gorelick, S.

    2014-12-01

    Extremely large-scale river basin modelling remains a challenge for water resources planning in the developing world. Such planning is particularly difficult there because of the lack of data on both natural (climatological, hydrological) processes and complex anthropogenic influences. We simulate three enormous river basins located in South Asia. The Ganges-Brahmaputra and Meghna (GBM) River Basins cover an area of 1.75 million km2 across six different countries, including the Bengal delta, which is the most densely populated delta in the world, with ~600 million people. We target this developing region to better understand the hydrological system and improve water management planning in these transboundary watersheds. This effort uses the Soil and Water Assessment Tool (SWAT) to simulate streamflow in the GBM River Basins and to assess the use of global climatological datasets for such large-scale river modeling. We evaluate the utility of three global rainfall datasets to reproduce measured river discharge: the Tropical Rainfall Measuring Mission (TRMM) from NASA, the National Centers for Environmental Prediction (NCEP) reanalysis, and the World Meteorological Organization (WMO) reanalysis. We use global datasets for spatial information as well: the 90 m DEM from the Shuttle Radar Topography Mission, the 300 m GlobCover land use maps, and the 1000 m FAO soil map. We find that SWAT discharge estimates match the observed streamflow well (NSE = 0.40-0.66, R2 = 0.60-0.70) when using meteorological estimates from the NCEP reanalysis. However, SWAT estimates diverge from observed discharge when using meteorological estimates from TRMM and the WMO reanalysis.
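
    The reported skill scores can be computed in a few lines; the following sketch shows a standard Nash-Sutcliffe efficiency (NSE) and R2 from simulated and observed discharge series.

        import numpy as np

        def nse(sim, obs):
            """Nash-Sutcliffe efficiency: 1 is perfect, 0 matches the obs mean."""
            sim, obs = np.asarray(sim), np.asarray(obs)
            return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

        def r_squared(sim, obs):
            return np.corrcoef(sim, obs)[0, 1] ** 2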

  8. Accurate and fast multiple-testing correction in eQTL studies.

    PubMed

    Sul, Jae Hoon; Raj, Towfique; de Jong, Simone; de Bakker, Paul I W; Raychaudhuri, Soumya; Ophoff, Roel A; Stranger, Barbara E; Eskin, Eleazar; Han, Buhm

    2015-06-04

    In studies of expression quantitative trait loci (eQTLs), it is of increasing interest to identify eGenes, the genes whose expression levels are associated with variation at a particular genetic variant. Detecting eGenes is important for follow-up analyses and prioritization because genes are the main entities in biological processes. To detect eGenes, one typically focuses on the genetic variant with the minimum p value among all variants in cis with a gene and corrects for multiple testing to obtain a gene-level p value. For performing multiple-testing correction, a permutation test is widely used. Because of the growing sample sizes of eQTL studies, however, the permutation test has become a computational bottleneck. In this paper, we propose an efficient approach for correcting for multiple testing and assessing eGene p values by utilizing a multivariate normal distribution. Our approach properly takes into account the linkage-disequilibrium structure among variants, and its time complexity is independent of sample size. By applying our small-sample correction techniques, our method achieves high accuracy in both small and large studies. We have shown that our method consistently produces extremely accurate p values (accuracy > 98%) for three human eQTL datasets with different sample sizes and SNP densities: the Genotype-Tissue Expression pilot dataset, the multi-region brain dataset, and the HapMap 3 dataset.
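
    The following is a conceptual sketch (not the authors' implementation, which is analytic and faster) of the multivariate-normal idea: the gene-level p value is the probability that the minimum p value across LD-correlated variants is as small as the one observed, here estimated by Monte Carlo sampling from N(0, R); note that, like the authors' method, its cost is independent of sample size.

        import numpy as np
        from scipy.stats import norm

        def gene_level_p(min_p_obs, ld_corr, n_samples=100_000, seed=0):
            """ld_corr: variant-by-variant LD correlation matrix R (assumed input)."""
            rng = np.random.default_rng(seed)
            z = rng.multivariate_normal(np.zeros(len(ld_corr)), ld_corr, n_samples)
            min_p = 2.0 * norm.sf(np.abs(z).max(axis=1))   # best variant per draw
            return np.mean(min_p <= min_p_obs)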

  9. An observational and modeling study of the August 2017 Florida climate extreme event.

    NASA Astrophysics Data System (ADS)

    Konduru, R.; Singh, V.; Routray, A.

    2017-12-01

    A special report on climate extremes by the Intergovernmental Panel on Climate Change (IPCC) elucidates that disasters arise from the exposure and vulnerability of human and natural systems to climate extremes. The cause of such a climate extreme could be anthropogenic or non-anthropogenic, so it is challenging to discern the critical factor of influence for a particular event. A perceptive study of a climate extreme with reasonable confidence is possible only where past case studies exist. Such a rare climate extreme problem was encountered in the case of the Houston floods and the extreme rainfall over Florida in August 2017. A succession of hurricanes, Harvey and Irma, targeted the Florida region and caused catastrophic damage. The rarity of the August 2017 Florida climate extreme event calls for an in-depth study of this case. To understand the multi-faceted nature of the event, a study of the development of Hurricane Harvey and of its progression and dynamics is essential. The current article focuses on an observational and modeling study of Hurricane Harvey. A global model, NCUM (the global UK Met Office Unified Model (UM) operational at the National Centre for Medium Range Weather Forecasting, India), was used to simulate the hurricane. The simulated rainfall and wind fields were compared with observational datasets, namely Tropical Rainfall Measuring Mission rainfall data and ERA-Interim wind fields. The National Centers for Environmental Prediction (NCEP) automated tracking system was used to track Hurricane Harvey, and the tracks from different forecasts were analyzed statistically against the Joint Typhoon Warning Center best track. This study will be extended to investigate the atmospheric processes involved in the August 2017 Florida climate extreme event.

  10. Defect Detection and Segmentation Framework for Remote Field Eddy Current Sensor Data

    PubMed Central

    2017-01-01

    Remote-Field Eddy-Current (RFEC) technology is often used as a Non-Destructive Evaluation (NDE) method to prevent water pipe failures. By analyzing the RFEC data, it is possible to quantify the corrosion present in pipes. Quantifying the corrosion involves detecting defects and extracting their depth and shape. For large sections of pipelines, this can be extremely time-consuming if performed manually. Automated approaches are therefore well motivated. In this article, we propose an automated framework to locate and segment defects in individual pipe segments, starting from raw RFEC measurements taken over large pipelines. The framework relies on a novel feature to robustly detect these defects and a segmentation algorithm applied to the deconvolved RFEC signal. The framework is evaluated using both simulated and real datasets, demonstrating its ability to efficiently segment the shape of corrosion defects. PMID:28984823

  11. Defect Detection and Segmentation Framework for Remote Field Eddy Current Sensor Data.

    PubMed

    Falque, Raphael; Vidal-Calleja, Teresa; Miro, Jaime Valls

    2017-10-06

    Remote-Field Eddy-Current (RFEC) technology is often used as a Non-Destructive Evaluation (NDE) method to prevent water pipe failures. By analyzing the RFEC data, it is possible to quantify the corrosion present in pipes. Quantifying the corrosion involves detecting defects and extracting their depth and shape. For large sections of pipelines, this can be extremely time-consuming if performed manually. Automated approaches are therefore well motivated. In this article, we propose an automated framework to locate and segment defects in individual pipe segments, starting from raw RFEC measurements taken over large pipelines. The framework relies on a novel feature to robustly detect these defects and a segmentation algorithm applied to the deconvolved RFEC signal. The framework is evaluated using both simulated and real datasets, demonstrating its ability to efficiently segment the shape of corrosion defects.

  12. Spatial clustering and meteorological drivers of summer ozone in Europe

    NASA Astrophysics Data System (ADS)

    Carro-Calvo, Leopoldo; Ordóñez, Carlos; García-Herrera, Ricardo; Schnell, Jordan L.

    2017-04-01

    We present a regionalization of summer near-surface ozone (O3) in Europe. For this purpose we apply a K-means algorithm to a gridded MDA8 O3 (maximum daily average 8-h ozone) dataset covering a European domain [15° W - 30° E, 35°-70° N] at 1° x 1° horizontal resolution for the 1998-2012 period. This dataset was compiled by merging observations from the European Monitoring and Evaluation Programme (EMEP) and the European Environment Agency's air quality database (AirBase). The K-means method allows the identification of sets of regions where O3 concentrations present coherent spatiotemporal patterns and are thus expected to be driven by similar meteorological factors. After some testing, 9 regions were selected: the British Isles, North-Central Europe, Northern Scandinavia, the Baltic countries, the Iberian Peninsula, Western Europe, South-Central Europe, Eastern Europe and the Balkans. For each region we examine the synoptic situations associated with elevated ozone extremes (days exceeding the 95th percentile of the summer MDA8 O3 distribution). Our analyses reveal that there are basically two different kinds of regions in Europe: (a) those in the centre and south of the continent where ozone extremes are associated with elevated temperature within the same region and (b) those in northern Europe where ozone extremes are driven by southerly advection of air masses from warmer, more polluted areas. Although the observed patterns were initially identified only for days registering high O3 extremes, all summer days can be projected onto these patterns to identify the main modes of meteorological variability of O3. We have found that such modes are partly responsible for the day-to-day variability in O3 concentrations and can explain a relatively large fraction (from 44 to 88%, depending on the region) of the interannual variability of summer mean MDA8 O3 during the period of analysis. On the other hand, some major teleconnection patterns have been tested but do not seem to exert a large impact on the variability of surface O3 over most regions. The identification of these independent regions where surface ozone presents a coherent behaviour and responds similarly to specific meteorological modes of variability has multiple applications. For instance, the performance of chemical transport models (CTMs) and chemistry-climate models (CCMs) can be separately assessed over such regions to identify areas where they present large biases that need to be corrected. Our results can also be used to test the models' sensitivity to the day-to-day changing meteorology and to climate change over specific regions.
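
    A sketch of the regionalization step using scikit-learn's K-means is given below; the data layout (grid cells by days, with synthetic placeholder values) is an assumption for illustration, not the study's actual pipeline.

        import numpy as np
        from sklearn.cluster import KMeans

        rng = np.random.default_rng(0)
        mda8 = rng.gamma(4.0, 10.0, size=(1500, 920))  # placeholder: cells x days

        # standardize each cell's summer MDA8 O3 series, then cluster
        X = (mda8 - mda8.mean(axis=1, keepdims=True)) / mda8.std(axis=1, keepdims=True)
        labels = KMeans(n_clusters=9, n_init=10, random_state=0).fit_predict(X)
        # `labels` assigns each grid cell to one of 9 coherent regions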

  13. Exploring Antarctic Land Surface Temperature Extremes Using Condensed Anomaly Databases

    NASA Astrophysics Data System (ADS)

    Grant, Glenn Edwin

    Satellite observations have revolutionized the Earth Sciences and climate studies. However, data and imagery continue to accumulate at an accelerating rate, and efficient tools for data discovery, analysis, and quality checking lag behind. In particular, studies of long-term, continental-scale processes at high spatiotemporal resolutions are especially problematic. The traditional technique of downloading an entire dataset and using customized analysis code is often impractical or consumes too many resources. The Condensate Database Project was envisioned as an alternative method for data exploration and quality checking. The project's premise was that much of the data in any satellite dataset is unneeded and can be eliminated, compacting massive datasets into more manageable sizes. Dataset sizes are further reduced by retaining only anomalous data of high interest. Hosting the resulting "condensed" datasets in high-speed databases enables immediate availability for queries and exploration. Proof of the project's success relied on demonstrating that the anomaly database methods can enhance and accelerate scientific investigations. The hypothesis of this dissertation is that the condensed datasets are effective tools for exploring many scientific questions, spurring further investigations and revealing important information that might otherwise remain undetected. This dissertation uses condensed databases containing 17 years of Antarctic land surface temperature anomalies as its primary data. The study demonstrates the utility of the condensate database methods by discovering new information. In particular, the process revealed critical quality problems in the source satellite data. The results are used as the starting point for four case studies, investigating Antarctic temperature extremes, cloud detection errors, and the teleconnections between Antarctic temperature anomalies and climate indices. The results confirm the hypothesis that the condensate databases are a highly useful tool for Earth Science analyses. Moreover, the quality checking capabilities provide an important method for independent evaluation of dataset veracity.

  14. SDCLIREF - A sub-daily gridded reference dataset

    NASA Astrophysics Data System (ADS)

    Wood, Raul R.; Willkofer, Florian; Schmid, Franz-Josef; Trentini, Fabian; Komischke, Holger; Ludwig, Ralf

    2017-04-01

    Climate change is expected to impact the intensity and frequency of hydrometeorological extreme events. In order to adequately capture and analyze extreme rainfall events, in particular when assessing flood and flash flood situations, data are required at high spatial and sub-daily resolution, which is often not available in sufficient density and over extended time periods. The ClimEx project (Climate Change and Hydrological Extreme Events) addresses the alteration of hydrological extreme events under climate change conditions. In order to differentiate between a clear climate change signal and the limits of natural variability, unique Single-Model Regional Climate Model Ensembles (CRCM5 driven by CanESM2, RCP8.5) were created for a European and a North-American domain, each comprising 50 members of 150 years (1951-2100). In combination with the CORDEX database, this newly created ClimEx ensemble is a one-of-a-kind model dataset for analyzing changes in sub-daily extreme events. For the purpose of bias-correcting the regional climate model ensembles, as well as for the baseline calibration and validation of hydrological catchment models, a new sub-daily (3 h) high-resolution (500 m) gridded reference dataset (SDCLIREF) was created for a domain covering the Upper Danube and Main watersheds (~100,000 km2). As the sub-daily observations lack a continuous time series for the reference period 1980-2010, the need arose for a suitable method to bridge the gaps in the discontinuous time series. The Method of Fragments (Sharma and Srikanthan (2006); Westra et al. (2012)) was applied to transform daily observations into sub-daily rainfall events, extending the time series and densifying the station network. Prior to applying the Method of Fragments and creating the gridded dataset using rigorous interpolation routines, observations operated by several institutions in three countries (Germany, Austria, Switzerland) were collected and subsequently quality controlled. Among others, the quality control checked for steps, extensive dry seasons, temporal consistency and maximum hourly values. The resulting SDCLIREF dataset provides a robust precipitation reference for hydrometeorological applications at an unprecedented spatio-temporal resolution. References: Sharma, A.; Srikanthan, S. (2006): Continuous Rainfall Simulation: A Nonparametric Alternative. In: 30th Hydrology and Water Resources Symposium, 4-7 December 2006, Launceston, Tasmania. Westra, S.; Mehrotra, R.; Sharma, A.; Srikanthan, R. (2012): Continuous rainfall simulation. 1. A regionalized subdaily disaggregation approach. In: Water Resour. Res. 48 (1). DOI: 10.1029/2011WR010489.
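
    A hedged sketch of the Method of Fragments idea (Sharma and Srikanthan, 2006): a daily total is disaggregated by borrowing the normalized sub-daily pattern ("fragments") of a similar observed day, e.g. from a neighbouring station. The candidate-selection rule here is deliberately simplistic.

        import numpy as np

        def disaggregate_day(daily_total, candidate_days):
            """candidate_days: (n, 8) observed 3-hourly amounts on analogue days
            (similar season, magnitude, nearby station - all assumed inputs).
            Picks the candidate with the closest daily sum, rescales its pattern."""
            totals = candidate_days.sum(axis=1)
            best = candidate_days[np.argmin(np.abs(totals - daily_total))]
            if best.sum() == 0.0:                  # dry analogue: spread evenly
                return np.full(8, daily_total / 8.0)
            fragments = best / best.sum()          # sub-daily pattern, sums to 1
            return daily_total * fragments         # eight 3-hourly values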

  15. Creating a global sub-daily precipitation dataset

    NASA Astrophysics Data System (ADS)

    Lewis, Elizabeth; Blenkinsop, Stephen; Fowler, Hayley

    2017-04-01

    Extremes of precipitation can cause floods and droughts, which can lead to substantial damage to infrastructure and ecosystems and can result in loss of life. It is still uncertain how hydrological extremes will change with global warming, as we do not fully understand the processes that cause extreme precipitation under current climate variability. The INTENSE project is using a novel and fully integrated data-modelling approach to provide a step-change in our understanding of the nature and drivers of global precipitation extremes and change on societally relevant timescales, leading to improved high-resolution climate model representation of extreme rainfall processes. The INTENSE project is run in conjunction with the World Climate Research Programme (WCRP)'s Grand Challenge on 'Understanding and Predicting Weather and Climate Extremes' and the Global Water and Energy Exchanges Project (GEWEX) Science Questions. The first step towards achieving this is to construct a new global sub-daily precipitation dataset. Data collection is ongoing and already covers North America, Europe, Asia and Australasia. Comprehensive, open-source quality control software is being developed to set a new standard for verifying sub-daily precipitation data, and a set of global hydroclimatic indices will be produced based upon stakeholder recommendations. This will provide a unique global data resource on sub-daily precipitation whose derived indices, e.g. monthly/annual maxima, will be freely available to the wider scientific community.

  16. Complex extreme learning machine applications in terahertz pulsed signals feature sets.

    PubMed

    Yin, X-X; Hadjiloucas, S; Zhang, Y

    2014-11-01

    This paper presents a novel approach to the automatic classification of very large datasets composed of terahertz pulse transient signals, highlighting their potential use in biochemical, biomedical, pharmaceutical and security applications. Two different types of THz spectra are considered in the classification process. First, a binary classification study of poly-A and poly-C ribonucleic acid samples is performed. This is then contrasted with a difficult multi-class classification problem involving spectra from six different powder samples that, although fairly indistinguishable in the optical spectrum, possess a few discernible spectral features in the terahertz part of the spectrum. Classification is performed using a complex-valued extreme learning machine algorithm that takes into account features in both the amplitude and the phase of the recorded spectra. Classification speed and accuracy are contrasted with those achieved using a support vector machine classifier. The study systematically compares the classifier performance achieved after adopting different Gaussian kernels when separating amplitude and phase signatures. The two signatures are presented as feature vectors for both training and testing purposes. The study confirms the utility of complex-valued extreme learning machine algorithms for the classification of the very large datasets generated with current terahertz imaging spectrometers. The classifier can take into consideration heterogeneous layers within an object, as would be required in a tomographic setting, and is sufficiently robust to detect patterns hidden inside noisy terahertz datasets. The proposed study opens up the opportunity for the establishment of complex-valued extreme learning machine algorithms as new chemometric tools that will assist the wider proliferation of terahertz sensing technology for chemical sensing, quality control, security screening and clinical diagnosis. Furthermore, the proposed algorithm should also be very useful in other applications requiring the classification of very large datasets.
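
    A bare-bones sketch of a complex-valued extreme learning machine of the general kind discussed: fixed random complex hidden weights, with only the output weights solved by a least-squares pseudo-inverse. Details (activation, kernels, feature encoding) differ from the paper; this only illustrates the training principle.

        import numpy as np

        rng = np.random.default_rng(1)

        def train_celm(X, T, n_hidden=64):
            """X: (n, d) complex features (e.g. amplitude + 1j*phase);
            T: (n, k) one-hot targets. Hidden weights stay fixed at random;
            only the output weights are learned."""
            W = rng.standard_normal((X.shape[1], n_hidden)) \
                + 1j * rng.standard_normal((X.shape[1], n_hidden))
            H = np.tanh(X @ W)                    # complex hidden activations
            beta = np.linalg.pinv(H) @ T          # least-squares output weights
            return W, beta

        def predict_celm(X, W, beta):
            return np.abs(np.tanh(X @ W) @ beta)  # magnitude as class score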

  17. Evaluating the ClimEx Single Model large ensemble in comparison with EURO-CORDEX results of heatwave and drought indicators

    NASA Astrophysics Data System (ADS)

    von Trentini, F.; Schmid, F. J.; Braun, M.; Frigon, A.; Leduc, M.; Martel, J. L.; Willkofer, F.; Wood, R. R.; Ludwig, R.

    2017-12-01

    Meteorological extreme events seem to be becoming more frequent in the present and future, and the separation of natural climate variability from a clear climate change effect on these extreme events is gaining more and more interest. Since there is only one realisation of historical events, observational data cannot provide the very long time series needed for a robust statistical analysis of natural variability. A new single-model large ensemble (SMLE), developed for the ClimEx project (Climate change and hydrological extreme events - risks and perspectives for water management in Bavaria and Québec), is designed to overcome this lack of data by downscaling 50 members of the CanESM2 (RCP8.5) with the Canadian CRCM5 regional model (using the EURO-CORDEX grid specifications) for time series of 1950-2099 each, resulting in 7500 years of simulated climate. This allows for a better probabilistic analysis of rare and extreme events than any preceding dataset. Besides seasonal sums, several indicators concerning heatwave frequency, duration and mean temperature as well as the number and maximum length of dry periods (consecutive days < 1 mm) are calculated for the ClimEx ensemble and several EURO-CORDEX runs. This enables us to investigate the interaction between natural variability (as it appears in the CanESM2-CRCM5 members) and the climate change signal of those members for past, present and future conditions. Adding the EURO-CORDEX results, we can also assess the role of internal model variability (or natural variability) in climate change simulations. A first comparison shows similar magnitudes of variability of climate change signals between the ClimEx large ensemble and the CORDEX runs for some indicators, while for most indicators the spread of the SMLE is smaller than the spread of the different CORDEX models.

  18. Web processing service for climate impact and extreme weather event analyses. Flyingpigeon (Version 1.0)

    NASA Astrophysics Data System (ADS)

    Hempelmann, Nils; Ehbrecht, Carsten; Alvarez-Castro, Carmen; Brockmann, Patrick; Falk, Wolfgang; Hoffmann, Jörg; Kindermann, Stephan; Koziol, Ben; Nangini, Cathy; Radanovics, Sabine; Vautard, Robert; Yiou, Pascal

    2018-01-01

    Analyses of extreme weather events and their impacts often require big data processing of ensembles of climate model simulations. Researchers generally proceed by downloading the data from the providers and processing the data files 'at home' with their own analysis processes. However, the growing amount of available climate model and observation data makes this procedure quite awkward. In addition, data processing knowledge is kept local, instead of being consolidated into a common resource of reusable code. These drawbacks can be mitigated by using a web processing service (WPS). A WPS hosts services such as data analysis processes that are accessible over the web and can be installed close to the data archives. We developed a WPS named 'flyingpigeon' that communicates over an HTTP network protocol based on standards defined by the Open Geospatial Consortium (OGC), to be used by climatologists and impact modelers as a tool for analyzing large datasets remotely. Here, we present the processes we have developed in flyingpigeon, relating to commonly used operations (preprocessing steps, spatial subsets at continent, country or region level, and climate indices) as well as methods for specific climate data analysis (weather regimes, analogues of circulation, segetal flora distribution, and species distribution models). We also developed a novel, browser-based interactive data visualization for circulation analogues, illustrating the flexibility of WPS in designing custom outputs. Bringing the software to the data instead of transferring the data to the code is becoming increasingly necessary, especially with the upcoming massive climate datasets.
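
    A minimal sketch of invoking a WPS process over the OGC WPS 1.0.0 key-value-pair interface; the endpoint URL and process identifier below are hypothetical placeholders, not the actual flyingpigeon deployment.

        import requests

        base = "https://example.org/wps"   # hypothetical endpoint
        params = {
            "service": "WPS",
            "version": "1.0.0",
            "request": "Execute",
            "identifier": "subset_countries",  # assumed process name
            "DataInputs": "region=DEU;resource=http://example.org/tas.nc",
        }
        resp = requests.get(base, params=params, timeout=60)
        print(resp.status_code, resp.text[:200])   # XML status/response document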

  19. Mapping the Decadal Spatio-temporal Variation of Social Vulnerability to Hydro-climatic Extremes over India

    NASA Astrophysics Data System (ADS)

    H, V.; Karmakar, S.; Ghosh, S.

    2015-12-01

    Human-induced global warming is unequivocal, and observational studies show that it has led to an increase in the intensity and frequency of hydro-climatic extremes, most importantly precipitation extremes, heat waves and droughts, which are also expected to increase in the future. The occurrence of these extremes has devastating effects on a nation's economy and on societal well-being. Previous studies on India provided evidence of significant changes in precipitation extremes from pre- to post-1950, with large spatial heterogeneity; and projections of heat waves indicate that a significant part of India will experience heat stress conditions in the future. Under these circumstances, it is necessary to develop a nation-wide social vulnerability map to scrutinize the adequacy of existing emergency management. Yet there have been no systematic past efforts to map social vulnerability to hydro-climatic extremes nation-wide for India. Therefore, immediate efforts are required to quantify social vulnerability, particularly in a developing country like India, where major transformations in demographic characteristics and development patterns have been evident during past decades. In the present study, we perform a comprehensive spatio-temporal social vulnerability analysis by considering multiple sensitivity indicators for three decades (1990-2010), which identifies the hot-spots with higher vulnerability to hydro-climatic extremes. The population datasets are procured from the Census of India and the meteorological datasets are obtained from the India Meteorological Department (IMD). The study derives interesting results on decadal changes in the spatial distribution of risk, considering both social vulnerability and hazard exposure to extremes.

  20. Towards a full representation of tropical cyclones in a global reanalysis of extreme sea levels

    NASA Astrophysics Data System (ADS)

    Muis, S.; Verlaan, M.; Lin, N.; Winsemius, H.; Vatvani, D.; Ward, P.; Aerts, J.

    2016-12-01

    Tropical cyclones (TCs), including hurricanes and typhoons, are characterised by high wind speeds and low pressure, and cause dangerous storm surges in coastal areas. Recent disasters like the flooding of New Orleans in 2005 due to Hurricane Katrina and of New York in 2012 due to Hurricane Sandy exemplify the significant TC risk in the United States. In this contribution, we present a new framework to model TC storm surges and their probabilities at the Atlantic basin and, ultimately, global scales. This work builds on Muis et al. (2016), which presented the first dynamically derived reanalysis dataset of storm surges covering the entire world's coastline (the GTSR dataset). Surge levels for the period 1979-2014 were simulated by forcing the Global Tide and Surge Model (GTSM) with wind speed and atmospheric pressure from the ERA-Interim reanalysis. There is generally good agreement between simulated and observed sea level extremes in extra-tropical regions; however, for areas prone to TCs there is a severe underestimation of extremes. For example, the maximum surge levels during Hurricane Katrina in New Orleans exceeded 8 m, whilst the GTSM surge levels in that area do not exceed 2-3 m. Hence, due to the coarse grid resolution, the strong intensities of TCs are not fully captured in ERA-Interim. Furthermore, the length of the ERA-Interim dataset, like that of other reanalysis datasets, is too short to estimate the probabilities of extreme TC events in a reliable way. For accurate risk assessments it is essential to improve the representation of TCs in such global reanalyses of extreme sea levels. First, we need a higher resolution of the meteorological forcing, which can be modelled with input from the observed best-track data. Second, we need to statistically extend the observed record to many thousands of years. We will present the first results of these steps for the east coast of the United States. We will validate the GTSM model forced with best-track data using recent extreme events like Katrina and Sandy, and we will investigate how the statistics of extreme sea levels change due to the improved representation of TCs.
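
    A sketch of the statistical extension step: fit an extreme-value distribution (a Gumbel here, for simplicity; the study does not specify this choice) to annual maximum surge levels and derive return levels. The data are placeholders.

        import numpy as np
        from scipy.stats import gumbel_r

        annual_maxima = np.array([1.2, 0.9, 1.5, 1.1, 2.3, 1.0, 1.4, 1.8,
                                  0.8, 1.6, 1.3, 2.0, 1.1, 1.7, 0.9, 1.2])  # m, placeholder
        loc, scale = gumbel_r.fit(annual_maxima)
        for T in (10, 100, 1000):
            level = gumbel_r.ppf(1.0 - 1.0 / T, loc, scale)
            print(f"{T:>4}-year return level: {level:.2f} m")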

  1. Comparison between extreme learning machine and wavelet neural networks in data classification

    NASA Astrophysics Data System (ADS)

    Yahia, Siwar; Said, Salwa; Jemai, Olfa; Zaied, Mourad; Ben Amar, Chokri

    2017-03-01

    The Extreme Learning Machine is a well-known algorithm in the field of machine learning. It is a feed-forward neural network with a single hidden layer, and an extremely fast learning algorithm with good generalization performance. In this paper, we aim to compare the Extreme Learning Machine with wavelet neural networks, which are also widely used. We used six benchmark datasets to evaluate each technique: Wisconsin Breast Cancer, Glass Identification, Ionosphere, Pima Indians Diabetes, Wine Recognition and Iris Plant. Experimental results show that both the Extreme Learning Machine and wavelet neural networks achieve good results.

  2. Gene expression changes governing extreme dehydration tolerance in an Antarctic insect

    PubMed Central

    Teets, Nicholas M.; Peyton, Justin T.; Colinet, Herve; Renault, David; Kelley, Joanna L.; Kawarasaki, Yuta; Lee, Richard E.; Denlinger, David L.

    2012-01-01

    Among terrestrial organisms, arthropods are especially susceptible to dehydration, given their small body size and high surface area to volume ratio. This challenge is particularly acute for polar arthropods that face near-constant desiccating conditions, as water is frozen and thus unavailable for much of the year. The molecular mechanisms that govern extreme dehydration tolerance in insects remain largely undefined. In this study, we used RNA sequencing to quantify transcriptional mechanisms of extreme dehydration tolerance in the Antarctic midge, Belgica antarctica, the world’s southernmost insect and only insect endemic to Antarctica. Larvae of B. antarctica are remarkably tolerant of dehydration, surviving losses up to 70% of their body water. Gene expression changes in response to dehydration indicated up-regulation of cellular recycling pathways including the ubiquitin-mediated proteasome and autophagy, with concurrent down-regulation of genes involved in general metabolism and ATP production. Metabolomics results revealed shifts in metabolite pools that correlated closely with changes in gene expression, indicating that coordinated changes in gene expression and metabolism are a critical component of the dehydration response. Finally, using comparative genomics, we compared our gene expression results with a transcriptomic dataset for the Arctic collembolan, Megaphorura arctica. Although B. antarctica and M. arctica are adapted to similar environments, our analysis indicated very little overlap in expression profiles between these two arthropods. Whereas several orthologous genes showed similar expression patterns, transcriptional changes were largely species specific, indicating these polar arthropods have developed distinct transcriptional mechanisms to cope with similar desiccating conditions. PMID:23197828

  3. Gene expression changes governing extreme dehydration tolerance in an Antarctic insect.

    PubMed

    Teets, Nicholas M; Peyton, Justin T; Colinet, Herve; Renault, David; Kelley, Joanna L; Kawarasaki, Yuta; Lee, Richard E; Denlinger, David L

    2012-12-11

    Among terrestrial organisms, arthropods are especially susceptible to dehydration, given their small body size and high surface area to volume ratio. This challenge is particularly acute for polar arthropods that face near-constant desiccating conditions, as water is frozen and thus unavailable for much of the year. The molecular mechanisms that govern extreme dehydration tolerance in insects remain largely undefined. In this study, we used RNA sequencing to quantify transcriptional mechanisms of extreme dehydration tolerance in the Antarctic midge, Belgica antarctica, the world's southernmost insect and only insect endemic to Antarctica. Larvae of B. antarctica are remarkably tolerant of dehydration, surviving losses up to 70% of their body water. Gene expression changes in response to dehydration indicated up-regulation of cellular recycling pathways including the ubiquitin-mediated proteasome and autophagy, with concurrent down-regulation of genes involved in general metabolism and ATP production. Metabolomics results revealed shifts in metabolite pools that correlated closely with changes in gene expression, indicating that coordinated changes in gene expression and metabolism are a critical component of the dehydration response. Finally, using comparative genomics, we compared our gene expression results with a transcriptomic dataset for the Arctic collembolan, Megaphorura arctica. Although B. antarctica and M. arctica are adapted to similar environments, our analysis indicated very little overlap in expression profiles between these two arthropods. Whereas several orthologous genes showed similar expression patterns, transcriptional changes were largely species specific, indicating these polar arthropods have developed distinct transcriptional mechanisms to cope with similar desiccating conditions.

  4. Numerical investigations with WRF about atmospheric features leading to heavy precipitation and flood events over the Central Andes' complex topography

    NASA Astrophysics Data System (ADS)

    Zamuriano, Marcelo; Brönnimann, Stefan

    2017-04-01

    It is known that some extremes, such as heavy rainfall, flood events, heatwaves and droughts, depend largely on the atmospheric circulation and on local features. Bolivia is no exception, and while the large-scale dynamics over the Amazon have been extensively investigated, the local features driven by the Andes Cordillera and the Altiplano are still poorly documented. New insights on the regional atmospheric dynamics preceding heavy precipitation and flood events over the complex topography of the Andes-Amazon interface are added through numerical investigations of several case events: flash flood episodes over La Paz city and the extreme 2014 flood in the south-western Amazon basin. Large-scale atmospheric water transport is dynamically downscaled in order to take into account the complex topographic forcing and local features as modulators of these events. For this purpose, a series of high-resolution numerical experiments with the WRF-ARW model is conducted using various global datasets and parameterizations. While several mechanisms have been suggested to explain the dynamics of these episodes, they have not yet been tested through numerical modelling experiments. The simulations realistically capture the local water transport and the terrain influence on the atmospheric circulation, even though the precipitation intensity is in general unrealistic. Nevertheless, the results show that dynamical downscaling over the tropical Andes' complex terrain provides useful meteorological data for a variety of studies and contributes to a better understanding of the physical processes involved in the configuration of these events.

  5. On the uncertainties associated with using gridded rainfall data as a proxy for observed

    NASA Astrophysics Data System (ADS)

    Tozer, C. R.; Kiem, A. S.; Verdon-Kidd, D. C.

    2012-05-01

    Gridded rainfall datasets are used in many hydrological and climatological studies, in Australia and elsewhere, including for hydroclimatic forecasting, climate attribution studies and climate model performance assessments. The attraction of the spatial coverage provided by gridded data is clear, particularly in Australia where the spatial and temporal resolution of the rainfall gauge network is sparse. However, the question that must be asked is whether it is suitable to use gridded data as a proxy for observed point data, given that gridded data are inherently "smoothed" and may not necessarily capture the temporal and spatial variability of Australian rainfall that leads to hydroclimatic extremes (i.e. droughts, floods). This study investigates this question through a statistical analysis of three monthly gridded Australian rainfall datasets - the Bureau of Meteorology (BOM) dataset, the Australian Water Availability Project (AWAP) and the SILO dataset. The results of the monthly, seasonal and annual comparisons show that the three gridded datasets not only differ from each other, but also differ markedly from the rainfall observed at gauges within the corresponding grid cells - particularly under extremely wet or extremely dry conditions. Also important is that the differences observed appear to be non-systematic. To demonstrate the hydrological implications of using gridded data as a proxy for gauged data, a rainfall-runoff model is applied to one catchment in South Australia, first using gauged data as the source of rainfall input and then gridded rainfall data. The results indicate a markedly different runoff response for each of the different sources of rainfall data. It should be noted that this study does not seek to identify which gridded dataset is the "best" for Australia, as each gridded data source has its pros and cons, as does gauged data. Rather, the intention is to quantify the differences between the various gridded data sources and how they compare with gauged data, so that these differences can be considered and accounted for in studies that utilise these gridded datasets. Ultimately, if key decisions are going to be based on the outputs of models that use gridded data, an estimate (or at least an understanding) should be made of the uncertainties relating to the assumptions underlying the development of gridded data and of how that gridded data compares with reality.

  6. Challenges in Extracting Information From Large Hydrogeophysical-monitoring Datasets

    NASA Astrophysics Data System (ADS)

    Day-Lewis, F. D.; Slater, L. D.; Johnson, T.

    2012-12-01

    Over the last decade, new automated geophysical data-acquisition systems have enabled collection of increasingly large and information-rich geophysical datasets. Concurrent advances in field instrumentation, web services, and high-performance computing have made real-time processing, inversion, and visualization of large three-dimensional tomographic datasets practical. Geophysical-monitoring datasets have provided high-resolution insights into diverse hydrologic processes including groundwater/surface-water exchange, infiltration, solute transport, and bioremediation. Despite the high information content of such datasets, extraction of quantitative or diagnostic hydrologic information is challenging. Visual inspection and interpretation for specific hydrologic processes is difficult for datasets that are large, complex, and (or) affected by forcings (e.g., seasonal variations) unrelated to the target hydrologic process. New strategies are needed to identify salient features in spatially distributed time-series data and to relate temporal changes in geophysical properties to hydrologic processes of interest while effectively filtering unrelated changes. Here, we review recent work using time-series and digital-signal-processing approaches in hydrogeophysics. Examples include applications of cross-correlation, spectral, and time-frequency (e.g., wavelet and Stockwell transforms) approaches to (1) identify salient features in large geophysical time series; (2) examine correlation or coherence between geophysical and hydrologic signals, even in the presence of non-stationarity; and (3) condense large datasets while preserving information of interest. Examples demonstrate analysis of large time-lapse electrical tomography and fiber-optic temperature datasets to extract information about groundwater/surface-water exchange and contaminant transport.
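
    A small example of one approach named above: lagged cross-correlation between a geophysical time series and a hydrologic driver, with synthetic data standing in for, e.g., electrical conductivity and river stage.

        import numpy as np

        rng = np.random.default_rng(2)
        stage = rng.standard_normal(500).cumsum()             # synthetic river stage
        cond = np.roll(stage, 12) + rng.standard_normal(500)  # lagged response + noise

        def lagged_corr(x, y, max_lag=48):
            """Correlation at each lag; np.roll is circular, fine for a sketch."""
            return {k: np.corrcoef(np.roll(x, k), y)[0, 1]
                    for k in range(-max_lag, max_lag + 1)}

        corrs = lagged_corr(stage, cond)
        best = max(corrs, key=corrs.get)
        print(f"peak correlation {corrs[best]:.2f} at lag {best}")  # ~ +12 samples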

  7. The association between preceding drought occurrence and heat waves in the Mediterranean

    NASA Astrophysics Data System (ADS)

    Russo, Ana; Gouveia, Célia M.; Ramos, Alexandre M.; Páscoa, Patricia; Trigo, Ricardo M.

    2017-04-01

    A large number of weather-driven extreme events have occurred worldwide in the last decade, notably in Europe, which has been struck by record-breaking extreme events with unprecedented socio-economic impacts, including the mega-heatwaves of 2003 in Europe and 2010 in Russia, and the large droughts in southwestern Europe in 2005 and 2012. The last IPCC report on extreme events points out that a changing climate can lead to changes in the frequency, intensity, spatial extent, duration, and timing of weather and climate extremes. These, combined with larger exposure, can result in unprecedented risk to humans and ecosystems. In this context, it is becoming increasingly relevant to improve the early identification and predictability of such events, as they negatively affect several socio-economic activities. Moreover, recent diagnostic and modelling experiments have confirmed that hot extremes are often preceded by surface moisture deficits in some regions throughout the world. In this study we analyze whether the occurrence of hot extreme months is enhanced by the occurrence of preceding drought events throughout the Mediterranean area. To this end, the number of hot days in the regions' hottest month is associated with a drought indicator. The evolution and characterization of drought was analyzed using both the Standardized Precipitation Evaporation Index (SPEI) and the Standardized Precipitation Index (SPI), as obtained from the CRU TS3.23 database for the period 1950-2014. We have used both SPI and SPEI for different time scales between 3 and 9 months with a spatial resolution of 0.5°. The number of hot days and nights per month (NHD and NHN) was determined using the ECAD-EOBS daily dataset for the same period and spatial resolution (dataset v14). The NHD and NHN were computed, respectively, as the number of days with a maximum or minimum temperature exceeding the 90th percentile. Results show that the most frequent hottest months for the Mediterranean region occur in July and August. Moreover, the magnitude of the correlations between detrended NHD/NHN and the preceding 6- and 9-month SPEI/SPI is usually smaller than for the 3-month time scale. Most regions exhibit significantly negative correlations, i.e. high (low) NHD/NHN following negative (positive) SPEI/SPI values, and thus a potential for NHD/NHN early warning. Finally, correlations of NHD/NHN with SPI and SPEI differ, with SPEI characterized by slightly higher values, mainly for the 3-month time scale. Acknowledgments: This work was partially supported by national funds through FCT (Fundação para a Ciência e a Tecnologia, Portugal) under project IMDROFLOOD (WaterJPI/0004/2014). Ana Russo thanks FCT for granted support (SFRH/BPD/99757/2014). A. M. Ramos was also supported by a FCT postdoctoral grant (FCT/DFRH/ SFRH/BPD/84328/2012).
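
    As a minimal illustration of the NHD-drought association described above, the sketch below counts days exceeding a 90th-percentile threshold and correlates the detrended counts with a drought index. The synthetic data and the single pooled threshold (rather than a calendar-day percentile) are simplifying assumptions.

    import numpy as np
    from scipy.stats import pearsonr

    rng = np.random.default_rng(1)
    years, ndays = 65, 31                     # July, 1950-2014
    spei3 = rng.standard_normal(years)        # stand-in for the preceding 3-month SPEI
    # Drier preceding conditions (negative SPEI) shift July maxima upward
    tmax = 30 + 0.8 * (-spei3)[:, None] + 2.0 * rng.standard_normal((years, ndays))

    p90 = np.percentile(tmax, 90)             # pooled 90th-percentile threshold
    nhd = (tmax > p90).sum(axis=1)            # number of hot days per year

    # Detrend NHD before correlating, as in the abstract
    trend = np.polyval(np.polyfit(np.arange(years), nhd, 1), np.arange(years))
    r, p = pearsonr(nhd - trend, spei3)
    print(f"r = {r:.2f}, p = {p:.3f}  (negative r: dry -> more hot days)")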

  8. Integrated cosmic muon flux in the zenith angle range 0 < cosθ < 0.37 for momentum threshold up to 11.6 GeV/c

    NASA Astrophysics Data System (ADS)

    Fujii, Hirofumi; Hara, Kazuhiko; Hayashi, Kohei; Kakuno, Hidekazu; Kodama, Hideyo; Nagamine, Kanetada; Sato, Kazuyuki; Sato, Kotaro; Kim, Shin-Hong; Suzuki, Atsuto; Takahashi, Kazuki; Takasaki, Fumihiko

    2017-12-01

    We have measured the cosmic muon flux in the zenith angle range 0 < cos θ < 0.37 with a detector comprising planes of scintillator hodoscope bars and iron blocks inserted between them. The muon ranges for up to 9.5 m-thick iron blocks allow the provision of muon flux data integrated over corresponding threshold momenta up to 11.6 GeV/c. Such a dataset covering the horizontal direction is extremely useful for a technique called muon radiography, where the mass distribution inside a large object is investigated from the cosmic muon distribution measured behind the object.

  9. A gridded hourly rainfall dataset for the UK applied to a national physically-based modelling system

    NASA Astrophysics Data System (ADS)

    Lewis, Elizabeth; Blenkinsop, Stephen; Quinn, Niall; Freer, Jim; Coxon, Gemma; Woods, Ross; Bates, Paul; Fowler, Hayley

    2016-04-01

    An hourly gridded rainfall product has great potential for use in many hydrological applications that require high temporal resolution meteorological data. One important example of this is flood risk management, with flooding in the UK highly dependent on sub-daily rainfall intensities amongst other factors. Knowledge of sub-daily rainfall intensities is therefore critical to designing hydraulic structures or flood defences to appropriate levels of service. Sub-daily rainfall rates are also essential inputs for flood forecasting, allowing for estimates of peak flows and stage for flood warning and response. In addition, an hourly gridded rainfall dataset has significant potential for practical applications such as better representation of extremes and pluvial flash flooding, validation of high resolution climate models and improving the representation of sub-daily rainfall in weather generators. A new 1km gridded hourly rainfall dataset for the UK has been created by disaggregating the daily Gridded Estimates of Areal Rainfall (CEH-GEAR) dataset using comprehensively quality-controlled hourly rain gauge data from over 1300 observation stations across the country. Quality control measures include identification of frequent tips, daily accumulations and dry spells, comparison of daily totals against the CEH-GEAR daily dataset, and nearest neighbour checks. The quality control procedure was validated against historic extreme rainfall events and the UKCP09 5km daily rainfall dataset. General use of the dataset has been demonstrated by testing the sensitivity of a physically-based hydrological modelling system for Great Britain to the distribution and rates of rainfall and potential evapotranspiration. Of the sensitivity tests undertaken, the largest improvements in model performance were seen when an hourly gridded rainfall dataset was combined with potential evapotranspiration disaggregated to hourly intervals, with 61% of catchments showing an increase in NSE between observed and simulated streamflows as a result of more realistic sub-daily meteorological forcing.
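
    The disaggregation step can be illustrated with a minimal sketch: each daily grid total is split across 24 hours using the sub-daily pattern of a nearby hourly gauge. The function name, single-gauge weighting and uniform fallback for dry gauges are assumptions, not the CEH-GEAR-hourly procedure itself.

    import numpy as np

    def disaggregate_day(daily_total, gauge_hourly):
        """Split one day's gridded total (mm) using a 24-value hourly gauge record."""
        gauge_sum = gauge_hourly.sum()
        if gauge_sum > 0:
            weights = gauge_hourly / gauge_sum
        else:
            weights = np.full(24, 1 / 24)   # dry gauge: fall back to a uniform split
        return daily_total * weights

    gauge = np.array([0] * 6 + [0.2, 1.5, 3.0, 1.1, 0.4] + [0] * 13)  # one wet morning
    hourly = disaggregate_day(12.0, gauge)
    assert abs(hourly.sum() - 12.0) < 1e-9    # daily mass is conserved
    print(np.round(hourly, 2))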

  10. Extreme-value statistics reveal rare failure-critical defects in additive manufacturing

    DOE PAGES

    Boyce, Brad L.; Salzbrenner, Bradley C.; Rodelas, Jeffrey M.; ...

    2017-04-21

    Additive manufacturing enables the rapid, cost effective production of large populations of material test coupons such as tensile bars. By adopting streamlined test methods including ‘drop-in’ grips and non-contact extensometry, testing these large populations becomes more efficient. Unlike hardness tests, the tensile test provides a direct measure of yield strength, flow properties, and ductility, which can be directly incorporated into solid mechanics simulations. In the present work, over 1000 nominally identical tensile tests were used to explore the effect of process variability on the mechanical property distributions of a precipitation hardened stainless steel, 17-4PH, produced by a laser powder bed fusion process, also known as direct metal laser sintering. With this large dataset, rare defects are revealed that affect only ~2% of the population, stemming from a single build lot of material. Lastly, the rare defects caused a substantial loss in ductility and were associated with an interconnected network of porosity.
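
    A hedged sketch of the kind of lower-tail screening such a large population enables: estimate the healthy bulk robustly, then flag coupons that fall below an extreme quantile of that bulk. The numbers are synthetic stand-ins, not the published 17-4PH measurements.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    healthy = 15 + 0.5 * rng.standard_normal(1000)    # % elongation, healthy coupons
    defective = rng.uniform(3, 7, size=20)            # ~2% defect-affected coupons
    sample = np.concatenate([healthy, defective])

    # Robust location/scale of the bulk, insensitive to the contaminated tail
    loc = np.median(sample)
    scale = stats.iqr(sample) / 1.349                 # IQR -> sigma for a Gaussian bulk
    q_low = stats.norm.ppf(1e-4, loc, scale)          # 0.01% quantile of the bulk
    flagged = sample[sample < q_low]
    print(f"flagged {flagged.size} coupons below {q_low:.1f}% elongation")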

  11. The potential of urban rainfall monitoring with crowdsourced automatic weather stations in Amsterdam

    NASA Astrophysics Data System (ADS)

    de Vos, Lotte; Leijnse, Hidde; Overeem, Aart; Uijlenhoet, Remko

    2017-02-01

    The high density of built-up areas and resulting imperviousness of the land surface makes urban areas vulnerable to extreme rainfall, which can lead to considerable damage. In order to design and manage cities to be able to deal with the growing number of extreme rainfall events, rainfall data are required at higher temporal and spatial resolutions than those needed for rural catchments. However, the density of operational rainfall monitoring networks managed by local or national authorities is typically low in urban areas. A growing number of automatic personal weather stations (PWSs) link rainfall measurements to online platforms. Here, we examine the potential of such crowdsourced datasets for obtaining the desired resolution and quality of rainfall measurements for the capital of the Netherlands. Data from 63 stations in Amsterdam (˜ 575 km2) that measure rainfall over at least 4 months in a 17-month period are evaluated. In addition, a detailed assessment is made of three Netatmo stations, the largest contributor to this dataset, in an experimental setup. The sensor performance in the experimental setup and the density of the PWS network are promising. However, features in the online platforms, like rounding and thresholds, cause changes from the original time series, resulting in considerable errors in the datasets obtained. These errors are especially large during low-intensity rainfall, although they can be reduced by accumulating rainfall over longer intervals. Accumulation improves the correlation coefficient with gauge-adjusted radar data from 0.48 at 5 min intervals to 0.60 at hourly intervals. Spatial rainfall correlation functions derived from PWS data show much more small-scale variability than those based on gauge-adjusted radar data and those found in similar research using dedicated rain gauge networks. This can largely be attributed to the noise in the PWS data resulting from both the measurement setup and the processes occurring in the data transfer to the online PWS platform. A double mass comparison with gauge-adjusted radar data shows that the median of the stations resembles the rainfall reference better than the real-time (unadjusted) radar product. Averaging nearby raw PWS measurements further improves the match with gauge-adjusted radar data in that area. These results confirm that the growing number of internet-connected PWSs could successfully be used for urban rainfall monitoring.
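
    The reported benefit of accumulation (r rising from 0.48 at 5 min to 0.60 hourly) can be reproduced in miniature: rounding noise partly averages out when 5-min values are summed to hourly totals. The noise model below (0.1 mm rounding plus jitter) is an assumption for illustration.

    import numpy as np

    rng = np.random.default_rng(3)
    n5 = 12 * 24 * 30                            # one month of 5-min intervals
    ref5 = rng.gamma(0.05, 2.0, n5)              # reference 5-min rainfall (mm)
    pws5 = np.round(ref5 + 0.1 * rng.standard_normal(n5), 1).clip(min=0)

    def corr(a, b):
        return np.corrcoef(a, b)[0, 1]

    ref_h = ref5.reshape(-1, 12).sum(axis=1)     # hourly accumulations
    pws_h = pws5.reshape(-1, 12).sum(axis=1)
    print(f"5-min r = {corr(ref5, pws5):.2f}, hourly r = {corr(ref_h, pws_h):.2f}")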

  12. Urban rainfall monitoring with crowdsourced automatic weather stations in Amsterdam

    NASA Astrophysics Data System (ADS)

    de Vos, Lotte; Leijnse, Hidde; Overeem, Aart; Uijlenhoet, Remko

    2017-04-01

    The high density of built-up areas and resulting imperviousness of the land surface makes urban areas vulnerable to extreme rainfall, which can lead to considerable damage. In order to design and manage cities to be able to deal with the growing number of extreme rainfall events, rainfall data are required at higher temporal and spatial resolutions than those needed for rural catchments. However, the density of operational rainfall monitoring networks managed by local or national authorities is typically low in urban areas. A growing number of automatic personal weather stations (PWSs) link rainfall measurements to online platforms. Here, we examine the potential of such crowdsourced datasets for obtaining the desired resolution and quality of rainfall measurements for the capital of the Netherlands. Data from 63 stations in Amsterdam (˜575 km2) that measure rainfall over at least 4 months in a 17-month period are evaluated. In addition, a detailed assessment is made of three Netatmo stations, the largest contributor to this dataset, in an experimental set-up. The sensor performance in the experimental set-up and the density of the PWS network are promising. However, features in the online platforms, like rounding and thresholds, cause changes from the original time series, resulting in considerable errors in the datasets obtained. These errors are especially large during low-intensity rainfall, although they can be reduced by accumulating rainfall over longer intervals. Accumulation improves the correlation coefficient with gauge-adjusted radar data from 0.48 at 5 min intervals to 0.60 at hourly intervals. Spatial rainfall correlation functions derived from PWS data show much more small-scale variability than those based on gauge-adjusted radar data and those found in similar research using dedicated rain gauge networks. This can largely be attributed to the noise in the PWS data resulting from both the measurement set-up and the processes occurring in the data transfer to the online PWS platform. A double mass comparison with gauge-adjusted radar data shows that the median of the stations resembles the rainfall reference better than the real-time (unadjusted) radar product. Averaging nearby raw PWS measurements further improves the match with gauge-adjusted radar data in that area. These results confirm that the growing number of internet-connected PWSs could successfully be used for urban rainfall monitoring.

  13. Extreme Sparse Multinomial Logistic Regression: A Fast and Robust Framework for Hyperspectral Image Classification

    NASA Astrophysics Data System (ADS)

    Cao, Faxian; Yang, Zhijing; Ren, Jinchang; Ling, Wing-Kuen; Zhao, Huimin; Marshall, Stephen

    2017-12-01

    Although sparse multinomial logistic regression (SMLR) provides a useful tool for sparse classification, it suffers from inefficacy in dealing with high-dimensional features and manually set initial regressor values. This has significantly constrained its applications for hyperspectral image (HSI) classification. In order to tackle these two drawbacks, an extreme sparse multinomial logistic regression (ESMLR) is proposed for effective classification of HSI. First, the HSI dataset is projected to a new feature space with randomly generated weight and bias. Second, an optimization model is established by the Lagrange multiplier method and the dual principle to automatically determine a good initial regressor for SMLR by minimizing the training error and the regressor value. Furthermore, the extended multi-attribute profiles (EMAPs) are utilized for extracting both the spectral and spatial features. A combinational linear multiple features learning (MFL) method is proposed to further enhance the features extracted by ESMLR and EMAPs. Finally, logistic regression via variable splitting and augmented Lagrangian (LORSAL) is adopted in the proposed framework to reduce the computational time. Experiments conducted on two well-known HSI datasets, namely the Indian Pines dataset and the Pavia University dataset, demonstrate the fast and robust performance of the proposed ESMLR framework.
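
    A simplified analogue of the ESMLR front end, under stated assumptions: project the inputs through a random weight/bias layer with a sigmoid (the randomly generated feature space described above), then train a multinomial logistic regression in that space. scikit-learn stands in for the LORSAL solver, and the EMAP spatial features are omitted.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(4)
    X, y = make_classification(n_samples=2000, n_features=200, n_informative=50,
                               n_classes=5, random_state=0)  # stand-in for HSI pixels

    # Random weight/bias projection followed by a sigmoid nonlinearity
    W = rng.standard_normal((X.shape[1], 500))
    b = rng.standard_normal(500)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))

    Xtr, Xte, ytr, yte = train_test_split(H, y, test_size=0.3, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(Xtr, ytr)
    print(f"accuracy in the random feature space: {clf.score(Xte, yte):.2f}")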

  14. Scalable Open Science Approach for Mutation Calling of Tumor Exomes Using Multiple Genomic Pipelines.

    PubMed

    Ellrott, Kyle; Bailey, Matthew H; Saksena, Gordon; Covington, Kyle R; Kandoth, Cyriac; Stewart, Chip; Hess, Julian; Ma, Singer; Chiotti, Kami E; McLellan, Michael; Sofia, Heidi J; Hutter, Carolyn; Getz, Gad; Wheeler, David; Ding, Li

    2018-03-28

    The Cancer Genome Atlas (TCGA) cancer genomics dataset includes over 10,000 tumor-normal exome pairs across 33 different cancer types, in total >400 TB of raw data files requiring analysis. Here we describe the Multi-Center Mutation Calling in Multiple Cancers project, our effort to generate a comprehensive encyclopedia of somatic mutation calls for the TCGA data to enable robust cross-tumor-type analyses. Our approach accounts for variance and batch effects introduced by the rapid advancement of DNA extraction, hybridization-capture, sequencing, and analysis methods over time. We present best practices for applying an ensemble of seven mutation-calling algorithms with scoring and artifact filtering. The dataset created by this analysis includes 3.5 million somatic variants and forms the basis for PanCan Atlas papers. The results have been made available to the research community along with the methods used to generate them. This project is the result of collaboration from a number of institutes and demonstrates how team science drives extremely large genomics projects. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.
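
    At its simplest, the ensemble idea reduces to consensus voting across callers. The sketch below keeps variants reported by at least two callers; the caller names and calls are placeholders, and the project's actual scoring and artifact filtering are far more involved.

    from collections import Counter

    calls = {
        "caller_a": {("chr1", 1000, "A", "T"), ("chr2", 500, "G", "C")},
        "caller_b": {("chr1", 1000, "A", "T"), ("chr3", 42, "T", "G")},
        "caller_c": {("chr1", 1000, "A", "T"), ("chr2", 500, "G", "C")},
        "caller_d": {("chr2", 500, "G", "C")},
    }

    counts = Counter(v for callset in calls.values() for v in callset)
    consensus = {v for v, n in counts.items() if n >= 2}   # >= 2-of-4 agreement
    print(sorted(consensus))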

  15. Detection of timescales in evolving complex systems

    PubMed Central

    Darst, Richard K.; Granell, Clara; Arenas, Alex; Gómez, Sergio; Saramäki, Jari; Fortunato, Santo

    2016-01-01

    Most complex systems are intrinsically dynamic in nature. The evolution of a dynamic complex system is typically represented as a sequence of snapshots, where each snapshot describes the configuration of the system at a particular instant of time. This is often done by using constant intervals but a better approach would be to define dynamic intervals that match the evolution of the system’s configuration. To this end, we propose a method that aims at detecting evolutionary changes in the configuration of a complex system, and generates intervals accordingly. We show that evolutionary timescales can be identified by looking for peaks in the similarity between the sets of events on consecutive time intervals of data. Tests on simple toy models reveal that the technique is able to detect evolutionary timescales of time-varying data both when the evolution is smooth as well as when it changes sharply. This is further corroborated by analyses of several real datasets. Our method is scalable to extremely large datasets and is computationally efficient. This allows a quick, parameter-free detection of multiple timescales in the evolution of a complex system. PMID:28004820
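
    The core measurement, similarity between the sets of events in consecutive windows, can be sketched with a Jaccard index on a toy two-regime event stream (an assumption for illustration); the similarity collapses at the regime boundary, which is the kind of structure the method detects.

    import numpy as np

    rng = np.random.default_rng(5)
    pairs_a = [(i, j) for i in range(20) for j in range(i)]           # regime A edges
    pairs_b = [(i, j) for i in range(20, 40) for j in range(20, i)]   # regime B edges

    def window_events(regime, size=60):
        idx = rng.integers(0, len(regime), size)
        return {regime[k] for k in idx}

    windows = [window_events(pairs_a) for _ in range(10)] + \
              [window_events(pairs_b) for _ in range(10)]

    def jaccard(s, t):
        return len(s & t) / len(s | t)

    sims = [jaccard(windows[i], windows[i + 1]) for i in range(len(windows) - 1)]
    print(np.round(sims, 2))   # similarity collapses at the regime boundary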

  16. Development of a coupled hydrological - hydrodynamic model for probabilistic catchment flood inundation modelling

    NASA Astrophysics Data System (ADS)

    Quinn, Niall; Freer, Jim; Coxon, Gemma; Dunne, Toby; Neal, Jeff; Bates, Paul; Sampson, Chris; Smith, Andy; Parkin, Geoff

    2017-04-01

    Computationally efficient flood inundation modelling systems capable of representing important hydrological and hydrodynamic flood-generating processes over relatively large regions are vital for those interested in flood preparation, response, and real-time forecasting. However, such systems are currently not readily available. This is particularly important where flood predictions from intense rainfall are considered, as the processes leading to flooding often involve localised, non-linear, spatially connected hillslope-catchment responses. Therefore, this research introduces a novel hydrological-hydraulic modelling framework for the provision of probabilistic flood inundation predictions across catchment to regional scales, one that explicitly accounts for spatial variability in rainfall-runoff and routing processes. Approaches have been developed to automate the provision of required input datasets and estimate essential catchment characteristics from freely available national datasets. This is an essential component of the framework, as when making predictions over multiple catchments or at relatively large scales, where data are often scarce, obtaining local information and manually incorporating it into the model quickly becomes infeasible. An extreme flooding event in the town of Morpeth, NE England, in 2008 was used as a first case-study evaluation of the modelling framework introduced. The results demonstrated a high degree of prediction accuracy when comparing modelled and reconstructed characteristics of the event, while the efficiency of the modelling approach enabled the generation of relatively large ensembles of realisations from which uncertainty within the prediction may be represented. This research supports previous literature highlighting the importance of probabilistic forecasting, particularly during extreme events, which can often be poorly characterised or even missed by deterministic predictions due to the inherent uncertainty in any model application. Future research will aim to further evaluate the robustness of the approaches introduced by applying the modelling framework to a variety of historical flood events across UK catchments. Furthermore, the flexibility and efficiency of the framework are ideally suited to examining the propagation of errors through the model, which will help gain a better understanding of the dominant sources of uncertainty currently impacting flood inundation predictions.

  17. A peek into the future of radiology using big data applications.

    PubMed

    Kharat, Amit T; Singhal, Shubham

    2017-01-01

    Big data refers to the extremely large amount of data available in the radiology department. Big data is identified by four Vs - Volume, Velocity, Variety, and Veracity. By applying different algorithmic tools and converting raw data to transformed data in such large datasets, there is a possibility of understanding and using radiology data for gaining new knowledge and insights. Big data analytics consists of 6Cs - Connection, Cloud, Cyber, Content, Community, and Customization. The global technological prowess and per-capita capacity to save digital information has roughly doubled every 40 months since the 1980s. By using big data, the planning and implementation of radiological procedures in radiology departments can be given a great boost. Potential applications of big data in the future are scheduling of scans, creating patient-specific personalized scanning protocols, radiologist decision support, emergency reporting, virtual quality assurance for the radiologist, etc. Targeted use of big data applications can be done for images by supporting the analytic process. Screening software tools designed on big data can be used to highlight a region of interest, such as subtle changes in parenchymal density, a solitary pulmonary nodule, or focal hepatic lesions, by plotting its multidimensional anatomy. Following this, we can run more complex applications such as three-dimensional multiplanar reconstructions (MPR), volumetric rendering (VR), and curved planar reconstruction, which consume higher system resources, on targeted data subsets rather than querying the complete cross-sectional imaging dataset. This pre-emptive selection of the dataset can substantially reduce system requirements such as system memory and server load, and provide prompt results. However, a word of caution: big data should not become "dump data" due to inadequate and poor analysis and non-structured, improperly stored data. In the near future, big data can ring in the era of personalized and individualized healthcare.

  18. Percentile-Based ETCCDI Temperature Extremes Indices for CMIP5 Model Output: New Results through Semiparametric Quantile Regression Approach

    NASA Astrophysics Data System (ADS)

    Li, L.; Yang, C.

    2017-12-01

    Climate extremes often manifest as rare events in terms of surface air temperature and precipitation with an annual reoccurrence period. In order to represent the manifold characteristics of climate extremes for monitoring and analysis, the Expert Team on Climate Change Detection and Indices (ETCCDI) worked out a set of 27 core indices based on daily temperature and precipitation data, describing extreme weather and climate events on an annual basis. The CLIMDEX project (http://www.climdex.org) has produced public domain datasets of such indices for data from a variety of sources, including output from global climate models (GCM) participating in the Coupled Model Intercomparison Project Phase 5 (CMIP5). Among the 27 ETCCDI indices, there are six percentile-based temperature extremes indices that fall into two groups: exceedance rates (ER) (TN10p, TN90p, TX10p and TX90p) and durations (CSDI and WSDI). Percentiles must be estimated prior to the calculation of the indices, and can be biased to a greater or lesser degree by the adopted algorithm. Such biases will in turn be propagated to the final results of the indices. CLIMDEX used an empirical quantile estimator combined with a bootstrap resampling procedure to reduce the inhomogeneity in the annual series of the ER indices. However, some problems remain in the CLIMDEX datasets, namely the overestimated climate variability due to unaccounted autocorrelation in the daily temperature data, seasonally varying biases, and inconsistency between the algorithms applied to the ER indices and to the duration indices. We now present new results for the six indices through a semiparametric quantile regression approach for the CMIP5 model output. By using the base-period data as a whole and taking seasonality and autocorrelation into account, this approach successfully addresses the aforementioned issues and comes out with consistent results. The new datasets cover the historical and three projected (RCP2.6, RCP4.5 and RCP8.5) emission scenarios, run with a multimodel ensemble of 19 members. We analyze changes in the six indices on global and regional scales over the 21st century relative to either the base period 1961-1990 or the reference period 1981-2000, and compare the results with those based on the CLIMDEX datasets.
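
    For reference, the sketch below computes a TX90p-style exceedance rate from the empirical calendar-day percentile (5-day window over a base period), i.e. the kind of baseline the semiparametric quantile-regression approach is measured against; the synthetic temperatures and the window choice are assumptions.

    import numpy as np

    rng = np.random.default_rng(6)
    nyears, base = 50, 30
    doy = np.arange(365)
    clim = 15 + 10 * np.sin(2 * np.pi * (doy - 120) / 365)    # seasonal cycle
    tmax = clim + 3 * rng.standard_normal((nyears, 365))
    tmax += np.linspace(0, 1.5, nyears)[:, None]              # imposed warming trend

    # Calendar-day 90th percentile from the base period, 5-day moving window
    pct = np.empty(365)
    for d in doy:
        win = [(d + k) % 365 for k in range(-2, 3)]
        pct[d] = np.percentile(tmax[:base][:, win], 90)

    tx90p = 100 * (tmax > pct).mean(axis=1)     # % of days above the threshold
    print(np.round(tx90p[[0, 24, 49]], 1))      # rises with the imposed trend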

  19. An Effective Methodology for Processing and Analyzing Large, Complex Spacecraft Data Streams

    ERIC Educational Resources Information Center

    Teymourlouei, Haydar

    2013-01-01

    The emerging large datasets have made efficient data processing a much more difficult task for the traditional methodologies. Invariably, datasets continue to increase rapidly in size with time. The purpose of this research is to give an overview of some of the tools and techniques that can be utilized to manage and analyze large datasets. We…

  20. Evaluation of Prospective Changes in Temperature Extremes for the CORDEX-Australasia Domain Using the NEX-GDDP Dataset

    NASA Astrophysics Data System (ADS)

    Turp, M. Tufan; An, Nazan; Kurnaz, M. Levent

    2017-04-01

    CORDEX-Australasia is a vast domain that comprises primarily Australia, New Zealand, and Papua New Guinea, while also covering Pacific islands such as New Caledonia, Fiji, Tonga, Tuvalu, and Vanuatu. The climate of Australasia varies from tropical monsoonal and arid to moist temperate and alpine. The number of studies about the Australasia domain is very limited, and the region is in urgent need of further efforts. This research points out the relationship between climate change and temperature extremes over the domain of Australasia, investigating the changes in a number of specific temperature extreme indices (i.e. summer days, consecutive summer days, heat wave duration, very warm days, tropical nights, etc.) as described by the joint CCl/CLIVAR/JCOMM Expert Team (ET) on Climate Change Detection and Indices (ETCCDI). All these extreme indices were calculated using the NASA Earth Exchange Global Daily Downscaled Projections (NEX-GDDP) dataset. The index computations employed the bias-corrected daily minimum and maximum air temperature variables of the ACCESS1-0 and MPI-ESM-MR global circulation models, which were statistically downscaled to a 0.25° x 0.25° spatial resolution by the Climate Analytics Group and NASA Ames Research Center, under both medium-low and high emission trajectories (i.e. RCP4.5 and RCP8.5). Moreover, the analysis of the projected changes in the temperature extremes was carried out for the period 2081-2100 with respect to the reference period 1986-2005. Acknowledgements: This research has been supported by Bogazici University Research Fund Grant Number 12220. Climate scenarios used were from the NEX-GDDP dataset, prepared by the Climate Analytics Group and NASA Ames Research Center using the NASA Earth Exchange, and distributed by the NASA Center for Climate Simulation (NCCS).

  1. Extreme precipitation and floods in the Iberian Peninsula and its socio-economic impacts

    NASA Astrophysics Data System (ADS)

    Ramos, A. M.; Pereira, S.; Trigo, R. M.; Zêzere, J. L.

    2017-12-01

    Extreme precipitation events in the Iberian Peninsula can induce floods and landslides that often have major socio-economic impacts. The DISASTER database gathered the basic information on past floods and landslides that caused social consequences in Portugal for the period 1865-2015. This database was built under the assumption that the social consequences of floods and landslides are sufficiently relevant to be reported by newspapers, which provide the data source. Three extreme historical events were analysed in detail, taking into account their wide socio-economic impacts. The December 1876 record precipitation and flood event led to all-time record flows in two large international rivers (Tagus and Guadiana). As a direct consequence, several Portuguese and Spanish towns and villages located on the banks of both rivers suffered serious flood damage on 7 December 1876. The 20-28 December 1909 event recorded the highest number of flood and landslide cases that occurred in Portugal in the period 1865-2015, triggering the highest floods in 200 years at the Douro river's mouth and causing 89 fatalities in the northern regions of both Portugal and Spain. More recently, the deadliest flash-flooding event to affect Portugal since at least the early 19th century took place on 25 and 26 November 1967, causing more than 500 fatalities in the Lisbon region. We provide a detailed analysis of each of these events, including their human impacts, precipitation analyses based on historical datasets and the associated atmospheric circulation conditions from reanalysis datasets. Acknowledgements: This work was supported by the project FORLAND - Hydrogeomorphologic risk in Portugal: driving forces and application for land use planning [PTDC / ATPGEO / 1660/2014] funded by the Portuguese Foundation for Science and Technology (FCT), Portugal. A. M. Ramos was also supported by a FCT postdoctoral grant (FCT/DFRH/ SFRH/BPD/84328/2012). The financial support for attending this workshop was also possible through FCT project UID/GEO/50019/2013 - Instituto Dom Luiz.

  2. Assessment of extreme value distributions for maximum temperature in the Mediterranean area

    NASA Astrophysics Data System (ADS)

    Beck, Alexander; Hertig, Elke; Jacobeit, Jucundus

    2015-04-01

    Extreme maximum temperatures strongly affect the natural as well as the societal environment. Heat stress has great effects on flora, fauna and humans, culminating in heat-related morbidity and mortality. Agriculture and different industries are severely affected by extreme air temperatures. Under climate change conditions it becomes even more necessary to detect potential hazards which arise from changes in the distributional parameters of extreme values, and this is especially relevant for the Mediterranean region, which is characterized as a climate change hot spot. Therefore statistical approaches are developed to estimate these parameters with a focus on non-stationarities emerging in the relationship between regional climate variables and their large-scale predictors like sea level pressure, geopotential heights, atmospheric temperatures and relative humidity. Gridded maximum temperature data from the daily E-OBS dataset (Haylock et al., 2008) with a spatial resolution of 0.25° x 0.25° from January 1950 until December 2012 are the predictands for the present analyses. An s-mode principal component analysis (PCA) has been performed in order to reduce data dimension and to retain different regions of similar maximum temperature variability. The grid box with the highest PC-loading represents the corresponding principal component. A central part of the analyses is the model development for temperature extremes using extreme value statistics. A combined model is derived, consisting of a Generalized Pareto Distribution (GPD) model and a quantile regression (QR) model which determines the GPD location parameters. The QR model as well as the scale parameters of the GPD model are conditioned by various large-scale predictor variables. In order to account for potential non-stationarities in the predictor-temperature relationships, a special calibration and validation scheme is applied. Haylock, M. R., N. Hofstra, A. M. G. Klein Tank, E. J. Klok, P. D. Jones, and M. New (2008), A European daily high-resolution gridded data set of surface temperature and precipitation for 1950 - 2006, J. Geophys. Res., 113, D20119, doi:10.1029/2008JD010201.
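
    A minimal sketch of the combined model's structure, under stated assumptions: a 95%-quantile regression on a single synthetic large-scale predictor supplies the covariate-dependent location (threshold), and a GPD is fitted to the excesses above it. The actual study conditions both the QR model and the GPD scale parameter on several predictor fields.

    import numpy as np
    import statsmodels.api as sm
    from scipy.stats import genpareto

    rng = np.random.default_rng(7)
    n = 4000
    z500 = rng.standard_normal(n)                   # stand-in large-scale predictor
    tmax = 28 + 2.5 * z500 + 3 * rng.standard_normal(n)

    # Covariate-dependent threshold from a 95%-quantile regression
    X = sm.add_constant(z500)
    thresh = sm.QuantReg(tmax, X).fit(q=0.95).predict(X)

    # GPD for the exceedances (location fixed at zero)
    exc = tmax - thresh
    exc = exc[exc > 0]
    shape, loc, scale = genpareto.fit(exc, floc=0)
    print(f"xi = {shape:.2f}, sigma = {scale:.2f}, n_exc = {exc.size}")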

  3. Dataset definition for CMS operations and physics analyses

    NASA Astrophysics Data System (ADS)

    Franzoni, Giovanni; Compact Muon Solenoid Collaboration

    2016-04-01

    Data recorded at the CMS experiment are funnelled into streams, integrated in the HLT menu, and further organised in a hierarchical structure of primary datasets and secondary datasets/dedicated skims. Datasets are defined according to the final-state particles reconstructed by the high level trigger, the data format and the use case (physics analysis, alignment and calibration, performance studies). During the first LHC run, new workflows were added to this canonical scheme to best exploit the flexibility of the CMS trigger and data acquisition systems. The concepts of data parking and data scouting were introduced to extend the physics reach of CMS, offering the opportunity of defining physics triggers with extremely loose selections (e.g. a dijet resonance trigger collecting data at a rate of 1 kHz). In this presentation, we review the evolution of the dataset definition during LHC Run I, and we discuss the plans for Run II.

  4. Online Visualization and Value Added Services of MERRA-2 Data at GES DISC

    NASA Technical Reports Server (NTRS)

    Shen, Suhung; Ostrenga, Dana M.; Vollmer, Bruce E.; Hegde, Mahabaleshwa S.; Wei, Jennifer C.; Bosilovich, Michael G.

    2017-01-01

    NASA climate reanalysis datasets from MERRA-2, distributed at the Goddard Earth Sciences Data and Information Services Center (GES DISC), have been used in broad research areas, such as climate variations, extreme weather, agriculture, renewable energy, and air quality. The datasets contain numerous variables for atmosphere, land, and ocean, grouped into 95 products. The total archived volume was approximately 337 TB (approximately 562K files) at the end of October 2017. Due to the large number of products and files, and the large data volumes, it may be a challenge for a user to find and download the data of interest. The support team at GES DISC, working closely with the MERRA-2 science team, has created and is continuing to work on value-added data services to best meet the needs of a broad user community. This presentation, using aerosol over the Asian monsoon region as an example, provides an overview of the MERRA-2 data services at GES DISC, including: How to find the data? How many data access methods are provided? What are the best data access methods for me? How to download subsetted (parameter, spatial, temporal) data and save it in a preferred spatial resolution and data format? How to visualize and explore the data online? In addition, we introduce a future online analytic tool designed to support application research, focusing on long-term hourly time-series data access and analysis.

  5. Globus Online: Climate Data Management for Small Teams

    NASA Astrophysics Data System (ADS)

    Ananthakrishnan, R.; Foster, I.

    2013-12-01

    Large and highly distributed climate data demands new approaches to data organization and lifecycle management. We need, in particular, catalogs that can allow researchers to track the location and properties of large numbers of data files, and management tools that can allow researchers to update data properties and organization during their research, move data among different locations, and invoke analysis computations on data--all as easily as if they were working with small numbers of files on their desktop computer. Both catalogs and management tools often need to be able to scale to extremely large quantities of data. When developing solutions to these problems, it is important to distinguish between the needs of (a) large communities, for whom the ability to organize published data is crucial (e.g., by implementing formal data publication processes, assigning DOIs, recording definitive metadata, providing for versioning), and (b) individual researchers and small teams, who are more frequently concerned with tracking the diverse data and computations involved in highly dynamic and iterative research processes. Key requirements in the latter case include automated data registration and metadata extraction, ease of update, close-to-zero management overheads (e.g., no local software install); and flexible, user-managed sharing support, allowing read and write privileges within small groups. We describe here how new capabilities provided by the Globus Online system address the needs of the latter group of climate scientists, providing for the rapid creation and establishment of lightweight individual- or team-specific catalogs; the definition of logical groupings of data elements, called datasets; the evolution of catalogs, dataset definitions, and associated metadata over time, to track changes in data properties and organization as a result of research processes; and the manipulation of data referenced by catalog entries (e.g., replication of a dataset to a remote location for analysis, sharing of a dataset). Its software-as-a-service ('SaaS') architecture means that these capabilities are provided to users over the network, without a need for local software installation. In addition, Globus Online provides well-defined APIs, thus providing a platform that can be leveraged to integrate these capabilities with other portals and applications. We describe early applications of these new Globus Online capabilities to climate science. We focus in particular on applications that demonstrate how Globus Online capabilities complement those of the Earth System Grid Federation (ESGF), the premier system for publication and discovery of large community datasets. ESGF already uses Globus Online mechanisms for data download. We demonstrate methods by which the two systems can be further integrated and harmonized, so that, for example, data collections produced within a small team can be easily published from Globus Online to ESGF for archival storage and broader access--and a Globus Online catalog can be used to organize an individual view of a subset of data held in ESGF.

  6. Towards a detailed knowledge about Mediterranean flash floods and extreme floods in the catchments of Spain, France and Italy

    NASA Astrophysics Data System (ADS)

    Duband, D.

    2009-09-01

    Scientific research programmes of the European Commission and its contributors have built multidisciplinary knowledge (geography, history, meteorology, climatology, hydrology, geomorphology, geology, paleohydrology, sociology, economy) and a better understanding of the physical risk assessment of disastrous floods (particularly flash floods), in the face of rising vulnerability and possible climate change at the end of the 21st century, in the triangular geographical area Zaragoza (Spain) - Orléans (France) - Firenze (Italy). With reference to historical flood events observed over the last two centuries in Spain (Catalonia), France (Languedoc-Roussillon, Provence-Alpes-Côte d'Azur, Corse, Rhône-Alpes, Auvergne, Bourgogne) and Italy (Ligurie, Piemont, Lombardie), we lay particular stress on a detailed understanding of the spatial and temporal scales of the physical dynamic processes at the origin of local or extensive flash floods. Such a study must be based on the meteorology (atmospheric circulation patterns over western Europe, the Atlantic and the Mediterranean Sea) responsible, together with relief and sea surface temperature, for the heavy precipitation (amounts, intensities), air temperatures and high flood discharges observed in the past on large and coastal rivers. We take the example of the Rhône river catchments, in connection with the Po, Ebre, Loire and Seine rivers, based on studies of thirty major historical floods that occurred from 1840 to 2005 and on the characteristics of the oceanic and Mediterranean weather situations, which sometimes alternate. In recent years, daily mean sea-level pressure (EMSLP) reconstructions have become available for the European-North Atlantic region for the period 1850-2006. It is therefore now possible to use an analog method (as in operational daily applications at Electricité de France since 1969) to select, from the complete meteorological dataset for the 1950-2009 period, weather situations similar to the historical daily situations responsible for extreme floods with large discharges, together with the associated conditional precipitation on catchments with good, up-to-date precipitation observations (daily, hourly). Such complete studies would be very useful for: statistical-physical studies of extreme rainfall-flood events (peak discharge, volume) and their frequency, probability and uncertainty (GRADEX and SHADEX methodologies); better forecasting of meteorological (precipitation) and hydrological (flood) events during crisis situations; better understanding of the historical variability of the past two centuries (atmospheric features, precipitation, high/low discharges); better adjustment of modelling simulations; and better identification and probabilistic treatment of uncertainties.

  7. Really big data: Processing and analysis of large datasets

    USDA-ARS?s Scientific Manuscript database

    Modern animal breeding datasets are large and getting larger, due in part to the recent availability of DNA data for many animals. Computational methods for efficiently storing and analyzing those data are under development. The amount of storage space required for such datasets is increasing rapidl...

  8. Temporal and spatial scaling impacts on extreme precipitation

    NASA Astrophysics Data System (ADS)

    Eggert, B.; Berg, P.; Haerter, J. O.; Jacob, D.; Moseley, C.

    2015-01-01

    Both in the current climate and in the light of climate change, understanding of the causes and risk of precipitation extremes is essential for protection of human life and adequate design of infrastructure. Precipitation extreme events depend qualitatively on the temporal and spatial scales at which they are measured, in part due to the distinct types of rain formation processes that dominate extremes at different scales. To capture these differences, we first filter large datasets of high-resolution radar measurements over Germany (5 min temporally and 1 km spatially) using synoptic cloud observations, to distinguish convective and stratiform rain events. In a second step, for each precipitation type, the observed data are aggregated over a sequence of time intervals and spatial areas. The resulting matrix allows a detailed investigation of the resolutions at which convective or stratiform events are expected to contribute most to the extremes. We analyze where the statistics of the two types differ and discuss at which resolutions transitions occur between dominance of either of the two precipitation types. We characterize the scales at which the convective or stratiform events will dominate the statistics. For both types, we further develop a mapping between pairs of spatially and temporally aggregated statistics. The resulting curve is relevant when deciding on data resolutions where statistical information in space and time is balanced. Our study may hence also serve as a practical guide for modelers, and for planning the space-time layout of measurement campaigns. We also describe a mapping between different pairs of resolutions, possibly relevant when working with mismatched model and observational resolutions, such as in statistical bias correction.
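
    The space-time aggregation matrix can be sketched as follows: accumulate a (time, y, x) field over coarser temporal blocks, average over coarser spatial blocks, and tabulate an extreme quantile for each resolution pair. The random field below is a stand-in for the German radar composite.

    import numpy as np

    rng = np.random.default_rng(8)
    P = rng.gamma(0.1, 2.0, size=(288, 64, 64))     # toy 5-min, 1-km field (one day)

    def aggregate(field, t_fac, s_fac):
        """Accumulate in time, average in space (areal-mean rainfall)."""
        t, y, x = field.shape
        f = field[: t - t % t_fac, : y - y % s_fac, : x - x % s_fac]
        f = f.reshape(-1, t_fac, f.shape[1], f.shape[2]).sum(axis=1)   # time
        f = f.reshape(f.shape[0], -1, s_fac, f.shape[2]).mean(axis=2)  # y
        f = f.reshape(f.shape[0], f.shape[1], -1, s_fac).mean(axis=3)  # x
        return f

    for t_fac in (1, 12, 288):            # 5 min, 1 h, 1 day
        row = [np.percentile(aggregate(P, t_fac, s_fac), 99.9)
               for s_fac in (1, 4, 16)]   # 1, 4, 16 km
        print(t_fac, np.round(row, 1))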

  9. Links between extreme UV-radiation, total ozone, surface albedo and cloudiness: An analysis of 30 years of data from Switzerland and Austria

    NASA Astrophysics Data System (ADS)

    Rieder, H. E.; Staehelin, J.; Weihs, P.; Vuilleumier, L.; Blumthaler, M.; Holawe, F.; Lindfors, A.; Maeder, J. A.; Simic, S.; Wagner, J. E.; Walker, D.; Ribatet, M.

    2009-04-01

    Since the discovery of anthropogenic ozone depletion in the early 1970s (e.g. Molina and Rowland, 1974; Farman et al., 1985) the interest in stratospheric ozone trends and solar UV-B increased within the scientific community and the general public because of the link between reduced total column ozone and increased UV-radiation doses. Stratospheric ozone (e.g. Koch et al., 2005) and erythemal UV-radiation (e.g. Rieder et al., 2008) in the northern mid-latitudes are characterized by strong temporal variability. Long-term measurements of UV-B radiation are rare; datasets are available for only a few locations, and most of these measurements do not provide spectral information on the UV part of the spectrum. Thanks to strong efforts in the reconstruction of erythemal UV, datasets of past UV-radiation doses have become available for several measurement sites all over the globe. For Switzerland and Austria, reconstructed UV datasets are available for three measurement sites (Davos, Sonnblick and Vienna) (Lindfors and Vuilleumier, 2005; Rieder et al., 2008). The world's longest ozone time series, dating back to 1926, is available from Arosa, Switzerland, and is discussed in detail by Staehelin et al. (1998a,b). Recently, new tools from extreme value theory have been applied to the Arosa time series to describe extreme events in low and high total ozone (Rieder et al., 2009). In our study we address the question of how much of the extremes in UV-radiation can be attributed to extremes in total ozone, high surface albedo and cloudiness. An analysis of the frequency distributions of such extreme events for the last decades is presented to gain a better understanding of the links between extreme erythemal UV-radiation, total ozone, surface albedo and clouds. References: Farman, J. C., Gardiner, B. G., and Shanklin, J. D.: Large losses of total ozone in Antarctica reveal seasonal ClOx/NOx interaction, Nature, 315, 207-210, 1985. Koch, G., Wernli, H., Schwierz, C., Staehelin, J., and Peter, T.: A composite study on the structure and formation of ozone miniholes and minihighs over central Europe, J. Geophys. Res., 32, doi:10.1029/2004GL022062, 2005. Lindfors, A., and Vuilleumier, L.: Erythemal UV at Davos (Switzerland), 1926-2003, estimated using total ozone, sunshine duration, and snow depth, J. Geophys. Res., 110, D02104, doi:10.1029/2004JD005231, 2005. Molina, M. J., and Rowland, F. S.: Stratospheric sink for chlorofluoromethanes: Chlorine atom-catalysed destruction of ozone, Nature, 249, 810-812, 1974. Rieder, H.E., Holawe, F., Simic, S., Blumthaler, M., Krzyscin, J.W., Wagner J.E., Schmalwieser A.W., and Weihs, P.: Reconstruction of erythemal UV-doses for two stations in Austria: A comparison between alpine and urban regions, Atmos. Chem. Phys., 8, 6309-6323, 2008. Rieder, H.E., Staehelin, J., Maeder, J.A., Ribatet, M., Stübi, R., Weihs, P., Holawe, F., Peter, T., and Davison, A.C.: From ozone mini holes and mini highs towards extreme value theory: New insights from extreme events and non-stationarity, submitted to J. Geophys. Res., 2009. Staehelin, J., Kegel, R., and Harris, N. R.: Trend analysis of the homogenized total ozone series of Arosa (Switzerland), 1929-1996, J. Geophys. Res., 103(D7), 8389-8400, doi:10.1029/97JD03650, 1998a. Staehelin, J., Renaud, A., Bader, J., McPeters, R., Viatte, P., Hoegger, B., Bugnion, V., Giroud, M., and Schill, H.: Total ozone series at Arosa (Switzerland): Homogenization and data comparison, J. Geophys. Res., 103(D5), 5827-5842, doi:10.1029/97JD02402, 1998b.

  10. Regional-scale analysis of extreme precipitation from short and fragmented records

    NASA Astrophysics Data System (ADS)

    Libertino, Andrea; Allamano, Paola; Laio, Francesco; Claps, Pierluigi

    2018-02-01

    The rain gauge is the oldest and most accurate instrument for rainfall measurement, able to provide long series of reliable data. However, rain gauge records are often plagued by gaps, spatio-temporal discontinuities and inhomogeneities that could affect their suitability for a statistical assessment of the characteristics of extreme rainfall. Furthermore, the need to discard shorter series to obtain robust estimates leads to ignoring a significant amount of information, which can be essential, especially when large return-period estimates are sought. This work describes a robust statistical framework for dealing with uneven and fragmented rainfall records on a regional spatial domain. The proposed technique, named "patched kriging", allows one to exploit all the information available from the recorded series, independently of their length, to provide extreme rainfall estimates in ungauged areas. The methodology involves the sequential application of the ordinary kriging equations, producing a homogeneous dataset of synthetic series with uniform lengths. In this way, the errors inherent to any regional statistical estimation can be easily represented in the spatial domain and, possibly, corrected. Furthermore, the homogeneity of the obtained series provides robustness against local artefacts during the parameter-estimation phase. The application to a case study in north-western Italy demonstrates the potential of the methodology and provides a significant basis for discussing its advantages over previous techniques.
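
    For orientation, a single ordinary-kriging step of the kind applied sequentially by the patched-kriging procedure might look like the sketch below; the exponential covariance model and its parameters are assumptions, not the calibrated variogram of the study.

    import numpy as np

    def ordinary_kriging(xy_obs, z_obs, xy_tgt, sill=1.0, corr_len=50.0):
        """Ordinary-kriging estimate at one target point (exponential covariance)."""
        d = np.linalg.norm(xy_obs[:, None, :] - xy_obs[None, :, :], axis=-1)
        K = sill * np.exp(-d / corr_len)                      # obs-obs covariances
        k = sill * np.exp(-np.linalg.norm(xy_obs - xy_tgt, axis=1) / corr_len)
        n = len(z_obs)
        A = np.ones((n + 1, n + 1))                           # kriging system with
        A[:n, :n] = K                                         # unbiasedness constraint
        A[n, n] = 0.0
        w = np.linalg.solve(A, np.append(k, 1.0))[:n]
        return w @ z_obs

    xy = np.array([[0.0, 0.0], [30.0, 5.0], [10.0, 40.0], [60.0, 60.0]])  # gauges (km)
    z = np.array([12.0, 18.0, 9.0, 25.0])                                 # e.g. mm/h maxima
    print(f"estimate at (20, 20): {ordinary_kriging(xy, z, np.array([20.0, 20.0])):.1f}")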

  11. Attribution of extreme rainfall from Hurricane Harvey, August 2017

    NASA Astrophysics Data System (ADS)

    van der Wiel, K.; van Oldenborgh, G. J.; Sebastian, A.; Singh, R.; Arrighi, J.; Otto, F. E. L.; Haustein, K.; Li, S.; Vecchi, G.; Cullen, H. M.

    2017-12-01

    During August 25-30, 2017, Hurricane Harvey stalled over Texas and caused extreme precipitation over Houston and the surrounding area, particularly on August 26-28. This resulted in extensive flooding with over 80 fatalities and large economic costs. Using observational datasets and high-resolution global climate model experiments, we investigate the return period of this event and to what extent anthropogenic climate change influenced the likelihood and intensity of this type of event. The event definition for the attribution is set by the main impact, flooding in the city of Houston. Most rivers crested on August 28 or 29, driven by intensive rainfall on August 26-28. We therefore use the annual maximum of three-day average precipitation as the event definition. Station data (GHCN-D) and a gridded precipitation product (CPC unified analysis) are used to find the return period of the event and changes in the observed record. To attribute changes to anthropogenic climate change, we use time-slice experiments from two high-resolution global climate models (EC-Earth 2.3 and GFDL HiFLOR, both integrated at approximately 25 km). A regional model (HadRM3P) was rejected because of unrealistic modelled extremes. Finally, we put the attribution results in context, given local vulnerability and exposure.
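
    The event definition translates directly into code: take the annual maxima of the 3-day running-mean precipitation and fit a GEV to estimate the return period. The synthetic record below stands in for the GHCN-D and CPC data used in the study.

    import numpy as np
    from scipy.stats import genextreme

    rng = np.random.default_rng(9)
    daily = rng.gamma(0.3, 8.0, size=(68, 365))        # 68 years of daily P (mm)

    # 3-day running mean within each year, then the annual maximum
    kernel = np.ones(3) / 3
    rx3day = np.array([np.convolve(yr, kernel, mode="valid").max() for yr in daily])

    c, loc, scale = genextreme.fit(rx3day)
    event = rx3day.max()                               # treat the record year as the event
    p_exceed = genextreme.sf(event, c, loc, scale)
    print(f"return period of the record event: {1 / p_exceed:.0f} years")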

  12. The Extremely Warm Early Winter 2000 in Europe: What is the Forcing

    NASA Technical Reports Server (NTRS)

    Otterman, J.; Angell, J. K.; Atlas, R.; Ardizzone, J.; Demaree, G.; Jusem, J. C.; Koslowsky, D.; Terry, J.; Einaudi, Franco (Technical Monitor)

    2001-01-01

    High variability characterizes the winter climate of central Europe: interannual fluctuations in the surface-air temperature as large as 18 °C over large areas are fairly common. The extraordinary early winter of 2000 in Europe appears to be a departure toward an unprecedented extreme of the existing climate patterns. Such anomalous events affect agriculture, forestry, fuel consumption, etc., and thus deserve in-depth analysis. Our analysis indicates that the high anomalies of the surface-air temperature are predominantly due to the southwesterly flow from the eastern North Atlantic, with a weak contribution from southerly flow from the western Mediterranean. Backward trajectories based on the SSM/I and NCEP Reanalysis datasets, traced from west-central Europe, indicate that the warm air masses flowing into Europe originate in the southern North Atlantic, where the surface-air temperatures exceed the climatic norms in Europe for late November or early December by 15 °C or more. Because such large ocean-to-continent temperature differences characterize the winter conditions, we refer to this episode, which started in late November, as occurring in the early winter. In this season, with the sun low over the horizon in Europe, absorption of insolation by the surface has little significance. The effect of cloudiness, a corollary to the low-level maritime-air advection, is a warming through a reduction of heat loss (greenhouse effect). In contrast, in the summer, clouds, by reducing absorption of insolation, produce a cooling effect at the surface.

  13. Finding Spatio-Temporal Patterns in Large Sensor Datasets

    ERIC Educational Resources Information Center

    McGuire, Michael Patrick

    2010-01-01

    Spatial or temporal data mining tasks are performed in the context of the relevant space, defined by a spatial neighborhood, and the relevant time period, defined by a specific time interval. Furthermore, when mining large spatio-temporal datasets, interesting patterns typically emerge where the dataset is most dynamic. This dissertation is…

  14. Parallel Index and Query for Large Scale Data Analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chou, Jerry; Wu, Kesheng; Ruebel, Oliver

    2011-07-18

    Modern scientific datasets present numerous data management and analysis challenges. State-of-the-art index and query technologies are critical for facilitating interactive exploration of large datasets, but numerous challenges remain in terms of designing a system for processing general scientific datasets. The system needs to be able to run on distributed multi-core platforms, efficiently utilize underlying I/O infrastructure, and scale to massive datasets. We present FastQuery, a novel software framework that addresses these challenges. FastQuery utilizes a state-of-the-art index and query technology (FastBit) and is designed to process massive datasets on modern supercomputing platforms. We apply FastQuery to processing of a massive 50TB dataset generated by a large scale accelerator modeling code. We demonstrate the scalability of the tool to 11,520 cores. Motivated by the scientific need to search for interesting particles in this dataset, we use our framework to reduce search time from hours to tens of seconds.
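
    The bitmap-indexing idea behind FastBit/FastQuery can be illustrated in a few lines: precompute one boolean bitmap per value bin, then answer range queries with bitwise operations instead of scanning raw values. This is a toy in-memory analogue, not the library's compressed indexes.

    import numpy as np

    rng = np.random.default_rng(10)
    energy = rng.exponential(1.0, 1_000_000)          # stand-in particle energies

    bins = np.array([0.0, 0.5, 1.0, 2.0, 4.0, np.inf])
    bitmaps = [(energy >= lo) & (energy < hi)         # one bitmap per bin
               for lo, hi in zip(bins[:-1], bins[1:])]

    # Query "energy >= 2.0": OR together the bitmaps of the covering bins
    hits = bitmaps[3] | bitmaps[4]
    print(f"{hits.sum()} interesting particles out of {energy.size}")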

  15. 16-year Climatology of Cold-Season Extreme Precipitation-Drought Statistics derived from NLDAS Precipitation Data Over the Conterminous U.S.

    NASA Astrophysics Data System (ADS)

    Matsui, T.; Mocko, D. M.

    2015-12-01

    We examine radar-gauge merged 1/8-degree hourly precipitation data from the North American Land Data Assimilation System (NLDAS) Phase-II datasets from 1997 to 2013. For each 1/8-degree grid cell, we derived statistics of single-event storm duration, total accumulated precipitation, and dry periods between storm events during cold (Oct-Mar) seasons, and histograms of the event-by-event statistics are used to estimate thresholds for extreme (below-1%) and very extreme (below-0.1%) events. In this way, we constructed unique climatology maps of the extreme precipitation-drought frequencies and probability density functions. These climatology maps show that cold-season extremely heavy precipitation events are concentrated over the West Coast, the Deep South, and the coastal zone of the Northeast, suggesting the impact of land-falling maritime storm systems. At the same time, the datasets show that long-duration precipitation events are mostly concentrated over the Northwest, and from the lower Mississippi Basin up to the Northeast centered on the Appalachian Mountains, resembling east Pacific storm tracks and nor'easter storm tracks, respectively. Furthermore, season-by-season statistics of these extreme events were examined for each National Climate Assessment (NCA) region in comparison with a number of major atmospheric oscillations and teleconnection patterns as well as Arctic Amplification. The index of Arctic Amplification includes the variability of 500 mb zonal wind speed and pole-to-midlatitude differences in atmospheric thickness, which link to the phase speed of the Rossby wave. Finally, we present ensemble correlation scores, and discuss the physical processes and underlying mechanisms for their key characteristics as well as the predictive skill and predictability of the extreme events from sub-seasonal to interannual scales during cold seasons.
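
    A sketch of the event segmentation implied above: split an hourly series into storm events separated by dry spells, then derive the duration and total statistics whose histograms define the below-1% and below-0.1% thresholds. The 6-hour dry-gap rule and the synthetic series are assumptions.

    import numpy as np

    rng = np.random.default_rng(11)
    p = rng.gamma(0.05, 3.0, 24 * 182)        # one cold season of hourly P (mm)
    p[p < 0.1] = 0.0                          # set trace values to dry

    events, start, dry = [], None, 0
    for t, wet in enumerate(p > 0):
        if wet:
            if start is None:
                start = t
            dry = 0
        elif start is not None:
            dry += 1
            if dry >= 6:                      # six dry hours terminate an event
                events.append((start, t - dry + 1))
                start, dry = None, 0
    if start is not None:
        events.append((start, p.size))

    durations = np.array([e - s for s, e in events])
    totals = np.array([p[s:e].sum() for s, e in events])
    print(f"{len(events)} events; 99th-pct duration = {np.percentile(durations, 99):.0f} h, "
          f"99th-pct total = {np.percentile(totals, 99):.1f} mm")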

  16. Comparison Of Downscaled CMIP5 Precipitation Datasets For Projecting Changes In Extreme Precipitation In The San Francisco Bay Area.

    NASA Technical Reports Server (NTRS)

    Milesi, Cristina; Costa-Cabral, Mariza; Rath, John; Mills, William; Roy, Sujoy; Thrasher, Bridget; Wang, Weile; Chiang, Felicia; Loewenstein, Max; Podolske, James

    2014-01-01

    Water resource managers planning for adaptation to future extreme precipitation events now have access to high-resolution downscaled daily projections derived from statistical bias correction and constructed analogs. We also show that along the Pacific Coast the Northern Oscillation Index (NOI) is a reliable predictor of storm likelihood, and therefore a predictor of seasonal precipitation totals and of the likelihood of extremely intense precipitation. Such time series can be used to project intensity-duration curves into the future or as input to stormwater models. However, few climate projection studies have explored the impact of the choice of downscaling method on the range and uncertainty of predictions for local flood protection studies. Here we present a study of future climate flood risk at NASA Ames Research Center, located in the South Bay Area, by comparing the range of predictions of extreme precipitation events calculated from three sets of time series downscaled from CMIP5 data: 1) the Bias Correction Constructed Analogs method, downscaled to a 1/8-degree (approximately 12 km) grid; 2) the Bias Correction Spatial Disaggregation method, downscaled to a 1 km grid; and 3) a statistical model of extreme daily precipitation events driven by projected NOI from CMIP5 models. In addition, predicted years of extreme precipitation are used to estimate the risk of overtopping of the retention pond located on the site through simulations with the EPA SWMM hydrologic model. Preliminary results indicate that the intensity of extreme precipitation events is expected to increase and to flood the NASA Ames retention pond. The results of these estimations will assist flood protection managers in planning infrastructure adaptations.

  17. Attribution of Extreme Rainfall Events in the South of France Using EURO-CORDEX Simulations

    NASA Astrophysics Data System (ADS)

    Luu, L. N.; Vautard, R.; Yiou, P.

    2017-12-01

    The Mediterranean region regularly undergoes episodes of intense precipitation in the fall season that exceed 300 mm a day. This study focuses on the role of climate change in the dynamics of the events that occur in the South of France. We used an ensemble of 10 EURO-CORDEX model simulations with two horizontal resolutions (EUR-11: 0.11° and EUR-44: 0.44°) for the attribution of extreme fall rainfall in the Cévennes mountain range (South of France). The biases of the simulations were corrected with a simple scaling adjustment and a quantile correction (CDFt). This produces five datasets, EUR-44 and EUR-11 with and without scaling adjustment plus CDFt-EUR-11, on which we test the impact of resolution and bias correction on the extremes. Those datasets, after pooling all models together, are fitted with a stationary Generalized Extreme Value distribution for several periods to estimate a climate change signal in the tail of the distribution of extreme rainfall in the Cévennes region. Those changes are then interpreted with a scaling model that links extreme rainfall to mean and maximum daily temperature. The results show that the higher-resolution simulations with bias adjustment indicate a robust increase in the intensity and likelihood of occurrence of autumn extreme rainfall in the area in the current climate relative to the historical climate. The exceedance probability of a 1-in-1000-year event in the historical climate may increase by a factor of 1.8 under the current climate, with a confidence interval of 0.4 to 5.3, following the CDFt bias-adjusted EUR-11. The change in magnitude appears to follow the Clausius-Clapeyron relation, which indicates a roughly 7% increase in rainfall per 1°C increase in temperature.
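
    The probability-ratio arithmetic in this record can be reproduced in a few lines. The sketch below fits stationary GEV distributions to two synthetic samples standing in for pooled historical and current-climate seasonal maxima; the parameter values are invented for illustration, not taken from the study.

```python
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(2)
hist = genextreme.rvs(c=-0.1, loc=120, scale=35, size=500, random_state=rng)
curr = genextreme.rvs(c=-0.1, loc=130, scale=38, size=500, random_state=rng)

fit_h = genextreme.fit(hist)                      # (shape, loc, scale)
fit_c = genextreme.fit(curr)

level_1000 = genextreme.isf(1e-3, *fit_h)         # historical 1-in-1000-yr level
p_hist = genextreme.sf(level_1000, *fit_h)        # ~1e-3 by construction
p_curr = genextreme.sf(level_1000, *fit_c)        # same level, current climate
print(f"probability ratio PR = {p_curr / p_hist:.2f}")
```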

  18. Multiple Auto-Adapting Color Balancing for Large Number of Images

    NASA Astrophysics Data System (ADS)

    Zhou, X.

    2015-04-01

    This paper presents a powerful technology for color balance between images. It works not only for small numbers of images but also for arbitrarily large numbers of images. Multiple adaptive methods are used. To obtain a color-seamless mosaic dataset, local color is adjusted adaptively towards the target color. Local statistics of the source images are computed based on a so-called adaptive dodging window. The adaptive target colors are statistically computed according to multiple target models. A gamma function is derived from the adaptive target and the adaptive source local statistics, and is applied to the source images to obtain the color-balanced output images. Five target color surface models are proposed: color point (single color), color grid, and 1st-, 2nd- and 3rd-order 2D polynomials. Least-squares fitting is used to obtain the polynomial target color surfaces. Target color surfaces are computed automatically based on all source images or on an external target image. Special objects such as water and snow are filtered by a percentage cut or a given mask. The performance is fast enough to support on-the-fly color balancing for large numbers of images (potentially hundreds of thousands). The detailed algorithm and formulae are described, with rich examples including big mosaic datasets (e.g., one containing 36,006 images); excellent results and performance are presented. The results show that this technology can be successfully applied to various imagery to obtain color-seamless mosaics. The algorithm has been used successfully in Esri ArcGIS.
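
    A minimal sketch of the simplest target model above (a single target color), assuming a per-channel gamma correction that approximately maps each image's mean onto the target mean; the exact form of the paper's gamma function and its dodging-window statistics may differ.

```python
import numpy as np

def gamma_toward_target(image_u8, target_mean_u8):
    """Per-channel gamma so that the source mean lands near the target mean."""
    img = np.clip(image_u8 / 255.0, 1e-6, 1.0)
    for c in range(3):
        src_mean = img[..., c].mean()
        gamma = np.log(target_mean_u8[c] / 255.0) / np.log(src_mean)
        img[..., c] = img[..., c] ** gamma   # approximately maps mean -> target
    return (img * 255).astype(np.uint8)

rng = np.random.default_rng(3)
tile = rng.integers(40, 200, size=(64, 64, 3), dtype=np.uint8)  # stand-in tile
balanced = gamma_toward_target(tile, target_mean_u8=(128, 128, 128))
print(balanced.reshape(-1, 3).mean(axis=0))      # means pulled toward 128
```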

  19. Effects of temperature and precipitation variability on the risk of violence in sub-Saharan Africa, 1980–2012

    PubMed Central

    O’Loughlin, John; Linke, Andrew M.; Witmer, Frank D. W.

    2014-01-01

    Ongoing debates in the academic community and in the public policy arena continue without clear resolution about the significance of global climate change for the risk of increased conflict. Sub-Saharan Africa is generally agreed to be the region most vulnerable to such climate impacts. Using a large database of conflict events and detailed climatological data covering the period 1980-2012, we apply a multilevel modeling technique that allows for a more nuanced understanding of a climate-conflict link than has been seen heretofore. In the aggregate, high temperature extremes are associated with more conflict; however, different types of conflict and different subregions do not show consistent relationships with temperature deviations. Precipitation deviations, both high and low, are generally not significant. The location and timing of violence are influenced less by climate anomalies (temperature or precipitation variations from normal) than by key political, economic, and geographic factors. We find important distinctions in the relationship between temperature extremes and conflict by using multiple methods of analysis and by exploiting our time-series cross-sectional dataset for disaggregated analyses. PMID:25385621

  20. A Fast Reduced Kernel Extreme Learning Machine.

    PubMed

    Deng, Wan-Yu; Ong, Yew-Soon; Zheng, Qing-Hua

    2016-04-01

    In this paper, we present a fast and accurate kernel-based supervised algorithm referred to as the Reduced Kernel Extreme Learning Machine (RKELM). In contrast to work on the Support Vector Machine (SVM) or Least-Squares SVM (LS-SVM), which identifies the support vectors or weight vectors iteratively, the proposed RKELM randomly selects a subset of the available data samples as support vectors (or mapping samples). By avoiding the iterative steps of SVM, significant cost savings in the training process can be readily attained, especially on big datasets. RKELM is established based on a rigorous proof of universal learning involving reduced kernel-based SLFNs. In particular, we prove that RKELM can approximate any nonlinear function accurately under the condition of support vector sufficiency. Experimental results on a wide variety of real-world small and large instance-size applications, covering binary classification, multi-class problems and regression, show that RKELM performs at a competitive level of generalization performance to the SVM/LS-SVM at only a fraction of the computational effort.
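
    The non-iterative character of RKELM is easy to see in code: choose random kernel centers, then solve a single regularized least-squares system for the output weights. This is a sketch under stated assumptions (RBF kernel, ridge regularization), not the authors' reference implementation.

```python
import numpy as np

def rbf_kernel(X, C, gamma=1.0):
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def rkelm_fit(X, y, n_centers=100, reg=1e-3, gamma=1.0, rng=None):
    rng = rng or np.random.default_rng()
    centers = X[rng.choice(len(X), n_centers, replace=False)]  # random subset
    K = rbf_kernel(X, centers, gamma)                          # N x n_centers
    # one closed-form ridge solve -- no iterative support-vector selection
    beta = np.linalg.solve(K.T @ K + reg * np.eye(n_centers), K.T @ y)
    return centers, beta

def rkelm_predict(X, centers, beta, gamma=1.0):
    return rbf_kernel(X, centers, gamma) @ beta

rng = np.random.default_rng(4)
X = rng.normal(size=(2000, 5))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=2000)
centers, beta = rkelm_fit(X, y, rng=rng)
print(rkelm_predict(X[:5], centers, beta))
```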

  1. An Investigation of Bomb Cyclogenesis in NCEP's CFS Model

    NASA Astrophysics Data System (ADS)

    Alvarez, F. M.; Eichler, T.; Gottschalck, J.

    2008-12-01

    With the concerns, impacts and consequences of climate change increasing, the ability of climate models to simulate daily weather is very important. Given improvements in resolution and physical parameterizations, climate models are becoming capable of resolving extreme weather events. A particular type of extreme event with large impacts on transportation, industry and the general public is a rapidly intensifying cyclone referred to as a "bomb." In this study, bombs are investigated using the National Centers for Environmental Prediction (NCEP) Climate Forecast System (CFS) model. We generate storm tracks based on 6-hourly sea-level pressure (SLP) from long-term climate runs of the CFS model. Investigation of this dataset reveals that the CFS model is capable of producing bombs. We show a case study of a bomb in the CFS model and demonstrate that it has characteristics similar to observed bombs. Since the CFS model is capable of producing bombs, future work will focus on trends in their frequency and intensity so that the potential role of the bomb in climate change can be assessed.
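
    For reference, the conventional "bomb" criterion is a Bergeron-normalized deepening rate: a central-pressure fall of at least 24 hPa in 24 h, scaled by sin(latitude)/sin(60°). A minimal check over a 6-hourly SLP track (the track values below are synthetic) might look like this:

```python
import numpy as np

def is_bomb(slp_hpa, lat_deg, steps_per_day=4):
    """True if any 24-h deepening meets the Bergeron criterion."""
    slp = np.asarray(slp_hpa, dtype=float)
    drop_24h = slp[:-steps_per_day] - slp[steps_per_day:]   # hPa per 24 h
    norm = np.sin(np.radians(60.0)) / np.sin(np.radians(abs(lat_deg)))
    return bool((drop_24h * norm >= 24.0).any())

track = [1002, 998, 993, 986, 976, 970]   # 6-hourly central SLP (hPa)
print(is_bomb(track, lat_deg=45))          # True: 26 hPa fall in 24 h at 45N
```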

  2. Asymmetry of projected increases in extreme temperature distributions

    PubMed Central

    Kodra, Evan; Ganguly, Auroop R.

    2014-01-01

    A statistical analysis reveals projections of consistently larger increases in the highest percentiles of summer and winter temperature maxima and minima than in the respective lowest percentiles, resulting in a wider range of temperature extremes in the future. These asymmetric changes in the tail distributions of temperature appear robust when explored through 14 CMIP5 climate models and three reanalysis datasets. The asymmetry of projected increases in temperature extremes generalizes widely. The magnitude of the projected asymmetry depends significantly on region, season, land-ocean contrast, and climate model variability, as well as on whether the extremes in question are seasonal minima or maxima. An assessment of potential physical mechanisms provides support for asymmetric tail increases and hence wider ranges of temperature extremes, especially for northern winter extremes. These results offer statistically grounded perspectives on projected changes in the IPCC-recommended extremes indices relevant for impacts and adaptation studies. PMID:25073751

  3. Towards improved parameterization of a macroscale hydrologic model in a discontinuous permafrost boreal forest ecosystem

    DOE PAGES

    Endalamaw, Abraham; Bolton, W. Robert; Young-Robertson, Jessica M.; ...

    2017-09-14

    Modeling hydrological processes in the Alaskan sub-arctic is challenging because of the extreme spatial heterogeneity in soil properties and vegetation communities. Nevertheless, modeling and predicting hydrological processes is critical in this region due to its vulnerability to the effects of climate change. Coarse-spatial-resolution datasets used in land surface modeling pose a new challenge in simulating the spatially distributed and basin-integrated processes since these datasets do not adequately represent the small-scale hydrological, thermal, and ecological heterogeneity. The goal of this study is to improve the prediction capacity of mesoscale to large-scale hydrological models by introducing a small-scale parameterization scheme, which better represents the spatial heterogeneity of soil properties and vegetation cover in the Alaskan sub-arctic. The small-scale parameterization schemes are derived from observations and a sub-grid parameterization method in the two contrasting sub-basins of the Caribou Poker Creek Research Watershed (CPCRW) in Interior Alaska: one nearly permafrost-free (LowP) sub-basin and one permafrost-dominated (HighP) sub-basin. The sub-grid parameterization method used in the small-scale parameterization scheme is derived from the watershed topography. We found that observed soil thermal and hydraulic properties – including the distribution of permafrost and vegetation cover heterogeneity – are better represented in the sub-grid parameterization method than in the coarse-resolution datasets. Parameters derived from the coarse-resolution datasets and from the sub-grid parameterization method are implemented into the variable infiltration capacity (VIC) mesoscale hydrological model to simulate runoff, evapotranspiration (ET), and soil moisture in the two sub-basins of the CPCRW. Simulated hydrographs based on the small-scale parameterization capture most of the peak and low flows, with similar accuracy in both sub-basins, compared to simulated hydrographs based on the coarse-resolution datasets. On average, the small-scale parameterization scheme improves the total runoff simulation by up to 50 % in the LowP sub-basin and by up to 10 % in the HighP sub-basin relative to the large-scale parameterization. This study shows that the proposed sub-grid parameterization method can be used to improve the performance of mesoscale hydrological models in Alaskan sub-arctic watersheds.

  4. Towards improved parameterization of a macroscale hydrologic model in a discontinuous permafrost boreal forest ecosystem

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Endalamaw, Abraham; Bolton, W. Robert; Young-Robertson, Jessica M.

    Modeling hydrological processes in the Alaskan sub-arctic is challenging because of the extreme spatial heterogeneity in soil properties and vegetation communities. Nevertheless, modeling and predicting hydrological processes is critical in this region due to its vulnerability to the effects of climate change. Coarse-spatial-resolution datasets used in land surface modeling pose a new challenge in simulating the spatially distributed and basin-integrated processes since these datasets do not adequately represent the small-scale hydrological, thermal, and ecological heterogeneity. The goal of this study is to improve the prediction capacity of mesoscale to large-scale hydrological models by introducing a small-scale parameterization scheme, which better represents the spatial heterogeneity of soil properties and vegetation cover in the Alaskan sub-arctic. The small-scale parameterization schemes are derived from observations and a sub-grid parameterization method in the two contrasting sub-basins of the Caribou Poker Creek Research Watershed (CPCRW) in Interior Alaska: one nearly permafrost-free (LowP) sub-basin and one permafrost-dominated (HighP) sub-basin. The sub-grid parameterization method used in the small-scale parameterization scheme is derived from the watershed topography. We found that observed soil thermal and hydraulic properties – including the distribution of permafrost and vegetation cover heterogeneity – are better represented in the sub-grid parameterization method than in the coarse-resolution datasets. Parameters derived from the coarse-resolution datasets and from the sub-grid parameterization method are implemented into the variable infiltration capacity (VIC) mesoscale hydrological model to simulate runoff, evapotranspiration (ET), and soil moisture in the two sub-basins of the CPCRW. Simulated hydrographs based on the small-scale parameterization capture most of the peak and low flows, with similar accuracy in both sub-basins, compared to simulated hydrographs based on the coarse-resolution datasets. On average, the small-scale parameterization scheme improves the total runoff simulation by up to 50 % in the LowP sub-basin and by up to 10 % in the HighP sub-basin relative to the large-scale parameterization. This study shows that the proposed sub-grid parameterization method can be used to improve the performance of mesoscale hydrological models in Alaskan sub-arctic watersheds.

  5. Large-scale image region documentation for fully automated image biomarker algorithm development and evaluation.

    PubMed

    Reeves, Anthony P; Xie, Yiting; Liu, Shuang

    2017-04-01

    With the advent of fully automated image analysis and modern machine learning methods, there is a need for very large image datasets having documented segmentations for both computer algorithm training and evaluation. This paper presents a method and implementation for facilitating such datasets that addresses the critical issue of size scaling for algorithm validation and evaluation; current evaluation methods that are usually used in academic studies do not scale to large datasets. This method includes protocols for the documentation of many regions in very large image datasets; the documentation may be incrementally updated by new image data and by improved algorithm outcomes. This method has been used for 5 years in the context of chest health biomarkers from low-dose chest CT images that are now being used with increasing frequency in lung cancer screening practice. The lung scans are segmented into over 100 different anatomical regions, and the method has been applied to a dataset of over 20,000 chest CT images. Using this framework, the computer algorithms have been developed to achieve over 90% acceptable image segmentation on the complete dataset.

  6. Diagnosing the Prominence-Cavity Connection in the Solar Corona

    NASA Astrophysics Data System (ADS)

    Schmit, D. J.

    The energetic equilibrium of the corona is described by a balance of heating, thermal conduction, and radiative cooling. Prominences can be described by the thermal instability of coronal energy balance, which leads to the formation of cool condensations. Observationally, the prominence is surrounded by a density-depleted elliptical structure known as a cavity. In this dissertation, we use extreme ultraviolet remote sensing observations of the prominence-cavity system to diagnose the static and dynamic properties of these structures. The observations are compared with numerical models for the time-dependent coronal condensation process and the time-independent corona-prominence magnetic field. To diagnose the density of the cavity, we construct a three-dimensional structural model of the corona. This structural model allows us to synthesize extreme ultraviolet emission in the corona in a way that incorporates the projection effects which arise from the optically thin plasma. This forward-model technique is used to constrain a radial density profile simultaneously in the cavity and the streamer. We use a χ2 minimization to find the density model which best matches a density-sensitive line ratio (observed with the Hinode Extreme ultraviolet Imaging Spectrometer) and the white-light scattered intensity (observed with the Mauna Loa Solar Observatory MK4 coronagraph). We use extreme ultraviolet spectra and spectral images to diagnose the dynamics of the prominence and the surrounding corona. Based on the Doppler shift of extreme ultraviolet coronal emission lines, we find large regions of flowing plasma which appear to occur within cavities. These line-of-sight flows have speeds of 10 km/s and projected spatial scales of 100 Mm. Using the Solar Dynamics Observatory Atmospheric Imaging Assembly (SDO/AIA) dataset, we observe dynamic emission from the prominence-cavity system. The SDO/AIA dataset observes multiple spectral bandpasses with different temperature sensitivities. Time-dependent changes in the observed emission in these bandpass images represent changes in the thermodynamic properties of the emitting plasma. We find that the coronal region surrounding the prominence exhibits larger intensity variations (over tens of hours of observations) compared to the streamer region. This variability is particularly strong in the cool coronal emission of the 171Å bandpass. We identify the source of this variability as strong brightening events that resemble concave-up loop segments and extend from the cool prominence plasma. Magnetic field lines are the basic structural building block of the corona. Energy and pressure balance in the corona occur along magnetic field lines. The large-scale extreme ultraviolet emission we observe in the corona is a conglomerate of many coronal loops projected along a line of sight. In order to calculate the plasma properties at a particular point in the corona, we use one-dimensional models for energy and pressure balance along field lines. In order to predict the extreme ultraviolet emission along a particular line of sight, we project these one-dimensional models onto the three-dimensional magnetic configuration provided by an MHD model for the coronal magnetic field. These results have allowed us to establish the first comprehensive picture of the magnetic and energetic interaction of the prominence and the cavity. While the original hypothesis that the cavity supplies mass to the prominence proved inaccurate, we cannot simply say that these structures are not related. Rather, our findings suggest that the prominence and the cavity are distinct magnetic substructures that are complementary regions of a larger whole, specifically a magnetic flux rope. (Abstract shortened by UMI.)

  7. Optimizing tertiary storage organization and access for spatio-temporal datasets

    NASA Technical Reports Server (NTRS)

    Chen, Ling Tony; Rotem, Doron; Shoshani, Arie; Drach, Bob; Louis, Steve; Keating, Meridith

    1994-01-01

    We address in this paper data management techniques for efficiently retrieving requested subsets of large datasets stored on mass storage devices. This problem represents a major bottleneck that can negate the benefits of fast networks, because the time to access a subset from a large dataset stored on a mass storage system is much greater than the time to transmit that subset over a network. This paper focuses on very large spatial and temporal datasets generated by simulation programs in the area of climate modeling, but the techniques developed can be applied to other applications that deal with large multidimensional datasets. The main requirement we have addressed is the efficient access of subsets of information contained within much larger datasets, for the purpose of analysis and interactive visualization. We have developed data partitioning techniques that partition datasets into 'clusters' based on analysis of data access patterns and storage device characteristics. The goal is to minimize the number of clusters read from mass storage systems when subsets are requested. We emphasize in this paper proposed enhancements to current storage server protocols to permit control over the physical placement of data on storage devices. We also discuss in some detail the interface between the application programs and the mass storage system, as well as a workbench to help scientists design the best reorganization of a dataset for anticipated access patterns.
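
    The payoff of access-pattern-aware partitioning can be illustrated with a toy chunk count: the same one-timestep request touches far fewer storage clusters when the chunk shape matches the anticipated access pattern. The shapes and sizes below are illustrative, not the ones used in the paper.

```python
import math

def chunks_touched(request, chunk_shape):
    """request: ((t0,t1),(y0,y1),(x0,x1)) half-open index ranges."""
    counts = [math.ceil(hi / c) - lo // c
              for (lo, hi), c in zip(request, chunk_shape)]
    return math.prod(counts)

# one time slice over the full globe of a (time, lat, lon) dataset
req = ((0, 1), (0, 180), (0, 360))
print("time-major chunks: ", chunks_touched(req, (365, 10, 10)))  # 648 reads
print("slice-major chunks:", chunks_touched(req, (1, 90, 90)))    # 8 reads
```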

  8. GLEAM version 3: Global Land Evaporation Datasets and Model

    NASA Astrophysics Data System (ADS)

    Martens, B.; Miralles, D. G.; Lievens, H.; van der Schalie, R.; de Jeu, R.; Fernandez-Prieto, D.; Verhoest, N.

    2015-12-01

    Terrestrial evaporation links the energy, water and carbon cycles over land and is therefore a key variable of the climate system. However, the global-scale magnitude and variability of the flux, and the sensitivity of the underlying physical processes to changes in environmental factors, are still poorly understood due to limitations in in situ measurements. As a result, several methods have arisen to estimate global patterns of land evaporation from satellite observations. However, these algorithms generally differ in their approach to modeling evaporation, resulting in large differences in their estimates. One of these methods is GLEAM, the Global Land Evaporation: the Amsterdam Methodology. GLEAM estimates terrestrial evaporation based on daily satellite observations of meteorological variables, vegetation characteristics and soil moisture. Since the publication of the first version of the algorithm (2011), the model has been widely applied to analyse trends in the water cycle and land-atmosphere feedbacks during extreme hydrometeorological events. A third version of the GLEAM global datasets is foreseen by the end of 2015. Given the relevance of a continuous and reliable record of global-scale evaporation estimates for climate and hydrological research, the establishment of an online data portal to make these data publicly available is also foreseen. In this new release of the GLEAM datasets, different components of the model have been updated, with the most significant change being the revision of the data assimilation algorithm. In this presentation, we will highlight the most important changes to the methodology and present three new GLEAM datasets and their validation against in situ observations and an alternative dataset of terrestrial evaporation (ERA-Land). Results of the validation exercise indicate that the magnitude and the spatiotemporal variability of the modelled evaporation agree reasonably well with the estimates of ERA-Land and the in situ observations. It is also shown that the performance of the revised model is higher than that of the original one.

  9. Chapter 11: Web-based Tools - VO Region Inventory Service

    NASA Astrophysics Data System (ADS)

    Good, J. C.

    As the size and number of datasets available through the VO grows, it becomes increasingly critical to have services that aid in locating and characterizing data pertinent to a particular scientific problem. At the same time, this same growth makes that goal more and more difficult to achieve. With a small number of datasets, it is feasible to simply retrieve the data itself (as the NVO DataScope service does). At intermediate scales, "count" DBMS searches (searches of the actual datasets which return record counts rather than full data subsets) sent to each data provider will work. However, neither of these approaches scales as the number of datasets expands into the hundreds or thousands. Dealing with the same problem internally, IRSA developed a compact and extremely fast scheme for determining source counts for positional catalogs (and in some cases image metadata) over arbitrarily large regions for multiple catalogs in a fraction of a second. To show applicability to the VO in general, this service has been extended with indices for all 4000+ catalogs in CDS VizieR (essentially all published catalogs and source tables). In this chapter, we briefly describe the architecture of this service, and then describe how it can be used in a distributed system to retrieve rapid inventories of all VO holdings in a way that places an insignificant load on any data supplier. Further, we show how this tool can be used in conjunction with VO Registries and catalog services to zero in on those datasets that are appropriate to the user's needs. The initial implementation of this service consolidates custom binary index file structures (external to any DBMS and therefore portable) at a single site to minimize search times and implements the search interface as a simple CGI program. However, the architecture is amenable to distribution. The next phase of development will focus on metadata harvesting from data archives through a standard program interface and distribution of the search processing across multiple service providers for redundancy and parallelization.
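
    The "count, don't retrieve" idea behind the inventory service can be sketched with a precomputed sky histogram per catalog; a region count then reduces to summing bins. The simple equal-angle binning below is a crude stand-in for the service's custom binary index files, and counts are approximate where region boundaries cut through bins.

```python
import numpy as np

class CatalogCounter:
    def __init__(self, ra_deg, dec_deg, nbins=(360, 180)):
        # build once per catalog; answering counts never touches source rows
        self.hist, self.ra_edges, self.dec_edges = np.histogram2d(
            ra_deg, dec_deg, bins=nbins, range=[[0, 360], [-90, 90]])

    def count(self, ra_min, ra_max, dec_min, dec_max):
        i0, i1 = np.searchsorted(self.ra_edges, [ra_min, ra_max])
        j0, j1 = np.searchsorted(self.dec_edges, [dec_min, dec_max])
        return int(self.hist[i0:i1, j0:j1].sum())

rng = np.random.default_rng(5)
cat = CatalogCounter(rng.uniform(0, 360, 100_000),
                     rng.uniform(-90, 90, 100_000))
print(cat.count(150, 160, -5, 5), "sources in region (bin-edge approximate)")
```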

  10. Review and Analysis of Algorithmic Approaches Developed for Prognostics on CMAPSS Dataset

    NASA Technical Reports Server (NTRS)

    Ramasso, Emannuel; Saxena, Abhinav

    2014-01-01

    Benchmarking of prognostic algorithms has been challenging due to the limited availability of common datasets suitable for prognostics. In an attempt to alleviate this problem, several benchmarking datasets have been collected by NASA's Prognostics Center of Excellence and made available to the Prognostics and Health Management (PHM) community to allow evaluation and comparison of prognostics algorithms. Among those datasets are five C-MAPSS datasets that have been extremely popular due to their unique characteristics, which make them suitable for prognostics. The C-MAPSS datasets pose several challenges that have been tackled by different methods in the PHM literature. In particular, management of high variability due to sensor noise, effects of operating conditions, and the presence of multiple simultaneous fault modes are some factors that have a great impact on the generalization capabilities of prognostics algorithms. More than 70 publications have used the C-MAPSS datasets for developing data-driven prognostic algorithms. The C-MAPSS datasets are also shown to be well suited for the development of new machine learning and pattern recognition tools for several key preprocessing steps such as feature extraction and selection, failure mode assessment, operating conditions assessment, health status estimation, uncertainty management, and prognostics performance evaluation. This paper summarizes a comprehensive literature review of publications using the C-MAPSS datasets and provides guidelines and references for further usage of these datasets in a manner that allows clear and consistent comparison between different approaches.

  11. I'll take that to go: Big data bags and minimal identifiers for exchange of large, complex datasets

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chard, Kyle; D'Arcy, Mike; Heavner, Benjamin D.

    Big data workflows often require the assembly and exchange of complex, multi-element datasets. For example, in biomedical applications, the input to an analytic pipeline can be a dataset consisting of thousands of images and genome sequences assembled from diverse repositories, requiring a description of the contents of the dataset in a concise and unambiguous form. Typical approaches to creating datasets for big data workflows assume that all data reside in a single location, requiring costly data marshaling and permitting errors of omission and commission because dataset members are not explicitly specified. We address these issues by proposing simple methods and tools for assembling, sharing, and analyzing large and complex datasets that scientists can easily integrate into their daily workflows. These tools combine a simple and robust method for describing data collections (BDBags), data descriptions (Research Objects), and simple persistent identifiers (Minids) to create a powerful ecosystem of tools and services for big data analysis and sharing. We present these tools and use biomedical case studies to illustrate their use for the rapid assembly, sharing, and analysis of large datasets.
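
    The essence of explicit dataset membership is a manifest that names every member with a checksum, so a collection can be exchanged and verified wherever it lands. The toy illustration below mimics a BagIt-style manifest with hashlib; the real BDBag tooling adds much more (Research Object metadata, Minids, remote-member references).

```python
import hashlib
import json
import pathlib
import tempfile

def manifest_for(paths):
    """Map each member file to its sha256 digest, BagIt-manifest style."""
    return {p.name: hashlib.sha256(p.read_bytes()).hexdigest() for p in paths}

# self-contained demo: create two stand-in member files, then describe them
tmp = pathlib.Path(tempfile.mkdtemp())
members = [tmp / "genome_001.fa", tmp / "image_001.raw"]   # hypothetical names
for i, m in enumerate(members):
    m.write_bytes(b"example payload %d" % i)

manifest = manifest_for(members)
(tmp / "manifest-sha256.json").write_text(json.dumps(manifest, indent=2))
print(json.dumps(manifest, indent=2))
```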

  12. Analysis of the IJCNN 2011 UTL Challenge

    DTIC Science & Technology

    2012-01-13

    We made available large datasets from various application domains: handwriting recognition, image recognition, video processing, text processing, and ecology (http://clopinet.com/ul). The evaluation sets consist of 4096 examples each.

        Dataset   | Domain      | Features | Sparsity | Devel. | Transf.
        AVICENNA  | Handwriting | 120      | 0%       | 150205 | 50000
        HARRY     | Video       | 5000     | 98.1%    | …      | …

  13. Quantitative Missense Variant Effect Prediction Using Large-Scale Mutagenesis Data.

    PubMed

    Gray, Vanessa E; Hause, Ronald J; Luebeck, Jens; Shendure, Jay; Fowler, Douglas M

    2018-01-24

    Large datasets describing the quantitative effects of mutations on protein function are becoming increasingly available. Here, we leverage these datasets to develop Envision, which predicts the magnitude of a missense variant's molecular effect. Envision combines 21,026 variant effect measurements from nine large-scale experimental mutagenesis datasets, a hitherto untapped training resource, with a supervised, stochastic gradient boosting learning algorithm. Envision outperforms other missense variant effect predictors both on large-scale mutagenesis data and on an independent test dataset comprising 2,312 TP53 variants whose effects were measured using a low-throughput approach. This dataset was never used for hyperparameter tuning or model training and thus serves as an independent validation set. Envision prediction accuracy is also more consistent across amino acids than other predictors. Finally, we demonstrate that Envision's performance improves as more large-scale mutagenesis data are incorporated. We precompute Envision predictions for every possible single amino acid variant in human, mouse, frog, zebrafish, fruit fly, worm, and yeast proteomes (https://envision.gs.washington.edu/).
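
    A minimal stand-in for the supervised stochastic gradient boosting setup: predict a continuous variant-effect score from per-variant features. The features and data here are synthetic placeholders, not Envision's actual feature set; `subsample < 1` is what makes the boosting "stochastic".

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(6)
X = rng.normal(size=(5000, 8))       # e.g. conservation, substitution scores
y = X[:, 0] - 0.5 * X[:, 1] + 0.2 * rng.normal(size=5000)  # synthetic effect

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor(subsample=0.8, random_state=0)
model.fit(X_tr, y_tr)
print("held-out R^2:", round(model.score(X_te, y_te), 3))
```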

  14. Web-based visualization of very large scientific astronomy imagery

    NASA Astrophysics Data System (ADS)

    Bertin, E.; Pillay, R.; Marmo, C.

    2015-04-01

    Visualizing and navigating through large astronomy images from a remote location with current astronomy display tools can be a frustrating experience in terms of speed and ergonomics, especially on mobile devices. In this paper, we present a high-performance, versatile and robust client-server system for remote visualization and analysis of extremely large scientific images. Applications of this work include survey image quality control, interactive data query and exploration, citizen science, and public outreach. The proposed software is entirely open source and is designed to be generic and applicable to a variety of datasets. It provides access to floating-point data at terabyte scales, with the ability to precisely adjust image settings in real time. The proposed clients are lightweight, platform-independent web applications built on standard HTML5 web technologies and compatible with both touch- and mouse-based devices. We assess the performance of the system and show that a single server can comfortably handle more than a hundred simultaneous users accessing full-precision 32-bit astronomy data.

  15. Micro-Raman spectroscopic identification of bacterial cells of the genus Staphylococcus and dependence on their cultivation conditions.

    PubMed

    Harz, M; Rösch, P; Peschke, K-D; Ronneberger, O; Burkhardt, H; Popp, J

    2005-11-01

    Microbial contamination is not only a medical problem, but also plays a large role in pharmaceutical clean-room production and food processing technology. Therefore, many techniques have been developed to achieve differentiation and identification of microorganisms. Among these methods, vibrational spectroscopic techniques (IR, Raman and SERS) are useful tools because of their rapidity and sensitivity. Recently we have shown that micro-Raman spectroscopy in combination with a support vector machine is an extremely capable approach for fast, reliable, non-destructive online identification of single bacteria belonging to different genera. In order to simulate different environmental conditions, in this contribution we analyzed different Staphylococcus strains under varying cultivation conditions to evaluate our method against a reliable dataset. First, micro-Raman spectra of the bulk material and of single bacterial cells grown under the same conditions were recorded and used separately for a distinct chemotaxonomic classification of the strains. Furthermore, Raman spectra were recorded from single bacterial cells cultured under various conditions to study the influence of cultivation on the discrimination ability. This dataset was analyzed both with a hierarchical cluster analysis (HCA) and a support vector machine (SVM).
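
    A compact stand-in for the classification machinery named above: an SVM and a Ward-linkage hierarchical clustering applied to synthetic "spectra". Real Raman spectra would first need baseline correction and normalization; the templates and noise level here are invented for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(7)
n_per, n_channels = 60, 300
# three "strains" = three synthetic spectral templates plus noise
templates = rng.normal(size=(3, n_channels))
X = np.vstack([t + 0.3 * rng.normal(size=(n_per, n_channels)) for t in templates])
y = np.repeat([0, 1, 2], n_per)

print("SVM accuracy:", cross_val_score(SVC(kernel="linear"), X, y, cv=5).mean())
clusters = fcluster(linkage(X, method="ward"), t=3, criterion="maxclust")  # HCA
```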

  16. Revealing complex function, process and pathway interactions with high-throughput expression and biological annotation data.

    PubMed

    Singh, Nitesh Kumar; Ernst, Mathias; Liebscher, Volkmar; Fuellen, Georg; Taher, Leila

    2016-10-20

    The biological relationships both between and within the functions, processes and pathways that operate within complex biological systems are only poorly characterized, making the interpretation of large scale gene expression datasets extremely challenging. Here, we present an approach that integrates gene expression and biological annotation data to identify and describe the interactions between biological functions, processes and pathways that govern a phenotype of interest. The product is a global, interconnected network, not of genes but of functions, processes and pathways, that represents the biological relationships within the system. We validated our approach on two high-throughput expression datasets describing organismal and organ development. Our findings are well supported by the available literature, confirming that developmental processes and apoptosis play key roles in cell differentiation. Furthermore, our results suggest that processes related to pluripotency and lineage commitment, which are known to be critical for development, interact mainly indirectly, through genes implicated in more general biological processes. Moreover, we provide evidence that supports the relevance of cell spatial organization in the developing liver for proper liver function. Our strategy can be viewed as an abstraction that is useful to interpret high-throughput data and devise further experiments.

  17. Obesity in pediatric trauma.

    PubMed

    Witt, Cordelie E; Arbabi, Saman; Nathens, Avery B; Vavilala, Monica S; Rivara, Frederick P

    2017-04-01

    The implications of childhood obesity for pediatric trauma outcomes are not clearly established. Anthropometric data were recently added to the National Trauma Data Bank (NTDB) Research Datasets, enabling a large, multicenter evaluation of the effect of obesity on pediatric trauma patients. Children ages 2 to 19 years who required hospitalization for traumatic injury were identified in the 2013-2014 NTDB Research Datasets. Age- and gender-specific body mass indices (BMI) were calculated. Outcomes included injury patterns, operative procedures, complications, and hospital utilization parameters. Data from 149,817 pediatric patients were analyzed; higher BMI percentiles were associated with significantly more extremity injuries, and fewer injuries to the head, abdomen, thorax and spine (p values <0.001). On multivariable analysis, higher BMI percentiles were associated with a significantly increased likelihood of death, deep venous thrombosis, pulmonary embolus and pneumonia, although there was no difference in the risk of overall complications. Obese children also had significantly longer lengths of stay and more frequent ventilator requirement. Among children admitted after trauma, increased BMI percentile is associated with increased risk of death and potentially preventable complications. These findings suggest that obese children may require different management than their nonobese counterparts to prevent complications. Level III; prognosis study.

  18. The Climatology of Extreme Surge-Producing Extratropical Cyclones in Observations and Models

    NASA Astrophysics Data System (ADS)

    Catalano, A. J.; Broccoli, A. J.; Kapnick, S. B.

    2016-12-01

    Extreme coastal storms devastate heavily populated areas around the world by producing powerful winds that can create a large storm surge. Both tropical and extratropical cyclones (ETCs) occur over the northwestern Atlantic Ocean, and the risks associated with ETCs can be just as severe as those associated with tropical storms (e.g., high winds, storm surge). At The Battery in New York City, 17 of the 20 largest storm surge events were a consequence of ETCs, which are more prevalent than tropical cyclones in the northeastern United States. We therefore analyze the climatology of ETCs that are capable of producing a large storm surge along the northeastern coast of the United States. For a historical analysis, water level data were collected from National Oceanic and Atmospheric Administration (NOAA) tide gauges at three separate locations (Sewell's Pt., VA; The Battery, NY; and Boston, MA). We perform a k-means cluster analysis of sea level pressure from the ECMWF 20th Century Reanalysis (ERA-20C) to explore natural groupings of observed storms with similar characteristics. We then composite the cluster results with features of the atmospheric circulation to examine the influence of interannual and multidecadal variability such as the North Atlantic Oscillation. Since observational records contain a small number of well-documented ETCs, the capability of a high-resolution coupled climate model to realistically simulate such extreme coastal storms is also assessed. Global climate models provide a means of simulating a much larger sample of extreme events, allowing for better resolution of the tail of the distribution. We employ a tracking algorithm to identify ETCs in a multi-century simulation under present-day conditions. Quantitative comparisons of cyclogenesis, cyclolysis, and cyclone densities of simulated ETCs and storms from recent history (using reanalysis products) are conducted.
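
    A hedged sketch of the clustering step: k-means on flattened sea-level pressure fields, one sample per surge event, to group storms with similar synoptic patterns. The synthetic fields below stand in for the ERA-20C composites; grid size and cluster count are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(8)
n_events, ny, nx = 120, 40, 60
slp = 1000 + 15 * rng.normal(size=(n_events, ny, nx))   # hPa, synthetic fields

X = slp.reshape(n_events, -1)
X = (X - X.mean(axis=0)) / X.std(axis=0)                # standardized anomalies

km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
composites = [slp[km.labels_ == k].mean(axis=0) for k in range(4)]  # per-cluster mean SLP
print("events per cluster:", np.bincount(km.labels_))
```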

  19. Remote visual analysis of large turbulence databases at multiple scales

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pulido, Jesus; Livescu, Daniel; Kanov, Kalin

    The remote analysis and visualization of raw large turbulence datasets is challenging. Current accurate direct numerical simulations (DNS) of turbulent flows generate datasets with billions of points per time-step and several thousand time-steps per simulation. Until recently, the analysis and visualization of such datasets was restricted to scientists with access to large supercomputers. The public Johns Hopkins Turbulence Database simplifies access to multi-terabyte turbulence datasets and facilitates the computation of statistics and extraction of features through the use of commodity hardware. In this paper, we present a framework designed around wavelet-based compression for high-speed visualization of large datasets and methods supporting multi-resolution analysis of turbulence. By integrating common technologies, this framework enables remote access to tools available on supercomputers and over 230 terabytes of DNS data over the Web. Finally, the database toolset is expanded by providing access to exploratory data analysis tools, such as wavelet decomposition capabilities and coherent feature extraction.
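
    The wavelet machinery at the center of the framework can be sketched with PyWavelets: decompose a 2D slice, hard-threshold the detail coefficients, and reconstruct a reduced approximation. The wavelet family, level, and threshold below are illustrative choices, not those of the paper.

```python
import numpy as np
import pywt

rng = np.random.default_rng(9)
field = rng.normal(size=(256, 256))            # stand-in turbulence slice

coeffs = pywt.wavedec2(field, "db4", level=3)  # [cA, (cH,cV,cD), ...]
thresh = 0.5 * np.std(field)
coeffs = [coeffs[0]] + [
    tuple(pywt.threshold(d, thresh, mode="hard") for d in level)
    for level in coeffs[1:]
]
approx = pywt.waverec2(coeffs, "db4")          # multi-resolution approximation
print("max reconstruction error:", np.abs(approx - field).max())
```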

  20. Remote visual analysis of large turbulence databases at multiple scales

    DOE PAGES

    Pulido, Jesus; Livescu, Daniel; Kanov, Kalin; ...

    2018-06-15

    The remote analysis and visualization of raw large turbulence datasets is challenging. Current accurate direct numerical simulations (DNS) of turbulent flows generate datasets with billions of points per time-step and several thousand time-steps per simulation. Until recently, the analysis and visualization of such datasets was restricted to scientists with access to large supercomputers. The public Johns Hopkins Turbulence Database simplifies access to multi-terabyte turbulence datasets and facilitates the computation of statistics and extraction of features through the use of commodity hardware. In this paper, we present a framework designed around wavelet-based compression for high-speed visualization of large datasets and methods supporting multi-resolution analysis of turbulence. By integrating common technologies, this framework enables remote access to tools available on supercomputers and over 230 terabytes of DNS data over the Web. Finally, the database toolset is expanded by providing access to exploratory data analysis tools, such as wavelet decomposition capabilities and coherent feature extraction.

  1. Climatological Impact of Atmospheric River Based on NARCCAP and DRI-RCM Datasets

    NASA Astrophysics Data System (ADS)

    Mejia, J. F.; Perryman, N. M.

    2012-12-01

    This study evaluates the spatial responses of extreme precipitation environments, typically associated with Atmospheric River events, using Regional Climate Model (RCM) output from the NARCCAP dataset (50 km grid size) and the Desert Research Institute RCM simulations (36 and 12 km grid size). For this study, a pattern-detection algorithm was developed to characterize Atmospheric River (AR)-like features in climate models. Topological analysis of the cores of enhanced, elongated moisture flux (500-300 hPa; daily means) is used to objectively characterize such AR features in two distinct groups: (i) zonal, north Pacific ARs, and (ii) subtropical ARs, also known as "Pineapple Express" events. We computed the climatological responses of the different RCMs for these two AR groups, from which intricate differences among RCMs stand out. This study presents these climatological responses from historical and scenario-driven simulations, as well as implications for precipitation extreme-value analyses.

  2. Multilayer Extreme Learning Machine With Subnetwork Nodes for Representation Learning.

    PubMed

    Yang, Yimin; Wu, Q M Jonathan

    2016-11-01

    The extreme learning machine (ELM), which was originally proposed for "generalized" single-hidden-layer feedforward neural networks, provides efficient unified learning solutions for clustering, regression, and classification. It presents competitive accuracy with superb efficiency in many applications. However, the ELM with a subnetwork-nodes architecture has not attracted much research attention. Recently, many methods have been proposed for supervised/unsupervised dimension reduction or representation learning, but these methods normally only work for one type of problem. This paper studies the general architecture of the multilayer ELM (ML-ELM) with subnetwork nodes, showing that: 1) the proposed method provides a representation learning platform with unsupervised/supervised and compressed/sparse representation learning; and 2) experimental results on ten image datasets and 16 classification datasets show that, compared to other conventional feature learning methods, the proposed ML-ELM with subnetwork nodes performs competitively or much better.

  3. Tomographic local 2D analyses of the WISExSuperCOSMOS all-sky galaxy catalogue

    NASA Astrophysics Data System (ADS)

    Novaes, C. P.; Bernui, A.; Xavier, H. S.; Marques, G. A.

    2018-05-01

    The recent progress in obtaining larger and deeper galaxy catalogues is of fundamental importance for cosmological studies, especially to robustly measure the large-scale density fluctuations in the Universe. The present work uses the Minkowski Functionals (MF) to probe the galaxy density field from the WISExSuperCOSMOS (WSC) all-sky catalogue by performing tomographic local analyses in five redshift shells (of thickness δz = 0.05) over the total range 0.10 < z < 0.35. Here, for the first time, the MF are applied to 2D projections of the galaxy number count (GNC) fields with the purpose of looking for regions of the WSC catalogue with unexpected features compared to ΛCDM mock realisations. Our methodology reveals 1-3 regions of the GNC maps in each redshift shell with uncommon behaviour (extreme regions), i.e., p-value < 1.4%. Indeed, the resulting MF curves show signatures that suggest the uncommon behaviour is associated with the presence of over- or under-densities there, but contamination due to residual foregrounds is not discarded. Additionally, even though our analyses indicate good agreement between data and simulations, we identify one highly extreme region, seemingly associated with a large clustered distribution of galaxies. Our results confirm the usefulness of the MF for analysing GNC maps from photometric galaxy datasets.
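
    For readers unfamiliar with Minkowski Functionals: in 2D they are the area, boundary length, and Euler characteristic of the excursion set above a threshold. The pixel-counting estimators below are deliberately crude, just to show the shape of the computation (the hole count assumes a single connected background); they are not the estimators used in the paper.

```python
import numpy as np
from scipy import ndimage

def minkowski_2d(field, threshold):
    exc = field > threshold                          # excursion set
    area = exc.mean()                                # V0: area fraction
    e = exc.astype(int)                              # V1: in/out edge count
    perim = (np.abs(np.diff(e, axis=0)).sum()
             + np.abs(np.diff(e, axis=1)).sum()) / e.size
    n_comp = ndimage.label(exc)[1]
    n_holes = ndimage.label(~exc)[1] - 1             # minus outer background
    return area, perim, n_comp - n_holes             # V2: Euler characteristic

rng = np.random.default_rng(11)
gnc = ndimage.gaussian_filter(rng.normal(size=(256, 256)), sigma=4)  # mock map
for nu in (-1.0, 0.0, 1.0):
    print(nu, minkowski_2d(gnc, nu * gnc.std()))
```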

  4. A retrospective analysis of American football hyperthermia deaths in the United States

    NASA Astrophysics Data System (ADS)

    Grundstein, Andrew J.; Ramseyer, Craig; Zhao, Fang; Pesses, Jordan L.; Akers, Pete; Qureshi, Aneela; Becker, Laura; Knox, John A.; Petro, Myron

    2012-01-01

    Over the period 1980-2009, there were 58 documented hyperthermia deaths of American-style football players in the United States. This study examines the geography, timing, and meteorological conditions present during the onset of hyperthermia, using the most complete dataset available. Deaths are concentrated in the eastern quadrant of the United States and are most common during August. Over half the deaths occurred during morning practices when high humidity levels were common. The athletes were typically large (79% with a body mass index >30) and mostly (86%) played linemen positions. Meteorological conditions were atypically hot and humid by local standards on most days with fatalities. Further, all deaths occurred under conditions defined as high or extreme by the American College of Sports Medicine using the wet bulb globe temperature (WBGT), but under lower threat levels using the heat index (HI). Football-specific thresholds based on clothing (full football uniform, practice uniform, or shorts) were also examined. The thresholds matched well with data from athletes wearing practice uniforms but poorly for those in shorts only. Too few cases of athletes in full pads were available to draw any broad conclusions. We recommend that coaches carefully monitor players, particularly large linemen, early in the pre-season on days with wet bulb globe temperatures that are categorized as high or extreme. Also, as most of the deaths were among young athletes, longer acclimatization periods may be needed.
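
    For context, the outdoor wet bulb globe temperature referenced above is a fixed weighting of three temperatures, shown below; the category cutoffs are illustrative approximations of ACSM-style bands, not the study's exact thresholds.

```python
def wbgt_outdoor(t_wet_bulb_c, t_globe_c, t_dry_bulb_c):
    """Outdoor WBGT = 0.7*Tnwb + 0.2*Tg + 0.1*Tdb (all in deg C)."""
    return 0.7 * t_wet_bulb_c + 0.2 * t_globe_c + 0.1 * t_dry_bulb_c

def risk_category(wbgt_c):
    # illustrative cutoffs in deg C, roughly following ACSM-style bands
    for limit, label in [(18, "low"), (23, "moderate"), (28, "high")]:
        if wbgt_c < limit:
            return label
    return "extreme"

w = wbgt_outdoor(t_wet_bulb_c=26.0, t_globe_c=45.0, t_dry_bulb_c=33.0)
print(f"WBGT = {w:.1f} C -> {risk_category(w)}")   # 30.5 C -> extreme
```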

  5. Constructing An Event Based Aerosol Product Under High Aerosol Loading Conditions

    NASA Astrophysics Data System (ADS)

    Levy, R. C.; Shi, Y.; Mattoo, S.; Remer, L. A.; Zhang, J.

    2016-12-01

    High aerosol loading events, such as Indonesia's forest fires in fall 2015 or the persistent wintertime haze near Beijing, attract tremendous interest due to their large impact on regional visibility and air quality. Understanding the optical properties of these events, and being able to simulate and predict them, is beneficial. However, it is a great challenge to consistently identify and then retrieve aerosol optical depth (AOD) from passive sensors during heavy aerosol events. Reasons include: 1) large differences between the optical properties of high-loading aerosols and those under normal conditions; 2) spectral signals of optically thick aerosols can be mistaken for the surface, depending on aerosol type; and 3) extremely optically thick aerosol plumes can be misidentified as clouds due to their high optical thickness. Thus, even under clear-sky conditions, the global distribution of extreme aerosol events is not well captured in datasets such as the MODIS Dark-Target (DT) aerosol product. In this study, with the combined use of the OMI Aerosol Index, the MODIS cloud product, and the operational DT product, heavy smoke events over the 7-SEAS region are identified and retrieved for the dry season. An event-based aerosol product that complements the standard "global" aerosol retrieval will be created and evaluated. The impact of missing high-AOD retrievals on the regional aerosol climatology will be studied using this newly developed research product.

  6. Reanalysis Data Evaluation to Study Temperature Extremes in Siberia

    NASA Astrophysics Data System (ADS)

    Shulgina, T. M.; Gordov, E. P.

    2014-12-01

    Ongoing global climate change is strongly pronounced in Siberia, with significant warming in the second half of the 20th century and recent extreme events such as the 2010 heat wave and the 2013 flood in Russia's Far East. To improve our understanding of observed climate extremes and to provide regional decision makers with reliable, scientifically based information on the climate state at high spatial and temporal resolution, we need accurate meteorological data. However, of the 231 available stations across Siberia, only 130 provide homogeneous daily temperature time series. The sparse station network, especially at high latitudes, forces us to use reanalysis data. However, those might differ from observations. To obtain reliable information on temperature extreme "hot spots" in Siberia, we compared daily temperatures from the ERA-40, ERA-Interim, JRA-25, JRA-55, NCEP/DOE, and MERRA reanalyses and the HadEX2 and GHCNDEX gridded datasets with observations from the RIHMI-WDC/CDIAC dataset for the overlap period 1981-2000. Data agreement was estimated at station coordinates, to which the reanalysis data were interpolated using a modified Shepard method. Comparison of 20-year-averaged annual mean temperatures shows general agreement over Siberia except for the Baikal region, where the reanalyses significantly underestimate the observed temperatures. The annual temperatures closest to observations were obtained from ERA-40 and ERA-Interim. Furthermore, t-test results show the homogeneity of these datasets, which allows one to combine them for long-term time series analysis. In particular, we compared the combined data with observations for percentile-based extreme indices. In Western Siberia, the reanalysis and gridded data accurately reproduce the observed daily maximum/minimum temperatures. For East Siberia, in the Lake Baikal area, ERA-Interim slightly underestimates TN90p and TX90p values. The results obtained allow regional decision makers to get the required high-spatial-resolution (0.25°×0.25°) climatic information products from the combined ERA data. The authors acknowledge partial financial support for this research from the RFBR (13-05-12034, 14-05-00502), SB RAS Integration projects (131, VIII.80.2.1.) and a grant of the President of RF (№ 181).
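
    The percentile-based indices compared above (e.g. TX90p) follow a simple recipe: build a calendar-day 90th-percentile threshold from a base period, then count the fraction of days exceeding it. A minimal version on synthetic data, assuming an ETCCDI-style ±2-day window and a 365-day year:

```python
import numpy as np

def tx90p(tmax, base_years=20, window=2):
    """tmax: array of shape (years, 365). Returns % of days above threshold."""
    base = tmax[:base_years]
    thresh = np.empty(365)
    for d in range(365):
        days = np.arange(d - window, d + window + 1) % 365   # +/-2-day window
        thresh[d] = np.percentile(base[:, days], 90)
    return 100.0 * (tmax > thresh[None, :]).mean()

rng = np.random.default_rng(10)
seasonal = 10 * np.sin(2 * np.pi * np.arange(365) / 365)     # annual cycle
tmax = seasonal + 3 * rng.normal(size=(30, 365))             # 30 years, deg C
print(f"TX90p = {tx90p(tmax):.1f}% of days")                 # ~10% by design
```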

  7. Causes and Consequences of Exceptional North Atlantic Heat Loss in Recent Winters

    NASA Astrophysics Data System (ADS)

    Josey, Simon; Grist, Jeremy; Duchez, Aurelie; Frajka-Williams, Eleanor; Hirschi, Joel; Marsh, Robert; Sinha, Bablu

    2016-04-01

    The mid-to-high-latitude North Atlantic loses large amounts of heat to the atmosphere in winter, leading to dense water formation. An examination of reanalysis datasets (ERA-Interim, NCEP/NCAR) reveals that heat loss in the recent winters 2013-14 and 2014-15 was exceptionally strong. The causes and consequences of this extraordinary ocean heat loss will be discussed. In 2013-14, the net air-sea heat flux anomaly averaged over the whole winter exceeded 100 W m-2 in the eastern subpolar gyre (the most extreme in the period since 1979 spanned by ERA-Interim). The causes of this extreme heat loss will be shown to be severe latent and sensible heat fluxes driven primarily by anomalously strong westerly airflows from North America and northerly airflows originating in the Nordic Seas. The associated sea level pressure anomaly field reflects the dominance of the second mode of atmospheric variability, the East Atlantic Pattern (EAP), over the North Atlantic Oscillation (NAO) in this winter. The extreme winter heat loss had a significant impact on the ocean, extending from the sea surface into the deeper layers, and a re-emergent cold sea surface temperature (SST) anomaly is evident in November 2014. The following winter, 2014-15, experienced further extreme heat loss that amplified the strength of the re-emergent SST anomaly. By summer 2015, an unprecedented cold mid-latitude North Atlantic surface temperature anomaly is evident in observations and has been widely referred to as the "big blue blob." The role played by the extreme surface heat loss in the preceding winters in generating this feature, and its subsequent evolution through winter 2015-16, will be explored.

  8. Satellite skill in detecting extreme episodes in near-surface air quality

    NASA Astrophysics Data System (ADS)

    Ruiz, D. J.; Prather, M. J.

    2017-12-01

    Ozone (O3) contributes to ambient air pollution, adversely affecting public health, agriculture, and ecosystems. Reliable, long-term, densely distributed surface networks are required to establish the scale, intensity and repeatability of major pollution events (designated here in a climatological sense as air quality extremes, AQX, following Schnell's work). Regrettably, such networks are only available for North America (NA) and Europe (EU), which excludes many populated regions where the deaths associated with air pollution exposure are alarmingly high. Directly measuring surface pollutants from space without lidar is extremely difficult. Mapping of daily pollution events requires cross-track nadir scanners, and these have limited sensitivity to surface O3 levels. This work examines several years of coincident surface and OMI satellite measurements over NA-EU, in combination with a chemistry-transport model (CTM) hindcast of that period, to understand how large-scale AQX episodes may extend into the free troposphere and thus be more amenable to satellite mapping. We show how extreme NA-EU episodes are measured from OMI and then look for such patterns over other polluted regions of the globe. We gather individual high-quality surface O3 measurements from these other regions to check our satellite detection. Our approach to global satellite detection avoids issues associated with regional variations in seasonality, chemical regime, and data product biases, and it does not require defining a separate absolute threshold for each data product (surface site and satellite). It also enables coherent linking of extreme events into large-scale pollution episodes whose magnitude evolves over hundreds of kilometres for several days. Tools used here include the UC Irvine CTM, which shows that much of the surface O3 variability is lost at heights above 2 km, but local AQX events are readily seen in a 0-3 km column average. The OMI data are taken from X. Liu's dataset, using an improved algorithm for the detection of tropospheric O3. Surface site observations outside NA and EU are taken from research stations where possible.

  9. Large-scale image region documentation for fully automated image biomarker algorithm development and evaluation

    PubMed Central

    Reeves, Anthony P.; Xie, Yiting; Liu, Shuang

    2017-01-01

    Abstract. With the advent of fully automated image analysis and modern machine learning methods, there is a need for very large image datasets with documented segmentations for both computer algorithm training and evaluation. This paper presents a method and implementation for building such datasets, addressing the critical issue of size scaling for algorithm validation and evaluation; the evaluation methods usually used in academic studies do not scale to large datasets. The method includes protocols for documenting many regions in very large image datasets; the documentation may be incrementally updated with new image data and with improved algorithm outcomes. The method has been used for 5 years in the context of chest health biomarkers from low-dose chest CT images, which are now used with increasing frequency in lung cancer screening practice. The lung scans are segmented into over 100 different anatomical regions, and the method has been applied to a dataset of over 20,000 chest CT images. Using this framework, computer algorithms have been developed that achieve over 90% acceptable image segmentation on the complete dataset. PMID:28612037

  10. Climatological assessment of spatiotemporal trends in observational monthly snowfall totals and extremes over the Canadian Great Lakes Basin

    NASA Astrophysics Data System (ADS)

    Baijnath, Janine; Duguay, Claude; Sushama, Laxmi; Huziy, Oleksandr

    2017-04-01

    The Laurentian Great Lakes Basin (GLB) is susceptible to snowfall events that derive from extratropical cyclones and heavy lake-effect snowfall (HLES). The former is generated by quasigeostrophic forcing from positive temperature or vorticity advection associated with low-pressure centres. HLES is produced by planetary boundary layer (PBL) convection that is initiated when a cold, dry continental air mass advects over relatively warm lakes, generating turbulent moisture and heat fluxes into the PBL. HLES events can have disastrous impacts on local communities, such as the November 2014 Buffalo storm that caused 13 fatalities. Despite the many HLES studies, most focus on specific case-study events, and climatological HLES trend analyses for the Canadian GLB remain notably under-examined. The first research objective is to determine the historical, climatological trends in monthly snowfall totals and to examine potential surface and atmospheric variables driving the resultant changes in HLES. The second is to analyze the historical extremes in snowfall by assessing the intensity, frequency, and duration of snowfall within the domain of interest. Spatiotemporal snowfall and precipitation trends are computed for the 1982 to 2015 period using Daymet (Version 3) monthly gridded observational datasets from the Oak Ridge National Laboratory. The North American Regional Reanalysis (NARR), NOAA Optimum Interpolation Sea Surface Temperature (OISST), and Canadian Ice Service (CIS) datasets are also used to evaluate trends in HLES driving variables such as air temperature, lake surface temperature (LST), ice cover concentration, omega, and vertical temperature gradient (VTGlst-850). Climatological trends in monthly snowfall totals show a significant decrease along the Ontario snowbelt of Lake Superior, Lake Huron and Georgian Bay at the 90 percent confidence level. These results are attributed to significant warming in LST, a significant decrease in ice cover fraction, and an increase in VTGlst-850, which enhances evaporation into the lower PBL. It is suggested that inefficient moisture recycling and increased moisture storage in warmer air masses inhibit the development of HLES. The 99th percentile of snowfall events within the GLB suggests an extreme snowfall value equal to or exceeding 15 cm per day. Spatiotemporal snowfall patterns indicate that mostly lake-effect processes, not extratropical cyclones, drive the high intensity, frequency, and duration of these extreme events over the GLB. Furthermore, the Canadian snowbelt regions of Lake Huron and Lake Superior exhibit different spatiotemporal trends in snowfall extremes, and even within a particular snowbelt region, trends in extreme snowfall are not spatially coherent. It is suggested that the geographic location of the lakes, topography, lake bathymetry, and lake orientation can influence local and large-scale surface-atmosphere variables.
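
    As a rough illustration of the extreme-snowfall statistics described above, the sketch below derives a 99th-percentile daily threshold and a least-squares trend in the yearly count of days exceeding it. The synthetic gamma-distributed series and the significance test are assumptions for illustration, not the study's Daymet workflow.

    ```python
    import numpy as np
    from scipy import stats

    # daily_snow: hypothetical daily snowfall series [cm/day] for one grid cell
    rng = np.random.default_rng(1)
    years = np.repeat(np.arange(1982, 2016), 120)        # ~120 snow days per season
    daily_snow = rng.gamma(shape=0.6, scale=4.0, size=years.size)

    p99 = np.percentile(daily_snow, 99)                  # extreme threshold (abstract: ~15 cm/day)
    print(f"99th percentile: {p99:.1f} cm/day")

    # Frequency of extreme days per year, then a linear trend with its p-value
    yrs = np.unique(years)
    freq = np.array([(daily_snow[years == y] >= p99).sum() for y in yrs])
    res = stats.linregress(yrs, freq)
    significant = res.pvalue < 0.10                      # 90% confidence, as in the study
    print(f"trend {res.slope:.3f} days/yr, p={res.pvalue:.2f}, significant at 90%: {significant}")
    ```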

  11. Combining wood anatomy and stable isotope variations in a 600-year multi-parameter climate reconstruction from Corsican black pine

    NASA Astrophysics Data System (ADS)

    Szymczak, Sonja; Hetzer, Timo; Bräuning, Achim; Joachimski, Michael M.; Leuschner, Hanns-Hubert; Kuhlemann, Joachim

    2014-10-01

    We present a new multi-parameter dataset from Corsican black pine growing on the island of Corsica in the Western Mediterranean basin covering the period AD 1410-2008. Wood parameters measured include tree-ring width, latewood width, earlywood width, cell lumen area, cell width, cell wall thickness, modelled wood density, as well as stable carbon and oxygen isotopes. We evaluated the relationships between different parameters and determined the value of the dataset for climate reconstructions. Correlation analyses revealed that carbon isotope ratios are influenced by cell parameters determining cell size, whereas oxygen isotope ratios are influenced by cell parameters determining the amount of transportable water in the xylem. A summer (June to August) precipitation reconstruction dating back to AD 1185 was established based on tree-ring width. No long-term trends or pronounced periods with extreme high/low precipitation are recorded in our reconstruction, indicating relatively stable moisture conditions over the entire time period. By comparing the precipitation reconstruction with a summer temperature reconstruction derived from the carbon isotope chronologies, we identified summers with extreme climate conditions, i.e. warm-dry, warm-wet, cold-dry and cold-wet. Extreme climate conditions during summer months were found to influence cell parameter characteristics. Cold-wet summers promote the production of broad latewood composed of wide and thin-walled tracheids, while warm-wet summers promote the production of latewood with small thick-walled cells. The presented dataset emphasizes the potential of multi-parameter wood analysis from one tree species over long time scales.

  12. Magnetotelluric characterization of the northern margin of the Yilgarn Craton (Western Australia)

    NASA Astrophysics Data System (ADS)

    Piña-Varas, Perla; Dentith, Michael

    2017-04-01

    The northern margin of the Yilgarn Craton (Western Australia) was deformed during convergence and collision with the Pilbara Craton and the intervening Glenburgh Terrane, which created the Capricorn Orogen. The Yilgarn Craton is one of the most intensively mineralised areas of continental crust, with world-class deposits of gold and nickel. However, the region to its north has surprisingly few deposits. Cratonic margins are considered to be key indicators of prospectivity at a regional scale. The northern limit of the Yilgarn Craton within the Capricorn Orogen is not well resolved to date because of overlying Proterozoic sedimentary basins. We present here some of the results of an extensive magnetotelluric (MT) study being performed in the area. This study is a component of a large multi-disciplinary geoscience project on the 'Distal Footprints of Giant Ore Systems' in the Capricorn Orogen. The MT dataset consists of a total of 240 broadband magnetotelluric stations (BBMT) and 84 long-period stations (LMT). Analysis of the dataset reveals clear 3-D geoelectrical behaviour and extreme complexity for most of the sites, including an extremely high number of sites with phases out-of-quadrant at long periods. 3-D inverse modelling of the MT data shows high-resistivity Archean units and low-resistivity Paleoproterozoic basins, including very low resistivity structures at depth. These strong resistivity contrasts allow us to successfully map the northern margin of the Yilgarn Craton beneath basin cover, as well as to identify major lateral conductivity changes in the deep crust suggestive of different tectonic blocks. Upper crustal conductive zones can be correlated with faults on seismic reflection data. Our results suggest MT surveys are a useful tool for regional-scale exploration in the study area and in areas of thick cover in general.

  13. Analysis and Prediction of Myristoylation Sites Using the mRMR Method, the IFS Method and an Extreme Learning Machine Algorithm.

    PubMed

    Wang, ShaoPeng; Zhang, Yu-Hang; Huang, GuoHua; Chen, Lei; Cai, Yu-Dong

    2017-01-01

    Myristoylation is an important hydrophobic post-translational modification that is covalently bound to the amino group of Gly residues on the N-terminus of proteins. The many diverse functions of myristoylation on proteins, such as membrane targeting, signal pathway regulation and apoptosis, are largely due to the lipid modification, whereas abnormal or irregular myristoylation can lead to several pathological changes in the cell. To better understand the function of myristoylated sites and to correctly identify them in protein sequences, this study conducted a novel computational investigation of identifying myristoylation sites in protein sequences. A training dataset with 196 positive and 84 negative peptide segments was obtained. Four types of features derived from the peptide segments following the myristoylation sites were used to distinguish myristoylated and non-myristoylated sites. Then, feature selection methods, including maximum relevance and minimum redundancy (mRMR) and incremental feature selection (IFS), together with a machine learning algorithm (the extreme learning machine method), were adopted to extract optimal features with which to identify myristoylation sites in protein sequences, thereby building an optimal prediction model. As a result, 41 key features were extracted and used to build the optimal prediction model, whose effectiveness was further validated by its performance on a test dataset. Furthermore, detailed analyses were performed on the 41 extracted features to gain insight into the mechanism of myristoylation modification. This study provides a new computational method for identifying myristoylation sites in protein sequences, and we believe it can be a useful tool for predicting myristoylation sites from protein sequences.
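
    A brief sketch of the IFS loop described above, with two loudly flagged substitutions: mutual-information ranking stands in for mRMR, and logistic regression stands in for the extreme learning machine. The synthetic 280-sample dataset mirrors only the size of the training set.

    ```python
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.feature_selection import mutual_info_classif
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    # Hypothetical stand-in for the 280 peptide segments and their encoded features
    X, y = make_classification(n_samples=280, n_features=60, n_informative=10, random_state=0)

    # Rank features (mutual information here; the paper ranks with mRMR)
    order = np.argsort(mutual_info_classif(X, y, random_state=0))[::-1]

    # IFS: grow the feature set in ranked order, keep the size that cross-validates best
    scores = []
    for k in range(1, len(order) + 1):
        clf = LogisticRegression(max_iter=1000)   # the paper uses an extreme learning machine
        scores.append(cross_val_score(clf, X[:, order[:k]], y, cv=5).mean())
    best_k = int(np.argmax(scores)) + 1
    print(f"optimal feature count: {best_k} (CV accuracy {max(scores):.3f})")
    ```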

  14. Quantum rendering

    NASA Astrophysics Data System (ADS)

    Lanzagorta, Marco O.; Gomez, Richard B.; Uhlmann, Jeffrey K.

    2003-08-01

    In recent years, computer graphics has emerged as a critical component of the scientific and engineering process, and it is recognized as an important computer science research area. Computer graphics are extensively used for a variety of aerospace and defense training systems and by Hollywood's special effects companies. All these applications require the computer graphics systems to produce high quality renderings of extremely large data sets in short periods of time. Much research has been done in "classical computing" toward the development of efficient methods and techniques to reduce the rendering time required for large datasets. Quantum Computing's unique algorithmic features offer the possibility of speeding up some of the known rendering algorithms currently used in computer graphics. In this paper we discuss possible implementations of quantum rendering algorithms. In particular, we concentrate on the implementation of Grover's quantum search algorithm for Z-buffering, ray-tracing, radiosity, and scene management techniques. We also compare the theoretical performance between the classical and quantum versions of the algorithms.
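
    For concreteness, here is a tiny classical statevector simulation of the Grover primitive the paper builds on: amplifying the amplitude of one marked item (say, the fragment satisfying a depth test) among N candidates. This is purely illustrative, not a rendering implementation.

    ```python
    import numpy as np

    # Statevector simulation of Grover search over N = 2**n items.
    n, marked = 4, 11                     # 16 "fragments"; index 11 is the sought one
    N = 2 ** n
    state = np.full(N, 1 / np.sqrt(N))    # uniform superposition

    iterations = int(np.floor(np.pi / 4 * np.sqrt(N)))
    for _ in range(iterations):
        state[marked] *= -1               # oracle: phase-flip the marked item
        state = 2 * state.mean() - state  # diffusion: inversion about the mean

    probs = state ** 2
    print(f"P(marked) = {probs[marked]:.3f} after {iterations} iterations")  # ~0.96
    ```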

  15. TuMore: generation of synthetic brain tumor MRI data for deep learning based segmentation approaches

    NASA Astrophysics Data System (ADS)

    Lindner, Lydia; Pfarrkirchner, Birgit; Gsaxner, Christina; Schmalstieg, Dieter; Egger, Jan

    2018-03-01

    Accurate segmentation and measurement of brain tumors plays an important role in clinical practice and research, as it is critical for treatment planning and monitoring of tumor growth. However, brain tumor segmentation is one of the most challenging tasks in medical image analysis. Since manual segmentations are subjective, time consuming and neither accurate nor reliable, there is a need for objective, robust and fast automated segmentation methods that provide competitive performance. Therefore, deep learning based approaches are gaining interest in the field of medical image segmentation. When the training dataset is large enough, deep learning approaches can be extremely effective, but in domains like medicine only limited data are available in the majority of cases. For this reason, we propose a method for creating a large dataset of brain MRI (Magnetic Resonance Imaging) images containing synthetic brain tumors - glioblastomas, more specifically - and the corresponding ground truth, which can subsequently be used to train deep neural networks.

  16. Drivers and implications of recent large fire years in boreal North America

    NASA Astrophysics Data System (ADS)

    Veraverbeke, S.; Rogers, B. M.; Goulden, M.; Jandt, R.; Miller, C. E.; Wiggins, E. B.; Randerson, J. T.

    2016-12-01

    High latitude ecosystems are rapidly transforming because of climate change. Boreal North America recently experienced two exceptionally large fire years: 2014 in the Northwest Territories, Canada, and 2015 in Alaska, USA. We used geospatial climate, lightning, fire, and vegetation datasets to assess the mechanisms contributing to these recent extreme years and the causes of recent decadal-scale changes in fire dynamics. We found that the two events had a record number of lightning ignitions and unusually high levels of burning near the boreal treeline, contributing to emissions of 164 ± 32 Tg C in the Northwest Territories and 65 ± 13 Tg C in Interior Alaska. The annual number of ignitions in both regions has displayed a significant increasing trend since 1975, driven by an increase in lightning ignitions. We found that vapor pressure deficit (VPD) in June, lightning, and ignition events were significantly correlated on interannual timescales. Future climate-driven increases in VPD and lightning near the treeline ecotone may enable northward forest expansion within tundra ecosystems.

  17. Deep transcriptome annotation enables the discovery and functional characterization of cryptic small proteins

    PubMed Central

    Delcourt, Vivian; Lucier, Jean-François; Gagnon, Jules; Beaudoin, Maxime C; Vanderperre, Benoît; Breton, Marc-André; Motard, Julie; Jacques, Jean-François; Brunelle, Mylène; Gagnon-Arsenault, Isabelle; Fournier, Isabelle; Ouangraoua, Aida; Hunting, Darel J; Cohen, Alan A; Landry, Christian R; Scott, Michelle S

    2017-01-01

    Recent functional, proteomic and ribosome profiling studies in eukaryotes have concurrently demonstrated the translation of alternative open-reading frames (altORFs) in addition to annotated protein coding sequences (CDSs). We show that a large number of small proteins could in fact be coded by these altORFs. The putative alternative proteins translated from altORFs have orthologs in many species and contain functional domains. Evolutionary analyses indicate that altORFs often show more extreme conservation patterns than their CDSs. Thousands of alternative proteins are detected in proteomic datasets by reanalysis using a database containing predicted alternative proteins. This is illustrated with specific examples, including altMiD51, a 70 amino acid mitochondrial fission-promoting protein encoded in MiD51/Mief1/SMCR7L, a gene encoding an annotated protein promoting mitochondrial fission. Our results suggest that many genes are multicoding genes and code for a large protein and one or several small proteins. PMID:29083303

  18. Automated brain tumour detection and segmentation using superpixel-based extremely randomized trees in FLAIR MRI.

    PubMed

    Soltaninejad, Mohammadreza; Yang, Guang; Lambrou, Tryphon; Allinson, Nigel; Jones, Timothy L; Barrick, Thomas R; Howe, Franklyn A; Ye, Xujiong

    2017-02-01

    We propose a fully automated method for the detection and segmentation of the abnormal tissue associated with brain tumour (tumour core and oedema) from Fluid-Attenuated Inversion Recovery (FLAIR) Magnetic Resonance Imaging (MRI). The method is based on a superpixel technique and classification of each superpixel. A number of novel image features, including intensity-based features, Gabor textons, fractal analysis and curvatures, are calculated from each superpixel within the entire brain area in FLAIR MRI to ensure a robust classification. An extremely randomized trees (ERT) classifier is compared with a support vector machine (SVM) to classify each superpixel into tumour and non-tumour. The proposed method is evaluated on two datasets: (1) our own clinical dataset, 19 MRI FLAIR images of patients with gliomas of grade II to IV, and (2) the BRATS 2012 dataset, 30 FLAIR images with 10 low-grade and 20 high-grade gliomas. The experimental results demonstrate the high detection and segmentation performance of the proposed method using the ERT classifier. For our own cohort, the average detection sensitivity, balanced error rate and Dice overlap measure for the segmented tumour against the ground truth are 89.48%, 6% and 0.91, respectively, while for the BRATS dataset the corresponding results are 88.09%, 6% and 0.88. This provides a close match to expert delineation across all grades of glioma, leading to a faster and more reproducible method of brain tumour detection and delineation to aid patient management.
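
    The core comparison in this paper, ERT versus SVM on per-superpixel features, is easy to reproduce in outline with scikit-learn, whose ExtraTreesClassifier implements extremely randomized trees. The feature matrix below is a synthetic stand-in for the intensity, Gabor texton, fractal and curvature descriptors.

    ```python
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import ExtraTreesClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    # Hypothetical superpixel feature matrix with tumour / non-tumour labels;
    # real features would be intensity, Gabor texton, fractal and curvature values.
    X, y = make_classification(n_samples=2000, n_features=40, n_informative=12,
                               weights=[0.8, 0.2], random_state=0)

    ert = ExtraTreesClassifier(n_estimators=300, random_state=0)
    svm = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

    for name, clf in [("ERT", ert), ("SVM", svm)]:
        acc = cross_val_score(clf, X, y, cv=5, scoring="balanced_accuracy").mean()
        print(f"{name}: balanced accuracy {acc:.3f}")
    ```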

  19. Epithelium-Stroma Classification via Convolutional Neural Networks and Unsupervised Domain Adaptation in Histopathological Images.

    PubMed

    Huang, Yue; Zheng, Han; Liu, Chi; Ding, Xinghao; Rohde, Gustavo K

    2017-11-01

    Epithelium-stroma classification is a necessary preprocessing step in histopathological image analysis. Current deep learning based recognition methods for histology data require the collection of large volumes of labeled data to train a new neural network whenever the image acquisition procedure changes. However, it is extremely expensive for pathologists to manually label sufficient volumes of data for each pathology study in a professional manner, which limits real-world applications. This paper proposes a very simple but effective deep learning method that introduces the concept of unsupervised domain adaptation to a simple convolutional neural network (CNN). Inspired by transfer learning, the paper assumes that the training data and testing data follow different distributions and adds an adaptation operation to more accurately estimate the kernels in the CNN during feature extraction, enhancing performance by transferring knowledge from labeled data in the source domain to unlabeled data in the target domain. The model has been evaluated using three independent public epithelium-stroma datasets by cross-dataset validation. The experimental results demonstrate that for epithelium-stroma classification the proposed framework outperforms the state-of-the-art deep neural network model and also achieves better performance than other existing deep domain adaptation methods. The proposed model can be considered a better option for real-world applications in histopathological image analysis, since it no longer requires large-scale labeled data in each specified domain.
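
    The abstract does not spell out the adaptation operation, so the sketch below uses maximum mean discrepancy (MMD), a common criterion in unsupervised domain adaptation, to measure the source-target feature mismatch that such an operation would reduce. Treat it as an illustration of the idea, not the paper's method; the feature arrays are stand-ins.

    ```python
    import numpy as np

    def rbf_mmd2(Xs, Xt, gamma=1.0):
        """Squared MMD between two samples under an RBF kernel."""
        def k(A, B):
            d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
            return np.exp(-gamma * d2)
        return k(Xs, Xs).mean() + k(Xt, Xt).mean() - 2 * k(Xs, Xt).mean()

    rng = np.random.default_rng(0)
    source = rng.normal(0.0, 1.0, (200, 16))   # labeled-domain CNN features (stand-in)
    target = rng.normal(0.5, 1.2, (200, 16))   # unlabeled-domain CNN features (stand-in)
    print(f"MMD^2 = {rbf_mmd2(source, target):.4f}")  # would be penalised during training
    ```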

  20. Genomic Data Quality Impacts Automated Detection of Lateral Gene Transfer in Fungi

    PubMed Central

    Dupont, Pierre-Yves; Cox, Murray P.

    2017-01-01

    Lateral gene transfer (LGT, also known as horizontal gene transfer), an atypical mechanism of transferring genes between species, has almost become the default explanation for genes that display an unexpected composition or phylogeny. Numerous methods of detecting LGT events all rely on two fundamental strategies: primary structure composition or gene tree/species tree comparisons. Discouragingly, the results of these different approaches rarely coincide. With the wealth of genome data now available, detection of laterally transferred genes is increasingly being attempted in large uncurated eukaryotic datasets. However, detection methods depend greatly on the quality of the underlying genomic data, which are typically complex for eukaryotes. Furthermore, given the automated nature of genomic data collection, it is typically impractical to manually verify all protein or gene models, orthology predictions, and multiple sequence alignments, requiring researchers to accept a substantial margin of error in their datasets. Using a test case comprising plant-associated genomes across the fungal kingdom, this study reveals that composition- and phylogeny-based methods have little statistical power to detect laterally transferred genes. In particular, phylogenetic methods reveal extreme levels of topological variation in fungal gene trees, the vast majority of which show departures from the canonical species tree. Therefore, it is inherently challenging to detect LGT events in typical eukaryotic genomes. This finding is in striking contrast to the large number of claims for laterally transferred genes in eukaryotic species that routinely appear in the literature, and questions how many of these proposed examples are statistically well supported. PMID:28235827

  1. Natural Hazards characterisation in industrial practice

    NASA Astrophysics Data System (ADS)

    Bernardara, Pietro

    2017-04-01

    The definition of rare hydroclimatic extremes (with annual probabilities of occurrence as low as 10^-4) is of the utmost importance for the design of high-value industrial infrastructure such as grids, power plants and offshore platforms. Underestimation as well as overestimation of the risk may lead to huge costs (e.g., expensive mid-life works or overdesign), which may even prevent a project from happening. Nevertheless, the uncertainties associated with extrapolation towards rare frequencies are huge and manifold. They are mainly due to the scarcity of observations, the lack of quality of extreme value records, and the arbitrary choice of the models used for extrapolation. This often puts design engineers in uncomfortable situations when they must choose the design values to use. Providentially, recent progress in earth observation techniques, information technology, historical data collection and weather and ocean modelling is making huge datasets available. Careful use of big datasets of observations and modelled data is leading towards a better understanding of the physics of the underlying phenomena and the complex interactions between them, and thus of the extrapolation of extreme event frequencies. This will move engineering practice from single-site, small-sample applications of statistical analysis to a more spatially coherent, physically driven extrapolation of extreme values. A few examples from EDF industrial practice are given to illustrate these advances and their potential impact on design approaches.
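
    A minimal sketch of the extrapolation problem described above, assuming the classical single-site approach the author argues should be superseded: fit a generalized extreme value (GEV) distribution to an annual-maximum series and read off the 10^-4 annual-probability design value. The series is synthetic; with only decades of data, the sampling uncertainty on such an estimate is exactly the issue the abstract raises.

    ```python
    import numpy as np
    from scipy.stats import genextreme

    # Hypothetical annual-maximum series (e.g., yearly peak wind speed or surge level)
    rng = np.random.default_rng(0)
    annual_max = genextreme.rvs(c=-0.1, loc=30, scale=5, size=60, random_state=rng)

    # Fit a GEV and extrapolate to the 10^-4 annual exceedance probability
    c, loc, scale = genextreme.fit(annual_max)
    design_value = genextreme.isf(1e-4, c, loc, scale)   # 10,000-year return level
    print(f"10^-4/yr design value: {design_value:.1f}")
    ```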

  2. Querying Large Biological Network Datasets

    ERIC Educational Resources Information Center

    Gulsoy, Gunhan

    2013-01-01

    New experimental methods have resulted in increasing amounts of genetic interaction data being generated every day. Biological networks are used to store the genetic interaction data gathered. The increasing amount of available data requires fast, large-scale analysis methods. Therefore, we address the problem of querying large biological network datasets.…

  3. Recent Development on the NOAA's Global Surface Temperature Dataset

    NASA Astrophysics Data System (ADS)

    Zhang, H. M.; Huang, B.; Boyer, T.; Lawrimore, J. H.; Menne, M. J.; Rennie, J.

    2016-12-01

    Global Surface Temperature (GST) is one of the most widely used indicators for climate trend and extreme analyses. A widely used GST dataset is the NOAA merged land-ocean surface temperature dataset known as NOAAGlobalTemp (formerly MLOST). NOAAGlobalTemp was recently updated from version 3.5.4 to version 4. The update includes a significant improvement in the ocean surface component (Extended Reconstructed Sea Surface Temperature, or ERSST, from version 3b to version 4), which resulted in increased temperature trends in recent decades. Since then, advancements in both the ocean component (ERSST) and the land component (GHCN-Monthly) have been made, including the inclusion of Argo float SSTs and expanded EOT modes in ERSST, and the use of the ISTI databank in GHCN-Monthly. In this presentation, we describe the impact of those improvements on the merged global temperature dataset, in terms of global trends and other aspects.

  4. Unveiling the signals from extremely noisy microseismic data for high-resolution hydraulic fracturing monitoring.

    PubMed

    Huang, Weilin; Wang, Runqiu; Li, Huijian; Chen, Yangkang

    2017-09-20

    The microseismic method is an essential technique for monitoring the dynamic status of hydraulic fracturing during the development of unconventional reservoirs. However, one of the challenges in microseismic monitoring is that the seismic signals generated by microseismicity have extremely low amplitudes. We develop a methodology to unveil the signals that are smeared in strong ambient noise and thus facilitate more accurate arrival-time picking, which will ultimately improve localization accuracy. In the proposed technique, we decompose the recorded data into several morphological multi-scale components. In order to unveil the weak signal, we propose an orthogonalization operator that acts as a time-varying weighting in the morphological reconstruction. The orthogonalization operator is obtained using an inversion process. This orthogonalized morphological reconstruction can be interpreted as a projection of a higher-dimensional vector. We first test the proposed technique using a synthetic dataset. Then the proposed technique is applied to a field dataset recorded in a project in China, in which the signals induced by hydraulic fracturing were recorded by twelve three-component (3-C) geophones in a monitoring well. The result demonstrates that the orthogonalized morphological reconstruction can make extremely weak microseismic signals detectable.
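
    The paper's operator is obtained by inversion and is time-varying; the sketch below shows only the simplest analogue, assuming constant weights: a least-squares fit of morphological components back to the record, so that components carrying coherent energy are emphasised. The component construction and all signals are illustrative stand-ins.

    ```python
    import numpy as np

    # d: recorded trace; comps: morphological multi-scale components (stand-ins here)
    rng = np.random.default_rng(0)
    t = np.linspace(0, 1, 500)
    comps = np.stack([np.sin(2 * np.pi * f * t) for f in (3, 11, 40)])  # 3 "scales"
    d = 0.2 * comps[0] + 1.0 * comps[1] + 0.05 * comps[2] + 0.3 * rng.standard_normal(t.size)

    # Inversion for reconstruction weights: least-squares fit of the components to
    # the data (the paper solves for time-varying weights; constants keep it short)
    w, *_ = np.linalg.lstsq(comps.T, d, rcond=None)
    signal = w @ comps            # weighted reconstruction emphasising coherent energy
    print("estimated weights:", np.round(w, 2))
    ```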

  5. Extreme learning machines: a new approach for modeling dissolved oxygen (DO) concentration with and without water quality variables as predictors.

    PubMed

    Heddam, Salim; Kisi, Ozgur

    2017-07-01

    In this paper, several extreme learning machine (ELM) models, including the standard extreme learning machine with sigmoid activation function (S-ELM), the extreme learning machine with radial basis activation function (R-ELM), the online sequential extreme learning machine (OS-ELM), and the optimally pruned extreme learning machine (OP-ELM), are newly applied for predicting dissolved oxygen concentration with and without water quality variables as predictors. Firstly, using data from eight United States Geological Survey (USGS) stations located in different river basins in the USA, the S-ELM, R-ELM, OS-ELM, and OP-ELM were compared against the measured dissolved oxygen (DO) using four water quality variables, water temperature, specific conductance, turbidity, and pH, as predictors. For each station, we used data measured at an hourly time step for a period of 4 years. The dataset was divided into a training set (70%) and a validation set (30%). We selected several combinations of the water quality variables as inputs for each ELM model and compared six different scenarios. Secondly, an attempt was made to predict DO concentration without water quality variables. To achieve this goal, we used the year numbers, 2008, 2009, etc., month numbers from (1) to (12), day numbers from (1) to (31) and hour numbers from (00:00) to (24:00) as predictors. Thirdly, the best ELM models were trained using the validation dataset and tested with the training dataset. The performances of the four ELM models were evaluated using four statistical indices: the coefficient of correlation (R), the Nash-Sutcliffe efficiency (NSE), the root mean squared error (RMSE), and the mean absolute error (MAE). Results obtained from the eight stations indicated that: (i) the best results were obtained by the S-ELM, R-ELM, OS-ELM, and OP-ELM models having four water quality variables as predictors; (ii) out of eight stations, the OP-ELM performed better than the other three ELM models at seven stations, while the R-ELM performed best at one station; the OS-ELM models performed the worst and provided the lowest accuracy; (iii) for predicting DO without water quality variables, the R-ELM performed best at seven stations, followed by the S-ELM in second place, while the OP-ELM performed the worst with low accuracy; (iv) for the final application, where the ELM models were trained with the validation dataset and tested with the training dataset, the OP-ELM provided the best accuracy using water quality variables and the R-ELM performed best at all eight stations without water quality variables. Fourthly, and finally, we compared the results obtained from the different ELM models with those obtained using multiple linear regression (MLR) and a multilayer perceptron neural network (MLPNN). Results obtained using the MLPNN and MLR models reveal that: (i) using water quality variables as predictors, the MLR performed the worst and provided the lowest accuracy at all stations; (ii) the MLPNN was ranked in second place at two stations, third place at four stations, and fourth place at the remaining two stations; (iii) for predicting DO without water quality variables, the MLPNN ranked in second place at five stations, and in third, fourth, and fifth places at the remaining three stations, while the MLR ranked last with very low accuracy at all stations. Overall, the results suggest that the ELM is more effective than the MLPNN and MLR for modelling DO concentration in river ecosystems.
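
    The standard ELM with sigmoid activation (S-ELM) is simple enough to sketch directly: hidden weights are drawn at random and never trained, and only the output weights are solved by least squares. Everything below, the predictor matrix, the synthetic DO response, and the 70/30 split, is an illustrative stand-in for the USGS data.

    ```python
    import numpy as np

    def elm_train(X, y, n_hidden=50, rng=None):
        """Basic ELM: a random hidden layer plus least-squares output weights (S-ELM style)."""
        rng = rng or np.random.default_rng(0)
        W = rng.standard_normal((X.shape[1], n_hidden))
        b = rng.standard_normal(n_hidden)
        H = 1.0 / (1.0 + np.exp(-(X @ W + b)))   # sigmoid hidden activations
        beta = np.linalg.pinv(H) @ y             # output weights by pseudoinverse
        return W, b, beta

    def elm_predict(X, W, b, beta):
        H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
        return H @ beta

    # Hypothetical hourly predictors: water temperature, conductance, turbidity, pH
    rng = np.random.default_rng(1)
    X = rng.standard_normal((1000, 4))
    do = 8 + 1.5 * X[:, 0] - 0.5 * X[:, 2] + 0.3 * rng.standard_normal(1000)  # DO [mg/L]

    W, b, beta = elm_train(X[:700], do[:700])    # 70/30 split, as in the paper
    pred = elm_predict(X[700:], W, b, beta)
    rmse = np.sqrt(np.mean((pred - do[700:]) ** 2))
    print(f"test RMSE: {rmse:.3f} mg/L")
    ```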

  6. A linear relationship between wave power and erosion determines salt-marsh resilience to violent storms and hurricanes

    PubMed Central

    Leonardi, Nicoletta; Ganju, Neil K.; Fagherazzi, Sergio

    2016-01-01

    Salt marsh losses have been documented worldwide because of land use change, wave erosion, and sea-level rise. It is still unclear how resistant salt marshes are to extreme storms and whether they can survive multiple events without collapsing. Based on a large dataset of salt marsh lateral erosion rates collected around the world, here, we determine the general response of salt marsh boundaries to wave action under normal and extreme weather conditions. As wave energy increases, salt marsh response to wind waves remains linear, and there is not a critical threshold in wave energy above which salt marsh erosion drastically accelerates. We apply our general formulation for salt marsh erosion to historical wave climates at eight salt marsh locations affected by hurricanes in the United States. Based on the analysis of two decades of data, we find that violent storms and hurricanes contribute less than 1% to long-term salt marsh erosion rates. In contrast, moderate storms with a return period of 2.5 mo are those causing the most salt marsh deterioration. Therefore, salt marshes seem more susceptible to variations in mean wave energy rather than changes in the extremes. The intrinsic resistance of salt marshes to violent storms and their predictable erosion rates during moderate events should be taken into account by coastal managers in restoration projects and risk management plans. PMID:26699461

  7. A linear relationship between wave power and erosion determines salt-marsh resilience to violent storms and hurricanes

    USGS Publications Warehouse

    Leonardi, Nicoletta; Ganju, Neil K.; Fagherazzi, Sergio

    2016-01-01

    Salt marsh losses have been documented worldwide because of land use change, wave erosion, and sea-level rise. It is still unclear how resistant salt marshes are to extreme storms and whether they can survive multiple events without collapsing. Based on a large dataset of salt marsh lateral erosion rates collected around the world, here, we determine the general response of salt marsh boundaries to wave action under normal and extreme weather conditions. As wave energy increases, salt marsh response to wind waves remains linear, and there is not a critical threshold in wave energy above which salt marsh erosion drastically accelerates. We apply our general formulation for salt marsh erosion to historical wave climates at eight salt marsh locations affected by hurricanes in the United States. Based on the analysis of two decades of data, we find that violent storms and hurricanes contribute less than 1% to long-term salt marsh erosion rates. In contrast, moderate storms with a return period of 2.5 mo are those causing the most salt marsh deterioration. Therefore, salt marshes seem more susceptible to variations in mean wave energy rather than changes in the extremes. The intrinsic resistance of salt marshes to violent storms and their predictable erosion rates during moderate events should be taken into account by coastal managers in restoration projects and risk management plans.

  8. A linear relationship between wave power and erosion determines salt-marsh resilience to violent storms and hurricanes.

    PubMed

    Leonardi, Nicoletta; Ganju, Neil K; Fagherazzi, Sergio

    2016-01-05

    Salt marsh losses have been documented worldwide because of land use change, wave erosion, and sea-level rise. It is still unclear how resistant salt marshes are to extreme storms and whether they can survive multiple events without collapsing. Based on a large dataset of salt marsh lateral erosion rates collected around the world, here, we determine the general response of salt marsh boundaries to wave action under normal and extreme weather conditions. As wave energy increases, salt marsh response to wind waves remains linear, and there is not a critical threshold in wave energy above which salt marsh erosion drastically accelerates. We apply our general formulation for salt marsh erosion to historical wave climates at eight salt marsh locations affected by hurricanes in the United States. Based on the analysis of two decades of data, we find that violent storms and hurricanes contribute less than 1% to long-term salt marsh erosion rates. In contrast, moderate storms with a return period of 2.5 mo are those causing the most salt marsh deterioration. Therefore, salt marshes seem more susceptible to variations in mean wave energy rather than changes in the extremes. The intrinsic resistance of salt marshes to violent storms and their predictable erosion rates during moderate events should be taken into account by coastal managers in restoration projects and risk management plans.

  9. The observed clustering of damaging extra-tropical cyclones in Europe

    NASA Astrophysics Data System (ADS)

    Cusack, S.

    2015-12-01

    The clustering of severe European windstorms on annual timescales has substantial impacts on the re/insurance industry. Management of the risk is impaired by large uncertainties in estimates of clustering from historical storm datasets typically covering the past few decades. The uncertainties are unusually large because clustering depends on the variance of storm counts. Eight storm datasets are gathered for analysis in this study in order to reduce these uncertainties. Six of the datasets contain more than 100 years of severe storm information to reduce sampling errors, and the diversity of information sources and analysis methods between datasets samples observational errors. All storm severity measures used in this study reflect damage, to suit re/insurance applications. It is found that the shortest storm dataset, 42 years in length, provides estimates of clustering with very large sampling and observational errors. The dataset does provide some useful information: indications of stronger clustering for more severe storms, particularly for southern countries off the main storm track. However, substantially different results are produced by the removal of one stormy season, 1989/1990, which illustrates the large uncertainties in a 42-year dataset. The extended storm records place 1989/1990 into a much longer historical context to produce more robust estimates of clustering. All the extended storm datasets show a greater degree of clustering with increasing storm severity and suggest that clustering of severe storms is much more material than that of weaker storms. Further, they contain signs of stronger clustering in areas off the main storm track, and weaker clustering for smaller-sized areas, though these signals are smaller than the uncertainties in the actual values. Both the improvement of existing storm records and the development of new historical storm datasets would help to improve management of this risk.

  10. Impacts of spatial resolution and representation of flow connectivity on large-scale simulation of floods

    NASA Astrophysics Data System (ADS)

    Mateo, Cherry May R.; Yamazaki, Dai; Kim, Hyungjun; Champathong, Adisorn; Vaze, Jai; Oki, Taikan

    2017-10-01

    Global-scale river models (GRMs) are core tools for providing consistent estimates of global flood hazard, especially in data-scarce regions. Due to former limitations in computational power and input datasets, most GRMs have been developed to use simplified representations of flow physics and run at coarse spatial resolutions. With increasing computational power and improved datasets, the application of GRMs at finer resolutions is becoming a reality. To support development in this direction, the suitability of GRMs for application at finer resolutions needs to be assessed. This study investigates the impacts of spatial resolution and flow connectivity representation on the predictive capability of a GRM, CaMa-Flood, in simulating the 2011 extreme flood in Thailand. Analyses show that when single downstream connectivity (SDC) is assumed, simulation results deteriorate with finer spatial resolution; Nash-Sutcliffe efficiency coefficients decreased by more than 50 % between simulation results at 10 km resolution and 1 km resolution. When multiple downstream connectivity (MDC) is represented, simulation results slightly improve with finer spatial resolution. The SDC simulations result in excessive backflows on very flat floodplains due to the restrictive flow directions at finer resolutions. MDC channels attenuated these effects by maintaining flow connectivity and flow capacity between floodplains at varying spatial resolutions. While a regional-scale flood was chosen as a test case, these findings should be universal and may have significant impacts on large- to global-scale simulations, especially in regions where mega deltas exist. These results demonstrate that a GRM can be used for higher-resolution simulations of large-scale floods, provided that MDC in rivers and floodplains is adequately represented in the model structure.

  11. Revealing the selection history of adaptive loci using genome-wide scans for selection: an example from domestic sheep.

    PubMed

    Rochus, Christina Marie; Tortereau, Flavie; Plisson-Petit, Florence; Restoux, Gwendal; Moreno-Romieux, Carole; Tosser-Klopp, Gwenola; Servin, Bertrand

    2018-01-23

    One of the approaches to detecting genetic variants affecting fitness traits is to identify the genomic signatures of past selection surrounding them. With established methods for detecting selection signatures and the current and future availability of large datasets, such studies should have the power not only to detect these signatures but also to infer their selective histories. Domesticated animals offer a powerful model for these approaches, as they adapted rapidly to environmental and human-mediated constraints in a relatively short time. We investigated this question by studying a large dataset of 542 individuals from 27 domestic sheep populations raised in France, genotyped for more than 500,000 SNPs. Population structure analysis revealed that this set of populations harbours a large part of European sheep diversity in a small geographical area, offering a powerful model for the study of adaptation. Identification of extreme SNP and haplotype frequency differences between populations highlighted 126 genomic regions likely affected by selection. These signatures revealed selection at loci commonly identified as selection targets in many species ("selection hotspots"), including ABCG2, LCORL/NCAPG, MSTN, and coat colour genes such as ASIP, MC1R, MITF, and TYRP1. For one of these regions (ABCG2, LCORL/NCAPG), we could propose a historical scenario leading to the introgression of an adaptive allele into a new genetic background. Among the selection signatures, we found clear evidence for parallel selection events in different genetic backgrounds, most likely for different mutations. We confirmed this allelic heterogeneity in one case by resequencing the MC1R gene in three black-faced breeds. Our study illustrates how dense genetic data in multiple populations allows the deciphering of the evolutionary history of populations and of their adaptive mutations.

  12. Modeling Interdependent and Periodic Real-World Action Sequences

    PubMed Central

    Kurashima, Takeshi; Althoff, Tim; Leskovec, Jure

    2018-01-01

    Mobile health applications, including those that track activities such as exercise, sleep, and diet, are becoming widely used. Accurately predicting human actions in the real world is essential for targeted recommendations that could improve our health and for personalization of these applications. However, making such predictions is extremely difficult due to the complexities of human behavior, which consists of a large number of potential actions that vary over time, depend on each other, and are periodic. Previous work has not jointly modeled these dynamics and has largely focused on item consumption patterns instead of broader types of behaviors such as eating, commuting or exercising. In this work, we develop a novel statistical model, called TIPAS, for Time-varying, Interdependent, and Periodic Action Sequences. Our approach is based on personalized, multivariate temporal point processes that model time-varying action propensities through a mixture of Gaussian intensities. Our model captures short-term and long-term periodic interdependencies between actions through Hawkes process-based self-excitations. We evaluate our approach on two activity logging datasets comprising 12 million real-world actions (e.g., eating, sleep, and exercise) taken by 20 thousand users over 17 months. We demonstrate that our approach allows us to make successful predictions of future user actions and their timing. Specifically, TIPAS improves predictions of actions, and their timing, over existing methods across multiple datasets by up to 156%, and up to 37%, respectively. Performance improvements are particularly large for relatively rare and periodic actions such as walking and biking, improving over baselines by up to 256%. This demonstrates that explicit modeling of dependencies and periodicities in real-world behavior enables successful predictions of future actions, with implications for modeling human behavior, app personalization, and targeting of health interventions. PMID:29780977
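
    TIPAS couples time-varying action propensities with Hawkes-style self-excitation. The sketch below shows just the self-exciting part, a univariate Hawkes conditional intensity with an exponential kernel; the kernel choice, parameters and event times are illustrative assumptions, not the model fitted in the paper.

    ```python
    import numpy as np

    def hawkes_intensity(t, events, mu=0.2, alpha=0.8, beta=1.5):
        """Conditional intensity of a univariate Hawkes process:
        lambda(t) = mu + alpha * sum_{t_i < t} exp(-beta * (t - t_i))."""
        past = events[events < t]
        return mu + alpha * np.exp(-beta * (t - past)).sum()

    events = np.array([1.0, 1.3, 4.2, 4.5, 4.6])   # hypothetical action times (days)
    for t in (1.5, 3.0, 5.0):
        # Intensity spikes just after bursts of actions and decays back toward mu
        print(f"lambda({t}) = {hawkes_intensity(t, events):.3f}")
    ```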

  13. Ongoing climatic extreme dynamics in Siberia

    NASA Astrophysics Data System (ADS)

    Gordov, E. P.; Shulgina, T. M.; Okladnikov, I. G.; Titov, A. G.

    2013-12-01

    Ongoing global climate changes, accompanied by the restructuring of global processes in the atmosphere and biosphere, are strongly pronounced in the regions of Northern Eurasia, especially in Siberia. Recent investigations indicate not only large changes in averaged climatic characteristics (Kabanov and Lykosov, 2006; IPCC, 2007; Groisman and Gutman, 2012), but also more frequent occurrence and stronger impacts of climatic extremes (Bulygina et al., 2007; IPCC, 2012: Climate Extremes, 2012; Oldenborh et al., 2013). This paper presents the dynamics of daily temperature and precipitation extremes in Siberia for the last three decades (1979-2012). Their seasonal dynamics is assessed using 10th and 90th percentile-based threshold indices that characterize the frequency, intensity and duration of climatic extremes. To obtain the geographical pattern of these variations with high spatial resolution, sub-daily temperature data from the ECMWF ERA-Interim reanalysis and daily precipitation amounts from the APHRODITE JMA dataset were used. All extreme indices and linear trend coefficients were calculated using the web-GIS information-computational platform Climate (http://climate.scert.ru/), developed to support collaborative multidisciplinary investigations of regional climatic changes and their impacts (Gordov et al., 2012). The results show that the seasonal dynamics of daily temperature extremes is asymmetric between the tails of the cold and warm temperature extreme distributions. Namely, the intensity of warming during cold nights is higher than during warm nights, especially at high latitudes of Siberia. Similar dynamics are observed for cold and warm day-time temperatures. Slight summer cooling was observed in the central part of Siberia, associated with a decrease in warm temperature extremes. In southern Siberia in winter, we also observe some cooling, mostly due to a strengthening of the cold temperature extremes. Changes in daily precipitation extremes are spatially inhomogeneous. The largest increase in the frequency and intensity of heavy precipitation is observed in the north of East Siberia. Negative trends related to a decrease in precipitation amounts are found in central West Siberia and in the south of East Siberia. The authors acknowledge partial financial support for this research from Russian Foundation for Basic Research projects (11-05-01190 and 13-05-12034), SB RAS Integration project 131 and project VIII.80.2.1, the Ministry of Education and Science of the Russian Federation contract 8345, and a grant of the President of the Russian Federation (decree 181).

  14. A peek into the future of radiology using big data applications

    PubMed Central

    Kharat, Amit T.; Singhal, Shubham

    2017-01-01

    Big data refers to the extremely large amounts of data that are available in the radiology department. Big data is identified by four Vs - Volume, Velocity, Variety, and Veracity. By applying algorithmic tools and converting raw data to transformed data in such large datasets, there is a possibility of understanding and using radiology data to gain new knowledge and insights. Big data analytics consists of 6Cs - Connection, Cloud, Cyber, Content, Community, and Customization. The global technological prowess and per-capita capacity to save digital information has roughly doubled every 40 months since the 1980s. By using big data, the planning and implementation of radiological procedures in radiology departments can be given a great boost. Potential future applications of big data include scheduling of scans, creating patient-specific personalized scanning protocols, radiologist decision support, emergency reporting, virtual quality assurance for the radiologist, etc. Targeted use of big data applications can be made for images by supporting the analytic process. Screening software tools designed on big data can be used to highlight a region of interest, such as subtle changes in parenchymal density, a solitary pulmonary nodule, or focal hepatic lesions, by plotting its multidimensional anatomy. Following this, we can run more complex applications, such as three-dimensional multiplanar reconstructions (MPR), volumetric rendering (VR), and curved planar reconstruction, which consume more system resources, on targeted data subsets rather than querying the complete cross-sectional imaging dataset. This pre-emptive selection of the dataset can substantially reduce system requirements such as memory and server load and provide prompt results. However, a word of caution: big data should not become "dump data" due to inadequate and poor analysis and non-structured, improperly stored data. In the near future, big data can ring in the era of personalized and individualized healthcare. PMID:28744087

  15. GLEAM v3: updated land evaporation and root-zone soil moisture datasets

    NASA Astrophysics Data System (ADS)

    Martens, Brecht; Miralles, Diego; Lievens, Hans; van der Schalie, Robin; de Jeu, Richard; Fernández-Prieto, Diego; Verhoest, Niko

    2016-04-01

    Evaporation determines the availability of surface water resources and the requirements for irrigation. In addition, through its impacts on the water, carbon and energy budgets, evaporation influences the occurrence of rainfall and the dynamics of air temperature. Therefore, reliable estimates of this flux at regional to global scales are of major importance for water management and meteorological forecasting of extreme events. However, the global-scale magnitude and variability of the flux, and the sensitivity of the underlying physical process to changes in environmental factors, are still poorly understood due to the limited global coverage of in situ measurements. Remote sensing techniques can help to overcome the lack of ground data. However, evaporation is not directly observable from satellite systems. As a result, recent efforts have focussed on combining the observable drivers of evaporation within process-based models. The Global Land Evaporation Amsterdam Model (GLEAM, www.gleam.eu) estimates terrestrial evaporation based on daily satellite observations of meteorological drivers of terrestrial evaporation, vegetation characteristics and soil moisture. Since the publication of the first version of the model in 2011, GLEAM has been widely applied for the study of trends in the water cycle, interactions between land and atmosphere and hydrometeorological extreme events. A third version of the GLEAM global datasets will be available from the beginning of 2016 and will be distributed using www.gleam.eu as gateway. The updated datasets include separate estimates for the different components of the evaporative flux (i.e. transpiration, bare-soil evaporation, interception loss, open-water evaporation and snow sublimation), as well as variables like the evaporative stress, potential evaporation, root-zone soil moisture and surface soil moisture. A new dataset using SMOS-based input data of surface soil moisture and vegetation optical depth will also be distributed. The most important updates in GLEAM include the revision of the soil moisture data assimilation system, the evaporative stress functions and the infiltration of rainfall. In this presentation, we will highlight the changes of the methodology and present the new datasets, their validation against in situ observations and the comparisons against alternative datasets of terrestrial evaporation, such as GLDAS-Noah, ERA-Interim and previous GLEAM datasets. Preliminary results indicate that the magnitude and the spatio-temporal variability of the evaporation estimates have been slightly improved upon previous versions of the datasets.

  16. Creating a comprehensive quality-controlled dataset of severe weather occurrence in Europe

    NASA Astrophysics Data System (ADS)

    Groenemeijer, P.; Kühne, T.; Liang, Z.; Holzer, A.; Feuerstein, B.; Dotzek, N.

    2010-09-01

    Ground-truth, quality-controlled data on severe weather occurrence are required for meaningful research on severe weather hazards. Such data are collected by the observation networks of several authorities in Europe, most prominently the National Hydrometeorological Institutes (NHMS). However, some events challenge the capabilities of such conventional networks by their isolated and short-lived nature. These rare and very localized but extreme events include thunderstorm wind gusts, large hail and tornadoes, and are poorly resolved by synoptic observations. Moreover, their detection by remote-sensing techniques such as radar and satellites is in development and has proven to be difficult. The fact that all across Europe there are many people with a special personal or professional interest in such events, who are typically organized in associations, allows a different strategy to be pursued. Data delivered to the European Severe Weather Database are recorded and quality controlled by ESSL and a large number of partners, including the Hydrometeorological Institutes of Germany, Finland, Austria, Italy and Bulgaria. Additionally, nine associations of storm spotters and centres of expertise in these and other countries are involved. The two categories of organizations (NHMSes/other) each have different privileges in the quality control procedure, which involves assigning a quality level of QC0+ (plausibility checked), QC1 (confirmed by reliable sources) or QC2 (verified) to each of the reports. Within the EWENT project funded by the EU 7th framework programme and the RegioExakt project funded by the German Ministry of Education and Research, and with support from the German Weather Service (DWD), several enhancements of the ESWD have been and will be carried out. Completed enhancements include the creation of an interface that allows partner organizations to upload data automatically, in the case of our German partner "Skywarn Germany" in near-real time. Moreover, the database's web interface has been translated into 14 European languages. At the time of writing, a nowcast mode for the web interface, which renders the ESWD a convenient tool for meteorologists in forecast centres, is being developed. In the near future, within the EWENT project, an extension of the dataset to several other isolated but extreme events, including avalanches, landslides, heavy snowfall and extremely powerful lightning flashes, is foreseen. The resulting ESWD dataset, which grows at a rate of 4000-5000 events per year, is used for a wide range of purposes, including the validation of remote-sensing techniques, forecast verification studies, projections of the future severe storm climate, and risk assessments. Its users include scientists working for EUMETSAT, NASA, NSSL, DLR, and several reinsurance companies.

  17. The Extreme Climate Index: a novel and multi-hazard index for extreme weather events.

    NASA Astrophysics Data System (ADS)

    Cucchi, Marco; Petitta, Marcello; Calmanti, Sandro

    2017-04-01

    In this presentation we introduce the Extreme Climate Index (ECI): an objective, multi-hazard index capable of tracking changes in the frequency or magnitude of extreme weather events in African countries, thus indicating that a shift to a new climate regime is underway in a particular area. This index has been developed in the context of the XCF (eXtreme Climate Facilities) project led by ARC (African Risk Capacity, a specialised agency of the African Union), and will be used in the payout-triggering mechanism of an insurance programme against risks related to increases in the frequency and magnitude of extreme weather events due to changes in climate regimes. The main hazards covered by the ECI will be extreme dry, wet and heat events, with the possibility of adding region-specific risk events such as tropical cyclones for the most vulnerable areas. It will be based on data coming from consistent, sufficiently long, high-quality historical records and will be standardized across broad geographical regions, so that extreme events occurring under different climatic regimes in Africa can be compared. The first step in constructing such an index is to define single-hazard indicators. In this first study we focused on extreme dry/wet and heat events, described respectively by the well-known SPI (Standardized Precipitation Index) and by an index we developed, the SHI (Standardized Heat-waves Index). The second step consists of developing a computational strategy to combine these, and possibly other, indices, so that the ECI can describe different types of climatic extremes by means of a single indicator. According to the methodology proposed here, the ECI is defined by two statistical components: the ECI intensity, which indicates whether an event is extreme or not, and the angular component, which represents the contribution of each hazard to the overall intensity of the index. The ECI can thus be used to identify extremes after defining a suitable threshold above which events are regarded as extreme. In this presentation, after describing the methodology used to construct the ECI, we present results obtained for different African regions, using the NCEP Reanalysis dataset for air temperature at the sig995 level and the CHIRP dataset for precipitation. Particular attention will be devoted to the 2015/2016 Malawi drought, which received media attention due to the failure of the risk assessment model used to trigger due payouts: it will be shown how, on the contrary, the combination of hydrological and temperature data used in the ECI succeeds in evaluating the extremeness of this event.
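
    A compact illustration of the two-component construction described above, assuming just two standardized hazards (SPI and SHI) and a percentile threshold: the intensity is the vector magnitude of the single-hazard indices, and the angular component records which hazard dominates. The combination rule and threshold here are simplified stand-ins for the authors' methodology.

    ```python
    import numpy as np

    # Standardized single-hazard indices (stand-ins): SPI (wet/dry) and SHI (heat)
    rng = np.random.default_rng(0)
    spi = rng.standard_normal(480)    # 40 years of monthly values
    shi = rng.standard_normal(480)

    # ECI-style decomposition: a magnitude ("is the event extreme?") and an angle
    # (which hazard contributes most). Thresholding the magnitude flags extremes.
    intensity = np.hypot(spi, shi)
    angle = np.arctan2(shi, spi)                 # heat vs. wet/dry contribution
    threshold = np.percentile(intensity, 95)     # illustrative extremeness threshold
    extremes = intensity > threshold
    print(f"{extremes.sum()} months flagged; mean |angle| = {np.abs(angle[extremes]).mean():.2f} rad")
    ```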

  18. Object recognition using deep convolutional neural networks with complete transfer and partial frozen layers

    NASA Astrophysics Data System (ADS)

    Kruithof, Maarten C.; Bouma, Henri; Fischer, Noëlle M.; Schutte, Klamer

    2016-10-01

    Object recognition is important for understanding the content of video and allowing flexible querying across a large number of cameras, especially in security applications. Recent benchmarks show that deep convolutional neural networks are excellent approaches for object recognition. This paper describes an approach to domain transfer, where features learned from a large annotated dataset are transferred to a target domain where fewer annotated examples are available, as is typical of the security and defense domain. Many of these networks trained on natural images appear to learn features similar to Gabor filters and color blobs in the first layer. These first-layer features appear to be generic across many datasets and tasks, while the last layer is task-specific. In this paper, we study the effect of copying all layers and fine-tuning a variable number of them. We performed an experiment with a Caffe-based network on 1000 ImageNet classes that were randomly divided into two equal subgroups for the transfer from one to the other. We copy all layers and vary the number of layers that are fine-tuned and the size of the target dataset. We performed additional experiments with the Keras platform on the CIFAR-10 dataset to validate general applicability. We show, with both platforms and both datasets, that the accuracy on the target dataset improves when more target data is used. When the target dataset is large, it is beneficial to freeze only a few layers. For a large target dataset, the network without transfer learning performs better than the transfer network, especially if many layers are frozen. When the target dataset is small, it is beneficial to transfer (and freeze) many layers. For a small target dataset, the transfer network boosts generalization and performs much better than the network without transfer learning. Learning time can be reduced by freezing many layers of the network.
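
    Since the abstract reports validation experiments on the Keras platform, a sketch of the copy-all-layers, freeze-the-first-n strategy is shown below; the VGG16 backbone, the layer count, and the class count are placeholders, not the paper's actual configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_transfer_model(base_model, n_frozen, n_classes):
    """Copy all layers from a pretrained network, freeze the first
    n_frozen of them, and fine-tune the rest on the target domain."""
    for layer in base_model.layers[:n_frozen]:
        layer.trainable = False
    x = layers.GlobalAveragePooling2D()(base_model.output)
    outputs = layers.Dense(n_classes, activation="softmax")(x)
    return models.Model(base_model.input, outputs)

# Example: transfer an ImageNet-pretrained VGG16 to a 10-class target set.
base = tf.keras.applications.VGG16(include_top=False, input_shape=(32, 32, 3))
model = build_transfer_model(base, n_frozen=10, n_classes=10)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

    Varying `n_frozen` from few to many layers reproduces the trade-off the abstract describes: freeze little when target data is plentiful, freeze much when it is scarce.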

  19. The experience of linking Victorian emergency medical service trauma data

    PubMed Central

    Boyle, Malcolm J

    2008-01-01

    Background The linking of a large Emergency Medical Service (EMS) dataset with the Victorian Department of Human Services (DHS) hospital datasets and the Victorian State Trauma Outcome Registry and Monitoring (VSTORM) dataset to determine patient outcomes has not previously been undertaken in Victoria. The objective of this study was to identify the linkage rate of a large EMS trauma dataset with the Department of Human Services hospital datasets and the VSTORM dataset. Methods The linking of an EMS trauma dataset to the hospital datasets utilised deterministic and probabilistic matching. The linking of three EMS trauma datasets to the VSTORM dataset utilised deterministic, probabilistic and manual matching. Results Of the patients in the EMS dataset, 66.7% were located in the Victorian Emergency Minimum Dataset (VEMD). Of the patients defined in the VEMD as being admitted to hospital, 96% were located in the Victorian Admitted Episodes Dataset (VAED). 3.7% of patients located in the VAED could not be found in the VEMD because some hospitals do not report to the VEMD. For the EMS datasets, manual matching to VSTORM produced a 146% increase in successful links for the trauma profile dataset, a 221% increase for the mechanism-of-injury-only dataset, and a 46% increase for the sudden deterioration dataset, compared with deterministic matching. Conclusion This study has demonstrated that EMS data can be successfully linked to other health-related datasets using deterministic and probabilistic matching, with varying levels of success. The quality of EMS data needs to be improved to ensure better linkage success rates with other health-related datasets. PMID:19014622
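
    To make the distinction between the two matching strategies concrete, here is a minimal sketch; the identifying fields and agreement weights are invented for illustration and are not the study's actual linkage keys.

```python
def deterministic_match(a, b):
    """Deterministic linkage: exact agreement on every field in a fixed key set."""
    keys = ("surname", "dob", "sex", "event_date")
    return all(a[k] == b[k] for k in keys)

def probabilistic_score(a, b, weights):
    """Probabilistic linkage: sum agreement weights over fields. High scores
    are accepted as links, low scores rejected, and the middle band is
    referred for manual (clerical) review."""
    return sum(w for field, w in weights.items() if a.get(field) == b.get(field))

weights = {"surname": 5.0, "dob": 4.0, "sex": 1.0, "postcode": 2.0, "event_date": 3.0}
ems = {"surname": "SMITH", "dob": "1970-02-01", "sex": "M",
       "postcode": "3000", "event_date": "2005-06-10"}
hosp = {"surname": "SMITH", "dob": "1970-02-01", "sex": "M",
        "postcode": "3001", "event_date": "2005-06-10"}
print(deterministic_match(ems, hosp))           # True: postcode is not a match key here
print(probabilistic_score(ems, hosp, weights))  # 13.0 of a possible 15.0
```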

  20. Dryland ecosystem responses to precipitation extremes and wildfire at a long-term rainfall manipulation experiment

    NASA Astrophysics Data System (ADS)

    Brown, R. F.; Collins, S. L.

    2017-12-01

    Climate is becoming increasingly variable due to global environmental change, as evidenced by fewer but more extreme precipitation events, changes in precipitation seasonality, and longer, higher-severity droughts. These changes, combined with a rising incidence of wildfire, have the potential to strongly impact net primary production (NPP) and key biogeochemical cycles, particularly in dryland ecosystems where NPP is sequentially limited by water and nutrient availability. Here we utilize a ten-year dataset from an ongoing long-term field experiment, established in 2007, in which we experimentally altered monsoon rainfall variability to examine how our manipulations, along with naturally occurring events, affect NPP and associated biogeochemical cycles in a semi-arid grassland in central New Mexico, USA. Using long-term regional averages, we identified an extremely wet monsoon year (242.8 mm, 2013), extremely dry monsoon years (86.0 mm, 2011; 80.0 mm, 2015) and an extremely dry water year (117.0 mm, 2011). We examined how changes in precipitation variability and extreme events affected ecosystem processes and function, particularly in the context of ecosystem recovery following a 2009 wildfire. Response variables included above- and below-ground plant biomass (ANPP and BNPP) and abundance, soil nitrogen availability, and soil CO2 efflux. Mean ANPP ranged from 3.6 g m-2 in 2011 to 254.5 g m-2 in 2013, while BNPP ranged from 23.5 g m-2 in 2015 to 194.2 g m-2 in 2013, demonstrating that NPP in our semi-arid grassland is directly linked to extremes in both seasonal and annual precipitation. We also show that increased nitrogen deposition positively affects NPP in unburned grassland, but has no significant impact on NPP post-fire except during extremely wet monsoon years. While soil respiration rates reflect lower ANPP post-fire, the pattern in CO2 efflux is unchanged: efflux is greatest following large precipitation events preceded by longer drying periods. Current land surface models poorly represent dryland ecosystems, which frequently undergo extreme weather events. Our long-term experiment provides key insights into ecosystem processes and function, thereby providing capacity for model improvement, particularly in the context of future environmental change.

  1. An Evaluation of Teleconnections Over the United States in an Ensemble of AMIP Simulations with the MERRA-2 Configuration of the GEOS Atmospheric Model

    NASA Technical Reports Server (NTRS)

    Collow, Allison B. Marquardt; Mahanama, Sarith P.; Bosilovich, Michael G.; Koster, Randal D.; Schubert, Siegfried D.

    2017-01-01

    The atmospheric general circulation model that is used in NASA's Modern-Era Retrospective Analysis for Research and Applications Version 2 (MERRA-2) is evaluated with respect to the relationship between large-scale teleconnection patterns and daily temperature and precipitation over the United States (US) using a ten-member ensemble of simulations, referred to as M2AMIP. A focus is placed on four teleconnection patterns that are known to influence weather and climate in the US: the El Niño-Southern Oscillation, the Pacific Decadal Oscillation, the North Atlantic Oscillation, and the Pacific-North American pattern. The monthly and seasonal indices associated with the patterns are correlated with daily temperature and precipitation statistics, including: (i) monthly mean 2 m temperature and precipitation, (ii) the frequency of extreme temperature events at the 90th, 95th, and 99th percentiles, and (iii) the frequency and intensity of extreme precipitation events classified at the 90th, 95th, and 99th percentiles. Correlations obtained with M2AMIP data, and thus the strength of teleconnections in the free-running model, are evaluated through comparison against corresponding correlations computed from observations and from MERRA-2. Overall, the strongest teleconnections in all datasets occur during the winter and coincide with the largest agreement between the observations, MERRA-2, and M2AMIP. When M2AMIP does capture the correlation seen in observations, there is a tendency for the spatial extent to be exaggerated. The weakest agreement between the data sources, for all teleconnection patterns, is in the correlation with extreme precipitation; there are also discrepancies between the datasets in the number of days with at least 1 mm of precipitation: M2AMIP has too few days with precipitation in the Northwest and the Northern Great Plains and too many days in the Northeast. In JJA, M2AMIP has too few days with precipitation in the western two-thirds of the country and too many days with precipitation along the east coast.
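
    One of the statistics described, correlating a seasonal index with the frequency of days above the 90th percentile, can be sketched as follows; the array shapes, the synthetic data, and the percentile choice are illustrative assumptions rather than the study's configuration.

```python
import numpy as np

def extreme_day_frequency(daily_t, pct=90.0):
    """Count, per year, days exceeding the climatological percentile
    threshold. daily_t has shape (n_years, n_days_per_season)."""
    threshold = np.percentile(daily_t, pct)
    return (daily_t > threshold).sum(axis=1)

rng = np.random.default_rng(0)
daily_t = rng.normal(25.0, 4.0, size=(30, 90))  # 30 seasons of daily 2 m temperature
index = rng.standard_normal(30)                 # stand-in for a seasonal ENSO index
freq = extreme_day_frequency(daily_t)
r = np.corrcoef(index, freq)[0, 1]              # teleconnection strength for this metric
print(f"correlation between index and extreme-day frequency: {r:.2f}")
```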

  2. Wide-Open: Accelerating public data release by automating detection of overdue datasets

    PubMed Central

    Poon, Hoifung; Howe, Bill

    2017-01-01

    Open data is a vital pillar of open science and a key enabler for reproducibility, data reuse, and novel discoveries. Enforcement of open-data policies, however, largely relies on manual efforts, which invariably lag behind the increasingly automated generation of biological data. To address this problem, we developed a general approach to automatically identify datasets overdue for public release by applying text mining to identify dataset references in published articles and parse query results from repositories to determine if the datasets remain private. We demonstrate the effectiveness of this approach on 2 popular National Center for Biotechnology Information (NCBI) repositories: Gene Expression Omnibus (GEO) and Sequence Read Archive (SRA). Our Wide-Open system identified a large number of overdue datasets, which spurred administrators to respond directly by releasing 400 datasets in one week. PMID:28594819

  3. Wide-Open: Accelerating public data release by automating detection of overdue datasets.

    PubMed

    Grechkin, Maxim; Poon, Hoifung; Howe, Bill

    2017-06-01

    Open data is a vital pillar of open science and a key enabler for reproducibility, data reuse, and novel discoveries. Enforcement of open-data policies, however, largely relies on manual efforts, which invariably lag behind the increasingly automated generation of biological data. To address this problem, we developed a general approach to automatically identify datasets overdue for public release by applying text mining to identify dataset references in published articles and parse query results from repositories to determine if the datasets remain private. We demonstrate the effectiveness of this approach on 2 popular National Center for Biotechnology Information (NCBI) repositories: Gene Expression Omnibus (GEO) and Sequence Read Archive (SRA). Our Wide-Open system identified a large number of overdue datasets, which spurred administrators to respond directly by releasing 400 datasets in one week.
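
    A much-simplified sketch of the two-stage approach (text-mine accessions, then query the repository) might look like the following; the regex, the GEO-only scope, and the assumption that a zero-hit esearch result means the record is unreleased are all simplifications of the paper's actual pipeline.

```python
import re
import requests

# NCBI E-utilities search endpoint; "gds" is the GEO DataSets database.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def find_geo_accessions(text):
    """Text-mine a publication for GEO series accessions (GSE + digits)."""
    return sorted(set(re.findall(r"GSE\d+", text)))

def is_public(accession):
    """Query GEO; in this simplified check, a zero-hit search suggests the
    accession has not yet been publicly released."""
    params = {"db": "gds", "term": f"{accession}[ACCN]", "retmode": "json"}
    result = requests.get(EUTILS, params=params, timeout=30).json()
    return int(result["esearchresult"]["count"]) > 0

article = "Expression data were deposited in GEO under accession GSE12345."
for acc in find_geo_accessions(article):
    print(acc, "public" if is_public(acc) else "possibly overdue")
```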

  4. Climate Forcing Datasets for Agricultural Modeling: Merged Products for Gap-Filling and Historical Climate Series Estimation

    NASA Technical Reports Server (NTRS)

    Ruane, Alex C.; Goldberg, Richard; Chryssanthacopoulos, James

    2014-01-01

    The AgMERRA and AgCFSR climate forcing datasets provide daily, high-resolution, continuous meteorological series over the 1980-2010 period, designed for applications examining the agricultural impacts of climate variability and climate change. These datasets combine daily resolution data from retrospective analyses (the Modern-Era Retrospective Analysis for Research and Applications, MERRA, and the Climate Forecast System Reanalysis, CFSR) with in situ and remotely-sensed observational datasets for temperature, precipitation, and solar radiation, leading to substantial reductions in bias in comparison to a network of 2324 agricultural-region stations from the Hadley Integrated Surface Dataset (HadISD). Results compare favorably against the original reanalyses as well as the leading climate forcing datasets (Princeton, WFD, WFD-EI, and GRASP), and AgMERRA distinguishes itself with substantially improved representation of daily precipitation distributions and extreme events owing to its use of the MERRA-Land dataset. These datasets also peg relative humidity to the time of day of maximum temperature, allowing for more accurate representation of the diurnal cycle of near-surface moisture in agricultural models. AgMERRA and AgCFSR enable a number of ongoing investigations in the Agricultural Model Intercomparison and Improvement Project (AgMIP) and related research networks, and may be used to fill gaps in historical observations as well as to serve as a basis for the generation of future climate scenarios.

  5. Preliminary Climate Uncertainty Quantification Study on Model-Observation Test Beds at Earth Systems Grid Federation Repository

    NASA Astrophysics Data System (ADS)

    Lin, G.; Stephan, E.; Elsethagen, T.; Meng, D.; Riihimaki, L. D.; McFarlane, S. A.

    2012-12-01

    Uncertainty quantification (UQ) is the science of quantitatively characterizing and reducing uncertainties in applications; it determines how likely certain outcomes are if some aspects of the system are not exactly known. UQ studies of datasets such as atmospheric observations have greatly increased in size and complexity because they now comprise additional complex iterative steps, involve numerous simulation runs, and can include additional analytical products such as charts, reports, and visualizations to explain levels of uncertainty. These new requirements greatly expand the need for metadata support beyond the NetCDF convention and vocabulary, and as a result an additional formal data provenance ontology is required to provide a historical explanation of the origin of the dataset, including references between the explanations and the components within the dataset. This work shares a climate observation data UQ science use case and illustrates how to reduce climate observation data uncertainty and use a linked-science application called Provenance Environment (ProvEn) to enable scientific teams to publish, share, link, and discover knowledge about UQ research results. UQ results include terascale datasets that are published to an Earth System Grid Federation (ESGF) repository. Uncertainty exists in observation datasets due to sensor data processing (such as time averaging), sensor failure in extreme weather conditions, sensor manufacturing error, and so on. To reduce the uncertainty in the observation datasets, a method based on Principal Component Analysis (PCA) was proposed to recover the missing values in observation data. Several large principal components (PCs) of data with missing values are computed from the available values using an iterative method. The computed PCs can approximate the true PCs with high accuracy provided a condition on the missing values is met, and the iterative method greatly improves the computational efficiency of computing the PCs. Moreover, noise removal is performed at the same time as the missing values are computed, by using only the several largest PCs. The uncertainty quantification is done through statistical analysis of the distribution of the different PCs. To record the above UQ process, and to explain the uncertainty before and after the UQ process on the observation datasets, an additional data provenance ontology, such as ProvEn, is necessary. In this study, we demonstrate how to reduce observation data uncertainty on climate model-observation test beds and use ProvEn to record the UQ process on ESGF. ProvEn demonstrates how a scientific team conducting UQ studies can discover dataset links using its domain knowledgebase, allowing them to better understand and convey the UQ study research objectives, the experimental protocol used, the resulting dataset lineage, related analytical findings, ancillary literature citations, and the social network of scientists associated with the study. Climate scientists will not only benefit from understanding a particular dataset within a knowledge context, but also from the cross-referencing of knowledge among the numerous UQ studies stored in ESGF.
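
    The iterative gap-filling scheme described above can be sketched compactly; the rank, the iteration count, and the column-mean initialization below are assumptions, not the authors' exact settings.

```python
import numpy as np

def iterative_pca_fill(data, n_pcs=3, n_iter=50):
    """Fill missing values (NaNs) by alternating between (1) reconstructing
    the field from its leading principal components and (2) replacing the
    missing entries with that reconstruction. Retaining only the largest
    PCs also suppresses noise, as described in the abstract."""
    mask = np.isnan(data)
    filled = np.where(mask, np.nanmean(data, axis=0, keepdims=True), data)
    for _ in range(n_iter):
        mean = filled.mean(axis=0)
        anomalies = filled - mean
        u, s, vt = np.linalg.svd(anomalies, full_matrices=False)
        recon = (u[:, :n_pcs] * s[:n_pcs]) @ vt[:n_pcs] + mean
        filled[mask] = recon[mask]            # update only the missing entries
    return filled
```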

  6. Handling a Small Dataset Problem in Prediction Model by employ Artificial Data Generation Approach: A Review

    NASA Astrophysics Data System (ADS)

    Lateh, Masitah Abdul; Kamilah Muda, Azah; Yusof, Zeratul Izzah Mohd; Azilah Muda, Noor; Sanusi Azmi, Mohd

    2017-09-01

    The era of big data that has emerged over the past few years has led to large and complex data that demand faster and better decision making. However, small-dataset problems still arise in certain areas, where they make analysis and decisions hard. To build a prediction model, a large sample is required for training; a small dataset is insufficient to produce an accurate prediction model. This paper reviews artificial data generation approaches as one solution to the small dataset problem.

  7. Spatial interpolation quality assessments for soil sensor transect datasets

    USDA-ARS?s Scientific Manuscript database

    Near-ground geophysical soil sensors provide extremely valuable information for precision agriculture applications. Indeed, their readings can be used as proxies for many soil parameters. Typically, leave-one-out (loo) cross-validation (CV) of spatial interpolation of sensor data returns overly optimi...

  8. Changing monsoon and midlatitude circulation interactions over the Western Himalayas and possible links to occurrences of extreme precipitation

    NASA Astrophysics Data System (ADS)

    Priya, P.; Krishnan, R.; Mujumdar, Milind; Houze, Robert A.

    2017-10-01

    Historical rainfall records reveal that the frequency and intensity of extreme precipitation events during the summer monsoon (June-September) season have significantly risen over the Western Himalayas (WH) and the adjoining upper Indus basin since the 1950s. Using multiple datasets, the present study investigates possible coincidences between the increasing trend of precipitation extremes over the WH and changes in the background flow climatology. The findings suggest that the combined effects of a weakened southwest monsoon circulation, increased activity of transient upper-air westerly troughs over the WH region, and enhanced moisture supply by southerly winds from the Arabian Sea into the Indus basin have likely provided favorable conditions for an increased frequency of certain types of extreme precipitation events over the WH region in recent decades.

  9. The Spatial Coherence of Interannual Temperature Variations in the Antarctic Peninsula

    NASA Technical Reports Server (NTRS)

    King, John C.; Comiso, Josefino C.; Koblinsky, Chester J. (Technical Monitor)

    2002-01-01

    Over 50 years of observations from climate stations on the west coast of the Antarctic Peninsula show that this is a region of extreme interannual variability in near-surface temperatures. The region has also experienced more rapid warming than any other part of the Southern Hemisphere. In this paper we use a new dataset of satellite-derived surface temperatures to define the extent of the region of extreme variability more clearly than was possible using the sparse station data. The region in which satellite surface temperatures correlate strongly with west Peninsula station temperatures is found to be quite small and largely confined to the seas just west of the Peninsula, with a northward and eastward extension into the Scotia Sea and a southward extension onto the western slopes of Palmer Land. Correlation of Peninsula surface temperatures with surface temperatures over the rest of continental Antarctica is poor, confirming that the west Peninsula lies in a different climate regime. The analysis has been used to identify sites where ice core proxy records might be representative of variations on the west coast of the Peninsula. Of the five existing core sites examined, only one is likely to provide a representative record for the west coast.

  10. A Climate Information Platform for Copernicus (CLIPC): managing the data flood

    NASA Astrophysics Data System (ADS)

    Juckes, Martin; Swart, Rob; Bärring, Lars; Groot, Annemarie; Thysse, Peter; Som de Cerff, Wim; Costa, Luis; Lückenkötter, Johannes; Callaghan, Sarah; Bennett, Victoria

    2016-04-01

    The FP7 project "Climate Information Platform for Copernicus" (CLIPC) is developing a demonstration portal for the Copernicus Climate Change Service (C3S). The project confronts many problems associated with the huge diversity of underlying data, complex multi-layered uncertainties and extremely complex and evolving user requirements. The infrastructure is founded on a comprehensive approach to managing data and documentation, using global, domain-independent standards where possible. An extensive thesaurus of terms provides both a robust and flexible foundation for data discovery services and accessible definitions to support users. It is, of course, essential to provide information to users through an interface which reflects their expectations rather than the intricacies of abstract data models. CLIPC has reviewed user engagement activities from other collaborative European projects, conducted user polls, interviews and meetings, and is now entering an evaluation phase in which users discuss new features and options in the portal design. The CLIPC portal will provide access to raw climate science data and climate impact indicators derived from that data. The portal needs the flexibility to support access to extremely large datasets as well as providing means to manipulate data and explore complex products interactively.

  11. Protein Sequence Classification with Improved Extreme Learning Machine Algorithms

    PubMed Central

    2014-01-01

    Precisely classifying a protein sequence from a large database of biological protein sequences plays an important role in developing competitive pharmacological products. Conventional methods, which compare the unseen sequence with all identified protein sequences and return the category of the protein with the highest similarity score, are usually time-consuming. Therefore, it is urgent and necessary to build an efficient protein sequence classification system. In this paper, we study the performance of protein sequence classification using single-hidden-layer feedforward networks (SLFNs). The recent, efficient extreme learning machine (ELM) and its variants are utilized as the training algorithms. The optimally pruned ELM (OP-ELM) is first employed for protein sequence classification in this paper. To further enhance performance, an ensemble-based SLFN structure is constructed in which multiple SLFNs with the same number of hidden nodes and the same activation function are used as ensemble members. For each ensemble member, the same training algorithm is adopted. The final category index is derived using the majority voting method. Two approaches, namely the basic ELM and the OP-ELM, are adopted for the ensemble-based SLFNs. The performance is analyzed and compared with several existing methods using datasets obtained from the Protein Information Resource center. The experimental results show the superiority of the proposed algorithms. PMID:24795876
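
    For orientation, a minimal basic-ELM classifier looks like the sketch below: hidden-layer weights are random and fixed, and only the output weights are solved in closed form with the pseudo-inverse. The pruning step that distinguishes OP-ELM, and the paper's protein feature encoding, are omitted.

```python
import numpy as np

class BasicELM:
    """Single-hidden-layer feedforward network trained ELM-style: random,
    fixed hidden weights; output weights learned in closed form via the
    Moore-Penrose pseudo-inverse."""

    def __init__(self, n_hidden=200, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)   # fixed nonlinear hidden layer

    def fit(self, X, y):
        T = np.eye(y.max() + 1)[y]            # one-hot targets
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        self.beta = np.linalg.pinv(self._hidden(X)) @ T
        return self

    def predict(self, X):
        return (self._hidden(X) @ self.beta).argmax(axis=1)

# An ensemble as described in the abstract: train several ELMs with
# different random seeds and take the modal (majority-vote) prediction.
X, y = np.random.randn(500, 20), np.random.randint(0, 3, 500)
votes = np.stack([BasicELM(seed=k).fit(X, y).predict(X) for k in range(5)])
majority = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
```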

  12. InSAR constraints on soil moisture evolution after the March 2015 extreme precipitation event in Chile.

    PubMed

    Scott, C P; Lohman, R B; Jordan, T E

    2017-07-07

    Constraints on soil moisture can guide agricultural practices, act as input into weather, flooding and climate models, and inform water resource policies. Space-based interferometric synthetic aperture radar (InSAR) observations provide near-global coverage, even in the presence of clouds, of proxies for soil moisture derived from the amplitude and phase content of radar imagery. We describe results from a 1.5-year-long InSAR time series spanning the March 2015 extreme precipitation event in the hyperarid Atacama desert of Chile, constraining the immediate increase in soil moisture and the drying out over the following months, as well as the response to a later, smaller precipitation event. The inferred temporal evolution of soil moisture is remarkably consistent between independent, overlapping SAR tracks covering a region ~100 km in extent. The unusually large rain event, combined with the extensive spatial and temporal coverage of the SAR dataset, presents an unprecedented opportunity to image the time evolution of soil characteristics over different surface types. Constraints on the timescale of shallow water storage after precipitation events are increasingly valuable as global water resources continue to be stretched to their limits and communities continue to develop in flood-prone areas.

  13. A Two-Stream Deep Fusion Framework for High-Resolution Aerial Scene Classification

    PubMed Central

    Liu, Fuxian

    2018-01-01

    One of the challenging problems in understanding high-resolution remote sensing images is aerial scene classification. A well-designed feature representation method and classifier can improve classification accuracy. In this paper, we construct a new two-stream deep architecture for aerial scene classification. First, we use two pretrained convolutional neural networks (CNNs) as feature extractors to learn deep features from the original aerial image and from the aerial image processed through saliency detection, respectively. Second, two feature fusion strategies are adopted to fuse the two different types of deep convolutional features extracted by the original RGB stream and the saliency stream. Finally, we use an extreme learning machine (ELM) classifier for final classification with the fused features. The effectiveness of the proposed architecture is tested on four challenging datasets: the UC-Merced dataset with 21 scene categories, the WHU-RS dataset with 19 scene categories, the AID dataset with 30 scene categories, and the NWPU-RESISC45 dataset with 45 challenging scene categories. The experimental results demonstrate that our architecture achieves a significant classification accuracy improvement over all state-of-the-art references. PMID:29581722

  14. A Two-Stream Deep Fusion Framework for High-Resolution Aerial Scene Classification.

    PubMed

    Yu, Yunlong; Liu, Fuxian

    2018-01-01

    One of the challenging problems in understanding high-resolution remote sensing images is aerial scene classification. A well-designed feature representation method and classifier can improve classification accuracy. In this paper, we construct a new two-stream deep architecture for aerial scene classification. First, we use two pretrained convolutional neural networks (CNNs) as feature extractors to learn deep features from the original aerial image and from the aerial image processed through saliency detection, respectively. Second, two feature fusion strategies are adopted to fuse the two different types of deep convolutional features extracted by the original RGB stream and the saliency stream. Finally, we use an extreme learning machine (ELM) classifier for final classification with the fused features. The effectiveness of the proposed architecture is tested on four challenging datasets: the UC-Merced dataset with 21 scene categories, the WHU-RS dataset with 19 scene categories, the AID dataset with 30 scene categories, and the NWPU-RESISC45 dataset with 45 challenging scene categories. The experimental results demonstrate that our architecture achieves a significant classification accuracy improvement over all state-of-the-art references.
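
    A sketch of the concatenation variant of the fusion step is shown below; the VGG16 backbone and average pooling stand in for whichever pretrained CNNs the authors used, the saliency detection step itself is omitted, and the random batches are placeholders for real imagery.

```python
import numpy as np
import tensorflow as tf

# One pretrained CNN acts as a fixed feature extractor for each stream.
backbone = tf.keras.applications.VGG16(include_top=False, pooling="avg",
                                       input_shape=(224, 224, 3))

def deep_features(images):
    """Extract one pooled deep-feature vector per image in a batch."""
    x = tf.keras.applications.vgg16.preprocess_input(images.astype("float32"))
    return backbone.predict(x, verbose=0)

rgb_batch = np.random.randint(0, 255, (4, 224, 224, 3))    # original RGB stream
sal_batch = np.random.randint(0, 255, (4, 224, 224, 3))    # saliency stream
fused = np.concatenate([deep_features(rgb_batch),
                        deep_features(sal_batch)], axis=1)  # fusion by concatenation
# `fused` (here 4 x 1024) would then be fed to the ELM classifier.
```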

  15. Evaluation of Statistical Downscaling Skill at Reproducing Extreme Events

    NASA Astrophysics Data System (ADS)

    McGinnis, S. A.; Tye, M. R.; Nychka, D. W.; Mearns, L. O.

    2015-12-01

    Climate model outputs usually have much coarser spatial resolution than is needed by impacts models. Although higher resolution can be achieved using regional climate models for dynamical downscaling, further downscaling is often required. The final resolution gap is often closed with a combination of spatial interpolation and bias correction, which constitutes a form of statistical downscaling. We use this technique to downscale regional climate model data and evaluate its skill in reproducing extreme events. We downscale output from the North American Regional Climate Change Assessment Program (NARCCAP) dataset from its native 50-km spatial resolution to the 4-km resolution of the University of Idaho's METDATA gridded surface meteorological dataset, which derives from the PRISM and NLDAS-2 observational datasets. We operate on the major variables used in impacts analysis at a daily timescale: daily minimum and maximum temperature, precipitation, humidity, pressure, solar radiation, and winds. To interpolate the data, we use the patch recovery method from the Earth System Modeling Framework (ESMF) regridding package. We then bias correct the data using Kernel Density Distribution Mapping (KDDM), which has been shown to exhibit superior overall performance across multiple metrics. Finally, we evaluate the skill of this technique in reproducing extreme events by comparing raw and downscaled output with meteorological station data in different bioclimatic regions, according to the skill scores defined by Perkins et al. in 2013 for the evaluation of AR4 climate models. We also investigate techniques for improving bias correction of values in the tails of the distributions, including binned kernel density estimation, logspline kernel density estimation, and transfer functions constructed by fitting the tails with a generalized Pareto distribution.
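
    KDDM maps between kernel density estimates of the model and observed distributions; the sketch below shows the same distribution-mapping idea in its simplest empirical (quantile-mapping) form, with synthetic gamma-distributed data standing in for real series.

```python
import numpy as np

def quantile_map(model_hist, obs, model_fut):
    """Empirical quantile mapping: locate each future model value's quantile
    in the historical model distribution, then replace it with the observed
    value at that quantile. KDDM refines this with kernel density estimates."""
    quantiles = np.interp(model_fut,
                          np.sort(model_hist),
                          np.linspace(0.0, 1.0, len(model_hist)))
    return np.quantile(obs, quantiles)

rng = np.random.default_rng(1)
obs = rng.gamma(2.0, 5.0, 5000)           # station "observations"
model_hist = rng.gamma(2.0, 7.0, 5000)    # biased model, historical period
model_fut = rng.gamma(2.2, 7.0, 5000)     # biased model, future period
corrected = quantile_map(model_hist, obs, model_fut)
```

    The tail-fitting techniques mentioned above address the weakness of this empirical form: quantiles beyond the range of the calibration sample are poorly constrained.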

  16. geoknife: Reproducible web-processing of large gridded datasets

    USGS Publications Warehouse

    Read, Jordan S.; Walker, Jordan I.; Appling, Alison P.; Blodgett, David L.; Read, Emily K.; Winslow, Luke A.

    2016-01-01

    Geoprocessing of large gridded data according to overlap with irregular landscape features is common to many large-scale ecological analyses. The geoknife R package was created to facilitate reproducible analyses of gridded datasets found on the U.S. Geological Survey Geo Data Portal web application or elsewhere, using a web-enabled workflow that eliminates the need to download and store large datasets that are reliably hosted on the Internet. The package provides access to several data subset and summarization algorithms that are available on remote web processing servers. Outputs from geoknife include spatial and temporal data subsets, spatially-averaged time series values filtered by user-specified areas of interest, and categorical coverage fractions for various land-use types.

  17. A high-resolution European dataset for hydrologic modeling

    NASA Astrophysics Data System (ADS)

    Ntegeka, Victor; Salamon, Peter; Gomes, Goncalo; Sint, Hadewij; Lorini, Valerio; Thielen, Jutta

    2013-04-01

    There is an increasing demand for large-scale hydrological models, not only for modeling the impact of climate change on water resources but also for disaster risk assessments and flood or drought early warning systems. These large-scale models need to be calibrated and verified against large amounts of observations in order to judge their capability to predict the future. However, the creation of large-scale datasets is challenging, for it requires the collection, harmonization, and quality checking of large amounts of observations. For this reason, only a limited number of such datasets exist. In this work, we present a pan-European, high-resolution gridded dataset of meteorological observations (EFAS-Meteo) designed to drive a large-scale hydrological model. Similar European and global gridded datasets already exist, such as the HadGHCND (Caesar et al., 2006), the JRC MARS-STAT database (van der Goot and Orlandi, 2003) and the E-OBS gridded dataset (Haylock et al., 2008). However, none of those provide a similarly high spatial resolution and/or a complete set of variables to force a hydrologic model. EFAS-Meteo contains daily maps of precipitation, surface temperature (mean, minimum and maximum), wind speed and vapour pressure at a spatial grid resolution of 5 x 5 km for the period 1 January 1990 - 31 December 2011. It furthermore contains radiation, calculated with a staggered approach depending on the availability of sunshine duration, cloud cover and minimum and maximum temperature, as well as evapotranspiration (potential, bare-soil and open-water evapotranspiration). The potential evapotranspiration was calculated using the Penman-Monteith equation with the above-mentioned meteorological variables. The dataset was created as part of the development of the European Flood Awareness System (EFAS) and has been continuously updated throughout the last years. The dataset variables are used as inputs to the hydrological calibration and validation of EFAS, as well as for establishing long-term discharge "proxy" climatologies, which can in turn be used in statistical analyses to derive return periods or other time-series derivatives. In addition, this dataset will be used to assess climatological trends in Europe. Unfortunately, to date no baseline dataset at the European scale exists against which to test the quality of the data presented here, so a comparison against other existing datasets can only be an indication of data quality. Owing to data availability, the comparison was made for precipitation and temperature only, arguably the most important meteorological drivers of hydrologic models. A variety of analyses was undertaken at the country scale against data reported to EUROSTAT and the E-OBS dataset. The comparison revealed that while the datasets showed overall similar temporal and spatial patterns, there were some differences in magnitude, especially for precipitation. It is not straightforward to pinpoint the specific cause of these differences; in most cases, however, the comparatively low observation station density appears to be the principal reason for the differences in magnitude.
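
    The abstract does not state which formulation was applied; for orientation, the widely used FAO-56 form of the Penman-Monteith reference evapotranspiration equation combines exactly the variables listed above:

```latex
ET_0 = \frac{0.408\,\Delta\,(R_n - G) + \gamma\,\dfrac{900}{T + 273}\,u_2\,(e_s - e_a)}
            {\Delta + \gamma\,(1 + 0.34\,u_2)}
```

    where Δ is the slope of the saturation vapour-pressure curve, R_n the net radiation, G the soil heat flux, γ the psychrometric constant, T the mean daily air temperature at 2 m, u_2 the wind speed at 2 m, and e_s - e_a the vapour-pressure deficit.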

  18. Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments.

    PubMed

    Ionescu, Catalin; Papava, Dragos; Olaru, Vlad; Sminchisescu, Cristian

    2014-07-01

    We introduce a new dataset, Human3.6M, of 3.6 Million accurate 3D Human poses, acquired by recording the performance of 5 female and 6 male subjects, under 4 different viewpoints, for training realistic human sensing systems and for evaluating the next generation of human pose estimation models and algorithms. Besides increasing the size of the datasets in the current state-of-the-art by several orders of magnitude, we also aim to complement such datasets with a diverse set of motions and poses encountered as part of typical human activities (taking photos, talking on the phone, posing, greeting, eating, etc.), with additional synchronized image, human motion capture, and time of flight (depth) data, and with accurate 3D body scans of all the subject actors involved. We also provide controlled mixed reality evaluation scenarios where 3D human models are animated using motion capture and inserted using correct 3D geometry, in complex real environments, viewed with moving cameras, and under occlusion. Finally, we provide a set of large-scale statistical models and detailed evaluation baselines for the dataset illustrating its diversity and the scope for improvement by future work in the research community. Our experiments show that our best large-scale model can leverage our full training set to obtain a 20% improvement in performance compared to a training set of the scale of the largest existing public dataset for this problem. Yet the potential for improvement by leveraging higher capacity, more complex models with our large dataset, is substantially vaster and should stimulate future research. The dataset together with code for the associated large-scale learning models, features, visualization tools, as well as the evaluation server, is available online at http://vision.imar.ro/human3.6m.

  19. Spatial distribution of precipitation extremes in Norway

    NASA Astrophysics Data System (ADS)

    Verpe Dyrrdal, Anita; Skaugen, Thomas; Lenkoski, Alex; Thorarinsdottir, Thordis; Stordal, Frode; Førland, Eirik J.

    2015-04-01

    Estimates of extreme precipitation, in terms of return levels, are crucial in the planning and design of important infrastructure. Through two separate studies, we have examined the levels and spatial distribution of daily extreme precipitation over catchments in Norway, and of hourly extreme precipitation at a point. The analyses were carried out through the development of two new methods for estimating extreme precipitation in Norway. For daily precipitation we fit the Generalized Extreme Value (GEV) distribution to areal time series from a gridded dataset consisting of daily precipitation from 1957 to the present at a resolution of 1x1 km². This grid-based method is more objective and less manual and time-consuming than the existing method at MET Norway. In addition, estimates in ungauged catchments are easier to obtain, and the GEV approach includes a measure of uncertainty, which is a requirement in climate studies today. Further, we go into depth on the debated GEV shape parameter, which plays an important role for longer return periods. We show that it varies according to the dominating precipitation types, having positive values in the southeast and negative values in the southwest. We also find indications that the degree of orographic enhancement might affect the shape parameter. For hourly precipitation, we estimate return levels on a 1x1 km² grid by linking GEV distributions with latent Gaussian fields in a Bayesian hierarchical model (BHM). Generalized linear models on the GEV parameters, estimated from observations, are able to incorporate location-specific geographic and meteorological information and thereby accommodate these effects on extreme precipitation. Gaussian fields capture additional unexplained spatial heterogeneity and overcome the sparse grid on which observations are collected, while a Bayesian model averaging component directly assesses model uncertainty. We find that mean summer precipitation, mean summer temperature, latitude, longitude, mean annual precipitation and elevation are good covariate candidates for hourly precipitation in our model. Summer indices succeed because hourly precipitation extremes often occur during the convective season. The spatial distributions of hourly and daily precipitation extremes differ in Norway. Daily precipitation extremes are largest along the southwestern coast, where large-scale frontal systems dominate during the fall season and the mountain ridge generates strong orographic enhancement. The largest hourly precipitation extremes are mostly produced by intense convective showers during summer, and are thus found along the entire southern coast, including the Oslo region.
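
    Fitting a GEV to block maxima and reading off return levels is a short exercise with standard tools; the sketch below uses synthetic annual maxima in place of the gridded areal series and notes SciPy's sign convention for the shape parameter.

```python
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(2)
annual_max = rng.gumbel(40.0, 10.0, 60)   # stand-in for annual max daily precip (mm)

# Fit the GEV. SciPy's shape c corresponds to -xi in the climate convention,
# so c < 0 here means a heavy upper tail (positive xi).
c, loc, scale = genextreme.fit(annual_max)

# The T-year return level is the (1 - 1/T) quantile of the fitted distribution.
for T in (10, 50, 100):
    level = genextreme.ppf(1 - 1 / T, c, loc=loc, scale=scale)
    print(f"{T}-year return level: {level:.1f} mm")
```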

  20. Large-scale imputation of epigenomic datasets for systematic annotation of diverse human tissues.

    PubMed

    Ernst, Jason; Kellis, Manolis

    2015-04-01

    With hundreds of epigenomic maps, the opportunity arises to exploit the correlated nature of epigenetic signals, across both marks and samples, for large-scale prediction of additional datasets. Here, we undertake epigenome imputation by leveraging such correlations through an ensemble of regression trees. We impute 4,315 high-resolution signal maps, of which 26% are also experimentally observed. Imputed signal tracks show overall similarity to observed signals and surpass experimental datasets in consistency, recovery of gene annotations and enrichment for disease-associated variants. We use the imputed data to detect low-quality experimental datasets, to find genomic sites with unexpected epigenomic signals, to define high-priority marks for new experiments and to delineate chromatin states in 127 reference epigenomes spanning diverse tissues and cell types. Our imputed datasets provide the most comprehensive human regulatory region annotation to date, and our approach and the ChromImpute software constitute a useful complement to large-scale experimental mapping of epigenomic information.
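
    The core imputation idea, predicting a held-out mark from the marks that were observed using an ensemble of regression trees, can be sketched as follows; scikit-learn's random forest stands in for the ChromImpute implementation, and the simple other-marks feature set is an assumption (the actual features are richer, position-specific transforms).

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)
n_bins, n_marks = 5000, 10
signals = rng.lognormal(size=(n_bins, n_marks))   # stand-in epigenomic signal tracks
target_mark = 0                                   # the mark to impute

X = np.delete(signals, target_mark, axis=1)       # predictors: all other marks
y = signals[:, target_mark]

# Ensemble of regression trees exploits correlations across marks.
model = RandomForestRegressor(n_estimators=100, n_jobs=-1, random_state=0)
model.fit(X[:4000], y[:4000])                     # train on part of the genome
imputed = model.predict(X[4000:])                 # impute the held-out bins
```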

  1. An interactive web application for the dissemination of human systems immunology data.

    PubMed

    Speake, Cate; Presnell, Scott; Domico, Kelly; Zeitner, Brad; Bjork, Anna; Anderson, David; Mason, Michael J; Whalen, Elizabeth; Vargas, Olivia; Popov, Dimitry; Rinchai, Darawan; Jourde-Chiche, Noemie; Chiche, Laurent; Quinn, Charlie; Chaussabel, Damien

    2015-06-19

    Systems immunology approaches have proven invaluable in translational research settings. The current rate at which large-scale datasets are generated presents unique challenges and opportunities. Mining aggregates of these datasets could accelerate the pace of discovery, but new solutions are needed to integrate the heterogeneous data types with the contextual information that is necessary for interpretation. In addition, enabling tools and technologies facilitating investigators' interaction with large-scale datasets must be developed in order to promote insight and foster knowledge discovery. State of the art application programming was employed to develop an interactive web application for browsing and visualizing large and complex datasets. A collection of human immune transcriptome datasets were loaded alongside contextual information about the samples. We provide a resource enabling interactive query and navigation of transcriptome datasets relevant to human immunology research. Detailed information about studies and samples are displayed dynamically; if desired the associated data can be downloaded. Custom interactive visualizations of the data can be shared via email or social media. This application can be used to browse context-rich systems-scale data within and across systems immunology studies. This resource is publicly available online at [Gene Expression Browser Landing Page ( https://gxb.benaroyaresearch.org/dm3/landing.gsp )]. The source code is also available openly [Gene Expression Browser Source Code ( https://github.com/BenaroyaResearch/gxbrowser )]. We have developed a data browsing and visualization application capable of navigating increasingly large and complex datasets generated in the context of immunological studies. This intuitive tool ensures that, whether taken individually or as a whole, such datasets generated at great effort and expense remain interpretable and a ready source of insight for years to come.

  2. The Greenwich Photo-heliographic Results (1874 - 1976): Summary of the Observations, Applications, Datasets, Definitions and Errors

    NASA Astrophysics Data System (ADS)

    Willis, D. M.; Coffey, H. E.; Henwood, R.; Erwin, E. H.; Hoyt, D. V.; Wild, M. N.; Denig, W. F.

    2013-11-01

    The measurements of sunspot positions and areas that were published initially by the Royal Observatory, Greenwich, and subsequently by the Royal Greenwich Observatory (RGO), as the Greenwich Photo-heliographic Results ( GPR), 1874 - 1976, exist in both printed and digital forms. These printed and digital sunspot datasets have been archived in various libraries and data centres. Unfortunately, however, typographic, systematic and isolated errors can be found in the various datasets. The purpose of the present paper is to begin the task of identifying and correcting these errors. In particular, the intention is to provide in one foundational paper all the necessary background information on the original solar observations, their various applications in scientific research, the format of the different digital datasets, the necessary definitions of the quantities measured, and the initial identification of errors in both the printed publications and the digital datasets. Two companion papers address the question of specific identifiable errors; namely, typographic errors in the printed publications, and both isolated and systematic errors in the digital datasets. The existence of two independently prepared digital datasets, which both contain information on sunspot positions and areas, makes it possible to outline a preliminary strategy for the development of an even more accurate digital dataset. Further work is in progress to generate an extremely reliable sunspot digital dataset, based on the programme of solar observations supported for more than a century by the Royal Observatory, Greenwich, and the Royal Greenwich Observatory. This improved dataset should be of value in many future scientific investigations.

  3. A globally calibrated scheme for generating daily meteorology from monthly statistics: Global-WGEN (GWGEN) v1.0

    NASA Astrophysics Data System (ADS)

    Sommer, Philipp S.; Kaplan, Jed O.

    2017-10-01

    While a wide range of Earth system processes occur at daily and even subdaily timescales, many global vegetation and other terrestrial dynamics models historically used monthly meteorological forcing both to reduce computational demand and because global datasets were lacking. Recently, dynamic land surface modeling has moved towards resolving daily and subdaily processes, and global datasets containing daily and subdaily meteorology have become available. These meteorological datasets, however, cover only the instrumental era of the last approximately 120 years at best, are subject to considerable uncertainty, and represent extremely large data files with associated computational costs of data input/output and file transfer. For periods before the recent past or in the future, global meteorological forcing can be provided by climate model output, but the quality of these data at high temporal resolution is low, particularly for daily precipitation frequency and amount. Here, we present GWGEN, a globally applicable statistical weather generator for the temporal downscaling of monthly climatology to daily meteorology. Our weather generator is parameterized using a global meteorological database and simulates daily values of five common variables: minimum and maximum temperature, precipitation, cloud cover, and wind speed. GWGEN is lightweight, modular, and requires a minimal set of monthly mean variables as input. The weather generator may be used in a range of applications, for example, in global vegetation, crop, soil erosion, or hydrological models. While GWGEN does not currently perform spatially autocorrelated multi-point downscaling of daily weather, this additional functionality could be implemented in future versions.
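
    Weather generators in this family typically couple a two-state Markov chain for precipitation occurrence with a gamma distribution for wet-day amounts. The sketch below shows that core structure only; the parameter values are invented, and GWGEN's cross-variable correlation structure and monthly-to-daily parameter estimation are omitted.

```python
import numpy as np

def simulate_month(n_days, p_wet_given_dry, p_wet_given_wet,
                   shape, scale, seed=0):
    """WGEN-style daily precipitation: a first-order Markov chain decides
    wet vs. dry, then a gamma draw supplies the wet-day amount (mm)."""
    rng = np.random.default_rng(seed)
    wet, precip = False, np.zeros(n_days)
    for d in range(n_days):
        p = p_wet_given_wet if wet else p_wet_given_dry
        wet = rng.random() < p
        if wet:
            precip[d] = rng.gamma(shape, scale)
    return precip

# Parameters would be derived from the monthly climatology the generator ingests.
daily = simulate_month(30, p_wet_given_dry=0.25, p_wet_given_wet=0.6,
                       shape=0.8, scale=7.0)
print(f"monthly total: {daily.sum():.1f} mm on {int((daily > 0).sum())} wet days")
```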

  4. Oceanographic variation influences spatial genomic structure in the sea scallop, Placopecten magellanicus.

    PubMed

    Van Wyngaarden, Mallory; Snelgrove, Paul V R; DiBacco, Claudio; Hamilton, Lorraine C; Rodríguez-Ezpeleta, Naiara; Zhan, Luyao; Beiko, Robert G; Bradbury, Ian R

    2018-03-01

    Environmental factors can influence diversity and population structure in marine species, and an accurate understanding of this influence can both improve fisheries management and help predict responses to environmental change. We used 7163 SNPs derived from restriction site-associated DNA sequencing, genotyped in 245 individuals of the economically important sea scallop, Placopecten magellanicus, to evaluate the correlations between oceanographic variation and a previously identified latitudinal genomic cline. Sea scallops span a broad latitudinal area (>10 degrees), and we hypothesized that climatic variation significantly drives clinal trends in allele frequency. Using a large environmental dataset, including temperature, salinity, chlorophyll a, and nutrient concentrations, we identified a suite of SNPs (285-621, depending on analysis and environmental dataset) potentially under selection through correlations with environmental variation. Principal components analysis of different outlier SNPs and environmental datasets revealed similar northern and southern clusters, with significant associations between the first axes of each (R²adj = 0.66-0.79). Multivariate redundancy analysis of outlier SNPs and the environmental principal components indicated that environmental factors explained more than 32% of the variance. Similarly, multiple linear regressions and random-forest analysis identified winter average and minimum ocean temperatures as significant parameters in the link between genetic and environmental variation. This work indicates that oceanographic variation is associated with the observed genomic cline in this species and that seasonal periods of extreme cold may restrict gene flow along a latitudinal gradient in this marine benthic bivalve. Incorporating this finding into management may improve the accuracy of management strategies and future predictions.

  5. Putative archaeal viruses from the mesopelagic ocean.

    PubMed

    Vik, Dean R; Roux, Simon; Brum, Jennifer R; Bolduc, Ben; Emerson, Joanne B; Padilla, Cory C; Stewart, Frank J; Sullivan, Matthew B

    2017-01-01

    Oceanic viruses that infect bacteria, or phages, are known to modulate host diversity, metabolisms, and biogeochemical cycling, while the viruses that infect marine Archaea remain understudied despite the critical ecosystem roles played by their hosts. Here we introduce "MArVD", for Metagenomic Archaeal Virus Detector, an annotation tool designed to identify putative archaeal virus contigs in metagenomic datasets. MArVD is made publicly available through the online iVirus analytical platform. Benchmarking analysis of MArVD showed it to be >99% accurate and 100% sensitive in identifying the 127 known archaeal viruses among the 12,499 viruses in the VirSorter curated dataset. Application of MArVD to 10 viral metagenomes from two depth profiles in the Eastern Tropical North Pacific (ETNP) oxygen minimum zone revealed 43 new putative archaeal virus genomes and large genome fragments ranging in size from 10 to 31 kb. Network-based classifications, which were consistent with marker gene phylogenies where available, suggested that these putative archaeal virus contigs represented six novel candidate genera. Ecological analyses, via fragment recruitment and ordination, revealed that the diversity and relative abundances of these putative archaeal viruses were correlated with oxygen concentration and temperature along two OMZ-spanning depth profiles, presumably due to structuring of the host Archaea community. Peak viral diversity and abundances were found in surface waters, where Thermoplasmata 16S rRNA genes are prevalent, suggesting these archaea as hosts in the surface habitats. Together these findings provide a baseline for identifying archaeal viruses in sequence datasets, and an initial picture of the ecology of such viruses in non-extreme environments.

  6. Extreme flood event analysis in Indonesia based on rainfall intensity and recharge capacity

    NASA Astrophysics Data System (ADS)

    Narulita, Ida; Ningrum, Widya

    2018-02-01

    Indonesia is highly vulnerable to flood disasters because it experiences heavy rainfall events throughout the year. Flooding is categorized as the most important hazard because it causes social, economic and human losses. The purpose of this study is to analyze extreme flood events based on satellite rainfall datasets, in order to understand the rainfall characteristics (rainfall intensity, rainfall pattern, etc.) preceding flood disasters in areas of monsoonal, equatorial and local rainfall types. Recharge capacity is analyzed using land cover and soil distribution. The data used in this study are the CHIRPS satellite rainfall data at 0.05° spatial resolution and daily temporal resolution, the GSMaP satellite rainfall dataset operated by JAXA at 1-hour temporal resolution and 0.1° spatial resolution, and land use and soil distribution maps for the recharge capacity analysis. The rainfall characteristics before flooding and the recharge capacity analysis are expected to provide important information for flood mitigation in Indonesia.

  7. The Regional Hydrologic Extremes Assessment System: A software framework for hydrologic modeling and data assimilation

    PubMed Central

    Das, Narendra; Stampoulis, Dimitrios; Ines, Amor; Fisher, Joshua B.; Granger, Stephanie; Kawata, Jessie; Han, Eunjin; Behrangi, Ali

    2017-01-01

    The Regional Hydrologic Extremes Assessment System (RHEAS) is a prototype software framework for hydrologic modeling and data assimilation that automates the deployment of water resources nowcasting and forecasting applications. A spatially-enabled database is a key component of the software that can ingest a suite of satellite and model datasets while facilitating the interfacing with Geographic Information System (GIS) applications. The datasets ingested are obtained from numerous space-borne sensors and represent multiple components of the water cycle. The object-oriented design of the software allows for modularity and extensibility, showcased here with the coupling of the core hydrologic model with a crop growth model. RHEAS can exploit multi-threading to scale with increasing number of processors, while the database allows delivery of data products and associated uncertainty through a variety of GIS platforms. A set of three example implementations of RHEAS in the United States and Kenya are described to demonstrate the different features of the system in real-world applications. PMID:28545077

  8. The Regional Hydrologic Extremes Assessment System: A software framework for hydrologic modeling and data assimilation.

    PubMed

    Andreadis, Konstantinos M; Das, Narendra; Stampoulis, Dimitrios; Ines, Amor; Fisher, Joshua B; Granger, Stephanie; Kawata, Jessie; Han, Eunjin; Behrangi, Ali

    2017-01-01

    The Regional Hydrologic Extremes Assessment System (RHEAS) is a prototype software framework for hydrologic modeling and data assimilation that automates the deployment of water resources nowcasting and forecasting applications. A spatially-enabled database is a key component of the software that can ingest a suite of satellite and model datasets while facilitating the interfacing with Geographic Information System (GIS) applications. The datasets ingested are obtained from numerous space-borne sensors and represent multiple components of the water cycle. The object-oriented design of the software allows for modularity and extensibility, showcased here with the coupling of the core hydrologic model with a crop growth model. RHEAS can exploit multi-threading to scale with increasing number of processors, while the database allows delivery of data products and associated uncertainty through a variety of GIS platforms. A set of three example implementations of RHEAS in the United States and Kenya are described to demonstrate the different features of the system in real-world applications.

  9. The use of large scale datasets for understanding traffic network state.

    DOT National Transportation Integrated Search

    2013-09-01

    The goal of this proposal is to develop novel modeling techniques to infer individual activity patterns from large-scale cell phone datasets and taxi data from NYC. As such this research offers a paradigm shift from traditional transportation m...

  10. Food Security and Extreme Events: Evidence from Smallholder Farmers in Central America

    NASA Astrophysics Data System (ADS)

    Saborio-Rodriguez, M.; Alpizar, F.; Harvey, C.; Martinez, R.; Vignola, R.; Viguera, B.; Capitan, T.

    2016-12-01

    Extreme weather events, which are expected to increase in magnitude and frequency due to climate change, are one of the main threats to smallholder farmers in Central America. Using a rich dataset from carefully selected subsistence farm households, we explore the determinants and severity of food insecurity resulting from extreme hydrometeorological hazards. In addition, we analyze farmers' coping strategies. Our analysis sheds light on food insecurity as an expression of vulnerability in a region that is expected to be increasingly exposed to extreme events, and in a population already stressed by poverty and lack of opportunities. Regarding food insecurity, multivariate analyses indicate that education, having at least one migrant in the household, labor allocation, number of plots, and producing coffee are determinants of the probability of experiencing a lack of food after an extreme weather event. Once the household lacks food, the duration of the episode is related to access to credit, number of plots, producing coffee, ownership of land and the gender of the head of the household. These results are in line with previous literature on the determinants of food insecurity in particular, and of vulnerability in general. Our dataset also allows us to analyze coping strategies. Households experiencing a lack of food after an extreme weather event report mainly changes in their habits, such as decreasing the amount of food consumed (54%) and modifying their diet (35%). A low proportion of households (between 10% and 15%, depending on the nature of the event) draw on their assets, by redirecting their savings, migrating, or selling items from the house. Asking for money or food from family and friends, or from an organization, is reported by 4% of the households. These general results are connected to the specific coping strategies related to crop damage, which are explored in detail. Our results indicate that there are patterns among the households experiencing a lack of food after an extreme weather event. These patterns create opportunities for targeting assistance and preparing farmers in advance. The coping strategies used are precarious; there is therefore a need to rethink policies so that they effectively help farmers cope with extreme weather events through sustainable responses that reduce their vulnerability.

  11. Application of Radar-Rainfall Estimates to Probable Maximum Precipitation in the Carolinas

    NASA Astrophysics Data System (ADS)

    England, J. F.; Caldwell, R. J.; Sankovich, V.

    2011-12-01

    Extreme storm rainfall data are essential in the assessment of potential impacts on design precipitation amounts, which are used in flood design criteria for dams and nuclear power plants. Probable Maximum Precipitation (PMP) from National Weather Service Hydrometeorological Report 51 (HMR51) is currently used for design rainfall estimates in the eastern U.S. The extreme storm database associated with the report has not been updated since the early 1970s. In recent decades, several extreme precipitation events have occurred that have the potential to alter the PMP values, particularly across the Southeast United States (e.g., Hurricane Floyd 1999). Unfortunately, these and other large precipitation-producing storms have not been analyzed with the detail required for application in design studies. This study focuses on warm-season tropical cyclones (TCs) in the Carolinas, as these systems are the critical maximum rainfall mechanisms in the region. The goal is to discern if recent tropical events may have reached or exceeded current PMP values. We have analyzed 10 storms using modern datasets and methodologies that provide enhanced spatial and temporal resolution relative to point measurements used in past studies. Specifically, hourly multisensor precipitation reanalysis (MPR) data are used to estimate storm total precipitation accumulations at various durations throughout each storm event. The accumulated grids serve as input to depth-area-duration calculations. Individual storms are then maximized using back-trajectories to determine source regions for moisture. The development of open source software has made this process time- and resource-efficient. Based on the current methodology, two of the ten storms analyzed have the potential to challenge HMR51 PMP values. Maximized depth-area curves for Hurricane Floyd indicate exceedance at 24- and 72-hour durations for large area sizes, while Hurricane Fran (1996) appears to exceed PMP at large area sizes for short-duration, 6-hour storms. Utilizing new methods and data, however, requires careful consideration of the potential limitations and caveats associated with the analysis and further evaluation of the newer storms within the context of historical storms from HMR51. Here, we provide a brief background on extreme rainfall in the Carolinas, along with an overview of the methods employed for converting MPR to depth-area relationships. Discussion of the issues and limitations, evaluation of the various techniques, and comparison to HMR51 storms and PMP values are also presented.
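
    A minimal sketch of the depth-duration part of a depth-area-duration (DAD) calculation from hourly gridded rainfall of the kind described above, assuming Python with NumPy; the grids are synthetic stand-ins for MPR data, and the areal dimension would be added by averaging over windows of increasing size:

        import numpy as np

        # 72 hourly rainfall grids (mm/h) standing in for the MPR data.
        hourly = np.random.default_rng(2).gamma(shape=1.0, scale=2.0, size=(72, 100, 100))

        def max_depth(hourly, duration_h):
            """Maximum running-sum rainfall depth over a given duration, per cell."""
            zeros = np.zeros((1,) + hourly.shape[1:])
            csum = np.concatenate([zeros, np.cumsum(hourly, axis=0)], axis=0)
            windows = csum[duration_h:] - csum[:-duration_h]
            return windows.max(axis=0)

        for d in (6, 24, 72):
            print(d, "h max point depth (mm):", round(float(max_depth(hourly, d).max()), 1))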

  12. Propagation of Flood and Drought from Atmosphere down to Groundwater Based on 1983-2013 Direct Observations in Illinois

    NASA Astrophysics Data System (ADS)

    Zhang, W.; Chen, Y.

    2017-12-01

    Climate change is expected to significantly alter and intensify the global hydrologic cycle, with the severe consequence of more frequent floods and droughts. In this study, we utilize a long-term (1983-2013) hydro-climatic dataset in Illinois, collected from multiple sources, to characterize the historical occurrence of anomalously large flood and drought events. This unique 31-year dataset covers daily and monthly variables of temperature, humidity, radiation, potential evapotranspiration, atmospheric vapor convergence, precipitation, evapotranspiration, soil moisture, groundwater depth and river flow. The analysis takes the perspective of combined land-atmosphere interactions to understand the mechanisms of flood and drought occurrence due to anomalous precipitation and temperature conditions, and how these anomalies propagate through the entire hydrologic cycle from atmospheric water vapor to soil moisture, groundwater and river flow. The sensitivity of hydroclimatic anomaly propagation to climate factors (precipitation, temperature, radiation and humidity) is examined, as exemplified by historical water extremes such as the Mississippi floods of 1993 and 2008 and the Midwest droughts of 1988, 2005 and 2012. The findings from this study bear significant implications for understanding hydrologic response to a warming climate, in particular the consensus of projected increasing occurrence of future floods and droughts.

  13. Digital Object Identifiers (DOIs) usage and adoption in the U.S. Geological Survey (USGS)

    NASA Astrophysics Data System (ADS)

    Frame, M. T.; Palanisamy, G.

    2013-12-01

    Addressing grand environmental science challenges requires unprecedented access to easily understood data that cross the breadth of temporal, spatial, and thematic scales. From a scientist's perspective, the big challenges lie in discovering the relevant data, dealing with extreme data heterogeneity and large data volumes, and converting data to information and knowledge. Historical linkages between derived products (i.e., publications) and associated datasets have not existed in the earth science community. The USGS Core Science Analytics and Synthesis, in collaboration with DOE's Oak Ridge National Laboratory (ORNL) Mercury Consortium (funded by NASA, USGS and DOE), established a Digital Object Identifier (DOI) service for USGS data, metadata, and other media. This service is offered in partnership through the University of California Digital Library EZID service. USGS scientists, data managers, and other professionals can generate globally unique, persistent and resolvable identifiers for any kind of digital object. Additional efforts to assign DOIs to historical data and publications have also been underway. These DOI identifiers are being used to cite data in journal articles, web-accessible datasets, and other media for distribution, integration, and in support of improved data management practices. The session will discuss the current DOI efforts within USGS, including adoption, challenges, and future efforts necessary to improve access, reuse, sharing, and discoverability of USGS data and information.

  14. Methane Leaks from Natural Gas Systems Follow Extreme Distributions

    DOE PAGES

    Brandt, Adam R.; Heath, Garvin A.; Cooley, Daniel

    2016-10-14

    Future energy systems may rely on natural gas as a low-cost fuel to support variable renewable power. However, leaking natural gas causes climate damage because methane (CH4) has a high global warming potential. In this study, we use extreme-value theory to explore the distribution of natural gas leak sizes. By analyzing ~15,000 measurements from 18 prior studies, we show that all available natural gas leakage datasets are statistically heavy-tailed, and that gas leaks are more extremely distributed than other natural and social phenomena. A unifying result is that the largest 5% of leaks typically contribute over 50% of the total leakage volume. While prior studies used lognormal model distributions, we show that lognormal functions poorly represent tail behavior. Our results suggest that published uncertainty ranges of CH4 emissions are too narrow, and that larger sample sizes are required in future studies to achieve targeted confidence intervals. Additionally, we find that cross-study aggregation of datasets to increase sample size is not recommended due to apparent deviation between sampled populations. Finally, understanding the nature of leak distributions can improve emission estimates, better illustrate their uncertainty, allow prioritization of source categories, and improve sampling design. Also, these data can be used for more effective design of leak detection technologies.
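
    A minimal sketch of the peaks-over-threshold flavour of extreme-value analysis the study describes, assuming Python with SciPy; the synthetic Pareto sample and the 95% threshold are illustrative, not the study's measurements:

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(0)
        # Synthetic heavy-tailed "leak sizes" standing in for the field data.
        leaks = stats.pareto.rvs(b=1.2, size=15000, random_state=rng)

        # Share of total volume contributed by the largest 5% of leaks.
        srt = np.sort(leaks)
        print("top 5% share:", srt[int(0.95 * srt.size):].sum() / srt.sum())

        # Peaks-over-threshold: fit a generalized Pareto distribution to
        # exceedances over a high threshold to characterise the tail.
        u = np.quantile(leaks, 0.95)
        xi, loc, beta = stats.genpareto.fit(leaks[leaks > u] - u, floc=0.0)
        print("GPD shape xi:", xi)   # xi > 0 signals a heavy (power-law-like) tail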

  15. Methane Leaks from Natural Gas Systems Follow Extreme Distributions

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Brandt, Adam R.; Heath, Garvin A.; Cooley, Daniel

    Future energy systems may rely on natural gas as a low-cost fuel to support variable renewable power. However, leaking natural gas causes climate damage because methane (CH4) has a high global warming potential. In this study, we use extreme-value theory to explore the distribution of natural gas leak sizes. By analyzing ~15,000 measurements from 18 prior studies, we show that all available natural gas leakage datasets are statistically heavy-tailed, and that gas leaks are more extremely distributed than other natural and social phenomena. A unifying result is that the largest 5% of leaks typically contribute over 50% of the total leakage volume. While prior studies used lognormal model distributions, we show that lognormal functions poorly represent tail behavior. Our results suggest that published uncertainty ranges of CH4 emissions are too narrow, and that larger sample sizes are required in future studies to achieve targeted confidence intervals. Additionally, we find that cross-study aggregation of datasets to increase sample size is not recommended due to apparent deviation between sampled populations. Finally, understanding the nature of leak distributions can improve emission estimates, better illustrate their uncertainty, allow prioritization of source categories, and improve sampling design. Also, these data can be used for more effective design of leak detection technologies.

  16. Evaluation of exposure to lead from drinking water in large buildings.

    PubMed

    Deshommes, Elise; Andrews, Robert C; Gagnon, Graham; McCluskey, Tim; McIlwain, Brad; Doré, Evelyne; Nour, Shokoufeh; Prévost, Michèle

    2016-08-01

    Lead results from 78,971 water samples collected in four Canadian provinces from elementary schools, daycares, and other large buildings using regulatory and investigative sampling protocols were analyzed to provide lead concentration distributions. Maximum concentrations reached 13,200 and 3890 μg/L following long and short stagnation periods, respectively. High lead levels were persistent in some large buildings, reflected either by high median values across all taps or by a few specific taps in the building. Simulations using the Integrated Exposure Uptake Biokinetic (IEUBK) model and lead concentrations after 30 min of stagnation in the dataset showed that, for most buildings, exposure to lead at the tap does not increase children's blood lead levels (BLLs). However, buildings or taps with extreme concentrations represent a significant health risk to young children attending school or daycare, as the estimated BLL far exceeded the 5 μg/dL threshold. Ingestion of water from specific taps could lead to acute exposure. Finally, for a few taps, the total daily lead intake reached the former World Health Organization (WHO) tolerable level for adults, suggesting potential health risks.

  17. Using Multiple Big Datasets and Machine Learning to Produce a New Global Particulate Dataset: A Technology Challenge Case Study

    NASA Astrophysics Data System (ADS)

    Lary, D. J.

    2013-12-01

    A BigData case study is described in which multiple datasets from several satellites, high-resolution global meteorological data, social media and in-situ observations are combined using machine learning on a distributed cluster with an automated workflow. The resulting global particulate dataset is relevant to global public health studies and would not be possible to produce without the combination of multiple big datasets, in-situ data and machine learning. To greatly reduce development time and enhance functionality, a high-level language capable of parallel processing (Matlab) has been used. Key considerations for the system are high-speed access given the large data volume, persistence of the large data volumes, and a precise process-time scheduling capability.

  18. Solving the challenges of data preprocessing, uploading, archiving, retrieval, analysis and visualization for large heterogeneous paleo- and rock magnetic datasets

    NASA Astrophysics Data System (ADS)

    Minnett, R.; Koppers, A. A.; Tauxe, L.; Constable, C.; Jarboe, N. A.

    2011-12-01

    The Magnetics Information Consortium (MagIC) provides an archive for the wealth of rock- and paleomagnetic data and interpretations from studies on natural and synthetic samples. As with many fields, most peer-reviewed paleo- and rock magnetic publications only include high level results. However, access to the raw data from which these results were derived is critical for compilation studies and when updating results based on new interpretation and analysis methods. MagIC provides a detailed metadata model with places for everything from raw measurements to their interpretations. Prior to MagIC, these raw data were extremely cumbersome to collect because they mostly existed in a lab's proprietary format on investigators' personal computers or undigitized in field notebooks. MagIC has developed a suite of offline and online tools to enable the paleomagnetic, rock magnetic, and affiliated scientific communities to easily contribute both their previously published data and data supporting an article undergoing peer review, to retrieve well-annotated published interpretations and raw data, and to analyze and visualize large collections of published data online. Here we present the technology we chose (including VBA in Excel spreadsheets, Python libraries, FastCGI JSON webservices, Oracle procedures, and jQuery user interfaces) and how we implemented it in order to serve the scientific community as seamlessly as possible. These tools are now in use in labs worldwide, have helped archive many valuable legacy studies and datasets, and routinely enable new contributions to the MagIC Database (http://earthref.org/MAGIC/).

  19. Distributed File System Utilities to Manage Large Datasets, Version 0.5

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    2014-05-21

    FileUtils provides a suite of tools to manage large datasets typically created by large parallel MPI applications. They are written in C and use standard POSIX I/O calls. The current suite consists of tools to copy, compare, remove, and list. The tools provide dramatic speedup over existing Linux tools, which often run as a single process.

  20. A studyforrest extension, retinotopic mapping and localization of higher visual areas

    PubMed Central

    Sengupta, Ayan; Kaule, Falko R.; Guntupalli, J. Swaroop; Hoffmann, Michael B.; Häusler, Christian; Stadler, Jörg; Hanke, Michael

    2016-01-01

    The studyforrest (http://studyforrest.org) dataset is likely the largest neuroimaging dataset on natural language and story processing publicly available today. In this article, along with a companion publication, we present an update of this dataset that extends its scope to vision and multi-sensory research. 15 participants of the original cohort volunteered for a series of additional studies: a clinical examination of visual function, a standard retinotopic mapping procedure, and a localization of higher visual areas—such as the fusiform face area. The combination of this update, the previous data releases for the dataset, and the companion publication, which includes neuroimaging and eye tracking data from natural stimulation with a motion picture, forms an extremely versatile and comprehensive resource for brain imaging research—with almost six hours of functional neuroimaging data across five different stimulation paradigms for each participant. Furthermore, we describe the employed paradigms and present results that document the quality of the data for the purpose of characterising major properties of participants’ visual processing stream. PMID:27779618

  1. Assessment of gridded observations used for climate model validation in the Mediterranean region: the HyMeX and MED-CORDEX framework

    NASA Astrophysics Data System (ADS)

    Flaounas, Emmanouil; Drobinski, Philippe; Borga, Marco; Calvet, Jean-Christophe; Delrieu, Guy; Morin, Efrat; Tartari, Gianni; Toffolon, Roberta

    2012-06-01

    This letter assesses the quality of temperature and rainfall daily retrievals of the European Climate Assessment and Dataset (ECA&D) with respect to measurements collected locally in various parts of the Euro-Mediterranean region in the framework of the Hydrological Cycle in the Mediterranean Experiment (HyMeX), endorsed by the Global Energy and Water Cycle Experiment (GEWEX) of the World Climate Research Program (WCRP). The ECA&D, among other gridded datasets, is very often used as a reference for model calibration and evaluation. This is for instance the case in the context of the WCRP Coordinated Regional Downscaling Experiment (CORDEX) and its Mediterranean declination MED-CORDEX. This letter quantifies ECA&D dataset uncertainties associated with temperature and precipitation intra-seasonal variability, seasonal distribution and extremes. Our motivation is to help the interpretation of the results when validating or calibrating downscaling models by the ECA&D dataset in the context of regional climate research in the Euro-Mediterranean region.

  2. Pubface: Celebrity face identification based on deep learning

    NASA Astrophysics Data System (ADS)

    Ouanan, H.; Ouanan, M.; Aksasse, B.

    2018-05-01

    In this paper, we describe a new real-time application called PubFace, which recognizes celebrities in public spaces by employing a new pose-invariant face recognition deep neural network algorithm with an extremely low error rate. To build this application, we make the following contributions: first, we build a novel dataset with over five million labelled faces. Second, we fine-tune the deep convolutional neural network (CNN) VGG-16 architecture on the new dataset that we have built. Finally, we deploy this model on the Raspberry Pi 3 Model B using the OpenCV dnn module (OpenCV 3.3).
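
    A hedged sketch of the fine-tuning step described above, using PyTorch/torchvision rather than the authors' exact toolchain; the number of identities and the layer-freezing strategy are assumptions for illustration:

        import torch
        import torch.nn as nn
        from torchvision import models

        num_identities = 1000                         # placeholder for the celebrity label count
        model = models.vgg16(pretrained=True)         # ImageNet-initialised VGG-16
        model.classifier[6] = nn.Linear(4096, num_identities)   # new output head

        # Freeze convolutional features; train only the classifier head first.
        for p in model.features.parameters():
            p.requires_grad = False

        optimizer = torch.optim.SGD(
            (p for p in model.parameters() if p.requires_grad),
            lr=1e-3, momentum=0.9)
        criterion = nn.CrossEntropyLoss()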

  3. Temporal pattern and memory in sediment transport in an experimental step-pool channel

    NASA Astrophysics Data System (ADS)

    Saletti, Matteo; Molnar, Peter; Zimmermann, André; Hassan, Marwan A.; Church, Michael; Burlando, Paolo

    2015-04-01

    In this work we study the complex dynamics of sediment transport and bed morphology in steep streams, using a dataset of experiments performed in a steep flume with natural sediment. High-resolution (1 s) time series of sediment transport were measured for individual size classes at the outlet of the flume for different combinations of sediment input rates, discharges, and flume slopes. The data show that the relation between instantaneous discharge and sediment transport exhibits large variability on several levels. After dividing the time series into segments of constant water discharge, we quantify the statistical properties of transport rates by fitting the data with a Generalized Extreme Value (GEV) distribution, whose three parameters are related to the average sediment flux. We separately analyze extreme transport-rate events in terms of their fractional composition; when only events of high magnitude are considered, coarse grains become the predominant component of the total sediment yield. We quantify the memory in grain-size-dependent sediment transport with variance-scaling and autocorrelation analyses; more specifically, we study how the variance changes with different aggregation scales and how the autocorrelation coefficient changes with different time lags. Our results show a tendency toward an infinite-memory regime in transport-rate signals, limited by the intermittency of the largest fractions. Moreover, the structure of memory is both grain-size-dependent and magnitude-dependent: temporal autocorrelation is stronger for small grain-size fractions and when the average sediment transport rate is large. The short-term memory in coarse-grain transport increases with temporal aggregation, which reveals the importance of the sampling frequency of bedload transport rates in natural streams, especially for large fractions.
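
    A minimal sketch of the two analyses mentioned above (a three-parameter GEV fit and lag autocorrelation), assuming Python with SciPy on a synthetic constant-discharge segment:

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(1)
        # Synthetic 1 Hz transport-rate segment at constant discharge.
        q = stats.genextreme.rvs(c=-0.2, loc=5.0, scale=2.0, size=3600, random_state=rng)

        # Three-parameter GEV fit (note scipy's shape c is the negated xi of some texts).
        c, loc, scale = stats.genextreme.fit(q)
        print("GEV fit:", round(c, 3), round(loc, 3), round(scale, 3))

        def autocorr(x, lag):
            """Lag-k autocorrelation, used here to probe memory in the signal."""
            x = x - x.mean()
            return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

        print([round(autocorr(q, k), 3) for k in (1, 10, 60)])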

  4. Environmental Factors and Internal Processes Contributing to Interrupted Rapid Decay of Hurricane Joaquin (2015)

    NASA Astrophysics Data System (ADS)

    Hendricks, E. A.; Elsberry, R. L.; Velden, C.; Creasey, R.; Jorgensen, A.; Jordan, M.

    2017-12-01

    Hurricane Joaquin (2015) was the most intense Atlantic hurricane with a non-tropical origin during the satellite era. In addition to its rapid intensification, Joaquin was noteworthy for the difficulty in forecasting its post-recurvature track to the northeast after having struck the Bahama Islands. Such a track typically leads to decay as the hurricane moves poleward over colder water, and Joaquin had an extreme decay rate from 135 kt to 65 kt in only 30 h. The focus of this study is on the environmental and internal factors that interrupted this extreme decay at 1800 UTC 4 October, and on how Joaquin then re-intensified to 75 kt and maintained that intensity for 30 hours. The real-time Statistical Hurricane Intensity Prediction System (SHIPS) database is used to calculate, every six hours, six environmental variables that Hendricks et al. (2010) had found contributed to intensity change. Only the deep-layer vertical wind shear (VWS) from SHIPS, and also from the Cooperative Institute for Meteorological Satellite Studies (CIMSS), had a well-defined relationship with both the interrupted rapid decay and the subsequent constant-intensity period. A special dataset of Atmospheric Motion Vectors (AMVs) at 15-minute intervals prepared by CIMSS is then utilized to create a continuous VWS record that documents the large (~15 m s-1) VWS throughout most of the rapid decay period, and then a rapid decrease in VWS to moderate (~8 m s-1) values at and following the end of the rapid decay. Horizontal distributions of these CIMSS VWSs demonstrate that during this period Joaquin was located in a region of large gradient between large VWSs to the north and near-zero VWSs to the south, which was favorable for sustaining Joaquin at hurricane intensity.

  5. A Variable-Selection Heuristic for K-Means Clustering.

    ERIC Educational Resources Information Center

    Brusco, Michael J.; Cradit, J. Dennis

    2001-01-01

    Presents a variable selection heuristic for nonhierarchical (K-means) cluster analysis based on the adjusted Rand index for measuring cluster recovery. Subjected the heuristic to Monte Carlo testing across more than 2,200 datasets. Results indicate that the heuristic is extremely effective at eliminating masking variables. (SLD)
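
    A hedged sketch of the general idea (screening variables by cluster recovery measured with the adjusted Rand index), assuming Python with scikit-learn; this is a simplified illustration, not Brusco and Cradit's exact heuristic:

        import numpy as np
        from sklearn.cluster import KMeans
        from sklearn.datasets import make_blobs
        from sklearn.metrics import adjusted_rand_score

        rng = np.random.default_rng(0)
        X, _ = make_blobs(n_samples=300, centers=3, n_features=4, random_state=0)
        X = np.hstack([X, rng.normal(size=(300, 3))])   # append noise "masking" variables

        km = lambda data: KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(data)
        reference = km(X)                               # partition from all variables

        # Score each variable by how well clustering on it alone recovers the
        # reference partition; masking variables score near zero.
        scores = [adjusted_rand_score(reference, km(X[:, [j]])) for j in range(X.shape[1])]
        print([round(s, 2) for s in scores])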

  6. Changing monsoon and midlatitude circulation interactions over the Western Himalayas and possible links to occurrences of extreme precipitation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Priya, P.; Krishnan, R.; Mujumdar, Milind

    Historical rainfall records reveal that the frequency and intensity of extreme precipitation events, during the summer monsoon (June to September) season, have significantly risen over the Western Himalayas (WH) and adjoining upper Indus basin since the 1950s. Using multiple datasets, the present study investigates the possible coincidences between an increasing trend of precipitation extremes over the WH and changes in the background flow climatology. The present findings suggest that the combined effects of a weakened southwest monsoon circulation, increased activity of transient upper-air westerly troughs over the WH region, and enhanced moisture supply by southerly winds from the Arabian Sea into the Indus basin have likely provided favorable conditions for an increased frequency of certain types of extreme precipitation events over the WH region in recent decades.

  7. Interactive Visualization and Analysis of Geospatial Data Sets - TrikeND-iGlobe

    NASA Astrophysics Data System (ADS)

    Rosebrock, Uwe; Hogan, Patrick; Chandola, Varun

    2013-04-01

    The visualization of scientific datasets is becoming an ever-increasing challenge as advances in computing technologies have enabled scientists to build high-resolution climate models that have produced petabytes of climate data. Interrogating and analyzing these large datasets in real time is a task that pushes the boundaries of computing hardware and software. But integration of climate datasets with geospatial data requires a considerable amount of effort and close familiarity with various data formats and projection systems, which has prevented widespread utilization outside of the climate community. TrikeND-iGlobe is a sophisticated software tool that bridges this gap, allows easy integration of climate datasets with geospatial datasets, and provides sophisticated visualization and analysis capabilities. The objective for TrikeND-iGlobe is the continued building of an open-source 4D virtual globe application using NASA World Wind technology that integrates analysis of climate model outputs with remote sensing observations as well as demographic and environmental datasets. This will facilitate a better understanding of global and regional phenomena, and the impact analysis of climate extreme events. The critical aim is real-time interactive interrogation. At the data-centric level the primary aim is to enable the user to interact with the data in real time for the purpose of analysis - locally or remotely. TrikeND-iGlobe provides the basis for the incorporation of modular tools that provide extended interactions with the data, including sub-setting, aggregation, re-shaping, time series analysis methods and animation to produce publication-quality imagery. TrikeND-iGlobe may be run locally or can be accessed via a web interface supported by high-performance visualization compute nodes placed close to the data. It supports heterogeneous data formats: traditional geospatial datasets along with scientific datasets with geographic coordinates (NetCDF, HDF, etc.). It also supports multiple data access mechanisms, including HTTP, FTP, WMS, WCS, and the THREDDS Data Server for NetCDF data, and offers various visualization capabilities, including animations and vector field visualization. TrikeND-iGlobe is a collaborative open-source project; contributors include NASA (ARC-PX), ORNL (Oak Ridge National Laboratory), Unidata, Kansas University, CSIRO CMAR Australia and Geoscience Australia.

  8. Statistical analysis of large simulated yield datasets for studying climate effects

    USDA-ARS?s Scientific Manuscript database

    Ensembles of process-based crop models are now commonly used to simulate crop growth and development for climate scenarios of temperature and/or precipitation changes corresponding to different projections of atmospheric CO2 concentrations. This approach generates large datasets with thousands of de...

  9. Genome-wide association study using extreme truncate selection identifies novel genes affecting bone mineral density and fracture risk.

    PubMed

    Duncan, Emma L; Danoy, Patrick; Kemp, John P; Leo, Paul J; McCloskey, Eugene; Nicholson, Geoffrey C; Eastell, Richard; Prince, Richard L; Eisman, John A; Jones, Graeme; Sambrook, Philip N; Reid, Ian R; Dennison, Elaine M; Wark, John; Richards, J Brent; Uitterlinden, Andre G; Spector, Tim D; Esapa, Chris; Cox, Roger D; Brown, Steve D M; Thakker, Rajesh V; Addison, Kathryn A; Bradbury, Linda A; Center, Jacqueline R; Cooper, Cyrus; Cremin, Catherine; Estrada, Karol; Felsenberg, Dieter; Glüer, Claus-C; Hadler, Johanna; Henry, Margaret J; Hofman, Albert; Kotowicz, Mark A; Makovey, Joanna; Nguyen, Sing C; Nguyen, Tuan V; Pasco, Julie A; Pryce, Karena; Reid, David M; Rivadeneira, Fernando; Roux, Christian; Stefansson, Kari; Styrkarsdottir, Unnur; Thorleifsson, Gudmar; Tichawangana, Rumbidzai; Evans, David M; Brown, Matthew A

    2011-04-01

    Osteoporotic fracture is a major cause of morbidity and mortality worldwide. Low bone mineral density (BMD) is a major predisposing factor to fracture and is known to be highly heritable. Site-, gender-, and age-specific genetic effects on BMD are thought to be significant, but have largely not been considered in the design of genome-wide association studies (GWAS) of BMD to date. We report here a GWAS using a novel study design focusing on women of a specific age (postmenopausal women, age 55-85 years), with either extreme high or low hip BMD (age- and gender-adjusted BMD z-scores of +1.5 to +4.0, n = 1055, or -4.0 to -1.5, n = 900), with replication in cohorts of women drawn from the general population (n = 20,898). The study replicates 21 of 26 known BMD-associated genes. Additionally, we report suggestive association of a further six new genetic associations in or around the genes CLCN7, GALNT3, IBSP, LTBP3, RSPO3, and SOX4, with replication in two independent datasets. A novel mouse model with a loss-of-function mutation in GALNT3 is also reported, which has high bone mass, supporting the involvement of this gene in BMD determination. In addition to identifying further genes associated with BMD, this study confirms the efficiency of extreme-truncate selection designs for quantitative trait association studies.

  10. Genome-Wide Association Study Using Extreme Truncate Selection Identifies Novel Genes Affecting Bone Mineral Density and Fracture Risk

    PubMed Central

    Duncan, Emma L.; Danoy, Patrick; Kemp, John P.; Leo, Paul J.; McCloskey, Eugene; Nicholson, Geoffrey C.; Eastell, Richard; Prince, Richard L.; Eisman, John A.; Jones, Graeme; Sambrook, Philip N.; Reid, Ian R.; Dennison, Elaine M.; Wark, John; Richards, J. Brent; Uitterlinden, Andre G.; Spector, Tim D.; Esapa, Chris; Cox, Roger D.; Brown, Steve D. M.; Thakker, Rajesh V.; Addison, Kathryn A.; Bradbury, Linda A.; Center, Jacqueline R.; Cooper, Cyrus; Cremin, Catherine; Estrada, Karol; Felsenberg, Dieter; Glüer, Claus-C.; Hadler, Johanna; Henry, Margaret J.; Hofman, Albert; Kotowicz, Mark A.; Makovey, Joanna; Nguyen, Sing C.; Nguyen, Tuan V.; Pasco, Julie A.; Pryce, Karena; Reid, David M.; Rivadeneira, Fernando; Roux, Christian; Stefansson, Kari; Styrkarsdottir, Unnur; Thorleifsson, Gudmar; Tichawangana, Rumbidzai; Evans, David M.; Brown, Matthew A.

    2011-01-01

    Osteoporotic fracture is a major cause of morbidity and mortality worldwide. Low bone mineral density (BMD) is a major predisposing factor to fracture and is known to be highly heritable. Site-, gender-, and age-specific genetic effects on BMD are thought to be significant, but have largely not been considered in the design of genome-wide association studies (GWAS) of BMD to date. We report here a GWAS using a novel study design focusing on women of a specific age (postmenopausal women, age 55–85 years), with either extreme high or low hip BMD (age- and gender-adjusted BMD z-scores of +1.5 to +4.0, n = 1055, or −4.0 to −1.5, n = 900), with replication in cohorts of women drawn from the general population (n = 20,898). The study replicates 21 of 26 known BMD–associated genes. Additionally, we report suggestive association of a further six new genetic associations in or around the genes CLCN7, GALNT3, IBSP, LTBP3, RSPO3, and SOX4, with replication in two independent datasets. A novel mouse model with a loss-of-function mutation in GALNT3 is also reported, which has high bone mass, supporting the involvement of this gene in BMD determination. In addition to identifying further genes associated with BMD, this study confirms the efficiency of extreme-truncate selection designs for quantitative trait association studies. PMID:21533022

  11. Uncertainty and extreme events in future climate and hydrologic projections for the Pacific Northwest: providing a basis for vulnerability and core/corridor assessments

    USGS Publications Warehouse

    Littell, Jeremy S.; Mauger, Guillaume S.; Salathe, Eric P.; Hamlet, Alan F.; Lee, Se-Yeun; Stumbaugh, Matt R.; Elsner, Marketa; Norheim, Robert; Lutz, Eric R.; Mantua, Nathan J.

    2014-01-01

    The purpose of this project was to (1) provide an internally-consistent set of downscaled projections across the Western U.S., (2) include information about projection uncertainty, and (3) assess projected changes of hydrologic extremes. These objectives were designed to address decision support needs for climate adaptation and resource management actions. Specifically, understanding of uncertainty in climate projections – in particular for extreme events – is currently a key scientific and management barrier to adaptation planning and vulnerability assessment. The new dataset fills in the Northwest domain to cover a key gap in the previous dataset, adds additional projections (both from other global climate models and a comparison with dynamical downscaling) and includes an assessment of changes to flow and soil moisture extremes. This new information can be used to assess variations in impacts across the landscape, uncertainty in projections, and how these differ as a function of region, variable, and time period. In this project, existing University of Washington Climate Impacts Group (UW CIG) products were extended to develop a comprehensive data archive that accounts (in a rigorous and physically based way) for climate model uncertainty in future climate and hydrologic scenarios. These products can be used to determine likely impacts on vegetation and aquatic habitat in the Pacific Northwest (PNW) region, including WA, OR, ID, northwest MT to the continental divide, northern CA, NV, UT, and the Columbia Basin portion of western WY. New data series and summaries produced for this project include: 1) extreme statistics for surface hydrology (e.g. frequency of soil moisture and summer water deficit) and streamflow (e.g. the 100-year flood, extreme 7-day low flows with a 10-year recurrence interval); 2) snowpack vulnerability as indicated by the ratio of April 1 snow water to cool-season precipitation; and 3) uncertainty analyses for multiple climate scenarios.

  12. Spoken language identification based on the enhanced self-adjusting extreme learning machine approach.

    PubMed

    Albadr, Musatafa Abbas Abbood; Tiun, Sabrina; Al-Dhief, Fahad Taha; Sammour, Mahmoud A M

    2018-01-01

    Spoken Language Identification (LID) is the process of determining and classifying natural language from given content and datasets. Typically, data must be processed to extract useful features to perform LID. Feature extraction for LID is, according to the literature, a mature process: standard features have already been developed using Mel-Frequency Cepstral Coefficients (MFCC), Shifted Delta Cepstral (SDC) coefficients, the Gaussian Mixture Model (GMM), and, most recently, the i-vector based framework. However, the learning process based on the extracted features remains to be improved (i.e. optimised) to capture all the knowledge embedded in them. The Extreme Learning Machine (ELM) is an effective learning model used to perform classification and regression analysis and is extremely useful for training a single-hidden-layer neural network. Nevertheless, the learning process of this model is not entirely effective (i.e. optimised) due to the random selection of weights within the input hidden layer. In this study, the ELM is selected as the learning model for LID based on standard feature extraction. One optimisation approach for the ELM, the Self-Adjusting Extreme Learning Machine (SA-ELM), is selected as the benchmark and improved by altering the selection phase of the optimisation process. The selection process is performed incorporating both the Split-Ratio and K-Tournament methods; the improved SA-ELM is named the Enhanced Self-Adjusting Extreme Learning Machine (ESA-ELM). Results are generated based on LID with datasets created from eight different languages. The results showed the clear superiority of the Enhanced Self-Adjusting Extreme Learning Machine LID (ESA-ELM LID) over the SA-ELM LID, with ESA-ELM LID achieving an accuracy of 96.25%, compared to 95.00% for SA-ELM LID.
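
    A minimal sketch of a plain ELM, the base learner that SA-ELM and ESA-ELM optimise: random input weights and biases, a single hidden layer, and output weights solved by least squares. Assumes Python with NumPy; the network size and data are illustrative:

        import numpy as np

        def elm_train(X, Y, n_hidden=200, seed=0):
            """Train a single-hidden-layer ELM: random hidden weights, least-squares output."""
            rng = np.random.default_rng(seed)
            W = rng.normal(size=(X.shape[1], n_hidden))   # random input weights (not trained)
            b = rng.normal(size=n_hidden)                 # random biases
            H = np.tanh(X @ W + b)                        # hidden-layer activations
            beta = np.linalg.pinv(H) @ Y                  # output weights by least squares
            return W, b, beta

        def elm_predict(X, W, b, beta):
            return np.tanh(X @ W + b) @ beta

        # Toy usage: one-hot targets make this a classifier.
        X = np.random.default_rng(1).normal(size=(120, 10))
        Y = np.eye(3)[np.random.default_rng(2).integers(0, 3, 120)]
        W, b, beta = elm_train(X, Y)
        pred = elm_predict(X, W, b, beta).argmax(axis=1)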

  13. Spoken language identification based on the enhanced self-adjusting extreme learning machine approach

    PubMed Central

    Tiun, Sabrina; AL-Dhief, Fahad Taha; Sammour, Mahmoud A. M.

    2018-01-01

    Spoken Language Identification (LID) is the process of determining and classifying natural language from given content and datasets. Typically, data must be processed to extract useful features to perform LID. Feature extraction for LID is, according to the literature, a mature process: standard features have already been developed using Mel-Frequency Cepstral Coefficients (MFCC), Shifted Delta Cepstral (SDC) coefficients, the Gaussian Mixture Model (GMM), and, most recently, the i-vector based framework. However, the learning process based on the extracted features remains to be improved (i.e. optimised) to capture all the knowledge embedded in them. The Extreme Learning Machine (ELM) is an effective learning model used to perform classification and regression analysis and is extremely useful for training a single-hidden-layer neural network. Nevertheless, the learning process of this model is not entirely effective (i.e. optimised) due to the random selection of weights within the input hidden layer. In this study, the ELM is selected as the learning model for LID based on standard feature extraction. One optimisation approach for the ELM, the Self-Adjusting Extreme Learning Machine (SA-ELM), is selected as the benchmark and improved by altering the selection phase of the optimisation process. The selection process is performed incorporating both the Split-Ratio and K-Tournament methods; the improved SA-ELM is named the Enhanced Self-Adjusting Extreme Learning Machine (ESA-ELM). Results are generated based on LID with datasets created from eight different languages. The results showed the clear superiority of the Enhanced Self-Adjusting Extreme Learning Machine LID (ESA-ELM LID) over the SA-ELM LID, with ESA-ELM LID achieving an accuracy of 96.25%, compared to 95.00% for SA-ELM LID. PMID:29672546

  14. Making the MagIC (Magnetics Information Consortium) Web Application Accessible to New Users and Useful to Experts

    NASA Astrophysics Data System (ADS)

    Minnett, R.; Koppers, A.; Jarboe, N.; Tauxe, L.; Constable, C.; Jonestrask, L.

    2017-12-01

    Challenges are faced by both new and experienced users interested in contributing their data to community repositories, in data discovery, or engaged in potentially transformative science. The Magnetics Information Consortium (https://earthref.org/MagIC) has recently simplified its data model and developed a new containerized web application to reduce the friction in contributing, exploring, and combining valuable and complex datasets for the paleo-, geo-, and rock magnetic scientific community. The new data model more closely reflects the hierarchical workflow in paleomagnetic experiments to enable adequate annotation of scientific results and ensure reproducibility. The new open-source (https://github.com/earthref/MagIC) application includes an upload tool that is integrated with the data model to provide early data validation feedback and ease the friction of contributing and updating datasets. The search interface provides a powerful full text search of contributions indexed by ElasticSearch and a wide array of filters, including specific geographic and geological timescale filtering, to support both novice users exploring the database and experts interested in compiling new datasets with specific criteria across thousands of studies and millions of measurements. The datasets are not large, but they are complex, with many results from evolving experimental and analytical approaches. These data are also extremely valuable due to the cost in collecting or creating physical samples and the, often, destructive nature of the experiments. MagIC is heavily invested in encouraging young scientists as well as established labs to cultivate workflows that facilitate contributing their data in a consistent format. This eLightning presentation includes a live demonstration of the MagIC web application, developed as a configurable container hosting an isomorphic Meteor JavaScript application, MongoDB database, and ElasticSearch search engine. Visitors can explore the MagIC Database through maps and image or plot galleries or search and filter the raw measurements and their derived hierarchy of analytical interpretations.

  15. Characteristics of occurrence of heavy rainfall events over Odisha during summer monsoon season

    NASA Astrophysics Data System (ADS)

    Swain, Madhusmita; Pattanayak, Sujata; Mohanty, U. C.

    2018-06-01

    During the summer monsoon season, heavy to very heavy rainfall events occur over most parts of India, routinely resulting in flooding over the Indian Monsoon Region (IMR). It is worth mentioning that, as per the Geological Survey of India, Odisha is one of the most flood-prone regions of India. The present study analyses the occurrence of very light (0-2.4 mm/day), light (2.5-15.5 mm/day), moderate (15.6-64.4 mm/day), heavy (64.5-115.4 mm/day), very heavy (115.5-204.4 mm/day) and extreme (≥ 204.5 mm/day) rainy days over Odisha during the summer monsoon season for a period of 113 years (1901-2013), with a detailed study of heavy-to-extreme rainy days. For this purpose, India Meteorological Department (IMD) gridded (0.25° × 0.25° lat/lon) rainfall data and the European Centre for Medium-Range Weather Forecasts (ECMWF) Re-Analysis (ERA-Interim) (0.125° × 0.125° lat/lon) datasets are used. The analysis reveals that the frequency of very light, light and moderate rainy days persists with an almost constant trend, but heavy, very heavy and extreme rainy days exhibit an increasing trend during the study period. More than 60% of heavy-to-extreme rainy days are observed in the months of July and August. Furthermore, during the recent period (1980-2013), a total of 150 extreme rainy days were observed over Odisha, of which 47% were associated with monsoon depressions (MDs) and cyclonic storms, 41% with lows, 2% with middle- and upper-tropospheric cyclonic circulations, and 1% with the monsoon trough; the remaining 9% of extreme rainy days did not follow any of these synoptic conditions. Since a large fraction (nearly half) of extreme rainy days over Odisha is due to the presence of MDs, a detailed examination of MDs is presented in this study. The analysis reveals that a total of 91 MDs formed over the Bay of Bengal (BoB) during 1980-2013, of which 56 (61.5% of the total) crossed Odisha. Spatial analysis of extreme rainfall days further shows that the maximum frequency of extreme rainy days occurs over the southwest region of Odisha.

  16. Extraction of drainage networks from large terrain datasets using high throughput computing

    NASA Astrophysics Data System (ADS)

    Gong, Jianya; Xie, Jibo

    2009-02-01

    Advanced digital photogrammetry and remote sensing technology produce large terrain datasets (LTD). How to process and use these LTD has become a big challenge for GIS users. Extracting drainage networks, which are basic for hydrological applications, from LTD is one of the typical applications of digital terrain analysis (DTA) in geographical information applications. Existing serial drainage algorithms cannot deal with large data volumes in a timely fashion, and few GIS platforms can process LTD beyond the GB size. High throughput computing (HTC), a distributed parallel computing mode, is proposed to improve the efficiency of drainage network extraction from LTD. Drainage network extraction using HTC involves two key issues: (1) how to decompose the large DEM datasets into independent computing units and (2) how to merge the separate outputs into a final result. A new decomposition method is presented in which the large datasets are partitioned into independent computing units along natural watershed boundaries instead of using regular one-dimensional (strip-wise) or two-dimensional (block-wise) decomposition. Because the distribution of drainage networks is strongly related to watershed boundaries, the new decomposition method is more effective and natural. The method for extracting natural watershed boundaries was improved by using multi-scale DEMs instead of single-scale DEMs. An HTC environment is employed to test the proposed methods with real datasets.
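
    A minimal sketch of the D8 flow-direction step that drainage-network extraction typically starts from, assuming Python with NumPy; the HTC decomposition itself is scheduling logic wrapped around many such per-watershed computations:

        import numpy as np

        OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
                   (0, 1), (1, -1), (1, 0), (1, 1)]

        def d8_flow_direction(dem):
            """Index (0-7) of the steepest downslope neighbour, -1 for pits."""
            rows, cols = dem.shape
            fdir = -np.ones(dem.shape, dtype=int)
            for r in range(1, rows - 1):
                for c in range(1, cols - 1):
                    drops = [(dem[r, c] - dem[r + dr, c + dc]) / np.hypot(dr, dc)
                             for dr, dc in OFFSETS]
                    k = int(np.argmax(drops))
                    if drops[k] > 0:
                        fdir[r, c] = k
            return fdir

        dem = np.random.default_rng(6).random((50, 50)).cumsum(axis=0)  # sloping toy DEM
        print((d8_flow_direction(dem) >= 0).mean())   # fraction of non-pit cells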

  17. Extremes and bursts in complex multi-scale plasmas

    NASA Astrophysics Data System (ADS)

    Watkins, N. W.; Chapman, S. C.; Hnat, B.

    2012-04-01

    Quantifying the spectrum of sizes and durations of large and/or long-lived fluctuations in complex, multi-scale space plasmas is a topic of both theoretical and practical importance. The predictions of inherently multi-scale physical theories such as MHD turbulence have given one direct stimulus for its investigation. There are also space weather implications to an improved ability to assess the likelihood of an extreme fluctuation of a given size. Our intuition as scientists tends to be formed on the familiar Gaussian "normal" distribution, which has a very low likelihood of extreme fluctuations. Perhaps surprisingly, there is both theoretical and observational evidence that favours non-Gaussian, heavier-tailed probability distributions for some space physics datasets. Additionally there is evidence for the existence of long-ranged memory between the values of fluctuations. In this talk I will show how such properties can be captured in a preliminary way by a self-similar, fractal model. I will show how such a fractal model can be used to make predictions for experimentally accessible quantities like the size and duration of a burst (a sequence of values that exceed a given threshold), or the survival probability of a burst [cf. preliminary results in Watkins et al, PRE, 2009]. In real-world time series, scaling behaviour need not be "mild" enough to be captured by a single self-similarity exponent H, but might instead require a "wild" multifractal spectrum of scaling exponents [e.g. Rypdal and Rypdal, JGR, 2011; Moloney and Davidsen, JGR, 2011] to give a complete description. I will discuss preliminary work on extending the burst approach into the multifractal domain [see also Watkins et al, chapter in press for AGU Chapman Conference on Complexity and Extreme Events in the Geosciences, Hyderabad].
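
    A minimal sketch of the burst statistics referred to above (sizes and durations of contiguous runs over a threshold), in Python with NumPy; the Gaussian test series is illustrative:

        import numpy as np

        def burst_stats(x, threshold):
            """Sizes (integrated exceedance) and durations of threshold bursts."""
            above = np.concatenate(([False], x > threshold, [False]))
            starts = np.flatnonzero(~above[:-1] & above[1:])   # run begins
            ends = np.flatnonzero(above[:-1] & ~above[1:])     # run ends (exclusive)
            durations = ends - starts
            sizes = np.array([np.sum(x[s:e] - threshold) for s, e in zip(starts, ends)])
            return durations, sizes

        x = np.random.default_rng(4).standard_normal(10000)
        dur, size = burst_stats(x, threshold=1.5)
        print(len(dur), "bursts; mean duration:", dur.mean())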

  18. Rainy Day: A Remote Sensing-Driven Extreme Rainfall Simulation Approach for Hazard Assessment

    NASA Astrophysics Data System (ADS)

    Wright, Daniel; Yatheendradas, Soni; Peters-Lidard, Christa; Kirschbaum, Dalia; Ayalew, Tibebu; Mantilla, Ricardo; Krajewski, Witold

    2015-04-01

    Progress on the assessment of rainfall-driven hazards such as floods and landslides has been hampered by the challenge of characterizing the frequency, intensity, and structure of extreme rainfall at the watershed or hillslope scale. Conventional approaches rely on simplifying assumptions and are strongly dependent on the location, the availability of long-term rain gage measurements, and the subjectivity of the analyst. Regional and global-scale rainfall remote sensing products provide an alternative, but are limited by relatively short (~15-year) observational records. To overcome this, we have coupled these remote sensing products with a space-time resampling framework known as stochastic storm transposition (SST). SST "lengthens" the rainfall record by resampling from a catalog of observed storms from a user-defined region, effectively recreating the regional extreme rainfall hydroclimate. This coupling has been codified in Rainy Day, a Python-based platform for quickly generating large numbers of probabilistic extreme rainfall "scenarios" at any point on the globe. Rainy Day is readily compatible with any gridded rainfall dataset. The user can optionally incorporate regional rain gage or weather radar measurements for bias correction using the Precipitation Uncertainties for Satellite Hydrology (PUSH) framework. Results from Rainy Day using the CMORPH satellite precipitation product are compared with local observations in two examples. The first example is peak discharge estimation in a medium-sized (~4000 square km) watershed in the central United States performed using CUENCAS, a parsimonious physically-based distributed hydrologic model. The second example is rainfall frequency analysis for Saint Lucia, a small volcanic island in the eastern Caribbean that is prone to landslides and flash floods. The distinct rainfall hydroclimates of the two example sites illustrate the flexibility of the approach and its usefulness for hazard analysis in data-poor regions.
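
    A hedged sketch of the stochastic storm transposition idea behind Rainy Day, in Python; the catalog, transposition range, and watershed window are placeholders, and the actual package adds catalog construction, multiple durations, and bias correction:

        import numpy as np

        def sst_annual_maxima(storm_catalog, n_years, storms_per_year, rng):
            """storm_catalog: list of 2-D rainfall grids; returns n_years maxima
            of mean rainfall over a fixed 'watershed' window under random shifts."""
            maxima = []
            for _ in range(n_years):
                best = 0.0
                for _ in range(storms_per_year):
                    storm = storm_catalog[rng.integers(len(storm_catalog))]
                    dy, dx = rng.integers(-10, 11, size=2)       # random transposition
                    shifted = np.roll(storm, (dy, dx), axis=(0, 1))
                    best = max(best, shifted[40:60, 40:60].mean())  # watershed window
                maxima.append(best)
            return np.array(maxima)

        rng = np.random.default_rng(5)
        catalog = [rng.gamma(1.0, 2.0, size=(100, 100)) for _ in range(50)]
        maxima = sst_annual_maxima(catalog, n_years=1000, storms_per_year=20, rng=rng)
        print("~100-year event:", np.quantile(maxima, 0.99))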

  19. Do pre-trained deep learning models improve computer-aided classification of digital mammograms?

    NASA Astrophysics Data System (ADS)

    Aboutalib, Sarah S.; Mohamed, Aly A.; Zuley, Margarita L.; Berg, Wendie A.; Luo, Yahong; Wu, Shandong

    2018-02-01

    Digital mammography screening is an important exam for the early detection of breast cancer and reduction in mortality. False positives leading to high recall rates, however, result in unnecessary negative consequences for patients and health care systems. Computer-aided tools can better aid radiologists by improving the distinction between image classes and thus potentially reducing false recalls. The emergence of deep learning has shown promising results in the area of biomedical imaging data analysis. This study aimed to investigate deep learning and transfer learning methods that can improve digital mammography classification performance. In particular, we evaluated the effect of pre-training deep learning models with other imaging datasets in order to boost classification performance on a digital mammography dataset. Two types of datasets were used for pre-training: (1) a digitized film mammography dataset, and (2) a very large non-medical imaging dataset. By using either of these datasets to pre-train the network initially, and then fine-tuning with the digital mammography dataset, we found an increase in overall classification performance in comparison to a model without pre-training, with the very large non-medical dataset performing best in improving the classification accuracy.

  20. A public dataset of running biomechanics and the effects of running speed on lower extremity kinematics and kinetics

    PubMed Central

    Fukuchi, Claudiane A.; Duarte, Marcos

    2017-01-01

    Background The goals of this study were (1) to present the set of data evaluating running biomechanics (kinematics and kinetics), including data on running habits, demographics, and levels of muscle strength and flexibility made available at Figshare (DOI: 10.6084/m9.figshare.4543435); and (2) to examine the effect of running speed on selected gait-biomechanics variables related to both running injuries and running economy. Methods The lower-extremity kinematics and kinetics data of 28 regular runners were collected using a three-dimensional (3D) motion-capture system and an instrumented treadmill while the subjects ran at 2.5 m/s, 3.5 m/s, and 4.5 m/s wearing standard neutral shoes. Results A dataset comprising raw and processed kinematics and kinetics signals pertaining to this experiment is available in various file formats. In addition, a file of metadata, including demographics, running characteristics, foot-strike patterns, and muscle strength and flexibility measurements is provided. Overall, there was an effect of running speed on most of the gait-biomechanics variables selected for this study. However, the foot-strike patterns were not affected by running speed. Discussion Several applications of this dataset can be anticipated, including testing new methods of data reduction and variable selection; for educational purposes; and answering specific research questions. This last application was exemplified in the study’s second objective. PMID:28503379

  1. A public dataset of running biomechanics and the effects of running speed on lower extremity kinematics and kinetics.

    PubMed

    Fukuchi, Reginaldo K; Fukuchi, Claudiane A; Duarte, Marcos

    2017-01-01

    The goals of this study were (1) to present the set of data evaluating running biomechanics (kinematics and kinetics), including data on running habits, demographics, and levels of muscle strength and flexibility made available at Figshare (DOI: 10.6084/m9.figshare.4543435); and (2) to examine the effect of running speed on selected gait-biomechanics variables related to both running injuries and running economy. The lower-extremity kinematics and kinetics data of 28 regular runners were collected using a three-dimensional (3D) motion-capture system and an instrumented treadmill while the subjects ran at 2.5 m/s, 3.5 m/s, and 4.5 m/s wearing standard neutral shoes. A dataset comprising raw and processed kinematics and kinetics signals pertaining to this experiment is available in various file formats. In addition, a file of metadata, including demographics, running characteristics, foot-strike patterns, and muscle strength and flexibility measurements is provided. Overall, there was an effect of running speed on most of the gait-biomechanics variables selected for this study. However, the foot-strike patterns were not affected by running speed. Several applications of this dataset can be anticipated, including testing new methods of data reduction and variable selection; for educational purposes; and answering specific research questions. This last application was exemplified in the study's second objective.

  2. Secondary analysis of national survey datasets.

    PubMed

    Boo, Sunjoo; Froelicher, Erika Sivarajan

    2013-06-01

    This paper describes the methodological issues associated with secondary analysis of large national survey datasets. Issues about survey sampling, data collection, and non-response and missing data in terms of methodological validity and reliability are discussed. Although reanalyzing large national survey datasets is an expedient and cost-efficient way of producing nursing knowledge, successful investigations require a methodological consideration of the intrinsic limitations of secondary survey analysis. Nursing researchers using existing national survey datasets should understand potential sources of error associated with survey sampling, data collection, and non-response and missing data. Although it is impossible to eliminate all potential errors, researchers using existing national survey datasets must be aware of the possible influence of errors on the results of the analyses.

  3. A collection of European sweet cherry phenology data for assessing climate change

    NASA Astrophysics Data System (ADS)

    Wenden, Bénédicte; Campoy, José Antonio; Lecourt, Julien; López Ortega, Gregorio; Blanke, Michael; Radičević, Sanja; Schüller, Elisabeth; Spornberger, Andreas; Christen, Danilo; Magein, Hugo; Giovannini, Daniela; Campillo, Carlos; Malchev, Svetoslav; Peris, José Miguel; Meland, Mekjell; Stehr, Rolf; Charlot, Gérard; Quero-García, José

    2016-12-01

    Professional and scientific networks built around the production of sweet cherry (Prunus avium L.) led to the collection of phenology data for a wide range of cultivars grown in experimental sites characterized by highly contrasted climatic conditions. We present a dataset of flowering and maturity dates, recorded each year for one tree when available, or the average of several trees for each cultivar, over a period of 37 years (1978-2015). Such a dataset is extremely valuable for characterizing the phenological response to climate change, and the plasticity of the different cultivars' behaviour under different environmental conditions. In addition, this dataset will support the development of predictive models for sweet cherry phenology exploitable at the continental scale, and will help anticipate breeding strategies in order to maintain and improve sweet cherry production in Europe.

  4. A collection of European sweet cherry phenology data for assessing climate change.

    PubMed

    Wenden, Bénédicte; Campoy, José Antonio; Lecourt, Julien; López Ortega, Gregorio; Blanke, Michael; Radičević, Sanja; Schüller, Elisabeth; Spornberger, Andreas; Christen, Danilo; Magein, Hugo; Giovannini, Daniela; Campillo, Carlos; Malchev, Svetoslav; Peris, José Miguel; Meland, Mekjell; Stehr, Rolf; Charlot, Gérard; Quero-García, José

    2016-12-06

    Professional and scientific networks built around the production of sweet cherry (Prunus avium L.) led to the collection of phenology data for a wide range of cultivars grown in experimental sites characterized by highly contrasted climatic conditions. We present a dataset of flowering and maturity dates, recorded each year for one tree when available, or the average of several trees for each cultivar, over a period of 37 years (1978-2015). Such a dataset is extremely valuable for characterizing the phenological response to climate change, and the plasticity of the different cultivars' behaviour under different environmental conditions. In addition, this dataset will support the development of predictive models for sweet cherry phenology exploitable at the continental scale, and will help anticipate breeding strategies in order to maintain and improve sweet cherry production in Europe.

  5. A collection of European sweet cherry phenology data for assessing climate change

    PubMed Central

    Wenden, Bénédicte; Campoy, José Antonio; Lecourt, Julien; López Ortega, Gregorio; Blanke, Michael; Radičević, Sanja; Schüller, Elisabeth; Spornberger, Andreas; Christen, Danilo; Magein, Hugo; Giovannini, Daniela; Campillo, Carlos; Malchev, Svetoslav; Peris, José Miguel; Meland, Mekjell; Stehr, Rolf; Charlot, Gérard; Quero-García, José

    2016-01-01

    Professional and scientific networks built around the production of sweet cherry (Prunus avium L.) led to the collection of phenology data for a wide range of cultivars grown in experimental sites characterized by highly contrasted climatic conditions. We present a dataset of flowering and maturity dates, recorded each year for one tree when available, or the average of several trees for each cultivar, over a period of 37 years (1978–2015). Such a dataset is extremely valuable for characterizing the phenological response to climate change, and the plasticity of the different cultivars’ behaviour under different environmental conditions. In addition, this dataset will support the development of predictive models for sweet cherry phenology exploitable at the continental scale, and will help anticipate breeding strategies in order to maintain and improve sweet cherry production in Europe. PMID:27922629

  6. An Improved TA-SVM Method Without Matrix Inversion and Its Fast Implementation for Nonstationary Datasets.

    PubMed

    Shi, Yingzhong; Chung, Fu-Lai; Wang, Shitong

    2015-09-01

    Recently, a time-adaptive support vector machine (TA-SVM) was proposed for handling nonstationary datasets. Although attractive performance has been reported, and the classifier is distinctive in simultaneously solving several SVM subclassifiers locally and globally through an elegant SVM formulation in an alternative kernel space, the coupling of subclassifiers requires a matrix inversion, resulting in a high computational burden in large nonstationary dataset applications. To overcome this shortcoming, an improved TA-SVM (ITA-SVM) is proposed using a common vector shared by all the SVM subclassifiers involved. ITA-SVM not only keeps an SVM formulation, but also avoids the computation of matrix inversion. Thus, a fast version, the improved time-adaptive core vector machine (ITA-CVM), can be realized for large nonstationary datasets by using the core vector machine (CVM) technique. ITA-CVM has the merit of asymptotically linear time complexity for large nonstationary datasets and also inherits the advantages of TA-SVM. The effectiveness of the proposed ITA-SVM and ITA-CVM classifiers is also experimentally confirmed.

  7. Big Data Approaches for the Analysis of Large-Scale fMRI Data Using Apache Spark and GPU Processing: A Demonstration on Resting-State fMRI Data from the Human Connectome Project

    PubMed Central

    Boubela, Roland N.; Kalcher, Klaudius; Huf, Wolfgang; Našel, Christian; Moser, Ewald

    2016-01-01

    Technologies for scalable analysis of very large datasets have emerged in the domain of internet computing, but are still rarely used in neuroimaging despite the existence of data and research questions in need of efficient computation tools especially in fMRI. In this work, we present software tools for the application of Apache Spark and Graphics Processing Units (GPUs) to neuroimaging datasets, in particular providing distributed file input for 4D NIfTI fMRI datasets in Scala for use in an Apache Spark environment. Examples for using this Big Data platform in graph analysis of fMRI datasets are shown to illustrate how processing pipelines employing it can be developed. With more tools for the convenient integration of neuroimaging file formats and typical processing steps, big data technologies could find wider endorsement in the community, leading to a range of potentially useful applications especially in view of the current collaborative creation of a wealth of large data repositories including thousands of individual fMRI datasets. PMID:26778951
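
    A minimal PySpark sketch of the idea, computing seed-based voxel correlations; the scan file and seed coordinates are hypothetical, and unlike the authors' Scala NIfTI reader for distributed input, this toy version loads the volume on the driver:

    ```python
    import numpy as np
    import nibabel as nib
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("fmri-seed-correlation").getOrCreate()
    sc = spark.sparkContext

    img = nib.load("rest.nii.gz")            # hypothetical 4D resting-state scan
    data = img.get_fdata()                   # shape (x, y, z, t)
    seed = data[30, 40, 25, :]               # hypothetical seed voxel time series
    seed = (seed - seed.mean()) / seed.std()

    # Distribute voxel time series; keep coordinates for reassembly.
    vox = [((i, j, k), data[i, j, k, :])
           for i in range(data.shape[0])
           for j in range(data.shape[1])
           for k in range(data.shape[2])]

    def corr(ts):
        s = ts.std()
        if s == 0:
            return 0.0
        z = (ts - ts.mean()) / s
        # Both series are z-scored, so the mean product is Pearson's r.
        return float(np.dot(z, seed) / len(seed))

    rdd = sc.parallelize(vox, numSlices=256)
    result = rdd.mapValues(corr).collect()   # list of ((i, j, k), r)
    ```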

  8. A secure and efficiently searchable health information architecture.

    PubMed

    Yasnoff, William A

    2016-06-01

    Patient-centric repositories of health records are an important component of health information infrastructure. However, patient information in a single repository is potentially vulnerable to loss of the entire dataset from a single unauthorized intrusion. A new health record storage architecture, the personal grid, eliminates this risk by separately storing and encrypting each person's record. The tradeoff for this improved security is that a personal grid repository must be searched sequentially, since each record must be individually accessed and decrypted. To allow reasonable search times for large numbers of records, parallel processing with hundreds (or even thousands) of on-demand virtual servers (now available in cloud computing environments) is used. Estimated search times for a 10 million record personal grid using 500 servers vary from 7 to 33 minutes depending on the complexity of the query. Since extremely rapid searching is not a critical requirement of health information infrastructure, the personal grid may provide a practical and useful alternative architecture that eliminates the large-scale security vulnerabilities of traditional databases by sacrificing unnecessary searching speed.
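
    A toy sketch of the per-record encryption and parallelized sequential scan described here, using the cryptography package; the record fields, single in-memory store, and thread workers are illustrative simplifications of the paper's fleet of on-demand cloud servers:

    ```python
    import json
    from concurrent.futures import ThreadPoolExecutor
    from cryptography.fernet import Fernet

    # Toy repository: each record is encrypted under its own key, so one stolen
    # ciphertext (or key) exposes at most one person, never the whole dataset.
    records = [{"id": i, "age": 20 + i % 60, "dx": "asthma" if i % 7 == 0 else "none"}
               for i in range(2000)]
    store = []
    for rec in records:
        key = Fernet.generate_key()          # hypothetical per-record key
        store.append((key, Fernet(key).encrypt(json.dumps(rec).encode())))

    def matches(entry):
        key, blob = entry
        rec = json.loads(Fernet(key).decrypt(blob))
        return rec if rec["dx"] == "asthma" and rec["age"] > 50 else None

    # Sequential scan split across workers, standing in for parallel servers.
    with ThreadPoolExecutor(max_workers=8) as pool:
        hits = [r for r in pool.map(matches, store) if r]
    print(len(hits), "matching records")
    ```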

  9. Computer aided manual validation of mass spectrometry-based proteomic data.

    PubMed

    Curran, Timothy G; Bryson, Bryan D; Reigelhaupt, Michael; Johnson, Hannah; White, Forest M

    2013-06-15

    Advances in mass spectrometry-based proteomic technologies have increased the speed of analysis and the depth provided by a single analysis. Computational tools to evaluate the accuracy of peptide identifications from these high-throughput analyses have not kept pace with technological advances; currently the most common quality evaluation methods are based on statistical analysis of the likelihood of false positive identifications in large-scale data sets. While helpful, these calculations do not consider the accuracy of each identification, thus creating a precarious situation for biologists relying on the data to inform experimental design. Manual validation is the gold standard approach to confirm accuracy of database identifications, but is extremely time-intensive. To palliate the increasing time required to manually validate large proteomic datasets, we provide computer aided manual validation software (CAMV) to expedite the process. Relevant spectra are collected, catalogued, and pre-labeled, allowing users to efficiently judge the quality of each identification and summarize applicable quantitative information. CAMV significantly reduces the burden associated with manual validation and will hopefully encourage broader adoption of manual validation in mass spectrometry-based proteomics.

  10. Examining global extreme sea level variations on the coast from in-situ and remote observations

    NASA Astrophysics Data System (ADS)

    Menendez, Melisa; Benkler, Anna S.

    2017-04-01

    The estimation of extreme water level values on the coast is a requirement for a wide range of engineering and coastal management applications. In addition, climate variations of extreme sea levels in the coastal area result from a complex interaction of oceanic, atmospheric and terrestrial processes across a wide range of spatial and temporal scales. In this study, variations of extreme sea level return values are investigated from two available sources of information: in-situ tide-gauge records and satellite altimetry data. Long time series of sea level from tide-gauge records are the most valuable observations, since they directly measure water level at a specific coastal location. However, they have a number of sources of inhomogeneity that may affect the climatic description of extremes when this data source is used. Among others, the presence of gaps, historical inhomogeneities and jumps in the mean sea level signal are factors that can introduce uncertainty into the characterization of extreme sea level behaviour. Moreover, long records from tide-gauges are sparse, and there are many coastal areas worldwide without available in-situ information. On the other hand, with the accumulating altimeter records of several satellite missions from the 1990s, approaching 25 recorded years at the time of writing, it is becoming possible to analyse extreme sea level events from this data source. Aside from the well-known issue of altimeter measurements very close to the coast (mainly due to corruption by land, wet troposphere path delay errors and local tide effects in the coastal area), there are other aspects that have to be considered when sea surface height values estimated from satellite are to be used in a statistical extreme-value model, such as the use of a multi-mission product to obtain long observed periods and the selection of the maxima sample, since altimeter observations do not provide values uniform in time and space. Here, we have compared the extreme values of 'still water level' and 'non-tidal residual' from in-situ records of the GESLA2 dataset (Woodworth et al. 2016) against the novel coastal altimetry datasets (Cipollini et al. 2016). Seasonal patterns, inter-annual variability and long-term trends are analyzed. Then, a time-dependent extreme model (Menendez et al. 2009) is applied to characterize extreme sea level return values and their variability in the coastal area around the world.
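
    For the extreme-value step, a minimal stationary sketch with SciPy on synthetic annual maxima; the cited Menendez et al. (2009) model is time-dependent, which this toy fit omits:

    ```python
    import numpy as np
    from scipy.stats import genextreme

    rng = np.random.default_rng(0)
    # Stand-in for tide-gauge annual maximum sea levels, in metres.
    annual_max = rng.gumbel(loc=1.2, scale=0.25, size=40)

    # Stationary GEV fit; a time-dependent model would additionally let the
    # location/scale parameters vary with time and season.
    shape, loc, scale = genextreme.fit(annual_max)

    # 100-year return level: the level exceeded with annual probability 1/100.
    rl100 = genextreme.isf(1.0 / 100.0, shape, loc, scale)
    print(f"100-year return level = {rl100:.2f} m")
    ```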

  11. Uvf - Unified Volume Format: A General System for Efficient Handling of Large Volumetric Datasets.

    PubMed

    Krüger, Jens; Potter, Kristin; Macleod, Rob S; Johnson, Christopher

    2008-01-01

    With the continual increase in computing power, volumetric datasets with sizes ranging from only a few megabytes to petascale are generated thousands of times per day. Such data may come from an ordinary source such as simple everyday medical imaging procedures, while larger datasets may be generated from cluster-based scientific simulations or measurements of large scale experiments. In computer science an incredible amount of work worldwide is put into the efficient visualization of these datasets. As researchers in the field of scientific visualization, we often have to face the task of handling very large data from various sources. This data usually comes in many different data formats. In medical imaging, the DICOM standard is well established, however, most research labs use their own data formats to store and process data. To simplify the task of reading the many different formats used with all of the different visualization programs, we present a system for the efficient handling of many types of large scientific datasets. While primarily targeted at structured volumetric data, UVF can store just about any type of structured and unstructured data. The system is composed of a file format specification with a reference implementation of a reader. It is not only a common, easy to implement format but also allows for efficient rendering of most datasets without the need to convert the data in memory.

  12. The multiple imputation method: a case study involving secondary data analysis.

    PubMed

    Walani, Salimah R; Cleland, Charles M

    2015-05-01

    This paper illustrates, with the example of a secondary data analysis study, the use of the multiple imputation method to replace missing data. Most large public datasets have missing data, which need to be handled by researchers conducting secondary data analysis studies. Multiple imputation is a technique widely used to replace missing values while preserving the sample size and sampling variability of the data. The data source was the 2004 National Sample Survey of Registered Nurses. The authors created a model to impute missing values using the chained equation method. They used imputation diagnostic procedures and conducted regression analysis of imputed data to determine the differences between the log hourly wages of internationally educated and US-educated registered nurses. The authors used multiple imputation procedures to replace missing values in a large dataset with 29,059 observations. Five multiply imputed datasets were created. Imputation diagnostics using time series and density plots showed that imputation was successful. The authors also present an example of the use of multiply imputed datasets to conduct regression analysis to answer a substantive research question. Multiple imputation is a powerful technique for imputing missing values in large datasets while preserving the sample size and variance of the data. Even though the chained equation method involves complex statistical computations, recent innovations in software and computation have made it possible for researchers to conduct this technique on large datasets. The authors recommend that nurse researchers use multiple imputation methods for handling missing data to improve the statistical power and external validity of their studies.
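
    A small sketch of chained-equations imputation with scikit-learn's IterativeImputer on synthetic data; the variables and the number of imputations (five, as in the study) are illustrative:

    ```python
    import numpy as np
    from sklearn.experimental import enable_iterative_imputer  # noqa: F401
    from sklearn.impute import IterativeImputer

    rng = np.random.default_rng(1)
    X = rng.normal(size=(1000, 5))
    X[:, 4] = X[:, :4].sum(axis=1) + rng.normal(scale=0.1, size=1000)  # correlated column
    mask = rng.random(X.shape) < 0.1
    X[mask] = np.nan                                # roughly 10% of values missing

    # Multiple imputation: rerun the chained-equations imputer with different
    # seeds, drawing from the posterior, and keep every completed copy.
    imputed = [IterativeImputer(sample_posterior=True, random_state=s).fit_transform(X)
               for s in range(5)]
    # Downstream: fit the analysis model on each copy and pool estimates
    # (e.g., with Rubin's rules).
    ```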

  13. Forest Cover Estimation in Ireland Using Radar Remote Sensing: A Comparative Analysis of Forest Cover Assessment Methodologies.

    PubMed

    Devaney, John; Barrett, Brian; Barrett, Frank; Redmond, John; O'Halloran, John

    2015-01-01

    Quantification of spatial and temporal changes in forest cover is an essential component of forest monitoring programs. Because it can acquire data regardless of cloud cover, Synthetic Aperture Radar (SAR) is an ideal source of information on forest dynamics in countries with near-constant cloud cover. However, few studies have investigated the use of SAR for forest cover estimation in landscapes with highly sparse and fragmented forest cover. In this study, the potential use of L-band SAR for forest cover estimation in two regions (Longford and Sligo) in Ireland is investigated and compared to forest cover estimates derived from three national (Forestry2010, Prime2, National Forest Inventory), one pan-European (Forest Map 2006) and one global (Global Forest Change) forest cover product. Two machine-learning approaches (Random Forests and Extremely Randomised Trees) are evaluated. Both Random Forests and Extremely Randomised Trees classification accuracies were high (98.1-98.5%), with differences between the two classifiers being minimal (<0.5%). Increasing levels of post-classification filtering led to a decrease in estimated forest area and an increase in the overall accuracy of SAR-derived forest cover maps. All forest cover products were evaluated using an independent validation dataset. For the Longford region, the highest overall accuracy was recorded with the Forestry2010 dataset (97.42%), whereas in Sligo the highest overall accuracy was obtained for the Prime2 dataset (97.43%), although the accuracies of SAR-derived forest maps were comparable. Our findings indicate that spaceborne radar could aid inventories in regions with low levels of forest cover in fragmented landscapes. The reduced accuracies observed for the global and pan-continental forest cover maps in comparison to national and SAR-derived forest maps indicate that caution should be exercised when applying these datasets for national reporting.
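
    A compact sketch comparing the two classifier families named here, using scikit-learn on synthetic stand-in features (the actual study used per-pixel L-band SAR predictors):

    ```python
    from sklearn.datasets import make_classification
    from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    # Synthetic stand-in for per-pixel SAR features (backscatter, texture, ...).
    X, y = make_classification(n_samples=5000, n_features=12, n_informative=8,
                               n_classes=2, random_state=0)

    for clf in (RandomForestClassifier(n_estimators=200, random_state=0),
                ExtraTreesClassifier(n_estimators=200, random_state=0)):
        acc = cross_val_score(clf, X, y, cv=5).mean()
        print(type(clf).__name__, f"accuracy = {acc:.3f}")
    ```

    The two ensembles differ mainly in how splits are chosen (optimised thresholds versus randomised ones), which is consistent with the paper's finding that their accuracies were nearly identical.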

  14. Forest Cover Estimation in Ireland Using Radar Remote Sensing: A Comparative Analysis of Forest Cover Assessment Methodologies

    PubMed Central

    Devaney, John; Barrett, Brian; Barrett, Frank; Redmond, John; O'Halloran, John

    2015-01-01

    Quantification of spatial and temporal changes in forest cover is an essential component of forest monitoring programs. Because it can acquire data regardless of cloud cover, Synthetic Aperture Radar (SAR) is an ideal source of information on forest dynamics in countries with near-constant cloud cover. However, few studies have investigated the use of SAR for forest cover estimation in landscapes with highly sparse and fragmented forest cover. In this study, the potential use of L-band SAR for forest cover estimation in two regions (Longford and Sligo) in Ireland is investigated and compared to forest cover estimates derived from three national (Forestry2010, Prime2, National Forest Inventory), one pan-European (Forest Map 2006) and one global (Global Forest Change) forest cover product. Two machine-learning approaches (Random Forests and Extremely Randomised Trees) are evaluated. Both Random Forests and Extremely Randomised Trees classification accuracies were high (98.1–98.5%), with differences between the two classifiers being minimal (<0.5%). Increasing levels of post-classification filtering led to a decrease in estimated forest area and an increase in the overall accuracy of SAR-derived forest cover maps. All forest cover products were evaluated using an independent validation dataset. For the Longford region, the highest overall accuracy was recorded with the Forestry2010 dataset (97.42%), whereas in Sligo the highest overall accuracy was obtained for the Prime2 dataset (97.43%), although the accuracies of SAR-derived forest maps were comparable. Our findings indicate that spaceborne radar could aid inventories in regions with low levels of forest cover in fragmented landscapes. The reduced accuracies observed for the global and pan-continental forest cover maps in comparison to national and SAR-derived forest maps indicate that caution should be exercised when applying these datasets for national reporting. PMID:26262681

  15. Evaluation of climatic changes in South-Asia

    NASA Astrophysics Data System (ADS)

    Kjellstrom, Erik; Rana, Arun; Grigory, Nikulin; Renate, Wilcke; Hansson, Ulf; Kolax, Michael

    2016-04-01

    The literature provides ample evidence of climate change and its impacts on various sectors worldwide. In light of new advances in climate modelling, the availability of several climate downscaling approaches, and more robust bias-correction methods of varying complexity and strength, the present study performs a systematic evaluation of climate change impacts over the South Asia region. We used different Regional Climate Models (RCMs) from the CORDEX domain, Global Climate Models (GCMs), and gridded observations for the study area to evaluate the models in a historical/control period (1980-2010) and the changes in a future period (2010-2099). First, the GCMs and RCMs are evaluated against gridded observational datasets in the area, using precipitation and temperature as indicative variables. The observational datasets are themselves also evaluated against a reliable reference observational dataset, as suggested in the literature. Bias, correlation, and changes (among other statistical measures) are calculated for the entire region and both variables. The region was then sub-divided into smaller domains based on homogeneous precipitation zones to evaluate average changes over the time period, and spatial and temporal changes for the region were finally calculated to evaluate future changes. Future changes are calculated for two Representative Concentration Pathways (RCPs), the middle-emission RCP4.5 and the high-emission RCP8.5, and for both climatic variables, precipitation and temperature. Lastly, an evaluation of extremes is performed for the whole region based on precipitation and temperature indices in the future dataset. The results indicate that the whole study region is under extreme stress in future climate scenarios for both precipitation and temperature. Precipitation variability depends on location, leading to droughts and floods in various parts of the region in the future, while temperature shows a consistent increase throughout the region regardless of location.
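
    A minimal sketch of the bias and correlation bookkeeping described, on synthetic model and observation grids; the array shapes and values are invented:

    ```python
    import numpy as np

    # Hypothetical (time, lat, lon) arrays on a common grid: model vs. observations.
    rng = np.random.default_rng(2)
    obs = rng.gamma(2.0, 2.0, size=(360, 40, 60))        # monthly precipitation
    mod = obs + rng.normal(1.0, 2.0, size=obs.shape)     # model with a wet bias

    bias = (mod - obs).mean(axis=0)                      # per-grid-cell mean bias
    o, m = obs.mean(axis=(1, 2)), mod.mean(axis=(1, 2))  # area-mean time series
    r = np.corrcoef(o, m)[0, 1]                          # temporal correlation
    print(f"domain-mean bias = {bias.mean():.2f} mm/month, r = {r:.2f}")
    ```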

  16. Deep learning-based fine-grained car make/model classification for visual surveillance

    NASA Astrophysics Data System (ADS)

    Gundogdu, Erhan; Parıldı, Enes Sinan; Solmaz, Berkan; Yücesoy, Veysel; Koç, Aykut

    2017-10-01

    Fine-grained object recognition is a challenging computer vision problem that has recently been addressed by utilizing deep Convolutional Neural Networks (CNNs). Nevertheless, the main disadvantage of classification methods relying on deep CNN models is the need for a considerably large amount of data. In addition, relatively little annotated data exists for real-world applications such as the recognition of car models in a traffic surveillance system. To this end, we concentrate on the classification of fine-grained car makes and/or models for visual surveillance scenarios with the help of two different domains. First, a large-scale dataset including approximately 900K images is constructed from a website which includes fine-grained car models, and a state-of-the-art CNN model is trained on this dataset according to its labels. The second domain is the set of images collected from a camera integrated into a traffic surveillance system. These images, numbering over 260K, are gathered by a special license plate detection method on top of a motion detection algorithm. An appropriately sized image patch is cropped from the region of interest provided by the detected license plate location. These sets of images and their labels for more than 30 classes are employed to fine-tune the CNN model already trained on the large-scale dataset described above. To fine-tune the network, the last two fully-connected layers are randomly initialized and the remaining layers are fine-tuned on the second dataset. In this work, the transfer of a model learned on a large dataset to a smaller one has been successfully performed by utilizing both the limited annotated data of the traffic field and a large-scale dataset with available annotations. Our experimental results on both the validation dataset and the real field show that the proposed methodology performs favorably against training the CNN model from scratch.
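
    A hedged PyTorch sketch of the fine-tuning recipe: re-initialize the classifier head and update only the last layers. The backbone, layer names and class count are stand-ins; the paper's own network and its two fully-connected layers are not specified here:

    ```python
    import torch
    import torch.nn as nn
    from torchvision import models

    # Start from a pretrained backbone (here an ImageNet-pretrained stand-in for
    # the model trained on the 900K-image web dataset), then re-initialize the
    # classifier head for the ~30 surveillance classes.
    model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
    model.fc = nn.Linear(model.fc.in_features, 30)   # randomly initialized head

    # Freeze early layers; fine-tune only the last block and the new head,
    # mirroring the strategy of retraining the final layers on the second domain.
    for name, p in model.named_parameters():
        p.requires_grad = name.startswith(("layer4", "fc"))

    optimizer = torch.optim.SGD((p for p in model.parameters() if p.requires_grad),
                                lr=1e-3, momentum=0.9)
    ```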

  17. Localized Ambient Solidity Separation Algorithm Based Computer User Segmentation.

    PubMed

    Sun, Xiao; Zhang, Tongda; Chai, Yueting; Liu, Yi

    2015-01-01

    Most popular clustering methods make some strong assumptions about the dataset. For example, k-means implicitly assumes that all clusters come from spherical Gaussian distributions which have different means but the same covariance. However, when dealing with datasets that have diverse distribution shapes or high dimensionality, these assumptions might no longer be valid. In order to overcome this weakness, we proposed a new clustering algorithm named the localized ambient solidity separation (LASS) algorithm, using a new isolation criterion called centroid distance. Compared with other density-based isolation criteria, our proposed centroid distance isolation criterion addresses the problems caused by high dimensionality and varying density. An experiment on a designed two-dimensional benchmark dataset shows that our proposed LASS algorithm not only inherits the advantage of the original dissimilarity increments clustering method in separating naturally isolated clusters but can also identify clusters which are adjacent, overlapping, and under background noise. Finally, we compared our LASS algorithm with the dissimilarity increments clustering method on a massive computer user dataset with over two million records containing demographic and behavioral information. The results show that the LASS algorithm works extremely well on this computer user dataset and can gain more knowledge from it.

  18. Localized Ambient Solidity Separation Algorithm Based Computer User Segmentation

    PubMed Central

    Sun, Xiao; Zhang, Tongda; Chai, Yueting; Liu, Yi

    2015-01-01

    Most popular clustering methods make some strong assumptions about the dataset. For example, k-means implicitly assumes that all clusters come from spherical Gaussian distributions which have different means but the same covariance. However, when dealing with datasets that have diverse distribution shapes or high dimensionality, these assumptions might no longer be valid. In order to overcome this weakness, we proposed a new clustering algorithm named the localized ambient solidity separation (LASS) algorithm, using a new isolation criterion called centroid distance. Compared with other density-based isolation criteria, our proposed centroid distance isolation criterion addresses the problems caused by high dimensionality and varying density. An experiment on a designed two-dimensional benchmark dataset shows that our proposed LASS algorithm not only inherits the advantage of the original dissimilarity increments clustering method in separating naturally isolated clusters but can also identify clusters which are adjacent, overlapping, and under background noise. Finally, we compared our LASS algorithm with the dissimilarity increments clustering method on a massive computer user dataset with over two million records containing demographic and behavioral information. The results show that the LASS algorithm works extremely well on this computer user dataset and can gain more knowledge from it. PMID:26221133

  19. Applying complex networks to evaluate precipitation patterns over South America

    NASA Astrophysics Data System (ADS)

    Ciemer, Catrin; Boers, Niklas; Barbosa, Henrique; Kurths, Jürgen; Rammig, Anja

    2016-04-01

    The climate of South America exhibits pronounced differences between the wet and the dry season, which are accompanied by specific synoptic events such as changes in the location of the South American Low Level Jet (SALLJ) and the establishment of the South American Convergence Zone (SACZ). The onset of these events can be related to the presence of typical large-scale precipitation patterns over South America, as previous studies have shown [1,2]. The application of complex network methods to precipitation data has recently received increased scientific attention for the special case of extreme events, as such methods make it possible to analyze the spatiotemporal correlation structure as well as possible teleconnections of these events [3,4]. In these approaches the correlation between precipitation datasets is calculated by means of Event Synchronization, which restricts their applicability to extreme precipitation events. In this work, we propose a method that can consider not only extreme precipitation but complete time series. A direct application of standard similarity measures to precipitation time series is impossible due to their intricate statistical properties, such as the large proportion of zeros. We therefore introduced and evaluated a suitable modification of Pearson's correlation coefficient to construct spatial correlation networks of precipitation. By analyzing the characteristics of spatial correlation networks constructed on the basis of this new measure, we are able to determine coherent areas of similar precipitation patterns, spot teleconnections between correlated areas, and detect central regions of precipitation correlation. By analyzing the change of the network over the year [5], we are also able to determine local and global changes in precipitation correlation patterns. Additionally, global network characteristics such as the network connectivity indicate the beginning and end of the wet and dry seasons. In order to identify large-scale synoptic events like the SACZ and SALLJ onset, detecting the changes of correlation over time between certain regions is of significant relevance. [1] Nieto-Ferreira et al., Quarterly Journal of the Royal Meteorological Society (2011). [2] Vera et al., Bulletin of the American Meteorological Society (2006). [3] Quiroga et al., Physical Review E (2002). [4] Boers et al., Nature Communications (2014). [5] Radebach et al., Physical Review E (2013).
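
    A toy sketch of building a precipitation correlation network; plain Pearson correlation stands in for the authors' modified coefficient, and the data are synthetic:

    ```python
    import numpy as np
    import networkx as nx

    # Toy daily rainfall for N grid cells; real series are zero-inflated, which
    # is why the authors modify Pearson's r. Plain r stands in for it here.
    rng = np.random.default_rng(3)
    rain = rng.gamma(0.2, 5.0, size=(200, 3650))     # (cells, days), many near-zero values

    C = np.corrcoef(rain)                            # cell-by-cell correlation matrix
    np.fill_diagonal(C, 0.0)
    threshold = np.quantile(np.abs(C), 0.95)         # keep only the strongest 5% of links

    G = nx.from_numpy_array((np.abs(C) >= threshold).astype(int))
    degree = dict(G.degree())                        # hubs ~ central regions of coherent rainfall
    print("max degree:", max(degree.values()))
    ```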

  20. Simulation of extreme rainfall and projection of future changes using the GLIMCLIM model

    NASA Astrophysics Data System (ADS)

    Rashid, Md. Mamunur; Beecham, Simon; Chowdhury, Rezaul Kabir

    2017-10-01

    In this study, the performance of the Generalized LInear Modelling of daily CLImate sequence (GLIMCLIM) statistical downscaling model was assessed in simulating extreme rainfall indices and annual maximum daily rainfall (AMDR), with daily rainfall downscaled from National Centers for Environmental Prediction (NCEP) reanalysis and Coupled Model Intercomparison Project Phase 5 (CMIP5) general circulation model (GCM) output datasets (four GCMs and two scenarios); changes were then estimated for the future period 2041-2060. The model was able to reproduce the monthly variations in the extreme rainfall indices reasonably well when forced by the NCEP reanalysis datasets. Frequency Adapted Quantile Mapping (FAQM) was used to remove bias in the daily rainfall simulated when forced by CMIP5 GCMs, which reduced the discrepancy between observed and simulated extreme rainfall indices. Although the observed AMDR were within the 2.5th and 97.5th percentiles of the simulated AMDR, the model consistently under-predicted the inter-annual variability of AMDR. A non-stationary model was developed using the generalized linear model for location, shape and scale to estimate the AMDR with an annual exceedance probability of 0.01. The study shows that, in general, AMDR is likely to decrease in the future. The Onkaparinga catchment will also experience drier conditions due to an increase in consecutive dry days coinciding with decreases in heavy (above the long-term 90th percentile) rainfall days, the empirical 90th quantile of rainfall, and the maximum 5-day consecutive total rainfall for the future period (2041-2060) compared to the base period (1961-2000).
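
    A minimal empirical quantile mapping sketch in NumPy; FAQM itself additionally adapts mismatched wet-day frequencies, which this simplified version does not:

    ```python
    import numpy as np

    def quantile_map(model_hist, obs_hist, model_fut):
        """Map model values onto the observed distribution via empirical CDFs.

        Plain empirical quantile mapping; Frequency Adapted Quantile Mapping
        additionally handles differing wet-day frequencies.
        """
        q = np.linspace(0.01, 0.99, 99)
        mq = np.quantile(model_hist, q)              # model climatology quantiles
        oq = np.quantile(obs_hist, q)                # observed climatology quantiles
        return np.interp(model_fut, mq, oq)          # transfer function applied to model output

    rng = np.random.default_rng(4)
    obs = rng.gamma(2.0, 3.0, 5000)
    mod = rng.gamma(2.0, 4.0, 5000)                  # biased model rainfall
    print(quantile_map(mod, obs, mod[:5]))
    ```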

  1. Large-scale machine learning and evaluation platform for real-time traffic surveillance

    NASA Astrophysics Data System (ADS)

    Eichel, Justin A.; Mishra, Akshaya; Miller, Nicholas; Jankovic, Nicholas; Thomas, Mohan A.; Abbott, Tyler; Swanson, Douglas; Keller, Joel

    2016-09-01

    In traffic engineering, vehicle detectors are trained on limited datasets, resulting in poor accuracy when deployed in real-world surveillance applications. Annotating large-scale, high-quality datasets is challenging. Typically, these datasets have limited diversity; they do not reflect the real-world operating environment. There is a need for a large-scale, cloud-based positive and negative mining process and a large-scale learning and evaluation system for the application of automatic traffic measurements and classification. The proposed positive and negative mining process addresses the quality of crowd-sourced ground truth data through machine learning review and human feedback mechanisms. The proposed learning and evaluation system uses a distributed cloud computing framework to handle the data-scaling issues associated with large numbers of samples and a high-dimensional feature space. The system is trained using AdaBoost on 1,000,000 Haar-like features extracted from 70,000 annotated video frames. The trained real-time vehicle detector achieves an accuracy of at least 95% for 1/2, and about 78% for 19/20, of the time when tested on ~7,500,000 video frames. At the end of 2016, the dataset is expected to have over 1 billion annotated video frames.
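
    A small sketch of the Haar-features-plus-AdaBoost pipeline using scikit-image and scikit-learn on synthetic patches; the patch size, labels and single feature family are illustrative, not the paper's configuration:

    ```python
    import numpy as np
    from skimage.feature import haar_like_feature
    from skimage.transform import integral_image
    from sklearn.ensemble import AdaBoostClassifier

    # Tiny stand-in for vehicle/background patches mined from video frames.
    rng = np.random.default_rng(5)
    patches = rng.random((200, 16, 16))
    labels = rng.integers(0, 2, 200)                 # hypothetical vehicle / background labels
    patches[labels == 1, 6:10, :] += 0.5             # give positives crude horizontal structure

    def haar(patch):
        ii = integral_image(patch)
        return haar_like_feature(ii, 0, 0, patch.shape[1], patch.shape[0],
                                 feature_type="type-3-y")   # one family, for speed

    X = np.array([haar(p) for p in patches])
    clf = AdaBoostClassifier(n_estimators=50).fit(X, labels)
    print("training accuracy:", clf.score(X, labels))
    ```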

  2. Transforming the Geocomputational Battlespace Framework with HDF5

    DTIC Science & Technology

    2010-08-01

    At the layout level, dataset arrays can be stored in chunks or tiles, enabling fast subsetting of large datasets, including compressed datasets. HDF software... a. a Controlled Image Base (CIB) image of the AOI: an orthophoto made from rectified grayscale aerial images; b. an IKONOS satellite image made up of 3 spectral...
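
    A short h5py sketch of the chunked/tiled layout idea on a hypothetical raster; the chunk shape and compression settings are illustrative:

    ```python
    import h5py
    import numpy as np

    # Chunked (tiled) storage: reading a small subvolume touches only the chunks
    # that intersect it, so subsetting stays fast even for compressed datasets.
    with h5py.File("terrain.h5", "w") as f:
        dem = f.create_dataset("elevation", shape=(20000, 20000), dtype="f4",
                               chunks=(256, 256), compression="gzip")
        dem[:1024, :1024] = np.random.rand(1024, 1024)   # only written chunks are stored

    with h5py.File("terrain.h5", "r") as f:
        tile = f["elevation"][512:768, 512:768]          # decompresses a few chunks, not ~1.6 GB
    ```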

  3. Design and interpretation of cell trajectory assays

    PubMed Central

    Bowden, Lucie G.; Simpson, Matthew J.; Baker, Ruth E.

    2013-01-01

    Cell trajectory data are often reported in the experimental cell biology literature to distinguish between different types of cell migration. Unfortunately, there is no accepted protocol for designing or interpreting such experiments and this makes it difficult to quantitatively compare different published datasets and to understand how changes in experimental design influence our ability to interpret different experiments. Here, we use an individual-based mathematical model to simulate the key features of a cell trajectory experiment. This shows that our ability to correctly interpret trajectory data is extremely sensitive to the geometry and timing of the experiment, the degree of motility bias and the number of experimental replicates. We show that cell trajectory experiments produce data that are most reliable when the experiment is performed in a quasi-one-dimensional geometry with a large number of identically prepared experiments conducted over a relatively short time-interval rather than a few trajectories recorded over particularly long time-intervals. PMID:23985736
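
    A minimal sketch of the kind of individual-based trajectory simulation described, as biased 2D random walks; the step length, bias and replicate counts are invented parameters:

    ```python
    import numpy as np

    def simulate_trajectories(n_cells=50, n_steps=100, step=1.0, bias=0.2, seed=6):
        """Biased 2D random walks standing in for recorded cell trajectories."""
        rng = np.random.default_rng(seed)
        theta = rng.uniform(0, 2 * np.pi, size=(n_cells, n_steps))
        dx = step * np.cos(theta) + bias          # constant drift along +x
        dy = step * np.sin(theta)
        return np.cumsum(dx, axis=1), np.cumsum(dy, axis=1)

    x, y = simulate_trajectories()
    # Drift per step estimated from endpoints; many short, identically prepared
    # replicates are more informative than a few long ones, as the paper argues.
    print("estimated bias:", (x[:, -1] / x.shape[1]).mean())
    ```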

  4. Machine Learning Toolkit for Extreme Scale

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    2014-03-31

    Support Vector Machines (SVM) is a popular machine learning technique, which has been applied to a wide range of domains such as science, finance, and social networks for supervised learning. MaTEx undertakes the challenge of designing a scalable parallel SVM training algorithm for large-scale systems, which include commodity multi-core machines, tightly connected supercomputers, and cloud computing systems. Several techniques are proposed for improved speed and memory usage, including adaptive and aggressive elimination of samples for faster convergence, and sparse-format representation of data samples. Several heuristics, ranging from earliest-possible to lazy elimination of non-contributing samples, are considered in MaTEx. In many cases, where an early sample elimination might result in a false positive, low-overhead mechanisms for the reconstruction of key data structures are proposed. The proposed algorithm and heuristics are implemented and evaluated on various publicly available datasets.

  5. Bringing an ecological view of change to Landsat-based remote sensing

    USGS Publications Warehouse

    Kennedy, Robert E.; Andrefouet, Serge; Cohen, Warren; Gomez, Cristina; Griffiths, Patrick; Hais, Martin; Healey, Sean; Helmer, Eileen H.; Hostert, Patrick; Lyons, Mitchell; Meigs, Garrett; Pflugmacher, Dirk; Phinn, Stuart; Powell, Scott; Scarth, Peter; Susmita, Sen; Schroeder, Todd A.; Schneider, Annemarie; Sonnenschein, Ruth; Vogelmann, James; Wulder, Michael A.; Zhu, Zhe

    2014-01-01

    When characterizing the processes that shape ecosystems, ecologists increasingly use the unique perspective offered by repeat observations of remotely sensed imagery. However, the concept of change embodied in much of the traditional remote-sensing literature was primarily limited to capturing large or extreme changes occurring in natural systems, omitting many more subtle processes of interest to ecologists. Recent technical advances have led to a fundamental shift toward an ecological view of change. Although this conceptual shift began with coarser-scale global imagery, it has now reached users of Landsat imagery, since these datasets have temporal and spatial characteristics appropriate to many ecological questions. We argue that this ecologically relevant perspective of change allows the novel characterization of important dynamic processes, including disturbances, long-term trends, cyclical functions, and feedbacks, and that these improvements are already facilitating our understanding of critical driving forces, such as climate change, ecological interactions, and economic pressures.

  6. Megastudies, crowdsourcing, and large datasets in psycholinguistics: An overview of recent developments.

    PubMed

    Keuleers, Emmanuel; Balota, David A

    2015-01-01

    This paper introduces and summarizes the special issue on megastudies, crowdsourcing, and large datasets in psycholinguistics. We provide a brief historical overview and show how the papers in this issue have extended the field by compiling new databases and making important theoretical contributions. In addition, we discuss several studies that use text corpora to build distributional semantic models to tackle various interesting problems in psycholinguistics. Finally, as is the case across the papers, we highlight some methodological issues that are brought forth via the analyses of such datasets.

  7. Sleep stages identification in patients with sleep disorder using k-means clustering

    NASA Astrophysics Data System (ADS)

    Fadhlullah, M. U.; Resahya, A.; Nugraha, D. F.; Yulita, I. N.

    2018-05-01

    Data mining is a computational intelligence discipline in which a large dataset is processed with particular methods to look for patterns within it. These patterns are then used in real-time applications or to develop certain knowledge, making data mining a valuable tool for solving complex problems, discovering new knowledge, data analysis and decision making. To find the patterns that lie inside a large dataset, clustering methods are used: clustering groups data that look similar, so that patterns become visible in the large dataset. Several clustering algorithms exist to group the data into corresponding clusters. This research used data from patients who suffer from sleep disorders and aims to help the medical community reduce the time required to classify the sleep stages of such patients. The study used the K-Means algorithm and silhouette evaluation to find that 3 clusters are optimal for this dataset, meaning the data can be divided into 3 sleep stages.
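
    A compact sketch of the K-Means-plus-silhouette procedure with scikit-learn on synthetic features; the real study clustered features derived from sleep-disorder patient recordings:

    ```python
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.metrics import silhouette_score

    rng = np.random.default_rng(7)
    X = rng.normal(size=(500, 8))        # stand-in for per-epoch physiological features

    scores = {}
    for k in range(2, 7):
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        scores[k] = silhouette_score(X, labels)   # higher = better-separated clusters

    best_k = max(scores, key=scores.get)          # the study found k = 3 (three sleep stages)
    print(scores, "->", best_k)
    ```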

  8. Image segmentation evaluation for very-large datasets

    NASA Astrophysics Data System (ADS)

    Reeves, Anthony P.; Liu, Shuang; Xie, Yiting

    2016-03-01

    With the advent of modern machine learning methods and fully automated image analysis there is a need for very large image datasets having documented segmentations for both computer algorithm training and evaluation. Current approaches of visual inspection and manual markings do not scale well to big data. We present a new approach that depends on fully automated algorithm outcomes for segmentation documentation, requires no manual marking, and provides quantitative evaluation for computer algorithms. The documentation of new image segmentations and new algorithm outcomes are achieved by visual inspection. The burden of visual inspection on large datasets is minimized by (a) customized visualizations for rapid review and (b) reducing the number of cases to be reviewed through analysis of quantitative segmentation evaluation. This method has been applied to a dataset of 7,440 whole-lung CT images for 6 different segmentation algorithms designed to fully automatically facilitate the measurement of a number of very important quantitative image biomarkers. The results indicate that we could achieve 93% to 99% successful segmentation for these algorithms on this relatively large image database. The presented evaluation method may be scaled to much larger image databases.
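
    Where documented reference segmentations exist, the quantitative comparison can be as simple as an overlap score; below is a Dice-coefficient sketch on toy masks (one common choice, not necessarily the paper's exact metric):

    ```python
    import numpy as np

    def dice(a, b):
        """Dice overlap between two binary masks; 1.0 means perfect agreement."""
        a, b = a.astype(bool), b.astype(bool)
        denom = a.sum() + b.sum()
        return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0

    auto = np.zeros((128, 128), bool); auto[30:90, 30:90] = True    # algorithm output
    ref = np.zeros((128, 128), bool); ref[35:95, 35:95] = True      # accepted segmentation
    print(f"Dice = {dice(auto, ref):.3f}")
    ```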

  9. Numericware i: Identical by State Matrix Calculator

    PubMed Central

    Kim, Bongsong; Beavis, William D

    2017-01-01

    We introduce software, Numericware i, to compute identical by state (IBS) matrix based on genotypic data. Calculating an IBS matrix with a large dataset requires large computer memory and takes lengthy processing time. Numericware i addresses these challenges with 2 algorithmic methods: multithreading and forward chopping. The multithreading allows computational routines to concurrently run on multiple central processing unit (CPU) processors. The forward chopping addresses memory limitation by dividing a dataset into appropriately sized subsets. Numericware i allows calculation of the IBS matrix for a large genotypic dataset using a laptop or a desktop computer. For comparison with different software, we calculated genetic relationship matrices using Numericware i, SPAGeDi, and TASSEL with the same genotypic dataset. Numericware i calculates IBS coefficients between 0 and 2, whereas SPAGeDi and TASSEL produce different ranges of values including negative values. The Pearson correlation coefficient between the matrices from Numericware i and TASSEL was high at .9972, whereas SPAGeDi showed low correlation with Numericware i (.0505) and TASSEL (.0587). With a high-dimensional dataset of 500 entities by 10 000 000 SNPs, Numericware i spent 382 minutes using 19 CPU threads and 64 GB memory by dividing the dataset into 3 pieces, whereas SPAGeDi and TASSEL failed with the same dataset. Numericware i is freely available for Windows and Linux under CC-BY 4.0 license at https://figshare.com/s/f100f33a8857131eb2db. PMID:28469375
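
    A NumPy sketch of one common IBS formulation on 0/1/2 allele counts, looping row-by-row to bound memory in the spirit of forward chopping; Numericware i's exact coding may differ:

    ```python
    import numpy as np

    def ibs_matrix(G):
        """Pairwise identical-by-state coefficients in [0, 2].

        G: (n_individuals, n_markers) genotypes coded 0/1/2 (allele counts).
        The per-marker IBS score is 2 - |g_i - g_j|; averaging over markers
        gives 2 for identical genotypes and 0 for opposite homozygotes.
        """
        n = G.shape[0]
        M = np.empty((n, n))
        for i in range(n):          # one row at a time caps peak memory use
            M[i] = 2.0 - np.abs(G - G[i]).mean(axis=1)
        return M

    G = np.random.default_rng(8).integers(0, 3, size=(100, 5000))
    print(ibs_matrix(G)[:3, :3])
    ```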

  10. Large-Scale Pattern Discovery in Music

    NASA Astrophysics Data System (ADS)

    Bertin-Mahieux, Thierry

    This work focuses on extracting patterns in musical data from very large collections. The problem is split in two parts. First, we build such a large collection, the Million Song Dataset, to provide researchers access to commercial-size datasets. Second, we use this collection to study cover song recognition which involves finding harmonic patterns from audio features. Regarding the Million Song Dataset, we detail how we built the original collection from an online API, and how we encouraged other organizations to participate in the project. The result is the largest research dataset with heterogeneous sources of data available to music technology researchers. We demonstrate some of its potential and discuss the impact it already has on the field. On cover song recognition, we must revisit the existing literature since there are no publicly available results on a dataset of more than a few thousand entries. We present two solutions to tackle the problem, one using a hashing method, and one using a higher-level feature computed from the chromagram (dubbed the 2DFTM). We further investigate the 2DFTM since it has potential to be a relevant representation for any task involving audio harmonic content. Finally, we discuss the future of the dataset and the hope of seeing more work making use of the different sources of data that are linked in the Million Song Dataset. Regarding cover songs, we explain how this might be a first step towards defining a harmonic manifold of music, a space where harmonic similarities between songs would be more apparent.
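
    A tiny sketch of the 2DFTM idea on a synthetic chroma patch: taking the 2-D FFT magnitude discards phase, so circular shifts in pitch (key transposition) and offsets in time leave the descriptor largely unchanged. The patch dimensions are hypothetical:

    ```python
    import numpy as np

    rng = np.random.default_rng(9)
    chroma = rng.random((12, 75))                 # hypothetical 12 x 75-frame beat-chroma patch

    feat = np.abs(np.fft.fft2(chroma)).flatten()  # transposition/offset-robust descriptor
    feat /= np.linalg.norm(feat)                  # normalize before comparing songs
    ```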

  11. LIPS database with LIPService: a microscopic image database of intracellular structures in Arabidopsis guard cells.

    PubMed

    Higaki, Takumi; Kutsuna, Natsumaro; Hasezawa, Seiichiro

    2013-05-16

    Intracellular configuration is an important feature of cell status. Recent advances in microscopic imaging techniques allow us to easily obtain a large number of microscopic images of intracellular structures. In this circumstance, automated microscopic image recognition techniques are of extreme importance to future phenomics/visible screening approaches. However, there was no benchmark microscopic image dataset for intracellular organelles in a specified plant cell type. We previously established the Live Images of Plant Stomata (LIPS) database, a publicly available collection of optical-section images of various intracellular structures of plant guard cells, as a model system of environmental signal perception and transduction. Here we report recent updates to the LIPS database and the establishment of a database table, LIPService. We updated the LIPS dataset and established a new interface named LIPService to promote efficient inspection of intracellular structure configurations. Cell nuclei, microtubules, actin microfilaments, mitochondria, chloroplasts, endoplasmic reticulum, peroxisomes, endosomes, Golgi bodies, and vacuoles can be filtered using probe names or morphometric parameters such as stomatal aperture. In addition to the serial optical sectional images of the original LIPS database, new volume-rendering data for easy web browsing of three-dimensional intracellular structures have been released to allow easy inspection of their configurations or relationships with cell status/morphology. We also demonstrated the utility of the new LIPS image database for automated organelle recognition of images from another plant cell image database with image clustering analyses. The updated LIPS database provides a benchmark image dataset for representative intracellular structures in Arabidopsis guard cells. The newly released LIPService allows users to inspect the relationship between organellar three-dimensional configurations and morphometrical parameters.

  12. Prediction of constitutive A-to-I editing sites from human transcriptomes in the absence of genomic sequences

    PubMed Central

    2013-01-01

    Background Adenosine-to-inosine (A-to-I) RNA editing is recognized as a cellular mechanism for generating both RNA and protein diversity. Inosine base pairs with cytidine during reverse transcription and therefore appears as guanosine during sequencing of cDNA. Current approaches of RNA editing identification largely depend on the comparison between transcriptomes and genomic DNA (gDNA) sequencing datasets from the same individuals, and it has been challenging to identify editing candidates from transcriptomes in the absence of gDNA information. Results We have developed a new strategy to accurately predict constitutive RNA editing sites from publicly available human RNA-seq datasets in the absence of relevant genomic sequences. Our approach establishes new parameters to increase the ability to map mismatches and to minimize sequencing/mapping errors and unreported genome variations. We identified 695 novel constitutive A-to-I editing sites that appear in clusters (named “editing boxes”) in multiple samples and which exhibit spatial and dynamic regulation across human tissues. Some of these editing boxes are enriched in non-repetitive regions lacking inverted repeat structures and contain an extremely high conversion frequency of As to Is. We validated a number of editing boxes in multiple human cell lines and confirmed that ADAR1 is responsible for the observed promiscuous editing events in non-repetitive regions, further expanding our knowledge of the catalytic substrate of A-to-I RNA editing by ADAR enzymes. Conclusions The approach we present here provides a novel way of identifying A-to-I RNA editing events by analyzing only RNA-seq datasets. This method has allowed us to gain new insights into RNA editing and should also aid in the identification of more constitutive A-to-I editing sites from additional transcriptomes. PMID:23537002

  13. Clock Agreement Among Parallel Supercomputer Nodes

    DOE Data Explorer

    Jones, Terry R.; Koenig, Gregory A.

    2014-04-30

    This dataset presents measurements that quantify the clock synchronization time-agreement characteristics among several high performance computers including the current world's most powerful machine for open science, the U.S. Department of Energy's Titan machine sited at Oak Ridge National Laboratory. These ultra-fast machines derive much of their computational capability from extreme node counts (over 18000 nodes in the case of the Titan machine). Time-agreement is commonly utilized by parallel programming applications and tools, distributed programming application and tools, and system software. Our time-agreement measurements detail the degree of time variance between nodes and how that variance changes over time. The dataset includes empirical measurements and the accompanying spreadsheets.

  14. SamSelect: a sample sequence selection algorithm for quorum planted motif search on large DNA datasets.

    PubMed

    Yu, Qiang; Wei, Dingbang; Huo, Hongwei

    2018-06-18

    Given a set of t n-length DNA sequences, q satisfying 0 < q ≤ 1, and l and d satisfying 0 ≤ d < l < n, the quorum planted motif search (qPMS) finds l-length strings that occur in at least qt input sequences with up to d mismatches and is mainly used to locate transcription factor binding sites in DNA sequences. Existing qPMS algorithms have been able to efficiently process small standard datasets (e.g., t = 20 and n = 600), but they are too time consuming to process large DNA datasets, such as ChIP-seq datasets that contain thousands of sequences or more. We analyze the effects of t and q on the time performance of qPMS algorithms and find that a large t or a small q causes a longer computation time. Based on this information, we improve the time performance of existing qPMS algorithms by selecting a sample sequence set D' with a small t and a large q from the large input dataset D and then executing qPMS algorithms on D'. A sample sequence selection algorithm named SamSelect is proposed. The experimental results on both simulated and real data show (1) that SamSelect can select D' efficiently and (2) that the qPMS algorithms executed on D' can find implanted or real motifs in a significantly shorter time than when executed on D. We improve the ability of existing qPMS algorithms to process large DNA datasets from the perspective of selecting high-quality sample sequence sets so that the qPMS algorithms can find motifs in a short time in the selected sample sequence set D', rather than take an unfeasibly long time to search the original sequence set D. Our motif discovery method is an approximate algorithm.
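
    A brute-force reference sketch of the qPMS definition for toy inputs; it enumerates all 4^l candidate motifs, which is exactly the cost that motivates shrinking t and raising q via sample selection:

    ```python
    from itertools import product

    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))

    def qpms(seqs, l, d, q):
        """Naive quorum planted motif search over all 4^l candidates.

        Returns every l-mer occurring within d mismatches in at least q*t
        sequences. Exponential in l: fine for toy inputs only, which is why
        practical solvers (and sample selection) are needed for real data.
        """
        t = len(seqs)
        motifs = []
        for cand in map("".join, product("ACGT", repeat=l)):
            support = sum(any(hamming(cand, s[i:i + l]) <= d
                              for i in range(len(s) - l + 1)) for s in seqs)
            if support >= q * t:
                motifs.append(cand)
        return motifs

    seqs = ["ACGTACGTGG", "TTACGAACGT", "GGACGTTTAC"]
    print(qpms(seqs, l=5, d=1, q=1.0))
    ```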

  15. Low fidelity of CORDEX and their driving experiments indicates future climatic uncertainty over Himalayan watersheds of Indus basin

    NASA Astrophysics Data System (ADS)

    Hasson, Shabeh ul; Böhner, Jürgen; Chishtie, Farrukh

    2018-03-01

    Assessment of future water availability from the Himalayan watersheds of the Indus Basin (Jhelum, Kabul and the upper Indus basin, UIB) is a growing concern for safeguarding sustainable socioeconomic wellbeing downstream. This requires, above all, robust climate change information from the present-day state-of-the-art climate models. However, the robustness of climate change projections depends strongly on the fidelity of the climate modelling experiments. Hence, this study assesses the fidelity of seven dynamically refined (0.44°) experiments, performed under the framework of the Coordinated Regional Climate Downscaling Experiment for South Asia (CX-SA), and, additionally, of their six coarse-resolution driving datasets participating in the Coupled Model Intercomparison Project Phase 5 (CMIP5). We assess fidelity in terms of the reproducibility of the observed climatology of temperature and precipitation, and of the seasonality of the latter, for the historical period (1971-2005). Based on the model fidelity results, we further assess the robustness, or uncertainty, of the far-future climate (2061-2095) as projected under the extreme-end warming scenario of representative concentration pathway (RCP) 8.5. Our results show that the CX-SA and their driving CMIP5 experiments consistently feature low fidelity in terms of the chosen skill metrics, exhibiting substantial cold (6-10 °C) and wet (up to 80%) biases and underestimating the observed precipitation seasonality. Surprisingly, the CX-SA are unable to outperform their driving datasets. Further, the biases of the CX-SA and of their driving CMIP5 datasets are larger in magnitude than their projected changes under RCP8.5 (and hence under less extreme RCPs) by the end of the 21st century, indicating uncertain future climates for the Indus Basin watersheds. Higher inter-dataset disagreement of both CMIP5 and CX-SA for the simulated historical precipitation and for its projected changes reinforces the uncertainty of future wet/dry conditions, whereas the CMIP5-projected warming is less robust owing to higher historical-period uncertainty. Interestingly, a better agreement among those CX-SA experiments obtained by downscaling different CMIP5 experiments with the same regional climate model (RCM) indicates the RCMs' ability to modulate the influence of lateral boundary conditions over a large domain. These findings, instead of suggesting the usual skill-based identification of 'reasonable' global or regional low-fidelity experiments, emphasize a paradigm shift towards improving fidelity by exploiting the potential of meso-to-local scale climate models, preferably those that can solely resolve global-to-local scale climatic processes, in terms of microphysics, resolution and explicitly resolved convection. Additionally, extensive monitoring of the nival regime within the Himalayan watersheds will reduce the observational uncertainty, allowing a more robust fidelity assessment of climate modelling experiments.

  16. Evaluation and adjustment of altimeter measurement and numerical hindcast in wave height trend estimation in China's coastal seas

    NASA Astrophysics Data System (ADS)

    Li, Shuiqing; Guan, Shoude; Hou, Yijun; Liu, Yahao; Bi, Fan

    2018-05-01

    A long-term trend of significant wave height (SWH) in China's coastal seas was examined based on three datasets derived from satellite measurements and numerical hindcasts. One set of altimeter data were obtained from the GlobWave, while the other two datasets of numerical hindcasts were obtained from the third-generation wind wave model, WAVEWATCH III, forced by wind fields from the Cross-Calibrated Multi-Platform (CCMP) and NCEP's Climate Forecast System Reanalysis (CFSR). The mean and extreme wave trends were estimated for the period 1992-2010 with respect to the annual mean and the 99th-percentile values of SWH, respectively. The altimeter wave trend estimates feature considerable uncertainties owing to the sparse sampling rate. Furthermore, the extreme wave trend tends to be overestimated because of the increasing sampling rate over time. Numerical wave trends strongly depend on the quality of the wind fields, as the CCMP waves significantly overestimate the wave trend, whereas the CFSR waves tend to underestimate the trend. Corresponding adjustments were applied which effectively improved the trend estimates from the altimeter and numerical data. The adjusted results show generally increasing mean wave trends, while the extreme wave trends are more spatially-varied, from decreasing trends prevailing in the South China Sea to significant increasing trends mainly in the East China Sea.

  17. An Examination of the Hadley Sea-Surface Temperature Time Series for the Nino 3.4 Region

    NASA Technical Reports Server (NTRS)

    Wilson, Robert M.

    2010-01-01

    The Hadley sea-surface temperature (HadSST) dataset is investigated for the interval 1871-2008. The purpose of this investigation is to determine the degree of success in identifying and characterizing El Niño/Southern Oscillation (ENSO) extreme events, both El Niño (EN) and La Niña (LN) events. Comparisons are made against both the Southern Oscillation Index for the same time interval and published values of the Oceanic Niño Index for the interval since 1950. Some 60 ENSO extreme events are identified in the HadSST dataset, consisting of 33 EN and 27 LN events. Also, preferential associations are found to exist between the duration of ENSO extreme events and their maximum anomalous excursion temperatures, and between the recurrence rate for an EN event and the duration of the last known EN event. Because the present ongoing EN is a strong event, it should persist 11 months or longer, implying that the next EN event should not be expected until June 2012 or later. Furthermore, the decadal sum of EN-related months is found to have increased somewhat steadily since the decade of 1920-1929, suggesting that the present decade (2010-2019) will possibly see about 3-4 EN events, totaling about 37 +/- 3 EN-related months (i.e., months that meet the definition for the occurrence of an EN event).

  18. A curated transcriptomic dataset collection relevant to embryonic development associated with in vitro fertilization in healthy individuals and patients with polycystic ovary syndrome.

    PubMed

    Mackeh, Rafah; Boughorbel, Sabri; Chaussabel, Damien; Kino, Tomoshige

    2017-01-01

    The collection of large-scale datasets available in public repositories is rapidly growing, providing opportunities to identify and fill gaps in different fields of biomedical research. However, users of these datasets should be able to selectively browse datasets related to their field of interest. Here we make available a collection of transcriptome datasets related to human follicular cells, from normal individuals or patients with polycystic ovary syndrome, in the process of their development during in vitro fertilization. After RNA-seq dataset exclusion and careful selection based on study description and sample information, 12 datasets, encompassing a total of 85 unique transcriptome profiles, were identified in the NCBI Gene Expression Omnibus and uploaded to the Gene Expression Browser (GXB), a web application specifically designed for interactive query and visualization of integrated large-scale data. Once the datasets were annotated in GXB, multiple sample groupings were made in order to create rank lists that allow easy data interpretation and comparison. The GXB tool also allows users to browse a single gene across multiple projects to evaluate its expression profiles in multiple biological systems/conditions in web-based, customized graphical views. The curated dataset collection is accessible at the following link: http://ivf.gxbsidra.org/dm3/landing.gsp.

  19. A curated transcriptomic dataset collection relevant to embryonic development associated with in vitro fertilization in healthy individuals and patients with polycystic ovary syndrome

    PubMed Central

    Mackeh, Rafah; Boughorbel, Sabri; Chaussabel, Damien; Kino, Tomoshige

    2017-01-01

    The collection of large-scale datasets available in public repositories is rapidly growing, providing opportunities to identify and fill gaps in different fields of biomedical research. However, users of these datasets should be able to selectively browse datasets related to their field of interest. Here we make available a collection of transcriptome datasets related to human follicular cells, from normal individuals or patients with polycystic ovary syndrome, in the process of their development during in vitro fertilization. After RNA-seq dataset exclusion and careful selection based on study description and sample information, 12 datasets, encompassing a total of 85 unique transcriptome profiles, were identified in the NCBI Gene Expression Omnibus and uploaded to the Gene Expression Browser (GXB), a web application specifically designed for interactive query and visualization of integrated large-scale data. Once the datasets were annotated in GXB, multiple sample groupings were made in order to create rank lists that allow easy data interpretation and comparison. The GXB tool also allows users to browse a single gene across multiple projects to evaluate its expression profiles in multiple biological systems/conditions in web-based, customized graphical views. The curated dataset collection is accessible at the following link: http://ivf.gxbsidra.org/dm3/landing.gsp. PMID:28413616

  20. Global patterns of drought recovery

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Schwalm, Christopher R.; Anderegg, William R. L.; Michalak, Anna M.

    Drought is a recurring multi-factor phenomenon with major impacts on natural and human systems [1-3]. Drought is especially important for land carbon sink variability, influencing climate regulation of the terrestrial biosphere [4]. While 20th Century trends in drought regime are ambiguous, "more extreme extremes" as well as more frequent and severe droughts [3,7] are expected in the 21st Century. Recovery time, the length of time an ecosystem requires to revert to its pre-drought functional state, is a critical metric of drought impact. Yet the spatiotemporal patterning and controls of drought recovery are largely unknown. Here we use three distinct global datasets of gross primary productivity to show that, across diverse terrestrial ecosystems, drought recovery times are driven by biological productivity and biodiversity, with drought length and severity of secondary importance. Recovery time, especially for extreme droughts, and the areal extent of ecosystems in recovery from drought generally increased over the 20th Century, supporting a global increase in drought impact [8]. Our results indicate that if future Anthropocene droughts become more widespread, as expected, droughts will become more frequent relative to recovery time. This increases the risk of entering a new regime in which vegetation never recovers to its original state and widespread degradation of the land carbon sink ensues.
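
    Following the paper's definition of recovery time (time to revert to the pre-drought functional state), here is a minimal sketch of how such a metric could be computed from a monthly GPP series; the function and window length are illustrative assumptions, not the study's procedure.

      import numpy as np

      def recovery_time(gpp, drought_start, drought_end, baseline_window=36):
          """Months after drought_end until GPP first returns to its
          pre-drought mean; NaN if it never recovers in the record."""
          lo = max(0, drought_start - baseline_window)
          baseline = np.mean(gpp[lo:drought_start])   # pre-drought state
          post = np.asarray(gpp[drought_end:])
          hits = np.nonzero(post >= baseline)[0]
          return float(hits[0]) if hits.size else float("nan")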

  1. Extreme coastal erosion enhanced by anomalous extratropical storm wave direction.

    PubMed

    Harley, Mitchell D; Turner, Ian L; Kinsela, Michael A; Middleton, Jason H; Mumford, Peter J; Splinter, Kristen D; Phillips, Matthew S; Simmons, Joshua A; Hanslow, David J; Short, Andrew D

    2017-07-20

    Extratropical cyclones (ETCs) are the primary driver of large-scale episodic beach erosion along coastlines in temperate regions. However, key drivers of the magnitude and regional variability in rapid morphological changes caused by ETCs at the coast remain poorly understood. Here we analyze an unprecedented dataset of high-resolution regional-scale morphological response to an ETC that impacted southeast Australia, and evaluate the new observations within the context of an existing long-term coastal monitoring program. This ETC was characterized by moderate intensity (for this regional setting) deepwater wave heights, but an anomalous wave direction approximately 45 degrees more counter-clockwise than average. The magnitude of measured beach volume change was the largest in four decades at the long-term monitoring site and, at the regional scale, commensurate with that observed due to extreme North Atlantic hurricanes. Spatial variability in morphological response across the study region was predominantly controlled by alongshore gradients in storm wave energy flux and local coastline alignment relative to storm wave direction. We attribute the severity of coastal erosion observed due to this ETC primarily to its anomalous wave direction, and call for greater research on the impacts of changing storm wave directionality in addition to projected future changes in wave heights.

  2. Baseflow recession analysis across the Eagle Ford shale play (Texas, USA)

    NASA Astrophysics Data System (ADS)

    Arciniega, Saul; Brena-Naranjo, Agustin; Hernandez-Espriu, Jose Antonio; Pedrozo-Acuña, Adrian

    2016-04-01

    Baseflow is an important process of the hydrological cycle, as it can be related to aquatic ecosystem health and groundwater recharge. The temporal and spatial dynamics of baseflow are typically governed by fluctuations in the water table of shallow aquifers; hence groundwater pumping and return flow can greatly modify baseflow patterns. More recently, in some regions of the world, the exploitation of gas trapped in shale formations by means of hydraulic fracturing (fracking) has raised major concerns about quantitative and qualitative groundwater impacts. Although fracking implies massive groundwater withdrawals, its contribution to baseflow decline has not yet been fully investigated. Furthermore, its impact relative to other human activities or climate extremes, such as irrigation or extreme droughts, remains largely unknown. This work analyzes baseflow recession time-space patterns for a set of watersheds located across the largest shale producer in the world, the Eagle Ford shale play in Texas (USA). The period of study (1985-2014) includes a pre-development and a post-development period. The dataset includes 56 hydrometric time series located inside and outside the shale play. Results show that during the development and expansion of the Eagle Ford play, around 70% of the time series displayed a significant decline, whereas no decline was observed during the pre-development period.

  3. Projected changes of extreme precipitation over Contiguous United States with Nested regional climate model (NRCM)

    NASA Astrophysics Data System (ADS)

    Wang, J.

    2013-12-01

    Extreme weather events have already significantly influenced North America. During 2005-2011, extreme events increased by 250%, from four or fewer events occurring in 2005 to 14 events occurring in 2011 (www.ncdc.noaa.gov/billions/). In addition, extreme rainfall amounts, frequency, and intensity are all expected to increase under greenhouse warming scenarios (Wehner 2005; Kharin et al. 2007; Tebaldi et al. 2006). Global models are powerful tools to investigate climate and climate change on large scales. However, such models do not represent local terrain and mesoscale weather systems well owing to their coarse horizontal resolution (150-300 km). To capture the fine-scale features of extreme weather events, regional climate models (RCMs) with a more realistic representation of complex terrain and heterogeneous land surfaces are needed (Mass et al. 2002). This study uses the Nested Regional Climate Model (NRCM) to perform regional-scale climate simulations at a high 12 km × 12 km resolution over North America (including Alaska; 600 × 515 grid cells in longitude and latitude), known as CORDEX_North America, instead of the small regions studied previously (e.g., Dominguez et al. 2012; Gao et al. 2012). The performance and biases of the NRCM extreme precipitation calculations (2000-2010) have been evaluated against PRISM precipitation (Daly et al. 1997) by Wang and Kotamarthi (2013): the NRCM replicated very well the monthly amount of extreme precipitation, with less than 3% overestimation over the eastern CONUS, and the frequency of extremes over the western CONUS and the upper Mississippi River Basin. The Representative Concentration Pathway (RCP) 8.5 and RCP 4.5 scenarios from the new Community Earth System Model version 1.0 (CESM v1.0) are dynamically downscaled to predict extreme rainfall events at the end of the century (2085-2095) and to explore the uncertainties of future extreme precipitation induced by different scenarios over distinct regions. We have corrected the CO2 atmospheric concentration in the longwave/shortwave radiation schemes of the NRCM according to the datasets recommended by CMIP5 (Clarke et al. 2007; Riahi et al. 2007). We have also corrected an inconsistency in skin temperature during the downscaling process by modifying the land/sea mask of CLM 4.0, as mentioned by Gao et al. (2012). Acknowledgements: This work was supported under a military interdepartmental purchase request from the SERDP, RC-2242, through U.S. Department of Energy contract DE-AC02-06CH11357.

  4. Associations between Smoking and Extreme Dieting among Adolescents

    ERIC Educational Resources Information Center

    Seo, Dong-Chul; Jiang, Nan

    2009-01-01

    This study examined the association between cigarette smoking and dieting behaviors and trends in that association among US adolescents in grades 9-12 between 1999 and 2007. Youth Risk Behavior Survey datasets were analyzed using the multivariable logistic regression method. The sample size of each survey year ranged from 13,554 to 15,273 with…

  5. Academic Experiences of War-Zone Students in Canada

    ERIC Educational Resources Information Center

    Stermac, Lana; Elgie, Susan; Clarke, Allyson; Dunlap, Hester

    2012-01-01

    This research examined educational outcomes and experiences of late adolescent immigrant students who entered the Canadian educational system following residence in global war-zone regions or areas of extreme civil unrest. Data from a Statistics Canada data-set of 18- to 20-year-old respondents (N = 658) were used to compare the academic…

  6. Atlas-guided cluster analysis of large tractography datasets.

    PubMed

    Ros, Christian; Güllmar, Daniel; Stenzel, Martin; Mentzel, Hans-Joachim; Reichenbach, Jürgen Rainer

    2013-01-01

    Diffusion Tensor Imaging (DTI) and fiber tractography are important tools to map the cerebral white matter microstructure in vivo and to model the underlying axonal pathways in the brain with three-dimensional fiber tracts. As the fast and consistent extraction of anatomically correct fiber bundles for multiple datasets is still challenging, we present a novel atlas-guided clustering framework for exploratory data analysis of large tractography datasets. The framework uses a hierarchical cluster analysis approach that exploits the inherent redundancy in large datasets to time-efficiently group fiber tracts. Structural information from a white matter atlas can be incorporated into the clustering to achieve an anatomically correct and reproducible grouping of fiber tracts. This approach facilitates not only the identification of the bundles corresponding to the classes of the atlas; it also enables the extraction of bundles that are not present in the atlas. The new technique was applied to cluster datasets of 46 healthy subjects. Prospects of automatic, anatomically correct and reproducible clustering are explored. Reconstructed clusters were well separated and showed good correspondence to anatomical bundles. Using the atlas-guided cluster approach, we observed consistent results across subjects with high reproducibility. In order to investigate the outlier elimination performance of the clustering algorithm, scenarios with varying amounts of noise were simulated and clustered with three different outlier elimination strategies. By exploiting the multithreading capabilities of modern multiprocessor systems in combination with novel algorithms, our toolkit clusters large datasets in a couple of minutes. Experiments were conducted to investigate the achievable speedup and to demonstrate the high performance of the clustering framework in a multiprocessing environment.
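
    As an illustrative sketch (not the authors' framework), hierarchical clustering of fiber tracts can be run on simple per-tract descriptors with SciPy; the descriptor used here (start, middle and end points of each tract) is an assumption for demonstration only.

      import numpy as np
      from scipy.cluster.hierarchy import linkage, fcluster

      def cluster_tracts(tracts, n_clusters=50):
          """tracts: list of (n_points, 3) arrays of streamline coordinates.
          Returns one cluster label per tract."""
          feats = np.array([np.concatenate([t[0], t[len(t) // 2], t[-1]])
                            for t in tracts])
          Z = linkage(feats, method="ward")        # hierarchical merge tree
          return fcluster(Z, t=n_clusters, criterion="maxclust")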

  7. Extreme learning machine for ranking: generalization analysis and applications.

    PubMed

    Chen, Hong; Peng, Jiangtao; Zhou, Yicong; Li, Luoqing; Pan, Zhibin

    2014-05-01

    The extreme learning machine (ELM) has attracted increasing attention recently with its successful applications in classification and regression. In this paper, we investigate the generalization performance of ELM-based ranking. A new regularized ranking algorithm is proposed based on combinations of activation functions in ELM. The generalization analysis is established for the ELM-based ranking (ELMRank) in terms of the covering numbers of the hypothesis space. Empirical results on benchmark datasets show the competitive performance of ELMRank over state-of-the-art ranking methods.
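
    For orientation, a basic (unregularized) ELM is sketched below: input weights are random and fixed, and only the output weights are solved in closed form. This is a generic sketch, not the paper's regularized ranking algorithm; ranking scores would be obtained by sorting the predicted outputs.

      import numpy as np

      def elm_fit(X, y, n_hidden=100, seed=0):
          """Random sigmoid hidden layer; output weights by least squares."""
          rng = np.random.default_rng(seed)
          W = rng.normal(size=(X.shape[1], n_hidden))
          b = rng.normal(size=n_hidden)
          H = 1.0 / (1.0 + np.exp(-(X @ W + b)))        # hidden activations
          beta = np.linalg.lstsq(H, y, rcond=None)[0]   # closed-form solve
          return W, b, beta

      def elm_score(X, W, b, beta):
          H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
          return H @ beta   # sort descending to rank items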

  8. Approximating the Generalized Voronoi Diagram of Closely Spaced Objects

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Edwards, John; Daniel, Eric; Pascucci, Valerio

    2015-06-22

    We present an algorithm to compute an approximation of the generalized Voronoi diagram (GVD) on arbitrary collections of 2D or 3D geometric objects. In particular, we focus on datasets with closely spaced objects; GVD approximation is expensive and sometimes intractable on these datasets using previous algorithms. With our approach, the GVD can be computed using commodity hardware even on datasets with many, extremely tightly packed objects. Our approach is to subdivide the space with an octree that is represented with an adjacency structure. We then use a novel adaptive distance transform to compute the distance function on octree vertices. The computed distance field is sampled more densely in areas of close object spacing, enabling robust and parallelizable GVD surface generation. We demonstrate our method on a variety of data and show example applications of the GVD in 2D and 3D.
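
    To make the GVD concept concrete, here is a brute-force sketch on a regular 2D grid rather than the paper's adaptive octree: each cell is labeled by its nearest site, and cells whose neighbor carries a different label form the approximate GVD boundary.

      import numpy as np

      def grid_gvd(sites, labels, shape):
          """sites: (k, 2) array of point coordinates; labels: (k,) object
          ids. Returns per-cell nearest-object labels and a boundary mask."""
          ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
          cells = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)
          d = np.linalg.norm(cells[:, None, :] - sites[None, :, :], axis=2)
          nearest = np.asarray(labels)[np.argmin(d, axis=1)].reshape(shape)
          boundary = np.zeros(shape, dtype=bool)
          boundary[:-1, :] |= nearest[:-1, :] != nearest[1:, :]
          boundary[:, :-1] |= nearest[:, :-1] != nearest[:, 1:]
          return nearest, boundary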

  9. Final Technical Report for Collaborative Research: Developing and Implementing Ocean-Atmosphere Reanalyses for Climate Applications (OARCA)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Compo, Gilbert P

    As an important step toward a coupled data assimilation system for generating reanalysis fields needed to assess climate model projections, the Ocean Atmosphere Coupled Reanalysis for Climate Applications (OARCA) project assesses and improves the longest reanalyses currently available of the atmosphere and ocean: the 20th Century Reanalysis Project (20CR) and the Simple Ocean Data Assimilation with sparse observational input (SODAsi) system, respectively. In this project, we make off-line but coordinated improvements in the 20CR and SODAsi datasets, with improvements in one feeding into improvements of the other through an iterative generation of new versions. These datasets now span from the 19th to 21st centuries. We then study the extreme weather and variability from days to decades of the resulting datasets. A total of 24 publications have been produced in this project.

  10. Extension of research data repository system to support direct compute access to biomedical datasets: enhancing Dataverse to support large datasets

    PubMed Central

    McKinney, Bill; Meyer, Peter A.; Crosas, Mercè; Sliz, Piotr

    2016-01-01

    Access to experimental X-ray diffraction image data is important for validation and reproduction of macromolecular models and indispensable for the development of structural biology processing methods. In response to the evolving needs of the structural biology community, we recently established a diffraction data publication system, the Structural Biology Data Grid (SBDG, data.sbgrid.org), to preserve primary experimental datasets supporting scientific publications. All datasets published through the SBDG are freely available to the research community under a public domain dedication license, with metadata compliant with the DataCite Schema (schema.datacite.org). A proof-of-concept study demonstrated community interest and utility. Publication of large datasets is a challenge shared by several fields, and the SBDG has begun collaborating with the Institute for Quantitative Social Science at Harvard University to extend the Dataverse (dataverse.org) open-source data repository system to structural biology datasets. Several extensions are necessary to support the size and metadata requirements for structural biology datasets. In this paper, we describe one such extension—functionality supporting preservation of filesystem structure within Dataverse—which is essential for both in-place computation and supporting non-http data transfers. PMID:27862010

  11. Large Scale Flood Risk Analysis using a New Hyper-resolution Population Dataset

    NASA Astrophysics Data System (ADS)

    Smith, A.; Neal, J. C.; Bates, P. D.; Quinn, N.; Wing, O.

    2017-12-01

    Here we present the first national-scale flood risk analyses using high-resolution Facebook Connectivity Lab population data and data from a hyper-resolution flood hazard model. In recent years the field of large-scale hydraulic modelling has been transformed by new remotely sensed datasets, improved process representation, highly efficient flow algorithms and increases in computational power. These developments have allowed flood risk analysis to be undertaken in previously unmodeled territories and at continental to global scales. Flood risk analyses are typically conducted via the integration of modelled water depths with an exposure dataset. Over large scales and in data-poor areas, these exposure data typically take the form of a gridded population dataset, estimating population density using remotely sensed data and/or locally available census data. The local nature of flooding dictates that for robust flood risk analysis to be undertaken, both hazard and exposure data should sufficiently resolve local-scale features. Global flood frameworks now enable flood hazard data to be produced at 90 m resolution, resulting in a mismatch with available population datasets, which are typically more coarsely resolved. Moreover, these exposure data are typically focused on urban areas and struggle to represent rural populations. In this study we integrate a new population dataset with a global flood hazard model. The population dataset was produced by the Connectivity Lab at Facebook, providing gridded population data at 5 m resolution, a resolution increase of multiple orders of magnitude over previous countrywide datasets. Flood risk analyses undertaken over a number of developing countries are presented, along with a comparison against analyses undertaken using pre-existing population datasets.
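
    The core integration step (overlaying hazard depths on gridded population) reduces to a few array operations. The sketch below, with made-up grids, first block-sums a fine population grid onto the hazard grid, then counts people in cells wetted above a depth threshold.

      import numpy as np

      def aggregate(pop_fine, factor):
          """Block-sum a fine population grid onto a coarser hazard grid."""
          h, w = pop_fine.shape
          trim = pop_fine[:h - h % factor, :w - w % factor]
          return trim.reshape(h // factor, factor,
                              w // factor, factor).sum(axis=(1, 3))

      def exposed_population(depth, pop, threshold=0.1):
          """People in cells where modelled flood depth exceeds threshold (m)."""
          return float(pop[depth > threshold].sum())

      rng = np.random.default_rng(0)
      pop = aggregate(rng.poisson(0.02, size=(1800, 1800)), factor=18)
      depth = rng.exponential(0.05, size=pop.shape)
      print(exposed_population(depth, pop))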

  12. Identification and characterization of extraordinary rainstorms in Italy

    NASA Astrophysics Data System (ADS)

    Libertino, Andrea; Ganora, Daniele; Claps, Pierluigi

    2017-04-01

    Despite its generally mild climate, Italy, like most of the Mediterranean region, is prone to the development of "super-extreme" events with extraordinary rainfall intensities. The main triggering mechanisms of these events are nowadays quite well known, but more research is needed to transform this knowledge into directions for building updated rainstorm hazard maps at the national scale. Moreover, a precise definition of "super-extremes" is still lacking, beyond the original suggestion of a second, specific EV1 component within the TCEV distribution. The above considerations led us to consider Italy a peculiar and challenging case study, where the geographic and orographic settings, associated with recurring storm-induced disasters, require an updated assessment of the "super-extreme" rainfall hazard at the country scale. Until now, the lack of a unified dataset of rainfall extremes has made the above task difficult to achieve. In this work we report the results of an analysis of a comprehensive and uniform set of annual rainfall maxima, collected from the different authorities in charge, representing the reference dataset of extremes for durations from 1 to 24 hours. The database includes more than 6000 measuring points nationwide, spanning the period 1916-2014. Our analysis aims at identifying a meaningful population of records deviating from an "ordinary" extreme value distribution, and at assessing the stationarity in the timing of these events at the national scale. The first problems that need to be overcome relate to the non-uniform distribution of data in time and space. Then the evaluation of meaningful relative thresholds, aimed at selecting significant samples for the trend assessment, has to be addressed. A first investigation refers to the events exceeding a threshold that identifies an average of one occurrence per year over all of Italy, i.e. with a 1/1000 overall probability of exceedance. Geographic representation of these "outliers", scaled on local averages, demonstrates some prevailing clustering along the Tyrrhenian coastal areas. Subsequent application of quantile regressions, aimed at minimizing the temporal non-uniformity of samples, shows significant increasing trends in the extremes of very short duration. Further efforts have been undertaken to explore the selection of a common national set of higher-order parameters over all of Italy, which would make it less arduous to identify the probability of occurrence of "super-extremes" in the country.

  13. PVT: An Efficient Computational Procedure to Speed up Next-generation Sequence Analysis

    PubMed Central

    2014-01-01

    Background: High-throughput Next-Generation Sequencing (NGS) techniques are advancing genomics and molecular biology research. This technology generates substantially large data, which poses a major challenge for efficient, cost- and time-effective analysis. Further, across the different types of NGS data, certain challenging steps are common to the analysis. Spliced alignment is one such fundamental step in NGS data analysis that is extremely computationally intensive as well as time consuming. Serious problems exist even with the most widely used spliced alignment tools. TopHat is one such widely used spliced alignment tool which, although it supports multithreading, does not efficiently utilize computational resources in terms of CPU utilization and memory. Here we introduce PVT (Pipelined Version of TopHat), in which we take a modular approach by breaking TopHat's serial execution into a pipeline of multiple stages, thereby increasing the degree of parallelization and computational resource utilization. We thus address the discrepancies in TopHat so as to analyze large NGS data efficiently. Results: We analysed the SRA datasets SRX026839 and SRX026838, consisting of single-end reads, and SRA dataset SRR1027730, consisting of paired-end reads. We used TopHat v2.0.8 to analyse these datasets and noted the CPU usage, memory footprint and execution time during spliced alignment. With this basic information, we designed PVT, a pipelined version of TopHat that removes the redundant computational steps during spliced alignment and breaks the job into a pipeline of multiple stages (each comprising different steps) to improve resource utilization, thus reducing execution time. Conclusions: PVT provides an improvement over TopHat for spliced alignment in NGS data analysis. PVT reduced the execution time by ~23% for the single-end read dataset. Further, PVT designed for paired-end reads showed an improved performance of ~41% over TopHat (for the chosen data) with respect to execution time. Moreover, we propose PVT-Cloud, which implements the PVT pipeline in a cloud computing system. PMID:24894600
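
    The pipelining idea (overlapping stages so that one data chunk is processed by stage k while the next chunk is still in stage k-1) can be illustrated generically; this sketch uses Python multiprocessing queues and stands in for PVT's actual stage decomposition, which is not reproduced in the record.

      from multiprocessing import Process, Queue

      def stage_worker(fn, q_in, q_out):
          """Consume chunks until the None sentinel, applying one stage.
          Stage functions must be module-level so they can be pickled."""
          while (item := q_in.get()) is not None:
              q_out.put(fn(item))
          q_out.put(None)

      def run_pipeline(chunks, stage_fns):
          """Chain stages with queues so successive chunks overlap in time."""
          qs = [Queue() for _ in range(len(stage_fns) + 1)]
          procs = [Process(target=stage_worker, args=(f, qs[i], qs[i + 1]))
                   for i, f in enumerate(stage_fns)]
          for p in procs:
              p.start()
          for c in chunks:
              qs[0].put(c)
          qs[0].put(None)
          out = []
          while (r := qs[-1].get()) is not None:
              out.append(r)
          for p in procs:
              p.join()
          return out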

  14. A Novel Strategy for Very-Large-Scale Cash-Crop Mapping in the Context of Weather-Related Risk Assessment, Combining Global Satellite Multispectral Datasets, Environmental Constraints, and In Situ Acquisition of Geospatial Data

    PubMed Central

    Iannelli, Gianni Cristian; Torres, Marco A.

    2018-01-01

    Cash crops are agricultural crops intended to be sold for profit, as opposed to subsistence crops, meant to support the producer, or to support livestock. Since cash crops are intended for future sale, they translate into large financial value when considered on a wide geographical scale, so their production directly involves financial risk. At a national level, extreme weather events, including destructive rain or hail as well as drought, can have a significant impact on the overall economic balance. It is thus important to map such crops in order to set up insurance and mitigation strategies. Using locally generated data, such as municipality-level records of crop seeding, for mapping purposes implies facing a series of issues like data availability, quality, homogeneity, etc. We thus opted for a different approach relying on global datasets. Global datasets ensure homogeneity and availability of data, although sometimes at the expense of precision and accuracy. A typical global approach makes use of spaceborne remote sensing, for which different land cover classification strategies are available in the literature at different levels of cost and accuracy. We selected the optimal strategy from the perspective of a global processing chain. Thanks to a specifically developed strategy for fusing unsupervised classification results with environmental constraints and other geospatial inputs, including ground-based data, we managed to obtain good classification results despite the constraints. The overall production process was composed using "good-enough" algorithms at each step, ensuring that the precision, accuracy, and data-hunger of each algorithm was commensurate with the precision, accuracy, and amount of data available. This paper describes the tailored strategy, developed as a cooperation among different groups with diverse backgrounds, which is believed to be profitably reusable in other, similar contexts. The paper presents the problem, the constraints and the adopted solutions; it then summarizes the main findings, including that efforts and costs can be saved on the side of Earth Observation data processing when additional ground-based data are available to support the mapping task. PMID:29443919

  15. PVT: an efficient computational procedure to speed up next-generation sequence analysis.

    PubMed

    Maji, Ranjan Kumar; Sarkar, Arijita; Khatua, Sunirmal; Dasgupta, Subhasis; Ghosh, Zhumur

    2014-06-04

    High-throughput Next-Generation Sequencing (NGS) techniques are advancing genomics and molecular biology research. This technology generates substantially large data, which poses a major challenge for efficient, cost- and time-effective analysis. Further, across the different types of NGS data, certain challenging steps are common to the analysis. Spliced alignment is one such fundamental step in NGS data analysis that is extremely computationally intensive as well as time consuming. Serious problems exist even with the most widely used spliced alignment tools. TopHat is one such widely used spliced alignment tool which, although it supports multithreading, does not efficiently utilize computational resources in terms of CPU utilization and memory. Here we introduce PVT (Pipelined Version of TopHat), in which we take a modular approach by breaking TopHat's serial execution into a pipeline of multiple stages, thereby increasing the degree of parallelization and computational resource utilization. We thus address the discrepancies in TopHat so as to analyze large NGS data efficiently. We analysed the SRA datasets SRX026839 and SRX026838, consisting of single-end reads, and SRA dataset SRR1027730, consisting of paired-end reads. We used TopHat v2.0.8 to analyse these datasets and noted the CPU usage, memory footprint and execution time during spliced alignment. With this basic information, we designed PVT, a pipelined version of TopHat that removes the redundant computational steps during spliced alignment and breaks the job into a pipeline of multiple stages (each comprising different steps) to improve resource utilization, thus reducing execution time. PVT provides an improvement over TopHat for spliced alignment in NGS data analysis. PVT reduced the execution time by ~23% for the single-end read dataset. Further, PVT designed for paired-end reads showed an improved performance of ~41% over TopHat (for the chosen data) with respect to execution time. Moreover, we propose PVT-Cloud, which implements the PVT pipeline in a cloud computing system.

  16. A Novel Strategy for Very-Large-Scale Cash-Crop Mapping in the Context of Weather-Related Risk Assessment, Combining Global Satellite Multispectral Datasets, Environmental Constraints, and In Situ Acquisition of Geospatial Data.

    PubMed

    Dell'Acqua, Fabio; Iannelli, Gianni Cristian; Torres, Marco A; Martina, Mario L V

    2018-02-14

    Cash crops are agricultural crops intended to be sold for profit, as opposed to subsistence crops, meant to support the producer, or to support livestock. Since cash crops are intended for future sale, they translate into large financial value when considered on a wide geographical scale, so their production directly involves financial risk. At a national level, extreme weather events, including destructive rain or hail as well as drought, can have a significant impact on the overall economic balance. It is thus important to map such crops in order to set up insurance and mitigation strategies. Using locally generated data, such as municipality-level records of crop seeding, for mapping purposes implies facing a series of issues like data availability, quality, homogeneity, etc. We thus opted for a different approach relying on global datasets. Global datasets ensure homogeneity and availability of data, although sometimes at the expense of precision and accuracy. A typical global approach makes use of spaceborne remote sensing, for which different land cover classification strategies are available in the literature at different levels of cost and accuracy. We selected the optimal strategy from the perspective of a global processing chain. Thanks to a specifically developed strategy for fusing unsupervised classification results with environmental constraints and other geospatial inputs, including ground-based data, we managed to obtain good classification results despite the constraints. The overall production process was composed using "good-enough" algorithms at each step, ensuring that the precision, accuracy, and data-hunger of each algorithm was commensurate with the precision, accuracy, and amount of data available. This paper describes the tailored strategy, developed as a cooperation among different groups with diverse backgrounds, which is believed to be profitably reusable in other, similar contexts. The paper presents the problem, the constraints and the adopted solutions; it then summarizes the main findings, including that efforts and costs can be saved on the side of Earth Observation data processing when additional ground-based data are available to support the mapping task.

  17. Multiple Imputation of Groundwater Data to Evaluate Spatial and Temporal Anthropogenic Influences on Subsurface Water Fluxes in Los Angeles, CA

    NASA Astrophysics Data System (ADS)

    Manago, K. F.; Hogue, T. S.; Hering, A. S.

    2014-12-01

    In the City of Los Angeles, groundwater accounts for 11% of the total water supply on average, and 30% during drought years. With ongoing drought in California, increased reliance on local water supply highlights the need for better understanding of regional groundwater dynamics and for estimating sustainable groundwater supply. However, in an urban setting such as Los Angeles, understanding or modeling groundwater levels is extremely complicated due to various anthropogenic influences such as groundwater pumping, artificial recharge, landscape irrigation, leaking infrastructure, seawater intrusion, and extensive impervious surfaces. This study analyzes anthropogenic effects on groundwater levels using groundwater monitoring well data from the County of Los Angeles Department of Public Works. The groundwater data are irregularly sampled with large gaps between samples, resulting in a sparsely populated dataset. A multiple imputation method is used to fill the missing data, allowing for multiple ensembles and improved error estimates. The filled data are interpolated to create spatial groundwater maps utilizing information from all wells. The groundwater data are evaluated at a monthly time step over the last several decades to analyze the effect of land cover and to identify other factors influencing groundwater levels spatially and temporally. Preliminary results show that irrigated parks have the largest influence on groundwater fluctuations, resulting in large seasonal changes exceeding those at spreading grounds. It is assumed that these fluctuations are caused by watering practices required to sustain non-native vegetation. Conversely, high-intensity urbanized areas show muted groundwater fluctuations and behavior that decouples from climate patterns. The results provide improved understanding of anthropogenic effects on groundwater levels, in addition to providing high-quality datasets for validation of regional groundwater models.
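
    Multiple imputation of this kind of sparse well-by-month matrix can be sketched with scikit-learn's IterativeImputer; drawing several stochastic completions and summarizing their spread is one common way to obtain the error estimates mentioned above (the setup here is an illustrative assumption, not the study's procedure).

      import numpy as np
      from sklearn.experimental import enable_iterative_imputer  # noqa: F401
      from sklearn.impute import IterativeImputer

      def multiply_impute(levels, n_draws=5):
          """levels: (wells x months) array with NaNs for missing samples.
          Returns the mean and std across stochastic imputation draws."""
          draws = [IterativeImputer(sample_posterior=True, random_state=s)
                   .fit_transform(levels) for s in range(n_draws)]
          stack = np.stack(draws)
          return stack.mean(axis=0), stack.std(axis=0)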

  18. A New High Resolution Climate Dataset for Climate Change Impacts Assessments in New England

    NASA Astrophysics Data System (ADS)

    Komurcu, M.; Huber, M.

    2016-12-01

    Assessing regional impacts of climate change (such as changes in extreme events, land surface hydrology, water resources, energy, ecosystems and economy) requires much higher resolution climate variables than those available from global model projections. While it is possible to run global models at higher resolution, the high computational cost associated with such simulations prevents their use in this manner. To alleviate this problem, dynamical downscaling offers a method to deliver higher-resolution climate variables. As part of an NSF EPSCoR funded interdisciplinary effort to assess climate change impacts on New Hampshire ecosystems, hydrology and economy (the New Hampshire Ecosystems and Society project), we create a unique high-resolution climate dataset for New England. We dynamically downscale global model projections under a high-impact emissions scenario using the Weather Research and Forecasting model (WRF) with three nested grids of 27, 9 and 3 km horizontal resolution, with the highest-resolution innermost grid focused on New England. We prefer dynamical downscaling over other methods such as statistical downscaling because it employs physical equations to progressively simulate climate variables as atmospheric processes interact with surface processes, emissions, radiation, clouds, precipitation and other model components, thereby avoiding fixed relationships between variables. In addition to simulating mean changes in regional climate, dynamical downscaling also allows for the simulation of climate extremes that significantly alter climate change impacts. We simulate three time slices: 2006-2015, 2040-2060 and 2080-2100. This new high-resolution climate dataset (with more than 200 variables saved at hourly intervals for the highest-resolution domain and six-hourly intervals for the outer two domains), along with the model input and restart files used in our WRF simulations, will be publicly available to the broader scientific community to support in-depth climate change impact assessments for New England. We present results focusing on future changes in New England extreme events.

  19. How are the wetlands over tropical basins impacted by the extreme hydrological events?

    NASA Astrophysics Data System (ADS)

    Al-Bitar, A.; Parrens, M.; Frappart, F.; Papa, F.; Kerr, Y. H.; Cretaux, J. F.; Wigneron, J. P.

    2016-12-01

    Wetlands play a crucial role in tropical basins, and many questions remain unanswered on how extreme events (like El Niño) impact them. Answering these questions is challenging, as monitoring of inland water surfaces via remote sensing over tropical areas is a difficult task because of the impact of vegetation and cloud cover. Several microwave-based products have been elaborated to monitor these surfaces (Papa et al. 2010). In this study we combine the use of L-band microwave brightness temperatures and altimetric data from SARAL/ALTIKA to derive water storage maps at relatively high (7-day) temporal frequency. The area of interest covers the Amazon, Congo and GBH basins. A first-order radiative model is used to derive surface water over land from the brightness temperature measured by the ESA SMOS mission at coarse resolution (25 km x 25 km) and 7-day frequency. An initial investigation of the use of the SMAP mission for the same purpose will also be presented. The product is compared to static land cover maps such as ESA CCI and the International Geosphere-Biosphere Program (IGBP) map, and also to dynamic maps from SWAPS. It is then combined with the altimetric data to derive water storage maps. The water surface and water storage products are then compared to precipitation data from GPM and TRMM datasets, groundwater storage change from GRACE, and river discharge from field data. The amplitudes and time shifts of the signals are compared based on the sub-basin definitions from the Hydroshed database. The dataset is then divided into years of strong and weak El Niño signal, and the anomaly between the two subsets is compared. The results show a strong influence of El Niño on the time shifts of the different components, showing that the hydrological regime of wetlands is highly impacted by these extreme events. This can have dramatic impacts on the ecosystem, as wetlands are vulnerable and host high biodiversity.
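
    A first-order way to retrieve a water fraction from L-band brightness temperature is two-endmember linear mixing; the sketch below illustrates this idea with placeholder endmember temperatures, and is not the SMOS product algorithm.

      def water_fraction(tb, tb_land=280.0, tb_water=150.0):
          """Invert TB = fw*TB_water + (1 - fw)*TB_land for the water
          fraction fw; endmember values are illustrative placeholders."""
          fw = (tb_land - tb) / (tb_land - tb_water)
          return min(max(fw, 0.0), 1.0)   # clamp to the physical range

      print(water_fraction(245.0))  # ~0.27 for a partly flooded pixel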

  20. Convective - TTU

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kosovic, Branko

    This dataset includes large-eddy simulation (LES) output from a convective atmospheric boundary layer (ABL) simulation of observations at the SWIFT tower near Lubbock, Texas on July 4, 2012. The dataset was used to assess the LES models for simulation of canonical convective ABL. The dataset can be used for comparison with other LES and computational fluid dynamics model outputs.

  1. LANL - Convective - TTU

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kosovic, Branko

    This dataset includes large-eddy simulation (LES) output from a convective atmospheric boundary layer (ABL) simulation of observations at the SWIFT tower near Lubbock, Texas on July 4, 2012. The dataset was used to assess the LES models for simulation of canonical convective ABL. The dataset can be used for comparison with other LES and computational fluid dynamics model outputs.

  2. LANL - Neutral - TTU

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kosovic, Branko

    This dataset includes large-eddy simulation (LES) output from a neutrally stratified atmospheric boundary layer (ABL) simulation of observations at the SWIFT tower near Lubbock, Texas on Aug. 17, 2012. The dataset was used to assess LES models for simulation of canonical neutral ABL. The dataset can be used for comparison with other LES and computational fluid dynamics model outputs.

  3. Primary Datasets for Case Studies of River-Water Quality

    ERIC Educational Resources Information Center

    Goulder, Raymond

    2008-01-01

    Level 6 (final-year BSc) students undertook case studies on between-site and temporal variation in river-water quality. They used professionally-collected datasets supplied by the Environment Agency. The exercise gave students the experience of working with large, real-world datasets and led to their understanding how the quality of river water is…

  4. Accuracy assessment of the U.S. Geological Survey National Elevation Dataset, and comparison with other large-area elevation datasets: SRTM and ASTER

    USGS Publications Warehouse

    Gesch, Dean B.; Oimoen, Michael J.; Evans, Gayla A.

    2014-01-01

    The National Elevation Dataset (NED) is the primary elevation data product produced and distributed by the U.S. Geological Survey. The NED provides seamless raster elevation data of the conterminous United States, Alaska, Hawaii, U.S. island territories, Mexico, and Canada. The NED is derived from diverse source datasets that are processed to a specification with consistent resolutions, coordinate system, elevation units, and horizontal and vertical datums. The NED serves as the elevation layer of The National Map, and it provides basic elevation information for earth science studies and mapping applications in the United States and most of North America. An important part of supporting scientific and operational use of the NED is provision of thorough dataset documentation including data quality and accuracy metrics. The focus of this report is on the vertical accuracy of the NED and on comparison of the NED with other similar large-area elevation datasets, namely data from the Shuttle Radar Topography Mission (SRTM) and the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER).

  5. Data-driven decision support for radiologists: re-using the National Lung Screening Trial dataset for pulmonary nodule management.

    PubMed

    Morrison, James J; Hostetter, Jason; Wang, Kenneth; Siegel, Eliot L

    2015-02-01

    Real-time mining of large research trial datasets enables the development of case-based clinical decision support tools. Several applicable research datasets exist, including that of the National Lung Screening Trial (NLST), a dataset unparalleled in size and scope for studying population-based lung cancer screening. Using these data, a clinical decision support tool was developed which matches patient demographics and lung nodule characteristics to a cohort of similar patients. The NLST dataset was converted into Structured Query Language (SQL) tables hosted on a web server, and a web-based JavaScript application was developed which performs real-time queries. JavaScript is used as both the server-side and client-side language, allowing for rapid development of a robust client interface and server-side data layer. Real-time data mining of user-specified patient cohorts achieved a rapid return of cohort cancer statistics and lung nodule distribution information. This system demonstrates the potential of individualized real-time data mining using large high-quality clinical trial datasets to drive evidence-based clinical decision-making.
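
    The cohort-matching query described here amounts to a parameterized SELECT over the trial tables; the sketch below uses Python's sqlite3 with a hypothetical participants table and column names, since the actual schema is not given in the record.

      import sqlite3

      def similar_cohort(db_path, age, nodule_mm, age_tol=5, size_tol=2):
          """Cohort size and cancer rate for patients with similar age and
          nodule diameter (table and column names are hypothetical)."""
          con = sqlite3.connect(db_path)
          n, rate = con.execute(
              "SELECT COUNT(*), AVG(confirmed_cancer) FROM participants "
              "WHERE ABS(age - ?) <= ? AND ABS(nodule_diameter_mm - ?) <= ?",
              (age, age_tol, nodule_mm, size_tol)).fetchone()
          con.close()
          return {"n": n, "cancer_rate": rate}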

  6. Scalable Visual Analytics of Massive Textual Datasets

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Krishnan, Manoj Kumar; Bohn, Shawn J.; Cowley, Wendy E.

    2007-04-01

    This paper describes the first scalable implementation of the text processing engine used in visual analytics tools. These tools aid information analysts in interacting with and understanding large textual information content through visual interfaces. By developing a parallel implementation of the text processing engine, we enable visual analytics tools to exploit cluster architectures and handle massive datasets. The paper describes key elements of our parallelization approach and demonstrates virtually linear scaling when processing multi-gigabyte datasets such as PubMed. This approach enables interactive analysis of large datasets beyond the capabilities of existing state-of-the-art visual analytics tools.

  7. A Critical Review of Automated Photogrammetric Processing of Large Datasets

    NASA Astrophysics Data System (ADS)

    Remondino, F.; Nocerino, E.; Toschi, I.; Menna, F.

    2017-08-01

    The paper reports comparisons among commercial software packages able to automatically process image datasets for 3D reconstruction purposes. The main aspects investigated in the work are the capability to correctly orient large sets of images of complex environments, the metric quality of the results, replicability and redundancy. Different datasets are employed, each one featuring a different number of images, GSDs at cm and mm resolutions, and ground truth information for statistical analyses of the 3D results. A summary of photogrammetric terms is also provided, in order to establish rigorous terms of reference for comparisons and critical analyses.

  8. Use of Patient Registries and Administrative Datasets for the Study of Pediatric Cancer

    PubMed Central

    Rice, Henry E.; Englum, Brian R.; Gulack, Brian C.; Adibe, Obinna O.; Tracy, Elizabeth T.; Kreissman, Susan G.; Routh, Jonathan C.

    2015-01-01

    Analysis of data from large administrative databases and patient registries is increasingly being used to study childhood cancer care, although the value of these data sources remains unclear to many clinicians. Interpretation of large databases requires a thorough understanding of how the dataset was designed, how data were collected, and how to assess data quality. This review will detail the role of administrative databases and registry databases for the study of childhood cancer, tools to maximize information from these datasets, and recommendations to improve the use of these databases for the study of pediatric oncology. PMID:25807938

  9. Debris flows associated with the 2015 Gorkha Earthquake in Nepal

    NASA Astrophysics Data System (ADS)

    Dahlquist, M. P.; West, A. J.; Martinez, J.

    2017-12-01

    Debris flows are a primary driver of erosion and a major geologic hazard in many steep landscapes, particularly near the headwaters of rivers, and are generated in large numbers by extreme events. The 2015 Mw 7.8 Gorkha Earthquake triggered 25,000 coseismic landslides in central Nepal. During the ensuing monsoon, sediment delivered to channels by landslides was mobilized in the heavy rains, and new postseismic landslides were triggered in rock weakened by the shaking. These coseismic and postseismic landslide-generated debris flows form a useful dataset for studying the impact and behavior of debris flows on one of the most active landscapes on Earth. Debris flow-dominated channel reaches are generally understood to have a topographic signature recognizable in slope-area plots and distinct from fluvial channels, but in examining debris flows associated with the Gorkha earthquake we find they frequently extend into reaches with geometry typically associated with fluvial systems. We examine a dataset of these debris flows, considering whether they are generated by coseismic or postseismic landslides, whether they are likely to be driving active incision into bedrock, and whether their channels correspond with those typically associated with debris flows. Preliminary analysis of debris flow channels in Nepal suggests there may be systematic differences in the geometry of channels containing debris flows triggered by coseismic versus postseismic landslides, which potentially holds implications for hazard analyses and the mechanics behind the different debris flow types.

  10. On the visualization of water-related big data: extracting insights from drought proxies' datasets

    NASA Astrophysics Data System (ADS)

    Diaz, Vitali; Corzo, Gerald; van Lanen, Henny A. J.; Solomatine, Dimitri

    2017-04-01

    Big data is a growing area of science from which hydroinformatics can benefit greatly. There have been a number of important developments in data science aimed at the analysis of large datasets. Water-related datasets of this kind include measurements, simulations, reanalyses, scenario analyses and proxies. By convention, the information contained in these databases refers to a specific time and space (i.e., longitude/latitude). This work is motivated by the need to extract insights from large water-related datasets, i.e., to transform large amounts of data into useful information that helps to better understand water-related phenomena, particularly drought. In this context, data visualization, as part of data science, involves techniques to create and communicate data by encoding it as visual graphical objects, which may help to better understand data and detect trends. Based on existing methods of data analysis and visualization, this work develops tools for visualizing large water-related datasets. These tools take advantage of existing visualization libraries to produce a group of graphs that includes both polar area diagrams (PADs) and radar charts (RDs). In both graphs, time steps are represented by the polar angles and the percentages of area in drought by the radii. For illustration, three large datasets of drought proxies are chosen to identify trends, prone areas and the spatio-temporal variability of drought in a set of case studies. The datasets are (1) SPI-TS2p1 (1901-2002, 11.7 GB), (2) SPI-PRECL0p5 (1948-2016, 7.91 GB) and (3) SPEI-baseV2.3 (1901-2013, 15.3 GB). All of them are on a monthly basis and with a spatial resolution of 0.5 degrees. The first two were retrieved from the repository of the International Research Institute for Climate and Society (IRI); they are included in the Analyses Standardized Precipitation Index (SPI) project (iridl.ldeo.columbia.edu/SOURCES/.IRI/.Analyses/.SPI/). The third dataset was retrieved from the Standardized Precipitation Evaporation Index (SPEI) Monitor (digital.csic.es/handle/10261/128892). PADs were found suitable for identifying the spatio-temporal variability and prone areas of drought. Drought trends were visually detected using both PADs and RDs. A similar approach can be followed to include other types of graphs for the analysis of water-related big data. Key words: big data, data visualization, drought, SPI, SPEI
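
    A polar area diagram of the kind described (angle = time step, radius = percent of area in drought) can be produced with matplotlib; the data below are synthetic.

      import numpy as np
      import matplotlib.pyplot as plt

      def polar_area_diagram(pct_area_in_drought):
          """One wedge per month; wedge radius is the % of area in drought."""
          n = len(pct_area_in_drought)
          theta = np.linspace(0.0, 2 * np.pi, n, endpoint=False)
          ax = plt.subplot(projection="polar")
          ax.bar(theta, pct_area_in_drought, width=2 * np.pi / n, alpha=0.6)
          ax.set_xticks(theta)
          ax.set_xticklabels(list("JFMAMJJASOND")[:n])
          plt.show()

      polar_area_diagram(np.random.default_rng(1).uniform(0, 60, 12))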

  11. The MATISSE analysis of large spectral datasets from the ESO Archive

    NASA Astrophysics Data System (ADS)

    Worley, C.; de Laverny, P.; Recio-Blanco, A.; Hill, V.; Vernisse, Y.; Ordenovic, C.; Bijaoui, A.

    2010-12-01

    The automated stellar classification algorithm MATISSE has been developed at the Observatoire de la Côte d'Azur (OCA) in order to determine stellar temperatures, gravities and chemical abundances for large datasets of stellar spectra. The Gaia Data Processing and Analysis Consortium (DPAC) has selected MATISSE as one of the key programmes to be used in the analysis of the Gaia Radial Velocity Spectrometer (RVS) spectra. MATISSE is currently being used to analyse large datasets of spectra from the ESO archive, with the primary goal of producing advanced data products to be made available in the ESO database via the Virtual Observatory. This is also an invaluable opportunity to identify and address issues that can be encountered in the analysis of large samples of real spectra prior to the launch of Gaia in 2012. The analysis of the archived spectra of the FEROS spectrograph is currently underway, and preliminary results are presented.

  12. Global Bedload Flux Modeling and Analysis in Large Rivers

    NASA Astrophysics Data System (ADS)

    Islam, M. T.; Cohen, S.; Syvitski, J. P.

    2017-12-01

    Proper sediment transport quantification has long been an area of interest for both scientists and engineers in the fields of geomorphology and the management of rivers and coastal waters. Bedload flux is important for monitoring water quality and for the sustainable development of coastal and marine bioservices. Bedload measurements, especially for large rivers, are extremely scarce across time, and many rivers have never been monitored. The paucity of bedload measurements in rivers is particularly acute in developing countries, where changes in sediment yield are high. This paucity is the result of (1) the nature of the problem (large spatial and temporal uncertainties) and (2) field costs, including the time-consuming nature of the measurement procedures (repeated bedform migration tracking, bedload samplers). Here we present a first-of-its-kind methodology for calculating bedload in large global rivers (basins >1,000 km). Evaluation of model skill is based on 113 bedload measurements. The model predictions are compared with an empirical model developed from the observational dataset, in an attempt to evaluate the differences between a physically based numerical model and a lumped relationship between bedload flux and fluvial and basin parameters (e.g., discharge, drainage area, lithology). The initial success of the study opens up various applications in global fluvial geomorphology (e.g., the relationship between suspended sediment (wash load) and bedload). Simulated results with known uncertainties offer a new research product and a valuable resource for the whole scientific community.
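
    For context on what a physically based bedload estimate involves, the classic Meyer-Peter and Mueller (1948) excess-shear formula is sketched below; it is a standard textbook relation, offered only as an illustration and not the model used in this study.

      import numpy as np

      RHO_W, RHO_S, G = 1000.0, 2650.0, 9.81   # water, sediment, gravity

      def mpm_bedload(depth, slope, d50):
          """Volumetric bedload transport per unit width (m^2/s) from flow
          depth (m), energy slope (-) and median grain size d50 (m)."""
          tau = RHO_W * G * depth * slope                    # bed shear (Pa)
          tau_star = tau / ((RHO_S - RHO_W) * G * d50)       # Shields number
          excess = max(tau_star - 0.047, 0.0)                # motion threshold
          return 8.0 * excess ** 1.5 * np.sqrt((RHO_S / RHO_W - 1) * G * d50 ** 3)

      print(mpm_bedload(depth=5.0, slope=1e-4, d50=0.5e-3))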

  13. Influence of ENSO on coastal flood hazard and exposure at the global-scale

    NASA Astrophysics Data System (ADS)

    Muis, S.; Haigh, I. D.; Guimarães Nobre, G.; Aerts, J.; Ward, P.

    2017-12-01

    The El Niño-Southern Oscillation (ENSO) is the dominant signal of interannual climate variability. The unusually warm (El Niño) and cold (La Niña) oceanic and atmospheric conditions in the tropical Pacific drive interannual variability in both mean and extreme sea levels, which in turn may influence the probabilities and impacts of coastal flooding. We assess the influence of ENSO on coastal flood hazard and exposure using daily time series from the Global Tide and Surge Reanalysis (GTSR) dataset (Muis et al., 2016). As the GTSR time series do not include steric effects (i.e. density differences), we improve them by adding steric sea levels. Evaluation against observed sea levels shows that including steric sea levels leads to a much better representation of the seasonal and interannual variability. We show that sea level anomalies occur during ENSO years, with higher sea levels during La Niña in the South Atlantic, Indian Ocean and western Pacific, whereas sea levels are lower in the eastern Pacific. The pattern is generally inverted for El Niño. We also find an effect of ENSO on the number of people exposed to coastal flooding. Although the effect is minor at the global scale, it may be important for flood risk management at national or subnational levels. Previous studies at the global scale have used tide gauge observations to assess the influence of ENSO on extreme sea levels. The advantage of our approach over observations is that GTSR provides a consistent dataset with full global coverage for the period 1979-2014. This allows us to assess ENSO's influence on sea level extremes anywhere in the world. Furthermore, it enables us to also calculate the impacts of extreme sea levels in terms of coastal flooding and exposed population. References: Muis et al. (2016) A global reanalysis of storm surges and extreme sea levels. Nature Communications 7:11969. doi:10.1038/ncomms11969.

  14. Localized Multi-Model Extremes Metrics for the Fourth National Climate Assessment

    NASA Astrophysics Data System (ADS)

    Thompson, T. R.; Kunkel, K.; Stevens, L. E.; Easterling, D. R.; Biard, J.; Sun, L.

    2017-12-01

    We have performed localized analysis of scenario-based datasets for the Fourth National Climate Assessment (NCA4). These datasets include CMIP5-based Localized Constructed Analogs (LOCA) downscaled simulations at daily temporal resolution and 1/16th-degree spatial resolution. Over 45 temperature and precipitation extremes metrics have been processed using LOCA data, including threshold, percentile, and degree-days calculations. The localized analysis calculates trends in the temperature and precipitation extremes metrics for relatively small regions such as counties, metropolitan areas, climate zones, administrative areas, or economic zones. For NCA4, we are currently addressing metropolitan areas as defined by U.S. Census Bureau Metropolitan Statistical Areas. Such localized analysis provides essential information for adaptation planning at scales relevant to local planning agencies and businesses. Nearly 30 such regions have been analyzed to date. Each locale is defined by a closed polygon that is used to extract LOCA-based extremes metrics specific to the area. For each metric, single-model data at each LOCA grid location are first averaged over several 30-year historical and future periods. Then, for each metric, the spatial average across the region is calculated using model weights based on both model independence and reproducibility of current climate conditions. The range of single-model results is also captured on the same localized basis and then combined with the weighted ensemble average for each region and each metric. For example, Boston-area cooling degree days and maximum daily temperature are shown below for the RCP8.5 (red) and RCP4.5 (blue) scenarios. We also discuss inter-regional comparison of these metrics, as well as their relevance to risk analysis for adaptation planning.

  15. Universal Batch Steganalysis

    DTIC Science & Technology

    2014-06-30

    Identifying the 'guilty' user (of steganalysis) in large-scale datasets such as might be obtained by monitoring a corporate network or social network. ... floating point operations (1 TFLOPs) for a 1 megapixel image. We designed a new implementation using Compute Unified Device Architecture (CUDA) on NVIDIA ...

  16. The role of metadata in managing large environmental science datasets. Proceedings

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Melton, R.B.; DeVaney, D.M.; French, J. C.

    1995-06-01

    The purpose of this workshop was to bring together computer science researchers and environmental sciences data management practitioners to consider the role of metadata in managing large environmental sciences datasets. The objectives included: establishing a common definition of metadata; identifying categories of metadata; defining problems in managing metadata; and defining problems related to linking metadata with primary data.

  17. Thermalnet: a Deep Convolutional Network for Synthetic Thermal Image Generation

    NASA Astrophysics Data System (ADS)

    Kniaz, V. V.; Gorbatsevich, V. S.; Mizginov, V. A.

    2017-05-01

    Deep convolutional neural networks have dramatically changed the landscape of modern computer vision. Nowadays methods based on deep neural networks show the best performance among image recognition and object detection algorithms. While the polishing of network architectures has received a lot of scholarly attention, from the practical point of view the preparation of a large image dataset for successful training of a neural network has become one of the major challenges. This challenge is particularly profound for image recognition in wavelengths lying outside the visible spectrum. For example, no infrared or radar image datasets large enough for successful training of a deep neural network are available to date in the public domain. Recent advances in deep neural networks prove that they are also capable of performing arbitrary image transformations, such as super-resolution image generation, grayscale image colorisation and imitation of the style of a given artist. Thus a natural question arises: how can deep neural networks be used to augment existing large image datasets? This paper is focused on the development of the Thermalnet deep convolutional neural network for augmentation of existing large visible image datasets with synthetic thermal images. The Thermalnet network architecture is inspired by colorisation deep neural networks.

  18. A Large-scale Benchmark Dataset for Event Recognition in Surveillance Video

    DTIC Science & Technology

    2011-06-01

    ... orders of magnitude larger than existing datasets such as CAVIAR [7]. The TRECVID 2008 airport dataset [16] contains 100 hours of video, but it provides only ... entire human figure (e.g., above shoulder) ... and diversity in both collection sites and viewpoints. In comparison to surveillance datasets such as CAVIAR [7] and TRECVID [16] shown in Fig. 3 ...

  19. Reconstructing missing information on precipitation datasets: impact of tails on adopted statistical distributions.

    NASA Astrophysics Data System (ADS)

    Pedretti, Daniele; Beckie, Roger Daniel

    2014-05-01

    Missing data in hydrological time-series databases are ubiquitous in practical applications, yet it is of fundamental importance to make educated decisions in problems requiring exhaustive time-series knowledge. This includes precipitation datasets, since recording or human failures can produce gaps in these time series. For some applications directly involving the ratio between precipitation and some other quantity, the lack of complete information can result in poor understanding of basic physical and chemical dynamics involving precipitated water. For instance, the ratio between precipitation (recharge) and outflow rates at a discharge point of an aquifer (e.g. rivers, pumping wells, lysimeters) can be used to obtain aquifer parameters and thus to constrain model-based predictions. We tested a suite of methodologies to reconstruct missing information in rainfall datasets. The goal was to obtain a suitable and versatile method to reduce the errors caused by the lack of data in specific time windows. Our analyses included both a classical chronological-pairing approach between rainfall stations and a probability-based approach, which accounted for the probability of exceedance of rain depths measured at two or multiple stations. Our analyses showed that it is not clear a priori which method performs best; rather, the selection should be based on the specific statistical properties of the rainfall dataset. In this presentation, our emphasis is on discussing the effects of a few typical parametric distributions used to model the behavior of rainfall. Specifically, we analyzed the role of distributional "tails", which have an important control on the occurrence of extreme rainfall events. The latter strongly affect several hydrological applications, including recharge-discharge relationships. The heavy-tailed distributions we considered were the parametric Log-Normal, Generalized Pareto, Generalized Extreme Value and Gamma distributions. The methods were first tested on synthetic examples, to have complete control over the impact of several variables, such as the minimum amount of data required to obtain reliable statistical distributions from the selected parametric functions. Then, we applied the methodology to precipitation datasets collected in the Vancouver area and at a mining site in Peru.
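
    As a rough illustration of the distribution-fitting step discussed above, the sketch below fits the four candidate heavy-tailed distributions to synthetic wet-day rain depths and compares modelled and empirical 99th percentiles as a crude check on tail behaviour. The data array and the percentile diagnostic are illustrative assumptions, not the authors' procedure.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    rain_mm = rng.gamma(0.4, 12.0, size=5000)  # stand-in for observed wet-day depths

    candidates = {
        "log-normal": stats.lognorm,
        "gen. Pareto": stats.genpareto,
        "GEV": stats.genextreme,
        "gamma": stats.gamma,
    }

    for name, dist in candidates.items():
        # Fix the location at zero for strictly positive depths (GEV fits freely).
        kwargs = {} if dist is stats.genextreme else {"floc": 0}
        params = dist.fit(rain_mm, **kwargs)
        q99_model = dist.ppf(0.99, *params)
        q99_data = np.quantile(rain_mm, 0.99)
        print(f"{name:12s} modelled P99 = {q99_model:6.1f} mm, empirical = {q99_data:.1f} mm")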

  20. Does using different modern climate datasets impact pollen-based paleoclimate reconstructions in North America during the past 2,000 years

    NASA Astrophysics Data System (ADS)

    Ladd, Matthew; Viau, Andre

    2013-04-01

    Paleoclimate reconstructions rely on the accuracy of modern climate datasets for calibration of fossil records, under the assumption of climate normality through time, meaning that the modern climate operates in a similar manner as over the past 2,000 years. In this study, we show how using different modern climate datasets has an impact on a pollen-based reconstruction of the mean temperature of the warmest month (MTWA) during the past 2,000 years for North America. The modern climate datasets used to explore this research question include: the Whitmore et al. (2005) modern climate dataset; the North American Regional Reanalysis (NARR); the National Center for Environmental Prediction (NCEP) reanalysis; the European Centre for Medium-Range Weather Forecasting (ECMWF) ERA-40 reanalysis; WorldClim; the Global Historical Climatology Network (GHCN); and New et al., which is derived from the CRU dataset. Results show that some caution is advised in using reanalysis data for large-scale reconstructions. Reanalysis data appear to dampen the variability seen in reconstructions produced using station-based datasets. The reanalysis or model-based datasets are not recommended for large-scale North American paleoclimate reconstructions, as they appear to lack some of the dynamics observed in station datasets (CRU), which resulted in warm-biased reconstructions compared to the station-based reconstructions. The Whitmore et al. (2005) modern climate dataset appears to be a compromise between CRU-based datasets and model-based datasets, except for the ERA-40. In addition, an ultra-high-resolution gridded climate dataset such as WorldClim may only be useful if the pollen calibration sites in North America have at least the same spatial precision. We reconstruct the MTWA to within +/-0.01°C by using an average of all curves derived from the different modern climate datasets, demonstrating the robustness of the procedure used. The use of an average of different modern datasets may reduce the impact of uncertainty in paleoclimate reconstructions; however, this is yet to be determined with certainty. Future evaluations should test, for example, the newly developed Berkeley Earth surface temperature dataset against the paleoclimate record.

  1. Atlas-Guided Cluster Analysis of Large Tractography Datasets

    PubMed Central

    Ros, Christian; Güllmar, Daniel; Stenzel, Martin; Mentzel, Hans-Joachim; Reichenbach, Jürgen Rainer

    2013-01-01

    Diffusion Tensor Imaging (DTI) and fiber tractography are important tools to map the cerebral white matter microstructure in vivo and to model the underlying axonal pathways in the brain with three-dimensional fiber tracts. As the fast and consistent extraction of anatomically correct fiber bundles for multiple datasets is still challenging, we present a novel atlas-guided clustering framework for exploratory data analysis of large tractography datasets. The framework uses a hierarchical cluster analysis approach that exploits the inherent redundancy in large datasets to group fiber tracts time-efficiently. Structural information from a white matter atlas can be incorporated into the clustering to achieve an anatomically correct and reproducible grouping of fiber tracts. This approach facilitates not only the identification of the bundles corresponding to the classes of the atlas but also the extraction of bundles that are not present in the atlas. The new technique was applied to cluster datasets of 46 healthy subjects. Prospects of automatic, anatomically correct and reproducible clustering are explored. Reconstructed clusters were well separated and showed good correspondence to anatomical bundles. Using the atlas-guided cluster approach, we observed consistent results across subjects with high reproducibility. In order to investigate the outlier elimination performance of the clustering algorithm, scenarios with varying amounts of noise were simulated and clustered with three different outlier elimination strategies. By exploiting the multithreading capabilities of modern multiprocessor systems in combination with novel algorithms, our toolkit clusters large datasets in a couple of minutes. Experiments were conducted to investigate the achievable speedup and to demonstrate the high performance of the clustering framework in a multiprocessing environment. PMID:24386292

  2. NREL - SOWFA - Neutral - TTU

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kosovic, Branko

    This dataset includes large-eddy simulation (LES) output from a neutrally stratified atmospheric boundary layer (ABL) simulation of observations at the SWIFT tower near Lubbock, Texas on Aug. 17, 2012. The dataset was used to assess LES models for simulation of canonical neutral ABL. The dataset can be used for comparison with other LES and computational fluid dynamics model outputs.

  3. PNNL - WRF-LES - Convective - TTU

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kosovic, Branko

    This dataset includes large-eddy simulation (LES) output from a convective atmospheric boundary layer (ABL) simulation of observations at the SWIFT tower near Lubbock, Texas on July 4, 2012. The dataset was used to assess the LES models for simulation of canonical convective ABL. The dataset can be used for comparison with other LES and computational fluid dynamics model outputs.

  4. ANL - WRF-LES - Convective - TTU

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kosovic, Branko

    This dataset includes large-eddy simulation (LES) output from a convective atmospheric boundary layer (ABL) simulation of observations at the SWIFT tower near Lubbock, Texas on July 4, 2012. The dataset was used to assess the LES models for simulation of canonical convective ABL. The dataset can be used for comparison with other LES and computational fluid dynamics model outputs.

  5. LLNL - WRF-LES - Neutral - TTU

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kosovic, Branko

    This dataset includes large-eddy simulation (LES) output from a neutrally stratified atmospheric boundary layer (ABL) simulation of observations at the SWIFT tower near Lubbock, Texas on Aug. 17, 2012. The dataset was used to assess LES models for simulation of canonical neutral ABL. The dataset can be used for comparison with other LES and computational fluid dynamics model outputs.

  6. ANL - WRF-LES - Neutral - TTU

    DOE Data Explorer

    Kosovic, Branko

    2018-06-20

    This dataset includes large-eddy simulation (LES) output from a neutrally stratified atmospheric boundary layer (ABL) simulation of observations at the SWIFT tower near Lubbock, Texas on Aug. 17, 2012. The dataset was used to assess LES models for simulation of canonical neutral ABL. The dataset can be used for comparison with other LES and computational fluid dynamics model outputs.

  7. LANL - WRF-LES - Neutral - TTU

    DOE Data Explorer

    Kosovic, Branko

    2018-06-20

    This dataset includes large-eddy simulation (LES) output from a neutrally stratified atmospheric boundary layer (ABL) simulation of observations at the SWIFT tower near Lubbock, Texas on Aug. 17, 2012. The dataset was used to assess LES models for simulation of canonical neutral ABL. The dataset can be used for comparison with other LES and computational fluid dynamics model outputs.

  8. LANL - WRF-LES - Convective - TTU

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kosovic, Branko

    This dataset includes large-eddy simulation (LES) output from a convective atmospheric boundary layer (ABL) simulation of observations at the SWIFT tower near Lubbock, Texas on July 4, 2012. The dataset was used to assess the LES models for simulation of canonical convective ABL. The dataset can be used for comparison with other LES and computational fluid dynamics model outputs.

  9. A Computational Approach to Qualitative Analysis in Large Textual Datasets

    PubMed Central

    Evans, Michael S.

    2014-01-01

    In this paper I introduce computational techniques to extend qualitative analysis into the study of large textual datasets. I demonstrate these techniques by using probabilistic topic modeling to analyze a broad sample of 14,952 documents published in major American newspapers from 1980 through 2012. I show how computational data mining techniques can identify and evaluate the significance of qualitatively distinct subjects of discussion across a wide range of public discourse. I also show how examining large textual datasets with computational methods can overcome methodological limitations of conventional qualitative methods, such as how to measure the impact of particular cases on broader discourse, how to validate substantive inferences from small samples of textual data, and how to determine if identified cases are part of a consistent temporal pattern. PMID:24498398
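
    A minimal sketch of the probabilistic topic modelling step, using scikit-learn's LDA on a toy corpus (the tiny document list and topic count are placeholders; the study analyzed 14,952 newspaper documents):

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    docs = [
        "stem cell research funding debate in congress",
        "embryonic stem cells and medical ethics",
        "new telescope observes distant galaxies",
        "astronomers discover planet around nearby star",
    ]

    vec = CountVectorizer(stop_words="english")
    X = vec.fit_transform(docs)

    lda = LatentDirichletAllocation(n_components=2, random_state=0)
    doc_topics = lda.fit_transform(X)          # per-document topic mixtures

    terms = vec.get_feature_names_out()
    for k, comp in enumerate(lda.components_):
        top = comp.argsort()[-5:][::-1]        # five highest-weight terms per topic
        print(f"topic {k}:", ", ".join(terms[i] for i in top))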

  10. Evolving Deep Networks Using HPC

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Young, Steven R.; Rose, Derek C.; Johnston, Travis

    While a large number of deep learning networks have been studied and published that produce outstanding results on natural image datasets, these datasets only make up a fraction of those to which deep learning can be applied. These datasets include text data, audio data, and arrays of sensors that have very different characteristics than natural images. As these “best” networks for natural images have been largely discovered through experimentation and cannot be proven optimal on some theoretical basis, there is no reason to believe that they are the optimal network for these drastically different datasets. Hyperparameter search is thus often a very important process when applying deep learning to a new problem. In this work we present an evolutionary approach to searching the possible space of network hyperparameters and construction that can scale to 18,000 nodes. This approach is applied to datasets of varying types and characteristics, where we demonstrate the ability to rapidly find the best hyperparameters in order to enable practitioners to quickly iterate between idea and result.
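
    The following is a minimal single-process caricature of such an evolutionary hyperparameter search: mutate a population of candidate settings and keep the fittest. The fitness function merely stands in for a full training-and-validation run, and the hyperparameters shown are invented for illustration.

    import random

    random.seed(0)

    def fitness(hp):
        # Placeholder: in practice, train a network with these hyperparameters
        # and return its validation accuracy.
        return -(hp["lr"] - 0.01) ** 2 - 1e-4 * (hp["layers"] - 6) ** 2

    def mutate(hp):
        child = dict(hp)
        child["lr"] = max(1e-5, child["lr"] * random.uniform(0.5, 2.0))
        child["layers"] = max(1, child["layers"] + random.choice([-1, 0, 1]))
        return child

    population = [{"lr": 10 ** random.uniform(-4, -1), "layers": random.randint(2, 12)}
                  for _ in range(20)]

    for generation in range(30):
        population.sort(key=fitness, reverse=True)
        survivors = population[:5]                       # truncation selection
        population = survivors + [mutate(random.choice(survivors)) for _ in range(15)]

    print("best:", max(population, key=fitness))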

  11. Improving quantitative structure-activity relationship models using Artificial Neural Networks trained with dropout.

    PubMed

    Mendenhall, Jeffrey; Meiler, Jens

    2016-02-01

    Dropout is an Artificial Neural Network (ANN) training technique that has been shown to improve ANN performance across canonical machine learning (ML) datasets. Quantitative Structure Activity Relationship (QSAR) datasets used to relate chemical structure to biological activity in Ligand-Based Computer-Aided Drug Discovery pose unique challenges for ML techniques, such as heavily biased dataset composition, and relatively large number of descriptors relative to the number of actives. To test the hypothesis that dropout also improves QSAR ANNs, we conduct a benchmark on nine large QSAR datasets. Use of dropout improved both enrichment false positive rate and log-scaled area under the receiver-operating characteristic curve (logAUC) by 22-46 % over conventional ANN implementations. Optimal dropout rates are found to be a function of the signal-to-noise ratio of the descriptor set, and relatively independent of the dataset. Dropout ANNs with 2D and 3D autocorrelation descriptors outperform conventional ANNs as well as optimized fingerprint similarity search methods.
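
    For reference, the core of the (inverted) dropout technique benchmarked here can be written in a few lines of NumPy; this is the generic mechanism, not the authors' implementation:

    import numpy as np

    rng = np.random.default_rng(0)

    def dropout(activations, rate, training):
        """Zero each unit with probability `rate` during training and rescale,
        so that no change is needed at inference time."""
        if not training or rate == 0.0:
            return activations
        keep = rng.random(activations.shape) >= rate
        return activations * keep / (1.0 - rate)

    h = rng.standard_normal((4, 8))        # a batch of hidden-layer activations
    h_train = dropout(h, rate=0.25, training=True)
    h_eval = dropout(h, rate=0.25, training=False)
    print(h_train.mean(), h_eval.mean())   # means agree in expectation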

  12. Improving Quantitative Structure-Activity Relationship Models using Artificial Neural Networks Trained with Dropout

    PubMed Central

    Mendenhall, Jeffrey; Meiler, Jens

    2016-01-01

    Dropout is an Artificial Neural Network (ANN) training technique that has been shown to improve ANN performance across canonical machine learning (ML) datasets. Quantitative Structure Activity Relationship (QSAR) datasets used to relate chemical structure to biological activity in Ligand-Based Computer-Aided Drug Discovery (LB-CADD) pose unique challenges for ML techniques, such as heavily biased dataset composition, and relatively large number of descriptors relative to the number of actives. To test the hypothesis that dropout also improves QSAR ANNs, we conduct a benchmark on nine large QSAR datasets. Use of dropout improved both Enrichment false positive rate (FPR) and log-scaled area under the receiver-operating characteristic curve (logAUC) by 22–46% over conventional ANN implementations. Optimal dropout rates are found to be a function of the signal-to-noise ratio of the descriptor set, and relatively independent of the dataset. Dropout ANNs with 2D and 3D autocorrelation descriptors outperform conventional ANNs as well as optimized fingerprint similarity search methods. PMID:26830599

  13. Identification of Atmospheric Blocking Events and its Influence on Temperature and Precipitation Extremes in Europe

    NASA Astrophysics Data System (ADS)

    Richling, Andy; Rust, Henning W.; Bissolli, Peter; Ulbrich, Uwe

    2017-04-01

    Atmospheric blocking plays a crucial role in climate variability in the mid-latitudes. Meteorological extremes like heatwaves, cold spells and droughts in particular are often related to persistent and stationary blocking events. For climate monitoring it is important to identify and characterise such blocking events, as well as to analyse the relationship between blocking and meteorological extremes in a quantitative way. In this study we identify atmospheric blocking events and analyse their influence on temperature and precipitation extremes with statistical models. For the detection of atmospheric blocking events, we apply modified two-dimensional versions of the commonly used blocking indices suggested by Tibaldi and Molteni (1990) and Masato et al. (2013) to daily fields of 500 hPa geopotential height from the ERA-Interim reanalysis dataset. The result is a list of blocking events with a multidimensional index characterising area, intensity, location and duration, together with maps of these parameters, which are intended to be used operationally for regular climate diagnostics at the German Meteorological Service. In addition, relationships between grid-point-based blocking frequency, intensity and location parameters and the number of daily temperature/precipitation extremes based on the E-OBS gridded dataset are investigated using general linear models on a monthly time scale. The number of counts as well as probabilities of occurrence of daily extremes within a certain calendar month are analysed in this framework. References: Masato, G., Hoskins, B. J., and Woollings, T. Winter and Summer Northern Hemisphere Blocking in CMIP5 Models. J. Climate, 26:7044-7059, 2013a. doi:10.1175/JCLI-D-12-00466.1. Masato, G., Hoskins, B. J., and Woollings, T. Wave-Breaking Characteristics of Northern Hemisphere Winter Blocking: A Two-Dimensional Approach. J. Climate, 26:4535-4549, 2013b. doi:10.1175/JCLI-D-12-00240.1. Tibaldi, S., and Molteni, F. On the operational predictability of blocking. Tellus, 42A:343-365, 1990. doi:10.1034/j.1600-0870.1990.t01-2-00003.x.
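
    For orientation, the classic one-dimensional Tibaldi-Molteni criterion at a single longitude can be sketched as below; the two-dimensional variants used in the study generalize this idea. The Z500 profile, latitudes and thresholds are illustrative (the -10 m/deg threshold follows common usage):

    import numpy as np

    def tibaldi_molteni_blocked(z500, lat, phi_n=80.0, phi_0=60.0, phi_s=40.0):
        """z500: geopotential height [m] on the increasing latitude grid `lat` [deg N]."""
        zn, z0, zs = (np.interp(p, lat, z500) for p in (phi_n, phi_0, phi_s))
        ghgs = (z0 - zs) / (phi_0 - phi_s)    # southern gradient [m / deg lat]
        ghgn = (zn - z0) / (phi_n - phi_0)    # northern gradient [m / deg lat]
        return ghgs > 0.0 and ghgn < -10.0    # reversed flow south, steep drop north

    lat = np.arange(30.0, 85.0, 2.5)
    z500 = 5600.0 - 0.6 * (lat - 55.0) ** 2   # toy ridge centred at 55N
    print(tibaldi_molteni_blocked(z500, lat)) # -> True (blocked profile)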

  14. An "Ensemble Approach" to Modernizing Extreme Precipitation Estimation for Dam Safety Decision-Making

    NASA Astrophysics Data System (ADS)

    Cifelli, R.; Mahoney, K. M.; Webb, R. S.; McCormick, B.

    2017-12-01

    To ensure the structural and operational safety of dams and other water management infrastructure, water resources managers and engineers require information about the potential for heavy precipitation. The methods and data used to estimate extreme rainfall amounts for managing risk are based on 40-year-old science and are in need of improvement. The need to evaluate new approaches based on the best science available has led the states of Colorado and New Mexico to engage a body of scientists and engineers in an innovative "ensemble approach" to updating extreme precipitation estimates. NOAA is at the forefront of one of three technical approaches that make up the "ensemble study"; the three approaches are conducted concurrently and in collaboration with each other. One approach is the conventional deterministic, "storm-based" method, another is a risk-based regional precipitation frequency estimation tool, and the third is an experimental approach utilizing NOAA's state-of-the-art High Resolution Rapid Refresh (HRRR) physically-based dynamical weather prediction model. The goal of the overall project is to use the individual strengths of these different methods to define an updated and broadly acceptable state of the practice for the evaluation and design of dam spillways. This talk will highlight the NOAA research and NOAA's role in the overarching goal of better understanding and characterizing extreme precipitation estimation uncertainty. The research led by NOAA explores a novel high-resolution dataset and post-processing techniques using a super-ensemble of hourly forecasts from the HRRR model. We also investigate how this rich dataset may be combined with statistical methods to optimally cast the data in probabilistic frameworks. NOAA expertise in the physical processes that drive extreme precipitation is also employed to develop careful testing and an improved understanding of the limitations of older estimation methods and assumptions. The process of decision making in the midst of uncertainty is a major part of this study. We will speak to how the three approaches may be used in concert with one another to manage risk and enhance resiliency in the midst of uncertainty. Finally, the presentation will also address the implications of including climate change in future extreme precipitation estimation studies.

  15. Extreme Value Analysis of hydro meteorological extremes in the ClimEx Large-Ensemble

    NASA Astrophysics Data System (ADS)

    Wood, R. R.; Martel, J. L.; Willkofer, F.; von Trentini, F.; Schmid, F. J.; Leduc, M.; Frigon, A.; Ludwig, R.

    2017-12-01

    Many studies show an increase in the magnitude and frequency of hydrological extreme events in the course of climate change. However, the contribution of natural variability to the magnitude and frequency of hydrological extreme events is not yet settled. A reliable estimate of extreme events is of great interest for water management and public safety. In the course of the ClimEx Project (www.climex-project.org), a new single-model large-ensemble was created by dynamically downscaling the CanESM2 large-ensemble with the Canadian Regional Climate Model version 5 (CRCM5) for a European domain and a northeastern North American domain. The ClimEx 50-member large-ensemble (CRCM5 driven by the CanESM2 large-ensemble) makes a thorough analysis of natural variability in extreme events possible. Are current extreme value statistical methods able to account for natural variability? How large is the natural variability of, e.g., a 1/100-year return period derived from a 50-member large-ensemble for Europe and northeastern North America? We address these questions by applying various generalized extreme value (GEV) distributions to the ClimEx large-ensemble. Various return levels (5-, 10-, 20-, 30-, 60- and 100-year), based on time series of various lengths (20, 30, 50, 100 and 1500 years), are analysed for the maximum one-day precipitation (RX1d), the maximum three-hourly precipitation (RX3h) and the streamflow of selected catchments in Europe. The long time series of the ClimEx ensemble (7500 years) allows us to give a first reliable estimate of the magnitude and frequency of certain extreme events.
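
    As a minimal illustration of the GEV-based return-level estimation involved, the sketch below fits a GEV to synthetic annual maxima and evaluates the return levels mentioned above; the data stand in for a single grid cell of a single ensemble member, whereas the study repeats such fits across 50 members and varying record lengths:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    annual_max = stats.genextreme.rvs(c=-0.1, loc=40.0, scale=10.0,
                                      size=50, random_state=rng)  # e.g. RX1d [mm]

    shape, loc, scale = stats.genextreme.fit(annual_max)

    for T in (5, 10, 20, 30, 60, 100):
        level = stats.genextreme.ppf(1.0 - 1.0 / T, shape, loc, scale)
        print(f"{T:4d}-year return level: {level:6.1f} mm")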

  16. Integrating genome-wide association studies and gene expression data highlights dysregulated multiple sclerosis risk pathways.

    PubMed

    Liu, Guiyou; Zhang, Fang; Jiang, Yongshuai; Hu, Yang; Gong, Zhongying; Liu, Shoufeng; Chen, Xiuju; Jiang, Qinghua; Hao, Junwei

    2017-02-01

    Much effort has been expended on identifying the genetic determinants of multiple sclerosis (MS). Existing large-scale genome-wide association study (GWAS) datasets provide strong support for using pathway and network-based analysis methods to investigate the mechanisms underlying MS. However, no shared genetic pathways have been identified to date. We hypothesize that shared genetic pathways may indeed exist in different MS-GWAS datasets. Here, we report results from a three-stage analysis of GWAS and expression datasets. In stage 1, we conducted multiple pathway analyses of two MS-GWAS datasets. In stage 2, we performed a candidate pathway analysis of the large-scale MS-GWAS dataset. In stage 3, we performed a pathway analysis using the dysregulated MS gene list from seven human MS case-control expression datasets. In stage 1, we identified 15 shared pathways. In stage 2, we successfully replicated 14 of these 15 significant pathways. In stage 3, we found that dysregulated MS genes were significantly enriched in 10 of 15 MS risk pathways identified in stages 1 and 2. We report shared genetic pathways in different MS-GWAS datasets and highlight some new MS risk pathways. Our findings provide new insights on the genetic determinants of MS.

  17. Extension of research data repository system to support direct compute access to biomedical datasets: enhancing Dataverse to support large datasets.

    PubMed

    McKinney, Bill; Meyer, Peter A; Crosas, Mercè; Sliz, Piotr

    2017-01-01

    Access to experimental X-ray diffraction image data is important for validation and reproduction of macromolecular models and indispensable for the development of structural biology processing methods. In response to the evolving needs of the structural biology community, we recently established a diffraction data publication system, the Structural Biology Data Grid (SBDG, data.sbgrid.org), to preserve primary experimental datasets supporting scientific publications. All datasets published through the SBDG are freely available to the research community under a public domain dedication license, with metadata compliant with the DataCite Schema (schema.datacite.org). A proof-of-concept study demonstrated community interest and utility. Publication of large datasets is a challenge shared by several fields, and the SBDG has begun collaborating with the Institute for Quantitative Social Science at Harvard University to extend the Dataverse (dataverse.org) open-source data repository system to structural biology datasets. Several extensions are necessary to support the size and metadata requirements of structural biology datasets. In this paper, we describe one such extension: functionality supporting preservation of file system structure within Dataverse, which is essential both for in-place computation and for supporting non-HTTP data transfers. © 2016 New York Academy of Sciences.

  18. Predicting Vegetation Condition from ASCAT Soil Water Index over Southwest India

    NASA Astrophysics Data System (ADS)

    Pfeil, Isabella Maria; Hochstöger, Simon; Amarnath, Giriraj; Pani, Peejush; Enenkel, Markus; Wagner, Wolfgang

    2017-04-01

    In India, extreme water scarcity events are expected to occur on average every five years. Record-breaking droughts affecting millions of human beings and livestock are common. If the south-west monsoon (summer monsoon) is delayed or brings less rainfall than expected, a season's harvest can be destroyed despite optimal farm management, leading, in the worst case, to life-threatening circumstances for a large number of farmers. Therefore, the monitoring of key drought indicators, such as the health of the vegetation, and subsequent early warning are crucial. The aim of this work is to predict vegetation state from earth observation data instead of relying on models, which need a lot of input data and increase the complexity of error propagation, or on seasonal forecasts, which are often too uncertain to be used as a regression component for a vegetation parameter. While precipitation is the main water supply for large parts of India's agricultural areas, vegetation datasets such as the Normalized Difference Vegetation Index (NDVI) provide reliable estimates of vegetation greenness that can be related to vegetation health. Satellite-derived soil moisture represents the missing link between a deficit in rainfall and the response of vegetation. In particular, the water available in the root zone plays an important role for near-future vegetation health. Exploiting the added value of root zone soil moisture is therefore crucial, and its use in vegetation studies presents an added value for drought analyses and decision support. The soil water index (SWI) dataset derived from the Advanced Scatterometer (ASCAT) on board the Metop satellites represents the water content available in the root zone. This dataset shows a strong correlation with NDVI data obtained from measurements of the Moderate Resolution Imaging Spectroradiometer (MODIS), which is exploited in this study. A linear regression function is fit to the multi-year SWI and NDVI dataset with a temporal resolution of eight days, returning a set of parameters for every eight-day period of the year. Those parameters are then used to predict vegetation health based on the SWI up to 32 days after the latest available SWI and NDVI observations. In this work, the prediction was carried out for multiple eight-day periods in the year 2015 for three representative districts in India, and then compared to the actually observed NDVI during these periods, showing very similar spatial patterns in most analyzed regions and periods. This approach enables the prediction of vegetation health based on root zone soil moisture instead of relying on agro-meteorological models, which often lack crucial input data in remote regions.
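
    A minimal sketch of the per-period regression idea, with synthetic stand-ins for the multi-year SWI and NDVI records (the array shapes and coefficients are invented for illustration):

    import numpy as np

    rng = np.random.default_rng(2)
    n_years, n_periods = 10, 46                 # 46 eight-day periods per year
    swi = rng.uniform(5, 60, size=(n_years, n_periods))
    ndvi = 0.2 + 0.006 * swi + rng.normal(0, 0.02, size=swi.shape)

    # Fit one (slope, intercept) pair per calendar period of the year.
    coeffs = [np.polyfit(swi[:, p], ndvi[:, p], deg=1) for p in range(n_periods)]

    def predict_ndvi(swi_value, period):
        slope, intercept = coeffs[period]
        return slope * swi_value + intercept

    print(predict_ndvi(35.0, period=20))        # NDVI forecast from current SWI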

  19. Evaluation of reanalysis datasets against observational soil temperature data over China

    NASA Astrophysics Data System (ADS)

    Yang, Kai; Zhang, Jingyong

    2018-01-01

    Soil temperature is a key land surface variable, and is a potential predictor for seasonal climate anomalies and extremes. Using observational soil temperature data in China for 1981-2005, we evaluate four reanalysis datasets, the land surface reanalysis of the European Centre for Medium-Range Weather Forecasts (ERA-Interim/Land), the second Modern-Era Retrospective analysis for Research and Applications (MERRA-2), the National Center for Environmental Prediction Climate Forecast System Reanalysis (NCEP-CFSR), and version 2 of the Global Land Data Assimilation System (GLDAS-2.0), with a focus on the 40 cm soil layer. The results show that the reanalysis data mainly reproduce the spatial distributions of soil temperature in summer and winter, especially over the east of China, but generally underestimate their magnitudes. Owing to the influence of precipitation on soil temperature, the four datasets perform better in winter than in summer. The ERA-Interim/Land and GLDAS-2.0 produce spatial characteristics of the climatological mean that are similar to observations. The interannual variability of soil temperature is well reproduced by the ERA-Interim/Land dataset in summer and by the CFSR dataset in winter. The linear trend of soil temperature in summer is well rebuilt by the reanalysis datasets. We demonstrate that soil heat fluxes in April-June and in winter are highly correlated with the soil temperature in summer and winter, respectively. Different estimations of surface energy balance components can contribute to the different behaviors of the reanalysis products in estimating soil temperature. In addition, the reanalysis datasets mainly rebuild the northwest-southeast gradient of soil temperature memory over China.

  20. Deep Learning @15 Petaflops/second: Semi-supervised pattern detection for 15 Terabytes of climate data

    NASA Astrophysics Data System (ADS)

    Collins, W. D.; Wehner, M. F.; Prabhat, M.; Kurth, T.; Satish, N.; Mitliagkas, I.; Zhang, J.; Racah, E.; Patwary, M.; Sundaram, N.; Dubey, P.

    2017-12-01

    Anthropogenically-forced climate changes in the number and character of extreme storms have the potential to significantly impact human and natural systems. Current high-performance computing enables multidecadal simulations with global climate models at resolutions of 25 km or finer. Such high-resolution simulations are demonstrably superior to the coarser simulations available in the Coupled Model Intercomparison Project (CMIP5) in simulating extreme storms such as tropical cyclones, and provide the capability to more credibly project future changes in extreme storm statistics and properties. The identification and tracking of storms in the voluminous model output is very challenging, as it is impractical to manually identify storms due to the enormous size of the datasets; automated procedures are therefore used. Traditionally, these procedures apply a multi-variate set of physical conditions based on known properties of the class of storms in question. In recent years, we have successfully demonstrated that Deep Learning produces state-of-the-art results for pattern detection in climate data. We have developed supervised and semi-supervised convolutional architectures for detecting and localizing tropical cyclones, extra-tropical cyclones and atmospheric rivers in simulation data. One of the primary challenges in the applicability of Deep Learning to climate data is the expensive training phase. Typical networks may take days to converge on 10 GB-sized datasets, while the climate science community has ready access to O(10 TB)-O(PB) sized datasets. In this work, we present the most scalable implementation of Deep Learning to date. We successfully scale a unified, semi-supervised convolutional architecture on all of the Cori Phase II supercomputer at NERSC, using the IntelCaffe, MKL and MLSL libraries. We have optimized single-node MKL libraries to obtain 1-4 TF on single KNL nodes, and have developed a novel hybrid parameter update strategy to improve scaling to 9600 KNL nodes (600,000 cores). We obtain 15 PF performance over the course of the training run, setting a new high-water mark for the HPC and Deep Learning communities. This talk will share insights on how to obtain this extreme level of performance, current gaps/challenges and implications for the climate science community.

  1. Extreme rainfall, vulnerability and risk: a continental-scale assessment for South America.

    PubMed

    Vörösmarty, Charles J; Bravo de Guenni, Lelys; Wollheim, Wilfred M; Pellerin, Brian; Bjerklie, David; Cardoso, Manoel; D'Almeida, Cassiano; Green, Pamela; Colon, Lilybeth

    2013-11-13

    Extreme weather continues to preoccupy society as a formidable public safety concern bearing huge economic costs. While attention has focused on global climate change and how it could intensify key elements of the water cycle such as precipitation and river discharge, it is the conjunction of geophysical and socioeconomic forces that shapes human sensitivity and risks to weather extremes. We demonstrate here the use of high-resolution geophysical and population datasets together with documentary reports of rainfall-induced damage across South America over a multi-decadal, retrospective time domain (1960-2000). We define and map extreme precipitation hazard, exposure, affected populations, vulnerability and risk, and use these variables to analyse the impact of floods as a water security issue. Geospatial experiments uncover major sources of risk from natural climate variability and population growth, with change in climate extremes playing a minor role. While rural populations display the greatest relative sensitivity to extreme rainfall, urban settings show the highest rates of increasing risk. In the coming decades, rapid urbanization will make South American cities the focal point of future climate threats, but also an opportunity for reducing vulnerability, protecting lives and sustaining economic development through both traditional and ecosystem-based disaster risk management systems.

  2. A Novel Extreme Learning Machine Classification Model for e-Nose Application Based on the Multiple Kernel Approach.

    PubMed

    Jian, Yulin; Huang, Daoyu; Yan, Jia; Lu, Kun; Huang, Ying; Wen, Tailai; Zeng, Tanyue; Zhong, Shijie; Xie, Qilong

    2017-06-19

    A novel classification model, named the quantum-behaved particle swarm optimization (QPSO)-based weighted multiple kernel extreme learning machine (QWMK-ELM), is proposed in this paper. Experimental validation is carried out with two different electronic nose (e-nose) datasets. Unlike existing multiple kernel extreme learning machine (MK-ELM) algorithms, the combination coefficients of the base kernels are regarded as external parameters of single-hidden layer feedforward neural networks (SLFNs). The combination coefficients of the base kernels, the model parameters of each base kernel, and the regularization parameter are optimized by QPSO simultaneously before implementing the kernel extreme learning machine (KELM) with the composite kernel function. Four types of common single kernel functions (Gaussian kernel, polynomial kernel, sigmoid kernel, and wavelet kernel) are utilized to constitute different composite kernel functions. Moreover, the method is also compared with other existing classification methods: extreme learning machine (ELM), kernel extreme learning machine (KELM), k-nearest neighbors (KNN), support vector machine (SVM), multi-layer perceptron (MLP), radial basis function neural network (RBFNN), and probabilistic neural network (PNN). The results demonstrate that the proposed QWMK-ELM outperforms the aforementioned methods, not only in precision, but also in efficiency for gas classification.
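
    A stripped-down kernel ELM with a fixed-weight composite kernel conveys the core computation; unlike QWMK-ELM, the kernel weights and parameters here are hand-set rather than optimized by QPSO, and the data are toy stand-ins for e-nose feature vectors:

    import numpy as np

    def rbf(A, B, gamma=0.5):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)

    def poly(A, B, degree=2, c0=1.0):
        return (A @ B.T + c0) ** degree

    rng = np.random.default_rng(3)
    X = rng.normal(size=(60, 4))                        # toy feature vectors
    y = (X[:, 0] + X[:, 1] > 0).astype(int)
    T = np.eye(2)[y]                                    # one-hot class targets

    w_rbf, w_poly, C = 0.7, 0.3, 10.0                   # kernel weights, regularization
    K = w_rbf * rbf(X, X) + w_poly * poly(X, X)         # composite kernel matrix
    alpha = np.linalg.solve(np.eye(len(X)) / C + K, T)  # KELM output weights

    X_new = rng.normal(size=(5, 4))
    K_new = w_rbf * rbf(X_new, X) + w_poly * poly(X_new, X)
    print(K_new @ alpha)                                # class scores; argmax = label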

  3. Machine Learning, Sentiment Analysis, and Tweets: An Examination of Alzheimer's Disease Stigma on Twitter.

    PubMed

    Oscar, Nels; Fox, Pamela A; Croucher, Racheal; Wernick, Riana; Keune, Jessica; Hooker, Karen

    2017-09-01

    Social scientists need practical methods for harnessing large, publicly available datasets that inform the social context of aging. We describe our development of a semi-automated text coding method and use a content analysis of Alzheimer's disease (AD) and dementia portrayal on Twitter to demonstrate its use. The approach improves feasibility of examining large publicly available datasets. Machine learning techniques modeled stigmatization expressed in 31,150 AD-related tweets collected via Twitter's search API based on 9 AD-related keywords. Two researchers manually coded 311 random tweets on 6 dimensions. This input from 1% of the dataset was used to train a classifier against the tweet text and code the remaining 99% of the dataset. Our automated process identified that 21.13% of the AD-related tweets used AD-related keywords to perpetuate public stigma, which could impact stereotypes and negative expectations for individuals with the disease and increase "excess disability". This technique could be applied to questions in social gerontology related to how social media outlets reflect and shape attitudes bearing on other developmental outcomes. Recommendations for the collection and analysis of large Twitter datasets are discussed. © The Author 2017. Published by Oxford University Press on behalf of The Gerontological Society of America. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
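
    The semi-automated coding pipeline reduces to a familiar pattern: hand-code a small random sample, train a text classifier on it, and label the remainder automatically. A minimal sketch with invented tweets and a single "stigmatizing" dimension:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    hand_coded = [
        ("grandma was diagnosed with alzheimers today, we love her", 0),
        ("lol you forgot your keys again, total alzheimers moment", 1),
        ("new alzheimers research shows promising results", 0),
        ("he keeps repeating himself, dementia much, what an idiot", 1),
    ]
    unlabeled = ["walked for alzheimers awareness this weekend",
                 "my memory is so bad, basically dementia haha"]

    texts, labels = zip(*hand_coded)
    vec = TfidfVectorizer()
    clf = LogisticRegression().fit(vec.fit_transform(texts), labels)

    print(clf.predict(vec.transform(unlabeled)))   # 1 = coded as stigmatizing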

  4. Parallel task processing of very large datasets

    NASA Astrophysics Data System (ADS)

    Romig, Phillip Richardson, III

    This research concerns the use of distributed computing technologies for the analysis and management of very large datasets. Improvements in sensor technology, an emphasis on global change research, and greater access to data warehouses all increase the number of non-traditional users of remotely sensed data. We present a framework for distributed solutions to the challenges of datasets which exceed the online storage capacity of individual workstations. This framework, called parallel task processing (PTP), incorporates both the task- and data-level parallelism exemplified by many image processing operations. An implementation based on the principles of PTP, called Tricky, is also presented. Additionally, we describe the challenges and practical issues in modeling the performance of parallel task processing with large datasets. We present a mechanism for estimating the running time of each unit of work within a system and an algorithm that uses these estimates to simulate the execution environment and produce estimated runtimes. Finally, we describe and discuss experimental results which validate the design. Specifically, the system (a) is able to perform computation on datasets which exceed the capacity of any one disk, (b) provides a reduction of overall computation time as a result of the task distribution, even with the additional cost of data transfer and management, and (c) in the simulation mode accurately predicts the performance of the real execution environment.

  5. Understanding extreme rainfall events in Australia through historical data

    NASA Astrophysics Data System (ADS)

    Ashcroft, Linden; Karoly, David John

    2016-04-01

    Historical climate data recovery is still an emerging field in the Australian region. The majority of Australia's instrumental climate analyses begin in 1900 for rainfall and 1910 for temperature, particularly those focussed on extreme event analysis. This data sparsity for the past in turn limits our understanding of long-term climate variability, constraining efforts to predict the impact of future climate change. To address this need for improved historical data in Australia, a new network of recovered climate observations has recently been developed, centred on the highly populated southeastern Australian region (Ashcroft et al., 2014a, 2014b). The dataset includes observations from more than 39 published and unpublished sources and extends from British settlement in 1788 to the formation of the Australian Bureau of Meteorology in 1908. Many of these historical sources provide daily temperature and rainfall information, providing an opportunity to improve understanding of the multidecadal variability of Australia's extreme events. In this study we combine the historical data for three major Australian cities - Melbourne, Sydney and Adelaide - with modern observations to examine extreme rainfall variability over the past 174 years (1839-2013). We first explore two case studies, combining instrumental and documentary evidence to support the occurrence of severe storms in Sydney in 1841 and 1844. These events appear to be at least as extreme as Sydney's modern 24-hour rainfall record. Next we use a suite of rainfall indices to assess the long-term variability of rainfall in southeastern Australia. In particular, we focus on the stationarity of the teleconnection between the El Niño-Southern Oscillation (ENSO) phenomenon and extreme rainfall events. Using ENSO reconstructions derived from both palaeoclimatic and documentary sources, we determine the historical relationship between extreme rainfall in southeastern Australia and ENSO, and examine whether or not this relationship has remained stable since the early to mid-19th century. Ashcroft, L., Gergis, J., Karoly, D.J., 2014a. A historical climate dataset for southeastern Australia, 1788-1859. Geosci. Data J. 1, 158-178. doi:10.1002/gdj3.19 Ashcroft, L., Karoly, D.J., Gergis, J., 2014b. Southeastern Australian climate variability 1860-2009: A multivariate analysis. Int. J. Climatol. 34, 1928-1944. doi:10.1002/joc.3812

  6. A group LASSO-based method for robustly inferring gene regulatory networks from multiple time-course datasets.

    PubMed

    Liu, Li-Zhi; Wu, Fang-Xiang; Zhang, Wen-Jun

    2014-01-01

    As an abstract mapping of the gene regulations in the cell, the gene regulatory network is important to both biological research and practical applications. The reverse engineering of gene regulatory networks from microarray gene expression data is a challenging research problem in systems biology. With the development of biological technologies, multiple time-course gene expression datasets might be collected for a specific gene network under different circumstances. The inference of a gene regulatory network can be improved by integrating these multiple datasets. It is also known that gene expression data may be contaminated with large errors or outliers, which may affect the inference results. A novel method, Huber group LASSO, is proposed to infer the same underlying network topology from multiple time-course gene expression datasets while taking robustness to large errors or outliers into account. To solve the optimization problem involved in the proposed method, an efficient algorithm, which combines the ideas of auxiliary function minimization and block descent, is developed. A stability selection method is adapted to our method to find a network topology consisting of edges with scores. The proposed method is applied to both simulation datasets and real experimental datasets. It shows that Huber group LASSO outperforms the group LASSO in terms of both areas under receiver operating characteristic curves and areas under precision-recall curves. The convergence analysis theoretically shows that the sequence generated by the algorithm converges to the optimal solution of the problem. The simulation and real data examples demonstrate the effectiveness of Huber group LASSO in integrating multiple time-course gene expression datasets and improving resistance to large errors or outliers.
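
    The two ingredients of the method can be sketched with a toy proximal-gradient loop: a Huber-robustified data-fit gradient, which caps the influence of outliers, and group-wise soft-thresholding, which zeroes whole coefficient groups. This is an illustrative simplification under invented data, not the authors' auxiliary-function/block-descent algorithm:

    import numpy as np

    rng = np.random.default_rng(4)
    n, groups = 100, [(0, 3), (3, 6), (6, 9)]          # three groups of three
    X = rng.normal(size=(n, 9))
    beta_true = np.r_[1.0, -1.0, 0.5, 0, 0, 0, 0, 0, 0]
    y = X @ beta_true + rng.normal(0, 0.1, n)
    y[::10] += 8.0                                     # inject large outliers

    def huber_grad(r, delta=1.0):
        return np.clip(r, -delta, delta)               # psi(r): bounded in the tails

    def group_soft_threshold(b, lam):
        out = b.copy()
        for lo, hi in groups:
            norm = np.linalg.norm(b[lo:hi])
            out[lo:hi] = 0.0 if norm <= lam else (1 - lam / norm) * b[lo:hi]
        return out

    beta, step, lam = np.zeros(9), 1e-3, 0.5
    for _ in range(2000):
        grad = -X.T @ huber_grad(y - X @ beta)         # robust data-fit gradient
        beta = group_soft_threshold(beta - step * grad, step * lam)

    print(np.round(beta, 2))                           # zero groups drop out entirely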

  7. Imbalanced class learning in epigenetics.

    PubMed

    Haque, M Muksitul; Skinner, Michael K; Holder, Lawrence B

    2014-07-01

    In machine learning, one of the important criteria for higher classification accuracy is a balanced dataset. Datasets with a large ratio between minority and majority classes hinder learning for any classifier. Datasets with a large difference in the number of instances of the target concepts have an imbalanced class distribution. Such datasets can come from biological data, sensor data, medical diagnostics, or any other domain where labeling instances of the minority class is time-consuming or costly, or where the data may not be easily available. The current study investigates a number of imbalanced class algorithms for solving the imbalanced class distribution present in epigenetic datasets. Epigenetic (DNA methylation) datasets inherently come with few differentially DNA methylated regions (DMR) and a higher number of non-DMR sites. For this class imbalance problem, a number of algorithms are compared, including the TAN+AdaBoost algorithm. Experiments performed on four epigenetic datasets and several known datasets show that an imbalanced dataset can achieve accuracy similar to that of a regular learner on a balanced dataset.
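
    Two of the standard responses to this problem, cost-sensitive class weighting and minority oversampling, can be sketched as follows on a synthetic stand-in for a DMR/non-DMR dataset (features and class sizes are invented):

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import balanced_accuracy_score
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(5)
    X = np.vstack([rng.normal(0.0, 1.0, size=(950, 10)),   # non-DMR sites (majority)
                   rng.normal(1.0, 1.0, size=(50, 10))])   # DMR sites (minority)
    y = np.r_[np.zeros(950), np.ones(50)]
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

    # (1) Cost-sensitive learning: weight classes inversely to their frequency.
    weighted = RandomForestClassifier(class_weight="balanced", random_state=0)
    weighted.fit(X_tr, y_tr)

    # (2) Naive minority oversampling, applied to the training split only
    # (oversampling before splitting would leak duplicates into the test set).
    minority = X_tr[y_tr == 1]
    reps = int((y_tr == 0).sum() // len(minority))
    X_over = np.vstack([X_tr, np.tile(minority, (reps - 1, 1))])
    y_over = np.r_[y_tr, np.ones(len(minority) * (reps - 1))]
    oversampled = RandomForestClassifier(random_state=0).fit(X_over, y_over)

    for name, model in (("weighted", weighted), ("oversampled", oversampled)):
        print(name, balanced_accuracy_score(y_te, model.predict(X_te)))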

  8. WRF-Cordex simulations for Europe: mean and extreme precipitation for present and future climates

    NASA Astrophysics Data System (ADS)

    Cardoso, Rita M.; Soares, Pedro M. M.; Miranda, Pedro M. A.

    2013-04-01

    The Weather Research and Forecasting (WRF-ARW) model, version 3.3.1, was used to perform the European-domain Cordex simulations at 50 km resolution. A first simulation, forced by ERA-Interim (1989-2009), was carried out to evaluate the model's ability to represent mean and extreme precipitation in the present European climate. This evaluation is based on the comparison of WRF results against the ECAD regular gridded dataset of daily precipitation. Results are comparable to recent studies with other models for the European region at this resolution. For the same domain, a control and a future scenario (RCP8.5) simulation were performed to assess the climate change impact on mean and extreme precipitation. These regional simulations were forced by EC-EARTH model results and encompass the periods 1960-2006 and 2006-2100, respectively.

  9. An efficient abnormal cervical cell detection system based on multi-instance extreme learning machine

    NASA Astrophysics Data System (ADS)

    Zhao, Lili; Yin, Jianping; Yuan, Lihuan; Liu, Qiang; Li, Kuan; Qiu, Minghui

    2017-07-01

    Automatic detection of abnormal cells from cervical smear images is in high demand for the annual diagnosis of women's cervical cancer. For this medical cell recognition problem, there are three different feature sections, namely cytology morphology, nuclear chromatin pathology and region intensity. The challenges of this problem lie in combining these features and classifying accurately and efficiently. Thus, we propose an efficient abnormal cervical cell detection system based on the multi-instance extreme learning machine (MI-ELM) to deal with these two questions in one unified framework. MI-ELM is one of the most promising supervised learning classifiers, as it can deal with several feature sections and realistic classification problems analytically. Experiment results over the Herlev dataset demonstrate that the proposed method outperforms three traditional methods for two-class classification in terms of accuracy and runtime.

  10. "Tools For Analysis and Visualization of Large Time- Varying CFD Data Sets"

    NASA Technical Reports Server (NTRS)

    Wilhelms, Jane; vanGelder, Allen

    1999-01-01

    During the four years of this grant (including the one year extension), we have explored many aspects of the visualization of large CFD (Computational Fluid Dynamics) datasets. These have included new direct volume rendering approaches, hierarchical methods, volume decimation, error metrics, parallelization, hardware texture mapping, and methods for analyzing and comparing images. First, we implemented an extremely general direct volume rendering approach that can be used to render rectilinear, curvilinear, or tetrahedral grids, including overlapping multiple zone grids, and time-varying grids. Next, we developed techniques for associating the sample data with a k-d tree, a simple hierarchical data model to approximate samples in the regions covered by each node of the tree, and an error metric for the accuracy of the model. We also explored a new method for determining the accuracy of approximate models based on the light field method described at ACM SIGGRAPH (Association for Computing Machinery Special Interest Group on Computer Graphics) '96. In our initial implementation, we automatically image the volume from 32 approximately evenly distributed positions on the surface of an enclosing tessellated sphere. We then calculate differences between these images under different conditions of volume approximation or decimation.

  11. Imaging Protoplanets: Observing Transition Disks with Non-Redundant Masking

    NASA Astrophysics Data System (ADS)

    Sallum, Stephanie

    2017-01-01

    Transition disks - protoplanetary disks with inner, solar-system-sized clearings - may be shaped by young planets. Directly imaging protoplanets in these objects requires high contrast and resolution, making them promising targets for future extremely large telescopes. The interferometric technique of non-redundant masking (NRM) is well suited for these observations, enabling companion detection at contrasts of 1:100 - 1:1000 at or within the diffraction limit. My dissertation focuses on searching for and characterizing companions in transition disk clearings using NRM. I will briefly describe the technique and present spatially resolved observations of the T Cha and LkCa 15 transition disks. Both of these objects have posited substellar companions. However, multi-epoch T Cha datasets cannot be explained by planets orbiting in the disk plane. Conversely, LkCa 15 data taken with the Large Binocular Telescope (LBT) in single-aperture mode reveal the presence of multiple forming planets. The dual-aperture LBT will provide triple the angular resolution of these observations, dramatically increasing the phase space for exoplanet detection. I will also present new results from the dual-aperture LBT, with resolution similar to that expected for next-generation facilities like GMT.

  12. Database Objects vs Files: Evaluation of alternative strategies for managing large remote sensing data

    NASA Astrophysics Data System (ADS)

    Baru, Chaitan; Nandigam, Viswanath; Krishnan, Sriram

    2010-05-01

    Increasingly, the geoscience user community expects modern IT capabilities to be available in service of their research and education activities, including the ability to easily access and process large remote sensing datasets via online portals such as GEON (www.geongrid.org) and OpenTopography (opentopography.org). However, serving such datasets via online data portals presents a number of challenges. In this talk, we will evaluate the pros and cons of alternative storage strategies for management and processing of such datasets using binary large object implementations (BLOBs) in database systems versus implementation in Hadoop files using the Hadoop Distributed File System (HDFS). The storage and I/O requirements for providing online access to large datasets dictate the need for declustering data across multiple disks, for capacity as well as bandwidth and response time performance. This requires partitioning larger files into a set of smaller files, and is accompanied by the concomitant requirement for managing large numbers of files. Storing these sub-files as BLOBs in a shared-nothing database implemented across a cluster provides the advantage that all the distributed storage management is done by the DBMS. Furthermore, subsetting and processing routines can be implemented as user-defined functions (UDFs) on these BLOBs and would run in parallel across the set of nodes in the cluster. On the other hand, such an implementation creates both storage overheads and constraints, as well as software licensing dependencies. Another approach is to store the files in an external filesystem with pointers to them from within database tables. The filesystem may be a regular UNIX filesystem, a parallel filesystem, or HDFS. In the HDFS case, HDFS would provide the file management capability, while the subsetting and processing routines would be implemented as Hadoop programs using the MapReduce model. Hadoop and its related software libraries are freely available. Another consideration is the strategy used for partitioning large data collections, and large datasets within collections, using round-robin vs hash partitioning vs range partitioning methods. Each has different characteristics in terms of spatial locality of data and the resultant degree of declustering of the computations on the data. Furthermore, we have observed that, in practice, there can be large variations in the frequency of access to different parts of a large data collection and/or dataset, thereby creating "hotspots" in the data. We will evaluate the ability of different approaches to deal effectively with such hotspots, along with alternative mitigation strategies.
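
    A toy sketch of the three partitioning strategies weighed above, with illustrative tile keys and node counts. It makes the locality trade-off concrete: range partitioning keeps neighboring tiles on the same node (good for spatial subsetting, bad for hotspots), while round-robin and hash placement scatter them.

        import bisect
        import hashlib

        def round_robin(keys, n_nodes):
            # Even spread of capacity and bandwidth; no spatial locality.
            return {k: i % n_nodes for i, k in enumerate(keys)}

        def hash_partition(keys, n_nodes):
            # Deterministic placement by key digest; destroys locality,
            # which incidentally also spreads "hotspot" tiles around.
            return {k: int(hashlib.md5(str(k).encode()).hexdigest(), 16) % n_nodes
                    for k in keys}

        def range_partition(keys, boundaries):
            # Contiguous key ranges per node; preserves spatial locality
            # but can concentrate hotspot access on a single node.
            return {k: bisect.bisect_right(boundaries, k) for k in keys}

        tiles = list(range(12))                      # stand-in tile IDs
        print(round_robin(tiles, 3))
        print(hash_partition(tiles, 3))
        print(range_partition(tiles, boundaries=[3, 7]))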

  13. Toward Computational Cumulative Biology by Combining Models of Biological Datasets

    PubMed Central

    Faisal, Ali; Peltonen, Jaakko; Georgii, Elisabeth; Rung, Johan; Kaski, Samuel

    2014-01-01

    A main challenge of data-driven sciences is how to make maximal use of the progressively expanding databases of experimental datasets in order to keep research cumulative. We introduce the idea of a modeling-based dataset retrieval engine designed for relating a researcher's experimental dataset to earlier work in the field. The search is (i) data-driven to enable new findings, going beyond the state of the art of keyword searches in annotations, (ii) modeling-driven, to include both biological knowledge and insights learned from data, and (iii) scalable, as it is accomplished without building one unified grand model of all data. Assuming each dataset has been modeled beforehand, by the researchers or automatically by database managers, we apply a rapidly computable and optimizable combination model to decompose a new dataset into contributions from earlier relevant models. By using the data-driven decomposition, we identify a network of interrelated datasets from a large annotated human gene expression atlas. While tissue type and disease were major driving forces for determining relevant datasets, the found relationships were richer, and the model-based search was more accurate than the keyword search; moreover, it recovered biologically meaningful relationships that are not straightforwardly visible from annotations—for instance, between cells in different developmental stages such as thymocytes and T-cells. Data-driven links and citations matched to a large extent; the data-driven links even uncovered corrections to the publication data, as two of the most linked datasets were not highly cited and turned out to have wrong publication entries in the database. PMID:25427176
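
    The decomposition idea in this record can be caricatured in a few lines: given hypothetical "signature" vectors summarizing earlier dataset models (the columns of A below), a new dataset y is split into non-negative contributions whose weights rank the earlier datasets by relevance. The paper's actual combination model is probabilistic and considerably richer; this is only a linear stand-in.

        import numpy as np
        from scipy.optimize import nnls

        rng = np.random.default_rng(1)
        A = rng.random((50, 8))                 # signatures of 8 earlier dataset models
        true_w = np.array([0, .7, 0, .3, 0, 0, 0, 0])
        y = A @ true_w + 0.01 * rng.normal(size=50)   # the "new" dataset

        weights, _ = nnls(A, y)                 # non-negative decomposition
        print("most relevant earlier datasets:", np.argsort(weights)[::-1][:3])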

  14. Toward computational cumulative biology by combining models of biological datasets.

    PubMed

    Faisal, Ali; Peltonen, Jaakko; Georgii, Elisabeth; Rung, Johan; Kaski, Samuel

    2014-01-01

    A main challenge of data-driven sciences is how to make maximal use of the progressively expanding databases of experimental datasets in order to keep research cumulative. We introduce the idea of a modeling-based dataset retrieval engine designed for relating a researcher's experimental dataset to earlier work in the field. The search is (i) data-driven to enable new findings, going beyond the state of the art of keyword searches in annotations, (ii) modeling-driven, to include both biological knowledge and insights learned from data, and (iii) scalable, as it is accomplished without building one unified grand model of all data. Assuming each dataset has been modeled beforehand, by the researchers or automatically by database managers, we apply a rapidly computable and optimizable combination model to decompose a new dataset into contributions from earlier relevant models. By using the data-driven decomposition, we identify a network of interrelated datasets from a large annotated human gene expression atlas. While tissue type and disease were major driving forces for determining relevant datasets, the found relationships were richer, and the model-based search was more accurate than the keyword search; moreover, it recovered biologically meaningful relationships that are not straightforwardly visible from annotations-for instance, between cells in different developmental stages such as thymocytes and T-cells. Data-driven links and citations matched to a large extent; the data-driven links even uncovered corrections to the publication data, as two of the most linked datasets were not highly cited and turned out to have wrong publication entries in the database.

  15. The 3D Reference Earth Model (REM-3D): Update and Outlook

    NASA Astrophysics Data System (ADS)

    Lekic, V.; Moulik, P.; Romanowicz, B. A.; Dziewonski, A. M.

    2016-12-01

    Elastic properties of the Earth's interior (e.g. density, rigidity, compressibility, anisotropy) vary spatially due to changes in temperature, pressure, composition, and flow. In the 20th century, seismologists constructed reference models of how these quantities vary with depth, notably the PREM model of Dziewonski and Anderson (1981). These 1D reference earth models have proven indispensable in earthquake location, imaging of interior structure, understanding material properties under extreme conditions, and as a reference in other fields, such as particle physics and astronomy. Over the past three decades, more sophisticated efforts by seismologists have yielded several generations of models of how properties vary not only with depth, but also laterally. Yet, though these three-dimensional (3D) models exhibit compelling similarities at large scales, differences in the methodology, representation of structure, and datasets upon which they are based have prevented the creation of 3D community reference models. We propose to overcome these challenges by compiling, reconciling, and distributing a long period (>15 s) reference seismic dataset, from which we will construct a 3D seismic reference model (REM-3D) for the Earth's mantle, which will come in two flavors: a long wavelength smoothly parameterized model and a set of regional profiles. Here, we summarize progress made in the construction of the reference long period dataset, and present preliminary versions of the REM-3D in order to illustrate the two flavors of REM-3D and their relative advantages and disadvantages. As a community reference model with fully quantified uncertainties and tradeoffs, REM-3D will facilitate Earth imaging studies, earthquake characterization, and inferences on temperature and composition in the deep interior, and be of improved utility to emerging scientific endeavors, such as neutrino geoscience. In this presentation, we outline the outlook for setting up advisory community working groups and the community workshop that would assess progress, evaluate model and dataset performance, identify avenues for improvement, and recommend strategies for maximizing model adoption in and utility for the deep Earth community.

  16. Biospark: scalable analysis of large numerical datasets from biological simulations and experiments using Hadoop and Spark.

    PubMed

    Klein, Max; Sharma, Rati; Bohrer, Chris H; Avelis, Cameron M; Roberts, Elijah

    2017-01-15

    Data-parallel programming techniques can dramatically decrease the time needed to analyze large datasets. While these methods have provided significant improvements for sequencing-based analyses, other areas of biological informatics have not yet adopted them. Here, we introduce Biospark, a new framework for performing data-parallel analysis on large numerical datasets. Biospark builds upon the open source Hadoop and Spark projects, bringing domain-specific features for biology. Source code is licensed under the Apache 2.0 open source license and is available at the project website: https://www.assembla.com/spaces/roberts-lab-public/wiki/Biospark Contact: eroberts@jhu.edu. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
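
    Biospark's own API is not detailed in this record, so purely as a hedged illustration of the Hadoop/Spark style of data-parallel analysis it builds on, the sketch below uses plain PySpark: map a per-record summary over simulated numerical trajectories, then reduce to a global statistic.

        import numpy as np
        from pyspark.sql import SparkSession

        spark = SparkSession.builder.appName("numeric-analysis").getOrCreate()
        sc = spark.sparkContext

        # Pretend each record is one simulation trajectory (a numeric array).
        trajectories = [np.random.default_rng(i).normal(size=1000) for i in range(100)]
        rdd = sc.parallelize(trajectories)

        # Map: per-trajectory summary; reduce: mean of the summaries.
        print("global mean:", rdd.map(lambda t: float(t.mean())).mean())
        spark.stop()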

  17. A semiparametric graphical modelling approach for large-scale equity selection.

    PubMed

    Liu, Han; Mulvey, John; Zhao, Tianqi

    2016-01-01

    We propose a new stock selection strategy that exploits rebalancing returns and improves portfolio performance. To effectively harvest rebalancing gains, we apply ideas from elliptical-copula graphical modelling and stability inference to select stocks that are as independent as possible. The proposed elliptical-copula graphical model has a latent Gaussian representation; its structure can be effectively inferred using the regularized rank-based estimators. The resulting algorithm is computationally efficient and scales to large data-sets. To show the efficacy of the proposed method, we apply it to conduct equity selection based on a 16-year health care stock data-set and a large 34-year stock data-set. Empirical tests show that the proposed method is superior to alternative strategies including a principal component analysis-based approach and the classical Markowitz strategy based on the traditional buy-and-hold assumption.

  18. Crystallographic Orientation Relationships (CORs) between rutile inclusions and garnet hosts: towards using COR frequencies as a petrogenetic indicator

    NASA Astrophysics Data System (ADS)

    Griffiths, Thomas; Habler, Gerlinde; Schantl, Philip; Abart, Rainer

    2017-04-01

    Crystallographic orientation relationships (CORs) between crystalline inclusions and their hosts are commonly used to support particular inclusion origins, but often interpretations are based on a small fraction of all inclusions in a system. The electron backscatter diffraction (EBSD) method allows collection of large COR datasets more quickly than other methods while maintaining high spatial resolution. Large datasets allow analysis of the relative frequencies of different CORs, and identification of 'statistical CORs', where certain limited degrees of freedom exist in the orientation relationship between two neighbour crystals (Griffiths et al. 2016). Statistical CORs exist in addition to completely fixed 'specific' CORs (previously the only type of COR considered). We present a comparison of three EBSD single point datasets (all N > 200 inclusions) of rutile inclusions in garnet hosts, covering three rock systems, each with a different geological history: 1) magmatic garnet in pegmatite from the Koralpe complex, Eastern Alps, formed at temperatures > 600°C and low pressures; 2) granulite facies garnet rims on ultra-high-pressure garnets from the Kimi complex, Rhodope Massif; and 3) a Moldanubian granulite from the southeastern Bohemian Massif, equilibrated at peak conditions of 1050°C and 1.6 GPa. The present study is unique because all datasets have been analysed using the same catalogue of potential CORs, therefore relative frequencies and other COR properties can be meaningfully compared. In every dataset > 94% of the inclusions analysed exhibit one of the CORs tested for. Certain CORs are consistently among the most common in all datasets. However, the relative abundances of these common CORs show large variations between datasets (varying from 8 to 42 % relative abundance in one case). Other CORs are consistently uncommon but nonetheless present in every dataset. Lastly, there are some CORs that are common in one of the datasets and rare in the remainder. These patterns suggest competing influences on relative COR frequencies. Certain CORs seem consistently favourable, perhaps pointing to very stable low energy configurations, whereas some CORs are favoured in only one system, perhaps due to particulars of the formation mechanism, kinetics or conditions. Variations in COR frequencies between datasets seem to correlate with the conditions of host-inclusion system evolution. The two datasets from granulite-facies metamorphic samples show more similarities to each other than to the pegmatite dataset, and the sample inferred to have experienced the highest temperatures (Moldanubian granulite) shows the lowest diversity of CORs, low frequencies of statistical CORs and the highest frequency of specific CORs. These results provide evidence that petrological information is being encoded in COR distributions. They make a strong case for further studies of the factors influencing COR development and for measurements of COR distributions in other systems and between different phases. Griffiths, T.A., Habler, G., Abart, R. (2016): Crystallographic orientation relationships in host-inclusion systems: New insights from large EBSD data sets. Amer. Miner., 101, 690-705.

  19. Bayesian Non-Stationary Index Gauge Modeling of Gridded Precipitation Extremes

    NASA Astrophysics Data System (ADS)

    Verdin, A.; Bracken, C.; Caldwell, J.; Balaji, R.; Funk, C. C.

    2017-12-01

    We propose a Bayesian non-stationary model to generate watershed scale gridded estimates of extreme precipitation return levels. The Climate Hazards Group Infrared Precipitation with Stations (CHIRPS) dataset is used to obtain gridded seasonal precipitation extremes over the Taylor Park watershed in Colorado for the period 1981-2016. For each year, grid cells within the Taylor Park watershed are aggregated to a representative "index gauge," which is input to the model. Precipitation-frequency curves for the index gauge are estimated for each year, using climate variables with significant teleconnections as proxies. Such proxies enable short-term forecasting of extremes for the upcoming season. Disaggregation ratios of the index gauge to the grid cells within the watershed are computed for each year and preserved to translate the index gauge precipitation-frequency curve to gridded precipitation-frequency maps for select return periods. Gridded precipitation-frequency maps are of the same spatial resolution as CHIRPS (0.05° x 0.05°). We verify that the disaggregation method preserves spatial coherency of extremes in the Taylor Park watershed. Validation of the index gauge extreme precipitation-frequency method consists of ensuring extreme value statistics are preserved on a grid cell basis. To this end, a non-stationary extreme precipitation-frequency analysis is performed on each grid cell individually, and the resulting frequency curves are compared to those produced by the index gauge disaggregation method.
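
    A minimal stationary building block for the analysis above: fit a generalized extreme value (GEV) distribution to seasonal precipitation maxima and read off return levels with SciPy. The Bayesian, covariate-driven non-stationary machinery and the index-gauge disaggregation are not reproduced, and the data below are simulated stand-ins for CHIRPS maxima.

        import numpy as np
        from scipy.stats import genextreme

        # 36 seasons of simulated precipitation maxima (mm).
        annual_max = genextreme.rvs(-0.1, loc=30, scale=8, size=36, random_state=0)

        shape, loc, scale = genextreme.fit(annual_max)
        for T in (10, 50, 100):
            # T-year return level = quantile with exceedance probability 1/T.
            level = genextreme.ppf(1 - 1 / T, shape, loc=loc, scale=scale)
            print(f"{T}-year return level: {level:.1f} mm")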

  20. Multiresolution persistent homology for excessively large biomolecular datasets

    NASA Astrophysics Data System (ADS)

    Xia, Kelin; Zhao, Zhixiong; Wei, Guo-Wei

    2015-10-01

    Although persistent homology has emerged as a promising tool for the topological simplification of complex data, it is computationally intractable for large datasets. We introduce multiresolution persistent homology to handle excessively large datasets. We match the resolution with the scale of interest so as to represent large scale datasets with appropriate resolution. We utilize the flexibility-rigidity index to assess the topological connectivity of the data set and define a rigidity density for the filtration analysis. By appropriately tuning the resolution of the rigidity density, we are able to focus the topological lens on the scale of interest. The proposed multiresolution topological analysis is validated by a hexagonal fractal image which has three distinct scales. We further demonstrate the proposed method for extracting topological fingerprints from DNA molecules. In particular, the topological persistence of a virus capsid with 273 780 atoms is successfully analyzed, which would otherwise be inaccessible to the normal point cloud method and unreliable by using coarse-grained multiscale persistent homology. The proposed method has also been successfully applied to protein domain classification; to our knowledge, this is the first time that persistent homology has been used for practical protein domain analysis. The proposed multiresolution topological method has potential applications in arbitrary data sets, such as social networks, biological networks, and graphs.
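
    The resolution mechanism described above can be sketched under simplifying assumptions: a flexibility-rigidity-style density is a sum of Gaussian kernels whose width eta acts as the resolution knob. The persistent homology step itself (a filtration on this density, via e.g. a cubical complex package) is omitted here.

        import numpy as np

        def rigidity_density(points, grid, eta):
            # Sum of Gaussian kernels of width eta centered at the data
            # points, evaluated on a grid; large eta blurs fine structure
            # so a subsequent filtration sees only large-scale topology.
            d = np.linalg.norm(grid[:, None, :] - points[None, :, :], axis=-1)
            return np.exp(-(d / eta) ** 2).sum(axis=1)

        theta = np.linspace(0, 2 * np.pi, 100, endpoint=False)
        pts = np.c_[np.cos(theta), np.sin(theta)]          # points on a circle
        gx, gy = np.meshgrid(np.linspace(-2, 2, 40), np.linspace(-2, 2, 40))
        grid = np.c_[gx.ravel(), gy.ravel()]

        coarse = rigidity_density(pts, grid, eta=1.0)  # circle blurs into a disk
        fine = rigidity_density(pts, grid, eta=0.1)    # the ring (1-cycle) survives
        print(coarse.shape, fine.shape)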

  1. Elitist Binary Wolf Search Algorithm for Heuristic Feature Selection in High-Dimensional Bioinformatics Datasets.

    PubMed

    Li, Jinyan; Fong, Simon; Wong, Raymond K; Millham, Richard; Wong, Kelvin K L

    2017-06-28

    Due to the high-dimensional characteristics of such datasets, we propose a new method based on the Wolf Search Algorithm (WSA) for optimising the feature selection problem. The proposed approach uses the natural strategy established by Charles Darwin; that is, 'It is not the strongest of the species that survives, but the most adaptable'. This means that in the evolution of a swarm, the elitists are motivated to quickly obtain more and better resources. The memory function helps the proposed method avoid repeated searches of the worst positions in order to enhance the effectiveness of the search, while the binary strategy simplifies the feature selection problem into an analogous function optimisation problem. Furthermore, the wrapper strategy couples these strengthened wolves with an extreme learning machine classifier to find a sub-dataset with a reasonable number of features that offers the maximum correctness of global classification models. The experimental results from the six public high-dimensional bioinformatics datasets tested demonstrate that the proposed method can beat some of the conventional feature selection methods by up to 29% in classification accuracy, and outperform previous WSAs by up to 99.81% in computational time.

  2. Convolutional Neural Network Based on Extreme Learning Machine for Maritime Ships Recognition in Infrared Images.

    PubMed

    Khellal, Atmane; Ma, Hongbin; Fei, Qing

    2018-05-09

    The success of Deep Learning models, notably convolutional neural networks (CNNs), makes them the favorable solution for object recognition systems in both visible and infrared domains. However, the lack of training data in the case of maritime ship research leads to poor performance due to the problem of overfitting. In addition, the back-propagation algorithm used to train CNNs is very slow and requires tuning many hyperparameters. To overcome these weaknesses, we introduce a new approach fully based on the Extreme Learning Machine (ELM) to learn useful CNN features and perform a fast and accurate classification, which is suitable for infrared-based recognition systems. The proposed approach combines an ELM-based learning algorithm to train the CNN for discriminative features extraction and an ELM-based ensemble for classification. The experimental results on the VAIS dataset, which is the largest dataset of maritime ships, confirm that the proposed approach outperforms state-of-the-art models in terms of generalization performance and training speed. For instance, the proposed model is up to 950 times faster than the traditional back-propagation based training of convolutional neural networks, primarily for low-level features extraction.

  3. Immersive Interaction, Manipulation and Analysis of Large 3D Datasets for Planetary and Earth Sciences

    NASA Astrophysics Data System (ADS)

    Pariser, O.; Calef, F.; Manning, E. M.; Ardulov, V.

    2017-12-01

    We will present the implementation and study of several use-cases of utilizing Virtual Reality (VR) for immersive display, interaction and analysis of large and complex 3D datasets. These datasets have been acquired by instruments across several Earth, Planetary and Solar Space Robotics missions. First, we will describe the architecture of the common application framework that was developed to input data, interface with VR display devices and program input controllers in various computing environments. Tethered and portable VR technologies will be contrasted and the advantages of each highlighted. We will proceed to present experimental immersive analytics visual constructs that enable augmentation of 3D datasets with 2D ones such as images and statistical and abstract data. We will conclude by presenting a comparative analysis with traditional visualization applications and share the feedback provided by our users: scientists and engineers.

  4. Decision tree methods: applications for classification and prediction.

    PubMed

    Song, Yan-Yan; Lu, Ying

    2015-04-25

    Decision tree methodology is a commonly used data mining method for establishing classification systems based on multiple covariates or for developing prediction algorithms for a target variable. This method classifies a population into branch-like segments that construct an inverted tree with a root node, internal nodes, and leaf nodes. The algorithm is non-parametric and can efficiently deal with large, complicated datasets without imposing a complicated parametric structure. When the sample size is large enough, study data can be divided into training and validation datasets: the training dataset is used to build a decision tree model, and the validation dataset to decide on the appropriate tree size needed to achieve the optimal final model. This paper introduces frequently used algorithms for developing decision trees (including CART, C4.5, CHAID, and QUEST) and describes the SPSS and SAS programs that can be used to visualize tree structure.
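
    A minimal sketch of the train/validation recipe in this record, using scikit-learn's CART implementation rather than the SPSS/SAS programs the paper describes: grow trees at the cost-complexity pruning levels suggested by the training data, then keep the tree size that scores best on the validation split.

        from sklearn.datasets import load_breast_cancer
        from sklearn.model_selection import train_test_split
        from sklearn.tree import DecisionTreeClassifier

        X, y = load_breast_cancer(return_X_y=True)
        X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

        # Candidate tree sizes via cost-complexity pruning on training data.
        alphas = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(
            X_tr, y_tr).ccp_alphas

        # The validation data decides the appropriate tree size.
        best = max((DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(X_tr, y_tr)
                    for a in alphas),
                   key=lambda tree: tree.score(X_val, y_val))
        print("validation accuracy:", best.score(X_val, y_val),
              "| leaves:", best.get_n_leaves())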

  5. Spatio-Temporal Data Analysis at Scale Using Models Based on Gaussian Processes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Stein, Michael

    Gaussian processes are the most commonly used statistical model for spatial and spatio-temporal processes that vary continuously. They are broadly applicable in the physical sciences and engineering and are also frequently used to approximate the output of complex computer models, deterministic or stochastic. We undertook research related to theory, computation, and applications of Gaussian processes as well as some work on estimating extremes of distributions for which a Gaussian process assumption might be inappropriate. Our theoretical contributions include the development of new classes of spatial-temporal covariance functions with desirable properties and new results showing that certain covariance models lead to predictions with undesirable properties. To understand how Gaussian process models behave when applied to deterministic computer models, we derived what we believe to be the first significant results on the large sample properties of estimators of parameters of Gaussian processes when the actual process is a simple deterministic function. Finally, we investigated some theoretical issues related to maxima of observations with varying upper bounds and found that, depending on the circumstances, standard large sample results for maxima may or may not hold. Our computational innovations include methods for analyzing large spatial datasets when observations fall on a partially observed grid and methods for estimating parameters of a Gaussian process model from observations taken by a polar-orbiting satellite. In our application of Gaussian process models to deterministic computer experiments, we carried out some matrix computations that would have been infeasible using even extended precision arithmetic by focusing on special cases in which all elements of the matrices under study are rational and using exact arithmetic. The applications we studied include total column ozone as measured from a polar-orbiting satellite, sea surface temperatures over the Pacific Ocean, and annual temperature extremes at a site in New York City. In each of these applications, our theoretical and computational innovations were directly motivated by the challenges posed by analyzing these and similar types of data.

  6. Genetic insights into dispersal distance and disperser fitness of African lions (Panthera leo) from the latitudinal extremes of the Kruger National Park, South Africa.

    PubMed

    van Hooft, Pim; Keet, Dewald F; Brebner, Diana K; Bastos, Armanda D S

    2018-04-03

    Female lions generally do not disperse far beyond their natal range, while males can disperse distances of over 200 km. However, in bush-like ecosystems dispersal distances less than 25 km are reported. Here, we investigate dispersal in lions sampled from the northern and southern extremes of Kruger National Park, a bush-like ecosystem in South Africa where bovine tuberculosis prevalence ranges from low to high across a north-south gradient. A total of 109 individuals sampled from 1998 to 2004 were typed using 11 microsatellite markers, and mitochondrial RS-3 gene sequences were generated for 28 of these individuals. Considerable north-south genetic differentiation was observed in both datasets. Dispersal was male-biased and generally further than 25 km, with long-distance male gene flow (75-200 km, detected for two individuals) confirming that male lions can travel large distances, even in bush-like ecosystems. In contrast, females generally did not disperse further than 20 km, with two distinctive RS-3 gene clusters for northern and southern females indicating no or rare long-distance female dispersal. However, dispersal rate for the predominantly non-territorial females from southern Kruger (fraction dispersers ≥0.68) was higher than previously reported. Of relevance was the below-average body condition of dispersers and their low presence in prides, suggesting low fitness. Large genetic differences between the two sampling localities, and low relatedness among males and high dispersal rates among females in the south, suggestive of unstable territory structure and high pride turnover, have potential implications for spread of diseases and the management of the Kruger lion population.

  7. Terra incognita: The unknown risks to environmental quality posed by the spatial distribution and abundance of concentrated animal feeding operations.

    PubMed

    Martin, Katherine L; Emanuel, Ryan E; Vose, James M

    2018-06-18

    Concentrated animal feeding operations (CAFOs) pose wide-ranging environmental risks to many parts of the US and across the globe, but datasets for CAFO risk assessments are not readily available. Within the United States, some of the greatest concentrations of CAFOs occur in North Carolina. It is also one of the only states with publicly accessible location data for classes of CAFOs that are required to obtain water quality permits from the U.S. Environmental Protection Agency (EPA); however, there are no public data sources for the large number of CAFOs that do not require EPA water quality permits. We combined public records of CAFO locations with data collected in North Carolina by the Waterkeeper and Riverkeeper Alliances to examine the distribution of both permitted and non-permitted CAFOs across the state. Over half (55%) of the state's 6646 CAFOs are located in the Coastal Plain, a low-lying region vulnerable to flooding associated with regular cyclonic and convective storms. We identified 19% of CAFOs within 100 m of the nearest stream, and some as close as 15 m to the nearest stream, a common riparian buffer width for water quality management. Future climate scenarios suggest that large storm events will become increasingly extreme and that dry interstorm periods could lengthen. Such extremes could exacerbate the environmental impacts of CAFOs. Understanding the potential impacts of CAFO agroecosystems will require remote sensing to identify CAFOs, fieldwork to determine the extent of environmental footprints, and modeling to identify thresholds that determine environmental risk under changing conditions. Copyright © 2018 Elsevier B.V. All rights reserved.

  8. PeakRanger: A cloud-enabled peak caller for ChIP-seq data

    PubMed Central

    2011-01-01

    Background Chromatin immunoprecipitation (ChIP), coupled with massively parallel short-read sequencing (seq), is used to probe chromatin dynamics. Although there are many algorithms to call peaks from ChIP-seq datasets, most are tuned either to handle punctate sites, such as transcriptional factor binding sites, or broad regions, such as histone modification marks; few can do both. Other algorithms are limited in their configurability, performance on large data sets, and ability to distinguish closely-spaced peaks. Results In this paper, we introduce PeakRanger, a peak caller software package that works equally well on punctate and broad sites, can resolve closely-spaced peaks, has excellent performance, and is easily customized. In addition, PeakRanger can be run in a parallel cloud computing environment to obtain extremely high performance on very large data sets. We present a series of benchmarks to evaluate PeakRanger against 10 other peak callers, and demonstrate the performance of PeakRanger on both real and synthetic data sets. We also present real world usages of PeakRanger, including peak-calling in the modENCODE project. Conclusions Compared to other peak callers tested, PeakRanger offers improved resolution in distinguishing extremely closely-spaced peaks. PeakRanger has above-average spatial accuracy in terms of identifying the precise location of binding events. PeakRanger also has excellent sensitivity and specificity in all benchmarks evaluated. In addition, PeakRanger offers significant improvements in run time when running on a single processor system, and very marked improvements when allowed to take advantage of the MapReduce parallel environment offered by a cloud computing resource. PeakRanger can be downloaded at the official site of the modENCODE project: http://www.modencode.org/software/ranger/ PMID:21554709

  9. Urban heat stress: novel survey suggests health and fitness as future avenue for research and adaptation strategies

    NASA Astrophysics Data System (ADS)

    Schuster, Christian; Honold, Jasmin; Lauf, Steffen; Lakes, Tobia

    2017-04-01

    Extreme heat has tremendous adverse effects on human health. Heat stress is expected to further increase due to urbanization, an aging population, and global warming. Previous research has identified correlations between extreme heat and mortality. However, the underlying physical, behavioral, environmental, and social risk factors remain largely unknown and comprehensive quantitative investigation on an individual level is lacking. We conducted a new cross-sectional household questionnaire survey to analyze individual heat impairment (self-assessed and reported symptoms) and a large set of potential risk factors in the city of Berlin, Germany. This unique dataset (n = 474) allows for the investigation of new relationships, especially between health/fitness and urban heat stress. Our analysis found previously undocumented associations, leading us to generate new hypotheses for future research: various health/fitness variables returned the strongest associations with individual heat stress. Our primary hypothesis is that age, the most commonly used risk factor, is outperformed by health/fitness as a dominant risk factor. Related variables seem to more accurately represent humans’ cardiovascular capacity to handle elevated temperature. Among them, active travel was associated with reduced heat stress. We observed statistical associations for heat exposure regarding the individual living space but not for the neighborhood environment. Heat stress research should further investigate individual risk factors of heat stress using quantitative methodologies. It should focus more on health and fitness and systematically explore their role in adaptation strategies. The potential of health and fitness to reduce urban heat stress risk means that encouraging active travel could be an effective adaptation strategy. Through reduced CO2 emissions from urban transport, societies could reap double rewards by addressing two root causes of urban heat stress: population health and global warming.

  10. Lack of cool, not warm, extremes distinguishes late 20th Century climate in 979-year Tasmanian summer temperature reconstruction

    NASA Astrophysics Data System (ADS)

    Allen, K. J.; Cook, E. R.; Evans, R.; Francey, R.; Buckley, B. M.; Palmer, J. G.; Peterson, M. J.; Baker, P. J.

    2018-03-01

    Very few annually resolved millennial-length temperature reconstructions exist for the Southern Hemisphere. Here we present four 979-year reconstructions for southeastern Australia for the austral summer months of December-February. Two of the reconstructions are based on the Australian Water Availability Project dataset and two on the Berkeley Earth Surface Temperature dataset. For each climate data set, one reconstruction is based solely on Lagarostrobos franklinii (restricted reconstructions) while the other is based on multiple Tasmanian conifer species (unrestricted reconstructions). Each reconstruction calibrates ~50-60% of the variance in the temperature datasets depending on the number of tree-ring records available for the reconstruction. We found little difference in the temporal variability of the reconstructions, although extremes are amplified in the restricted reconstructions relative to the unrestricted reconstructions. The reconstructions highlight the occurrence of numerous individual years, especially in the 15th-17th Centuries, for which temperatures were comparable with those of the late 20th Century. The 1950-1999 period, however, stands out as the warmest 50-year period on average for the past 979 years, with a sustained shift away from relatively low mean temperatures, the length of which is unique in the 979-year record. The reconstructions are strongly and positively related to temperatures across the southeast of the Australian continent, negatively related to temperatures in the north and northeast of the continent, and uncorrelated with temperatures in the west. The lack of a strong relationship with temperatures across the continent highlights the necessity of a sub-regional focus for Australasian temperature reconstructions.

  11. Hierarchical Nearest-Neighbor Gaussian Process Models for Large Geostatistical Datasets.

    PubMed

    Datta, Abhirup; Banerjee, Sudipto; Finley, Andrew O; Gelfand, Alan E

    2016-01-01

    Spatial process models for analyzing geostatistical data entail computations that become prohibitive as the number of spatial locations becomes large. This article develops a class of highly scalable nearest-neighbor Gaussian process (NNGP) models to provide fully model-based inference for large geostatistical datasets. We establish that the NNGP is a well-defined spatial process providing legitimate finite-dimensional Gaussian densities with sparse precision matrices. We embed the NNGP as a sparsity-inducing prior within a rich hierarchical modeling framework and outline how computationally efficient Markov chain Monte Carlo (MCMC) algorithms can be executed without storing or decomposing large matrices. The number of floating point operations (flops) per iteration of this algorithm is linear in the number of spatial locations, thereby rendering substantial scalability. We illustrate the computational and inferential benefits of the NNGP over competing methods using simulation studies and also analyze forest biomass from a massive U.S. Forest Inventory dataset at a scale that precludes alternative dimension-reducing methods. Supplementary materials for this article are available online.
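
    The sparsity in the NNGP comes from conditioning each location on at most m nearest previously ordered neighbors. The brute-force sketch below builds only these conditioning sets; the hierarchical model and MCMC that sit on top of them are beyond a short example.

        import numpy as np

        def nngp_neighbor_sets(coords, m):
            # For each location i (in a fixed ordering), keep at most m
            # nearest neighbors among locations 0..i-1. These conditioning
            # sets are what give the NNGP its sparse precision matrix.
            sets = [np.array([], dtype=int)]
            for i in range(1, len(coords)):
                d = np.linalg.norm(coords[:i] - coords[i], axis=1)
                sets.append(np.argsort(d)[:m])
            return sets

        coords = np.random.default_rng(0).random((1000, 2))
        nbrs = nngp_neighbor_sets(coords, m=10)
        print(len(nbrs[500]), "neighbors condition location 500")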

  12. Hierarchical Nearest-Neighbor Gaussian Process Models for Large Geostatistical Datasets

    PubMed Central

    Datta, Abhirup; Banerjee, Sudipto; Finley, Andrew O.; Gelfand, Alan E.

    2018-01-01

    Spatial process models for analyzing geostatistical data entail computations that become prohibitive as the number of spatial locations becomes large. This article develops a class of highly scalable nearest-neighbor Gaussian process (NNGP) models to provide fully model-based inference for large geostatistical datasets. We establish that the NNGP is a well-defined spatial process providing legitimate finite-dimensional Gaussian densities with sparse precision matrices. We embed the NNGP as a sparsity-inducing prior within a rich hierarchical modeling framework and outline how computationally efficient Markov chain Monte Carlo (MCMC) algorithms can be executed without storing or decomposing large matrices. The number of floating point operations (flops) per iteration of this algorithm is linear in the number of spatial locations, thereby rendering substantial scalability. We illustrate the computational and inferential benefits of the NNGP over competing methods using simulation studies and also analyze forest biomass from a massive U.S. Forest Inventory dataset at a scale that precludes alternative dimension-reducing methods. Supplementary materials for this article are available online. PMID:29720777

  13. Large-Scale Meteorological Patterns Associated with Extreme Precipitation in the US Northeast

    NASA Astrophysics Data System (ADS)

    Agel, L. A.; Barlow, M. A.

    2016-12-01

    Patterns of daily large-scale circulation associated with Northeast US extreme precipitation are identified using both k-means clustering (KMC) and Self-Organizing Maps (SOM) applied to tropopause height. Tropopause height provides a compact representation of large-scale circulation patterns, as it is linked to mid-level circulation, low-level thermal contrasts and low-level diabatic heating. Extreme precipitation is defined as the top 1% of daily wet-day observations at 35 Northeast stations, 1979-2008. KMC is applied on extreme precipitation days only, while the SOM algorithm is applied to all days in order to place the extreme results into a larger context. Six tropopause patterns are identified on extreme days: a summertime tropopause ridge, a summertime shallow trough/ridge, a summertime shallow eastern US trough, a deeper wintertime eastern US trough, and two versions of a deep cold-weather trough located across the east-central US. Thirty SOM patterns for all days are identified. Results for all days show that 6 SOM patterns account for almost half of the extreme days, although extreme precipitation occurs in all SOM patterns. The same SOM patterns associated with extreme precipitation also routinely produce non-extreme precipitation; however, on extreme precipitation days the troughs, on average, are deeper and the downstream ridges more pronounced. Analysis of other fields associated with the large-scale patterns show various degrees of anomalously strong upward motion during, and moisture transport preceding, extreme precipitation events.
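
    The KMC step described above reduces to clustering flattened daily fields. A sketch with synthetic stand-ins for the tropopause-height grids: cluster centers, reshaped back to the grid, play the role of the composite circulation patterns.

        import numpy as np
        from sklearn.cluster import KMeans

        rng = np.random.default_rng(0)
        n_days, nlat, nlon = 500, 20, 30
        fields = rng.normal(size=(n_days, nlat * nlon))   # one row per day

        km = KMeans(n_clusters=6, n_init=10, random_state=0).fit(fields)
        patterns = km.cluster_centers_.reshape(6, nlat, nlon)
        print("days per pattern:", np.bincount(km.labels_))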

  14. Restructuring Big Data to Improve Data Access and Performance in Analytic Services Making Research More Efficient for the Study of Extreme Weather Events and Application User Communities

    NASA Astrophysics Data System (ADS)

    Ostrenga, D.; Shen, S.; Vollmer, B.; Meyer, D. L.

    2017-12-01

    The NASA MERRA-2 climate reanalysis contains numerous data for atmosphere, land, and ocean, grouped into 95 products with an archived volume of over 300 TB. The data files are saved as hourly files, daily files (at hourly time intervals) and monthly files containing up to 125 parameters. Due to the large number of data files and the sheer data volumes, it is challenging for users, especially those in the application research community, to work with the original data files. Most of these researchers prefer to focus on a small region or single location using the hourly data for long time periods, to analyze extreme weather events or, say, winds for renewable energy applications. At the GES DISC, we have been working closely with the science teams and the application user community to create several new value-added data products and high-quality services to facilitate the use of the model data for various types of research. We have tested converting the hourly data from one day per file into different data cubes, such as one-month, one-year, or whole-mission cubes, and then analyzed the efficiency of access to this newly structured data through various services. Initial results show that, compared to the original file structure, the new structure significantly improves performance for accessing long time series. Performance is associated with the cube size and structure, the compression method, and how the data are accessed. The optimized data cube structure will not only improve data access, but also enable better online analytic services for statistical analysis and extreme event mining. Two case studies will be presented using the newly structured data and value-added services: the California drought and the extreme drought of the Northeastern states of Brazil. Furthermore, data access and analysis through cloud storage capabilities will be investigated.

  15. Changes in wind speed and extremes in Beijing during 1960-2008 based on homogenized observations

    NASA Astrophysics Data System (ADS)

    Li, Zhen; Yan, Zhongwei; Tu, Kai; Liu, Weidong; Wang, Yingchun

    2011-03-01

    Daily observations of wind speed at 12 stations in the Greater Beijing Area during 1960-2008 were homogenized using the Multiple Analysis of Series for Homogenization method. The linear trends in the regional mean annual and seasonal (winter, spring, summer and autumn) wind speed series were -0.26, -0.39, -0.30, -0.12 and -0.22 m s^-1 (10 yr)^-1, respectively. Winter showed the greatest magnitude in declining wind speed, followed by spring, autumn and summer. The annual and seasonal frequencies of wind speed extremes (days) also decreased, more prominently for winter than for the other seasons. The declining trends in wind speed and extremes were formed mainly by some rapid declines during the 1970s and 1980s. The maximum declining trend in wind speed occurred at Chaoyang (CY), a station within the central business district (CBD) of Beijing with the highest level of urbanization. The declining trends were in general smaller in magnitude away from the city center, except for the winter case in which the maximum declining trend shifted northeastward to rural Miyun (MY). The influence of urbanization on the annual wind speed was estimated to be about -0.05 m s^-1 (10 yr)^-1 during 1960-2008, accounting for around one fifth of the regional mean declining trend. The annual and seasonal geostrophic wind speeds around Beijing, based on daily mean sea level pressure (MSLP) from the ERA-40 reanalysis dataset, also exhibited decreasing trends, coincident with the results from site observations. A comparative analysis of the MSLP fields between 1966-1975 and 1992-2001 suggested that the influences of both the winter and summer monsoons on Beijing were weaker in the more recent of the two decades. It is suggested that the bulk of wind in Beijing is influenced considerably by urbanization, while changes in strong winds or wind speed extremes are prone to large-scale climate change in the region.

  16. geneLAB: Expanding the Impact of NASA's Biological Research in Space

    NASA Technical Reports Server (NTRS)

    Rayl, Nicole; Smith, Jeffrey D.

    2014-01-01

    The geneLAB project is designed to leverage the value of large 'omics' datasets from molecular biology projects conducted on the ISS by making these datasets available, citable, discoverable, interpretable, reusable, and reproducible. geneLAB will create a collaboration space with an integrated set of tools for depositing, accessing, analyzing, and modeling these diverse datasets from spaceflight and related terrestrial studies.

  17. Scene text detection by leveraging multi-channel information and local context

    NASA Astrophysics Data System (ADS)

    Wang, Runmin; Qian, Shengyou; Yang, Jianfeng; Gao, Changxin

    2018-03-01

    As an important information carrier, texts play significant roles in many applications. However, text detection in unconstrained scenes is a challenging problem due to cluttered backgrounds, various appearances, uneven illumination, etc. In this paper, an approach based on multi-channel information and local context is proposed to detect texts in natural scenes. Since character candidate detection plays a vital role in a text detection system, Maximally Stable Extremal Regions (MSERs) and a graph-cut based method are integrated to obtain the character candidates by leveraging multi-channel image information. A cascaded false positive elimination mechanism is constructed from the perspectives of the character and the text line respectively. Since local context information is very valuable, it is utilized to retrieve missing characters and boost the text detection performance. Experimental results on two benchmark datasets, i.e., the ICDAR 2011 dataset and the ICDAR 2013 dataset, demonstrate that the proposed method achieves state-of-the-art performance.
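
    The first stage of such pipelines, character-candidate detection with MSERs, can be sketched with OpenCV as below. The paper's graph-cut fusion across channels and its local-context retrieval are not reproduced; the input path is hypothetical and the geometric filter is a crude placeholder for the cascaded elimination stage.

        import cv2

        img = cv2.imread("scene.jpg")               # hypothetical input image
        assert img is not None, "scene.jpg is a placeholder path"
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

        mser = cv2.MSER_create()
        regions, bboxes = mser.detectRegions(gray)  # character candidates

        # Crude geometric filtering of obvious non-characters.
        for (x, y, w, h) in bboxes:
            if 0.1 < w / float(h) < 10 and h > 8:
                cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 1)
        cv2.imwrite("candidates.jpg", img)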

  18. Finding relevant biomedical datasets: the UC San Diego solution for the bioCADDIE Retrieval Challenge

    PubMed Central

    Wei, Wei; Ji, Zhanglong; He, Yupeng; Zhang, Kai; Ha, Yuanchi; Li, Qi; Ohno-Machado, Lucila

    2018-01-01

    The number and diversity of biomedical datasets grew rapidly in the last decade. A large number of datasets are stored in various repositories, with different formats. Existing dataset retrieval systems lack the capability of cross-repository search. As a result, users spend time searching datasets in known repositories, and they typically do not find new repositories. The biomedical and healthcare data discovery index ecosystem (bioCADDIE) team organized a challenge to solicit new indexing and searching strategies for retrieving biomedical datasets across repositories. We describe the work of one team that built a retrieval pipeline and examined its performance. The pipeline used online resources to supplement dataset metadata, automatically generated queries from users’ free-text questions, produced high-quality retrieval results and achieved the highest inferred Normalized Discounted Cumulative Gain among competitors. The results showed that it is a promising solution for cross-database, cross-domain and cross-repository biomedical dataset retrieval. Database URL: https://github.com/w2wei/dataset_retrieval_pipeline PMID:29688374
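
    Since the challenge ranked systems by inferred Normalized Discounted Cumulative Gain, a plain NDCG computation is sketched below for reference; the "inferred" variant, which corrects for incomplete relevance judgments, is not reproduced, and the graded relevance values are made up.

        import numpy as np

        def dcg(relevances, k=None):
            r = np.asarray(relevances, dtype=float)[:k]
            # Gains discounted by log2 of the (1-based) rank plus one.
            return float((r / np.log2(np.arange(2, r.size + 2))).sum())

        def ndcg(relevances, k=None):
            # DCG of the ranking divided by DCG of the ideal ordering.
            ideal = dcg(sorted(relevances, reverse=True), k)
            return dcg(relevances, k) / ideal if ideal > 0 else 0.0

        # Top 6 retrieved datasets: 2 = highly relevant, 1 = partial, 0 = not.
        print(round(ndcg([2, 0, 1, 2, 0, 1]), 3))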

  19. The maximum vector-angular margin classifier and its fast training on large datasets using a core vector machine.

    PubMed

    Hu, Wenjun; Chung, Fu-Lai; Wang, Shitong

    2012-03-01

    Although pattern classification has been extensively studied in the past decades, how to effectively train on large datasets is a problem that still requires particular attention. Many kernelized classification methods, such as SVM and SVDD, can be formulated as the corresponding quadratic programming (QP) problems, but computing the associated kernel matrices requires O(n^2) (or even up to O(n^3)) computational complexity, where n is the size of the training patterns, which heavily limits the applicability of these methods for large datasets. In this paper, a new classification method called the maximum vector-angular margin classifier (MAMC) is first proposed based on the vector-angular margin to find an optimal vector c in the pattern feature space, and all the testing patterns can be classified in terms of the maximum vector-angular margin ρ between the vector c and all the training data points. Accordingly, it is proved that the kernelized MAMC can be equivalently formulated as the kernelized Minimum Enclosing Ball (MEB), which leads to a distinctive merit of MAMC, i.e., it has the flexibility of controlling the sum of support vectors like ν-SVC and may be extended to a maximum vector-angular margin core vector machine (MAMCVM) by connecting the core vector machine (CVM) method with MAMC, such that fast training on large datasets can be effectively achieved. Experimental results on artificial and real datasets are provided to validate the power of the proposed methods. Copyright © 2011 Elsevier Ltd. All rights reserved.
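
    The MEB equivalence is what unlocks fast training: minimum enclosing balls admit simple (1+eps)-approximations computed over small core sets. The sketch below implements a Badoiu-Clarkson-style approximation of the MEB itself, the geometric engine behind CVM-style training, not the full MAMCVM solver.

        import numpy as np

        def meb_approx(points, eps=0.05):
            # Badoiu-Clarkson: repeatedly pull the center toward the
            # farthest point; O(1/eps^2) iterations give a (1+eps)-ball.
            c = points.mean(axis=0)
            for i in range(1, int(1 / eps ** 2) + 1):
                far = points[np.argmax(np.linalg.norm(points - c, axis=1))]
                c = c + (far - c) / (i + 1)
            return c, np.linalg.norm(points - c, axis=1).max()

        pts = np.random.default_rng(0).normal(size=(5000, 10))
        center, radius = meb_approx(pts)
        print("approximate MEB radius:", round(float(radius), 3))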

  20. Training Scalable Restricted Boltzmann Machines Using a Quantum Annealer

    NASA Astrophysics Data System (ADS)

    Kumar, V.; Bass, G.; Dulny, J., III

    2016-12-01

    Machine learning and the optimization involved therein is of critical importance for commercial and military applications. Due to the computational complexity of many-variable optimization, the conventional approach is to employ meta-heuristic techniques to find suboptimal solutions. Quantum Annealing (QA) hardware offers a completely novel approach with the potential to obtain significantly better solutions with large speed-ups compared to traditional computing. In this presentation, we describe our development of new machine learning algorithms tailored for QA hardware. We are training restricted Boltzmann machines (RBMs) using QA hardware on large, high-dimensional commercial datasets. Traditional optimization heuristics such as contrastive divergence and other closely related techniques are slow to converge, especially on large datasets. Recent studies have indicated that QA hardware when used as a sampler provides better training performance compared to conventional approaches. Most of these studies have been limited to moderately-sized datasets due to the hardware restrictions imposed by existing QA devices, which make it difficult to solve real-world problems at scale. In this work we develop novel strategies to circumvent this issue. We discuss scale-up techniques such as enhanced embedding and partitioned RBMs which allow large commercial datasets to be learned using QA hardware. We present our initial results obtained by training an RBM as an autoencoder on an image dataset. The results obtained so far indicate that the convergence rates can be improved significantly by increasing RBM network connectivity. These ideas can be readily applied to generalized Boltzmann machines and we are currently investigating this in an ongoing project.
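
    For contrast with the QA-based sampler, the classical baseline mentioned above, contrastive divergence (CD-1), fits in a few lines of NumPy. This is a generic binary-RBM sketch on random data, not the authors' embedding-enhanced or partitioned training.

        import numpy as np

        rng = np.random.default_rng(0)
        sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

        def cd1_step(v0, W, b, c, lr=0.05):
            # One contrastive-divergence (CD-1) update for a binary RBM.
            ph0 = sigmoid(v0 @ W + c)                    # P(h | v0)
            h0 = (rng.random(ph0.shape) < ph0) * 1.0     # sample hidden units
            pv1 = sigmoid(h0 @ W.T + b)                  # reconstruct visibles
            ph1 = sigmoid(pv1 @ W + c)
            W += lr * (v0.T @ ph0 - pv1.T @ ph1) / len(v0)
            b += lr * (v0 - pv1).mean(axis=0)
            c += lr * (ph0 - ph1).mean(axis=0)

        n_vis, n_hid = 64, 16
        W = 0.01 * rng.normal(size=(n_vis, n_hid))
        b, c = np.zeros(n_vis), np.zeros(n_hid)
        batch = (rng.random((32, n_vis)) < 0.5) * 1.0    # toy binary data
        for _ in range(100):
            cd1_step(batch, W, b, c)
        print("trained weight norm:", round(float(np.linalg.norm(W)), 3))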

  1. Non-stationarity of extreme weather events in a changing climate - an application to long-term droughts in the US Southwest

    NASA Astrophysics Data System (ADS)

    Grossmann, I.

    2013-12-01

    Return periods of many extreme weather events are not stationary over time, given increasing risks due to global warming and multidecadal variability resulting from large scale climate patterns. This is problematic as extreme weather events and long-term climate risks such as droughts are typically conceptualized via measures such as return periods that implicitly assume stationarity. I briefly review these problems and present an application to the non-stationarity of droughts in the US Southwest. The US Southwest relies on annual precipitation maxima during winter and the North American Monsoon (NAM), both of which vary with large-scale climate patterns, in particular ENSO, the Pacific Decadal Oscillation (PDO) and the Atlantic Multidecadal Oscillation (AMO). The latter two exhibit variability on longer (multi-decadal) time scales in addition to short-term variations. The region is also part of the subtropical belt projected to become more arid in a warming climate. The possible multidecadal impacts of the PDO on precipitation in the study region are analyzed with a focus on Arizona and New Mexico, using GPCC and CRU data since 1900. The projected impacts of the PDO on annual precipitation during the next three decades with GPCC data are similar in scale to the impacts of global warming on precipitation according to the A1B scenario and the CMIP2 multi-model means, while the combined impact of the PDO and AMO is about 19% larger. The effects according to the CRU dataset are about half as large as the projected global warming impacts. Given the magnitude of the projected impacts from both multidecadal variability and global warming, water management needs to explicitly incorporate both of these trends into long-term planning. Multi-decadal variability could be incorporated into the concept of return periods by presenting return periods as time-varying or as conditional on the respective 'phase' of relevant multidecadal patterns and on global warming. Problems in detecting the PDO signal and potential solutions are also discussed. We find that the long-term effect of the PDO can be more clearly separated from short-term variability by considering return periods of multi-year drought measures rather than return periods of simple drought measures that are more affected by short-term variations.

  2. Observational analysis and large-scale pattern associated with cold events moving up the equator line over South America

    NASA Astrophysics Data System (ADS)

    Viana, Liviany; Herdies, Dirceu; Muller, Gabriela

    2017-04-01

    An observational study was carried out to quantify cold air outbreak events moving across the Equator from 1980 to 2013 during the austral winter period (May, June, July, August and September), and to analyze the behavior of the circulation responsible for this displacement. The observational datasets from the Sector of Climatological Studies of the Institute of Airspace Control for the city of Iauarete (0.61N, 69.0W; 120 m), located at the extreme north of the Brazilian Amazon Basin, were used for the analyses. The meteorological variables used were the minimum temperature, the maximum temperature and the maximum atmospheric pressure. A new methodology was used to identify these events: a threshold computed as the monthly average minus 2 (two) standard deviations for the air temperature extremes, and the monthly average plus 1 (one) standard deviation for the maximum atmospheric pressure. As a result, a total of 11 cold events were recorded that reached the extreme north of the Brazilian Amazon Basin, with recorded values of 17.8 °C for the minimum temperature, 21.0 °C for the maximum temperature, and a maximum atmospheric pressure reaching 1021.2 hPa. These changes are equivalent to negative anomalies of 5.9 and 8.7 °C in the minimum and maximum temperatures, respectively, and a positive anomaly of 7.1 hPa in the maximum pressure. Regarding the dynamic behavior of the large-scale circulation, a Rossby wave-type configuration propagating from west to east over subtropical latitudes was observed in European Centre for Medium-Range Weather Forecasts (ECMWF) data in the days before the arrival of each event in the city of Iauarete. This behavior was observed both in the geopotential anomalies (250 hPa and 850 hPa) and in the southerly component of the wind (250 hPa and 850 hPa), both statistically significant at the 99% level (Student's t-test). Therefore, a new criterion for the identification of "friagens" at tropical latitudes has been able to represent the effects of cold air outbreaks and the advance of the cold air mass, which are supported by the large-scale circulation and consequently contribute to changes in the weather and in the life of the population of this Equatorial region.
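
    The event criterion above reduces to per-month thresholds. A hedged pandas sketch, with synthetic data and hypothetical column names standing in for the Iauarete station records:

        import numpy as np
        import pandas as pd

        rng = np.random.default_rng(0)
        idx = pd.date_range("1980-01-01", "2013-12-31", freq="D")
        df = pd.DataFrame({"tmin": rng.normal(23, 2, len(idx)),
                           "tmax": rng.normal(32, 2, len(idx)),
                           "pmax": rng.normal(1012, 3, len(idx))}, index=idx)

        # Per-month climatology broadcast back onto the daily series.
        g = df.groupby(df.index.month)
        mu, sd = g.transform("mean"), g.transform("std")

        # Flag days: both temperature extremes 2 sigma below their monthly
        # means while maximum pressure exceeds its monthly mean by 1 sigma.
        cold = ((df["tmin"] < mu["tmin"] - 2 * sd["tmin"]) &
                (df["tmax"] < mu["tmax"] - 2 * sd["tmax"]) &
                (df["pmax"] > mu["pmax"] + sd["pmax"]))
        print("candidate cold-event days:", int(cold.sum()))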

  3. Assessing Uncertainty in Deep Learning Techniques that Identify Atmospheric Rivers in Climate Simulations

    NASA Astrophysics Data System (ADS)

    Mahesh, A.; Mudigonda, M.; Kim, S. K.; Kashinath, K.; Kahou, S.; Michalski, V.; Williams, D. N.; Liu, Y.; Prabhat, M.; Loring, B.; O'Brien, T. A.; Collins, W. D.

    2017-12-01

    Atmospheric rivers (ARs) can be the difference between California facing drought or hurricane-level storms. ARs are a form of extreme weather defined as long, narrow columns of moisture which transport water vapor outside the tropics. When they make landfall, they release the vapor as rain or snow. Convolutional neural networks (CNNs), a machine learning technique that uses learned filters to recognize features, are the leading computer vision method for classifying multichannel images. CNNs have proven effective in identifying extreme weather events in climate simulation output (Liu et al. 2016, ABDA'16, http://bit.ly/2hlrFNV). Here, we compare CNN architectures of different depths, tuned with different hyperparameters and training schemes: two-layer, three-layer, four-layer, and sixteen-layer CNNs are evaluated on their ability to recognize ARs in Community Atmospheric Model version 5 output, and we explore the ability of data augmentation and pre-trained models to increase the accuracy of the classifier. Because pre-training the model with everyday images (i.e. benches, stoves, and dogs) yielded the highest accuracy rate, this strategy, also known as transfer learning, may be vital for future scientific CNNs, which likely will not have access to large labelled training datasets. By choosing the most effective CNN architecture, climate scientists can build an accurate historical database of ARs, which can be used to develop a predictive understanding of these phenomena.
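
    The record above is about architecture choices for image classification; purely as an illustration, the sketch below shows the general shape of such a CNN classifier in PyTorch. The input size, channel count, and layer widths are invented placeholders, not the architectures or data evaluated in the study.

    ```python
    # Minimal sketch of a small CNN for binary AR / no-AR classification on
    # multichannel climate fields (channels x lat x lon). All sizes are
    # illustrative placeholders.
    import torch
    import torch.nn as nn

    class ARClassifier(nn.Module):
        def __init__(self, in_channels=4, h=128, w=256):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(in_channels, 16, kernel_size=5, padding=2),
                nn.ReLU(),
                nn.MaxPool2d(2),                      # -> h/2 x w/2
                nn.Conv2d(16, 32, kernel_size=5, padding=2),
                nn.ReLU(),
                nn.MaxPool2d(2),                      # -> h/4 x w/4
            )
            self.classifier = nn.Linear(32 * (h // 4) * (w // 4), 2)

        def forward(self, x):
            return self.classifier(self.features(x).flatten(1))

    model = ARClassifier()
    dummy = torch.randn(8, 4, 128, 256)               # batch of 8 synthetic patches
    logits = model(dummy)                             # shape (8, 2): AR vs. no-AR
    ```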

  4. A dataset of human decision-making in teamwork management.

    PubMed

    Yu, Han; Shen, Zhiqi; Miao, Chunyan; Leung, Cyril; Chen, Yiqiang; Fauvel, Simon; Lin, Jun; Cui, Lizhen; Pan, Zhengxiang; Yang, Qiang

    2017-01-17

    Today, most endeavours require teamwork by people with diverse skills and characteristics. In managing teamwork, decisions are often made under uncertainty and resource constraints. The strategies and the effectiveness of the strategies different people adopt to manage teamwork under different situations have not yet been fully explored, partially due to a lack of detailed large-scale data. In this paper, we describe a multi-faceted large-scale dataset to bridge this gap. It is derived from a game simulating complex project management processes. It presents the participants with different conditions in terms of team members' capabilities and task characteristics for them to exhibit their decision-making strategies. The dataset contains detailed data reflecting the decision situations, decision strategies, decision outcomes, and the emotional responses of 1,144 participants from diverse backgrounds. To our knowledge, this is the first dataset simultaneously covering these four facets of decision-making. With repeated measurements, the dataset may help establish baseline variability of decision-making in teamwork management, leading to more realistic decision theoretic models and more effective decision support approaches.

  5. A dataset of human decision-making in teamwork management

    PubMed Central

    Yu, Han; Shen, Zhiqi; Miao, Chunyan; Leung, Cyril; Chen, Yiqiang; Fauvel, Simon; Lin, Jun; Cui, Lizhen; Pan, Zhengxiang; Yang, Qiang

    2017-01-01

    Today, most endeavours require teamwork by people with diverse skills and characteristics. In managing teamwork, decisions are often made under uncertainty and resource constraints. The strategies and the effectiveness of the strategies different people adopt to manage teamwork under different situations have not yet been fully explored, partially due to a lack of detailed large-scale data. In this paper, we describe a multi-faceted large-scale dataset to bridge this gap. It is derived from a game simulating complex project management processes. It presents the participants with different conditions in terms of team members’ capabilities and task characteristics for them to exhibit their decision-making strategies. The dataset contains detailed data reflecting the decision situations, decision strategies, decision outcomes, and the emotional responses of 1,144 participants from diverse backgrounds. To our knowledge, this is the first dataset simultaneously covering these four facets of decision-making. With repeated measurements, the dataset may help establish baseline variability of decision-making in teamwork management, leading to more realistic decision theoretic models and more effective decision support approaches. PMID:28094787

  6. A dataset of human decision-making in teamwork management

    NASA Astrophysics Data System (ADS)

    Yu, Han; Shen, Zhiqi; Miao, Chunyan; Leung, Cyril; Chen, Yiqiang; Fauvel, Simon; Lin, Jun; Cui, Lizhen; Pan, Zhengxiang; Yang, Qiang

    2017-01-01

    Today, most endeavours require teamwork by people with diverse skills and characteristics. In managing teamwork, decisions are often made under uncertainty and resource constraints. The strategies and the effectiveness of the strategies different people adopt to manage teamwork under different situations have not yet been fully explored, partially due to a lack of detailed large-scale data. In this paper, we describe a multi-faceted large-scale dataset to bridge this gap. It is derived from a game simulating complex project management processes. It presents the participants with different conditions in terms of team members' capabilities and task characteristics for them to exhibit their decision-making strategies. The dataset contains detailed data reflecting the decision situations, decision strategies, decision outcomes, and the emotional responses of 1,144 participants from diverse backgrounds. To our knowledge, this is the first dataset simultaneously covering these four facets of decision-making. With repeated measurements, the dataset may help establish baseline variability of decision-making in teamwork management, leading to more realistic decision theoretic models and more effective decision support approaches.

  7. GODIVA2: interactive visualization of environmental data on the Web.

    PubMed

    Blower, J D; Haines, K; Santokhee, A; Liu, C L

    2009-03-13

    GODIVA2 is a dynamic website that provides visual access to several terabytes of physically distributed, four-dimensional environmental data. It allows users to explore large datasets interactively without the need to install new software or download and understand complex data. Through the use of open international standards, GODIVA2 maintains a high level of interoperability with third-party systems, allowing diverse datasets to be mutually compared. Scientists can use the system to search for features in large datasets and to diagnose the output from numerical simulations and data processing algorithms. Data providers around Europe have adopted GODIVA2 as an INSPIRE-compliant dynamic quick-view system for providing visual access to their data.
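
    The interoperability described rests on open OGC web service standards such as the Web Map Service (WMS). As a hedged illustration only, the snippet below fetches a rendered map from a generic WMS endpoint with OWSLib; the server URL and layer name are hypothetical placeholders, not actual GODIVA2 services.

    ```python
    # Sketch: request a rendered map image from a WMS endpoint using OWSLib.
    # The URL and layer name below are hypothetical.
    from owslib.wms import WebMapService

    wms = WebMapService("https://example.org/wms", version="1.3.0")
    print(list(wms.contents))                  # layers advertised by the server

    img = wms.getmap(
        layers=["sea_surface_temperature"],    # hypothetical layer name
        srs="EPSG:4326",
        bbox=(-30.0, 40.0, 10.0, 65.0),        # lon/lat box over the NE Atlantic
        size=(512, 320),
        format="image/png",
    )
    with open("map.png", "wb") as f:
        f.write(img.read())
    ```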

  8. The Greenwich Photo-heliographic Results (1874 - 1976): Initial Corrections to the Printed Publications

    NASA Astrophysics Data System (ADS)

    Erwin, E. H.; Coffey, H. E.; Denig, W. F.; Willis, D. M.; Henwood, R.; Wild, M. N.

    2013-11-01

    A new sunspot and faculae digital dataset for the interval 1874 - 1955 has been prepared under the auspices of the NOAA National Geophysical Data Center (NGDC). This digital dataset contains measurements of the positions and areas of both sunspots and faculae published initially by the Royal Observatory, Greenwich, and subsequently by the Royal Greenwich Observatory (RGO), under the title Greenwich Photo-heliographic Results (GPR), 1874 - 1976. Quality control (QC) procedures based on logical consistency have been used to identify the more obvious errors in the RGO publications. Typical examples of identifiable errors are North versus South errors in specifying heliographic latitude, errors in specifying heliographic (Carrington) longitude, errors in the dates and times, errors in sunspot group numbers, arithmetic errors in the summation process, and the occasional omission of solar ephemerides. Although the number of errors in the RGO publications is remarkably small, an initial table of necessary corrections is provided for the interval 1874 - 1917. Moreover, as noted in the preceding companion papers, the existence of two independently prepared digital datasets, which both contain information on sunspot positions and areas, makes it possible to outline a preliminary strategy for the development of an even more accurate digital dataset. Further work is in progress to generate an extremely reliable sunspot digital dataset, based on the long programme of solar observations supported first by the Royal Observatory, Greenwich, and then by the Royal Greenwich Observatory.

  9. Addressing Methodological Challenges in Large Communication Datasets: Collecting and Coding Longitudinal Interactions in Home Hospice Cancer Care

    PubMed Central

    Reblin, Maija; Clayton, Margaret F; John, Kevin K; Ellington, Lee

    2015-01-01

    In this paper, we present strategies for collecting and coding a large longitudinal communication dataset collected across multiple sites, consisting of over 2000 hours of digital audio recordings from approximately 300 families. We describe our methods within the context of implementing a large-scale study of communication during cancer home hospice nurse visits, but this procedure could be adapted to communication datasets across a wide variety of settings. This research is the first study designed to capture home hospice nurse-caregiver communication, a highly understudied location and type of communication event. We present a detailed example protocol encompassing data collection in the home environment, large-scale, multi-site secure data management, the development of theoretically-based communication coding, and strategies for preventing coder drift and ensuring reliability of analyses. Although each of these challenges has the potential to undermine the utility of the data, reliability between coders is often the only issue consistently reported and addressed in the literature. Overall, our approach demonstrates rigor and provides a “how-to” example for managing large, digitally-recorded data sets from collection through analysis. These strategies can inform other large-scale health communication research. PMID:26580414

  10. Identification of trends in intensity and frequency of extreme rainfall events in part of the Indian Himalaya

    NASA Astrophysics Data System (ADS)

    Bhardwaj, Alok; Ziegler, Alan D.; Wasson, Robert J.; Chow, Winston; Sharma, Mukat L.

    2017-04-01

    Extreme monsoon rainfall is the primary cause of floods and of secondary hazards such as landslides in the Indian Himalaya. Understanding extreme monsoon rainfall is therefore required for the study of these natural hazards. In this work, we study the characteristics of extreme monsoon rainfall, including its intensity and frequency, in the Garhwal Himalaya in India, with a focus on the Mandakini River Catchment, the site of a devastating flood and multiple large landslides in 2013. We use two long-term gridded rainfall datasets: the Asian Precipitation Highly Resolved Observational Data Integration Towards Evaluation of Water Resources (APHRODITE) product, with daily rainfall data from 1951 to 2007, and the India Meteorological Department (IMD) product, with daily rainfall data from 1901 to 2013. The Mann-Kendall test and Sen's slope estimator are used to identify the statistical significance and magnitude, respectively, of trends in the intensity and frequency of extreme monsoon rainfall, at a significance level of 0.05. Autocorrelation in the extreme monsoon rainfall time series is identified and reduced using four methods: pre-whitening, trend-free pre-whitening, variance correction, and block bootstrap. We define the extreme monsoon rainfall threshold as the 99th percentile of the rainfall time series; any rainfall depth greater than the 99th percentile is considered extreme. With the IMD dataset, significant increasing trends in the intensity and frequency of extreme rainfall, with slope magnitudes of 0.55 and 0.02 respectively, were obtained in the north of the Mandakini Catchment, as identified by all four methods. A significant increasing trend in intensity, with a slope magnitude of 0.3, is found in the middle of the catchment by all methods except block bootstrap. In the south of the catchment, a significant increasing trend in intensity was obtained, with a slope magnitude of 0.86 for the pre-whitening method and 0.28 for the trend-free pre-whitening and variance correction methods. Further, an increasing trend in frequency, with a slope magnitude of 0.01, was identified in the south of the catchment by all methods except block bootstrap. With the APHRODITE dataset, we obtained a significant increasing trend in intensity, with a slope magnitude of 1.27, in the middle of the catchment, as identified by all four methods. Collectively, both datasets show signals of increasing intensity, and IMD also shows increasing frequency, in the Mandakini Catchment. The increasing occurrence of extreme events is becoming more disastrous because of the rising human population and expanding infrastructure in the Mandakini Catchment. For example, the 2013 flood due to extreme rainfall was catastrophic in terms of loss of human and animal lives and destruction of the local economy. We believe our results will further the understanding of extreme rainfall events in the Mandakini Catchment and in the Indian Himalaya.
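
    For readers unfamiliar with the trend tests named above, the following is a minimal sketch of the Mann-Kendall statistic (no-ties variance) and Sen's slope estimator on a toy annual series; the pre-whitening, variance-correction, and block-bootstrap variants applied in the study are omitted.

    ```python
    # Mann-Kendall trend test and Sen's slope on a toy annual series.
    import numpy as np
    from scipy.stats import norm

    def mann_kendall(x):
        x = np.asarray(x, dtype=float)
        n = len(x)
        s = sum(np.sign(x[j] - x[i]) for i in range(n - 1) for j in range(i + 1, n))
        var_s = n * (n - 1) * (2 * n + 5) / 18.0       # variance assuming no ties
        z = (s - np.sign(s)) / np.sqrt(var_s) if s != 0 else 0.0
        p = 2 * (1 - norm.cdf(abs(z)))                 # two-sided p-value
        return z, p

    def sens_slope(x):
        x = np.asarray(x, dtype=float)
        n = len(x)
        slopes = [(x[j] - x[i]) / (j - i) for i in range(n - 1) for j in range(i + 1, n)]
        return np.median(slopes)                       # robust trend magnitude

    rain = np.array([55.0, 60.2, 58.1, 63.4, 61.0, 70.3, 68.9, 75.2])  # toy data
    z, p = mann_kendall(rain)
    print(z, p, sens_slope(rain))   # trend significant at 0.05 if p < 0.05
    ```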

  11. Development of a global slope dataset for estimation of landslide occurrence resulting from earthquakes

    USGS Publications Warehouse

    Verdin, Kristine L.; Godt, Jonathan W.; Funk, Christopher C.; Pedreros, Diego; Worstell, Bruce; Verdin, James

    2007-01-01

    Landslides resulting from earthquakes can cause widespread loss of life and damage to critical infrastructure. The U.S. Geological Survey (USGS) has developed an alarm system, PAGER (Prompt Assessment of Global Earthquakes for Response), that aims to provide timely information to emergency relief organizations on the impact of earthquakes. Landslides are responsible for many of the damaging effects following large earthquakes in mountainous regions, and thus data defining the topographic relief and slope are critical to the PAGER system. A new global topographic dataset was developed to aid in rapidly estimating landslide potential following large earthquakes. We used the remotely-sensed elevation data collected as part of the Shuttle Radar Topography Mission (SRTM) to generate a slope dataset with nearly global coverage. Slopes from the SRTM data, computed at 3-arc-second resolution, were summarized at 30-arc-second resolution, along with statistics developed to describe the distribution of slope within each 30-arc-second pixel. Because there are many small areas lacking SRTM data and the northern limit of the SRTM mission was lat 60?N., statistical methods referencing other elevation data were used to fill the voids within the dataset and to extrapolate the data north of 60?. The dataset will be used in the PAGER system to rapidly assess the susceptibility of areas to landsliding following large earthquakes.

  12. Observations and simulations of the interactions between clouds, radiation, and precipitation

    NASA Astrophysics Data System (ADS)

    Naegele, Alexandra Claire

    Increasing precipitation and warming temperatures associated with climate change have been documented across the globe, including in the Northeast US. These climate changes threaten human health in many ways. Research is necessary to understand and explain the relationship between climate change and human health. Extreme weather events such as extreme temperatures, convective storms, floods, lightning events, wintry precipitation, and low visibility are frequently associated with adverse effects on human health. While more media attention is typically given to events that cause the most structural or economic damage (e.g., tornadoes, hurricanes, earthquakes, etc.), extreme temperatures ultimately account for the greatest loss of life in the US. Extreme weather events can be unpredictable; however, improved knowledge and technology allow meteorologists to accurately forecast many of these events, specifically extreme temperature and precipitation events. Advancing our knowledge of climate variability and trends in extreme weather can inform public education programs that alert the community to the dangers of extreme heat or cold, emergency response plans for hazardous weather conditions, and current thresholds for emergency alerts. This study evaluates trends in extreme weather events across New Hampshire and links these extreme events to adverse health outcomes. Using data from the NCEI Global Historical Climatology Network (GHCN)-Daily dataset (1981-2015), five daily Extreme Weather Metrics (EWMs) were defined: Daily Maximum Temperature ≤32°F, Daily Maximum Temperature ≥90°F, Daily Maximum Temperature ≥95°F, Daily Precipitation ≥1", and Daily Precipitation ≥2". Relevant human health outcomes were extracted from the New Hampshire Hospital Discharge Dataset for the years 2001-2009. Health cases were defined based on the International Classification of Diseases, 9th Revision (ICD-9). Outcomes in this analysis include: All-Cause Injury, Vehicle Accidents, Accidental Falls, Accidents Due to Natural and Environmental Factors (including excessive heat, excessive cold, exposure due to weather conditions, lightning, and storms and floods), Accidental Drowning, and Carbon Monoxide Poisoning. Temporal and spatial trends were assessed, and the associations between all health outcomes and EWMs, daily maximum temperature, and daily precipitation were evaluated via Spearman correlations. Once the four strongest correlations were determined, a quasi-Poisson regression model was used to evaluate the relationship within each exposure-outcome pair. These pairs were modeled to show the relation between maximum temperature and all-cause hospital visits, hospital visits related to vehicle accidents, hospital visits related to accidental falls, and hospital visits related to heat. Future work will incorporate these findings into public health planning and programming. This project is a collaboration with the New Hampshire Department of Health and Human Services (NH DHHS), which has a shared interest in understanding the impact of extreme weather events on the citizens of New Hampshire. Furthermore, this work supports an ongoing effort to implement the Centers for Disease Control and Prevention (CDC) Building Resilience Against Climate Effects (BRACE) Framework, which focuses on identifying climate and weather-related hazards and estimating the associated disease burden.
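
    As a sketch of the modelling step described (a quasi-Poisson regression of daily counts on an exposure), the snippet below fits such a model with statsmodels on synthetic data; the variable names and data are illustrative, not those of the New Hampshire datasets.

    ```python
    # Quasi-Poisson regression: Poisson GLM with Pearson chi-square scale.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    tmax = rng.uniform(10, 95, size=365)             # daily max temperature (F)
    # Synthetic counts (plain Poisson here; real visit counts are typically
    # overdispersed, which is why the quasi-Poisson scale is used).
    visits = rng.poisson(np.exp(1.5 + 0.01 * tmax))

    X = sm.add_constant(tmax)
    fit = sm.GLM(visits, X, family=sm.families.Poisson()).fit(scale="X2")
    print(fit.summary())                             # slope links tmax to visits
    ```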

  13. ANNA: A Convolutional Neural Network Code for Spectroscopic Analysis

    NASA Astrophysics Data System (ADS)

    Lee-Brown, Donald; Anthony-Twarog, Barbara J.; Twarog, Bruce A.

    2018-01-01

    We present ANNA, a Python-based convolutional neural network code for the automated analysis of stellar spectra. ANNA provides a flexible framework that allows atmospheric parameters such as temperature and metallicity to be determined with accuracies comparable to those of established but less efficient techniques. ANNA performs its parameterization extremely quickly; typically several thousand spectra can be analyzed in less than a second. Additionally, the code incorporates features which greatly speed up the training process necessary for the neural network to measure spectra accurately, resulting in a tool that can easily be run on a single desktop or laptop computer. Thus, ANNA is useful in an era when spectrographs increasingly have the capability to collect dozens to hundreds of spectra each night. This talk will cover the basic features included in ANNA and demonstrate its performance in two use cases: an open cluster abundance analysis involving several hundred spectra, and a metal-rich field star study. Applicability of the code to large survey datasets will also be discussed.

  14. Independent evolution of baleen whale gigantism linked to Plio-Pleistocene ocean dynamics

    PubMed Central

    Goldbogen, Jeremy A.

    2017-01-01

    Vertebrates have evolved to gigantic sizes repeatedly over the past 250 Myr, reaching their extreme in today's baleen whales (Mysticeti). Hypotheses for the evolution of exceptionally large size in mysticetes range from niche partitioning to predator avoidance, but there has been no quantitative examination of body size evolutionary dynamics in this clade and it remains unclear when, why or how gigantism evolved. By fitting phylogenetic macroevolutionary models to a dataset consisting of living and extinct species, we show that mysticetes underwent a clade-wide shift in their mode of body size evolution during the Plio-Pleistocene. This transition, from Brownian motion-like dynamics to a trended random walk towards larger size, is temporally linked to the onset of seasonally intensified upwelling along coastal ecosystems. High prey densities resulting from wind-driven upwelling, rather than abundant resources alone, are the primary determinant of efficient foraging in extant mysticetes and Late Pliocene changes in ocean dynamics may have provided an ecological pathway to gigantism in multiple independent lineages. PMID:28539520

  15. Rosetta Langmuir Probe Photoelectron Emission and Solar Ultraviolet Flux at Comet 67P

    NASA Astrophysics Data System (ADS)

    Johansson, F. L.; Odelstad, E.; Paulsson, J. J.; Harang, S. S.; Eriksson, A. I.; Mannel, T.; Vigren, E.; Edberg, N. J. T.; Miloch, W. J.; Simon Wedlund, C.; Thiemann, E.; Eparvier, F.; Andersson, L.

    2017-12-01

    The Langmuir Probe instrument on Rosetta monitored the photoelectron emission current of its probes throughout the Rosetta mission at comet 67P/Churyumov-Gerasimenko, in essence acting as a photodiode monitoring the solar ultraviolet radiation at wavelengths below 250 nm. We have used three methods of extracting the photoelectron saturation current from the Langmuir probe measurements. The resulting dataset can be used as an index of the solar far and extreme ultraviolet at the Rosetta spacecraft position, including flares, in wavelengths that are important for photoionisation of the cometary neutral gas. Comparing the photoemission current to measurements by MAVEN/EUVM and TIMED/SEE, we find good correlation when 67P was at large heliocentric distances early and late in the mission, but a decrease of up to 50 percent from the expected photoelectron current at perihelion. We discuss possible reasons for the photoemission decrease, including scattering and absorption by nanograins created by the disintegration of cometary dust far away from the nucleus.

  16. Campaign datasets for ARM Cloud Aerosol Precipitation Experiment (ACAPEX)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Leung, L. Ruby; Mei, Fan; Comstock, Jennifer

    This campaign consisted of the deployment of the DOE ARM Mobile Facility 2 (AMF2) and the ARM Aerial Facility (AAF) G-1 in a field campaign called the ARM Cloud Aerosol Precipitation Experiment (ACAPEX), which took place in conjunction with CalWater 2, a NOAA field campaign. The joint CalWater 2/ACAPEX field campaign aimed to improve understanding and modeling of the large-scale dynamics and cloud and precipitation processes associated with atmospheric rivers (ARs) and the aerosol-cloud interactions that influence precipitation variability and extremes in the western U.S. The observational strategy consisted of the use of land and offshore assets to monitor: 1. the evolution and structure of ARs from near their regions of development; 2. the long-range transport of aerosols in the eastern North Pacific and potential interactions with ARs; 3. how aerosols from long-range transport and local sources influence cloud and precipitation on the U.S. West Coast, where ARs make landfall and post-frontal clouds are frequent.

  17. The (un)reliability of item-level semantic priming effects.

    PubMed

    Heyman, Tom; Bruninx, Anke; Hutchison, Keith A; Storms, Gert

    2018-04-05

    Many researchers have tried to predict semantic priming effects using a myriad of variables (e.g., prime-target associative strength or co-occurrence frequency). The idea is that relatedness varies across prime-target pairs, which should be reflected in the size of the priming effect (e.g., cat should prime dog more than animal does). However, it is only insightful to predict item-level priming effects if they can be measured reliably. Thus, in the present study we examined the split-half and test-retest reliabilities of item-level priming effects under conditions that should discourage the use of strategies. The resulting priming effects proved extremely unreliable, and reanalyses of three published priming datasets revealed similar cases of low reliability. These results imply that previous attempts to predict semantic priming were unlikely to be successful. However, one study with an unusually large sample size yielded more favorable reliability estimates, suggesting that big data, in terms of items and participants, should be the future for semantic priming research.
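
    A minimal sketch of the split-half reliability computation at issue, on synthetic data: item-level priming effects are estimated separately from two halves of the participants, correlated, and corrected with the Spearman-Brown formula. The design details of the original studies are not reproduced here.

    ```python
    # Split-half reliability of item-level effects with Spearman-Brown correction.
    import numpy as np

    rng = np.random.default_rng(1)
    n_items, n_subj = 100, 40
    # Per-item priming effect (unrelated minus related RT, ms) per participant.
    effects = rng.normal(30, 20, size=(n_items, n_subj))

    half1 = effects[:, ::2].mean(axis=1)     # item effects from even participants
    half2 = effects[:, 1::2].mean(axis=1)    # item effects from odd participants
    r = np.corrcoef(half1, half2)[0, 1]
    spearman_brown = 2 * r / (1 + r)         # estimated full-sample reliability
    print(r, spearman_brown)
    ```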

  18. Fermi and Swift Gamma-Ray Burst Afterglow Population Studies

    NASA Technical Reports Server (NTRS)

    Racusin, Judith L.; Oates, S. R.; Schady, P.; Burrows, D. N.; dePasquale, M.; Donato, D.; Gehrels, N.; Koch, S.; McEnery, J.; Piran, T.

    2011-01-01

    The new and extreme population of GRBs detected by Fermi-LAT shows several new features in high energy gamma-rays that are providing interesting and unexpected clues into GRB prompt and afterglow emission mechanisms. Over the last 6 years, it has been Swift that has provided the robust dataset of UV/optical and X-ray afterglow observations that opened many windows into components of GRB emission structure. The relationship between the LAT-detected GRBs and the well studied, fainter, less energetic GRBs detected by Swift-BAT is only beginning to be explored by multi-wavelength studies. We explore the large sample of GRBs detected by BAT only, BAT and Fermi-GBM, and GBM and LAT, focusing on these samples separately in order to search for statistically significant differences between the populations, using only those GRBs with measured redshifts in order to physically characterize these objects. We disentangle which differences are instrumental selection effects versus intrinsic properties, in order to better understand the nature of the special characteristics of the LAT bursts.

  19. Independent evolution of baleen whale gigantism linked to Plio-Pleistocene ocean dynamics.

    PubMed

    Slater, Graham J; Goldbogen, Jeremy A; Pyenson, Nicholas D

    2017-05-31

    Vertebrates have evolved to gigantic sizes repeatedly over the past 250 Myr, reaching their extreme in today's baleen whales (Mysticeti). Hypotheses for the evolution of exceptionally large size in mysticetes range from niche partitioning to predator avoidance, but there has been no quantitative examination of body size evolutionary dynamics in this clade and it remains unclear when, why or how gigantism evolved. By fitting phylogenetic macroevolutionary models to a dataset consisting of living and extinct species, we show that mysticetes underwent a clade-wide shift in their mode of body size evolution during the Plio-Pleistocene. This transition, from Brownian motion-like dynamics to a trended random walk towards larger size, is temporally linked to the onset of seasonally intensified upwelling along coastal ecosystems. High prey densities resulting from wind-driven upwelling, rather than abundant resources alone, are the primary determinant of efficient foraging in extant mysticetes and Late Pliocene changes in ocean dynamics may have provided an ecological pathway to gigantism in multiple independent lineages. © 2017 The Author(s).

  20. Barents-Kara sea ice and the winter NAO in the DePreSys3 Met Office Seasonal forecast model

    NASA Astrophysics Data System (ADS)

    Warner, J.; Screen, J.

    2017-12-01

    Accurate seasonal forecasting leads to a wide range of socio-economic benefits and increases resilience to prolonged bouts of extreme weather. This work examines how November Barents-Kara sea ice may affect the winter northern hemisphere atmospheric circulation, using various compositing methods in the DePreSys3 ensemble model, with lagged composites used to better establish a relationship between the two. We focus in particular on the North Atlantic Oscillation (NAO), given its implications for European weather. Using this large hindcast dataset, comprising 35 years with 30 available ensemble members, it is found that low Barents-Kara sea ice leads to a negative NAO tendency in all composite methods, with increased mean sea level pressure at higher latitudes. The significance of this result varies between composites. This is a preliminary analysis within a larger PhD project to further understand how Arctic sea ice may play a role in seasonal forecasting skill through its connection to and influence on mid-latitude weather.

  1. Identification and diagnosis of spatiotemporal hydrometeorological structure of heavy precipitation induced floods in Southeast Asia

    NASA Astrophysics Data System (ADS)

    Lu, M.; Hao, X.; Devineni, N.

    2017-12-01

    Extreme floods have a long history as an important cause of death and destruction worldwide. It is estimated by Munich RE and Swiss RE that floods and severe storms dominate all other natural hazards globally in terms of average annual property loss and human fatalities. The top five most disastrous floods in the period from 1900 to 2015, ranked by economic damage, all occurred in the Asian monsoon region. This study presents an interdisciplinary approach integrating hydrometeorology, atmospheric science, and state-of-the-art space-time statistics and modeling to investigate the association between the space-time characteristics of floods, precipitation, and atmospheric moisture transport in a statistical and physical framework. We use a tropical moisture export dataset and a curve-clustering algorithm to study the source-to-destination features of moisture transport, explore teleconnected climate controls on the moisture formation process at different timescales (PDO, ENSO and MJO), and study the role of synoptic-to-large-scale atmospheric steering in moisture transport and convergence.

  2. Bayesian inference of a historical bottleneck in a heavily exploited marine mammal.

    PubMed

    Hoffman, J I; Grant, S M; Forcada, J; Phillips, C D

    2011-10-01

    Emerging Bayesian analytical approaches offer increasingly sophisticated means of reconstructing historical population dynamics from genetic data, but have been little applied to scenarios involving demographic bottlenecks. Consequently, we analysed a large mitochondrial and microsatellite dataset from the Antarctic fur seal Arctocephalus gazella, a species subjected to one of the most extreme examples of uncontrolled exploitation in history when it was reduced to the brink of extinction by the sealing industry during the late eighteenth and nineteenth centuries. Classical bottleneck tests, which exploit the fact that rare alleles are rapidly lost during demographic reduction, yielded ambiguous results. In contrast, a strong signal of recent demographic decline was detected using both Bayesian skyline plots and Approximate Bayesian Computation, the latter also allowing derivation of posterior parameter estimates that were remarkably consistent with historical observations. This was achieved using only contemporary samples, further emphasizing the potential of Bayesian approaches to address important problems in conservation and evolutionary biology. © 2011 Blackwell Publishing Ltd.

  3. Intensification and Structure Change of Super Typhoon Flo as Related to the Large-Scale Environment.

    DTIC Science & Technology

    1998-06-01

    [The retrieved record contains only fragmented full-text snippets; no complete abstract is available.] The recoverable fragment reads: visualizing such a large dataset is a challenge; Schiavone and Papathomas (1990) summarize methods currently available for visualizing scientific datasets. The remaining fragments are broken reference-list entries and carry no further abstract content.

  4. Design and analysis issues in quantitative proteomics studies.

    PubMed

    Karp, Natasha A; Lilley, Kathryn S

    2007-09-01

    Quantitative proteomics is the comparison of distinct proteomes which enables the identification of protein species that exhibit changes in expression or post-translational state in response to a given stimulus. Many different quantitative techniques are being utilized and generate large datasets. Independent of the technique used, these large datasets need robust data analysis to ensure that valid conclusions are drawn from such studies. Approaches to address the problems that arise with large datasets are discussed, giving insight into the types of statistical analysis appropriate for the various experimental strategies that can be employed in quantitative proteomics studies. This review also highlights the importance of a robust experimental design and discusses various issues surrounding the design of experiments. The concepts and examples discussed here show how robust design and analysis lead to confident results, ensuring that quantitative proteomics delivers on its promise.

  5. A semiparametric graphical modelling approach for large-scale equity selection

    PubMed Central

    Liu, Han; Mulvey, John; Zhao, Tianqi

    2016-01-01

    We propose a new stock selection strategy that exploits rebalancing returns and improves portfolio performance. To effectively harvest rebalancing gains, we apply ideas from elliptical-copula graphical modelling and stability inference to select stocks that are as independent as possible. The proposed elliptical-copula graphical model has a latent Gaussian representation; its structure can be effectively inferred using regularized rank-based estimators. The resulting algorithm is computationally efficient and scales to large datasets. To show the efficacy of the proposed method, we apply it to conduct equity selection based on a 16-year health care stock dataset and a large 34-year stock dataset. Empirical tests show that the proposed method is superior to alternative strategies, including a principal component analysis-based approach and the classical Markowitz strategy based on the traditional buy-and-hold assumption. PMID:28316507

  6. Comparing methods of analysing datasets with small clusters: case studies using four paediatric datasets.

    PubMed

    Marston, Louise; Peacock, Janet L; Yu, Keming; Brocklehurst, Peter; Calvert, Sandra A; Greenough, Anne; Marlow, Neil

    2009-07-01

    Studies of prematurely born infants contain a relatively large percentage of multiple births, so the resulting data have a hierarchical structure with small clusters of size 1, 2 or 3. Ignoring the clustering may lead to incorrect inferences. The aim of this study was to compare statistical methods which can be used to analyse such data: generalised estimating equations, multilevel models, multiple linear regression and logistic regression. Four datasets which differed in total size and in percentage of multiple births (n = 254, multiple 18%; n = 176, multiple 9%; n = 10 098, multiple 3%; n = 1585, multiple 8%) were analysed. For the continuous outcome, two-level models produced similar results in the larger dataset, while generalised least squares multilevel modelling (ML GLS 'xtreg' in Stata) and maximum likelihood multilevel modelling (ML MLE 'xtmixed' in Stata) produced divergent estimates using the smaller dataset. For the dichotomous outcome, most methods, except generalised least squares multilevel modelling (ML GH 'xtlogit' in Stata), gave similar odds ratios and 95% confidence intervals within datasets. For the continuous outcome, our results suggest using multilevel modelling. We conclude that generalised least squares multilevel modelling (ML GLS 'xtreg' in Stata) and maximum likelihood multilevel modelling (ML MLE 'xtmixed' in Stata) should be used with caution when the dataset is small. Where the outcome is dichotomous and there is a relatively large percentage of non-independent data, it is recommended that these are accounted for in analyses using logistic regression with adjusted standard errors or multilevel modelling. If, however, the dataset has a small percentage of clusters greater than size 1 (e.g. a population dataset of children where there are few multiples), there appears to be less need to adjust for clustering.
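
    The record compares Stata routines; as a rough Python analogue under stated assumptions, the sketch below fits a random-intercept multilevel model for a continuous outcome with clusters of size 1-3 using statsmodels. The data and variable names are synthetic, not those of the paediatric datasets.

    ```python
    # Random-intercept multilevel model with small clusters (multiple births).
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(2)
    # 120 birth clusters of size 1, 2 or 3 (singletons, twins, triplets).
    cluster = np.repeat(np.arange(120), rng.integers(1, 4, size=120))
    n = len(cluster)
    df = pd.DataFrame({
        "cluster": cluster,
        "gestage": rng.normal(30, 2, size=n),        # gestational age (weeks)
    })
    df["weight"] = 800 + 120 * (df["gestage"] - 30) + rng.normal(0, 100, size=n)

    fit = smf.mixedlm("weight ~ gestage", df, groups=df["cluster"]).fit()
    print(fit.summary())        # fixed effect for gestage, cluster variance
    ```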

  7. What does fault tolerant Deep Learning need from MPI?

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Amatya, Vinay C.; Vishnu, Abhinav; Siegel, Charles M.

    Deep Learning (DL) algorithms have become the de facto Machine Learning (ML) algorithms for large scale data analysis. DL algorithms are computationally expensive -- even distributed DL implementations which use MPI require days of training (model learning) time on commonly studied datasets. Long running DL applications become susceptible to faults -- requiring development of a fault tolerant system infrastructure, in addition to fault tolerant DL algorithms. This raises an important question: what is needed from MPI for designing fault tolerant DL implementations? In this paper, we address this problem for permanent faults. We motivate the need for a fault tolerant MPI specification by an in-depth consideration of recent innovations in DL algorithms and their properties, which drive the need for specific fault tolerance features. We present an in-depth discussion on the suitability of different parallelism types (model, data and hybrid); the need (or lack thereof) for check-pointing of any critical data structures; and, most importantly, consideration of several fault tolerance proposals (user-level fault mitigation (ULFM), Reinit) in MPI and their applicability to fault tolerant DL implementations. We leverage a distributed memory implementation of Caffe, currently available under the Machine Learning Toolkit for Extreme Scale (MaTEx). We implement our approaches by extending MaTEx-Caffe with a ULFM-based implementation. Our evaluation using the ImageNet dataset and the AlexNet neural network topology demonstrates the effectiveness of the proposed fault tolerant DL implementation using OpenMPI-based ULFM.

  8. A comprehensive evaluation of assembly scaffolding tools

    PubMed Central

    2014-01-01

    Background Genome assembly is typically a two-stage process: contig assembly followed by the use of paired sequencing reads to join contigs into scaffolds. Scaffolds are usually the focus of reported assembly statistics; longer scaffolds greatly facilitate the use of genome sequences in downstream analyses, and it is appealing to present larger numbers as metrics of assembly performance. However, scaffolds are highly prone to errors, especially when generated using short reads, which can directly result in inflated assembly statistics. Results Here we provide the first independent evaluation of scaffolding tools for second-generation sequencing data. We find large variations in the quality of results depending on the tool and dataset used. Even extremely simple test cases of perfect input, constructed to elucidate the behaviour of each algorithm, produced some surprising results. We further dissect the performance of the scaffolders using real and simulated sequencing data derived from the genomes of Staphylococcus aureus, Rhodobacter sphaeroides, Plasmodium falciparum and Homo sapiens. The results from simulated data are of high quality, with several of the tools producing perfect output. However, at least 10% of joins remain unidentified when using real data. Conclusions The scaffolders vary in their usability, speed and number of correct and missed joins made between contigs. Results from real data highlight opportunities for further improvements of the tools. Overall, SGA, SOPRA and SSPACE generally outperform the other tools on our datasets. However, the quality of the results is highly dependent on the read mapper and genome complexity. PMID:24581555

  9. Adaptive Swarm Balancing Algorithms for rare-event prediction in imbalanced healthcare data

    PubMed Central

    Wong, Raymond K.; Mohammed, Sabah; Fiaidhi, Jinan; Sung, Yunsick

    2017-01-01

    Clinical data analysis and forecasting have made substantial contributions to disease control, prevention and detection. However, such data usually suffer from highly imbalanced class distributions. In this paper, we aim to formulate effective methods to rebalance a binary imbalanced dataset in which the positive samples are the minority. We investigate two different meta-heuristic algorithms, particle swarm optimization and the bat algorithm, and apply them to empower the effects of the synthetic minority over-sampling technique (SMOTE) for pre-processing the datasets. One approach is to process the full dataset as a whole. The other is to split up the dataset and adaptively process it one segment at a time. The experimental results reported in this paper reveal that the performance improvements obtained by the former method do not scale to larger datasets. The latter methods, which we call Adaptive Swarm Balancing Algorithms, lead to significant efficiency and effectiveness improvements on large datasets, where the former method fails. We also find the latter approach more consistent with the characteristics of typical large imbalanced medical datasets. We further use the meta-heuristic algorithms to optimize two key parameters of SMOTE. The proposed methods lead to more credible classifier performance and shorter run times compared to the brute-force method. PMID:28753613
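
    A hedged sketch of the segment-wise rebalancing idea: split the dataset, apply SMOTE within each segment, and recombine. The swarm-based (PSO/bat) tuning of SMOTE's parameters described in the paper is omitted; a fixed, capped neighbour count is used instead for illustration.

    ```python
    # Segment-wise SMOTE rebalancing with imbalanced-learn.
    import numpy as np
    from imblearn.over_sampling import SMOTE

    def rebalance_in_segments(X, y, n_segments=4):
        parts_X, parts_y = [], []
        for Xs, ys in zip(np.array_split(X, n_segments), np.array_split(y, n_segments)):
            minority = np.bincount(ys).min() if len(np.unique(ys)) == 2 else 0
            if minority < 2:                          # SMOTE needs >1 minority sample
                parts_X.append(Xs); parts_y.append(ys)
                continue
            sm = SMOTE(k_neighbors=min(5, minority - 1))
            Xr, yr = sm.fit_resample(Xs, ys)          # oversample within the segment
            parts_X.append(Xr); parts_y.append(yr)
        return np.vstack(parts_X), np.concatenate(parts_y)

    rng = np.random.default_rng(3)
    X = rng.normal(size=(400, 6))
    y = (rng.random(400) < 0.1).astype(int)           # ~10% positive class
    Xb, yb = rebalance_in_segments(X, y)
    print(np.bincount(y), np.bincount(yb))            # before vs. after rebalancing
    ```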

  10. Large scale validation of the M5L lung CAD on heterogeneous CT datasets.

    PubMed

    Torres, E Lopez; Fiorina, E; Pennazio, F; Peroni, C; Saletta, M; Camarlinghi, N; Fantacci, M E; Cerello, P

    2015-04-01

    M5L, a fully automated computer-aided detection (CAD) system for the detection and segmentation of lung nodules in thoracic computed tomography (CT), is presented and validated on several image datasets. M5L is the combination of two independent subsystems, based on the Channeler Ant Model as a segmentation tool [lung channeler ant model (lungCAM)] and on a voxel-based neural approach. The lungCAM was upgraded with a scan equalization module and a new procedure to recover nodules connected to other lung structures; its classification module, which makes use of a feed-forward neural network, is based on a small number of features (13), so as to minimize the risk of poor generalization given the large difference in size between the training and testing datasets, which contain 94 and 1019 CTs, respectively. The lungCAM (standalone) and M5L (combined) performance was extensively tested on 1043 CT scans from three independent datasets, including a detailed analysis of the full Lung Image Database Consortium/Image Database Resource Initiative database, which is not yet found in the literature. The lungCAM and M5L performance is consistent across the databases, with sensitivities of about 70% and 80%, respectively, at eight false positive findings per scan, despite variable annotation criteria and acquisition and reconstruction conditions. A reduced sensitivity is found for subtle nodules and ground glass opacity (GGO) structures. A comparison with other CAD systems is also presented. The M5L performance on a large and heterogeneous dataset is stable and satisfactory, although the development of a dedicated module for GGO detection, as well as an iterative optimization of the training procedure, could further improve it. The main aim of the present study was accomplished: M5L results do not deteriorate when the dataset size increases, making it a candidate for supporting radiologists in large scale screenings and clinical programs.

  11. Climatic Extremes and Food Grain Production in India

    NASA Astrophysics Data System (ADS)

    A, A.; Mishra, V.

    2015-12-01

    Climate change is likely to affect food and water security in India. India has witnessed tremendous growth in its food production since the green revolution. However, during recent decades food grain yields have been significantly affected by extreme climate and weather events. Air temperature and associated extreme events (numbers of hot days and hot nights, heat waves) increased significantly during the last 50 years over the majority of India. More remarkably, a substantial increase in mean and extreme temperatures was observed during the winter season. India has also witnessed extreme flood and drought events that have become more frequent during the past few decades. Extreme rainfall during the non-monsoon season has adversely affected food grain yields and resulted in tremendous losses in several parts of the country. Here we evaluate changes in hydroclimatic extremes and their linkage with food grain production in India. We use observed food grain yield data at the district level for the period 1980-2012. We examine the linkages between food grain yield and crop phenology obtained from high-resolution leaf area index and NDVI satellite datasets. We use long-term observed data of daily precipitation and maximum and minimum temperatures to evaluate changes in extreme events. We use statistical models to relate crop yields to mean and extreme temperatures for various crops, in order to understand the sensitivity of these crops to changing climatic conditions. We find that some of the major crop types and predominant crop-growing areas show significant sensitivity to changes in extreme climatic conditions in India.

  12. Extreme-value dependence: An application to exchange rate markets

    NASA Astrophysics Data System (ADS)

    Fernandez, Viviana

    2007-04-01

    Extreme value theory (EVT) focuses on modeling the tail behavior of a loss distribution using only extreme values rather than the whole data set. For a sample of 10 countries with dirty/free float regimes, we investigate whether paired currencies exhibit a pattern of asymptotic dependence. That is, whether an extremely large appreciation or depreciation in the nominal exchange rate of one country might transmit to another. In general, after controlling for volatility clustering and inertia in returns, we do not find evidence of extreme-value dependence between paired exchange rates. However, for asymptotically independent paired returns, we find that the tail dependency of exchange rates is stronger under large appreciations than under large depreciations.
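
    The asymptotic-dependence question above reduces to whether the conditional tail probability chi(u) = P(V > u | U > u) stays bounded away from zero as u approaches 1. A minimal empirical estimator on rank-transformed returns, with synthetic data, might look as follows.

    ```python
    # Empirical upper-tail dependence chi(u) on rank-based uniform margins.
    import numpy as np

    def empirical_chi(x, y, u=0.95):
        n = len(x)
        ux = np.argsort(np.argsort(x)) / (n + 1.0)   # ranks mapped into (0, 1)
        uy = np.argsort(np.argsort(y)) / (n + 1.0)
        joint = np.mean((ux > u) & (uy > u))         # P(both exceed u)
        return joint / (1.0 - u)                     # estimate of P(V>u | U>u)

    rng = np.random.default_rng(4)
    z = rng.normal(size=5000)
    x = z + 0.5 * rng.normal(size=5000)              # correlated returns that are
    y = z + 0.5 * rng.normal(size=5000)              # nonetheless tail-independent
    # chi(u) drifting toward 0 as u -> 1 indicates asymptotic independence.
    print([round(empirical_chi(x, y, u), 3) for u in (0.90, 0.95, 0.99)])
    ```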

  13. Integration of modern statistical tools for the analysis of climate extremes into the web-GIS “CLIMATE”

    NASA Astrophysics Data System (ADS)

    Ryazanova, A. A.; Okladnikov, I. G.; Gordov, E. P.

    2017-11-01

    The frequency of occurrence and magnitude of extreme precipitation and temperature events show positive trends in several geographical regions. These events must be analyzed and studied in order to better understand their impact on the environment, predict their occurrence, and mitigate their effects. For this purpose, we augmented the web-GIS “CLIMATE” with a dedicated statistical package developed in the R language. The web-GIS “CLIMATE” is a software platform for cloud storage, processing, and visualization of distributed archives of spatial datasets. It is based on the combined use of web and GIS technologies with reliable procedures for searching, extracting, processing, and visualizing spatial data archives. The system provides a set of thematic online tools for the complex analysis of current and future climate changes and their effects on the environment. The package includes powerful new methods for time-dependent statistics of extremes, quantile regression, and the copula approach for the detailed analysis of various climate extreme events. In particular, the very promising copula approach allows one to obtain the structural connections between the extremes and various environmental characteristics. The new statistical methods integrated into the web-GIS “CLIMATE” can significantly facilitate and accelerate the complex analysis of climate extremes using only a desktop PC connected to the Internet.
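
    Of the methods listed, quantile regression is the simplest to illustrate; the sketch below estimates a linear trend in the 95th percentile of a synthetic daily precipitation series with statsmodels. It shows the statistical idea only, not the web-GIS implementation, and all names and data are illustrative.

    ```python
    # Quantile regression: trend in the 95th percentile of daily precipitation.
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(5)
    years = np.repeat(np.arange(1980, 2020), 90)             # 90 summer days/year
    precip = rng.gamma(2.0, 2.0 + 0.02 * (years - 1980))     # widening upper tail
    df = pd.DataFrame({"year": years, "precip": precip})

    fit = smf.quantreg("precip ~ year", df).fit(q=0.95)
    print(fit.params)   # 'year' slope = trend in 95th-percentile precipitation
    ```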

  14. A New Integrated Threshold Selection Methodology for Spatial Forecast Verification of Extreme Events

    NASA Astrophysics Data System (ADS)

    Kholodovsky, V.

    2017-12-01

    Extreme weather and climate events such as heavy precipitation, heat waves and strong winds can cause extensive damage to society in terms of human lives and financial losses. As the climate changes, it is important to understand how extreme weather events may change as a result. Climate and statistical models are often used independently to model those phenomena. To better assess the performance of climate models, a variety of spatial forecast verification methods have been developed. However, spatial verification metrics that are widely used for comparing mean states in most cases lack an adequate theoretical justification for benchmarking extreme weather events. We propose a new integrated threshold selection methodology for spatial forecast verification of extreme events that couples existing pattern recognition indices with high threshold choices. This integrated approach has three main steps: 1) dimension reduction; 2) geometric domain mapping; and 3) threshold clustering. We apply this approach to an observed precipitation dataset over CONUS. The results are evaluated by displaying the threshold distribution seasonally, monthly and annually. The method offers users the flexibility of selecting a high threshold that is linked to desired geometrical properties. The proposed high-threshold methodology could either complement existing spatial verification methods, in which threshold selection is arbitrary, or be directly applicable in extreme value theory.

  15. Extreme rainfall, vulnerability and risk: a continental-scale assessment for South America

    USGS Publications Warehouse

    Vorosmarty, Charles J.; de Guenni, Lelys Bravo; Wollheim, Wilfred M.; Pellerin, Brian A.; Bjerklie, David M.; Cardoso, Manoel; D'Almeida, Cassiano; Colon, Lilybeth

    2013-01-01

    Extreme weather continues to preoccupy society as a formidable public safety concern bearing huge economic costs. While attention has focused on global climate change and how it could intensify key elements of the water cycle such as precipitation and river discharge, it is the conjunction of geophysical and socioeconomic forces that shapes human sensitivity and risks to weather extremes. We demonstrate here the use of high-resolution geophysical and population datasets together with documentary reports of rainfall-induced damage across South America over a multi-decadal, retrospective time domain (1960–2000). We define and map extreme precipitation hazard, exposure, affected populations, vulnerability and risk, and use these variables to analyse the impact of floods as a water security issue. Geospatial experiments uncover major sources of risk from natural climate variability and population growth, with change in climate extremes bearing a minor role. While rural populations display greatest relative sensitivity to extreme rainfall, urban settings show the highest rates of increasing risk. In the coming decades, rapid urbanization will make South American cities the focal point of future climate threats but also an opportunity for reducing vulnerability, protecting lives and sustaining economic development through both traditional and ecosystem-based disaster risk management systems.

  16. A Novel Extreme Learning Machine Classification Model for e-Nose Application Based on the Multiple Kernel Approach

    PubMed Central

    Jian, Yulin; Huang, Daoyu; Yan, Jia; Lu, Kun; Huang, Ying; Wen, Tailai; Zeng, Tanyue; Zhong, Shijie; Xie, Qilong

    2017-01-01

    A novel classification model, named the quantum-behaved particle swarm optimization (QPSO)-based weighted multiple kernel extreme learning machine (QWMK-ELM), is proposed in this paper. Experimental validation is carried out with two different electronic nose (e-nose) datasets. Unlike existing multiple kernel extreme learning machine (MK-ELM) algorithms, the proposed model treats the combination coefficients of the base kernels as external parameters of the single-hidden layer feedforward neural network (SLFN). The combination coefficients of the base kernels, the model parameters of each base kernel, and the regularization parameter are optimized by QPSO simultaneously before implementing the kernel extreme learning machine (KELM) with the composite kernel function. Four types of common single kernel functions (Gaussian kernel, polynomial kernel, sigmoid kernel, and wavelet kernel) are utilized to constitute different composite kernel functions. Moreover, the method is also compared with other existing classification methods: extreme learning machine (ELM), kernel extreme learning machine (KELM), k-nearest neighbors (KNN), support vector machine (SVM), multi-layer perceptron (MLP), radial basis function neural network (RBFNN), and probabilistic neural network (PNN). The results demonstrate that the proposed QWMK-ELM outperforms the aforementioned methods in both precision and efficiency for gas classification. PMID:28629202
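
    A minimal numpy sketch of the kernel ELM at the core of such a model, with a fixed weighted sum of an RBF and a polynomial kernel standing in for the QPSO-optimized composite kernel; all parameter values, weights, and data are illustrative assumptions, not those of the paper.

    ```python
    # Kernel ELM with a weighted composite kernel: beta = (K + I/C)^{-1} T.
    import numpy as np

    def rbf(A, B, gamma=0.5):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)

    def poly(A, B, degree=2):
        return (A @ B.T + 1.0) ** degree

    def composite(A, B, w=(0.7, 0.3)):
        return w[0] * rbf(A, B) + w[1] * poly(A, B)   # fixed illustrative weights

    def kelm_fit(X, T, C=10.0):
        K = composite(X, X)
        return np.linalg.solve(K + np.eye(len(X)) / C, T)   # output weights

    def kelm_predict(Xnew, X, beta):
        return composite(Xnew, X) @ beta

    rng = np.random.default_rng(6)
    X = rng.normal(size=(60, 8))                  # toy e-nose feature vectors
    T = np.eye(3)[rng.integers(0, 3, size=60)]    # one-hot class targets
    beta = kelm_fit(X, T)
    pred = kelm_predict(X, X, beta).argmax(axis=1)
    print((pred == T.argmax(axis=1)).mean())      # training accuracy
    ```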

  17. InSilico DB genomic datasets hub: an efficient starting point for analyzing genome-wide studies in GenePattern, Integrative Genomics Viewer, and R/Bioconductor.

    PubMed

    Coletta, Alain; Molter, Colin; Duqué, Robin; Steenhoff, David; Taminau, Jonatan; de Schaetzen, Virginie; Meganck, Stijn; Lazar, Cosmin; Venet, David; Detours, Vincent; Nowé, Ann; Bersini, Hugues; Weiss Solís, David Y

    2012-11-18

    Genomics datasets are increasingly useful for gaining biomedical insights, with adoption in the clinic underway. However, multiple hurdles related to data management stand in the way of their efficient large-scale utilization. The solution proposed is a web-based data storage hub. Having clear focus, flexibility and adaptability, InSilico DB seamlessly connects genomics dataset repositories to state-of-the-art and free GUI and command-line data analysis tools. The InSilico DB platform is a powerful collaborative environment, with advanced capabilities for biocuration, dataset sharing, and dataset subsetting and combination. InSilico DB is available from https://insilicodb.org.

  18. GeoNotebook: Browser based Interactive analysis and visualization workflow for very large climate and geospatial datasets

    NASA Astrophysics Data System (ADS)

    Ozturk, D.; Chaudhary, A.; Votava, P.; Kotfila, C.

    2016-12-01

    Jointly developed by Kitware and NASA Ames, GeoNotebook is an open source tool designed to give the maximum amount of flexibility to analysts, while dramatically simplifying the process of exploring geospatially indexed datasets. Packages like Fiona (backed by GDAL), Shapely, Descartes, Geopandas, and PySAL provide a stack of technologies for reading, transforming, and analyzing geospatial data. Combined with the Jupyter notebook and libraries like matplotlib/Basemap, it is possible to generate detailed geospatial visualizations. Unfortunately, the visualizations generated this way are either static or do not perform well for very large datasets, and this setup requires a great deal of boilerplate code to create and maintain. Other extensions exist to remedy these problems, but they provide a separate map for each input cell and do not support map interactions that feed back into the Python environment. To support interactive data exploration and visualization of large datasets, we have developed an extension to the Jupyter notebook that provides a single dynamic map that can be managed from the Python environment and that can communicate with a server able to perform operations such as data subsetting on a cloud-based cluster.

  19. The 10 m-resolution TINITALY DEM as a trans-disciplinary basis for the analysis of the Italian territory: Current trends and new perspectives

    NASA Astrophysics Data System (ADS)

    Tarquini, Simone; Nannipieri, Luca

    2017-03-01

    The increasing availability of high resolution digital elevation models (DEMs) is changing our viewpoint towards Earth surface landforms. Nevertheless, large-coverage, intermediate-resolution DEMs are still widely used, and can be the ideal choice in several applications based on the processing of spatially-integrated information. In 2012 the Istituto Nazionale di Geofisica e Vulcanologia opened a website for the free download of the TINITALY Digital Elevation Model (DEM), which covers the whole Italian territory. Since then, about 700 users from 28 different countries have been accredited for data download, and a report of 4 years of data dissemination and use is presented. The analysis of the intended use reveals that the 10 m-resolution, seamless TINITALY DEM serves an extremely varied research community. Accredited users are working in virtually any branch of the Earth Sciences (e.g. Volcanology, Seismology, and Geomorphology), in spatially integrated humanities (e.g. History and Archaeology), and in other thematic areas such as applied Physics and Zoology. Many users are also working in local administrations (e.g. Regions and Municipalities) for civil protection or land use planning purposes. In summary, the documented activity shows that the dissemination of seamless, large-coverage elevation datasets can foster technological progress across society, providing a significant benefit to stakeholders.

  20. Precipitation intercomparison of a set of satellite- and raingauge-derived datasets, ERA Interim reanalysis, and a single WRF regional climate simulation over Europe and the North Atlantic

    NASA Astrophysics Data System (ADS)

    Skok, Gregor; Žagar, Nedjeljka; Honzak, Luka; Žabkar, Rahela; Rakovec, Jože; Ceglar, Andrej

    2016-01-01

    The study presents a precipitation intercomparison based on two satellite-derived datasets (TRMM 3B42, CMORPH), four raingauge-based datasets (GPCC, E-OBS, Willmott & Matsuura, CRU), the ERA Interim reanalysis (ERAInt), and a single climate simulation using the WRF model. The comparison was performed for a domain encompassing parts of Europe and the North Atlantic over the 11-year period 2000-2010. The four raingauge-based datasets are similar to the TRMM dataset, with biases over Europe ranging from -7 % to +4 %. The spread among the raingauge-based datasets is relatively small over most of Europe, although areas with greater uncertainty (more than 30 %) exist, especially near the Alps and other mountainous regions. There are distinct differences between the datasets over the European land area and the Atlantic Ocean in comparison to the TRMM dataset. ERAInt has a small dry bias over the land; the WRF simulation has a large wet bias (+30 %), whereas CMORPH is characterized by a large and spatially consistent dry bias (-21 %). Over the ocean, both ERAInt and CMORPH have a small wet bias (+8 %) while the wet bias in WRF is significantly larger (+47 %). ERAInt has the highest frequency of low-intensity precipitation and the lowest frequency of high-intensity precipitation, owing to its lower native resolution. Both satellite-derived datasets have more low-intensity precipitation over the ocean than over the land, while the frequency of higher-intensity precipitation is similar or larger over the land. This result is likely related to orography, which triggers more intense convective precipitation, while the Atlantic Ocean is characterized by more homogeneous large-scale precipitation systems associated with larger areas of lower-intensity precipitation. However, this is not observed in ERAInt and WRF, indicating an insufficient representation of convective processes in the models. Finally, the Fraction Skill Score confirmed that both models perform better over the Atlantic Ocean, with ERAInt outperforming WRF at low thresholds and WRF outperforming ERAInt at higher thresholds. The diurnal cycle is simulated better by WRF than by ERAInt, although WRF did not reproduce the amplitude of the diurnal cycle well. While the evaluation of the WRF model confirms earlier findings related to the model's wet bias over European land, the applied satellite-derived precipitation datasets revealed differences between the land and ocean areas along with uncertainties in the observation datasets.
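
    Since the Fraction Skill Score drives the model comparison above, here is a minimal sketch of its standard neighborhood formulation on gridded fields (a generic implementation, not the paper's own code):

    ```python
    import numpy as np
    from scipy.ndimage import uniform_filter

    def fss(forecast, observed, threshold, window):
        """Fractions Skill Score on two 2-D precipitation fields.

        Binarize at `threshold`, convert to neighborhood fractions with a
        `window` x `window` moving average, then compare the fraction fields.
        """
        f = uniform_filter((forecast >= threshold).astype(float), size=window)
        o = uniform_filter((observed >= threshold).astype(float), size=window)
        mse = np.mean((f - o) ** 2)
        mse_ref = np.mean(f ** 2) + np.mean(o ** 2)
        return 1.0 - mse / mse_ref if mse_ref > 0 else np.nan
    ```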

  1. Improving Simulations of Extreme Flows by Coupling a Physically-based Hydrologic Model with a Machine Learning Model

    NASA Astrophysics Data System (ADS)

    Mohammed, K.; Islam, A. S.; Khan, M. J. U.; Das, M. K.

    2017-12-01

    With the large number of hydrologic models presently available, along with global weather and geographic datasets, streamflows of almost any river in the world can be easily modeled. If a reasonable amount of observed data from that river is available, simulations of high accuracy can sometimes be achieved by calibrating the model parameters against those observations through inverse modeling. Although such calibrated models can simulate the general trend or mean of the observed flows very well, more often than not they fail to adequately simulate the extreme flows. This causes difficulty in tasks such as generating reliable projections of future changes in extreme flows due to climate change, an important task given how closely floods and droughts are connected to people's lives and livelihoods. We propose an approach in which the outputs of a physically-based hydrologic model are used as input to a machine learning model to better simulate the extreme flows. To demonstrate this offline-coupling approach, the Soil and Water Assessment Tool (SWAT) was selected as the physically-based hydrologic model, the Artificial Neural Network (ANN) as the machine learning model, and the Ganges-Brahmaputra-Meghna (GBM) river system as the study area. The GBM river system, located in South Asia, is the third largest in the world in terms of freshwater generated and forms the largest delta in the world. The flows of the GBM rivers were simulated separately in order to test the performance of the proposed approach in accurately simulating the extreme flows generated by basins that vary in size, climate, hydrology, and anthropogenic intervention on stream networks. Results show that by post-processing the simulated flows of the SWAT models with ANN models, simulations of extreme flows can be significantly improved. The mean absolute errors in simulating annual maximum/minimum daily flows were reduced from 4967 cusecs to 1294 cusecs for the Ganges, from 5695 cusecs to 2115 cusecs for the Brahmaputra, and from 689 cusecs to 321 cusecs for the Meghna. Using this approach, simulations of hydrologic variables other than streamflow can also be improved, provided that a sufficient amount of observed data for that variable is available.
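
    A minimal sketch of the offline-coupling idea, post-processing model-simulated flows with a small neural network (scikit-learn is used here for brevity; the file names, lag count, and network size are illustrative, not the study's configuration):

    ```python
    import numpy as np
    from sklearn.neural_network import MLPRegressor
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    def lagged_features(sim, lags):
        # Stack the simulated flow and its recent history as predictors.
        return np.column_stack([sim[lags - k : len(sim) - k] for k in range(lags)])

    sim = np.loadtxt("swat_simulated_flow.txt")   # placeholder file names
    obs = np.loadtxt("observed_flow.txt")

    lags = 3
    X, y = lagged_features(sim, lags), obs[lags:]

    model = make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
    )
    model.fit(X, y)
    corrected = model.predict(X)   # post-processed (bias-corrected) flows
    ```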

  2. Have precipitation extremes and annual totals been increasing in the world's dry regions over the last 60 years?

    NASA Astrophysics Data System (ADS)

    Sippel, Sebastian; Zscheischler, Jakob; Heimann, Martin; Lange, Holger; Mahecha, Miguel D.; van Oldenborgh, Geert Jan; Otto, Friederike E. L.; Reichstein, Markus

    2017-01-01

    Daily precipitation extremes and annual totals have increased in large parts of the global land area over the past decades. These observations are consistent with theoretical considerations of a warming climate. However, until recently these trends have not been shown to consistently affect dry regions over land. A recent study, published by Donat et al. (2016), now identified significant increases in annual-maximum daily extreme precipitation (Rx1d) and annual precipitation totals (PRCPTOT) in dry regions. Here, we revisit the applied methods and explore the sensitivity of changes in precipitation extremes and annual totals to alternative choices of defining a dry region (i.e. in terms of aridity as opposed to precipitation characteristics alone). We find that (a) statistical artifacts introduced by data pre-processing based on a time-invariant reference period lead to an overestimation of the reported trends by up to 40 %, and that (b) the reported trends of globally aggregated extremes and annual totals are highly sensitive to the definition of a dry region of the globe. For example, using the same observational dataset, accounting for the statistical artifacts, and based on different aridity-based dryness definitions, we find a reduction in the positive trend of Rx1d from the originally reported +1.6 % decade-1 to +0.2 to +0.9 % decade-1 (period changes for 1981-2010 averages relative to 1951-1980 are reduced to -1.32 to +0.97 % as opposed to +4.85 % in the original study). If we include additional but less homogenized data to cover larger regions, the global trend increases slightly (Rx1d: +0.4 to +1.1 % decade-1), and in this case we can indeed confirm (partly) significant increases in Rx1d. However, these globally aggregated estimates remain uncertain as considerable gaps in long-term observations in the Earth's arid and semi-arid regions remain. In summary, adequate data pre-processing and accounting for uncertainties regarding the definition of dryness are crucial to the quantification of spatially aggregated trends in precipitation extremes in the world's dry regions. In view of the high relevance of the question to many potentially affected stakeholders, we call for a well-reflected choice of specific data processing methods and the inclusion of alternative dryness definitions to guarantee that communicated results related to climate change be robust.
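
    For readers unfamiliar with the indices, a minimal pandas sketch of how Rx1d and PRCPTOT are computed from a daily series, with a simple linear decadal trend (the file and column names are placeholders; this is not the study's processing chain, which also handles the reference-period artifacts discussed above):

    ```python
    import numpy as np
    import pandas as pd

    # Daily precipitation series indexed by date; file and column are placeholders.
    pr = pd.read_csv("daily_precip.csv", index_col=0, parse_dates=True)["pr"]

    rx1d = pr.resample("YS").max()                       # annual max 1-day precip
    prcptot = pr.where(pr >= 1.0).resample("YS").sum()   # annual wet-day total

    # Simple least-squares trend in Rx1d, expressed in % per decade.
    slope = np.polyfit(rx1d.index.year, rx1d.values, 1)[0]
    print(100 * 10 * slope / rx1d.mean(), "% per decade")
    ```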

  3. Spatial-temporal analysis of climate variations in mid-17th through 19th centuries in East China and the possible relationships with Monsoon climate

    NASA Astrophysics Data System (ADS)

    Lin, K. H. E.; Wang, P. K.; Liao, Y. C.; Lee, S. Y.; Tan, P.

    2016-12-01

    IPCC AR5 projects more frequent extreme climate events and higher climate variability in the near future. Despite these improvements, the East Asian monsoon climate is still less understood and/or poorly projected, due partly to insufficient records. Most areas of the Asian region lack sufficient observational records to draw conclusions about trends in annual precipitation over the past century (i.e. WGI AR5 Chapter 2). Precipitation trends, including extremes, are characterized by strong variability, with both increasing and decreasing trends observed in different parts and seasons of Asia. Understanding the variations of the monsoon climate in historical times may yield significant insights into its spatial and temporal patterns as embedded in the atmospheric dynamics at different decadal or centennial scales. This study presents preliminary results of high-resolution climate reconstruction, in both time and space, in east China, using the RCEC historical climate dataset, developed through an interdisciplinary collaboration led by the Research Center for Environmental Changes at Academia Sinica, Taiwan. The present results are derived from chronological meteorological records in the RCEC dataset from the Qing dynasty, spanning the mid-17th to 19th centuries. In total, the dataset covers more than 1,300 cities/counties in China with more than sixty thousand meteorological records for the period. The analysis comprises three parts. Firstly, the frequencies of extreme temperature, precipitation, drought, and flood in every recorded city/county were computed to depict climate variability in northeast, central-east and southeast China. Secondly, a multivariate regression model was fitted to estimate the relationships among the climatic indices (temperature, precipitation, and drought). Temperature and wet-dry characteristics show strong seasonal and interannual variation, with northeast China tending to have higher variability than central-east or southeast China. Thirdly, these data were used in an empirical orthogonal function (EOF) analysis to decompose possible mechanisms that might have caused changes in the East Asian monsoon regime during the period. The reconstructed data were also compared against paleoclimate simulations.
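
    A minimal sketch of EOF analysis via the singular value decomposition, one standard way to implement the decomposition step mentioned above (a generic formulation, not the study's code):

    ```python
    import numpy as np

    def eof_analysis(field, n_modes=3):
        """EOFs of a (time, space) data matrix via the SVD of its anomalies."""
        anomalies = field - field.mean(axis=0)            # remove the time mean
        U, s, Vt = np.linalg.svd(anomalies, full_matrices=False)
        eofs = Vt[:n_modes]                               # spatial patterns
        pcs = U[:, :n_modes] * s[:n_modes]                # principal components
        explained = s[:n_modes] ** 2 / np.sum(s ** 2)     # variance fractions
        return eofs, pcs, explained
    ```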

  4. The NASA Energy and Water Cycle Extreme (NEWSE) Integration Project

    NASA Technical Reports Server (NTRS)

    House, P. R.; Lapenta, W.; Schiffer, R.

    2008-01-01

    Skillful predictions of water and energy cycle extremes (flood and drought) are elusive. To better understand the mechanisms responsible for water and energy extremes, and to make decisive progress in predicting them, the collaborative NASA Energy and Water cycle Extremes (NEWSE) Integration Project is studying these extremes in the U.S. Southern Great Plains (SGP) during 2006-2007, including their relationships with continental and global scale processes, and assessing their predictability on multiple space and time scales. Our hypothesis is that an integrative analysis of observed extremes, reflecting the current understanding of the role of SST and soil moisture variability in atmospheric heating and the forcing of planetary waves, and incorporating recently available global and regional hydro-meteorological datasets (i.e., precipitation, water vapor, clouds, etc.) in conjunction with advances in data assimilation, can lead to new insights into the factors that lead to persistent drought and flooding. We will show initial results of this project, whose goals are: improved definition, attribution and prediction on sub-seasonal to interannual time scales; improved understanding of the mechanisms of decadal drought and its predictability, including the impacts of SST variability and deep soil moisture variability; improved monitoring/attribution, with transition to applications; and a bridging of the gap between hydrological forecasts and stakeholders (utilization of probabilistic forecasts, education, forecast interpretation for different sectors, assessment of uncertainties for different sectors, etc.).

  5. Trends in mean and extreme temperatures over Ibadan, Southwest Nigeria

    NASA Astrophysics Data System (ADS)

    Abatan, Abayomi A.; Osayomi, Tolulope; Akande, Samuel O.; Abiodun, Babatunde J.; Gutowski, William J.

    2018-02-01

    In recent times, Ibadan has been experiencing an increase in mean temperature which appears to be linked to anthropogenic global warming. Previous studies have indicated that the warming may be accompanied by changes in extreme events. This study examined trends in mean and extreme temperatures over Ibadan during 1971-2012 at annual and seasonal scales using the high-resolution atmospheric reanalysis from European Centre for Medium-Range Weather Forecasts (ECMWF) twentieth-century dataset (ERA-20C) at 15 grid points. Magnitudes of linear trends in mean and extreme temperatures and their statistical significance were calculated using ordinary least squares and Mann-Kendall rank statistic tests. The results show that Ibadan has witnessed an increase in annual and seasonal mean minimum temperatures. The annual mean maximum temperature exhibited a non-significant decline in most parts of Ibadan. While trends in cold extremes at annual scale show warming, trends in coldest night show greater warming than in coldest day. At the seasonal scale, we found that Ibadan experienced a mix of positive and negative trends in absolute extreme temperature indices. However, cold extremes show the largest trend magnitudes, with trends in coldest night showing the greatest warming. The results compare well with those obtained from a limited number of stations. This study should inform decision-makers and urban planners about the ongoing warming in Ibadan.
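
    As a reference for the trend methodology mentioned above, a minimal sketch of the Mann-Kendall test in its basic form (ignoring tie corrections and serial correlation, which a production analysis would need to handle):

    ```python
    import numpy as np
    from scipy.stats import norm

    def mann_kendall(x):
        """Mann-Kendall trend test, basic form (no tie or autocorrelation
        correction)."""
        n = len(x)
        s = sum(np.sign(x[j] - x[i]) for i in range(n - 1) for j in range(i + 1, n))
        var_s = n * (n - 1) * (2 * n + 5) / 18.0
        z = (s - np.sign(s)) / np.sqrt(var_s) if s != 0 else 0.0
        p = 2 * (1 - norm.cdf(abs(z)))   # two-sided p-value
        return z, p
    ```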

  6. Scaling of Precipitation Extremes Modelled by Generalized Pareto Distribution

    NASA Astrophysics Data System (ADS)

    Rajulapati, C. R.; Mujumdar, P. P.

    2017-12-01

    Precipitation extremes are often modelled with data from annual maximum series or peaks over threshold series. The Generalized Pareto Distribution (GPD) is commonly used to fit the peaks over threshold series. Scaling of precipitation extremes from larger time scales to smaller time scales when the extremes are modelled with the GPD is burdened with difficulties arising from varying thresholds for different durations. In this study, the scale invariance theory is used to develop a disaggregation model for precipitation extremes exceeding specified thresholds. A scaling relationship is developed for a range of thresholds obtained from a set of quantiles of non-zero precipitation of different durations. The GPD parameters and exceedance rate parameters are modelled by the Bayesian approach and the uncertainty in scaling exponent is quantified. A quantile based modification in the scaling relationship is proposed for obtaining the varying thresholds and exceedance rate parameters for shorter durations. The disaggregation model is applied to precipitation datasets of Berlin City, Germany and Bangalore City, India. From both the applications, it is observed that the uncertainty in the scaling exponent has a considerable effect on uncertainty in scaled parameters and return levels of shorter durations.
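
    A minimal sketch of fitting a GPD to a peaks-over-threshold series and computing a return level with scipy (the file name, record length, and 95th-percentile threshold choice are placeholders; the study's Bayesian estimation and scaling relationships are not shown):

    ```python
    import numpy as np
    from scipy.stats import genpareto

    precip = np.loadtxt("daily_precip.txt")              # placeholder input
    threshold = np.quantile(precip[precip > 0], 0.95)    # one possible threshold
    excesses = precip[precip > threshold] - threshold

    # Fit the GPD to the threshold excesses, location fixed at zero.
    shape, _, scale = genpareto.fit(excesses, floc=0)

    # T-year return level, given lam exceedances per year on average.
    lam = len(excesses) / 30.0                           # assume a 30-year record
    T = 100
    rl = threshold + genpareto.ppf(1 - 1 / (lam * T), shape, loc=0, scale=scale)
    ```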

  7. Non-negative Tensor Factorization for Robust Exploratory Big-Data Analytics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Alexandrov, Boian; Vesselinov, Velimir Valentinov; Djidjev, Hristo Nikolov

    Currently, large multidimensional datasets are being accumulated in almost every field. Data are: (1) collected by distributed sensor networks in real time all over the globe, (2) produced by large-scale experimental measurements or engineering activities, (3) generated by high-performance simulations, and (4) gathered by electronic communications and social-network activities, etc. Simultaneous analysis of these ultra-large heterogeneous multidimensional datasets is often critical for scientific discoveries, decision-making, emergency response, and national and global security. The importance of such analyses mandates the development of the next generation of robust machine learning (ML) methods and tools for big-data exploratory analysis.

  8. Intercomparison of PERSIANN-CDR and TRMM-3B42V7 precipitation estimates at monthly and daily time scales

    NASA Astrophysics Data System (ADS)

    Katiraie-Boroujerdy, Pari-Sima; Akbari Asanjan, Ata; Hsu, Kuo-lin; Sorooshian, Soroosh

    2017-09-01

    In the first part of this paper, monthly precipitation data from Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks-Climate Data Record (PERSIANN-CDR) and Tropical Rainfall Measuring Mission 3B42 algorithm Version 7 (TRMM-3B42V7) are evaluated over Iran using the Generalized Three-Cornered Hat (GTCH) method, which requires no reference dataset as input. The Climatic Research Unit (CRU) dataset is added to the GTCH evaluation as an independent gauge-based dataset; thus, the minimum requirement of three datasets for the method is satisfied. To ensure consistency of all datasets, the two satellite products were aggregated to 0.5° spatial resolution, the native resolution of CRU. The results show that PERSIANN-CDR has a higher Signal to Noise Ratio (SNR) than TRMM-3B42V7 for monthly rainfall estimation, especially in the northern half of the country. All datasets showed low SNR in the mountainous area of southwestern Iran, as well as in the arid parts of the southeast of the country. Additionally, in order to evaluate the efficacy of PERSIANN-CDR and TRMM-3B42V7 in capturing extreme daily-precipitation amounts, an in-situ rain-gauge dataset collected by the Islamic Republic of Iran Meteorological Organization (IRIMO) was employed. Given the sparsity of the rain gauges, only 0.25° pixels containing three or more gauges were used for this evaluation; there were 228 such pixels where daily and extreme rainfall from PERSIANN-CDR and TRMM-3B42V7 could be compared. TRMM-3B42V7 overestimates most of the intensity indices (correlation coefficient R between 0.7648 and 0.8311; root mean square error (RMSE) between 3.29 mm/day and 21.2 mm/5day), while PERSIANN-CDR underestimates these extremes (R between 0.6349 and 0.7791; RMSE between 3.59 mm/day and 30.56 mm/5day). Both satellite products show higher correlation coefficients and lower RMSEs for the annual mean of consecutive dry spells than for wet spells. The results show that TRMM-3B42V7 captures the annual mean of the absolute indices (the number of wet days on which daily precipitation exceeds 10 mm or 20 mm) better than PERSIANN-CDR. The daily evaluations show that the similarity between the empirical cumulative distribution functions (ECDFs) of the satellite products and of the IRIMO gauge daily precipitation, as well as of dry spells at different thresholds in selected pixels (each containing at least five gauges), is significant, and that the ECDF agreement becomes more significant as the threshold increases. Regionally, the areas of higher SNR in the monthly (GTCH-based) and daily (ECDF-based) evaluations are largely consistent.
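
    One simple way to compare the ECDFs of a satellite pixel series against a co-located gauge series is a two-sample Kolmogorov-Smirnov statistic (a generic sketch; the paper does not specify this exact test, and the file names are placeholders):

    ```python
    import numpy as np
    from scipy.stats import ks_2samp

    sat = np.loadtxt("satellite_pixel_precip.txt")   # placeholder series
    gauge = np.loadtxt("gauge_precip.txt")

    for threshold in (0.0, 1.0, 5.0, 10.0):          # mm/day
        stat, p = ks_2samp(sat[sat > threshold], gauge[gauge > threshold])
        print(f"threshold {threshold:>4} mm/day: KS distance {stat:.3f}, p {p:.3f}")
    ```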

  9. Partial Information Community Detection in a Multilayer Network

    DTIC Science & Technology

    2016-06-01

    Network was taken from the CORE Lab at the Naval Postgraduate School [27]. Facebook dataset We will use a subgraph of the Facebook network to build a...larger synthetic multilayer network. We want to use this Facebook data as a way to introduce a real world example of a network into our synthetic network...This data is provided by the Stanford Large Network Dataset Collection [28]. This is a large anonymous subgraph of Facebook. It contains over 4,000

  10. cellVIEW: a Tool for Illustrative and Multi-Scale Rendering of Large Biomolecular Datasets

    PubMed Central

    Le Muzic, Mathieu; Autin, Ludovic; Parulek, Julius; Viola, Ivan

    2017-01-01

    In this article we introduce cellVIEW, a new system to interactively visualize large biomolecular datasets at the atomic level. Our tool is unique and has been specifically designed to match the ambitions of our domain experts to model and interactively visualize structures comprising several billion atoms. The cellVIEW system integrates acceleration techniques to allow for real-time graphics performance at a 60 Hz display rate on datasets representing large viruses and bacterial organisms. Inspired by the work of scientific illustrators, we propose a level-of-detail scheme whose purpose is two-fold: accelerating the rendering and reducing visual clutter. The main part of our datasets consists of macromolecules, but they also comprise nucleic acid strands, which are stored as sets of control points. For that specific case, we extend our rendering method to support the dynamic generation of DNA strands directly on the GPU. It is noteworthy that our tool has been implemented directly inside a game engine. We chose to rely on a third-party engine to reduce software development workload and to make bleeding-edge graphics techniques more accessible to end-users. To our knowledge cellVIEW is the only suitable solution for interactive visualization of large biomolecular landscapes at the atomic level, and it is freely available to use and extend. PMID:29291131

  11. Mapping and spatiotemporal analysis tool for hydrological data: Spellmap

    USDA-ARS?s Scientific Manuscript database

    Lack of data management and analysis tools is one of the major limitations to effectively evaluating and using large datasets of high-resolution atmospheric, surface, and subsurface observations. High spatial and temporal resolution datasets better represent the spatiotemporal variability of hydrologica...

  12. Data Mining of Extremely Large Ad Hoc Data Sets to Produce Inverted Indices

    DTIC Science & Technology

    2016-06-01

  13. High Temperature Extremes - Will They Transform Structure of Avian Assemblages in the Desert Southwest?

    NASA Astrophysics Data System (ADS)

    Mutiibwa, D.; Albright, T. P.; Wolf, B. O.; Mckechnie, A. E.; Gerson, A. R.; Talbot, W. A.; Sadoti, G.; O'Neill, J.; Smith, E.

    2014-12-01

    Extreme weather events can alter ecosystem structure and function and have caused mass mortality events in animals. With climate change, high temperature extremes are increasing in frequency and magnitude. To better understand the consequences of climate change, scientists have frequently employed correlative models based on species occurrence records. However, these approaches may be of limited utility in the context of extremes, as these are often outside historical ranges and may involve strong non-linear responses. Here we describe work linking physiological responses informed by experimental data to geospatial climate datasets in order to mechanistically model the dynamics of dehydration risk to desert passerine birds. Specifically, we modeled and mapped the occurrence of current (1980-2013) high temperature extremes and evaporative water loss rates for eight species of passerine birds ranging in size from 6.5 to 75 g in the US Southwest portion of their range. We then explored the implications of a 4 °C warming scenario. Evaporative water loss (EWL) across a range of high temperatures was measured in heat-acclimated birds captured in the field. We used the North American Land Data Assimilation System 2 dataset to obtain hourly estimates of EWL with a 14-km spatial grain. Assuming lethal dehydration occurs when water loss reaches 15% of body weight, we then produced maps of total daily EWL and time to lethal dehydration based on both current data and future scenarios. We found that milder events capable of producing dehydration in passerine birds over four or more hours were not uncommon over the Southwest, but rapid dehydration conditions (<3 hours) were rare. Under the warming scenario, the frequency and extent of dehydration events expanded greatly, often affecting areas several times larger than in the present-day climate. Dehydration risk was especially high among smaller-bodied passerines due to their higher mass-specific rates of water loss. Even after accounting for the moderating effects of microsite and topoclimatic refugia, the increase in occurrence of lethal dehydration risk is cause for concern. In particular, our results suggest that smaller-bodied passerines may have difficulty avoiding extirpation over portions of their current range in the desert southwest.
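
    The time-to-lethal-dehydration mapping reduces, per grid cell and hour, to a simple water-budget calculation; a sketch under the study's 15%-of-body-mass threshold, with an illustrative (not measured) EWL rate:

    ```python
    def hours_to_lethal_dehydration(mass_g, ewl_g_per_hr, lethal_fraction=0.15):
        """Hours until cumulative water loss reaches the lethal threshold
        (15% of body mass, per the study), assuming a constant EWL rate."""
        return lethal_fraction * mass_g / ewl_g_per_hr

    # A 6.5 g passerine losing 0.4 g/h (illustrative rate, not measured data)
    print(hours_to_lethal_dehydration(6.5, 0.4))   # ~2.4 h: rapid-dehydration zone
    ```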

  14. Parallel Visualization of Large-Scale Aerodynamics Calculations: A Case Study on the Cray T3E

    NASA Technical Reports Server (NTRS)

    Ma, Kwan-Liu; Crockett, Thomas W.

    1999-01-01

    This paper reports the performance of a parallel volume rendering algorithm for visualizing a large-scale, unstructured-grid dataset produced by a three-dimensional aerodynamics simulation. This dataset, containing over 18 million tetrahedra, allows us to extend our performance results to a problem which is more than 30 times larger than the one we examined previously. This high resolution dataset also allows us to see fine, three-dimensional features in the flow field. All our tests were performed on the Silicon Graphics Inc. (SGI)/Cray T3E operated by NASA's Goddard Space Flight Center. Using 511 processors, a rendering rate of almost 9 million tetrahedra/second was achieved with a parallel overhead of 26%.

  15. The Derivation of Fault Volumetric Properties from 3D Trace Maps Using Outcrop Constrained Discrete Fracture Network Models

    NASA Astrophysics Data System (ADS)

    Hodgetts, David; Seers, Thomas

    2015-04-01

    Fault systems are important structural elements within many petroleum reservoirs, acting as potential conduits, baffles or barriers to hydrocarbon migration. Large, seismic-scale faults often serve as reservoir bounding seals, forming structural traps which have proved to be prolific plays in many petroleum provinces. Though inconspicuous within most seismic datasets, smaller subsidiary faults, commonly within the damage zones of parent structures, may also play an important role. These smaller faults typically form narrow, tabular low-permeability zones which serve to compartmentalize the reservoir, negatively impacting upon hydrocarbon recovery. Though considerable improvements have been made in the visualization of reservoir-scale fault systems with the advent of 3D seismic surveys, the occlusion of smaller scale faults in such datasets is a source of significant uncertainty during prospect evaluation. The limited capacity of conventional subsurface datasets to probe the spatial distribution of these smaller scale faults has given rise to a large number of outcrop-based studies, allowing their intensity, connectivity and size distributions to be explored in detail. Whilst these studies have yielded an improved theoretical understanding of the style and distribution of sub-seismic scale faults, the ability to transform observations from outcrop into quantities that are relatable to reservoir volumes remains elusive. These issues arise from the fact that outcrops essentially offer a pseudo-3D window into the rock volume, making the extrapolation of surficial fault properties such as areal density (fracture length per unit area: P21) to equivalent volumetric measures (i.e. fracture area per unit volume: P32) applicable to fracture modelling extremely challenging. Here, we demonstrate an approach which harnesses advances in the extraction of 3D trace maps from surface reconstructions using calibrated image sequences, in combination with a novel semi-deterministic, outcrop-constrained discrete fracture network modelling code, to derive volumetric fault intensity measures (fault area per unit volume / fault volume per unit volume). Producing per-vertex measures of volumetric intensity, our method captures the spatial variability in 3D fault density across a surveyed outcrop, enabling first-order controls to be probed. We demonstrate our approach on pervasively faulted exposures of a Permian-aged reservoir analogue from the Vale of Eden Basin, UK.

  16. Orientation-independent measures of ground motion

    USGS Publications Warehouse

    Boore, D.M.; Watson-Lamprey, Jennie; Abrahamson, N.A.

    2006-01-01

    The geometric mean of the response spectra for two orthogonal horizontal components of motion, commonly used as the response variable in predictions of strong ground motion, depends on the orientation of the sensors as installed in the field. This means that the measure of ground-motion intensity could differ for the same actual ground motion. This dependence on sensor orientation is most pronounced for strongly correlated motion (the extreme example being linearly polarized motion), such as often occurs at periods of 1 sec or longer. We propose two new measures of the geometric mean, GMRotDpp and GMRotIpp, that are independent of the sensor orientations. Both are based on a set of geometric means computed from the as-recorded orthogonal horizontal motions rotated through all possible non-redundant rotation angles. GMRotDpp is determined as the ppth percentile of the set of geometric means for a given oscillator period. For example, GMRotD00, GMRotD50, and GMRotD100 correspond to the minimum, median, and maximum values, respectively. The rotations that lead to GMRotDpp depend on period, whereas a single period-independent rotation is used for GMRotIpp, the angle being chosen to minimize the spread of the rotation-dependent geometric mean (normalized by GMRotDpp) over the usable range of oscillator periods. GMRotI50 is the ground-motion intensity measure being used in the development of new ground-motion prediction equations by the Pacific Earthquake Engineering Research Center's Next Generation Attenuation project. Comparisons with as-recorded geometric means for a large dataset show that the new measures are systematically larger than the geometric-mean response spectra using the as-recorded values of ground acceleration, but only by a small amount (less than 3%). The theoretical advantage of the new measures is that they remove sensor orientation as a contributor to aleatory uncertainty. Whether the reduction is of practical significance awaits detailed studies of large datasets. A preliminary analysis contained in a companion article by Beyer and Bommer finds that the reduction is small-to-nonexistent for equations based on a wide range of magnitudes and distances. The results of Beyer and Bommer do suggest, however, that there is an increasing reduction as period increases. Whether the reduction increases with other subdivisions of the dataset for which strongly correlated motions might be expected (e.g., pulselike motions close to faults) awaits further analysis.
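
    A minimal sketch of the rotation-and-percentile idea behind GMRotDpp, using the peak amplitude of the rotated time series in place of the full oscillator response spectra (a simplification for illustration; the actual measure repeats this per oscillator period):

    ```python
    import numpy as np

    def gmrotd(a1, a2, pp=50):
        """GMRotDpp-style measure from two orthogonal horizontal records."""
        gms = []
        for theta in np.deg2rad(np.arange(90)):        # non-redundant rotations
            r1 = a1 * np.cos(theta) + a2 * np.sin(theta)
            r2 = -a1 * np.sin(theta) + a2 * np.cos(theta)
            gms.append(np.sqrt(np.max(np.abs(r1)) * np.max(np.abs(r2))))
        return np.percentile(gms, pp)   # pp=0 min, pp=50 median, pp=100 max
    ```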

  17. Improving Snow Modeling by Assimilating Observational Data Collected by Citizen Scientists

    NASA Astrophysics Data System (ADS)

    Crumley, R. L.; Hill, D. F.; Arendt, A. A.; Wikstrom Jones, K.; Wolken, G. J.; Setiawan, L.

    2017-12-01

    Modeling the seasonal snowpack in alpine environments involves a multiplicity of challenges caused by a lack of spatially extensive and temporally continuous observational datasets. This is partially due to the difficulty of collecting measurements in harsh, remote environments with extreme topographic gradients, accompanied by large model domains and inclement weather. Engaging snow enthusiasts, snow professionals, and community members in the process of data collection may address some of these challenges. In this study, we use SnowModel to estimate seasonal snow water equivalent (SWE) in the Thompson Pass region of Alaska while incorporating snow depth measurements collected by citizen scientists. We develop a modeling approach to assimilate hundreds of snow depth measurements from participants in the Community Snow Observations (CSO) project (www.communitysnowobs.org). The CSO project includes a mobile application through which participants record and submit geo-located snow depth measurements while working and recreating in the study area. These snow depth measurements are randomly located within the model grid at irregular time intervals over a span of four months in the 2017 water year. This snow depth observation dataset is converted into a SWE dataset using an empirically-based bulk density and SWE estimation method. We then assimilate these data using SnowAssim, a sub-model within SnowModel, to constrain the SWE output by the observed data. Multiple model runs are designed to represent an array of output scenarios during the assimilation process. An effort to present model output uncertainties is included, as well as quantification of the pre- and post-assimilation divergence in modeled SWE. Early results reveal that pre-assimilation SWE estimates are consistently greater than post-assimilation estimates, and that the magnitude of the divergence increases as the snowpack evolves. This research has implications beyond the Alaskan context because it increases our ability to constrain snow modeling outputs by making use of snow measurements collected by non-expert citizen scientists.
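
    The depth-to-SWE conversion at the core of the observation processing is, in its simplest form, a bulk-density multiplication; a sketch with a single assumed density (the project's empirical method instead varies density with season and snow climate):

    ```python
    def swe_from_depth(depth_m, density_kg_m3=300.0):
        """Convert a snow depth observation to SWE in mm of water.

        A single bulk density is assumed for illustration only.
        """
        return depth_m * density_kg_m3   # kg/m^2 is numerically mm of water

    print(swe_from_depth(1.2))   # 1.2 m of snow at 300 kg/m^3 -> 360 mm SWE
    ```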

  18. Spatially-explicit estimation of geographical representation in large-scale species distribution datasets.

    PubMed

    Kalwij, Jesse M; Robertson, Mark P; Ronk, Argo; Zobel, Martin; Pärtel, Meelis

    2014-01-01

    Much ecological research relies on existing multispecies distribution datasets. Such datasets, however, can vary considerably in quality, extent, resolution or taxonomic coverage. We provide a framework for a spatially-explicit evaluation of geographical representation within large-scale species distribution datasets, using the comparison of an occurrence atlas with a range atlas dataset as a working example. Specifically, we compared occurrence maps for 3773 taxa from the widely-used Atlas Florae Europaeae (AFE) with digitised range maps for 2049 taxa of the lesser-known Atlas of North European Vascular Plants. We calculated the level of agreement at a 50-km spatial resolution using average latitudinal and longitudinal species range, and area of occupancy. Agreement in species distribution was calculated and mapped using Jaccard similarity index and a reduced major axis (RMA) regression analysis of species richness between the entire atlases (5221 taxa in total) and between co-occurring species (601 taxa). We found no difference in distribution ranges or in the area of occupancy frequency distribution, indicating that atlases were sufficiently overlapping for a valid comparison. The similarity index map showed high levels of agreement for central, western, and northern Europe. The RMA regression confirmed that geographical representation of AFE was low in areas with a sparse data recording history (e.g., Russia, Belarus and the Ukraine). For co-occurring species in south-eastern Europe, however, the Atlas of North European Vascular Plants showed remarkably higher richness estimations. Geographical representation of atlas data can be much more heterogeneous than often assumed. Level of agreement between datasets can be used to evaluate geographical representation within datasets. Merging atlases into a single dataset is worthwhile in spite of methodological differences, and helps to fill gaps in our knowledge of species distribution ranges. Species distribution dataset mergers, such as the one exemplified here, can serve as a baseline towards comprehensive species distribution datasets.
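
    For concreteness, the per-cell agreement measure used above can be sketched as a generic Jaccard computation on boolean presence grids (toy data, not the atlas datasets):

    ```python
    import numpy as np

    def jaccard(a, b):
        """Jaccard similarity of two boolean presence grids (e.g. 50-km cells)."""
        a, b = np.asarray(a, bool), np.asarray(b, bool)
        union = np.logical_or(a, b).sum()
        return np.logical_and(a, b).sum() / union if union else np.nan

    # Toy 2x3 occurrence grids for one taxon in two atlases
    atlas_a = [[1, 1, 0], [0, 1, 0]]
    atlas_b = [[1, 0, 0], [0, 1, 1]]
    print(jaccard(atlas_a, atlas_b))   # 2 shared cells / 4 occupied -> 0.5
    ```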

  19. Development of large scale riverine terrain-bathymetry dataset by integrating NHDPlus HR with NED, CoNED and HAND data

    NASA Astrophysics Data System (ADS)

    Li, Z.; Clark, E. P.

    2017-12-01

    Large-scale and fine-resolution riverine bathymetry data is critical for flood inundation modeling but not available over the continental United States (CONUS). Previously we implemented bankfull hydraulic geometry based approaches to simulate bathymetry for individual rivers using NHDPlus v2.1 data and the 10 m National Elevation Dataset (NED). USGS has recently developed High Resolution NHD data (NHDPlus HR Beta) (USGS, 2017), and this enhanced dataset has significantly improved spatial correspondence with the 10 m DEM. In this study, we used this high resolution data, specifically NHDFlowline and NHDArea, to create bathymetry/terrain for CONUS river channels and floodplains. A software package, NHDPlus Inundation Modeler v5.0 Beta, was developed for this project as an Esri ArcGIS hydrological analysis extension. With the updated tools, the raw 10 m DEM was first hydrologically treated to remove artificial blockages (e.g., overpasses, bridges and even roadways) using low-pass moving window filters. Cross sections were then automatically constructed along each flowline to extract elevation from the hydrologically treated DEM. In this study, river channel shapes were approximated using quadratic curves to reduce uncertainties from the commonly used trapezoids. We calculated underwater channel elevation at each cross-section sampling point using bankfull channel dimensions that were estimated from physiographic province/division based regression equations (Bieger et al. 2015). These elevation points were then interpolated to generate a bathymetry raster. The simulated bathymetry raster was integrated with the USGS NED and the Coastal National Elevation Database (CoNED) (wherever available) to make a seamless terrain-bathymetry dataset. Channel bathymetry was also integrated into the HAND (Height Above Nearest Drainage) dataset to improve large-scale inundation modeling. The generated terrain-bathymetry was processed at the Watershed Boundary Dataset Hydrologic Unit 4 (WBDHU4) level.
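
    A minimal sketch of the quadratic channel-shape approximation described above: bed elevation across the channel is modeled as a parabola hung from the bank elevation, reaching the estimated bankfull depth at the centerline (function and parameter names are ours):

    ```python
    import numpy as np

    def quadratic_channel_bed(bank_elev, bankfull_width, bankfull_depth, n_pts=11):
        """Bed elevations across a channel approximated by a parabola,
        deepest at the centerline and meeting the banks at zero depth."""
        x = np.linspace(-bankfull_width / 2, bankfull_width / 2, n_pts)
        depth = bankfull_depth * (1 - (2 * x / bankfull_width) ** 2)
        return bank_elev - depth

    print(quadratic_channel_bed(100.0, 50.0, 3.0, n_pts=5))
    # [100.   97.75 97.   97.75 100.  ]
    ```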

  20. Medical imaging informatics based solutions for human performance analytics

    NASA Astrophysics Data System (ADS)

    Verma, Sneha; McNitt-Gray, Jill; Liu, Brent J.

    2018-03-01

    For human performance analysis, extensive experimental trials are often conducted to identify the underlying cause or long-term consequences of certain pathologies and to improve motor function by examining the movement patterns of affected individuals. Data collected for human performance analysis include high-speed video, surveys, spreadsheets, force data recordings from instrumented surfaces, etc. These datasets are recorded from various standalone sources and are therefore captured in different folder structures as well as in varying formats depending on the hardware configuration. Data integration and synchronization therefore present a major challenge when handling these multimedia datasets, especially at large scale. Another challenge faced by researchers is querying large quantities of unstructured data and designing feedback/reporting tools for users who need to work with the datasets at various levels. In the past, database server storage solutions have been introduced to securely store these datasets. However, to automate the process of uploading raw files, various file manipulation steps are required. In the current workflow, this file manipulation and structuring is done manually and is not feasible for large amounts of data. By attaching metadata files and data dictionaries to these raw datasets, however, they can provide the information and structure needed for automated server upload. We introduce one such system for metadata creation for unstructured multimedia data based on the DICOM data model design. We discuss the design and implementation of this system and evaluate it with a dataset collected for a movement analysis study. The broader aim of this paper is to present a solution space, based on medical imaging informatics design and methods, for improving the workflow of human performance analysis in a biomechanics research lab.
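
    A minimal sketch of the metadata-sidecar idea: writing a DICOM-inspired JSON descriptor next to a raw recording so a server-side ingest job can file it without manual restructuring (all tag names and the helper are illustrative, not the paper's actual schema):

    ```python
    import hashlib
    import json
    import pathlib

    def write_sidecar(data_file, subject_id, trial, modality):
        """Write a JSON metadata sidecar next to a raw recording.

        Tag names are illustrative only, loosely inspired by the DICOM
        patient/study/series hierarchy.
        """
        p = pathlib.Path(data_file)
        meta = {
            "SubjectID": subject_id,
            "Trial": trial,
            "Modality": modality,  # e.g. "video", "force-plate"
            "SourceFile": p.name,
            "SHA256": hashlib.sha256(p.read_bytes()).hexdigest(),
        }
        p.with_name(p.name + ".json").write_text(json.dumps(meta, indent=2))

    # e.g. write_sidecar("trial01_cam1.mp4", "S001", 1, "video")
    ```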
