NASA Astrophysics Data System (ADS)
Roberge, S.; Chokmani, K.; De Sève, D.
2012-04-01
The snow cover plays an important role in the hydrological cycle of Quebec (Eastern Canada). Consequently, evaluating its spatial extent interests the authorities responsible for the management of water resources, especially hydropower companies. The main objective of this study is the development of a snow-cover mapping strategy using remote sensing data and ensemble-based system techniques. Intended for near real-time operational use, this snow-cover mapping strategy has the advantage of providing the probability that a pixel is snow covered, together with its uncertainty. Ensemble systems are built from two key components. First, a method is needed to build an ensemble of classifiers that is as diverse as possible. Second, an approach is required to combine the outputs of the individual classifiers that make up the ensemble in such a way that correct decisions are amplified and incorrect ones are cancelled out. In this study, we demonstrate the potential of ensemble systems for snow-cover mapping using remote sensing data. The chosen classifier is a sequential-thresholds algorithm using NOAA-AVHRR data adapted to conditions over Eastern Canada. Its special feature is the use of a combination of six sequential thresholds that vary according to the day in the winter season. Two versions of the snow-cover mapping algorithm have been developed: one specific to autumn (October 1st to December 31st) and the other to spring (March 16th to May 31st). To build the ensemble-based system, different versions of the algorithm are created by randomly varying its parameters; one hundred versions are included in the ensemble. The probability that a pixel is snow, no-snow or cloud covered corresponds to the fraction of classifiers that voted for that class.
The overall performance of ensemble based mapping is compared to the overall performance of the chosen classifier, and also with ground observations at meteorological stations.
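The voting scheme described above reduces to counting, per pixel, how many ensemble members assigned each class. A minimal NumPy sketch of that idea (function and variable names are ours, not the authors'):

```python
import numpy as np

def ensemble_class_probabilities(votes, classes=("snow", "no-snow", "cloud")):
    """Per-pixel class probabilities from an ensemble of classifier votes.

    votes: (n_classifiers, n_pixels) array of class labels; each entry is
    one classifier's decision for one pixel. The probability of a class is
    the fraction of classifiers that voted for it.
    """
    votes = np.asarray(votes)
    n_classifiers = votes.shape[0]
    return np.stack(
        [(votes == c).sum(axis=0) / n_classifiers for c in classes], axis=1
    )

# 100 randomly perturbed classifier variants voting on 3 pixels
rng = np.random.default_rng(0)
votes = rng.choice(["snow", "no-snow", "cloud"], size=(100, 3), p=[0.7, 0.2, 0.1])
probs = ensemble_class_probabilities(votes)
```

The per-class fractions double as an uncertainty measure: a pixel with a near-unanimous vote is far more certain than one split across classes.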
Exploring diversity in ensemble classification: Applications in large area land cover mapping
NASA Astrophysics Data System (ADS)
Mellor, Andrew; Boukir, Samia
2017-07-01
Ensemble classifiers, such as random forests, are now commonly applied in the field of remote sensing and have been shown to perform better than single-classifier systems, resulting in reduced generalisation error. Diversity across the members of ensemble classifiers is known to have a strong influence on classification performance: ideally, classifier errors are uncorrelated and more uniformly distributed across ensemble members. The relationship between ensemble diversity and classification performance has not yet been fully explored in the fields of information science and machine learning, and has never been examined in the field of remote sensing. This study is a novel exploration of ensemble diversity and its link to classification performance, applied to a multi-class canopy cover classification problem using random forests and multisource remote sensing and ancillary GIS data, across seven million hectares of diverse dry-sclerophyll-dominated public forests in Victoria, Australia. A particular emphasis is placed on analysing the relationship between ensemble diversity and ensemble margin, two key concepts in ensemble learning. The main novelty of our work lies in boosting diversity by emphasizing the contribution of lower-margin instances used in the learning process. Exploring the influence of tree pruning on diversity is also a new empirical analysis that contributes to a better understanding of ensemble performance. Results reveal insights into the trade-off between ensemble classification accuracy and diversity and, through the ensemble margin, demonstrate how inducing diversity by targeting lower-margin training samples is a means of achieving better classifier performance for more difficult or rarer classes and of reducing information redundancy in classification problems. Our findings inform strategies for collecting training data and for designing and parameterising ensemble classifiers, such as random forests.
This is particularly important in large area remote sensing applications, for which training data is costly and resource intensive to collect.
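The ensemble margin central to this abstract is commonly defined, for a labeled instance, as the votes received by the true class minus the largest number of votes for any other class, normalized by ensemble size. A minimal sketch under that definition (names are illustrative, not from the paper):

```python
import numpy as np

def ensemble_margin(votes, true_labels):
    """Supervised ensemble margin of each training instance.

    votes: (n_trees, n_samples) array of predicted class indices.
    margin = (votes for the true class - max votes for any other class)
             / n_trees; values near zero flag 'difficult' low-margin samples.
    """
    votes = np.asarray(votes)
    n_trees, n_samples = votes.shape
    n_classes = int(votes.max()) + 1
    counts = np.stack([(votes == c).sum(axis=0) for c in range(n_classes)], axis=1)
    true_votes = counts[np.arange(n_samples), true_labels]
    masked = counts.copy()
    masked[np.arange(n_samples), true_labels] = -1   # exclude the true class
    return (true_votes - masked.max(axis=1)) / n_trees

unanimous = ensemble_margin(np.array([[0, 1]] * 10), np.array([0, 1]))
split = ensemble_margin(np.array([[0]] * 6 + [[1]] * 4), np.array([0]))
```

Targeting samples with margins near zero, as the study proposes, concentrates learning effort on the instances the ensemble finds hardest.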
Efficient data assimilation algorithm for bathymetry application
NASA Astrophysics Data System (ADS)
Ghorbanidehno, H.; Lee, J. H.; Farthing, M.; Hesser, T.; Kitanidis, P. K.; Darve, E. F.
2017-12-01
Information on the evolving state of the nearshore zone bathymetry is crucial to shoreline management, recreational safety, and naval operations. The high cost and complex logistics of ship-based surveys for bathymetry estimation have encouraged the use of remote sensing techniques. Data assimilation methods combine the remote sensing data and nearshore hydrodynamic models to estimate the unknown bathymetry and the corresponding uncertainties. In particular, several recent efforts have combined Kalman filter-based techniques, such as ensemble-based Kalman filters, with indirect video-based observations to address the bathymetry inversion problem. However, these methods often suffer from ensemble collapse and uncertainty underestimation. Here, the Compressed State Kalman Filter (CSKF) method is used to estimate the bathymetry based on observed wave celerity. To demonstrate the accuracy and robustness of the CSKF method, we consider twin tests with synthetic observations of wave celerity, with bathymetry profiles chosen from surveys taken by the U.S. Army Corps of Engineers Field Research Facility (FRF) in Duck, NC. The first test case is a bathymetry estimation problem for a spatially smooth and temporally constant bathymetry profile. The second is a bathymetry estimation problem for a bathymetry evolving in time from a smooth to a non-smooth profile. For both problems, we compare the results of CSKF with those obtained by the local ensemble transform Kalman filter (LETKF), a popular ensemble-based Kalman filter method.
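For background, the ensemble-based Kalman filters this abstract contrasts with CSKF share a common analysis step: the forecast-ensemble covariance forms a Kalman gain that pulls each member toward (perturbed) observations. A generic stochastic EnKF sketch, not the CSKF or LETKF themselves, with all names ours:

```python
import numpy as np

def enkf_update(ensemble, H, y, obs_cov, rng):
    """Stochastic EnKF analysis step with perturbed observations.

    ensemble: (n_state, n_members) forecast ensemble
    H: (n_obs, n_state) linear observation operator
    y: (n_obs,) observations; obs_cov: (n_obs, n_obs) error covariance
    """
    n_members = ensemble.shape[1]
    A = ensemble - ensemble.mean(axis=1, keepdims=True)   # anomalies
    P = A @ A.T / (n_members - 1)                         # sample covariance
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + obs_cov)    # Kalman gain
    # perturbing the observations keeps the analysis spread consistent
    Y = y[:, None] + rng.multivariate_normal(np.zeros(len(y)), obs_cov, size=n_members).T
    return ensemble + K @ (Y - H @ ensemble)

rng = np.random.default_rng(1)
prior = rng.normal(0.0, 1.0, size=(1, 500))   # 500-member ensemble, 1-D state
analysis = enkf_update(prior, np.array([[1.0]]), np.array([1.0]), np.array([[0.01]]), rng)
```

The ensemble-collapse failure mode mentioned in the abstract appears here as the analysis spread shrinking too far when the sample covariance is underestimated.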
Coherent Spin Control at the Quantum Level in an Ensemble-Based Optical Memory.
Jobez, Pierre; Laplane, Cyril; Timoney, Nuala; Gisin, Nicolas; Ferrier, Alban; Goldner, Philippe; Afzelius, Mikael
2015-06-12
Long-lived quantum memories are essential components of a long-standing goal of remote distribution of entanglement in quantum networks. These can be realized by storing the quantum states of light as single-spin excitations in atomic ensembles. However, spin states are often subjected to different dephasing processes that limit the storage time, which in principle could be overcome using spin-echo techniques. Theoretical studies suggest this to be challenging due to unavoidable spontaneous emission noise in ensemble-based quantum memories. Here, we demonstrate spin-echo manipulation of a mean spin excitation of 1 in a large solid-state ensemble, generated through storage of a weak optical pulse. After a storage time of about 1 ms we optically read-out the spin excitation with a high signal-to-noise ratio. Our results pave the way for long-duration optical quantum storage using spin-echo techniques for any ensemble-based memory.
Deterministically Entangling Two Remote Atomic Ensembles via Light-Atom Mixed Entanglement Swapping
Liu, Yanhong; Yan, Zhihui; Jia, Xiaojun; Xie, Changde
2016-01-01
Entanglement of two distant macroscopic objects is a key element for implementing large-scale quantum networks consisting of quantum channels and quantum nodes. Entanglement swapping can entangle two spatially separated quantum systems without direct interaction. Here we propose a scheme for deterministically entangling two remote atomic ensembles via continuous-variable entanglement swapping between two independent quantum systems involving light and atoms. Each of the two stationary atomic ensembles, placed at remote nodes of a quantum network, is first prepared in a mixed entangled state of light and atoms. Entanglement swapping is then unconditionally implemented between the two prepared quantum systems by means of balanced homodyne detection of the light and feedback of the measured results. Finally, the established entanglement between the two macroscopic atomic ensembles is verified by the inseparability criterion of correlation variances between the two anti-Stokes optical beams coming from the two atomic ensembles. PMID:27165122
Random-Forest Classification of High-Resolution Remote Sensing Images and Ndsm Over Urban Areas
NASA Astrophysics Data System (ADS)
Sun, X. F.; Lin, X. G.
2017-09-01
As an intermediate step between raw remote sensing data and digital urban maps, remote sensing data classification has been a challenging and long-standing research problem in the remote sensing community. In this work, an effective classification method is proposed for classifying high-resolution remote sensing data over urban areas. Starting from high-resolution multi-spectral images and 3D geometry data, our method proceeds in three main stages: feature extraction, classification, and refinement of the classified result. First, we extract color, vegetation index, and texture features from the multi-spectral image and compute the height, elevation texture, and differential morphological profile (DMP) features from the 3D geometry data. In the classification stage, multiple random forest (RF) classifiers are trained separately and then combined to form an RF ensemble that estimates each sample's category probabilities. Finally, the probabilities, along with the feature importance indicator output by the RF ensemble, are used to construct a fully connected conditional random field (FCCRF) graph model, by which the classification results are refined through mean-field-based statistical inference. Experiments on the ISPRS Semantic Labeling Contest dataset show that our proposed three-stage method achieves 86.9% overall accuracy on the test data.
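Combining several separately trained random forests into one ensemble, as in the classification stage above, can be done by soft voting, i.e. (weighted) averaging of the per-classifier class-probability maps. A minimal sketch of that combination step only (the training and CRF refinement are omitted; names are ours):

```python
import numpy as np

def soft_vote(prob_stack, weights=None):
    """Fuse per-classifier class probabilities by (weighted) averaging.

    prob_stack: (n_classifiers, n_samples, n_classes) probabilities.
    Returns the fused probabilities and the winning class per sample.
    """
    prob_stack = np.asarray(prob_stack, dtype=float)
    if weights is None:
        weights = np.ones(prob_stack.shape[0])
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    fused = np.tensordot(weights, prob_stack, axes=1)   # weighted mean over classifiers
    return fused, fused.argmax(axis=1)

rf1 = np.array([[0.9, 0.1], [0.2, 0.8]])   # class probabilities from forest 1
rf2 = np.array([[0.6, 0.4], [0.4, 0.6]])   # class probabilities from forest 2
fused, labels = soft_vote([rf1, rf2])
```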
Village Building Identification Based on Ensemble Convolutional Neural Networks
Guo, Zhiling; Chen, Qi; Xu, Yongwei; Shibasaki, Ryosuke; Shao, Xiaowei
2017-01-01
In this study, we present the Ensemble Convolutional Neural Network (ECNN), an elaborate CNN framework built by ensembling state-of-the-art CNN models, to identify village buildings from open high-resolution remote sensing (HRRS) images. First, to optimize and mine the capability of CNNs for village mapping and to ensure compatibility with our classification targets, a few state-of-the-art models were carefully optimized and enhanced based on a series of rigorous analyses and evaluations. Second, rather than directly implementing building identification with these models, we exploited most of their advantages by ensembling their feature extractor parts into a stronger model, called ECNN, based on a multiscale feature learning method. Finally, the generated ECNN was applied in a pixel-level classification framework to implement object identification. The proposed method can serve as a viable tool for village building identification with high accuracy and efficiency. Experimental results obtained from the test area in Savannakhet province, Laos, show that the proposed ECNN model significantly outperforms existing methods, improving overall accuracy from 96.64% to 99.26% and kappa from 0.57 to 0.86. PMID:29084154
NASA Astrophysics Data System (ADS)
Hirpa, F. A.; Gebremichael, M.; Hopson, T. M.; Wojick, R.
2011-12-01
We present results of assimilating ground discharge observations and remotely sensed soil moisture observations into the Sacramento Soil Moisture Accounting (SACSMA) model for a small watershed (1593 km2) in Minnesota, the United States. Specifically, we perform assimilation experiments with the Ensemble Kalman Filter (EnKF) and the Particle Filter (PF) in order to improve streamflow forecast accuracy at a six-hour time step. The EnKF updates the soil moisture states in the SACSMA from the relative errors of the model and observations, while the PF adjusts the weights of the state ensemble members based on the likelihood of the forecast. Results of the improvements of each filter over the reference model (without data assimilation) will be presented. Finally, the EnKF and PF are coupled together to further improve streamflow forecast accuracy.
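The particle filter step mentioned above reweights ensemble members by the likelihood of the observation given each member's forecast. A minimal sketch assuming Gaussian observation errors (the SACSMA specifics are omitted and all names are ours):

```python
import numpy as np

def pf_reweight(particles, weights, observation, obs_std, h=lambda x: x):
    """Particle filter weight update for a scalar observation.

    Each particle's weight is multiplied by the Gaussian likelihood of the
    observation given its forecast h(x), then renormalized.
    """
    innovation = observation - h(particles)
    likelihood = np.exp(-0.5 * (innovation / obs_std) ** 2)
    w = weights * likelihood
    w = w / w.sum()
    ess = 1.0 / np.sum(w ** 2)   # effective sample size: resample when low
    return w, ess

particles = np.array([0.0, 1.0, 2.0])        # forecast ensemble states
w, ess = pf_reweight(particles, np.full(3, 1 / 3), observation=1.0, obs_std=0.5)
```

Unlike the EnKF, this update never moves the states themselves; when the effective sample size collapses, a resampling step is normally inserted.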
Ensemble-based data assimilation and optimal sensor placement for scalar source reconstruction
NASA Astrophysics Data System (ADS)
Mons, Vincent; Wang, Qi; Zaki, Tamer
2017-11-01
Reconstructing the characteristics of a scalar source from limited remote measurements in a turbulent flow is a problem of great interest for environmental monitoring, and it is challenging for several reasons. First, the numerical estimation of scalar dispersion in a turbulent flow requires significant computational resources. Second, in practice, only a limited number of observations are available, which generally makes the corresponding inverse problem ill-posed. Ensemble-based variational data assimilation techniques are adopted to solve the problem of scalar source localization in a turbulent channel flow at Reτ = 180. This approach combines the components of variational data assimilation and ensemble Kalman filtering, inheriting robustness from the former and ease of implementation from the latter. An ensemble-based methodology for optimal sensor placement is also proposed in order to improve the conditioning of the inverse problem, which enhances the performance of the data assimilation scheme. This work has been partially funded by the Office of Naval Research (Grant N00014-16-1-2542) and by the National Science Foundation (Grant 1461870).
Ensemble data assimilation in the Red Sea: sensitivity to ensemble selection and atmospheric forcing
NASA Astrophysics Data System (ADS)
Toye, Habib; Zhan, Peng; Gopalakrishnan, Ganesh; Kartadikaria, Aditya R.; Huang, Huang; Knio, Omar; Hoteit, Ibrahim
2017-07-01
We present our efforts to build an ensemble data assimilation and forecasting system for the Red Sea. The system consists of the high-resolution Massachusetts Institute of Technology general circulation model (MITgcm) to simulate ocean circulation and the Data Assimilation Research Testbed (DART) for ensemble data assimilation. DART has been configured to integrate all members of an ensemble adjustment Kalman filter (EAKF) in parallel, and we adapted its ensemble operations to use an invariant ensemble, i.e., an ensemble optimal interpolation (EnOI) algorithm. This approach requires only a single forward model integration in the forecast step and therefore saves substantial computational cost. To deal with the strong seasonal variability of the Red Sea, the EnOI ensemble is seasonally selected from a climatology of long-term model outputs. Remote sensing observations of sea surface height (SSH) and sea surface temperature (SST) are assimilated every 3 days. Real-time atmospheric fields from the National Centers for Environmental Prediction (NCEP) and the European Centre for Medium-Range Weather Forecasts (ECMWF) are used as forcing in different assimilation experiments. We investigate the behaviors of the EAKF and (seasonal) EnOI and compare their performances for assimilating and forecasting the circulation of the Red Sea. We further assess the sensitivity of the assimilation system to various filtering parameters (ensemble size, inflation) and to the atmospheric forcing.
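The seasonal selection of the EnOI ensemble can be pictured as drawing static members from a long model climatology within a window around the analysis month. A toy sketch of such a selection step (this is not the DART implementation; all names and the windowing rule are illustrative):

```python
import numpy as np

def seasonal_enoi_ensemble(climatology, months, analysis_month, window=1, size=50, rng=None):
    """Pick a static EnOI ensemble from a model climatology by season.

    climatology: (n_snapshots, n_state) long-term model outputs;
    months: month (1-12) of each snapshot. Members are drawn only from
    snapshots within +/- `window` months of the analysis month.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    months = np.asarray(months)
    diff = np.abs(months - analysis_month)
    diff = np.minimum(diff, 12 - diff)            # circular: December borders January
    pool = np.flatnonzero(diff <= window)
    pick = rng.choice(pool, size=min(size, pool.size), replace=False)
    return climatology[pick]

months = np.tile(np.arange(1, 13), 10)            # 120 monthly snapshots
climatology = months[:, None].astype(float)       # toy state = month value
members = seasonal_enoi_ensemble(climatology, months, analysis_month=1, window=1, size=20)
```

Because the ensemble is fixed within each season, only one model integration is needed per forecast step, which is the computational saving the abstract highlights.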
Designing Ensemble Based Security Framework for M-Learning System
ERIC Educational Resources Information Center
Mahalingam, Sheila; Abdollah, Mohd Faizal; bin Sahibuddin, Shahrin
2014-01-01
Mobile Learning has the potential to improve efficiency in the education sector and to expand educational opportunities to underserved remote areas in higher learning institutions. However, multiple challenges at different levels are faced when introducing and implementing m-learning. Despite the evolution of technology changes in education,…
Quantum teleportation between remote atomic-ensemble quantum memories.
Bao, Xiao-Hui; Xu, Xiao-Fan; Li, Che-Ming; Yuan, Zhen-Sheng; Lu, Chao-Yang; Pan, Jian-Wei
2012-12-11
Quantum teleportation and quantum memory are two crucial elements for large-scale quantum networks. With the help of prior distributed entanglement as a "quantum channel," quantum teleportation provides an intriguing means to faithfully transfer quantum states among distant locations without actual transmission of the physical carriers [Bennett CH, et al. (1993) Phys Rev Lett 70(13):1895-1899]. Quantum memory enables controlled storage and retrieval of fast-flying photonic quantum bits with stationary matter systems, which is essential to achieve the scalability required for large-scale quantum networks. Combining these two capabilities, here we realize quantum teleportation between two remote atomic-ensemble quantum memory nodes, each composed of ∼10^8 rubidium atoms and connected by a 150-m optical fiber. The spin wave state of one atomic ensemble is mapped to a propagating photon and subjected to Bell state measurements with another single photon that is entangled with the spin wave state of the other ensemble. Two-photon detection events herald the success of teleportation with an average fidelity of 88(7)%. Besides its fundamental interest as a teleportation between two remote macroscopic objects, our technique may be useful for quantum information transfer between different nodes in quantum networks and distributed quantum computing.
Semantic labeling of high-resolution aerial images using an ensemble of fully convolutional networks
NASA Astrophysics Data System (ADS)
Sun, Xiaofeng; Shen, Shuhan; Lin, Xiangguo; Hu, Zhanyi
2017-10-01
High-resolution remote sensing data classification has been a challenging and promising research topic in the remote sensing community. In recent years, with the rapid advances of deep learning, remarkable progress has been made in this field, facilitating a transition from hand-crafted feature design to automatic end-to-end learning. A deep fully convolutional network (FCN)-based ensemble learning method is proposed to label high-resolution aerial images. To fully tap the potential of FCNs, both the Visual Geometry Group network and a deeper residual network, ResNet, are employed. Furthermore, to enlarge the diversity of training samples and gain better generalization, in addition to the commonly used data augmentation methods in the literature (e.g., rotation, multiscale, and aspect ratio), aerial images from other datasets are also collected for cross-scene learning. Finally, we combine these learned models to form an effective FCN ensemble and refine the results using a fully connected conditional random field graph model. Experiments on the ISPRS 2-D Semantic Labeling Contest dataset show that our proposed end-to-end classification method achieves an overall accuracy of 90.7%, a state-of-the-art result in the field.
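One ingredient mentioned above, augmentation by rotation, is often mirrored at prediction time by averaging class scores over rotated copies of the input (test-time augmentation). A minimal sketch with a hypothetical pixel-wise scorer standing in for a trained FCN (all names are ours):

```python
import numpy as np

def tta_average(predict, image, rotations=(0, 1, 2, 3)):
    """Average class scores over k*90-degree rotations of the input
    (test-time counterpart of rotation-based training augmentation)."""
    acc = np.zeros_like(predict(image))
    for k in rotations:
        rotated = np.rot90(image, k, axes=(0, 1))
        scores = predict(rotated)                    # (H, W, n_classes)
        acc += np.rot90(scores, -k, axes=(0, 1))     # rotate scores back
    return acc / len(rotations)

def predict(img):
    # stand-in scorer: two per-pixel class scores derived from intensity
    return np.stack([img, 1.0 - img], axis=-1)

image = np.arange(16, dtype=float).reshape(4, 4) / 16
fused = tta_average(predict, image)
```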
Liu, Xiangnan; Zhang, Biyao; Liu, Ming; Wu, Ling
2017-01-01
The use of remote sensing technology to diagnose heavy metal stress in crops is of great significance for environmental protection and food security. However, in natural farmland ecosystems, various stressors can have a similar influence on crop growth, making heavy metal stress difficult to identify accurately; this remains an unresolved scientific problem and a hot topic in agricultural remote sensing. This study proposes a method that uses Ensemble Empirical Mode Decomposition (EEMD) to obtain heavy metal stress signal features on a long time scale. The method operates on the Leaf Area Index (LAI) simulated by the enhanced World Food Studies (WOFOST) model, assimilated with remotely sensed data. The following results were obtained: (i) EEMD was effective in extracting heavy metal stress signals by eliminating the intra-annual and annual components; (ii) LAIdf (the first derivative of the sum of the interannual component and the residual) can best reflect the stable feature responses to heavy metal stress in rice. LAIdf showed stability, with an R2 greater than 0.9 in three growing stages, and its stability was optimal in June. This study combines the spectral characteristics of the stress effect with its temporal characteristics, and confirms the potential of long-term remotely sensed data for improving the accuracy of crop heavy metal stress identification. PMID:28878147
Ensemble: a web-based system for psychology survey and experiment management.
Tomic, Stefan T; Janata, Petr
2007-08-01
We provide a description of Ensemble, a suite of Web-integrated modules for managing and analyzing data associated with psychology experiments in a small research lab. The system delivers interfaces via a Web browser for creating and presenting simple surveys, without the need to author Web pages and with little or no programming effort. The surveys may be extended by selecting and presenting auditory and/or visual stimuli with MATLAB and Flash, enabling a wide range of psychophysical and cognitive experiments that do not require the recording of precise reaction times. Additionally, experiments can be administered and presented remotely. The software technologies employed by the various modules of Ensemble are MySQL, PHP, MATLAB, and Flash. The code for Ensemble is open source and available to the public, so its functions can be readily extended by users. We describe the architecture of the system and the functionality of each module, and provide basic examples of the interfaces.
A hybrid MLP-CNN classifier for very fine resolution remotely sensed image classification
NASA Astrophysics Data System (ADS)
Zhang, Ce; Pan, Xin; Li, Huapeng; Gardiner, Andy; Sargent, Isabel; Hare, Jonathon; Atkinson, Peter M.
2018-06-01
The contextual-based convolutional neural network (CNN) with a deep architecture and the pixel-based multilayer perceptron (MLP) with a shallow structure are well-recognized neural network algorithms, representing the state-of-the-art deep learning method and the classical non-parametric machine learning approach, respectively. The two algorithms, which have very different behaviours, were integrated in a concise and effective way using a rule-based decision fusion approach for the classification of very fine spatial resolution (VFSR) remotely sensed imagery. The decision fusion rules, designed primarily on the basis of the classification confidence of the CNN, reflect the generally complementary patterns of the individual classifiers. Consequently, the proposed ensemble classifier MLP-CNN harvests the complementary results acquired from the CNN, based on deep spatial feature representation, and from the MLP, based on spectral discrimination. Meanwhile, limitations of the CNN arising from the adoption of convolutional filters, such as uncertainty in object boundary partition and loss of useful fine-spatial-resolution detail, were compensated for. The effectiveness of the ensemble MLP-CNN classifier was tested in both urban and rural areas using aerial photography together with an additional satellite sensor dataset. The MLP-CNN classifier achieved promising performance, consistently outperforming the pixel-based MLP, the spectral and textural-based MLP, and the contextual-based CNN in terms of classification accuracy. This research paves the way to effectively addressing the complicated problem of VFSR image classification.
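A rule-based decision fusion of the kind described, trusting the CNN where its classification confidence is high and falling back on the MLP elsewhere, can be sketched as follows (the threshold and names are illustrative, not the authors' exact rules):

```python
import numpy as np

def fuse_labels(cnn_probs, mlp_probs, confidence_threshold=0.8):
    """Rule-based decision fusion: keep the CNN label where its top class
    probability clears the threshold, otherwise fall back on the MLP."""
    confident = cnn_probs.max(axis=-1) >= confidence_threshold
    return np.where(confident, cnn_probs.argmax(axis=-1), mlp_probs.argmax(axis=-1))

cnn = np.array([[0.95, 0.05], [0.55, 0.45]])   # CNN class probabilities, 2 pixels
mlp = np.array([[0.30, 0.70], [0.20, 0.80]])   # MLP class probabilities
labels = fuse_labels(cnn, mlp)
```

Here the first pixel keeps the confident CNN decision while the second, where the CNN is uncertain (often near object boundaries), takes the spectrally driven MLP label.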
Measurement-induced entanglement for excitation stored in remote atomic ensembles.
Chou, C W; de Riedmatten, H; Felinto, D; Polyakov, S V; van Enk, S J; Kimble, H J
2005-12-08
A critical requirement for diverse applications in quantum information science is the capability to disseminate quantum resources over complex quantum networks. For example, the coherent distribution of entangled quantum states together with quantum memory (for storing the states) can enable scalable architectures for quantum computation, communication and metrology. Here we report observations of entanglement between two atomic ensembles located in distinct, spatially separated set-ups. Quantum interference in the detection of a photon emitted by one of the samples projects the otherwise independent ensembles into an entangled state with one joint excitation stored remotely in 10^5 atoms at each site. After a programmable delay, we confirm entanglement by mapping the state of the atoms to optical fields and measuring mutual coherences and photon statistics for these fields. We thereby determine a quantitative lower bound for the entanglement of the joint state of the ensembles. Our observations represent significant progress in the ability to distribute and store entangled quantum states.
Use of combined radar and radiometer systems in space for precipitation measurement: Some ideas
NASA Technical Reports Server (NTRS)
Moore, R. K.
1981-01-01
A brief survey is given of some fundamental physical concepts of optimal polarization characteristics of a transmission path or a scattering ensemble of hydrometeors. It is argued that, based on this optimization concept, definite advances in remote atmospheric sensing are to be expected. Basic properties of Kennaugh's optimal polarization theory are identified.
Improving evaluation of climate change impacts on the water cycle by remote sensing ET-retrieval
NASA Astrophysics Data System (ADS)
García Galiano, S. G.; Olmos Giménez, P.; Ángel Martínez Pérez, J.; Diego Giraldo Osorio, J.
2015-05-01
Population growth and intense consumptive water uses are generating pressures on water resources in the southeast of Spain. Improving knowledge of climate change impacts on water cycle processes at the basin scale is a step toward building adaptive capacity. In this work, regional climate model (RCM) ensembles are considered as input to the hydrological model in order to improve the reliability of hydroclimatic projections. To build the RCM ensembles, the work focuses on probability density function (PDF)-based evaluation of the ability of RCMs to simulate rainfall and temperature at the basin scale. To improve the spatial calibration of the continuous hydrological model used, an algorithm for remote sensing retrieval of actual evapotranspiration (AET) was applied. The results show a clear decrease in runoff expected by 2050 in the headwater basin studied. This plausible future scenario of water shortage would produce negative impacts on the regional economy, whose main activity is irrigated agriculture.
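A common PDF-based skill measure for RCM evaluation is the Perkins skill score: the overlap of the binned model and observed PDFs, equal to 1 for identical distributions. A minimal sketch, assuming a metric of this kind is what is meant (names are ours):

```python
import numpy as np

def pdf_skill_score(model, obs, bins=20):
    """PDF overlap skill score (Perkins-style): sum over bins of the
    minimum of the two normalized histograms; 1 means identical PDFs."""
    lo = min(model.min(), obs.min())
    hi = max(model.max(), obs.max())
    edges = np.linspace(lo, hi, bins + 1)
    pm, _ = np.histogram(model, bins=edges)
    po, _ = np.histogram(obs, bins=edges)
    return np.minimum(pm / pm.sum(), po / po.sum()).sum()

rng = np.random.default_rng(3)
sample = rng.normal(size=500)
same = pdf_skill_score(sample, sample)
disjoint = pdf_skill_score(rng.uniform(0, 1, 500), rng.uniform(10, 11, 500))
```

Scores like this can weight or screen RCMs before they enter the ensemble driving the hydrological model.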
NASA Astrophysics Data System (ADS)
El Alem, A.; Chokmani, K.; Laurion, I.; El-Adlouni, S. E.
2014-12-01
Because of the sensitivity of inland freshwaters to harmful algal bloom (HAB) development and the limited coverage of standard monitoring programs, remote sensing data have become increasingly used for monitoring HAB extension. Usually, HAB monitoring using remote sensing data is based on empirical and semi-empirical models. Developing such models requires a great number of continuous in situ measurements to reach an acceptable accuracy. However, ministries and water management organizations often use two thresholds, established by the World Health Organization, to determine water quality. Consequently, the available data are ordinal («semi-qualitative») and mostly unexploited. Using such databases with remote sensing data and statistical classification algorithms can produce hazard management maps linked to the presence of cyanobacteria. Unlike standard classification algorithms, which are generally unstable, classifiers based on ensemble systems are more general and stable. In the present study, an ensemble-based classifier was developed and compared to a standard classification method called CART (Classification and Regression Tree) in the context of HAB monitoring in freshwaters, using MODIS images downscaled to 250 m spatial resolution and ordinal in situ data. Calibration and validation data on cyanobacteria densities were collected by the Ministère du Développement durable, de l'Environnement et de la Lutte contre les changements climatiques on 22 water bodies between 2000 and 2010. These data comprise three density classes: waters poorly (< 20,000 cells mL-1), moderately (20,000 - 100,000 cells mL-1), and highly (> 100,000 cells mL-1) loaded with cyanobacteria. The results were very interesting and highlighted that inland waters exhibit different spectral responses allowing them to be classified into the three above classes for water quality monitoring.
On the other hand, although the accuracy (Kappa index = 0.86) of the proposed approach is slightly lower than that of the CART algorithm (Kappa index = 0.87), its robustness is higher, with a standard deviation of 0.05 versus 0.06, specifically when applied to MODIS images. An accurate, robust, and quick approach is thus proposed for daily near real-time monitoring of HAB in southern Quebec freshwaters.
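The Kappa index reported here is Cohen's kappa, which corrects raw map accuracy for the agreement expected by chance from the confusion-matrix marginals. A minimal sketch of its computation (names are ours):

```python
import numpy as np

def cohens_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for the agreement
    expected by chance, both read off the confusion matrix."""
    classes = np.unique(np.concatenate([y_true, y_pred]))
    index = {c: i for i, c in enumerate(classes)}
    cm = np.zeros((classes.size, classes.size))
    for t, p in zip(y_true, y_pred):
        cm[index[t], index[p]] += 1
    n = cm.sum()
    p_obs = np.trace(cm) / n                                  # observed agreement
    p_exp = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n**2    # chance agreement
    return (p_obs - p_exp) / (1 - p_exp)

perfect = cohens_kappa([0, 0, 1, 1], [0, 0, 1, 1])
chance = cohens_kappa([0, 0, 1, 1], [0, 1, 0, 1])
```

Kappa is 1 for perfect maps and 0 when the classifier does no better than the class marginals would by chance, which is why it is preferred over raw accuracy for imbalanced classes like the three bloom-density classes here.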
Assimilating Remotely Sensed Surface Soil Moisture into SWAT using Ensemble Kalman Filter
USDA-ARS?s Scientific Manuscript database
In this study, a 1-D Ensemble Kalman Filter has been used to update the soil moisture states of the Soil and Water Assessment Tool (SWAT) model. Experiments were conducted for the Cobb Creek Watershed in southeastern Oklahoma for 2006-2008. Assimilation of in situ data had limited success in the ...
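In the scalar (1-D) case, the EnKF update reduces to a single Kalman gain computed from the ensemble variance. A toy sketch of such a soil-moisture update, not the SWAT coupling itself (all names and numbers are illustrative):

```python
import numpy as np

def enkf_1d_update(members, obs, obs_var, rng):
    """Scalar EnKF update of a soil-moisture ensemble.

    The Kalman gain is the prior ensemble variance divided by the total
    (prior + observation) variance; observations are perturbed per member.
    """
    prior_var = np.var(members, ddof=1)
    gain = prior_var / (prior_var + obs_var)
    perturbed = obs + rng.normal(0.0, np.sqrt(obs_var), size=members.size)
    return members + gain * (perturbed - members)

rng = np.random.default_rng(2)
members = rng.normal(0.20, 0.05, size=100)   # prior volumetric soil moisture
analysis = enkf_1d_update(members, obs=0.35, obs_var=1e-4, rng=rng)
```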
DOE Office of Scientific and Technical Information (OSTI.GOV)
Crossno, Patricia J.; Gittinger, Jaxon; Hunt, Warren L.
Slycat™ is a web-based system for performing data analysis and visualization of potentially large quantities of remote, high-dimensional data. Slycat™ specializes in working with ensemble data. An ensemble is a group of related data sets, which typically consists of a set of simulation runs exploring the same problem space. An ensemble can be thought of as a set of samples within a multi-variate domain, where each sample is a vector whose value defines a point in high-dimensional space. To understand and describe the underlying problem being modeled in the simulations, ensemble analysis looks for shared behaviors and common features across the group of runs. Additionally, ensemble analysis tries to quantify differences found in any members that deviate from the rest of the group. The Slycat™ system integrates data management, scalable analysis, and visualization. Results are viewed remotely on a user’s desktop via commodity web clients using a multi-tiered hierarchy of computation and data storage, as shown in Figure 1. Our goal is to operate on data as close to the source as possible, thereby reducing time and storage costs associated with data movement. Consequently, we are working to develop parallel analysis capabilities that operate on High Performance Computing (HPC) platforms, to explore approaches for reducing data size, and to implement strategies for staging computation across the Slycat™ hierarchy. Within Slycat™, data and visual analysis are organized around projects, which are shared by a project team. Project members are explicitly added, each with a designated set of permissions. Although users sign-in to access Slycat™, individual accounts are not maintained. Instead, authentication is used to determine project access. Within projects, Slycat™ models capture analysis results and enable data exploration through various visual representations.
Although for scientists each simulation run is a model of real-world phenomena given certain conditions, we use the term model to refer to our modeling of the ensemble data, not the physics. Different model types often provide complementary perspectives on data features when analyzing the same data set. Each model visualizes data at several levels of abstraction, allowing the user to range from viewing the ensemble holistically to accessing numeric parameter values for a single run. Bookmarks provide a mechanism for sharing results, enabling interesting model states to be labeled and saved.
NASA Astrophysics Data System (ADS)
Dykema, J. A.; Anderson, J. G.
2014-12-01
Measuring water vapor at the highest spatial and temporal resolution, at all vertical levels, and at arbitrary times requires strategic utilization of disparate observations from satellites, ground-based remote sensing, and in situ measurements. These different measurement types have different response times and very different spatial averaging properties, both horizontally and vertically. Accounting for these different measurement properties, with explicit propagation of the associated uncertainties, is necessary to test particular scientific hypotheses, especially for detection of weak signals in the presence of natural fluctuations, and for process studies with small ensembles. This is also true where ancillary data from meteorological analyses are required, which have their own sampling limitations and uncertainties. This study will review two investigations pertaining to measurements of water vapor in the mid-troposphere and lower stratosphere that mix satellite observations with observations from other sources. The focus of the mid-troposphere analysis is to obtain improved estimates of water vapor at the instant of a sounding satellite overpass. The lower stratosphere work examines the uncertainty inherent in a small ensemble of anomalously elevated lower stratospheric water vapor observations when meteorological analysis products and aircraft in situ observations are required for interpretation.
Remote preparation of a qudit using maximally entangled states of qubits
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yu Changshui; Song Heshan; Wang Yahong
2006-02-15
Known quantum pure states of a qudit can be remotely prepared onto a group of qubit particles, exactly or probabilistically, with the aid of two-level Einstein-Podolsky-Rosen states. We present a protocol for this kind of remote state preparation. We focus mainly on the remote preparation of ensembles of equatorial states and of states in real Hilbert space. In particular, a class of qudit states in real Hilbert space is shown to be remotely preparable faithfully, without limitation on the input space dimension.
Local Versus Remote Contributions of Soil Moisture to Near-Surface Temperature Variability
NASA Technical Reports Server (NTRS)
Koster, R.; Schubert, S.; Wang, H.; Chang, Y.
2018-01-01
Soil moisture variations have a straightforward impact on overlying air temperatures: wetter soils can induce higher evaporative cooling of the soil and thus, locally, cooler temperatures overall. Not known, however, is the degree to which soil moisture variations can affect remote air temperatures through their impact on the atmospheric circulation. In this talk we describe a two-pronged analysis that addresses this question. In the first segment, an extensive ensemble of NASA/GSFC GEOS-5 atmospheric model simulations is analyzed statistically to isolate and quantify the contributions of various soil moisture states, both local and remote, to the variability of air temperature at a given local site. In the second segment, the relevance of the derived statistical relationships is evaluated by applying them to observations-based data. Results from the second segment suggest that the GEOS-5-based relationships do, at least to first order, hold in nature and thus may provide some skill to forecasts of air temperature at subseasonal time scales, at least in certain regions.
NASA Astrophysics Data System (ADS)
Silvestro, Paolo Cosmo; Casa, Raffaele; Pignatti, Stefano; Castaldi, Fabio; Yang, Hao; Guijun, Yang
2016-08-01
The aim of this work was to develop a tool to evaluate the effect of water stress on yield losses at the farmland and regional scale, by assimilating remotely sensed biophysical variables into crop growth models. Biophysical variables were retrieved from HJ1A, HJ1B and Landsat 8 images, using an algorithm based on the training of artificial neural networks on PROSAIL. For the assimilation, two crop models of differing degrees of complexity were used: Aquacrop and SAFY. For Aquacrop, an optimization procedure was developed to reduce the difference between the remotely sensed and simulated canopy cover (CC). For the modified version of SAFY, the assimilation procedure was based on the Ensemble Kalman Filter. These procedures were tested in a spatialized application, using data collected in the rural area of Yangling (Shaanxi Province) between 2013 and 2015. Results were validated using yield data from both ground measurements and statistical surveys.
NASA Astrophysics Data System (ADS)
Ghirri, Alberto; Bonizzoni, Claudio; Troiani, Filippo; Affronte, Marco
The problem of coupling remote ensembles of two-level systems through cavity photons is revisited by using molecular spin centers and a high-critical-temperature superconducting coplanar resonator. By using PyBTM organic radicals, we achieved the strong coupling regime with cooperativity values reaching 4300 at 2 K. We show that up to three distinct spin ensembles are simultaneously coupled through the resonator mode. The ensembles are made physically distinguishable by chemically varying the g-factor and by exploiting the inhomogeneities of the applied magnetic field. The coherent mixing of the spin and field modes is demonstrated by the observed multiple anticrossings, along with simulations performed within the input-output formalism, and quantified by suitable entropic measures.
USDA-ARS?s Scientific Manuscript database
We develop a robust understanding of the effects of assimilating remote sensing observations of leaf area index and soil moisture (in the top 5 cm) on DSSAT-CSM CropSim-Ceres wheat yield estimates. Synthetic observing system simulation experiments compare the abilities of the Ensemble Kalman Filter...
Data assimilation of citizen collected information for real-time flood hazard mapping
NASA Astrophysics Data System (ADS)
Sayama, T.; Takara, K. T.
2017-12-01
Many studies in data assimilation in hydrology have focused on integrating satellite remote sensing and in-situ monitoring data into hydrologic or land surface models. For flood prediction, recent studies have also demonstrated the assimilation of remotely sensed inundation information into flood inundation models. In actual flood disaster situations, citizen-collected information, including local reports by residents and rescue teams and, more recently, tweets via social media, also contains valuable information. The main interest of this study is how to use such citizen-collected information effectively for real-time flood hazard mapping. Here we propose a new data assimilation technique that builds on pre-conducted ensemble inundation simulations and updates inundation depth distributions sequentially as local data become available. The proposed method is composed of two steps. The first step is a weighted average of the preliminary ensemble simulations, whose weights are updated by a Bayesian approach. The second step is an optimal interpolation, in which the covariance matrix is calculated from the ensemble simulations. The proposed method was applied to case studies, including an actual flood event. Two situations are considered: an idealized one, which assumes that continuous flood inundation depth information is available at multiple locations, and a more realistic one for a severe flood disaster, which assumes that only uncertain and non-continuous information is available for assimilation. The results show that, in the idealized situation, the large-scale inundation during the flood was estimated reasonably well, with an average RMSE below 0.4 m. In the more realistic situation, the error is larger (RMSE 0.5 m) and the impact of the optimal interpolation is comparatively less effective.
Nevertheless, the applications of the proposed method demonstrate its high potential for assimilating citizen-collected information for real-time flood hazard mapping in the future.
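The two-step update described above can be sketched in a few lines. This is a minimal illustration under invented conditions, not the authors' implementation: the ensemble of depth maps, the observed pixel, and the observation error variance are all synthetic, and features of a real system such as covariance localization are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pre-computed ensemble of inundation depth maps (members x pixels).
ensemble = rng.uniform(0.0, 2.0, size=(50, 100))
weights = np.full(50, 1.0 / 50)          # uniform prior weight per member

# Step 1: Bayesian update of member weights from one citizen depth report
# (depth observed at a single pixel, with Gaussian error).
obs_pixel, obs_depth, obs_sigma = 10, 1.2, 0.3
likelihood = np.exp(-0.5 * ((ensemble[:, obs_pixel] - obs_depth) / obs_sigma) ** 2)
weights = weights * likelihood
weights /= weights.sum()
analysis_mean = weights @ ensemble       # weighted-average depth map

# Step 2: optimal interpolation, spreading the innovation to other pixels
# using covariances estimated from the ensemble.
anom = ensemble - ensemble.mean(axis=0)
cov_with_obs = anom.T @ anom[:, obs_pixel] / (ensemble.shape[0] - 1)
var_obs = cov_with_obs[obs_pixel]
gain = cov_with_obs / (var_obs + obs_sigma ** 2)
analysis = analysis_mean + gain * (obs_depth - analysis_mean[obs_pixel])
```

The gain at the observed pixel lies in (0, 1), so the analysis there is pulled toward the report without matching it exactly, as in a standard optimal interpolation.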
NASA Astrophysics Data System (ADS)
Abbaszadeh, P.; Moradkhani, H.
2017-12-01
Soil moisture contributes significantly to the improvement of weather and climate forecasts and to understanding terrestrial ecosystem processes. It is a key hydrologic variable in agricultural drought monitoring, flood modeling and irrigation management. While satellite retrievals can provide unprecedented information on soil moisture at the global scale, the products are generally at coarse spatial resolutions (25-50 km). This often hampers their use in regional or local studies, which normally require data at a finer resolution. This work presents a new framework based on an ensemble learning method that uses soil-climate information derived from remote-sensing and ground-based observations to downscale the level 3 daily composite version (L3_SM_P) of SMAP radiometer soil moisture over the Continental U.S. (CONUS) to 1 km spatial resolution. In the proposed method, a suite of remotely sensed and in situ data sets was used, in addition to soil texture information and topography data, among others. The downscaled product was validated against in situ soil moisture measurements collected from a limited number of core validation sites and several hundred sparse soil moisture networks throughout the CONUS. The results indicate a great potential of the proposed methodology to derive fine-resolution soil moisture information applicable to fine-resolution hydrologic modeling, data assimilation and other regional studies.
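A minimal sketch of the downscaling idea, assuming a generic bagging-style ensemble learner (the abstract does not name the exact method): fit many simple models on bootstrap resamples of training data linking fine-scale predictors to soil-moisture observations, then average their predictions at fine-resolution pixels. All predictor names and coefficients below are synthetic.

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic training data: fine-scale predictors (say NDVI, elevation,
# sand fraction) vs. point soil-moisture observations.
n = 200
X = rng.random((n, 3))
y = 0.1 + 0.2 * X[:, 0] - 0.05 * X[:, 1] + 0.1 * X[:, 2] + rng.normal(0, 0.01, n)

# A minimal bagging ensemble: least-squares fits on bootstrap resamples,
# predictions averaged across members.
models = []
for _ in range(25):
    b = rng.integers(0, n, n)                       # bootstrap sample
    A = np.column_stack([np.ones(n), X[b]])
    coef, *_ = np.linalg.lstsq(A, y[b], rcond=None)
    models.append(coef)

def predict(Xnew):
    A = np.column_stack([np.ones(len(Xnew)), Xnew])
    return np.mean([A @ c for c in models], axis=0)

fine_pixels = rng.random((10, 3))                   # predictors at 1-km pixels
sm_downscaled = predict(fine_pixels)
```

In practice the base learner would be a tree ensemble or similar, but the structure (resample, fit, average) is the same.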
West, Amanda M.; Evangelista, Paul H.; Jarnevich, Catherine S.; Young, Nicholas E.; Stohlgren, Thomas J.; Talbert, Colin; Talbert, Marian; Morisette, Jeffrey; Anderson, Ryan
2016-01-01
Early detection of invasive plant species is vital for the management of natural resources and protection of ecosystem processes. The use of satellite remote sensing for mapping the distribution of invasive plants is becoming more common; however, conventional imaging software and classification methods have been shown to be unreliable. In this study, we test and evaluate the use of five species distribution model techniques fit with satellite remote sensing data to map invasive tamarisk (Tamarix spp.) along the Arkansas River in Southeastern Colorado. The models tested included boosted regression trees (BRT), Random Forest (RF), multivariate adaptive regression splines (MARS), a generalized linear model (GLM), and Maxent. These analyses were conducted using a newly developed software package called the Software for Assisted Habitat Modeling (SAHM). All models were trained with 499 presence points, 10,000 pseudo-absence points, and predictor variables acquired from the Landsat 5 Thematic Mapper (TM) sensor over an eight-month period to distinguish tamarisk from native riparian vegetation by detecting phenological differences. From the Landsat scenes, we used individual bands and calculated the Normalized Difference Vegetation Index (NDVI), Soil-Adjusted Vegetation Index (SAVI), and tasseled cap transformations. All five models successfully identified current tamarisk distribution on the landscape, based on threshold-independent and threshold-dependent evaluation metrics with independent location data. To account for model-specific differences, we produced an ensemble of all five models with map output highlighting areas of agreement and areas of uncertainty. Our results demonstrate the usefulness of species distribution models in analyzing remotely sensed data and the utility of ensemble mapping, and showcase the capability of SAHM in pre-processing and executing multiple complex models.
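The agreement/uncertainty ensemble map described above reduces to counting, per pixel, how many of the five models predict presence. A minimal sketch with synthetic binary maps standing in for the thresholded BRT, RF, MARS, GLM, and Maxent outputs:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical binary presence/absence maps from five SDMs (rows: models).
maps = rng.integers(0, 2, size=(5, 20, 20))

agreement = maps.sum(axis=0)                    # 0-5 models predicting presence
consensus = agreement >= 3                      # majority-vote ensemble map
uncertain = (agreement > 0) & (agreement < 5)   # pixels where models disagree
```

The `agreement` layer is the map output highlighting agreement, and `uncertain` flags pixels where at least one model dissents.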
2010-09-30
...and climate forecasting and use of satellite data assimilation for model evaluation. He is a task leader on another NSF EPSCoR project for the... Data Analysis, Modeling, and Ensemble Forecasting to... observations including remotely sensed data. OBJECTIVES: The main objectives of the study are: 1) to further develop, test, and continue twice daily
Data Assimilation Results from PLASMON
NASA Astrophysics Data System (ADS)
Jorgensen, A. M.; Lichtenberger, J.; Duffy, J.; Friedel, R. H.; Clilverd, M.; Heilig, B.; Vellante, M.; Manninen, J. K.; Raita, T.; Rodger, C. J.; Collier, A.; Reda, J.; Holzworth, R. H.; Ober, D. M.; Boudouridis, A.; Zesta, E.; Chi, P. J.
2013-12-01
VLF and magnetometer observations can be used to remotely sense the plasmasphere. VLF whistler waves can be used to measure the electron density and magnetic Field Line Resonance (FLR) measurements can be used to measure the mass density. In principle it is then possible to remotely map the plasmasphere with a network of ground-based stations which are also less expensive and more permanent than satellites. The PLASMON project, funded by the EU FP-7 program, is in the process of doing just this. A large number of ground-based observations will be input into a data assimilative framework which models the plasmasphere structure and dynamics. The data assimilation framework combines the Ensemble Kalman Filter with the Dynamic Global Core Plasma Model. In this presentation we will describe the plasmasphere model, the data assimilation approach that we have taken, PLASMON data and data assimilation results for specific events.
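A stochastic (perturbed-observation) Ensemble Kalman Filter analysis step of the kind used in such assimilation frameworks can be sketched as follows. The state vector, observation operator, and error variance below are illustrative placeholders, not PLASMON's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(2)

def enkf_update(X, y, H, r):
    """Stochastic EnKF analysis step.
    X: (n_ens, n_state) forecast ensemble; y: observation vector;
    H: (n_obs, n_state) linear observation operator; r: obs error variance."""
    n_ens = X.shape[0]
    A = X - X.mean(axis=0)                        # state anomalies
    HX = X @ H.T                                  # ensemble in observation space
    HA = HX - HX.mean(axis=0)
    P_hh = HA.T @ HA / (n_ens - 1) + r * np.eye(H.shape[0])
    P_xh = A.T @ HA / (n_ens - 1)
    K = P_xh @ np.linalg.inv(P_hh)                # Kalman gain
    y_pert = y + rng.normal(0, np.sqrt(r), size=(n_ens, len(y)))
    return X + (y_pert - HX) @ K.T

# Toy state: e.g. densities at 4 locations; one ground station observes state 0.
X = rng.normal(10.0, 2.0, size=(100, 4))
H = np.array([[1.0, 0.0, 0.0, 0.0]])
Xa = enkf_update(X, np.array([12.0]), H, r=0.25)
```

Observing one component shifts the whole state through the ensemble-estimated cross-covariances, which is what lets sparse ground stations constrain the full plasmasphere model state.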
NASA Technical Reports Server (NTRS)
Benediktsson, J. A.; Ersoy, O. K.; Swain, P. H.
1991-01-01
A neural network architecture called a consensual neural network (CNN) is proposed for the classification of data from multiple sources. Its relation to hierarchical and ensemble neural networks is discussed. CNN is based on the statistical consensus theory and uses nonlinearly transformed input data. The input data are transformed several times, and the different transformed data are applied as if they were independent inputs. The independent inputs are classified using stage neural networks and outputs from the stage networks are then weighted and combined to make a decision. Experimental results based on remote-sensing data and geographic data are given.
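The consensual combination scheme described above can be sketched with fixed random score functions standing in for the trained stage networks; the transforms, weights, and class count below are illustrative, not those of the paper:

```python
import numpy as np

rng = np.random.default_rng(3)

# One input vector, transformed nonlinearly in several ways; each
# transformed version is treated as an independent input to its own
# "stage" classifier (random linear scorers stand in for trained nets).
x = rng.normal(size=8)
transforms = [np.tanh, np.sin, lambda v: np.sign(v) * np.sqrt(np.abs(v))]
stage_weights = [rng.normal(size=(3, 8)) for _ in transforms]   # 3 classes

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

stage_outputs = [softmax(W @ f(x)) for f, W in zip(transforms, stage_weights)]

# Consensus: weighted combination of stage outputs (equal weights here;
# in practice they would reflect each stage's validation accuracy).
alpha = np.array([1 / 3, 1 / 3, 1 / 3])
consensus = sum(a * p for a, p in zip(alpha, stage_outputs))
decision = int(np.argmax(consensus))
```

Because each stage output and the weights are normalized, the consensus is itself a valid class-probability vector.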
Liu, J.; Liu, S.; Loveland, Thomas R.; Tieszen, L.L.
2008-01-01
Land cover change is one of the key driving forces for ecosystem carbon (C) dynamics. We present an approach for using sequential remotely sensed land cover observations and a biogeochemical model to estimate contemporary and future ecosystem carbon trends. We applied the General Ensemble Biogeochemical Modelling System (GEMS) to the Laurentian Plains and Hills ecoregion in the northeastern United States for the period 1975-2025. Land cover changes, especially forest stand-replacing events, were detected on 30 randomly located 10-km by 10-km sample blocks and assimilated by GEMS for biogeochemical simulations. In GEMS, each unique combination of major controlling variables (including land cover change history) forms a geo-referenced simulation unit. For a forest simulation unit, a Monte Carlo process is used to determine forest type, forest age, forest biomass, and soil C, based on the Forest Inventory and Analysis (FIA) data and the U.S. General Soil Map (STATSGO) data. Ensemble simulations are performed for each simulation unit to incorporate input data uncertainty. Results show that, on average, forests of the Laurentian Plains and Hills ecoregion have been sequestering 4.2 Tg C (1 teragram = 10^12 grams) per year, including 1.9 Tg C removed from the ecosystem as a consequence of land cover change. © 2008 Elsevier B.V.
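The per-unit Monte Carlo draw described above can be sketched as follows; the forest types, probabilities, and parameter distributions are invented placeholders, not values from the FIA or STATSGO data:

```python
import random

random.seed(8)

# Hypothetical Monte Carlo sampling in the spirit of GEMS: draw forest
# attributes for a simulation unit from inventory-style distributions,
# then repeat to form an ensemble that carries input-data uncertainty.
forest_types = ["maple-beech-birch", "spruce-fir", "white-red pine"]
type_probs = [0.5, 0.3, 0.2]

def draw_unit():
    ftype = random.choices(forest_types, weights=type_probs)[0]
    age = random.randint(10, 120)           # yr, illustrative age classes
    soil_c = random.gauss(90.0, 15.0)       # Mg C/ha, illustrative
    return ftype, age, soil_c

ensemble = [draw_unit() for _ in range(100)]
```

Running the biogeochemical model once per draw and summarizing across the ensemble is what turns input uncertainty into uncertainty on the carbon flux estimates.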
Floating Forests: Validation of a Citizen Science Effort to Answer Global Ecological Questions
NASA Astrophysics Data System (ADS)
Rosenthal, I.; Byrnes, J.; Cavanaugh, K. C.; Haupt, A. J.; Trouille, L.; Bell, T. W.; Rassweiler, A.; Pérez-Matus, A.; Assis, J.
2017-12-01
Researchers undertaking long term, large-scale ecological analyses face significant challenges for data collection and processing. Crowdsourcing via citizen science can provide an efficient method for analyzing large data sets. However, many scientists have raised questions about the quality of data collected by citizen scientists. Here we use Floating-Forests (http://floatingforests.org), a citizen science platform for creating a global time series of giant kelp abundance, to show that ensemble classifications of satellite data can ensure data quality. Citizen scientists view satellite images of coastlines and classify kelp forests by tracing all visible patches of kelp. Each image is classified by fifteen citizen scientists before being retired. To validate citizen science results, all fifteen classifications are converted to a raster and overlaid on a calibration dataset generated from previous studies. Results show that ensemble classifications from citizen scientists are consistently accurate when compared to calibration data. Given that all source images were acquired by Landsat satellites, we expect this consistency to hold across all regions. At present, we have over 6000 web-based citizen scientists' classifications of almost 2.5 million images of kelp forests in California and Tasmania. These results are not only useful for remote sensing of kelp forests, but also for a wide array of applications that combine citizen science with remote sensing.
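The ensemble-classification idea above reduces to per-pixel voting over the fifteen rasterized user masks. A minimal sketch with synthetic masks and a synthetic calibration layer (the 8-vote threshold is an illustrative choice, not necessarily the project's):

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical: 15 citizen classifications of one image, each already
# rasterized to a binary kelp mask on a common grid.
masks = rng.random((15, 32, 32)) < 0.3

votes = masks.sum(axis=0)              # 0-15 users marking kelp per pixel
kelp_prob = votes / masks.shape[0]     # fraction of users marking kelp
consensus_mask = votes >= 8            # simple majority of 15

# Validate the consensus against a (here synthetic) calibration layer.
calibration = rng.random((32, 32)) < 0.3
agreement_rate = (consensus_mask == calibration).mean()
```

The `kelp_prob` layer is the per-pixel confidence map; validation compares the thresholded consensus to the calibration data pixel by pixel.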
Cao, Cong; Wang, Chuan; He, Ling-Yan; Zhang, Ru
2013-02-25
We investigate an atomic entanglement purification protocol based on the coherent-state input-output process, working with a low-Q cavity in the atom-cavity intermediate coupling region. The information of the entangled states is encoded in three-level configured single atoms confined in separated one-side optical micro-cavities. Using the coherent-state input-output process, we design a two-qubit parity check module (PCM), which allows quantum nondemolition measurement of the atomic qubits, and show its use for remote parties to nonlocally distill a high-fidelity atomic entangled ensemble from an initial mixed-state ensemble. The proposed scheme can further be used for entanglement concentration of unknown atomic states. Also, by exploiting the PCM, we describe a modified scheme for atomic entanglement concentration that introduces ancillary single atoms. As the coherent-state input-output process is robust and scalable in realistic applications, and the detection in the PCM is based on the intensity of the outgoing coherent state, the present protocols may be widely used in large-scale, solid-state quantum repeaters and quantum information processing.
NASA Astrophysics Data System (ADS)
Setiyono, T. D.; Nelson, A.; Ravis, J.; Maunahan, A.; Villano, L.; Li, T.; Bouman, B.
2012-12-01
A semi-empirical model derived from the water-cloud model was used to convert synthetic-aperture radar (SAR) backscattering data into LAI. The SAR-based LAI at early rice growth stages was in close agreement (90%) with LAI derived from MODIS data for the same study location in Nueva Ecija, Philippines. ORYZA2000 simulated a rice yield of 4.5 Mg ha-1 for the 2008 wet season in Nueva Ecija, Philippines when using LAI inputs derived from SAR data, which is closer to the observed yield of 3.9 Mg ha-1, whereas the simulated yield without SAR-derived LAI inputs was 5.4 Mg ha-1. The dynamic water and nitrogen balances were accounted for in these simulations based on site-specific soil properties and actual fertilizer N and water management. The use of remote sensing data was promising for model application to approximate actual growth conditions and to compensate for limitations in the model due to relevant underlying processes absent from the model formulation, such as detailed tillering and leaf shading effects, and also limiting factors not accounted for in the model, such as biotic factors and abiotic factors other than water and N shortages. This study also demonstrated the use of an ensemble approach for provincial-level rice yield estimation in the Philippines. The ensemble approach involved statistical classification of agronomic management settings into 25th percentile, median, and 75th percentile levels, followed by generation of factorial combinations. For the irrigated lowland system, 4 factors were considered: transplanting date, plant density, fertilizer N rate, and amount of irrigation water. For the rainfed lowland system, there were 3 agronomic management factors (transplanting date, plant density, fertilizer N) and 1 soil parameter (depth of the ground water table).
These 4 management/soil factors and 3 statistical levels resulted in 81 total factorial combinations representing simulation scenarios for each area of interest (province in the Philippines) and water environment (irrigated vs. rainfed). Finally, a normal distribution was assumed and applied to the simulation outputs. This ensemble approach provided an efficient and yet effective method of aggregating point-based crop model results to a larger spatial level of interest. Lack of access to accurate model parameters (e.g. depth of the ground water table) could be addressed with this approach. The use of a process-based crop growth model was critical because the ultimate aim of this study was not just to establish a reliable rice yield estimation system but also to make the yield estimation outputs explainable by the underlying agronomic practices, such as transplanting date, fertilizer N application, and water management.
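The factorial-combination count is easy to verify: with 4 factors each at 3 statistical levels, there are 3^4 = 81 scenarios. A sketch with invented irrigated-lowland values (the level numbers below are illustrative, not the surveyed percentiles):

```python
from itertools import product

# Hypothetical 25th percentile / median / 75th percentile levels for
# the four irrigated-lowland factors.
levels = {
    "transplant_doy": [152, 166, 180],   # transplanting date (day of year)
    "plants_m2":      [20, 25, 33],      # plant density
    "fert_n_kg_ha":   [60, 100, 150],    # fertilizer N rate
    "irrigation_mm":  [400, 600, 800],   # irrigation water amount
}

# Full factorial: every combination of one level per factor.
scenarios = [dict(zip(levels, combo)) for combo in product(*levels.values())]
print(len(scenarios))   # prints 81
```

Each of the 81 dictionaries parameterizes one crop-model run for a given province and water environment.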
Cho, Jin-Hyung; Huang, Ben S.; Gray, Jesse M.
2016-01-01
The stable formation of remote fear memories is thought to require neuronal gene induction in cortical ensembles that are activated during learning. However, the set of genes expressed specifically in these activated ensembles is not known; knowledge of such transcriptional profiles may offer insights into the molecular program underlying stable memory formation. Here we use RNA-Seq to identify genes whose expression is enriched in activated cortical ensembles labeled during associative fear learning. We first establish that mouse temporal association cortex (TeA) is required for remote recall of auditory fear memories. We then perform RNA-Seq in TeA neurons that are labeled by the activity reporter Arc-dVenus during learning. We identify 944 genes with enriched expression in Arc-dVenus+ neurons. These genes include markers of L2/3, L5b, and L6 excitatory neurons but not glial or inhibitory markers, confirming Arc-dVenus to be an excitatory neuron-specific but non-layer-specific activity reporter. Cross comparisons to other transcriptional profiles show that 125 of the enriched genes are also activity-regulated in vitro or induced by visual stimulus in the visual cortex, suggesting that they may be induced generally in the cortex in an experience-dependent fashion. Prominent among the enriched genes are those encoding potassium channels that down-regulate neuronal activity, suggesting the possibility that part of the molecular program induced by fear conditioning may initiate homeostatic plasticity. PMID:27557751
What Models and Satellites Tell Us (and Don't Tell Us) About Arctic Sea Ice Melt Season Length
NASA Astrophysics Data System (ADS)
Ahlert, A.; Jahn, A.
2017-12-01
Melt season length—the difference between the sea ice melt onset date and the sea ice freeze onset date—plays an important role in the radiation balance of the Arctic and the predictability of the sea ice cover. However, there are multiple possible definitions for sea ice melt and freeze onset in climate models, and none of them exactly correspond to the remote sensing definition. Using the CESM Large Ensemble model simulations, we show how this mismatch between model and remote sensing definitions of melt and freeze onset limits the utility of melt season remote sensing data for bias detection in models. It also opens up new questions about the precise physical meaning of the melt season remote sensing data. Despite these challenges, we find that the increase in melt season length in the CESM is not as large as that derived from remote sensing data, even when we account for internal variability and different definitions. At the same time, we find that the CESM ensemble members that have the largest trend in sea ice extent over the period 1979-2014 also have the largest melt season trend, driven primarily by the trend towards later freeze onsets. This might be an indication that an underestimation of the melt season length trend is one factor contributing to the generally underestimated sea ice loss within the CESM, and potentially climate models in general.
Ou, Yangming; Resnick, Susan M.; Gur, Ruben C.; Gur, Raquel E.; Satterthwaite, Theodore D.; Furth, Susan; Davatzikos, Christos
2016-01-01
Atlas-based automated anatomical labeling is a fundamental tool in medical image segmentation, as it defines regions of interest for subsequent analysis of structural and functional image data. The extensive investigation of multi-atlas warping and fusion techniques over the past 5 or more years has clearly demonstrated the advantages of consensus-based segmentation. However, the common approach is to use multiple atlases with a single registration method and parameter set, which is not necessarily optimal for every individual scan, anatomical region, and problem/data-type. Different registration criteria and parameter sets yield different solutions, each providing complementary information. Herein, we present a consensus labeling framework that generates a broad ensemble of labeled atlases in target image space via the use of several warping algorithms, regularization parameters, and atlases. The label fusion integrates two complementary sources of information: a local similarity ranking to select locally optimal atlases and a boundary modulation term to refine the segmentation consistently with the target image's intensity profile. The ensemble approach consistently outperforms segmentations using individual warping methods alone, achieving high accuracy on several benchmark datasets. The MUSE methodology has been used for processing thousands of scans from various datasets, producing robust and consistent results. MUSE is publicly available both as a downloadable software package, and as an application that can be run on the CBICA Image Processing Portal (https://ipp.cbica.upenn.edu), a web based platform for remote processing of medical images. PMID:26679328
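The locally ranked fusion can be sketched as a per-voxel top-k selection followed by a majority vote; this is a simplified stand-in for the MUSE fusion (which additionally applies the boundary modulation term), using synthetic labels and similarity scores:

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical: 9 warped atlases (e.g. 3 registration methods x 3 atlases),
# each contributing a candidate label map and a local similarity to the
# target image at every voxel.
labels = rng.integers(0, 4, size=(9, 16, 16))    # candidate labels (4 ROIs)
similarity = rng.random((9, 16, 16))             # local match quality

# Keep the k locally best-matching atlases at each voxel...
k = 5
order = np.argsort(-similarity, axis=0)[:k]
top_labels = np.take_along_axis(labels, order, axis=0)

# ...then take a per-voxel majority vote among them.
counts = np.stack([(top_labels == lab).sum(axis=0) for lab in range(4)])
fused = counts.argmax(axis=0)
```

Ranking atlases locally, rather than globally, is what lets each voxel draw on whichever warping method and atlas happened to register best there.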
Hydrologic Remote Sensing and Land Surface Data Assimilation.
Moradkhani, Hamid
2008-05-06
Accurate, reliable and skillful forecasting of key environmental variables such as soil moisture and snow is of paramount importance due to their strong influence on many water resources applications, including flood control, agricultural production and effective water resources management, which collectively control the behavior of the climate system. Soil moisture is a key state variable in land surface-atmosphere interactions, affecting surface energy fluxes, runoff and the radiation balance. Snow processes also have a large influence on land-atmosphere energy exchanges due to snow's high albedo, low thermal conductivity and considerable spatial and temporal variability, resulting in dramatic changes in surface and ground temperature. Measurement of these two variables is possible through a variety of ground-based and remote sensing methods. Remote sensing, however, holds great promise for soil moisture and snow measurements, which have considerable spatial and temporal variability. Merging these measurements with hydrologic model outputs in a systematic and effective way results in improved land surface model predictions. Data assimilation provides a mechanism to combine these two sources of estimation. Much success has been attained in recent years in using data from passive microwave sensors and assimilating them into models. This paper provides an overview of remote sensing measurement techniques for soil moisture and snow and describes advances in data assimilation through ensemble filtering, mainly the Ensemble Kalman filter (EnKF) and the particle filter (PF), for improving model predictions and reducing the uncertainties involved in the prediction process.
It is believed that the PF provides a complete representation of the probability distribution of the state variables of interest (according to the sequential Bayes law) and could be a strong alternative to the EnKF, which is subject to some limitations, including its linear updating rule and its assumption of jointly normal distributions of errors in state variables and observations.
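A minimal particle filter step for a scalar soil-moisture state, showing the sequential Bayes weight update and resampling that distinguish the PF from the EnKF's linear, Gaussian update; all numbers below are synthetic:

```python
import numpy as np

rng = np.random.default_rng(6)

# Hypothetical prior: 500 soil-moisture particles (volumetric fraction).
n = 500
particles = rng.uniform(0.1, 0.4, size=n)
weights = np.full(n, 1.0 / n)

# Sequential Bayes update against a remotely sensed observation.
obs, obs_sigma = 0.25, 0.02
weights *= np.exp(-0.5 * ((particles - obs) / obs_sigma) ** 2)
weights /= weights.sum()

# Systematic resampling: duplicate high-weight particles, drop low-weight
# ones, so the posterior is again represented by equally weighted particles.
positions = (rng.random() + np.arange(n)) / n
idx = np.minimum(np.searchsorted(np.cumsum(weights), positions), n - 1)
particles = particles[idx]
```

No Gaussian assumption is made at any point: the weighted particles can represent an arbitrary posterior, which is the completeness property the paragraph above refers to.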
Remote Sensing and River Discharge Forecasting for Major Rivers in South Asia (Invited)
NASA Astrophysics Data System (ADS)
Webster, P. J.; Hopson, T. M.; Hirpa, F. A.; Brakenridge, G. R.; De-Groeve, T.; Shrestha, K.; Gebremichael, M.; Restrepo, P. J.
2013-12-01
South Asia is a flashpoint for natural disasters; in particular, flooding of the Indus, Ganges, and Brahmaputra has profound societal impacts for the region and globally. The 2007 Brahmaputra floods affecting India and Bangladesh, the 2008 avulsion of the Kosi River in India, the 2010 flooding of the Indus River in Pakistan, and the 2013 Uttarakhand floods exemplify disasters on scales almost inconceivable elsewhere. The frequent occurrence of floods, combined with large and rapidly growing populations, high levels of poverty and low resilience, exacerbates the impact of these hazards. Mitigation is further complicated by limited flood forecast capability, a lack of rain/gauge measuring stations and of forecast use within and outside the country, and limited transboundary data sharing on natural hazards. Here, we demonstrate the utility of remotely derived hydrologic and weather products in producing skillful flood forecasting information without reliance on vulnerable in situ data sources. Over the last decade, a forecast system providing operational probabilistic forecasts of severe flooding of the Brahmaputra and Ganges Rivers in Bangladesh was developed (Hopson and Webster 2010). The system utilizes ECMWF weather forecast uncertainty information and ensemble weather forecasts, rain gauge and satellite-derived precipitation estimates, together with the limited near-real-time river stage observations from Bangladesh. This system has been expanded to Pakistan and successfully forecast the 2010-2012 flooding (Shrestha and Webster 2013). To overcome the in situ hydrological data problem, recent efforts in parallel with the numerical modeling have utilized microwave satellite remote sensing of river widths to generate operational advection-based discharge forecasts for the Ganges and Brahmaputra. More than twenty remotely sensed locations upstream of Bangladesh were used to produce stand-alone river flow nowcasts and forecasts at 1-15 days lead time,
showing that satellite-based flow estimates are a useful source of dynamical surface water information in data-scarce regions and that they could be used for model calibration and data assimilation purposes in near-real-time hydrologic forecast applications (Hirpa et al. 2013). More recent efforts during this year's monsoon season optimally combine these different independent sources of river forecast information, along with archived flood inundation imagery from the Dartmouth Flood Observatory, to improve the visualization and overall skill of the ongoing CFAB ensemble weather forecast-based flood forecasting system within the unique context of the ongoing flood forecasting efforts for Bangladesh.
Shorebird Migration Patterns in Response to Climate Change: A Modeling Approach
NASA Technical Reports Server (NTRS)
Smith, James A.
2010-01-01
The availability of satellite remote sensing observations at multiple spatial and temporal scales, coupled with advances in climate modeling and information technologies, offers new opportunities for the application of mechanistic models to predict how continental-scale bird migration patterns may change in response to environmental change. In earlier studies, we explored the phenotypic plasticity of a migratory population of Pectoral sandpipers by simulating the movement patterns of an ensemble of 10,000 individual birds in response to changes in stopover locations as an indicator of the impacts of wetland loss and inter-annual variability on the fitness of migratory shorebirds. We used an individual-based, biophysical migration model driven by remotely sensed land surface data, climate data, and biological field data. Mean stopover durations and stopover frequency with latitude predicted from our model for nominal cases were consistent with results reported in the literature and available field data. In this study, we take advantage of new computing capabilities enabled by recent GP-GPU (general-purpose computing on graphics processing units) paradigms and commodity hardware. Several aspects of our individual-based (agent modeling) approach lend themselves well to GP-GPU computing. We have been able to allocate compute-intensive tasks to the graphics processing units, and now simulate ensembles of 400,000 birds at varying spatial resolutions along the central North American flyway. We are incorporating additional, species-specific, mechanistic processes to better reflect the processes underlying bird phenotypic plasticity responses to different climate change scenarios in the central U.S.
Assessing skill of a global bimonthly streamflow ensemble prediction system
NASA Astrophysics Data System (ADS)
van Dijk, A. I.; Peña-Arancibia, J.; Sheffield, J.; Wood, E. F.
2011-12-01
Ideally, a seasonal streamflow forecasting system might be conceived of as a system that ingests skillful climate forecasts from general circulation models and propagates these through thoroughly calibrated hydrological models that are initialised using hydrometric observations. In practice, each of these aspects presents problems. Instead, we analysed whether a comparatively simple hydrological model-based Ensemble Prediction System (EPS) can provide global bimonthly streamflow forecasts with some skill and, if so, under what circumstances the greatest skill may be expected. The system tested produces ensemble forecasts for each of six annual bimonthly periods based on the previous 30 years of global daily gridded 1° resolution climate variables and an initialised global hydrological model. To incorporate some of the skill derived from ocean conditions, a post-EPS analog method was used to sample from the ensemble based on El Niño Southern Oscillation (ENSO), Indian Ocean Dipole (IOD), North Atlantic Oscillation (NAO) and Pacific Decadal Oscillation (PDO) index values observed prior to the forecast. Forecast skill was assessed through a hindcasting experiment for the period 1979-2008. Potential skill was calculated with reference to a model run with the actual forcing for the forecast period (the 'perfect' model) and was compared to actual forecast skill calculated for each of the six forecast times, averaged over 411 Australian and 51 pan-tropical catchments. Significant potential skill in bimonthly forecasts was largely limited to northern regions during the snowmelt period, seasonally wet tropical regions at the transition from wet to dry season, and the Indonesian region, where rainfall is well correlated with ENSO. The actual skill was approximately 34-50% of the potential skill. We attribute this primarily to limitations in the model structure, parameterisation and global forcing data. 
Use of better climate forecasts and remote sensing observations of initial catchment conditions should help to increase actual skill in the future. Future work could also address the potential skill gain from using weather and climate forecasts and from a calibrated and/or alternative hydrological model or model ensemble. The approach and data might be useful as a benchmark for joint seasonal forecasting experiments planned under GEWEX.
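The post-EPS analog sampling step described above can be sketched as follows. This is a minimal illustration under assumed conventions (a single climate-index value attached to each ensemble member, nearest-neighbour selection), not the authors' implementation:

```python
import numpy as np

def analog_subsample(member_index_values, observed_index, k=10):
    """Keep the k ensemble members whose climate-index values (e.g. an
    ENSO index for the year each member was drawn from) are closest to
    the index value observed just before the forecast is issued."""
    distances = np.abs(np.asarray(member_index_values, float) - observed_index)
    return np.argsort(distances)[:k]

# Hypothetical use: 30 historical members, NINO3.4 anomaly of 1.2 observed
rng = np.random.default_rng(0)
member_nino34 = rng.normal(0.0, 1.0, 30)
selected = analog_subsample(member_nino34, observed_index=1.2, k=10)
```

In the actual system four indices (ENSO, IOD, NAO, PDO) are available, so a combined (e.g. weighted) distance would replace the single-index distance shown here.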
Generation of single photons with highly tunable wave shape from a cold atomic ensemble
Farrera, Pau; Heinze, Georg; Albrecht, Boris; Ho, Melvyn; Chávez, Matías; Teo, Colin; Sangouard, Nicolas; de Riedmatten, Hugues
2016-01-01
The generation of ultra-narrowband, pure and storable single photons with widely tunable wave shape is an enabling step toward hybrid quantum networks requiring interconnection of remote disparate quantum systems. It allows interaction of quantum light with several material systems, including photonic quantum memories, single trapped ions and opto-mechanical systems. Previous approaches have offered a limited tuning range of the photon duration of at most one order of magnitude. Here we report on a heralded single photon source with controllable emission time based on a cold atomic ensemble, which can generate photons with temporal durations varying over three orders of magnitude up to 10 μs without a significant change of the readout efficiency. We prove the nonclassicality of the emitted photons, show that they are emitted in a pure state, and demonstrate that ultra-long photons with nonstandard wave shape can be generated, which are ideally suited for several quantum information tasks. PMID:27886166
High Throughput Biological Analysis Using Multi-bit Magnetic Digital Planar Tags
NASA Astrophysics Data System (ADS)
Hong, B.; Jeong, J.-R.; Llandro, J.; Hayward, T. J.; Ionescu, A.; Trypiniotis, T.; Mitrelias, T.; Kopper, K. P.; Steinmuller, S. J.; Bland, J. A. C.
2008-06-01
We report a new magnetic labelling technology for high-throughput biomolecular identification and DNA sequencing. Planar multi-bit magnetic tags have been designed and fabricated, which comprise a magnetic barcode formed by an ensemble of micron-sized thin film Ni80Fe20 bars encapsulated in SU8. We show that by using a globally applied magnetic field and magneto-optical Kerr microscopy the magnetic elements in the multi-bit magnetic tags can be addressed individually and encoded/decoded remotely. The critical steps needed to show the feasibility of this technology are demonstrated, including fabrication, flow transport, remote writing and reading, and successful functionalization of the tags as verified by fluorescence detection. This approach is ideal for encoding information on tags in microfluidic flow or suspension, for such applications as labelling of chemical precursors during drug synthesis and combinatorial library-based high-throughput multiplexed bioassays.
Limits to the Stability of Pulsar Time
NASA Technical Reports Server (NTRS)
Petit, Gerard
1996-01-01
The regularity of the rotation rate of millisecond pulsars is the underlying hypothesis for using these neutron stars as 'celestial clocks'. Given their remote location in our galaxy and our lack of precise knowledge of the galactic environment, a number of phenomena affect the apparent rotation rate observed on Earth. This paper reviews these phenomena and estimates the order of magnitude of their effects. It concludes that an ensemble pulsar time based on a number of selected millisecond pulsars should have a fractional frequency stability close to 2 x 10(sup -15) for an averaging time of a few years.
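One common way to realise such an ensemble pulsar time is an inverse-variance weighted average of the individual pulsars' timing residuals. The sketch below illustrates that general idea only; it is an assumption for illustration, not the specific algorithm analysed in the paper:

```python
import numpy as np

def ensemble_timescale(residuals, variances):
    """Inverse-variance weighted combination of timing residuals from
    several pulsars (rows) at common epochs (columns), so that the most
    stable pulsars contribute most to the ensemble timescale."""
    r = np.asarray(residuals, float)          # shape (n_pulsars, n_epochs)
    w = 1.0 / np.asarray(variances, float)    # one weight per pulsar
    return (w[:, None] * r).sum(axis=0) / w.sum()
```

With equal variances this reduces to a plain mean; a pulsar with three times the variance contributes a third of the weight.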
Mélin, Frédéric; Zibordi, Giuseppe
2007-06-20
An optically based technique is presented that produces merged spectra of normalized water-leaving radiances L(WN) by combining spectral data provided by independent satellite ocean color missions. The assessment of the merging technique is based on a four-year field data series collected by an autonomous above-water radiometer located on the Acqua Alta Oceanographic Tower in the Adriatic Sea. The uncertainties associated with the merged L(WN) obtained from the Sea-viewing Wide Field-of-view Sensor and the Moderate Resolution Imaging Spectroradiometer are consistent with the validation statistics of the individual sensor products. The merging including the third mission Medium Resolution Imaging Spectrometer is also addressed for a reduced ensemble of matchups.
Remote-sensing based approach to forecast habitat quality under climate change scenarios.
Requena-Mullor, Juan M; López, Enrique; Castro, Antonio J; Alcaraz-Segura, Domingo; Castro, Hermelindo; Reyes, Andrés; Cabello, Javier
2017-01-01
As climate change is expected to have a significant impact on species distributions, there is an urgent need for reliable information to guide biodiversity conservation policies. In addressing this challenge, we propose a remote sensing-based approach to forecast future habitat quality for the European badger, a species that is not abundant and is at risk of local extinction in the arid environments of southeastern Spain, by incorporating environmental variables related to ecosystem functioning and correlated with climate and land use. Using ensemble prediction methods, we built global spatial distribution models for the badger's distribution range using presence-only data and climate variables. Then, we constructed regional models for an arid region in southeastern Spain using variables derived from the Enhanced Vegetation Index (EVI), weighting the pseudo-absences with the global model projections applied to this region. Finally, we forecast the badger's potential spatial distribution for the period 2071-2099 under IPCC scenarios, incorporating the uncertainty in the predicted values of the EVI-derived variables. By including remotely sensed descriptors of the temporal dynamics and spatial patterns of ecosystem functioning in the spatial distribution models, the results suggest a less favorable future for European badgers than models excluding them. In addition, the change in the spatial pattern of habitat suitability may be larger than when forecasts are based on climate variables alone. Since the validity of future forecasts based only on climate variables is currently questioned, conservation policies supported by such information could have a biased view and over- or underestimate the potential changes in species distribution derived from climate change. 
The incorporation of ecosystem functional attributes derived from remote sensing into the modeling of future forecasts may improve the detection of ecological responses under climate change scenarios. PMID:28257501
NASA Astrophysics Data System (ADS)
Fowler, H. J.; Forsythe, N. D.; Blenkinsop, S.; Archer, D.; Hardy, A.; Janes, T.; Jones, R. G.; Holderness, T.
2013-12-01
We present results of two distinct, complementary analyses to assess evidence of elevation dependency in temperature change in the UIB (Karakoram, Eastern Hindu Kush) and the wider WH. The first analysis component examines historical remotely sensed land surface temperature (LST) from the second and third generations of the Advanced Very High Resolution Radiometer (AVHRR/2, AVHRR/3) instrument flown on NOAA satellite platforms from the mid-1980s through the present day. The high spatial resolution (<4 km) of the AVHRR instrument enables precise consideration of the relationship between estimated LST and surface topography. The LST data product was developed as part of an initiative to produce continuous time series for key remotely sensed spatial products (LST, snow-covered area, cloud cover, NDVI) extending as far back into the historical record as feasible. Context for the AVHRR LST data product is provided by the results of bias assessment and validation procedures against available local observations from both manned and automatic weather stations. Local observations provide meaningful validation and bias assessment of the vertical gradients found in the AVHRR LST, as the elevation range from the lowest manned meteorological station (at 1460 m asl) to the highest automatic weather station (4733 m asl) covers much of the key range yielding runoff from seasonal snowmelt. Furthermore, the common available record period of these stations (1995 to 2007) enables assessment not only of the AVHRR LST but also performance comparisons with the more recent MODIS LST data product. A range of spatial aggregations (from minor tributary catchments to primary basin headwaters) is performed to assess regional homogeneity and identify potential latitudinal or longitudinal gradients in elevation dependency. 
The second analysis component investigates elevation dependency, including its uncertainty, in projected temperature change trajectories in the downscaling of a seventeen-member Global Climate Model (GCM) perturbed physics ensemble (PPE) of transient (130-year) simulations using a moderate-resolution (25 km) regional climate model (RCM). The GCM ensemble is the 17-member QUMP (Quantifying Uncertainty in Model Projections) ensemble and the downscaling is done using HadRM3P, part of the PRECIS regional climate modelling system. Both the RCM and the GCMs are models developed by the UK Met Office Hadley Centre and are based on the HadCM3 GCM. Use of the multi-member PPE enables quantification of uncertainty in projected temperature change, while the spatial resolution of the RCM improves insight into the role of elevation in projected rates of change. Furthermore, comparison with the results of the remote sensing analysis component - considered to provide an 'observed climatology' - permits evaluation of individual ensemble members with regard to biases in spatial gradients in temperature as well as the timing and magnitude of annual cycles.
Snow water equivalent in the Alps as seen by gridded data sets, CMIP5 and CORDEX climate models
NASA Astrophysics Data System (ADS)
Terzago, Silvia; von Hardenberg, Jost; Palazzi, Elisa; Provenzale, Antonello
2017-07-01
The estimate of the current and future conditions of snow resources in mountain areas would require reliable, kilometre-resolution, regional-observation-based gridded data sets and climate models capable of properly representing snow processes and snow-climate interactions. At the moment, the development of such tools is hampered by the sparseness of station-based reference observations. In past decades passive microwave remote sensing and reanalysis products have mainly been used to infer information on the snow water equivalent distribution. However, the investigation has usually been limited to flat terrains as the reliability of these products in mountain areas is poorly characterized. This work considers the available snow water equivalent data sets from remote sensing and from reanalyses for the greater Alpine region (GAR), and explores their ability to provide a coherent view of the snow water equivalent distribution and climatology in this area. Further, we analyse the simulations from the latest-generation regional and global climate models (RCMs, GCMs), participating in the Coordinated Regional Climate Downscaling Experiment over the European domain (EURO-CORDEX) and in the Fifth Coupled Model Intercomparison Project (CMIP5), respectively. We evaluate their reliability in reproducing the main drivers of snow processes - near-surface air temperature and precipitation - against the observational data set EOBS, and compare the snow water equivalent climatology with the remote sensing and reanalysis data sets previously considered. We critically discuss the model limitations in the historical period and explore their potential in providing reliable future projections. The results of the analysis show that the time-averaged spatial distribution of snow water equivalent and the amplitude of its annual cycle are reproduced quite differently by the different remote sensing and reanalysis data sets, which in fact exhibit a large spread around the ensemble mean. 
We find that GCMs at spatial resolutions equal to or finer than 1.25° longitude are in closer agreement with the ensemble mean of satellite and reanalysis products in terms of root mean square error and standard deviation than lower-resolution GCMs. The set of regional climate models from the EURO-CORDEX ensemble provides estimates of snow water equivalent at 0.11° resolution that are locally much larger than those indicated by the gridded data sets, and only in a few cases are these differences smoothed out when snow water equivalent is spatially averaged over the entire Alpine domain. ERA-Interim-driven RCM simulations show an annual snow cycle that is comparable in amplitude to those provided by the reference data sets, while GCM-driven RCMs present a large positive bias. RCMs and higher-resolution GCM simulations are used to provide an estimate of the snow reduction expected by the mid-21st century (RCP 8.5 scenario) compared to the historical climatology, with the main purpose of highlighting the limits of our current knowledge and the need for developing more reliable snow simulations.
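The root mean square error and standard deviation comparison against the ensemble mean of the satellite and reanalysis products can be illustrated with a small helper. This is an assumed, generic formulation of the metrics named above, not the study's analysis code:

```python
import numpy as np

def taylor_stats(model_series, reference_series):
    """Root mean square error of a model series against a reference
    (e.g. the ensemble mean of satellite/reanalysis snow water
    equivalent), plus the model's own standard deviation, as used in
    Taylor-style model comparisons."""
    m = np.asarray(model_series, float)
    r = np.asarray(reference_series, float)
    rmse = float(np.sqrt(np.mean((m - r) ** 2)))
    return rmse, float(m.std())
```

A model whose annual cycle matches the reference closely yields a small RMSE and a standard deviation close to that of the reference.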
Multi-scale assimilation of remotely sensed snow observations for hydrologic estimation
NASA Astrophysics Data System (ADS)
Andreadis, K.; Lettenmaier, D.
2008-12-01
Data assimilation provides a framework for optimally merging model predictions and remote sensing observations of snow properties (snow cover extent, water equivalent, grain size, melt state), ideally overcoming limitations of both. A synthetic twin experiment is used to evaluate a data assimilation system that would ingest remotely sensed observations from passive microwave and visible wavelength sensors (brightness temperature and snow cover extent derived products, respectively) with the objective of estimating snow water equivalent. Two data assimilation techniques are used, the Ensemble Kalman filter and the Ensemble Multiscale Kalman filter (EnMKF). One of the challenges inherent in such a data assimilation system is the discrepancy in spatial scales between the different types of snow-related observations. The EnMKF represents the sample model error covariance with a tree that relates the system state variables at different locations and scales through a set of parent-child relationships. This provides an attractive framework to efficiently assimilate observations at different spatial scales. This study provides a first assessment of the feasibility of a system that would assimilate observations from multiple sensors (MODIS snow cover and AMSR-E brightness temperatures) and at different spatial scales for snow water equivalent estimation. The relative value of the different types of observations is examined. Additionally, the error characteristics of both model and observations are discussed.
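A minimal, generic Ensemble Kalman filter analysis step (perturbed-observation form) is sketched below. The multiscale tree machinery of the EnMKF is beyond a short example, and the observation operator H, error variance, and dimensions here are illustrative assumptions rather than the study's configuration:

```python
import numpy as np

def enkf_update(ensemble, obs, H, obs_var, seed=42):
    """One perturbed-observation Ensemble Kalman filter analysis step.

    ensemble : (n_state, n_members) forecast ensemble
    obs      : (n_obs,) observation vector
    H        : (n_obs, n_state) linear observation operator
    obs_var  : scalar observation error variance
    """
    n_members = ensemble.shape[1]
    rng = np.random.default_rng(seed)
    # anomalies in state and observation space
    X = ensemble - ensemble.mean(axis=1, keepdims=True)
    HX = H @ ensemble
    HA = HX - HX.mean(axis=1, keepdims=True)
    R = obs_var * np.eye(H.shape[0])
    # Kalman gain built from sample covariances
    P_hh = HA @ HA.T / (n_members - 1) + R
    P_xh = X @ HA.T / (n_members - 1)
    K = P_xh @ np.linalg.inv(P_hh)
    # each member assimilates its own perturbed copy of the observations
    obs_pert = obs[:, None] + rng.normal(0.0, np.sqrt(obs_var),
                                         (H.shape[0], n_members))
    return ensemble + K @ (obs_pert - HX)
```

The EnMKF replaces the flat sample covariance used here with a multiscale tree of parent-child relationships among state variables, which is what allows observations at different spatial scales (e.g. MODIS snow cover versus AMSR-E brightness temperatures) to be assimilated efficiently.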
The NRL relocatable ocean/acoustic ensemble forecast system
NASA Astrophysics Data System (ADS)
Rowley, C.; Martin, P.; Cummings, J.; Jacobs, G.; Coelho, E.; Bishop, C.; Hong, X.; Peggion, G.; Fabre, J.
2009-04-01
A globally relocatable regional ocean nowcast/forecast system has been developed to support rapid implementation of new regional forecast domains. The system is in operational use at the Naval Oceanographic Office for a growing number of regional and coastal implementations. The new system is the basis for an ocean acoustic ensemble forecast and adaptive sampling capability. We present an overview of the forecast system and the ocean ensemble and adaptive sampling methods. The forecast system consists of core ocean data analysis and forecast modules, software for domain configuration, surface and boundary condition forcing processing, and job control, and global databases for ocean climatology, bathymetry, tides, and river locations and transports. The analysis component is the Navy Coupled Ocean Data Assimilation (NCODA) system, a 3D multivariate optimum interpolation system that produces simultaneous analyses of temperature, salinity, geopotential, and vector velocity using remotely-sensed SST, SSH, and sea ice concentration, plus in situ observations of temperature, salinity, and currents from ships, buoys, XBTs, CTDs, profiling floats, and autonomous gliders. The forecast component is the Navy Coastal Ocean Model (NCOM). The system supports one-way nesting and multiple assimilation methods. The ensemble system uses the ensemble transform technique with error variance estimates from the NCODA analysis to represent initial condition error. Perturbed surface forcing or an atmospheric ensemble is used to represent errors in surface forcing. The ensemble transform Kalman filter is used to assess the impact of adaptive observations on future analysis and forecast uncertainty for both ocean and acoustic properties.
Mehmood, Irfan; Sajjad, Muhammad; Baik, Sung Wook
2014-01-01
Wireless capsule endoscopy (WCE) has great advantages over traditional endoscopy because it is portable and easy to use, especially in remote health-monitoring services. However, during the WCE process, the large amount of captured video data demands a significant amount of computation to analyze and retrieve informative video frames. In order to facilitate efficient WCE data collection and browsing tasks, we present a resource- and bandwidth-aware WCE video summarization framework that extracts the representative keyframes of the WCE video contents by removing redundant and non-informative frames. For redundancy elimination, we use the Jeffrey divergence between color histograms and inter-frame Boolean series-based correlation of color channels. To remove non-informative frames, multi-fractal texture features are extracted to assist the classification using an ensemble-based classifier. Owing to the limited WCE resources, it is impossible for the WCE system to perform computationally intensive video summarization tasks. To resolve these computational challenges, a mobile-cloud architecture is incorporated, which provides resizable computing capacity by adaptively offloading video summarization tasks between the client and the cloud server. The qualitative and quantitative results are encouraging and show that the proposed framework saves information transmission cost and bandwidth, as well as the valuable time of data analysts in browsing remote sensing data. PMID:25225874
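The Jeffrey divergence used for redundancy elimination can be written down compactly. The sketch below follows the symmetric histogram-matching form common in the image-retrieval literature, which may differ in detail from the authors' implementation:

```python
import numpy as np

def jeffrey_divergence(p, q, eps=1e-12):
    """Jeffrey divergence between two normalized color histograms:
    a symmetric, numerically stable dissimilarity that compares each
    histogram against their pointwise mean. Near-duplicate frames give
    values close to zero."""
    p = np.asarray(p, float) + eps
    q = np.asarray(q, float) + eps
    m = (p + q) / 2.0
    return float(np.sum(p * np.log(p / m) + q * np.log(q / m)))
```

In a summarizer, consecutive frames whose histogram divergence falls below a threshold would be treated as redundant and dropped.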
Efficient Data Assimilation Algorithms for Bathymetry Applications
NASA Astrophysics Data System (ADS)
Ghorbanidehno, H.; Kokkinaki, A.; Lee, J. H.; Farthing, M.; Hesser, T.; Kitanidis, P. K.; Darve, E. F.
2016-12-01
Information on the evolving state of the nearshore zone bathymetry is crucial to shoreline management, recreational safety, and naval operations. The high cost and complex logistics of using ship-based surveys for bathymetry estimation have encouraged the use of remote sensing monitoring. Data assimilation methods combine monitoring data and models of nearshore dynamics to estimate the unknown bathymetry and the corresponding uncertainties. Existing applications have been limited to the basic Kalman Filter (KF) and the Ensemble Kalman Filter (EnKF). The former can only be applied to low-dimensional problems due to its computational cost; the latter often suffers from ensemble collapse and uncertainty underestimation. This work explores the use of different variants of the Kalman Filter for bathymetry applications. In particular, we compare the performance of the EnKF to the Unscented Kalman Filter and the Hierarchical Kalman Filter, both of which are KF variants for non-linear problems. The objective is to identify which method can better handle the nonlinearities of nearshore physics, while also having a reasonable computational cost. We present two applications; first, the bathymetry of a synthetic one-dimensional cross section normal to the shore is estimated from wave speed measurements. Second, real remote measurements with unknown error statistics are used and compared to in situ bathymetric survey data collected at the USACE Field Research Facility in Duck, NC. We evaluate the information content of different data sets and explore the impact of measurement error and nonlinearities.
Online vegetation parameter estimation using passive microwave remote sensing observations
USDA-ARS?s Scientific Manuscript database
In adaptive system identification the Kalman filter can be used to identify the coefficient of the observation operator of a linear system. Here the ensemble Kalman filter is tested for adaptive online estimation of the vegetation opacity parameter of a radiative transfer model. A state augmentatio...
NASA Astrophysics Data System (ADS)
Palacios-Peña, Laura; Baró, Rocío; Jiménez-Guerrero, Pedro
2016-04-01
The changes in Earth's climate are produced by forcing agents such as greenhouse gases, clouds and atmospheric aerosols. The latter modify the Earth's radiative budget through their optical, microphysical and chemical properties, and are considered the most uncertain forcing agent. There are two main approaches to the study of aerosols: (1) ground-based and remote sensing observations and (2) atmospheric modelling. With the aim of characterizing the uncertainties associated with these approaches, and estimating the radiative forcing caused by aerosols, the main objective of this work is to assess the representation of aerosol optical properties by different remote sensing sensors and online-coupled chemistry-climate models, and to determine whether the inclusion of aerosol radiative feedbacks in this type of model improves the modelling outputs over Europe. Two case studies have been selected under the framework of the EuMetChem COST Action ES1004, during which important aerosol episodes took place over Europe in 2010: a Russian wildfire episode and a Saharan desert dust outbreak covering most of Europe. Model data come from an ensemble of regional air quality-climate simulations performed by working group 2 of EuMetChem, which investigates the importance of different processes and feedbacks in on-line coupled chemistry-climate models. These simulations are run in three different configurations for each model, differing in the inclusion (or not) of aerosol-radiation and aerosol-cloud interactions. The remote sensing data come from three different sensors: MODIS (Moderate Resolution Imaging Spectroradiometer), OMI (Ozone Monitoring Instrument) and SeaWiFS (Sea-viewing Wide Field-of-view Sensor). The evaluation has been performed using classical statistical metrics, comparing modelled and remotely sensed data against a ground-based instrument network (AERONET). 
The evaluated variables are the aerosol optical depth (AOD) and the Angström exponent (AE) at different wavelengths. Regarding the uncertainty in the satellite representation of AOD, MODIS shows the best agreement with AERONET observations when compared to the other satellite AOD observations. Focusing on the comparison between model output, MODIS and AERONET, results indicate a general slight improvement of AOD when the aerosol radiative effects are included in the model, and a slight worsening of the Angström exponent for some stations and regions. Regarding the correlation coefficient, both episodes show similar values of this metric, which are higher for AOD. Generally, for the Angström exponent, models tend to underestimate the variability of this variable. Despite this, the improvement in the representation of AOD by on-line coupled chemistry-climate models reflected here may be of essential importance for a better description of aerosol-radiation-cloud interactions in regional climate models. On the other hand, the differences found between remote sensing sensors (which are of the same order of magnitude as the differences between the different members of the model ensemble) point to uncertainty in the measurements and observations that has to be taken into account when the models are evaluated. Acknowledgments: funding from the REPAIR-CGL2014-59677-R project (Spanish Ministry of Economy and Innovation, funded by the FEDER programme of the European Union). Special thanks to the EuMetChem COST Action ES1004.
NASA Technical Reports Server (NTRS)
Nearing, Grey S.; Crow, Wade T.; Thorp, Kelly R.; Moran, Mary S.; Reichle, Rolf H.; Gupta, Hoshin V.
2012-01-01
Observing system simulation experiments were used to investigate ensemble Bayesian state-updating data assimilation of observations of leaf area index (LAI) and soil moisture (theta) for the purpose of improving single-season wheat yield estimates with the Decision Support System for Agrotechnology Transfer (DSSAT) CropSim-Ceres model. Assimilation was conducted in an energy-limited environment and a water-limited environment. Modeling uncertainty was prescribed to weather inputs, soil parameters and initial conditions, and cultivar parameters, and through perturbations to the model state transition equations. The ensemble Kalman filter and the sequential importance resampling filter were tested for their ability to attenuate the effects of these types of uncertainty on yield estimates. LAI and theta observations were synthesized according to the characteristics of existing remote sensing data, and the effects of observation error were tested. Results indicate that the potential for assimilation to improve end-of-season yield estimates is low. Limitations are due to a lack of root zone soil moisture information, error in LAI observations, and a lack of correlation between leaf and grain growth.
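The analysis step of the ensemble Kalman filter tested here can be sketched for a scalar state. This is a minimal illustration with perturbed observations, not the DSSAT-coupled implementation; function and variable names are illustrative.

```python
import random
import statistics

def enkf_update(ensemble, obs, obs_err_sd, h=lambda x: x, seed=0):
    """One scalar EnKF analysis step with perturbed observations.

    ensemble   : list of forecast state values (e.g. soil moisture)
    obs        : the observation (e.g. a synthesized LAI or theta value)
    obs_err_sd : observation error standard deviation
    h          : observation operator mapping state to observation space
    """
    rng = random.Random(seed)
    hx = [h(x) for x in ensemble]
    n = len(ensemble)
    x_mean = statistics.fmean(ensemble)
    hx_mean = statistics.fmean(hx)
    # sample covariance between state and predicted observation,
    # and sample variance of the predicted observation
    p_xy = sum((x - x_mean) * (y - hx_mean) for x, y in zip(ensemble, hx)) / (n - 1)
    p_yy = sum((y - hx_mean) ** 2 for y in hx) / (n - 1)
    gain = p_xy / (p_yy + obs_err_sd ** 2)  # Kalman gain
    # update each member toward its own perturbed observation
    return [x + gain * (obs + rng.gauss(0, obs_err_sd) - y)
            for x, y in zip(ensemble, hx)]
```

After the update, the ensemble mean is pulled toward the observation and the spread contracts; when the observed quantity correlates weakly with the state of interest (as with LAI observations and root zone moisture), the gain, and hence the benefit, stays small, consistent with the limitations reported above.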
An inquiry into the cirrus-cloud thermostat effect for tropical sea surface temperature
NASA Technical Reports Server (NTRS)
Lau, K.-M.; Sui, C.-H.; Chou, M.-D.; Tao, W.-K.
1994-01-01
In this paper, we investigate the relative importance of local versus remote control on cloud radiative forcing using a cumulus ensemble model. It is found that cloud and surface radiation forcings are much more sensitive to the mean vertical motion associated with the large-scale tropical circulation than to the local SST (sea surface temperature). When the local SST is increased with the mean vertical motion held constant, increased surface latent and sensible heat flux associated with enhanced moisture recycling is found to be the primary mechanism for cooling the ocean surface. Large changes in surface shortwave fluxes are related to changes in cloudiness induced by changes in the large-scale circulation. These results are consistent with a number of earlier empirical studies, which raised concerns regarding the validity of the cirrus-thermostat hypothesis (Ramanathan and Collins, 1991). It is argued that for a better understanding of cloud feedback, both local and remote controls need to be considered, and that a cumulus ensemble model is a powerful tool that should be explored for this purpose.
NASA Astrophysics Data System (ADS)
Laiolo, Paola; Gabellani, Simone; Campo, Lorenzo; Cenci, Luca; Silvestro, Francesco; Delogu, Fabio; Boni, Giorgio; Rudari, Roberto
2015-04-01
The reliable estimation of hydrological variables (e.g. soil moisture, evapotranspiration, surface temperature) in space and time is of fundamental importance in operational hydrology to improve the forecast of the rainfall-runoff response of catchments and, consequently, flood predictions. Remote sensing now offers the chance to provide good space-time estimates of several hydrological variables and thus to improve hydrological model performance, especially in environments with scarce in-situ data. This work investigates the impact of the assimilation of different remote sensing products on the hydrological cycle using a continuous, physically based, distributed hydrological model. Three soil moisture products derived from ASCAT (Advanced SCATterometer) are used to update the model state variables. The satellite-derived products are assimilated into the hydrological model using different assimilation techniques: a simple nudging scheme and the Ensemble Kalman Filter. Moreover, two assimilation strategies are evaluated to assess the impact of assimilating the satellite products at the model spatial resolution or at the satellite scale. The experiments are carried out for three Italian catchments over a multi-year period. The benefits for the model predictions of discharge, LST, evapotranspiration and soil moisture dynamics are tested and discussed.
NASA Astrophysics Data System (ADS)
Singh, Leeth; Mutanga, Onisimo; Mafongoya, Paramu; Peerbhay, Kabir
2017-07-01
The concentration of forage fiber content is critical in explaining the palatability of forage for livestock grazers in tropical grasslands. Traditional methods of determining forage fiber content are usually time-consuming, costly, and require specialized laboratory analysis. Remote sensing technologies have the potential to determine key fiber attributes more accurately. This study aims to determine the effectiveness of known absorption wavelengths for detecting the forage fiber biochemicals neutral detergent fiber, acid detergent fiber, and lignin using hyperspectral data. Hyperspectral reflectance measurements (350 to 2500 nm) of grass were collected and implemented within the random forest (RF) ensemble. Results show successful correlations between the known absorption features and the biochemicals, with coefficients of determination (R2) ranging from 0.57 to 0.81 and root mean square errors ranging from 6.97 to 3.03 g/kg. In comparison, using the entire dataset, the study identified additional wavelengths for detecting fiber biochemicals, which contributes to the accurate determination of forage quality in a grassland environment. Overall, the results showed that hyperspectral remote sensing in conjunction with the competent RF ensemble could discriminate each key biochemical evaluated. This study shows the potential to upscale the methodology to a space-borne multispectral platform with similar spectral configurations for accurate and cost-effective mapping of forage quality.
Spatial Ensemble Postprocessing of Precipitation Forecasts Using High Resolution Analyses
NASA Astrophysics Data System (ADS)
Lang, Moritz N.; Schicker, Irene; Kann, Alexander; Wang, Yong
2017-04-01
Ensemble prediction systems are designed to account for errors or uncertainties in the initial and boundary conditions, imperfect parameterizations, etc. However, due to sampling errors and underestimation of the model errors, these ensemble forecasts tend to be underdispersive and to lack both reliability and sharpness. To overcome such limitations, statistical postprocessing methods are commonly applied to these forecasts. In this study, a full-distributional spatial postprocessing method, Standardized Anomaly Model Output Statistics (SAMOS), is applied to short-range precipitation forecasts over Austria. Following Stauffer et al. (2016), observation and forecast fields are transformed into standardized anomalies by subtracting a site-specific climatological mean and dividing by the climatological standard deviation. Because only a single regression model needs to be fitted for the whole domain, the SAMOS framework provides a computationally inexpensive way to create operationally calibrated probabilistic forecasts for any arbitrary location, or for all grid points in the domain simultaneously. Taking advantage of the INCA system (Integrated Nowcasting through Comprehensive Analysis), high-resolution analyses are used for the computation of the observed climatology and for model training. The INCA system operationally combines station measurements and remote sensing data into real-time objective analysis fields at 1 km horizontal and 1 h temporal resolution. The precipitation forecasts used in this study are obtained from a limited-area ensemble prediction system also operated by ZAMG: the so-called ALADIN-LAEF provides, using a multi-physics approach, a 17-member forecast at a horizontal resolution of 10.9 km and a temporal resolution of 1 hour. The SAMOS approach thus statistically combines the in-house high-resolution analysis and ensemble prediction system.
The station-based validation of 6-hour precipitation sums shows a mean improvement of more than 40% in CRPS compared to bilinearly interpolated, uncalibrated ensemble forecasts. The validation on randomly selected grid points, representing the true height distribution over Austria, still indicates a mean improvement of 35%. The statistical model is currently set up for 6-hourly and daily accumulation periods, but will be extended to a temporal resolution of 1-3 hours within a new probabilistic nowcasting system operated by ZAMG.
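The anomaly transformation at the core of SAMOS can be sketched directly. This is a minimal sketch in which the climatology is just a sample mean and standard deviation; the operational system derives site-specific climatologies from the INCA analyses, and the names below are illustrative.

```python
import statistics

def to_anomalies(values, climatology):
    """Transform values into standardized anomalies using a site-specific
    climatological mean and standard deviation: z = (x - mu) / sigma."""
    mu = statistics.fmean(climatology)
    sigma = statistics.stdev(climatology)
    return [(v - mu) / sigma for v in values], mu, sigma

def from_anomalies(anomalies, mu, sigma):
    """Back-transform calibrated anomalies to physical units."""
    return [z * sigma + mu for z in anomalies]
```

A single regression model can then be fitted to the pooled anomalies of all stations or grid points, which is what keeps the method computationally inexpensive.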
NASA Astrophysics Data System (ADS)
Wanders, N.; Bierkens, M. F. P.; de Jong, S. M.; de Roo, A.; Karssenberg, D.
2014-08-01
Large-scale hydrological models are nowadays mostly calibrated using observed discharge. As a result, a large part of the hydrological system, in particular the unsaturated zone, remains uncalibrated. Soil moisture observations from satellites have the potential to fill this gap. Here we evaluate the added value of remotely sensed soil moisture in calibration of large-scale hydrological models by addressing two research questions: (1) Which parameters of hydrological models can be identified by calibration with remotely sensed soil moisture? (2) Does calibration with remotely sensed soil moisture lead to an improved calibration of hydrological models compared to calibration based only on discharge observations, such that this leads to improved simulations of soil moisture content and discharge? A dual state and parameter Ensemble Kalman Filter is used to calibrate the hydrological model LISFLOOD for the Upper Danube. Calibration is done using discharge and remotely sensed soil moisture acquired by AMSR-E, SMOS, and ASCAT. Calibration with discharge data improves the estimation of groundwater and routing parameters. Calibration with only remotely sensed soil moisture results in an accurate identification of parameters related to land-surface processes. For the Upper Danube upstream area up to 40,000 km2, calibration on both discharge and soil moisture results in a reduction by 10-30% in the RMSE for discharge simulations, compared to calibration on discharge alone. The conclusion is that remotely sensed soil moisture holds potential for calibration of hydrological models, leading to a better simulation of soil moisture content throughout the catchment and a better simulation of discharge in upstream areas. This article was corrected on 15 SEP 2014. See the end of the full text for details.
NASA Astrophysics Data System (ADS)
dos Santos, A. F.; Freitas, S. R.; de Mattos, J. G. Z.; de Campos Velho, H. F.; Gan, M. A.; da Luz, E. F. P.; Grell, G. A.
2013-09-01
In this paper we consider an optimization problem applying the metaheuristic Firefly algorithm (FY) to weight an ensemble of rainfall forecasts from daily precipitation simulations with the Brazilian developments on the Regional Atmospheric Modeling System (BRAMS) over South America during January 2006. The method is addressed as a parameter estimation problem to weight the ensemble of precipitation forecasts carried out using different options of the convective parameterization scheme. Ensemble simulations were performed using different choices of closures, representing different formulations of dynamic control (the modulation of convection by the environment) in a deep convection scheme. The optimization problem is solved as an inverse problem of parameter estimation. The application and validation of the methodology is carried out using daily precipitation fields, defined over South America and obtained by merging remote sensing estimations with rain gauge observations. The quadratic difference between the model and observed data was used as the objective function to determine the best combination of the ensemble members to reproduce the observations. To reduce the model rainfall biases, the set of weights determined by the algorithm is used to weight members of an ensemble of model simulations in order to compute a new precipitation field that represents the observed precipitation as closely as possible. The validation of the methodology is carried out using classical statistical scores. The algorithm has produced the best combination of the weights, resulting in a new precipitation field closest to the observations.
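The objective function, the quadratic difference between the weighted ensemble and the observed precipitation, is easy to state in code. In this sketch a brute-force grid search over two members stands in for the Firefly metaheuristic, which handles many members; all names are illustrative.

```python
def quadratic_cost(weights, member_fields, observed):
    """Sum of squared differences between the weighted combination of
    ensemble member fields and the observed field."""
    combined = [sum(w * m[i] for w, m in zip(weights, member_fields))
                for i in range(len(observed))]
    return sum((c - o) ** 2 for c, o in zip(combined, observed))

def best_pair_weight(m1, m2, observed, step=0.01):
    """Toy optimizer: search convex weights (w, 1 - w) for two members."""
    grid = [i * step for i in range(int(1 / step) + 1)]
    return min(grid, key=lambda w: quadratic_cost((w, 1 - w), (m1, m2), observed))
```

With many members the weight space is too large for exhaustive search, which is where a population-based metaheuristic such as the Firefly algorithm becomes attractive.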
Few-Photon Nonlinearity with an Atomic Ensemble in an Optical Cavity
NASA Astrophysics Data System (ADS)
Tanji, Haruka
2011-12-01
This thesis investigates the effect of the cavity vacuum field on the dispersive properties of an atomic ensemble in a strongly coupled high-finesse cavity. In particular, we demonstrate vacuum-induced transparency (VIT). The light absorption by the ensemble is suppressed by up to 40% in the presence of a cavity vacuum field. The sharp transparency peak is accompanied by a reduction in the group velocity of a light pulse, measured to be as low as 1800 m/s. This observation is a large step towards the realization of photon number-state filters, recently proposed by Nikoghosyan et al. Furthermore, we demonstrate few-photon optical nonlinearity, where the transparency is increased from 40% to 80% with ~12 photons in the cavity mode. The result may be viewed as all-optical switching, where the transmission of photons in one mode may be controlled by 12 photons in another. These studies point to the possibility of nonlinear interaction between photons in different free-space modes, a scheme that circumvents the cavity-coupling losses that plague cavity-based quantum information processing. Potential applications include advanced quantum devices such as photonic quantum gates, photon-number resolving detectors, and single-photon transistors. In the efforts leading up to these results, we investigate the collective enhancement of atomic coupling to a single mode of a low-finesse cavity. With the strong collective coupling, we obtain exquisite control of quantum states in the atom-photon coupled system. In this system, we demonstrate a heralded single-photon source with 84% conditional efficiency, a quantum bus for deterministic entanglement of two remote ensembles, and a heralded polarization-state quantum memory with fidelity above 90%.
Preliminary results of BTDF calibration of transmissive solar diffusers for remote sensing
NASA Astrophysics Data System (ADS)
Georgiev, Georgi T.; Butler, James J.; Thome, Kurt; Cooksey, Catherine; Ding, Leibo
2016-09-01
Satellite instruments operating in the reflected solar wavelength region require accurate and precise determination of the optical properties of the diffusers used in their pre-flight and post-flight calibrations. The majority of recent and current space instruments use reflective diffusers. As a result, numerous Bidirectional Reflectance Distribution Function (BRDF) calibration comparisons have been conducted between the National Institute of Standards and Technology (NIST) and other industry and university-based metrology laboratories. However, based on literature searches and communications with NIST and other laboratories, no Bidirectional Transmittance Distribution Function (BTDF) measurement comparisons have been conducted between National Measurement Laboratories (NMLs) and other metrology laboratories. On the other hand, there is growing interest in the use of transmissive diffusers in the calibration of satellite, air-borne, and ground-based remote sensing instruments. Current remote sensing instruments employing transmissive diffusers include the Ozone Mapping and Profiler Suite (OMPS) Limb instrument on the Suomi National Polar-orbiting Partnership (S-NPP) platform, the Geostationary Ocean Color Imager (GOCI) on the Korea Aerospace Research Institute's (KARI) Communication, Ocean, and Meteorological Satellite (COMS), the Ozone Monitoring Instrument (OMI) on NASA's Earth Observing System (EOS) Aura platform, the Tropospheric Emissions: Monitoring of Pollution (TEMPO) instrument, and the Geostationary Environmental Monitoring Spectrometer (GEMS). This ensemble of instruments requires validated BTDF measurements of their onboard transmissive diffusers from the ultraviolet through the near infrared. This paper presents the preliminary results of a BTDF comparison between the NASA Diffuser Calibration Laboratory (DCL) and NIST on quartz and thin Spectralon samples.
Comparison of Ensemble Mean and Deterministic Forecasts for Long-Range Airlift Fuel Planning
2014-03-27
honors in volleyball. In 2002, she transferred to the University of Oklahoma, where she earned a bachelor's degree in Meteorology and was commissioned... instrumental in safely establishing remotely-piloted aircraft operations. She was a member of the Air Force Women's Volleyball team in 2007, 2009, and 2010, as
USDA-ARS?s Scientific Manuscript database
Assimilation of remotely sensed soil moisture data (SM-DA) to correct soil water stores of rainfall-runoff models has shown skill in improving streamflow prediction. In the case of large and sparsely monitored catchments, SM-DA is a particularly attractive tool. Within this context, we assimilate act...
Ling, Qing-Hua; Song, Yu-Qing; Han, Fei; Yang, Dan; Huang, De-Shuang
2016-01-01
For ensemble learning, how to select and combine the candidate classifiers are two key issues which dramatically influence the performance of the ensemble system. Random vector functional link networks (RVFL) without direct input-to-output links are suitable base classifiers for ensemble systems because of their fast learning speed, simple structure and good generalization performance. In this paper, to obtain a more compact ensemble system with improved convergence performance, an improved ensemble of RVFL based on attractive and repulsive particle swarm optimization (ARPSO) with a double optimization strategy is proposed. In the proposed method, ARPSO is applied to select and combine the candidate RVFL. When using ARPSO to select the optimal base RVFL, both the convergence accuracy on the validation data and the diversity of the candidate ensemble system are considered in building the RVFL ensembles. In the process of combining RVFL, the ensemble weights corresponding to the base RVFL are initialized by the minimum-norm least-squares method and then further optimized by ARPSO. Finally, a few redundant RVFL are pruned, and thus a more compact ensemble of RVFL is obtained. Moreover, theoretical analysis and justification of how to prune the base classifiers on classification problems are presented, and a simple and practically feasible strategy for pruning redundant base classifiers on both classification and regression problems is proposed. Since the double optimization is performed on the basis of the single optimization, the ensemble of RVFL built by the proposed method outperforms that built by some single-optimization methods. Experimental results on function approximation and classification problems verify that the proposed method improves convergence accuracy as well as reduces the complexity of the ensemble system. PMID:27835638
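A minimal RVFL base learner of the kind described, without direct input-to-output links, can be sketched as a fixed random hidden layer followed by a ridge least-squares readout. This toy version omits the ARPSO selection, combination and pruning; the small ridge term stands in for the minimum-norm least-squares initialization, and all names are illustrative.

```python
import math
import random

def solve(A, b):
    """Gaussian elimination with partial pivoting for small dense systems."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

class RVFL:
    """Random vector functional link net without direct input-output links:
    a fixed random tanh hidden layer plus a ridge least-squares readout."""

    def __init__(self, n_in, n_hidden=20, ridge=1e-6, seed=0):
        rng = random.Random(seed)
        self.W = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.ridge = ridge
        self.beta = None

    def _hidden(self, x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.W, self.b)]

    def fit(self, X, y):
        H = [self._hidden(x) for x in X]
        n = len(self.W)
        # ridge-regularized normal equations: (H'H + ridge * I) beta = H'y
        HtH = [[sum(h[i] * h[j] for h in H) + (self.ridge if i == j else 0.0)
                for j in range(n)] for i in range(n)]
        Hty = [sum(h[i] * t for h, t in zip(H, y)) for i in range(n)]
        self.beta = solve(HtH, Hty)
        return self

    def predict(self, X):
        return [sum(b * h for b, h in zip(self.beta, self._hidden(x))) for x in X]
```

Only the readout weights are learned; the random hidden layer is never trained, which is what gives RVFL its fast learning speed.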
On the predictability of outliers in ensemble forecasts
NASA Astrophysics Data System (ADS)
Siegert, S.; Bröcker, J.; Kantz, H.
2012-03-01
In numerical weather prediction, ensembles are used to retrieve probabilistic forecasts of future weather conditions. We consider events where the verification is smaller than the smallest, or larger than the largest ensemble member of a scalar ensemble forecast. These events are called outliers. In a statistically consistent K-member ensemble, outliers should occur with a base rate of 2/(K+1). In operational ensembles this base rate tends to be higher. We study the predictability of outlier events in terms of the Brier Skill Score and find that forecast probabilities can be calculated which are more skillful than the unconditional base rate. This is shown analytically for statistically consistent ensembles. Using logistic regression, forecast probabilities for outlier events in an operational ensemble are calculated. These probabilities exhibit positive skill which is quantitatively similar to the analytical results. Possible causes of these results as well as their consequences for ensemble interpretation are discussed.
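The 2/(K+1) base rate follows from exchangeability: in a statistically consistent ensemble the verification is equally likely to occupy any of the K+1 ranks, and exactly two of those ranks lie outside the ensemble range. A quick Monte Carlo check of this (illustrative code, not from the paper):

```python
import random

def outlier_rate(k, trials=20000, seed=42):
    """Estimate the probability that the verification falls below the
    smallest or above the largest member of a K-member ensemble when
    members and verification are exchangeable draws from one distribution."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        ensemble = sorted(rng.random() for _ in range(k))
        verification = rng.random()
        hits += verification < ensemble[0] or verification > ensemble[-1]
    return hits / trials
```

For K = 9 the estimate is close to 2/10 = 0.2; operational ensembles, as noted above, tend to exceed this rate because they are not statistically consistent.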
Argumentation Based Joint Learning: A Novel Ensemble Learning Approach
Xu, Junyi; Yao, Li; Li, Le
2015-01-01
Recently, ensemble learning methods have been widely used to improve classification performance in machine learning. In this paper, we present a novel ensemble learning method: argumentation based multi-agent joint learning (AMAJL), which integrates ideas from multi-agent argumentation, ensemble learning, and association rule mining. In AMAJL, argumentation technology is introduced as an ensemble strategy to integrate multiple base classifiers and generate a high-performance ensemble classifier. We design an argumentation framework named Arena as a communication platform for knowledge integration. Through argumentation based joint learning, high-quality individual knowledge can be extracted, and thus a refined global knowledge base can be generated and used independently for classification. We perform numerous experiments on multiple public datasets using AMAJL and other benchmark methods. The results demonstrate that our method can effectively extract high-quality knowledge for the ensemble classifier and improve classification performance. PMID:25966359
NASA Astrophysics Data System (ADS)
Christensen, C.; Summa, B.; Scorzelli, G.; Lee, J. W.; Venkat, A.; Bremer, P. T.; Pascucci, V.
2017-12-01
Massive datasets are becoming more common due to increasingly detailed simulations and higher resolution acquisition devices. Yet accessing and processing these huge data collections for scientific analysis is still a significant challenge. Solutions that rely on extensive data transfers are increasingly untenable and often impossible due to lack of sufficient storage at the client side as well as insufficient bandwidth to conduct such large transfers, which in some cases could entail petabytes of data. Large-scale remote computing resources can be useful, but utilizing such systems typically entails some form of offline batch processing with long delays, data replications, and substantial cost for any mistakes. Both types of workflows can severely limit the flexible exploration and rapid evaluation of new hypotheses that are crucial to the scientific process and thereby impede scientific discovery. In order to facilitate interactivity in both analysis and visualization of these massive data ensembles, we introduce a dynamic runtime system suitable for progressive computation and interactive visualization of arbitrarily large, disparately located spatiotemporal datasets. Our system includes an embedded domain-specific language (EDSL) that allows users to express a wide range of data analysis operations in a simple and abstract manner. The underlying runtime system transparently resolves issues such as remote data access and resampling while at the same time maintaining interactivity through progressive and interruptible processing. Computations involving large amounts of data can be performed remotely in an incremental fashion that dramatically reduces data movement, while the client receives updates progressively, thereby remaining robust to fluctuating network latency or limited bandwidth. This system facilitates interactive, incremental analysis and visualization of massive remote datasets up to petabytes in size.
Our system is now available for general use in the community through both Docker and Anaconda.
Ensemble-based docking: From hit discovery to metabolism and toxicity predictions
Evangelista, Wilfredo; Weir, Rebecca; Ellingson, Sally; ...
2016-07-29
The use of ensemble-based docking for the exploration of biochemical pathways and the toxicity prediction of drug candidates is described, along with the computational engineering work necessary to enable large ensemble docking campaigns on supercomputers. We show examples where ensemble-based docking has significantly increased the number and the diversity of validated drug candidates. Finally, we illustrate how ensemble-based docking can be extended beyond hit discovery toward providing a structural basis for the prediction of metabolism and off-target binding relevant to pre-clinical and clinical trials.
NASA Astrophysics Data System (ADS)
Loizu, Javier; Massari, Christian; Álvarez-Mozos, Jesús; Casalí, Javier; Goñi, Mikel
2016-04-01
Assimilation of surface soil moisture (SSM) observations obtained from remote sensing has been shown to improve streamflow prediction at different time scales of hydrological modeling. Different sensors and methods have been tested for SSM estimation, especially in the microwave region of the electromagnetic spectrum. The available observation devices include passive microwave sensors, such as the Advanced Microwave Scanning Radiometer - Earth Observing System (AMSR-E) onboard the Aqua satellite and the Soil Moisture and Ocean Salinity (SMOS) mission, as well as active microwave systems, such as the Scatterometers (SCAT) onboard the European Remote Sensing satellites (ERS-1/2) and the Advanced Scatterometer (ASCAT) onboard the MetOp-A satellite. Data assimilation (DA) encompasses different techniques that have been applied in hydrology and other fields for decades, including, among others, Kalman filtering (KF), variational assimilation and particle filtering. From the initial KF method, different techniques were developed to suit different systems. The Ensemble Kalman Filter (EnKF), extensively applied to improve hydrological modeling, has as its main advantage the ability to handle nonlinear model dynamics without linearizing the model equations. The objective of this study was to investigate whether data assimilation of ASCAT SSM observations through the EnKF could improve streamflow simulation of Mediterranean catchments with the complex hydrological model TOPLATS. The DA technique was programmed in FORTRAN and applied to hourly simulations of the TOPLATS catchment model. TOPLATS (TOPMODEL-based Land-Atmosphere Transfer Scheme) was applied in its lumped version to two Mediterranean catchments of similar size, located in northern Spain (Arga, 741 km2) and central Italy (Nestore, 720 km2). The model performs separate computations of the energy and water balances.
In those balances, the soil is divided into two layers: the upper surface zone (SZ) and the deeper transmission zone (TZ). In this study, the SZ depth was fixed to 5 cm for adequate assimilation of the observed data. The available data were distributed as follows: first, the model was calibrated for the 2001-2007 period; then the 2007-2010 period was used for satellite data rescaling purposes; finally, data assimilation was applied during the validation period (2010-2013). Application of the EnKF required the following steps: 1) rescaling of satellite data; 2) transformation of the rescaled data into the Soil Water Index (SWI) through a moving average filter, with a calibrated value of T = 9; 3) generation of a 50-member ensemble through perturbation of inputs (rainfall and temperature) and three selected parameters; 4) validation of the ensemble through compliance with two criteria based on the ensemble's spread, mean square error and skill; and 5) Kalman gain calculation. In this work, three satellite data rescaling techniques were also compared: 1) cumulative distribution function (CDF) matching, 2) variance matching and 3) linear least-squares regression. Results obtained in this study showed slight improvements of the hourly Nash-Sutcliffe Efficiency (NSE) in both catchments with the different rescaling methods evaluated. Larger improvements were found in terms of the reduction of the seasonal simulated volume error.
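Step 2 above, converting the rescaled surface observations into a Soil Water Index, is commonly implemented with a recursive exponential filter; the sketch below assumes that standard recursive formulation (times in days, with the calibrated characteristic time T = 9) rather than the study's exact code.

```python
import math

def soil_water_index(ssm, times, T=9.0):
    """Recursive exponential filter: propagate surface soil moisture (SSM)
    observations into a root-zone proxy (SWI) with characteristic time T.

    ssm   : list of (rescaled) surface soil moisture observations
    times : observation times in days, strictly increasing
    """
    swi = ssm[0]
    gain = 1.0
    out = [swi]
    for t_prev, t, obs in zip(times, times[1:], ssm[1:]):
        gain = gain / (gain + math.exp(-(t - t_prev) / T))
        swi = swi + gain * (obs - swi)
        out.append(swi)
    return out
```

The filter damps high-frequency surface fluctuations, so the SWI responds smoothly to wetting and drying events, which is the behavior expected of the deeper transmission zone.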
Locally Weighted Ensemble Clustering.
Huang, Dong; Wang, Chang-Dong; Lai, Jian-Huang
2018-05-01
Due to its ability to combine multiple base clusterings into a probably better and more robust clustering, the ensemble clustering technique has been attracting increasing attention in recent years. Despite this significant success, one limitation of most existing ensemble clustering methods is that they generally treat all base clusterings equally regardless of their reliability, which makes them vulnerable to low-quality base clusterings. Although some efforts have been made to (globally) evaluate and weight the base clusterings, these methods tend to view each base clustering as an indivisible whole and neglect the local diversity of clusters inside the same base clustering. It remains an open problem how to evaluate the reliability of clusters and exploit the local diversity in the ensemble to enhance the consensus performance, especially when there is no access to data features or specific assumptions on the data distribution. To address this, we propose a novel ensemble clustering approach based on ensemble-driven cluster uncertainty estimation and a local weighting strategy. In particular, the uncertainty of each cluster is estimated by considering the cluster labels in the entire ensemble via an entropic criterion. A novel ensemble-driven cluster validity measure is introduced, and a locally weighted co-association matrix is presented to serve as a summary of the ensemble of diverse clusters. With the local diversity in ensembles exploited, two novel consensus functions are further proposed. Extensive experiments on a variety of real-world datasets demonstrate the superiority of the proposed approach over the state-of-the-art.
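The entropy-based uncertainty estimation and local weighting described above can be sketched compactly. This is a simplified reading: the weight w = exp(-H / theta) is an illustrative stand-in for the paper's ensemble-driven cluster index, and all names are ours.

```python
import math
from collections import Counter

def cluster_entropy(members, base_clusterings, own_idx):
    """Uncertainty of one cluster: mean label entropy of its members
    measured against the other base clusterings in the ensemble."""
    others = [bc for j, bc in enumerate(base_clusterings) if j != own_idx]
    total = 0.0
    for bc in others:
        counts = Counter(bc[i] for i in members)
        n = len(members)
        total += -sum((c / n) * math.log(c / n) for c in counts.values())
    return total / max(len(others), 1)

def weighted_coassociation(base_clusterings, theta=0.5):
    """Locally weighted co-association matrix: each cluster contributes
    to the co-association of its member pairs in proportion to a
    reliability weight exp(-entropy / theta)."""
    n = len(base_clusterings[0])
    m = len(base_clusterings)
    A = [[0.0] * n for _ in range(n)]
    for idx, bc in enumerate(base_clusterings):
        clusters = {}
        for i, label in enumerate(bc):
            clusters.setdefault(label, []).append(i)
        for members in clusters.values():
            w = math.exp(-cluster_entropy(members, base_clusterings, idx) / theta)
            for i in members:
                for j in members:
                    A[i][j] += w / m
    return A
```

A consensus clustering can then be obtained by running any standard agglomerative method on the weighted matrix, so clusters that the rest of the ensemble contradicts contribute less to the final partition.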
Simulation's Ensemble is Better Than Ensemble Simulation
NASA Astrophysics Data System (ADS)
Yan, X.
2017-12-01
Yan Xiaodong, State Key Laboratory of Earth Surface Processes and Resource Ecology (ESPRE), Beijing Normal University, 19 Xinjiekouwai Street, Haidian District, Beijing 100875, China. Email: yxd@bnu.edu.cn. A dynamical system is simulated from an initial state. However, initial-state data carry great uncertainty, which leads to uncertainty in the simulation. Therefore, simulation from multiple possible initial states has been widely used in atmospheric science and has indeed been proved able to lower the uncertainty; this is named a simulation's ensemble because multiple simulation results are fused. In the ecological field, individual-based model simulation (forest gap models, for example) can be regarded as a simulation's ensemble compared with community-based simulation (most ecosystem models). In this talk, we will address the advantages of individual-based simulation and even its ensembles.
Residue-level global and local ensemble-ensemble comparisons of protein domains.
Clark, Sarah A; Tronrud, Dale E; Karplus, P Andrew
2015-09-01
Many methods of protein structure generation such as NMR-based solution structure determination and template-based modeling do not produce a single model, but an ensemble of models consistent with the available information. Current strategies for comparing ensembles lose information because they use only a single representative structure. Here, we describe the ENSEMBLATOR and its novel strategy to directly compare two ensembles containing the same atoms to identify significant global and local backbone differences between them on per-atom and per-residue levels, respectively. The ENSEMBLATOR has four components: eePREP (ee for ensemble-ensemble), which selects atoms common to all models; eeCORE, which identifies atoms belonging to a cutoff-distance dependent common core; eeGLOBAL, which globally superimposes all models using the defined core atoms and calculates for each atom the two intraensemble variations, the interensemble variation, and the closest approach of members of the two ensembles; and eeLOCAL, which performs a local overlay of each dipeptide and, using a novel measure of local backbone similarity, reports the same four variations as eeGLOBAL. The combination of eeGLOBAL and eeLOCAL analyses identifies the most significant differences between ensembles. We illustrate the ENSEMBLATOR's capabilities by showing how using it to analyze NMR ensembles and to compare NMR ensembles with crystal structures provides novel insights compared to published studies. One of these studies leads us to suggest that a "consistency check" of NMR-derived ensembles may be a useful analysis step for NMR-based structure determinations in general. The ENSEMBLATOR 1.0 is available as a first generation tool to carry out ensemble-ensemble comparisons. © 2015 The Protein Society.
NASA Astrophysics Data System (ADS)
Elders, Akiko; Pegion, Kathy
2017-12-01
Arctic sea ice plays an important role in the climate system, moderating the exchange of energy and moisture between the ocean and the atmosphere. An emerging area of research investigates how changes, particularly declines, in sea ice extent (SIE) impact climate in regions local to and remote from the Arctic. Therefore, both observations and model estimates of sea ice become important. This study investigates the skill of sea ice predictions from models participating in the North American Multi-Model Ensemble (NMME) project. Three of the models in this project provide sea-ice predictions. The ensemble average of these models is used to determine seasonal climate impacts on surface air temperature (SAT) and sea level pressure (SLP) in remote regions such as the mid-latitudes. It is found that declines in fall SIE are associated with cold temperatures in the mid-latitudes and pressure patterns across the Arctic and mid-latitudes similar to the negative phase of the Arctic Oscillation (AO). These findings are consistent with other studies that have investigated the relationship between declines in SIE and mid-latitude weather and climate. In an attempt to include additional NMME models for sea-ice predictions, a proxy for SIE is used to estimate ice extent in the remaining models, using sea surface temperature (SST). It is found that SST is a reasonable proxy for SIE estimation when compared to model SIE forecasts and observations. The proxy sea-ice estimates also show similar relationships to mid-latitude temperature and pressure as the actual sea-ice predictions.
Using structure to explore the sequence alignment space of remote homologs.
Kuziemko, Andrew; Honig, Barry; Petrey, Donald
2011-10-01
Protein structure modeling by homology requires an accurate sequence alignment between the query protein and its structural template. However, sequence alignment methods based on dynamic programming (DP) are typically unable to generate accurate alignments for remote sequence homologs, thus limiting the applicability of modeling methods. A central problem is that the alignment that is "optimal" in terms of the DP score does not necessarily correspond to the alignment that produces the most accurate structural model. That is, the correct alignment based on structural superposition will generally have a lower score than the optimal alignment obtained from sequence. Variations of the DP algorithm have been developed that generate alternative alignments that are "suboptimal" in terms of the DP score, but these still encounter difficulties in detecting the correct structural alignment. We present here a new alternative sequence alignment method that relies heavily on the structure of the template. By initially aligning the query sequence to individual fragments in secondary structure elements and combining high-scoring fragments that pass basic tests for "modelability", we can generate accurate alignments within a small ensemble. Our results suggest that the set of sequences that can currently be modeled by homology can be greatly extended.
NASA Astrophysics Data System (ADS)
Gelfan, Alexander; Moreydo, Vsevolod; Motovilov, Yury; Solomatine, Dimitri P.
2018-04-01
A long-term forecasting ensemble methodology, applied to water inflows into the Cheboksary Reservoir (Russia), is presented. The methodology is based on a version of the semi-distributed hydrological model ECOMAG (ECOlogical Model for Applied Geophysics) that allows for the calculation of an ensemble of inflow hydrographs using two different sets of weather ensembles for the lead time period: observed weather data, constructed on the basis of the Ensemble Streamflow Prediction methodology (ESP-based forecast), and synthetic weather data, simulated by a multi-site weather generator (WG-based forecast). We have studied the following: (1) whether there is any advantage of the developed ensemble forecasts in comparison with the currently issued operational forecasts of water inflow into the Cheboksary Reservoir, and (2) whether there is any noticeable improvement in probabilistic forecasts when using the WG-simulated ensemble compared to the ESP-based ensemble. We have found that for a 35-year period beginning from the reservoir filling in 1982, both continuous and binary model-based ensemble forecasts (issued in the deterministic form) outperform the operational forecasts of the April-June inflow volume actually used and, additionally, provide acceptable forecasts of additional water regime characteristics besides the inflow volume. We have also demonstrated that the model performance measures (in the verification period) obtained from the WG-based probabilistic forecasts, which are based on a large number of possible weather scenarios, appeared to be more statistically reliable than the corresponding measures calculated from the ESP-based forecasts based on the observed weather scenarios.
White, M.A.; de Beurs, K. M.; Didan, K.; Inouye, D.W.; Richardson, A.D.; Jensen, O.P.; O'Keefe, J.; Zhang, G.; Nemani, R.R.; van, Leeuwen; Brown, Jesslyn F.; de Wit, A.; Schaepman, M.; Lin, X.; Dettinger, M.; Bailey, A.S.; Kimball, J.; Schwartz, M.D.; Baldocchi, D.D.; Lee, J.T.; Lauenroth, W.K.
2009-01-01
Shifts in the timing of spring phenology are a central feature of global change research. Long-term observations of plant phenology have been used to track vegetation responses to climate variability but are often limited to particular species and locations and may not represent synoptic patterns. Satellite remote sensing is instead used for continental to global monitoring. Although numerous methods exist to extract phenological timing, in particular start-of-spring (SOS), from time series of reflectance data, a comprehensive intercomparison and interpretation of SOS methods has not been conducted. Here, we assess 10 SOS methods for North America between 1982 and 2006. The techniques include consistent inputs from the 8 km Global Inventory Modeling and Mapping Studies Advanced Very High Resolution Radiometer NDVIg dataset, independent data for snow cover, soil thaw, lake ice dynamics, spring streamflow timing, over 16 000 individual measurements of ground-based phenology, and two temperature-driven models of spring phenology. Compared with an ensemble of the 10 SOS methods, we found that individual methods differed in average day-of-year estimates by ±60 days and in standard deviation by ±20 days. The ability of the satellite methods to retrieve SOS estimates was highest in northern latitudes and lowest in arid, tropical, and Mediterranean ecoregions. The ordinal rank of SOS methods varied geographically, as did the relationships between SOS estimates and the cryospheric/hydrologic metrics. Compared with ground observations, SOS estimates were most closely related to the first-leaf and first-flower expansion phenological stages. We found no evidence for time trends in spring arrival from ground- or model-based data; using an ensemble estimate from two methods that were more closely related to ground observations than other methods, SOS trends could be detected for only 12% of North America and were divided between trends towards both earlier and later spring.
Preliminary Results of BTDF Calibration of Transmissive Solar Diffusers for Remote Sensing.
Georgiev, Georgi T; Butler, James J; Thome, Kurt; Cooksey, Catherine; Ding, Leibo
2016-01-01
Satellite instruments operating in the reflected solar wavelength region require accurate and precise determination of the optical properties of their diffusers used in pre-flight and post-flight calibrations. The majority of recent and current space instruments use reflective diffusers. As a result, numerous Bidirectional Reflectance Distribution Function (BRDF) calibration comparisons have been conducted between the National Institute of Standards and Technology (NIST) and other industry and university-based metrology laboratories. However, based on literature searches and communications with NIST and other laboratories, no Bidirectional Transmittance Distribution Function (BTDF) measurement comparisons have been conducted between National Measurement Laboratories (NMLs) and other metrology laboratories. On the other hand, there is a growing interest in the use of transmissive diffusers in the calibration of satellite, airborne, and ground-based remote sensing instruments. Current remote sensing instruments employing transmissive diffusers include the Ozone Mapping and Profiler Suite (OMPS) Limb instrument on the Suomi-National Polar-orbiting Partnership (S-NPP) platform, the Geostationary Ocean Color Imager (GOCI) on the Korea Aerospace Research Institute's (KARI) Communication, Ocean, and Meteorological Satellite (COMS), the Ozone Monitoring Instrument (OMI) on NASA's Earth Observing System (EOS) Aura platform, the Tropospheric Emissions: Monitoring of Pollution (TEMPO) instrument and the Geostationary Environmental Monitoring Spectrometer (GEMS). This ensemble of instruments requires validated BTDF measurements of their on-board transmissive diffusers from the ultraviolet through the near infrared. This paper presents the preliminary results of a BTDF comparison between the NASA Diffuser Calibration Laboratory (DCL) and NIST on quartz and thin Spectralon samples.
NASA Astrophysics Data System (ADS)
Christensen, C.; Liu, S.; Scorzelli, G.; Lee, J. W.; Bremer, P. T.; Summa, B.; Pascucci, V.
2017-12-01
The creation, distribution, analysis, and visualization of large spatiotemporal datasets is a growing challenge for the study of climate and weather phenomena in which increasingly massive domains are utilized to resolve finer features, resulting in datasets that are simply too large to be effectively shared. Existing workflows typically consist of pipelines of independent processes that preclude many possible optimizations. As data sizes increase, these pipelines are difficult or impossible to execute interactively and instead simply run as large offline batch processes. Rather than limiting our conceptualization of such systems to pipelines (or dataflows), we propose a new model for interactive data analysis and visualization systems in which we comprehensively consider the processes involved from data inception through analysis and visualization in order to describe systems composed of these processes in a manner that facilitates interactive implementations of the entire system rather than of only a particular component. We demonstrate the application of this new model with the implementation of an interactive system that supports progressive execution of arbitrary user scripts for the analysis and visualization of massive, disparately located climate data ensembles. It is currently in operation as part of the Earth System Grid Federation server running at Lawrence Livermore National Lab, and accessible through both web-based and desktop clients. Our system facilitates interactive analysis and visualization of massive remote datasets up to petabytes in size, such as the 3.5 PB 7km NASA GEOS-5 Nature Run simulation, previously only possible offline or at reduced resolution. To support the community, we have enabled general distribution of our application using public frameworks including Docker and Anaconda.
Remote sensing of aerosols in the Arctic for an evaluation of global climate model simulations
Glantz, Paul; Bourassa, Adam; Herber, Andreas; Iversen, Trond; Karlsson, Johannes; Kirkevåg, Alf; Maturilli, Marion; Seland, Øyvind; Stebel, Kerstin; Struthers, Hamish; Tesche, Matthias; Thomason, Larry
2014-01-01
In this study, Moderate Resolution Imaging Spectroradiometer (MODIS) Aqua retrievals of aerosol optical thickness (AOT) at 555 nm are compared to Sun photometer measurements from Svalbard for a period of 9 years. For the 642 daily coincident measurements that were obtained, MODIS AOT generally varies within the predicted uncertainty of the retrieval over ocean (ΔAOT = ±0.03 ± 0.05 · AOT). The results from the remote sensing have been used to examine the accuracy of estimates of aerosol optical properties in the Arctic, generated by global climate models and from in situ measurements at the Zeppelin station, Svalbard. AOT simulated with the Norwegian Earth System Model/Community Atmosphere Model version 4 Oslo global climate model does not reproduce the observed seasonal variability of the Arctic aerosol. The model overestimates clear-sky AOT by nearly a factor of 2 for the background summer season, while tending to underestimate the values in the spring season. Furthermore, large differences in all-sky AOT of up to 1 order of magnitude are found for the Coupled Model Intercomparison Project phase 5 model ensemble for the spring and summer seasons. Large differences between satellite/ground-based remote sensing of AOT and AOT estimated from dry and humidified scattering coefficients are found for the subarctic marine boundary layer in summer. Key point: remote sensing of AOT is very useful in the validation of climate models. PMID:25821664
Genetic programming based ensemble system for microarray data classification.
Liu, Kun-Hong; Tong, Muchenxuan; Xie, Shu-Tong; Yee Ng, Vincent To
2015-01-01
Recently, more and more machine learning techniques have been applied to microarray data analysis. The aim of this study is to propose a genetic programming (GP) based new ensemble system (named GPES), which can be used to effectively classify different types of cancers. Decision trees are deployed as base classifiers in this ensemble framework with three operators: Min, Max, and Average. Each individual of the GP is an ensemble system, and they become more and more accurate in the evolutionary process. The feature selection technique and balanced subsampling technique are applied to increase the diversity in each ensemble system. The final ensemble committee is selected by a forward search algorithm, which is shown to be capable of fitting data automatically. The performance of GPES is evaluated using five binary class and six multiclass microarray datasets, and results show that the algorithm can achieve better results in most cases compared with some other ensemble systems. By using elaborate base classifiers or applying other sampling techniques, the performance of GPES may be further improved.
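The Min, Max, and Average combination operators at the core of this framework can be sketched briefly. The toy below is an illustrative stand-in only: it uses one-feature threshold "stumps" on synthetic data in place of the paper's decision trees, and omits the GP evolution and forward-search committee selection entirely:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for microarray data: two informative features out of six.
X = rng.normal(size=(200, 6))
y = (X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

def fit_stump(X, y, feat):
    """One-feature threshold classifier that outputs a class-1 probability."""
    thr = X[:, feat].mean()
    hi, lo = X[:, feat] > thr, X[:, feat] <= thr
    p_hi = y[hi].mean() if hi.any() else 0.5
    p_lo = y[lo].mean() if lo.any() else 0.5
    return lambda Z: np.where(Z[:, feat] > thr, p_hi, p_lo)

# One stump per feature; diversity comes from the different features,
# mirroring the feature-selection step in the ensemble framework.
members = [fit_stump(X, y, f) for f in range(X.shape[1])]
P = np.stack([m(X) for m in members])           # (n_members, n_samples)

# The three combination operators applied to the member probabilities.
combined = {"min": P.min(0), "max": P.max(0), "avg": P.mean(0)}
pred = (combined["avg"] > 0.5).astype(int)      # Average operator, then threshold
```

In the actual system, GP composes such operators into a tree whose fitness is classification accuracy, so the combination rule itself is evolved rather than fixed as here.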
Minimalist ensemble algorithms for genome-wide protein localization prediction.
Lin, Jhih-Rong; Mondal, Ananda Mohan; Liu, Rong; Hu, Jianjun
2012-07-03
Computational prediction of protein subcellular localization can greatly help to elucidate its functions. Despite the existence of dozens of protein localization prediction algorithms, prediction accuracy and coverage are still low. Several ensemble algorithms have been proposed to improve the prediction performance, which usually include as many as 10 or more individual localization algorithms. However, their performance is still limited by the running complexity and redundancy among individual prediction algorithms. This paper proposes a novel method for the rational design of minimalist ensemble algorithms for practical genome-wide protein subcellular localization prediction. The algorithm is based on combining a feature-selection-based filter and a logistic regression classifier. Using a novel concept of contribution scores, we analyzed issues of algorithm redundancy, consensus mistakes, and algorithm complementarity in designing ensemble algorithms. We applied the proposed minimalist logistic regression (LR) ensemble algorithm to two genome-wide datasets of Yeast and Human and compared its performance with current ensemble algorithms. Experimental results showed that the minimalist ensemble algorithm can achieve high prediction accuracy with only 1/3 to 1/2 of the individual predictors of current ensemble algorithms, which greatly reduces computational complexity and running time. It was found that high-performance ensemble algorithms are usually composed of predictors that together cover most of the available features. Compared to the best individual predictor, our ensemble algorithm improved the prediction accuracy from an AUC score of 0.558 to 0.707 for the Yeast dataset and from 0.628 to 0.646 for the Human dataset. Compared with popular weighted-voting-based ensemble algorithms, our classifier-based ensemble algorithms achieved much better performance without suffering from the inclusion of too many individual predictors.
We proposed a method for the rational design of minimalist ensemble algorithms using feature selection and classifiers. The proposed minimalist ensemble algorithm based on logistic regression can achieve equal or better prediction performance while using only one-third to one-half of the individual predictors required by other ensemble algorithms. The results also suggested that meta-predictors that take advantage of a variety of features by combining individual predictors tend to achieve the best performance. The LR ensemble server and related benchmark datasets are available at http://mleg.cse.sc.edu/LRensemble/cgi-bin/predict.cgi.
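A minimal sketch of the classifier-based ensemble idea follows. It assumes entirely synthetic predictor outputs with invented accuracies and a plain gradient-descent logistic regression; the paper's contribution-score filter is crudely approximated here by keeping the largest-weight predictors and refitting:

```python
import numpy as np

# Hypothetical setup: each column of P is the 0/1 output of one individual
# localization predictor; y is the true binary label. Accuracies are invented.
rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=500)
acc = [0.85, 0.80, 0.75, 0.55, 0.52]
P = np.column_stack([np.where(rng.random(500) < a, y, 1 - y) for a in acc])

def fit_logreg(X, y, lr=0.1, steps=2000):
    """Plain gradient-descent logistic regression with a bias term."""
    Xb = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def predict(w, X):
    Xb = np.column_stack([np.ones(len(X)), X])
    return (Xb @ w > 0).astype(int)

# Crude stand-in for the contribution-score filter: fit on all predictors,
# then keep only the three with the largest absolute weights and refit.
w_full = fit_logreg(P, y)
keep = np.argsort(-np.abs(w_full[1:]))[:3]      # minimalist subset
w_min = fit_logreg(P[:, keep], y)
```

The learned weights act like soft votes, so weak or redundant predictors are pruned rather than allowed to dilute the consensus, which is the intuition behind the minimalist design.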
Ensemble-based docking: From hit discovery to metabolism and toxicity predictions.
Evangelista, Wilfredo; Weir, Rebecca L; Ellingson, Sally R; Harris, Jason B; Kapoor, Karan; Smith, Jeremy C; Baudry, Jerome
2016-10-15
This paper describes and illustrates the use of ensemble-based docking, i.e., using a collection of protein structures in docking calculations for hit discovery, the exploration of biochemical pathways and toxicity prediction of drug candidates. We describe the computational engineering work necessary to enable large ensemble docking campaigns on supercomputers. We show examples where ensemble-based docking has significantly increased the number and the diversity of validated drug candidates. Finally, we illustrate how ensemble-based docking can be extended beyond hit discovery and toward providing a structural basis for the prediction of metabolism and off-target binding relevant to pre-clinical and clinical trials. Copyright © 2016 Elsevier Ltd. All rights reserved.
Bird Migration Under Climate Change - A Mechanistic Approach Using Remote Sensing
NASA Technical Reports Server (NTRS)
Smith, James A.; Blattner, Tim; Messmer, Peter
2010-01-01
The broad-scale reductions and shifts that may be expected under climate change in the availability and quality of stopover habitat for long-distance migrants are an area of increasing concern for conservation biologists. Researchers generally have taken two broad approaches to the modeling of migration behaviour to understand the impact of these changes on migratory bird populations. These include models based on causal processes and their response to environmental stimuli, "mechanistic models", or models that primarily are based on observed animal distribution patterns and the correlation of these patterns with environmental variables, i.e. "data-driven" models. Investigators have applied the latter technique to forecast changes in migration patterns with changes in the environment, for example, as might be expected under climate change, by forecasting how the underlying environmental data layers upon which the relationships are built will change over time. The learned geostatistical correlations are then applied to the modified data layers. However, this is problematic. Even if the projections of how the underlying data layers will change are correct, it is not evident that the statistical relationships will remain the same, i.e. that the organism will not adapt its behaviour to the changing conditions. Mechanistic models that explicitly take into account the physical, biological, and behavioural responses of an organism, as well as the underlying changes in the landscape, offer an alternative to address these shortcomings. The availability of satellite remote sensing observations at multiple spatial and temporal scales, coupled with advances in climate modeling and information technologies, enables the application of mechanistic models to predict how continental bird migration patterns may change in response to environmental change. 
In earlier work, we simulated the impact of wetland loss and inter-annual variability on the fitness of migratory shorebirds in the central flyways of North America. We demonstrated the phenotypic plasticity of a migratory population of Pectoral sandpipers, consisting of an ensemble of 10,000 individual birds, in response to changes in stopover locations using an individual-based migration model driven by remotely sensed land surface data, climate data and biological field data. With the advent of new computing capabilities enabled by recent general-purpose GPU computing paradigms and commodity hardware, it now is possible to simulate both larger ensemble populations and to incorporate more realistic mechanistic factors into migration models. Here, we take our first steps in using these tools to study the impact of long-term drought variability on shorebird survival.
NASA Astrophysics Data System (ADS)
Wu, Guocan; Zheng, Xiaogu; Dan, Bo
2016-04-01
Shallow soil moisture observations are assimilated into the Common Land Model (CoLM) to estimate the soil moisture in different layers. The forecast error is inflated to improve the accuracy of the analysis state, and a water balance constraint is adopted to reduce the water budget residual in the assimilation procedure. The experimental results illustrate that the adaptive forecast error inflation can reduce the analysis error, while the proper inflation layer can be selected based on the -2log-likelihood function of the innovation statistic. The water balance constraint substantially reduces the water budget residual, at a small cost in assimilation accuracy. The assimilation scheme can potentially be applied to assimilate remote sensing data.
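The adaptive inflation idea can be illustrated with a scalar sketch: the inflation factor is chosen by minimizing the -2 log-likelihood of the innovation, as the abstract describes. The single-state setup, all numbers, and the grid search are illustrative assumptions, not the CoLM configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Forecast ensemble of a single soil-moisture state plus one observation
# (all numbers invented; CoLM works with many layers and states).
ens = rng.normal(0.25, 0.02, size=50)
y, r = 0.30, 0.01 ** 2          # observation and its error variance

def neg2_log_likelihood(lam, ens, y, r):
    """-2 log-likelihood of the innovation for inflation factor lam."""
    s = lam * np.var(ens, ddof=1) + r       # innovation variance after inflation
    d = y - ens.mean()                      # innovation
    return np.log(2.0 * np.pi * s) + d ** 2 / s

# Adaptive inflation: pick the factor minimizing the -2 log-likelihood on a grid.
lams = np.linspace(0.5, 10.0, 200)
lam_best = lams[int(np.argmin([neg2_log_likelihood(l, ens, y, r) for l in lams]))]

# Inflate the ensemble spread about its mean before the analysis step.
ens_inflated = ens.mean() + np.sqrt(lam_best) * (ens - ens.mean())
```

In a multi-layer setting, the same likelihood criterion would be evaluated per candidate inflation layer, which is how the proper layer can be selected.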
The suitability of remotely sensed soil moisture for improving operational flood forecasting
NASA Astrophysics Data System (ADS)
Wanders, N.; Karssenberg, D.; de Roo, A.; de Jong, S. M.; Bierkens, M. F. P.
2013-11-01
We evaluate the added value of assimilated remotely sensed soil moisture for the European Flood Awareness System (EFAS) and its potential to improve the prediction of the timing and height of the flood peak and low flows. EFAS is an operational flood forecasting system for Europe and uses a distributed hydrological model for flood predictions with lead times up to 10 days. For this study, satellite-derived soil moisture from ASCAT, AMSR-E and SMOS is assimilated into the EFAS system for the Upper Danube basin and results are compared to assimilation of discharge observations only. To assimilate soil moisture and discharge data into EFAS, an Ensemble Kalman Filter (EnKF) is used. Information on the spatial (cross-)correlation of the errors in the satellite products is included to ensure optimal performance of the EnKF. For the validation, additional discharge observations not used in the EnKF are used as an independent validation dataset. Our results show that the accuracy of flood forecasts is increased when more discharge observations are assimilated; the Mean Absolute Error (MAE) of the ensemble mean is reduced by 65%. The additional inclusion of satellite data results in a further increase in performance: forecasts of base flows are better and the uncertainty in the overall discharge is reduced, shown by a 10% reduction in the MAE. In addition, floods are predicted with a higher accuracy and the Continuous Ranked Probability Score (CRPS) shows a performance increase of 5-10% on average, compared to assimilation of discharge only. When soil moisture data are used, the timing errors in the flood predictions are decreased, especially for shorter lead times, and imminent floods can be forecasted with more skill. The number of false flood alerts is reduced when more data are assimilated into the system and the best performance is achieved with the assimilation of both discharge and satellite observations.
The additional gain is highest when discharge observations from both upstream and downstream areas are used in combination with the soil moisture data. These results show the potential of remotely sensed soil moisture observations to improve near-real time flood forecasting in large catchments.
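For readers unfamiliar with the EnKF used in this and several of the studies above, a minimal stochastic EnKF update (perturbed observations) can be sketched in a few lines of NumPy. The two-variable state, covariances, and observation values are invented for illustration and are not the EFAS setup:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy joint state [soil moisture, discharge] for 100 members (numbers invented).
n_ens = 100
mean = np.array([0.30, 50.0])
cov = np.array([[0.02 ** 2, 0.15],
                [0.15, 8.0 ** 2]])
X = rng.multivariate_normal(mean, cov, size=n_ens).T          # shape (2, n_ens)

H = np.array([[1.0, 0.0]])      # we observe soil moisture only
R = np.array([[0.01 ** 2]])     # observation error covariance
y = np.array([0.35])            # a satellite soil-moisture retrieval

# Stochastic EnKF update with perturbed observations.
Y = y[:, None] + rng.multivariate_normal(np.zeros(1), R, size=n_ens).T
A = X - X.mean(axis=1, keepdims=True)                         # ensemble anomalies
P = A @ A.T / (n_ens - 1)                                     # sample covariance
K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)                  # Kalman gain
Xa = X + K @ (Y - H @ X)                                      # analysis ensemble
```

Because the sample covariance links soil moisture and discharge, assimilating the soil-moisture retrieval also shifts the discharge members, which is the mechanism by which soil-moisture assimilation improves streamflow forecasts.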
NASA Astrophysics Data System (ADS)
Farhadi, L.; Bateni, S. M.; Auligne, T.; Navari, M.
2017-12-01
Snow emissivity is a key parameter for the estimation of snow surface temperature, which is needed as an initial value in climate models and determination of the outgoing long-wave radiation. Moreover, snow emissivity is required for retrieval of atmospheric parameters (e.g., temperature and humidity profiles) from satellite measurements and for satellite data assimilation in numerical weather prediction systems. Microwave emission models and remote sensing data cannot accurately estimate snow emissivity due to limitations attributed to each of them. Existing microwave emission models introduce significant uncertainties in their snow emissivity estimates. This is mainly due to shortcomings of the dense media theory for snow medium at high frequencies, and erroneous forcing variables. The well-known limitations of passive microwave data such as coarse spatial resolution, saturation in deep snowpack, and signal loss in wet snow are the major drawbacks of passive microwave retrieval algorithms for estimation of snow emissivity. A full exploitation of the information contained in the remote sensing data can be achieved by merging them with snow emission models within a data assimilation framework. Such an optimal merging can overcome the specific limitations of models and remote sensing data. An Ensemble Batch Smoother (EnBS) data assimilation framework was developed in this study to combine the synthetically generated passive microwave brightness temperatures at 1.4-, 18.7-, 36.5-, and 89-GHz frequencies with the MEMLS microwave emission model to reduce the uncertainty of the snow emissivity estimates. We have used the EnBS algorithm in the context of an observing system simulation experiment (or synthetic experiment) at the local scale observation site (LSOS) of the NASA CLPX field campaign. Our findings showed that the developed methodology significantly improves the estimates of the snow emissivity. 
The simultaneous assimilation of passive microwave brightness temperatures at all frequencies (i.e., 1.4-, 18.7-, 36.5-, and 89-GHz) reduces the root-mean-square error (RMSE) of snow emissivity at 1.4-, 18.7-, 36.5-, and 89-GHz (H-pol.) by 80%, 42%, 52%, and 40%, respectively, compared to the corresponding snow emissivity estimates from the open-loop model.
A remote sensing data assimilation system for cold land processes hydrologic estimation
NASA Astrophysics Data System (ADS)
Andreadis, Konstantinos M.
2009-12-01
Accurate forecasting of snow properties is important for effective water resources management, especially in mountainous areas. Model-based approaches are limited by biases and uncertainties. Remote sensing offers an opportunity for observation of snow properties over larger areas. Traditional approaches to direct estimation of snow properties from passive microwave remote sensing have been plagued by limitations such as the tendency of estimates to saturate for moderately deep snowpacks and the effects of mixed land cover. To address these complications, a data assimilation system is developed and evaluated in three parts. The data assimilation system requires the embedding of a microwave emissions model which uses modeled snowpack properties. In the first part of this study, such a model is evaluated using multi-scale TB measurements from the Cold Land Processes Experiment. The model's ability to reproduce snowpack microphysical properties is evaluated through comparison with snowpit measurements, while TB predictions are evaluated through comparison with in-situ, aircraft and satellite measurements. Point comparisons showed limitations in the model, while the spatial averaging and the effects of forest cover suppressed errors in comparisons with aircraft measurements. The layered character of snowpacks increases the complexity of algorithms intended to retrieve snow properties from the snowpack microwave return signal. Implementation of a retrieval strategy requires knowledge of stratigraphy, which practically can only be produced by models. In the second part of this study, we describe a multi-layer model designed for such applications. The model coupled with a radiative transfer scheme improved the estimation of TB, while potential impacts when assimilating radiances are explored. 
A system that merges SWE model predictions and observations of SCE and TB is evaluated in the third part of this study over one winter season in the Upper Snake River basin. Two data assimilation techniques, the Ensemble Kalman filter and the Ensemble Multiscale Kalman filter, are tested with the multilayer snow model forced by downscaled re-analysis meteorological observations. Both the EnKF and EnMKF showed modest improvements when compared with the open-loop simulation, relative to a baseline simulation which used in-situ meteorological data, while comparisons with in-situ SWE measurements showed an overall improvement.
MSEBAG: a dynamic classifier ensemble generation based on `minimum-sufficient ensemble' and bagging
NASA Astrophysics Data System (ADS)
Chen, Lei; Kamel, Mohamed S.
2016-01-01
In this paper, we propose a dynamic classifier system, MSEBAG, which is characterised by searching for the 'minimum-sufficient ensemble' and bagging at the ensemble level. It adopts an 'over-generation and selection' strategy and aims to achieve a good bias-variance trade-off. In the training phase, MSEBAG first searches for the 'minimum-sufficient ensemble', which maximises the in-sample fitness with the minimal number of base classifiers. Then, starting from the 'minimum-sufficient ensemble', a backward stepwise algorithm is employed to generate a collection of ensembles. The objective is to create a collection of ensembles with a descending fitness on the data, as well as a descending complexity in the structure. MSEBAG dynamically selects the ensembles from the collection for the decision aggregation. The extended adaptive aggregation (EAA) approach, a bagging-style algorithm performed at the ensemble level, is employed for this task. EAA searches for the competent ensembles using a score function, which takes into consideration both the in-sample fitness and the confidence of the statistical inference, and averages the decisions of the selected ensembles to label the test pattern. The experimental results show that the proposed MSEBAG outperforms the benchmarks on average.
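The backward stepwise generation of an ensemble collection can be sketched as follows. This toy version uses majority voting and in-sample accuracy as the fitness; the synthetic base-classifier predictions and the simplified fitness function are stand-ins for MSEBAG's actual components:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic training set: true labels and 0/1 predictions of 9 base classifiers,
# each correct on roughly 70% of the 200 patterns (toy numbers).
y = rng.integers(0, 2, size=200)
preds = np.where(rng.random((9, 200)) < 0.7, y, 1 - y)

def vote_accuracy(members, preds, y):
    """In-sample fitness: accuracy of the majority vote over the given members."""
    vote = (preds[list(members)].mean(axis=0) > 0.5).astype(int)
    return float((vote == y).mean())

# Backward stepwise search: repeatedly drop the member whose removal hurts the
# fitness least, producing a collection of ensembles of descending complexity.
members = list(range(preds.shape[0]))
collection = [(tuple(members), vote_accuracy(members, preds, y))]
while len(members) > 1:
    scores = [vote_accuracy([m for m in members if m != d], preds, y) for d in members]
    members.pop(int(np.argmax(scores)))
    collection.append((tuple(members), vote_accuracy(members, preds, y)))
```

In the paper's full method, the search starts from the 'minimum-sufficient ensemble' rather than the full pool, and the EAA step then selects dynamically from this collection at prediction time.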
NASA Astrophysics Data System (ADS)
Aalstad, Kristoffer; Westermann, Sebastian; Vikhamar Schuler, Thomas; Boike, Julia; Bertino, Laurent
2018-01-01
With its high albedo, low thermal conductivity and large water storing capacity, snow strongly modulates the surface energy and water balance, which makes it a critical factor in mid- to high-latitude and mountain environments. However, estimating the snow water equivalent (SWE) is challenging in remote-sensing applications, even at medium spatial resolutions of 1 km. We present an ensemble-based data assimilation framework that estimates the peak subgrid SWE distribution (SSD) at the 1 km scale by assimilating fractional snow-covered area (fSCA) satellite retrievals in a simple snow model forced by downscaled reanalysis data. The basic idea is to relate the timing of the snow cover depletion (accessible from satellite products) to the peak SSD. Peak subgrid SWE is assumed to be lognormally distributed, which can be translated to a modeled time series of fSCA through the snow model. Assimilation of satellite-derived fSCA facilitates the estimation of the peak SSD, while taking into account uncertainties in both the model and the assimilated data sets. As an extension to previous studies, our method makes use of the novel (to snow data assimilation) ensemble smoother with multiple data assimilation (ES-MDA) scheme combined with analytical Gaussian anamorphosis to assimilate time series of Moderate Resolution Imaging Spectroradiometer (MODIS) and Sentinel-2 fSCA retrievals. The scheme is applied to Arctic sites near Ny-Ålesund (79° N, Svalbard, Norway) where field measurements of fSCA and SWE distributions are available. The method is able to successfully recover accurate estimates of peak SSD on most of the occasions considered. Through the ES-MDA assimilation, the root-mean-square error (RMSE) for the fSCA, peak mean SWE and peak subgrid coefficient of variation is improved by around 75, 60 and 20 %, respectively, when compared to the prior, yielding RMSEs of 0.01, 0.09 m water equivalent (w.e.) and 0.13, respectively. 
The ES-MDA either outperforms or at least nearly matches the performance of other ensemble-based batch smoother schemes with regards to various evaluation metrics. Given the modularity of the method, it could prove valuable for a range of satellite-era hydrometeorological reanalyses.
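The ES-MDA scheme assimilates the same observations several times with inflated observation error variances (coefficients alpha whose inverses sum to one). A scalar toy version, with an invented observation operator standing in for the snow model and fSCA retrieval, might look like:

```python
import numpy as np

rng = np.random.default_rng(3)

# Prior ensemble of peak mean SWE (m w.e.), lognormal as in the subgrid assumption.
n_ens, n_iter = 200, 4
swe_prior = rng.lognormal(mean=np.log(0.5), sigma=0.4, size=n_ens)

def forward(swe):
    """Invented observation operator: maps SWE to an fSCA-like depletion signal."""
    return 1.0 - np.exp(-3.0 * swe)

y_obs, r = 0.9, 0.02 ** 2                  # synthetic retrieval and its error variance
alphas = np.full(n_iter, float(n_iter))    # MDA inflation factors; sum(1/alpha) = 1

swe = swe_prior.copy()
for alpha in alphas:
    d = forward(swe)                                   # predicted observations
    y_pert = y_obs + rng.normal(0.0, np.sqrt(alpha * r), size=n_ens)
    c_md = np.cov(swe, d)[0, 1]                        # state-prediction covariance
    c_dd = np.var(d, ddof=1)                           # prediction variance
    gain = c_md / (c_dd + alpha * r)                   # ensemble Kalman gain
    swe = swe + gain * (y_pert - d)                    # analysis update
```

The paper additionally applies Gaussian anamorphosis so that the bounded, non-Gaussian fSCA and lognormal SWE variables better satisfy the Gaussian update assumptions; that transform is omitted in this sketch.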
Project fires. Volume 2: Protective ensemble performance standards, phase 1B
NASA Astrophysics Data System (ADS)
Abeles, F. J.
1980-05-01
The design of the prototype protective ensemble was finalized. Prototype ensembles were fabricated and then subjected to a series of qualification tests which were based upon the protective ensemble performance standards (PEPS) requirements. Engineering drawings and purchase specifications were prepared for the new protective ensemble.
HLPI-Ensemble: Prediction of human lncRNA-protein interactions based on ensemble strategy.
Hu, Huan; Zhang, Li; Ai, Haixin; Zhang, Hui; Fan, Yetian; Zhao, Qi; Liu, Hongsheng
2018-03-27
LncRNA plays an important role in many biological processes and in disease progression by binding to related proteins. However, the experimental methods for studying lncRNA-protein interactions are time-consuming and expensive. Although there are a few models designed to predict ncRNA-protein interactions, they all have some common drawbacks that limit their predictive performance. In this study, we present a model called HLPI-Ensemble designed specifically for human lncRNA-protein interactions. HLPI-Ensemble adopts an ensemble strategy based on three mainstream machine learning algorithms, Support Vector Machines (SVM), Random Forests (RF) and Extreme Gradient Boosting (XGB), to generate HLPI-SVM Ensemble, HLPI-RF Ensemble and HLPI-XGB Ensemble, respectively. The results of 10-fold cross-validation show that HLPI-SVM Ensemble, HLPI-RF Ensemble and HLPI-XGB Ensemble achieved AUCs of 0.95, 0.96 and 0.96, respectively, on the test dataset. Furthermore, we compared the performance of the HLPI-Ensemble models with previous models on an external validation dataset. The results show that the false positives (FPs) of the HLPI-Ensemble models are much lower than those of the previous models, and the other evaluation indicators of the HLPI-Ensemble models are also higher than those of the previous models. This further shows that the HLPI-Ensemble models are superior to previous models in predicting human lncRNA-protein interactions. HLPI-Ensemble is publicly available at: http://ccsipb.lnu.edu.cn/hlpiensemble/
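The core ensemble idea, averaging the scores of several base models and evaluating with AUC, can be sketched with synthetic data. The three base models below are random stand-ins for the SVM/RF/XGB learners (not the paper's trained models), and the AUC is computed via the rank-sum statistic:

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy stand-in for interaction prediction: binary labels and continuous scores
# from three base models of decreasing quality (hypothetical noise levels).
y = rng.integers(0, 2, size=500)

def noisy_scores(y, noise):
    return y + rng.normal(0.0, noise, size=y.size)

scores = np.vstack([noisy_scores(y, s) for s in (0.45, 0.50, 0.55)])
ens_score = scores.mean(axis=0)        # soft-voting ensemble of the three models

def auc(y, s):
    """AUC via the rank-sum (Mann-Whitney U) statistic; assumes no ties in s."""
    ranks = np.empty(s.size)
    ranks[np.argsort(s)] = np.arange(1, s.size + 1)
    n_pos = y.sum()
    n_neg = y.size - n_pos
    return (ranks[y == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

base_aucs = [auc(y, s) for s in scores]
ens_auc = auc(y, ens_score)
```

Because the base models' errors are independent here, averaging their scores reduces the noise and lifts the ensemble AUC above each individual model, which is the rationale behind ensemble strategies of this kind.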
Multi-parametric variational data assimilation for hydrological forecasting
NASA Astrophysics Data System (ADS)
Alvarado-Montero, R.; Schwanenberg, D.; Krahe, P.; Helmke, P.; Klein, B.
2017-12-01
Ensemble forecasting is increasingly applied in flow forecasting systems to provide users with a better understanding of forecast uncertainty and consequently to take better-informed decisions. A common practice in probabilistic streamflow forecasting is to force a deterministic hydrological model with an ensemble of numerical weather predictions. This approach aims at representing meteorological uncertainty but neglects the uncertainty of the hydrological model as well as of its initial conditions. Complementary approaches use probabilistic data assimilation techniques to derive a variety of initial states or represent model uncertainty by model pools instead of single deterministic models. This paper introduces a novel approach that extends a variational data assimilation based on Moving Horizon Estimation to enable the assimilation of observations into multi-parametric model pools. It results in a probabilistic estimate of initial model states that takes into account the parametric model uncertainty in the data assimilation. The assimilation technique is applied to the uppermost area of the River Main in Germany. We use different parametric pools, each of them with five parameter sets, to assimilate streamflow data as well as remotely sensed data from the H-SAF project. We assess the impact of the assimilation on the lead-time performance of perfect forecasts (i.e. observed data as forcing variables) as well as of deterministic and probabilistic forecasts from ECMWF. The multi-parametric assimilation shows an improvement of up to 23% in CRPS performance and approximately 20% in Brier Skill Scores with respect to the deterministic approach. It also improves the skill of the forecast in terms of the rank histogram and produces a narrower ensemble spread.
NASA Astrophysics Data System (ADS)
Alvarez-Garreton, C.; Ryu, D.; Western, A. W.; Su, C.-H.; Crow, W. T.; Robertson, D. E.; Leahy, C.
2014-09-01
Assimilation of remotely sensed soil moisture data (SM-DA) to correct soil water stores of rainfall-runoff models has shown skill in improving streamflow prediction. In the case of large and sparsely monitored catchments, SM-DA is a particularly attractive tool. Within this context, we assimilate active and passive satellite soil moisture (SSM) retrievals using an ensemble Kalman filter to improve operational flood prediction within a large semi-arid catchment in Australia (>40 000 km2). We assess the importance of accounting for channel routing and the spatial distribution of forcing data by applying SM-DA to a lumped and a semi-distributed scheme of the probability distributed model (PDM). Our scheme also accounts for model error representation and seasonal biases and errors in the satellite data. Before assimilation, the semi-distributed model provided more accurate streamflow prediction (Nash-Sutcliffe efficiency, NS = 0.77) than the lumped model (NS = 0.67) at the catchment outlet. However, this did not ensure good performance at the "ungauged" inner catchments. After SM-DA, the streamflow ensemble prediction at the outlet was improved in both the lumped and the semi-distributed schemes: the root mean square error of the ensemble was reduced by 27 and 31%, respectively; the NS of the ensemble mean increased by 7 and 38%, respectively; the false alarm ratio was reduced by 15 and 25%, respectively; and the ensemble prediction spread was reduced while its reliability was maintained. Our findings imply that even when rainfall is the main driver of flooding in semi-arid catchments, adequately processed SSM can be used to reduce errors in the model soil moisture, which in turn provides better streamflow ensemble prediction. We demonstrate that SM-DA efficacy is enhanced when the spatial distribution in forcing data and routing processes are accounted for. 
At ungauged locations, SM-DA is effective at improving streamflow ensemble prediction; however, the updated prediction is still poor, since SM-DA does not address systematic errors in the model.
Global system for hydrological monitoring and forecasting in real time at high resolution
NASA Astrophysics Data System (ADS)
Ortiz, Enrique; De Michele, Carlo; Todini, Ezio; Cifres, Enrique
2016-04-01
This project, presented at EGU 2016, was born of solidarity and the need to dignify the most disadvantaged people living in the poorest countries (Africa, South America and Asia), which are continually exposed to changes in the hydrologic cycle, suffering events of large floods and/or long periods of drought. 2016 is also a special year, the Year of Mercy, in which we must engage with the most disadvantaged of our Planet (Gaia), making available to them what we do professionally and scientifically. The Non-Profit project, called "Global system for hydrological monitoring and forecasting in real time at high resolution", aims to provide hydrological monitoring and forecasting at global high resolution (1 km2), in real time and continuously, coupling weather forecasts from Global Circulation Models, such as GFS-0.25° (deterministic and ensemble runs), with a computationally efficient, physically based distributed hydrological model, the latest extended version of the TOPKAPI model, named TOPKAPI-eXtended. Finally, the MCP approach for the proper use of ensembles in Predictive Uncertainty assessment, essentially based on a multiple regression in the Normal space, can be easily extended to use ensembles to represent the local (in time) smaller or larger conditional predictive uncertainty as a function of the ensemble spread. In this way, each prediction in time accounts for both the predictive uncertainty of the ensemble mean and that of the ensemble spread. To perform continuous hydrological modeling with the TOPKAPI-X model and to allow a hot start of the hydrological state of the watersheds, the system assimilates rainfall and temperature products derived from remote sensing, such as the 3B42RT product of NASA's TRMM, among others. The system will be integrated into a Decision Support System (DSS) platform based on geographical data. 
The DSS is a classical client-server web application (for PC, tablet or mobile phone): it needs no installation (all you need is a web browser and an internet connection) and no updates (all upgrades are deployed on the remote server). The client side will be an HTML5/CSS3 application that runs in any of the most common browsers. The server side consists of: a web server (Apache); a map server (GeoServer); a geographical relational database management system (PostgreSQL + PostGIS); and tools based on the GDAL libraries. A customized web page will be implemented to publish all hydrometeorological information and forecast runs, free for all users in the world. In this first presentation of the project, all scientific and technical people, universities, and research centers (public or private) who want to collaborate are invited to attend, opening a brainstorming to improve the system. References: • Liu, Z. and Todini, E. (2002). Towards a comprehensive physically based rainfall-runoff model. Hydrology and Earth System Sciences (HESS), 6(5), 859-881. • Thielen, J., Bartholmes, J., Ramos, M.-H., and de Roo, A. (2009): The European Flood Alert System - Part 1: Concept and development, Hydrol. Earth Syst. Sci., 13, 125-140. • Coccia, C., Mazzetti, C., Ortiz, E., and Todini, E. (2010). A different soil conceptualization for the TOPKAPI model application within the DMIP 2. American Geophysical Union Fall Meeting, San Francisco, H21H-07. • Pappenberger, F., Cloke, H. L., Balsamo, G., Ngo-Duc, T., and Oki, T. (2010). Global runoff routing with the hydrological component of the ECMWF NWP system, Int. J. Climatol., 30, 2155-2174. • Coccia, G. and Todini, E. (2011). Recent developments in predictive uncertainty assessment based on the Model Conditional Processor approach. Hydrology and Earth System Sciences, 15, 3253-3274. • Wu, H., Adler, R. F., Hong, Y., Tian, Y., and Policelli, F. (2012): Evaluation of Global Flood Detection Using Satellite-Based Rainfall and a Hydrologic Model, J. Hydrometeorol., 13, 1268-1284. • Smith, M. et al. (2013). The Distributed Model Intercomparison Project - Phase 2: Experiment Design and Summary Results of the Western Basin Experiments, Journal of Hydrology, 507, 300-329. • Pontificiae Academiae Scientiarvm (2014). Proceedings of the Joint Workshop on 2-6 May 2014: Sustainable Humanity, Sustainable Nature: Our Responsibility. Pontificiae Academiae Scientiarvm Extra Series 41, Vatican City. • Encyclical letter CARITAS IN VERITATE of the supreme pontiff Benedict XVI to the bishops, priests and deacons, men and women religious, the lay faithful and all people of good will, on integral human development in charity and truth. Vatican City, 2009. • Encyclical letter LAUDATO SI' of the holy father Francis on care for our common home. Vatican City, 2015.
Integrating remotely sensed surface water extent into continental scale hydrology
NASA Astrophysics Data System (ADS)
Revilla-Romero, Beatriz; Wanders, Niko; Burek, Peter; Salamon, Peter; de Roo, Ad
2016-12-01
In hydrological forecasting, data assimilation techniques are employed to improve estimates of the initial conditions, updating incorrect model states with observational data. However, the limited availability of continuous and up-to-date ground streamflow data is one of the main constraints for large-scale flood forecasting models. This is the first study to assess the impact of assimilating daily remotely sensed surface water extent at a 0.1° × 0.1° spatial resolution, derived from the Global Flood Detection System (GFDS), into a global rainfall-runoff model, including large ungauged areas at the continental spatial scale in Africa and South America. Surface water extent is observed using a range of passive microwave remote sensors. The methodology uses the brightness temperature, as water bodies have a lower emissivity than the surrounding land. In a time series, the satellite signal is expected to vary with changes in the water surface, and anomalies can be correlated with flood events. The Ensemble Kalman Filter (EnKF) is a Monte-Carlo implementation of data assimilation and is used here by applying random sampling perturbations to the precipitation inputs to account for uncertainty, obtaining ensemble streamflow simulations from the LISFLOOD model. Results of the updated streamflow simulation are compared to baseline simulations without assimilation of the satellite-derived surface water extent. Validation is done at over 100 in situ river gauges using daily streamflow observations in the African and South American continents over a one-year period. Some of the more commonly used metrics in hydrology were calculated: KGE', NSE, PBIAS%, R2, RMSE, and VE. Results show that, for example, the NSE score improved at 61 out of 101 stations, with significant improvements in both the timing and volume of the flow peaks. In contrast, the validation at gauges located in lowland jungle showed the poorest performance, mainly due to the influence of closed forest on the satellite signal retrieval. 
The conclusion is that remotely sensed surface water extent holds potential for improving rainfall-runoff streamflow simulations, potentially leading to a better forecast of the peak flow.
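The NSE and KGE' skill scores used in this validation can be computed as follows; the synthetic streamflow series are invented solely to exercise the metrics:

```python
import numpy as np

rng = np.random.default_rng(11)

# Synthetic daily streamflow (m3/s): observations, a baseline run, and an
# assimilation run with smaller errors. Numbers are invented for illustration.
t = np.linspace(0.0, 6.0 * np.pi, 365)
q_obs = 50.0 + 30.0 * np.sin(t)
q_base = q_obs + rng.normal(0.0, 15.0, size=365)
q_da = q_obs + rng.normal(0.0, 8.0, size=365)

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is perfect, 0 is no better than the obs mean."""
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

def kge_prime(obs, sim):
    """KGE' (2012 variant): correlation r, bias ratio beta, variability ratio gamma
    (ratio of coefficients of variation)."""
    r = np.corrcoef(obs, sim)[0, 1]
    beta = sim.mean() / obs.mean()
    gamma = (sim.std() / sim.mean()) / (obs.std() / obs.mean())
    return 1.0 - np.sqrt((r - 1.0) ** 2 + (beta - 1.0) ** 2 + (gamma - 1.0) ** 2)
```

An improvement such as the one reported at 61 of 101 stations corresponds to nse(q_obs, q_da) exceeding nse(q_obs, q_base) in this notation.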
Towards an improved ensemble precipitation forecast: A probabilistic post-processing approach
NASA Astrophysics Data System (ADS)
Khajehei, Sepideh; Moradkhani, Hamid
2017-03-01
Recently, ensemble post-processing (EPP) has become a commonly used approach for reducing the uncertainty in forcing data and hence hydrologic simulation. The procedure was introduced to build ensemble precipitation forecasts based on the statistical relationship between observations and forecasts. More specifically, the approach relies on a transfer function that is developed based on a bivariate joint distribution between the observations and the simulations in the historical period. The transfer function is used to post-process the forecast. In this study, we propose a Bayesian EPP approach based on copula functions (COP-EPP) to improve the reliability of the precipitation ensemble forecast. Evaluation of the copula-based method is carried out by comparing the performance of the generated ensemble precipitation with the outputs from an existing procedure, i.e. the mixed-type meta-Gaussian distribution. Monthly precipitation from the Climate Forecast System Reanalysis (CFS) and gridded observations from the Parameter-Elevation Relationships on Independent Slopes Model (PRISM) have been employed to generate the post-processed ensemble precipitation. Deterministic and probabilistic verification frameworks are utilized in order to evaluate the outputs from the proposed technique. The distribution of seasonal precipitation for the generated ensemble from the copula-based technique is compared to the observations and raw forecasts for three sub-basins located in the Western United States. Results show that both techniques are successful in producing reliable and unbiased ensemble forecasts; however, COP-EPP demonstrates considerable improvement in the ensemble forecast in both deterministic and probabilistic verification, in particular in characterizing the extreme events in wet seasons.
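While COP-EPP builds a copula-based transfer function between the forecast and observation distributions, the basic post-processing idea can be illustrated with a simpler empirical quantile-mapping sketch (a common, simpler cousin of EPP; all data below are synthetic, not the CFS/PRISM archives):

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic hindcast archive: raw monthly precipitation forecasts and matching
# observations (mm); the forecasts carry a deliberate low bias.
fcst_hist = rng.gamma(shape=2.0, scale=30.0, size=3000)
obs_hist = 2.0 * fcst_hist + rng.gamma(shape=2.0, scale=5.0, size=3000)

def quantile_map(x, fcst_hist, obs_hist):
    """Map a raw forecast value to the observed climatology at the same quantile."""
    q = np.searchsorted(np.sort(fcst_hist), x) / fcst_hist.size
    return np.quantile(obs_hist, min(q, 1.0))

# Post-process a raw 50-member ensemble for one month.
raw_ens = rng.gamma(shape=2.0, scale=30.0, size=50)
post_ens = np.array([quantile_map(x, fcst_hist, obs_hist) for x in raw_ens])
```

A copula-based approach generalizes this by modeling the full joint forecast-observation dependence rather than only matching marginal quantiles, which is what allows COP-EPP to better characterize extremes.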
NASA Technical Reports Server (NTRS)
Hewagama, Tilak; Aslam, Shahid; Talabac, Stephen; Allen, John E., Jr.; Annen, John N.; Jennings, Donald E.
2011-01-01
Fourier transform spectrometers have a venerable heritage as flight instruments. However, obtaining an accurate spectrum exacts a penalty in instrument mass and power requirements. Recent advances in a broad class of non-scanning Fourier transform spectrometer (FTS) devices, generally called spatial heterodyne spectrometers, offer distinct advantages as flight optimized systems. We are developing a miniaturized system that employs photonics lightwave circuit principles and functions as an FTS operating in the 7-14 micrometer spectral region. The interferogram is constructed from an ensemble of Mach-Zehnder interferometers with path length differences calibrated to mimic scan mirror sample positions of a classic Michelson-type FTS. One potential long-term application of this technology in low cost planetary missions is the concept of a self-contained sensor system. We are developing a systems architecture concept for wide area in situ and remote monitoring of characteristic properties that are of scientific interest. The system will be based on wavelength- and resolution-independent spectroscopic sensors for studying atmospheric and surface chemistry, physics, and mineralogy. The self-contained sensor network is based on our concept of an Addressable Photonics Cube (APC) which has real-time flexibility and broad science applications. It is envisaged that a spatially distributed autonomous sensor web concept that integrates multiple APCs will be reactive and dynamically driven. The network is designed to respond in an event- or model-driven manner or reconfigured as needed.
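The equivalence between an ensemble of fixed path-length differences and a scanned Michelson interferogram rests on the Fourier-transform pair between spectrum and interferogram. A toy numerical sketch, with an invented two-line spectrum and FFT-grid sampling standing in for the calibrated Mach-Zehnder OPDs:

```python
import numpy as np

# Toy source spectrum on a baseband wavenumber grid (values illustrative only).
n = 512
dnu = 3.0                                   # spectral sample spacing, 1/cm
nu = np.arange(n) * dnu
spectrum = np.exp(-((nu - 950.0) / 15.0) ** 2) \
    + 0.5 * np.exp(-((nu - 1200.0) / 10.0) ** 2)   # two lines near the 7-14 um band

# For a real, even spectrum the interferogram is its inverse real FFT; each sample
# corresponds to one fixed optical path difference, i.e. one Mach-Zehnder element.
interferogram = np.fft.irfft(spectrum)      # 2*(n-1) OPD samples
opd = np.arange(interferogram.size) / (2.0 * nu.max())   # nominal OPD grid, cm

# A classic Michelson would scan these OPDs in time; here the ensemble samples them
# all at once. The spectrum is recovered by the forward real FFT.
recovered = np.fft.rfft(interferogram).real
```

The number of interferometer elements thus fixes the spectral resolution (through the maximum OPD), which is the design trade the calibrated path-length ensemble has to make.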
Skill of Global Raw and Postprocessed Ensemble Predictions of Rainfall over Northern Tropical Africa
NASA Astrophysics Data System (ADS)
Vogel, Peter; Knippertz, Peter; Fink, Andreas H.; Schlueter, Andreas; Gneiting, Tilmann
2018-04-01
Accumulated precipitation forecasts are of high socioeconomic importance for agriculturally dominated societies in northern tropical Africa. In this study, we analyze the performance of nine operational global ensemble prediction systems (EPSs) relative to climatology-based forecasts for 1- to 5-day accumulated precipitation, based on the monsoon seasons 2007-2014, for three regions within northern tropical Africa. To assess the full potential of raw ensemble forecasts across spatial scales, we apply state-of-the-art statistical postprocessing methods in the form of Bayesian Model Averaging (BMA) and Ensemble Model Output Statistics (EMOS), and verify against station and spatially aggregated, satellite-based gridded observations. Raw ensemble forecasts are uncalibrated, unreliable, and underperform relative to climatology, independently of region, accumulation time, monsoon season, and ensemble. Differences between raw ensemble and climatological forecasts are large, and partly stem from poor prediction of low precipitation amounts. BMA and EMOS postprocessed forecasts are calibrated, reliable, and strongly improve on the raw ensembles but, somewhat disappointingly, typically do not outperform climatology. Most EPSs exhibit slight improvements over the period 2007-2014, but overall have little added value compared to climatology. We suspect that the parametrization of convection is a potential cause for the sobering lack of ensemble forecast skill in a region dominated by mesoscale convective systems.
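Forecast-versus-climatology comparisons of this kind typically use the CRPS, which for an ensemble can be estimated as E|X - y| - 0.5 E|X - X'|. A toy comparison with synthetic rainfall (not the study's data or its BMA/EMOS models):

```python
import numpy as np

rng = np.random.default_rng(13)

def crps_ensemble(ens, y):
    """Sample CRPS for an ensemble forecast: E|X - y| - 0.5 * E|X - X'|."""
    ens = np.asarray(ens, dtype=float)
    term1 = np.abs(ens - y).mean()
    term2 = np.abs(ens[:, None] - ens[None, :]).mean()
    return term1 - 0.5 * term2

# Toy season of 300 "observed" daily rainfall amounts (mm).
obs = rng.gamma(shape=2.0, scale=10.0, size=300)

# Climatological ensemble: the same 50 draws from the climate, reused every day.
clim_ens = rng.gamma(shape=2.0, scale=10.0, size=50)
crps_clim = np.mean([crps_ensemble(clim_ens, y) for y in obs])

# An idealized skilful ensemble: centred on the outcome with modest spread.
crps_skilled = np.mean([crps_ensemble(y + rng.normal(0.0, 4.0, size=50), y) for y in obs])
```

A forecast "underperforms relative to climatology", in the sense reported above, when its mean CRPS exceeds the climatological value computed this way.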
NASA Astrophysics Data System (ADS)
De Vleeschouwer, Niels; Verhoest, Niko E. C.; Gobeyn, Sacha; De Baets, Bernard; Verwaeren, Jan; Pauwels, Valentijn R. N.
2015-04-01
The continuous monitoring of soil moisture in a permanent network can yield an interesting data product for use in hydrological modeling. Major advantages of in situ observations compared to remote sensing products are the potential vertical extent of the measurements, the finer temporal resolution of the observation time series, the smaller impact of land cover variability on the observation bias, etc. However, two major disadvantages are the typically small integration volume of in situ measurements and the often large spacing between monitoring locations. As a consequence, only a small part of the modeling domain is directly observed. Furthermore, the spatial configuration of the monitoring network is typically static in time. Generally, e.g. when applying data assimilation, maximizing the observed information under the given circumstances leads to better qualitative and quantitative insight into the hydrological system. It is therefore advisable to perform a prior analysis in order to select those monitoring locations which are most predictive for the unobserved modeling domain. This research focuses on optimizing the configuration of a soil moisture monitoring network in the catchment of the Bellebeek, situated in Belgium. A recursive algorithm, strongly linked to the equations of the Ensemble Kalman Filter, has been developed to select the most predictive locations in the catchment. The basic idea behind the algorithm is twofold. On the one hand, the modeled soil moisture ensemble error covariance between the different monitoring locations is minimized. This makes the monitoring locations as independent as possible with regard to the modeled soil moisture dynamics. On the other hand, the modeled soil moisture ensemble error covariance between the monitoring locations and the unobserved modeling domain is maximized. The latter yields a selection of monitoring locations which are more predictive of unobserved locations.
The main factors that will influence the outcome of the algorithm are the following: the choice of the hydrological model, the uncertainty model applied for ensemble generation, the general wetness of the catchment over the period for which the error covariance is computed, etc. In this research the influence of the latter two is examined in more depth. Furthermore, the optimal network configuration resulting from the newly developed algorithm is compared to network configurations obtained by two other algorithms. The first algorithm is based on a temporal stability analysis of the modeled soil moisture in order to identify catchment-representative monitoring locations with regard to average conditions. The second algorithm involves the clustering of available spatially distributed data (e.g. land cover and soil maps) that is not obtained by hydrological modeling.
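The covariance-based selection idea described above can be made concrete with a small sketch. The following greedy version is an illustrative simplification, not the authors' recursive EnKF-linked algorithm: each candidate site is scored by its mean absolute ensemble error covariance with all model cells (predictiveness) minus a penalty for covariance with sites already selected (redundancy). The function name and the single `penalty` weight are assumptions for illustration.

```python
import numpy as np

def select_sites(ens, candidates, n_select, penalty=1.0):
    """Greedy monitoring-site selection from an ensemble of model states.

    ens        : (n_members, n_cells) ensemble of simulated soil moisture
    candidates : cell indices where a sensor could be placed
    Scores each candidate by its mean |error covariance| with all cells
    (predictive power) minus a penalty for |covariance| with sites that
    were already chosen (redundancy between sensors)."""
    anom = ens - ens.mean(axis=0)                 # ensemble anomalies
    cov = anom.T @ anom / (ens.shape[0] - 1)      # (n_cells, n_cells) covariance
    chosen = []
    for _ in range(n_select):
        best, best_score = None, -np.inf
        for c in candidates:
            if c in chosen:
                continue
            gain = np.abs(cov[c]).mean()                       # predictiveness
            overlap = sum(np.abs(cov[c, s]) for s in chosen)   # redundancy
            score = gain - penalty * overlap
            if score > best_score:
                best, best_score = c, score
        chosen.append(best)
    return chosen
```

On a toy domain where four cells co-vary strongly and two vary independently, the second selected site falls in the independent group, mirroring the trade-off described in the abstract.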
Nanni, Loris; Lumini, Alessandra
2009-01-01
This work has two aims: to propose a novel method for building an ensemble of classifiers for peptide classification based on substitution matrices, and to show the importance of selecting a proper set of parameters for the classifiers that make up the ensemble of learning systems. The HIV-1 protease cleavage site prediction problem is studied here. Results obtained with a blind testing protocol are reported; the comparison with other state-of-the-art approaches based on ensembles of classifiers quantifies the performance improvement obtained by the systems proposed in this paper. Simulations based on experimentally determined protease cleavage data demonstrate the success of these new ensemble algorithms. It is particularly interesting to note that, although the HIV-1 protease cleavage site prediction problem is considered linearly separable, the best performance is obtained using an ensemble of non-linear classifiers.
Sanchez-Martinez, M; Crehuet, R
2014-12-21
We present a method based on the maximum entropy principle that can re-weight an ensemble of protein structures based on data from residual dipolar couplings (RDCs). The RDCs of intrinsically disordered proteins (IDPs) provide information on the secondary structure elements present in an ensemble; however, even two sets of RDCs are not enough to fully determine the distribution of conformations, and the force field used to generate the structures has a pervasive influence on the refined ensemble. Two physics-based coarse-grained force fields, Profasi and Campari, are able to predict the secondary structure elements present in an IDP, but even after including the RDC data, the re-weighted ensembles differ between the two force fields. Thus the spread of IDP ensembles highlights the need for better force fields. We distribute our algorithm in an open-source Python code.
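The re-weighting step can be illustrated for the simplest case of a single scalar observable; the authors' method handles whole RDC sets, so treat this one-multiplier version as a sketch of the principle only. The minimum-relative-entropy weights take the exponential form w_i ∝ exp(λ f_i), and since the weighted mean grows monotonically with λ, a bisection suffices to match the experimental average.

```python
import numpy as np

def maxent_reweight(obs_per_conf, target, lam_bounds=(-50.0, 50.0), tol=1e-10):
    """Maximum-entropy re-weighting of an ensemble (single observable).

    obs_per_conf : back-calculated observable for each conformation
    target       : experimental ensemble average to be matched
    Returns weights w_i proportional to exp(lam * f_i), the minimally
    biased correction to uniform weights; lam is found by bisection."""
    f = np.asarray(obs_per_conf, dtype=float)

    def weighted(lam):
        a = lam * f
        w = np.exp(a - a.max())        # shift the exponent for stability
        w /= w.sum()
        return w, float(w @ f)

    lo, hi = lam_bounds
    mid = 0.5 * (lo + hi)
    for _ in range(200):               # <f>_w is monotone in lam, so bisect
        mid = 0.5 * (lo + hi)
        _, mean = weighted(mid)
        if abs(mean - target) < tol:
            break
        if mean < target:
            lo = mid
        else:
            hi = mid
    return weighted(mid)[0]
```

Conformations whose back-calculated observable lies on the far side of the target from the ensemble mean are up-weighted, and the re-weighted average reproduces the experimental value.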
SVM and SVM Ensembles in Breast Cancer Prediction.
Huang, Min-Wei; Chen, Chih-Wen; Lin, Wei-Chao; Ke, Shih-Wen; Tsai, Chih-Fong
2017-01-01
Breast cancer is an all too common disease in women, making its effective prediction an active research problem. A number of statistical and machine learning techniques have been employed to develop various breast cancer prediction models. Among them, support vector machines (SVM) have been shown to outperform many related techniques. To construct an SVM classifier, it is first necessary to choose the kernel function, and different kernel functions can result in different prediction performance. However, there have been very few studies focused on examining the prediction performance of SVM based on different kernel functions. Moreover, it is unknown whether SVM classifier ensembles, which have been proposed to improve the performance of single classifiers, can outperform single SVM classifiers in terms of breast cancer prediction. Therefore, the aim of this paper is to fully assess the prediction performance of SVM and SVM ensembles over small- and large-scale breast cancer datasets. The classification accuracy, ROC, F-measure, and computational times of training SVM and SVM ensembles are compared. The experimental results show that linear-kernel SVM ensembles based on the bagging method and RBF-kernel SVM ensembles with the boosting method are the better choices for a small-scale dataset, where feature selection should be performed in the data pre-processing stage. For a large-scale dataset, RBF-kernel SVM ensembles based on boosting perform better than the other classifiers.
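The bagging mechanics discussed in the abstract are easy to sketch. In this minimal, dependency-free version, one-dimensional decision stumps stand in for the SVM base learners (an assumption made purely to keep the example self-contained): each base model is trained on a bootstrap resample of the data, and the ensemble predicts by majority vote.

```python
import random
from collections import Counter

def train_stump(xs, ys):
    """Best single-threshold classifier on 1-D data (illustrative base learner)."""
    best = None
    for t in sorted(set(xs)):
        for sign in (1, -1):
            pred = [1 if sign * x >= sign * t else 0 for x in xs]
            acc = sum(p == y for p, y in zip(pred, ys)) / len(ys)
            if best is None or acc > best[0]:
                best = (acc, t, sign)
    _, t, sign = best
    return lambda x: 1 if sign * x >= sign * t else 0

def bagging_ensemble(xs, ys, n_estimators=25, seed=7):
    """Train each stump on a bootstrap resample; predict by majority vote."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]   # sample with replacement
        models.append(train_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    def predict(x):
        votes = Counter(m(x) for m in models)
        return votes.most_common(1)[0][0]
    return predict
```

Swapping the stump for any other base learner (an SVM with a chosen kernel, say) leaves the bagging loop unchanged, which is why the kernel choice and the ensembling strategy can be studied independently.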
Haque, Mohammad Nazmul; Noman, Nasimul; Berretta, Regina; Moscato, Pablo
2016-01-01
Classification of datasets with imbalanced sample distributions has always been a challenge. In general, a popular approach for enhancing classification performance is the construction of an ensemble of classifiers. However, the performance of an ensemble depends on the choice of constituent base classifiers. Therefore, we propose a genetic algorithm-based search method for finding the optimum combination from a pool of base classifiers to form a heterogeneous ensemble. The algorithm, called GA-EoC, utilises 10-fold cross-validation on training data for evaluating the quality of each candidate ensemble. In order to combine the base classifiers' decisions into the ensemble's output, we used the simple and widely used majority voting approach. The proposed algorithm, along with a random sub-sampling approach to balance the class distribution, has been used for classifying class-imbalanced datasets. Additionally, if a feature set was not available, we used the (α, β) - k Feature Set method to select a better subset of features for classification. We have tested GA-EoC with three benchmarking datasets from the UCI Machine Learning repository, one Alzheimer's disease dataset and a subset of the PubFig database of Columbia University. In general, the performance of the proposed method on the chosen datasets is robust and better than that of the constituent base classifiers and many other well-known ensembles. Based on our empirical study we claim that a genetic algorithm is a superior and reliable approach to heterogeneous ensemble construction, and we expect that the proposed GA-EoC would perform consistently in other cases.
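The search described above can be sketched with a toy genetic algorithm. Base classifiers are represented only by their prediction vectors on a validation set, a chromosome is a bitmask over the pool, and fitness is the majority-vote accuracy of the selected subset. The population size, crossover, and mutation settings here are illustrative guesses, not GA-EoC's actual configuration.

```python
import random

def ga_select_ensemble(preds, y, pop_size=20, generations=30, seed=1):
    """Tiny GA over classifier subsets, in the spirit of GA-EoC (sketch).

    preds : list of prediction vectors, one per base classifier
    y     : true labels on a validation set
    A chromosome is a bitmask over classifiers; fitness is the
    majority-vote accuracy of the selected subset."""
    rng = random.Random(seed)
    n = len(preds)

    def fitness(mask):
        if not any(mask):
            return 0.0
        correct = 0
        for j, truth in enumerate(y):
            votes = [preds[i][j] for i in range(n) if mask[i]]
            # majority vote, ties resolved toward the smallest label
            winner = max(set(votes), key=lambda v: (votes.count(v), -v))
            correct += winner == truth
        return correct / len(y)

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]            # elitist selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)
            child = a[:cut] + b[cut:]             # one-point crossover
            i = rng.randrange(n)
            child[i] ^= rng.random() < 0.1        # occasional bit-flip mutation
            children.append(child)
        pop = parents + children
    best = max(pop, key=fitness)
    return best, fitness(best)
```

With three complementary-but-imperfect classifiers and two useless ones in the pool, the GA finds a subset whose majority vote beats every individual member.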
NASA Astrophysics Data System (ADS)
Hobbs, J.; Turmon, M.; David, C. H.; Reager, J. T., II; Famiglietti, J. S.
2017-12-01
NASA's Western States Water Mission (WSWM) combines remote sensing of the terrestrial water cycle with hydrological models to provide high-resolution state estimates for multiple variables. The effort includes both land surface and river routing models that are subject to several sources of uncertainty, including errors in the model forcing and model structural uncertainty. Computational and storage constraints prohibit extensive ensemble simulations, so this work outlines efficient but flexible approaches for estimating and reporting uncertainty. Calibrated by remote sensing and in situ data where available, we illustrate the application of these techniques in producing state estimates with associated uncertainties at kilometer-scale resolution for key variables such as soil moisture, groundwater, and streamflow.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dunham, Mark Edward; Baker, Zachary K; Stettler, Matthew W
2009-01-01
Los Alamos has recently completed the latest in a series of Reconfigurable Software Radios, which incorporates several key innovations in both hardware design and algorithms. Due to our focus on satellite applications, each design must extract the best size, weight, and power performance possible from the ensemble of Commodity Off-the-Shelf (COTS) parts available at the time of design. In this case we have achieved 1 TeraOps/second signal processing on a 1920 Megabit/second datastream, while using only 53 Watts mains power, 5.5 kg, and 3 liters. This processing capability enables very advanced algorithms such as our wideband RF compression scheme to operate remotely, allowing network bandwidth constrained applications to deliver previously unattainable performance.
2014-07-01
a different impact on spectral normalized water-leaving radiances and the derived ocean color products (inherent optical properties, chlorophyll). We evaluated the influence of gains from open and … "gain" on ocean color products. These products include the spectral Remote Sensing Reflectance (RRS), chlorophyll concentration, and Inherent Optical Properties.
On Sensitivity Analysis within the 4DVAR Framework
2014-02-01
Using the ``adjoint sensitivity'' (AS) approach, Lee et al. (2001) estimated the sensitivity of the Indonesian Throughflow to remote wind forcing; Losch and Heimbach (2007) … of massive parallelization. The ensemble sensitivity (ES) analysis (e.g., Ancell and Hakim 2007; Torn and Hakim 2008) follows the basic principle of … variational assimilation techniques (e.g., Cao et al. 2007; Liu et al. 2008; Yaremchuk et al. 2009; Clayton et al. 2013). In particular, Yaremchuk
A Note on NCOM Temperature Forecast Error Calibration Using the Ensemble Transform
2009-01-01
… local unbiased (correlation) and persistent errors (bias) of the Navy Coastal Ocean Modeling (NCOM) System nested in global ocean domains are … system were made available in real time without performing local data assimilation, though remote sensing and global data were assimilated on the …
Asynchronous Replica Exchange Software for Grid and Heterogeneous Computing.
Gallicchio, Emilio; Xia, Junchao; Flynn, William F; Zhang, Baofeng; Samlalsingh, Sade; Mentes, Ahmet; Levy, Ronald M
2015-11-01
Parallel replica exchange sampling is an extended ensemble technique often used to accelerate the exploration of the conformational ensemble of atomistic molecular simulations of chemical systems. Inter-process communication and coordination requirements have historically discouraged the deployment of replica exchange on distributed and heterogeneous resources. Here we describe the architecture of a software (named ASyncRE) for performing asynchronous replica exchange molecular simulations on volunteered computing grids and heterogeneous high performance clusters. The asynchronous replica exchange algorithm on which the software is based avoids centralized synchronization steps and the need for direct communication between remote processes. It allows molecular dynamics threads to progress at different rates and enables parameter exchanges among arbitrary sets of replicas independently from other replicas. ASyncRE is written in Python following a modular design conducive to extensions to various replica exchange schemes and molecular dynamics engines. Applications of the software for the modeling of association equilibria of supramolecular and macromolecular complexes on BOINC campus computational grids and on the CPU/MIC heterogeneous hardware of the XSEDE Stampede supercomputer are illustrated. They show the ability of ASyncRE to utilize large grids of desktop computers running the Windows, MacOS, and/or Linux operating systems as well as collections of high performance heterogeneous hardware devices.
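The key point, that an exchange involves only the two replicas concerned and no global synchronization barrier, can be sketched as follows. This toy drives replicas with random "energies" instead of real molecular dynamics and is in no way ASyncRE's actual architecture; the Metropolis swap criterion, however, is the standard one for temperature replica exchange.

```python
import math
import random

def attempt_exchange(a, b, rng):
    """Metropolis criterion for swapping the temperatures of two replicas.
    Only the pair involved needs to communicate; no global barrier."""
    delta = (a["beta"] - b["beta"]) * (a["E"] - b["E"])
    if delta >= 0 or rng.random() < math.exp(delta):
        a["beta"], b["beta"] = b["beta"], a["beta"]
        return True
    return False

def async_rex(n_replicas=8, n_events=2000, seed=3):
    """Replicas advance independently at their own pace; exchanges are
    attempted between arbitrary pairs as they become available."""
    rng = random.Random(seed)
    betas = [1.0 / (0.5 + 0.25 * i) for i in range(n_replicas)]
    replicas = [{"beta": b, "E": rng.gauss(1.0 / b, 0.3)} for b in betas]
    accepted = 0
    for _ in range(n_events):
        r = rng.choice(replicas)                   # a replica finishes a leg...
        r["E"] = rng.gauss(1.0 / r["beta"], 0.3)   # ...reporting a new energy
        partner = rng.choice([s for s in replicas if s is not r])
        accepted += attempt_exchange(r, partner, rng)
    return accepted / n_events
```

Because each swap touches exactly two replica records, slow or stalled replicas never block the rest of the ensemble, which is the property that makes the scheme suitable for volunteered grids.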
An Ensemble-Based Smoother with Retrospectively Updated Weights for Highly Nonlinear Systems
NASA Technical Reports Server (NTRS)
Chin, T. M.; Turmon, M. J.; Jewell, J. B.; Ghil, M.
2006-01-01
Monte Carlo computational methods have been introduced into data assimilation for nonlinear systems in order to alleviate the computational burden of updating and propagating the full probability distribution. By propagating an ensemble of representative states, algorithms like the ensemble Kalman filter (EnKF) and the resampled particle filter (RPF) rely on the existing modeling infrastructure to approximate the distribution based on the evolution of this ensemble. This work presents an ensemble-based smoother that is applicable to the Monte Carlo filtering schemes like EnKF and RPF. At the minor cost of retrospectively updating a set of weights for ensemble members, this smoother has demonstrated superior capabilities in state tracking for two highly nonlinear problems: the double-well potential and trivariate Lorenz systems. The algorithm does not require retrospective adaptation of the ensemble members themselves, and it is thus suited to a streaming operational mode. The accuracy of the proposed backward-update scheme in estimating non-Gaussian distributions is evaluated by comparison to the more accurate estimates provided by a Markov chain Monte Carlo algorithm.
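The weight-update idea can be reduced to a two-time toy. In this sketch (an illustration of the principle, not the paper's full scheme), members whose later forecast agrees with a newly arrived observation receive more weight, and that weight is applied retrospectively to their earlier states; the members themselves are never altered, which is what keeps the smoother cheap.

```python
import math

def gauss_lik(y, x, sigma):
    """Unnormalized Gaussian likelihood of observation y given state x."""
    return math.exp(-0.5 * ((y - x) / sigma) ** 2)

def smoothed_estimate(past_states, future_forecasts, y_future, sigma=0.5):
    """Retrospective weight update for an ensemble smoother (sketch).

    past_states      : member states at time t
    future_forecasts : the same members propagated to time t+1
    y_future         : observation arriving at time t+1
    Weights computed from the t+1 misfit are applied to the t states."""
    w = [gauss_lik(y_future, f, sigma) for f in future_forecasts]
    total = sum(w)
    w = [wi / total for wi in w]
    return sum(wi * x for wi, x in zip(w, past_states))
```

The earlier estimate is pulled toward the members whose subsequent evolution turned out to match the data, without rerunning or modifying any trajectory.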
A comparison of breeding and ensemble transform vectors for global ensemble generation
NASA Astrophysics Data System (ADS)
Deng, Guo; Tian, Hua; Li, Xiaoli; Chen, Jing; Gong, Jiandong; Jiao, Meiyan
2012-02-01
To compare the initial perturbation techniques using breeding vectors and ensemble transform vectors, three ensemble prediction systems using both initial perturbation methods but with different ensemble member sizes, based on the spectral model T213/L31, are constructed at the National Meteorological Center, China Meteorological Administration (NMC/CMA). A series of ensemble verification scores, such as forecast skill of the ensemble mean, ensemble resolution, and ensemble reliability, are introduced to identify the most important attributes of ensemble forecast systems. The results indicate that the ensemble transform technique is superior to the breeding vector method in terms of the anomaly correlation coefficient (ACC), a deterministic measure of the ensemble mean; the root-mean-square error (RMSE) and spread, which are probabilistic attributes; and the continuous ranked probability score (CRPS) and its decomposition. The advantage of the ensemble transform approach is attributed to the orthogonality among its ensemble perturbations as well as its consistency with the data assimilation system. Therefore, this study may serve as a reference for configuring the best ensemble prediction system for operational use.
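One of the scores mentioned, the CRPS, has a convenient closed form for a finite ensemble: CRPS = E|X − y| − ½ E|X − X′|, with X and X′ drawn independently from the ensemble. A direct sketch:

```python
def crps_ensemble(members, y):
    """Empirical continuous ranked probability score of one forecast.

    members : ensemble forecast values
    y       : verifying observation
    CRPS = E|X - y| - 0.5 * E|X - X'|; lower is better, and for a
    single member it reduces to the absolute error |x - y|."""
    m = len(members)
    term1 = sum(abs(x - y) for x in members) / m
    term2 = sum(abs(a - b) for a in members for b in members) / (m * m)
    return term1 - 0.5 * term2
```

The second term rewards spread: the ensemble [0, 2] scores 0.5 against an observation of 1, better than either member alone (score 1.0), which is why CRPS can rank probabilistic attributes rather than just the ensemble mean.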
NASA Astrophysics Data System (ADS)
Clark, E.; Wood, A.; Nijssen, B.; Newman, A. J.; Mendoza, P. A.
2016-12-01
The System for Hydrometeorological Applications, Research and Prediction (SHARP), developed at the National Center for Atmospheric Research (NCAR), University of Washington, U.S. Army Corps of Engineers, and U.S. Bureau of Reclamation, is a fully automated ensemble prediction system for short-term to seasonal applications. It incorporates uncertainty in initial hydrologic conditions (IHCs) and in hydrometeorological predictions. In this implementation, IHC uncertainty is estimated by propagating an ensemble of 100 plausible temperature and precipitation time series through the Sacramento/Snow-17 model. The forcing ensemble explicitly accounts for measurement and interpolation uncertainties in the development of gridded meteorological forcing time series. The resulting ensemble of derived IHCs exhibits a broad range of possible soil moisture and snow water equivalent (SWE) states. To select the IHCs that are most consistent with the observations, we employ a particle filter (PF) that weights IHC ensemble members based on observations of streamflow and SWE. These particles are then used to initialize ensemble precipitation and temperature forecasts downscaled from the Global Ensemble Forecast System (GEFS), generating a streamflow forecast ensemble. We test this method in two basins in the Pacific Northwest that are important for water resources management: 1) the Green River upstream of Howard Hanson Dam, and 2) the South Fork Flathead River upstream of Hungry Horse Dam. The first of these is characterized by mixed snow and rain, while the second is snow-dominated. The PF-based forecasts are compared to forecasts based on 1) a single IHC (corresponding to median streamflow) paired with the full GEFS ensemble, and 2) the full IHC ensemble, without filtering, paired with the full GEFS ensemble.
In addition to assessing improvements in the spread of IHCs, we perform a hindcast experiment to evaluate the utility of PF-based data assimilation on streamflow forecasts at 1- to 7-day lead times.
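The particle-filter step used to pick plausible IHCs can be sketched in a few lines. Everything here, the variable names, the Gaussian observation error, and plain multinomial resampling, is an illustrative assumption rather than SHARP's implementation.

```python
import math
import random

def pf_weight_and_resample(members, obs, sigma, seed=11):
    """Particle-filter step over initial hydrologic conditions (sketch).

    members : list of dicts, e.g. {"swe": ..., "flow": ...}, each a
              candidate IHC produced by one forcing ensemble member
    obs     : observed values for the same variables
    sigma   : assumed observation error standard deviations
    Members consistent with the observations receive high weight and
    are duplicated by resampling; implausible ones drop out."""
    rng = random.Random(seed)
    def loglik(m):
        return -0.5 * sum(((m[k] - obs[k]) / sigma[k]) ** 2 for k in obs)
    ll = [loglik(m) for m in members]
    mx = max(ll)
    w = [math.exp(l - mx) for l in ll]     # shift log-likelihoods for stability
    total = sum(w)
    w = [wi / total for wi in w]
    # multinomial resampling: duplicate high-weight particles
    return rng.choices(members, weights=w, k=len(members))
```

The resampled set then seeds the downscaled forecast ensemble, so only initial conditions consistent with observed streamflow and SWE propagate forward.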
Improving ensemble decision tree performance using Adaboost and Bagging
NASA Astrophysics Data System (ADS)
Hasan, Md. Rajib; Siraj, Fadzilah; Sainin, Mohd Shamrie
2015-12-01
Ensemble classifier systems are considered among the most promising approaches for medical data classification, and the performance of a decision tree classifier can be increased by ensemble methods, which are proven to be better than single classifiers. However, in an ensemble setting the performance depends on the selection of a suitable base classifier. This research employed two prominent ensembles, namely Adaboost and Bagging, with base classifiers such as Random Forest, Random Tree, J48, J48graft and Logistic Model Trees (LMT) selected independently. The empirical study shows that the performance varies when different base classifiers are selected, and in some cases overfitting was also noted. The evidence shows that ensemble decision tree classifiers using Adaboost and Bagging improve the performance on the selected medical data sets.
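Boosting's reweighting loop is worth seeing concretely. The sketch below is generic AdaBoost with one-dimensional threshold stumps, not the Weka configuration used in the study: each round fits a stump to the weighted sample and then up-weights the examples that stump misclassified, so the next stump concentrates on them.

```python
import math

def adaboost_stumps(xs, ys, rounds=10):
    """AdaBoost with 1-D threshold stumps; labels must be +/-1.
    Each round fits a stump to the weighted data, then up-weights
    the examples it got wrong."""
    n = len(xs)
    w = [1.0 / n] * n
    model = []                                    # (alpha, threshold, sign)
    for _ in range(rounds):
        best = None
        for thr in sorted(set(xs)):
            for sign in (1, -1):
                preds = [sign if x >= thr else -sign for x in xs]
                err = sum(wi for wi, p, y in zip(w, preds, ys) if p != y)
                if best is None or err < best[0]:
                    best = (err, thr, sign, preds)
        err, thr, sign, preds = best
        err = max(err, 1e-12)                     # guard against a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)   # stump's vote in the ensemble
        model.append((alpha, thr, sign))
        w = [wi * math.exp(-alpha * y * p) for wi, y, p in zip(w, ys, preds)]
        total = sum(w)
        w = [wi / total for wi in w]              # renormalize the weights
    def predict(x):
        score = sum(a * (s if x >= t else -s) for a, t, s in model)
        return 1 if score >= 0 else -1
    return predict
```

On interval-shaped labels no single stump is correct everywhere, but a few boosted rounds are, which is the kind of gain over a lone base classifier the abstract reports.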
Oliveira, Roberta B; Pereira, Aledir S; Tavares, João Manuel R S
2017-10-01
The number of deaths worldwide due to melanoma has risen in recent times, in part because melanoma is the most aggressive type of skin cancer. Computational systems have been developed to assist dermatologists in the early diagnosis of skin cancer, or even to monitor skin lesions. However, improving classifiers for the diagnosis of such skin lesions remains a challenge. The main objective of this article is to evaluate different ensemble classification models based on input feature manipulation to diagnose skin lesions. The input feature manipulation processes are based on feature subset selections from shape properties, colour variation and texture analysis to generate diversity for the ensemble models. Three subset selection models are presented here: (1) a subset selection model based on specific feature groups, (2) a correlation-based subset selection model, and (3) a subset selection model based on feature selection algorithms. Each ensemble classification model is generated using an optimum-path forest classifier and integrated with a majority voting strategy. The proposed models were applied to a set of 1104 dermoscopic images using a cross-validation procedure. The best results were obtained by the first ensemble classification model, which generates a feature subset ensemble based on specific feature groups. The skin lesion diagnosis computational system achieved 94.3% accuracy, 91.8% sensitivity and 96.7% specificity. The input feature manipulation process based on specific feature subsets generated the greatest diversity for the ensemble classification model, with very promising results. Copyright © 2017 Elsevier B.V. All rights reserved.
Ensemble Classifier Strategy Based on Transient Feature Fusion in Electronic Nose
NASA Astrophysics Data System (ADS)
Bagheri, Mohammad Ali; Montazer, Gholam Ali
2011-09-01
In this paper, we test the performance of several ensembles of classifiers and each base learner has been trained on different types of extracted features. Experimental results show the potential benefits introduced by the usage of simple ensemble classification systems for the integration of different types of transient features.
NASA Astrophysics Data System (ADS)
Huang, Danqing; Yan, Peiwen; Zhu, Jian; Zhang, Yaocun; Kuang, Xueyuan; Cheng, Jing
2018-04-01
The uncertainty of global summer precipitation simulated by 23 CMIP5 CGCMs and the possible impacts of model resolution are investigated in this study. Large uncertainties exist over the tropical and subtropical regions, which can be mainly attributed to the simulation of convective precipitation. High-resolution models (HRMs) and low-resolution models (LRMs) are further investigated to demonstrate their different contributions to the uncertainties of the ensemble mean. It is shown that the high-resolution model ensemble mean (HMME) and the low-resolution model ensemble mean (LMME) mitigate the biases between the MME and observations over most continents and oceans, respectively. The HMME simulates more precipitation than the LMME over most oceans, but less precipitation over some continents. The dominant precipitation category in the HRMs (LRMs) is heavy precipitation (moderate precipitation) over the tropical regions. The combinations of convective and stratiform precipitation are also quite different: the HMME has a much higher ratio of stratiform precipitation, while the LMME has more convective precipitation. Finally, differences in precipitation between the HMME and LMME can be traced to their differences in SST simulations via local and remote air-sea interaction.
NASA Astrophysics Data System (ADS)
Kukunda, Collins B.; Duque-Lazo, Joaquín; González-Ferreiro, Eduardo; Thaden, Hauke; Kleinn, Christoph
2018-03-01
Distinguishing tree species is relevant in many contexts of remote sensing assisted forest inventory. Accurate tree species maps support management and conservation planning, pest and disease control and biomass estimation. This study evaluated the performance of applying ensemble techniques with the goal of automatically distinguishing Pinus sylvestris L. and Pinus uncinata Mill. Ex Mirb within a 1.3 km2 mountainous area in Barcelonnette (France). Three modelling schemes were examined, based on: (1) high-density LiDAR data (160 returns m-2), (2) Worldview-2 multispectral imagery, and (3) Worldview-2 and LiDAR in combination. Variables related to the crown structure and height of individual trees were extracted from the normalized LiDAR point cloud at individual-tree level, after performing individual tree crown (ITC) delineation. Vegetation indices and the Haralick texture indices were derived from Worldview-2 images and served as independent spectral variables. Selection of the best predictor subset was done after a comparison of three variable selection procedures: (1) Random Forests with cross validation (AUCRFcv), (2) Akaike Information Criterion (AIC) and (3) Bayesian Information Criterion (BIC). To classify the species, 9 regression techniques were combined using ensemble models. Predictions were evaluated using cross validation and an independent dataset. Integration of datasets and models improved individual tree species classification (True Skills Statistic, TSS; from 0.67 to 0.81) over individual techniques and maintained strong predictive power (Relative Operating Characteristic, ROC = 0.91). Assemblage of regression models and integration of the datasets provided more reliable species distribution maps and associated tree-scale mapping uncertainties. Our study highlights the potential of model and data assemblage at improving species classifications needed in present-day forest planning and management.
Shallow cumuli ensemble statistics for development of a stochastic parameterization
NASA Astrophysics Data System (ADS)
Sakradzija, Mirjana; Seifert, Axel; Heus, Thijs
2014-05-01
According to the conventional deterministic approach to the parameterization of moist convection in numerical atmospheric models, a given large-scale forcing produces a unique response from the unresolved convective processes. This representation leaves out the small-scale variability of convection: as is known from empirical studies of deep and shallow convective cloud ensembles, there is a whole distribution of sub-grid states corresponding to a given large-scale forcing. Moreover, this distribution gets broader with increasing model resolution. This behavior is also consistent with our theoretical understanding of a coarse-grained nonlinear system. We propose an approach to represent the variability of the unresolved shallow-convective states, including the dependence of the spread and shape of the sub-grid state distribution on the model horizontal resolution. Starting from the Gibbs canonical ensemble theory, Craig and Cohen (2006) developed a theory for the fluctuations in a deep convective ensemble. The micro-states of a deep convective cloud ensemble are characterized by the cloud-base mass flux, which, according to the theory, is exponentially distributed (Boltzmann distribution). Following their work, we study the shallow cumulus ensemble statistics and the distribution of the cloud-base mass flux. We employ a Large-Eddy Simulation (LES) model and a cloud tracking algorithm, followed by conditional sampling of clouds at cloud base, to retrieve information about the individual cloud life cycles and the cloud ensemble as a whole. In the case of a shallow cumulus cloud ensemble, the distribution of micro-states is a generalized exponential distribution. Based on these empirical and theoretical findings, a stochastic model has been developed to simulate the shallow convective cloud ensemble and to test the convective ensemble theory.
The stochastic model simulates a compound random process, with the number of convective elements drawn from a Poisson distribution and cloud properties sub-sampled from a generalized ensemble distribution. We study the role of the different cloud subtypes in a shallow convective ensemble and how the diverse cloud properties and cloud lifetimes affect the system macro-state. To what extent does the cloud-base mass flux distribution deviate from the simple Boltzmann distribution, and how does this affect the results from the stochastic model? Is the memory, provided by the finite lifetime of individual clouds, of importance for the ensemble statistics? We also test for the minimal information that must be given as input to the stochastic model for it to reproduce the ensemble mean statistics and the variability in a convective ensemble. An important property of the resulting distribution of the sub-grid convective states is its scale-adaptivity: the smaller the grid size, the broader the compound distribution of the sub-grid states.
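The compound process in the first sentence can be sampled directly. In this sketch the per-cloud distribution is taken as a plain exponential (the simple Boltzmann case discussed above, rather than the generalized distribution found in the study), and the numeric parameters are arbitrary illustrative values.

```python
import math
import random

def sample_total_mass_flux(mean_n=50, mean_m=0.03, rng=None):
    """One grid-box realization of the compound random process:
    N ~ Poisson(mean_n) clouds, each with an exponentially distributed
    cloud-base mass flux of mean mean_m; the sub-grid state is the sum."""
    rng = rng or random.Random()
    # Poisson draw by Knuth's inversion (the stdlib has no Poisson sampler)
    limit, k, p = math.exp(-mean_n), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            break
        k += 1
    return sum(rng.expovariate(1.0 / mean_m) for _ in range(k))
```

The mean total flux is mean_n * mean_m, while the relative spread scales as 1/sqrt(mean_n); shrinking the grid box (fewer clouds per box) therefore broadens the distribution of sub-grid states, which is the scale-adaptivity noted above.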
Predicting Power Outages Using Multi-Model Ensemble Forecasts
NASA Astrophysics Data System (ADS)
Cerrai, D.; Anagnostou, E. N.; Yang, J.; Astitha, M.
2017-12-01
Power outages affect millions of people in the United States every year, harming the economy and disrupting everyday life. An Outage Prediction Model (OPM) has been developed at the University of Connecticut to help utilities quickly restore service and limit the adverse consequences of outages on the population. The OPM, operational since 2015, combines several non-parametric machine learning (ML) models that use historical weather storm simulations and high-resolution weather forecasts, satellite remote sensing data, and infrastructure and land cover data to predict the number and spatial distribution of power outages. A new methodology, developed to improve outage model performance by combining weather- and soil-related variables from three different weather models (WRF 3.7, WRF 3.8 and RAMS/ICLAMS), will be presented in this study. First, we will present a performance evaluation of each model variable, comparing historical weather analyses with station data or reanalyses over the entire storm data set. Each variable of the new outage model version is then extracted from the weather model that performs best for that variable, and sensitivity tests are performed to investigate the most efficient variable combination for outage prediction purposes. Although the final variable combination is drawn from different weather models, this ensemble based on multi-weather forcing and multi-statistical-model power outage prediction outperforms the currently operational OPM version, which is based on a single weather forcing (WRF 3.7), because each model component is the closest to the actual atmospheric state.
Abuassba, Adnan O M; Zhang, Dezheng; Luo, Xiong; Shaheryar, Ahmad; Ali, Hazrat
2017-01-01
Extreme Learning Machine (ELM) is a fast-learning algorithm for a single-hidden-layer feedforward neural network (SLFN). It often has good generalization performance. However, there are chances that it might overfit the training data due to having more hidden nodes than needed. To address the generalization performance, we use a heterogeneous ensemble approach. We propose an Advanced ELM Ensemble (AELME) for classification, which includes Regularized-ELM, L2-norm-optimized ELM (ELML2), and Kernel-ELM. The ensemble is constructed by training a randomly chosen ELM classifier on a subset of the training data selected through random resampling. The proposed AELM-Ensemble is evolved by employing an objective function that increases diversity and accuracy among the final ensemble. Finally, the class label of unseen data is predicted using a majority vote approach. Splitting the training data into subsets and incorporation of heterogeneous ELM classifiers result in higher prediction accuracy, better generalization, and a lower number of base classifiers, as compared to other models (Adaboost, Bagging, Dynamic ELM ensemble, data splitting ELM ensemble, and ELM ensemble). The validity of AELME is confirmed through classification on several real-world benchmark datasets.
Amozegar, M; Khorasani, K
2016-04-01
In this paper, a new approach for Fault Detection and Isolation (FDI) of gas turbine engines is proposed by developing an ensemble of dynamic neural network identifiers. For health monitoring of the gas turbine engine, its dynamics are first identified by constructing three separate dynamic neural network architectures. Specifically, a dynamic multi-layer perceptron (MLP), a dynamic radial-basis function (RBF) neural network, and a dynamic support vector machine (SVM) are trained to individually identify and represent the gas turbine engine dynamics. Next, three ensemble-based techniques are developed to represent the engine dynamics, namely two heterogeneous ensemble models and one homogeneous ensemble model. It is first shown that all ensemble approaches significantly improve the overall performance and accuracy of the developed system identification scheme when compared to each of the stand-alone solutions. The best stand-alone model (the dynamic RBF network) and the best ensemble architecture (the heterogeneous ensemble), in terms of their performance in achieving accurate system identification, are then selected for solving the FDI task. The required residual signals are generated by using both a single-model-based solution and an ensemble-based solution under various gas turbine engine health conditions. Our extensive simulation studies demonstrate that fault detection and isolation using the residuals obtained from the dynamic ensemble scheme yields significantly more accurate and reliable performance, as illustrated through detailed quantitative confusion matrix analysis and comparative studies.
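The residual-generation idea behind this kind of FDI can be illustrated with a toy system. The "plant" below is a synthetic first-order system, not a gas turbine; the three identifiers are a linear least-squares model and two random-feature regressors standing in for the MLP/RBF/SVM members, and the fault and threshold are invented.

```python
# Illustrative sketch (not the paper's models): identify a toy plant with
# three different regressors, average them into a heterogeneous ensemble,
# and flag faults when the prediction residual exceeds a threshold.
import numpy as np

rng = np.random.default_rng(2)

# Toy plant: x_{t+1} = 0.8 x_t + gain * u_{t+1} + small noise.
def simulate(n, fault_at=None):
    x, xs, us = 0.0, [], []
    for t in range(n):
        u = rng.uniform(-1, 1)
        gain = 0.5 if fault_at is None or t < fault_at else 0.1  # actuator fault
        x = 0.8 * x + gain * u + rng.normal(0, 0.01)
        xs.append(x); us.append(u)
    return np.array(xs), np.array(us)

xs, us = simulate(500)                       # healthy identification data
X = np.column_stack([xs[:-1], us[1:]])       # features: previous state, input
y = xs[1:]

def fit_linear(X, y):
    A = np.column_stack([X, np.ones(len(X))])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Z: np.column_stack([Z, np.ones(len(Z))]) @ w

def fit_random_features(X, y, n_feat, seed):
    r = np.random.default_rng(seed)
    W, b = r.normal(size=(X.shape[1], n_feat)), r.normal(size=n_feat)
    phi = lambda Z: np.tanh(Z @ W + b)
    beta, *_ = np.linalg.lstsq(phi(X), y, rcond=None)
    return lambda Z: phi(Z) @ beta

members = [fit_linear(X, y),
           fit_random_features(X, y, 30, seed=3),
           fit_random_features(X, y, 60, seed=4)]
ensemble = lambda Z: np.mean([m(Z) for m in members], axis=0)

# Residual-based detection on a run with a fault injected at t = 250.
fx, fu = simulate(500, fault_at=250)
Z = np.column_stack([fx[:-1], fu[1:]])
residual = np.abs(fx[1:] - ensemble(Z))
threshold = 5 * residual[:200].std()         # calibrated on the healthy part
alarm = residual > threshold
```

Fault isolation (deciding which component failed) would need structured residuals; this sketch only covers detection.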
EMPIRE and pyenda: Two ensemble-based data assimilation systems written in Fortran and Python
NASA Astrophysics Data System (ADS)
Geppert, Gernot; Browne, Phil; van Leeuwen, Peter Jan; Merker, Claire
2017-04-01
We present and compare the features of two ensemble-based data assimilation frameworks, EMPIRE and pyenda. Both frameworks allow models to be coupled to the assimilation codes using the Message Passing Interface (MPI), leading to extremely efficient and fast coupling between the models and the data-assimilation codes. The Fortran-based system EMPIRE (Employing Message Passing Interface for Researching Ensembles) is optimized for parallel, high-performance computing. It currently includes a suite of data assimilation algorithms, including variants of the ensemble Kalman filter and several particle filters. EMPIRE is targeted at models of all levels of complexity and has been coupled to several geoscience models, e.g. the Lorenz-63 model, a barotropic vorticity model, the general circulation model HadCM3, the ocean model NEMO, and the land-surface model JULES. The Python-based system pyenda (Python Ensemble Data Assimilation) allows Fortran- and Python-based models to be used for data assimilation. Models can be coupled either using MPI or through a Python interface. Using Python allows quick prototyping, and pyenda is aimed at small to medium-scale models. pyenda currently includes variants of the ensemble Kalman filter and has been coupled to the Lorenz-63 model, an advection-based precipitation nowcasting scheme, and the dynamic global vegetation model JSBACH.
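The class of update both frameworks provide can be sketched with a minimal perturbed-observation ensemble Kalman filter analysis step. This is a generic textbook EnKF, not code from EMPIRE or pyenda, and the state size, ensemble size, and observation setup are invented.

```python
# Minimal perturbed-observation EnKF analysis step: a scalar observation
# of the first state component updates a 3-variable, 50-member ensemble.
import numpy as np

rng = np.random.default_rng(5)

n_state, n_ens = 3, 50
truth = np.array([1.0, -0.5, 2.0])

# Prior ensemble scattered around a biased first guess.
prior = truth + 0.8 + rng.normal(0, 1.0, size=(n_ens, n_state))

H = np.array([[1.0, 0.0, 0.0]])           # observe component 0 only
r = 0.1                                   # observation error variance
obs = truth[0] + rng.normal(0, np.sqrt(r))

# Ensemble-estimated forecast covariance and Kalman gain.
A = prior - prior.mean(axis=0)
Pf = A.T @ A / (n_ens - 1)
S = H @ Pf @ H.T + r                      # innovation covariance (1x1)
K = Pf @ H.T / S                          # Kalman gain, shape (3, 1)

# Perturbed observations give the analysis ensemble the right spread.
obs_pert = obs + rng.normal(0, np.sqrt(r), n_ens)
innov = obs_pert[:, None] - prior @ H.T   # (n_ens, 1)
analysis = prior + innov @ K.T

err_prior = abs(prior.mean(axis=0)[0] - truth[0])
err_analysis = abs(analysis.mean(axis=0)[0] - truth[0])
```

In EMPIRE or pyenda this update would be wrapped in the MPI (or Python) coupling to the forecast model; here the "forecast" is simply the synthetic prior ensemble.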
Determination of the conformational ensemble of the TAR RNA by X-ray scattering interferometry
Walker, Peter
2017-01-01
The conformational ensembles of structured RNAs are crucial for biological function, but they remain difficult to elucidate experimentally. We demonstrate with HIV-1 TAR RNA that X-ray scattering interferometry (XSI) can be used to determine RNA conformational ensembles. XSI is based on site-specifically labeling RNA with pairs of heavy-atom probes and precisely measuring the distribution of inter-probe distances that arises from a heterogeneous mixture of RNA solution structures. We show that the XSI-based model of the TAR RNA ensemble closely resembles an independent model derived from NMR-RDC data. Further, we show how the TAR RNA ensemble changes shape at different salt concentrations. Finally, we demonstrate that a single hybrid model of the TAR RNA ensemble simultaneously fits both the XSI and NMR-RDC data sets, and show that XSI can be combined with NMR-RDC to further improve the quality of the determined ensemble. The results suggest that XSI will be a powerful approach for characterizing the solution conformational ensembles of RNAs and RNA-protein complexes under diverse solution conditions. PMID:28108663
NASA Astrophysics Data System (ADS)
Kim, D.; Lee, H.; Yu, H.; Beighley, E.; Durand, M. T.; Alsdorf, D. E.; Hwang, E.
2017-12-01
River discharge is a prerequisite for understanding flood hazard and managing water resources, yet our knowledge of it is poor, especially over remote basins. Previous studies have successfully used classic hydraulic geometry, at-many-stations hydraulic geometry (AMHG), and Manning's equation to estimate river discharge. The theoretical bases of these empirical methods were introduced by Leopold and Maddock (1953) and Manning (1889), and they have long been used in hydrology, water resources, and geomorphology. However, existing methods for estimating river discharge from remotely sensed data essentially require bathymetric information for the river or are not applicable to braided rivers. Furthermore, the methods used in previous studies assume steady and uniform flow, and consequently have limitations in estimating river discharge in the complex, unsteady flows found in nature. In this study, we developed a novel approach to estimating river discharge by applying the weak learner method (here termed WLQ), one of the ensemble methods using multiple classifiers, to remotely sensed measurements of water levels from Envisat altimetry, effective river widths from PALSAR images, and multi-temporal surface water slopes over a part of the mainstem Congo. Compared with the methods used in previous studies, the root mean square error (RMSE) decreased from 5,089 m³/s to 3,701 m³/s, and the relative RMSE (RRMSE) improved from 12% to 8%. We expect our method to provide improved estimates of river discharge in complex and unsteady flow conditions based on a data-driven machine learning prediction model (i.e. WLQ), even when bathymetric data are not available or the river is braided.
Moreover, it is also expected that the WLQ can be applied to the measurements of river levels, slopes and widths from the future Surface Water Ocean Topography (SWOT) mission to be launched in 2021.
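A weak-learner ensemble regression of discharge on remotely sensed quantities can be sketched with boosted decision stumps. Everything below is invented for illustration: the synthetic rating relation, the coefficients, and the boosting settings are stand-ins, not the study's WLQ model or the Congo data.

```python
# Hypothetical numpy-only sketch of a weak-learner ensemble regression:
# boosted one-split "stumps" map width, level, and slope to discharge.
import numpy as np

rng = np.random.default_rng(6)

n = 400
width = rng.uniform(500, 2000, n)          # m
level = rng.uniform(1, 8, n)               # m
slope = rng.uniform(2e-5, 8e-5, n)         # m/m
Xf = np.column_stack([width, level, slope])
# Synthetic "discharge" with a Manning-like multiplicative structure.
Q = 0.05 * width * level ** (5 / 3) * np.sqrt(slope) * 1e2
Q = Q * rng.normal(1.0, 0.05, n)           # multiplicative noise

def fit_stump(X, r):
    """Best single-feature threshold split minimizing squared error."""
    best = None
    for j in range(X.shape[1]):
        for t in np.quantile(X[:, j], np.linspace(0.1, 0.9, 9)):
            left = X[:, j] <= t
            lv, rv = r[left].mean(), r[~left].mean()
            sse = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, t, lv, rv)
    _, j, t, lv, rv = best
    return lambda Z: np.where(Z[:, j] <= t, lv, rv)

# Gradient boosting: each weak stump fits the current residual.
pred = np.full(n, Q.mean())
lr = 0.3
for _ in range(100):
    s = fit_stump(Xf, Q - pred)
    pred = pred + lr * s(Xf)

rmse_ens = float(np.sqrt(np.mean((pred - Q) ** 2)))
rmse_mean = float(np.sqrt(np.mean((Q.mean() - Q) ** 2)))
```

The point of the sketch is that many weak learners combined can capture a nonlinear width-level-slope-discharge relation without bathymetry; RMSE here is training error on synthetic data, not a validation result.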
Discrete post-processing of total cloud cover ensemble forecasts
NASA Astrophysics Data System (ADS)
Hemri, Stephan; Haiden, Thomas; Pappenberger, Florian
2017-04-01
This contribution presents an approach to post-process ensemble forecasts for the discrete and bounded weather variable of total cloud cover. Two methods for discrete statistical post-processing of ensemble predictions are tested. The first approach is based on multinomial logistic regression; the second involves a proportional odds logistic regression model. Applying them to total cloud cover raw ensemble forecasts from the European Centre for Medium-Range Weather Forecasts improves forecast skill significantly. Based on station-wise post-processing of raw ensemble total cloud cover forecasts for a global set of 3330 stations over the period from 2007 to early 2014, the more parsimonious proportional odds logistic regression model proved to slightly outperform the multinomial logistic regression model. Reference: Hemri, S., Haiden, T., & Pappenberger, F. (2016). Discrete post-processing of total cloud cover ensemble forecasts. Monthly Weather Review, 144, 2565-2577.
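The first of the two methods, multinomial logistic regression mapping raw-ensemble summary statistics to category probabilities, can be sketched in a few lines. The category scheme (three classes), predictors, and synthetic data below are invented; the paper's predictors and eight-okta categories differ.

```python
# Sketch of discrete post-processing via multinomial logistic regression
# (numpy only): raw-ensemble mean and spread predict the probability of
# each cloud-cover category.
import numpy as np

rng = np.random.default_rng(7)

n, n_cat = 500, 3                         # e.g. clear / partly cloudy / overcast
truth_cat = rng.integers(0, n_cat, n)
# Raw ensemble mean cloud fraction, noisy around the true category.
ens_mean = truth_cat / (n_cat - 1) + rng.normal(0, 0.15, n)
ens_std = rng.uniform(0.05, 0.3, n)
X = np.column_stack([np.ones(n), ens_mean, ens_std])

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)
    e = np.exp(Z)
    return e / e.sum(axis=1, keepdims=True)

# Fit by plain gradient descent on the multinomial log-likelihood.
W = np.zeros((X.shape[1], n_cat))
Y = np.eye(n_cat)[truth_cat]
for _ in range(2000):
    P = softmax(X @ W)
    W -= 0.1 * X.T @ (P - Y) / n

P = softmax(X @ W)                        # calibrated category probabilities
accuracy = float((P.argmax(axis=1) == truth_cat).mean())
baseline = float(max(np.bincount(truth_cat)) / n)
```

The proportional odds model adds the constraint that category thresholds share one slope, which is what makes it more parsimonious; that variant is not shown here.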
Zhang, Li; Ai, Haixin; Chen, Wen; Yin, Zimo; Hu, Huan; Zhu, Junfeng; Zhao, Jian; Zhao, Qi; Liu, Hongsheng
2017-05-18
Carcinogenicity refers to a highly toxic end point of certain chemicals, and has become an important issue in the drug development process. In this study, three novel ensemble classification models, namely Ensemble SVM, Ensemble RF, and Ensemble XGBoost, were developed to predict carcinogenicity of chemicals using seven types of molecular fingerprints and three machine learning methods based on a dataset containing 1003 diverse compounds with rat carcinogenicity. Among these three models, Ensemble XGBoost is found to be the best, giving an average accuracy of 70.1 ± 2.9%, sensitivity of 67.0 ± 5.0%, and specificity of 73.1 ± 4.4% in five-fold cross-validation and an accuracy of 70.0%, sensitivity of 65.2%, and specificity of 76.5% in external validation. In comparison with some recent methods, the ensemble models outperform some machine learning-based approaches and yield equal accuracy and higher specificity but lower sensitivity than rule-based expert systems. It is also found that the ensemble models could be further improved if more data were available. As an application, the ensemble models are employed to discover potential carcinogens in the DrugBank database. The results indicate that the proposed models are helpful in predicting the carcinogenicity of chemicals. A web server called CarcinoPred-EL has been built for these models (http://ccsipb.lnu.edu.cn/toxicity/CarcinoPred-EL/).
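The ensemble idea, one base classifier per fingerprint type with their probabilities averaged, can be illustrated with a toy soft-voting scheme. The "fingerprints" and labels below are simulated bit vectors, and the base learner is a hand-rolled logistic regression rather than SVM/RF/XGBoost, so this is a structural sketch of the pipeline only.

```python
# Sketch of fingerprint-wise soft voting: train one classifier per
# fingerprint type, then average the predicted class probabilities.
import numpy as np

rng = np.random.default_rng(8)

n, n_bits = 300, 64
y = rng.integers(0, 2, n)                 # 1 = "carcinogen" in this toy setup
# Three synthetic "fingerprints", each weakly informative about the label.
fingerprints = [
    (rng.random((n, n_bits)) < (0.3 + 0.2 * y[:, None])).astype(float)
    for _ in range(3)
]

def fit_logreg(X, y, iters=500, lr=0.1):
    """Plain gradient-descent logistic regression; returns a probability fn."""
    X1 = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(X1.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-(X1 @ w)))
        w -= lr * X1.T @ (p - y) / len(y)
    return lambda Z: 1 / (1 + np.exp(-(np.column_stack([np.ones(len(Z)), Z]) @ w)))

members = [fit_logreg(Xf, y) for Xf in fingerprints]

# Soft vote: average the predicted probabilities, threshold at 0.5.
probs = np.mean([m(Xf) for m, Xf in zip(members, fingerprints)], axis=0)
accuracy = float(((probs > 0.5).astype(int) == y).mean())
```

Accuracy here is training accuracy on synthetic data; the study's reported figures come from cross-validation and external validation on real compounds.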
Mapping of Coral Reef Environment in the Arabian Gulf Using Multispectral Remote Sensing
NASA Astrophysics Data System (ADS)
Ben-Romdhane, H.; Marpu, P. R.; Ghedira, H.; Ouarda, T. B. M. J.
2016-06-01
Coral reefs of the Arabian Gulf are subject to several pressures, thus requiring conservation actions. Well-designed conservation plans involve efficient mapping and monitoring systems. Satellite remote sensing is a cost-effective tool for seafloor mapping at large scales. Multispectral remote sensing of coastal habitats, like those of the Arabian Gulf, presents a special challenge due to their complexity and heterogeneity. The present study evaluates the potential of the multispectral sensor DubaiSat-2 in mapping benthic communities of the United Arab Emirates. We propose to use a spectral-spatial method that includes multilevel segmentation, nonlinear feature analysis, and ensemble learning methods. Support Vector Machine (SVM) is used for comparison of classification performance. Comparative data were derived from the habitat maps published by the Environment Agency-Abu Dhabi. The spectral-spatial method produced 96.41% mapping accuracy. SVM classification is assessed to be 94.17% accurate. The adaptation of these methods can help achieve well-designed coastal management plans in the region.
NASA Astrophysics Data System (ADS)
Zhongqin, G.; Chen, Y.
2017-12-01
Quickly identifying the spatial distribution of landslides automatically is essential for the prevention, mitigation, and assessment of landslide hazards. It remains a challenging task owing to the complicated characteristics and vague boundaries of landslide areas in imagery. High-resolution remote sensing images have multiple scales, complex spatial distributions, and abundant features; object-oriented image classification methods can make full use of this information and thus effectively detect landslides after a hazard has occurred. In this research we present a new semi-supervised workflow, taking advantage of recent object-oriented image analysis and machine learning algorithms, to quickly locate landslides of different origins in areas of southwestern China. Besides a sequence of image segmentation, feature selection, object classification, and error testing, this workflow ensembles the feature selection and classifier selection steps. The features utilized in this study were normalized difference vegetation index (NDVI) change, textural features derived from gray-level co-occurrence matrices (GLCM), spectral features, and others. The improvements in this study show that the algorithm removes redundant features and makes full use of the classifiers. These improvements lead to higher accuracy in delineating the shape of landslides in high-resolution remote sensing images, with flexibility across different kinds of landslides.
Overlapped Partitioning for Ensemble Classifiers of P300-Based Brain-Computer Interfaces
Onishi, Akinari; Natsume, Kiyohisa
2014-01-01
A P300-based brain-computer interface (BCI) enables a wide range of people to control devices that improve their quality of life. Ensemble classifiers with naive partitioning were recently applied to the P300-based BCI and these classification performances were assessed. However, they were usually trained on a large amount of training data (e.g., 15300). In this study, we evaluated ensemble linear discriminant analysis (LDA) classifiers with a newly proposed overlapped partitioning method using 900 training data. In addition, the classification performances of the ensemble classifier with naive partitioning and a single LDA classifier were compared. One of three conditions for dimension reduction was applied: the stepwise method, principal component analysis (PCA), or none. The results show that an ensemble stepwise LDA (SWLDA) classifier with overlapped partitioning achieved a better performance than the commonly used single SWLDA classifier and an ensemble SWLDA classifier with naive partitioning. This result implies that the performance of the SWLDA is improved by overlapped partitioning and the ensemble classifier with overlapped partitioning requires less training data than that with naive partitioning. This study contributes towards reducing the required amount of training data and achieving better classification performance. PMID:24695550
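The contrast between naive and overlapped partitioning can be illustrated with a minimal voting ensemble of LDA classifiers. The LDA below is a plain pooled-covariance implementation on synthetic data, and the partition sizes and overlap are invented; real P300 epochs and the stepwise feature selection are not reproduced.

```python
# Sketch of overlapped partitioning: each base classifier's contiguous
# training block is extended so it shares samples with its neighbours,
# then the LDA members vote by majority.
import numpy as np

rng = np.random.default_rng(9)

n = 900                                   # as in the study's training size
X = rng.normal(size=(n, 4))
y = (X[:, 0] - X[:, 1] + rng.normal(0, 1.5, n) > 0).astype(int)

def fit_lda(X, y):
    """Two-class LDA with pooled within-class covariance."""
    m0, m1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    S = np.cov(X[y == 0].T) + np.cov(X[y == 1].T)
    w = np.linalg.solve(S, m1 - m0)
    b = -w @ (m0 + m1) / 2
    return lambda Z: (Z @ w + b > 0).astype(int)

def partitions(n, k, overlap):
    """k contiguous blocks; each extended by `overlap` samples per side.
    overlap=0 recovers naive (disjoint) partitioning."""
    edges = np.linspace(0, n, k + 1).astype(int)
    return [np.arange(max(0, a - overlap), min(n, b + overlap))
            for a, b in zip(edges[:-1], edges[1:])]

members = [fit_lda(X[idx], y[idx]) for idx in partitions(n, 6, overlap=75)]
votes = np.stack([m(X) for m in members])
pred = (votes.mean(axis=0) > 0.5).astype(int)
accuracy = float((pred == y).mean())
```

The study's finding is that the overlap lets each member see more data, so fewer training samples are needed overall; this sketch only shows the partitioning mechanics.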
EnsembleGraph: Interactive Visual Analysis of Spatial-Temporal Behavior for Ensemble Simulation Data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shu, Qingya; Guo, Hanqi; Che, Limei
We present a novel visualization framework, EnsembleGraph, for analyzing ensemble simulation data, in order to help scientists understand behavior similarities between ensemble members over space and time. A graph-based representation is used to visualize individual spatiotemporal regions with similar behaviors, which are extracted by hierarchical clustering algorithms. A user interface with multiple linked views is provided, which enables users to explore, locate, and compare regions that have similar behaviors between ensemble members, and then to investigate and analyze the selected regions in detail. The driving application of this paper is the study of regional emission influences on tropospheric ozone, based on ensemble simulations conducted with different anthropogenic emission absences using the MOZART-4 (Model of Ozone and Related Tracers, version 4) model. We demonstrate the effectiveness of our method by visualizing the MOZART-4 ensemble simulation data and evaluating the relative regional emission influences on tropospheric ozone concentrations. Positive feedback from domain experts and two case studies demonstrate the efficiency of our method.
An ensemble of SVM classifiers based on gene pairs.
Tong, Muchenxuan; Liu, Kun-Hong; Xu, Chungui; Ju, Wenbin
2013-07-01
In this paper, a genetic algorithm (GA) based ensemble support vector machine (SVM) classifier built on gene pairs (GA-ESP) is proposed. The SVMs (base classifiers of the ensemble system) are trained on different informative gene pairs. These gene pairs are selected by the top scoring pair (TSP) criterion. Each of these pairs projects the original microarray expression onto a 2-D space. Extensive permutation of gene pairs may reveal more useful information and potentially lead to an ensemble classifier with satisfactory accuracy and interpretability. GA is further applied to select an optimized combination of base classifiers. The effectiveness of the GA-ESP classifier is evaluated on both binary-class and multi-class datasets.
NASA Astrophysics Data System (ADS)
Liechti, K.; Panziera, L.; Germann, U.; Zappa, M.
2013-10-01
This study explores the limits of radar-based forecasting for hydrological runoff prediction. Two novel radar-based ensemble forecasting chains for flash-flood early warning are investigated in three catchments in the southern Swiss Alps and set in relation to deterministic discharge forecasts for the same catchments. The first radar-based ensemble forecasting chain is driven by NORA (Nowcasting of Orographic Rainfall by means of Analogues), an analogue-based heuristic nowcasting system to predict orographic rainfall for the following eight hours. The second ensemble forecasting system evaluated is REAL-C2, where the numerical weather prediction model COSMO-2 is initialised with 25 different initial conditions derived from a four-day nowcast with the radar ensemble REAL. Additionally, three deterministic forecasting chains were analysed. The performance of these five flash-flood forecasting systems was analysed for 1389 h between June 2007 and December 2010 for which NORA forecasts were issued, due to the presence of orographic forcing. A clear preference was found for the ensemble approach. Discharge forecasts perform better when forced by NORA and REAL-C2 rather than by deterministic weather radar data. Moreover, it was observed that using an ensemble of initial conditions at the forecast initialisation, as in REAL-C2, significantly improved the forecast skill. These forecasts also perform better than forecasts forced by ensemble rainfall forecasts (NORA) initialised from a single initial condition of the hydrological model. Thus the best results were obtained with the REAL-C2 forecasting chain. However, for regions where REAL cannot be produced, NORA might be an option for forecasting events triggered by orographic precipitation.
NASA Astrophysics Data System (ADS)
Hacker, Joshua; Vandenberghe, Francois; Jung, Byoung-Jo; Snyder, Chris
2017-04-01
Effective assimilation of cloud-affected radiance observations from space-borne imagers, with the aim of improving cloud analysis and forecasting, has proven to be difficult. Large observation biases, nonlinear observation operators, and non-Gaussian innovation statistics present many challenges. Ensemble-variational data assimilation (EnVar) systems offer the benefits of flow-dependent background error statistics from an ensemble, and the ability of variational minimization to handle nonlinearity. The specific benefits of ensemble statistics, relative to the static background errors more commonly used in variational systems, have not been quantified for the problem of assimilating cloudy radiances. A simple experiment framework is constructed with a regional NWP model and an operational variational data assimilation system, to provide a basis for understanding the importance of ensemble statistics in cloudy radiance assimilation. Restricting the observations to those corresponding to clouds in the background forecast leads to innovations that are more Gaussian. The number of large innovations is reduced compared to the more general case of all observations, but not eliminated. The Huber norm is investigated to handle the fat tails of the distributions, and to allow more observations to be assimilated without the need for strict background checks that eliminate them. Comparing assimilation using only ensemble background error statistics with assimilation using only static background error statistics elucidates the importance of the ensemble statistics. Although the cost functions in both experiments converge to similar values after sufficient outer-loop iterations, the resulting cloud water, ice, and snow content are greater in the ensemble-based analysis. The subsequent forecasts from the ensemble-based analysis also retain more condensed water species, indicating that the local environment is more supportive of clouds.
In this presentation we provide details that explain the apparent benefit from using ensembles for cloudy radiance assimilation in an EnVar context.
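The role of the Huber norm mentioned above can be made concrete: the observation-term cost is quadratic for small normalized innovations and grows only linearly beyond a transition point, so fat-tail outliers are down-weighted rather than rejected outright. The transition value below is illustrative.

```python
# Huber cost of normalized innovations d: quadratic for |d| <= delta,
# linear beyond, continuous and once-differentiable at the join.
import numpy as np

def huber(d, delta=1.5):
    a = np.abs(d)
    return np.where(a <= delta, 0.5 * d ** 2, delta * (a - 0.5 * delta))

d = np.linspace(-10, 10, 2001)
quad = 0.5 * d ** 2          # pure least-squares cost, for comparison
h = huber(d)                 # never exceeds the quadratic cost
```

In a variational system this replaces the quadratic observation term for selected observation types, which is what allows relaxing the background check.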
Creating "Intelligent" Ensemble Averages Using a Process-Based Framework
NASA Astrophysics Data System (ADS)
Baker, Noel; Taylor, Patrick
2014-05-01
The CMIP5 archive contains future climate projections from over 50 models provided by dozens of modeling centers from around the world. Individual model projections, however, are subject to biases created by structural model uncertainties. As a result, ensemble averaging of multiple models is used to add value to individual model projections and construct a consensus projection. Previous reports for the IPCC establish climate change projections based on an equal-weighted average of all model projections. However, individual models reproduce certain climate processes better than other models. Should models be weighted based on performance? Unequal ensemble averages have previously been constructed using a variety of mean-state metrics. What metrics are most relevant for constraining future climate projections? This project develops a framework for systematically testing metrics in models to identify optimal metrics for unequally weighting multi-model ensembles. The intention is to produce improved ("intelligent") unequal-weight ensemble averages. A unique aspect of this project is the construction and testing of climate process-based model evaluation metrics. A climate process-based metric is defined as a metric based on the relationship between two physically related climate variables (e.g., outgoing longwave radiation and surface temperature). Several climate process metrics are constructed using high-quality Earth radiation budget data from NASA's Clouds and the Earth's Radiant Energy System (CERES) instrument in combination with surface temperature data sets. It is found that regional values of tested quantities can vary significantly when comparing the equal-weighted ensemble average and an ensemble weighted using the process-based metric.
Additionally, this study investigates the dependence of the metric weighting scheme on the climate state using a combination of model simulations including a non-forced preindustrial control experiment, historical simulations, and several radiative forcing Representative Concentration Pathway (RCP) scenarios. Ultimately, the goal of the framework is to advise better methods for ensemble averaging models and create better climate predictions.
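The contrast between an equal-weighted and a metric-weighted multi-model average can be sketched in a few lines. The weighting rule (inverse squared metric error) is one simple choice assumed here for illustration, and the metric values and projections are invented numbers, not CMIP5 output.

```python
# Sketch of a performance-weighted multi-model ensemble average: each
# model's weight derives from the error of a process-based metric (e.g.
# an OLR-vs-surface-temperature regression slope) against observations.
import numpy as np

rng = np.random.default_rng(10)

obs_slope = -2.0                                  # "observed" metric value
n_models = 8
model_slopes = obs_slope + rng.normal(0, 0.5, n_models)
projections = rng.normal(3.0, 0.8, n_models)      # e.g. warming by 2100, K

# Weight proportional to inverse squared metric error (one choice of many).
err = np.abs(model_slopes - obs_slope)
w = 1.0 / (err ** 2 + 1e-6)
w = w / w.sum()                                   # normalize to sum to 1

equal_avg = float(projections.mean())
weighted_avg = float(w @ projections)
```

The framework in the abstract is precisely about testing which metric (and hence which weight vector `w`) best constrains the projection; the weighting formula itself is the interchangeable part.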
Haque, Mohammad Nazmul; Noman, Nasimul; Berretta, Regina; Moscato, Pablo
2016-01-01
Classification of datasets with imbalanced sample distributions has always been a challenge. In general, a popular approach for enhancing classification performance is the construction of an ensemble of classifiers. However, the performance of an ensemble is dependent on the choice of constituent base classifiers. Therefore, we propose a genetic algorithm-based search method for finding the optimum combination from a pool of base classifiers to form a heterogeneous ensemble. The algorithm, called GA-EoC, utilises 10-fold cross-validation on training data for evaluating the quality of each candidate ensemble. In order to combine the base classifiers' decisions into the ensemble's output, we used the simple and widely used majority voting approach. The proposed algorithm, along with a random sub-sampling approach to balance the class distribution, has been used for classifying class-imbalanced datasets. Additionally, if a feature set was not available, we used the (α, β)-k Feature Set method to select a better subset of features for classification. We have tested GA-EoC with three benchmark datasets from the UCI Machine Learning repository, one Alzheimer's disease dataset and a subset of the PubFig database of Columbia University. In general, the performance of the proposed method on the chosen datasets is robust and better than that of the constituent base classifiers and many other well-known ensembles. Based on our empirical study, we claim that a genetic algorithm is a superior and reliable approach to heterogeneous ensemble construction, and we expect that the proposed GA-EoC would perform consistently in other cases. PMID:26764911
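The core search, a GA over bitstrings that select which base classifiers join a majority-voting ensemble, can be sketched on simulated classifier outputs. This toy uses training-set voting accuracy as fitness instead of GA-EoC's 10-fold cross-validation, and every setting (population size, rates, simulated accuracies) is illustrative.

```python
# Toy GA over classifier-selection bitstrings with majority-vote fitness.
# Base "classifiers" are simulated prediction vectors of varying accuracy.
import numpy as np

rng = np.random.default_rng(11)

n_samples, n_clf = 200, 12
y = rng.integers(0, 2, n_samples)
acc = rng.uniform(0.55, 0.75, n_clf)          # individual accuracies
preds = np.where(rng.random((n_clf, n_samples)) < acc[:, None], y, 1 - y)

def fitness(mask):
    """Majority-vote accuracy of the classifiers selected by `mask`."""
    if mask.sum() == 0:
        return 0.0
    vote = preds[mask.astype(bool)].mean(axis=0) > 0.5
    return float((vote.astype(int) == y).mean())

# Minimal GA: tournament selection, uniform crossover, bit-flip mutation.
pop = rng.integers(0, 2, size=(30, n_clf))
for _ in range(40):
    fit = np.array([fitness(m) for m in pop])
    parents = pop[[max(rng.integers(0, 30, 3), key=lambda i: fit[i])
                   for _ in range(30)]]
    cross = rng.random((30, n_clf)) < 0.5
    children = np.where(cross, parents, parents[::-1])
    children ^= (rng.random((30, n_clf)) < 0.05).astype(children.dtype)
    pop = children

best = max(pop, key=fitness)
best_acc = fitness(best)
single_best = float(max((preds[i] == y).mean() for i in range(n_clf)))
```

With weakly correlated errors, the voted subset typically beats `single_best`, which is the motivation for searching over combinations rather than picking one classifier.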
NASA Astrophysics Data System (ADS)
Edouard, Simon; Vincendon, Béatrice; Ducrocq, Véronique
2018-05-01
Intense precipitation events in the Mediterranean often lead to devastating flash floods (FF). FF modelling is affected by several kinds of uncertainties, and Hydrological Ensemble Prediction Systems (HEPS) are designed to take those uncertainties into account. The major source of uncertainty comes from the rainfall forcing, and convective-scale meteorological ensemble prediction systems can manage it for forecasting purposes. But other sources are related to the hydrological modelling part of the HEPS. This study focuses on the uncertainties arising from the hydrological model parameters and initial soil moisture, with the aim of designing an ensemble-based version of a hydrological model dedicated to simulating fast-responding Mediterranean rivers, the ISBA-TOP coupled system. The first step consists in identifying the parameters that have the strongest influence on FF simulations by assuming perfect precipitation. A sensitivity study is carried out, first using a synthetic framework and then for several real events and several catchments. Perturbation methods varying the most sensitive parameters as well as the initial soil moisture allow designing an ensemble-based version of ISBA-TOP. The first results of this system on some real events are presented. The next step will be to drive this ensemble-based version with the members of a convective-scale meteorological ensemble prediction system to design a complete HEPS for FF forecasting.
Skill of Ensemble Seasonal Probability Forecasts
NASA Astrophysics Data System (ADS)
Smith, Leonard A.; Binter, Roman; Du, Hailiang; Niehoerster, Falk
2010-05-01
In operational forecasting, the computational complexity of large simulation models is, ideally, justified by enhanced performance over simpler models. We will consider probability forecasts and contrast the skill of ENSEMBLES-based seasonal probability forecasts of interest to the finance sector (specifically temperature forecasts for Nino 3.4 and the Atlantic Main Development Region (MDR)). The ENSEMBLES model simulations will be contrasted against forecasts from statistical models based on the observations (climatological distributions) and empirical dynamics based on the observations but conditioned on the current state (dynamical climatology). For some start dates, individual ENSEMBLES models yield significant skill even at a lead time of 14 months. The nature of this skill is discussed, and possibilities for application are noted. Questions surrounding the interpretation of probability forecasts based on these multi-model ensemble simulations are then considered; the distributions considered are formed by kernel dressing the ensemble and blending with the climatology. The sources of apparent (RMS) skill in distributions based on multi-model simulations are discussed, and it is demonstrated that the inclusion of "zero-skill" models in the long range can improve Root-Mean-Square-Error scores, casting some doubt on the common justification for the claim that all models should be included in forming an operational probability forecast. It is argued that the rational response varies with lead time.
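Kernel dressing and climatology blending, the two operations used above to turn ensemble members into a forecast distribution, can be sketched directly. In practice the kernel bandwidth and blending weight are fitted to past forecast-outcome pairs; here they are fixed constants, and the member values are invented.

```python
# Sketch of kernel dressing + climatology blending: each ensemble member
# becomes a Gaussian kernel, and the dressed mixture is blended with the
# climatological distribution with weight alpha.
import numpy as np

def blended_pdf(x, members, clim_mean, clim_std, sigma=0.4, alpha=0.7):
    # Kernel-dressed ensemble density: mean of member-centred Gaussians.
    k = np.exp(-0.5 * ((x[:, None] - members[None, :]) / sigma) ** 2)
    dressed = k.mean(axis=1) / (sigma * np.sqrt(2 * np.pi))
    # Gaussian climatology (a real system would use the empirical one).
    clim = np.exp(-0.5 * ((x - clim_mean) / clim_std) ** 2) / (
        clim_std * np.sqrt(2 * np.pi))
    return alpha * dressed + (1 - alpha) * clim

members = np.array([26.1, 26.4, 26.8, 27.0, 27.3])   # e.g. Nino-3.4 SST, degC
x = np.linspace(20, 34, 5601)
pdf = blended_pdf(x, members, clim_mean=26.5, clim_std=1.2)
mass = float(pdf.sum() * (x[1] - x[0]))              # numerical normalization check
```

Setting alpha near zero recovers pure climatology, which is why a "zero-skill" model can still contribute: blending toward climatology hedges the sharp but possibly misplaced dressed ensemble.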
NASA Astrophysics Data System (ADS)
Hori, Y.; Cheng, V. Y. S.; Gough, W. A.
2017-12-01
A network of winter roads in northern Canada connects a number of remote First Nations communities to all-season roads and rails. The extent of the winter road networks depends on geographic features, socio-economic activities, and the number of remote First Nations communities, and therefore differs among the provinces. The most extensive winter road networks south of the 60th parallel are located in Ontario and Manitoba, serving 32 and 18 communities respectively. In recent years, a warmer climate has resulted in a shorter winter road season and an increase in unreliable road conditions, thus limiting access among remote communities. This study focused on examining future freezing degree-day (FDD) accumulations during the winter road season at selected locations throughout Ontario's Far North and northern Manitoba, using recent climate model projections from multi-model ensembles of General Circulation Models (GCMs) under the Representative Concentration Pathway (RCP) scenarios. First, the non-parametric Mann-Kendall correlation test and the Theil-Sen method were used to identify any statistically significant trends in FDDs over time for the base period (1981-2010). Second, future climate scenarios were developed for the study areas using statistical downscaling methods. This study also examined the lowest threshold of FDDs for winter road construction in a future period. Our previous study established the lowest threshold of 380 FDDs, which was derived from the relationship between FDDs and the opening dates of the James Bay Winter Road near the Hudson-James Bay coast. Thus, this study applied that threshold as a conservative estimate of the minimum FDDs required, to examine the effects of climate change on the winter road construction period.
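The FDD accumulation and the 380-FDD opening threshold can be computed in a few lines. The temperature series below is synthetic (a seasonal cosine plus noise), chosen only to illustrate the bookkeeping; the threshold value is the one stated in the abstract.

```python
# Freezing degree-days (FDDs) accumulate the magnitude of daily mean
# temperatures below 0 degC; the road "opens" once the 380-FDD threshold
# is reached. Temperatures here are synthetic.
import numpy as np

rng = np.random.default_rng(12)

days = 150   # a synthetic Nov-Mar season
t_mean = -12 + 10 * np.cos(np.linspace(0, 2 * np.pi, days)) \
         + rng.normal(0, 3, days)                 # daily mean temp, degC

fdd = np.cumsum(np.maximum(0.0, -t_mean))         # accumulated FDDs
THRESHOLD = 380.0                                 # opening threshold from the study
opening_day = int(np.argmax(fdd >= THRESHOLD)) if fdd[-1] >= THRESHOLD else None
```

Days with above-freezing means contribute zero, so the accumulation is monotone; under a warming scenario the same threshold is simply crossed later (or not at all), which is the quantity the study projects.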
Hydro and morphodynamic simulations for probabilistic estimates of munitions mobility
NASA Astrophysics Data System (ADS)
Palmsten, M.; Penko, A.
2017-12-01
Probabilistic estimates of waves, currents, and sediment transport at underwater munitions remediation sites are necessary to constrain probabilistic predictions of munitions exposure, burial, and migration. To address this need, we produced ensemble simulations of hydrodynamic flow and morphologic change with Delft3D, a coupled system of wave, circulation, and sediment transport models. We set up the Delft3D simulations at the Army Corps of Engineers Field Research Facility (FRF) in Duck, NC, USA. The FRF is the prototype site for the near-field munitions mobility model, which integrates far-field and near-field munitions mobility simulations. An extensive array of in-situ and remotely sensed oceanographic, bathymetric, and meteorological data is available at the FRF, as well as existing observations of munitions mobility for model testing. Here, we present results of ensemble Delft3D hydro- and morphodynamic simulations at Duck. A nested Delft3D simulation runs an outer grid that extends 12 km alongshore and 3.7 km cross-shore with 50 m resolution and a maximum depth of approximately 17 m. The inner nested grid extends 3.2 km alongshore and 1.2 km cross-shore with 5 m resolution and a maximum depth of approximately 11 m. The initial model bathymetry of the inner nested grid is defined as the most recent survey or remotely sensed estimate of water depth. Delft3D-WAVE and Delft3D-FLOW are driven with spectral wave measurements from a Waverider buoy in 17 m depth located on the offshore boundary of the outer grid. The spectral wave output and the water levels from the outer grid are used to define the boundary conditions for the inner nested high-resolution grid, in which the coupled Delft3D WAVE-FLOW-MORPHOLOGY model is run. The ensemble results are compared to the wave, current, and bathymetry observations collected at the FRF.
Monte Carlo replica-exchange based ensemble docking of protein conformations.
Zhang, Zhe; Ehmann, Uwe; Zacharias, Martin
2017-05-01
A replica-exchange Monte Carlo (REMC) ensemble docking approach has been developed that allows efficient exploration of protein-protein docking geometries. In addition to Monte Carlo steps in translation and orientation of the binding partners, possible conformational changes upon binding are included, based on Monte Carlo selection of protein conformations stored as ordered, pregenerated conformational ensembles. The conformational ensembles of each binding partner protein were generated by three different approaches starting from the unbound partner protein structure, spanning a root mean square deviation of 1-2.5 Å with respect to the unbound structure. Because MC sampling selects appropriate partner conformations on the fly, the approach is not limited by the number of conformations in the ensemble, in contrast to ensemble cross docking, where every conformer pair must be docked. Although only a fraction of the generated conformers was in closer agreement with the bound structure, the REMC ensemble docking approach achieved improved docking results compared to REMC docking with only the unbound partner structures or using docking energy minimization methods. The approach has significant potential for further improvement in combination with more realistic structural ensembles and better docking scoring functions. Proteins 2017; 85:924-937. © 2016 Wiley Periodicals, Inc.
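At the core of any replica-exchange scheme, including the REMC docking above, is the Metropolis swap criterion between neighbouring temperature replicas. The sketch below shows only that generic criterion; the energies, temperatures, and "configurations" are invented stand-ins, not docking scores or protein geometries.

```python
# Minimal sketch of the replica-exchange swap step: two replicas at
# temperatures T_i < T_j exchange configurations with probability
# min(1, exp(delta_beta * delta_E)). Values are illustrative only.
import math
import random

def swap_probability(energy_i, energy_j, temp_i, temp_j, kB=1.0):
    """Metropolis acceptance probability for exchanging replicas i and j."""
    delta_beta = 1.0 / (kB * temp_i) - 1.0 / (kB * temp_j)
    delta_e = energy_i - energy_j
    return min(1.0, math.exp(delta_beta * delta_e))

def attempt_swap(replicas, i, j, rng=random.random):
    """Swap configurations (not temperatures) of replicas i and j if accepted."""
    p = swap_probability(replicas[i]["E"], replicas[j]["E"],
                         replicas[i]["T"], replicas[j]["T"])
    if rng() < p:
        replicas[i]["conf"], replicas[j]["conf"] = replicas[j]["conf"], replicas[i]["conf"]
        replicas[i]["E"], replicas[j]["E"] = replicas[j]["E"], replicas[i]["E"]
        return True
    return False

# A low-energy configuration found in the hot replica always swaps down:
replicas = [{"T": 1.0, "E": 5.0, "conf": "A"}, {"T": 2.0, "E": -3.0, "conf": "B"}]
accepted = attempt_swap(replicas, 0, 1, rng=lambda: 0.99)
print(accepted, replicas[0]["conf"])  # True B
```

In REMC docking the "energy" would be the docking score of the current translation, orientation, and selected ensemble conformation.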
NASA Astrophysics Data System (ADS)
Durner, Maximilian; Márton, Zoltán; Hillenbrand, Ulrich; Ali, Haider; Kleinsteuber, Martin
2017-03-01
In this work, a new ensemble method for the task of category recognition in different environments is presented. The focus is on service robotic perception in an open environment, where the robot's task is to recognize previously unseen objects of predefined categories, based on training on a public dataset. We propose an ensemble learning approach to be able to flexibly combine complementary sources of information (different state-of-the-art descriptors computed on color and depth images), based on a Markov Random Field (MRF). By exploiting its specific characteristics, the MRF ensemble method can also be executed as a Dynamic Classifier Selection (DCS) system. In the experiments, the committee- and topology-dependent performance boost of our ensemble is shown. Despite reduced computational costs and using less information, our strategy performs on the same level as common ensemble approaches. Finally, the impact of large differences between datasets is analyzed.
A Novel Scoring Metrics for Quality Assurance of Ocean Color Observations
NASA Astrophysics Data System (ADS)
Wei, J.; Lee, Z.
2016-02-01
Interpretation of ocean bio-optical properties from ocean color observations depends on the quality of the ocean color data, specifically the spectrum of remote sensing reflectance (Rrs). In situ and remotely measured Rrs spectra are inevitably subject to errors induced by instrument calibration, sea-surface correction, atmospheric correction, and other environmental factors. Great efforts have been devoted to ocean color calibration and validation, yet there exist no objective, consensus criteria for assessing ocean color data quality. In this study, that gap is filled by developing a novel metric for data quality assurance and quality control (QA/QC). This new QA metric is not intended to discard "suspicious" Rrs spectra from available datasets. Rather, it takes the Rrs spectral shapes and amplitudes into account as a whole and grades each Rrs spectrum. The scoring system is developed from a large ensemble of in situ hyperspectral remote sensing reflectance data measured in various aquatic environments and processed with robust procedures. The system is tested with the NASA bio-Optical Marine Algorithm Data set (NOMAD), with results indicating significant improvements in the estimation of bio-optical properties when Rrs spectra marked with higher quality assurance are used. The scoring system is further verified with simulated data and satellite ocean color data in various regions, and we envision higher quality ocean color products with the implementation of such a quality screening system.
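The grading idea above, scoring a spectrum by its shape rather than rejecting it outright, can be loosely sketched as follows. This is an assumption-laden illustration: the reference shape, the wavelengths, and the cosine-similarity grade are all invented here and are not the published QA system's reference spectra or score definition.

```python
# Loose sketch: normalize each Rrs spectrum to unit shape and grade it by
# similarity to a reference shape. All spectra below are invented numbers.
import math

def normalize(spectrum):
    """Scale a spectrum to unit Euclidean norm (shape only, amplitude removed)."""
    norm = math.sqrt(sum(v * v for v in spectrum))
    return [v / norm for v in spectrum]

def shape_score(rrs, reference):
    """Cosine similarity between unit-normalized shapes (1.0 = identical shape)."""
    a, b = normalize(rrs), normalize(reference)
    return sum(x * y for x, y in zip(a, b))

reference = [0.004, 0.006, 0.008, 0.005, 0.002]   # invented blue-green shape
good = [v * 1.5 for v in reference]               # same shape, brighter scene
bad = [0.002, 0.002, 0.002, 0.006, 0.009]         # distorted, red-heavy shape
print(round(shape_score(good, reference), 3), shape_score(bad, reference) < 0.9)
# 1.0 True
```

A brightness change leaves the grade untouched, while a shape distortion (e.g., from a failed atmospheric correction) lowers it, which is the behaviour the abstract's metric is designed around.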
NASA Astrophysics Data System (ADS)
Chellasamy, Menaka; Ferré, Ty Paul Andrew; Greve, Mogens Humlekrog
2016-07-01
Beginning in 2015, Danish farmers were obliged to meet specific crop diversification rules, based on total land area and number of crops cultivated, to be eligible for new greening subsidies. Hence, the Danish government needs to extend its subsidy control system to verify farmers' declarations and warrant greening payments under the new crop diversification rules. Remote Sensing (RS) technology has been used since 1992 to control farmers' subsidies in Denmark. However, a proper RS-based approach is yet to be finalised to validate the new crop diversity requirements designed for assessing compliance under the recent subsidy scheme (2014-2020). This study uses an ensemble classification approach (proposed by the authors in previous studies) for validating the crop diversity requirements of the new rules. The approach uses a neural network ensemble classification system with bi-temporal (spring and early summer) WorldView-2 (WV2) imagery and includes the following steps: (1) automatic computation of pixel-based prediction probabilities using multiple neural networks; (2) quantification of the classification uncertainty using Endorsement Theory (ET); (3) discrimination of crop pixels and validation of the crop diversification rules at farm level; and (4) identification of farmers who are violating the requirements for greening subsidies. The prediction probabilities are computed by a neural network ensemble supplied with training samples selected automatically from farmers' declared parcels (field vectors containing crop information and the field boundary of each crop). Crop discrimination is performed by considering a set of conclusions derived from the individual neural networks based on ET. Verification of the diversification rules is performed by incorporating pixel-based classification uncertainty, or confidence intervals, with the class labels at farm level.
The proposed approach was tested with WV2 imagery acquired in 2011 for a study area in Vennebjerg, Denmark, containing 132 farmers, 1258 fields, and 18 crops. The classification results show an overall accuracy of 90.2%. The RS-based results suggest that 36 farmers did not follow the crop diversification rules that would qualify them for the greening subsidies. When compared to the farmers' reported crop mixes, irrespective of the rules, the RS results indicate that false crop declarations were made by 8 farmers, covering 15 fields. If the farmers' reports had been submitted for the new greening subsidies, 3 of these farmers would have made a false claim, while the remaining 5 would still have met the required crop proportions despite submitting a false crop code, owing to their small holding sizes. The RS results would have supported 96 farmers for greening subsidy claims, with no instances of suggesting a greening subsidy for a holding that the farmer did not report as meeting the required conditions. These results suggest that the proposed RS-based method shows great promise for validating the new greening subsidies in Denmark.
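Step (1) of the approach above, turning per-classifier pixel predictions into probabilities, can be sketched as simple vote fractions with a margin-based confidence flag. The Endorsement-Theory step is not reproduced here; the crop labels and the 0.2 margin threshold are illustrative assumptions.

```python
# Hedged sketch: vote-fraction probabilities over an ensemble of classifiers,
# with a simple top-two-margin confidence flag (not Endorsement Theory).
from collections import Counter

def vote_probabilities(predictions):
    """predictions: list of class labels, one per ensemble member."""
    counts = Counter(predictions)
    total = len(predictions)
    return {label: n / total for label, n in counts.items()}

def classify_with_uncertainty(predictions, margin_threshold=0.2):
    """Return (winning label, its probability, confident?)."""
    probs = vote_probabilities(predictions)
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    best = ranked[0]
    runner_up = ranked[1] if len(ranked) > 1 else (None, 0.0)
    margin = best[1] - runner_up[1]
    return best[0], best[1], margin >= margin_threshold

# Ten hypothetical networks vote on one pixel:
votes = ["wheat"] * 7 + ["barley"] * 2 + ["rape"] * 1
label, prob, confident = classify_with_uncertainty(votes)
print(label, prob, confident)  # wheat 0.7 True
```

Aggregating such per-pixel labels and confidence flags over each declared parcel is what allows the rules to be verified at farm level.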
NASA Technical Reports Server (NTRS)
Tsay, Si-Chee; Maring, Hal B.; Lin, Neng-Huei; Buntoung, Sumaman; Chantara, Somporn; Chuang, Hsiao-Chi; Gabriel, Philip M.; Goodloe, Colby S.; Holben, Brent N.; Hsiao, Ta-Chih;
2016-01-01
The objectives of the 7-SEAS/BASELInE (Seven SouthEast Asian Studies / Biomass-burning Aerosols and Stratocumulus Environment: Lifecycles and Interactions Experiment) campaigns in spring 2013-2015 were to synergize measurements from uniquely distributed ground-based networks (e.g., AERONET (AErosol RObotic NETwork) and MPLNET (NASA Micro-Pulse Lidar Network)) and sophisticated platforms (e.g., SMARTLabs (Surface-based Mobile Atmospheric Research and Testbed Laboratories), regional contributing instruments), along with satellite observations and retrievals and regional atmospheric transport and chemistry models, to establish a critically needed database and to advance our understanding of biomass-burning aerosols and trace gases in Southeast Asia (SEA). We present a satellite-surface perspective of 7-SEAS/BASELInE and highlight scientific findings concerning: (1) regional meteorology of moisture fields conducive to the production and maintenance of low-level stratiform clouds over land; (2) atmospheric composition in a biomass-burning environment, particularly tracers and markers that serve as important indicators for assessing the state and evolution of atmospheric constituents; (3) applications of remote sensing to air quality and its impact on radiative energetics, examining the effect of diurnal variability of boundary-layer height on aerosol loading; (4) aerosol hygroscopicity and ground-based cloud radar measurements in aerosol-cloud processes studied by advanced cloud ensemble models; and (5) implications of air quality, in terms of the toxicity of nanoparticles and trace gases, for human health. This volume is the third 7-SEAS special issue (after Atmospheric Research, vol. 122, 2013; and Atmospheric Environment, vol. 78, 2013) and includes 27 published papers, with emphasis on air quality and aerosol-cloud effects on the environment.
BASELInE observations of stratiform clouds over SEA are unique: such clouds are embedded in a heavily aerosol-laden environment and feature characteristically greater stability over land than over ocean, with minimal radar surface clutter at high vertical spatial resolution. To facilitate an improved understanding of regional aerosol-cloud effects, we envision that future BASELInE-like measurement and modeling needs fall into two categories: (1) efficient yet critical in-situ profiling of the boundary layer for validating remote-sensing retrievals and for initializing regional transport, chemistry, and cloud ensemble models; and (2) fully utilizing the high observing frequencies of geostationary satellites to resolve the diurnal cycle of the boundary-layer height as it affects the loading of biomass-burning aerosols, air quality, and radiative energetics.
An efficient ensemble learning method for gene microarray classification.
Osareh, Alireza; Shadgar, Bita
2013-01-01
Gene microarray analysis and classification have proved effective for the diagnosis of diseases and cancers. However, it has also been shown that basic classification techniques have intrinsic drawbacks in achieving accurate gene classification and cancer diagnosis. On the other hand, classifier ensembles have received increasing attention in various applications. Here, we address the gene classification issue using the RotBoost ensemble methodology. This method combines the Rotation Forest and AdaBoost techniques and thereby preserves both desirable features of an ensemble architecture: accuracy and diversity. To select a concise subset of informative genes, 5 different feature selection algorithms are considered. To assess the efficiency of RotBoost, other non-ensemble and ensemble techniques, including Decision Trees, Support Vector Machines, Rotation Forest, AdaBoost, and Bagging, are also deployed. Experimental results reveal that the combination of the fast correlation-based feature selection method with the ICA-based RotBoost ensemble is highly effective for gene classification. In fact, the proposed method can create ensemble classifiers which outperform not only the classifiers produced by conventional machine learning but also the classifiers generated by two widely used ensemble learning methods, Bagging and AdaBoost.
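The AdaBoost half of RotBoost rests on the classic sample-weight update: misclassified samples gain weight so the next weak learner focuses on them. The sketch below shows only that update; the Rotation Forest rotations, gene-expression features, and feature selection steps of the abstract are omitted, and the labels are invented.

```python
# Hedged sketch of one AdaBoost.M1 round: compute the weak learner's weighted
# error, its ensemble weight alpha, and the reweighted (normalized) samples.
import math

def adaboost_round(sample_weights, y_true, y_pred):
    """Return (classifier weight alpha, new normalized sample weights)."""
    err = sum(w for w, t, p in zip(sample_weights, y_true, y_pred) if t != p)
    err = max(min(err, 1 - 1e-10), 1e-10)   # guard against 0 or 1 error
    alpha = 0.5 * math.log((1 - err) / err)
    new_w = [w * math.exp(-alpha if t == p else alpha)
             for w, t, p in zip(sample_weights, y_true, y_pred)]
    z = sum(new_w)
    return alpha, [w / z for w in new_w]

# Four samples with uniform weights; the weak learner misclassifies sample 3.
w0 = [0.25] * 4
alpha, w1 = adaboost_round(w0, [+1, +1, -1, -1], [+1, +1, -1, +1])
print(round(alpha, 3), [round(w, 3) for w in w1])  # 0.549 [0.167, 0.167, 0.167, 0.5]
```

In RotBoost, each weak learner would additionally be trained on a rotated (PCA-transformed) random feature partition before this update is applied.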
Determination of the conformational ensemble of the TAR RNA by X-ray scattering interferometry.
Shi, Xuesong; Walker, Peter; Harbury, Pehr B; Herschlag, Daniel
2017-05-05
The conformational ensembles of structured RNAs are crucial for biological function, but they remain difficult to elucidate experimentally. We demonstrate with HIV-1 TAR RNA that X-ray scattering interferometry (XSI) can be used to determine RNA conformational ensembles. XSI is based on site-specifically labeling RNA with pairs of heavy-atom probes and precisely measuring the distribution of inter-probe distances that arises from a heterogeneous mixture of RNA solution structures. We show that the XSI-based model of the TAR RNA ensemble closely resembles an independent model derived from NMR-RDC data. Further, we show how the TAR RNA ensemble changes shape at different salt concentrations. Finally, we demonstrate that a single hybrid model of the TAR RNA ensemble simultaneously fits both the XSI and NMR-RDC data sets, and show that XSI can be combined with NMR-RDC to further improve the quality of the determined ensemble. The results suggest that XSI will be a powerful approach for characterizing the solution conformational ensembles of RNAs and RNA-protein complexes under diverse solution conditions. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Creating "Intelligent" Climate Model Ensemble Averages Using a Process-Based Framework
NASA Astrophysics Data System (ADS)
Baker, N. C.; Taylor, P. C.
2014-12-01
The CMIP5 archive contains future climate projections from over 50 models provided by dozens of modeling centers around the world. Individual model projections, however, are subject to biases created by structural model uncertainties. As a result, ensemble averaging of multiple models is often used to add value to model projections: consensus projections have been shown to consistently outperform individual models. Previous reports for the IPCC establish climate change projections based on an equal-weighted average of all model projections. However, certain models reproduce climate processes better than others. Should models be weighted based on performance? Unequal ensemble averages have previously been constructed using a variety of mean-state metrics. Which metrics are most relevant for constraining future climate projections? This project develops a framework for systematically testing metrics in models to identify optimal metrics for unequally weighting multi-model ensembles. A unique aspect of this project is the construction and testing of climate process-based model evaluation metrics. A climate process-based metric is defined as a metric based on the relationship between two physically related climate variables (e.g., outgoing longwave radiation and surface temperature). Metrics are constructed using high-quality Earth radiation budget data from NASA's Clouds and the Earth's Radiant Energy System (CERES) instrument and surface temperature data sets. It is found that regional values of tested quantities can vary significantly when comparing weighted and unweighted model ensembles. For example, one tested metric weights the ensemble by how well models reproduce the time-series probability distribution of the cloud forcing component of reflected shortwave radiation.
The weighted ensemble for this metric indicates lower simulated precipitation (up to 0.7 mm/day) in tropical regions than the unweighted ensemble; since CMIP5 models have been shown to overproduce precipitation, this result could indicate that the metric is effective in identifying models which simulate more realistic precipitation. Ultimately, the goal of the framework is to identify performance metrics that inform better methods for ensemble averaging and so produce better climate predictions.
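The contrast between equal and performance-based weighting described above reduces to a weighted mean. The sketch below uses invented projections and skill scores; it is the arithmetic of the idea, not the study's metric framework.

```python
# Hedged sketch: skill-weighted ensemble mean of model projections.
# Projections (in K) and skill scores are invented illustrative numbers.
def weighted_ensemble_mean(projections, skill_scores):
    """Weighted mean with weights proportional to skill; identical scores
    recover the equal-weighted average used in previous IPCC reports."""
    total = sum(skill_scores)
    return sum((s / total) * p for s, p in zip(skill_scores, projections))

proj = [2.0, 3.0, 4.0]                            # three models' projections
equal = weighted_ensemble_mean(proj, [1, 1, 1])   # equal weighting
skill = weighted_ensemble_mean(proj, [2, 1, 1])   # first model scores best
print(round(equal, 2), skill)  # 3.0 2.75
```

The whole research question in the abstract is, in effect, which metric should supply the `skill_scores` vector.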
Integrating remotely sensed surface water extent into continental scale hydrology.
Revilla-Romero, Beatriz; Wanders, Niko; Burek, Peter; Salamon, Peter; de Roo, Ad
2016-12-01
In hydrological forecasting, data assimilation techniques are employed to improve estimates of initial conditions by updating incorrect model states with observational data. However, the limited availability of continuous, up-to-date ground streamflow data is one of the main constraints for large-scale flood forecasting models. This is the first study to assess the impact of assimilating daily remotely sensed surface water extent, at 0.1° × 0.1° spatial resolution and derived from the Global Flood Detection System (GFDS), into a global rainfall-runoff model covering large ungauged areas at the continental scale in Africa and South America. Surface water extent is observed using a range of passive microwave remote sensors. The detection methodology uses brightness temperature, as water bodies have a lower emissivity than land; in a time series, the satellite signal is expected to vary with changes in water surface, and anomalies can be correlated with flood events. The Ensemble Kalman Filter (EnKF), a Monte Carlo implementation of data assimilation, is used here by applying random sampling perturbations to the precipitation inputs to account for uncertainty, yielding ensemble streamflow simulations from the LISFLOOD model. Results of the updated streamflow simulation are compared to baseline simulations without assimilation of the satellite-derived surface water extent. Validation is performed at over 100 in situ river gauges using daily streamflow observations in Africa and South America over a one-year period. Commonly used hydrological metrics were calculated: KGE', NSE, PBIAS%, R², RMSE, and VE. Results show that, for example, the NSE score improved at 61 of 101 stations, with significant improvements in both the timing and volume of the flow peaks. Validation at gauges located in lowland jungle showed the poorest performance, mainly due to the influence of closed forest on the satellite signal retrieval.
The conclusion is that remotely sensed surface water extent holds potential for improving rainfall-runoff streamflow simulations, and could lead to better forecasts of peak flow.
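The analysis step at the heart of the EnKF scheme described above can be sketched for a toy scalar state. This is the generic stochastic (perturbed-observation) EnKF update with an identity observation operator; the streamflow numbers are invented and none of the LISFLOOD or GFDS specifics are modelled.

```python
# Hedged sketch: one perturbed-observation EnKF analysis step on a scalar
# "streamflow" ensemble. Observation operator H is identity; values invented.
import numpy as np

def enkf_update(ensemble, obs, obs_var, rng):
    """ensemble: (n_members,) forecast states; returns the analysis ensemble."""
    forecast_var = ensemble.var(ddof=1)
    gain = forecast_var / (forecast_var + obs_var)   # scalar Kalman gain (H = I)
    perturbed_obs = obs + rng.normal(0.0, np.sqrt(obs_var), ensemble.size)
    return ensemble + gain * (perturbed_obs - ensemble)

rng = np.random.default_rng(0)
forecast = rng.normal(100.0, 10.0, 500)      # spread-out prior streamflow (m^3/s)
analysis = enkf_update(forecast, 120.0, 4.0, rng)
# The analysis ensemble moves toward the observation and its spread tightens:
print(round(float(analysis.mean()), 1), bool(analysis.std() < forecast.std()))
```

In the study, the ensemble spread comes from perturbing precipitation inputs rather than the states directly, but the update that pulls members toward the satellite-derived observation has this form.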
NASA Astrophysics Data System (ADS)
Lahmiri, Salim; Boukadoum, Mounir
2015-08-01
We present a new ensemble system for stock market returns prediction in which the continuous wavelet transform (CWT) is used to analyze return series and backpropagation neural networks (BPNNs) process the CWT-based coefficients, determine the optimal ensemble weights, and provide the final forecasts. Particle swarm optimization (PSO) is used to find the optimal weights and biases for each BPNN. To capture symmetry and asymmetry in the underlying data, three wavelet functions with different shapes are adopted. The proposed ensemble system was tested on data from three Asian stock markets: the Hang Seng, KOSPI, and Taiwan markets. Three statistical metrics were used to evaluate forecasting accuracy: mean absolute error (MAE), root mean squared error (RMSE), and mean absolute deviation (MAD). Experimental results showed that the proposed ensemble system outperformed the individual CWT-ANN models, each with a different wavelet function, as well as the conventional autoregressive moving average process. The proposed ensemble system is thus well suited to capturing symmetry and asymmetry in financial data fluctuations for better prediction accuracy.
Shafizadeh-Moghadam, Hossein; Valavi, Roozbeh; Shahabi, Himan; Chapi, Kamran; Shirzadi, Ataollah
2018-07-01
In this research, eight individual machine learning and statistical models are implemented and compared, and, based on their results, seven ensemble models for flood susceptibility assessment are introduced. The individual models were artificial neural networks, classification and regression trees, flexible discriminant analysis, a generalized linear model, a generalized additive model, boosted regression trees, multivariate adaptive regression splines, and maximum entropy; the ensemble models were Ensemble Model committee averaging (EMca), Ensemble Model confidence interval Inferior (EMciInf), Ensemble Model confidence interval Superior (EMciSup), Ensemble Model estimating the coefficient of variation (EMcv), Ensemble Model estimating the mean (EMmean), Ensemble Model estimating the median (EMmedian), and Ensemble Model based on weighted mean (EMwmean). The data set covered 201 flood events in the Haraz watershed (Mazandaran province, Iran) and 10,000 randomly selected non-occurrence points. Among the individual models, the highest Area Under the Receiver Operating Characteristic curve (AUROC) belonged to boosted regression trees (0.975) and the lowest to the generalized linear model (0.642). The proposed EMmedian achieved the highest accuracy (0.976) among all models. Despite the outstanding performance of some individual models, variability among their predictions was considerable. Therefore, to reduce uncertainty and create more generalizable, more stable, and less sensitive models, ensemble forecasting approaches, and in particular EMmedian, are recommended for flood susceptibility assessment. Copyright © 2018 Elsevier Ltd. All rights reserved.
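The EMmedian idea above, combining member susceptibility scores at each location by their median, can be shown in a couple of lines. The scores are invented; the point is only that the median damps the single-model outliers that distort a plain mean.

```python
# Hedged sketch: median vs. mean combination of per-model susceptibility
# scores (0-1) for one location. Scores are invented illustrative values.
from statistics import median, mean

def ensemble_median(member_scores):
    """EMmedian-style combination: the median of the members' scores."""
    return median(member_scores)

scores = [0.91, 0.88, 0.90, 0.12, 0.89]   # one outlier model among five
print(round(mean(scores), 2), ensemble_median(scores))  # 0.74 0.89
```

The mean is dragged to 0.74 by the single outlier, while the median stays at 0.89, which is consistent with the abstract's finding that EMmedian is the more stable combiner.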
The Oral Tradition in the Sankofa Drum and Dance Ensemble: Student Perceptions
ERIC Educational Resources Information Center
Hess, Juliet
2009-01-01
The Sankofa Drum and Dance Ensemble is a Ghanaian drum and dance ensemble that focuses on music in the Ewe tradition. It is based in an elementary school in the Greater Toronto Area and consists of students in Grades 4 through 8. Students in the ensemble study Ghanaian traditional Ewe drumming and dancing in the oral tradition. Nine students…
Cerruela García, G; García-Pedrajas, N; Luque Ruiz, I; Gómez-Nieto, M Á
2018-03-01
This paper proposes a method for molecular activity prediction in QSAR studies using ensembles of classifiers constructed by means of two supervised subspace projection methods, namely nonparametric discriminant analysis (NDA) and hybrid discriminant analysis (HDA). We studied the performance of the proposed ensembles compared to classical ensemble methods using four molecular datasets and eight different models for the representation of the molecular structure. Using several measures and statistical tests for classifier comparison, we observe that our proposal improves the classification results with respect to classical ensemble methods. Therefore, we show that ensembles constructed using supervised subspace projections offer an effective way of creating classifiers in cheminformatics.
NASA Astrophysics Data System (ADS)
Tito Arandia Martinez, Fabian
2014-05-01
Adequate uncertainty assessment is an important issue in hydrological modelling. For hydropower producers, it is important to obtain ensemble forecasts that truly grasp the uncertainty linked to upcoming streamflows; if properly assessed, this uncertainty can lead to optimal reservoir management and energy production (e.g., [1]). The meteorological inputs to the hydrological model account for an important part of the total uncertainty in streamflow forecasting. Since the creation of the THORPEX initiative and the TIGGE database, meteorological ensemble forecasts from nine agencies throughout the world have been made available. This allows hydrological ensemble forecasts to be based on multiple meteorological ensemble forecasts, so that both the uncertainty linked to the architecture of the meteorological model and the uncertainty linked to the initial condition of the atmosphere can be accounted for. The main objective of this work is to show that a weighted combination of meteorological ensemble forecasts based on different atmospheric models can lead to improved hydrological ensemble forecasts, for horizons from one to ten days. The experiment is performed for the Baskatong watershed, a head subcatchment of the Gatineau watershed in the province of Quebec, Canada. The Baskatong watershed is of great importance for hydropower production, as it comprises the main reservoir for the Gatineau watershed, on which there are six hydropower plants managed by Hydro-Québec. Since the 1970s, Hydro-Québec has used pseudo-ensemble forecasts based on deterministic meteorological forecasts to which variability derived from past forecasting errors is added. We use a combination of meteorological ensemble forecasts (precipitation and temperature) from different models as the main inputs to the hydrological model HSAMI ([2]).
The meteorological ensembles from eight of the nine agencies available through TIGGE are weighted according to their individual performance and combined to form a grand ensemble. Results show that the hydrological forecasts derived from the grand ensemble perform better than the pseudo-ensemble forecasts currently used operationally at Hydro-Québec. References: [1] M. Verbunt, A. Walser, J. Gurtz et al., "Probabilistic flood forecasting with a limited-area ensemble prediction system: Selected case studies," Journal of Hydrometeorology, vol. 8, no. 4, pp. 897-909, Aug. 2007. [2] N. Evora, Valorisation des prévisions météorologiques d'ensemble, Institut de recherche d'Hydro-Québec, 2005. [3] V. Fortin, Le modèle météo-apport HSAMI : historique, théorie et application, Institut de recherche d'Hydro-Québec, 2000.
NASA Astrophysics Data System (ADS)
Flores, Alejandro N.; Bras, Rafael L.; Entekhabi, Dara
2012-08-01
Soil moisture information is critical for applications like landslide susceptibility analysis and military trafficability assessment. Existing technologies cannot observe soil moisture at the spatial scales of hillslopes (e.g., 10⁰ to 10² m) over large areas (e.g., 10² to 10⁵ km²) with sufficiently high temporal coverage (e.g., days). Physics-based hydrologic models can simulate soil moisture at the necessary spatial and temporal scales, albeit with error. We develop and test a data assimilation framework based on the ensemble Kalman filter for constraining uncertain simulated high-resolution soil moisture fields with anticipated remote sensing products, specifically those of NASA's Soil Moisture Active-Passive (SMAP) mission, which will provide global L-band microwave observations approximately every 2-3 days. The framework directly assimilates synthetic SMAP 3 km radar backscatter observations to update hillslope-scale bare-soil moisture estimates from a physics-based model; downscaling from 3 km observations to hillslope scales is achieved through the data assimilation algorithm. Assimilation reduces bias in near-surface soil moisture (e.g., top 10 cm) by approximately 0.05 m³/m³ and expected root-mean-square errors by at least 60% in much of the watershed, relative to an open-loop simulation. However, near-surface moisture estimates in channel and valley bottoms do not improve, and estimates of profile-integrated moisture throughout the watershed do not substantially improve. We discuss the implications of this work, focusing on ongoing efforts to improve soil moisture estimation in the entire soil profile through joint assimilation of other satellite (e.g., vegetation) and in situ soil moisture measurements.
NASA Technical Reports Server (NTRS)
Olson, William S.; Raymond, William H.
1990-01-01
The physical retrieval of geophysical parameters from remotely sensed data requires a sensor response model which relates the upwelling radiances that the sensor observes to the parameters to be retrieved. In the retrieval of precipitation water contents from satellite passive microwave observations, the sensor response model has two basic components. First, a description of the radiative transfer of microwaves through a precipitating atmosphere must be considered, because it is necessary to establish the physical relationship between precipitation water content and upwelling microwave brightness temperature. Second, the spatial response of the satellite microwave sensor (its antenna pattern) must be included, since precipitation and the associated brightness temperature field can vary within a typical microwave sensor resolution footprint. A 'population' of convective cells, as well as stratiform clouds, is simulated using a computationally efficient multi-cylinder cloud model. Ensembles of clouds selected at random from the population, distributed over a 25 km x 25 km model domain, serve as the basis for radiative transfer calculations of upwelling brightness temperatures at the SSM/I frequencies. Sensor spatial response is treated explicitly by convolving the upwelling brightness temperature field with the domain-integrated SSM/I antenna patterns. The sensor response model is utilized in precipitation water content retrievals.
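The antenna-pattern convolution step described above can be sketched with a toy brightness-temperature field. The real SSM/I patterns are not Gaussian and the kernel width here is an illustrative assumption; the sketch only shows how footprint averaging smears a small warm convective cell while conserving the scene mean.

```python
# Hedged sketch: convolve a simulated 2-D brightness-temperature field with a
# normalized Gaussian "antenna pattern" (periodic boundaries via FFT).
import numpy as np

def antenna_convolve(tb_field, sigma_pixels):
    """Footprint-average a square 2-D field with a unit-gain Gaussian kernel."""
    n = tb_field.shape[0]
    x = np.arange(n) - n // 2
    xx, yy = np.meshgrid(x, x)
    pattern = np.exp(-(xx**2 + yy**2) / (2 * sigma_pixels**2))
    pattern /= pattern.sum()                 # unit gain: scene mean preserved
    return np.real(np.fft.ifft2(np.fft.fft2(tb_field) *
                                np.fft.fft2(np.fft.ifftshift(pattern))))

# A 280 K background scene with one 300 K convective-cell pixel:
field = np.full((32, 32), 280.0)
field[16, 16] = 300.0
obs = antenna_convolve(field, sigma_pixels=2.0)
# The peak is smeared well below 300 K, but the scene mean is conserved:
print(bool(obs.max() < 300.0), bool(np.isclose(obs.mean(), field.mean())))  # True True
```

This is why sub-footprint variability of precipitation matters for the retrieval: the sensor sees the smoothed field, not the peak values.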
Orchestrating Distributed Resource Ensembles for Petascale Science
DOE Office of Scientific and Technical Information (OSTI.GOV)
Baldin, Ilya; Mandal, Anirban; Ruth, Paul
2014-04-24
Distributed, data-intensive computational science applications of interest to DOE scientific communities move large amounts of data for experiment data management, distributed analysis steps, remote visualization, and accessing scientific instruments. These applications need to orchestrate ensembles of resources from multiple resource pools and interconnect them with high-capacity multi-layered networks across multiple domains. It is highly desirable that mechanisms are designed that provide this type of resource provisioning capability to a broad class of applications, and it is important to have coherent monitoring capabilities for such complex distributed environments. In this project, we addressed these problems by designing an abstract API, enabled by novel semantic resource descriptions, for provisioning complex and heterogeneous resources from multiple providers using their native provisioning mechanisms and control planes: computational, storage, and multi-layered high-speed network domains. We used an extensible resource representation based on semantic web technologies to afford maximum flexibility to applications in specifying their needs. We evaluated the effectiveness of provisioning using representative data-intensive applications. We also developed mechanisms for providing feedback about resource performance to the application, to enable closed-loop feedback control and dynamic adjustments to resource allocations (elasticity). This was enabled through development of a novel persistent query framework that consumes disparate sources of monitoring data, including perfSONAR, and provides scalable distribution of asynchronous notifications.
FLUXCOM - Overview and First Synthesis
NASA Astrophysics Data System (ADS)
Jung, M.; Ichii, K.; Tramontana, G.; Camps-Valls, G.; Schwalm, C. R.; Papale, D.; Reichstein, M.; Gans, F.; Weber, U.
2015-12-01
We present a community effort aiming at generating an ensemble of global gridded flux products by upscaling FLUXNET data using an array of different machine learning methods including regression/model tree ensembles, neural networks, and kernel machines. We produced products for gross primary production, terrestrial ecosystem respiration, net ecosystem exchange, latent heat, sensible heat, and net radiation for two experimental protocols: 1) at a high spatial (5 arc-minute) and 8-daily temporal resolution using only remote sensing based inputs for the MODIS era; 2) 30-year records at daily, 0.5-degree spatial resolution incorporating meteorological driver data. Within each set-up, all machine learning methods were trained with the same input data for carbon and energy fluxes, respectively. Sets of input driver variables were derived using an extensive formal variable selection exercise. The extrapolation performance of the approaches is assessed with a fully internally consistent cross-validation. We perform cross-consistency checks of the gridded flux products with independent data streams from atmospheric inversions (NEE), sun-induced fluorescence (GPP), catchment water balances (LE, H), satellite products (Rn), and process models. We analyze the uncertainties of the gridded flux products and, for example, provide a breakdown of the uncertainty of mean annual GPP originating from different machine learning methods, different climate input data sets, and different flux partitioning methods. The FLUXCOM archive will provide an unprecedented source of information for water, energy, and carbon cycle studies.
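The uncertainty breakdown mentioned above can be illustrated with a toy computation: given gridded estimates indexed by method and climate forcing, one can compare the spread attributable to each factor. All values and dimensions below are synthetic assumptions, not FLUXCOM data.

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic mean-annual GPP fields indexed [method, forcing, lat, lon]
# (gC m-2 yr-1): three hypothetical ML methods with systematic offsets,
# two climate forcing data sets, on a tiny 4x4 grid
gpp = 1000.0 + rng.normal(0.0, 20.0, (3, 2, 4, 4))
gpp += np.array([-100.0, 0.0, 100.0])[:, None, None, None]  # method offsets

# The ensemble product: average over methods and forcings
ens_mean = gpp.mean(axis=(0, 1))

# Simplified per-pixel uncertainty budget: variance across method means
# versus variance across forcing means
method_var = gpp.mean(axis=1).var(axis=0)
forcing_var = gpp.mean(axis=0).var(axis=0)
```

With the offsets chosen here, the method-to-method spread dominates the forcing-to-forcing spread, mirroring the kind of attribution the abstract describes for mean annual GPP.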
Unsupervised Learning in an Ensemble of Spiking Neural Networks Mediated by ITDP.
Shim, Yoonsik; Philippides, Andrew; Staras, Kevin; Husbands, Phil
2016-10-01
We propose a biologically plausible architecture for unsupervised ensemble learning in a population of spiking neural network classifiers. A mixture of experts type organisation is shown to be effective, with the individual classifier outputs combined via a gating network whose operation is driven by input timing dependent plasticity (ITDP). The ITDP gating mechanism is based on recent experimental findings. An abstract, analytically tractable model of the ITDP driven ensemble architecture is derived from a logical model based on the probabilities of neural firing events. A detailed analysis of this model provides insights that allow it to be extended into a full, biologically plausible, computational implementation of the architecture which is demonstrated on a visual classification task. The extended model makes use of a style of spiking network, first introduced as a model of cortical microcircuits, that is capable of Bayesian inference, effectively performing expectation maximization. The unsupervised ensemble learning mechanism, based around such spiking expectation maximization (SEM) networks whose combined outputs are mediated by ITDP, is shown to perform the visual classification task well and to generalize to unseen data. The combined ensemble performance is significantly better than that of the individual classifiers, validating the ensemble architecture and learning mechanisms. The properties of the full model are analysed in the light of extensive experiments with the classification task, including an investigation into the influence of different input feature selection schemes and a comparison with a hierarchical STDP based ensemble architecture.
Zhu, Guanhua; Liu, Wei; Bao, Chenglong; Tong, Dudu; Ji, Hui; Shen, Zuowei; Yang, Daiwen; Lu, Lanyuan
2018-05-01
The structural variations of multidomain proteins with flexible parts mediate many biological processes, and a structure ensemble can be determined by selecting a weighted combination of representative structures from a simulated structure pool, producing the best fit to experimental constraints such as interatomic distance. In this study, a hybrid structure-based and physics-based atomistic force field with an efficient sampling strategy is adopted to simulate a model di-domain protein against experimental paramagnetic relaxation enhancement (PRE) data that correspond to distance constraints. The molecular dynamics simulations produce a wide range of conformations depicted on a protein energy landscape. Subsequently, a conformational ensemble recovered with low-energy structures and the minimum-size restraint is identified in good agreement with experimental PRE rates, and the result is also supported by chemical shift perturbations and small-angle X-ray scattering data. It is illustrated that the regularizations of energy and ensemble-size prevent an arbitrary interpretation of protein conformations. Moreover, energy is found to serve as a critical control to refine the structure pool and prevent data overfitting, because the absence of energy regularization exposes ensemble construction to the noise from high-energy structures and causes a more ambiguous representation of protein conformations. Finally, we perform structure-ensemble optimizations with a topology-based structure pool, to enhance the understanding on the ensemble results from different sources of pool candidates. © 2018 Wiley Periodicals, Inc.
Force Sensor Based Tool Condition Monitoring Using a Heterogeneous Ensemble Learning Model
Wang, Guofeng; Yang, Yinwei; Li, Zhimeng
2014-01-01
Tool condition monitoring (TCM) plays an important role in improving machining efficiency and guaranteeing workpiece quality. In order to realize reliable recognition of the tool condition, a robust classifier needs to be constructed to depict the relationship between tool wear states and sensory information. However, because of the complexity of the machining process and the uncertainty of the tool wear evolution, it is hard for a single classifier to fit all the collected samples without sacrificing generalization ability. In this paper, heterogeneous ensemble learning is proposed to realize tool condition monitoring in which the support vector machine (SVM), hidden Markov model (HMM) and radial basis function (RBF) are selected as base classifiers and a stacking ensemble strategy is further used to reflect the relationship between the outputs of these base classifiers and tool wear states. Based on the heterogeneous ensemble learning classifier, an online monitoring system is constructed in which the harmonic features are extracted from force signals and a minimal redundancy and maximal relevance (mRMR) algorithm is utilized to select the most prominent features. To verify the effectiveness of the proposed method, a titanium alloy milling experiment was carried out and samples with different tool wear states were collected to build the proposed heterogeneous ensemble learning classifier. Moreover, the homogeneous ensemble learning model and majority voting strategy are also adopted to make a comparison. The analysis and comparison results show that the proposed heterogeneous ensemble learning classifier performs better in both classification accuracy and stability. PMID:25405514
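The stacking strategy described above trains a meta-classifier on the outputs of heterogeneous base classifiers. A minimal self-contained sketch follows; the paper's base learners (SVM, HMM, RBF) are replaced here by toy nearest-centroid classifiers on different feature views, an assumption made purely to keep the example dependency-free.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    """Toy 'tool wear' data: two features, three wear states."""
    y = rng.integers(0, 3, n)
    X = rng.normal(0.0, 0.5, (n, 2)) + y[:, None]  # class-dependent shift
    return X, y

class NearestCentroid:
    """Minimal base classifier trained on a subset of the features."""
    def __init__(self, cols):
        self.cols = cols
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array(
            [X[y == c][:, self.cols].mean(0) for c in self.classes_])
        return self
    def predict(self, X):
        d = np.linalg.norm(X[:, None, self.cols] - self.centroids_, axis=2)
        return self.classes_[d.argmin(1)]

X_tr, y_tr = make_data(300)
X_te, y_te = make_data(100)

# Level 0: heterogeneous base learners on different feature views
bases = [NearestCentroid([0]).fit(X_tr, y_tr),
         NearestCentroid([1]).fit(X_tr, y_tr),
         NearestCentroid([0, 1]).fit(X_tr, y_tr)]

# Level 1 (stacking): a meta-classifier trained on the base predictions
meta_tr = np.stack([b.predict(X_tr) for b in bases], axis=1).astype(float)
meta = NearestCentroid([0, 1, 2]).fit(meta_tr, y_tr)

def stack_predict(X):
    mf = np.stack([b.predict(X) for b in bases], axis=1).astype(float)
    return meta.predict(mf)

acc = (stack_predict(X_te) == y_te).mean()
```

The key design point is that the meta-level sees only the base outputs, so it learns which combinations of base decisions map to which wear state, rather than averaging them blindly.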
Stanescu, Ana; Caragea, Doina
2015-01-01
Recent biochemical advances have led to inexpensive, time-efficient production of massive volumes of raw genomic data. Traditional machine learning approaches to genome annotation typically rely on large amounts of labeled data. The process of labeling data can be expensive, as it requires domain knowledge and expert involvement. Semi-supervised learning approaches that can make use of unlabeled data, in addition to small amounts of labeled data, can help reduce the costs associated with labeling. In this context, we focus on the problem of predicting splice sites in a genome using semi-supervised learning approaches. This is a challenging problem, due to the highly imbalanced distribution of the data, i.e., small number of splice sites as compared to the number of non-splice sites. To address this challenge, we propose to use ensembles of semi-supervised classifiers, specifically self-training and co-training classifiers. Our experiments on five highly imbalanced splice site datasets, with positive to negative ratios of 1-to-99, showed that the ensemble-based semi-supervised approaches represent a good choice, even when the amount of labeled data consists of less than 1% of all training data. In particular, we found that ensembles of co-training and self-training classifiers that dynamically balance the set of labeled instances during the semi-supervised iterations show improvements over the corresponding supervised ensemble baselines. In the presence of limited amounts of labeled data, ensemble-based semi-supervised approaches can successfully leverage the unlabeled data to enhance supervised ensembles learned from highly imbalanced data distributions. Given that such distributions are common for many biological sequence classification problems, our work can be seen as a stepping stone towards more sophisticated ensemble-based approaches to biological sequence annotation in a semi-supervised framework.
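The self-training component of the approach above can be sketched in a few lines: a base learner is fit on the small labeled pool, then the most confident unlabeled points are pseudo-labeled and folded in, iteratively. The 1-D threshold learner and data below are illustrative assumptions; the paper's ensembles of self-training and co-training classifiers are more elaborate.

```python
import numpy as np

rng = np.random.default_rng(1)

def gen(n):
    """Toy 1-D two-class data standing in for splice / non-splice examples."""
    y = (rng.random(n) < 0.5).astype(int)
    x = rng.normal(y * 2.0, 0.8)
    return x, y

x_lab, y_lab = gen(20)      # small labeled pool
x_unlab, _ = gen(500)       # large unlabeled pool

def fit_threshold(x, y):
    """Midpoint-between-class-means classifier (stand-in base learner)."""
    return (x[y == 0].mean() + x[y == 1].mean()) / 2.0

# Self-training: repeatedly pseudo-label the most confident unlabeled
# points (those farthest from the decision threshold) and retrain.
for _ in range(5):
    t = fit_threshold(x_lab, y_lab)
    confidence = np.abs(x_unlab - t)
    take = np.argsort(confidence)[-50:]          # 50 most confident points
    pseudo = (x_unlab[take] > t).astype(int)
    x_lab = np.concatenate([x_lab, x_unlab[take]])
    y_lab = np.concatenate([y_lab, pseudo])
    x_unlab = np.delete(x_unlab, take)

t = fit_threshold(x_lab, y_lab)
x_te, y_te = gen(200)
acc = ((x_te > t) == y_te).mean()
```

In the paper's setting the crucial extra step is dynamically balancing the pseudo-labeled set across the rare (splice) and common (non-splice) classes; the sketch omits that balancing.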
A Random Forest-based ensemble method for activity recognition.
Feng, Zengtao; Mo, Lingfei; Li, Meng
2015-01-01
This paper presents a multi-sensor ensemble approach to human physical activity (PA) recognition, using random forest. We designed an ensemble learning algorithm, which integrates several independent Random Forest classifiers based on different sensor feature sets to build a more stable, more accurate and faster classifier for human activity recognition. To evaluate the algorithm, PA data collected from the PAMAP (Physical Activity Monitoring for Aging People) database, which is a standard, publicly available database, was utilized for training and testing. The experimental results show that the algorithm is able to correctly recognize 19 PA types with an accuracy of 93.44%, while requiring less training time than comparable methods. The ensemble classifier system based on the RF (Random Forest) algorithm can achieve high recognition accuracy and fast calculation.
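The fusion step, combining independent per-sensor classifiers by majority vote, can be illustrated without training real forests. Below, each per-sensor Random Forest is simulated as a noisy predictor with its own accuracy; the accuracies and class count are assumptions for the sketch only.

```python
import numpy as np

rng = np.random.default_rng(3)

n, k = 1000, 19               # samples and activity classes (19 PA types)
truth = rng.integers(0, k, n)

def noisy_classifier(p_correct):
    """Stand-in for a per-sensor Random Forest with given accuracy."""
    wrong = rng.integers(0, k, n)
    correct = rng.random(n) < p_correct
    return np.where(correct, truth, wrong)

# Three independent per-sensor classifiers of decreasing quality
preds = np.stack([noisy_classifier(p) for p in (0.85, 0.80, 0.75)])

def vote(col):
    """Majority vote over one sample; ties go to the first classifier."""
    counts = np.bincount(col, minlength=k)
    winners = np.flatnonzero(counts == counts.max())
    return col[0] if col[0] in winners else winners[0]

fused = np.apply_along_axis(vote, 0, preds)
acc_each = [(p == truth).mean() for p in preds]
acc_fused = (fused == truth).mean()
```

Because wrong answers over 19 classes rarely coincide, the fused vote is more accurate than any single simulated classifier, which is the stability argument behind the multi-sensor ensemble.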
Application of an Ensemble Smoother to Precipitation Assimilation
NASA Technical Reports Server (NTRS)
Zhang, Sara; Zupanski, Dusanka; Hou, Arthur; Zupanski, Milija
2008-01-01
Assimilation of precipitation in a global modeling system poses a special challenge in that the observation operators for precipitation processes are highly nonlinear. In the variational approach, substantial development work and model simplifications are required to include precipitation-related physical processes in the tangent linear model and its adjoint. An ensemble based data assimilation algorithm "Maximum Likelihood Ensemble Smoother (MLES)" has been developed to explore the ensemble representation of the precipitation observation operator with nonlinear convection and large-scale moist physics. An ensemble assimilation system based on the NASA GEOS-5 GCM has been constructed to assimilate satellite precipitation data within the MLES framework. The configuration of the smoother takes the time dimension into account for the relationship between state variables and observable rainfall. The full nonlinear forward model ensembles are used to represent components involving the observation operator and its transpose. Several assimilation experiments using satellite precipitation observations have been carried out to investigate the effectiveness of the ensemble representation of the nonlinear observation operator and the data impact of assimilating rain retrievals from the TMI and SSM/I sensors. Preliminary results show that this ensemble assimilation approach is capable of extracting information from nonlinear observations to improve the analysis and forecast if ensemble size is adequate, and a suitable localization scheme is applied. In addition to a dynamically consistent precipitation analysis, the assimilation system produces a statistical estimate of the analysis uncertainty.
Mixture EMOS model for calibrating ensemble forecasts of wind speed.
Baran, S; Lerch, S
2016-03-01
Ensemble model output statistics (EMOS) is a statistical tool for post-processing forecast ensembles of weather variables obtained from multiple runs of numerical weather prediction models in order to produce calibrated predictive probability density functions. The EMOS predictive probability density function is given by a parametric distribution with parameters depending on the ensemble forecasts. We propose an EMOS model for calibrating wind speed forecasts based on weighted mixtures of truncated normal (TN) and log-normal (LN) distributions where model parameters and component weights are estimated by optimizing the values of proper scoring rules over a rolling training period. The new model is tested on wind speed forecasts of the 50 member European Centre for Medium-range Weather Forecasts ensemble, the 11 member Aire Limitée Adaptation dynamique Développement International-Hungary Ensemble Prediction System ensemble of the Hungarian Meteorological Service, and the eight-member University of Washington mesoscale ensemble, and its predictive performance is compared with that of various benchmark EMOS models based on single parametric families and combinations thereof. The results indicate improved calibration of probabilistic forecasts and accuracy of point forecasts in comparison with the raw ensemble and climatological forecasts. The mixture EMOS model significantly outperforms the TN and LN EMOS methods; moreover, it provides better calibrated forecasts than the TN-LN combination model and offers an increased flexibility while avoiding covariate selection problems. © 2016 The Authors Environmetrics Published by John Wiley & Sons Ltd.
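The predictive density of such a mixture EMOS model can be written down directly: a weighted sum of a zero-truncated normal and a log-normal, with location parameters linked to the ensemble mean. The link coefficients, fixed scales, and weight below are illustrative assumptions; in the paper they are fitted by optimizing proper scoring rules (e.g. the CRPS) over a rolling training period.

```python
import numpy as np
from math import erf, sqrt, pi

def norm_cdf(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def tn0_pdf(x, mu, sigma):
    """Density of a normal distribution truncated to [0, inf)."""
    z = (x - mu) / sigma
    return (np.exp(-0.5 * z**2) / (sigma * sqrt(2.0 * pi))
            / (1.0 - norm_cdf(-mu / sigma)))

def ln_pdf(x, m, s):
    """Log-normal density."""
    return np.exp(-0.5 * ((np.log(x) - m) / s) ** 2) / (x * s * sqrt(2.0 * pi))

def mixture_emos_pdf(x, ens, w, a, b, c, d):
    """w * TN + (1 - w) * LN; locations linked to the ensemble mean."""
    mu = a + b * ens.mean()
    m = c + d * np.log(ens.mean())
    return w * tn0_pdf(x, mu, 1.0) + (1.0 - w) * ln_pdf(x, m, 0.4)

ens = np.array([4.1, 5.3, 3.8, 4.9, 6.0])   # raw wind-speed ensemble (m/s)
x = np.linspace(0.05, 20.0, 2000)
pdf = mixture_emos_pdf(x, ens, w=0.6, a=0.2, b=1.0, c=0.1, d=1.0)
mass = np.sum(0.5 * (pdf[1:] + pdf[:-1]) * np.diff(x))  # should be ~1
```

The appeal of the mixture is visible in the construction: the TN component handles the bulk of moderate wind speeds while the LN component contributes a heavier right tail, without having to pick one family in advance.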
Algorithms that Defy the Gravity of Learning Curve
2017-04-28
three nearest neighbour-based anomaly detectors, i.e., an ensemble of nearest neighbours, a recent nearest neighbour-based ensemble method called iNNE...streams. Note that the change in sample size does not alter the geometrical data characteristics discussed here. 3.1 Experimental Methodology ...need to be answered. 3.6 Comparison with conventional ensemble methods Given the theoretical results, the third aim of this project (i.e., identify the
DOE Office of Scientific and Technical Information (OSTI.GOV)
Man, Jun; Zhang, Jiangjiang; Li, Weixuan
2016-10-01
The ensemble Kalman filter (EnKF) has been widely used in parameter estimation for hydrological models. The focus of most previous studies was to develop more efficient analysis (estimation) algorithms. On the other hand, it is intuitively understandable that a well-designed sampling (data-collection) strategy should provide more informative measurements and subsequently improve the parameter estimation. In this work, a Sequential Ensemble-based Optimal Design (SEOD) method, coupled with EnKF, information theory and sequential optimal design, is proposed to improve the performance of parameter estimation. Based on the first-order and second-order statistics, different information metrics including the Shannon entropy difference (SD), degrees of freedom for signal (DFS) and relative entropy (RE) are used to design the optimal sampling strategy, respectively. The effectiveness of the proposed method is illustrated by synthetic one-dimensional and two-dimensional unsaturated flow case studies. It is shown that the designed sampling strategies can provide more accurate parameter estimation and state prediction compared with conventional sampling strategies. Optimal sampling designs based on various information metrics perform similarly in our cases. The effect of ensemble size on the optimal design is also investigated. Overall, larger ensemble size improves the parameter estimation and convergence of optimal sampling strategy. Although the proposed method is applied to unsaturated flow problems in this study, it can be equally applied in any other hydrological problems.
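The EnKF analysis step at the core of the method above can be sketched compactly: sample covariances from the ensemble yield a Kalman gain that pulls each member toward (perturbed) observations. The toy two-parameter problem below is an assumption for illustration, not the paper's unsaturated flow case.

```python
import numpy as np

rng = np.random.default_rng(4)

def enkf_update(ens, obs, H, obs_var):
    """Stochastic EnKF analysis step; ens has shape (n_state, n_members)."""
    n_mem = ens.shape[1]
    A = ens - ens.mean(axis=1, keepdims=True)          # state anomalies
    HE = H @ ens
    HA = HE - HE.mean(axis=1, keepdims=True)           # predicted-obs anomalies
    P_hh = HA @ HA.T / (n_mem - 1) + obs_var * np.eye(len(obs))
    P_xh = A @ HA.T / (n_mem - 1)
    K = P_xh @ np.linalg.inv(P_hh)                     # Kalman gain
    perturbed = obs[:, None] + rng.normal(0.0, np.sqrt(obs_var), (len(obs), n_mem))
    return ens + K @ (perturbed - HE)

# Toy problem: estimate two parameters from one noisy observation of their sum
truth = np.array([1.0, 2.0])
H = np.array([[1.0, 1.0]])
obs = H @ truth + rng.normal(0.0, 0.1, 1)
prior = rng.normal(0.0, 1.0, (2, 200)) + np.array([[0.5], [1.5]])
post = enkf_update(prior, obs, H, obs_var=0.01)
```

The SEOD contribution sits on top of this step: before assimilating, candidate measurement locations are ranked by information metrics (SD, DFS, RE) computed from the same ensemble statistics.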
Novel layered clustering-based approach for generating ensemble of classifiers.
Rahman, Ashfaqur; Verma, Brijesh
2011-05-01
This paper introduces a novel concept for creating an ensemble of classifiers. The concept is based on generating an ensemble of classifiers through clustering of data at multiple layers. The ensemble classifier model generates a set of alternative clustering of a dataset at different layers by randomly initializing the clustering parameters and trains a set of base classifiers on the patterns at different clusters in different layers. A test pattern is classified by first finding the appropriate cluster at each layer and then using the corresponding base classifier. The decisions obtained at different layers are fused into a final verdict using majority voting. As the base classifiers are trained on overlapping patterns at different layers, the proposed approach achieves diversity among the individual classifiers. Identification of difficult-to-classify patterns through clustering as well as achievement of diversity through layering leads to better classification results as evidenced from the experimental results.
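A minimal sketch of the layered scheme described above, random-initialized clustering at several layers, a base classifier per cluster, and majority voting across layers, might look like this. The toy blob data and nearest-centroid base classifiers are my assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(7)

def make_blobs(n):
    y = rng.integers(0, 3, n)
    centers = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
    return centers[y] + rng.normal(0.0, 0.7, (n, 2)), y

def kmeans(X, k, iters=20):
    """Tiny k-means; the random initialization differs per layer."""
    C = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        lab = np.linalg.norm(X[:, None] - C, axis=2).argmin(1)
        C = np.array([X[lab == j].mean(0) if np.any(lab == j) else C[j]
                      for j in range(k)])
    return C

def fit_layer(X, y, k):
    """One layer: clustering plus a nearest-centroid classifier per cluster."""
    C = kmeans(X, k)
    lab = np.linalg.norm(X[:, None] - C, axis=2).argmin(1)
    clfs = []
    for j in range(k):
        m = lab == j
        Xj, yj = (X[m], y[m]) if m.any() else (X, y)
        classes = np.unique(yj)
        cents = np.array([Xj[yj == c].mean(0) for c in classes])
        clfs.append((classes, cents))
    return C, clfs

def layer_predict(X, layer):
    C, clfs = layer
    lab = np.linalg.norm(X[:, None] - C, axis=2).argmin(1)
    out = np.empty(len(X), dtype=int)
    for i in range(len(X)):
        classes, cents = clfs[lab[i]]
        out[i] = classes[np.linalg.norm(cents - X[i], axis=1).argmin()]
    return out

X_tr, y_tr = make_blobs(300)
X_te, y_te = make_blobs(150)
layers = [fit_layer(X_tr, y_tr, k=3) for _ in range(5)]   # 5 layers
votes = np.stack([layer_predict(X_te, L) for L in layers])
final = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
acc = (final == y_te).mean()
```

Diversity comes from the random clustering initializations: each layer partitions the data differently, so its base classifiers see overlapping but distinct training subsets, and the final majority vote fuses the layer decisions.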
Zhang, Cuicui; Liang, Xuefeng; Matsuyama, Takashi
2014-12-08
Multi-camera networks have gained great interest in video-based surveillance systems for security monitoring, access control, etc. Person re-identification is an essential and challenging task in multi-camera networks, which aims to determine if a given individual has already appeared over the camera network. Individual recognition often uses faces as a trial and requires a large number of samples during the training phase. This is difficult to fulfill due to the limitation of the camera hardware system and the unconstrained image capturing conditions. Conventional face recognition algorithms often encounter the "small sample size" (SSS) problem arising from the small number of training samples compared to the high dimensionality of the sample space. To overcome this problem, interest in the combination of multiple base classifiers has sparked research efforts in ensemble methods. However, existing ensemble methods still leave two questions open: (1) how to define diverse base classifiers from the small data; (2) how to avoid the diversity/accuracy dilemma occurring during ensemble. To address these problems, this paper proposes a novel generic learning-based ensemble framework, which augments the small data by generating new samples based on a generic distribution and introduces a tailored 0-1 knapsack algorithm to alleviate the diversity/accuracy dilemma. More diverse base classifiers can be generated from the expanded face space, and more appropriate base classifiers are selected for ensemble. Extensive experimental results on four benchmarks demonstrate the higher ability of our system to cope with the SSS problem compared to the state-of-the-art system.
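The 0-1 knapsack step mentioned above selects a subset of base classifiers maximizing a value (e.g. validation accuracy) under a budget on a cost (e.g. redundancy), which is the classic dynamic-programming problem. The value/cost numbers below are hypothetical, purely to exercise the selection routine.

```python
def knapsack_select(values, weights, capacity):
    """0-1 knapsack by dynamic programming (integer weights).
    Returns indices of selected items maximizing total value."""
    n = len(values)
    dp = [[0.0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for w in range(capacity + 1):
            dp[i][w] = dp[i - 1][w]
            if weights[i - 1] <= w:
                dp[i][w] = max(dp[i][w],
                               dp[i - 1][w - weights[i - 1]] + values[i - 1])
    # Backtrack to recover which classifiers were selected
    sel, w = [], capacity
    for i in range(n, 0, -1):
        if dp[i][w] != dp[i - 1][w]:
            sel.append(i - 1)
            w -= weights[i - 1]
    return sorted(sel)

# Hypothetical pool of 5 base classifiers:
# value = validation accuracy, weight = integer redundancy score
acc = [0.62, 0.70, 0.55, 0.68, 0.60]
redundancy = [3, 4, 2, 5, 1]
chosen = knapsack_select(acc, redundancy, capacity=7)
```

Capping total redundancy while maximizing summed accuracy is one concrete way to trade diversity against accuracy; here the budget of 7 forces the selector to skip the accurate-but-redundant classifier 3 in favor of the cheap, diverse ones.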
Hall, S; Poller, B; Bailey, C; Gregory, S; Clark, R; Roberts, P; Tunbridge, A; Poran, V; Evans, C; Crook, B
2018-06-01
Variations currently exist across the UK in the choice of personal protective equipment (PPE) used by healthcare workers when caring for patients with suspected high-consequence infectious diseases (HCIDs). To test the protection afforded to healthcare workers by current PPE ensembles during assessment of a suspected HCID case, and to provide an evidence base to justify proposal of a unified PPE ensemble for healthcare workers across the UK. One 'basic level' (enhanced precautions) PPE ensemble and five 'suspected case' PPE ensembles were evaluated in volunteer trials using 'Violet'; an ultraviolet-fluorescence-based simulation exercise to visualize exposure/contamination events. Contamination was photographed and mapped. There were 147 post-simulation and 31 post-doffing contamination events, from a maximum of 980, when evaluating the basic level of PPE. Therefore, this PPE ensemble did not afford adequate protection, primarily due to direct contamination of exposed areas of the skin. For the five suspected case ensembles, 1584 post-simulation contamination events were recorded, from a maximum of 5110. Twelve post-doffing contamination events were also observed (face, two events; neck, one event; forearm, one event; lower legs, eight events). All suspected case PPE ensembles either had post-doffing contamination events or other significant disadvantages to their use. This identified the need to design a unified PPE ensemble and doffing procedure, incorporating the most protective PPE considered for each body area. This work has been presented to, and reviewed by, key stakeholders to decide on a proposed unified ensemble, subject to further evaluation. Crown Copyright © 2018. Published by Elsevier Ltd. All rights reserved.
NASA Technical Reports Server (NTRS)
Mittman, David S.
2011-01-01
Ensemble is an open architecture for the development, integration, and deployment of mission operations software. Fundamentally, it is an adaptation of the Eclipse Rich Client Platform (RCP), a widespread, stable, and supported framework for component-based application development. By capitalizing on the maturity and availability of the Eclipse RCP, Ensemble offers a low-risk, politically neutral path towards a tighter integration of operations tools. The Ensemble project is a highly successful, ongoing collaboration among NASA Centers. Since 2004, the Ensemble project has supported the development of mission operations software for NASA's Exploration Systems, Science, and Space Operations Directorates.
Project FIRES. Volume 1: Program Overview and Summary, Phase 1B
NASA Technical Reports Server (NTRS)
Abeles, F. J.
1980-01-01
Overall performance requirements and evaluation methods for firefighters protective equipment were established and published as the Protective Ensemble Performance Standards (PEPS). Current firefighters protective equipment was tested and evaluated against the PEPS requirements, and the preliminary design of a prototype protective ensemble was performed. In phase 1B, the design of the prototype ensemble was finalized. Prototype ensembles were fabricated and then subjected to a series of qualification tests which were based upon the PEPS requirements. Engineering drawings and purchase specifications were prepared for the new protective ensemble.
NASA Astrophysics Data System (ADS)
Lee, H. S.; Liu, Y.; Ward, J.; Brown, J.; Maestre, A.; Herr, H.; Fresch, M. A.; Wells, E.; Reed, S. M.; Jones, E.
2017-12-01
The National Weather Service's (NWS) Office of Water Prediction (OWP) recently launched a nationwide effort to verify streamflow forecasts from the Hydrologic Ensemble Forecast Service (HEFS) for a majority of forecast locations across the 13 River Forecast Centers (RFCs). Known as the HEFS Baseline Validation (BV), the project involves a joint effort between the OWP and the RFCs. It aims to provide a geographically consistent, statistically robust validation, and a benchmark to guide the operational implementation of the HEFS, inform practical applications, such as impact-based decision support services, and to provide an objective framework for evaluating strategic investments in the HEFS. For the BV, HEFS hindcasts are issued once per day on a 12Z cycle for the period of 1985-2015 with a forecast horizon of 30 days. For the first two weeks, the hindcasts are forced with precipitation and temperature ensemble forecasts from the Global Ensemble Forecast System of the National Centers for Environmental Prediction, and by resampled climatology for the remaining period. The HEFS-generated ensemble streamflow hindcasts are verified using the Ensemble Verification System. Skill is assessed relative to streamflow hindcasts generated from NWS' current operational system, namely climatology-based Ensemble Streamflow Prediction. In this presentation, we summarize the results and findings to date.
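Skill relative to a climatology baseline, as in the ESP comparison above, is typically summarized with a skill score of a proper score such as the CRPS. Below is a self-contained sample-based CRPS and skill-score sketch; the ensembles and observation are synthetic stand-ins, not HEFS hindcast data.

```python
import numpy as np

rng = np.random.default_rng(6)

def crps_ensemble(ens, obs):
    """Sample-based CRPS: E|X - y| - 0.5 E|X - X'| (lower is better)."""
    ens = np.asarray(ens, dtype=float)
    term1 = np.abs(ens - obs).mean()
    term2 = np.abs(ens[:, None] - ens[None, :]).mean()
    return term1 - 0.5 * term2

obs = 120.0                                # observed streamflow (synthetic)
forecast = rng.normal(115.0, 10.0, 50)     # sharp HEFS-like ensemble (synthetic)
climatology = rng.normal(100.0, 40.0, 50)  # resampled-climatology ensemble (synthetic)

# Skill score: 1 means perfect, 0 means no better than the reference
skill = 1.0 - crps_ensemble(forecast, obs) / crps_ensemble(climatology, obs)
```

A positive skill score here reflects exactly the comparison the BV makes: the HEFS hindcasts are judged against climatology-based Ensemble Streamflow Prediction as the operational reference.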
Jung, Kwan Ho; Lee, Keun-Hyeung
2015-09-15
A peptide-based ensemble for the detection of cyanide ions in 100% aqueous solutions was designed on the basis of the copper binding motif. 7-Nitro-2,1,3-benzoxadiazole-labeled tripeptide (NBD-SSH, NBD-SerSerHis) formed the ensemble with Cu(2+), leading to a change in the color of the solution from yellow to orange and a complete decrease of fluorescence emission. The ensemble (NBD-SSH-Cu(2+)) sensitively and selectively detected a low concentration of cyanide ions in 100% aqueous solutions by a colorimetric change as well as a fluorescent change. The addition of cyanide ions instantly removed Cu(2+) from the ensemble (NBD-SSH-Cu(2+)) in 100% aqueous solutions, resulting in a color change of the solution from orange to yellow and a "turn-on" fluorescent response. The detection limits for cyanide ions were lower than the maximum allowable level of cyanide ions in drinking water set by the World Health Organization. The peptide-based ensemble system is expected to be a potential and practical way for the detection of submicromolar concentrations of cyanide ions in 100% aqueous solutions.
NASA Astrophysics Data System (ADS)
Wang, C.; Luo, Z. J.; Chen, X.; Zeng, X.; Tao, W.; Huang, X.
2012-12-01
Cloud top temperature is a key parameter to retrieve in the remote sensing of convective clouds, but passive remote sensing cannot directly measure the temperature at cloud top. Here we explore a synergistic way of estimating cloud top temperature by making use of simultaneous passive and active remote sensing of clouds (in this case, CloudSat and MODIS). The weighting function of the MODIS 11μm band is explicitly calculated by feeding cloud hydrometeor profiles from CloudSat retrievals, together with temperature and humidity profiles from the ECMWF ERA-Interim reanalysis, into a radiative transfer model. Among 19,699 tropical deep convective clouds observed by CloudSat in 2008, the average effective emission level (EEL, where the weighting function attains its maximum) is at optical depth 0.91 with a standard deviation of 0.33. Furthermore, the vertical gradient of CloudSat radar reflectivity, an indicator of the fuzziness of the convective cloud top, is linearly proportional to d_{CTH-EEL}, the distance between the EEL of the 11μm channel and the cloud top height (CTH) determined by CloudSat, when d_{CTH-EEL}<0.6 km. Beyond 0.6 km, the distance has little sensitivity to the vertical gradient of CloudSat radar reflectivity. Based on these findings, we derive a formula relating the cloud-top fuzziness, which is measurable by CloudSat, to the MODIS 11μm brightness temperature, assuming that the difference between the effective emission temperature and the 11μm brightness temperature is proportional to the cloud top fuzziness. This formula is verified using deep convective cloud profiles simulated by the Goddard Cumulus Ensemble model. We further discuss the application of this formula to estimating cloud top buoyancy, as well as the error characteristics of the radiative calculation within such deep convective clouds.
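The notion of an effective emission level can be sketched with a toy nadir-view weighting function: given layer optical depths from cloud top downward, the fraction of escaping emission originating in each layer is the drop in transmittance t = exp(-τ_above) across it, and the EEL is where that contribution peaks. The profile below is invented for illustration and stands in for the paper's full radiative transfer calculation.

```python
import math

def effective_emission_level(layer_tau):
    """Return (layer index, cumulative optical depth at the layer centre) of the
    layer contributing most to upwelling radiance, i.e. where the per-layer
    weight W_k = t(top_k) - t(bottom_k) peaks, with t = exp(-tau_above)."""
    weights, centres, tau_above = [], [], 0.0
    for dtau in layer_tau:
        t_top = math.exp(-tau_above)
        t_bot = math.exp(-(tau_above + dtau))
        weights.append(t_top - t_bot)          # emission escaping from this layer
        centres.append(tau_above + 0.5 * dtau)
        tau_above += dtau
    k = max(range(len(weights)), key=weights.__getitem__)
    return k, centres[k]

# Illustrative (made-up) layer optical depths of a convective cloud top
profile = [0.1, 0.2, 0.4, 0.8, 1.6, 3.2]
k, tau_eel = effective_emission_level(profile)
```

For this toy profile the peak contribution comes from an interior layer at cumulative optical depth near one, qualitatively consistent with the order-unity EEL reported in the abstract.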
Bassen, David M; Vilkhovoy, Michael; Minot, Mason; Butcher, Jonathan T; Varner, Jeffrey D
2017-01-25
Ensemble modeling is a promising approach for obtaining robust predictions and coarse-grained population behavior from deterministic mathematical models. Ensemble approaches address model uncertainty by using parameter or model families instead of single best-fit parameters or fixed model structures. Parameter ensembles can be selected based upon simulation error, along with other criteria such as diversity or steady-state performance. Simulations using parameter ensembles can estimate confidence intervals on model variables and robustly constrain model predictions, despite having many poorly constrained parameters. In this software note, we present a multiobjective technique to estimate parameter or model ensembles, the Pareto Optimal Ensemble Technique in the Julia programming language (JuPOETs). JuPOETs integrates simulated annealing with Pareto optimality to estimate ensembles on or near the optimal tradeoff surface between competing training objectives. We demonstrate JuPOETs on a suite of multiobjective problems, including test functions with parameter bounds and system constraints, as well as on the identification of a proof-of-concept biochemical model with four conflicting training objectives. JuPOETs identified optimal or near-optimal solutions approximately six-fold faster than a corresponding implementation in Octave for the suite of test functions. For the proof-of-concept biochemical model, JuPOETs produced an ensemble of parameters that captured the mean of the training data across conflicting data sets while simultaneously estimating parameter sets that performed well on each of the individual objective functions. JuPOETs is a promising approach for the estimation of parameter and model ensembles using multiobjective optimization, and can be adapted to many problem types, including mixed binary and continuous variable types, bilevel optimization problems and constrained problems, without altering the base algorithm.
JuPOETs is open source, available under an MIT license, and can be installed using the Julia package manager from the JuPOETs GitHub repository.
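The Pareto-optimality core of the approach reduces to a dominance test over objective vectors. The sketch below (in Python rather than Julia, with hypothetical two-objective training errors) shows the non-dominated filtering that JuPOETs couples with simulated annealing; the annealing loop itself is omitted.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b: no worse in every
    objective and strictly better in at least one (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Keep only the non-dominated vectors: the estimated tradeoff surface."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

# Hypothetical two-objective training errors for five candidate parameter sets
errors = [(1.0, 5.0), (2.0, 2.0), (5.0, 1.0), (3.0, 3.0), (4.0, 4.0)]
front = pareto_front(errors)
```

Here (3, 3) and (4, 4) are dominated by (2, 2) and drop out; the surviving vectors trade one objective against the other, which is exactly the ensemble JuPOETs accumulates.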
Ocean Predictability and Uncertainty Forecasts Using the Local Ensemble Transform Kalman Filter (LETKF)
NASA Astrophysics Data System (ADS)
Wei, M.; Hogan, P. J.; Rowley, C. D.; Smedstad, O. M.; Wallcraft, A. J.; Penny, S. G.
2017-12-01
Ocean predictability and uncertainty are studied with an ensemble system developed on top of the US Navy's operational HYCOM using the Local Ensemble Transform Kalman Filter (LETKF). One advantage of this method is that the best possible initial analysis states for the HYCOM forecasts are provided by the LETKF, which assimilates operational observations using an ensemble method. The background covariance during this assimilation process is supplied implicitly by the ensemble, avoiding the difficult task of developing tangent-linear and adjoint models of HYCOM, with its complicated hybrid isopycnal vertical coordinate, for 4D-Var. The flow-dependent background covariance from the ensemble will be an indispensable part of the next-generation hybrid 4D-Var/ensemble data assimilation system. The predictability and uncertainty of the ocean forecasts are studied initially for the Gulf of Mexico. The results are compared with those of another ensemble system based on the Ensemble Transform (ET) method, which has been used in the Navy's operational center. The advantages and disadvantages of each are discussed.
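The LETKF performs a deterministic update in a localized ensemble subspace; as a one-dimensional illustration of how the background covariance is supplied implicitly by the ensemble, the sketch below applies a scalar Kalman update whose gain is built from the ensemble variance. All numbers are hypothetical and this is not the LETKF algorithm itself.

```python
from statistics import mean, pvariance

def enkf_update_scalar(ensemble, obs, obs_var):
    """One-dimensional ensemble Kalman update: nudge each member toward the
    observation with gain K = P / (P + R), where P is the ensemble variance
    (the 'background covariance' estimated from the ensemble itself)."""
    p = pvariance(ensemble)          # background error variance from the ensemble
    k = p / (p + obs_var)            # Kalman gain
    return [x + k * (obs - x) for x in ensemble]

prior = [8.0, 9.0, 10.0, 11.0, 12.0]   # hypothetical background ensemble
posterior = enkf_update_scalar(prior, obs=14.0, obs_var=2.0)
```

The analysis ensemble moves toward the observation and its spread shrinks, which is the qualitative behavior any ensemble Kalman scheme, including the LETKF, must reproduce.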
An adaptive Gaussian process-based iterative ensemble smoother for data assimilation
NASA Astrophysics Data System (ADS)
Ju, Lei; Zhang, Jiangjiang; Meng, Long; Wu, Laosheng; Zeng, Lingzao
2018-05-01
Accurate characterization of subsurface hydraulic conductivity is vital for modeling subsurface flow and transport. The iterative ensemble smoother (IES) has been proposed to estimate heterogeneous parameter fields. As a Monte Carlo-based method, IES requires a relatively large ensemble size to guarantee its performance. To improve computational efficiency, we propose an adaptive Gaussian process (GP)-based iterative ensemble smoother (GPIES) in this study. At each iteration, the GP surrogate is adaptively refined by adding a few new base points chosen from the updated parameter realizations. The sensitivity information between model parameters and measurements is then calculated from a large number of realizations generated by the GP surrogate at virtually no computational cost. Since the original model evaluations are required only for the base points, whose number is much smaller than the ensemble size, the computational cost is significantly reduced. The applicability of GPIES in estimating heterogeneous conductivity is evaluated on saturated and unsaturated flow problems, respectively. Without sacrificing estimation accuracy, GPIES achieves roughly an order-of-magnitude speed-up compared with the standard IES. Although subsurface flow problems are considered in this study, the proposed method can be equally applied to other hydrological models.
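The surrogate idea can be sketched with a from-scratch one-dimensional GP posterior mean: solve a small kernel system at a handful of base points, then query the surrogate cheaply anywhere. This is only a minimal illustration under an assumed RBF kernel, not the paper's adaptive refinement scheme; `math.sin` stands in for an expensive forward model.

```python
import math

def rbf(x1, x2, length=1.0):
    """Squared-exponential (RBF) kernel with an assumed unit length scale."""
    return math.exp(-0.5 * ((x1 - x2) / length) ** 2)

def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def gp_predict(xs, ys, xq, noise=1e-8):
    """GP posterior mean at xq given base-point evaluations (xs, ys)."""
    K = [[rbf(xi, xj) + (noise if i == j else 0.0) for j, xj in enumerate(xs)]
         for i, xi in enumerate(xs)]
    alpha = solve(K, ys)
    return sum(a * rbf(x, xq) for a, x in zip(alpha, xs))

# 'Expensive model' evaluated only at a few base points; the surrogate fills in between
xs = [0.0, 1.0, 2.0, 3.0]
ys = [math.sin(x) for x in xs]
approx = gp_predict(xs, ys, 1.5)
```

Once the small system is solved, each surrogate query costs only a handful of kernel evaluations, which is why GPIES can afford many realizations for the sensitivity estimate.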
Multi-model ensembles for assessment of flood losses and associated uncertainty
NASA Astrophysics Data System (ADS)
Figueiredo, Rui; Schröter, Kai; Weiss-Motz, Alexander; Martina, Mario L. V.; Kreibich, Heidi
2018-05-01
Flood loss modelling is a crucial part of risk assessments. However, it is subject to large uncertainty that is often neglected. Most models available in the literature are deterministic, providing only single point estimates of flood loss, and large disparities tend to exist among them. Adopting any one such model in a risk assessment context is likely to lead to inaccurate loss estimates and sub-optimal decision-making. In this paper, we propose the use of multi-model ensembles to address these issues. This approach, which has been applied successfully in other scientific fields, is based on the combination of different model outputs with the aim of improving the skill and usefulness of predictions. We first propose a model rating framework to support ensemble construction, based on a probability tree of model properties, which establishes relative degrees of belief between candidate models. Using 20 flood loss models in two test cases, we then construct numerous multi-model ensembles, based both on the rating framework and on a stochastic method, differing in terms of participating members, ensemble size and model weights. We evaluate the performance of ensemble means, as well as their probabilistic skill and reliability. Our results demonstrate that well-designed multi-model ensembles represent a pragmatic approach to consistently obtain more accurate flood loss estimates and reliable probability distributions of model uncertainty.
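The combination step can be sketched as a weighted ensemble mean with a weighted spread as a simple uncertainty measure. The loss values and weights below are hypothetical stand-ins for the single-model outputs and for the rating framework's relative degrees of belief.

```python
def ensemble_loss(estimates, weights):
    """Weighted ensemble mean of single-model flood-loss estimates, plus the
    weighted spread as a crude measure of model (structural) uncertainty."""
    total = sum(weights)
    mean = sum(w * x for w, x in zip(weights, estimates)) / total
    var = sum(w * (x - mean) ** 2 for w, x in zip(weights, estimates)) / total
    return mean, var ** 0.5

# Hypothetical loss estimates (millions) from four models, with belief weights
losses = [10.0, 14.0, 9.0, 12.0]
weights = [0.4, 0.3, 0.1, 0.2]
mean, spread = ensemble_loss(losses, weights)
```

The ensemble mean necessarily lies within the range of the member estimates, and the spread makes the disagreement among models, which a single deterministic model hides, explicit.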
NASA Astrophysics Data System (ADS)
Enterkine, J.; Spaete, L.; Glenn, N. F.; Gallagher, M.
2017-12-01
Remote sensing and mapping of dryland ecosystem vegetation is notably difficult due to low canopy cover and short, ephemeral growing seasons. Recent improvements in available satellite imagery and machine learning techniques have enabled enhanced approaches to mapping and monitoring vegetation across dryland ecosystems. The Sentinel-2 satellites (launched June 2015 and March 2017) of ESA's Copernicus Programme offer promising advances over existing multispectral satellite systems such as Landsat. The freely available Sentinel-2 imagery offers a five-day revisit frequency, thirteen spectral bands (in the visible, near infrared, and shortwave infrared), and high spatial resolution (from 10 m to 60 m). Three narrow spectral bands located between the visible and the near infrared are designed to observe changes in photosynthesis. The high temporal, spatial, and spectral resolution of this imagery makes it well suited to monitoring vegetation in dryland ecosystems. In this study, we calculated a large number of vegetation and spectral indices from Sentinel-2 imagery spanning a growing season. These data were combined with robust field measurements of canopy cover at precise geolocations. We then used a Random Forests ensemble learning model to identify the most predictive variables for each land cover class, which were then used to map land cover over the study area. The resulting vegetation map product will be used by land managers, and the mapping approaches will serve as a basis for future remote sensing projects using Sentinel-2 imagery and machine learning.
Glyph-based analysis of multimodal directional distributions in vector field ensembles
NASA Astrophysics Data System (ADS)
Jarema, Mihaela; Demir, Ismail; Kehrer, Johannes; Westermann, Rüdiger
2015-04-01
Ensemble simulations are increasingly often performed in the geosciences in order to study the uncertainty and variability of model predictions. Describing ensemble data by mean and standard deviation can be misleading in case of multimodal distributions. We present first results of a glyph-based visualization of multimodal directional distributions in 2D and 3D vector ensemble data. Directional information on the circle/sphere is modeled using mixtures of probability density functions (pdfs), which enables us to characterize the distributions with relatively few parameters. The resulting mixture models are represented by 2D and 3D lobular glyphs showing direction, spread and strength of each principal mode of the distributions. A 3D extension of our approach is realized by means of an efficient GPU rendering technique. We demonstrate our method in the context of ensemble weather simulations.
Ensembl comparative genomics resources.
Herrero, Javier; Muffato, Matthieu; Beal, Kathryn; Fitzgerald, Stephen; Gordon, Leo; Pignatelli, Miguel; Vilella, Albert J; Searle, Stephen M J; Amode, Ridwan; Brent, Simon; Spooner, William; Kulesha, Eugene; Yates, Andrew; Flicek, Paul
2016-01-01
Evolution provides the unifying framework with which to understand biology. The coherent investigation of genic and genomic data often requires comparative genomics analyses based on whole-genome alignments, sets of homologous genes and other relevant datasets in order to evaluate and answer evolutionary-related questions. However, the complexity and computational requirements of producing such data are substantial: this has led to only a small number of reference resources that are used for most comparative analyses. The Ensembl comparative genomics resources are one such reference set that facilitates comprehensive and reproducible analysis of chordate genome data. Ensembl computes pairwise and multiple whole-genome alignments from which large-scale synteny, per-base conservation scores and constrained elements are obtained. Gene alignments are used to define Ensembl Protein Families, GeneTrees and homologies for both protein-coding and non-coding RNA genes. These resources are updated frequently and have a consistent informatics infrastructure and data presentation across all supported species. Specialized web-based visualizations are also available including synteny displays, collapsible gene tree plots, a gene family locator and different alignment views. The Ensembl comparative genomics infrastructure is extensively reused for the analysis of non-vertebrate species by other projects including Ensembl Genomes and Gramene and much of the information here is relevant to these projects. The consistency of the annotation across species and the focus on vertebrates makes Ensembl an ideal system to perform and support vertebrate comparative genomic analyses. We use robust software and pipelines to produce reference comparative data and make it freely available. Database URL: http://www.ensembl.org. © The Author(s) 2016. Published by Oxford University Press.
Large unbalanced credit scoring using Lasso-logistic regression ensemble.
Wang, Hong; Xu, Qingsong; Zhou, Lifeng
2015-01-01
Recently, various ensemble learning methods with different base classifiers have been proposed for credit scoring problems. However, for various reasons, there has been little research using logistic regression as the base classifier. In this paper, given large unbalanced data, we consider the plausibility of ensemble learning using regularized logistic regression as the base classifier to deal with credit scoring problems. In this research, the data is first balanced and diversified by clustering and bagging algorithms. Then we apply a Lasso-logistic regression learning ensemble to evaluate the credit risks. We show that the proposed algorithm outperforms popular credit scoring models such as decision tree, Lasso-logistic regression and random forests in terms of AUC and F-measure. We also provide two importance measures for the proposed model to identify important variables in the data.
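The balancing step can be sketched as bagging with random undersampling of the majority class; the paper's clustering step and Lasso-logistic base learner are omitted here, and the credit records below are invented.

```python
import random

def make_balanced_bags(majority, minority, n_bags, seed=0):
    """Pair a bootstrap sample of the minority class with an equal-size random
    draw from the majority class, yielding balanced, diversified training bags."""
    rng = random.Random(seed)
    bags = []
    for _ in range(n_bags):
        under = rng.sample(majority, len(minority))       # undersample majority
        boot = [rng.choice(minority) for _ in minority]   # bootstrap minority
        bags.append(under + boot)
    return bags

# Invented 1-D credit data: (feature, label); 0 = good payer (majority), 1 = default
majority = [(float(x), 0) for x in range(20)]
minority = [(30.0, 1), (31.0, 1), (32.0, 1), (33.0, 1)]
bags = make_balanced_bags(majority, minority, n_bags=5)
```

A regularized logistic regression would then be trained on each bag and the predicted default probabilities averaged across the ensemble, as in the paper.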
Estimation of Land Surface Temperature from GCOM-W1 AMSR2 Data over the Chinese Landmass
NASA Astrophysics Data System (ADS)
Zhou, Ji; Dai, Fengnan; Zhang, Xiaodong
2016-04-01
As one of the most important parameters at the interface between the earth's surface and the atmosphere, land surface temperature (LST) plays a crucial role in many fields, such as climate change monitoring and hydrological modeling. Satellite remote sensing provides the unique possibility of observing the LST of large regions at diverse spatial and temporal scales. Compared with thermal infrared (TIR) remote sensing, passive microwave (PW) remote sensing is better at overcoming the influence of clouds; thus, it can be used to improve the temporal resolution of current satellite TIR LST. However, most current methods for estimating LST from PW remote sensing are empirical and generalize poorly. In this study, a semi-empirical method is proposed to estimate LST from observations of the Advanced Microwave Scanning Radiometer 2 (AMSR2) on board the Global Change Observation Mission 1st-WATER "SHIZUKU" satellite (GCOM-W1). The method is based on the PW radiative transfer equation, which is simplified using (1) the linear relationship between the emissivities of the horizontal and vertical polarization channels at the same frequency and (2) the significant relationship between atmospheric parameters and the atmospheric water vapor content. An iteration approach is used to best fit the pixel-based coefficients in the simplified radiative transfer equation of the horizontal and vertical polarization channels at each frequency. An integration approach is then proposed to generate an ensemble estimate from the estimates at multiple frequencies for different land cover types. The method is trained with AMSR2 brightness temperatures and MODIS LST over the entire Chinese landmass in 2013 and tested with data from 2014. Validation against in situ LSTs measured in northwestern China demonstrates that the proposed method is more accurate than the polarization radiation method, with a root-mean-square error of 3 K. 
Although the proposed method is applied to AMSR2 data here, it can readily be extended to other satellite PW sensors, such as the Advanced Microwave Scanning Radiometer for EOS (AMSR-E) on board the Aqua satellite and the Special Sensor Microwave/Imager (SSM/I) on board the Defense Meteorological Satellite Program (DMSP) satellites. It would be beneficial in providing LST to applications at continental and global scales.
Thermodynamic-ensemble independence of solvation free energy.
Chong, Song-Ho; Ham, Sihyun
2015-02-10
Solvation free energy is the fundamental thermodynamic quantity in solution chemistry. Recently, it has been suggested that the partial molar volume correction is necessary to convert the solvation free energy determined in different thermodynamic ensembles. Here, we demonstrate ensemble-independence of the solvation free energy on general thermodynamic grounds. Theoretical estimates of the solvation free energy based on the canonical or grand-canonical ensemble are pertinent to experiments carried out under constant pressure without any conversion.
NASA Astrophysics Data System (ADS)
Niu, Mingfei; Wang, Yufang; Sun, Shaolong; Li, Yongwu
2016-06-01
To enhance prediction reliability and accuracy, a hybrid model based on the promising principle of "decomposition and ensemble" and a recently proposed meta-heuristic called grey wolf optimizer (GWO) is introduced for daily PM2.5 concentration forecasting. Compared with existing PM2.5 forecasting methods, this proposed model has improved the prediction accuracy and hit rates of directional prediction. The proposed model involves three main steps, i.e., decomposing the original PM2.5 series into several intrinsic mode functions (IMFs) via complementary ensemble empirical mode decomposition (CEEMD) for simplifying the complex data; individually predicting each IMF with support vector regression (SVR) optimized by GWO; integrating all predicted IMFs for the ensemble result as the final prediction by another SVR optimized by GWO. Seven benchmark models, including single artificial intelligence (AI) models, other decomposition-ensemble models with different decomposition methods and models with the same decomposition-ensemble method but optimized by different algorithms, are considered to verify the superiority of the proposed hybrid model. The empirical study indicates that the proposed hybrid decomposition-ensemble model is remarkably superior to all considered benchmark models for its higher prediction accuracy and hit rates of directional prediction.
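The "decomposition and ensemble" principle can be sketched with a crude stand-in: split the series into a moving-average trend and a residual, predict each component separately, and recombine. Here a centered moving average replaces CEEMD and a persistence forecast replaces the GWO-tuned SVR; the PM2.5 values are invented.

```python
def moving_average(series, window):
    """Centered moving average; stands in for the low-frequency IMFs of CEEMD."""
    half = window // 2
    out = []
    for i in range(len(series)):
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        out.append(sum(series[lo:hi]) / (hi - lo))
    return out

def decompose(series, window=5):
    """Split a series into a smooth trend and a high-frequency residual."""
    trend = moving_average(series, window)
    residual = [x - t for x, t in zip(series, trend)]
    return trend, residual

def naive_forecast(component):
    """Placeholder per-component predictor (persistence)."""
    return component[-1]

# Invented daily PM2.5 series: decompose, predict each part, recombine
pm25 = [35.0, 40.0, 38.0, 50.0, 55.0, 52.0, 60.0, 58.0, 63.0, 61.0]
trend, residual = decompose(pm25)
prediction = naive_forecast(trend) + naive_forecast(residual)
```

By construction the components sum back to the original series, so the recombination step is lossless; the forecasting skill in the paper comes from predicting each simpler component more accurately than the raw series.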
JEnsembl: a version-aware Java API to Ensembl data systems.
Paterson, Trevor; Law, Andy
2012-11-01
The Ensembl Project provides release-specific Perl APIs for efficient, high-level programmatic access to data stored in the various Ensembl database schemas. Although Perl scripts are perfectly suited to processing large volumes of text-based data, Perl is not ideal for developing large-scale software applications or for embedding in graphical interfaces. The provision of a novel Java API would facilitate type-safe, modular, object-oriented development of new bioinformatics tools with which to access, analyse and visualize Ensembl data. The JEnsembl API implementation provides basic data retrieval and manipulation functionality from the Core, Compara and Variation databases for all species in Ensembl and EnsemblGenomes, and is a platform for the development of a richer API to Ensembl data sources. The JEnsembl architecture uses a text-based configuration module to provide evolving, versioned mappings from database schemas to code objects. A single installation of the JEnsembl API can therefore simultaneously and transparently connect to current and previous database instances (such as those in the public archive), thus facilitating better analysis repeatability and allowing 'through time' comparative analyses to be performed. Project development, released code libraries, the Maven repository and documentation are hosted at SourceForge (http://jensembl.sourceforge.net).
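The version-aware mapping idea can be sketched as a registry from release ranges to schema configurations, so that one client resolves the accessor configuration valid for any archived release. This is a hypothetical Python sketch of the concept only; the class, method and table names are invented and bear no relation to JEnsembl's actual Java API.

```python
class SchemaRegistry:
    """Map release numbers to the schema configuration valid for that release,
    so a single client can talk to current and archived database versions."""

    def __init__(self):
        self._mappings = []  # (first_release, config), kept sorted

    def register(self, first_release, config):
        """Declare that `config` applies from `first_release` onward."""
        self._mappings.append((first_release, config))
        self._mappings.sort(key=lambda pair: pair[0])

    def lookup(self, release):
        """Return the newest configuration whose range covers `release`."""
        chosen = None
        for first, config in self._mappings:
            if first <= release:
                chosen = config
        if chosen is None:
            raise KeyError(f"no mapping covers release {release}")
        return chosen

# Hypothetical configurations: the schema changed at releases 48 and 60
registry = SchemaRegistry()
registry.register(1, {"gene_table": "gene_v1"})
registry.register(48, {"gene_table": "gene_v2"})
registry.register(60, {"gene_table": "gene_v3"})
```

Queries against an archived release then transparently pick up the older mapping, which is the behavior that enables 'through time' comparative analyses.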
Competitive Learning Neural Network Ensemble Weighted by Predicted Performance
ERIC Educational Resources Information Center
Ye, Qiang
2010-01-01
Ensemble approaches have been shown to enhance classification by combining the outputs from a set of voting classifiers. Diversity in error patterns among base classifiers promotes ensemble performance. Multi-task learning is an important characteristic for Neural Network classifiers. Introducing a secondary output unit that receives different…
An investigation of turbulent transport in the extreme lower atmosphere
NASA Technical Reports Server (NTRS)
Koper, C. A., Jr.; Sadeh, W. Z.
1975-01-01
To investigate flow in the extreme lower atmosphere, a model is proposed in which the Lagrangian autocorrelation is expressed as a domain integral over a set of ordinary Eulerian autocorrelations acquired concurrently at all points within a turbulence box, along with a method for ascertaining the statistical stationarity of the turbulent velocity by constructing an equivalent ensemble. Simultaneous measurements of turbulent velocity along a turbulence line on the wake axis were carried out using a remotely operated longitudinal array of five hot-wire anemometers. The stationarity test revealed that the turbulent velocity can be approximated as a realization of a weakly self-stationary random process. Based on the Lagrangian autocorrelation it is found that: (1) large diffusion times predominate; (2) ratios of Lagrangian to Eulerian time and spatial scales are smaller than unity; and (3) both short and long diffusion time scales and diffusion spatial scales are constrained within their Eulerian counterparts.
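The Eulerian autocorrelations underlying the model are built from the ordinary sample autocorrelation of a velocity record, as in the minimal sketch below; the velocity record is synthetic, not wind-tunnel data.

```python
from statistics import mean

def autocorrelation(series, lag):
    """Biased sample autocorrelation at a given lag: the building block of the
    Eulerian correlations measured along the hot-wire array."""
    m = mean(series)
    dev = [x - m for x in series]
    var = sum(d * d for d in dev)
    cov = sum(dev[i] * dev[i + lag] for i in range(len(series) - lag))
    return cov / var

# Synthetic, roughly alternating 'turbulent' velocity record (illustration only)
u = [1.0, 1.4, 0.9, 1.2, 0.8, 1.1, 1.3, 0.7, 1.0, 1.2]
r0 = autocorrelation(u, 0)
r1 = autocorrelation(u, 1)
```

By definition the correlation is unity at zero lag, and for this alternating record it turns negative at lag one; integrating such curves over the turbulence box yields the Eulerian time and spatial scales the abstract compares with their Lagrangian counterparts.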
Rapid cell separation with minimal manipulation for autologous cell therapies
NASA Astrophysics Data System (ADS)
Smith, Alban J.; O'Rorke, Richard D.; Kale, Akshay; Rimsa, Roberts; Tomlinson, Matthew J.; Kirkham, Jennifer; Davies, A. Giles; Wälti, Christoph; Wood, Christopher D.
2017-02-01
The ability to isolate specific, viable cell populations from mixed ensembles with minimal manipulation and within intra-operative time would provide significant advantages for autologous, cell-based therapies in regenerative medicine. Current cell-enrichment technologies are either slow, lack specificity and/or require labelling. Thus a rapid, label-free separation technology that does not affect cell functionality, viability or phenotype is highly desirable. Here, we demonstrate separation of viable from non-viable human stromal cells using remote dielectrophoresis, in which an electric field is coupled into a microfluidic channel using shear-horizontal surface acoustic waves, producing an array of virtual electrodes within the channel. This allows high-throughput dielectrophoretic cell separation in high conductivity, physiological-like fluids, overcoming the limitations of conventional dielectrophoresis. We demonstrate viable/non-viable separation efficacy of >98% in pre-purified mesenchymal stromal cells, extracted from human dental pulp, with no adverse effects on cell viability, or on their subsequent osteogenic capabilities.
Ensembles of satellite aerosol retrievals based on three AATSR algorithms within aerosol_cci
NASA Astrophysics Data System (ADS)
Kosmale, Miriam; Popp, Thomas
2016-04-01
Ensemble techniques are widely used in the modelling community to combine different model results and reduce uncertainties, and this approach can also be adapted to satellite measurements. Aerosol_cci is an ESA-funded project in which most of the European aerosol retrieval groups work together. The different algorithms are homogenized as far as is sensible but remain essentially different, and their datasets are compared with ground-based measurements and with each other. Within this project, three AATSR algorithms (the Swansea University aerosol retrieval, the ADV aerosol retrieval by FMI and the Oxford aerosol retrieval ORAC) provide 17-year global aerosol records, each with uncertainty information at the pixel level. In the work presented here, an ensemble of the three AATSR algorithms is constructed. Its advantage over each single algorithm is higher spatial coverage, due to more measurement pixels per grid box. Validation against ground-based AERONET measurements shows that the ensemble still correlates well compared with the single algorithms. Annual mean maps show the global aerosol distribution based on a combination of the three aerosol algorithms. In addition, the pixel-level uncertainties of each algorithm are used to weight the contributions in order to reduce the uncertainty of the ensemble. Results from different versions of the ensemble for aerosol optical depth will be presented, discussed, and validated against ground-based AERONET measurements. Higher spatial coverage on a daily basis yields better annual mean maps, and the benefit of using pixel-level uncertainties is analysed.
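The uncertainty-weighted combination can be sketched as inverse-variance weighting of the per-algorithm retrievals for a single grid box. The AOD values and pixel-level uncertainties below are hypothetical, and the weighting scheme is one simple choice, not necessarily the one used in the project.

```python
def inverse_variance_ensemble(values, uncertainties):
    """Combine per-algorithm retrievals for one grid box, weighting each by the
    inverse of its uncertainty variance; also return the combined uncertainty."""
    weights = [1.0 / (u * u) for u in uncertainties]
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    return mean, (1.0 / total) ** 0.5

# Hypothetical AOD from three algorithms with their pixel-level uncertainties
aod = [0.20, 0.26, 0.23]
sigma = [0.02, 0.04, 0.03]
combined, combined_sigma = inverse_variance_ensemble(aod, sigma)
```

The combined estimate is pulled toward the most confident algorithm, and its formal uncertainty is smaller than that of any single member, which is the motivation for using the pixel-level uncertainties as weights.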
Ozcift, Akin; Gulten, Arif
2011-12-01
Improving the accuracy of machine learning algorithms is vital in designing high-performance computer-aided diagnosis (CADx) systems. Research has shown that a base classifier's performance can be enhanced by ensemble classification strategies. In this study, we construct rotation forest (RF) ensemble classifiers of 30 machine learning algorithms and evaluate their classification performance on Parkinson's, diabetes and heart disease datasets from the literature. First, the feature dimensionality of the three datasets is reduced using the correlation-based feature selection (CFS) algorithm. Second, the classification performance of the 30 machine learning algorithms is calculated for the three datasets. Third, 30 classifier ensembles are constructed based on the RF algorithm to assess the performance of the respective classifiers on the same disease data. All experiments are carried out with a leave-one-out validation strategy, and the performance of the 60 algorithms is evaluated using three metrics: classification accuracy (ACC), kappa error (KE) and area under the receiver operating characteristic (ROC) curve (AUC). The base classifiers achieved average accuracies of 72.15%, 77.52% and 84.43% for the diabetes, heart and Parkinson's datasets, respectively, while the RF classifier ensembles produced average accuracies of 74.47%, 80.49% and 87.13% for the respective diseases. RF, a recently proposed classifier ensemble algorithm, can thus be used to improve the accuracy of miscellaneous machine learning algorithms in designing advanced CADx systems. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
Interaction between Tropical Atlantic Variability and El Niño-Southern Oscillation.
NASA Astrophysics Data System (ADS)
Saravanan, R.; Chang, Ping
2000-07-01
The interaction between tropical Atlantic variability and El Niño-Southern Oscillation (ENSO) is investigated using three ensembles of atmospheric general circulation model integrations. The integrations are forced by specifying observed sea surface temperature (SST) variability over a forcing domain. The forcing domain is the global ocean for the first ensemble, limited to the tropical ocean for the second ensemble, and further limited to the tropical Atlantic region for the third ensemble. The ensemble integrations show that extratropical SST anomalies have little impact on tropical variability, but the effect of ENSO is pervasive in the Tropics. Consistent with previous studies, the most significant influence of ENSO is found during the boreal spring season and is associated with an anomalous Walker circulation. Two important aspects of ENSO's influence on tropical Atlantic variability are noted. First, the ENSO signal contributes significantly to the `dipole' correlation structure between tropical Atlantic SST and rainfall in the Nordeste Brazil region. In the absence of the ENSO signal, the correlations are dominated by SST variability in the southern tropical Atlantic, resulting in less of a dipole structure. Second, the remote influence of ENSO also contributes to positive correlations between SST anomalies and downward surface heat flux in the tropical Atlantic during the boreal spring season. However, even when ENSO forcing is absent, the model integrations provide evidence for a positive surface heat flux feedback in the deep Tropics, which is analyzed in a companion study by Chang et al. The analysis of model simulations shows that interannual atmospheric variability in the tropical Pacific-Atlantic system is dominated by the interaction between two distinct sources of tropical heating: (i) an equatorial heat source in the eastern Pacific associated with ENSO and (ii) an off-equatorial heat source associated with SST anomalies near the Caribbean. 
Modeling this Caribbean heat source accurately could be very important for seasonal forecasting in the Central American-Caribbean region.
X-ray-generated heralded macroscopical quantum entanglement of two nuclear ensembles.
Liao, Wen-Te; Keitel, Christoph H; Pálffy, Adriana
2016-09-19
Heralded entanglement between macroscopic samples is an important resource for present quantum technology protocols, allowing quantum communication over large distances. In such protocols, optical photons are typically used as information and entanglement carriers between macroscopic quantum memories placed in remote locations. Here we investigate theoretically a new implementation which employs more robust x-ray quanta to generate heralded entanglement between two crystal-hosted macroscopic nuclear ensembles. Mössbauer nuclei in the two crystals interact collectively with an x-ray spontaneous parametric down-conversion photon that generates heralded macroscopic entanglement with coherence times of approximately 100 ns at room temperature. The quantum phase between the entangled crystals can be conveniently manipulated by magnetic field rotations at the samples. The inherently long nuclear coherence times also allow for mechanical manipulation of the samples, for instance to check the stability of entanglement in the x-ray setup. Our results pave the way for the first quantum communication protocols that use x-ray qubits.
Multiple-instance ensemble learning for hyperspectral images
NASA Astrophysics Data System (ADS)
Ergul, Ugur; Bilgin, Gokhan
2017-10-01
An ensemble framework for multiple-instance (MI) learning (MIL) in hyperspectral images (HSIs) is introduced, inspired by the bagging (bootstrap aggregation) method in ensemble learning. Ensemble-based bagging is performed with a small percentage of the training samples, and MI bags are formed by a local windowing process with variable window sizes on the selected instances. In addition to bootstrap aggregation, random subspace selection is used to further diversify the base classifiers. The proposed method is implemented using four MIL classification algorithms. The classifier model learning phase is carried out with MI bags, and the estimation phase is performed over single test instances. In the experimental part of the study, two different HSIs with ground-truth information are used, and comparative results are demonstrated against state-of-the-art classification methods. In general, the MI ensemble approach produces more compact results in terms of both diversity and error compared to equivalent non-MIL algorithms.
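The bagging-plus-random-subspace idea above can be sketched in a few lines. This is not the authors' implementation: the nearest-centroid base learner and all parameter values are illustrative stand-ins for the MIL classifiers used in the paper. Each base learner sees a small bootstrap fraction of training pixels and a random subset of spectral bands, and predictions are combined by majority vote:

```python
import numpy as np

class RandomSubspaceBaggingEnsemble:
    """Bagging + random-subspace ensemble with a nearest-centroid base
    classifier (a stand-in for the MIL learners used in the paper)."""
    def __init__(self, n_estimators=25, sample_frac=0.1, subspace_frac=0.5, seed=0):
        self.n_estimators = n_estimators
        self.sample_frac = sample_frac      # small bootstrap fraction, as in the paper
        self.subspace_frac = subspace_frac  # fraction of bands seen by each base learner
        self.rng = np.random.default_rng(seed)
        self.models_ = []

    def fit(self, X, y):
        n, d = X.shape
        n_boot = max(2, int(self.sample_frac * n))
        n_bands = max(1, int(self.subspace_frac * d))
        self.classes_ = np.unique(y)
        self.models_ = []
        for _ in range(self.n_estimators):
            idx = self.rng.choice(n, size=n_boot, replace=True)      # bootstrap sample
            bands = self.rng.choice(d, size=n_bands, replace=False)  # random subspace
            # Per-class centroids in the chosen subspace; fall back to the
            # full-data centroid when a class is missing from the bag.
            centroids = np.stack([
                X[idx][y[idx] == c][:, bands].mean(axis=0)
                if np.any(y[idx] == c)
                else X[y == c][:, bands].mean(axis=0)
                for c in self.classes_
            ])
            self.models_.append((bands, centroids))
        return self

    def predict(self, X):
        votes = np.zeros((X.shape[0], len(self.classes_)), dtype=int)
        for bands, centroids in self.models_:
            dists = np.linalg.norm(X[:, bands][:, None, :] - centroids[None], axis=2)
            votes[np.arange(X.shape[0]), dists.argmin(axis=1)] += 1
        return self.classes_[votes.argmax(axis=1)]  # majority vote over base learners
```

Swapping the nearest-centroid learner for an actual MIL classifier recovers the structure of the proposed framework.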
Quantum Ensemble Classification: A Sampling-Based Learning Control Approach.
Chen, Chunlin; Dong, Daoyi; Qi, Bo; Petersen, Ian R; Rabitz, Herschel
2017-06-01
Quantum ensemble classification (QEC) has significant applications in discrimination of atoms (or molecules), separation of isotopes, and quantum information extraction. However, quantum mechanics forbids deterministic discrimination among nonorthogonal states. The classification of inhomogeneous quantum ensembles is very challenging, since there exist variations in the parameters characterizing the members within different classes. In this paper, we recast QEC as a supervised quantum learning problem. A systematic classification methodology is presented by using a sampling-based learning control (SLC) approach for quantum discrimination. The classification task is accomplished via simultaneously steering members belonging to different classes to their corresponding target states (e.g., mutually orthogonal states). First, a new discrimination method is proposed for two similar quantum systems. Then, an SLC method is presented for QEC. Numerical results demonstrate the effectiveness of the proposed approach for the binary classification of two-level quantum ensembles and the multiclass classification of multilevel quantum ensembles.
Characterizing RNA ensembles from NMR data with kinematic models
Fonseca, Rasmus; Pachov, Dimitar V.; Bernauer, Julie; van den Bedem, Henry
2014-01-01
Functional mechanisms of biomolecules often manifest themselves precisely in transient conformational substates. Researchers have long sought to structurally characterize dynamic processes in non-coding RNA, combining experimental data with computer algorithms. However, adequate exploration of conformational space for these highly dynamic molecules, starting from static crystal structures, remains challenging. Here, we report a new conformational sampling procedure, KGSrna, which can efficiently probe the native ensemble of RNA molecules in solution. We found that KGSrna ensembles accurately represent the conformational landscapes of 3D RNA encoded by NMR proton chemical shifts. KGSrna resolves motionally averaged NMR data into structural contributions; when coupled with residual dipolar coupling data, a KGSrna ensemble revealed a previously uncharacterized transient excited state of the HIV-1 trans-activation response element stem–loop. Ensemble-based interpretations of averaged data can aid in formulating and testing dynamic, motion-based hypotheses of functional mechanisms in RNAs with broad implications for RNA engineering and therapeutic intervention. PMID:25114056
A target recognition method for maritime surveillance radars based on hybrid ensemble selection
NASA Astrophysics Data System (ADS)
Fan, Xueman; Hu, Shengliang; He, Jingbo
2017-11-01
In order to improve the generalisation ability of maritime surveillance radar, a novel ensemble selection technique, termed Optimisation and Dynamic Selection (ODS), is proposed. During the optimisation phase, the non-dominated sorting genetic algorithm II for multi-objective optimisation is used to find the Pareto front, i.e. a set of classifier ensembles representing different trade-offs between classification error and diversity. During the dynamic selection phase, a meta-learning method is used to predict whether a candidate ensemble is competent enough to classify a query instance based on three different aspects, namely, the feature space, the decision space and the extent of consensus. The classification performance and time complexity of ODS are compared against nine other ensemble methods using a self-built fully polarimetric high-resolution range profile dataset. The experimental results clearly show the effectiveness of ODS. In addition, the influence of the choice of diversity measure is studied.
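The optimisation phase above searches for ensembles that are Pareto-optimal with respect to classification error and diversity. A minimal sketch of the non-domination test at the heart of such a search (NSGA-II itself adds fast non-dominated sorting, crowding distance, and genetic operators, all omitted here) might look like:

```python
def pareto_front(points):
    """Return the indices of non-dominated points.

    Each point is a tuple of objectives to MINIMISE, e.g.
    (classification_error, -diversity): lower error and higher
    diversity are both preferred.
    """
    front = []
    for i, p in enumerate(points):
        dominated = any(
            all(qk <= pk for qk, pk in zip(q, p)) and
            any(qk < pk for qk, pk in zip(q, p))
            for j, q in enumerate(points) if j != i
        )
        if not dominated:
            front.append(i)
    return front
```

For example, an ensemble with higher error but also higher diversity than another is not dominated by it, so both survive on the front, which is exactly the trade-off set the dynamic selection phase then chooses from.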
Hybrid Data Assimilation without Ensemble Filtering
NASA Technical Reports Server (NTRS)
Todling, Ricardo; Akkraoui, Amal El
2014-01-01
The Global Modeling and Assimilation Office is preparing to upgrade its three-dimensional variational system to a hybrid approach in which the ensemble is generated using a square-root ensemble Kalman filter (EnKF) and the variational problem is solved using the Grid-point Statistical Interpolation system. As in most EnKF applications, we found it necessary to employ a combination of multiplicative and additive inflations to compensate for sampling and modeling errors, respectively, and to keep the small-member ensemble solution close to the variational solution; we also found it necessary to re-center the members of the ensemble about the variational analysis. During tuning of the filter we found re-centering and additive inflation to play a considerably larger role than expected, particularly in a dual-resolution context when the variational analysis is run at higher resolution than the ensemble. This led us to consider a hybrid strategy in which the members of the ensemble are generated by simply converting the variational analysis to the resolution of the ensemble and applying additive inflation, thus bypassing the EnKF. Comparisons of this so-called filter-free hybrid procedure with an EnKF-based hybrid procedure and a traditional non-hybrid control scheme show both hybrid strategies to provide equally significant improvement over the control; more interestingly, the filter-free procedure was found to give qualitatively similar results to the EnKF-based procedure.
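The filter-free member generation described above can be sketched in a few lines: the variational analysis, converted to ensemble resolution, serves as the centre, and zero-mean additive perturbations play the role of additive inflation. This is a schematic reconstruction, not the GMAO code; the Gaussian perturbations stand in for whatever additive-inflation perturbations the operational system draws.

```python
import numpy as np

def recenter(members, center):
    """Re-center ensemble members about a given analysis (mean-preserving shift)."""
    return members - members.mean(axis=0, keepdims=True) + center

def filter_free_members(analysis_lowres, n_members, noise_scale, seed=0):
    """Generate members by perturbing the (resolution-converted) variational
    analysis with additive inflation, bypassing the EnKF entirely."""
    rng = np.random.default_rng(seed)
    pert = rng.normal(0.0, noise_scale, size=(n_members,) + analysis_lowres.shape)
    pert -= pert.mean(axis=0, keepdims=True)  # zero-mean, so the mean stays on the analysis
    return analysis_lowres + pert
```

In the EnKF-based variant, `recenter` is instead applied to the filter's updated members so their mean coincides with the variational analysis.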
The mysteries of remote memory.
Albo, Zimbul; Gräff, Johannes
2018-03-19
Long-lasting memories form the basis of our identity as individuals and lie central in shaping future behaviours that guide survival. Surprisingly, however, our current knowledge of how such memories are stored in the brain and retrieved, as well as the dynamics of the circuits involved, remains scarce despite seminal technical and experimental breakthroughs in recent years. Traditionally, it has been proposed that, over time, information initially learnt in the hippocampus is stored in distributed cortical networks. This process-the standard theory of memory consolidation-would stabilize the newly encoded information into a lasting memory, become independent of the hippocampus, and remain essentially unmodifiable throughout the lifetime of the individual. In recent years, several pieces of evidence have started to challenge this view and indicate that long-lasting memories might already ab ovo be encoded, and subsequently stored in distributed cortical networks, akin to the multiple trace theory of memory consolidation. In this review, we summarize these recent findings and attempt to identify the biologically plausible mechanisms based on which a contextual memory becomes remote by integrating different levels of analysis: from neural circuits to cell ensembles across synaptic remodelling and epigenetic modifications. From these studies, remote memory formation and maintenance appear to occur through a multi-trace, dynamic and integrative cellular process ranging from the synapse to the nucleus, and represent an exciting field of research primed to change quickly as new experimental evidence emerges.This article is part of a discussion meeting issue 'Of mice and mental health: facilitating dialogue between basic and clinical neuroscientists'. © 2018 The Authors.
NASA Astrophysics Data System (ADS)
Cheng, Tianhai; Gu, Xingfa; Wu, Yu; Chen, Hao; Yu, Tao
2013-08-01
Applying spherical aerosol models in place of absorbing fine-sized dominated aerosols can potentially result in significant errors in climate models and aerosol remote sensing retrievals. In this paper, the optical properties of absorbing fine-sized dominated aerosols were modeled, taking into account freshly emitted soot particles (agglomerates of primary spherules), aged soot particles (semi-externally mixed with other weakly absorbing aerosols), and coarse aerosol particles (dust particles). The optical properties of the individual fresh and aged soot aggregates are calculated using the superposition T-matrix method. In order to quantify the effect of absorbing aerosol model morphology on aerosol remote sensing retrieval, the ensemble-averaged optical properties of absorbing fine-sized dominated aerosols are calculated based on the size distribution of fine aerosols (fresh and aged soot) and coarse aerosols. The corresponding optical properties of spherical absorbing aerosol models computed with Lorenz-Mie solutions are presented for comparison. The comparison demonstrates that the spherical absorbing aerosol models underestimate the absorption ability of the fine-sized dominated aerosol particles. The morphology effect of absorbing fine-sized dominated aerosols on the TOA radiances and polarized radiances is also investigated. It is found that the spherical aerosol models overestimate the TOA reflectance and polarized reflectance by approximately a factor of 3 at a wavelength of 0.865 μm. In other words, spherical models of fine-sized dominated aerosols can cause large errors in the retrieved aerosol properties if satellite reflectance measurements are analyzed using the conventional Mie theory for spherical particles.
Mass Conservation and Positivity Preservation with Ensemble-type Kalman Filter Algorithms
NASA Technical Reports Server (NTRS)
Janjic, Tijana; McLaughlin, Dennis B.; Cohn, Stephen E.; Verlaan, Martin
2013-01-01
Maintaining conservative physical laws numerically has long been recognized as important in the development of numerical weather prediction (NWP) models. In the broader context of data assimilation, concerted efforts to maintain conservation laws numerically and to understand the significance of doing so have begun only recently. In order to enforce physically based conservation laws of total mass and positivity in the ensemble Kalman filter, we incorporate constraints to ensure that the filter ensemble members and the ensemble mean conserve mass and remain nonnegative through measurement updates. We show that the analysis steps of the ensemble transform Kalman filter (ETKF) and ensemble Kalman filter (EnKF) algorithms can conserve the mass integral but do not preserve positivity. Further, if localization is applied or if negative values are simply set to zero, then the total mass is not conserved either. In order to ensure mass conservation, a projection matrix that corrects for localization effects is constructed. In order to maintain both mass conservation and positivity preservation through the analysis step, we construct a data assimilation algorithm based on quadratic programming and ensemble Kalman filtering. Mass and positivity are both preserved by formulating the filter update as a set of quadratic programming problems that incorporate constraints. Simple numerical experiments indicate that this approach can have a significant positive impact on the posterior ensemble distribution, giving results that are more physically plausible both for individual ensemble members and for the ensemble mean. The results show clear improvements in both analyses and forecasts, particularly in the presence of localized features. Behavior of the algorithm is also tested in the presence of model error.
Zhang, Lijun; Huang, Xinyan; Cao, Yuan; Xin, Yunhong; Ding, Liping
2017-12-22
Enormous effort has been put into the detection and recognition of various heavy metal ions due to their involvement in serious environmental pollution and many major diseases. The present work develops a single fluorescent sensor ensemble that can distinguish and identify a variety of heavy metal ions. A pyrene-based fluorophore (PB) containing a metal-ion receptor group was specially designed and synthesized. Assemblies of the anionic surfactant sodium dodecyl sulfate (SDS) can effectively adjust its fluorescence behavior. The selected binary ensemble based on PB/SDS assemblies exhibits multiple emission bands and provides wavelength-based cross-reactive responses to a series of metal ions, realizing pattern recognition. The combination of surfactant assembly modulation and the metal-ion receptor endows the present sensor ensemble with strong discrimination power: it could well differentiate 13 metal ions, namely Cu2+, Co2+, Ni2+, Cr3+, Hg2+, Fe3+, Zn2+, Cd2+, Al3+, Pb2+, Ca2+, Mg2+, and Ba2+. Moreover, this single sensing ensemble could be further applied to identifying different brands of drinking water.
Large Unbalanced Credit Scoring Using Lasso-Logistic Regression Ensemble
Wang, Hong; Xu, Qingsong; Zhou, Lifeng
2015-01-01
Recently, various ensemble learning methods with different base classifiers have been proposed for credit scoring problems. However, for various reasons, there has been little research using logistic regression as the base classifier. In this paper, given large unbalanced data, we consider the plausibility of ensemble learning using regularized logistic regression as the base classifier to deal with credit scoring problems. In this research, the data is first balanced and diversified by clustering and bagging algorithms. Then we apply a Lasso-logistic regression learning ensemble to evaluate the credit risks. We show that the proposed algorithm outperforms popular credit scoring models such as decision tree, Lasso-logistic regression and random forests in terms of AUC and F-measure. We also provide two importance measures for the proposed model to identify important variables in the data. PMID:25706988
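A rough sketch of the overall recipe follows, assuming a hand-rolled proximal-gradient Lasso-logistic solver in place of whatever production solver one would actually use, and plain bootstrap resampling in place of the paper's clustering-plus-bagging balancing step:

```python
import numpy as np

def fit_lasso_logistic(X, y, lam=0.01, lr=0.1, n_iter=300):
    """L1-regularised logistic regression via proximal gradient descent (ISTA)."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted probabilities
        w -= lr * (X.T @ (p - y) / n)            # gradient step on weights
        b -= lr * np.mean(p - y)                 # gradient step on intercept
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)  # L1 soft-threshold
    return w, b

def bagged_lasso_logistic(X, y, n_models=10, seed=0):
    """Bagging: each base model is fit on a bootstrap resample of the data."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_models):
        idx = rng.choice(len(y), size=len(y), replace=True)
        models.append(fit_lasso_logistic(X[idx], y[idx]))
    return models

def predict_proba(models, X):
    """Average the base models' probability estimates (soft vote)."""
    scores = [1.0 / (1.0 + np.exp(-(X @ w + b))) for w, b in models]
    return np.mean(scores, axis=0)
```

The averaged absolute weights across base models also give a crude analogue of the variable-importance measures the paper proposes.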
Ocean state and uncertainty forecasts using HYCOM with the Local Ensemble Transform Kalman Filter (LETKF)
NASA Astrophysics Data System (ADS)
Wei, Mozheng; Hogan, Pat; Rowley, Clark; Smedstad, Ole-Martin; Wallcraft, Alan; Penny, Steve
2017-04-01
An ensemble forecast system based on the US Navy's operational HYCOM and using Local Ensemble Transform Kalman Filter (LETKF) technology has been developed for ocean state and uncertainty forecasts. One advantage is that the best possible initial analysis states for the HYCOM forecasts are provided by the LETKF, which assimilates operational observations using an ensemble method. The background covariance during this assimilation process is supplied by the ensemble, which avoids the difficulty of developing tangent linear and adjoint models for 4D-VAR from the complicated hybrid isopycnal vertical coordinate in HYCOM. Another advantage is that the ensemble system provides a valuable uncertainty estimate corresponding to every state forecast from HYCOM. Uncertainty forecasts have proven critical for downstream users and managers in the numerical prediction community to make more scientifically sound decisions. In addition, the ensemble mean is generally more accurate and skilful than a single traditional deterministic forecast at the same resolution. We will introduce the ensemble system design and setup, present some results from a 30-member ensemble experiment, and discuss scientific, technical and computational issues and challenges, such as covariance localization, inflation, model-related uncertainties and sensitivity to the ensemble size.
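The LETKF analysis that supplies the initial states can be summarised by the standard ensemble transform update solved at each grid point (Hunt et al. 2007). The sketch below shows only the global, unlocalized core; an operational system additionally applies observation localization, inflation tuning, and model-specific state handling:

```python
import numpy as np

def letkf_analysis(Xb, y, H, R):
    """One ensemble transform Kalman filter analysis step -- the core solved
    at each grid point in the LETKF, here without localization for brevity.

    Xb: (n, k) background ensemble, members as columns
    y:  (m,)   observations
    H:  (m, n) linear observation operator
    R:  (m, m) observation-error covariance
    """
    n, k = Xb.shape
    xb = Xb.mean(axis=1)
    Xp = Xb - xb[:, None]                    # background anomalies
    Yb = H @ Xp                              # anomalies mapped to observation space
    Rinv = np.linalg.inv(R)
    Pa = np.linalg.inv((k - 1) * np.eye(k) + Yb.T @ Rinv @ Yb)  # in ensemble space
    wbar = Pa @ Yb.T @ Rinv @ (y - H @ xb)   # mean-update weight vector
    evals, evecs = np.linalg.eigh((k - 1) * Pa)
    Wa = evecs @ np.diag(np.sqrt(np.maximum(evals, 0.0))) @ evecs.T  # symmetric sqrt
    return xb[:, None] + Xp @ (wbar[:, None] + Wa)  # analysis ensemble
```

The analysis mean moves toward the observations and the analysis spread, which supplies the uncertainty estimate, shrinks accordingly.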
Optimizing inhomogeneous spin ensembles for quantum memory
NASA Astrophysics Data System (ADS)
Bensky, Guy; Petrosyan, David; Majer, Johannes; Schmiedmayer, Jörg; Kurizki, Gershon
2012-07-01
We propose a method to maximize the fidelity of quantum memory implemented by a spectrally inhomogeneous spin ensemble. The method is based on preselecting the optimal spectral portion of the ensemble by judiciously designed pulses. This leads to significant improvement of the transfer and storage of quantum information encoded in the microwave or optical field.
Conservation of Mass and Preservation of Positivity with Ensemble-Type Kalman Filter Algorithms
NASA Technical Reports Server (NTRS)
Janjic, Tijana; Mclaughlin, Dennis; Cohn, Stephen E.; Verlaan, Martin
2014-01-01
This paper considers the incorporation of constraints to enforce physically based conservation laws in the ensemble Kalman filter. In particular, constraints are used to ensure that the ensemble members and the ensemble mean conserve mass and remain nonnegative through measurement updates. In certain situations filtering algorithms such as the ensemble Kalman filter (EnKF) and ensemble transform Kalman filter (ETKF) yield updated ensembles that conserve mass but are negative, even though the actual states must be nonnegative. In such situations if negative values are set to zero, or a log transform is introduced, the total mass will not be conserved. In this study, mass and positivity are both preserved by formulating the filter update as a set of quadratic programming problems that incorporate non-negativity constraints. Simple numerical experiments indicate that this approach can have a significant positive impact on the posterior ensemble distribution, giving results that are more physically plausible both for individual ensemble members and for the ensemble mean. In two examples, an update that includes a non-negativity constraint is able to properly describe the transport of a sharp feature (e.g., a triangle or cone). A number of implementation questions still need to be addressed, particularly the need to develop a computationally efficient quadratic programming update for large ensembles.
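For the simplest diagonal form, the per-member quadratic program described above, minimise ||z − x||² subject to z ≥ 0 and the components of z summing to the prior total mass, reduces to a Euclidean projection onto a scaled simplex, which has a well-known O(n log n) sorting solution. The sketch below is illustrative only and ignores the localization-correction projection discussed in the companion work:

```python
import numpy as np

def project_mass_positive(x, total_mass):
    """Solve  min ||z - x||^2  s.t.  z >= 0, sum(z) = total_mass.

    This is projection onto the scaled simplex (sorting algorithm of
    Duchi et al. 2008), applied here as a stand-in for the constrained
    quadratic-programming update of each ensemble member.
    """
    u = np.sort(x)[::-1]                     # sort descending
    css = np.cumsum(u) - total_mass          # running mass surplus
    rho = np.nonzero(u - css / (np.arange(len(x)) + 1) > 0)[0][-1]
    theta = css[rho] / (rho + 1.0)           # uniform shift that balances the mass
    return np.maximum(x - theta, 0.0)        # clip negatives, mass preserved
```

Applied to every updated member (and the mean), the result is a nonnegative ensemble whose total mass matches the background exactly.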
Huisman, J.A.; Breuer, L.; Bormann, H.; Bronstert, A.; Croke, B.F.W.; Frede, H.-G.; Graff, T.; Hubrechts, L.; Jakeman, A.J.; Kite, G.; Lanini, J.; Leavesley, G.; Lettenmaier, D.P.; Lindstrom, G.; Seibert, J.; Sivapalan, M.; Viney, N.R.; Willems, P.
2009-01-01
An ensemble of 10 hydrological models was applied to the same set of land use change scenarios. There was general agreement about the direction of changes in the mean annual discharge and 90% discharge percentile predicted by the ensemble members, although a considerable range in the magnitude of predictions for the scenarios and catchments under consideration was obvious. Differences in the magnitude of the increase were attributed to the different mean annual actual evapotranspiration rates for each land use type. The ensemble of model runs was further analyzed with deterministic and probabilistic ensemble methods. The deterministic ensemble method based on a trimmed mean resulted in a single somewhat more reliable scenario prediction. The probabilistic reliability ensemble averaging (REA) method allowed a quantification of the model structure uncertainty in the scenario predictions. It was concluded that the use of a model ensemble has greatly increased our confidence in the reliability of the model predictions. ?? 2008 Elsevier Ltd.
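The deterministic trimmed-mean combination mentioned above is straightforward to sketch; the 20% trimming fraction here is an illustrative choice, not necessarily the one used in the study:

```python
import numpy as np

def trimmed_mean_ensemble(predictions, trim_frac=0.2):
    """Deterministic ensemble combination: at each time step, drop the lowest
    and highest trim_frac of the model predictions and average the rest.

    predictions: (n_models, n_times) array-like of model outputs.
    """
    P = np.sort(np.asarray(predictions, dtype=float), axis=0)  # rank models per step
    k = int(trim_frac * P.shape[0])                            # how many to trim per tail
    return P[k:P.shape[0] - k].mean(axis=0)
```

Because the tails are discarded, a single outlying model cannot drag the combined scenario prediction, which is the robustness property motivating the trimmed mean over a plain ensemble average.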
Multi-objective optimization for generating a weighted multi-model ensemble
NASA Astrophysics Data System (ADS)
Lee, H.
2017-12-01
Many studies have demonstrated that multi-model ensembles generally show better skill than any individual ensemble member. When generating weighted multi-model ensembles, the first step is measuring the performance of individual model simulations using observations. There is broad agreement on how to assign weighting factors based on a single evaluation metric: the weighting factor for each model is proportional to a performance score or inversely proportional to an error for the model. While this conventional approach can provide appropriate combinations of multiple models, it faces a serious challenge when multiple metrics are under consideration. With multiple evaluation metrics, a simple averaging of performance scores or model ranks does not address the trade-off problem between conflicting metrics. So far, there seems to be no best method to generate weighted multi-model ensembles based on multiple performance metrics. The current study applies multi-objective optimization, a mathematical process that provides a set of optimal trade-off solutions based on a range of evaluation metrics, to combine multiple performance metrics for global climate models and their dynamically downscaled regional climate simulations over North America and to generate a weighted multi-model ensemble. NASA satellite data and the Regional Climate Model Evaluation System (RCMES) software toolkit are used for assessment of the climate simulations. Overall, the performance of each model differs markedly, with strong seasonal dependence. Because of the considerable variability across the climate simulations, it is important to evaluate models systematically and make future projections by assigning optimized weighting factors to the models with relatively good performance.
Our results indicate that the optimally weighted multi-model ensemble always shows better performance than an arithmetic ensemble mean and may provide reliable future projections.
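One simple way to see the trade-off problem is weighted-sum scalarisation: sweeping a parameter between two normalised error metrics yields a different model weighting at each point of the sweep. This is only a didactic stand-in for the genuine multi-objective (Pareto-front) optimization the study applies, with illustrative normalisation and weighting rules:

```python
import numpy as np

def tradeoff_weights(err1, err2, lambdas):
    """Weighted-sum scalarisation over two (already normalised) error metrics.

    For each trade-off parameter lam in [0, 1], combine the metrics as
    lam * err1 + (1 - lam) * err2 and assign model weights inversely
    proportional to the combined error. The sweep over lam traces an
    approximation to the trade-off surface a true multi-objective
    optimizer would explore.
    """
    err1, err2 = np.asarray(err1, dtype=float), np.asarray(err2, dtype=float)
    out = []
    for lam in lambdas:
        combined = lam * err1 + (1 - lam) * err2
        w = 1.0 / (combined + 1e-9)          # better (lower error) -> larger weight
        out.append(w / w.sum())              # normalise to sum to 1
    return np.array(out)
```

When two models each excel on a different metric, no single lam ranks both first, which is exactly the conflict that motivates replacing score averaging with a Pareto-based procedure.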
Rethinking the Default Construction of Multimodel Climate Ensembles
Rauser, Florian; Gleckler, Peter; Marotzke, Jochem
2015-07-21
Here, we discuss the current code of practice in the climate sciences of routinely creating climate model ensembles as ensembles of opportunity from the newest phase of the Coupled Model Intercomparison Project (CMIP). We give a two-step argument to rethink this process. First, the differences between generations of ensembles corresponding to different CMIP phases in key climate quantities are not large enough to warrant an automatic separation into generational ensembles for CMIP3 and CMIP5. Second, we suggest that climate model ensembles cannot continue to be mere ensembles of opportunity but should always be based on a transparent scientific decision process. If ensembles can be constrained by observation, then they should be constructed as target ensembles that are specifically tailored to a physical question. If model ensembles cannot be constrained by observation, then they should be constructed as cross-generational ensembles, including all available model data to enhance structural model diversity and to better sample the underlying uncertainties. To facilitate this, CMIP should guide the necessarily ongoing process of updating experimental protocols for the evaluation and documentation of coupled models. Finally, with an emphasis on easy access to model data and facilitating the filtering of climate model data across all CMIP generations and experiments, our community could return to the underlying idea of using model data ensembles to improve uncertainty quantification, evaluation, and cross-institutional exchange.
Radar Altimetry for Hydrological Modeling and Monitoring in the Zambezi River Basin
NASA Astrophysics Data System (ADS)
Michailovsky, C. I.; Berry, P. A.; Smith, R. G.; Bauer-Gottwein, P.
2011-12-01
Hydrological model forecasts are subject to large uncertainties stemming from uncertain input data, model structure, parameterization and lack of sufficient calibration/validation data. For real-time or near-real-time applications data assimilation techniques such as the Ensemble Kalman Filter (EnKF) can be used to reduce forecast uncertainty by updating model states as new data becomes available. The use of remote sensing data is attractive for such applications as it provides wide geographical coverage and continuous time-series without the typically long delays that exist in obtaining in-situ data. River discharge is one of the main hydrological variables of interest, and while it cannot currently be directly measured remotely, water levels in rivers can be obtained from satellite based radar altimetry and converted to discharge through rating curves. This study aims to give a realistic assessment of the improvements that can be derived from the use of satellite radar altimetry measurements from the Envisat mission for discharge monitoring and modeling on the basin scale for the Zambezi River. The altimetry data used is the Radar AlTimetry (RAT) product developed at the Earth and Planetary Remote Sensing Laboratory at the De Montfort University. The first step in analyzing the data is the determination of potential altimetry targets which are the locations at which the Envisat orbit and the river network cross in order to select data points corresponding to surface water. The quality of the water level time-series is then analyzed for all targets and the exploitable targets identified. Rating curves are derived from in-situ or remotely-sensed data depending on data-availability at the various locations and discharge time-series are established. A Monte Carlo analysis is carried out to assess the uncertainties on the computed discharge. 
It was found that having a single cross-section and associated discharge measurement at one point in time significantly reduces discharge uncertainty. To assess improvements in model predictions, a model of the Zambezi River basin based on remote sensing data is set up with the Soil and Water Assessment Tool and calibrated with available in-situ data. The discharge data from altimetry is then used in an EnKF framework to update discharge in the model as it runs. The method showed improvements in prediction uncertainties for short lead times.
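The level-to-discharge conversion at the centre of this approach typically uses a power-law rating curve, Q = a(h − h0)^b, fitted in log space. The sketch below assumes the zero-flow datum h0 is known; in practice it is often estimated as well, and uncertainties in such parameters are what a Monte Carlo analysis like the one described here propagates:

```python
import numpy as np

def fit_rating_curve(h, Q, h0):
    """Fit Q = a * (h - h0)**b by linear regression in log space,
    given paired stage (h) and discharge (Q) observations and a known
    zero-flow datum h0 (an assumption made for this sketch)."""
    b, log_a = np.polyfit(np.log(h - h0), np.log(Q), 1)
    return np.exp(log_a), b

def level_to_discharge(h, a, b, h0):
    """Convert altimetric water levels to discharge via the rating curve."""
    return a * np.maximum(h - h0, 0.0) ** b
```

The resulting discharge time series is what would then be assimilated into the hydrological model with the EnKF.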
Automated Historical and Real-Time Cyclone Discovery With Multimodal Remote Satellite Measurements
NASA Astrophysics Data System (ADS)
Ho, S.; Talukder, A.; Liu, T.; Tang, W.; Bingham, A.
2008-12-01
Existing cyclone detection and tracking solutions involve extensive manual analysis of modeled data and field campaign data by teams of experts. We have developed a novel automated global cyclone detection and tracking system by assimilating and sharing information from multiple remote satellites. Combining multiple remote satellite measurements in an autonomous manner leverages the strengths of each individual satellite, and the use of multiple satellite data sources also results in significantly improved temporal tracking accuracy for cyclones. Our solution involves automated feature extraction and a machine learning technique based on an ensemble classifier and a Kalman filter for cyclone detection and tracking from multiple heterogeneous satellite data sources. Our feature-based methodology, which focuses on automated cyclone discovery, is fundamentally different from, and actually complements, the well-known Dvorak technique for cyclone intensity estimation (which often relies on manual detection of cyclonic regions) from field and remote data. Our solution currently employs QuikSCAT wind measurements and the merged level 3 TRMM precipitation data for automated cyclone discovery. Assimilation of other types of remote measurements is ongoing and planned for the near future. Experimental results of our automated solution on historical cyclone datasets demonstrate superior performance compared to previous work. In our initial analysis, the performance of our detection solution compares favorably against the list of cyclones in the North Atlantic Ocean for the 2005 calendar year reported by the National Hurricane Center (NHC). We have also demonstrated the robustness of our cyclone tracking methodology in other regions of the world by using multiple heterogeneous satellite data sources for detection and tracking of three arbitrary historical cyclones.
Our cyclone detection and tracking methodology can be applied to (i) historical data, to support Earth scientists in climate modeling and the study of cyclone-climate interactions, and to obtain a better understanding of the causes and effects of cyclones (e.g. cyclogenesis), and (ii) automatic cyclone discovery in near real time using streaming satellite data, to support and improve the planning of global cyclone field campaigns. Additional satellite data from GOES and other orbiting satellites can easily be assimilated and integrated into our automated cyclone detection and tracking module to improve the temporal tracking accuracy of cyclones down to ½ hr and reduce the incidence of false alarms.
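The temporal tracking stage pairs the ensemble detections with a recursive filter. A minimal constant-velocity Kalman filter over detected storm-centre positions, with illustrative noise levels rather than tuned ones, might look like:

```python
import numpy as np

def kalman_track_step(x, P, z, dt=1.0, q=1e-3, r=0.25):
    """One predict/update step of a constant-velocity Kalman filter for a
    cyclone-centre track. State x = [lon, lat, dlon, dlat]; z = observed
    (lon, lat) from the detection stage. q and r are illustrative process-
    and observation-noise levels, not tuned values."""
    F = np.eye(4); F[0, 2] = F[1, 3] = dt            # constant-velocity motion model
    H = np.zeros((2, 4)); H[0, 0] = H[1, 1] = 1.0    # we observe position only
    Q = q * np.eye(4); R = r * np.eye(2)
    x = F @ x; P = F @ P @ F.T + Q                   # predict
    S = H @ P @ H.T + R                              # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)                   # Kalman gain
    x = x + K @ (z - H @ x)                          # update with the new detection
    P = (np.eye(4) - K @ H) @ P
    return x, P
```

Feeding the filter the classifier's per-frame detections yields smoothed tracks and predicted positions for associating detections across successive satellite passes.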
NASA Astrophysics Data System (ADS)
Lahmiri, S.; Boukadoum, M.
2015-10-01
Accurate forecasting of stock market volatility is an important issue in portfolio risk management. In this paper, an ensemble system for forecasting stock market volatility is presented. It is composed of three different models that hybridize the exponential generalized autoregressive conditional heteroscedasticity (EGARCH) process and an artificial neural network trained with the backpropagation algorithm (BPNN) to forecast stock market volatility under normal, Student's t, and generalized error distribution (GED) assumptions, respectively. The goal is to design an ensemble system in which each hybrid model is capable of capturing normality, excess skewness, or excess kurtosis in the data, so that the models complement one another. The performance of each EGARCH-BPNN model and of the ensemble system is evaluated by the closeness of the volatility forecasts to realized volatility. Based on mean absolute error and mean squared error, the experimental results show that the proposed ensemble model, which captures normality, skewness, and kurtosis in the data, is more accurate than the individual EGARCH-BPNN models in forecasting S&P 500 intra-day volatility at one- and five-minute time horizons.
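The complementarity argument can be sketched numerically: averaging forecasts from models with offsetting biases yields a smaller error than any single model. The toy forecasts below are fabricated for illustration and are not the paper's EGARCH-BPNN outputs.

```python
# Sketch: an equally weighted ensemble of three volatility forecasters with
# offsetting biases beats each member on MAE. All numbers are illustrative.

def mae(forecast, realized):
    return sum(abs(f - r) for f, r in zip(forecast, realized)) / len(realized)

realized = [1.0, 2.0, 3.0, 4.0]                  # "realized volatility"
model_a = [r + 0.3 for r in realized]            # over-predicts
model_b = [r - 0.3 for r in realized]            # under-predicts
model_c = [r + s for r, s in zip(realized, [0.1, -0.1, 0.1, -0.1])]

ensemble = [(a + b + c) / 3 for a, b, c in zip(model_a, model_b, model_c)]

errors = {name: mae(f, realized)
          for name, f in [("A", model_a), ("B", model_b),
                          ("C", model_c), ("ensemble", ensemble)]}
```

Here the over- and under-predictions cancel in the average, so the ensemble MAE is well below the best individual MAE.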
NASA Astrophysics Data System (ADS)
Flampouris, Stylianos; Penny, Steve; Alves, Henrique
2017-04-01
The National Centers for Environmental Prediction (NCEP) of the National Oceanic and Atmospheric Administration (NOAA) provides the operational wave forecast for the US National Weather Service (NWS). As part of continuous efforts to improve the forecast, NCEP is developing an ensemble-based data assimilation system built on the local ensemble transform Kalman filter (LETKF), the existing operational global wave ensemble system (GWES), and satellite and in-situ observations. While the LETKF was designed for atmospheric applications (Hunt et al. 2007) and has been adapted for several ocean models (e.g., Penny 2016), this is the first time it has been applied to ocean wave assimilation. The new wave assimilation system provides a global estimate of the surface sea state and its approximate uncertainty. It achieves this by analyzing the 21-member ensemble of significant wave height provided by GWES every 6 h. Observations from four altimeters and all available in-situ measurements are used in this analysis. The analysis of significant wave height is used to initialize the next forecasting cycle; the data assimilation system is currently being tested for operational use.
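The flavor of the ensemble analysis step can be conveyed with a much simpler relative of the LETKF: a scalar perturbed-observation ensemble Kalman update at a single grid point. The background members, the observation, and the error variances below are invented for illustration; the operational system localizes and transforms the full ensemble rather than updating one scalar.

```python
import random
from statistics import mean, variance

# Sketch of a perturbed-observation ensemble Kalman update for one grid
# point of significant wave height. This is a simplification of the LETKF;
# members, observation, and error variances are invented.

random.seed(42)

def enkf_update(background, obs, obs_var):
    """Pull each background member toward a perturbed copy of the observation."""
    b_var = variance(background)                 # sample background variance
    gain = b_var / (b_var + obs_var)             # Kalman gain
    return [x + gain * (obs + random.gauss(0.0, obs_var ** 0.5) - x)
            for x in background]

background = [1.8, 2.0, 2.2, 1.9, 2.1]   # background SWH members (m)
obs = 3.0                                # altimeter SWH observation (m)
analysis = enkf_update(background, obs, obs_var=0.0025)
```

The analysis ensemble mean moves toward the observation and the analysis spread shrinks, which is exactly the "estimate plus approximate uncertainty" behavior the abstract describes.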
Ensemble Pruning for Glaucoma Detection in an Unbalanced Data Set.
Adler, Werner; Gefeller, Olaf; Gul, Asma; Horn, Folkert K; Khan, Zardad; Lausen, Berthold
2016-12-07
Random forests are successful classifier ensemble methods consisting of typically 100 to 1000 classification trees. Ensemble pruning techniques reduce the computational cost, especially the memory demand, of random forests by reducing the number of trees without relevant loss of performance or even with increased performance of the sub-ensemble. The application of these techniques to the early detection of glaucoma, a severe eye disease with low prevalence, based on topographical measurements of the eye background, faces specific challenges. We examine the performance of ensemble pruning strategies for glaucoma detection in an unbalanced data situation. The data set consists of 102 topographical features of the eye background of 254 healthy controls and 55 glaucoma patients. We compare the area under the receiver operating characteristic curve (AUC) and the Brier score, on the total data set, in the majority class, and in the minority class, of pruned random forest ensembles obtained with strategies based on the prediction accuracy of greedily grown sub-ensembles, the uncertainty-weighted accuracy, and the similarity between single trees. To validate the findings and to examine the influence of the prevalence of glaucoma in the data set, we additionally perform a simulation study with lower prevalences of glaucoma. In glaucoma classification, all three pruning strategies lead to improved AUC and smaller Brier scores on the total data set with sub-ensembles as small as 30 to 80 trees, compared to the classification results obtained with the full ensemble of 1000 trees. In the simulation study, we showed that the prevalence of glaucoma is a critical factor and that lower prevalence decreases the performance of our pruning strategies.
The memory demand for glaucoma classification in an unbalanced data situation based on random forests could effectively be reduced by the application of pruning strategies without loss of performance in a population with increased risk of glaucoma.
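The accuracy-based pruning strategy mentioned above can be sketched as greedy forward selection over per-tree validation predictions. The tiny "trees" below are hard-coded 0/1 vote vectors, not real trained trees, and the data are fabricated; they merely illustrate why a small sub-ensemble can beat the full ensemble.

```python
# Sketch of accuracy-driven greedy ensemble pruning: starting from an empty
# sub-ensemble, repeatedly add the tree that maximizes majority-vote accuracy
# on a validation set. The per-tree predictions below are fabricated.

def accuracy(trees, labels):
    """Majority-vote accuracy of a sub-ensemble (ties break toward class 1)."""
    correct = 0
    for i, y in enumerate(labels):
        votes = sum(t[i] for t in trees)
        pred = 1 if 2 * votes >= len(trees) else 0
        correct += (pred == y)
    return correct / len(labels)

def greedy_prune(trees, labels, target_size):
    chosen, remaining = [], list(trees)
    while remaining and len(chosen) < target_size:
        best = max(remaining, key=lambda t: accuracy(chosen + [t], labels))
        chosen.append(best)
        remaining.remove(best)
    return chosen

labels = [1, 0, 1, 0]
trees = [[1, 0, 1, 0],          # accurate tree
         [1, 0, 1, 0],          # accurate tree
         [0, 1, 0, 1],          # inverted (always wrong)
         [0, 1, 0, 1],
         [0, 1, 0, 1]]
sub = greedy_prune(trees, labels, target_size=2)
```

In this toy case the bad trees dominate the full majority vote, so the pruned two-tree sub-ensemble is strictly more accurate than all five trees combined.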
Cooperative Autonomous Observation of Volcanic Environments with sUAS
NASA Astrophysics Data System (ADS)
Ravela, S.
2015-12-01
The Cooperative Autonomous Observing System Project (CAOS) at the MIT Earth Signals and Systems Group has developed methodology and systems for dynamically mapping coherent fluids such as plumes using small unmanned aircraft systems (sUAS). In the CAOS approach, two classes of sUAS, one remote and the other in-situ, implement a dynamic data-driven mapping system by closing the loop between Modeling, Estimation, Sampling, Planning and Control (MESPAC). The continually gathered measurements are assimilated to produce maps/analyses which also guide the sUAS network to adaptively resample the environment. Rather than scan the volume in fixed Eulerian or Lagrangian flight plans, the adaptive nature of the sampling process enables objectives for efficiency and resilience to be incorporated. Modeling includes real-time prediction using two types of reduced models, one based on nowcasting remote observations of plume tracer using scale-cascaded alignment, and another based on dynamically-deformable EOF/POD developed for coherent structures. Ensemble-based, information-theoretic machine learning approaches are used for the highly non-linear/non-Gaussian state/parameter estimation, and for planning. Control of the sUAS is based on model reference control coupled with hierarchical PID. MESPAC is implemented in part on a SkyCandy platform, and implements an airborne mesh that provides instantaneous situational awareness and redundant communication to an operating fleet. SkyCandy is deployed on Itzamna Aero's I9X/W UAS with low-cost sensors, and is currently being used to study the Popocatepetl volcano. Results suggest that operational communities can deploy low-cost sUAS to systematically monitor volcanic environments while optimizing for efficiency and maximizing resilience. The CAOS methodology is applicable to many other environments where coherent structures are present in the background. More information can be found at caos.mit.edu.
Predicting carbon dioxide and energy fluxes across global FLUXNET sites with regression algorithms
Tramontana, Gianluca; Jung, Martin; Schwalm, Christopher R.; ...
2016-07-29
Spatio-temporal fields of land–atmosphere fluxes derived from data-driven models can complement simulations by process-based land surface models. While a number of strategies for empirical models with eddy-covariance flux data have been applied, a systematic intercomparison of these methods has been missing so far. In this study, we performed a cross-validation experiment for predicting carbon dioxide, latent heat, sensible heat and net radiation fluxes across different ecosystem types with 11 machine learning (ML) methods from four different classes (kernel methods, neural networks, tree methods, and regression splines). We applied two complementary setups: (1) 8-day average fluxes based on remotely sensed data and (2) daily mean fluxes based on meteorological data and a mean seasonal cycle of remotely sensed variables. The patterns of predictions from different ML and experimental setups were highly consistent. There were systematic differences in performance among the fluxes, with the following ascending order: net ecosystem exchange (R² < 0.5), ecosystem respiration (R² > 0.6), gross primary production (R² > 0.7), latent heat (R² > 0.7), sensible heat (R² > 0.7), and net radiation (R² > 0.8). The ML methods predicted the across-site variability and the mean seasonal cycle of the observed fluxes very well (R² > 0.7), while the 8-day deviations from the mean seasonal cycle were not well predicted (R² < 0.5). Fluxes were better predicted at forested and temperate climate sites than at sites in extreme climates or less represented by training data (e.g., the tropics). Finally, the evaluated large ensemble of ML-based models will be the basis of new global flux products.
A novel method to improve MODIS AOD retrievals in cloudy pixels using an analog ensemble approach
NASA Astrophysics Data System (ADS)
Kumar, R.; Raman, A.; Delle Monache, L.; Alessandrini, S.; Cheng, W. Y. Y.; Gaubert, B.; Arellano, A. F.
2016-12-01
Particulate matter (PM) concentrations are one of the fundamental indicators of air quality. Earth-orbiting satellite platforms acquire column aerosol abundance that can in turn provide information about PM concentrations. One serious limitation of column aerosol retrievals from low-earth-orbiting satellites is that the algorithms are based on clear-sky assumptions: they do not retrieve AOD in cloudy pixels. After filtering cloudy pixels, these algorithms also arbitrarily remove the brightest and darkest 25% of remaining pixels over ocean, and the brightest and darkest 50% of pixels over land, to filter any residual contamination from clouds. This becomes a critical issue especially in regions that experience a monsoon, such as Asia and North America. In the case of North America, the monsoon season experiences a wide variety of extreme air quality events, such as fires in California and dust storms in Arizona. Assessment of these episodic events warrants frequent monitoring of aerosol observations from remote sensing retrievals. In this study, we demonstrate a method to fill in cloudy pixels in Moderate Resolution Imaging Spectroradiometer (MODIS) AOD retrievals based on ensembles generated using an analog ensemble (AnEn) approach. It provides a probabilistic distribution of AOD in cloudy pixels using historical records of model simulations of meteorological predictors, such as AOD, relative humidity, and wind speed, and past observational records of MODIS AOD at a given target site. We use simulations from the Weather Research and Forecasting model coupled with Chemistry (WRF-Chem) run at a resolution comparable to MODIS AOD. Analogs selected from the summer months (June, July) of 2011-2013 from the model and corresponding observations are used as a training dataset. Then, missing AOD retrievals in cloudy pixels in the last 31 days of the selected period are estimated. Here, we use AERONET stations as target sites to facilitate comparison against in-situ measurements.
We use two approaches to evaluate the estimated AOD: (1) comparing against reanalysis AOD, and (2) inverting AOD to PM10 concentrations and comparing those with measured PM10. AnEn is an efficient approach to generating an ensemble, as it involves only one model run and provides an estimate of uncertainty that is consistent with the physical and chemical state of the atmosphere.
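The AnEn idea reduces to a few lines: for a target time, find the k historical times whose model predictors are closest, and use the co-located observations at those times as the ensemble. The predictor pairs and observations below are invented, and real applications normalize the predictors so that no single variable dominates the distance.

```python
# Sketch of an analog ensemble (AnEn): the k historical times with the most
# similar model predictors supply the observed AODs that form the ensemble.
# Predictors and observations are invented for illustration; in practice
# predictors would be normalized and weighted.

def analog_ensemble(target, past_predictors, past_obs, k):
    """Return observed values at the k nearest historical predictor states."""
    def dist(p):
        return sum((a - b) ** 2 for a, b in zip(p, target)) ** 0.5
    ranked = sorted(range(len(past_predictors)),
                    key=lambda i: dist(past_predictors[i]))
    return [past_obs[i] for i in ranked[:k]]

# Historical (model AOD, relative humidity) pairs and co-located MODIS AODs.
past_predictors = [(0.10, 40.0), (0.50, 70.0), (0.90, 90.0),
                   (0.11, 41.0), (0.52, 71.0)]
past_obs = [0.12, 0.48, 0.85, 0.13, 0.50]

# Target: a cloudy pixel whose model predictors resemble the two humid cases.
ensemble = analog_ensemble((0.51, 70.5), past_predictors, past_obs, k=2)
```

The spread of the returned ensemble is the probabilistic AOD estimate for the cloudy pixel.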
JEnsembl: a version-aware Java API to Ensembl data systems
Paterson, Trevor; Law, Andy
2012-01-01
Motivation: The Ensembl Project provides release-specific Perl APIs for efficient high-level programmatic access to data stored in various Ensembl database schemas. Although Perl scripts are perfectly suited to processing large volumes of text-based data, Perl is not ideal for developing large-scale software applications nor for embedding in graphical interfaces. The provision of a novel Java API would facilitate type-safe, modular, object-oriented development of new bioinformatics tools with which to access, analyse and visualize Ensembl data. Results: The JEnsembl API implementation provides basic data retrieval and manipulation functionality from the Core, Compara and Variation databases for all species in Ensembl and EnsemblGenomes, and is a platform for the development of a richer API to Ensembl data sources. The JEnsembl architecture uses a text-based configuration module to provide evolving, versioned mappings from database schemas to code objects. A single installation of the JEnsembl API can therefore simultaneously and transparently connect to current and previous database instances (such as those in the public archive), thus facilitating better analysis repeatability and allowing ‘through time’ comparative analyses to be performed. Availability: Project development, released code libraries, Maven repository and documentation are hosted at SourceForge (http://jensembl.sourceforge.net). Contact: jensembl-develop@lists.sf.net, andy.law@roslin.ed.ac.uk, trevor.paterson@roslin.ed.ac.uk PMID:22945789
Blanton, Brian; Dresback, Kendra; Colle, Brian; Kolar, Randy; Vergara, Humberto; Hong, Yang; Leonardo, Nicholas; Davidson, Rachel; Nozick, Linda; Wachtendorf, Tricia
2018-04-25
Hurricane track and intensity can change rapidly in unexpected ways, thus making predictions of hurricanes and related hazards uncertain. This inherent uncertainty often translates into suboptimal decision-making outcomes, such as unnecessary evacuation. Representing this uncertainty is thus critical in evacuation planning and related activities. We describe a physics-based hazard modeling approach that (1) dynamically accounts for the physical interactions among hazard components and (2) captures hurricane evolution uncertainty using an ensemble method. This loosely coupled model system provides a framework for probabilistic water inundation and wind speed levels for a new, risk-based approach to evacuation modeling, described in a companion article in this issue. It combines the Weather Research and Forecasting (WRF) meteorological model, the Coupled Routing and Excess STorage (CREST) hydrologic model, and the ADvanced CIRCulation (ADCIRC) storm surge, tide, and wind-wave model to compute inundation levels and wind speeds for an ensemble of hurricane predictions. Perturbations to WRF's initial and boundary conditions and different model physics/parameterizations generate an ensemble of storm solutions, which are then used to drive the coupled hydrologic + hydrodynamic models. Hurricane Isabel (2003) is used as a case study to illustrate the ensemble-based approach. The inundation, river runoff, and wind hazard results are strongly dependent on the accuracy of the mesoscale meteorological simulations, which improves with decreasing lead time to hurricane landfall. The ensemble envelope brackets the observed behavior while providing "best-case" and "worst-case" scenarios for the subsequent risk-based evacuation model. © 2018 Society for Risk Analysis.
NASA Astrophysics Data System (ADS)
Wu, Xiongwu; Brooks, Bernard R.
2011-11-01
The self-guided Langevin dynamics (SGLD) method accelerates conformational searching. The method is unique in that it selectively enhances and suppresses molecular motions based on their frequency to accelerate conformational searching without modifying energy surfaces or raising temperatures. It has been applied to studies of many long-time-scale events, such as protein folding. Recent progress in understanding the conformational distribution in SGLD simulations also makes SGLD an accurate method for quantitative studies. The SGLD partition function provides a way to convert the SGLD conformational distribution to the canonical ensemble distribution and to calculate ensemble-average properties through reweighting. Based on the SGLD partition function, this work presents a force-momentum-based self-guided Langevin dynamics (SGLDfp) simulation method to directly sample the canonical ensemble. The method includes interaction forces in its guiding force to compensate for the perturbation caused by the momentum-based guiding force, so that it can approximately sample the canonical ensemble. Using several example systems, we demonstrate that SGLDfp simulations can approximately maintain the canonical ensemble distribution and significantly accelerate conformational searching. With optimal parameters, SGLDfp and SGLD simulations can cross energy barriers of more than 15 kT and 20 kT, respectively, at rates similar to those at which LD simulations cross energy barriers of 10 kT. The SGLDfp method is size extensive and works well for large systems. For studies where preserving accessible conformational space is critical, such as free energy calculations and protein folding studies, SGLDfp is an efficient approach to searching and sampling the conformational space.
NASA Astrophysics Data System (ADS)
Drótos, Gábor; Bódai, Tamás; Tél, Tamás
2016-08-01
In nonautonomous dynamical systems, like in climate dynamics, an ensemble of trajectories initiated in the remote past defines a unique probability distribution, the natural measure of a snapshot attractor, for any instant of time, but this distribution typically changes in time. In cases with an aperiodic driving, temporal averages taken along a single trajectory would differ from the corresponding ensemble averages even in the infinite-time limit: ergodicity does not hold. It is worth considering this difference, which we call the nonergodic mismatch, by taking time windows of finite length for temporal averaging. We point out that the probability distribution of the nonergodic mismatch is qualitatively different in ergodic and nonergodic cases: its average is zero and typically nonzero, respectively. A main conclusion is that the difference of the average from zero, which we call the bias, is a useful measure of nonergodicity, for any window length. In contrast, the standard deviation of the nonergodic mismatch, which characterizes the spread between different realizations, exhibits a power-law decrease with increasing window length in both ergodic and nonergodic cases, and this implies that temporal and ensemble averages differ in dynamical systems with finite window lengths. It is the average modulus of the nonergodic mismatch, which we call the ergodicity deficit, that represents the expected deviation from fulfilling the equality of temporal and ensemble averages. As an important finding, we demonstrate that the ergodicity deficit cannot be reduced arbitrarily in nonergodic systems. We illustrate via a conceptual climate model that the nonergodic framework may be useful in Earth system dynamics, within which we propose the measure of nonergodicity, i.e., the bias, as an order-parameter-like quantifier of climate change.
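The bias diagnostic can be illustrated with a toy nonautonomous system: an ensemble of noisy trajectories riding a deterministic drive. For a constant drive the mismatch between finite-window time averages and the ensemble average is centered on zero; for a ramp drive the time average lags the instantaneous ensemble average, producing a nonzero bias. The drive functions, noise level, ensemble size, and window length below are arbitrary choices for illustration, not the paper's model.

```python
import random
from statistics import mean

# Toy illustration of the nonergodic mismatch: each realization is
# x_n = d(n) + noise with drive d(n). A constant drive gives bias ~ 0
# (ergodic-like case); a linear ramp gives a systematic negative bias
# because the window-averaged ramp lags its endpoint value.

random.seed(1)

def bias(drive, members=200, window=11, t_end=100):
    times = range(t_end - window + 1, t_end + 1)
    time_avgs, finals = [], []
    for _ in range(members):
        traj = [drive(n) + random.gauss(0.0, 0.1) for n in times]
        time_avgs.append(mean(traj))      # finite-window temporal average
        finals.append(traj[-1])           # state at the window's end
    ensemble_avg = mean(finals)           # ensemble average at time t_end
    return mean(t - ensemble_avg for t in time_avgs)

bias_ramp = bias(lambda n: 0.01 * n)      # nonautonomous: linear ramp drive
bias_flat = bias(lambda n: 0.5)           # stationary: constant drive
```

With slope 0.01 and an 11-step window the expected bias is about -0.05 (half the ramp traversed in one window), while the constant-drive bias fluctuates around zero.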
Chakravorty, Arghya; Jia, Zhe; Li, Lin; Zhao, Shan; Alexov, Emil
2018-02-13
Typically, the ensemble-average polar component of the solvation energy (ΔG_solv^polar) of a macromolecule is computed by using molecular dynamics (MD) or Monte Carlo (MC) simulations to generate a conformational ensemble and then performing a single-/rigid-conformation solvation energy calculation on each snapshot. The primary objective of this work is to demonstrate that the Poisson-Boltzmann (PB)-based approach using a Gaussian-based smooth dielectric function for macromolecular modeling previously developed by us (Li et al. J. Chem. Theory Comput. 2013, 9 (4), 2126-2136) can reproduce the ensemble average ΔG_solv^polar of a protein from a single structure. We show that the Gaussian-based dielectric model reproduces the ensemble average ⟨ΔG_solv^polar⟩ from an energy-minimized structure of a protein regardless of the minimization environment (structure minimized in vacuo, in implicit or explicit water, or the crystal structure); the best case, however, is when it is paired with an in vacuo-minimized structure. In other minimization environments (implicit or explicit water or the crystal structure), the traditional two-dielectric model can still be selected, with which the model produces correct solvation energies. Our observations from this work reflect how the ability to appropriately mimic the motion of residues, especially salt-bridge residues, influences a dielectric model's ability to reproduce the ensemble average value of the polar solvation free energy from a single in vacuo-minimized structure.
NASA Technical Reports Server (NTRS)
Tumer, Kagan; Oza, Nikunj C.; Clancy, Daniel (Technical Monitor)
2001-01-01
Using an ensemble of classifiers instead of a single classifier has been shown to improve generalization performance in many pattern recognition problems. However, the extent of such improvement depends greatly on the amount of correlation among the errors of the base classifiers. Therefore, reducing those correlations while keeping the classifiers' performance levels high is an important area of research. In this article, we explore input decimation (ID), a method which selects feature subsets for their ability to discriminate among the classes and uses them to decouple the base classifiers. We provide a summary of the theoretical benefits of correlation reduction, along with results of our method on two underwater sonar data sets, three benchmarks from the Proben1/UCI repositories, and two synthetic data sets. The results indicate that input decimated ensembles (IDEs) outperform ensembles whose base classifiers use all the input features; randomly selected subsets of features; and features created using principal components analysis, on a wide range of domains.
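The core of input decimation, ranking features by their ability to discriminate one class from the rest, can be sketched with a simple correlation criterion. The correlation score and the tiny data set below are illustrative assumptions; the article's own selection criterion may differ.

```python
# Sketch of the feature-selection step of input decimation: for each class,
# rank features by |correlation| with the one-vs-rest class indicator and
# keep the top k for that class's base classifier. Data are fabricated.

def correlation(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy) if vx and vy else 0.0

def decimate(X, y, cls, k):
    """Indices of the k features most correlated with membership in `cls`."""
    indicator = [1.0 if label == cls else 0.0 for label in y]
    columns = list(zip(*X))
    scores = [abs(correlation(col, indicator)) for col in columns]
    return sorted(range(len(columns)), key=lambda j: -scores[j])[:k]

# Feature 0 separates class 0, feature 1 separates class 1, feature 2 is noise.
X = [[1.0, 0.0, 0.5],
     [1.0, 0.0, 0.4],
     [0.0, 1.0, 0.6],
     [0.0, 1.0, 0.5],
     [0.0, 0.0, 0.4],
     [0.0, 0.0, 0.6]]
y = [0, 0, 1, 1, 2, 2]
```

Each base classifier is then trained only on its class-specific subset, which decorrelates the errors the classifiers make.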
Comparing ensemble learning methods based on decision tree classifiers for protein fold recognition.
Bardsiri, Mahshid Khatibi; Eftekhari, Mahdi
2014-01-01
In this paper, some methods for ensemble learning of protein fold recognition based on decision trees (DT) are compared and contrasted over three datasets taken from the literature. According to previously reported studies, the features of the datasets are divided into groups. Then, for each of these groups, three ensemble classifiers, namely random forest, rotation forest and AdaBoost.M1, are employed. Also, some fusion methods are introduced for combining the ensemble classifiers obtained in the previous step. After this step, three classifiers are produced based on the combination of classifiers of types random forest, rotation forest and AdaBoost.M1. Finally, the three classifiers thus obtained are combined to make an overall classifier. Experimental results show that the overall classifier obtained by the genetic algorithm (GA) weighting fusion method is the best one in comparison to previously applied methods in terms of classification accuracy.
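The weighted-vote fusion step can be sketched as follows, with a seeded random search over weight vectors standing in for the paper's genetic algorithm; the classifier outputs and labels are fabricated.

```python
import random

# Sketch of weighted-vote fusion of three ensemble classifiers. A seeded
# random search over weight vectors stands in for the paper's genetic
# algorithm; classifier outputs and labels are fabricated.

random.seed(7)

def weighted_vote(preds, weights):
    """Fuse per-classifier 0/1 predictions by weighted class scores."""
    fused = []
    for i in range(len(preds[0])):
        score1 = sum(w for p, w in zip(preds, weights) if p[i] == 1)
        score0 = sum(w for p, w in zip(preds, weights) if p[i] == 0)
        fused.append(1 if score1 > score0 else 0)
    return fused

def accuracy(pred, labels):
    return sum(p == y for p, y in zip(pred, labels)) / len(labels)

def search_weights(preds, labels, iters=200):
    best_w, best_acc = None, -1.0
    for _ in range(iters):
        w = [random.random() for _ in preds]
        acc = accuracy(weighted_vote(preds, w), labels)
        if acc > best_acc:
            best_w, best_acc = w, acc
    return best_w, best_acc

labels = [1, 0, 1, 0, 1]
preds = [[1, 0, 1, 0, 1],    # strong classifier
         [0, 1, 0, 1, 0],    # weak: inverted
         [0, 1, 0, 1, 0]]    # weak: inverted
weights, fused_acc = search_weights(preds, labels)
```

The search finds weight vectors where the strong classifier outweighs the two weak ones combined; a GA explores the same weight space more systematically via selection and crossover.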
Mitosis detection using generic features and an ensemble of cascade adaboosts.
Tek, F Boray
2013-01-01
Mitosis count is one of the factors that pathologists use to assess the risk of metastasis and the survival of patients affected by breast cancer. We investigate the application of a set of generic features and an ensemble of cascade adaboosts to automated mitosis detection. Calculation of the features relies minimally on object-level descriptions and thus requires minimal segmentation. The proposed work was developed and tested on the International Conference on Pattern Recognition (ICPR) 2012 mitosis detection contest data. We plotted receiver operating characteristic curves of true positive versus false positive rates and calculated recall, precision, F-measure, and region overlap ratio measures. We tested our features with two different classifier configurations: 1) an ensemble of single adaboosts, and 2) an ensemble of cascade adaboosts. On the ICPR 2012 mitosis detection contest evaluation, the cascade ensemble scored 54, 62.7, and 58, whereas the non-cascade version scored 68, 28.1, and 39.7 for recall, precision, and F-measure, respectively. The features most frequently used in the adaboost classifier rules were a shape-based feature that counted granularity and a color-based feature that relied on red, green, and blue channel statistics. The features that express granular structure and color variations are found useful for mitosis detection. The ensemble of adaboosts performs better than the individual adaboost classifiers. Moreover, the ensemble of cascade adaboosts was better than the ensemble of single adaboosts for mitosis detection.
Optical modeling of volcanic ash particles using ellipsoids
NASA Astrophysics Data System (ADS)
Merikallio, Sini; Muñoz, Olga; Sundström, Anu-Maija; Virtanen, Timo H.; Horttanainen, Matti; de Leeuw, Gerrit; Nousiainen, Timo
2015-05-01
The single-scattering properties of volcanic ash particles are modeled here using ellipsoidal shapes. Ellipsoids are expected to improve the accuracy of the retrieval of aerosol properties using remote sensing techniques, which are currently often based on oversimplified assumptions of spherical ash particles. Measurements of the single-scattering optical properties of ash particles from several volcanoes across the globe, including previously unpublished measurements from the Eyjafjallajökull and Puyehue volcanoes, are used to assess the performance of the ellipsoidal particle models. These comparisons between the measurements and the ellipsoidal particle model include consideration of the whole scattering matrix, as well as sensitivity studies from the point of view of the Advanced Along Track Scanning Radiometer (AATSR) instrument. AATSR, which flew on the ENVISAT satellite, offers two viewing directions but no information on polarization, so usually only the phase function is relevant for interpreting its measurements. As expected, ensembles of ellipsoids are able to reproduce the observed scattering matrix more faithfully than spheres. The performance of ellipsoid ensembles depends on the distribution of particle shapes, which we tried to optimize. No single shape distribution could be found that performs best in all situations, but all of the best-fit ellipsoidal distributions, as well as the additionally tested equiprobable distribution, improved greatly on the performance of spheres. We conclude that an equiprobable shape distribution of ellipsoidal model particles is a relatively good, yet enticingly simple, approach for modeling volcanic ash single-scattering optical properties.
Sources of Uncertainty in Predicting Land Surface Fluxes Using Diverse Data and Models
NASA Technical Reports Server (NTRS)
Dungan, Jennifer L.; Wang, Weile; Michaelis, Andrew; Votava, Petr; Nemani, Ramakrishma
2010-01-01
In the domain of predicting land surface fluxes, models are used to bring data from large observation networks and satellite remote sensing together to make predictions about present and future states of the Earth. Characterizing the uncertainty about such predictions is a complex process and one that is not yet fully understood. Uncertainty exists about initialization, measurement and interpolation of input variables; model parameters; model structure; and mixed spatial and temporal supports. Multiple models or structures often exist to describe the same processes. Uncertainty about structure is currently addressed by running an ensemble of different models and examining the distribution of model outputs. To illustrate structural uncertainty, a multi-model ensemble experiment we have been conducting using the Terrestrial Observation and Prediction System (TOPS) will be discussed. TOPS uses public versions of process-based ecosystem models that use satellite-derived inputs along with surface climate data and land surface characterization to produce predictions of ecosystem fluxes including gross and net primary production and net ecosystem exchange. Using the TOPS framework, we have explored the uncertainty arising from the application of models with different assumptions, structures, parameters, and variable definitions. With a small number of models, this only begins to capture the range of possible spatial fields of ecosystem fluxes. Few attempts have been made to systematically address the components of uncertainty in such a framework. We discuss the characterization of uncertainty for this approach including both quantifiable and poorly known aspects.
NASA Astrophysics Data System (ADS)
Liu, Li; Xu, Yue-Ping
2017-04-01
Ensemble flood forecasting driven by numerical weather prediction products is becoming more commonly used in operational flood forecasting applications. In this study, a hydrological ensemble flood forecasting system based on the Variable Infiltration Capacity (VIC) model and quantitative precipitation forecasts from the TIGGE dataset is constructed for the Lanjiang Basin, Southeast China. The impacts of calibration strategies and ensemble methods on the performance of the system are then evaluated. The hydrological model is optimized by a parallel-programmed ɛ-NSGAII multi-objective algorithm, and two separately parameterized models are determined to simulate daily flows and peak flows, coupled with a modular approach. The results indicate that the ɛ-NSGAII algorithm permits more efficient optimization and rational determination of parameter settings. It is demonstrated that the multi-model ensemble streamflow mean has better skill than the best single-model ensemble mean (ECMWF), and that multi-model ensembles weighted on members and skill scores outperform other multi-model ensembles. For a typical flood event, it is shown that the flood can be predicted 3-4 days in advance, but flows in the rising limb can be captured only 1-2 days ahead owing to their flashy nature. With respect to peak flows selected by the Peaks Over Threshold approach, the ensemble means from either a single model or multiple models are generally underestimated, as the extreme values are smoothed out by the ensemble process.
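The skill-weighted multi-model combination can be sketched as weighting each model by its inverse error over a training period. The two toy "models" and the flow values below are invented; the study's actual weighting also accounts for ensemble members and skill scores.

```python
# Sketch of a skill-weighted multi-model ensemble mean: each model's weight
# is proportional to its inverse MAE over a training period. Models and
# streamflow values are invented for illustration.

def mae(forecast, observed):
    return sum(abs(f - o) for f, o in zip(forecast, observed)) / len(observed)

def skill_weighted_mean(forecasts, observed):
    """Combine model forecasts with weights proportional to 1/MAE."""
    weights = [1.0 / mae(f, observed) for f in forecasts]
    total = sum(weights)
    weights = [w / total for w in weights]
    n = len(observed)
    return [sum(w * f[i] for w, f in zip(weights, forecasts)) for i in range(n)]

observed = [100.0, 120.0, 110.0]          # e.g., daily flows (m3/s)
good = [o + 1.0 for o in observed]        # skilful model: small bias
bad = [o + 20.0 for o in observed]        # biased model: large error

weighted = skill_weighted_mean([good, bad], observed)
simple = [(g + b) / 2 for g, b in zip(good, bad)]
```

Because the biased model receives a small weight, the skill-weighted combination stays much closer to the observations than the simple average.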
Lessons from Climate Modeling on the Design and Use of Ensembles for Crop Modeling
NASA Technical Reports Server (NTRS)
Wallach, Daniel; Mearns, Linda O.; Ruane, Alexander C.; Roetter, Reimund P.; Asseng, Senthold
2016-01-01
Working with ensembles of crop models is a recent but important development in crop modeling which promises to lead to better uncertainty estimates for model projections and predictions, better predictions using the ensemble mean or median, and closer collaboration within the modeling community. There are numerous open questions about the best way to create and analyze such ensembles. Much can be learned from the field of climate modeling, given its much longer experience with ensembles. We draw on that experience to identify questions and make propositions that should help make ensemble modeling with crop models more rigorous and informative. The propositions include: defining criteria for acceptance of models in a crop multi-model ensemble (MME); exploring criteria for evaluating the degree of relatedness of models in an MME; studying the effect of the number of models in the ensemble; development of a statistical model of model sampling; creation of a repository for MME results; studies of possible differential weighting of models in an ensemble; creation of single-model ensembles, based on sampling from the uncertainty distribution of parameter values or inputs, specifically oriented toward uncertainty estimation; the creation of super-ensembles that sample more than one source of uncertainty; the analysis of super-ensemble results to obtain information on total uncertainty and the separate contributions of different sources of uncertainty; and, finally, further investigation of the use of the multi-model mean or median as a predictor.
Vogelmann, Andrew M.; Fridlind, Ann M.; Toto, Tami; ...
2015-06-19
Observation-based modeling case studies of continental boundary layer clouds have been developed to study cloudy boundary layers, aerosol influences upon them, and their representation in cloud- and global-scale models. Three 60-hour case study periods span the temporal evolution of cumulus, stratiform, and drizzling boundary layer cloud systems, representing mixed and transitional states rather than idealized or canonical cases. Based on in-situ measurements from the RACORO field campaign and remote-sensing observations, the cases are designed with a modular configuration to simplify use in large-eddy simulations (LES) and single-column models. Aircraft measurements of aerosol number size distribution are fit to lognormal functions for concise representation in models. Values of the aerosol hygroscopicity parameter, κ, are derived from observations to be ~0.10, which are lower than the 0.3 typical over continents and suggestive of a large aerosol organic fraction. Ensemble large-scale forcing datasets are derived from the ARM variational analysis, ECMWF forecasts, and a multi-scale data assimilation system. The forcings are assessed through comparison of measured bulk atmospheric and cloud properties to those computed in 'trial' large-eddy simulations, where more efficient run times are enabled through modest reductions in grid resolution and domain size compared to the full-sized LES grid. Simulations capture many of the general features observed, but the state-of-the-art forcings were limited at representing details of cloud onset, and tight gradients and high-resolution transients of importance. Methods for improving the initial conditions and forcings are discussed. The cases developed are available to the general modeling community for studying continental boundary clouds.
NASA Technical Reports Server (NTRS)
Vogelmann, Andrew M.; Fridlind, Ann M.; Toto, Tami; Endo, Satoshi; Lin, Wuyin; Wang, Jian; Feng, Sha; Zhang, Yunyan; Turner, David D.; Liu, Yangang;
2015-01-01
Observation-based modeling case studies of continental boundary layer clouds have been developed to study cloudy boundary layers, aerosol influences upon them, and their representation in cloud- and global-scale models. Three 60 h case study periods span the temporal evolution of cumulus, stratiform, and drizzling boundary layer cloud systems, representing mixed and transitional states rather than idealized or canonical cases. Based on in situ measurements from the Routine AAF (Atmospheric Radiation Measurement (ARM) Aerial Facility) CLOWD (Clouds with Low Optical Water Depth) Optical Radiative Observations (RACORO) field campaign and remote sensing observations, the cases are designed with a modular configuration to simplify use in large-eddy simulations (LES) and single-column models. Aircraft measurements of aerosol number size distribution are fit to lognormal functions for concise representation in models. Values of the aerosol hygroscopicity parameter, kappa, are derived from observations to be approximately 0.10, which are lower than the 0.3 typical over continents and suggestive of a large aerosol organic fraction. Ensemble large-scale forcing data sets are derived from the ARM variational analysis, European Centre for Medium-Range Weather Forecasts, and a multiscale data assimilation system. The forcings are assessed through comparison of measured bulk atmospheric and cloud properties to those computed in "trial" large-eddy simulations, where more efficient run times are enabled through modest reductions in grid resolution and domain size compared to the full-sized LES grid. Simulations capture many of the general features observed, but the state-of-the-art forcings were limited at representing details of cloud onset, and tight gradients and high-resolution transients of importance. Methods for improving the initial conditions and forcings are discussed. 
The cases developed are available to the general modeling community for studying continental boundary clouds.
NASA Astrophysics Data System (ADS)
Vogelmann, Andrew M.; Fridlind, Ann M.; Toto, Tami; Endo, Satoshi; Lin, Wuyin; Wang, Jian; Feng, Sha; Zhang, Yunyan; Turner, David D.; Liu, Yangang; Li, Zhijin; Xie, Shaocheng; Ackerman, Andrew S.; Zhang, Minghua; Khairoutdinov, Marat
2015-06-01
Observation-based modeling case studies of continental boundary layer clouds have been developed to study cloudy boundary layers, aerosol influences upon them, and their representation in cloud- and global-scale models. Three 60 h case study periods span the temporal evolution of cumulus, stratiform, and drizzling boundary layer cloud systems, representing mixed and transitional states rather than idealized or canonical cases. Based on in situ measurements from the Routine AAF (Atmospheric Radiation Measurement (ARM) Aerial Facility) CLOWD (Clouds with Low Optical Water Depth) Optical Radiative Observations (RACORO) field campaign and remote sensing observations, the cases are designed with a modular configuration to simplify use in large-eddy simulations (LES) and single-column models. Aircraft measurements of aerosol number size distribution are fit to lognormal functions for concise representation in models. Values of the aerosol hygroscopicity parameter, κ, are derived from observations to be 0.10, which are lower than the 0.3 typical over continents and suggestive of a large aerosol organic fraction. Ensemble large-scale forcing data sets are derived from the ARM variational analysis, European Centre for Medium-Range Weather Forecasts, and a multiscale data assimilation system. The forcings are assessed through comparison of measured bulk atmospheric and cloud properties to those computed in "trial" large-eddy simulations, where more efficient run times are enabled through modest reductions in grid resolution and domain size compared to the full-sized LES grid. Simulations capture many of the general features observed, but the state-of-the-art forcings were limited at representing details of cloud onset, and tight gradients and high-resolution transients of importance. Methods for improving the initial conditions and forcings are discussed. The cases developed are available to the general modeling community for studying continental boundary clouds.
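The lognormal summary of aerosol number size distributions used in the RACORO cases above can be illustrated with a small sketch. This is plain Python with hypothetical diameters (nm); the method-of-moments fit shown here is one simple option, not necessarily the fitting procedure used in the campaign analysis:

```python
import math

def lognormal_params(diameters):
    """Geometric mean diameter and geometric standard deviation
    from log-moments of sampled diameters (method of moments)."""
    logs = [math.log(d) for d in diameters]
    mu = sum(logs) / len(logs)
    var = sum((x - mu) ** 2 for x in logs) / len(logs)
    return math.exp(mu), math.exp(math.sqrt(var))

def dn_dlnd(d, n_total, dg, sigma_g):
    """Lognormal number distribution dN/dlnD for a single mode."""
    ln_sig = math.log(sigma_g)
    return (n_total / (math.sqrt(2.0 * math.pi) * ln_sig)
            * math.exp(-(math.log(d / dg)) ** 2 / (2.0 * ln_sig ** 2)))

# Hypothetical diameters (nm): geometric mean 100 nm, GSD 2
dg, sigma_g = lognormal_params([50.0, 200.0])
print(dg, sigma_g)  # ~100.0, ~2.0
```

The two fitted parameters plus the total number concentration fully specify one mode, which is what makes the representation so concise for model input.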
NASA Astrophysics Data System (ADS)
Fayaz, S. M.; Rajanikant, G. K.
2014-07-01
Programmed cell death has been a fascinating area of research, as it continues to raise new challenges and questions despite the tremendous ongoing work in this field. Recently, necroptosis, a programmed form of necrotic cell death, has been implicated in many diseases, including neurological disorders. Receptor-interacting serine/threonine protein kinase 1 (RIPK1) is an important regulatory protein in necroptosis, and its inhibition is essential to stop the necroptotic process and, eventually, cell death. Current structure-based virtual screening methods involve a wide range of strategies, and recently the use of multiple protein structures for pharmacophore extraction has been emphasized as a way to improve the outcome. However, it is important to use the pharmacophoric information fully during docking. Further, in such methods, using appropriate protein structures for docking is desirable; if not, potential compound hits obtained through pharmacophore-based screening may not receive correct ranks and scores after docking. Therefore, a comprehensive integration of different ensemble methods is essential and may provide better virtual screening results. In this study, dual ensemble screening, a novel computational strategy, was used to identify diverse and potent inhibitors against RIPK1. All the pharmacophore features present in the binding site were captured using both the apo and holo protein structures, and an ensemble pharmacophore was built by combining these features. This ensemble pharmacophore was employed in pharmacophore-based screening of the ZINC database. The compound hits thus obtained were subjected to ensemble docking. The leads acquired through docking were further validated through feature evaluation and molecular dynamics simulation.
Constructing better classifier ensemble based on weighted accuracy and diversity measure.
Zeng, Xiaodong; Wong, Derek F; Chao, Lidia S
2014-01-01
This paper presents weighted accuracy and diversity (WAD), a novel measure for evaluating the quality of a classifier ensemble that assists in the ensemble selection task. The proposed measure is motivated by a commonly accepted hypothesis: a robust classifier ensemble should not only be accurate but should also have members that differ from one another. In fact, accuracy and diversity are mutually constraining factors: an ensemble with high accuracy may have low diversity, and an overly diverse ensemble may suffer in accuracy. This study proposes a method to find the balance between accuracy and diversity that enhances the predictive ability of an ensemble on unknown data. The quality of an ensemble is scored by the harmonic mean of accuracy and diversity, with two weight parameters used to balance them. The measure is compared to two representative measures, Kappa-Error and GenDiv, and to two threshold measures that consider only accuracy or diversity, under two heuristic search algorithms (a genetic algorithm and forward hill-climbing) in ensemble selection tasks on 15 UCI benchmark datasets. The empirical results demonstrate that the WAD measure is superior to the others in most cases.
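The weighted harmonic-mean scoring described above can be sketched in a few lines. This is one plausible reading of a weighted harmonic mean of accuracy and diversity, not necessarily the paper's exact WAD formula:

```python
def wad_score(accuracy, diversity, w_acc=0.5, w_div=0.5):
    """Weighted harmonic mean of ensemble accuracy and diversity.
    The two weights are assumed to sum to 1."""
    return 1.0 / (w_acc / accuracy + w_div / diversity)

print(wad_score(0.6, 0.6))  # ~0.6: equal inputs pass through
print(wad_score(0.9, 0.3))  # ~0.45: the weaker factor drags the score down
```

The harmonic mean is a natural choice here because it penalizes an ensemble that is strong on one criterion but weak on the other, which is exactly the trade-off the abstract describes.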
Constructing Better Classifier Ensemble Based on Weighted Accuracy and Diversity Measure
Chao, Lidia S.
2014-01-01
This paper presents weighted accuracy and diversity (WAD), a novel measure for evaluating the quality of a classifier ensemble that assists in the ensemble selection task. The proposed measure is motivated by a commonly accepted hypothesis: a robust classifier ensemble should not only be accurate but should also have members that differ from one another. In fact, accuracy and diversity are mutually constraining factors: an ensemble with high accuracy may have low diversity, and an overly diverse ensemble may suffer in accuracy. This study proposes a method to find the balance between accuracy and diversity that enhances the predictive ability of an ensemble on unknown data. The quality of an ensemble is scored by the harmonic mean of accuracy and diversity, with two weight parameters used to balance them. The measure is compared to two representative measures, Kappa-Error and GenDiv, and to two threshold measures that consider only accuracy or diversity, under two heuristic search algorithms (a genetic algorithm and forward hill-climbing) in ensemble selection tasks on 15 UCI benchmark datasets. The empirical results demonstrate that the WAD measure is superior to the others in most cases. PMID:24672402
Xue, Y.; Liu, S.; Hu, Y.; Yang, J.; Chen, Q.
2007-01-01
To improve prediction accuracy, a Genetic Algorithm based Adaptive Neural Network Ensemble (GA-ANNE) is presented. Intersections are allowed between different training sets based on fuzzy clustering analysis, which ensures the diversity as well as the accuracy of the individual Neural Networks (NNs). Moreover, to improve the accuracy of the adaptive weights of the individual NNs, a GA is used to optimize the cluster centers. Empirical results in predicting the carbon flux of Duke Forest reveal that GA-ANNE can predict the carbon flux more accurately than a Radial Basis Function Neural Network (RBFNN), a Bagging NN ensemble, and ANNE. © 2007 IEEE.
Ren, Fulong; Cao, Peng; Li, Wei; Zhao, Dazhe; Zaiane, Osmar
2017-01-01
Diabetic retinopathy (DR) is a progressive disease, and its detection at an early stage is crucial for saving a patient's vision. An automated screening system for DR can help reduce the chance of complete blindness due to DR while lowering the workload of ophthalmologists. Among the earliest signs of DR are microaneurysms (MAs). However, current schemes for MA detection tend to report many false positives because the detection algorithms have high sensitivity; inevitably, some non-MA structures are labeled as MAs in the initial MA identification step. This is a typical class imbalance problem, and class-imbalanced data have detrimental effects on the performance of conventional classifiers. In this work, we propose an ensemble-based adaptive over-sampling algorithm for overcoming the class imbalance problem in false positive reduction, and we use Boosting, Bagging, and Random Subspace as the ensemble frameworks to improve microaneurysm detection. The proposed ensemble-based over-sampling methods combine the strengths of adaptive over-sampling and ensembles. The objective of this amalgamation is to reduce the induction biases introduced by imbalanced data and to enhance the generalization performance of extreme learning machines (ELM). Experimental results show that our ASOBoost method has higher area under the ROC curve (AUC) and G-mean values than many existing class imbalance learning methods. Copyright © 2016 Elsevier Ltd. All rights reserved.
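As a contrast to the adaptive scheme described above, plain random over-sampling of the minority class — the simplest baseline such methods improve upon — can be sketched as follows (plain Python, hypothetical two-class labels):

```python
import random

def oversample(samples, labels, minority=1, seed=0):
    """Duplicate randomly chosen minority examples until the class
    counts are balanced; returns a new (samples, labels) pair."""
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    extra = [rng.choice(minority_idx)
             for _ in range(len(majority_idx) - len(minority_idx))]
    idx = majority_idx + minority_idx + extra
    return [samples[i] for i in idx], [labels[i] for i in idx]

X = [[0], [1], [2], [3], [4]]
y = [0, 0, 0, 0, 1]        # class 1 is the rare "true MA" class
Xb, yb = oversample(X, y)
print(yb.count(0), yb.count(1))  # 4 4
```

Adaptive variants differ mainly in *which* minority examples they replicate (or synthesize), rather than drawing uniformly as here.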
NASA Astrophysics Data System (ADS)
Oh, Seok-Geun; Suh, Myoung-Seok
2017-07-01
The projection skills of five ensemble methods were analyzed according to simulation skill, training period, and ensemble size, using 198 sets of pseudo-simulation data (PSD) produced by random number generation to mimic the temperature simulated by regional climate models. The PSD sets were classified into 18 categories according to the relative magnitude of bias, variance ratio, and correlation coefficient, where each category had 11 sets (including 1 truth set) with 50 samples. The ensemble methods used were as follows: equal weighted averaging without bias correction (EWA_NBC), EWA with bias correction (EWA_WBC), weighted ensemble averaging based on root mean square errors and correlation (WEA_RAC), WEA based on the Taylor score (WEA_Tay), and multivariate linear regression (Mul_Reg). The projection skills of the ensemble methods generally improved compared with the best member of each category. However, their projection skills were significantly affected by the simulation skills of the ensemble members. The weighted ensemble methods showed better projection skills than the non-weighted methods, in particular for the PSD categories having systematic biases and various correlation coefficients. The EWA_NBC showed considerably lower projection skills than the other methods, in particular for the PSD categories with systematic biases. Although Mul_Reg showed relatively good skill, it was strongly sensitive to the PSD categories, training periods, and number of members. On the other hand, WEA_Tay and WEA_RAC showed relatively superior skill in both accuracy and reliability across all the sensitivity experiments. This indicates that WEA_Tay and WEA_RAC are applicable even for simulation data with systematic biases, a short training period, and a small number of ensemble members.
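Skill-weighted ensemble averaging in the spirit of WEA_RAC can be sketched as below; here the weights use only inverse RMSE against training-period observations, whereas the actual method also incorporates correlation (all numbers are hypothetical):

```python
import math

def rmse(sim, obs):
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(obs))

def skill_weights(members_train, obs_train):
    """Normalized inverse-RMSE weight for each ensemble member."""
    inv = [1.0 / rmse(m, obs_train) for m in members_train]
    total = sum(inv)
    return [w / total for w in inv]

def weighted_average(members, weights):
    """Weighted ensemble average at each time step."""
    return [sum(w * m[i] for w, m in zip(weights, members))
            for i in range(len(members[0]))]

# Hypothetical training period: member 1 is three times more accurate
w = skill_weights([[1.0, 1.0], [3.0, 3.0]], obs_train=[1.5, 1.5])
print(w)                                # ~[0.75, 0.25]
print(weighted_average([[2.0], [4.0]], w))  # ~[2.5]
```

Equal-weight averaging (EWA) is the special case where every weight is 1/N, which is why it degrades most when some members carry systematic biases.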
ERIC Educational Resources Information Center
Hall, Richard
2015-01-01
Ensemble work is a key part of any performance-based popular music course and involves students replicating existing music or playing "covers". The creative process in popular music is a collaborative one and the ensemble workshop can be utilised to facilitate active learning and develop musical creativity within a group setting. This is…
Stress-stress fluctuation formula for elastic constants in the NPT ensemble
NASA Astrophysics Data System (ADS)
Lips, Dominik; Maass, Philipp
2018-05-01
Several fluctuation formulas are available for calculating elastic constants from equilibrium correlation functions in computer simulations, but the ones available for simulations at constant pressure exhibit slow convergence properties and cannot be used for the determination of local elastic constants. To overcome these drawbacks, we derive a stress-stress fluctuation formula in the NPT ensemble based on known expressions in the NVT ensemble. We validate the formula in the NPT ensemble by calculating elastic constants for the simple nearest-neighbor Lennard-Jones crystal and by comparing the results with those obtained in the NVT ensemble. For both local and bulk elastic constants we find an excellent agreement between the simulated data in the two ensembles. To demonstrate the usefulness of the formula, we apply it to determine the elastic constants of a simulated lipid bilayer.
Sampling-based ensemble segmentation against inter-operator variability
NASA Astrophysics Data System (ADS)
Huo, Jing; Okada, Kazunori; Pope, Whitney; Brown, Matthew
2011-03-01
Inconsistency and a lack of reproducibility are commonly associated with semi-automated segmentation methods. In this study, we developed an ensemble approach to improve reproducibility and applied it to glioblastoma multiforme (GBM) brain tumor segmentation on T1-weighted contrast-enhanced MR volumes. The proposed approach combines sampling-based simulations and ensemble segmentation into a single framework; it generates a set of segmentations by perturbing the user initialization and user-specified internal parameters, then fuses the set of segmentations into a single consensus result. Three combination algorithms were applied: majority voting, averaging, and expectation-maximization (EM). The reproducibility of the proposed framework was evaluated by a controlled experiment on 16 tumor cases from a multicenter drug trial. The ensemble framework had significantly better reproducibility than the individual base Otsu thresholding method (p < .001).
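The majority-voting combiner from the study above can be sketched on flattened binary label maps (0 = background, 1 = tumor), one list per perturbed segmentation:

```python
def majority_vote(segmentations):
    """Consensus labeling: keep a voxel only if more than half of the
    ensemble members labeled it as tumor."""
    k = len(segmentations)
    return [1 if sum(votes) * 2 > k else 0
            for votes in zip(*segmentations)]

segs = [[1, 1, 0, 0],   # member 1
        [1, 0, 0, 1],   # member 2
        [1, 1, 0, 0]]   # member 3
print(majority_vote(segs))  # [1, 1, 0, 0]
```

Averaging the votes instead of thresholding them yields a per-voxel consensus probability, which is the basis of the averaging combiner the abstract also mentions.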
Gavrishchaka, Valeriy; Senyukova, Olga; Davis, Kristina
2015-01-01
Previously, we have proposed to use complementary complexity measures discovered by boosting-like ensemble learning for the enhancement of quantitative indicators dealing with necessarily short physiological time series. We have confirmed robustness of such multi-complexity measures for heart rate variability analysis with the emphasis on detection of emerging and intermittent cardiac abnormalities. Recently, we presented preliminary results suggesting that such ensemble-based approach could be also effective in discovering universal meta-indicators for early detection and convenient monitoring of neurological abnormalities using gait time series. Here, we argue and demonstrate that these multi-complexity ensemble measures for gait time series analysis could have significantly wider application scope ranging from diagnostics and early detection of physiological regime change to gait-based biometrics applications.
Impact of ensemble learning in the assessment of skeletal maturity.
Cunha, Pedro; Moura, Daniel C; Guevara López, Miguel Angel; Guerra, Conceição; Pinto, Daniela; Ramos, Isabel
2014-09-01
The assessment of bone age, or skeletal maturity, is an important task in pediatrics that measures the degree of maturation of children's bones. Nowadays, there is no standard clinical procedure for assessing bone age, and the most widely used approaches are the Greulich and Pyle and the Tanner and Whitehouse methods. Computer methods have been proposed to automate the process; however, there has been little exploration of how to combine the features of the different parts of the hand, and how to take advantage of ensemble techniques for this purpose. This paper presents a study evaluating the use of ensemble techniques for improving bone age assessment. A new computer method was developed that extracts descriptors for each joint of each finger, which are then combined using different ensemble schemes to obtain a final bone age value. Three popular ensemble schemes are explored in this study: bagging, stacking, and voting. Best results were achieved by bagging with a rule-based regression (M5P), scoring a mean absolute error of 10.16 months. Results show that ensemble techniques improve the prediction performance of most of the evaluated regression algorithms, always achieving the best or comparable-to-best results. The success of the ensemble methods therefore allows us to conclude that their use may improve computer-based bone age assessment, offering a scalable option for utilizing multiple regions of interest and combining their output.
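Bagging, the best-performing scheme above, can be sketched generically: each member is fit on a bootstrap resample and the ensemble prediction is the average of member predictions. The base learner here is a trivial mean predictor for self-containment, not the M5P rule-based regression used in the paper:

```python
import random

def fit_mean(xs, ys):
    """Trivial base learner: always predict the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def bagging_fit(xs, ys, n_members=10, fit=fit_mean, seed=0):
    """Fit each member on a bootstrap resample of (xs, ys);
    the returned predictor averages the member predictions."""
    rng = random.Random(seed)
    members = []
    for _ in range(n_members):
        idx = [rng.randrange(len(xs)) for _ in range(len(xs))]
        members.append(fit([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(m(x) for m in members) / len(members)

# Hypothetical bone ages in months for four training descriptors
predict = bagging_fit([1, 2, 3, 4], [108.0, 120.0, 132.0, 144.0])
print(predict(2))  # some value between 108 and 144 months
```

Any real regression learner (a decision tree, M5P, etc.) can be dropped in via the `fit` parameter, since bagging only requires that the base learner be trainable on resampled data.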
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Jie; Draxl, Caroline; Hopson, Thomas
Numerical weather prediction (NWP) models have been widely used for wind resource assessment. Model runs with higher spatial resolution are generally more accurate, yet extremely computationally expensive. An alternative approach is to use data generated by a low resolution NWP model, in conjunction with statistical methods. In order to analyze the accuracy and computational efficiency of different types of NWP-based wind resource assessment methods, this paper performs a comparison of three deterministic and probabilistic NWP-based wind resource assessment methodologies: (i) a coarse resolution (0.5 degrees x 0.67 degrees) global reanalysis data set, the Modern-Era Retrospective Analysis for Research and Applications (MERRA); (ii) an analog ensemble methodology based on the MERRA, which provides both deterministic and probabilistic predictions; and (iii) a fine resolution (2-km) NWP data set, the Wind Integration National Dataset (WIND) Toolkit, based on the Weather Research and Forecasting model. Results show that: (i) as expected, the analog ensemble and WIND Toolkit perform significantly better than MERRA, confirming their ability to downscale coarse estimates; (ii) the analog ensemble provides the best estimate of the multi-year wind distribution at seven of the nine sites, while the WIND Toolkit is the best at one site; (iii) the WIND Toolkit is more accurate in estimating the distribution of hourly wind speed differences, which characterizes the wind variability, at five of the available sites, with the analog ensemble being best at the remaining four locations; and (iv) the analog ensemble computational cost is negligible, whereas the WIND Toolkit requires large computational resources. Future efforts could focus on the combination of the analog ensemble with intermediate resolution (e.g., 10-15 km) NWP estimates, to considerably reduce the computational burden, while providing accurate deterministic estimates and reliable probabilistic assessments.
Bioactive focus in conformational ensembles: a pluralistic approach
NASA Astrophysics Data System (ADS)
Habgood, Matthew
2017-12-01
Computational generation of conformational ensembles is key to contemporary drug design. Selecting the members of the ensemble that will approximate the conformation most likely to bind to a desired target (the bioactive conformation) is difficult, given that the potential energy usually used to generate and rank the ensemble is a notoriously poor discriminator between bioactive and non-bioactive conformations. In this study an approach to generating a focused ensemble is proposed in which each conformation is assigned multiple rankings based not just on potential energy but also on solvation energy, hydrophobic or hydrophilic interaction energy, radius of gyration, and on a statistical potential derived from Cambridge Structural Database data. The best ranked structures derived from each system are then assembled into a new ensemble that is shown to be better focused on bioactive conformations. This pluralistic approach is tested on ensembles generated by the Molecular Operating Environment's Low Mode Molecular Dynamics module, and by the Cambridge Crystallographic Data Centre's conformation generator software.
Peer Connectedness in the Middle School Band Program
ERIC Educational Resources Information Center
Rawlings, Jared R.; Stoddard, Sarah A.
2017-01-01
Previous research suggests that students participating in school-based musical ensembles are more engaged in school and more likely to connect to their peers in school; however, researchers have not specifically investigated peer connectedness among adolescents in school-based music ensembles. The purpose of this study was to explore middle school…
Assessment in Performance-Based Secondary Music Classes
ERIC Educational Resources Information Center
Pellegrino, Kristen; Conway, Colleen M.; Russell, Joshua A.
2015-01-01
After sharing research findings about grading and assessment practices in secondary music ensemble classes, we offer examples of commonly used assessment tools (ratings scale, checklist, rubric) for the performance ensemble. Then, we explore the various purposes of assessment in performance-based music courses: (1) to meet state, national, and…
NASA Astrophysics Data System (ADS)
Liu, Fang; Cao, San-xing; Lu, Rui
2012-04-01
This paper proposes a user credit assessment model based on a clustering ensemble, aimed at the problem of users illegally spreading pirated and pornographic media content on user self-service broadband new-media platforms. The idea is to assess new-media user credit by establishing an index system based on user credit behaviors; illegal users can then be identified from the assessment results, curbing the bad video and audio transmitted on the network. The proposed model integrates the advantages of swarm intelligence clustering, which is well suited to user credit behavior analysis, with K-means clustering, which can eliminate the scattered users left over by swarm intelligence clustering, thus classifying all users' credit automatically. The model is verified with experiments based on a standard credit application dataset from the UCI machine learning repository. The statistical results of a comparative experiment with a single swarm intelligence clustering model indicate that the clustering ensemble model has a stronger ability to distinguish creditworthiness, especially in identifying the user clusters with the best and worst credit, which helps operators apply incentive or punitive measures accurately. Moreover, compared with the experimental results of a logistic regression based model under the same conditions, the clustering ensemble model is robust and has better prediction accuracy.
NWS Operational Requirements for Ensemble-Based Hydrologic Forecasts
NASA Astrophysics Data System (ADS)
Hartman, R. K.
2008-12-01
Ensemble-based hydrologic forecasts have been developed and issued by National Weather Service (NWS) staff at River Forecast Centers (RFCs) for many years. Used principally for long-range water supply forecasts, only the uncertainty associated with weather and climate have been traditionally considered. As technology and societal expectations of resource managers increase, the use and desire for risk-based decision support tools has also increased. These tools require forecast information that includes reliable uncertainty estimates across all time and space domains. The development of reliable uncertainty estimates associated with hydrologic forecasts is being actively pursued within the United States and internationally. This presentation will describe the challenges, components, and requirements for operational hydrologic ensemble-based forecasts from the perspective of a NOAA/NWS River Forecast Center.
Verification of Ensemble Forecasts for the New York City Operations Support Tool
NASA Astrophysics Data System (ADS)
Day, G.; Schaake, J. C.; Thiemann, M.; Draijer, S.; Wang, L.
2012-12-01
The New York City water supply system operated by the Department of Environmental Protection (DEP) serves nine million people. It covers 2,000 square miles of portions of the Catskill, Delaware, and Croton watersheds, and it includes nineteen reservoirs and three controlled lakes. DEP is developing an Operations Support Tool (OST) to support its water supply operations and planning activities. OST includes historical and real-time data, a model of the water supply system complete with operating rules, and lake water quality models developed to evaluate alternatives for managing turbidity in the New York City Catskill reservoirs. OST will enable DEP to manage turbidity in its unfiltered system while satisfying its primary objective of meeting the City's water supply needs, in addition to considering secondary objectives of maintaining ecological flows, supporting fishery and recreation releases, and mitigating downstream flood peaks. The current version of OST relies on statistical forecasts of flows in the system based on recent observed flows. To improve short-term decision making, plans are being made to transition to National Weather Service (NWS) ensemble forecasts based on hydrologic models that account for short-term weather forecast skill, longer-term climate information, as well as the hydrologic state of the watersheds and recent observed flows. To ensure that the ensemble forecasts are unbiased and that the ensemble spread reflects the actual uncertainty of the forecasts, a statistical model has been developed to post-process the NWS ensemble forecasts to account for hydrologic model error as well as any inherent bias and uncertainty in initial model states, meteorological data and forecasts. The post-processor is designed to produce adjusted ensemble forecasts that are consistent with the DEP historical flow sequences that were used to develop the system operating rules. 
A set of historical hindcasts that is representative of the real-time ensemble forecasts is needed to verify that the post-processed forecasts are unbiased, statistically reliable, and preserve the skill inherent in the "raw" NWS ensemble forecasts. A verification procedure and set of metrics will be presented that provide an objective assessment of ensemble forecasts. The procedure will be applied to both raw ensemble hindcasts and to post-processed ensemble hindcasts. The verification metrics will be used to validate proper functioning of the post-processor and to provide a benchmark for comparison of different types of forecasts. For example, current NWS ensemble forecasts are based on climatology, using each historical year to generate a forecast trace. The NWS Hydrologic Ensemble Forecast System (HEFS) under development will utilize output from both the National Oceanic Atmospheric Administration (NOAA) Global Ensemble Forecast System (GEFS) and the Climate Forecast System (CFS). Incorporating short-term meteorological forecasts and longer-term climate forecast information should provide sharper, more accurate forecasts. Hindcasts from HEFS will enable New York City to generate verification results to validate the new forecasts and further fine-tune system operating rules. Project verification results will be presented for different watersheds across a range of seasons, lead times, and flow levels to assess the quality of the current ensemble forecasts.
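One standard diagnostic for the kind of ensemble verification described above is the rank histogram, which checks whether observations fall evenly among the ordered ensemble members; a roughly flat histogram indicates a statistically reliable spread, while a U-shape indicates an under-dispersive ensemble. A minimal sketch with hypothetical flow values:

```python
def rank_histogram(ens_forecasts, observations):
    """Count, for each forecast case, how many ensemble members fall
    below the verifying observation; bins run 0..n_members."""
    n_members = len(ens_forecasts[0])
    counts = [0] * (n_members + 1)
    for members, obs in zip(ens_forecasts, observations):
        rank = sum(1 for m in members if m < obs)
        counts[rank] += 1
    return counts

# Two hypothetical 3-member flow forecasts and their observations
print(rank_histogram([[10.0, 12.0, 14.0], [8.0, 9.0, 11.0]], [13.0, 7.5]))
# [1, 0, 1, 0]
```

This is only one of several metrics a full verification suite would include; skill scores against climatology and reliability diagrams are typical companions.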
Effect of Data Assimilation Parameters on The Optimized Surface CO2 Flux in Asia
NASA Astrophysics Data System (ADS)
Kim, Hyunjung; Kim, Hyun Mee; Kim, Jinwoong; Cho, Chun-Ho
2018-02-01
In this study, CarbonTracker, an inverse modeling system based on the ensemble Kalman filter, was used to evaluate the effects of data assimilation parameters (assimilation window length and ensemble size) on the estimation of surface CO2 fluxes in Asia. Several experiments with different parameters were conducted, and the results were verified using CO2 concentration observations. The assimilation window lengths tested were 3, 5, 7, and 10 weeks, and the ensemble sizes were 100, 150, and 300; a total of 12 experiments using combinations of these parameters were therefore conducted. The experimental period was from January 2006 to December 2009. Differences between the optimized surface CO2 fluxes of the experiments were largest in the Eurasian Boreal (EB) area, followed by Eurasian Temperate (ET) and Tropical Asia (TA), and were larger in boreal summer than in boreal winter. The effect of ensemble size on the optimized biosphere flux is larger than that of the assimilation window length over Asia as a whole, but their relative importance varies across specific regions. The optimized biosphere flux was more sensitive to the assimilation window length in EB, whereas it was sensitive to the ensemble size as well as the assimilation window length in ET. The larger the ensemble size and the shorter the assimilation window length, the larger the uncertainty (i.e., ensemble spread) of the optimized surface CO2 fluxes. The 10-week assimilation window and the 300-member ensemble were the optimal configuration for CarbonTracker in the Asian region, based on several verifications using CO2 concentration measurements.
Knowledge-Based Methods To Train and Optimize Virtual Screening Ensembles
2016-01-01
Ensemble docking can be a successful virtual screening technique that addresses the innate conformational heterogeneity of macromolecular drug targets. Yet, lacking a method to identify a subset of conformational states that effectively segregates active and inactive small molecules, ensemble docking may result in the recommendation of a large number of false positives. Here, three knowledge-based methods that construct structural ensembles for virtual screening are presented. Each method selects ensembles by optimizing an objective function calculated using the receiver operating characteristic (ROC) curve: either the area under the ROC curve (AUC) or a ROC enrichment factor (EF). As the number of receptor conformations, N, becomes large, the methods differ in their asymptotic scaling. Given a set of small molecules with known activities and a collection of target conformations, the most resource-intensive method is guaranteed to find the optimal ensemble but scales as O(2^N). A recursive approximation to the optimal solution scales as O(N^2), and a more severe approximation leads to a faster method that scales linearly, O(N). The techniques are generally applicable to any system, and we demonstrate their effectiveness on the androgen nuclear hormone receptor (AR), cyclin-dependent kinase 2 (CDK2), and the peroxisome proliferator-activated receptor δ (PPAR-δ) drug targets. Conformations that consisted of a crystal structure and molecular dynamics simulation cluster centroids were used to form AR and CDK2 ensembles. Multiple available crystal structures were used to form PPAR-δ ensembles. For each target, we show that the three methods perform similarly to one another on both the training and test sets. PMID:27097522
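The paper's exact objective functions and recursions are not reproduced here; the following is a hedged sketch of the greedy, fast-scaling idea: grow an ensemble one conformation at a time, scoring each candidate by the AUC of the ensemble's best (minimum) docking score per molecule. All function and variable names are assumptions for illustration.

```python
import numpy as np

def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney U statistic.
    Lower docking score = predicted active, so rank ascending scores."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    pos = labels == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    u = n_pos * n_neg + n_pos * (n_pos + 1) / 2 - ranks[pos].sum()
    return u / (n_pos * n_neg)

def greedy_ensemble(score_matrix, labels, k):
    """Greedily add the conformation that most improves ensemble AUC.
    `score_matrix` is (n_molecules, n_conformations); the ensemble score
    of a molecule is its best (minimum) score over the chosen members."""
    chosen = []
    for _ in range(k):
        best, best_auc = None, -1.0
        for j in range(score_matrix.shape[1]):
            if j in chosen:
                continue
            ens = score_matrix[:, chosen + [j]].min(axis=1)
            a = auc(ens, labels)
            if a > best_auc:
                best, best_auc = j, a
        chosen.append(best)
    return chosen, best_auc
```

Each greedy pass costs one AUC evaluation per remaining conformation, which is how the linear-in-N flavor of the approach keeps the search tractable compared to enumerating all 2^N subsets.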
Effects of ensemble and summary displays on interpretations of geospatial uncertainty data.
Padilla, Lace M; Ruginski, Ian T; Creem-Regehr, Sarah H
2017-01-01
Ensemble and summary displays are two widely used methods to represent visual-spatial uncertainty; however, there is disagreement about which is the most effective technique to communicate uncertainty to the general public. Visualization scientists create ensemble displays by plotting multiple data points on the same Cartesian coordinate plane. Despite their use in scientific practice, it is more common in public presentations to use visualizations of summary displays, which scientists create by plotting statistical parameters of the ensemble members. While prior work has demonstrated that viewers make different decisions when viewing summary and ensemble displays, it is unclear what components of the displays lead to diverging judgments. This study aims to compare the salience of visual features - or visual elements that attract bottom-up attention - as one possible source of diverging judgments made with ensemble and summary displays in the context of hurricane track forecasts. We report that salient visual features of both ensemble and summary displays influence participant judgment. Specifically, we find that salient features of summary displays of geospatial uncertainty can be misunderstood as displaying size information. Further, salient features of ensemble displays evoke judgments that are indicative of accurate interpretations of the underlying probability distribution of the ensemble data. However, when participants use ensemble displays to make point-based judgments, they may overweight individual ensemble members in their decision-making process. We propose that ensemble displays are a promising alternative to summary displays in a geospatial context but that decisions about visualization methods should be informed by the viewer's task.
Verifying and Postprocessing the Ensemble Spread-Error Relationship
NASA Astrophysics Data System (ADS)
Hopson, Tom; Knievel, Jason; Liu, Yubao; Roux, Gregory; Wu, Wanli
2013-04-01
With the increased utilization of ensemble forecasts in weather and hydrologic applications, there is a need to verify their benefit over less expensive deterministic forecasts. One such potential benefit of ensemble systems is their capacity to forecast their own forecast error through the ensemble spread-error relationship. The paper begins by revisiting the limitations of the Pearson correlation alone in assessing this relationship. Next, we introduce two new metrics to consider in assessing the utility of an ensemble's varying dispersion. We argue there are two aspects of an ensemble's dispersion that should be assessed. First, and perhaps more fundamentally: is there enough variability in the ensemble's dispersion to justify the maintenance of an expensive ensemble prediction system (EPS), irrespective of whether the EPS is well-calibrated or not? To diagnose this, the factor that controls the theoretical upper limit of the spread-error correlation can be useful. Secondly, does the variable dispersion of an ensemble relate to variable expectation of forecast error? Representing the spread-error correlation in relation to its theoretical limit can provide a simple diagnostic of this attribute. A context for these concepts is provided by assessing two operational ensembles: 30-member Western US temperature forecasts for the U.S. Army Test and Evaluation Command and 51-member Brahmaputra River flow forecasts of the Climate Forecast and Applications Project for Bangladesh. Both of these systems utilize a postprocessing technique based on quantile regression (QR) under a step-wise forward selection framework, leading to ensemble forecasts with both good reliability and sharpness. In addition, the methodology utilizes the ensemble's ability to self-diagnose forecast instability to produce calibrated forecasts with informative skill-spread relationships.
We will describe both ensemble systems briefly, review the steps used to calibrate the ensemble forecast, and present verification statistics using error-spread metrics, along with figures from operational ensemble forecasts before and after calibration.
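As a minimal illustration of the basic spread-error diagnostic this abstract starts from (not the authors' full metric set, which also accounts for the theoretical correlation limit), one can correlate the ensemble spread with the absolute error of the ensemble mean across forecast cases:

```python
import numpy as np

def spread_error_correlation(ensembles, obs):
    """Pearson correlation between ensemble standard deviation and the
    absolute error of the ensemble mean. `ensembles` has shape
    (n_cases, n_members); `obs` has shape (n_cases,)."""
    spread = ensembles.std(axis=1, ddof=1)
    abs_err = np.abs(ensembles.mean(axis=1) - obs)
    s, e = spread - spread.mean(), abs_err - abs_err.mean()
    return (s @ e) / np.sqrt((s @ s) * (e @ e))
```

Even for a perfectly calibrated ensemble this correlation stays well below 1, because a single absolute error is a noisy draw from the forecast distribution; that is precisely why the abstract argues for comparing the measured correlation to its theoretical upper limit rather than judging it in isolation.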
Analog-Based Postprocessing of Navigation-Related Hydrological Ensemble Forecasts
NASA Astrophysics Data System (ADS)
Hemri, S.; Klein, B.
2017-11-01
Inland waterway transport benefits from probabilistic forecasts of water levels, as they make it possible to optimize the ship load and, hence, to minimize transport costs. State-of-the-art probabilistic hydrologic ensemble forecasts inherit biases and dispersion errors from the atmospheric ensemble forecasts they are driven with. The use of statistical postprocessing techniques like ensemble model output statistics (EMOS) allows for a reduction of these systematic errors by fitting a statistical model based on training data. In this study, training periods for EMOS are selected based on forecast analogs, i.e., historical forecasts that are similar to the forecast to be verified. Due to the strong autocorrelation of water levels, forecast analogs have to be selected based on entire forecast hydrographs in order to guarantee similar hydrograph shapes. Custom-tailored measures of similarity for forecast hydrographs comprise hydrological series distance (SD), the hydrological matching algorithm (HMA), and dynamic time warping (DTW). Verification against observations reveals that EMOS forecasts for water level at three gauges along the river Rhine with training periods selected based on SD, HMA, and DTW compare favorably with reference EMOS forecasts, which are based on either seasonal training periods or on training periods obtained by dividing the hydrological forecast trajectories into runoff regimes.
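SD and HMA have their own published definitions; as one concrete example of the hydrograph similarity measures named above, classic dynamic time warping between two 1-D hydrographs can be sketched as:

```python
import numpy as np

def dtw(a, b):
    """Dynamic time warping distance between two 1-D hydrographs.
    Warping lets peaks align even when their timing differs slightly."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

Ranking archived forecast hydrographs by such a distance to the current forecast hydrograph is one simple way to assemble an analog-based training period.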
An evaluation of soil water outlooks for winter wheat in south-eastern Australia
NASA Astrophysics Data System (ADS)
Western, A. W.; Dassanayake, K. B.; Perera, K. C.; Alves, O.; Young, G.; Argent, R.
2015-12-01
Soil moisture is a key limiting resource for rain-fed cropping in Australian broad-acre cropping zones. Seasonal rainfall and temperature outlooks are standard operational services offered by the Australian Bureau of Meteorology and are routinely used to support agricultural decisions. This presentation examines the performance of proposed soil water seasonal outlooks in the context of wheat cropping in south-eastern Australia (autumn planting, late spring harvest). We used weather ensembles simulated by the Predictive Ocean-Atmosphere Model for Australia (POAMA) as input to the Agricultural Production Simulator (APSIM) to construct ensemble soil water "outlooks" at twenty sites. Hindcasts were made over a 33-year period using the 33 POAMA ensemble members. The overall modelling flow involved: 1. Downscaling of the daily weather series (rainfall, minimum and maximum temperature, humidity, radiation) from the ~250 km POAMA grid scale to a local weather station using quantile-quantile correction, based on a 33-year observation record extracted from the SILO data drill product. 2. Using APSIM to produce soil water ensembles from the downscaled weather ensembles; a warm-up period of 5 years of observed weather was followed by a 9-month hindcast period based on each ensemble member. 3. Summarizing the soil water ensembles by estimating the proportion of outlook ensembles in each climatological tercile, where the climatology was constructed using APSIM and observed weather from the 33 years of hindcasts at the relevant site. 4. Evaluating the soil water outlooks for different lead times and months against a "truth" run of APSIM based on observed weather. Outlooks generally have some useful forecast skill for lead times of up to two to three months, except in late spring, in line with current useful lead times for rainfall outlooks. Better performance was found in summer and autumn, when vegetation cover and water use are low.
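Step 1 of the workflow, quantile-quantile correction, can be sketched as follows. This is a simplified empirical version under assumed conventions; the study's exact implementation (e.g., handling of rain-day frequency) is not specified here.

```python
import numpy as np

def quantile_map(model_hist, obs_hist, model_new):
    """Empirical quantile-quantile correction: map a new model value to
    the observed value at the same quantile of the historical records."""
    q = np.searchsorted(np.sort(model_hist), model_new, side="right") / len(model_hist)
    q = np.clip(q, 0.0, 1.0)
    return np.quantile(np.sort(obs_hist), q)
```

For example, a model that runs systematically twice as wet as the station record gets its values roughly halved, because each model value is replaced by the observation occupying the same rank position.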
Adaptive correction of ensemble forecasts
NASA Astrophysics Data System (ADS)
Pelosi, Anna; Battista Chirico, Giovanni; Van den Bergh, Joris; Vannitsem, Stephane
2017-04-01
Forecasts from numerical weather prediction (NWP) models often suffer from both systematic and non-systematic errors. These are present in both deterministic and ensemble forecasts, and originate from various sources such as model error and subgrid variability. Statistical post-processing techniques can partly remove such errors, which is particularly important when NWP outputs concerning surface weather variables are employed for site-specific applications. Many different post-processing techniques have been developed. For deterministic forecasts, adaptive methods such as the Kalman filter are often used, which sequentially post-process the forecasts by continuously updating the correction parameters as new ground observations become available. These methods are especially valuable when long training data sets do not exist. For ensemble forecasts, well-known techniques are ensemble model output statistics (EMOS) and so-called "member-by-member" approaches (MBM). Here, we introduce a new adaptive post-processing technique for ensemble predictions. The proposed method is a sequential Kalman filtering technique that fully exploits the information content of the ensemble. One correction equation is retrieved and applied to all members; however, the parameters of the regression equations are retrieved by exploiting the second-order statistics of the forecast ensemble. We compare our new method with two other techniques: a simple method that makes use of a running bias correction of the ensemble mean, and an MBM post-processing approach that rescales the ensemble mean and spread, based on minimization of the Continuous Ranked Probability Score (CRPS). We perform a verification study for the region of Campania in southern Italy.
We use two years (2014-2015) of daily meteorological observations of 2-meter temperature and 10-meter wind speed from 18 ground-based automatic weather stations distributed across the region, comparing them with the corresponding COSMO-LEPS ensemble forecasts. Deterministic verification scores (e.g., mean absolute error, bias) and probabilistic scores (e.g., CRPS) are used to evaluate the post-processing techniques. We conclude that the new adaptive method outperforms the simpler running bias-correction. The proposed adaptive method often outperforms the MBM method in removing bias. The MBM method has the advantage of correcting the ensemble spread, although it needs more training data.
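As a hedged sketch of the adaptive Kalman-filter idea referred to above, reduced to the simplest case of a scalar running bias estimate (not the authors' ensemble formulation, which also exploits second-order ensemble statistics): the bias is modeled as a random walk, and each new forecast-minus-observation difference is treated as a noisy measurement of it.

```python
def kalman_bias_correction(forecasts, observations, q=0.01, r=1.0):
    """Sequential (Kalman filter) estimate of the forecast bias.
    q: random-walk variance of the bias state; r: measurement noise
    variance of a single forecast-minus-observation difference.
    Returns the bias estimate after each case."""
    bias, p = 0.0, 1.0
    history = []
    for f, o in zip(forecasts, observations):
        p += q                        # predict: bias may drift
        k = p / (p + r)               # Kalman gain
        bias += k * ((f - o) - bias)  # update with the innovation
        p *= (1 - k)
        history.append(bias)
    return history
```

Subtracting the current bias estimate from each new forecast gives the adaptive correction; because the gain never reaches zero, the estimate keeps tracking seasonally drifting biases, which is the advantage over a fixed training window.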
Sun, Jian; Yang, Xiurong
2015-12-15
Based on the specific binding of Cu(2+) ions to 11-mercaptoundecanoic acid (11-MUA)-protected AuNCs with intense orange-red emission, we have proposed and constructed a novel fluorescent nanomaterial-metal ion ensemble in a nonfluorescent off-state. Subsequently, an AuNCs@11-MUA-Cu(2+) ensemble-based fluorescent chemosensor, amenable to convenient, sensitive, selective, turn-on, and real-time assay of acetylcholinesterase (AChE), could be developed by using acetylthiocholine (ATCh) as the substrate. The sensing ensemble solution exhibits a marked fluorescence enhancement in the presence of AChE and ATCh: AChE hydrolyzes its active substrate ATCh into thiocholine (TCh), and TCh then captures Cu(2+) from the ensemble, switching the AuNCs from the fluorescence off-state to the on-state. AChE activity could be detected at levels as low as 0.05 mU/mL, with a good linear range from 0.05 to 2.5 mU/mL. The proposed fluorescence assay can be used to quantitatively evaluate AChE activity in real biological samples and, furthermore, to screen inhibitors of AChE. As far as we know, the present study reports the first analytical proposal for sensing AChE activity in real time using a fluorescent nanomaterial-Cu(2+) ensemble, or focusing on Cu(2+)-triggered fluorescence quenching/recovery. This strategy paves a new avenue for exploring the biosensing applications of fluorescent AuNCs and presents the prospect of the AuNCs@11-MUA-Cu(2+) ensemble as a versatile enzyme activity assay platform by means of other appropriate substrates/analytes. Copyright © 2015 Elsevier B.V. All rights reserved.
Evaluating Alignment of Shapes by Ensemble Visualization
Raj, Mukund; Mirzargar, Mahsa; Preston, J. Samuel; Kirby, Robert M.; Whitaker, Ross T.
2016-01-01
The visualization of variability in surfaces embedded in 3D, which is a type of ensemble uncertainty visualization, provides a means of understanding the underlying distribution of a collection or ensemble of surfaces. Although ensemble visualization for isosurfaces has been described in the literature, we conduct an expert-based evaluation of various ensemble visualization techniques in a particular medical imaging application: the construction of atlases or templates from a population of images. In this work, we extend contour boxplot to 3D, allowing us to evaluate it against an enumeration-style visualization of the ensemble members and other conventional visualizations used by atlas builders, namely examining the atlas image and the corresponding images/data provided as part of the construction process. We present feedback from domain experts on the efficacy of contour boxplot compared to other modalities when used as part of the atlas construction and analysis stages of their work. PMID:26186768
A study of fuzzy logic ensemble system performance on face recognition problem
NASA Astrophysics Data System (ADS)
Polyakova, A.; Lipinskiy, L.
2017-02-01
Some problems are difficult to solve using a single intelligent information technology (IIT). An ensemble of various data mining (DM) techniques is a set of models, each of which is able to solve the problem by itself, but whose combination increases the efficiency of the system as a whole. Using IIT ensembles can improve the reliability and efficiency of the final decision, since the approach emphasizes the diversity of its components. A new method for designing ensembles of intelligent information technologies is considered in this paper. It is based on fuzzy logic and is designed to solve classification and regression problems. The ensemble consists of several data mining algorithms: artificial neural network, support vector machine and decision trees. These algorithms and their ensemble have been tested on the face recognition problem. Principal components analysis (PCA) is used for feature selection.
The Road to DLCZ Protocol in Rubidium Ensemble
NASA Astrophysics Data System (ADS)
Li, Chang; Pu, Yunfei; Jiang, Nan; Chang, Wei; Zhang, Sheng; Center for Quantum Information, Institute for Interdisciplinary Information Sciences, Tsinghua University Team
2017-04-01
Quantum communication is a powerful approach to achieving fully secure information transfer. The DLCZ protocol ensures that the photon signal decays only linearly with increasing transmission distance, which improves the success probability and shortens the time needed to build up an entangled channel. Beyond that, it suggests the advanced idea of building a quantum internet from nodes at different sites connected to one another. In our laboratory, three laser-cooled rubidium-87 ensembles have been built. Two of them serve as single-photon emitters, generating entanglement between ensemble and photon. Moreover, crossed AODs multiplex and demultiplex the optical circuit so that each ensemble is divided into two hundred 2D sub-memory cells. The third ensemble is used for quantum telecommunication, converting 780 nm photons into telecom-wavelength ones. We have also been building a double-MOT system, which provides more atoms in the ensemble and larger optical density.
ESA GlobPermafrost - mapping the extent and thermal state of permafrost with satellite data
NASA Astrophysics Data System (ADS)
Westermann, Sebastian; Obu, Jaroslav; Aalstad, Kristoffer; Bartsch, Annett; Kääb, Andreas
2017-04-01
The ESA GlobPermafrost initiative (2016-2019) aims at developing, validating and implementing information products based on remote sensing data to support permafrost research. Mapping of permafrost extent and ground temperatures is conducted at 1 km scale using remotely sensed land surface temperatures (MODIS), snow water equivalent (ESA GlobSnow) and land cover (ESA CCI landcover) in conjunction with a simple ground thermal model (CryoGrid 1). The spatial variability of the ground thermal regime at scales smaller than the model resolution is explicitly taken into account by considering an ensemble of realizations with different model properties. The approach has been tested for the unglacierized land areas in the North Atlantic region, an area of more than 5 million km2. The results have been compared to in-situ temperature measurements in more than 100 boreholes, indicating an accuracy of approximately 2.5°C. Within GlobPermafrost, the scheme will be extended to cover the entire circumpolar permafrost area. Here, we provide an evaluation of the first prototype covering "lowland" permafrost areas north of 40° latitude (available on www.globpermafrost.info in early 2017). We give a feasibility assessment for extending the scheme to global scale, including both mountain and Antarctic permafrost. Finally, we discuss the potential and limitations for estimating changes of permafrost extent on decadal timescales.
Modality-Driven Classification and Visualization of Ensemble Variance
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bensema, Kevin; Gosink, Luke; Obermaier, Harald
Advances in computational power now enable domain scientists to address conceptual and parametric uncertainty by running simulations multiple times in order to sufficiently sample the uncertain input space. While this approach helps address conceptual and parametric uncertainties, the ensemble datasets produced by this technique present a special challenge to visualization researchers as the ensemble dataset records a distribution of possible values for each location in the domain. Contemporary visualization approaches that rely solely on summary statistics (e.g., mean and variance) cannot convey the detailed information encoded in ensemble distributions that are paramount to ensemble analysis; summary statistics provide no information about modality classification and modality persistence. To address this problem, we propose a novel technique that classifies high-variance locations based on the modality of the distribution of ensemble predictions. Additionally, we develop a set of confidence metrics to inform the end-user of the quality of fit between the distribution at a given location and its assigned class. We apply a similar method to time-varying ensembles to illustrate the relationship between peak variance and bimodal or multimodal behavior. These classification schemes enable a deeper understanding of the behavior of the ensemble members by distinguishing between distributions that can be described by a single tendency and distributions which reflect divergent trends in the ensemble.
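A crude, hedged illustration of modality classification (the paper's actual classifier and confidence metrics are more sophisticated): at each grid location, count the peaks of a histogram built from the ensemble members' values.

```python
import numpy as np

def count_modes(samples, bins=20, min_frac=0.1):
    """Crude modality classifier: count local maxima of a histogram that
    exceed `min_frac` of the tallest bin. Returns the mode count."""
    h, _ = np.histogram(samples, bins=bins, density=True)
    peaks = 0
    for i in range(len(h)):
        left = h[i - 1] if i > 0 else -np.inf
        right = h[i + 1] if i < len(h) - 1 else -np.inf
        if h[i] > left and h[i] > right and h[i] >= min_frac * h.max():
            peaks += 1
    return peaks
```

Locations reporting more than one mode are exactly those where a single summary statistic (mean, variance) would hide divergent trends among the ensemble members.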
On the Likely Utility of Hybrid Weights Optimized for Variances in Hybrid Error Covariance Models
NASA Astrophysics Data System (ADS)
Satterfield, E.; Hodyss, D.; Kuhl, D.; Bishop, C. H.
2017-12-01
Because of imperfections in ensemble data assimilation schemes, one cannot assume that the ensemble covariance is equal to the true error covariance of a forecast. Previous work demonstrated how information about the distribution of true error variances given an ensemble sample variance can be revealed from an archive of (observation-minus-forecast, ensemble-variance) data pairs. Here, we derive a simple and intuitively compelling formula to obtain the mean of this distribution of true error variances given an ensemble sample variance from (observation-minus-forecast, ensemble-variance) data pairs produced by a single run of a data assimilation system. This formula takes the form of a Hybrid weighted average of the climatological forecast error variance and the ensemble sample variance. Here, we test the extent to which these readily obtainable weights can be used to rapidly optimize the covariance weights used in Hybrid data assimilation systems that employ weighted averages of static covariance models and flow-dependent ensemble based covariance models. Univariate data assimilation and multi-variate cycling ensemble data assimilation are considered. In both cases, it is found that our computationally efficient formula gives Hybrid weights that closely approximate the optimal weights found through the simple but computationally expensive process of testing every plausible combination of weights.
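Schematically (with assumed notation; the paper derives the exact weights from the archive of data pairs), the Hybrid weighted average described above has the form:

```latex
\mathbb{E}\left[\sigma^2 \mid s^2\right] \;=\; w\, s^2 \;+\; (1 - w)\,\bar{\sigma}^2_{\mathrm{clim}},
\qquad 0 \le w \le 1,
```

where s^2 is the ensemble sample variance, \bar{\sigma}^2_{\mathrm{clim}} the climatological forecast error variance, and w the readily obtainable weight that the study proposes reusing as the Hybrid covariance weight.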
Cluster ensemble based on Random Forests for genetic data.
Alhusain, Luluah; Hafez, Alaaeldin M
2017-01-01
Clustering plays a crucial role in several application domains, such as bioinformatics. In bioinformatics, clustering has been extensively used as an approach for detecting interesting patterns in genetic data. One application is population structure analysis, which aims to group individuals into subpopulations based on shared genetic variations, such as single nucleotide polymorphisms. Advances in DNA sequencing technology have facilitated the obtainment of genetic datasets with exceptional sizes. Genetic data usually contain hundreds of thousands of genetic markers genotyped for thousands of individuals, making an efficient means for handling such data desirable. Random Forests (RFs) has emerged as an efficient algorithm capable of handling high-dimensional data. RFs provides a proximity measure that can capture different levels of co-occurring relationships between variables. RFs has been widely considered a supervised learning method, although it can be converted into an unsupervised learning method. Therefore, an RF-derived proximity measure combined with a clustering technique may be well suited for determining the underlying structure of unlabeled data. This paper proposes RFcluE, a cluster ensemble approach for determining the underlying structure of genetic data based on RFs. The approach comprises a cluster ensemble framework to combine multiple runs of RF clustering. Experiments were conducted on a high-dimensional, real genetic dataset to evaluate the proposed approach. The experiments included an examination of the impact of parameter changes, a comparison of RFcluE performance against other clustering methods, and an assessment of the relationship between the diversity and quality of the ensemble and its effect on RFcluE performance. This paper thus proposes RFcluE, a cluster ensemble approach based on RF clustering, to address the problem of population structure analysis and demonstrates the effectiveness of the approach.
The paper also illustrates that applying a cluster ensemble approach, combining multiple RF clusterings, produces more robust and higher-quality results as a consequence of feeding the ensemble with diverse views of high-dimensional genetic data obtained through bagging and random subspace, the two key features of the RF algorithm.
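A single run of unsupervised RF clustering, the building block that RFcluE combines, can be sketched as follows. This is a minimal illustration of the general synthetic-contrast idea (label real data 1 and a column-permuted copy 0, train a forest, use leaf co-occurrence as proximity), not the RFcluE implementation; all names are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def rf_proximity_clustering(X, n_clusters, n_trees=200, seed=0):
    """Unsupervised RF clustering: permuting each column independently
    destroys dependencies between features, so a forest trained to tell
    real from synthetic learns the data's joint structure. The fraction
    of trees in which two real samples land in the same leaf is their
    proximity; 1 - proximity is clustered hierarchically."""
    rng = np.random.default_rng(seed)
    synth = np.column_stack([rng.permutation(col) for col in X.T])
    data = np.vstack([X, synth])
    y = np.r_[np.ones(len(X)), np.zeros(len(X))]
    rf = RandomForestClassifier(n_estimators=n_trees, random_state=seed).fit(data, y)
    leaves = rf.apply(X)                                  # (n_samples, n_trees)
    prox = (leaves[:, None, :] == leaves[None, :, :]).mean(axis=2)
    dist = 1.0 - prox
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist, checks=False), method="average")
    return fcluster(Z, t=n_clusters, criterion="maxclust")
```

Repeating this with different seeds yields the diverse base clusterings (further diversified in RFcluE by bagging and random subspaces) that the consensus step then combines.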
Causal network in a deafferented non-human primate brain.
Balasubramanian, Karthikeyan; Takahashi, Kazutaka; Hatsopoulos, Nicholas G
2015-01-01
De-afferented/efferented neural ensembles can undergo causal changes when interfaced to neuroprosthetic devices. These changes occur via recruitment or isolation of neurons, alterations in functional connectivity within the ensemble, and/or changes in the role of neurons, i.e., excitatory/inhibitory. In this work, the emergence of a causal network and changes in its dynamics are demonstrated for a deafferented brain region exposed to BMI (brain-machine interface) learning. The BMI controlled a robot for reach-and-grasp behavior; the motor cortical regions used for the BMI were deafferented due to chronic amputation, and ensembles of neurons were decoded for velocity control of the multi-DOF robot. A generalized linear model-framework based Granger causality (GLM-GC) technique was used in estimating the ensemble connectivity. Model selection was based on the AIC (Akaike Information Criterion).
An ensemble predictive modeling framework for breast cancer classification.
Nagarajan, Radhakrishnan; Upreti, Meenakshi
2017-12-01
Molecular changes often precede clinical presentation of diseases and can be useful surrogates with potential to assist in informed clinical decision making. Recent studies have demonstrated the usefulness of modeling approaches such as classification that can predict the clinical outcomes from molecular expression profiles. While useful, a majority of these approaches implicitly use all molecular markers as features in the classification process, often resulting in a sparse, high-dimensional projection of the samples whose dimensionality is comparable to the sample size. In this study, a variant of the recently proposed ensemble classification approach is used for predicting good- and poor-prognosis breast cancer samples from their molecular expression profiles. In contrast to traditional single and ensemble classifiers, the proposed approach uses multiple base classifiers with varying feature sets obtained from two-dimensional projection of the samples, in conjunction with a majority voting strategy for predicting the class labels. In contrast to our earlier implementation, base classifiers in the ensembles are chosen for maximal sensitivity and minimal redundancy by retaining only those with low average cosine distance. The resulting ensemble sets are subsequently modeled as undirected graphs. Performance of four different classification algorithms is shown to be better within the proposed ensemble framework than when they are used as traditional single-classifier systems. The significance of a subset of genes with high-degree centrality in the network abstractions across the poor-prognosis samples is also discussed. Copyright © 2017 Elsevier Inc. All rights reserved.
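The core mechanism described above, base classifiers trained on low-dimensional feature projections combined by majority vote, can be sketched generically as follows (this is not the authors' pipeline: the base learner, feature pairs, and dataset are assumptions for illustration).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def majority_vote_ensemble(X_tr, y_tr, X_te, feature_pairs):
    """Train one base classifier per 2-D feature projection and combine
    the binary predictions by majority vote."""
    votes = []
    for (i, j) in feature_pairs:
        clf = LogisticRegression().fit(X_tr[:, [i, j]], y_tr)
        votes.append(clf.predict(X_te[:, [i, j]]))
    votes = np.array(votes)              # (n_members, n_test)
    return (votes.mean(axis=0) >= 0.5).astype(int)
```

Each member sees only two features, so no single member faces the sparse high-dimensional projection problem; the vote aggregates their partial views, and an odd number of members avoids ties.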
An ensemble forecast of the South China Sea monsoon
NASA Astrophysics Data System (ADS)
Krishnamurti, T. N.; Tewari, Mukul; Bensman, Ed; Han, Wei; Zhang, Zhan; Lau, William K. M.
1999-05-01
This paper presents a generalized ensemble forecast procedure for the tropical latitudes. Here we propose an empirical orthogonal function-based procedure for the definition of a seven-member ensemble. The wind and the temperature fields are perturbed over the global tropics. Although the forecasts are made over the global belt with a high-resolution model, the emphasis of this study is on a South China Sea monsoon. Over this domain of the South China Sea includes the passage of a Tropical Storm, Gary, that moved eastwards north of the Philippines. The ensemble forecast handled the precipitation of this storm reasonably well. A global model at the resolution Triangular Truncation 126 waves is used to carry out these seven forecasts. The evaluation of the ensemble of forecasts is carried out via standard root mean square errors of the precipitation and the wind fields. The ensemble average is shown to have a higher skill compared to a control experiment, which was a first analysis based on operational data sets over both the global tropical and South China Sea domain. All of these experiments were subjected to physical initialization which provides a spin-up of the model rain close to that obtained from satellite and gauge-based estimates. The results furthermore show that inherently much higher skill resides in the forecast precipitation fields if they are averaged over area elements of the order of 4° latitude by 4° longitude squares.
Creating ensembles of decision trees through sampling
Kamath, Chandrika; Cantu-Paz, Erick
2005-08-30
A system for decision tree ensembles that includes a module to read the data, a module to sort the data, a module to evaluate a potential split of the data according to some criterion using a random sample of the data, a module to split the data, and a module to combine multiple decision trees in ensembles. The decision tree method is based on statistical sampling techniques and includes the steps of reading the data, sorting the data, evaluating a potential split according to some criterion using a random sample of the data, splitting the data, and combining multiple decision trees in ensembles.
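The distinctive step above, evaluating candidate splits on a random sample of the data rather than on all of it, can be sketched as follows (a hedged illustration using the Gini criterion; the patented system's modules and criterion are not specified here).

```python
import numpy as np

def gini(y):
    """Gini impurity of a label array."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - (p ** 2).sum()

def best_split_sampled(X, y, sample_frac=0.3, seed=0):
    """Evaluate candidate splits on a random subsample of the data and
    return the (feature, threshold) pair with the lowest weighted
    impurity on that subsample."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(y), size=max(2, int(sample_frac * len(y))), replace=False)
    Xs, ys = X[idx], y[idx]
    best = (None, None, np.inf)
    for f in range(X.shape[1]):
        for t in np.unique(Xs[:, f]):
            left = ys[Xs[:, f] <= t]
            right = ys[Xs[:, f] > t]
            if len(left) == 0 or len(right) == 0:
                continue
            imp = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
            if imp < best[2]:
                best = (f, t, imp)
    return best[0], best[1]
```

Because a strongly informative split usually looks good on a modest subsample too, sampling cuts the per-node cost roughly in proportion to the sample fraction while typically recovering the same split.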
Implicit ligand theory for relative binding free energies
NASA Astrophysics Data System (ADS)
Nguyen, Trung Hai; Minh, David D. L.
2018-03-01
Implicit ligand theory enables noncovalent binding free energies to be calculated based on an exponential average of the binding potential of mean force (BPMF)—the binding free energy between a flexible ligand and rigid receptor—over a precomputed ensemble of receptor configurations. In the original formalism, receptor configurations were drawn from or reweighted to the apo ensemble. Here we show that BPMFs averaged over a holo ensemble yield binding free energies relative to the reference ligand that specifies the ensemble. When using receptor snapshots from an alchemical simulation with a single ligand, the new statistical estimator outperforms the original.
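In the original formalism (notation assumed here), the binding free energy is an exponential average of the BPMF B(r) over receptor configurations r drawn from the apo ensemble:

```latex
\Delta G_{\mathrm{bind}} \;=\; -\beta^{-1} \ln \left\langle e^{-\beta B(r)} \right\rangle_{\mathrm{apo}},
\qquad \beta = 1/k_B T .
```

The paper's result is that replacing the apo average with an average over a holo ensemble, defined by a reference ligand, yields binding free energies relative to that reference ligand rather than absolute ones.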
Embedded feature ranking for ensemble MLP classifiers.
Windeatt, Terry; Duangsoithong, Rakkrit; Smith, Raymond
2011-06-01
A feature ranking scheme for multilayer perceptron (MLP) ensembles is proposed, along with a stopping criterion based upon the out-of-bootstrap estimate. To solve multi-class problems feature ranking is combined with modified error-correcting output coding. Experimental results on benchmark data demonstrate the versatility of the MLP base classifier in removing irrelevant features.
Clustering-Based Ensemble Learning for Activity Recognition in Smart Homes
Jurek, Anna; Nugent, Chris; Bi, Yaxin; Wu, Shengli
2014-01-01
Application of sensor-based technology within activity monitoring systems is becoming a popular technique within the smart environment paradigm. Nevertheless, the use of such an approach generates complex constructs of data, which subsequently requires the use of intricate activity recognition techniques to automatically infer the underlying activity. This paper explores a cluster-based ensemble method as a new solution for the purposes of activity recognition within smart environments. With this approach activities are modelled as collections of clusters built on different subsets of features. A classification process is performed by assigning a new instance to its closest cluster from each collection. Two different sensor data representations have been investigated, namely numeric and binary. Following the evaluation of the proposed methodology it has been demonstrated that the cluster-based ensemble method can be successfully applied as a viable option for activity recognition. Results following exposure to data collected from a range of activities indicated that the ensemble method had the ability to perform with accuracies of 94.2% and 97.5% for numeric and binary data, respectively. These results outperformed a range of single classifiers considered as benchmarks. PMID:25014095
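The classification process described above, one cluster collection per feature subset with each member voting via its closest cluster, can be sketched generically as follows (a hedged illustration assuming binary activity labels and KMeans as the clusterer; the paper's exact clustering algorithm and distance measures are not specified here).

```python
import numpy as np
from sklearn.cluster import KMeans

class ClusterEnsemble:
    """Cluster-based ensemble: one KMeans model per feature subset; each
    member votes with the majority activity label of the cluster closest
    to the new instance, and votes are combined by majority."""
    def __init__(self, feature_subsets, n_clusters=4, seed=0):
        self.subsets = feature_subsets
        self.k = n_clusters
        self.seed = seed

    def fit(self, X, y):
        self.members = []
        for cols in self.subsets:
            km = KMeans(n_clusters=self.k, n_init=10,
                        random_state=self.seed).fit(X[:, cols])
            # majority activity label of each cluster
            maj = np.array([np.bincount(y[km.labels_ == c]).argmax()
                            for c in range(self.k)])
            self.members.append((cols, km, maj))
        return self

    def predict(self, X):
        votes = np.array([maj[km.predict(X[:, cols])]
                          for cols, km, maj in self.members])
        return (votes.mean(axis=0) >= 0.5).astype(int)
```

Because each member clusters a different feature subset, the ensemble tolerates noisy or uninformative sensors in any single subset, which is one plausible source of the accuracy gains reported over single classifiers.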
Progressive freezing of interacting spins in isolated finite magnetic ensembles
NASA Astrophysics Data System (ADS)
Bhattacharya, Kakoli; Dupuis, Veronique; Le-Roy, Damien; Deb, Pritam
2017-02-01
Self-organization of magnetic nanoparticles into secondary nanostructures provides an innovative way of designing functional nanomaterials with novel properties, different from those of the constituent primary nanoparticles as well as their bulk counterparts. The collective magnetic properties of such densely packed magnetic nanoparticles make them more appealing than individual magnetic nanoparticles in many technological applications. This work reports the collective magnetic behaviour of magnetic ensembles comprising single-domain Fe3O4 nanoparticles. The present work reveals that ensemble formation is based on the re-orientation and attachment of the nanoparticles in an iso-oriented fashion at the mesoscale. Comprehensive dc magnetic measurements show the prevalence of strong interparticle interactions in the ensembles. Owing to the close-range organization of the primary Fe3O4 nanoparticles in the ensemble, the spins of the individual nanoparticles interact through dipolar interactions, as revealed by remnant magnetization measurements. A signature of super-spin-glass-like behaviour in the ensembles is observed in memory studies carried out under field-cooled conditions. Progressive freezing of spins in the ensembles is corroborated by the Vogel-Fulcher fit of the susceptibility data. Dynamic scaling of the relaxation reasserts the slow spin dynamics, substantiating cluster-spin-glass-like behaviour in the ensembles.
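The Vogel-Fulcher fit mentioned above is a standard three-parameter regression; the sketch below fits synthetic data with invented parameters (tau0 = 1e-9 s, E/kB = 120 K, T0 = 30 K), not the paper's measurements. Fitting in log space keeps the many-orders-of-magnitude span of relaxation times manageable.

```python
import numpy as np
from scipy.optimize import curve_fit

def ln_vogel_fulcher(T, ln_tau0, E_over_kB, T0):
    """ln(tau) for the Vogel-Fulcher law tau = tau0*exp(E/(kB*(T - T0)));
    T0 > 0 signals interacting (glassy) spins, T0 = 0 recovers Arrhenius."""
    return ln_tau0 + E_over_kB / (T - T0)

# Synthetic freezing-peak data with known parameters; a real analysis
# would use measured relaxation times vs. peak temperature.
T = np.linspace(40.0, 60.0, 8)
ln_tau = ln_vogel_fulcher(T, np.log(1e-9), 120.0, 30.0)

popt, _ = curve_fit(ln_vogel_fulcher, T, ln_tau, p0=(-20.0, 100.0, 25.0))
ln_tau0_fit, E_fit, T0_fit = popt   # recovers T0 ~ 30 K
```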
NASA Astrophysics Data System (ADS)
Walcott, Sam
2013-03-01
Interactions between the proteins actin and myosin drive muscle contraction. Properties of a single myosin interacting with an actin filament are largely known, but a trillion myosins work together in muscle. We are interested in how single-molecule properties relate to ensemble function. Myosin's reaction rates depend on force, so ensemble models keep track of both molecular state and force on each molecule. These models make subtle predictions, e.g. that myosin, when part of an ensemble, moves actin faster than when isolated. This acceleration arises because forces between molecules speed reaction kinetics. Experiments support this prediction and allow parameter estimates. A model based on this analysis describes experiments from single molecule to ensemble. In vivo, actin is regulated by proteins that, when present, cause the binding of one myosin to speed the binding of its neighbors; binding becomes cooperative. Although such interactions preclude the mean field approximation, a set of linear ODEs describes these ensembles under simplified experimental conditions. In these experiments cooperativity is strong, with the binding of one molecule affecting ten neighbors on either side. We progress toward a description of myosin ensembles under physiological conditions.
Khaki, M; Forootan, E; Kuhn, M; Awange, J; Papa, F; Shum, C K
2018-06-01
Climate change can significantly influence terrestrial water changes around the world particularly in places that have been proven to be more vulnerable such as Bangladesh. In the past few decades, climate impacts, together with those of excessive human water use have changed the country's water availability structure. In this study, we use multi-mission remotely sensed measurements along with a hydrological model to separately analyze groundwater and soil moisture variations for the period 2003-2013, and their interactions with rainfall in Bangladesh. To improve the model's estimates of water storages, terrestrial water storage (TWS) data obtained from the Gravity Recovery And Climate Experiment (GRACE) satellite mission are assimilated into the World-Wide Water Resources Assessment (W3RA) model using the ensemble-based sequential technique of the Square Root Analysis (SQRA) filter. We investigate the capability of the data assimilation approach to use a non-regional hydrological model for a regional case study. Based on these estimates, we investigate relationships between the model derived sub-surface water storage changes and remotely sensed precipitations, as well as altimetry-derived river level variations in Bangladesh by applying the empirical mode decomposition (EMD) method. A larger correlation is found between river level heights and rainfalls (78% on average) in comparison to groundwater storage variations and rainfalls (57% on average). The results indicate a significant decline in groundwater storage (∼32% reduction) for Bangladesh between 2003 and 2013, which is equivalent to an average rate of 8.73 ± 2.45mm/year. Copyright © 2018 Elsevier B.V. All rights reserved.
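The ensemble-based Kalman update behind square-root filters like the SQRA can be illustrated with a perturbed-observation EnKF analysis step; this is a generic one-variable sketch with invented numbers, not the W3RA/GRACE implementation (a square-root filter would update the anomalies deterministically instead of perturbing the observations).

```python
import numpy as np

def enkf_analysis(E, y, H, R, rng):
    """Stochastic EnKF update of a state ensemble E (n_state x n_members)
    toward observations y, with observation operator H and obs-error
    covariance R.  Illustrates the ensemble-based Kalman update used by
    square-root variants such as the SQRA filter."""
    n, N = E.shape
    A = E - E.mean(axis=1, keepdims=True)            # ensemble anomalies
    HA = H @ A
    Pyy = HA @ HA.T / (N - 1) + R                    # innovation covariance
    K = (A @ HA.T / (N - 1)) @ np.linalg.inv(Pyy)    # Kalman gain
    # Perturb observations so the analysis spread is statistically correct.
    Y = y[:, None] + rng.multivariate_normal(np.zeros(len(y)), R, N).T
    return E + K @ (Y - H @ E)

rng = np.random.default_rng(0)
E = rng.normal(5.0, 2.0, (1, 40))        # 40-member prior; "truth" is 3.0
analysis = enkf_analysis(E, np.array([3.0]), np.eye(1), 0.1 * np.eye(1), rng)
```

After the update the ensemble mean moves toward the observation and the ensemble spread contracts, which is exactly how assimilating GRACE TWS pulls the model storages toward the data.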
Exploiting ensemble learning for automatic cataract detection and grading.
Yang, Ji-Jiang; Li, Jianqiang; Shen, Ruifang; Zeng, Yang; He, Jian; Bi, Jing; Li, Yong; Zhang, Qinyan; Peng, Lihui; Wang, Qing
2016-02-01
Cataract is defined as a lenticular opacity presenting usually with poor visual acuity. It is one of the most common causes of visual impairment worldwide. Early diagnosis demands the expertise of trained healthcare professionals, which may present a barrier to early intervention due to underlying costs. To date, studies reported in the literature utilize a single learning model for retinal image classification in grading cataract severity. We present an ensemble learning based approach as a means to improving diagnostic accuracy. Three independent feature sets, i.e., wavelet-, sketch-, and texture-based features, are extracted from each fundus image. For each feature set, two base learning models, i.e., Support Vector Machine and Back Propagation Neural Network, are built. Then, the ensemble methods, majority voting and stacking, are investigated to combine the multiple base learning models for final fundus image classification. Empirical experiments are conducted for cataract detection (two-class task, i.e., cataract or non-cataractous) and cataract grading (four-class task, i.e., non-cataractous, mild, moderate or severe) tasks. The best performance of the ensemble classifier is 93.2% and 84.5% in terms of the correct classification rates for cataract detection and grading tasks, respectively. The results demonstrate that the ensemble classifier outperforms the single learning model significantly, which also illustrates the effectiveness of the proposed approach. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
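The majority-voting and stacking combinations investigated above map directly onto scikit-learn's ensemble API; the sketch below uses synthetic stand-in data rather than wavelet/sketch/texture fundus features, and an SVM plus an MLP as the two base learners (the paper's SVM and back-propagation neural network).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier, VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Stand-in features; the study extracts wavelet-, sketch- and
# texture-based features from fundus images.
X, y = make_classification(n_samples=300, n_features=12, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

base = [("svm", SVC(probability=True, random_state=0)),
        ("bpnn", MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000,
                               random_state=0))]
voter = VotingClassifier(base, voting="hard").fit(Xtr, ytr)   # majority vote
stack = StackingClassifier(base).fit(Xtr, ytr)  # logistic-regression meta-learner
```

For the four-class grading task the same constructs apply unchanged; only the label set differs.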
NASA Astrophysics Data System (ADS)
Szunyogh, Istvan; Kostelich, Eric J.; Gyarmati, G.; Patil, D. J.; Hunt, Brian R.; Kalnay, Eugenia; Ott, Edward; Yorke, James A.
2005-08-01
The accuracy and computational efficiency of the recently proposed local ensemble Kalman filter (LEKF) data assimilation scheme is investigated on a state-of-the-art operational numerical weather prediction model using simulated observations. The model selected for this purpose is the T62 horizontal- and 28-level vertical-resolution version of the Global Forecast System (GFS) of the National Centers for Environmental Prediction. The performance of the data assimilation system is assessed for different configurations of the LEKF scheme. It is shown that a modest size (40-member) ensemble is sufficient to track the evolution of the atmospheric state with high accuracy. For this ensemble size, the computational time per analysis is less than 9 min on a cluster of PCs. The analyses are extremely accurate in the mid-latitude storm track regions. The largest analysis errors, which are typically much smaller than the observational errors, occur where parametrized physical processes play important roles. Because these are also the regions where model errors are expected to be the largest, limitations of a real-data implementation of the ensemble-based Kalman filter may be easily mistaken for model errors. In light of these results, the importance of testing ensemble-based Kalman filter data assimilation systems on simulated observations is stressed.
A Bayesian Ensemble Approach for Epidemiological Projections
Lindström, Tom; Tildesley, Michael; Webb, Colleen
2015-01-01
Mathematical models are powerful tools for epidemiology and can be used to compare control actions. However, different models and model parameterizations may provide different predictions of outcomes. In other fields of research, ensemble modeling has been used to combine multiple projections. We explore the possibility of applying such methods to epidemiology by adapting Bayesian techniques developed for climate forecasting. We exemplify the implementation with single-model ensembles based on different parameterizations of the Warwick model run for the 2001 United Kingdom foot and mouth disease outbreak and compare the efficacy of different control actions. This allows us to investigate the effect that discrepancy among projections based on different modeling assumptions has on the ensemble prediction. A sensitivity analysis showed that the choice of prior can have a pronounced effect on the posterior estimates of quantities of interest, in particular for ensembles with large discrepancy among projections. However, by using a hierarchical extension of the method we show that prior sensitivity can be circumvented. We further extend the method to include a priori beliefs about different modeling assumptions and demonstrate that the effect of this can have different consequences depending on the discrepancy among projections. We propose that the method is a promising analytical tool for ensemble modeling of disease outbreaks. PMID:25927892
Mazurowski, Maciej A; Zurada, Jacek M; Tourassi, Georgia D
2009-07-01
Ensemble classifiers have been shown to be effective in multiple applications. In this article, the authors explore the effectiveness of ensemble classifiers in a case-based computer-aided diagnosis system for detection of masses in mammograms. They evaluate two general ways of constructing subclassifiers by resampling of the available development dataset: random division and random selection. Furthermore, they discuss the problem of selecting the ensemble size and propose two adaptive incremental techniques that automatically select the size for the problem at hand. All the techniques are evaluated with respect to a previously proposed information-theoretic CAD system (IT-CAD). The experimental results show that the examined ensemble techniques provide a statistically significant improvement (AUC = 0.905 +/- 0.024) in performance as compared to the original IT-CAD system (AUC = 0.865 +/- 0.029). Some of the techniques allow for a notable reduction in the total number of examples stored in the case base (to 1.3% of the original size), which, in turn, results in lower storage requirements and a shorter response time of the system. Among the methods examined in this article, the two proposed adaptive techniques are by far the most effective for this purpose. Furthermore, the authors provide some discussion and guidance for choosing the ensemble parameters.
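An adaptive, incremental selection of the ensemble size can be sketched as below. This is a generic illustration: decision trees and synthetic data stand in for the paper's case-based IT-CAD subclassifiers, and the stopping rule (quit after ten additions without AUC improvement) is an invented example of the idea.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, random_state=0)
Xtr, Xval, ytr, yval = train_test_split(X, y, random_state=0)
rng = np.random.default_rng(0)

prob = np.zeros(len(yval))
history, best, stalled = [], 0.0, 0
for k in range(1, 51):
    idx = rng.integers(0, len(ytr), len(ytr))   # resample the development set
    clf = DecisionTreeClassifier(max_depth=3, random_state=k)
    prob += clf.fit(Xtr[idx], ytr[idx]).predict_proba(Xval)[:, 1]
    auc = roc_auc_score(yval, prob / k)         # ensemble AUC so far
    history.append(auc)
    if auc > best + 1e-4:                       # still improving: keep going
        best, stalled = auc, 0
    else:                                       # stop after 10 flat additions
        stalled += 1
        if stalled >= 10:
            break
```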
Training set extension for SVM ensemble in P300-speller with familiar face paradigm.
Li, Qi; Shi, Kaiyang; Gao, Ning; Li, Jian; Bai, Ou
2018-03-27
P300-spellers are brain-computer interface (BCI)-based character input systems. Support vector machine (SVM) ensembles are trained with large-scale training sets and used as classifiers in these systems. However, the required large-scale training data necessitate a prolonged collection time for each subject, which results in data collected toward the end of the period being contaminated by the subject's fatigue. This study aimed to develop a method for acquiring more training data based on a collected small training set. A new method was developed in which two corresponding training datasets in two sequences are superposed and averaged to extend the training set. The proposed method was tested offline on a P300-speller with the familiar face paradigm. The SVM ensemble with extended training set achieved 85% classification accuracy for the averaged results of four sequences, and 100% for 11 sequences in the P300-speller. In contrast, the conventional SVM ensemble with non-extended training set achieved only 65% accuracy for four sequences, and 92% for 11 sequences. The SVM ensemble with extended training set achieves higher classification accuracies than the conventional SVM ensemble, which verifies that the proposed method effectively improves the classification performance of BCI P300-spellers, thus enhancing their practicality.
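The superpose-and-average extension of the training set reduces to simple epoch arithmetic; the sketch below uses a synthetic ERP template and invented array shapes, not the study's recordings. Averaging two epochs with the same stimulus order keeps the event-related response while reducing the noise standard deviation by roughly a factor of sqrt(2).

```python
import numpy as np

def extend_training_set(seq_a, seq_b):
    """Extend a P300 training set by superposing and averaging the
    corresponding epochs of two stimulation sequences with the same
    stimulus order.  Returns both originals plus the averaged epochs."""
    averaged = 0.5 * (seq_a + seq_b)   # same ERP, lower noise
    return np.concatenate([seq_a, seq_b, averaged], axis=0)

rng = np.random.default_rng(0)
erp = np.sin(np.linspace(0, np.pi, 50))      # idealized P300-like template
a = erp + rng.normal(0, 1.0, (20, 50))       # sequence 1: 20 noisy epochs
b = erp + rng.normal(0, 1.0, (20, 50))       # sequence 2: same stimuli
ext = extend_training_set(a, b)              # 60 training epochs from 40
```

The extended set is then fed to the SVM ensemble exactly as a natively collected set would be, which is what lets a short recording session behave like a longer one.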
Model dependence and its effect on ensemble projections in CMIP5
NASA Astrophysics Data System (ADS)
Abramowitz, G.; Bishop, C.
2013-12-01
Conceptually, the notion of model dependence within climate model ensembles is relatively simple - modelling groups share a literature base, parametrisations, data sets and even model code - so the potential for dependence in sampling different climate futures is clear. How, though, can this conceptual problem inform a practical solution that demonstrably improves the ensemble mean and ensemble variance as an estimate of system uncertainty? While some research has already focused on error correlation or error covariance as a candidate for improving ensemble mean estimates, a complete definition of independence must at least implicitly subscribe to an ensemble interpretation paradigm, such as the 'truth-plus-error', 'indistinguishable', or more recently 'replicate Earth' paradigm. Using a definition of model dependence based on error covariance within the replicate Earth paradigm, this presentation will show that accounting for dependence in surface air temperature gives cooler projections in CMIP5 - by as much as 20% globally in some RCPs - although results differ significantly for each RCP, especially regionally. That accounting for dependence changes projections by different amounts in different RCPs is not an inconsistent result: different numbers of submissions to each RCP by different modelling groups mean that differences in projections between RCPs are not entirely about RCP forcing conditions - they also reflect different sampling strategies.
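A dependence-aware weighting based on error covariance can be sketched as follows: minimize the variance of the weighted-mean error subject to the weights summing to one, which has the closed form w = C⁻¹1 / (1ᵀC⁻¹1). The toy errors below are invented; two "models" share most of their error while a third is independent.

```python
import numpy as np

def dependence_weights(errors):
    """Ensemble-mean weights accounting for inter-model error covariance:
    minimize Var(w . e) subject to sum(w) = 1, giving
    w = C^{-1} 1 / (1^T C^{-1} 1).
    errors: (n_models, n_times) array of model-minus-observation errors."""
    C = np.cov(errors)                 # inter-model error covariance
    ones = np.ones(C.shape[0])
    w = np.linalg.solve(C, ones)
    return w / w.sum()

rng = np.random.default_rng(0)
shared = rng.normal(0, 1, 200)
# Models 0 and 1 share most of their error (dependent); model 2 does not.
errors = np.vstack([shared + rng.normal(0, 0.3, 200),
                    shared + rng.normal(0, 0.3, 200),
                    rng.normal(0, 1, 200)])
w = dependence_weights(errors)
```

The two dependent models split roughly the weight a single independent model would receive, which is the intuition behind accounting for dependence: near-duplicate models should not be double-counted.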
Multiscale Macromolecular Simulation: Role of Evolving Ensembles
Singharoy, A.; Joshi, H.; Ortoleva, P.J.
2013-01-01
Multiscale analysis provides an algorithm for the efficient simulation of macromolecular assemblies. This algorithm involves the coevolution of a quasiequilibrium probability density of atomic configurations and the Langevin dynamics of spatial coarse-grained variables denoted order parameters (OPs) characterizing nanoscale system features. In practice, implementation of the probability density involves the generation of constant OP ensembles of atomic configurations. Such ensembles are used to construct thermal forces and diffusion factors that mediate the stochastic OP dynamics. Generation of all-atom ensembles at every Langevin timestep is computationally expensive. Here, multiscale computation for macromolecular systems is made more efficient by a method that self-consistently folds in ensembles of all-atom configurations constructed in an earlier step, history, of the Langevin evolution. This procedure accounts for the temporal evolution of these ensembles, accurately providing thermal forces and diffusions. It is shown that efficiency and accuracy of the OP-based simulations is increased via the integration of this historical information. Accuracy improves with the square root of the number of historical timesteps included in the calculation. As a result, CPU usage can be decreased by a factor of 3-8 without loss of accuracy. The algorithm is implemented into our existing force-field based multiscale simulation platform and demonstrated via the structural dynamics of viral capsomers. PMID:22978601
SoilGrids250m: Global gridded soil information based on machine learning
Hengl, Tomislav; Mendes de Jesus, Jorge; Heuvelink, Gerard B. M.; Ruiperez Gonzalez, Maria; Kilibarda, Milan; Blagotić, Aleksandar; Shangguan, Wei; Wright, Marvin N.; Geng, Xiaoyuan; Bauer-Marschallinger, Bernhard; Guevara, Mario Antonio; Vargas, Rodrigo; MacMillan, Robert A.; Batjes, Niels H.; Leenaars, Johan G. B.; Ribeiro, Eloi; Wheeler, Ichsani; Mantel, Stephan; Kempen, Bas
2017-01-01
This paper describes the technical development and accuracy assessment of the most recent and improved version of the SoilGrids system at 250m resolution (June 2016 update). SoilGrids provides global predictions for standard numeric soil properties (organic carbon, bulk density, Cation Exchange Capacity (CEC), pH, soil texture fractions and coarse fragments) at seven standard depths (0, 5, 15, 30, 60, 100 and 200 cm), in addition to predictions of depth to bedrock and distribution of soil classes based on the World Reference Base (WRB) and USDA classification systems (ca. 280 raster layers in total). Predictions were based on ca. 150,000 soil profiles used for training and a stack of 158 remote sensing-based soil covariates (primarily derived from MODIS land products, SRTM DEM derivatives, climatic images and global landform and lithology maps), which were used to fit an ensemble of machine learning methods—random forest and gradient boosting and/or multinomial logistic regression—as implemented in the R packages ranger, xgboost, nnet and caret. The results of 10-fold cross-validation show that the ensemble models explain between 56% (coarse fragments) and 83% (pH) of variation with an overall average of 61%. Improvements in the relative accuracy considering the amount of variation explained, in comparison to the previous version of SoilGrids at 1 km spatial resolution, range from 60 to 230%. Improvements can be attributed to: (1) the use of machine learning instead of linear regression, (2) considerable investments in preparing finer-resolution covariate layers and (3) the insertion of additional soil profiles. Further development of SoilGrids could include refinement of methods to incorporate input uncertainties and derivation of posterior probability distributions (per pixel), and further automation of spatial modeling so that soil maps can be generated for potentially hundreds of soil variables.
Another area of future research is the development of methods for multiscale merging of SoilGrids predictions with local and/or national gridded soil products (e.g. up to 50 m spatial resolution) so that increasingly more accurate, complete and consistent global soil information can be produced. SoilGrids are available under the Open Data Base License. PMID:28207752
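The random-forest plus gradient-boosting ensemble at the core of SoilGrids can be sketched with scikit-learn (the original uses the R packages ranger and xgboost); the synthetic covariates and target below are stand-ins for the ~158 remote-sensing layers and soil-profile observations.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import cross_val_predict

# Stand-in covariate stack and soil property; SoilGrids trains on
# ca. 150,000 profiles against 158 remote-sensing covariates.
X, y = make_regression(n_samples=300, n_features=10, noise=10.0,
                       random_state=0)

rf = RandomForestRegressor(n_estimators=100, random_state=0)
gb = GradientBoostingRegressor(random_state=0)
# Two-member ensemble: average the 10-fold cross-validated predictions,
# mirroring the paper's 10-fold accuracy assessment.
pred = 0.5 * (cross_val_predict(rf, X, y, cv=10) +
              cross_val_predict(gb, X, y, cv=10))
ensemble_r2 = r2_score(y, pred)   # "amount of variation explained"
```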
NASA Astrophysics Data System (ADS)
Qi, W.; Zhang, C.; Fu, G.; Sweetapple, C.; Zhou, H.
2016-02-01
The applicability of six fine-resolution precipitation products, including precipitation radar, infrared, microwave and gauge-based products, using different precipitation computation recipes, is evaluated using statistical and hydrological methods in northeastern China. In addition, a framework quantifying the uncertainty contributions of precipitation products, hydrological models, and their interactions to uncertainties in ensemble discharges is proposed. The investigated precipitation products are Tropical Rainfall Measuring Mission (TRMM) products (TRMM3B42 and TRMM3B42RT), Global Land Data Assimilation System (GLDAS)/Noah, Asian Precipitation - Highly-Resolved Observational Data Integration Towards Evaluation of Water Resources (APHRODITE), Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks (PERSIANN), and a Global Satellite Mapping of Precipitation (GSMAP-MVK+) product. Two hydrological models of different complexities, i.e. a water and energy budget-based distributed hydrological model and a physically based semi-distributed hydrological model, are employed to investigate the influence of hydrological models on simulated discharges. Results show that APHRODITE has high accuracy at a monthly scale compared with the other products, and that GSMAP-MVK+ shows a clear advantage over TRMM3B42 in relative bias (RB), Nash-Sutcliffe coefficient of efficiency (NSE), root mean square error (RMSE), correlation coefficient (CC), false alarm ratio, and critical success index. These findings could be very useful for the validation, refinement, and future development of satellite-based products (e.g. NASA Global Precipitation Measurement). Although large uncertainty exists in heavy precipitation, hydrological models contribute most of the uncertainty in extreme discharges. Interactions between precipitation products and hydrological models can contribute a magnitude of discharge uncertainty similar to that of the hydrological models themselves.
A better precipitation product does not guarantee a better discharge simulation because of these interactions. It is also found that a good discharge simulation depends on a good coalition of a hydrological model and a precipitation product, suggesting that, although the satellite-based precipitation products are not as accurate as the gauge-based products, they can perform better in discharge simulations when appropriately combined with hydrological models. This information is revealed here for the first time and is highly beneficial for precipitation product applications.
Global ensemble texture representations are critical to rapid scene perception.
Brady, Timothy F; Shafer-Skelton, Anna; Alvarez, George A
2017-06-01
Traditionally, recognizing the objects within a scene has been treated as a prerequisite to recognizing the scene itself. However, research now suggests that the ability to rapidly recognize visual scenes could be supported by global properties of the scene itself rather than the objects within the scene. Here, we argue for a particular instantiation of this view: that scenes are recognized by treating them as a global texture and processing the pattern of orientations and spatial frequencies across different areas of the scene without recognizing any objects. To test this model, we asked whether there is a link between how proficient individuals are at rapid scene perception and how proficiently they represent simple spatial patterns of orientation information (global ensemble texture). We find a significant and selective correlation between these tasks, suggesting a link between scene perception and spatial ensemble tasks but not nonspatial summary statistics. In a second and third experiment, we additionally show that global ensemble texture information is not only associated with scene recognition, but that preserving only global ensemble texture information from scenes is sufficient to support rapid scene perception; however, preserving the same information is not sufficient for object recognition. Thus, global ensemble texture alone is sufficient to allow activation of scene representations but not object representations. Together, these results provide evidence for a view of scene recognition based on global ensemble texture rather than a view based purely on objects or on nonspatially localized global properties. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
NASA Astrophysics Data System (ADS)
Merker, Claire; Ament, Felix; Clemens, Marco
2017-04-01
The quantification of measurement uncertainty for rain radar data remains challenging. Radar reflectivity measurements are affected, amongst other things, by calibration errors, noise, blocking and clutter, and attenuation. Their combined impact on measurement accuracy is difficult to quantify due to incomplete process understanding and complex interdependencies. An improved quality assessment of rain radar measurements is of interest for applications both in meteorology and hydrology, for example for precipitation ensemble generation, rainfall runoff simulations, or in data assimilation for numerical weather prediction. Especially a detailed description of the spatial and temporal structure of errors is beneficial in order to make best use of the areal precipitation information provided by radars. Radar precipitation ensembles are one promising approach to represent spatially variable radar measurement errors. We present a method combining ensemble radar precipitation nowcasting with data assimilation to estimate radar measurement uncertainty at each pixel. This combination of ensemble forecast and observation yields a consistent spatial and temporal evolution of the radar error field. We use an advection-based nowcasting method to generate an ensemble reflectivity forecast from initial data of a rain radar network. Subsequently, reflectivity data from single radars is assimilated into the forecast using the Local Ensemble Transform Kalman Filter. The spread of the resulting analysis ensemble provides a flow-dependent, spatially and temporally correlated reflectivity error estimate at each pixel. We will present first case studies that illustrate the method using data from a high-resolution X-band radar network.
NASA Astrophysics Data System (ADS)
Shen, Feifei; Xu, Dongmei; Xue, Ming; Min, Jinzhong
2017-07-01
This study examines the impacts of assimilating radar radial velocity (Vr) data for the simulation of hurricane Ike (2008) with two different ensemble generation techniques in the framework of the hybrid ensemble-variational (EnVar) data assimilation system of the Weather Research and Forecasting model. For the generation of ensemble perturbations we apply two techniques: the ensemble transform Kalman filter (ETKF) and the ensemble of data assimilation (EDA). For the ETKF-EnVar, the forecast ensemble perturbations are updated by the ETKF, while for the EDA-EnVar, the hybrid is employed to update each ensemble member with perturbed observations. For both EnVar schemes, the ensemble mean is analyzed by the hybrid method with flow-dependent ensemble covariance. The sensitivity of the analyses and forecasts to the two ensemble generation techniques is investigated in this study. It is found that the EnVar system is rather stable with different ensemble update techniques in terms of its skill in improving the analyses and forecasts. The EDA-EnVar-based ensemble perturbations are likely to include slightly less organized spatial structures than those of the ETKF-EnVar, and the perturbations of the latter are constructed more dynamically. Detailed diagnostics reveal that both EnVar schemes not only produce positive temperature increments around the hurricane center but also systematically adjust the hurricane location with the hurricane-specific error covariance. On average, the analyses and forecasts from the ETKF-EnVar have slightly smaller errors than those from the EDA-EnVar in terms of track, intensity, and precipitation. Moreover, the ETKF-EnVar yields better forecasts when verified against conventional observations.
Designing boosting ensemble of relational fuzzy systems.
Scherer, Rafał
2010-10-01
A method frequently used in classification systems for improving classification accuracy is to combine the outputs of several classifiers. Among various types of classifiers, fuzzy ones are tempting because they use intelligible fuzzy if-then rules. In this paper we build an AdaBoost ensemble of relational neuro-fuzzy classifiers. Relational fuzzy systems bond input and output fuzzy linguistic values by a binary relation; thus, compared to traditional fuzzy systems, fuzzy rules have additional weights - elements of a fuzzy relation matrix. Thanks to this, the system adapts better to the data during learning. In the paper an ensemble of relational fuzzy systems is proposed. The problem is that such an ensemble contains separate rule bases which cannot be directly merged. As the systems are separate, we cannot treat fuzzy rules coming from different systems as rules from the same (single) system. This problem is addressed by a novel design of the fuzzy systems constituting the ensemble, which normalizes the individual rule bases during learning. The method described in the paper is tested on several known benchmarks and compared with other machine learning solutions from the literature.
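The AdaBoost mechanics underlying such an ensemble can be sketched as below. Decision stumps and synthetic data stand in for the paper's relational neuro-fuzzy base classifiers; the weight-update and member-weighting formulas are the standard discrete AdaBoost ones.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)       # labels in {-1, +1}

w = np.full(len(y), 1.0 / len(y))                # AdaBoost sample weights
members, alphas = [], []
for _ in range(20):
    stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
    pred = stump.predict(X)
    err = w[pred != y].sum()                     # weighted training error
    alpha = 0.5 * np.log((1.0 - err) / max(err, 1e-12))  # member weight
    w *= np.exp(-alpha * y * pred)               # up-weight the mistakes
    w /= w.sum()
    members.append(stump)
    alphas.append(alpha)

def ensemble_predict(Xq):
    """Weighted vote of all boosted members."""
    score = sum(a * m.predict(Xq) for a, m in zip(alphas, members))
    return np.where(score >= 0, 1, -1)
```

In the paper the base learners are relational fuzzy systems rather than stumps, and the novel contribution is keeping their separate rule bases comparable (normalized) during this boosting loop.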
NASA Astrophysics Data System (ADS)
Li, Xuejian; Mao, Fangjie; Du, Huaqiang; Zhou, Guomo; Xu, Xiaojun; Han, Ning; Sun, Shaobo; Gao, Guolong; Chen, Liang
2017-04-01
Subtropical forest ecosystems play essential roles in the global carbon cycle and in carbon sequestration, which challenges the traditional understanding that the main functional areas of carbon sequestration lie in the temperate forests of Europe and America. The leaf area index (LAI) is an important biological parameter in the spatiotemporal simulation of the carbon cycle, and it has considerable significance in carbon cycle research. Dynamic retrieval based on remote sensing data is an important method with which to obtain large-scale, high-accuracy assessments of LAI. This study developed an algorithm for assimilating LAI dynamics based on an integrated ensemble Kalman filter using MODIS LAI data, MODIS reflectance data, and canopy reflectance data modeled by PROSAIL, for three typical types of subtropical forest (Moso bamboo forest, Lei bamboo forest, and evergreen and deciduous broadleaf forest) in China during 2014-2015. Some assimilation errors occurred in winter because of the poor quality of the MODIS product. Overall, the assimilated LAI matched the observed LAI well, with R2 of 0.82, 0.93, and 0.87, RMSE of 0.73, 0.49, and 0.42, and aBIAS of 0.50, 0.23, and 0.03 for Moso bamboo forest, Lei bamboo forest, and evergreen and deciduous broadleaf forest, respectively. The algorithm greatly decreased the uncertainty of the MODIS LAI in the growing season and improved its accuracy. The advantage of the algorithm is its use of biophysical parameters (e.g., measured LAI) in the LAI assimilation, which makes it possible to assimilate long-term MODIS LAI time series and to provide high-accuracy LAI data for the study of carbon cycle characteristics in subtropical forest ecosystems.
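The analysis step of an ensemble Kalman filter of the kind used here can be sketched in a few lines. This is a generic stochastic EnKF with a linear observation operator, not the paper's PROSAIL-coupled implementation; all names and numbers are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def enkf_update(ens, y_obs, H, R):
    """Stochastic EnKF analysis step: nudge each state member toward a
    perturbed copy of the observation, weighted by the Kalman gain built
    from the flow-dependent ensemble covariance."""
    n_state, n_ens = ens.shape
    X = ens - ens.mean(axis=1, keepdims=True)          # ensemble anomalies
    P = X @ X.T / (n_ens - 1)                          # sample covariance
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)       # Kalman gain
    y_pert = y_obs[:, None] + rng.multivariate_normal(
        np.zeros(len(y_obs)), R, size=n_ens).T         # perturbed observations
    return ens + K @ (y_pert - H @ ens)

# Toy example: 2-variable state, only the first component is observed
ens = rng.normal([2.0, 0.0], 1.0, size=(50, 2)).T      # shape (n_state, n_ens)
H = np.array([[1.0, 0.0]])
R = np.array([[0.01]])
analysis = enkf_update(ens, np.array([3.0]), H, R)
```

With a small observation error variance the analysis ensemble is pulled close to the observed value and its spread collapses accordingly.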
NASA Astrophysics Data System (ADS)
Michel, Dominik; Hirschi, Martin; Jimenez, Carlos; McCabe, Mathew; Miralles, Diego; Wood, Eric; Seneviratne, Sonia
2014-05-01
Research on climate variations and the development of predictive capabilities largely rely on globally available reference data series of the different components of the energy and water cycles. Several efforts have aimed at producing large-scale and long-term reference data sets of these components, e.g. based on in situ observations and remote sensing, in order to allow for diagnostic analyses of the drivers of temporal variations in the climate system. Evapotranspiration (ET) is an essential component of the energy and water cycle, which cannot be monitored directly on a global scale by remote sensing techniques. In recent years, several global multi-year ET data sets have been derived from remote sensing-based estimates, observation-driven land surface model simulations or atmospheric reanalyses. The LandFlux-EVAL initiative presented an ensemble evaluation of these data sets over the time periods 1989-1995 and 1989-2005 (Mueller et al. 2013). Currently, a multi-decadal global reference heat flux data set for ET at the land surface is being developed within the LandFlux initiative of the Global Energy and Water Cycle Experiment (GEWEX). This LandFlux v0 ET data set comprises four ET algorithms forced with a common radiation and surface meteorology. In order to estimate the agreement of the LandFlux v0 ET data with existing data sets, it is compared to the recently available LandFlux-EVAL synthesis benchmark product. Additional evaluation of the LandFlux v0 ET data set is based on a comparison to in situ observations of a weighing lysimeter at the hydrological research site Rietholzbach in Switzerland. These analyses serve as a test bed for similar evaluation procedures that are envisaged for ESA's WACMOS-ET initiative (http://wacmoset.estellus.eu). Reference: Mueller, B., Hirschi, M., Jimenez, C., Ciais, P., Dirmeyer, P. A., Dolman, A. J., Fisher, J. B., Jung, M., Ludwig, F., Maignan, F., Miralles, D. G., McCabe, M.
F., Reichstein, M., Sheffield, J., Wang, K., Wood, E. F., Zhang, Y., and Seneviratne, S. I. (2013). Benchmark products for land evapotranspiration: LandFlux-EVAL multi-data set synthesis. Hydrology and Earth System Sciences, 17(10): 3707-3720.
NASA Astrophysics Data System (ADS)
Saito, Kazuo; Hara, Masahiro; Kunii, Masaru; Seko, Hiromu; Yamaguchi, Munehiko
2011-05-01
Different initial perturbation methods for the mesoscale ensemble prediction were compared by the Meteorological Research Institute (MRI) as a part of the intercomparison of mesoscale ensemble prediction systems (EPSs) of the World Weather Research Programme (WWRP) Beijing 2008 Olympics Research and Development Project (B08RDP). Five initial perturbation methods for mesoscale ensemble prediction were developed for B08RDP and compared at MRI: (1) a downscaling method of the Japan Meteorological Agency (JMA)'s operational one-week EPS (WEP), (2) a targeted global model singular vector (GSV) method, (3) a mesoscale model singular vector (MSV) method based on the adjoint model of the JMA non-hydrostatic model (NHM), (4) a mesoscale breeding growing mode (MBD) method based on the NHM forecast and (5) a local ensemble transform (LET) method based on the local ensemble transform Kalman filter (LETKF) using NHM. These perturbation methods were applied to the preliminary experiments of the B08RDP Tier-1 mesoscale ensemble prediction with a horizontal resolution of 15 km. To make the comparison easier, the same horizontal resolution (40 km) was employed for the three mesoscale model-based initial perturbation methods (MSV, MBD and LET). The GSV method completely outperformed the WEP method, confirming the advantage of targeting in mesoscale EPS. The GSV method generally performed well with regard to root mean square errors of the ensemble mean, large growth rates of ensemble spreads throughout the 36-h forecast period, and high detection rates and high Brier skill scores (BSSs) for weak rains. On the other hand, the mesoscale model-based initial perturbation methods showed good detection rates and BSSs for intense rains. The MSV method showed a rapid growth in the ensemble spread of precipitation up to a forecast time of 6 h, which suggests suitability of the mesoscale SV for short-range EPSs, but the initial large growth of the perturbation did not last long. 
The performance of the MBD method was good for ensemble prediction of intense rain with a relatively small computing cost. The LET method showed similar characteristics to the MBD method, but the spread and growth rate were slightly smaller and the relative operating characteristic area skill score and BSS did not surpass those of MBD. These characteristic features of the five methods were confirmed by checking the evolution of the total energy norms and their growth rates. Characteristics of the initial perturbations obtained by four methods (GSV, MSV, MBD and LET) were examined for the case of a synoptic low-pressure system passing over eastern China. With GSV and MSV, the regions of large spread were near the low-pressure system, but with MSV, the distribution was more concentrated on the mesoscale disturbance. On the other hand, large-spread areas were observed southwest of the disturbance in MBD and LET. The horizontal pattern of LET perturbation was similar to that of MBD, but the amplitude of the LET perturbation reflected the observation density.
NASA Astrophysics Data System (ADS)
Kobayashi, Kenichiro; Otsuka, Shigenori; Apip; Saito, Kazuo
2016-08-01
This paper presents a study on short-term ensemble flood forecasting for small dam catchments in Japan. Numerical ensemble simulations of rainfall from the Japan Meteorological Agency nonhydrostatic model (JMA-NHM) are used as input to a rainfall-runoff model for predicting river discharge into a dam. The ensemble weather simulations use conventional 10 km and high-resolution 2 km spatial resolutions. A distributed rainfall-runoff model is constructed for the Kasahori dam catchment (approx. 70 km2) and driven with the ensemble rainfalls. The results show that the hourly maximum and cumulative catchment-average rainfalls of the 2 km resolution JMA-NHM ensemble simulation are more appropriate than the 10 km resolution rainfalls. All the simulated inflows based on the 2 and 10 km rainfalls exceed the flood discharge of 140 m3 s-1, a threshold value for flood control. The inflows with the 10 km resolution ensemble rainfall are all considerably smaller than the observations, while at least one simulated discharge out of the 11 ensemble members with the 2 km resolution rainfalls reproduces the first peak of the inflow at the Kasahori dam with an amplitude similar to the observations, although there are spatiotemporal lags between simulation and observation. To take positional lags into account in the ensemble discharge simulation, the rainfall distribution in each ensemble member is shifted so that the catchment-averaged cumulative rainfall over the Kasahori dam catchment is maximized. The runoff simulation with the position-shifted rainfalls gives much better results than the original ensemble discharge simulations.
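The position-shifting step can be illustrated on a toy gridded rainfall field: the field is displaced over a small window of offsets and the offset that maximizes the catchment-averaged rainfall is kept. A brute-force sketch with hypothetical data, not the authors' code:

```python
import numpy as np

def best_shift(rain, mask, max_shift=2):
    """Shift a gridded rainfall field so the catchment-averaged rainfall
    (under a boolean catchment mask) is maximized, mimicking the paper's
    position-shifting of ensemble rainfall. Brute force over small offsets."""
    best, best_val = (0, 0), -np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(np.roll(rain, dy, axis=0), dx, axis=1)
            val = shifted[mask].mean()        # catchment-averaged rainfall
            if val > best_val:
                best_val, best = val, (dy, dx)
    return best, best_val

# Toy field: the rainfall maximum sits one cell east of the catchment cell
rain = np.zeros((5, 5))
rain[2, 3] = 10.0
mask = np.zeros((5, 5), dtype=bool)
mask[2, 2] = True
shift, val = best_shift(rain, mask)
```

Shifting the field one cell west moves the rainfall maximum onto the catchment, which is the offset the search returns.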
van Diedenhoven, Bastiaan; Ackerman, Andrew S.; Fridlind, Ann M.; Cairns, Brian
2017-01-01
The use of ensemble-average values of the aspect ratio and distortion parameter of hexagonal ice prisms for the estimation of ensemble-average scattering asymmetry parameters is evaluated. Using crystal aspect ratios greater than unity generally leads to ensemble-average values of aspect ratio that are inconsistent with the ensemble-average asymmetry parameters. When a definition of aspect ratio is used that limits the aspect ratio to below unity (α≤1) for both hexagonal plates and columns, the effective asymmetry parameters calculated using ensemble-average aspect ratios are generally consistent with ensemble-average asymmetry parameters, especially if the aspect ratios are geometrically averaged. Ensemble-average distortion parameters generally also yield effective asymmetry parameters that are largely consistent with ensemble-average asymmetry parameters. In the case of mixtures of plates and columns, it is recommended to geometrically average the α≤1 aspect ratios and to subsequently calculate the effective asymmetry parameter using a column or plate geometry when the contribution by columns to a given mixture's total projected area is greater or less than 50%, respectively. In addition, we show that ensemble-average aspect ratios, distortion parameters and asymmetry parameters can generally be retrieved accurately from simulated multi-directional polarization measurements based on mixtures of varying columns and plates. However, such retrievals tend to be somewhat biased toward yielding column-like aspect ratios. Furthermore, generally large retrieval errors can occur for mixtures with approximately equal contributions of columns and plates and for ensembles with strong contributions of thin plates. PMID:28983127
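The α≤1 convention and geometric averaging described above amount to the following small illustration (the aspect-ratio values are made up, not from the paper):

```python
import numpy as np

def alpha_below_unity(aspect_ratios):
    """Map each aspect ratio to the alpha <= 1 convention used in the paper:
    columns with a > 1 are represented by 1/a, plates are kept as-is."""
    a = np.asarray(aspect_ratios, dtype=float)
    return np.where(a > 1.0, 1.0 / a, a)

def geometric_mean(x):
    """Geometric average, the recommended way to average alpha <= 1 values."""
    return np.exp(np.mean(np.log(x)))

# Toy mixture: two plates (a <= 1) and two columns (a > 1)
alphas = alpha_below_unity([0.2, 0.5, 2.0, 4.0])   # columns become 0.5 and 0.25
g = geometric_mean(alphas)
```

The geometric mean equals the fourth root of the product of the four α values here, which damps the influence of extreme thin-plate values compared with an arithmetic mean.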
NASA Astrophysics Data System (ADS)
Khajehei, Sepideh; Moradkhani, Hamid
2015-04-01
Producing reliable and accurate hydrologic ensemble forecasts is subject to various sources of uncertainty, including meteorological forcing, initial conditions, model structure, and model parameters. Producing reliable and skillful precipitation ensemble forecasts is one approach to reducing the total uncertainty in hydrological applications. Currently, numerical weather prediction (NWP) models provide ensemble forecasts for various temporal ranges, but their raw products are known to be biased in both mean and spread. There is therefore a need for methods able to generate reliable ensemble forecasts for hydrological applications. One common technique is to apply statistical procedures that generate an ensemble forecast from NWP-generated single-value forecasts, based on the bivariate probability distribution between the observation and the single-value precipitation forecast. However, one of the assumptions of the current method is that Gaussian distributions fit the marginal distributions of the observed and modeled climate variable. Here, we describe and evaluate a Bayesian approach based on copula functions to develop an ensemble precipitation forecast from the conditional distribution of single-value precipitation forecasts. Copula functions express the multivariate joint distribution of univariate marginal distributions and are presented as an alternative procedure for capturing the uncertainties related to meteorological forcing. Copulas are capable of modeling the joint distribution of two variables with any level of correlation and dependency. This study is conducted over a sub-basin in the Columbia River Basin in the USA, using monthly precipitation forecasts from the Climate Forecast System (CFS) at 0.5 x 0.5 degree spatial resolution to reproduce the observations.
The verification is conducted on a separate period, and the procedure is compared with the Ensemble Pre-Processor approach currently used by the National Weather Service River Forecast Centers in the USA.
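In the Gaussian-copula case, the conditional distribution used to generate the ensemble has a closed form: after transforming forecast and observation to standard normal scores, the observation given a forecast value f is N(ρf, 1−ρ²). A minimal sketch under that simplification (ρ and the forecast value are illustrative; a real application would estimate ρ from data and back-transform the samples to precipitation space):

```python
import numpy as np

rng = np.random.default_rng(42)

def copula_ensemble(f_std, rho, n_members=100):
    """Draw an ensemble from the conditional distribution of the (normal-score
    transformed) observation given a single-value forecast, under a Gaussian
    copula with correlation rho."""
    mean = rho * f_std                    # conditional mean: rho * forecast
    sd = np.sqrt(1.0 - rho ** 2)          # conditional spread
    return rng.normal(mean, sd, size=n_members)

# Illustrative: forecast 1.5 standard deviations above normal, rho = 0.8
members = copula_ensemble(f_std=1.5, rho=0.8, n_members=5000)
```

The sampled ensemble is centered at ρf = 1.2 with spread √(1−ρ²) = 0.6, so a skillful forecast (large ρ) yields a sharper conditional ensemble.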
Automatic Estimation of Osteoporotic Fracture Cases by Using Ensemble Learning Approaches.
Kilic, Niyazi; Hosgormez, Erkan
2016-03-01
Ensemble learning methods are among the most powerful tools for pattern classification problems. In this paper, the effects of ensemble learning methods and some physical bone densitometry parameters on osteoporotic fracture detection were investigated. Six feature set models were constructed from different physical parameters and fed into the ensemble classifiers as input features. As ensemble learning techniques, bagging, gradient boosting, and the random subspace method (RSM) were used. Instance-based learning (IBk) and random forest (RF) classifiers were applied to the six feature set models. The patients were classified into three groups, osteoporosis, osteopenia, and control (healthy), using the ensemble classifiers. Total classification accuracy and f-measure were used to evaluate the diagnostic performance of the proposed ensemble classification system. The classification accuracy reached 98.85 % with model 6 (five BMD + five T-score values) using the RSM-RF classifier. The findings of this paper suggest that patients can be warned before a bone fracture occurs, by examining a few physical parameters that can easily be measured without invasive operations.
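The random subspace method trains each base model on a random subset of the features and combines the models by majority vote. A numpy sketch with a nearest-centroid base learner standing in for the paper's IBk/RF learners (data and parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

def rsm_fit(X, y, n_models=25, subspace_frac=0.5):
    """Random subspace method: each base model sees only a random subset of
    features. The base model here is a nearest-centroid classifier, a simple
    stand-in for the paper's base learners."""
    n_feat = X.shape[1]
    k = max(1, int(subspace_frac * n_feat))
    models = []
    for _ in range(n_models):
        feats = rng.choice(n_feat, size=k, replace=False)
        cents = {c: X[y == c][:, feats].mean(axis=0) for c in np.unique(y)}
        models.append((feats, cents))
    return models

def rsm_predict(models, X):
    votes = []
    for feats, cents in models:
        labels = np.array(list(cents))
        d = np.stack([np.linalg.norm(X[:, feats] - cents[c], axis=1)
                      for c in labels])
        votes.append(labels[d.argmin(axis=0)])
    votes = np.stack(votes)
    # majority vote across the base models
    return np.array([np.bincount(col).argmax() for col in votes.T])

# Toy two-class data with four "densitometry" features (made up)
X = np.vstack([np.zeros((10, 4)), 5 * np.ones((10, 4))])
y = np.array([0] * 10 + [1] * 10)
models = rsm_fit(X, y)
```

Because every base model is restricted to a feature subset, the ensemble is robust to individual noisy features, which is the usual motivation for RSM.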
A hybrid variational ensemble data assimilation for the HIgh Resolution Limited Area Model (HIRLAM)
NASA Astrophysics Data System (ADS)
Gustafsson, N.; Bojarova, J.; Vignes, O.
2014-02-01
A hybrid variational ensemble data assimilation has been developed on top of the HIRLAM variational data assimilation. It provides the possibility of applying a flow-dependent background error covariance model during the data assimilation at the same time as full rank characteristics of the variational data assimilation are preserved. The hybrid formulation is based on an augmentation of the assimilation control variable with localised weights to be assigned to a set of ensemble member perturbations (deviations from the ensemble mean). The flow-dependency of the hybrid assimilation is demonstrated in single simulated observation impact studies and the improved performance of the hybrid assimilation in comparison with pure 3-dimensional variational as well as pure ensemble assimilation is also proven in real observation assimilation experiments. The performance of the hybrid assimilation is comparable to the performance of the 4-dimensional variational data assimilation. The sensitivity to various parameters of the hybrid assimilation scheme and the sensitivity to the applied ensemble generation techniques are also examined. In particular, the inclusion of ensemble perturbations with a lagged validity time has been examined with encouraging results.
Uehara, Shota; Tanaka, Shigenori
2017-04-24
Protein flexibility is a major hurdle in current structure-based virtual screening (VS). In spite of the recent advances in high-performance computing, protein-ligand docking methods still demand tremendous computational cost to take into account the full degree of protein flexibility. In this context, ensemble docking has proven its utility and efficiency for VS studies, but it still needs a rational and efficient method to select and/or generate multiple protein conformations. Molecular dynamics (MD) simulations are useful to produce distinct protein conformations without abundant experimental structures. In this study, we present a novel strategy that makes use of cosolvent-based molecular dynamics (CMD) simulations for ensemble docking. By mixing small organic molecules into a solvent, CMD can stimulate dynamic protein motions and induce partial conformational changes of binding pocket residues appropriate for the binding of diverse ligands. The present method has been applied to six diverse target proteins and assessed by VS experiments using many actives and decoys of DEKOIS 2.0. The simulation results have revealed that the CMD is beneficial for ensemble docking. Utilizing cosolvent simulation allows the generation of druggable protein conformations, improving the VS performance compared with the use of a single experimental structure or ensemble docking by standard MD with pure water as the solvent.
Bashir, Saba; Qamar, Usman; Khan, Farhan Hassan
2015-06-01
Conventional clinical decision support systems are based on individual classifiers or simple combinations of classifiers, which tend to show moderate performance. This paper presents a novel classifier ensemble framework based on an enhanced bagging approach with a multi-objective weighted voting scheme for the prediction and analysis of heart disease. The proposed model overcomes the limitations of conventional approaches by utilizing an ensemble of five heterogeneous classifiers: naïve Bayes, linear regression, quadratic discriminant analysis, instance-based learner, and support vector machines. Five different datasets, obtained from publicly available data repositories, are used for experimentation, evaluation, and validation. The effectiveness of the proposed ensemble is investigated by comparing its results with several classifiers. Prediction results of the proposed ensemble model are assessed by tenfold cross-validation and ANOVA statistics. The experimental evaluation shows that the proposed framework handles all types of attributes and achieved a diagnosis accuracy of 84.16 %, sensitivity of 93.29 %, specificity of 96.70 %, and f-measure of 82.15 %. An f-ratio higher than f-critical and a p value less than 0.05 for a 95 % confidence interval indicate that the results are statistically significant for most of the datasets.
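A weighted voting scheme ultimately reduces to scoring each class by the summed weights of the classifiers voting for it. A minimal sketch (the weights and predictions are made up for illustration; the paper derives its weights from multiple objectives):

```python
import numpy as np

def weighted_vote(predictions, weights):
    """Weighted majority voting: each classifier's vote counts in proportion
    to its weight (e.g. derived from validation performance). `predictions`
    is (n_classifiers, n_samples) of integer class labels."""
    predictions = np.asarray(predictions)
    weights = np.asarray(weights, dtype=float)
    n_classes = predictions.max() + 1
    scores = np.zeros((n_classes, predictions.shape[1]))
    for pred, w in zip(predictions, weights):
        scores[pred, np.arange(predictions.shape[1])] += w  # add vote weight
    return scores.argmax(axis=0)

# Three classifiers, four samples; weights reflect each classifier's accuracy
preds = [[1, 0, 1, 1],
         [0, 0, 1, 0],
         [1, 1, 0, 1]]
w = [0.5, 0.3, 0.4]
final = weighted_vote(preds, w)
```

On the last sample, classifiers 1 and 3 (combined weight 0.9) outvote classifier 2 (weight 0.3), so the ensemble predicts class 1 even though the votes split 2-1.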
Improving wave forecasting by integrating ensemble modelling and machine learning
NASA Astrophysics Data System (ADS)
O'Donncha, F.; Zhang, Y.; James, S. C.
2017-12-01
Modern smart-grid networks use technologies to instantly relay information on supply and demand to support effective decision making. Integration of renewable-energy resources with these systems demands accurate forecasting of energy production (and demand) capacities. For wave-energy converters, this requires wave-condition forecasting to enable estimates of energy production. Current operational wave forecasting systems exhibit substantial errors with wave-height RMSEs of 40 to 60 cm being typical, which limits the reliability of energy-generation predictions thereby impeding integration with the distribution grid. In this study, we integrate physics-based models with statistical learning aggregation techniques that combine forecasts from multiple, independent models into a single "best-estimate" prediction of the true state. The Simulating Waves Nearshore physics-based model is used to compute wind- and currents-augmented waves in the Monterey Bay area. Ensembles are developed based on multiple simulations perturbing input data (wave characteristics supplied at the model boundaries and winds) to the model. A learning-aggregation technique uses past observations and past model forecasts to calculate a weight for each model. The aggregated forecasts are compared to observation data to quantify the performance of the model ensemble and aggregation techniques. The appropriately weighted ensemble model outperforms an individual ensemble member with regard to forecasting wave conditions.
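One simple learning-aggregation scheme weights each ensemble member by its inverse mean-squared error against past observations. This is a generic stand-in for the aggregation techniques used in the study, with made-up numbers:

```python
import numpy as np

def aggregate_weights(past_forecasts, past_obs, eps=1e-9):
    """Per-model weights from past performance: inverse mean squared error,
    normalized to sum to one."""
    err = np.mean((past_forecasts - past_obs) ** 2, axis=1)  # MSE per model
    w = 1.0 / (err + eps)
    return w / w.sum()

def aggregate(forecasts, weights):
    """Weighted 'best-estimate' forecast combining the ensemble members."""
    return weights @ forecasts

# Two models forecasting wave height (m); model 0 has been more accurate
past_f = np.array([[1.0, 1.1, 0.9],
                   [1.5, 0.5, 1.4]])
past_o = np.array([1.0, 1.0, 1.0])
w = aggregate_weights(past_f, past_o)
combined = aggregate(np.array([1.2, 2.0]), w)
```

The historically accurate model dominates the weights, so the combined forecast stays close to its prediction rather than the simple ensemble mean of 1.6 m.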
Encoding of Spatial Attention by Primate Prefrontal Cortex Neuronal Ensembles
Treue, Stefan
2018-01-01
Single neurons in the primate lateral prefrontal cortex (LPFC) encode information about the allocation of visual attention and the features of visual stimuli. However, how this compares to the performance of neuronal ensembles at encoding the same information is poorly understood. Here, we recorded the responses of neuronal ensembles in the LPFC of two macaque monkeys while they performed a task that required attending to one of two moving random dot patterns positioned in different hemifields and ignoring the other pattern. We found single units selective for the location of the attended stimulus as well as for its motion direction. To determine the coding of both variables in the population of recorded units, we used a linear classifier and progressively built neuronal ensembles by iteratively adding units according to their individual performance (best single units), or by iteratively adding units based on their contribution to the ensemble performance (best ensemble). For both methods, ensembles of relatively small sizes (n < 60) yielded substantially higher decoding performance relative to individual single units. However, the decoder reached similar performance using fewer neurons with the best ensemble building method compared with the best single units method. Our results indicate that neuronal ensembles within the LPFC encode more information about the attended spatial and nonspatial features of visual stimuli than individual neurons. They further suggest that efficient coding of attention can be achieved by relatively small neuronal ensembles characterized by a certain relationship between signal and noise correlation structures. PMID:29568798
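The "best ensemble" construction is a greedy forward selection: at each step, the unit whose inclusion most improves the ensemble's decoding performance is added. A sketch with synthetic "recordings" and a nearest-centroid readout standing in for the paper's linear classifier (all data and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)

def decode_acc(R_train, y_train, R_test, y_test, units):
    """Accuracy of a nearest-centroid readout restricted to a subset of units."""
    cents = {c: R_train[y_train == c][:, units].mean(axis=0)
             for c in np.unique(y_train)}
    labels = np.array(list(cents))
    d = np.stack([np.linalg.norm(R_test[:, units] - cents[c], axis=1)
                  for c in labels])
    return np.mean(labels[d.argmin(axis=0)] == y_test)

def best_ensemble(R_train, y_train, R_test, y_test, size):
    """'Best ensemble' building: greedily add the unit whose inclusion most
    improves the decoding performance of the ensemble as a whole."""
    chosen, remaining = [], list(range(R_train.shape[1]))
    for _ in range(size):
        scores = [decode_acc(R_train, y_train, R_test, y_test, chosen + [u])
                  for u in remaining]
        chosen.append(remaining.pop(int(np.argmax(scores))))
    return chosen

# Synthetic "recordings": 20 units, only units 0-2 carry the attention signal
n_trials = 200
y = rng.integers(0, 2, n_trials)            # attended hemifield (0 or 1)
R = rng.normal(0, 1, (n_trials, 20))        # baseline firing-rate noise
R[:, :3] += 2.0 * y[:, None]                # informative units
train, test = slice(0, 150), slice(150, None)
sel = best_ensemble(R[train], y[train], R[test], y[test], size=3)
```

Because selection scores whole candidate ensembles rather than units in isolation, redundant units add little and the informative ones are found first, mirroring the paper's finding that "best ensemble" reaches a given performance with fewer units.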
A New Method for Determining Structure Ensemble: Application to a RNA Binding Di-Domain Protein.
Liu, Wei; Zhang, Jingfeng; Fan, Jing-Song; Tria, Giancarlo; Grüber, Gerhard; Yang, Daiwen
2016-05-10
Structure ensemble determination is the basis of understanding the structure-function relationship of a multidomain protein with weak domain-domain interactions. Paramagnetic relaxation enhancement has been proven a powerful tool in the study of structure ensembles, but there exist a number of challenges such as spin-label flexibility, domain dynamics, and overfitting. Here we propose a new (to our knowledge) method to describe structure ensembles using a minimal number of conformers. In this method, individual domains are considered rigid; the position of each spin-label conformer and the structure of each protein conformer are defined by three and six orthogonal parameters, respectively. First, the spin-label ensemble is determined by optimizing the positions and populations of spin-label conformers against intradomain paramagnetic relaxation enhancements with a genetic algorithm. Subsequently, the protein structure ensemble is optimized using a more efficient genetic algorithm-based approach and an overfitting indicator, both of which were established in this work. The method was validated using a reference ensemble with a set of conformers whose populations and structures are known. This method was also applied to study the structure ensemble of the tandem di-domain of a poly (U) binding protein. The determined ensemble was supported by small-angle x-ray scattering and nuclear magnetic resonance relaxation data. The ensemble obtained suggests an induced fit mechanism for recognition of target RNA by the protein.
NASA Astrophysics Data System (ADS)
Brekke, L. D.; Prairie, J.; Pruitt, T.; Rajagopalan, B.; Woodhouse, C.
2008-12-01
Water resources adaptation planning under climate change involves making assumptions about probabilistic water supply conditions, which are linked to a given climate context (e.g., instrument records, paleoclimate indicators, projected climate data, or a blend of these). Methods have been demonstrated to associate water supply assumptions with any of these climate information types. Additionally, demonstrations have been offered that represent these information types in a scenario-rich (ensemble) planning framework, either via ensembles (e.g., a survey of many climate projections) or stochastic modeling (e.g., based on instrument records or paleoclimate indicators). If the planning goal involves using a hydrologic ensemble that jointly reflects paleoclimate (e.g., lower-frequency variations) and projected climate information (e.g., monthly to annual trends), methods are required to guide how these information types might be translated into water supply assumptions. However, even if such a method exists, there is a lack of understanding of how such a hydrologic ensemble might differ from ensembles developed relative to paleoclimate or projected climate information alone. This research explores two questions: (1) how might paleoclimate and projected climate information be blended into a planning hydrologic ensemble, and (2) how does a planning hydrologic ensemble differ when associated with the individual climate information types (i.e., instrument records, paleoclimate, projected climate, or a blend of the latter two). Case study basins include the Gunnison River Basin in Colorado and the Missouri River Basin above Toston in Montana. The presentation will highlight ensemble development methods by information type and a comparison of ensemble results.
An information-theoretical perspective on weighted ensemble forecasts
NASA Astrophysics Data System (ADS)
Weijs, Steven V.; van de Giesen, Nick
2013-08-01
This paper presents an information-theoretical method for weighting ensemble forecasts with new information. Weighted ensemble forecasts can be used to adjust the distribution that an existing ensemble of time series represents, without modifying the values in the ensemble itself. The weighting can, for example, add new seasonal forecast information to an existing ensemble of historically measured time series that represents climatic uncertainty. A recent article in this journal compared several methods to determine the weights for the ensemble members and introduced the pdf-ratio method. In this article, a new method, the minimum relative entropy update (MRE-update), is presented. Based on the principle of minimum discrimination information, an extension of the principle of maximum entropy (POME), the method ensures that no more information is added to the ensemble than is present in the forecast. This is achieved by minimizing relative entropy, with the forecast information imposed as constraints. From this same perspective, an information-theoretical view on the various weighting methods is presented. The MRE-update is compared with the existing methods and the parallels with the pdf-ratio method are analysed. The paper provides a new, information-theoretical justification for one version of the pdf-ratio method that turns out to be equivalent to the MRE-update. All other methods result in sets of ensemble weights that, seen from the information-theoretical perspective, add either too little or too much (i.e. fictitious) information to the ensemble.
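With a single mean constraint, the minimum relative entropy solution takes the exponential-tilting form w_i ∝ exp(λx_i), with λ chosen so the weighted ensemble mean matches the forecast value. A sketch under that simplified setting (the paper's method handles more general forecast constraints; the data are illustrative):

```python
import numpy as np

def mre_weights(x, target_mean, tol=1e-10):
    """Minimum relative entropy update for one mean constraint: find the
    member weights closest (in KL divergence) to uniform whose weighted mean
    equals the forecast. The solution is w_i proportional to exp(lam * x_i);
    lam is found by bisection on the monotone tilted mean."""
    x = np.asarray(x, dtype=float)

    def tilted_mean(lam):
        w = np.exp(lam * (x - x.mean()))      # center for numerical stability
        w /= w.sum()
        return w, w @ x

    lo = -50.0 / (x.std() + 1e-12)
    hi = 50.0 / (x.std() + 1e-12)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        w, m = tilted_mean(mid)
        if abs(m - target_mean) < tol:
            break
        if m < target_mean:
            lo = mid
        else:
            hi = mid
    return w

# Historical ensemble of seasonal values; the forecast mean should be 3.5
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
w = mre_weights(x, 3.5)
```

The weights tilt monotonically toward the larger members, adding exactly the information needed to satisfy the mean constraint and no more.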
Design of an Evolutionary Approach for Intrusion Detection
2013-01-01
A novel evolutionary approach is proposed for effective intrusion detection based on benchmark datasets. The approach generates a pool of noninferior individual solutions and ensemble solutions thereof, and the generated ensembles can be used to detect intrusions accurately. For the intrusion detection problem, the approach considers conflicting objectives simultaneously, such as the detection rate of each attack class, error rate, accuracy, and diversity, and produces solutions with optimized trade-offs among these objectives. A three-phase approach is proposed. In the first phase, a Pareto front of noninferior individual solutions is approximated using a simple chromosome design. In the second phase, the solution set is refined to determine effective ensemble solutions considering solution interaction, approximating an improved Pareto front of ensemble solutions over that of individual solutions; the ensemble solutions in this improved Pareto front yield improved detection results on the benchmark datasets. In the third phase, a combination method such as majority voting is used to fuse the predictions of the individual solutions into the prediction of an ensemble solution. Benchmark datasets, namely the KDD Cup 1999 and ISCX 2012 datasets, are used to demonstrate and validate the performance of the proposed approach for intrusion detection. The approach discovers individual solutions and ensembles thereof with good support and detection rates from the benchmark datasets (in comparison with well-known ensemble methods like bagging and boosting).
In addition, the proposed approach is a generalized classification approach applicable to any field with multiple conflicting objectives whose dataset can be represented as labelled instances in terms of its features. PMID:24376390
NASA Astrophysics Data System (ADS)
Chardon, J.; Mathevet, T.; Le Lay, M.; Gailhard, J.
2012-04-01
In the context of a national energy company (EDF: Électricité de France), hydro-meteorological forecasts are necessary to ensure the safety and security of installations, meet environmental standards, and improve water resources management and decision making. Hydrological ensemble forecasts allow a better representation of meteorological and hydrological forecast uncertainties and improve the human expertise of hydrological forecasting, which is essential to synthesize the available information coming from different meteorological and hydrological models and from human experience. An operational hydrological ensemble forecasting chain has been developed at EDF since 2008 and has been used since 2010 on more than 30 watersheds in France. This forecasting chain is characterized by ensemble pre-processing (rainfall and temperature) and post-processing (streamflow), where a large amount of human expertise is solicited. The aim of this paper is to compare two hydrological ensemble post-processing methods developed at EDF in order to improve ensemble forecast reliability (similar to Montanari & Brath, 2004; Schaefli et al., 2007). The aim of the post-processing methods is to dress hydrological ensemble forecasts with hydrological model uncertainties, based on perfect forecasts. The first method (called the empirical approach) is based on a statistical modelling of the empirical error of perfect forecasts, using streamflow sub-samples by quantile class and lead time. The second method (called the dynamical approach) is based on streamflow sub-samples by quantile class, streamflow variation, and lead time. On a set of 20 watersheds used for operational forecasts, results show that both approaches are necessary to ensure a good post-processing of the hydrological ensemble, allowing a good improvement of the reliability, skill and sharpness of ensemble forecasts.
The comparison of the empirical and dynamical approaches shows the limits of the empirical approach, which is not able to take into account hydrological dynamics and processes, i.e. sample heterogeneity. The same streamflow range corresponds to different processes, such as rising limbs or recessions, where uncertainties differ. The dynamical approach improves the reliability, skill and sharpness of forecasts and globally reduces confidence interval width. When compared in detail, the dynamical approach allows a noticeable reduction of confidence intervals during recessions, where uncertainty is relatively lower, and a slight increase of confidence intervals during rising limbs or snowmelt, where uncertainty is greater. The dynamical approach, validated by forecasters' experience, which considered the empirical approach not discriminative enough, improved forecasters' confidence and the communication of uncertainties. Montanari, A. and Brath, A. (2004). A stochastic approach for assessing the uncertainty of rainfall-runoff simulations. Water Resources Research, 40, W01106, doi:10.1029/2003WR002540. Schaefli, B., Balin Talamba, D. and Musy, A. (2007). Quantifying hydrological modeling errors through a mixture of normal distributions. Journal of Hydrology, 332, 303-315.
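The empirical approach can be sketched as pooling past model errors into streamflow quantile classes and dressing a new deterministic forecast with the error sample of its class. An illustrative sketch, not EDF's operational code (multiplicative errors and class counts are assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)

def build_error_classes(sim, obs, n_classes=3):
    """Empirical post-processing setup: pool multiplicative errors of past
    'perfect' forecasts into quantile classes of simulated streamflow."""
    sim, obs = np.asarray(sim, float), np.asarray(obs, float)
    edges = np.quantile(sim, np.linspace(0, 1, n_classes + 1))
    cls = np.clip(np.searchsorted(edges, sim, side="right") - 1, 0, n_classes - 1)
    errors = obs / sim                        # multiplicative error sample
    return edges, [errors[cls == k] for k in range(n_classes)]

def dress_forecast(q, edges, err_classes, n_members=100):
    """Dress a deterministic forecast with the error sample of its class."""
    k = np.clip(np.searchsorted(edges, q, side="right") - 1,
                0, len(err_classes) - 1)
    return q * rng.choice(err_classes[k], size=n_members, replace=True)

# Toy training record where the model was perfect (errors all equal to 1)
sim = np.linspace(1, 100, 50)
obs = sim.copy()
edges, err_classes = build_error_classes(sim, obs)
members = dress_forecast(10.0, edges, err_classes, n_members=20)
```

The dynamical approach would additionally condition the error classes on streamflow variation (rising limb versus recession), which this sketch omits.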
The suitability of remotely sensed soil moisture for improving operational flood forecasting
NASA Astrophysics Data System (ADS)
Wanders, N.; Karssenberg, D.; de Roo, A.; de Jong, S. M.; Bierkens, M. F. P.
2014-06-01
We evaluate the added value of assimilated remotely sensed soil moisture for the European Flood Awareness System (EFAS) and its potential to improve the prediction of the timing and height of the flood peak and of low flows. EFAS is an operational flood forecasting system for Europe and uses a distributed hydrological model (LISFLOOD) for flood predictions with lead times of up to 10 days. For this study, satellite-derived soil moisture from ASCAT (Advanced SCATterometer), AMSR-E (Advanced Microwave Scanning Radiometer - Earth Observing System) and SMOS (Soil Moisture and Ocean Salinity) is assimilated into the LISFLOOD model for the Upper Danube Basin and results are compared to assimilation of discharge observations only. To assimilate soil moisture and discharge data into the hydrological model, an ensemble Kalman filter (EnKF) is used. Information on the spatial (cross-)correlation of the errors in the satellite products is included to ensure increased performance of the EnKF. For validation, additional discharge observations not used in the EnKF serve as an independent validation data set. Our results show that the accuracy of flood forecasts increases when more discharge observations are assimilated; the mean absolute error (MAE) of the ensemble mean is reduced by 35%. The additional inclusion of satellite data results in a further increase in performance: forecasts of baseflow are better and the uncertainty in the overall discharge is reduced, as shown by a 10% reduction in the MAE. In addition, floods are predicted with higher accuracy and the continuous ranked probability score (CRPS) shows a performance increase of 5-10% on average, compared to assimilation of discharge only. When soil moisture data are used, timing errors in the flood predictions decrease, especially for shorter lead times, and imminent floods can be forecast with more skill. 
The number of false flood alerts is reduced when more observational data are assimilated into the system. The added value of the satellite data is largest when these observations are assimilated in combination with distributed discharge observations. These results show the potential of remotely sensed soil moisture observations to improve near real-time flood forecasting in large catchments.
Human resource recommendation algorithm based on ensemble learning and Spark
NASA Astrophysics Data System (ADS)
Cong, Zihan; Zhang, Xingming; Wang, Haoxiang; Xu, Hongjie
2017-08-01
Aiming at the problem of “information overload” in the human resources industry, this paper proposes a human resource recommendation algorithm based on ensemble learning. The algorithm considers the characteristics and behaviours of both job seekers and jobs in a real business circumstance. First, the algorithm uses two ensemble learning methods, Bagging and Boosting. The outputs from both learning methods are then merged to form a user interest model, from which job recommendations can be extracted for users. The algorithm is implemented as a parallelized recommendation system on Spark. A set of experiments has been run and analysed. The proposed algorithm achieves significant improvements in accuracy, recall rate and coverage compared with recommendation algorithms such as UserCF and ItemCF.
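As an illustration of the merging step described above, here is a minimal sketch of combining the outputs of a bagging and a boosting sub-ensemble into a single interest model (all scores, the 0.5/0.5 weighting and the top-3 cut-off are hypothetical, not taken from the paper):

```python
import numpy as np

# Hypothetical relevance scores for 6 jobs from the two sub-ensembles
bagging_scores = np.array([0.9, 0.2, 0.6, 0.4, 0.8, 0.1])
boosting_scores = np.array([0.7, 0.3, 0.9, 0.2, 0.6, 0.2])

def merge_interest(bag, boost, w_bag=0.5):
    # Weighted merge of the two sub-ensemble outputs into one interest model
    return w_bag * bag + (1.0 - w_bag) * boost

interest = merge_interest(bagging_scores, boosting_scores)
top3 = np.argsort(interest)[::-1][:3]  # recommend the 3 highest-scoring jobs
```

In the paper's setting these score vectors would come from the trained Bagging and Boosting models and the merge would run inside a parallelized Spark job; the sketch only shows the combination logic.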
Interactions between deep convective clouds and aerosols as observed by satellites
NASA Astrophysics Data System (ADS)
Yuan, T.; Li, Z. I.; Remer, L.; Martins, V.
2008-12-01
Major uncertainties regarding interactions between deep convective clouds (DCC) and aerosols exist, due partly to observational difficulty and partly to the entanglement among remotely sensed properties of aerosols and clouds, and between meteorology and possible aerosol signals. In this study we adopt a novel, physically sound relationship between cloud ice crystal effective radius (CER) and brightness temperature (BT) and utilize the ample sampling opportunity provided by the MODIS instrument. We reveal aerosol impacts on DCCs by analyzing an ensemble data set. Through a conceptual model we demonstrate how aerosols may affect DCC properties, and we outline a few scenarios where aerosol signals are best separated and most pronounced. Based on our results, anthropogenic pollution and smoke are shown to effectively decrease CER and to elevate the glaciation level of DCCs. Dust particles from local sources, on the other hand, have the opposite effects, namely increasing cloud ice particle size and enhancing glaciation, possibly by acting as giant CCN or IN. Implications of these aerosol effects are discussed along with their feedbacks on cloud dynamics.
Disentangling climatic and anthropogenic controls on global terrestrial evapotranspiration trends
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mao, Jiafu; Shi, Xiaoying; Ricciuto, Daniel M.
Here, we examined natural and anthropogenic controls on terrestrial evapotranspiration (ET) changes from 1982-2010 using multiple estimates from remote sensing-based datasets and process-oriented land surface models. A significant increasing trend of ET in each hemisphere was consistently revealed by observationally constrained data and multi-model ensembles that considered historic natural and anthropogenic drivers. The climate impacts were simulated to determine the spatiotemporal variations in ET. Globally, rising CO2 ranked second in these models after the predominant climatic influences, and yielded a decreasing trend in canopy transpiration and ET, especially for tropical forests and high-latitude shrubland. Increased nitrogen deposition slightly amplified global ET via enhanced plant growth. Land-use-induced ET responses, albeit with substantial uncertainties across the factorial analysis, were minor globally, but pronounced locally, particularly over regions with intensive land-cover changes. Our study highlights the importance of employing multi-stream ET and ET-component estimates to quantify the strengthening anthropogenic fingerprint in the global hydrologic cycle.
A Canonical Ensemble Correlation Prediction Model for Seasonal Precipitation Anomaly
NASA Technical Reports Server (NTRS)
Shen, Samuel S. P.; Lau, William K. M.; Kim, Kyu-Myong; Li, Guilong
2001-01-01
This report describes an optimal ensemble forecasting model for seasonal precipitation and its error estimation. Each individual forecast is based on canonical correlation analysis (CCA) in spectral spaces whose bases are empirical orthogonal functions (EOFs). The optimal weights in the ensemble forecast depend crucially on the mean square error of each individual forecast. An estimate of the mean square error of a CCA prediction is also made using the spectral method. The error is decomposed onto the EOFs of the predictand and decreases linearly with the correlation between the predictor and predictand. This new CCA model includes the following features: (1) the use of an area factor, (2) the estimation of prediction error, and (3) the optimal ensemble of multiple forecasts. The new CCA model is applied to seasonal forecasting of the United States precipitation field. The predictor is the sea surface temperature.
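The optimal-weighting idea can be made concrete with a small sketch: if each member's mean square error is known, weights inversely proportional to the MSE (a standard result for independent, unbiased forecasts; all numbers below are invented) combine the members and reduce the error of the ensemble mean:

```python
import numpy as np

# Hypothetical mean square errors of three individual CCA forecasts
mse = np.array([0.5, 1.0, 2.0])

# Optimal linear weights are proportional to the inverse MSE (assuming
# independent, unbiased forecast errors), normalized to sum to one
w = (1.0 / mse) / np.sum(1.0 / mse)

forecasts = np.array([1.2, 0.8, 1.5])     # anomaly predicted by each member
ensemble_forecast = np.dot(w, forecasts)  # weighted ensemble forecast

# Error variance of the combined forecast: 1 / sum(1/mse_i), never
# larger than the best individual member's MSE under these assumptions
ens_mse = 1.0 / np.sum(1.0 / mse)
```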
Ensemble Deep Learning for Biomedical Time Series Classification
2016-01-01
Ensemble learning has been proven to improve generalization ability effectively in both theory and practice. In this paper, we first briefly outline the current status of research on ensemble learning. Then, a new deep neural network-based ensemble method that integrates filtering views, local views, distorted views, explicit training, implicit training, subview prediction, and Simple Average is proposed for biomedical time series classification. Finally, we validate its effectiveness on the Chinese Cardiovascular Disease Database, which contains a large number of electrocardiogram recordings. The experimental results show that the proposed method has certain advantages compared to well-known ensemble methods such as Bagging and AdaBoost. PMID:27725828
Device and Method for Gathering Ensemble Data Sets
NASA Technical Reports Server (NTRS)
Racette, Paul E. (Inventor)
2014-01-01
An ensemble detector uses calibrated noise references to produce ensemble sets of data from which properties of non-stationary processes may be extracted. The ensemble detector comprises: a receiver; a switching device coupled to the receiver, the switching device configured to selectively connect each of a plurality of reference noise signals to the receiver; and a gain modulation circuit coupled to the receiver and configured to vary a gain of the receiver based on a forcing signal; whereby the switching device selectively connects each of the plurality of reference noise signals to the receiver to produce an output signal derived from the plurality of reference noise signals and the forcing signal.
Remote Linkages to Anomalous Winter Atmospheric Ridging over the Northeastern Pacific
NASA Technical Reports Server (NTRS)
Swain, Daniel L.; Singh, Deepti; Horton, Daniel E.; Mankin, Justin S.; Ballard, Tristan C.; Diffenbaugh, Noah S.
2017-01-01
Severe drought in California between 2013 and 2016 has been linked to the multiyear persistence of anomalously high atmospheric pressure over the northeastern Pacific Ocean, which deflected the Pacific storm track northward and suppressed regional precipitation during California's winter 'rainy season.' Multiple hypotheses have emerged regarding why this high pressure ridge near the west coast of North America was so resilient, including unusual sea surface temperature patterns in the Pacific Ocean, reductions in Arctic sea ice, random atmospheric variability, or some combination thereof. Here we explore relationships between previously documented atmospheric conditions over the North Pacific and several potential remote oceanic and cryospheric influences using both observational data and a large ensemble of climate model simulations. Our results suggest that persistent wintertime atmospheric ridging similar to that implicated in California's 2013-2016 drought can at least partially be linked to unusual Pacific sea surface temperatures, and that Pacific Ocean conditions may offer some degree of cool-season foresight in this region despite the presence of substantial internal variability.
A Particle Batch Smoother Approach to Snow Water Equivalent Estimation
NASA Technical Reports Server (NTRS)
Margulis, Steven A.; Girotto, Manuela; Cortes, Gonzalo; Durand, Michael
2015-01-01
This paper presents a newly proposed data assimilation method for historical snow water equivalent (SWE) estimation using remotely sensed fractional snow-covered area (fSCA). The newly proposed approach consists of a particle batch smoother (PBS), which is compared to a previously applied Kalman-based ensemble batch smoother (EnBS) approach. The methods were applied over the 27-yr Landsat 5 record at snow pillow and snow course in situ verification sites in the American River basin in the Sierra Nevada (United States). This basin is more densely vegetated and thus more challenging for SWE estimation than the previous applications of the EnBS. Both data assimilation methods provided significant improvement over the prior (modeling only) estimates, with both able to significantly reduce prior SWE biases. The prior RMSE values at the snow pillow and snow course sites were reduced by 68%-82% and 60%-68%, respectively, when applying the data assimilation methods. This result is encouraging for a basin like the American, where the moderate to high forest cover will necessarily obscure more of the snow-covered ground surface than in previously examined, less-vegetated basins. The PBS generally outperformed the EnBS: for snow pillows the PBS RMSE was approximately 54% of that seen in the EnBS, while for snow courses the PBS RMSE was approximately 79% of the EnBS value. Sensitivity tests show that both the PBS and EnBS results are relatively insensitive to ensemble size and fSCA measurement error, but that the EnBS is more sensitive to the mean prior precipitation input, especially where significant prior biases exist.
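A minimal sketch of the PBS update step, assuming Gaussian fSCA measurement errors and importance weights computed jointly over the whole batch of observations (all numbers and the toy forward predictions are hypothetical, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical prior ensemble: 100 particles, 5 fSCA observations each
n_particles, n_obs = 100, 5
predicted_fsca = rng.uniform(0.0, 1.0, size=(n_particles, n_obs))
observed_fsca = np.array([0.8, 0.7, 0.5, 0.3, 0.1])
obs_var = 0.1 ** 2  # assumed fSCA measurement-error variance

# PBS: weight each particle by the joint likelihood of all obs in the batch
resid = predicted_fsca - observed_fsca
log_w = -0.5 * np.sum(resid ** 2, axis=1) / obs_var
w = np.exp(log_w - log_w.max())  # subtract max for numerical stability
w /= w.sum()

# Posterior SWE estimate: weighted mean over (hypothetical) particle SWE values
swe_particles = rng.uniform(100.0, 400.0, size=n_particles)
swe_posterior = np.sum(w * swe_particles)
```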
Metal Oxide Gas Sensor Drift Compensation Using a Two-Dimensional Classifier Ensemble
Liu, Hang; Chu, Renzhi; Tang, Zhenan
2015-01-01
Sensor drift is the most challenging problem in gas sensing at present. We propose a novel two-dimensional classifier ensemble strategy to solve the gas discrimination problem, regardless of the gas concentration, with high accuracy over extended periods of time. This strategy is appropriate for multi-class classifiers that consist of combinations of pairwise classifiers, such as support vector machines. We compare the performance of the strategy with those of competing methods in an experiment based on a public dataset that was compiled over a period of three years. The experimental results demonstrate that the two-dimensional ensemble outperforms the other methods considered. Furthermore, we propose a pre-aging process inspired by that applied to the sensors to improve the stability of the classifier ensemble. The experimental results demonstrate that the weight of each multi-class classifier model in the ensemble remains fairly static before and after the addition of new classifier models to the ensemble, when a pre-aging procedure is applied. PMID:25942640
Erdmann, Thorsten; Bartelheimer, Kathrin; Schwarz, Ulrich S
2016-11-01
Based on a detailed crossbridge model for individual myosin II motors, we systematically study the influence of mechanical load and adenosine triphosphate (ATP) concentration on small myosin II ensembles made from different isoforms. For skeletal and smooth muscle myosin II, which are often used in actomyosin gels that reconstitute cell contractility, fast forward movement is restricted to a small region of phase space with low mechanical load and high ATP concentration, which is also characterized by frequent ensemble detachment. At high load, these ensembles are stalled or move backwards, but forward motion can be restored by decreasing ATP concentration. In contrast, small ensembles of nonmuscle myosin II isoforms, which are found in the cytoskeleton of nonmuscle cells, are hardly affected by ATP concentration due to the slow kinetics of the bound states. For all isoforms, the thermodynamic efficiency of ensemble movement increases with decreasing ATP concentration, but this effect is weaker for the nonmuscle myosin II isoforms.
Similarity Measures for Protein Ensembles
Lindorff-Larsen, Kresten; Ferkinghoff-Borg, Jesper
2009-01-01
Analyses of similarities and changes in protein conformation can provide important information regarding protein function and evolution. Many scores, including the commonly used root mean square deviation, have therefore been developed to quantify the similarities of different protein conformations. However, instead of examining individual conformations it is in many cases more relevant to analyse ensembles of conformations that have been obtained either through experiments or from methods such as molecular dynamics simulations. We here present three approaches that can be used to compare conformational ensembles in the same way as the root mean square deviation is used to compare individual pairs of structures. The methods are based on the estimation of the probability distributions underlying the ensembles and subsequent comparison of these distributions. We first validate the methods using a synthetic example from molecular dynamics simulations. We then apply the algorithms to revisit the problem of ensemble averaging during structure determination of proteins, and find that an ensemble refinement method is able to recover the correct distribution of conformations better than standard single-molecule refinement. PMID:19145244
Ensemble habitat mapping of invasive plant species
Stohlgren, T.J.; Ma, P.; Kumar, S.; Rocca, M.; Morisette, J.T.; Jarnevich, C.S.; Benson, N.
2010-01-01
Ensemble species distribution models combine the strengths of several species environmental matching models, while minimizing the weakness of any one model. Ensemble models may be particularly useful in risk analysis of recently arrived, harmful invasive species because species may not yet have spread to all suitable habitats, leaving species-environment relationships difficult to determine. We tested five individual models (logistic regression, boosted regression trees, random forest, multivariate adaptive regression splines (MARS), and maximum entropy model or Maxent) and ensemble modeling for selected nonnative plant species in Yellowstone and Grand Teton National Parks, Wyoming; Sequoia and Kings Canyon National Parks, California; and areas of interior Alaska. The models are based on field data provided by the park staffs, combined with topographic, climatic, and vegetation predictors derived from satellite data. For the four invasive plant species tested, ensemble models were the only models that ranked in the top three models for both field validation and test data. Ensemble models may be more robust than individual species-environment matching models for risk analysis. © 2010 Society for Risk Analysis.
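A toy sketch of the ensemble step, assuming each of the five fitted models returns a habitat-suitability probability per location (the numbers below are invented); the ensemble is taken as the mean prediction, with the inter-model spread and a majority vote as simple agreement diagnostics:

```python
import numpy as np

# Hypothetical suitability predictions (0-1) for 4 locations from 5 models:
# logistic regression, BRT, random forest, MARS and Maxent
predictions = np.array([
    [0.9, 0.8, 0.85, 0.7, 0.95],   # location 1
    [0.2, 0.1, 0.15, 0.3, 0.05],   # location 2
    [0.6, 0.7, 0.40, 0.5, 0.55],   # location 3
    [0.4, 0.3, 0.50, 0.2, 0.45],   # location 4
])

# Simple ensemble: mean suitability, inter-model spread as an uncertainty
# indicator, and a majority vote at a 0.5 suitability threshold
ensemble_mean = predictions.mean(axis=1)
ensemble_spread = predictions.std(axis=1)
votes = (predictions > 0.5).sum(axis=1)
suitable = votes >= 3  # at least 3 of 5 models predict suitable habitat
```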
A multiphysical ensemble system of numerical snow modelling
NASA Astrophysics Data System (ADS)
Lafaysse, Matthieu; Cluzet, Bertrand; Dumont, Marie; Lejeune, Yves; Vionnet, Vincent; Morin, Samuel
2017-05-01
Physically based multilayer snowpack models suffer from various modelling errors. To represent these errors, we built the new multiphysical ensemble system ESCROC (Ensemble System Crocus) by implementing new representations of different physical processes in the deterministic coupled multilayer ground/snowpack model SURFEX/ISBA/Crocus. This ensemble was driven and evaluated at Col de Porte (1325 m a.s.l., French Alps) over 18 years with a high-quality meteorological and snow data set. A total of 7776 simulations were evaluated separately, accounting for the uncertainties in the evaluation data. The ability of the ensemble to capture the uncertainty associated with modelling errors is assessed for snow depth, snow water equivalent, bulk density, albedo and surface temperature. Different sub-ensembles of the ESCROC system were studied with probabilistic tools to compare their performance. Results show that optimal members of the ESCROC system are able to explain more than half of the total simulation errors. Integrating members with biases exceeding the range corresponding to observational uncertainty is necessary to obtain optimal dispersion, but this may also be a consequence of the fact that meteorological forcing uncertainties were not accounted for. The ESCROC system opens the way to the integration of numerical snow-modelling errors in ensemble forecasting and ensemble assimilation systems, in support of avalanche hazard forecasting and other snowpack-modelling applications.
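Ensembles like ESCROC are typically scored with probabilistic tools such as the continuous ranked probability score (CRPS); a minimal sample-based CRPS estimator (the ensemble values below are hypothetical snow depths, not ESCROC output) looks like:

```python
import numpy as np

# Sample-based CRPS estimator: mean absolute error of the members against
# the observation, minus half the mean absolute inter-member difference
def crps(ensemble, obs):
    ensemble = np.asarray(ensemble, dtype=float)
    term1 = np.mean(np.abs(ensemble - obs))
    term2 = 0.5 * np.mean(np.abs(ensemble[:, None] - ensemble[None, :]))
    return term1 - term2

members = np.array([0.8, 1.0, 1.1, 1.3])  # simulated snow depths (m)
observed = 1.05
score = crps(members, observed)  # lower is better; 0 for a perfect forecast
```

This is the common "energy form" of the estimator; unbiased variants divide the second term by m(m-1) instead of m².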
Improving database enrichment through ensemble docking
NASA Astrophysics Data System (ADS)
Rao, Shashidhar; Sanschagrin, Paul C.; Greenwood, Jeremy R.; Repasky, Matthew P.; Sherman, Woody; Farid, Ramy
2008-09-01
While it may seem intuitive that using an ensemble of multiple conformations of a receptor in structure-based virtual screening experiments would necessarily yield improved enrichment of actives relative to using just a single receptor, it turns out that at least in the p38 MAP kinase model system studied here, a very large majority of all possible ensembles do not yield improved enrichment of actives. However, there are combinations of receptor structures that do lead to improved enrichment results. We present here a method to select the ensembles that produce the best enrichments that does not rely on knowledge of active compounds or sophisticated analyses of the 3D receptor structures. In the system studied here, the small fraction of ensembles of up to 3 receptors that do yield good enrichments of actives were identified by selecting ensembles that have the best mean GlideScore for the top 1% of the docked ligands in a database screen of actives and drug-like "decoy" ligands. Ensembles of two receptors identified using this mean GlideScore metric generally outperform single receptors, while ensembles of three receptors identified using this metric consistently give optimal enrichment factors in which, for example, 40% of the known actives outrank all the other ligands in the database.
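The selection metric can be sketched as follows, with synthetic scores standing in for Glide docking results (the score distribution, ensemble size and database size are all hypothetical):

```python
from itertools import combinations

import numpy as np

rng = np.random.default_rng(1)
n_ligands = 1000

# Hypothetical docking scores (more negative = better) for 4 receptor
# conformations x 1000 database ligands
scores = rng.normal(-6.0, 1.5, size=(4, n_ligands))

def mean_top1pct(receptor_ids):
    # Ensemble docking: each ligand keeps its best (lowest) score over the
    # members; the ensemble is ranked by the mean score of its top 1% of ligands
    ens_scores = scores[list(receptor_ids)].min(axis=0)
    top = np.sort(ens_scores)[: n_ligands // 100]
    return top.mean()

# Pick the 2-receptor ensemble with the best (lowest) mean top-1% score
best_pair = min(combinations(range(4), 2), key=mean_top1pct)
```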
NASA Astrophysics Data System (ADS)
Siripatana, Adil; Mayo, Talea; Sraj, Ihab; Knio, Omar; Dawson, Clint; Le Maitre, Olivier; Hoteit, Ibrahim
2017-08-01
Bayesian estimation/inversion is commonly used to quantify and reduce modeling uncertainties in coastal ocean models, especially in the framework of parameter estimation. Based on Bayes' rule, the posterior probability distribution function (pdf) of the estimated quantities is obtained conditioned on the available data. It can be computed either directly, using a Markov chain Monte Carlo (MCMC) approach, or by sequentially processing the data following a data assimilation approach, which is heavily exploited in large-dimensional state estimation problems. The advantage of data assimilation schemes over MCMC-type methods arises from their ability to algorithmically accommodate a large number of uncertain quantities without a significant increase in computational requirements. However, only approximate estimates are generally obtained by this approach, due to the restrictive Gaussian prior and noise assumptions generally imposed in these methods. This contribution aims at evaluating the effectiveness of an ensemble Kalman-based data assimilation method for parameter estimation of a coastal ocean model against an MCMC polynomial chaos (PC)-based scheme. We focus on quantifying the uncertainties of a coastal ocean ADvanced CIRCulation (ADCIRC) model with respect to the Manning's n coefficients. Based on a realistic framework of observing system simulation experiments (OSSEs), we apply an ensemble Kalman filter and the MCMC method employing a surrogate of ADCIRC, constructed by a non-intrusive PC expansion, for evaluating the likelihood, and test both approaches under identical scenarios. We study the sensitivity of the estimated posteriors with respect to the parameters of the inference methods, including ensemble size, inflation factor, and PC order. 
A full analysis of both methods, in the context of coastal ocean modeling, suggests that an ensemble Kalman filter with an appropriate ensemble size and well-tuned inflation provides reliable mean estimates and uncertainties of Manning's n coefficients when compared to the full posterior distributions inferred by MCMC.
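A toy illustration of the EnKF parameter update with multiplicative inflation, using a linear one-parameter stand-in for the ADCIRC forward model (every number and the forward model itself are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)

# Estimate a single Manning's n from one water-level observation
n_ens = 50
true_n = 0.03

def forward(n):
    # Toy surrogate: water level as a linear function of Manning's n
    return 2.0 + 40.0 * n

prior = rng.normal(0.04, 0.01, size=n_ens)  # prior parameter ensemble
obs = forward(true_n)                       # synthetic observation
obs_var = 0.05 ** 2
inflation = 1.05

# Multiplicative inflation of the prior anomalies, as tuned in such filters
prior = prior.mean() + inflation * (prior - prior.mean())

# EnKF analysis step with perturbed observations
hx = forward(prior)
gain = np.cov(prior, hx)[0, 1] / (np.var(hx, ddof=1) + obs_var)
obs_pert = obs + rng.normal(0.0, np.sqrt(obs_var), size=n_ens)
posterior = prior + gain * (obs_pert - hx)
```

The posterior ensemble mean moves toward the true parameter and the ensemble spread shrinks, mirroring the behaviour reported for the well-tuned filter.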
A comparison of ensemble post-processing approaches that preserve correlation structures
NASA Astrophysics Data System (ADS)
Schefzik, Roman; Van Schaeybroeck, Bert; Vannitsem, Stéphane
2016-04-01
Although ensemble forecasts address the major sources of uncertainty, they exhibit biases and dispersion errors and are therefore known to benefit from calibration or statistical post-processing. For instance, the ensemble model output statistics (EMOS) method, also known as the non-homogeneous regression approach (Gneiting et al., 2005), is known to strongly improve forecast skill. EMOS is based on fitting and adjusting a parametric probability density function (PDF). However, EMOS and other common post-processing approaches apply to a single weather quantity at a single location for a single look-ahead time. They are therefore unable to take spatial, inter-variable and temporal dependence structures into account. Recently, many research efforts have been invested in designing post-processing methods that resolve this drawback, and also in verification methods that enable the detection of dependence structures. New verification methods are applied here to two classes of post-processing methods, both generating physically coherent ensembles. The first class uses ensemble copula coupling (ECC), which starts from EMOS but adjusts the rank structure (Schefzik et al., 2013). The second class is a member-by-member post-processing (MBM) approach that maps each raw ensemble member to a corrected one (Van Schaeybroeck and Vannitsem, 2015). We compare variants of the EMOS-ECC and MBM classes and highlight a specific theoretical connection between them. All post-processing variants are applied in the context of the ensemble system of the European Centre for Medium-Range Weather Forecasts (ECMWF) and compared using multivariate verification tools, including the energy score, the variogram score (Scheuerer and Hamill, 2015) and the band depth rank histogram (Thorarinsdottir et al., 2015). Gneiting, Raftery, Westveld and Goldman, 2005: Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation. Mon. Wea. Rev., 133, 1098-1118. 
Scheuerer and Hamill, 2015: Variogram-based proper scoring rules for probabilistic forecasts of multivariate quantities. Mon. Wea. Rev., 143, 1321-1334. Schefzik, Thorarinsdottir and Gneiting, 2013: Uncertainty quantification in complex simulation models using ensemble copula coupling. Statistical Science, 28, 616-640. Thorarinsdottir, Scheuerer and Heinz, 2015: Assessing the calibration of high-dimensional ensemble forecasts using rank histograms. arXiv:1310.0236. Van Schaeybroeck and Vannitsem, 2015: Ensemble post-processing using member-by-member approaches: theoretical aspects. Q. J. R. Meteorol. Soc., 141, 807-818.
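The ECC step described above can be sketched in a few lines: calibrated marginal samples are reordered to follow the rank order of the raw ensemble, which restores the raw ensemble's dependence structure (the bivariate setup below is an invented example):

```python
import numpy as np

rng = np.random.default_rng(3)
m = 10  # ensemble size

# Hypothetical raw ensemble for two correlated quantities (e.g. temperature
# at two nearby stations); calibrated samples would come from a fitted EMOS PDF
raw = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.9], [0.9, 1.0]], size=m)
calibrated = np.sort(rng.normal(0.5, 1.2, size=(m, 2)), axis=0)

# ECC: reorder each margin's calibrated samples to follow the rank order
# of the raw ensemble, recovering its inter-variable dependence
ecc = np.empty_like(calibrated)
for j in range(2):
    ranks = np.argsort(np.argsort(raw[:, j]))  # rank of each raw member
    ecc[:, j] = calibrated[ranks, j]
```

Each margin of the ECC ensemble keeps the calibrated values exactly, while member-to-member ordering matches the raw ensemble.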
NASA Astrophysics Data System (ADS)
Matte, Simon; Boucher, Marie-Amélie; Boucher, Vincent; Fortier Filion, Thomas-Charles
2017-06-01
A large effort has been made over the past 10 years to promote the operational use of probabilistic or ensemble streamflow forecasts. Numerous studies have shown that ensemble forecasts are of higher quality than deterministic ones. Many studies also conclude that decisions based on ensemble rather than deterministic forecasts are better decisions in the context of flood mitigation. Hence, it is believed that ensemble forecasts possess greater economic and social value for both decision makers and the general population. However, the vast majority of, if not all, existing hydro-economic studies rely on a cost-loss ratio framework that assumes a risk-neutral decision maker. To overcome this important flaw, this study borrows from economics and evaluates the economic value of early warning flood systems using the well-known Constant Absolute Risk Aversion (CARA) utility function, which explicitly accounts for the level of risk aversion of the decision maker. This new framework allows for the full exploitation of the information related to a forecast's uncertainty, making it especially suited for the economic assessment of ensemble or probabilistic forecasts. Rather than comparing deterministic and ensemble forecasts, this study focuses on comparing different types of ensemble forecasts. There are multiple ways of assessing and representing forecast uncertainty, and consequently many different means of building an ensemble forecasting system for future streamflow. One possibility is to dress deterministic forecasts using the statistics of past forecast errors. Such dressing methods are popular among operational agencies because of their simplicity and intuitiveness. Another approach is the use of ensemble meteorological forecasts for precipitation and temperature, which are then provided as inputs to one or many hydrological models. 
In this study, three concurrent ensemble streamflow forecasting systems are compared: simple statistically dressed deterministic forecasts, forecasts based on meteorological ensembles, and a variant of the latter that also includes an estimation of state variable uncertainty. This comparison takes place for the Montmorency River, a small flood-prone watershed in southern central Quebec, Canada. The assessment of forecasts is performed for lead times of 1 to 5 days, both in terms of forecast quality (relative to the corresponding record of observations) and in terms of economic value, using the newly proposed framework based on the CARA utility function. It is found that the economic value of a forecast for a risk-averse decision maker is closely linked to the forecast's reliability in predicting the upper tail of the streamflow distribution. Hence, post-processing forecasts to avoid over-forecasting could help improve both the quality and the value of forecasts.
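A minimal sketch of valuing a forecast with a CARA utility, U(x) = (1 - exp(-a x))/a: the expected utility over an ensemble of monetary outcomes is converted to a certainty equivalent, which for a risk-averse user lies below the expected outcome (the outcomes and risk-aversion coefficient below are invented, not from the study):

```python
import numpy as np

# CARA utility with risk-aversion coefficient a > 0
def cara(x, a):
    return (1.0 - np.exp(-a * x)) / a

# Hypothetical monetary outcomes (e.g. damages avoided) under the ensemble
outcomes = np.array([100.0, 80.0, -20.0, 50.0, 100.0])

a = 0.02
expected_utility = cara(outcomes, a).mean()

# Certainty equivalent: the guaranteed payoff giving the same utility,
# obtained by inverting U; risk aversion pushes it below the mean outcome
ce = -np.log(1.0 - a * expected_utility) / a
```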
A deep learning-based multi-model ensemble method for cancer prediction.
Xiao, Yawen; Wu, Jun; Lin, Zongli; Zhao, Xiaodong
2018-01-01
Cancer is a complex worldwide health problem associated with high mortality. With the rapid development of high-throughput sequencing technology and the application of various machine learning methods that have emerged in recent years, progress in cancer prediction has been increasingly made based on gene expression, providing insight into effective and accurate treatment decision making. Thus, developing machine learning methods that can successfully distinguish cancer patients from healthy persons is of great current interest. However, among the classification methods applied to cancer prediction so far, no single method outperforms all the others. In this paper, we demonstrate a new strategy, which applies deep learning to an ensemble approach that incorporates multiple different machine learning models. We supply informative gene data selected by differential gene expression analysis to five different classification models. Then, a deep learning method is employed to ensemble the outputs of the five classifiers. The proposed deep learning-based multi-model ensemble method was tested on three public RNA-seq data sets of three kinds of cancers: Lung Adenocarcinoma, Stomach Adenocarcinoma and Breast Invasive Carcinoma. The test results indicate that it increases the prediction accuracy of cancer for all the tested RNA-seq data sets as compared to using a single classifier or the majority voting algorithm. By taking full advantage of different classifiers, the proposed deep learning-based multi-model ensemble method is shown to be accurate and effective for cancer prediction. Copyright © 2017 Elsevier B.V. All rights reserved.
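The general stacking idea can be sketched with synthetic data: base-classifier probabilities become features for a learned combiner. For brevity the 'deep' meta-learner is reduced to a single logistic unit trained by gradient descent, a stand-in for the paper's deeper network (all data here are simulated, not RNA-seq):

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulated cancer probabilities from 5 base classifiers for 200 samples;
# positives get systematically higher scores, plus noise
n, k = 200, 5
y = rng.integers(0, 2, size=n)
base_probs = np.clip(y[:, None] * 0.7 + rng.normal(0.15, 0.2, size=(n, k)), 0, 1)

# Minimal meta-learner: one logistic unit over the base outputs,
# trained with full-batch gradient descent on the log loss
w, b = np.zeros(k), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(base_probs @ w + b)))
    grad = p - y
    w -= 0.1 * base_probs.T @ grad / n
    b -= 0.1 * grad.mean()

ensemble_pred = (1.0 / (1.0 + np.exp(-(base_probs @ w + b))) > 0.5).astype(int)
accuracy = (ensemble_pred == y).mean()
```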
A mesoscale hybrid data assimilation system based on the JMA nonhydrostatic model
NASA Astrophysics Data System (ADS)
Ito, K.; Kunii, M.; Kawabata, T. T.; Saito, K. K.; Duc, L. L.
2015-12-01
This work evaluates the potential of a hybrid ensemble Kalman filter and four-dimensional variational (4D-Var) data assimilation system for predicting severe weather events from a deterministic point of view. This hybrid system is an adjoint-based 4D-Var system using a background error covariance matrix constructed from a mixture of the so-called NMC method and perturbations from a local ensemble transform Kalman filter data assimilation system, both of which are based on the Japan Meteorological Agency nonhydrostatic model. To construct the background error covariance matrix, we investigated two types of schemes. One is a spatial localization scheme and the other is a neighboring ensemble approach, which regards the result at a horizontally shifted point in each ensemble member as that obtained from a different realization of the ensemble simulation. Assimilation of a pseudo single observation located to the north of a tropical cyclone (TC) yielded an analysis increment of wind and temperature physically consistent with what is expected for a mature TC in both hybrid systems, whereas the analysis increment in a 4D-Var system using a static background error covariance distorted the structure of the mature TC. Real-data assimilation experiments applied to four TCs and three local heavy rainfall events showed that the hybrid systems and the EnKF provided better initial conditions than the NMC-based 4D-Var, both for TC intensity and track forecasts and for the location and amount of rainfall in local heavy rainfall events.
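The core of such hybrid schemes is a blended background error covariance, mixing a static (e.g. NMC-derived) matrix with a flow-dependent one estimated from ensemble perturbations. A minimal numpy sketch of the blend (illustrative only; the weight `beta` and the shapes are assumptions, and operational systems apply this implicitly in the variational solver):

```python
import numpy as np

def hybrid_background_covariance(B_static, ensemble, beta=0.5):
    """Blend a static (NMC-style) covariance with a flow-dependent one
    estimated from ensemble perturbations.

    ensemble : (n_members, n_state) forecast states
    beta     : weight on the static part (1.0 recovers pure static 4D-Var,
               0.0 a purely ensemble-derived covariance)
    """
    X = ensemble - ensemble.mean(axis=0)          # member perturbations
    B_ens = X.T @ X / (ensemble.shape[0] - 1)     # sample covariance
    return beta * B_static + (1.0 - beta) * B_ens
```

The flow-dependent term is what lets a single-observation increment take on the structure of the current weather system (e.g. a mature TC) instead of the isotropic shape implied by a static covariance.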
Scalable quantum information processing with atomic ensembles and flying photons
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mei Feng; Yu Yafei; Feng Mang
2009-10-15
We present a scheme for scalable quantum information processing with atomic ensembles and flying photons. Using the Rydberg blockade, we encode the qubits in collective atomic states, which can be manipulated quickly and easily owing to the enhanced interaction in comparison with the single-atom case. We demonstrate that the proposed gating can be applied to the generation of two-dimensional cluster states for measurement-based quantum computation. Moreover, the atomic ensembles also function as quantum repeaters useful for long-distance quantum state transfer. We show that our scheme can work in the bad-cavity or weak-coupling regime, which could greatly relax the experimental requirements. The efficient coherent operations on the ensemble qubits enable our scheme to switch between quantum computation and quantum communication using atomic ensembles.
A method for determining the weak statistical stationarity of a random process
NASA Technical Reports Server (NTRS)
Sadeh, W. Z.; Koper, C. A., Jr.
1978-01-01
A method for determining the weak statistical stationarity of a random process is presented. The core of this testing procedure consists of generating an equivalent ensemble which approximates a true ensemble. Formation of an equivalent ensemble is accomplished by segmenting a sufficiently long time history of a random process into equal, finite, and statistically independent sample records. Weak statistical stationarity is ascertained based on the time invariance of the equivalent-ensemble averages. Comparison of these averages with their corresponding time averages over a single sample record leads to a heuristic estimate of the ergodicity of the random process. Specific variance tests are introduced for evaluating the statistical independence of the sample records, the time invariance of the equivalent-ensemble autocorrelations, and the ergodicity. These procedures were examined and substantiated using turbulent velocity signals.
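The equivalent-ensemble construction can be sketched directly (a simplified illustration, not the NTRS procedure: segmentation plus a crude peak-to-peak check stands in for the paper's formal variance tests).

```python
import numpy as np

def equivalent_ensemble_means(x, n_records):
    """Segment one long record into equal sample records and return the
    'equivalent ensemble' average at each within-record time index."""
    L = len(x) // n_records
    records = x[:n_records * L].reshape(n_records, L)
    return records.mean(axis=0)          # average across records at each time

def is_weakly_stationary(x, n_records, tol):
    """Crude test: the equivalent-ensemble mean should be time-invariant,
    so its peak-to-peak variation should stay below a tolerance."""
    return np.ptp(equivalent_ensemble_means(x, n_records)) < tol
```

For a stationary process the ensemble mean is flat up to sampling noise; a mean that varies systematically with the within-record time index signals nonstationarity.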
Numerical weather prediction model tuning via ensemble prediction system
NASA Astrophysics Data System (ADS)
Jarvinen, H.; Laine, M.; Ollinaho, P.; Solonen, A.; Haario, H.
2011-12-01
This paper discusses a novel approach to tuning the predictive skill of numerical weather prediction (NWP) models. NWP models contain tunable parameters which appear in the parameterization schemes of sub-grid-scale physical processes; currently, the numerical values of these parameters are specified manually. In a recent dual manuscript (QJRMS, revised) we developed a new concept and method for on-line estimation of NWP model parameters. The EPPES ("Ensemble prediction and parameter estimation system") method requires only minimal changes to the existing operational ensemble prediction infrastructure and is very cost-effective because practically no new computations are introduced. The approach provides an algorithmic decision-making tool for model parameter optimization in operational NWP. In EPPES, statistical inference about the NWP model's tunable parameters is made by (i) generating each member of the ensemble of predictions using different model parameter values, drawn from a proposal distribution, and (ii) feeding back the relative merits of the parameter values to the proposal distribution, based on evaluation of a suitable likelihood function against verifying observations. In the presentation, the method is first illustrated in low-order numerical tests using a stochastic version of the Lorenz-95 model, which effectively emulates the principal features of ensemble prediction systems. The EPPES method correctly detects the unknown and wrongly specified parameter values and leads to improved forecast skill. Second, results with an ensemble prediction system based on an atmospheric general circulation model show that the NWP model tuning capacity of EPPES scales up to realistic models and ensemble prediction systems. Finally, a global top-end NWP model tuning exercise with preliminary results is presented.
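The two-step loop (i)-(ii) can be sketched as an importance-weighted re-fit of the proposal distribution. This is a much-simplified stand-in for EPPES, not its actual update equations: skill scores act as likelihood weights, and the weighted mean and covariance become the next proposal.

```python
import numpy as np

def eppes_style_update(samples, scores):
    """One simplified proposal update: re-fit the parameter proposal from
    importance weights given by each ensemble member's forecast skill.

    samples : (n_members, n_params) parameter values drawn for the ensemble
    scores  : (n_members,) likelihood-like skill, higher = better forecast
    """
    w = scores / scores.sum()                 # normalized importance weights
    mean = w @ samples
    d = samples - mean
    cov = (w[:, None] * d).T @ d              # weighted sample covariance
    return mean, cov
```

Iterating draw / score / re-fit concentrates the proposal around parameter values that verify well, which is the mechanism by which wrongly specified parameters are corrected on-line.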
NASA Astrophysics Data System (ADS)
Zhu, Kefeng; Xue, Ming
2016-11-01
On 21 July 2012, an extreme rainfall event, with a maximum 24-h accumulation of 460 mm, occurred in Beijing, China. Most operational models failed to predict such an extreme amount. In this study, a convection-permitting ensemble forecast system (CEFS), at 4-km grid spacing, covering the entire mainland of China, is applied to this extreme rainfall case. CEFS consists of 22 members and uses multiple physics parameterizations. For the event, the predicted maximum is 415 mm d-1 in the probability-matched ensemble mean. The predicted high-probability heavy-rain region is located in southwest Beijing, as was observed. Ensemble-based verification scores are then investigated. For a small verification domain covering Beijing and its surrounding areas, the precipitation rank histogram of CEFS is much flatter than that of a reference global ensemble. CEFS has a lower (i.e. better) Brier score and a higher resolution than the global ensemble for precipitation, indicating more reliable probabilistic forecasting by CEFS. Additionally, forecasts of different ensemble members are compared and discussed. Most of the extreme rainfall comes from convection in the warm sector east of an approaching cold front. A few members of CEFS successfully reproduce such precipitation, and orographic lift of highly moist low-level flows with a significant southeasterly component is suggested to have played an important role in producing the initial convection. Comparisons between good and bad forecast members indicate a strong sensitivity of the extreme rainfall to the mesoscale environmental conditions and, to a lesser extent, the model physics.
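The probability-matched ensemble mean used above can be sketched in a few lines (an illustrative numpy sketch of one common formulation, not the CEFS code): the spatial pattern of the plain ensemble mean is kept, while its amplitudes are replaced by rank-matched values drawn from the pooled member distribution, restoring realistic peak intensities that simple averaging smooths out.

```python
import numpy as np

def probability_matched_mean(fields):
    """Probability-matched ensemble mean for precipitation fields.

    fields : (n_members, n_points) member precipitation values
    Keeps the *spatial pattern* of the ensemble mean but restores the
    *amplitude distribution* of the pooled member values.
    """
    n_members, n_points = fields.shape
    mean = fields.mean(axis=0)
    pooled = np.sort(fields.ravel())[::-1]        # all member values, descending
    pm_values = pooled[::n_members][:n_points]    # every n-th value -> n_points
    order = np.argsort(mean)[::-1]                # grid points ranked by the mean
    pm = np.empty(n_points)
    pm[order] = pm_values                         # reassign amplitudes by rank
    return pm
```

Because the largest pooled value comes from an individual member, the PM mean can reach maxima (e.g. 415 mm d-1 here) far above what the arithmetic ensemble mean retains.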
NASA Astrophysics Data System (ADS)
Schunk, R. W.; Scherliess, L.; Eccles, V.; Gardner, L. C.; Sojka, J. J.; Zhu, L.; Pi, X.; Mannucci, A. J.; Komjathy, A.; Wang, C.; Rosen, G.
2016-12-01
As part of the NASA-NSF Space Weather Modeling Collaboration, we created a Multimodel Ensemble Prediction System (MEPS) for the Ionosphere-Thermosphere-Electrodynamics system that is based on Data Assimilation (DA) models. MEPS is composed of seven physics-based data assimilation models that cover the globe. Ensemble modeling can be conducted for the mid-low latitude ionosphere using the four GAIM data assimilation models, including the Gauss Markov (GM), Full Physics (FP), Band Limited (BL) and 4DVAR DA models. These models can assimilate Total Electron Content (TEC) from a constellation of satellites, bottom-side electron density profiles from digisondes, in situ plasma densities, occultation data and ultraviolet emissions. The four GAIM models were run for the March 16-17, 2013, geomagnetic storm period with the same data, but we also systematically added new data types and re-ran the GAIM models to see how the different data types affected the GAIM results, with the emphasis on elucidating differences in the underlying ionospheric dynamics and thermospheric coupling. Also, for each scenario the outputs from the four GAIM models were used to produce an ensemble mean for TEC, NmF2, and hmF2. A simple average of the models was used in the ensemble averaging to see if there was an improvement of the ensemble average over the individual models. For the scenarios considered, the ensemble average yielded better specifications than the individual GAIM models. The model differences and averages, and the consequent differences in ionosphere-thermosphere coupling and dynamics will be discussed.
Efficient Agent-Based Cluster Ensembles
NASA Technical Reports Server (NTRS)
Agogino, Adrian; Tumer, Kagan
2006-01-01
Numerous domains, ranging from distributed data acquisition to knowledge reuse, need to solve the cluster ensemble problem of combining multiple clusterings into a single unified clustering. Unfortunately, current non-agent-based cluster-combining methods do not work in a distributed environment, are not robust to corrupted clusterings, and require centralized access to all original clusterings. Overcoming these issues will allow cluster ensembles to be used in fundamentally distributed and failure-prone domains such as data acquisition from satellite constellations, in addition to domains demanding confidentiality, such as combining clusterings of user profiles. This paper proposes an efficient, distributed, agent-based cluster ensemble method that addresses these issues. In this approach, each agent is assigned a small subset of the data and votes on which final cluster its data points should belong to. The final clustering is then evaluated by a global utility, computed in a distributed way. This clustering is also evaluated using an agent-specific utility that is shown to be easier for the agents to maximize. Results show that agents using the agent-specific utility can achieve better performance than traditional non-agent-based methods and are effective even when up to 50% of the agents fail.
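The voting-style combination of clusterings can be illustrated with a co-association sketch (illustrative only; the paper's agent and utility machinery is not reproduced here): each clustering "votes" on whether two points belong together, and a consensus clustering is read off the thresholded vote matrix.

```python
import numpy as np

def co_association(labelings):
    """Voting matrix: entry (i, j) is the fraction of clusterings that put
    points i and j in the same cluster."""
    labelings = np.asarray(labelings)            # (n_clusterings, n_points)
    n = labelings.shape[1]
    C = np.zeros((n, n))
    for lab in labelings:
        C += (lab[:, None] == lab[None, :])
    return C / len(labelings)

def consensus_clusters(C, threshold=0.5):
    """Greedy consensus: link points whose co-association exceeds threshold
    (single-link connected components over the thresholded vote matrix)."""
    n = len(C)
    label = -np.ones(n, dtype=int)
    k = 0
    for i in range(n):
        if label[i] == -1:
            stack, label[i] = [i], k
            while stack:
                p = stack.pop()
                for q in np.where((C[p] > threshold) & (label == -1))[0]:
                    label[q] = k
                    stack.append(q)
            k += 1
    return label
```

Because each pairwise vote is an independent tally, a minority of corrupted clusterings is outvoted, which is the robustness property the abstract emphasizes.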
NASA Astrophysics Data System (ADS)
Kutty, Govindan; Muraleedharan, Rohit; Kesarkar, Amit P.
2018-03-01
Uncertainties in numerical weather prediction models are generally not well represented in ensemble-based data assimilation (DA) systems. The performance of an ensemble-based DA system becomes suboptimal if the sources of error are undersampled in the forecast system. The present study examines the effect of accounting for model error treatments in the hybrid ensemble transform Kalman filter-three-dimensional variational (3DVAR) DA system (hybrid) on the track forecasts of two tropical cyclones, viz. Hudhud and Thane, which formed over the Bay of Bengal, using the Advanced Research Weather Research and Forecasting (ARW-WRF) model. We investigated the effect of two types of model error treatment schemes and their combination on the hybrid DA system: (i) a multiphysics approach, which uses different combinations of cumulus, microphysics and planetary boundary layer schemes; (ii) a stochastic kinetic energy backscatter (SKEB) scheme, which perturbs the horizontal wind and potential temperature tendencies; and (iii) a combination of both the multiphysics and SKEB schemes. Substantial improvements are noticed in the track positions of both cyclones when flow-dependent ensemble covariance is used in the 3DVAR framework. Explicit model error representation is found to be beneficial in treating underdispersive ensembles. Among the model error schemes used in this study, the combination of the multiphysics and SKEB schemes outperformed the other two, with improved track forecasts for both tropical cyclones.
Selecting climate simulations for impact studies based on multivariate patterns of climate change.
Mendlik, Thomas; Gobiet, Andreas
In climate change impact research it is crucial to carefully select the meteorological input for impact models. We present a method for model selection that enables the user to shrink the ensemble to a few representative members, conserving the model spread and accounting for model similarity. This is done in three steps: first, principal component analysis is applied to a multitude of meteorological parameters to find common patterns of climate change within the multi-model ensemble; second, model similarities with regard to these multivariate patterns are detected using cluster analysis; and third, models are sampled from each cluster to generate a subset of representative simulations. We present an application based on the ENSEMBLES regional multi-model ensemble with the aim of providing input for a variety of climate impact studies. We find that the two most dominant patterns of climate change relate to temperature and humidity patterns. The ensemble can be reduced from 25 to 5 simulations while still maintaining its essential characteristics. Having such a representative subset of simulations reduces computational costs for climate impact modeling and enhances the quality of the ensemble at the same time, as it prevents double-counting of dependent simulations that would lead to biased statistics. The online version of this article (doi:10.1007/s10584-015-1582-0) contains supplementary material, which is available to authorized users.
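The three steps above can be sketched as PCA followed by clustering in PC space, then picking the member nearest each cluster centre. This is an illustrative numpy sketch under simplifying assumptions (plain k-means with deterministic farthest-point seeding, Euclidean distances), not the authors' procedure.

```python
import numpy as np

def select_representatives(X, n_components=2, n_clusters=3, n_iter=50):
    """Shrink a multi-model ensemble: PCA on the models' change signals,
    k-means in PC space, then keep the member nearest each cluster centre.

    X : (n_models, n_features) climate-change signal per model
    """
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)   # PCA via SVD
    Z = Xc @ Vt[:n_components].T                         # scores in PC space
    # deterministic farthest-point seeding, then plain k-means
    centres = [Z[0]]
    for _ in range(n_clusters - 1):
        d = np.min([np.linalg.norm(Z - c, axis=1) for c in centres], axis=0)
        centres.append(Z[int(np.argmax(d))])
    centres = np.array(centres)
    for _ in range(n_iter):
        d = np.linalg.norm(Z[:, None] - centres[None], axis=2)
        lab = d.argmin(axis=1)
        for k in range(n_clusters):
            if np.any(lab == k):
                centres[k] = Z[lab == k].mean(axis=0)
    d = np.linalg.norm(Z[:, None] - centres[None], axis=2)
    return sorted({int(np.argmin(d[:, k])) for k in range(n_clusters)})
```

Sampling one member per cluster preserves the spread of the multivariate change patterns while discarding near-duplicate simulations, which is what prevents double-counting.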
Mazurowski, Maciej A.; Zurada, Jacek M.; Tourassi, Georgia D.
2009-01-01
Ensemble classifiers have been shown to be efficient in multiple applications. In this article, the authors explore the effectiveness of ensemble classifiers in a case-based computer-aided diagnosis system for the detection of masses in mammograms. They evaluate two general ways of constructing subclassifiers by resampling the available development dataset: random division and random selection. Furthermore, they discuss the problem of selecting the ensemble size and propose two adaptive incremental techniques that automatically select the size for the problem at hand. All the techniques are evaluated with respect to a previously proposed information-theoretic CAD system (IT-CAD). The experimental results show that the examined ensemble techniques provide a statistically significant improvement in performance (AUC=0.905±0.024) as compared to the original IT-CAD system (AUC=0.865±0.029). Some of the techniques allow for a notable reduction in the total number of examples stored in the case base (to 1.3% of the original size), which, in turn, results in lower storage requirements and a shorter response time of the system. Among the methods examined in this article, the two proposed adaptive techniques are by far the most effective for this purpose. Furthermore, the authors provide some discussion and guidance for choosing the ensemble parameters. PMID:19673196
Zheng, Lianqing; Chen, Mengen; Yang, Wei
2009-06-21
To overcome the pseudoergodicity problem, conformational sampling can be accelerated via generalized ensemble methods, e.g., through the realization of random walks along prechosen collective variables, such as spatial order parameters, energy scaling parameters, or even system temperatures or pressures. As usually observed in generalized ensemble simulations, hidden barriers are likely to exist in the space perpendicular to the collective variable direction, and these residual free energy barriers can greatly degrade the sampling efficiency. This sampling issue is particularly severe when the collective variable is defined in a low-dimensional subset of the target system; then the "Hamiltonian lagging" problem, in which necessary structural relaxation falls behind the motion of the collective variable, is likely to occur. To overcome this problem in equilibrium conformational sampling, we adopted the orthogonal space random walk (OSRW) strategy, which was originally developed in the context of free energy simulation [L. Zheng, M. Chen, and W. Yang, Proc. Natl. Acad. Sci. U.S.A. 105, 20227 (2008)]. Thereby, generalized ensemble simulations can simultaneously escape both the explicit barriers along the collective variable direction and the hidden barriers that are strongly coupled with the collective variable move. As demonstrated in our model studies, the present OSRW-based generalized ensemble treatments show improved sampling capability over the corresponding classical generalized ensemble treatments.
NASA Astrophysics Data System (ADS)
Efthimiou, G. C.; Andronopoulos, S.; Bartzis, J. G.
2018-02-01
One of the key issues in recent research on dispersion inside complex urban environments is the ability to predict dosage-based parameters from the puff release of an airborne material from a point source in the atmospheric boundary layer inside the built-up area. The present work addresses the question of whether the computational fluid dynamics (CFD)-Reynolds-averaged Navier-Stokes (RANS) methodology can be used to predict ensemble-average dosage-based parameters related to puff dispersion. RANS simulations with the ADREA-HF code were therefore performed, where a single puff was released in each case. The method is validated against data sets from two wind-tunnel experiments. In each experiment, more than 200 puffs were released, from which ensemble-averaged dosage-based parameters were calculated and compared to the model's predictions. The performance of the model was evaluated using scatter plots and three validation metrics: fractional bias, normalized mean square error, and factor of two. The model presented a better performance for the temporal parameters (i.e., ensemble-average times of puff arrival, peak, leaving, duration, ascent, and descent) than for the ensemble-average dosage and peak concentration. The majority of the obtained values of the validation metrics were inside established acceptance limits. Based on the obtained model performance indices, the CFD-RANS methodology as implemented in ADREA-HF is able to predict the ensemble-average temporal quantities related to transient emissions of airborne material in urban areas within the range of the model performance acceptance criteria established in the literature. It is also able to predict the ensemble-average dosage, although the dosage results should be treated with some caution, since in one case the observed ensemble-average dosage was under-estimated by slightly more than the acceptance criteria allow.
The ensemble-average peak concentration was systematically underpredicted by the model, to a degree beyond that allowed by the acceptance criteria, in one of the two wind-tunnel experiments. The model performance depended on the positions of the examined sensors in relation to the emission source and the building configuration. The work presented in this paper was carried out (partly) within the scope of COST Action ES1006 "Evaluation, improvement, and guidance for the use of local-scale emergency prediction and response tools for airborne hazards in built environments".
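The three validation metrics named above have standard closed forms and are easy to compute; a minimal numpy sketch (sign conventions for fractional bias vary in the literature, so the one used here is stated explicitly in the comments):

```python
import numpy as np

def fractional_bias(obs, pred):
    """FB = 2(<obs> - <pred>) / (<obs> + <pred>); 0 is perfect.
    With this sign convention, positive FB means under-prediction."""
    return 2.0 * (np.mean(obs) - np.mean(pred)) / (np.mean(obs) + np.mean(pred))

def nmse(obs, pred):
    """Normalized mean square error: <(obs - pred)^2> / (<obs><pred>); 0 is perfect."""
    return np.mean((np.asarray(obs) - np.asarray(pred)) ** 2) / (
        np.mean(obs) * np.mean(pred))

def fac2(obs, pred):
    """Fraction of predictions within a factor of two of the observations."""
    r = np.asarray(pred, float) / np.asarray(obs, float)
    return np.mean((r >= 0.5) & (r <= 2.0))
```

Typical acceptance limits quoted in the urban-dispersion literature are on the order of |FB| below a few tenths, bounded NMSE, and FAC2 above roughly one half, which is the sense in which the abstract's "acceptance criteria" are meant.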
Operational hydrological forecasting in Bavaria. Part II: Ensemble forecasting
NASA Astrophysics Data System (ADS)
Ehret, U.; Vogelbacher, A.; Moritz, K.; Laurent, S.; Meyer, I.; Haag, I.
2009-04-01
In part I of this study, the operational flood forecasting system in Bavaria and an approach to identify and quantify forecast uncertainty were introduced. The approach is split into the calculation of an empirical 'overall error' from archived forecasts and the calculation of an empirical 'model error' based on hydrometeorological forecast tests, where rainfall observations were used instead of forecasts. The 'model error' can, especially in upstream catchments where forecast uncertainty depends strongly on the current predictability of the atmosphere, be superimposed on the spread of a hydrometeorological ensemble forecast. In Bavaria, two meteorological ensemble prediction systems are currently tested for operational use: the 16-member COSMO-LEPS forecast and a poor man's ensemble composed of DWD GME, DWD Cosmo-EU, NCEP GFS, Aladin-Austria and MeteoSwiss Cosmo-7. The determination of the overall forecast uncertainty depends on the catchment characteristics: 1. Upstream catchments with high influence of the weather forecast: a) A hydrological ensemble forecast is calculated using each of the meteorological forecast members as forcing. b) Corresponding to the characteristics of the meteorological ensemble forecast, each resulting forecast hydrograph can be regarded as equally likely. c) The 'model error' distribution, with parameters dependent on the hydrological case and lead time, is added to each forecast timestep of each ensemble member. d) For each forecast timestep, the overall error distribution (i.e. over the 'model error' distributions of all ensemble members) is calculated. e) From this distribution, the uncertainty range at a desired level (here: the 10% and 90% percentiles) is extracted and drawn as a forecast envelope. f) As the mean or median of an ensemble forecast does not necessarily exhibit a meteorologically sound temporal evolution, a single hydrological forecast termed 'lead forecast' is chosen and shown in addition to the uncertainty bounds.
This can be either an intermediate forecast between the extremes of the ensemble spread or a manually selected forecast based on a meteorologist's advice. 2. Downstream catchments with low influence of the weather forecast: In downstream catchments with strong human impact on discharge (e.g. by reservoir operation) and large influence of upstream gauge observation quality on forecast quality, the 'overall error' may in most cases be larger than the combination of the 'model error' and the ensemble spread. Therefore, the overall forecast uncertainty bounds are calculated differently: a) A hydrological ensemble forecast is calculated using each of the meteorological forecast members as forcing. Here, the corresponding inflow hydrographs from all upstream catchments must additionally be used. b) As for an upstream catchment, the uncertainty range is determined by combining the 'model error' and the ensemble member forecasts. c) In addition, the 'overall error' is superimposed on the 'lead forecast'. For reasons of consistency, the lead forecast must be based on the same meteorological forecast in the downstream and all upstream catchments. d) From the resulting two uncertainty ranges (one from the ensemble forecast and 'model error', one from the 'lead forecast' and 'overall error'), the envelope is taken as the most prudent uncertainty range. In sum, the uncertainty associated with each forecast run is calculated and communicated to the public in the form of 10% and 90% percentiles. As in part I of this study, the methodology as well as the usefulness (or otherwise) of the resulting uncertainty ranges will be presented and discussed using typical examples.
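Steps c)-e) for an upstream catchment can be sketched compactly: superimpose a 'model error' distribution on each ensemble member and extract percentile bounds per forecast timestep. An illustrative numpy sketch only, assuming (unlike the operational system, where the error parameters depend on hydrological case and lead time) a fixed Gaussian model error:

```python
import numpy as np

def forecast_envelope(members, error_std, q=(10, 90), n_draws=500, seed=0):
    """Superimpose a Gaussian 'model error' on each ensemble member and
    extract percentile bounds per forecast timestep.

    members : (n_members, n_steps) hydrological ensemble forecasts
    Returns (lower, upper) percentile curves of length n_steps.
    """
    rng = np.random.default_rng(seed)
    n_m, n_t = members.shape
    draws = members[:, None, :] + rng.normal(0, error_std, (n_m, n_draws, n_t))
    pooled = draws.reshape(-1, n_t)       # overall error distribution per timestep
    lo, hi = np.percentile(pooled, q, axis=0)
    return lo, hi
```

The pooled distribution combines ensemble spread and model error, so the 10%/90% envelope widens wherever either source of uncertainty grows.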
NASA Astrophysics Data System (ADS)
Wang, Yuanbing; Min, Jinzhong; Chen, Yaodeng; Huang, Xiang-Yu; Zeng, Mingjian; Li, Xin
2017-01-01
This study evaluates the performance of a three-dimensional variational (3DVar) and a hybrid data assimilation system using time-lagged ensembles in a heavy rainfall event. The time-lagged ensembles are constructed by sampling from a moving time window of 3 h along a model trajectory, which is economical and easy to implement. The proposed hybrid data assimilation system introduces flow-dependent error covariance derived from the time-lagged ensemble into the variational cost function without significantly increasing the computational cost. Single-observation tests are performed to document the characteristics of the hybrid system. The sensitivity of precipitation forecasts to the ensemble covariance weight and the localization scale is investigated. Additionally, the time-lagged ensemble hybrid (TLEn-Var) is evaluated and compared to the ETKF (ensemble transform Kalman filter)-based hybrid assimilation within a continuously cycling framework, through which new hybrid analyses are produced every 3 h over 10 days. The 24-h accumulated precipitation, moisture and wind are compared between 3DVar and the hybrid assimilation using time-lagged ensembles. Results show that model states and precipitation forecast skill are improved by the hybrid assimilation using time-lagged ensembles compared with 3DVar. Simulation of the precipitable water and the structure of the wind are also improved. Cyclonic wind increments are generated near the rainfall center, leading to an improved precipitation forecast. This study indicates that hybrid data assimilation using time-lagged ensembles appears to be a viable alternative or supplement for weather service agencies that have limited computing resources for running large ensembles.
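The localization scale mentioned above controls a Schur (elementwise) product taper applied to the ensemble covariance; a common choice of taper is the Gaspari-Cohn function, which is 1 at zero separation and exactly 0 beyond twice the localization length. A minimal 1-D sketch (illustrative, not the study's implementation):

```python
import numpy as np

def gaspari_cohn(r, c):
    """Gaspari-Cohn 5th-order taper: 1 at distance 0, exactly 0 beyond 2c."""
    x = np.abs(r) / c
    f = np.zeros_like(x, dtype=float)
    m1 = x <= 1.0
    m2 = (x > 1.0) & (x <= 2.0)
    x1, x2 = x[m1], x[m2]
    f[m1] = -0.25 * x1**5 + 0.5 * x1**4 + 0.625 * x1**3 - (5 / 3) * x1**2 + 1.0
    f[m2] = ((1 / 12) * x2**5 - 0.5 * x2**4 + 0.625 * x2**3
             + (5 / 3) * x2**2 - 5.0 * x2 + 4.0 - (2 / 3) / x2)
    return f

def localized_covariance(members, coords, c):
    """Schur product of the raw ensemble covariance with a distance-based
    taper, suppressing spurious long-range sample correlations."""
    X = members - members.mean(axis=0)
    B = X.T @ X / (len(members) - 1)
    r = np.abs(coords[:, None] - coords[None, :])
    return B * gaspari_cohn(r, c)
```

Localization matters even more for small ensembles such as time-lagged ones, since few members mean noisier long-range sample correlations.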
NASA Astrophysics Data System (ADS)
Abaza, Mabrouk; Anctil, François; Fortin, Vincent; Perreault, Luc
2017-12-01
Meteorological and hydrological ensemble prediction systems are imperfect. Their outputs could often be improved through the use of a statistical processor, opening up the question of the necessity of using both processors (meteorological and hydrological), only one of them, or none. This experiment compares the predictive distributions from four hydrological ensemble prediction systems (H-EPS) utilising the Ensemble Kalman filter (EnKF) probabilistic sequential data assimilation scheme. They differ in the inclusion or not of the Distribution Based Scaling (DBS) method for post-processing meteorological forecasts and the ensemble Bayesian Model Averaging (ensemble BMA) method for hydrological forecast post-processing. The experiment is implemented on three large watersheds and relies on the combination of two meteorological reforecast products: the 4-member Canadian reforecasts from the Canadian Centre for Meteorological and Environmental Prediction (CCMEP) and the 10-member American reforecasts from the National Oceanic and Atmospheric Administration (NOAA), leading to 14 members at each time step. Results show that all four tested H-EPS lead to resolution and sharpness values that are quite similar, with an advantage to DBS + EnKF. The ensemble BMA is unable to compensate for any bias left in the precipitation ensemble forecasts. On the other hand, it succeeds in calibrating ensemble members that are otherwise under-dispersed. If reliability is preferred over resolution and sharpness, DBS + EnKF + ensemble BMA performs best, making use of both processors in the H-EPS system. Conversely, for enhanced resolution and sharpness, DBS is the preferred method.
Akbar, Shahid; Hayat, Maqsood; Iqbal, Muhammad; Jan, Mian Ahmad
2017-06-01
Cancer is a fatal disease, responsible for one-quarter of all deaths in developed countries. Traditional anticancer therapies such as chemotherapy and radiation are highly expensive, error-prone and often ineffective, and these conventional techniques induce severe side effects on human cells. Given the perilous impact of cancer, the development of an accurate and highly efficient intelligent computational model is desirable for the identification of anticancer peptides. In this paper, an evolutionary intelligent genetic algorithm-based ensemble model, 'iACP-GAEnsC', is proposed for the identification of anticancer peptides. In this model, the protein sequences are formulated using three different discrete feature representation methods, i.e., amphiphilic pseudo amino acid composition, g-gap dipeptide composition, and reduced amino acid alphabet composition. The performance of the extracted feature spaces is investigated separately, and the spaces are then merged to exhibit the significance of hybridization. In addition, the predicted results of the individual classifiers are combined using an optimized genetic algorithm and a simple majority technique in order to enhance the true classification rate. It is observed that the genetic algorithm-based ensemble classification outperforms both the individual classifiers and the simple majority-voting ensemble. The performance of the genetic algorithm-based ensemble classification is highest on the hybrid feature space, with an accuracy of 96.45%. In comparison to existing techniques, the 'iACP-GAEnsC' model achieves remarkable improvement in terms of various performance metrics. Based on the simulation results, the 'iACP-GAEnsC' model might become a leading tool in the field of drug design and proteomics. Copyright © 2017 Elsevier B.V. All rights reserved.
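The contrast between simple majority voting and GA-optimized combination can be illustrated with a toy sketch (illustrative only, not iACP-GAEnsC: a minimal genetic algorithm searches for classifier weights, with truncation selection, uniform crossover and Gaussian mutation all chosen for brevity).

```python
import numpy as np

def weighted_vote(preds, w):
    """preds: (n_classifiers, n_samples) 0/1 predictions; weighted majority."""
    return (w @ preds > 0.5 * w.sum()).astype(int)

def ga_optimize_weights(preds, y, pop=30, gens=40, seed=0):
    """Toy genetic algorithm over classifier weights, maximizing accuracy."""
    rng = np.random.default_rng(seed)
    n = preds.shape[0]
    P = rng.uniform(0, 1, (pop, n))
    for _ in range(gens):
        fit = np.array([np.mean(weighted_vote(preds, w) == y) for w in P])
        elite = P[np.argsort(fit)[::-1][:pop // 2]]       # keep the better half
        pa = elite[rng.integers(0, len(elite), pop)]
        pb = elite[rng.integers(0, len(elite), pop)]
        mask = rng.uniform(size=(pop, n)) < 0.5           # uniform crossover
        P = np.clip(np.where(mask, pa, pb)
                    + rng.normal(0, 0.05, (pop, n)), 0, 1)  # Gaussian mutation
    fit = np.array([np.mean(weighted_vote(preds, w) == y) for w in P])
    return P[int(np.argmax(fit))]
```

Unweighted majority voting treats all classifiers equally, so the evolved weights can only match or beat it on the data used for fitness evaluation, at the usual risk of overfitting that a held-out set would need to check.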
NASA Astrophysics Data System (ADS)
Liu, P.
2013-12-01
Quantitative analysis of the risk of reservoir real-time operation is a hard task owing to the difficulty of accurately describing inflow uncertainties. Ensemble-based hydrologic forecasts depict not only the marginal distributions of the inflows but also their persistence, via scenarios. This motivates us to analyze the reservoir real-time operating risk with ensemble-based hydrologic forecasts as inputs. A method is developed by using the forecast horizon point to divide the future time into two stages, the forecast lead time and the unpredicted time. The risk within the forecast lead time is computed by counting the number of failing forecast scenarios, and the risk in the unpredicted time is estimated using reservoir routing with the design floods and the reservoir water levels at the forecast horizon point. As a result, a two-stage risk analysis method is set up to quantify the entire flood risk as the ratio of the number of scenarios that exceed the critical value to the total number of scenarios. China's Three Gorges Reservoir (TGR) is selected as a case study, where parameter and precipitation uncertainties are implemented to produce ensemble-based hydrologic forecasts. Bayesian inference via Markov chain Monte Carlo is used to account for the parameter uncertainty. Two reservoir operation schemes, the real operated one and a scenario-optimization one, are evaluated in terms of flood risks and hydropower profits. For the 2010 flood, it is found that improving hydrologic forecast accuracy does not necessarily decrease the reservoir real-time operation risk, and most of the risk comes from the forecast lead time. It is therefore valuable to decrease the variance of ensemble-based hydrologic forecasts while keeping their bias small for reservoir operation purposes.
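The two-stage counting scheme can be sketched directly (an illustrative sketch; the routing step is abstracted into a caller-supplied function, whereas the study uses full reservoir routing with design floods):

```python
import numpy as np

def two_stage_flood_risk(scenarios, critical_level, routed_risk):
    """Two-stage risk from ensemble forecast scenarios.

    Stage 1 (lead time): fraction of scenarios whose water level exceeds the
    critical value at any forecast timestep.
    Stage 2 (unpredicted time): risk from design-flood routing, evaluated at
    each surviving scenario's end-of-horizon level via `routed_risk`.

    scenarios   : (n_scenarios, n_steps) forecast water levels
    routed_risk : callable mapping an end-of-horizon level to a risk in [0, 1]
    """
    scenarios = np.asarray(scenarios, float)
    exceed = np.any(scenarios > critical_level, axis=1)
    p_lead = exceed.mean()                    # counting failing scenarios
    if exceed.all():
        return 1.0
    p_after = np.mean([routed_risk(s[-1]) for s in scenarios[~exceed]])
    return p_lead + (1.0 - p_lead) * p_after  # total probability over both stages
```

Because stage 1 is a plain scenario count, the persistence structure of the ensemble (not just its marginal distribution) directly shapes the estimated risk.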
Ligand-biased ensemble receptor docking (LigBEnD): a hybrid ligand/receptor structure-based approach
NASA Astrophysics Data System (ADS)
Lam, Polo C.-H.; Abagyan, Ruben; Totrov, Maxim
2018-01-01
Ligand docking to flexible protein molecules can be efficiently carried out through ensemble docking to multiple protein conformations, either from experimental X-ray structures or from in silico simulations. The success of ensemble docking often requires careful selection of complementary protein conformations, through docking and scoring of known co-crystallized ligands. False positives, in which a ligand in a wrong pose achieves a better docking score than that of the native pose, arise as additional protein conformations are added. In the current study, we developed a new ligand-biased ensemble receptor docking method and composite scoring function which combine the ligand-based atomic property field (APF) method with receptor structure-based docking. This method allowed us to correctly dock 30 of the 36 ligands presented in the D3R docking challenge. For the six mis-docked ligands, the cognate receptor structures proved to be too different from the 40 available experimental Pocketome conformations used for docking and could be identified only by receptor sampling beyond the experimentally explored conformational subspace.
The potential of remotely sensed soil moisture for operational flood forecasting
NASA Astrophysics Data System (ADS)
Wanders, N.; Karssenberg, D.; de Roo, A.; de Jong, S.; Bierkens, M. F.
2013-12-01
Nowadays, remotely sensed soil moisture is readily available from multiple spaceborne sensors. The high temporal resolution and global coverage make these products very suitable for large-scale land-surface applications. The potential to use these products in operational flood forecasting has thus far not been extensively studied. In this study, we evaluate the added value of assimilating remotely sensed soil moisture in the European Flood Awareness System (EFAS) and its potential to improve the timing and height of the flood peak and low flows. EFAS is used for operational flood forecasting in Europe and uses a distributed hydrological model for flood predictions with lead times of up to 10 days. Satellite-derived soil moisture from ASCAT, AMSR-E and SMOS is assimilated into the EFAS system for the Upper Danube basin, and the results are compared to assimilation of discharge observations only. Discharge observations are available at the outlet and at six additional locations throughout the catchment. To assimilate soil moisture data into EFAS, an Ensemble Kalman Filter (EnKF) is used. Information on the spatial (cross-)correlation of the errors in the satellite products, derived from a detailed model-satellite soil moisture comparison study, is included to ensure optimal performance of the EnKF. For validation, additional discharge observations not used in the EnKF serve as an independent validation dataset. Our results show that the accuracy of flood forecasts increases when more discharge observations are used: the Mean Absolute Error (MAE) of the ensemble mean is reduced by 65%. The additional inclusion of satellite data results in a further increase in performance: forecasts of base flows are better, and the uncertainty in the overall discharge is reduced, as shown by a 10% reduction in the MAE.
In addition, floods are predicted with a higher accuracy, and the Continuous Ranked Probability Score (CRPS) shows a performance increase of 10-15% on average, compared to assimilation of discharge only. The rank histograms show that the forecast is not biased. The timing errors in the flood predictions are decreased when soil moisture data are used, and imminent floods can be forecast with skill one day earlier. In conclusion, our study shows that assimilation of satellite soil moisture increases the performance of flood forecasting systems for large catchments like the Upper Danube. The additional gain is highest when discharge observations from both upstream and downstream areas are used in combination with the soil moisture data. These results show the potential of future soil moisture missions with a higher spatial resolution, such as SMAP, to improve near-real-time flood forecasting in large catchments.
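The EnKF analysis step used in studies like this can be sketched in a few lines. This is a generic stochastic EnKF with perturbed observations, not the EFAS implementation; the state size, member count, and observation values are all illustrative:

```python
import numpy as np

def enkf_update(ensemble, obs, obs_err_std, H, rng):
    """One stochastic EnKF analysis step with perturbed observations.

    ensemble: (n_state, n_members) forecast ensemble.
    obs: (n_obs,) observations; H: (n_obs, n_state) observation operator.
    """
    n_obs, n_mem = len(obs), ensemble.shape[1]
    X = ensemble - ensemble.mean(axis=1, keepdims=True)
    HX = H @ ensemble
    HXp = HX - HX.mean(axis=1, keepdims=True)
    R = np.eye(n_obs) * obs_err_std**2
    Pxy = X @ HXp.T / (n_mem - 1)          # state-observation covariance
    Pyy = HXp @ HXp.T / (n_mem - 1) + R    # innovation covariance
    K = Pxy @ np.linalg.inv(Pyy)           # Kalman gain
    # Perturbed observations keep the analysis spread statistically consistent.
    obs_pert = obs[:, None] + rng.normal(0.0, obs_err_std, (n_obs, n_mem))
    return ensemble + K @ (obs_pert - HX)

# A 50-member, single-variable ensemble pulled toward an accurate observation.
rng = np.random.default_rng(0)
prior = rng.normal(0.0, 1.0, (1, 50))
analysis = enkf_update(prior, np.array([2.0]), 0.1, np.eye(1), rng)
```

Operational systems add ingredients this sketch omits, notably the spatial error (cross-)correlations of the satellite products that the abstract emphasizes.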
Evolutionary Ensemble for In Silico Prediction of Ames Test Mutagenicity
NASA Astrophysics Data System (ADS)
Chen, Huanhuan; Yao, Xin
Driven by new regulations and animal welfare concerns, the need for in silico models as alternatives to animal testing in the safety assessment of chemicals has increased recently. This paper describes a novel machine learning ensemble approach to building an in silico model for the prediction of Ames test mutagenicity, one of a battery of the most commonly used experimental in vitro and in vivo genotoxicity tests for the safety evaluation of chemicals. The evolutionary random neural ensemble with negative correlation learning (ERNE) [1] was developed based on neural networks and evolutionary algorithms. ERNE combines bootstrap sampling of the training data with random-subspace feature selection to ensure diversity when creating the individuals of the initial ensemble. Furthermore, while evolving individuals within the ensemble, it uses negative correlation learning, training the individual NNs to be as accurate as possible while keeping them as diverse as possible. The resulting individuals in the final ensemble are therefore capable of cooperating collectively to achieve better generalization in prediction. The empirical experiments suggest that ERNE is an effective ensemble approach for predicting the Ames test mutagenicity of chemicals.
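ERNE's two diversity mechanisms, bootstrap sampling and random feature subspaces, can be sketched independently of the neural-network and evolutionary machinery. In this illustration, simple least-squares members stand in for the evolved NNs, and all sizes, names, and data are invented:

```python
import numpy as np

def train_subspace_ensemble(X, y, n_members, subspace_frac, rng):
    """Build a diverse ensemble via bootstrap samples + random feature
    subspaces; least-squares members stand in for evolved neural nets."""
    n, d = X.shape
    k = max(1, int(subspace_frac * d))
    members = []
    for _ in range(n_members):
        rows = rng.integers(0, n, n)                  # bootstrap sample
        cols = rng.choice(d, size=k, replace=False)   # random subspace
        design = np.c_[X[rows][:, cols], np.ones(n)]
        w, *_ = np.linalg.lstsq(design, y[rows], rcond=None)
        members.append((cols, w))
    return members

def predict_majority(members, X):
    """Majority vote over the hard 0/1 decisions of all members."""
    votes = [(np.c_[X[:, cols], np.ones(len(X))] @ w > 0.5).astype(int)
             for cols, w in members]
    return (np.mean(votes, axis=0) > 0.5).astype(int)

# Synthetic binary task: only feature 0 is informative.
rng = np.random.default_rng(1)
X = rng.normal(0.0, 1.0, (200, 5))
y = (X[:, 0] > 0).astype(int)
members = train_subspace_ensemble(X, y, n_members=25, subspace_frac=0.8, rng=rng)
acc = (predict_majority(members, X) == y).mean()
```

The vote over members trained on different samples and feature subsets recovers the informative feature even though individual members occasionally miss it.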
Effects of ensembles on methane hydrate nucleation kinetics.
Zhang, Zhengcai; Liu, Chan-Juan; Walsh, Matthew R; Guo, Guang-Jun
2016-06-21
By performing molecular dynamics simulations to form a hydrate with a methane nano-bubble in liquid water at 250 K and 50 MPa, we report how different ensembles, such as the NPT, NVT, and NVE ensembles, affect the nucleation kinetics of the methane hydrate. The nucleation trajectories are monitored using the face-saturated incomplete cage analysis (FSICA) and the mutually coordinated guest (MCG) order parameter (OP). The nucleation rate and the critical nucleus are obtained using the mean first-passage time (MFPT) method based on the FS cages and the MCG-1 OPs, respectively. The fitting results of MFPT show that hydrate nucleation and growth are coupled together, consistent with the cage adsorption hypothesis, which emphasizes that cage adsorption of methane is a mechanism for both hydrate nucleation and growth. For the three different ensembles, the hydrate nucleation rate is quantitatively ordered as follows: NPT > NVT > NVE, while the sequence of hydrate crystallinity is exactly reversed. However, the largest critical nucleus appears in the NVT ensemble, rather than in the NVE ensemble. These results are helpful for choosing a suitable ensemble when studying hydrate formation via computer simulations, and they emphasize the importance of the degree of order of the critical nucleus.
NASA Astrophysics Data System (ADS)
Fox, Neil I.; Micheas, Athanasios C.; Peng, Yuqiang
2016-07-01
This paper introduces the use of Bayesian full Procrustes shape analysis in object-oriented meteorological applications. In particular, the Procrustes methodology is used to generate mean forecast precipitation fields from a set of ensemble forecasts. This approach has advantages over other ensemble averaging techniques in that it can produce a forecast that retains the morphological features of the precipitation structures and present the range of forecast outcomes represented by the ensemble. The production of the ensemble mean avoids the problems of smoothing that result from simple pixel or cell averaging, while producing credible sets that retain information on ensemble spread. Also in this paper, the full Bayesian Procrustes scheme is used as an object verification tool for precipitation forecasts. This is an extension of a previously presented Procrustes shape analysis based verification approach into a full Bayesian format designed to handle the verification of precipitation forecasts that match objects from an ensemble of forecast fields to a single truth image. The methodology is tested on radar reflectivity nowcasts produced in the Warning Decision Support System - Integrated Information (WDSS-II) by varying parameters in the K-means cluster tracking scheme.
NASA Astrophysics Data System (ADS)
Fahim, A. M.; Shen, R.; Yue, Z.; Di, W.; Mushtaq Shah, S.
2015-12-01
Soil moisture in the uppermost soil layer from 14 different models participating in the Coupled Model Intercomparison Project Phase 5 (CMIP5) was analyzed for the four seasons of the year. The aim of this study was to explore soil moisture variability over South Asia using a multi-model ensemble, and the relationship between summer rainfall and soil moisture in the spring and summer seasons. The GLDAS (Global Land Data Assimilation System) dataset was used to evaluate the CMIP5 ensemble-mean soil moisture in different seasons. The ensemble mean represents soil moisture well in accordance with geographical features; prominent arid regions are clearly indicated. Empirical Orthogonal Function (EOF) analysis was applied to study the variability. The first EOF component explains 17%, 16%, 11% and 11% of the variability for the spring, summer, autumn and winter seasons, respectively. The analysis reveals an increasing trend in spring soil moisture over most parts of Afghanistan, central and northwestern Pakistan, northern India and eastern to southeastern China. During summer, southwestern India exhibits the strongest negative trend while the rest of the study area shows only a slight trend (increasing or decreasing). In autumn, southwestern India is under the strongest negative loadings. During winter, northwestern parts of the study area show a decreasing trend. Summer rainfall has a very weak (negative or positive) spatial correlation with spring soil moisture, but a higher correlation with summer soil moisture. These results contribute to understanding the complex nature of land-atmosphere interactions, as soil moisture prediction plays an important role in the cycle of sinks and sources of many air pollutants. The next level of research should address filling the gaps between accurate satellite remote sensing measurements of soil moisture and land-surface modelling. The impact of soil moisture on tracking different types of pollutants will also be studied.
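The EOF decomposition used above is, in essence, an SVD of the anomaly matrix: singular vectors give the spatial patterns and principal-component time series, and squared singular values give the explained-variance fractions. A generic sketch (not the authors' processing chain; the toy field is invented):

```python
import numpy as np

def eof_analysis(field):
    """EOF analysis as an SVD of the time-anomaly matrix.

    field: (n_time, n_space). Returns the spatial EOF patterns (rows of Vt),
    the principal-component time series, and explained-variance fractions.
    """
    anom = field - field.mean(axis=0)          # remove the time mean
    U, s, Vt = np.linalg.svd(anom, full_matrices=False)
    var_frac = s**2 / np.sum(s**2)
    return Vt, U * s, var_frac

# A rank-1 toy field: one spatial pattern modulated in time, so the first
# EOF should explain essentially all of the variance.
field = np.outer(np.sin(np.arange(10.0)), np.array([1.0, 2.0, 3.0]))
eofs, pcs, var_frac = eof_analysis(field)
```

In practice, gridded fields are usually area-weighted (e.g. by the square root of cosine of latitude) before the SVD, a step this sketch omits.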
Farhan, Saima; Fahiem, Muhammad Abuzar; Tauseef, Huma
2014-01-01
Structural brain imaging is playing a vital role in the identification of changes that occur in the brain in association with Alzheimer's disease (AD). This paper proposes an automated image-processing-based approach for the identification of AD from MRI of the brain. The proposed approach is novel in that it achieves higher specificity/accuracy values despite using a smaller feature set than existing approaches. Moreover, the proposed approach is capable of identifying AD patients in early stages. The selected dataset consists of 85 age- and gender-matched individuals from the OASIS database. The features selected are the volumes of GM, WM, and CSF and the size of the hippocampus. Three different classification models (SVM, MLP, and J48) are used for the identification of patients and controls. In addition, an ensemble of classifiers, based on majority voting, is adopted to overcome the errors made by independent base classifiers. A ten-fold cross-validation strategy is applied for the evaluation of our scheme. Moreover, to evaluate the performance of the proposed approach, individual features and combinations of features are fed to the individual classifiers and the ensemble-based classifier. Using the size of the left hippocampus as the feature, the accuracy achieved with the ensemble of classifiers is 93.75%, with 100% specificity and 87.5% sensitivity.
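The majority-voting combination rule used here to merge the SVM, MLP, and J48 outputs is simple to state generically; the label matrix below is invented for illustration:

```python
import numpy as np

def majority_vote(predictions):
    """Combine hard labels from several classifiers by majority voting.

    predictions: (n_classifiers, n_samples) integer label array; ties go to
    the smallest label (np.bincount/argmax behaviour).
    """
    preds = np.asarray(predictions)
    return np.array([np.bincount(col).argmax() for col in preds.T])

# Hypothetical hard labels from three base classifiers (e.g. SVM, MLP, J48)
# on four subjects; 1 = patient, 0 = control.
p = [[1, 0, 1, 1],
     [1, 1, 0, 1],
     [0, 0, 1, 1]]
combined = majority_vote(p)  # [1 0 1 1]
```

With an odd number of classifiers there are no ties, which is one reason three base models is a common choice.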
NASA Astrophysics Data System (ADS)
Faggiani Dias, D.; Subramanian, A. C.; Zanna, L.; Miller, A. J.
2017-12-01
Sea surface temperature (SST) in the Pacific sector is well known to vary on time scales from seasonal to decadal, and the ability to predict these SST fluctuations has many societal and economic benefits. Therefore, we use a suite of statistical linear inverse models (LIMs) to understand the remote and local SST variability that influences SST predictions over the North Pacific region and to further improve our understanding of how the long-observed SST record can help guide multi-model ensemble forecasts. Observed monthly SST anomalies in the Pacific sector (between 15°S and 60°N) are used to construct different regional LIMs for seasonal to decadal prediction. The forecast skills of the LIMs are compared to those of two operational forecast systems in the North American Multi-Model Ensemble (NMME), revealing that the LIM has better skill in the Northeastern Pacific than the NMME models. The LIM is also found to have forecast skill for SST in the Tropical Pacific comparable to the NMME models. This skill, however, is highly dependent on the initialization month, with forecasts initialized during the summer having better skill than those initialized during the winter. The forecast skill of the LIM is also influenced by the verification period utilized to make the predictions, likely due to the changing character of El Niño in the 20th century. The North Pacific seems to be a source of predictability for the Tropics on seasonal to interannual time scales, while the Tropics act to worsen the skill of forecasts in the North Pacific. The data were also bandpassed into seasonal, interannual and decadal time scales to identify the relationships between time scales using the structure of the propagator matrix. For the decadal component, this coupling occurs the other way around: the Tropics seem to be a source of predictability for the Extratropics, but the Extratropics do not improve the predictability for the Tropics.
These results indicate the importance of temporal scale interactions in improving predictability on decadal timescales. Hence, we show that LIMs are not only useful as benchmarks for estimates of statistical skill, but also to isolate contributions to the forecast skills from different timescales, spatial scales or even model components.
Hwang, Kyu-Baek; Lee, In-Hee; Park, Jin-Ho; Hambuch, Tina; Choe, Yongjoon; Kim, MinHyeok; Lee, Kyungjoon; Song, Taemin; Neu, Matthew B; Gupta, Neha; Kohane, Isaac S; Green, Robert C; Kong, Sek Won
2014-08-01
As whole genome sequencing (WGS) uncovers variants associated with rare and common diseases, an immediate challenge is to minimize false-positive findings due to sequencing and variant calling errors. False positives can be reduced by combining results from orthogonal sequencing methods, but this is costly. Here, we present variant filtering approaches using logistic regression (LR) and ensemble genotyping to minimize false positives without sacrificing sensitivity. We evaluated the methods using paired WGS datasets of an extended family prepared using two sequencing platforms and a validated set of variants in NA12878. Using LR- or ensemble-genotyping-based filtering, false-negative rates were significantly reduced by 1.1- to 17.8-fold at the same levels of false discovery rates (5.4% for heterozygous and 4.5% for homozygous single nucleotide variants (SNVs); 30.0% for heterozygous and 18.7% for homozygous insertions; 25.2% for heterozygous and 16.6% for homozygous deletions) compared to filtering based on genotype quality scores. Moreover, ensemble genotyping excluded > 98% (105,080 of 107,167) of false positives while retaining > 95% (897 of 937) of true positives in de novo mutation (DNM) discovery in NA12878, and performed better than a consensus method using two sequencing platforms. Our proposed methods were effective in prioritizing phenotype-associated variants, and ensemble genotyping would be essential to minimize false-positive DNM candidates. © 2014 WILEY PERIODICALS, INC.
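The core idea of combining orthogonal pipelines, discarding calls they disagree on, can be sketched as a strict consensus filter. This is a simplification of the paper's ensemble-genotyping scheme (which votes across multiple callers), and the call dictionaries below are invented:

```python
def consensus_genotypes(calls_a, calls_b):
    """Keep only variant calls on which two independent pipelines agree.

    calls_a / calls_b: dicts mapping (chrom, pos) -> genotype string.
    Discordant or single-pipeline sites are dropped as likely false positives.
    """
    return {site: gt for site, gt in calls_a.items()
            if calls_b.get(site) == gt}

# Invented calls from two platforms; only one site is fully concordant.
platform_a = {("1", 100): "0/1", ("1", 200): "1/1", ("2", 50): "0/1"}
platform_b = {("1", 100): "0/1", ("1", 200): "0/1"}
concordant = consensus_genotypes(platform_a, platform_b)
```

The trade-off the abstract quantifies is exactly this filter's weakness: strict consensus discards true variants missed by one pipeline, which is why the authors' voting ensemble performs better than plain consensus.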
Space weather forecasting with a Multimodel Ensemble Prediction System (MEPS)
NASA Astrophysics Data System (ADS)
Schunk, R. W.; Scherliess, L.; Eccles, V.; Gardner, L. C.; Sojka, J. J.; Zhu, L.; Pi, X.; Mannucci, A. J.; Butala, M.; Wilson, B. D.; Komjathy, A.; Wang, C.; Rosen, G.
2016-07-01
The goal of the Multimodel Ensemble Prediction System (MEPS) program is to improve space weather specification and forecasting with ensemble modeling. Space weather can have detrimental effects on a variety of civilian and military systems and operations, and many of the applications pertain to the ionosphere and upper atmosphere. Space weather can affect over-the-horizon radars, HF communications, surveying and navigation systems, surveillance, spacecraft charging, power grids, pipelines, and the Federal Aviation Administration's (FAA) Wide Area Augmentation System (WAAS). Because of its importance, numerous space weather forecasting approaches are being pursued, including those involving empirical, physics-based, and data assimilation models. Clearly, if there are sufficient data, the data assimilation modeling approach is expected to be the most reliable, but different data assimilation models can produce different results. Therefore, like the meteorology community, we created a Multimodel Ensemble Prediction System (MEPS) for the Ionosphere-Thermosphere-Electrodynamics (ITE) system that is based on different data assimilation models. The MEPS ensemble is composed of seven physics-based data assimilation models for the ionosphere, ionosphere-plasmasphere, thermosphere, high-latitude ionosphere-electrodynamics, and middle to low latitude ionosphere-electrodynamics. Hence, multiple data assimilation models can be used to describe each region. A selected storm event that was reconstructed with four different data assimilation models covering the middle and low latitude ionosphere is presented and discussed. In addition, the effect of different data types on the reconstructions is shown.
A comparison of resampling schemes for estimating model observer performance with small ensembles
NASA Astrophysics Data System (ADS)
Elshahaby, Fatma E. A.; Jha, Abhinav K.; Ghaly, Michael; Frey, Eric C.
2017-09-01
In objective assessment of image quality, an ensemble of images is used to compute the 1st and 2nd order statistics of the data. Often, only a finite number of images is available, leading to the issue of statistical variability in numerical observer performance. Resampling-based strategies can help overcome this issue. In this paper, we compared different combinations of resampling schemes (the leave-one-out (LOO) and the half-train/half-test (HT/HT)) and model observers (the conventional channelized Hotelling observer (CHO), channelized linear discriminant (CLD) and channelized quadratic discriminant). Observer performance was quantified by the area under the ROC curve (AUC). For a binary classification task and for each observer, the AUC value for an ensemble size of 2000 samples per class served as a gold standard for that observer. Results indicated that each observer yielded a different performance depending on the ensemble size and the resampling scheme. For a small ensemble size, the combination [CHO, HT/HT] had more accurate rankings than the combination [CHO, LOO]. Using the LOO scheme, the CLD and CHO had similar performance for large ensembles. However, the CLD outperformed the CHO and gave more accurate rankings for smaller ensembles. As the ensemble size decreased, the performance of the [CHO, LOO] combination seriously deteriorated as opposed to the [CLD, LOO] combination. Thus, it might be desirable to use the CLD with the LOO scheme when only a smaller ensemble is available.
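The half-train/half-test scheme paired with a Hotelling-style linear observer can be sketched as follows. This is a generic illustration with synthetic Gaussian feature vectors, not the authors' simulation setup; the function names and all sizes are invented, and the pooled covariance here is a simplification of the proper within-class scatter:

```python
import numpy as np

def auc_mann_whitney(scores_pos, scores_neg):
    """AUC as the probability a positive score exceeds a negative one
    (ties count 1/2)."""
    sp = np.asarray(scores_pos)[:, None]
    sn = np.asarray(scores_neg)[None, :]
    n_pairs = sp.shape[0] * sn.shape[1]
    return float(((sp > sn).sum() + 0.5 * (sp == sn).sum()) / n_pairs)

def ht_ht_auc(pos, neg, rng):
    """Half-train/half-test: fit a Hotelling-style linear template on one
    half of each class ensemble, then score the held-out halves."""
    def split(x):
        idx = rng.permutation(len(x))
        h = len(x) // 2
        return x[idx[:h]], x[idx[h:]]
    (ptr, pte), (ntr, nte) = split(pos), split(neg)
    S = np.cov(np.vstack([ptr, ntr]).T) + 1e-6 * np.eye(pos.shape[1])
    w = np.linalg.solve(S, ptr.mean(axis=0) - ntr.mean(axis=0))
    return auc_mann_whitney(pte @ w, nte @ w)

# Synthetic 4-channel observations: signal-present class shifted in channel 0.
rng = np.random.default_rng(2)
pos = rng.normal(0.0, 1.0, (100, 4))
pos[:, 0] += 2.0
neg = rng.normal(0.0, 1.0, (100, 4))
auc_est = ht_ht_auc(pos, neg, rng)
```

Because the template never sees the test half, the AUC estimate avoids the optimistic bias of training and testing on the same samples, which is the property the paper compares against the leave-one-out alternative.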
Image Change Detection via Ensemble Learning
DOE Office of Scientific and Technical Information (OSTI.GOV)
Martin, Benjamin W; Vatsavai, Raju
2013-01-01
The concept of geographic change detection is relevant in many areas. Changes in geography can reveal much information about a particular location. For example, analysis of changes in geography can identify regions of population growth, change in land use, and potential environmental disturbance. A common way to perform change detection is to use a simple method such as differencing to detect regions of change. Though these techniques are simple, their application is often very limited. Recently, the use of machine learning methods such as neural networks for change detection has been explored with great success. In this work, we explore the use of ensemble learning methodologies for detecting changes in bitemporal synthetic aperture radar (SAR) images. Ensemble learning uses a collection of weak machine learning classifiers to create a stronger classifier with higher accuracy than the individual classifiers in the ensemble. The strength of the ensemble lies in the fact that the individual classifiers form a mixture of experts in which the final classification made by the ensemble classifier is calculated from the outputs of the individual classifiers. Our methodology leverages this aspect of ensemble learning by training collections of weak decision-tree-based classifiers to identify regions of change in SAR images collected over a region of the Staten Island, New York area during Hurricane Sandy. Preliminary studies show that the ensemble method has approximately 11.5% higher change detection accuracy than an individual classifier.
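The voting idea behind ensemble change detection can be sketched with thresholded difference maps standing in as weak detectors (the paper's ensemble uses decision trees; the thresholds, images, and function name here are invented):

```python
import numpy as np

def ensemble_change_map(img_t1, img_t2, thresholds):
    """Each threshold on the absolute difference acts as a weak detector;
    the ensemble flags change where a majority of detectors agree."""
    diff = np.abs(img_t2.astype(float) - img_t1.astype(float))
    votes = np.stack([diff > t for t in thresholds])
    return votes.mean(axis=0) > 0.5     # per-pixel majority vote

# One pixel changes between the two acquisitions.
before = np.zeros((2, 2))
after = before.copy()
after[0, 0] = 5.0
change = ensemble_change_map(before, after, thresholds=[1.0, 2.0, 3.0])
```

The majority vote is the mixture-of-experts combination the abstract describes: no single threshold has to be right everywhere for the combined map to be accurate.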
A probabilistic verification score for contours demonstrated with idealized ice-edge forecasts
NASA Astrophysics Data System (ADS)
Goessling, Helge; Jung, Thomas
2017-04-01
We introduce a probabilistic verification score for ensemble-based forecasts of contours: the Spatial Probability Score (SPS). Defined as the spatial integral of local (Half) Brier Scores, the SPS can be considered the spatial analog of the Continuous Ranked Probability Score (CRPS). Applying the SPS to idealized seasonal ensemble forecasts of the Arctic sea-ice edge in a global coupled climate model, we demonstrate that the SPS responds properly to ensemble size, bias, and spread. When applied to individual forecasts or ensemble means (or quantiles), the SPS reduces to the 'volume' of mismatch, which in the case of the ice edge corresponds to the Integrated Ice Edge Error (IIEE).
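Because the SPS is just the spatial integral of local (half) Brier scores, it fits in a few lines; the grid, member fields, unit cell area, and function name below are illustrative:

```python
import numpy as np

def spatial_probability_score(member_fields, truth_field, cell_area=1.0):
    """SPS: spatial integral of local (half) Brier scores.

    member_fields: (n_members, ny, nx) boolean fields (True = sea ice).
    truth_field: (ny, nx) boolean verifying field.
    """
    p = member_fields.mean(axis=0)        # local ice probability
    o = truth_field.astype(float)
    return float(np.sum((p - o) ** 2) * cell_area)

# Two members on a 1x2 grid; they disagree on the second cell.
members = np.array([[[True, False]],
                    [[True, True]]])
truth = np.array([[True, False]])
sps = spatial_probability_score(members, truth)  # (1-1)^2 + (0.5-0)^2 = 0.25
```

For a single deterministic field the probabilities are 0 or 1 and the score reduces to the mismatch area, matching the IIEE reduction the abstract notes.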
DOE Office of Scientific and Technical Information (OSTI.GOV)
Song, Xuehang; Chen, Xingyuan; Ye, Ming
2015-07-01
This study develops a new framework of facies-based data assimilation for characterizing the spatial distribution of hydrofacies and estimating their associated hydraulic properties. This framework couples ensemble data assimilation with a transition probability-based geostatistical model via a parameterization based on a level set function. The nature of ensemble data assimilation makes the framework efficient and flexible to integrate with various types of observation data. The transition probability-based geostatistical model keeps the updated hydrofacies distributions under geological constraints. The framework is illustrated using a two-dimensional synthetic study that estimates the hydrofacies spatial distribution and the permeability in each hydrofacies from transient head data. Our results show that the proposed framework can characterize the hydrofacies distribution and associated permeability with adequate accuracy even with limited direct measurements of hydrofacies. Our study provides a promising starting point for hydrofacies delineation in complex real problems.
Time-dependent generalized Gibbs ensembles in open quantum systems
NASA Astrophysics Data System (ADS)
Lange, Florian; Lenarčič, Zala; Rosch, Achim
2018-04-01
Generalized Gibbs ensembles have been used as powerful tools to describe the steady state of integrable many-particle quantum systems after a sudden change of the Hamiltonian. Here, we demonstrate numerically that they can be used for a much broader class of problems. We consider integrable systems in the presence of weak perturbations which break both integrability and drive the system to a state far from equilibrium. Under these conditions, we show that the steady state and the time evolution on long timescales can be accurately described by a (truncated) generalized Gibbs ensemble with time-dependent Lagrange parameters, determined from simple rate equations. We compare the numerically exact time evolutions of density matrices for small systems with a theory based on block-diagonal density matrices (diagonal ensemble) and a time-dependent generalized Gibbs ensemble containing only a small number of approximately conserved quantities, using the one-dimensional Heisenberg model with perturbations described by Lindblad operators as an example.
AUC-Maximizing Ensembles through Metalearning.
LeDell, Erin; van der Laan, Mark J; Petersen, Maya
2016-05-01
Area Under the ROC Curve (AUC) is often used to measure the performance of an estimator in binary classification problems. An AUC-maximizing classifier can have significant advantages in cases where ranking correctness is valued or if the outcome is rare. In a Super Learner ensemble, maximization of the AUC can be achieved by the use of an AUC-maximizing metalearning algorithm. We discuss an implementation of an AUC-maximization technique that is formulated as a nonlinear optimization problem. We also evaluate the effectiveness of a large number of different nonlinear optimization algorithms to maximize the cross-validated AUC of the ensemble fit. The results provide evidence that AUC-maximizing metalearners can, and often do, outperform non-AUC-maximizing metalearning methods, with respect to ensemble AUC. The results also demonstrate that as the level of imbalance in the training data increases, the Super Learner ensemble outperforms the top base algorithm by a larger degree.
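The metalearning step, choosing how to weight base learners' cross-validated predictions so as to maximize AUC, can be sketched with a 1-D grid search standing in for the nonlinear optimizers evaluated in the paper; the prediction matrix and function names are invented:

```python
import numpy as np

def auc(scores, labels):
    """AUC via the Mann-Whitney statistic (ties count 1/2)."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    gt = (pos[:, None] > neg[None, :]).sum()
    eq = (pos[:, None] == neg[None, :]).sum()
    return float((gt + 0.5 * eq) / (len(pos) * len(neg)))

def auc_metalearner(Z, y, n_grid=101):
    """Grid-search the convex weight w on two base learners' cross-validated
    predictions Z[:, 0] and Z[:, 1], maximizing ensemble AUC."""
    best_w, best_auc = 0.0, -1.0
    for w in np.linspace(0.0, 1.0, n_grid):
        a = auc(w * Z[:, 0] + (1 - w) * Z[:, 1], y)
        if a > best_auc:
            best_w, best_auc = w, a
    return best_w, best_auc

# Learner 0 ranks the four samples perfectly; learner 1 is noisy; the
# metalearner should find a weighting that recovers AUC = 1.0.
y = np.array([0, 0, 1, 1])
Z = np.array([[0.1, 0.9], [0.2, 0.1], [0.8, 0.2], [0.9, 0.8]])
w_best, auc_best = auc_metalearner(Z, y)
```

Because AUC is a non-smooth function of the weights, grid search (in low dimensions) or the derivative-free optimizers the paper evaluates are natural choices over gradient methods.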
Modelling of Heat and Moisture Loss Through NBC Ensembles
1991-11-01
This report models the heat and moisture transport through various NBC clothing ensembles. The analysis simplifies the three-dimensional physical problem of clothing on a person to a one-dimensional problem of flow through parallel layers of clothing and air. Body temperatures are calculated based on prescribed work rates, ambient conditions and clothing properties. Sweat response and respiration rates are estimated from empirical data.
Pérez-Castillo, Yunierkis; Lazar, Cosmin; Taminau, Jonatan; Froeyen, Mathy; Cabrera-Pérez, Miguel Ángel; Nowé, Ann
2012-09-24
Computer-aided drug design has become an important component of the drug discovery process. Despite the advances in this field, there is no single modeling approach that can be successfully applied across the whole range of problems faced during QSAR modeling. Feature selection and ensemble modeling are active areas of research in ligand-based drug design. Here we introduce the GA(M)E-QSAR algorithm, which combines the search and optimization capabilities of Genetic Algorithms with the simplicity of the Adaboost ensemble-based classification algorithm to solve binary classification problems. We also explore the usefulness of Meta-Ensembles trained with Adaboost and Voting schemes to further improve the accuracy, generalization, and robustness of the optimal Adaboost Single Ensemble derived from the Genetic Algorithm optimization. We evaluated the performance of our algorithm using five data sets from the literature and found that it is capable of yielding classification results similar to or better than what has been reported for these data sets, with a higher enrichment of active compounds relative to the whole actives subset when only the most active chemicals are considered. More importantly, we compared our methodology with state-of-the-art feature selection and classification approaches and found that it can provide highly accurate, robust, and generalizable models. In the case of the Adaboost Ensembles derived from the Genetic Algorithm search, the final models are quite simple since they consist of a weighted sum of the outputs of single-feature classifiers. Furthermore, the Adaboost scores can be used as a ranking criterion to prioritize chemicals for synthesis and biological evaluation after virtual screening experiments.
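The final GA(M)E-QSAR models are weighted sums of single-feature classifiers; a plain AdaBoost over decision stumps illustrates exactly that structure (the tiny dataset is invented, and this sketch omits the genetic-algorithm feature search that wraps the boosting step):

```python
import numpy as np

def adaboost_stumps(X, y, n_rounds):
    """AdaBoost over single-feature threshold classifiers ('stumps');
    labels y must be in {-1, +1}."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)
    model = []
    for _ in range(n_rounds):
        best = None
        for j in range(d):                      # exhaustive stump search
            for t in np.unique(X[:, j]):
                for s in (1, -1):
                    pred = s * np.sign(X[:, j] - t + 1e-12)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, t, s)
        err, j, t, s = best
        alpha = 0.5 * np.log((1.0 - err) / max(err, 1e-12))
        pred = s * np.sign(X[:, j] - t + 1e-12)
        w = w * np.exp(-alpha * y * pred)       # up-weight mistakes
        w = w / w.sum()
        model.append((alpha, j, t, s))
    return model

def predict_boosted(model, X):
    """The final model: a weighted sum of single-feature classifier outputs."""
    score = sum(a * s * np.sign(X[:, j] - t + 1e-12) for a, j, t, s in model)
    return np.sign(score)

# Tiny invented dataset, separable on feature 0.
X = np.array([[0.0, 5.0], [1.0, 4.0], [2.0, 3.0], [3.0, 2.0]])
y = np.array([-1, -1, 1, 1])
model = adaboost_stumps(X, y, n_rounds=3)
```

The per-stump weights `alpha` are the Adaboost scores the abstract mentions as a ranking criterion after virtual screening.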
NWP model forecast skill optimization via closure parameter variations
NASA Astrophysics Data System (ADS)
Järvinen, H.; Ollinaho, P.; Laine, M.; Solonen, A.; Haario, H.
2012-04-01
We present results of a novel approach to tune the predictive skill of numerical weather prediction (NWP) models. These models contain tunable parameters which appear in parameterization schemes of sub-grid scale physical processes. The current practice is to specify the numerical parameter values manually, based on expert knowledge. We recently developed a concept and method (QJRMS 2011) for on-line estimation of the NWP model parameters via closure parameter variations. The method, called EPPES ("Ensemble prediction and parameter estimation system"), utilizes ensemble prediction infrastructure for parameter estimation in a very cost-effective way: practically no new computations are introduced. The approach provides an algorithmic decision-making tool for model parameter optimization in operational NWP. In EPPES, statistical inference about the NWP model tunable parameters is made by (i) generating an ensemble of predictions so that each member uses different model parameter values, drawn from a proposal distribution, and (ii) feeding back the relative merits of the parameter values to the proposal distribution, based on evaluation of a suitable likelihood function against verifying observations. In this presentation, the method is first illustrated in low-order numerical tests using a stochastic version of the Lorenz-95 model, which effectively emulates the principal features of ensemble prediction systems. The EPPES method correctly detects the unknown and wrongly specified parameter values, and leads to an improved forecast skill. Second, results with an ensemble prediction system emulator, based on the ECHAM5 atmospheric GCM, show that the model tuning capability of EPPES scales up to realistic models and ensemble prediction systems. Finally, preliminary results of EPPES in the context of the ECMWF forecasting system are presented.
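The EPPES feedback loop, sample parameters from a proposal, score the ensemble members against verification, and re-fit the proposal, can be sketched with a toy scalar parameter. The Gaussian likelihood, all values, and the function name are invented for illustration and do not reproduce the actual EPPES update equations:

```python
import numpy as np

def eppes_step(mu, sigma, n_members, score_fn, rng):
    """One EPPES-style iteration: draw parameter values from the proposal,
    score each ensemble member against verification, then re-fit the
    proposal to the importance-weighted draws."""
    theta = rng.normal(mu, sigma, n_members)      # one value per member
    w = np.array([score_fn(t) for t in theta])    # forecast-skill likelihoods
    w = w / w.sum()
    mu_new = np.sum(w * theta)
    sigma_new = np.sqrt(np.sum(w * (theta - mu_new) ** 2)) + 1e-9
    return mu_new, sigma_new

# Toy 'forecast skill': a likelihood peaked at the (unknown) true value 2.0.
rng = np.random.default_rng(0)
score = lambda t: np.exp(-0.5 * (t - 2.0) ** 2)
mu, sigma = 0.0, 2.0
for _ in range(20):
    mu, sigma = eppes_step(mu, sigma, 50, score, rng)
```

Each iteration reuses forecasts that an operational ensemble system produces anyway, which is the "practically no new computations" property the abstract emphasizes.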
Ensemble Downscaling of Winter Seasonal Forecasts: The MRED Project
NASA Astrophysics Data System (ADS)
Arritt, R. W.; Mred Team
2010-12-01
The Multi-Regional climate model Ensemble Downscaling (MRED) project is a multi-institutional project that is producing large ensembles of downscaled winter seasonal forecasts from coupled atmosphere-ocean seasonal prediction models. Eight regional climate models are each downscaling 15-member ensembles from the National Centers for Environmental Prediction (NCEP) Climate Forecast System (CFS) and the new NASA seasonal forecast system based on the GEOS5 atmospheric model coupled with the MOM4 ocean model. This produces 240-member ensembles, i.e., 8 regional models × 15 global ensemble members × 2 global models, for each winter season (December-April) of 1982-2003. Results to date show that the combined global-regional downscaled forecasts have greatest skill for seasonal precipitation anomalies during strong El Niño events such as 1982-83 and 1997-98. Ensemble means of area-averaged seasonal precipitation for the regional models generally track the corresponding results for the global model, though there is considerable inter-model variability amongst the regional models. For seasons and regions where area-mean precipitation is accurately simulated, the regional models add value by extracting greater spatial detail from the global forecasts, mainly due to better resolution of terrain in the regional models. Our results also emphasize that an ensemble approach is essential to realizing the added value of the combined global-regional modeling system.
A Single Column Model Ensemble Approach Applied to the TWP-ICE Experiment
DOE Office of Scientific and Technical Information (OSTI.GOV)
Davies, Laura; Jakob, Christian; Cheung, K.
2013-06-27
Single column models (SCMs) are useful testbeds for investigating the parameterisation schemes of numerical weather prediction and climate models. The usefulness of SCM simulations is limited, however, by the accuracy of the prescribed best-estimate large-scale data. One method to address this uncertainty is to perform ensemble simulations of the SCM. This study first derives an ensemble of large-scale data for the Tropical Warm Pool International Cloud Experiment (TWP-ICE) based on an estimate of a possible source of error in the best-estimate product. These data are then used to carry out simulations with 11 SCMs and 2 cloud-resolving models (CRMs). Best-estimate simulations are also performed. All models show that moisture-related variables are close to observations and there are limited differences between the best-estimate and ensemble mean values. The models, however, show different sensitivities to changes in the forcing, particularly when weakly forced. The ensemble simulations highlight important differences in the moisture budget between the SCMs and CRMs. Systematic differences are also apparent in the ensemble mean vertical structure of cloud variables. The ensemble is further used to investigate relations between cloud variables and precipitation, identifying large differences between CRMs and SCMs. This study highlights that additional information can be gained by performing ensemble simulations, enhancing the information derived from models beyond the more traditional single best-estimate simulation.
Insights into the deterministic skill of air quality ensembles ...
Simulations from chemical weather models are subject to uncertainties in the input data (e.g. emission inventory, initial and boundary conditions) as well as those intrinsic to the model (e.g. physical parameterization, chemical mechanism). Multi-model ensembles can improve the forecast skill, provided that certain mathematical conditions are fulfilled. In this work, four ensemble methods were applied to two different datasets, and their performance was compared for ozone (O3), nitrogen dioxide (NO2) and particulate matter (PM10). Apart from the unconditional ensemble average, the approach behind the other three methods relies on adding optimum weights to members or constraining the ensemble to those members that meet certain conditions in time or frequency domain. The two different datasets were created for the first and second phase of the Air Quality Model Evaluation International Initiative (AQMEII). The methods are evaluated against ground level observations collected from the EMEP (European Monitoring and Evaluation Programme) and AirBase databases. The goal of the study is to quantify to what extent we can extract predictable signals from an ensemble with superior skill over the single models and the ensemble mean. Verification statistics show that the deterministic models simulate better O3 than NO2 and PM10, linked to different levels of complexity in the represented processes. The unconditional ensemble mean achieves higher skill compared to each stati
Reliable probabilities through statistical post-processing of ensemble predictions
NASA Astrophysics Data System (ADS)
Van Schaeybroeck, Bert; Vannitsem, Stéphane
2013-04-01
We develop post-processing or calibration approaches based on linear regression that make ensemble forecasts more reliable. First, we enforce climatological reliability in the sense that the total variability of the prediction equals the variability of the observations. Second, we impose ensemble reliability such that the spread of the observations around the ensemble mean coincides with that of the ensemble members. In general the attractors of the model and of reality are inhomogeneous. Therefore the ensemble spread displays a variability that standard post-processing methods do not take into account. We overcome this by weighting the ensemble by a variable error. The approaches are tested in the context of the Lorenz 96 model (Lorenz 1996). The forecasts become more reliable at short lead times, as reflected by a flatter rank histogram. Our best method turns out to be superior to well-established methods like EVMOS (Van Schaeybroeck and Vannitsem, 2011) and Nonhomogeneous Gaussian Regression (Gneiting et al., 2005). References [1] Gneiting, T., Raftery, A. E., Westveld, A., Goldman, T., 2005: Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation. Mon. Weather Rev. 133, 1098-1118. [2] Lorenz, E. N., 1996: Predictability - a problem partly solved. Proceedings, Seminar on Predictability, ECMWF. 1, 1-18. [3] Van Schaeybroeck, B., and S. Vannitsem, 2011: Post-processing through linear regression, Nonlin. Processes Geophys., 18, 147.
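The first constraint (total forecast variability equal to observed variability) amounts to a variance-matching linear map, in the spirit of EVMOS. A minimal sketch, assuming synthetic training data and not the authors' full member-weighting scheme:

```python
import numpy as np

def variance_matching_calibration(ens_train, obs_train):
    """Fit a linear map y = a + b*x on the training ensemble mean such
    that the calibrated forecasts reproduce the observed climatological
    mean and total variability (an EVMOS-like sketch)."""
    x = ens_train.mean(axis=1)            # ensemble-mean forecast per case
    b = np.std(obs_train) / np.std(x)     # match total variability
    a = np.mean(obs_train) - b * np.mean(x)  # match climatological mean
    return lambda ens: a + b * ens        # apply to every member

rng = np.random.default_rng(1)
truth = rng.normal(0.0, 2.0, size=500)
# Raw ensemble: damped signal, constant bias, under-dispersive spread
raw = 0.5 * truth[:, None] + 1.0 + rng.normal(0.0, 0.2, size=(500, 20))

calibrate = variance_matching_calibration(raw, truth)
cal_mean = calibrate(raw).mean(axis=1)
```

By construction the calibrated ensemble mean has exactly the observed climatological mean and standard deviation on the training sample; the paper's second constraint (ensemble reliability) additionally adjusts the spread around that mean.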
A Simple Approach to Account for Climate Model Interdependence in Multi-Model Ensembles
NASA Astrophysics Data System (ADS)
Herger, N.; Abramowitz, G.; Angelil, O. M.; Knutti, R.; Sanderson, B.
2016-12-01
Multi-model ensembles are an indispensable tool for future climate projection and its uncertainty quantification. Ensembles containing multiple climate models generally have increased skill, consistency and reliability. Due to the lack of agreed-on alternatives, most scientists use the equally-weighted multi-model mean, subscribing to model democracy ("one model, one vote"). Different research groups are known to share sections of code, parameterizations in their model, literature, or even whole model components. Therefore, individual model runs do not represent truly independent estimates. Ignoring this dependence structure might lead to a false model consensus and to wrong estimates of uncertainty and of the effective number of independent models. Here, we present a way to partially address this problem by selecting a subset of CMIP5 model runs so that its climatological mean minimizes the RMSE compared to a given observation product. Due to the cancelling out of errors, regional biases in the ensemble mean are reduced significantly. Using a model-as-truth experiment we demonstrate that this reduction of regional biases persists into the future and that we are not fitting noise, thus providing improved observationally-constrained projections of the 21st century. The optimally selected ensemble shows significantly higher global mean surface temperature projections than the original ensemble, in which all model runs are considered. Moreover, the spread is decreased well beyond that expected from the decreased ensemble size. Several previous studies have recommended an ensemble selection approach based on performance ranking of the model runs. Here, we show that this approach can perform even worse than randomly selecting ensemble members and can thus be harmful. We suggest that accounting for interdependence in the ensemble selection process is a necessary step for robust projections for use in impact assessments, adaptation and mitigation of climate change.
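The selection step can be illustrated at toy scale: exhaustively search subsets of model runs for the one whose ensemble mean minimizes RMSE against observations, so that opposing model biases cancel. All names, sizes and the two-family bias structure below are illustrative, not the CMIP5 setup:

```python
import numpy as np
from itertools import combinations

def best_subset(models, obs, k):
    """Pick the k model runs whose climatological ensemble mean minimizes
    RMSE against the observation product. Brute force is fine at toy
    scale; CMIP5-sized problems need a smarter search."""
    best_idx, best_rmse = None, np.inf
    for idx in combinations(range(models.shape[0]), k):
        rmse = np.sqrt(np.mean((models[list(idx)].mean(axis=0) - obs) ** 2))
        if rmse < best_rmse:
            best_idx, best_rmse = idx, rmse
    return best_idx, best_rmse

rng = np.random.default_rng(2)
obs = rng.normal(size=100)                 # "observed" climatological field
shared_bias = rng.normal(size=100)         # error pattern shared within families
signs = np.array([1] * 6 + [-1] * 4)       # two dependent model "families"
models = obs + signs[:, None] * shared_bias + 0.3 * rng.normal(size=(10, 100))

subset, rmse_subset = best_subset(models, obs, k=4)
rmse_all = np.sqrt(np.mean((models.mean(axis=0) - obs) ** 2))
# a balanced subset cancels the family biases and beats the full mean
```

Because the full 10-member mean is dominated by the over-represented family, a balanced 4-member subset achieves a lower RMSE, which is the error-cancellation effect the abstract describes.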
NASA Astrophysics Data System (ADS)
Vervatis, Vassilios; De Mey, Pierre; Ayoub, Nadia; Kailas, Marios; Sofianos, Sarantis
2017-04-01
The project entitled Stochastic Coastal/Regional Uncertainty Modelling (SCRUM) aims at strengthening CMEMS in the areas of ocean uncertainty quantification, ensemble consistency verification and ensemble data assimilation. The project has been initiated by the University of Athens and LEGOS/CNRS research teams, in the framework of CMEMS Service Evolution. The work is based on stochastic modelling of ocean physics and biogeochemistry in the Bay of Biscay, on an identical sub-grid configuration of the IBI-MFC system in its latest CMEMS operational version V2. In a first step, we use a perturbed-tendencies scheme to generate ensembles describing uncertainties in the open ocean and on the shelf, focusing on upper ocean processes. In a second step, we introduce two methodologies (i.e. rank histograms and array modes) aimed at checking the consistency of the above ensembles with respect to TAC data and arrays. Preliminary results highlight that wind uncertainties dominate all other atmosphere-ocean sources of model error. The ensemble spread in medium-range ensembles is approximately 0.01 m for SSH and 0.15 °C for SST, though these values vary with season and across shelf regions. Ecosystem model uncertainties emerging from perturbations in physics appear to be moderately larger than those from perturbing the concentrations of the biogeochemical compartments, resulting in a total chlorophyll spread of about 0.01 mg m-3. First consistency results show that the model ensemble and the pseudo-ensemble of OSTIA (L4) observed SSTs appear to exhibit nonzero joint probabilities with each other since their error vicinities overlap. Rank histograms show that the model ensemble is initially under-dispersive, though results improve in the context of seasonal-range ensembles.
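The first of the two consistency checks, the rank (Talagrand) histogram, can be computed directly from an ensemble: the observation's rank among the sorted members should be uniformly distributed for a consistent ensemble, while a U shape signals under-dispersion. A generic sketch with synthetic data, not the SCRUM fields:

```python
import numpy as np

def rank_histogram(ensemble, obs):
    """Histogram of observation ranks within the ordered ensemble.
    ensemble: (n_cases, n_members), obs: (n_cases,)."""
    # rank = number of members below the observation, in 0..n_members
    ranks = (ensemble < obs[:, None]).sum(axis=1)
    n_members = ensemble.shape[1]
    return np.bincount(ranks, minlength=n_members + 1)

rng = np.random.default_rng(3)
n_cases, n_members = 5000, 9
# Consistent case: members and "truth" drawn from the same distribution,
# so all 10 rank bins should be roughly equally populated
truth = rng.normal(size=n_cases)
members = rng.normal(size=(n_cases, n_members))
hist = rank_histogram(members, truth)
```

An under-dispersive ensemble (members drawn with too small a spread) would instead pile observations into the two outermost bins, which is the behaviour the abstract reports for the initial medium-range ensembles.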
Ehrhardt, Fiona; Soussana, Jean-François; Bellocchi, Gianni; Grace, Peter; McAuliffe, Russel; Recous, Sylvie; Sándor, Renáta; Smith, Pete; Snow, Val; de Antoni Migliorati, Massimiliano; Basso, Bruno; Bhatia, Arti; Brilli, Lorenzo; Doltra, Jordi; Dorich, Christopher D; Doro, Luca; Fitton, Nuala; Giacomini, Sandro J; Grant, Brian; Harrison, Matthew T; Jones, Stephanie K; Kirschbaum, Miko U F; Klumpp, Katja; Laville, Patricia; Léonard, Joël; Liebig, Mark; Lieffering, Mark; Martin, Raphaël; Massad, Raia S; Meier, Elizabeth; Merbold, Lutz; Moore, Andrew D; Myrgiotis, Vasileios; Newton, Paul; Pattey, Elizabeth; Rolinski, Susanne; Sharp, Joanna; Smith, Ward N; Wu, Lianhai; Zhang, Qing
2018-02-01
Simulation models are extensively used to predict agricultural productivity and greenhouse gas emissions. However, the uncertainties of (reduced) model ensemble simulations have not been assessed systematically for variables affecting food security and climate change mitigation, within multi-species agricultural contexts. We report an international model comparison and benchmarking exercise, showing the potential of multi-model ensembles to predict productivity and nitrous oxide (N2O) emissions for wheat, maize, rice and temperate grasslands. Using a multi-stage modelling protocol, from blind simulations (stage 1) to partial (stages 2-4) and full calibration (stage 5), 24 process-based biogeochemical models were assessed individually or as an ensemble against long-term experimental data from four temperate grassland and five arable crop rotation sites spanning four continents. Comparisons were performed by reference to the experimental uncertainties of observed yields and N2O emissions. Results showed that across sites and crop/grassland types, 23%-40% of the uncalibrated individual models were within two standard deviations (SD) of observed yields, while 42% (rice) to 96% (grasslands) of the models were within 1 SD of observed N2O emissions. At stage 1, ensembles formed by the three models with the lowest prediction errors predicted both yields and N2O emissions within experimental uncertainties for 44% and 33% of the crop and grassland growth cycles, respectively. Partial model calibration (stages 2-4) markedly reduced prediction errors of the full model ensemble E-median for crop grain yields (from 36% at stage 1 down to 4% on average) and grassland productivity (from 44% to 27%), and to a lesser and more variable extent for N2O emissions. Yield-scaled N2O emissions (N2O emissions divided by crop yields) were ranked accurately by three-model ensembles across crop species and field sites.
The potential of using process-based model ensembles to predict productivity and N2O emissions jointly at the field scale is discussed. © 2017 John Wiley & Sons Ltd.
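The benchmarking criterion (fraction of models within k experimental standard deviations of the observation) and the E-median combination reduce to two one-liners. A generic sketch with made-up values, not the exercise's data:

```python
import numpy as np

def within_k_sd(model_values, obs_mean, obs_sd, k):
    """Fraction of individual model predictions falling within k standard
    deviations of the observed value (the within-SD benchmarking idea)."""
    return np.mean(np.abs(model_values - obs_mean) <= k * obs_sd)

def ensemble_e_median(model_values):
    """E-median: the median across the model ensemble, more robust to
    outlier models than the ensemble mean."""
    return np.median(model_values, axis=0)

rng = np.random.default_rng(13)
obs_yield, obs_sd = 8.0, 0.8             # t/ha, experimental uncertainty (made up)
models = rng.normal(8.5, 1.5, size=24)   # 24 hypothetical uncalibrated models
frac2sd = within_k_sd(models, obs_yield, obs_sd, k=2)
e_med = ensemble_e_median(models)
```

The E-median's robustness is what lets partial calibration shrink the ensemble prediction error even when a few member models remain poorly calibrated.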
NASA Astrophysics Data System (ADS)
Sokol, Zbyněk; Mejsnar, Jan; Pop, Lukáš; Bližňák, Vojtěch
2017-09-01
A new method for the probabilistic nowcasting of instantaneous rain rates (ENS), based on the ensemble technique and extrapolation along Lagrangian trajectories of the current radar reflectivity, is presented. Assuming inaccurate forecasts of the trajectories, an ensemble of precipitation forecasts is calculated and used to estimate the probability that rain rates will exceed a given threshold at a given grid point. Although the extrapolation neglects the growth and decay of precipitation, their impact on the probability forecast is taken into account by calibrating the forecasts using the reliability component of the Brier score (BS). ENS forecasts the probability that rain rates will exceed thresholds of 0.1, 1.0 and 3.0 mm/h in squares of 3 km by 3 km. The lead times were up to 60 min, and the forecast accuracy was measured by the BS. The ENS forecasts were compared with two other methods: a combined method (COM) and a neighbourhood method (NEI). NEI considered the extrapolated values in the square neighbourhood of 5 by 5 grid points of the point of interest as ensemble members, and the COM ensemble comprised the united ensemble members of ENS and NEI. The results showed that the calibration technique significantly reduces the bias of the probability forecasts by including additional uncertainties that correspond to the processes neglected during extrapolation. In addition, the calibration can also be used to find the limits of the maximum lead times for which the forecasting method is useful. We found that ENS is useful for lead times up to 60 min for thresholds of 0.1 and 1 mm/h and approximately 30 to 40 min for a threshold of 3 mm/h. We also found that a reasonable ensemble size is 100 members, which provided better scores than ensembles with 10, 25 and 50 members. In terms of the BS, the best results were obtained by ENS and COM, which are comparable. However, ENS is better calibrated and thus preferable.
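The ensemble-to-probability step and the Brier score used for verification reduce to a few lines. A generic sketch with synthetic rain-rate fields; the gamma distribution and grid sizes are illustrative, not taken from the paper:

```python
import numpy as np

def exceedance_probability(ensemble_rain, threshold):
    """Probability that the rain rate exceeds a threshold, estimated as
    the fraction of ensemble members exceeding it at each grid point."""
    return (ensemble_rain > threshold).mean(axis=0)

def brier_score(prob, occurred):
    """Brier score of a probability forecast against binary outcomes
    (0 = perfect; lower is better)."""
    return np.mean((prob - occurred) ** 2)

rng = np.random.default_rng(4)
# 100 extrapolation members on a small grid (illustrative sizes)
members = rng.gamma(shape=0.5, scale=2.0, size=(100, 20, 20))
p = exceedance_probability(members, threshold=1.0)
outcome = (rng.gamma(0.5, 2.0, size=(20, 20)) > 1.0).astype(float)
bs = brier_score(p, outcome)
```

The paper's calibration step then inflates these raw vote-count probabilities so that the reliability component of the BS accounts for the growth and decay processes the pure extrapolation neglects.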
Hybrid fuzzy cluster ensemble framework for tumor clustering from biomolecular data.
Yu, Zhiwen; Chen, Hantao; You, Jane; Han, Guoqiang; Li, Le
2013-01-01
Cancer class discovery using biomolecular data is one of the most important tasks for cancer diagnosis and treatment. Tumor clustering from gene expression data provides a new way to perform cancer class discovery. Most existing research adopts single clustering algorithms to perform tumor clustering from biomolecular data, which lacks robustness, stability, and accuracy. To further improve the performance of tumor clustering from biomolecular data, we introduce fuzzy theory into the cluster ensemble framework and propose four kinds of hybrid fuzzy cluster ensemble frameworks (HFCEF), named HFCEF-I, HFCEF-II, HFCEF-III, and HFCEF-IV, respectively, to identify samples that belong to different types of cancers. The difference between HFCEF-I and HFCEF-II is that they adopt different ensemble generator approaches to generate the set of fuzzy matrices in the ensemble. Specifically, HFCEF-I applies the affinity propagation algorithm (AP) to perform clustering on the sample dimension and generates a set of fuzzy matrices in the ensemble based on the fuzzy membership function and base samples selected by AP. HFCEF-II adopts AP to perform clustering on the attribute dimension, generates a set of subspaces, and obtains a set of fuzzy matrices in the ensemble by performing fuzzy c-means on the subspaces. Compared with HFCEF-I and HFCEF-II, HFCEF-III and HFCEF-IV combine the characteristics of both: HFCEF-III combines HFCEF-I and HFCEF-II in a serial way, while HFCEF-IV integrates HFCEF-I and HFCEF-II in a concurrent way. The HFCEFs adopt suitable consensus functions, such as the fuzzy c-means algorithm or the normalized cut algorithm (Ncut), to summarize the generated fuzzy matrices and obtain the final results.
The experiments on real data sets from the UCI machine learning repository and on cancer gene expression profiles illustrate that 1) the proposed hybrid fuzzy cluster ensemble frameworks work well on real data sets, especially biomolecular data, and 2) the proposed approaches are able to provide more robust, stable, and accurate results when compared with state-of-the-art single clustering algorithms and traditional cluster ensemble approaches.
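A common way to combine an ensemble of fuzzy partitions, related in spirit to the consensus step above though not the exact HFCEF pipeline, is a soft co-association matrix followed by a simple graph cut; the thresholded connected-components cut below is a stand-in for Ncut or fuzzy c-means, and the toy data are invented:

```python
import numpy as np

def fuzzy_coassociation(memberships):
    """Average soft co-association matrix from an ensemble of fuzzy
    membership matrices (rows sum to 1); entry (i, j) is the mean
    probability that samples i and j fall in the same cluster."""
    return sum(U @ U.T for U in memberships) / len(memberships)

def cut_components(S, tau=0.4):
    """Consensus clusters as connected components of the thresholded
    co-association graph (a simple stand-in for a consensus function)."""
    labels, cur = -np.ones(S.shape[0], dtype=int), 0
    for i in range(S.shape[0]):
        if labels[i] >= 0:
            continue
        stack, labels[i] = [i], cur
        while stack:
            j = stack.pop()
            for k in np.flatnonzero((S[j] > tau) & (labels < 0)):
                labels[k] = cur
                stack.append(k)
        cur += 1
    return labels

# Toy data: two well-separated sample groups, three ensemble members with
# noisy soft memberships; member-specific label permutations do not
# affect the co-association matrix
rng = np.random.default_rng(10)
true = np.array([0] * 10 + [1] * 10)
members = []
for _ in range(3):
    U = np.full((20, 2), 0.1)
    U[np.arange(20), true] = 0.9
    if rng.random() < 0.5:
        U = U[:, ::-1]          # random cluster relabelling per member
    members.append(U)
labels = cut_components(fuzzy_coassociation(members))
```

The co-association trick sidesteps the cluster label-matching problem entirely, which is why it is a popular consensus representation for cluster ensembles.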
NASA Astrophysics Data System (ADS)
Baker, Allison H.; Hu, Yong; Hammerling, Dorit M.; Tseng, Yu-heng; Xu, Haiying; Huang, Xiaomeng; Bryan, Frank O.; Yang, Guangwen
2016-07-01
The Parallel Ocean Program (POP), the ocean model component of the Community Earth System Model (CESM), is widely used in climate research. Most current work in CESM-POP focuses on improving the model's efficiency or accuracy, such as improving numerical methods, advancing parameterizations, porting to new architectures, or increasing parallelism. Since ocean dynamics are chaotic in nature, achieving bit-for-bit (BFB) identical results in ocean solutions cannot be guaranteed for even tiny code modifications, and determining whether modifications are admissible (i.e., statistically consistent with the original results) is non-trivial. In recent work, an ensemble-based statistical approach was shown to work well for software verification (i.e., quality assurance) on atmospheric model data. The general idea of ensemble-based statistical consistency testing is to use a quantitative measurement of the variability of an ensemble of simulations as a metric with which to compare future simulations and make a determination of statistical distinguishability. The capability to determine consistency without BFB results boosts model confidence and provides the flexibility needed, for example, for more aggressive code optimizations and the use of heterogeneous execution environments. Since ocean and atmosphere models have differing characteristics in terms of dynamics, spatial variability, and timescales, we present a new statistical method to evaluate ocean model simulation data that evaluates ensemble means and deviations in a spatial manner. In particular, the statistical distribution from an ensemble of CESM-POP simulations is used to determine the standard score of any new model solution at each grid point. Then the percentage of points that have scores greater than a specified threshold indicates whether the new model simulation is statistically distinguishable from the ensemble simulations. Both ensemble size and composition are important.
Our experiments indicate that the new POP ensemble consistency test (POP-ECT) tool is capable of distinguishing cases that should be statistically consistent with the ensemble from those that should not, and provides a simple, objective and systematic way to detect errors in CESM-POP due to the hardware or software stack, positively contributing to quality assurance for the CESM-POP code.
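The per-grid-point standard-score idea can be sketched as follows; the function name, thresholds and field sizes are illustrative, not the POP-ECT defaults:

```python
import numpy as np

def ensemble_consistency_test(ensemble_fields, new_field,
                              z_thresh=3.0, fail_frac=0.05):
    """Standard-score the new solution against the ensemble mean and
    spread at each grid point, then flag the run if too many points
    exceed the threshold (a sketch of the consistency-testing idea)."""
    mu = ensemble_fields.mean(axis=0)
    sd = ensemble_fields.std(axis=0, ddof=1)
    z = np.abs(new_field - mu) / np.where(sd > 0, sd, np.inf)
    frac_exceeding = (z > z_thresh).mean()
    return frac_exceeding <= fail_frac, frac_exceeding

rng = np.random.default_rng(5)
ens = rng.normal(15.0, 0.5, size=(30, 40, 40))   # e.g. an SST-like ensemble
ok_run = rng.normal(15.0, 0.5, size=(40, 40))    # statistically consistent run
bad_run = rng.normal(16.5, 0.5, size=(40, 40))   # biased run (e.g. a code bug)

passed_ok, _ = ensemble_consistency_test(ens, ok_run)
passed_bad, _ = ensemble_consistency_test(ens, bad_run)
```

The appeal of the approach is exactly what the abstract states: the biased run fails the test even though no bit-for-bit reference solution exists.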
NASA Astrophysics Data System (ADS)
Singh, Shailesh Kumar
2014-05-01
Streamflow forecasts are essential for making critical decisions on the optimal allocation of water supplies for various demands, including irrigation for agriculture, habitat for fisheries, hydropower production and flood warning. The major objective of this study is to explore Ensemble Streamflow Prediction (ESP) based forecasts in New Zealand catchments and to highlight the present seasonal flow forecasting capability of the National Institute of Water and Atmospheric Research (NIWA). In this study a probabilistic forecast framework for ESP is presented. The basic assumption in ESP is that future weather patterns were experienced historically. Hence, past forcing data can be used with the current initial conditions to generate an ensemble of predictions. Small differences in initial conditions can result in large differences in the forecast. The initial state of the catchment is obtained by continuously running the model up to the current time; this initial state is then used with past forcing data to generate an ensemble of flows for the future. The approach taken here is to run the TopNet hydrological model with a range of past forcing data (precipitation, temperature, etc.) from the current initial conditions. The collection of runs is called the ensemble. ESP gives probabilistic forecasts for flow. From the ensemble members, probability distributions can be derived. These probability distributions capture part of the intrinsic uncertainty in weather or climate. An ensemble streamflow prediction system that provides probabilistic hydrological forecasts with lead times up to 3 months is presented for the Rangitata, Ahuriri, Hooker and Jollie rivers in the South Island of New Zealand. ESP-based seasonal forecasts have better skill than climatology. This system can provide better overall information for holistic water resource management.
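The ESP recipe, one member per historical forcing year, all started from today's initial state, can be sketched with a toy linear-reservoir model standing in for TopNet; all numbers are invented:

```python
import numpy as np

def esp_forecast(step, init_state, historical_forcings, horizon):
    """Run the same hydrological model from the current initial state
    once per historical forcing year; the spread of the resulting flow
    traces is the probabilistic forecast."""
    traces = []
    for forcing in historical_forcings:     # one ensemble member per past year
        state, flows = init_state, []
        for t in range(horizon):
            state, flow = step(state, forcing[t])
            flows.append(flow)
        traces.append(flows)
    return np.array(traces)                 # shape (n_members, horizon)

def linear_reservoir(storage, precip, k=0.1):
    """Toy model: storage is fed by precipitation and drains at rate k."""
    storage = storage + precip
    flow = k * storage
    return storage - flow, flow

rng = np.random.default_rng(6)
forcings = rng.gamma(2.0, 1.5, size=(20, 90))   # 20 past years, 90 days each
ens = esp_forecast(linear_reservoir, init_state=50.0,
                   historical_forcings=forcings, horizon=90)
# Probabilistic forecast: e.g. 10th/50th/90th flow percentiles per day
q10, q50, q90 = np.percentile(ens, [10, 50, 90], axis=0)
```

Because every member shares the same initial catchment state, the spread at short lead times is small and grows with lead time, which is why ESP skill over climatology is largest in the first weeks of the 3-month horizon.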
Multivariate localization methods for ensemble Kalman filtering
NASA Astrophysics Data System (ADS)
Roh, S.; Jun, M.; Szunyogh, I.; Genton, M. G.
2015-05-01
In ensemble Kalman filtering (EnKF), the small number of ensemble members that is feasible to use in a practical data assimilation application leads to sampling variability of the estimates of the background error covariances. The standard approach to reducing the effects of this sampling variability, which has also been found to be highly efficient in improving the performance of EnKF, is the localization of the estimates of the covariances. One family of localization techniques is based on taking the Schur (entry-wise) product of the ensemble-based sample covariance matrix and a correlation matrix whose entries are obtained by the discretization of a distance-dependent correlation function. While the proper definition of the localization function for a single state variable has been extensively investigated, a rigorous definition of the localization function for multiple state variables has seldom been considered. This paper introduces two strategies for the construction of localization functions for multiple state variables. The proposed localization functions are tested in experiments that assimilate simulated observations into the bivariate Lorenz 95 model.
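For a single state variable, the standard construction pairs the sample covariance with a compactly supported correlation function such as Gaspari-Cohn via the Schur product. A self-contained sketch on a 1-D periodic toy domain (the ensemble size and length scales are illustrative):

```python
import numpy as np

def gaspari_cohn(r):
    """Gaspari-Cohn fifth-order piecewise rational correlation function,
    compactly supported on r in [0, 2], r = distance / localization radius."""
    r = np.abs(r)
    out = np.zeros_like(r)
    m1 = r <= 1.0
    m2 = (r > 1.0) & (r < 2.0)
    x = r[m1]
    out[m1] = -0.25*x**5 + 0.5*x**4 + 0.625*x**3 - (5/3)*x**2 + 1.0
    x = r[m2]
    out[m2] = (1/12)*x**5 - 0.5*x**4 + 0.625*x**3 + (5/3)*x**2 - 5.0*x + 4.0 - (2/3)/x
    return out

def localize(sample_cov, distances, radius):
    """Schur (entry-wise) product of the ensemble sample covariance with
    the distance-dependent localization matrix."""
    return sample_cov * gaspari_cohn(distances / radius)

rng = np.random.default_rng(7)
n = 40
i = np.arange(n)
dist = np.minimum(np.abs(i[:, None] - i[None, :]),
                  n - np.abs(i[:, None] - i[None, :]))   # periodic distance
B = np.exp(-(dist / 3.0) ** 2)
C_true = B @ B.T / n                 # a guaranteed positive semidefinite truth
ens = rng.multivariate_normal(np.zeros(n), C_true, size=10)
P = np.cov(ens.T)                    # noisy 10-member sample covariance
P_loc = localize(P, dist, radius=5.0)
```

With only 10 members, `P` contains spurious long-range correlations; the Schur product zeroes every entry beyond twice the localization radius while leaving the diagonal untouched. The paper's contribution is how to choose this function consistently when the state holds multiple physical variables.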
Landsgesell, Jonas; Holm, Christian; Smiatek, Jens
2017-02-14
We present a novel method for the study of weak polyelectrolytes and general acid-base reactions in molecular dynamics and Monte Carlo simulations. The approach combines the advantages of the reaction ensemble and the Wang-Landau sampling method. Deprotonation and protonation reactions are simulated explicitly with the help of the reaction ensemble method, while accurate sampling of the corresponding phase space is achieved by the Wang-Landau approach. The combination of both techniques provides sufficient statistical accuracy such that meaningful estimates for the density of states and the partition sum can be obtained. From these estimates, several thermodynamic observables such as the heat capacity or reaction free energies can be calculated. We demonstrate that the computation times for the calculation of titration curves with high statistical accuracy can be significantly decreased compared to the original reaction ensemble method. The applicability of our approach is validated by the study of weak polyelectrolytes and their thermodynamic properties.
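The Wang-Landau half of the combination can be illustrated on a toy discrete system whose density of states is known exactly: six microstates on three energy levels with degeneracies 3:2:1. This is a generic flat-histogram sketch, not the polyelectrolyte setup, and all control parameters are illustrative:

```python
import numpy as np

def wang_landau(state_energy, n_levels, log_f_final=1e-3, rng=None):
    """Flat-histogram Wang-Landau walk estimating the density of states
    g(E). The log modification factor is halved whenever the energy
    histogram is roughly flat, until it drops below log_f_final."""
    rng = rng or np.random.default_rng(11)
    n_states = len(state_energy)
    log_g = np.zeros(n_levels)
    hist = np.zeros(n_levels)
    log_f, state = 1.0, 0
    while log_f > log_f_final:
        for _ in range(5000):
            new = int(rng.integers(n_states))   # symmetric uniform proposal
            e_old, e_new = state_energy[state], state_energy[new]
            # accept with probability min(1, g(E_old) / g(E_new))
            if log_g[e_old] >= log_g[e_new] or \
                    rng.random() < np.exp(log_g[e_old] - log_g[e_new]):
                state = new
            e = state_energy[state]
            log_g[e] += log_f                   # penalize the visited level
            hist[e] += 1
        if hist.min() > 0.8 * hist.mean():      # flatness criterion
            hist[:] = 0
            log_f /= 2.0
    return np.exp(log_g - log_g.min())

# Six microstates on three energy levels: exact degeneracies are 3, 2, 1
state_energy = np.array([0, 0, 0, 1, 1, 2])
g = wang_landau(state_energy, n_levels=3)
```

Once `g(E)` is known, the partition sum and observables such as the heat capacity follow by direct summation over energy levels, which is the payoff the abstract describes for the combined reaction-ensemble scheme.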
Minimal ensemble based on subset selection using ECG to diagnose categories of CAN.
Abawajy, Jemal; Kelarev, Andrei; Yi, Xun; Jelinek, Herbert F
2018-07-01
Early diagnosis of cardiac autonomic neuropathy (CAN) is critical for reversing or slowing its progression and preventing complications. Diagnostic accuracy or precision is one of the core requirements of CAN detection. As the standard Ewing battery tests suffer from a number of shortcomings, research in automating and improving the early detection of CAN has recently received serious attention, both in identifying additional clinical variables and in designing advanced ensembles of classifiers to improve the accuracy or precision of CAN diagnostics. Although large ensembles are commonly proposed for the automated diagnosis of CAN, they are characterized by slow processing speed and computational complexity. This paper applies ECG features and proposes a new ensemble-based approach for the diagnosis of CAN progression. We introduce a Minimal Ensemble Based On Subset Selection (MEBOSS) for the diagnosis of all categories of CAN, including early, definite and atypical CAN. MEBOSS is based on a novel multi-tier architecture applying classifier subset selection as well as training subset selection during several steps of its operation. Our experiments determined the diagnostic accuracy or precision obtained in 5 × 2 cross-validation for various options employed in MEBOSS and other classification systems. The experiments demonstrate the operation of the MEBOSS procedure invoking the most effective classifiers available in the open source software environment SageMath. The results of our experiments show that for the large DiabHealth database of CAN-related parameters MEBOSS outperformed other classification systems available in SageMath, achieving 94% to 97% precision in 5 × 2 cross-validation when distinguishing any two of up to five CAN categories: control, early, definite, severe and atypical CAN.
These results show that the MEBOSS architecture is effective and can be recommended for practical implementation in systems for the diagnosis of CAN progression. Copyright © 2018 Elsevier B.V. All rights reserved.
Tatinati, Sivanagaraja; Nazarpour, Kianoush; Tech Ang, Wei; Veluvolu, Kalyana C
2016-08-01
Successful treatment of tumors with motion-adaptive radiotherapy requires accurate prediction of respiratory motion, ideally with a prediction horizon larger than the latency of the radiotherapy system. Accurate prediction of respiratory motion is, however, a non-trivial task due to the presence of irregularities and intra-trace variabilities, such as baseline drift and temporal changes in the fundamental frequency pattern. In this paper, to enhance the accuracy of respiratory motion prediction, we propose a stacked regression ensemble framework that integrates heterogeneous respiratory motion prediction algorithms. We further address two crucial issues for developing a successful ensemble framework: (1) selection of appropriate prediction methods to ensemble (level-0 methods) from among the best existing prediction methods; and (2) finding a suitable generalization approach that can successfully exploit the relative advantages of the chosen level-0 methods. The efficacy of the developed ensemble framework is assessed with real respiratory motion traces acquired from 31 patients undergoing treatment. Results show that the developed ensemble framework improves the prediction performance significantly compared to the best existing methods. Copyright © 2016 IPEM. Published by Elsevier Ltd. All rights reserved.
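The level-1 generalization step reduces to learning a combiner over the level-0 outputs. A least-squares combiner over two synthetic stand-in predictors (not the paper's respiratory-motion algorithms) looks like this:

```python
import numpy as np

def fit_stack(level0_preds, target):
    """Level-1 generalizer of a stacking ensemble: least-squares weights
    over the level-0 predictions, including a bias column."""
    X = np.column_stack(level0_preds)
    w, *_ = np.linalg.lstsq(X, target, rcond=None)
    return w

rng = np.random.default_rng(8)
t = np.linspace(0, 20, 400)
truth = np.sin(2 * np.pi * 0.25 * t) + 0.05 * t     # breathing cycle + drift
# Two heterogeneous level-0 "predictors" with complementary error types
pred_a = truth + 0.3 * rng.normal(size=t.size)      # noisy but unbiased
pred_b = 0.9 * truth + 0.1                          # smooth but biased

w = fit_stack([pred_a, pred_b, np.ones_like(t)], truth)
stacked = w[0] * pred_a + w[1] * pred_b + w[2]

rmse = lambda p: np.sqrt(np.mean((p - truth) ** 2))
```

Because the two members err in complementary ways (noise versus bias), the learned combination beats either member alone, which is the rationale for choosing heterogeneous level-0 methods in the paper.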
NASA Astrophysics Data System (ADS)
Landsgesell, Jonas; Holm, Christian; Smiatek, Jens
2017-03-01
The reaction ensemble and the constant pH method are well-known chemical equilibrium approaches to simulating protonation and deprotonation reactions in classical molecular dynamics and Monte Carlo simulations. In this article, we demonstrate the similarity between both methods under certain conditions. We perform molecular dynamics simulations of a weak polyelectrolyte in order to compare the titration curves obtained by both approaches. Our findings reveal a good agreement between the methods when the reaction ensemble is used to sweep the reaction constant. Pronounced differences between the reaction ensemble and the constant pH method can be observed for stronger acids and bases in terms of adaptive pH values. These deviations are due to the presence of explicit protons in the reaction ensemble method, which induce a screening of electrostatic interactions between the charged titrable groups of the polyelectrolyte. The outcomes of our simulations hint at a better applicability of the reaction ensemble method for systems in confined geometries and for titrable groups in polyelectrolytes with different pKa values.
Water Balance in the Amazon Basin from a Land Surface Model Ensemble
NASA Technical Reports Server (NTRS)
Getirana, Augusto C. V.; Dutra, Emanuel; Guimberteau, Matthieu; Kam, Jonghun; Li, Hong-Yi; Decharme, Bertrand; Zhang, Zhengqiu; Ducharne, Agnes; Boone, Aaron; Balsamo, Gianpaolo;
2014-01-01
Despite recent advances in land surface modeling and remote sensing, estimates of the global water budget are still fairly uncertain. This study aims to evaluate the water budget of the Amazon basin based on several state-of-the-art land surface model (LSM) outputs. Water budget variables (terrestrial water storage TWS, evapotranspiration ET, surface runoff R, and base flow B) are evaluated at the basin scale using both remote sensing and in situ data. Meteorological forcings at a 3-hourly time step and 1° spatial resolution were used to run 14 LSMs. Precipitation datasets that have been rescaled to match monthly Global Precipitation Climatology Project (GPCP) and Global Precipitation Climatology Centre (GPCC) datasets and the daily Hydrologie du Bassin de l'Amazone (HYBAM) dataset were used to perform three experiments. The Hydrological Modeling and Analysis Platform (HyMAP) river routing scheme was forced with R and B, and simulated discharges are compared against observations at 165 gauges. Simulated ET and TWS are compared against FLUXNET and MOD16A2 evapotranspiration datasets and Gravity Recovery and Climate Experiment (GRACE) TWS estimates in two subcatchments of main tributaries (Madeira and Negro Rivers). At the basin scale, simulated ET ranges from 2.39 to 3.26 mm day-1, and a low spatial correlation between ET and precipitation indicates that evapotranspiration does not depend on water availability over most of the basin. Results also show that other simulated water budget components vary significantly as a function of both the LSM and the precipitation dataset, but simulated TWS generally agrees with GRACE estimates at the basin scale. The best water budget simulations resulted from experiments using HYBAM, mostly explained by a denser rainfall gauge network and rescaling at a finer temporal scale.
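At the basin scale, this kind of evaluation rests on the budget identity P − ET − R − B = dTWS/dt. A generic closure check with synthetic monthly numbers (mm per month; not the study's data) looks like:

```python
import numpy as np

def budget_residual(precip, et, runoff, baseflow, tws):
    """Residual of P - ET - R - B against the terrestrial water storage
    change dTWS/dt, month by month; a nonzero residual flags a
    non-closed budget in the model output."""
    d_tws = np.diff(tws)                              # storage change per month
    return (precip - et - runoff - baseflow)[:-1] - d_tws

rng = np.random.default_rng(12)
months = 24
precip = rng.gamma(4.0, 50.0, size=months)            # mm/month
et = 0.5 * precip
runoff = 0.3 * precip
baseflow = 0.1 * precip
# Construct TWS consistent with the fluxes, so the budget closes exactly
net = precip - et - runoff - baseflow
tws = np.concatenate([[0.0], np.cumsum(net)])[:months]
res = budget_residual(precip, et, runoff, baseflow, tws)
```

Against GRACE-type TWS estimates the residual is of course not exactly zero; its size is one diagnostic for ranking the LSM/precipitation combinations.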
NASA Astrophysics Data System (ADS)
Bi, Lei; Yang, Ping
2015-04-01
Understanding the inherent optical properties (IOPs) of coccoliths and coccolithophores is important in oceanic radiative transfer simulations and remote sensing implementations. In this study, the invariant imbedding T-matrix method (II-TM) is employed to investigate the IOPs of coccoliths and coccolithophores. The Emiliania huxleyi (Ehux) coccolith and coccolithophore models are built based on observed biometric parameters including the eccentricity, the number of slits, and the rim width of detached coccoliths. The calcification state that specifies the amount of calcium in a single coccolith is critical in the determination of the size-volume/mass relationship (note that the volume/mass of coccoliths at different calcification states differs even when the diameters are the same). The present results show that the calcification state, namely, under-calcification, normal-calcification, or over-calcification, significantly influences the backscattering cross section and the phase matrix. Furthermore, the linear depolarization ratio of the light scattered by coccoliths is sensitive to the degree of calcification, and provides a potentially valuable parameter for interpreting oceanic remote sensing data. The phase function of an ensemble of randomly oriented coccolithophores has a pattern similar to that of individual coccoliths, but forward scattering is dominant in the coccolithophores due to their large geometric cross sections. The linear depolarization ratio associated with coccolithophores is found to be larger than that for coccoliths, as polarization is more sensitive to multiple scattering than the phase function. The simulated coccolithophore phase matrix results are compared with laboratory measurements. For scattering angles larger than 100°, an increase of the phase function with respect to the scattering angle is confirmed based on the present coccolithophore model, while the spherical approximation fails.
NASA Astrophysics Data System (ADS)
Lam, D. T.; Kerrou, J.; Benabderrahmane, H.; Perrochet, P.
2017-12-01
The calibration of groundwater flow models in transient state can be motivated by the expected improved characterization of the aquifer hydraulic properties, especially when supported by a rich transient dataset. In the prospect of setting up a calibration strategy for a variably-saturated transient groundwater flow model of the area around the ANDRA's Bure Underground Research Laboratory, we wish to take advantage of the long hydraulic head and flowrate time series collected near and at the access shafts in order to help inform the model hydraulic parameters. A promising inverse approach for such a high-dimensional nonlinear model, whose applicability has been illustrated more extensively in other scientific fields, could be an iterative ensemble smoother algorithm initially developed for a reservoir engineering problem. Furthermore, the ensemble-based stochastic framework will allow us to address to some extent the uncertainty of the calibration for a subsequent analysis of a flow-process-dependent prediction. By assimilating all the available data in a single step, this method iteratively updates each member of an initial ensemble of stochastic realizations of parameters until an objective function is minimized. However, as is well known for ensemble-based Kalman methods, this correction, computed from approximations of covariance matrices, is most efficient when the ensemble realizations are multi-Gaussian. As shown by the comparison of the updated ensemble mean obtained for our simplified synthetic model of 2D vertical flow by using either multi-Gaussian or multipoint simulations of parameters, the ensemble smoother fails to preserve the initial connectivity of the facies and the parameter bimodal distribution.
Given the geological structures depicted by the multi-layered geological model built for the real case, our goal is to find how to best leverage the performance of the ensemble smoother while using an initial ensemble of conditional multi-Gaussian simulations or multipoint simulations that is as conceptually consistent as possible. The performance of the algorithm, including additional steps that help mitigate the effects of non-Gaussian patterns, such as Gaussian anamorphosis or resampling of facies from the training image using updated local probability constraints, will be assessed.
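A minimal sketch of how such an iterative ensemble smoother proceeds, in the spirit of the multiple-data-assimilation variant (ES-MDA); the linear forward model, dimensions, and inflation schedule below are illustrative assumptions, not the study's actual flow model:

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(m):
    # Toy linear "flow model": maps 3 parameters to 2 observed heads.
    G = np.array([[1.0, 0.5, 0.0],
                  [0.0, 1.0, 2.0]])
    return G @ m

n_ens, n_par, n_obs = 200, 3, 2
m_true = np.array([1.0, -0.5, 0.3])
R = 0.01 * np.eye(n_obs)                       # observation-error covariance
d_obs = forward(m_true) + rng.multivariate_normal(np.zeros(n_obs), R)

M = rng.normal(0.0, 1.0, size=(n_par, n_ens))  # prior ensemble of parameters
alphas = [4.0, 4.0, 4.0, 4.0]                  # inflation factors, sum(1/a) = 1

for a in alphas:
    D = np.column_stack([forward(M[:, j]) for j in range(n_ens)])
    Am = M - M.mean(axis=1, keepdims=True)
    Ad = D - D.mean(axis=1, keepdims=True)
    C_md = Am @ Ad.T / (n_ens - 1)             # parameter-data covariance
    C_dd = Ad @ Ad.T / (n_ens - 1)             # data covariance
    K = C_md @ np.linalg.inv(C_dd + a * R)
    # Perturb observations consistently with the inflated error covariance.
    E = rng.multivariate_normal(np.zeros(n_obs), a * R, size=n_ens).T
    M = M + K @ (d_obs[:, None] + E - D)

print(M.mean(axis=1))   # posterior ensemble mean
```

With a linear forward model this iteration converges toward the correct Gaussian posterior; the non-Gaussian (multipoint) case discussed above is precisely where this covariance-based update degrades.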
2013-01-01
Background Many problems in protein modeling require obtaining a discrete representation of the protein conformational space as an ensemble of conformations. In ab-initio structure prediction, in particular, where the goal is to predict the native structure of a protein chain given its amino-acid sequence, the ensemble needs to satisfy energetic constraints. Given the thermodynamic hypothesis, an effective ensemble contains low-energy conformations which are similar to the native structure. The high-dimensionality of the conformational space and the ruggedness of the underlying energy surface currently make it very difficult to obtain such an ensemble. Recent studies have proposed that Basin Hopping is a promising probabilistic search framework to obtain a discrete representation of the protein energy surface in terms of local minima. Basin Hopping performs a series of structural perturbations followed by energy minimizations with the goal of hopping between nearby energy minima. This approach has been shown to be effective in obtaining conformations near the native structure for small systems. Recent work by us has extended this framework to larger systems through employment of the molecular fragment replacement technique, resulting in rapid sampling of large ensembles. Methods This paper investigates the algorithmic components in Basin Hopping to both understand and control their effect on the sampling of near-native minima. Realizing that such an ensemble is reduced before further refinement in full ab-initio protocols, we take an additional step and analyze the quality of the ensemble retained by ensemble reduction techniques. We propose a novel multi-objective technique based on the Pareto front to filter the ensemble of sampled local minima. 
Results and conclusions We show that controlling the magnitude of the perturbation allows directly controlling the distance between consecutively-sampled local minima and, in turn, steering the exploration towards conformations near the native structure. For the minimization step, we show that the addition of Metropolis Monte Carlo-based minimization is no more effective than a simple greedy search. Finally, we show that the size of the ensemble of sampled local minima can be effectively and efficiently reduced by a multi-objective filter to obtain a simpler representation of the probed energy surface. PMID:24564970
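The perturb-then-minimize loop described above can be sketched on a hypothetical one-dimensional rugged energy function standing in for a protein energy surface; the local minimizer is a simple greedy descent, echoing the finding that greedy search suffices for the minimization step. All functions and parameters here are illustrative:

```python
import math, random

random.seed(1)

def energy(x):
    # Toy rugged "energy surface": quadratic well with sinusoidal ruggedness.
    return (x - 2.0) ** 2 + math.sin(8.0 * x)

def minimize_local(x, step=1e-3, iters=2000):
    # Greedy descent stand-in for an energy minimization routine.
    for _ in range(iters):
        for dx in (step, -step):
            if energy(x + dx) < energy(x):
                x += dx
                break
    return x

def basin_hopping(x0, n_hops=50, perturb=1.0):
    """Perturb the current minimum, re-minimize, and accept greedily;
    the perturbation magnitude controls the distance between hops."""
    x = minimize_local(x0)
    minima = [x]                      # the sampled ensemble of local minima
    for _ in range(n_hops):
        trial = minimize_local(x + random.uniform(-perturb, perturb))
        if energy(trial) <= energy(x):   # greedy acceptance
            x = trial
        minima.append(trial)
    return x, minima

best, ensemble = basin_hopping(x0=-3.0)
print(best, energy(best))
```

Enlarging `perturb` directly enlarges the distance between consecutively sampled minima, which is the control knob the paper analyzes.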
NASA Astrophysics Data System (ADS)
Rasera, L. G.; Mariethoz, G.; Lane, S. N.
2017-12-01
Frequent acquisition of high-resolution digital elevation models (HR-DEMs) over large areas is expensive and difficult. Satellite-derived low-resolution digital elevation models (LR-DEMs) provide extensive coverage of Earth's surface but at coarser spatial and temporal resolutions. Although useful for large scale problems, LR-DEMs are not suitable for modeling hydrologic and geomorphic processes at scales smaller than their spatial resolution. In this work, we present a multiple-point geostatistical approach for downscaling a target LR-DEM based on available high-resolution training data and recurrent high-resolution remote sensing images. The method aims at generating several equiprobable HR-DEMs conditioned to a given target LR-DEM by borrowing small scale topographic patterns from an analogue containing data at both coarse and fine scales. An application of the methodology is demonstrated by using an ensemble of simulated HR-DEMs as input to a flow-routing algorithm. The proposed framework enables a probabilistic assessment of the spatial structures generated by natural phenomena operating at scales finer than the available terrain elevation measurements. A case study in the Swiss Alps is provided to illustrate the methodology.
Fielding, M. D.; Chiu, J. C.; Hogan, R. J.; ...
2015-02-16
Active remote sensing of marine boundary-layer clouds is challenging as drizzle drops often dominate the observed radar reflectivity. We present a new method to simultaneously retrieve cloud and drizzle vertical profiles in drizzling boundary-layer cloud using surface-based observations of radar reflectivity, lidar attenuated backscatter, and zenith radiances. Specifically, the vertical structure of droplet size and water content of both cloud and drizzle is characterised throughout the cloud. An ensemble optimal estimation approach provides full error statistics given the uncertainty in the observations. To evaluate the new method, we first perform retrievals using synthetic measurements from large-eddy simulation snapshots of cumulus under stratocumulus, where cloud water path is retrieved with an error of 31 g m−2. The method also performs well in non-drizzling clouds where no assumption of the cloud profile is required. We then apply the method to observations of marine stratocumulus obtained during the Atmospheric Radiation Measurement MAGIC deployment in the northeast Pacific. Here, retrieved cloud water path agrees well with independent 3-channel microwave radiometer retrievals, with a root mean square difference of 10–20 g m−2.
Ensemble-Based Data Assimilation With a Martian GCM
NASA Astrophysics Data System (ADS)
Lawson, W.; Richardson, M. I.; McCleese, D. J.; Anderson, J. L.; Chen, Y.; Snyder, C.
2007-12-01
Quantitative study of Mars weather and climate will ultimately stem from analysis of its dynamic and thermodynamic fields. Of all the observations of Mars available to date, such fields are most easily derived from mapping data (radiances) of the martian atmosphere as measured by orbiting infrared spectrometers and radiometers (e.g., MGS / TES and MRO / MCS). Such data-derived products are the solutions to inverse problems, and while individual profile retrievals have been the popular data-derived products in the planetary sciences, the terrestrial meteorological community has gained much ground over the last decade by employing techniques of data assimilation (DA) to analyze radiances. Ancillary information is required to close an inverse problem (i.e., to disambiguate the family of possibilities that are consistent with the observations), and DA practitioners inevitably rely on numerical models for this information (e.g., general circulation models (GCMs)). Data assimilation elicits maximal information content from available observations, and, by way of the physics encoded in the numerical model, spreads this information spatially, temporally, and across variables, thus allowing global extrapolation of limited and non-simultaneous observations. If the model is skillful, then a given, specific model integration can be corrected by the information spreading abilities of DA, and the resulting time sequence of "analysis" states are brought into agreement with the observations. These analysis states are complete, gridded estimates of all the fields one might wish to diagnose for scientific study of the martian atmosphere. Though a numerical model has been used to obtain these estimates, their fidelity rests in their simultaneous consistency with both the observations (to within their stated uncertainties) and the physics contained in the model. In this fashion, radiance observations can, say, be used to deduce the wind field. 
A new class of DA approaches based on Monte Carlo approximations, "ensemble-based methods," has matured enough to be both appropriate for planetary problems and within practical reach of planetary scientists. Capitalizing on this new class of methods, the National Center for Atmospheric Research (NCAR) has developed a framework for ensemble-based DA that is flexible and modular in its use of various forecast models and data sets. The framework is called DART, the Data Assimilation Research Testbed, and it is freely available on-line. We have begun to take advantage of this rich software infrastructure, and are on our way toward performing state-of-the-art DA in the martian atmosphere using Caltech's martian general circulation model, PlanetWRF. We have begun by testing and validating the model within DART under idealized scenarios, and we hope to address actual, available infrared remote sensing datasets from Mars orbiters in the coming year. We shall present the details of this approach and our progress to date.
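How an ensemble method spreads observational information across variables can be sketched with a stochastic ensemble Kalman filter update on a toy two-variable state (observe temperature, update unobserved wind through the ensemble covariance); the numbers and the linear observation operator are illustrative assumptions, not DART's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(42)

n = 500                                    # ensemble size
# Joint prior: temperature T (K) and unobserved zonal wind u (m/s), correlated.
cov = np.array([[4.0, 1.5],
                [1.5, 1.0]])
prior = rng.multivariate_normal([250.0, 10.0], cov, size=n).T   # rows: T, u

H = np.array([[1.0, 0.0]])                 # we observe T only (a radiance proxy)
r = 1.0                                    # observation-error variance
y = 253.0                                  # the observation

A = prior - prior.mean(axis=1, keepdims=True)
P = A @ A.T / (n - 1)                      # ensemble covariance estimate
K = P @ H.T @ np.linalg.inv(H @ P @ H.T + r)   # Kalman gain (2x1)

# Perturbed-observation update: each member gets its own noisy copy of y.
obs_pert = rng.normal(0.0, np.sqrt(r), size=n)
analysis = prior + K @ (y + obs_pert - H @ prior)

print(prior.mean(axis=1), analysis.mean(axis=1))
```

Because T and u covary in the prior ensemble, the gain has a nonzero second row, so observing only T also shifts u: the mechanism by which, as the abstract puts it, radiance observations can be used to deduce the wind field.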
Brekke, L.D.; Dettinger, M.D.; Maurer, E.P.; Anderson, M.
2008-01-01
Ensembles of historical climate simulations and climate projections from the World Climate Research Programme's (WCRP's) Coupled Model Intercomparison Project phase 3 (CMIP3) multi-model dataset were investigated to determine how model credibility affects apparent relative scenario likelihoods in regional risk assessments. Methods were developed and applied in a Northern California case study. An ensemble of 59 twentieth century climate simulations from 17 WCRP CMIP3 models was analyzed to evaluate relative model credibility associated with a 75-member projection ensemble from the same 17 models. Credibility was assessed based on how realistically models reproduced selected statistics of historical climate relevant to California climatology. Metrics of this credibility were used to derive relative model weights leading to weight-threshold culling of models contributing to the projection ensemble. Density functions were then estimated for two projected quantities (temperature and precipitation), with and without considering credibility-based ensemble reductions. An analysis for Northern California showed that, while some models seem more capable at recreating limited aspects of twentieth-century climate, the overall tendency is for comparable model performance when several credibility measures are combined. Use of these metrics to decide which models to include in density function development led to local adjustments to function shapes, but had limited effect on breadth and central tendency, which were found to be more influenced by 'completeness' of the original ensemble in terms of models and emissions pathways. © 2007 Springer Science+Business Media B.V.
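Credibility weighting and weight-threshold culling can be sketched as follows; the model names, RMSE-based skill metric, and the cut at half the uniform weight are all stated assumptions for illustration, not the study's actual metrics:

```python
# Hypothetical per-model RMSE against observed historical climate statistics.
hist_rmse = {"modelA": 0.8, "modelB": 1.0, "modelC": 1.1, "modelD": 2.5}

# Smaller historical error -> larger credibility weight.
inv = {m: 1.0 / e for m, e in hist_rmse.items()}
total = sum(inv.values())
weights = {m: v / total for m, v in inv.items()}

# Weight-threshold culling: drop models falling well below the uniform
# weight 1/N (half of it, here) before building density functions.
threshold = 0.5 * (1.0 / len(weights))
kept = {m: w for m, w in weights.items() if w >= threshold}

# Renormalize the retained weights.
s = sum(kept.values())
kept = {m: w / s for m, w in kept.items()}
print(kept)
```

The projection density functions would then be estimated from the members of the retained models, optionally weighted by `kept`.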
An Intelligent Ensemble Neural Network Model for Wind Speed Prediction in Renewable Energy Systems.
Ranganayaki, V; Deepa, S N
2016-01-01
Various criteria are proposed for selecting the number of hidden neurons in artificial neural network (ANN) models, and based on the evolved criteria an intelligent ensemble neural network model is proposed to predict wind speed in renewable energy applications. The intelligent ensemble neural model for wind speed forecasting is designed by averaging the forecasted values from multiple neural network models, which include the multilayer perceptron (MLP), multilayer adaptive linear neuron (Madaline), back propagation neural network (BPN), and probabilistic neural network (PNN), so as to obtain better accuracy in wind speed prediction with minimum error. Random selection of the number of hidden neurons in an artificial neural network results in overfitting or underfitting problems; this paper aims to avoid both. The number of hidden neurons is selected employing 102 criteria, and these evolved criteria are verified against various computed error values. The proposed criteria for fixing the number of hidden neurons are validated employing a convergence theorem. The proposed intelligent ensemble neural model is applied to wind speed prediction using real-time wind data collected from nearby locations. The simulation results substantiate that the proposed ensemble model minimizes the error value and enhances accuracy. The computed results demonstrate the effectiveness of the proposed ensemble neural network (ENN) model, with respect to the considered error factors, in comparison with earlier models available in the literature.
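The core averaging step of the ensemble model can be illustrated with stand-in predictors; real MLP, Madaline, BPN, and PNN networks would be trained on historical wind data, so the linear functions below are purely hypothetical placeholders:

```python
# Stand-ins for the four trained networks; each maps an input feature x
# (e.g. a lagged wind-speed measurement) to a wind-speed forecast.
def mlp(x):      return 0.9 * x + 0.5
def madaline(x): return 1.1 * x - 0.3
def bpn(x):      return x + 0.2
def pnn(x):      return 0.95 * x

MODELS = (mlp, madaline, bpn, pnn)

def ensemble_wind_speed(x):
    """Average the individual forecasts, as in the ensemble neural model."""
    preds = [m(x) for m in MODELS]
    return sum(preds) / len(preds)

print(ensemble_wind_speed(8.0))   # combined forecast for an 8 m/s input
```

Averaging tends to cancel the uncorrelated errors of the individual predictors, which is the rationale given for the ensemble design.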
Simultaneous calibration of ensemble river flow predictions over an entire range of lead times
NASA Astrophysics Data System (ADS)
Hemri, S.; Fundel, F.; Zappa, M.
2013-10-01
Probabilistic estimates of future water levels and river discharge are usually simulated with hydrologic models using ensemble weather forecasts as main inputs. As hydrologic models are imperfect and the meteorological ensembles tend to be biased and underdispersed, the ensemble forecasts for river runoff typically are biased and underdispersed, too. Thus, in order to achieve both reliable and sharp predictions, statistical postprocessing is required. In this work Bayesian model averaging (BMA) is applied to statistically postprocess raw ensemble runoff forecasts for a catchment in Switzerland, at lead times ranging from 1 to 240 h. The raw forecasts have been obtained using deterministic and ensemble forcing meteorological models with different forecast lead time ranges. First, BMA is applied based on mixtures of univariate normal distributions, subject to the assumption of independence between distinct lead times. Then, the independence assumption is relaxed in order to estimate multivariate runoff forecasts over the entire range of lead times simultaneously, based on a BMA version that uses multivariate normal distributions. Since river runoff is a highly skewed variable, Box-Cox transformations are applied in order to achieve approximate normality. Both univariate and multivariate BMA approaches are able to generate well-calibrated probabilistic forecasts that are considerably sharper than climatological forecasts. Additionally, multivariate BMA provides a promising approach for incorporating temporal dependencies into the postprocessed forecasts. Its major advantage over univariate BMA is an increase in reliability when the forecast system is changing due to model availability.
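A sketch of the univariate BMA predictive density with a Box-Cox transformation: a weighted mixture of normal kernels centred on the transformed member forecasts. The members, weights, transformation parameter, and kernel spread are illustrative (in practice the weights and variance would be fitted, typically by maximum likelihood):

```python
import math

def boxcox(q, lam=0.2):
    # Box-Cox transform to make skewed runoff approximately normal.
    return (q ** lam - 1.0) / lam

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def bma_density(q, members, weights, sigma=0.3, lam=0.2):
    """BMA predictive density for runoff q: a weighted mixture of normal
    kernels around the Box-Cox-transformed ensemble member forecasts."""
    z = boxcox(q, lam)
    jacobian = q ** (lam - 1.0)   # dz/dq, to express the density in q-space
    dens = sum(w * normal_pdf(z, boxcox(m, lam), sigma)
               for m, w in zip(members, weights))
    return dens * jacobian

members = [120.0, 150.0, 135.0]   # hypothetical runoff forecasts (m^3/s)
weights = [0.5, 0.3, 0.2]         # BMA weights (would be fitted in practice)

print(bma_density(130.0, members, weights))
```

The multivariate extension described in the abstract replaces the univariate kernels with multivariate normals spanning all lead times at once.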
NASA Astrophysics Data System (ADS)
Poppick, A. N.; McKinnon, K. A.; Dunn-Sigouin, E.; Deser, C.
2017-12-01
Initial condition climate model ensembles suggest that regional temperature trends can be highly variable on decadal timescales due to characteristics of internal climate variability. Accounting for trend uncertainty due to internal variability is therefore necessary to contextualize recent observed temperature changes. However, while the variability of trends in a climate model ensemble can be evaluated directly (as the spread across ensemble members), internal variability simulated by a climate model may be inconsistent with observations. Observation-based methods for assessing the role of internal variability in trend uncertainty are therefore required. Here, we use a statistical resampling approach to assess trend uncertainty due to internal variability in historical 50-year (1966-2015) winter near-surface air temperature trends over North America. We compare this estimate of trend uncertainty to simulated trend variability in the NCAR CESM1 Large Ensemble (LENS), finding that uncertainty in wintertime temperature trends over North America due to internal variability is largely overestimated by CESM1, by an average of 32%. Our observation-based resampling approach is combined with the forced signal from LENS to produce an 'Observational Large Ensemble' (OLENS). The members of OLENS indicate a range of spatially coherent fields of temperature trends resulting from different sequences of internal variability consistent with observations. The smaller trend variability in OLENS suggests that uncertainty in the historical climate change signal in observations due to internal variability is less than suggested by LENS.
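One simple observation-based resampling scheme of the kind described is a moving-block bootstrap of detrended residuals, which preserves short-term autocorrelation while generating alternative sequences of internal variability. The series, block length, and forced trend below are illustrative assumptions, not the study's data or exact method:

```python
import random

random.seed(7)

def trend(series):
    # Ordinary least-squares slope via the closed form.
    n = len(series)
    xbar, ybar = (n - 1) / 2.0, sum(series) / n
    num = sum((x - xbar) * (y - ybar) for x, y in enumerate(series))
    den = sum((x - xbar) ** 2 for x in range(n))
    return num / den

def block_resample(residuals, block=5):
    """Moving-block bootstrap: resample residuals in contiguous blocks to
    retain short-term autocorrelation."""
    n = len(residuals)
    out = []
    while len(out) < n:
        start = random.randrange(0, n - block + 1)
        out.extend(residuals[start:start + block])
    return out[:n]

# Hypothetical 50-year detrended winter temperature anomalies (residuals).
residuals = [random.gauss(0.0, 1.0) for _ in range(50)]
forced_trend = 0.02   # assumed forced warming signal (deg / yr)

slopes = []
for _ in range(1000):
    sample = [forced_trend * t + r for t, r in enumerate(block_resample(residuals))]
    slopes.append(trend(sample))

mean_slope = sum(slopes) / len(slopes)
spread = (sum((s - mean_slope) ** 2 for s in slopes) / len(slopes)) ** 0.5
print(spread)   # trend uncertainty attributable to resampled internal variability
```

The spread of `slopes` plays the role that ensemble-member spread plays in LENS, but is anchored to observed rather than simulated variability.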
Atmospheric Nitrogen Deposition to the Oceans: Observation- and Model-Based Estimates
NASA Astrophysics Data System (ADS)
Baker, Alex; Altieri, Katye; Okin, Greg; Dentener, Frank; Uematsu, Mitsuo; Kanakidou, Maria; Sarin, Manmohan; Duce, Robert; Galloway, Jim; Keene, Bill; Singh, Arvind; Zamora, Lauren; Lamarque, Jean-Francois; Hsu, Shih-Chieh
2014-05-01
The reactive nitrogen (Nr) burden of the atmosphere has been increased by a factor of 3-4 by anthropogenic activity since the industrial revolution. This has led to large increases in the deposition of nitrate and ammonium to the surface waters of the open ocean, particularly downwind of major human population centres, such as those in North America, Europe and Southeast Asia. In oligotrophic waters, this deposition has the potential to significantly impact marine productivity and the global carbon cycle. Global-scale understanding of N deposition to the oceans is reliant on our ability to produce effective models of reactive nitrogen emission, atmospheric chemistry, transport and deposition (including deposition to the land surface). The Atmospheric Chemistry and Climate Model Intercomparison Project (ACCMIP) recently completed a multi-model analysis of global N deposition, including comparisons to wet deposition observations from three regional networks in North America, Europe and Southeast Asia (Lamarque et al., Atmos. Chem. Phys., 13, 7977-8018, 2013). No similar datasets exist which would allow observation-model comparisons of wet deposition for the open oceans, because long-term wet deposition records are available for only a handful of remote island sites and rain collection over the open ocean itself is very difficult. In this work we attempt instead to use ~2600 observations of aerosol nitrate and ammonium concentrations, acquired chiefly from sampling aboard ships in the period 1995-2012, to assess the ACCMIP N deposition fields over the remote ocean. This database is non-uniformly distributed in time and space. We selected four ocean regions (the eastern North Atlantic, the South Atlantic, the northern Indian Ocean and northwest Pacific) where we considered the density and distribution of observational data is sufficient to provide effective comparison to the model ensemble.
Two of these regions are adjacent to the land networks used in the ACCMIP comparison, while the others are far removed from land regions for which the model output has been rigorously compared to observational data. Here we will present calculated dry deposition fluxes of nitrate and ammonium from average observed concentrations in these regions, using deposition velocities of 0.9 cm/s and 0.1 cm/s respectively, and the results of a comparison of these fluxes to the ACCMIP model ensemble product. Uncertainties in the comparison and potential sources of bias between the observations and model will be discussed.
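The dry-flux calculation described here is a direct product of concentration and deposition velocity, F = v_d · C. The deposition velocities are those quoted above; the regional-mean concentrations are hypothetical values for illustration:

```python
# Deposition velocities from the abstract: 0.9 cm/s (nitrate), 0.1 cm/s (ammonium).
V_D = {"nitrate": 0.9e-2, "ammonium": 0.1e-2}   # converted to m/s

def dry_deposition_flux(species, conc_mol_m3):
    """Dry deposition flux F = v_d * C, in mol m^-2 s^-1."""
    return V_D[species] * conc_mol_m3

# Hypothetical regional-mean aerosol concentrations (mol/m^3), for illustration.
conc = {"nitrate": 2.0e-9, "ammonium": 5.0e-9}

for sp, c in conc.items():
    print(sp, dry_deposition_flux(sp, c))
```

Such per-region fluxes are what the authors compare against the ACCMIP model ensemble product.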
Ensemble coding of face identity is present but weaker in congenital prosopagnosia.
Robson, Matthew K; Palermo, Romina; Jeffery, Linda; Neumann, Markus F
2018-03-01
Individuals with congenital prosopagnosia (CP) are impaired at identifying individual faces but do not appear to show impairments in extracting the average identity from a group of faces (known as ensemble coding). However, possible deficits in ensemble coding in a previous study (CPs n = 4) may have been masked because CPs relied on pictorial (image) cues rather than identity cues. Here we asked whether a larger sample of CPs (n = 11) would show intact ensemble coding of identity when availability of image cues was minimised. Participants viewed a "set" of four faces and then judged whether a subsequent individual test face, either an exemplar or a "set average", was in the preceding set. Ensemble coding occurred when matching (vs. mismatching) averages were mistakenly endorsed as set members. We assessed both image- and identity-based ensemble coding, by varying whether test faces were either the same or different images of the identities in the set. CPs showed significant ensemble coding in both tasks, indicating that their performance was independent of image cues. As a group, CPs' ensemble coding was weaker than controls' in both tasks, consistent with evidence that perceptual processing of face identity is disrupted in CP. This effect was driven by CPs (n = 3) who, in addition to having impaired face memory, also performed particularly poorly on a measure of face perception (CFPT). Future research, using larger samples, should examine whether deficits in ensemble coding may be restricted to CPs who also have substantial face perception deficits.
Advanced ensemble modelling of flexible macromolecules using X-ray solution scattering.
Tria, Giancarlo; Mertens, Haydyn D T; Kachala, Michael; Svergun, Dmitri I
2015-03-01
Dynamic ensembles of macromolecules mediate essential processes in biology. Understanding the mechanisms driving the function and molecular interactions of 'unstructured' and flexible molecules requires alternative approaches to those traditionally employed in structural biology. Small-angle X-ray scattering (SAXS) is an established method for structural characterization of biological macromolecules in solution, and is directly applicable to the study of flexible systems such as intrinsically disordered proteins and multi-domain proteins with unstructured regions. The Ensemble Optimization Method (EOM) [Bernadó et al. (2007). J. Am. Chem. Soc. 129, 5656-5664] was the first approach introducing the concept of ensemble fitting of the SAXS data from flexible systems. In this approach, a large pool of macromolecules covering the available conformational space is generated and a sub-ensemble of conformers coexisting in solution is selected guided by the fit to the experimental SAXS data. This paper presents a series of new developments and advancements to the method, including significantly enhanced functionality and also quantitative metrics for the characterization of the results. Building on the original concept of ensemble optimization, the algorithms for pool generation have been redesigned to allow for the construction of partially or completely symmetric oligomeric models, and the selection procedure was improved to refine the size of the ensemble. Quantitative measures of the flexibility of the system studied, based on the characteristic integral parameters of the selected ensemble, are introduced. These improvements are implemented in the new EOM version 2.0, and the capabilities as well as inherent limitations of the ensemble approach in SAXS, and of EOM 2.0 in particular, are discussed.
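The core idea of ensemble optimization, selecting a sub-ensemble whose average curve best fits the measured data, can be sketched with synthetic stand-in scattering curves. Note that EOM itself uses a genetic algorithm; the greedy selection below is a deliberately simplified illustration, and all curves and parameters are invented:

```python
import random

random.seed(3)

N_Q = 20   # scattering-angle grid points

def curve(shift):
    # Synthetic stand-in for a conformer's SAXS intensity curve.
    return [1.0 / (1.0 + (0.1 * i - shift) ** 2) for i in range(N_Q)]

pool = [curve(random.uniform(-1.0, 1.0)) for _ in range(200)]     # conformer pool
target = [(a + b) / 2 for a, b in zip(curve(-0.4), curve(0.6))]   # "experimental" two-state mix

def average(members):
    return [sum(vals) / len(vals) for vals in zip(*members)]

def chi2(avg):
    return sum((x - y) ** 2 for x, y in zip(avg, target))

# Greedy selection (with replacement, so repeats act as weights); candidates
# are kept only if they improve the fit, capping the sub-ensemble at 10.
selected = [min(pool, key=lambda c: chi2(average([c])))]
for _ in range(9):
    cand = min(pool, key=lambda c: chi2(average(selected + [c])))
    if chi2(average(selected + [cand])) < chi2(average(selected)):
        selected.append(cand)

print(len(selected), chi2(average(selected)))
```

Because the target is a two-state mixture, no single conformer fits it as well as a small sub-ensemble does, which is the motivating observation behind ensemble fitting.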
NASA Astrophysics Data System (ADS)
Jones, Emlyn M.; Baird, Mark E.; Mongin, Mathieu; Parslow, John; Skerratt, Jenny; Lovell, Jenny; Margvelashvili, Nugzar; Matear, Richard J.; Wild-Allen, Karen; Robson, Barbara; Rizwi, Farhan; Oke, Peter; King, Edward; Schroeder, Thomas; Steven, Andy; Taylor, John
2016-12-01
Skillful marine biogeochemical (BGC) models are required to understand a range of coastal and global phenomena such as changes in nitrogen and carbon cycles. The refinement of BGC models through the assimilation of variables calculated from observed in-water inherent optical properties (IOPs), such as phytoplankton absorption, is problematic. Empirically derived relationships between IOPs and variables such as chlorophyll-a concentration (Chl a), total suspended solids (TSS) and coloured dissolved organic matter (CDOM) have been shown to have errors that can exceed 100 % of the observed quantity. These errors are greatest in shallow coastal regions, such as the Great Barrier Reef (GBR), due to the additional signal from bottom reflectance. Rather than assimilate quantities calculated using IOP algorithms, this study demonstrates the advantages of assimilating quantities calculated directly from the less error-prone satellite remote-sensing reflectance (RSR). To assimilate the observed RSR, we use an in-water optical model to produce an equivalent simulated RSR and calculate the mismatch between the observed and simulated quantities to constrain the BGC model with a deterministic ensemble Kalman filter (DEnKF). The traditional assumption that simulated surface Chl a is equivalent to the remotely sensed OC3M estimate of Chl a resulted in a forecast error of approximately 75 %. We show this error can be halved by instead using simulated RSR to constrain the model via the assimilation system. When the analysis and forecast fields from the RSR-based assimilation system are compared with the non-assimilating model, a comparison against independent in situ observations of Chl a, TSS and dissolved inorganic nutrients (NO3, NH4 and DIP) showed that errors are reduced by up to 90 %. In all cases, the assimilation system improves the simulation compared to the non-assimilating model. 
Our approach allows for the incorporation of vast quantities of remote-sensing observations that have in the past been discarded due to shallow water and/or artefacts introduced by terrestrially derived TSS and CDOM or the lack of a calibrated regional IOP algorithm.
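The deterministic ensemble Kalman filter (DEnKF) update named above can be sketched on a toy two-variable biogeochemical state: the mean is corrected with the full gain, while the anomalies are shrunk with half the gain instead of using perturbed observations. The linearized observation operator and all numbers are illustrative assumptions, not the study's coupled optical model:

```python
import numpy as np

rng = np.random.default_rng(5)

n = 100   # ensemble size
# Toy state: [Chl-a, TSS]; we "observe" a reflectance proxy y = H @ state.
ens = rng.multivariate_normal([1.0, 5.0], [[0.09, 0.03], [0.03, 0.25]], n).T

H = np.array([[0.5, 0.1]])   # assumed linearized observation operator
r = 0.01                     # reflectance error variance
y_obs = 0.95

xm = ens.mean(axis=1, keepdims=True)
A = ens - xm
P = A @ A.T / (n - 1)
K = P @ H.T / (H @ P @ H.T + r)   # Kalman gain (2x1)

xm_a = xm + K * (y_obs - H @ xm)  # mean updated with the full gain
A_a = A - 0.5 * K @ (H @ A)       # DEnKF: anomalies updated with half the gain
ens_a = xm_a + A_a

print(ens.mean(axis=1), ens_a.mean(axis=1))
```

The half-gain anomaly update reduces the ensemble spread deterministically, avoiding the sampling noise of perturbed observations.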
Loop-gap microwave resonator for hybrid quantum systems
NASA Astrophysics Data System (ADS)
Ball, Jason R.; Yamashiro, Yu; Sumiya, Hitoshi; Onoda, Shinobu; Ohshima, Takeshi; Isoya, Junichi; Konstantinov, Denis; Kubo, Yuimaru
2018-05-01
We designed a loop-gap microwave resonator for applications of spin-based hybrid quantum systems and tested it with impurity spins in diamond. Strong coupling with ensembles of nitrogen-vacancy (NV) centers and substitutional nitrogen (P1) centers was observed. These results show that loop-gap resonators are viable in the prospect of spin-based hybrid quantum systems, especially for an ensemble quantum memory or a quantum transducer.
Computer Based Behavioral Biometric Authentication via Multi-Modal Fusion
2013-03-01
the decisions made by each individual modality. Fusion of features is the simple concatenation of feature vectors from multiple modalities to be... [Recovered table fragment — classifier / feature selection method / number of features: BayesNet, MDL, 330; LibSVM, PCA, 80; J48, Wrapper Evaluator, 11.] 3.5.3 Ensemble Based Decision Level Fusion. In ensemble learning multiple... The high fusion percentages validate our hypothesis that by combining features from multiple modalities, classification accuracy can be improved.
Negative correlation learning for customer churn prediction: a comparison study.
Rodan, Ali; Fayyoumi, Ayham; Faris, Hossam; Alsakran, Jamal; Al-Kadi, Omar
2015-01-01
Recently, telecommunication companies have been paying more attention to the problem of identifying customer churn behavior. In business, it is well known among service providers that attracting new customers is much more expensive than retaining existing ones. Therefore, adopting accurate models that are able to predict customer churn can effectively help in customer retention campaigns and in maximizing profit. In this paper we utilize an ensemble of multilayer perceptrons (MLPs) trained using negative correlation learning (NCL) for predicting customer churn in a telecommunication company. Experimental results confirm that the NCL-based MLP ensemble can achieve better generalization performance (high churn rate) compared with an ensemble of MLPs without NCL (flat ensemble) and other common data mining techniques used for churn analysis.
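Negative correlation learning trains each ensemble member on its own error plus a penalty that discourages correlation with the ensemble mean. A minimal sketch with linear stand-ins for the MLP members follows; the dataset and hyperparameters are illustrative, and the gradient uses the common simplified form of the NCL penalty:

```python
import random

random.seed(2)

# Toy regression dataset: y = 2x + noise (a stand-in for churn features).
data = [(x / 10.0, 2.0 * (x / 10.0) + random.gauss(0.0, 0.1)) for x in range(50)]

M = 4                                           # ensemble size
w = [random.uniform(-1, 1) for _ in range(M)]   # one weight per linear "MLP"
lam, lr = 0.5, 0.01                             # NCL strength, learning rate

for _ in range(200):                            # epochs of SGD
    for x, y in data:
        preds = [wi * x for wi in w]
        fbar = sum(preds) / M
        for i in range(M):
            # NCL penalty p_i = (f_i - fbar) * sum_{j != i}(f_j - fbar)
            #                 = -(f_i - fbar)^2, giving the gradient below.
            grad = 2 * (preds[i] - y) * x - lam * 2 * (preds[i] - fbar) * x
            w[i] -= lr * grad

def f_ens(x):
    return sum(wi * x for wi in w) / M

print(w, f_ens(1.0))
```

The penalty pushes individual members apart while the error term keeps the ensemble mean on target, which is the mechanism credited with the improved generalization.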
The Canadian seasonal forecast and the APCC exchange.
NASA Astrophysics Data System (ADS)
Archambault, B.; Fontecilla, J.; Kharin, V.; Bourgouin, P.; Ashok, K.; Lee, D.
2009-05-01
In this talk, we will first describe the Canadian seasonal forecast system. This system uses a four-model ensemble approach, with each of these models generating a 10-member ensemble. Multi-model issues related to this system will be described. Secondly, we will describe an international multi-system initiative. The Asia-Pacific Economic Cooperation (APEC) is a forum for 21 Pacific Rim countries or regions, including Canada. The APEC Climate Center (APCC) provides seasonal forecasts to their regional climate centers with a Multi Model Ensemble (MME) approach. The APCC MME is based on 13 ensemble prediction systems from different institutions including MSC (Canada), NCEP (USA), COLA (USA), KMA (Korea), JMA (Japan), BOM (Australia) and others. In this presentation, we will describe the basics of this international cooperation.
NASA Astrophysics Data System (ADS)
Liu, Li; Gao, Chao; Xuan, Weidong; Xu, Yue-Ping
2017-11-01
Ensemble flood forecasts by hydrological models using numerical weather prediction products as forcing data are becoming more commonly used in operational flood forecasting applications. In this study, a hydrological ensemble flood forecasting system comprising an automatically calibrated Variable Infiltration Capacity model and quantitative precipitation forecasts from the TIGGE dataset is constructed for the Lanjiang Basin, Southeast China. The impacts of calibration strategies and ensemble methods on the performance of the system are then evaluated. The hydrological model is optimized by the parallel-programmed ε-NSGA II multi-objective algorithm. According to the solutions by ε-NSGA II, two differently parameterized models are determined to simulate daily flows and peak flows at each of the three hydrological stations. A simple yet effective modular approach is then proposed to combine the daily and peak flows at each station into one composite series. Five ensemble methods and various evaluation metrics are adopted. The results show that ε-NSGA II can provide an objective determination of parameter estimates, and the parallel program permits a more efficient simulation. It is also demonstrated that the forecasts from ECMWF have more favorable skill scores than the other Ensemble Prediction Systems. The multimodel ensembles have advantages over all the single-model ensembles, and the multimodel methods weighted on members and skill scores outperform other methods. Furthermore, the overall performance at the three stations can be satisfactory up to ten days; however, the hydrological errors can degrade the skill score by approximately 2 days, and the influence persists until a lead time of 10 days with a weakening trend. With respect to peak flows selected by the Peaks Over Threshold approach, the ensemble means from single models or multimodels are generally underestimated, indicating that the ensemble mean can bring overall improvement in forecasting of flows. For peak values, taking the flood forecasts from each individual member into account is more appropriate.
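The "weighted on skill scores" combination mentioned above can be illustrated with a toy sketch (the paper's exact weighting scheme is not specified here, so this assumed form simply weights each single-model forecast by a non-negative skill score):

```python
import numpy as np

def skill_weighted_multimodel(forecasts, skill):
    """Combine single-model forecasts (K models x T lead times) into one
    multimodel forecast, weighting each model by its skill score."""
    w = np.clip(np.asarray(skill, dtype=float), 0.0, None)  # no negative weights
    w /= w.sum()
    return w @ np.asarray(forecasts, dtype=float)
```

A model with zero (or negative, clipped) skill contributes nothing, while equal skills reduce the combination to the plain multimodel mean.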
NASA Astrophysics Data System (ADS)
Brown, James; Seo, Dong-Jun
2010-05-01
Operational forecasts of hydrometeorological and hydrologic variables often contain large uncertainties, for which ensemble techniques are increasingly used. However, the utility of ensemble forecasts depends on the unbiasedness of the forecast probabilities. We describe a technique for quantifying and removing biases from ensemble forecasts of hydrometeorological and hydrologic variables, intended for use in operational forecasting. The technique makes no a priori assumptions about the distributional form of the variables, which is often unknown or difficult to model parametrically. The aim is to estimate the conditional cumulative distribution function (ccdf) of the observed variable given a (possibly biased) real-time ensemble forecast from one or several forecasting systems (multi-model ensembles). The technique is based on Bayesian optimal linear estimation of indicator variables, and is analogous to indicator cokriging (ICK) in geostatistics. By developing linear estimators for the conditional expectation of the observed variable at many thresholds, ICK provides a discrete approximation of the full ccdf. Since ICK minimizes the conditional error variance of the indicator expectation at each threshold, it effectively minimizes the Continuous Ranked Probability Score (CRPS) when infinitely many thresholds are employed. However, the ensemble members used as predictors in ICK, and other bias-correction techniques, are often highly cross-correlated, both within and between models. Thus, we propose an orthogonal transform of the predictors used in ICK, which is analogous to using their principal components in the linear system of equations. This leads to a well-posed problem in which a minimum number of predictors are used to provide maximum information content in terms of the total variance explained. 
The technique is used to bias-correct precipitation ensemble forecasts from the NCEP Global Ensemble Forecast System (GEFS), for which independent validation results are presented. Extension to multimodel ensembles from the NCEP GFS and Short Range Ensemble Forecast (SREF) systems is also proposed.
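The link noted above between threshold-wise indicator estimates and the CRPS can be sketched numerically (a generic illustration, not the authors' ICK code): the CRPS is the integral, over thresholds, of the squared difference between the forecast cdf and the observation step function, i.e. the Brier score at each threshold.

```python
import numpy as np

def threshold_crps(ensemble, obs, thresholds):
    """Discrete CRPS approximation from threshold indicator variables.

    CRPS = integral over t of (F(t) - 1{obs <= t})^2 dt, where F is the
    empirical cdf of the ensemble; each integrand value is the Brier
    score of the threshold-exceedance forecast at that threshold.
    """
    F = (ensemble[:, None] <= thresholds[None, :]).mean(axis=0)
    H = (obs <= thresholds).astype(float)
    g = (F - H) ** 2
    # trapezoidal integration over the threshold grid
    return float(np.sum(0.5 * (g[1:] + g[:-1]) * np.diff(thresholds)))
```

As the threshold grid is refined, this sum converges to the usual CRPS, which is why minimising the indicator error variance at every threshold effectively minimises the CRPS.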
Predicting protein function and other biomedical characteristics with heterogeneous ensembles
Whalen, Sean; Pandey, Om Prakash
2015-01-01
Prediction problems in biomedical sciences, including protein function prediction (PFP), are generally quite difficult. This is due in part to incomplete knowledge of the cellular phenomenon of interest, the appropriateness and data quality of the variables and measurements used for prediction, as well as a lack of consensus regarding the ideal predictor for specific problems. In such scenarios, a powerful approach to improving prediction performance is to construct heterogeneous ensemble predictors that combine the output of diverse individual predictors that capture complementary aspects of the problems and/or datasets. In this paper, we demonstrate the potential of such heterogeneous ensembles, derived from stacking and ensemble selection methods, for addressing PFP and other similar biomedical prediction problems. Deeper analysis of these results shows that the superior predictive ability of these methods, especially stacking, can be attributed to their attention to the following aspects of the ensemble learning process: (i) better balance of diversity and performance, (ii) more effective calibration of outputs and (iii) more robust incorporation of additional base predictors. Finally, to make the effective application of heterogeneous ensembles to large complex datasets (big data) feasible, we present DataSink, a distributed ensemble learning framework, and demonstrate its sound scalability using the examined datasets. DataSink is publicly available from https://github.com/shwhalen/datasink. PMID:26342255
pE-DB: a database of structural ensembles of intrinsically disordered and of unfolded proteins.
Varadi, Mihaly; Kosol, Simone; Lebrun, Pierre; Valentini, Erica; Blackledge, Martin; Dunker, A Keith; Felli, Isabella C; Forman-Kay, Julie D; Kriwacki, Richard W; Pierattelli, Roberta; Sussman, Joel; Svergun, Dmitri I; Uversky, Vladimir N; Vendruscolo, Michele; Wishart, David; Wright, Peter E; Tompa, Peter
2014-01-01
The goal of pE-DB (http://pedb.vib.be) is to serve as an openly accessible database for the deposition of structural ensembles of intrinsically disordered proteins (IDPs) and of denatured proteins based on nuclear magnetic resonance spectroscopy, small-angle X-ray scattering and other data measured in solution. Owing to the inherent flexibility of IDPs, solution techniques are particularly appropriate for characterizing their biophysical properties, and structural ensembles in agreement with these data provide a convenient tool for describing the underlying conformational sampling. Database entries consist of (i) primary experimental data with descriptions of the acquisition methods and algorithms used for the ensemble calculations, and (ii) the structural ensembles consistent with these data, provided as a set of models in a Protein Data Bank format. pE-DB is open for submissions from the community, and is intended as a forum for disseminating the structural ensembles and the methodologies used to generate them. While the need to represent the IDP structures is clear, methods for determining and evaluating the structural ensembles are still evolving. The availability of the pE-DB database is expected to promote the development of new modeling methods and lead to a better understanding of how function arises from disordered states.
NASA Astrophysics Data System (ADS)
Silvestro, Paolo Cosmo; Yang, Hao; Jin, X. L.; Yang, Guijun; Casa, Raffaele; Pignatti, Stefano
2016-08-01
The ultimate aim of this work is to develop methods for the assimilation of biophysical variables estimated by remote sensing into a suitable crop growth model. Two strategies were followed, one based on the use of Leaf Area Index (LAI) estimated from optical data, and the other based on the use of biomass estimated from SAR. The first strategy estimates LAI from the reflectance measured by the optical sensors on board HJ1A, HJ1B and Landsat, using a method based on the training of artificial neural networks (ANN) with PROSAIL model simulations. The retrieved LAI is used to improve wheat yield estimation through assimilation methods based on the Ensemble Kalman Filter, which assimilate the biophysical variables into the crop growth model. The second strategy estimates biomass from SAR imagery. Polarimetric decomposition methods were applied to multi-temporal, fully polarimetric Radarsat-2 data acquired during the entire growing season. The estimated biomass was assimilated into the FAO AquaCrop model to improve winter wheat yield estimation, using the Particle Swarm Optimization (PSO) method. These procedures were applied spatially with data collected in the rural area of Yangling (Shaanxi Province) in 2014 and were validated for a number of wheat fields for which ground yield data had been recorded, as well as against statistical yield data for the area.
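Particle Swarm Optimization itself is a standard global optimizer; a compact generic sketch (illustrative only, with assumed hyperparameters rather than the configuration used in the study) minimising an objective over box bounds:

```python
import numpy as np

def pso(f, lo, hi, n_particles=30, iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise f over the box [lo, hi] with a basic global-best PSO."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    x = rng.uniform(lo, hi, size=(n_particles, lo.size))   # positions
    v = np.zeros_like(x)                                    # velocities
    pbest, pval = x.copy(), np.array([f(p) for p in x])     # personal bests
    gbest = pbest[pval.argmin()].copy()                     # global best
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        # inertia + attraction to personal and global bests
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, lo, hi)
        val = np.array([f(p) for p in x])
        better = val < pval
        pbest[better], pval[better] = x[better], val[better]
        gbest = pbest[pval.argmin()].copy()
    return gbest, float(pval.min())
```

In an assimilation setting, `f` would measure the misfit between model outputs and the remotely sensed biomass, and the search variables would be the model states or parameters being adjusted.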
Uncertainty Analysis in Large Area Aboveground Biomass Mapping
NASA Astrophysics Data System (ADS)
Baccini, A.; Carvalho, L.; Dubayah, R.; Goetz, S. J.; Friedl, M. A.
2011-12-01
Satellite and aircraft-based remote sensing observations are being more frequently used to generate spatially explicit estimates of aboveground carbon stock of forest ecosystems. Because deforestation and forest degradation account for circa 10% of anthropogenic carbon emissions to the atmosphere, policy mechanisms are increasingly recognized as a low-cost mitigation option to reduce carbon emissions. They are, however, contingent upon the capacity to accurately measure carbon stored in forests. Here we examine the sources of uncertainty and error propagation in generating maps of aboveground biomass. We focus on characterizing uncertainties associated with maps at the pixel and spatially aggregated national scales. We pursue three strategies to describe the error and uncertainty properties of aboveground biomass maps: (1) model-based assessment using confidence intervals derived from linear regression methods; (2) data-mining algorithms such as regression trees and ensembles thereof; and (3) empirical assessment using independently collected data sets. The latter effort explores error propagation using field data acquired within satellite-based lidar (GLAS) acquisitions versus alternative in situ methods that rely upon field measurements that have not been systematically collected for this purpose (e.g. from forest inventory data sets). A key goal of our effort is to provide multi-level characterizations that yield both pixel- and biome-level estimates of uncertainties at different scales.
NASA Astrophysics Data System (ADS)
Wu, Kai; Shu, Hong; Nie, Lei; Jiao, Zhenhang
2018-01-01
Spatially correlated errors are typically ignored in data assimilation, degenerating the observation error covariance R to a diagonal matrix. We argue that a nondiagonal R carries more observation information, making assimilation results more accurate. A method, denoted TC_Cov, is proposed for soil moisture data assimilation to estimate spatially correlated observation error covariance based on triple collocation (TC). Assimilation experiments were carried out to test the performance of TC_Cov: AMSR-E soil moisture was assimilated using both a diagonal R matrix computed by TC and a nondiagonal R matrix estimated by the proposed TC_Cov. The ensemble Kalman filter was used as the assimilation method. The assimilation results were validated against climate change initiative data and ground-based soil moisture measurements using the Pearson correlation coefficient and unbiased root mean square difference (ubRMSD) metrics. These experiments confirmed that assimilation results with a diagonal R deteriorate when the model simulation is more accurate than the observation data. Furthermore, the nondiagonal R achieved higher correlation coefficients and lower ubRMSD values than the diagonal R in the experiments, demonstrating the effectiveness of TC_Cov in estimating a richly structured R for data assimilation. In sum, compared with a diagonal R, a nondiagonal R may relieve the detrimental effects of assimilation when simulated model results outperform observation data.
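A minimal perturbed-observations EnKF analysis step that accepts a full (possibly nondiagonal) R can be sketched as follows; this is a generic illustration of the update the abstract refers to, not the authors' code.

```python
import numpy as np

def enkf_analysis(X, y, H, R, rng):
    """One EnKF analysis step with perturbed observations.

    X : (n, N) forecast ensemble of n state variables, N members
    y : (m,) observation vector; H : (m, n) observation operator
    R : (m, m) observation error covariance, diagonal or not
    """
    N = X.shape[1]
    A = X - X.mean(axis=1, keepdims=True)           # ensemble anomalies
    P = A @ A.T / (N - 1)                           # sample forecast covariance
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)    # Kalman gain
    # Perturb observations consistently with R; a nondiagonal R yields
    # spatially correlated observation noise across the m observations
    Y = y[:, None] + rng.multivariate_normal(np.zeros(y.size), R, size=N).T
    return X + K @ (Y - H @ X)
```

The only place R enters is the gain and the observation perturbations, which is why swapping a diagonal R for a correlated one changes the weight given to each observation.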
Molecular dynamics of liquid crystals
NASA Astrophysics Data System (ADS)
Sarman, Sten
1997-02-01
We derive Green-Kubo relations for the viscosities of a nematic liquid crystal. The derivation is based on the application of a Gaussian constraint algorithm that makes the director angular velocity of a liquid crystal a constant of motion. Setting this velocity equal to zero means that a director-based coordinate system becomes an inertial frame and that the constraint torques do not do any work on the system. The system consequently remains in equilibrium. However, one generates a different equilibrium ensemble. The great advantage of this ensemble is that the Green-Kubo relations for the viscosities become linear combinations of time correlation function integrals, whereas they are complicated rational functions in the conventional canonical ensemble. This facilitates the numerical evaluation of the viscosities by molecular dynamics simulations.
Zhou, Shenghan; Qian, Silin; Chang, Wenbing; Xiao, Yiyong; Cheng, Yang
2018-06-14
Timely and accurate state detection and fault diagnosis of rolling element bearings are critical to ensuring the reliability of rotating machinery. This paper proposes a novel method of rolling bearing fault diagnosis based on a combination of ensemble empirical mode decomposition (EEMD), weighted permutation entropy (WPE) and an improved support vector machine (SVM) ensemble classifier. A hybrid voting (HV) strategy that combines SVM-based classifiers and cloud similarity measurement (CSM) was employed to improve the classification accuracy. First, the WPE value of the bearing vibration signal was calculated to detect the fault. Second, if a bearing fault had occurred, the vibration signal was decomposed into a set of intrinsic mode functions (IMFs) by EEMD, and the WPE values of the first several IMFs were calculated to form the fault feature vectors. Then, the SVM ensemble classifier, composed of binary SVMs combined through the HV strategy, identified the bearing multi-fault types. Finally, the proposed model was fully evaluated through experiments and comparative studies. The results demonstrate that the proposed method can effectively detect bearing faults and maintain a high fault recognition accuracy even when only a small number of training samples are available.
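Weighted permutation entropy, the feature at the heart of the pipeline above, can be sketched with a generic textbook-style implementation (parameter choices `m` and `tau` are illustrative, not those of the study):

```python
import numpy as np
from math import factorial

def weighted_permutation_entropy(x, m=3, tau=1):
    """WPE of a 1-D signal: permutation entropy with each ordinal pattern
    weighted by the variance of the segment it came from."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (m - 1) * tau
    weights = {}
    total = 0.0
    for i in range(n):
        seg = x[i : i + m * tau : tau]
        w = seg.var()                       # amplitude-aware weight
        key = tuple(np.argsort(seg))        # ordinal pattern of the segment
        weights[key] = weights.get(key, 0.0) + w
        total += w
    p = np.array(list(weights.values())) / total
    return float(-(p * np.log(p)).sum() / np.log(factorial(m)))  # in [0, 1]
```

A broadband vibration signal yields a value near 1, while a regular (e.g. monotone) signal concentrates weight on a single pattern and yields a value near 0, which is what makes WPE useful as a fault indicator.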
Enhancing Flood Prediction Reliability Using Bayesian Model Averaging
NASA Astrophysics Data System (ADS)
Liu, Z.; Merwade, V.
2017-12-01
Uncertainty analysis is an indispensable part of modeling the hydrology and hydrodynamics of non-idealized environmental systems. Compared to relying on the prediction from a single model simulation, using an ensemble of predictions that considers uncertainty from different sources is more reliable. In this study, Bayesian model averaging (BMA) is applied to the Black River watershed in Arkansas and Missouri, combining multi-model simulations to obtain reliable deterministic water stage and probabilistic inundation extent predictions. The simulation ensemble is generated from 81 LISFLOOD-FP subgrid model configurations that include uncertainty from channel shape, channel width, channel roughness and discharge. Model simulation outputs are trained with observed water stage data during one flood event, and BMA prediction ability is validated for another flood event. Results from this study indicate that BMA does not always outperform all members in the ensemble, but it provides relatively robust deterministic flood stage predictions across the basin. Station-based BMA (BMA_S) water stage prediction performs better than global-based BMA (BMA_G) prediction, which in turn is superior to the ensemble mean prediction. Additionally, the high-frequency flood inundation extent (probability greater than 60%) in the BMA_G probabilistic map is more accurate than the probabilistic flood inundation extent based on equal weights.
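BMA weights for Gaussian member errors are commonly fitted by an EM iteration; the sketch below is a simplified form with a single shared variance, assumed for illustration rather than taken from the study.

```python
import numpy as np

def bma_em(F, y, iters=200):
    """EM estimate of BMA weights for member forecasts F (K, T) vs obs y (T,).

    Assumes each member's predictive pdf is Gaussian, centred on its
    forecast with a shared variance s2; returns (weights, s2).
    """
    K, T = F.shape
    w = np.full(K, 1.0 / K)
    s2 = float(np.var(y - F.mean(axis=0))) + 1e-6
    for _ in range(iters):
        # E-step: responsibility of member k for observation t
        dens = w[:, None] * np.exp(-0.5 * (y - F) ** 2 / s2) / np.sqrt(2 * np.pi * s2)
        z = dens / dens.sum(axis=0, keepdims=True)
        # M-step: update weights and the shared variance
        w = z.mean(axis=1)
        s2 = max(float((z * (y - F) ** 2).sum() / T), 1e-8)
    return w, s2
```

The fitted weights then define both the deterministic BMA mean (`w @ F`) and the predictive mixture used for probabilistic inundation maps.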
NASA Astrophysics Data System (ADS)
Ouyang, Qi; Lu, Wenxi; Lin, Jin; Deng, Wenbing; Cheng, Weiguo
2017-08-01
Surrogate-based simulation-optimization techniques are frequently used for optimal groundwater remediation design. When such techniques are used, surrogate errors caused by surrogate-modeling uncertainty may lead to the generation of infeasible designs. In this paper, a conservative strategy that pushes the optimal design into the feasible region was used to address surrogate-modeling uncertainty; in addition, chance-constrained programming (CCP) was adopted for comparison with the conservative strategy in addressing this uncertainty. Three methods, multi-gene genetic programming (MGGP), Kriging (KRG) and support vector regression (SVR), were used to construct surrogate models for a time-consuming multi-phase flow model. To improve the performance of the surrogate model, ensemble surrogates were constructed from combinations of the different stand-alone surrogate models. The results show that: (1) the surrogate-modeling uncertainty was successfully addressed by the conservative strategy, which means that this method is promising for addressing surrogate-modeling uncertainty; and (2) the ensemble surrogate that combines MGGP with KRG showed the most favorable performance, indicating that this ensemble can exploit both stand-alone surrogate models to improve overall performance.
Ensemble stacking mitigates biases in inference of synaptic connectivity.
Chambers, Brendan; Levy, Maayan; Dechery, Joseph B; MacLean, Jason N
2018-01-01
A promising alternative to directly measuring the anatomical connections in a neuronal population is inferring the connections from the activity. We employ simulated spiking neuronal networks to compare and contrast commonly used inference methods that identify likely excitatory synaptic connections using statistical regularities in spike timing. We find that simple adjustments to standard algorithms improve inference accuracy: A signing procedure improves the power of unsigned mutual-information-based approaches and a correction that accounts for differences in mean and variance of background timing relationships, such as those expected to be induced by heterogeneous firing rates, increases the sensitivity of frequency-based methods. We also find that different inference methods reveal distinct subsets of the synaptic network and each method exhibits different biases in the accurate detection of reciprocity and local clustering. To correct for errors and biases specific to single inference algorithms, we combine methods into an ensemble. Ensemble predictions, generated as a linear combination of multiple inference algorithms, are more sensitive than the best individual measures alone, and are more faithful to ground-truth statistics of connectivity, mitigating biases specific to single inference methods. These weightings generalize across simulated datasets, emphasizing the potential for the broad utility of ensemble-based approaches.
Azami, Hamed; Escudero, Javier
2015-08-01
Breast cancer is one of the most common types of cancer in women all over the world. Early diagnosis of this kind of cancer can significantly increase the chances of long-term survival. Since diagnosis of breast cancer is a complex problem, neural network (NN) approaches have been used as a promising solution. Given the low speed of the back-propagation (BP) algorithm when training a feed-forward NN, we consider a number of improved NN training methods for the Wisconsin breast cancer dataset: BP with momentum, BP with adaptive learning rate, BP with adaptive learning rate and momentum, Polak-Ribière conjugate gradient algorithm (CGA), Fletcher-Reeves CGA, Powell-Beale CGA, scaled CGA, resilient BP (RBP), one-step secant and quasi-Newton methods. An NN ensemble, a learning paradigm that combines a number of NN outputs, is used to improve the accuracy of the classification task. Results demonstrate that NN ensemble-based classification methods have better performance than single-NN-based algorithms. The highest overall average accuracy is 97.68%, obtained by an NN ensemble trained by RBP under the 50%-50% training-test evaluation method.
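Combining the outputs of several trained networks can be as simple as averaging their class-probability outputs (soft voting); a minimal sketch, assuming each network emits per-class probabilities, and not tied to the specific toolchain of the study:

```python
import numpy as np

def soft_vote(prob_outputs):
    """prob_outputs: list of (n_samples, n_classes) probability arrays,
    one per trained network. Returns the ensemble's predicted class ids."""
    return np.mean(prob_outputs, axis=0).argmax(axis=1)
```

Averaging probabilities, rather than hard labels, lets a confident minority outvote an uncertain majority, which is often where the ensemble's accuracy gain comes from.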
Ozcift, Akin
2012-08-01
Parkinson disease (PD) is an age-related deterioration of certain nerve systems that affects movement, balance, and muscle control of patients. PD is one of the common diseases, affecting about 1% of people older than 60 years. A new classification scheme, based on support vector machine (SVM)-selected features used to train rotation forest (RF) ensemble classifiers, is presented for improving the diagnosis of PD. The dataset contains records of voice measurements from 31 people, 23 with PD, and each record in the dataset is defined by 22 features. The diagnosis model first makes use of a linear SVM to select the ten most relevant features out of the 22. As a second step of the classification model, six different classifiers are trained with the subset of features. Subsequently, at the third step, the accuracies of the classifiers are improved by the utilization of the RF ensemble classification strategy. The results of the experiments are evaluated using three metrics: classification accuracy (ACC), Kappa Error (KE) and Area under the Receiver Operating Characteristic (ROC) Curve (AUC). Performance measures of two base classifiers, i.e. KStar and IBk, demonstrated an apparent increase in PD diagnosis accuracy compared to similar studies in the literature. Overall, the application of the RF ensemble classification scheme improved PD diagnosis significantly in 5 of the 6 classifiers. Numerically, we obtained about 97% accuracy with the RF ensemble of the IBk (a k-nearest-neighbor variant) algorithm, which is quite high performance for Parkinson disease diagnosis.
Kaji, Takahiro; Ito, Syoji; Iwai, Shigenori; Miyasaka, Hiroshi
2009-10-22
Single-molecule and ensemble time-resolved fluorescence measurements were applied to investigate the conformational dynamics of single-stranded DNA (ssDNA) connected to a fluorescein dye by a C6 linker, where the motions of both the DNA and the C6 linker affect the geometry of the system. From the ensemble measurement of fluorescence quenching via photoinduced electron transfer with a guanine base in the DNA sequence, three main conformations were found in aqueous solution: a conformation unaffected by the guanine base within the excited-state lifetime of fluorescein, a conformation in which the fluorescence is dynamically quenched within the excited-state lifetime, and a conformation leading to rapid quenching via a nonfluorescent complex. Analysis of the interphoton-time distribution histograms and FCS autocorrelations from the single-molecule measurements, using parameters acquired from the ensemble measurements, revealed that interconversion among these three conformations took place with two characteristic time constants of several hundreds of nanoseconds and tens of microseconds. The advantage of the combined use of ensemble measurements with single-molecule detection for rather complex dynamic motions is discussed by integrating the experimental results with those obtained by molecular dynamics simulation.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shiogama, Hideo; Imada, Yukiko; Mori, Masato
Here, we describe two unprecedentedly large (100-member), long-term (61-year) ensembles based on MRI-AGCM3.2, which were driven by historical and non-warming climate forcing. These ensembles comprise the "Database for Policy Decision making for Future climate change" (d4PDF). We compare these ensembles to large ensembles based on another climate model, as well as to observed data, to investigate the influence of anthropogenic activities on historical changes in the numbers of record-breaking events, including the annual coldest daily minimum temperature (TNn), the annual warmest daily maximum temperature (TXx) and the annual most intense daily precipitation event (Rx1day). These two climate model ensembles indicate that human activity has already had statistically significant impacts on the number of record-breaking extreme events worldwide, mainly over Northern Hemisphere land. Specifically, human activities have altered the likelihood that a wider area globally would suffer record-breaking TNn, TXx and Rx1day events than that observed over the 2001-2010 period by factors of at least 0.6, 5.4 and 1.3, respectively. However, we also find that the estimated spatial patterns and amplitudes of anthropogenic impacts on the probabilities of record-breaking events are sensitive to the climate model and/or natural-world boundary conditions used in the attribution studies.
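Counting record-breaking events in a series, the basic quantity compared across the ensembles above, is a one-pass computation; this generic helper is illustrative and not the d4PDF analysis code:

```python
def count_records(series):
    """Number of record-breaking values: entries strictly exceeding every
    earlier entry (the first entry counts as a record by convention)."""
    best = float("-inf")
    n = 0
    for v in series:
        if v > best:
            n += 1
            best = v
    return n
```

For cold records such as TNn, the same routine applies to the negated series.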
Scenario approach for the seasonal forecast of Kharif flows from the Upper Indus Basin
NASA Astrophysics Data System (ADS)
Fraz Ismail, Muhammad; Bogacki, Wolfgang
2018-02-01
Snow and glacial melt runoff are the major sources of water contribution from the high mountainous terrain of the Indus River upstream of the Tarbela reservoir. A reliable forecast of seasonal water availability for the Kharif cropping season (April-September) can pave the way towards better water management and a subsequent boost in the agro-economy of Pakistan. The use of degree-day models in conjunction with satellite-based remote-sensing data for the forecasting of seasonal snow and ice melt runoff has proved to be a suitable approach for data-scarce regions. In the present research, the Snowmelt Runoff Model (SRM) has not only been enhanced by incorporating a glacier (G) component but also applied to the forecast of seasonal water availability from the Upper Indus Basin (UIB). The Excel-based SRM+G takes account of separate degree-day factors for snow and glacier melt processes. All-year simulation runs with SRM+G for the period 2003-2014 result in an average flow component distribution of 53, 21, and 26% for snow, glacier, and rain, respectively. The UIB has been divided into Upper and Lower parts because of the different climatic conditions in the Tibetan Plateau. The scenario approach for seasonal forecasting, which, like the Ensemble Streamflow Prediction method, uses historic meteorology as model forcing, has proven to be adequate for long-term water availability forecasts. The forecast accuracy, with a mean absolute percentage error (MAPE) of 9.5%, could be slightly improved compared to two existing operational forecasts for the UIB, and the bias could be reduced to -2.0%. However, the association between forecasts and observations, as well as the skill in predicting extreme conditions, is rather weak for all three models, which motivates further research on the selection of a subset of ensemble members according to forecasted seasonal anomalies.
A multi-source data assimilation framework for flood forecasting: Accounting for runoff routing lags
NASA Astrophysics Data System (ADS)
Meng, S.; Xie, X.
2015-12-01
In flood forecasting practice, model performance is usually degraded by various sources of uncertainty, including uncertainties from input data, model parameters, model structures and output observations. Data assimilation is a useful methodology for reducing these uncertainties. For short-term flood forecasting, an accurate estimate of the initial soil moisture condition improves forecasting performance; the time delay of runoff routing is another important influence on performance. Moreover, observation data for hydrological variables (including ground observations and satellite observations) are becoming easily available, so the reliability of short-term flood forecasting can be improved by assimilating multi-source data. The objective of this study is to develop a multi-source data assimilation framework for real-time flood forecasting. In this framework, the first step assimilates up-layer soil moisture observations to update the model states and generated runoff based on the ensemble Kalman filter (EnKF) method, and the second step assimilates discharge observations to update the model states and runoff within a fixed time window based on the ensemble Kalman smoother (EnKS) method. The smoothing technique is adopted to account for the runoff routing lag. Using such an assimilation framework for soil moisture and discharge observations is expected to improve flood forecasting. To distinguish the effectiveness of this dual-step assimilation framework, we designed a dual-EnKF algorithm in which the observed soil moisture and discharge are assimilated separately, without accounting for the runoff routing lag. The results show that the multi-source data assimilation framework can effectively improve flood forecasting, especially when the runoff routing has a distinct time lag. Thus, this new data assimilation framework holds great potential in operational flood forecasting by merging observations from ground measurements and remote sensing retrievals.
Deep learning ensemble with asymptotic techniques for oscillometric blood pressure estimation.
Lee, Soojeong; Chang, Joon-Hyuk
2017-11-01
This paper proposes a deep-learning-based ensemble regression estimator with asymptotic techniques, and offers a method that can decrease uncertainty in oscillometric blood pressure (BP) measurements using the bootstrap and Monte-Carlo approaches. The former is used to estimate SBP and DBP, while the latter determines confidence intervals (CIs) for SBP and DBP based on oscillometric BP measurements. This work employs deep belief networks (DBN)-deep neural networks (DNN) to effectively estimate BPs based on oscillometric measurements. However, there are some inherent problems with these methods. First, it is not easy to determine the best DBN-DNN estimator, and valuable information might be omitted when selecting one DBN-DNN estimator and discarding the others. Additionally, our input feature vectors, obtained from only five measurements per subject, represent a very small sample size; this is a critical weakness when using the DBN-DNN technique and can cause overfitting or underfitting, depending on the structure of the algorithm. To address these problems, an ensemble with an asymptotic approach (based on combining the bootstrap with the DBN-DNN technique) is utilized to generate the pseudo features needed to estimate the SBP and DBP. In the first stage, the bootstrap-aggregation technique is used to create ensemble parameters. Afterward, the AdaBoost approach is employed for the second-stage SBP and DBP estimation. In the third stage, we use the bootstrap and Monte-Carlo techniques to determine the CIs based on the target BP estimated by the DBN-DNN ensemble regression estimator with the asymptotic technique. The proposed method mitigates estimation uncertainty such as a large standard deviation of error (SDE): comparing the proposed DBN-DNN ensemble regression estimator with the DBN-DNN single regression estimator, we find that the SDEs of the SBP and DBP are reduced by 0.58 and 0.57 mmHg, respectively, corresponding to performance improvements of 9.18% and 10.88% over the single estimator. The proposed methodology improves the accuracy of BP estimation and reduces the uncertainty of BP estimation.
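The bootstrap/Monte-Carlo CI idea in the third stage can be illustrated with a generic percentile bootstrap (an assumed simplification for illustration; the paper's actual CIs are built around the DBN-DNN ensemble estimates, not a plain sample mean):

```python
import numpy as np

def bootstrap_ci(measurements, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the mean of a small
    sample, e.g. five oscillometric BP estimates per subject."""
    rng = np.random.default_rng(seed)
    x = np.asarray(measurements, dtype=float)
    # Resample with replacement and collect the statistic of interest
    stats = np.array([rng.choice(x, size=x.size, replace=True).mean()
                      for _ in range(n_boot)])
    lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return x.mean(), (float(lo), float(hi))
```

With only five measurements per subject, the resulting interval is wide, which is exactly the small-sample uncertainty the paper's ensemble approach is designed to tighten.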
Diurnal Ensemble Surface Meteorology Statistics
Excel file containing diurnal ensemble statistics of 2-m temperature, 2-m mixing ratio and 10-m wind speed. The file contains the figures underlying Figure 2 in the paper, and worksheets with all statistics for the 14 ensemble members and a base simulation. This dataset is associated with the following publication: Gilliam, R., C. Hogrefe, J. Godowitch, S. Napelenok, R. Mathur, and S.T. Rao. Impact of inherent meteorology uncertainty on air quality model predictions. Journal of Geophysical Research: Atmospheres, 120(23): 12,259–12,280 (2015).
Mode locking of electron spin coherences in singly charged quantum dots.
Greilich, A; Yakovlev, D R; Shabaev, A; Efros, Al L; Yugova, I A; Oulton, R; Stavarache, V; Reuter, D; Wieck, A; Bayer, M
2006-07-21
The fast dephasing of electron spins in an ensemble of quantum dots is detrimental for applications in quantum information processing. We show here that dephasing can be overcome by using a periodic train of light pulses to synchronize the phases of the precessing spins, and we demonstrate this effect in an ensemble of singly charged (In,Ga)As/GaAs quantum dots. This mode locking leads to constructive interference of contributions to Faraday rotation and presents potential applications based on robust quantum coherence within an ensemble of dots.
Ensemble inequivalence and Maxwell construction in the self-gravitating ring model
NASA Astrophysics Data System (ADS)
Rocha Filho, T. M.; Silvestre, C. H.; Amato, M. A.
2018-06-01
The statement that Gibbs equilibrium ensembles are equivalent is a baseline assumption in many approaches to equilibrium statistical mechanics. However, it is a known fact that for some physical systems this equivalence does not hold. In this paper we illustrate from first principles the inequivalence between the canonical and microcanonical ensembles for a system with long-range interactions. We use molecular dynamics and Monte Carlo simulations to explore the thermodynamic properties of the self-gravitating ring model and discuss under what conditions the Maxwell construction is applicable.
USDA-ARS's Scientific Manuscript database
Ensembles of process-based crop models are now commonly used to simulate crop growth and development for climate scenarios of temperature and/or precipitation changes corresponding to different projections of atmospheric CO2 concentrations. This approach generates large datasets with thousands of de...
Behavior of Filters and Smoothers for Strongly Nonlinear Dynamics
NASA Technical Reports Server (NTRS)
Zhu, Yanqui; Cohn, Stephen E.; Todling, Ricardo
1999-01-01
The Kalman filter is the optimal filter in the presence of known Gaussian error statistics and linear dynamics. Extending the filter to nonlinear dynamics is nontrivial in the sense of appropriately representing high-order moments of the statistics. Monte Carlo, ensemble-based, methods have been advocated as the methodology for representing high-order moments without any questionable closure assumptions. Investigation along these lines has been conducted for highly idealized dynamics, such as the strongly nonlinear Lorenz model, as well as more realistic models of the ocean and atmosphere. A few relevant issues in this context are the number of ensemble members necessary to properly represent the error statistics, and the modifications needed in the usual filter equations to allow for a correct update of the ensemble members. The ensemble technique has also been applied to the problem of smoothing, for which similar questions apply. Ensemble smoother examples, however, seem to be quite puzzling in that the resulting state estimates are worse than those of their filter analogues. In this study, we use concepts in probability theory to revisit the ensemble methodology for filtering and smoothing in data assimilation. We use the Lorenz model to test and compare the behavior of a variety of implementations of ensemble filters. We also implement ensemble smoothers that are able to perform better than their filter counterparts. A discussion of the feasibility of applying these techniques to large data assimilation problems will be given at the time of the conference.
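As a concrete sketch of the ensemble filtering idea discussed above, the following implements one stochastic (perturbed-observation) EnKF analysis step, with the Lorenz-63 equations as test dynamics. The forward-Euler integrator, the parameter values, and the choice of observing only the first state component are illustrative assumptions, not the authors' configuration:

```python
import numpy as np

def lorenz63_step(state, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """One forward-Euler step of the Lorenz-63 model (illustration only)."""
    x, y, z = state
    return state + dt * np.array([sigma * (y - x),
                                  x * (rho - z) - y,
                                  x * y - beta * z])

def enkf_analysis(ensemble, obs, obs_var, H, rng):
    """Stochastic EnKF update with perturbed observations.
    ensemble: (m, n) array of m state members; H: (p, n) observation operator."""
    m = ensemble.shape[0]
    X = ensemble - ensemble.mean(axis=0)            # ensemble anomalies
    P = X.T @ X / (m - 1)                           # sample forecast covariance
    S = H @ P @ H.T + obs_var * np.eye(H.shape[0])  # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)                  # Kalman gain
    for i in range(m):
        # each member assimilates an independently perturbed observation
        d = obs + rng.normal(0.0, np.sqrt(obs_var), H.shape[0]) - H @ ensemble[i]
        ensemble[i] = ensemble[i] + K @ d
    return ensemble
```

A single analysis of a biased ensemble against an accurate observation of x pulls the ensemble mean toward the truth, which is the behavior the filter comparisons above rely on.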
Günther, Oliver P; Chen, Virginia; Freue, Gabriela Cohen; Balshaw, Robert F; Tebbutt, Scott J; Hollander, Zsuzsanna; Takhar, Mandeep; McMaster, W Robert; McManus, Bruce M; Keown, Paul A; Ng, Raymond T
2012-12-08
Background: Biomarker panels derived separately from genomic and proteomic data and with a variety of computational methods have demonstrated promising classification performance in various diseases. An open question is how to create effective proteo-genomic panels. The framework of ensemble classifiers has been applied successfully in various analytical domains to combine classifiers so that the performance of the ensemble exceeds the performance of individual classifiers. Using blood-based diagnosis of acute renal allograft rejection as a case study, we address the following question in this paper: Can acute rejection classification performance be improved by combining individual genomic and proteomic classifiers in an ensemble? Results: The first part of the paper presents a computational biomarker development pipeline for genomic and proteomic data. The pipeline begins with data acquisition (e.g., from bio-samples to microarray data), quality control, statistical analysis and mining of the data, and finally various forms of validation. The pipeline ensures that the various classifiers to be combined later in an ensemble are diverse and adequate for clinical use. Five mRNA genomic and five proteomic classifiers were developed independently using single time-point blood samples from 11 acute-rejection and 22 non-rejection renal transplant patients. The second part of the paper examines five ensembles ranging in size from two to 10 individual classifiers. Performance of ensembles is characterized by area under the curve (AUC), sensitivity, and specificity, as derived from the probability of acute rejection for individual classifiers in the ensemble in combination with one of two aggregation methods: (1) Average Probability or (2) Vote Threshold. One ensemble demonstrated superior performance and was able to improve sensitivity and AUC beyond the best values observed for any of the individual classifiers in the ensemble, while staying within the range of observed specificity. The Vote Threshold aggregation method achieved improved sensitivity for all five ensembles, but typically at the cost of decreased specificity. Conclusion: Proteo-genomic biomarker ensemble classifiers show promise in the diagnosis of acute renal allograft rejection and can improve classification performance beyond that of individual genomic or proteomic classifiers alone. Validation of our results in an international multicenter study is currently underway. PMID:23216969
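The two aggregation rules named above can be written down in a few lines. The 0.5 thresholds and the single required vote are illustrative defaults, not the values used in the study:

```python
def average_probability(probs, threshold=0.5):
    """Positive call if the mean of the member probabilities exceeds threshold."""
    p = sum(probs) / len(probs)
    return p, p >= threshold

def vote_threshold(probs, member_threshold=0.5, min_votes=1):
    """Positive call if at least `min_votes` members individually exceed
    their own decision threshold."""
    votes = sum(1 for p in probs if p >= member_threshold)
    return votes, votes >= min_votes
```

With member probabilities [0.9, 0.2, 0.3], Average Probability yields 0.467 and a negative call, while Vote Threshold with a single required vote yields a positive call, illustrating why the vote rule tends to raise sensitivity at the cost of specificity.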
Snow multivariable data assimilation for hydrological predictions in mountain areas
NASA Astrophysics Data System (ADS)
Piazzi, Gaia; Campo, Lorenzo; Gabellani, Simone; Rudari, Roberto; Castelli, Fabio; Cremonese, Edoardo; Morra di Cella, Umberto; Stevenin, Hervé; Ratto, Sara Maria
2016-04-01
The seasonal presence of snow on alpine catchments strongly impacts both the surface energy balance and water resources. Thus, knowledge of snowpack dynamics is of critical importance for several applications, such as water resource management, flood prediction and hydroelectric power production. Several independent data sources provide information about the snowpack state: ground-based measurements, satellite data and physical models. Although all these data types are reliable, each of them is affected by specific flaws and errors (respectively, dependency on local conditions, sensor biases and limitations, and initialization and poor-quality forcing data). Moreover, there are physical factors that complicate an exhaustive reconstruction of snow dynamics: snow intermittence in space and time, stratification and slow phenomena like metamorphism processes, uncertainty in snowfall evaluation, wind transport, etc. Data Assimilation (DA) techniques provide an objective methodology to combine observational and modeled information to obtain the most likely estimate of the snowpack state. Indeed, by combining all the available sources of information, the implementation of DA schemes can quantify and reduce the uncertainties of the estimates. This study presents SMASH (Snow Multidata Assimilation System for Hydrology), a multi-layer snow dynamic model strengthened by a robust multivariable data assimilation algorithm. The model is physically based on mass and energy balances and can be used to reproduce the main physical processes occurring within the snowpack: accumulation, density dynamics, melting, sublimation, radiative balance, and heat and mass exchanges. The model is driven by observed meteorological forcing data (air temperature, wind velocity, relative air humidity, precipitation and incident solar radiation) to provide a complete estimate of the snowpack state. 
The implementation of an Ensemble Kalman Filter (EnKF) scheme enables the simultaneous assimilation of ground-based and remotely sensed data of different snow-related variables (snow albedo and surface temperature, Snow Water Equivalent from passive microwave sensors, and Snow Cover Area). SMASH performance was evaluated over the period June 2012 - December 2013 at the meteorological station of Torgnon (Tellinod, 2160 m a.s.l.), located in Aosta Valley, a mountain region in northwestern Italy. The EnKF algorithm was first tested by assimilating several ground-based measurements: snow depth, land surface temperature, snow density and albedo. The assimilation of observed snow data revealed an overall considerable enhancement of model predictions with respect to the open-loop experiments. A first attempt to also integrate remotely sensed information was performed by assimilating the Land Surface Temperature (LST) from METEOSAT Second Generation (MSG), leading to good results. The analysis identified snow depth and snowpack surface temperature as the most influential variables in the assimilation process. In order to pinpoint an optimal ensemble size, SMASH performance was also quantitatively evaluated by varying the number of ensemble instances. Furthermore, the impact of the data assimilation frequency was analyzed by varying the assimilation time step (3 h, 6 h, 12 h, 24 h).
Forecasting European Droughts using the North American Multi-Model Ensemble (NMME)
NASA Astrophysics Data System (ADS)
Thober, Stephan; Kumar, Rohini; Samaniego, Luis; Sheffield, Justin; Schäfer, David; Mai, Juliane
2015-04-01
Soil moisture droughts have the potential to diminish crop yields causing economic damage or even threatening the livelihood of societies. State-of-the-art drought forecasting systems incorporate seasonal meteorological forecasts to estimate future drought conditions. Meteorological forecasting skill (in particular that of precipitation), however, is limited to a few weeks because of the chaotic behaviour of the atmosphere. One of the most important challenges in drought forecasting is to understand how the uncertainty in the atmospheric forcings (e.g., precipitation and temperature) is further propagated into hydrologic variables such as soil moisture. The North American Multi-Model Ensemble (NMME) provides the latest collection of a multi-institutional seasonal forecasting ensemble for precipitation and temperature. In this study, we analyse the skill of NMME forecasts for predicting European drought events. The monthly NMME forecasts are downscaled to daily values to force the mesoscale hydrological model (mHM). The mHM soil moisture forecasts obtained with the forcings of the dynamical models are then compared against those obtained with the Ensemble Streamflow Prediction (ESP) approach. ESP recombines historical meteorological forcings to create a new ensemble forecast. Both forecasts are compared against reference soil moisture conditions obtained using observation based meteorological forcings. The study is conducted for the period from 1982 to 2009 and covers a large part of the Pan-European domain (10°W to 40°E and 35°N to 55°N). Results indicate that NMME forecasts are better at predicting the reference soil moisture variability as compared to ESP. For example, NMME explains 50% of the variability in contrast to only 31% by ESP at a six-month lead time. The Equitable Threat Skill Score (ETS), which combines the hit and false alarm rates, is analysed for drought events using a 0.2 threshold of a soil moisture percentile index. 
On average, the NMME based ensemble forecasts have consistently higher skill than the ESP based ones (ETS of 13% as compared to 5% at a six-month lead time). Additionally, the ETS ensemble spread of NMME forecasts is considerably narrower than that of ESP; the lower boundary of the NMME ensemble spread coincides most of the time with the ensemble median of ESP. Among the NMME models, NCEP-CFSv2 outperforms the other models in terms of ETS most of the time. Removing the three worst performing models does not deteriorate the ensemble performance (neither in skill nor in spread), but would substantially reduce the computational resources required in an operational forecasting system. For major European drought events (e.g., 1990, 1992, 2003, and 2007), NMME forecasts tend to underestimate area under drought and drought magnitude during times of drought development. During drought recovery, this underestimation is weaker for area under drought or even reversed into an overestimation for drought magnitude. This indicates that the NMME models are too wet during drought development and too dry during drought recovery. In summary, soil moisture drought forecasts by NMME are more skillful than those of an ESP based approach. However, they still show systematic biases in reproducing the observed drought dynamics during drought development and recovery.
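The Equitable Threat Score used above is computed from a 2x2 contingency table of drought hits, misses, and false alarms, with the hit count corrected for hits expected by chance. A minimal version, with invented counts in the usage example:

```python
def equitable_threat_score(hits, false_alarms, misses, total):
    """ETS (Gilbert skill score): the threat score corrected for the number
    of hits expected by random chance; total includes correct negatives."""
    hits_random = (hits + misses) * (hits + false_alarms) / total
    denom = hits + misses + false_alarms - hits_random
    return (hits - hits_random) / denom if denom else 0.0
```

For example, with 30 hits, 10 false alarms, 20 misses, and 200 forecasts in total, the chance-expected hits are 50 * 40 / 200 = 10, giving ETS = (30 - 10) / (30 + 20 + 10 - 10) = 0.4.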
NASA Astrophysics Data System (ADS)
Neal, J. C.; Wood, M.; Bermúdez, M.; Hostache, R.; Freer, J. E.; Bates, P. D.; Coxon, G.
2017-12-01
Remote sensing of flood inundation extent has long been a potential source of data for constraining and correcting simulations of floodplain inundation. Hydrodynamic models and the computing resources to run them have developed to the extent that simulation of flood inundation in two-dimensional space is now feasible over large river basins in near real-time. However, despite substantial evidence that there is useful information content within inundation extent data, even from low-resolution SAR such as that gathered by Envisat ASAR in wide swath mode, making use of the information in a data assimilation system has proved difficult. Here we review recent applications of the Ensemble Kalman Filter (EnKF) and Particle Filter for assimilating SAR data, with a focus on the River Severn, UK, and compare these with complementary research that has looked at the internal error sources and boundary condition errors using detailed terrestrial data that are not available in most locations. Previous applications of the EnKF to this reach have focused on upstream boundary conditions as the source of flow error; however, this description of errors was too simplistic for the simulation of summer flood events where localised intense rainfall can be substantial. Therefore, we evaluate the introduction of uncertain lateral inflows to the ensemble. A further limitation of the existing EnKF-based methods is the need to convert flood extent to water surface elevations by intersecting the shoreline location with a high-quality digital elevation model (e.g. LiDAR). To simplify this data processing step, we evaluate a method to directly assimilate inundation extent as an EnKF model state rather than assimilating water heights, potentially allowing the scheme to be used where high-quality terrain data are sparse.
Prediction of drug synergy in cancer using ensemble-based machine learning techniques
NASA Astrophysics Data System (ADS)
Singh, Harpreet; Rana, Prashant Singh; Singh, Urvinder
2018-04-01
Drug synergy prediction plays a significant role in the medical field for inhibiting specific cancer agents. It can be developed as a pre-processing tool for therapeutic success. Examination of different drug-drug interactions can be carried out via the drug synergy score, which requires efficient regression-based machine learning approaches to minimize prediction errors. Numerous machine learning techniques such as neural networks, support vector machines, random forests, LASSO, Elastic Nets, etc., have been used in the past to meet this requirement. However, these techniques individually do not provide significant accuracy in drug synergy score prediction. Therefore, the primary objective of this paper is to design a neuro-fuzzy-based ensembling approach. To achieve this, nine well-known machine learning techniques have been implemented on the drug synergy data. Based on the accuracy of each model, the four techniques with the highest accuracy are selected to develop the ensemble-based machine learning model. These models are Random Forest, Fuzzy Rules Using Genetic Cooperative-Competitive Learning (GFS.GCCL), the Adaptive-Network-Based Fuzzy Inference System (ANFIS) and the Dynamic Evolving Neural-Fuzzy Inference System (DENFIS) method. Ensembling is achieved by evaluating the biased weighted aggregation (i.e. adding more weight to the model with a higher prediction score) of the data predicted by the selected models. The proposed and existing machine learning techniques have been evaluated on drug synergy score data. The comparative analysis reveals that the proposed method outperforms the others in terms of accuracy, root mean square error and coefficient of correlation.
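The biased weighted aggregation described above, with weights proportional to each selected model's prediction score, reduces to a normalized weighted average of the member outputs. The predictions and scores in the example are invented for illustration:

```python
def weighted_ensemble_predict(predictions, scores):
    """Combine member regression outputs with weights proportional to each
    member's validation score (higher score -> more weight)."""
    total = sum(scores)
    weights = [s / total for s in scores]
    return sum(w * p for w, p in zip(weights, predictions))
```

For instance, member predictions [10, 20] with scores [1, 3] give weights [0.25, 0.75] and a combined prediction of 17.5, biased toward the stronger model.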
Piao, Yongjun; Piao, Minghao; Ryu, Keun Ho
2017-01-01
Cancer classification has been a crucial topic of research in cancer treatment. In the last decade, messenger RNA (mRNA) expression profiles have been widely used to classify different types of cancers. With the discovery of a new class of small non-coding RNAs, known as microRNAs (miRNAs), various studies have shown that the expression patterns of miRNA can also accurately classify human cancers. Therefore, there is a great demand for the development of machine learning approaches to accurately classify various types of cancers using miRNA expression data. In this article, we propose a feature subset-based ensemble method in which each model is learned from a different projection of the original feature space to classify multiple cancers. In our method, feature relevance and redundancy are considered to generate multiple feature subsets, the base classifiers are learned from each independent miRNA subset, and the average posterior probability is used to combine the base classifiers. To test the performance of our method, we used bead-based and sequence-based miRNA expression datasets and conducted 10-fold and leave-one-out cross validations. The experimental results show that the proposed method yields good results and has higher prediction accuracy than popular ensemble methods. The Java program and source code of the proposed method and the datasets in the experiments are freely available at https://sourceforge.net/projects/mirna-ensemble/. Copyright © 2016 Elsevier Ltd. All rights reserved.
Instances selection algorithm by ensemble margin
NASA Astrophysics Data System (ADS)
Saidi, Meryem; Bechar, Mohammed El Amine; Settouti, Nesma; Chikh, Mohamed Amine
2018-05-01
The main limitation of data mining algorithms is their inability to deal with the huge amount of available data in a reasonable processing time. One solution for producing fast and accurate results is instance and feature selection. This process eliminates noisy or redundant data in order to reduce storage and computational cost without performance degradation. In this paper, a new instance selection approach called the Ensemble Margin Instance Selection (EMIS) algorithm is proposed. This approach is based on the ensemble margin. To evaluate our approach, we conducted several experiments on different real-world classification problems from the UCI Machine Learning repository. Pixel-based image segmentation is a field where the storage requirements and computational cost of the applied model become high. To address these limitations, we conduct a study based on the application of EMIS and other instance selection techniques to the segmentation and automatic recognition of white blood cells (WBC) (nucleus and cytoplasm) in cytological images.
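A common definition of the ensemble margin, the normalized gap between the votes for an instance's label and the strongest competing label, can be sketched as follows. The keep-the-top-fraction selection rule is a simplified stand-in, since the abstract does not spell out the exact EMIS criterion:

```python
def ensemble_margin(votes, true_label):
    """Margin of one instance: (v_true - v_strongest_other) / T, where
    `votes` maps label -> number of base classifiers voting for it and
    T is the ensemble size. Ranges from -1 (all wrong) to +1 (unanimous)."""
    T = sum(votes.values())
    v_true = votes.get(true_label, 0)
    v_other = max((v for lbl, v in votes.items() if lbl != true_label), default=0)
    return (v_true - v_other) / T

def select_instances(margins, keep_fraction=0.8):
    """Keep the indices of the highest-margin fraction of instances,
    discarding low-margin (noisy or ambiguous) ones."""
    order = sorted(range(len(margins)), key=lambda i: margins[i], reverse=True)
    k = max(1, int(keep_fraction * len(margins)))
    return sorted(order[:k])
```

With 7 of 10 classifiers voting for the true label and 3 for the alternative, the margin is (7 - 3) / 10 = 0.4; instances with strongly negative margins are the natural candidates for removal.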
Wan, Shibiao; Mak, Man-Wai; Kung, Sun-Yuan
2016-12-02
In the postgenomic era, the number of unreviewed protein sequences is remarkably larger and grows tremendously faster than that of reviewed ones. However, existing methods for protein subchloroplast localization often ignore the information from these unlabeled proteins. This paper proposes a multi-label predictor based on ensemble linear neighborhood propagation (LNP), namely, LNP-Chlo, which leverages hybrid sequence-based feature information from both labeled and unlabeled proteins for predicting localization of both single- and multi-label chloroplast proteins. Experimental results on a stringent benchmark dataset and a novel independent dataset suggest that LNP-Chlo performs at least 6% (absolute) better than state-of-the-art predictors. This paper also demonstrates that ensemble LNP significantly outperforms LNP based on individual features. For readers' convenience, the online Web server LNP-Chlo is freely available at http://bioinfo.eie.polyu.edu.hk/LNPChloServer/ .
NASA Astrophysics Data System (ADS)
Simon, E.; Bertino, L.; Samuelsen, A.
2011-12-01
Combined state-parameter estimation in ocean biogeochemical models with ensemble-based Kalman filters is a challenging task due to the non-linearity of the models, the constraints of positiveness that apply to the variables and parameters, and the resulting non-Gaussian distributions of the variables. Furthermore, these models are sensitive to numerous parameters that are poorly known. Previous work [1] demonstrated that Gaussian anamorphosis extensions of ensemble-based Kalman filters are relevant tools for combined state-parameter estimation in such a non-Gaussian framework. In this study, we focus on the estimation of the grazing preference parameters of zooplankton species. These parameters are introduced to model the diet of zooplankton species among phytoplankton species and detritus. They are positive values and their sum is equal to one. Because the sum-to-one constraint cannot be handled by ensemble-based Kalman filters, a reformulation of the parameterization is proposed. We investigate two types of changes of variables for the estimation of sum-to-one constrained parameters. The first one is based on Gelman [2] and leads to the estimation of normally distributed parameters. The second one is based on the representation of the unit sphere in spherical coordinates and leads to the estimation of parameters with bounded distributions (triangular or uniform). These formulations are illustrated and discussed in the framework of twin experiments realized in the 1D coupled model GOTM-NORWECOM with Gaussian anamorphosis extensions of the deterministic ensemble Kalman filter (DEnKF). [1] Simon E., Bertino L.: Gaussian anamorphosis extension of the DEnKF for combined state and parameter estimation: application to a 1D ocean ecosystem model. Journal of Marine Systems, 2011. doi:10.1016/j.jmarsys.2011.07.007 [2] Gelman A.: Method of Moments Using Monte Carlo Simulation. Journal of Computational and Graphical Statistics, 4(1), 36-54, 1995.
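The second change of variables mentioned above, representing simplex-constrained grazing preferences through spherical coordinates on the unit sphere, can be illustrated with squared spherical coordinates, which map n-1 unconstrained angles to n positive weights summing to one. This is a sketch of the general idea, not necessarily the paper's exact parameterization:

```python
import math

def spherical_to_simplex(angles):
    """Map n-1 angles to n positive weights that sum to one, via squared
    spherical coordinates on the unit sphere: the filter can then update
    the unconstrained angles while the constraint holds by construction."""
    weights = []
    remaining = 1.0
    for a in angles:
        weights.append(remaining * math.cos(a) ** 2)  # peel off one component
        remaining *= math.sin(a) ** 2                 # mass left for the rest
    weights.append(remaining)
    return weights
```

Since cos²θ + sin²θ = 1 at every level, the weights telescope to exactly one regardless of the angle values, so the sum-to-one constraint never needs explicit handling in the update step.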
Calibration of limited-area ensemble precipitation forecasts for hydrological predictions
NASA Astrophysics Data System (ADS)
Diomede, Tommaso; Marsigli, Chiara; Montani, Andrea; Nerozzi, Fabrizio; Paccagnella, Tiziana
2015-04-01
The main objective of this study is to investigate the impact of calibration on limited-area ensemble precipitation forecasts, to be used for driving discharge predictions up to 5 days in advance. A reforecast dataset spanning 30 years, based on the Consortium for Small Scale Modeling Limited-Area Ensemble Prediction System (COSMO-LEPS), was used for testing the calibration strategy. Three calibration techniques were applied: quantile-to-quantile mapping, linear regression, and analogs. The performance of these methodologies was evaluated in terms of statistical scores for the precipitation forecasts operationally provided by COSMO-LEPS in the years 2003-2007 over Germany, Switzerland, and the Emilia-Romagna region (northern Italy). The analog-based method was preferable because of its capability to correct position errors and spread deficiencies. A suitable spatial domain for the analog search can help to handle model spatial errors as systematic errors. However, the performance of the analog-based method may degrade when only a limited training dataset is available. A sensitivity test on the length of the training dataset over which to perform the analog search was carried out. The quantile-to-quantile mapping and linear regression methods were less effective, mainly because the forecast-analysis relation was not strong for the available training dataset. A comparison between calibration based on the deterministic reforecast and calibration based on the full operational ensemble used as the training dataset was also considered, with the aim of evaluating whether reforecasts are really worthwhile for calibration, given their remarkable computational cost. The verification of the calibration process was then performed by coupling ensemble precipitation forecasts with a distributed rainfall-runoff model. 
This test was carried out for a medium-sized catchment located in Emilia-Romagna, showing a beneficial impact of the analog-based method on the reduction of missed events for discharge predictions.
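Of the three calibration techniques compared above, quantile-to-quantile mapping is the most self-contained to illustrate: a raw forecast value is assigned its empirical quantile in the forecast climatology, and the calibrated value is read off at the same quantile of the observed climatology. The climatologies in the example are invented toy data:

```python
def quantile_map(value, forecast_climatology, observed_climatology):
    """Simple empirical quantile-to-quantile mapping: find the non-exceedance
    quantile of `value` in the forecast climatology, then return the value
    at that same quantile in the observed climatology."""
    fc = sorted(forecast_climatology)
    ob = sorted(observed_climatology)
    # empirical non-exceedance quantile of `value` among the forecasts
    q = sum(1 for v in fc if v <= value) / len(fc)
    # read the same quantile off the observed climatology
    idx = min(len(ob) - 1, max(0, int(round(q * len(ob))) - 1))
    return ob[idx]
```

If the model systematically forecasts half the observed precipitation, mapping through the paired climatologies doubles a raw forecast back toward the observed scale.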
Sequence Based Prediction of Antioxidant Proteins Using a Classifier Selection Strategy
Zhang, Lina; Zhang, Chengjin; Gao, Rui; Yang, Runtao; Song, Qing
2016-01-01
Antioxidant proteins perform significant functions in maintaining the oxidation/antioxidation balance and offer potential therapies for some diseases. Accurate identification of antioxidant proteins could contribute to revealing physiological processes of oxidation/antioxidation balance and developing novel antioxidation-based drugs. In this study, an ensemble method is presented to predict antioxidant proteins with hybrid features, incorporating SSI (Secondary Structure Information), PSSM (Position Specific Scoring Matrix), RSA (Relative Solvent Accessibility), and CTD (Composition, Transition, Distribution). The prediction results of the ensemble predictor are determined by an average of the prediction results of multiple base classifiers. Based on a classifier selection strategy, we obtain an optimal ensemble classifier composed of RF (Random Forest), SMO (Sequential Minimal Optimization), NNA (Nearest Neighbor Algorithm), and J48 with an accuracy of 0.925. A Relief combined with IFS (Incremental Feature Selection) method is adopted to obtain optimal features from the hybrid features. With the optimal features, the ensemble method achieves improved performance with a sensitivity of 0.95, a specificity of 0.93, an accuracy of 0.94, and an MCC (Matthews Correlation Coefficient) of 0.880, far better than the existing method. To evaluate the prediction performance objectively, the proposed method is compared with existing methods on the same independent testing dataset. Encouragingly, our method performs better than previous studies. In addition, our method achieves more balanced performance with a sensitivity of 0.878 and a specificity of 0.860. These results suggest that the proposed ensemble method can be a potential candidate for antioxidant protein prediction. For public access, we develop a user-friendly web server for antioxidant protein identification that is freely accessible at http://antioxidant.weka.cc. PMID:27662651
DOE Office of Scientific and Technical Information (OSTI.GOV)
Timofeev, Andrey V.; Egorov, Dmitry V.
This paper presents new results concerning selection of an optimal information fusion formula for an ensemble of Lipschitz classifiers. The goal of information fusion is to create an integral classifier which could provide better generalization ability of the ensemble while achieving a practically acceptable level of effectiveness. The problem of information fusion is very relevant for data processing in multi-channel C-OTDR monitoring systems, where we have to effectively classify targeted events which appear in the vicinity of the monitored object. Solution of this problem is based on the usage of an ensemble of Lipschitz classifiers, each of which corresponds to a respective channel. We suggest a new method for information fusion for an ensemble of Lipschitz classifiers, called "The Weighing of Inversely as Lipschitz Constants" (WILC). Results of practical usage of the WILC method in multichannel C-OTDR monitoring systems are presented.
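The WILC idea, weighting each channel's classifier inversely to its Lipschitz constant before fusing the decisions, can be sketched as a normalized weighted combination. The exact fusion formula selected in the paper is not reproduced here; this is only the inverse-weighting principle:

```python
def wilc_weights(lipschitz_constants):
    """Weights inversely proportional to each classifier's Lipschitz constant
    (smoother classifiers get more weight), normalized to sum to one."""
    inv = [1.0 / L for L in lipschitz_constants]
    total = sum(inv)
    return [w / total for w in inv]

def fuse_scores(member_scores, weights):
    """Weighted fusion of per-channel decision scores."""
    return sum(w * s for w, s in zip(weights, member_scores))
```

For Lipschitz constants [1, 2, 4], the normalized weights come out as [4/7, 2/7, 1/7], so the channel with the smallest constant dominates the fused score.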
Ensemble of classifiers for confidence-rated classification of NDE signal
NASA Astrophysics Data System (ADS)
Banerjee, Portia; Safdarnejad, Seyed; Udpa, Lalita; Udpa, Satish
2016-02-01
Ensembles of classifiers, in general, aim to improve classification accuracy by combining results from multiple weak hypotheses into a single strong classifier through weighted majority voting. Improved versions of ensembles of classifiers generate self-rated confidence scores that estimate the reliability of each prediction and boost the classifier using these confidence-rated predictions. However, such a confidence metric is based only on the rate of correct classification. In existing work, although ensembles of classifiers have been widely used in computational intelligence, the effect of all sources of unreliability on the confidence of classification is largely overlooked. In NDE, classification results are affected by the inherent ambiguity of classification, non-discriminative features, inadequate training samples, and measurement noise. In this paper, we extend existing ensemble classification by maximizing the confidence of every classification decision in addition to minimizing the classification error. Initial results of the approach on data from eddy current inspection show improvement in the classification performance of defect and non-defect indications.
Current and Future Decadal Trends in the Oceanic Carbon Uptake Are Dominated by Internal Variability
NASA Astrophysics Data System (ADS)
Li, Hongmei; Ilyina, Tatiana
2018-01-01
We investigate the internal decadal variability of the ocean carbon uptake using 100 ensemble simulations based on the Max Planck Institute Earth system model (MPI-ESM). We find that on decadal time scales, internal variability (ensemble spread) is as large as the forced temporal variability (ensemble mean), and the largest internal variability is found in major carbon sink regions, that is, the 50-65°S band of the Southern Ocean, the North Pacific, and the North Atlantic. The MPI-ESM ensemble produces both positive and negative 10 year trends in the ocean carbon uptake in agreement with observational estimates. Negative decadal trends are projected to occur in the future under the RCP4.5 scenario. Due to the large internal variability, the Southern Ocean and the North Pacific require the most ensemble members (more than 53 and 46, respectively) to reproduce the forced decadal trends. This number increases up to 79 in future decades as the CO2 emission trajectory changes.
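The ensemble mean / ensemble spread decomposition used above as a proxy for the forced signal versus internal variability can be sketched as follows; the member values are toy numbers, not MPI-ESM output.

```python
# Sketch of the basic diagnostic: the ensemble mean estimates the forced
# signal, while the spread (standard deviation across members) estimates
# internal variability for one region and decade.
import math

def ensemble_mean_and_spread(members):
    """members: list of per-member values (e.g. decadal uptake trends)."""
    n = len(members)
    mean = sum(members) / n
    var = sum((m - mean) ** 2 for m in members) / n
    return mean, math.sqrt(var)

mean, spread = ensemble_mean_and_spread([0.1, -0.3, 0.5, -0.1])
```

When the spread is comparable to the mean, as in this toy case, individual members can show trends of either sign.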
A Single-column Model Ensemble Approach Applied to the TWP-ICE Experiment
NASA Technical Reports Server (NTRS)
Davies, L.; Jakob, C.; Cheung, K.; DelGenio, A.; Hill, A.; Hume, T.; Keane, R. J.; Komori, T.; Larson, V. E.; Lin, Y.;
2013-01-01
Single-column models (SCMs) are useful test beds for investigating the parameterization schemes of numerical weather prediction and climate models. The usefulness of SCM simulations is limited, however, by the accuracy of the prescribed best-estimate large-scale observations. Errors in estimating the observations will result in uncertainty in the modeled simulations. One method to address this modeled uncertainty is to simulate an ensemble whose members span the observational uncertainty. This study first derives an ensemble of large-scale data for the Tropical Warm Pool International Cloud Experiment (TWP-ICE) based on an estimate of a possible source of error in the best-estimate product. These data are then used to carry out simulations with 11 SCMs and two cloud-resolving models (CRMs). Best-estimate simulations are also performed. All models show that moisture-related variables are close to observations and there are limited differences between the best-estimate and ensemble-mean values. The models, however, show different sensitivities to changes in the forcing, particularly when weakly forced. The ensemble simulations highlight important differences in the surface evaporation term of the moisture budget between the SCMs and CRMs. Differences are also apparent between the models in the ensemble-mean vertical structure of cloud variables, while for each model, cloud properties are relatively insensitive to forcing. The ensemble is further used to investigate cloud variables and precipitation and identifies differences between CRMs and SCMs, particularly for relationships involving ice. This study highlights the additional analysis that can be performed using ensemble simulations and hence enables a more complete model investigation compared to using only the more traditional single best-estimate simulation.
NIMEFI: gene regulatory network inference using multiple ensemble feature importance algorithms.
Ruyssinck, Joeri; Huynh-Thu, Vân Anh; Geurts, Pierre; Dhaene, Tom; Demeester, Piet; Saeys, Yvan
2014-01-01
One of the long-standing open challenges in computational systems biology is the topology inference of gene regulatory networks from high-throughput omics data. Recently, two community-wide efforts, DREAM4 and DREAM5, have been established to benchmark network inference techniques using gene expression measurements. In these challenges the overall top performer was the GENIE3 algorithm. This method decomposes the network inference task into separate regression problems for each gene in the network, in which the expression values of a particular target gene are predicted using all other genes as possible predictors. Next, using tree-based ensemble methods, an importance measure for each predictor gene is calculated with respect to the target gene, and a high feature importance is considered as putative evidence of a regulatory link existing between the two genes. The contribution of this work is twofold. First, we generalize the regression decomposition strategy of GENIE3 to other feature importance methods. We compare the performance of support vector regression, the elastic net, random forest regression, symbolic regression and their ensemble variants in this setting to the original GENIE3 algorithm. To create the ensemble variants, we propose a subsampling approach which allows us to cast any feature selection algorithm that produces a feature ranking into an ensemble feature importance algorithm. We demonstrate that the ensemble setting is key to the network inference task, as only ensemble variants achieve top performance. As a second contribution, we explore the effect of using rankwise averaged predictions of multiple ensemble algorithms as opposed to only one. We name this approach NIMEFI (Network Inference using Multiple Ensemble Feature Importance algorithms) and show that it outperforms all individual methods in general, although on a specific network a single method can perform better. An implementation of NIMEFI has been made publicly available.
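The subsampling idea, casting any feature-ranking procedure into an ensemble feature importance method, can be sketched as below. `rank_by_variance` is a toy stand-in for the rankers named in the abstract (SVR, elastic net, etc.), and all parameter values are illustrative, not NIMEFI's defaults.

```python
# Sketch: rank features on many random subsamples of the data and average
# the ranks; a low average rank is treated as putative evidence of a link.
import random

def ensemble_feature_ranks(data, rank_features, n_rounds=50, frac=0.8, seed=0):
    rng = random.Random(seed)
    n = len(data)
    k = max(1, int(frac * n))
    totals = None
    for _ in range(n_rounds):
        sample = [data[rng.randrange(n)] for _ in range(k)]  # bootstrap draw
        ranks = rank_features(sample)          # rank 0 = most important
        if totals is None:
            totals = [0.0] * len(ranks)
        for i, r in enumerate(ranks):
            totals[i] += r
    return [t / n_rounds for t in totals]      # lower average rank = stronger

rows = [(float(i), 0.1 * (i % 2)) for i in range(20)]  # feature 0 varies more

def rank_by_variance(sample):
    """Toy ranker: order features by decreasing variance."""
    cols = list(zip(*sample))
    means = [sum(c) / len(c) for c in cols]
    var = [sum((v - m) ** 2 for v in c) for c, m in zip(cols, means)]
    order = sorted(range(len(cols)), key=lambda j: -var[j])
    ranks = [0] * len(cols)
    for r, j in enumerate(order):
        ranks[j] = r
    return ranks

avg_ranks = ensemble_feature_ranks(rows, rank_by_variance)
```

Averaging such rank vectors across several different rankers gives the rankwise-averaged combination the abstract describes.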
Salmon, Loïc; Giambaşu, George M; Nikolova, Evgenia N; Petzold, Katja; Bhattacharya, Akash; Case, David A; Al-Hashimi, Hashim M
2015-10-14
Approaches that combine experimental data and computational molecular dynamics (MD) to determine atomic resolution ensembles of biomolecules require the measurement of abundant experimental data. NMR residual dipolar couplings (RDCs) carry rich dynamics information; however, difficulties in modulating the overall alignment of nucleic acids have limited the ability to fully extract this information. We present a strategy for modulating RNA alignment that is based on introducing variable dynamic kinks in terminal helices. With this strategy, we measured seven sets of RDCs in a cUUCGg apical loop and used this rich data set to test the accuracy of a 0.8 μs MD simulation computed using the Amber ff10 force field, as well as to determine an atomic resolution ensemble. The MD-generated ensemble quantitatively reproduces the measured RDCs, but selection of a sub-ensemble was required to satisfy the RDCs within error. The largest discrepancies between the RDC-selected and MD-generated ensembles are observed for the most flexible loop residues and the backbone angles connecting the loop to the helix, with the RDC-selected ensemble resulting in more uniform dynamics. Comparison of the RDC-selected ensemble with NMR spin relaxation data suggests that the dynamics occurs on the ps-ns time scales, as verified by measurements of R(1ρ) relaxation-dispersion data. The RDC-satisfying ensemble samples many conformations adopted by the hairpin in crystal structures, indicating that intrinsic plasticity may play important roles in conformational adaptation. The approach presented here can be applied to test nucleic acid force fields and to characterize dynamics in diverse RNA motifs at atomic resolution.
A Diagnostics Tool to detect ensemble forecast system anomaly and guide operational decisions
NASA Astrophysics Data System (ADS)
Park, G. H.; Srivastava, A.; Shrestha, E.; Thiemann, M.; Day, G. N.; Draijer, S.
2017-12-01
The hydrologic community is moving toward using ensemble forecasts to take uncertainty into account during the decision-making process. The New York City Department of Environmental Protection (DEP) implements several types of ensemble forecasts in its decision-making process: ensemble products for a statistical model (Hirsch and enhanced Hirsch); the National Weather Service (NWS) Advanced Hydrologic Prediction Service (AHPS) forecasts based on the classical Ensemble Streamflow Prediction (ESP) technique; and the new NWS Hydrologic Ensemble Forecasting Service (HEFS) forecasts. To remove structural error and apply the forecasts to additional forecast points, the DEP post-processes both the AHPS and the HEFS forecasts. These ensemble forecasts provide large quantities of complex data, and drawing conclusions from them is time-consuming and difficult. The complexity of these forecasts also makes it difficult to identify system failures resulting from poor data, missing forecasts, and server breakdowns. To address these issues, we developed a diagnostic tool that summarizes ensemble forecasts and provides additional information such as historical forecast statistics, forecast skill, and model forcing statistics. This additional information highlights the key information that enables operators to evaluate the forecast in real time, dynamically interact with the data, and review additional statistics, if needed, to make better decisions. We used Bokeh, a Python interactive visualization library, and a multi-database management system to create this interactive tool. The tool compiles and stores data into HTML pages that allow operators to readily analyze the data with built-in user interaction features. This paper will present a brief description of the ensemble forecasts, forecast verification results, and the intended applications for the diagnostic tool.
A variational ensemble scheme for noisy image data assimilation
NASA Astrophysics Data System (ADS)
Yang, Yin; Robinson, Cordelia; Heitz, Dominique; Mémin, Etienne
2014-05-01
Data assimilation techniques aim at recovering the trajectory of a system's state variables, denoted X, along time from partially observed noisy measurements of the system, denoted Y. These procedures, which couple the dynamics and noisy measurements of the system, fulfill a twofold objective. On the one hand, they provide a denoising, or reconstruction, procedure for the data through a given model framework, and on the other hand, they provide estimation procedures for unknown parameters of the dynamics. A standard variational data assimilation problem can be formulated as the minimization of the following objective function with respect to the initial discrepancy, η, from the background initial guess: J(\eta(x)) = \frac{1}{2}\|X_b(x) - X(t_0,x)\|_B^2 + \frac{1}{2}\int_{t_0}^{t_f}\|H(X(t,x)) - Y(t,x)\|_R^2\,dt, (1) where the observation operator H links the state variable and the measurements. The cost function can be interpreted as the log-likelihood function associated with the a posteriori distribution of the state given the past history of measurements and the background. In this work, we aim at studying ensemble-based optimal control strategies for data assimilation. Such a formulation nicely combines the ingredients of ensemble Kalman filters and variational data assimilation (4DVar). It is also formulated as the minimization of the objective function (1), but, similarly to ensemble filters, it introduces into its objective function an empirical ensemble-based background-error covariance defined as: B ≡ <(Xb -
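The abstract breaks off while defining the empirical ensemble-based background-error covariance. The standard sample estimate it begins to write, B ≈ (1/(N-1)) Σᵢ (Xᵢ - X̄)(Xᵢ - X̄)ᵀ over N ensemble members, can be sketched as follows (toy state vectors, not the scheme's actual states):

```python
# Sketch of the empirical ensemble covariance that ensemble-variational
# schemes substitute for the static B matrix of classical 4DVar.

def ensemble_covariance(members):
    """members: list of state vectors (lists of floats), one per member."""
    n, d = len(members), len(members[0])
    mean = [sum(m[j] for m in members) / n for j in range(d)]
    anomalies = [[m[j] - mean[j] for j in range(d)] for m in members]
    # Unbiased sample covariance: outer products of anomalies over N-1.
    return [[sum(a[i] * a[j] for a in anomalies) / (n - 1)
             for j in range(d)] for i in range(d)]

B = ensemble_covariance([[1.0, 2.0], [3.0, 0.0], [2.0, 1.0]])
```

The resulting matrix is symmetric by construction, as a covariance must be.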
Ensemble approach for differentiation of malignant melanoma
NASA Astrophysics Data System (ADS)
Rastgoo, Mojdeh; Morel, Olivier; Marzani, Franck; Garcia, Rafael
2015-04-01
Melanoma is the deadliest type of skin cancer, yet it is the most treatable kind, depending on its early diagnosis. The early prognosis of melanoma is a challenging task for both clinicians and dermatologists. Due to the importance of early diagnosis, and in order to assist dermatologists, we propose an automated framework based on ensemble learning methods and dermoscopy images to differentiate melanoma from dysplastic and benign lesions. The evaluation of our framework on the recent and public dermoscopy benchmark (PH2 dataset) indicates the potential of the proposed method. Our evaluation, using only global features, revealed that ensembles such as random forest perform better than a single learner. Using a random forest ensemble and a combination of color and texture features, our framework achieved the highest sensitivity of 94% and specificity of 92%.
Negative Correlation Learning for Customer Churn Prediction: A Comparison Study
Faris, Hossam
2015-01-01
Recently, telecommunication companies have been paying more attention to the problem of identification of customer churn behavior. In business, it is well known to service providers that attracting new customers is much more expensive than retaining existing ones. Therefore, adopting accurate models that are able to predict customer churn can effectively help in customer retention campaigns and in maximizing profit. In this paper we utilize an ensemble of multilayer perceptrons (MLP) trained using negative correlation learning (NCL) for predicting customer churn in a telecommunication company. Experimental results confirm that the NCL-based MLP ensemble can achieve better generalization performance (high churn rate) compared with an ensemble of MLPs without NCL (flat ensemble) and other common data mining techniques used for churn analysis. PMID:25879060
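The NCL penalty behind this approach is commonly written as pᵢ = (fᵢ - f̄) Σ_{j≠i}(fⱼ - f̄), added to each network's squared error so that members are pushed toward negatively correlated mistakes. A toy sketch follows; the λ value and outputs are illustrative, and this computes the per-network error terms only, not the MLP training itself.

```python
# Sketch of the negative correlation learning error for each ensemble
# member i: squared error plus a penalty rewarding disagreement with the
# rest of the ensemble (fbar is the ensemble mean output).

def ncl_errors(outputs, target, lam=0.5):
    """outputs: one scalar prediction per network; returns one error each."""
    fbar = sum(outputs) / len(outputs)
    errors = []
    for i, f_i in enumerate(outputs):
        others = sum(f_j - fbar for j, f_j in enumerate(outputs) if j != i)
        penalty = (f_i - fbar) * others
        errors.append(0.5 * (f_i - target) ** 2 + lam * penalty)
    return errors

errs = ncl_errors([0.2, 0.8, 0.5], target=1.0)
```

Since Σ_{j≠i}(fⱼ - f̄) = -(fᵢ - f̄), the penalty is negative whenever a member deviates from the mean, which is what encourages diversity during training.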
Petit and grand ensemble Monte Carlo calculations of the thermodynamics of the lattice gas
DOE Office of Scientific and Technical Information (OSTI.GOV)
Murch, G.E.; Thorn, R.J.
1978-11-01
A direct Monte Carlo method for estimating the chemical potential in the petit canonical ensemble was applied to the simple cubic Ising-like lattice gas. The method is based on a simple relationship between the chemical potential and the potential energy distribution in a lattice gas at equilibrium as derived independently by Widom, and Jackson and Klein. Results are presented here for the chemical potential at various compositions and temperatures above and below the zero field ferromagnetic and antiferromagnetic critical points. The same lattice gas model was reconstructed in the form of a restricted grand canonical ensemble and results at several temperatures were compared with those from the petit canonical ensemble. The agreement was excellent in these cases.
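The Widom-type relationship such methods rest on links the excess chemical potential to the distribution of trial-insertion energies, μ_ex = -kT ln⟨exp(-ΔU/kT)⟩. A minimal sketch of that estimator follows; the sampled energies are illustrative numbers, not lattice gas output.

```python
# Sketch of the Widom insertion estimate: average the Boltzmann factor of
# the energy change dU for trial particle insertions at equilibrium.
import math

def widom_mu_excess(insertion_energies, kT=1.0):
    """mu_ex = -kT * ln( <exp(-dU/kT)> ) over sampled trial insertions."""
    boltz = [math.exp(-dU / kT) for dU in insertion_energies]
    return -kT * math.log(sum(boltz) / len(boltz))

mu = widom_mu_excess([0.0, 1.0, 2.0], kT=1.0)
```

Costly insertions (large positive ΔU) contribute little to the average, driving μ_ex upward.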
NASA Astrophysics Data System (ADS)
Newman, A. J.; Clark, M. P.; Nijssen, B.; Wood, A.; Gutmann, E. D.; Mizukami, N.; Longman, R. J.; Giambelluca, T. W.; Cherry, J.; Nowak, K.; Arnold, J.; Prein, A. F.
2016-12-01
Gridded precipitation and temperature products are inherently uncertain due to myriad factors. These include interpolation from a sparse observation network, measurement representativeness, and measurement errors. Despite this inherent uncertainty, uncertainty estimates are typically not included, or are a specific addition to each dataset without much general applicability across different datasets. A lack of quantitative uncertainty estimates for hydrometeorological forcing fields limits their utility to support land surface and hydrologic modeling techniques such as data assimilation, probabilistic forecasting and verification. To address this gap, we have developed a first-of-its-kind gridded, observation-based ensemble of precipitation and temperature at a daily increment for the period 1980-2012 over the United States (including Alaska and Hawaii). A longer, higher resolution version (1970-present, 1/16th degree) has also been implemented to support real-time hydrologic monitoring and prediction in several regional US domains. We will present the development and evaluation of the dataset, along with initial applications of the dataset for ensemble data assimilation and probabilistic evaluation of high resolution regional climate model simulations. We will also present results on the new high resolution products for Alaska and Hawaii (2 km and 250 m respectively), to complete the first ensemble observation-based product suite for the entire 50 states. Finally, we will present plans to improve the ensemble dataset, focusing on efforts to improve the methods used for station interpolation and ensemble generation, as well as methods to fuse station data with numerical weather prediction model output.
Abawajy, Jemal; Kelarev, Andrei; Chowdhury, Morshed U; Jelinek, Herbert F
2016-01-01
Blood biochemistry attributes form an important class of tests, routinely collected several times per year for many patients with diabetes. The objective of this study is to investigate the role of blood biochemistry in improving the predictive accuracy of the diagnosis of cardiac autonomic neuropathy (CAN) progression. Blood biochemistry contributes to CAN, and so it is a causative factor that can provide additional power for the diagnosis of CAN, especially in the absence of a complete set of Ewing tests. We introduce automated iterative multitier ensembles (AIME) and investigate their performance in comparison to base classifiers and standard ensemble classifiers for blood biochemistry attributes. AIME incorporate diverse ensembles into several tiers simultaneously and combine them into one automatically generated integrated system, so that one ensemble acts as an integral part of another ensemble. We carried out extensive experimental analysis using large datasets from the diabetes screening research initiative (DiScRi) project. The results of our experiments show that several blood biochemistry attributes can be used to supplement the Ewing battery for the detection of CAN in situations where one or more of the Ewing tests cannot be completed because of the individual difficulties faced by each patient in performing the tests. The results show that AIME provide higher accuracy as a multitier CAN classification paradigm. The best predictive accuracy of 99.57% has been obtained by the AIME combining Decorate on the top tier with bagging on the middle tier, based on random forest. Practitioners can use these findings to increase the accuracy of CAN diagnosis.
Collective coupling in hybrid superconducting circuits
NASA Astrophysics Data System (ADS)
Saito, Shiro
Hybrid quantum systems utilizing superconducting circuits have attracted significant recent attention, not only for quantum information processing tasks but also as a way to explore fundamentally new physics regimes. In this talk, I will discuss two superconducting circuit based hybrid quantum system approaches. The first is a superconducting flux qubit - electron spin ensemble hybrid system in which quantum information manipulated in the flux qubit can be transferred to, stored in, and retrieved from the ensemble. Although the coherence time of the ensemble is short, about 20 ns, this is a significant first step toward utilizing the spin ensemble as quantum memory for superconducting flux qubits. The second approach is a superconducting resonator - flux qubit ensemble hybrid system in which we fabricated a superconducting LC resonator coupled to a large ensemble of flux qubits. Here we observed a dispersive frequency shift of approximately 250 MHz in the resonator's transmission spectrum. This indicates that thousands of flux qubits are coupling to the resonator collectively. Although we need to reduce the inhomogeneity of our qubits, our system has many potential uses, including the creation of new quantum metamaterials and novel applications in quantum metrology. This work was partially supported by JSPS KAKENHI Grant Number 25220601.
Formation and evolution of multimodal size distributions of InAs/GaAs quantum dots
NASA Astrophysics Data System (ADS)
Pohl, U. W.; Pötschke, K.; Schliwa, A.; Lifshits, M. B.; Shchukin, V. A.; Jesson, D. E.; Bimberg, D.
2006-05-01
Self-organized formation and evolution of quantum dot (QD) ensembles with a multimodal size distribution is reported. Such ensembles form after fast deposition near the critical thickness during a growth interruption (GRI) prior to cap layer growth and consist of pure InAs truncated pyramids with heights varying in steps of complete InAs monolayers, thereby creating well-distinguishable sub-ensembles. Ripening during GRI manifests itself by an increase of sub-ensembles of larger QDs at the expense of sub-ensembles of smaller ones, leaving the wetting layer unchanged. The dynamics of the multimodal QD size distribution is theoretically described using a kinetic approach. Starting from a broad distribution of flat QDs, a predominantly vertical growth is found due to strain-induced barriers for nucleation of a next atomic layer on different facets. QDs having initially a shorter base length attain a smaller height, accounting for the experimentally observed sub-ensemble structure. The evolution of the distribution is described by a master equation, which accounts for growth or dissolution of the QDs by mass exchange between the QDs and the adatom sea. The numerical solution is in good agreement with the measured dynamics.
Alternative Approaches to Land Initialization for Seasonal Precipitation and Temperature Forecasts
NASA Technical Reports Server (NTRS)
Koster, Randal; Suarez, Max; Liu, Ping; Jambor, Urszula
2004-01-01
The seasonal prediction system of the NASA Global Modeling and Assimilation Office is used to generate ensembles of summer forecasts utilizing realistic soil moisture initialization. To derive the realistic land states, we drive offline the system's land model with realistic meteorological forcing over the period 1979-1993 (in cooperation with the Global Land Data Assimilation System project at GSFC) and then extract the state variables' values on the chosen forecast start dates. A parallel series of forecast ensembles is performed with a random (though climatologically consistent) set of land initial conditions; by comparing the two sets of ensembles, we can isolate the impact of land initialization on forecast skill from that of the imposed SSTs. The base initialization experiment is supplemented with several forecast ensembles that use alternative initialization techniques. One ensemble addresses the impact of minimizing climate drift in the system through the scaling of the initial conditions, and another is designed to isolate the importance of the precipitation signal from that of all other signals in the antecedent offline forcing. A third ensemble includes a more realistic initialization of the atmosphere along with the land initialization. The impact of each variation on forecast skill is quantified.
Clustering cancer gene expression data by projective clustering ensemble
Yu, Xianxue; Yu, Guoxian
2017-01-01
Gene expression data analysis has paramount implications for gene treatments, cancer diagnosis and other domains. Clustering is an important and promising tool for analyzing gene expression data. Gene expression data is often characterized by a large number of genes but limited samples; thus various projective clustering techniques and ensemble techniques have been suggested to combat these challenges. However, it is rather challenging to synergize these two kinds of techniques to avoid the curse of dimensionality problem and to boost the performance of gene expression data clustering. In this paper, we employ a projective clustering ensemble (PCE) to integrate the advantages of projective clustering and ensemble clustering, and to avoid the dilemma of combining multiple projective clusterings. Our experimental results on publicly available cancer gene expression data show that PCE can improve the quality of clustering gene expression data by at least 4.5% (on average) over other related techniques, including dimensionality reduction based single clustering and ensemble approaches. The empirical study demonstrates that, to further boost the performance of clustering cancer gene expression data, it is necessary and promising to synergize projective clustering with ensemble clustering. PCE can serve as an effective alternative technique for clustering gene expression data. PMID:28234920
Development of Super-Ensemble techniques for ocean analyses: the Mediterranean Sea case
NASA Astrophysics Data System (ADS)
Pistoia, Jenny; Pinardi, Nadia; Oddo, Paolo; Collins, Matthew; Korres, Gerasimos; Drillet, Yann
2017-04-01
Short-term ocean analyses for sea surface temperature (SST) in the Mediterranean Sea can be improved by a statistical post-processing technique called super-ensemble. This technique consists of a multi-linear regression algorithm applied to a Multi-Physics Multi-Model Super-Ensemble (MMSE) dataset, a collection of different operational forecasting analyses together with ad-hoc simulations produced by modifying selected numerical model parameterizations. A new linear regression algorithm based on Empirical Orthogonal Function filtering techniques is capable of preventing overfitting problems, although the best performances are achieved when we add correlation to the super-ensemble structure using a simple spatial filter applied after the linear regression. Our outcomes show that super-ensemble performance depends on the selection of an unbiased operator and the length of the learning period, but the quality of the generating MMSE dataset has the largest impact on the MMSE analysis Root Mean Square Error (RMSE) evaluated with respect to observed satellite SST. Lower RMSE analysis estimates result from the following choices: a 15 day training period, an overconfident MMSE dataset (a subset with the higher quality ensemble members), and the least-squares algorithm filtered a posteriori.
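The multi-linear regression step of the super-ensemble can be sketched for two MMSE members via the normal equations. The training data below are toy numbers, and the actual scheme additionally involves EOF filtering and an a posteriori spatial filter, both omitted here.

```python
# Sketch: learn fixed combination weights by least-squares regression of
# observed SST on two ensemble members' analyses over a training period.

def solve2(a, b, c, d, e, f):
    """Solve the 2x2 linear system [[a, b], [c, d]] @ [x, y] = [e, f]."""
    det = a * d - b * c
    return (e * d - b * f) / det, (a * f - c * e) / det

def superensemble_weights(m1, m2, obs):
    """Normal equations for obs ~ w1*m1 + w2*m2 (no intercept term)."""
    a = sum(x * x for x in m1)
    b = sum(x * y for x, y in zip(m1, m2))
    d = sum(y * y for y in m2)
    e = sum(x * o for x, o in zip(m1, obs))
    f = sum(y * o for y, o in zip(m2, obs))
    return solve2(a, b, b, d, e, f)

# Toy training period: obs constructed as 0.6*m1 + 0.4*m2.
w1, w2 = superensemble_weights([1.0, 2.0, 3.0], [2.0, 1.0, 1.0],
                               [1.4, 1.6, 2.2])
```

Once trained, the weights are applied to new member analyses to produce the super-ensemble estimate.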
Ensemble-Based Assimilation of Aerosol Observations in GEOS-5
NASA Technical Reports Server (NTRS)
Buchard, V.; Da Silva, A.
2016-01-01
MERRA-2 is the latest Aerosol Reanalysis produced at NASA's Global Modeling Assimilation Office (GMAO) from 1979 to present. This reanalysis is based on a version of the GEOS-5 model radiatively coupled to GOCART aerosols and includes assimilation of bias corrected Aerosol Optical Depth (AOD) from AVHRR over ocean, MODIS sensors on both Terra and Aqua satellites, MISR over bright surfaces and AERONET data. In order to assimilate lidar profiles of aerosols, we are updating the aerosol component of our assimilation system to an Ensemble Kalman Filter (EnKF) type of scheme using ensembles generated routinely by the meteorological assimilation. Following the work performed with the first NASA's aerosol reanalysis (MERRAero), we first validate the vertical structure of MERRA-2 aerosol assimilated fields using CALIOP data over regions of particular interest during 2008.
NASA Astrophysics Data System (ADS)
Juszczyk, Michał
2018-04-01
This paper reports some results of studies on the use of artificial intelligence tools for the purposes of cost estimation based on building information models. The problem of macro-level cost estimation based on building information models, supported by ensembles of artificial neural networks, is concisely discussed. In the course of the research, a regression model has been built for the purposes of cost estimation of buildings' floor structural frames, as higher level elements. Building information models are supposed to serve as a repository of data used for the purposes of cost estimation. The core of the model is an ensemble of neural networks. The developed model allows the prediction of cost estimates with satisfactory accuracy.
A method for ensemble wildland fire simulation
Mark A. Finney; Isaac C. Grenfell; Charles W. McHugh; Robert C. Seli; Diane Trethewey; Richard D. Stratton; Stuart Brittain
2011-01-01
An ensemble simulation system that accounts for uncertainty in long-range weather conditions and two-dimensional wildland fire spread is described. Fuel moisture is expressed based on the energy release component, a US fire danger rating index, and its variation throughout the fire season is modeled using time series analysis of historical weather data. This analysis...
Cacha, L A; Parida, S; Dehuri, S; Cho, S-B; Poznanski, R R
2016-12-01
The huge number of voxels in fMRI over time poses a major challenge for effective analysis. Fast, accurate, and reliable classifiers are required for estimating the decoding accuracy of brain activities. Although machine-learning classifiers seem promising, individual classifiers have their own limitations. To address this limitation, the present paper proposes a method based on an ensemble of neural networks to analyze fMRI data for cognitive state classification for application across multiple subjects. Similarly, the fuzzy integral (FI) approach has been employed as an efficient tool for combining different classifiers. The FI approach led to the development of a classifier ensemble technique that performs better than any single classifier by reducing misclassification, bias, and variance. The proposed method successfully classified the different cognitive states for multiple subjects with high classification accuracy. Comparison of the performance of the ensemble neural network method with that of the individual neural networks strongly points toward the usefulness of the proposed method.
Multivariate localization methods for ensemble Kalman filtering
NASA Astrophysics Data System (ADS)
Roh, S.; Jun, M.; Szunyogh, I.; Genton, M. G.
2015-12-01
In ensemble Kalman filtering (EnKF), the small number of ensemble members that is feasible to use in a practical data assimilation application leads to sampling variability of the estimates of the background error covariances. The standard approach to reducing the effects of this sampling variability, which has also been found to be highly efficient in improving the performance of EnKF, is the localization of the estimates of the covariances. One family of localization techniques is based on taking the Schur (element-wise) product of the ensemble-based sample covariance matrix and a correlation matrix whose entries are obtained by the discretization of a distance-dependent correlation function. While the proper definition of the localization function for a single state variable has been extensively investigated, a rigorous definition of the localization function for multiple state variables that exist at the same locations has seldom been considered. This paper introduces two strategies for the construction of localization functions for multiple state variables. The proposed localization functions are tested in experiments that assimilate simulated observations into the bivariate Lorenz 95 model.
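The Schur-product localization described above can be sketched as follows. The compactly supported linear taper is a simple stand-in for the Gaspari-Cohn function commonly used in practice, and the covariance values are illustrative.

```python
# Sketch of covariance localization: multiply the sample covariance
# element-wise (Schur product) by a distance-dependent correlation that
# drops to zero beyond a cutoff radius, damping spurious long-range terms.

def taper(distance, radius):
    """Simple compactly supported correlation (stand-in for Gaspari-Cohn)."""
    return max(0.0, 1.0 - distance / radius)

def localize(cov, positions, radius):
    n = len(cov)
    return [[cov[i][j] * taper(abs(positions[i] - positions[j]), radius)
             for j in range(n)] for i in range(n)]

cov = [[1.0, 0.8, 0.6], [0.8, 1.0, 0.8], [0.6, 0.8, 1.0]]
loc = localize(cov, positions=[0.0, 1.0, 2.0], radius=2.0)
```

Diagonal entries are untouched while distant pairs are zeroed, which is exactly the effect localization is meant to have on a noisy sample covariance.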
Quantifying rapid changes in cardiovascular state with a moving ensemble average.
Cieslak, Matthew; Ryan, William S; Babenko, Viktoriya; Erro, Hannah; Rathbun, Zoe M; Meiring, Wendy; Kelsey, Robert M; Blascovich, Jim; Grafton, Scott T
2018-04-01
MEAP, the moving ensemble analysis pipeline, is a new open-source tool designed to perform multisubject preprocessing and analysis of cardiovascular data, including electrocardiogram (ECG), impedance cardiogram (ICG), and continuous blood pressure (BP). In addition to traditional ensemble averaging, MEAP implements a moving ensemble averaging method that allows for the continuous estimation of indices related to cardiovascular state, including cardiac output, preejection period, heart rate variability, and total peripheral resistance, among others. Here, we define the moving ensemble technique mathematically, highlighting its differences from fixed-window ensemble averaging. We describe MEAP's interface and features for signal processing, artifact correction, and cardiovascular-based fMRI analysis. We demonstrate the accuracy of MEAP's novel B point detection algorithm on a large collection of hand-labeled ICG waveforms. As a proof of concept, two subjects completed a series of four physical and cognitive tasks (cold pressor, Valsalva maneuver, video game, random dot kinetogram) on 3 separate days while ECG, ICG, and BP were recorded. Critically, the moving ensemble method reliably captures the rapid cyclical cardiovascular changes related to the baroreflex during the Valsalva maneuver and the classic cold pressor response. Cardiovascular measures were seen to vary considerably within repetitions of the same cognitive task for each individual, suggesting that a carefully designed paradigm could be used to capture fast-acting event-related changes in cardiovascular state. © 2017 Society for Psychophysiological Research.
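The difference between traditional fixed-window ensemble averaging and a moving ensemble average can be sketched on toy beat waveforms. This illustrates the two averaging schemes only; it is not MEAP's implementation, and the window size and beat data are made up.

```python
# Sketch: fixed ensemble averaging collapses all beats into one waveform,
# while the moving version averages a sliding window of neighboring beats,
# yielding one (smoothed) waveform per beat so rapid changes can be tracked.

def fixed_ensemble_average(beats):
    """beats: list of equal-length waveforms; returns one averaged waveform."""
    n = len(beats)
    return [sum(b[j] for b in beats) / n for j in range(len(beats[0]))]

def moving_ensemble_average(beats, half_width=1):
    """Returns one averaged waveform per beat, from a sliding window."""
    out = []
    for i in range(len(beats)):
        lo, hi = max(0, i - half_width), min(len(beats), i + half_width + 1)
        window = beats[lo:hi]
        out.append([sum(b[j] for b in window) / len(window)
                    for j in range(len(beats[0]))])
    return out

beats = [[1.0, 0.0], [3.0, 0.0], [5.0, 0.0]]
fixed = fixed_ensemble_average(beats)    # one waveform for the whole record
moving = moving_ensemble_average(beats)  # one waveform per beat
```

With a steadily growing signal, the fixed average reports only the overall mean, while the moving average follows the beat-to-beat trend.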
Pourhoseingholi, Mohamad Amin; Kheirian, Sedigheh; Zali, Mohammad Reza
2017-12-01
Colorectal cancer (CRC) is one of the most common malignancies and causes of cancer mortality worldwide. Given the importance of predicting the survival of CRC patients and the growing use of data mining methods, this study aims to compare the performance of models for predicting 5-year survival of CRC patients using a variety of basic and ensemble data mining methods. The CRC dataset from the Shahid Beheshti University of Medical Sciences Research Center for Gastroenterology and Liver Diseases was used for the prediction and comparative study of the basic and ensemble data mining techniques. Feature selection methods were used to select predictor attributes for classification. The WEKA toolkit and MedCalc software were used, respectively, to create and compare the models. The results showed that the predictive performance of the developed models was altogether high (all greater than 90%). Overall, the performance of the ensemble models was higher than that of the basic classifiers, and the best result was achieved by the ensemble voting model in terms of area under the ROC curve (AUC = 0.96). AUC comparison of the models showed that the ensemble voting method significantly outperformed all models except Random Forest (RF) and Bayesian Network (BN), considering the overlapping 95% confidence intervals. This result may indicate the high predictive power of these two methods, along with ensemble voting, for predicting 5-year survival of CRC patients.
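In its simplest majority form, the voting combination used by such ensembles reduces to the following (an illustrative sketch, not the WEKA implementation, which also supports probability-weighted voting):

```python
from collections import Counter

def majority_vote(predictions):
    """predictions: one list of predicted class labels per base classifier,
    all over the same instances. Returns the most common label per instance
    (ties broken by first label encountered)."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]
```

For example, three classifiers predicting [1, 0], [1, 1], and [0, 0] over two patients yield the combined predictions [1, 0].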
A New Method for the Estimation of Initial Condition Uncertainty Structures in Mesoscale Models
NASA Astrophysics Data System (ADS)
Keller, J. D.; Bach, L.; Hense, A.
2012-12-01
The estimation of fast-growing error modes of a system is a key interest of ensemble data assimilation when assessing uncertainty in initial conditions. Over the last two decades, three methods (and variations thereof) have evolved for global numerical weather prediction models: the ensemble Kalman filter, singular vectors, and breeding of growing modes (or now ensemble transform). While the former incorporates a priori model error information and observation error estimates to determine ensemble initial conditions, the latter two techniques directly address the error structures associated with Lyapunov vectors. However, in global models these structures are mainly associated with transient global wave patterns. When assessing initial condition uncertainty in mesoscale limited-area models, several problems regarding the aforementioned techniques arise: (a) additional sources of uncertainty on the smaller scales contribute to the error, and (b) error structures from the global scale may quickly move through the model domain (depending on the size of the domain). To address the latter problem, perturbation structures from global models are often included in the mesoscale predictions as perturbed boundary conditions. However, the initial perturbations (when used) are often generated with a variant of an ensemble Kalman filter, which does not necessarily focus on the large-scale error patterns. In the framework of the European regional reanalysis project of the Hans-Ertel-Center for Weather Research, we use a mesoscale model with an implemented nudging data assimilation scheme that does not support ensemble data assimilation at all. In preparation for an ensemble-based regional reanalysis and for the estimation of three-dimensional atmospheric covariance structures, we have implemented a new method for the assessment of fast-growing error modes in mesoscale limited-area models. The so-called self-breeding method is a development of the breeding of growing modes technique. 
Initial perturbations are integrated forward for a short time period and then rescaled and added to the initial state again. Iterating this rapid breeding cycle provides estimates of the initial uncertainty structure (or local Lyapunov vectors) for a given norm. To prevent all ensemble perturbations from converging towards the leading local Lyapunov vector, we apply an ensemble transform variant to orthogonalize the perturbations in the sub-space spanned by the ensemble. By choosing different kinds of norms to measure perturbation growth, this technique allows for estimating uncertainty patterns targeted at specific sources of error (e.g. convection, turbulence). With case study experiments we show applications of the self-breeding method for different sources of uncertainty and different horizontal scales.
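The basic breeding cycle that self-breeding extends can be sketched with the strongly nonlinear Lorenz (1963) system (an illustration only; the step size, cycle length, perturbation norm, and rescaling size are assumptions, and the ensemble transform orthogonalization step is omitted):

```python
import numpy as np

def lorenz63_step(x, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """One forward-Euler step of the Lorenz (1963) system."""
    dx = np.array([sigma * (x[1] - x[0]),
                   x[0] * (rho - x[2]) - x[1],
                   x[0] * x[1] - beta * x[2]])
    return x + dt * dx

def breeding_cycle(x0, pert, n_cycles=50, n_steps=10, size=0.1):
    """Breeding of growing modes: integrate a control and a perturbed run for
    a short period, rescale the grown difference back to a fixed norm, add it
    to the new state, and iterate. The rescaled perturbation converges toward
    the leading local growing mode."""
    for _ in range(n_cycles):
        ctrl, p = x0, x0 + pert
        for _ in range(n_steps):
            ctrl, p = lorenz63_step(ctrl), lorenz63_step(p)
        diff = p - ctrl
        pert = size * diff / np.linalg.norm(diff)   # rescale to fixed norm
        x0 = ctrl
    return x0, pert
```

Choosing a different norm in the rescaling step (e.g. an energy norm restricted to moisture variables) is what lets the technique target specific sources of error.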
Ensemble Methods for MiRNA Target Prediction from Expression Data.
Le, Thuc Duy; Zhang, Junpeng; Liu, Lin; Li, Jiuyong
2015-01-01
microRNAs (miRNAs) are short regulatory RNAs that are involved in several diseases, including cancers. Identifying miRNA functions is very important in understanding disease mechanisms and determining the efficacy of drugs. An increasing number of computational methods have been developed to explore miRNA functions by inferring the miRNA-mRNA regulatory relationships from data. Each of the methods is developed based on some assumptions and constraints, for instance, assuming linear relationships between variables. For such reasons, computational methods are often subject to the problem of inconsistent performance across different datasets. On the other hand, ensemble methods integrate the results from individual methods and have been proven in theory to outperform each of their individual component methods. In this paper, we investigate the performance of some ensemble methods over the commonly used miRNA target prediction methods. We apply eight different popular miRNA target prediction methods to three cancer datasets, and compare their performance with that of the ensemble methods which integrate the results from each combination of the individual methods. The validation results using experimentally confirmed databases show that the results of the ensemble methods complement those obtained by the individual methods and that the ensemble methods perform better than the individual methods across different datasets. The ensemble method Pearson+IDA+Lasso, which combines methods from different approaches, including a correlation method, a causal inference method, and a regression method, is the best-performing ensemble method in this study. Further analysis of the results of this ensemble method shows that it can obtain more targets that could not be found by any of the single methods, and that the discovered targets are more statistically significant and functionally enriched. 
The source codes, datasets, miRNA target predictions by all methods, and the ground truth for validation are available in the Supplementary materials.
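One simple way to integrate ranked target lists produced by several prediction methods is a Borda count (an illustrative sketch of rank aggregation in general, not the paper's Pearson+IDA+Lasso combination):

```python
def borda_combine(rankings):
    """rankings: one ordered list of candidate targets per method, strongest
    first. Each target earns (list length - position) points per list; the
    combined ranking orders targets by total points, so targets supported by
    several methods rise to the top."""
    scores = {}
    for ranking in rankings:
        n = len(ranking)
        for pos, target in enumerate(ranking):
            scores[target] = scores.get(target, 0) + (n - pos)
    return sorted(scores, key=scores.get, reverse=True)
```

A target ranked highly by many methods outscores one ranked first by a single method, which is the behavior that makes ensemble predictions more stable across datasets.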
NASA Astrophysics Data System (ADS)
Watanabe, S.; Kim, H.; Utsumi, N.
2017-12-01
This study aims to develop a new approach that projects hydrology under climate change using super ensemble experiments. The use of multiple ensembles is essential for the estimation of extremes, which is a major issue in the impact assessment of climate change. Hence, super ensemble experiments have recently been conducted by some research programs. While it is necessary to use multiple ensembles, running a hydrological simulation for each output of the ensemble simulations incurs considerable computational cost. To use the super ensemble experiments effectively, we adopt a strategy of using runoff projected by climate models directly. The general approach to hydrological projection is to conduct hydrological model simulations, which include land-surface and river routing processes, using atmospheric boundary conditions projected by climate models as inputs. This study, on the other hand, runs only a river routing model using runoff projected by climate models. In general, climate model output is systematically biased, so that a preprocessing step that corrects such bias is necessary for impact assessments. Various bias correction methods have been proposed but, to the best of our knowledge, no method has been proposed for variables other than surface meteorology. Here, we newly propose a method for utilizing the projected future runoff directly. The developed method estimates and corrects the bias based on the pseudo-observation, which is the result of a retrospective offline simulation. We show an application of this approach to the super ensemble experiments conducted under the program Half a degree Additional warming, Prognosis and Projected Impacts (HAPPI). More than 400 ensemble experiments from multiple climate models are available. The results of validation using historical simulations by HAPPI indicate that the output of this approach can effectively reproduce retrospective runoff variability. 
Likewise, the bias of runoff from super ensemble climate projections is corrected, and the impact of climate change on hydrologic extremes is assessed in a cost-efficient way.
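Bias correction against a pseudo-observation of the kind described above is often implemented as empirical quantile mapping (sketched below as an assumption; the paper's exact correction method is not specified here):

```python
import numpy as np

def quantile_map(model_hist, pseudo_obs, model_fut):
    """Map each projected runoff value through the historical model CDF onto
    the CDF of the pseudo-observations (the retrospective offline simulation),
    so the corrected projection inherits the pseudo-observed distribution."""
    probs_hist = np.linspace(0.0, 1.0, len(model_hist))
    probs_obs = np.linspace(0.0, 1.0, len(pseudo_obs))
    q = np.interp(model_fut, np.sort(model_hist), probs_hist)  # non-exceedance prob.
    return np.interp(q, probs_obs, np.sort(pseudo_obs))        # pseudo-obs quantile
```

Because the mapping is built per quantile, it corrects biases in the extremes as well as in the mean, which matters for the hydrologic-extremes assessment described above.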
A Lab Based Method for Exoplanet Cloud and Aerosol Characterization
NASA Astrophysics Data System (ADS)
Johnson, A. V.; Schneiderman, T. M.; Bauer, A. J. R.; Cziczo, D. J.
2017-12-01
The atmospheres of some smaller, cooler exoplanets, like GJ 1214b, lack strong spectral features. This may suggest the presence of a high, optically thick cloud layer and poses great challenges for atmospheric characterization, but there is hope. The study of extraterrestrial atmospheres with terrestrial-based techniques has proven useful for understanding the cloud-laden atmospheres of our solar system. Here we build on this by leveraging laboratory-based, terrestrial cloud particle instrumentation to better understand the microphysical and radiative properties of proposed exoplanet cloud and aerosol particles. The work to be presented focuses on the scattering properties of single particles that may be representative of those suspended in exoplanet atmospheres, levitated in an Electrodynamic Balance (EDB). I will discuss how we leverage terrestrial-based cloud microphysics for exoplanet applications, the instruments for single and ensemble particle studies used in this work, our investigation of ammonium nitrate (NH4NO3) scattering across temperature-dependent crystalline phase changes, and the steps we are taking toward the collection of scattering phase functions and polarization of scattered light for exoplanet cloud analogs. Through this and future studies we hope to better understand how upper-level cloud and/or aerosol particles in exoplanet atmospheres interact with incoming radiation from their host stars and what atmospheric information may still be obtainable through remote observations when no spectral features are observed.
NASA Astrophysics Data System (ADS)
Sanderson, B. M.
2017-12-01
The CMIP ensembles represent the most comprehensive source of information available to decision-makers for climate adaptation, yet it is clear that there are fundamental limitations in our ability to treat the ensemble as an unbiased sample of possible future climate trajectories. There is considerable evidence that models are not independent, and increasing complexity and resolution combined with computational constraints prevent a thorough exploration of parametric uncertainty or internal variability. Although more data than ever is available for calibration, the optimization of each model is influenced by institutional priorities, historical precedent and available resources. The resulting ensemble thus represents a miscellany of climate simulators which defy traditional statistical interpretation. Models are in some cases interdependent, but are sufficiently complex that the degree of interdependency is conditional on the application. Configurations have been updated using available observations to some degree, but not in a consistent or easily identifiable fashion. This means that the ensemble cannot be viewed as a true posterior distribution updated by available data, but nor can observational data alone be used to assess individual model likelihood. We assess recent literature for combining projections from an imperfect ensemble of climate simulators. Beginning with our published methodology for addressing model interdependency and skill in the weighting scheme for the 4th US National Climate Assessment, we consider strategies for incorporating process-based constraints on future response, perturbed parameter experiments and multi-model output into an integrated framework. We focus on a number of guiding questions: Is the traditional framework of confidence in projections inferred from model agreement leading to biased or misleading conclusions? 
Can the benefits of upweighting skillful models be reconciled with the increased risk of truth lying outside the weighted ensemble distribution? If CMIP is an ensemble of partially informed best-guesses, can we infer anything about the parent distribution of all possible models of the climate system (and if not, are we implicitly under-representing the risk of a climate catastrophe outside of the envelope of CMIP simulations)?
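A skill-and-independence weighting of the kind used in the 4th US National Climate Assessment can be sketched in the following general form (the distance metrics and the sigma shape parameters are assumptions for illustration, not the published calibration):

```python
import numpy as np

def skill_independence_weights(dist_to_obs, dist_between, sigma_q, sigma_s):
    """dist_to_obs[i]: generalized distance of model i to observations;
    dist_between[i, j]: inter-model distance (zero diagonal). The skill term
    rewards closeness to observations; the similarity term down-weights
    models that have near-duplicates elsewhere in the ensemble."""
    skill = np.exp(-(dist_to_obs / sigma_q) ** 2)
    # effective number of copies of each model; the self term exp(0) = 1
    n_eff = np.sum(np.exp(-(dist_between / sigma_s) ** 2), axis=1)
    w = skill / n_eff
    return w / w.sum()
```

Two nearly identical models split the weight a single independent model of equal skill would receive, which is the intended correction for model interdependence.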
NASA Astrophysics Data System (ADS)
Flores, A. N.; Entekhabi, D.; Bras, R. L.
2007-12-01
Soil hydraulic and thermal properties (SHTPs) affect both the rate of moisture redistribution in the soil column and the volumetric soil water capacity. Adequately constraining these properties through field and lab analysis to parameterize spatially-distributed hydrology models is often prohibitively expensive. Because SHTPs vary significantly at small spatial scales, individual soil samples are also only reliably indicative of local conditions, and these properties remain a significant source of uncertainty in soil moisture and temperature estimation. In ensemble-based soil moisture data assimilation, uncertainty in the model-produced prior estimate due to associated uncertainty in SHTPs must be taken into account to avoid under-dispersive ensembles. To treat SHTP uncertainty for purposes of supplying inputs to a distributed watershed model, we use the restricted pairing (RP) algorithm, an extension of Latin Hypercube (LH) sampling. The RP algorithm generates an arbitrary number of SHTP combinations by sampling the appropriate marginal distributions of the individual soil properties using the LH approach, while imposing a target rank correlation among the properties. A previously-published meta-database of 1309 soils representing 12 textural classes is used to fit appropriate marginal distributions to the properties and compute the target rank correlation structure, conditioned on soil texture. Given categorical soil textures, our implementation of the RP algorithm generates an arbitrarily-sized ensemble of realizations of the SHTPs required as input to the TIN-based Realtime Integrated Basin Simulator with vegetation dynamics (tRIBS+VEGGIE) distributed-parameter ecohydrology model. Soil moisture ensembles simulated with RP-generated SHTPs exhibit less variance than ensembles simulated with SHTPs generated by a scheme that neglects correlation among properties. 
Neglecting correlation among SHTPs can lead to physically unrealistic combinations of parameters that exhibit implausible hydrologic behavior when input to the tRIBS+VEGGIE model.
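The restricted-pairing idea (LH sampling of the marginals plus an imposed rank correlation) can be sketched with the Iman-Conover reordering (an illustration of the general technique on uniform marginals, not the authors' exact implementation):

```python
import numpy as np

def latin_hypercube(n_samples, n_vars, rng):
    """Stratified uniform(0,1) sample: one point per 1/n stratum per variable,
    with independently shuffled column orderings."""
    u = (rng.random((n_samples, n_vars)) + np.arange(n_samples)[:, None]) / n_samples
    for j in range(n_vars):
        rng.shuffle(u[:, j])
    return u

def impose_rank_correlation(samples, target_corr, rng):
    """Iman-Conover reordering: shuffle each column of `samples` so the sample
    rank correlation approximates `target_corr`, leaving the marginal values
    exactly unchanged."""
    n, k = samples.shape
    scores = rng.standard_normal((n, k))
    # strip the accidental correlation of the score matrix, then inject the target
    l_acc = np.linalg.cholesky(np.cov(scores, rowvar=False))
    l_tgt = np.linalg.cholesky(target_corr)
    scores = scores @ np.linalg.inv(l_acc).T @ l_tgt.T
    out = np.empty_like(samples)
    for j in range(k):
        ranks = scores[:, j].argsort().argsort()   # rank of each score value
        out[:, j] = np.sort(samples[:, j])[ranks]  # same ranks, original marginals
    return out
```

In a full RP-style workflow, the uniform columns would then be transformed through the fitted marginal distributions of the individual soil properties.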
Mastwal, Surjeet; Cao, Vania; Wang, Kuan Hong
2016-01-01
Mental functions involve coordinated activities of specific neuronal ensembles that are embedded in complex brain circuits. Aberrant neuronal ensemble dynamics is thought to form the neurobiological basis of mental disorders. A major challenge in mental health research is to identify these cellular ensembles and determine what molecular mechanisms constrain their emergence and consolidation during development and learning. Here, we provide a perspective based on recent studies that use the activity-dependent gene Arc/Arg3.1 as a cellular marker to identify neuronal ensembles and as a molecular probe to modulate circuit functions. These studies have demonstrated that the transcription of Arc is activated in selective groups of frontal cortical neurons in response to specific behavioral tasks. Arc expression regulates the persistent firing of individual neurons and predicts the consolidation of neuronal ensembles during repeated learning. Therefore, the Arc pathway represents a prototypical example of activity-dependent genetic feedback regulation of neuronal ensembles. The activation of this pathway in the frontal cortex starts during early postnatal development and requires dopaminergic (DA) input. Conversely, genetic disruption of Arc leads to a hypoactive mesofrontal dopamine circuit and its related cognitive deficit. This mutual interaction suggests an auto-regulatory mechanism to amplify the impact of neuromodulators and activity-regulated genes during postnatal development. Such a mechanism may contribute to the association of mutations in dopamine and Arc pathways with neurodevelopmental psychiatric disorders. As the mesofrontal dopamine circuit shows extensive activity-dependent developmental plasticity, activity-guided modulation of DA projections or Arc ensembles during development may help to repair circuit deficits related to neuropsychiatric disorders.
The Behavior of Filters and Smoothers for Strongly Nonlinear Dynamics
NASA Technical Reports Server (NTRS)
Zhu, Yanqiu; Cohn, Stephen E.; Todling, Ricardo
1999-01-01
The Kalman filter is the optimal filter in the presence of known Gaussian error statistics and linear dynamics. Extending the filter to nonlinear dynamics is nontrivial in the sense of appropriately representing the high-order moments of the statistics. Monte Carlo, ensemble-based methods have been advocated as the methodology for representing high-order moments without any questionable closure assumptions (e.g., Miller 1994). Investigation along these lines has been conducted for highly idealized dynamics such as the strongly nonlinear Lorenz (1963) model as well as for more realistic models of the oceans (Evensen and van Leeuwen 1996) and atmosphere (Houtekamer and Mitchell 1998). A few relevant issues in this context concern the number of ensemble members necessary to properly represent the error statistics and the modifications to the usual filter equations necessary to allow for a correct update of the ensemble members (Burgers 1998). The ensemble technique has also been applied to the problem of smoothing, for which similar questions apply. Ensemble smoother examples, however, seem quite puzzling in that the state estimates are worse than those of their filter analogues (Evensen 1997). In this study, we use concepts from probability theory to revisit the ensemble methodology for filtering and smoothing in data assimilation. We use the Lorenz (1963) model to test and compare the behavior of a variety of implementations of ensemble filters. We also implement ensemble smoothers that are able to perform better than their filter counterparts. A discussion of the feasibility of these techniques for large data assimilation problems will be given at the time of the conference.
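The modification to the usual filter equations attributed to Burgers (1998) — each member assimilating an independently perturbed copy of the observation so that the analysis ensemble has the statistically correct spread — can be sketched as follows (a minimal perturbed-observation EnKF analysis step, with a linear observation operator assumed):

```python
import numpy as np

def enkf_update(ensemble, obs, H, R, rng):
    """Perturbed-observation EnKF analysis (after Burgers et al. 1998).
    ensemble: (n_ens, n_state); obs: (n_obs,); H: (n_obs, n_state);
    R: (n_obs, n_obs) observation error covariance."""
    n_ens = ensemble.shape[0]
    anomalies = ensemble - ensemble.mean(axis=0)
    Pf = anomalies.T @ anomalies / (n_ens - 1)           # sample background covariance
    K = Pf @ H.T @ np.linalg.inv(H @ Pf @ H.T + R)       # Kalman gain
    # without perturbing the observations the analysis ensemble would be
    # under-dispersive; each member gets its own noisy copy of the obs
    obs_pert = obs + rng.multivariate_normal(np.zeros(len(obs)), R, size=n_ens)
    return ensemble + (obs_pert - ensemble @ H.T) @ K.T
```

The analysis mean is pulled toward the observation while the ensemble spread contracts toward the theoretical analysis-error covariance.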
AUC-based biomarker ensemble with an application on gene scores predicting low bone mineral density.
Zhao, X G; Dai, W; Li, Y; Tian, L
2011-11-01
The area under the receiver operating characteristic (ROC) curve (AUC), long regarded as a 'golden' measure of the predictiveness of a continuous score, has propelled the need to develop AUC-based predictors. However, AUC-based ensemble methods are rather scant, largely due to the fact that the associated objective function is neither continuous nor concave. Indeed, there is no reliable numerical algorithm for identifying the optimal combination of a set of biomarkers to maximize the AUC, especially when the number of biomarkers is large. We have proposed a novel AUC-based statistical ensemble method for combining multiple biomarkers to differentiate a binary response of interest. Specifically, we propose to replace the non-continuous and non-convex AUC objective function with a convex surrogate loss function, whose minimizer can be efficiently identified. Within the established framework, the lasso and other regularization techniques enable feature selection. Extensive simulations have demonstrated the superiority of the new method over existing methods. The proposal has been applied to a gene expression dataset to construct gene expression scores differentiating elderly women with low bone mineral density (BMD) from those with normal BMD. The AUCs of the resulting scores in the independent test dataset have been satisfactory. Aiming directly at maximizing the AUC, the proposed AUC-based ensemble method provides an efficient means of generating a stable combination of multiple biomarkers, which is especially useful in high-dimensional settings. Contact: lutian@stanford.edu. Supplementary data are available at Bioinformatics online.
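The convex-surrogate idea can be sketched with a logistic surrogate over positive/negative pairs (an illustration of the general approach; the paper's exact surrogate loss, solver, and lasso penalty are not reproduced here):

```python
import numpy as np

def fit_auc_surrogate(X_pos, X_neg, n_iter=300, lr=0.5):
    """Gradient descent on a smooth convex surrogate of 1 - AUC. The empirical
    AUC counts positive/negative pairs with w.x_pos > w.x_neg; the 0/1 pair
    indicator is replaced by the logistic loss log(1 + exp(-margin))."""
    diffs = (X_pos[:, None, :] - X_neg[None, :, :]).reshape(-1, X_pos.shape[1])
    w = np.zeros(X_pos.shape[1])
    for _ in range(n_iter):
        margins = np.clip(diffs @ w, -30.0, 30.0)
        # gradient of the mean logistic pair loss with respect to w
        grad = -(diffs * (1.0 / (1.0 + np.exp(margins)))[:, None]).mean(axis=0)
        w -= lr * grad
    return w

def empirical_auc(scores_pos, scores_neg):
    """Fraction of positive/negative pairs ranked correctly by the score."""
    return float(np.mean(scores_pos[:, None] > scores_neg[None, :]))
```

Because the surrogate is convex in w, the minimizer can be found reliably, and an L1 penalty on w would recover the lasso-style feature selection mentioned above.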