Organic Carbon Transformation and Mercury Methylation in Tundra Soils from Barrow, Alaska
Liang, L.; Wullschleger, Stan; Graham, David; Gu, B.; Yang, Ziming
2016-04-20
This dataset includes information on soil labile organic carbon transformation and mercury methylation for tundra soils from Barrow, Alaska. The soil cores were collected from a high-centered polygon trough at the Barrow Environmental Observatory (BEO) and were incubated under anaerobic laboratory conditions at both freezing and warming temperatures for up to 8 months. Soil organic carbon compounds, including reducing sugars, alcohols, and organic acids, were analyzed, and CH4 and CO2 emissions were quantified. Net production of methylmercury and the Fe(II)/Fe(total) ratio were also measured and are provided in this dataset.
So many genes, so little time: A practical approach to divergence-time estimation in the genomic era
Smith, Stephen A; Brown, Joseph W; Walker, Joseph F
2018-01-01
Phylogenomic datasets have been successfully used to address questions involving evolutionary relationships, patterns of genome structure, signatures of selection, and gene and genome duplications. However, despite the recent explosion in genomic and transcriptomic data, the utility of these data sources for efficient divergence-time inference remains unexamined. Phylogenomic datasets pose two distinct problems for divergence-time estimation: (i) the volume of data makes inference of the entire dataset intractable, and (ii) the extent of underlying topological and rate heterogeneity across genes makes model mis-specification a real concern. "Gene shopping", wherein a phylogenomic dataset is winnowed to a set of genes with desirable properties, represents an alternative approach that holds promise in alleviating these issues. We implemented an approach for phylogenomic datasets (available in SortaDate) that filters genes by three criteria: (i) clock-likeness, (ii) reasonable tree length (i.e., discernible information content), and (iii) least topological conflict with a focal species tree (presumed to have already been inferred). Such a winnowing procedure ensures that errors associated with model (both clock and topology) mis-specification are minimized, therefore reducing error in divergence-time estimation. We demonstrated the efficacy of this approach through simulation and applied it to published animal (Aves, Diplopoda, and Hymenoptera) and plant (carnivorous Caryophyllales, broad Caryophyllales, and Vitales) phylogenomic datasets. By quantifying rate heterogeneity across both genes and lineages we found that every empirical dataset examined included genes with clock-like, or nearly clock-like, behavior. Moreover, many datasets had genes that were clock-like, exhibited reasonable evolutionary rates, and were mostly compatible with the species tree. We identified overlap in age estimates when analyzing these filtered genes under strict clock and uncorrelated lognormal (UCLN) models. However, this overlap was often due to imprecise estimates from the UCLN model. We find that "gene shopping" can be an efficient approach to divergence-time inference for phylogenomic datasets that may otherwise be characterized by extensive gene tree heterogeneity.
NASA Astrophysics Data System (ADS)
Mandal, D.; Bhatia, N.; Srivastav, R. K.
2016-12-01
The Soil and Water Assessment Tool (SWAT) is one of the most comprehensive hydrologic models for simulating streamflow in a watershed. The two major inputs for a SWAT model are: (i) Digital Elevation Models (DEMs), and (ii) Land Use and Land Cover (LULC) maps. This study aims to quantify the uncertainty in streamflow predictions using SWAT for the San Bernard River in the Brazos-Colorado coastal watershed, Texas, by incorporating the respective datasets from different sources: (i) DEM data from the ASTER GDEM V2, GMTED2010, NHD DEM, and SRTM DEM datasets, with resolutions ranging from 1/3 arc-second to 30 arc-seconds, and (ii) LULC data from the GLCC V2, MRLC NLCD2011, NOAA C-CAP, USGS GAP, and TCEQ databases. Weather variables (daily precipitation and maximum/minimum temperature) will be obtained from the National Climatic Data Center (NCDC), and the SWAT built-in STATSGO tool will be used to obtain the soil maps. The SWAT model will be calibrated using the SWAT-CUP SUFI-2 approach and its performance will be evaluated using the statistical indices of Nash-Sutcliffe efficiency (NSE), the ratio of root-mean-square error to the standard deviation of observed streamflow (RSR), and percent bias (PBIAS). The study will help in understanding the performance of the SWAT model with varying data sources and eventually aid the regional state water boards in planning, designing, and managing hydrologic systems.
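The three evaluation statistics named above have simple closed forms. The following is a minimal sketch of how they can be computed from paired observed and simulated streamflow series; the example arrays are purely hypothetical and do not come from the study.

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit; values <= 0 mean the
    simulation is no better than the mean of the observations."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def rsr(obs, sim):
    """RMSE divided by the standard deviation of the observations."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    rmse = np.sqrt(np.mean((obs - sim) ** 2))
    return rmse / obs.std()

def pbias(obs, sim):
    """Percent bias: positive values indicate model underestimation."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 100.0 * np.sum(obs - sim) / np.sum(obs)

# Hypothetical daily streamflow (m^3/s), for illustration only.
observed  = np.array([12.0, 15.3, 9.8, 20.1, 18.4, 11.2, 7.6])
simulated = np.array([11.1, 16.0, 10.5, 18.9, 17.2, 12.0, 8.3])
print(f"NSE={nse(observed, simulated):.3f}  "
      f"RSR={rsr(observed, simulated):.3f}  "
      f"PBIAS={pbias(observed, simulated):.1f}%")
```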
Hair-bundle proteomes of avian and mammalian inner-ear utricles
Wilmarth, Phillip A.; Krey, Jocelyn F.; Shin, Jung-Bum; Choi, Dongseok; David, Larry L.; Barr-Gillespie, Peter G.
2015-01-01
Examination of multiple proteomics datasets within or between species increases the reliability of protein identification. We report here proteomes of inner-ear hair bundles from three species (chick, mouse, and rat), which were collected on LTQ or LTQ Velos ion-trap mass spectrometers; the constituent proteins were quantified using MS2 intensities, which are the summed intensities of all peptide fragmentation spectra matched to a protein. The data are available via ProteomeXchange with identifiers PXD002410 (chick LTQ), PXD002414 (chick Velos), PXD002415 (mouse Velos), and PXD002416 (rat LTQ). The two chick bundle datasets compared favourably to a third, already-described chick bundle dataset, which was quantified using MS1 peak intensities, the summed intensities of peptides identified by high-resolution mass spectrometry (PXD000104; updated analysis in PXD002445). The mouse bundle dataset described here was comparable to a different mouse bundle dataset quantified using MS1 intensities (PXD002167). These six datasets will be useful for identifying the core proteome of vestibular hair bundles. PMID:26645194
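The MS2 quantification described above reduces to summing matched spectrum intensities per protein. Below is a minimal pandas sketch of that idea; the table layout and column names are illustrative assumptions, not the actual ProteomeXchange file format.

```python
import pandas as pd

# Hypothetical peptide-spectrum-match table; column names are illustrative.
psms = pd.DataFrame({
    "protein":       ["MYO7A", "MYO7A", "ACTB", "ACTB", "ACTB", "PCDH15"],
    "peptide":       ["PEP1",  "PEP2",  "PEP3", "PEP3", "PEP4", "PEP5"],
    "ms2_intensity": [2.1e6,   1.4e6,   8.9e6,  7.5e6,  3.2e6,  0.6e6],
})

# MS2 quantification as described: sum the intensities of all fragmentation
# spectra matched to each protein, then express as a relative abundance.
ms2_per_protein = psms.groupby("protein")["ms2_intensity"].sum()
relative = ms2_per_protein / ms2_per_protein.sum()
print(relative.sort_values(ascending=False))
```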
NASA Astrophysics Data System (ADS)
Dube, Timothy; Mutanga, Onisimo
2015-03-01
Aboveground biomass estimation is critical in understanding forest contribution to regional carbon cycles. Despite the successful application of high spatial and spectral resolution sensors in aboveground biomass (AGB) estimation, there are challenges related to high acquisition costs, small area coverage, multicollinearity and limited availability. These challenges hamper successful regional-scale AGB quantification. The aim of this study was to assess the utility of the newly-launched medium-resolution multispectral Landsat 8 Operational Land Imager (OLI) dataset, with a large swath width, in quantifying AGB in a forest plantation. We applied different sets of spectral analysis (test I: spectral bands; test II: spectral vegetation indices; and test III: spectral bands + spectral vegetation indices) in testing the utility of Landsat 8 OLI using two non-parametric algorithms: stochastic gradient boosting and the random forest ensembles. The results of the study show that the medium-resolution multispectral Landsat 8 OLI dataset provides better AGB estimates for Eucalyptus dunnii, Eucalyptus grandis and Pinus taeda, especially when using the extracted spectral information together with the derived spectral vegetation indices. We also noted that incorporating the optimal subset of the most important selected medium-resolution multispectral Landsat 8 OLI bands improved AGB accuracies. We compared medium-resolution multispectral Landsat 8 OLI AGB estimates with Landsat 7 ETM+ estimates and the latter yielded lower estimation accuracies. Overall, this study demonstrates the invaluable potential and strength of applying the relatively affordable and readily available newly-launched medium-resolution Landsat 8 OLI dataset, with a large swath width (185 km), in precisely estimating AGB. This strength of the Landsat OLI dataset is crucial especially in sub-Saharan Africa where high-resolution remote sensing data availability remains a challenge.
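The two non-parametric algorithms named above are available off the shelf in scikit-learn. The sketch below shows, under stated assumptions, how plot-level AGB could be regressed on spectral bands plus a vegetation index (the "test III" configuration); all reflectances and biomass values are synthetic stand-ins, not the study's data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_squared_error

rng = np.random.default_rng(0)

# Synthetic plot data standing in for Landsat 8 OLI extractions:
# six columns mimic reflectances for bands 2-7 (blue..SWIR2).
n_plots = 300
bands = rng.uniform(0.02, 0.45, size=(n_plots, 6))
red, nir = bands[:, 2], bands[:, 3]
ndvi = (nir - red) / (nir + red)                          # a vegetation index (test II)
X = np.column_stack([bands, ndvi])                        # bands + index (test III)
agb = 120 * ndvi + 30 * nir + rng.normal(0, 8, n_plots)   # fabricated biomass (t/ha)

X_tr, X_te, y_tr, y_te = train_test_split(X, agb, test_size=0.3, random_state=1)
for model in (RandomForestRegressor(n_estimators=500, random_state=1),
              GradientBoostingRegressor(random_state=1)):
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    rmse = mean_squared_error(y_te, pred) ** 0.5
    print(type(model).__name__, f"R2={r2_score(y_te, pred):.2f} RMSE={rmse:.1f}")
```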
NASA Astrophysics Data System (ADS)
Lu, Yujuan; Yan, Mingquan; Korshin, Gregory V.
2017-09-01
The speciation, bioavailability and transport of Pb(II) in the environment are strongly affected by dissolved organic matter (DOM). Despite the importance of these interactions, the nature of Pb(II)-DOM binding remains insufficiently characterized. This study addressed this deficiency using the method of differential absorbance spectroscopy in combination with the non-ideal competitive adsorption (NICA)-Donnan model. Differential absorbance data allowed the interactions between Pb(II) and DOM to be quantified over a wide range of pH values, ionic strengths and Pb(II) concentrations at an environmentally relevant DOM concentration (5 mg L-1). Changes in the slopes of the log-transformed absorbance spectra of DOM in the 242-262 nm and 350-400 nm wavelength ranges were found to be predictive of the extent of Pb(II) bound by DOM carboxylic groups and of the total amount of DOM-bound Pb(II), respectively. The results also demonstrated the preferential involvement of DOM carboxylic groups in Pb(II) binding. The spectroscopic data allowed selected Pb(II)-DOM complexation constants used in the NICA-Donnan model to be optimized. This resulted in markedly improved performance of that model when it was applied to interpret previously published Pb(II)-fulvic acid datasets.
Tan, Li Kuo; Liew, Yih Miin; Lim, Einly; Abdul Aziz, Yang Faridah; Chee, Kok Han; McLaughlin, Robert A
2018-06-01
In this paper, we develop and validate an open source, fully automatic algorithm to localize the left ventricular (LV) blood pool centroid in short axis cardiac cine MR images, enabling follow-on automated LV segmentation algorithms. The algorithm comprises four steps: (i) quantify motion to determine an initial region of interest surrounding the heart, (ii) identify potential 2D objects of interest using an intensity-based segmentation, (iii) assess contraction/expansion, circularity, and proximity to lung tissue to score all objects of interest in terms of their likelihood of constituting part of the LV, and (iv) aggregate the objects into connected groups and construct the final LV blood pool volume and centroid. This algorithm was tested against 1140 datasets from the Kaggle Second Annual Data Science Bowl, as well as 45 datasets from the STACOM 2009 Cardiac MR Left Ventricle Segmentation Challenge. Correct LV localization was confirmed in 97.3% of the datasets. The mean absolute error between the gold standard and localization centroids was 2.8 to 4.7 mm, or 12 to 22% of the average endocardial radius. Graphical abstract: fully automated localization of the left ventricular blood pool in short axis cardiac cine MR images.
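Step (i) of the pipeline, quantifying motion to get an initial region of interest, can be illustrated with a per-pixel temporal standard deviation across cine frames. This is only a sketch of that single step under my own simplifying assumptions (a synthetic pulsating disc stands in for the heart); it is not the authors' implementation.

```python
import numpy as np

def motion_roi(cine, percentile=95):
    """Sketch of step (i): the heart moves most over the cardiac cycle, so the
    temporal standard deviation of each pixel across frames highlights it.

    cine: array of shape (frames, rows, cols) for one short-axis slice.
    Returns the bounding box (rmin, rmax, cmin, cmax) of high-motion pixels."""
    motion = cine.std(axis=0)                        # per-pixel temporal variation
    mask = motion > np.percentile(motion, percentile)
    rows, cols = np.nonzero(mask)
    return rows.min(), rows.max(), cols.min(), cols.max()

# Synthetic cine: a bright disc whose radius oscillates, mimicking contraction.
frames, size = 20, 128
yy, xx = np.mgrid[0:size, 0:size]
cine = np.stack([
    ((yy - 64) ** 2 + (xx - 64) ** 2
     < (18 + 6 * np.sin(2 * np.pi * t / frames)) ** 2).astype(float)
    for t in range(frames)
])
print("initial ROI (rmin, rmax, cmin, cmax):", motion_roi(cine))
```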
Enhancing studies of the connectome in autism using the autism brain imaging data exchange II
Di Martino, Adriana; O’Connor, David; Chen, Bosi; Alaerts, Kaat; Anderson, Jeffrey S.; Assaf, Michal; Balsters, Joshua H.; Baxter, Leslie; Beggiato, Anita; Bernaerts, Sylvie; Blanken, Laura M. E.; Bookheimer, Susan Y.; Braden, B. Blair; Byrge, Lisa; Castellanos, F. Xavier; Dapretto, Mirella; Delorme, Richard; Fair, Damien A.; Fishman, Inna; Fitzgerald, Jacqueline; Gallagher, Louise; Keehn, R. Joanne Jao; Kennedy, Daniel P.; Lainhart, Janet E.; Luna, Beatriz; Mostofsky, Stewart H.; Müller, Ralph-Axel; Nebel, Mary Beth; Nigg, Joel T.; O’Hearn, Kirsten; Solomon, Marjorie; Toro, Roberto; Vaidya, Chandan J.; Wenderoth, Nicole; White, Tonya; Craddock, R. Cameron; Lord, Catherine; Leventhal, Bennett; Milham, Michael P.
2017-01-01
The second iteration of the Autism Brain Imaging Data Exchange (ABIDE II) aims to enhance the scope of brain connectomics research in Autism Spectrum Disorder (ASD). Consistent with the initial ABIDE effort (ABIDE I), which released 1112 datasets in 2012, this new multisite open-data resource is an aggregate of resting-state functional magnetic resonance imaging (MRI) and corresponding structural MRI and phenotypic datasets. ABIDE II includes datasets from an additional 487 individuals with ASD and 557 controls previously collected across 16 international institutions. The combination of ABIDE I and ABIDE II provides investigators with 2156 unique cross-sectional datasets allowing selection of samples for discovery and/or replication. This sample size can also facilitate the identification of neurobiological subgroups, as well as preliminary examinations of sex differences in ASD. Additionally, ABIDE II includes a range of psychiatric variables to inform our understanding of the neural correlates of co-occurring psychopathology; 284 diffusion imaging datasets are also included. It is anticipated that these enhancements will contribute to unraveling key sources of ASD heterogeneity. PMID:28291247
Multiple-rule bias in the comparison of classification rules
Yousefi, Mohammadmahdi R.; Hua, Jianping; Dougherty, Edward R.
2011-01-01
Motivation: There is growing discussion in the bioinformatics community concerning overoptimism of reported results. Two approaches contributing to overoptimism in classification are (i) the reporting of results on datasets for which a proposed classification rule performs well and (ii) the comparison of multiple classification rules on a single dataset that purports to show the advantage of a certain rule. Results: This article provides a careful probabilistic analysis of the second issue and the ‘multiple-rule bias’, resulting from choosing a classification rule having minimum estimated error on the dataset. It quantifies this bias corresponding to estimating the expected true error of the classification rule possessing minimum estimated error and it characterizes the bias from estimating the true comparative advantage of the chosen classification rule relative to the others by the estimated comparative advantage on the dataset. The analysis is applied to both synthetic and real data using a number of classification rules and error estimators. Availability: We have implemented in C code the synthetic data distribution model, classification rules, feature selection routines and error estimation methods. The code for multiple-rule analysis is implemented in MATLAB. The source code is available at http://gsp.tamu.edu/Publications/supplementary/yousefi11a/. Supplementary simulation results are also included. Contact: edward@ece.tamu.edu Supplementary Information: Supplementary data are available at Bioinformatics online. PMID:21546390
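The multiple-rule bias described above can be demonstrated with a small simulation: estimate the error of several classification rules by cross-validation on a small dataset, report the rule with minimum estimated error, and compare that estimate to its error on a large independent sample. This sketch uses off-the-shelf scikit-learn classifiers and synthetic data; it illustrates the phenomenon, not the paper's analytic model or its specific rules and error estimators.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB

rules = {"3NN": KNeighborsClassifier(3), "SVM": SVC(),
         "tree": DecisionTreeClassifier(random_state=0), "NB": GaussianNB()}

gaps = []
for rep in range(50):                        # repeat over many small datasets
    X, y = make_classification(n_samples=2000, n_features=20, n_informative=4,
                               random_state=rep)
    # Small training set; the large held-out set approximates the true error.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=60, random_state=rep)
    cv_err = {name: 1 - cross_val_score(m, X_tr, y_tr, cv=5).mean()
              for name, m in rules.items()}
    best = min(cv_err, key=cv_err.get)       # rule with minimum estimated error
    true_err = 1 - rules[best].fit(X_tr, y_tr).score(X_te, y_te)
    gaps.append(true_err - cv_err[best])     # positive gap = optimistic report

print(f"mean optimism of the selected rule: {np.mean(gaps):.3f}")
```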
NASA Astrophysics Data System (ADS)
Leavens, Claudia; Vik, Torbjørn; Schulz, Heinrich; Allaire, Stéphane; Kim, John; Dawson, Laura; O'Sullivan, Brian; Breen, Stephen; Jaffray, David; Pekar, Vladimir
2008-03-01
Manual contouring of target volumes and organs at risk in radiation therapy is extremely time-consuming, in particular for treating the head-and-neck area, where a single patient treatment plan can take several hours to contour. As radiation treatment delivery moves towards adaptive treatment, the need for more efficient segmentation techniques will increase. We are developing a method for automatic model-based segmentation of the head and neck. This process can be broken down into three main steps: i) automatic landmark identification in the image dataset of interest, ii) automatic landmark-based initialization of deformable surface models to the patient image dataset, and iii) adaptation of the deformable models to the patient-specific anatomical boundaries of interest. In this paper, we focus on the validation of the first step of this method, quantifying the results of our automatic landmark identification method. We use an image atlas formed by applying thin-plate spline (TPS) interpolation to ten atlas datasets, using 27 manually identified landmarks in each atlas/training dataset. The principal variation modes returned by principal component analysis (PCA) of the landmark positions were used by an automatic registration algorithm, which sought the corresponding landmarks in the clinical dataset of interest using a controlled random search algorithm. Applying a run time of 60 seconds to the random search, a root mean square (rms) distance to the ground-truth landmark position of 9.5 +/- 0.6 mm was calculated for the identified landmarks. Automatic segmentation of the brain, mandible and brain stem, using the detected landmarks, is demonstrated.
NASA Astrophysics Data System (ADS)
Ryu, Youngryel; Jiang, Chongya
2016-04-01
To gain insights into the underlying impacts of global climate change on terrestrial ecosystem fluxes, we present long-term (1982-2015) global radiation, carbon and water flux products generated by integrating multi-satellite data with a process-based model, the Breathing Earth System Simulator (BESS). BESS is a coupled process model that integrates radiative transfer in the atmosphere and canopy, photosynthesis (GPP), and evapotranspiration (ET). BESS was designed to be most sensitive to the variables that can be quantified reliably, taking full advantage of remote sensing atmospheric and land products. Originally, BESS relied entirely on MODIS input variables to produce global GPP and ET during the MODIS era. This study extends that work to provide a series of long-term products from 1982 to 2015 by incorporating AVHRR data. In addition to GPP and ET, further datasets related to land surface processes are mapped to facilitate the discovery of ecological variations and changes. The CLARA-A1 cloud property datasets and the TOMS aerosol datasets, along with the GLASS land surface albedo datasets, were input to a look-up table derived from an atmospheric radiative transfer model to produce direct and diffuse components of visible and near-infrared radiation. These radiation components, together with the LAI3g datasets and the GLASS land surface albedo datasets, were used to calculate absorbed radiation through a clumping-corrected two-stream canopy radiative transfer model. ECMWF ERA-Interim air temperature data were downscaled using the ALP-II land surface temperature dataset and a region-dependent regression model. The spatial and seasonal variations of CO2 concentration were accounted for using OCO-2 datasets, whereas NOAA's global CO2 growth rate data were used to describe interannual variations. All of these remote sensing based datasets were used to run BESS. Daily fluxes at 1/12-degree resolution were computed and then aggregated to half-month intervals to match the spatio-temporal resolution of the LAI3g dataset. The BESS GPP and ET products were compared to other independent datasets, including MPI-BGC and CLM. Overall, the BESS products show good agreement with the other two datasets, indicating a compelling potential for bridging remote sensing and land surface models.
NASA Astrophysics Data System (ADS)
Macedonio, Giovanni; Costa, Antonio; Scollo, Simona; Neri, Augusto
2015-04-01
Uncertainty in tephra fallout hazard assessment may depend on the different meteorological datasets and eruptive source parameters used in the modelling. We present a statistical study to analyze this uncertainty for the case of a sub-Plinian eruption of Vesuvius with VEI = 4, a column height of 18 km and a total erupted mass of 5 × 10¹¹ kg. The hazard assessment for tephra fallout is performed using the advection-diffusion model Hazmap. Firstly, we statistically analyze different meteorological datasets: i) from the daily atmospheric soundings of the stations located in Brindisi (Italy) between 1962 and 1976 and between 1996 and 2012, and in Pratica di Mare (Rome, Italy) between 1996 and 2012; ii) from numerical weather prediction models of the National Oceanic and Atmospheric Administration and of the European Centre for Medium-Range Weather Forecasts. Furthermore, we vary the total mass, the total grain-size distribution, the eruption column height, and the diffusion coefficient. Then, we quantify the impact that the different datasets and model input parameters have on the probability maps. Results show that the parameter that most affects the tephra fallout probability maps, keeping the total mass constant, is the particle terminal settling velocity, which is a function of the total grain-size distribution, particle density and shape. In contrast, the hazard assessment depends only weakly on the choice of meteorological dataset, column height and diffusion coefficient.
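To give a feel for how strongly the settling velocity depends on grain size and density, the sketch below uses the classical Stokes (low Reynolds number) expression. This is only an illustration under that simplifying assumption; it ignores particle shape and is not the settling-velocity formulation used in Hazmap.

```python
def stokes_settling_velocity(diameter_m, particle_density, air_density=1.2,
                             air_viscosity=1.8e-5, g=9.81):
    """Terminal settling velocity (m/s) in the Stokes regime; a reasonable
    approximation only for fine ash, and particle shape is ignored."""
    return g * diameter_m ** 2 * (particle_density - air_density) / (18 * air_viscosity)

# Illustrative grain diameters and a particle density of 2500 kg/m^3.
for d_um in (10, 30, 60, 100):
    v = stokes_settling_velocity(d_um * 1e-6, 2500.0)
    print(f"d = {d_um:3d} um -> v_settle ~ {v:.3f} m/s")
```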
2017-02-01
In this technical note, a number of different measures implemented as functions in both MATLAB and Python are used to quantify similarity/distance between two vector-based datasets. The measures described in this technical note are widely used and may have an important role when computing the distance and similarity of large datasets and when considering high-throughput processes.
Segmentation of Unstructured Datasets
NASA Technical Reports Server (NTRS)
Bhat, Smitha
1996-01-01
Datasets generated by computer simulations and experiments in Computational Fluid Dynamics tend to be extremely large and complex. It is difficult to visualize these datasets using standard techniques like Volume Rendering and Ray Casting. Object Segmentation provides a technique to extract and quantify regions of interest within these massive datasets. This thesis explores basic algorithms to extract coherent amorphous regions from two-dimensional and three-dimensional scalar unstructured grids. The techniques are applied to datasets from Computational Fluid Dynamics and from Finite Element Analysis.
Unbalanced 2 x 2 Factorial Designs and the Interaction Effect: A Troublesome Combination
2015-01-01
In this power study, ANOVAs of unbalanced and balanced 2 x 2 datasets are compared (N = 120). Datasets are created under the assumption that H1 of the effects is true. The effects are constructed in two ways, assuming: 1. contributions to the effects solely in the treatment groups; 2. contrasting contributions in treatment and control groups. The main question is whether the two ANOVA correction methods for imbalance (applying Sums of Squares Type II or III; SS II or SS III) offer satisfactory power in the presence of an interaction. Overall, SS II showed higher power, but results varied strongly. When compared to a balanced dataset, for some unbalanced datasets the rejection rate of H0 of main effects was undesirably higher. SS III showed consistently somewhat lower power. When the effects were constructed with equal contributions from control and treatment groups, the interaction could be re-estimated satisfactorily. When an interaction was present, SS III led consistently to somewhat lower rejection rates of H0 of main effects, compared to the rejection rates found in equivalent balanced datasets, while SS II produced strongly varying results. In data constructed with only effects in the treatment groups and no effects in the control groups, the H0 of moderate and strong interaction effects was often not rejected and SS II seemed applicable. Even then, SS III provided slightly better results when a true interaction was present. ANOVA did not always allow a satisfactory re-estimation of the unique interaction effect. Yet, SS II worked better only when an interaction effect could be excluded, whereas SS III results were just marginally worse in that case. Overall, SS III provided consistently 1 to 5% lower rejection rates of H0 in comparison with analyses of balanced datasets, while results of SS II varied too widely for general application. PMID:25807514
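Type II and Type III sums of squares on an unbalanced 2 x 2 design can be compared directly with statsmodels. The sketch below builds a small synthetic unbalanced design (cell sizes and effect values are my own illustrative choices, not the study's simulation settings) and runs both ANOVA types; note that Type III is only meaningful with sum-to-zero contrasts.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(1)

# Unbalanced 2 x 2 design: unequal cell sizes, effect only in one treatment cell.
cells = [("ctrl", "B0", 40), ("ctrl", "B1", 20), ("trt", "B0", 25), ("trt", "B1", 35)]
rows = []
for a, b, n in cells:
    effect = 1.0 if (a == "trt" and b == "B1") else 0.0   # interaction-like effect
    rows += [{"A": a, "B": b, "y": effect + rng.normal()} for _ in range(n)]
df = pd.DataFrame(rows)

# Type II sums of squares with default (treatment) coding.
fit2 = smf.ols("y ~ C(A) * C(B)", data=df).fit()
print(anova_lm(fit2, typ=2))

# Type III sums of squares require sum-to-zero contrasts.
fit3 = smf.ols("y ~ C(A, Sum) * C(B, Sum)", data=df).fit()
print(anova_lm(fit3, typ=3))
```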
Trace Gas/Aerosol Interactions and GMI Modeling Support
NASA Technical Reports Server (NTRS)
Penner, Joyce E.; Liu, Xiaohong; Das, Bigyani; Bergmann, Dan; Rodriquez, Jose M.; Strahan, Susan; Wang, Minghuai; Feng, Yan
2005-01-01
Current global aerosol models use different physical and chemical schemes and parameters, different meteorological fields, and often different emission sources. Since the physical and chemical parameterization schemes are often tuned to obtain results that are consistent with observations, it is difficult to assess the true uncertainty due to meteorology alone. Under the framework of the NASA global modeling initiative (GMI), the differences and uncertainties in aerosol simulations (for sulfate, organic carbon, black carbon, dust and sea salt) solely due to different meteorological fields are analyzed and quantified. Three meteorological datasets available from the NASA DAO GCM, the GISS-II' GCM, and the NASA finite volume GCM (FVGCM) are used to drive the same aerosol model. The global sulfate and mineral dust burdens with FVGCM fields are 40% and 20% less than those with DAO and GISS fields, respectively due to its heavier rainfall. Meanwhile, the sea salt burden predicted with FVGCM fields is 56% and 43% higher than those with DAO and GISS, respectively, due to its stronger convection especially over the Southern Hemispheric Ocean. Sulfate concentrations at the surface in the Northern Hemisphere extratropics and in the middle to upper troposphere differ by more than a factor of 3 between the three meteorological datasets. The agreement between model calculated and observed aerosol concentrations in the industrial regions (e.g., North America and Europe) is quite similar for all three meteorological datasets. Away from the source regions, however, the comparisons with observations differ greatly for DAO, FVGCM and GISS, and the performance of the model using different datasets varies largely depending on sites and species. Global annual average aerosol optical depth at 550 nm is 0.120-0.131 for the three meteorological datasets.
Merged SAGE II / MIPAS / OMPS Ozone Record : Impact of Transfer Standard on Ozone Trends.
NASA Astrophysics Data System (ADS)
Kramarova, N. A.; Laeng, A.; von Clarmann, T.; Stiller, G. P.; Walker, K. A.; Zawodny, J. M.; Plieninger, J.
2017-12-01
The deseasonalized ozone anomalies from the SAGE II, MIPAS and OMPS-LP datasets are merged into one long record. Two versions of the dataset are presented: one using the ACE-FTS instrument and one using the MLS instrument as the transfer standard. The data are provided in 10-degree latitude bins from 60N to 60S for the period from October 1984 to March 2017. The main differences between the merged ozone record presented in this study and the merged SAGE II / Ozone_CCI / OMPS-Saskatoon dataset by V. Sofieva are: the OMPS-LP data are from the NASA GSFC version 2 processor; the MIPAS 2002-2004 data are included in the record; and the data are merged using a transfer standard. In overlapping periods, data are merged as weighted means where the weights are inversely proportional to the standard errors of the means (SEM) of the corresponding individual monthly means. The merged dataset comes with uncertainty estimates. Ozone trends are calculated from both versions of the dataset, and the impact of the transfer standard on the obtained trends is discussed.
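The merging rule for overlap periods, weighted means with weights inversely proportional to each monthly mean's SEM, is simple to state in code. The sketch below follows that stated rule for two overlapping records; the example anomaly values are hypothetical, and the uncertainty propagation shown is my own assumption rather than the authors' documented procedure.

```python
import numpy as np

def merge_monthly(anom_a, sem_a, anom_b, sem_b):
    """Weighted mean of two records in an overlap period, weights proportional
    to 1/SEM as described, plus a propagated SEM for the merged value."""
    w_a, w_b = 1.0 / sem_a, 1.0 / sem_b
    merged = (w_a * anom_a + w_b * anom_b) / (w_a + w_b)
    # Assumed propagation: linear combination of independent errors.
    merged_sem = np.sqrt((w_a * sem_a) ** 2 + (w_b * sem_b) ** 2) / (w_a + w_b)
    return merged, merged_sem

# Hypothetical overlapping monthly ozone anomalies (%) and their SEMs.
rec_a = np.array([1.2, -0.4, 0.8]);  sem_a = np.array([0.5, 0.6, 0.4])
rec_b = np.array([0.9, -0.1, 1.1]);  sem_b = np.array([0.3, 0.3, 0.5])
print(merge_monthly(rec_a, sem_a, rec_b, sem_b))
```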
Federal Register 2010, 2011, 2012, 2013, 2014
2010-08-24
... evaluates potential datasets and recommends which datasets are appropriate for assessment analyses. The... points to datasets incorporated in the original SEDAR benchmark assessment and run the benchmark... Webinar II November 22, 2010; 10 a.m. - 1 p.m.; SEDAR Update Assessment Webinar III Using updated datasets...
Yang, Chihae; Barlow, Susan M; Muldoon Jacobs, Kristi L; Vitcheva, Vessela; Boobis, Alan R; Felter, Susan P; Arvidson, Kirk B; Keller, Detlef; Cronin, Mark T D; Enoch, Steven; Worth, Andrew; Hollnagel, Heli M
2017-11-01
A new dataset of cosmetics-related chemicals for the Threshold of Toxicological Concern (TTC) approach has been compiled, comprising 552 chemicals with 219, 40, and 293 chemicals in Cramer Classes I, II, and III, respectively. Data were integrated and curated to create a database of No-/Lowest-Observed-Adverse-Effect Level (NOAEL/LOAEL) values, from which the final COSMOS TTC dataset was developed. Criteria for study inclusion and NOAEL decisions were defined, and rigorous quality control was performed for study details and assignment of Cramer classes. From the final COSMOS TTC dataset, human exposure thresholds of 42 and 7.9 μg/kg-bw/day were derived for Cramer Classes I and III, respectively. The size of Cramer Class II was insufficient for derivation of a TTC value. The COSMOS TTC dataset was then federated with the dataset of Munro and colleagues, previously published in 1996, after updating the latter using the quality control processes for this project. This federated dataset expands the chemical space and provides more robust thresholds. The 966 substances in the federated database comprise 245, 49 and 672 chemicals in Cramer Classes I, II and III, respectively. The corresponding TTC values of 46, 6.2 and 2.3 μg/kg-bw/day are broadly similar to those of the original Munro dataset. Copyright © 2017 The Authors. Published by Elsevier Ltd.. All rights reserved.
Quantifying the tibiofemoral joint space using x-ray tomosynthesis.
Kalinosky, Benjamin; Sabol, John M; Piacsek, Kelly; Heckel, Beth; Gilat Schmidt, Taly
2011-12-01
Digital x-ray tomosynthesis (DTS) has the potential to provide 3D information about the knee joint in a load-bearing posture, which may improve diagnosis and monitoring of knee osteoarthritis compared with projection radiography, the current standard of care. Manually quantifying and visualizing the joint space width (JSW) from 3D tomosynthesis datasets may be challenging. This work developed a semiautomated algorithm for quantifying the 3D tibiofemoral JSW from reconstructed DTS images. The algorithm was validated through anthropomorphic phantom experiments and applied to three clinical datasets. A user-selected volume of interest within the reconstructed DTS volume was enhanced with 1D multiscale gradient kernels. The edge-enhanced volumes were divided by polarity into tibial and femoral edge maps and combined across kernel scales. A 2D connected components algorithm was performed to determine candidate tibial and femoral edges. A 2D joint space width map (JSW) was constructed to represent the 3D tibiofemoral joint space. To quantify the algorithm accuracy, an adjustable knee phantom was constructed, and eleven posterior-anterior (PA) and lateral DTS scans were acquired with the medial minimum JSW of the phantom set to 0-5 mm in 0.5 mm increments (VolumeRad™, GE Healthcare, Chalfont St. Giles, United Kingdom). The accuracy of the algorithm was quantified by comparing the minimum JSW in a region of interest in the medial compartment of the JSW map to the measured phantom setting for each trial. In addition, the algorithm was applied to DTS scans of a static knee phantom and the JSW map compared to values estimated from a manually segmented computed tomography (CT) dataset. The algorithm was also applied to three clinical DTS datasets of osteoarthritic patients. The algorithm segmented the JSW and generated a JSW map for all phantom and clinical datasets. For the adjustable phantom, the estimated minimum JSW values were plotted against the measured values for all trials. A linear fit estimated a slope of 0.887 (R² = 0.962) and a mean error across all trials of 0.34 mm for the PA phantom data. The estimated minimum JSW values for the lateral adjustable phantom acquisitions were found to have low correlation to the measured values (R² = 0.377), with a mean error of 2.13 mm. The error in the lateral adjustable-phantom datasets appeared to be caused by artifacts due to unrealistic features in the phantom bones. JSW maps generated by DTS and CT varied by a mean of 0.6 mm and 0.8 mm across the knee joint, for PA and lateral scans. The tibial and femoral edges were successfully segmented and JSW maps determined for PA and lateral clinical DTS datasets. A semiautomated method is presented for quantifying the 3D joint space in a 2D JSW map using tomosynthesis images. The proposed algorithm quantified the JSW across the knee joint to sub-millimeter accuracy for PA tomosynthesis acquisitions. Overall, the results suggest that x-ray tomosynthesis may be beneficial for diagnosing and monitoring disease progression or treatment of osteoarthritis by providing quantitative images of JSW in the load-bearing knee.
On sample size and different interpretations of snow stability datasets
NASA Astrophysics Data System (ADS)
Schirmer, M.; Mitterer, C.; Schweizer, J.
2009-04-01
Interpretations of snow stability variations need an assessment of the stability itself, independent of the scale investigated in the study. Studies on stability variations at a regional scale have often chosen stability tests such as the Rutschblock test, or combinations of various tests, in order to detect differences in aspect and elevation. The question arises: how capable are such stability interpretations in drawing conclusions? There are at least three possible error sources: (i) the variance of the stability test itself; (ii) the stability variance at an underlying slope scale; and (iii) the possibility that the stability interpretation is not directly related to the probability of skier triggering. Various stability interpretations have been proposed in the past that provide partly different results. We compared a subjective one based on expert knowledge with a more objective one based on a measure derived from comparing skier-triggered slopes vs. slopes that have been skied but not triggered. In this study, the uncertainties are discussed and their effects on regional-scale stability variations are quantified in a pragmatic way. An existing dataset with very large sample sizes was revisited. This dataset contained the variance of stability at a regional scale for several situations. The stability in this dataset was determined using the subjective interpretation scheme based on expert knowledge. The question to be answered was how many measurements are needed to obtain results similar to those from the complete dataset (mainly stability differences in aspect or elevation). The optimal sample size was obtained in several ways: (i) assuming a nominal data scale, the sample size was determined for a given test, significance level and power, calculating the mean and standard deviation of the complete dataset; with this method it can also be determined whether the complete dataset itself constitutes an appropriate sample size. (ii) Smaller subsets were created with aspect distributions similar to the large dataset, using 100 different subsets for each sample size. Statistical variations obtained in the complete dataset were also tested on the smaller subsets using the Mann-Whitney or the Kruskal-Wallis test, and for each subset size the number of subsets in which the significance level was reached was counted; for these tests no nominal data scale was assumed. (iii) For the same subsets described above, the distribution of the aspect median was determined, and it was counted how often this distribution differed substantially from the distribution obtained with the complete dataset. Since two valid stability interpretations were available (an objective and a subjective interpretation, as described above), the effect of the arbitrary choice of interpretation on spatial variability results was also tested. In over one third of the cases the two interpretations came to different results. The effect of these differences was studied with a method similar to that described in (iii): the distribution of the aspect median was determined for subsets of the complete dataset using both interpretations, and compared against each other as well as to the results of the complete dataset. For the complete dataset the two interpretations showed mainly identical results; therefore the subset size was determined from the point at which the results of the two interpretations converged.
A universal result for the optimal subset size cannot be presented, since results differed between the different situations contained in the dataset. The optimal subset size thus depends on the stability variation in a given situation, which is unknown initially. There are indications that for some situations even the complete dataset might not be large enough. At a subset size of approximately 25, the significant differences between aspect groups (as determined using the whole dataset) were only obtained in one out of five situations. In some situations, up to 20% of the subsets showed a substantially different distribution of the aspect median. Thus, in most cases, 25 measurements (which can be achieved by six two-person teams in one day) did not allow reliable conclusions to be drawn.
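The subsampling logic of approach (ii) can be sketched as follows: draw many random subsets of a given size from a large dataset, apply the Mann-Whitney test between two aspect groups, and count how often the difference found in the complete dataset is still detected. The stability scores below are synthetic ordinal values invented for illustration, not the Swiss operational dataset.

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(42)

# Synthetic ordinal stability scores (1 = very poor ... 5 = good) for two
# aspect groups of a fictitious "complete" regional dataset.
north = rng.choice([1, 2, 3, 4, 5], size=400, p=[0.15, 0.30, 0.30, 0.18, 0.07])
south = rng.choice([1, 2, 3, 4, 5], size=400, p=[0.05, 0.15, 0.30, 0.30, 0.20])

def detection_rate(subset_size, n_subsets=100, alpha=0.05):
    """Fraction of random subsets in which the aspect difference seen in the
    complete dataset remains significant (two-sided Mann-Whitney test)."""
    hits = 0
    for _ in range(n_subsets):
        a = rng.choice(north, subset_size, replace=False)
        b = rng.choice(south, subset_size, replace=False)
        if mannwhitneyu(a, b, alternative="two-sided").pvalue < alpha:
            hits += 1
    return hits / n_subsets

for size in (10, 25, 50, 100):
    print(f"n per aspect = {size:3d}: detected in {detection_rate(size):.0%} of subsets")
```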
NASA Astrophysics Data System (ADS)
Thompson, S. E.; Levy, M. C.
2016-12-01
Quantifying regional water cycle changes resulting from the physical transformation of the Earth's surface is essential for water security. Although hydrology has a rich legacy of "paired basin" experiments that identify water cycle responses to imposed land use or land cover change, (i) there is a deficit of such studies across many representative biomes worldwide, including the tropics, and (ii) the paired basins generally do not provide a representative sample of regional river systems in a way that can inform policy. Larger-sample empirical analyses are needed for such policy-relevant understanding - and these analyses must be supported by regional data. Northern Brazil is a global agricultural and biodiversity center, where regional climate and hydrology are projected (through modeling) to have strong sensitivities to land cover change. Dramatic land cover change has occurred, and continues to occur, in this region. We used a causal statistical analysis framework to explore the effects of deforestation and land cover conversion on regional hydrology. Firstly, we used a comparative approach to address the 'data selection uncertainty' problem associated with rainfall datasets covering this sparsely monitored region. We compared 9 remotely-sensed (RS) and in-situ (IS) rainfall datasets, demonstrating that rainfall characterization and trends were sensitive to the selected data sources and identifying which of these datasets had the strongest fidelity to independently measured streamflow occurrence. Next, we employed a "differences-in-differences" regression technique to evaluate the effects of land use change on the quantiles of the flow duration curve between populations of basins experiencing different levels of land conversion. Regionally, controlling for climate and other variables, deforestation significantly increased flow in the lowest third of the flow duration curve. Addressing this problem required harmonizing 9 separate spatial datasets (in addition to the 9 rainfall datasets originally considered), and relied extensively on the use of newly developed data acquisition and analysis platforms such as Google Earth Engine and Columbia IRI/LDEO. The datasets developed in this project have been made discoverable through collaboration with CUAHSI.
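A difference-in-differences regression of the kind mentioned above estimates the treatment effect as the coefficient on the treated-by-period interaction. The sketch below fits such a model with statsmodels on a fabricated two-period basin panel (variable names such as q90 and deforested are hypothetical placeholders, and the covariates used in the actual study are omitted).

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n_basins = 80

# Synthetic two-period panel: half the basins are "deforested" after period 0.
basins = pd.DataFrame({"basin": range(n_basins),
                       "deforested": rng.integers(0, 2, n_basins)})
panel = basins.merge(pd.DataFrame({"post": [0, 1]}), how="cross")

# Fabricated low-flow quantile (e.g., Q90) with a +0.5 effect of deforestation
# appearing only in the post period, plus noise.
panel["q90"] = (3.0 + 0.2 * panel["post"] + 0.1 * panel["deforested"]
                + 0.5 * panel["deforested"] * panel["post"]
                + rng.normal(0, 0.4, len(panel)))

# The difference-in-differences estimate is the interaction coefficient;
# standard errors are clustered by basin.
fit = smf.ols("q90 ~ deforested * post", data=panel).fit(
    cov_type="cluster", cov_kwds={"groups": panel["basin"]})
print(fit.params["deforested:post"], fit.pvalues["deforested:post"])
```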
Atkinson, Jonathan A; Lobet, Guillaume; Noll, Manuel; Meyer, Patrick E; Griffiths, Marcus; Wells, Darren M
2017-10-01
Genetic analyses of plant root systems require large datasets of extracted architectural traits. To quantify such traits from images of root systems, researchers often have to choose between automated tools (that are prone to error and extract only a limited number of architectural traits) or semi-automated ones (that are highly time consuming). We trained a Random Forest algorithm to infer architectural traits from automatically extracted image descriptors. The training was performed on a subset of the dataset, then applied to its entirety. This strategy allowed us to (i) decrease the image analysis time by 73% and (ii) extract meaningful architectural traits based on image descriptors. We also show that these traits are sufficient to identify the quantitative trait loci that had previously been discovered using a semi-automated method. We have shown that combining semi-automated image analysis with machine learning algorithms has the power to increase the throughput of large-scale root studies. We expect that such an approach will enable the quantification of more complex root systems for genetic studies. We also believe that our approach could be extended to other areas of plant phenotyping. © The Authors 2017. Published by Oxford University Press.
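The train-on-a-subset, apply-to-the-rest strategy described above is straightforward with scikit-learn. The sketch below uses synthetic image descriptors and a single invented trait as stand-ins; it illustrates the workflow, not the authors' descriptor set or trait definitions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(3)

# Synthetic stand-ins: 2000 root-system images, 50 automatically extracted
# descriptors each, and one architectural trait (e.g., total root length).
descriptors = rng.normal(size=(2000, 50))
trait = descriptors[:, :5].sum(axis=1) + rng.normal(0, 0.5, 2000)

# Train only on the manually annotated subset, then apply to everything else.
n_annotated = 400
model = RandomForestRegressor(n_estimators=500, random_state=0)
model.fit(descriptors[:n_annotated], trait[:n_annotated])
predicted = model.predict(descriptors[n_annotated:])
print("hold-out R2:", round(r2_score(trait[n_annotated:], predicted), 2))
```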
Quantification and Visualization of Variation in Anatomical Trees
DOE Office of Scientific and Technical Information (OSTI.GOV)
Amenta, Nina; Datar, Manasi; Dirksen, Asger
This paper presents two approaches to quantifying and visualizing variation in datasets of trees. The first approach localizes subtrees in which significant population differences are found through hypothesis testing and sparse classifiers on subtree features. The second approach visualizes the global metric structure of datasets through low-distortion embedding into hyperbolic planes in the style of multidimensional scaling. A case study is made on a dataset of airway trees in relation to Chronic Obstructive Pulmonary Disease.
Policy makers need to understand how land cover change alters storm water regimes, yet existing methods do not fully utilize newly available datasets to quantify storm water changes at a landscape-scale. Here, we use high-resolution, remotely-sensed land cover, imperviousness, an...
Using aerial images for establishing a workflow for the quantification of water management measures
NASA Astrophysics Data System (ADS)
Leuschner, Annette; Merz, Christoph; van Gasselt, Stephan; Steidl, Jörg
2017-04-01
Quantified landscape characteristics, such as morphology, land use or hydrological conditions, play an important role in hydrological investigations, as landscape parameters directly control the overall water balance. A powerful assimilation and geospatial analysis of remote sensing datasets, in combination with hydrological modeling, allows landscape parameters and water balances to be quantified efficiently. This study focuses on the development of a workflow to extract hydrologically relevant data from aerial image datasets and derived products in order to allow an effective parametrization of a hydrological model. Consistent and self-contained data sources are indispensable for achieving reasonable modeling results. In order to minimize uncertainties and inconsistencies, input parameters for modeling should, where possible, be extracted mainly from one remote-sensing dataset. Here, aerial images were chosen because their high spatial and spectral resolution permits the extraction of various model-relevant parameters, such as morphology, land use or artificial drainage systems. The methodological repertoire used to extract environmental parameters ranges from analyses of digital terrain models, to multispectral classification and segmentation of land use distribution maps, to mapping of artificial drainage systems based on spectral and visual inspection. The workflow has been tested for a mesoscale catchment that forms a characteristic hydrological system of a young moraine landscape located in the state of Brandenburg, Germany. These datasets were used as input for multi-temporal hydrological modelling of water balances to detect and quantify anthropogenic and meteorological impacts. ArcSWAT, a GIS-implemented extension and graphical user input interface for the Soil and Water Assessment Tool (SWAT), was chosen. The results of this modeling approach provide the basis for anticipating future development of the hydrological system and for adapting water resource management decisions to system changes.
Human movement data for malaria control and elimination strategic planning.
Pindolia, Deepa K; Garcia, Andres J; Wesolowski, Amy; Smith, David L; Buckee, Caroline O; Noor, Abdisalan M; Snow, Robert W; Tatem, Andrew J
2012-06-18
Recent increases in funding for malaria control have led to the reduction in transmission in many malaria endemic countries, prompting the national control programmes of 36 malaria endemic countries to set elimination targets. Accounting for human population movement (HPM) in planning for control, elimination and post-elimination surveillance is important, as evidenced by previous elimination attempts that were undermined by the reintroduction of malaria through HPM. Strategic control and elimination planning, therefore, requires quantitative information on HPM patterns and the translation of these into parasite dispersion. HPM patterns and the risk of malaria vary substantially across spatial and temporal scales, demographic and socioeconomic sub-groups, and motivation for travel, so multiple data sets are likely required for quantification of movement. While existing studies based on mobile phone call record data combined with malaria transmission maps have begun to address within-country HPM patterns, other aspects remain poorly quantified despite their importance in accurately gauging malaria movement patterns and building control and detection strategies, such as cross-border HPM, demographic and socioeconomic stratification of HPM patterns, forms of transport, personal malaria protection and other factors that modify malaria risk. A wealth of data exist to aid filling these gaps, which, when combined with spatial data on transport infrastructure, traffic and malaria transmission, can answer relevant questions to guide strategic planning. This review aims to (i) discuss relevant types of HPM across spatial and temporal scales, (ii) document where datasets exist to quantify HPM, (iii) highlight where data gaps remain and (iv) briefly put forward methods for integrating these datasets in a Geographic Information System (GIS) framework for analysing and modelling human population and Plasmodium falciparum malaria infection movements.
2013-01-01
Background Microbial ecologists often employ methods from classical community ecology to analyze microbial community diversity. However, these methods have limitations because microbial communities differ from macro-organismal communities in key ways. This study sought to quantify microbial diversity using methods that are better suited for data spanning multiple domains of life and dimensions of diversity. Diversity profiles are one novel, promising way to analyze microbial datasets. Diversity profiles encompass many other indices, provide effective numbers of diversity (mathematical generalizations of previous indices that better convey the magnitude of differences in diversity), and can incorporate taxa similarity information. To explore whether these profiles change interpretations of microbial datasets, diversity profiles were calculated for four microbial datasets from different environments spanning all domains of life as well as viruses. Both similarity-based profiles that incorporated phylogenetic relatedness and naïve (not similarity-based) profiles were calculated. Simulated datasets were used to examine the robustness of diversity profiles to varying phylogenetic topology and community composition. Results Diversity profiles provided insights into microbial datasets that were not detectable with classical univariate diversity metrics. For all datasets analyzed, there were key distinctions between calculations that incorporated phylogenetic diversity as a measure of taxa similarity and naïve calculations. The profiles also provided information about the effects of rare species on diversity calculations. Additionally, diversity profiles were used to examine thousands of simulated microbial communities, showing that similarity-based and naïve diversity profiles only agreed approximately 50% of the time in their classification of which sample was most diverse. This is a strong argument for incorporating similarity information and calculating diversity with a range of emphases on rare and abundant species when quantifying microbial community diversity. Conclusions For many datasets, diversity profiles provided a different view of microbial community diversity compared to analyses that did not take into account taxa similarity information, effective diversity, or multiple diversity metrics. These findings are a valuable contribution to data analysis methodology in microbial ecology. PMID:24238386
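A naïve (non-similarity-based) diversity profile is simply the effective number of taxa, the Hill number ^qD = (Σ p_i^q)^(1/(1-q)), evaluated over a range of the order q, which shifts the emphasis from rare taxa (small q) to dominant taxa (large q). The sketch below computes such a profile for two hypothetical OTU tables; the similarity-based version that incorporates phylogenetic relatedness is not shown.

```python
import numpy as np

def hill_number(abundances, q):
    """Effective number of taxa of order q (one point of a naive diversity
    profile). q = 0 is richness; large q emphasises dominant taxa."""
    p = np.asarray(abundances, float)
    p = p[p > 0] / p.sum()
    if np.isclose(q, 1.0):                    # limit case: exp(Shannon entropy)
        return float(np.exp(-np.sum(p * np.log(p))))
    return float(np.sum(p ** q) ** (1.0 / (1.0 - q)))

# Two hypothetical OTU tables with equal richness but different evenness.
even   = [25, 25, 25, 25]
skewed = [85, 10, 3, 2]
for q in (0, 0.5, 1, 2, 4):
    print(f"q={q}: even={hill_number(even, q):.2f}  skewed={hill_number(skewed, q):.2f}")
```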
Investment strategies used as spectroscopy of financial markets reveal new stylized facts.
Zhou, Wei-Xing; Mu, Guo-Hua; Chen, Wei; Sornette, Didier
2011-01-01
We propose a new set of stylized facts quantifying the structure of financial markets. The key idea is to study the combined structure of both investment strategies and prices in order to open a qualitatively new level of understanding of financial and economic markets. We study the detailed order flow on the Shenzhen Stock Exchange of China for the whole year of 2003. This enormous dataset allows us to compare (i) a closed national market (A-shares) with an international market (B-shares), (ii) individuals and institutions, and (iii) real traders to strategies that are random with respect to timing but otherwise share all other characteristics. We find in general that more trading results in smaller net return due to trading frictions, with the exception that the net return is independent of the trading frequency for A-share individual traders. We unveiled quantitative power laws with non-trivial exponents that quantify the deterioration of performance with the frequency and with the holding period of the strategies used by traders. Random strategies are found to perform much better than real ones, both for winners and losers. Surprisingly large arbitrage opportunities exist, especially when using zero-intelligence strategies. This is a diagnostic of possible inefficiencies of these financial markets.
Federal Register 2010, 2011, 2012, 2013, 2014
2010-06-09
... of the Data Workshop is a data report which compiles and evaluates potential datasets and recommends which datasets are appropriate for assessment analyses. The product of the Assessment Process is a stock... webinars II through IV: Using datasets recommended from the Data Workshop, participants will employ...
Tills, Oliver; Bitterli, Tabitha; Culverhouse, Phil; Spicer, John I; Rundle, Simon
2013-02-01
Motion analysis is one of the tools available to biologists to extract biologically relevant information from image datasets and has been applied to a diverse range of organisms. The application of motion analysis during early development presents a challenge, as embryos often exhibit complex, subtle and diverse movement patterns. A method of motion analysis able to holistically quantify complex embryonic movements could be a powerful tool for fields such as toxicology and developmental biology to investigate whole organism stress responses. Here we assessed whether motion analysis could be used to distinguish the effects of stressors on early developmental stages of each of three species: (i) the zebrafish Danio rerio (stages 19 h, 21.5 h and 33 h exposed to 1.5% ethanol and a salinity of 5); (ii) the African clawed toad Xenopus laevis (stages 24, 32 and 34 exposed to a salinity of 20); and (iii) the pond snail Radix balthica (stages E3, E4, E6, E9 and E11 exposed to salinities of 5, 10 and 15). Image sequences were analysed using Sparse Optic Flow and the resultant frame-to-frame motion parameters were analysed using a Discrete Fourier Transform to quantify the distribution of energy at different frequencies. This spectral frequency dataset was then used to construct a Bray-Curtis similarity matrix, and differences in movement patterns between embryos in this matrix were tested for using ANOSIM. Spectral frequency analysis of these motion parameters was able to distinguish stage-specific effects of environmental stressors in most cases, including Xenopus laevis at stages 24, 32 and 34 exposed to a salinity of 20, Danio rerio at 33 hpf exposed to 1.5% ethanol, and Radix balthica at stages E4, E9 and E11 exposed to salinities of 5, 10 and 15. This technique was better able to distinguish embryos exposed to stressors than analyses based on manual quantification of movement, and within species it distinguished most of the developmental stages studied in the control treatments. This innovative use of motion analysis incorporates data quantifying embryonic movements at a range of frequencies and so provides a holistic analysis of an embryo's movement patterns. This technique has potential applications for quantifying embryonic responses to environmental stressors such as exposure to pharmaceuticals or pollutants, and also as an automated tool for developmental staging of embryos.
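The downstream analysis, transforming a frame-to-frame motion signal into a spectral energy profile and comparing embryos with Bray-Curtis dissimilarity, can be sketched as follows. The motion signals here are synthetic sinusoidal bursts standing in for sparse-optic-flow output, and the "control"/"stressed" labels are purely illustrative assumptions.

```python
import numpy as np
from scipy.spatial.distance import braycurtis

fps, seconds = 10, 60
t = np.arange(fps * seconds) / fps
rng = np.random.default_rng(5)

def spectral_profile(motion):
    """Normalised power at each positive frequency of a motion signal."""
    power = np.abs(np.fft.rfft(motion - motion.mean())) ** 2
    return power / power.sum()                 # normalise so profiles are comparable

# Synthetic embryos: a control with slow, regular movement bursts and a
# "stressed" one with faster, weaker bursts plus noise.
control  = np.abs(np.sin(2 * np.pi * 0.2 * t)) + rng.normal(0, 0.05, t.size)
stressed = 0.4 * np.abs(np.sin(2 * np.pi * 1.5 * t)) + rng.normal(0, 0.05, t.size)

d = braycurtis(spectral_profile(control), spectral_profile(stressed))
print(f"Bray-Curtis dissimilarity between spectral profiles: {d:.2f}")
```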
Quantifying surface albedo and other direct biogeophysical climate forcings of forestry activities.
Bright, Ryan M; Zhao, Kaiguang; Jackson, Robert B; Cherubini, Francesco
2015-09-01
By altering fluxes of heat, momentum, and moisture exchanges between the land surface and atmosphere, forestry and other land-use activities affect climate. Although long recognized scientifically as being important, these so-called biogeophysical forcings are rarely included in climate policies for forestry and other land management projects due to the many challenges associated with their quantification. Here, we review the scientific literature in the fields of atmospheric science and terrestrial ecology in light of three main objectives: (i) to elucidate the challenges associated with quantifying biogeophysical climate forcings connected to land use and land management, with a focus on the forestry sector; (ii) to identify and describe scientific approaches and/or metrics facilitating the quantification and interpretation of direct biogeophysical climate forcings; and (iii) to identify and recommend research priorities that can help overcome the challenges of their attribution to specific land-use activities, bridging the knowledge gap between the climate modeling, forest ecology, and resource management communities. We find that ignoring surface biogeophysics may mislead climate mitigation policies, yet existing metrics are unlikely to be sufficient. Successful metrics ought to (i) include both radiative and nonradiative climate forcings; (ii) reconcile disparities between biogeophysical and biogeochemical forcings, and (iii) acknowledge trade-offs between global and local climate benefits. We call for more coordinated research among terrestrial ecologists, resource managers, and coupled climate modelers to harmonize datasets, refine analytical techniques, and corroborate and validate metrics that are more amenable to analyses at the scale of an individual site or region. © 2015 John Wiley & Sons Ltd.
Adaptive ingredients against food spoilage in Japanese cuisine.
Ohtsubo, Yohsuke
2009-12-01
Billing and Sherman proposed the antimicrobial hypothesis to explain the worldwide spice use pattern. The present study explored whether two antimicrobial ingredients (i.e. spices and vinegar) are used in ways consistent with the antimicrobial hypothesis. Four specific predictions were tested: meat-based recipes would call for more spices/vinegar than vegetable-based recipes; summer recipes would call for more spices/vinegar than winter recipes; recipes in hotter regions would call for more spices/vinegar; and recipes including unheated ingredients would call for more spices/vinegar. Spice/vinegar use patterns were compiled from two types of traditional Japanese cookbooks. Dataset I included recipes provided by elderly Japanese housewives. Dataset II included recipes provided by experts in traditional Japanese foods. The analyses of Dataset I revealed that the vinegar use pattern conformed to the predictions. In contrast, analyses of Dataset II generally supported the predictions in terms of spices, but not vinegar.
Establishing a threshold for the number of missing days using 7 d pedometer data.
Kang, Minsoo; Hart, Peter D; Kim, Youngdeok
2012-11-01
The purpose of this study was to examine the threshold for the number of missing days that can be recovered using the individual information (II)-centered approach. Data for this study came from 86 participants, aged 17 to 79 years, who had 7 consecutive days of complete pedometer (Yamax SW 200) wear. Missing datasets (1 d through 5 d missing) were created by a SAS random process 10,000 times each. All missing values were replaced using the II-centered approach. A 7 d average was calculated for each dataset, including the complete dataset. Repeated measures ANOVA was used to determine the differences between the 1 d through 5 d missing datasets and the complete dataset. Mean absolute percentage error (MAPE) was also computed. Mean (SD) daily step count for the complete 7 d dataset was 7979 (3084). Mean (SD) values for the 1 d through 5 d missing datasets were 8072 (3218), 8066 (3109), 7968 (3273), 7741 (3050) and 8314 (3529), respectively (p > 0.05). The lower MAPEs were estimated for 1 d missing (5.2%, 95% confidence interval (CI) 4.4-6.0) and 2 d missing (8.4%, 95% CI 7.0-9.8), while all others were greater than 10%. The results of this study show that the 1 d through 5 d missing datasets, with replaced values, were not significantly different from the complete dataset. Based on the MAPE results, it is not recommended to replace more than two days of missing step counts.
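As a minimal sketch of the MAPE comparison described above, the snippet below computes the mean absolute percentage error between 7 d averages from complete and imputed step-count data; the function and example numbers are illustrative and are not the study's SAS procedure.

```python
import numpy as np

def mape(complete_means, imputed_means):
    """Mean absolute percentage error between 7-day averages computed
    from complete and imputed step-count datasets."""
    complete_means = np.asarray(complete_means, dtype=float)
    imputed_means = np.asarray(imputed_means, dtype=float)
    return 100.0 * np.mean(np.abs(imputed_means - complete_means) / complete_means)

# Hypothetical example: 7-day averages for a few participants
complete = [7979, 8500, 6200]   # from complete 7-day data
imputed = [8072, 8310, 6450]    # after replacing missing days
print(f"MAPE = {mape(complete, imputed):.1f}%")
```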
NASA Astrophysics Data System (ADS)
Tamminen, J.; Sofieva, V.; Kyrölä, E.; Laine, M.; Degenstein, D. A.; Bourassa, A. E.; Roth, C.; Zawada, D.; Weber, M.; Rozanov, A.; Rahpoe, N.; Stiller, G. P.; Laeng, A.; von Clarmann, T.; Walker, K. A.; Sheese, P.; Hubert, D.; Van Roozendael, M.; Zehner, C.; Damadeo, R. P.; Zawodny, J. M.; Kramarova, N. A.; Bhartia, P. K.
2017-12-01
We present a merged dataset of ozone profiles from several satellite instruments: SAGE II on ERBS, GOMOS, SCIAMACHY and MIPAS on Envisat, OSIRIS on Odin, ACE-FTS on SCISAT, and OMPS on Suomi-NPP. The merged dataset is created in the framework of the European Space Agency Climate Change Initiative (Ozone_cci) with the aim of analyzing stratospheric ozone trends. For the merged dataset, we used the latest versions of the original ozone datasets. The datasets from the individual instruments have been extensively validated and inter-compared; only those datasets that are in good agreement and do not exhibit significant drifts with respect to collocated ground-based observations and with respect to each other are used for merging. The long-term SAGE-CCI-OMPS dataset is created by computing and merging deseasonalized anomalies from the individual instruments. The merged SAGE-CCI-OMPS dataset consists of deseasonalized anomalies of ozone in 10° latitude bands from 90°S to 90°N and from 10 to 50 km in steps of 1 km, covering the period from October 1984 to July 2016. This newly created dataset is used for evaluating ozone trends in the stratosphere through multiple linear regression. Negative ozone trends in the upper stratosphere are observed before 1997 and positive trends are found after 1997. The trends are statistically significant at mid-latitudes in the upper stratosphere and indicate ozone recovery, as expected from the decrease of stratospheric halogens that started in the middle of the 1990s.
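As a rough illustration of how deseasonalized anomalies of the kind merged here can be computed, the sketch below removes the mean seasonal cycle from a hypothetical monthly ozone series and fits a simple linear trend; the array shapes and values are invented and the Ozone_cci merging and regression scheme is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical monthly ozone series (ppmv) for one latitude band and altitude level,
# 32 full years with a superimposed seasonal cycle and noise.
n_years = 32
ozone = (6.0 + 0.5 * np.tile(np.sin(2 * np.pi * np.arange(12) / 12), n_years)
         + rng.normal(0, 0.1, n_years * 12))

monthly = ozone.reshape(n_years, 12)
seasonal_cycle = monthly.mean(axis=0)            # mean value for each calendar month
anomalies = (monthly - seasonal_cycle).ravel()   # deseasonalized anomalies

# A simple linear trend fit to the anomalies, as a stand-in for the
# multiple linear regression used in the study.
t = np.arange(anomalies.size) / 12.0             # time in years
slope, intercept = np.polyfit(t, anomalies, 1)
print(f"trend: {10 * slope:.3f} ppmv per decade")
```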
Designing of interferon-gamma inducing MHC class-II binders
2013-01-01
Background The generation of interferon-gamma (IFN-γ) by MHC class II activated CD4+ T helper cells makes a substantial contribution to the control of infections such as those caused by Mycobacterium tuberculosis. In the past, numerous methods have been developed for predicting MHC class II binders that can activate T-helper cells. To the best of the authors' knowledge, however, no method has been developed so far that can predict which type of cytokine will be secreted by these MHC class II binders or T-helper epitopes. In this study, an attempt has been made to predict IFN-γ inducing peptides. The main dataset used in this study contains 3705 IFN-γ inducing and 6728 non-IFN-γ inducing MHC class II binders. Another dataset, called IFNgOnly, contains 4483 IFN-γ inducing epitopes and 2160 epitopes that induce cytokines other than IFN-γ. In addition, we have an alternate dataset that contains IFN-γ inducing peptides and an equal number of random peptides. Results It was observed that peptide length, positional conservation of residues and amino acid composition affect the IFN-γ inducing capabilities of these peptides. We identified the motifs in IFN-γ inducing binders/peptides using the MERCI software. Our analysis indicates that IFN-γ inducing and non-inducing peptides can be discriminated using the above features. We developed models for predicting IFN-γ inducing peptides using various approaches, including machine learning techniques, motif-based search, and a hybrid approach. Our best model, based on the hybrid approach, achieved a maximum prediction accuracy of 82.10% with an MCC of 0.62 on the main dataset. We also developed a hybrid model on the IFNgOnly dataset and achieved a maximum accuracy of 81.39% with an MCC of 0.57. Conclusion Based on this study, we have developed a webserver for (i) predicting IFN-γ inducing peptides, (ii) virtual screening of peptide libraries and (iii) identification of IFN-γ inducing regions in antigens (http://crdd.osdd.net/raghava/ifnepitope/). Reviewers This article was reviewed by Prof Kurt Blaser, Prof Laurence Eisenlohr and Dr Manabu Sugai. PMID:24304645
Control of Methane Production and Exchange in Northern Peatlands
NASA Technical Reports Server (NTRS)
Crill, Patrick
1997-01-01
This proposal has successfully supported studies that have developed unique long-term datasets of methane (CH4) emissions and carbon dioxide (CO2) exchange in order to quantify the controls on CH4 production and exchange, especially the linkages to the carbon cycle in northern peatlands. The primary research site has been a small fen in southeastern New Hampshire where a unique multi-year data baseline of CH4 flux measurements was begun (with NASA funding) in 1989. The fen has also been instrumented for continuous hydrological and meteorological observations and year-round porewater sampling. Multiyear datasets of methane flux are very valuable and very rare. Datasets using the same sampling techniques at the same sites are the only way to assess the effect of the integrated ecosystem response to climatological variability. The research has had two basic objectives: 1. To quantify the effect of seasonal and interannual variability on CH4 flux. 2. To examine process level controls on methane dynamics.
Characterizing Time Series Data Diversity for Wind Forecasting: Preprint
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hodge, Brian S; Chartan, Erol Kevin; Feng, Cong
Wind forecasting plays an important role in integrating variable and uncertain wind power into the power grid. Various forecasting models have been developed to improve the forecasting accuracy. However, it is challenging to accurately compare the true forecasting performances from different methods and forecasters due to the lack of diversity in forecasting test datasets. This paper proposes a time series characteristic analysis approach to visualize and quantify wind time series diversity. The developed method first calculates six time series characteristic indices from various perspectives. Then the principal component analysis is performed to reduce the data dimension while preserving the important information. The diversity of the time series dataset is visualized by the geometric distribution of the newly constructed principal component space. The volume of the 3-dimensional (3D) convex polytope (or the length of the 1D number axis, or the area of the 2D convex polygon) is used to quantify the time series data diversity. The method is tested with five datasets with various degrees of diversity.
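A rough sketch of the general workflow described above (characteristic indices, PCA, then convex-hull volume as the diversity measure) follows; the indices computed here are placeholders rather than the six indices used in the preprint, and scipy/scikit-learn availability is assumed.

```python
import numpy as np
from scipy.spatial import ConvexHull
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

def characteristic_indices(series):
    """Placeholder characteristic indices for one wind time series
    (the preprint's six indices are not reproduced here)."""
    diffs = np.diff(series)
    return np.array([
        series.mean(), series.std(),
        diffs.std(),                                  # variability of ramps
        np.abs(diffs).max(),                          # largest ramp
        np.corrcoef(series[:-1], series[1:])[0, 1],   # lag-1 autocorrelation
    ])

# Hypothetical dataset: 50 time series of 1000 points each
features = np.array([characteristic_indices(rng.normal(size=1000).cumsum())
                     for _ in range(50)])

# Reduce to 3 principal components and quantify diversity as hull volume
pcs = PCA(n_components=3).fit_transform(features)
print("diversity (3D convex-hull volume):", ConvexHull(pcs).volume)
```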
DoE Phase II SBIR: Spectrally-Assisted Vehicle Tracking
DOE Office of Scientific and Technical Information (OSTI.GOV)
Villeneuve, Pierre V.
2013-02-28
The goal of this Phase II SBIR is to develop a prototype software package to demonstrate spectrally-aided vehicle tracking performance. The primary application is to demonstrate improved target vehicle tracking performance in complex environments where traditional spatial tracker systems may show reduced performance. Example scenarios in Figure 1 include (a) the target vehicle obscured by a large structure for an extended period of time, or (b) the target engaging in extreme maneuvers amongst other civilian vehicles. The target information derived from spatial processing is unable to differentiate between the green versus the red vehicle. Spectral signature exploitation enables comparison of new candidate targets with existing track signatures. The ambiguity in this confusing scenario is resolved by folding spectral analysis results into the target nomination and association processes. Figure 3 shows a number of example spectral signatures from a variety of natural and man-made materials. The work performed over the two-year effort was divided into three general areas: algorithm refinement, software prototype development, and prototype performance demonstration. The tasks performed under this Phase II to accomplish the program goals were as follows: 1. Acquire relevant vehicle target datasets to support the prototype. 2. Refine algorithms for target spectral feature exploitation. 3. Implement a prototype multi-hypothesis target tracking software package. 4. Demonstrate and quantify tracking performance using relevant data.
Improvements to Passive Acoustic Tracking Methods for Marine Mammal Monitoring
2014-09-30
species of interest in these datasets are sperm whales, beaked whales, minke whales, and humpback whales. Most methods developed will be...datasets, automated detectors for fin and sei whales were developed, implemented and quantified. For the "stereotypical" calls produced by these animals...Objective 4: The matched filter detectors implemented for fin and sei whale calls are sufficient for the purposes of this project, with
A new method for detecting, quantifying and monitoring diffuse contamination
NASA Astrophysics Data System (ADS)
Fabian, Karl; Reimann, Clemens; de Caritat, Patrice
2017-04-01
A new method is presented for detecting and quantifying diffuse contamination at the regional to continental scale. It is based on the analysis of cumulative distribution functions (CDFs) in cumulative probability (CP) plots for spatially representative datasets, preferably containing >1000 samples. Simulations demonstrate how different types of contamination influence elemental CDFs of different sample media. Contrary to common belief, diffuse contamination does not result in exceedingly high element concentrations in regional- to continental-scale datasets. Instead it produces a distinctive shift of concentrations in the background distribution of the studied element, resulting in a steeper data distribution in the CP plot. Via either (1) comparing the distribution of an element in top soil samples to the distribution of the same element in bottom soil samples from the same area, taking soil forming processes into consideration, or (2) comparing the distribution of the contaminating element (e.g., Pb) to that of an element with geochemically comparable behaviour but no contamination source (e.g., Rb or Ba in the case of Pb), the relative impact of diffuse contamination on the element concentration can be estimated either graphically in the CP plot via a best-fit estimate or quantitatively via a Kolmogorov-Smirnov or Cramér-von Mises test. This is demonstrated using continental-scale geochemical soil datasets from Europe, Australia, and the USA, and a regional-scale dataset from Norway. Several different datasets from Europe deliver comparable results at regional to continental scales. The method is also suitable for monitoring diffuse contamination based on the statistical distribution of repeat datasets at the continental scale in a cost-effective manner.
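As a minimal illustration of the distributional comparison described here, the sketch below applies a two-sample Kolmogorov-Smirnov test to hypothetical top- and bottom-soil Pb concentrations; the data are simulated and the graphical CP-plot fitting step is not reproduced.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)

# Hypothetical regional dataset: Pb concentrations (mg/kg) in bottom soil
# (background) and top soil, where diffuse contamination shifts the whole
# background population upward rather than creating extreme outliers.
bottom = rng.lognormal(mean=2.5, sigma=0.6, size=2000)
top = rng.lognormal(mean=2.5, sigma=0.6, size=2000) + 8.0  # uniform shift

stat, p = ks_2samp(top, bottom)
print(f"KS statistic = {stat:.3f}, p-value = {p:.2e}")
# A significant difference in CDF shape (a steeper top-soil distribution in a
# cumulative-probability plot) is consistent with diffuse contamination.
```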
Krüger, Angela V; Jelier, Rob; Dzyubachyk, Oleh; Zimmerman, Timo; Meijering, Erik; Lehner, Ben
2015-02-15
Chromatin regulators are widely expressed proteins with diverse roles in gene expression, nuclear organization, cell cycle regulation, pluripotency, physiology and development, and are frequently mutated in human diseases such as cancer. Their inhibition often results in pleiotropic effects that are difficult to study using conventional approaches. We have developed a semi-automated nuclear tracking algorithm to quantify the divisions, movements and positions of all nuclei during the early development of Caenorhabditis elegans and have used it to systematically study the effects of inhibiting chromatin regulators. The resulting high dimensional datasets revealed that inhibition of multiple regulators, including F55A3.3 (encoding FACT subunit SUPT16H), lin-53 (RBBP4/7), rba-1 (RBBP4/7), set-16 (MLL2/3), hda-1 (HDAC1/2), swsn-7 (ARID2), and let-526 (ARID1A/1B) affected cell cycle progression and caused chromosome segregation defects. In contrast, inhibition of cir-1 (CIR1) accelerated cell division timing in specific cells of the AB lineage. The inhibition of RNA polymerase II also accelerated these division timings, suggesting that normal gene expression is required to delay cell cycle progression in multiple lineages in the early embryo. Quantitative analyses of the dataset suggested the existence of at least two functionally distinct SWI/SNF chromatin remodeling complex activities in the early embryo, and identified a redundant requirement for the egl-27 and lin-40 MTA orthologs in the development of endoderm and mesoderm lineages. Moreover, our dataset also revealed a characteristic rearrangement of chromatin to the nuclear periphery upon the inhibition of multiple general regulators of gene expression. Our systematic, comprehensive and quantitative datasets illustrate the power of single cell-resolution quantitative tracking and high dimensional phenotyping to investigate gene function. Furthermore, the results provide an overview of the functions of essential chromatin regulators during the early development of an animal. Copyright © 2014 Elsevier Inc. All rights reserved.
Atmospheric Science Data Center
2013-07-10
... channel due to uncertainty in the H2O spectroscopy in this spectral band. Updated our estimation of the SAGE II water vapor channel filter location drift, resulting in better agreement with more modern datasets ...
Jiang, Yueyang; Kim, John B.; Still, Christopher J.; Kerns, Becky K.; Kline, Jeffrey D.; Cunningham, Patrick G.
2018-01-01
Statistically downscaled climate data have been widely used to explore possible impacts of climate change in various fields of study. Although many studies have focused on characterizing differences in the downscaling methods, few studies have evaluated actual downscaled datasets being distributed publicly. Spatially focusing on the Pacific Northwest, we compare five statistically downscaled climate datasets distributed publicly in the US: ClimateNA, NASA NEX-DCP30, MACAv2-METDATA, MACAv2-LIVNEH and WorldClim. We compare the downscaled projections of climate change, and the associated observational data used as training data for downscaling. We map and quantify the variability among the datasets and characterize the spatio-temporal patterns of agreement and disagreement among the datasets. Pair-wise comparisons of datasets identify the coast and high-elevation areas as areas of disagreement for temperature. For precipitation, high-elevation areas, rainshadows and the dry, eastern portion of the study area have high dissimilarity among the datasets. By spatially aggregating the variability measures into watersheds, we develop guidance for selecting datasets for climate change impact studies within the Pacific Northwest. PMID:29461513
Jiang, Yueyang; Kim, John B; Still, Christopher J; Kerns, Becky K; Kline, Jeffrey D; Cunningham, Patrick G
2018-02-20
Statistically downscaled climate data have been widely used to explore possible impacts of climate change in various fields of study. Although many studies have focused on characterizing differences in the downscaling methods, few studies have evaluated actual downscaled datasets being distributed publicly. Spatially focusing on the Pacific Northwest, we compare five statistically downscaled climate datasets distributed publicly in the US: ClimateNA, NASA NEX-DCP30, MACAv2-METDATA, MACAv2-LIVNEH and WorldClim. We compare the downscaled projections of climate change, and the associated observational data used as training data for downscaling. We map and quantify the variability among the datasets and characterize the spatio-temporal patterns of agreement and disagreement among the datasets. Pair-wise comparisons of datasets identify the coast and high-elevation areas as areas of disagreement for temperature. For precipitation, high-elevation areas, rainshadows and the dry, eastern portion of the study area have high dissimilarity among the datasets. By spatially aggregating the variability measures into watersheds, we develop guidance for selecting datasets for climate change impact studies within the Pacific Northwest.
Investment Strategies Used as Spectroscopy of Financial Markets Reveal New Stylized Facts
Zhou, Wei-Xing; Mu, Guo-Hua; Chen, Wei; Sornette, Didier
2011-01-01
We propose a new set of stylized facts quantifying the structure of financial markets. The key idea is to study the combined structure of both investment strategies and prices in order to open a qualitatively new level of understanding of financial and economic markets. We study the detailed order flow on the Shenzhen Stock Exchange of China for the whole year of 2003. This enormous dataset allows us to compare (i) a closed national market (A-shares) with an international market (B-shares), (ii) individuals and institutions, and (iii) real traders with random strategies that share all other characteristics but differ in the timing of trades. We find in general that more trading results in smaller net return due to trading frictions, with the exception that the net return is independent of the trading frequency for A-share individual traders. We unveil quantitative power laws with non-trivial exponents that quantify the deterioration of performance with the frequency and with the holding period of the strategies used by traders. Random strategies are found to perform much better than real ones, both for winners and losers. Surprisingly large arbitrage opportunities exist, especially when using zero-intelligence strategies. This is a diagnostic of possible inefficiencies of these financial markets. PMID:21935403
Slowey, Aaron J.; Marvin-DiPasquale, Mark
2012-01-01
Conclusions - Despite their intrinsic variability, Hg/Au electrodes fabricated by hand can be used to quantify O2, S(−II), Fe(II), and Mn(II) without calibrating every electrode for every constituent of interest. The pilot ion method can achieve accuracies to within 20% or less, provided that the underlying principle—the independence of slope ratios—is demonstrated for all voltammetric techniques used, and effects of the physicochemical properties of the system on voltammetric signals are addressed through baseline subtraction.
NASA Technical Reports Server (NTRS)
Bi, Lei; Yang, Ping; Liu, Chao; Yi, Bingqi; Baum, Bryan A.; Van Diedenhoven, Bastiaan; Iwabuchi, Hironobu
2014-01-01
A fundamental problem in remote sensing and radiative transfer simulations involving ice clouds is the ability to compute accurate optical properties for individual ice particles. While relatively simple and intuitively appealing, the conventional geometric-optics method (CGOM) is used frequently for the solution of light scattering by ice crystals. Due to the approximations in the ray-tracing technique, the CGOM accuracy is not well quantified. The result is that uncertainties are introduced that can impact many applications. Improvements in the Invariant Imbedding T-matrix method (II-TM) and the Improved Geometric-Optics Method (IGOM) provide a mechanism to assess the aforementioned uncertainties. The results computed by the II-TM+IGOM are considered as a benchmark because the II-TM solves Maxwell's equations from first principles and is applicable to particle size parameters ranging into the domain at which the IGOM has reasonable accuracy. To assess the uncertainties with the CGOM in remote sensing and radiative transfer simulations, two independent optical property datasets of hexagonal columns are developed for sensitivity studies by using the CGOM and the II-TM+IGOM, respectively. Ice cloud bulk optical properties obtained from the two datasets are compared and subsequently applied to retrieve the optical thickness and effective diameter from Moderate Resolution Imaging Spectroradiometer (MODIS) measurements. Additionally, the bulk optical properties are tested in broadband radiative transfer (RT) simulations using the general circulation model (GCM) version of the Rapid Radiative Transfer Model (RRTMG) that is adopted in the National Center for Atmospheric Research (NCAR) Community Atmosphere Model (CAM, version 5.1). For MODIS retrievals, the mean bias of applying the CGOM in the shortwave bands (0.86 and 2.13 micrometers) can be up to 5% in the optical thickness and as high as 20% in the effective diameter, depending on cloud optical thickness and effective diameter. In the MODIS infrared window bands centered at 8.5, 11, and 12 micrometers, biases in the optical thickness and effective diameter are up to 12% and 10%, respectively. The CGOM-based simulation errors in ice cloud radiative forcing calculations are on the order of 10 W m(exp -2).
Detecting and Quantifying Forest Change: The Potential of Existing C- and X-Band Radar Datasets.
Tanase, Mihai A; Ismail, Ismail; Lowell, Kim; Karyanto, Oka; Santoro, Maurizio
2015-01-01
This paper evaluates the opportunity provided by global interferometric radar datasets for monitoring deforestation, degradation and forest regrowth in tropical and semi-arid environments. The paper describes an easy-to-implement method for detecting forest spatial changes and estimating their magnitude. The datasets were acquired within space-borne high spatial resolution radar missions at near-global scales, and are thus significant for monitoring systems developed under the United Nations Framework Convention on Climate Change (UNFCCC). The approach presented in this paper was tested in two areas located in Indonesia and Australia. Forest change estimation was based on differences between a reference dataset acquired in February 2000 by the Shuttle Radar Topography Mission (SRTM) and TanDEM-X mission (TDM) datasets acquired in 2011 and 2013. The synergy between SRTM and TDM datasets allowed not only identifying changes in forest extent but also estimating their magnitude with respect to the reference through variations in forest height.
NASA Astrophysics Data System (ADS)
Sun, L. Qing; Feng, Feng X.
2014-11-01
In this study, we first built and compared two different climate datasets for the Wuling mountainous area in 2010: one, which considered topographical effects during the ANUSPLIN interpolation, is referred to as the terrain-based climate dataset, while the other, which did not, is called the ordinary climate dataset. Then, we quantified the topographical effects of climatic inputs on NPP estimation by inputting the two different climate datasets to the same ecosystem model, the Boreal Ecosystem Productivity Simulator (BEPS), to evaluate the importance of considering relief when estimating NPP. Finally, we found the primary contributing variables to the topographical effects through a series of experiments given an overall accuracy of the model output for NPP. The results showed that: (1) The terrain-based climate dataset presented more reliable topographic information and had closer agreement with the station dataset than the ordinary climate dataset across the continuous series of 365 days in terms of daily mean values. (2) On average, the ordinary climate dataset underestimated NPP by 12.5% compared with the terrain-based climate dataset over the whole study area. (3) The primary climate variables contributing to the topographical effects of climatic inputs for the Wuling mountainous area were temperatures, which suggests that it is necessary to correct temperature differences for estimating NPP accurately in such complex terrain.
NASA Technical Reports Server (NTRS)
Chelton, Dudley B.; Schlax, Michael G.
1994-01-01
A formalism is presented for determining the wavenumber-frequency transfer function associated with an irregularly sampled multidimensional dataset. This transfer function reveals the filtering characteristics and aliasing patterns inherent in the sample design. In combination with information about the spectral characteristics of the signal, the transfer function can be used to quantify the spatial and temporal resolution capability of the dataset. Application of the method to idealized Geosat altimeter data (i.e., neglecting measurement errors and data dropouts) concludes that the Geosat orbit configuration is capable of resolving scales of about 3 deg in latitude and longitude by about 30 days.
Using Graph Indices for the Analysis and Comparison of Chemical Datasets.
Fourches, Denis; Tropsha, Alexander
2013-10-01
In cheminformatics, compounds are represented as points in a multidimensional space of chemical descriptors. When all pairs of points found within a certain distance threshold in the original high-dimensional chemistry space are connected by distance-labeled edges, the resulting data structure can be defined as a Dataset Graph (DG). We show that, similarly to the conventional description of organic molecules, many graph indices can be computed for DGs as well. We demonstrate that chemical datasets can be effectively characterized and compared by computing simple graph indices such as the average vertex degree or the Randic connectivity index. This approach is used to characterize and quantify the similarity between different datasets or subsets of the same dataset (e.g., training, test, and external validation sets used in QSAR modeling). The freely available ADDAGRA program has been implemented to build and visualize DGs. The approach proposed and discussed in this report could be further explored and utilized for different cheminformatics applications such as dataset diversification by acquiring external compounds, dataset processing prior to QSAR modeling, or (dis)similarity modeling of multiple datasets studied in chemical genomics applications. Copyright © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
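A rough sketch, not the ADDAGRA implementation, of building a Dataset Graph from pairwise descriptor distances and computing two of the indices mentioned (average vertex degree and the Randic connectivity index); the descriptor matrix and distance threshold below are arbitrary, and networkx/scipy are assumed available.

```python
import numpy as np
import networkx as nx
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(1)

# Hypothetical dataset: 100 compounds described by 10 normalized descriptors
X = rng.random((100, 10))
D = squareform(pdist(X, metric="euclidean"))

# Dataset Graph: connect pairs of compounds closer than a chosen threshold
threshold = 0.9
G = nx.Graph()
G.add_nodes_from(range(len(X)))
for i in range(len(X)):
    for j in range(i + 1, len(X)):
        if D[i, j] < threshold:
            G.add_edge(i, j, weight=D[i, j])

avg_degree = 2 * G.number_of_edges() / G.number_of_nodes()
# Randic connectivity index: sum over edges of 1/sqrt(deg(u) * deg(v))
randic = sum(1.0 / np.sqrt(G.degree[u] * G.degree[v]) for u, v in G.edges())
print(f"average vertex degree = {avg_degree:.2f}, Randic index = {randic:.2f}")
```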
NASA Astrophysics Data System (ADS)
Song, Y.; Gurney, K. R.; Rayner, P. J.; Asefi-Najafabady, S.
2012-12-01
High resolution quantification of global fossil fuel CO2 emissions has become essential in research aimed at understanding the global carbon cycle and supporting the verification of international agreements on greenhouse gas emission reductions. The Fossil Fuel Data Assimilation System (FFDAS) was used to estimate global fossil fuel carbon emissions at 0.25 degree resolution from 1992 to 2010. FFDAS quantifies CO2 emissions based on areal population density, per capita economic activity, energy intensity and carbon intensity. A critical constraint to this system is the estimation of national-scale fossil fuel CO2 emissions disaggregated into economic sectors. Furthermore, prior uncertainty estimation is an important aspect of the FFDAS, so objective techniques to quantify uncertainty for the national emissions are essential. There are several institutional datasets that quantify national carbon emissions, including British Petroleum (BP), the International Energy Agency (IEA), the Energy Information Administration (EIA), and the Carbon Dioxide Information and Analysis Center (CDIAC). These four datasets have been "harmonized" by Jordan Macknick for inter-comparison purposes (Macknick, Carbon Management, 2011). The harmonization attempted to generate consistency among the different institutional datasets via a variety of techniques such as reclassifying into consistent emitting categories, recalculating based on consistent emission factors, and converting into consistent units. These harmonized data form the basis of our uncertainty estimation. We summarized the maximum, minimum and mean national carbon emissions for all the datasets from 1992 to 2010, and calculated key statistics highlighting the remaining differences among the harmonized datasets. We combine the span (max - min) of the datasets for each country and year with the standard deviation of the national spans over time. We utilize the economic sectoral definitions from the IEA to disaggregate the national total emissions into the specific sectors required by FFDAS. Our results indicate that although the harmonization performed by Macknick generates better agreement among datasets, significant differences remain at the national total level. For example, the CO2 emission span for most countries ranges from 10% to 12%; BP is generally the highest of the four datasets while IEA is typically the lowest; and the US and China had the highest absolute span values but lower percentage span values compared to other countries. However, the US and China make up nearly one-half of the total global absolute span quantity. The absolute span value for the summation of national differences approaches 1 GtC/year in 2007, almost one-half of the biological "missing sink". The span value is used as a potential bias in a recalculation of global and regional carbon budgets to highlight the importance of fossil fuel CO2 emissions in calculating the missing sink. We conclude that if the harmonized span represents potential bias, calculations of the missing sink through forward budget or inverse approaches may be biased by nearly a factor of two.
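A small pandas sketch of the span (max - min) and percentage-span statistics described above; the national totals below are invented for illustration and are not the harmonized Macknick values.

```python
import pandas as pd

# Hypothetical harmonized national totals (MtC/yr) for one year
data = pd.DataFrame({
    "BP":    {"USA": 1600.0, "China": 1850.0, "Germany": 220.0},
    "IEA":   {"USA": 1530.0, "China": 1700.0, "Germany": 205.0},
    "EIA":   {"USA": 1580.0, "China": 1780.0, "Germany": 212.0},
    "CDIAC": {"USA": 1560.0, "China": 1750.0, "Germany": 210.0},
})

span = data.max(axis=1) - data.min(axis=1)       # absolute span (max - min)
pct_span = 100.0 * span / data.mean(axis=1)      # span relative to the mean
summary = pd.DataFrame({"span_MtC": span, "span_percent": pct_span})
print(summary)
print("sum of national spans:", span.sum(), "MtC/yr")
```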
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wendelberger, Laura Jean
In large datasets, it is time consuming or even impossible to pick out interesting images. Our proposed solution is to find statistics to quantify the information in each image and use those to identify and pick out images of interest.
Quantifying scaling effects on satellite-derived forest area estimates for the conterminous USA
Daolan Zheng; L.S. Heath; M.J. Ducey; J.E. Smith
2009-01-01
We quantified the scaling effects on forest area estimates for the conterminous USA using regression analysis and the National Land Cover Dataset 30m satellite-derived maps in 2001 and 1992. The original data were aggregated to: (1) broad cover types (forest vs. non-forest); and (2) coarser resolutions (1km and 10 km). Standard errors of the model estimates were 2.3%...
PARRoT- a homology-based strategy to quantify and compare RNA-sequencing from non-model organisms.
Gan, Ruei-Chi; Chen, Ting-Wen; Wu, Timothy H; Huang, Po-Jung; Lee, Chi-Ching; Yeh, Yuan-Ming; Chiu, Cheng-Hsun; Huang, Hsien-Da; Tang, Petrus
2016-12-22
Next-generation sequencing promises de novo genomic and transcriptomic analysis of samples of interest. However, only a few organisms have reference genomic sequences and even fewer have well-defined or curated annotations. For transcriptome studies focusing on organisms lacking proper reference genomes, the common strategy is de novo assembly followed by functional annotation. However, things become even more complicated when multiple transcriptomes are compared. Here, we propose a new analysis strategy and quantification methods for expression levels that not only generate a virtual reference from the sequencing data, but also provide comparisons between transcriptomes. First, all reads from the transcriptome datasets are pooled together for de novo assembly. The assembled contigs are searched against the NCBI NR database to find potential homolog sequences. Based on the search results, a set of virtual transcripts is generated and serves as a reference transcriptome. By using the same reference, normalized quantification values including RC (read counts), eRPKM (estimated RPKM) and eTPM (estimated TPM) can be obtained that are comparable across transcriptome datasets. In order to demonstrate the feasibility of our strategy, we implement it in the web service PARRoT. PARRoT stands for Pipeline for Analyzing RNA Reads of Transcriptomes. It analyzes gene expression profiles for two transcriptome sequencing datasets. For a better understanding of the biological meaning of the comparison among transcriptomes, PARRoT further links these virtual transcripts to their potential functions by reporting best hits in the SwissProt and NR databases and assigning GO terms. Our demo datasets showed that PARRoT can analyze two paired-end transcriptomic datasets of approximately 100 million reads within just three hours. In this study, we proposed and implemented a strategy to analyze transcriptomes from non-reference organisms that offers the opportunity to quantify and compare transcriptome profiles through a homolog-based virtual transcriptome reference. By using the homolog-based reference, our strategy effectively avoids the problems that may arise from inconsistencies among transcriptomes. This strategy will shed light on the field of comparative genomics for non-model organisms. We have implemented PARRoT as a web service which is freely available at http://parrot.cgu.edu.tw .
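The eRPKM and eTPM values are described as estimated analogues of RPKM and TPM computed against the virtual reference; the sketch below shows only the standard RPKM and TPM formulas from read counts and transcript lengths, with made-up numbers, and does not reproduce PARRoT's estimation step.

```python
import numpy as np

def rpkm(counts, lengths_bp):
    """Reads per kilobase per million mapped reads."""
    counts = np.asarray(counts, dtype=float)
    kb = np.asarray(lengths_bp, dtype=float) / 1e3
    per_million = counts.sum() / 1e6
    return counts / per_million / kb

def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then scale."""
    counts = np.asarray(counts, dtype=float)
    kb = np.asarray(lengths_bp, dtype=float) / 1e3
    rpk = counts / kb
    return rpk / rpk.sum() * 1e6

# Hypothetical virtual transcripts with read counts and lengths (bp)
counts = [500, 1200, 300]
lengths = [1500, 3000, 800]
print("RPKM:", np.round(rpkm(counts, lengths), 2))
print("TPM :", np.round(tpm(counts, lengths), 2))
```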
Concerted control of Escherichia coli cell division
Osella, Matteo; Nugent, Eileen; Cosentino Lagomarsino, Marco
2014-01-01
The coordination of cell growth and division is a long-standing problem in biology. Focusing on Escherichia coli in steady growth, we quantify cell division control using a stochastic model, by inferring the division rate as a function of the observable parameters from large empirical datasets of dividing cells. We find that (i) cells have mechanisms to control their size, (ii) size control is effected by changes in the doubling time, rather than in the single-cell elongation rate, (iii) the division rate increases steeply with cell size for small cells, and saturates for larger cells. Importantly, (iv) the current size is not the only variable controlling cell division, but the time spent in the cell cycle appears to play a role, and (v) common tests of cell size control may fail when such concerted control is in place. Our analysis illustrates the mechanisms of cell division control in E. coli. The phenomenological framework presented is sufficiently general to be widely applicable and opens the way for rigorous tests of molecular cell-cycle models. PMID:24550446
Sigma: Strain-level inference of genomes from metagenomic analysis for biosurveillance
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ahn, Tae-Hyuk; Chai, Juanjuan; Pan, Chongle
Motivation: Metagenomic sequencing of clinical samples provides a promising technique for direct pathogen detection and characterization in biosurveillance. Taxonomic analysis at the strain level can be used to resolve serotypes of a pathogen in biosurveillance. Sigma was developed for strain-level identification and quantification of pathogens using their reference genomes based on metagenomic analysis. Results: Sigma provides not only accurate strain-level inferences, but also three unique capabilities: (i) Sigma quantifies the statistical uncertainty of its inferences, which includes hypothesis testing of identified genomes and confidence interval estimation of their relative abundances; (ii) Sigma enables strain variant calling by assigning metagenomic reads to their most likely reference genomes; and (iii) Sigma supports parallel computing for fast analysis of large datasets. In conclusion, the algorithm performance was evaluated using simulated mock communities and fecal samples with spike-in pathogen strains. Availability and Implementation: Sigma was implemented in C++ with source codes and binaries freely available at http://sigma.omicsbio.org.
Sigma: Strain-level inference of genomes from metagenomic analysis for biosurveillance
Ahn, Tae-Hyuk; Chai, Juanjuan; Pan, Chongle
2014-09-29
Motivation: Metagenomic sequencing of clinical samples provides a promising technique for direct pathogen detection and characterization in biosurveillance. Taxonomic analysis at the strain level can be used to resolve serotypes of a pathogen in biosurveillance. Sigma was developed for strain-level identification and quantification of pathogens using their reference genomes based on metagenomic analysis. Results: Sigma provides not only accurate strain-level inferences, but also three unique capabilities: (i) Sigma quantifies the statistical uncertainty of its inferences, which includes hypothesis testing of identified genomes and confidence interval estimation of their relative abundances; (ii) Sigma enables strain variant calling by assigning metagenomic reads to their most likely reference genomes; and (iii) Sigma supports parallel computing for fast analysis of large datasets. In conclusion, the algorithm performance was evaluated using simulated mock communities and fecal samples with spike-in pathogen strains. Availability and Implementation: Sigma was implemented in C++ with source codes and binaries freely available at http://sigma.omicsbio.org.
Molecular Subtypes of Glioblastoma Are Relevant to Lower Grade Glioma
Sloan, Andrew E.; Chen, Yanwen; Brat, Daniel J.; O’Neill, Brian Patrick; de Groot, John; Yust-Katz, Shlomit; Yung, Wai-Kwan Alfred; Cohen, Mark L.; Aldape, Kenneth D.; Rosenfeld, Steven; Verhaak, Roeland G. W.; Barnholtz-Sloan, Jill S.
2014-01-01
Background Gliomas are the most common primary malignant brain tumors in adults, with great heterogeneity in histopathology and clinical course. The intent was to evaluate the relevance of known glioblastoma (GBM) expression and methylation based subtypes to grade II and III gliomas (i.e., lower grade gliomas). Methods Gene expression array, single nucleotide polymorphism (SNP) array and clinical data were obtained for 228 GBMs and 176 grade II/III gliomas (GII/III) from the publicly available Rembrandt dataset. Two additional datasets with IDH1 mutation status were utilized as validation datasets (one publicly available dataset and one newly generated dataset from MD Anderson). Unsupervised clustering was performed and compared to gene expression subtypes assigned using the Verhaak et al. 840-gene classifier. The glioma-CpG Island Methylator Phenotype (G-CIMP) was assigned using prediction models by Fine et al. Results Unsupervised clustering by gene expression aligned with the Verhaak 840-gene subtype group assignments. GII/IIIs were preferentially assigned to the proneural subtype with IDH1 mutation and G-CIMP. GBMs were evenly distributed among the four subtypes. Proneural, IDH1 mutant, G-CIMP GII/IIIs had significantly better survival than other molecular subtypes. Only 6% of GBMs were proneural and had either IDH1 mutation or G-CIMP, but these tumors had significantly better survival than other GBMs. Copy number changes in chromosomes 1p and 19q were associated with GII/IIIs, while changes in CDKN2A, PTEN and EGFR were more commonly associated with GBMs. Conclusions GBM gene-expression and methylation based subtypes are relevant for GII/IIIs and are associated with overall survival differences. A better understanding of the association between these subtypes and GII/IIIs could further knowledge regarding prognosis and mechanisms of glioma progression. PMID:24614622
Zepeda-Mendoza, Marie Lisandra; Bohmann, Kristine; Carmona Baez, Aldo; Gilbert, M Thomas P
2016-05-03
DNA metabarcoding is an approach for identifying multiple taxa in an environmental sample using specific genetic loci and taxa-specific primers. When combined with high-throughput sequencing, it enables the taxonomic characterization of large numbers of samples in a relatively time- and cost-efficient manner. One recent laboratory development is the addition of 5'-nucleotide tags to both primers, producing double-tagged amplicons, and the use of multiple PCR replicates to filter erroneous sequences. However, there is currently no available toolkit for the straightforward analysis of datasets produced in this way. We present DAMe, a toolkit for the processing of datasets generated by double-tagged amplicons from multiple PCR replicates derived from an unlimited number of samples. Specifically, DAMe can be used to (i) sort amplicons by tag combination, (ii) evaluate dissimilarity among PCR replicates, and (iii) filter sequences derived from sequencing/PCR errors, chimeras, and contamination. This is attained by calculating the following parameters: (i) sequence content similarity between the PCR replicates from each sample, (ii) reproducibility of each unique sequence across the PCR replicates, and (iii) copy number of the unique sequences in each PCR replicate. We showcase the insights that can be obtained using DAMe prior to taxonomic assignment by applying it to two real datasets that vary in their complexity regarding number of samples, sequencing libraries, PCR replicates, and used tag combinations. Finally, we use a third mock dataset to demonstrate the impact and importance of filtering the sequences with DAMe. DAMe allows the user-friendly manipulation of amplicons derived from multiple samples with PCR replicates built into a single or multiple sequencing libraries. It allows the user to: (i) collapse amplicons into unique sequences and sort them by tag combination while retaining the sample identifier and copy number information, (ii) identify sequences carrying unused tag combinations, (iii) evaluate the comparability of PCR replicates of the same sample, and (iv) filter tagged amplicons from a number of PCR replicates using parameters of minimum length, copy number, and reproducibility across the PCR replicates. This enables an efficient analysis of complex datasets, and ultimately increases the ease of handling datasets from large-scale studies.
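A rough, self-contained sketch (not DAMe's own code) of the kind of filtering described in the last point above: keeping unique sequences that meet a minimum length, a minimum per-replicate copy number, and a minimum reproducibility across a sample's PCR replicates. The function name, thresholds, and toy sequences are all hypothetical.

```python
from collections import Counter

def filter_amplicons(replicates, min_len=100, min_copies=2, min_reps=2):
    """Keep unique sequences that meet a minimum length, appear with at least
    `min_copies` reads in a replicate, and are reproduced in at least
    `min_reps` of the sample's PCR replicates.
    `replicates` is a list of lists of sequence strings (one list per PCR)."""
    per_rep_counts = [Counter(seqs) for seqs in replicates]
    all_seqs = set().union(*[set(c) for c in per_rep_counts])
    kept = {}
    for seq in all_seqs:
        if len(seq) < min_len:
            continue
        reps_with_seq = [c[seq] for c in per_rep_counts if c[seq] >= min_copies]
        if len(reps_with_seq) >= min_reps:
            kept[seq] = sum(reps_with_seq)  # total copies over passing replicates
    return kept

# Hypothetical toy example with two PCR replicates of one sample
rep1 = ["A" * 120] * 5 + ["C" * 120] * 1
rep2 = ["A" * 120] * 4 + ["G" * 90] * 3
print(filter_amplicons([rep1, rep2]))
```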
Moore, A. C.; DeLucca, J. F.; Elliott, D. M.; Burris, D. L.
2016-01-01
This paper describes a new method, based on a recent analytical model (Hertzian biphasic theory (HBT)), to simultaneously quantify cartilage contact modulus, tension modulus, and permeability. Standard Hertzian creep measurements were performed on 13 osteochondral samples from three mature bovine stifles. Each creep dataset was fit for material properties using HBT. A subset of the dataset (N = 4) was also fit using Oyen's method and FEBio, an open-source finite element package designed for soft tissue mechanics. The HBT method demonstrated statistically significant sensitivity to differences between cartilage from the tibial plateau and cartilage from the femoral condyle. Based on the four samples used for comparison, no statistically significant differences were detected between properties from the HBT and FEBio methods. While the finite element method is considered the gold standard for analyzing this type of contact, the expertise and time required to set up and solve can be prohibitive, especially for large datasets. The HBT method agreed quantitatively with FEBio but also offers ease of use by nonexperts, rapid solutions, and exceptional fit quality (R2 = 0.999 ± 0.001, N = 13). PMID:27536012
Quantifying Uncertainties in Land Surface Microwave Emissivity Retrievals
NASA Technical Reports Server (NTRS)
Tian, Yudong; Peters-Lidard, Christa D.; Harrison, Kenneth W.; Prigent, Catherine; Norouzi, Hamidreza; Aires, Filipe; Boukabara, Sid-Ahmed; Furuzawa, Fumie A.; Masunaga, Hirohiko
2012-01-01
Uncertainties in the retrievals of microwave land surface emissivities were quantified over two types of land surfaces: desert and tropical rainforest. Retrievals from satellite-based microwave imagers, including SSM/I, TMI and AMSR-E, were studied. Our results show that there are considerable differences between the retrievals from different sensors and from different groups over these two land surface types. In addition, the mean emissivity values show different spectral behavior across the frequencies. With the true emissivity assumed largely constant over both of the two sites throughout the study period, the differences are largely attributed to the systematic and random errors in the retrievals. Generally these retrievals tend to agree better at lower frequencies than at higher ones, with systematic differences ranging 1-4% (3-12 K) over desert and 1-7% (3-20 K) over rainforest. The random errors within each retrieval dataset are in the range of 0.5-2% (2-6 K). In particular, at 85.0/89.0 GHz, there are very large differences between the different retrieval datasets, and within each retrieval dataset itself. Further investigation reveals that these differences are most likely caused by rain/cloud contamination, which can lead to random errors up to 10-17 K under the most severe conditions.
Comparison of recent SnIa datasets
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sanchez, J.C. Bueno; Perivolaropoulos, L.; Nesseris, S., E-mail: jbueno@cc.uoi.gr, E-mail: nesseris@nbi.ku.dk, E-mail: leandros@uoi.gr
2009-11-01
We rank the six latest Type Ia supernova (SnIa) datasets (Constitution (C), Union (U), ESSENCE (Davis) (E), Gold06 (G), SNLS 1yr (S) and SDSS-II (D)) in the context of the Chevalier-Polarski-Linder (CPL) parametrization w(a) = w_0 + w_1(1 − a), according to their Figure of Merit (FoM), their consistency with the cosmological constant (ΛCDM), their consistency with standard rulers (Cosmic Microwave Background (CMB) and Baryon Acoustic Oscillations (BAO)) and their mutual consistency. We find a significant improvement of the FoM (defined as the inverse area of the 95.4% parameter contour) with the number of SnIa of these datasets ((C) highest FoM, (U), (G), (D), (E), (S) lowest FoM). Standard rulers (CMB+BAO) have a better FoM by about a factor of 3, compared to the highest FoM SnIa dataset (C). We also find that the ranking sequence based on consistency with ΛCDM is identical with the corresponding ranking based on consistency with standard rulers ((S) most consistent, (D), (C), (E), (U), (G) least consistent). The ranking sequence of the datasets however changes when we consider the consistency with an expansion history corresponding to evolving dark energy (w_0, w_1) = (−1.4, 2) crossing the phantom divide line w = −1 (it is practically reversed to (G), (U), (E), (S), (D), (C)). The SALT2 and MLCS2k2 fitters are also compared and some peculiar features of the SDSS-II dataset when standardized with the MLCS2k2 fitter are pointed out. Finally, we construct a statistic to estimate the internal consistency of a collection of SnIa datasets. We find that even though there is good consistency among most samples taken from the above datasets, this consistency decreases significantly when the Gold06 (G) dataset is included in the sample.
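For orientation, under a Gaussian approximation the FoM defined here (inverse area of the 95.4% contour in the (w_0, w_1) plane) can be computed from the parameter covariance matrix; the covariance values below are made up for illustration and are not taken from any of the ranked datasets.

```python
import numpy as np

def cpl_figure_of_merit(cov_w0_w1):
    """Inverse area of the 95.4% confidence ellipse in the (w_0, w_1) plane,
    assuming a Gaussian posterior. For 2 parameters the 95.4% level is
    Delta-chi^2 of about 6.17, and the ellipse area is pi * 6.17 * sqrt(det C)."""
    delta_chi2 = 6.17
    area = np.pi * delta_chi2 * np.sqrt(np.linalg.det(cov_w0_w1))
    return 1.0 / area

# Hypothetical covariance of (w_0, w_1) from a SnIa fit
cov = np.array([[0.04, -0.15],
                [-0.15, 0.81]])
print(f"FoM = {cpl_figure_of_merit(cov):.2f}")
```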
An extensive dataset of eye movements during viewing of complex images.
Wilming, Niklas; Onat, Selim; Ossandón, José P; Açık, Alper; Kietzmann, Tim C; Kaspar, Kai; Gameiro, Ricardo R; Vormberg, Alexandra; König, Peter
2017-01-31
We present a dataset of free-viewing eye-movement recordings that contains more than 2.7 million fixation locations from 949 observers on more than 1000 images from different categories. This dataset aggregates and harmonizes data from 23 different studies conducted at the Institute of Cognitive Science at Osnabrück University and the University Medical Center in Hamburg-Eppendorf. Trained personnel recorded all studies under standard conditions with homogeneous equipment and parameter settings. All studies allowed for free eye-movements, and differed in the age range of participants (~7-80 years), stimulus sizes, stimulus modifications (phase scrambled, spatial filtering, mirrored), and stimuli categories (natural and urban scenes, web sites, fractal, pink-noise, and ambiguous artistic figures). The size and variability of viewing behavior within this dataset presents a strong opportunity for evaluating and comparing computational models of overt attention, and furthermore, for thoroughly quantifying strategies of viewing behavior. This also makes the dataset a good starting point for investigating whether viewing strategies change in patient groups.
Genome-wide assessment of differential translations with ribosome profiling data.
Xiao, Zhengtao; Zou, Qin; Liu, Yu; Yang, Xuerui
2016-04-04
The closely regulated process of mRNA translation is crucial for precise control of protein abundance and quality. Ribosome profiling, a combination of ribosome foot-printing and RNA deep sequencing, has been used in a large variety of studies to quantify genome-wide mRNA translation. Here, we developed Xtail, an analysis pipeline tailored for ribosome profiling data that comprehensively and accurately identifies differentially translated genes in pairwise comparisons. Applied on simulated and real datasets, Xtail exhibits high sensitivity with minimal false-positive rates, outperforming existing methods in the accuracy of quantifying differential translations. With published ribosome profiling datasets, Xtail does not only reveal differentially translated genes that make biological sense, but also uncovers new events of differential translation in human cancer cells on mTOR signalling perturbation and in human primary macrophages on interferon gamma (IFN-γ) treatment. This demonstrates the value of Xtail in providing novel insights into the molecular mechanisms that involve translational dysregulations.
High-Pressure Viewports for Infrared Systems. Phase 2. Chalcogenide Glass
1982-01-28
radiation. The permanent microscopic dipole domain undergoes spontaneous polarization, which results in the buildup of charge on the opposite surface of the...are quantified in table 3. Materials that have been useful in the 8-12 μm region are all II-VI compounds, i.e. prepared from elements of group II and...Polycrystalline II-VI compounds. MELT-FORMED GLASS: The properties of the melt-formed glasses are quantified in table 4. Only glasses that have been
3D shape recovery from image focus using gray level co-occurrence matrix
NASA Astrophysics Data System (ADS)
Mahmood, Fahad; Munir, Umair; Mehmood, Fahad; Iqbal, Javaid
2018-04-01
Recovering a precise and accurate 3-D shape of the target object using a robust 3-D shape recovery algorithm is an ultimate objective of the computer vision community. The focus measure algorithm plays an important role in this architecture, converting the color values of each pixel of the acquired 2-D image dataset into corresponding focus values. After convolving the focus measure filter with the input 2-D image dataset, a 3-D shape recovery approach is applied to recover the depth map. In this document, we propose the Gray Level Co-occurrence Matrix, along with its statistical features, for computing the focus information of the image dataset. The Gray Level Co-occurrence Matrix quantifies the texture present in the image using statistical features and then applies the joint probability distribution function of the gray-level pairs of the input image. Finally, we quantify the focus value of the input image using a Gaussian Mixture Model. Due to its low computational complexity, sharp focus measure curve, robustness to random noise sources, and accuracy, it is considered a superior alternative to most recently proposed 3-D shape recovery approaches. This algorithm is thoroughly investigated on real image sequences and a synthetic image dataset. The efficiency of the proposed scheme is also compared with state-of-the-art 3-D shape recovery approaches. Finally, by means of two global statistical measures, root mean square error and correlation, we show that this approach, in spite of its simplicity, generates accurate results.
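A rough sketch of a GLCM-based focus measure using scikit-image; the paper's specific feature combination and its Gaussian Mixture Model step are not reproduced, and the scalar combination below is only illustrative.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_focus_value(image_u8):
    """GLCM-based focus measure for one grayscale (uint8) image: sharper,
    better-focused texture tends to raise contrast and lower energy of the
    co-occurrence matrix."""
    glcm = graycomatrix(image_u8, distances=[1], angles=[0, np.pi / 2],
                        levels=256, symmetric=True, normed=True)
    contrast = graycoprops(glcm, "contrast").mean()
    energy = graycoprops(glcm, "energy").mean()
    # One simple scalar combination; the original work combines several
    # statistical features and a Gaussian mixture model instead.
    return contrast * (1.0 - energy)

# Hypothetical focus stack: pick the overall best-focused frame (in practice
# the focus value would be computed per pixel window to build a depth map).
rng = np.random.default_rng(0)
stack = [rng.integers(0, 256, size=(64, 64), dtype=np.uint8) for _ in range(5)]
values = [glcm_focus_value(frame) for frame in stack]
print("best-focused frame index:", int(np.argmax(values)))
```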
Appropriate use of the increment entropy for electrophysiological time series.
Liu, Xiaofeng; Wang, Xue; Zhou, Xu; Jiang, Aimin
2018-04-01
The increment entropy (IncrEn) is a new measure for quantifying the complexity of a time series. There are three critical parameters in the IncrEn calculation: N (length of the time series), m (dimensionality), and q (quantifying precision). However, the question of how to choose the most appropriate combination of IncrEn parameters for short datasets has not been extensively explored. The purpose of this research was to provide guidance on choosing suitable IncrEn parameters for short datasets by exploring the effects of varying the parameter values. We used simulated data, epileptic EEG data and cardiac interbeat (RR) data to investigate the effects of the parameters on the calculated IncrEn values. The results reveal that IncrEn is sensitive to changes in m, q and N for short datasets (N≤500). However, IncrEn reaches stability at a data length of N=1000 with m=2 and q=2, and for short datasets (N=100), it shows better relative consistency with 2≤m≤6 and 2≤q≤8. We suggest that the value of N should be no less than 100. To enable a clear distinction between different classes based on IncrEn, we recommend that m and q should take values between 2 and 4. With appropriate parameters, IncrEn enables the effective detection of complexity variations in physiological time series, suggesting that IncrEn should be useful for the analysis of physiological time series in clinical applications. Copyright © 2018 Elsevier Ltd. All rights reserved.
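The sketch below illustrates the general idea of an increment-based entropy with parameters m and q; it follows the spirit of IncrEn (sign/magnitude encoding of increments, words of length m, Shannon entropy of the word distribution) but is not guaranteed to match the published definition exactly.

```python
import numpy as np
from collections import Counter

def increment_entropy(x, m=2, q=2):
    """Rough sketch of an increment-based entropy: encode each increment by
    its sign and a magnitude quantized to q levels, form overlapping words of
    m encoded increments, and take the Shannon entropy of the word
    distribution normalized by the word length. Illustrative only; not
    necessarily the exact IncrEn formula."""
    x = np.asarray(x, dtype=float)
    v = np.diff(x)
    sd = v.std() or 1.0
    signs = np.sign(v).astype(int)
    sizes = np.minimum(q, np.floor(np.abs(v) * q / sd)).astype(int)
    symbols = list(zip(signs, sizes))
    words = [tuple(symbols[i:i + m]) for i in range(len(symbols) - m + 1)]
    counts = np.array(list(Counter(words).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum() / m)

# Hypothetical short series (N = 100), parameters in the recommended range
rng = np.random.default_rng(3)
print(increment_entropy(rng.normal(size=100), m=2, q=2))
```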
Boareto, Marcelo; Cesar, Jonatas; Leite, Vitor B P; Caticha, Nestor
2015-01-01
We introduce Supervised Variational Relevance Learning (Suvrel), a variational method to determine metric tensors that define distance-based similarity in pattern classification, inspired by relevance learning. The variational method is applied to a cost function that penalizes large intraclass distances and favors small interclass distances. We find analytically the metric tensor that minimizes the cost function. Preprocessing the patterns by applying linear transformations using the metric tensor yields a dataset that can be more efficiently classified. We test our methods, using publicly available datasets, for some standard classifiers. Among these datasets, two were tested by the MAQC-II project and, even without the use of further preprocessing, our results improve on their performance.
X-ray computed tomography library of shark anatomy and lower jaw surface models.
Kamminga, Pepijn; De Bruin, Paul W; Geleijns, Jacob; Brazeau, Martin D
2017-04-11
The cranial diversity of sharks reflects disparate biomechanical adaptations to feeding. In order to investigate and better understand the ecomorphology of extant shark feeding systems, we created an x-ray computed tomography (CT) library of shark cranial anatomy with three-dimensional (3D) lower jaw reconstructions. This library is used to examine and quantify lower jaw disparity in extant shark species in a separate study. The library is divided into a dataset comprising medical CT scans of 122 sharks (Selachimorpha, Chondrichthyes) representing 73 extant species, including digitized morphology of entire shark specimens. This CT dataset and additional data provided by other researchers were used to reconstruct a second dataset containing 3D models of the left lower jaw for 153 individuals representing 94 extant shark species. These datasets form an extensive anatomical record of shark skeletal anatomy, necessary for comparative morphological, biomechanical, ecological and phylogenetic studies.
Identifying and Quantifying the Intermediate Processes during Nitrate-Dependent Iron(II) Oxidation.
Jamieson, James; Prommer, Henning; Kaksonen, Anna H; Sun, Jing; Siade, Adam J; Yusov, Anna; Bostick, Benjamin
2018-05-15
Microbially driven nitrate-dependent iron (Fe) oxidation (NDFO) in subsurface environments has been intensively studied. However, the extent to which Fe(II) oxidation is biologically catalyzed remains unclear because no neutrophilic iron-oxidizing and nitrate-reducing autotroph has been isolated to confirm the existence of an enzymatic pathway. While mixotrophic NDFO bacteria have been isolated, understanding the process is complicated by simultaneous abiotic oxidation due to nitrite produced during denitrification. In this study, the relative contributions of biotic and abiotic processes during NDFO were quantified through the compilation and model-based interpretation of previously published experimental data. The kinetics of chemical denitrification by Fe(II) (chemodenitrification) were assessed, and compelling evidence was found for the importance of organic ligands, specifically exopolymeric substances secreted by bacteria, in enhancing abiotic oxidation of Fe(II). However, nitrite alone could not explain the observed magnitude of Fe(II) oxidation, with 60-75% of overall Fe(II) oxidation attributed to an enzymatic pathway for the investigated strains: Acidovorax (A.) strain BoFeN1, 2AN, A. ebreus strain TPSY, Paracoccus denitrificans Pd 1222, and Pseudogulbenkiania sp. strain 2002. By rigorously quantifying the intermediate processes, this study eliminated the potential for abiotic Fe(II) oxidation to be exclusively responsible for NDFO and verified the key contribution of an additional, biological Fe(II) oxidation process catalyzed by NDFO bacteria.
3D Feature Extraction for Unstructured Grids
NASA Technical Reports Server (NTRS)
Silver, Deborah
1996-01-01
Visualization techniques provide tools that help scientists identify observed phenomena in scientific simulation. To be useful, these tools must allow the user to extract regions, classify and visualize them, abstract them for simplified representations, and track their evolution. Object Segmentation provides a technique to extract and quantify regions of interest within these massive datasets. This article explores basic algorithms to extract coherent amorphous regions from two-dimensional and three-dimensional scalar unstructured grids. The techniques are applied to datasets from Computational Fluid Dynamics and those from Finite Element Analysis.
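A minimal sketch of the extraction step described above, assuming the unstructured grid is supplied as a node adjacency list and a per-node scalar field (both names are illustrative, not from the article):

```python
# Threshold-based region extraction on an unstructured grid via breadth-first search.
from collections import deque

def extract_regions(adjacency, values, threshold):
    """Return connected components of nodes whose scalar value meets or exceeds threshold.

    adjacency: dict mapping node id -> iterable of neighboring node ids
    values:    dict mapping node id -> scalar field value at that node
    """
    selected = {n for n, v in values.items() if v >= threshold}
    seen, regions = set(), []
    for start in selected:
        if start in seen:
            continue
        region, queue = [], deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            region.append(node)
            for nbr in adjacency.get(node, ()):
                if nbr in selected and nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        regions.append(region)
    return regions

# Each extracted region can then be quantified (volume, mass, centroid) and
# tracked across time steps, e.g. by spatial overlap or centroid proximity.
```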
Application Programming in AWIPS II
NASA Technical Reports Server (NTRS)
Smit, Matt; McGrath, Kevin; Burks, Jason; Carcione, Brian
2012-01-01
Since its inception almost 8 years ago, NASA's Short-term Prediction Research and Transition (SPoRT) Center has integrated NASA data into the National Weather Service's decision support system (DSS) the Advanced Weather Interactive Processing System (AWIPS). SPoRT has, in some instances, had to shape and transform data sets into various formats and manipulate configurations to visualize them in AWIPS. With the advent of the next generation of DSS, AWIPS II, developers will be able to develop their own plugins to handle any type of data. Raytheon is developing AWIPS II to be a more extensible package written mainly in Java, and built around a Service Oriented Architecture. A plugin architecture will allow users to install their own code modules, and (if all the rules have been properly followed) they will work hand-in-hand with AWIPS II as if it were originally built in. Users can bring in new datasets with existing plugins, tweak plugins to handle a nuance or desired new functionality, or create an entirely new visualization layout for a new dataset. SPoRT is developing plugins to ensure its existing NASA data will be ready for AWIPS II when it is delivered, and to prepare for the future of new instruments on upcoming satellites.
Garraín, Daniel; Fazio, Simone; de la Rúa, Cristina; Recchioni, Marco; Lechón, Yolanda; Mathieux, Fabrice
2015-01-01
The aim of this paper is to identify areas of potential improvement of the European Reference Life Cycle Database (ELCD) electricity datasets. The revision is based on the data quality indicators described by the International Life Cycle Data system (ILCD) Handbook, applied on a sectorial basis. These indicators evaluate the technological, geographical and time-related representativeness of the dataset and the appropriateness in terms of completeness, precision and methodology. Results show that the ELCD electricity datasets have very good quality in general terms; nevertheless, some findings and recommendations for improving the quality of the Life Cycle Inventories have been derived. Moreover, these results give any LCA practitioner assurance of the quality of the electricity-related datasets and provide insights into the limitations and assumptions underlying the dataset modelling. Given this information, the LCA practitioner will be able to decide whether the use of the ELCD electricity datasets is appropriate based on the goal and scope of the analysis to be conducted. The methodological approach would also be useful for dataset developers and reviewers seeking to improve the overall Data Quality Requirements of databases.
Stoker, Jason M.; Tyler, Dean J.; Turnipseed, D. Phil; Van Wilson, K.; Oimoen, Michael J.
2009-01-01
Hurricane Katrina was one of the largest natural disasters in U.S. history. Due to the sheer size of the affected areas, an unprecedented regional analysis at very high resolution and accuracy was needed to properly quantify and understand the effects of the hurricane and the storm tide. Many disparate sources of lidar data were acquired and processed for varying environmental reasons by pre- and post-Katrina projects. The datasets were in several formats and projections and were processed to varying phases of completion, and as a result the task of producing a seamless digital elevation dataset required a high level of coordination, research, and revision. To create a seamless digital elevation dataset, many technical issues had to be resolved before producing the desired 1/9-arc-second (3-meter) grid needed as the map base for projecting the Katrina peak storm tide throughout the affected coastal region. This report presents the methodology that was developed to construct seamless digital elevation datasets from multipurpose, multi-use, and disparate lidar datasets, and describes an easily accessible Web application for viewing the maximum storm tide caused by Hurricane Katrina in southeastern Louisiana, Mississippi, and Alabama.
Quantifying Uncertainties in Land-Surface Microwave Emissivity Retrievals
NASA Technical Reports Server (NTRS)
Tian, Yudong; Peters-Lidard, Christa D.; Harrison, Kenneth W.; Prigent, Catherine; Norouzi, Hamidreza; Aires, Filipe; Boukabara, Sid-Ahmed; Furuzawa, Fumie A.; Masunaga, Hirohiko
2013-01-01
Uncertainties in the retrievals of microwave land-surface emissivities are quantified over two types of land surfaces: desert and tropical rainforest. Retrievals from satellite-based microwave imagers, including the Special Sensor Microwave Imager, the Tropical Rainfall Measuring Mission Microwave Imager, and the Advanced Microwave Scanning Radiometer for Earth Observing System, are studied. Our results show that there are considerable differences between the retrievals from different sensors and from different groups over these two land-surface types. In addition, the mean emissivity values show different spectral behavior across the frequencies. With the true emissivity assumed largely constant over both sites throughout the study period, the differences are largely attributed to the systematic and random errors in the retrievals. Generally, these retrievals tend to agree better at lower frequencies than at higher ones, with systematic differences ranging from 1%-4% (3-12 K) over desert and 1%-7% (3-20 K) over rainforest. The random errors within each retrieval dataset are in the range of 0.5%-2% (2-6 K). In particular, at 85.5/89.0 GHz, there are very large differences between the different retrieval datasets, and within each retrieval dataset itself. Further investigation reveals that these differences are most likely caused by rain/cloud contamination, which can lead to random errors of up to 10-17 K under the most severe conditions.
AbdelRahman, Samir E; Zhang, Mingyuan; Bray, Bruce E; Kawamoto, Kensaku
2014-05-27
The aim of this study was to propose an analytical approach to develop high-performing predictive models for congestive heart failure (CHF) readmission using an operational dataset with incomplete records and changing data over time. Our analytical approach involves three steps: pre-processing, systematic model development, and risk factor analysis. For pre-processing, variables that were absent in >50% of records were removed. Moreover, the dataset was divided into a validation dataset and derivation datasets, which were separated into three temporal subsets based on changes to the data over time. For systematic model development, using the different temporal datasets and the remaining explanatory variables, the models were developed by combining the use of various (i) statistical analyses to explore the relationships between the validation and the derivation datasets; (ii) adjustment methods for handling missing values; (iii) classifiers; (iv) feature selection methods; and (v) discretization methods. We then selected the best derivation dataset and the models with the highest predictive performance. For risk factor analysis, factors in the highest-performing predictive models were analyzed and ranked using (i) statistical analyses of the best derivation dataset, (ii) feature rankers, and (iii) a newly developed algorithm to categorize risk factors as being strong, regular, or weak. The analysis dataset consisted of 2,787 CHF hospitalizations at University of Utah Health Care from January 2003 to June 2013. In this study, we used the complete-case analysis and mean-based imputation adjustment methods; the wrapper subset feature selection method; and four ranking strategies based on information gain, gain ratio, symmetrical uncertainty, and wrapper subset feature evaluators. The best-performing models resulted from the use of a complete-case analysis derivation dataset combined with the Class-Attribute Contingency Coefficient discretization method and a voting classifier that averaged the results of multinomial logistic regression and voting feature intervals classifiers. Of 42 final model risk factors, discharge disposition, discretized age, and indicators of anemia were the most significant. This model achieved a c-statistic of 86.8%. The proposed three-step analytical approach enhanced predictive model performance for CHF readmissions. It could potentially be leveraged to improve predictive model performance in other areas of clinical medicine.
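The modeling workflow described above can be illustrated with a scikit-learn analogue; the original study used different tooling (e.g., voting feature intervals and CACC discretization), so the pipeline below is only a schematic stand-in with assumed components and parameters.

```python
# Schematic sklearn analogue of the described steps: imputation, discretization,
# feature selection, and a soft-voting ensemble. Not the paper's exact models.
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import cross_val_score

model = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),                       # mean-based imputation
    ("discretize", KBinsDiscretizer(n_bins=5, encode="ordinal",
                                    strategy="quantile")),            # stand-in for CACC discretization
    ("select", SelectKBest(mutual_info_classif, k=20)),               # stand-in for wrapper selection
    ("vote", VotingClassifier(
        estimators=[("lr", LogisticRegression(max_iter=1000)),
                    ("nb", GaussianNB())],
        voting="soft")),                                              # averaged ("voting") classifier
])

# X, y = ...   # derivation dataset of CHF hospitalizations and readmission labels
# auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()  # c-statistic estimate
```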
NASA Astrophysics Data System (ADS)
Lamarche, G.; Le Gonidec, Y.; Lucieer, V.; Lurton, X.; Greinert, J.; Dupré, S.; Nau, A.; Heffron, E.; Roche, M.; Ladroit, Y.; Urban, P.
2017-12-01
Detecting liquid, solid or gaseous features in the ocean is generating considerable interest in the geoscience community, because of their potentially high economic value (oil & gas, mining), their significance for environmental management (oil/gas leakage, biodiversity mapping, greenhouse gas monitoring) as well as their potential cultural and traditional values (food, freshwater). Enhancing people's capability to quantify and manage the natural capital present in the ocean water goes hand in hand with the development of marine acoustic technology, as marine echosounders provide the most reliable and technologically advanced means to develop quantitative studies of water column backscatter data. This capability is not yet developed to its full potential because of (i) the complexity of the physics involved in relation to the constantly changing marine environment, and (ii) the rapid technological evolution of high resolution multibeam echosounder (MBES) water-column imaging systems. The Water Column Imaging Working Group is working on a series of multibeam echosounder (MBES) water column datasets acquired in a variety of environments, using a range of frequencies, and imaging a number of water-column features such as gas seeps, oil leaks, suspended particulate matter, vegetation and freshwater springs. Access to data from different acoustic frequencies and ocean dynamics enables us to discuss and test multifrequency approaches, which are the most promising means of developing a quantitative analysis of the physical properties of acoustic scatterers, providing rigorous cross-calibration of the acoustic devices. In addition, the high redundancy of multibeam data, such as is available for some datasets, will allow us to develop data processing techniques leading to quantitative estimates of water column gas seeps. Each of the datasets has supporting ground-truthing data (underwater videos and photos, physical oceanography measurements) which provide information on the origin and chemistry of the seep content. This is of primary importance when assessing the physical properties of water column scatterers from backscatter acoustic measurements.
Fluid Lensing based Machine Learning for Augmenting Earth Science Coral Datasets
NASA Astrophysics Data System (ADS)
Li, A.; Instrella, R.; Chirayath, V.
2016-12-01
Recently, there has been increased interest in monitoring the effects of climate change upon the world's marine ecosystems, particularly coral reefs. These delicate ecosystems are especially threatened due to their sensitivity to ocean warming and acidification, leading to unprecedented levels of coral bleaching and die-off in recent years. However, current global aquatic remote sensing datasets are unable to quantify changes in marine ecosystems at spatial and temporal scales relevant to their growth. In this project, we employ various supervised and unsupervised machine learning algorithms to augment existing datasets from NASA's Earth Observing System (EOS), using high resolution airborne imagery. This method utilizes NASA's ongoing airborne campaigns as well as its spaceborne assets to collect remote sensing data over these afflicted regions, and employs Fluid Lensing algorithms to resolve optical distortions caused by the fluid surface, producing cm-scale resolution imagery of these diverse ecosystems from airborne platforms. Support Vector Machines (SVMs) and K-means clustering methods were applied to satellite imagery at 0.5 m resolution, producing segmented maps classifying coral based on percent cover and morphology. Compared to a previous study using multidimensional maximum a posteriori (MAP) estimation to separate these features in high resolution airborne datasets, SVMs are able to achieve above 75% accuracy when augmented with existing MAP estimates, while unsupervised methods such as K-means achieve roughly 68% accuracy, verified by manually segmented reference data provided by a marine biologist. This effort thus has broad applications for coastal remote sensing, by helping marine biologists quantify behavioral trends spanning large areas and over longer timescales, and to assess the health of coral reefs worldwide.
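A hedged sketch of the two classification routes mentioned above, using scikit-learn; the array names, feature layout and class definitions are assumptions for illustration only, not the project's actual processing chain.

```python
# Supervised (SVM) and unsupervised (K-means) pixel classification of coral imagery.
import numpy as np
from sklearn.svm import SVC
from sklearn.cluster import KMeans
from sklearn.metrics import accuracy_score

def classify_pixels(image, labels, train_fraction=0.1, n_clusters=4):
    """image: (H, W, B) array of co-registered bands per pixel;
    labels: (H, W) array of manually segmented reference classes."""
    H, W, B = image.shape
    X = image.reshape(-1, B)
    y = labels.reshape(-1)

    # Supervised route: train an SVM on a random subset of labeled pixels.
    rng = np.random.default_rng(0)
    idx = rng.choice(X.shape[0], int(train_fraction * X.shape[0]), replace=False)
    svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X[idx], y[idx])
    svm_map = svm.predict(X).reshape(H, W)
    print("SVM accuracy vs. reference:", accuracy_score(y, svm_map.reshape(-1)))

    # Unsupervised route: K-means clustering of the same pixel features.
    kmeans_map = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X).reshape(H, W)
    return svm_map, kmeans_map
```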
Generalized relative entropies in the classical limit
NASA Astrophysics Data System (ADS)
Kowalski, A. M.; Martin, M. T.; Plastino, A.
2015-03-01
Our protagonists are (i) the Cressie-Read family of divergences (characterized by the parameter γ), (ii) Tsallis' generalized relative entropies (characterized by the parameter q), and, as a particular instance of both, (iii) the Kullback-Leibler (KL) relative entropy. In their normalized versions, we ascertain the equivalence between (i) and (ii). Additionally, we employ these three entropic quantifiers to provide a statistical investigation of the classical limit of a semiclassical model whose properties are well known from a purely dynamical viewpoint. This places us in a good position to assess the appropriateness of our statistical quantifiers for describing involved systems. We compare the behaviour of (i), (ii), and (iii) as one proceeds towards the classical limit, and we determine optimal ranges for γ and/or q. It is shown that the Tsallis quantifier performs better than the KL one for 1.5 < q < 2.5.
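For reference, the standard (unnormalized) forms of these quantifiers, consistent with the limits discussed above but not quoted from the paper, can be written as:

```latex
% Cressie-Read family of divergences, parameter \gamma:
I_\gamma(P \| R) = \frac{1}{\gamma(\gamma+1)} \sum_i p_i \left[ \left( \frac{p_i}{r_i} \right)^{\gamma} - 1 \right]

% Tsallis generalized relative entropy, parameter q:
D_q(P \| R) = \frac{1}{q-1} \sum_i p_i \left[ \left( \frac{p_i}{r_i} \right)^{q-1} - 1 \right]

% Both recover the Kullback-Leibler relative entropy in the appropriate limit:
\lim_{\gamma \to 0} I_\gamma(P \| R) = \lim_{q \to 1} D_q(P \| R) = \sum_i p_i \ln \frac{p_i}{r_i}
```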
NASA Astrophysics Data System (ADS)
Kushwaha, Alok Kumar Singh; Srivastava, Rajeev
2015-09-01
An efficient view-invariant framework for the recognition of human activities from an input video sequence is presented. The proposed framework is composed of three consecutive modules: (i) detecting and locating people by background subtraction, (ii) view-invariant spatiotemporal template creation for different activities, and (iii) template matching for view-invariant activity recognition. The foreground objects present in a scene are extracted using change detection and background modeling. The view-invariant templates are constructed using motion history images and object shape information for different human activities in a video sequence. For matching the spatiotemporal templates of the various activities, moment invariants and the Mahalanobis distance are used. The proposed approach is tested successfully on our own viewpoint dataset, the KTH action recognition dataset, the i3DPost multiview dataset, the MSR viewpoint action dataset, the VideoWeb multiview dataset, and the WVU multiview human action recognition dataset. From the experimental results and analysis over the chosen datasets, it is observed that the proposed framework is robust, flexible, and efficient with respect to multi-view activity recognition and to scale and phase variations.
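A minimal sketch of the template-creation step, assuming foreground masks from background subtraction are accumulated into a motion history image (MHI) and summarized by Hu moments; the MHI update is written out explicitly with numpy rather than taken from the paper, and the duration constant is an assumed value.

```python
# Schematic motion-history-image template construction with OpenCV + numpy.
import cv2
import numpy as np

MHI_DURATION = 30.0   # assumed number of frames a moving pixel stays "hot"

def motion_history_image(frames):
    subtractor = cv2.createBackgroundSubtractorMOG2(detectShadows=False)
    mhi = None
    for t, frame in enumerate(frames, start=1):
        fg = subtractor.apply(frame)                      # foreground mask via background modeling
        moving = fg > 0
        if mhi is None:
            mhi = np.zeros(fg.shape, dtype=np.float32)
        mhi[moving] = t                                   # refresh timestamp where motion occurs
        mhi[~moving & (mhi < t - MHI_DURATION)] = 0       # forget motion older than the duration
    return mhi

def template_features(mhi):
    # Hu moments of the normalized MHI give a compact, scale/rotation-tolerant descriptor.
    norm = cv2.normalize(mhi, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    return cv2.HuMoments(cv2.moments(norm)).flatten()

# Matching: compare template_features(test_mhi) against stored class templates,
# e.g. with scipy.spatial.distance.mahalanobis and a covariance estimated from training data.
```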
Dataset definition for CMS operations and physics analyses
NASA Astrophysics Data System (ADS)
Franzoni, Giovanni; Compact Muon Solenoid Collaboration
2016-04-01
Data recorded at the CMS experiment are funnelled into streams, integrated in the HLT menu, and further organised into a hierarchical structure of primary datasets and secondary datasets/dedicated skims. Datasets are defined according to the final-state particles reconstructed by the high level trigger, the data format and the use case (physics analysis, alignment and calibration, performance studies). During the first LHC run, new workflows were added to this canonical scheme to best exploit the flexibility of the CMS trigger and data acquisition systems. The concepts of data parking and data scouting were introduced to extend the physics reach of CMS, offering the opportunity of defining physics triggers with extremely loose selections (e.g., a dijet resonance trigger collecting data at 1 kHz). In this presentation, we review the evolution of the dataset definition during LHC run I, and we discuss the plans for run II.
Modeling the topography of shallow braided rivers using Structure-from-Motion photogrammetry
NASA Astrophysics Data System (ADS)
Javernick, L.; Brasington, J.; Caruso, B.
2014-05-01
Recent advances in computer vision and image analysis have led to the development of a novel, fully automated photogrammetric method to generate dense 3D point cloud data. This approach, termed Structure-from-Motion or SfM, requires only limited ground control and is ideally suited to imagery obtained from low-cost, non-metric cameras acquired either at close range or using aerial platforms. Terrain models generated using SfM have begun to emerge recently and, with a growing spectrum of software now available, there is an urgent need to provide a robust quality assessment of the data products generated using standard field and computational workflows. To address this demand, we present a detailed error analysis of sub-meter resolution terrain models of two contiguous reaches (1.6 and 1.7 km long) of the braided Ahuriri River, New Zealand, generated using SfM. A six-stage methodology is described, involving: i) hand-held image acquisition from an aerial platform, ii) 3D point cloud extraction and modeling using Agisoft PhotoScan, iii) georeferencing on a redundant network of GPS-surveyed ground-control points, iv) point cloud filtering to reduce computational demand as well as reduce vegetation noise, v) optical bathymetric modeling of inundated areas; and vi) data fusion and surface modeling to generate sub-meter raster terrain models. Bootstrapped geo-registration as well as extensive distributed GPS and sonar-based bathymetric check-data were used to quantify the quality of the models generated after each processing step. The results obtained provide the first quantified analysis of SfM applied to model the complex terrain of a braided river. Results indicate that geo-registration errors of 0.04 m (planar) and 0.10 m (elevation) and vertical surface errors of 0.10 m in non-vegetated areas can be achieved from a dataset of photographs taken at 600 m and 800 m above ground level. These encouraging results suggest that this low-cost, logistically simple method can deliver high quality terrain datasets competitive with those obtained with significantly more expensive laser scanning, and suitable for geomorphic change detection and hydrodynamic modeling.
Agricultural land use alters the seasonality and magnitude of stream metabolism
Streams are active processors of organic carbon; however, spatial and temporal variation in the rates and controls on metabolism are not well quantified in streams draining intensively-farmed landscapes. We present a comprehensive dataset of gross primary production (GPP) and ec...
Genome-wide assessment of differential translations with ribosome profiling data
Xiao, Zhengtao; Zou, Qin; Liu, Yu; Yang, Xuerui
2016-01-01
The closely regulated process of mRNA translation is crucial for precise control of protein abundance and quality. Ribosome profiling, a combination of ribosome foot-printing and RNA deep sequencing, has been used in a large variety of studies to quantify genome-wide mRNA translation. Here, we developed Xtail, an analysis pipeline tailored for ribosome profiling data that comprehensively and accurately identifies differentially translated genes in pairwise comparisons. Applied to simulated and real datasets, Xtail exhibits high sensitivity with minimal false-positive rates, outperforming existing methods in the accuracy of quantifying differential translations. With published ribosome profiling datasets, Xtail not only reveals differentially translated genes that make biological sense, but also uncovers new events of differential translation in human cancer cells upon mTOR signalling perturbation and in human primary macrophages upon interferon gamma (IFN-γ) treatment. This demonstrates the value of Xtail in providing novel insights into the molecular mechanisms that involve translational dysregulation. PMID:27041671
NASA Astrophysics Data System (ADS)
Rehfeld, Kira; Goswami, Bedartha; Marwan, Norbert; Breitenbach, Sebastian; Kurths, Jürgen
2013-04-01
Statistical analysis of dependencies amongst paleoclimate data helps to infer the climatic processes they reflect. Three key challenges have to be addressed, however: (i) the datasets are heterogeneous in time, (ii) they are heterogeneous in space, and (iii) time itself is a variable that needs to be reconstructed, which introduces additional uncertainties. To address these issues in a flexible way, we developed the paleoclimate network framework, inspired by the increasing application of complex networks in climate research. Each node in the paleoclimate network represents a paleoclimate archive and an associated time series. Links between nodes are assigned if their time series are significantly similar. The base of the paleoclimate network is therefore formed by linear and nonlinear estimators for Pearson correlation, mutual information and event synchronization, which quantify similarity from irregularly sampled time series. Age uncertainties are propagated into the final network analysis using time series ensembles that reflect the uncertainty. We discuss how spatial heterogeneity influences the results obtained from network measures, and demonstrate the power of the approach by inferring teleconnection variability of the Asian summer monsoon for the past 1000 years.
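As a toy illustration of similarity estimation from irregularly sampled records (in the spirit of, but not identical to, the published kernel-based estimators), the sketch below weights all sample pairs of two standardized series by a Gaussian kernel in the time lag; the bandwidth heuristic is an assumption.

```python
# Toy Gaussian-kernel correlation estimator for two irregularly sampled time series.
import numpy as np

def kernel_correlation(tx, x, ty, y, bandwidth=None):
    """tx, x and ty, y are the sample times and values of two paleoclimate records."""
    tx, x, ty, y = map(np.asarray, (tx, x, ty, y))
    if bandwidth is None:
        # heuristic: a fraction of the mean sampling interval of the combined record
        bandwidth = 0.25 * np.mean(np.diff(np.sort(np.concatenate([tx, ty]))))
    xs = (x - x.mean()) / x.std()                       # standardize both series
    ys = (y - y.mean()) / y.std()
    dt = tx[:, None] - ty[None, :]                      # all pairwise time lags
    w = np.exp(-0.5 * (dt / bandwidth) ** 2)            # Gaussian weight near zero lag
    return float(np.sum(w * xs[:, None] * ys[None, :]) / np.sum(w))
```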
Funk, Chris; Peterson, Pete; Landsfeld, Martin; Pedreros, Diego; Verdin, James; Shukla, Shraddhanand; Husak, Gregory; Rowland, James; Harrison, Laura; Hoell, Andrew; Michaelsen, Joel
2015-01-01
The Climate Hazards group Infrared Precipitation with Stations (CHIRPS) dataset builds on previous approaches to ‘smart’ interpolation techniques and high resolution, long period of record precipitation estimates based on infrared Cold Cloud Duration (CCD) observations. The algorithm i) is built around a 0.05° climatology that incorporates satellite information to represent sparsely gauged locations, ii) incorporates daily, pentadal, and monthly 1981-present 0.05° CCD-based precipitation estimates, iii) blends station data to produce a preliminary information product with a latency of about 2 days and a final product with an average latency of about 3 weeks, and iv) uses a novel blending procedure incorporating the spatial correlation structure of CCD-estimates to assign interpolation weights. We present the CHIRPS algorithm, global and regional validation results, and show how CHIRPS can be used to quantify the hydrologic impacts of decreasing precipitation and rising air temperatures in the Greater Horn of Africa. Using the Variable Infiltration Capacity model, we show that CHIRPS can support effective hydrologic forecasts and trend analyses in southeastern Ethiopia.
Saini, Sanjay; Zakaria, Nordin; Rambli, Dayang Rohaya Awang; Sulaiman, Suziah
2015-01-01
The high-dimensional search space involved in markerless full-body articulated human motion tracking from multiple-view video sequences has led to a number of solutions based on metaheuristics, the most recent form of which is Particle Swarm Optimization (PSO). However, the classical PSO suffers from premature convergence and is easily trapped in local optima, significantly affecting the tracking accuracy. To overcome these drawbacks, we have developed a method for the problem based on Hierarchical Multi-Swarm Cooperative Particle Swarm Optimization (H-MCPSO). The tracking problem is formulated as a non-linear 34-dimensional function optimization problem where the fitness function quantifies the difference between the observed image and a projection of the model configuration. Both the silhouette and edge likelihoods are used in the fitness function. Experiments using the Brown and HumanEva-II datasets demonstrated that H-MCPSO performance is better than that of two leading alternative approaches: the Annealed Particle Filter (APF) and Hierarchical Particle Swarm Optimization (HPSO). Further, the proposed tracking method is capable of automatic initialization and self-recovery from temporary tracking failures. Comprehensive experimental results are presented to support the claims.
2016-01-01
One of the most celebrated findings in complex systems in the last decade is that different indexes y (e.g. patents) scale nonlinearly with the population x of the cities in which they appear, i.e., y ∼ x^β with β ≠ 1. More recently, the generality of this finding has been questioned in studies that used new databases and different definitions of city boundaries. In this paper, we investigate the existence of nonlinear scaling using a probabilistic framework in which fluctuations are accounted for explicitly. In particular, we show that this allows us not only to (i) estimate β and confidence intervals, but also to (ii) quantify the evidence in favour of β ≠ 1 and (iii) test the hypothesis that the observations are compatible with the nonlinear scaling. We employ this framework to compare five different models to 15 different datasets and we find that the answers to points (i)-(iii) crucially depend on the fluctuations contained in the data, on how they are modelled, and on the fact that the city sizes are heavy-tailed distributed. PMID:27493764
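For contrast, the naive log-log regression estimate of β, with a confidence interval used to judge β ≠ 1, fits in a few lines; the paper's point is precisely that this simple treatment ignores fluctuations and heavy-tailed city sizes, so the sketch below is the baseline being questioned rather than the proposed probabilistic framework.

```python
# Naive scaling-exponent estimate via ordinary least squares on log-transformed data.
import numpy as np
from scipy import stats

def fit_beta(population, index_values):
    """Return the OLS estimate of beta and its 95% confidence interval."""
    logx, logy = np.log(population), np.log(index_values)
    res = stats.linregress(logx, logy)
    half_width = stats.t.ppf(0.975, len(logx) - 2) * res.stderr
    return res.slope, (res.slope - half_width, res.slope + half_width)

# beta, (lo, hi) = fit_beta(pop, patents)
# apparently_nonlinear = not (lo <= 1.0 <= hi)   # the conclusion the paper re-examines
```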
NASA Astrophysics Data System (ADS)
Malczewski, Jacek; Rinner, Claus
2005-06-01
Commonly used GIS combination operators such as Boolean conjunction/disjunction and weighted linear combination can be generalized to the ordered weighted averaging (OWA) family of operators. This multicriteria evaluation method allows decision-makers to define a decision strategy on a continuum between pessimistic and optimistic strategies. Recently, OWA has been introduced to GIS-based decision support systems. We propose to extend a previous implementation of OWA with linguistic quantifiers to simplify the definition of decision strategies and to facilitate an exploratory analysis of multiple criteria. The linguistic quantifier-guided OWA procedure is illustrated using a dataset for evaluating residential quality of neighborhoods in London, Ontario.
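A minimal sketch of quantifier-guided OWA, assuming Yager's regular increasing monotone (RIM) quantifier Q(r) = r^α to generate the order weights; the criterion values and α settings in the usage comment are illustrative, not drawn from the London, Ontario dataset.

```python
# Quantifier-guided ordered weighted averaging (OWA) for one evaluation unit.
import numpy as np

def owa(scores, alpha=1.0):
    """Ordered weighted average of standardized criterion scores.

    Small alpha pushes the operator toward the optimistic (OR-like, "at least one
    criterion") end of the decision-strategy continuum; large alpha toward the
    pessimistic (AND-like, "all criteria") end; alpha = 1 is the simple average.
    """
    s = np.sort(np.asarray(scores, dtype=float))[::-1]    # scores in descending order
    n = len(s)
    r = np.arange(n + 1) / n
    weights = r[1:] ** alpha - r[:-1] ** alpha            # w_i = Q(i/n) - Q((i-1)/n)
    return float(np.dot(weights, s))

# Example: one neighborhood's three criteria under two strategies.
# owa([0.8, 0.4, 0.6], alpha=5.0)   # pessimistic, dominated by the weakest criterion
# owa([0.8, 0.4, 0.6], alpha=0.2)   # optimistic, dominated by the strongest criterion
```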
Lu, Yongtao; Boudiffa, Maya; Dall'Ara, Enrico; Bellantuono, Ilaria; Viceconti, Marco
2016-07-05
In vivo micro-computed tomography (µCT) scanning of small rodents is a powerful method for longitudinal monitoring of bone adaptation. However, the life-long bone growth in small rodents makes it a challenge to quantify local bone adaptation. Therefore, the aim of this study was to develop a protocol, which can take large bone growth into account, to quantify local bone adaptations over space and time. The entire right tibiae of eight 14-week-old C57BL/6J female mice were consecutively scanned four times in an in vivo µCT scanner using a nominal isotropic image voxel size of 10.4 µm. The repeated scan image datasets were aligned to the corresponding baseline (first) scan image dataset using rigid registration. 80% of the tibia length (starting from the endpoint of the proximal growth plate) was selected as the volume of interest and partitioned into 40 regions along the tibial long axis (10 divisions) and in the cross-section (4 sectors). The bone mineral content (BMC) was used to quantify bone adaptation and was calculated in each region. All local BMCs have precision errors (PE%CV) of less than 3.5% (24 out of 40 regions have PE%CV of less than 2%), least significant changes (LSCs) of less than 3.8%, and 38 out of 40 regions have intraclass correlation coefficients (ICCs) of over 0.8. The proposed protocol allows local bone adaptations over an entire tibia to be quantified in longitudinal studies with high reproducibility, an essential requirement for reducing the number of animals needed to achieve the necessary statistical power. Copyright © 2016 The Authors. Published by Elsevier Ltd. All rights reserved.
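The regional partitioning could be sketched as follows, assuming a calibrated density volume aligned with the tibial long axis; the variable names, density calibration and sector convention are assumptions for illustration, not the study's exact implementation.

```python
# Schematic 10 (longitudinal) x 4 (angular sector) regional bone mineral content summary.
import numpy as np

def regional_bmc(volume, voxel_volume_mm3, z_start, z_end, centroid_xy, threshold):
    """volume: 3-D array of calibrated mineral density (assumed mgHA/cm^3), z = long axis."""
    regions = np.zeros((10, 4))
    z_edges = np.linspace(z_start, z_end, 11).astype(int)      # 10 longitudinal divisions
    cx, cy = centroid_xy
    for zi in range(10):
        block = volume[:, :, z_edges[zi]:z_edges[zi + 1]]
        xs, ys, zs = np.nonzero(block > threshold)             # bone voxels only
        angles = np.arctan2(ys - cy, xs - cx)                   # angular position about the centroid
        sectors = ((angles + np.pi) // (np.pi / 2)).astype(int).clip(0, 3)  # 4 quadrant sectors
        densities = block[xs, ys, zs]
        for si in range(4):
            # BMC = sum of density * voxel volume, converted mgHA/cm^3 * mm^3 -> mg
            regions[zi, si] = densities[sectors == si].sum() * voxel_volume_mm3 * 1e-3
    return regions   # mg of mineral per region under the assumed calibration
```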
Mapping 2000-2010 Impervious Surface Change in India Using Global Land Survey Landsat Data
NASA Technical Reports Server (NTRS)
Wang, Panshi; Huang, Chengquan; Brown De Colstoun, Eric C.
2017-01-01
Understanding and monitoring the environmental impacts of global urbanization requires better urban datasets. Continuous-field impervious surface change (ISC) mapping using Landsat data is an effective way to quantify the spatiotemporal dynamics of urbanization. It is well acknowledged that Landsat-based estimation of impervious surface is subject to seasonal and phenological variations. The overall goal of this paper is to map 2000-2010 ISC for India using Global Land Survey datasets and training data only available for 2010. To this end, a method was developed that could transfer the regression tree model developed for mapping 2010 impervious surface to 2000 using an iterative training and prediction (ITP) approach. An independent validation dataset was also developed using Google Earth imagery. Based on the reference ISC from the validation dataset, the RMSE of the predicted ISC was estimated to be 18.4%. At 95% confidence, the total estimated ISC for India between 2000 and 2010 is 2274.62 +/- 7.84 sq km.
NASA Astrophysics Data System (ADS)
Flaounas, Emmanouil; Drobinski, Philippe; Borga, Marco; Calvet, Jean-Christophe; Delrieu, Guy; Morin, Efrat; Tartari, Gianni; Toffolon, Roberta
2012-06-01
This letter assesses the quality of daily temperature and rainfall retrievals of the European Climate Assessment and Dataset (ECA&D) with respect to measurements collected locally in various parts of the Euro-Mediterranean region in the framework of the Hydrological Cycle in the Mediterranean Experiment (HyMeX), endorsed by the Global Energy and Water Cycle Experiment (GEWEX) of the World Climate Research Program (WCRP). The ECA&D, among other gridded datasets, is very often used as a reference for model calibration and evaluation. This is for instance the case in the context of the WCRP Coordinated Regional Downscaling Experiment (CORDEX) and its Mediterranean component, MED-CORDEX. This letter quantifies ECA&D dataset uncertainties associated with temperature and precipitation intra-seasonal variability, seasonal distribution and extremes. Our motivation is to aid the interpretation of results when downscaling models are validated or calibrated against the ECA&D dataset in the context of regional climate research in the Euro-Mediterranean region.
Small sample sizes in the study of ontogenetic allometry; implications for palaeobiology
Vavrek, Matthew J.
2015-01-01
Quantitative morphometric analyses, particularly ontogenetic allometry, are common methods used in quantifying shape, and changes therein, in both extinct and extant organisms. Due to incompleteness and the potential for restricted sample sizes in the fossil record, palaeobiological analyses of allometry may encounter higher rates of error. Differences in sample size between fossil and extant studies and any resulting effects on allometric analyses have not been thoroughly investigated, and a logical lower threshold to sample size is not clear. Here we show that studies based on fossil datasets have smaller sample sizes than those based on extant taxa. A similar pattern between vertebrates and invertebrates indicates this is not a problem unique to either group, but common to both. We investigate the relationship between sample size, ontogenetic allometric relationship and statistical power using an empirical dataset of skull measurements of modern Alligator mississippiensis. Across a variety of subsampling techniques, used to simulate different taphonomic and/or sampling effects, smaller sample sizes gave less reliable and more variable results, often with the result that allometric relationships will go undetected due to Type II error (failure to reject the null hypothesis). This may result in a false impression of fewer instances of positive/negative allometric growth in fossils compared to living organisms. These limitations are not restricted to fossil data and are equally applicable to allometric analyses of rare extant taxa. No mathematically derived minimum sample size for ontogenetic allometric studies is found; rather results of isometry (but not necessarily allometry) should not be viewed with confidence at small sample sizes. PMID:25780770
Wang, Yan; Lin, Bo
2012-01-01
It is unclear whether the new anti-catabolic agent denosumab represents a viable alternative to the widely used anti-catabolic agent pamidronate in the treatment of Multiple Myeloma (MM)-induced bone disease. This lack of clarity primarily stems from the lack of sufficient clinical investigations, which are costly and time consuming. However, in silico investigations require less time and expense, suggesting that they may be a useful complement to traditional clinical investigations. In this paper, we aim to (i) develop integrated computational models that are suitable for investigating the effects of pamidronate and denosumab on MM-induced bone disease and (ii) evaluate the responses to pamidronate and denosumab treatments using these integrated models. To achieve these goals, pharmacokinetic models of pamidronate and denosumab are first developed and then calibrated and validated using different clinical datasets. Next, the integrated computational models are developed by incorporating the simulated transient concentrations of pamidronate and denosumab and simulations of their actions on the MM-bone compartment into the previously proposed MM-bone model. These integrated models are further calibrated and validated by different clinical datasets so that they are suitable to be applied to investigate the responses to the pamidronate and denosumab treatments. Finally, these responses are evaluated by quantifying the bone volume, bone turnover, and MM-cell density. This evaluation identifies four denosumab regimes that potentially produce an overall improved bone-related response compared with the recommended pamidronate regime. This in silico investigation supports the idea that denosumab represents an appropriate alternative to pamidronate in the treatment of MM-induced bone disease. PMID:23028650
A comparison of cover pole with standard vegetation monitoring methods
USDA-ARS?s Scientific Manuscript database
The ability of resource managers to make informed decisions regarding wildlife habitat could be improved with the use of existing datasets and the use of cost effective, standardized methods to simultaneously quantify vertical and horizontal cover. The objectives of this study were to (1) characteri...
NASA Astrophysics Data System (ADS)
Quinn, Paul; Jonczyk, Jennine; Owen, Gareth; Barber, Nick; Adams, Russell; ODonnell, Greg; EdenDTC Team
2015-04-01
The process insights afforded to catchment scientists through the availability of high frequency time series of hydrological and nutrient pollution datasets are invaluable. However, the observations reveal both good and bad news for the WFD. Data for flow, N, P and sediment (taken at 30 min intervals) from the River Eden Demonstration Test Catchment and several other detailed UK studies will be used to discuss nutrient fluxes in catchments between 1 km2 and 10 km2. Monitoring of the seasonal groundwater status and the forensic analysis of numerous storm events have identified dominant flow pathways and nutrient losses. Nonetheless, many of the management questions demanded by the WFD will not be resolved by collecting these datasets alone. Long-term trends are unlikely to be determined from these data, and even if trends are found they are unlikely to be accurately apportioned to the activities that have caused them. The impacts of where and when an action takes place will not be detected at the catchment scale, and the cost effectiveness of any mitigation method is unlikely to be quantifiable. Even in small, well-instrumented catchments, the natural variability in rainfall, antecedent patterns and the variability in farming practices will mask any identifiable catchment-scale signal. This does not mean the cost of the data acquisition has been wasted; it just means that the knowledge and expertise gained from these data should be used in new, novel ways. It will always be difficult to quantify the actual losses occurring at the farm or field scale, but the positive benefits of any mitigation may still be approximated. The evidence for the rate of nutrient removal from a local sediment trap, a wetland and a pond can be shown with high resolution datasets. However, any quantifiable results are still highly localised, and the transfer and upscaling of any findings must be done with care. Modelling these datasets is also possible, and the nature of the models has evolved in the light of improved data, particularly in the representation of storm-driven flow pathways. Hence the aggregation and the impact of any management or mitigation will rely on having confidence that local activities are beneficial and that a basket of measures merits pursuing and is worthy of funding. A novel set of data-driven risk-based indices, impact models and new experiments are needed to show the worth of catchment-scale management. The high frequency data have been useful to build knowledge, but a quantifiable cause and effect remains an elusive goal at the catchment scale.
A Large-Scale Assessment of Nucleic Acids Binding Site Prediction Programs
Miao, Zhichao; Westhof, Eric
2015-01-01
Computational prediction of nucleic acid binding sites in proteins are necessary to disentangle functional mechanisms in most biological processes and to explore the binding mechanisms. Several strategies have been proposed, but the state-of-the-art approaches display a great diversity in i) the definition of nucleic acid binding sites; ii) the training and test datasets; iii) the algorithmic methods for the prediction strategies; iv) the performance measures and v) the distribution and availability of the prediction programs. Here we report a large-scale assessment of 19 web servers and 3 stand-alone programs on 41 datasets including more than 5000 proteins derived from 3D structures of protein-nucleic acid complexes. Well-defined binary assessment criteria (specificity, sensitivity, precision, accuracy…) are applied. We found that i) the tools have been greatly improved over the years; ii) some of the approaches suffer from theoretical defects and there is still room for sorting out the essential mechanisms of binding; iii) RNA binding and DNA binding appear to follow similar driving forces and iv) dataset bias may exist in some methods. PMID:26681179
Microbial Community Profiling of Human Saliva Using Shotgun Metagenomic Sequencing
Hasan, Nur A.; Young, Brian A.; Minard-Smith, Angela T.; Saeed, Kelly; Li, Huai; Heizer, Esley M.; McMillan, Nancy J.; Isom, Richard; Abdullah, Abdul Shakur; Bornman, Daniel M.; Faith, Seth A.; Choi, Seon Young; Dickens, Michael L.; Cebula, Thomas A.; Colwell, Rita R.
2014-01-01
Human saliva is clinically informative of both oral and general health. Since next generation shotgun sequencing (NGS) is now widely used to identify and quantify bacteria, we investigated the bacterial flora of saliva microbiomes of two healthy volunteers and five datasets from the Human Microbiome Project, along with a control dataset containing short NGS reads from bacterial species representative of the bacterial flora of human saliva. GENIUS, a system designed to identify and quantify bacterial species using unassembled short NGS reads, was used to identify the bacterial species comprising the microbiomes of the saliva samples and datasets. Results, achieved within minutes and at greater than 90% accuracy, showed that more than 175 bacterial species comprised the bacterial flora of human saliva, including bacteria known to be commensal human flora but also Haemophilus influenzae, Neisseria meningitidis, Streptococcus pneumoniae, and Gamma proteobacteria. Basic Local Alignment Search Tool (BLASTn) analysis performed in parallel reported ca. five times more species than actually comprised the in silico sample. Both GENIUS and BLAST analyses of the saliva samples identified the major genera comprising the bacterial flora of saliva, but GENIUS provided a more precise description of species composition, identifying to the strain level in most cases, and delivered results at least 10,000 times faster. Therefore, GENIUS offers a facile and accurate system for identification and quantification of bacterial species and/or strains in metagenomic samples. PMID:24846174
Expected results and outputs include: extensive dataset of in-field and laboratory emissions data for traditional and improved cookstoves; parameterization to predict cookstove emissions from drive cycle data; indoor and personal exposure data for traditional and improved cook...
Natural and human-related landscape features influence the ecology and water quality within lakes. It is critical, therefore, to quantify landscape features in a hydrologically meaningful way to effectively manage these important ecosystems. Such summaries of the landscape are of...
Alaska national hydrography dataset positional accuracy assessment study
Arundel, Samantha; Yamamoto, Kristina H.; Constance, Eric; Mantey, Kim; Vinyard-Houx, Jeremy
2013-01-01
Initial visual assessments showed a wide range in the quality of fit between features in the NHD and these new image sources. No statistical analysis has been performed to actually quantify accuracy. Determining absolute accuracy is cost-prohibitive (it requires collecting independent, well-defined test points), but quantitative analysis of relative positional error is feasible.
We used a gradient (divided into impervious cover categories), spatially-balanced, random design (1) to sample streams along an impervious cover gradient in a large coastal watershed, (2) to characterize relationships between water chemistry and land cover, and (3) to document di...
NASA Astrophysics Data System (ADS)
Dafflon, B.; Hubbard, S. S.; Ulrich, C.; Peterson, J. E.; Wu, Y.; Wainwright, H. M.; Gangodagamage, C.; Kholodov, A. L.; Kneafsey, T. J.
2013-12-01
Improvement in parameterizing Arctic process-rich terrestrial models to simulate feedbacks to a changing climate requires advances in estimating the spatiotemporal variations in active layer and permafrost properties - in sufficiently high resolution yet over modeling-relevant scales. As part of the DOE Next-Generation Ecosystem Experiments (NGEE-Arctic), we are developing advanced strategies for imaging the subsurface and for investigating land and subsurface co-variability and dynamics. Our studies include acquisition and integration of various measurements, including point-based, surface-based geophysical, and remote sensing datasets. These data have been collected during a series of campaigns at the NGEE Barrow, AK site along transects that traverse a range of hydrological and geomorphological conditions, including low- to high-centered polygons and drained thaw lake basins. In this study, we describe the use of galvanic-coupled electrical resistance tomography (ERT), capacitively-coupled resistivity (CCR), permafrost cores, above-ground orthophotography, and digital elevation models (DEMs) to (1) explore the complementary nature and trade-offs between characterization resolution, spatial extent and accuracy of different datasets; (2) develop inversion approaches to quantify permafrost characteristics (such as ice content, ice wedge frequency, and the presence of an unfrozen deep layer); and (3) identify correspondences between permafrost and land surface properties (such as water inundation, topography, and vegetation). In terms of methods, we developed a 1D-based direct search approach to estimate the electrical conductivity distribution while allowing exploration of multiple solutions and prior information in a flexible way. Application of the method to the Barrow datasets reveals the relative information content of each dataset for characterizing permafrost properties, which show variability at length scales ranging from below one meter to large trends extending over more than a kilometer. Further, we used Pole- and Kite-based low-altitude aerial photography with inferred DEMs, as well as a DEM from a LiDAR dataset, to quantify land-surface properties and their co-variability with the subsurface properties. Comparison of the above- and below-ground characterization information indicates that while some permafrost characteristics correspond with changes in hydrogeomorphological expressions, other features show more complex linkages with landscape properties. Overall, our results indicate that remote sensing data, point-scale measurements and surface geophysical measurements enable the identification of regional zones having similar relations between subsurface and land surface properties. Identification of such zonation and associated permafrost-land surface properties can be used to guide investigations of carbon cycling processes and for model parameterization.
Compressive sensing reconstruction of 3D wet refractivity based on GNSS and InSAR observations
NASA Astrophysics Data System (ADS)
Heublein, Marion; Alshawaf, Fadwa; Erdnüß, Bastian; Zhu, Xiao Xiang; Hinz, Stefan
2018-06-01
In this work, the reconstruction quality of an approach for neutrospheric water vapor tomography based on Slant Wet Delays (SWDs) obtained from Global Navigation Satellite Systems (GNSS) and Interferometric Synthetic Aperture Radar (InSAR) is investigated. The novelties of this approach are (1) the use of both absolute GNSS and absolute InSAR SWDs for tomography and (2) the solution of the tomographic system by means of compressive sensing (CS). The tomographic reconstruction is performed based on (i) a synthetic SWD dataset generated using wet refractivity information from the Weather Research and Forecasting (WRF) model and (ii) a real dataset using GNSS and InSAR SWDs. Thus, the validation of the achieved results focuses (i) on a comparison of the refractivity estimates with the input WRF refractivities and (ii) on radiosonde profiles. In case of the synthetic dataset, the results show that the CS approach yields a more accurate and more precise solution than least squares (LSQ). In addition, the benefit of adding synthetic InSAR SWDs into the tomographic system is analyzed. When applying CS, adding synthetic InSAR SWDs into the tomographic system improves the solution both in magnitude and in scattering. When solving the tomographic system by means of LSQ, no clear behavior is observed. In case of the real dataset, the estimated refractivities of both methodologies show a consistent behavior although the LSQ and CS solution strategies differ.
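As a generic illustration of the CS solution strategy (not the authors' implementation), the sketch below solves a small L1-regularized least-squares problem with iterative soft thresholding; in the tomography setting, A would hold the slant-path integration weights, b the GNSS/InSAR slant wet delays, and x the (sparsely represented) wet refractivity coefficients, but everything here is generic and assumed.

```python
# Toy compressive-sensing reconstruction: minimize 0.5*||A x - b||^2 + lam*||x||_1
# with the iterative soft-thresholding algorithm (ISTA).
import numpy as np

def ista(A, b, lam=0.1, n_iter=500):
    L = np.linalg.norm(A, 2) ** 2                # Lipschitz constant of the quadratic term's gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)                 # gradient of 0.5*||A x - b||^2
        z = x - grad / L
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)   # soft-thresholding step
    return x

# Usage on a random underdetermined system with a sparse ground truth:
# rng = np.random.default_rng(1)
# A = rng.standard_normal((50, 200))
# x_true = np.zeros(200); x_true[:5] = 1.0
# b = A @ x_true
# x_hat = ista(A, b, lam=0.01)                   # recovers an approximately sparse solution
```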
Impact of survey workflow on precision and accuracy of terrestrial LiDAR datasets
NASA Astrophysics Data System (ADS)
Gold, P. O.; Cowgill, E.; Kreylos, O.
2009-12-01
Ground-based LiDAR (Light Detection and Ranging) survey techniques are enabling remote visualization and quantitative analysis of geologic features at unprecedented levels of detail. For example, digital terrain models computed from LiDAR data have been used to measure displaced landforms along active faults and to quantify fault-surface roughness. But how accurately do terrestrial LiDAR data represent the true ground surface, and in particular, how internally consistent and precise are the mosaiced LiDAR datasets from which surface models are constructed? Addressing this question is essential for designing survey workflows that capture the necessary level of accuracy for a given project while minimizing survey time and equipment, which is essential for effective surveying of remote sites. To address this problem, we seek to define a metric that quantifies how scan registration error changes as a function of survey workflow. Specifically, we are using a Trimble GX3D laser scanner to conduct a series of experimental surveys to quantify how common variables in field workflows impact the precision of scan registration. Primary variables we are testing include 1) use of an independently measured network of control points to locate scanner and target positions, 2) the number of known-point locations used to place the scanner and point clouds in 3-D space, 3) the type of target used to measure distances between the scanner and the known points, and 4) setting up the scanner over a known point as opposed to resectioning of known points. Precision of the registered point cloud is quantified using Trimble Realworks software by automatic calculation of registration errors (errors between locations of the same known points in different scans). Accuracy of the registered cloud (i.e., its ground-truth) will be measured in subsequent experiments. To obtain an independent measure of scan-registration errors and to better visualize the effects of these errors on a registered point cloud, we scan from multiple locations an object of known geometry (a cylinder mounted above a square box). Preliminary results show that even in a controlled experimental scan of an object of known dimensions, there is significant variability in the precision of the registered point cloud. For example, when 3 scans of the central object are registered using 4 known points (maximum time, maximum equipment), the point clouds align to within ~1 cm (normal to the object surface). However, when the same point clouds are registered with only 1 known point (minimum time, minimum equipment), misalignment of the point clouds can range from 2.5 to 5 cm, depending on target type. The greater misalignment of the 3 point clouds when registered with fewer known points stems from the field method employed in acquiring the dataset and demonstrates the impact of field workflow on LiDAR dataset precision. By quantifying the degree of scan mismatch in results such as this, we can provide users with the information needed to maximize efficiency in remote field surveys.
Evaluating Variability and Uncertainty of Geological Strength Index at a Specific Site
NASA Astrophysics Data System (ADS)
Wang, Yu; Aladejare, Adeyemi Emman
2016-09-01
Geological Strength Index (GSI) is an important parameter for estimating rock mass properties. GSI can be estimated from quantitative GSI chart, as an alternative to the direct observational method which requires vast geological experience of rock. GSI chart was developed from past observations and engineering experience, with either empiricism or some theoretical simplifications. The GSI chart thereby contains model uncertainty which arises from its development. The presence of such model uncertainty affects the GSI estimated from GSI chart at a specific site; it is, therefore, imperative to quantify and incorporate the model uncertainty during GSI estimation from the GSI chart. A major challenge for quantifying the GSI chart model uncertainty is a lack of the original datasets that have been used to develop the GSI chart, since the GSI chart was developed from past experience without referring to specific datasets. This paper intends to tackle this problem by developing a Bayesian approach for quantifying the model uncertainty in GSI chart when using it to estimate GSI at a specific site. The model uncertainty in the GSI chart and the inherent spatial variability in GSI are modeled explicitly in the Bayesian approach. The Bayesian approach generates equivalent samples of GSI from the integrated knowledge of GSI chart, prior knowledge and observation data available from site investigation. Equations are derived for the Bayesian approach, and the proposed approach is illustrated using data from a drill and blast tunnel project. The proposed approach effectively tackles the problem of how to quantify the model uncertainty that arises from using GSI chart for characterization of site-specific GSI in a transparent manner.
Quantifying Biomass and Bare Earth Changes from the Hayman Fire Using Multi-temporal Lidar
NASA Astrophysics Data System (ADS)
Stoker, J. M.; Kaufmann, M. R.; Greenlee, S. K.
2007-12-01
Small-footprint multiple-return lidar data collected in the Cheesman Lake property prior to the 2002 Hayman fire in Colorado provided an excellent opportunity to evaluate Lidar as a tool to predict and analyze fire effects on both soil erosion and overstory structure. Re-measuring this area and applying change detection techniques allowed for analyses at a high level of detail. Our primary objectives focused on the use of change detection techniques using multi-temporal lidar data to: (1) evaluate the effectiveness of change detection to identify and quantify areas of erosion or deposition caused by post-fire rain events and rehab activities; (2) identify and quantify areas of biomass loss or forest structure change due to the Hayman fire; and (3) examine effects of pre-fire fuels and vegetation structure derived from lidar data on patterns of burn severity. While we were successful in identifying areas where changes occurred, the original error bounds on the variation in actual elevations made it difficult, if not misleading to quantify volumes of material changed on a per pixel basis. In order to minimize these variations in the two datasets, we investigated several correction and co-registration methodologies. The lessons learned from this project highlight the need for a high level of flight planning and understanding of errors in a lidar dataset in order to correctly estimate and report quantities of vertical change. Directly measuring vertical change using only lidar without ancillary information can provide errors that could make quantifications confusing, especially in areas with steep slopes.
Study Quantifies Physical Demands of Yoga in Seniors
A recent NCCAM-funded study measured the ... performance of seven standing poses commonly taught in senior yoga classes: Chair, Wall Plank, Tree, Warrior II, ...
Evaluating new SMAP soil moisture for drought monitoring in the rangelands of the US High Plains
Velpuri, Naga Manohar; Senay, Gabriel B.; Morisette, Jeffrey T.
2016-01-01
Level 3 soil moisture datasets from the recently launched Soil Moisture Active Passive (SMAP) satellite are evaluated for drought monitoring in rangelands. Validation of SMAP soil moisture (SSM) with in situ and modeled estimates showed a high level of agreement. SSM showed the highest correlation with surface soil moisture (0-5 cm) and a strong correlation to depths up to 20 cm. SSM showed a reliable and expected response in capturing seasonal dynamics in relation to precipitation, land surface temperature, and evapotranspiration. Further evaluation using multi-year SMAP datasets is necessary to quantify the full benefits and limitations for drought monitoring in rangelands.
Clock Agreement Among Parallel Supercomputer Nodes
Jones, Terry R.; Koenig, Gregory A.
2014-04-30
This dataset presents measurements that quantify the clock synchronization time-agreement characteristics among several high performance computers including the current world's most powerful machine for open science, the U.S. Department of Energy's Titan machine sited at Oak Ridge National Laboratory. These ultra-fast machines derive much of their computational capability from extreme node counts (over 18000 nodes in the case of the Titan machine). Time-agreement is commonly utilized by parallel programming applications and tools, distributed programming application and tools, and system software. Our time-agreement measurements detail the degree of time variance between nodes and how that variance changes over time. The dataset includes empirical measurements and the accompanying spreadsheets.
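A minimal sketch of the kind of pairwise time-agreement estimate such measurements rely on, using the standard NTP-style offset and delay formulas for a single request/response exchange between two nodes; the timestamps are hypothetical and this is not the instrumentation used on Titan.

```python
def clock_offset_and_delay(t0, t1, t2, t3):
    """NTP-style estimate from one request/response exchange:
    t0 = client send, t1 = server receive, t2 = server send, t3 = client receive."""
    offset = ((t1 - t0) + (t2 - t3)) / 2.0   # server clock minus client clock
    delay = (t3 - t0) - (t2 - t1)            # round-trip network delay
    return offset, delay

# Hypothetical timestamps (seconds) from one exchange between two nodes
print(clock_offset_and_delay(100.000000, 100.000870, 100.000875, 100.001020))
```

Repeating such exchanges over time between node pairs yields the drift and variance statistics the dataset describes.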
ERIC Educational Resources Information Center
Livingstone, Sonia; Cagiltay, Kursat; Ólafsson, Kjartan
2015-01-01
In the EU Kids Online II project, data were collected from children and parents via in-home face-to-face interviews in 25 European countries to examine children's Internet use, activities and skills, the risk of harm they encountered, parental awareness, and safety strategies regarding children's Internet use and risks. The project provides…
Thomas, David; Finan, Chris; Newport, Melanie J; Jones, Susan
2015-10-01
The complexity of DNA can be quantified using estimates of entropy. Variation in DNA complexity is expected between the promoters of genes with different transcriptional mechanisms; namely housekeeping (HK) and tissue specific (TS). The former are transcribed constitutively to maintain general cellular functions, and the latter are transcribed in restricted tissue and cell types for specific molecular events. It is known that promoter features in the human genome are related to tissue specificity, but this has been difficult to quantify on a genomic scale. If entropy effectively quantifies DNA complexity, calculating the entropies of HK and TS gene promoters as profiles may reveal significant differences. Entropy profiles were calculated for a total dataset of 12,003 human gene promoters and for 501 housekeeping (HK) and 587 tissue specific (TS) human gene promoters. The mean profiles show the TS promoters have a significantly lower entropy (p<2.2e-16) than HK gene promoters. The entropy distributions for the 3 datasets show that promoter entropies could be used to identify novel HK genes. Functional features comprise DNA sequence patterns that are non-random and hence they have lower entropies. The lower entropy of TS gene promoters can be explained by a higher density of positive and negative regulatory elements, required for genes with complex spatial and temporal expression. Copyright © 2015 Elsevier Ltd. All rights reserved.
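A minimal sketch, assuming a simple per-window Shannon entropy of nucleotide composition (the paper's exact entropy estimator may differ, e.g. it could use k-mers), of how an entropy profile can be computed along a promoter sequence; the sequence shown is hypothetical.

```python
import math
from collections import Counter

def entropy_profile(seq, window=50, step=10):
    """Shannon entropy (bits) of nucleotide composition in sliding windows."""
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        counts = Counter(seq[start:start + window])
        total = sum(counts.values())
        h = -sum((c / total) * math.log2(c / total) for c in counts.values())
        profile.append(h)
    return profile

# Hypothetical promoter fragment; a real analysis would loop over the 12,003 promoters
promoter = "ATGCGCGCGCTATATATATAGGCCGGCCATATTTTCGCGATATAGCGCGCTATATAAGGC" * 3
print(entropy_profile(promoter))
```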
Chappell, Nick A; Jones, Timothy D; Tych, Wlodek
2017-10-15
Insufficient temporal monitoring of water quality in streams or engineered drains alters the apparent shape of storm chemographs, resulting in shifted model parameterisations and changed interpretations of solute sources that have produced episodes of poor water quality. This so-called 'aliasing' phenomenon is poorly recognised in water research. Using advances in in-situ sensor technology it is now possible to monitor sufficiently frequently to avoid the onset of aliasing. A systems modelling procedure is presented allowing objective identification of sampling rates needed to avoid aliasing within strongly rainfall-driven chemical dynamics. In this study aliasing of storm chemograph shapes was quantified by changes in the time constant parameter (TC) of transfer functions. As a proportion of the original TC, the onset of aliasing varied between watersheds, ranging from 3.9-7.7 to 54-79 %TC (or 110-160 to 300-600 min). However, a minimum monitoring rate could be identified for all datasets if the modelling results were presented in the form of a new statistic, ΔTC. For the eight H+, DOC and NO3-N datasets examined from a range of watershed settings, an empirically-derived threshold of 1.3(ΔTC) could be used to quantify minimum monitoring rates within sampling protocols to avoid artefacts in subsequent data analysis. Copyright © 2017 The Authors. Published by Elsevier Ltd. All rights reserved.
Li, Zhao-Liang
2018-01-01
Few studies have examined hyperspectral remote-sensing image classification with type-II fuzzy sets. This paper addresses image classification based on a hyperspectral remote-sensing technique using an improved interval type-II fuzzy c-means (IT2FCM*) approach. In this study, in contrast to other traditional fuzzy c-means-based approaches, the IT2FCM* algorithm considers the ranking of interval numbers and the spectral uncertainty. The classification results based on a hyperspectral dataset using the FCM, IT2FCM, and the proposed improved IT2FCM* algorithms show that the IT2FCM* method delivers the best performance in terms of clustering accuracy. In this paper, in order to validate and demonstrate the separability of the IT2FCM*, four type-I fuzzy validity indexes are employed, and a comparative analysis of these fuzzy validity indexes applied to the FCM and IT2FCM methods is also made. These four indexes are also applied to datasets of different spatial and spectral resolution to analyze the effects of spectral and spatial scaling factors on the separability of the FCM, IT2FCM, and IT2FCM* methods. The results of these validity indexes from the hyperspectral datasets show that the improved IT2FCM* algorithm has the best values among these three algorithms in general. The results demonstrate that the IT2FCM* exhibits good performance in hyperspectral remote-sensing image classification because of its ability to handle hyperspectral uncertainty. PMID:29373548
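A minimal sketch of the standard (type-I) fuzzy c-means updates that IT2FCM* builds on; the interval type-II extension (interval memberships from two fuzzifiers and ranking of interval numbers) is not implemented here, and the pixel spectra are synthetic.

```python
import numpy as np

def fuzzy_c_means(X, n_clusters=3, m=2.0, n_iter=50, seed=0):
    """Standard (type-I) fuzzy c-means; IT2FCM* additionally maintains
    interval-valued memberships derived from two fuzzifier values."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]            # weighted centroids
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        U = 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** (2 / (m - 1)), axis=2)
    return centers, U

# Hypothetical 4-band pixel spectra
X = np.random.default_rng(1).random((200, 4))
centers, memberships = fuzzy_c_means(X, n_clusters=3)
```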
Competition amplifies drought stress in forests across broad climatic and compositional gradients
Kelly E. Gleason; John B. Bradford; Alessandra Bottero; Anthony W. D'Amato; Shawn Fraver; Brian J. Palik; Michael A. Battaglia; Louis Iverson; Laura Kenefic; Christel C. Kern
2017-01-01
Forests around the world are experiencing increasingly severe droughts and elevated competitive intensity due to increased tree density. However, the influence of interactions between drought and competition on forest growth remains poorly understood. Using a unique dataset of stand-scale dendrochronology sampled from 6405 trees, we quantified how annual growth of...
Relationships between harvest of American ginseng and hardwood timber production
Stephen P. Prisley; James Chamberlain; Michael McGuffin
2012-01-01
The goal of this research was to quantify the relationship between American ginseng (Panax quinquefolius) and timber inventory and harvest. This was done through compilation and analysis of county-level data from public datasets: ginseng harvest data from U.S. Fish and Wildlife Service, US Forest Service (USFS) forest inventory and analysis (FIA)...
Collections of publicly available secondary data used to develop the conclusions described in the journal article. This dataset is associated with the following publication: Faulkner, B., S. Leibowitz, T. Canfield, and J. Groves. Quantifying groundwater dependency of riparian surface hydrologic features using the exit gradient. Hydrological Processes. John Wiley & Sons, Ltd., Indianapolis, IN, USA, 1-11, (2015).
Statistical analysis of QC data and estimation of fuel rod behaviour
NASA Astrophysics Data System (ADS)
Heins, L.; Groß, H.; Nissen, K.; Wunderlich, F.
1991-02-01
The behaviour of fuel rods while in reactor is influenced by many parameters. As far as fabrication is concerned, fuel pellet diameter and density, and inner cladding diameter are important examples. Statistical analyses of quality control data show a scatter of these parameters within the specified tolerances. At present it is common practice to use a combination of superimposed unfavorable tolerance limits (worst case dataset) in fuel rod design calculations. Distributions are not considered. The results obtained in this way are very conservative but the degree of conservatism is difficult to quantify. Probabilistic calculations based on distributions allow the replacement of the worst case dataset by a dataset leading to results with known, defined conservatism. This is achieved by response surface methods and Monte Carlo calculations on the basis of statistical distributions of the important input parameters. The procedure is illustrated by means of two examples.
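A minimal sketch of the contrast between a worst-case tolerance combination and a Monte Carlo treatment of fabrication distributions, as described above; the nominal values, standard deviations and percentile are hypothetical, and this is not a fuel performance code or a response-surface model.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Hypothetical fabrication parameters (normally distributed within tolerances), in mm
pellet_diam = rng.normal(9.10, 0.01, n)
clad_inner_diam = rng.normal(9.30, 0.01, n)

gap_mc = clad_inner_diam - pellet_diam                  # distribution of the pellet-clad gap
gap_worst_case = (9.30 - 3 * 0.01) - (9.10 + 3 * 0.01)  # superimposed unfavourable limits

print("worst-case gap:", gap_worst_case)
print("Monte Carlo 0.1st-percentile gap:", np.percentile(gap_mc, 0.1))
```

The Monte Carlo percentile carries a defined, quantifiable conservatism, whereas the worst-case combination does not.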
Social media fingerprints of unemployment.
Llorente, Alejandro; Garcia-Herranz, Manuel; Cebrian, Manuel; Moro, Esteban
2015-01-01
Recent widespread adoption of electronic and pervasive technologies has enabled the study of human behavior at an unprecedented level, uncovering universal patterns underlying human activity, mobility, and interpersonal communication. In the present work, we investigate whether deviations from these universal patterns may reveal information about the socio-economical status of geographical regions. We quantify the extent to which deviations in diurnal rhythm, mobility patterns, and communication styles across regions relate to their unemployment incidence. For this we examine a country-scale publicly articulated social media dataset, where we quantify individual behavioral features from over 19 million geo-located messages distributed among more than 340 different Spanish economic regions, inferred by computing communities of cohesive mobility fluxes. We find that regions exhibiting more diverse mobility fluxes, earlier diurnal rhythms, and more correct grammatical styles display lower unemployment rates. As a result, we provide a simple model able to produce accurate, easily interpretable reconstruction of regional unemployment incidence from their social-media digital fingerprints alone. Our results show that cost-effective economical indicators can be built based on publicly-available social media datasets.
Leaders and followers: quantifying consistency in spatio-temporal propagation patterns
NASA Astrophysics Data System (ADS)
Kreuz, Thomas; Satuvuori, Eero; Pofahl, Martin; Mulansky, Mario
2017-04-01
Repetitive spatio-temporal propagation patterns are encountered in fields as wide-ranging as climatology, social communication and network science. In neuroscience, perfectly consistent repetitions of the same global propagation pattern are called a synfire pattern. For any recording of sequences of discrete events (in neuroscience terminology: sets of spike trains) the questions arise of how closely it resembles such a synfire pattern and which spike trains lead or follow. Here we address these questions and introduce an algorithm built on two new indicators, termed SPIKE-order and spike train order, that define the synfire indicator value, which allows one to sort multiple spike trains from leader to follower and to quantify the consistency of the temporal leader-follower relationships for both the original and the optimized sorting. We demonstrate our new approach using artificially generated datasets before we apply it to analyze the consistency of propagation patterns in two real datasets from neuroscience (giant depolarized potentials in mice slices) and climatology (El Niño sea surface temperature recordings). The new algorithm is distinguished by conceptual and practical simplicity, low computational cost, as well as flexibility and universality.
Social Media Fingerprints of Unemployment
Llorente, Alejandro; Garcia-Herranz, Manuel; Cebrian, Manuel; Moro, Esteban
2015-01-01
Recent widespread adoption of electronic and pervasive technologies has enabled the study of human behavior at an unprecedented level, uncovering universal patterns underlying human activity, mobility, and interpersonal communication. In the present work, we investigate whether deviations from these universal patterns may reveal information about the socio-economical status of geographical regions. We quantify the extent to which deviations in diurnal rhythm, mobility patterns, and communication styles across regions relate to their unemployment incidence. For this we examine a country-scale publicly articulated social media dataset, where we quantify individual behavioral features from over 19 million geo-located messages distributed among more than 340 different Spanish economic regions, inferred by computing communities of cohesive mobility fluxes. We find that regions exhibiting more diverse mobility fluxes, earlier diurnal rhythms, and more correct grammatical styles display lower unemployment rates. As a result, we provide a simple model able to produce accurate, easily interpretable reconstruction of regional unemployment incidence from their social-media digital fingerprints alone. Our results show that cost-effective economical indicators can be built based on publicly-available social media datasets. PMID:26020628
A tool for the estimation of the distribution of landslide area in R
NASA Astrophysics Data System (ADS)
Rossi, M.; Cardinali, M.; Fiorucci, F.; Marchesini, I.; Mondini, A. C.; Santangelo, M.; Ghosh, S.; Riguer, D. E. L.; Lahousse, T.; Chang, K. T.; Guzzetti, F.
2012-04-01
We have developed a tool in R (the free software environment for statistical computing, http://www.r-project.org/) to estimate the probability density and the frequency density of landslide area. The tool implements parametric and non-parametric approaches to the estimation of the probability density and the frequency density of landslide area, including: (i) Histogram Density Estimation (HDE), (ii) Kernel Density Estimation (KDE), and (iii) Maximum Likelihood Estimation (MLE). The tool is available as a standard Open Geospatial Consortium (OGC) Web Processing Service (WPS), and is accessible through the web using different GIS software clients. We tested the tool to compare Double Pareto and Inverse Gamma models for the probability density of landslide area in different geological, morphological and climatological settings, and to compare landslides shown in inventory maps prepared using different mapping techniques, including (i) field mapping, (ii) visual interpretation of monoscopic and stereoscopic aerial photographs, (iii) visual interpretation of monoscopic and stereoscopic VHR satellite images and (iv) semi-automatic detection and mapping from VHR satellite images. Results show that both models are applicable in different geomorphological settings. In most cases the two models provided very similar results. Non-parametric estimation methods (i.e., HDE and KDE) provided reasonable results for all the tested landslide datasets. For some of the datasets, MLE failed to provide a result, for convergence problems. The two tested models (Double Pareto and Inverse Gamma) resulted in very similar results for large and very large datasets (> 150 samples). Differences in the modeling results were observed for small datasets affected by systematic biases. A distinct rollover was observed in all analyzed landslide datasets, except for a few datasets obtained from landslide inventories prepared through field mapping or by semi-automatic mapping from VHR satellite imagery. The tool can also be used to evaluate the probability density and the frequency density of landslide volume.
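A minimal sketch, using scipy rather than the R/WPS tool described above, of fitting the Inverse Gamma model by maximum likelihood and a non-parametric kernel density estimate to landslide areas; the Double Pareto model is not available in scipy and is omitted, and the area values are synthetic.

```python
import numpy as np
from scipy import stats

# Hypothetical landslide areas (m^2); a real analysis would use inventory data
rng = np.random.default_rng(0)
areas = stats.invgamma.rvs(a=1.4, scale=1e3, size=500, random_state=rng)

# Maximum likelihood fit of the Inverse Gamma model (location fixed at zero)
a_hat, loc_hat, scale_hat = stats.invgamma.fit(areas, floc=0)

# Non-parametric kernel density estimate of the same distribution (here in log space)
kde = stats.gaussian_kde(np.log10(areas))

print(a_hat, scale_hat)
```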
BiodMHC: an online server for the prediction of MHC class II-peptide binding affinity.
Wang, Lian; Pan, Danling; Hu, Xihao; Xiao, Jinyu; Gao, Yangyang; Zhang, Huifang; Zhang, Yan; Liu, Juan; Zhu, Shanfeng
2009-05-01
Effective identification of major histocompatibility complex (MHC) molecules restricted peptides is a critical step in discovering immune epitopes. Although many online servers have been built to predict class II MHC-peptide binding affinity, they have been trained on different datasets, and thus fail in providing a unified comparison of various methods. In this paper, we present our implementation of seven popular predictive methods, namely SMM-align, ARB, SVR-pairwise, Gibbs sampler, ProPred, LP-top2, and MHCPred, on a single web server named BiodMHC (http://biod.whu.edu.cn/BiodMHC/index.html, the software is available upon request). Using a standard measure of AUC (Area Under the receiver operating characteristic Curves), we compare these methods by means of not only cross validation but also prediction on independent test datasets. We find that SMM-align, ProPred, SVR-pairwise, ARB, and Gibbs sampler are the five best-performing methods. For the binding affinity prediction of class II MHC-peptide, BiodMHC provides a convenient online platform for researchers to obtain binding information simultaneously using various methods.
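A minimal sketch of the AUC measure used for the comparison, computed with scikit-learn on hypothetical binder/non-binder labels and predicted binding scores.

```python
from sklearn.metrics import roc_auc_score

# Hypothetical binder (1) / non-binder (0) labels and predicted binding scores
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_score = [0.91, 0.40, 0.65, 0.80, 0.30, 0.55, 0.72, 0.20]

print("AUC:", roc_auc_score(y_true, y_score))
```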
Santin-Janin, Hugues; Hugueny, Bernard; Aubry, Philippe; Fouchet, David; Gimenez, Olivier; Pontier, Dominique
2014-01-01
Data collected to inform time variations in natural population size are tainted by sampling error. Ignoring sampling error in population dynamics models induces bias in parameter estimators, e.g., density-dependence. In particular, when sampling errors are independent among populations, the classical estimator of the synchrony strength (zero-lag correlation) is biased downward. However, this bias is rarely taken into account in synchrony studies although it may lead to overemphasizing the role of intrinsic factors (e.g., dispersal) with respect to extrinsic factors (the Moran effect) in generating population synchrony as well as to underestimating the extinction risk of a metapopulation. The aim of this paper was first to illustrate the extent of the bias that can be encountered in empirical studies when sampling error is neglected. Second, we presented a state-space modelling approach that explicitly accounts for sampling error when quantifying population synchrony. Third, we exemplified our approach with datasets for which sampling variance (i) has been previously estimated, and (ii) has to be jointly estimated with population synchrony. Finally, we compared our results to those of a standard approach neglecting sampling variance. We showed that ignoring sampling variance can mask a synchrony pattern whatever its true value and that the common practice of averaging few replicates of population size estimates performed poorly at decreasing the bias of the classical estimator of the synchrony strength. The state-space model used in this study provides a flexible way of accurately quantifying the strength of synchrony patterns from most population size data encountered in field studies, including over-dispersed count data. We provided a user-friendly R-program and a tutorial example to encourage further studies aiming at quantifying the strength of population synchrony to account for uncertainty in population size estimates.
Santin-Janin, Hugues; Hugueny, Bernard; Aubry, Philippe; Fouchet, David; Gimenez, Olivier; Pontier, Dominique
2014-01-01
Background Data collected to inform time variations in natural population size are tainted by sampling error. Ignoring sampling error in population dynamics models induces bias in parameter estimators, e.g., density-dependence. In particular, when sampling errors are independent among populations, the classical estimator of the synchrony strength (zero-lag correlation) is biased downward. However, this bias is rarely taken into account in synchrony studies although it may lead to overemphasizing the role of intrinsic factors (e.g., dispersal) with respect to extrinsic factors (the Moran effect) in generating population synchrony as well as to underestimating the extinction risk of a metapopulation. Methodology/Principal findings The aim of this paper was first to illustrate the extent of the bias that can be encountered in empirical studies when sampling error is neglected. Second, we presented a state-space modelling approach that explicitly accounts for sampling error when quantifying population synchrony. Third, we exemplified our approach with datasets for which sampling variance (i) has been previously estimated, and (ii) has to be jointly estimated with population synchrony. Finally, we compared our results to those of a standard approach neglecting sampling variance. We showed that ignoring sampling variance can mask a synchrony pattern whatever its true value and that the common practice of averaging few replicates of population size estimates performed poorly at decreasing the bias of the classical estimator of the synchrony strength. Conclusion/Significance The state-space model used in this study provides a flexible way of accurately quantifying the strength of synchrony patterns from most population size data encountered in field studies, including over-dispersed count data. We provided a user-friendly R-program and a tutorial example to encourage further studies aiming at quantifying the strength of population synchrony to account for uncertainty in population size estimates. PMID:24489839
Association of tRNA methyltransferase NSUN2/IGF-II molecular signature with ovarian cancer survival.
Yang, Jia-Cheng; Risch, Eric; Zhang, Meiqin; Huang, Chan; Huang, Huatian; Lu, Lingeng
2017-09-01
To investigate the association between NSUN2/IGF-II signature and ovarian cancer survival. Using a publicly accessible dataset of RNA sequencing and clinical follow-up data, we performed Classification and Regression Tree and survival analyses. Patients with NSUN2-high/IGF-II-low had significantly superior overall and disease progression-free survival, followed by NSUN2-low/IGF-II-low, NSUN2-high/IGF-II-high and NSUN2-low/IGF-II-high (p < 0.0001 for overall, p = 0.0024 for progression-free survival, respectively). The associations of NSUN2/IGF-II signature with the risks of death and relapse remained significant in multivariate Cox regression models. Random-effects meta-analyses show the upregulated NSUN2 and IGF-II expression in ovarian cancer versus normal tissues. The NSUN2/IGF-II signature associates with heterogeneous outcome and may have clinical implications in managing ovarian cancer.
NASA Astrophysics Data System (ADS)
Christen, A.; Crawford, B.; Ketler, R.; Lee, J. K.; McKendry, I. G.; Nesic, Z.; Caitlin, S.
2015-12-01
Measurements of long-lived greenhouse gases in the urban atmosphere are potentially useful to constrain and validate urban emission inventories, or space-borne remote-sensing products. We summarize and compare three different approaches, operating at different scales, that directly or indirectly identify, attribute and quantify emissions (and uptake) of carbon dioxide (CO2) in urban environments. All three approaches are illustrated using in-situ measurements in the atmosphere in and over Vancouver, Canada. Mobile sensing may be a promising way to quantify and map CO2 mixing ratios at fine scales across heterogeneous and complex urban environments. We developed a system for monitoring CO2 mixing ratios at street level using a network of mobile CO2 sensors deployable on vehicles and bikes. A total of 5 prototype sensors were built and simultaneously used in a measurement campaign across a range of urban land use types and densities within a short time frame (3 hours). The dataset is used to aid in fine scale emission mapping in combination with simultaneous tower-based flux measurements. Overall, calculated CO2 emissions are realistic when compared against a spatially disaggregated scale emission inventory. The second approach is based on mass flux measurements of CO2 using a tower-based eddy covariance (EC) system. We present a continuous 7-year long dataset of CO2 fluxes measured by EC at the 28m tall flux tower 'Vancouver-Sunset'. We show how this dataset can be combined with turbulent source area models to quantify and partition different emission processes at the neighborhood-scale. The long-term EC measurements are within 10% of a spatially disaggregated scale emission inventory. Thirdly, at the urban scale, we present a dataset of CO2 mixing ratios measured using a tethered balloon system in the urban boundary layer above Vancouver. Using a simple box model, net city-scale CO2 emissions can be determined using measured rate of change of CO2 mixing ratios, estimated CO2 advection and entrainment fluxes. Daily city-scale emissions totals predicted by the model are within 32% of a spatially scaled municipal greenhouse gas inventory. In summary, combining information from different approaches and scales is a promising approach to establish long-term emission monitoring networks in cities.
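A minimal sketch of the simple box-model rearrangement described for the balloon dataset: the net surface flux is inferred from the observed storage change minus advection and entrainment terms. The function signature, units and sign conventions are illustrative assumptions, not the authors' exact formulation.

```python
def surface_co2_flux(dc_dt, mixed_layer_height, advection_flux, entrainment_flux):
    """Box-model estimate of the net surface CO2 flux (mg m-2 s-1).
    dc_dt: rate of change of CO2 concentration in the box (mg m-3 s-1)
    mixed_layer_height: box depth (m)
    advection_flux, entrainment_flux: lateral and top exchange terms (mg m-2 s-1)"""
    storage = dc_dt * mixed_layer_height
    return storage - advection_flux - entrainment_flux

# Hypothetical values for a single daytime hour
print(surface_co2_flux(dc_dt=2.0e-4, mixed_layer_height=600.0,
                       advection_flux=-0.05, entrainment_flux=0.03))
```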
Isolated effect of geometry on mitral valve function for in silico model development.
Siefert, Andrew William; Rabbah, Jean-Pierre Michel; Saikrishnan, Neelakantan; Kunzelman, Karyn Susanne; Yoganathan, Ajit Prithivaraj
2015-01-01
Computational models for the heart's mitral valve (MV) exhibit several uncertainties that may be reduced by further developing these models using ground-truth data-sets. This study generated a ground-truth data-set by quantifying the effects of isolated mitral annular flattening, symmetric annular dilatation, symmetric papillary muscle (PM) displacement and asymmetric PM displacement on leaflet coaptation, mitral regurgitation (MR) and anterior leaflet strain. MVs were mounted in an in vitro left heart simulator and tested under pulsatile haemodynamics. Mitral leaflet coaptation length, coaptation depth, tenting area, MR volume, MR jet direction and anterior leaflet strain in the radial and circumferential directions were successfully quantified at increasing levels of geometric distortion. From these data, increase in the levels of isolated PM displacement resulted in the greatest mean change in coaptation depth (70% increase), tenting area (150% increase) and radial leaflet strain (37% increase) while annular dilatation resulted in the largest mean change in coaptation length (50% decrease) and regurgitation volume (134% increase). Regurgitant jets were centrally located for symmetric annular dilatation and symmetric PM displacement. Asymmetric PM displacement resulted in asymmetrically directed jets. Peak changes in anterior leaflet strain in the circumferential direction were smaller and exhibited non-significant differences across the tested conditions. When used together, this ground-truth data-set may be used to parametrically evaluate and develop modelling assumptions for both the MV leaflets and subvalvular apparatus. This novel data may improve MV computational models and provide a platform for the development of future surgical planning tools.
Meinerz, Kelsey; Beeman, Scott C; Duan, Chong; Bretthorst, G Larry; Garbow, Joel R; Ackerman, Joseph J H
2018-01-01
Recently, a number of MRI protocols have been reported that seek to exploit the effect of dissolved oxygen (O2, paramagnetic) on the longitudinal 1H relaxation of tissue water, thus providing image contrast related to tissue oxygen content. However, tissue water relaxation is dependent on a number of mechanisms, and this raises the issue of how best to model the relaxation data. This problem, the model selection problem, occurs in many branches of science and is optimally addressed by Bayesian probability theory. High signal-to-noise, densely sampled, longitudinal 1H relaxation data were acquired from rat brain in vivo and from a cross-linked bovine serum albumin (xBSA) phantom, a sample that recapitulates the relaxation characteristics of tissue water in vivo. Bayesian-based model selection was applied to a cohort of five competing relaxation models: (i) monoexponential, (ii) stretched-exponential, (iii) biexponential, (iv) Gaussian (normal) R1-distribution, and (v) gamma R1-distribution. Bayesian joint analysis of multiple replicate datasets revealed that water relaxation of both the xBSA phantom and in vivo rat brain was best described by a biexponential model, while xBSA relaxation datasets truncated to remove evidence of the fast relaxation component were best modeled as a stretched exponential. In all cases, estimated model parameters were compared to the commonly used monoexponential model. Reducing the sampling density of the relaxation data and adding Gaussian-distributed noise served to simulate cases in which the data are acquisition-time or signal-to-noise restricted, respectively. As expected, reducing either the number of data points or the signal-to-noise increases the uncertainty in estimated parameters and, ultimately, reduces support for more complex relaxation models.
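A minimal sketch of comparing monoexponential and biexponential recovery models on synthetic relaxation data; BIC is used here only as a rough stand-in for the full Bayesian model-selection calculation the authors performed, and all parameter values are hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit

def mono(t, a, r1):
    return a * (1 - np.exp(-r1 * t))

def biexp(t, a1, r1a, a2, r1b):
    return a1 * (1 - np.exp(-r1a * t)) + a2 * (1 - np.exp(-r1b * t))

def bic(y, y_fit, n_params):
    """Bayesian information criterion from the residual sum of squares."""
    n = len(y)
    rss = np.sum((y - y_fit) ** 2)
    return n * np.log(rss / n) + n_params * np.log(n)

# Hypothetical saturation-recovery data with two compartments plus noise
t = np.linspace(0.05, 10, 60)
rng = np.random.default_rng(3)
y = biexp(t, 0.7, 1.8, 0.3, 0.3) + rng.normal(0, 0.005, t.size)

p_mono, _ = curve_fit(mono, t, y, p0=[1.0, 1.0])
p_bi, _ = curve_fit(biexp, t, y, p0=[0.5, 2.0, 0.5, 0.5], maxfev=10000)

print("BIC mono:", bic(y, mono(t, *p_mono), 2))
print("BIC biexp:", bic(y, biexp(t, *p_bi), 4))
```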
Abriata, Luciano A; Bovigny, Christophe; Dal Peraro, Matteo
2016-06-17
Protein variability can now be studied by measuring high-resolution tolerance-to-substitution maps and fitness landscapes in saturated mutational libraries. But these rich and expensive datasets are typically interpreted coarsely, restricting detailed analyses to positions of extremely high or low variability or dubbed important beforehand based on existing knowledge about active sites, interaction surfaces, (de)stabilizing mutations, etc. Our new webserver PsychoProt (freely available without registration at http://psychoprot.epfl.ch or at http://lucianoabriata.altervista.org/psychoprot/index.html ) helps to detect, quantify, and sequence/structure map the biophysical and biochemical traits that shape amino acid preferences throughout a protein as determined by deep-sequencing of saturated mutational libraries or from large alignments of naturally occurring variants. We exemplify how PsychoProt helps to (i) unveil protein structure-function relationships from experiments and from alignments that are consistent with structures according to coevolution analysis, (ii) recall global information about structural and functional features and identify hitherto unknown constraints to variation in alignments, and (iii) point at different sources of variation among related experimental datasets or between experimental and alignment-based data. Remarkably, metabolic costs of the amino acids pose strong constraints to variability at protein surfaces in nature but not in the laboratory. This and other differences call for caution when extrapolating results from in vitro experiments to natural scenarios in, for example, studies of protein evolution. We show through examples how PsychoProt can be a useful tool for the broad communities of structural biology and molecular evolution, particularly for studies about protein modeling, evolution and design.
A tool to evaluate local biophysical effects on temperature due to land cover change transitions
NASA Astrophysics Data System (ADS)
Perugini, Lucia; Caporaso, Luca; Duveiller, Gregory; Cescatti, Alessandro; Abad-Viñas, Raul; Grassi, Giacomo; Quesada, Benjamin
2017-04-01
Land Cover Changes (LCC) affect local, regional and global climate through biophysical variations of the surface energy budget mediated by albedo, evapotranspiration, and roughness. Assessments of the full climate impacts of anthropogenic LCC are incomplete without considering biophysical effects, but the high level of uncertainty in quantifying their impacts has to date made it impractical to offer clear advice on which policy makers could act. To overcome this barrier, we provide a tool to evaluate the biophysical impact of a matrix of land cover transitions, following a tiered methodological approach similar to the one provided by the IPCC to estimate the biogeochemical effects, i.e. through three levels of methodological complexity, from Tier 1 (i.e. default method and factors) to Tier 3 (i.e. specific methods and factors). In particular, the tool provides guidance for quantitative assessment of changes in temperature following a land cover transition. The tool focuses on temperature for two main reasons: (i) it is the main variable of interest for policy makers at local and regional level, and (ii) temperature is able to summarize the impact of radiative and non-radiative processes following LCC. The potential changes in annual air temperature that can be expected from various land cover transitions are derived from a dedicated dataset constructed by the JRC in the framework of the LUC4C FP7 project. The inputs for the dataset are air temperature values derived from satellite Earth Observation data (MODIS) and land cover characterization from the ESA Climate Change Initiative product reclassified into their IPCC land use category equivalent. This data, originally at 0.05 degree of spatial resolution, is aggregated and analysed at regional level to provide guidance on the expected temperature impact following specific LCC transitions.
NASA Astrophysics Data System (ADS)
Bukoski, J. J.; Broadhead, J. S.; Donato, D.; Murdiyarso, D.; Gregoire, T. G.
2016-12-01
Mangroves provide extensive ecosystem services that support both local livelihoods and international environmental goals, including coastal protection, water filtration, biodiversity conservation and the sequestration of carbon (C). While voluntary C market projects that seek to preserve and enhance forest C stocks offer a potential means of generating finance for mangrove conservation, their implementation faces barriers due to the high costs of quantifying C stocks through measurement, reporting and verification (MRV) activities. To streamline MRV activities in mangrove C forestry projects, we develop predictive models for (i) biomass-based C stocks, and (ii) soil-based C stocks for the mangroves of the Asia-Pacific. We use linear mixed effect models to account for spatial correlation in modeling the expected C as a function of stand attributes. The most parsimonious biomass model predicts total biomass C stocks as a function of both basal area and the interaction between latitude and basal area, whereas the most parsimonious soil C model predicts soil C stocks as a function of the logarithmic transformations of both latitude and basal area. Random effects are specified by site for both models, and are found to explain a substantial proportion of variance within the estimation datasets. The root mean square error (RMSE) of the biomass C model is approximated at 24.6 Mg/ha (18.4% of mean biomass C in the dataset), whereas the RMSE of the soil C model is estimated at 4.9 mg C/cm3 (14.1% of mean soil C). A substantial proportion of the variation in soil C, however, is explained by the random effects and thus the use of the soil C model may be most valuable for sites in which field measurements of soil C exist.
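A minimal sketch, assuming hypothetical plot-level data, of a linear mixed-effects model with a site-level random intercept in the spirit of the biomass C model described above (biomass C as a function of basal area and its interaction with latitude), fitted with statsmodels rather than the authors' software.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical plot-level data; the real models were fit to Asia-Pacific mangrove plots
rng = np.random.default_rng(7)
df = pd.DataFrame({
    "site": np.repeat(["A", "B", "C", "D"], 25),
    "basal_area": rng.uniform(5, 40, 100),   # m2 ha-1
    "latitude": rng.uniform(-10, 25, 100),   # degrees
})
df["biomass_c"] = (30 + 3.0 * df["basal_area"]
                   - 0.05 * df["basal_area"] * df["latitude"]
                   + rng.normal(0, 15, 100))

# Fixed effects: basal area and its interaction with latitude; random intercept by site
model = smf.mixedlm("biomass_c ~ basal_area + basal_area:latitude", df, groups=df["site"])
result = model.fit()
print(result.summary())
```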
Hydrodynamic modelling and global datasets: Flow connectivity and SRTM data, a Bangkok case study.
NASA Astrophysics Data System (ADS)
Trigg, M. A.; Bates, P. B.; Michaelides, K.
2012-04-01
The rise in the global interconnected manufacturing supply chains requires an understanding and consistent quantification of flood risk at a global scale. Flood risk is often better quantified (or at least more precisely defined) in regions where there has been an investment in comprehensive topographical data collection such as LiDAR coupled with detailed hydrodynamic modelling. Yet in regions where these data and modelling are unavailable, the implications of flooding and the knock on effects for global industries can be dramatic, as evidenced by the recent floods in Bangkok, Thailand. There is a growing momentum in terms of global modelling initiatives to address this lack of a consistent understanding of flood risk and they will rely heavily on the application of available global datasets relevant to hydrodynamic modelling, such as Shuttle Radar Topography Mission (SRTM) data and its derivatives. These global datasets bring opportunities to apply consistent methodologies on an automated basis in all regions, while the use of coarser scale datasets also brings many challenges such as sub-grid process representation and downscaled hydrology data from global climate models. There are significant opportunities for hydrological science in helping define new, realistic and physically based methodologies that can be applied globally as well as the possibility of gaining new insights into flood risk through analysis of the many large datasets that will be derived from this work. We use Bangkok as a case study to explore some of the issues related to using these available global datasets for hydrodynamic modelling, with particular focus on using SRTM data to represent topography. Research has shown that flow connectivity on the floodplain is an important component in the dynamics of flood flows on to and off the floodplain, and indeed within different areas of the floodplain. A lack of representation of flow connectivity, often due to data resolution limitations, means that important subgrid processes are missing from hydrodynamic models leading to poor model predictive capabilities. Specifically here, the issue of flow connectivity during flood events is explored using geostatistical techniques to quantify the change of flow connectivity on floodplains due to grid rescaling methods. We also test whether this method of assessing connectivity can be used as new tool in the quantification of flood risk that moves beyond the simple flood extent approach, encapsulating threshold changes and data limitations.
Suresh, V; Parthasarathy, S
2014-01-01
We developed a support vector machine based web server called SVM-PB-Pred, to predict the Protein Block for any given amino acid sequence. The input features of SVM-PB-Pred include (i) sequence profiles (PSSM) and (ii) actual secondary structures (SS) from the DSSP method or predicted secondary structures from the NPS@ and GOR4 methods. Three combined input features, PSSM+SS(DSSP), PSSM+SS(NPS@) and PSSM+SS(GOR4), were used to train and test the SVM models. Similarly, four datasets, RS90, DB433, LI1264 and SP1577, were used to develop the SVM models. The four SVM models developed were tested using three different benchmarking tests, namely: (i) self-consistency, (ii) seven-fold cross-validation and (iii) an independent case test. The maximum possible prediction accuracy of ~70% was observed in the self-consistency test for the SVM models of both the LI1264 and SP1577 datasets, where the PSSM+SS(DSSP) input features were used for testing. The prediction accuracies were reduced to ~53% for PSSM+SS(NPS@) and ~43% for PSSM+SS(GOR4) in the independent case test for the SVM models of the same two datasets. Using our method, it is possible to predict the protein block letters for any query protein sequence with ~53% accuracy, when the SP1577 dataset and predicted secondary structure from the NPS@ server are used. The SVM-PB-Pred server can be freely accessed through http://bioinfo.bdu.ac.in/~svmpbpred.
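A minimal sketch of training an RBF-kernel SVM with seven-fold cross-validation on PSSM-plus-secondary-structure style features; the feature encoding, class count and data here are synthetic stand-ins, not the SVM-PB-Pred implementation.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# Hypothetical feature matrix: a 20-value PSSM profile concatenated with a
# 3-state secondary-structure encoding, loosely mirroring the PSSM+SS inputs above
rng = np.random.default_rng(0)
X = np.hstack([rng.normal(size=(500, 20)), rng.integers(0, 2, size=(500, 3))])
y = rng.integers(0, 16, size=500)            # 16 protein block labels

clf = SVC(kernel="rbf", C=1.0, gamma="scale")
scores = cross_val_score(clf, X, y, cv=7)    # seven-fold cross-validation
print(scores.mean())
```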
Rogers, L J; Douglas, R R
1984-02-01
In this paper (the second in a series), we consider a (generic) pair of datasets, which have been analyzed by the techniques of the previous paper. Thus, their "stable subspaces" have been established by comparative factor analysis. The pair of datasets must satisfy two confirmable conditions. The first is the "Inclusion Condition," which requires that the stable subspace of one of the datasets is nearly identical to a subspace of the other dataset's stable subspace. On the basis of that, we have assumed the pair to have similar generating signals, with stochastically independent generators. The second verifiable condition is that the (presumed same) generating signals have distinct ratios of variances for the two datasets. Under these conditions a small elaboration of some elementary linear algebra reduces the rotation problem to several eigenvalue-eigenvector problems. Finally, we emphasize that an analysis of each dataset by the method of Douglas and Rogers (1983) is an essential prerequisite for the useful application of the techniques in this paper. Nonempirical methods of estimating the number of factors simply will not suffice, as confirmed by simulations reported in the previous paper.
Benchmark of Machine Learning Methods for Classification of a SENTINEL-2 Image
NASA Astrophysics Data System (ADS)
Pirotti, F.; Sunar, F.; Piragnolo, M.
2016-06-01
Thanks to mainly ESA and USGS, a large bulk of free images of the Earth is readily available nowadays. One of the main goals of remote sensing is to label images according to a set of semantic categories, i.e. image classification. This is a very challenging issue since land cover of a specific class may present a large spatial and spectral variability and objects may appear at different scales and orientations. In this study, we report the results of benchmarking 9 machine learning algorithms tested for accuracy and speed in training and classification of land-cover classes in a Sentinel-2 dataset. The following machine learning methods (MLM) have been tested: linear discriminant analysis, k-nearest neighbour, random forests, support vector machines, multi-layered perceptron, multi-layered perceptron ensemble, ctree, boosting, logarithmic regression. The validation is carried out using a control dataset which consists of an independent classification in 11 land-cover classes of an area of about 60 km2, obtained by manual visual interpretation of high resolution images (20 cm ground sampling distance) by experts. In this study five out of the eleven classes are used since the others have too few samples (pixels) for testing and validating subsets. The classes used are the following: (i) urban, (ii) sowable areas, (iii) water, (iv) tree plantations, (v) grasslands. Validation is carried out using three different approaches: (i) using pixels from the training dataset (train), (ii) using pixels from the training dataset and applying cross-validation with the k-fold method (kfold) and (iii) using all pixels from the control dataset. Five accuracy indices are calculated for the comparison between the values predicted with each model and control values over three sets of data: the training dataset (train), the whole control dataset (full) and with k-fold cross-validation (kfold) with ten folds. Results from validation of predictions over the whole dataset (full) show that the random forests method has the highest values, with the kappa index ranging from 0.55 to 0.42 with the most and least number of pixels for training, respectively. The two neural networks (multi-layered perceptron and its ensemble) and the support vector machines - with default radial basis function kernel - methods follow closely with comparable performance.
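A minimal sketch of one of the benchmarked methods (random forests) evaluated with the kappa index via scikit-learn; the band values and class labels are synthetic, and no claim is made about matching the exact setup used in the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import cohen_kappa_score
from sklearn.model_selection import train_test_split

# Hypothetical Sentinel-2 pixels: 10 spectral bands, 5 land-cover classes
rng = np.random.default_rng(0)
X = rng.random((2000, 10))
y = rng.integers(0, 5, 2000)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

print("kappa:", cohen_kappa_score(y_test, rf.predict(X_test)))
```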
Quantifying incident-induced travel delays on freeways using traffic sensor data : phase II
DOT National Transportation Integrated Search
2010-12-01
Traffic incidents cause approximately 50 percent of freeway congestion in metropolitan areas, resulting in extra travel time and fuel cost. Quantifying incident-induced delay (IID) will help people better understand the real costs of incidents, maxim...
Hancock, Matthew C.; Magnan, Jerry F.
2016-01-01
Abstract. In the assessment of nodules in CT scans of the lungs, a number of image-derived features are diagnostically relevant. Currently, many of these features are defined only qualitatively, so they are difficult to quantify from first principles. Nevertheless, these features (through their qualitative definitions and interpretations thereof) are often quantified via a variety of mathematical methods for the purpose of computer-aided diagnosis (CAD). To determine the potential usefulness of quantified diagnostic image features as inputs to a CAD system, we investigate the predictive capability of statistical learning methods for classifying nodule malignancy. We utilize the Lung Image Database Consortium dataset and only employ the radiologist-assigned diagnostic feature values for the lung nodules therein, as well as our derived estimates of the diameter and volume of the nodules from the radiologists’ annotations. We calculate theoretical upper bounds on the classification accuracy that are achievable by an ideal classifier that only uses the radiologist-assigned feature values, and we obtain an accuracy of 85.74 (±1.14)%, which is, on average, 4.43% below the theoretical maximum of 90.17%. The corresponding area-under-the-curve (AUC) score is 0.932 (±0.012), which increases to 0.949 (±0.007) when diameter and volume features are included and has an accuracy of 88.08 (±1.11)%. Our results are comparable to those in the literature that use algorithmically derived image-based features, which supports our hypothesis that lung nodules can be classified as malignant or benign using only quantified, diagnostic image features, and indicates the competitiveness of this approach. We also analyze how the classification accuracy depends on specific features and feature subsets, and we rank the features according to their predictive power, statistically demonstrating the top four to be spiculation, lobulation, subtlety, and calcification. PMID:27990453
Hancock, Matthew C; Magnan, Jerry F
2016-10-01
In the assessment of nodules in CT scans of the lungs, a number of image-derived features are diagnostically relevant. Currently, many of these features are defined only qualitatively, so they are difficult to quantify from first principles. Nevertheless, these features (through their qualitative definitions and interpretations thereof) are often quantified via a variety of mathematical methods for the purpose of computer-aided diagnosis (CAD). To determine the potential usefulness of quantified diagnostic image features as inputs to a CAD system, we investigate the predictive capability of statistical learning methods for classifying nodule malignancy. We utilize the Lung Image Database Consortium dataset and only employ the radiologist-assigned diagnostic feature values for the lung nodules therein, as well as our derived estimates of the diameter and volume of the nodules from the radiologists' annotations. We calculate theoretical upper bounds on the classification accuracy that are achievable by an ideal classifier that only uses the radiologist-assigned feature values, and we obtain an accuracy of 85.74 (±1.14)%, which is, on average, 4.43% below the theoretical maximum of 90.17%. The corresponding area-under-the-curve (AUC) score is 0.932 (±0.012), which increases to 0.949 (±0.007) when diameter and volume features are included and has an accuracy of 88.08 (±1.11)%. Our results are comparable to those in the literature that use algorithmically derived image-based features, which supports our hypothesis that lung nodules can be classified as malignant or benign using only quantified, diagnostic image features, and indicates the competitiveness of this approach. We also analyze how the classification accuracy depends on specific features and feature subsets, and we rank the features according to their predictive power, statistically demonstrating the top four to be spiculation, lobulation, subtlety, and calcification.
Time-Series Analysis: A Cautionary Tale
NASA Technical Reports Server (NTRS)
Damadeo, Robert
2015-01-01
Time-series analysis has often been a useful tool in atmospheric science for deriving long-term trends in various atmospherically important parameters (e.g., temperature or the concentration of trace gas species). In particular, time-series analysis has been repeatedly applied to satellite datasets in order to derive the long-term trends in stratospheric ozone, which is a critical atmospheric constituent. However, many of the potential pitfalls relating to the non-uniform sampling of the datasets were often ignored and the results presented by the scientific community have been unknowingly biased. A newly developed and more robust application of this technique is applied to the Stratospheric Aerosol and Gas Experiment (SAGE) II version 7.0 ozone dataset and the previous biases and newly derived trends are presented.
Lordan, Grace; Tang, Kam Ki; Carmignani, Fabrizio
2011-08-01
In recent times there has been a sense that HIV/AIDS control has been attracting a significantly larger portion of donor health funding to the extent that it crowds out funding for other health concerns. Although there is no doubt that HIV/AIDS has absorbed a large share of development assistance for health (DAH), whether HIV/AIDS is actually diverting funding away from other health concerns has yet to be analyzed fully. To fill this vacuum, this study aims to test if a higher level of HIV/AIDS funding is related to a displacement in funding for other health concerns, and if yes, to quantify the magnitude of the displacement effect. Specifically, we consider whether HIV/AIDS DAH has displaced i) TB, ii) malaria iii) health sector and 'other' DAH in terms of the dollar amount received for aid. We consider this question within a regression framework controlling for time and recipient heterogeneity. We find displacement effects for malaria and health sector funding but not TB. In particular, the displacement effect for malaria is large and worrying. Copyright © 2011 Elsevier Ltd. All rights reserved.
Quantifying tropical peatland dissolved organic carbon (DOC) using UV-visible spectroscopy.
Cook, Sarah; Peacock, Mike; Evans, Chris D; Page, Susan E; Whelan, Mick J; Gauci, Vincent; Kho, Lip Khoon
2017-05-15
UV-visible spectroscopy has been shown to be a useful technique for determining dissolved organic carbon (DOC) concentrations. However, at present we are unaware of any studies in the literature that have investigated the suitability of this approach for tropical DOC water samples from any tropical peatlands, although some work has been performed in other tropical environments. We used water samples from two oil palm estates in Sarawak, Malaysia to: i) investigate the suitability of both single and two-wavelength proxies for tropical DOC determination; ii) develop a calibration dataset and set of parameters to calculate DOC concentrations indirectly; iii) provide tropical researchers with guidance on the best spectrophotometric approaches to use in future analyses of DOC. Both single and two-wavelength model approaches performed well with no one model significantly outperforming the other. The predictive ability of the models suggests that UV-visible spectroscopy is both a viable and low cost method for rapidly analyzing DOC in water samples immediately post-collection, which can be important when working at remote field sites with access to only basic laboratory facilities. Copyright © 2017 The Authors. Published by Elsevier Ltd.. All rights reserved.
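A minimal sketch of an indirect two-wavelength calibration: DOC is regressed on absorbance at two wavelengths by ordinary least squares. The wavelengths (254 and 365 nm) and all absorbance/DOC values are illustrative assumptions, not the calibration parameters developed in the paper.

```python
import numpy as np

# Hypothetical calibration data: absorbance at two wavelengths (e.g. 254 and 365 nm,
# chosen here only for illustration) against measured DOC (mg L-1)
A = np.array([[0.35, 0.08], [0.50, 0.12], [0.72, 0.18], [0.90, 0.25], [1.10, 0.30]])
doc = np.array([12.1, 17.6, 25.3, 31.8, 38.9])

# Ordinary least squares for DOC ~ a*A_w1 + b*A_w2 + c
X = np.column_stack([A, np.ones(len(A))])
coef, *_ = np.linalg.lstsq(X, doc, rcond=None)
a, b, c = coef

print("DOC estimate for A=(0.6, 0.15):", a * 0.6 + b * 0.15 + c)
```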
A source-attractor approach to network detection of radiation sources
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wu, Qishi; Barry, M. L.; Grieme, M.
Radiation source detection using a network of detectors is an active field of research for homeland security and defense applications. We propose the Source-attractor Radiation Detection (SRD) method to aggregate measurements from a network of detectors for radiation source detection. The SRD method models a potential radiation source as a magnet-like attractor that pulls in pre-computed virtual points from the detector locations. A detection decision is made if a sufficient level of attraction, quantified by the increase in the clustering of the shifted virtual points, is observed. Compared with traditional methods, SRD has the following advantages: i) it does not require an accurate estimate of the source location from limited and noise-corrupted sensor readings, unlike the localization-based methods, and ii) its virtual point shifting and clustering calculation involve simple arithmetic operations based on the number of detectors, avoiding the high computational complexity of grid-based likelihood estimation methods. We evaluate its detection performance using canonical datasets from the Domestic Nuclear Detection Office's (DNDO) Intelligence Radiation Sensors Systems (IRSS) tests. SRD achieves both a lower false alarm rate and a lower false negative rate compared to three existing algorithms for network source detection.
Shared periodic performer movements coordinate interactions in duo improvisations.
Eerola, Tuomas; Jakubowski, Kelly; Moran, Nikki; Keller, Peter E; Clayton, Martin
2018-02-01
Human interaction involves the exchange of temporally coordinated, multimodal cues. Our work focused on interaction in the visual domain, using music performance as a case for analysis due to its temporally diverse and hierarchical structures. We made use of two improvising duo datasets - (i) performances of a jazz standard with a regular pulse and (ii) non-pulsed, free improvisations - to investigate whether human judgements of moments of interaction between co-performers are influenced by body movement coordination at multiple timescales. Bouts of interaction in the performances were manually annotated by experts and the performers' movements were quantified using computer vision techniques. The annotated interaction bouts were then predicted using several quantitative movement and audio features. Over 80% of the interaction bouts were successfully predicted by a broadband measure of the energy of the cross-wavelet transform of the co-performers' movements in non-pulsed duos. A more complex model, with multiple predictors that captured more specific, interacting features of the movements, was needed to explain a significant amount of variance in the pulsed duos. The methods developed here have key implications for future work on measuring visual coordination in musical ensemble performances, and can be easily adapted to other musical contexts, ensemble types and traditions.
Quantifying Interannual Variability for Photovoltaic Systems in PVWatts
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ryberg, David Severin; Freeman, Janine; Blair, Nate
2015-10-01
The National Renewable Energy Laboratory's (NREL's) PVWatts is a relatively simple tool used by industry and individuals alike to easily estimate the amount of energy a photovoltaic (PV) system will produce throughout the course of a typical year. PVWatts Version 5 has previously been shown to be able to reasonably represent an operating system's output when provided with concurrent weather data; however, this type of data is not available when estimating system output during future time frames. For this purpose PVWatts uses weather data from typical meteorological year (TMY) datasets which are available on the NREL website. The TMY files represent a statistically 'typical' year which by definition excludes anomalous weather patterns and as a result may not provide sufficient quantification of project risk to the financial community. It was therefore desired to quantify the interannual variability associated with TMY files in order to improve the understanding of risk associated with these projects. To begin to understand the interannual variability of a PV project, we simulated two archetypal PV system designs, which are common in the PV industry, in PVWatts using the NSRDB's 1961-1990 historical dataset. This dataset contains measured hourly weather data and spans the thirty years from 1961-1990 for 239 locations in the United States. Of note, this historical dataset was used to compose the TMY2 dataset. Using the results of these simulations we computed several statistical metrics which may be of interest to the financial community and normalized the results with respect to the TMY energy prediction at each location, so that these results could be easily translated to similar systems. This report briefly describes the simulation process used and the statistical methodology employed for this project, but otherwise focuses mainly on a sample of our results. A short discussion of these results is also provided. It is our hope that this quantification of the interannual variability of PV systems will provide a starting point for variability considerations in future PV system designs and investigations.
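A minimal sketch of the kind of interannual-variability summary described above: annual simulation results are normalized by the TMY prediction and reduced to exceedance percentiles. The yield numbers are hypothetical, not PVWatts output.

```python
import numpy as np

# Hypothetical annual energy yields (MWh) from 30 historical weather-year simulations
annual_energy = np.random.default_rng(0).normal(1650, 60, 30)
tmy_energy = 1640.0   # energy predicted with the TMY file for the same system

normalized = annual_energy / tmy_energy
print("mean / std of TMY-normalized output:", normalized.mean(), normalized.std(ddof=1))
print("P90 (exceeded in 90% of years):", np.percentile(normalized, 10))
print("P50:", np.percentile(normalized, 50))
```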
Persson, U. Martin
2017-01-01
While we know that deforestation in the tropics is increasingly driven by commercial agriculture, most tropical countries still lack recent and spatially-explicit assessments of the relative importance of pasture and cropland expansion in causing forest loss. Here we present a spatially explicit quantification of the extent to which cultivated land and grassland expanded at the expense of forests across Latin America in 2001–2011, by combining two “state-of-the-art” global datasets (Global Forest Change forest loss and GlobeLand30-2010 land cover). We further evaluate some of the limitations and challenges in doing this. We find that this approach does capture some of the major patterns of land cover following deforestation, with GlobeLand30-2010’s Grassland class (which we interpret as pasture) being the most common land cover replacing forests across Latin America. However, our analysis also reveals some major limitations to combining these land cover datasets for quantifying pasture and cropland expansion into forest. First, a simple one-to-one translation between GlobeLand30-2010’s Cultivated land and Grassland classes into cropland and pasture, respectively, should not be made without caution, as GlobeLand30-2010 defines its Cultivated land to include some pastures. Comparisons with the TerraClass dataset over the Brazilian Amazon and with previous literature indicate that Cultivated land in GlobeLand30-2010 includes notable amounts of pasture and other vegetation (e.g. in Paraguay and the Brazilian Amazon). This further suggests that the approach taken here generally leads to an underestimation (of up to ~60%) of the role of pasture in replacing forest. Second, a large share (~33%) of the Global Forest Change forest loss is found to still be forest according to GlobeLand30-2010 and our analysis suggests that the accuracy of the combined datasets, especially for areas with heterogeneous land cover and/or small-scale forest loss, is still too poor for deriving accurate quantifications of land cover following forest loss. PMID:28704510
Pendrill, Florence; Persson, U Martin
2017-01-01
While we know that deforestation in the tropics is increasingly driven by commercial agriculture, most tropical countries still lack recent and spatially-explicit assessments of the relative importance of pasture and cropland expansion in causing forest loss. Here we present a spatially explicit quantification of the extent to which cultivated land and grassland expanded at the expense of forests across Latin America in 2001-2011, by combining two "state-of-the-art" global datasets (Global Forest Change forest loss and GlobeLand30-2010 land cover). We further evaluate some of the limitations and challenges in doing this. We find that this approach does capture some of the major patterns of land cover following deforestation, with GlobeLand30-2010's Grassland class (which we interpret as pasture) being the most common land cover replacing forests across Latin America. However, our analysis also reveals some major limitations to combining these land cover datasets for quantifying pasture and cropland expansion into forest. First, a simple one-to-one translation between GlobeLand30-2010's Cultivated land and Grassland classes into cropland and pasture, respectively, should not be made without caution, as GlobeLand30-2010 defines its Cultivated land to include some pastures. Comparisons with the TerraClass dataset over the Brazilian Amazon and with previous literature indicate that Cultivated land in GlobeLand30-2010 includes notable amounts of pasture and other vegetation (e.g. in Paraguay and the Brazilian Amazon). This further suggests that the approach taken here generally leads to an underestimation (of up to ~60%) of the role of pasture in replacing forest. Second, a large share (~33%) of the Global Forest Change forest loss is found to still be forest according to GlobeLand30-2010 and our analysis suggests that the accuracy of the combined datasets, especially for areas with heterogeneous land cover and/or small-scale forest loss, is still too poor for deriving accurate quantifications of land cover following forest loss.
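The core operation in the study above is a raster overlay: intersect a forest-loss mask with a land-cover grid and tally which classes replaced forest. The sketch below illustrates that cross-tabulation on small synthetic arrays; the class codes and grids are purely illustrative stand-ins for the co-registered Global Forest Change and GlobeLand30-2010 rasters, not their actual legends or data.

```python
import numpy as np

# Illustrative class codes (not the official GlobeLand30 legend).
CULTIVATED, GRASSLAND, FOREST, OTHER = 10, 30, 20, 90

rng = np.random.default_rng(1)
# Synthetic co-registered grids: a 2001-2011 forest-loss mask and a 2010 land-cover map.
forest_loss = rng.random((100, 100)) < 0.15
land_cover_2010 = rng.choice([CULTIVATED, GRASSLAND, FOREST, OTHER], size=(100, 100))

# Cross-tabulate: which 2010 land cover occupies the pixels flagged as forest loss?
classes, counts = np.unique(land_cover_2010[forest_loss], return_counts=True)
shares = dict(zip(classes.tolist(), (counts / counts.sum()).round(3).tolist()))
print(shares)  # e.g., the fraction of loss pixels now mapped as Grassland ("pasture")
```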
NASA Astrophysics Data System (ADS)
Hakim, I.; May, D.; Abo Ras, M.; Meyendorf, N.; Donaldson, S.
2016-04-01
In the present work, samples of carbon fiber/epoxy composites with different void levels were fabricated using a hand layup vacuum bagging process by varying the pressure. Thermal nondestructive methods (thermal conductivity measurement, pulse thermography, pulse phase thermography, and lock-in thermography) and mechanical testing (mode I and mode II interlaminar fracture toughness) were conducted. Comparing the parameters resulting from the thermal nondestructive testing revealed that voids lead to reductions in thermal properties in all directions of the composites. The results of mode I and mode II interlaminar fracture toughness showed that voids lead to reductions in interlaminar fracture toughness. The parameters resulting from thermal nondestructive testing were correlated to the results of mode I and mode II interlaminar fracture toughness, and voids were quantified.
Baptiste Dafflon; Rusen Oktem; John Peterson; Craig Ulrich; Anh Phuong Tran; Vladimir Romanovsky; Susan Hubbard
2017-05-10
The dataset contains measurements obtained through electrical resistivity tomography (ERT) to monitor soil properties, pole-mounted optical cameras to monitor vegetation dynamics, point probes to measure soil temperature, and periodic manual measurements of thaw layer thickness, snow thickness and soil dielectric permittivity.
Within-population spatial synchrony in mast seeding of North American oaks.
A.V. Liebhold; M. Sork; O.N. Peltonen; Westfall R. Bjørnstad; J. Elkinton; M. H. J. Knops
2004-01-01
Mast seeding, the synchronous production of large crops of seeds, has been frequently documented in oak species. In this study we used several North American oak data-sets to quantify within-stand (10 km) synchrony in mast dynamics. Results indicated that intraspecific synchrony in seed production always exceeded interspecific synchrony and was essentially constant...
Spatio-temporal Eigenvector Filtering: Application on Bioenergy Crop Impacts
NASA Astrophysics Data System (ADS)
Wang, M.; Kamarianakis, Y.; Georgescu, M.
2017-12-01
A suite of 10-year ensemble-based simulations was conducted to investigate the hydroclimatic impacts due to large-scale deployment of perennial bioenergy crops across the continental United States. Given the large size of the simulated dataset (about 60Tb), traditional hierarchical spatio-temporal statistical modelling cannot be implemented for the evaluation of physics parameterizations and biofuel impacts. In this work, we propose a filtering algorithm that takes into account the spatio-temporal autocorrelation structure of the data while avoiding spatial confounding. This method is used to quantify the robustness of simulated hydroclimatic impacts associated with bioenergy crops to alternative physics parameterizations and observational datasets. Results are evaluated against those obtained from three alternative Bayesian spatio-temporal specifications.
Pantheon 1.0, a manually verified dataset of globally famous biographies.
Yu, Amy Zhao; Ronen, Shahar; Hu, Kevin; Lu, Tiffany; Hidalgo, César A
2016-01-05
We present the Pantheon 1.0 dataset: a manually verified dataset of individuals that have transcended linguistic, temporal, and geographic boundaries. The Pantheon 1.0 dataset includes the 11,341 biographies present in more than 25 languages in Wikipedia and is enriched with: (i) manually verified demographic information (place and date of birth, gender); (ii) a taxonomy of occupations classifying each biography at three levels of aggregation; and (iii) two measures of global popularity, including the number of languages in which a biography is present in Wikipedia (L) and the Historical Popularity Index (HPI), a metric that combines information on L, time since birth, and page-views (2008-2013). We compare the Pantheon 1.0 dataset to data from the 2003 book, Human Accomplishments, and also to external measures of accomplishment in individual games and sports: Tennis, Swimming, Car Racing, and Chess. In all of these cases we find that measures of popularity (L and HPI) correlate highly with individual accomplishment, suggesting that measures of global popularity proxy the historical impact of individuals.
Pantheon 1.0, a manually verified dataset of globally famous biographies
Yu, Amy Zhao; Ronen, Shahar; Hu, Kevin; Lu, Tiffany; Hidalgo, César A.
2016-01-01
We present the Pantheon 1.0 dataset: a manually verified dataset of individuals that have transcended linguistic, temporal, and geographic boundaries. The Pantheon 1.0 dataset includes the 11,341 biographies present in more than 25 languages in Wikipedia and is enriched with: (i) manually verified demographic information (place and date of birth, gender); (ii) a taxonomy of occupations classifying each biography at three levels of aggregation; and (iii) two measures of global popularity, including the number of languages in which a biography is present in Wikipedia (L) and the Historical Popularity Index (HPI), a metric that combines information on L, time since birth, and page-views (2008–2013). We compare the Pantheon 1.0 dataset to data from the 2003 book, Human Accomplishments, and also to external measures of accomplishment in individual games and sports: Tennis, Swimming, Car Racing, and Chess. In all of these cases we find that measures of popularity (L and HPI) correlate highly with individual accomplishment, suggesting that measures of global popularity proxy the historical impact of individuals. PMID:26731133
Selecting minimum dataset soil variables using PLSR as a regressive multivariate method
NASA Astrophysics Data System (ADS)
Stellacci, Anna Maria; Armenise, Elena; Castellini, Mirko; Rossi, Roberta; Vitti, Carolina; Leogrande, Rita; De Benedetto, Daniela; Ferrara, Rossana M.; Vivaldi, Gaetano A.
2017-04-01
Long-term field experiments and science-based tools that characterize soil status (namely the soil quality indices, SQIs) assume a strategic role in assessing the effect of agronomic techniques and thus in improving soil management, especially in marginal environments. Selecting key soil variables able to best represent soil status is a critical step for the calculation of SQIs. Current studies show the effectiveness of statistical methods for variable selection to extract relevant information deriving from multivariate datasets. Principal component analysis (PCA) has been mainly used; however, supervised multivariate methods and regressive techniques are progressively being evaluated (Armenise et al., 2013; de Paul Obade et al., 2016; Pulido Moncada et al., 2014). The present study explores the effectiveness of partial least square regression (PLSR) in selecting critical soil variables, using a dataset comparing conventional tillage and sod-seeding on durum wheat. The results were compared to those obtained using PCA and stepwise discriminant analysis (SDA). The soil data derived from a long-term field experiment in Southern Italy. On samples collected in April 2015, the following set of variables was quantified: (i) chemical: total organic carbon and nitrogen (TOC and TN), alkali-extractable C (TEC and humic substances - HA-FA), water extractable N and organic C (WEN and WEOC), Olsen extractable P, exchangeable cations, pH and EC; (ii) physical: texture, dry bulk density (BD), macroporosity (Pmac), air capacity (AC), and relative field capacity (RFC); (iii) biological: carbon of the microbial biomass quantified with the fumigation-extraction method. PCA and SDA were previously applied to the multivariate dataset (Stellacci et al., 2016). PLSR was carried out on mean-centered and variance-scaled data of predictors (soil variables) and response (wheat yield) variables using the PLS procedure of SAS/STAT. In addition, variable importance for projection (VIP) statistics were used to quantitatively assess the predictors most relevant for response variable estimation and then for variable selection (Andersen and Bro, 2010). PCA and SDA returned TOC and RFC as influential variables both on the sets of chemical and physical data analyzed separately and on the whole dataset (Stellacci et al., 2016). Highly weighted variables in PCA were also TEC, followed by K, and AC, followed by Pmac and BD, in the first PC (41.2% of total variance); Olsen P and HA-FA in the second PC (12.6%); and Ca in the third component (10.6%). Variables enabling maximum discrimination among treatments for SDA were WEOC, on the whole dataset, and humic substances, followed by Olsen P, EC and clay, in the separate data analyses. The highest PLS-VIP statistics were recorded for Olsen P and Pmac, followed by TOC, TEC, pH and Mg for the chemical variables, and clay, RFC and AC for the physical variables. Results show that different methods may provide different rankings of the selected variables, and the presence of a response variable in regressive techniques may affect variable selection. Further investigation with different response variables and with multi-year datasets would help to better define the advantages and limits of single or combined approaches.
Acknowledgment: The work was supported by the projects "BIOTILLAGE, innovative approaches for improving the environmental and productive performance of no-tillage cereal systems", financed by PSR-Basilicata 2007-2013, and "DESERT, Low-cost water desalination and sensor technology compact module", financed by ERANET-WATERWORKS 2014. References: Andersen C.M. and Bro R., 2010. Variable selection in regression - a tutorial. Journal of Chemometrics, 24:728-737. Armenise et al., 2013. Developing a soil quality index to compare soil fitness for agricultural use under different managements in the Mediterranean environment. Soil and Tillage Research, 130:91-98. de Paul Obade et al., 2016. A standardized soil quality index for diverse field conditions. Science of the Total Environment, 541:424-434. Pulido Moncada et al., 2014. Data-driven analysis of soil quality indicators using limited data. Geoderma, 235:271-278. Stellacci et al., 2016. Comparison of different multivariate methods to select key soil variables for soil quality indices computation. XLV Congress of the Italian Society of Agronomy (SIA), Sassari, 20-22 September 2016.
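The study above used the PLS procedure in SAS/STAT; purely as an illustration, the sketch below reproduces the same idea (PLSR followed by VIP-based variable selection) with scikit-learn on synthetic data. The VIP computation follows the standard formulation referenced in the abstract (Andersen and Bro, 2010); the data, the number of components, and the VIP > 1 cutoff are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

def vip_scores(pls: PLSRegression, X: np.ndarray) -> np.ndarray:
    """Variable importance in projection for a fitted single-response PLSR model."""
    T = pls.transform(X)            # X scores, shape (n_samples, n_components)
    W = pls.x_weights_              # shape (n_features, n_components)
    Q = pls.y_loadings_             # shape (1, n_components) for one response
    p = W.shape[0]
    ss_y = (T ** 2).sum(axis=0) * (Q[0] ** 2)      # y-variance explained per component
    W_norm = W / np.linalg.norm(W, axis=0)
    return np.sqrt(p * (W_norm ** 2) @ ss_y / ss_y.sum())

# Synthetic stand-in for the soil dataset: predictors (soil variables) and wheat yield.
rng = np.random.default_rng(2)
X = rng.normal(size=(40, 12))                       # e.g., TOC, TEC, Olsen P, clay, RFC, ...
y = X[:, 2] * 1.5 - X[:, 7] * 0.8 + rng.normal(scale=0.5, size=40)

pls = PLSRegression(n_components=3, scale=True).fit(X, y)   # mean-centred, variance-scaled
vip = vip_scores(pls, X)
selected = np.where(vip > 1.0)[0]                   # common rule of thumb: keep VIP > 1
print(vip.round(2), selected)
```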
NASA Technical Reports Server (NTRS)
1999-01-01
This volume contains abstracts that have been accepted for presentation at the Workshop on New Views of the Moon II: Understanding the Moon Through the Integration of Diverse Datasets, September 22-24, 1999, in Flagstaff, Arizona. The workshop conveners are Lisa Gaddis (U.S. Geological Survey, Flagstaff) and Charles K. Shearer (University of New Mexico). Color versions of some of the images contained in this volume are available on the meeting Web site (http://cass.jsc.nasa.gov/meetings/moon99/pdf/program.pdf).
Polling, C; Tulloch, A; Banerjee, S; Cross, S; Dutta, R; Wood, D M; Dargan, P I; Hotopf, M
2015-07-16
Self-harm is a significant public health concern in the UK. This is reflected in the recent addition to the English Public Health Outcomes Framework of rates of attendance at Emergency Departments (EDs) following self-harm. However there is currently no source of data to measure this outcome. Routinely available data for inpatient admissions following self-harm miss the majority of cases presenting to services. We aimed (i) to investigate whether a dataset of ED presentations could be produced using a combination of routinely collected clinical and administrative data and (ii) to validate this dataset against another one produced using methods similar to those used in previous studies. Using the Clinical Record Interactive Search system, the electronic health records (EHRs) used in four EDs were linked to Hospital Episode Statistics to create a dataset of attendances following self-harm. This dataset was compared with an audit dataset of ED attendances created by manual searching of ED records. The proportion of total cases detected by each dataset was compared. There were 1932 attendances detected by the EHR dataset and 1906 by the audit. The EHR and audit datasets detected 77% and 76% of all attendances respectively and both detected 82% of individual patients. There were no differences in terms of age, sex, ethnicity or marital status between those detected and those missed using the EHR method. Both datasets revealed more than double the number of self-harm incidents that could be identified from inpatient admission records. It was possible to use routinely collected EHR data to create a dataset of attendances at EDs following self-harm. The dataset detected the same proportion of attendances and individuals as the audit dataset, proved more comprehensive than the use of inpatient admission records, and did not show a systematic bias in those cases it missed.
Alcaraz-Segura, Domingo; Liras, Elisa; Tabik, Siham; Paruelo, José; Cabello, Javier
2010-01-01
Successive efforts have processed the Advanced Very High Resolution Radiometer (AVHRR) sensor archive to produce Normalized Difference Vegetation Index (NDVI) datasets (i.e., PAL, FASIR, GIMMS, and LTDR) under different corrections and processing schemes. Since NDVI datasets are used to evaluate carbon gains, differences among them may affect nations’ carbon budgets in meeting international targets (such as the Kyoto Protocol). This study addresses the consistency across AVHRR NDVI datasets in the Iberian Peninsula (Spain and Portugal) by evaluating whether their 1982–1999 NDVI trends show similar spatial patterns. Significant trends were calculated with the seasonal Mann-Kendall trend test and their spatial consistency with partial Mantel tests. Over 23% of the Peninsula (N, E, and central mountain ranges) showed positive and significant NDVI trends across the four datasets and an additional 18% across three datasets. In 20% of Iberia (SW quadrant), the four datasets exhibited an absence of significant trends and an additional 22% across three datasets. Significant NDVI decreases were scarce (croplands in the Guadalquivir and Segura basins, La Mancha plains, and Valencia). Spatial consistency of significant trends across at least three datasets was observed in 83% of the Peninsula, but it decreased to 47% when comparing across the four datasets. FASIR, PAL, and LTDR were the most spatially similar datasets, while GIMMS was the most different. The different performance of each AVHRR dataset in detecting significant NDVI trends (e.g., LTDR detected greater significant trends (both positive and negative) and in 32% more pixels than GIMMS) has great implications for evaluating carbon budgets. The lack of spatial consistency across NDVI datasets derived from the same AVHRR sensor archive makes it advisable to evaluate carbon gain trends using several satellite datasets and, where possible, independent/additional data sources for comparison. PMID:22205868
Alcaraz-Segura, Domingo; Liras, Elisa; Tabik, Siham; Paruelo, José; Cabello, Javier
2010-01-01
Successive efforts have processed the Advanced Very High Resolution Radiometer (AVHRR) sensor archive to produce Normalized Difference Vegetation Index (NDVI) datasets (i.e., PAL, FASIR, GIMMS, and LTDR) under different corrections and processing schemes. Since NDVI datasets are used to evaluate carbon gains, differences among them may affect nations' carbon budgets in meeting international targets (such as the Kyoto Protocol). This study addresses the consistency across AVHRR NDVI datasets in the Iberian Peninsula (Spain and Portugal) by evaluating whether their 1982-1999 NDVI trends show similar spatial patterns. Significant trends were calculated with the seasonal Mann-Kendall trend test and their spatial consistency with partial Mantel tests. Over 23% of the Peninsula (N, E, and central mountain ranges) showed positive and significant NDVI trends across the four datasets and an additional 18% across three datasets. In 20% of Iberia (SW quadrant), the four datasets exhibited an absence of significant trends and an additional 22% across three datasets. Significant NDVI decreases were scarce (croplands in the Guadalquivir and Segura basins, La Mancha plains, and Valencia). Spatial consistency of significant trends across at least three datasets was observed in 83% of the Peninsula, but it decreased to 47% when comparing across the four datasets. FASIR, PAL, and LTDR were the most spatially similar datasets, while GIMMS was the most different. The different performance of each AVHRR dataset in detecting significant NDVI trends (e.g., LTDR detected greater significant trends (both positive and negative) and in 32% more pixels than GIMMS) has great implications for evaluating carbon budgets. The lack of spatial consistency across NDVI datasets derived from the same AVHRR sensor archive makes it advisable to evaluate carbon gain trends using several satellite datasets and, where possible, independent/additional data sources for comparison.
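A minimal sketch of the per-pixel trend step described above, assuming the third-party pymannkendall package for the seasonal Mann-Kendall test; the NDVI stack is synthetic and the partial Mantel comparison of spatial consistency is not shown.

```python
import numpy as np
import pymannkendall as mk  # assumption: pymannkendall installed (provides seasonal_test)

rng = np.random.default_rng(3)
# Synthetic monthly NDVI stack for 1982-1999: (18 years * 12 months, rows, cols).
ndvi = rng.normal(0.4, 0.05, size=(18 * 12, 20, 20))

signif = np.zeros(ndvi.shape[1:], dtype=int)   # +1 increase, -1 decrease, 0 no trend
for i in range(ndvi.shape[1]):
    for j in range(ndvi.shape[2]):
        result = mk.seasonal_test(ndvi[:, i, j], period=12)
        if result.h:                           # significant at alpha = 0.05
            signif[i, j] = 1 if result.trend == "increasing" else -1

print((signif == 1).mean(), (signif == -1).mean())  # fraction of pixels with +/- trends
```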
Human neutral genetic variation and forensic STR data.
Silva, Nuno M; Pereira, Luísa; Poloni, Estella S; Currat, Mathias
2012-01-01
The forensic genetics field is generating extensive population data on polymorphism of short tandem repeats (STR) markers in globally distributed samples. In this study we explored and quantified the informative power of these datasets to address issues related to human evolution and diversity, by using two online resources: an allele frequency dataset representing 141 populations summing up to almost 26 thousand individuals; a genotype dataset consisting of 42 populations and more than 11 thousand individuals. We show that the genetic relationships between populations based on forensic STRs are best explained by geography, as observed when analysing other worldwide datasets generated specifically to study human diversity. However, the global level of genetic differentiation between populations (as measured by a fixation index) is about half the value estimated with those other datasets, which contain a much higher number of markers but much less individuals. We suggest that the main factor explaining this difference is an ascertainment bias in forensics data resulting from the choice of markers for individual identification. We show that this choice results in average low variance of heterozygosity across world regions, and hence in low differentiation among populations. Thus, the forensic genetic markers currently produced for the purpose of individual assignment and identification allow the detection of the patterns of neutral genetic structure that characterize the human population but they do underestimate the levels of this genetic structure compared to the datasets of STRs (or other kinds of markers) generated specifically to study the diversity of human populations.
NASA Astrophysics Data System (ADS)
Windham-Myers, L.; Holmquist, J. R.; Bergamaschi, B. A.; Byrd, K. B.; Callaway, J.; Crooks, S.; Drexler, J. Z.; Feagin, R. A.; Ferner, M. C.; Gonneea, M. E.; Kroeger, K. D.; Megonigal, P.; Morris, J. T.; Schile, L. M.; Simard, M.; Sutton-Grier, A.; Takekawa, J.; Troxler, T.; Weller, D.; Woo, I.
2015-12-01
Despite their high rates of long-term carbon (C) sequestration when compared to upland ecosystems, coastal C accounting is only recently receiving the attention of policy makers and carbon markets. Assessing accuracy and uncertainty in net C flux estimates requires both direct and derived measurements based on both short and long term dynamics in key drivers, particularly soil accretion rates and soil organic content. We are testing the ability of remote sensing products and national scale datasets to estimate biomass and soil stocks and fluxes over a wide range of spatial and temporal scales. For example, the 2013 Wetlands Supplement to the 2006 IPCC GHG national inventory reporting guidelines requests information on development of Tier I-III reporting, which express increasing levels of detail. We report progress toward development of a Carbon Monitoring System for "blue carbon" that may be useful for IPCC reporting guidelines at Tier II levels. Our project uses a current dataset of publically available and contributed field-based measurements to validate models of changing soil C stocks, across a broad range of U.S. tidal wetland types and landuse conversions. Additionally, development of biomass algorithms for both radar and spectral datasets will be tested and used to determine the "price of precision" of different satellite products. We discuss progress in calculating Tier II estimates focusing on variation introduced by the different input datasets. These include the USFWS National Wetlands Inventory, NOAA Coastal Change Analysis Program, and combinations to calculate tidal wetland area. We also assess the use of different attributes and depths from the USDA-SSURGO database to map soil C density. Finally, we examine the relative benefit of radar, spectral and hybrid approaches to biomass mapping in tidal marshes and mangroves. While the US currently plans to report GHG emissions at a Tier I level, we argue that a Tier II analysis is possible due to national maps of wetland area and soil carbon, as well as sediment accretion and sea-level rise correlations and wetland area change data. The uncertainty analyses performed nationally and in six regionally-representative "sentinel sites" will be an important guide for future efforts towards more accurate and complete wetland C inventories.
Toward Computational Cumulative Biology by Combining Models of Biological Datasets
Faisal, Ali; Peltonen, Jaakko; Georgii, Elisabeth; Rung, Johan; Kaski, Samuel
2014-01-01
A main challenge of data-driven sciences is how to make maximal use of the progressively expanding databases of experimental datasets in order to keep research cumulative. We introduce the idea of a modeling-based dataset retrieval engine designed for relating a researcher's experimental dataset to earlier work in the field. The search is (i) data-driven to enable new findings, going beyond the state of the art of keyword searches in annotations, (ii) modeling-driven, to include both biological knowledge and insights learned from data, and (iii) scalable, as it is accomplished without building one unified grand model of all data. Assuming each dataset has been modeled beforehand, by the researchers or automatically by database managers, we apply a rapidly computable and optimizable combination model to decompose a new dataset into contributions from earlier relevant models. By using the data-driven decomposition, we identify a network of interrelated datasets from a large annotated human gene expression atlas. While tissue type and disease were major driving forces for determining relevant datasets, the found relationships were richer, and the model-based search was more accurate than the keyword search; moreover, it recovered biologically meaningful relationships that are not straightforwardly visible from annotations—for instance, between cells in different developmental stages such as thymocytes and T-cells. Data-driven links and citations matched to a large extent; the data-driven links even uncovered corrections to the publication data, as two of the most linked datasets were not highly cited and turned out to have wrong publication entries in the database. PMID:25427176
Toward computational cumulative biology by combining models of biological datasets.
Faisal, Ali; Peltonen, Jaakko; Georgii, Elisabeth; Rung, Johan; Kaski, Samuel
2014-01-01
A main challenge of data-driven sciences is how to make maximal use of the progressively expanding databases of experimental datasets in order to keep research cumulative. We introduce the idea of a modeling-based dataset retrieval engine designed for relating a researcher's experimental dataset to earlier work in the field. The search is (i) data-driven to enable new findings, going beyond the state of the art of keyword searches in annotations, (ii) modeling-driven, to include both biological knowledge and insights learned from data, and (iii) scalable, as it is accomplished without building one unified grand model of all data. Assuming each dataset has been modeled beforehand, by the researchers or automatically by database managers, we apply a rapidly computable and optimizable combination model to decompose a new dataset into contributions from earlier relevant models. By using the data-driven decomposition, we identify a network of interrelated datasets from a large annotated human gene expression atlas. While tissue type and disease were major driving forces for determining relevant datasets, the found relationships were richer, and the model-based search was more accurate than the keyword search; moreover, it recovered biologically meaningful relationships that are not straightforwardly visible from annotations-for instance, between cells in different developmental stages such as thymocytes and T-cells. Data-driven links and citations matched to a large extent; the data-driven links even uncovered corrections to the publication data, as two of the most linked datasets were not highly cited and turned out to have wrong publication entries in the database.
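The abstract above describes decomposing a newly measured dataset into contributions from previously fitted models. Purely as an illustration of that idea, the sketch below expresses a new dataset's summary vector as a non-negative combination of per-dataset representation vectors via non-negative least squares. This is a generic stand-in on synthetic data, not the authors' actual probabilistic combination model; all array names are hypothetical.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(5)
# Hypothetical representations: one feature vector summarizing each archived dataset's model.
archive = rng.normal(size=(50, 200))        # 50 earlier datasets x 200 model features
# A "new" dataset built mostly from two of the archived ones, plus noise.
new_dataset = 0.6 * archive[4] + 0.3 * archive[17] + rng.normal(scale=0.05, size=200)

# Non-negative weights indicating how much each earlier model "explains" the new data.
weights, residual = nnls(archive.T, new_dataset)
top_matches = np.argsort(weights)[::-1][:5]  # indices of the most relevant earlier datasets
print(top_matches, weights[top_matches].round(2))
```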
REM-3D Reference Datasets: Reconciling large and diverse compilations of travel-time observations
NASA Astrophysics Data System (ADS)
Moulik, P.; Lekic, V.; Romanowicz, B. A.
2017-12-01
A three-dimensional Reference Earth model (REM-3D) should ideally represent the consensus view of long-wavelength heterogeneity in the Earth's mantle through the joint modeling of large and diverse seismological datasets. This requires reconciliation of datasets obtained using various methodologies and identification of consistent features. The goal of REM-3D datasets is to provide a quality-controlled and comprehensive set of seismic observations that would not only enable construction of REM-3D, but also allow identification of outliers and assist in more detailed studies of heterogeneity. The community response to data solicitation has been enthusiastic with several groups across the world contributing recent measurements of normal modes, (fundamental mode and overtone) surface waves, and body waves. We present results from ongoing work with body and surface wave datasets analyzed in consultation with a Reference Dataset Working Group. We have formulated procedures for reconciling travel-time datasets that include: (1) quality control for salvaging missing metadata; (2) identification of and reasons for discrepant measurements; (3) homogenization of coverage through the construction of summary rays; and (4) inversions of structure at various wavelengths to evaluate inter-dataset consistency. In consultation with the Reference Dataset Working Group, we retrieved the station and earthquake metadata in several legacy compilations and codified several guidelines that would facilitate easy storage and reproducibility. We find strong agreement between the dispersion measurements of fundamental-mode Rayleigh waves, particularly when made using supervised techniques. The agreement deteriorates substantially in surface-wave overtones, for which discrepancies vary with frequency and overtone number. A half-cycle band of discrepancies is attributed to reversed instrument polarities at a limited number of stations, which are not reflected in the instrument response history. By assessing inter-dataset consistency across similar paths, we quantify travel-time measurement errors for both surface and body waves. Finally, we discuss challenges associated with combining high-frequency (~1 Hz) and long-period (10-20 s) body-wave measurements into the REM-3D reference dataset.
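As a rough illustration of step (3), "homogenization of coverage through the construction of summary rays", the sketch below bins individual travel-time residuals by source and receiver location and averages within each bin. This is a generic stand-in on synthetic data; the column names, bin size, and averaging choice are assumptions, not the working group's actual procedure.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
n = 10_000
picks = pd.DataFrame({
    "src_lat": rng.uniform(-90, 90, n), "src_lon": rng.uniform(-180, 180, n),
    "rcv_lat": rng.uniform(-90, 90, n), "rcv_lon": rng.uniform(-180, 180, n),
    "residual_s": rng.normal(0.0, 1.5, n),      # synthetic travel-time residuals (s)
})

# Bin both endpoints onto a coarse grid (here 5 degrees) and average residuals per bin,
# so that densely sampled source-receiver paths do not dominate the inversion.
cell = 5.0
for col in ["src_lat", "src_lon", "rcv_lat", "rcv_lon"]:
    picks[col + "_bin"] = np.floor(picks[col] / cell).astype(int)

summary_rays = (picks
                .groupby([c + "_bin" for c in ["src_lat", "src_lon", "rcv_lat", "rcv_lon"]])
                .agg(mean_residual=("residual_s", "mean"), n_picks=("residual_s", "size"))
                .reset_index())
print(len(summary_rays), "summary rays from", n, "picks")
```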
A new global 1-km dataset of percentage tree cover derived from remote sensing
DeFries, R.S.; Hansen, M.C.; Townshend, J.R.G.; Janetos, A.C.; Loveland, Thomas R.
2000-01-01
Accurate assessment of the spatial extent of forest cover is a crucial requirement for quantifying the sources and sinks of carbon from the terrestrial biosphere. In the more immediate context of the United Nations Framework Convention on Climate Change, implementation of the Kyoto Protocol calls for estimates of carbon stocks for a baseline year as well as for subsequent years. Data sources from country level statistics and other ground-based information are based on varying definitions of 'forest' and are consequently problematic for obtaining spatially and temporally consistent carbon stock estimates. By combining two datasets previously derived from the Advanced Very High Resolution Radiometer (AVHRR) at 1 km spatial resolution, we have generated a prototype global map depicting percentage tree cover and associated proportions of trees with different leaf longevity (evergreen and deciduous) and leaf type (broadleaf and needleleaf). The product is intended for use in terrestrial carbon cycle models, in conjunction with other spatial datasets such as climate and soil type, to obtain more consistent and reliable estimates of carbon stocks. The percentage tree cover dataset is available through the Global Land Cover Facility at the University of Maryland at http://glcf.umiacs.umd.edu.
Variability of Upper-Tropospheric Precipitable Water from Satellite and Model Reanalysis Datasets
NASA Technical Reports Server (NTRS)
Jedlovec, Gary J.; Iwai, Hisaki
1999-01-01
Numerous datasets have been used to quantify water vapor and its variability in the upper-troposphere from satellite and model reanalysis data. These investigations have shown some usefulness in monitoring seasonal and inter-annual variations in moisture either globally, with polar orbiting satellite data or global model output analysis, or regionally, with the higher spatial and temporal resolution geostationary measurements. The datasets are not without limitations, however, due to coverage or limited temporal sampling, and may also contain bias in their representation of moisture processes. The research presented in this conference paper inter-compares the NVAP, NCEP/NCAR and DAO reanalysis models, and GOES satellite measurements of upper-tropospheric precipitable water for the period from 1988-1994. This period captures several dramatic swings in climate events associated with ENSO events. The data are evaluated for temporal and spatial continuity, inter-compared to assess reliability and potential bias, and analyzed in light of expected trends due to changes in precipitation and synoptic-scale weather features. This work is the follow-on to previous research which evaluated total precipitable water over the same period. The relationship between total and upper-level precipitable water in the datasets will be discussed as well.
New insights into the biogenesis of nuclear RNA polymerases?
Cloutier, Philippe; Coulombe, Benoit
2010-04-01
More than 30 years of research on nuclear RNA polymerases (RNAP I, II, and III) has uncovered numerous factors that regulate the activity of these enzymes during the transcription reaction. However, very little is known about the machinery that regulates the fate of RNAPs before or after transcription. In particular, the mechanisms of biogenesis of the 3 nuclear RNAPs, which comprise both common and specific subunits, remain mostly uncharacterized and the proteins involved are yet to be discovered. Using protein affinity purification coupled to mass spectrometry (AP-MS), we recently unraveled a high-density interaction network formed by nuclear RNAP subunits from the soluble fraction of human cell extracts. Validation of the dataset using a machine learning approach trained to minimize the rate of false positives and false negatives yielded a high-confidence dataset and uncovered novel interactors that regulate the RNAP II transcription machinery, including a set of proteins we named the RNAP II-associated proteins (RPAPs). One of the RPAPs, RPAP3, is part of an 11-subunit complex we termed the RPAP3/R2TP/prefoldin-like complex. Here, we review the literature on the subunits of this complex, which points to a role in nuclear RNAP biogenesis.
New insights into the biogenesis of nuclear RNA polymerases?
Cloutier, Philippe; Coulombe, Benoit
2015-01-01
More than 30 years of research on nuclear RNA polymerases (RNAP I, II, and III) has uncovered numerous factors that regulate the activity of these enzymes during the transcription reaction. However, very little is known about the machinery that regulates the fate of RNAPs before or after transcription. In particular, the mechanisms of biogenesis of the 3 nuclear RNAPs, which comprise both common and specific subunits, remain mostly uncharacterized and the proteins involved are yet to be discovered. Using protein affinity purification coupled to mass spectrometry (AP–MS), we recently unraveled a high-density interaction network formed by nuclear RNAP subunits from the soluble fraction of human cell extracts. Validation of the dataset using a machine learning approach trained to minimize the rate of false positives and false negatives yielded a high-confidence dataset and uncovered novel interactors that regulate the RNAP II transcription machinery, including a set of proteins we named the RNAP II-associated proteins (RPAPs). One of the RPAPs, RPAP3, is part of an 11-subunit complex we termed the RPAP3/R2TP/prefoldin-like complex. Here, we review the literature on the subunits of this complex, which points to a role in nuclear RNAP biogenesis. PMID:20453924
How to get statistically significant effects in any ERP experiment (and why you shouldn't).
Luck, Steven J; Gaspelin, Nicholas
2017-01-01
ERP experiments generate massive datasets, often containing thousands of values for each participant, even after averaging. The richness of these datasets can be very useful in testing sophisticated hypotheses, but this richness also creates many opportunities to obtain effects that are statistically significant but do not reflect true differences among groups or conditions (bogus effects). The purpose of this paper is to demonstrate how common and seemingly innocuous methods for quantifying and analyzing ERP effects can lead to very high rates of significant but bogus effects, with the likelihood of obtaining at least one such bogus effect exceeding 50% in many experiments. We focus on two specific problems: using the grand-averaged data to select the time windows and electrode sites for quantifying component amplitudes and latencies, and using one or more multifactor statistical analyses. Reanalyses of prior data and simulations of typical experimental designs are used to show how these problems can greatly increase the likelihood of significant but bogus results. Several strategies are described for avoiding these problems and for increasing the likelihood that significant effects actually reflect true differences among groups or conditions. © 2016 Society for Psychophysiological Research.
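A toy simulation, under stated assumptions, of the first problem described above: when pure-noise (null) data are smoothed to mimic ERP-like temporal autocorrelation and the measurement latency is chosen from the grand average of those same data, the "significant effect" rate rises far above the nominal 5%. The sample sizes and smoothing are arbitrary illustrative choices, not the paper's simulation settings.

```python
import numpy as np
from scipy import stats
from scipy.ndimage import uniform_filter1d

rng = np.random.default_rng(6)
n_experiments, n_subjects, n_timepoints, smooth = 1000, 20, 200, 25
false_positives = 0

for _ in range(n_experiments):
    # Null data: condition-difference waves containing only autocorrelated noise.
    diff = uniform_filter1d(rng.normal(size=(n_subjects, n_timepoints)), smooth, axis=1)
    grand_avg = diff.mean(axis=0)
    peak = np.abs(grand_avg).argmax()        # biased: latency chosen from the same data
    amplitude = diff[:, peak]                # per-subject amplitude at the chosen latency
    if stats.ttest_1samp(amplitude, 0.0).pvalue < 0.05:
        false_positives += 1

print(false_positives / n_experiments)       # well above the nominal 0.05
```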
Comparison of Five Modeling Approaches to Quantify and ...
A generally accepted value for the Radiation Amplification Factor (RAF), with respect to the erythemal action spectrum for sunburn of human skin, is −1.1, indicating that a 1.0% increase in stratospheric ozone leads to a 1.1% decrease in the biologically damaging UV radiation in the erythemal action spectrum reaching the Earth. The RAF is used to quantify the non-linear change in the biologically damaging UV radiation in the erythemal action spectrum as a function of total column ozone (O3). Spectrophotometer measurements recorded at ten US monitoring sites were used in this analysis, and over 71,000 total UVR measurement scans of the sky were collected at those ten sites between 1998 and 2000 to assess the RAF value. This UVR dataset was examined to determine the specific impact of clouds on the RAF. Five de novo modeling approaches were applied to the dataset, and the calculated RAF values ranged from a low of −0.80 to a high of −1.38. The overall aim was to determine the impact of clouds on the RAF, an indicator of how the amount of sunburn-relevant UV radiation reaching the Earth responds to changes in ozone.
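The RAF is commonly estimated as the exponent of a power-law relation between erythemally weighted UV and total column ozone, i.e. the slope of a log-log regression. The sketch below shows that simple estimate on synthetic clear-sky data; it is one generic approach, not any of the paper's five specific modeling approaches, and the numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
true_raf = -1.1
ozone_du = rng.uniform(250, 400, 500)                     # total column ozone (Dobson units)
# Synthetic clear-sky erythemal dose following UV ∝ O3**RAF, with multiplicative noise.
uv = 100.0 * (ozone_du / 300.0) ** true_raf * np.exp(rng.normal(0.0, 0.05, 500))

# RAF = d ln(UV) / d ln(O3): the slope of the log-log regression.
slope, intercept = np.polyfit(np.log(ozone_du), np.log(uv), 1)
print(round(slope, 2))                                    # recovers roughly -1.1
```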
Data Used in Quantified Reliability Models
NASA Technical Reports Server (NTRS)
DeMott, Diana; Kleinhammer, Roger K.; Kahn, C. J.
2014-01-01
Data are the crux of developing quantitative risk and reliability models; without data there is no quantification. The means to find and identify reliability data or failure numbers to quantify fault tree models during conceptual and design phases is often the quagmire that precludes early decision makers' consideration of potential risk drivers that will influence design. The analyst tasked with addressing a system or product reliability depends on the availability of data. But where does that data come from, and what does it really apply to? Commercial industries, government agencies, and other international sources might have available data similar to what you are looking for. In general, internal and external technical reports and data based on similar and dissimilar equipment are often the first and only places checked. A common philosophy is "I have a number - that is good enough". But is it? Have you ever considered the difference in reported data from various federal datasets and technical reports when compared to similar sources from national and/or international datasets? Just how well does your data compare? Understanding how the reported data was derived, and interpreting the information and details associated with the data, is as important as the data itself.
Arsalan, Muhammad; Naqvi, Rizwan Ali; Kim, Dong Seop; Nguyen, Phong Ha; Owais, Muhammad; Park, Kang Ryoung
2018-01-01
The recent advancements in computer vision have opened new horizons for deploying biometric recognition algorithms in mobile and handheld devices. Similarly, iris recognition is now much needed, with high accuracy, in unconstrained scenarios. These environments make the acquired iris image exhibit occlusion, low resolution, blur, unusual glint, ghost effect, and off-angles. The prevailing segmentation algorithms cannot cope with these constraints. In addition, owing to the unavailability of near-infrared (NIR) light, iris recognition in visible-light environments makes segmentation challenging because of visible-light noise. Deep learning with convolutional neural networks (CNN) has brought a considerable breakthrough in various applications. To address the iris segmentation issues in challenging situations with visible light and near-infrared light camera sensors, this paper proposes a densely connected fully convolutional network (IrisDenseNet), which can determine the true iris boundary even with inferior-quality images by using better information gradient flow between the dense blocks. In the experiments conducted, five datasets of visible light and NIR environments were used. For the visible light environment, the noisy iris challenge evaluation part-II (NICE-II, selected from the UBIRIS.v2 database) and mobile iris challenge evaluation (MICHE-I) datasets were used. For the NIR environment, the institute of automation, Chinese academy of sciences (CASIA) v4.0 interval, CASIA v4.0 distance, and IIT Delhi v1.0 iris datasets were used. Experimental results showed the optimal segmentation of the proposed IrisDenseNet and its excellent performance over existing algorithms for all five datasets. PMID:29748495
Arsalan, Muhammad; Naqvi, Rizwan Ali; Kim, Dong Seop; Nguyen, Phong Ha; Owais, Muhammad; Park, Kang Ryoung
2018-05-10
The recent advancements in computer vision have opened new horizons for deploying biometric recognition algorithms in mobile and handheld devices. Similarly, iris recognition is now much needed, with high accuracy, in unconstrained scenarios. These environments make the acquired iris image exhibit occlusion, low resolution, blur, unusual glint, ghost effect, and off-angles. The prevailing segmentation algorithms cannot cope with these constraints. In addition, owing to the unavailability of near-infrared (NIR) light, iris recognition in visible-light environments makes segmentation challenging because of visible-light noise. Deep learning with convolutional neural networks (CNN) has brought a considerable breakthrough in various applications. To address the iris segmentation issues in challenging situations with visible light and near-infrared light camera sensors, this paper proposes a densely connected fully convolutional network (IrisDenseNet), which can determine the true iris boundary even with inferior-quality images by using better information gradient flow between the dense blocks. In the experiments conducted, five datasets of visible light and NIR environments were used. For the visible light environment, the noisy iris challenge evaluation part-II (NICE-II, selected from the UBIRIS.v2 database) and mobile iris challenge evaluation (MICHE-I) datasets were used. For the NIR environment, the institute of automation, Chinese academy of sciences (CASIA) v4.0 interval, CASIA v4.0 distance, and IIT Delhi v1.0 iris datasets were used. Experimental results showed the optimal segmentation of the proposed IrisDenseNet and its excellent performance over existing algorithms for all five datasets.
High resolution global gridded data for use in population studies
NASA Astrophysics Data System (ADS)
Lloyd, Christopher T.; Sorichetta, Alessandro; Tatem, Andrew J.
2017-01-01
Recent years have seen substantial growth in openly available satellite and other geospatial data layers, which represent a range of metrics relevant to global human population mapping at fine spatial scales. The specifications of such data differ widely and therefore the harmonisation of data layers is a prerequisite to constructing detailed and contemporary spatial datasets which accurately describe population distributions. Such datasets are vital to measure impacts of population growth, monitor change, and plan interventions. To this end the WorldPop Project has produced an open access archive of 3 and 30 arc-second resolution gridded data. Four tiled raster datasets form the basis of the archive: (i) Viewfinder Panoramas topography clipped to Global ADMinistrative area (GADM) coastlines; (ii) a matching ISO 3166 country identification grid; (iii) country area; and (iv) a slope layer. Further layers include transport networks, landcover, nightlights, precipitation, travel time to major cities, and waterways. Datasets and production methodology are here described. The archive can be downloaded both from the WorldPop Dataverse Repository and the WorldPop Project website.
High resolution global gridded data for use in population studies.
Lloyd, Christopher T; Sorichetta, Alessandro; Tatem, Andrew J
2017-01-31
Recent years have seen substantial growth in openly available satellite and other geospatial data layers, which represent a range of metrics relevant to global human population mapping at fine spatial scales. The specifications of such data differ widely and therefore the harmonisation of data layers is a prerequisite to constructing detailed and contemporary spatial datasets which accurately describe population distributions. Such datasets are vital to measure impacts of population growth, monitor change, and plan interventions. To this end the WorldPop Project has produced an open access archive of 3 and 30 arc-second resolution gridded data. Four tiled raster datasets form the basis of the archive: (i) Viewfinder Panoramas topography clipped to Global ADMinistrative area (GADM) coastlines; (ii) a matching ISO 3166 country identification grid; (iii) country area; and (iv) a slope layer. Further layers include transport networks, landcover, nightlights, precipitation, travel time to major cities, and waterways. Datasets and production methodology are here described. The archive can be downloaded both from the WorldPop Dataverse Repository and the WorldPop Project website.
NASA Astrophysics Data System (ADS)
DeLong, S. B.; Avdievitch, N. N.
2014-12-01
As high-resolution topographic data become increasingly available, comparison of multitemporal and disparate datasets (e.g. airborne and terrestrial lidar) enables high-accuracy quantification of landscape change and detailed mapping of surface processes. However, if these data are not properly managed and aligned with maximum precision, results may be spurious. Often this is due to slight differences in coordinate systems that require complex geographic transformations and systematic error that is difficult to diagnose and correct. Here we present an analysis of four airborne and three terrestrial lidar datasets collected between 2003 and 2014 that we use to quantify change at an active earthflow in Mill Gulch, Sonoma County, California. We first identify and address systematic error internal to each dataset, such as registration offset between flight lines or scan positions. We then use a variant of an iterative closest point (ICP) algorithm to align point cloud data by maximizing use of stable portions of the landscape with minimal internal error. Using products derived from the aligned point clouds, we carry out our geomorphic analyses. These methods may be especially useful for change detection analyses in which accurate georeferencing is unavailable, as is often the case with some terrestrial lidar or "structure from motion" data. Our results show that the Mill Gulch earthflow has been active throughout the study period. We see continuous downslope flow, ongoing incorporation of new hillslope material into the flow, sediment loss from hillslopes, episodic fluvial erosion of the earthflow toe, and an indication of increased activity during periods of high precipitation.
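A minimal sketch of point-cloud alignment by ICP, assuming the open-source Open3D library. The study used its own ICP variant that emphasizes stable terrain with minimal internal error; the code below is only a generic stand-in on synthetic clouds, and the parameter values (correspondence distance, normal-estimation radius) are illustrative.

```python
import numpy as np
import open3d as o3d  # assumption: Open3D is installed

# Synthetic stand-ins for two surveys of the same stable terrain patch:
# a rough surface, and a copy shifted by a small rigid-body offset to be recovered.
rng = np.random.default_rng(8)
xy = rng.uniform(0.0, 50.0, size=(20000, 2))
z = 0.5 * np.sin(xy[:, 0] / 5.0) + 0.02 * rng.normal(size=len(xy))
pts = np.column_stack([xy, z])

target = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(pts))
offset = np.eye(4)
offset[:3, 3] = [0.8, -0.4, 0.1]                       # "unknown" survey-to-survey shift
source = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(pts)).transform(offset)

for pc in (source, target):                            # normals needed for point-to-plane ICP
    pc.estimate_normals(o3d.geometry.KDTreeSearchParamHybrid(radius=2.0, max_nn=30))

result = o3d.pipelines.registration.registration_icp(
    source, target, 1.5, np.eye(4),
    o3d.pipelines.registration.TransformationEstimationPointToPlane())
print(result.fitness, result.inlier_rmse)
print(result.transformation)                           # approximately the inverse of `offset`
```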
IMG/M: integrated genome and metagenome comparative data analysis system
Chen, I-Min A.; Markowitz, Victor M.; Chu, Ken; ...
2016-10-13
The Integrated Microbial Genomes with Microbiome Samples (IMG/M: https://img.jgi.doe.gov/m/) system contains annotated DNA and RNA sequence data of (i) archaeal, bacterial, eukaryotic and viral genomes from cultured organisms, (ii) single cell genomes (SCG) and genomes from metagenomes (GFM) from uncultured archaea, bacteria and viruses and (iii) metagenomes from environmental, host associated and engineered microbiome samples. Sequence data are generated by DOE's Joint Genome Institute (JGI), submitted by individual scientists, or collected from public sequence data archives. Structural and functional annotation is carried out by JGI's genome and metagenome annotation pipelines. A variety of analytical and visualization tools provide support for examining and comparing IMG/M's datasets. IMG/M allows open access interactive analysis of publicly available datasets, while manual curation, submission and access to private datasets and computationally intensive workspace-based analysis require login/password access to its expert review (ER) companion system (IMG/M ER: https://img.jgi.doe.gov/mer/). Since the last report published in the 2014 NAR Database Issue, IMG/M's dataset content has tripled in terms of number of datasets and overall protein coding genes, while its analysis tools have been extended to cope with the rapid growth in the number and size of datasets handled by the system.
IMG/M: integrated genome and metagenome comparative data analysis system
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, I-Min A.; Markowitz, Victor M.; Chu, Ken
The Integrated Microbial Genomes with Microbiome Samples (IMG/M: https://img.jgi.doe.gov/m/) system contains annotated DNA and RNA sequence data of (i) archaeal, bacterial, eukaryotic and viral genomes from cultured organisms, (ii) single cell genomes (SCG) and genomes from metagenomes (GFM) from uncultured archaea, bacteria and viruses and (iii) metagenomes from environmental, host associated and engineered microbiome samples. Sequence data are generated by DOE's Joint Genome Institute (JGI), submitted by individual scientists, or collected from public sequence data archives. Structural and functional annotation is carried out by JGI's genome and metagenome annotation pipelines. A variety of analytical and visualization tools provide support for examining and comparing IMG/M's datasets. IMG/M allows open access interactive analysis of publicly available datasets, while manual curation, submission and access to private datasets and computationally intensive workspace-based analysis require login/password access to its expert review (ER) companion system (IMG/M ER: https://img.jgi.doe.gov/mer/). Since the last report published in the 2014 NAR Database Issue, IMG/M's dataset content has tripled in terms of number of datasets and overall protein coding genes, while its analysis tools have been extended to cope with the rapid growth in the number and size of datasets handled by the system.
IMG/M: integrated genome and metagenome comparative data analysis system
Chen, I-Min A.; Markowitz, Victor M.; Chu, Ken; Palaniappan, Krishna; Szeto, Ernest; Pillay, Manoj; Ratner, Anna; Huang, Jinghua; Andersen, Evan; Huntemann, Marcel; Varghese, Neha; Hadjithomas, Michalis; Tennessen, Kristin; Nielsen, Torben; Ivanova, Natalia N.; Kyrpides, Nikos C.
2017-01-01
The Integrated Microbial Genomes with Microbiome Samples (IMG/M: https://img.jgi.doe.gov/m/) system contains annotated DNA and RNA sequence data of (i) archaeal, bacterial, eukaryotic and viral genomes from cultured organisms, (ii) single cell genomes (SCG) and genomes from metagenomes (GFM) from uncultured archaea, bacteria and viruses and (iii) metagenomes from environmental, host associated and engineered microbiome samples. Sequence data are generated by DOE's Joint Genome Institute (JGI), submitted by individual scientists, or collected from public sequence data archives. Structural and functional annotation is carried out by JGI's genome and metagenome annotation pipelines. A variety of analytical and visualization tools provide support for examining and comparing IMG/M's datasets. IMG/M allows open access interactive analysis of publicly available datasets, while manual curation, submission and access to private datasets and computationally intensive workspace-based analysis require login/password access to its expert review (ER) companion system (IMG/M ER: https://img.jgi.doe.gov/mer/). Since the last report published in the 2014 NAR Database Issue, IMG/M's dataset content has tripled in terms of number of datasets and overall protein coding genes, while its analysis tools have been extended to cope with the rapid growth in the number and size of datasets handled by the system. PMID:27738135
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wolf, Rachel C.; D’Andrea, Chris B.; Gupta, Ravi R.
2016-04-20
Using the largest single-survey sample of Type Ia supernovae (SNe Ia) to date, we study the relationship between properties of SNe Ia and those of their host galaxies, focusing primarily on correlations with Hubble residuals (HR). Our sample consists of 345 photometrically-classified or spectroscopically-confirmed SNe Ia discovered as part of the SDSS-II Supernova Survey (SDSS-SNS). This analysis utilizes host-galaxy spectroscopy obtained during the SDSS-I/II spectroscopic survey and from an ancillary program on the SDSS-III Baryon Oscillation Spectroscopic Survey (BOSS) that obtained spectra for nearly all host galaxies of SDSS-II SN candidates. In addition, we use photometric host-galaxy properties from the SDSS-SNS data release (Sako et al. 2014), such as host stellar mass and star-formation rate. We confirm the well-known relation between HR and host-galaxy mass and find a 3.6σ significance of a non-zero linear slope. We also recover correlations between HR and host-galaxy gas-phase metallicity and specific star-formation rate as they are reported in the literature. With our large dataset, we examine correlations between HR and multiple host-galaxy properties simultaneously and find no evidence of a significant correlation. We also independently analyze our spectroscopically-confirmed and photometrically-classified SNe Ia and comment on the significance of similar combined datasets for future surveys.
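A minimal sketch of the basic host-mass step described above: fit a linear relation between Hubble residuals and log host stellar mass with inverse-variance weights, and express the slope's deviation from zero in sigma. The data are synthetic and the model is deliberately simple; the paper's full analysis (covariates, classification uncertainty) is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 345
log_mass = rng.normal(10.2, 0.7, n)                  # log10(M*/Msun), synthetic
hr_err = rng.uniform(0.08, 0.20, n)                  # per-SN Hubble-residual uncertainty (mag)
hubble_resid = -0.03 * (log_mass - 10.0) + rng.normal(0.0, hr_err)

# Weighted least squares: HR = a + b * (logM - 10), with significance of slope b.
w = 1.0 / hr_err**2
A = np.vstack([np.ones(n), log_mass - 10.0]).T
cov = np.linalg.inv(A.T @ (A * w[:, None]))          # parameter covariance (A^T W A)^-1
a, b = cov @ A.T @ (w * hubble_resid)
b_err = np.sqrt(cov[1, 1])
print(b, b_err, abs(b) / b_err)                      # slope, its error, significance in sigma
```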
Network analysis of mesoscale optical recordings to assess regional, functional connectivity.
Lim, Diana H; LeDue, Jeffrey M; Murphy, Timothy H
2015-10-01
With modern optical imaging methods, it is possible to map structural and functional connectivity. Optical imaging studies that aim to describe large-scale neural connectivity often need to handle large and complex datasets. In order to interpret these datasets, new methods for analyzing structural and functional connectivity are being developed. Recently, network analysis, based on graph theory, has been used to describe and quantify brain connectivity in both experimental and clinical studies. We outline how to apply regional, functional network analysis to mesoscale optical imaging using voltage-sensitive-dye imaging and channelrhodopsin-2 stimulation in a mouse model. We include links to sample datasets and an analysis script. The analyses we employ can be applied to other types of fluorescence wide-field imaging, including genetically encoded calcium indicators, to assess network properties. We discuss the benefits and limitations of using network analysis for interpreting optical imaging data and define network properties that may be used to compare across preparations or other manipulations such as animal models of disease.
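A minimal sketch of the regional functional-network step described above: correlate regional time courses, threshold the correlation matrix into a graph, and compute standard graph metrics, here with NumPy and NetworkX rather than the authors' published analysis script. The signals are synthetic and the threshold is an illustrative assumption.

```python
import numpy as np
import networkx as nx  # assumption: NetworkX installed

rng = np.random.default_rng(10)
n_regions, n_frames = 12, 3000
signals = rng.normal(size=(n_regions, n_frames))         # stand-in for regional dF/F traces
signals[1] += 0.6 * signals[0]                            # inject some shared signal

corr = np.corrcoef(signals)                               # functional connectivity matrix
np.fill_diagonal(corr, 0.0)
adjacency = (np.abs(corr) > 0.3).astype(float)            # binarize at an illustrative threshold

g = nx.from_numpy_array(adjacency)
metrics = {
    "degree": dict(g.degree()),
    "clustering": nx.average_clustering(g),
    "density": nx.density(g),
}
print(metrics)
```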
A database of marine phytoplankton abundance, biomass and species composition in Australian waters
NASA Astrophysics Data System (ADS)
Davies, Claire H.; Coughlan, Alex; Hallegraeff, Gustaaf; Ajani, Penelope; Armbrecht, Linda; Atkins, Natalia; Bonham, Prudence; Brett, Steve; Brinkman, Richard; Burford, Michele; Clementson, Lesley; Coad, Peter; Coman, Frank; Davies, Diana; Dela-Cruz, Jocelyn; Devlin, Michelle; Edgar, Steven; Eriksen, Ruth; Furnas, Miles; Hassler, Christel; Hill, David; Holmes, Michael; Ingleton, Tim; Jameson, Ian; Leterme, Sophie C.; Lønborg, Christian; McLaughlin, James; McEnnulty, Felicity; McKinnon, A. David; Miller, Margaret; Murray, Shauna; Nayar, Sasi; Patten, Renee; Pritchard, Tim; Proctor, Roger; Purcell-Meyerink, Diane; Raes, Eric; Rissik, David; Ruszczyk, Jason; Slotwinski, Anita; Swadling, Kerrie M.; Tattersall, Katherine; Thompson, Peter; Thomson, Paul; Tonks, Mark; Trull, Thomas W.; Uribe-Palomino, Julian; Waite, Anya M.; Yauwenas, Rouna; Zammit, Anthony; Richardson, Anthony J.
2016-06-01
There have been many individual phytoplankton datasets collected across Australia since the mid 1900s, but most are unavailable to the research community. We have searched archives, contacted researchers, and scanned the primary and grey literature to collate 3,621,847 records of marine phytoplankton species from Australian waters from 1844 to the present. Many of these are small datasets collected for local questions, but combined they provide over 170 years of data on phytoplankton communities in Australian waters. Units and taxonomy have been standardised, obviously erroneous data removed, and all metadata included. We have lodged this dataset with the Australian Ocean Data Network (http://portal.aodn.org.au/) allowing public access. The Australian Phytoplankton Database will be invaluable for global change studies, as it allows analysis of ecological indicators of climate change and eutrophication (e.g., changes in distribution; diatom:dinoflagellate ratios). In addition, the standardised conversion of abundance records to biomass provides modellers with quantifiable data to initialise and validate ecosystem models of lower marine trophic levels.
Quantifying fossil fuel CO2 from continuous measurements of APO: a novel approach
NASA Astrophysics Data System (ADS)
Pickers, Penelope; Manning, Andrew C.; Forster, Grant L.; van der Laan, Sander; Wilson, Phil A.; Wenger, Angelina; Meijer, Harro A. J.; Oram, David E.; Sturges, William T.
2016-04-01
Using atmospheric measurements to accurately quantify CO2 emissions from fossil fuel sources requires the separation of biospheric and anthropogenic CO2 fluxes. The ability to quantify the fossil fuel component of CO2 (ffCO2) from atmospheric measurements enables more accurate 'top-down' verification of CO2 emissions inventories, which frequently have large uncertainty. Typically, ffCO2 is quantified (in ppm units) from discrete atmospheric measurements of Δ14CO2, combined with higher resolution atmospheric CO measurements, and with knowledge of CO:ffCO2 ratios. In the United Kingdom (UK), however, measurements of Δ14CO2 are often significantly biased by nuclear power plant influences, which limit the use of this approach. We present a novel approach for quantifying ffCO2 using measurements of APO (Atmospheric Potential Oxygen; a tracer derived from concurrent measurements of CO2 and O2) from two measurement sites in Norfolk, UK. Our approach is similar to that used for quantifying ffCO2 from CO measurements (ffCO2(CO)), whereby ffCO2(APO) = (APOmeas - APObg)/RAPO, where (APOmeas - APObg) is the APO deviation from the background, and RAPO is the APO:CO2 combustion ratio for fossil fuel. Time varying values of RAPO are calculated from the global gridded COFFEE (CO2 release and Oxygen uptake from Fossil Fuel Emission Estimate) dataset, combined with NAME (Numerical Atmospheric-dispersion Modelling Environment) transport model footprints. We compare our ffCO2(APO) results to results obtained using the ffCO2(CO) method, using CO:CO2 fossil fuel emission ratios (RCO) from the EDGAR (Emission Database for Global Atmospheric Research) database. We find that the APO ffCO2 quantification method is more precise than the CO method, owing primarily to a smaller range of possible APO:CO2 fossil fuel emission ratios, compared to the CO:CO2 emission ratio range. Using a long-term dataset of atmospheric O2, CO2, CO and Δ14CO2 from Lutjewad, The Netherlands, we examine the accuracy of our ffCO2(APO) method, and assess the potential of using APO to quantify ffCO2 independently from Δ14CO2 measurements, which, as well as being unreliable in many UK regions, are very costly. Using APO to quantify ffCO2 has significant policy relevance, with the potential to provide more accurate and more precise top-down verification of fossil fuel emissions.
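The core relation quoted above, ffCO2(APO) = (APOmeas - APObg)/RAPO, can be illustrated with a few lines of code. The numerical values below (APO in per meg, a negative APO:CO2 combustion ratio) are illustrative placeholders, not measurements from the Norfolk sites or values from the COFFEE dataset.

```python
# Minimal sketch of the stated relation ffCO2(APO) = (APOmeas - APObg) / R_APO.
# All numbers are hypothetical placeholders for illustration only.
import numpy as np

apo_meas = np.array([-310.0, -318.5, -325.2])    # per meg, hypothetical observations
apo_bg = -305.0                                   # per meg, hypothetical background
r_apo_per_ppm = -1.1                              # per meg per ppm, hypothetical APO:CO2 combustion ratio

ffco2_ppm = (apo_meas - apo_bg) / r_apo_per_ppm   # ppm of fossil-fuel-derived CO2
print(ffco2_ppm)                                  # -> roughly [4.5, 12.3, 18.4] ppm
```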
NASA Astrophysics Data System (ADS)
Bremer, Magnus; Sass, Oliver; Vetter, Michael; Geilhausen, Martin
2010-05-01
Country-wide ALS datasets of high resolution become more and more available and can provide a solid basis for geomorphological research. On the other hand, terrain changes after geomorphological extreme events can be quickly and flexibly documented by TLS and be compared to the pre-existing ALS datasets. For quantifying net-erosion, net-sedimentation and transport rates of events like rock falls, landslides and debris flows, comparing TLS surveys after the event to ALS data before the event is likely to become a widespread and powerful tool. However, the accuracy and possible errors of fitting ALS and TLS data have to be carefully assessed. We tried to quantify sediment movement and terrain changes caused by a major debris-flow event in the Halltal in the Karwendel Mountains (Tyrol, Austria). Wide areas of limestone debris were dissected and relocated in the course of an exceptional rainstorm event on 29th June 2008. The event occurred 64 years after wildfire-driven deforestation. In the area, dense dwarf pine (Pinus mugo) shrub cover is widespread, causing specific problems in generating terrain models. We compared a pre-event ALS dataset, provided by the federal state of Tyrol, and a post-event TLS survey. The two scanner systems have differing system characteristics (scan angles, resolutions, application of dGPS, etc.), causing different systematic and random errors. Combining TLS and ALS point data was achieved using an algorithm of the RISCAN_PRO software (Multi Station Adjustment), enabling a least-squares fitting between the two surfaces. Adjustment and registration accuracies as well as the quality of applied vegetation filters, mainly eliminating non-ground points from the raw data, are crucial for the generation of high-quality terrain models and a reliable comparison of the two data sets. Readily available filter algorithms provide good performance for gently sloped terrain and high forest vegetation. However, the low krummholz vegetation on steep terrain proved difficult to filter. This is due to a small height difference between terrain and canopy, a very strong height variation of the terrain points compared to the height variation of the canopy points, and a very high density of the vegetation. The latter leads to very low percentages of ground points (1-5%). A combined filtering approach using a surface-based filter and a morphological filter, adapted to the characteristics of the krummholz vegetation, was applied to overcome these problems. In the next step, the datasets were compared, and erosion and sedimentation areas were detected and quantified (cut-and-fill) in view of the accuracy achieved. The positions of the relocated surface areas were compared to the morphological structures of the initial surface (inclination, curvature, flowpaths, hydrological catchments). Considerable deviations between the datasets were caused, besides the geomorphic terrain changes, by systematic and random errors. Due to the scanner perspective, parts of the steep slopes are depicted inaccurately by ALS. Rugged terrain surfaces cause random errors of ALS/TLS adjustment when the ratio of point density to surface variability is low. Due to multiple returns and alteration of pulse shape, terrain altitude is frequently overestimated when dense shrub cover is present. This effect becomes stronger with larger footprints. Despite these problems, erosional and depositional areas of debris flows could be clearly identified and match the results of field surveys.
Strongest erosion occurred along the flowpaths with the greatest runoff concentration, mainly at the bedrock-debris interface.
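The cut-and-fill quantification step mentioned above can be sketched as a simple DEM-of-difference calculation. The grids, cell size, and level-of-detection threshold below are assumptions for illustration; this is not the authors' workflow.

```python
# Illustrative sketch: quantify erosion ("cut") and deposition ("fill") by
# differencing co-registered pre-event ALS and post-event TLS DEMs.
# Grids, cell size and detection threshold are hypothetical.
import numpy as np

rng = np.random.default_rng(1)
cell_size = 1.0                                                   # m, DEM resolution
dem_pre = rng.uniform(1500, 1520, size=(200, 200))                # hypothetical ALS DEM (m a.s.l.)
dem_post = dem_pre + rng.normal(0.0, 0.3, size=dem_pre.shape)     # hypothetical TLS DEM

dod = dem_post - dem_pre                                          # DEM of difference
lod = 0.2                                                         # m, level of detection from registration accuracy
dod[np.abs(dod) < lod] = 0.0                                      # ignore changes below the error budget

cell_area = cell_size ** 2
erosion_m3 = -dod[dod < 0].sum() * cell_area                      # cut volume
deposition_m3 = dod[dod > 0].sum() * cell_area                    # fill volume
print(f"erosion: {erosion_m3:.1f} m^3, deposition: {deposition_m3:.1f} m^3")
```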
Kaushik, Abhinav; Ali, Shakir; Gupta, Dinesh
2017-01-01
Gene connection rewiring is an essential feature of gene network dynamics. Apart from its normal functional role, it may also lead to dysregulated functional states by disturbing pathway homeostasis. Very few computational tools measure rewiring within gene co-expression and its corresponding regulatory networks in order to identify and prioritize altered pathways which may or may not be differentially regulated. We have developed Altered Pathway Analyzer (APA), a microarray dataset analysis tool for identification and prioritization of altered pathways, including those which are differentially regulated by TFs, by quantifying rewired sub-network topology. Moreover, APA also helps in re-prioritization of APA shortlisted altered pathways enriched with context-specific genes. We performed APA analysis of simulated datasets and p53 status NCI-60 cell line microarray data to demonstrate potential of APA for identification of several case-specific altered pathways. APA analysis reveals several altered pathways not detected by other tools evaluated by us. APA analysis of unrelated prostate cancer datasets identifies sample-specific as well as conserved altered biological processes, mainly associated with lipid metabolism, cellular differentiation and proliferation. APA is designed as a cross platform tool which may be transparently customized to perform pathway analysis in different gene expression datasets. APA is freely available at http://bioinfo.icgeb.res.in/APA. PMID:28084397
CLIC, a tool for expanding biological pathways based on co-expression across thousands of datasets
Li, Yang; Liu, Jun S.; Mootha, Vamsi K.
2017-01-01
In recent years, there has been a huge rise in the number of publicly available transcriptional profiling datasets. These massive compendia comprise billions of measurements and provide a special opportunity to predict the function of unstudied genes based on co-expression to well-studied pathways. Such analyses can be very challenging, however, since biological pathways are modular and may exhibit co-expression only in specific contexts. To overcome these challenges we introduce CLIC, CLustering by Inferred Co-expression. CLIC accepts as input a pathway consisting of two or more genes. It then uses a Bayesian partition model to simultaneously partition the input gene set into coherent co-expressed modules (CEMs), while assigning the posterior probability for each dataset in support of each CEM. CLIC then expands each CEM by scanning the transcriptome for additional co-expressed genes, quantified by an integrated log-likelihood ratio (LLR) score weighted for each dataset. As a byproduct, CLIC automatically learns the conditions (datasets) within which a CEM is operative. We implemented CLIC using a compendium of 1774 mouse microarray datasets (28628 microarrays) or 1887 human microarray datasets (45158 microarrays). CLIC analysis reveals that of 910 canonical biological pathways, 30% consist of strongly co-expressed gene modules for which new members are predicted. For example, CLIC predicts a functional connection between protein C7orf55 (FMC1) and the mitochondrial ATP synthase complex that we have experimentally validated. CLIC is freely available at www.gene-clic.org. We anticipate that CLIC will be valuable both for revealing new components of biological pathways as well as the conditions in which they are active. PMID:28719601
Fast randomization of large genomic datasets while preserving alteration counts.
Gobbi, Andrea; Iorio, Francesco; Dawson, Kevin J; Wedge, David C; Tamborero, David; Alexandrov, Ludmil B; Lopez-Bigas, Nuria; Garnett, Mathew J; Jurman, Giuseppe; Saez-Rodriguez, Julio
2014-09-01
Studying combinatorial patterns in cancer genomic datasets has recently emerged as a tool for identifying novel cancer driver networks. Approaches have been devised to quantify, for example, the tendency of a set of genes to be mutated in a 'mutually exclusive' manner. The significance of the proposed metrics is usually evaluated by computing P-values under appropriate null models. To this end, a Monte Carlo method (the switching-algorithm) is used to sample simulated datasets under a null model that preserves patient- and gene-wise mutation rates. In this method, a genomic dataset is represented as a bipartite network, to which Markov chain updates (switching-steps) are applied. These steps modify the network topology, and a minimal number of them must be executed to draw simulated datasets independently under the null model. This number has previously been deduced empirically to be a linear function of the total number of variants, making this process computationally expensive. We present a novel approximate lower bound for the number of switching-steps, derived analytically. Additionally, we have developed the R package BiRewire, including new efficient implementations of the switching-algorithm. We illustrate the performance of BiRewire by applying it to large real cancer genomics datasets. We report vast reductions in time requirement, with respect to existing implementations/bounds and equivalent P-value computations. Thus, we propose BiRewire to study statistical properties in genomic datasets, and other data that can be modeled as bipartite networks. BiRewire is available on BioConductor at http://www.bioconductor.org/packages/2.13/bioc/html/BiRewire.html. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.
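The switching-step idea, swapping pairs of edges in the bipartite patient-gene network so that patient- and gene-wise alteration counts are preserved, can be illustrated with a plain-Python sketch. The published tool is the R package BiRewire; the function below is a simplified demonstration of the concept, not its implementation.

```python
# Illustrative sketch of a degree-preserving switching-step randomization on a
# binary patient x gene alteration matrix. Not the BiRewire implementation.
import numpy as np

def switching_randomize(mat, n_steps, seed=0):
    """Randomize a binary matrix while preserving row and column sums."""
    rng = np.random.default_rng(seed)
    mat = mat.copy()
    edges = list(zip(*np.nonzero(mat)))            # current (patient, gene) edges
    for _ in range(n_steps):
        i, j = rng.choice(len(edges), size=2, replace=False)
        (r1, c1), (r2, c2) = edges[i], edges[j]
        # a switching-step is valid only if the crossed edges are absent
        if r1 != r2 and c1 != c2 and mat[r1, c2] == 0 and mat[r2, c1] == 0:
            mat[r1, c1] = mat[r2, c2] = 0
            mat[r1, c2] = mat[r2, c1] = 1
            edges[i], edges[j] = (r1, c2), (r2, c1)
    return mat

alterations = (np.random.default_rng(2).random((20, 50)) < 0.1).astype(int)
randomized = switching_randomize(alterations, n_steps=10 * int(alterations.sum()))
assert (randomized.sum(axis=0) == alterations.sum(axis=0)).all()  # gene-wise counts preserved
assert (randomized.sum(axis=1) == alterations.sum(axis=1)).all()  # patient-wise counts preserved
```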
NASA Astrophysics Data System (ADS)
Asfaw, Alemayehu; Shucksmith, James; Smith, Andrea; Cherry, Katherine
2015-04-01
Metaldehyde is an active ingredient in agricultural pesticides such as slug pellets, which are heavily applied to UK farmland during the autumn application season. There is current concern that existing drinking water treatment processes may be inadequate in reducing potentially high levels of metaldehyde in surface waters to below the UK drinking water quality regulation limit of 0.1 µg/l. In addition, current water quality monitoring methods can miss short term fluctuations in metaldehyde concentration caused by rainfall driven runoff, hampering prediction of the potential risk of exposure. Datasets describing levels, fate and transport of metaldehyde in river catchments are currently very scarce. This work presents results from an ongoing study to quantify the presence of metaldehyde in surface waters within a UK catchment used for drinking water abstraction. High resolution water quality data from auto-samplers installed in rivers are coupled with radar rainfall, catchment characteristics and land use data to i) understand which hydro-meteorological characteristics of the catchment trigger the peak migration of metaldehyde to surface waters; ii) assess the relationship between measured metaldehyde levels and catchment characteristics such as land use, topographic index, proximity to water bodies and runoff generation area; iii) describe the current risks to drinking water supply and discuss mitigation options based on modelling and real-time control of water abstraction. Identifying the correlation between catchment attributes and metaldehyde generation will help in the development of effective catchment management strategies, which can help to significantly reduce the amount of metaldehyde finding its way into river water. Furthermore, the effectiveness of current water quality monitoring strategy in accurately quantifying the generation of metaldehyde from the catchment and its ability to benefit the development of effective catchment management practices has also been investigated.
NASA Astrophysics Data System (ADS)
Zhou, Y.; Zhang, W.; Rinne, J.
2016-12-01
Climate feedbacks represent a large uncertainty in climate projections, partly due to the difficulty of quantifying feedback mechanisms in biosphere-atmosphere interactions. Recently, much attention has been given to a negative climate feedback mechanism whereby higher temperatures and CO2 levels boost continental biomass production, leading to increased biogenic secondary organic aerosol (SOA) and cloud condensation nuclei concentrations, tending to cause cooling. To quantify the relationship between biogenic volatile organic compounds (BVOCs) and SOA, a five-year dataset (2008, 2010-2011, 2013-2014) of SOA and monoterpene concentrations (the dominant fraction of BVOCs) measured at the SMEAR II station in Hyytiälä, Finland, is analyzed. Our results show a moderate linear correlation between SOA and monoterpene concentrations, with a correlation coefficient (R) of 0.66. To rule out the influence of anthropogenic aerosols, the dataset is further filtered by selecting data at wind directions of cleaner air masses, improving R to 0.68. As temperature is a critical factor for vegetation growth, BVOC emissions, and condensation rate, the correlation between SOA and monoterpene concentrations in different temperature windows is studied. The result shows a higher R and a steeper linear-regression slope as temperature increases. To identify the dominant oxidant responsible for the BVOC-SOA conversion, the correlations between SOA concentration and the monoterpene oxidation rates by O3 and OH are compared, suggesting that more SOA is contributed by the O3 oxidation process. Finally, the possible processes and factors contributing to the temperature dependence of the BVOC-SOA correlation, such as the atmospheric boundary layer depth, the limiting factor in the monoterpene oxidation process, and the temperature sensitivity of the condensation process, are investigated.
Higgins, H M; Dryden, I L; Green, M J
2012-09-15
The two key aims of this research were: (i) to conduct a probabilistic elicitation to quantify the variation in veterinarians' beliefs regarding the efficacy of systemic antibiotics when used as an adjunct to intra-mammary dry cow therapy and (ii) to investigate (in a Bayesian statistical framework) the strength of future research evidence required (in theory) to change the beliefs of practising veterinary surgeons regarding the efficacy of systemic antibiotics, given their current clinical beliefs. The beliefs of 24 veterinarians in 5 practices in England were quantified as probability density functions. Classic multidimensional scaling revealed major variations in beliefs both within and between veterinary practices which included: confident optimism, confident pessimism and considerable uncertainty. Of the 9 veterinarians interviewed holding further cattle qualifications, 6 shared a confidently pessimistic belief in the efficacy of systemic therapy and whilst 2 were more optimistic, they were also more uncertain. A Bayesian model based on a synthetic dataset from a randomised clinical trial (showing no benefit with systemic therapy) predicted how each of the 24 veterinarians' prior beliefs would alter as the size of the clinical trial increased, assuming that practitioners would update their beliefs rationally in accordance with Bayes' theorem. The study demonstrated the usefulness of probabilistic elicitation for evaluating the diversity and strength of practitioners' beliefs. The major variation in beliefs observed raises interest in the veterinary profession's approach to prescribing essential medicines. Results illustrate the importance of eliciting prior beliefs when designing clinical trials in order to increase the chance that trial data are of sufficient strength to alter the clinical beliefs of practitioners and do not merely serve to satisfy researchers. Copyright © 2012 Elsevier B.V. All rights reserved.
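The idea of asking how much trial evidence would be needed to shift an elicited prior belief can be illustrated with a conjugate Beta-Binomial update. This is a simplified stand-in for the study's approach (which elicited full probability density functions and used a synthetic randomized-trial dataset); the prior parameters and "true" cure rates below are assumptions for the example.

```python
# Simplified illustration (not the authors' model): an optimistic elicited prior
# on treatment efficacy is updated with synthetic trial data showing no benefit,
# for increasing trial sizes. Priors and rates are hypothetical.
import numpy as np
from scipy import stats

prior_treat = (8, 2)        # Beta(8, 2): confidently optimistic, mean cure rate 0.8
prior_ctrl = (5, 5)         # Beta(5, 5): vaguer prior for the control arm
true_rate = 0.5             # synthetic trial truth: no benefit in either arm

rng = np.random.default_rng(3)
for n_per_arm in (20, 100, 500):
    cures_treat = rng.binomial(n_per_arm, true_rate)
    cures_ctrl = rng.binomial(n_per_arm, true_rate)
    # Beta prior + binomial likelihood -> Beta posterior (conjugate update)
    post_treat = stats.beta(prior_treat[0] + cures_treat,
                            prior_treat[1] + n_per_arm - cures_treat)
    post_ctrl = stats.beta(prior_ctrl[0] + cures_ctrl,
                           prior_ctrl[1] + n_per_arm - cures_ctrl)
    print(f"n={n_per_arm:4d}  posterior mean (treated) = {post_treat.mean():.3f}  "
          f"(control) = {post_ctrl.mean():.3f}")
```

As the synthetic trial grows, the posterior for the treated arm is pulled from the optimistic prior toward the control arm, which is the qualitative behaviour the study explores.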
Quantification of the thorax-to-abdomen breathing ratio for breathing motion modeling.
White, Benjamin M; Zhao, Tianyu; Lamb, James; Bradley, Jeffrey D; Low, Daniel A
2013-06-01
The purpose of this study was to develop a methodology to quantitatively measure the thorax-to-abdomen breathing ratio from a 4DCT dataset for breathing motion modeling and breathing motion studies. The thorax-to-abdomen breathing ratio was quantified by measuring the rate of cross-sectional volume increase throughout the thorax and abdomen as a function of tidal volume. Twenty-six 16-slice 4DCT patient datasets were acquired during quiet respiration using a protocol that acquired 25 ciné scans at each couch position. Fifteen datasets included data from the neck through the pelvis. Tidal volume, measured using a spirometer and abdominal pneumatic bellows, was used as the breathing-cycle surrogate. The cross-sectional volume encompassed by the skin contour, when compared for each CT slice against the tidal volume, exhibited a nearly linear relationship. A robust iteratively reweighted least squares regression analysis was used to determine η(i), defined as the amount of cross-sectional volume expansion at each slice i per unit tidal volume. The sum Ση(i) throughout all slices was predicted to be the ratio of the geometric expansion of the lung and the tidal volume: 1.11. The xiphoid process was selected as the boundary between the thorax and abdomen. The xiphoid process slice was identified in a scan acquired at mid-inhalation. The imaging protocol had not originally been designed for purposes of measuring the thorax-to-abdomen breathing ratio, so the scans did not extend to the anatomy with η(i) = 0. Extrapolation of η(i) to η(i) = 0 was used to include the entire breathing volume. The thorax and abdomen regions were individually analyzed to determine the thorax-to-abdomen breathing ratios. There were 11 image datasets that had been scanned only through the thorax. For these cases, the abdomen breathing component was taken to be equal to 1.11 - Ση(i), where the sum was taken throughout the thorax. The average Ση(i) for thorax and abdomen image datasets was found to be 1.20 ± 0.17, close to the expected value of 1.11. The thorax-to-abdomen breathing ratio was 0.32 ± 0.24. The average Ση(i) was 0.26 ± 0.14 in the thorax and 0.93 ± 0.22 in the abdomen. In the scan datasets that encompassed only the thorax, the average Ση(i) was 0.21 ± 0.11. A method to quantify the relationship between abdomen and thoracic breathing was developed and characterized.
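The per-slice estimation of η(i) via robust iteratively reweighted least squares can be sketched as below. The synthetic measurements and slope are assumptions for illustration; statsmodels' RLM is used here as a generic IRLS estimator, not as the study's actual code.

```python
# Illustrative sketch: estimate eta(i), the cross-sectional volume expansion per
# unit tidal volume for one CT slice, with robust (IRLS) regression.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
tidal_volume = rng.uniform(0, 600, size=25)              # cm^3, one value per cine scan
true_eta = 0.015                                          # hypothetical slope for this slice
cross_section = 450.0 + true_eta * tidal_volume + rng.normal(0, 1.0, size=25)

X = sm.add_constant(tidal_volume)
fit = sm.RLM(cross_section, X, M=sm.robust.norms.HuberT()).fit()  # IRLS under the hood
eta_i = fit.params[1]
print(f"eta(i) = {eta_i:.4f} (true value used in simulation: {true_eta})")
# Summing eta(i) over all thorax and abdomen slices gives the total geometric
# expansion per unit tidal volume, expected to be close to 1.11 in the paper.
```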
Clinical Value of Prognosis Gene Expression Signatures in Colorectal Cancer: A Systematic Review
Cordero, David; Riccadonna, Samantha; Solé, Xavier; Crous-Bou, Marta; Guinó, Elisabet; Sanjuan, Xavier; Biondo, Sebastiano; Soriano, Antonio; Jurman, Giuseppe; Capella, Gabriel; Furlanello, Cesare; Moreno, Victor
2012-01-01
Introduction: The traditional staging system is inadequate to identify those patients with stage II colorectal cancer (CRC) at high risk of recurrence or with stage III CRC at low risk. A number of gene expression signatures to predict CRC prognosis have been proposed, but none is routinely used in the clinic. The aim of this work was to assess the prediction ability and potential clinical usefulness of these signatures in a series of independent datasets. Methods: A literature review identified 31 gene expression signatures that used gene expression data to predict prognosis in CRC tissue. The search was based on the PubMed database and was restricted to papers published from January 2004 to December 2011. Eleven CRC gene expression datasets with outcome information were identified and downloaded from public repositories. A Random Forest classifier was used to build predictors from the gene lists. The Matthews correlation coefficient was chosen as a measure of classification accuracy and its associated p-value was used to assess association with prognosis. For clinical usefulness evaluation, positive and negative post-test probabilities were computed in stage II and III samples. Results: Five gene signatures showed significant association with prognosis and provided reasonable prediction accuracy in their own training datasets. Nevertheless, all signatures showed low reproducibility in independent data. Stratified analyses by stage or microsatellite instability status showed significant association but limited discrimination ability, especially in stage II tumors. From a clinical perspective, the most predictive signatures showed a minor but significant improvement over the classical staging system. Conclusions: The published signatures show low prediction accuracy but moderate clinical usefulness. Although gene expression data may inform prognosis, better strategies for signature validation are needed to encourage their widespread use in the clinic. PMID:23145004
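The evaluation strategy described in the Methods, a Random Forest built on a signature's expression values and scored with the Matthews correlation coefficient, can be sketched with scikit-learn. The data below are synthetic; the gene signature, sample size, and class labels are assumptions for illustration only.

```python
# Minimal sketch of the evaluation strategy: Random Forest on a gene-signature
# expression matrix, scored with the Matthews correlation coefficient.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import matthews_corrcoef
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(5)
n_samples, n_signature_genes = 200, 30
expression = rng.standard_normal((n_samples, n_signature_genes))   # samples x signature genes
relapse = (expression[:, 0] + 0.5 * rng.standard_normal(n_samples) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    expression, relapse, test_size=0.3, random_state=0, stratify=relapse)

clf = RandomForestClassifier(n_estimators=500, random_state=0)
clf.fit(X_train, y_train)
mcc = matthews_corrcoef(y_test, clf.predict(X_test))
print(f"Matthews correlation coefficient: {mcc:.2f}")
```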
Quantifying Stock Return Distributions in Financial Markets
Botta, Federico; Moat, Helen Susannah; Stanley, H. Eugene; Preis, Tobias
2015-01-01
Being able to quantify the probability of large price changes in stock markets is of crucial importance in understanding financial crises that affect the lives of people worldwide. Large changes in stock market prices can arise abruptly, within a matter of minutes, or develop across much longer time scales. Here, we analyze a dataset comprising the stocks forming the Dow Jones Industrial Average at a second-by-second resolution in the period from January 2008 to July 2010 in order to quantify the distribution of changes in market prices at a range of time scales. We find that the tails of the distributions of logarithmic price changes, or returns, exhibit power law decays for time scales ranging from 300 seconds to 3600 seconds. For larger time scales, we find that the distributions' tails exhibit exponential decay. Our findings may inform the development of models of market behavior across varying time scales. PMID:26327593
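A rough sketch of this kind of analysis is given below: logarithmic returns are computed at several aggregation scales from a price series and a simple Hill estimator is used to characterise the heaviness of the return tails. The synthetic series and the Hill estimator are illustrative choices of ours, not the authors' dataset or tail-fitting procedure.

```python
# Illustrative sketch (synthetic data, not the second-by-second Dow Jones
# dataset): log returns at several time scales and a simple Hill-type estimate
# of the tail exponent of their distribution.
import numpy as np

rng = np.random.default_rng(6)
log_price = np.cumsum(rng.standard_t(df=3, size=500_000) * 1e-4)   # heavy-tailed toy series

def hill_exponent(returns, tail_fraction=0.01):
    """Hill estimator of the tail index from the largest absolute returns."""
    x = np.sort(np.abs(returns))[::-1]
    k = max(int(len(x) * tail_fraction), 10)
    tail = x[:k]
    return k / np.sum(np.log(tail / tail[-1]))

for scale in (300, 900, 3600):                      # aggregation window in "seconds"
    returns = log_price[scale:] - log_price[:-scale]
    print(f"scale {scale:5d}s  tail exponent ~ {hill_exponent(returns):.2f}")
```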
Quantifying Stock Return Distributions in Financial Markets.
Botta, Federico; Moat, Helen Susannah; Stanley, H Eugene; Preis, Tobias
2015-01-01
Being able to quantify the probability of large price changes in stock markets is of crucial importance in understanding financial crises that affect the lives of people worldwide. Large changes in stock market prices can arise abruptly, within a matter of minutes, or develop across much longer time scales. Here, we analyze a dataset comprising the stocks forming the Dow Jones Industrial Average at a second-by-second resolution in the period from January 2008 to July 2010 in order to quantify the distribution of changes in market prices at a range of time scales. We find that the tails of the distributions of logarithmic price changes, or returns, exhibit power law decays for time scales ranging from 300 seconds to 3600 seconds. For larger time scales, we find that the distributions' tails exhibit exponential decay. Our findings may inform the development of models of market behavior across varying time scales.
ISRUC-Sleep: A comprehensive public dataset for sleep researchers.
Khalighi, Sirvan; Sousa, Teresa; Santos, José Moutinho; Nunes, Urbano
2016-02-01
To facilitate performance comparison of new methods for sleep pattern analysis, publicly available datasets with quality content are very important and useful. We introduce an open-access comprehensive sleep dataset, called ISRUC-Sleep. The data were obtained from human adults, including healthy subjects, subjects with sleep disorders, and subjects under the effect of sleep medication. Each recording was randomly selected from PSG recordings that were acquired by the Sleep Medicine Centre of the Hospital of Coimbra University (CHUC). The dataset comprises three groups of data: (1) data concerning 100 subjects, with one recording session per subject; (2) data gathered from 8 subjects, with two recording sessions per subject; and (3) data collected from one recording session related to 10 healthy subjects. The polysomnography (PSG) recordings associated with each subject were visually scored by two human experts. Compared with existing sleep-related public datasets, ISRUC-Sleep provides data from a reasonable number of subjects with different characteristics, such as data useful for studies involving changes in the PSG signals over time, and data from healthy subjects useful for studies comparing healthy subjects with patients suffering from sleep disorders. This dataset was created to complement existing datasets by providing easy-to-apply data with some characteristics not yet covered. ISRUC-Sleep can be useful for the analysis of new contributions: (i) in biomedical signal processing; (ii) in development of ASSC methods; and (iii) on sleep physiology studies. To evaluate and compare new contributions that use this dataset as a benchmark, results of applying a subject-independent automatic sleep stage classification (ASSC) method on the ISRUC-Sleep dataset are presented. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
LiDAR Vegetation Investigation and Signature Analysis System (LVISA)
NASA Astrophysics Data System (ADS)
Höfle, Bernhard; Koenig, Kristina; Griesbaum, Luisa; Kiefer, Andreas; Hämmerle, Martin; Eitel, Jan; Koma, Zsófia
2015-04-01
Our physical environment undergoes constant changes in space and time with strongly varying triggers, frequencies, and magnitudes. Monitoring these environmental changes is crucial to improve our scientific understanding of complex human-environmental interactions and helps us to respond to environmental change by adaptation or mitigation. The three-dimensional (3D) description of Earth surface features and the detailed monitoring of surface processes using 3D spatial data have gained increasing attention within the last decades, such as in climate change research (e.g., glacier retreat), carbon sequestration (e.g., forest biomass monitoring), precision agriculture and natural hazard management. In all those areas, 3D data have helped to improve our process understanding by allowing quantification of the structural properties of earth surface features and their changes over time. This advancement has been fostered by technological developments and increased availability of 3D sensing systems. In particular, LiDAR (light detection and ranging) technology, also referred to as laser scanning, has made significant progress and has evolved into an operational tool in environmental research and geosciences. The main result of LiDAR measurements is a highly spatially resolved 3D point cloud. Each point within the LiDAR point cloud has an XYZ coordinate associated with it and often additional information such as the strength of the returned backscatter. The point cloud provided by LiDAR contains rich geospatial, structural, and potentially biochemical information about the surveyed objects. To deal with the inherently unorganized datasets and the large data volume (frequently millions of XYZ coordinates) of LiDAR datasets, a multitude of algorithms for automatic 3D object detection (e.g., of single trees) and physical surface description (e.g., biomass) have been developed. However, so far the exchange of datasets and approaches (i.e., extraction algorithms) among LiDAR users lags behind. We propose a novel concept, the LiDAR Vegetation Investigation and Signature Analysis System (LVISA), which shall enhance sharing of i) reference datasets of single vegetation objects with rich reference data (e.g., plant species, basic plant morphometric information) and ii) approaches for information extraction (e.g., single tree detection, tree species classification based on waveform LiDAR features). We will build an extensive LiDAR data repository for supporting the development and benchmarking of LiDAR-based object information extraction. The LiDAR Vegetation Investigation and Signature Analysis System (LVISA) uses international web service standards (Open Geospatial Consortium, OGC) for geospatial data access and also analysis (e.g., OGC Web Processing Services). This will allow the research community to identify plant-object-specific vegetation features from LiDAR data, while accounting for differences in LiDAR systems (e.g., beam divergence), settings (e.g., point spacing), and calibration techniques. It is the goal of LVISA to develop generic 3D information extraction approaches, which can be seamlessly transferred to other datasets, timestamps and also extraction tasks. The current prototype of LVISA can be visited and tested online via http://uni-heidelberg.de/lvisa. Video tutorials provide a quick overview and entry into the functionality of LVISA.
We will present the current advances of LVISA and we will highlight future research and extension of LVISA, such as integrating low-cost LiDAR data and datasets acquired by highly temporal scanning of vegetation (e.g., continuous measurements). Everybody is invited to join the LVISA development and share datasets and analysis approaches in an interoperable way via the web-based LVISA geoportal.
Neuro-evolutionary computing paradigm for Painlevé equation-II in nonlinear optics
NASA Astrophysics Data System (ADS)
Ahmad, Iftikhar; Ahmad, Sufyan; Awais, Muhammad; Ul Islam Ahmad, Siraj; Asif Zahoor Raja, Muhammad
2018-05-01
The aim of this study is to investigate the numerical treatment of the Painlevé equation-II arising in physical models of nonlinear optics through artificial intelligence procedures by incorporating a single layer structure of neural networks optimized with genetic algorithms, sequential quadratic programming and active set techniques. We constructed a mathematical model for the nonlinear Painlevé equation-II with the help of networks by defining an error-based cost function in mean square sense. The performance of the proposed technique is validated through statistical analyses by means of the one-way ANOVA test conducted on a dataset generated by a large number of independent runs.
NASA Astrophysics Data System (ADS)
Crawford, I.; Ruske, S.; Topping, D. O.; Gallagher, M. W.
2015-07-01
In this paper we present improved methods for discriminating and quantifying Primary Biological Aerosol Particles (PBAP) by applying hierarchical agglomerative cluster analysis to multi-parameter ultraviolet light-induced fluorescence (UV-LIF) spectrometer data. The methods employed in this study can be applied to data sets in excess of 1×10^6 points on a desktop computer, allowing each fluorescent particle in a dataset to be explicitly clustered. This reduces the potential for misattribution found in the subsampling and comparative attribution methods used in previous approaches, improving our capacity to discriminate and quantify PBAP meta-classes. We evaluate the performance of several hierarchical agglomerative cluster analysis linkages and data normalisation methods using laboratory samples of known particle types and an ambient dataset. Fluorescent and non-fluorescent polystyrene latex spheres were sampled with a Wideband Integrated Bioaerosol Spectrometer (WIBS-4) where the optical size, asymmetry factor and fluorescence measurements were used as inputs to the analysis package. It was found that the Ward linkage with z-score or range normalisation performed best, correctly attributing 98 and 98.1 % of the data points respectively. The best performing methods were applied to the BEACHON-RoMBAS ambient dataset where it was found that the z-score and range normalisation methods yield similar results, with each method producing clusters representative of fungal spores and bacterial aerosol, consistent with previous results. The z-score result was compared to clusters generated with previous approaches (WIBS AnalysiS Program, WASP) where we observe that the subsampling and comparative attribution method employed by WASP results in the overestimation of the fungal spore concentration by a factor of 1.5 and the underestimation of bacterial aerosol concentration by a factor of 5. We suggest that this is likely due to errors arising from misattribution due to poor centroid definition and failure to assign particles to a cluster as a result of the subsampling and comparative attribution method employed by WASP. The methods used here allow the entire fluorescent population of particles to be analysed, yielding an explicit cluster attribution for each particle and improving cluster centroid definition and our capacity to discriminate and quantify PBAP meta-classes compared to previous approaches.
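The best-performing configuration reported above, z-score normalisation followed by Ward-linkage hierarchical agglomerative clustering, can be sketched with scipy. The synthetic two-population dataset below stands in for WIBS-style particle features; its values and the feature names are assumptions for illustration.

```python
# Minimal sketch: z-score normalisation followed by Ward-linkage hierarchical
# agglomerative clustering of synthetic particle features (size, asymmetry
# factor, two fluorescence channels). Not the authors' analysis package.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.stats import zscore

rng = np.random.default_rng(7)
pop_a = rng.normal(loc=[2.0, 10.0, 50.0, 5.0], scale=0.5, size=(500, 4))
pop_b = rng.normal(loc=[4.0, 25.0, 5.0, 60.0], scale=0.5, size=(500, 4))
features = np.vstack([pop_a, pop_b])            # particles x (size, AF, FL1, FL2)

normalised = zscore(features, axis=0)            # z-score each feature column
Z = linkage(normalised, method="ward")           # Ward linkage on Euclidean distances
labels = fcluster(Z, t=2, criterion="maxclust")  # cut the dendrogram into two clusters

# Fraction of particles in each synthetic population attributed to one cluster
acc_a = max(np.mean(labels[:500] == 1), np.mean(labels[:500] == 2))
acc_b = max(np.mean(labels[500:] == 1), np.mean(labels[500:] == 2))
print(f"cluster attribution: {acc_a:.1%} / {acc_b:.1%}")
```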
Oh, Min; Ahn, Jaegyoon; Yoon, Youngmi
2014-01-01
The growing number and variety of genetic network datasets increases the feasibility of understanding how drugs and diseases are associated at the molecular level. Properly selected features of the network representations of existing drug-disease associations can be used to infer novel indications of existing drugs. To find new drug-disease associations, we generated an integrative genetic network using combinations of interactions, including protein-protein interactions and gene regulatory network datasets. Within this network, network adjacencies of drug-drug and disease-disease were quantified using a scored path between target sets of them. Furthermore, the common topological module of drugs or diseases was extracted, and thereby the distance between topological drug-module and disease (or disease-module and drug) was quantified. These quantified scores were used as features for the prediction of novel drug-disease associations. Our classifiers using Random Forest, Multilayer Perceptron and C4.5 showed a high specificity and sensitivity (AUC score of 0.855, 0.828 and 0.797 respectively) in predicting novel drug indications, and displayed a better performance than other methods with limited drug and disease properties. Our predictions and current clinical trials overlap significantly across the different phases of drug development. We also identified and visualized the topological modules of predicted drug indications for certain types of cancers, and for Alzheimer’s disease. Within the network, those modules show potential pathways that illustrate the mechanisms of new drug indications, including propranolol as a potential anticancer agent and telmisartan as treatment for Alzheimer’s disease. PMID:25356910
Xu, Lingyu; Xu, Yuancheng; Coulden, Richard; Sonnex, Emer; Hrybouski, Stanislau; Paterson, Ian; Butler, Craig
2018-05-11
Epicardial adipose tissue (EAT) volume derived from contrast enhanced (CE) computed tomography (CT) scans is not well validated. We aim to establish a reliable threshold to accurately quantify EAT volume from CE datasets. We analyzed EAT volume on paired non-contrast (NC) and CE datasets from 25 patients to derive appropriate Hounsfield (HU) cutpoints to equalize two EAT volume estimates. The gold standard threshold (-190HU, -30HU) was used to assess EAT volume on NC datasets. For CE datasets, EAT volumes were estimated using three previously reported thresholds: (-190HU, -30HU), (-190HU, -15HU), (-175HU, -15HU) and were analyzed by a semi-automated 3D fat analysis software. Subsequently, we applied a threshold correction to (-190HU, -30HU) based on mean differences in radiodensity between NC and CE images (ΔEATrd = CE radiodensity - NC radiodensity). We then validated our findings on EAT threshold in 21 additional patients with paired CT datasets. EAT volume from CE datasets using previously published thresholds consistently underestimated EAT volume from the NC dataset standard by a magnitude of 8.2%-19.1%. Using our corrected threshold (-190HU, -3HU) in CE datasets yielded statistically identical EAT volume to NC EAT volume in the validation cohort (186.1 ± 80.3 vs. 185.5 ± 80.1 cm3, Δ = 0.6 cm3, 0.3%, p = 0.374). Estimating EAT volume from contrast enhanced CT scans using a corrected threshold of -190HU, -3HU provided excellent agreement with EAT volume from non-contrast CT scans using a standard threshold of -190HU, -30HU. Copyright © 2018. Published by Elsevier B.V.
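The thresholding step itself reduces to counting voxels inside a pericardial contour whose attenuation falls in the adipose HU window and converting the count to a volume. The CT array, mask, and voxel size below are synthetic assumptions; only the HU windows come from the abstract.

```python
# Illustrative sketch of HU-window thresholding for EAT volume. The CT volume,
# pericardial mask and voxel size are hypothetical; -190 to -30 HU is the
# non-contrast standard, -190 to -3 HU the corrected contrast-enhanced window.
import numpy as np

rng = np.random.default_rng(8)
ct_hu = rng.normal(loc=0, scale=120, size=(60, 256, 256))   # hypothetical CT volume (HU)
pericardial_mask = np.zeros_like(ct_hu, dtype=bool)
pericardial_mask[10:50, 64:192, 64:192] = True               # hypothetical pericardial contour
voxel_volume_cm3 = 0.07 * 0.07 * 0.25                        # hypothetical voxel size (cm)

def eat_volume(ct, mask, lo, hi, voxel_volume=voxel_volume_cm3):
    fat_voxels = mask & (ct >= lo) & (ct <= hi)              # voxels in the adipose HU window
    return fat_voxels.sum() * voxel_volume

print("non-contrast window (-190, -30):", eat_volume(ct_hu, pericardial_mask, -190, -30))
print("corrected CE window (-190,  -3):", eat_volume(ct_hu, pericardial_mask, -190, -3))
```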
Predicting MHC-II binding affinity using multiple instance regression
EL-Manzalawy, Yasser; Dobbs, Drena; Honavar, Vasant
2011-01-01
Reliably predicting the ability of antigen peptides to bind to major histocompatibility complex class II (MHC-II) molecules is an essential step in developing new vaccines. Uncovering the amino acid sequence correlates of the binding affinity of MHC-II binding peptides is important for understanding pathogenesis and immune response. The task of predicting MHC-II binding peptides is complicated by the significant variability in their length. Most existing computational methods for predicting MHC-II binding peptides focus on identifying a nine amino acids core region in each binding peptide. We formulate the problems of qualitatively and quantitatively predicting flexible length MHC-II peptides as multiple instance learning and multiple instance regression problems, respectively. Based on this formulation, we introduce MHCMIR, a novel method for predicting MHC-II binding affinity using multiple instance regression. We present results of experiments using several benchmark datasets that show that MHCMIR is competitive with the state-of-the-art methods for predicting MHC-II binding peptides. An online web server that implements the MHCMIR method for MHC-II binding affinity prediction is freely accessible at http://ailab.cs.iastate.edu/mhcmir. PMID:20855923
Identifying and Quantifying Chemical Forms of Sediment-Bound Ferrous Iron.
NASA Astrophysics Data System (ADS)
Kohler, M.; Kent, D. B.; Bekins, B. A.; Cozzarelli, I.; Ng, G. H. C.
2015-12-01
Aqueous Fe(II) produced by dissimilatory iron reduction comprises only a small fraction of total biogenic Fe(II) within an aquifer. Most biogenic Fe(II) is bound to sediments on ion exchange sites; as surface complexes and, possibly, surface precipitates; or incorporated into solid phases (e.g., siderite, magnetite). Different chemical forms of sediment-bound Fe(II) have different reactivities (e.g., with dissolved oxygen) and their formation or destruction by sorption/desorption and precipitation/dissolution is coupled to different solutes (e.g., major cations, H+, carbonate). We are quantifying chemical forms of sediment-bound Fe(II) using previously published extractions, novel extractions, and experimental studies (e.g., Fe isotopic exchange). Sediments are from Bemidji, Minnesota, where biodegradation of hydrocarbons from a burst oil pipeline has driven extensive dissimilatory Fe(III) reduction, and sites potentially impacted by unconventional oil and gas development. Generally, minimal Fe(II) was mobilized from ion exchange sites (batch desorption with MgCl2 and repeated desorption with NH4Cl). A <2 mm sediment fraction from the iron-reducing zone at Bemidji had 1.8 µmol/g Fe(II) as surface complexes or carbonate phases (sodium acetate at pH 5), of which ca. 13% was present as surface complexes (FerroZine extractions). Total bioavailable Fe(III) and biogenic Fe(II) (HCl extractions) was 40-50 µmol/g on both background and iron-reducing zone sediments. Approximately half of the HCl-extractable Fe from Fe-reducing zone sediments was Fe(II), whereas 12-15% of Fe extracted from background sediments was present as Fe(II). One-third to one-half of the total biogenic Fe(II) extracted from sediments collected from a Montana prairie pothole located downgradient from a produced-water disposal pit was present as surface-complexed Fe(II).
NASA Astrophysics Data System (ADS)
Nedoluha, Gerald E.; Kiefer, Michael; Lossow, Stefan; Gomez, R. Michael; Kämpfer, Niklaus; Lainer, Martin; Forkman, Peter; Christensen, Ole Martin; Oh, Jung Jin; Hartogh, Paul; Anderson, John; Bramstedt, Klaus; Dinelli, Bianca M.; Garcia-Comas, Maya; Hervig, Mark; Murtagh, Donal; Raspollini, Piera; Read, William G.; Rosenlof, Karen; Stiller, Gabriele P.; Walker, Kaley A.
2017-12-01
As part of the second SPARC (Stratosphere-troposphere Processes And their Role in Climate) water vapor assessment (WAVAS-II), we present measurements taken from or coincident with seven sites from which ground-based microwave instruments measure water vapor in the middle atmosphere. Six of the ground-based instruments are part of the Network for the Detection of Atmospheric Composition Change (NDACC) and provide datasets that can be used for drift and trend assessment. We compare measurements from these ground-based instruments with satellite datasets that have provided retrievals of water vapor in the lower mesosphere over extended periods since 1996. We first compare biases between the satellite and ground-based instruments from the upper stratosphere to the upper mesosphere. We then show a number of time series comparisons at 0.46 hPa, a level that is sensitive to changes in H2O and CH4 entering the stratosphere but, because almost all CH4 has been oxidized, is relatively insensitive to dynamical variations. Interannual variations and drifts are investigated with respect to both the Aura Microwave Limb Sounder (MLS; from 2004 onwards) and each instrument's climatological mean. We find that the variation in the interannual difference in the mean H2O measured by any two instruments is typically ~1%. Most of the datasets start in or after 2004 and show annual increases in H2O of 0-1% yr⁻¹. In particular, MLS shows a trend of between 0.5% yr⁻¹ and 0.7% yr⁻¹ at the comparison sites. However, the two longest measurement datasets used here, with measurements back to 1996, show much smaller trends of +0.1% yr⁻¹ (at Mauna Loa, Hawaii) and -0.1% yr⁻¹ (at Lauder, New Zealand).
EFEHR - the European Facilities for Earthquake Hazard and Risk: beyond the web-platform
NASA Astrophysics Data System (ADS)
Danciu, Laurentiu; Wiemer, Stefan; Haslinger, Florian; Kastli, Philipp; Giardini, Domenico
2017-04-01
European Facilities for Earthquake Hazard and Risk (EFEHR) represents the sustainable community resource for seismic hazard and risk in Europe. The EFEHR web platform is the main gateway to access data, models and tools as well as to provide expertise relevant for assessment of seismic hazard and risk. The main services (databases and web platform) are hosted at ETH Zurich and operated by the Swiss Seismological Service (Schweizerischer Erdbebendienst, SED). The EFEHR web portal (www.efehr.org) collects and displays (i) harmonized datasets necessary for hazard and risk modeling, e.g. seismic catalogues, fault compilations, site amplifications, vulnerabilities, inventories; (ii) extensive seismic hazard products, namely hazard curves, uniform hazard spectra and maps for national and regional assessments; (iii) standardized configuration files for re-computing the regional seismic hazard models; and (iv) relevant documentation of harmonized datasets, models and web services. Today, EFEHR distributes the full output of the 2013 European Seismic Hazard Model, ESHM13, as developed within the SHARE project (http://www.share-eu.org/); the latest results of the 2014 Earthquake Model of the Middle East (EMME14), derived within the EMME Project (www.emme-gem.org); the 2001 Global Seismic Hazard Assessment Project (GSHAP) results; and the 2015 updates of the Swiss Seismic Hazard. New datasets related to either seismic hazard or risk will be incorporated as they become available. We present the current status of the EFEHR platform, with focus on the challenges, summaries of the up-to-date datasets, user experience and feedback, as well as the roadmap to future technological innovation beyond the web-platform development. We also show the new services foreseen to fully integrate with the seismological core services of the European Plate Observing System (EPOS).
The long-term dynamic changes in the triad of energy consumption, economic development, and greenhouse gas (GHG) emissions in Japan after World War II were quantified, and the interactions among them were analyzed based on an integrated suite of energy, emergy and economic indices...
Aerosol direct and indirect radiative effect over Eastern Mediterranean
NASA Astrophysics Data System (ADS)
Georgoulias, Aristeidis; Alexandri, Georgia; Zanis, Prodromos; Ntogras, Christos; Poeschl, Ulrich; Kourtidis, Kostas
In this work, we present results from the QUADIEEMS project, which is focused on aerosol-cloud relations and the aerosol direct and indirect radiative effect over the Eastern Mediterranean region. First, a gridded dataset at a resolution of 0.1x0.1 degrees (~10 km) with aerosol and cloud related parameters was compiled, using level-2 satellite observations from MODIS TERRA (3/2000-12/2012) and AQUA (7/2002-12/2012). The aerosol gridded dataset has been validated against sunphotometric measurements from 12 AERONET ground stations, showing that MODIS generally overestimates aerosol optical depth (AOD550). Then, the AOD550 and fine mode ratio (FMR550) data from MODIS were combined with aerosol index (AI) data from the Earth Probe TOMS and OMI satellite sensors, wind field data from the ERA-Interim reanalysis and AOD550 data for various aerosol types from the GOCART model and the MACC reanalysis to quantify the relative contribution of different aerosol types (marine, dust, anthropogenic, fine-mode natural) to the total AOD550. The aerosol-cloud relations over the region were investigated with the use of the joint high resolution aerosol-cloud gridded dataset. Specifically, we focused on the seasonal relations between the cloud droplet number concentration (CDNC) and AOD550. The aerosol direct and first indirect radiative effect was then calculated for each aerosol type separately, making use of the aerosol relative contribution to the total AOD550, the CDNC-AOD550 relations and satellite-based parameterizations. The direct radiative effect was also quantified using simulations from a regional climate model (RegCM4) and simulations with a radiative transfer model (SBDART), and the three methods were finally inter-validated.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dhou, S; Cai, W; Hurwitz, M
Purpose: The goal of this study is to quantify the interfraction reproducibility of patient-specific motion models derived from 4DCBCT acquired on the day of treatment of lung cancer stereotactic body radiotherapy (SBRT) patients. Methods: Motion models are derived from patient 4DCBCT images acquired daily over 3–5 fractions of treatment by 1) applying deformable image registration between each 4DCBCT image and a reference phase from that day, resulting in a set of displacement vector fields (DVFs), and 2) performing principal component analysis (PCA) on the DVFs to derive a motion model. The motion model from the first day of treatment is compared to motion models from each successive day of treatment to quantify variability in motion models generated from different days. Four SBRT patient datasets have been acquired thus far in this IRB-approved study. Results: Fraction-specific motion models for each fraction and patient were derived and PCA eigenvectors and their associated eigenvalues are compared for each fraction. For the first patient dataset, the average root mean square error between the first two eigenvectors associated with the highest two eigenvalues, in four fractions, was 0.1, while it was 0.25 between the last three PCA eigenvectors associated with the lowest three eigenvalues. It was found that the eigenvectors and eigenvalues of PCA motion models for each treatment fraction have variations and the first few eigenvectors are shown to be more stable across treatment fractions than others. Conclusion: Analysis of this dataset showed that the first two eigenvectors of the PCA patient-specific motion models derived from 4DCBCT were stable over the course of several treatment fractions. The third, fourth, and fifth eigenvectors had larger variations.
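The two-step workflow described in the Methods, flattening the DVFs of one fraction, fitting a PCA motion model, and comparing leading eigenvectors between fractions by root mean square error, can be sketched as below. The DVF arrays are synthetic stand-ins; the grid size, component count, and day-to-day perturbation are assumptions for illustration.

```python
# Simplified sketch: PCA motion model from displacement vector fields (DVFs)
# of one fraction, and eigenvector comparison between two fractions by RMSE.
import numpy as np
from sklearn.decomposition import PCA

def motion_model(dvfs, n_components=5):
    """dvfs: (n_phases, nx, ny, nz, 3) displacement fields for one fraction."""
    flat = dvfs.reshape(dvfs.shape[0], -1)          # one row per respiratory phase
    return PCA(n_components=n_components).fit(flat)

rng = np.random.default_rng(9)
dvfs_day1 = rng.normal(size=(10, 16, 16, 16, 3))                       # hypothetical fraction 1
dvfs_day2 = dvfs_day1 + rng.normal(scale=0.1, size=dvfs_day1.shape)    # small day-to-day change

pca1, pca2 = motion_model(dvfs_day1), motion_model(dvfs_day2)
for k in range(2):                                   # compare the two leading eigenvectors
    v1, v2 = pca1.components_[k], pca2.components_[k]
    v2 = v2 if np.dot(v1, v2) >= 0 else -v2          # eigenvectors are sign-ambiguous
    rmse = np.sqrt(np.mean((v1 - v2) ** 2))
    print(f"eigenvector {k + 1}: RMSE between fractions = {rmse:.4f}")
```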
Unnikrishnan, Ginu; Xu, Chun; Popp, Kristin L; Hughes, Julie M; Yuan, Amy; Guerriere, Katelyn I; Caksa, Signe; Ackerman, Kathryn E; Bouxsein, Mary L; Reifman, Jaques
2018-07-01
Whole-bone analyses can obscure regional heterogeneities in bone characteristics. Quantifying these heterogeneities might improve our understanding of the etiology of injuries, such as lower-extremity stress fractures. Here, we performed regional analyses of high-resolution peripheral quantitative computed tomography images of the ultradistal tibia in young, healthy subjects (age range, 18 to 30 years). We quantified bone characteristics across four regional sectors of the tibia for the following datasets: white women (n = 50), black women (n = 51), white men (n = 50), black men (n = 34), and all subjects (n = 185). After controlling for potentially confounding variables, we observed statistically significant variations in most of the characteristics across sectors (p < 0.05). Most of the bone characteristics followed a similar trend for all datasets but with different magnitudes. Regardless of race or sex, the anterior sector had the lowest trabecular and total volumetric bone mineral density and highest trabecular separation (p < 0.001), while cortical thickness was lowest in the medial sector (p < 0.05). Accordingly, the anterior sector also had the lowest elastic modulus in the anterior-posterior and superior-inferior directions (p < 0.001). In all sectors, the mean anisotropy was ~3, suggesting cross-sector similarity in the ratios of loading in these directions. In addition, the bone characteristics from regional and whole-bone analyses differed in all datasets (p < 0.05). Our findings on the heterogeneous nature of bone microarchitecture in the ultradistal tibia may reflect an adaptation of the bone to habitual loading conditions. Published by Elsevier Inc.
Automatic three-dimensional registration of intravascular optical coherence tomography images
NASA Astrophysics Data System (ADS)
Ughi, Giovanni J.; Adriaenssens, Tom; Larsson, Matilda; Dubois, Christophe; Sinnaeve, Peter R.; Coosemans, Mark; Desmet, Walter; D'hooge, Jan
2012-02-01
Intravascular optical coherence tomography (IV-OCT) is a catheter-based high-resolution imaging technique able to visualize the inner wall of the coronary arteries and implanted devices in vivo with an axial resolution below 20 μm. IV-OCT is being used in several clinical trials aiming to quantify the vessel response to stent implantation over time. However, stent analysis is currently performed manually and corresponding images taken at different time points are matched through a very labor-intensive and subjective procedure. We present an automated method for the spatial registration of IV-OCT datasets. Stent struts are segmented through consecutive images and three-dimensional models of the stents are created for both datasets to be registered. The two models are initially roughly registered through an automatic initialization procedure and an iterative closest point algorithm is subsequently applied for a more precise registration. To correct for nonuniform rotational distortions (NURDs) and other potential acquisition artifacts, the registration is consecutively refined on a local level. The algorithm was first validated by using an in vitro experimental setup based on a polyvinyl-alcohol gel tubular phantom. Subsequently, an in vivo validation was obtained by exploiting stable vessel landmarks. The mean registration error in vitro was quantified to be 0.14 mm in the longitudinal axis and 7.3-deg mean rotation error. In vivo validation resulted in 0.23 mm in the longitudinal axis and 10.1-deg rotation error. These results indicate that the proposed methodology can be used for automatic registration of in vivo IV-OCT datasets. Such a tool will be indispensable for larger studies on vessel healing pathophysiology and reaction to stent implantation. As such, it will be valuable in testing the performance of new generations of intracoronary devices and new therapeutic drugs.
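The rigid-registration core of such a pipeline, an iterative closest point (ICP) step over the two strut point clouds, can be sketched with a KD-tree for correspondences and an SVD (Kabsch) solve for the transform. This is a minimal illustration only; the published method additionally includes automatic initialization and local correction of NURD and other artifacts, which are not reproduced here. The synthetic point clouds and perturbation are assumptions.

```python
# Minimal point-to-point ICP sketch (illustrative of the rigid registration
# step only, not the authors' full method).
import numpy as np
from scipy.spatial import cKDTree

def icp(source, target, n_iter=30):
    src = source.copy()
    tree = cKDTree(target)
    R_total, t_total = np.eye(3), np.zeros(3)
    for _ in range(n_iter):
        _, idx = tree.query(src)                     # nearest-neighbour correspondences
        matched = target[idx]
        src_c, tgt_c = src.mean(axis=0), matched.mean(axis=0)
        H = (src - src_c).T @ (matched - tgt_c)      # cross-covariance
        U, _, Vt = np.linalg.svd(H)
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:                     # avoid reflections
            Vt[-1] *= -1
            R = Vt.T @ U.T
        t = tgt_c - R @ src_c
        src = src @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total, src

# Synthetic test: a slightly rotated and translated copy of a point cloud
rng = np.random.default_rng(10)
target = rng.uniform(-1, 1, size=(2000, 3))
angle = np.deg2rad(5.0)
R_true = np.array([[np.cos(angle), -np.sin(angle), 0],
                   [np.sin(angle),  np.cos(angle), 0],
                   [0, 0, 1]])
source = target @ R_true.T + np.array([0.05, -0.02, 0.08])

_, _, aligned = icp(source, target)
tree = cKDTree(target)
print("mean NN distance before:", tree.query(source)[0].mean())
print("mean NN distance after: ", tree.query(aligned)[0].mean())
```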
NASA Astrophysics Data System (ADS)
Fu, Yi; Yu, Guoqiang; Levine, Douglas A.; Wang, Niya; Shih, Ie-Ming; Zhang, Zhen; Clarke, Robert; Wang, Yue
2015-09-01
Most published copy number datasets on solid tumors were obtained from specimens comprised of mixed cell populations, for which the varying tumor-stroma proportions are unknown or unreported. The inability to correct for signal mixing represents a major limitation on the use of these datasets for subsequent analyses, such as discerning deletion types or detecting driver aberrations. We describe the BACOM2.0 method with enhanced accuracy and functionality to normalize copy number signals, detect deletion types, estimate tumor purity, quantify true copy numbers, and calculate average-ploidy value. While BACOM has been validated and used with promising results, subsequent BACOM analysis of the TCGA ovarian cancer dataset found that the estimated average tumor purity was lower than expected. In this report, we first show that this lowered estimate of tumor purity is the combined result of imprecise signal normalization and parameter estimation. Then, we describe effective allele-specific absolute normalization and quantification methods that can enhance BACOM applications in many biological contexts while in the presence of various confounders. Finally, we discuss the advantages of BACOM in relation to alternative approaches. Here we detail this revised computational approach, BACOM2.0, and validate its performance in real and simulated datasets.
Soil microbial C:N ratio is a robust indicator of soil productivity for paddy fields
NASA Astrophysics Data System (ADS)
Li, Yong; Wu, Jinshui; Shen, Jianlin; Liu, Shoulong; Wang, Cong; Chen, Dan; Huang, Tieping; Zhang, Jiabao
2016-10-01
Maintaining good soil productivity in rice paddies is important for global food security. Numerous methods have been developed to evaluate paddy soil productivity (PSP), most based on soil physicochemical properties and relatively few on biological indices. Here, we used a long-term dataset from experiments on paddy fields at eight county sites and a short-term dataset from a single field experiment in southern China, aiming to quantify relationships between PSP and the ratios of carbon (C) to nutrients (N and P) in soil microbial biomass (SMB). In the long-term dataset, SMB variables generally showed stronger correlations with the relative PSP (rPSP) compared to soil chemical properties. Both correlation and variation partitioning analyses suggested that SMB N, P and C:N ratio were good predictors of rPSP. In the short-term dataset, we found a significant, negative correlation of annual rice yield with SMB C:N (r = -0.99), confirming SMB C:N as a robust indicator for PSP. In treatments of the short-term experiment, soil amendment with biochar lowered SMB C:N and improved PSP, while incorporation of rice straw increased SMB C:N and reduced PSP. We conclude that SMB C:N ratio not only indicates PSP but also helps to identify management practices that improve PSP.
Sapak, Z; Salam, M U; Minchinton, E J; MacManus, G P V; Joyce, D C; Galea, V J
2017-09-01
A weather-based simulation model, called Powdery Mildew of Cucurbits Simulation (POMICS), was constructed to predict fungicide application scheduling to manage powdery mildew of cucurbits. The model was developed on the principle that conditions favorable for Podosphaera xanthii, a causal pathogen of this crop disease, generate a number of infection cycles in a single growing season. The model consists of two components that (i) simulate the disease progression of P. xanthii in secondary infection cycles under natural conditions and (ii) predict the disease severity with application of fungicides at any recurrent disease cycles. The underlying environmental factors associated with P. xanthii infection were quantified from laboratory and field studies, and also gathered from literature. The performance of the POMICS model when validated with two datasets of uncontrolled natural infection was good (the mean difference between simulated and observed disease severity on a scale of 0 to 5 was 0.02 and 0.05). In simulations, POMICS was able to predict high- and low-risk disease alerts. Furthermore, the predicted disease severity was responsive to the number of fungicide applications. Such responsiveness indicates that the model has the potential to be used as a tool to guide the scheduling of judicious fungicide applications.
Seq-ing answers: uncovering the unexpected in global gene regulation.
Otto, George Maxwell; Brar, Gloria Ann
2018-04-19
The development of techniques for measuring gene expression globally has greatly expanded our understanding of gene regulatory mechanisms in depth and scale. We can now quantify every intermediate and transition in the canonical pathway of gene expression, from DNA to mRNA to protein, genome-wide. Employing such measurements in parallel can produce rich datasets, but extracting the most information requires careful experimental design and analysis. Here, we argue for the value of genome-wide studies that measure multiple outputs of gene expression over many timepoints during the course of a natural developmental process. We discuss our findings from a highly parallel gene expression dataset of meiotic differentiation, and those of others, to illustrate how leveraging these features can provide new and surprising insight into fundamental mechanisms of gene regulation.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Walkowicz, K.; Duran, A.
2014-06-01
The Fleet DNA project objectives include capturing and quantifying drive cycle and technology variation for the multitude of medium- and heavy-duty vocations; providing a common data storage warehouse for medium- and heavy-duty vehicle fleet data across DOE activities and laboratories; and integrating existing DOE tools, models, and analyses to provide data-driven decision making capabilities. Fleet DNA advantages include: for Government - providing in-use data for standard drive cycle development, R&D, tech targets, and rule making; for OEMs - real-world usage datasets provide concrete examples of customer use profiles; for fleets - vocational datasets help illustrate how to maximize return on technology investments; for Funding Agencies - ways are revealed to optimize the impact of financial incentive offers; and for researchers - a data source is provided for modeling and simulation.
Discovery of a missing disease spreader
NASA Astrophysics Data System (ADS)
Maeno, Yoshiharu
2011-10-01
This study presents a method to discover an outbreak of an infectious disease in a region for which data are missing, but which is at work as a disease spreader. Node discovery for the spread of an infectious disease is defined as discriminating between the nodes which are neighboring to a missing disease spreader node, and the rest, given a dataset on the number of cases. The spread is described by stochastic differential equations. A perturbation theory quantifies the impact of the missing spreader on the moments of the number of cases. Statistical discriminators examine the mid-body or tail-ends of the probability density function, and search for the disturbance from the missing spreader. They are tested with computationally synthesized datasets, and applied to the SARS outbreak and flu pandemic.
Developing Global Building Exposure for Disaster Forecasting, Mitigation, and Response
NASA Astrophysics Data System (ADS)
Huyck, C. K.
2016-12-01
Nongovernmental organizations and governments are recognizing the importance of insurance penetration in developing countries to mitigate the tremendous setbacks that follow natural disasters, but to effectively manage risk, stakeholders must accurately quantify the built environment. Although there are countless datasets addressing elements of buildings, there are surprisingly few that are directly applicable to assessing vulnerability to natural disasters without skewing the spatial distribution of risk towards known assets. Working with NASA center partner the Center for International Earth Science Information Network (CIESIN) at Columbia University in New York (http://www.ciesin.org), ImageCat has developed a novel method of developing Global Exposure Data (GED) from EO sources. The method has been applied to develop exposure datasets for GFDRR and CAT modelers, and to aid in the post-earthquake allocation of resources for UNICEF.
Functional evaluation of out-of-the-box text-mining tools for data-mining tasks
Jung, Kenneth; LePendu, Paea; Iyer, Srinivasan; Bauer-Mehren, Anna; Percha, Bethany; Shah, Nigam H
2015-01-01
Objective The trade-off between the speed and simplicity of dictionary-based term recognition and the richer linguistic information provided by more advanced natural language processing (NLP) is an area of active discussion in clinical informatics. In this paper, we quantify this trade-off among text processing systems that make different trade-offs between speed and linguistic understanding. We tested both types of systems in three clinical research tasks: phase IV safety profiling of a drug, learning adverse drug–drug interactions, and learning used-to-treat relationships between drugs and indications. Materials We first benchmarked the accuracy of the NCBO Annotator and REVEAL in a manually annotated, publicly available dataset from the 2008 i2b2 Obesity Challenge. We then applied the NCBO Annotator and REVEAL to 9 million clinical notes from the Stanford Translational Research Integrated Database Environment (STRIDE) and used the resulting data for three research tasks. Results There is no significant difference between using the NCBO Annotator and REVEAL in the results of the three research tasks when using large datasets. In one subtask, REVEAL achieved higher sensitivity with smaller datasets. Conclusions For a variety of tasks, employing simple term recognition methods instead of advanced NLP methods results in little or no impact on accuracy when using large datasets. Simpler dictionary-based methods have the advantage of scaling well to very large datasets. Promoting the use of simple, dictionary-based methods for population level analyses can advance adoption of NLP in practice. PMID:25336595
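As context for the speed/simplicity side of the trade-off discussed above, here is a hedged sketch of what a dictionary-based term recognizer does; the term list, concept identifiers, and example note are hypothetical and are not taken from the NCBO Annotator or REVEAL.

```python
# Minimal sketch of dictionary-based term recognition over clinical text.
import re

dictionary = {
    "obesity": "C0028754",          # hypothetical term -> concept-ID mapping
    "type 2 diabetes": "C0011860",
    "hypertension": "C0020538",
}
# Longest terms first so multi-word matches win over their substrings.
pattern = re.compile(
    r"\b(" + "|".join(sorted(map(re.escape, dictionary), key=len, reverse=True)) + r")\b",
    flags=re.IGNORECASE,
)

def annotate(note: str):
    """Return (term, concept_id, start, end) tuples found in a clinical note."""
    return [(m.group(0), dictionary[m.group(0).lower()], m.start(), m.end())
            for m in pattern.finditer(note)]

print(annotate("Pt with obesity and type 2 diabetes; hypertension controlled."))
```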
An Active Patch Model for Real World Texture and Appearance Classification
Mao, Junhua; Zhu, Jun; Yuille, Alan L.
2014-01-01
This paper addresses the task of natural texture and appearance classification. Our goal is to develop a simple and intuitive method that performs at the state of the art on datasets ranging from homogeneous texture (e.g., material texture), to less homogeneous texture (e.g., the fur of animals), and to inhomogeneous texture (the appearance patterns of vehicles). Our method uses a bag-of-words model where the features are based on a dictionary of active patches. Active patches are raw intensity patches which can undergo spatial transformations (e.g., rotation and scaling) and adjust themselves to best match the image regions. The dictionary of active patches is required to be compact and representative, in the sense that we can use it to approximately reconstruct the images that we want to classify. We propose a probabilistic model to quantify the quality of image reconstruction and design a greedy learning algorithm to obtain the dictionary. We classify images using the occurrence frequency of the active patches. Feature extraction is fast (about 100 ms per image) using the GPU. The experimental results show that our method improves the state of the art on a challenging material texture benchmark dataset (KTH-TIPS2). To test our method on less homogeneous or inhomogeneous images, we construct two new datasets consisting of appearance image patches of animals and vehicles cropped from the PASCAL VOC dataset. Our method outperforms competing methods on these datasets. PMID:25531013
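To illustrate the bag-of-words step of such a pipeline, the snippet below builds an occurrence-frequency descriptor by assigning raw patches to their nearest dictionary element. It is a simplified sketch: the learned active-patch dictionary, the spatial transformations (rotation and scaling), and the GPU feature extraction of the actual method are not reproduced, and all array sizes are placeholders.

```python
# Illustrative bag-of-patches descriptor: nearest-dictionary-atom assignment
# followed by a normalized occurrence histogram.
import numpy as np

def bag_of_patches(patches, dictionary):
    """patches: (N, D) raw intensity patches; dictionary: (K, D) dictionary patches."""
    # Squared Euclidean distance from every patch to every dictionary atom.
    d2 = ((patches[:, None, :] - dictionary[None, :, :]) ** 2).sum(axis=2)
    assignments = d2.argmin(axis=1)
    hist = np.bincount(assignments, minlength=len(dictionary)).astype(float)
    return hist / hist.sum()

rng = np.random.default_rng(0)
dictionary = rng.normal(size=(64, 7 * 7))       # 64 atoms of 7x7 patches (placeholder)
patches = rng.normal(size=(500, 7 * 7))         # patches sampled from one image
feature = bag_of_patches(patches, dictionary)   # 64-bin descriptor fed to a classifier
```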
Hee, S.; Vázquez, J. A.; Handley, W. J.; ...
2016-12-01
Data-driven model-independent reconstructions of the dark energy equation of state w(z) are presented using Planck 2015 era CMB, BAO, SNIa and Lyman-α data. These reconstructions identify the w(z) behaviour supported by the data and show a bifurcation of the equation of state posterior in the range 1.5 < z < 3. Although the concordance ΛCDM model is consistent with the data at all redshifts in one of the bifurcated spaces, in the other a supernegative equation of state (also known as ‘phantom dark energy’) is identified within the 1.5σ confidence intervals of the posterior distribution. In order to identify the power of different datasets in constraining the dark energy equation of state, we use a novel formulation of the Kullback–Leibler divergence. Moreover, this formalism quantifies the information the data add when moving from priors to posteriors for each possible dataset combination. The SNIa and BAO datasets are shown to provide much more constraining power in comparison to the Lyman-α datasets. Furthermore, SNIa and BAO constrain most strongly around redshift range 0.1 - 0.5, whilst the Lyman-α data constrains weakly over a broader range. We do not attribute the supernegative favouring to any particular dataset, and note that the ΛCDM model was favoured at more than 2 log-units in Bayes factors over all the models tested despite the weakly preferred w(z) structure in the data.
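For reference, the standard form of the Kullback–Leibler divergence between a posterior \(\mathcal{P}(\theta)\) and a prior \(\pi(\theta)\) is given below; the paper describes its own formulation as novel, so this should be read only as the textbook definition of the quantity being adapted to measure the information gained when moving from prior to posterior.

```latex
D_{\mathrm{KL}}\left(\mathcal{P}\,\|\,\pi\right)
  = \int \mathcal{P}(\theta)\,
    \ln\frac{\mathcal{P}(\theta)}{\pi(\theta)}\,\mathrm{d}\theta
```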
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hee, S.; Vázquez, J. A.; Handley, W. J.
Data-driven model-independent reconstructions of the dark energy equation of state w(z) are presented using Planck 2015 era CMB, BAO, SNIa and Lyman-α data. These reconstructions identify the w(z) behaviour supported by the data and show a bifurcation of the equation of state posterior in the range 1.5 < z < 3. Although the concordance ΛCDM model is consistent with the data at all redshifts in one of the bifurcated spaces, in the other a supernegative equation of state (also known as ‘phantom dark energy’) is identified within the 1.5σ confidence intervals of the posterior distribution. In order to identify the power of different datasets in constraining the dark energy equation of state, we use a novel formulation of the Kullback–Leibler divergence. Moreover, this formalism quantifies the information the data add when moving from priors to posteriors for each possible dataset combination. The SNIa and BAO datasets are shown to provide much more constraining power in comparison to the Lyman-α datasets. Furthermore, SNIa and BAO constrain most strongly around redshift range 0.1 - 0.5, whilst the Lyman-α data constrains weakly over a broader range. We do not attribute the supernegative favouring to any particular dataset, and note that the ΛCDM model was favoured at more than 2 log-units in Bayes factors over all the models tested despite the weakly preferred w(z) structure in the data.
High resolution global gridded data for use in population studies
Lloyd, Christopher T.; Sorichetta, Alessandro; Tatem, Andrew J.
2017-01-01
Recent years have seen substantial growth in openly available satellite and other geospatial data layers, which represent a range of metrics relevant to global human population mapping at fine spatial scales. The specifications of such data differ widely and therefore the harmonisation of data layers is a prerequisite to constructing detailed and contemporary spatial datasets which accurately describe population distributions. Such datasets are vital to measure impacts of population growth, monitor change, and plan interventions. To this end the WorldPop Project has produced an open access archive of 3 and 30 arc-second resolution gridded data. Four tiled raster datasets form the basis of the archive: (i) Viewfinder Panoramas topography clipped to Global ADMinistrative area (GADM) coastlines; (ii) a matching ISO 3166 country identification grid; (iii) country area; and (iv) slope layer. Further layers include transport networks, landcover, nightlights, precipitation, travel time to major cities, and waterways. Datasets and production methodology are here described. The archive can be downloaded both from the WorldPop Dataverse Repository and the WorldPop Project website. PMID:28140386
Rebaudo, François; Faye, Emile; Dangles, Olivier
2016-01-01
A large body of literature has recently recognized the role of microclimates in controlling the physiology and ecology of species, yet the relevance of fine-scale climatic data for modeling species performance and distribution remains a matter of debate. Using a 6-year monitoring of three potato moth species, major crop pests in the tropical Andes, we asked whether the spatiotemporal resolution of temperature data affects the predictions of models of moth performance and distribution. For this, we used three different climatic data sets: (i) the WorldClim dataset (global dataset), (ii) air temperature recorded using data loggers (weather station dataset), and (iii) air crop canopy temperature (microclimate dataset). We developed a statistical procedure to calibrate all datasets to monthly and yearly variation in temperatures, while keeping both spatial and temporal variances (air monthly temperature at 1 km² for the WorldClim dataset, air hourly temperature for the weather station, and air minute temperature over 250 m radius disks for the microclimate dataset). Then, we computed pest performances based on these three datasets. Results for temperatures ranging from 9 to 11°C revealed discrepancies in the simulation outputs in both survival and development rates depending on the spatiotemporal resolution of the temperature dataset. Temperature and simulated pest performances were then combined into multiple linear regression models to compare predicted vs. field data. We used an additional set of study sites to test whether the results of our model could be extrapolated over larger scales. Results showed that the model implemented with microclimatic data best predicted observed pest abundances for our study sites, but was less accurate than the global dataset model when performed at larger scales. Our simulations therefore stress the importance of considering different temperature datasets depending on the issue to be solved in order to accurately predict species abundances. In conclusion, keeping in mind that the mismatch between the size of organisms and the scale at which climate data are collected and modeled remains a key issue, temperature dataset selection should be balanced by the desired output spatiotemporal scale for better predicting pest dynamics and developing efficient pest management strategies.
Rebaudo, François; Faye, Emile; Dangles, Olivier
2016-01-01
A large body of literature has recently recognized the role of microclimates in controlling the physiology and ecology of species, yet the relevance of fine-scale climatic data for modeling species performance and distribution remains a matter of debate. Using a 6-year monitoring of three potato moth species, major crop pests in the tropical Andes, we asked whether the spatiotemporal resolution of temperature data affects the predictions of models of moth performance and distribution. For this, we used three different climatic data sets: (i) the WorldClim dataset (global dataset), (ii) air temperature recorded using data loggers (weather station dataset), and (iii) air crop canopy temperature (microclimate dataset). We developed a statistical procedure to calibrate all datasets to monthly and yearly variation in temperatures, while keeping both spatial and temporal variances (air monthly temperature at 1 km² for the WorldClim dataset, air hourly temperature for the weather station, and air minute temperature over 250 m radius disks for the microclimate dataset). Then, we computed pest performances based on these three datasets. Results for temperatures ranging from 9 to 11°C revealed discrepancies in the simulation outputs in both survival and development rates depending on the spatiotemporal resolution of the temperature dataset. Temperature and simulated pest performances were then combined into multiple linear regression models to compare predicted vs. field data. We used an additional set of study sites to test whether the results of our model could be extrapolated over larger scales. Results showed that the model implemented with microclimatic data best predicted observed pest abundances for our study sites, but was less accurate than the global dataset model when performed at larger scales. Our simulations therefore stress the importance of considering different temperature datasets depending on the issue to be solved in order to accurately predict species abundances. In conclusion, keeping in mind that the mismatch between the size of organisms and the scale at which climate data are collected and modeled remains a key issue, temperature dataset selection should be balanced by the desired output spatiotemporal scale for better predicting pest dynamics and developing efficient pest management strategies. PMID:27148077
Spatiotemporal Domain Decomposition for Massive Parallel Computation of Space-Time Kernel Density
NASA Astrophysics Data System (ADS)
Hohl, A.; Delmelle, E. M.; Tang, W.
2015-07-01
Accelerated processing capabilities are deemed critical when conducting analysis on spatiotemporal datasets of increasing size, diversity and availability. High-performance parallel computing offers the capacity to solve computationally demanding problems in a limited timeframe, but likewise poses the challenge of preventing processing inefficiency due to workload imbalance between computing resources. Therefore, when designing new algorithms capable of implementing parallel strategies, careful spatiotemporal domain decomposition is necessary to account for heterogeneity in the data. In this study, we perform octree-based adaptive decomposition of the spatiotemporal domain for parallel computation of space-time kernel density. In order to avoid edge effects near subdomain boundaries, we establish spatiotemporal buffers to include adjacent data-points that are within the spatial and temporal kernel bandwidths. Then, we quantify computational intensity of each subdomain to balance workloads among processors. We illustrate the benefits of our methodology using a space-time epidemiological dataset of Dengue fever, an infectious vector-borne disease that poses a severe threat to communities in tropical climates. Our parallel implementation of kernel density reaches substantial speedup compared to sequential processing, and achieves high levels of workload balance among processors due to great accuracy in quantifying computational intensity. Our approach is portable to other space-time analytical tests.
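As background for the computation being parallelized, a common form of the space-time kernel density estimate at location (x, y) and time t is sketched below, with spatial bandwidth h_s, temporal bandwidth h_t, and separable spatial and temporal kernels K_s and K_t; the exact kernels and normalization used in the study may differ.

```latex
\hat{f}(x,y,t) = \frac{1}{n\,h_s^{2}\,h_t}
  \sum_{i=1}^{n} K_s\left(\frac{x-x_i}{h_s},\,\frac{y-y_i}{h_s}\right)
                 K_t\left(\frac{t-t_i}{h_t}\right)
```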
Emerging Heterogeneities in Italian Customs and Comparison with Nearby Countries
Agliari, Elena; Barra, Adriano; Galluzzi, Andrea; Javarone, Marco Alberto; Pizzoferrato, Andrea; Tantari, Daniele
2015-01-01
In this work we apply techniques and modus operandi typical of Statistical Mechanics to a large dataset about key social quantifiers and compare the resulting behaviors of five European nations, namely France, Germany, Italy, Spain and Switzerland. The social quantifiers considered are i. the evolution of the number of autochthonous marriages (i.e., between two natives) within a given territorial district and ii. the evolution of the number of mixed marriages (i.e., between a native and an immigrant) within a given territorial district. Our investigations are twofold. From a theoretical perspective, we develop novel techniques, complementary to classical methods (e.g., historical series and logistic regression), in order to detect possible collective features underlying the empirical behaviors; from an experimental perspective, we evidence a clear outline for the evolution of the social quantifiers considered. The comparison between experimental results and theoretical predictions is excellent and allows speculating that France, Italy and Spain display a certain degree of internal heterogeneity, that is not found in Germany and Switzerland; such heterogeneity, quite mild in France and in Spain, is not negligible in Italy and highlights quantitative differences in the habits of Northern and Southern regions. These findings may suggest the persistence of two culturally distinct communities, long-term lasting heritages of different and well-established customs. Also, we find qualitative differences between the evolution of autochthonous and of mixed marriages: for the former imitation in decisional mechanisms seems to play a key role (and this results in a square root relation between the number of autochthonous marriages versus the percentage of possible couples inside that country), while for the latter the emerging behavior can be recovered (in most cases) with elementary models with no interactions, suggesting weak imitation patterns between natives and migrants (and this translates in a linear growth for the number of mixed marriages versus the percentage of possible mixed couples in the country). However, the case of mixed marriages displays a more complex phenomenology, where further details (e.g., the provenance and the status of migrants, linguistic barriers, etc.) should also be accounted for. PMID:26713615
Emerging Heterogeneities in Italian Customs and Comparison with Nearby Countries.
Agliari, Elena; Barra, Adriano; Galluzzi, Andrea; Javarone, Marco Alberto; Pizzoferrato, Andrea; Tantari, Daniele
2015-01-01
In this work we apply techniques and modus operandi typical of Statistical Mechanics to a large dataset about key social quantifiers and compare the resulting behaviors of five European nations, namely France, Germany, Italy, Spain and Switzerland. The social quantifiers considered are i. the evolution of the number of autochthonous marriages (i.e., between two natives) within a given territorial district and ii. the evolution of the number of mixed marriages (i.e., between a native and an immigrant) within a given territorial district. Our investigations are twofold. From a theoretical perspective, we develop novel techniques, complementary to classical methods (e.g., historical series and logistic regression), in order to detect possible collective features underlying the empirical behaviors; from an experimental perspective, we evidence a clear outline for the evolution of the social quantifiers considered. The comparison between experimental results and theoretical predictions is excellent and allows speculating that France, Italy and Spain display a certain degree of internal heterogeneity, that is not found in Germany and Switzerland; such heterogeneity, quite mild in France and in Spain, is not negligible in Italy and highlights quantitative differences in the habits of Northern and Southern regions. These findings may suggest the persistence of two culturally distinct communities, long-term lasting heritages of different and well-established customs. Also, we find qualitative differences between the evolution of autochthonous and of mixed marriages: for the former imitation in decisional mechanisms seems to play a key role (and this results in a square root relation between the number of autochthonous marriages versus the percentage of possible couples inside that country), while for the latter the emerging behavior can be recovered (in most cases) with elementary models with no interactions, suggesting weak imitation patterns between natives and migrants (and this translates in a linear growth for the number of mixed marriages versus the percentage of possible mixed couples in the country). However, the case of mixed marriages displays a more complex phenomenology, where further details (e.g., the provenance and the status of migrants, linguistic barriers, etc.) should also be accounted for.
Serial femtosecond crystallography datasets from G protein-coupled receptors
White, Thomas A.; Barty, Anton; Liu, Wei; Ishchenko, Andrii; Zhang, Haitao; Gati, Cornelius; Zatsepin, Nadia A.; Basu, Shibom; Oberthür, Dominik; Metz, Markus; Beyerlein, Kenneth R.; Yoon, Chun Hong; Yefanov, Oleksandr M.; James, Daniel; Wang, Dingjie; Messerschmidt, Marc; Koglin, Jason E.; Boutet, Sébastien; Weierstall, Uwe; Cherezov, Vadim
2016-01-01
We describe the deposition of four datasets consisting of X-ray diffraction images acquired using serial femtosecond crystallography experiments on microcrystals of human G protein-coupled receptors, grown and delivered in lipidic cubic phase, at the Linac Coherent Light Source. The receptors are: the human serotonin receptor 2B in complex with an agonist ergotamine, the human δ-opioid receptor in complex with a bi-functional peptide ligand DIPP-NH2, the human smoothened receptor in complex with an antagonist cyclopamine, and finally the human angiotensin II type 1 receptor in complex with the selective antagonist ZD7155. All four datasets have been deposited, with minimal processing, in an HDF5-based file format, which can be used directly for crystallographic processing with CrystFEL or other software. We have provided processing scripts and supporting files for recent versions of CrystFEL, which can be used to validate the data. PMID:27479354
Serial femtosecond crystallography datasets from G protein-coupled receptors.
White, Thomas A; Barty, Anton; Liu, Wei; Ishchenko, Andrii; Zhang, Haitao; Gati, Cornelius; Zatsepin, Nadia A; Basu, Shibom; Oberthür, Dominik; Metz, Markus; Beyerlein, Kenneth R; Yoon, Chun Hong; Yefanov, Oleksandr M; James, Daniel; Wang, Dingjie; Messerschmidt, Marc; Koglin, Jason E; Boutet, Sébastien; Weierstall, Uwe; Cherezov, Vadim
2016-08-01
We describe the deposition of four datasets consisting of X-ray diffraction images acquired using serial femtosecond crystallography experiments on microcrystals of human G protein-coupled receptors, grown and delivered in lipidic cubic phase, at the Linac Coherent Light Source. The receptors are: the human serotonin receptor 2B in complex with an agonist ergotamine, the human δ-opioid receptor in complex with a bi-functional peptide ligand DIPP-NH2, the human smoothened receptor in complex with an antagonist cyclopamine, and finally the human angiotensin II type 1 receptor in complex with the selective antagonist ZD7155. All four datasets have been deposited, with minimal processing, in an HDF5-based file format, which can be used directly for crystallographic processing with CrystFEL or other software. We have provided processing scripts and supporting files for recent versions of CrystFEL, which can be used to validate the data.
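A minimal way to inspect one of the deposited HDF5 files is sketched below using h5py; the file name and the internal dataset path are hypothetical and should be taken from the deposition's accompanying CrystFEL geometry and processing scripts.

```python
# Sketch of opening an HDF5 diffraction-image file and reading one frame.
import h5py

with h5py.File("r0095_5HT2B_frames.h5", "r") as f:        # hypothetical file name
    f.visit(print)                                         # list the internal layout
    frames = f["/data/data"]                               # assumed path to the image stack
    print("number of frames:", frames.shape[0])
    first = frames[0]                                      # one detector frame as a NumPy array
    print("frame shape:", first.shape, "max count:", first.max())
```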
Identification of mechanisms responsible for adverse developmental effects is the first step in creating predictive toxicity models. Identification of putative mechanisms was performed by co-analyzing three datasets for the effects of ToxCast phase Ia and II chemicals: 1.In vitro...
GAGES-II: Geospatial Attributes of Gages for Evaluating Streamflow
Falcone, James A.
2011-01-01
This dataset, termed "GAGES II", an acronym for Geospatial Attributes of Gages for Evaluating Streamflow, version II, provides geospatial data and classifications for 9,322 stream gages maintained by the U.S. Geological Survey (USGS). It is an update to the original GAGES, which was published as a Data Paper on the journal Ecology's website (Falcone and others, 2010b) in 2010. The GAGES II dataset consists of gages which have had either 20+ complete years (not necessarily continuous) of discharge record since 1950, or are currently active, as of water year 2009, and whose watersheds lie within the United States, including Alaska, Hawaii, and Puerto Rico. Reference gages were identified based on indicators that they were the least-disturbed watersheds within the framework of broad regions, based on 12 major ecoregions across the United States. Of the 9,322 total sites, 2,057 are classified as reference, and 7,265 as non-reference. Of the 2,057 reference sites, 1,633 have (through 2009) 20+ years of record since 1950. Some sites have very long flow records: a number of gages have been in continuous service since 1900 (at least), and have 110 years of complete record (1900-2009) to date. The geospatial data include several hundred watershed characteristics compiled from national data sources, including environmental features (e.g. climate – including historical precipitation, geology, soils, topography) and anthropogenic influences (e.g. land use, road density, presence of dams, canals, or power plants). The dataset also includes comments from local USGS Water Science Centers, based on Annual Data Reports, pertinent to hydrologic modifications and influences. The data posted also include watershed boundaries in GIS format. This overall dataset is different in nature to the USGS Hydro-Climatic Data Network (HCDN; Slack and Landwehr 1992), whose data evaluation ended with water year 1988. The HCDN identifies stream gages which at some point in their history had periods which represented natural flow, and the years in which those natural flows occurred were identified (i.e. not all HCDN sites were in reference condition even in 1988, for example, 02353500). The HCDN remains a valuable indication of historic natural streamflow data. However, the goal of this dataset was to identify watersheds which currently have near-natural flow conditions, and the 2,057 reference sites identified here were derived independently of the HCDN. A subset, however, noted in the BasinID worksheet as “HCDN-2009”, has been identified as an updated list of 743 sites for potential hydro-climatic study. The HCDN-2009 sites fulfill all of the following criteria: (a) have 20 years of complete and continuous flow record in the last 20 years (water years 1990-2009), and were thus also currently active as of 2009, (b) are identified as being in current reference condition according to the GAGES-II classification, (c) have less than 5 percent imperviousness as measured from the NLCD 2006, and (d) were not eliminated by a review from participating state Water Science Center evaluators. The data posted here consist of the following items:
- This point shapefile, with summary data for the 9,322 gages.
- A zip file containing basin characteristics, variable definitions, and a more detailed report.
- A zip file containing shapefiles of basin boundaries, organized by classification and aggregated ecoregion.
- A zip file containing mainstem stream lines (Arc line coverages) for each gage.
Authorship Identification for Tamil Classical Poem using Subspace Discriminant Algorithm
NASA Astrophysics Data System (ADS)
Pandian, A.; Ramalingam, V. V.; Manikandan, K.; Vishnu Preet, R. P.
2018-04-01
Automatic author identification builds on stylometric analysis: extracting characteristic features from a text makes it possible to attribute works of unknown authorship to known writers. The focus of this paper is identifying the authors of unattributed works in a Tamil dataset on the basis of the known works of candidate authors. Text preprocessing is used to extract quantifiable stylistic features from the dataset, and a subspace discriminant classifier trained on the known works of potential authors is used to classify poems or texts of unknown authorship. The procedure is not tied to a single language and can be extended to other regional languages around the world. Many literature researchers find it hard to categorize poems whose writers are unidentified; applying this procedure, the authors of various Tamil poems can be recognized, which is of significant value to the community.
Fantuzzo, J. A.; Mirabella, V. R.; Zahn, J. D.
2017-01-01
Abstract Synapse formation analyses can be performed by imaging and quantifying fluorescent signals of synaptic markers. Traditionally, these analyses are done using simple or multiple thresholding and segmentation approaches or by labor-intensive manual analysis by a human observer. Here, we describe Intellicount, a high-throughput, fully-automated synapse quantification program which applies a novel machine learning (ML)-based image processing algorithm to systematically improve region of interest (ROI) identification over simple thresholding techniques. Through processing large datasets from both human and mouse neurons, we demonstrate that this approach allows image processing to proceed independently of carefully set thresholds, thus reducing the need for human intervention. As a result, this method can efficiently and accurately process large image datasets with minimal interaction by the experimenter, making it less prone to bias and less liable to human error. Furthermore, Intellicount is integrated into an intuitive graphical user interface (GUI) that provides a set of valuable features, including automated and multifunctional figure generation, routine statistical analyses, and the ability to run full datasets through nested folders, greatly expediting the data analysis process. PMID:29218324
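For contrast with the machine-learning approach described above, the snippet below sketches the simple global-thresholding baseline that such tools aim to improve on; the smoothing width and threshold rule are illustrative assumptions, not Intellicount's algorithm.

```python
# Baseline puncta counting by smoothing, global thresholding and
# connected-component labeling of a 2-D fluorescence image.
import numpy as np
from scipy import ndimage

def count_puncta(image, sigma=1.0, threshold=None):
    """Count candidate synaptic puncta (ROIs) in a 2-D image array."""
    smoothed = ndimage.gaussian_filter(image.astype(float), sigma=sigma)
    if threshold is None:
        threshold = smoothed.mean() + 2 * smoothed.std()   # naive global threshold
    mask = smoothed > threshold
    labels, n_rois = ndimage.label(mask)                   # connected-component ROIs
    sizes = ndimage.sum(mask, labels, index=list(range(1, n_rois + 1)))
    return int(n_rois), sizes                              # ROI count and pixel areas
```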
A database of marine phytoplankton abundance, biomass and species composition in Australian waters
Davies, Claire H.; Coughlan, Alex; Hallegraeff, Gustaaf; Ajani, Penelope; Armbrecht, Linda; Atkins, Natalia; Bonham, Prudence; Brett, Steve; Brinkman, Richard; Burford, Michele; Clementson, Lesley; Coad, Peter; Coman, Frank; Davies, Diana; Dela-Cruz, Jocelyn; Devlin, Michelle; Edgar, Steven; Eriksen, Ruth; Furnas, Miles; Hassler, Christel; Hill, David; Holmes, Michael; Ingleton, Tim; Jameson, Ian; Leterme, Sophie C.; Lønborg, Christian; McLaughlin, James; McEnnulty, Felicity; McKinnon, A. David; Miller, Margaret; Murray, Shauna; Nayar, Sasi; Patten, Renee; Pritchard, Tim; Proctor, Roger; Purcell-Meyerink, Diane; Raes, Eric; Rissik, David; Ruszczyk, Jason; Slotwinski, Anita; Swadling, Kerrie M.; Tattersall, Katherine; Thompson, Peter; Thomson, Paul; Tonks, Mark; Trull, Thomas W.; Uribe-Palomino, Julian; Waite, Anya M.; Yauwenas, Rouna; Zammit, Anthony; Richardson, Anthony J.
2016-01-01
There have been many individual phytoplankton datasets collected across Australia since the mid 1900s, but most are unavailable to the research community. We have searched archives, contacted researchers, and scanned the primary and grey literature to collate 3,621,847 records of marine phytoplankton species from Australian waters from 1844 to the present. Many of these are small datasets collected for local questions, but combined they provide over 170 years of data on phytoplankton communities in Australian waters. Units and taxonomy have been standardised, obviously erroneous data removed, and all metadata included. We have lodged this dataset with the Australian Ocean Data Network (http://portal.aodn.org.au/) allowing public access. The Australian Phytoplankton Database will be invaluable for global change studies, as it allows analysis of ecological indicators of climate change and eutrophication (e.g., changes in distribution; diatom:dinoflagellate ratios). In addition, the standardised conversion of abundance records to biomass provides modellers with quantifiable data to initialise and validate ecosystem models of lower marine trophic levels. PMID:27328409
Regional climate change study requires new temperature datasets
NASA Astrophysics Data System (ADS)
Wang, K.; Zhou, C.
2016-12-01
Analyses of global mean air temperature (Ta), i.e., NCDC GHCN, GISS, and CRUTEM4, are the fundamental datasets for climate change study and provide key evidence for global warming. All of the global temperature analyses over land are primarily based on meteorological observations of the daily maximum and minimum temperatures (Tmax and Tmin) and their averages (T2) because in most weather stations, the measurements of Tmax and Tmin may be the only choice for a homogenous century-long analysis of mean temperature. Our studies show that these datasets are suitable for long-term global warming studies. However, they may introduce substantial bias in quantifying local and regional warming rates, i.e., with a root mean square error of more than 25% at 5° x 5° grids. From 1973 to 1997, the current datasets tend to significantly underestimate the warming rate over the central U.S. and overestimate the warming rate over the northern high latitudes. Similar results revealed during the period 1998-2013, the warming hiatus period, indicate the use of T2 enlarges the spatial contrast of temperature trends. This is because T2 over land only samples air temperature twice daily and cannot accurately reflect land-atmosphere and incoming radiation variations in the temperature diurnal cycle. For better regional climate change detection and attribution, we suggest creating new global mean air temperature datasets based on the recently available high spatiotemporal resolution meteorological observations, i.e., weather stations reporting four observations per day since the 1960s. These datasets will not only help investigate dynamical processes on temperature variances but also help better evaluate the reanalyzed and modeled simulations of temperature and make some substantial improvements for other related climate variables in models, especially over regional and seasonal aspects.
Regional climate change study requires new temperature datasets
NASA Astrophysics Data System (ADS)
Wang, Kaicun; Zhou, Chunlüe
2017-04-01
Analyses of global mean air temperature (Ta), i.e., NCDC GHCN, GISS, and CRUTEM4, are the fundamental datasets for climate change study and provide key evidence for global warming. All of the global temperature analyses over land are primarily based on meteorological observations of the daily maximum and minimum temperatures (Tmax and Tmin) and their averages (T2) because in most weather stations, the measurements of Tmax and Tmin may be the only choice for a homogenous century-long analysis of mean temperature. Our studies show that these datasets are suitable for long-term global warming studies. However, they may have substantial biases in quantifying local and regional warming rates, i.e., with a root mean square error of more than 25% at 5-degree grids. From 1973 to 1997, the current datasets tend to significantly underestimate the warming rate over the central U.S. and overestimate the warming rate over the northern high latitudes. Similar results revealed during the period 1998-2013, the warming hiatus period, indicate the use of T2 enlarges the spatial contrast of temperature trends. This is because T2 over land only samples air temperature twice daily and cannot accurately reflect land-atmosphere and incoming radiation variations in the temperature diurnal cycle. For better regional climate change detection and attribution, we suggest creating new global mean air temperature datasets based on the recently available high spatiotemporal resolution meteorological observations, i.e., weather stations reporting four observations per day since the 1960s. These datasets will not only help investigate dynamical processes on temperature variances but also help better evaluate the reanalyzed and modeled simulations of temperature and make some substantial improvements for other related climate variables in models, especially over regional and seasonal aspects.
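The distinction both entries draw between the two-observation daily mean and a true diurnal mean can be written as follows, where the T_i are the N sub-daily observations (e.g., N = 4 for six-hourly reports):

```latex
T_2 = \frac{T_{\max} + T_{\min}}{2},
\qquad
\bar{T} = \frac{1}{N}\sum_{i=1}^{N} T_i
```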
The taste of toxicity: A quantitative analysis of bitter and toxic molecules.
Nissim, Ido; Dagan-Wiener, Ayana; Niv, Masha Y
2017-12-01
Bitter taste, one of the few basic taste modalities, is commonly assumed to signal toxicity and alert animals against consuming harmful compounds. However, it is known that some toxic compounds are not bitter and that many bitter compounds have negligible toxicity while having important health benefits. Here we apply a quantitative analysis of the chemical space to shed light on the bitterness-toxicity relationship. Using the BitterDB dataset of bitter molecules, the BitterPredict prediction tool, and datasets of toxic compounds, we quantify the identity and similarity between bitter and toxic compounds. About 60% of the bitter compounds have documented toxicity and only 56% of the toxic compounds are known or predicted to be bitter. The LD50 value distributions suggest that most of the bitter compounds are not very toxic, but there is a somewhat higher chance of toxicity for known bitter compounds compared to known nonbitter ones. Flavonoids and alpha acids are more common in the bitter dataset compared with the toxic dataset. In contrast, alkaloids are more common in the toxic datasets compared to the bitter dataset. Interestingly, no trend linking LD50 values with the number of activated bitter taste receptor (TAS2R) subtypes is apparent in the currently available data. This is in accord with the newly discovered expression of TAS2Rs in several extra-oral tissues, in which they might be activated by yet unknown endogenous ligands and play non-gustatory physiological roles. These results suggest that bitter taste is not a very reliable marker for toxicity, and is likely to have other physiological roles. © 2017 IUBMB Life, 69(12):938-946, 2017. © 2017 International Union of Biochemistry and Molecular Biology.
NASA Technical Reports Server (NTRS)
Wang, Weile; Nemani, Ramakrishna R.; Michaelis, Andrew; Hashimoto, Hirofumi; Dungan, Jennifer L.; Thrasher, Bridget L.; Dixon, Keith W.
2016-01-01
The NASA Earth Exchange Global Daily Downscaled Projections (NEX-GDDP) dataset is comprised of downscaled climate projections that are derived from 21 General Circulation Model (GCM) runs conducted under the Coupled Model Intercomparison Project Phase 5 (CMIP5) and across two of the four greenhouse gas emissions scenarios (RCP4.5 and RCP8.5). Each of the climate projections includes daily maximum temperature, minimum temperature, and precipitation for the periods from 1950 through 2100 and the spatial resolution is 0.25 degrees (approximately 25 km x 25 km). The GDDP dataset has been warmly received by the science community for conducting studies of climate change impacts at local to regional scales, but a comprehensive evaluation of its uncertainties is still missing. In this study, we apply the Perfect Model Experiment framework (Dixon et al. 2016) to quantify the key sources of uncertainties from the observational baseline dataset, the downscaling algorithm, and some intrinsic assumptions (e.g., the stationary assumption) inherent to the statistical downscaling techniques. We developed a set of metrics to evaluate downscaling errors resulting from bias-correction ("quantile-mapping"), spatial disaggregation, as well as the temporal-spatial non-stationarity of climate variability. Our results highlight the spatial disaggregation (or interpolation) errors, which dominate the overall uncertainties of the GDDP dataset, especially over heterogeneous and complex terrains (e.g., mountains and coastal areas). In comparison, the temporal errors in the GDDP dataset tend to be more constrained. Our results also indicate that the downscaled daily precipitation has relatively larger uncertainties than the temperature fields, reflecting the rather stochastic nature of precipitation in space. Therefore, our results provide insights into improving statistical downscaling algorithms and products in the future.
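As an illustration of the bias-correction step referred to as quantile mapping, the sketch below matches empirical quantiles of a model series to observations over a training period; it is a generic textbook version, not the NEX-GDDP production algorithm, and all inputs are assumed to be one-dimensional NumPy arrays.

```python
# Empirical quantile-mapping bias correction: map each model value to the
# observed value occupying the same quantile of the training-period distribution.
import numpy as np

def quantile_map(model_train, obs_train, model_future):
    """Bias-correct model_future using the model/obs relationship in training data."""
    quantiles = np.linspace(0.0, 1.0, 101)
    model_q = np.quantile(model_train, quantiles)
    obs_q = np.quantile(obs_train, quantiles)
    # Position of each future value in the model's training CDF ...
    cdf_vals = np.interp(model_future, model_q, quantiles)
    # ... mapped onto the observed distribution at the same quantile.
    return np.interp(cdf_vals, quantiles, obs_q)
```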
USDA-ARS?s Scientific Manuscript database
To better understand and quantify the effectiveness of wetland vegetation in mitigating the impact of hurricane and storm surges, this SERRI project (No. 80037) examined surge and wave attenuation by vegetation through laboratory experiments, field observations and computational modeling. It was a c...
Quantifying Soiling Loss Directly From PV Yield
Deceglie, Michael G.; Micheli, Leonardo; Muller, Matthew
2018-01-23
Soiling of photovoltaic (PV) panels is typically quantified through the use of specialized sensors. Here, we describe and validate a method for estimating soiling loss experienced by PV systems directly from system yield without the need for precipitation data. The method, termed the stochastic rate and recovery (SRR) method, automatically detects soiling intervals in a dataset, then stochastically generates a sample of possible soiling profiles based on the observed characteristics of each interval. In this paper, we describe the method, validate it against soiling station measurements, and compare it with other PV-yield-based soiling estimation methods. The broader application of the SRR method will enable the fleet-scale assessment of soiling loss to facilitate mitigation planning and risk assessment.
Muthu, Pravin; Lutz, Stefan
2016-04-05
Fast, simple and cost-effective methods for detecting and quantifying pharmaceutical agents in patients are highly sought after to replace equipment and labor-intensive analytical procedures. The development of new diagnostic technology including portable detection devices also enables point-of-care by non-specialists in resource-limited environments. We have focused on the detection and dose monitoring of nucleoside analogues used in viral and cancer therapies. Using deoxyribonucleoside kinases (dNKs) as biosensors, our chemometric model compares observed time-resolved kinetics of unknown analytes to known substrate interactions across multiple enzymes. The resulting dataset can simultaneously identify and quantify multiple nucleosides and nucleoside analogues in complex sample mixtures. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Quantifying Soiling Loss Directly From PV Yield
DOE Office of Scientific and Technical Information (OSTI.GOV)
Deceglie, Michael G.; Micheli, Leonardo; Muller, Matthew
Soiling of photovoltaic (PV) panels is typically quantified through the use of specialized sensors. Here, we describe and validate a method for estimating soiling loss experienced by PV systems directly from system yield without the need for precipitation data. The method, termed the stochastic rate and recovery (SRR) method, automatically detects soiling intervals in a dataset, then stochastically generates a sample of possible soiling profiles based on the observed characteristics of each interval. In this paper, we describe the method, validate it against soiling station measurements, and compare it with other PV-yield-based soiling estimation methods. The broader application of the SRR method will enable the fleet-scale assessment of soiling loss to facilitate mitigation planning and risk assessment.
NASA Astrophysics Data System (ADS)
Marquis, J. W.; Campbell, J. R.; Oyola, M. I.; Ruston, B. C.; Zhang, J.
2017-12-01
This is part II of a two-part series examining the impacts of aerosol particles on weather forecasts. In this study, the aerosol indirect effects on weather forecasts are explored by examining the temperature and moisture analysis associated with assimilating dust contaminated hyperspectral infrared radiances. The dust induced temperature and moisture biases are quantified for different aerosol vertical distribution and loading scenarios. The overall impacts of dust contamination on temperature and moisture forecasts are quantified over the west coast of Africa, with the assistance of aerosol retrievals from AERONET, MPL, and CALIOP. Finally, methods for improving hyperspectral infrared data assimilation in dust contaminated regions are proposed.
NASA Astrophysics Data System (ADS)
Arendt, A. A.; Houser, P.; Kapnick, S. B.; Kargel, J. S.; Kirschbaum, D.; Kumar, S.; Margulis, S. A.; McDonald, K. C.; Osmanoglu, B.; Painter, T. H.; Raup, B. H.; Rupper, S.; Tsay, S. C.; Velicogna, I.
2017-12-01
The High Mountain Asia Team (HiMAT) is an assembly of 13 research groups funded by NASA to improve understanding of cryospheric and hydrological changes in High Mountain Asia (HMA). Our project goals are to quantify historical and future variability in weather and climate over the HMA, partition the components of the water budget across HMA watersheds, explore physical processes driving changes, and predict couplings and feedbacks between physical and human systems through assessment of hazards and downstream impacts. These objectives are being addressed through analysis of remote sensing datasets combined with modeling and assimilation methods to enable data integration across multiple spatial and temporal scales. Our work to date has focused on developing improved high resolution precipitation, snow cover and snow water equivalence products through a variety of statistical uncertainty analysis, dynamical downscaling and assimilation techniques. These and other high resolution climate products are being used as input and validation for an assembly of land surface and General Circulation Models. To quantify glacier change in the region we have calculated multidecadal mass balances of a subset of HMA glaciers by comparing commercial satellite imagery with earlier elevation datasets. HiMAT is using these tools and datasets to explore the impact of atmospheric aerosols and surface impurities on surface energy exchanges, to determine drivers of glacier and snowpack melt rates, and to improve our capacity to predict future hydrological variability. Outputs from the climate and land surface assessments are being combined with landslide and glacier lake inventories to refine our ability to predict hazards in the region. Economic valuation models are also being used to assess impacts on water resources and hydropower. Field data of atmospheric aerosol, radiative flux and glacier lake conditions are being collected to provide ground validation for models and remote sensing products. In this presentation we will discuss initial results and outline plans for a scheduled release of our datasets and findings to the broader community. We will also describe our methods for cross-team collaboration through the adoption of cloud computing and data integration tools.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lafata, K; Ren, L; Cai, J
2016-06-15
Purpose: To develop a methodology based on digitally-reconstructed-fluoroscopy (DRF) to quantitatively assess target localization accuracy of lung SBRT, and to evaluate it using both a dynamic digital phantom and a patient dataset. Methods: For each treatment field, a 10-phase DRF is generated based on the planning 4DCT. Each frame is pre-processed with a morphological top-hat filter, and corresponding beam apertures are projected to each detector plane. A template-matching algorithm based on cross-correlation is used to detect the tumor location in each frame. Tumor motion relative to the beam aperture is extracted in the superior-inferior direction based on each frame’s impulse response to the template, and the mean tumor position (MTP) is calculated as the average tumor displacement. The DRF template coordinates are then transferred to the corresponding MV-cine dataset, which is retrospectively filtered as above. The treatment MTP is calculated within each field’s projection space, relative to the DRF-defined template. The field’s localization error is defined as the difference between the DRF-derived MTP (planning) and the MV-cine-derived MTP (delivery). A dynamic digital phantom was used to assess the algorithm’s ability to detect intra-fractional changes in patient alignment, by simulating different spatial variations in the MV-cine and calculating the corresponding change in MTP. Inter- and intra-fractional variation, IGRT accuracy, and filtering effects were investigated on a patient dataset. Results: Phantom results demonstrated a high accuracy in detecting both translational and rotational variation. The lowest localization error of the patient dataset was achieved at each fraction’s first field (mean = 0.38 mm), with Fx3 demonstrating a particularly strong correlation between intra-fractional motion-caused localization error and treatment progress. Filtering significantly improved tracking visibility in both the DRF and MV-cine images. Conclusion: We have developed and evaluated a methodology to quantify lung SBRT target localization accuracy based on digitally-reconstructed-fluoroscopy. Our approach may be useful in potentially reducing treatment margins to optimize lung SBRT outcomes. R01-184173.
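The template-matching step described above can be illustrated with normalized cross-correlation; the sketch below uses scikit-image's match_template on placeholder arrays and is not the authors' implementation.

```python
# Locate a template in a single frame via normalized cross-correlation.
import numpy as np
from skimage.feature import match_template

def locate_template(frame, template):
    """Return (row, col) of the best normalized cross-correlation match and its score."""
    ncc = match_template(frame, template)           # correlation map over valid positions
    peak = np.unravel_index(np.argmax(ncc), ncc.shape)
    return peak, ncc[peak]

# Tiny demo: embed a known 16x16 patch and recover its position.
rng = np.random.default_rng(1)
frame = rng.normal(size=(128, 128))
template = frame[40:56, 60:76].copy()
print(locate_template(frame, template))             # peak should come back near (40, 60)
```

Averaging the detected displacement over frames, for planning (DRF) and delivery (MV-cine) images separately, would then give the two mean tumor positions whose difference defines the localization error.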
(Sample) Size Matters: Best Practices for Defining Error in Planktic Foraminiferal Proxy Records
NASA Astrophysics Data System (ADS)
Lowery, C.; Fraass, A. J.
2016-02-01
Paleoceanographic research is a vital tool to extend modern observational datasets and to study the impact of climate events for which there is no modern analog. Foraminifera are one of the most widely used tools for this type of work, both as paleoecological indicators and as carriers for geochemical proxies. However, the use of microfossils as proxies for paleoceanographic conditions brings about a unique set of problems. This is primarily due to the fact that groups of individual foraminifera, which usually live about a month, are used to infer average conditions for time periods ranging from hundreds to tens of thousands of years. Because of this, adequate sample size is very important for generating statistically robust datasets, particularly for stable isotopes. In the early days of stable isotope geochemistry, instrumental limitations required hundreds of individual foraminiferal tests to return a value. This had the fortunate side-effect of smoothing any seasonal to decadal changes within the planktic foram population. With the advent of more sensitive mass spectrometers, smaller sample sizes have now become standard. While this has many advantages, the use of smaller numbers of individuals to generate a data point has lessened the amount of time averaging in the isotopic analysis and decreased precision in paleoceanographic datasets. With fewer individuals per sample, the differences between individual specimens will result in larger variation, and therefore error, and less precise values for each sample. Unfortunately, most (the authors included) do not make a habit of reporting the error associated with their sample size. We have created an open-source model in R to quantify the effect of sample sizes under various realistic and highly modifiable parameters (calcification depth, diagenesis in a subset of the population, improper identification, vital effects, mass, etc.). For example, a sample in which only 1 in 10 specimens is diagenetically altered can be off by >0.3‰ δ18O VPDB, or 1°C. Here, we demonstrate the use of this tool to quantify error in micropaleontological datasets, and suggest best practices for minimizing error when generating stable isotope data with foraminifera.
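The authors' model is written in R; the following is a tiny Python analogue of the same idea, simulating how the scatter of a sample-mean δ18O value shrinks as more individual tests are pooled. The population scatter, altered fraction, and diagenetic offset below are illustrative assumptions, not values from the study.

```python
# Monte Carlo sketch of per-sample isotope error as a function of sample size.
import numpy as np

rng = np.random.default_rng(42)

def simulate_sample_error(n_individuals, n_trials=10_000,
                          pop_sd=0.3, altered_frac=0.1, altered_offset=0.5):
    """Std. dev. of the sample-mean d18O (permil) across repeated virtual picks."""
    means = np.empty(n_trials)
    for i in range(n_trials):
        values = rng.normal(0.0, pop_sd, size=n_individuals)    # inter-individual scatter
        altered = rng.random(n_individuals) < altered_frac       # diagenetically altered tests
        values[altered] += altered_offset
        means[i] = values.mean()
    return means.std()

for n in (1, 5, 10, 30, 100):
    print(n, round(simulate_sample_error(n), 3))
```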
Avulsion research using flume experiments and highly accurate and temporal-rich SfM datasets
NASA Astrophysics Data System (ADS)
Javernick, L.; Bertoldi, W.; Vitti, A.
2017-12-01
SfM's ability to produce high-quality, large-scale digital elevation models (DEMs) of complicated and rapidly evolving systems has made it a valuable technique for low-budget researchers and practitioners. While SfM has provided valuable datasets that capture single-flood event DEMs, there is an increasing scientific need to capture higher temporal resolution datasets that can quantify the evolutionary processes instead of pre- and post-flood snapshots. However, the dangerous field conditions during flood events and image-matching challenges (e.g. wind, rain) prevent quality SfM image acquisition. Conversely, flume experiments offer opportunities to document flood events, but achieving consistent and accurate DEMs to detect subtle changes in dry and inundated areas remains a challenge for SfM (e.g. parabolic error signatures). This research aimed to investigate the impact of naturally occurring and manipulated avulsions on braided river morphology and on the encroachment of floodplain vegetation, using laboratory experiments. This required DEMs with millimeter accuracy and precision, at a temporal resolution sufficient to capture the processes. SfM was chosen as it offered the most practical method. Through redundant local network design and a meticulous ground control point (GCP) survey with a Leica Total Station in red laser configuration (reported 2 mm accuracy), the SfM surfaces, compared against separate ground-truthing data, produced mean errors of 1.5 mm (accuracy) and standard deviations of 1.4 mm (precision) without parabolic error signatures. Lighting conditions in the flume were limited to uniform, oblique, and filtered LED strips, which removed glint and thus improved bed elevation mean errors to 4 mm, and errors were further reduced by means of open-source software for refraction correction. The obtained datasets have provided the ability to quantify how small flood events with avulsion can have similar morphologic and vegetation impacts as large flood events without avulsion. Further, this research highlights the potential application of SfM in the laboratory and its ability to document physical and biological processes at greater spatial and temporal resolution. Marie Sklodowska-Curie Individual Fellowship: River-HMV, 656917
Restricted movement by mottled sculpin (Pisces: Cottidae) in a southern Appalachian stream.
J. Todd Petty; Gary D. Grossman
2004-01-01
1. We used direct observation and mark-recapture techniques to quantify movements by mottled sculpins (Cottus bairdi) in a 1 km segment of Shope Fork in western North Carolina. Our objectives were to: (i) quantify the overall rate of sculpin movement, (ii) assess variation in movement among years, individuals, and sculpin size classes, (iii) relate movement to...
Assareh, Hassan; Achat, Helen M.; Stubbs, Joanne M.; Guevarra, Veth M.; Hill, Kim
2016-01-01
Diagnostic data routinely collected for hospital-admitted patients and used for case-mix adjustment in care provider comparisons and reimbursement are prone to biases. We aim to measure discrepancies, variations and associated factors in recorded chronic morbidities for hospital-admitted patients in New South Wales (NSW), Australia. Of all admissions between July 2010 and June 2014 in all NSW public and private acute hospitals, admissions with stays over 24 hours and one or more of the chronic conditions of diabetes, smoking, hepatitis, HIV, and hypertension were included. The incidence of a non-recorded chronic condition in an admission occurring after the first admission with a recorded chronic condition (index admission) was considered as a discrepancy. Poisson models were employed to (i) derive adjusted discrepancy incidence rates (IR) and rate ratios (IRR) accounting for patient, admission, comorbidity and hospital characteristics and (ii) quantify variation in rates among hospitals. The discrepancy incidence rate was highest for hypertension (51% of 262,664 admissions), followed by hepatitis (37% of 12,107), smoking (33% of 548,965), HIV (27% of 1500) and diabetes (19% of 228,687). Adjusted rates for all conditions declined over the four-year period, with the sharpest drop of over 80% for diabetes (47.7% in 2010 vs. 7.3% in 2014), and 20% to 55% for the other conditions. Discrepancies were more common in private hospitals and smaller public hospitals. Inter-hospital differences were responsible for 1% (HIV) to 9.4% (smoking) of variation in adjusted discrepancy incidences, with an increasing trend for diabetes and HIV. Chronic conditions are recorded inconsistently in hospital administrative datasets, and hospitals contribute to the discrepancies. Accounting for these patterns through stratification in risk adjustment, together with longitudinal accumulation of clinical data at the patient level, refinement of clinical coding systems, and standardisation of comorbidity recording across hospitals, would enhance the accuracy of the datasets and the validity of case-mix adjustment. PMID:26808428
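As a hedged sketch of the kind of Poisson model described (not the study's actual specification), the snippet below fits discrepancy counts with an exposure term and exponentiates the coefficients to obtain adjusted incidence rate ratios. The synthetic DataFrame and covariate names are assumptions.

```python
# Minimal sketch of an adjusted-IRR Poisson regression on synthetic admission data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 500
admissions = pd.DataFrame({
    "discrepancies": rng.poisson(2, n),          # non-recorded-condition events per patient
    "exposure": rng.integers(1, 10, n),          # follow-up admissions at risk
    "hospital_type": rng.choice(["public_large", "public_small", "private"], n),
    "year": rng.choice([2011, 2012, 2013, 2014], n),
})

model = smf.poisson(
    "discrepancies ~ C(hospital_type) + C(year)",
    data=admissions,
    exposure=admissions["exposure"],             # log(exposure) offset
).fit(disp=False)
print(np.exp(model.params))                      # adjusted incidence rate ratios (IRRs)
```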
NASA Astrophysics Data System (ADS)
Beaufort, Aurélien; Lamouroux, Nicolas; Pella, Hervé; Datry, Thibault; Sauquet, Eric
2018-05-01
Headwater streams represent a substantial proportion of river systems and many of them have intermittent flows due to their upstream position in the network. These intermittent rivers and ephemeral streams have recently seen a marked increase in interest, especially to assess the impact of drying on aquatic ecosystems. The objective of this paper is to quantify how discrete (in space and time) field observations of flow intermittence help to extrapolate over time the daily probability of drying (defined at the regional scale). Two empirical models based on linear or logistic regressions have been developed to predict the daily probability of intermittence at the regional scale across France. Explanatory variables were derived from available daily discharge and groundwater-level data of a dense gauging/piezometer network, and models were calibrated using discrete series of field observations of flow intermittence. The robustness of the models was tested using an independent, dense regional dataset of intermittence observations and observations of the year 2017 excluded from the calibration. The resulting models were used to extrapolate the daily regional probability of drying in France: (i) over the period 2011-2017 to identify the regions most affected by flow intermittence; (ii) over the period 1989-2017, using a reduced input dataset, to analyse temporal variability of flow intermittence at the national level. The two empirical regression models performed equally well between 2011 and 2017. The accuracy of predictions depended on the number of continuous gauging/piezometer stations and intermittence observations available to calibrate the regressions. Regions with the highest performance were located in sedimentary plains, where the monitoring network was dense and where the regional probability of drying was the highest. Conversely, the worst performances were obtained in mountainous regions. Finally, temporal projections (1989-2016) suggested the highest probabilities of intermittence (> 35 %) in 1989-1991, 2003 and 2005. A high density of intermittence observations improved the information provided by gauging stations and piezometers to extrapolate the temporal variability of intermittent rivers and ephemeral streams.
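A minimal sketch of the logistic-regression flavour of such a model is given below: it regresses binary dry/flowing field observations on two regional hydrological predictors and returns a daily probability of drying. The predictors, coefficients, and synthetic data are assumptions for illustration, not the calibrated French model.

```python
# Hedged sketch: daily regional probability of drying from a logistic regression.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
X = np.column_stack([
    rng.normal(size=n),      # standardized regional low-flow (discharge) index
    rng.normal(size=n),      # standardized groundwater-level anomaly
])
p_true = 1.0 / (1.0 + np.exp(-(-0.5 - 1.5 * X[:, 0] - 1.0 * X[:, 1])))
dry = rng.random(n) < p_true          # discrete field observations: True = dry reach

clf = LogisticRegression().fit(X, dry)
p_drying = clf.predict_proba(X)[:, 1]  # predicted daily regional probability of drying
print(p_drying[:5].round(2))
```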
The 3D Reference Earth Model: Status and Preliminary Results
NASA Astrophysics Data System (ADS)
Moulik, P.; Lekic, V.; Romanowicz, B. A.
2017-12-01
In the 20th century, seismologists constructed models of how average physical properties (e.g. density, rigidity, compressibility, anisotropy) vary with depth in the Earth's interior. These one-dimensional (1D) reference Earth models (e.g. PREM) have proven indispensable in earthquake location, imaging of interior structure, understanding material properties under extreme conditions, and as a reference in other fields, such as particle physics and astronomy. Over the past three decades, new datasets motivated more sophisticated efforts that yielded models of how properties vary both laterally and with depth in the Earth's interior. Though these three-dimensional (3D) models exhibit compelling similarities at large scales, differences in the methodology, representation of structure, and dataset upon which they are based, have prevented the creation of 3D community reference models. As part of the REM-3D project, we are compiling and reconciling reference seismic datasets of body wave travel-time measurements, fundamental mode and overtone surface wave dispersion measurements, and normal mode frequencies and splitting functions. These reference datasets are being inverted for a long-wavelength, 3D reference Earth model that describes the robust long-wavelength features of mantle heterogeneity. As a community reference model with fully quantified uncertainties and tradeoffs and an associated publically available dataset, REM-3D will facilitate Earth imaging studies, earthquake characterization, inferences on temperature and composition in the deep interior, and be of improved utility to emerging scientific endeavors, such as neutrino geoscience. Here, we summarize progress made in the construction of the reference long period dataset and present a preliminary version of REM-3D in the upper-mantle. In order to determine the level of detail warranted for inclusion in REM-3D, we analyze the spectrum of discrepancies between models inverted with different subsets of the reference dataset. This procedure allows us to evaluate the extent of consistency in imaging heterogeneity at various depths and between spatial scales.
NASA Astrophysics Data System (ADS)
Cammalleri, Carmelo; Vogt, Jürgen V.; Bisselink, Bernard; de Roo, Ad
2017-12-01
Agricultural drought events can affect large regions across the world, implying the need for a suitable global tool for an accurate monitoring of this phenomenon. Soil moisture anomalies are considered a good metric to capture the occurrence of agricultural drought events, and they have become an important component of several operational drought monitoring systems. In the framework of the JRC Global Drought Observatory (GDO, http://edo.jrc.ec.europa.eu/gdo/), the suitability of three datasets as possible representations of root zone soil moisture anomalies has been evaluated: (1) the soil moisture from the Lisflood distributed hydrological model (namely LIS), (2) the remotely sensed Land Surface Temperature data from the MODIS satellite (namely LST), and (3) the ESA Climate Change Initiative combined passive/active microwave skin soil moisture dataset (namely CCI). Because these three datasets are independent, the triple collocation (TC) technique has been applied, aiming at quantifying the likely error associated with each dataset in comparison to the unknown true status of the system. TC analysis was performed on five macro-regions (namely North America, Europe, India, southern Africa and Australia) detected as suitable for the experiment, providing insight into the mutual relationship between these datasets as well as an assessment of the accuracy of each method. Although no definitive statement on the spatial distribution of errors can be made, a clear outcome of the TC analysis is the good performance of the remote sensing datasets, especially CCI, over dry regions such as Australia and southern Africa, whereas the outputs of LIS seem to be more reliable over areas that are well monitored through meteorological ground station networks, such as North America and Europe. In a global drought monitoring system, the results of the error analysis are used to design a weighted-average ensemble system that exploits the advantages of each dataset.
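For readers unfamiliar with triple collocation, the sketch below shows the standard covariance-based estimate of each product's error variance from three collocated series with independent errors; the synthetic series merely stand in for the LIS, LST, and CCI anomalies, and the formulas assume the usual TC error model (independent, zero-mean errors against a common truth).

```python
# Hedged sketch of covariance-based triple collocation (TC).
import numpy as np

def triple_collocation_errvar(x, y, z):
    """Error variances of x, y, z under the standard TC assumptions."""
    c = np.cov(np.vstack([x, y, z]))
    ex = c[0, 0] - c[0, 1] * c[0, 2] / c[1, 2]
    ey = c[1, 1] - c[0, 1] * c[1, 2] / c[0, 2]
    ez = c[2, 2] - c[0, 2] * c[1, 2] / c[0, 1]
    return ex, ey, ez

rng = np.random.default_rng(0)
truth = rng.normal(size=5000)                       # unknown "true" anomaly
x = truth + rng.normal(scale=0.2, size=5000)        # three independent products
y = truth + rng.normal(scale=0.4, size=5000)
z = truth + rng.normal(scale=0.6, size=5000)
print([round(v, 2) for v in triple_collocation_errvar(x, y, z)])   # ~0.04, 0.16, 0.36
```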
NASA Astrophysics Data System (ADS)
Murakami, H.; Chen, X.; Hahn, M. S.; Over, M. W.; Rockhold, M. L.; Vermeul, V.; Hammond, G. E.; Zachara, J. M.; Rubin, Y.
2010-12-01
Subsurface characterization for predicting groundwater flow and contaminant transport requires us to integrate large and diverse datasets in a consistent manner, and quantify the associated uncertainty. In this study, we sequentially assimilated multiple types of datasets for characterizing a three-dimensional heterogeneous hydraulic conductivity field at the Hanford 300 Area. The datasets included constant-rate injection tests, electromagnetic borehole flowmeter tests, lithology profiles, and tracer tests. We used the method of anchored distributions (MAD), which is a modular-structured Bayesian geostatistical inversion method. MAD has two major advantages over other inversion methods. First, it can directly infer a joint distribution of parameters, which can be used as an input in stochastic simulations for prediction. In MAD, in addition to typical geostatistical structural parameters, the parameter vector includes multiple point values of the heterogeneous field, called anchors, which capture local trends and reduce uncertainty in the prediction. Second, MAD allows us to integrate the datasets sequentially in a Bayesian framework such that the posterior distribution is updated as each new dataset is included. The sequential assimilation can decrease computational burden significantly. We applied MAD to assimilate different combinations of the datasets, and then compared the inversion results. For the injection and tracer test assimilation, we calculated temporal moments of pressure build-up and breakthrough curves, respectively, to reduce the data dimension. The massively parallel flow and transport code PFLOTRAN was used to simulate the tracer test. For comparison, we used different metrics based on the breakthrough curves not used in the inversion, such as mean arrival time, peak concentration and early arrival time. This comparison is intended to yield the combined data worth, i.e., which combination of datasets is most effective for a given metric, which will be useful for guiding further characterization efforts at the site as well as future characterization projects at other sites.
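The data-reduction step mentioned above (collapsing pressure build-up and breakthrough curves to temporal moments) can be illustrated with the toy calculation below, which reduces a synthetic breakthrough curve to its zeroth moment and mean arrival time; the curve itself is invented.

```python
# Toy sketch: temporal moments of a synthetic tracer breakthrough curve C(t).
import numpy as np

dt = 0.2                                            # hours between samples
t = np.arange(0.0, 100.0, dt)
c = np.exp(-(t - 40.0) ** 2 / (2 * 8.0 ** 2))       # synthetic breakthrough curve

m0 = (c * dt).sum()                                 # zeroth temporal moment (tracer mass proxy)
mean_arrival = (t * c * dt).sum() / m0              # first normalized moment = mean arrival time
print(round(mean_arrival, 1))                       # ~40.0 h
```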
Emissions due to the natural gas storage well-casing blowout at Aliso Canyon/SS-25
NASA Astrophysics Data System (ADS)
Herndon, Scott; Daube, Conner; Jervis, Dylan; Yacovitch, Tara; Roscioli, Joseph; Curry, Jason; Nelson, David; Knighton, Berk
2017-04-01
The pronounced increase in unconventional gas production in North America over the last fifteen years has intensified interest in understanding emissions and leaks in the supply chain from well pad to end use. Los Angeles, California is home to 19 million consumers of natural gas in both industry and domestic end use. The well blowout at Aliso Canyon Natural Gas Storage Facility in the greater Los Angeles area was quantified using the tracer flux ratio method (TFR). Over 400 tracer plume transects were collected, each lasting 15-300 seconds, using instrumentation aboard a mobile platform on 25 days between December 21, 2015 and March 9, 2016. The leak rate from October 23rd to February 11th has been estimated here using a combination of this work (TFR) and the flight mass balance (FMB) data [Conley et al., 2016]. This estimate relies on the TFR data as the most specific SS-25 emission dataset. Scaling the FMB dataset, the leak rate is projected from Oct 23rd to December 21st. Summing the inferred and measured emissions suggests a total leak burden of 86,022 ± 8,393 metric tons of methane. This work quantified the emissions during the "bottom kill" procedure which halted the primary emission leak. The ethane to methane enhancement ratio observed downwind of the leak site is consistent with the content of ethane in the natural gas at this site and provides definitive evidence that the methane emission rate quantified via tracer flux ratio is not due to a nearby landfill or other potential biogenic sources. Additionally, the TFR approach employed here assesses only the leaks due to the SS-25 well blowout and excludes other possible emissions at the facility.
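As a rough sketch of the tracer flux ratio idea (not the study's processing chain), the snippet below scales a known tracer release rate by the ratio of plume-integrated methane and tracer mole-fraction enhancements for a single transect, converting by molar mass. The tracer molar mass, release rate, and enhancement profiles are all assumed for illustration.

```python
# Hedged sketch of a single-transect tracer flux ratio (TFR) emission estimate.
import numpy as np

def tfr_emission_rate(dch4_ppb, dtracer_ppb, tracer_release_kg_h,
                      mw_ch4=16.04, mw_tracer=146.0):   # tracer molar mass is an assumption
    """Methane emission rate (kg/h) from one plume transect."""
    # Integrated enhancement ratio across the transect (uniform sampling interval cancels).
    ratio = dch4_ppb.sum() / dtracer_ppb.sum()
    return tracer_release_kg_h * ratio * (mw_ch4 / mw_tracer)

dch4 = np.array([0, 40, 180, 90, 10], float)        # ppb enhancements across the plume
dtracer = np.array([0, 0.5, 2.1, 1.0, 0.1], float)  # ppb enhancements of the released tracer
print(round(tfr_emission_rate(dch4, dtracer, tracer_release_kg_h=2.0), 1))
```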
Saeed, Mohammad
2017-05-01
Systemic lupus erythematosus (SLE) is a complex disorder. Genetic association studies of complex disorders suffer from the following three major issues: phenotypic heterogeneity, false positive (type I error), and false negative (type II error) results. Hence, genes with low to moderate effects are missed in standard analyses, especially after statistical corrections. OASIS is a novel linkage disequilibrium clustering algorithm that can potentially address false positives and negatives in genome-wide association studies (GWAS) of complex disorders such as SLE. OASIS was applied to two SLE dbGAP GWAS datasets (6077 subjects; ∼0.75 million single-nucleotide polymorphisms). OASIS identified three known SLE genes viz. IFIH1, TNIP1, and CD44, not previously reported using these GWAS datasets. In addition, 22 novel loci for SLE were identified and the 5 SLE genes previously reported using these datasets were verified. OASIS methodology was validated using single-variant replication and gene-based analysis with GATES. This led to the verification of 60% of OASIS loci. New SLE genes that OASIS identified and were further verified include TNFAIP6, DNAJB3, TTF1, GRIN2B, MON2, LATS2, SNX6, RBFOX1, NCOA3, and CHAF1B. This study presents the OASIS algorithm, software, and the meta-analyses of two publicly available SLE GWAS datasets along with the novel SLE genes. Hence, OASIS is a novel linkage disequilibrium clustering method that can be universally applied to existing GWAS datasets for the identification of new genes.
Soil Organic Carbon Degradation during Incubation, Barrow, Alaska, 2012
Elizabeth Herndon; Ziming Yang; Baohua Gu
2017-01-05
This dataset provides information about soil organic carbon decomposition in Barrow soil incubation studies. The soil cores were collected from low-center polygon (Area A) and were incubated in the laboratory at different temperatures for up to 60 days. Transformations of soil organic carbon were characterized by UV and FT-IR, and small organic acids in water-soluble carbons were quantified by ion chromatography during the incubation (Herndon et al., 2015).
Bayesian Hierarchical Models to Augment the Mediterranean Forecast System
2010-09-30
In part 2 (Bonazzi et al., 2010), the impact of the ensemble forecast methodology based on MFS-Wind-BHM perturbations is documented. Forecast...absence of dt data stage inputs, the forecast impact of MFS-Error-BHM is neutral. Experiments are underway now to introduce dt back into the MFS-Error...BHM and quantify forecast impacts at MFS. MFS-SuperEnsemble-BHM We have assembled all needed datasets and completed algorithmic development
Temporal Coherence: A Model for Non-Stationarity in Natural and Simulated Wind Records
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rinker, Jennifer M.; Gavin, Henri P.; Clifton, Andrew
We present a novel methodology for characterizing and simulating non-stationary stochastic wind records. In this new method, non-stationarity is characterized and modelled via temporal coherence, which is quantified in the discrete frequency domain by probability distributions of the differences in phase between adjacent Fourier components. Temporal coherence can also be used to quantify non-stationary characteristics in wind data. Three case studies are presented that analyze the non-stationarity of turbulent wind data obtained at the National Wind Technology Center near Boulder, Colorado, USA. The first study compares the temporal and spectral characteristics of a stationary wind record and a non-stationary wind record in order to highlight their differences in temporal coherence. The second study examines the distribution of one of the proposed temporal coherence parameters and uses it to quantify the prevalence of non-stationarity in the dataset. The third study examines how temporal coherence varies with a range of atmospheric parameters to determine what conditions produce more non-stationarity.
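A minimal sketch of the temporal-coherence idea follows: take the Fourier transform of a wind record and look at the distribution of phase differences between adjacent Fourier components. A stationary record gives roughly uniform phase differences, while non-stationarity concentrates them. The synthetic record, sampling rate, and gust shape are assumptions for illustration.

```python
# Hedged sketch: adjacent-component phase differences of a wind record.
import numpy as np

rng = np.random.default_rng(0)
fs, n = 20.0, 4096                       # sampling rate (Hz) and record length (samples)
t = np.arange(n) / fs
u = rng.normal(size=n) + 2.0 * np.exp(-(t - 100.0) ** 2 / 50.0)   # gust-like burst at ~100 s

phase = np.angle(np.fft.rfft(u - u.mean()))
dphi = np.angle(np.exp(1j * np.diff(phase)))    # phase differences wrapped to (-pi, pi]
print(round(np.std(dphi), 2))                   # spread of adjacent-component phase differences
```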
Analysis of empty ATLAS pilot jobs
NASA Astrophysics Data System (ADS)
Love, P. A.; Alef, M.; Dal Pra, S.; Di Girolamo, A.; Forti, A.; Templon, J.; Vamvakopoulos, E.; ATLAS Collaboration
2017-10-01
In this analysis, we quantify the wallclock time used by short empty pilot jobs on a number of WLCG compute resources. Pilot factory logs and site batch logs are used to provide independent accounts of the usage. Results show a wide variation of wallclock time used by short jobs depending on the site and queue, and changing with time. For a reference dataset of all jobs in August 2016, the fraction of wallclock time used by empty jobs per studied site ranged from 0.1% to 0.8%. Aside from the wall time used by empty pilots, we also looked at how many pilots were empty as a fraction of all pilots sent. When the August dataset was binned into days, empty fractions between 2% and 90% were observed. The higher fractions correlate well with periods of few actual payloads being sent to the site.
Barlow, Andrew L; Macleod, Alasdair; Noppen, Samuel; Sanderson, Jeremy; Guérin, Christopher J
2010-12-01
One of the most routine uses of fluorescence microscopy is colocalization, i.e., the demonstration of a relationship between pairs of biological molecules. Frequently this is presented simplistically by the use of overlays of red and green images, with areas of yellow indicating colocalization of the molecules. Colocalization data are rarely quantified and can be misleading. Our results from both synthetic and biological datasets demonstrate that the generation of Pearson's correlation coefficient between pairs of images can overestimate positive correlation and fail to demonstrate negative correlation. We have demonstrated that the calculation of a thresholded Pearson's correlation coefficient using only intensity values over a determined threshold in both channels produces numerical values that more accurately describe both synthetic datasets and biological examples. Its use will bring clarity and accuracy to colocalization studies using fluorescent microscopy.
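A hedged sketch of a thresholded Pearson's correlation coefficient follows: the correlation is computed only over pixels whose intensities exceed a threshold in both channels, which avoids the background-driven inflation described above. The percentile thresholds and synthetic images are assumptions; the paper's threshold-selection procedure is not reproduced.

```python
# Minimal sketch: Pearson correlation restricted to above-threshold pixels in both channels.
import numpy as np

def thresholded_pearson(red, green, q=75):
    r_t, g_t = np.percentile(red, q), np.percentile(green, q)
    mask = (red > r_t) & (green > g_t)
    return np.corrcoef(red[mask], green[mask])[0, 1]

rng = np.random.default_rng(0)
signal = rng.random((256, 256))                      # shared "colocalized" structure
red = signal + 0.05 * rng.random((256, 256))
green = signal + 0.05 * rng.random((256, 256))
print(round(thresholded_pearson(red, green), 2))
```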
A large dataset of protein dynamics in the mammalian heart proteome.
Lau, Edward; Cao, Quan; Ng, Dominic C M; Bleakley, Brian J; Dincer, T Umut; Bot, Brian M; Wang, Ding; Liem, David A; Lam, Maggie P Y; Ge, Junbo; Ping, Peipei
2016-03-15
Protein stability is a major regulatory principle of protein function and cellular homeostasis. Despite limited understanding on mechanisms, disruption of protein turnover is widely implicated in diverse pathologies from heart failure to neurodegenerations. Information on global protein dynamics therefore has the potential to expand the depth and scope of disease phenotyping and therapeutic strategies. Using an integrated platform of metabolic labeling, high-resolution mass spectrometry and computational analysis, we report here a comprehensive dataset of the in vivo half-life of 3,228 and the expression of 8,064 cardiac proteins, quantified under healthy and hypertrophic conditions across six mouse genetic strains commonly employed in biomedical research. We anticipate these data will aid in understanding key mitochondrial and metabolic pathways in heart diseases, and further serve as a reference for methodology development in dynamics studies in multiple organ systems.
Genetic improvement of U.S. soybean in Maturity Groups II, III, and IV
USDA-ARS?s Scientific Manuscript database
Soybean [Glycine max (L.) Merr.] improvement via plant breeding has been critical for the success of the crop. The objective of this study was to quantify genetic change in yield and other traits that occurred over the past 80 years of North American soybean breeding in maturity groups (MGs) II, III...
An evaluation of the toxicogenomic data set for dibutyl phthalate (DBP) and male reproductive developmental effects was performed as part of a larger case study to test an approach for incorporating genomic data in risk assessment. The DBP toxicogenomic data set is composed of ni...
University President Compensation: Evidence from the United States
ERIC Educational Resources Information Center
Bai, Ge
2014-01-01
I examine whether compensation of the university president is a function of university type (i.e., top, research, master's, bachelor's/specialized). Using a panel dataset containing 761 private universities in the United States, I find that (i) the president's pay is linked to the university's performance in the previous period and (ii) the…
The Surface Brightness Contribution of II Peg: A Comparison of TiO Band Analysis and Doppler Imaging
NASA Astrophysics Data System (ADS)
Senavci, H. V.; O'Neal, D.; Hussain, G. A. J.; Barnes, J. R.
2015-01-01
We investigate the surface brightness contribution of the well-known active SB1 binary II Pegasi to determine the starspot filling factor and spot temperature parameters. In this context, we analyze 54 spectra of the system taken over 6 nights in September-October 1996, using the 2.1m Otto Struve Telescope equipped with SES at the McDonald Observatory. We measure the spot temperatures and spot filling factors by fitting TiO molecular bands in this spectroscopic dataset, using both model atmosphere approximations from ATLAS9 and proxy stars observed with the same instrument. The same dataset is then used to produce surface spot maps using the Doppler imaging technique. We compare the spot filling factors obtained with the two independent techniques in order to better characterise the spot properties of the system and to better assess the limitations inherent to both techniques. The results obtained from both techniques show that the variation of spot filling factor as a function of phase agrees well between the two methods, while the amount of TiO and DI spot
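As a toy illustration of the filling-factor concept (not the paper's TiO fitting machinery), the sketch below models an observed spectrum as a linear mix of a cool spot proxy and an unspotted photosphere proxy and solves for the filling factor by least squares; the spectra and the true factor are invented.

```python
# Toy sketch: recover a spot filling factor from a two-component spectral mix.
import numpy as np

phot = np.array([1.00, 0.98, 0.96, 0.97, 0.99])   # unspotted proxy (normalized flux)
spot = np.array([0.80, 0.55, 0.50, 0.60, 0.78])   # cool-star proxy with a deep TiO band
obs = 0.3 * spot + 0.7 * phot                     # synthetic "observed" spectrum

# Solve obs ~= f*spot + (1-f)*phot  =>  (obs - phot) ~= f*(spot - phot)
f = np.dot(obs - phot, spot - phot) / np.dot(spot - phot, spot - phot)
print(round(f, 2))                                # recovered filling factor ~0.3
```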
NASA Astrophysics Data System (ADS)
Zhao, Yinan; Ge, Jian; Yuan, Xiaoyong; Li, Xiaolin; Zhao, Tiffany; Wang, Cindy
2018-01-01
Metal absorption line systems in distant quasar spectra have been used as one of the most powerful tools to probe gas content in the early Universe. The MgII λλ 2796, 2803 doublet is one of the most popular metal absorption lines and has been used to trace gas and global star formation at redshifts between ~0.5 and 2.5. In the past, machine learning algorithms such as Principal Component Analysis, Gaussian Processes, and decision trees have been used to detect absorption line systems in large sky surveys, but the overall detection process is not only complicated but also time consuming. It usually takes a few months to go through the entire quasar spectral dataset from each Sloan Digital Sky Survey (SDSS) data release. In this work, we applied deep neural network, or “deep learning”, algorithms to the most recent SDSS DR14 quasar spectra and were able to randomly search 20000 quasar spectra and detect 2887 strong Mg II absorption features in just 9 seconds. Our detection algorithms were verified against previously released DR12 and DR7 data and published Mg II catalogs, and the detection accuracy is 90%. This is the first time that a deep neural network has demonstrated its promising power in both speed and accuracy in replacing tedious, repetitive human work in searching for narrow absorption patterns in a big dataset. We will present our detection algorithms as well as statistical results of the newly detected Mg II absorption lines.
Earth System Grid II (ESG): Turning Climate Model Datasets Into Community Resources
NASA Astrophysics Data System (ADS)
Williams, D.; Middleton, D.; Foster, I.; Nevedova, V.; Kesselman, C.; Chervenak, A.; Bharathi, S.; Drach, B.; Cinquni, L.; Brown, D.; Strand, G.; Fox, P.; Garcia, J.; Bernholdte, D.; Chanchio, K.; Pouchard, L.; Chen, M.; Shoshani, A.; Sim, A.
2003-12-01
High-resolution, long-duration simulations performed with advanced DOE SciDAC/NCAR climate models will produce tens of petabytes of output. To be useful, this output must be made available to global change impacts researchers nationwide, both at national laboratories and at universities, other research laboratories, and other institutions. To this end, we propose to create a new Earth System Grid, ESG-II - a virtual collaborative environment that links distributed centers, users, models, and data. ESG-II will provide scientists with virtual proximity to the distributed data and resources that they require to perform their research. The creation of this environment will significantly increase the scientific productivity of U.S. climate researchers by turning climate datasets into community resources. In creating ESG-II, we will integrate and extend a range of Grid and collaboratory technologies, including the DODS remote access protocols for environmental data, Globus Toolkit technologies for authentication, resource discovery, and resource access, and Data Grid technologies developed in other projects. We will develop new technologies for (1) creating and operating "filtering servers" capable of performing sophisticated analyses, and (2) delivering results to users. In so doing, we will simultaneously contribute to climate science and advance the state of the art in collaboratory technology. We expect our results to be useful to numerous other DOE projects. The three-year R&D program will be undertaken by a talented and experienced team of computer scientists at five laboratories (ANL, LBNL, LLNL, NCAR, ORNL) and one university (ISI), working in close collaboration with climate scientists at several sites.
Functional evaluation of out-of-the-box text-mining tools for data-mining tasks.
Jung, Kenneth; LePendu, Paea; Iyer, Srinivasan; Bauer-Mehren, Anna; Percha, Bethany; Shah, Nigam H
2015-01-01
The trade-off between the speed and simplicity of dictionary-based term recognition and the richer linguistic information provided by more advanced natural language processing (NLP) is an area of active discussion in clinical informatics. In this paper, we quantify this trade-off among text processing systems that make different trade-offs between speed and linguistic understanding. We tested both types of systems in three clinical research tasks: phase IV safety profiling of a drug, learning adverse drug-drug interactions, and learning used-to-treat relationships between drugs and indications. We first benchmarked the accuracy of the NCBO Annotator and REVEAL in a manually annotated, publicly available dataset from the 2008 i2b2 Obesity Challenge. We then applied the NCBO Annotator and REVEAL to 9 million clinical notes from the Stanford Translational Research Integrated Database Environment (STRIDE) and used the resulting data for three research tasks. There is no significant difference between using the NCBO Annotator and REVEAL in the results of the three research tasks when using large datasets. In one subtask, REVEAL achieved higher sensitivity with smaller datasets. For a variety of tasks, employing simple term recognition methods instead of advanced NLP methods results in little or no impact on accuracy when using large datasets. Simpler dictionary-based methods have the advantage of scaling well to very large datasets. Promoting the use of simple, dictionary-based methods for population level analyses can advance adoption of NLP in practice. © The Author 2014. Published by Oxford University Press on behalf of the American Medical Informatics Association.
Rantalainen, Timo; Chivers, Paola; Beck, Belinda R; Robertson, Sam; Hart, Nicolas H; Nimphius, Sophia; Weeks, Benjamin K; McIntyre, Fleur; Hands, Beth; Siafarikas, Aris
Most imaging methods, including peripheral quantitative computed tomography (pQCT), are susceptible to motion artifacts, particularly in fidgety pediatric populations. Methods currently used to address motion artifact include manual screening (visual inspection) and objective assessments of the scans. However, previously reported objective methods either cannot be applied to the reconstructed image or have not been tested for distal bone sites. Therefore, the purpose of the present study was to develop and validate motion artifact classifiers to quantify motion artifact in pQCT scans. We tested whether textural features could provide adequate motion artifact classification performance in two adolescent datasets with pQCT scans of the tibial and radial diaphyses and epiphyses. The first dataset was split into training (66% of sample) and validation (33% of sample) datasets. Visual classification was used as the ground truth. Moderate to substantial classification performance (J48 classifier, kappa coefficients from 0.57 to 0.80) was observed in the validation dataset with the novel texture-based classifier. In applying the same classifier to the second cross-sectional dataset, a slight-to-fair (κ = 0.01-0.39) classification performance was observed. Overall, this novel textural analysis-based classifier provided a moderate-to-substantial classification of motion artifact when the classifier was specifically trained for the measurement device and population. Classification based on textural features may be used to prescreen obviously acceptable and unacceptable scans, with a subsequent human-operated visual classification of any remaining scans. Copyright © 2017 The International Society for Clinical Densitometry. Published by Elsevier Inc. All rights reserved.
Yang, Jie; McArdle, Conor; Daniels, Stephen
2014-01-01
A new data dimension-reduction method, called Internal Information Redundancy Reduction (IIRR), is proposed for application to Optical Emission Spectroscopy (OES) datasets obtained from industrial plasma processes. For example in a semiconductor manufacturing environment, real-time spectral emission data is potentially very useful for inferring information about critical process parameters such as wafer etch rates, however, the relationship between the spectral sensor data gathered over the duration of an etching process step and the target process output parameters is complex. OES sensor data has high dimensionality (fine wavelength resolution is required in spectral emission measurements in order to capture data on all chemical species involved in plasma reactions) and full spectrum samples are taken at frequent time points, so that dynamic process changes can be captured. To maximise the utility of the gathered dataset, it is essential that information redundancy is minimised, but with the important requirement that the resulting reduced dataset remains in a form that is amenable to direct interpretation of the physical process. To meet this requirement and to achieve a high reduction in dimension with little information loss, the IIRR method proposed in this paper operates directly in the original variable space, identifying peak wavelength emissions and the correlative relationships between them. A new statistic, Mean Determination Ratio (MDR), is proposed to quantify the information loss after dimension reduction and the effectiveness of IIRR is demonstrated using an actual semiconductor manufacturing dataset. As an example of the application of IIRR in process monitoring/control, we also show how etch rates can be accurately predicted from IIRR dimension-reduced spectral data. PMID:24451453
Wainwright, Haruko M; Seki, Akiyuki; Chen, Jinsong; Saito, Kimiaki
2017-02-01
This paper presents a multiscale data integration method to estimate the spatial distribution of air dose rates in the regional scale around the Fukushima Daiichi Nuclear Power Plant. We integrate various types of datasets, such as ground-based walk and car surveys, and airborne surveys, all of which have different scales, resolutions, spatial coverage, and accuracy. This method is based on geostatistics to represent spatial heterogeneous structures, and also on Bayesian hierarchical models to integrate multiscale, multi-type datasets in a consistent manner. The Bayesian method allows us to quantify the uncertainty in the estimates, and to provide the confidence intervals that are critical for robust decision-making. Although this approach is primarily data-driven, it has great flexibility to include mechanistic models for representing radiation transport or other complex correlations. We demonstrate our approach using three types of datasets collected at the same time over Fukushima City in Japan: (1) coarse-resolution airborne surveys covering the entire area, (2) car surveys along major roads, and (3) walk surveys in multiple neighborhoods. Results show that the method can successfully integrate three types of datasets and create an integrated map (including the confidence intervals) of air dose rates over the domain in high resolution. Moreover, this study provides us with various insights into the characteristics of each dataset, as well as radiocaesium distribution. In particular, the urban areas show high heterogeneity in the contaminant distribution due to human activities as well as large discrepancy among different surveys due to such heterogeneity. Copyright © 2016 Elsevier Ltd. All rights reserved.
Robust Statistical Fusion of Image Labels
Landman, Bennett A.; Asman, Andrew J.; Scoggins, Andrew G.; Bogovic, John A.; Xing, Fangxu; Prince, Jerry L.
2011-01-01
Image labeling and parcellation (i.e. assigning structure to a collection of voxels) are critical tasks for the assessment of volumetric and morphometric features in medical imaging data. The process of image labeling is inherently error prone as images are corrupted by noise and artifacts. Even expert interpretations are subject to subjectivity and the precision of the individual raters. Hence, all labels must be considered imperfect with some degree of inherent variability. One may seek multiple independent assessments to both reduce this variability and quantify the degree of uncertainty. Existing techniques have exploited maximum a posteriori statistics to combine data from multiple raters and simultaneously estimate rater reliabilities. Although quite successful, wide-scale application has been hampered by unstable estimation with practical datasets, for example, with label sets with small or thin objects to be labeled or with partial or limited datasets. As well, these approaches have required each rater to generate a complete dataset, which is often impossible given both human foibles and the typical turnover rate of raters in a research or clinical environment. Herein, we propose a robust approach to improve estimation performance with small anatomical structures, allow for missing data, account for repeated label sets, and utilize training/catch trial data. With this approach, numerous raters can label small, overlapping portions of a large dataset, and rater heterogeneity can be robustly controlled while simultaneously estimating a single, reliable label set and characterizing uncertainty. The proposed approach enables many individuals to collaborate in the construction of large datasets for labeling tasks (e.g., human parallel processing) and reduces the otherwise detrimental impact of rater unavailability. PMID:22010145
EBprot: Statistical analysis of labeling-based quantitative proteomics data.
Koh, Hiromi W L; Swa, Hannah L F; Fermin, Damian; Ler, Siok Ghee; Gunaratne, Jayantha; Choi, Hyungwon
2015-08-01
Labeling-based proteomics is a powerful method for detection of differentially expressed proteins (DEPs). The current data analysis platform typically relies on protein-level ratios, which is obtained by summarizing peptide-level ratios for each protein. In shotgun proteomics, however, some proteins are quantified with more peptides than others, and this reproducibility information is not incorporated into the differential expression (DE) analysis. Here, we propose a novel probabilistic framework EBprot that directly models the peptide-protein hierarchy and rewards the proteins with reproducible evidence of DE over multiple peptides. To evaluate its performance with known DE states, we conducted a simulation study to show that the peptide-level analysis of EBprot provides better receiver-operating characteristic and more accurate estimation of the false discovery rates than the methods based on protein-level ratios. We also demonstrate superior classification performance of peptide-level EBprot analysis in a spike-in dataset. To illustrate the wide applicability of EBprot in different experimental designs, we applied EBprot to a dataset for lung cancer subtype analysis with biological replicates and another dataset for time course phosphoproteome analysis of EGF-stimulated HeLa cells with multiplexed labeling. Through these examples, we show that the peptide-level analysis of EBprot is a robust alternative to the existing statistical methods for the DE analysis of labeling-based quantitative datasets. The software suite is freely available on the Sourceforge website http://ebprot.sourceforge.net/. All MS data have been deposited in the ProteomeXchange with identifier PXD001426 (http://proteomecentral.proteomexchange.org/dataset/PXD001426/). © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Enhancing Conservation with High Resolution Productivity Datasets for the Conterminous United States
NASA Astrophysics Data System (ADS)
Robinson, Nathaniel Paul
Human driven alteration of the earth's terrestrial surface is accelerating through land use changes, intensification of human activity, climate change, and other anthropogenic pressures. These changes occur at broad spatio-temporal scales, challenging our ability to effectively monitor and assess the impacts and subsequent conservation strategies. While satellite remote sensing (SRS) products enable monitoring of the earth's terrestrial surface continuously across space and time, the practical applications for conservation and management of these products are limited. Often the processes driving ecological change occur at fine spatial resolutions and are undetectable given the resolution of available datasets. Additionally, the links between SRS data and ecologically meaningful metrics are weak. Recent advances in cloud computing technology along with the growing record of high resolution SRS data enable the development of SRS products that quantify ecologically meaningful variables at relevant scales applicable for conservation and management. The focus of my dissertation is to improve the applicability of terrestrial gross and net primary productivity (GPP/NPP) datasets for the conterminous United States (CONUS). In chapter one, I develop a framework for creating high resolution datasets of vegetation dynamics. I use the entire archive of Landsat 5, 7, and 8 surface reflectance data and a novel gap filling approach to create spatially continuous 30 m, 16-day composites of the normalized difference vegetation index (NDVI) from 1986 to 2016. In chapter two, I integrate this with other high resolution datasets and the MOD17 algorithm to create the first high resolution GPP and NPP datasets for CONUS. I demonstrate the applicability of these products for conservation and management, showing the improvements beyond currently available products. In chapter three, I utilize this dataset to evaluate the relationships between land ownership and terrestrial production across the CONUS domain. The main results of this work are three publicly available datasets: 1) 30 m Landsat NDVI; 2) 250 m MODIS based GPP and NPP; and 3) 30 m Landsat based GPP and NPP. My goal is that these products prove useful for the wider scientific, conservation, and land management communities as we continue to strive for better conservation and management practices.
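As a toy illustration of the kind of time-series processing involved in building continuous 16-day NDVI composites (not the dissertation's novel gap-filling algorithm), the sketch below computes NDVI from red and NIR reflectance for a single pixel and fills cloud-masked gaps by linear interpolation along the time axis; the reflectance values are invented.

```python
# Toy sketch: NDVI for one pixel's 16-day time series, with simple temporal gap filling.
import numpy as np

red = np.array([0.06, 0.05, np.nan, 0.04, 0.05, np.nan, 0.07])   # surface reflectance
nir = np.array([0.30, 0.35, np.nan, 0.45, 0.50, np.nan, 0.40])   # NaN = cloud/snow mask

ndvi = (nir - red) / (nir + red)
t = np.arange(ndvi.size)
good = ~np.isnan(ndvi)
filled = np.interp(t, t[good], ndvi[good])   # gap-filled 16-day NDVI series
print(filled.round(2))
```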
NASA Astrophysics Data System (ADS)
Merchant, C. J.; Hulley, G. C.
2013-12-01
There are many datasets describing the evolution of global sea surface temperature (SST) over recent decades -- so why make another one? Answer: to provide observations of SST that have particular qualities relevant to climate applications: independence, accuracy and stability. This has been done within the European Space Agency (ESA) Climate Change Initiative (CCI) project on SST. Independence refers to the fact that the new SST CCI dataset is not derived from or tuned to in situ observations. This matters for climate because the in situ observing network used to assess marine climate change (1) was not designed to monitor small changes over decadal timescales, and (2) has evolved significantly in its technology and mix of types of observation, even during the past 40 years. The potential for significant artefacts in our picture of global ocean surface warming is clear. Only by having an independent record can we confirm (or refute) that the work done to remove biases/trend artefacts in in-situ datasets has been successful. Accuracy is the degree to which SSTs are unbiased. For climate applications, a common accuracy target is 0.1 K for all regions of the ocean. Stability is the degree to which the bias, if any, in a dataset is constant over time. Long-term instability introduces trend artefacts. To observe trends of the magnitude of 'global warming', SST datasets need to be stable to <5 mK/year. The SST CCI project has produced a satellite-based dataset that addresses these characteristics relevant to climate applications. Satellite radiances (brightness temperatures) have been harmonised exploiting periods of overlapping observations between sensors. Less well-characterised sensors have had their calibration tuned to that of better characterised sensors (at radiance level). Non-conventional retrieval methods (optimal estimation) have been employed to reduce regional biases to the 0.1 K level, a target violated in most satellite SST datasets. Models for quantifying uncertainty have been developed to attach uncertainty to SST across a range of space-time scales. The stability of the data has been validated.
Data Recommender: An Alternative Way to Discover Open Scientific Datasets
NASA Astrophysics Data System (ADS)
Klump, J. F.; Devaraju, A.; Williams, G.; Hogan, D.; Davy, R.; Page, J.; Singh, D.; Peterson, N.
2017-12-01
Over the past few years, institutions and government agencies have adopted policies to openly release their data, which has resulted in huge amounts of open data becoming available on the web. When trying to discover the data, users face two challenges: an overload of choice and the limitations of the existing data search tools. On the one hand, there are too many datasets to choose from, and therefore, users need to spend considerable effort to find the datasets most relevant to their research. On the other hand, data portals commonly offer keyword and faceted search, which depend fully on the user queries to search and rank relevant datasets. Consequently, keyword and faceted search may return loosely related or irrelevant results, even when the results contain the query terms. They may also return highly specific results that depend more on how well metadata was authored. They do not account well for variance in metadata due to variance in author styles and preferences. The top-ranked results may also come from the same data collection, and users are unlikely to discover new and interesting datasets. These search modes mainly suit users who can express their information needs in terms of the structure and terminology of the data portals, but may pose a challenge otherwise. These challenges show that we need a solution that delivers the most relevant (i.e., similar and serendipitous) datasets to users, beyond the existing search functionalities on the portals. A recommender system is an information filtering system that presents users with relevant and interesting content based on users' context and preferences. Delivering data recommendations to users can make data discovery easier, and as a result may enhance user engagement with the portal. We developed a hybrid data recommendation approach for the CSIRO Data Access Portal. The approach leverages existing recommendation techniques (e.g., content-based filtering and item co-occurrence) to produce similar and serendipitous data recommendations. It measures the relevance between datasets based on their properties, and search and download patterns. We evaluated the recommendation approach in a user study, and the obtained user judgments revealed the ability of the approach to accurately quantify the relevance of the datasets.
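An illustrative sketch (not the portal's implementation) of the content-based part of such a hybrid recommender is shown below: dataset metadata is vectorized with TF-IDF and other datasets are ranked by cosine similarity; a co-occurrence score from search/download logs could then be blended in as a weighted sum. The metadata strings are made up.

```python
# Hedged sketch: content-based dataset similarity for recommendations.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

metadata = [
    "soil moisture anomalies drought monitoring",      # dataset 0
    "root zone soil moisture satellite microwave",     # dataset 1
    "quasar absorption line catalogue",                 # dataset 2
]
tfidf = TfidfVectorizer().fit_transform(metadata)
sim = cosine_similarity(tfidf)
print(sim[0].argsort()[::-1][1:])   # datasets ranked by similarity to dataset 0 (itself excluded)
```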
Acoustic Telemetry Validates a Citizen Science Approach for Monitoring Sharks on Coral Reefs
Vianna, Gabriel M. S.; Meekan, Mark G.; Bornovski, Tova H.; Meeuwig, Jessica J.
2014-01-01
Citizen science is promoted as a simple and cost-effective alternative to traditional approaches for the monitoring of populations of marine megafauna. However, the reliability of datasets collected by these initiatives often remains poorly quantified. We compared datasets of shark counts collected by professional dive guides with acoustic telemetry data from tagged sharks collected at the same coral reef sites over a period of five years. There was a strong correlation between the number of grey reef sharks (Carcharhinus amblyrhynchos) observed by dive guides and the telemetry data at both daily and monthly intervals, suggesting that variation in relative abundance of sharks was detectable in datasets collected by dive guides in a similar manner to data derived from telemetry at these time scales. There was no correlation between the number or mean depth of sharks recorded by telemetry and the presence of tourist divers, suggesting that the behaviour of sharks was not affected by the presence of divers during our study. Data recorded by dive guides showed that current strength and temperature were important drivers of the relative abundance of sharks at monitored sites. Our study validates the use of datasets of shark abundance collected by professional dive guides in frequently-visited dive sites in Palau, and supports the participation of experienced recreational divers as contributors to long-term monitoring programs of shark populations. PMID:24760081
Acoustic telemetry validates a citizen science approach for monitoring sharks on coral reefs.
Vianna, Gabriel M S; Meekan, Mark G; Bornovski, Tova H; Meeuwig, Jessica J
2014-01-01
Citizen science is promoted as a simple and cost-effective alternative to traditional approaches for the monitoring of populations of marine megafauna. However, the reliability of datasets collected by these initiatives often remains poorly quantified. We compared datasets of shark counts collected by professional dive guides with acoustic telemetry data from tagged sharks collected at the same coral reef sites over a period of five years. There was a strong correlation between the number of grey reef sharks (Carcharhinus amblyrhynchos) observed by dive guides and the telemetry data at both daily and monthly intervals, suggesting that variation in relative abundance of sharks was detectable in datasets collected by dive guides in a similar manner to data derived from telemetry at these time scales. There was no correlation between the number or mean depth of sharks recorded by telemetry and the presence of tourist divers, suggesting that the behaviour of sharks was not affected by the presence of divers during our study. Data recorded by dive guides showed that current strength and temperature were important drivers of the relative abundance of sharks at monitored sites. Our study validates the use of datasets of shark abundance collected by professional dive guides in frequently-visited dive sites in Palau, and supports the participation of experienced recreational divers as contributors to long-term monitoring programs of shark populations.
An MCMC determination of the primordial helium abundance
NASA Astrophysics Data System (ADS)
Aver, Erik; Olive, Keith A.; Skillman, Evan D.
2012-04-01
Spectroscopic observations of the chemical abundances in metal-poor H II regions provide an independent method for estimating the primordial helium abundance. H II regions are described by several physical parameters such as electron density, electron temperature, and reddening, in addition to y, the ratio of helium to hydrogen. It had been customary to estimate or determine self-consistently these parameters to calculate y. Frequentist analyses of the parameter space have been shown to be successful in these parameter determinations, and Markov Chain Monte Carlo (MCMC) techniques have proven to be very efficient in sampling this parameter space. Nevertheless, accurate determination of the primordial helium abundance from observations of H II regions is constrained by both systematic and statistical uncertainties. In an attempt to better reduce the latter, and continue to better characterize the former, we apply MCMC methods to the large dataset recently compiled by Izotov, Thuan, & Stasińska (2007). To improve the reliability of the determination, a high quality dataset is needed. In pursuit of this, a variety of cuts are explored. The efficacy of the He I λ4026 emission line as a constraint on the solutions is first examined, revealing the introduction of systematic bias through its absence. As a clear measure of the quality of the physical solution, a χ2 analysis proves instrumental in the selection of data compatible with the theoretical model. Nearly two-thirds of the observations fall outside a standard 95% confidence level cut, which highlights the care necessary in selecting systems and warrants further investigation into potential deficiencies of the model or data. In addition, the method also allows us to exclude systems for which parameter estimations are statistical outliers. As a result, the final selected dataset gains in reliability and exhibits improved consistency. Regression to zero metallicity yields Yp = 0.2534 ± 0.0083, in broad agreement with the WMAP result. The inclusion of more observations shows promise for further reducing the uncertainty, but more high quality spectra are required.
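A minimal sketch of the final regression step follows: fit the helium mass fraction Y against metallicity (O/H) over the selected H II regions and take the intercept as the primordial value Yp. The data points below are invented and the fit ignores the per-object uncertainties that a full analysis would propagate.

```python
# Toy sketch: regression of Y against O/H to zero metallicity.
import numpy as np

oh = np.array([4.0, 6.0, 8.0, 10.0, 12.0]) * 1e-5             # O/H (invented)
y_frac = np.array([0.2545, 0.2552, 0.2561, 0.2570, 0.2578])    # helium mass fraction Y (invented)

slope, yp = np.polyfit(oh, y_frac, 1)    # linear fit; intercept approximates Yp
print(round(yp, 4))                      # primordial helium abundance estimate
```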
CO2 CH4 flux Air temperature Soil temperature and Soil moisture, Barrow, Alaska 2013 ver. 1
Margaret Torn
2015-01-14
This dataset consists of field measurements of CO2 and CH4 flux, as well as soil properties made during 2013 in Areas A-D of Intensive Site 1 at the Next-Generation Ecosystem Experiments (NGEE) Arctic site near Barrow, Alaska. Included are (i) measurements of CO2 and CH4 flux made from June to September; (ii) calculation of corresponding Gross Primary Productivity (GPP) and CH4 exchange (transparent minus opaque) between the atmosphere and the ecosystem; (iii) measurements of Los Gatos Research (LGR) chamber air temperature made from June to September; and (iv) measurements of surface layer depth, type of surface layer, soil temperature, and soil moisture from June to September.
Rooftop Energy Potential of Low Income Communities in America REPLICA
Mooney, Meghan (ORCID:0000000309406958); Sigrin, Ben
1970-01-01
The Rooftop Energy Potential of Low Income Communities in America (REPLICA) data set provides estimates of residential rooftop solar technical potential at the tract level, with emphasis on estimates for Low and Moderate Income (LMI) populations. In addition to technical potential, REPLICA is comprised of 10 additional datasets at the tract level to provide socio-demographic and market context. The model year vintage of REPLICA is 2015. The LMI solar potential estimates are made at the tract level, grouped by Area Median Income (AMI), income, tenure, and building type. These estimates are based off of LiDAR data of 128 metropolitan areas, statistical modeling, and ACS 2011-2015 demographic data. The remaining datasets are supplemental datasets that can be used in conjunction with the technical potential data for general LMI solar analysis, planning, and policy making. The core dataset is a wide-format CSV file, seeds_ii_replica.csv, that can be tagged to a tract geometry using the GEOID or GISJOIN fields. In addition, users can download geographic shapefiles for the main or supplemental datasets. This dataset was generated as part of the larger NREL-led SEEDS II (Solar Energy Evolution and Diffusion Studies) project, and specifically for the NREL technical report titled Rooftop Solar Technical Potential for Low-to-Moderate Income Households in the United States by Sigrin and Mooney (2018). This dataset is intended to give researchers, planners, advocates, and policy-makers access to credible data to analyze low-income solar issues and potentially perform cost-benefit analysis for program design. To explore the data in an interactive web mapping environment, use the NREL SolarForAll app.
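A hedged sketch of tagging the core REPLICA CSV to tract geometries follows, joining on GEOID with pandas/geopandas. The shapefile path is a placeholder, and the GEOID column casing in the released CSV is an assumption; only the file name seeds_ii_replica.csv and the GEOID/GISJOIN join fields come from the description above.

```python
# Minimal sketch: join the wide-format REPLICA CSV to tract geometries on GEOID.
import pandas as pd
import geopandas as gpd

replica = pd.read_csv("seeds_ii_replica.csv", dtype={"GEOID": str})   # core REPLICA table
tracts = gpd.read_file("tract_geometries.shp")                        # placeholder shapefile path
tracts["GEOID"] = tracts["GEOID"].astype(str)

joined = tracts.merge(replica, on="GEOID", how="left")                # tract-level join
print(joined.head())
```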
Analyzing How We Do Analysis and Consume Data, Results from the SciDAC-Data Project
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ding, P.; Aliaga, L.; Mubarak, M.
One of the main goals of the Dept. of Energy funded SciDAC-Data project is to analyze the more than 410,000 high energy physics datasets that have been collected, generated and defined over the past two decades by experiments using the Fermilab storage facilities. These datasets have been used as the input to over 5.6 million recorded analysis projects, for which detailed analytics have been gathered. The analytics and meta information for these datasets and analysis projects are being combined with knowledge of their part of the HEP analysis chains for major experiments to understand how modern computing and data delivery is being used. We present the first results of this project, which examine in detail how the CDF, D0, NOvA, MINERvA and MicroBooNE experiments have organized, classified and consumed petascale datasets to produce their physics results. The results include analysis of the correlations in dataset/file overlap, data usage patterns, data popularity, dataset dependency and temporary dataset consumption. The results provide critical insight into how workflows and data delivery schemes can be combined with different caching strategies to more efficiently perform the work required to mine these large HEP data volumes and to understand the physics analysis requirements for the next generation of HEP computing facilities. In particular we present a detailed analysis of the NOvA data organization and consumption model corresponding to their first and second oscillation results (2014-2016) and the first look at the analysis of the Tevatron Run II experiments. We present statistical distributions for the characterization of these data and data driven models describing their consumption.
Analyzing how we do Analysis and Consume Data, Results from the SciDAC-Data Project
NASA Astrophysics Data System (ADS)
Ding, P.; Aliaga, L.; Mubarak, M.; Tsaris, A.; Norman, A.; Lyon, A.; Ross, R.
2017-10-01
One of the main goals of the Dept. of Energy funded SciDAC-Data project is to analyze the more than 410,000 high energy physics datasets that have been collected, generated and defined over the past two decades by experiments using the Fermilab storage facilities. These datasets have been used as the input to over 5.6 million recorded analysis projects, for which detailed analytics have been gathered. The analytics and meta information for these datasets and analysis projects are being combined with knowledge of their part of the HEP analysis chains for major experiments to understand how modern computing and data delivery is being used. We present the first results of this project, which examine in detail how the CDF, D0, NOvA, MINERvA and MicroBooNE experiments have organized, classified and consumed petascale datasets to produce their physics results. The results include analysis of the correlations in dataset/file overlap, data usage patterns, data popularity, dataset dependency and temporary dataset consumption. The results provide critical insight into how workflows and data delivery schemes can be combined with different caching strategies to more efficiently perform the work required to mine these large HEP data volumes and to understand the physics analysis requirements for the next generation of HEP computing facilities. In particular we present a detailed analysis of the NOvA data organization and consumption model corresponding to their first and second oscillation results (2014-2016) and the first look at the analysis of the Tevatron Run II experiments. We present statistical distributions for the characterization of these data and data driven models describing their consumption.
Lee, Danny; Greer, Peter B; Pollock, Sean; Kim, Taeho; Keall, Paul
2016-05-01
The dynamic keyhole is a new MR image reconstruction method for thoracic and abdominal MR imaging. To date, this method has not been investigated with cancer patient magnetic resonance imaging (MRI) data. The goal of this study was to assess the dynamic keyhole method for the task of lung tumor localization using cine-MR images reconstructed in the presence of respiratory motion. The dynamic keyhole method utilizes a previously acquired library of peripheral k-space datasets at similar displacement and phase (where phase is simply used to determine whether the breathing is inhale-to-exhale or exhale-to-inhale) respiratory bins, in conjunction with newly acquired central k-space datasets (keyhole). External respiratory signals drive the process of sorting, matching, and combining the two k-space streams for each respiratory bin, thereby achieving faster image acquisition without substantial motion artifacts. This study was the first to investigate the impact of k-space undersampling on lung tumor motion and area assessment across clinically available techniques (zero-filling and conventional keyhole). In this study, the dynamic keyhole, conventional keyhole and zero-filling methods were compared to full k-space dataset acquisition by quantifying (1) the keyhole size required for central k-space datasets for constant image quality across sixty-four cine-MRI datasets from nine lung cancer patients, (2) the intensity difference between the original and reconstructed images at a constant keyhole size, and (3) the accuracy of tumor motion and area directly measured by tumor autocontouring. For constant image quality, the dynamic keyhole, conventional keyhole, and zero-filling methods required 22%, 34%, and 49% of the keyhole size (P < 0.0001), respectively, compared to the full k-space image acquisition method. Compared to the conventional keyhole and zero-filling images reconstructed with the keyhole size used by the dynamic keyhole method, the average intensity difference of the dynamic keyhole reconstructed images was minimal (P < 0.0001), yielding tumor motion accuracy within 99.6% (P < 0.0001) and tumor area accuracy within 98.0% (P < 0.0001) for lung tumor monitoring applications. This study demonstrates that the dynamic keyhole method is a promising technique for clinical applications such as image-guided radiation therapy requiring MR monitoring of thoracic tumors. Based on the results from this study, the dynamic keyhole method could increase the imaging frequency by up to a factor of five compared with full k-space methods for real-time lung tumor MRI.
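A minimal sketch of the reconstruction idea (not the authors' implementation): only the central "keyhole" rows of k-space come from the current acquisition, the periphery is filled from the library entry matched to the same respiratory bin, and the combined k-space is inverse-transformed. The 22% keyhole fraction echoes the figure reported above; array contents here are random placeholders.

```python
import numpy as np

def dynamic_keyhole_recon(keyhole_kspace, library_kspace, keyhole_frac=0.22):
    """Combine newly acquired central k-space rows with library peripheral
    k-space from the matched respiratory bin, then reconstruct the image."""
    ny = keyhole_kspace.shape[0]
    n_keyhole = max(1, int(round(keyhole_frac * ny)))
    lo = ny // 2 - n_keyhole // 2
    hi = lo + n_keyhole

    combined = library_kspace.copy()
    combined[lo:hi, :] = keyhole_kspace[lo:hi, :]   # central rows: new data
    return np.abs(np.fft.ifft2(np.fft.ifftshift(combined)))

# toy usage with random stand-ins for k-space data
rng = np.random.default_rng(0)
shape = (256, 256)
current = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
library = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
img = dynamic_keyhole_recon(current, library)
```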
NASA Astrophysics Data System (ADS)
Nlandu Kamavuako, Ernest; Scheme, Erik Justin; Englehart, Kevin Brian
2016-08-01
Objective. For over two decades, Hudgins’ set of time domain features has been extensively applied for classification of hand motions. The calculation of slope sign change and zero crossing features uses a threshold to attenuate the effect of background noise. However, there is no consensus on the optimum threshold value. In this study, we investigate for the first time the effect of threshold selection on the feature space and classification accuracy using multiple datasets. Approach. In the first part, four datasets were used, and classification error (CE), separability index, scatter matrix separability criterion, and cardinality of the features were used as performance measures. In the second part, data from eight classes were collected during two separate days with two days in between from eight able-bodied subjects. The threshold for each feature was computed as a factor (R = 0:0.01:4) times the average root mean square of data during rest. For each day, we quantified CE for R = 0 (CEr0) and minimum error (CEbest). Moreover, a cross-day threshold validation was applied where, for example, CE of day two (CEodt) is computed based on the optimum threshold from day one and vice versa. Finally, we quantified the effect of the threshold when using training data from one day and test data of the other. Main results. All performance metrics generally degraded with increasing threshold values. On average, CEbest (5.26 ± 2.42%) was significantly better than CEr0 (7.51 ± 2.41%, P = 0.018) and CEodt (7.50 ± 2.50%, P = 0.021). During the two-fold validation between days, CEbest performed similarly to CEr0. Interestingly, when the threshold values optimized per subject from day one and day two, respectively, were used for cross-day classification, the performance decreased. Significance. We have demonstrated that the threshold value has a strong impact on the feature space and that an optimum threshold can be quantified. However, this optimum threshold is highly data and subject driven and thus does not generalize well. There is strong evidence that R = 0 provides a good trade-off between system performance and generalization. These findings are important for practical use of pattern recognition based myoelectric control.
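A sketch of how the two threshold-dependent features could be computed, with the threshold set to R times the RMS of a rest recording as in the study. The exact slope-sign-change formulation varies between implementations, so this is illustrative rather than the authors' code; the signals below are synthetic.

```python
import numpy as np

def zero_crossings(x, thr):
    """Count sign changes whose amplitude step exceeds the threshold."""
    return int(np.sum((x[:-1] * x[1:] < 0) & (np.abs(x[:-1] - x[1:]) >= thr)))

def slope_sign_changes(x, thr):
    """Count slope reversals exceeding the threshold on at least one side."""
    d1 = x[1:-1] - x[:-2]
    d2 = x[1:-1] - x[2:]
    return int(np.sum((d1 * d2 > 0) & ((np.abs(d1) >= thr) | (np.abs(d2) >= thr))))

rng = np.random.default_rng(1)
rest = 0.05 * rng.standard_normal(2000)                                  # hypothetical rest recording
emg = np.sin(np.linspace(0, 40 * np.pi, 2000)) + 0.05 * rng.standard_normal(2000)

R = 1.0                                          # the study sweeps R from 0 to 4
thr = R * np.sqrt(np.mean(rest ** 2))            # R times the rest RMS
print(zero_crossings(emg, thr), slope_sign_changes(emg, thr))
```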
NASA Astrophysics Data System (ADS)
Schubert, Brian A.; Jahren, A. Hope
2015-10-01
Modern and ancient wood is a valuable terrestrial record of carbon ultimately derived from the atmosphere and oxygen inherited from local meteoric water. Many modern and fossil wood specimens display rings sufficiently thick for intra-annual sampling, and analytical techniques are rapidly improving to allow for precise carbon and oxygen isotope measurements on very small samples, yielding unprecedented resolution of seasonal isotope records. However, the interpretation of these records across diverse environments has been problematic because a unifying model for the quantitative interpretation of seasonal climate parameters from oxygen isotopes in wood is lacking. Towards such a model, we compiled a dataset of intra-ring oxygen isotope measurements on modern wood cellulose (δ18Ocell) from 33 globally distributed sites. Five of these sites represent original data produced for this study, while the data for the other 28 sites were taken from the literature. We defined the intra-annual change in oxygen isotope value of wood cellulose [Δ(δ18Ocell)] as the difference between the maximum and minimum δ18Ocell values determined within the ring. Then, using the monthly-resolved dataset of the oxygen isotope composition of meteoric water (δ18OMW) provided by the Global Network of Isotopes in Precipitation database, we quantified the empirical relationship between the intra-annual change in meteoric water [Δ(δ18OMW)] and Δ(δ18Ocell). We then used monthly-resolved datasets of temperature and precipitation to develop a global relationship between Δ(δ18OMW) and maximum/minimum monthly temperatures and winter/summer precipitation amounts. By combining these relationships we produced a single equation that explains much of the variability in the intra-ring δ18Ocell signal through only changes in seasonal temperature and precipitation amount (R2 = 0.82). We show how our recent model that quantifies seasonal precipitation from intra-ring carbon isotope profiles can be incorporated into the oxygen model above in order to separately quantify both seasonal temperature and seasonal precipitation. Determination of seasonal climate variation using high-resolution isotopes in tree-ring records makes possible a new understanding of the seasonal fluctuations that control the environmental conditions to which organisms are subject, both during recent history and in the geologic past.
Kamavuako, Ernest Nlandu; Scheme, Erik Justin; Englehart, Kevin Brian
2016-08-01
For over two decades, Hudgins' set of time domain features has been extensively applied for classification of hand motions. The calculation of slope sign change and zero crossing features uses a threshold to attenuate the effect of background noise. However, there is no consensus on the optimum threshold value. In this study, we investigate for the first time the effect of threshold selection on the feature space and classification accuracy using multiple datasets. In the first part, four datasets were used, and classification error (CE), separability index, scatter matrix separability criterion, and cardinality of the features were used as performance measures. In the second part, data from eight classes were collected during two separate days with two days in between from eight able-bodied subjects. The threshold for each feature was computed as a factor (R = 0:0.01:4) times the average root mean square of data during rest. For each day, we quantified CE for R = 0 (CEr0) and minimum error (CEbest). Moreover, a cross-day threshold validation was applied where, for example, CE of day two (CEodt) is computed based on the optimum threshold from day one and vice versa. Finally, we quantified the effect of the threshold when using training data from one day and test data of the other. All performance metrics generally degraded with increasing threshold values. On average, CEbest (5.26 ± 2.42%) was significantly better than CEr0 (7.51 ± 2.41%, P = 0.018) and CEodt (7.50 ± 2.50%, P = 0.021). During the two-fold validation between days, CEbest performed similarly to CEr0. Interestingly, when the threshold values optimized per subject from day one and day two, respectively, were used for cross-day classification, the performance decreased. We have demonstrated that the threshold value has a strong impact on the feature space and that an optimum threshold can be quantified. However, this optimum threshold is highly data and subject driven and thus does not generalize well. There is strong evidence that R = 0 provides a good trade-off between system performance and generalization. These findings are important for practical use of pattern recognition based myoelectric control.
NASA Astrophysics Data System (ADS)
Hawkins, Ed; Day, Jonny; Tietsche, Steffen
2016-04-01
Recent years have seen significant developments in seasonal-to-interannual timescale climate prediction capabilities. However, until recently the potential of such systems to predict Arctic climate had not been assessed. We describe a multi-model predictability experiment which was run as part of the Arctic Predictability and Prediction On Seasonal to Inter-annual TimEscales (APPOSITE) project. The main goal of APPOSITE was to quantify the timescales on which Arctic climate is predictable. In order to achieve this, a coordinated set of idealised initial-value predictability experiments, with seven general circulation models, was conducted. This was the first model intercomparison project designed to quantify the predictability of Arctic climate on seasonal to inter-annual timescales. Here we provide a summary and update of the project's results, which include: (1) quantifying the predictability of Arctic climate, especially sea ice; (2) establishing the state-dependence of this predictability, finding that extreme years are potentially more predictable than neutral years; (3) analysing a spring 'predictability barrier' to skillful forecasts; (4) showing that initial sea ice thickness information provides much of the skill for summer forecasts; and (5) quantifying the sources of error growth and uncertainty in Arctic predictions. The dataset is now publicly available.
Yang, Yu; Fritzsching, Keith J; Hong, Mei
2013-11-01
A multi-objective genetic algorithm is introduced to predict the assignment of protein solid-state NMR (SSNMR) spectra with partial resonance overlap and missing peaks due to broad linewidths, molecular motion, and low sensitivity. This non-dominated sorting genetic algorithm II (NSGA-II) aims to identify all possible assignments that are consistent with the spectra and to compare the relative merit of these assignments. Our approach is modeled after the recently introduced Monte-Carlo simulated-annealing (MC/SA) protocol, with the key difference that NSGA-II simultaneously optimizes multiple assignment objectives instead of searching for possible assignments based on a single composite score. The multiple objectives include maximizing the number of consistently assigned peaks between multiple spectra ("good connections"), maximizing the number of used peaks, minimizing the number of inconsistently assigned peaks between spectra ("bad connections"), and minimizing the number of assigned peaks that have no matching peaks in the other spectra ("edges"). Using six SSNMR protein chemical shift datasets with varying levels of imperfection that was introduced by peak deletion, random chemical shift changes, and manual peak picking of spectra with moderately broad linewidths, we show that the NSGA-II algorithm produces a large number of valid and good assignments rapidly. For high-quality chemical shift peak lists, NSGA-II and MC/SA perform similarly well. However, when the peak lists contain many missing peaks that are uncorrelated between different spectra and have chemical shift deviations between spectra, the modified NSGA-II produces a larger number of valid solutions than MC/SA, and is more effective at distinguishing good from mediocre assignments by avoiding the hazard of suboptimal weighting factors for the various objectives. These two advantages, namely diversity and better evaluation, lead to a higher probability of predicting the correct assignment for a larger number of residues. On the other hand, when there are multiple equally good assignments that are significantly different from each other, the modified NSGA-II is less efficient than MC/SA in finding all the solutions. This problem is solved by a combined NSGA-II/MC algorithm, which appears to have the advantages of both NSGA-II and MC/SA. This combination algorithm is robust for the three most difficult chemical shift datasets examined here and is expected to give the highest-quality de novo assignment of challenging protein NMR spectra.
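The ranking step at the heart of NSGA-II is non-dominated sorting over the competing objectives. A minimal sketch of that step alone (objective values are hypothetical; the real algorithm also uses crowding distance, selection, crossover, and mutation, which are omitted here):

```python
import numpy as np

def non_dominated_fronts(objectives):
    """Sort candidate solutions into Pareto fronts.

    objectives : (n_solutions, n_objectives) array where every objective is
                 expressed as "larger is better" (e.g. good connections,
                 used peaks, minus bad connections, minus edges).
    Returns a list of fronts, each a list of solution indices.
    """
    dominates = lambda a, b: np.all(a >= b) and np.any(a > b)
    remaining = set(range(len(objectives)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(objectives[j], objectives[i])
                            for j in remaining if j != i)]
        fronts.append(front)
        remaining -= set(front)
    return fronts

# hypothetical scores: (good connections, used peaks, -bad connections, -edges)
scores = np.array([[40, 55, -2, -5],
                   [38, 60, -1, -7],
                   [35, 50, -6, -4],
                   [41, 54, -3, -9]])
print(non_dominated_fronts(scores))
```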
Comparison of software tools for kinetic evaluation of chemical degradation data.
Ranke, Johannes; Wöltjen, Janina; Meinecke, Stefan
2018-01-01
For evaluating the fate of xenobiotics in the environment, a variety of degradation or environmental metabolism experiments are routinely conducted. The data generated in such experiments are evaluated by optimizing the parameters of kinetic models in a way that the model simulation fits the data. No comparison of the main software tools currently in use has been published to date. This article shows a comparison of numerical results as well as an overall, somewhat subjective comparison based on a scoring system using a set of criteria. The scoring was performed separately for two types of uses. Uses of type I are routine evaluations involving standard kinetic models and up to three metabolites in a single compartment. Evaluations involving non-standard model components, more than three metabolites or more than a single compartment belong to use type II. For use type I, usability is most important, while the flexibility of the model definition is most important for use type II. Test datasets were assembled that can be used to compare the numerical results for different software tools. These datasets can also be used to ensure that no unintended or erroneous behaviour is introduced in newer versions. In the comparison of numerical results, good agreement between the parameter estimates was observed for datasets with up to three metabolites. For the now unmaintained reference software DegKinManager/ModelMaker, and for OpenModel which is still under development, user options were identified that should be taken care of in order to obtain results that are as reliable as possible. Based on the scoring system mentioned above, the software tools gmkin, KinGUII and CAKE received the best scores for use type I. Out of the 15 software packages compared with respect to use type II, again gmkin and KinGUII were the first two, followed by the script-based tool mkin, which is the technical basis for gmkin, and by OpenModel. Based on the evaluation using the system of criteria mentioned above and the comparison of numerical results for the suite of test datasets, the software tools gmkin, KinGUII and CAKE are recommended for use type I, and gmkin and KinGUII for use type II. For users who prefer to work with scripts instead of graphical user interfaces, mkin is recommended. For future software evaluations, it is recommended to include in the scoring scheme a measure of the total time that a typical user needs for a kinetic evaluation. It is the hope of the authors that the publication of test data, source code and overall rankings fosters the evolution of useful and reliable software in the field.
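The simplest model handled by all of the tools compared here is single first-order (SFO) decline. A sketch of that fitting step in Python (the reviewed tools are R- or GUI-based, so this only illustrates the optimization; the residue values below are hypothetical):

```python
import numpy as np
from scipy.optimize import curve_fit

def sfo(t, M0, k):
    """Single first-order decline: M(t) = M0 * exp(-k * t)."""
    return M0 * np.exp(-k * t)

# hypothetical parent-compound residues (days, % of applied amount)
t = np.array([0, 3, 7, 14, 28, 56, 90], dtype=float)
obs = np.array([101.2, 86.5, 73.4, 55.1, 31.0, 10.4, 3.2])

(M0, k), cov = curve_fit(sfo, t, obs, p0=(100.0, 0.05))
dt50 = np.log(2) / k
print(f"M0 = {M0:.1f}, k = {k:.4f} /day, DT50 = {dt50:.1f} days")
```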
Noormohammadpour, Pardis; Tavana, Bahareh; Mansournia, Mohammad Ali; Zeinalizadeh, Mehdi; Mirzashahi, Babak; Rostami, Mohsen; Kordi, Ramin
2018-05-01
Translation and cultural adaptation of the National Institutes of Health (NIH) Task Force's minimal dataset. The purpose of this study was to evaluate the validity and reliability of the Farsi version of the NIH Task Force's recommended multidimensional minimal dataset for research on chronic low back pain (CLBP). Considering the high treatment cost of CLBP and its increasing prevalence, the NIH Pain Consortium developed research standards (including recommendations for definitions, a minimum dataset, and outcome reporting) for studies of CLBP. Application of these recommendations could standardize research and improve the comparability of different studies on CLBP. This study had three phases: translation of the dataset into Farsi and its cultural adaptation, assessment of the comprehensibility of the pre-final version of the dataset via a pilot study, and investigation of the reliability and validity of the final version of the translated dataset. Subjects were 250 patients with CLBP. Test-retest reliability, content validity, and convergent validity (correlations among different dimensions of the dataset and the Farsi versions of the Oswestry Disability Index, Roland Morris Disability Questionnaire, Fear-Avoidance Beliefs Questionnaire, and Beck Depression Inventory-II) were assessed. The Farsi version demonstrated good/excellent convergent validity (the correlation coefficient between the impact dimension and the ODI was r = 0.75 [P < 0.001], between the impact dimension and the Roland-Morris Disability Questionnaire was r = 0.80 [P < 0.001], and between the psychological dimension and the BDI was r = 0.62 [P < 0.001]). The test-retest reliability was also strong (intraclass correlation coefficient values ranged between 0.70 and 0.95) and the internal consistency was good/excellent (Cronbach's alpha coefficients for the two main dimensions, the impact dimension and the psychological dimension, were 0.91 and 0.82 [P < 0.001], respectively). In addition, its face validity and content validity were acceptable. The Farsi version of the minimal dataset for research on CLBP is a reliable and valid instrument for data gathering in patients with CLBP. This minimum dataset can be a step toward the standardization of research regarding CLBP. Level of Evidence: 3.
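The internal-consistency statistic reported above, Cronbach's alpha, can be reproduced directly from item-level scores. A sketch with simulated respondents (not the study data):

```python
import numpy as np

def cronbach_alpha(items):
    """items: (n_subjects, n_items) matrix of item scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars / total_var)

rng = np.random.default_rng(2)
latent = rng.normal(size=(250, 1))                   # 250 hypothetical respondents
scores = latent + 0.6 * rng.normal(size=(250, 8))    # 8 correlated questionnaire items
print(round(cronbach_alpha(scores), 2))
```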
NASA Astrophysics Data System (ADS)
Walter, C. A.; Braun, A.; Fotopoulos, G.
2017-12-01
Research is being conducted to develop an Unmanned Aerial System (UAS) that is capable of reliably and efficiently collecting high resolution, industry standard magnetic data (magnetic data with a fourth difference of +/- 0.05 nT) via an optically pumped vapour magnetometer. The benefits of developing a UAS with these capabilities include improvements in the resolution of localized airborne surveys (2.5 km by 2.5 km) and the ability to conduct 3D magnetic gradiometry surveys in the observation gap evident between traditional terrestrial and manned airborne magnetic surveys (surface elevation up to 120 m). Quantifying the extent of an optically pumped vapour magnetometer's 3D orientation variations, while in-flight and suspended under a UAS, is a significant advancement to existing knowledge as optically pumped magnetometers have an orientation-dependent (to the primary magnetic field vector) process for measuring the magnetic field. This study investigates the orientation characteristics of a GEM Systems potassium vapour magnetometer, GSMP-35U, while semi-rigidly suspended 3 m under a DJI S900, heavy-lift multi-rotor UAV (Unmanned Aerial Vehicle) during an airborne surveying campaign conducted Northeast of Thunder Bay, Ontario, Canada. A nine degrees of freedom IMU (Inertial Measurement Unit), the Adafruit GY-80, was used to quantify the 3D orientation variations (yaw, pitch and roll) of the magnetic sensor during flight. The orientation and magnetic datasets were indexed and linked with a date and time stamp (within 1 ms) via a Raspberry Pi 2, acting as an on-board computer and data storage system. Analysing the two datasets allowed for the in-flight orientation variations of the potassium vapour magnetometer to be directly compared with the gathered magnetic and signal quality data of the magnetometer. The in-flight orientation characteristics of the magnetometer were also quantified for a range of air-speeds and flight maneuvers throughout the survey. Overall, this study validates that maintaining magnetometer yaw, pitch and roll variations within quantified limits (+/- 5 degrees yaw, +/- 10 degrees pitch, +/- 10 degrees roll) during flight can yield reliable and repeatable industry standard magnetic measurements at an increased spatial resolution over manned airborne surveys.
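The "industry standard" criterion quoted above (fourth difference within +/- 0.05 nT) can be checked directly on the recorded total-field series. A sketch with synthetic values; the division by 16 is one common normalisation convention and is an assumption here, not taken from the abstract.

```python
import numpy as np

def fourth_difference(total_field):
    """Normalised fourth difference of a 1-D total-field series (nT); the
    usual noise test requires these values to stay within +/- 0.05 nT."""
    return np.diff(total_field, n=4) / 16.0

rng = np.random.default_rng(3)
field = 54000.0 + np.cumsum(0.01 * rng.standard_normal(500))  # synthetic survey line
fd = fourth_difference(field)
print(f"max |4th difference| = {np.max(np.abs(fd)):.3f} nT")
```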
NASA Technical Reports Server (NTRS)
Crawford, Winifred C.
2010-01-01
The AMU created new logistic regression equations in an effort to increase the skill of the Objective Lightning Forecast Tool developed in Phase II (Lambert 2007). One equation was created for each of five sub-seasons based on the daily lightning climatology instead of by month as was done in Phase II. The assumption was that these equations would capture the physical attributes that contribute to thunderstorm formation better than monthly equations. However, the SS values in Section 5.3.2 showed that the Phase III equations had worse skill than the Phase II equations and, therefore, will not be transitioned into operations. The current Objective Lightning Forecast Tool developed in Phase II will continue to be used operationally in MIDDS. Three warm seasons were added to the Phase II dataset to increase the POR from 17 to 20 years (1989-2008), and data for October were included since the daily climatology showed lightning occurrence extending into that month. None of the three methods tested to determine the start of the sub-season in each individual year were able to discern the start dates with consistent accuracy. Therefore, the start dates were determined by the daily climatology shown in Figure 10 and were the same in every year. The procedures used to create the predictors and develop the equations were identical to those in Phase II. The equations were made up of one to three predictors. TI and the flow regime probabilities were the top predictors, followed by 1-day persistence, then VT and LI. Each equation outperformed four other forecast methods by 7-57% using the verification dataset, but the new equations were outperformed by the Phase II equations in every sub-season. The reason for the degradation may be that the same sub-season start dates were used in every year. It is likely there was overlap of sub-season days at the beginning and end of each defined sub-season in each individual year, which could very well affect equation performance.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Priel, Nadav; Landsman, Hagar; Manfredini, Alessandro
We propose a safeguard procedure for statistical inference that provides universal protection against mismodeling of the background. The method quantifies and incorporates the signal-like residuals of the background model into the likelihood function, using information available in a calibration dataset. This prevents possible false discovery claims that may arise through unknown mismodeling, and corrects the bias in limit setting created by overestimated or underestimated background. We demonstrate how the method removes the bias created by an incomplete background model using three realistic case studies.
The importance of data curation on QSAR Modeling ...
During the last few decades many QSAR models and tools have been developed at the US EPA, including the widely used EPISuite. During this period the arsenal of computational capabilities supporting cheminformatics has broadened dramatically with multiple software packages. These modern tools allow for more advanced techniques in terms of chemical structure representation and storage, as well as enabling automated data-mining and standardization approaches to examine and fix data quality issues. This presentation will investigate the impact of data curation on the reliability of QSAR models being developed within the EPA's National Center for Computational Toxicology. As part of this work we have attempted to disentangle the influence of the quality versus quantity of data based on the Syracuse PHYSPROP database partly used by the EPISuite software. We will review our automated approaches to examining key datasets related to the EPISuite data to validate across chemical structure representations (e.g., mol file and SMILES) and identifiers (chemical names and registry numbers), and approaches to standardize data into QSAR-ready formats prior to modeling procedures. Our efforts to quantify and segregate data into quality categories have allowed us to evaluate the resulting models that can be developed from these data slices and to quantify to what extent efforts developing high-quality datasets have the expected pay-off in terms of predictive performance. The most accur
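One basic curation step alluded to above is canonicalising chemical structure representations so that differently written records can be compared and de-duplicated. A sketch assuming RDKit is available; the identifiers and SMILES strings below are made up:

```python
from rdkit import Chem

def canonical_smiles(smiles):
    """Return a canonical SMILES, or None if the structure fails to parse."""
    mol = Chem.MolFromSmiles(smiles)
    return Chem.MolToSmiles(mol) if mol is not None else None

records = {
    "record-1": "C1=CC=CC=C1O",    # phenol, Kekulé notation
    "record-2": "Oc1ccccc1",       # phenol again, aromatic notation
    "record-3": "C1=CC=CC=C1OX",   # malformed entry
}

seen = {}
for rec_id, smi in records.items():
    can = canonical_smiles(smi)
    if can is None:
        print(f"{rec_id}: structure could not be parsed")
    elif can in seen:
        print(f"{rec_id}: duplicate of {seen[can]} ({can})")
    else:
        seen[can] = rec_id
```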
Quantifying and Mapping Global Data Poverty.
Leidig, Mathias; Teeuw, Richard M
2015-01-01
Digital information technologies, such as the Internet, mobile phones and social media, provide vast amounts of data for decision-making and resource management. However, access to these technologies, as well as their associated software and training materials, is not evenly distributed: since the 1990s there has been concern about a "Digital Divide" between the data-rich and the data-poor. We present an innovative metric for evaluating international variations in access to digital data: the Data Poverty Index (DPI). The DPI is based on Internet speeds, numbers of computer owners and Internet users, mobile phone ownership and network coverage, as well as provision of higher education. The datasets used to produce the DPI are provided annually for almost all the countries of the world and can be freely downloaded. The index that we present in this 'proof of concept' study is the first to quantify and visualise the problem of global data poverty, using the most recent datasets, for 2013. The effects of severe data poverty, particularly limited access to geoinformatic data, free software and online training materials, are discussed in the context of sustainable development and disaster risk reduction. The DPI highlights countries where support is needed for improving access to the Internet and for the provision of training in geoinformatics. We conclude that the DPI is of value as a potential metric for monitoring the Sustainable Development Goals of the Sendai Framework for Disaster Risk Reduction.
NASA Astrophysics Data System (ADS)
Adera, S.; Larsen, L.; Levy, M. C.; Thompson, S. E.
2017-12-01
In the Brazilian rainforest-savanna transition zone, deforestation has the potential to significantly affect rainfall by disrupting rainfall recycling, the process by which regional evapotranspiration contributes to regional rainfall. Understanding rainfall recycling in this region is important not only for sustaining Amazon and Cerrado ecosystems, but also for cattle ranching, agriculture, hydropower generation, and drinking water management. Simulations in previous studies suggest complex, scale-dependent interactions between forest cover connectivity and rainfall. For example, the size and distribution of deforested patches has been found to affect rainfall quantity and spatial distribution. Here we take an empirical approach, using the spatial connectivity of rainfall as an indicator of rainfall recycling, to ask: as forest cover connectivity decreased from 1981 - 2015, how did the spatial connectivity of rainfall change in the Brazilian rainforest-savanna transition zone? We use satellite forest cover and rainfall data covering this period of intensive forest cover loss in the region (forest cover from the Hansen Global Forest Change dataset; rainfall from the Climate Hazards Infrared Precipitation with Stations dataset). Rainfall spatial connectivity is quantified using transfer entropy, a metric from information theory, and summarized using network statistics. Networks of connectivity are quantified for paired deforested and non-deforested regions before deforestation (1981-1995) and during/after deforestation (2001-2015). Analyses reveal a decline in spatial connectivity networks of rainfall following deforestation.
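A minimal sketch of pairwise transfer entropy between two binned rainfall series, the connectivity metric named above. The quantile binning, lag, and synthetic data are illustrative assumptions, not the study's configuration:

```python
import numpy as np
from collections import Counter

def transfer_entropy(source, target, bins=4, lag=1):
    """Estimate T(source -> target) in bits from quantile-binned 1-D series."""
    s = np.digitize(source, np.quantile(source, np.linspace(0, 1, bins + 1)[1:-1]))
    t = np.digitize(target, np.quantile(target, np.linspace(0, 1, bins + 1)[1:-1]))
    triples = list(zip(t[lag:], t[:-lag], s[:-lag]))      # (target future, target past, source past)
    p_xyz = Counter(triples)
    p_yz = Counter((y, z) for _, y, z in triples)
    p_xy = Counter((x, y) for x, y, _ in triples)
    p_y = Counter(y for _, y, _ in triples)
    n = len(triples)
    te = 0.0
    for (x, y, z), c in p_xyz.items():
        # p(x,y,z) * log2[ p(x|y,z) / p(x|y) ]
        te += (c / n) * np.log2((c / p_yz[(y, z)]) / (p_xy[(x, y)] / p_y[y]))
    return te

rng = np.random.default_rng(4)
upwind = rng.gamma(2.0, 2.0, size=2000)                            # synthetic rainfall series
downwind = 0.6 * np.roll(upwind, 1) + rng.gamma(2.0, 1.0, size=2000)
print(round(transfer_entropy(upwind, downwind), 3))
```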
Les Houches 2017: Physics at TeV Colliders Standard Model Working Group Report
DOE Office of Scientific and Technical Information (OSTI.GOV)
Andersen, J.R.; et al.
This Report summarizes the proceedings of the 2017 Les Houches workshop on Physics at TeV Colliders. Session 1 dealt with (I) new developments relevant for high precision Standard Model calculations, (II) theoretical uncertainties and dataset dependence of parton distribution functions, (III) new developments in jet substructure techniques, (IV) issues in the theoretical description of the production of Standard Model Higgs bosons and how to relate experimental measurements, (V) phenomenological studies essential for comparing LHC data from Run II with theoretical predictions and projections for future measurements, and (VI) new developments in Monte Carlo event generators.
NASA Astrophysics Data System (ADS)
Cao, Chao-Tun; Bi, Yakun; Cao, Chenzhong
2016-06-01
Fifty-seven samples of model compounds, 4,4′-disubstituted benzylidene anilines p-X-ArCH=NAr-p-Y, were synthesized. Their infrared absorption spectra were recorded, and the stretching vibration frequencies νC=N of the C=N bridging bond were determined. A new stretching vibration mode was proposed by means of the analysis of the factors affecting νC=N; that is, there are mainly three modes in the stretching vibration of the C=N bond: (I) the polar double-bond form C=N, (II) the single-bond ionic form C+-N-, and (III) the single-bond diradical form C•-N•. The contributions of forms (I) and (II) to the change of νC=N can be quantified by using the Hammett substituent constant (including substituent cross-interaction effects between the X and Y groups), whereas the contribution of form (III) can be quantified by employing the excited-state substituent constant. The largest contribution among these three forms comes from form (III), followed by form (II); the difference between their contributions is discussed from the viewpoint of the energy required for vibration in forms (III) and (II).
NASA Astrophysics Data System (ADS)
Gauger, Tina; Konhauser, Kurt; Kappler, Andreas
2016-04-01
Due to the lack of an ozone layer in the Archean, ultraviolet radiation (UVR) reached early Earth's surface almost unattenuated; as a consequence, a terrestrial biosphere in the form of biological soil crusts would have been highly susceptible to lethal doses of irradiation. However, a self-produced external screen in the form of nanoparticular Fe(III) minerals could have effectively protected those early microorganisms. In this study, we use viability studies by quantifying colony-forming units (CFUs), as well as Fe(II) oxidation and nitrate reduction rates, to show that encrustation in biogenic and abiogenic Fe(III) minerals can protect common soil bacteria such as the nitrate-reducing Fe(II)-oxidizing microorganisms Acidovorax sp. strain BoFeN1 and strain 2AN from harmful UVC radiation. Analysis of DNA damage by quantifying cyclobutane pyrimidine dimers (CPD) confirmed the protecting effect by Fe(III) minerals. This study suggests that Fe(II)-oxidizing microorganisms, as would have grown in association with mafic and ultramafic soils/outcrops, would have been able to produce their own UV screen, enabling them to live in terrestrial habitats on early Earth.
Gauger, Tina; Konhauser, Kurt; Kappler, Andreas
2016-04-01
Due to the lack of an ozone layer in the Archean, ultraviolet radiation (UVR) reached early Earth's surface almost unattenuated; as a consequence, a terrestrial biosphere in the form of biological soil crusts would have been highly susceptible to lethal doses of irradiation. However, a self-produced external screen in the form of nanoparticular Fe(III) minerals could have effectively protected those early microorganisms. In this study, we use viability studies by quantifying colony-forming units (CFUs), as well as Fe(II) oxidation and nitrate reduction rates, to show that encrustation in biogenic and abiogenic Fe(III) minerals can protect common soil bacteria such as the nitrate-reducing Fe(II)-oxidizing microorganisms Acidovorax sp. strain BoFeN1 and strain 2AN from harmful UVC radiation. Analysis of DNA damage by quantifying cyclobutane pyrimidine dimers (CPD) confirmed the protecting effect by Fe(III) minerals. This study suggests that Fe(II)-oxidizing microorganisms, as would have grown in association with mafic and ultramafic soils/outcrops, would have been able to produce their own UV screen, enabling them to live in terrestrial habitats on early Earth.
Remote sensing of species diversity using Landsat 8 spectral variables
NASA Astrophysics Data System (ADS)
Madonsela, Sabelo; Cho, Moses Azong; Ramoelo, Abel; Mutanga, Onisimo
2017-11-01
The application of remote sensing in biodiversity estimation has largely relied on the Normalized Difference Vegetation Index (NDVI). The NDVI exploits spectral information from the red and near infrared bands of Landsat images and it does not consider canopy background conditions, hence it is affected by soil brightness, which lowers its sensitivity to vegetation. As such, NDVI may be insufficient in explaining tree species diversity. Meanwhile, the Landsat program also collects essential spectral information in the shortwave infrared (SWIR) region which is related to plant properties. The study was intended to: (i) explore the utility of spectral information across the Landsat-8 spectrum using Principal Component Analysis (PCA) and estimate alpha diversity (α-diversity) in the savannah woodland in southern Africa, and (ii) define the species diversity index (Shannon (H′), Simpson (D2) and species richness (S) - defined as the number of species in a community) that best relates to spectral variability in the Landsat-8 Operational Land Imager dataset. We designed 90 m × 90 m field plots (n = 71) and identified all trees with a diameter at breast height (DbH) above 10 cm. H′, D2 and S were used to quantify tree species diversity within each plot and the corresponding spectral information on all Landsat-8 bands was extracted from each field plot. A stepwise linear regression was applied to determine the relationship between the species diversity indices (H′, D2 and S) and Principal Components (PCs), vegetation indices and Gray Level Co-occurrence Matrix (GLCM) texture layers with calibration (n = 46) and test (n = 23) datasets. The results of the regression analysis showed that the Simple Ratio Index derivative had a stronger relationship with H′, D2 and S (r2 = 0.36, r2 = 0.41, and r2 = 0.24, respectively) than NDVI, EVI, SAVI or their derivatives. Moreover, the Landsat-8-derived PCs also had a stronger relationship with H′ and D2 (r2 of 0.36 and 0.35, respectively) than the frequently used NDVI, and this was attributed to the utilization of the entire spectral content of Landsat-8 data. Our results indicate that: (i) the measurement scales of vegetation indices impact their sensitivity to vegetation characteristics and their ability to explain tree species diversity; (ii) principal components enhance the utility of Landsat-8 spectral data for estimating tree species diversity; and (iii) species diversity indices that consider both species richness and abundance (H′ and D2) relate better with Landsat-8 spectral variables.
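The plot-level indices used above are computed directly from species counts. A sketch for one hypothetical plot (D2 is shown here in its inverse-Simpson form, an assumption about the variant used):

```python
import numpy as np

def shannon_h(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over species present in the plot."""
    p = np.asarray(counts, dtype=float)
    p = p[p > 0] / p.sum()
    return float(-np.sum(p * np.log(p)))

def simpson_d2(counts):
    """Inverse Simpson index 1 / sum(p_i^2); higher values mean more diversity."""
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()
    return float(1.0 / np.sum(p ** 2))

# hypothetical stem counts per species (DbH > 10 cm) in one 90 m x 90 m plot
counts = [12, 7, 5, 3, 3, 1, 1]
print(f"S = {len(counts)}, H' = {shannon_h(counts):.2f}, D2 = {simpson_d2(counts):.2f}")
```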
Promoter classifier: software package for promoter database analysis.
Gershenzon, Naum I; Ioshikhes, Ilya P
2005-01-01
Promoter Classifier is a package of seven stand-alone Windows-based C++ programs allowing the following basic manipulations with a set of promoter sequences: (i) calculation of positional distributions of nucleotides averaged over all promoters of the dataset; (ii) calculation of the averaged occurrence frequencies of the transcription factor binding sites and their combinations; (iii) division of the dataset into subsets of sequences containing or lacking certain promoter elements or combinations; (iv) extraction of the promoter subsets containing or lacking CpG islands around the transcription start site; and (v) calculation of spatial distributions of the promoter DNA stacking energy and bending stiffness. All programs have a user-friendly interface and provide the results in a convenient graphical form. The Promoter Classifier package is an effective tool for various basic manipulations with eukaryotic promoter sequences that usually are necessary for analysis of large promoter datasets. The program Promoter Divider is described in more detail as a representative component of the package.
The metagenomic data life-cycle: standards and best practices
ten Hoopen, Petra; Finn, Robert D.; Bongo, Lars Ailo; Corre, Erwan; Meyer, Folker; Mitchell, Alex; Pelletier, Eric; Pesole, Graziano; Santamaria, Monica; Willassen, Nils Peder
2017-01-01
Metagenomics data analyses from independent studies can only be compared if the analysis workflows are described in a harmonized way. In this overview, we have mapped the landscape of data standards available for the description of essential steps in metagenomics: (i) material sampling, (ii) material sequencing, (iii) data analysis, and (iv) data archiving and publishing. Taking examples from marine research, we summarize essential variables used to describe material sampling processes and sequencing procedures in a metagenomics experiment. These aspects of metagenomics dataset generation have been to some extent addressed by the scientific community, but greater awareness and adoption is still needed. We emphasize the lack of standards relating to reporting how metagenomics datasets are analysed and how the metagenomics data analysis outputs should be archived and published. We propose best practice as a foundation for a community standard to enable reproducibility and better sharing of metagenomics datasets, leading ultimately to greater metagenomics data reuse and repurposing. PMID:28637310
A dataset mapping the potential biophysical effects of vegetation cover change
NASA Astrophysics Data System (ADS)
Duveiller, Gregory; Hooker, Josh; Cescatti, Alessandro
2018-02-01
Changing the vegetation cover of the Earth has impacts on the biophysical properties of the surface and ultimately on the local climate. Depending on the specific type of vegetation change and on the background climate, the resulting competing biophysical processes can have a net warming or cooling effect, which can further vary both spatially and seasonally. Due to uncertain climate impacts and the lack of robust observations, biophysical effects are not yet considered in land-based climate policies. Here we present a dataset based on satellite remote sensing observations that provides the potential changes i) of the full surface energy balance, ii) at global scale, and iii) for multiple vegetation transitions, as would now be required for the comprehensive evaluation of land based mitigation plans. We anticipate that this dataset will provide valuable information to benchmark Earth system models, to assess future scenarios of land cover change and to develop the monitoring, reporting and verification guidelines required for the implementation of mitigation plans that account for biophysical land processes.
A dataset mapping the potential biophysical effects of vegetation cover change
Duveiller, Gregory; Hooker, Josh; Cescatti, Alessandro
2018-01-01
Changing the vegetation cover of the Earth has impacts on the biophysical properties of the surface and ultimately on the local climate. Depending on the specific type of vegetation change and on the background climate, the resulting competing biophysical processes can have a net warming or cooling effect, which can further vary both spatially and seasonally. Due to uncertain climate impacts and the lack of robust observations, biophysical effects are not yet considered in land-based climate policies. Here we present a dataset based on satellite remote sensing observations that provides the potential changes i) of the full surface energy balance, ii) at global scale, and iii) for multiple vegetation transitions, as would now be required for the comprehensive evaluation of land based mitigation plans. We anticipate that this dataset will provide valuable information to benchmark Earth system models, to assess future scenarios of land cover change and to develop the monitoring, reporting and verification guidelines required for the implementation of mitigation plans that account for biophysical land processes. PMID:29461538
ERIC Educational Resources Information Center
Draper, John
2016-01-01
This article contextualises and presents to the academic community the full dataset of the Isan Culture Maintenance and Revitalisation Programme's (ICMRP) multilingual signage survey. The ICMRP is a four-year European Union co-sponsored project in Northeast Thailand. This article focuses on one aspect of the project, four surveys each of 1,500…
NASA Astrophysics Data System (ADS)
Vionnet, Vincent; Six, Delphine; Auger, Ludovic; Lafaysse, Matthieu; Quéno, Louis; Réveillet, Marion; Dombrowski-Etchevers, Ingrid; Thibert, Emmanuel; Dumont, Marie
2017-04-01
Capturing spatial and temporal variabilities of meteorological conditions at fine scale is necessary for modelling snowpack and glacier winter mass balance in alpine terrain. In particular, precipitation amount and phase are strongly influenced by the complex topography. In this study, we assess the impact of three sub-kilometer precipitation datasets (rainfall and snowfall) on distributed simulations of snowpack and glacier winter mass balance with the detailed snowpack model Crocus for winter 2011-2012. The different precipitation datasets at 500-m grid spacing over part of the French Alps (200 × 200 km2 area) come either from (i) the SAFRAN precipitation analysis specially developed for alpine terrain, (ii) operational outputs of the atmospheric model AROME at 2.5-km grid spacing downscaled to 500 m with a fixed lapse rate, or (iii) a version of the atmospheric model AROME at 500-m grid spacing. Other atmospheric forcings (air temperature and humidity, incoming longwave and shortwave radiation, wind speed) are taken from the AROME simulations at 500-m grid spacing. These atmospheric forcings are first compared against a network of automatic weather stations. Results are analysed with respect to station location (valley, mid- and high-altitude). The spatial pattern of seasonal snowfall and its dependency on elevation is then analysed for the different precipitation datasets. Large differences between SAFRAN and the two versions of AROME are found at high altitude. Finally, results of Crocus snowpack simulations are evaluated against (i) point in-situ measurements of snow depth and snow water equivalent, and (ii) maps of snow covered areas retrieved from optical satellite data (MODIS). Measurements of winter accumulation on six glaciers of the French Alps are also used and provide very valuable information on precipitation at high altitude where the conventional observation network is scarce. This study illustrates the potential and limitations of high-resolution atmospheric models to drive simulations of snowpack and glacier winter mass balance in alpine terrain.
Kinkar, Liina; Laurimäe, Teivi; Acosta-Jamett, Gerardo; Andresiuk, Vanessa; Balkaya, Ibrahim; Casulli, Adriano; Gasser, Robin B; van der Giessen, Joke; González, Luis Miguel; Haag, Karen L; Zait, Houria; Irshadullah, Malik; Jabbar, Abdul; Jenkins, David J; Kia, Eshrat Beigom; Manfredi, Maria Teresa; Mirhendi, Hossein; M'rad, Selim; Rostami-Nejad, Mohammad; Oudni-M'rad, Myriam; Pierangeli, Nora Beatriz; Ponce-Gordo, Francisco; Rehbein, Steffen; Sharbatkhori, Mitra; Simsek, Sami; Soriano, Silvia Viviana; Sprong, Hein; Šnábel, Viliam; Umhang, Gérald; Varcasia, Antonio; Saarma, Urmas
2018-05-19
Echinococcus granulosus sensu stricto (s.s.) is the major cause of human cystic echinococcosis worldwide and is listed among the most severe parasitic diseases of humans. To date, numerous studies have investigated the genetic diversity and population structure of E. granulosus s.s. in various geographic regions. However, there has been no global study. Recently, using mitochondrial DNA, it was shown that E. granulosus s.s. G1 and G3 are distinct genotypes, but a larger dataset is required to confirm the distinction of these genotypes. The objectives of this study were to: (i) investigate the distinction of genotypes G1 and G3 using a large global dataset; and (ii) analyse the genetic diversity and phylogeography of genotype G1 on a global scale using near-complete mitogenome sequences. For this study, 222 globally distributed E. granulosus s.s. samples were used, of which 212 belonged to genotype G1 and 10 to G3. Using a total sequence length of 11,682 bp, we inferred phylogenetic networks for three datasets: E. granulosus s.s. (n = 222), G1 (n = 212) and human G1 samples (n = 41). In addition, the Bayesian phylogenetic and phylogeographic analyses were performed. The latter yielded several strongly supported diffusion routes of genotype G1 originating from Turkey, Tunisia and Argentina. We conclude that: (i) using a considerably larger dataset than employed previously, E. granulosus s.s. G1 and G3 are indeed distinct mitochondrial genotypes; (ii) the genetic diversity of E. granulosus s.s. G1 is high globally, with lower values in South America; and (iii) the complex phylogeographic patterns emerging from the phylogenetic and geographic analyses suggest that the current distribution of genotype G1 has been shaped by intensive animal trade. Copyright © 2018 Australian Society for Parasitology. Published by Elsevier Ltd. All rights reserved.
Effective evaluation of privacy protection techniques in visible and thermal imagery
NASA Astrophysics Data System (ADS)
Nawaz, Tahir; Berg, Amanda; Ferryman, James; Ahlberg, Jörgen; Felsberg, Michael
2017-09-01
Privacy protection may be defined as replacing the original content in an image region with (less intrusive) content whose target appearance information has been modified to make the target less recognizable. The development of privacy protection techniques needs to be complemented by an established objective evaluation method to facilitate their assessment and comparison. Generally, existing evaluation methods rely on the use of subjective judgments or assume a specific target type in image data and use target detection and recognition accuracies to assess privacy protection. An annotation-free evaluation method that is neither subjective nor assumes a specific target type is proposed. It assesses two key aspects of privacy protection: "protection" and "utility." Protection is quantified as an appearance similarity, and utility is measured as a structural similarity between the original and privacy-protected image regions. We performed extensive experimentation using six challenging datasets (comprising 12 video sequences), including a new dataset (with six sequences) that contains visible and thermal imagery. The new dataset is made available online for the community. We demonstrate the effectiveness of the proposed method by evaluating six image-based privacy protection techniques and also show comparisons of the proposed method over existing methods.
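A sketch of the two scores under simple proxy choices: histogram correlation as the appearance similarity (so protection rises as it falls) and SSIM as the structural similarity. These are illustrative stand-ins, not the authors' exact formulation; it assumes scikit-image is installed and uses random arrays in place of real image regions.

```python
import numpy as np
from skimage.metrics import structural_similarity

def appearance_similarity(orig, prot, bins=32):
    """Correlation between grey-level histograms of the two regions
    (a simple proxy for how similar the target still looks)."""
    h1, _ = np.histogram(orig, bins=bins, range=(0, 1), density=True)
    h2, _ = np.histogram(prot, bins=bins, range=(0, 1), density=True)
    return float(np.corrcoef(h1, h2)[0, 1])

def utility(orig, prot):
    """Structural similarity between original and protected regions."""
    return float(structural_similarity(orig, prot, data_range=1.0))

rng = np.random.default_rng(5)
region = rng.random((64, 64))                     # stand-in for an original target region
protected = (region + rng.random((64, 64))) / 2.0  # stand-in for a privacy-protected version

print(f"protection ~ {1 - appearance_similarity(region, protected):.2f}, "
      f"utility ~ {utility(region, protected):.2f}")
```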
The Network Structure Underlying the Earth Observation Assessment
NASA Astrophysics Data System (ADS)
Vitkin, S.; Doane, W. E. J.; Mary, J. C.
2017-12-01
The Earth Observations Assessment (EOA 2016) is a multiyear project designed to assess the effectiveness of civil earth observation data sources (instruments, sensors, models, etc.) on societal benefit areas (SBAs) for the United States. Subject matter experts (SMEs) provided input and scored how data sources inform products, product groups, key objectives, SBA sub-areas, and SBAs in an attempt to quantify the relationships between data sources and SBAs. The resulting data were processed by Integrated Applications Incorporated (IAI) using MITRE's PALMA software to create normalized relative impact scores for each of these relationships. However, PALMA processing obscures the natural network representation of the data. Any network analysis that might identify patterns of interaction among data sources, products, and SBAs is therefore impossible. Collaborating with IAI, we cleaned and recreated a network from the original dataset. Using R and Python we explore the underlying structure of the network and apply frequent itemset mining algorithms to identify groups of data sources and products that interact. We reveal interesting patterns and relationships in the EOA dataset that were not immediately observable from the EOA 2016 report and provide a basis for further exploration of the EOA network dataset.
Hockenberry, Adam J; Pah, Adam R; Jewett, Michael C; Amaral, Luís A N
2017-01-01
Studies dating back to the 1970s established that sequence complementarity between the anti-Shine-Dalgarno (aSD) sequence on prokaryotic ribosomes and the 5' untranslated region of mRNAs helps to facilitate translation initiation. The optimal location of aSD sequence binding relative to the start codon, the full extents of the aSD sequence and the functional form of the relationship between aSD sequence complementarity and translation efficiency have not been fully resolved. Here, we investigate these relationships by leveraging the sequence diversity of endogenous genes and recently available genome-wide estimates of translation efficiency. We show that, after accounting for predicted mRNA structure, aSD sequence complementarity increases the translation of endogenous mRNAs by roughly 50%. Further, we observe that this relationship is nonlinear, with translation efficiency maximized for mRNAs with intermediate levels of aSD sequence complementarity. The mechanistic insights that we observe are highly robust: we find nearly identical results in multiple datasets spanning three distantly related bacteria. Further, we verify our main conclusions by re-analysing a controlled experimental dataset. © 2017 The Authors.
Empirical Studies on the Network of Social Groups: The Case of Tencent QQ
You, Zhi-Qiang; Han, Xiao-Pu; Lü, Linyuan; Yeung, Chi Ho
2015-01-01
Background: Participation in social groups is important, but the collective behaviors of humans as a group are difficult to analyze due to the difficulties of quantifying ordinary social relations and group membership, and of collecting a comprehensive dataset. Such difficulties can be circumvented by analyzing online social networks. Methodology/Principal Findings: In this paper, we analyze a comprehensive dataset released from Tencent QQ, an instant messenger with the highest market share in China. Specifically, we analyze three derivative networks involving groups and their members—the hypergraph of groups, the network of groups and the user network—to reveal social interactions at the microscopic and mesoscopic level. Conclusions/Significance: Our results uncover interesting behaviors on the growth of user groups, the interactions between groups, and their relationship with member age and gender. These findings lead to insights which are difficult to obtain in social networks based on personal contacts. PMID:26176850
A family of interaction-adjusted indices of community similarity.
Schmidt, Thomas Sebastian Benedikt; Matias Rodrigues, João Frederico; von Mering, Christian
2017-03-01
Interactions between taxa are essential drivers of ecological community structure and dynamics, but they are not taken into account by traditional indices of β diversity. In this study, we propose a novel family of indices that quantify community similarity in the context of taxa interaction networks. Using publicly available datasets, we assessed the performance of two specific indices that are Taxa INteraction-Adjusted (TINA, based on taxa co-occurrence networks), and Phylogenetic INteraction-Adjusted (PINA, based on phylogenetic similarities). TINA and PINA outperformed traditional indices when partitioning human-associated microbial communities according to habitat, even for extremely downsampled datasets, and when organising ocean micro-eukaryotic plankton diversity according to geographical and physicochemical gradients. We argue that interaction-adjusted indices capture novel aspects of diversity outside the scope of traditional approaches, highlighting the biological significance of ecological association networks in the interpretation of community similarity.
A family of interaction-adjusted indices of community similarity
Schmidt, Thomas Sebastian Benedikt; Matias Rodrigues, João Frederico; von Mering, Christian
2017-01-01
Interactions between taxa are essential drivers of ecological community structure and dynamics, but they are not taken into account by traditional indices of β diversity. In this study, we propose a novel family of indices that quantify community similarity in the context of taxa interaction networks. Using publicly available datasets, we assessed the performance of two specific indices that are Taxa INteraction-Adjusted (TINA, based on taxa co-occurrence networks), and Phylogenetic INteraction-Adjusted (PINA, based on phylogenetic similarities). TINA and PINA outperformed traditional indices when partitioning human-associated microbial communities according to habitat, even for extremely downsampled datasets, and when organising ocean micro-eukaryotic plankton diversity according to geographical and physicochemical gradients. We argue that interaction-adjusted indices capture novel aspects of diversity outside the scope of traditional approaches, highlighting the biological significance of ecological association networks in the interpretation of community similarity. PMID:27935587
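The exact TINA/PINA formulas are not given in the abstract; as a hedged illustration of the general idea only, the sketch below computes a cosine-style similarity between two relative-abundance vectors under a taxon-by-taxon association kernel (co-occurrence strengths for a TINA-like index, phylogenetic similarities for a PINA-like index). All names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def interaction_adjusted_similarity(a, b, W):
    """a, b: relative-abundance vectors over the same taxa;
    W: taxa x taxa association matrix (co-occurrence or phylogenetic similarity)."""
    a, b, W = np.asarray(a, float), np.asarray(b, float), np.asarray(W, float)
    # Cosine-style similarity of the two communities under the association kernel W.
    return (a @ W @ b) / np.sqrt((a @ W @ a) * (b @ W @ b))
```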
A large dataset of protein dynamics in the mammalian heart proteome
Lau, Edward; Cao, Quan; Ng, Dominic C.M.; Bleakley, Brian J.; Dincer, T. Umut; Bot, Brian M.; Wang, Ding; Liem, David A.; Lam, Maggie P.Y.; Ge, Junbo; Ping, Peipei
2016-01-01
Protein stability is a major regulatory principle of protein function and cellular homeostasis. Despite a limited understanding of the underlying mechanisms, disruption of protein turnover is widely implicated in diverse pathologies, from heart failure to neurodegeneration. Information on global protein dynamics therefore has the potential to expand the depth and scope of disease phenotyping and therapeutic strategies. Using an integrated platform of metabolic labeling, high-resolution mass spectrometry and computational analysis, we report here a comprehensive dataset of the in vivo half-life of 3,228 cardiac proteins and the expression of 8,064 cardiac proteins, quantified under healthy and hypertrophic conditions across six mouse genetic strains commonly employed in biomedical research. We anticipate these data will aid in understanding key mitochondrial and metabolic pathways in heart diseases, and further serve as a reference for methodology development in dynamics studies in multiple organ systems. PMID:26977904
Spatiotemporal Permutation Entropy as a Measure for Complexity of Cardiac Arrhythmia
NASA Astrophysics Data System (ADS)
Schlemmer, Alexander; Berg, Sebastian; Lilienkamp, Thomas; Luther, Stefan; Parlitz, Ulrich
2018-05-01
Permutation entropy (PE) is a robust quantity for measuring the complexity of time series. In the cardiac community it is predominantly used in the context of electrocardiogram (ECG) signal analysis for diagnosis and prediction, with a major application found in heart rate variability parameters. In this article we combine spatial and temporal PE to form a spatiotemporal PE (STPE) that captures both the complexity of spatial structures and temporal complexity at the same time. We demonstrate that the STPE quantifies complexity using two datasets from simulated cardiac arrhythmia and compare it to phase singularity analysis and spatial PE (SPE). These datasets simulate ventricular fibrillation (VF) on a two-dimensional and a three-dimensional medium using the Fenton-Karma model. We show that SPE and STPE are robust against noise and demonstrate their usefulness for extracting complexity features at different spatial scales.
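As a minimal sketch of the quantity underlying SPE and STPE, the following function computes the classical temporal (Bandt-Pompe) permutation entropy of a single time series; the spatiotemporal extension over space-time neighbourhoods described in the abstract is not reproduced here.

```python
from itertools import permutations
from math import factorial, log
import numpy as np

def permutation_entropy(x, order=3, delay=1):
    """Normalised Bandt-Pompe permutation entropy of a 1-D series
    (0 = perfectly regular ordinal statistics, 1 = fully random)."""
    x = np.asarray(x, dtype=float)
    counts = {p: 0 for p in permutations(range(order))}
    n = len(x) - (order - 1) * delay
    for i in range(n):
        window = x[i:i + order * delay:delay]
        counts[tuple(np.argsort(window))] += 1   # ordinal pattern of the window
    probs = np.array([c for c in counts.values() if c > 0], dtype=float) / n
    return float(-np.sum(probs * np.log(probs)) / log(factorial(order)))

# e.g. permutation_entropy(np.sin(np.linspace(0, 20, 500)))  # low PE (regular)
#      permutation_entropy(np.random.rand(500))              # close to 1 (random)
```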
Empirical Studies on the Network of Social Groups: The Case of Tencent QQ.
You, Zhi-Qiang; Han, Xiao-Pu; Lü, Linyuan; Yeung, Chi Ho
2015-01-01
Participation in social groups is important, but the collective behaviors of humans as groups are difficult to analyze because ordinary social relations and group memberships are hard to quantify and comprehensive datasets are hard to collect. Such difficulties can be circumvented by analyzing online social networks. In this paper, we analyze a comprehensive dataset released from Tencent QQ, an instant messenger with the highest market share in China. Specifically, we analyze three derivative networks involving groups and their members-the hypergraph of groups, the network of groups and the user network-to reveal social interactions at the microscopic and mesoscopic levels. Our results uncover interesting behaviors in the growth of user groups, the interactions between groups, and their relationship with member age and gender. These findings lead to insights which are difficult to obtain in social networks based on personal contacts.
Feedback control in deep drawing based on experimental datasets
NASA Astrophysics Data System (ADS)
Fischer, P.; Heingärtner, J.; Aichholzer, W.; Hortig, D.; Hora, P.
2017-09-01
In large-scale production of deep-drawing parts, as in the automotive industry, the effects of scattering material properties as well as warming of the tools have a significant impact on the drawing result. Within the scope of this work, an approach is presented to minimize the influence of these effects on part quality by optically measuring the draw-in of each part and adjusting the settings of the press to keep the strain distribution, which is represented by the draw-in, inside a certain limit. For the design of the control algorithm, a design of experiments for in-line tests is used to quantify the influence of the blank holder force as well as the force distribution on the draw-in. The results of this experimental dataset are used to model the process behavior. Based on this model, a feedback control loop is designed. Finally, the performance of the control algorithm is validated in the production line.
Lefering, Rolf; Huber-Wagner, Stefan; Nienaber, Ulrike; Maegele, Marc; Bouillon, Bertil
2014-09-05
The TraumaRegister DGU™ (TR-DGU) has used the Revised Injury Severity Classification (RISC) score for outcome adjustment since 2003. In recent years, however, the observed mortality rate has fallen to about 2% below the prognosis, and it was felt that further prognostic factors, like pupil size and reaction, should be included as well. Finally, an increasing number of cases did not receive a RISC prognosis due to missing values. Therefore, an updated model for predicting the risk of death in severely injured patients needed to be developed and validated using the most recent data. The TR-DGU has been collecting data from severely injured patients since 1993. All injuries are coded according to the Abbreviated Injury Scale (AIS, version 2008). Severely injured patients from Europe (ISS ≥ 4) documented between 2010 and 2011 were selected for developing the new score (n = 30,866), and 21,918 patients from 2012 were used for validation. Age and injury codes were required, and transferred patients were excluded. Logistic regression analysis was applied with hospital mortality as the dependent variable. Results were evaluated in terms of discrimination (area under the receiver operating characteristic curve, AUC), precision (observed versus predicted mortality), and calibration (Hosmer-Lemeshow goodness-of-fit statistic). The mean age of the development population was 47.3 years; 71.6% were males, and the average ISS was 19.3 points. The hospital mortality rate was 11.5% in this group. The new RISC II model consists of the following predictors: worst and second-worst injury (AIS severity level), head injury, age, sex, pupil reactivity and size, pre-injury health status, blood pressure, acidosis (base deficit), coagulation, haemoglobin, and cardiopulmonary resuscitation. Missing values are included as a separate category for every variable. In both the development and the validation dataset, the new RISC II outperformed the original RISC score; for example, the AUC in the development dataset was 0.953 versus 0.939. The updated RISC II prognostic score has several advantages over the previous RISC model. Discrimination, precision and calibration are improved, and patients with partially missing values can now be included. These results were confirmed in the validation dataset.
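A hedged sketch of the modelling recipe described above: logistic regression for hospital mortality with missing values treated as their own category, evaluated by AUC. The column names and the purely categorical encoding are illustrative assumptions, not the published RISC II specification.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def fit_mortality_model(df, predictors, outcome="hospital_death"):
    """Fit a logistic model in which each predictor is treated as categorical
    and missing values form their own category, then report the in-sample AUC."""
    X = pd.get_dummies(df[predictors].astype("object").fillna("missing"),
                       drop_first=True)
    model = LogisticRegression(max_iter=1000).fit(X, df[outcome])
    auc = roc_auc_score(df[outcome], model.predict_proba(X)[:, 1])
    return model, auc
```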
NASA Astrophysics Data System (ADS)
Qiu, T.; Song, C.
2017-12-01
Many studies have examined urbanization-induced vegetation phenology changes in urban environments at regional scales. However, relatively few studies have investigated the effects of urban expansion on vegetation phenology at the global scale. In this study, we used time series of NASA Vegetation Index and Phenology (VIP) and ESA Climate Change Initiative Land Cover datasets to quantify how urban expansion affects growing seasons of vegetation in 14 different biomes along both latitude and urbanization gradients from 1993 to 2014. First, we calculated the percentages of impervious surface area (ISA) at a 0.05˚ grid to match the spatial resolution of the VIP dataset. We then applied logistic models to the ISA series to characterize the time periods of stable ISA, pre-urbanization and post-urbanization for each grid. The amplitudes of urbanization were also derived from the fitted ISA series. We then calculated the mean values of the Start of Season (SOS), End of Season (EOS) and Length of Season (LOS) from the VIP datasets within each period. Linear regressions were used to quantify the correlations between ISA and SOS/EOS/LOS in 14 biomes along the latitude gradient for each period. We also calculated the differences of SOS/EOS/LOS between pre-urbanization and post-urbanization periods and applied quantile regressions to characterize the relationships between amplitudes of urbanization and those differences. We found significant correlations (p-value < 0.05) between ISA and the growing seasons of a) boreal forests at 55-60 ˚N; b) temperate broadleaf and mixed forests at 30-55 ˚N; c) temperate coniferous forests at 30-45 ˚N; d) temperate grasslands, savannas, and shrublands at 35-60 ˚N and 30-35 ˚S. We also found a significant positive correlation (p-value < 0.05) between amplitudes of urbanization and LOS, as well as a significant negative correlation (p-value < 0.05) between amplitudes of urbanization and SOS, in temperate broadleaf and mixed forest.
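The logistic characterisation of an ISA series can be sketched as below; the functional form, parameter names and synthetic data are illustrative assumptions rather than the authors' exact procedure.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, base, amplitude, rate, t_mid):
    """Logistic growth of impervious surface area (ISA) through time."""
    return base + amplitude / (1.0 + np.exp(-rate * (t - t_mid)))

years = np.arange(1993, 2015)  # study period
# Synthetic ISA series for one grid cell (illustrative only).
isa_obs = logistic(years, 0.05, 0.30, 0.8, 2004) + np.random.normal(0, 0.01, years.size)

params, _ = curve_fit(logistic, years, isa_obs, p0=[0.05, 0.3, 0.5, 2004])
base, amplitude, rate, t_mid = params
# `amplitude` approximates the amplitude of urbanization; `t_mid` separates the
# pre-urbanization and post-urbanization periods for this grid cell.
```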
Assembling a protein-protein interaction map of the SSU processome from existing datasets.
Lim, Young H; Charette, J Michael; Baserga, Susan J
2011-03-10
The small subunit (SSU) processome is a large ribonucleoprotein complex involved in small ribosomal subunit assembly. It consists of the U3 snoRNA and ∼72 proteins. While most of its components have been identified, the protein-protein interactions (PPIs) among them remain largely unknown, and thus the assembly, architecture and function of the SSU processome remains unclear. We queried PPI databases for SSU processome proteins to quantify the degree to which the three genome-wide high-throughput yeast two-hybrid (HT-Y2H) studies, the genome-wide protein fragment complementation assay (PCA) and the literature-curated (LC) datasets cover the SSU processome interactome. We find that coverage of the SSU processome PPI network is remarkably sparse. Two of the three HT-Y2H studies each account for four and six PPIs between only six of the 72 proteins, while the third study accounts for as little as one PPI and two proteins. The PCA dataset has the highest coverage among the genome-wide studies with 27 PPIs between 25 proteins. The LC dataset was the most extensive, accounting for 34 proteins and 38 PPIs, many of which were validated by independent methods, thereby further increasing their reliability. When the collected data were merged, we found that at least 70% of the predicted PPIs have yet to be determined and 26 proteins (36%) have no known partners. Since the SSU processome is conserved in all Eukaryotes, we also queried HT-Y2H datasets from six additional model organisms, but only four orthologues and three previously known interologous interactions were found. This provides a starting point for further work on SSU processome assembly, and spotlights the need for a more complete genome-wide Y2H analysis.
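The coverage statistics quoted above reduce to a simple pairwise calculation; the sketch below (with illustrative inputs) counts observed pairs against all possible pairs and lists proteins without any known partner.

```python
def ppi_coverage(proteins, observed_pairs):
    """Fraction of all possible protein pairs covered by observed PPIs,
    plus the proteins with no known interaction partner."""
    proteins = set(proteins)
    observed = {frozenset(p) for p in observed_pairs if set(p) <= proteins}
    n_possible = len(proteins) * (len(proteins) - 1) // 2
    orphans = proteins - {x for pair in observed for x in pair}
    return len(observed) / n_possible, orphans

# e.g. 38 curated PPIs among 72 proteins cover well under 2% of the
# 72 * 71 / 2 = 2556 possible pairwise interactions.
```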
Assembling a Protein-Protein Interaction Map of the SSU Processome from Existing Datasets
Baserga, Susan J.
2011-01-01
Background The small subunit (SSU) processome is a large ribonucleoprotein complex involved in small ribosomal subunit assembly. It consists of the U3 snoRNA and ∼72 proteins. While most of its components have been identified, the protein-protein interactions (PPIs) among them remain largely unknown, and thus the assembly, architecture and function of the SSU processome remains unclear. Methodology We queried PPI databases for SSU processome proteins to quantify the degree to which the three genome-wide high-throughput yeast two-hybrid (HT-Y2H) studies, the genome-wide protein fragment complementation assay (PCA) and the literature-curated (LC) datasets cover the SSU processome interactome. Conclusions We find that coverage of the SSU processome PPI network is remarkably sparse. Two of the three HT-Y2H studies each account for four and six PPIs between only six of the 72 proteins, while the third study accounts for as little as one PPI and two proteins. The PCA dataset has the highest coverage among the genome-wide studies with 27 PPIs between 25 proteins. The LC dataset was the most extensive, accounting for 34 proteins and 38 PPIs, many of which were validated by independent methods, thereby further increasing their reliability. When the collected data were merged, we found that at least 70% of the predicted PPIs have yet to be determined and 26 proteins (36%) have no known partners. Since the SSU processome is conserved in all Eukaryotes, we also queried HT-Y2H datasets from six additional model organisms, but only four orthologues and three previously known interologous interactions were found. This provides a starting point for further work on SSU processome assembly, and spotlights the need for a more complete genome-wide Y2H analysis. PMID:21423703
Vanderhoof, Melanie; Distler, Hayley; Lang, Megan W.; Alexander, Laurie C.
2018-01-01
The dependence of downstream waters on upstream ecosystems necessitates an improved understanding of watershed-scale hydrological interactions, including connections between wetlands and streams. An evaluation of such connections is challenging when (1) accurate and complete datasets of wetland and stream locations are not available and (2) natural variability in surface-water extent influences the frequency and duration of wetland/stream connectivity. The Upper Choptank River watershed on the Delmarva Peninsula in eastern Maryland and Delaware is dominated by a high density of small, forested wetlands. In this analysis, wetland/stream surface water connections were quantified using multiple wetland and stream datasets, including headwater streams and depressions mapped from a lidar-derived digital elevation model. Surface-water extent was mapped across the watershed for spring 2015 using Landsat-8, Radarsat-2 and Worldview-3 imagery. The frequency of wetland/stream connections increased as a more complete and accurate stream dataset was used and surface-water extent was included, in particular when the spatial resolution of the imagery was finer (i.e., <10 m). Depending on the datasets used, 12–60% of wetlands by count (21–93% of wetlands by area) experienced surface-water interactions with streams during spring 2015. This translated into a range of 50–94% of the watershed contributing direct surface water runoff to streamflow. This finding suggests that our interpretation of the frequency and duration of wetland/stream connections will be influenced not only by the spatial and temporal characteristics of wetlands, streams and potential flowpaths, but also by the completeness, accuracy and resolution of input datasets.
Heino, Jani; Melo, Adriano S; Bini, Luis Mauricio; Altermatt, Florian; Al-Shami, Salman A; Angeler, David G; Bonada, Núria; Brand, Cecilia; Callisto, Marcos; Cottenie, Karl; Dangles, Olivier; Dudgeon, David; Encalada, Andrea; Göthe, Emma; Grönroos, Mira; Hamada, Neusa; Jacobsen, Dean; Landeiro, Victor L; Ligeiro, Raphael; Martins, Renato T; Miserendino, María Laura; Md Rawi, Che Salmah; Rodrigues, Marciel E; Roque, Fabio de Oliveira; Sandin, Leonard; Schmera, Denes; Sgarbi, Luciano F; Simaika, John P; Siqueira, Tadeu; Thompson, Ross M; Townsend, Colin R
2015-03-01
The hypotheses that beta diversity should increase with decreasing latitude and increase with spatial extent of a region have rarely been tested based on a comparative analysis of multiple datasets, and no such study has focused on stream insects. We first assessed how well variability in beta diversity of stream insect metacommunities is predicted by insect group, latitude, spatial extent, altitudinal range, and dataset properties across multiple drainage basins throughout the world. Second, we assessed the relative roles of environmental and spatial factors in driving variation in assemblage composition within each drainage basin. Our analyses were based on a dataset of 95 stream insect metacommunities from 31 drainage basins distributed around the world. We used dissimilarity-based indices to quantify beta diversity for each metacommunity and, subsequently, regressed beta diversity on insect group, latitude, spatial extent, altitudinal range, and dataset properties (e.g., number of sites and percentage of presences). Within each metacommunity, we used a combination of spatial eigenfunction analyses and partial redundancy analysis to partition variation in assemblage structure into environmental, shared, spatial, and unexplained fractions. We found that dataset properties were more important predictors of beta diversity than ecological and geographical factors across multiple drainage basins. In the within-basin analyses, environmental and spatial variables were generally poor predictors of variation in assemblage composition. Our results revealed deviation from general biodiversity patterns because beta diversity did not show the expected decreasing trend with latitude. Our results also call for reconsideration of just how predictable stream assemblages are along ecological gradients, with implications for environmental assessment and conservation decisions. Our findings may also be applicable to other dynamic systems where predictability is low.
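As one hedged example of a dissimilarity-based beta-diversity measure of the kind regressed on latitude, extent and dataset properties above, the sketch below computes the mean pairwise Bray-Curtis dissimilarity among sites; the study itself may use other indices.

```python
from itertools import combinations
import numpy as np

def mean_bray_curtis(site_by_species):
    """Mean pairwise Bray-Curtis dissimilarity among sites.
    site_by_species: 2-D array, rows = sites, columns = species abundances."""
    X = np.asarray(site_by_species, dtype=float)
    d = [np.abs(a - b).sum() / (a + b).sum() for a, b in combinations(X, 2)]
    return float(np.mean(d))
```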
NASA Astrophysics Data System (ADS)
Sidibe, Moussa; Dieppois, Bastien; Mahé, Gil; Paturel, Jean-Emmanuel; Amoussou, Ernest; Anifowose, Babatunde; Lawler, Damian
2018-06-01
Over recent decades, regions of West and Central Africa have experienced different and significant changes in climatic patterns, which have significantly impacted hydrological regimes. Such impacts, however, are not fully understood at the regional scale, largely because of scarce hydroclimatic data. Therefore, the aim of this study is to (a) assemble a new, robust, reconstructed streamflow dataset of 152 gauging stations; (b) quantify changes in streamflow over the 1950-2005 period, using these newly reconstructed datasets; (c) reveal significant trends and variability in streamflow over West and Central Africa based on the new reconstructions; and (d) assess the robustness of this dataset by comparing the results with those identified in key climatic drivers (e.g. precipitation and temperature) over the region. Gap filling methods applied to monthly time series (1950-2005) yielded robust results (median Kling-Gupta Efficiency >0.75). The study underlines a good agreement between precipitation and streamflow trends and reveals contrasts between western Africa (negative trends) and Central Africa (positive trends) in the 1950s and 1960s. Homogeneous dry conditions of the 1970s and 1980s, characterized by reduced significant negative trends resulting from quasi-decadal modulations of the trend, are replaced by wetter conditions in the recent period (1993-2005). The effect of this rainfall recovery (which extends to West and Central Africa) on increased river flows is further amplified by land use change in some Sahelian basins. This is partially offset, however, by higher potential evapotranspiration rates over parts of Niger and Nigeria. Crucially, the new reconstructed streamflow datasets presented here will be available for both the scientific community and water resource managers.
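The Kling-Gupta Efficiency quoted above has a standard form; a minimal sketch follows, assuming paired simulated and observed monthly series.

```python
import numpy as np

def kling_gupta_efficiency(sim, obs):
    """KGE = 1 - sqrt((r - 1)^2 + (alpha - 1)^2 + (beta - 1)^2), where r is the
    linear correlation, alpha the ratio of standard deviations, and beta the
    ratio of means (bias). KGE = 1 indicates a perfect reconstruction."""
    sim, obs = np.asarray(sim, float), np.asarray(obs, float)
    r = np.corrcoef(sim, obs)[0, 1]
    alpha = sim.std() / obs.std()
    beta = sim.mean() / obs.mean()
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
```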
Status update: is smoke on your mind? Using social media to assess smoke exposure
NASA Astrophysics Data System (ADS)
Ford, Bonne; Burke, Moira; Lassman, William; Pfister, Gabriele; Pierce, Jeffrey R.
2017-06-01
Exposure to wildland fire smoke is associated with negative effects on human health. However, these effects are poorly quantified. Accurately attributing health endpoints to wildland fire smoke requires determining the locations, concentrations, and durations of smoke events. Most current methods for assessing these smoke events (ground-based measurements, satellite observations, and chemical transport modeling) are limited temporally, spatially, and/or by their level of accuracy. In this work, we explore using daily social media posts from Facebook regarding smoke, haze, and air quality to assess population-level exposure for the summer of 2015 in the western US. We compare this de-identified, aggregated Facebook dataset to several other datasets that are commonly used for estimating exposure, such as satellite observations (MODIS aerosol optical depth and Hazard Mapping System smoke plumes), daily (24 h) average surface particulate matter measurements, and model-simulated (WRF-Chem) surface concentrations. After adding population-weighted spatial smoothing to the Facebook data, this dataset is well correlated (R2 generally above 0.5) with the other methods in smoke-impacted regions. The Facebook dataset is better correlated with surface measurements of PM2.5 at a majority of monitoring sites (163 of 293 sites) than the satellite observations and our model simulation. We also present an example case for Washington state in 2015, for which we combine this Facebook dataset with MODIS observations and WRF-Chem-simulated PM2.5 in a regression model. We show that the addition of the Facebook data improves the regression model's ability to predict surface concentrations. This high correlation of the Facebook data with surface monitors and our Washington state example suggests that this social-media-based proxy can be used to estimate smoke exposure in locations without direct ground-based particulate matter measurements.
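A hedged sketch of the regression comparison described for the Washington state example: fit surface PM2.5 with and without the social-media proxy and compare the explained variance. The variable names stand for collocated gridded values and are assumptions, not the authors' code.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

def proxy_gain(aod, model_pm25, smoke_posts, obs_pm25):
    """R^2 of a surface-PM2.5 regression without and with the social-media
    smoke proxy as an extra predictor."""
    def fit_r2(X):
        pred = LinearRegression().fit(X, obs_pm25).predict(X)
        return r2_score(obs_pm25, pred)
    X_base = np.column_stack([aod, model_pm25])                 # AOD + model only
    X_full = np.column_stack([aod, model_pm25, smoke_posts])    # plus proxy
    return fit_r2(X_base), fit_r2(X_full)
```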
PROMETHEE II: A knowledge-driven method for copper exploration
NASA Astrophysics Data System (ADS)
Abedi, Maysam; Ali Torabi, S.; Norouzi, Gholam-Hossain; Hamzeh, Mohammad; Elyasi, Gholam-Reza
2012-09-01
This paper describes the application of a well-known Multi Criteria Decision Making (MCDM) technique called Preference Ranking Organization METHod for Enrichment Evaluation (PROMETHEE II) to explore porphyry copper deposits. Various raster-based evidential layers involving geological, geophysical, and geochemical geo-datasets are integrated to prepare a mineral prospectivity mapping (MPM). In a case study, thirteen layers of the Now Chun copper deposit located in the Kerman province of Iran are used to explore the region of interest. The PROMETHEE II technique is applied to produce the desired MPM, and the outputs are validated using twenty-one boreholes that have been classified into five classes. This proposed method shows a high performance when providing the MPM while reducing the cost of exploratory drilling in the study area.
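A minimal sketch of PROMETHEE II with the simple "usual" preference function (prefer one alternative over another on a criterion whenever it scores higher). Real applications, including the prospectivity mapping above, typically use richer per-criterion preference functions and thresholds, so this is illustrative only.

```python
import numpy as np

def promethee_ii(scores, weights):
    """scores: (n_alternatives, n_criteria) array of benefit criteria;
    weights: criterion weights summing to 1.
    Returns net outranking flows; higher values rank better."""
    A = np.asarray(scores, float)
    w = np.asarray(weights, float)
    n = A.shape[0]
    pi = np.zeros((n, n))                       # aggregated preference of i over j
    for i in range(n):
        for j in range(n):
            if i != j:
                pi[i, j] = np.sum(w * (A[i] > A[j]))
    phi_plus = pi.sum(axis=1) / (n - 1)         # positive (leaving) flow
    phi_minus = pi.sum(axis=0) / (n - 1)        # negative (entering) flow
    return phi_plus - phi_minus                 # net flow
```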
Quantification of arrestin-rhodopsin binding stoichiometry.
Lally, Ciara C M; Sommer, Martha E
2015-01-01
We have developed several methods to quantify arrestin-1 binding to rhodopsin in the native rod disk membrane. These methods can be applied to study arrestin interactions with all functional forms of rhodopsin, including dark-state rhodopsin, light-activated metarhodopsin II (Meta II), and the products of Meta II decay, opsin and all-trans-retinal. When used in parallel, these methods report both the actual amount of arrestin bound to the membrane surface and the functional aspects of arrestin binding, such as which arrestin loops are engaged and whether Meta II is stabilized. Most of these methods can also be applied to recombinant receptor reconstituted into liposomes, bicelles, and nanodisks.
Bryndová, Michala; Kasari, Liis; Norberg, Anna; Weiss, Matthias; Bishop, Tom R.; Luke, Sarah H.; Sam, Katerina; Le Bagousse-Pinguet, Yoann; Lepš, Jan; Götzenberger, Lars; de Bello, Francesco
2016-01-01
Functional diversity (FD) is an important component of biodiversity that quantifies the difference in functional traits between organisms. However, FD studies are often limited by the availability of trait data and FD indices are sensitive to data gaps. The distribution of species abundance and trait data, and its transformation, may further affect the accuracy of indices when data is incomplete. Using an existing approach, we simulated the effects of missing trait data by gradually removing data from a plant, an ant and a bird community dataset (12, 59, and 8 plots containing 62, 297 and 238 species respectively). We ranked plots by FD values calculated from full datasets and then from our increasingly incomplete datasets and compared the ranking between the original and virtually reduced datasets to assess the accuracy of FD indices when used on datasets with increasingly missing data. Finally, we tested the accuracy of FD indices with and without data transformation, and the effect of missing trait data per plot or per the whole pool of species. FD indices became less accurate as the amount of missing data increased, with the loss of accuracy depending on the index. But, where transformation improved the normality of the trait data, FD values from incomplete datasets were more accurate than before transformation. The distribution of data and its transformation are therefore as important as data completeness and can even mitigate the effect of missing data. Since the effect of missing trait values pool-wise or plot-wise depends on the data distribution, the method should be decided case by case. Data distribution and data transformation should be given more careful consideration when designing, analysing and interpreting FD studies, especially where trait data are missing. To this end, we provide the R package “traitor” to facilitate assessments of missing trait data. PMID:26881747
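The simulation logic described above can be sketched as follows, assuming a user-supplied FD index; the names and the random-deletion scheme are illustrative rather than the authors' exact protocol (they provide the R package "traitor" for this purpose).

```python
import numpy as np
from scipy.stats import spearmanr

def ranking_accuracy(traits, plot_species, fd_index, missing_fraction, seed=0):
    """Spearman correlation between plot rankings by an FD index computed on the
    full trait matrix and on a copy with values deleted at random.
    traits: (species x traits) array; plot_species: list of species-index arrays;
    fd_index: callable mapping a trait sub-matrix to a single FD value."""
    rng = np.random.default_rng(seed)
    full = np.array([fd_index(traits[p]) for p in plot_species])
    degraded = traits.astype(float).copy()
    degraded[rng.random(degraded.shape) < missing_fraction] = np.nan   # delete data
    partial = np.array([fd_index(degraded[p]) for p in plot_species])
    return spearmanr(full, partial)[0]
```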
A Bayesian trans-dimensional approach for the fusion of multiple geophysical datasets
NASA Astrophysics Data System (ADS)
JafarGandomi, Arash; Binley, Andrew
2013-09-01
We propose a Bayesian fusion approach to integrate multiple geophysical datasets with different coverage and sensitivity. The fusion strategy is based on the capability of various geophysical methods to provide enough resolution to identify either subsurface material parameters or subsurface structure, or both. We focus on electrical resistivity as the target material parameter and electrical resistivity tomography (ERT), electromagnetic induction (EMI), and ground penetrating radar (GPR) as the set of geophysical methods. However, extending the approach to different sets of geophysical parameters and methods is straightforward. Different geophysical datasets are entered into a trans-dimensional Markov chain Monte Carlo (McMC) search-based joint inversion algorithm. The trans-dimensional property of the McMC algorithm allows dynamic parameterisation of the model space, which in turn helps to avoid bias of the post-inversion results towards a particular model. Given that we are attempting to develop an approach that has practical potential, we discretize the subsurface into an array of one-dimensional earth-models. Accordingly, the ERT data that are collected by using two-dimensional acquisition geometry are re-casted to a set of equivalent vertical electric soundings. Different data are inverted either individually or jointly to estimate one-dimensional subsurface models at discrete locations. We use Shannon's information measure to quantify the information obtained from the inversion of different combinations of geophysical datasets. Information from multiple methods is brought together via introducing joint likelihood function and/or constraining the prior information. A Bayesian maximum entropy approach is used for spatial fusion of spatially dispersed estimated one-dimensional models and mapping of the target parameter. We illustrate the approach with a synthetic dataset and then apply it to a field dataset. We show that the proposed fusion strategy is successful not only in enhancing the subsurface information but also as a survey design tool to identify the appropriate combination of the geophysical tools and show whether application of an individual method for further investigation of a specific site is beneficial.
On the uncertainties associated with using gridded rainfall data as a proxy for observed
NASA Astrophysics Data System (ADS)
Tozer, C. R.; Kiem, A. S.; Verdon-Kidd, D. C.
2011-09-01
Gridded rainfall datasets are used in many hydrological and climatological studies, in Australia and elsewhere, including for hydroclimatic forecasting, climate attribution studies and climate model performance assessments. The attraction of the spatial coverage provided by gridded data is clear, particularly in Australia where the spatial and temporal resolution of the rainfall gauge network is sparse. However, the question that must be asked is whether it is suitable to use gridded data as a proxy for observed point data, given that gridded data is inherently "smoothed" and may not necessarily capture the temporal and spatial variability of Australian rainfall which leads to hydroclimatic extremes (i.e. droughts, floods)? This study investigates this question through a statistical analysis of three monthly gridded Australian rainfall datasets - the Bureau of Meteorology (BOM) dataset, the Australian Water Availability Project (AWAP) and the SILO dataset. To demonstrate the hydrological implications of using gridded data as a proxy for gauged data, a rainfall-runoff model is applied to one catchment in South Australia (SA) initially using gridded data as the source of rainfall input and then gauged rainfall data. The results indicate a markedly different runoff response associated with each of the different sources of rainfall data. It should be noted that this study does not seek to identify which gridded dataset is the "best" for Australia, as each gridded data source has its pros and cons, as does gauged or point data. Rather the intention is to quantify differences between various gridded data sources and how they compare with gauged data so that these differences can be considered and accounted for in studies that utilise these gridded datasets. Ultimately, if key decisions are going to be based on the outputs of models that use gridded data, an estimate (or at least an understanding) of the uncertainties relating to the assumptions made in the development of gridded data and how that gridded data compares with reality should be made.
Májeková, Maria; Paal, Taavi; Plowman, Nichola S; Bryndová, Michala; Kasari, Liis; Norberg, Anna; Weiss, Matthias; Bishop, Tom R; Luke, Sarah H; Sam, Katerina; Le Bagousse-Pinguet, Yoann; Lepš, Jan; Götzenberger, Lars; de Bello, Francesco
2016-01-01
Functional diversity (FD) is an important component of biodiversity that quantifies the difference in functional traits between organisms. However, FD studies are often limited by the availability of trait data and FD indices are sensitive to data gaps. The distribution of species abundance and trait data, and its transformation, may further affect the accuracy of indices when data is incomplete. Using an existing approach, we simulated the effects of missing trait data by gradually removing data from a plant, an ant and a bird community dataset (12, 59, and 8 plots containing 62, 297 and 238 species respectively). We ranked plots by FD values calculated from full datasets and then from our increasingly incomplete datasets and compared the ranking between the original and virtually reduced datasets to assess the accuracy of FD indices when used on datasets with increasingly missing data. Finally, we tested the accuracy of FD indices with and without data transformation, and the effect of missing trait data per plot or per the whole pool of species. FD indices became less accurate as the amount of missing data increased, with the loss of accuracy depending on the index. But, where transformation improved the normality of the trait data, FD values from incomplete datasets were more accurate than before transformation. The distribution of data and its transformation are therefore as important as data completeness and can even mitigate the effect of missing data. Since the effect of missing trait values pool-wise or plot-wise depends on the data distribution, the method should be decided case by case. Data distribution and data transformation should be given more careful consideration when designing, analysing and interpreting FD studies, especially where trait data are missing. To this end, we provide the R package "traitor" to facilitate assessments of missing trait data.
Synergistic Instance-Level Subspace Alignment for Fine-Grained Sketch-Based Image Retrieval.
Li, Ke; Pang, Kaiyue; Song, Yi-Zhe; Hospedales, Timothy M; Xiang, Tao; Zhang, Honggang
2017-08-25
We study the problem of fine-grained sketch-based image retrieval. By performing instance-level (rather than category-level) retrieval, it embodies a timely and practical application, particularly with the ubiquitous availability of touchscreens. Three factors contribute to the challenging nature of the problem: (i) free-hand sketches are inherently abstract and iconic, making visual comparisons with photos difficult, (ii) sketches and photos are in two different visual domains, i.e. black and white lines vs. color pixels, and (iii) fine-grained distinctions are especially challenging when executed across domain and abstraction-level. To address these challenges, we propose to bridge the image-sketch gap both at the high-level via parts and attributes, as well as at the low-level, via introducing a new domain alignment method. More specifically, (i) we contribute a dataset with 304 photos and 912 sketches, where each sketch and image is annotated with its semantic parts and associated part-level attributes. With the help of this dataset, we investigate (ii) how strongly-supervised deformable part-based models can be learned that subsequently enable automatic detection of part-level attributes, and provide pose-aligned sketch-image comparisons. To reduce the sketch-image gap when comparing low-level features, we also (iii) propose a novel method for instance-level domain-alignment, that exploits both subspace and instance-level cues to better align the domains. Finally (iv) these are combined in a matching framework integrating aligned low-level features, mid-level geometric structure and high-level semantic attributes. Extensive experiments conducted on our new dataset demonstrate effectiveness of the proposed method.
Wu, Zhaohua; Feng, Jiaxin; Qiao, Fangli; Tan, Zhe-Min
2016-04-13
In this big data era, it is more urgent than ever to solve two major issues: (i) fast data transmission methods that can facilitate access to data from non-local sources and (ii) fast and efficient data analysis methods that can reveal the key information from the available data for particular purposes. Although approaches in different fields to address these two questions may differ significantly, the common part must involve data compression techniques and a fast algorithm. This paper introduces the recently developed adaptive and spatio-temporally local analysis method, namely the fast multidimensional ensemble empirical mode decomposition (MEEMD), for the analysis of a large spatio-temporal dataset. The original MEEMD uses ensemble empirical mode decomposition to decompose time series at each spatial grid and then pieces together the temporal-spatial evolution of climate variability and change on naturally separated timescales, which is computationally expensive. By taking advantage of the high efficiency of the expression using principal component analysis/empirical orthogonal function analysis for spatio-temporally coherent data, we design a lossy compression method for climate data to facilitate its non-local transmission. We also explain the basic principles behind the fast MEEMD through decomposing principal components instead of original grid-wise time series to speed up computation of MEEMD. Using a typical climate dataset as an example, we demonstrate that our newly designed methods can (i) compress data with a compression rate of one to two orders of magnitude; and (ii) speed up the MEEMD algorithm by one to two orders of magnitude. © 2016 The Authors.
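A minimal sketch of the PCA/EOF-based compression idea, assuming a (time x space) data matrix: keep only the leading principal components, which both compresses the field and allows a decomposition such as MEEMD to run on a handful of PC time series instead of on every grid point.

```python
import numpy as np

def pca_compress(data, n_modes):
    """Lossy compression of a (time x space) field: keep only the leading
    principal components (PCs) and their spatial patterns (EOFs)."""
    mean = data.mean(axis=0)
    U, s, Vt = np.linalg.svd(data - mean, full_matrices=False)
    pcs = U[:, :n_modes] * s[:n_modes]   # PC time series
    eofs = Vt[:n_modes]                  # spatial patterns (EOFs)
    return pcs, eofs, mean

def pca_reconstruct(pcs, eofs, mean):
    """Approximate the original field; a decomposition such as MEEMD can then be
    applied to the few PC series rather than to every grid point."""
    return pcs @ eofs + mean
```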
2015-09-30
Sonar-exposure dataset for the long-finned pilot whale (Globicephala melas): a hidden Markov model (HMM) approach was developed to quantify behavioral states during experimental exposures of killer whales (Orcinus orca), long-finned pilot whales (Globicephala melas), sperm whales (Physeter macrocephalus), and humpback whales to naval sonar.
Blind Pose Prediction, Scoring, and Affinity Ranking of the CSAR 2014 Dataset.
Martiny, Virginie Y; Martz, François; Selwa, Edithe; Iorga, Bogdan I
2016-06-27
The 2014 CSAR Benchmark Exercise was focused on three protein targets: coagulation factor Xa, spleen tyrosine kinase, and bacterial tRNA methyltransferase. Our protocol involved a preliminary analysis of the structural information available in the Protein Data Bank for the protein targets, which allowed the identification of the most appropriate docking software and scoring functions to be used for the rescoring of several docking conformations datasets, as well as for pose prediction and affinity ranking. The two key points of this study were (i) the prior evaluation of molecular modeling tools that are most adapted for each target and (ii) the increased search efficiency during the docking process to better explore the conformational space of big and flexible ligands.
Sornborger, Andrew; Broder, Josef; Majumder, Anirban; Srinivasamoorthy, Ganesh; Porter, Erika; Reagin, Sean S; Keith, Charles; Lauderdale, James D
2008-09-01
Ratiometric fluorescent indicators are used for making quantitative measurements of a variety of physiological variables. Their utility is often limited by noise. This is the second in a series of papers describing statistical methods for denoising ratiometric data with the aim of obtaining improved quantitative estimates of variables of interest. Here, we outline a statistical optimization method that is designed for the analysis of ratiometric imaging data in which multiple measurements have been taken of systems responding to the same stimulation protocol. This method takes advantage of correlated information across multiple datasets for objectively detecting and estimating ratiometric signals. We demonstrate our method by showing results of its application on multiple, ratiometric calcium imaging experiments.
Development of vulnerability curves to typhoon hazards based on insurance policy and claim dataset
NASA Astrophysics Data System (ADS)
Mo, Wanmei; Fang, Weihua; Li, Xinze; Wu, Peng; Tong, Xingwei
2016-04-01
Vulnerability refers to the characteristics and circumstances of an exposure that make it susceptible to the effects of certain hazards. It can be divided into physical, social, economic and environmental vulnerability. Physical vulnerability indicates the potential physical damage to exposure caused by natural hazards. Vulnerability curves, which quantify the loss ratio against hazard intensity with a horizontal axis for the intensity and a vertical axis for the Mean Damage Ratio (MDR), are essential to vulnerability assessment and the quantitative evaluation of disasters. Fragility refers to the probability of diverse damage states under different hazard intensities, revealing a characteristic of the exposure. Fragility curves are often used to quantify the probability of a given set of exposure reaching or exceeding a certain damage state. The development of quantitative fragility and vulnerability curves is the basis of catastrophe modeling. Generally, methods for quantitative fragility and vulnerability assessment can be categorized into empirical, analytical and expert opinion or judgment-based approaches. The empirical method is one of the most popular and relies heavily on the availability and quality of historical hazard and loss datasets, which has always been a great challenge. The analytical method is usually based on engineering experiments; it is time-consuming and lacks built-in validation, so its credibility is also sometimes widely criticized. Expert opinion or judgment-based methods are quite effective in the absence of data, but the results can be too subjective, so the uncertainty is likely to be underestimated. In this study, we will present the fragility and vulnerability curves developed with the empirical method, based on simulated historical typhoon wind, rainfall and induced flood, together with insurance policy and claim datasets of more than 100 historical typhoon events. First, an insurance exposure classification system is built according to structure type, occupation type and insurance coverage. Then an MDR estimation method that accounts for insurance policy structure and claim information is proposed and validated. Following that, fragility and vulnerability curves of the major exposure types for construction, homeowner insurance and enterprise property insurance are fitted with empirical functions based on the historical dataset. The results of this study can not only help understand catastrophe risk and manage insured disaster risks, but can also be applied in other disaster risk reduction efforts.
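A hedged sketch of empirical vulnerability-curve fitting: fit a monotone damage function relating MDR to hazard intensity to claim-derived points. The Weibull-type form, the sample values and the parameters are illustrative assumptions, not the study's fitted curves.

```python
import numpy as np
from scipy.optimize import curve_fit

def mdr_curve(v, scale, shape):
    """Monotone damage function: MDR rises from 0 toward 1 with intensity v."""
    return 1.0 - np.exp(-np.power(np.maximum(v, 0.0) / scale, shape))

wind = np.array([20, 25, 30, 35, 40, 45, 50], dtype=float)     # m/s, illustrative
mdr = np.array([0.001, 0.004, 0.02, 0.06, 0.15, 0.30, 0.50])   # claim-derived MDRs

params, _ = curve_fit(mdr_curve, wind, mdr, p0=[60.0, 4.0])
print("fitted scale and shape:", params)
```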
Measuring resilience to financial instability: A new dataset.
Lombardi, Domenico; Siklos, Pierre
2016-12-01
In recognition of the severe consequences of the recent international financial crisis, the topic of macroprudential policy has elicited considerable research effort. The data set reports, for 46 economies around the globe, an index of the capacity to deploy macroprudential policies. An index that aims to represent the essence of what constitutes a macroprudential regime is developed and used in http://dx.doi.org/10.1016/j.jfs.2016.08.007 (D. Lombardi, P.L. Siklos, 2016) [1]. Specifically, the index quantifies: (1) how existing macroprudential frameworks are organized; and (2) how far a particular jurisdiction is from reaching the goals established by the Group of Twenty (G20) and the Financial Stability Board (FSB). The latter is a benchmark that has not been considered in the burgeoning literature that seeks to quantify the role of macroprudential policies.
Panayi, Efstathios; Peters, Gareth W; Kyriakides, George
2017-01-01
Quantifying the effects of environmental factors over the duration of the growing process on Agaricus bisporus (button mushroom) yields has been difficult, as common functional data analysis approaches require fixed-length functional data. The data available from commercial growers, however, are of variable duration, due to commercial considerations. We employ a recently proposed regression technique termed Variable-Domain Functional Regression in order to accommodate these irregular-length datasets. In this way, we are able to quantify the contribution of covariates such as temperature, humidity and water spraying volumes across the growing process, and for different lengths of growing processes. Our results indicate that optimal oxygen and temperature levels vary across the growing cycle and we propose environmental schedules for these covariates to optimise overall yields.
Defect Detection and Segmentation Framework for Remote Field Eddy Current Sensor Data
2017-01-01
Remote-Field Eddy-Current (RFEC) technology is often used as a Non-Destructive Evaluation (NDE) method to prevent water pipe failures. By analyzing the RFEC data, it is possible to quantify the corrosion present in pipes. Quantifying the corrosion involves detecting defects and extracting their depth and shape. For large sections of pipelines, this can be extremely time-consuming if performed manually. Automated approaches are therefore well motivated. In this article, we propose an automated framework to locate and segment defects in individual pipe segments, starting from raw RFEC measurements taken over large pipelines. The framework relies on a novel feature to robustly detect these defects and a segmentation algorithm applied to the deconvolved RFEC signal. The framework is evaluated using both simulated and real datasets, demonstrating its ability to efficiently segment the shape of corrosion defects. PMID:28984823
Panayi, Efstathios; Kyriakides, George
2017-01-01
Quantifying the effects of environmental factors over the duration of the growing process on Agaricus bisporus (button mushroom) yields has been difficult, as common functional data analysis approaches require fixed-length functional data. The data available from commercial growers, however, are of variable duration, due to commercial considerations. We employ a recently proposed regression technique termed Variable-Domain Functional Regression in order to accommodate these irregular-length datasets. In this way, we are able to quantify the contribution of covariates such as temperature, humidity and water spraying volumes across the growing process, and for different lengths of growing processes. Our results indicate that optimal oxygen and temperature levels vary across the growing cycle and we propose environmental schedules for these covariates to optimise overall yields. PMID:28961254
Defect Detection and Segmentation Framework for Remote Field Eddy Current Sensor Data.
Falque, Raphael; Vidal-Calleja, Teresa; Miro, Jaime Valls
2017-10-06
Remote-Field Eddy-Current (RFEC) technology is often used as a Non-Destructive Evaluation (NDE) method to prevent water pipe failures. By analyzing the RFEC data, it is possible to quantify the corrosion present in pipes. Quantifying the corrosion involves detecting defects and extracting their depth and shape. For large sections of pipelines, this can be extremely time-consuming if performed manually. Automated approaches are therefore well motivated. In this article, we propose an automated framework to locate and segment defects in individual pipe segments, starting from raw RFEC measurements taken over large pipelines. The framework relies on a novel feature to robustly detect these defects and a segmentation algorithm applied to the deconvolved RFEC signal. The framework is evaluated using both simulated and real datasets, demonstrating its ability to efficiently segment the shape of corrosion defects.
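As a hedged sketch of the final segmentation step only (the detection feature and the deconvolution proposed in the paper are not reproduced), defects in a deconvolved thickness-like map can be located by thresholding relative wall loss and labelling connected regions.

```python
import numpy as np
from scipy import ndimage

def segment_defects(thickness_map, loss_threshold=0.2):
    """Label connected regions whose relative wall loss exceeds a threshold;
    returns the label image and the number of candidate defects."""
    loss = 1.0 - thickness_map / np.nanmax(thickness_map)   # relative wall loss
    labels, n_defects = ndimage.label(loss > loss_threshold)
    return labels, n_defects
```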
Wan, Shixiang; Zou, Quan
2017-01-01
Multiple sequence alignment (MSA) plays a key role in biological sequence analyses, especially in phylogenetic tree construction. The extreme increase in next-generation sequencing has resulted in a shortage of efficient ultra-large biological sequence alignment approaches capable of coping with different sequence types. Distributed and parallel computing represents a crucial technique for accelerating ultra-large (e.g. files of more than 1 GB) sequence analyses. Based on HAlign and the Spark distributed computing system, we implement a highly cost-efficient and time-efficient HAlign-II tool to address ultra-large multiple biological sequence alignment and phylogenetic tree construction. Experiments on DNA and protein datasets larger than 1 GB showed that HAlign-II saves both time and space and outperforms current software tools. HAlign-II can efficiently carry out MSA and construct phylogenetic trees with ultra-large numbers of biological sequences, shows extremely high memory efficiency, and scales well with increases in computing resources. HAlign-II provides a user-friendly web server based on our distributed computing infrastructure. HAlign-II with open-source codes and datasets was established at http://lab.malab.cn/soft/halign.
NASA Astrophysics Data System (ADS)
Koch, J.; Jensen, K. H.; Stisen, S.
2017-12-01
Hydrological models that integrate numerical process descriptions across compartments of the water cycle are typically required to undergo thorough model calibration in order to estimate suitable effective model parameters. In this study, we apply a spatially distributed hydrological model code which couples the saturated zone with the unsaturated zone and the energy partitioning at the land surface. We conduct a comprehensive multi-constraint model calibration against nine independent observational datasets which reflect both the temporal and the spatial behavior of the hydrological response of a 1000 km2 catchment in Denmark. The datasets are obtained from satellite remote sensing and in-situ measurements and cover five keystone hydrological variables: discharge, evapotranspiration, groundwater head, soil moisture and land surface temperature. Results indicate that a balanced optimization can be achieved in which errors on the objective functions for all nine observational datasets are reduced simultaneously. The applied calibration framework was tailored with a focus on improving the spatial pattern performance; however, results suggest that the optimization is still more prone to improve the temporal dimension of model performance. This study features a post-calibration linear uncertainty analysis, which allows quantifying parameter identifiability, i.e. the worth of a specific observational dataset for inferring model parameter values through calibration. Furthermore, the ability of an observation to reduce predictive uncertainty is assessed. Such findings have concrete implications for the design of model calibration frameworks and, in more general terms, for the acquisition of data in hydrological observatories.
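One common way to balance such a multi-constraint calibration is to normalise each dataset's error by a reference value before aggregation; the sketch below illustrates this idea and is an assumption, not the study's actual objective formulation.

```python
import numpy as np

def aggregated_objective(errors, reference_errors, weights=None):
    """errors / reference_errors: dicts keyed by dataset name (e.g. 'discharge',
    'evapotranspiration', 'groundwater_head', ...). Each error is normalised by
    its reference (e.g. pre-calibration) value so that no single observational
    dataset dominates the optimisation."""
    names = sorted(errors)
    w = np.ones(len(names)) if weights is None else np.asarray(weights, float)
    ratios = np.array([errors[n] / reference_errors[n] for n in names])
    return float(np.sum(w * ratios) / np.sum(w))
```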
2013-01-01
Background Perturbations in intestinal microbiota composition have been associated with a variety of gastrointestinal tract-related diseases. The alleviation of symptoms has been achieved using treatments that alter the gastrointestinal tract microbiota toward that of healthy individuals. Identifying differences in microbiota composition through the use of 16S rRNA gene hypervariable tag sequencing has profound health implications. Current computational methods for comparing microbial communities are usually based on multiple alignments and phylogenetic inference, making them time consuming and requiring exceptional expertise and computational resources. As sequencing data rapidly grows in size, simpler analysis methods are needed to meet the growing computational burdens of microbiota comparisons. Thus, we have developed a simple, rapid, and accurate method, independent of multiple alignments and phylogenetic inference, to support microbiota comparisons. Results We create a metric, called compression-based distance (CBD) for quantifying the degree of similarity between microbial communities. CBD uses the repetitive nature of hypervariable tag datasets and well-established compression algorithms to approximate the total information shared between two datasets. Three published microbiota datasets were used as test cases for CBD as an applicable tool. Our study revealed that CBD recaptured 100% of the statistically significant conclusions reported in the previous studies, while achieving a decrease in computational time required when compared to similar tools without expert user intervention. Conclusion CBD provides a simple, rapid, and accurate method for assessing distances between gastrointestinal tract microbiota 16S hypervariable tag datasets. PMID:23617892
Yang, Fang; Chia, Nicholas; White, Bryan A; Schook, Lawrence B
2013-04-23
Perturbations in intestinal microbiota composition have been associated with a variety of gastrointestinal tract-related diseases. The alleviation of symptoms has been achieved using treatments that alter the gastrointestinal tract microbiota toward that of healthy individuals. Identifying differences in microbiota composition through the use of 16S rRNA gene hypervariable tag sequencing has profound health implications. Current computational methods for comparing microbial communities are usually based on multiple alignments and phylogenetic inference, making them time consuming and requiring exceptional expertise and computational resources. As sequencing data rapidly grows in size, simpler analysis methods are needed to meet the growing computational burdens of microbiota comparisons. Thus, we have developed a simple, rapid, and accurate method, independent of multiple alignments and phylogenetic inference, to support microbiota comparisons. We create a metric, called compression-based distance (CBD) for quantifying the degree of similarity between microbial communities. CBD uses the repetitive nature of hypervariable tag datasets and well-established compression algorithms to approximate the total information shared between two datasets. Three published microbiota datasets were used as test cases for CBD as an applicable tool. Our study revealed that CBD recaptured 100% of the statistically significant conclusions reported in the previous studies, while achieving a decrease in computational time required when compared to similar tools without expert user intervention. CBD provides a simple, rapid, and accurate method for assessing distances between gastrointestinal tract microbiota 16S hypervariable tag datasets.
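A minimal sketch in the spirit of CBD, using the standard normalised compression distance with zlib; the paper's exact CBD formula and compressor may differ, and the file names in the usage comment are hypothetical.

```python
import zlib

def compressed_size(data: bytes) -> int:
    return len(zlib.compress(data, 9))

def compression_distance(x: bytes, y: bytes) -> float:
    """Normalised compression distance between two sequence collections:
    near 0 for highly similar content, approaching 1 for unrelated content."""
    cx, cy, cxy = compressed_size(x), compressed_size(y), compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

# e.g. compression_distance(open("sampleA_tags.fna", "rb").read(),
#                           open("sampleB_tags.fna", "rb").read())
```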
Rapid, semi-automatic fracture and contact mapping for point clouds, images and geophysical data
NASA Astrophysics Data System (ADS)
Thiele, Samuel T.; Grose, Lachlan; Samsu, Anindita; Micklethwaite, Steven; Vollgger, Stefan A.; Cruden, Alexander R.
2017-12-01
The advent of large digital datasets from unmanned aerial vehicle (UAV) and satellite platforms now challenges our ability to extract information across multiple scales in a timely manner, often meaning that the full value of the data is not realised. Here we adapt a least-cost-path solver and specially tailored cost functions to rapidly interpolate structural features between manually defined control points in point cloud and raster datasets. We implement the method in the geographic information system QGIS and the point cloud and mesh processing software CloudCompare. Using these implementations, the method can be applied to a variety of three-dimensional (3-D) and two-dimensional (2-D) datasets, including high-resolution aerial imagery, digital outcrop models, digital elevation models (DEMs) and geophysical grids. We demonstrate the algorithm with four diverse applications in which we extract (1) joint and contact patterns in high-resolution orthophotographs, (2) fracture patterns in a dense 3-D point cloud, (3) earthquake surface ruptures of the Greendale Fault associated with the Mw7.1 Darfield earthquake (New Zealand) from high-resolution light detection and ranging (lidar) data, and (4) oceanic fracture zones from bathymetric data of the North Atlantic. The approach improves the consistency of the interpretation process while retaining expert guidance and achieves significant improvements (35-65 %) in digitisation time compared to traditional methods. Furthermore, it opens up new possibilities for data synthesis and can quantify the agreement between datasets and an interpretation.
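A hedged sketch of the core least-cost-path idea: a plain 8-connected Dijkstra search over a user-supplied cost raster between two control points. The published tool adds specially tailored cost functions and QGIS/CloudCompare front ends, none of which are reproduced here.

```python
import heapq
import numpy as np

def least_cost_path(cost, start, end):
    """Trace the cheapest 8-connected path between two control points over a
    positive cost raster; returns the list of (row, col) cells on the path."""
    rows, cols = cost.shape
    dist = np.full(cost.shape, np.inf)
    prev = {}
    dist[start] = 0.0
    heap = [(0.0, start)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if (r, c) == end:
            break
        if d > dist[r, c]:
            continue                       # stale heap entry
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (dr or dc) and 0 <= nr < rows and 0 <= nc < cols:
                    nd = d + cost[nr, nc] * np.hypot(dr, dc)   # diagonal-aware step cost
                    if nd < dist[nr, nc]:
                        dist[nr, nc] = nd
                        prev[(nr, nc)] = (r, c)
                        heapq.heappush(heap, (nd, (nr, nc)))
    path, node = [end], end
    while node != start:                   # walk back from end to start
        node = prev[node]
        path.append(node)
    return path[::-1]
```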
Soranno, Patricia A; Bissell, Edward G; Cheruvelil, Kendra S; Christel, Samuel T; Collins, Sarah M; Fergus, C Emi; Filstrup, Christopher T; Lapierre, Jean-Francois; Lottig, Noah R; Oliver, Samantha K; Scott, Caren E; Smith, Nicole J; Stopyak, Scott; Yuan, Shuai; Bremigan, Mary Tate; Downing, John A; Gries, Corinna; Henry, Emily N; Skaff, Nick K; Stanley, Emily H; Stow, Craig A; Tan, Pang-Ning; Wagner, Tyler; Webster, Katherine E
2015-01-01
Although there are considerable site-based data for individual or groups of ecosystems, these datasets are widely scattered, have different data formats and conventions, and often have limited accessibility. At the broader scale, national datasets exist for a large number of geospatial features of land, water, and air that are needed to fully understand variation among these ecosystems. However, such datasets originate from different sources and have different spatial and temporal resolutions. By taking an open-science perspective and by combining site-based ecosystem datasets and national geospatial datasets, science gains the ability to ask important research questions related to grand environmental challenges that operate at broad scales. Documentation of such complicated database integration efforts, through peer-reviewed papers, is recommended to foster reproducibility and future use of the integrated database. Here, we describe the major steps, challenges, and considerations in building an integrated database of lake ecosystems, called LAGOS (LAke multi-scaled GeOSpatial and temporal database), that was developed at the sub-continental study extent of 17 US states (1,800,000 km(2)). LAGOS includes two modules: LAGOSGEO, with geospatial data on every lake with surface area larger than 4 ha in the study extent (~50,000 lakes), including climate, atmospheric deposition, land use/cover, hydrology, geology, and topography measured across a range of spatial and temporal extents; and LAGOSLIMNO, with lake water quality data compiled from ~100 individual datasets for a subset of lakes in the study extent (~10,000 lakes). Procedures for the integration of datasets included: creating a flexible database design; authoring and integrating metadata; documenting data provenance; quantifying spatial measures of geographic data; quality-controlling integrated and derived data; and extensively documenting the database. Our procedures make a large, complex, and integrated database reproducible and extensible, allowing users to ask new research questions with the existing database or through the addition of new data. The largest challenge of this task was the heterogeneity of the data, formats, and metadata. Many steps of data integration need manual input from experts in diverse fields, requiring close collaboration.
Soranno, Patricia A.; Bissell, E.G.; Cheruvelil, Kendra S.; Christel, Samuel T.; Collins, Sarah M.; Fergus, C. Emi; Filstrup, Christopher T.; Lapierre, Jean-Francois; Lottig, Noah R.; Oliver, Samantha K.; Scott, Caren E.; Smith, Nicole J.; Stopyak, Scott; Yuan, Shuai; Bremigan, Mary Tate; Downing, John A.; Gries, Corinna; Henry, Emily N.; Skaff, Nick K.; Stanley, Emily H.; Stow, Craig A.; Tan, Pang-Ning; Wagner, Tyler; Webster, Katherine E.
2015-01-01
Although there are considerable site-based data for individual or groups of ecosystems, these datasets are widely scattered, have different data formats and conventions, and often have limited accessibility. At the broader scale, national datasets exist for a large number of geospatial features of land, water, and air that are needed to fully understand variation among these ecosystems. However, such datasets originate from different sources and have different spatial and temporal resolutions. By taking an open-science perspective and by combining site-based ecosystem datasets and national geospatial datasets, science gains the ability to ask important research questions related to grand environmental challenges that operate at broad scales. Documentation of such complicated database integration efforts, through peer-reviewed papers, is recommended to foster reproducibility and future use of the integrated database. Here, we describe the major steps, challenges, and considerations in building an integrated database of lake ecosystems, called LAGOS (LAke multi-scaled GeOSpatial and temporal database), that was developed at the sub-continental study extent of 17 US states (1,800,000 km2). LAGOS includes two modules: LAGOSGEO, with geospatial data on every lake with surface area larger than 4 ha in the study extent (~50,000 lakes), including climate, atmospheric deposition, land use/cover, hydrology, geology, and topography measured across a range of spatial and temporal extents; and LAGOSLIMNO, with lake water quality data compiled from ~100 individual datasets for a subset of lakes in the study extent (~10,000 lakes). Procedures for the integration of datasets included: creating a flexible database design; authoring and integrating metadata; documenting data provenance; quantifying spatial measures of geographic data; quality-controlling integrated and derived data; and extensively documenting the database. Our procedures make a large, complex, and integrated database reproducible and extensible, allowing users to ask new research questions with the existing database or through the addition of new data. The largest challenge of this task was the heterogeneity of the data, formats, and metadata. Many steps of data integration need manual input from experts in diverse fields, requiring close collaboration.
Recent QCD Studies at the Tevatron
DOE Office of Scientific and Technical Information (OSTI.GOV)
Group, Robert Craig
2008-04-01
Since the beginning of Run II at the Fermilab Tevatron, the QCD physics groups of the CDF and D0 experiments have worked to reach unprecedented levels of precision for many QCD observables. Thanks to the large dataset--over 3 fb⁻¹ of integrated luminosity recorded by each experiment--important new measurements have recently been made public and will be summarized in this paper.
Multi-body Dynamic Contact Analysis Tool for Transmission Design
2003-04-01
• Frequencies were computed in COSMIC NASTRAN and were validated against the published experimental modal analysis [17].
• Using assumed time domain ... modal superposition.
• Results from the structural analysis (mode shapes or forced response) were converted into IDEAS universal format (dataset 55 ...).
ARMY RESEARCH LABORATORY, Multi-body Dynamic Contact Analysis Tool for Transmission Design, SBIR Phase II Final Report.
Overview of the 2013 FireFlux II grass fire field experiment
C.B. Clements; B. Davis; D. Seto; J. Contezac; A. Kochanski; J.-B. Fillipi; N. Lareau; B. Barboni; B. Butler; S. Krueger; R. Ottmar; R. Vihnanek; W.E. Heilman; J. Flynn; M.A. Jenkins; J. Mandel; C. Teske; D. Jimenez; J. O' Brien; B. Lefer
2014-01-01
In order to better understand the dynamics of fire-atmosphere interactions and the role of micrometeorology in fire behaviour, the FireFlux campaign was conducted in 2006 on a coastal tall-grass prairie in southeast Texas, USA. The FireFlux campaign dataset has become the international standard for evaluating coupled fire-atmosphere model systems. While FireFlux is one...
Is hyporheic flow an indicator for salmonid spawning site selection?
NASA Astrophysics Data System (ADS)
Benjankar, R. M.; Tonina, D.; Marzadri, A.; McKean, J. A.; Isaak, D.
2015-12-01
Several studies have investigated the role of hydraulic variables in the selection of spawning sites by salmonids. Some recent studies suggest that the intensity of the ambient hyporheic flow, that is, the flow present without a salmon egg pocket, is a cue for spawning site selection, but others have argued against it. We tested this hypothesis by using a unique dataset of field-surveyed spawning site locations and an unprecedented meter-scale resolution bathymetry of a 13.5 km long reach of Bear Valley Creek (Idaho, USA), an important Chinook salmon spawning stream. We used a two-dimensional surface water model to quantify stream hydraulics and a three-dimensional hyporheic model to quantify the hyporheic flows. Our results show that the intensity of ambient hyporheic flows is not a statistically significant variable for spawning site selection. Conversely, the intensity of the water surface curvature and the habitat quality, quantified as a function of stream hydraulics and morphology, are the most important variables for salmonid spawning site selection. KEY WORDS: Salmonid spawning habitat, pool-riffle system, habitat quality, surface water curvature, hyporheic flow
NASA Astrophysics Data System (ADS)
Barra, Adriano; Contucci, Pierluigi; Sandell, Rickard; Vernia, Cecilia
2014-02-01
How does immigrant integration in a country change with immigration density? Guided by a statistical mechanics perspective we propose a novel approach to this problem. The analysis focuses on classical integration quantifiers such as the percentage of jobs (temporary and permanent) given to immigrants, mixed marriages, and newborns with parents of mixed origin. We find that the average values of different quantifiers may exhibit either linear or non-linear growth with immigrant density, and we suggest that social action, a concept identified by Max Weber, causes the observed non-linearity. Using the statistical mechanics notion of interaction to quantitatively emulate social action, a unified mathematical model for integration is proposed, and it is shown to explain both growth behaviors observed. The linear theory, instead, ignoring the possibility of interaction effects, would underestimate the quantifiers by up to 30% when immigrant densities are low, and overestimate them by as much when densities are high. The capacity to quantitatively isolate different types of integration mechanisms makes our framework a suitable tool in the quest for more efficient integration policies.
Dawood, Faten A; Rahmat, Rahmita W; Kadiman, Suhaini B; Abdullah, Lili N; Zamrin, Mohd D
2014-01-01
This paper presents a hybrid method to extract the endocardial contour of the right ventricle (RV) in 4 slices from a 3D echocardiography dataset. The overall framework comprises four processing phases. In Phase I, the region of interest (ROI) is identified by estimating the cavity boundary. Speckle noise reduction and contrast enhancement were implemented in Phase II as preprocessing tasks. In Phase III, the RV cavity region was segmented by generating an intensity threshold which was used once for all frames. Finally, Phase IV extracts the RV endocardial contour over a complete cardiac cycle using a combination of shape-based contour detection and an improved radial search algorithm. The proposed method was applied to 16 datasets of 3D echocardiography encompassing the RV in long-axis view. The accuracy of the experimental results obtained by the proposed method was evaluated qualitatively and quantitatively, by comparing the segmentation results of the RV cavity based on endocardial contour extraction with the ground truth. The comparative analysis shows that the proposed method performs efficiently in all datasets, with an overall performance of 95%, and the root mean square distance (RMSD) for RV endocardial contours, in terms of mean ± SD, was found to be 2.21 ± 0.35 mm.
Northern Hemisphere Nitrous Oxide Morphology during the 1989 AASE and the 1991-1992 AASE 2 Campaigns
NASA Technical Reports Server (NTRS)
Podolske, James R.; Loewenstein, Max; Weaver, Alex; Strahan, Susan; Chan, K. Roland
1993-01-01
Nitrous oxide vertical profiles and latitudinal distributions for the 1989 AASE and 1992 AASE II northern polar winters are developed from the ATLAS N2O dataset, using both potential temperature and pressure as vertical coordinates. Morphologies show strong descent occurring poleward of the polar jet. The AASE II morphology shows a mid latitude 'surf zone,' characterized by strong horizontal mixing, and a horizontal gradient south of 30 deg N due to the sub-tropical jet. These features are similar to those produced by two-dimensional photochemical models which include coupling between transport, radiation, and chemistry.
Measurement of B⁰-B̄⁰ mixing using the MARK II at PEP
DOE Office of Scientific and Technical Information (OSTI.GOV)
Porter, F.
B⁰B̄⁰ mixing has been observed now by several experiments. The signature is the observation of an excess of same-sign dilepton events in datasets containing semileptonic B decays. Several years ago the MARK II published an upper limit on B⁰B̄⁰ mixing at E_cm = 29 GeV, using data taken at the e⁺e⁻ storage ring PEP. Here we report on the results of a new analysis with increased statistics, using refined methods with better sensitivity and control of systematic effects. 10 refs., 2 figs., 2 tab.
Moghram, Basem Ameen; Nabil, Emad; Badr, Amr
2018-01-01
T-cell epitope structure identification is a significant and challenging immunoinformatic problem within epitope-based vaccine design. Epitopes, or antigenic peptides, are sets of amino acids that bind with Major Histocompatibility Complex (MHC) molecules; the bound peptides are presented by Antigen Presenting Cells to be inspected by T-cells. MHC-molecule-binding epitopes are responsible for triggering the immune response to antigens. The epitope's three-dimensional (3D) molecular structure (i.e., tertiary structure) reflects its proper function. Therefore, the identification of MHC class-II epitope structure is a significant step towards epitope-based vaccine design and understanding of the immune system. In this paper, we propose a new technique using a Genetic Algorithm for Predicting the Epitope Structure (GAPES), to predict the structure of MHC class-II epitopes based on their sequence. The proposed Elitist-based genetic algorithm for predicting the epitope's tertiary structure is based on the Ab-Initio Empirical Conformational Energy Program for Peptides (ECEPP) Force Field Model. The developed secondary structure prediction technique relies on the Ramachandran plot. We used two alignment algorithms: ROSS alignment and TM-Score alignment. We applied four different alignment approaches to calculate the similarity scores of the dataset under test. We utilized the support vector machine (SVM) classifier to evaluate the prediction performance. The prediction accuracy and the Area Under the Receiver Operating Characteristic (ROC) Curve (AUC) were calculated as measures of performance. The calculations were performed on twelve similarity-reduced datasets of the Immune Epitope Database (IEDB) and a large dataset of peptide-binding affinities to HLA-DRB1*0101. The results showed that GAPES was reliable and very accurate. We achieved an average prediction accuracy of 93.50% and an average AUC of 0.974 on the IEDB datasets. Also, we achieved an accuracy of 95.125% and an AUC of 0.987 on the HLA-DRB1*0101 allele of the Wang benchmark dataset. The results indicate that the proposed prediction technique "GAPES" is a promising technique that will help researchers and scientists to predict protein structure and assist them in the intelligent design of new epitope-based vaccines. Copyright © 2017 Elsevier B.V. All rights reserved.
Arsenic mobilization and immobilization in paddy soils
NASA Astrophysics Data System (ADS)
Kappler, A.; Hohmann, C.; Zhu, Y. G.; Morin, G.
2010-05-01
Arsenic is oftentimes of geogenic origin and in many cases bound to iron(III) minerals. Iron(III)-reducing bacteria can harvest energy by coupling the oxidation of organic or inorganic electron donors to the reduction of Fe(III). This process leads either to dissolution of Fe(III)-containing minerals and thus to a release of the arsenic into the environment or to secondary Fe-mineral formation and immobilisation of arsenic. Additionally, aerobic and anaerobic iron(II)-oxidizing bacteria have the potential to co-precipitate or sorb arsenic during iron(II) oxidation at neutral pH that is usually followed by iron(III) mineral precipitation. We are currently investigating arsenic immobilization by Fe(III)-reducing bacteria and arsenic co-precipitation and immobilization by anaerobic iron(II)-oxidizing bacteria in batch, microcosm and rice pot experiments. Co-precipitation batch experiments with pure cultures of nitrate-dependent Fe(II)-oxidizing bacteria are used to quantify the amount of arsenic that can be immobilized during microbial iron mineral precipitation, to identify the minerals formed and to analyze the arsenic binding environment in the precipitates. Microcosm and rice pot experiments are set-up with arsenic-contaminated rice paddy soil. The microorganisms (either the native microbial population or the soil amended with the nitrate-dependent iron(II)-oxidizing Acidovorax sp. strain BoFeN1) are stimulated either with iron(II), nitrate, or oxygen. Dissolved and solid-phase arsenic and iron are quantified. Iron and arsenic speciation and redox state in batch and microcosm experiments are determined by LC-ICP-MS and synchrotron-based methods (EXAFS, XANES).
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lee, Danny; Pollock, Sean; Keall, Paul, E-mail: paul.keall@sydney.edu.au
2016-05-15
Purpose: The dynamic keyhole is a new MR image reconstruction method for thoracic and abdominal MR imaging. To date, this method has not been investigated with cancer patient magnetic resonance imaging (MRI) data. The goal of this study was to assess the dynamic keyhole method for the task of lung tumor localization using cine-MR images reconstructed in the presence of respiratory motion. Methods: The dynamic keyhole method utilizes a previously acquired library of peripheral k-space datasets at similar displacement and phase (where phase is simply used to determine whether the breathing is inhale-to-exhale or exhale-to-inhale) respiratory bins, in conjunction with newly acquired central k-space datasets (keyhole). External respiratory signals drive the process of sorting, matching, and combining the two k-space streams for each respiratory bin, thereby achieving faster image acquisition without substantial motion artifacts. This study was the first to investigate the impact of k-space undersampling on lung tumor motion and area assessment across clinically available techniques (zero-filling and conventional keyhole). In this study, the dynamic keyhole, conventional keyhole, and zero-filling methods were compared to full k-space dataset acquisition by quantifying (1) the keyhole size required for central k-space datasets for constant image quality across sixty-four cine-MRI datasets from nine lung cancer patients, (2) the intensity difference between the original and reconstructed images at a constant keyhole size, and (3) the accuracy of tumor motion and area directly measured by tumor autocontouring. Results: For constant image quality, the dynamic keyhole, conventional keyhole, and zero-filling methods required 22%, 34%, and 49% of the full k-space size (P < 0.0001), respectively. Compared to the conventional keyhole and zero-filling reconstructed images at the keyhole size utilized in the dynamic keyhole method, the average intensity difference of the dynamic keyhole reconstructed images was minimal (P < 0.0001), resulting in tumor motion accuracy within 99.6% (P < 0.0001) and tumor area accuracy within 98.0% (P < 0.0001) for lung tumor monitoring applications. Conclusions: This study demonstrates that the dynamic keyhole method is a promising technique for clinical applications such as image-guided radiation therapy requiring MR monitoring of thoracic tumors. Based on the results from this study, the dynamic keyhole method could increase the imaging frequency by up to a factor of five compared with full k-space methods for real-time lung tumor MRI.
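As a rough illustration of the keyhole principle described above, the sketch below combines a freshly acquired central band of k-space with library peripheral k-space from a matching respiratory bin before an inverse FFT. The array shapes, the 22% keyhole fraction, and the omission of the bin-matching logic driven by the external respiratory signal are simplifying assumptions.

```python
# Sketch of the keyhole idea: keep only a central band of freshly acquired
# k-space and fill the periphery from a library frame at a matching
# respiratory displacement/phase bin. Shapes and the keyhole fraction are
# illustrative assumptions.
import numpy as np

def dynamic_keyhole_reconstruct(central_kspace, library_kspace, keyhole_fraction=0.22):
    """central_kspace, library_kspace: complex 2-D k-space arrays (ky, kx)."""
    ny = central_kspace.shape[0]
    half = int(round(ny * keyhole_fraction / 2))
    centre = ny // 2
    combined = library_kspace.copy()
    # overwrite the central phase-encode lines (the "keyhole") with new data
    combined[centre - half:centre + half, :] = central_kspace[centre - half:centre + half, :]
    image = np.abs(np.fft.ifft2(np.fft.ifftshift(combined)))
    return image
```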
NASA Astrophysics Data System (ADS)
Hancock, Matthew C.; Magnan, Jerry F.
2017-03-01
To determine the potential usefulness of quantified diagnostic image features as inputs to a CAD system, we investigate the predictive capabilities of statistical learning methods for classifying nodule malignancy, utilizing the Lung Image Database Consortium (LIDC) dataset, and only employ the radiologist-assigned diagnostic feature values for the lung nodules therein, as well as our derived estimates of the diameter and volume of the nodules from the radiologists' annotations. We calculate theoretical upper bounds on the classification accuracy that is achievable by an ideal classifier that only uses the radiologist-assigned feature values, and we obtain an accuracy of 85.74 (+/-1.14)%, which is, on average, 4.43% below the theoretical maximum of 90.17%. The corresponding area-under-the-curve (AUC) score is 0.932 (+/-0.012), which increases to 0.949 (+/-0.007) when diameter and volume features are included; the accuracy correspondingly rises to 88.08 (+/-1.11)%. Our results are comparable to those in the literature that use algorithmically derived image-based features, which supports our hypothesis that lung nodules can be classified as malignant or benign using only quantified, diagnostic image features, and indicates the competitiveness of this approach. We also analyze how the classification accuracy depends on specific features and feature subsets, and we rank the features according to their predictive power, statistically demonstrating the top four to be spiculation, lobulation, subtlety, and calcification.
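A minimal sketch of this kind of experiment, assuming a hypothetical CSV export of the radiologist-assigned feature values, is shown below; the choice of a random forest and the column names are illustrative and do not reproduce the authors' exact classifiers or the theoretical upper-bound calculation.

```python
# Sketch: classify nodule malignancy from radiologist-assigned feature values
# with cross-validated accuracy and AUC. The CSV, column names and the choice
# of a random forest are illustrative assumptions, not the authors' setup.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

df = pd.read_csv("lidc_features.csv")            # hypothetical export of LIDC annotations
features = ["spiculation", "lobulation", "subtlety", "calcification",
            "margin", "sphericity", "texture", "diameter", "volume"]
X, y = df[features], df["malignant"]

clf = RandomForestClassifier(n_estimators=500, random_state=0)
acc = cross_val_score(clf, X, y, cv=10, scoring="accuracy")
auc = cross_val_score(clf, X, y, cv=10, scoring="roc_auc")
print(f"accuracy {acc.mean():.3f} +/- {acc.std():.3f}, AUC {auc.mean():.3f} +/- {auc.std():.3f}")
```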
Quantifying the benefits of vehicle pooling with shareability networks
Santi, Paolo; Resta, Giovanni; Szell, Michael; Sobolevsky, Stanislav; Strogatz, Steven H.; Ratti, Carlo
2014-01-01
Taxi services are a vital part of urban transportation, and a considerable contributor to traffic congestion and air pollution causing substantial adverse effects on human health. Sharing taxi trips is a possible way of reducing the negative impact of taxi services on cities, but this comes at the expense of passenger discomfort quantifiable in terms of a longer travel time. Due to computational challenges, taxi sharing has traditionally been approached on small scales, such as within airport perimeters, or with dynamical ad hoc heuristics. However, a mathematical framework for the systematic understanding of the tradeoff between collective benefits of sharing and individual passenger discomfort is lacking. Here we introduce the notion of shareability network, which allows us to model the collective benefits of sharing as a function of passenger inconvenience, and to efficiently compute optimal sharing strategies on massive datasets. We apply this framework to a dataset of millions of taxi trips taken in New York City, showing that with increasing but still relatively low passenger discomfort, cumulative trip length can be cut by 40% or more. This benefit comes with reductions in service cost, emissions, and with split fares, hinting toward a wide passenger acceptance of such a shared service. Simulation of a realistic online system demonstrates the feasibility of a shareable taxi service in New York City. Shareability as a function of trip density saturates fast, suggesting effectiveness of the taxi sharing system also in cities with much sparser taxi fleets or when willingness to share is low. PMID:25197046
Quantifying the benefits of vehicle pooling with shareability networks.
Santi, Paolo; Resta, Giovanni; Szell, Michael; Sobolevsky, Stanislav; Strogatz, Steven H; Ratti, Carlo
2014-09-16
Taxi services are a vital part of urban transportation, and a considerable contributor to traffic congestion and air pollution causing substantial adverse effects on human health. Sharing taxi trips is a possible way of reducing the negative impact of taxi services on cities, but this comes at the expense of passenger discomfort quantifiable in terms of a longer travel time. Due to computational challenges, taxi sharing has traditionally been approached on small scales, such as within airport perimeters, or with dynamical ad hoc heuristics. However, a mathematical framework for the systematic understanding of the tradeoff between collective benefits of sharing and individual passenger discomfort is lacking. Here we introduce the notion of shareability network, which allows us to model the collective benefits of sharing as a function of passenger inconvenience, and to efficiently compute optimal sharing strategies on massive datasets. We apply this framework to a dataset of millions of taxi trips taken in New York City, showing that with increasing but still relatively low passenger discomfort, cumulative trip length can be cut by 40% or more. This benefit comes with reductions in service cost, emissions, and with split fares, hinting toward a wide passenger acceptance of such a shared service. Simulation of a realistic online system demonstrates the feasibility of a shareable taxi service in New York City. Shareability as a function of trip density saturates fast, suggesting effectiveness of the taxi sharing system also in cities with much sparser taxi fleets or when willingness to share is low.
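A minimal sketch of the shareability-network construction is given below: trips become nodes, an edge joins two trips that could plausibly be combined within a delay budget, and a maximum matching selects the shared pairs. The pairwise feasibility test here is a deliberately crude placeholder for a proper routing and scheduling check, and the trip-dictionary fields are illustrative assumptions.

```python
# Sketch of a shareability network: trips are nodes, an edge joins two trips
# that could be served by one vehicle within a passenger delay budget, and a
# maximum matching picks the shared pairs.
import networkx as nx

def shareable(trip_a, trip_b, max_delay_s=300):
    # trips: dicts with an 'id', a 'start' time (s) and a precomputed pairwise detour (s)
    return (abs(trip_a["start"] - trip_b["start"]) <= max_delay_s and
            trip_a.get("pair_detour", {}).get(trip_b["id"], 1e9) <= max_delay_s)

def max_sharing(trips):
    g = nx.Graph()
    g.add_nodes_from(t["id"] for t in trips)
    for i, a in enumerate(trips):
        for b in trips[i + 1:]:
            if shareable(a, b):
                g.add_edge(a["id"], b["id"])
    # maximum-cardinality matching = largest number of shared trip pairs
    return nx.max_weight_matching(g, maxcardinality=True)
```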
Paliwal, Nikhil; Damiano, Robert J; Varble, Nicole A; Tutino, Vincent M; Dou, Zhongwang; Siddiqui, Adnan H; Meng, Hui
2017-12-01
Computational fluid dynamics (CFD) is a promising tool to aid in clinical diagnoses of cardiovascular diseases. However, it uses assumptions that simplify the complexities of the real cardiovascular flow. Due to the high stakes in the clinical setting, it is critical to calculate the effect of these assumptions on the CFD simulation results. However, existing CFD validation approaches do not quantify error in the simulation results due to the CFD solver's modeling assumptions. Instead, they directly compare CFD simulation results against validation data. Thus, to quantify the accuracy of a CFD solver, we developed a validation methodology that calculates the CFD model error (arising from modeling assumptions). Our methodology identifies independent error sources in CFD and validation experiments, and calculates the model error by parsing out other sources of error inherent in simulation and experiments. To demonstrate the method, we simulated the flow field of a patient-specific intracranial aneurysm (IA) in the commercial CFD software STAR-CCM+. Particle image velocimetry (PIV) provided validation datasets for the flow field on two orthogonal planes. The average model error in the STAR-CCM+ solver was 5.63 ± 5.49% along the intersecting validation line of the orthogonal planes. Furthermore, we demonstrated that our validation method is superior to existing validation approaches by applying three representative existing validation techniques to our CFD and experimental dataset, and comparing the validation results. Our validation methodology offers a streamlined workflow to extract the "true" accuracy of a CFD solver.
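One standard way to formalise this parsing-out of error sources is the ASME V&V 20-style decomposition sketched below, where S is the simulated quantity, D the PIV datum, and the delta terms are the modeling, numerical, input-parameter and experimental errors; it is offered as a hedged illustration and may differ from the exact bookkeeping used in the paper.

```latex
% Hedged sketch (ASME V&V 20-style): the observed comparison error E is the
% simulation result S minus the PIV datum D, and bounds on the modeling error
% follow once numerical, input and experimental uncertainties are quantified.
\begin{align*}
  E &= S - D = \delta_{\mathrm{model}} + \delta_{\mathrm{num}} + \delta_{\mathrm{input}} - \delta_{D},\\
  u_{\mathrm{val}} &= \sqrt{u_{\mathrm{num}}^{2} + u_{\mathrm{input}}^{2} + u_{D}^{2}},
  \qquad \delta_{\mathrm{model}} \in \left[\, E - u_{\mathrm{val}},\; E + u_{\mathrm{val}} \,\right].
\end{align*}
```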
Quantifying and Mapping Global Data Poverty
2015-01-01
Digital information technologies, such as the Internet, mobile phones and social media, provide vast amounts of data for decision-making and resource management. However, access to these technologies, as well as their associated software and training materials, is not evenly distributed: since the 1990s there has been concern about a "Digital Divide" between the data-rich and the data-poor. We present an innovative metric for evaluating international variations in access to digital data: the Data Poverty Index (DPI). The DPI is based on Internet speeds, numbers of computer owners and Internet users, mobile phone ownership and network coverage, as well as provision of higher education. The datasets used to produce the DPI are provided annually for almost all the countries of the world and can be freely downloaded. The index that we present in this 'proof of concept' study is the first to quantify and visualise the problem of global data poverty, using the most recent datasets, for 2013. The effects of severe data poverty, particularly limited access to geoinformatic data, free software and online training materials, are discussed in the context of sustainable development and disaster risk reduction. The DPI highlights countries where support is needed for improving access to the Internet and for the provision of training in geoinformatics. We conclude that the DPI is of value as a potential metric for monitoring the Sustainable Development Goals of the Sendai Framework for Disaster Risk Reduction. PMID:26560884
NASA Astrophysics Data System (ADS)
Yiannikopoulou, I.; Philippopoulos, K.; Deligiorgi, D.
2012-04-01
The vertical thermal structure of the atmosphere is defined by a combination of dynamic and radiation transfer processes and plays an important role in describing the meteorological conditions at local scales. The scope of this work is to develop and quantify the predictive ability of a hybrid dynamic-statistical downscaling procedure to estimate the vertical profile of ambient temperature at finer spatial scales. The study focuses on the warm period of the year (June - August) and the method is applied to an urban coastal site (Hellinikon), located in the eastern Mediterranean. The two-step methodology initially involves the dynamic downscaling of coarse resolution climate data via the RegCM4.0 regional climate model and subsequently the statistical downscaling of the modeled outputs by developing and training site-specific artificial neural networks (ANN). The 2.5° × 2.5° gridded NCEP-DOE Reanalysis 2 dataset is used as initial and boundary conditions for the dynamic downscaling element of the methodology, which enhances the regional representativeness of the dataset to 20 km and provides modeled fields at 18 vertical levels. The regional climate modeling results are compared against the upper-air Hellinikon radiosonde observations and the mean absolute error (MAE) is calculated between the four grid point values nearest to the station and the ambient temperature at the standard and significant pressure levels. The statistical downscaling element of the methodology consists of an ensemble of ANN models, one for each pressure level, which are trained separately and employ the regional scale RegCM4.0 output. The ANN models are theoretically capable of estimating any measurable input-output function to any desired degree of accuracy. In this study they are used as non-linear function approximators for identifying the relationship between a number of predictor variables and the ambient temperature at the various vertical levels. Insight into the statistically derived input-output transfer functions is obtained by utilizing the ANN weights method, which quantifies the relative importance of the predictor variables in the estimation procedure. The overall downscaling performance evaluation incorporates a set of correlation and statistical measures along with appropriate statistical tests. The hybrid downscaling method presented in this work can be extended to various locations by training different site-specific ANN models and the results, depending on the application, can be used to assist the understanding of past, present and future climatology. ____________________________ This research has been co-financed by the European Union and Greek national funds through the Operational Program "Education and Lifelong Learning" of the National Strategic Reference Framework (NSRF) - Research Funding Program: Heracleitus II: Investing in knowledge society through the European Social Fund.
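A minimal sketch of the statistical element, with one small neural-network regressor per pressure level mapping RegCM4.0-derived predictors to observed radiosonde temperature, is shown below; the predictor layout, network size and train/test split are illustrative assumptions rather than the configuration used in the study.

```python
# Sketch: one small neural-network regressor per pressure level, mapping
# regional-model predictors to observed temperature at that level.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

def train_level_models(X_by_level, t_obs_by_level):
    """X_by_level[p]: (n_days, n_predictors) RegCM outputs; t_obs_by_level[p]: (n_days,)."""
    models = {}
    for level, X in X_by_level.items():
        y = t_obs_by_level[level]
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
        model = make_pipeline(StandardScaler(),
                              MLPRegressor(hidden_layer_sizes=(16,), max_iter=5000,
                                           random_state=0))
        model.fit(X_tr, y_tr)
        mae = np.mean(np.abs(model.predict(X_te) - y_te))   # held-out mean absolute error
        models[level] = (model, mae)
    return models
```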
Dissecting the space-time structure of tree-ring datasets using the partial triadic analysis.
Rossi, Jean-Pierre; Nardin, Maxime; Godefroid, Martin; Ruiz-Diaz, Manuela; Sergent, Anne-Sophie; Martinez-Meier, Alejandro; Pâques, Luc; Rozenberg, Philippe
2014-01-01
Tree-ring datasets are used in a variety of circumstances, including archeology, climatology, forest ecology, and wood technology. These data are based on microdensity profiles and consist of a set of tree-ring descriptors, such as ring width or early/latewood density, measured for a set of individual trees. Because successive rings correspond to successive years, the resulting dataset is a ring variables × trees × time datacube. Multivariate statistical analyses, such as principal component analysis, have been widely used for extracting worthwhile information from ring datasets, but they typically address two-way matrices, such as ring variables × trees or ring variables × time. Here, we explore the potential of the partial triadic analysis (PTA), a multivariate method dedicated to the analysis of three-way datasets, to apprehend the space-time structure of tree-ring datasets. We analyzed a set of 11 tree-ring descriptors measured in 149 georeferenced individuals of European larch (Larix decidua Miller) during the period of 1967-2007. The processing of densitometry profiles led to a set of ring descriptors for each tree and for each year from 1967-2007. The resulting three-way data table was subjected to two distinct analyses in order to explore i) the temporal evolution of spatial structures and ii) the spatial structure of temporal dynamics. We report the presence of a spatial structure common to the different years, highlighting the inter-individual variability of the ring descriptors at the stand scale. We found a temporal trajectory common to the trees that could be separated into a high and low frequency signal, corresponding to inter-annual variations possibly related to defoliation events and a long-term trend possibly related to climate change. We conclude that PTA is a powerful tool to unravel and hierarchize the different sources of variation within tree-ring datasets.
MSWEP V2 global 3-hourly 0.1° precipitation: methodology and quantitative appraisal
NASA Astrophysics Data System (ADS)
Beck, H.; Yang, L.; Pan, M.; Wood, E. F.; William, L.
2017-12-01
Here, we present Multi-Source Weighted-Ensemble Precipitation (MSWEP) V2, the first fully global gridded precipitation (P) dataset with a 0.1° spatial resolution. The dataset covers the period 1979-2016, has a 3-hourly temporal resolution, and was derived by optimally merging a wide range of data sources based on gauges (WorldClim, GHCN-D, GSOD, and others), satellites (CMORPH, GridSat, GSMaP, and TMPA 3B42RT), and reanalyses (ERA-Interim, JRA-55, and NCEP-CFSR). MSWEP V2 implements some major improvements over V1, such as (i) the correction of distributional P biases using cumulative distribution function matching, (ii) increasing the spatial resolution from 0.25° to 0.1°, (iii) the inclusion of ocean areas, (iv) the addition of NCEP-CFSR P estimates, (v) the addition of thermal infrared-based P estimates for the pre-TRMM era, (vi) the addition of 0.1° daily interpolated gauge data, (vii) the use of a daily gauge correction scheme that accounts for regional differences in the 24-hour accumulation period of gauges, and (viii) extension of the data record to 2016. The gauge-based assessment of the reanalysis and satellite P datasets, necessary for establishing the merging weights, revealed that the reanalysis datasets strongly overestimate the P frequency for the entire globe, and that the satellite (resp. reanalysis) datasets consistently performed better at low (high) latitudes. Compared to other state-of-the-art P datasets, MSWEP V2 exhibits more plausible global patterns in mean annual P, percentiles, and annual number of dry days, and better resolves the small-scale variability over topographically complex terrain. Other P datasets appear to consistently underestimate P amounts over mountainous regions. Long-term mean P estimates for the global, land, and ocean domains based on MSWEP V2 are 959, 796, and 1026 mm/yr, respectively, in close agreement with the best previous published estimates.
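The core of improvement (i), correcting distributional biases by cumulative distribution function matching, can be sketched as a simple quantile-mapping step; the snippet below is a bare-bones illustration of the idea and omits the operational safeguards (wet-day handling, regional weighting) used in MSWEP V2.

```python
# Sketch of distributional bias correction: empirical CDF (quantile) matching
# of a satellite/reanalysis precipitation series against a gauge-based
# reference at the same grid cell.
import numpy as np

def cdf_match(source, reference, n_quantiles=100):
    """Map 'source' values onto the distribution of 'reference'."""
    q = np.linspace(0.0, 1.0, n_quantiles)
    src_q = np.quantile(source, q)        # empirical quantiles of the biased series
    ref_q = np.quantile(reference, q)     # empirical quantiles of the reference
    return np.interp(source, src_q, ref_q)
```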
Quantifying Temperature Effects on Snow, Plant and Streamflow Dynamics in Headwater Catchments
NASA Astrophysics Data System (ADS)
Wainwright, H. M.; Sarah, T.; Siirila-Woodburn, E. R.; Newcomer, M. E.; Williams, K. H.; Hubbard, S. S.; Enquist, B. J.; Steltzer, H.; Carroll, R. W. H.
2017-12-01
Snow-dominated headwater catchments are critical for water resources throughout the world, particularly in the western US. Under climate change, temperature increases are expected to be amplified in mountainous regions. We use a data-driven approach to better understand the coupling among inter-annual variability in temperature, snow and plant community dynamics and stream discharge. We apply data mining methods (e.g., principal component analysis, random forest) to historical spatiotemporal datasets, including the SNOTEL data, Landsat-based normalized difference vegetation index (NDVI) and airborne LiDAR-based snow distribution. Although both snow distribution and NDVI are extremely heterogeneous spatially, the inter-annual variability and temporal responses are spatially consistent, providing an opportunity to quantify the effect of temperature at the catchment scale. We demonstrate our approach in the East River Watershed of the Upper Colorado River Basin, including the Rocky Mountain Biological Laboratory, where the changes in plant communities and their dynamics have been extensively documented. Results indicate that temperature - particularly spring temperature - has a significant control not only on the timing of snowmelt, plant NDVI and peak flow but also on the magnitude of peak NDVI, peak flow and annual discharge. Monthly temperature in spring explains the variability of snowmelt by the equivalent standard deviation of 3.4-4.4 days, and total discharge by 10-11%. In addition, the high correlation among June temperature, peak NDVI and annual discharge suggests a primary role of spring evapotranspiration on plant community phenology, productivity, and streamflow volume. On the other hand, summer monsoon precipitation does not contribute significantly to annual discharge, further emphasizing the importance of snowmelt. This approach is mostly based on a set of datasets typically available throughout the US, providing a powerful approach to link remote sensing techniques with long-term monitoring of temperature, snowfall, plant, and streamflow dynamics.
NASA Astrophysics Data System (ADS)
Dafflon, B.; Tran, A. P.; Wainwright, H. M.; Hubbard, S. S.; Peterson, J.; Ulrich, C.; Williams, K. H.
2015-12-01
Quantifying water and heat fluxes in the subsurface is crucial for managing water resources and for understanding the terrestrial ecosystem, where hydrological properties drive a variety of biogeochemical processes across a large range of spatial and temporal scales. Here, we present the development of an advanced monitoring strategy where hydro-thermal-geophysical datasets are continuously acquired and further used in a novel inverse modeling framework to estimate the hydraulic and thermal parameters that control heat and water dynamics in the subsurface and further influence surface processes such as evapotranspiration and vegetation growth. The measured and estimated soil properties are also used to investigate co-interaction between subsurface and surface dynamics by using above-ground aerial imaging. The value of this approach is demonstrated at two different sites: one in the polygonal-shaped Arctic tundra, where water and heat dynamics have a strong impact on freeze-thaw processes, vegetation and biogeochemical processes, and one in a floodplain along the Colorado River, where hydrological fluxes between compartments of the system (surface, vadose zone and groundwater) drive biogeochemical transformations. Results show that the developed strategy, using geophysical, point-scale and aerial measurements, successfully delineates the spatial distribution of hydrostratigraphic units having distinct physicochemical properties, monitors and quantifies water and heat distribution in high resolution together with its linkage with vegetation, geomorphology and weather conditions, and estimates hydraulic and thermal parameters for enhanced predictions of water and heat fluxes as well as evapotranspiration. Further, in the Colorado floodplain, results document the potential presence of only periodic infiltration pulses as a key hot moment controlling soil hydrological and biogeochemical functioning. In the Arctic, results show the strong linkage between soil water content, thermal parameters, thaw layer thickness and vegetation distribution. Overall, results of these efforts demonstrate the value of coupling various datasets at high spatial and temporal resolution to improve predictive understanding of subsurface and surface dynamics.
A Data Handling System for Modern and Future Fermilab Experiments
DOE Office of Scientific and Technical Information (OSTI.GOV)
Illingworth, R. A.
2014-01-01
Current and future Fermilab experiments such as Minerva, NOνA, and MicroBoone are now using an improved version of the Fermilab SAM data handling system. SAM was originally used by the CDF and D0 experiments for Run II of the Fermilab Tevatron to provide file metadata and location cataloguing, uploading of new files to tape storage, dataset management, file transfers between global processing sites, and processing history tracking. However SAM was heavily tailored to the Run II environment and required complex and hard to deploy client software, which made it hard to adapt to new experiments. The Fermilab Computing Sector has progressively updated SAM to use modern, standardized technologies in order to more easily deploy it for current and upcoming Fermilab experiments, and to support the data preservation efforts of the Run II experiments.
Childhood IQ and In-Service Mortality in Scottish Army Personnel during World War II
ERIC Educational Resources Information Center
Corley, Janie; Crang, Jeremy A.; Deary, Ian J.
2009-01-01
The Scottish Mental Survey of 1932 (SMS1932) provides a record of intelligence test scores for almost a complete year-of-birth group of children born in 1921. By linking UK Army personnel records, the Scottish National War Memorial data, and the SMS1932 dataset it was possible to examine the effect of childhood intelligence scores on wartime…
Benchmarking Deep Learning Models on Large Healthcare Datasets.
Purushotham, Sanjay; Meng, Chuizheng; Che, Zhengping; Liu, Yan
2018-06-04
Deep learning models (aka Deep Neural Networks) have revolutionized many fields including computer vision, natural language processing, and speech recognition, and are being increasingly used in clinical healthcare applications. However, few works exist which have benchmarked the performance of the deep learning models with respect to the state-of-the-art machine learning models and prognostic scoring systems on publicly available healthcare datasets. In this paper, we present the benchmarking results for several clinical prediction tasks such as mortality prediction, length of stay prediction, and ICD-9 code group prediction using Deep Learning models, an ensemble of machine learning models (the Super Learner algorithm), and SAPS II and SOFA scores. We used the Medical Information Mart for Intensive Care III (MIMIC-III) (v1.4) publicly available dataset, which includes all patients admitted to an ICU at the Beth Israel Deaconess Medical Center from 2001 to 2012, for the benchmarking tasks. Our results show that deep learning models consistently outperform all the other approaches, especially when the 'raw' clinical time series data is used as input features to the models. Copyright © 2018 Elsevier Inc. All rights reserved.
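A minimal benchmarking skeleton in the spirit of this comparison is sketched below for the mortality task, using two shallow baselines and cross-validated AUROC; the feature file is a hypothetical MIMIC-III extract, and the paper's deep models, Super Learner ensemble and severity scores are not reproduced.

```python
# Minimal benchmarking skeleton: compare two baseline classifiers on a
# mortality-prediction task using AUROC.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

df = pd.read_csv("mimic3_mortality_features.csv")       # hypothetical extract
X, y = df.drop(columns=["in_hospital_mortality"]), df["in_hospital_mortality"]

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "gradient_boosting": GradientBoostingClassifier(),
}
for name, model in models.items():
    auroc = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(f"{name}: AUROC {auroc.mean():.3f} +/- {auroc.std():.3f}")
```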
Herrero, Mario; Havlík, Petr; Valin, Hugo; Notenbaert, An; Rufino, Mariana C.; Thornton, Philip K.; Blümmel, Michael; Weiss, Franz; Grace, Delia; Obersteiner, Michael
2013-01-01
We present a unique, biologically consistent, spatially disaggregated global livestock dataset containing information on biomass use, production, feed efficiency, excretion, and greenhouse gas emissions for 28 regions, 8 livestock production systems, 4 animal species (cattle, small ruminants, pigs, and poultry), and 3 livestock products (milk, meat, and eggs). The dataset contains over 50 new global maps containing high-resolution information for understanding the multiple roles (biophysical, economic, social) that livestock can play in different parts of the world. The dataset highlights: (i) feed efficiency as a key driver of productivity, resource use, and greenhouse gas emission intensities, with vast differences between production systems and animal products; (ii) the importance of grasslands as a global resource, supplying almost 50% of biomass for animals while continuing to be at the epicentre of land conversion processes; and (iii) the importance of mixed crop–livestock systems, producing the greater part of animal production (over 60%) in both the developed and the developing world. These data provide critical information for developing targeted, sustainable solutions for the livestock sector and its widely ranging contribution to the global food system. PMID:24344273
Columbia River Coordinated Information System (CIS); Data Catalog, 1992 Technical Report.
DOE Office of Scientific and Technical Information (OSTI.GOV)
O'Connor, Dick; Allen, Stan; Reece, Doug
1993-05-01
The Columbia River Coordinated Information System (CIS) Project started in 1989 to address regional data sharing. Coordinated exchange and dissemination of any data must begin with dissemination of information about those data, such as: what is available; where the data are stored; what form they exist in; who to contact for further information or access to these data. In Phase II of this Project (1991), a Data Catalog describing the contents of regional datasets and less formal data collections useful for system monitoring and evaluation projects was built to improve awareness of their existence. Formal datasets are described in a 'Dataset Directory', while collections of data are linked to those who collect such information in the 'Data Item Directory'. The Data Catalog will serve regional workers as a useful reference which centralizes the institutional knowledge of many data contacts into a single source. Recommendations for improvement of the Catalog during Phase III of this Project include addressing gaps in coverage, establishing an annual maintenance schedule, and loading the contents into a PC-based electronic database for easier searching and cross-referencing.
Leaf Area, Vegetation Biomass and Nutrient Content, Barrow, Alaska, 2012 - 2013
DOE Office of Scientific and Technical Information (OSTI.GOV)
Victoria Sloan; David McGuire; Eugenie Euskirchen
This dataset consists of measurements of vegetation harvested from Areas A to D of Intensive Site 1 at the Next-Generation Ecosystem Experiments (NGEE) Arctic site near Barrow, Alaska. The dataset includes i) values of leaf area index, biomass, carbon (C), nitrogen (N) and phosphorus (P) content of aboveground plant parts from 0.25 m × 0.25 m clip-plots at peak growing season and ii) fine-root biomass from 5.08-cm diameter soil cores taken throughout the active layer in the same location as the clip plots in late July-early August 2012, and iii) values of aboveground biomass and nitrogen (N) content measured from 0.1 m × 0.1 m clip-plots harvested at 2-week intervals throughout the 2013 growing season.
GPM Ground Validation: Pre to Post-Launch Era
NASA Astrophysics Data System (ADS)
Petersen, Walt; Skofronick-Jackson, Gail; Huffman, George
2015-04-01
NASA GPM Ground Validation (GV) activities have transitioned from the pre- to the post-launch era. Prior to launch, direct validation networks and associated partner institutions were identified world-wide, covering a plethora of precipitation regimes. In the U.S., direct GV efforts focused on use of new operational products such as the NOAA Multi-Radar Multi-Sensor suite (MRMS) for TRMM validation and GPM radiometer algorithm database development. In the post-launch era, MRMS products including precipitation rate, accumulation, types and data quality are being routinely generated to facilitate statistical GV of instantaneous (e.g., Level II orbit) and merged (e.g., IMERG) GPM products. Toward assessing precipitation column impacts on product uncertainties, range-gate to pixel-level validation of both Dual-Frequency Precipitation Radar (DPR) and GPM microwave imager data is performed using GPM Validation Network (VN) ground radar and satellite data processing software. VN software ingests quality-controlled volumetric radar datasets and geo-matches those data to coincident DPR and radiometer level-II data. When combined, MRMS and VN datasets enable more comprehensive interpretation of both ground and satellite-based estimation uncertainties. To support physical validation efforts, eight (one) field campaigns have been conducted in the pre-launch (post-launch) era. The campaigns span regimes from northern-latitude cold-season snow to warm tropical rain. Most recently the Integrated Precipitation and Hydrology Experiment (IPHEx) took place in the mountains of North Carolina and involved combined airborne and ground-based measurements of orographic precipitation and hydrologic processes underneath the GPM Core satellite. One more U.S. GV field campaign (OLYMPEX) is planned for late 2015 and will address cold-season precipitation estimation, processes and hydrology in the orographic and oceanic domains of western Washington State. Finally, continuous direct and physical validation measurements are also being conducted at the NASA Wallops Flight Facility multi-radar, gauge and disdrometer facility located in coastal Virginia. This presentation will summarize the evolution of the NASA GPM GV program from the pre- to the post-launch era and place focus on evaluation of year-1 post-launch GPM satellite datasets, including Level II GPROF, DPR and Combined algorithms, and Level III IMERG products.
Point-based warping with optimized weighting factors of displacement vectors
NASA Astrophysics Data System (ADS)
Pielot, Ranier; Scholz, Michael; Obermayer, Klaus; Gundelfinger, Eckart D.; Hess, Andreas
2000-06-01
The accurate comparison of inter-individual 3D brain image datasets requires non-affine transformation techniques (warping) to reduce geometric variations. Constrained by the biological prerequisites, we use in this study a landmark-based warping method with weighted sums of displacement vectors, which is enhanced by an optimization process. Furthermore, we investigate fast automatic procedures for determining landmarks to improve the practicability of 3D warping. This combined approach was tested on 3D autoradiographs of gerbil brains. The autoradiographs were obtained after injecting a non-metabolized radioactive glucose derivative into the gerbil, thereby visualizing neuronal activity in the brain. Afterwards the brain was processed with standard autoradiographical methods. The landmark generator computes corresponding reference points simultaneously within a given number of datasets by Monte-Carlo techniques. The warping function is a distance-weighted exponential function with a landmark-specific weighting factor. These weighting factors are optimized by a computational evolution strategy. The warping quality is quantified by several coefficients (correlation coefficient, overlap index, and registration error). The described approach combines a highly suitable procedure to automatically detect landmarks in autoradiographical brain images and an enhanced point-based warping technique, optimizing the local weighting factors. This optimization process significantly improves the similarity between the warped and the target dataset.
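A compact sketch of the warping function, a distance-weighted exponential sum of landmark displacement vectors, is given below; the decay lengths and per-landmark weighting factors (which the study optimizes with an evolution strategy) are treated as given constants here, and the variable names are illustrative.

```python
# Sketch of landmark-based warping: each source landmark carries a
# displacement toward its target, and any point is moved by an exponentially
# distance-weighted sum of those displacements.
import numpy as np

def warp_points(points, src_landmarks, dst_landmarks, sigma=None, landmark_weights=None):
    """points: (n, 3); src/dst_landmarks: (m, 3); sigma: decay length per landmark."""
    m = len(src_landmarks)
    disp = dst_landmarks - src_landmarks                      # (m, 3) displacement vectors
    sigma = np.full(m, 10.0) if sigma is None else np.asarray(sigma)
    lw = np.ones(m) if landmark_weights is None else np.asarray(landmark_weights)

    d = np.linalg.norm(points[:, None, :] - src_landmarks[None, :, :], axis=2)  # (n, m)
    w = lw * np.exp(-d / sigma)                               # distance-weighted influence
    w /= w.sum(axis=1, keepdims=True) + 1e-12                 # normalise per point
    return points + w @ disp
```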
The SPEIbase: a new gridded product for the analysis of drought variability and drought impacts
NASA Astrophysics Data System (ADS)
Begueria-Portugues, S.; Vicente-Serrano, S. M.; López-Moreno, J. I.; Angulo-Martínez, M.; El Kenawy, A.
2010-09-01
Recently a new drought indicator, the Standardised Precipitation-Evapotranspiration Index (SPEI), has been proposed to quantify the drought condition over a given area. The SPEI considers not only precipitation but also evapotranspiration (PET) data in its calculation, allowing for a more complete approach to explore the effects of climate change on drought conditions. The SPEI can be calculated at several time scales to adapt to the characteristic times of response to drought of target natural and economic systems, allowing their resistance to drought to be determined. Following the formulation of the SPEI, a global dataset, the SPEIbase, has been made available to the scientific community. The dataset covers the period 1901-2006 with a monthly frequency, and offers global coverage at a 0.5 degree resolution. The dataset consists of the monthly values of the SPEI at time scales from 1 to 48 months. A description of the data and metadata, and links to download the files, are provided at http://sac.csic.es/spei. In this communication we detail the methodology for computing the SPEI and the characteristics of the SPEIbase. A thorough discussion of the SPEI index, and some examples of use, will be provided in a companion communication.
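For orientation, the following is a minimal single-cell sketch of an SPEI-style calculation: accumulate the climatic water balance (P - PET) at the chosen time scale, fit a log-logistic distribution per calendar month, and map probabilities to standard normal deviates. The operational SPEIbase uses probability-weighted moments and careful handling of the accumulation offsets, which this sketch glosses over.

```python
# Sketch of an SPEI-style index at one grid cell; plain maximum-likelihood
# fitting of a log-logistic distribution is used here for brevity.
import numpy as np
from scipy import stats

def spei(precip, pet, scale=3):
    """precip, pet: monthly series of equal length; returns SPEI at 'scale' months."""
    d = np.asarray(precip, float) - np.asarray(pet, float)
    dk = np.convolve(d, np.ones(scale), mode="valid")   # rolling sum over 'scale' months
    out = np.full(dk.size, np.nan)
    for month in range(12):                              # calibrate each calendar month separately
        idx = np.arange(month, dk.size, 12)
        sample = dk[idx]
        c, loc, sc = stats.fisk.fit(sample)              # 3-parameter log-logistic
        prob = stats.fisk.cdf(sample, c, loc=loc, scale=sc)
        out[idx] = stats.norm.ppf(np.clip(prob, 1e-6, 1 - 1e-6))
    return out
```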
Ronan, Lisa; Voets, Natalie L.; Hough, Morgan; Mackay, Clare; Roberts, Neil; Suckling, John; Bullmore, Edward; James, Anthony; Fletcher, Paul C.
2012-01-01
Several studies have sought to test the neurodevelopmental hypothesis of schizophrenia through analysis of cortical gyrification. However, to date, results have been inconsistent. A possible reason for this is that gyrification measures at the centimeter scale may be insensitive to subtle morphological changes at smaller scales. The lack of consistency in such studies may impede further interpretation of cortical morphology as an aid to understanding the etiology of schizophrenia. In this study we developed a new approach, examining whether millimeter-scale measures of cortical curvature are sensitive to changes in fundamental geometric properties of the cortical surface in schizophrenia. We determined and compared millimeter-scale and centimeter-scale curvature in three separate case–control studies; specifically two adult groups and one adolescent group. The datasets were of different sizes, with different ages and gender-spreads. The results clearly show that millimeter-scale intrinsic curvature measures were more robust and consistent in identifying reduced gyrification in patients across all three datasets. To further interpret this finding we quantified the ratio of expansion in the upper and lower cortical layers. The results suggest that reduced gyrification in schizophrenia is driven by a reduction in the expansion of upper cortical layers. This may plausibly be related to a reduction in short-range connectivity. PMID:22743195
Learning visual balance from large-scale datasets of aesthetically highly rated images
NASA Astrophysics Data System (ADS)
Jahanian, Ali; Vishwanathan, S. V. N.; Allebach, Jan P.
2015-03-01
The concept of visual balance is innate for humans, and influences how we perceive visual aesthetics and cognize harmony. Although visual balance is a vital principle of design and is taught in schools of design, it is barely quantified. On the other hand, with the emergence of automatic/semi-automatic visual design for self-publishing, learning visual balance and computationally modeling it may enhance the aesthetics of such designs. In this paper, we present how the quest to understand visual balance inspired us to revisit one of the well-known theories in visual arts, the so-called theory of "visual rightness", elucidated by Arnheim. We define Arnheim's hypothesis as a design mining problem with the goal of learning visual balance from the work of professionals. We collected a dataset of 120K images that are aesthetically highly rated, from a professional photography website. We then computed factors that contribute to visual balance based on the notion of visual saliency. We fitted a mixture of Gaussians to the saliency maps of the images, and obtained the hotspots of the images. Our inferred Gaussians align with Arnheim's hotspots, and confirm his theory. Moreover, the results support the viability of the center of mass, symmetry, as well as the Rule of Thirds in our dataset.
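The hotspot step can be sketched as follows: sample pixel coordinates in proportion to saliency and fit a Gaussian mixture, taking the component means as hotspots. The sampling trick, component count and function names are illustrative assumptions, not the authors' exact fitting procedure.

```python
# Sketch: saliency-weighted sampling of pixel coordinates followed by a
# Gaussian mixture fit; the component means act as the image's "hotspots".
import numpy as np
from sklearn.mixture import GaussianMixture

def saliency_hotspots(saliency_map, n_components=3, n_samples=20000, seed=0):
    rng = np.random.default_rng(seed)
    h, w = saliency_map.shape
    p = saliency_map.ravel().astype(float)
    p /= p.sum()
    idx = rng.choice(h * w, size=n_samples, p=p)              # saliency-weighted sampling
    coords = np.column_stack(np.unravel_index(idx, (h, w)))   # (n_samples, 2) row/col
    gmm = GaussianMixture(n_components=n_components, random_state=seed).fit(coords)
    return gmm.means_, gmm.covariances_
```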
David, Simon; Visvikis, Dimitris; Quellec, Gwénolé; Le Rest, Catherine Cheze; Fernandez, Philippe; Allard, Michèle; Roux, Christian; Hatt, Mathieu
2012-09-01
In clinical oncology, positron emission tomography (PET) imaging can be used to assess therapeutic response by quantifying the evolution of semi-quantitative values such as standardized uptake value, early during treatment or after treatment. Current guidelines do not include metabolically active tumor volume (MATV) measurements and derived parameters such as total lesion glycolysis (TLG) to characterize the response to the treatment. To achieve automatic MATV variation estimation during treatment, we propose an approach based on the change detection principle using the recent paradoxical theory, which models imprecision, uncertainty, and conflict between sources. It was applied here simultaneously to pre- and post-treatment PET scans. The proposed method was applied to both simulated and clinical datasets, and its performance was compared to adaptive thresholding applied separately on pre- and post-treatment PET scans. On simulated datasets, the adaptive threshold was associated with significantly higher classification errors than the developed approach. On clinical datasets, the proposed method led to results more consistent with the known partial responder status of these patients. The method requires accurate rigid registration of both scans which can be obtained only in specific body regions and does not explicitly model uptake heterogeneity. In further investigations, the change detection of intra-MATV tracer uptake heterogeneity will be developed by incorporating textural features into the proposed approach.
NASA Astrophysics Data System (ADS)
Brelsford, Christa; Shepherd, Doug
2014-01-01
In desert cities, accurate measurements of vegetation area within residential lots are necessary to understand drivers of change in water consumption. Most residential lots are smaller than an individual 30-m pixel from Landsat satellite images and have a mixture of vegetation and other land covers. Quantifying vegetation change in this environment requires estimating subpixel vegetation area. Mixture-tuned match filtering (MTMF) has been successfully used for subpixel target detection. There have been few successful applications of MTMF to subpixel abundance estimation because the relationship observed between MTMF estimates and ground measurements of abundance is noisy. We use a ground truth dataset over 10 times larger than that available for any previous MTMF application to estimate the bias between ground data and MTMF results. We find that MTMF underestimates the fractional area of vegetation by 5% to 10% and show that averaging over multiple pixels is necessary to reduce noise in the dataset. We conclude that MTMF is a viable technique for fractional area estimation when a large dataset is available for calibration. When this method is applied to estimating vegetation area in Las Vegas, Nevada, spatial and temporal trends are consistent with expectations from known population growth and policy changes.
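A minimal sketch of the calibration idea, regressing block-averaged ground-truth vegetation fractions on block-averaged MTMF scores so the fitted line can correct the 5% to 10% underestimation, is given below; the block size and variable names are illustrative assumptions.

```python
# Sketch: estimate (and later remove) the MTMF underestimation bias by
# regressing ground-truth fractions on matched-filter scores after averaging
# over blocks of pixels to suppress per-pixel noise.
import numpy as np
from sklearn.linear_model import LinearRegression

def calibrate_mtmf(mf_scores, true_fractions, block=5):
    """mf_scores, true_fractions: 2-D arrays on the same grid."""
    def block_mean(a):
        h, w = (a.shape[0] // block) * block, (a.shape[1] // block) * block
        return a[:h, :w].reshape(h // block, block, w // block, block).mean(axis=(1, 3))

    x = block_mean(mf_scores).ravel().reshape(-1, 1)
    y = block_mean(true_fractions).ravel()
    model = LinearRegression().fit(x, y)          # y ≈ slope * MTMF + intercept
    return model  # apply model.predict to new block-averaged MTMF scores
```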
A guide to evaluating linkage quality for the analysis of linked data.
Harron, Katie L; Doidge, James C; Knight, Hannah E; Gilbert, Ruth E; Goldstein, Harvey; Cromwell, David A; van der Meulen, Jan H
2017-10-01
Linked datasets are an important resource for epidemiological and clinical studies, but linkage error can lead to biased results. For data security reasons, linkage of personal identifiers is often performed by a third party, making it difficult for researchers to assess the quality of the linked dataset in the context of specific research questions. This is compounded by a lack of guidance on how to determine the potential impact of linkage error. We describe how linkage quality can be evaluated and provide widely applicable guidance for both data providers and researchers. Using an illustrative example of a linked dataset of maternal and baby hospital records, we demonstrate three approaches for evaluating linkage quality: applying the linkage algorithm to a subset of gold standard data to quantify linkage error; comparing characteristics of linked and unlinked data to identify potential sources of bias; and evaluating the sensitivity of results to changes in the linkage procedure. These approaches can inform our understanding of the potential impact of linkage error and provide an opportunity to select the most appropriate linkage procedure for a specific analysis. Evaluating linkage quality in this way will improve the quality and transparency of epidemiological and clinical research using linked data. © The Author 2017. Published by Oxford University Press on behalf of the International Epidemiological Association.
Potential for using regional and global datasets for national scale ecosystem service modelling
NASA Astrophysics Data System (ADS)
Maxwell, Deborah; Jackson, Bethanna
2016-04-01
Ecosystem service models are increasingly being used by planners and policy makers to inform policy development and decisions about national-level resource management. Such models allow ecosystem services to be mapped and quantified, and subsequent changes to these services to be identified and monitored. In some cases, the impact of small-scale changes can be modelled at a national scale, providing more detailed information to decision makers about where to best focus investment and management interventions that could address these issues, while moving toward national goals and/or targets. National-scale modelling often uses national (or local) data (for example, soils, landcover and topographical information) as input. However, there are some places where fine-resolution and/or high-quality national datasets cannot be easily obtained, or do not even exist. In the absence of such detailed information, regional or global datasets could be used as input to such models. There are questions, however, about the usefulness of these coarser-resolution datasets and the extent to which inaccuracies in these data may degrade predictions of existing and potential ecosystem service provision and subsequent decision making. Using LUCI (the Land Utilisation and Capability Indicator) as an example predictive model, we examine how the reliability of predictions changes when national datasets of soil, landcover and topography are substituted with coarser-scale regional and global datasets. We specifically look at how LUCI's predictions of water services, such as flood risk, flood mitigation, erosion and water quality, change when national data inputs are replaced by regional and global datasets. Using the Conwy catchment, Wales, as a case study, the land cover products compared are the UK's Land Cover Map (2007), the European CORINE land cover map and the ESA global land cover map. Soils products include the National Soil Map of England and Wales (NatMap) and the European Soils Database. NEXTMap elevation data, which cover the UK and parts of continental Europe, are compared to the global AsterDEM and SRTM30 topographical products. While the regional and global datasets can be used to fill gaps in data requirements, their coarser resolution means that information is aggregated over larger areas. This loss of detail affects the reliability of model output, particularly where significant discrepancies between datasets exist. The implications of this loss of detail for spatial planning and decision making are discussed. Finally, in the context of broader development, the need for better nationally and globally available data to allow LUCI and other ecosystem models to become more globally applicable is highlighted.
Creation of forest edges has a global impact on forest vertebrates
Peres, CA; Banks-Leite, C; Wearn, OR; Marsh, CJ; Butchart, SHM; Arroyo-Rodríguez, V; Barlow, J; Cerezo, A; Cisneros, L; D’Cruze, N; Faria, D; Hadley, A; Harris, S; Klingbeil, BT; Kormann, U; Lens, L; Medina-Rangel, GF; Morante-Filho, JC; Olivier, P; Peters, SL; Pidgeon, A; Ribeiro, DB; Scherber, C; Schneider-Maunory, L; Struebig, M; Urbina-Cardona, N; Watling, JI; Willig, MR; Wood, EM; Ewers, RM
2017-01-01
Forest edges influence more than half the world's forests and contribute to worldwide declines in biodiversity and ecosystem functions. However, predicting these declines is challenging in heterogeneous fragmented landscapes. We assembled an unmatched global dataset on species responses to fragmentation and developed a new statistical approach for quantifying edge impacts in heterogeneous landscapes, which we used to quantify edge-determined changes in abundance of 1673 vertebrate species. We show that 85% of species' abundances are affected, either positively or negatively, by forest edges. Forest core species, which were more likely to be listed as threatened by the IUCN, only reached peak abundances at sites farther than 200-400 m from sharp high-contrast forest edges. Smaller-bodied amphibians, larger reptiles and medium-sized non-volant mammals experienced a larger reduction in suitable habitat than other forest core species. Our results highlight the pervasive ability of forest edges to restructure ecological communities on a global scale. PMID:29088701
Eriksson, Stefanie; Elbing, Karin; Söderman, Olle; Lindkvist-Petersson, Karin; Topgaard, Daniel; Lasič, Samo
2017-01-01
Water transport across cell membranes can be measured non-invasively with diffusion NMR. We present a method to quantify the intracellular lifetime of water in cell suspensions with short transverse relaxation times, T2, and also circumvent the confounding effect of different T2 values in the intra- and extracellular compartments. Filter exchange spectroscopy (FEXSY) is specifically sensitive to exchange between compartments with different apparent diffusivities. Our investigation shows that FEXSY could yield significantly biased results if differences in T2 are not accounted for. To mitigate this problem, we propose combining FEXSY with a diffusion-relaxation correlation experiment, which can quantify differences in T2 values in compartments with different diffusivities. Our analysis uses a joint constrained fitting of the two datasets and considers the effects of diffusion, relaxation and exchange in both experiments. The method is demonstrated on yeast cells with and without human aquaporins.
Eriksson, Stefanie; Elbing, Karin; Söderman, Olle; Lindkvist-Petersson, Karin; Topgaard, Daniel
2017-01-01
Water transport across cell membranes can be measured non-invasively with diffusion NMR. We present a method to quantify the intracellular lifetime of water in cell suspensions with short transverse relaxation times, T2, and also circumvent the confounding effect of different T2 values in the intra- and extracellular compartments. Filter exchange spectroscopy (FEXSY) is specifically sensitive to exchange between compartments with different apparent diffusivities. Our investigation shows that FEXSY could yield significantly biased results if differences in T2 are not accounted for. To mitigate this problem, we propose combining FEXSY with a diffusion-relaxation correlation experiment, which can quantify differences in T2 values in compartments with different diffusivities. Our analysis uses a joint constrained fitting of the two datasets and considers the effects of diffusion, relaxation and exchange in both experiments. The method is demonstrated on yeast cells with and without human aquaporins. PMID:28493928
Higgins, Helen M; Green, Laura E; Green, Martin J; Kaler, Jasmeet
2013-01-01
Footrot is a widespread, infectious cause of lameness in sheep, with major economic and welfare costs. The aims of this research were: (i) to quantify how veterinary surgeons' beliefs regarding the efficacy of two treatments for footrot changed following a review of the evidence (ii) to obtain a consensus opinion following group discussions (iii) to capture complementary qualitative data to place their beliefs within a broader clinical context. Grounded in a Bayesian statistical framework, probabilistic elicitation (roulette method) was used to quantify the beliefs of eleven veterinary surgeons during two one-day workshops. There was considerable heterogeneity in veterinary surgeons' beliefs before they listened to a review of the evidence. After hearing the evidence, seven participants quantifiably changed their beliefs. In particular, two participants who initially believed that foot trimming with topical oxytetracycline was the better treatment, changed to entirely favour systemic and topical oxytetracycline instead. The results suggest that a substantial amount of the variation in beliefs related to differences in veterinary surgeons' knowledge of the evidence. Although considerable differences in opinion still remained after the evidence review, with several participants having non-overlapping 95% credible intervals, both groups did achieve a consensus opinion. Two key findings from the qualitative data were: (i) veterinary surgeons believed that farmers are unlikely to actively seek advice on lameness, suggesting a proactive veterinary approach is required (ii) more attention could be given to improving the way in which veterinary advice is delivered to farmers. In summary this study has: (i) demonstrated a practical method for probabilistically quantifying how veterinary surgeons' beliefs change (ii) revealed that the evidence that currently exists is capable of changing veterinary opinion (iii) suggested that improved transfer of research knowledge into veterinary practice is needed (iv) identified some potential obstacles to the implementation of veterinary advice by farmers.
Higgins, Helen M.; Green, Laura E.; Green, Martin J.; Kaler, Jasmeet
2013-01-01
Footrot is a widespread, infectious cause of lameness in sheep, with major economic and welfare costs. The aims of this research were: (i) to quantify how veterinary surgeons’ beliefs regarding the efficacy of two treatments for footrot changed following a review of the evidence (ii) to obtain a consensus opinion following group discussions (iii) to capture complementary qualitative data to place their beliefs within a broader clinical context. Grounded in a Bayesian statistical framework, probabilistic elicitation (roulette method) was used to quantify the beliefs of eleven veterinary surgeons during two one-day workshops. There was considerable heterogeneity in veterinary surgeons’ beliefs before they listened to a review of the evidence. After hearing the evidence, seven participants quantifiably changed their beliefs. In particular, two participants who initially believed that foot trimming with topical oxytetracycline was the better treatment, changed to entirely favour systemic and topical oxytetracycline instead. The results suggest that a substantial amount of the variation in beliefs related to differences in veterinary surgeons’ knowledge of the evidence. Although considerable differences in opinion still remained after the evidence review, with several participants having non-overlapping 95% credible intervals, both groups did achieve a consensus opinion. Two key findings from the qualitative data were: (i) veterinary surgeons believed that farmers are unlikely to actively seek advice on lameness, suggesting a proactive veterinary approach is required (ii) more attention could be given to improving the way in which veterinary advice is delivered to farmers. In summary this study has: (i) demonstrated a practical method for probabilistically quantifying how veterinary surgeons’ beliefs change (ii) revealed that the evidence that currently exists is capable of changing veterinary opinion (iii) suggested that improved transfer of research knowledge into veterinary practice is needed (iv) identified some potential obstacles to the implementation of veterinary advice by farmers. PMID:23696869
Vegetation fire proneness in Europe
NASA Astrophysics Data System (ADS)
Pereira, Mário; Aranha, José; Amraoui, Malik
2015-04-01
Fire selectivity has been studied for vegetation classes in terms of fire frequency and fire size in a few European regions. This analysis is often performed along with other landscape variables such as topography and distance to roads and towns. These studies aim to assess the landscape sensitivity to forest fires in peri-urban areas and land cover changes, and to define landscape management guidelines and policies based on the relationships between landscape and fires in the Mediterranean region. Therefore, the objectives of this study include: (i) analysis of the spatial and temporal variability statistics within Europe; (ii) identification and characterization of the vegetated land cover classes affected by fires; and (iii) the proposal of a fire proneness index. The datasets used in the present study comprise: Corine Land Cover (CLC) maps for 2000 and 2006 (CLC2000, CLC2006) and burned area (BA) perimeters, from 2000 to 2013 in Europe, provided by the European Forest Fire Information System (EFFIS). The CLC is part of the European Commission programme to COoRdinate INformation on the Environment (Corine) and provides consistent, reliable and comparable information on land cover across Europe. Both the CLC and EFFIS datasets were combined using geostatistics and Geographical Information System (GIS) techniques to assess the spatial and temporal evolution of the types of shrubs and forest affected by fires. The results obtained confirm the usefulness and efficiency of the land cover classification scheme and the fire proneness index, which make it possible to quantify and compare the propensity of vegetation classes and countries to fire. As expected, differences between northern and southern Europe are notable with regard to land cover distribution, fire incidence and fire proneness of vegetation cover classes. This work was supported by national funds by FCT - Portuguese Foundation for Science and Technology, under the project PEst-OE/AGR/UI4033/2014 and by the project SUSTAINSYS: Environmental Sustainable Agro-Forestry Systems (NORTE-07-0124-FEDER-000044), financed by the North Portugal Regional Operational Programme (ON.2 - O Novo Norte), under the National Strategic Reference Framework (QREN), through the European Regional Development Fund (FEDER), as well as by National Funds (PIDDAC) through the Portuguese Foundation for Science and Technology (FCT/MEC).
Statistical modeling of isoform splicing dynamics from RNA-seq time series data.
Huang, Yuanhua; Sanguinetti, Guido
2016-10-01
Isoform quantification is an important goal of RNA-seq experiments, yet it remains problematic for genes with low expression or several isoforms. These difficulties may in principle be ameliorated by exploiting correlated experimental designs, such as time series or dosage response experiments. Time series RNA-seq experiments, in particular, are becoming increasingly popular, yet there are no methods that explicitly leverage the experimental design to improve isoform quantification. Here, we present DICEseq, the first isoform quantification method tailored to correlated RNA-seq experiments. DICEseq explicitly models the correlations between different RNA-seq experiments to aid the quantification of isoforms across experiments. Numerical experiments on simulated datasets show that DICEseq yields more accurate results than state-of-the-art methods, an advantage that can become considerable at low coverage levels. On real datasets, our results show that DICEseq provides substantially more reproducible and robust quantifications, increasing the correlation of estimates from replicate datasets by up to 10% on genes with low or moderate expression levels (bottom third of all genes). Furthermore, DICEseq permits quantification of the trade-off between temporal sampling of RNA and depth of sequencing, frequently an important choice when planning experiments. Our results have strong implications for the design of RNA-seq experiments, and offer a novel tool for improved analysis of such datasets. Python code is freely available at http://diceseq.sf.net. Contact: G.Sanguinetti@ed.ac.uk. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Rowson, Steven; Duma, Stefan M
2013-05-01
Recent research has suggested possible long term effects due to repetitive concussions, highlighting the importance of developing methods to accurately quantify concussion risk. This study introduces a new injury metric, the combined probability of concussion, which computes the overall risk of concussion based on the peak linear and rotational accelerations experienced by the head during impact. The combined probability of concussion is unique in that it determines the likelihood of sustaining a concussion for a given impact, regardless of whether the injury would be reported or not. The risk curve was derived from data collected from instrumented football players (63,011 impacts including 37 concussions), which was adjusted to account for the underreporting of concussion. The predictive capability of this new metric is compared to that of single biomechanical parameters. The capabilities of these parameters to accurately predict concussion incidence were evaluated using two separate datasets: the Head Impact Telemetry System (HITS) data and National Football League (NFL) data collected from impact reconstructions using dummies (58 impacts including 25 concussions). Receiver operating characteristic curves were generated, and all parameters were significantly better at predicting injury than random guessing. The combined probability of concussion had the greatest area under the curve for all datasets. In the HITS dataset, the combined probability of concussion and linear acceleration were significantly better predictors of concussion than rotational acceleration alone, but not different from each other. In the NFL dataset, there were no significant differences between parameters. The combined probability of concussion is a valuable method to assess concussion risk in a laboratory setting for evaluating product safety.
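The risk curve itself was fit to the instrumented-player data described above, but the general shape of such a combined metric, a logistic function of peak linear acceleration, peak rotational acceleration, and their interaction, can be sketched as follows. The coefficients below are placeholders for illustration only, not the values fitted in the study.

```python
import numpy as np

def combined_concussion_risk(lin_acc_g, rot_acc_rads2,
                             beta=(-10.0, 0.04, 0.0009, -1.0e-6)):
    """Illustrative logistic combined-risk form.

    lin_acc_g     : peak linear head acceleration (g)
    rot_acc_rads2 : peak rotational head acceleration (rad/s^2)
    beta          : placeholder coefficients (intercept, linear, rotational,
                    interaction) -- NOT the published fit.
    """
    b0, b1, b2, b3 = beta
    z = b0 + b1 * lin_acc_g + b2 * rot_acc_rads2 + b3 * lin_acc_g * rot_acc_rads2
    return 1.0 / (1.0 + np.exp(-z))

# Example: a moderate impact
print(combined_concussion_risk(60.0, 4000.0))
```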
NASA Astrophysics Data System (ADS)
Schrön, Martin; Köhli, Markus; Scheiffele, Lena; Iwema, Joost; Bogena, Heye R.; Lv, Ling; Martini, Edoardo; Baroni, Gabriele; Rosolem, Rafael; Weimar, Jannis; Mai, Juliane; Cuntz, Matthias; Rebmann, Corinna; Oswald, Sascha E.; Dietrich, Peter; Schmidt, Ulrich; Zacharias, Steffen
2017-10-01
In the last few years the method of cosmic-ray neutron sensing (CRNS) has gained popularity among hydrologists, physicists, and land-surface modelers. The sensor provides continuous soil moisture data, averaged over several hectares and tens of decimeters in depth. However, the signal still may contain unidentified features of hydrological processes, and many calibration datasets are often required in order to find reliable relations between neutron intensity and water dynamics. Recent insights into environmental neutrons accurately described the spatial sensitivity of the sensor and thus allowed one to quantify the contribution of individual sample locations to the CRNS signal. Consequently, data points of calibration and validation datasets are suggested to be averaged using a more physically based weighting approach. In this work, a revised sensitivity function is used to calculate weighted averages of point data. The function is different from the simple exponential convention by the extraordinary sensitivity to the first few meters around the probe, and by dependencies on air pressure, air humidity, soil moisture, and vegetation. The approach is extensively tested at six distinct monitoring sites: two sites with multiple calibration datasets and four sites with continuous time series datasets. In all cases, the revised averaging method improved the performance of the CRNS products. The revised approach further helped to reveal hidden hydrological processes which otherwise remained unexplained in the data or were lost in the process of overcalibration. The presented weighting approach increases the overall accuracy of CRNS products and will have an impact on all their applications in agriculture, hydrology, and modeling.
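As a rough illustration of the weighting idea (not the revised sensitivity function itself, whose dependencies on air pressure, humidity, soil moisture, and vegetation are more involved), a weighted average of point soil-moisture samples with distance- and depth-dependent weights might look like the sketch below; the e-folding scales are assumptions made for the example.

```python
import numpy as np

def weighted_field_average(theta, r, d, r0=150.0, d0=0.2):
    """Illustrative weighted average of point soil moisture samples.

    theta : volumetric soil moisture at each sample point (m3/m3)
    r     : horizontal distance of each sample from the probe (m)
    d     : sample depth (m)
    r0,d0 : assumed e-folding length scales, not the published sensitivity function
    """
    w = np.exp(-np.asarray(r) / r0) * np.exp(-np.asarray(d) / d0)  # radial x depth weighting
    return np.sum(w * np.asarray(theta)) / np.sum(w)

theta = np.array([0.21, 0.25, 0.18, 0.30])
r = np.array([5.0, 40.0, 120.0, 200.0])
d = np.array([0.05, 0.10, 0.20, 0.30])
print(weighted_field_average(theta, r, d))
```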
Zheng, Yalin; Kwong, Man Ting; MacCormick, Ian J. C.; Beare, Nicholas A. V.; Harding, Simon P.
2014-01-01
Capillary non-perfusion (CNP) in the retina is a characteristic feature used in the management of a wide range of retinal diseases. There is no well-established computational tool for assessing the extent of CNP. We propose a novel texture segmentation framework to address this problem. This framework comprises three major steps: pre-processing, unsupervised total variation texture segmentation, and supervised segmentation. It employs a state-of-the-art multiphase total variation texture segmentation model which is enhanced by new kernel-based region terms. The model can be applied to texture and intensity-based multiphase problems. A supervised segmentation step allows the framework to take expert knowledge into account; an AdaBoost classifier with a weighted cost coefficient is chosen to tackle imbalanced data classification problems. To demonstrate its effectiveness, we applied this framework to 48 images from malarial retinopathy and 10 images from ischemic diabetic maculopathy. The performance of segmentation is satisfactory when compared to a reference standard of manual delineations: accuracy, sensitivity and specificity are 89.0%, 73.0%, and 90.8% respectively for the malarial retinopathy dataset and 80.8%, 70.6%, and 82.1% respectively for the diabetic maculopathy dataset. In terms of region-wise analysis, this method achieved an accuracy of 76.3% (45 out of 59 regions) for the malarial retinopathy dataset and 73.9% (17 out of 26 regions) for the diabetic maculopathy dataset. This comprehensive segmentation framework can quantify capillary non-perfusion in retinopathy from two distinct etiologies, and has the potential to be adopted for wider applications. PMID:24747681
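A hedged sketch of the final supervised step: scikit-learn's AdaBoost with per-sample weights inversely proportional to class frequency is one generic way to express a weighted cost coefficient for imbalanced classes. The actual framework's cost settings and features are not reproduced here.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

def fit_cost_weighted_adaboost(X, y, n_estimators=100, random_state=0):
    """Fit AdaBoost with per-sample weights inversely proportional to class frequency.

    Generic imbalanced-data weighting sketch, not the exact cost coefficients
    used in the published segmentation framework.
    """
    classes, counts = np.unique(y, return_counts=True)
    class_weight = {c: len(y) / (len(classes) * n) for c, n in zip(classes, counts)}
    sample_weight = np.array([class_weight[label] for label in y])
    clf = AdaBoostClassifier(n_estimators=n_estimators, random_state=random_state)
    clf.fit(X, y, sample_weight=sample_weight)
    return clf
```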
Moore, Sean M.; Monaghan, Andrew; Griffith, Kevin S.; Apangu, Titus; Mead, Paul S.; Eisen, Rebecca J.
2012-01-01
Climate and weather influence the occurrence, distribution, and incidence of infectious diseases, particularly those caused by vector-borne or zoonotic pathogens. Thus, models based on meteorological data have helped predict when and where human cases are most likely to occur. Such knowledge aids in targeting limited prevention and control resources and may ultimately reduce the burden of diseases. Paradoxically, localities where such models could yield the greatest benefits, such as tropical regions where morbidity and mortality caused by vector-borne diseases is greatest, often lack high-quality in situ local meteorological data. Satellite- and model-based gridded climate datasets can be used to approximate local meteorological conditions in data-sparse regions, however their accuracy varies. Here we investigate how the selection of a particular dataset can influence the outcomes of disease forecasting models. Our model system focuses on plague (Yersinia pestis infection) in the West Nile region of Uganda. The majority of recent human cases have been reported from East Africa and Madagascar, where meteorological observations are sparse and topography yields complex weather patterns. Using an ensemble of meteorological datasets and model-averaging techniques we find that the number of suspected cases in the West Nile region was negatively associated with dry season rainfall (December-February) and positively with rainfall prior to the plague season. We demonstrate that ensembles of available meteorological datasets can be used to quantify climatic uncertainty and minimize its impacts on infectious disease models. These methods are particularly valuable in regions with sparse observational networks and high morbidity and mortality from vector-borne diseases. PMID:23024750
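One simple way to combine models driven by different meteorological datasets is information-criterion weighting. The sketch below shows generic Akaike-weight averaging of per-dataset predictions; it only illustrates the ensemble/model-averaging idea and is not the authors' exact procedure.

```python
import numpy as np

def akaike_weights(aic_values):
    """Akaike weights across models fit to different meteorological datasets."""
    aic = np.asarray(aic_values, dtype=float)
    delta = aic - aic.min()
    w = np.exp(-0.5 * delta)
    return w / w.sum()

def averaged_prediction(predictions, aic_values):
    """Weighted average of case-count predictions, one row per dataset/model."""
    w = akaike_weights(aic_values)
    return np.sum(w[:, None] * np.asarray(predictions, dtype=float), axis=0)

# Illustrative numbers only
preds = [[12, 8, 3], [10, 9, 4], [15, 7, 2]]   # predictions from three datasets
print(averaged_prediction(preds, aic_values=[101.2, 103.5, 99.8]))
```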
NASA Astrophysics Data System (ADS)
Cescatti, A.; Duveiller, G.; Hooker, J.
2017-12-01
Changing vegetation cover not only affects the atmospheric concentration of greenhouse gases but also alters the radiative and non-radiative properties of the surface. The result of competing biophysical processes on Earth's surface energy balance varies spatially and seasonally, and can lead to warming or cooling depending on the specific vegetation change and on the background climate. To date these effects are not accounted for in land-based climate policies because of the complexity of the phenomena, contrasting model predictions and the lack of global data-driven assessments. To overcome the limitations of available observation-based diagnostics and of the on-going model inter-comparison, here we present a new benchmarking dataset derived from satellite remote sensing. This global dataset provides the potential changes induced by multiple vegetation transitions on the single terms of the surface energy balance. We used this dataset for two major goals: 1) Quantify the impact of actual vegetation changes that occurred during the decade 2000-2010, showing the overwhelming role of tropical deforestation in warming the surface by reducing evapotranspiration despite the concurrent brightening of the Earth. 2) Benchmark a series of ESMs against data-driven metrics of the land cover change impacts on the various terms of the surface energy budget and on the surface temperature. We anticipate that the dataset could be also used to evaluate future scenarios of land cover change and to develop the monitoring, reporting and verification guidelines required for the implementation of mitigation plans that account for biophysical land processes.
Diverse manganese(II)-oxidizing bacteria are prevalent in drinking water systems.
Marcus, Daniel N; Pinto, Ameet; Anantharaman, Karthik; Ruberg, Steven A; Kramer, Eva L; Raskin, Lutgarde; Dick, Gregory J
2017-04-01
Manganese (Mn) oxides are highly reactive minerals that influence the speciation, mobility, bioavailability and toxicity of a wide variety of organic and inorganic compounds. Although Mn(II)-oxidizing bacteria are known to catalyze the formation of Mn oxides, little is known about the organisms responsible for Mn oxidation in situ, especially in engineered environments. Mn(II)-oxidizing bacteria are important in drinking water systems, including in biofiltration and water distribution systems. Here, we used cultivation dependent and independent approaches to investigate Mn(II)-oxidizing bacteria in drinking water sources, a treatment plant and associated distribution system. We isolated 29 strains of Mn(II)-oxidizing bacteria and found that highly similar 16S rRNA gene sequences were present in all culture-independent datasets and dominant in the studied drinking water treatment plant. These results highlight a potentially important role for Mn(II)-oxidizing bacteria in drinking water systems, where biogenic Mn oxides may affect water quality in terms of aesthetic appearance, speciation of metals and oxidation of organic and inorganic compounds. Deciphering the ecology of these organisms and the factors that regulate their Mn(II)-oxidizing activity could yield important insights into how microbial communities influence the quality of drinking water. © 2016 Society for Applied Microbiology and John Wiley & Sons Ltd.
Harnessing Diversity towards the Reconstructing of Large Scale Gene Regulatory Networks
Yamanaka, Ryota; Kitano, Hiroaki
2013-01-01
Elucidating gene regulatory networks (GRNs) from large-scale experimental data remains a central challenge in systems biology. Recently, numerous techniques, particularly consensus-driven approaches combining different algorithms, have become a potentially promising strategy to infer accurate GRNs. Here, we develop a novel consensus inference algorithm, TopkNet, which can integrate multiple algorithms to infer GRNs. Comprehensive performance benchmarking on a cloud computing framework demonstrated that (i) a simple strategy to combine many algorithms does not always lead to performance improvement compared to the cost of consensus and (ii) TopkNet integrating only high-performance algorithms provides significant performance improvement compared to the best individual algorithms and community prediction. These results suggest that a priori determination of high-performance algorithms is a key to reconstructing an unknown regulatory network. Similarity among gene-expression datasets can be useful for determining potential optimal algorithms for reconstruction of unknown regulatory networks, i.e., if expression data associated with a known regulatory network are similar to those associated with an unknown regulatory network, optimal algorithms determined for the known regulatory network can be repurposed to infer the unknown regulatory network. Based on this observation, we developed a quantitative measure of similarity among gene-expression datasets and demonstrated that, if similarity between the two expression datasets is high, TopkNet integrating algorithms that are optimal for the known dataset performs well on the unknown dataset. The consensus framework, TopkNet, together with the similarity measure proposed in this study provides a powerful strategy towards harnessing the wisdom of the crowds in reconstruction of unknown regulatory networks. PMID:24278007
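A minimal sketch of the consensus idea: average the rank of each candidate edge across score matrices returned by several inference algorithms. TopkNet's selection of high-performance algorithms and its dataset-similarity measure are not reproduced here.

```python
import numpy as np
from scipy.stats import rankdata

def consensus_edge_scores(score_matrices):
    """Rank-average edge scores from several GRN inference algorithms.

    score_matrices : list of (genes x genes) arrays, higher = more confident edge.
    Returns a matrix of average ranks (higher = stronger consensus).
    Generic consensus sketch, not the TopkNet algorithm itself.
    """
    ranks = [rankdata(m, method="average").reshape(m.shape) for m in score_matrices]
    return np.mean(ranks, axis=0)

# Illustrative use with two random "algorithm outputs"
rng = np.random.default_rng(0)
a, b = rng.random((5, 5)), rng.random((5, 5))
print(consensus_edge_scores([a, b]))
```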
Data-driven probability concentration and sampling on manifold
DOE Office of Scientific and Technical Information (OSTI.GOV)
Soize, C., E-mail: christian.soize@univ-paris-est.fr; Ghanem, R., E-mail: ghanem@usc.edu
2016-09-15
A new methodology is proposed for generating realizations of a random vector with values in a finite-dimensional Euclidean space that are statistically consistent with a dataset of observations of this vector. The probability distribution of this random vector, while a priori not known, is presumed to be concentrated on an unknown subset of the Euclidean space. A random matrix is introduced whose columns are independent copies of the random vector and for which the number of columns is the number of data points in the dataset. The approach is based on the use of (i) the multidimensional kernel-density estimation method for estimating the probability distribution of the random matrix, (ii) a MCMC method for generating realizations for the random matrix, (iii) the diffusion-maps approach for discovering and characterizing the geometry and the structure of the dataset, and (iv) a reduced-order representation of the random matrix, which is constructed using the diffusion-maps vectors associated with the first eigenvalues of the transition matrix relative to the given dataset. The convergence aspects of the proposed methodology are analyzed and a numerical validation is explored through three applications of increasing complexity. The proposed method is found to be robust to noise levels and data complexity as well as to the intrinsic dimension of data and the size of experimental datasets. Both the methodology and the underlying mathematical framework presented in this paper contribute new capabilities and perspectives at the interface of uncertainty quantification, statistical data analysis, stochastic modeling and associated statistical inverse problems.
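The simplest ingredient of this pipeline, generating new realizations statistically consistent with a dataset via multidimensional kernel-density estimation, can be sketched with SciPy as below; the diffusion-maps concentration step, the MCMC sampler, and the reduced-order representation are omitted.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
# Toy dataset: 200 observations of a 3-dimensional random vector
data = rng.multivariate_normal(mean=[0.0, 1.0, 2.0],
                               cov=np.diag([1.0, 0.5, 2.0]), size=200)

# gaussian_kde expects shape (dim, n_samples)
kde = gaussian_kde(data.T)

# Draw 1000 new realizations statistically consistent with the dataset
new_samples = kde.resample(1000).T
print(new_samples.shape)  # (1000, 3)
```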
Howard, B J; Wells, C; Barnett, C L; Howard, D C
2017-02-01
Under the International Atomic Energy Agency (IAEA) MODARIA (Modelling and Data for Radiological Impact Assessments) Programme, there has been an initiative to improve the derivation, provenance and transparency of transfer parameter values for radionuclides from feed to animal products that are for human consumption. A description of the revised MODARIA 2016 cow milk dataset is given in this paper. As previously reported for the MODARIA goat milk dataset, quality control has led to the discounting of some references used in IAEA's Technical Report Series (TRS) report 472 (IAEA, 2010). The number of Concentration Ratio (CR) values has been considerably increased by (i) the inclusion of more literature from agricultural studies, which particularly enhanced the stable isotope data for both CR and Fm, and (ii) by estimating dry matter intake from assumed liveweight. In TRS 472, the data for cow milk were 714 transfer coefficient (Fm) values and 254 CR values describing 31 elements and 26 elements respectively. In the MODARIA 2016 cow milk dataset, Fm and CR values are now reported for 43 elements based upon 825 data values for Fm and 824 for CR. The MODARIA 2016 cow milk dataset Fm values are within an order of magnitude of those reported in TRS 472. Slightly bigger changes are seen in the CR values, but the increase in size of the dataset creates greater confidence in them. Data gaps that still remain are identified for elements with isotopes relevant to radiation protection. Copyright © 2016 The Authors. Published by Elsevier Ltd. All rights reserved.
Global Aerosol Direct Radiative Effect From CALIOP and C3M
NASA Technical Reports Server (NTRS)
Winker, Dave; Kato, Seiji; Tackett, Jason
2015-01-01
Aerosols are responsible for the largest uncertainties in current estimates of climate forcing. These uncertainties are due in part to the limited abilities of passive sensors to retrieve aerosols in cloudy skies. We use a dataset which merges CALIOP observations together with other A-train observations to estimate aerosol radiative effects in cloudy skies as well as in cloud-free skies. The results can be used to quantify the reduction of aerosol radiative effects in cloudy skies relative to clear skies and to reduce current uncertainties in aerosol radiative effects.
CACODYLIC ACID (DMAV): METABOLISM AND ...
The cacodylic acid (DMAV) issue paper discusses the metabolism and pharmacokinetics of the various arsenical chemicals; evaluates the appropriate dataset for quantifying the potential cancer risk posed by the organic arsenical herbicides; provides an evaluation of the mode of carcinogenic action (MOA) for DMAV, including a consideration of the key events for bladder tumor formation in rats and other potential modes of action; and also considers the human relevance of the proposed animal MOA. As part of tolerance reassessment under the Food Quality Protection Act for the August 3, 2006 deadline, the hazard of cacodylic acid is being reassessed.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Crowell, Kevin L.; Slysz, Gordon W.; Baker, Erin Shammel
2013-09-05
We introduce a command line software application LC-IMS-MS Feature Finder that searches for molecular ion signatures in multidimensional liquid chromatography-ion mobility spectrometry-mass spectrometry (LC-IMS-MS) data by clustering deisotoped peaks with similar monoisotopic mass, charge state, LC elution time, and ion mobility drift time values. The software application includes an algorithm for detecting and quantifying co-eluting chemical species, including species that exist in multiple conformations that may have been separated in the IMS dimension.
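A hedged sketch of the clustering idea: greedily group deisotoped peaks that share a charge state and fall within mass, elution-time, and drift-time tolerances. The tolerances and the grouping strategy of the published tool may differ.

```python
import numpy as np

def group_features(peaks, ppm_tol=20.0, lc_tol=0.5, drift_tol=0.3):
    """Greedy grouping of deisotoped peaks into LC-IMS-MS features.

    peaks : list of dicts with keys 'mass', 'charge', 'lc_time', 'drift_time'.
    Tolerances are illustrative assumptions, not the tool's defaults.
    Returns a list of features, each a list of peak indices.
    """
    features = []
    for i, p in enumerate(peaks):
        placed = False
        for feat in features:
            q = peaks[feat[0]]  # compare against the feature's first member
            if (p['charge'] == q['charge']
                    and abs(p['mass'] - q['mass']) / q['mass'] * 1e6 <= ppm_tol
                    and abs(p['lc_time'] - q['lc_time']) <= lc_tol
                    and abs(p['drift_time'] - q['drift_time']) <= drift_tol):
                feat.append(i)
                placed = True
                break
        if not placed:
            features.append([i])
    return features
```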
Understanding Systematics in ZZ Ceti Model Fitting to Enable Differential Seismology
NASA Astrophysics Data System (ADS)
Fuchs, J. T.; Dunlap, B. H.; Clemens, J. C.; Meza, J. A.; Dennihy, E.; Koester, D.
2017-03-01
We are conducting a large spectroscopic survey of over 130 Southern ZZ Cetis with the Goodman Spectrograph on the SOAR Telescope. Because it employs a single instrument with high UV throughput, this survey will both improve the signal-to-noise of the sample of SDSS ZZ Cetis and provide a uniform dataset for model comparison. We are paying special attention to systematics in the spectral fitting and quantify three of those systematics here. We show that relative positions in the log g–Teff plane are consistent for these three systematics.
Rumsey and Walker_AMT_2016_Table 2.xlsx
Table summarizes instrument precision assessed by collocating the two sample boxes. Precision is quantified as the standard deviation of the residuals of an orthogonal least squares regression of concentrations from the two sample boxes. This allows for an estimation of gradient precision and ultimately gradient and flux detection limits. This dataset is associated with the following publication:Rumsey, I. Application of an online ion chromatography-based instrument for gradient flux measurements of speciated nitrogen and sulfur. ENVIRONMENTAL SCIENCE & TECHNOLOGY. American Chemical Society, Washington, DC, USA, 9(6): 2581-2592, (2016).
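A minimal sketch of the stated precision calculation: fit an orthogonal (total) least-squares line with scipy.odr and take the standard deviation of the perpendicular residuals between the two collocated sample boxes. Variable names and the exact residual convention are assumptions, not the authors' script.

```python
import numpy as np
from scipy import odr

def gradient_precision(conc_a, conc_b):
    """Precision as the SD of orthogonal-regression residuals between
    concentrations measured by two collocated sample boxes."""
    conc_a, conc_b = np.asarray(conc_a, float), np.asarray(conc_b, float)
    linear = odr.Model(lambda beta, x: beta[0] * x + beta[1])
    fit = odr.ODR(odr.RealData(conc_a, conc_b), linear, beta0=[1.0, 0.0]).run()
    m, b = fit.beta
    # Perpendicular distance of each point from the fitted line
    residuals = (conc_b - (m * conc_a + b)) / np.sqrt(1.0 + m ** 2)
    return np.std(residuals, ddof=1)

# Illustrative collocated measurements
print(gradient_precision([1.0, 2.1, 3.0, 4.2], [1.1, 2.0, 3.2, 4.1]))
```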
Global Aerosol Direct Radiative Effect from CALIOP and C3M
NASA Technical Reports Server (NTRS)
Winker, Dave; Kato, Seiji; Tackett, Jason
2015-01-01
Aerosols are responsible for the largest uncertainties in current estimates of climate forcing. These uncertainties are due in part to the limited abilities of passive sensors to retrieve aerosols in cloudy skies. We use a dataset which merges CALIOP observations together with other A-train observations to estimate aerosol radiative effects in cloudy skies as well as in cloud-free skies. The results can be used to quantify the reduction of aerosol radiative effects in cloudy skies relative to clear skies and to reduce current uncertainties in aerosol radiative effects.
Federal lands highway phase II benchmarking study
DOT National Transportation Integrated Search
2000-11-01
In order to determine the most effective use of existing and future staff, to quantify the appropriate number of engineers and technicians required to deliver the Federal Lands Highway Program and to identify recommended management practices in proje...
Missing value imputation for microarray data: a comprehensive comparison study and a web tool.
Chiu, Chia-Chun; Chan, Shih-Yao; Wang, Chung-Ching; Wu, Wei-Sheng
2013-01-01
Microarray data are usually peppered with missing values for various reasons. However, most of the downstream analyses for microarray data require complete datasets. Therefore, accurate algorithms for missing value estimation are needed for improving the performance of microarray data analyses. Although many algorithms have been developed, there is still much debate over the selection of the optimal algorithm. The studies about the performance comparison of different algorithms are still far from comprehensive, especially in terms of the number of benchmark datasets used, the number of algorithms compared, the rounds of simulation conducted, and the performance measures used. In this paper, we performed a comprehensive comparison by using (I) thirteen datasets, (II) nine algorithms, (III) 110 independent runs of simulation, and (IV) three types of measures to evaluate the performance of each imputation algorithm fairly. First, the effects of different types of microarray datasets on the performance of each imputation algorithm were evaluated. Second, we discussed whether the datasets from different species have a different impact on the performance of different algorithms. To assess the performance of each algorithm fairly, all evaluations were performed using three types of measures. Our results indicate that the performance of an imputation algorithm mainly depends on the type of dataset rather than on the species from which the samples come. In addition to the statistical measure, two other measures with biological meanings are useful to reflect the impact of missing value imputation on the downstream data analyses. Our study suggests that local-least-squares-based methods are good choices to handle missing values for most of the microarray datasets. In this work, we carried out a comprehensive comparison of the algorithms for microarray missing value imputation. Based on such a comprehensive comparison, researchers can choose the optimal algorithm for their datasets easily. Moreover, new imputation algorithms could be compared with the existing algorithms using this comparison strategy as a standard protocol. In addition, to assist researchers in dealing with missing values easily, we built a web-based and easy-to-use imputation tool, MissVIA (http://cosbi.ee.ncku.edu.tw/MissVIA), which supports many imputation algorithms. Once users upload a real microarray dataset and choose the imputation algorithms, MissVIA will determine the optimal algorithm for the users' data through a series of simulations, and then the imputed results can be downloaded for the downstream data analyses.
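For readers who want a quick neighbour-based imputation in Python, the sketch below uses scikit-learn's KNNImputer as a stand-in that is related in spirit to the KNN/local-least-squares family; it is not the MissVIA tool or the LLS implementations compared in the study.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Toy expression matrix: rows = genes, columns = arrays; NaN marks missing spots
X = np.array([[1.2, 0.8, np.nan, 1.1],
              [0.4, np.nan, 0.5, 0.6],
              [2.1, 2.0, 1.9, np.nan]])

# Neighbour-based imputation (illustrative only)
imputer = KNNImputer(n_neighbors=2)
X_imputed = imputer.fit_transform(X)
print(X_imputed)
```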
Kent, Peter; Jensen, Rikke K; Kongsted, Alice
2014-10-02
There are various methodological approaches to identifying clinically important subgroups, and one method is to identify clusters of characteristics that differentiate people in cross-sectional and/or longitudinal data using Cluster Analysis (CA) or Latent Class Analysis (LCA). There is a scarcity of head-to-head comparisons that can inform the choice of which clustering method might be suitable for particular clinical datasets and research questions. Therefore, the aim of this study was to perform a head-to-head comparison of three commonly available methods (SPSS TwoStep CA, Latent Gold LCA and SNOB LCA). The performance of these three methods was compared: (i) quantitatively using the number of subgroups detected, the classification probability of individuals into subgroups, and the reproducibility of results, and (ii) qualitatively using subjective judgments about each program's ease of use and interpretability of the presentation of results. We analysed five real datasets of varying complexity in a secondary analysis of data from other research projects. Three datasets contained only MRI findings (n = 2,060 to 20,810 vertebral disc levels), one dataset contained only pain intensity data collected for 52 weeks by text (SMS) messaging (n = 1,121 people), and the last dataset contained a range of clinical variables measured in low back pain patients (n = 543 people). Four artificial datasets (n = 1,000 each) containing subgroups of varying complexity were also analysed, testing the ability of these clustering methods to detect subgroups and correctly classify individuals when subgroup membership was known. The results from the real clinical datasets indicated that the number of subgroups detected varied, the certainty of classifying individuals into those subgroups varied, the findings had perfect reproducibility, some programs were easier to use and the interpretability of the presentation of their findings also varied. The results from the artificial datasets indicated that all three clustering methods showed a near-perfect ability to detect known subgroups and correctly classify individuals into those subgroups. Our subjective judgement was that Latent Gold offered the best balance of sensitivity to subgroups, ease of use and presentation of results with these datasets, but we recognise that different clustering methods may suit other types of data and clinical research questions.
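A rough analogue of this workflow for continuous data can be sketched with scikit-learn's Gaussian mixtures, selecting the number of subgroups by BIC and reporting classification probabilities. This is not a reimplementation of TwoStep CA, Latent Gold, or SNOB, and latent class analysis of categorical data behaves differently.

```python
from sklearn.mixture import GaussianMixture

def select_subgroups(X, max_k=6, random_state=0):
    """Fit Gaussian mixtures with 1..max_k components and pick the BIC-optimal one.

    A model-based clustering analogue to LCA for continuous data; illustrative only.
    Returns the chosen number of subgroups, hard labels, and membership probabilities.
    """
    fits = [GaussianMixture(n_components=k, random_state=random_state).fit(X)
            for k in range(1, max_k + 1)]
    best = min(fits, key=lambda g: g.bic(X))
    labels = best.predict(X)
    probs = best.predict_proba(X)   # classification probability per subgroup
    return best.n_components, labels, probs
```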
Fully Convolutional Networks for Ground Classification from LIDAR Point Clouds
NASA Astrophysics Data System (ADS)
Rizaldy, A.; Persello, C.; Gevaert, C. M.; Oude Elberink, S. J.
2018-05-01
Deep learning has been used extensively for image classification in recent years. The use of deep learning for ground classification from LIDAR point clouds has also recently been studied. However, point clouds need to be converted into images in order to use Convolutional Neural Networks (CNNs). In state-of-the-art techniques, this conversion is slow because each point is converted into a separate image. This approach leads to highly redundant computation during conversion and classification. The goal of this study is to design a more efficient data conversion and ground classification. This goal is achieved by first converting the whole point cloud into a single image. The classification is then performed by a Fully Convolutional Network (FCN), a modified version of a CNN designed for pixel-wise image classification. The proposed method is significantly faster than state-of-the-art techniques: on the ISPRS Filter Test dataset, it is 78 times faster for conversion and 16 times faster for classification. Our experimental analysis on the same dataset shows that the proposed method results in 5.22% total error, 4.10% type I error, and 15.07% type II error. Compared to the previous CNN-based technique and LAStools software, the proposed method reduces the total error and type I error (while the type II error is slightly higher). The method was also tested on very high point density LIDAR point clouds, resulting in 4.02% total error, 2.15% type I error and 6.14% type II error.
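A simplified sketch of the "whole point cloud to one image" conversion, rasterizing lowest-return elevation onto a regular grid with NumPy. The paper's pipeline derives additional per-pixel features and uses its own cell size, so this is illustrative only.

```python
import numpy as np

def pointcloud_to_image(xyz, cell_size=1.0):
    """Rasterize a LIDAR point cloud into a single image of lowest-point elevation.

    xyz : (N, 3) array of x, y, z coordinates.
    Returns a 2-D array with the minimum elevation per grid cell (NaN where empty).
    """
    x, y, z = xyz[:, 0], xyz[:, 1], xyz[:, 2]
    col = ((x - x.min()) / cell_size).astype(int)
    row = ((y - y.min()) / cell_size).astype(int)
    img = np.full((row.max() + 1, col.max() + 1), np.nan)
    for r, c, h in zip(row, col, z):
        if np.isnan(img[r, c]) or h < img[r, c]:
            img[r, c] = h          # keep the lowest return per cell
    return img

# Illustrative use with random points
pts = np.random.default_rng(0).random((1000, 3)) * [100.0, 100.0, 10.0]
print(pointcloud_to_image(pts).shape)
```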
Sources of inorganic and monomethyl mercury to high and sub Arctic marine ecosystems
NASA Astrophysics Data System (ADS)
Kirk, Jane Liza
Monomethyl mercury (MMHg), a toxic and bioaccumulative form of Hg, is present in some Canadian high and sub Arctic marine mammals at concentrations high enough to pose health risks to Northern peoples using these animals as food. To quantify potentially large sources of Hg to Arctic marine ecosystems, we examined several aspects of Hg cycling in the Canadian Arctic Archipelago (CAA) and Hudson Bay. Firstly, we quantified net Hg inputs to Hudson Bay from atmospheric Hg depletion events (AMDEs). During AMDEs, gaseous elemental Hg(0) (GEM), which is present in the Arctic atmosphere at global background concentrations, is oxidized to inorganic Hg(II) species that deposit to snowpacks. By simultaneously monitoring Hg in the atmosphere and in snowpacks of western Hudson Bay, we demonstrated that most of the Hg(II) deposited during AMDEs is rapidly (photo)reduced and emitted to the atmosphere. Secondly, we examined Hg speciation in marine waters of the CAA and Hudson Bay. We found high concentrations of MMHg and dimethyl Hg (DMHg; a toxic, gaseous form of Hg) in deep marine waters, where they are likely produced from Hg(II). Arctic marine waters were also found to be a substantial source of DMHg and GEM to the atmosphere. Thirdly, we quantified Hg exports to Hudson Bay from two major rivers, the Nelson and the Churchill, which have been altered for hydroelectric power production. When landscapes are inundated during river diversion or reservoir creation, microbial production of MMHg is stimulated in flooded soils. Newly produced MMHg can then be exported to downstream waterbodies. We found that annual inputs of total Hg (THg; includes both Hg(II) and MMHg) to Hudson Bay from combined Nelson and Churchill River discharge were comparable to inputs from AMDEs. MMHg inputs from river discharge are, however, ˜13 times greater than those from annual snowmelt of Hudson Bay snowpacks. Finally, although combined river and AMDE Hg inputs may account for a large portion of the THg pool in Hudson Bay, these inputs account for a lesser portion of the MMHg pool, thus highlighting the importance of water column Hg(II) methylation as a source of MMHg to Arctic marine foodwebs.
Transitional fossils and the origin of turtles
Lyson, Tyler R.; Bever, Gabe S.; Bhullar, Bhart-Anjan S.; Joyce, Walter G.; Gauthier, Jacques A.
2010-01-01
The origin of turtles is one of the most contentious issues in systematics with three currently viable hypotheses: turtles as the extant sister to (i) the crocodile–bird clade, (ii) the lizard–tuatara clade, or (iii) Diapsida (a clade composed of (i) and (ii)). We reanalysed a recent dataset that allied turtles with the lizard–tuatara clade and found that the inclusion of the stem turtle Proganochelys quenstedti and the ‘parareptile’ Eunotosaurus africanus results in a single overriding morphological signal, with turtles outside Diapsida. This result reflects the importance of transitional fossils when long branches separate crown clades, and highlights unexplored issues such as the role of topological congruence when using fossils to calibrate molecular clocks. PMID:20534602
Crow, Megan; Paul, Anirban; Ballouz, Sara; Huang, Z Josh; Gillis, Jesse
2018-02-28
Single-cell RNA-sequencing (scRNA-seq) technology provides a new avenue to discover and characterize cell types; however, the experiment-specific technical biases and analytic variability inherent to current pipelines may undermine its replicability. Meta-analysis is further hampered by the use of ad hoc naming conventions. Here we demonstrate our replication framework, MetaNeighbor, that quantifies the degree to which cell types replicate across datasets, and enables rapid identification of clusters with high similarity. We first measure the replicability of neuronal identity, comparing results across eight technically and biologically diverse datasets to define best practices for more complex assessments. We then apply this to novel interneuron subtypes, finding that 24/45 subtypes have evidence of replication, which enables the identification of robust candidate marker genes. Across tasks we find that large sets of variably expressed genes can identify replicable cell types with high accuracy, suggesting a general route forward for large-scale evaluation of scRNA-seq data.
Shingrani, Rahul; Krenz, Gary; Molthen, Robert
2010-01-01
With advances in medical imaging scanners, it has become commonplace to generate large multidimensional datasets. These datasets require tools for a rapid, thorough analysis. To address this need, we have developed an automated algorithm for morphometric analysis incorporating A Visualization Workshop computational and image processing libraries for three-dimensional segmentation, vascular tree generation and structural hierarchical ordering with a two-stage numeric optimization procedure for estimating vessel diameters. We combine this new technique with our mathematical models of pulmonary vascular morphology to quantify structural and functional attributes of lung arterial trees. Our physiological studies require repeated measurements of vascular structure to determine differences in vessel biomechanical properties between animal models of pulmonary disease. Automation provides many advantages including significantly improved speed and minimized operator interaction and biasing. The results are validated by comparison with previously published rat pulmonary arterial micro-CT data analysis techniques, in which vessels were manually mapped and measured using intense operator intervention. Published by Elsevier Ireland Ltd.
Woo, Jongmin; Han, Dohyun; Park, Joonho; Kim, Sang Jeong; Kim, Youngsoo
2015-11-01
Microglia, astrocytes, and neurons, which have important functions in the central nervous system (CNS), communicate mutually to generate a signal through secreted proteins or small molecules, but many of which have not been identified. Because establishing a reference for the secreted proteins from CNS cells could be invaluable in examining cell-to-cell communication in the brain, we analyzed the secretome of three murine CNS cell lines without prefractionation by high-resolution mass spectrometry. In this study, 2795 proteins were identified from conditioned media of the three cell lines, and 2125 proteins were annotated as secreted proteins by bioinformatics analysis. Further, approximately 500 secreted proteins were quantifiable as differentially expressed proteins by label-free quantitation. As a result, our secretome references are useful datasets for the future study of neuronal diseases. All MS data have been deposited in the ProteomeXchange with identifier PXD001597 (http://proteomecentral.proteomexchange.org/dataset/PXD001597). © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Everywhere and nowhere: snow and its linkages
NASA Astrophysics Data System (ADS)
Hiemstra, C. A.
2017-12-01
Interest has grown in quantifying higher-latitude precipitation change and snow-related ecosystem and economic impacts. There is high demand for creating and using snow-related datasets, yet available datasets contain limitations, are not scale-appropriate, or lack thorough validation. Much of the uncertainty in snow estimates relates to ongoing snow measurement problems that are chronic and pervasive in windy, Arctic environments. This, coupled with diminishing support for long-term snow field observations, creates formidable hydrologic gaps in snow-dominated landscapes. Snow touches most aspects of high-latitude landscapes, spanning albedo, ecosystems, soils, permafrost, and sea ice. In turn, snow can be affected by disturbances, landscape change, ecosystem structure, and the later arrival of sea or lake ice. Snow and its changes touch infrastructure, housing, and transportation. Advances in snow measurement, modeling, and data assimilation are under way, but more attention and a concerted effort are needed, in a time of dwindling resources, to make the required advances during a period of rapid change.
Automated Spatiotemporal Analysis of Fibrils and Coronal Rain Using the Rolling Hough Transform
NASA Astrophysics Data System (ADS)
Schad, Thomas
2017-09-01
A technique is presented that automates the direction characterization of curvilinear features in multidimensional solar imaging datasets. It is an extension of the Rolling Hough Transform (RHT) technique presented by Clark, Peek, and Putman ( Astrophys. J. 789, 82, 2014), and it excels at rapid quantification of spatial and spatiotemporal feature orientation even for applications with a low signal-to-noise ratio. It operates on a pixel-by-pixel basis within a dataset and reliably quantifies orientation even for locations not centered on a feature ridge, which is used here to derive a quasi-continuous map of the chromospheric fine-structure projection angle. For time-series analysis, a procedure is developed that uses a hierarchical application of the RHT to automatically derive the apparent motion of coronal rain observed off-limb. Essential to the success of this technique is the formulation presented in this article for the RHT error analysis as it provides a means to properly filter results.
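As a rough illustration (not the Rolling Hough Transform itself, which also unsharp-masks, thresholds, normalizes, and rolls a circular window across the full dataset), the dominant orientation within a small binary window can be estimated with a standard Hough transform from scikit-image:

```python
import numpy as np
from skimage.transform import hough_line

def local_orientation(window, n_angles=180):
    """Dominant line orientation (degrees) within a small binary image window.

    A simplified stand-in for the RHT's per-pixel orientation estimate;
    the window is assumed to already be binarized.
    """
    angles = np.linspace(-np.pi / 2, np.pi / 2, n_angles, endpoint=False)
    accumulator, thetas, _ = hough_line(window.astype(np.uint8), theta=angles)
    _, best_theta_idx = np.unravel_index(np.argmax(accumulator), accumulator.shape)
    return np.degrees(thetas[best_theta_idx])

# Illustrative use: a diagonal line in a 21x21 window
win = np.eye(21, dtype=bool)
print(local_orientation(win))
```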
Global climate shocks to agriculture from 1950 - 2015
NASA Astrophysics Data System (ADS)
Jackson, N. D.; Konar, M.; Debaere, P.; Sheffield, J.
2016-12-01
Climate shocks represent a major disruption to crop yields and agricultural production, yet a consistent and comprehensive database of agriculturally relevant climate shocks does not exist. To this end, we conduct a spatially and temporally disaggregated analysis of climate shocks to agriculture from 1950-2015 using a new gridded dataset. We quantify the occurrence and magnitude of climate shocks for all global agricultural areas during the growing season using a 0.25-degree spatial grid and daily time scale. We include all major crops and both temperature and precipitation extremes in our analysis. Critically, we evaluate climate shocks to all potential agricultural areas to improve projections within our time series. To do this, we use Global Agro-Ecological Zones maps from the Food and Agricultural Organization, the Princeton Global Meteorological Forcing dataset, and crop calendars from Sacks et al. (2010). We trace the dynamic evolution of climate shocks to agriculture, evaluate the spatial heterogeneity in agriculturally relevant climate shocks, and identify the crops and regions that are most prone to climate shocks.
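A minimal sketch of one shock metric: counting growing-season days above a heat-stress threshold for a single grid cell and year. The threshold and window here are placeholders, since the study's definitions are crop- and percentile-specific.

```python
import numpy as np

def count_heat_shock_days(tmax_daily, doy, season_start, season_end, threshold_c=35.0):
    """Count growing-season days with daily maximum temperature above a threshold.

    tmax_daily : daily maximum temperature (deg C) for one grid cell and year
    doy        : day-of-year for each entry
    season_start, season_end : growing-season window from a crop calendar (day-of-year)
    threshold_c : illustrative heat-stress threshold (crop-specific in practice)
    """
    tmax_daily, doy = np.asarray(tmax_daily), np.asarray(doy)
    in_season = (doy >= season_start) & (doy <= season_end)
    return int(np.sum(tmax_daily[in_season] > threshold_c))

# Illustrative year of data
rng = np.random.default_rng(1)
print(count_heat_shock_days(20 + 18 * rng.random(365), np.arange(1, 366), 120, 270))
```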
CImbinator: a web-based tool for drug synergy analysis in small- and large-scale datasets.
Flobak, Åsmund; Vazquez, Miguel; Lægreid, Astrid; Valencia, Alfonso
2017-08-01
Drug synergies are sought to identify drug combinations that are particularly beneficial. User-friendly software solutions that can assist analysis of large-scale datasets are required. CImbinator is a web service that can aid in batch-wise and in-depth analyses of data from small-scale and large-scale drug combination screens. CImbinator can quantify drug combination effects using both the commonly employed median-effect equation and advanced experimental mathematical models describing dose-response relationships. CImbinator is written in Ruby and R. It uses the R package drc for advanced drug response modeling. CImbinator is available at http://cimbinator.bioinfo.cnio.es; the source code is open and available at https://github.com/Rbbt-Workflows/combination_index. A Docker image is also available at https://hub.docker.com/r/mikisvaz/rbbt-ci_mbinator/. Contact: asmund.flobak@ntnu.no or miguel.vazquez@cnio.es. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
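A hedged sketch of the median-effect arithmetic that underlies a Chou-Talalay style combination index; the parameter values are invented for illustration, and the CImbinator service itself additionally fits the more advanced dose-response models mentioned above.

```python
def dose_for_effect(fa, dm, m):
    """Median-effect equation: dose giving fractional effect fa,
    given median-effect dose dm and slope m."""
    return dm * (fa / (1.0 - fa)) ** (1.0 / m)

def combination_index(d1, d2, fa, dm1, m1, dm2, m2):
    """Combination index for doses (d1, d2) producing fractional effect fa.
    CI < 1 suggests synergy, CI > 1 antagonism. dm*/m* come from single-drug
    dose-response fits (placeholders here)."""
    return d1 / dose_for_effect(fa, dm1, m1) + d2 / dose_for_effect(fa, dm2, m2)

# Illustrative numbers, not from any real screen
print(combination_index(d1=2.0, d2=5.0, fa=0.6, dm1=4.0, m1=1.1, dm2=12.0, m2=0.9))
```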
U.S. Maternally Linked Birth Records May Be Biased for Hispanics and Other Population Groups
LEISS, JACK K.; GILES, DENISE; SULLIVAN, KRISTIN M.; MATHEWS, RAHEL; SENTELLE, GLENDA; TOMASHEK, KAY M.
2010-01-01
Purpose To advance understanding of linkage error in U.S. maternally linked datasets, and how the error may affect results of studies based on the linked data. Methods North Carolina birth and fetal death records for 1988-1997 were maternally linked (n=1,030,029). The maternal set probability, defined as the probability that all records assigned to the same maternal set do in fact represent events to the same woman, was used to assess differential maternal linkage error across race/ethnic groups. Results Maternal set probabilities were lower for records specifying Asian or Hispanic race/ethnicity, suggesting greater maternal linkage error. The lower probabilities for Hispanics were concentrated in women of Mexican origin who were not born in the United States. Conclusions Differential maternal linkage error may be a source of bias in studies using U.S. maternally linked datasets to make comparisons between Hispanics and other groups or among Hispanic subgroups. Methods to quantify and adjust for this potential bias are needed. PMID:20006273
Paraskevopoulou, Sivylla E; Wu, Di; Eftekhar, Amir; Constandinou, Timothy G
2014-09-30
This work presents a novel unsupervised algorithm for real-time adaptive clustering of neural spike data (spike sorting). The proposed Hierarchical Adaptive Means (HAM) clustering method combines centroid-based clustering with hierarchical cluster connectivity to classify incoming spikes using groups of clusters. We describe how the proposed method adaptively tracks the incoming spike data without requiring any past history, iteration, or training, and autonomously determines the number of spike classes. Its performance (classification accuracy) has been tested using multiple datasets (both simulated and recorded), achieving near-identical accuracy compared to k-means (using 10 iterations and provided with the number of spike classes). Its robustness across different feature extraction methods has also been demonstrated, with classification accuracies above 80% across multiple datasets. Finally, and crucially, its low complexity, quantified in terms of both memory and computation requirements, makes this method highly attractive for future hardware implementation. Copyright © 2014 Elsevier B.V. All rights reserved.
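The following sketch illustrates the general flavour of adaptive, training-free centroid clustering described above: a spike joins the nearest centroid if it falls within a radius, otherwise it seeds a new cluster. It is a simplified stand-in, not the published HAM algorithm; the radius parameter and running-mean update are assumptions.

```python
import numpy as np

def adaptive_cluster(spikes, radius):
    """Illustrative online centroid clustering: each incoming spike feature
    vector joins the nearest centroid if it lies within `radius`, otherwise it
    seeds a new cluster; centroids are updated as running means."""
    centroids, counts, labels = [], [], []
    for x in spikes:
        if centroids:
            d = np.linalg.norm(np.asarray(centroids) - x, axis=1)
            j = int(np.argmin(d))
        if not centroids or d[j] > radius:
            centroids.append(x.astype(float).copy())          # start a new cluster
            counts.append(1)
            labels.append(len(centroids) - 1)
        else:
            counts[j] += 1
            centroids[j] += (x - centroids[j]) / counts[j]     # running-mean update
            labels.append(j)
    return np.array(labels), np.array(centroids)

rng = np.random.default_rng(0)
spikes = np.vstack([rng.normal(0, 0.2, (50, 2)), rng.normal(3, 0.2, (50, 2))])
labels, cents = adaptive_cluster(spikes, radius=1.0)
print(len(cents), "clusters found")   # expected: 2
```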
Status and interconnections of selected environmental issues in the global coastal zones
Shi, Hua; Singh, Ashbindu
2003-01-01
This study focuses on assessing the state of population distribution, land cover distribution, biodiversity hotspots, and protected areas in global coastal zones. The coastal zone is defined as land within 100 km of the coastline. This study attempts to answer such questions as: how crowded are the coastal zones, what is the pattern of land cover distribution in these areas, how much of these areas are designated as protected areas, what is the state of the biodiversity hotspots, and what are the interconnections between people and the coastal environment. This study uses globally consistent and comprehensive geospatial datasets based on remote sensing and other sources. The application of Geographic Information System (GIS) layering methods and consistent datasets has made it possible to identify and quantify selected coastal zone environmental issues and their interconnections. It is expected that such information provides a scientific basis for global coastal zone management and assists in policy formulation at the national and international levels.
Combined Effects of High-Speed Railway Noise and Ground Vibrations on Annoyance.
Yokoshima, Shigenori; Morihara, Takashi; Sato, Tetsumi; Yano, Takashi
2017-07-27
The Shinkansen super-express railway system in Japan has greatly increased its capacity and has expanded nationwide. However, many inhabitants in areas along the railways have been disturbed by noise and ground vibration from the trains. Additionally, the Shinkansen railway emits a higher level of ground vibration than conventional railways at the same noise level. These findings imply that building vibrations affect living environments as significantly as the associated noise. Therefore, it is imperative to quantify the effects of noise and vibration exposures on each annoyance under simultaneous exposure. We performed a secondary analysis using individual datasets of exposure and community response associated with Shinkansen railway noise and vibration. The data consisted of six socio-acoustic surveys, which were conducted separately over the last 20 years in Japan. Applying a logistic regression analysis to the datasets, we confirmed the combined effects of vibration/noise exposure on noise/vibration annoyance. Moreover, we proposed a representative relationship between noise and vibration exposures, and the prevalence of each annoyance associated with the Shinkansen railway.
Recent development in preparation of European soil hydraulic maps
NASA Astrophysics Data System (ADS)
Toth, B.; Weynants, M.; Pasztor, L.; Hengl, T.
2017-12-01
Reliable quantitative information on soil hydraulic properties is crucial for modelling hydrological, meteorological, ecological and biological processes of the Critical Zone. Most Earth system models need information on soil moisture retention capacity and hydraulic conductivity in the full matric potential range. These soil hydraulic properties can be quantified, but their measurement is expensive and time consuming, therefore measurement-based catchment-scale mapping of these soil properties is not possible. The increasing availability of soil information and of methods describing relationships between simple soil characteristics and soil hydraulic properties provides the possibility to derive soil hydraulic maps based on spatial soil datasets and pedotransfer functions (PTFs). Over the last decade there has been significant development in the preparation of soil hydraulic maps. Spatial datasets on model parameters describing soil hydraulic processes have become available for countries, continents and even for the whole globe. Our aim is to present European soil hydraulic maps, show their performance, highlight their advantages and drawbacks, and propose possible ways to further improve their performance.
Multilingual Twitter Sentiment Classification: The Role of Human Annotators
Mozetič, Igor; Grčar, Miha; Smailović, Jasmina
2016-01-01
What are the limits of automated Twitter sentiment classification? We analyze a large set of manually labeled tweets in different languages, use them as training data, and construct automated classification models. It turns out that the quality of classification models depends much more on the quality and size of training data than on the type of the model trained. Experimental results indicate that there is no statistically significant difference between the performance of the top classification models. We quantify the quality of training data by applying various annotator agreement measures, and identify the weakest points of different datasets. We show that the model performance approaches the inter-annotator agreement when the size of the training set is sufficiently large. However, it is crucial to regularly monitor the self- and inter-annotator agreements since this improves the training datasets and consequently the model performance. Finally, we show that there is strong evidence that humans perceive the sentiment classes (negative, neutral, and positive) as ordered. PMID:27149621
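One way to make the annotator-agreement idea concrete is Cohen's kappa for a pair of annotators, sketched below. The study applies several agreement measures, so this is only an illustrative example with hypothetical labels.

```python
import numpy as np

def cohens_kappa(a, b, labels=("negative", "neutral", "positive")):
    """Cohen's kappa between two annotators over the same tweets: observed
    agreement corrected for the agreement expected by chance given each
    annotator's label distribution."""
    a, b = np.asarray(a), np.asarray(b)
    po = np.mean(a == b)                                             # observed agreement
    pe = sum(np.mean(a == l) * np.mean(b == l) for l in labels)      # chance agreement
    return (po - pe) / (1.0 - pe)

ann1 = ["positive", "neutral", "negative", "positive", "neutral", "negative"]
ann2 = ["positive", "neutral", "neutral",  "positive", "negative", "negative"]
print(round(cohens_kappa(ann1, ann2), 3))   # 0.5 for this toy pair of annotations
```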
Use of multidimensional, multimodal imaging and PACS to support neurological diagnoses
NASA Astrophysics Data System (ADS)
Wong, Stephen T. C.; Knowlton, Robert C.; Hoo, Kent S.; Huang, H. K.
1995-05-01
Technological advances in brain imaging have revolutionized diagnosis in neurology and neurological surgery. Major imaging techniques include magnetic resonance imaging (MRI) to visualize structural anatomy, positron emission tomography (PET) to image metabolic function and cerebral blood flow, magnetoencephalography (MEG) to visualize the location of physiologic current sources, and magnetic resonance spectroscopy (MRS) to measure specific biochemicals. Each of these techniques studies different biomedical aspects of the brain, but an effective means to quantify and correlate the disparate imaging datasets, needed to improve clinical decision-making, is lacking. This paper describes several techniques developed in a UNIX-based neurodiagnostic workstation to aid the noninvasive presurgical evaluation of epilepsy patients. These techniques include online access to the picture archiving and communication systems (PACS) multimedia archive, coregistration of multimodality image datasets, and correlation and quantitation of structural and functional information contained in the registered images. For illustration, we describe the use of these techniques in a patient case of nonlesional neocortical epilepsy. We also present our future work based on preliminary studies.
Meyer, Patrick E; Lafitte, Frédéric; Bontempi, Gianluca
2008-10-29
This paper presents the R/Bioconductor package minet (version 1.1.6) which provides a set of functions to infer mutual information networks from a dataset. Once fed with a microarray dataset, the package returns a network where nodes denote genes, edges model statistical dependencies between genes and the weight of an edge quantifies the statistical evidence of a specific (e.g transcriptional) gene-to-gene interaction. Four different entropy estimators are made available in the package minet (empirical, Miller-Madow, Schurmann-Grassberger and shrink) as well as four different inference methods, namely relevance networks, ARACNE, CLR and MRNET. Also, the package integrates accuracy assessment tools, like F-scores, PR-curves and ROC-curves in order to compare the inferred network with a reference one. The package minet provides a series of tools for inferring transcriptional networks from microarray data. It is freely available from the Comprehensive R Archive Network (CRAN) as well as from the Bioconductor website.
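For readers outside R, the sketch below shows the relevance-network flavour of what minet does: estimate pairwise mutual information from discretized expression profiles and store it as a weighted adjacency matrix. This is a Python illustration of the concept, not the minet API; the bin count and plug-in estimator are assumptions.

```python
import numpy as np
from itertools import combinations

def mutual_information(x, y, bins=5):
    """Empirical (plug-in) mutual information between two expression profiles
    after equal-width discretization."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def mi_network(expr, gene_names, bins=5):
    """Weighted adjacency matrix whose entries quantify the statistical
    dependency between every pair of genes (relevance-network style)."""
    g = len(gene_names)
    adj = np.zeros((g, g))
    for i, j in combinations(range(g), 2):
        adj[i, j] = adj[j, i] = mutual_information(expr[:, i], expr[:, j], bins)
    return adj

rng = np.random.default_rng(1)
samples = rng.normal(size=(100, 3))
samples[:, 1] = samples[:, 0] + 0.1 * rng.normal(size=100)   # gene B tracks gene A
print(np.round(mi_network(samples, ["A", "B", "C"]), 2))     # strong A-B edge expected
```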
Spectrum simulation in DTSA-II.
Ritchie, Nicholas W M
2009-10-01
Spectrum simulation is a useful practical and pedagogical tool. Particularly with complex samples or trace constituents, a simulation can help to understand the limits of the technique and the instrument parameters for the optimal measurement. DTSA-II, software for electron probe microanalysis, provides both easy to use and flexible tools for simulating common and less common sample geometries and materials. Analytical models based on φ(ρz) curves provide quick simulations of simple samples. Monte Carlo models based on electron and X-ray transport provide more sophisticated models of arbitrarily complex samples. DTSA-II provides a broad range of simulation tools in a framework with many different interchangeable physical models. In addition, DTSA-II provides tools for visualizing, comparing, manipulating, and quantifying simulated and measured spectra.
Therapeutic Evaluation of Mesenchymal Stem Cells in Chronic Gut Inflammation
2015-09-01
activate mouse splenocytes obtained from OT2 transgenic (tg) mice with ovalbumin peptide (OVA) and quantify T cell proliferation in vitro. The T...cell receptors (TCR) on CD4+ T cells in OT2 tg mice recognize only OVA presented by the major histocompatibility complex II (MHC II) expressed on...mouse OT2 splenocytes with OVA in the presence of increasing numbers of un-manipulated or irradiated hMSCs, we observe little or no suppression of T
Network analysis reveals multiscale controls on streamwater chemistry
McGuire, Kevin J.; Torgersen, Christian E.; Likens, Gene E.; Buso, Donald C.; Lowe, Winsor H.; Bailey, Scott W.
2014-01-01
By coupling synoptic data from a basin-wide assessment of streamwater chemistry with network-based geostatistical analysis, we show that spatial processes differentially affect biogeochemical condition and pattern across a headwater stream network. We analyzed a high-resolution dataset consisting of 664 water samples collected every 100 m throughout 32 tributaries in an entire fifth-order stream network. These samples were analyzed for an exhaustive suite of chemical constituents. The fine grain and broad extent of this study design allowed us to quantify spatial patterns over a range of scales by using empirical semivariograms that explicitly incorporated network topology. Here, we show that spatial structure, as determined by the characteristic shape of the semivariograms, differed both among chemical constituents and by spatial relationship (flow-connected, flow-unconnected, or Euclidean). Spatial structure was apparent at either a single scale or at multiple nested scales, suggesting separate processes operating simultaneously within the stream network and surrounding terrestrial landscape. Expected patterns of spatial dependence for flow-connected relationships (e.g., increasing homogeneity with downstream distance) occurred for some chemical constituents (e.g., dissolved organic carbon, sulfate, and aluminum) but not for others (e.g., nitrate, sodium). By comparing semivariograms for the different chemical constituents and spatial relationships, we were able to separate effects on streamwater chemistry of (i) fine-scale versus broad-scale processes and (ii) in-stream processes versus landscape controls. These findings provide insight on the hierarchical scaling of local, longitudinal, and landscape processes that drive biogeochemical patterns in stream networks. PMID:24753575
Network analysis reveals multiscale controls on streamwater chemistry.
McGuire, Kevin J; Torgersen, Christian E; Likens, Gene E; Buso, Donald C; Lowe, Winsor H; Bailey, Scott W
2014-05-13
By coupling synoptic data from a basin-wide assessment of streamwater chemistry with network-based geostatistical analysis, we show that spatial processes differentially affect biogeochemical condition and pattern across a headwater stream network. We analyzed a high-resolution dataset consisting of 664 water samples collected every 100 m throughout 32 tributaries in an entire fifth-order stream network. These samples were analyzed for an exhaustive suite of chemical constituents. The fine grain and broad extent of this study design allowed us to quantify spatial patterns over a range of scales by using empirical semivariograms that explicitly incorporated network topology. Here, we show that spatial structure, as determined by the characteristic shape of the semivariograms, differed both among chemical constituents and by spatial relationship (flow-connected, flow-unconnected, or Euclidean). Spatial structure was apparent at either a single scale or at multiple nested scales, suggesting separate processes operating simultaneously within the stream network and surrounding terrestrial landscape. Expected patterns of spatial dependence for flow-connected relationships (e.g., increasing homogeneity with downstream distance) occurred for some chemical constituents (e.g., dissolved organic carbon, sulfate, and aluminum) but not for others (e.g., nitrate, sodium). By comparing semivariograms for the different chemical constituents and spatial relationships, we were able to separate effects on streamwater chemistry of (i) fine-scale versus broad-scale processes and (ii) in-stream processes versus landscape controls. These findings provide insight on the hierarchical scaling of local, longitudinal, and landscape processes that drive biogeochemical patterns in stream networks.
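A compact sketch of the empirical semivariogram calculation underlying this analysis follows; it uses plain Euclidean distance for brevity, whereas the study also used flow-connected and flow-unconnected network distances, and the lag bins and toy data are assumptions.

```python
import numpy as np

def empirical_semivariogram(coords, values, lag_edges):
    """Empirical semivariogram: for each distance lag bin, half the mean
    squared difference between all sample pairs separated by that distance."""
    coords = np.asarray(coords, float)
    values = np.asarray(values, float)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    sq = 0.5 * (values[:, None] - values[None, :]) ** 2
    iu = np.triu_indices(len(values), k=1)            # count each pair once
    d, sq = d[iu], sq[iu]
    gamma = [sq[(d >= lo) & (d < hi)].mean() if np.any((d >= lo) & (d < hi)) else np.nan
             for lo, hi in zip(lag_edges[:-1], lag_edges[1:])]
    return np.array(gamma)

# toy data: samples every 100 m along a single reach, with a downstream trend
coords = np.column_stack([np.arange(0, 1000, 100), np.zeros(10)])
doc = np.linspace(5.0, 2.0, 10) + np.random.default_rng(2).normal(0, 0.1, 10)
print(empirical_semivariogram(coords, doc, lag_edges=np.array([0, 150, 350, 650, 1000])))
```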
Meteorological Drivers of Extreme Air Pollution Events
NASA Astrophysics Data System (ADS)
Horton, D. E.; Schnell, J.; Callahan, C. W.; Suo, Y.
2017-12-01
The accumulation of pollutants in the near-surface atmosphere has been shown to have deleterious consequences for public health, agricultural productivity, and economic vitality. Natural and anthropogenic emissions of ozone and particulate matter can accumulate to hazardous concentrations when atmospheric conditions are favorable, and can reach extreme levels when such conditions persist. Favorable atmospheric conditions for pollutant accumulation include optimal temperatures for photochemical reaction rates, circulation patterns conducive to pollutant advection, and a lack of ventilation, dispersion, and scavenging in the local environment. Given our changing climate system and the dual ingredients of poor air quality - pollutants and the atmospheric conditions favorable to their accumulation - it is important to characterize recent changes in favorable meteorological conditions, and quantify their potential contribution to recent extreme air pollution events. To facilitate our characterization, this study employs the recently updated Schnell et al (2015) 1°×1° gridded observed surface ozone and particulate matter datasets for the period of 1998 to 2015, in conjunction with reanalysis and climate model simulation data. We identify extreme air pollution episodes in the observational record and assess the meteorological factors of primary support at local and synoptic scales. We then assess (i) the contribution of observed meteorological trends (if extant) to the magnitude of the event, (ii) the return interval of the meteorological event in the observational record, simulated historical climate, and simulated pre-industrial climate, as well as (iii) the probability of the observed meteorological trend in historical and pre-industrial climates.
On-demand high-capacity ride-sharing via dynamic trip-vehicle assignment
Alonso-Mora, Javier; Samaranayake, Samitha; Wallar, Alex; Frazzoli, Emilio; Rus, Daniela
2017-01-01
Ride-sharing services are transforming urban mobility by providing timely and convenient transportation to anybody, anywhere, and anytime. These services present enormous potential for positive societal impacts with respect to pollution, energy consumption, congestion, etc. Current mathematical models, however, do not fully address the potential of ride-sharing. Recently, a large-scale study highlighted some of the benefits of car pooling but was limited to static routes with two riders per vehicle (optimally) or three (with heuristics). We present a more general mathematical model for real-time high-capacity ride-sharing that (i) scales to large numbers of passengers and trips and (ii) dynamically generates optimal routes with respect to online demand and vehicle locations. The algorithm starts from a greedy assignment and improves it through a constrained optimization, quickly returning solutions of good quality and converging to the optimal assignment over time. We quantify experimentally the tradeoff between fleet size, capacity, waiting time, travel delay, and operational costs for low- to medium-capacity vehicles, such as taxis and van shuttles. The algorithm is validated with ∼3 million rides extracted from the New York City taxicab public dataset. Our experimental study considers ride-sharing with rider capacity of up to 10 simultaneous passengers per vehicle. The algorithm applies to fleets of autonomous vehicles and also incorporates rebalancing of idling vehicles to areas of high demand. This framework is general and can be used for many real-time multivehicle, multitask assignment problems. PMID:28049820
On-demand high-capacity ride-sharing via dynamic trip-vehicle assignment.
Alonso-Mora, Javier; Samaranayake, Samitha; Wallar, Alex; Frazzoli, Emilio; Rus, Daniela
2017-01-17
Ride-sharing services are transforming urban mobility by providing timely and convenient transportation to anybody, anywhere, and anytime. These services present enormous potential for positive societal impacts with respect to pollution, energy consumption, congestion, etc. Current mathematical models, however, do not fully address the potential of ride-sharing. Recently, a large-scale study highlighted some of the benefits of car pooling but was limited to static routes with two riders per vehicle (optimally) or three (with heuristics). We present a more general mathematical model for real-time high-capacity ride-sharing that (i) scales to large numbers of passengers and trips and (ii) dynamically generates optimal routes with respect to online demand and vehicle locations. The algorithm starts from a greedy assignment and improves it through a constrained optimization, quickly returning solutions of good quality and converging to the optimal assignment over time. We quantify experimentally the tradeoff between fleet size, capacity, waiting time, travel delay, and operational costs for low- to medium-capacity vehicles, such as taxis and van shuttles. The algorithm is validated with ∼3 million rides extracted from the New York City taxicab public dataset. Our experimental study considers ride-sharing with rider capacity of up to 10 simultaneous passengers per vehicle. The algorithm applies to fleets of autonomous vehicles and also incorporates rebalancing of idling vehicles to areas of high demand. This framework is general and can be used for many real-time multivehicle, multitask assignment problems.
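The sketch below illustrates only the greedy seeding step described in the abstract (each request goes to the nearest feasible vehicle); in the published method this seed is then improved by a constrained optimization over trip-vehicle combinations. Function names, the wait-time cutoff, and the toy coordinates are assumptions.

```python
import numpy as np

def greedy_assignment(vehicle_pos, request_pos, capacity_left, max_wait):
    """Greedy seed assignment: each request is given to the feasible vehicle
    whose pickup distance is smallest, subject to remaining capacity and a
    maximum pickup distance (a proxy for waiting time)."""
    assignment = {}
    for r, rp in enumerate(request_pos):
        dists = np.linalg.norm(vehicle_pos - rp, axis=1)
        for v in np.argsort(dists):
            if capacity_left[v] > 0 and dists[v] <= max_wait:
                assignment[r] = int(v)
                capacity_left[v] -= 1
                break                      # request served; otherwise it stays unassigned
    return assignment

vehicles = np.array([[0.0, 0.0], [5.0, 5.0]])
requests = np.array([[0.5, 0.2], [4.5, 5.1], [0.1, 0.4]])
print(greedy_assignment(vehicles, requests, capacity_left=[2, 2], max_wait=3.0))
# e.g. {0: 0, 1: 1, 2: 0}
```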
Risk aversion, time preference and health production: theory and empirical evidence from Cambodia.
Rieger, Matthias
2015-04-01
This paper quantifies the relationship between risk aversion and discount rates on the one hand and height and weight on the other. It studies this link in the context of poor households in Cambodia. Evidence is based on an original dataset that contains both experimental measures of risk taking and impatience along with anthropometric measurements of children and adults. The aim of the paper is to (i) explore the importance of risk and time preferences in explaining undernutrition and (ii) compare the evidence stemming from poor households to strikingly similar findings from industrialized countries. It uses an inter-generational approach to explain observed correlations in adults and children that is inspired by the height premium on labor markets. Parents can invest in the health capital of their child to increase future earnings and their consumption when old: better nutrition during infancy translates into better human capital and better wages, and ultimately better financial means to take care of elderly parents. However this investment is subject to considerable uncertainty, since parents neither perfectly foresee economic conditions when the child starts earning nor fully observe the ability to transform nutritional investments into long-term health capital. As a result, risk taking households have taller and heavier children. Conversely, impatience does not affect child health. In the case of adults, only weight and the body mass index (BMI), but not height, are positively and moderately correlated with risk taking and impatience. Copyright © 2014 Elsevier B.V. All rights reserved.
2012-01-01
Purpose A key challenge for providers and commissioners of rehabilitation services is to find optimal balance between service costs and outcomes. This article presents a "real-life" application of the UK Rehabilitation Outcomes Collaborative (UKROC) dataset. We undertook a comparative cohort analysis of case-episode data (n = 173) from two specialist neurological rehabilitation units (A and B), to compare the cost-efficiency of two service models. Key messages (i) Demographics, casemix and levels of functional dependency on admission and discharge were broadly similar for the two units. (ii) The mean length of stay for Unit A was 1.5 times longer than Unit B, which had 85% higher levels of therapy staffing in relation to occupied bed days so despite higher bed-day costs, Unit B was 20% more cost-efficient overall, for similar gain. (iii) Following analysis, engagement with service commissioners led to successful negotiation of a business plan for service reconfiguration with increased staffing levels for Unit A and further development of local community rehabilitation services. Conclusion (i) Lower front-end service costs do not always signify optimal cost-efficiency. (ii) Analysis of routinely collected clinical data can be used to engage commissioners and to make the case for resources to maximise efficiency and improve patient care. PMID:22506504
Turner-Stokes, Lynne; Poppleton, Rob; Williams, Heather; Schoewenaars, Katie; Badwan, Derar
2012-01-01
A key challenge for providers and commissioners of rehabilitation services is to find optimal balance between service costs and outcomes. This article presents a "real-life" application of the UK Rehabilitation Outcomes Collaborative (UKROC) dataset. We undertook a comparative cohort analysis of case-episode data (n = 173) from two specialist neurological rehabilitation units (A and B), to compare the cost-efficiency of two service models. (i) Demographics, casemix and levels of functional dependency on admission and discharge were broadly similar for the two units. (ii) The mean length of stay for Unit A was 1.5 times longer than Unit B, which had 85% higher levels of therapy staffing in relation to occupied bed days so despite higher bed-day costs, Unit B was 20% more cost-efficient overall, for similar gain. (iii) Following analysis, engagement with service commissioners led to successful negotiation of a business plan for service reconfiguration with increased staffing levels for Unit A and further development of local community rehabilitation services. (i) Lower front-end service costs do not always signify optimal cost-efficiency. (ii) Analysis of routinely collected clinical data can be used to engage commissioners and to make the case for resources to maximise efficiency and improve patient care.
Bai, Jian-Ying; Xie, Yu-Zhong; Wang, Chang-Jiang; Fang, Shu-Qing; Cao, Lin-Nan; Wang, Ling-Li; Jin, Jing-Yi
2018-05-28
As a structural analogue of pyridylthiazole, 2-(2-benzothiazoyl)-phenylethynylquinoline (QBT) was designed as a fluorescent probe for Hg(II) based on an intramolecular charge transfer (ICT) mechanism. The compound was synthesized in three steps starting from 6-bromo-2-methylquinoline, with moderate yield. Corresponding studies on the optical properties of QBT indicate that changes in the fluorescence ratio of QBT in response to Hg(II) could be quantified based on dual-emission changes. More specifically, the emission spectrum of QBT before and after interactions with Hg(II) exhibited a remarkable red shift of about 120 nm, which is rarely reported in ICT-based fluorescent sensors. Finally, QBT was applied in the two-channel imaging of Hg(II) in live HeLa cells.
Rail Inspection Systems Analysis and Technology Survey
DOT National Transportation Integrated Search
1977-09-01
The study was undertaken to identify existing rail inspection system capabilities and methods which might be used to improve these capabilities. Task I was a study to quantify existing inspection parameters and Task II was a cost effectiveness study ...
Concept design and analysis of intermodal freight systems : volume II : Methodology and Results
DOT National Transportation Integrated Search
1980-01-01
This report documents the concept design and analysis of intermodal freight systems. The primary objective of this project was to quantify the various tradeoffs and relationships between fundamental system design parameters and operating strategies, ...
The effects of spatial population dataset choice on estimates of population at risk of disease
2011-01-01
Background The spatial modeling of infectious disease distributions and dynamics is increasingly being undertaken for health services planning and disease control monitoring, implementation, and evaluation. Where risks are heterogeneous in space or dependent on person-to-person transmission, spatial data on human population distributions are required to estimate infectious disease risks, burdens, and dynamics. Several different modeled human population distribution datasets are available and widely used, but the disparities among them and the implications for enumerating disease burdens and populations at risk have not been considered systematically. Here, we quantify some of these effects using global estimates of populations at risk (PAR) of P. falciparum malaria as an example. Methods The recent construction of a global map of P. falciparum malaria endemicity enabled the testing of different gridded population datasets for providing estimates of PAR by endemicity class. The estimated population numbers within each class were calculated for each country using four different global gridded human population datasets: GRUMP (~1 km spatial resolution), LandScan (~1 km), UNEP Global Population Databases (~5 km), and GPW3 (~5 km). More detailed assessments of PAR variation and accuracy were conducted for three African countries where census data were available at a higher administrative-unit level than used by any of the four gridded population datasets. Results The estimates of PAR based on the datasets varied by more than 10 million people for some countries, even accounting for the fact that estimates of population totals made by different agencies are used to correct national totals in these datasets and can vary by more than 5% for many low-income countries. In many cases, these variations in PAR estimates comprised more than 10% of the total national population. The detailed country-level assessments suggested that none of the datasets was consistently more accurate than the others in estimating PAR. The sizes of such differences among modeled human populations were related to variations in the methods, input resolution, and date of the census data underlying each dataset. Data quality varied from country to country within the spatial population datasets. Conclusions Detailed, highly spatially resolved human population data are an essential resource for planning health service delivery for disease control, for the spatial modeling of epidemics, and for decision-making processes related to public health. However, our results highlight that for the low-income regions of the world where disease burden is greatest, existing datasets display substantial variations in estimated population distributions, resulting in uncertainty in disease assessments that utilize them. Increased efforts are required to gather contemporary and spatially detailed demographic data to reduce this uncertainty, particularly in Africa, and to develop population distribution modeling methods that match the rigor, sophistication, and ability to handle uncertainty of contemporary disease mapping and spread modeling. In the meantime, studies that utilize a particular spatial population dataset need to acknowledge the uncertainties inherent within them and consider how the methods and data that comprise each will affect conclusions. PMID:21299885
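The core accounting step, summing a gridded population surface over cells in each endemicity class, can be sketched in a few lines; the class coding and toy grids below are assumptions, and real analyses work with per-country masks at each product's native resolution.

```python
import numpy as np

def population_at_risk(population, endemicity_class, classes=(1, 2, 3)):
    """Population at risk per endemicity class: sum the gridded population
    surface over all cells assigned to each class (cells outside the
    transmission limits carry class 0 here)."""
    return {c: float(population[endemicity_class == c].sum()) for c in classes}

# toy 4x4 country: a population grid and a co-registered endemicity-class grid
pop = np.array([[1200,  800,  50,   0],
                [ 900, 1500, 300,  20],
                [  10,  400, 700, 600],
                [   0,   30, 250, 900]], dtype=float)
endem = np.array([[0, 1, 1, 0],
                  [1, 2, 2, 1],
                  [1, 2, 3, 3],
                  [0, 1, 3, 3]])
print(population_at_risk(pop, endem))
# swapping `pop` for a different gridded dataset (GRUMP, LandScan, GPW, ...) changes
# these totals, which is exactly the sensitivity the study quantifies
```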
Quantifying spatial and temporal trends in beach-dune volumetric changes using spatial statistics
NASA Astrophysics Data System (ADS)
Eamer, Jordan B. R.; Walker, Ian J.
2013-06-01
Spatial statistics are generally underutilized in coastal geomorphology, despite offering great potential for identifying and quantifying spatial-temporal trends in landscape morphodynamics. In particular, local Moran's Ii provides a statistical framework for detecting clusters of significant change in an attribute (e.g., surface erosion or deposition) and quantifying how this changes over space and time. This study analyzes and interprets spatial-temporal patterns in sediment volume changes in a beach-foredune-transgressive dune complex following removal of invasive marram grass (Ammophila spp.). Results are derived by detecting significant changes in post-removal repeat DEMs derived from topographic surveys and airborne LiDAR. The study site was separated into discrete, linked geomorphic units (beach, foredune, transgressive dune complex) to facilitate sub-landscape scale analysis of volumetric change and sediment budget responses. Difference surfaces derived from a pixel-subtraction algorithm between interval DEMs and the LiDAR baseline DEM were filtered using the local Moran's Ii method and two different spatial weights (1.5 and 5 m) to detect statistically significant change. Moran's Ii results were compared with those derived from a more spatially uniform statistical method that uses a simpler student's t distribution threshold for change detection. Morphodynamic patterns and volumetric estimates were similar between the uniform geostatistical method and Moran's Ii at a spatial weight of 5 m while the smaller spatial weight (1.5 m) consistently indicated volumetric changes of less magnitude. The larger 5 m spatial weight was most representative of broader site morphodynamics and spatial patterns while the smaller spatial weight provided volumetric changes consistent with field observations. All methods showed foredune deflation immediately following removal with increased sediment volumes into the spring via deposition at the crest and on lobes in the lee, despite erosion on the stoss slope and dune toe. Generally, the foredune became wider by landward extension and the seaward slope recovered from erosion to a similar height and form to that of pre-restoration despite remaining essentially free of vegetation.
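A bare-bones version of the local Moran's I_i filtering step is sketched below for a DEM-of-difference raster, using a simple square neighbourhood with wrap-around edges in place of the study's 1.5 m and 5 m spatial weights; the cutoff shown is a crude stand-in for a proper permutation-based significance test.

```python
import numpy as np

def local_morans_i(diff, weight=1):
    """Local Moran's I_i for a raster of elevation differences:
    I_i = (z_i / m2) * sum_j w_ij * z_j, with z_i the deviation from the global
    mean and m2 the global second moment; w_ij = 1 for cells inside a square
    window of half-width `weight` (queen contiguity when weight = 1)."""
    z = diff - diff.mean()
    m2 = (z ** 2).mean()
    neigh = np.zeros_like(z)
    for dy in range(-weight, weight + 1):
        for dx in range(-weight, weight + 1):
            if dy == 0 and dx == 0:
                continue
            # np.roll wraps at the edges; acceptable for a toy interior patch
            neigh += np.roll(np.roll(z, dy, axis=0), dx, axis=1)
    return z * neigh / m2

rng = np.random.default_rng(3)
diff = rng.normal(0, 0.05, (50, 50))      # background noise of roughly +/- 5 cm
diff[20:30, 20:30] -= 0.5                 # a coherent 0.5 m erosion patch
ii = local_morans_i(diff)
print((ii > 2).sum(), "cells flagged as clustered change (crude cutoff)")
```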
NASA Astrophysics Data System (ADS)
Dube, Timothy; Mutanga, Onisimo
2016-09-01
Reliable and accurate mapping and extraction of key forest indicators of ecosystem development and health, such as aboveground biomass (AGB) and aboveground carbon stocks (AGCS), is critical in understanding forests' contribution to the local, regional and global carbon cycle. This information is critical in assessing forest contribution towards ecosystem functioning and services, as well as their conservation status. This work aimed at assessing the applicability of the high resolution 8-band WorldView-2 multispectral dataset together with environmental variables in quantifying AGB and aboveground carbon stocks for three forest plantation species i.e. Eucalyptus dunii (ED), Eucalyptus grandis (EG) and Pinus taeda (PT) in uMgeni Catchment, South Africa. Specifically, the strength of the WorldView-2 sensor in terms of its improved imaging capabilities is examined as an independent dataset and in conjunction with selected environmental variables. The results demonstrate that the integration of high resolution 8-band WorldView-2 multispectral data with environmental variables provides improved AGB and AGCS estimates, when compared to the use of spectral data as an independent dataset. The use of integrated datasets yielded a high R2 value of 0.88 and RMSEs of 10.05 t ha-1 and 5.03 t C ha-1 for E. dunii AGB and carbon stocks, whereas the use of spectral data as an independent dataset yielded slightly weaker results, with an R2 value of 0.73 and RMSEs of 18.57 t ha-1 and 9.29 t C ha-1. Similarly, accurate results (an R2 value of 0.73 and RMSE values of 27.30 t ha-1 and 13.65 t C ha-1) were obtained for the estimation of inter-species AGB and carbon stocks. Overall, the findings of this work have shown that the integration of new generation multispectral datasets with environmental variables provides a robust toolset for the accurate and reliable retrieval of forest aboveground biomass and carbon stocks in densely forested terrestrial ecosystems.
On the uncertainties associated with using gridded rainfall data as a proxy for observed
NASA Astrophysics Data System (ADS)
Tozer, C. R.; Kiem, A. S.; Verdon-Kidd, D. C.
2012-05-01
Gridded rainfall datasets are used in many hydrological and climatological studies, in Australia and elsewhere, including for hydroclimatic forecasting, climate attribution studies and climate model performance assessments. The attraction of the spatial coverage provided by gridded data is clear, particularly in Australia where the spatial and temporal resolution of the rainfall gauge network is sparse. However, the question that must be asked is whether it is suitable to use gridded data as a proxy for observed point data, given that gridded data is inherently "smoothed" and may not necessarily capture the temporal and spatial variability of Australian rainfall which leads to hydroclimatic extremes (i.e. droughts, floods). This study investigates this question through a statistical analysis of three monthly gridded Australian rainfall datasets - the Bureau of Meteorology (BOM) dataset, the Australian Water Availability Project (AWAP) and the SILO dataset. The results of the monthly, seasonal and annual comparisons show that not only are the three gridded datasets different relative to each other, there are also marked differences between the gridded rainfall data and the rainfall observed at gauges within the corresponding grids - particularly for extremely wet or extremely dry conditions. Also important is that the differences observed appear to be non-systematic. To demonstrate the hydrological implications of using gridded data as a proxy for gauged data, a rainfall-runoff model is applied to one catchment in South Australia initially using gauged data as the source of rainfall input and then gridded rainfall data. The results indicate a markedly different runoff response associated with each of the different sources of rainfall data. It should be noted that this study does not seek to identify which gridded dataset is the "best" for Australia, as each gridded data source has its pros and cons, as does gauged data. Rather, the intention is to quantify differences between various gridded data sources and how they compare with gauged data so that these differences can be considered and accounted for in studies that utilise these gridded datasets. Ultimately, if key decisions are going to be based on the outputs of models that use gridded data, an estimate (or at least an understanding) of the uncertainties relating to the assumptions made in the development of gridded data and how that gridded data compares with reality should be made.
Evaluation of bulk heat fluxes from atmospheric datasets
NASA Astrophysics Data System (ADS)
Farmer, Benton
Heat fluxes at the air-sea interface are an important component of the Earth's heat budget. In addition, they are an integral factor in determining the sea surface temperature (SST) evolution of the oceans. Different representations of these fluxes are used in both the atmospheric and oceanic communities for the purpose of heat budget studies and, in particular, for forcing oceanic models. It is currently difficult to quantify the potential impact varying heat flux representations have on the ocean response. In this study, a diagnostic tool is presented that allows for a straightforward comparison of surface heat flux formulations and atmospheric data sets. Two variables, relaxation time (RT) and the apparent temperature (T*), are derived from the linearization of the bulk formulas. They are then calculated to compare three bulk formulae and five atmospheric datasets. Additionally, the linearization is expanded to the second order to compare the amount of residual flux present. It is found that the use of a bulk formula employing a constant heat transfer coefficient produces longer relaxation times and contains a greater amount of residual flux in the higher order terms of the linearization. Depending on the temperature difference, the residual flux remaining in the second order and above terms can reach as much as 40--50% of the total residual on a monthly time scale. This is certainly a non-negligible residual flux. In contrast, a bulk formula using a stability and wind dependent transfer coefficient retains much of the total flux in the first order term, as only a few percent remain in the residual flux. Most of the difference displayed among the bulk formulas stems from the sensitivity to wind speed and the choice of a constant or spatially varying transfer coefficient. Comparing the representation of RT and T* provides insight into the differences among various atmospheric datasets. In particular, the representations of the western boundary current, upwelling, and the Indian monsoon regions of the oceans have distinct characteristics within each dataset. Localized regions, such as the eastern Mexican and Central American coasts, are also shown to have variability among the datasets. The use of this technique for the evaluation of bulk formulae and datasets is an efficient method for identifying the unique characteristics of each. Furthermore, insight into the heat fluxes produced by particular bulk formula or dataset can be gained.
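The kind of linearization the abstract describes can be written schematically as below: the apparent temperature T* is the surface temperature at which the net flux vanishes, and the relaxation time follows from the first derivative of the flux, with higher-order terms carrying the residual flux. The symbols used here (mixed-layer depth h, transfer coefficient C_H, wind speed U) are generic assumptions for illustration, not the thesis' exact notation.

```latex
% Schematic first-order linearization of a net surface heat flux Q_net(T_s)
\begin{align}
  Q_{\mathrm{net}}(T_s) &\approx Q_{\mathrm{net}}(T^{*})
    + \left.\frac{\partial Q_{\mathrm{net}}}{\partial T_s}\right|_{T^{*}} (T_s - T^{*})
    + \mathcal{O}\!\left((T_s - T^{*})^{2}\right),
    \qquad Q_{\mathrm{net}}(T^{*}) = 0, \\
  \rho_w c_{p,w}\, h \,\frac{\partial T_s}{\partial t}
    &\approx -\,\frac{\rho_w c_{p,w}\, h}{\tau}\,(T_s - T^{*}),
    \qquad
    \frac{1}{\tau} = -\,\frac{1}{\rho_w c_{p,w}\, h}
      \left.\frac{\partial Q_{\mathrm{net}}}{\partial T_s}\right|_{T^{*}}, \\
  \text{e.g. for the sensible flux: }\quad
  \frac{\partial Q_{\mathrm{sens}}}{\partial T_s} &= -\,\rho_a c_{p,a}\, C_H\, U .
\end{align}
```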
Gray, Adrian J; Shorter, Kathleen; Cummins, Cloe; Murphy, Aron; Waldron, Mark
2018-06-01
Quantifying the training and competition loads of players in contact team sports can be performed in a variety of ways, including kinematic, perceptual, heart rate or biochemical monitoring methods. Whilst these approaches provide data relevant for team sports practitioners and athletes, their application to a contact team sport setting can sometimes be challenging or illogical. Furthermore, these methods can generate large fragmented datasets, do not provide a single global measure of training load and cannot adequately quantify all key elements of performance in contact team sports. A previous attempt to address these limitations via the estimation of metabolic energy demand (global energy measurement) has been criticised for its inability to fully quantify the energetic costs of team sports, particularly during collisions. This is despite the seemingly unintentional misapplication of the model's principles to settings outside of its intended use. There are other hindrances to the application of such models, which are discussed herein, such as the data-handling procedures of Global Position System manufacturers and the unrealistic expectations of end users. Nevertheless, we propose an alternative energetic approach, based on Global Positioning System-derived data, to improve the assessment of mechanical load in contact team sports. We present a framework for the estimation of mechanical work performed during locomotor and contact events with the capacity to globally quantify the work done during training and matches.
Doing More with Less? Toward Increasing the Resolution of Protistan Grazing-rate Measurements.
NASA Astrophysics Data System (ADS)
Morison, F.; Menden-Deuer, S.
2016-02-01
The dilution method is the standard protocol to quantify phytoplankton grazing-mortality rates and has been key in developing an understanding of protistan grazing impact on ocean primary production. Although the method's extensive use has facilitated the acquisition of a global dataset, its laborious application hinders the sampling resolution needed to fill knowledge gaps remaining at the geographical, seasonal, and vertical scales, and of the effects of climate-related factors influencing grazing magnitude. Here we present a rigorous assessment of an abbreviated method known as the 2-point method. We analyzed unpublished results from 77 dilution experiments performed using a series of up to 5 dilutions under a wide range of chlorophyll concentrations and temperatures. We quantified the difference between estimates of both phytoplankton growth and grazing-mortality obtained based on the full dilution series and those obtained when the number of dilutions was reduced to 2. We considered the effect of non-linearity and chlorophyll concentration, and generated quantified estimates of trade-offs when choosing the fraction of seawater in the diluted treatment. Ultimately, we provide an assessment of the reliability of the 2-point method and recommendations on how to apply it.
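Under the standard linear dilution model k(x) = μ − x·g, where x is the fraction of whole seawater in a treatment, the 2-point version reduces to two apparent growth rates and a difference quotient, as in the sketch below; the chlorophyll values and incubation length are hypothetical.

```python
import numpy as np

def two_point_dilution(p0_dil, pt_dil, p0_whole, pt_whole, x_dil, days=1.0):
    """Two-point dilution estimates of phytoplankton growth (mu) and grazing
    mortality (g), assuming the linear dilution model k(x) = mu - x*g.

    k is the apparent growth rate ln(P_t/P_0)/t from chlorophyll (or cell)
    concentrations; x_dil is the whole-seawater fraction of the diluted bottle
    (e.g. 0.2 for a 20% treatment), and the undiluted bottle has x = 1."""
    k_dil = np.log(pt_dil / p0_dil) / days
    k_whole = np.log(pt_whole / p0_whole) / days
    g = (k_dil - k_whole) / (1.0 - x_dil)     # grazing mortality (d^-1)
    mu = k_dil + x_dil * g                    # intrinsic growth (d^-1), intercept at x = 0
    return mu, g

# hypothetical chlorophyll values (ug/L) after a 24 h incubation
print(two_point_dilution(p0_dil=0.4, pt_dil=0.72, p0_whole=2.0, pt_whole=2.6, x_dil=0.2))
```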
Comparison and validation of gridded precipitation datasets for Spain
NASA Astrophysics Data System (ADS)
Quintana-Seguí, Pere; Turco, Marco; Míguez-Macho, Gonzalo
2016-04-01
In this study, two gridded precipitation datasets are compared and validated in Spain: the recently developed SAFRAN dataset and the Spain02 dataset. These are validated using rain gauges and they are also compared to the low resolution ERA-Interim reanalysis. The SAFRAN precipitation dataset has been recently produced, using the SAFRAN meteorological analysis, which is extensively used in France (Durand et al. 1993, 1999; Quintana-Seguí et al. 2008; Vidal et al., 2010) and which has recently been applied to Spain (Quintana-Seguí et al., 2015). SAFRAN uses an optimal interpolation (OI) algorithm and all available rain gauges from the Spanish State Meteorological Agency (Agencia Estatal de Meteorología, AEMET). The product has a spatial resolution of 5 km and it spans from September 1979 to August 2014. This dataset has been produced mainly to be used in large-scale hydrological applications. Spain02 (Herrera et al. 2012, 2015) is another high quality precipitation dataset for Spain based on a dense network of quality-controlled stations and it has different versions at different resolutions. In this study we used the version with a resolution of 0.11°. The product spans from 1971 to 2010. Spain02 is well tested and widely used, mainly, but not exclusively, for RCM model validation and statistical downscaling. ERA-Interim is a well known global reanalysis with a spatial resolution of ˜79 km. It has been included in the comparison because it is a widely used product for continental and global scale studies and also in smaller scale studies in data-poor countries. Thus, its comparison with higher resolution products of a data-rich country, such as Spain, allows us to quantify the errors made when using such datasets for national scale studies, in line with some of the objectives of the EU-FP7 eartH2Observe project. The comparison shows that SAFRAN and Spain02 perform similarly, even though their underlying principles are different. Both products are largely better than ERA-Interim, which has a much coarser representation of the relief, which is crucial for precipitation. These results are a contribution to the Spanish Case Study of the eartH2Observe project, which is focused on the simulation of drought processes in Spain using Land-Surface Models (LSM). This study will also be helpful in the Spanish MARCO project, which aims at improving the ability of RCMs to simulate hydrometeorological extremes.
Post-fire Thermokarst Development Along a Planned Road Corridor in Arctic Alaska
NASA Astrophysics Data System (ADS)
Jones, B. M.; Grosse, G.; Larsen, C. F.; Hayes, D. J.; Arp, C. D.; Liu, L.; Miller, E.
2015-12-01
Wildfire disturbance in northern high latitude regions is an important factor contributing to ecosystem and landscape change. In permafrost influenced terrain, fire may initiate thermokarst development which impacts hydrology, vegetation, wildlife, carbon storage and infrastructure. In this study we differenced two airborne LiDAR datasets that were acquired in the aftermath of the large and severe Anaktuvuk River tundra fire, which in 2007 burned across a proposed road corridor in Arctic Alaska. The 2009 LiDAR dataset was acquired by the Alaska Department of Transportation in preparation for construction of a gravel road that would connect the Dalton Highway with the logistical camp of Umiat. The 2014 LiDAR dataset was acquired by the USGS to quantify potential post-fire thermokarst development over the first seven years following the tundra fire event. By differencing the two 1 m resolution digital terrain models, we measured permafrost thaw subsidence across 34% of the burned tundra area studied, and observed less than 1% in similar, undisturbed tundra terrain units. Ice-rich, yedoma upland terrain was most susceptible to thermokarst development following the disturbance, accounting for 50% of the areal and volumetric change detected, with some locations subsiding more than six meters over the study period. Calculation of rugosity, or surface roughness, in the two datasets showed a doubling in microtopography on average across the burned portion of the study area, with a 340% increase in yedoma upland terrain. An additional LiDAR dataset was acquired in April 2015 to document the role of thermokarst development on enhanced snow accumulation and subsequent snowmelt runoff within the burn area. Our findings will enable future vulnerability assessments of ice-rich permafrost terrain as a result of shifting disturbance regimes. Such assessments are needed to address questions focused on the impact of permafrost degradation on physical, ecological, and socio-economic processes.
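The basic DEM-differencing bookkeeping reads roughly as follows: subtract the earlier surface from the later one, flag cells that subsided beyond a detection threshold, and sum the lowering into an area fraction and a volume. The 0.2 m threshold and the toy surfaces below are assumptions standing in for the LiDAR error budget.

```python
import numpy as np

def subsidence_summary(dem_early, dem_late, cell_area=1.0, detect=0.2):
    """DEM-of-difference summary: fraction of the area showing subsidence beyond
    a detection threshold, and the corresponding volume of surface lowering."""
    dz = dem_late - dem_early                         # negative = subsidence
    subsided = dz < -detect
    area_fraction = subsided.mean()
    volume = float(-dz[subsided].sum() * cell_area)   # m^3 of lowering
    return area_fraction, volume

rng = np.random.default_rng(4)
dem_2009 = rng.normal(100.0, 0.05, (200, 200))                # 1 m cells, toy terrain
dem_2014 = dem_2009.copy()
dem_2014[50:90, 60:120] -= rng.uniform(0.3, 2.0, (40, 60))    # a thermokarst patch
frac, vol = subsidence_summary(dem_2009, dem_2014, cell_area=1.0)
print(f"{frac:.1%} of area subsided, {vol:.0f} m3 of thaw subsidence")
```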
Multisource Estimation of Long-term Global Terrestrial Surface Radiation
NASA Astrophysics Data System (ADS)
Peng, L.; Sheffield, J.
2017-12-01
Land surface net radiation is the essential energy source at the earth's surface. It determines the surface energy budget and its partitioning, drives the hydrological cycle by providing available energy, and offers heat, light, and energy for biological processes. Individual components in net radiation have changed historically due to natural and anthropogenic climate change and land use change. Decadal variations in radiation such as global dimming or brightening have important implications for hydrological and carbon cycles. In order to assess the trends and variability of net radiation and evapotranspiration, there is a need for accurate estimates of long-term terrestrial surface radiation. While large progress in measuring top of atmosphere energy budget has been made, huge discrepancies exist among ground observations, satellite retrievals, and reanalysis fields of surface radiation, due to the lack of observational networks, the difficulty in measuring from space, and the uncertainty in algorithm parameters. To overcome the weakness of single source datasets, we propose a multi-source merging approach to fully utilize and combine multiple datasets of radiation components separately, as they are complementary in space and time. First, we conduct diagnostic analysis of multiple satellite and reanalysis datasets based on in-situ measurements such as Global Energy Balance Archive (GEBA), existing validation studies, and other information such as network density and consistency with other meteorological variables. Then, we calculate the optimal weighted average of multiple datasets by minimizing the variance of error between in-situ measurements and other observations. Finally, we quantify the uncertainties in the estimates of surface net radiation and employ physical constraints based on the surface energy balance to reduce these uncertainties. The final dataset is evaluated in terms of the long-term variability and its attribution to changes in individual components. The goal of this study is to provide a merged observational benchmark for large-scale diagnostic analyses, remote sensing and land surface modeling.
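A minimal version of the proposed merging step is an inverse-error-variance weighted average across products, as sketched below; the error variances would in practice come from the diagnostic comparison against in-situ networks such as GEBA, and the numbers here are placeholders.

```python
import numpy as np

def merge_radiation(estimates, error_variances):
    """Minimum-variance weighted average of several co-located radiation
    estimates: weights are proportional to the inverse of each product's error
    variance, so more trustworthy products contribute more."""
    est = np.asarray(estimates, float)                     # shape (n_products, ...)
    var = np.asarray(error_variances, float).reshape(-1, *[1] * (est.ndim - 1))
    w = (1.0 / var) / np.sum(1.0 / var, axis=0)            # weights sum to 1 across products
    merged = np.sum(w * est, axis=0)
    merged_var = 1.0 / np.sum(1.0 / var, axis=0)           # variance of the merged estimate
    return merged, merged_var

# three hypothetical net-radiation fields (W m^-2) over a small grid
sat_a = np.full((2, 2), 105.0)
sat_b = np.full((2, 2), 98.0)
reana = np.full((2, 2), 112.0)
merged, mvar = merge_radiation([sat_a, sat_b, reana], error_variances=[64.0, 100.0, 225.0])
print(np.round(merged, 1), np.round(mvar, 1))   # weighted toward the lower-error products
```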
NASA Astrophysics Data System (ADS)
Javernick, L.; Bertoldi, W.; Redolfi, M.
2017-12-01
Accessing or acquiring high quality, low-cost topographic data has never been easier due to recent developments of the photogrammetric techniques of Structure-from-Motion (SfM). Researchers can acquire the necessary SfM imagery with various platforms, capturing millimetre resolution and accuracy, or covering large areas with the help of unmanned platforms. Such datasets, in combination with numerical modelling, have opened up new opportunities to study the physical and ecological relationships of river environments. While the overall predictive accuracy of numerical models is most influenced by topography, proper model calibration requires hydraulic and morphological data; however, rich hydraulic and morphological datasets remain scarce. This lack of field and laboratory data has limited model advancement through the inability to properly calibrate, assess the sensitivity of, and validate the models. However, new time-lapse imagery techniques have shown success in identifying instantaneous sediment transport in flume experiments and in improving hydraulic model calibration. With new capabilities to capture high resolution spatial and temporal datasets of flume experiments, there is a need to further assess model performance. To address this demand, this research used braided river flume experiments and captured time-lapse observations of sediment transport and repeat SfM elevation surveys to provide unprecedented spatial and temporal datasets. Through newly created metrics that quantified observed and modeled activation, deactivation, and bank erosion rates, the numerical model Delft3D was calibrated. These data, combining high-resolution time series with long-term temporal coverage, significantly improved the calibration routines and refined the calibration parameterization. Model results show that there is a trade-off between achieving quantitative statistical and qualitative morphological representations. Specifically, simulations tuned for statistical agreement struggled to represent braided planforms (evolving toward meandering), while parameterizations that ensured braiding produced exaggerated activation and bank erosion rates. Marie Sklodowska-Curie Individual Fellowship: River-HMV, 656917
A Spatially Distinct History of the Development of California Groundfish Fisheries
Miller, Rebecca R.; Field, John C.; Santora, Jarrod A.; Schroeder, Isaac D.; Huff, David D.; Key, Meisha; Pearson, Don E.; MacCall, Alec D.
2014-01-01
During the past century, commercial fisheries have expanded from small vessels fishing in shallow, coastal habitats to a broad suite of vessels and gears that fish virtually every marine habitat on the globe. Understanding how fisheries have developed in space and time is critical for interpreting and managing the response of ecosystems to the effects of fishing, however time series of spatially explicit data are typically rare. Recently, the 1933–1968 portion of the commercial catch dataset from the California Department of Fish and Wildlife was recovered and digitized, completing the full historical series for both commercial and recreational datasets from 1933–2010. These unique datasets include landing estimates at a coarse 10 by 10 minute “grid-block” spatial resolution and extends the entire length of coastal California up to 180 kilometers from shore. In this study, we focus on the catch history of groundfish which were mapped for each grid-block using the year at 50% cumulative catch and total historical catch per habitat area. We then constructed generalized linear models to quantify the relationship between spatiotemporal trends in groundfish catches, distance from ports, depth, percentage of days with wind speed over 15 knots, SST and ocean productivity. Our results indicate that over the history of these fisheries, catches have taken place in increasingly deeper habitat, at a greater distance from ports, and in increasingly inclement weather conditions. Understanding spatial development of groundfish fisheries and catches in California are critical for improving population models and for evaluating whether implicit stock assessment model assumptions of relative homogeneity of fisheries removals over time and space are reasonable. This newly reconstructed catch dataset and analysis provides a comprehensive appreciation for the development of groundfish fisheries with respect to commonly assumed trends of global fisheries patterns that are typically constrained by a lack of long-term spatial datasets. PMID:24967973
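The "year at 50% cumulative catch" metric is straightforward to compute per grid block, as the sketch below shows with hypothetical landings; the real analysis applies this across all blocks and then models the resulting years against distance to port, depth, wind exposure, SST, and productivity.

```python
import numpy as np

def year_at_half_cumulative_catch(years, catches):
    """Year by which a grid-block's cumulative catch first reaches 50% of its
    historical total -- the spatial development metric used in the study."""
    years = np.asarray(years)
    cum = np.cumsum(np.asarray(catches, float))
    return int(years[np.searchsorted(cum, 0.5 * cum[-1])])

# hypothetical landings (t) for one 10 by 10 minute grid block, 1933-1942
yrs = np.arange(1933, 1943)
landings = np.array([5, 8, 12, 20, 40, 80, 60, 30, 15, 10], float)
print(year_at_half_cumulative_catch(yrs, landings))   # this toy block "develops" in 1938
```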
Taoka, Toshiaki; Kawai, Hisashi; Nakane, Toshiki; Hori, Saeka; Ochi, Tomoko; Miyasaka, Toshiteru; Sakamoto, Masahiko; Kichikawa, Kimihiko; Naganawa, Shinji
2016-09-01
The "K2" value is a factor that represents the vascular permeability of tumors and can be calculated from datasets obtained with the dynamic susceptibility contrast (DSC) method. The purpose of the current study was to correlate K2 with Ktrans, which is a well-established permeability parameter obtained with the dynamic contrast enhance (DCE) method, and determine the usefulness of K2 for glioma grading with histogram analysis. The subjects were 22 glioma patients (Grade II: 5, III: 6, IV: 11) who underwent DSC studies, including eight patients in which both DSC and DCE studies were performed on separate days within 10days. We performed histogram analysis of regions of interest of the tumors and acquired 20th percentile values for leakage-corrected cerebral blood volume (rCBV20%ile), K2 (K220%ile), and for patients who underwent a DCE study, Ktrans (Ktrans20%ile). We evaluated the correlation between K220%ile and Ktrans20%ile and the statistical difference between rCBV20%ile and K220%ile. We found a statistically significant correlation between K220%ile and Ktrans20%ile (r=0.717, p<0.05). rCBV20%ile showed a significant difference between Grades II and III and between Grades II and IV, whereas K220%ile showed a statistically significant (p<0.05) difference between Grades II and IV and between Grades III and IV. The K2 value calculated from the DSC dataset, which can be obtained with a short acquisition time, showed a correlation with Ktrans obtained with the DCE method and may be useful for glioma grading when analyzed with histogram analysis. Copyright © 2016 Elsevier Inc. All rights reserved.
Sandiego, Christine M.; Weinzimmer, David; Carson, Richard E.
2012-01-01
An important step in PET brain kinetic analysis is the registration of functional data to an anatomical MR image. Typically, PET-MR registrations in nonhuman primate neuroreceptor studies used PET images acquired early post-injection (e.g., 0–10 min) to closely resemble the subject's MR image. However, a substantial fraction of these registrations (~25%) fail due to the differences in kinetics and distribution for various radiotracer studies and conditions (e.g., blocking studies). The Multi-Transform Method (MTM) was developed to improve the success of registrations between PET and MR images. Two algorithms were evaluated, MTM-I and MTM-II. The approach involves creating multiple transformations by registering PET images of different time intervals, from a dynamic study, to a single reference (i.e., MR image) (MTM-I) or to multiple reference images (i.e., MR and PET images pre-registered to the MR) (MTM-II). Normalized mutual information was used to compute similarity between the transformed PET images and the reference image(s) to choose the optimal transformation. This final transformation is used to map the dynamic dataset into the animal's anatomical MR space, as required for kinetic analysis. The chosen transformations from MTM-I and MTM-II were evaluated using visual rating scores to assess the quality of spatial alignment between the resliced PET and the reference. One hundred twenty PET datasets involving eleven different tracers from 3 different scanners were used to evaluate the MTM algorithms. Studies were performed with baboons and rhesus monkeys on the HR+, HRRT, and Focus-220. Successful transformations increased from 77.5% to 85.8% to 96.7% using the 0–10 min method, MTM-I, and MTM-II, respectively, based on visual rating scores. The Multi-Transform Methods proved to be a robust technique for PET-MR registrations for a wide range of PET studies. PMID:22926293
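A sketch of the selection step described above: score each candidate transformation by the normalized mutual information between the resliced PET image and the reference, then keep the best. The joint-histogram NMI definition below is a common convention assumed for illustration, not necessarily the exact formulation used in the study.

```python
import numpy as np

def normalized_mutual_information(a, b, bins=64):
    """Joint-histogram NMI between two images of identical shape."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)

    def entropy(p):
        p = p[p > 0]
        return -np.sum(p * np.log(p))

    return (entropy(px) + entropy(py)) / entropy(pxy.ravel())

def pick_best_transform(resliced_pets, reference):
    """Index of the candidate whose resliced PET best matches the reference."""
    scores = [normalized_mutual_information(img, reference) for img in resliced_pets]
    return int(np.argmax(scores))
```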
Spatial heterogeneity of type I error for local cluster detection tests
2014-01-01
Background Like power, the type I error of cluster detection tests (CDTs) should be assessed spatially. Indeed, both the type I error and the power of CDTs have a spatial component, as CDTs both detect and locate clusters. In the case of type I error, the spatial distribution of wrongly detected clusters (WDCs) can be particularly affected by edge effect. This simulation study aims to describe the spatial distribution of WDCs and to confirm and quantify the presence of an edge effect. Methods A simulation of 40 000 datasets was performed under the null hypothesis of risk homogeneity. The simulation design used realistic parameters from survey data on birth defects and, in particular, two baseline risks. The simulated datasets were analyzed using Kulldorff's spatial scan statistic, a commonly used test whose behavior is otherwise well known. To describe the spatial distribution of type I error, we defined the participation rate for each spatial unit of the region. We used this indicator in a new statistical test proposed to confirm, as well as quantify, the edge effect. Results The predefined type I error of 5% was respected for both baseline risks. Results showed a strong edge effect in participation rates, with a descending gradient from center to edge, and WDCs more often centrally situated. Conclusions In routine analysis of real data, clusters on the edge of the region should be carefully considered as they rarely occur when there is no cluster. Further work is needed to combine results from power studies with this work in order to optimize CDT performance. PMID:24885343
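A minimal sketch of the participation-rate indicator as it is described above: for each spatial unit, the fraction of null-hypothesis simulations in which that unit belonged to a wrongly detected cluster. The data structures are assumptions for illustration.

```python
import numpy as np

def participation_rates(wrongly_detected_clusters, n_units):
    """wrongly_detected_clusters: one entry per simulation, each a set of
    spatial-unit indices in a wrongly detected cluster (empty if none).
    Returns the participation rate of every spatial unit."""
    counts = np.zeros(n_units)
    for cluster in wrongly_detected_clusters:
        for unit in cluster:
            counts[unit] += 1
    return counts / len(wrongly_detected_clusters)
```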
Survival of surf scoters and white-winged scoters during remigial molt
Uher-Koch, Brian D.; Esler, Daniel N.; Dickson, Rian D.; Hupp, Jerry W.; Evenson, Joseph R.; Anderson, Eric M.; Barrett, Jennifer; Schmutz, Joel A.
2014-01-01
Quantifying sources and timing of variation in demographic rates is necessary to determine where and when constraints may exist within the annual cycle of organisms. Surf scoters (Melanitta perspicillata) and white-winged scoters (M. fusca) undergo simultaneous remigial molt during which they are flightless for >1 month. Molt could result in reduced survival due to increased predation risk or increased energetic demands associated with regrowing flight feathers. Waterfowl survival during remigial molt varies across species, and has rarely been assessed for sea ducks. To quantify survival during remigial molt, we deployed very high frequency (VHF) transmitters on surf scoters (n = 108) and white-winged scoters (n = 57) in southeast Alaska and the Salish Sea (British Columbia and Washington) in 2008 and 2009. After censoring mortalities potentially related to capture and handling effects, we detected no mortalities during remigial molt; thus, estimates of daily and period survival for both scoter species during molt were 1.00. We performed sensitivity analyses in which mortalities were added to the dataset to simulate potential mortality rates for the population and then estimated the probability of obtaining a dataset with 0 mortalities. We found that only at high survival rates was there a high probability of observing 0 mortalities. We conclude that remigial molt is normally a period of low mortality in the annual cycle of scoters. The molt period does not appear to be a constraint on scoter populations; therefore, other annual cycle stages should be targeted by research and management efforts to change population trajectories.
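The sensitivity analysis rests on a simple probability: if the true survival rate over the flightless period is high, a dataset with zero observed mortalities is likely. A sketch under the assumption of independent, constant daily survival (the numbers below are illustrative, not the study's figures):

```python
def prob_zero_mortalities(daily_survival, n_days, n_birds):
    """Probability that no marked bird dies during the molt period."""
    return daily_survival ** (n_days * n_birds)

# e.g. 165 radio-marked scoters flightless for ~35 days
print(prob_zero_mortalities(0.9995, 35, 165))   # ~0.056
```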
Quantifying the impact of human activity on temperatures in Germany
NASA Astrophysics Data System (ADS)
Benz, Susanne A.; Bayer, Peter; Blum, Philipp
2017-04-01
Human activity directly influences ambient air, surface, and groundwater temperatures. Alterations of surface cover and land use influence the ambient thermal regime, causing spatial temperature anomalies, most commonly heat islands. These local temperature anomalies are primarily described within the bounds of large and densely populated urban settlements, where they form so-called urban heat islands (UHI). This study explores the anthropogenic impact not only for selected cities but for the thermal regime on a countrywide scale, by analyzing mean annual temperature datasets in Germany in three different compartments: measured surface air temperature (SAT), measured groundwater temperature (GWT), and satellite-derived land surface temperature (LST). As a universal parameter to quantify anthropogenic heat anomalies, the anthropogenic heat intensity (AHI) is introduced. It is closely related to the urban heat island intensity, but is determined for each pixel (for satellite-derived LST) or measurement point (for SAT and GWT) of a large, even global, dataset individually, regardless of land use and location. Hence, it provides the unique opportunity to a) compare the anthropogenic impact on temperatures in the air, at the surface, and in the subsurface, b) find the main instances of anthropogenic temperature anomalies within the study area, in this case Germany, and c) study the impact of smaller settlements or industrial sites on temperatures. For all three analyzed temperature datasets, anthropogenic heat intensity grows with increasing nighttime lights and declines with increasing vegetation, whereas population density has only minor effects. While surface anthropogenic heat intensity cannot be linked to specific land cover types in the studied resolution (1 km × 1 km) and classification system, both air and groundwater show increased heat intensities for artificial surfaces. Overall, groundwater temperature appears most vulnerable to human activity; unlike land surface temperature and surface air temperature, groundwater temperatures are elevated in cultivated areas as well. At the surface of Germany, the highest anthropogenic heat intensity of 4.5 K is found at an open-pit lignite mine near Jülich, followed by three large cities (Munich, Düsseldorf and Nuremberg) with annual mean anthropogenic heat intensities > 4 K. Overall, surface anthropogenic heat intensities > 0 K, and therefore urban heat islands, are observed in communities down to a population of 5,000.
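One simple way to compute a per-pixel heat intensity, sketched below, is to subtract a local background temperature estimated from the surrounding pixels; the median-filter background and the window radius are assumptions for illustration and not the published AHI definition.

```python
import numpy as np
from scipy.ndimage import median_filter

def heat_intensity(temperature_grid, radius_px=10):
    """Per-pixel temperature anomaly relative to a local median background."""
    size = 2 * radius_px + 1
    background = median_filter(temperature_grid, size=size, mode="nearest")
    return temperature_grid - background
```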
NASA Astrophysics Data System (ADS)
Maksimowicz, M.; Masarik, M. T.; Brandt, J.; Flores, A. N.
2017-12-01
Land use/land cover (LULC) change directly impacts the partitioning of surface mass and energy fluxes. Regional-scale weather and climate are potentially altered by LULC change if the resultant changes in the partitioning of surface energy fluxes are significant enough to induce changes in the evolution of the planetary boundary layer and its interaction with the atmosphere above. Dynamics of land use, particularly those related to the social dimensions of the Earth System, are often simplified or not represented in regional land-atmosphere models or Earth System Models. This study explores the role of LULC change in a regional hydroclimate system, focusing on potential hydroclimate changes arising from timber harvesting due to a land grab boom in Mozambique. We also focus more narrowly on quantifying regional impacts on Gorongosa National Park, a nationally important economic and biodiversity resource in southeastern Africa. After Mozambique nationalized all land upon gaining independence in 1975, complex social processes, including an extended low-intensity civil war and economic hardship, led to an escalation of land use rights grants to foreign governments. Between 2004 and 2009, large tracts of land were requested for timber. Here we use existing tree cover loss datasets to more accurately represent land cover within a regional weather model. LULC in a region encompassing Gorongosa is updated at three instances between 2001 and 2014 using a tree cover loss dataset. We use these derived LULC datasets to inform lower boundary conditions in the Weather Research and Forecasting (WRF) model. To quantify potential hydrometeorological changes arising from land use change, we performed a factorial-like experiment, mixing input LULC maps and atmospheric forcing data from before, during, and after the land grab. Results suggest that the land grab has significantly impacted microclimate parameters via direct and indirect effects on land-atmosphere interactions. Results of this study suggest that LULC change arising from regional social dynamics is a potentially understudied, yet important, human process to capture in both regional reanalyses and climate change projections.
NASA Astrophysics Data System (ADS)
Grall, C.; Steckler, M. S.; Pickering, J.; Goodbred, S. L., Jr.; Sincavage, R.; Hossain, S.; Paola, C.; Spiess, V.
2016-12-01
The hazard associated with sea-level rise (shoreline erosion, flooding, and wetland loss) may dramatically increase when human interventions interfere with the natural responses of coastal regions to eustatic rise. Here we provide insights into such natural processes by documenting how subsidence, sediment input, and sediment distribution interacted during the well-documented Holocene eustatic rise in the Ganges-Brahmaputra-Meghna Delta (GBMD) in Bangladesh. The dataset combines more than 400 hand-drilled stratigraphic wells, 185 radiocarbon ages, and seismic reflection imaging data (255 km of high-resolution multichannel seismic profiles) collected through recent research in the BanglaPIRE project. We use two independent approaches to analyze this broad dataset. First, we estimate the total volume of Holocene sediments in the GBMD. In doing so, we define empirical laws to build a virtual model of sediment accumulation that takes into account the contrasts in accumulation between rivers and alluvial plains as well as the regional seaward gradient of sediment accumulation. As the evolution of river occupation over the Holocene at the regional scale is now relatively well constrained, we estimate the total volume of sediment deposited in the delta during the Holocene. Secondly, we use detailed age-models of sediment accumulation at 92 sites (based on 185 radiocarbon ages) to distinguish the effects of eustasy and subsidence on sediment accumulation in the different domains of the delta (namely the tide-dominated plain and the fluvial-dominated plain). Using these two independent approaches, we are able to quantify the natural subsidence and its relative distribution. We emphasize the difference between subsidence and sediment accumulation by showing that sediment accumulation was, on average, more than twice the subsidence during the Holocene, which allows us to quantify the increase of sediment deposition associated with the eustatic rise in sea level. We suggest that the consequences of human-induced sediment starvation in low-lying lands may be masked, and thus underappreciated, during periods of eustatic rise in sea level.
2012-01-01
Background Metamorphosis in insects transforms the larval into an adult body plan and comprises the destruction and remodeling of larval and the generation of adult tissues. The remodeling of larval into adult muscles promises to be a genetic model for human atrophy since it is associated with dramatic alteration in cell size. Furthermore, muscle development is amenable to 3D in vivo microscopy at high cellular resolution. However, multi-dimensional image acquisition leads to sizeable amounts of data that demand novel approaches in image processing and analysis. Results To handle, visualize and quantify time-lapse datasets recorded in multiple locations, we designed a workflow comprising three major modules. First, the previously introduced TLM-converter concatenates stacks of single time-points. The second module, TLM-2D-Explorer, creates maximum intensity projections for rapid inspection and allows the temporal alignment of multiple datasets. The transition between prepupal and pupal stage serves as reference point to compare datasets of different genotypes or treatments. We demonstrate how the temporal alignment can reveal novel insights into the east gene which is involved in muscle remodeling. The third module, TLM-3D-Segmenter, performs semi-automated segmentation of selected muscle fibers over multiple frames. 3D image segmentation consists of 3 stages. First, the user places a seed into a muscle of a key frame and performs surface detection based on level-set evolution. Second, the surface is propagated to subsequent frames. Third, automated segmentation detects nuclei inside the muscle fiber. The detected surfaces can be used to visualize and quantify the dynamics of cellular remodeling. To estimate the accuracy of our segmentation method, we performed a comparison with a manually created ground truth. Key and predicted frames achieved a performance of 84% and 80%, respectively. Conclusions We describe an analysis pipeline for the efficient handling and analysis of time-series microscopy data that enhances productivity and facilitates the phenotypic characterization of genetic perturbations. Our methodology can easily be scaled up for genome-wide genetic screens using readily available resources for RNAi based gene silencing in Drosophila and other animal models. PMID:23282138
The Structure and Variability of Extended S II 1256Å Emission Near Io
NASA Astrophysics Data System (ADS)
Woodward, R. C.; Roesler, F. L.; Oliversen, R. J.; Smyth, W. H.; Moos, H. W.; Bagenal, F.
2001-05-01
Since the first Space Telescope Imaging Spectrograph (STIS) observations of Io in 1997 [1], 32 spectrally dispersed STIS images of Io containing the S II 1256Å line have been obtained during eight "visits" (observing sequences). Each image is a 2'' x 25'' rectangle containing Io, which includes emission out to 15–40 Io radii from the moon, depending on viewing geometry. After carefully removing contamination from spectrally adjacent lines, the variable dark current in the STIS FUV MAMA, and the contribution of the foreground/background plasma torus, we have examined the S II 1256Å emission away from the surface of Io in each image. We have also compared these data with the overall plasma torus, as seen in [S II] 6731Å groundbased images [2] (which have been acquired throughout this time period, and overlap three of the eight visits in particular). We find that the S II 1256Å emission is quite different from the neutral O and S UV emission observed simultaneously. It falls off more slowly and less symmetrically, and has greater temporal variability; these effects cannot adequately be explained as a simple function of phase, viewing geometry, and System III magnetic longitude, although a System III dependence is present. Earlier [3], we reported a large, highly asymmetric brightening in the extended S II 1256Å emission on 14 October 1997, correlated with brightenings in neutral O and S UV lines in the same STIS data and with [O I] 6300Å observed from the ground; this brightening is now seen to be unique in the full dataset, both in brightness and in asymmetry. (This is consistent with the much larger groundbased [O I] 6300Å dataset [4], in which features comparable to the 14 October 1997 brightening are rare.) These and other results, and their implications for the Io-torus interaction, will be discussed. This work was supported in part by NASA grants NAS5-30131 and NAG5-6546, and RTOP 344-32-30. References: [1] Roesler et al., Science 283, 353 (1999). [2] Woodward et al., B.A.A.S. 32, 1059 (2000). [3] Woodward et al., Eos 81, S290 (2000). [4] Oliversen et al., J.G.R., in press.
Quantify Lateral Dispersion and Turbulent Mixing by Spatial Array of chi-EM-APEX Floats
2013-09-30
pattern), 18-hour background field on the R/V Oceanus. ii) 10 km, 4-hour butterfly following dye on R/V Endeavor. iii) Dye following to track the...analysis of drogue observations, Deep-Sea Research, 23, 349-352. PUBLICATIONS (wholly or in part supported by this grant) Sanford, T.B. (2013...Spatial Structure of Thermocline and Abyssal Internal Waves, Deep-Sea Res. Part II, 85, 195-209. [published, refereed] Szuts, Z.B. and T. B. Sanford
Bradley, Jeffrey; Bae, Kyounghwa; Choi, Noah; Forster, Ken; Siegel, Barry A; Brunetti, Jacqueline; Purdy, James; Faria, Sergio; Vu, Toni; Thorstad, Wade; Choy, Hak
2012-01-01
Radiation Therapy Oncology Group (RTOG) 0515 is a Phase II prospective trial designed to quantify the impact of positron emission tomography (PET)/computed tomography (CT) compared with CT alone on radiation treatment plans (RTPs) and to determine the rate of elective nodal failure for PET/CT-derived volumes. Each enrolled patient underwent definitive radiation therapy for non-small-cell lung cancer (≥ 60 Gy) and had two RTP datasets generated: gross tumor volume (GTV) derived with CT alone and with PET/CT. Patients received treatment using the PET/CT-derived plan. The primary end point, the impact of PET/CT fusion on treatment plans was measured by differences of the following variables for each patient: GTV, number of involved nodes, nodal station, mean lung dose (MLD), volume of lung exceeding 20 Gy (V20), and mean esophageal dose (MED). Regional failure rate was a secondary end point. The nonparametric Wilcoxon matched-pairs signed-ranks test was used with Bonferroni adjustment for an overall significance level of 0.05. RTOG 0515 accrued 52 patients, 47 of whom are evaluable. The follow-up time for all patients is 12.9 months (2.7-22.2). Tumor staging was as follows: II = 6%; IIIA = 40%; and IIIB = 54%. The GTV was statistically significantly smaller for PET/CT-derived volumes (98.7 vs. 86.2 mL; p < 0.0001). MLDs for PET/CT plans were slightly lower (19 vs. 17.8 Gy; p = 0.06). There was no significant difference in the number of involved nodes (2.1 vs. 2.4), V20 (32% vs. 30.8%), or MED (28.7 vs. 27.1 Gy). Nodal contours were altered by PET/CT for 51% of patients. One patient (2%) has developed an elective nodal failure. PET/CT-derived tumor volumes were smaller than those derived by CT alone. PET/CT changed nodal GTV contours in 51% of patients. The elective nodal failure rate for GTVs derived by PET/CT is quite low, supporting the RTOG standard of limiting the target volume to the primary tumor and involved nodes. Copyright © 2012 Elsevier Inc. All rights reserved.
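A sketch of the statistical comparison named above: the Wilcoxon matched-pairs signed-rank test with a Bonferroni-adjusted threshold across the end points. The numbers below are placeholders, not trial data.

```python
import numpy as np
from scipy.stats import wilcoxon

def paired_test(ct_values, petct_values, n_endpoints=6, alpha=0.05):
    """Return the p-value and whether it clears the Bonferroni-adjusted threshold."""
    _, p = wilcoxon(ct_values, petct_values)
    return p, p < alpha / n_endpoints

gtv_ct = np.array([105.0, 80.2, 120.5, 60.3, 95.4])      # placeholder GTVs (mL)
gtv_petct = np.array([95.1, 70.8, 101.2, 58.9, 82.6])
print(paired_test(gtv_ct, gtv_petct))
```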
Origin and z-distribution of Galactic diffuse [C II] emission
NASA Astrophysics Data System (ADS)
Velusamy, T.; Langer, W. D.
2014-12-01
Context. The [C ii] emission is an important probe of star formation in the Galaxy and in external galaxies. The GOT C+ survey and its follow up observations of spectrally resolved 1.9 THz [C ii] emission using Herschel HIFI provides the data needed to quantify the Galactic interstellar [C ii] gas components as tracers of star formation. Aims: We determine the source of the diffuse [C ii] emission by studying its spatial (radial and vertical) distributions by separating and evaluating the fractions of [C ii] and CO emissions in the Galactic ISM gas components. Methods: We used the HIFI [C ii] Galactic survey (GOT C+), along with ancillary H i, 12CO, 13CO, and C18O data toward 354 lines of sight, and several HIFI [C ii] and [C i] position-velocity maps. We quantified the emission in each spectral line profile by evaluating the intensities in 3 km s-1 wide velocity bins, "spaxels". Using the detection of [C ii] with CO or [C i], we separated the dense and diffuse gas components. We derived 2D Galactic disk maps using the spaxel velocities for kinematic distances. We separated the warm and cold H2 gases by comparing CO emissions with and without associated [C ii]. Results: We find evidence of widespread diffuse [C ii] emission with a z-scale distribution larger than that for the total [C ii] or CO. The diffuse [C ii] emission consists of (i) diffuse molecular (CO-faint) H2 clouds and (ii) diffuse H i clouds and/or WIM. In the inner Galaxy we find a lack of [C ii] detections in a majority (~62%) of H i spaxels and show that the diffuse component primarily comes from the WIM (~21%) and that the H i gas is not a major contributor to the diffuse component (~6%). The warm-H2 radial profile shows an excess in the range 4 to 7 kpc, consistent with enhanced star formation there. Conclusions: We derive, for the first time, the 2D [C ii] spatial distribution in the plane and the z-distributions of the individual [C ii] gas component. From the GOT C+ detections we estimate the fractional [C ii] emission tracing (i) H2 gas in dense and diffuse molecular clouds as ~48% and ~14%, respectively, (ii) in the H i gas ~18%, and (iii) in the WIM ~21%. Including non-detections from H i increases the [C ii] in H i to ~27%. The z-scale distributions FWHM from smallest to largest are [C ii] sources with CO, ~130 pc, (CO-faint) diffuse H2 gas, ~200 pc, and the diffuse H i and WIM, ~330 pc. When combined with [C ii], CO observations probe the warm-H2 gas, tracing star formation. Herschel is an ESA space observatory with science instruments provided by European-led Principal Investigator consortia and with important participation from NASA.
NASA Technical Reports Server (NTRS)
Platnick, Steven; Meyer, Kerry G.; King, Michael D.; Wind, Galina; Amarasinghe, Nandana; Marchant, Benjamin G.; Arnold, G. Thomas; Zhang, Zhibo; Hubanks, Paul A.; Holz, Robert E.;
2016-01-01
The MODIS Level-2 cloud product (Earth Science Data Set names MOD06 and MYD06 for Terra and Aqua MODIS, respectively) provides pixel-level retrievals of cloud-top properties (day and night pressure, temperature, and height) and cloud optical properties (optical thickness, effective particle radius, and water path for both liquid water and ice cloud thermodynamic phases, daytime only). Collection 6 (C6) reprocessing of the product was completed in May 2014 and March 2015 for MODIS Aqua and Terra, respectively. Here we provide an overview of major C6 optical property algorithm changes relative to the previous Collection 5 (C5) product. Notable C6 optical and microphysical algorithm changes include: (i) new ice cloud optical property models and a more extensive cloud radiative transfer code lookup table (LUT) approach, (ii) improvement in the skill of the shortwave-derived cloud thermodynamic phase, (iii) separate cloud effective radius retrieval datasets for each spectral combination used in previous collections, (iv) separate retrievals for partly cloudy pixels and those associated with cloud edges, (v) failure metrics that provide diagnostic information for pixels having observations that fall outside the LUT solution space, and (vi) enhanced pixel-level retrieval uncertainty calculations. The C6 algorithm changes collectively can result in significant changes relative to C5, though the magnitude depends on the dataset and the pixel's retrieval location in the cloud parameter space. Example Level-2 granule and Level-3 gridded dataset differences between the two collections are shown. While the emphasis is on the suite of cloud optical property datasets, other MODIS cloud datasets are discussed when relevant.
Platnick, Steven; Meyer, Kerry G; King, Michael D; Wind, Galina; Amarasinghe, Nandana; Marchant, Benjamin; Arnold, G Thomas; Zhang, Zhibo; Hubanks, Paul A; Holz, Robert E; Yang, Ping; Ridgway, William L; Riedi, Jérôme
2017-01-01
The MODIS Level-2 cloud product (Earth Science Data Set names MOD06 and MYD06 for Terra and Aqua MODIS, respectively) provides pixel-level retrievals of cloud-top properties (day and night pressure, temperature, and height) and cloud optical properties (optical thickness, effective particle radius, and water path for both liquid water and ice cloud thermodynamic phases-daytime only). Collection 6 (C6) reprocessing of the product was completed in May 2014 and March 2015 for MODIS Aqua and Terra, respectively. Here we provide an overview of major C6 optical property algorithm changes relative to the previous Collection 5 (C5) product. Notable C6 optical and microphysical algorithm changes include: (i) new ice cloud optical property models and a more extensive cloud radiative transfer code lookup table (LUT) approach, (ii) improvement in the skill of the shortwave-derived cloud thermodynamic phase, (iii) separate cloud effective radius retrieval datasets for each spectral combination used in previous collections, (iv) separate retrievals for partly cloudy pixels and those associated with cloud edges, (v) failure metrics that provide diagnostic information for pixels having observations that fall outside the LUT solution space, and (vi) enhanced pixel-level retrieval uncertainty calculations. The C6 algorithm changes collectively can result in significant changes relative to C5, though the magnitude depends on the dataset and the pixel's retrieval location in the cloud parameter space. Example Level-2 granule and Level-3 gridded dataset differences between the two collections are shown. While the emphasis is on the suite of cloud optical property datasets, other MODIS cloud datasets are discussed when relevant.
Platnick, Steven; Meyer, Kerry G.; King, Michael D.; Wind, Galina; Amarasinghe, Nandana; Marchant, Benjamin; Arnold, G. Thomas; Zhang, Zhibo; Hubanks, Paul A.; Holz, Robert E.; Yang, Ping; Ridgway, William L.; Riedi, Jérôme
2018-01-01
The MODIS Level-2 cloud product (Earth Science Data Set names MOD06 and MYD06 for Terra and Aqua MODIS, respectively) provides pixel-level retrievals of cloud-top properties (day and night pressure, temperature, and height) and cloud optical properties (optical thickness, effective particle radius, and water path for both liquid water and ice cloud thermodynamic phases–daytime only). Collection 6 (C6) reprocessing of the product was completed in May 2014 and March 2015 for MODIS Aqua and Terra, respectively. Here we provide an overview of major C6 optical property algorithm changes relative to the previous Collection 5 (C5) product. Notable C6 optical and microphysical algorithm changes include: (i) new ice cloud optical property models and a more extensive cloud radiative transfer code lookup table (LUT) approach, (ii) improvement in the skill of the shortwave-derived cloud thermodynamic phase, (iii) separate cloud effective radius retrieval datasets for each spectral combination used in previous collections, (iv) separate retrievals for partly cloudy pixels and those associated with cloud edges, (v) failure metrics that provide diagnostic information for pixels having observations that fall outside the LUT solution space, and (vi) enhanced pixel-level retrieval uncertainty calculations. The C6 algorithm changes collectively can result in significant changes relative to C5, though the magnitude depends on the dataset and the pixel’s retrieval location in the cloud parameter space. Example Level-2 granule and Level-3 gridded dataset differences between the two collections are shown. While the emphasis is on the suite of cloud optical property datasets, other MODIS cloud datasets are discussed when relevant. PMID:29657349
Zhu, Qile; Li, Xiaolin; Conesa, Ana; Pereira, Cécile
2018-05-01
Best performing named entity recognition (NER) methods for biomedical literature are based on hand-crafted features or task-specific rules, which are costly to produce and difficult to generalize to other corpora. End-to-end neural networks achieve state-of-the-art performance without hand-crafted features and task-specific knowledge in non-biomedical NER tasks. However, in the biomedical domain, using the same architecture does not yield competitive performance compared with conventional machine learning models. We propose a novel end-to-end deep learning approach for biomedical NER tasks that leverages the local contexts based on n-gram character and word embeddings via Convolutional Neural Network (CNN). We call this approach GRAM-CNN. To automatically label a word, this method uses the local information around a word. Therefore, the GRAM-CNN method does not require any specific knowledge or feature engineering and can be theoretically applied to a wide range of existing NER problems. The GRAM-CNN approach was evaluated on three well-known biomedical datasets containing different BioNER entities. It obtained an F1-score of 87.26% on the Biocreative II dataset, 87.26% on the NCBI dataset and 72.57% on the JNLPBA dataset. Those results put GRAM-CNN in the lead of the biological NER methods. To the best of our knowledge, we are the first to apply CNN based structures to BioNER problems. The GRAM-CNN source code, datasets and pre-trained model are available online at: https://github.com/valdersoul/GRAM-CNN. andyli@ece.ufl.edu or aconesa@ufl.edu. Supplementary data are available at Bioinformatics online.
Zhu, Qile; Li, Xiaolin; Conesa, Ana; Pereira, Cécile
2018-01-01
Abstract Motivation Best performing named entity recognition (NER) methods for biomedical literature are based on hand-crafted features or task-specific rules, which are costly to produce and difficult to generalize to other corpora. End-to-end neural networks achieve state-of-the-art performance without hand-crafted features and task-specific knowledge in non-biomedical NER tasks. However, in the biomedical domain, using the same architecture does not yield competitive performance compared with conventional machine learning models. Results We propose a novel end-to-end deep learning approach for biomedical NER tasks that leverages the local contexts based on n-gram character and word embeddings via Convolutional Neural Network (CNN). We call this approach GRAM-CNN. To automatically label a word, this method uses the local information around a word. Therefore, the GRAM-CNN method does not require any specific knowledge or feature engineering and can be theoretically applied to a wide range of existing NER problems. The GRAM-CNN approach was evaluated on three well-known biomedical datasets containing different BioNER entities. It obtained an F1-score of 87.26% on the Biocreative II dataset, 87.26% on the NCBI dataset and 72.57% on the JNLPBA dataset. Those results put GRAM-CNN in the lead of the biological NER methods. To the best of our knowledge, we are the first to apply CNN based structures to BioNER problems. Availability and implementation The GRAM-CNN source code, datasets and pre-trained model are available online at: https://github.com/valdersoul/GRAM-CNN. Contact andyli@ece.ufl.edu or aconesa@ufl.edu Supplementary information Supplementary data are available at Bioinformatics online. PMID:29272325
A global water resources ensemble of hydrological models: the eartH2Observe Tier-1 dataset
NASA Astrophysics Data System (ADS)
Schellekens, Jaap; Dutra, Emanuel; Martínez-de la Torre, Alberto; Balsamo, Gianpaolo; van Dijk, Albert; Sperna Weiland, Frederiek; Minvielle, Marie; Calvet, Jean-Christophe; Decharme, Bertrand; Eisner, Stephanie; Fink, Gabriel; Flörke, Martina; Peßenteiner, Stefanie; van Beek, Rens; Polcher, Jan; Beck, Hylke; Orth, René; Calton, Ben; Burke, Sophia; Dorigo, Wouter; Weedon, Graham P.
2017-07-01
The dataset presented here consists of an ensemble of 10 global hydrological and land surface models for the period 1979-2012 using a reanalysis-based meteorological forcing dataset (0.5° resolution). The current dataset serves as a state of the art in current global hydrological modelling and as a benchmark for further improvements in the coming years. A signal-to-noise ratio analysis revealed low inter-model agreement over (i) snow-dominated regions and (ii) tropical rainforest and monsoon areas. The large uncertainty of precipitation in the tropics is not reflected in the ensemble runoff. Verification of the results against benchmark datasets for evapotranspiration, snow cover, snow water equivalent, soil moisture anomaly and total water storage anomaly using the tools from The International Land Model Benchmarking Project (ILAMB) showed overall useful model performance, while the ensemble mean generally outperformed the single model estimates. The results also show that there is currently no single best model for all variables and that model performance is spatially variable. In our unconstrained model runs the ensemble mean of total runoff into the ocean was 46 268 km3 yr-1 (334 kg m-2 yr-1), while the ensemble mean of total evaporation was 537 kg m-2 yr-1. All data are made available openly through a Water Cycle Integrator portal (WCI, wci.earth2observe.eu), and via direct http and ftp download. The portal follows the protocols of the Open Geospatial Consortium such as OPeNDAP, WCS and WMS. The DOI for the data is https://doi.org/10.5281/zenodo.167070.
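A minimal sketch of the signal-to-noise ratio analysis mentioned above, taking SNR per grid cell as the ensemble mean divided by the inter-model spread (a common convention assumed here, not necessarily the exact published definition).

```python
import numpy as np

def ensemble_snr(variable_stack):
    """variable_stack: array shaped (n_models, n_lat, n_lon), e.g. annual runoff."""
    mean = variable_stack.mean(axis=0)
    spread = variable_stack.std(axis=0)
    return np.where(spread > 0, mean / spread, np.inf)
```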
NASA Astrophysics Data System (ADS)
Rose, Jake; Martin, Michael; Bourlai, Thirimachos
2014-06-01
In law enforcement and security applications, the acquisition of face images is critical in producing key trace evidence for the successful identification of potential threats. The goal of the study is to demonstrate that steroid usage significantly affects human facial appearance and, hence, the performance of commercial and academic face recognition (FR) algorithms. In this work, we evaluate the performance of state-of-the-art FR algorithms on two unique face image datasets of subjects before (gallery set) and after (probe set) steroid (or human growth hormone) usage. For the purpose of this study, datasets of 73 subjects were created from multiple sources found on the Internet, containing images of men and women before and after steroid usage. Next, we geometrically pre-processed all images of both face datasets. Then, we applied image restoration techniques to the same face datasets, and finally, we applied FR algorithms in order to match the pre-processed face images of our probe datasets against the face images of the gallery set. Experimental results demonstrate that only a specific set of FR algorithms obtains the most accurate results (in terms of the rank-1 identification rate). This is because there are several factors that influence the efficiency of face matchers, including (i) the time lapse between the before and after face photos, even after image pre-processing and restoration, (ii) the usage of different drugs (e.g. Dianabol, Winstrol, and Decabolan), (iii) the usage of different cameras to capture face images, and finally, (iv) the variability of standoff distance, illumination and other noise factors (e.g. motion noise). All of the previously mentioned complicated scenarios make clear that cross-scenario matching is a very challenging problem and, thus, further investigation is required.
NASA Astrophysics Data System (ADS)
Dube, Timothy; Sibanda, Mbulisi; Shoko, Cletah; Mutanga, Onisimo
2017-10-01
Forest stand volume is one of the crucial stand parameters, which influences the ability of these forests to provide ecosystem goods and services. This study thus aimed at examining the potential of integrating multispectral SPOT 5 imagery with ancillary data (forest age and rainfall metrics) in estimating the stand volume of coppiced and planted Eucalyptus spp. in KwaZulu-Natal, South Africa. To achieve this objective, the Partial Least Squares Regression (PLSR) algorithm was used. The PLSR algorithm was implemented in three analysis stages: stage I, using ancillary data as an independent dataset; stage II, SPOT 5 spectral bands as an independent dataset; and stage III, combined SPOT 5 spectral bands and ancillary data. The results of the study showed that the use of an independent ancillary dataset better explained the volume of Eucalyptus spp. growing from coppices (adjusted R2 (R2Adj) = 0.54, RMSEP = 44.08 m3/ha) when compared with those that were planted (R2Adj = 0.43, RMSEP = 53.29 m3/ha). Similar results were also observed when SPOT 5 spectral bands were applied as an independent dataset, whereas improved volume estimates were produced when using the combined dataset. For instance, planted Eucalyptus spp. were better predicted (R2 = 0.77, R2Adj = 0.59, RMSEP = 36.02 m3/ha) when compared with those that grow from coppices (R2 = 0.76, R2Adj = 0.46, RMSEP = 40.63 m3/ha). Overall, the findings of this study demonstrated the relevance of multi-source data in ecosystem modelling.
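A hedged sketch of the three-tier PLSR analysis with scikit-learn; the predictor names, the number of components, and the in-sample RMSEP shortcut are illustrative assumptions (the study cross-validated its models).

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

def fit_plsr(X, y, n_components=3):
    """Fit a PLSR model and return it with an in-sample RMSEP."""
    pls = PLSRegression(n_components=n_components).fit(X, y)
    rmsep = float(np.sqrt(np.mean((y - pls.predict(X).ravel()) ** 2)))
    return pls, rmsep

# stage I: ancillary data only; stage II: SPOT 5 bands only;
# stage III: the two predictor blocks concatenated, e.g.
# X_combined = np.hstack([X_ancillary, X_spot])
```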
Wendell R. Haag
2009-01-01
There may be bias associated with mark–recapture experiments used to estimate age and growth of freshwater mussels. Using subsets of a mark–recapture dataset for Quadrula pustulosa, I examined how age and growth parameter estimates are affected by (i) the range and skew of the data and (ii) growth reduction due to handling. I compared predictions...
Heavy flavor decay of Zγ at CDF
DOE Office of Scientific and Technical Information (OSTI.GOV)
Timothy M. Harrington-Taber
2013-01-01
Diboson production is an important and frequently measured parameter of the Standard Model. This analysis considers the previously neglected pp̄ → Zγ → bb̄ channel, as measured at the Collider Detector at Fermilab. Using the entire Tevatron Run II dataset, the measured result is consistent with Standard Model predictions, but the statistical error associated with this method of measurement limits the strength of this correlation.
An Integrated Suite of Text and Data Mining Tools - Phase II
2005-08-30
Riverside, CA, USA Mazda Motor Corp, Jpn Univ of Darmstadt, Darmstadt, Ger Navy Center for Applied Research in Artificial Intelligence Univ of...with Georgia Tech Research Corporation developed a desktop text-mining software tool named TechOASIS (known commercially as VantagePoint). By the...of this dataset and groups the Corporate Source items that co-occur with the found items. He decides he is only interested in the institutions
NASA Astrophysics Data System (ADS)
Candela, S. G.; Howat, I.; Noh, M. J.; Porter, C. C.; Morin, P. J.
2016-12-01
In the last decade, high resolution satellite imagery has become an increasingly accessible tool for geoscientists to quantify changes in the Arctic land surface due to geophysical, ecological and anthropogenic processes. However, the trade-off between spatial coverage and spatial-temporal resolution has limited detailed, process-level change detection over large (i.e. continental) scales. The ArcticDEM project utilized over 300,000 Worldview image pairs to produce a nearly 100% coverage elevation model (above 60°N), offering the first polar, high-spatial-resolution (2-8 m by region) dataset, often with multiple repeats in areas of particular interest to geoscientists. A dataset of this size (nearly 250 TB) offers endless new avenues of scientific inquiry, but quickly becomes unmanageable computationally and logistically for the computing resources available to the average scientist. Here we present TopoDiff, a framework for a generalized, automated workflow that requires minimal input from the end user about a study site, and utilizes cloud computing resources to provide a temporally sorted and differenced dataset, ready for geostatistical analysis. This hands-off approach allows the end user to focus on the science, without having to manage thousands of files or petabytes of data. At the same time, TopoDiff provides a consistent and accurate workflow for image sorting, selection, and co-registration, enabling cross-comparisons between research projects.
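The core change-detection step behind such a workflow can be sketched as a vertical co-registration over terrain assumed stable, followed by a simple difference; the stable-terrain mask and median-shift correction are assumptions, not the actual TopoDiff implementation.

```python
import numpy as np

def dem_difference(dem_early, dem_late, stable_mask):
    """Elevation change between two co-gridded DEMs after removing the
    median vertical bias estimated over stable (unchanging) terrain."""
    bias = np.nanmedian(dem_late[stable_mask] - dem_early[stable_mask])
    return (dem_late - bias) - dem_early
```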
Li, You; Heavican, Tayla B.; Vellichirammal, Neetha N.; Iqbal, Javeed
2017-01-01
Abstract The RNA-Seq technology has revolutionized transcriptome characterization not only by accurately quantifying gene expression, but also by the identification of novel transcripts like chimeric fusion transcripts. The ‘fusion’ or ‘chimeric’ transcripts have improved the diagnosis and prognosis of several tumors, and have led to the development of novel therapeutic regimen. The fusion transcript detection is currently accomplished by several software packages, primarily relying on sequence alignment algorithms. The alignment of sequencing reads from fusion transcript loci in cancer genomes can be highly challenging due to the incorrect mapping induced by genomic alterations, thereby limiting the performance of alignment-based fusion transcript detection methods. Here, we developed a novel alignment-free method, ChimeRScope that accurately predicts fusion transcripts based on the gene fingerprint (as k-mers) profiles of the RNA-Seq paired-end reads. Results on published datasets and in-house cancer cell line datasets followed by experimental validations demonstrate that ChimeRScope consistently outperforms other popular methods irrespective of the read lengths and sequencing depth. More importantly, results on our in-house datasets show that ChimeRScope is a better tool that is capable of identifying novel fusion transcripts with potential oncogenic functions. ChimeRScope is accessible as a standalone software at (https://github.com/ChimeRScope/ChimeRScope/wiki) or via the Galaxy web-interface at (https://galaxy.unmc.edu/). PMID:28472320
Yang, Ruifang; Zhao, Nanjing; Xiao, Xue; Yu, Shaohui; Liu, Jianguo; Liu, Wenqing
2016-01-05
There is no effective method to address the quenching effect of quenchers in the fluorescence spectral measurement and recognition of polycyclic aromatic hydrocarbons in aquatic environments. In this work, a four-way dataset combined with four-way parallel factor analysis is used to identify and quantify polycyclic aromatic hydrocarbons in the presence of humic acid, a fluorescent quencher and a ubiquitous substance in aquatic systems, by modeling the quenching effect of humic acid through decomposition of the four-way dataset into four loading matrices corresponding to relative concentration, excitation spectra, emission spectra and fluorescence quantum yield, respectively. It is found that phenanthrene, pyrene, anthracene and fluorene can be recognized simultaneously, with similarities all above 0.980 between resolved spectra and reference spectra. Moreover, their concentrations, ranging from 0 to 8 μg L(-1) in the test samples prepared with river water, could also be predicted successfully, with a recovery rate for each polycyclic aromatic hydrocarbon between 100% and 120%, which were higher than those of three-way PARAFAC. These results demonstrate that the combination of a four-way dataset with four-way parallel factor analysis could be a promising method to recognize the fluorescence spectra of polycyclic aromatic hydrocarbons in the presence of a fluorescent quencher from both qualitative and quantitative perspectives. Copyright © 2015 Elsevier B.V. All rights reserved.
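A hedged sketch of a four-way PARAFAC decomposition using the tensorly library (the library choice, file name, and rank of four are assumptions for illustration): the four loading matrices correspond to relative concentration, excitation spectra, emission spectra, and fluorescence quantum yield.

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

# hypothetical tensor shaped (sample, excitation, emission, quencher level)
eem_4way = np.load("eem_quenching_tensor.npy")
weights, factors = parafac(tl.tensor(eem_4way), rank=4, normalize_factors=True)
concentration, excitation, emission, quantum_yield = factors
```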
A Geospatial Database that Supports Derivation of Climatological Features of Severe Weather
NASA Astrophysics Data System (ADS)
Phillips, M.; Ansari, S.; Del Greco, S.
2007-12-01
The Severe Weather Data Inventory (SWDI) at NOAA's National Climatic Data Center (NCDC) provides user access to archives of several datasets critical to the detection and evaluation of severe weather. These datasets include archives of: · NEXRAD Level-III point features describing general storm structure, hail, mesocyclone and tornado signatures · National Weather Service Storm Events Database · National Weather Service Local Storm Reports collected from storm spotters · National Weather Service Warnings · Lightning strikes from Vaisala's National Lightning Detection Network (NLDN) SWDI archives all of these datasets in a spatial database that allows for convenient searching and subsetting. These data are accessible via the NCDC web site, Web Feature Services (WFS) or automated web services. The results of interactive web page queries may be saved in a variety of formats, including plain text, XML, Google Earth's KMZ, standards-based NetCDF and Shapefile. NCDC's Storm Risk Assessment Project (SRAP) uses data from the SWDI database to derive gridded climatology products that show the spatial distributions of the frequency of various events. SRAP also can relate SWDI events to other spatial data such as roads, population, watersheds, and other geographic, sociological, or economic data to derive products that are useful in municipal planning, emergency management, the insurance industry, and other areas where there is a need to quantify and qualify how severe weather patterns affect people and property.
A 4-D dataset for validation of crystal growth in a complex three-phase material, ice cream
NASA Astrophysics Data System (ADS)
Rockett, P.; Karagadde, S.; Guo, E.; Bent, J.; Hazekamp, J.; Kingsley, M.; Vila-Comamala, J.; Lee, P. D.
2015-06-01
Four dimensional (4D, or 3D plus time) X-ray tomographic imaging of phase changes in materials is quickly becoming an accepted tool for quantifying the development of microstructures to both inform and validate models. However, most of the systems studied have been relatively simple binary compositions with only two phases. In this study we present a quantitative dataset of the phase evolution in a complex three-phase material, ice cream. The microstructure of ice cream is an important parameter in terms of sensorial perception, and therefore quantification and modelling of the evolution of the microstructure with time and temperature is key to understanding its fabrication and storage. The microstructure consists of three phases: air cells, ice crystals, and unfrozen matrix. We perform in situ synchrotron X-ray imaging of ice cream samples using in-line phase contrast tomography, housed within a purpose-built cold stage (-40 to +20 °C) with finely controlled variation in specimen temperature. The size and distribution of ice crystals and air cells during programmed temperature cycling are determined using 3D quantification. The microstructural evolution of three-phase materials has many other important applications ranging from biological to structural and functional materials, hence this dataset can act as a validation case for numerical investigations of faceted and non-faceted crystal growth in a range of materials.
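A minimal sketch of the 3D quantification step, assuming a binary segmentation of one phase (ice crystals or air cells) is already available: label connected components and convert voxel counts into a size distribution.

```python
import numpy as np
from scipy import ndimage

def size_distribution(binary_volume, voxel_volume_um3=1.0):
    """Volumes of each connected object in a segmented 3-D phase."""
    labels, n = ndimage.label(binary_volume)
    voxel_counts = ndimage.sum(binary_volume, labels, index=np.arange(1, n + 1))
    return voxel_counts * voxel_volume_um3
```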
Simultaneous alignment and clustering of peptide data using a Gibbs sampling approach.
Andreatta, Massimo; Lund, Ole; Nielsen, Morten
2013-01-01
Proteins recognizing short peptide fragments play a central role in cellular signaling. As a result of high-throughput technologies, peptide-binding protein specificities can be studied using large peptide libraries at dramatically lower cost and time. Interpretation of such large peptide datasets, however, is a complex task, especially when the data contain multiple receptor binding motifs, and/or the motifs are found at different locations within distinct peptides. The algorithm presented in this article, based on Gibbs sampling, identifies multiple specificities in peptide data by performing two essential tasks simultaneously: alignment and clustering of peptide data. We apply the method to de-convolute binding motifs in a panel of peptide datasets with different degrees of complexity spanning from the simplest case of pre-aligned fixed-length peptides to cases of unaligned peptide datasets of variable length. Example applications described in this article include mixtures of binders to different MHC class I and class II alleles, distinct classes of ligands for SH3 domains and sub-specificities of the HLA-A*02:01 molecule. The Gibbs clustering method is available online as a web server at http://www.cbs.dtu.dk/services/GibbsCluster.
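A highly simplified sketch of the central Gibbs-sampling move for a single motif of fixed length: hold one peptide out, build a position-weight matrix from the remaining alignments, and resample the held-out peptide's offset in proportion to its score. Multiple-cluster handling, background correction, and the published scoring scheme are omitted; this is not the GibbsCluster implementation.

```python
import numpy as np

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
AA = {a: i for i, a in enumerate(ALPHABET)}

def pwm(peptides, offsets, L, pseudo=1.0):
    """Position-weight matrix from the current alignment (with pseudocounts)."""
    counts = np.full((L, len(ALPHABET)), pseudo)
    for pep, off in zip(peptides, offsets):
        for pos in range(L):
            counts[pos, AA[pep[off + pos]]] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def gibbs_align(peptides, L=9, n_iter=200, seed=0):
    """Assumes standard amino acids and every peptide at least L residues long."""
    rng = np.random.default_rng(seed)
    offsets = [int(rng.integers(0, len(p) - L + 1)) for p in peptides]
    for _ in range(n_iter):
        for i, pep in enumerate(peptides):
            rest = [(p, o) for j, (p, o) in enumerate(zip(peptides, offsets)) if j != i]
            w = pwm([p for p, _ in rest], [o for _, o in rest], L)
            scores = np.array([
                np.prod([w[pos, AA[pep[o + pos]]] for pos in range(L)])
                for o in range(len(pep) - L + 1)
            ])
            offsets[i] = int(rng.choice(len(scores), p=scores / scores.sum()))
    return offsets   # motif start position within each peptide
```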
Quantification of HTLV-1 Clonality and TCR Diversity
Laydon, Daniel J.; Melamed, Anat; Sim, Aaron; Gillet, Nicolas A.; Sim, Kathleen; Darko, Sam; Kroll, J. Simon; Douek, Daniel C.; Price, David A.; Bangham, Charles R. M.; Asquith, Becca
2014-01-01
Estimation of immunological and microbiological diversity is vital to our understanding of infection and the immune response. For instance, what is the diversity of the T cell repertoire? These questions are partially addressed by high-throughput sequencing techniques that enable identification of immunological and microbiological “species” in a sample. Estimators of the number of unseen species are needed to estimate population diversity from sample diversity. Here we test five widely used non-parametric estimators, and develop and validate a novel method, DivE, to estimate species richness and distribution. We used three independent datasets: (i) viral populations from subjects infected with human T-lymphotropic virus type 1; (ii) T cell antigen receptor clonotype repertoires; and (iii) microbial data from infant faecal samples. When applied to datasets with rarefaction curves that did not plateau, existing estimators systematically increased with sample size. In contrast, DivE consistently and accurately estimated diversity for all datasets. We identify conditions that limit the application of DivE. We also show that DivE can be used to accurately estimate the underlying population frequency distribution. We have developed a novel method that is significantly more accurate than commonly used biodiversity estimators in microbiological and immunological populations. PMID:24945836
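For contrast with the non-parametric estimators tested in the study, a sketch of the classic bias-corrected Chao1 richness estimate from singleton and doubleton counts (one of the widely used estimators, not the DivE method itself):

```python
from collections import Counter

def chao1(counts_per_species):
    """counts_per_species: observed count for each detected species/clonotype."""
    freq_of_freqs = Counter(counts_per_species)
    s_obs = len(counts_per_species)
    f1, f2 = freq_of_freqs.get(1, 0), freq_of_freqs.get(2, 0)
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))   # bias-corrected Chao1
```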
Matinmanesh, A; Li, Y; Clarkin, O; Zalzal, P; Schemitsch, E H; Towler, M R; Papini, M
2017-11-01
Bioactive glasses have been used as coatings for biomedical implants because they can be formulated to promote osseointegration, antibacterial behavior, bone formation, and tissue healing through the incorporation and subsequent release of certain ions. However, shear loading on coated implants has been reported to cause the delamination and loosening of such coatings. This work uses a recently developed fracture mechanics testing methodology to quantify the critical strain energy release rate under nearly pure mode II conditions, GIIC, of a series of borate-based glass coating/Ti6Al4V alloy substrate systems. Incorporating increasing amounts of SrCO3 in the glass composition was found to increase the GIIC almost twofold, from 25.3 to 46.9 J/m2. The magnitude and distribution of residual stresses in the coating were quantified, and it was found that the residual stresses in all cases were distributed uniformly over the cross section of the coating. The crack was driven towards, but not into, the glass/Ti6Al4V substrate interface due to the shear loading. This implied that the interface had a higher fracture toughness than the coating itself. Copyright © 2017 Elsevier Ltd. All rights reserved.
Lisboa, Cristiane Varella; Monteiro, Rafael Veríssimo; Martins, Andreia Fonseca; Xavier, Samantha Cristina das Chagas; Lima, Valdirene Dos Santos; Jansen, Ana Maria
2015-05-01
Here, we present a review of the dataset resulting from the 11-year follow-up of Trypanosoma cruzi infection in free-ranging populations of Leontopithecus rosalia (golden lion tamarin) and Leontopithecus chrysomelas (golden-headed lion tamarin) from distinct forest fragments in the Atlantic Coastal Rainforest. Additionally, we present new data regarding T. cruzi infection of small mammals (rodents and marsupials) that live in the same areas as golden lion tamarins and the characterisation at the discrete typing unit (DTU) level of 77 of these isolates. DTU TcII was found to exclusively infect primates, while TcI infected Didelphis aurita and lion tamarins. The majority of T. cruzi isolates derived from L. rosalia were shown to be TcII (33 out of 42), and nine T. cruzi isolates displayed a TcI profile. Golden-headed lion tamarins were shown to be excellent reservoirs of TcII, as 24 of 26 T. cruzi isolates exhibited the TcII profile. We concluded the following: (i) the transmission cycle of T. cruzi in a same host species and forest fragment is modified over time, (ii) the infectivity competence of the golden lion tamarin population fluctuates in waves that peak every other year and (iii) both golden and golden-headed lion tamarins are able to maintain long-lasting infections by TcII and TcI.
Lisboa, Cristiane Varella; Monteiro, Rafael Veríssimo; Martins, Andreia Fonseca; Xavier, Samantha Cristina das Chagas; Lima, Valdirene dos Santos; Jansen, Ana Maria
2015-01-01
Here, we present a review of the dataset resulting from the 11-year follow-up of Trypanosoma cruzi infection in free-ranging populations of Leontopithecus rosalia (golden lion tamarin) and Leontopithecus chrysomelas (golden-headed lion tamarin) from distinct forest fragments in the Atlantic Coastal Rainforest. Additionally, we present new data regarding T. cruzi infection of small mammals (rodents and marsupials) that live in the same areas as golden lion tamarins and the characterisation at the discrete typing unit (DTU) level of 77 of these isolates. DTU TcII was found to exclusively infect primates, while TcI infected Didelphis aurita and lion tamarins. The majority of T. cruzi isolates derived from L. rosalia were shown to be TcII (33 out of 42), and nine T. cruzi isolates displayed a TcI profile. Golden-headed lion tamarins were shown to be excellent reservoirs of TcII, as 24 of 26 T. cruzi isolates exhibited the TcII profile. We concluded the following: (i) the transmission cycle of T. cruzi in a same host species and forest fragment is modified over time, (ii) the infectivity competence of the golden lion tamarin population fluctuates in waves that peak every other year and (iii) both golden and golden-headed lion tamarins are able to maintain long-lasting infections by TcII and TcI. PMID:25946156
NASA Astrophysics Data System (ADS)
Ghosh, Ruby; Bruch, Angela A.; Portmann, Felix; Bera, Subir; Paruya, Dipak Kumar; Morthekai, P.; Ali, Sheikh Nawaz
2017-10-01
Relying on the ability of pollen assemblages to differentiate among elevationally stratified vegetation zones, we assess the potential of a modern pollen-climate dataset from the Darjeeling area, eastern Himalaya, for past climate reconstructions. The dataset includes 73 surface samples from 25 sites collected along a c. 130-3600 m a.s.l. elevation gradient spanning a horizontal distance of c. 150 km, and 124 terrestrial pollen taxa, which are analysed with respect to various climatic and environmental variables such as mean annual temperature (MAT), mean annual precipitation (MAP), mean temperature of the coldest quarter (MTCQ), mean temperature of the warmest quarter (MTWQ), mean precipitation of the driest quarter (MPDQ), mean precipitation of the wettest quarter (MPWQ), actual evapotranspiration (AET) and moisture index (MI). To check the reliability of the modern pollen-climate relationships, different ordination methods are employed and subsequently tested with Huisman-Olff-Fresco (HOF) models. A series of pollen-climate transfer functions using weighted averaging partial least squares (WA-PLS) regression and calibration models are developed to reconstruct past climate changes from modern pollen data, and have been cross-validated. Results indicate that three of the environmental variables, i.e., MTCQ, MPDQ and MI, have strong potential for past climate reconstruction based on the available surface pollen dataset. The potential of the present modern pollen-climate relationship for regional quantitative paleoclimate reconstruction is further tested on a Late Quaternary fossil pollen profile from the Darjeeling foothill region with previously reconstructed and quantified climate. The good agreement with existing data allows for new insights into the hydroclimatic conditions during the Last Glacial Maximum (LGM), with (winter) temperature being the dominant controlling factor for glacial changes in the eastern Himalaya.
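To make the transfer-function idea concrete, the sketch below implements a plain weighted-averaging (WA) calibration, the simplest relative of the WA-PLS models used in the study (the PLS refinement is omitted), together with leave-one-out cross-validation. The array names and shapes are illustrative assumptions, not taken from the dataset.

```python
import numpy as np

def wa_fit(abundances, climate):
    """Weighted-averaging (WA) optima: abundance-weighted mean of the climate
    variable for each pollen taxon. abundances: (n_samples, n_taxa); climate: (n_samples,).
    Assumes every taxon occurs in at least one training sample."""
    return (abundances * climate[:, None]).sum(axis=0) / abundances.sum(axis=0)

def wa_predict(abundances, optima):
    """Reconstruct the climate variable as the abundance-weighted mean of taxon optima."""
    return (abundances * optima[None, :]).sum(axis=1) / abundances.sum(axis=1)

def loo_rmse(abundances, climate):
    """Leave-one-out cross-validated RMSE of the simple WA model."""
    preds = np.empty_like(climate, dtype=float)
    for i in range(len(climate)):
        keep = np.arange(len(climate)) != i
        optima = wa_fit(abundances[keep], climate[keep])
        preds[i] = wa_predict(abundances[i:i + 1], optima)[0]
    return np.sqrt(np.mean((preds - climate) ** 2))
```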
NASA Technical Reports Server (NTRS)
Otterman, J.; Ardizzone, J.; Atlas, R.; Demaree, G.; Huth, R.; Jaagus, J.; Koslowsky, D.; Przybylak, R.; Wos, A.; Atlas, Robert (Technical Monitor)
1999-01-01
It is well recognized that advection from the North Atlantic has a profound effect on the climatic conditions in central Europe. A new dataset of the ocean-surface winds, derived from the Special Sensor Microwave Imager, SSM/I, is now available. This satellite instrument measures the wind speed, but not the direction. However, a variational analysis developed at the Data Assimilation Office, NASA Goddard Space Flight Center, which combines the SSM/I measurements with wind vectors measured from ships, etc., produced global maps of the ocean-surface winds suitable for climate analysis. From this SSM/I dataset, a specific index I(sub na) of the North Atlantic surface winds has been developed, which pertinently quantifies the low-level advection into central Europe. For a selected time period, the index I(sub na) reports the average amplitude of the wind, counting the speed only when the direction is from the southwest (when the wind is from another direction, the contribution to the average counts as zero speed). Strong correlations were found between February I(sub na) and the surface air temperatures in Europe at 50-60 deg N. In the present study, we present the correlations between I(sub na) and temperature T(sub s), and also the sensitivity of T(sub s) to an increase in I(sub na), in various seasons and various regions. We specifically analyze the flow of maritime air from the North Atlantic that produced two extraordinarily warm periods: February 1990 and early winter 2000/2001. The very cold December 2001 was clearly due to a northerly flow. Our conclusion is that the SSM/I dataset is very useful for providing insight into the forcing of climatic fluctuations in Europe.
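As a rough illustration of how such an advection index can be computed, the sketch below averages the wind speed over a period, counting only winds blowing from the southwest; the 180-270 degree definition of the southwest quadrant and the variable names are assumptions made here for illustration, not specifications from the paper.

```python
import numpy as np

def advection_index(speed, direction_from, sw_range=(180.0, 270.0)):
    """Index of south-westerly advection: average wind speed over the period,
    counting the speed only when the wind blows FROM the south-west quadrant;
    other directions contribute zero to the numerator but still enter the average.
    speed, direction_from: 1-D arrays for one grid cell over the averaging period;
    direction_from is the meteorological 'from' direction in degrees."""
    is_sw = (direction_from >= sw_range[0]) & (direction_from <= sw_range[1])
    return np.where(is_sw, speed, 0.0).mean()
```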
NASA Astrophysics Data System (ADS)
Bhuiyan, M. A. E.; Nikolopoulos, E. I.; Anagnostou, E. N.
2017-12-01
Quantifying the uncertainty of global precipitation datasets is beneficial when using these precipitation products in hydrological applications, because precipitation uncertainty propagation through hydrologic modeling can significantly affect the accuracy of the simulated hydrologic variables. In this research the Iberian Peninsula is used as the study area, with a study period spanning eleven years (2000-2010). This study evaluates the performance of multiple hydrologic models forced with combined global rainfall estimates derived using a Quantile Regression Forests (QRF) technique. The QRF technique utilizes three satellite precipitation products (CMORPH, PERSIANN, and 3B42 (V7)); an atmospheric reanalysis precipitation and air temperature dataset; satellite-derived near-surface daily soil moisture data; and a terrain elevation dataset. A high-resolution, ground-based observations-driven precipitation dataset (named SAFRAN), available at 5 km/1 h resolution, is used as reference. Through the QRF blending framework the stochastic error model produces error-adjusted ensemble precipitation realizations, which are used to force four global hydrological models (JULES (Joint UK Land Environment Simulator), WaterGAP3 (Water-Global Assessment and Prognosis), ORCHIDEE (Organizing Carbon and Hydrology in Dynamic Ecosystems) and SURFEX (Surface Externalisée)) to simulate three hydrologic variables (surface runoff, subsurface runoff and evapotranspiration). The models are also forced with the reference precipitation to generate reference-based hydrologic simulations. This study presents a comparative analysis of multiple hydrologic model simulations for different hydrologic variables and the impact of the blending algorithm on the simulated hydrologic variables. Results show how precipitation uncertainty propagates through the different hydrologic model structures and manifests as a reduction of error in the simulated hydrologic variables.
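The sketch below is a crude stand-in for the QRF blending step: it approximates conditional rainfall quantiles from the spread of per-tree predictions of an ordinary random forest and treats those quantiles as ensemble realizations. A true QRF stores all observations in the leaves rather than per-tree means, and the predictors and synthetic data here are purely illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Illustrative predictors standing in for satellite rain estimates, reanalysis
# temperature, soil moisture and elevation (all synthetic); the target stands in
# for the "reference" (SAFRAN-like) rainfall.
X = rng.random((2000, 6))
y = 10.0 * X[:, 0] + rng.gamma(shape=2.0, scale=1.0, size=2000)

rf = RandomForestRegressor(n_estimators=200, min_samples_leaf=10,
                           random_state=0).fit(X, y)

# Approximate conditional quantiles from the spread of per-tree predictions and
# use them as error-adjusted ensemble precipitation realizations.
X_new = rng.random((5, 6))
per_tree = np.stack([tree.predict(X_new) for tree in rf.estimators_])  # (n_trees, n_points)
ensemble = np.percentile(per_tree, q=[5, 25, 50, 75, 95], axis=0)      # (5, n_points)
```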
NASA Astrophysics Data System (ADS)
Cruden, A. R.; Vollgger, S.
2016-12-01
The emerging capability of UAV photogrammetry combines a simple and cost-effective method for acquiring digital aerial images with advanced computer vision algorithms that compute spatial datasets from a sequence of overlapping digital photographs taken from various viewpoints. Depending on flight altitude and camera setup, sub-centimeter spatial resolution orthophotographs and textured dense point clouds can be achieved. Orientation data can be collected for detailed structural analysis by digitally mapping such high-resolution spatial datasets in a fraction of the time and with higher fidelity compared to traditional mapping techniques. Here we describe a photogrammetric workflow applied to a structural study of folds and fractures within alternating layers of sandstone and mudstone at a coastal outcrop in SE Australia. We surveyed this location using a downward-looking digital camera mounted on a commercially available multi-rotor UAV that autonomously followed waypoints at a set altitude and speed to ensure sufficient image overlap, minimum motion blur and an appropriate resolution. The use of surveyed ground control points allowed us to produce a geo-referenced 3D point cloud and an orthophotograph from hundreds of digital images at a spatial resolution of < 10 mm per pixel and cm-scale location accuracy. Orientation data of brittle and ductile structures were semi-automatically extracted from these high-resolution datasets using open-source software. This resulted in an extensive and statistically relevant orientation dataset that was used to 1) interpret the progressive development of folds and faults in the region, and 2) generate a 3D structural model that underlines the complex internal structure of the outcrop and quantifies spatial variations in fold geometries. Overall, our work highlights how UAV photogrammetry can contribute new insights into structural analysis.
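One common way to semi-automatically extract planar orientations from a patch of a dense point cloud is a least-squares plane fit, sketched below; the east/north/up coordinate convention and the function name are assumptions for illustration, not necessarily what the open-source tool used in the study does.

```python
import numpy as np

def plane_orientation(points):
    """Fit a plane to an (N, 3) array of XYZ points digitized from the point cloud
    and return (dip_direction, dip) in degrees. Assumes x = east, y = north, z = up."""
    centered = points - points.mean(axis=0)
    # The plane normal is the direction of least variance (smallest singular value).
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    n = vt[-1]
    if n[2] < 0:                       # force the normal to point upward
        n = -n
    dip = np.degrees(np.arccos(np.clip(n[2], -1.0, 1.0)))         # angle from horizontal
    dip_direction = np.degrees(np.arctan2(n[0], n[1])) % 360.0    # azimuth, clockwise from north
    return dip_direction, dip
```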
Seeland, Marco; Rzanny, Michael; Alaqraa, Nedal; Wäldchen, Jana; Mäder, Patrick
2017-01-01
Steady improvements of image description methods induced a growing interest in image-based plant species classification, a task vital to the study of biodiversity and ecological sensitivity. Various techniques have been proposed for general object classification over the past years and several of them have already been studied for plant species classification. However, results of these studies are selective in the evaluated steps of a classification pipeline, in the datasets utilized for evaluation, and in the compared baseline methods. No study is available that evaluates the main competing methods for building an image representation on the same datasets, allowing for generalized findings regarding flower-based plant species classification. The aim of this paper is to comparatively evaluate methods, method combinations, and their parameters with respect to classification accuracy. The investigated methods span detection, extraction, fusion, pooling, and encoding of local features for quantifying the shape and color information of flower images. We selected the flower image datasets Oxford Flower 17 and Oxford Flower 102 as well as our own Jena Flower 30 dataset for our experiments. Findings show large differences among the studied techniques and that a carefully chosen combination of them allows for high accuracies in species classification. We further found that true local feature detectors in combination with advanced encoding methods yield higher classification results at lower computational costs compared to commonly used dense sampling and spatial pooling methods. Color was found to be an indispensable feature for high classification results, especially when spatial correspondence to gray-level features is preserved. As a result, our study provides a comprehensive overview of competing techniques and the implications of their main parameters for flower-based plant species classification. PMID:28234999
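As a minimal illustration of one end of the design space the paper compares, the sketch below builds a bag-of-visual-words classifier from a true local feature detector (ORB), a hard-assignment k-means vocabulary, and a linear SVM; the function names and parameters are illustrative assumptions, and the more advanced encodings (e.g., VLAD or Fisher vectors) and color features evaluated in the paper are not reproduced here.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

def local_descriptors(image_path, detector):
    """Detect keypoints and compute descriptors on a gray-level image (path assumed valid)."""
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    _, desc = detector.detectAndCompute(img, None)
    return np.zeros((0, 32), np.float32) if desc is None else desc.astype(np.float32)

def train_bovw_classifier(image_paths, labels, vocabulary_size=256):
    """Bag-of-visual-words pipeline: ORB keypoints -> k-means vocabulary ->
    per-image word histograms -> linear SVM."""
    orb = cv2.ORB_create(nfeatures=500)              # true local feature detector (no dense sampling)
    per_image = [local_descriptors(p, orb) for p in image_paths]
    codebook = KMeans(n_clusters=vocabulary_size, random_state=0).fit(np.vstack(per_image))

    def encode(desc):
        # Hard-assignment histogram of visual words, L1-normalised.
        words = codebook.predict(desc) if len(desc) else np.array([], dtype=int)
        hist, _ = np.histogram(words, bins=np.arange(vocabulary_size + 1))
        return hist / max(hist.sum(), 1)

    features = np.array([encode(d) for d in per_image])
    return codebook, encode, LinearSVC().fit(features, labels)
```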
NASA Astrophysics Data System (ADS)
Sledd, A.; L'Ecuyer, T. S.
2017-12-01
With Arctic sea ice declining rapidly and Arctic temperatures rising faster than the rest of the globe, a better understanding of the Arctic climate, and of ice cover-radiation feedbacks in particular, is needed. Here we present the Arctic Observation and Reanalysis Integrated System (ArORIS), a dataset of integrated products to facilitate studying the Arctic using satellite, reanalysis, and in-situ datasets. The data include cloud properties, radiative fluxes, aerosols, meteorology, precipitation, and surface properties, to name just a few. Each dataset has uniform grid spacing, time averaging and naming conventions for ease of use between products. One intended use of ArORIS is to assess Arctic radiation and moisture budgets. Following that goal, we use observations from ArORIS (CERES-EBAF radiative fluxes and NSIDC sea ice fraction and area) to quantify relationships between the Arctic energy balance and surface properties. We find a discernible difference between energy budgets for years with high and low September sea ice area. Surface fluxes are especially responsive to the September sea ice minimum, both in the months leading up to September and in the months following. In particular, longwave fluxes at the surface show increased sensitivity in the months preceding September. Using a single-layer model of solar radiation we also investigate the individual responses of surface and planetary albedos to changes in sea ice area. By partitioning the planetary albedo into surface and atmospheric contributions, we find that the atmospheric contribution to planetary albedo is less sensitive to changes in sea ice area than the surface contribution. Further comparisons between observations and reanalyses can be made using the available datasets in ArORIS.
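For context, one standard single-layer shortwave formulation (assumed here; not necessarily the exact model used in the study) splits planetary albedo into an atmospheric reflection term and a surface term attenuated by two passes through the atmosphere, with multiple surface-atmosphere reflections summed as a geometric series:

```python
def partition_planetary_albedo(alpha_s, R_atm, T_atm):
    """Single-layer shortwave model: the atmosphere reflects a fraction R_atm and
    transmits T_atm; the surface reflects alpha_s; multiple reflections between
    surface and atmosphere form a geometric series.
    Returns (atmospheric contribution, surface contribution, planetary albedo)."""
    surface = (T_atm ** 2) * alpha_s / (1.0 - alpha_s * R_atm)
    return R_atm, surface, R_atm + surface

# Illustrative values only: lowering the surface albedo (less sea ice) changes the
# surface contribution strongly while the atmospheric contribution stays fixed.
print(partition_planetary_albedo(alpha_s=0.60, R_atm=0.30, T_atm=0.60))
print(partition_planetary_albedo(alpha_s=0.40, R_atm=0.30, T_atm=0.60))
```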
Professional Growth & Support Spending Calculator
ERIC Educational Resources Information Center
Education Resource Strategies, 2013
2013-01-01
This "Professional Growth & Support Spending Calculator" helps school systems quantify all current spending aimed at improving teaching effectiveness. Part I provides worksheets to analyze total investment. Part II provides a system for evaluating investments based on purpose, target group, and delivery. In this Spending Calculator…