Science.gov

Sample records for additional independent datasets

  1. Confronting reanalysis datasets with independent radiosonde observations

    NASA Astrophysics Data System (ADS)

    Marlton, Graeme; Harrison, Giles; Williams, Paul; Nicoll, Keri

    2014-05-01

    Reanalysis datasets are used for a broad range of research purposes in the atmospheric sciences, from studying climate change to examining extreme rainfall events. However, it is often difficult to verify the quality of such datasets, because verification requires the reanalysis data to be confronted with high-quality independent (i.e. unassimilated) observations, which are rare. We have been launching calibrated Vaisala RS92 radiosondes from the University of Reading, UK, for the past three years. None of the data from these launches has been assimilated into any forecasts or reanalysis products. The sondes have random ascent trajectories over southern England, making them ideal for cross-checking the accuracy of the reanalysis data in this region. In this study, our radiosonde observations of temperature, relative humidity, specific humidity, zonal wind, meridional wind, and pressure surface height are compared with the corresponding data from two widely used reanalysis datasets: ERA-Interim and NCEP. The comparison was done at grid points and pressure levels given by the radiosonde's telemetry data. The temperatures, horizontal winds, and pressure surface heights from the reanalysis datasets show excellent agreement with the radiosonde observations. Values of the specific humidity show reasonable agreement, but values of the relative humidity show poor agreement. The stated error tolerance of the radiosonde's humidity sensors is too small to account for the poor correlation in these results. We conclude that, although reanalysis estimates of temperature and wind may be taken to be representative of the values locally within the reanalysis grid box, values of relative humidity may not. Therefore, in studies of local hydrology, reanalysis humidity data must be used with caution.
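
    A minimal sketch of this kind of confrontation, with hypothetical profile arrays standing in for the authors' data: interpolate the reanalysis temperature profile to the sonde's pressure levels in log-pressure, then compute simple agreement statistics.

    ```python
    import numpy as np

    def compare_profiles(p_sonde, t_sonde, p_rean, t_rean):
        """Interpolate a reanalysis temperature profile to sonde levels and score it.

        p_* are pressures in hPa, t_* temperatures in K.
        """
        # Interpolation in log-pressure is approximately linear in height.
        order = np.argsort(np.log(p_rean))            # np.interp needs ascending x
        t_interp = np.interp(np.log(p_sonde), np.log(p_rean)[order], t_rean[order])
        bias = float(np.mean(t_interp - t_sonde))
        rmse = float(np.sqrt(np.mean((t_interp - t_sonde) ** 2)))
        r = float(np.corrcoef(t_interp, t_sonde)[0, 1])
        return bias, rmse, r

    # Hypothetical example profiles (hPa, K):
    p_s = np.array([1000.0, 850.0, 700.0, 500.0, 300.0])
    t_s = np.array([288.0, 280.0, 271.0, 252.0, 228.0])
    p_r = np.array([1000.0, 925.0, 850.0, 700.0, 500.0, 300.0])
    t_r = np.array([288.5, 284.0, 280.4, 271.2, 251.5, 227.6])
    print(compare_profiles(p_s, t_s, p_r, t_r))
    ```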

  2. Depression and Pain: Independent and Additive Relationships to Anger Expression

    DTIC Science & Technology

    2013-10-01

    Naval Health Research Center. Marcus K. Taylor, Gerald E... The relationships of depression and pain to anger expression are understudied. OBJECTIVE: This study was designed to examine the independent and additive relationships of depression and pain to anger expression.

  3. IS-Dom: a dataset of independent structural domains automatically delineated from protein structures

    NASA Astrophysics Data System (ADS)

    Ebina, Teppei; Umezawa, Yuki; Kuroda, Yutaka

    2013-05-01

    Protein domains that can fold in isolation are significant targets in diverse areas of proteomics research as they are often readily analyzed by high-throughput methods. Here, we report IS-Dom, a dataset of Independent Structural Domains (ISDs) that are most likely to fold in isolation. IS-Dom was constructed by filtering domains from SCOP, CATH, and DomainParser using quantitative structural measures, which were calculated by estimating inter-domain hydrophobic clusters and hydrogen bonds from the full-length protein's atomic coordinates. The ISD detection protocol is fully automated, and all of the computed interactions are stored on the server, which enables rapid updates of IS-Dom. We also prepared a standard IS-Dom using parameters optimized by maximizing Youden's index. The standard IS-Dom contained 54,860 ISDs, of which 25.5% had high sequence identity and termini overlap with a Protein Data Bank (PDB) cataloged sequence and are thus experimentally shown to fold in isolation [coined autonomously folded domains (AFDs)]. Furthermore, our ISD detection protocol missed less than 10% of the AFDs, which corroborated our protocol's ability to define structural domains that are able to fold independently. IS-Dom is available through the web server (http://domserv.lab.tuat.ac.jp/IS-Dom.html), and users can either download the standard IS-Dom dataset, construct their own IS-Dom by interactively varying the parameters, or assess the structural independence of newly defined putative domains.
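
    The parameter tuning named above maximizes Youden's index J = sensitivity + specificity − 1. A small sketch of that selection step, on made-up scores and labels rather than the IS-Dom data:

    ```python
    import numpy as np

    def best_threshold(scores, labels, thresholds):
        """Return the threshold maximizing J = sensitivity + specificity - 1."""
        best_j, best_t = -1.0, None
        for t in thresholds:
            pred = scores >= t
            tp = np.sum(pred & (labels == 1))
            tn = np.sum(~pred & (labels == 0))
            fp = np.sum(pred & (labels == 0))
            fn = np.sum(~pred & (labels == 1))
            sens = tp / (tp + fn) if tp + fn else 0.0
            spec = tn / (tn + fp) if tn + fp else 0.0
            j = sens + spec - 1.0
            if j > best_j:
                best_j, best_t = j, t
        return best_t, best_j

    rng = np.random.default_rng(0)
    labels = rng.integers(0, 2, 200)
    scores = labels * 0.8 + rng.normal(0, 0.5, 200)   # noisy but informative
    print(best_threshold(scores, labels, np.linspace(-1, 2, 61)))
    ```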

  4. Multi-temporal harmonization of independent land-use/land-cover datasets for the conterminous United States

    NASA Astrophysics Data System (ADS)

    Soulard, C. E.; Acevedo, W.

    2013-12-01

    A wide range of national-scale land-use/land-cover (LULC) classification efforts exist, yet key differences between these data arise because of independent programmatic objectives and methodologies. As part of the USGS Climate and Land Use Change Research and Development Program, researchers on the Land Change Research Project are working to assess correspondence, characterize the uncertainties, and resolve discrepancies between national LULC datasets. A collection of fifteen moderate resolution land classification datasets were identified and evaluated both qualitatively and quantitatively prior to harmonization using a pixel-based data fusion process. During harmonization, we reconciled nomenclature differences through limited aggregation of classes to facilitate comparison, followed by implementation of a process for checking classification uncertainty against reference imagery and validation datasets that correspond to the time frame of each dataset. Areas with LULC uncertainty between datasets were edited to reflect the classification with the most supporting evidence. Our harmonization process identified pixels that remained unchanged across core dates in input datasets, then reconciled LULC changes between input data across three intervals (1992-2001, 2001-2006, and 2006-2011). By relying on convergence of evidence across numerous independent datasets, Land Change Research seeks to better understand the uncertainties between LULC data and leverage the best elements of readily-available data to improve LULC change monitoring across the conterminous United States.
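
    A toy illustration of pixel-wise "convergence of evidence" across several LULC rasters, in the spirit of (but not reproducing) the project's fusion process: keep the class on which enough datasets agree, and flag the rest for manual reconciliation.

    ```python
    import numpy as np

    def harmonize(stack, min_agreement=2):
        """stack: (n_datasets, rows, cols) array of integer class codes."""
        n, rows, cols = stack.shape
        out = np.full((rows, cols), -1, dtype=np.int32)   # -1 = unresolved
        for r in range(rows):
            for c in range(cols):
                codes, counts = np.unique(stack[:, r, c], return_counts=True)
                if counts.max() >= min_agreement:
                    out[r, c] = codes[counts.argmax()]
        return out

    stack = np.array([[[1, 2], [3, 3]],
                      [[1, 2], [4, 3]],
                      [[1, 5], [4, 3]]])
    print(harmonize(stack))   # majority class per pixel, -1 where ambiguous
    ```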

  5. Prognostic breast cancer signature identified from 3D culture model accurately predicts clinical outcome across independent datasets

    SciTech Connect

    Martin, Katherine J.; Patrick, Denis R.; Bissell, Mina J.; Fournier, Marcia V.

    2008-10-20

    One of the major tenets in breast cancer research is that early detection is vital for patient survival by increasing treatment options. To that end, we have previously used a novel unsupervised approach to identify a set of genes whose expression predicts prognosis of breast cancer patients. The predictive genes were selected in a well-defined three dimensional (3D) cell culture model of non-malignant human mammary epithelial cell morphogenesis as down-regulated during breast epithelial cell acinar formation and cell cycle arrest. Here we examine the ability of this gene signature (3D-signature) to predict prognosis in three independent breast cancer microarray datasets having 295, 286, and 118 samples, respectively. Our results show that the 3D-signature accurately predicts prognosis in three unrelated patient datasets. At 10 years, the probability of positive outcome was 52, 51, and 47 percent in the group with a poor-prognosis signature and 91, 75, and 71 percent in the group with a good-prognosis signature for the three datasets, respectively (Kaplan-Meier survival analysis, p<0.05). Hazard ratios for poor outcome were 5.5 (95% CI 3.0 to 12.2, p<0.0001), 2.4 (95% CI 1.6 to 3.6, p<0.0001) and 1.9 (95% CI 1.1 to 3.2, p = 0.016) and remained significant for the two larger datasets when corrected for estrogen receptor (ER) status. Hence the 3D-signature accurately predicts breast cancer outcome in both ER-positive and ER-negative tumors, though individual genes differed in their prognostic ability in the two subtypes. Genes that were prognostic in ER+ patients are AURKA, CEP55, RRM2, EPHA2, FGFBP1, and VRK1, while genes prognostic in ER− patients include ACTB, FOXM1 and SERPINE2 (Kaplan-Meier p<0.05). Multivariable Cox regression analysis in the largest dataset showed that the 3D-signature was a strong independent factor in predicting breast cancer outcome. The 3D-signature accurately predicts breast cancer outcome across multiple datasets and holds prognostic
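
    The analyses named here (Kaplan-Meier curves, Cox regression with an ER covariate) can be sketched with the lifelines package; the tiny data frame below is entirely hypothetical, not the study's cohort.

    ```python
    import pandas as pd
    from lifelines import KaplanMeierFitter, CoxPHFitter

    # Hypothetical cohort: follow-up years, event flag, signature group, ER status
    df = pd.DataFrame({
        "years":          [2.1, 3.5, 10.0, 4.5, 8.3, 10.0, 9.0, 6.0],
        "event":          [1,   1,   0,    1,   1,   0,    0,   1],
        "poor_signature": [1,   1,   1,    1,   0,   0,    0,   0],
        "er_positive":    [1,   0,   1,    0,   1,   0,    1,   0],
    })

    # Kaplan-Meier curve for the poor-prognosis signature group
    kmf = KaplanMeierFitter()
    grp = df[df.poor_signature == 1]
    kmf.fit(grp["years"], event_observed=grp["event"], label="poor 3D-signature")
    print(kmf.survival_function_)

    # Cox model: hazard ratio for the signature, adjusted for ER status
    cph = CoxPHFitter()
    cph.fit(df, duration_col="years", event_col="event")
    cph.print_summary()
    ```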

  6. Cross-platform comparison of independent datasets identifies an immune signature associated with improved survival in metastatic melanoma

    PubMed Central

    Lardone, Ricardo D.; Plaisier, Seema B.; Navarrete, Marian S.; Shamonki, Jaime M.; Jalas, John R.

    2016-01-01

    Platform and study differences in prognostic signatures from metastatic melanoma (MM) gene expression reports often hinder arrival at a consensus. We performed survival/outcome-based pairwise comparisons of three independent MM gene expression profiles using the threshold-free algorithm rank-rank hypergeometric overlap analysis (RRHO). We found statistically significant overlap for genes overexpressed in favorable outcome (FO) groups, but no overlap for poor outcome (PO) groups. This “favorable outcome signature” (FOS) of 228 genes coinciding on all three overlapping gene lists showed that immune function predominated in FO MM. Surprisingly, specific cell signature-enrichment analysis showed B cell-associated genes enriched in FO MM, along with T cell-associated genes. Higher levels of B and T cells (p<0.05) and their relative proximity (p<0.05) were detected in FO-to-PO tumor comparisons from an independent cohort of MM patients. Finally, expression of FOS in two independent Stage III MM tumor datasets correctly predicted clinical outcome in 12/14 and 44/70 patients using a weighted gene voting classifier (area under the curve values 0.96 and 0.75, respectively). This RRHO-based, cross-study analysis emphasizes the power of the RRHO approach, confirms the relevance of T cells for prolonged MM survival, supports a favorable role for B cells in anti-melanoma immunity, and suggests the potential of B cells as a means of intervention in melanoma treatment. PMID:26883106
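
    One common reading of a "weighted gene voting classifier" is the Golub-style signal-to-noise scheme sketched below; the gene counts and expression values are illustrative, not the study's data.

    ```python
    import numpy as np

    def train_voting(X_fo, X_po):
        """Per-gene signal-to-noise weights and decision boundaries.
        X_*: (samples, genes) expression for favorable/poor outcome groups."""
        m1, m0 = X_fo.mean(0), X_po.mean(0)
        s1, s0 = X_fo.std(0) + 1e-9, X_po.std(0) + 1e-9
        weights = (m1 - m0) / (s1 + s0)      # signal-to-noise per gene
        boundary = (m1 + m0) / 2.0
        return weights, boundary

    def predict_voting(x, weights, boundary):
        votes = weights * (x - boundary)     # positive votes favor the FO class
        return "favorable" if votes.sum() > 0 else "poor"

    rng = np.random.default_rng(1)
    X_fo = rng.normal(1.0, 1.0, (20, 228))   # 228-gene FOS, FO patients
    X_po = rng.normal(0.0, 1.0, (20, 228))
    w, b = train_voting(X_fo, X_po)
    print(predict_voting(rng.normal(1.0, 1.0, 228), w, b))
    ```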

  7. Evaluation of results from genome-wide studies of language and reading in a novel independent dataset.

    PubMed

    Carrion-Castillo, A; van Bergen, E; Vino, A; van Zuijen, T; de Jong, P F; Francks, C; Fisher, S E

    2016-07-01

    Recent genome-wide association scans (GWAS) for reading and language abilities have pinpointed promising new candidate loci. However, the potential contributions of these loci remain to be validated. In this study, we tested 17 of the most significantly associated single nucleotide polymorphisms (SNPs) from these GWAS (P < 10⁻⁶ in the original studies) in a new independent population dataset from the Netherlands: known as Familial Influences on Literacy Abilities. This dataset comprised 483 children from 307 nuclear families and 505 adults (including parents of participating children), and provided adequate statistical power to detect the effects that were previously reported. The following measures of reading and language performance were collected: word reading fluency, nonword reading fluency, phonological awareness and rapid automatized naming. Two SNPs (rs12636438 and rs7187223) were associated with performance in multivariate and univariate testing, but these did not remain significant after correction for multiple testing. Another SNP (rs482700) was only nominally associated in the multivariate test. For the rest of the SNPs, we did not find supportive evidence of association. The findings may reflect differences between our study and the previous investigations with respect to the language of testing, the exact tests used and the recruitment criteria. Alternatively, most of the prior reported associations may have been false positives. A larger scale GWAS meta-analysis than those previously performed will likely be required to obtain robust insights into the genomic architecture underlying reading and language.
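
    The multiple-testing step described above can be sketched with statsmodels; the p-values below are made up for illustration, not the study's results.

    ```python
    import numpy as np
    from statsmodels.stats.multitest import multipletests

    pvals = np.array([0.004, 0.012, 0.03, 0.2, 0.5] + [0.6] * 12)  # 17 SNPs
    reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="bonferroni")
    for p, pa, r in zip(pvals[:5], p_adj[:5], reject[:5]):
        print(f"p={p:.3f}  bonferroni={pa:.3f}  significant={r}")
    ```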

  8. Accuracy and Precision in the Southern Hemisphere Additional Ozonesondes (SHADOZ) Dataset in Light of the JOSIE-2000 Results

    NASA Technical Reports Server (NTRS)

    Witte, Jacquelyn C.; Thompson, Anne M.; Schmidlin, F. J.; Oltmans, S. J.; Smit, H. G. J.

    2004-01-01

    Since 1998 the Southern Hemisphere ADditional OZonesondes (SHADOZ) project has provided over 2000 ozone profiles over eleven southern hemisphere tropical and subtropical stations. Balloon-borne electrochemical concentration cell (ECC) ozonesondes are used to measure ozone. The data are archived at http://croc.gsfc.nasa.gov/shadoz. In an analysis of ozonesonde imprecision within the SHADOZ dataset, Thompson et al. [JGR, 108, 8238, 2003] pointed out that variations in ozonesonde technique (sensor solution strength, instrument manufacturer, data processing) could lead to station-to-station biases within the SHADOZ dataset. Imprecisions and accuracy in the SHADOZ dataset are examined in light of new data. First, SHADOZ total ozone column amounts are compared to version 8 TOMS (2004 release). As for TOMS version 7, satellite total ozone is usually higher than the integrated column amount from the sounding. Discrepancies between the sonde and satellite datasets decline two percentage points on average, compared to version 7 TOMS offsets. Second, the SHADOZ station data are compared to results of chamber simulations (JOSIE-2000, Juelich Ozonesonde Intercomparison Experiment) in which the various SHADOZ techniques were evaluated. The range of JOSIE column deviations from a standard instrument (-10%) in the chamber resembles that of the SHADOZ station data. It appears that some systematic variations in the SHADOZ ozone record are accounted for by differences in solution strength, data processing and instrument type (manufacturer).

  9. Complementary Aerodynamic Performance Datasets for Variable Speed Power Turbine Blade Section from Two Independent Transonic Turbine Cascades

    NASA Technical Reports Server (NTRS)

    Flegel, Ashlie B.; Welch, Gerard E.; Giel, Paul W.; Ames, Forrest E.; Long, Jonathon A.

    2015-01-01

    Two independent experimental studies were conducted in linear cascades on a scaled, two-dimensional mid-span section of a representative Variable Speed Power Turbine (VSPT) blade. The purpose of these studies was to assess the aerodynamic performance of the VSPT blade over large Reynolds number and incidence angle ranges. The influence of inlet turbulence intensity was also investigated. The tests were carried out in the NASA Glenn Research Center Transonic Turbine Blade Cascade Facility and at the University of North Dakota (UND) High Speed Compressible Flow Wind Tunnel Facility. A large database was developed by acquiring total pressure and exit angle surveys and blade loading data for ten incidence angles ranging from +15.8° to −51.0°. Data were acquired over six flow conditions with exit isentropic Reynolds number ranging from 0.05×10⁶ to 2.12×10⁶ and at exit Mach numbers of 0.72 (design) and 0.35. Flow conditions were examined within the respective facility constraints. The survey data were integrated to determine average exit total-pressure and flow angle. UND also acquired blade surface heat transfer data at two flow conditions across the entire incidence angle range aimed at quantifying transitional flow behavior on the blade. Comparisons of the aerodynamic datasets were made for three "match point" conditions. The blade loading data at the match point conditions show good agreement between the facilities. This report shows comparisons of other data and highlights the unique contributions of the two facilities. The datasets are being used to advance understanding of the aerodynamic challenges associated with maintaining efficient power turbine operation over a wide shaft-speed range.

  10. Objective identification of mid-latitude storms in satellite imagery: determination of an independent storm validation dataset.

    NASA Astrophysics Data System (ADS)

    Delsol, C.; Hodges, K.

    2003-04-01

    Current methods of validating GCMs involve comparing model results with Re-analysis datasets in which observations have been combined with a model. The quality of this approach depends on the observational data distribution in space and time and on the model formulation. We propose to use an automatic and objective technique that can efficiently provide a dataset of “real” observations against which the models and re-analyses can be validated, based on the identification and tracking of weather systems in satellite imagery. We present results of a boundary finding method based on Fourier Shape Descriptors for the identification of extra-tropical cyclones in the mid-latitudes using NOAA’s AVHRR IR imagery. The boundary-finding method, initially derived for medical image processing, is designed to incorporate model-based information into a boundary finding process for continuously deformable objects. This allows us to work with objects that are diverse and irregular in their shape, such as developing weather systems. The method is suited to working in an environment that may contain spurious and broken boundaries. The main characteristic features of an extra-tropical system such as the vortex and associated frontal systems are identified. This work provides a basis for statistical analyses of extra-tropical cyclones for climatological studies and for the validation of GCMs, making use of the vast amount of satellite archive data available. It is also useful for individual case studies for weather forecast verification.
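
    A minimal sketch of Fourier shape descriptors for a closed boundary, the basic ingredient of the method described above (this is the textbook construction, not the authors' pipeline): treat the boundary points as complex numbers, take the FFT, drop the DC term for translation invariance, and normalize by the first harmonic for scale invariance.

    ```python
    import numpy as np

    def fourier_descriptors(x, y, n_keep=16):
        z = x + 1j * y                        # boundary as a complex signal
        Z = np.fft.fft(z)
        Z = Z[1:]                             # drop DC term (translation)
        Z = Z / np.abs(Z[0])                  # normalize (scale invariance)
        return np.abs(Z[:n_keep])             # magnitudes (rotation/start point)

    theta = np.linspace(0, 2 * np.pi, 128, endpoint=False)
    x = 10 * np.cos(theta) + 2 * np.cos(3 * theta)   # lobed, vortex-like blob
    y = 10 * np.sin(theta) + 2 * np.sin(3 * theta)
    print(fourier_descriptors(x, y, n_keep=5))
    ```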

  11. 10 CFR 431.174 - Additional requirements applicable to Voluntary Independent Certification Program participants.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... 10 Energy 3 2011-01-01 2011-01-01 false Additional requirements applicable to Voluntary Independent Certification Program participants. 431.174 Section 431.174 Energy DEPARTMENT OF ENERGY ENERGY CONSERVATION ENERGY EFFICIENCY PROGRAM FOR CERTAIN COMMERCIAL AND INDUSTRIAL EQUIPMENT Provisions...

  12. 10 CFR 431.175 - Additional requirements applicable to non-Voluntary Independent Certification Program participants.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... represented value of energy usage is no less than the greater of the mean of the sample, or the upper 95... 10 Energy 3 2011-01-01 2011-01-01 false Additional requirements applicable to non-Voluntary Independent Certification Program participants. 431.175 Section 431.175 Energy DEPARTMENT OF ENERGY...

  13. Concentration Addition, Independent Action and Generalized Concentration Addition Models for Mixture Effect Prediction of Sex Hormone Synthesis In Vitro

    PubMed Central

    Hadrup, Niels; Taxvig, Camilla; Pedersen, Mikael; Nellemann, Christine; Hass, Ulla; Vinggaard, Anne Marie

    2013-01-01

    Humans are concomitantly exposed to numerous chemicals. An infinite number of combinations and doses thereof can be imagined. For toxicological risk assessment the mathematical prediction of mixture effects, using knowledge on single chemicals, is therefore desirable. We investigated pros and cons of the concentration addition (CA), independent action (IA) and generalized concentration addition (GCA) models. First we measured effects of single chemicals and mixtures thereof on steroid synthesis in H295R cells. Then single chemical data were applied to the models; predictions of mixture effects were calculated and compared to the experimental mixture data. Mixture 1 contained environmental chemicals adjusted in ratio according to human exposure levels. Mixture 2 was a potency adjusted mixture containing five pesticides. Prediction of testosterone effects coincided with the experimental Mixture 1 data. In contrast, antagonism was observed for effects of Mixture 2 on this hormone. The mixtures contained chemicals exerting only limited maximal effects. This hampered prediction by the CA and IA models, whereas the GCA model could be used to predict a full dose response curve. Regarding effects on progesterone and estradiol, some chemicals were having stimulatory effects whereas others had inhibitory effects. The three models were not applicable in this situation and no predictions could be performed. Finally, the expected contributions of single chemicals to the mixture effects were calculated. Prochloraz was the predominant but not sole driver of the mixtures, suggesting that one chemical alone was not responsible for the mixture effects. In conclusion, the GCA model seemed to be superior to the CA and IA models for the prediction of testosterone effects. A situation with chemicals exerting opposing effects, for which the models could not be applied, was identified. In addition, the data indicate that in non-potency adjusted mixtures the effects cannot always be
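
    For readers new to the two classical models compared above, here is a minimal sketch assuming Hill-type single-chemical curves; all parameter values are invented for illustration and are not the paper's fitted values.

    ```python
    import numpy as np
    from scipy.optimize import brentq

    ec50 = np.array([1.0, 5.0, 20.0])     # single-chemical EC50s (arbitrary units)
    hill = np.array([1.2, 0.8, 1.0])      # Hill slopes
    p = np.array([0.2, 0.3, 0.5])         # mixture fractions (sum to 1)

    def effect(c, ec50, n):               # single-chemical Hill curve
        return c**n / (ec50**n + c**n)

    def inv_effect(E, ec50, n):           # concentration giving effect E alone
        return ec50 * (E / (1.0 - E)) ** (1.0 / n)

    def ca_effect(c_total):
        """Concentration addition: solve sum_i p_i*c / EC_{E,i} = 1 for E."""
        f = lambda E: np.sum(p * c_total / inv_effect(E, ec50, hill)) - 1.0
        return brentq(f, 1e-9, 1 - 1e-9)

    def ia_effect(c_total):
        """Independent action: E = 1 - prod_i (1 - E_i(p_i * c))."""
        return 1.0 - np.prod(1.0 - effect(p * c_total, ec50, hill))

    for c in (1.0, 5.0, 20.0):
        print(f"c={c:5.1f}  CA={ca_effect(c):.3f}  IA={ia_effect(c):.3f}")
    ```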

  14. Independent and additive effects of glutamic acid and methionine on yeast longevity.

    PubMed

    Wu, Ziyun; Song, Lixia; Liu, Shao Quan; Huang, Dejian

    2013-01-01

    It is established that glucose restriction extends yeast chronological and replicative lifespan, but little is known about the influence of amino acids on yeast lifespan, although some amino acids were reported to delay aging in rodents. Here we show that amino acid composition greatly alters yeast chronological lifespan. We found that the non-essential (to yeast) amino acids methionine and glutamic acid had the most significant impact on yeast chronological lifespan extension: restriction of methionine and/or an increase of glutamic acid led to longevity that was not the result of low acetic acid production and acidification in aging media. Remarkably, low methionine, high glutamic acid and glucose restriction additively and independently extended yeast lifespan, which could not be further extended by buffering the medium (pH 6.0). Our preliminary findings using yeast gene-deletion strains demonstrate that glutamic acid addition, methionine restriction and glucose restriction promote yeast longevity through distinct mechanisms. This study may help to fill a gap in the yeast model for the fast-developing view that nutrient balance is a critical factor in extending lifespan.

  15. Independent and Additive Effects of Glutamic Acid and Methionine on Yeast Longevity

    PubMed Central

    Wu, Ziyun; Song, Lixia; Liu, Shao Quan; Huang, Dejian

    2013-01-01

    It is established that glucose restriction extends yeast chronological and replicative lifespan, but little is known about the influence of amino acids on yeast lifespan, although some amino acids were reported to delay aging in rodents. Here we show that amino acid composition greatly alters yeast chronological lifespan. We found that the non-essential (to yeast) amino acids methionine and glutamic acid had the most significant impact on yeast chronological lifespan extension: restriction of methionine and/or an increase of glutamic acid led to longevity that was not the result of low acetic acid production and acidification in aging media. Remarkably, low methionine, high glutamic acid and glucose restriction additively and independently extended yeast lifespan, which could not be further extended by buffering the medium (pH 6.0). Our preliminary findings using yeast gene-deletion strains demonstrate that glutamic acid addition, methionine restriction and glucose restriction promote yeast longevity through distinct mechanisms. This study may help to fill a gap in the yeast model for the fast-developing view that nutrient balance is a critical factor in extending lifespan. PMID:24244480

  16. Evaluating the robustness of models developed from field spectral data in predicting African grass foliar nitrogen concentration using WorldView-2 image as an independent test dataset

    NASA Astrophysics Data System (ADS)

    Mutanga, Onisimo; Adam, Elhadi; Adjorlolo, Clement; Abdel-Rahman, Elfatih M.

    2015-02-01

    In this paper, we evaluate the extent to which the resampled field spectra compare with the actual image spectra of the new generation multispectral WorldView-2 (WV-2) satellite. This was achieved by developing models from resampled field spectra data and testing them on an actual WV-2 image of the study area. We evaluated the performance of reflectance ratios (RI), normalized difference indices (NDI) and random forest (RF) regression model in predicting foliar nitrogen concentration in a grassland environment. The field measured spectra were used to calibrate the RF model using a randomly selected training (n = 70%) nitrogen data set. The model developed from the field spectra resampled to WV-2 wavebands was validated on an independent field spectral test dataset as well as on the actual WV-2 image of the same area (n = 30%, bootstrapped a 100 times). The results show that the model developed using RI could predict nitrogen with a mean R2 of 0.74 and 0.65 on an independent field spectral test data set and on the actual WV-2 image, respectively. The root mean square error of prediction (RMSE %) was 0.17 and 0.22 for the field test data set and the WV-2 image, respectively. Results provide an insight on the magnitude of errors that are expected when up-scaling field spectral models to airborne or satellite image data. The prediction also indicates the unceasing relevance of field spectroscopy studies to better understand the spectral models critical for vegetation quality assessment.
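
    The random forest workflow described above (70/30 split, bootstrapped validation) can be sketched with scikit-learn; synthetic band/nitrogen values stand in for the WV-2 reflectances.

    ```python
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.metrics import r2_score, mean_squared_error
    from sklearn.model_selection import train_test_split
    from sklearn.utils import resample

    rng = np.random.default_rng(42)
    X = rng.uniform(0, 1, (200, 8))                              # 8 WV-2-like bands
    y = 2.0 * X[:, 6] - 1.0 * X[:, 4] + rng.normal(0, 0.1, 200)  # "nitrogen"

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    rf = RandomForestRegressor(n_estimators=500, random_state=0).fit(X_tr, y_tr)

    r2s = []
    for _ in range(100):                     # 100 bootstraps of the test set
        Xb, yb = resample(X_te, y_te)
        r2s.append(r2_score(yb, rf.predict(Xb)))
    rmse = mean_squared_error(y_te, rf.predict(X_te)) ** 0.5
    print(f"mean R2 = {np.mean(r2s):.2f}, RMSE = {rmse:.3f}")
    ```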

  17. The Use of Additional GPS Frequencies to Independently Determine Tropospheric Water Vapor Profiles

    NASA Technical Reports Server (NTRS)

    Herman, B.M.; Feng, D.; Flittner, D. E.; Kursinski, E. R.

    2000-01-01

    It is well known that the currently employed L1 and L2 GPS/MET frequencies (1.2–1.6 GHz) do not allow for the separation of water vapor and density (or temperature) from active microwave occultation measurements in regions of the troposphere warmer than 240 K. Therefore, additional information must be used, from other types of measurements and weather analyses, to recover water vapor (and temperature) profiles. Thus in data-sparse regions, these inferred profiles can be subject to larger errors than would result in data-rich regions. The use of properly selected additional GPS frequencies enables a direct, independent measurement of the absorption associated with the water vapor profile, which may then be used in the standard GPS/MET retrievals to obtain a more accurate determination of atmospheric temperature throughout the water vapor layer. This study looks at the use of microwave crosslinks in the region of the 22 GHz water vapor absorption line for this purpose. An added advantage of using 22 GHz frequencies is that they are only negligibly affected by the ionosphere, in contrast to the large effect at the GPS frequencies. The retrieval algorithm uses both amplitude and phase measurements to obtain profiles of atmospheric pressure, temperature and water vapor pressure with a vertical resolution of 1 km or better. This technique also provides the cloud liquid water content along the ray path, which is in itself an important element in climate monitoring. Advantages of this method include the ability to make measurements in the presence of clouds and the use of techniques and technology proven through the GPS/MET experiment and several of NASA's planetary exploration missions. Simulations demonstrating this method will be presented for both clear and cloudy sky conditions.

  18. Discovery of gene-gene interactions across multiple independent datasets of Late Onset Alzheimer Disease from the Alzheimer Disease Genetics Consortium

    PubMed Central

    Hohman, Timothy J.; Bush, William S.; Jiang, Lan; Brown-Gentry, Kristin D.; Torstenson, Eric S.; Dudek, Scott M.; Mukherjee, Shubhabrata; Naj, Adam; Kunkle, Brian W.; Ritchie, Marylyn D.; Martin, Eden R.; Schellenberg, Gerard D.; Mayeux, Richard; Farrer, Lindsay A.; Pericak-Vance, Margaret A.; Haines, Jonathan L.; Thornton-Wells, Tricia A.

    2015-01-01

    Late-onset Alzheimer disease (LOAD) has a complex genetic etiology, involving locus heterogeneity, polygenic inheritance and gene-gene interactions; however, the investigation of interactions in recent GWAS has been limited. We used a biological knowledge-driven approach to evaluate gene-gene interactions for consistency across thirteen datasets from the Alzheimer Disease Genetics Consortium. Fifteen SNP-SNP pairs within three gene-gene combinations were identified: SIRT1 × ABCB1, PSAP × PEBP4, and GRIN2B × ADRA1A. Additionally, we extend a previously identified interaction between RYR3 and CACNA1C from an endophenotype analysis. Finally, post hoc gene expression analyses of the implicated SNPs further implicate SIRT1 and ABCB1, and implicate CDH23, which was most recently identified as an AD risk locus in an epigenetic analysis of AD. The observed interactions in this manuscript highlight ways in which genotypic variation related to disease may depend on the genetic context in which it occurs. Further, our results highlight the utility of evaluating genetic interactions to explain additional variance in AD risk and identify novel molecular mechanisms of AD pathogenesis. PMID:26827652

  19. Accuracy and Precision in the Southern Hemisphere Additional Ozonesondes (SHADOZ) Dataset 1998-2000 in Light of the JOSIE-2000 Results

    NASA Technical Reports Server (NTRS)

    Witte, J. C.; Thompson, A. M.; Schmidlin, F. J.; Oltmans, S. J.; McPeters, R. D.; Smit, H. G. J.

    2003-01-01

    A network of 12 southern hemisphere tropical and subtropical stations in the Southern Hemisphere ADditional OZonesondes (SHADOZ) project has provided over 2000 profiles of stratospheric and tropospheric ozone since 1998. Balloon-borne electrochemical concentration cell (ECC) ozonesondes are used with standard radiosondes for pressure, temperature and relative humidity measurements. The archived data are available at http://croc.gsfc.nasa.gov/shadoz. In Thompson et al., accuracies and imprecisions in the SHADOZ 1998-2000 dataset were examined using ground-based instruments and the TOMS total ozone measurement (version 7) as references. Small variations in ozonesonde technique introduced possible biases from station-to-station. SHADOZ total ozone column amounts are now compared to version 8 TOMS; discrepancies between the two datasets are reduced 2% on average. An evaluation of ozone variations among the stations is made using the results of a series of chamber simulations of ozone launches (JOSIE-2000, Juelich Ozonesonde Intercomparison Experiment) in which a standard reference ozone instrument was employed with the various sonde techniques used in SHADOZ. A number of variations in SHADOZ ozone data are explained when differences in solution strength, data processing and instrument type (manufacturer) are taken into account.

  20. Nitrogen addition and warming independently influence the belowground micro-food web in a temperate steppe.

    PubMed

    Li, Qi; Bai, Huahua; Liang, Wenju; Xia, Jianyang; Wan, Shiqiang; van der Putten, Wim H

    2013-01-01

    Climate warming and atmospheric nitrogen (N) deposition are known to influence ecosystem structure and functioning. However, our understanding of the interactive effect of these global changes on ecosystem functioning is relatively limited, especially when it concerns the responses of soils and soil organisms. We conducted a field experiment to study the interactive effects of warming and N addition on the soil food web. The experiment was established in 2006 in a temperate steppe in northern China. After three to four years (2009-2010), we found that N addition positively affected microbial biomass and negatively influenced trophic group and ecological indices of soil nematodes. However, the warming effects were less obvious; only fungal PLFA showed a decreasing trend under warming. Interestingly, the influence of N addition did not depend on warming. Structural equation modeling analysis suggested that the direct pathways between N addition and soil food web components were more important than the indirect connections through alterations in soil abiotic characters or plant growth. Nitrogen enrichment also affected the soil nematode community indirectly through changes in soil pH and PLFA. We conclude that experimental warming influenced soil food web components of the temperate steppe less than N addition, and there was little influence of warming on N addition effects under these experimental conditions.

  1. Nitrogen Addition and Warming Independently Influence the Belowground Micro-Food Web in a Temperate Steppe

    PubMed Central

    Li, Qi; Bai, Huahua; Liang, Wenju; Xia, Jianyang; Wan, Shiqiang; van der Putten, Wim H.

    2013-01-01

    Climate warming and atmospheric nitrogen (N) deposition are known to influence ecosystem structure and functioning. However, our understanding of the interactive effect of these global changes on ecosystem functioning is relatively limited, especially when it concerns the responses of soils and soil organisms. We conducted a field experiment to study the interactive effects of warming and N addition on the soil food web. The experiment was established in 2006 in a temperate steppe in northern China. After three to four years (2009–2010), we found that N addition positively affected microbial biomass and negatively influenced trophic group and ecological indices of soil nematodes. However, the warming effects were less obvious; only fungal PLFA showed a decreasing trend under warming. Interestingly, the influence of N addition did not depend on warming. Structural equation modeling analysis suggested that the direct pathways between N addition and soil food web components were more important than the indirect connections through alterations in soil abiotic characters or plant growth. Nitrogen enrichment also affected the soil nematode community indirectly through changes in soil pH and PLFA. We conclude that experimental warming influenced soil food web components of the temperate steppe less than N addition, and there was little influence of warming on N addition effects under these experimental conditions. PMID:23544140

  2. Developing Independent Listening Skills for English as an Additional Language Students

    ERIC Educational Resources Information Center

    Picard, Michelle; Velautham, Lalitha

    2016-01-01

    This paper describes an action research project to develop online, self-access listening resources mirroring the authentic academic contexts experienced by graduate university students. Current listening materials for English as an Additional Language (EAL) students mainly use Standard American English or Standard British pronunciation, and far…

  3. Treatment Planning Constraints to Avoid Xerostomia in Head-and-Neck Radiotherapy: An Independent Test of QUANTEC Criteria Using a Prospectively Collected Dataset

    SciTech Connect

    Moiseenko, Vitali; Wu, Jonn; Hovan, Allan; Saleh, Ziad; Apte, Aditya; Deasy, Joseph O.; Harrow, Stephen; Rabuka, Carman; Muggli, Adam; Thompson, Anna

    2012-03-01

    Purpose: The severe reduction of salivary function (xerostomia) is a common complication after radiation therapy for head-and-neck cancer. Consequently, guidelines to ensure adequate function based on parotid gland tolerance dose-volume parameters have been suggested by the QUANTEC group and by Ortholan et al. We performed a validation test of these guidelines against a prospectively collected dataset and compared the results with a previously published dataset. Methods and Materials: Whole-mouth stimulated salivary flow data from 66 head-and-neck cancer patients treated with radiotherapy at the British Columbia Cancer Agency (BCCA) were measured, and treatment planning data were abstracted. Flow measurements were collected from 50 patients at 3 months, and 60 patients at 12-month follow-up. Previously published data from a second institution, Washington University in St. Louis (WUSTL), were used for comparison. A logistic model was used to describe the incidence of Grade 4 xerostomia as a function of the mean dose of the spared parotid gland. The rate of correctly predicting the lack of xerostomia (negative predictive value [NPV]) was computed for both the QUANTEC constraints and the Ortholan et al. recommendation to constrain the total volume of both glands receiving more than 40 Gy to less than 33%. Results: Both datasets showed a rate of xerostomia of less than 20% when the mean dose to the least-irradiated parotid gland is kept to less than 20 Gy. Logistic model parameters for the incidence of xerostomia at 12 months after therapy, based on the least-irradiated gland, were D50 = 32.4 Gy and γ = 0.97. NPVs for the QUANTEC guideline were 94% (BCCA data) and 90% (WUSTL data). For the Ortholan et al. guideline, NPVs were 85% (BCCA) and 86% (WUSTL). Conclusion: These data confirm that the QUANTEC guideline effectively avoids xerostomia, and this is somewhat more effective than constraints on the volume receiving more than 40 Gy.
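
    A sketch of the reported dose-response and of the NPV check, using the fitted values above (D50 = 32.4 Gy, γ = 0.97). The logistic parameterization below is one common NTCP form; the original paper may use a different but equivalent one, and the toy cohort is invented.

    ```python
    import numpy as np

    D50, gamma = 32.4, 0.97

    def p_xerostomia(mean_dose_gy):
        """Logistic NTCP: P(D) = 1 / (1 + exp(4*gamma*(1 - D/D50)))."""
        return 1.0 / (1.0 + np.exp(4.0 * gamma * (1.0 - mean_dose_gy / D50)))

    for d in (10, 20, 26, 32.4, 45):
        print(f"mean dose {d:5.1f} Gy -> predicted incidence {p_xerostomia(d):.2f}")

    def npv(guideline_met, xerostomia_observed):
        """NPV = true negatives / all cases where the guideline was met."""
        met = np.asarray(guideline_met, bool)
        tox = np.asarray(xerostomia_observed, bool)
        return np.sum(met & ~tox) / np.sum(met)

    print(npv([1, 1, 1, 0, 1, 0], [0, 0, 1, 1, 0, 1]))  # toy cohort
    ```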

  4. Additives

    NASA Technical Reports Server (NTRS)

    Smalheer, C. V.

    1973-01-01

    The chemistry of lubricant additives is discussed to show what the additives are chemically and what functions they perform in the lubrication of various kinds of equipment. Current theories regarding the mode of action of lubricant additives are presented. The additive groups discussed include the following: (1) detergents and dispersants, (2) corrosion inhibitors, (3) antioxidants, (4) viscosity index improvers, (5) pour point depressants, and (6) antifouling agents.

  5. Additional role of O-acetylserine as a sulfur status-independent regulator during plant growth.

    PubMed

    Hubberten, Hans-Michael; Klie, Sebastian; Caldana, Camila; Degenkolbe, Thomas; Willmitzer, Lothar; Hoefgen, Rainer

    2012-05-01

    O-acetylserine (OAS) is one of the most prominent metabolites whose levels are altered upon sulfur starvation. However, its putative role as a signaling molecule in higher plants is controversial. This paper provides further evidence that OAS is a signaling molecule, based on computational analysis of time-series experiments and on studies of transgenic plants conditionally displaying increased OAS levels. Transcripts whose levels correlated with the transient and specific increase in OAS levels observed in leaves of Arabidopsis thaliana plants 5-10 min after transfer to darkness and with diurnal oscillation of the OAS content, showing a characteristic peak during the night, were identified. Induction of a serine-O-acetyltransferase gene (SERAT) in transgenic A. thaliana plants expressing the genes under the control of an inducible promoter resulted in a specific time-dependent increase in OAS levels. Monitoring the transcriptome response at time points at which no changes in sulfur-related metabolites except OAS were observed and correlating this with the light/dark transition and diurnal experiments resulted in identification of six genes whose expression was highly correlated with that of OAS (adenosine-5'-phosphosulfate reductase 3, sulfur-deficiency-induced 1, sulfur-deficiency-induced 2, low-sulfur-induced 1, serine hydroxymethyltransferase 7 and ChaC-like protein). These data suggest that OAS displays a signaling function leading to changes in transcript levels of a specific gene set irrespective of the sulfur status of the plant. Additionally, a role for OAS in a specific part of the sulfate response can be deduced.

  6. Goal-directed and transfer-cue-elicited drug-seeking are dissociated by pharmacotherapy: evidence for independent additive controllers.

    PubMed

    Hogarth, Lee

    2012-07-01

    According to contemporary learning theory, drug-seeking behavior reflects the summation of 2 dissociable controllers. Whereas goal-directed drug-seeking is determined by the expected current incentive value of the drug, stimulus-elicited drug-seeking is determined by the expected probability of the drug independently of its current incentive value, and these 2 controllers contribute additively to observed drug-seeking. One applied prediction of this model is that smoking cessation pharmacotherapies selectively attenuate tonic but not cue-elicited craving because they downgrade the expected incentive value of the drug but leave expected probability intact. To test this, the current study examined whether nicotine replacement therapy (NRT) nasal spray would modify goal-directed tobacco choice in a human outcome devaluation procedure, but leave cue-elicited tobacco choice in a Pavlovian to instrumental transfer (PIT) procedure intact. Smokers (N = 96) first underwent concurrent choice training in which 2 responses earned tobacco or chocolate points, respectively. Participants then ingested either NRT nasal spray (1 mg) or chocolate (147 g) to devalue 1 outcome. Concurrent choice was then tested again in extinction to measure goal-directed control of choice, and in a PIT test to measure the extent to which tobacco and chocolate stimuli enhanced choice of the same outcome. It was found that NRT modified tobacco choice in the extinction test but not the extent to which the tobacco stimulus enhanced choice of the tobacco outcome in the PIT test. This dissociation suggests that the propensity to engage in drug-seeking is determined independently by the expected value and probability of the drug, and that pharmacotherapy has partial efficacy because it selectively affects expected drug value.

  7. Dataset of calcified plaque condition in the stenotic coronary artery lesion obtained using multidetector computed tomography to indicate the addition of rotational atherectomy during percutaneous coronary intervention.

    PubMed

    Akutsu, Yasushi; Hamazaki, Yuji; Sekimoto, Teruo; Kaneko, Kyouichi; Kodama, Yusuke; Li, Hui-Ling; Suyama, Jumpei; Gokan, Takehiko; Sakai, Koshiro; Kosaki, Ryota; Yokota, Hiroyuki; Tsujita, Hiroaki; Tsukamoto, Shigeto; Sakurai, Masayuki; Sambe, Takehiko; Oguchi, Katsuji; Uchida, Naoki; Kobayashi, Shinichi; Aoki, Atsushi; Kobayashi, Youichi

    2016-06-01

    Our data show the regional coronary artery calcium scores (lesion CAC) on multidetector computed tomography (MDCT) and the cross-sectional imaging on MDCT angiography (CTA) in the target lesion of patients with stable angina pectoris who were scheduled for percutaneous coronary intervention (PCI). CAC and CTA data were measured using a 128-slice scanner (Somatom Definition AS+; Siemens Medical Solutions, Forchheim, Germany) before PCI. CAC was measured in a non-contrast-enhanced scan and was quantified using the Calcium Score module of SYNAPSE VINCENT software (Fujifilm Co., Tokyo, Japan) and expressed in Agatston units. CTA was then performed with contrast-enhanced ECG gating to measure the severity of the calcified plaque condition. We present both CAC and CTA data as a benchmark for deciding on the addition of rotational atherectomy during PCI for severely calcified plaque lesions.
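
    For context, an Agatston-style calcium score is computed per lesion as (area in mm²) × (a density weight stepped by the lesion's peak attenuation in HU). The sketch below implements the generic published scheme, not the SYNAPSE VINCENT code.

    ```python
    def agatston_weight(peak_hu):
        if peak_hu < 130: return 0      # below the calcium threshold
        if peak_hu < 200: return 1
        if peak_hu < 300: return 2
        if peak_hu < 400: return 3
        return 4

    def agatston_score(lesions):
        """lesions: iterable of (area_mm2, peak_hu), per lesion per slice."""
        return sum(area * agatston_weight(hu) for area, hu in lesions)

    print(agatston_score([(12.0, 250), (5.5, 450), (3.0, 150)]))  # toy lesions
    ```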

  8. Dataset of calcified plaque condition in the stenotic coronary artery lesion obtained using multidetector computed tomography to indicate the addition of rotational atherectomy during percutaneous coronary intervention

    PubMed Central

    Akutsu, Yasushi; Hamazaki, Yuji; Sekimoto, Teruo; Kaneko, Kyouichi; Kodama, Yusuke; Li, Hui-Ling; Suyama, Jumpei; Gokan, Takehiko; Sakai, Koshiro; Kosaki, Ryota; Yokota, Hiroyuki; Tsujita, Hiroaki; Tsukamoto, Shigeto; Sakurai, Masayuki; Sambe, Takehiko; Oguchi, Katsuji; Uchida, Naoki; Kobayashi, Shinichi; Aoki, Atsushi; Kobayashi, Youichi

    2016-01-01

    Our data show the regional coronary artery calcium scores (lesion CAC) on multidetector computed tomography (MDCT) and the cross-sectional imaging on MDCT angiography (CTA) in the target lesion of patients with stable angina pectoris who were scheduled for percutaneous coronary intervention (PCI). CAC and CTA data were measured using a 128-slice scanner (Somatom Definition AS+; Siemens Medical Solutions, Forchheim, Germany) before PCI. CAC was measured in a non-contrast-enhanced scan and was quantified using the Calcium Score module of SYNAPSE VINCENT software (Fujifilm Co., Tokyo, Japan) and expressed in Agatston units. CTA was then performed with contrast-enhanced ECG gating to measure the severity of the calcified plaque condition. We present both CAC and CTA data as a benchmark for deciding on the addition of rotational atherectomy during PCI for severely calcified plaque lesions. PMID:26977441

  9. Maturation of poultry G-I microbiome during 42d of growth is independent of organic acid feed additives

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Poultry remains a major source of foodborne infections in the U.S. and globally. A variety of additives with presumed anti-microbial and/or growth-promoting effects are commonly added to poultry feed, yet the effects of these additives on the ecology of the gastro-intestinal microbial community (th...

  10. Statistical Reference Datasets

    National Institute of Standards and Technology Data Gateway

    Statistical Reference Datasets (Web, free access)   The Statistical Reference Datasets project is also supported by the Standard Reference Data Program. The purpose of this project is to improve the accuracy of statistical software by providing reference datasets with certified computational results that enable the objective evaluation of statistical software.
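
    The intended use is a round-trip check: fit a model with the software under test, then compare against the certified values. A minimal sketch; the "certified" numbers below are placeholders, not actual NIST results.

    ```python
    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
    slope, intercept = np.polyfit(x, y, 1)   # fit with the software under test

    certified_slope, certified_intercept = 2.00, 0.02   # hypothetical certificate
    ok = np.allclose([slope, intercept],
                     [certified_slope, certified_intercept], atol=0.05)
    print(f"slope={slope:.6f} intercept={intercept:.6f} agrees={ok}")
    ```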

  11. Independent and additive contributions of postvictory testosterone and social experience to the development of the winner effect.

    PubMed

    Fuxjager, Matthew J; Oyegbile, Temitayo O; Marler, Catherine A

    2011-09-01

    The processes through which salient social experiences influence future behavior are not well understood. Winning fights, for example, can increase the odds of future victory, yet little is known about the internal mechanisms that underlie such winner effects. Here, we use the territorial California mouse (Peromyscus californicus) to investigate how the effects of postvictory testosterone (T) release and winning experience individually mediate positive changes in future winning ability and antagonistic behavior. Male mice were castrated and implanted with T capsules to maintain basal levels of this hormone. We found that males form a robust winner effect if they win three separate territorial disputes and experience a single T surge roughly 45 min after each encounter. Meanwhile, males exhibit only an intermediate winner effect if they either 1) acquire three previous wins but do not experience a change in postvictory T or 2) acquire no previous wins but experience three separate T pulses. The results indicate that the effect of postvictory T must be coupled with that of winning experience to trigger the maximum positive shift in winning ability, which highlights the importance of social context in the development of the winner effect. At the same time, however, postvictory T and winning experience are each capable of increasing future winning ability independently, and this finding suggests that these two factors drive plasticity in antagonistic behavior via distinct mechanistic channels. More broadly, our data offer insight into the possible ways in which various species might be able to adjust their behavioral repertoire in response to social interactions through mechanisms that are unlinked from the effects of gonadal steroid action.

  12. Autologous temporomandibular joint reconstruction independent of exogenous additives: a proof-of-concept study for guided self-generation

    PubMed Central

    Wei, Jiao; Herrler, Tanja; Han, Dong; Liu, Kai; Huang, Rulin; Guba, Markus; Dai, Chuanchang; Li, Qingfeng

    2016-01-01

    Joint defects are complex and difficult to reconstruct. By exploiting the body’s own regenerative capacity, we aimed to individually generate anatomically precise neo-tissue constructs for autologous joint reconstruction without using any exogenous additives. In a goat model, CT scans of the mandibular condyle including articular surface and a large portion of the ascending ramus were processed using computer-aided design and manufacturing. A corresponding hydroxylapatite negative mold was printed in 3D and temporarily embedded into the transition zone of costal periosteum and perichondrium. A demineralized bone matrix scaffold implanted on the contralateral side served as control. Neo-tissue constructs obtained by guided self-generation exhibited accurate configuration, robust vascularization, biomechanical stability, and function. After autologous replacement surgery, the constructs showed stable results with similar anatomical, histological, and functional findings compared to native controls. Further studies are required to assess long-term outcome and possible extension to further applications. The absence of exogenous cells, growth factors, and scaffolds may facilitate clinical translation of this approach. PMID:27892493

  13. Segmentation of Unstructured Datasets

    NASA Technical Reports Server (NTRS)

    Bhat, Smitha

    1996-01-01

    Datasets generated by computer simulations and experiments in Computational Fluid Dynamics tend to be extremely large and complex. It is difficult to visualize these datasets using standard techniques like Volume Rendering and Ray Casting. Object Segmentation provides a technique to extract and quantify regions of interest within these massive datasets. This thesis explores basic algorithms to extract coherent amorphous regions from two-dimensional and three-dimensional scalar unstructured grids. The techniques are applied to datasets from Computational Fluid Dynamics and from Finite Element Analysis.

  14. Dataset Lifecycle Policy

    NASA Technical Reports Server (NTRS)

    Armstrong, Edward; Tauer, Eric

    2013-01-01

    The presentation focused on describing a new dataset lifecycle policy that the NASA Physical Oceanography DAAC (PO.DAAC) has implemented for its new and current datasets to foster improved stewardship and consistency across its archive. The overarching goal is to implement this dataset lifecycle policy for all new GHRSST GDS2 datasets and bridge the mission statements from the GHRSST Project Office and PO.DAAC to provide the best quality SST data in a cost-effective, efficient manner, preserving its integrity so that it will be available and usable to a wide audience.

  15. A global distributed basin morphometric dataset.

    PubMed

    Shen, Xinyi; Anagnostou, Emmanouil N; Mei, Yiwen; Hong, Yang

    2017-01-05

    Basin morphometry is vital information for relating storms to hydrologic hazards, such as landslides and floods. In this paper we present the first comprehensive global dataset of distributed basin morphometry at 30 arc seconds resolution. The dataset includes nine prime morphometric variables; in addition we present formulas for generating twenty-one additional morphometric variables based on combination of the prime variables. The dataset can aid different applications including studies of land-atmosphere interaction, and modelling of floods and droughts for sustainable water management. The validity of the dataset has been consolidated by successfully reproducing Hack's law.
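
    Hack's law relates main-stream length L to drainage area A as L = c·A^h (h ≈ 0.6 in Hack's original study), so "reproducing Hack's law" amounts to recovering h by log-log regression over the basins. A sketch on synthetic stand-ins for the morphometric variables:

    ```python
    import numpy as np

    rng = np.random.default_rng(7)
    A = 10 ** rng.uniform(1, 5, 500)                  # basin areas, km^2
    L = 1.4 * A ** 0.6 * rng.lognormal(0, 0.1, 500)   # stream lengths w/ scatter

    h, log_c = np.polyfit(np.log10(A), np.log10(L), 1)
    print(f"fitted exponent h = {h:.3f}, coefficient c = {10**log_c:.2f}")
    ```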

  16. A global distributed basin morphometric dataset

    NASA Astrophysics Data System (ADS)

    Shen, Xinyi; Anagnostou, Emmanouil N.; Mei, Yiwen; Hong, Yang

    2017-01-01

    Basin morphometry is vital information for relating storms to hydrologic hazards, such as landslides and floods. In this paper we present the first comprehensive global dataset of distributed basin morphometry at 30 arc seconds resolution. The dataset includes nine prime morphometric variables; in addition we present formulas for generating twenty-one additional morphometric variables based on combination of the prime variables. The dataset can aid different applications including studies of land-atmosphere interaction, and modelling of floods and droughts for sustainable water management. The validity of the dataset has been consolidated by successfully reproducing Hack's law.

  17. A global distributed basin morphometric dataset

    PubMed Central

    Shen, Xinyi; Anagnostou, Emmanouil N.; Mei, Yiwen; Hong, Yang

    2017-01-01

    Basin morphometry is vital information for relating storms to hydrologic hazards, such as landslides and floods. In this paper we present the first comprehensive global dataset of distributed basin morphometry at 30 arc seconds resolution. The dataset includes nine prime morphometric variables; in addition we present formulas for generating twenty-one additional morphometric variables based on combination of the prime variables. The dataset can aid different applications including studies of land-atmosphere interaction, and modelling of floods and droughts for sustainable water management. The validity of the dataset has been consolidated by successfully reproducing Hack’s law. PMID:28055032

  18. Fixing Dataset Search

    NASA Technical Reports Server (NTRS)

    Lynnes, Chris

    2014-01-01

    Three current search engines are queried for ozone data at the GES DISC. The results range from sub-optimal to counter-intuitive. We propose a method to fix dataset search by implementing a robust relevancy ranking scheme. The relevancy ranking scheme is based on several heuristics culled from more than 20 years of helping users select datasets.
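
    A toy sketch of a heuristic relevancy score of the kind proposed; the features and weights are invented for illustration and are not the GES DISC scheme.

    ```python
    def relevancy(query_terms, dataset):
        score = 0.0
        title = dataset["title"].lower()
        for term in query_terms:
            if term in title:
                score += 3.0                          # title hits weigh most
            if term in dataset.get("variables", []):
                score += 2.0                          # measured-variable match
        score += {"L3": 1.0, "L2": 0.5}.get(dataset.get("level"), 0.0)
        return score

    datasets = [
        {"title": "OMI/Aura Ozone Total Column L3", "variables": ["ozone"], "level": "L3"},
        {"title": "MERRA-2 Aerosol Diagnostics", "variables": ["aerosol"], "level": "L3"},
    ]
    print(sorted(datasets, key=lambda d: -relevancy(["ozone"], d))[0]["title"])
    ```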

  19. Integrated fuzzy concentration addition-independent action (IFCA-IA) model outperforms two-stage prediction (TSP) for predicting mixture toxicity.

    PubMed

    Wang, Zhuang; Chen, Jingwen; Huang, Liping; Wang, Ying; Cai, Xiyun; Qiao, Xianliang; Dong, Yuying

    2009-02-01

    Mixture toxicities were determined for 12 industrial organic chemicals, representing four different modes of toxic action (MOAs), toward Vibrio fischeri, to compare the predictability of the integrated fuzzy concentration addition-independent action (IFCA-IA) model and the two-stage prediction (TSP) model. Three mixtures were designed: the first and second mixtures were based on the ratios of each component at the 1% and 50% effect concentrations (EC1 and EC50), respectively, and the third mixture contained an equimolar ratio of individual components. For the EC1, EC50 and equimolar ratios, prediction errors from the IFCA-IA model at the 50% experimental mixture effects were 0.3%, 6% and 0.6%, respectively, while for the TSP model, the corresponding errors were 2.8%, 19% and 24%. Thus, the IFCA-IA model performed better than the TSP model. The IFCA-IA model calculates two weight coefficients from molecular structural descriptors, which weigh the relation between concentration addition (CA) and independent action (IA) through fuzzy membership functions. Thus, MOAs are not prerequisites for mixture toxicity prediction by the IFCA-IA approach, implying the practicability of this method in the toxicity assessment of mixtures.
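
    The core idea, as described, is to blend the CA and IA predictions with weights derived from fuzzy membership functions of molecular descriptors. A conceptual sketch; the logistic membership function is a generic placeholder, not the paper's fitted functions.

    ```python
    import numpy as np

    def fuzzy_weight(descriptor, midpoint=0.5, steepness=10.0):
        """Membership in the 'CA-like' class, in [0, 1]."""
        return 1.0 / (1.0 + np.exp(-steepness * (descriptor - midpoint)))

    def ifca_ia(e_ca, e_ia, descriptor):
        w = fuzzy_weight(descriptor)
        return w * e_ca + (1.0 - w) * e_ia   # weighted blend of the two models

    # e.g. CA predicts 62% effect, IA 48%; descriptor suggests mostly CA-like
    print(ifca_ia(0.62, 0.48, descriptor=0.8))
    ```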

  20. Waterfalls linked to the National Hydrography Datasets

    USGS Publications Warehouse

    Wieferich, Daniel

    2016-01-01

    This dataset contains information about waterfall location and characteristics. It also provides a linear reference to two National Hydrography Datasets. This dataset is based on the World Waterfall Database (WWD) and other source datasets. The coordinates and spatial attributes from the WWD were used to help verify locations in Google Earth. From there, the USGS HydroLink Tool was used to link data to both the NHDPlusV2.1 (1:100,000 scale) and NHD High Resolution Dataset (1:24,000 scale). This dataset currently includes waterfalls from the states of Indiana, Wisconsin, Michigan, Nebraska, Illinois and Missouri. Waterfalls from other states are planned to be added through time; see the status and update sections of the metadata for additional information.

  1. Introduction of a simple-model-based land surface dataset for Europe

    NASA Astrophysics Data System (ADS)

    Orth, Rene; Seneviratne, Sonia I.

    2015-04-01

    Land surface hydrology can play a crucial role during extreme events such as droughts, floods and even heat waves. We introduce in this study a new hydrological dataset for Europe that consists of soil moisture, runoff and evapotranspiration (ET). It is derived with a simple water balance model (SWBM) forced with precipitation, temperature and net radiation. The SWBM dataset extends over the period 1984-2013 with a daily time step and 0.5° × 0.5° resolution. We employ a novel calibration approach, in which we consider 300 random parameter sets chosen from an observation-based range. Using several independent validation datasets representing soil moisture (or terrestrial water content), ET and streamflow, we identify the best performing parameter set and hence the new dataset. To illustrate its usefulness, the SWBM dataset is compared against several state-of-the-art datasets (ERA-Interim/Land, MERRA-Land, GLDAS-2-Noah, simulations of the Community Land Model Version 4), using all validation datasets as reference. For soil moisture dynamics it outperforms the benchmarks. Therefore the SWBM soil moisture dataset constitutes a reasonable alternative to sparse measurements, little validated model results, or proxy data such as precipitation indices. Also in terms of runoff the SWBM dataset performs well, whereas the evaluation of the SWBM ET dataset is overall satisfactory, but the dynamics are less well captured for this variable. This highlights the limitations of the dataset, as it is based on a simple model that uses uniform parameter values. Hence some processes impacting ET dynamics may not be captured, and quality issues may occur in regions with complex terrain. Even though the SWBM is well calibrated, it cannot replace more sophisticated models; but as their calibration is a complex task the present dataset may serve as a benchmark in future. In addition we investigate the sources of skill of the SWBM dataset and find that the parameter set has a similar
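
    A conceptual sketch of a simple water-balance model of this kind: a single soil bucket whose runoff and ET are smooth functions of soil-water content. The parameter names and functional forms below are illustrative, not the authors' exact SWBM formulation.

    ```python
    def swbm_step(w, precip, e_pot, cs=400.0, alpha=2.0, beta=0.8, gamma=1.0):
        """One daily step; w = soil water (mm), cs = water-holding capacity."""
        runoff = precip * (w / cs) ** alpha        # wetter soil -> more runoff
        et = e_pot * beta * (w / cs) ** gamma      # supply-limited ET
        w_next = min(max(w + precip - runoff - et, 0.0), cs)
        return w_next, runoff, et

    w = 200.0
    for day, (p, e) in enumerate([(5.0, 3.0), (0.0, 4.0), (20.0, 2.0)]):
        w, q, et = swbm_step(w, p, e)
        print(f"day {day}: soil={w:6.1f} mm  runoff={q:4.2f}  ET={et:4.2f}")
    ```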

  2. Fluxnet Synthesis Dataset Collaboration Infrastructure

    SciTech Connect

    Agarwal, Deborah A.; Humphrey, Marty; van Ingen, Catharine; Beekwilder, Norm; Goode, Monte; Jackson, Keith; Rodriguez, Matt; Weber, Robin

    2008-02-06

    The Fluxnet synthesis dataset originally compiled for the La Thuile workshop contained approximately 600 site years. Since the workshop, several additional site years have been added and the dataset now contains over 920 site years from over 240 sites. A data refresh update is expected to increase those numbers in the next few months. The ancillary data describing the sites continue to evolve as well. There are on the order of 120 site contacts, and 60 proposals involving around 120 researchers have been approved to use the data. The size and complexity of the dataset and collaboration have led to a new approach to providing data access and collaboration support. The support team attended the workshop and worked closely with the attendees and the Fluxnet project office to define the requirements for the support infrastructure. As a result of this effort, a new website (http://www.fluxdata.org) has been created to provide access to the Fluxnet synthesis dataset. This new website is based on a scientific data server that enables browsing of the data online, data download, and version tracking. We leverage database and data analysis tools such as OLAP data cubes and web reports to enable browser and Excel pivot table access to the data.

  3. A plant growth form dataset for the New World.

    PubMed

    Engemann, K; Sandel, B; Boyle, B; Enquist, B J; Jørgensen, P M; Kattge, J; McGill, B J; Morueta-Holme, N; Peet, R K; Spencer, N J; Violle, C; Wiser, S K; Svenning, J-C

    2016-11-01

    This dataset provides growth form classifications for 67,413 vascular plant species from North, Central, and South America. The data used to determine growth form were compiled from five major integrated sources and two original publications: the Botanical Information and Ecology Network (BIEN), the Plant Trait Database (TRY), the SALVIAS database, the USDA PLANTS database, Missouri Botanical Garden's Tropicos database, Wright (2010), and Boyle (1996). We defined nine plant growth forms based on woodiness (woody or non-woody), shoot structure (self-supporting or not self-supporting), and root traits (rooted in soil, not rooted in soil, parasitic or aquatic): Epiphyte, Liana, Vine, Herb, Shrub, Tree, Parasite, or Aquatic. Species with multiple growth form classifications were assigned the growth form classification agreed upon by the majority (>2/3) of sources. Species with ambiguous or otherwise not interpretable growth form assignments were excluded from the final dataset but are made available with the original data. Comparisons with independent estimates of species richness for the Western hemisphere suggest that our final dataset includes the majority of New World vascular plant species. Coverage is likely more complete for temperate than for tropical species. In addition, aquatic species are likely under-represented. Nonetheless, this dataset represents the largest compilation of plant growth forms published to date, and should contribute to new insights across a broad range of research in systematics, ecology, biogeography, conservation, and global change science.
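    A minimal sketch of the majority-rule assignment described above; the >2/3 threshold comes from the text, while the function name and example labels are illustrative.

```python
from collections import Counter

def assign_growth_form(classifications, threshold=2/3):
    """Assign a growth form only if >2/3 of sources agree; else None."""
    counts = Counter(classifications)
    form, n = counts.most_common(1)[0]
    return form if n / len(classifications) > threshold else None

# One species as reported by five sources
print(assign_growth_form(["Tree", "Tree", "Tree", "Tree", "Shrub"]))  # Tree
print(assign_growth_form(["Tree", "Shrub", "Liana"]))                 # None (ambiguous)
```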

  4. National land cover dataset

    USGS Publications Warehouse

    ,

    2000-01-01

    The U.S. Geological Survey (USGS), in cooperation with the U.S. Environmental Protection Agency, has produced a land cover dataset for the conterminous United States on the basis of 1992 Landsat thematic mapper imagery and supplemental data. The National Land Cover Dataset (NLCD) is a component of the USGS Land Cover Characterization Program. The seamless NLCD contains 21 categories of land cover information suitable for a variety of State and regional applications, including landscape analysis, land management, and modeling nutrient and pesticide runoff. The NLCD is distributed by State as 30-meter resolution raster images in an Albers Equal-Area map projection.

  5. STAT4 Associates with SLE Through Two Independent Effects that Correlate with Gene Expression and Act Additively with IRF5 to Increase Risk

    PubMed Central

    Abelson, Anna-Karin; Delgado-Vega, Angélica M.; Kozyrev, Sergey V.; Sánchez, Elena; Velázquez-Cruz, Rafael; Eriksson, Niclas; Wojcik, Jerome; Reddy, Prasad Linga; Lima, Guadalupe; D’Alfonso, Sandra; Migliaresi, Sergio; Baca, Vicente; Orozco, Lorena; Witte, Torsten; Ortego-Centeno, Norberto; Abderrahim, Hadi; Pons-Estel, Bernardo A.; Gutiérrez, Carmen; Suárez, Ana; González-Escribano, Maria Francisca; Martin, Javier; Alarcón-Riquelme, Marta E.

    2013-01-01

    Objectives: To confirm and define the genetic association of STAT4 and systemic lupus erythematosus, investigate the possibility of correlations with differential splicing and/or expression levels, and genetic interaction with IRF5. Methods: 30 tag SNPs were genotyped in an independent set of Spanish cases and controls. SNPs surviving correction for multiple tests were genotyped in 5 new sets of cases and controls for replication. STAT4 cDNA was analyzed by 5’-RACE PCR and sequencing. Expression levels were measured by quantitative PCR. Results: In the fine-mapping, four SNPs were significant after correction for multiple testing, with rs3821236 and rs3024866 as the strongest signals, followed by the previously associated rs7574865, and by rs1467199. Association was replicated in all cohorts. After conditional regression analyses, two major independent signals, represented by SNPs rs3821236 and rs7574865, remained significant across the sets. These SNPs belong to separate haplotype blocks. High levels of STAT4 expression correlated with SNPs rs3821236, rs3024866 (both in the same haplotype block) and rs7574865 but not with other SNPs. We also detected transcription of alternative tissue-specific exons 1, indicating the presence of tissue-specific promoters of potential importance in the expression of STAT4. No interaction with associated SNPs of IRF5 was observed using regression analysis. Conclusions: These data confirm STAT4 as a susceptibility gene for SLE and suggest the presence of at least two functional variants affecting levels of STAT4. Our results also indicate that the genes STAT4 and IRF5 act additively to increase the risk for SLE. PMID:19019891

  6. Exudate-based diabetic macular edema detection in fundus images using publicly available datasets

    SciTech Connect

    Giancardo, Luca; Meriaudeau, Fabrice; Karnowski, Thomas Paul; Li, Yaquin; Garg, Seema; Tobin Jr, Kenneth William; Chaum, Edward

    2011-01-01

    Diabetic macular edema (DME) is a common vision-threatening complication of diabetic retinopathy. In a large-scale screening environment, DME can be assessed by detecting exudates (a type of bright lesion) in fundus images. In this work, we introduce a new methodology for diagnosis of DME using a novel set of features based on colour, wavelet decomposition and automatic lesion segmentation. These features are employed to train a classifier able to automatically diagnose DME through the presence of exudation. We present a new publicly available dataset with ground-truth data containing 169 patients from various ethnic groups and levels of DME. This and two other publicly available datasets are employed to evaluate our algorithm. We are able to achieve diagnosis performance comparable to retina experts on the MESSIDOR (an independently labelled dataset with 1200 images) with cross-dataset testing (i.e., the classifier was trained on an independent dataset and tested on MESSIDOR). Our algorithm obtained an AUC between 0.88 and 0.94 depending on the dataset/features used. Additionally, it does not need ground truth at the lesion level to reject false positives and is computationally efficient, as it generates a diagnosis in an average of 4.4 s (9.3 s, including the optic nerve localization) per image on a 2.6 GHz platform with an unoptimized Matlab implementation.
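    A minimal sketch of the cross-dataset evaluation protocol (train on one dataset, test on an independently labelled one, report AUC), assuming scikit-learn is available; the random feature vectors stand in for the paper's colour/wavelet/lesion features and the labels are synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Stand-ins for per-image feature vectors from two independent datasets
X_train, y_train = rng.normal(size=(169, 16)), rng.integers(0, 2, 169)
X_test, y_test = rng.normal(size=(1200, 16)), rng.integers(0, 2, 1200)

# Train on one dataset...
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)

# ...and evaluate on the other, never seen during training
scores = clf.predict_proba(X_test)[:, 1]
print("cross-dataset AUC:", roc_auc_score(y_test, scores))
```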

  7. Plant Functional Diversity Can Be Independent of Species Diversity: Observations Based on the Impact of 4-Yrs of Nitrogen and Phosphorus Additions in an Alpine Meadow.

    PubMed

    Li, Wei; Cheng, Ji-Min; Yu, Kai-Liang; Epstein, Howard E; Guo, Liang; Jing, Guang-Hua; Zhao, Jie; Du, Guo-Zhen

    2015-01-01

    Past studies have widely documented the decrease in species diversity in response to the addition of nutrients; however, functional diversity is often independent of species diversity. In this study, we conducted a field experiment to examine the effect of nitrogen and phosphorus fertilization ((NH4)2 HPO4) at 0, 15, 30 and 60 g m-2 yr-1 (F0, F15, F30 and F60), after 4 years of continuous fertilization, on functional diversity and species diversity, and their relationship with productivity in an alpine meadow community on the Tibetan Plateau. To this end, three community-weighted mean trait values (specific leaf area, SLA; mature plant height, MPH; and seed size, SS) for 30 common species at each fertilization level were determined, and three components of functional diversity (functional richness, FRic; functional evenness, FEve; and Rao's index of quadratic entropy, FRao) were quantified. Our results showed that: (i) species diversity sharply decreased, but functional diversity remained stable with fertilization; (ii) community-weighted mean traits (SLA and MPH) increased significantly with fertilization level; (iii) aboveground biomass was not correlated with functional diversity, but it was significantly correlated with species diversity and MPH. Our results suggest that decreases in species diversity due to fertilization do not result in corresponding changes in functional diversity. Functional identity of species may be more important than functional diversity in influencing aboveground productivity in this alpine meadow community, and our results also support the mass ratio hypothesis; that is, the traits of the dominant species influenced the community biomass production.

  8. Plant Functional Diversity Can Be Independent of Species Diversity: Observations Based on the Impact of 4-Yrs of Nitrogen and Phosphorus Additions in an Alpine Meadow

    PubMed Central

    Li, Wei; Cheng, Ji-Min; Yu, Kai-Liang; Epstein, Howard E.; Guo, Liang; Jing, Guang-Hua; Zhao, Jie; Du, Guo-Zhen

    2015-01-01

    Past studies have widely documented the decrease in species diversity in response to the addition of nutrients; however, functional diversity is often independent of species diversity. In this study, we conducted a field experiment to examine the effect of nitrogen and phosphorus fertilization ((NH4)2 HPO4) at 0, 15, 30 and 60 g m-2 yr-1 (F0, F15, F30 and F60), after 4 years of continuous fertilization, on functional diversity and species diversity, and their relationship with productivity in an alpine meadow community on the Tibetan Plateau. To this end, three community-weighted mean trait values (specific leaf area, SLA; mature plant height, MPH; and seed size, SS) for 30 common species at each fertilization level were determined, and three components of functional diversity (functional richness, FRic; functional evenness, FEve; and Rao’s index of quadratic entropy, FRao) were quantified. Our results showed that: (i) species diversity sharply decreased, but functional diversity remained stable with fertilization; (ii) community-weighted mean traits (SLA and MPH) increased significantly with fertilization level; (iii) aboveground biomass was not correlated with functional diversity, but it was significantly correlated with species diversity and MPH. Our results suggest that decreases in species diversity due to fertilization do not result in corresponding changes in functional diversity. Functional identity of species may be more important than functional diversity in influencing aboveground productivity in this alpine meadow community, and our results also support the mass ratio hypothesis; that is, the traits of the dominant species influenced the community biomass production. PMID:26295345
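    A minimal sketch of the two quantities used in this study, community-weighted mean traits and Rao's quadratic entropy; the absolute-difference trait distance and toy abundances are assumptions.

```python
import numpy as np

def community_weighted_mean(abundances, trait_values):
    """CWM = sum_i p_i * t_i, with p_i the relative abundances."""
    p = np.asarray(abundances) / np.sum(abundances)
    return float(np.sum(p * np.asarray(trait_values)))

def rao_quadratic_entropy(abundances, trait_values):
    """FRao = sum_ij d_ij * p_i * p_j, with d_ij = |t_i - t_j| here."""
    p = np.asarray(abundances) / np.sum(abundances)
    t = np.asarray(trait_values, dtype=float)
    d = np.abs(t[:, None] - t[None, :])
    return float(p @ d @ p)

# Four species: relative abundance and SLA (specific leaf area)
abund = [40, 30, 20, 10]
sla = [12.0, 18.0, 25.0, 9.0]
print("CWM SLA :", community_weighted_mean(abund, sla))
print("Rao's Q :", rao_quadratic_entropy(abund, sla))
```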

  9. The National Hydrography Dataset

    USGS Publications Warehouse

    ,

    1999-01-01

    The National Hydrography Dataset (NHD) is a newly combined dataset that provides hydrographic data for the United States. The NHD is the culmination of recent cooperative efforts of the U.S. Environmental Protection Agency (USEPA) and the U.S. Geological Survey (USGS). It combines elements of USGS digital line graph (DLG) hydrography files and the USEPA Reach File (RF3). The NHD supersedes RF3 and DLG files by incorporating them, not by replacing them. Users of RF3 or DLG files will find the same data in a new, more flexible format. They will find that the NHD is familiar but greatly expanded and refined. The DLG files contribute a national coverage of millions of features, including water bodies such as lakes and ponds, linear water features such as streams and rivers, and also point features such as springs and wells. These files provide standardized feature types, delineation, and spatial accuracy. From RF3, the NHD acquires hydrographic sequencing, upstream and downstream navigation for modeling applications, and reach codes. The reach codes provide a way to integrate data from organizations at all levels by linking the data to this nationally consistent hydrographic network. The feature names are from the Geographic Names Information System (GNIS). The NHD provides comprehensive coverage of hydrographic data for the United States. Some of the anticipated end-user applications of the NHD are multiuse hydrographic modeling and water-quality studies of fish habitats. Although based on 1:100,000-scale data, the NHD is planned so that it can incorporate and encourage the development of the higher resolution data that many users require. The NHD can be used to promote the exchange of data between users at the national, State, and local levels. Many users will benefit from the NHD and will want to contribute to the dataset as well.

  10. Dataset of the proteome of purified outer membrane vesicles from the human pathogen Aggregatibacter actinomycetemcomitans.

    PubMed

    Kieselbach, Thomas; Oscarsson, Jan

    2017-02-01

    The Gram-negative bacterium Aggregatibacter actinomycetemcomitans is an oral and systemic pathogen, which is linked to aggressive forms of periodontitis and can be associated with endocarditis. The outer membrane vesicles (OMVs) of this species contain effector proteins such as cytolethal distending toxin (CDT) and leukotoxin (LtxA), which they can deliver into human host cells. The OMVs can also activate innate immunity through NOD1- and NOD2-active pathogen-associated molecular patterns. This dataset provides a proteome of highly purified OMVs from A. actinomycetemcomitans serotype e strain 173. The experimental data include not only the raw data of the LC-MS/MS analysis of four independent preparations of purified OMVs but also the mass lists of the processed data and the Mascot .dat files from the database searches. In total, 501 proteins are identified, of which 151 are detected in at least three of the four independent preparations. In addition, this dataset contains the COG definitions and the predicted subcellular locations (PSORTb 3.0) for the entire genome of A. actinomycetemcomitans serotype e strain SC1083, which was used for the evaluation of the LC-MS/MS data. These data are deposited in ProteomeXchange in the public dataset PXD002509. In addition, a scientific interpretation of this dataset by Kieselbach et al. (2015) [2] is available at http://dx.doi.org/10.1371/journal.pone.0138591.

  11. OpenCL based machine learning labeling of biomedical datasets

    NASA Astrophysics Data System (ADS)

    Amoros, Oscar; Escalera, Sergio; Puig, Anna

    2011-03-01

    In this paper, we propose a two-stage labeling method for large biomedical datasets through a parallel approach on a single GPU. Diagnostic methods, structure volume measurements, and visualization systems are of major importance for surgery planning, intra-operative imaging and image-guided surgery. In all cases, providing an automatic and interactive method to label or tag the different structures contained in the input data becomes imperative. Several approaches to label or segment biomedical datasets have been proposed to discriminate different anatomical structures in an output tagged dataset. Among existing methods, supervised learning methods for segmentation have been devised to let non-expert users easily analyze biomedical datasets. However, they still have some problems concerning practical application, such as slow learning and testing speeds. In addition, recent technological developments have led to widespread availability of multi-core CPUs and GPUs, as well as new software languages, such as NVIDIA's CUDA and OpenCL, making it possible to apply parallel programming paradigms on conventional personal computers. The Adaboost classifier is one of the most widely applied methods for labeling in the machine learning community. In a first stage, Adaboost trains a binary classifier from a set of pre-labeled samples described by a set of features. This binary classifier is defined as a weighted combination of weak classifiers. Each weak classifier is a simple decision function estimated on a single feature value. Then, at the testing stage, each weak classifier is independently applied to the features of a set of unlabeled samples. In this work, we propose an alternative representation of the Adaboost binary classifier. We use this proposed representation to define a new GPU-based parallelized Adaboost testing stage using OpenCL. We provide numerical experiments based on large available datasets and we compare our results to CPU-based strategies in terms of time and
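    A minimal NumPy sketch of the testing stage described above: each weak classifier (here a decision stump defined by a feature index, threshold and polarity) is evaluated independently for every sample, and the strong classifier is the sign of the alpha-weighted vote. The vectorized matrix evaluation stands in for the OpenCL/GPU parallelism; the stump representation is an assumption.

```python
import numpy as np

def adaboost_test(X, features, thresholds, polarities, alphas):
    """Apply T decision stumps to all samples at once.

    Each weak classifier looks at one feature, compares it to a
    threshold, and votes +1/-1; the strong classifier is the sign of
    the alpha-weighted sum. Evaluating the (samples x stumps) matrix
    in one shot mirrors the data-parallel GPU formulation.
    """
    votes = polarities * np.where(X[:, features] < thresholds, 1.0, -1.0)
    return np.sign(votes @ alphas)

rng = np.random.default_rng(1)
X = rng.normal(size=(10_000, 8))            # e.g. voxel feature vectors
features = np.array([0, 3, 5, 2])           # feature index per stump
thresholds = np.array([0.1, -0.5, 0.0, 0.3])
polarities = np.array([1.0, -1.0, 1.0, 1.0])
alphas = np.array([0.9, 0.5, 0.7, 0.3])     # weak-classifier weights
labels = adaboost_test(X, features, thresholds, polarities, alphas)
print(labels[:10])
```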

  12. QTLMAS 2009: simulated dataset

    PubMed Central

    2010-01-01

    Background: The simulation of the data for the QTLMAS 2009 Workshop is described. The objective was to simulate observations from a growth curve influenced by a number of QTL. Results: The data consisted of markers, phenotypes and pedigree. Genotypes of 453 markers, distributed over 5 chromosomes of 1 Morgan each, were simulated for 2,025 individuals. Of those, 25 individuals were parents of the other 2,000 individuals. The 25 parents were genetically related. Phenotypes were simulated according to a logistic growth curve and were made available for 1,000 of the 2,000 offspring individuals. The logistic growth curve was specified by three parameters. Each parameter was influenced by six quantitative trait loci (QTL), positioned on the five chromosomes. For each parameter, one QTL had a large effect and five QTL had small effects. The variance of a large QTL was five times the variance of a small QTL. The simulated data were made available at http://www.qtlmas2009.wur.nl/UK/Dataset/. PMID:20380757
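    A minimal sketch of such a simulation design: three logistic-curve parameters, each a baseline plus the sum of six QTL effects, with the large QTL's effect scaled by sqrt(5) so that its variance is five times that of a small QTL. The baselines, effect scales and noise level are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
n_ind, n_qtl = 2000, 6

def logistic(t, a, b, c):
    """Logistic growth curve: a / (1 + exp(-b * (t - c)))."""
    return a / (1.0 + np.exp(-b * (t - c)))

# 0/1/2 genotype dosages at six QTL for each of the three parameters;
# the first QTL is "large": variance 5x a small QTL, so its effect
# size is scaled by sqrt(5)
geno = rng.integers(0, 3, size=(n_ind, 3, n_qtl))
effects = np.array([np.sqrt(5.0), 1, 1, 1, 1, 1]) * 0.2

base = np.array([100.0, 0.15, 25.0])    # asymptote, rate, inflection
scale = np.array([5.0, 0.005, 0.5])     # effect scale per parameter
params = base + (geno * effects).sum(axis=2) * scale

t = np.arange(0, 60, 5, dtype=float)    # measurement ages
phenotypes = np.array([logistic(t, *p) for p in params])
phenotypes += rng.normal(0.0, 2.0, phenotypes.shape)  # residual noise
print(phenotypes.shape)                 # (2000, 12)
```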

  13. Development of an Independent Global Land Cover Validation Dataset

    NASA Astrophysics Data System (ADS)

    Sulla-Menashe, D. J.; Olofsson, P.; Woodcock, C. E.; Holden, C.; Metcalfe, M.; Friedl, M. A.; Stehman, S. V.; Herold, M.; Giri, C.

    2012-12-01

    Accurate information on the global distribution and dynamics of land cover is critical for a large number of global change science questions. A growing number of land cover products have been produced at regional to global scales, but the uncertainty in these products and the relative strengths and weaknesses among available products are poorly characterized. To address this limitation, we are compiling a database of high spatial resolution imagery to support international land cover validation studies. Validation sites were selected based on a probability sample, and may therefore be used to estimate statistically defensible accuracy statistics and associated standard errors. Validation site locations were identified using a stratified random design based on 21 strata derived from an intersection of Köppen climate classes and a population density layer. In this way, the two major sources of global variation in land cover (climate and human activity) are explicitly included in the stratification scheme. At each site we are acquiring high spatial resolution (< 1 m) satellite imagery for 5-km x 5-km blocks. The response design uses an object-oriented hierarchical legend that is compatible with the UN FAO Land Cover Classification System. Using this response design, we are classifying each site using a semi-automated algorithm that blends image segmentation with a supervised RandomForest classification algorithm. In the long run, the validation site database is designed to support international efforts to validate land cover products. To illustrate, we use the site database to validate the MODIS Collection 4 Land Cover product, providing a prototype for validating the VIIRS Surface Type Intermediate Product, scheduled to start operational production early in 2013. As part of our analysis we evaluate sources of error in coarse resolution products, including semantic issues related to the class definitions, mixed pixels, and poor spectral separation between classes.
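    A minimal sketch of the stratified random site selection, assuming pandas is available; the sampling frame, stratum labels and per-stratum sample size are hypothetical.

```python
import pandas as pd

# Hypothetical sampling frame: candidate 5-km x 5-km blocks, each tagged
# with one of 21 strata (climate class x population density)
frame = pd.DataFrame({
    "block_id": range(10_000),
    "stratum": [f"S{i % 21}" for i in range(10_000)],
})

# Stratified random design: a simple random sample within every stratum
sites = frame.groupby("stratum").sample(n=25, random_state=1)
print(sites["stratum"].value_counts().head())
```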

  14. National Elevation Dataset

    USGS Publications Warehouse

    ,

    2002-01-01

    The National Elevation Dataset (NED) is a new raster product assembled by the U.S. Geological Survey. NED is designed to provide national elevation data in a seamless form with a consistent datum, elevation unit, and projection. Data corrections were made in the NED assembly process to minimize artifacts, perform edge matching, and fill sliver areas of missing data. NED has a resolution of one arc-second (approximately 30 meters) for the conterminous United States, Hawaii, Puerto Rico and the island territories, and a resolution of two arc-seconds for Alaska. NED data sources have a variety of elevation units, horizontal datums, and map projections. In the NED assembly process the elevation values are converted to decimal meters as a consistent unit of measure, NAD83 is used consistently as the horizontal datum, and all the data are recast in a geographic projection. Older DEMs produced by methods that are now obsolete have been filtered during the NED assembly process to minimize artifacts that are commonly found in data produced by these methods. Artifact removal greatly improves the quality of the slope, shaded-relief, and synthetic drainage information that can be derived from the elevation data. Figure 2 illustrates the results of this artifact removal filtering. NED processing also includes steps to adjust values where adjacent DEMs do not match well, and to fill sliver areas of missing data between DEMs. These processing steps ensure that NED has no void areas, and artificial discontinuities have been minimized. The artifact removal filtering process does not eliminate all of the artifacts. In areas where the only available DEM was produced by older methods, "striping" may still occur.

  15. Converting Static Image Datasets to Spiking Neuromorphic Datasets Using Saccades

    PubMed Central

    Orchard, Garrick; Jayawant, Ajinkya; Cohen, Gregory K.; Thakor, Nitish

    2015-01-01

    Creating datasets for Neuromorphic Vision is a challenging task. A lack of available recordings from Neuromorphic Vision sensors means that data must typically be recorded specifically for dataset creation rather than collecting and labeling existing data. The task is further complicated by a desire to simultaneously provide traditional frame-based recordings to allow for direct comparison with traditional Computer Vision algorithms. Here we propose a method for converting existing Computer Vision static image datasets into Neuromorphic Vision datasets using an actuated pan-tilt camera platform. Moving the sensor rather than the scene or image is a more biologically realistic approach to sensing and eliminates timing artifacts introduced by monitor updates when simulating motion on a computer monitor. We present conversion of two popular image datasets (MNIST and Caltech101) which have played important roles in the development of Computer Vision, and we provide performance metrics on these datasets using spike-based recognition algorithms. This work contributes datasets for future use in the field, as well as results from spike-based algorithms against which future works can compare. Furthermore, by converting datasets already popular in Computer Vision, we enable more direct comparison with frame-based approaches. PMID:26635513

  16. Genomics dataset of unidentified disclosed isolates.

    PubMed

    Rekadwad, Bhagwan N

    2016-09-01

    Analysis of DNA sequences is necessary for the higher hierarchical classification of organisms. It gives clues about the characteristics of organisms and their taxonomic position. This dataset was chosen to find complexities in the unidentified DNA disclosed in patents. A total of 17 unidentified DNA sequences were thoroughly analyzed. Quick response (QR) codes were generated, and the AT/GC content of the DNA sequences was analyzed. The QR codes are helpful for quick identification of isolates, and the AT/GC content is helpful for studying their stability at different temperatures. Additionally, a dataset of cleavage codes and enzyme codes from a restriction digestion study is reported, which is helpful for performing studies using short DNA sequences. The dataset disclosed here provides new data for the exploration of unique DNA sequences for evaluation, identification, comparison and analysis.
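    A minimal sketch of the AT/GC content calculation described above; the example sequence is arbitrary, and the commented QR-code step assumes the third-party 'qrcode' package.

```python
def at_gc_content(seq):
    """Return (AT%, GC%) for a DNA sequence."""
    seq = seq.upper()
    at = sum(seq.count(b) for b in "AT")
    gc = sum(seq.count(b) for b in "GC")
    total = at + gc
    return 100.0 * at / total, 100.0 * gc / total

print(at_gc_content("ATGCGCGTATTAGC"))   # -> (50.0, 50.0)

# QR code generation (assumes the third-party 'qrcode' package):
# import qrcode
# qrcode.make("ATGCGCGTATTAGC").save("isolate_01.png")
```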

  17. Independent Contributions of the Central Executive, Intelligence, and In-Class Attentive Behavior to Developmental Change in the Strategies Used to Solve Addition Problems

    ERIC Educational Resources Information Center

    Geary, David C.; Hoard, Mary K.; Nugent, Lara

    2012-01-01

    Children's (N = 275) use of retrieval, decomposition (e.g., 7 = 4+3 and thus 6+7 = 6+4+3), and counting to solve addition problems was longitudinally assessed from first grade to fourth grade, and intelligence, working memory, and in-class attentive behavior were assessed in one or several grades. The goal was to assess the relation between…

  18. Five year global dataset: NMC operational analyses (1978 to 1982)

    NASA Technical Reports Server (NTRS)

    Straus, David; Ardizzone, Joseph

    1987-01-01

    This document describes procedures used in assembling a five-year dataset (1978 to 1982) from NMC Operational Analysis data. These procedures entailed replacing missing and unacceptable data in order to arrive at a complete dataset that is continuous in time. In addition, a subjective assessment of the integrity of all data (both preliminary and final) is presented. Documentation on the tapes comprising the Five Year Global Dataset is also included.

  19. High-level exogenous glutamic acid-independent production of poly-(γ-glutamic acid) with organic acid addition in a new isolated Bacillus subtilis C10.

    PubMed

    Zhang, Huili; Zhu, Jianzhong; Zhu, Xiangcheng; Cai, Jin; Zhang, Anyi; Hong, Yizhi; Huang, Jin; Huang, Lei; Xu, Zhinan

    2012-07-01

    A new exogenous glutamic acid-independent γ-PGA-producing strain was isolated and characterized as Bacillus subtilis C10. The factors influencing the endogenous glutamic acid supply and the biosynthesis of γ-PGA in this strain were investigated. The results indicated that citric acid and oxalic acid showed a significant capability to support the overproduction of γ-PGA. This stimulated increase of γ-PGA biosynthesis by citric acid or oxalic acid was further confirmed in a 10 L fermentor. To understand the possible mechanism contributing to the improved γ-PGA production, the activities of four key intracellular enzymes were measured, and the possible carbon fluxes were proposed. The results indicated that the enhanced level of pyruvate dehydrogenase (PDH) activity caused by oxalic acid was important for glutamic acid synthesized de novo from glucose. Moreover, isocitrate dehydrogenase (ICDH) and glutamate dehydrogenase (GDH) were positive regulators of glutamic acid biosynthesis, while the 2-oxoglutarate dehydrogenase complex (ODHC) was a negative one.

  20. Comparing methods of analysing datasets with small clusters: case studies using four paediatric datasets.

    PubMed

    Marston, Louise; Peacock, Janet L; Yu, Keming; Brocklehurst, Peter; Calvert, Sandra A; Greenough, Anne; Marlow, Neil

    2009-07-01

    Studies of prematurely born infants contain a relatively large percentage of multiple births, so the resulting data have a hierarchical structure with small clusters of size 1, 2 or 3. Ignoring the clustering may lead to incorrect inferences. The aim of this study was to compare statistical methods which can be used to analyse such data: generalised estimating equations, multilevel models, multiple linear regression and logistic regression. Four datasets which differed in total size and in percentage of multiple births (n = 254, multiple 18%; n = 176, multiple 9%; n = 10 098, multiple 3%; n = 1585, multiple 8%) were analysed. With the continuous outcome, two-level models produced similar results in the larger dataset, while generalised least squares multilevel modelling (ML GLS 'xtreg' in Stata) and maximum likelihood multilevel modelling (ML MLE 'xtmixed' in Stata) produced divergent estimates using the smaller dataset. For the dichotomous outcome, most methods, except multilevel modelling using Gauss-Hermite quadrature (ML GH 'xtlogit' in Stata), gave similar odds ratios and 95% confidence intervals within datasets. For the continuous outcome, our results suggest using multilevel modelling. We conclude that generalised least squares multilevel modelling (ML GLS 'xtreg' in Stata) and maximum likelihood multilevel modelling (ML MLE 'xtmixed' in Stata) should be used with caution when the dataset is small. Where the outcome is dichotomous and there is a relatively large percentage of non-independent data, it is recommended that these are accounted for in analyses using logistic regression with adjusted standard errors or multilevel modelling. If, however, the dataset has a small percentage of clusters greater than size 1 (e.g. a population dataset of children where there are few multiples) there appears to be less need to adjust for clustering.
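    A minimal sketch of the comparison on synthetic data, using statsmodels as a stand-in for the Stata commands named above: an ordinary regression that ignores the birth clusters versus a random-intercept multilevel model. The simulated cluster sizes and effect values are assumptions.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)

# Synthetic infant data: clusters of size 1-3 (singletons, twins, triplets)
sizes = rng.choice([1, 2, 3], size=120, p=[0.85, 0.12, 0.03])
cluster = np.repeat(np.arange(len(sizes)), sizes)
u = np.repeat(rng.normal(0, 1.0, len(sizes)), sizes)   # shared birth effect
gestation = rng.normal(30, 2.5, len(cluster))
weight = 1.2 + 0.08 * gestation + u + rng.normal(0, 0.3, len(cluster))
df = pd.DataFrame({"weight": weight, "gestation": gestation,
                   "cluster": cluster})

# Ordinary regression ignores the clustering...
ols = smf.ols("weight ~ gestation", df).fit()
# ...a multilevel model gives each birth cluster a random intercept
mlm = smf.mixedlm("weight ~ gestation", df, groups=df["cluster"]).fit()
print(ols.bse["gestation"], mlm.bse["gestation"])
```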

  1. Genomic Datasets for Cancer Research

    Cancer.gov

    A variety of datasets from genome-wide association studies of cancer and other genotype-phenotype studies, including sequencing and molecular diagnostic assays, are available to approved investigators through the Extramural National Cancer Institute Data Access Committee.

  2. NHDPlus (National Hydrography Dataset Plus)

    EPA Pesticide Factsheets

    NHDPlus is a geospatial, hydrologic framework dataset that is intended for use by geospatial analysts and modelers to support water resources related applications. NHDPlus was developed by the USEPA in partnership with the U.S. Geological Survey.

  3. Preliminary AirMSPI Datasets

    Atmospheric Science Data Center

    2016-12-06

    The data files available through this web page and FTP links are preliminary ... geometric corrections. Caution should be used for science analysis. At a later date, more qualified versions will be made public.

  4. 77 FR 15052 - Dataset Workshop-U.S. Billion Dollar Disasters Dataset (1980-2011): Assessing Dataset Strengths...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-03-14

    Dataset Workshop - U.S. Billion Dollar Disasters Dataset (1980-2011): Assessing Dataset Strengths and Weaknesses for a Pathway to an Improved Dataset. The purpose of this meeting is to identify strengths and weaknesses of the current dataset and related methodology.

  5. Providing Geographic Datasets as Linked Data in Sdi

    NASA Astrophysics Data System (ADS)

    Hietanen, E.; Lehto, L.; Latvala, P.

    2016-06-01

    In this study, a prototype service to provide data from a Web Feature Service (WFS) as linked data is implemented. First, persistent and unique Uniform Resource Identifiers (URIs) are created for all spatial objects in the dataset. The objects are available from those URIs in the Resource Description Framework (RDF) data format. Next, a Web Ontology Language (OWL) ontology is created to describe the dataset information content using the Open Geospatial Consortium's (OGC) GeoSPARQL vocabulary. The existing data model is modified in order to take into account the linked data principles. The implemented service produces an HTTP response dynamically. The data for the response are first fetched from the existing WFS. Then the Geography Markup Language (GML) output of the WFS is transformed on the fly to the RDF format. Content negotiation is used to serve the data in different RDF serialization formats. This solution facilitates the use of a dataset in different applications without replicating the whole dataset. In addition, individual spatial objects in the dataset can be referred to with URIs. Furthermore, the needed information content of the objects can be easily extracted from the RDF serializations available from those URIs. A solution for linking data objects to the dataset URI is also introduced by using the Vocabulary of Interlinked Datasets (VoID). The dataset is divided into subsets and each subset is given its own persistent and unique URI. This enables the whole dataset to be explored with a web browser and all individual objects to be indexed by search engines.
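    A minimal sketch of the RDF side of such a service, assuming the rdflib package is available: mint a URI for one spatial object, attach a GeoSPARQL WKT geometry, and pick the serialization from an Accept header. The URIs and the coordinate are hypothetical.

```python
from rdflib import Graph, Literal, Namespace, RDF, URIRef

GEO = Namespace("http://www.opengis.net/ont/geosparql#")
SF = Namespace("http://www.opengis.net/ont/sf#")

g = Graph()
g.bind("geo", GEO)

# Hypothetical persistent URI minted for one spatial object
feature = URIRef("http://data.example.fi/feature/12345")
geom = URIRef("http://data.example.fi/feature/12345/geometry")

g.add((feature, RDF.type, GEO.Feature))
g.add((feature, GEO.hasGeometry, geom))
g.add((geom, RDF.type, SF.Point))
g.add((geom, GEO.asWKT,
       Literal("POINT(24.94 60.17)", datatype=GEO.wktLiteral)))

# Crude content negotiation: choose the serialization per Accept header
FORMATS = {"text/turtle": "turtle", "application/rdf+xml": "xml",
           "application/ld+json": "json-ld"}
accept = "text/turtle"
print(g.serialize(format=FORMATS.get(accept, "turtle")))
```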

  6. The Harvard organic photovoltaic dataset.

    PubMed

    Lopez, Steven A; Pyzer-Knapp, Edward O; Simm, Gregor N; Lutzow, Trevor; Li, Kewei; Seress, Laszlo R; Hachmann, Johannes; Aspuru-Guzik, Alán

    2016-09-27

    The Harvard Organic Photovoltaic Dataset (HOPV15) presented in this work is a collation of experimental photovoltaic data from the literature, and corresponding quantum-chemical calculations performed over a range of conformers, each with quantum chemical results using a variety of density functionals and basis sets. It is anticipated that this dataset will be of use in both relating electronic structure calculations to experimental observations through the generation of calibration schemes, as well as for the creation of new semi-empirical methods and the benchmarking of current and future model chemistries for organic electronic applications.

  7. The Harvard organic photovoltaic dataset

    NASA Astrophysics Data System (ADS)

    Lopez, Steven A.; Pyzer-Knapp, Edward O.; Simm, Gregor N.; Lutzow, Trevor; Li, Kewei; Seress, Laszlo R.; Hachmann, Johannes; Aspuru-Guzik, Alán

    2016-09-01

    The Harvard Organic Photovoltaic Dataset (HOPV15) presented in this work is a collation of experimental photovoltaic data from the literature, and corresponding quantum-chemical calculations performed over a range of conformers, each with quantum chemical results using a variety of density functionals and basis sets. It is anticipated that this dataset will be of use in both relating electronic structure calculations to experimental observations through the generation of calibration schemes, as well as for the creation of new semi-empirical methods and the benchmarking of current and future model chemistries for organic electronic applications.

  8. The Harvard organic photovoltaic dataset

    PubMed Central

    Lopez, Steven A.; Pyzer-Knapp, Edward O.; Simm, Gregor N.; Lutzow, Trevor; Li, Kewei; Seress, Laszlo R.; Hachmann, Johannes; Aspuru-Guzik, Alán

    2016-01-01

    The Harvard Organic Photovoltaic Dataset (HOPV15) presented in this work is a collation of experimental photovoltaic data from the literature, and corresponding quantum-chemical calculations performed over a range of conformers, each with quantum chemical results using a variety of density functionals and basis sets. It is anticipated that this dataset will be of use in both relating electronic structure calculations to experimental observations through the generation of calibration schemes, as well as for the creation of new semi-empirical methods and the benchmarking of current and future model chemistries for organic electronic applications. PMID:27676312

  9. An integrated pan-tropical biomass map using multiple reference datasets.

    PubMed

    Avitabile, Valerio; Herold, Martin; Heuvelink, Gerard B M; Lewis, Simon L; Phillips, Oliver L; Asner, Gregory P; Armston, John; Ashton, Peter S; Banin, Lindsay; Bayol, Nicolas; Berry, Nicholas J; Boeckx, Pascal; de Jong, Bernardus H J; DeVries, Ben; Girardin, Cecile A J; Kearsley, Elizabeth; Lindsell, Jeremy A; Lopez-Gonzalez, Gabriela; Lucas, Richard; Malhi, Yadvinder; Morel, Alexandra; Mitchard, Edward T A; Nagy, Laszlo; Qie, Lan; Quinones, Marcela J; Ryan, Casey M; Ferry, Slik J W; Sunderland, Terry; Laurin, Gaia Vaglio; Gatti, Roberto Cazzolla; Valentini, Riccardo; Verbeeck, Hans; Wijaya, Arief; Willcock, Simon

    2016-04-01

    We combined two existing datasets of vegetation aboveground biomass (AGB) (Proceedings of the National Academy of Sciences of the United States of America, 108, 2011, 9899; Nature Climate Change, 2, 2012, 182) into a pan-tropical AGB map at 1-km resolution using an independent reference dataset of field observations and locally calibrated high-resolution biomass maps, harmonized and upscaled to 14 477 1-km AGB estimates. Our data fusion approach uses bias removal and weighted linear averaging that incorporates and spatializes the biomass patterns indicated by the reference data. The method was applied independently in areas (strata) with homogeneous error patterns of the input (Saatchi and Baccini) maps, which were estimated from the reference data and additional covariates. Based on the fused map, we estimated AGB stock for the tropics (23.4°N-23.4°S) of 375 Pg dry mass, 9-18% lower than the Saatchi and Baccini estimates. The fused map also showed differing spatial patterns of AGB over large areas, with higher AGB density in the dense forest areas in the Congo basin, Eastern Amazon and South-East Asia, and lower values in Central America and in most dry vegetation areas of Africa than either of the input maps. The validation exercise, based on 2118 estimates from the reference dataset not used in the fusion process, showed that the fused map had a RMSE 15-21% lower than that of the input maps and, most importantly, nearly unbiased estimates (mean bias 5 Mg dry mass ha(-1) vs. 21 and 28 Mg ha(-1) for the input maps). The fusion method can be applied at any scale including the policy-relevant national level, where it can provide improved biomass estimates by integrating existing regional biomass maps as input maps and additional, country-specific reference datasets.
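    A minimal sketch of bias removal followed by inverse-variance weighted averaging within a single stratum; the synthetic maps, error models and reference sample are assumptions, and the paper's stratification and covariates are not reproduced.

```python
import numpy as np

def fuse_maps(map_a, map_b, ref_a_err, ref_b_err):
    """Bias-correct two AGB maps and average them with weights
    inversely proportional to their error variance (one stratum)."""
    bias_a, bias_b = ref_a_err.mean(), ref_b_err.mean()
    var_a, var_b = ref_a_err.var(), ref_b_err.var()
    w_a = (1.0 / var_a) / (1.0 / var_a + 1.0 / var_b)
    return w_a * (map_a - bias_a) + (1.0 - w_a) * (map_b - bias_b)

rng = np.random.default_rng(5)
truth = rng.uniform(50, 400, (100, 100))           # synthetic AGB, Mg/ha
map_a = truth + rng.normal(20, 40, truth.shape)    # biased, noisy input A
map_b = truth + rng.normal(-10, 25, truth.shape)   # input B

# Errors of each map at reference locations (here: a pixel subsample)
idx = rng.integers(0, 100, (2, 500))
ref_a_err = (map_a - truth)[idx[0], idx[1]]
ref_b_err = (map_b - truth)[idx[0], idx[1]]

fused = fuse_maps(map_a, map_b, ref_a_err, ref_b_err)
print(np.abs(fused - truth).mean(), np.abs(map_a - truth).mean())
```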

  10. Querying Large Biological Network Datasets

    ERIC Educational Resources Information Center

    Gulsoy, Gunhan

    2013-01-01

    New experimental methods have resulted in an increasing amount of genetic interaction data being generated every day. Biological networks are used to store the genetic interaction data gathered. The increasing amount of available data requires fast, large-scale analysis methods. Therefore, we address the problem of querying large biological network datasets.…

  11. Fragment Prioritization on a Large Mutagenicity Dataset.

    PubMed

    Floris, Matteo; Raitano, Giuseppa; Medda, Ricardo; Benfenati, Emilio

    2016-12-29

    The identification of structural alerts is one of the simplest tools used for the identification of potentially toxic chemical compounds. Structural alerts have served as an aid to quickly identify chemicals that should be either prioritized for testing or eliminated from further consideration and use. In recent years, the availability of larger datasets, often growing in the context of collaborative efforts and competitions, has created the raw material needed to identify new and more accurate structural alerts. This work applied a method to efficiently mine a large toxicological dataset for structural alerts showing a strong statistical association with mutagenicity. In detail, we processed a large Ames mutagenicity dataset comprising 14,015 unique molecules obtained by joining different data sources. After correction for multiple testing, we were able to assign a probability value to each fragment. A total of 51 rules were identified with p-value < 0.05. Using the same method, we also confirmed the statistical significance of several mutagenicity rules already present and largely recognized in the literature. In addition, we extended the application of our method by predicting the mutagenicity of an external dataset.
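    A minimal sketch of scoring fragments for association with mutagenicity: a 2x2 contingency table per fragment, Fisher's exact test, and a multiple-testing correction (Benjamini-Hochberg is used here as one standard choice; the paper's exact procedure is not reproduced). The counts and fragment names are toy values.

```python
from scipy.stats import fisher_exact
from statsmodels.stats.multitest import multipletests

# Toy counts per fragment: [[mutagenic & has fragment, mutagenic & no],
#                           [non-mutagenic & has,      non-mutagenic & no]]
fragments = {
    "nitro_aromatic": [[120, 4880], [30, 8985]],
    "aromatic_amine": [[90, 4910], [40, 8975]],
    "alkyl_chain":    [[55, 4945], [95, 8920]],
}

pvals = {name: fisher_exact(table, alternative="greater")[1]
         for name, table in fragments.items()}

# Multiple-testing correction over all candidate fragments
reject, p_adj, _, _ = multipletests(list(pvals.values()), alpha=0.05,
                                    method="fdr_bh")
for (name, p), padj, hit in zip(pvals.items(), p_adj, reject):
    print(f"{name}: p={p:.2e} adj={padj:.2e} alert={hit}")
```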

  12. Development of a global historic monthly mean precipitation dataset

    NASA Astrophysics Data System (ADS)

    Yang, Su; Xu, Wenhui; Xu, Yan; Li, Qingxiang

    2016-04-01

    A global historic precipitation dataset is the basis for climate and water cycle research. Several global historic land surface precipitation datasets have been developed by international data centers such as the US National Climatic Data Center (NCDC), the European Climate Assessment & Dataset project team, the Met Office, etc., but so far no such dataset has been developed by any research institute in China. In addition, each dataset has its own focus of study region, and the existing global precipitation datasets contain only sparse observational stations over China, which may result in uncertainties in East Asian precipitation studies. In order to take into account comprehensive historic information, users might need to employ two or more datasets. However, the non-uniform data formats, data units, station IDs, and so on add extra difficulties for users exploiting these datasets. For this reason, a complete historic precipitation dataset that takes advantage of various datasets has been developed and produced in the National Meteorological Information Center of China. Precipitation observations from 12 sources are aggregated, and the data formats, data units, and station IDs are unified. Duplicated stations with the same ID are identified, and duplicated observations removed. A consistency test, correlation coefficient test, significance t-test at the 95% confidence level, and significance F-test at the 95% confidence level are conducted first to ensure data reliability. Only those datasets that satisfy all four criteria are integrated to produce the China Meteorological Administration global precipitation (CGP) historic precipitation dataset version 1.0. It contains observations at 31 thousand stations with 1.87 × 10⁷ data records, among which 4152 time series of precipitation are longer than 100 yr. This dataset plays a critical role in climate research due to its advantages in large data volume and high density of station network, compared to

  13. NP_PAH_interaction dataset

    EPA Pesticide Factsheets

    Concentrations of different polyaromatic hydrocarbons in water before and after interaction with nanomaterials. The results show the capacity of engineered nanomaterials for adsorbing different organic pollutants. This dataset is associated with the following publication: Sahle-Demessie, E., A. Zhao, C. Han, B. Hann, and H. Grecsek. Interaction of engineered nanomaterials with hydrophobic organic pollutants. Journal of Nanotechnology. Hindawi Publishing Corporation, New York, NY, USA, 27(28): 284003, (2016).

  14. Geospatial datasets for watershed delineation and characterization used in the Hawaii StreamStats web application

    USGS Publications Warehouse

    Rea, Alan; Skinner, Kenneth D.

    2012-01-01

    The U.S. Geological Survey Hawaii StreamStats application uses an integrated suite of raster and vector geospatial datasets to delineate and characterize watersheds. The geospatial datasets used to delineate and characterize watersheds on the StreamStats website, and the methods used to develop the datasets are described in this report. The datasets for Hawaii were derived primarily from 10 meter resolution National Elevation Dataset (NED) elevation models, and the National Hydrography Dataset (NHD), using a set of procedures designed to enforce the drainage pattern from the NHD into the NED, resulting in an integrated suite of elevation-derived datasets. Additional sources of data used for computing basin characteristics include precipitation, land cover, soil permeability, and elevation-derivative datasets. The report also includes links for metadata and downloads of the geospatial datasets.

  15. Binary classification of imbalanced datasets using conformal prediction.

    PubMed

    Norinder, Ulf; Boyer, Scott

    2017-03-01

    Aggregated Conformal Prediction is used as an effective alternative to other, more complicated and/or ambiguous methods involving various balancing measures when modelling severely imbalanced datasets. Explicit balancing measures beyond those already a part of the Conformal Prediction framework are shown not to be required. The Aggregated Conformal Prediction procedure appears to be a promising approach for severely imbalanced datasets in order to retrieve a large majority of active minority-class compounds while avoiding information loss or distortion.
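    A minimal sketch of a single inductive conformal predictor with class-conditional (Mondrian) calibration, the ingredient that handles imbalance; the aggregated variant repeats this over several random splits and averages the p-values. This is our own construction for illustration, not the authors' code.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)

# Imbalanced synthetic set: roughly 5% actives (class 1)
X = rng.normal(size=(2000, 10))
y = (X[:, 0] + rng.normal(0, 1, 2000) > 2.2).astype(int)

# Split off a calibration set; train on the rest
idx = rng.permutation(len(y))
cal, tr = idx[:400], idx[400:]
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X[tr], y[tr])

# Class-conditional (Mondrian) nonconformity: 1 - P(class), per class
cal_prob = clf.predict_proba(X[cal])

def p_value(x, label):
    """Conformal p-value of `label` for one test example."""
    alpha_cal = 1.0 - cal_prob[y[cal] == label, label]  # calibration scores
    alpha_new = 1.0 - clf.predict_proba(x[None, :])[0, label]
    return (np.sum(alpha_cal >= alpha_new) + 1) / (len(alpha_cal) + 1)

x_test = X[0]
print({c: round(p_value(x_test, c), 3) for c in (0, 1)})
```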

  16. Are Independent Probes Truly Independent?

    ERIC Educational Resources Information Center

    Camp, Gino; Pecher, Diane; Schmidt, Henk G.; Zeelenberg, Rene

    2009-01-01

    The independent cue technique has been developed to test traditional interference theories against inhibition theories of forgetting. In the present study, the authors tested the critical criterion for the independence of independent cues: Studied cues not presented during test (and unrelated to test cues) should not contribute to the retrieval…

  17. The new Planetary Science Archive (PSA): Exploration and discovery of scientific datasets from ESA's planetary missions

    NASA Astrophysics Data System (ADS)

    Martinez, Santa; Besse, Sebastien; Heather, Dave; Barbarisi, Isa; Arviset, Christophe; De Marchi, Guido; Barthelemy, Maud; Docasal, Ruben; Fraga, Diego; Grotheer, Emmanuel; Lim, Tanya; Macfarlane, Alan; Rios, Carlos; Vallejo, Fran; Saiz, Jaime; ESDC (European Space Data Centre) Team

    2016-10-01

    The Planetary Science Archive (PSA) is the European Space Agency's (ESA) repository of science data from all planetary science and exploration missions. The PSA provides access to scientific datasets through various interfaces at http://archives.esac.esa.int/psa. All datasets are scientifically peer-reviewed by independent scientists, and are compliant with the Planetary Data System (PDS) standards. The PSA is currently implementing a number of significant improvements, mostly driven by the evolution of the PDS standard and the growing need for better interfaces and advanced applications to support science exploitation. The newly designed PSA will enhance the user experience and will significantly reduce the complexity for users to find their data, promoting one-click access to the scientific datasets with more specialised views when needed. This includes better integration with planetary GIS analysis tools and planetary interoperability services (search and retrieve data, supporting e.g. PDAP, EPN-TAP). It will also be kept up to date with versions 3 and 4 of the PDS standards, as PDS4 will be used for ESA's ExoMars and upcoming BepiColombo missions. Users will have direct access to documentation, information and tools that are relevant to the scientific use of the dataset, including ancillary datasets, Software Interface Specification (SIS) documents, and any tools/help that the PSA team can provide. A login mechanism will provide additional functionalities to the users to aid/ease their searches (e.g. saving queries, managing default views). This contribution will introduce the new PSA, its key features and access interfaces.

  18. Independence test for sparse data

    NASA Astrophysics Data System (ADS)

    García, J. E.; González-López, V. A.

    2016-06-01

    In this paper a new non-parametric independence test is presented. García and González-López (2014) [1] introduced the LIS test for the hypothesis of independence between two continuous random variables; the test proposed in this work is a generalization of the LIS test. The new test does not require the assumption of continuity for the random variables. The test is applied to two datasets and also compared with Pearson's Chi-squared test.
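    For comparison, the Pearson Chi-squared test mentioned above can be applied to two samples by discretizing them into a contingency table; a minimal sketch with scipy (the bin count is an arbitrary choice):

```python
import numpy as np
from scipy.stats import chi2_contingency

# Discretize two samples into a contingency table and test independence
rng = np.random.default_rng(4)
x = rng.normal(size=500)
y = 0.5 * x + rng.normal(size=500)       # dependent by construction

table, _, _ = np.histogram2d(x, y, bins=4)
chi2, p, dof, _ = chi2_contingency(table)
print(f"chi2={chi2:.1f}, dof={dof}, p={p:.2e}")
```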

  19. ISRUC-Sleep: A comprehensive public dataset for sleep researchers.

    PubMed

    Khalighi, Sirvan; Sousa, Teresa; Santos, José Moutinho; Nunes, Urbano

    2016-02-01

    To facilitate the performance comparison of new methods for sleep pattern analysis, publicly available datasets with quality content are very important and useful. We introduce an open-access comprehensive sleep dataset, called ISRUC-Sleep. The data were obtained from human adults, including healthy subjects, subjects with sleep disorders, and subjects under the effect of sleep medication. Each recording was randomly selected among PSG recordings that were acquired by the Sleep Medicine Centre of the Hospital of Coimbra University (CHUC). The dataset comprises three groups of data: (1) data concerning 100 subjects, with one recording session per subject; (2) data gathered from 8 subjects, with two recording sessions per subject; and (3) data collected from one recording session related to 10 healthy subjects. The polysomnography (PSG) recordings associated with each subject were visually scored by two human experts. Compared with the existing sleep-related public datasets, ISRUC-Sleep provides data for a reasonable number of subjects with different characteristics, such as data useful for studies involving changes in the PSG signals over time, and data on healthy subjects useful for studies comparing healthy subjects with patients suffering from sleep disorders. This dataset was created to complement existing datasets by providing easy-to-apply data collection with some characteristics not yet covered. ISRUC-Sleep can be useful for the analysis of new contributions: (i) in biomedical signal processing; (ii) in the development of ASSC methods; and (iii) in sleep physiology studies. To evaluate and compare new contributions that use this dataset as a benchmark, results of applying a subject-independent automatic sleep stage classification (ASSC) method on the ISRUC-Sleep dataset are presented.

  20. Viability of Controlling Prosthetic Hand Utilizing Electroencephalograph (EEG) Dataset Signal

    NASA Astrophysics Data System (ADS)

    Miskon, Azizi; A/L Thanakodi, Suresh; Raihan Mazlan, Mohd; Mohd Haziq Azhar, Satria; Nooraya Mohd Tawil, Siti

    2016-11-01

    This project presents the development of an artificial hand controlled by electroencephalograph (EEG) signal datasets for prosthetic applications. The EEG signal datasets were used to improve the control of the prosthetic hand compared with electromyography (EMG). EMG has disadvantages for a person who has not used the muscle for a long time, and also for persons with degenerative issues due to age. Thus, EEG datasets were found to be an alternative to EMG. The datasets used in this work were taken from a Brain Computer Interface (BCI) project and were already classified for open, close and combined movement operations. They served as input to control the prosthetic hand through an interface system between Microsoft Visual Studio and an Arduino. The obtained results reveal the prosthetic hand to be more efficient and faster in response to the EEG datasets, with an additional LiPo (lithium polymer) battery attached to the prosthetic. Some limitations were also identified in terms of the hand movements and the weight of the prosthetic, and suggestions for improvement are given in this paper. Overall, the objective of this paper was achieved, as the prosthetic hand was found to be feasible in operation utilizing the EEG datasets.
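    A minimal sketch of the serial hand-off from a classified EEG label to the Arduino, assuming the third-party pyserial package; the port name, baud rate and command bytes are hypothetical.

```python
import serial  # third-party 'pyserial' package (assumed installed)

# Map the BCI dataset's classified motions to one-byte commands
COMMANDS = {"open": b"O", "close": b"C", "combined": b"B"}

def send_motion(port, motion):
    """Send a classified EEG motion label to the Arduino over serial."""
    with serial.Serial(port, 9600, timeout=1) as link:
        link.write(COMMANDS[motion])

# send_motion("/dev/ttyUSB0", "open")   # hypothetical port name
```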

  1. A polymer dataset for accelerated property prediction and design

    PubMed Central

    Huan, Tran Doan; Mannodi-Kanakkithodi, Arun; Kim, Chiho; Sharma, Vinit; Pilania, Ghanshyam; Ramprasad, Rampi

    2016-01-01

    Emerging computation- and data-driven approaches are particularly useful for rationally designing materials with targeted properties. Generally, these approaches rely on identifying structure-property relationships by learning from a dataset of sufficiently large number of relevant materials. The learned information can then be used to predict the properties of materials not already in the dataset, thus accelerating the materials design. Herein, we develop a dataset of 1,073 polymers and related materials and make it available at http://khazana.uconn.edu/. This dataset is uniformly prepared using first-principles calculations with structures obtained either from other sources or by using structure search methods. Because the immediate target of this work is to assist the design of high dielectric constant polymers, it is initially designed to include the optimized structures, atomization energies, band gaps, and dielectric constants. It will be progressively expanded by accumulating new materials and including additional properties calculated for the optimized structures provided. PMID:26927478

  2. A polymer dataset for accelerated property prediction and design

    SciTech Connect

    Huan, Tran Doan; Mannodi-Kanakkithodi, Arun; Kim, Chiho; Sharma, Vinit; Pilania, Ghanshyam; Ramprasad, Rampi

    2016-03-01

    Emerging computation- and data-driven approaches are particularly useful for rationally designing materials with targeted properties. Generally, these approaches rely on identifying structure-property relationships by learning from a dataset of sufficiently large number of relevant materials. The learned information can then be used to predict the properties of materials not already in the dataset, thus accelerating the materials design. Herein, we develop a dataset of 1,073 polymers and related materials and make it available at http://khazana.uconn.edu/. This dataset is uniformly prepared using first-principles calculations with structures obtained either from other sources or by using structure search methods. Because the immediate target of this work is to assist the design of high dielectric constant polymers, it is initially designed to include the optimized structures, atomization energies, band gaps, and dielectric constants. As a result, it will be progressively expanded by accumulating new materials and including additional properties calculated for the optimized structures provided.

  3. A polymer dataset for accelerated property prediction and design.

    PubMed

    Huan, Tran Doan; Mannodi-Kanakkithodi, Arun; Kim, Chiho; Sharma, Vinit; Pilania, Ghanshyam; Ramprasad, Rampi

    2016-03-01

    Emerging computation- and data-driven approaches are particularly useful for rationally designing materials with targeted properties. Generally, these approaches rely on identifying structure-property relationships by learning from a dataset of sufficiently large number of relevant materials. The learned information can then be used to predict the properties of materials not already in the dataset, thus accelerating the materials design. Herein, we develop a dataset of 1,073 polymers and related materials and make it available at http://khazana.uconn.edu/. This dataset is uniformly prepared using first-principles calculations with structures obtained either from other sources or by using structure search methods. Because the immediate target of this work is to assist the design of high dielectric constant polymers, it is initially designed to include the optimized structures, atomization energies, band gaps, and dielectric constants. It will be progressively expanded by accumulating new materials and including additional properties calculated for the optimized structures provided.

  4. Forecasting medical waste generation using short and extra short datasets: Case study of Lithuania.

    PubMed

    Karpušenkaitė, Aistė; Ruzgas, Tomas; Denafas, Gintaras

    2016-04-01

    The aim of the study is to evaluate the performance of various mathematical modelling methods while forecasting medical waste generation using Lithuania's annual medical waste data. Only recently has a hazardous waste collection system that includes medical waste been created; therefore, the study's access to large sets of relevant data has been somewhat limited. Based on the data that could be obtained, it was decided to develop three short and extra-short datasets with 20, 10 and 6 observations. Spearman's correlation calculation showed that the influence of independent variables, such as visits to hospitals and other medical institutions, number of children in the region, number of beds in hospitals and other medical institutions, average life expectancy, and doctor's visits in that region, is the most consistent and common in all three datasets. Tests of the performance of artificial neural networks, multiple linear regression, partial least squares, support vector machines and four non-parametric regression methods were conducted on the collected datasets. The best and most promising results were demonstrated by generalised additive models (R² = 0.90455) in the regional data case, smoothing spline models (R² = 0.98584) in the long annual data case, and multilayer feedforward artificial neural networks in the short annual data case (R² = 0.61103).

  5. Historical Space Weather Datasets within NOAA

    NASA Astrophysics Data System (ADS)

    Denig, W. F.; Mabie, J. J.; Horan, K.; Clark, C.

    2013-12-01

    The National Geophysical Data Center (NGDC) is primarily responsible for scientific data stewardship of operational space weather data from NOAA's fleet of environmental satellites in geostationary and polar, low-earth orbits. In addition, as the former World Data Center for Solar Terrestrial Physics from 1957 to 2011, NGDC acquired a large variety of solar and space environmental data in differing formats, including paper records and film. Management of this heterogeneous collection of environmental data is a continued responsibility of NGDC as a participant in the new World Data System. Through the former NOAA Climate Data Modernization Program, many of these records were converted to digital format and are readily available online. However, reduced funding and staff have put a strain on NGDC's ability to effectively steward these historical datasets, some of which are unique and, in particular cases, were the basis of fundamental scientific breakthroughs in our understanding of the near-earth space environment. In this talk, I will provide an overview of the historical space weather datasets currently managed by NGDC and discuss strategies for preserving these data during these fiscally stressful times.

  6. Salam's independence

    NASA Astrophysics Data System (ADS)

    Fraser, Gordon

    2009-01-01

    In his kind review of my biography of the Nobel laureate Abdus Salam (December 2008 pp45-46), John W Moffat wrongly claims that Salam had "independently thought of the idea of parity violation in weak interactions".

  7. Data Integration for Heterogeneous Datasets

    PubMed Central

    2014-01-01

    More and more, the needs of data analysts are requiring the use of data outside the control of their own organizations. The increasing amount of data available on the Web, the new technologies for linking data across datasets, and the increasing need to integrate structured and unstructured data are all driving this trend. In this article, we provide a technical overview of the emerging “broad data” area, in which the variety of heterogeneous data being used, rather than the scale of the data being analyzed, is the limiting factor in data analysis efforts. The article explores some of the emerging themes in data discovery, data integration, linked data, and the combination of structured and unstructured data. PMID:25553272

  8. Inception of an Australian spine trauma registry: the minimum dataset.

    PubMed

    Tee, J W; Chan, C H P; Gruen, R L; Fitzgerald, M C B; Liew, S M; Cameron, P A; Rosenfeld, J V

    2012-06-01

    Background: The establishment of a spine trauma registry collecting both spine column and spinal cord data should improve the evidential basis for clinical decisions. This is a report on the pilot of a spine trauma registry, including development of a minimum dataset. Methods: A minimum dataset consisting of 56 data items was created using the modified Delphi technique. A pilot study was performed on 104 consecutive spine trauma patients recruited by the Victorian Orthopaedic Trauma Outcomes Registry (VOTOR). Data analysis and collection methodology were reviewed to determine feasibility. Results: Minimum dataset collection aided by a dataset dictionary was uncomplicated (average of 5 minutes per patient). Data analysis revealed three significant findings: (1) a peak in the 40 to 60 years age group; (2) premorbid functional independence in the majority of patients; and (3) a significant proportion being on antiplatelet or anticoagulation medications. Of the 141 traumatic spine fractures, the thoracolumbar segment was the most frequent site of injury. Most patients were neurologically intact (89%). Our study group had satisfactory 6-month patient-reported outcomes. Conclusion: The minimum dataset had high completion rates and was practical and feasible to collect. This pilot study is the basis for the development of a spine trauma registry at the Level 1 trauma center.

  9. Watershed Boundary Dataset for Mississippi

    USGS Publications Warehouse

    Wilson, K. Van; Clair, Michael G.; Turnipseed, D. Phil; Rebich, Richard A.

    2009-01-01

    The U.S. Geological Survey, in cooperation with the Mississippi Department of Environmental Quality, U.S. Department of Agriculture-Natural Resources Conservation Service, Mississippi Department of Transportation, U.S. Department of Agriculture-Forest Service, and the Mississippi Automated Resource Information System developed a 1:24,000-scale Watershed Boundary Dataset for Mississippi including watershed and subwatershed boundaries, codes, names, and areas. The Watershed Boundary Dataset for Mississippi provides a standard geographical framework for water-resources and selected land-resources planning. The original 8-digit subbasins (Hydrologic Unit Codes) were further subdivided into 10-digit watersheds (62.5 to 391 square miles (mi2)) and 12-digit subwatersheds (15.6 to 62.5 mi2) - the exceptions being the Delta part of Mississippi and the Mississippi River inside levees, which were subdivided into 10-digit watersheds only. Also, large water bodies in the Mississippi Sound along the coast were not delineated as small as a typical 12-digit subwatershed. All of the data - including watershed and subwatershed boundaries, subdivision codes and names, and drainage-area data - are stored in a Geographic Information System database, which is available at: http://ms.water.usgs.gov/. This map shows information on drainage and hydrography in the form of U.S. Geological Survey hydrologic unit boundaries for water-resource 2-digit regions, 4-digit subregions, 6-digit basins (formerly called accounting units), 8-digit subbasins (formerly called cataloging units), 10-digit watersheds, and 12-digit subwatersheds in Mississippi. A description of the project study area, methods used in the development of watershed and subwatershed boundaries for Mississippi, and results are presented in Wilson and others (2008). The data presented in this map and by Wilson and others (2008) supersede the data presented for Mississippi by Seaber and others (1987) and U.S. Geological Survey (1977).

  10. The Johns Hopkins University multimodal dataset for human action recognition

    NASA Astrophysics Data System (ADS)

    Murray, Thomas S.; Mendat, Daniel R.; Pouliquen, Philippe O.; Andreou, Andreas G.

    2015-05-01

    The Johns Hopkins University MultiModal Action (JHUMMA) dataset contains a set of twenty-one actions recorded with four sensor systems in three different modalities. The data was collected with a data acquisition system that includes three independent active sonar devices at three different frequencies and a Microsoft Kinect sensor that provides both RGB and Depth data. We have developed algorithms for human action recognition from active acoustics and provide benchmark baseline recognition performance results.

  11. A comprehensive polymer dataset for accelerated property prediction and design

    NASA Astrophysics Data System (ADS)

    Tran, Huan; Kumar Mannodi-Kanakkithodi, Arun; Kim, Chiho; Sharma, Vinit; Pilania, Ghanshyam; Ramprasad, Rampi

    Emerging computation- and data-driven approaches are particularly useful for rationally designing materials with targeted properties. In principle, these approaches rely on identifying structure-property relationships by learning from a sufficiently large dataset of relevant materials. The learned information can then be used to rapidly predict the properties of materials not already in the dataset, thus accelerating the design of materials with preferable properties. Here, we report the development of a dataset of 1,065 polymers and related materials, which is available at http://khazana.uconn.edu/. This dataset is uniformly prepared using first-principles calculations with structures obtained either from other sources or by using structure search methods. Because the immediate target of this work is to assist the design of high dielectric constant polymers, it is initially designed to include the optimized structures, atomization energies, band gaps, and dielectric constants. The dataset will be progressively expanded by accumulating new materials and including additional properties calculated for the optimized structures provided. We discuss some information "learned" from the dataset and suggest that it may be used as the playground for further data-mining work.

  12. AMADA-Analysis of multidimensional astronomical datasets

    NASA Astrophysics Data System (ADS)

    de Souza, R. S.; Ciardi, B.

    2015-09-01

    We present AMADA, an interactive web application to analyze multidimensional datasets. The user uploads a simple ASCII file and AMADA performs a number of exploratory analyses together with contemporary visualization diagnostics. The package performs a hierarchical clustering in the parameter space, and the user can choose among linear, monotonic or non-linear correlation analysis. AMADA provides a number of clustering visualization diagnostics such as heatmaps, dendrograms, chord diagrams, and graphs. In addition, AMADA has the option to run a standard or robust principal components analysis, displaying the results as polar bar plots. The code is written in R and the web interface was created using the SHINY framework. AMADA source-code is freely available at https://goo.gl/KeSPue, and the shiny-app at http://goo.gl/UTnU7I.
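
    As a rough illustration of the kind of analysis AMADA automates, the sketch below reads an ASCII table, computes a Pearson correlation matrix over the parameter space, and clusters the parameters hierarchically, yielding a dendrogram analogous to AMADA's diagnostics. AMADA itself is written in R; this Python analogue and the file name are purely illustrative assumptions.

      # Sketch of an AMADA-style exploratory analysis: correlate the
      # parameters of an ASCII table and cluster them hierarchically.
      # (Illustrative Python analogue; AMADA itself is written in R.)
      import numpy as np
      from scipy.cluster.hierarchy import linkage, dendrogram
      from scipy.spatial.distance import squareform

      data = np.loadtxt("catalog.txt")        # rows = objects, columns = parameters (hypothetical file)
      corr = np.corrcoef(data, rowvar=False)  # Pearson; AMADA also offers monotonic/non-linear options

      # Strongly correlated parameters should end up close together in the tree.
      dist = 1.0 - np.abs(corr)
      np.fill_diagonal(dist, 0.0)
      tree = linkage(squareform(dist, checks=False), method="average")
      dendrogram(tree)                        # diagnostic analogous to AMADA's dendrograms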

  13. Updated archaeointensity dataset from the SW Pacific

    NASA Astrophysics Data System (ADS)

    Hill, Mimi; Nilsson, Andreas; Holme, Richard; Hurst, Elliot; Turner, Gillian; Herries, Andy; Sheppard, Peter

    2016-04-01

    It is well known that there are far more archaeomagnetic data from the Northern Hemisphere than from the Southern. Here we present a compilation of archaeointensity data from the SW Pacific region covering the past 3000 years. The results have primarily been obtained from a collection of ceramics from the SW Pacific Islands including Fiji, Tonga, Papua New Guinea, New Caledonia and Vanuatu. In addition we present results obtained from heated clay balls from Australia. The microwave method has predominantly been used with a variety of experimental protocols including IZZI and Coe variants. Standard Thellier archaeointensity experiments using the IZZI protocol have also been carried out on selected samples. The dataset is compared to regional predictions from current global geomagnetic field models, and the influence of the new data on constraining the pfm9k family of global geomagnetic field models is explored.

  14. Potential crop evapotranspiration and surface evaporation estimates via a gridded weather forcing dataset

    NASA Astrophysics Data System (ADS)

    Lewis, Clayton S.; Allen, L. Niel

    2017-03-01

    Absent local weather stations, a gridded weather dataset can provide information useful for water management in irrigated areas, including potential crop evapotranspiration calculations. In estimating crop irrigation requirements and surface evaporation in Utah, United States of America, methodology and software were developed using the ASCE Standardized Penman-Monteith Reference Evapotranspiration equation with input climate drivers from the North American Land Data Assimilation System (NLDAS) gridded weather forcing dataset and a digital elevation model. A simple procedure was devised to correct bias in NLDAS relative humidity and air temperature data based on comparison to weather data from ground stations. Potential evapotranspiration was calculated for 18 crops (including turfgrass), wetlands (large and narrow), and open water evaporation (deep and shallow) by multiplying reference evapotranspiration by crop coefficient curves, with annual curve dates set by summation of Hargreaves evapotranspiration, cumulative growing degree days, or number of days. Net potential evapotranspiration was calculated by subtracting effective precipitation estimates (from the Daymet gridded precipitation dataset). Analysis of the results showed that daily estimated potential crop evapotranspiration from the model compared well with estimates from electronic weather stations (1980-2014) and with independently calculated potential crop evapotranspiration in adjacent states. Designed for this study but open sourced for other applications, software entitled GridET encapsulated the GIS-based model that provided data download and management, calculation of reference and potential crop evapotranspiration, and viewing and analysis tools. Flexible features in GridET allow a user to specify grid resolution, evapotranspiration equations, cropping information, and additional datasets, with the output being transferable to other GIS software.
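
    For reference, the ASCE Standardized Reference Evapotranspiration equation that GridET implements takes the standard form

      $$ET_{sz} = \frac{0.408\,\Delta\,(R_n - G) + \gamma\,\frac{C_n}{T + 273}\,u_2\,(e_s - e_a)}{\Delta + \gamma\,(1 + C_d\,u_2)},$$

    where Δ is the slope of the saturation vapour pressure curve, R_n the net radiation, G the soil heat flux, γ the psychrometric constant, T the mean air temperature, u_2 the wind speed at 2 m, e_s − e_a the vapour pressure deficit, and C_n and C_d standardized constants that depend on the reference surface and the calculation time step.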

  15. Regression with Empirical Variable Selection: Description of a New Method and Application to Ecological Datasets

    PubMed Central

    Goodenough, Anne E.; Hart, Adam G.; Stafford, Richard

    2012-01-01

    Despite recent papers on problems associated with full-model and stepwise regression, their use is still common throughout ecological and environmental disciplines. Alternative approaches, including generating multiple models and comparing them post-hoc using techniques such as Akaike's Information Criterion (AIC), are becoming more popular. However, these are problematic when there are numerous independent variables and interpretation is often difficult when competing models contain many different variables and combinations of variables. Here, we detail a new approach, REVS (Regression with Empirical Variable Selection), which uses all-subsets regression to quantify empirical support for every independent variable. A series of models is created; the first containing the variable with most empirical support, the second containing the first variable and the next most-supported, and so on. The comparatively small number of resultant models (n = the number of predictor variables) means that post-hoc comparison is comparatively quick and easy. When tested on a real dataset – habitat and offspring quality in the great tit (Parus major) – the optimal REVS model explained more variance (higher R2), was more parsimonious (lower AIC), and had greater significance (lower P values), than full, stepwise or all-subsets models; it also had higher predictive accuracy based on split-sample validation. Testing REVS on ten further datasets suggested that this is typical, with R2 values being higher than full or stepwise models (mean improvement = 31% and 7%, respectively). Results are ecologically intuitive as even when there are several competing models, they share a set of “core” variables and differ only in presence/absence of one or two additional variables. We conclude that REVS is useful for analysing complex datasets, including those in ecology and environmental disciplines. PMID:22479605
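
    The core of REVS can be captured in a compact sketch. In the Python sketch below, empirical support is measured by AIC-weighted frequency across all subsets; this support measure is an illustrative assumption, not necessarily the authors' exact formulation.

      # Sketch of Regression with Empirical Variable Selection (REVS):
      # quantify empirical support for each predictor via all-subsets
      # regression, then fit the nested model sequence described above.
      import itertools
      import numpy as np
      import statsmodels.api as sm

      def revs(X, y, names):
          n = len(names)
          support = np.zeros(n)
          results = []
          # All-subsets regression: every non-empty combination of predictors.
          for k in range(1, n + 1):
              for subset in itertools.combinations(range(n), k):
                  model = sm.OLS(y, sm.add_constant(X[:, subset])).fit()
                  results.append((subset, model.aic))
          # Convert AICs to Akaike weights and accumulate per-variable support.
          aics = np.array([aic for _, aic in results])
          weights = np.exp(-0.5 * (aics - aics.min()))
          weights /= weights.sum()
          for (subset, _), w in zip(results, weights):
              support[list(subset)] += w
          order = np.argsort(-support)  # most empirically supported first
          # Nested sequence: model i contains the i+1 most-supported variables.
          models = [sm.OLS(y, sm.add_constant(X[:, order[:i + 1]])).fit()
                    for i in range(n)]
          return [names[i] for i in order], models

    The n resulting models can then be compared post-hoc by AIC, exactly as the abstract describes.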

  16. Regression with empirical variable selection: description of a new method and application to ecological datasets.

    PubMed

    Goodenough, Anne E; Hart, Adam G; Stafford, Richard

    2012-01-01

    Despite recent papers on problems associated with full-model and stepwise regression, their use is still common throughout ecological and environmental disciplines. Alternative approaches, including generating multiple models and comparing them post-hoc using techniques such as Akaike's Information Criterion (AIC), are becoming more popular. However, these are problematic when there are numerous independent variables and interpretation is often difficult when competing models contain many different variables and combinations of variables. Here, we detail a new approach, REVS (Regression with Empirical Variable Selection), which uses all-subsets regression to quantify empirical support for every independent variable. A series of models is created; the first containing the variable with most empirical support, the second containing the first variable and the next most-supported, and so on. The comparatively small number of resultant models (n = the number of predictor variables) means that post-hoc comparison is comparatively quick and easy. When tested on a real dataset--habitat and offspring quality in the great tit (Parus major)--the optimal REVS model explained more variance (higher R(2)), was more parsimonious (lower AIC), and had greater significance (lower P values), than full, stepwise or all-subsets models; it also had higher predictive accuracy based on split-sample validation. Testing REVS on ten further datasets suggested that this is typical, with R(2) values being higher than full or stepwise models (mean improvement = 31% and 7%, respectively). Results are ecologically intuitive as even when there are several competing models, they share a set of "core" variables and differ only in presence/absence of one or two additional variables. We conclude that REVS is useful for analysing complex datasets, including those in ecology and environmental disciplines.

  17. Integrated Strategy Improves the Prediction Accuracy of miRNA in Large Dataset

    PubMed Central

    Lipps, David; Devineni, Sree

    2016-01-01

    MiRNAs are short non-coding RNAs of about 22 nucleotides, which play critical roles in gene expression regulation. The biogenesis of miRNAs is largely determined by the sequence and structural features of their parental RNA molecules. Based on these features, multiple computational tools have been developed to predict if RNA transcripts contain miRNAs or not. Although very successful, these predictors have started to face multiple challenges in recent years. Many predictors were optimized using datasets of hundreds of miRNA samples. The sizes of these datasets are much smaller than the number of known miRNAs. Consequently, the prediction accuracy of these predictors on large datasets becomes unknown and needs to be re-tested. In addition, many predictors were optimized for either high sensitivity or high specificity. These optimization strategies may bring serious limitations in applications. Moreover, to meet continuously rising expectations of these computational tools, improving the prediction accuracy becomes extremely important. In this study, a meta-predictor mirMeta was developed by integrating a set of non-linear transformations with a meta-strategy. More specifically, the outputs of five individual predictors were first preprocessed using non-linear transformations, and then fed into an artificial neural network to make the meta-prediction. The prediction accuracy of the meta-predictor was validated using both multi-fold cross-validation and an independent dataset. The final accuracy of the meta-predictor on the newly designed large dataset improved by 7%, to 93%. The meta-predictor also proved to be less dependent on datasets, as well as having a refined balance between sensitivity and specificity. This study has twofold importance: First, it shows that the combination of non-linear transformations and artificial neural networks improves the prediction accuracy of individual predictors. Second, a new miRNA predictor with significantly improved prediction accuracy
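
    The architecture described above (non-linear transformation of the base predictors' scores, followed by a small neural network) can be sketched as follows; the logit transformation, network size, and toy arrays are illustrative assumptions rather than the paper's exact configuration.

      # Sketch of a mirMeta-style meta-predictor: scores from five base
      # miRNA predictors are passed through a non-linear transformation
      # and fed to a small neural network that makes the final call.
      import numpy as np
      from sklearn.neural_network import MLPClassifier

      def logit(p, eps=1e-6):
          # One simple choice of non-linear preprocessing transformation.
          p = np.clip(p, eps, 1 - eps)
          return np.log(p / (1 - p))

      # base_scores: (n_samples, 5) probabilities from five base predictors;
      # labels: 1 = real pre-miRNA, 0 = not (both hypothetical toy data).
      base_scores = np.random.rand(1000, 5)
      labels = np.random.randint(0, 2, 1000)

      meta = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
      meta.fit(logit(base_scores), labels)
      meta_prob = meta.predict_proba(logit(base_scores))[:, 1]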

  18. Independent Living.

    ERIC Educational Resources Information Center

    Nathanson, Jeanne H., Ed.

    1994-01-01

    This issue of "OSERS" addresses the subject of independent living of individuals with disabilities. The issue includes a message from Judith E. Heumann, the Assistant Secretary of the Office of Special Education and Rehabilitative Services (OSERS), and 10 papers. Papers have the following titles and authors: "Changes in the…

  19. Understanding independence

    NASA Astrophysics Data System (ADS)

    Annan, James; Hargreaves, Julia

    2016-04-01

    In order to perform any Bayesian processing of a model ensemble, we need a prior over the ensemble members. In the case of multimodel ensembles such as CMIP, the historical approach of "model democracy" (i.e. equal weight for all models in the sample) is no longer credible (if it ever was) due to model duplication and inbreeding. The question of "model independence" is central to the question of prior weights. However, although this question has been repeatedly raised, it has not yet been satisfactorily addressed. Here I will discuss the issue of independence and present a theoretical foundation for understanding and analysing the ensemble in this context. I will also present some simple examples showing how these ideas may be applied and developed.

  20. Evaluation of Uncertainty in Precipitation Datasets for New Mexico, USA

    NASA Astrophysics Data System (ADS)

    Besha, A. A.; Steele, C. M.; Fernald, A.

    2014-12-01

    Climate change, population growth and other factors are endangering water availability and sustainability in semiarid/arid areas, particularly in the southwestern United States. Wide coverage of spatial and temporal measurements of precipitation is key for regional water budget analysis and hydrological operations, which themselves are valuable tools for water resource planning and management. Rain gauge measurements are usually reliable and accurate at a point. They measure rainfall continuously, but spatial sampling is limited. Ground-based radar and satellite remotely sensed precipitation have wide spatial and temporal coverage. However, these measurements are indirect and subject to errors because of equipment, meteorological variability, the heterogeneity of the land surface itself and lack of regular recording. This study seeks to understand precipitation uncertainty and, in doing so, lessen uncertainty propagation into hydrological applications and operations. We reviewed, compared and evaluated the TRMM (Tropical Rainfall Measuring Mission) precipitation products, NOAA's (National Oceanic and Atmospheric Administration) Global Precipitation Climatology Centre (GPCC) monthly precipitation dataset, PRISM (Parameter elevation Regression on Independent Slopes Model) data and data from individual climate stations including Cooperative Observer Program (COOP), Remote Automated Weather Stations (RAWS), Soil Climate Analysis Network (SCAN) and Snowpack Telemetry (SNOTEL) stations. Though not yet finalized, this study finds that the uncertainty within precipitation estimate datasets is influenced by regional topography, season, climate and precipitation rate. Ongoing work aims to further evaluate precipitation datasets based on the relative influence of these phenomena so that we can identify the optimum datasets for input to statewide water budget analysis.

  1. A reanalysis dataset of the South China Sea

    PubMed Central

    Zeng, Xuezhi; Peng, Shiqiu; Li, Zhijin; Qi, Yiquan; Chen, Rongyu

    2014-01-01

    Ocean reanalysis provides a temporally continuous and spatially gridded four-dimensional estimate of the ocean state for a better understanding of the ocean dynamics and its spatial/temporal variability. Here we present a 19-year (1992–2010) high-resolution ocean reanalysis dataset of the upper ocean in the South China Sea (SCS) produced from an ocean data assimilation system. A wide variety of observations, including in-situ temperature/salinity profiles, ship-measured and satellite-derived sea surface temperatures, and sea surface height anomalies from satellite altimetry, are assimilated into the outputs of an ocean general circulation model using a multi-scale incremental three-dimensional variational data assimilation scheme, yielding a daily high-resolution reanalysis dataset of the SCS. Comparisons between the reanalysis and independent observations support the reliability of the dataset. The presented dataset provides the research community of the SCS an important data source for studying the thermodynamic processes of the ocean circulation and meso-scale features in the SCS, including their spatial and temporal variability. PMID:25977803
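
    For context, a three-dimensional variational (3D-Var) analysis of the kind used here minimizes the standard cost function

      $$J(\mathbf{x}) = \tfrac{1}{2}(\mathbf{x}-\mathbf{x}_b)^{\mathsf{T}}\mathbf{B}^{-1}(\mathbf{x}-\mathbf{x}_b) + \tfrac{1}{2}\big(\mathbf{y}-H(\mathbf{x})\big)^{\mathsf{T}}\mathbf{R}^{-1}\big(\mathbf{y}-H(\mathbf{x})\big),$$

    where x_b is the model background, y the vector of observations, H the observation operator, and B and R the background- and observation-error covariance matrices. The multi-scale incremental variant mentioned in the abstract solves for increments to the background, applying this minimization sequentially from larger to smaller scales.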

  2. SisFall: A Fall and Movement Dataset

    PubMed Central

    Sucerquia, Angela; López, José David; Vargas-Bonilla, Jesús Francisco

    2017-01-01

    Research on fall and movement detection with wearable devices has witnessed promising growth. However, there are few publicly available datasets, all recorded with smartphones, which are insufficient for testing new proposals due to their lack of an objective population, lack of performed activities, and limited information. Here, we present a dataset of falls and activities of daily living (ADLs) acquired with a self-developed device composed of two types of accelerometer and one gyroscope. It consists of 19 ADLs and 15 fall types performed by 23 young adults, 15 ADL types performed by 14 healthy and independent participants over 62 years old, and data from one participant of 60 years old who performed all ADLs and falls. These activities were selected based on a survey and a literature analysis. We test the dataset with widely used feature extraction and a simple-to-implement threshold-based classification, achieving up to 96% accuracy in fall detection. An individual activity analysis demonstrates that most errors coincide in a small number of activities, where new approaches could be focused. Finally, validation tests with elderly people significantly reduced the fall detection performance of the tested features. This validates the findings of other authors and encourages developing new strategies, with this new dataset as the benchmark. PMID:28117691
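
    The simple threshold-based classification mentioned in the abstract typically operates on the magnitude of the tri-axial acceleration. A minimal sketch follows; the sampling rate, window length, and threshold are illustrative assumptions, not the paper's tuned values.

      # Minimal sketch of threshold-based fall detection: flag windows
      # whose peak-to-peak acceleration-magnitude range exceeds a threshold.
      import numpy as np

      def detect_falls(acc, fs=200.0, win_s=1.0, threshold_g=1.5):
          """acc: (n_samples, 3) acceleration in g; returns start indices of flagged windows."""
          mag = np.linalg.norm(acc, axis=1)       # magnitude of the tri-axial signal
          win = int(fs * win_s)
          flagged = []
          for start in range(0, len(mag) - win, win):
              seg = mag[start:start + win]
              if seg.max() - seg.min() > threshold_g:   # peak-to-peak amplitude feature
                  flagged.append(start)
          return flagged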

  3. A reanalysis dataset of the South China Sea.

    PubMed

    Zeng, Xuezhi; Peng, Shiqiu; Li, Zhijin; Qi, Yiquan; Chen, Rongyu

    2014-01-01

    Ocean reanalysis provides a temporally continuous and spatially gridded four-dimensional estimate of the ocean state for a better understanding of the ocean dynamics and its spatial/temporal variability. Here we present a 19-year (1992-2010) high-resolution ocean reanalysis dataset of the upper ocean in the South China Sea (SCS) produced from an ocean data assimilation system. A wide variety of observations, including in-situ temperature/salinity profiles, ship-measured and satellite-derived sea surface temperatures, and sea surface height anomalies from satellite altimetry, are assimilated into the outputs of an ocean general circulation model using a multi-scale incremental three-dimensional variational data assimilation scheme, yielding a daily high-resolution reanalysis dataset of the SCS. Comparisons between the reanalysis and independent observations support the reliability of the dataset. The presented dataset provides the research community of the SCS an important data source for studying the thermodynamic processes of the ocean circulation and meso-scale features in the SCS, including their spatial and temporal variability.

  4. Internal Consistency of the NVAP Water Vapor Dataset

    NASA Technical Reports Server (NTRS)

    Suggs, Ronnie J.; Jedlovec, Gary J.; Arnold, James E. (Technical Monitor)

    2001-01-01

    The NVAP (NASA Water Vapor Project) dataset is a global dataset at 1 x 1 degree spatial resolution consisting of daily, pentad, and monthly atmospheric precipitable water (PW) products. The analysis blends measurements from the Television and Infrared Operational Satellite (TIROS) Operational Vertical Sounder (TOVS), the Special Sensor Microwave/Imager (SSM/I), and radiosonde observations into a daily collage of PW. The original dataset consisted of five years of data from 1988 to 1992. Recent updates have added three additional years (1993-1995) and incorporated procedural and algorithm changes from the original methodology. Since none of the PW sources (TOVS, SSM/I, and radiosonde) provides global coverage, the sources complement one another by providing spatial coverage over regions and during times where the others are not available. For this type of spatial and temporal blending to be successful, each of the source components should have similar or compatible accuracies. If this is not the case, regional and time-varying biases may be manifested in the NVAP dataset. This study examines the consistency of the NVAP source data by comparing daily collocated TOVS and SSM/I PW retrievals with collocated radiosonde PW observations. The daily PW intercomparisons are performed over the time period of the dataset and for various regions.
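
    The consistency check described above reduces to computing the bias and scatter of collocated precipitable water (PW) pairs, repeated per region and period. A minimal sketch, with hypothetical array names:

      # Bias and RMS difference between collocated PW retrievals (mm).
      import numpy as np

      def pw_stats(pw_satellite, pw_radiosonde):
          diff = np.asarray(pw_satellite) - np.asarray(pw_radiosonde)
          return diff.mean(), np.sqrt((diff ** 2).mean())   # bias, rmse

      # Compatible accuracies across sources would show similar statistics,
      # e.g. pw_stats(tovs_pw, raob_pw) versus pw_stats(ssmi_pw, raob_pw).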

  5. ASSISTments Dataset from Multiple Randomized Controlled Experiments

    ERIC Educational Resources Information Center

    Selent, Douglas; Patikorn, Thanaporn; Heffernan, Neil

    2016-01-01

    In this paper, we present a dataset consisting of data generated from 22 previously and currently running randomized controlled experiments inside the ASSISTments online learning platform. This dataset provides data mining opportunities for researchers to analyze ASSISTments data in a convenient format across multiple experiments at the same time.…

  6. Dataset of Scientific Inquiry Learning Environment

    ERIC Educational Resources Information Center

    Ting, Choo-Yee; Ho, Chiung Ching

    2015-01-01

    This paper presents the dataset collected from student interactions with INQPRO, a computer-based scientific inquiry learning environment. The dataset contains records of 100 students and is divided into two portions. The first portion comprises (1) "raw log data", capturing the student's name, interfaces visited, the interface…

  7. Cosmological Analyses Based On The Combined Planck And WMAP Mission Datasets

    NASA Astrophysics Data System (ADS)

    Bennett, Charles

    independently analyzed the WMAP data. Most reproduced WMAP results, while others uncovered additional useful insights into the data, and still others found issues, which the WMAP team examined more carefully. Independent replication was quite important, as was the work extending the results and calling attention to issues. This process was not only helpful for getting the most out of the WMAP mission results, it was essential for establishing confidence in the mission datasets. WMAP team discussions with independent scientists were fruitful and provided invaluable replication and additional peer-review of the WMAP team work, in addition to new analysis and results. We expect that the Planck team will benefit from similar interactions with independent scientists. WMAP team members are especially important for computing detailed comparisons between Planck and WMAP data. Now that the WMAP project has ended, the WMAP team no longer has funding to carry out this crucial and compelling comparison of WMAP and Planck data at the level of detail needed for precision cosmology. This proposal requests that four of the most active and experienced WMAP team members with specialized knowledge in temperature calibration, beam calibration, foreground separation, simulations, power spectrum computation, and more, be supported to reconcile WMAP and Planck data in detail, to combine the datasets to obtain optimal results, and to produce improved cosmological results.

  8. Hybrid independent component analysis and twin support vector machine learning scheme for subtle gesture recognition.

    PubMed

    Naik, Ganesh R; Kumar, Dinesh K; Jayadeva

    2010-10-01

    Myoelectric signal classification is one of the most difficult pattern recognition problems because large variations in surface electromyogram features usually exist. In the literature, attempts have been made to apply various pattern recognition methods to classify surface electromyography into components corresponding to the activities of different muscles, but this has not been very successful, as some muscles are bigger and more active than others. This results in dataset discrepancy during classification. Multicategory classification problems are usually solved by solving many one-versus-rest binary classification tasks. These subtasks unsurprisingly involve unbalanced datasets. Consequently, we need a learning methodology that can take into account unbalanced datasets in addition to large variations in the distributions of patterns corresponding to different classes. Here, we attempt to address the above issues using hybrid features extracted from independent component analysis and twin support vector machine techniques.
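
    A rough sketch of such a hybrid pipeline: unmix the multichannel surface EMG into independent sources with ICA, extract a simple feature per source, and classify. A standard support vector machine stands in for the twin support vector machine (which is not available in scikit-learn), and the RMS feature is an illustrative choice; with class_weight="balanced", the classifier at least acknowledges the unbalanced-dataset concern raised above.

      # ICA-based feature extraction followed by SVM classification.
      import numpy as np
      from sklearn.decomposition import FastICA
      from sklearn.svm import SVC

      def ica_rms_features(emg_windows, n_sources=4):
          """emg_windows: (n_windows, n_samples, n_channels) raw sEMG;
          assumes n_channels >= n_sources."""
          feats = []
          for win in emg_windows:
              sources = FastICA(n_components=n_sources, random_state=0).fit_transform(win)
              feats.append(np.sqrt((sources ** 2).mean(axis=0)))   # RMS per source
          return np.array(feats)

      # X = ica_rms_features(windows); y = gesture labels (hypothetical arrays)
      # clf = SVC(kernel="rbf", class_weight="balanced").fit(X, y)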

  9. Datasets for evolutionary comparative genomics

    PubMed Central

    Liberles, David A

    2005-01-01

    Many decisions about genome sequencing projects are directed by perceived gaps in the tree of life, or towards model organisms. With the goal of a better understanding of biology through the lens of evolution, however, there are additional genomes that are worth sequencing. One such rationale for whole-genome sequencing is discussed here, along with other important strategies for understanding the phenotypic divergence of species. PMID:16086856

  10. The KIT Motion-Language Dataset.

    PubMed

    Plappert, Matthias; Mandery, Christian; Asfour, Tamim

    2016-12-01

    Linking human motion and natural language is of great interest for the generation of semantic representations of human activities as well as for the generation of robot activities based on natural language input. However, although there have been years of research in this area, no standardized and openly available data set exists to support the development and evaluation of such systems. We, therefore, propose the Karlsruhe Institute of Technology (KIT) Motion-Language Dataset, which is large, open, and extensible. We aggregate data from multiple motion capture databases and include them in our data set using a unified representation that is independent of the capture system or marker set, making it easy to work with the data regardless of its origin. To obtain motion annotations in natural language, we apply a crowd-sourcing approach and a web-based tool that was specifically built for this purpose, the Motion Annotation Tool. We thoroughly document the annotation process itself and discuss gamification methods that we used to keep annotators motivated. We further propose a novel method, perplexity-based selection, which systematically selects motions for further annotation that are either under-represented in our data set or that have erroneous annotations. We show that our method mitigates the two aforementioned problems and ensures a systematic annotation process. We provide an in-depth analysis of the structure and contents of our resulting data set, which, as of October 10, 2016, contains 3911 motions with a total duration of 11.23 hours and 6278 annotations in natural language that contain 52,903 words. We believe this makes our data set an excellent choice that enables more transparent and comparable research in this important area.
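
    Perplexity-based selection can be sketched with any language model trained on the existing annotation corpus: motions whose annotations score high perplexity are unusual or possibly erroneous, so they are queued for further annotation. The unigram model with add-one smoothing below stands in for whatever model the authors actually used.

      # Sketch of perplexity-based selection over motion annotations.
      import math
      from collections import Counter

      def train_unigram(corpus):
          counts = Counter(w for sent in corpus for w in sent.split())
          total, vocab = sum(counts.values()), len(counts) + 1
          return lambda w: (counts[w] + 1) / (total + vocab)   # add-one smoothing

      def perplexity(prob, sentence):
          words = sentence.split()
          logp = sum(math.log(prob(w)) for w in words)
          return math.exp(-logp / max(len(words), 1))

      annotations = ["a person walks forward", "a person waves the right hand"]
      prob = train_unigram(annotations)
      # Higher perplexity -> annotation is under-represented or possibly erroneous.
      ranked = sorted(annotations, key=lambda s: -perplexity(prob, s))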

  11. Benchmark dataset for Whole Genome sequence compression.

    PubMed

    C L, Biji; Nair, Achuthsankar

    2016-05-16

    The research in DNA data compression lacks a standard dataset on which to test compression tools specific to DNA. This paper argues that the current state of achievement in DNA compression cannot be benchmarked in the absence of a scientifically compiled whole genome sequence dataset, and proposes a benchmark dataset constructed using a multistage sampling procedure. Considering the genome sequences of organisms available in the National Center for Biotechnology Information (NCBI) as the universe, the proposed dataset selects 1105 prokaryotes, 200 plasmids, 164 viruses and 65 eukaryotes. This paper reports the results of using 3 established tools on the newly compiled dataset and shows that their strengths and weaknesses become evident only with a comparison based on the scientifically compiled benchmark dataset.
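
    In spirit, the multistage sampling reduces to stratifying the NCBI universe by organism class and drawing a quota from each stratum. A minimal sketch, with quotas mirroring the counts quoted above and hypothetical record lists:

      # Stratified sampling of genome records into a benchmark dataset.
      import random

      QUOTAS = {"prokaryote": 1105, "plasmid": 200, "virus": 164, "eukaryote": 65}

      def build_benchmark(records_by_class, seed=0):
          """records_by_class: dict mapping organism class -> list of accession IDs."""
          rng = random.Random(seed)
          benchmark = []
          for cls, quota in QUOTAS.items():
              pool = records_by_class[cls]
              benchmark.extend(rng.sample(pool, min(quota, len(pool))))
          return benchmark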

  12. The generation of China land surface datasets for CLM

    NASA Astrophysics Data System (ADS)

    Li, Haiying; Peng, Hongchun; Li, Xin; Veroustraete, Frank

    2005-10-01

    The Community Land Model, or Common Land Model (CLM), describes the exchange of fluxes of energy, mass and momentum between the earth's surface and the planetary boundary layer. This model is used to simulate environmental changes in China and hence requires a complete field of land surface parameters. The present paper focuses on generating the surface datasets of CLM for China. In the present paper, vegetation was divided into 39 Plant Function Types (PFTs) of China from its classification map. The land surface datasets were created using vegetation type, five land cover types (lake, wetland, glacier, urban and vegetated), monthly maximum Normalized Difference Vegetation Index (NDVI) derived from SPOT_VGT data and soil properties data. The percentages of glacier, lake and wetland were derived from their own vector maps of China. The fractional coverage of PFTs was derived from the China vegetation map. Time-independent vegetation biophysical parameters, such as canopy top and bottom heights and other vegetation parameters related to photosynthesis, were based on values documented in the literature. The soil color dataset was derived from landuse and vegetation data based on their correspondence. The soil texture (clay%, sand% and silt%) came from a global dataset. Time-dependent vegetation biophysical parameters, such as leaf area index (LAI) and fractional absorbed photosynthetically active radiation (FPAR), were calculated from one year of NDVI monthly maximum value composites for the China region based on equations given in Sellers et al. (1996a,b) and Los et al. (2000). The resolution of these datasets for CLM is 1 km.
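
    The Sellers et al. (1996)-style derivation of the time-dependent parameters can be sketched as follows. In the original scheme the simple-ratio limits are per-vegetation-type constants derived from NDVI percentiles; the numeric values below are placeholders, not the actual PFT table.

      # FPAR from NDVI via the simple ratio, and LAI from FPAR
      # (Beer's-law style relation for non-clumped vegetation).
      import numpy as np

      FPAR_MAX, FPAR_MIN = 0.95, 0.001

      def fpar_from_ndvi(ndvi, sr_min=1.05, sr_max=11.6):
          sr = (1.0 + ndvi) / (1.0 - ndvi)   # simple ratio
          fpar = (sr - sr_min) * (FPAR_MAX - FPAR_MIN) / (sr_max - sr_min) + FPAR_MIN
          return np.clip(fpar, FPAR_MIN, FPAR_MAX)

      def lai_from_fpar(fpar, lai_max=6.0):
          return lai_max * np.log(1.0 - fpar) / np.log(1.0 - FPAR_MAX)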

  13. Comparison of global 3-D aviation emissions datasets

    NASA Astrophysics Data System (ADS)

    Olsen, S. C.; Wuebbles, D. J.; Owen, B.

    2013-01-01

    Aviation emissions are unique from other transportation emissions, e.g., from road transportation and shipping, in that they occur at higher altitudes as well as at the surface. Aviation emissions of carbon dioxide, soot, and water vapor have direct radiative impacts on the Earth's climate system while emissions of nitrogen oxides (NOx), sulfur oxides, carbon monoxide (CO), and hydrocarbons (HC) impact air quality and climate through their effects on ozone, methane, and clouds. The most accurate estimates of the impact of aviation on air quality and climate utilize three-dimensional chemistry-climate models and gridded four-dimensional (space and time) aviation emissions datasets. We compare five available aviation emissions datasets currently and historically used to evaluate the impact of aviation on climate and air quality: NASA-Boeing 1992, NASA-Boeing 1999, QUANTIFY 2000, Aero2k 2002, and AEDT 2006, as well as aviation fuel usage estimates from the International Energy Agency. Roughly 90% of all aviation emissions are in the Northern Hemisphere and nearly 60% of all fuelburn and NOx emissions occur at cruise altitudes in the Northern Hemisphere. While these datasets were created by independent methods and are thus not strictly suitable for analyzing trends, they suggest that commercial aviation fuelburn and NOx emissions increased over the last two decades while HC emissions likely decreased and CO emissions did not change significantly. The bottom-up estimates compared here are consistently lower than International Energy Agency fuelburn statistics, although the gap is significantly smaller in the more recent datasets. Overall, the emissions distributions are quite similar for fuelburn and NOx, with regional peaks over the populated land masses of North America, Europe, and East Asia. For CO and HC there are relatively larger differences. There are, however, some distinct differences in the altitude distribution of emissions in certain regions for the Aero2k dataset.

  14. Comparison of global 3-D aviation emissions datasets

    NASA Astrophysics Data System (ADS)

    Olsen, S. C.; Wuebbles, D. J.; Owen, B.

    2012-07-01

    Aviation emissions are unique from other transportation emissions, e.g., from road transportation and shipping, in that they occur at higher altitudes as well as at the surface. Aviation emissions of carbon dioxide, soot, and water vapor have direct radiative impacts on the Earth's climate system while emissions of nitrogen oxides (NOx), sulfur oxides, carbon monoxide (CO), and hydrocarbons (HC) impact air quality and climate through their effects on ozone, methane, and clouds. The most accurate estimates of the impact of aviation on air quality and climate utilize three-dimensional chemistry-climate models and gridded four-dimensional (space and time) aviation emissions datasets. We compare five available aviation emissions datasets currently and historically used to evaluate the impact of aviation on climate and air quality: NASA-Boeing 1992, NASA-Boeing 1999, QUANTIFY 2000, Aero2k 2002, and AEDT 2006, as well as aviation fuel usage estimates from the International Energy Agency. Roughly 90% of all aviation emissions are in the Northern Hemisphere and nearly 60% of all fuelburn and NOx emissions occur at cruise altitudes in the Northern Hemisphere. While these datasets were created by independent methods and are thus not strictly suitable for analyzing trends, they suggest that commercial aviation fuelburn and NOx emissions increased over the last two decades while HC emissions likely decreased and CO emissions did not change significantly. The bottom-up estimates compared here are consistently lower than International Energy Agency fuelburn statistics, although the gap is significantly smaller in the more recent datasets. Overall, the emissions distributions are quite similar for fuelburn and NOx, while for CO and HC there are relatively larger differences. There are, however, some distinct differences in the altitude distribution of emissions in certain regions for the Aero2k dataset.

  15. Assessment of Northern Hemisphere Snow Water Equivalent Datasets in ESA SnowPEx project

    NASA Astrophysics Data System (ADS)

    Luojus, Kari; Pulliainen, Jouni; Cohen, Juval; Ikonen, Jaakko; Derksen, Chris; Mudryk, Lawrence; Nagler, Thomas; Bojkov, Bojan

    2016-04-01

    Reliable information on snow cover across the Northern Hemisphere and Arctic and sub-Arctic regions is needed for climate monitoring, for understanding the Arctic climate system, and for the evaluation of the role of snow cover and its feedback in climate models. In addition to being of significant interest for climatological investigations, reliable information on snow cover is of high value for the purpose of hydrological forecasting and numerical weather prediction. Terrestrial snow covers up to 50 million km² of the Northern Hemisphere in winter and is characterized by high spatial and temporal variability. Therefore, satellite observations provide the best means for timely and complete observations of the global snow cover. There are a number of independent SWE products available that describe the snow conditions on multi-decadal and global scales. Some products are derived using satellite-based information while others rely on meteorological observations and modelling. What is common to practically all existing hemispheric SWE products is that their retrieval performance on hemispherical and multi-decadal scales is not accurately known. The purpose of the ESA-funded SnowPEx project is to obtain a quantitative understanding of the uncertainty in satellite- as well as model-based SWE products through an internationally coordinated and consistent evaluation exercise. The currently available Northern Hemisphere-wide satellite-based SWE datasets which were assessed include 1) the GlobSnow SWE, 2) the NASA Standard SWE, 3) NASA prototype and 4) NSIDC-SSM/I SWE products. The model-based datasets include: 5) the Global Land Data Assimilation System Version 2 (GLDAS-2) product; 6) the European Centre for Medium-Range Forecasts Interim Land Reanalysis (ERA-I-Land), which uses a simple snow scheme; 7) the Modern Era Retrospective Analysis for Research and Applications (MERRA), which uses an intermediate complexity snow scheme; and 8) SWE from the Crocus snow scheme, a

  16. 'Independence' Panorama

    NASA Technical Reports Server (NTRS)

    2005-01-01

    This is the Spirit 'Independence' panorama, acquired on martian days, or sols, 536 to 543 (July 6 to 13, 2005), from a position in the 'Columbia Hills' near the summit of 'Husband Hill.' The summit of 'Husband Hill' is the peak near the right side of this panorama and is about 100 meters (328 feet) away from the rover and about 30 meters (98 feet) higher in elevation. The rocky outcrops downhill and on the left side of this mosaic include 'Larry's Lookout' and 'Cumberland Ridge,' which Spirit explored in April, May, and June of 2005.

    The panorama spans 360 degrees and consists of 108 individual images, each acquired with five filters of the rover's panoramic camera. The approximate true color of the mosaic was generated using the camera's 750-, 530-, and 480-nanometer filters. During the 8 martian days, or sols, that it took to acquire this image, the lighting varied considerably, partly because of imaging at different times of sol, and partly because of small sol-to-sol variations in the dustiness of the atmosphere. These slight changes produced some image seams and rock shadows. These seams have been eliminated from the sky portion of the mosaic to better simulate the vista a person standing on Mars would see. However, it is often not possible or practical to smooth out such seams for regions of rock, soil, rover tracks or solar panels. Such is the nature of acquiring and assembling large panoramas from the rovers.

  17. Improved global aerosol datasets for 2008 from Aerosol_cci

    NASA Astrophysics Data System (ADS)

    Holzer-Popp, Thomas; de Leeuw, Gerrit

    2013-04-01

    Within the ESA Climate Change Initiative (CCI), the Aerosol_cci project has produced and validated global datasets from AATSR, PARASOL, MERIS, OMI and GOMOS for the complete year 2008. Whereas OMI and GOMOS were used to derive absorbing aerosol index and stratospheric extinction profiles, respectively, Aerosol Optical Depth (AOD) and Angstrom coefficient were retrieved from the three nadir sensors. For AATSR, three algorithms were applied. AOD validation was conducted against AERONET sun photometer observations, also in comparison to MODIS and MISR datasets. Validation included level2 (pixel level) and level3 (gridded daily) datasets. Several validation metrics were used and in some cases developed further in order to comprehensively evaluate the capabilities and limitations of the datasets. The metrics include standard statistical quantities (bias, rmse, Pearson correlation, linear regression) as well as scoring approaches to quantitatively assess the spatial and temporal correlations against AERONET. Over open ocean, MAN data were also used to better constrain the aerosol background, but these had limited coverage in 2008. The validation showed that the PARASOL (ocean only) and AATSR (land and ocean) datasets have improved significantly and now reach, and sometimes go beyond, the quality level of MODIS and MISR. However, the coverage of these European datasets is weaker than that of the NASA datasets due to smaller instrument swath width. The MERIS dataset provides better coverage but has lower quality than the other datasets. A detailed regional and seasonal analysis revealed the strengths and weaknesses of each algorithm. Angstrom coefficient was also validated and showed encouraging results (more detailed aerosol type information provided in particular by PARASOL was not yet evaluated further). Additionally, pixel uncertainties contained in each dataset were statistically assessed, which showed some remaining issues but also the added value

  18. The new Planetary Science Archive: A tool for exploration and discovery of scientific datasets from ESA's planetary missions

    NASA Astrophysics Data System (ADS)

    Heather, David

    2016-07-01

    Introduction: The Planetary Science Archive (PSA) is the European Space Agency's (ESA) repository of science data from all planetary science and exploration missions. The PSA provides access to scientific datasets through various interfaces (e.g. FTP browser, Map based, Advanced search, and Machine interface): http://archives.esac.esa.int/psa All datasets are scientifically peer-reviewed by independent scientists, and are compliant with the Planetary Data System (PDS) standards. Updating the PSA: The PSA is currently implementing a number of significant changes, both to its web-based interface to the scientific community, and to its database structure. The new PSA will be up-to-date with versions 3 and 4 of the PDS standards, as PDS4 will be used for ESA's upcoming ExoMars and BepiColombo missions. The newly designed PSA homepage will provide direct access to scientific datasets via a text search for targets or missions. This will significantly reduce the complexity for users to find their data and will promote one-click access to the datasets. Additionally, the homepage will provide direct access to advanced views and searches of the datasets. Users will have direct access to documentation, information and tools that are relevant to the scientific use of the dataset, including ancillary datasets, Software Interface Specification (SIS) documents, and any tools/help that the PSA team can provide. A login mechanism will provide additional functionalities to the users to aid / ease their searches (e.g. saving queries, managing default views). Queries to the PSA database will be possible either via the homepage (for simple searches of missions or targets), or through a filter menu for more tailored queries. The filter menu will offer multiple options to search for a particular dataset or product, and will manage queries for both in-situ and remote sensing instruments. Parameters such as start-time, phase angle, and heliocentric distance will be emphasized. A further

  19. The new Planetary Science Archive: A tool for exploration and discovery of scientific datasets from ESA's planetary missions.

    NASA Astrophysics Data System (ADS)

    Heather, David; Besse, Sebastien; Barbarisi, Isa; Arviset, Christophe; de Marchi, Guido; Barthelemy, Maud; Docasal, Ruben; Fraga, Diego; Grotheer, Emmanuel; Lim, Tanya; Macfarlane, Alan; Martinez, Santa; Rios, Carlos

    2016-04-01

    Introduction: The Planetary Science Archive (PSA) is the European Space Agency's (ESA) repository of science data from all planetary science and exploration missions. The PSA provides access to scientific datasets through various interfaces (e.g. FTP browser, Map based, Advanced search, and Machine interface): http://archives.esac.esa.int/psa All datasets are scientifically peer-reviewed by independent scientists, and are compliant with the Planetary Data System (PDS) standards. Updating the PSA: The PSA is currently implementing a number of significant changes, both to its web-based interface to the scientific community, and to its database structure. The new PSA will be up-to-date with versions 3 and 4 of the PDS standards, as PDS4 will be used for ESA's upcoming ExoMars and BepiColombo missions. The newly designed PSA homepage will provide direct access to scientific datasets via a text search for targets or missions. This will significantly reduce the complexity for users to find their data and will promote one-click access to the datasets. Additionally, the homepage will provide direct access to advanced views and searches of the datasets. Users will have direct access to documentation, information and tools that are relevant to the scientific use of the dataset, including ancillary datasets, Software Interface Specification (SIS) documents, and any tools/help that the PSA team can provide. A login mechanism will provide additional functionalities to the users to aid / ease their searches (e.g. saving queries, managing default views). Queries to the PSA database will be possible either via the homepage (for simple searches of missions or targets), or through a filter menu for more tailored queries. The filter menu will offer multiple options to search for a particular dataset or product, and will manage queries for both in-situ and remote sensing instruments. Parameters such as start-time, phase angle, and heliocentric distance will be emphasized. A further

  20. Utilizing Multiple Datasets for Snow Cover Mapping

    NASA Technical Reports Server (NTRS)

    Tait, Andrew B.; Hall, Dorothy K.; Foster, James L.; Armstrong, Richard L.

    1999-01-01

    Snow-cover maps generated from surface data are based on direct measurements; however, they are prone to interpolation errors where climate stations are sparsely distributed. Snow cover is clearly discernible using satellite-attained optical data because of the high albedo of snow, yet the surface is often obscured by cloud cover. Passive microwave (PM) data are unaffected by clouds; however, the snow-cover signature is significantly affected by melting snow, and the microwaves may be transparent to thin snow (less than 3 cm). Both optical and microwave sensors have problems discerning snow beneath forest canopies. This paper describes a method that combines ground and satellite data to produce a Multiple-Dataset Snow-Cover Product (MDSCP). Comparisons with current snow-cover products show that the MDSCP draws together the advantages of each of its component products while minimizing their potential errors. Improved estimates of the snow-covered area are derived through the addition of two snow-cover classes ("thin or patchy" and "high elevation" snow cover) and from the analysis of the climate station data within each class. The compatibility of this method with Moderate Resolution Imaging Spectroradiometer (MODIS) data, which will be available in 2000, is also discussed. With the assimilation of these data, the resolution of the MDSCP would be improved both spatially and temporally and the analysis would become completely automated.
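
    The blending logic can be illustrated as a per-cell decision rule that plays each product's strength against its weaknesses, as described above. The rule order and the station threshold below are simplifications, not the paper's exact procedure.

      # Per-cell blending rule in the spirit of the MDSCP.
      def classify_cell(optical, pm, station_snow_cm):
          """optical: 'snow' | 'no_snow' | 'cloud'; pm: 'snow' | 'no_snow';
          station_snow_cm: interpolated station snow depth (cm), may be None."""
          if optical != "cloud":
              return optical                 # high-albedo snow is clearly visible optically
          if pm == "snow":
              return "snow"
          # PM may miss thin (<3 cm) or melting snow: consult nearby stations.
          if station_snow_cm is not None and 0 < station_snow_cm < 3:
              return "thin_or_patchy"
          return pm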

  1. Evaluation and inter-comparison of modern day reanalysis datasets over Africa and the Middle East

    NASA Astrophysics Data System (ADS)

    Shukla, S.; Arsenault, K. R.; Hobbins, M.; Peters-Lidard, C. D.; Verdin, J. P.

    2015-12-01

    Reanalysis datasets are potentially very valuable for otherwise data-sparse regions such as Africa and the Middle East. They are potentially useful for long-term climate and hydrologic analyses and, given their availability in real time, they are particularly attractive for real-time hydrologic monitoring purposes (e.g. to monitor flood and drought events). Generally in data-sparse regions, reanalysis variables such as precipitation, temperature, radiation and humidity are used in conjunction with in-situ and/or satellite-based datasets to generate long-term gridded atmospheric forcing datasets. These atmospheric forcing datasets are used to drive offline land surface models and simulate soil moisture and runoff, which are natural indicators of hydrologic conditions. Therefore, any uncertainty or bias in the reanalysis datasets contributes to uncertainties in hydrologic monitoring estimates. In this presentation, we report on a comprehensive analysis that evaluates several modern-day reanalysis products (such as NASA's MERRA-1 and -2, ECMWF's ERA-Interim and NCEP's CFS Reanalysis) over Africa and the Middle East region. We compare the precipitation and temperature from the reanalysis products with other independent gridded datasets such as GPCC, CRU, and USGS/UCSB's CHIRPS precipitation datasets, and CRU's temperature datasets. The evaluations are conducted at a monthly time scale, since some of these independent datasets are only available at this temporal resolution. The evaluations range from the comparison of the monthly mean climatology to inter-annual variability and long-term changes. Finally, we also present the results of inter-comparisons of radiation and humidity variables from the different reanalysis datasets.

  2. Simulation of Smart Home Activity Datasets

    PubMed Central

    Synnott, Jonathan; Nugent, Chris; Jeffers, Paul

    2015-01-01

    A globally ageing population is resulting in an increased prevalence of chronic conditions which affect older adults. Such conditions require long-term care and management to maximize quality of life, placing an increasing strain on healthcare resources. Intelligent environments such as smart homes facilitate long-term monitoring of activities in the home through the use of sensor technology. Access to sensor datasets is necessary for the development of novel activity monitoring and recognition approaches. Access to such datasets is limited due to issues such as sensor cost, availability and deployment time. The use of simulated environments and sensors may address these issues and facilitate the generation of comprehensive datasets. This paper provides a review of existing approaches for the generation of simulated smart home activity datasets, including model-based approaches and interactive approaches which implement virtual sensors, environments and avatars. The paper also provides recommendations for future work in intelligent environment simulation. PMID:26087371

  3. Simulation of Smart Home Activity Datasets.

    PubMed

    Synnott, Jonathan; Nugent, Chris; Jeffers, Paul

    2015-06-16

    A globally ageing population is resulting in an increased prevalence of chronic conditions which affect older adults. Such conditions require long-term care and management to maximize quality of life, placing an increasing strain on healthcare resources. Intelligent environments such as smart homes facilitate long-term monitoring of activities in the home through the use of sensor technology. Access to sensor datasets is necessary for the development of novel activity monitoring and recognition approaches. Access to such datasets is limited due to issues such as sensor cost, availability and deployment time. The use of simulated environments and sensors may address these issues and facilitate the generation of comprehensive datasets. This paper provides a review of existing approaches for the generation of simulated smart home activity datasets, including model-based approaches and interactive approaches which implement virtual sensors, environments and avatars. The paper also provides recommendations for future work in intelligent environment simulation.

  4. A polymer dataset for accelerated property prediction and design

    DOE PAGES

    Huan, Tran Doan; Mannodi-Kanakkithodi, Arun; Kim, Chiho; ...

    2016-03-01

    Emerging computation- and data-driven approaches are particularly useful for rationally designing materials with targeted properties. Generally, these approaches rely on identifying structure-property relationships by learning from a dataset of a sufficiently large number of relevant materials. The learned information can then be used to predict the properties of materials not already in the dataset, thus accelerating materials design. Herein, we develop a dataset of 1,073 polymers and related materials and make it available at http://khazana.uconn.edu/. This dataset is uniformly prepared using first-principles calculations with structures obtained either from other sources or by using structure search methods. Because the immediate target of this work is to assist the design of high dielectric constant polymers, it initially includes the optimized structures, atomization energies, band gaps, and dielectric constants. It will be progressively expanded by accumulating new materials and by including additional properties calculated for the optimized structures provided.

  5. Relevancy Ranking of Satellite Dataset Search Results

    NASA Technical Reports Server (NTRS)

    Lynnes, Christopher; Quinn, Patrick; Norton, James

    2017-01-01

    As the variety of Earth science datasets increases, science researchers find it more challenging to discover and select the datasets that best fit their needs. The most common way for search providers to address this problem is to rank the datasets returned for a query by their likely relevance to the user. Large web page search engines typically use text matching supplemented with reverse link counts, semantic annotations and user intent modeling. However, this produces uneven results when applied to dataset metadata records simply externalized as a web page. Fortunately, data and search providers have decades of experience in serving data user communities, allowing them to form heuristics that leverage the structure in the metadata together with knowledge about the user community. Some of these heuristics include specific ways of matching the user input to the essential measurements in the dataset and determining overlaps of time range and spatial areas. Heuristics based on the novelty of the datasets can prioritize later, better versions of data over similar predecessors. And knowledge of how different user types and communities use data can be brought to bear in cases where characteristics of the user (discipline, expertise) or their intent (applications, research) can be divined. The Earth Observing System Data and Information System has begun implementing some of these heuristics in the relevancy algorithm of its Common Metadata Repository search engine.
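
    The following toy scorer illustrates how such heuristics can be combined; the fields, weights, and example records are invented for the sketch and are not the Common Metadata Repository's actual relevancy algorithm.

        from dataclasses import dataclass

        @dataclass
        class Record:
            title: str
            start: int            # temporal coverage in years, for brevity
            end: int
            bbox: tuple           # (west, south, east, north)
            version: int

        def score(r, terms, q_start, q_end, q_bbox):
            text = sum(t.lower() in r.title.lower() for t in terms)
            years = max(0, min(r.end, q_end) - max(r.start, q_start))
            w = max(0.0, min(r.bbox[2], q_bbox[2]) - max(r.bbox[0], q_bbox[0]))
            h = max(0.0, min(r.bbox[3], q_bbox[3]) - max(r.bbox[1], q_bbox[1]))
            # invented weights: text match dominates, newer versions break ties
            return 2.0 * text + 0.5 * years + 0.001 * w * h + 0.3 * r.version

        records = [Record("MODIS Aerosol Optical Thickness", 2000, 2017, (-180, -90, 180, 90), 6),
                   Record("AVHRR Aerosol Climatology", 1981, 1999, (-180, -90, 180, 90), 2)]
        best = max(records, key=lambda r: score(r, ["aerosol"], 2001, 2010, (-20, -40, 55, 40)))
        print(best.title)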

  6. Decibel: The Relational Dataset Branching System

    PubMed Central

    Maddox, Michael; Goehring, David; Elmore, Aaron J.; Madden, Samuel; Parameswaran, Aditya; Deshpande, Amol

    2017-01-01

    As scientific endeavors and data analysis become increasingly collaborative, there is a need for data management systems that natively support the versioning or branching of datasets to enable concurrent analysis, cleaning, integration, manipulation, or curation of data across teams of individuals. Common practice for sharing and collaborating on datasets involves creating or storing multiple copies of the dataset, one for each stage of analysis, with no provenance information tracking the relationships between these datasets. This results not only in wasted storage, but also makes it challenging to track and integrate modifications made by different users to the same dataset. In this paper, we introduce the Relational Dataset Branching System, Decibel, a new relational storage system with built-in version control designed to address these shortcomings. We present our initial design for Decibel and provide a thorough evaluation of three versioned storage engine designs that focus on efficient query processing with minimal storage overhead. We also develop an exhaustive benchmark to enable the rigorous testing of these and future versioned storage engine designs. PMID:28149668

  7. Enhanced Data Discoverability for in Situ Hyperspectral Datasets

    NASA Astrophysics Data System (ADS)

    Rasaiah, B.; Bellman, C.; Hewson, R. D.; Jones, S. D.; Malthus, T. J.

    2016-06-01

    Field spectroscopic metadata is a central component in the quality assurance, reliability, and discoverability of hyperspectral data and the products derived from it. Cataloguing, mining, and interoperability of these datasets rely upon the robustness of metadata protocols for field spectroscopy, and on the software architecture to support the exchange of these datasets. Currently, no standard for in situ spectroscopy data or metadata protocols exists. This inhibits the effective sharing of growing volumes of in situ spectroscopy datasets and the exploitation of the benefits of integrating with the evolving range of data sharing platforms. A core metadataset for field spectroscopy was introduced by Rasaiah et al. (2011-2015), with extended support for specific applications. This paper presents a prototype model for an OGC- and ISO-compliant, platform-independent metadata discovery service aligned to the specific requirements of field spectroscopy. In this study, a proof-of-concept metadata catalogue has been described and deployed in a cloud-based architecture as a demonstration of an operationalized field spectroscopy metadata standard and web-based discovery service.

  8. Interoperability of Multiple Datasets with JMARS

    NASA Astrophysics Data System (ADS)

    Smith, M. E.; Christensen, P. R.; Noss, D.; Anwar, S.; Dickenshied, S.

    2012-12-01

    Planetary science includes all celestial bodies, including Earth. However, in Geographic Information System (GIS) applications, Earth and the other planetary bodies tend to be treated separately. One reason is that we have been studying Earth's properties far longer than those of the other planetary bodies; therefore, the archive of geographic coordinate systems (GCS) and projections is much larger. The first latitude and longitude system for Earth was devised by Eratosthenes (276-194 BC), who was also the first to calculate the circumference of the Earth. As time went on, scientists continued to re-measure the Earth on both local and global scales, which has created a large collection of projections and geographic coordinate systems to choose from. The variety of options can make it a time-consuming task to determine which GCS or projection applies to each dataset and how to convert to the correct GCS or projection. Another issue arises when determining whether a dataset should be referenced to a geocentric sphere or a geodetic spheroid, which measure latitude differently. This can lead to inconsistent results and frustration for the user. This is not the case with other planetary bodies. Although the existence of the other planets has been known since early Babylonian times, accurate values for their rotation, size and geologic properties were not established until several hundred years later. Therefore, the options for projections or GCS's are far fewer than for Earth data. Even then, the projection and GCS options for other celestial bodies are informal, so it can be hard for the user to determine which projection or GCS to apply to the other planets. JMARS (Java Mission Analysis for Remote Sensing) is an open source suite that was developed by Arizona State University's Mars Space Flight Facility. The beauty of JMARS is that the tool transforms all datasets behind the scenes
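
    The geodetic/geocentric distinction mentioned above reduces to a one-line conversion; a quick Python sketch for the WGS84 ellipsoid (the flattening value is the standard one, the function name is ours). On a sphere the two latitudes coincide, which is why silently mixing conventions shifts features.

        import math

        def geocentric_lat(geodetic_lat_deg, f=1 / 298.257223563):  # WGS84 flattening
            e2 = f * (2.0 - f)                    # first eccentricity squared
            phi = math.radians(geodetic_lat_deg)
            # tan(lat_geocentric) = (1 - e^2) * tan(lat_geodetic)
            return math.degrees(math.atan((1.0 - e2) * math.tan(phi)))

        print(geocentric_lat(45.0))  # ~44.81 deg: roughly a 21 km shift on the ground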

  9. Transforming a research-oriented dataset for evaluation of tactical information extraction technologies

    NASA Astrophysics Data System (ADS)

    Roy, Heather; Kase, Sue E.; Knight, Joanne

    2016-05-01

    The most representative and accurate data for testing and evaluating information extraction technologies is real-world data. Real-world operational data can provide important insights into human and sensor characteristics, interactions, and behavior. However, several challenges limit the feasibility of experimentation with real-world operational data. Real-world data lacks the precise knowledge of a "ground truth," a critical factor for benchmarking progress of developing automated information processing technologies. Additionally, the use of real-world data is often limited by classification restrictions due to the methods of collection, procedures for processing, and tactical sensitivities related to the sources, events, or objects of interest. These challenges, along with an increase in the development of automated information extraction technologies, are fueling an emerging demand for operationally-realistic datasets for benchmarking. An approach to meet this demand is to create synthetic datasets, which are operationally realistic yet unclassified in content. The unclassified nature of these synthetic datasets facilitates the sharing of data between military and academic researchers, thus increasing coordinated testing efforts. This paper describes the expansion and augmentation of two synthetic text datasets, one initially developed through academic research collaborations with the Army. Both datasets feature simulated tactical intelligence reports regarding fictitious terrorist activity occurring within a counterinsurgency (COIN) operation. The datasets were expanded and augmented to create two military-relevant datasets. The first resulting dataset was created by augmenting and merging the two to create a single larger dataset containing ground truth. The second resulting dataset was restructured to more realistically represent the format and content of intelligence reports. The dataset transformation effort, the final datasets, and their

  10. Spatial Evolution of Openstreetmap Dataset in Turkey

    NASA Astrophysics Data System (ADS)

    Zia, M.; Seker, D. Z.; Cakir, Z.

    2016-10-01

    A large amount of research has already been done on many aspects of the OpenStreetMap (OSM) dataset in recent years for developed countries and major world cities. On the other hand, limited work is present in the scientific literature for developing or underdeveloped ones because of poor data coverage. In the presented study, it is demonstrated how the Turkey-OSM dataset has spatially evolved over an eight-year time span (2007-2015) throughout the country. It is observed that there is an east-west spatial bias in OSM feature density across the country. Population density and literacy level are found to be the two main governing factors controlling this spatial trend. Future research may consider contributor involvement and assessments of dataset health.

  11. Harvard Aging Brain Study: Dataset and accessibility.

    PubMed

    Dagley, Alexander; LaPoint, Molly; Huijbers, Willem; Hedden, Trey; McLaren, Donald G; Chatwal, Jasmeer P; Papp, Kathryn V; Amariglio, Rebecca E; Blacker, Deborah; Rentz, Dorene M; Johnson, Keith A; Sperling, Reisa A; Schultz, Aaron P

    2017-01-01

    The Harvard Aging Brain Study is sharing its data with the global research community. The longitudinal dataset consists of a 284-subject cohort with the following modalities acquired: demographics, clinical assessment, comprehensive neuropsychological testing, clinical biomarkers, and neuroimaging. To promote more extensive analyses, the imaging data were designed to be compatible with other publicly available datasets. A cloud-based system enables interested researchers to access the blinded data, contingent upon completion of a data usage agreement and administrative approval. Data collection is ongoing and currently in its fifth year.

  12. Introduction of a simple-model-based land surface dataset for Europe

    NASA Astrophysics Data System (ADS)

    Orth, Rene; Seneviratne, Sonia I.

    2015-04-01

    Land surface hydrology is important because it can play a crucial role during extreme events such as droughts, floods and even heat waves. We introduce in this study a new hydrological dataset for the European continent that consists of soil moisture, runoff and evapotranspiration. It is derived with a simple water balance model (SWBM) forced with precipitation, temperature and net radiation. The SWBM dataset covers Europe and extends over the period 1984-2013 with a daily time step and 0.5° x 0.5° resolution. We employ a novel approach to calibrate the model, whereby we consider 300 random parameter sets chosen from an observation-based range. Using several independent validation datasets representing soil moisture (or terrestrial water content), evapotranspiration and streamflow, we identify the best-performing parameter set and hence the new dataset. To illustrate its usefulness, the SWBM dataset is compared against ERA-Interim/Land and simulations of the Community Land Model Version 4, using all validation datasets as reference. For soil moisture dynamics, it outperforms the benchmarks. Therefore the SWBM soil moisture dataset constitutes a reasonable alternative to sparse measurements, little-validated model results, or proxy data such as precipitation indices. In terms of runoff, the SWBM dataset also performs well against the benchmarks; all show a slight dry bias, probably due to underestimated precipitation in the forcing. The evaluation of the SWBM evapotranspiration dataset is overall satisfactory, but the dynamics are less well captured for this variable. This highlights the limitations of the dataset, as it is based on a simple model that uses uniform parameter values. Hence some processes impacting evapotranspiration dynamics may not be captured, and quality issues may occur in regions with complex terrain. Furthermore, we investigate the sources of skill of the SWBM dataset and find that the parameter set has a similar impact on the
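
    A minimal sketch of the calibration strategy described above, with a toy bucket model standing in for the SWBM and correlation as the skill score; the parameter names, ranges, and model equations are illustrative only, not the authors' code.

        import numpy as np

        rng = np.random.default_rng(0)

        def run_model(precip, max_storage_mm, runoff_exp):
            """Toy bucket model: runoff fraction grows with relative wetness."""
            storage, runoff = 0.5 * max_storage_mm, []
            for p in precip:
                q = p * (storage / max_storage_mm) ** runoff_exp
                storage = min(max_storage_mm, storage + p - q)
                runoff.append(q)
            return np.asarray(runoff)

        precip = rng.gamma(2.0, 2.0, size=365)                         # synthetic forcing
        obs = run_model(precip, 250.0, 4.0) + rng.normal(0, 0.1, 365)  # pseudo-observations

        def sample():
            return {"max_storage_mm": rng.uniform(50, 500), "runoff_exp": rng.uniform(1, 8)}

        # 300 random draws from observation-based ranges; keep the best-validating set
        best = max((sample() for _ in range(300)),
                   key=lambda p: np.corrcoef(run_model(precip, **p), obs)[0, 1])
        print(best)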

  13. Food additives

    PubMed Central

    Spencer, Michael

    1974-01-01

    Food additives are discussed from the food technology point of view. The reasons for their use are summarized: (1) to protect food from chemical and microbiological attack; (2) to even out seasonal supplies; (3) to improve their eating quality; (4) to improve their nutritional value. The various types of food additives are considered, e.g. colours, flavours, emulsifiers, bread and flour additives, preservatives, and nutritional additives. The paper concludes with consideration of those circumstances in which the use of additives is (a) justified and (b) unjustified. PMID:4467857

  14. The Role of Datasets on Scientific Influence within Conflict Research

    PubMed Central

    Van Holt, Tracy; Johnson, Jeffery C.; Moates, Shiloh; Carley, Kathleen M.

    2016-01-01

    We inductively tested whether a coherent field of inquiry in human conflict research emerged in an analysis of published research involving "conflict" in the Web of Science (WoS) over a 66-year period (1945-2011). We created a citation network that linked the 62,504 WoS records and their cited literature. We performed a critical path analysis (CPA), a specialized social network analysis, on this citation network (~1.5 million works) to highlight the main contributions in conflict research and to test whether research on conflict has in fact evolved to represent a coherent field of inquiry. Out of this vast dataset, 49 academic works were highlighted by the CPA, suggesting a coherent field of inquiry, which means that researchers in the field acknowledge seminal contributions and share a common knowledge base. Other conflict concepts that were also analyzed (such as interpersonal conflict or conflict among pharmaceuticals) did not form their own CP. A single path formed, meaning that there was a cohesive set of ideas that built upon previous research. This contrasts with a main path analysis of conflict from 1957-1971, where ideas did not persist: multiple paths existed and died or emerged, reflecting a lack of scientific coherence (Carley, Hummon, and Harty, 1993). The critical path consisted of a number of key features: 1) Concepts that built throughout include the notion that resource availability drives conflict, which emerged in the 1960s-1990s and continued on until 2011. More recent intrastate studies that focused on inequalities emerged from interstate studies on the democratic peace earlier on the path. 2) Recent research on the path focused on forecasting conflict, which depends on well-developed metrics and theories to model. 3) We used keyword analysis to independently show how the CP was topically linked (i.e., through democracy, modeling, resources, and geography). Publicly available conflict datasets developed early on helped

  15. Identification of druggable cancer driver genes amplified across TCGA datasets.

    PubMed

    Chen, Ying; McGee, Jeremy; Chen, Xianming; Doman, Thompson N; Gong, Xueqian; Zhang, Youyan; Hamm, Nicole; Ma, Xiwen; Higgs, Richard E; Bhagwat, Shripad V; Buchanan, Sean; Peng, Sheng-Bin; Staschke, Kirk A; Yadav, Vipin; Yue, Yong; Kouros-Mehr, Hosein

    2014-01-01

    The Cancer Genome Atlas (TCGA) projects have advanced our understanding of the driver mutations, genetic backgrounds, and key pathways activated across cancer types. Analyses of TCGA datasets have mostly focused on somatic mutations and translocations, with less emphasis placed on gene amplifications. Here we describe a bioinformatics screening strategy to identify putative cancer driver genes amplified across TCGA datasets. We carried out GISTIC2 analysis of TCGA datasets spanning 16 cancer subtypes and identified 486 genes that were amplified in two or more datasets. The list was narrowed to 75 cancer-associated genes with potential "druggable" properties. The majority of the genes were localized to 14 amplicons spread across the genome. To identify potential cancer driver genes, we analyzed gene copy number and mRNA expression data from individual patient samples and identified 42 putative cancer driver genes linked to diverse oncogenic processes. Oncogenic activity was further validated by siRNA/shRNA knockdown and by referencing the Project Achilles datasets. The amplified genes represented a number of gene families, including epigenetic regulators, cell cycle-associated genes, DNA damage response/repair genes, metabolic regulators, and genes linked to the Wnt, Notch, Hedgehog, JAK/STAT, NF-κB and MAPK signaling pathways. Among the 42 putative driver genes were known driver genes, such as EGFR, ERBB2 and PIK3CA. Wild-type KRAS was amplified in several cancer types, and KRAS-amplified cancer cell lines were most sensitive to KRAS shRNA, suggesting that KRAS amplification was an independent oncogenic event. A number of MAP kinase adapters were co-amplified with their receptor tyrosine kinases, such as the FGFR adapter FRS2 and the EGFR family adapters GRB2 and GRB7. The ubiquitin-like ligase DCUN1D1 and the histone methyltransferase NSD3 were also identified as novel putative cancer driver genes. We discuss the patient tailoring implications for existing cancer
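
    The screening logic can be illustrated with a small pandas sketch; the gene names, thresholds, and toy tables below are invented and do not reproduce the paper's GISTIC2 pipeline.

        import pandas as pd

        # amp[g, d] = True if gene g is called amplified in dataset d (toy values)
        amp = pd.DataFrame({"BRCA": [True, True, False], "LUAD": [True, False, True]},
                           index=["EGFR", "KRAS", "NSD3"])
        recurrent = amp.index[amp.sum(axis=1) >= 2]    # amplified in >= 2 datasets

        # per-sample copy number vs expression for one candidate gene (toy values)
        samples = pd.DataFrame({"copy_number": [2, 3, 5, 6, 8],
                                "mrna": [1.0, 1.4, 2.9, 3.2, 4.1]})
        dosage_driven = samples["copy_number"].corr(samples["mrna"]) > 0.6
        print(list(recurrent), dosage_driven)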

  16. Thesaurus Dataset of Educational Technology in Chinese

    ERIC Educational Resources Information Center

    Wu, Linjing; Liu, Qingtang; Zhao, Gang; Huang, Huan; Huang, Tao

    2015-01-01

    The thesaurus dataset of educational technology is a knowledge description of educational technology in Chinese. The aims of this thesaurus were to collect the subject terms in the domain of educational technology, facilitate the standardization of terminology and promote the communication between Chinese researchers and scholars from various…

  17. Future weather dataset for fourteen UK sites.

    PubMed

    Liu, Chunde

    2016-09-01

    This future weather dataset is used for assessing the risk of overheating, thermal discomfort or heat stress in free-running buildings. The weather files are in .epw format, which can be used in building simulation packages such as EnergyPlus, DesignBuilder, IES, etc.

  18. Interpolation of diffusion weighted imaging datasets.

    PubMed

    Dyrby, Tim B; Lundell, Henrik; Burke, Mark W; Reislev, Nina L; Paulson, Olaf B; Ptito, Maurice; Siebner, Hartwig R

    2014-12-01

    Diffusion weighted imaging (DWI) is used to study white-matter fibre organisation, orientation and structural connectivity by means of fibre reconstruction algorithms and tractography. In clinical settings, limited scan time compromises the achievable image resolution for finer anatomical details and the signal-to-noise ratio for reliable fibre reconstruction. We assessed the potential benefits of interpolating DWI datasets to a higher image resolution before fibre reconstruction using a diffusion tensor model. Simulations of straight and curved crossing tracts smaller than or equal to the voxel size showed that conventional higher-order interpolation methods improved the geometrical representation of white-matter tracts with reduced partial-volume effect (PVE), except at tract boundaries. Simulations and interpolation of ex-vivo monkey brain DWI datasets revealed that conventional interpolation methods fail to disentangle fine anatomical details if PVE is too pronounced in the original data. For validation, we used ex-vivo DWI datasets acquired at various image resolutions, as well as Nissl-stained sections. Increasing the image resolution by a factor of eight yielded finer geometrical resolution and more anatomical details in complex regions such as tract boundaries and cortical layers, which are normally only visualized at higher image resolutions. Similar results were found with a typical clinical human DWI dataset. However, a possible bias in quantitative values imposed by the interpolation method used should be considered. The results indicate that conventional interpolation methods can be successfully applied to DWI datasets for mining anatomical details that are normally seen only at higher resolutions, which will aid in tractography and microstructural mapping of tissue compartments.
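
    A hedged sketch of the pre-reconstruction upsampling step evaluated above, using SciPy's spline-based zoom on a random stand-in for a 4-D DWI array (one volume per gradient direction); the factor and array sizes are arbitrary for the sketch.

        import numpy as np
        from scipy.ndimage import zoom

        dwi = np.random.rand(32, 32, 20, 7)        # x, y, z, gradient directions
        factor = 2                                  # 2x per spatial axis = 8x more voxels
        hires = zoom(dwi, (factor, factor, factor, 1), order=3)  # cubic B-spline
        print(dwi.shape, "->", hires.shape)         # (64, 64, 40, 7)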

  19. LOFT input dataset reference document for RELAP5 validation studies

    SciTech Connect

    Birchley, J.C.

    1992-04-01

    Analyses of LOFT experiment data are being carried out in order to validate the RELAP5 computer code for future application to PWR plant analysis. The MOD1 dataset was also used by CEGB Barnwood, who subsequently converted the dataset to run with MOD2. The modifications included changes to the nodalisation to take advantage of the crossflow junction option at appropriate locations. Additional pipework representation was introduced for breaks in the intact (or active) loop. Further changes have been made by Winfrith following discussion of calculations performed by the CEGB and Winfrith. These concern the degree of noding in the steam generator, the fluid volume of the steam generator downcomer, and the location of the reactor vessel downcomer bypass path. This document describes the dataset contents relating to the volume, junction, and heat slab data for the intact loop, reactor pressure vessel, broken loop, steam generator secondary, and ECC system. Also described are the control system for steady-state initialization, standard trip settings, and boundary conditions.

  20. A high-throughput system for high-quality tomographic reconstruction of large datasets at Diamond Light Source.

    PubMed

    Atwood, Robert C; Bodey, Andrew J; Price, Stephen W T; Basham, Mark; Drakopoulos, Michael

    2015-06-13

    Tomographic datasets collected at synchrotrons are becoming very large and complex, and, therefore, need to be managed efficiently. Raw images may have high pixel counts, and each pixel can be multidimensional and associated with additional data such as those derived from spectroscopy. In time-resolved studies, hundreds of tomographic datasets can be collected in sequence, yielding terabytes of data. Users of tomographic beamlines are drawn from various scientific disciplines, and many are keen to use tomographic reconstruction software that does not require a deep understanding of reconstruction principles. We have developed Savu, a reconstruction pipeline that enables users to rapidly reconstruct data to consistently create high-quality results. Savu is designed to work in an 'orthogonal' fashion, meaning that data can be converted between projection and sinogram space throughout the processing workflow as required. The Savu pipeline is modular and allows processing strategies to be optimized for users' purposes. In addition to the reconstruction algorithms themselves, it can include modules for identification of experimental problems, artefact correction, general image processing and data quality assessment. Savu is open source, open licensed and 'facility-independent': it can run on standard cluster infrastructure at any institution.

  1. Comparison of IGBP DISCover land cover dataset with a land cover dataset in China

    NASA Astrophysics Data System (ADS)

    Chen, Hua; Zhuang, Dafang

    2004-09-01

    Land cover information is important for the study of physical, chemical, biological and anthropological processes on the surface of the Earth. Remote sensing data have been used to produce land cover maps by visual interpretation or automatic classification methods over the past years. The IGBP DISCover land cover dataset (IDLCD) is a recent global land cover dataset based on remote sensing. Firstly, we present a method to compare different land cover datasets based on invariant reliable land units. Secondly, we compare the IDLCD with a Chinese land cover dataset (CLCD). Finally, we analyze the possible reasons for the differences among the land cover classifications. The comparison results show that most of the land surface in China was identified as different types in the two datasets. For example, 63.7% of the deciduous needleleaf forest units in CLCD are mapped to mixed forest by IDLCD. The different classification schemes and methods used in these datasets are the most likely explanation for the differences between them.

  2. FTSPlot: Fast Time Series Visualization for Large Datasets

    PubMed Central

    Riss, Michael

    2014-01-01

    The analysis of electrophysiological recordings often involves visual inspection of time series data to locate specific experiment epochs, mask artifacts, and verify the results of signal processing steps, such as filtering or spike detection. Long-term experiments with continuous data acquisition generate large amounts of data. Rapid browsing through these massive datasets poses a challenge to conventional data plotting software because the plotting time increases proportionately to the increase in the volume of data. This paper presents FTSPlot, which is a visualization concept for large-scale time series datasets using techniques from the field of high performance computer graphics, such as hierarchic level of detail and out-of-core data handling. In a preprocessing step, time series data, event, and interval annotations are converted into an optimized data format, which then permits fast, interactive visualization. The preprocessing step has a computational complexity of O(n log n); the visualization itself can be done with constant complexity and is therefore independent of the amount of data. A demonstration prototype has been implemented and benchmarks show that the technology is capable of displaying large amounts of time series data, event, and interval annotations lag-free, with response times of a few milliseconds. The current 64-bit implementation theoretically supports datasets with up to 2^64 bytes; on the x86_64 architecture, currently up to 2^48 bytes are supported, and benchmarks have been conducted with 2^40 bytes/1 TiB of double precision samples. The presented software is freely available and can be included as a Qt GUI component in future software projects, providing a standard visualization method for long-term electrophysiological experiments. PMID:24732865
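
    The hierarchic level-of-detail idea can be sketched in a few lines: precompute min/max pairs per block at successively coarser resolutions, so that rendering any zoom level touches a bounded number of samples. The block size and signal below are arbitrary choices for the sketch, not FTSPlot's actual format.

        import numpy as np

        def build_lod_pyramid(x, block=2):
            """Precompute per-block (min, max) envelopes at coarser and coarser scales."""
            pyramid, lo, hi = [], np.asarray(x), np.asarray(x)
            while lo.size > 1:
                n = lo.size // block * block
                lo = lo[:n].reshape(-1, block).min(axis=1)
                hi = hi[:n].reshape(-1, block).max(axis=1)
                pyramid.append((lo, hi))          # level k bins block**(k+1) raw samples
            return pyramid

        signal = np.sin(np.linspace(0, 400 * np.pi, 1_000_000))
        pyramid = build_lod_pyramid(signal)
        print([level[0].size for level in pyramid][:5])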

  3. Spatially-based quality control for daily precipitation datasets

    NASA Astrophysics Data System (ADS)

    Serrano-Notivoli, Roberto; de Luis, Martín; Beguería, Santiago; Ángel Saz, Miguel

    2016-04-01

    There are many reasons why erroneous data can appear in original precipitation datasets, but their common characteristic is that none of them correspond to the natural variability of the climate variable. For this reason, a comprehensive analysis of the data from each station on each day is necessary to be certain that the final dataset will be consistent and reliable. Most quality control techniques applied to daily precipitation are based on the comparison of each observed value with the rest of the values in the same series or in reference series built from the nearest stations. These methods are inherited from monthly precipitation studies, but at the daily scale the variability is greater and the methods have to be different. A characteristic shared by all of these approaches is that they make reconstructions based on the best-correlated reference series, which could be a biased decision because, for example, an extreme precipitation event occurring on the same day at more than one station could be flagged as erroneous. We propose a method based on the specific conditions of the day and location to determine the reliability of each observation. This method preserves the local variance of the variable and its independence from the time structure. To do so, individually for each daily value, we first compute the probability of precipitation occurrence through a multivariate logistic regression using the 10 nearest observations in binomial form (0 = dry; 1 = wet), which produces a binomial prediction (PB) between 0 and 1. Then, we compute a prediction of precipitation magnitude (PM) with the raw data of the same 10 nearest observations. Through these predictions we examine the original data for each day and location by five criteria: 1) suspect data; 2) suspect zero; 3) suspect outlier; 4) suspect wet; and 5) suspect dry. Tests over different datasets showed that the flagged data depend mainly on the number of available data and their homogeneous distribution.
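
    A minimal sketch of the occurrence step, assuming toy data in place of a real station network: fit a logistic model on the 10 nearest stations' wet/dry flags and read off the binomial prediction PB for a new day. The flagging threshold is invented for the sketch.

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(1)
        # rows = historical days, columns = wet/dry flags at the 10 nearest stations
        neighbours = rng.integers(0, 2, size=(200, 10))
        target = (neighbours.mean(axis=1) + rng.normal(0, 0.2, 200) > 0.5).astype(int)

        model = LogisticRegression().fit(neighbours, target)
        today = rng.integers(0, 2, size=(1, 10))
        pb = model.predict_proba(today)[0, 1]      # binomial prediction PB, 0..1
        # e.g. a reported dry day with PB near 1 would be flagged "suspect zero"
        print(round(pb, 3))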

  4. Benchmarking Spike-Based Visual Recognition: A Dataset and Evaluation

    PubMed Central

    Liu, Qian; Pineda-García, Garibaldi; Stromatias, Evangelos; Serrano-Gotarredona, Teresa; Furber, Steve B.

    2016-01-01

    Today, increasing attention is being paid to research into spike-based neural computation both to gain a better understanding of the brain and to explore biologically-inspired computation. Within this field, the primate visual pathway and its hierarchical organization have been extensively studied. Spiking Neural Networks (SNNs), inspired by the understanding of observed biological structure and function, have been successfully applied to visual recognition and classification tasks. In addition, implementations on neuromorphic hardware have enabled large-scale networks to run in (or even faster than) real time, making spike-based neural vision processing accessible on mobile robots. Neuromorphic sensors such as silicon retinas are able to feed such mobile systems with real-time visual stimuli. A new set of vision benchmarks for spike-based neural processing are now needed to measure progress quantitatively within this rapidly advancing field. We propose that a large dataset of spike-based visual stimuli is needed to provide meaningful comparisons between different systems, and a corresponding evaluation methodology is also required to measure the performance of SNN models and their hardware implementations. In this paper we first propose an initial NE (Neuromorphic Engineering) dataset based on standard computer vision benchmarks and using digits from the MNIST database. This dataset is compatible with the state of current research on spike-based image recognition. The corresponding spike trains are produced using a range of techniques: rate-based Poisson spike generation, rank order encoding, and recorded output from a silicon retina with both flashing and oscillating input stimuli. In addition, a complementary evaluation methodology is presented to assess both model-level and hardware-level performance. Finally, we demonstrate the use of the dataset and the evaluation methodology using two SNN models to validate the performance of the models and their hardware

  5. Benchmarking Spike-Based Visual Recognition: A Dataset and Evaluation.

    PubMed

    Liu, Qian; Pineda-García, Garibaldi; Stromatias, Evangelos; Serrano-Gotarredona, Teresa; Furber, Steve B

    2016-01-01

    Today, increasing attention is being paid to research into spike-based neural computation both to gain a better understanding of the brain and to explore biologically-inspired computation. Within this field, the primate visual pathway and its hierarchical organization have been extensively studied. Spiking Neural Networks (SNNs), inspired by the understanding of observed biological structure and function, have been successfully applied to visual recognition and classification tasks. In addition, implementations on neuromorphic hardware have enabled large-scale networks to run in (or even faster than) real time, making spike-based neural vision processing accessible on mobile robots. Neuromorphic sensors such as silicon retinas are able to feed such mobile systems with real-time visual stimuli. A new set of vision benchmarks for spike-based neural processing are now needed to measure progress quantitatively within this rapidly advancing field. We propose that a large dataset of spike-based visual stimuli is needed to provide meaningful comparisons between different systems, and a corresponding evaluation methodology is also required to measure the performance of SNN models and their hardware implementations. In this paper we first propose an initial NE (Neuromorphic Engineering) dataset based on standard computer vision benchmarks and using digits from the MNIST database. This dataset is compatible with the state of current research on spike-based image recognition. The corresponding spike trains are produced using a range of techniques: rate-based Poisson spike generation, rank order encoding, and recorded output from a silicon retina with both flashing and oscillating input stimuli. In addition, a complementary evaluation methodology is presented to assess both model-level and hardware-level performance. Finally, we demonstrate the use of the dataset and the evaluation methodology using two SNN models to validate the performance of the models and their hardware

  6. Derivation of HLA types from shotgun sequence datasets.

    PubMed

    Warren, René L; Choe, Gina; Freeman, Douglas J; Castellarin, Mauro; Munro, Sarah; Moore, Richard; Holt, Robert A

    2012-01-01

    The human leukocyte antigen (HLA) system is key to many aspects of human physiology and medicine. All current sequence-based HLA typing methodologies are targeted approaches requiring the amplification of specific HLA gene segments. Whole genome, exome and transcriptome shotgun sequencing can generate prodigious data, but due to the complexity of the HLA loci, these data have not been immediately informative regarding HLA genotype. We describe HLAminer, a computational method for identifying HLA alleles directly from shotgun sequence datasets (http://www.bcgsc.ca/platform/bioinfo/software/hlaminer). This approach circumvents the additional time and cost of generating HLA-specific data and capitalizes on the increasing accessibility and affordability of massively parallel sequencing.

  7. Method of generating features optimal to a dataset and classifier

    DOEpatents

    Bruillard, Paul J.; Gosink, Luke J.; Jarman, Kenneth D.

    2016-10-18

    A method of generating features optimal to a particular dataset and classifier is disclosed. A dataset of messages is inputted and a classifier is selected. An algebra of features is encoded. Computable features that are capable of describing the dataset from the algebra of features are selected. Irredundant features that are optimal for the classifier and the dataset are selected.

  8. Automatic processing of multimodal tomography datasets.

    PubMed

    Parsons, Aaron D; Price, Stephen W T; Wadeson, Nicola; Basham, Mark; Beale, Andrew M; Ashton, Alun W; Mosselmans, J Frederick W; Quinn, Paul D

    2017-01-01

    With the development of fourth-generation high-brightness synchrotrons on the horizon, the already large volume of data that will be collected on imaging and mapping beamlines is set to increase by orders of magnitude. As such, an easy and accessible way of dealing with such large datasets as quickly as possible is required in order to be able to address the core scientific problems during the experimental data collection. Savu is an accessible and flexible big data processing framework that is able to deal with both the variety and the volume of multimodal and multidimensional scientific dataset output, such as that from chemical tomography experiments on the I18 microfocus scanning beamline at Diamond Light Source.

  9. Automatic processing of multimodal tomography datasets

    PubMed Central

    Parsons, Aaron D.; Price, Stephen W. T.; Wadeson, Nicola; Basham, Mark; Beale, Andrew M.; Ashton, Alun W.; Mosselmans, J. Frederick. W.; Quinn, Paul. D.

    2017-01-01

    With the development of fourth-generation high-brightness synchrotrons on the horizon, the already large volume of data that will be collected on imaging and mapping beamlines is set to increase by orders of magnitude. As such, an easy and accessible way of dealing with such large datasets as quickly as possible is required in order to be able to address the core scientific problems during the experimental data collection. Savu is an accessible and flexible big data processing framework that is able to deal with both the variety and the volume of multimodal and multidimensional scientific dataset output, such as that from chemical tomography experiments on the I18 microfocus scanning beamline at Diamond Light Source. PMID:28009564

  10. Global Precipitation Measurement: Methods, Datasets and Applications

    NASA Technical Reports Server (NTRS)

    Tapiador, Francisco; Turk, Francis J.; Petersen, Walt; Hou, Arthur Y.; Garcia-Ortega, Eduardo; Machado, Luiz, A. T.; Angelis, Carlos F.; Salio, Paola; Kidd, Chris; Huffman, George J.; De Castro, Manuel

    2011-01-01

    This paper reviews the many aspects of precipitation measurement that are relevant to providing an accurate global assessment of this important environmental parameter. Methods discussed include ground data, satellite estimates and numerical models. First, the methods for measuring, estimating, and modeling precipitation are discussed. Then, the most relevant datasets gathering precipitation information from those three sources are presented. The third part of the paper illustrates a number of the many applications of those measurements and databases. The aim of the paper is to organize the many links and feedbacks between precipitation measurement, estimation and modeling, indicating the uncertainties and limitations of each technique in order to identify areas requiring further attention, and to show the limits within which datasets can be used.

  11. Food additives

    MedlinePlus

    ... or natural. Natural food additives include: Herbs or spices to add flavor to foods Vinegar for pickling ... Certain colors improve the appearance of foods. Many spices, as well as natural and man-made flavors, ...

  12. Haplotype estimation for biobank scale datasets

    PubMed Central

    O’Connell, Jared; Sharp, Kevin; Shrine, Nick; Wain, Louise; Hall, Ian; Tobin, Martin; Zagury, Jean-Francois; Delaneau, Olivier; Marchini, Jonathan

    2016-01-01

    The UK Biobank (UKB) has recently released genotypes on 152,328 individuals together with extensive phenotypic and lifestyle information. We present a new phasing method, SHAPEIT3, that can handle such biobank-scale datasets and results in switch error rates as low as ~0.3%. The method exhibits O(N log N) scaling in sample size (N), enabling fast and accurate phasing of even larger cohorts. PMID:27270105
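
    Switch error rate, the metric quoted above, is commonly computed by counting flips in phase orientation between consecutive heterozygous sites; a small sketch (0/1 encode the two alleles at each het site, and the toy haplotypes are invented).

        def switch_error_rate(true_hap, est_hap):
            """Fraction of consecutive het-site pairs whose phase orientation flips."""
            orient = [t == e for t, e in zip(true_hap, est_hap)]
            switches = sum(a != b for a, b in zip(orient, orient[1:]))
            return switches / max(1, len(orient) - 1)

        # 2 orientation flips across 4 adjacent pairs -> 0.5
        print(switch_error_rate([0, 1, 1, 0, 1], [0, 1, 0, 1, 1]))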

  13. Evaluating AEROCOM Models with Remote Sensing Datasets

    NASA Astrophysics Data System (ADS)

    Schutgens, N.; Gryspeerdt, E.; Weigum, N.; Veira, A.; Partridge, D.; Stier, P.

    2014-12-01

    We present an in-depth evaluation of AEROCOM models with a variety of remote sensing datasets: MODIS AOT (& AE over ocean); AERONET AOT, AE & SSA; and Maritime Aerosol Network (MAN) AOT & AE. Together these datasets provide extensive global and temporal coverage and measure both extensive (AOT) and intensive aerosol properties (AE & SSA). Models and observations differ strongly in their spatio-temporal sampling. Model results are typical of large gridboxes (100 by 100 km), while observations are made over much smaller areas (10 by 10 km for MODIS, even smaller for AERONET and MAN). Model results are always available, in contrast to observations, which are intermittent due to orbital constraints, retrieval limitations and instrument failure/maintenance. We find that differences in AOT due to sampling effects can be 100% for instantaneous values and can still be 40% for monthly or yearly averages. Such differences are comparable to or larger than typical retrieval errors in the observations. We propose strategies (temporal colocation, spatial aggregation) for reducing these sampling errors. Finally, we evaluate one year of co-located AOT, AE and SSA from several AEROCOM models against MODIS, AERONET and MAN observations. Where the observational datasets overlap, they give similar results, but in general they allow us to evaluate models in very different spatio-temporal domains. We show that even small datasets like MAN AOT or AERONET SSA provide a useful standard for evaluating models thanks to temporal colocation. The models differ quite a bit from the observations, and each model differs in its own way. These results are presented through global maps of yearly averaged differences, time series of modelled and observed data, scatter plots of correlations among observables (e.g. SSA vs AE) and Taylor diagrams. In particular, we find that the AEROCOM emissions substantially underestimate wildfire emissions and that many models have aerosol that is too absorbing.
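
    The temporal-colocation strategy proposed above amounts to averaging model values only where observations exist; a toy sketch with synthetic arrays standing in for daily model AOT and an intermittent observation record.

        import numpy as np

        rng = np.random.default_rng(2)
        model_aot = rng.random(365)                   # daily model AOT
        observed = rng.random(365) < 0.3              # ~30% of days have an observation

        naive_mean = model_aot.mean()                 # averages all model days
        colocated_mean = model_aot[observed].mean()   # averages observed days only
        print(round(naive_mean, 3), round(colocated_mean, 3))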

  14. Dataset Curation through Renders and Ontology Matching

    DTIC Science & Technology

    2015-09-01

    only of a machine learning hacker, but also those of natural language analysis, human computer interaction, library studies, behavioral economics... representation for a task. Similar to a human in a country in which she does not know the language, it has done the best it can – learn that some words are... benefits of automated labeled dataset creation for fine-grained visual learning tasks. Specifically, we show that utilizing real-world, non-image

  15. The combined inhibitory effect of the adenosine A1 and cannabinoid CB1 receptors on cAMP accumulation in the hippocampus is additive and independent of A1 receptor desensitization.

    PubMed

    Serpa, André; Correia, Sara; Ribeiro, Joaquim A; Sebastião, Ana M; Cascalheira, José F

    2015-01-01

    Adenosine A1 and cannabinoid CB1 receptors are highly expressed in hippocampus where they trigger similar transduction pathways. We investigated how the combined acute activation of A1 and CB1 receptors modulates cAMP accumulation in rat hippocampal slices. The CB1 agonist WIN55212-2 (0.3-30 μM) decreased forskolin-stimulated cAMP accumulation with an EC50 of 6.6±2.7 μM and an Emax of 31%±2%, whereas for the A1 agonist, N6-cyclopentyladenosine (CPA, 10-150 nM), an EC50 of 35±19 nM and an Emax of 29%±5% were obtained. The combined inhibitory effect of WIN55212-2 (30 μM) and CPA (100 nM) on cAMP accumulation was 41%±6% (n=4), which did not differ (P>0.7) from the sum of the individual effects of each agonist (43%±8%) but was different (P<0.05) from the effects of CPA or WIN55212-2 alone. Preincubation with CPA (100 nM) for 95 min caused desensitization of adenosine A1 activity, which did not modify the effect of WIN55212-2 (30 μM) on cAMP accumulation. In conclusion, the combined effect of CB1 and A1 receptors on cAMP formation is additive and CB1 receptor activity is not affected by short-term A1 receptor desensitization.

  16. Data Assimilation and Model Evaluation Experiment Datasets.

    NASA Astrophysics Data System (ADS)

    Lai, Chung-Chieng A.; Qian, Wen; Glenn, Scott M.

    1994-05-01

    The Institute for Naval Oceanography, in cooperation with Naval Research Laboratories and universities, executed the Data Assimilation and Model Evaluation Experiment (DAMEE) for the Gulf Stream region during fiscal years 1991-1993. Enormous effort has gone into the preparation of several high-quality and consistent datasets for model initialization and verification. This paper describes the preparation process, the temporal and spatial scopes, the contents, the structure, etc., of these datasets. The goal of DAMEE and the need of data for the four phases of the experiment are briefly stated. The preparation of DAMEE datasets consisted of a series of processes: 1) collection of observational data; 2) analysis and interpretation; 3) interpolation using the Optimum Thermal Interpolation System package; 4) quality control and re-analysis; and 5) data archiving and software documentation. The data products from these processes included a time series of 3D fields of temperature and salinity, 2D fields of surface dynamic height and mixed-layer depth, analysis of the Gulf Stream and rings system, and bathythermograph profiles. To date, these are the most detailed and high-quality data for mesoscale ocean modeling, data assimilation, and forecasting research. Feedback from ocean modeling groups who tested this data was incorporated into its refinement. Suggestions for DAMEE data usages include 1) ocean modeling and data assimilation studies, 2) diagnosis and theoretical studies, and 3) comparisons with locally detailed observations.

  17. Data assimilation and model evaluation experiment datasets

    NASA Technical Reports Server (NTRS)

    Lai, Chung-Cheng A.; Qian, Wen; Glenn, Scott M.

    1994-01-01

    The Institute for Naval Oceanography, in cooperation with Naval Research Laboratories and universities, executed the Data Assimilation and Model Evaluation Experiment (DAMEE) for the Gulf Stream region during fiscal years 1991-1993. Enormous effort has gone into the preparation of several high-quality and consistent datasets for model initialization and verification. This paper describes the preparation process, the temporal and spatial scopes, the contents, the structure, etc., of these datasets. The goal of DAMEE and the need of data for the four phases of experiment are briefly stated. The preparation of DAMEE datasets consisted of a series of processes: (1) collection of observational data; (2) analysis and interpretation; (3) interpolation using the Optimum Thermal Interpolation System package; (4) quality control and re-analysis; and (5) data archiving and software documentation. The data products from these processes included a time series of 3D fields of temperature and salinity, 2D fields of surface dynamic height and mixed-layer depth, analysis of the Gulf Stream and rings system, and bathythermograph profiles. To date, these are the most detailed and high-quality data for mesoscale ocean modeling, data assimilation, and forecasting research. Feedback from ocean modeling groups who tested this data was incorporated into its refinement. Suggestions for DAMEE data usages include (1) ocean modeling and data assimilation studies, (2) diagnosis and theoretical studies, and (3) comparisons with locally detailed observations.

  18. First observations using SPICE hyperspectral dataset

    NASA Astrophysics Data System (ADS)

    Rosario, Dalton; Romano, Joao; Borel, Christoph

    2014-06-01

    Our first observations using the longwave infrared (LWIR) hyperspectral data subset of the Spectral and Polarimetric Imagery Collection Experiment (SPICE) database are summarized in this paper, focusing on the inherent challenges associated with using this sensing modality for object pattern recognition. Emphasis is also placed on data quality, qualitative validation of expected atmospheric spectral features, and qualitative comparison against another dataset of the same site acquired with a different LWIR hyperspectral sensor. SPICE is a collaborative effort between the Army Research Laboratory, U.S. Army Armament RDEC, and more recently the Air Force Institute of Technology. It focuses on the collection and exploitation of longwave and midwave infrared (LWIR and MWIR) hyperspectral and polarimetric imagery. We concluded from this work that the quality of the SPICE hyperspectral LWIR data is categorically comparable to that of other datasets recorded by a different sensor with similar specifications, and adequate for algorithm research, given the scope of SPICE. The scope was to conduct a long-term infrared data collection of the same site with targets, using both sensing modalities, under various weather and non-ideal conditions; then to use the vast dataset and associated ground-truth information to assess the performance of state-of-the-art algorithms, while determining sources of performance degradation. The expectation is that results from these assessments will spur new algorithmic ideas with the potential to augment pattern recognition performance in remote sensing applications. Over time, we are confident the SPICE database will prove to be an asset to the wider remote sensing community.

  19. Integrative analysis of multiple diverse omics datasets by sparse group multitask regression

    PubMed Central

    Lin, Dongdong; Zhang, Jigang; Li, Jingyao; He, Hao; Deng, Hong-Wen; Wang, Yu-Ping

    2014-01-01

    A variety of high-throughput genome-wide assays enable the exploration of genetic risk factors underlying complex traits. Although these studies have remarkable impact on identifying susceptible biomarkers, they suffer from issues such as limited sample size and low reproducibility. Combining individual studies of different genetic levels/platforms has the promise to improve the power and consistency of biomarker identification. In this paper, we propose a novel integrative method, namely sparse group multitask regression, for integrating diverse omics datasets, platforms, and populations to identify risk genes/factors of complex diseases. This method combines multitask learning with sparse group regularization, which will: (1) treat the biomarker identification in each single study as a task and then combine them by multitask learning; (2) group variables from all studies for identifying significant genes; (3) enforce a sparse constraint on groups of variables to overcome the "small sample, but large variables" problem. We introduce two sparse group penalties, sparse group lasso and sparse group ridge, in our multitask model, and provide an effective algorithm for each. In addition, we propose a significance test for the identification of potential risk genes. Two simulation studies are performed to evaluate the performance of our integrative method by comparing it with a conventional meta-analysis method. The results show that our sparse group multitask method significantly outperforms the meta-analysis method. In an application to our osteoporosis studies, 7 genes are identified as significant by our method and are found to have significant effects in three other independent studies used for validation. The most significant gene, SOD2, has been identified in our previous osteoporosis study involving the same expression dataset. Several other genes, such as TREML2, HTR1E, and GLO1, are shown to be novel susceptibility genes for osteoporosis, as confirmed from other
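
    Sparse group penalties of this kind are typically handled with proximal methods; below is a compact sketch of the sparse group lasso proximal operator (elementwise soft-threshold, then groupwise shrinkage), with toy groups and values. This illustrates the penalty family, not the paper's full multitask solver.

        import numpy as np

        def prox_sparse_group_lasso(v, groups, lam1, lam2):
            """prox of lam1*||x||_1 + lam2*sum_g ||x_g||_2:
            soft-threshold each entry, then shrink each group toward zero."""
            x = np.sign(v) * np.maximum(np.abs(v) - lam1, 0.0)
            for g in groups:
                norm = np.linalg.norm(x[g])
                x[g] = 0.0 if norm <= lam2 else (1.0 - lam2 / norm) * x[g]
            return x

        v = np.array([0.9, -0.2, 0.05, 1.5, -1.2])
        print(prox_sparse_group_lasso(v, groups=[[0, 1, 2], [3, 4]], lam1=0.1, lam2=0.3))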

  20. Overview of biological database mapping services for interoperation between different 'omics' datasets

    PubMed Central

    2011-01-01

    Many primary biological databases are dedicated to providing annotation for a specific type of biological molecule such as a clone, transcript, gene or protein, but often with limited cross-references. Therefore, enhanced mapping is required between these databases to facilitate the correlation of independent experimental datasets. For example, molecular biology experiments conducted on samples (DNA, mRNA or protein) often yield more than one type of 'omics' dataset as an object for analysis (e.g. a sample can have a genomics as well as a proteomics expression dataset available for analysis). Thus, in order to map the two datasets, the identifier type from one dataset must be linked to that of the other, preventing loss of critical information in downstream analysis. This identifier mapping can be performed using identifier converter software relevant to the query and target identifier databases. This review presents the publicly available web-based biological database identifier converters, with comparison of their usage, input and output formats, and the types of available query and target database identifier types. PMID:22155608
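
    A toy pandas illustration of the identifier-mapping step: a conversion table keyed on one identifier type joins a genomics and a proteomics dataset (all identifiers and values are fabricated for the sketch).

        import pandas as pd

        id_map = pd.DataFrame({"ensembl_gene": ["ENSG000001", "ENSG000002"],
                               "uniprot": ["P12345", "Q67890"]})
        rna = pd.DataFrame({"ensembl_gene": ["ENSG000001", "ENSG000002"],
                            "tpm": [10.2, 3.4]})
        protein = pd.DataFrame({"uniprot": ["P12345", "Q67890"],
                                "abundance": [7.1, 0.9]})

        # join the two omics tables through the identifier conversion table
        merged = rna.merge(id_map, on="ensembl_gene").merge(protein, on="uniprot")
        print(merged)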

  1. The Greenwich Photo-heliographic Results (1874 - 1976): Summary of the Observations, Applications, Datasets, Definitions and Errors

    NASA Astrophysics Data System (ADS)

    Willis, D. M.; Coffey, H. E.; Henwood, R.; Erwin, E. H.; Hoyt, D. V.; Wild, M. N.; Denig, W. F.

    2013-11-01

    The measurements of sunspot positions and areas that were published initially by the Royal Observatory, Greenwich, and subsequently by the Royal Greenwich Observatory (RGO), as the Greenwich Photo-heliographic Results (GPR), 1874-1976, exist in both printed and digital forms. These printed and digital sunspot datasets have been archived in various libraries and data centres. Unfortunately, however, typographic, systematic and isolated errors can be found in the various datasets. The purpose of the present paper is to begin the task of identifying and correcting these errors. In particular, the intention is to provide in one foundational paper all the necessary background information on the original solar observations, their various applications in scientific research, the format of the different digital datasets, the necessary definitions of the quantities measured, and the initial identification of errors in both the printed publications and the digital datasets. Two companion papers address the question of specific identifiable errors; namely, typographic errors in the printed publications, and both isolated and systematic errors in the digital datasets. The existence of two independently prepared digital datasets, which both contain information on sunspot positions and areas, makes it possible to outline a preliminary strategy for the development of an even more accurate digital dataset. Further work is in progress to generate an extremely reliable sunspot digital dataset, based on the programme of solar observations supported for more than a century by the Royal Observatory, Greenwich, and the Royal Greenwich Observatory. This improved dataset should be of value in many future scientific investigations.

  2. Quantifying uncertainty in observational rainfall datasets

    NASA Astrophysics Data System (ADS)

    Lennard, Chris; Dosio, Alessandro; Nikulin, Grigory; Pinto, Izidine; Seid, Hussen

    2015-04-01

    The CO-ordinated Regional Downscaling Experiment (CORDEX) has to date seen the publication of at least ten journal papers that examine the African domain during 2012 and 2013. Five of these papers consider Africa generally (Nikulin et al. 2012, Kim et al. 2013, Hernandes-Dias et al. 2013, Laprise et al. 2013, Panitz et al. 2013) and five have regional foci: Tramblay et al. (2013) on Northern Africa, Mariotti et al. (2014) and Gbobaniyi et al. (2013) on West Africa, Endris et al. (2013) on East Africa, and Kalagnoumou et al. (2013) on southern Africa. A further three papers known to the authors are under review. These papers all use observed rainfall and/or temperature data to evaluate or validate the regional model output, and often proceed to assess projected changes in these variables due to climate change in the context of these observations. The most popular reference rainfall data used are the CRU, GPCP, GPCC, TRMM and UDEL datasets. However, as Kalagnoumou et al. (2013) point out, there are many other rainfall datasets available for consideration, for example CMORPH, FEWS, TAMSAT & RIANNAA, TAMORA and the WATCH & WATCH-DEI data. They, along with others (Nikulin et al. 2012, Sylla et al. 2012), show that the observed datasets can have a very wide spread at a particular space-time coordinate. As more ground-, space- and reanalysis-based rainfall products become available, all of which use different methods to produce precipitation data, the selection of reference data is becoming an important factor in model evaluation. A number of factors contribute to uncertainty in the reliability and validity of these datasets, such as radiance conversion algorithms, the quantity and quality of available station data, interpolation techniques, and the blending methods used to combine satellite and gauge-based products. However, to date no comprehensive study has been performed to evaluate the uncertainty in these observational datasets. We assess 18 gridded

  3. Group Sparse Additive Models

    PubMed Central

    Yin, Junming; Chen, Xi; Xing, Eric P.

    2016-01-01

    We consider the problem of sparse variable selection in nonparametric additive models, with the prior knowledge of the structure among the covariates to encourage those variables within a group to be selected jointly. Previous works either study the group sparsity in the parametric setting (e.g., group lasso), or address the problem in the nonparametric setting without exploiting the structural information (e.g., sparse additive models). In this paper, we present a new method, called group sparse additive models (GroupSpAM), which can handle group sparsity in additive models. We generalize the ℓ1/ℓ2 norm to Hilbert spaces as the sparsity-inducing penalty in GroupSpAM. Moreover, we derive a novel thresholding condition for identifying the functional sparsity at the group level, and propose an efficient block coordinate descent algorithm for constructing the estimate. We demonstrate by simulation that GroupSpAM substantially outperforms the competing methods in terms of support recovery and prediction accuracy in additive models, and also conduct a comparative experiment on a real breast cancer dataset.
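
    To make the penalty concrete, here is a sketch of a group-penalized additive-model criterion in the spirit described above (our notation, assumed rather than copied from the paper):

      \min_{f_1,\ldots,f_p} \; \frac{1}{2}\, \mathbb{E}\Big[ \Big( Y - \sum_{j=1}^{p} f_j(X_j) \Big)^{2} \Big] \;+\; \lambda \sum_{g \in \mathcal{G}} \sqrt{ \sum_{j \in g} \| f_j \|^{2} }

    where each f_j lives in a Hilbert space of smooth component functions, and the square-root-of-sums term generalizes the ℓ1/ℓ2 group lasso norm: it zeroes out whole groups of component functions at once.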

  4. The Development of a Noncontact Letter Input Interface “Fingual” Using Magnetic Dataset

    NASA Astrophysics Data System (ADS)

    Fukushima, Taishi; Miyazaki, Fumio; Nishikawa, Atsushi

    We have newly developed a noncontact letter input interface called “Fingual”. Fingual uses a glove mounted with inexpensive and small magnetic sensors. Using the glove, users can input letters by forming finger-alphabet shapes, a kind of sign language. The proposed method uses a dataset consisting of magnetic-field measurements and the corresponding letter information. In this paper, we show two recognition methods using this dataset. The first method uses the Euclidean norm, and the second additionally uses a Gaussian function as a weighting function. We then conducted verification experiments on the recognition rate of each method in two situations: in one, subjects used their own dataset; in the other, they used another person's dataset. As a result, the proposed method recognized letters at a high rate in both situations, although using one's own dataset performs better than using another person's. Although Fingual requires collecting a magnetic dataset for each letter in advance, its strength is the ability to recognize letters without complicated calculations such as solving inverse problems. This paper reports the results of the recognition experiments and demonstrates the utility of the proposed system “Fingual”.
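
    A minimal sketch of the two recognition schemes as described (the template vectors, their dimensionality, and the Gaussian width are invented for illustration; the record does not give the actual sensor layout):

      import numpy as np

      def recognize(sample, templates, sigma=None):
          """templates: dict mapping letter -> list of stored field vectors.
          sigma=None: nearest template by Euclidean norm (method 1).
          sigma set:  Gaussian-weighted voting over all templates (method 2)."""
          if sigma is None:
              _, letter = min((np.linalg.norm(sample - t), letter)
                              for letter, ts in templates.items() for t in ts)
              return letter
          scores = {letter: sum(np.exp(-np.linalg.norm(sample - t) ** 2
                                       / (2 * sigma ** 2)) for t in ts)
                    for letter, ts in templates.items()}
          return max(scores, key=scores.get)

      # Invented 3-axis field templates for two letters
      templates = {"A": [np.array([1.0, 0.2, -0.3])],
                   "B": [np.array([-0.5, 0.8, 0.1])]}
      probe = np.array([0.9, 0.1, -0.2])
      print(recognize(probe, templates))             # -> 'A'
      print(recognize(probe, templates, sigma=0.5))  # -> 'A'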

  5. Development of a SPARK Training Dataset

    SciTech Connect

    Sayre, Amanda M.; Olson, Jarrod R.

    2015-03-01

    In its first five years, the National Nuclear Security Administration’s (NNSA) Next Generation Safeguards Initiative (NGSI) sponsored more than 400 undergraduate, graduate, and post-doctoral students in internships and research positions (Wyse 2012). In the past seven years, the NGSI program has produced, and continues to produce, a large body of scientific, technical, and policy work in targeted core safeguards capabilities and human capital development activities. Not only does the NGSI program carry out activities across multiple disciplines, but also across all U.S. Department of Energy (DOE)/NNSA locations in the United States. However, products are not readily shared among disciplines and across locations, nor are they archived in a comprehensive library. Rather, knowledge of NGSI-produced literature is localized to the researchers, clients, and internal laboratory/facility publication systems such as the Electronic Records and Information Capture Architecture (ERICA) at the Pacific Northwest National Laboratory (PNNL). There is also no incorporated way of analyzing existing NGSI literature to determine whether the larger NGSI program is achieving its core safeguards capabilities and activities. A complete library of NGSI literature could prove beneficial to a cohesive, sustainable, and more economical NGSI program. The Safeguards Platform for Automated Retrieval of Knowledge (SPARK) has been developed as a knowledge storage, retrieval, and analysis capability that captures safeguards knowledge so that it persists beyond the lifespan of NGSI. During the development process, it was necessary to build a SPARK training dataset (a corpus of documents) for initial entry into the system and for demonstration purposes. We manipulated these data to gain new information about the breadth of NGSI publications, and used them to evaluate the science-policy interface at PNNL as a practical demonstration of SPARK’s intended analysis capability. The analysis demonstration sought to answer the

  6. Potlining Additives

    SciTech Connect

    Rudolf Keller

    2004-08-10

    In this project, a concept to improve the performance of aluminum production cells by introducing potlining additives was examined and tested. Boron oxide was added to cathode blocks, and titanium was dissolved in the metal pool; this resulted in the formation of titanium diboride and caused the molten aluminum to wet the carbonaceous cathode surface. Such wetting reportedly leads to operational improvements and extended cell life. In addition, boron oxide suppresses cyanide formation. This final report presents and discusses the results of this project. Substantial economic benefits for the practical implementation of the technology are projected, especially for modern cells with graphitized blocks. For example, with an energy savings of about 5% and an increase in pot life from 1500 to 2500 days, a cost savings of $0.023 per pound of aluminum produced is projected for a 200 kA pot.

  7. Phosphazene additives

    DOEpatents

    Harrup, Mason K; Rollins, Harry W

    2013-11-26

    An additive comprising a phosphazene compound that has at least two reactive functional groups and at least one capping functional group bonded to phosphorus atoms of the phosphazene compound. One of the at least two reactive functional groups is configured to react with cellulose and the other of the at least two reactive functional groups is configured to react with a resin, such as an amine resin or a polycarboxylic acid resin. The at least one capping functional group is selected from the group consisting of a short chain ether group, an alkoxy group, or an aryloxy group. Also disclosed are an additive-resin admixture, a method of treating a wood product, and a wood product.

  8. Environmental Dataset Gateway (EDG) Search Widget

    EPA Pesticide Factsheets

    Use the Environmental Dataset Gateway (EDG) to find and access EPA's environmental resources. Many options are available for easily reusing EDG content in other applications. This allows individuals to provide direct access to EPA's metadata outside the EDG interface. The EDG Search Widget makes it possible to search the EDG from another web page or application. The search widget can be included on your website by simply inserting one or two lines of code. Users can type a search term or Lucene search query in the search field and retrieve a pop-up list of records that match that search.

  9. Multivariate Visual Explanation for High Dimensional Datasets

    PubMed Central

    Barlowe, Scott; Zhang, Tianyi; Liu, Yujie; Yang, Jing; Jacobs, Donald

    2010-01-01

    Understanding multivariate relationships is an important task in multivariate data analysis. Unfortunately, existing multivariate visualization systems lose effectiveness when analyzing relationships among variables that span more than a few dimensions. We present a novel multivariate visual explanation approach that helps users interactively discover multivariate relationships among a large number of dimensions by integrating automatic numerical differentiation techniques and multidimensional visualization techniques. The result is an efficient workflow for multivariate analysis model construction, interactive dimension reduction, and multivariate knowledge discovery leveraging both automatic multivariate analysis and interactive multivariate data visual exploration. Case studies and a formal user study with a real dataset illustrate the effectiveness of this approach. PMID:20694164

  10. Cluster analysis applied to multiparameter geophysical dataset

    NASA Astrophysics Data System (ADS)

    Di Giuseppe, M. G.; Troiano, A.; Troise, C.; De Natale, G.

    2012-04-01

    Multi-parameter acquisition is a common geophysical field practice nowadays. Seismic velocity and attenuation, gravity, and electromagnetic datasets are regularly acquired in an area to obtain a complete characterization of some feature of the subsoil under investigation. Such richness of information is often underexploited, although integrating the analyses could provide a notable improvement in the imaging of the investigated structures, mostly because the handling of distinct parameters and their joint inversion still present several severe problems. Post-inversion statistical techniques represent a promising approach to these questions, providing a quick, simple and elegant way to obtain this advantageous but complex integration. We present an approach based on the partition of the analyzed multi-parameter dataset into a number of different classes, identified as localized regions of high correlation. These classes, or 'clusters', are structured in such a way that the observations pertaining to a certain group are more similar to each other than to observations belonging to a different one, according to an optimal logical criterion. Regions of the subsoil sharing the same physical characteristics are thus identified, without any a priori or empirical relationship linking the distinct measured parameters. The retrieved imaging is highly reliable in a statistical sense, precisely because it avoids the external hypotheses that are, instead, indispensable in a full joint inversion, where they act as a hard constraint on the inversion process and are not seldom of questionable consistency. We apply our procedure to a number of experimental datasets related to several structures, at very different scales, present in the Campanian district (southern Italy). These structures range from the shallow evidence of the active fault zone that originated the M 6.9 Irpinia earthquake to the main features characterizing the Campi Flegrei caldera and the Mt
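
    A hedged sketch of the post-inversion clustering idea on synthetic co-located parameters (scikit-learn's KMeans stands in for the authors' clustering; the parameter values are invented):

      import numpy as np
      from sklearn.cluster import KMeans
      from sklearn.preprocessing import StandardScaler

      # Synthetic co-located parameters on a grid: two hidden subsoil classes
      rng = np.random.default_rng(0)
      n = 500
      params = np.column_stack([
          rng.normal(loc=rng.choice([1.0, 3.0], n), scale=0.2),  # log resistivity
          rng.normal(loc=rng.choice([2.5, 4.5], n), scale=0.3),  # Vp (km/s)
          rng.normal(loc=rng.choice([2.2, 2.8], n), scale=0.1),  # density (g/cc)
      ])

      # Standardize so no single parameter dominates, then partition into classes
      X = StandardScaler().fit_transform(params)
      labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)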

  11. Environmental Dataset Gateway (EDG) REST Interface

    EPA Pesticide Factsheets

    Use the Environmental Dataset Gateway (EDG) to find and access EPA's environmental resources. Many options are available for easily reusing EDG content in other applications. This allows individuals to provide direct access to EPA's metadata outside the EDG interface. The EDG REST Interface allows users to query the catalog through a URL using REST syntax. Accessing individual metadata documents through their REST URLs, or groups of documents that match specific search criteria through a REST-formatted search URL, provides powerful functionality for searching, viewing, and sharing EDG records.

  12. Independent Peer Reviews

    SciTech Connect

    2012-03-16

    Independent Assessments: DOE's Systems Integrator convenes independent technical reviews to gauge progress toward meeting specific technical targets and to provide technical information necessary for key decisions.

  13. Evaluation of anomalies in GLDAS-1996 dataset.

    PubMed

    Zhou, Xinyao; Zhang, Yongqiang; Yang, Yonghui; Yang, Yanmin; Han, Shumin

    2013-01-01

    Global Land Data Assimilation System (GLDAS) data are widely used for land-surface flux simulations. The accuracy of simulations using the GLDAS dataset is therefore largely contingent upon the accuracy of the dataset itself. It is found that GLDAS land-surface-model simulated runoff exhibits strong anomalies for 1996. These anomalies are investigated by evaluating four GLDAS meteorological forcing data (precipitation, air temperature, downward shortwave radiation and downward longwave radiation) in six large basins across the world (Danube, Mississippi, Yangtze, Congo, Amazon and Murray-Darling basins). Precipitation data from the Global Precipitation Climatology Centre (GPCC) are also compared with the GLDAS forcing precipitation data. Large errors and a lack of monthly variability in the GLDAS-1996 precipitation data are the main sources of the anomalies in the simulated runoff. The impact of the precipitation data on simulated runoff for 1996 is investigated with the Community Atmosphere Biosphere Land Exchange (CABLE) land-surface model in the Yangtze basin, for which high-quality local precipitation data are available from the China Meteorological Administration (CMA). The CABLE model is driven by GLDAS daily precipitation data and CMA daily precipitation data, respectively. The simulated daily and monthly runoffs obtained with the CMA data are noticeably better than those obtained with the GLDAS data, suggesting that the GLDAS-1996 precipitation data are not reliable enough for land-surface flux simulations.

  14. Integrated remotely sensed datasets for disaster management

    NASA Astrophysics Data System (ADS)

    McCarthy, Timothy; Farrell, Ronan; Curtis, Andrew; Fotheringham, A. Stewart

    2008-10-01

    Video imagery can be acquired from aerial, terrestrial and marine based platforms and has been exploited for a range of remote sensing applications over the past two decades. Examples include coastal surveys using aerial video, route-corridor infrastructure surveys using vehicle-mounted video cameras, aerial surveys over forestry and agriculture, underwater habitat mapping and disaster management. Many of these video systems are based on interlaced television standards, such as North America's NTSC and Europe's SECAM and PAL, and are recorded in various video formats. This technology has recently been employed as a front-line remote sensing technology for damage assessment post-disaster. This paper traces the development of spatial video as a remote sensing tool from the early 1980s to the present day. The background to a new spatial-video research initiative based at the National University of Ireland, Maynooth (NUIM), is described. New improvements are proposed, including low-cost encoders, easy-to-use software decoders, timing issues and interoperability. These developments will enable specialists and non-specialists to collect, process and integrate these datasets with minimal support. This integrated approach will enable decision makers to access relevant remotely sensed datasets quickly and so carry out rapid damage assessment during and post-disaster.

  15. Land cover trends dataset, 1973-2000

    USGS Publications Warehouse

    Soulard, Christopher E.; Acevedo, William; Auch, Roger F.; Sohl, Terry L.; Drummond, Mark A.; Sleeter, Benjamin M.; Sorenson, Daniel G.; Kambly, Steven; Wilson, Tamara S.; Taylor, Janis L.; Sayler, Kristi L.; Stier, Michael P.; Barnes, Christopher A.; Methven, Steven C.; Loveland, Thomas R.; Headley, Rachel; Brooks, Mark S.

    2014-01-01

    The U.S. Geological Survey Land Cover Trends Project is releasing a 1973–2000 time-series land-use/land-cover dataset for the conterminous United States. The dataset contains 5 dates of land-use/land-cover data for 2,688 sample blocks randomly selected within 84 ecological regions. The nominal dates of the land-use/land-cover maps are 1973, 1980, 1986, 1992, and 2000. The land-use/land-cover maps were classified manually from Landsat Multispectral Scanner, Thematic Mapper, and Enhanced Thematic Mapper Plus imagery using a modified Anderson Level I classification scheme. The resulting land-use/land-cover data has a 60-meter resolution and the projection is set to Albers Equal-Area Conic, North American Datum of 1983. The files are labeled using a standard file naming convention that contains the number of the ecoregion, sample block, and Landsat year. The downloadable files are organized by ecoregion, and are available in the ERDAS IMAGINE™ (.img) raster file format.

  16. Independence in ROI analysis: where is the voodoo?

    PubMed

    Poldrack, Russell A; Mumford, Jeanette A

    2009-06-01

    We discuss the effects of non-independence on region of interest (ROI) analysis of functional magnetic resonance imaging data, which has recently been raised in a prominent article by Vul et al. We outline the problem of non-independence, and use a previously published dataset to examine the effects of non-independence. These analyses show that very strong correlations (exceeding 0.8) can occur even when the ROI is completely independent of the data being analyzed, suggesting that the claims of Vul et al. regarding the implausibility of these high correlations are incorrect. We conclude with some recommendations to help limit the potential problems caused by non-independence.

  17. Statistics of large detrital geochronology datasets

    NASA Astrophysics Data System (ADS)

    Saylor, J. E.; Sundell, K. E., II

    2014-12-01

    Implementation of quantitative metrics for inter-sample comparison of detrital geochronological data sets has lagged behind the increase in data set size and the ability to identify sub-populations and quantify their relative proportions. Visual comparison, or application of some statistical approaches that initially appeared to provide a simple way of comparing detrital data sets (particularly the Kolmogorov-Smirnov (KS) test), may be inadequate to quantify their similarity. We evaluate several proposed metrics by applying them to four large synthetic datasets drawn randomly from a parent dataset, as well as to a recently published large empirical dataset consisting of four separate (n = ~1000 each) analyses of the same rock sample. Visual inspection of the cumulative probability density functions (CDF) and relative probability density functions (PDF) confirms an increasingly close correlation between data sets as the number of analyses increases. However, as data set size increases, the KS test yields lower mean p-values, implying greater confidence that the samples were not drawn from the same parent population, and high standard deviations, despite minor decreases in the mean difference between sample CDFs. We attribute this to the increasing sensitivity of the KS test when applied to larger data sets, which in turn limits its use for quantitative inter-sample comparison in detrital geochronology. Proposed alternative metrics, including Similarity, Likeness (the complement of Mismatch), and the coefficient of determination (R2) of a cross-plot of PDF quantiles, point to an increasingly close correlation between data sets with increasing size, although they are most sensitive over different ranges of data set size. The Similarity test is most sensitive to variation in data sets with n < 100 and is relatively insensitive to further convergence between larger data sets. The Likeness test reaches 90% of its asymptotic maximum at data set sizes of n = 200. The PDF cross-plot R2 value
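
    The growing sensitivity of the KS test with sample size is easy to reproduce on synthetic ages (a sketch, not the study's data): a tiny systematic offset between two bimodal age distributions goes unnoticed at small n, but the p-value collapses at large n even though the KS statistic D stays small.

      import numpy as np
      from scipy.stats import ks_2samp

      rng = np.random.default_rng(1)

      def ages(n, shift=0.0):
          """Synthetic bimodal detrital-age sample (Ma); `shift` adds a small
          systematic offset mimicking minor inter-sample differences."""
          return np.concatenate([rng.normal(100 + shift, 10, n // 2),
                                 rng.normal(300 + shift, 30, n - n // 2)])

      # D stays small, but the p-value collapses as n grows: the KS test
      # resolves ever smaller differences with larger data sets.
      for n in (100, 300, 1000, 5000):
          stat, p = ks_2samp(ages(n), ages(n, shift=3.0))
          print(n, round(stat, 3), round(p, 4))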

  18. Benchmarking undedicated cloud computing providers for analysis of genomic datasets.

    PubMed

    Yazar, Seyhan; Gooden, George E C; Mackey, David A; Hewitt, Alex W

    2014-01-01

    A major bottleneck in biological discovery is now emerging at the computational level. Cloud computing offers a dynamic means whereby small and medium-sized laboratories can rapidly adjust their computational capacity. We benchmarked two established cloud computing services, Amazon Web Services Elastic MapReduce (EMR) on Amazon EC2 instances and Google Compute Engine (GCE), using publicly available genomic datasets (the E. coli CC102 strain and a Han Chinese male genome) and a standard bioinformatic pipeline on a Hadoop-based platform. Wall-clock time for complete assembly differed by 52.9% (95% CI: 27.5-78.2) for E. coli and 53.5% (95% CI: 34.4-72.6) for the human genome, with GCE being more efficient than EMR. The cost of running this experiment on EMR and GCE differed significantly, with the costs on EMR being 257.3% (95% CI: 211.5-303.1) and 173.9% (95% CI: 134.6-213.1) higher for the E. coli and human assemblies, respectively. Thus, GCE was found to outperform EMR both in terms of cost and wall-clock time. Our findings confirm that cloud computing is an efficient and potentially cost-effective alternative for analysis of large genomic datasets. In addition to releasing our cost-effectiveness comparison, we provide ready-to-use scripts for establishing Hadoop instances with Ganglia monitoring on EC2 or GCE.

  19. A comparison of clustering methods for biogeography with fossil datasets

    PubMed Central

    2016-01-01

    Cluster analysis is one of the most commonly used methods in palaeoecological studies, particularly in studies investigating biogeographic patterns. Although a number of different clustering methods are widely used, the approach and underlying assumptions of many of these methods are quite different. For example, methods may be hierarchical or non-hierarchical in their approaches, and may use Euclidean distance or non-Euclidean indices to cluster the data. In order to assess the effectiveness of the different clustering methods as compared to one another, a simulation was designed that could assess each method over a range of both cluster distinctiveness and sampling intensity. Additionally, a non-hierarchical, non-Euclidean, iterative clustering method implemented in the R Statistical Language is described. This method, Non-Euclidean Relational Clustering (NERC), creates distinct clusters by dividing the data set in order to maximize the average similarity within each cluster, identifying clusters in which each data point is on average more similar to those within its own group than to those in any other group. While all the methods performed well with clearly differentiated and well-sampled datasets, when data are less than ideal, the linkage methods perform poorly compared to the non-Euclidean-based k-means and the NERC method. Based on this analysis, the Unweighted Pair Group Method with Arithmetic Mean and neighbor-joining methods are less reliable with incomplete datasets like those found in palaeobiological analyses, and the k-means and NERC methods should be used in their place. PMID:26966658

  20. Publicly Releasing a Large Simulation Dataset with NDS Labs

    NASA Astrophysics Data System (ADS)

    Goldbaum, Nathan

    2016-03-01

    Ideally, all publicly funded research should be accompanied by the tools, code, and data necessary to fully reproduce the analysis performed in journal articles describing the research. This ideal can be difficult to attain, particularly when dealing with large (>10 TB) simulation datasets. In this lightning talk, we describe the process of publicly releasing a large simulation dataset to accompany the submission of a journal article. The simulation was performed using Enzo, an open source, community-developed N-body/hydrodynamics code and was analyzed using a wide range of community-developed tools in the scientific Python ecosystem. Although the simulation was performed and analyzed using an ecosystem of sustainably developed tools, we enable sustainable science using our data by making it publicly available. Combining the data release with the NDS Labs infrastructure allows a substantial amount of added value, including web-based access to analysis and visualization using the yt analysis package through an IPython notebook interface. In addition, we are able to accompany the paper submission to the arXiv preprint server with links to the raw simulation data as well as interactive real-time data visualizations that readers can explore on their own or share with colleagues during journal club discussions. It is our hope that the value added by these services will substantially increase the impact and readership of the paper.

  1. Social voting advice applications-definitions, challenges, datasets and evaluation.

    PubMed

    Katakis, Ioannis; Tsapatsoulis, Nicolas; Mendez, Fernando; Triga, Vasiliki; Djouvas, Constantinos

    2014-07-01

    Voting advice applications (VAAs) are online tools that have become increasingly popular and purportedly aid users in deciding which party/candidate to vote for during an election. In this paper we present an innovation to current VAA design which is based on the introduction of a social network element. We refer to this new type of online tool as a social voting advice application (SVAA). SVAAs extend VAAs by providing (a) community-based recommendations, (b) comparison of users' political opinions, and (c) a channel of user communication. In addition, SVAAs, enriched with data mining modules, can operate as citizen sensors recording the sentiment of the electorate on issues and candidates. Drawing on VAA datasets generated by the Preference Matcher research consortium, we evaluate the results of Choose4Greece, the first VAA to incorporate social voting features, launched during the landmark Greek national elections of 2012. We demonstrate how an SVAA can provide community-based features and, at the same time, serve as a citizen sensor. Evaluation of the proposed techniques is realized on a series of datasets collected from various VAAs, including Choose4Greece. The collection is made available online in order to promote research in the field.

  2. Unsupervised verification of laser-induced breakdown spectroscopy dataset clustering

    NASA Astrophysics Data System (ADS)

    Wójcik, Michał R.; Zdunek, Rafał; Antończak, Arkadiusz J.

    2016-12-01

    Laser-induced breakdown spectroscopy is a versatile optical technique used in a wide range of qualitative and quantitative analyses conducted with various chemometric techniques. The aim of this research is to demonstrate the possibility of unsupervised clustering of an unknown dataset using the K-means clustering algorithm, verifying its input parameters by investigating generalized eigenvalues derived with linear discriminant analysis. In all cases, principal component analysis was applied to reduce data dimensionality and shorten the computation time of the whole operation. The experiment was conducted on a dataset collected from twenty-four different materials divided into six groups (metals, semiconductors, ceramics, rocks, metal alloys and others) with a three-channel spectrometer (298.02-628.73 nm overall spectral range) and a UV (248 nm) excimer laser. Additionally, two more complex groups, containing all specimens and all specimens excluding rocks, were created. The resulting spaces of eigenvalues were calculated for every group and three different distances in the multidimensional space (cosine, squared Euclidean and L1). As expected, the correct numbers of specimens within groups were obtained with small deviations, and the validity of the unsupervised method has thus been proven.
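
    A minimal sketch of the PCA-then-K-means stage of this pipeline on synthetic stand-in spectra (the verification step via LDA generalized eigenvalues is replaced here by a silhouette score for brevity; all numbers are invented):

      import numpy as np
      from sklearn.decomposition import PCA
      from sklearn.cluster import KMeans
      from sklearn.metrics import silhouette_score

      rng = np.random.default_rng(2)
      # Stand-in for LIBS spectra: three material groups, 200 channels
      centers = rng.normal(size=(3, 200))
      spectra = np.vstack([c + 0.1 * rng.normal(size=(40, 200)) for c in centers])

      # PCA reduces dimensionality and shortens the clustering step
      scores = PCA(n_components=10).fit_transform(spectra)
      labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(scores)
      print(silhouette_score(scores, labels))  # near 1 for well-separated groups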

  3. SAGE Research Methods Datasets: A Data Analysis Educational Tool.

    PubMed

    Vardell, Emily

    2016-01-01

    SAGE Research Methods Datasets (SRMD) is an educational tool designed to offer users the opportunity to obtain hands-on experience with data analysis. Users can search for and browse authentic datasets by method, discipline, and data type. Each of the datasets is supplemented with educational material on the research method and clear guidelines for how to approach data analysis.

  4. Dataset-Driven Research to Support Learning and Knowledge Analytics

    ERIC Educational Resources Information Center

    Verbert, Katrien; Manouselis, Nikos; Drachsler, Hendrik; Duval, Erik

    2012-01-01

    In various research areas, the availability of open datasets is considered as key for research and application purposes. These datasets are used as benchmarks to develop new algorithms and to compare them to other algorithms in given settings. Finding such available datasets for experimentation can be a challenging task in technology enhanced…

  5. LIMS Version 6 Level 3 Dataset

    NASA Technical Reports Server (NTRS)

    Remsberg, Ellis E.; Lingenfelser, Gretchen

    2010-01-01

    This report describes the Limb Infrared Monitor of the Stratosphere (LIMS) Version 6 (V6) Level 3 data products and the assumptions used for their generation. A sequential estimation algorithm was used to obtain daily, zonal Fourier coefficients of the several parameters of the LIMS dataset for 216 days of 1978-79. The coefficients are available at up to 28 pressure levels and at every two degrees of latitude from 64 S to 84 N and at the synoptic time of 12 UT. Example plots were prepared and archived from the data at 10 hPa of January 1, 1979, to illustrate the overall coherence of the features obtained with the LIMS-retrieved parameters.

  6. VAST Contest Dataset Use in Education

    SciTech Connect

    Whiting, Mark A.; North, Chris; Endert, Alexander; Scholtz, Jean; Haack, Jereme N.; Varley, Caroline F.; Thomas, James J.

    2009-12-13

    The IEEE Visual Analytics Science and Technology (VAST) Symposium has held a contest each year since its inception in 2006. These events are designed to provide visual analytics researchers and developers with analytic challenges similar to those encountered by professional information analysts. The VAST contest has had an extended life outside of the symposium, however, as materials are being used in universities and other educational settings, either to help teachers of visual analytics-related classes or for student projects. We describe how we develop VAST contest datasets, a process that results in products that can be used in different settings, and review some specific examples of the adoption of the VAST contest materials in the classroom. The examples are drawn from graduate and undergraduate courses at Virginia Tech and from the Visual Analytics "Summer Camp" run by the National Visualization and Analytics Center in 2008. We finish with a brief discussion of evaluation metrics for education.

  7. Predicting dataset popularity for the CMS experiment

    NASA Astrophysics Data System (ADS)

    Kuznetsov, V.; Li, T.; Giommi, L.; Bonacorsi, D.; Wildish, T.

    2016-10-01

    The CMS experiment at the LHC accelerator at CERN relies on its computing infrastructure to stay at the frontier of High Energy Physics, searching for new phenomena and making discoveries. Even though computing plays a significant role in physics analysis, we rarely use its data to predict the behavior of the system itself. Basic information about computing resources, user activities and site utilization can be very useful for improving the throughput of the system and its management. In this paper, we discuss a first CMS analysis of dataset popularity based on CMS metadata, which can be used as a model for dynamic data placement and provide the foundation of a data-driven approach for the CMS computing infrastructure.

  8. ADAM: automated data management for research datasets

    PubMed Central

    Woodbridge, Mark; Tomlinson, Christopher D.; Butcher, Sarah A.

    2013-01-01

    Existing repositories for experimental datasets typically capture snapshots of data acquired using a single experimental technique and often require manual population and continual curation. We present a storage system for heterogeneous research data that performs dynamic automated indexing to provide powerful search, discovery and collaboration features without the restrictions of a structured repository. ADAM is able to index many commonly used file formats generated by laboratory assays and therefore offers specific advantages to the experimental biology community. However, it is not domain specific and can promote sharing and re-use of working data across scientific disciplines. Availability and implementation: ADAM is implemented in Java and supported on Linux. It is open source under the GNU General Public License v3.0. Installation instructions, binary code, a demo system and a virtual machine image are available at http://www.imperial.ac.uk/bioinfsupport/resources/software/adam. Contact: m.woodbridge@imperial.ac.uk PMID:23109181

  9. National hydrography dataset--linear referencing

    USGS Publications Warehouse

    Simley, Jeffrey; Doumbouya, Ariel

    2012-01-01

    Geospatial data normally have a certain set of standard attributes, such as an identification number, the type of feature, and the name of the feature. These standard attributes are typically embedded in the default attribute table, which is directly linked to the geospatial features. However, it is impractical to embed too much information, because doing so can create a complex, inflexible, and hard-to-maintain geospatial dataset. Many scientists prefer a modular, or relational, data design in which the information about the features is stored and maintained separately, then linked to the geospatial data. For example, information about the water chemistry of a lake can be maintained in a separate file and linked to the lake. A Geographic Information System (GIS) can then relate the water chemistry to the lake and analyze it as one piece of information. For example, the GIS can select all lakes larger than 50 acres with turbidity greater than 1.5 milligrams per liter.
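
    The modular design described above amounts to a relational join; a toy sketch with pandas (attribute names and values invented) that reproduces the example query from the text:

      import pandas as pd

      # Feature table and separately maintained chemistry table
      lakes = pd.DataFrame({"lake_id": [1, 2, 3],
                            "name": ["Clear", "Mud", "Long"],
                            "acres": [120.0, 35.5, 80.2]})
      chemistry = pd.DataFrame({"lake_id": [1, 2, 3],
                                "turbidity_mg_l": [2.1, 0.9, 1.8]})

      # Link the chemistry to the features, then run the example query:
      # lakes larger than 50 acres with turbidity above 1.5 mg/L
      joined = lakes.merge(chemistry, on="lake_id")
      print(joined[(joined.acres > 50) & (joined.turbidity_mg_l > 1.5)])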

  10. A new bed elevation dataset for Greenland

    NASA Astrophysics Data System (ADS)

    Bamber, J. L.; Griggs, J. A.; Hurkmans, R. T. W. L.; Dowdeswell, J. A.; Gogineni, S. P.; Howat, I.; Mouginot, J.; Paden, J.; Palmer, S.; Rignot, E.; Steinhage, D.

    2013-03-01

    We present a new bed elevation dataset for Greenland derived from a combination of multiple airborne ice thickness surveys undertaken between the 1970s and 2012. Around 420 000 line kilometres of airborne data were used, with roughly 70% of this having been collected since the year 2000, when the last comprehensive compilation was undertaken. The airborne data were combined with satellite-derived elevations for non-glaciated terrain to produce a consistent bed digital elevation model (DEM) over the entire island including across the glaciated-ice free boundary. The DEM was extended to the continental margin with the aid of bathymetric data, primarily from a compilation for the Arctic. Ice thickness was determined where an ice shelf exists from a combination of surface elevation and radar soundings. The across-track spacing between flight lines warranted interpolation at 1 km postings for significant sectors of the ice sheet. Grids of ice surface elevation, error estimates for the DEM, ice thickness and data sampling density were also produced alongside a mask of land/ocean/grounded ice/floating ice. Errors in bed elevation range from a minimum of ±10 m to about ±300 m, as a function of distance from an observation and local topographic variability. A comparison with the compilation published in 2001 highlights the improvement in resolution afforded by the new datasets, particularly along the ice sheet margin, where ice velocity is highest and changes in ice dynamics most marked. We estimate that the volume of ice included in our land-ice mask would raise mean sea level by 7.36 m, excluding any solid earth effects that would take place during ice sheet decay.

  11. An Alternative Measure of Solar Activity from Detailed Sunspot Datasets

    NASA Astrophysics Data System (ADS)

    Muraközy, J.; Baranyi, T.; Ludmány, A.

    2016-11-01

    The sunspot number is analyzed by using detailed sunspot data, including aspects of observability, sunspot sizes, and proper identification of sunspot groups as discrete entities of solar activity. The tests show that in addition to the subjective factors there are also objective causes of the ambiguities in the series of sunspot numbers. To introduce an alternative solar-activity measure, the physical meaning of the sunspot number has to be reconsidered. It contains two components whose numbers are governed by different physical mechanisms and this is one source of the ambiguity. This article suggests an activity index, which is the amount of emerged magnetic flux. The only long-term proxy measure is the detailed sunspot-area dataset with proper calibration to the magnetic flux. The Debrecen sunspot databases provide an appropriate source for the establishment of the suggested activity index.

  12. Analysis Summary of an Assembled Western U.S. Dataset

    SciTech Connect

    Ryall, F

    2005-03-22

    The dataset for this report is described in Walter et al. (2004) and consists primarily of Nevada Test Site (NTS) explosions, hole collapses and earthquakes. In addition, there were several earthquakes in California and Utah; earthquakes recorded near Cataract Creek, Arizona; mine blasts at two areas in Arizona; and two mine collapses in Wyoming. In the vicinity of NTS there were mainshock/aftershock sequences at Little Skull Mountain, Scotty's Junction and Hector Mine. All the events were shallow, and distances ranged from about 0.1 degree to regional distances. All of the data for these events were carefully reviewed and analyzed. In the following sections of the report, we describe the analysis procedures, problems with the data, and the results of the analysis.

  13. Non-local gravity and comparison with observational datasets

    SciTech Connect

    Dirian, Yves; Foffa, Stefano; Kunz, Martin; Maggiore, Michele; Pettorino, Valeria

    2015-04-01

    We study the cosmological predictions of two recently proposed non-local modifications of General Relativity. Both models have the same number of parameters as ΛCDM, with a mass parameter m replacing the cosmological constant. We implement the cosmological perturbations of the non-local models into a modification of the CLASS Boltzmann code, and we make a full comparison to CMB, BAO and supernova data. We find that the non-local models fit these datasets very well, at the same level as ΛCDM. Among the vast literature on modified gravity models, this is, to our knowledge, the only example which fits data as well as ΛCDM without requiring any additional parameter. For both non-local models parameter estimation using Planck+JLA+BAO data gives a value of H_0 slightly higher than in ΛCDM.

  14. EVALUATION OF LAND USE/LAND COVER DATASETS FOR URBAN WATERSHED MODELING

    SciTech Connect

    S.J. BURIAN; M.J. BROWN; T.N. MCPHERSON

    2001-08-01

    Land use/land cover (LULC) data are a vital component for nonpoint source pollution modeling. Most watershed hydrology and pollutant loading models use, in some capacity, LULC information to generate runoff and pollutant loading estimates. Simple equation methods predict runoff and pollutant loads using runoff coefficients or pollutant export coefficients that are often correlated to LULC type. Complex models use input variables and parameters to represent watershed characteristics and pollutant buildup and washoff rates as a function of LULC type. Whether using simple or complex models, an accurate LULC dataset with an appropriate spatial resolution and level of detail is paramount for reliable predictions. The study presented in this paper compared and evaluated several LULC dataset sources for application in urban environmental modeling. The commonly used USGS LULC datasets have coarser spatial resolution and lower levels of classification than other LULC datasets. In addition, the USGS datasets do not accurately represent the land use in areas that have undergone significant land use change during the past two decades. We performed a watershed modeling analysis of three urban catchments in Los Angeles, California, USA to investigate the relative difference in average annual runoff volumes and total suspended solids (TSS) loads when using the USGS LULC dataset versus a more detailed and current LULC dataset. When the two LULC datasets were aggregated to the same land use categories, the relative differences in predicted average annual runoff volumes and TSS loads from the three catchments were 8 to 14% and 13 to 40%, respectively. The relative differences did not have a predictable relationship with catchment size.

  15. Fused Lasso Additive Model

    PubMed Central

    Petersen, Ashley; Witten, Daniela; Simon, Noah

    2016-01-01

    We consider the problem of predicting an outcome variable using p covariates that are measured on n independent observations, in a setting in which additive, flexible, and interpretable fits are desired. We propose the fused lasso additive model (FLAM), in which each additive function is estimated to be piecewise constant with a small number of adaptively-chosen knots. FLAM is the solution to a convex optimization problem, for which a simple algorithm with guaranteed convergence to a global optimum is provided. FLAM is shown to be consistent in high dimensions, and an unbiased estimator of its degrees of freedom is proposed. We evaluate the performance of FLAM in a simulation study and on two data sets. Supplemental materials are available online, and the R package flam is available on CRAN. PMID:28239246

  16. Independent Study in Idaho.

    ERIC Educational Resources Information Center

    Idaho Univ., Moscow.

    This guide to independent study in Idaho begins with introductory information on the following aspects of independent study: the Independent Study in Idaho consortium, student eligibility, special needs, starting dates, registration, costs, textbooks and instructional materials, e-mail and faxing, refunds, choosing a course, time limits, speed…

  17. Classification of antimicrobial peptides with imbalanced datasets

    NASA Astrophysics Data System (ADS)

    Camacho, Francy L.; Torres, Rodrigo; Ramos Pollán, Raúl

    2015-12-01

    In recent years, pattern recognition has been applied in many fields to solve problems in science and technology, for example in protein prediction. This methodology can be useful for predicting the activity of biological molecules, e.g. for determining the antimicrobial activity of synthetic and natural peptides. In this work, we evaluate the performance of different physico-chemical properties of peptides (descriptor groups) in the presence of imbalanced data sets, when facing the task of detecting whether a peptide has antimicrobial activity. We evaluate undersampling and class-weighting techniques to deal with the class imbalance, using different classification methods and descriptor groups. Our classification model showed an estimated precision of 96%, showing that the descriptors used to encode the amino acid sequences contain enough information to correlate the peptide sequences with their antimicrobial activity by means of learning machines. Moreover, we show that certain descriptor groups (pseudo-amino-acid composition type I) work better with imbalanced datasets while others (dipeptide composition) work better with balanced ones.
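
    A minimal sketch of the two imbalance tactics evaluated above, class weighting and undersampling, on a synthetic stand-in for peptide descriptors (the classifier choice here is illustrative, not the one used in the study):

      import numpy as np
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import cross_val_score

      rng = np.random.default_rng(3)
      # Imbalanced toy problem: 900 inactive vs 100 active "peptides"
      X = np.vstack([rng.normal(0.0, 1, (900, 20)), rng.normal(0.8, 1, (100, 20))])
      y = np.r_[np.zeros(900), np.ones(100)]

      # Class weighting: errors on the rare (active) class cost more
      clf = LogisticRegression(class_weight="balanced", max_iter=1000)
      print(cross_val_score(clf, X, y, scoring="balanced_accuracy").mean())

      # Random undersampling: shrink the majority class instead
      keep = np.r_[rng.choice(np.where(y == 0)[0], 100, replace=False),
                   np.where(y == 1)[0]]
      print(cross_val_score(clf, X[keep], y[keep],
                            scoring="balanced_accuracy").mean())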

  18. Large scale validation of the M5L lung CAD on heterogeneous CT datasets

    SciTech Connect

    Lopez Torres, E. E-mail: cerello@to.infn.it; Fiorina, E.; Pennazio, F.; Peroni, C.; Saletta, M.; Cerello, P. E-mail: cerello@to.infn.it; Camarlinghi, N.; Fantacci, M. E.

    2015-04-15

    Purpose: M5L, a fully automated computer-aided detection (CAD) system for the detection and segmentation of lung nodules in thoracic computed tomography (CT), is presented and validated on several image datasets. Methods: M5L is the combination of two independent subsystems, based on the Channeler Ant Model as a segmentation tool [lung channeler ant model (lungCAM)] and on a voxel-based neural approach. The lungCAM was upgraded with a scan equalization module and a new procedure to recover nodules connected to other lung structures; its classification module, which makes use of a feed-forward neural network, is based on a small number of features (13), so as to minimize the risk of poor generalization given the large difference in size between the training and testing datasets, which contain 94 and 1019 CTs, respectively. The lungCAM (standalone) and M5L (combined) performance was extensively tested on 1043 CT scans from three independent datasets, including a detailed analysis of the full Lung Image Database Consortium/Image Database Resource Initiative database, which is not yet found in the literature. Results: The lungCAM and M5L performance is consistent across the databases, with sensitivities of about 70% and 80%, respectively, at eight false positive findings per scan, despite the variable annotation criteria and acquisition and reconstruction conditions. A reduced sensitivity is found for subtle nodules and ground-glass opacity (GGO) structures. A comparison with other CAD systems is also presented. Conclusions: The M5L performance on a large and heterogeneous dataset is stable and satisfactory, although the development of a dedicated module for GGO detection, as well as an iterative optimization of the training procedure, could further improve it. The main aim of the present study was accomplished: M5L results do not deteriorate when increasing the dataset size, making it a candidate for supporting radiologists on large

  19. ASSESSING THE ACCURACY OF NATIONAL LAND COVER DATASET AREA ESTIMATES AT MULTIPLE SPATIAL EXTENTS

    EPA Science Inventory

    Site specific accuracy assessments provide fine-scale evaluation of the thematic accuracy of land use/land cover (LULC) datasets; however, they provide little insight into LULC accuracy across varying spatial extents. Additionally, LULC data are typically used to describe lands...

  20. Provenance Challenges for Earth Science Dataset Publication

    NASA Technical Reports Server (NTRS)

    Tilmes, Curt

    2011-01-01

    Modern science is increasingly dependent on computational analysis of very large data sets. Organizing, referencing, and publishing those data has become a complex problem. Published research that depends on such data often fails to cite the data in sufficient detail to allow an independent scientist to reproduce the original experiments and analyses. This paper explores some of the challenges related to data identification, equivalence and reproducibility in the domain of data-intensive scientific processing. It will use the example of Earth Science satellite data, but the challenges also apply to other domains.

  1. A new bed elevation dataset for Greenland

    NASA Astrophysics Data System (ADS)

    Griggs, J. A.; Bamber, J. L.; Hurkmans, R. T. W. L.; Dowdeswell, J. A.; Gogineni, S. P.; Howat, I.; Mouginot, J.; Paden, J.; Palmer, S.; Rignot, E.; Steinhage, D.

    2012-11-01

    We present a new bed elevation dataset for Greenland derived from a combination of multiple airborne ice thickness surveys undertaken between the 1970s and 2011. Around 344 000 line kilometres of airborne data were used, with the majority of this having been collected since the year 2000, when the last comprehensive compilation was undertaken. The airborne data were combined with satellite-derived elevations for non-glaciated terrain to produce a consistent bed digital elevation model (DEM) over the entire island including across the glaciated/ice-free boundary. The DEM was extended to the continental margin with the aid of bathymetric data, primarily from a compilation for the Arctic. Ice shelf thickness was determined where a floating tongue exists, in particular in the north. The across-track spacing between flight lines warranted interpolation at 1 km postings near the ice sheet margin and 2.5 km in the interior. Grids of ice surface elevation, error estimates for the DEM, ice thickness and data sampling density were also produced alongside a mask of land/ocean/grounded ice/floating ice. Errors in bed elevation range from a minimum of ±6 m to about ±200 m, as a function of distance from an observation and local topographic variability. A comparison with the compilation published in 2001 highlights the improvement in resolution afforded by the new data sets, particularly along the ice sheet margin, where ice velocity is highest and changes most marked. We use the new bed and surface DEMs to calculate the hydraulic potential for subglacial flow and present the large scale pattern of water routing. We estimate that the volume of ice included in our land/ice mask would raise eustatic sea level by 7.36 m, excluding any solid earth effects that would take place during ice sheet decay.
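
    The subglacial hydraulic potential used for the water-routing calculation has a standard (Shreve) form; a minimal sketch, assuming basal water pressure equal to the ice overburden and conventional constants (the toy DEM values are invented):

      import numpy as np

      RHO_W, RHO_I, G = 1000.0, 917.0, 9.81  # water, ice density (kg/m^3); gravity (m/s^2)

      def hydraulic_potential(bed, surface):
          """Shreve hydraulic potential (Pa), taking basal water pressure
          equal to the ice overburden pressure."""
          return RHO_W * G * bed + RHO_I * G * (surface - bed)

      bed = np.array([[-100.0, 50.0], [0.0, 200.0]])          # bed DEM (m)
      surface = np.array([[900.0, 1000.0], [950.0, 1200.0]])  # surface DEM (m)
      phi = hydraulic_potential(bed, surface)
      print(phi / 1e6)  # potential in MPa
      # Subglacial water is routed down -grad(phi)
      gy, gx = np.gradient(phi)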

  2. The need for a national LIDAR dataset

    USGS Publications Warehouse

    Stoker, Jason M.; Harding, David; Parrish, Jay

    2008-01-01

    On May 21st and 22nd 2008, the U.S. Geological Survey (USGS), the National Aeronautics and Space Administration (NASA), and the Association of American State Geologists (AASG) hosted the Second National Light Detection and Ranging (Lidar) Initiative Strategy Meeting at USGS Headquarters in Reston, Virginia. The USGS is taking the lead in cooperation with many partners to design and implement a future high-resolution National Lidar Dataset. Initial work is focused on determining viability, developing requirements and specifications, establishing what types of information contained in a lidar signal are most important, and identifying key stakeholders and their respective roles. In February 2007, USGS hosted the first National Lidar Initiative Strategy Meeting at USGS Headquarters in Virginia. The presentations and a published summary report from the first meeting can be found on the Center for Lidar Information Coordination and Knowledge (CLICK) Website: http://lidar.cr.usgs.gov. The first meeting demonstrated the public need for consistent lidar data at the national scale. The goals of the second meeting were to further expand on the ideas and information developed in the first meeting, to bring more stakeholders together, to both refine and expand on the requirements and capabilities needed, and to discuss an organizational and funding approach for an initiative of this magnitude. The approximately 200 participants represented Federal, State, local, commercial and academic interests. The second meeting included a public solicitation for presentations and posters to better democratize the workshop. All of the oral presentation abstracts that were submitted were accepted, and the 25 poster submissions augmented and expanded upon the oral presentations. The presentations from this second meeting, including audio, can be found on CLICK at http://lidar.cr.usgs.gov/national_lidar_2008.php. Based on the presentations and the discussion sessions, the following

  3. A reference GNSS tropospheric dataset over Europe.

    NASA Astrophysics Data System (ADS)

    Pacione, Rosa; Di Tomaso, Simona

    2016-04-01

    The present availability of 18 years of GNSS data belonging to the European Permanent Network (EPN, http://www.epncb.oma.be/) constitutes a valuable database for the development of a climate data record of GNSS tropospheric products over Europe. This dataset has high potential for monitoring trends and variability in atmospheric water vapour, improving the knowledge of climatic trends of atmospheric water vapour, and being useful for global and regional NWP reanalyses as well as climate model simulations. In the framework of EPN-Repro2, a second reprocessing campaign of the EPN, five Analysis Centres have homogeneously reprocessed the EPN network for 1996-2013. Three Analysis Centres provide homogeneously reprocessed solutions for the entire network, analyzed with three different software packages: Bernese, GAMIT and GIPSY-OASIS. Smaller subnetworks based on Bernese 5.2 are also provided. A huge effort is made to provide solutions that are the basis for deriving new coordinates, velocities and troposphere parameters, Zenith Tropospheric Delays and Horizontal Gradients, for the entire EPN. These individual contributions are combined in order to provide the official EPN reprocessed products. A preliminary tropospheric combined solution for the period 1996-2013 has been carried out. It is based on all the available homogeneously reprocessed solutions and offers the possibility to assess each of them prior to the ongoing final combination. We will present the results of the EPN-Repro2 tropospheric combined products and how the climate community will benefit from them. Acknowledgment: The EPN-Repro2 working group is acknowledged for providing the EPN solutions used in this work. E-GEOS activity is carried out in the framework of ASI contract 2015-050-R.0.

  4. Integrating diverse datasets improves developmental enhancer prediction.

    PubMed

    Erwin, Genevieve D; Oksenberg, Nir; Truty, Rebecca M; Kostka, Dennis; Murphy, Karl K; Ahituv, Nadav; Pollard, Katherine S; Capra, John A

    2014-06-01

    Gene-regulatory enhancers have been identified using various approaches, including evolutionary conservation, regulatory protein binding, chromatin modifications, and DNA sequence motifs. To integrate these different approaches, we developed EnhancerFinder, a two-step method for distinguishing developmental enhancers from the genomic background and then predicting their tissue specificity. EnhancerFinder uses a multiple kernel learning approach to integrate DNA sequence motifs, evolutionary patterns, and diverse functional genomics datasets from a variety of cell types. In contrast with prediction approaches that define enhancers based on histone marks or p300 sites from a single cell line, we trained EnhancerFinder on hundreds of experimentally verified human developmental enhancers from the VISTA Enhancer Browser. We comprehensively evaluated EnhancerFinder using cross validation and found that our integrative method improves the identification of enhancers over approaches that consider a single type of data, such as sequence motifs, evolutionary conservation, or the binding of enhancer-associated proteins. We find that VISTA enhancers active in embryonic heart are easier to identify than enhancers active in several other embryonic tissues, likely due to their uniquely high GC content. We applied EnhancerFinder to the entire human genome and predicted 84,301 developmental enhancers and their tissue specificity. These predictions provide specific functional annotations for large amounts of human non-coding DNA, and are significantly enriched near genes with annotated roles in their predicted tissues and lead SNPs from genome-wide association studies. We demonstrate the utility of EnhancerFinder predictions through in vivo validation of novel embryonic gene regulatory enhancers from three developmental transcription factor loci. Our genome-wide developmental enhancer predictions are freely available as a UCSC Genome Browser track, which we hope will enable

  5. An Approach to Keeping Independent Colleges Independent.

    ERIC Educational Resources Information Center

    Northwest Area Foundation, St. Paul, Minn.

    As a result of the financial difficulties faced by independent colleges in the northwestern United States, the Northwest Area Foundation in 1972 surveyed the administrations of 80 private colleges to get a profile of the colleges, a list of their current problems, and some indication of how the problems might be approached. The three top problems…

  6. Framework for Interactive Parallel Dataset Analysis on the Grid

    SciTech Connect

    Alexander, David A.; Ananthan, Balamurali; Johnson, Tony; Serbo, Victor (SLAC)

    2007-01-10

    We present a framework for use at a typical Grid site to facilitate custom interactive parallel dataset analysis targeting terabyte-scale datasets of the type typically produced by large multi-institutional science experiments. We summarize the needs for interactive analysis and show a prototype solution that satisfies those needs. The solution consists of a desktop client tool and a set of Web Services that allow scientists to sign onto a Grid site, compose analysis script code to carry out physics analysis on datasets, distribute the code and datasets to worker nodes, collect the results back at the client, and construct professional-quality visualizations of the results.

  7. Pgu-Face: A dataset of partially covered facial images.

    PubMed

    Salari, Seyed Reza; Rostami, Habib

    2016-12-01

    In this article we introduce a human face image dataset. Images were taken in close to real-world conditions using several cameras, often mobile phone cameras. The dataset contains 224 subjects imaged under four different conditions (a nearly clean-shaven countenance, a nearly clean-shaven countenance with sunglasses, an unshaven or stubble countenance, an unshaven or stubble countenance with sunglasses) in up to two recording sessions. The presence of partially covered face images in this dataset makes it possible to test the robustness and efficiency of facial image processing algorithms. In this work we present the dataset and explain the recording method.

  8. Automatic Diabetic Macular Edema Detection in Fundus Images Using Publicly Available Datasets

    SciTech Connect

    Giancardo, Luca; Meriaudeau, Fabrice; Karnowski, Thomas Paul; Li, Yaquin; Garg, Seema; Tobin Jr, Kenneth William; Chaum, Edward

    2011-01-01

    Diabetic macular edema (DME) is a common vision-threatening complication of diabetic retinopathy. In a large-scale screening environment, DME can be assessed by detecting exudates (a type of bright lesion) in fundus images. In this work, we introduce a new methodology for diagnosis of DME using a novel set of features based on colour, wavelet decomposition and automatic lesion segmentation. These features are employed to train a classifier able to automatically diagnose DME. We present a new publicly available dataset with ground-truth data containing 169 patients from various ethnic groups and levels of DME. This and two other publicly available datasets are employed to evaluate our algorithm. We are able to achieve diagnosis performance comparable to retina experts on the MESSIDOR dataset (an independently labelled dataset with 1200 images) with cross-dataset testing. Our algorithm is robust to segmentation uncertainties, does not need ground truth at lesion level, and is very fast, generating a diagnosis in an average of 4.4 seconds per image on a 2.6 GHz platform with an unoptimised Matlab implementation.
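
    The feature set above combines colour and wavelet-decomposition information. A minimal sketch of the wavelet part, assuming PyWavelets and a random array standing in for a fundus image's green channel (the paper's actual features are richer and also draw on lesion segmentation):

        # Wavelet-energy features from a (placeholder) green-channel image.
        import numpy as np
        import pywt

        img = np.random.random((256, 256))       # stand-in for a real green channel

        coeffs = pywt.wavedec2(img, wavelet="haar", level=3)
        features = []
        for level in coeffs[1:]:                  # (cH, cV, cD) detail tuples
            for detail in level:
                features.append(np.mean(np.abs(detail)))   # mean absolute energy
                features.append(np.std(detail))            # coefficient spread
        print(len(features), features[:4])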

  9. Identifying reproducible cancer-associated highly expressed genes with important functional significances using multiple datasets

    PubMed Central

    Huang, Haiyan; Li, Xiangyu; Guo, You; Zhang, Yuncong; Deng, Xusheng; Chen, Lufei; Zhang, Jiahui; Guo, Zheng; Ao, Lu

    2016-01-01

    Identifying differentially expressed (DE) genes between cancer and normal tissues is of basic importance for studying cancer mechanisms. However, current methods, such as the commonly used Significance Analysis of Microarrays (SAM), are biased toward genes with low expression levels. Recently, we proposed an algorithm, named the pairwise difference (PD) algorithm, to identify highly expressed DE genes based on a reproducibility evaluation of the top-ranked expression differences between paired technical replicates of cells under two experimental conditions. In this study, we extended the application of the algorithm to the identification of DE genes between two types of tissue samples (biological replicates) based on several independent datasets or sub-datasets of a dataset, by constructing multiple paired average gene expression profiles for the two types of samples. Using multiple datasets for lung and esophageal cancers, we demonstrated that PD could identify many DE genes highly expressed in both cancer and normal tissues that tended to be missed by the commonly used SAM. These highly expressed DE genes, including many housekeeping genes, were significantly enriched in many conserved pathways, such as the ribosome, proteasome, phagosome and TNF signaling pathways, with important functional significance in oncogenesis. PMID:27796338
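
    The pairing idea at the heart of PD, building paired average cancer/normal profiles from independent datasets and asking how reproducibly the top-ranked expression differences recur, can be sketched as follows; this is a simplification, not the published algorithm's exact ranking and significance rules:

        # Toy sketch: overlap of top-ranked cancer-vs-normal differences between
        # paired average profiles built from two independent datasets.
        import numpy as np

        rng = np.random.default_rng(1)
        n_genes, top_k = 1000, 50
        # Each pair = (average cancer profile, average normal profile) per dataset
        pairs = [(rng.random(n_genes), rng.random(n_genes)) for _ in range(2)]

        def top_diff_genes(cancer, normal, k):
            return set(np.argsort(cancer - normal)[-k:])   # k largest differences

        sets = [top_diff_genes(c, nrm, top_k) for c, nrm in pairs]
        overlap = len(sets[0] & sets[1]) / top_k
        print(f"fraction of top-{top_k} differences reproduced: {overlap:.2f}")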

  10. Reference datasets of tufA and UPA markers to identify algae in metabarcoding surveys.

    PubMed

    Rossetto Marcelino, Vanessa; Verbruggen, Heroen

    2017-04-01

    The data presented here are related to the research article "Multi-marker metabarcoding of coral skeletons reveals a rich microbiome and diverse evolutionary origins of endolithic algae" (Marcelino and Verbruggen, 2016) [1]. Here we provide reference datasets of the elongation factor Tu (tufA) and the Universal Plastid Amplicon (UPA) markers in a format that is ready-to-use in the QIIME pipeline (Caporaso et al., 2010) [2]. In addition to sequences previously available in GenBank, we included newly discovered endolithic algae lineages using both amplicon sequencing (Marcelino and Verbruggen, 2016) [1] and chloroplast genome data (Marcelino et al., 2016; Verbruggen et al., in press) [3], [4]. We also provide a script to convert GenBank flatfiles into reference datasets that can be used with other markers. The tufA and UPA reference datasets are made publicly available here to facilitate biodiversity assessments of microalgal communities.
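
    The authors distribute their own conversion script with the data; purely as a hedged sketch of the same task, a GenBank flatfile can be turned into a QIIME-style reference (a FASTA file plus an ID-to-taxonomy map) with Biopython. The filenames below are hypothetical:

        # Minimal GenBank-to-reference conversion sketch (not the authors' script).
        from Bio import SeqIO

        with open("refs.fasta", "w") as fasta, open("taxonomy.txt", "w") as tax:
            for rec in SeqIO.parse("tufa_records.gb", "genbank"):  # hypothetical input
                lineage = ";".join(rec.annotations.get("taxonomy", []))
                fasta.write(f">{rec.id}\n{rec.seq}\n")
                tax.write(f"{rec.id}\t{lineage}\n")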

  11. Sharing Clouds: Showing, Distributing, and Sharing Large Point Datasets

    NASA Astrophysics Data System (ADS)

    Grigsby, S.

    2012-12-01

    Sharing large data sets with colleagues and the general public presents a unique technological challenge for scientists. In addition to large data volumes, there are significant challenges in representing data that is often irregular, multidimensional and spatial in nature. For derived data products, additional challenges exist in displaying and providing provenance data. For this presentation, several open source technologies are demonstrated for the remote display and access of large irregular point data sets. These technologies and techniques include the remote viewing of point data using HTML5 and OpenGL, which provides a highly accessible preview of the data sets for a range of audiences. Intermediate levels of accessibility and high levels of interactivity are accomplished with technologies such as WebDAV, which allows collaborators to run analyses on local clients, using data stored and administered on remote servers. Remote processing and analysis, including provenance tracking, will be discussed at the workgroup level. The data sets used for this presentation include data acquired from the NSF-funded National Center for Airborne Laser Mapping (NCALM), and data acquired for research and instructional use in NASA's Student Airborne Research Program (SARP). These datasets include Light Detection And Ranging (LiDAR) point clouds ranging in size from several hundred thousand to several hundred million data points; the techniques and technologies discussed are applicable to other forms of irregular point data.

  12. Quantifying the reliability of four global datasets for drought monitoring over a semiarid region

    NASA Astrophysics Data System (ADS)

    Katiraie-Boroujerdy, Pari-Sima; Nasrollahi, Nasrin; Hsu, Kuo-lin; Sorooshian, Soroosh

    2016-01-01

    Drought is one of the most consequential natural disasters, especially in arid regions such as Iran. One of the requirements for reliable drought monitoring is long-term, continuous, high-resolution precipitation data. Different climatic and global databases are being developed and made available in real time or near real time by different agencies and centers; however, for this purpose, these databases must be evaluated regionally and in different local climates. In this paper, a near real-time global climate model, a data assimilation system, and two gridded gauge-based datasets over Iran are evaluated. The ground truth data comprise 50 gauges covering the period 1980 to 2010. Drought analysis was carried out by means of the Standard Precipitation Index (SPI) at 2-, 3-, 6-, and 12-month timescales. Although the results show spatial variations, overall the two gauge-based datasets perform better than the models. In addition, the results are more reliable for the western portion of the Zagros Range and the eastern region of the country. The analysis of the onsets of 6-month moderate droughts with at least 3 months' persistence indicates that all datasets perform better over the western portion of the Zagros Range, but poorly over the coast of the Caspian Sea. Based on the results of this study, the Modern-Era Retrospective Analysis for Research and Applications (MERRA) dataset is a preferred alternative for drought analysis in the region when gauge-based datasets are not available.
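
    For reference, an SPI value is obtained by fitting a distribution (commonly a gamma) to precipitation accumulated over the chosen timescale and mapping the fitted CDF onto a standard normal. A minimal sketch, which ignores the zero-precipitation correction used in operational SPI implementations:

        # 3-month SPI sketch: accumulate, fit a gamma, transform to normal quantiles.
        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(2)
        monthly = rng.gamma(shape=2.0, scale=30.0, size=360)   # 30 years, placeholder

        scale_months = 3
        accum = np.convolve(monthly, np.ones(scale_months), mode="valid")

        a, loc, sc = stats.gamma.fit(accum, floc=0)            # location fixed at 0
        spi = stats.norm.ppf(stats.gamma.cdf(accum, a, loc=loc, scale=sc))
        print(spi[:12].round(2))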

  13. Accuracy assessment of gridded precipitation datasets in the Himalayas

    NASA Astrophysics Data System (ADS)

    Khan, A.

    2015-12-01

    Accurate precipitation data are vital for hydro-climatic modelling and water resources assessments. Based on mass balance calculations and Turc-Budyko analysis, this study investigates the accuracy of twelve widely used gridded precipitation datasets for sub-basins in the Upper Indus Basin (UIB) in the Himalayas-Karakoram-Hindukush (HKH) region. These datasets are: 1) Global Precipitation Climatology Project (GPCP), 2) Climate Prediction Centre (CPC) Merged Analysis of Precipitation (CMAP), 3) NCEP/NCAR, 4) Global Precipitation Climatology Centre (GPCC), 5) Climatic Research Unit (CRU), 6) Asian Precipitation Highly Resolved Observational Data Integration Towards Evaluation of Water Resources (APHRODITE), 7) Tropical Rainfall Measuring Mission (TRMM), 8) European Reanalysis (ERA) interim data, 9) PRINCETON, 10) European Reanalysis-40 (ERA-40), 11) Willmott and Matsuura, and 12) WATCH Forcing Data based on ERA-Interim (WFDEI). Precipitation accuracy and consistency were assessed by a physical mass balance involving the sum of annual measured flow, estimated actual evapotranspiration (average of 4 datasets), estimated glacier mass-balance melt contribution (average of 4 datasets), and groundwater recharge (average of 3 datasets), during 1999-2010. The mass balance assessment was complemented by Turc-Budyko non-dimensional analysis, where annual precipitation, measured flow and potential evapotranspiration (average of 5 datasets) data were used for the same period. Both analyses suggest that all tested precipitation datasets significantly underestimate precipitation in the Karakoram sub-basins. For the Hindukush and Himalayan sub-basins most datasets underestimate precipitation, except ERA-Interim and ERA-40. The analysis indicates that for this large region with complicated terrain features and stark spatial precipitation gradients the reanalysis datasets have better consistency with flow measurements than datasets derived from records of only sparsely distributed climatic
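
    In schematic form (the study's exact bookkeeping of the melt and recharge terms may differ), the mass-balance check asks whether a dataset's precipitation can close the annual catchment water balance:

        % P  = basin-average annual precipitation (the dataset under test)
        % Q  = measured annual flow,  ET_a = actual evapotranspiration
        % M  = glacier mass-balance melt contribution,  R = groundwater recharge
        \[
          P \;\approx\; Q + ET_a - M + R
        \]
        % A dataset with P well below the right-hand side underestimates
        % precipitation. The complementary Turc-Budyko check places each basin in
        % the non-dimensional plane (PET/P, ET_a/P), where physically consistent
        % P requires ET_a/P <= 1 and ET_a/P <= PET/P.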

  14. AMADA: Analysis of Multidimensional Astronomical DAtasets

    NASA Astrophysics Data System (ADS)

    de Souza, Rafael S.; Ciardi, Benedetta

    2015-03-01

    AMADA allows an iterative exploration and information retrieval of high-dimensional data sets. This is done by performing a hierarchical clustering analysis for different choices of correlation matrices and by performing a principal component analysis on the original data. Additionally, AMADA provides a set of modern visualization data-mining diagnostics. The user can switch between them using the different tabs.
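
    AMADA's two core analyses, hierarchical clustering over a correlation matrix and PCA of the original data, can be illustrated with standard library calls on a toy table; this is an illustration of the operations, not AMADA's own code:

        # Correlation-based hierarchical clustering of variables, then PCA.
        import numpy as np
        from scipy.cluster.hierarchy import dendrogram, linkage
        from sklearn.decomposition import PCA

        rng = np.random.default_rng(3)
        X = rng.random((500, 8))                   # 500 objects, 8 variables

        corr = np.corrcoef(X, rowvar=False)        # 8 x 8 correlation matrix
        dist = 1.0 - np.abs(corr)                  # one common correlation distance
        Z = linkage(dist[np.triu_indices(8, k=1)], method="average")
        print(dendrogram(Z, no_plot=True)["ivl"])  # clustered variable ordering

        pca = PCA(n_components=2).fit(X)
        print(pca.explained_variance_ratio_)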

  15. Prospects for an Independent Kurdistan

    DTIC Science & Technology

    2008-03-01

    are mostly Sunni. In addition, about 5 percent of the Kurds are Yazidis, which is a “mixture of Islamic, Christian, Jewish, and pagan beliefs”...significant political progress has been made by the Kurds of Iraq. The probability of ongoing ethnic and religious strife within Iraq creates an...seek independence. Figure 2 shows the ethno-religious... [“Guide to Iraqi political parties,” BBC News, http://news.bbc.co.uk]

  16. Cloud and Precipitation Properties Merged Dataset from Vertically Pointing ARM Radars During GOAmazon

    NASA Astrophysics Data System (ADS)

    Toto, T.; Giangrande, S. E.; Troyan, D.; Jensen, M. P.; Bartholomew, M. J.; Johnson, K. L.

    2014-12-01

    The Green Ocean Amazon (GOAmazon) field campaign is in its first year of a two-year deployment in the Amazon Basin to study aerosol and cloud lifecycles as they relate to cloud-aerosol-precipitation interactions. Insights from GOAmazon datasets will fill gaps in our understanding, ultimately improving constraints in tropical rain forest climate model parameterizations. As part of GOAmazon, the Atmospheric Radiation Measurement (ARM) Mobile Facility (AMF) has been collecting a unique set of observations near Manacapuru, Brazil, a site known to experience both the pristine conditions of its locale and, at times, the effects of the Manaus, Brazil, megacity pollution plume. In order to understand the effects of anthropogenic aerosol on clouds, radiative balance and climate, documentation of cloud and precipitation properties in the absence and presence of the Manaus plume is a necessary complement to the aerosol measurements collected during the campaign. The AMF is uniquely equipped to capture the most complete and continuous record of cloud and precipitation column properties using the UHF (915 MHz) ARM zenith radar (UAZR) and the vertically pointing W-Band (95 GHz) ARM Cloud Radar (WACR). Together, these radars provide multiple methods (e.g., moment-based, dual-frequency, and Doppler spectral techniques) to retrieve properties of the cloud field that may be influenced by aerosols. This includes the drop size distribution, dynamical and microphysical properties (e.g., vertical air motion, latent heat retrievals), and associated uncertainties. Additional quality assurance is available from independent rain gauge and column platforms. Here, we merge data from the UAZR and WACR (WACR-ARSCL VAP) radars, along with ARM sounding observations and Parsivel optical disdrometer measurement constraints, to present a first look at selected convective and stratiform events, their precipitation properties and a statistical characterization of their profiles.

  17. Posterior probability of linkage analysis of autism dataset identifies linkage to chromosome 16

    PubMed Central

    Wassink, Thomas H.; Vieland, Veronica J.; Sheffield, Val C.; Bartlett, Christopher W.; Goedken, Rhinda; Childress, Deborah; Piven, Joseph

    2015-01-01

    Objective To apply phenotypic and statistical methods designed to account for heterogeneity to linkage analyses of the Collaborative Linkage Study of Autism (CLSA) affected sibling pair families. Method The CLSA contains two sets of 57 families each; Set 1 has been analyzed previously, whereas this study presents the first analyses of Set 2. The two sets were analyzed independently, and were further split based on the degree of phrase speech delay in the siblings. Linkage analysis was carried out using the posterior probability of linkage (PPL), a Bayesian statistic that provides a mathematically rigorous mechanism for combining linkage evidence across multiple samples. Results Two-point PPLs from Set 1 led to the follow-up genotyping of 18 markers around linkage peaks on 1q, 13p, 13q, 16q, and 17q in both sets of families. Multipoint PPLs were then calculated for the entire CLSA sample. These analyses identified four regions with at least modest evidence in support of linkage: 1q at 173 cM, PPL = 0.12; 13p at 21 cM, PPL = 0.16; 16q at 63 cM, PPL = 0.36; Xq at 40 cM, PPL = 0.11. Conclusion We find strengthened evidence for linkage of autism to chromosomes 1q, 13p, 16q, and Xq, and diminished evidence for linkage to 7q and 13q. The verity of these findings will be tested by continuing to update our PPL analyses with data from additional autism datasets. PMID:18349700

  18. Vikodak - A Modular Framework for Inferring Functional Potential of Microbial Communities from 16S Metagenomic Datasets

    PubMed Central

    Nagpal, Sunil; Haque, Mohammed Monzoorul; Mande, Sharmila S.

    2016-01-01

    Background The overall metabolic/functional potential of any given environmental niche is a function of the sum total of genes/proteins/enzymes that are encoded and expressed by the various interacting microbes residing in that niche. Consequently, prior (collated) information pertaining to the genes and enzymes encoded by the resident microbes can aid in indirectly (re)constructing/inferring the metabolic/functional potential of a given microbial community (given its taxonomic abundance profile). In this study, we present Vikodak, a multi-modular package that is based on the above assumption and automates inferring and/or comparing the functional characteristics of an environment using taxonomic abundance profiles generated from one or more environmental sample datasets. With the underlying assumptions of co-metabolism and independent contributions of different microbes in a community, a concerted effort has been made to accommodate microbial co-existence patterns in the various modules incorporated in Vikodak. Results Validation experiments on over 1400 metagenomic samples have confirmed the utility of Vikodak in (a) deciphering enzyme abundance profiles of any KEGG metabolic pathway, (b) functional resolution of distinct metagenomic environments, (c) inferring patterns of functional interaction between resident microbes, and (d) automating statistical comparison of functional features of studied microbiomes. Novel features incorporated in Vikodak also facilitate automatic removal of false positives and spurious functional predictions. Conclusions With novel provisions for comprehensive functional analysis, inclusion of microbial co-existence pattern based algorithms, automated inter-environment comparisons, in-depth analysis of individual metabolic pathways and greater flexibility at the user end, Vikodak is expected to be an important value addition to the family of existing tools for 16S-based function prediction. Availability and Implementation A web implementation of Vikodak
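
    Vikodak's underlying assumption, that community-level function follows from taxonomic abundances combined with collated per-taxon gene content, reduces in its simplest form to a matrix product. A toy sketch only; the package's modules layer co-metabolism constraints and false-positive filtering on top of this, and all numbers below are hypothetical:

        # Community pathway potential = taxon abundances x per-taxon gene content.
        import numpy as np

        pathways = ["glycolysis", "nitrogen_metabolism"]

        abundance = np.array([0.5, 0.3, 0.2])         # relative taxon abundances
        gene_content = np.array([[12, 3],             # pathway gene counts per taxon
                                 [8, 7],              # (hypothetical numbers)
                                 [0, 15]])

        community_potential = abundance @ gene_content
        for p, v in zip(pathways, community_potential):
            print(f"{p}: {v:.1f}")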

  19. Genetic Architecture of Vitamin B12 and Folate Levels Uncovered Applying Deeply Sequenced Large Datasets

    PubMed Central

    Thorleifsson, Gudmar; Ahluwalia, Tarunveer S.; Steinthorsdottir, Valgerdur; Bjarnason, Helgi; Gudbjartsson, Daniel F.; Magnusson, Olafur T.; Sparsø, Thomas; Albrechtsen, Anders; Kong, Augustine; Masson, Gisli; Tian, Geng; Cao, Hongzhi; Nie, Chao; Kristiansen, Karsten; Husemoen, Lise Lotte; Thuesen, Betina; Li, Yingrui; Nielsen, Rasmus; Linneberg, Allan; Olafsson, Isleifur; Eyjolfsson, Gudmundur I.; Jørgensen, Torben; Wang, Jun; Hansen, Torben; Thorsteinsdottir, Unnur; Stefánsson, Kari; Pedersen, Oluf

    2013-01-01

    Genome-wide association studies have mainly relied on common HapMap sequence variations. Recently, sequencing approaches have allowed analysis of low frequency and rare variants in conjunction with common variants, thereby improving the search for functional variants and thus the understanding of the underlying biology of human traits and diseases. Here, we used a large Icelandic whole genome sequence dataset combined with Danish exome sequence data to gain insight into the genetic architecture of serum levels of vitamin B12 (B12) and folate. Up to 22.9 million sequence variants were analyzed in combined samples of 45,576 and 37,341 individuals with serum B12 and folate measurements, respectively. We found six novel loci associating with serum B12 (CD320, TCN2, ABCD4, MMAA, MMACHC) or folate levels (FOLR3) and confirmed seven loci for these traits (TCN1, FUT6, FUT2, CUBN, CLYBL, MUT, MTHFR). Conditional analyses established that four loci contain additional independent signals. Interestingly, 13 of the 18 identified variants were coding and 11 of the 13 target genes have known functions related to B12 and folate pathways. Contrary to epidemiological studies we did not find consistent association of the variants with cardiovascular diseases, cancers or Alzheimer's disease although some variants demonstrated pleiotropic effects. Although to some degree impeded by low statistical power for some of these conditions, these data suggest that sequence variants that contribute to the population diversity in serum B12 or folate levels do not modify the risk of developing these conditions. Yet, the study demonstrates the value of combining whole genome and exome sequencing approaches to ascertain the genetic and molecular architectures underlying quantitative trait associations. PMID:23754956

  20. Genetic architecture of vitamin B12 and folate levels uncovered applying deeply sequenced large datasets.

    PubMed

    Grarup, Niels; Sulem, Patrick; Sandholt, Camilla H; Thorleifsson, Gudmar; Ahluwalia, Tarunveer S; Steinthorsdottir, Valgerdur; Bjarnason, Helgi; Gudbjartsson, Daniel F; Magnusson, Olafur T; Sparsø, Thomas; Albrechtsen, Anders; Kong, Augustine; Masson, Gisli; Tian, Geng; Cao, Hongzhi; Nie, Chao; Kristiansen, Karsten; Husemoen, Lise Lotte; Thuesen, Betina; Li, Yingrui; Nielsen, Rasmus; Linneberg, Allan; Olafsson, Isleifur; Eyjolfsson, Gudmundur I; Jørgensen, Torben; Wang, Jun; Hansen, Torben; Thorsteinsdottir, Unnur; Stefánsson, Kari; Pedersen, Oluf

    2013-06-01

    Genome-wide association studies have mainly relied on common HapMap sequence variations. Recently, sequencing approaches have allowed analysis of low frequency and rare variants in conjunction with common variants, thereby improving the search for functional variants and thus the understanding of the underlying biology of human traits and diseases. Here, we used a large Icelandic whole genome sequence dataset combined with Danish exome sequence data to gain insight into the genetic architecture of serum levels of vitamin B(12) (B12) and folate. Up to 22.9 million sequence variants were analyzed in combined samples of 45,576 and 37,341 individuals with serum B(12) and folate measurements, respectively. We found six novel loci associating with serum B(12) (CD320, TCN2, ABCD4, MMAA, MMACHC) or folate levels (FOLR3) and confirmed seven loci for these traits (TCN1, FUT6, FUT2, CUBN, CLYBL, MUT, MTHFR). Conditional analyses established that four loci contain additional independent signals. Interestingly, 13 of the 18 identified variants were coding and 11 of the 13 target genes have known functions related to B(12) and folate pathways. Contrary to epidemiological studies we did not find consistent association of the variants with cardiovascular diseases, cancers or Alzheimer's disease although some variants demonstrated pleiotropic effects. Although to some degree impeded by low statistical power for some of these conditions, these data suggest that sequence variants that contribute to the population diversity in serum B(12) or folate levels do not modify the risk of developing these conditions. Yet, the study demonstrates the value of combining whole genome and exome sequencing approaches to ascertain the genetic and molecular architectures underlying quantitative trait associations.

  1. American Independence. Fifth Grade.

    ERIC Educational Resources Information Center

    Crosby, Annette

    This fifth grade teaching unit covers early conflicts between the American colonies and Britain, battles of the American Revolutionary War, and the Declaration of Independence. Knowledge goals address the pre-revolutionary acts enforced by the British, the concepts of conflict and independence, and the major events and significant people from the…

  2. Independence of Internal Auditors.

    ERIC Educational Resources Information Center

    Montondon, Lucille; Meixner, Wilda F.

    1993-01-01

    A survey of 288 college and university auditors investigated patterns in their appointment, reporting, and supervisory practices as indicators of independence and objectivity. Results indicate a weakness in the positioning of internal auditing within institutions, possibly compromising auditor independence. Because the auditing function is…

  3. Interface between astrophysical datasets and distributed database management systems (DAVID)

    NASA Technical Reports Server (NTRS)

    Iyengar, S. S.

    1988-01-01

    This is a status report on the progress of the DAVID (Distributed Access View Integrated Database Management System) project being carried out at Louisiana State University, Baton Rouge, Louisiana. The objective is to implement an interface between Astrophysical datasets and DAVID. Discussed are design details and implementation specifics between DAVID and astrophysical datasets.

  4. Querying Patterns in High-Dimensional Heterogenous Datasets

    ERIC Educational Resources Information Center

    Singh, Vishwakarma

    2012-01-01

    The recent technological advancements have led to the availability of a plethora of heterogenous datasets, e.g., images tagged with geo-location and descriptive keywords. An object in these datasets is described by a set of high-dimensional feature vectors. For example, a keyword-tagged image is represented by a color-histogram and a…

  5. Really big data: Processing and analysis of large datasets

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Modern animal breeding datasets are large and getting larger, due in part to the recent availability of DNA data for many animals. Computational methods for efficiently storing and analyzing those data are under development. The amount of storage space required for such datasets is increasing rapidly...

  6. Finding Spatio-Temporal Patterns in Large Sensor Datasets

    ERIC Educational Resources Information Center

    McGuire, Michael Patrick

    2010-01-01

    Spatial or temporal data mining tasks are performed in the context of the relevant space, defined by a spatial neighborhood, and the relevant time period, defined by a specific time interval. Furthermore, when mining large spatio-temporal datasets, interesting patterns typically emerge where the dataset is most dynamic. This dissertation is…

  7. Primary Datasets for Case Studies of River-Water Quality

    ERIC Educational Resources Information Center

    Goulder, Raymond

    2008-01-01

    Level 6 (final-year BSc) students undertook case studies on between-site and temporal variation in river-water quality. They used professionally-collected datasets supplied by the Environment Agency. The exercise gave students the experience of working with large, real-world datasets and led to their understanding how the quality of river water is…

  8. A powerful truncated tail strength method for testing multiple null hypotheses in one dataset.

    PubMed

    Jiang, Bo; Zhang, Xiao; Zuo, Yijun; Kang, Guolian

    2011-05-21

    In microarray analysis, medical imaging analysis, and functional magnetic resonance imaging, we often need to test an overall null hypothesis involving a large number of single hypotheses (usually more than 1000) in one dataset. A tail strength statistic (Taylor and Tibshirani, 2006) and Fisher's probability method are useful and can be applied to measure the overall significance of a large set of independent single-hypothesis tests, under the overall null hypothesis that all single hypotheses are true. In this paper we propose a new method that improves the tail strength statistic by considering only the values whose corresponding p-values are less than some pre-specified cutoff; we call it the truncated tail strength statistic. We illustrate our method using a simulation study and two genome-wide datasets analyzed by chromosome. Our method not only controls the type I error rate quite well, but also has significantly higher power than the tail strength method and Fisher's method in most cases.
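
    For sorted p-values p(1) <= ... <= p(m), the tail strength of Taylor and Tibshirani is TS = (1/m) * sum over k of [1 - p(k)(m+1)/k]; the truncated variant restricts the sum to p-values below a pre-specified cutoff. A sketch, with the caveat that the authors' exact normalization of the truncated sum may differ from the simple mean used here:

        # Tail strength and a truncated variant in the spirit of this paper.
        import numpy as np

        def tail_strength(pvals):
            p = np.sort(pvals)
            m = len(p)
            k = np.arange(1, m + 1)
            return np.mean(1.0 - p * (m + 1) / k)

        def truncated_tail_strength(pvals, cutoff=0.05):
            p = np.sort(pvals)
            m = len(p)
            k = np.arange(1, m + 1)
            keep = p < cutoff                     # only small p-values contribute
            if not keep.any():
                return 0.0
            return np.mean(1.0 - p[keep] * (m + 1) / k[keep])

        rng = np.random.default_rng(4)
        null_p = rng.random(2000)                 # p-values under the overall null
        print(tail_strength(null_p), truncated_tail_strength(null_p))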

  9. Squish: Near-Optimal Compression for Archival of Relational Datasets

    PubMed Central

    Gao, Yihan; Parameswaran, Aditya

    2017-01-01

    Relational datasets are being generated at an alarmingly rapid rate across organizations and industries. Compressing these datasets could significantly reduce storage and archival costs. Traditional compression algorithms, e.g., gzip, are suboptimal for compressing relational datasets since they ignore the table structure and relationships between attributes. We study compression algorithms that leverage the relational structure to compress datasets to a much greater extent. We develop Squish, a system that uses a combination of Bayesian Networks and Arithmetic Coding to capture multiple kinds of dependencies among attributes and achieve near-entropy compression rate. Squish also supports user-defined attributes: users can instantiate new data types by simply implementing five functions for a new class interface. We prove the asymptotic optimality of our compression algorithm and conduct experiments to show the effectiveness of our system: Squish achieves a reduction of over 50% in storage size relative to systems developed in prior work on a variety of real datasets. PMID:28180028

  10. New model for datasets citation and extraction reproducibility in VAMDC

    NASA Astrophysics Data System (ADS)

    Zwölf, Carlo Maria; Moreau, Nicolas; Dubernet, Marie-Lise

    2016-09-01

    In this paper we present a new paradigm for the identification of datasets extracted from the Virtual Atomic and Molecular Data Centre (VAMDC) e-science infrastructure. Such identification includes information on the origin and version of the datasets, references associated with individual data in the datasets, as well as timestamps linked to the extraction procedure. This paradigm is described through the modifications of the language used to exchange data within the VAMDC and through the services that will implement those modifications. This new paradigm should enforce traceability of datasets, favor reproducibility of dataset extraction, and facilitate the systematic citation of the authors who originally measured and/or calculated the extracted atomic and molecular data.

  11. Usefulness of DARPA dataset for intrusion detection system evaluation

    NASA Astrophysics Data System (ADS)

    Thomas, Ciza; Sharma, Vishwas; Balakrishnan, N.

    2008-03-01

    The MIT Lincoln Laboratory IDS evaluation methodology is a practical solution for evaluating the performance of Intrusion Detection Systems, and it has contributed tremendously to research progress in that field. The DARPA IDS evaluation dataset has been criticized and is considered by many to be an outdated dataset, unable to accommodate the latest trends in attacks. The question then naturally arises as to whether detection systems have improved beyond detecting these older attacks. If not, is it justified to consider this dataset obsolete? The paper presented here provides supporting facts for the use of the DARPA IDS evaluation dataset. Two commonly used signature-based IDSs, Snort and Cisco IDS, and two anomaly detectors, PHAD and ALAD, are used for this evaluation, and the results support the usefulness of the DARPA dataset for IDS evaluation.

  12. Operational Dataset Update Functionality Included in the NCAR Research Data Archive Management System

    NASA Astrophysics Data System (ADS)

    Ji, Z.; Worley, S. J.; Schuster, D. C.

    2011-12-01

    at hourly, daily, monthly, and yearly intervals. DSUPDT is also fully scalable and continues to support addition of new data streams. This paper will introduce the powerful functionality of the RDAMS for operational dataset updates, and provide examples of its use

  13. NCAR's Research Data Archive: OPeNDAP Access for Complex Datasets

    NASA Astrophysics Data System (ADS)

    Dattore, R.; Worley, S. J.

    2014-12-01

    Many datasets have complex structures including hundreds of parameters and numerous vertical levels, grid resolutions, and temporal products. Making these data accessible is a challenge for a data provider. OPeNDAP is a powerful protocol for delivering in real time multi-file datasets that can be ingested by many analysis and visualization tools, but for these datasets there are too many choices about how to aggregate. Simple aggregation schemes can fail to support many potential studies based on complex datasets, or at least make them very challenging. We address this issue by using a rich file-content metadata collection to create a real-time customized OPeNDAP service that matches the full suite of access possibilities for complex datasets. The Climate Forecast System Reanalysis (CFSR) and its extension, the Climate Forecast System Version 2 (CFSv2), produced by the National Centers for Environmental Prediction (NCEP) and hosted by the Research Data Archive (RDA) at the Computational and Information Systems Laboratory (CISL) at NCAR, are examples of complex datasets that are difficult to aggregate with existing data server software. CFSR and CFSv2 contain 141 distinct parameters on 152 vertical levels, six grid resolutions and 36 products (analyses, n-hour forecasts, multi-hour averages, etc.), where not all parameter/level combinations are available at all grid resolution/product combinations. These data are archived in the RDA with the data structure provided by the producer; no additional re-organization or aggregation has been applied. Since 2011, users have been able to request customized subsets (e.g., temporal, parameter, spatial) from the CFSR/CFSv2, which are processed in delayed mode and then downloaded to a user's system. Until now, the complexity has made it difficult to provide real-time OPeNDAP access to the data. We have developed a service that leverages the already-existing subsetting interface and allows users to create a virtual dataset
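
    From the user side, such an endpoint behaves like any other OPeNDAP dataset and can be consumed by DAP-aware clients. A hedged sketch with xarray, in which the URL, the variable name and the coordinate names are hypothetical placeholders for a virtual dataset issued by the service:

        # Lazy, subset-on-read access to a (hypothetical) OPeNDAP virtual dataset.
        import xarray as xr

        url = "https://rda.example.edu/opendap/cfsr/virtual-dataset"  # hypothetical
        ds = xr.open_dataset(url)

        # Only the requested slices are transferred over the network.
        subset = ds["TMP"].sel(lev=500, time=slice("2000-01", "2000-12"))
        print(subset.mean().values)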

  14. Increasing spatial resolution of CHIRPS rainfall datasets for Cyprus with artificial neural networks

    NASA Astrophysics Data System (ADS)

    Tymvios, Filippos; Michaelides, Silas; Retalis, Adrianos; Katsanos, Dimitrios; Lelieveld, Jos

    2016-08-01

    The use of high resolution rainfall datasets is an alternative way of studying climatological regions where conventional rain measurements are sparse or not available. Starting in 1981 and extending to near-present, the CHIRPS (Climate Hazards Group InfraRed Precipitation with Station data) dataset combines 5km×5km resolution satellite imagery with in-situ station data to create gridded rainfall time series for trend analysis, severe events and seasonal drought monitoring. The aim of this work is to further increase the resolution of the rainfall dataset for Cyprus to 1km×1km by correlating the CHIRPS dataset with elevation information, the NDVI index (Normalized Difference Vegetation Index) from satellite images at 1km×1km, and precipitation measurements from the official raingauge network of the Cyprus Department of Meteorology, utilizing Artificial Neural Networks. The Artificial Neural Network architecture that was implemented is the Multi-Layer Perceptron (MLP) trained with the backpropagation method, which is widely used in environmental studies. Seven different network architectures were tested, all with two hidden layers. The number of neurons ranged from 3 to 10 in the first hidden layer and from 5 to 25 in the second hidden layer. The dataset was separated into a randomly selected training set, a validation set and a testing set; the latter is used independently for the final assessment of the models' performance. Using the Artificial Neural Network approach, a new map of the spatial analysis of rainfall is constructed which exhibits a considerable increase in spatial resolution. A statistical assessment of the new spatial analysis was made using the rainfall ground measurements from the raingauge network. The assessment indicates that the methodology is promising for several applications.
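
    A minimal sketch of the regression setup described above, using scikit-learn's MLPRegressor (trained by backpropagation-style gradient descent) with the largest tested architecture of 10 and 25 hidden neurons; every data array is a placeholder for the real elevation, NDVI, CHIRPS and raingauge values:

        # Two-hidden-layer MLP mapping 1 km predictors to gauge rainfall.
        import numpy as np
        from sklearn.model_selection import train_test_split
        from sklearn.neural_network import MLPRegressor

        rng = np.random.default_rng(5)
        n = 2000
        X = np.column_stack([
            rng.uniform(0, 1900, n),        # elevation (m), placeholder
            rng.uniform(0, 1, n),           # NDVI
            rng.gamma(2.0, 25.0, n),        # CHIRPS rainfall at nearest 5 km cell
        ])
        y = X[:, 2] * rng.normal(1.0, 0.2, n)    # placeholder gauge rainfall

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
        mlp = MLPRegressor(hidden_layer_sizes=(10, 25), max_iter=2000,
                           random_state=0).fit(X_tr, y_tr)
        print("held-out R^2:", mlp.score(X_te, y_te))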

  15. Applying MetaMap to Medline for identifying novel associations in a large clinical dataset: a feasibility analysis

    PubMed Central

    Hanauer, David A; Saeed, Mohammed; Zheng, Kai; Mei, Qiaozhu; Shedden, Kerby; Aronson, Alan R; Ramakrishnan, Naren

    2014-01-01

    Objective We describe experiments designed to determine the feasibility of distinguishing known from novel associations based on a clinical dataset comprised of International Classification of Diseases, Ninth Revision (ICD-9) codes from 1.6 million patients by comparing them to associations of ICD-9 codes derived from 20.5 million Medline citations processed using MetaMap. Associations appearing only in the clinical dataset, but not in Medline citations, are potentially novel. Methods Pairwise associations of ICD-9 codes were independently identified in both the clinical and Medline datasets, which were then compared to quantify their degree of overlap. We also performed a manual review of a subset of the associations to validate how well MetaMap performed in identifying diagnoses mentioned in Medline citations that formed the basis of the Medline associations. Results The overlap of associations based on ICD-9 codes in the clinical and Medline datasets was low: only 6.6% of the 3.1 million associations found in the clinical dataset were also present in the Medline dataset. Further, a manual review of a subset of the associations that appeared in both datasets revealed that co-occurring diagnoses from Medline citations do not always represent clinically meaningful associations. Discussion Identifying novel associations derived from large clinical datasets remains challenging. Medline as a sole data source for existing knowledge may not be adequate to filter out widely known associations. Conclusions In this study, novel associations were not readily identified. Further improvements in accuracy and relevance for tools such as MetaMap are needed to realize their expected utility. PMID:24928177
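
    Once each corpus is reduced to its set of co-occurring code pairs, the overlap computation itself is straightforward; a sketch with a handful of illustrative ICD-9 codes:

        # Pairwise co-occurrence sets per corpus, then their overlap.
        from itertools import combinations

        def pair_set(records):
            """Each record is the set of ICD-9 codes for one patient/citation."""
            pairs = set()
            for codes in records:
                pairs.update(combinations(sorted(codes), 2))
            return pairs

        clinical = pair_set([{"250.00", "401.9"}, {"250.00", "585.9", "401.9"}])
        medline = pair_set([{"250.00", "401.9"}, {"428.0", "401.9"}])

        overlap = len(clinical & medline) / len(clinical)
        print(f"{overlap:.1%} of clinical associations also appear in Medline")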

  16. The Centennial Trends Greater Horn of Africa precipitation dataset.

    PubMed

    Funk, Chris; Nicholson, Sharon E; Landsfeld, Martin; Klotter, Douglas; Peterson, Pete; Harrison, Laura

    2015-01-01

    East Africa is a drought prone, food and water insecure region with a highly variable climate. This complexity makes rainfall estimation challenging, and this challenge is compounded by low rain gauge densities and inhomogeneous monitoring networks. The dearth of observations is particularly problematic over the past decade, since the number of records in globally accessible archives has fallen precipitously. This lack of data coincides with an increasing scientific and humanitarian need to place recent seasonal and multi-annual East African precipitation extremes in a deep historic context. To serve this need, scientists from the UC Santa Barbara Climate Hazards Group and Florida State University have pooled their station archives and expertise to produce a high quality gridded 'Centennial Trends' precipitation dataset. Additional observations have been acquired from the national meteorological agencies and augmented with data provided by other universities. Extensive quality control of the data was carried out and seasonal anomalies interpolated using kriging. This paper documents the CenTrends methodology and data.

  17. Genomics dataset on unclassified published organism (patent US 7547531).

    PubMed

    Khan Shawan, Mohammad Mahfuz Ali; Hasan, Md Ashraful; Hossain, Md Mozammel; Hasan, Md Mahmudul; Parvin, Afroza; Akter, Salina; Uddin, Kazi Rasel; Banik, Subrata; Morshed, Mahbubul; Rahman, Md Nazibur; Rahman, S M Badier

    2016-12-01

    Nucleotide (DNA) sequence analysis provides important clues regarding the characteristics and taxonomic position of an organism, and is therefore crucial for learning about its hierarchical classification. This dataset (patent US 7547531) was chosen to make accessible the complex raw data buried in undisclosed DNA sequences, which may help open doors for new collaborations. In this data, a total of 48 unidentified DNA sequences from patent US 7547531 were selected and their complete sequences were retrieved from the NCBI BioSample database. Quick response (QR) codes of those DNA sequences were constructed with the DNA BarID tool; QR codes are useful for the identification and comparison of isolates with other organisms. The AT/GC content of the DNA sequences was determined using the ENDMEMO GC Content Calculator, which indicates their stability at different temperatures. The highest GC content was observed in GP445188 (62.5%), followed by GP445198 (61.8%) and GP445189 (59.44%), while the lowest was in GP445178 (24.39%). In addition, the New England BioLabs (NEB) database was used to identify cleavage codes indicating 5′, 3′ and blunt ends, and enzyme codes indicating the methylation sites of the DNA sequences. These data will be helpful for constructing the organisms' hierarchical classification, determining their phylogenetic and taxonomic position, and revealing their molecular characteristics.

  18. Mathematical modelling of the MAP kinase pathway using proteomic datasets.

    PubMed

    Tian, Tianhai; Song, Jiangning

    2012-01-01

    The advances in proteomics technologies offer an unprecedented opportunity and valuable resources to understand how living organisms execute necessary functions at the systems level. However, little work has been done to date to utilize the highly accurate spatio-temporal dynamic proteome data generated by phosphoproteomics for mathematical modeling of complex cell signaling pathways. This work proposed a novel computational framework to develop mathematical models based on proteomic datasets. Using the MAP kinase pathway as the test system, we developed a mathematical model including the cytosolic and nuclear subsystems, and applied a genetic algorithm to infer unknown model parameters. The robustness of the mathematical model was used as a criterion to select appropriate rate constants from the estimated candidates. Quantitative information regarding the absolute protein concentrations was used to refine the mathematical model. We have demonstrated that the incorporation of more experimental data could significantly enhance both the simulation accuracy and the robustness of the proposed model. In addition, we used the MAP kinase pathway inhibited by phosphatases at different concentrations to predict the signal output under different cellular conditions. Our predictions are in good agreement with the experimental observations when the MAP kinase pathway was inhibited by the phosphatases PP2A and MKP3. The successful application of the proposed modeling framework to the MAP kinase pathway suggests that our method is very promising for developing accurate mathematical models and yielding insights into the regulatory mechanisms of complex cell signaling pathways.
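
    To give a flavor of the approach, a single phosphorylation/dephosphorylation cycle with an explicit phosphatase term can be written as a small ODE system; the paper's model (cytosolic plus nuclear subsystems with GA-fitted rate constants) is far larger, and all constants below are arbitrary placeholders:

        # One kinase cycle: MAPK <-> phospho-MAPK, with phosphatase-driven reversal.
        from scipy.integrate import solve_ivp

        k_phos, k_dephos = 1.5, 0.8      # placeholder rate constants
        phosphatase = 1.0                # relative phosphatase concentration

        def rhs(t, y):
            mapk, mapk_p = y             # unphosphorylated / phosphorylated MAPK
            v = k_phos * mapk - k_dephos * phosphatase * mapk_p
            return [-v, v]

        sol = solve_ivp(rhs, (0.0, 10.0), [1.0, 0.0])
        print("steady phospho-MAPK fraction:", round(float(sol.y[1, -1]), 3))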

  19. Lacunarity analysis of raster datasets and 1D, 2D, and 3D point patterns

    NASA Astrophysics Data System (ADS)

    Dong, Pinliang

    2009-10-01

    Spatial scale plays an important role in many fields. As a scale-dependent measure for spatial heterogeneity, lacunarity describes the distribution of gaps within a set at multiple scales. In Earth science, environmental science, and ecology, lacunarity has been increasingly used for multiscale modeling of spatial patterns. This paper presents the development and implementation of a geographic information system (GIS) software extension for lacunarity analysis of raster datasets and 1D, 2D, and 3D point patterns. Depending on the application requirement, lacunarity analysis can be performed in two modes: global mode or local mode. The extension works for: (1) binary (1-bit) and grey-scale datasets in any raster format supported by ArcGIS and (2) 1D, 2D, and 3D point datasets as shapefiles or geodatabase feature classes. For more effective measurement of lacunarity for different patterns or processes in raster datasets, the extension allows users to define an area of interest (AOI) in four different ways, including using a polygon in an existing feature layer. Additionally, directionality can be taken into account when grey-scale datasets are used for local lacunarity analysis. The methodology and graphical user interface (GUI) are described. The application of the extension is demonstrated using both simulated and real datasets, including Brodatz texture images, a Spaceborne Imaging Radar (SIR-C) image, simulated 1D points on a drainage network, and 3D random and clustered point patterns. The options of lacunarity analysis and the effects of polyline arrangement on lacunarity of 1D points are also discussed. Results from sample data suggest that the lacunarity analysis extension can be used for efficient modeling of spatial patterns at multiple scales.
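
    For reference, gliding-box lacunarity at box size r is the ratio of the second moment to the squared first moment of the box-mass distribution. A sketch for a binary 2D raster (the extension described above additionally handles grey-scale data, AOIs, directionality and point patterns):

        # Gliding-box lacunarity: lambda(r) = <M^2> / <M>^2 over all box positions.
        import numpy as np
        from numpy.lib.stride_tricks import sliding_window_view

        def lacunarity(binary, r):
            boxes = sliding_window_view(binary, (r, r))      # all gliding positions
            masses = boxes.sum(axis=(2, 3)).astype(float)    # occupied cells per box
            return masses.var() / masses.mean() ** 2 + 1.0   # = <M^2>/<M>^2

        rng = np.random.default_rng(6)
        img = (rng.random((128, 128)) < 0.2).astype(int)     # sparse random pattern
        for r in (2, 4, 8, 16):
            print(r, round(lacunarity(img, r), 3))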

  20. Evaluation of reanalysis datasets against observational soil temperature data over China

    NASA Astrophysics Data System (ADS)

    Yang, Kai; Zhang, Jingyong

    2017-03-01

    Soil temperature is a key land surface variable and a potential predictor of seasonal climate anomalies and extremes. Using observational soil temperature data in China for 1981-2005, we evaluate four reanalysis datasets, the land surface reanalysis of the European Centre for Medium-Range Weather Forecasts (ERA-Interim/Land), the second Modern-Era Retrospective analysis for Research and Applications (MERRA-2), the National Centers for Environmental Prediction Climate Forecast System Reanalysis (NCEP-CFSR), and version 2 of the Global Land Data Assimilation System (GLDAS-2.0), with a focus on the 40 cm soil layer. The results show that the reanalysis data broadly reproduce the spatial distributions of soil temperature in summer and winter, especially over eastern China, but generally underestimate their magnitudes. Owing to the influence of precipitation on soil temperature, the four datasets perform better in winter than in summer. The ERA-Interim/Land and GLDAS-2.0 produce spatial characteristics of the climatological mean that are similar to observations. The interannual variability of soil temperature is well reproduced by the ERA-Interim/Land dataset in summer and by the CFSR dataset in winter. The linear trend of soil temperature in summer is well captured by the reanalysis datasets. We demonstrate that soil heat fluxes in April-June and in winter are highly correlated with the soil temperature in summer and winter, respectively. Different estimations of surface energy balance components can contribute to the different behaviors of the reanalysis products in estimating soil temperature. In addition, the reanalysis datasets broadly capture the northwest-southeast gradient of soil temperature memory over China.

  1. Efficient segmentation of 3D fluoroscopic datasets from mobile C-arm

    NASA Astrophysics Data System (ADS)

    Styner, Martin A.; Talib, Haydar; Singh, Digvijay; Nolte, Lutz-Peter

    2004-05-01

    The emerging mobile fluoroscopic 3D technology linked with a navigation system combines the advantages of CT-based and C-arm-based navigation. The intra-operative, automatic segmentation of 3D fluoroscopy datasets enables the combined visualization of surgical instruments and anatomical structures for enhanced planning, surgical eye-navigation and landmark digitization. We performed a thorough evaluation of several segmentation algorithms using a large set of data from different anatomical regions and man-made phantom objects. The analyzed segmentation methods include automatic thresholding, morphological operations, an adapted region growing method and an implicit 3D geodesic snake method. In regard to computational efficiency, all methods performed within acceptable limits on a standard Desktop PC (30sec-5min). In general, the best results were obtained with datasets from long bones, followed by extremities. The segmentations of spine, pelvis and shoulder datasets were generally of poorer quality. As expected, the threshold-based methods produced the worst results. The combined thresholding and morphological operations methods were considered appropriate for a smaller set of clean images. The region growing method performed generally much better in regard to computational efficiency and segmentation correctness, especially for datasets of joints, and lumbar and cervical spine regions. The less efficient implicit snake method was able to additionally remove wrongly segmented skin tissue regions. This study presents a step towards efficient intra-operative segmentation of 3D fluoroscopy datasets, but there is room for improvement. Next, we plan to study model-based approaches for datasets from the knee and hip joint region, which would be thenceforth applied to all anatomical regions in our continuing development of an ideal segmentation procedure for 3D fluoroscopic images.

  2. Identification of rogue datasets in serial crystallography

    PubMed Central

    Assmann, Greta; Brehm, Wolfgang; Diederichs, Kay

    2016-01-01

    Advances in beamline optics, detectors and X-ray sources allow new techniques of crystallographic data collection. In serial crystallography, a large number of partial datasets from crystals of small volume are measured. Merging of datasets from different crystals in order to enhance data completeness and accuracy is only valid if the crystals are isomorphous, i.e. sufficiently similar in cell parameters, unit-cell contents and molecular structure. Identification and exclusion of non-isomorphous datasets is therefore indispensable and must be done by means of suitable indicators. To identify rogue datasets, the influence of each dataset on CC1/2 [Karplus & Diederichs (2012). Science, 336, 1030–1033], the correlation coefficient between pairs of intensities averaged in two randomly assigned subsets of observations, is evaluated. The presented method employs a precise calculation of CC1/2 that avoids the random assignment, and instead of using an overall CC1/2, an average over resolution shells is employed to obtain sensible results. The selection procedure was verified by measuring the correlation of observed (merged) intensities and intensities calculated from a model. It is found that inclusion and merging of non-isomorphous datasets may bias the refined model towards those datasets, and measures to reduce this effect are suggested. PMID:27275144
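
    The conventional random-half CC1/2 that the presented method refines can be sketched as follows; note that the paper replaces the random assignment with a precise calculation and averages CC1/2 over resolution shells, neither of which is shown here:

        # Random-half CC1/2 on toy data: split each reflection's observations in
        # two, merge each half, and correlate the two merged intensity vectors.
        import numpy as np

        rng = np.random.default_rng(7)

        def cc_half(obs_per_reflection):
            half1, half2 = [], []
            for obs in obs_per_reflection:
                obs = rng.permutation(obs)
                half1.append(obs[: len(obs) // 2].mean())
                half2.append(obs[len(obs) // 2:].mean())
            return np.corrcoef(half1, half2)[0, 1]

        true_I = rng.gamma(2.0, 50.0, 300)                    # 300 reflections
        obs = [I + rng.normal(0.0, 10.0, 6) for I in true_I]  # 6 noisy observations each
        print(round(cc_half(obs), 3))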

  3. Media independent interface

    NASA Technical Reports Server (NTRS)

    1987-01-01

    The work done on the Media Independent Interface (MII) Interface Control Document (ICD) program is described and recommendations based on it were made. Explanations and rationale for the content of the ICD itself are presented.

  4. New addition curing polyimides

    NASA Technical Reports Server (NTRS)

    Frimer, Aryeh A.; Cavano, Paul

    1991-01-01

    In an attempt to improve the thermal-oxidative stability (TOS) of PMR-type polymers, the use of 1,4-phenylenebis(phenylmaleic anhydride), PPMA, was evaluated. Two series of nadic end-capped addition curing polyimides were prepared by imidizing PPMA with either 4,4'-methylene dianiline or p-phenylenediamine. The first resulted in improved solubility and increased resin flow, while the latter yielded a compression molded neat resin sample with a T(sub g) of 408 C, close to 70 C higher than that of PMR-15. The performance of these materials in long term weight loss studies was below that of PMR-15, independent of post-cure conditions. These results can be rationalized in terms of the thermal lability of the pendant phenyl groups and the incomplete imidization of the sterically congested PPMA. The preparation of model compounds as well as future research directions are discussed.

  5. Analyses of mitochondrial amino acid sequence datasets support the proposal that specimens of Hypodontus macropi from three species of macropodid hosts represent distinct species

    PubMed Central

    2013-01-01

    Background Hypodontus macropi is a common intestinal nematode of a range of kangaroos and wallabies (macropodid marsupials). Based on previous multilocus enzyme electrophoresis (MEE) and nuclear ribosomal DNA sequence data sets, H. macropi has been proposed to be a complex of species. To test this proposal using independent molecular data, we sequenced the whole mitochondrial (mt) genomes of individuals of H. macropi from three different species of hosts (Macropus robustus robustus, Thylogale billardierii and Macropus [Wallabia] bicolor) as well as that of Macropicola ocydromi (a related nematode), and undertook a comparative analysis of the amino acid sequence datasets derived from these genomes. Results The mt genomes sequenced by next-generation (454) technology from H. macropi from the three host species varied from 13,634 bp to 13,699 bp in size. Pairwise comparisons of the amino acid sequences predicted from these three mt genomes revealed differences of 5.8% to 18%. Phylogenetic analysis of the amino acid sequence data sets using Bayesian Inference (BI) showed that H. macropi from the three different host species formed distinct, well-supported clades. In addition, sliding window analysis of the mt genomes defined variable regions for future population genetic studies of H. macropi in different macropodid hosts and geographical regions around Australia. Conclusions The present analyses of inferred mt protein sequence datasets clearly support the hypothesis that H. macropi from M. robustus robustus, M. bicolor and T. billardierii represent distinct species. PMID:24261823

  6. Background qualitative analysis of the European reference life cycle database (ELCD) energy datasets - part II: electricity datasets.

    PubMed

    Garraín, Daniel; Fazio, Simone; de la Rúa, Cristina; Recchioni, Marco; Lechón, Yolanda; Mathieux, Fabrice

    2015-01-01

    The aim of this paper is to identify areas of potential improvement of the European Reference Life Cycle Database (ELCD) electricity datasets. The revision is based on the data quality indicators described by the International Life Cycle Data system (ILCD) Handbook, applied on a sectorial basis. These indicators evaluate the technological, geographical and time-related representativeness of the dataset and the appropriateness in terms of completeness, precision and methodology. Results show that the ELCD electricity datasets are of very good quality in general terms; nevertheless, some findings and recommendations for improving the quality of Life-Cycle Inventories have been derived. Moreover, these results affirm the quality of the electricity-related datasets for any LCA practitioner, and provide insights into the limitations and assumptions underlying the modelling of the datasets. Given this information, the LCA practitioner will be able to decide whether the use of the ELCD electricity datasets is appropriate based on the goal and scope of the analysis to be conducted. The methodological approach would also be useful for dataset developers and reviewers, in order to improve the overall Data Quality Requirements of databases.

  7. The Transition of NASA EOS Datasets to WFO Operations: A Model for Future Technology Transfer

    NASA Technical Reports Server (NTRS)

    Darden, C.; Burks, J.; Jedlovec, G.; Haines, S.

    2007-01-01

    The collocation of a National Weather Service (NWS) Forecast Office with atmospheric scientists from NASA/Marshall Space Flight Center (MSFC) in Huntsville, Alabama has afforded a unique opportunity for science sharing and technology transfer. Specifically, the NWS office in Huntsville has interacted closely with research scientists within the SPoRT (Short-term Prediction Research and Transition) Center at MSFC. One significant technology transfer that has reaped dividends is the transition of unique NASA EOS polar orbiting datasets into NWS field operations. NWS forecasters primarily rely on the AWIPS (Advanced Weather Information and Processing System) decision support system for their day-to-day forecast and warning decision making. Unfortunately, the transition of data from operational polar orbiters or low inclination orbiting satellites into AWIPS has been relatively slow for a variety of reasons. The ability to integrate these high resolution NASA datasets into operations has yielded several benefits. The MODIS (MODerate resolution Imaging Spectroradiometer) instrument flying on the Aqua and Terra satellites provides a broad spectrum of multispectral observations at resolutions as fine as 250 m. Forecasters routinely utilize these datasets to locate fine lines, boundaries, smoke plumes, locations of fog or haze fields, and other mesoscale features. In addition, these important datasets have been transitioned to other WFOs for a variety of local uses. For instance, WFO Great Falls, Montana utilizes the MODIS snow cover product for hydrologic planning purposes, while several coastal offices utilize the output from the MODIS and AMSR-E instruments to supplement observations in the data-sparse regions of the Gulf of Mexico and western Atlantic. In the short term, these datasets have benefited local WFOs in a variety of ways. In the longer term, the process by which these unique datasets were successfully transitioned to operations will benefit the planning and

  8. Hadley cell dynamics in Japanese Reanalysis-55 dataset: evaluation using other reanalysis datasets and global radiosonde network observations

    NASA Astrophysics Data System (ADS)

    Mathew, Sneha Susan; Kumar, Karanam Kishore; Subrahmanyam, Kandula Venkata

    2016-12-01

    Hadley circulation (HC) is a planetary-scale circulation spanning one-third of the globe from the tropics to the sub-tropics. Recent changes in HC width and its temporal variability are a topic of paramount interest because of the climate implications they carry. The present study attempts to bring out subtropical climate change indications in the comparatively new Japanese Reanalysis (JRA55) dataset by means of the mean meridional stream function (MSF). The observed features of the HC in JRA55 are found to be reproduced in the NCEP, MERRA and ECMWF datasets, with notable differences in the magnitudes of the MSF. The annual cycle of HC edges, center and total width calculated from this dataset closely resembles the annual cycle of the respective parameters derived from the other datasets, with very little inter-annual variability. For the first time, MSF estimated using the four reanalysis datasets (JRA55, NCEP, MERRA and ECMWF) is verified against observations from the Integrated Global Radiosonde Archive, using subsampling. The features so estimated show a high degree of similarity with each other as well as with the observations. The monthly trend in the total width of the HC shows a maximum expansion during the month of July, significant at the 95% confidence level for all datasets. The present paper also discusses the presence of a `minor circulation' feature in the northern hemisphere, centered on 34°N during the June and July months, but not in all years. The significance of the present study lies in evaluating the relatively new JRA55 dataset against widely used reanalysis datasets and radiosonde observations, and in the revelation of a minor circulation not hitherto discussed in the context of HC dynamics.
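
    For reference, the mean meridional stream function is typically obtained by vertically integrating the zonal-mean meridional wind. Below is a minimal numpy sketch of that integral, assuming an idealized wind field and a coarse pressure grid; it illustrates the MSF calculation only, not the study's subsampling or verification procedure.

    ```python
    import numpy as np

    g, a = 9.81, 6.371e6                        # gravity (m/s^2), Earth radius (m)
    lat = np.deg2rad(np.linspace(-90, 90, 73))
    p = np.linspace(100e2, 1000e2, 19)          # pressure levels (Pa), top to surface

    # Idealized zonal-mean meridional wind v(lat, p); real use takes reanalysis winds
    v = 2.0 * np.sin(2 * lat)[:, None] * np.linspace(0.2, 1.0, p.size)[None, :]

    # Psi(lat, p) = (2*pi*a*cos(lat)/g) * integral of [v] dp from the top down
    dp = np.gradient(p)
    psi = (2 * np.pi * a * np.cos(lat)[:, None] / g) * np.cumsum(v * dp, axis=1)
    print(psi.max() / 1e10, "x 10^10 kg/s")     # Hadley-cell magnitudes are O(10^11) kg/s
    ```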

  9. Bayes classifiers for imbalanced traffic accidents datasets.

    PubMed

    Mujalli, Randa Oqab; López, Griselda; Garach, Laura

    2016-03-01

    Traffic accident datasets are usually imbalanced: the number of instances classified under the killed or severe injuries class (minority) is much lower than the number classified under the slight injuries class (majority). This poses a challenging problem for classification algorithms and may yield a model that covers the slight injuries instances well while frequently misclassifying the killed or severe injuries instances. Based on traffic accident data collected on urban and suburban roads in Jordan over three years (2009-2011), three data balancing techniques were used: under-sampling, which removes some instances of the majority class; oversampling, which creates new instances of the minority class; and a mixed technique that combines both. In addition, different Bayes classifiers were compared across the imbalanced and balanced datasets (Averaged One-Dependence Estimators, Weightily Averaged One-Dependence Estimators, and Bayesian networks) in order to identify factors that affect the severity of an accident. The results indicated that using the balanced datasets, especially those created with oversampling techniques, with Bayesian networks improved the classification of traffic accidents according to severity and reduced the misclassification of killed and severe injuries instances. On the other hand, the following variables were found to contribute to the occurrence of a fatality or a severe injury in a traffic accident: number of vehicles involved, accident pattern, number of directions, accident type, lighting, surface condition, and speed limit. This work, to the knowledge of the authors, is the first to analyze historical records of traffic accidents occurring in Jordan and the first to apply balancing techniques to the analysis of injury severity of traffic accidents.
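
    As a rough illustration of the balancing step, the sketch below applies simple random oversampling of the minority class before fitting a naive Bayes classifier. It is hedged: the paper's AODE/WAODE models are not available in scikit-learn, so a Gaussian naive Bayes stands in, and the data are synthetic.

    ```python
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import GaussianNB
    from sklearn.metrics import classification_report

    rng = np.random.default_rng(0)

    # Synthetic imbalanced data: 0 = slight injury (majority), 1 = killed/severe (minority)
    X = rng.normal(size=(1000, 7))                    # seven accident attributes
    y = (rng.random(1000) < 0.05).astype(int)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

    # Random oversampling: duplicate minority rows until the classes are balanced
    minority = np.flatnonzero(y_tr == 1)
    extra = rng.choice(minority, size=(y_tr == 0).sum() - minority.size, replace=True)
    X_bal = np.vstack([X_tr, X_tr[extra]])
    y_bal = np.concatenate([y_tr, y_tr[extra]])

    model = GaussianNB().fit(X_bal, y_bal)
    print(classification_report(y_te, model.predict(X_te)))
    ```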

  10. BMDExpress Data Viewer: A Visualization Tool to Analyze BMDExpress Datasets

    EPA Science Inventory

    Regulatory agencies increasingly apply benchmark dose (BMD) modeling to determine points of departure in human risk assessments. BMDExpress applies BMD modeling to transcriptomics datasets and groups genes to biological processes and pathways for rapid assessment of doses at whic...

  11. Constructing Phylogenetic Networks Based on the Isomorphism of Datasets

    PubMed Central

    Zhang, Zhibin; Li, Yanjuan

    2016-01-01

    Constructing rooted phylogenetic networks from rooted phylogenetic trees has become an important problem in molecular evolution. Many methods have been presented in this area, the most efficient of which are based on the incompatible graph, such as CASS, LNETWORK, and BIMLR. This paper investigates what the incompatible-graph-based methods have in common, the relationship between the incompatible graph and the phylogenetic network, and the topologies of incompatible graphs. We find all the simplest datasets for a topology G and construct a network for each such dataset. For any dataset 𝒞, we can then compute a network from the network representing the simplest dataset that is isomorphic to 𝒞. This process saves time for the algorithms when constructing networks. PMID:27547759

  12. Food recognition: a new dataset, experiments and results.

    PubMed

    Ciocca, Gianluigi; Napoletano, Paolo; Schettini, Raimondo

    2016-12-07

    We propose a new dataset for the evaluation of food recognition algorithms that can be used in dietary monitoring applications. Each image depicts a real canteen tray with dishes and foods arranged in different ways. Each tray contains multiple instances of food classes. The dataset contains 1,027 canteen trays for a total of 3,616 food instances belonging to 73 food classes. The foods in the tray images have been manually segmented using carefully drawn polygonal boundaries. We have benchmarked the dataset by designing an automatic tray analysis pipeline that takes a tray image as input, finds the regions of interest, and predicts the corresponding food class for each region. We experimented with three different classification strategies and several visual descriptors. We achieve about 79% food and tray recognition accuracy using Convolutional-Neural-Network-based features. The dataset, as well as the benchmark framework, are available to the research community.

  13. A daily global mesoscale ocean eddy dataset from satellite altimetry.

    PubMed

    Faghmous, James H; Frenger, Ivy; Yao, Yuanshun; Warmka, Robert; Lindell, Aron; Kumar, Vipin

    2015-01-01

    Mesoscale ocean eddies are ubiquitous coherent rotating structures of water with radial scales on the order of 100 kilometers. Eddies play a key role in the transport and mixing of momentum and tracers across the World Ocean. We present a global daily mesoscale ocean eddy dataset that contains ~45 million mesoscale features and 3.3 million eddy trajectories persisting at least two days, identified in the AVISO dataset over the period 1993-2014. This dataset, along with the open-source eddy identification software, allows users to extract eddies with any parameters (minimum size, lifetime, etc.), to study global eddy properties and dynamics, and to empirically estimate the impact eddies have on mass or heat transport. Furthermore, our open-source software may be used to identify mesoscale features in model simulations and compare them to observed features. Finally, this dataset can be used to study the interaction between mesoscale ocean eddies and other components of the Earth System.
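
    A minimal sketch of the kind of parameter-based filtering such a trajectory dataset supports, assuming hypothetical column names (track_id, radius_km, lifetime_days) rather than the dataset's actual schema:

    ```python
    import pandas as pd

    # Hypothetical eddy-trajectory table; the real column names may differ
    eddies = pd.DataFrame({
        "track_id":      [1, 1, 2, 2, 2, 3],
        "radius_km":     [80, 95, 120, 130, 125, 60],
        "lifetime_days": [30, 30, 90, 90, 90, 2],
    })

    # Keep only long-lived, mesoscale-sized eddy tracks
    keep = eddies[(eddies["lifetime_days"] >= 28) & (eddies["radius_km"] >= 75)]
    print(keep["track_id"].unique())
    ```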

  14. Human-Robot Emergency Response - Experimental Platform and Preliminary Dataset

    DTIC Science & Technology

    2014-07-28

    Technical Report #UM-CS-2014-006 (Hee-Tae Jung, Takeshi Takahashi, et al., 2014). This paper presents progress towards a research infrastructure for studying human-robot performance in laboratory emergency response scenarios, together with a preliminary dataset. It incorporates an emergency response team that is composed of a human participant, n ≤ 4 vision sensors in a

  15. Artificial intelligence (AI) systems for interpreting complex medical datasets.

    PubMed

    Altman, R B

    2017-02-09

    Advances in machine intelligence have created powerful capabilities in algorithms that find hidden patterns in data, classify objects based on their measured characteristics, and associate similar patients/diseases/drugs based on common features. However, artificial intelligence (AI) applications to medical data face several technical challenges: complex, heterogeneous and noisy medical datasets, and the difficulty of explaining algorithmic output to users. There are also social challenges related to intellectual property, data provenance, regulatory issues, economics, and liability.

  16. Sampling Within k-Means Algorithm to Cluster Large Datasets

    SciTech Connect

    Bejarano, Jeremy; Bose, Koushiki; Brannan, Tyler; Thomas, Anita; Adragni, Kofi; Neerchal, Nagaraj; Ostrouchov, George

    2011-08-01

    Due to current data collection technology, our ability to gather data has surpassed our ability to analyze it. In particular, k-means, one of the simplest and fastest clustering algorithms, is ill-equipped to handle extremely large datasets on even the most powerful machines. Our new algorithm uses a sample from a dataset to decrease runtime by reducing the amount of data analyzed. We perform a simulation study comparing our sampling-based k-means to the standard k-means algorithm, analyzing both the speed and accuracy of the two methods. Results show that our algorithm is significantly more efficient than the existing algorithm, with comparable accuracy. Further work on this project might include a more comprehensive study on more varied test datasets as well as on real weather datasets; this is especially important considering that this preliminary study was performed on rather tame datasets. Future studies should also analyze the performance of the algorithm for varied values of k. Lastly, this paper showed that the algorithm was accurate for relatively low sample sizes; we would like to analyze this further to see how accurate the algorithm is for even lower sample sizes, finding the lowest sample sizes, by manipulating width and confidence level, for which the algorithm remains acceptably accurate. In order for our algorithm to be a success, it needs to meet two benchmarks: match the accuracy of the standard k-means algorithm and significantly reduce runtime. Both goals are accomplished for all six datasets analyzed. However, on datasets of three and four dimensions, as the data become more difficult to cluster, both algorithms fail to obtain the correct classifications on some trials. Nevertheless, our algorithm consistently matches the performance of the standard algorithm while being remarkably more efficient in time. Therefore, we conclude that analysts can use our algorithm, expecting accurate results in considerably less time.
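
    A minimal sketch of the sampling idea using scikit-learn: fit k-means on a random sample, then assign the full dataset to the learned centroids. The sample size and k here are illustrative, not the study's settings.

    ```python
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(42)
    X = rng.normal(size=(1_000_000, 3))            # large dataset

    # Fit k-means on a small random sample...
    sample = X[rng.choice(X.shape[0], size=10_000, replace=False)]
    km = KMeans(n_clusters=8, n_init=10, random_state=0).fit(sample)

    # ...then label every point by its nearest learned centroid
    labels = km.predict(X)
    print(np.bincount(labels))
    ```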

  17. Lunar Meteorites: A Global Geochemical Dataset

    NASA Technical Reports Server (NTRS)

    Zeigler, R. A.; Joy, K. H.; Arai, T.; Gross, J.; Korotev, R. L.; McCubbin, F. M.

    2017-01-01

    To date, the world's meteorite collections contain over 260 lunar meteorite stones representing at least 120 different lunar meteorites. Additionally, there are 20-30 as-yet-unnamed stones currently in the process of being classified. Collectively, these lunar meteorites likely represent 40-50 distinct, randomly distributed sampling locations on the Moon. Although the exact provenance of each individual lunar meteorite is unknown, collectively the lunar meteorites represent the best global average of the lunar crust. The Apollo sites are all within or near the Procellarum KREEP Terrane (PKT), thus lithologies from the PKT are overrepresented in the Apollo sample suite. Nearly all of the lithologies present in the Apollo sample suite are found within the lunar meteorites (high-Ti basalts are a notable exception), and the lunar meteorites contain several lithologies not present in the Apollo sample suite (e.g., magnesian anorthosite). This chapter will not be a sample-by-sample summary of each individual lunar meteorite. Rather, the chapter will summarize the different types of lunar meteorites and their relative abundances, comparing and contrasting the lunar meteorite sample suite with the Apollo sample suite. This chapter will act as one of the introductory chapters to the volume, introducing lunar samples in general and setting the stage for more detailed discussions in later, more specialized chapters. The chapter will begin with a description of how lunar meteorites are ejected from the Moon, the depths from which samples are excavated, the likely pairing relationships among the lunar meteorite samples, and how the lunar meteorites can help to constrain the impactor flux in the inner solar system. There will be a discussion of the biases inherent in the lunar meteorite sample suite in terms of underrepresented lithologies or regions of the Moon, and an examination of the contamination and limitations of lunar meteorites due to terrestrial weathering. The

  18. Evaluation of Global Observations-Based Evapotranspiration Datasets and IPCC AR4 Simulations

    NASA Technical Reports Server (NTRS)

    Mueller, B.; Seneviratne, S. I.; Jimenez, C.; Corti, T.; Hirschi, M.; Balsamo, G.; Ciais, P.; Dirmeyer, P.; Fisher, J. B.; Guo, Z.; Jung, M.; Maignan, F.; McCabe, M. F.; Reichle, R.; Reichstein, M.; Rodell, M.; Sheffield, J.; Teuling, A. J.; Wang, K.; Wood, E. F.; Zhang, Y.

    2011-01-01

    Quantification of global land evapotranspiration (ET) has long been associated with large uncertainties due to the lack of reference observations. Several recently developed products now provide the capacity to estimate ET at global scales. These products, partly based on observational data, include satellite-based products, land surface model (LSM) simulations, atmospheric reanalysis output, estimates based on empirical upscaling of eddy-covariance flux measurements, and atmospheric water balance datasets. The LandFlux-EVAL project aims to evaluate and compare these newly developed datasets. Additionally, an evaluation of IPCC AR4 global climate model (GCM) simulations is presented, providing an assessment of their capacity to reproduce flux behavior relative to the observations-based products. Though differently constrained with observations, the analyzed reference datasets display similar large-scale ET patterns. ET from the IPCC AR4 simulations was significantly smaller than that from the other products for India (up to 1 mm/d) and parts of eastern South America, and larger in the western USA, Australia and China. The inter-product variance is lower across the IPCC AR4 simulations than across the reference datasets in several regions, which indicates that uncertainties may be underestimated in the IPCC AR4 models due to shared biases of these simulations.

  19. Northern Hemisphere winter storm track trends since 1959 derived from multiple reanalysis datasets

    NASA Astrophysics Data System (ADS)

    Chang, Edmund K. M.; Yau, Albert M. W.

    2016-09-01

    In this study, a comprehensive comparison of Northern Hemisphere winter storm track trends since 1959, derived from multiple reanalysis datasets and rawinsonde observations, has been conducted. In addition, trends in terms of variance and cyclone track statistics have been compared. Previous studies, based largely on the National Center for Environmental Prediction-National Center for Atmospheric Research Reanalysis (NNR), have suggested that both the Pacific and Atlantic storm tracks intensified significantly between the 1950s and 1990s. Comparison with trends derived from rawinsonde observations suggests that the trends derived from NNR are significantly biased high, while those from the European Center for Medium-Range Weather Forecasts 40-year Reanalysis and the Japanese 55-year Reanalysis are much less biased but still too high. Those from the two twentieth-century reanalysis datasets are most consistent with observations but may exhibit slight biases of opposite signs. Between 1959 and 2010, Pacific storm track activity has likely increased by 10% or more, while Atlantic storm track activity has likely increased by less than 10%. Our analysis suggests that trends in Pacific and Atlantic basin-wide storm track activity prior to the 1950s derived from the two twentieth-century reanalysis datasets are unlikely to be reliable due to changes in the density of surface observations. Nevertheless, these datasets may provide useful information on interannual variability, especially over the Atlantic.

  20. Development of a 10-year-old full body geometric dataset for computational modeling.

    PubMed

    Mao, Haojie; Holcombe, Sven; Shen, Ming; Jin, Xin; Wagner, Christina D; Wang, Stewart C; Yang, King H; King, Albert I

    2014-10-01

    The objective of this study was to create a computer-aided design (CAD) geometric dataset of a 10-year-old (10 YO) child. The study comprised two phases. In Phase One, the 10 YO whole-body CAD model was developed from component computed tomography and magnetic resonance imaging scans of 12 pediatric subjects. Geometrical scaling methods were used to convert all component parts to the average size for a 10 YO child, based on available anthropometric data. The component surfaces were then compiled and integrated into a complete body. The bony structures and flesh were adjusted to be symmetrical, to minimize bias from any single subject while maintaining anthropometric measurements. Internal organs, including the liver, spleen, and kidney, were further verified against literature data. In Phase Two, internal characteristics of the cervical spine disc, wrist, hand, pelvis, femur, and tibia were verified with data measured from an additional 94 10 YO children. The CAD dataset developed through these processes was mostly within the corridor of one standard deviation (SD) of the mean. In conclusion, a geometric dataset for an average-size 10 YO child was created. The dataset serves as a foundation for developing computational 10 YO whole-body models for enhanced pediatric injury prevention.

  1. Trajectory-Based Flow Feature Tracking in Joint Particle/Volume Datasets.

    PubMed

    Sauer, Franz; Yu, Hongfeng; Ma, Kwan-Liu

    2014-12-01

    Studying the dynamic evolution of time-varying volumetric data is essential in countless scientific endeavors. The ability to isolate and track features of interest allows domain scientists to better manage large complex datasets both in terms of visual understanding and computational efficiency. This work presents a new trajectory-based feature tracking technique for use in joint particle/volume datasets. While traditional feature tracking approaches generally require a high temporal resolution, this method utilizes the indexed trajectories of corresponding Lagrangian particle data to efficiently track features over large jumps in time. Such a technique is especially useful for situations where the volume dataset is either temporally sparse or too large to efficiently track a feature through all intermediate timesteps. In addition, this paper presents a few other applications of this approach, such as the ability to efficiently track the internal properties of volumetric features using variables from the particle data. We demonstrate the effectiveness of this technique using real world combustion and atmospheric datasets and compare it to existing tracking methods to justify its advantages and accuracy.
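
    A hedged sketch of the core idea: use the particle IDs inside a feature at time t0 to find the best-matching feature at a much later time t1. The feature masks and particle tables here are hypothetical stand-ins for the paper's data structures.

    ```python
    import numpy as np

    def track_feature(feature_particles_t0, features_t1):
        """Pick the t1 feature sharing the most particle IDs with the t0 feature.

        feature_particles_t0 : set of particle IDs inside the feature at t0
        features_t1          : dict mapping feature id -> set of particle IDs at t1
        """
        overlaps = {fid: len(feature_particles_t0 & pids)
                    for fid, pids in features_t1.items()}
        best = max(overlaps, key=overlaps.get)
        return best, overlaps[best]

    # Toy example: particles 3 and 5 moved into feature "B" over a large time jump
    t0_feature = {1, 3, 5}
    t1_features = {"A": {7, 8}, "B": {3, 5, 9}}
    print(track_feature(t0_feature, t1_features))   # ('B', 2)
    ```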

  2. Preparing for the 100-megapixel detector: reconstructing a multi-terabyte computed-tomography dataset

    NASA Astrophysics Data System (ADS)

    Orr, Laurel J.; Jimenez, Edward S.

    2013-09-01

    Although there has been progress in applying GPU technology to computed-tomography reconstruction algorithms, much of the work has concentrated on optimizing reconstruction performance for smaller, medical-scale datasets. Industrial CT datasets can vary widely in size and number of projections. With new advancements in high resolution cameras, it is entirely possible that the industrial CT community may soon need to pursue a 100-megapixel detector for CT applications. To reconstruct such a massive dataset, simply adding extra GPUs would not be an option, as memory and storage bottlenecks would result in prolonged periods of GPU downtime, thus negating performance gains. Additionally, current reconstruction algorithms would not be sufficient due to the various bottlenecks in the processor hardware. Past work has shown that CT reconstruction is an irregular problem for large-scale datasets on a GPU due to the massively parallel environment. This work proposes a high-performance, multi-GPU, modularized approach to reconstruction where computation, memory transfers, and disk I/O are optimized to occur in parallel while accommodating the irregular nature of the computation kernel. Our approach utilizes a dynamic MIMD-type architecture in a hybrid environment of CUDA and OpenMP. The modularized approach showed an improvement in load balancing and performance such that a 1-trillion-voxel volume was reconstructed from 10,000 100-megapixel projections in less than a day.
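
    The overlap of disk I/O and compute described here can be caricatured with a bounded producer/consumer queue. The Python sketch below is only a schematic of the pipelining idea, not the authors' CUDA/OpenMP implementation.

    ```python
    import queue
    import threading

    work = queue.Queue(maxsize=4)   # bounded buffer keeps I/O running ahead of compute

    def reader():
        # Stage 1: disk I/O - stream projection chunks (simulated here)
        for chunk_id in range(16):
            work.put(f"projection-chunk-{chunk_id}")
        work.put(None)  # sentinel: no more data

    def reconstructor():
        # Stage 2: "GPU" compute - runs concurrently with the reader
        while (chunk := work.get()) is not None:
            print("back-projecting", chunk)

    t_io = threading.Thread(target=reader)
    t_gpu = threading.Thread(target=reconstructor)
    t_io.start(); t_gpu.start()
    t_io.join(); t_gpu.join()
    ```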

  3. Simulating Next-Generation Sequencing Datasets from Empirical Mutation and Sequencing Models

    PubMed Central

    Stephens, Zachary D.; Hudson, Matthew E.; Mainzer, Liudmila S.; Taschuk, Morgan; Weber, Matthew R.; Iyer, Ravishankar K.

    2016-01-01

    An obstacle to validating and benchmarking methods for genome analysis is that there are few reference datasets available for which the “ground truth” about the mutational landscape of the sample genome is known and fully validated. Additionally, the free and public availability of real human genome datasets is incompatible with the preservation of donor privacy. In order to better analyze and understand genomic data, we need test datasets that model all variants, reflecting known biology as well as sequencing artifacts. Read simulators can fulfill this requirement, but are often criticized for limited resemblance to true data and overall inflexibility. We present NEAT (NExt-generation sequencing Analysis Toolkit), a set of tools that not only includes an easy-to-use read simulator, but also scripts to facilitate variant comparison and tool evaluation. NEAT has a wide variety of tunable parameters which can be set manually on the default model or parameterized using real datasets. The software is freely available at github.com/zstephens/neat-genreads. PMID:27893777
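
    The essence of a read simulator, in hedged toy form: fixed-length reads sampled uniformly from a reference with a uniform substitution-error rate. NEAT itself uses empirically parameterized, position-dependent error and mutation models, so this is only a conceptual sketch.

    ```python
    import random

    random.seed(0)
    BASES = "ACGT"

    def simulate_reads(reference, n_reads, read_len=50, error_rate=0.01):
        """Sample reads uniformly from a reference and inject substitution errors."""
        for _ in range(n_reads):
            start = random.randrange(len(reference) - read_len)
            read = list(reference[start:start + read_len])
            for i in range(read_len):
                if random.random() < error_rate:
                    read[i] = random.choice(BASES.replace(read[i], ""))
            yield start, "".join(read)

    reference = "".join(random.choice(BASES) for _ in range(10_000))
    for pos, read in simulate_reads(reference, n_reads=3):
        print(pos, read)
    ```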

  4. Heuristics for Relevancy Ranking of Earth Dataset Search Results

    NASA Technical Reports Server (NTRS)

    Lynnes, Christopher; Quinn, Patrick; Norton, James

    2016-01-01

    As the variety of Earth science datasets increases, science researchers find it more challenging to discover and select the datasets that best fit their needs. The most common way for search providers to address this problem is to rank the datasets returned for a query by their likely relevance to the user. Large web page search engines typically use text matching supplemented with reverse link counts, semantic annotations and user intent modeling. However, this produces uneven results when applied to dataset metadata records simply externalized as web pages. Fortunately, data and search providers have decades of experience in serving data user communities, allowing them to form heuristics that leverage the structure in the metadata together with knowledge about the user community. Some of these heuristics include specific ways of matching the user input to the essential measurements in the dataset and determining overlaps of time range and spatial area. Heuristics based on the novelty of the datasets can prioritize later, better versions of data over similar predecessors. And knowledge of how different user types and communities use data can be brought to bear in cases where characteristics of the user (discipline, expertise) or their intent (applications, research) can be divined. The Earth Observing System Data and Information System has begun implementing some of these heuristics in the relevancy algorithm of its Common Metadata Repository search engine.
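
    A hedged sketch of how such heuristics might be combined into a single relevance score; the weights, field names, and scoring terms below are illustrative, not the Common Metadata Repository algorithm.

    ```python
    from datetime import date

    def relevance(query, dataset):
        """Combine simple heuristic signals into one score."""
        score = 0.0
        # Text match against the dataset's essential measurement
        if query["keyword"].lower() in dataset["measurement"].lower():
            score += 2.0
        # Temporal overlap between the query window and dataset coverage
        overlap = (min(query["end"], dataset["end"]) -
                   max(query["start"], dataset["start"])).days
        if overlap > 0:
            score += 1.0
        # Novelty: prefer later versions of otherwise similar datasets
        score += 0.1 * dataset["version"]
        return score

    q = {"keyword": "sea surface temperature",
         "start": date(2010, 1, 1), "end": date(2012, 1, 1)}
    d = {"measurement": "Sea Surface Temperature", "version": 5,
         "start": date(2000, 1, 1), "end": date(2015, 1, 1)}
    print(relevance(q, d))
    ```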

  5. Estimating multivariate similarity between neuroimaging datasets with sparse canonical correlation analysis: an application to perfusion imaging

    PubMed Central

    Rosa, Maria J.; Mehta, Mitul A.; Pich, Emilio M.; Risterucci, Celine; Zelaya, Fernando; Reinders, Antje A. T. S.; Williams, Steve C. R.; Dazzan, Paola; Doyle, Orla M.; Marquand, Andre F.

    2015-01-01

    An increasing number of neuroimaging studies are based on either combining more than one data modality (inter-modal) or combining more than one measurement from the same modality (intra-modal). To date, most intra-modal studies using multivariate statistics have focused on differences between datasets, for instance relying on classifiers to differentiate between effects in the data. However, to fully characterize these effects, multivariate methods able to measure similarities between datasets are needed. One classical technique for estimating the relationship between two datasets is canonical correlation analysis (CCA). However, in the context of high-dimensional data the application of CCA is extremely challenging. A recent extension of CCA, sparse CCA (SCCA), overcomes this limitation, by regularizing the model parameters while yielding a sparse solution. In this work, we modify SCCA with the aim of facilitating its application to high-dimensional neuroimaging data and finding meaningful multivariate image-to-image correspondences in intra-modal studies. In particular, we show how the optimal subset of variables can be estimated independently and we look at the information encoded in more than one set of SCCA transformations. We illustrate our framework using Arterial Spin Labeling data to investigate multivariate similarities between the effects of two antipsychotic drugs on cerebral blood flow. PMID:26528117
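
    For intuition, a minimal sparse-CCA-style computation via an alternating, soft-thresholded power iteration on the cross-covariance matrix. This is a generic penalized-CCA sketch on synthetic data, not the authors' modified SCCA.

    ```python
    import numpy as np

    def soft_threshold(v, lam):
        return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

    def sparse_cca(X, Y, lam=0.1, n_iter=100):
        """One pair of sparse canonical vectors via penalized power iteration."""
        C = X.T @ Y                      # cross-covariance (data assumed centered)
        u = np.random.default_rng(0).normal(size=X.shape[1])
        u /= np.linalg.norm(u)
        v = np.zeros(Y.shape[1])
        for _ in range(n_iter):
            v = soft_threshold(C.T @ u, lam)
            if np.linalg.norm(v): v /= np.linalg.norm(v)
            u = soft_threshold(C @ v, lam)
            if np.linalg.norm(u): u /= np.linalg.norm(u)
        return u, v

    X = np.random.default_rng(1).normal(size=(50, 20))
    Y = X[:, :5] @ np.ones((5, 8)) + np.random.default_rng(2).normal(size=(50, 8))
    u, v = sparse_cca(X - X.mean(0), Y - Y.mean(0))
    print(np.nonzero(u)[0], np.nonzero(v)[0])
    ```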

  6. Nine martian years of dust optical depth observations: A reference dataset

    NASA Astrophysics Data System (ADS)

    Montabone, Luca; Forget, Francois; Kleinboehl, Armin; Kass, David; Wilson, R. John; Millour, Ehouarn; Smith, Michael; Lewis, Stephen; Cantor, Bruce; Lemmon, Mark; Wolff, Michael

    2016-07-01

    We present a multi-annual reference dataset of the horizontal distribution of airborne dust from martian year 24 to 32 using observations of the martian atmosphere from April 1999 to June 2015 made by the Thermal Emission Spectrometer (TES) aboard Mars Global Surveyor, the Thermal Emission Imaging System (THEMIS) aboard Mars Odyssey, and the Mars Climate Sounder (MCS) aboard Mars Reconnaissance Orbiter (MRO). Our methodology to build the dataset works by gridding the available retrievals of column dust optical depth (CDOD) from TES and THEMIS nadir observations, as well as the estimates of this quantity from MCS limb observations. The resulting (irregularly) gridded maps (one per sol) were validated with independent observations of CDOD by PanCam cameras and Mini-TES spectrometers aboard the Mars Exploration Rovers "Spirit" and "Opportunity", by the Surface Stereo Imager aboard the Phoenix lander, and by the Compact Reconnaissance Imaging Spectrometer for Mars aboard MRO. Finally, regular maps of CDOD are produced by spatially interpolating the irregularly gridded maps using a kriging method. These latter maps are used as dust scenarios in the Mars Climate Database (MCD) version 5, and are useful in many modelling applications. The two datasets (daily irregularly gridded maps and regularly kriged maps) for the nine available martian years are publicly available as NetCDF files and can be downloaded from the MCD website at the URL: http://www-mars.lmd.jussieu.fr/mars/dust_climatology/index.html

  7. A microarray whole-genome gene expression dataset in a rat model of inflammatory corneal angiogenesis

    PubMed Central

    Mukwaya, Anthony; Lindvall, Jessica M.; Xeroudaki, Maria; Peebo, Beatrice; Ali, Zaheer; Lennikov, Anton; Jensen, Lasse Dahl Ejby; Lagali, Neil

    2016-01-01

    In angiogenesis with concurrent inflammation, many pathways are activated, some linked to VEGF and others largely VEGF-independent. Pathways involving inflammatory mediators, chemokines, and micro-RNAs may play important roles in maintaining a pro-angiogenic environment or mediating angiogenic regression. Here, we describe a gene expression dataset to facilitate exploration of pro-angiogenic, pro-inflammatory, and remodelling/normalization-associated genes during both an active capillary sprouting phase, and in the restoration of an avascular phenotype. The dataset was generated by microarray analysis of the whole transcriptome in a rat model of suture-induced inflammatory corneal neovascularisation. Regions of active capillary sprout growth or regression in the cornea were harvested and total RNA extracted from four biological replicates per group. High quality RNA was obtained for gene expression analysis using microarrays. Fold change of selected genes was validated by qPCR, and protein expression was evaluated by immunohistochemistry. We provide a gene expression dataset that may be re-used to investigate corneal neovascularisation, and may also have implications in other contexts of inflammation-mediated angiogenesis. PMID:27874850

  8. Direct infusion mass spectrometry metabolomics dataset: a benchmark for data processing and quality control.

    PubMed

    Kirwan, Jennifer A; Weber, Ralf J M; Broadhurst, David I; Viant, Mark R

    2014-01-01

    Direct-infusion mass spectrometry (DIMS) metabolomics is an important approach for characterising molecular responses of organisms to disease, drugs and the environment. Increasingly large-scale metabolomics studies are being conducted, necessitating improvements in both bioanalytical and computational workflows to maintain data quality. This dataset represents a systematic evaluation of the reproducibility of a multi-batch DIMS metabolomics study of cardiac tissue extracts. It comprises twenty biological samples (cow vs. sheep) that were analysed repeatedly, in 8 batches across 7 days, together with a concurrent set of quality control (QC) samples. Data are presented from each step of the workflow and are available in MetaboLights. The strength of the dataset is that intra- and inter-batch variation can be corrected using QC spectra, and the quality of this correction can be assessed independently using the repeatedly measured biological samples. Originally designed to test the efficacy of a batch-correction algorithm, it will enable others to evaluate novel data processing algorithms. Furthermore, this dataset serves as a benchmark for DIMS metabolomics, derived using best-practice workflows and rigorous quality assessment.
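
    A hedged sketch of the QC-based correction idea: scale each feature by the batch-wise level of the QC samples. This is a simple stand-in for the dataset's actual batch-correction algorithm, and the column names are hypothetical.

    ```python
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)
    n = 24
    df = pd.DataFrame({
        "batch":     np.repeat([1, 2, 3], n // 3),
        "is_qc":     np.tile([True] + [False] * 7, 3),
        "feature_1": rng.lognormal(mean=0.0, sigma=0.2, size=n),
    })
    # Inject an artificial batch effect to correct
    df.loc[df.batch == 2, "feature_1"] *= 1.5

    # Per-batch QC medians, then divide every sample by its batch's QC level
    qc_median = df[df.is_qc].groupby("batch")["feature_1"].median()
    df["feature_1_corrected"] = df["feature_1"] / df["batch"].map(qc_median)
    print(df.groupby("batch")["feature_1_corrected"].median())
    ```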

  9. Estimating multivariate similarity between neuroimaging datasets with sparse canonical correlation analysis: an application to perfusion imaging.

    PubMed

    Rosa, Maria J; Mehta, Mitul A; Pich, Emilio M; Risterucci, Celine; Zelaya, Fernando; Reinders, Antje A T S; Williams, Steve C R; Dazzan, Paola; Doyle, Orla M; Marquand, Andre F

    2015-01-01

    An increasing number of neuroimaging studies are based on either combining more than one data modality (inter-modal) or combining more than one measurement from the same modality (intra-modal). To date, most intra-modal studies using multivariate statistics have focused on differences between datasets, for instance relying on classifiers to differentiate between effects in the data. However, to fully characterize these effects, multivariate methods able to measure similarities between datasets are needed. One classical technique for estimating the relationship between two datasets is canonical correlation analysis (CCA). However, in the context of high-dimensional data the application of CCA is extremely challenging. A recent extension of CCA, sparse CCA (SCCA), overcomes this limitation, by regularizing the model parameters while yielding a sparse solution. In this work, we modify SCCA with the aim of facilitating its application to high-dimensional neuroimaging data and finding meaningful multivariate image-to-image correspondences in intra-modal studies. In particular, we show how the optimal subset of variables can be estimated independently and we look at the information encoded in more than one set of SCCA transformations. We illustrate our framework using Arterial Spin Labeling data to investigate multivariate similarities between the effects of two antipsychotic drugs on cerebral blood flow.

  10. Independent technical review, handbook

    SciTech Connect

    Not Available

    1994-02-01

    Purpose: Provide an independent engineering review of the major projects being funded by the Department of Energy, Office of Environmental Restoration and Waste Management. The independent engineering review will address whether engineering practice is sufficiently developed for a major project to be executed without significant technical problems. The review will focus on questions related to: (1) adequacy of development of the technical base of understanding; (2) status of development and availability of technology among the various alternatives; (3) status and availability of the industrial infrastructure to support project design, equipment fabrication, facility construction, and process and program/project operation; (4) adequacy of the design effort to provide a sound foundation for execution of the project; (5) ability of the organization to fully integrate the system, and to direct, manage, and control the execution of a complex major project.

  11. Homebirth and independent midwifery.

    PubMed

    Harris, G

    2000-07-01

    Why do women choose to give birth at home, and midwives to work independently, in a culture that does little to support this option? This article looks at the reasons childbearing women and midwives make these choices and the barriers to achieving them. The safety of the homebirth option is supported by reference to analyses of mortality and morbidity. Homebirth practices and levels of success are compared across Australia and New Zealand (NZ) in particular, and The Netherlands, England and America. The popularity and success of homebirth are analysed in terms of socio-economic status. The current situation and challenges of independent midwifery in Darwin are described.

  12. Large scale validation of the M5L lung CAD on heterogeneous CT datasets

    PubMed Central

    Lopez Torres, E.; Fiorina, E.; Pennazio, F.; Peroni, C.; Saletta, M.; Camarlinghi, N.; Fantacci, M. E.; Cerello, P.

    2015-01-01

    Purpose: M5L, a fully automated computer-aided detection (CAD) system for the detection and segmentation of lung nodules in thoracic computed tomography (CT), is presented and validated on several image datasets. Methods: M5L is the combination of two independent subsystems, based on the Channeler Ant Model as a segmentation tool [lung channeler ant model (lungCAM)] and on the voxel-based neural approach. The lungCAM was upgraded with a scan equalization module and a new procedure to recover nodules connected to other lung structures; its classification module, which makes use of a feed-forward neural network, is based on a small number of features (13), so as to minimize the risk of poor generalization given the large difference in size between the training and testing datasets, which contain 94 and 1019 CTs, respectively. The lungCAM (standalone) and M5L (combined) performance was extensively tested on 1043 CT scans from three independent datasets, including a detailed analysis of the full Lung Image Database Consortium/Image Database Resource Initiative database, which has not previously been reported in the literature. Results: The lungCAM and M5L performance is consistent across the databases, with sensitivities of about 70% and 80%, respectively, at eight false positive findings per scan, despite the variable annotation criteria and acquisition and reconstruction conditions. A reduced sensitivity is found for subtle nodules and ground glass opacity (GGO) structures. A comparison with other CAD systems is also presented. Conclusions: The M5L performance on a large and heterogeneous dataset is stable and satisfactory, although the development of a dedicated module for GGO detection could further improve it, as could an iterative optimization of the training procedure. The main aim of the present study was accomplished: M5L results do not deteriorate when increasing the dataset size, making it a candidate for supporting radiologists on large

  13. Developing a regional retrospective ensemble precipitation dataset for watershed hydrology modeling, Idaho, USA

    NASA Astrophysics Data System (ADS)

    Flores, A. N.; Smith, K.; LaPorte, P.

    2011-12-01

    Applications like flood forecasting, military trafficability assessment, and slope stability analysis necessitate the use of models capable of resolving hydrologic states and fluxes at spatial scales of hillslopes (e.g., 10s to 100s m). These models typically require precipitation forcings at spatial scales of kilometers or better and time intervals of hours. Yet in especially rugged terrain that typifies much of the Western US and throughout much of the developing world, precipitation data at these spatiotemporal resolutions is difficult to come by. Ground-based weather radars have significant problems in high-relief settings and are sparsely located, leaving significant gaps in coverage and high uncertainties. Precipitation gages provide accurate data at points but are very sparsely located and their placement is often not representative, yielding significant coverage gaps in a spatial and physiographic sense. Numerical weather prediction efforts have made precipitation data, including critically important information on precipitation phase, available globally and in near real-time. However, these datasets present watershed modelers with two problems: (1) spatial scales of many of these datasets are tens of kilometers or coarser, (2) numerical weather models used to generate these datasets include a land surface parameterization that in some circumstances can significantly affect precipitation predictions. We report on the development of a regional precipitation dataset for Idaho that leverages: (1) a dataset derived from a numerical weather prediction model, (2) gages within Idaho that report hourly precipitation data, and (3) a long-term precipitation climatology dataset. Hourly precipitation estimates from the Modern Era Retrospective-analysis for Research and Applications (MERRA) are stochastically downscaled using a hybrid orographic and statistical model from their native resolution (1/2 x 2/3 degrees) to a resolution of approximately 1 km. Downscaled

  14. Postcard from Independence, Mo.

    ERIC Educational Resources Information Center

    Archer, Jeff

    2004-01-01

    This article reports results showing that the Independence, Missouri, school district failed to meet almost every one of its improvement goals under the No Child Left Behind Act. The state accreditation system stresses improvement over past scores, while the federal law demands specified amounts of annual progress toward the ultimate goal of 100…

  15. Native American Independent Living.

    ERIC Educational Resources Information Center

    Clay, Julie Anna

    1992-01-01

    Examines features of independent living philosophy with regard to compatibility with Native American cultures, including definition or conceptualization of disability; self-advocacy; systems advocacy; peer counseling; and consumer control and involvement. Discusses an actualizing process as one method of resolving cultural conflicts and…

  16. Touchstones of Independence.

    ERIC Educational Resources Information Center

    Roha, Thomas Arden

    1999-01-01

    Foundations affiliated with public higher education institutions can avoid having to open records for public scrutiny, by having independent boards of directors, occupying leased office space or paying market value for university space, using only foundation personnel, retaining legal counsel, being forthcoming with information and use of public…

  17. Independent Video in Britain.

    ERIC Educational Resources Information Center

    Stewart, David

    Maintaining the status quo as well as the attitude toward cultural funding and development that it imposes on video are detrimental to the formation of a thriving video network, and also out of key with the present social and political situation in Britain. Independent video has some quite specific advantages as a medium for cultural production…

  18. Caring about Independent Lives

    ERIC Educational Resources Information Center

    Christensen, Karen

    2010-01-01

    With the rhetoric of independence, new cash for care systems were introduced in many developed welfare states at the end of the 20th century. These systems allow local authorities to pay people who are eligible for community care services directly, to enable them to employ their own careworkers. Despite the obvious importance of the careworker's…

  19. Independence and Survival.

    ERIC Educational Resources Information Center

    James, H. Thomas

    Independent schools that are of viable size, well managed, and strategically located to meet competition will survive and prosper past the current financial crisis. We live in a complex technological society with insatiable demands for knowledgeable people to keep it running. The future will be marked by the orderly selection of qualified people,…

  20. SALSA: A Novel Dataset for Multimodal Group Behavior Analysis.

    PubMed

    Alameda-Pineda, Xavier; Staiano, Jacopo; Subramanian, Ramanathan; Batrinca, Ligia; Ricci, Elisa; Lepri, Bruno; Lanz, Oswald; Sebe, Nicu

    2016-08-01

    Studying free-standing conversational groups (FCGs) in unstructured social settings (e.g., a cocktail party) is gratifying due to the wealth of information available at the group (mining social networks) and individual (recognizing native behavioral and personality traits) levels. However, analyzing social scenes involving FCGs is also highly challenging, owing to the difficulty of extracting behavioral cues such as target locations, speaking activity and head/body pose under crowdedness and extreme occlusions. To this end, we propose SALSA, a novel dataset facilitating multimodal and Synergetic sociAL Scene Analysis, and make two main contributions to research on automated social interaction analysis: (1) SALSA records social interactions among 18 participants in a natural, indoor environment for over 60 minutes, under poster presentation and cocktail party contexts presenting difficulties in the form of low-resolution images, lighting variations, numerous occlusions, reverberations and interfering sound sources; (2) to alleviate these problems we facilitate multimodal analysis by recording the social interplay using four static surveillance cameras and sociometric badges worn by each participant, comprising microphone, accelerometer, bluetooth and infrared sensors. In addition to raw data, we also provide annotations concerning individuals' personality as well as their position, head and body orientation, and F-formation information over the entire event duration. Through extensive experiments with state-of-the-art approaches, we show (a) the limitations of current methods and (b) how the recorded multiple cues synergetically aid automatic analysis of social interactions. SALSA is available at http://tev.fbk.eu/salsa.

  1. Using Multiple Metadata Standards to Describe Climate Datasets in a Semantic Framework

    NASA Astrophysics Data System (ADS)

    Blumenthal, M. B.; Del Corral, J.; Bell, M.

    2007-12-01

    The standards underlying the Semantic Web -- Resource Description Framework (RDF) and Web Ontology Language (OWL), among others -- show great promise in addressing some of the basic problems in earth science metadata. In particular they provide a single framework that allows us to describe datasets according to multiple standards, creating a more complete description than any single standard can support, and avoiding the difficult problem of creating a super-standard that can describe everything about everything. The Semantic Web standards provide a framework for explicitly describing the data models implicit in programs that display and manipulate data. They also provide a framework in which multiple metadata standards can be described. Most importantly, these data models and metadata standards can be interrelated, a key step in creating interoperability, and an important step in creating a practical system. As an exercise in understanding how this framework might be used, we have created an RDF expression of the datasets and some of the metadata in the IRI/LDEO Climate Data Library. This includes concepts like datasets, units, dependent variables, and independent variables. These datasets have been provided under diverse frameworks that have varied levels of associated metadata, including netCDF, GRIB, GeoTIFF, and OpenDAP; these frameworks have some associated concepts that are common, some that are similar and some that are quite distinct. We have also created an RDF expression of a taxonomy that forms the basis of an earth data search interface. These concepts include location, time, quantity, realm, author, and institution. A series of inference engines using currently evolving semantic web technologies is then used to infer the connections between the diverse data-oriented concepts of the data library as well as the distinctly different conceptual framework of the data search.
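
    A small illustration of describing one dataset with terms drawn from more than one vocabulary, using Python's rdflib. The clim: namespace and its property names are invented for this example and are not the IRI/LDEO schema.

    ```python
    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import DCTERMS, RDF

    CLIM = Namespace("http://example.org/climate#")   # hypothetical ontology
    g = Graph()
    g.bind("dcterms", DCTERMS)
    g.bind("clim", CLIM)

    ds = URIRef("http://example.org/datasets/sst-monthly")
    # Generic descriptive terms from Dublin Core...
    g.add((ds, RDF.type, CLIM.Dataset))
    g.add((ds, DCTERMS.title, Literal("Monthly Sea Surface Temperature")))
    g.add((ds, DCTERMS.spatial, Literal("global")))
    # ...alongside domain-specific terms from a second vocabulary
    g.add((ds, CLIM.dependentVariable, Literal("sst")))
    g.add((ds, CLIM.independentVariable, Literal("time")))

    print(g.serialize(format="turtle"))
    ```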

  2. Coexpression analysis of large cancer datasets provides insight into the cellular phenotypes of the tumour microenvironment

    PubMed Central

    2013-01-01

    Background: Biopsies taken from individual tumours exhibit extensive differences in their cellular composition due to the inherent heterogeneity of cancers and the vagaries of sample collection. As a result, genes expressed in specific cell types, or associated with certain biological processes, are detected at widely variable levels across samples in transcriptomic analyses. This heterogeneity also means that the expression level of genes specific to a given cell type or process will vary in line with the number of those cells within samples, or the activity of the pathway, and such genes will therefore be correlated in their expression. Results: Using a novel 3D network-based approach, we have analysed six large human cancer microarray datasets derived from more than 1,000 individuals. Based upon this analysis, and without needing to isolate the individual cells, we have defined a broad spectrum of cell-type- and pathway-specific gene signatures present in cancer expression data, which were also found to be largely conserved in a number of independent datasets. Conclusions: The conserved signature of the tumour-associated macrophage is shown to be largely independent of tumour cell type. All stromal cell signatures have some degree of correlation with each other, since they must all be inversely correlated with the tumour component. However, viewed in the context of established tumours, the interactions between stromal components appear to be multifactorial, given that the level of one component, e.g. vasculature, does not correlate tightly with another, such as the macrophage. PMID:23845084

  3. Discovery and Analysis of Intersecting Datasets: JMARS as a Comparative Science Platform

    NASA Astrophysics Data System (ADS)

    Carter, S.; Christensen, P. R.; Dickenshied, S.; Anwar, S.; Noss, D.

    2014-12-01

    sources under the given area. JMARS has the ability to geographically locate and display a vast array of remote sensing data for a user. In addition to its powerful searching ability, it also enables users to compare datasets using the Data Spike and Data Profile techniques. Plots and tables from this data can be exported and used in presentations, papers, or external software for further study.

  4. GLEAM v3: updated land evaporation and root-zone soil moisture datasets

    NASA Astrophysics Data System (ADS)

    Martens, Brecht; Miralles, Diego; Lievens, Hans; van der Schalie, Robin; de Jeu, Richard; Fernández-Prieto, Diego; Verhoest, Niko

    2016-04-01

    Evaporation determines the availability of surface water resources and the requirements for irrigation. In addition, through its impacts on the water, carbon and energy budgets, evaporation influences the occurrence of rainfall and the dynamics of air temperature. Therefore, reliable estimates of this flux at regional to global scales are of major importance for water management and meteorological forecasting of extreme events. However, the global-scale magnitude and variability of the flux, and the sensitivity of the underlying physical process to changes in environmental factors, are still poorly understood due to the limited global coverage of in situ measurements. Remote sensing techniques can help to overcome the lack of ground data. However, evaporation is not directly observable from satellite systems. As a result, recent efforts have focussed on combining the observable drivers of evaporation within process-based models. The Global Land Evaporation Amsterdam Model (GLEAM, www.gleam.eu) estimates terrestrial evaporation based on daily satellite observations of meteorological drivers of terrestrial evaporation, vegetation characteristics and soil moisture. Since the publication of the first version of the model in 2011, GLEAM has been widely applied for the study of trends in the water cycle, interactions between land and atmosphere and hydrometeorological extreme events. A third version of the GLEAM global datasets will be available from the beginning of 2016 and will be distributed using www.gleam.eu as gateway. The updated datasets include separate estimates for the different components of the evaporative flux (i.e. transpiration, bare-soil evaporation, interception loss, open-water evaporation and snow sublimation), as well as variables like the evaporative stress, potential evaporation, root-zone soil moisture and surface soil moisture. A new dataset using SMOS-based input data of surface soil moisture and vegetation optical depth will also be

  5. A test-retest dataset for assessing long-term reliability of brain morphology and resting-state brain activity.

    PubMed

    Huang, Lijie; Huang, Taicheng; Zhen, Zonglei; Liu, Jia

    2016-03-15

    We present a test-retest dataset for evaluation of long-term reliability of measures from structural and resting-state functional magnetic resonance imaging (sMRI and rfMRI) scans. The repeated scan dataset was collected from 61 healthy adults in two sessions using highly similar imaging parameters at an interval of 103-189 days. However, as the imaging parameters were not completely identical, the reliability estimated from this dataset shall reflect the lower bounds of the true reliability of sMRI/rfMRI measures. Furthermore, in conjunction with other test-retest datasets, our dataset may help explore the impact of different imaging parameters on reliability of sMRI/rfMRI measures, which is especially critical for assessing datasets collected from multiple centers. In addition, intelligence quotient (IQ) was measured for each participant using Raven's Advanced Progressive Matrices. The data can thus be used for purposes other than assessing reliability of sMRI/rfMRI alone. For example, data from each single session could be used to associate structural and functional measures of the brain with the IQ metrics to explore brain-IQ association.

  6. A test-retest dataset for assessing long-term reliability of brain morphology and resting-state brain activity

    PubMed Central

    Huang, Lijie; Huang, Taicheng; Zhen, Zonglei; Liu, Jia

    2016-01-01

    We present a test-retest dataset for evaluation of long-term reliability of measures from structural and resting-state functional magnetic resonance imaging (sMRI and rfMRI) scans. The repeated scan dataset was collected from 61 healthy adults in two sessions using highly similar imaging parameters at an interval of 103–189 days. However, as the imaging parameters were not completely identical, the reliability estimated from this dataset shall reflect the lower bounds of the true reliability of sMRI/rfMRI measures. Furthermore, in conjunction with other test-retest datasets, our dataset may help explore the impact of different imaging parameters on reliability of sMRI/rfMRI measures, which is especially critical for assessing datasets collected from multiple centers. In addition, intelligence quotient (IQ) was measured for each participant using Raven’s Advanced Progressive Matrices. The data can thus be used for purposes other than assessing reliability of sMRI/rfMRI alone. For example, data from each single session could be used to associate structural and functional measures of the brain with the IQ metrics to explore brain-IQ association. PMID:26978040

  7. The CRUTEM4 land-surface air temperature dataset: construction, previous versions and dissemination via Google Earth

    NASA Astrophysics Data System (ADS)

    Osborn, T. J.; Jones, P. D.

    2013-10-01

    The CRUTEM4 (Climatic Research Unit Temperature version 4) land-surface air temperature dataset is one of the most widely used records of the climate system. Here we provide an important additional dissemination route for this dataset: online access to monthly, seasonal and annual data values and timeseries graphs via Google Earth. This is achieved via an interface written in Keyhole Markup Language (KML) and also provides access to the underlying weather station data used to construct the CRUTEM4 dataset. A mathematical description of the construction of the CRUTEM4 dataset (and its predecessor versions) is also provided, together with an archive of some previous versions and a recommendation for identifying the precise version of the dataset used in a particular study. The CRUTEM4 dataset used here is available from doi:10.5285/EECBA94F-62F9-4B7C-88D3-482F2C93C468.

  8. Securely measuring the overlap between private datasets with cryptosets.

    PubMed

    Swamidass, S Joshua; Matlock, Matthew; Rozenblit, Leon

    2015-01-01

    Many scientific questions are best approached by sharing data--collected by different groups or across large collaborative networks--into a combined analysis. Unfortunately, some of the most interesting and powerful datasets--like health records, genetic data, and drug discovery data--cannot be freely shared because they contain sensitive information. In many situations, knowing if private datasets overlap determines if it is worthwhile to navigate the institutional, ethical, and legal barriers that govern access to sensitive, private data. We report the first method of publicly measuring the overlap between private datasets that is secure under a malicious model without relying on private protocols or message passing. This method uses a publicly shareable summary of a dataset's contents, its cryptoset, to estimate its overlap with other datasets. Cryptosets approach "information-theoretic" security, the strongest type of security possible in cryptography, which is not even crackable with infinite computing power. We empirically and theoretically assess both the accuracy of these estimates and the security of the approach, demonstrating that cryptosets are informative, with a stable accuracy, and secure.
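
    A toy version of the idea: summarize each private set as a public histogram of hashed items, then estimate the overlap from the histograms' dot product after subtracting expected chance collisions. This is a simplified caricature of cryptosets, not the paper's exact estimator or a security argument.

    ```python
    import hashlib

    L = 512  # public summary length; shorter = more private, less accurate

    def cryptoset(items):
        """Public, fixed-length histogram of hashed items."""
        counts = [0] * L
        for item in items:
            h = int(hashlib.sha256(item.encode()).hexdigest(), 16) % L
            counts[h] += 1
        return counts

    def estimate_overlap(a, b):
        na, nb = sum(a), sum(b)
        dot = sum(x * y for x, y in zip(a, b))
        # Subtract the collisions expected between unrelated items
        return (dot - na * nb / L) / (1 - 1 / L)

    A = {f"patient-{i}" for i in range(500)}
    B = {f"patient-{i}" for i in range(300, 900)}
    print(len(A & B), round(estimate_overlap(cryptoset(A), cryptoset(B))))
    ```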

  9. Dataset from chemical gas sensor array in turbulent wind tunnel.

    PubMed

    Fonollosa, Jordi; Rodríguez-Luján, Irene; Trincavelli, Marco; Huerta, Ramón

    2015-06-01

    The dataset includes the acquired time series of a chemical detection platform exposed to different gas conditions in a turbulent wind tunnel. The chemo-sensory elements sampled the environment directly. In contrast to traditional approaches that include measurement chambers, open sampling systems are sensitive to the dispersion mechanisms of gaseous chemical analytes, namely diffusion, turbulence, and advection, making the identification and monitoring of chemical substances more challenging. The sensing platform included 72 metal-oxide gas sensors positioned at 6 different locations in the wind tunnel. At each location, 10 distinct chemical gases were released in the wind tunnel, the sensors were evaluated at 5 different operating temperatures, and 3 different wind speeds were generated in the wind tunnel to induce different levels of turbulence. Moreover, each configuration was repeated 20 times, yielding a dataset of 18,000 measurements collected over a period of 16 months. The data is related to "On the performance of gas sensor arrays in open sampling systems using Inhibitory Support Vector Machines", by Vergara et al. [1]. The dataset can be accessed publicly at the UCI repository upon citation of [1]: http://archive.ics.uci.edu/ml/datasets/Gas+sensor+arrays+in+open+sampling+settings.

  10. Realistic computer network simulation for network intrusion detection dataset generation

    NASA Astrophysics Data System (ADS)

    Payer, Garrett

    2015-05-01

    The KDD-99 Cup dataset is dead. While it can continue to be used as a toy example, the age of this dataset makes it all but useless for intrusion detection research and data mining. Many of the attacks used within the dataset are obsolete and do not reflect the features important for intrusion detection in today's networks. Creating a new dataset encompassing a large cross section of the attacks found on the Internet today could be useful, but would eventually fall to the same problem as the KDD-99 Cup; its usefulness would diminish after a period of time. To continue research into intrusion detection, the generation of new datasets needs to be as dynamic and as quick as the attacker. Simply examining existing network traffic and using domain experts such as intrusion analysts to label traffic is inefficient, expensive, and not scalable. The only viable methodology is simulation using technologies including virtualization, attack-toolsets such as Metasploit and Armitage, and sophisticated emulation of threat and user behavior. Simulating actual user behavior and network intrusion events dynamically not only allows researchers to vary scenarios quickly, but enables online testing of intrusion detection mechanisms by interacting with data as it is generated. As new threat behaviors are identified, they can be added to the simulation to make quicker determinations as to the effectiveness of existing and ongoing network intrusion technology, methodology and models.

  11. Discriminative motif analysis of high-throughput dataset

    PubMed Central

    Yao, Zizhen; MacQuarrie, Kyle L.; Fong, Abraham P.; Tapscott, Stephen J.; Ruzzo, Walter L.; Gentleman, Robert C.

    2014-01-01

    Motivation: High-throughput ChIP-seq studies typically identify thousands of peaks for a single transcription factor (TF). It is common for traditional motif discovery tools to predict motifs that are statistically significant against a naïve background distribution but are of questionable biological relevance. Results: We describe a simple yet effective algorithm for discovering differential motifs between two sequence datasets that is effective in eliminating systematic biases and scalable to large datasets. Tested on 207 ENCODE ChIP-seq datasets, our method identifies correct motifs in 78% of the datasets with known motifs, demonstrating improvement in both accuracy and efficiency compared with DREME, another state-of-the-art discriminative motif discovery tool. More interestingly, on the remaining, more challenging datasets, we identify common technical or biological factors that compromise the motif search results and use advanced features of our tool to control for these factors. We also present case studies demonstrating the ability of our method to detect single base pair differences in the DNA specificity of two similar TFs. Lastly, we demonstrate discovery of key TF motifs involved in tissue specification by examination of high-throughput DNase accessibility data. Availability: The motifRG package is publicly available via the Bioconductor repository. Contact: yzizhen@fhcrc.org Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24162561
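
    motifRG itself is an R/Bioconductor package; purely to illustrate the discriminative idea (scoring motifs against a matched background rather than a naïve one), here is a hypothetical Python sketch that ranks k-mers by their log-odds of occurring in foreground versus background sequences. It omits the motif refinement and bias-control machinery of the actual tool.

    ```python
    # Simplified illustration of discriminative motif scoring (not the motifRG
    # algorithm itself): rank k-mers by their log-odds of occurring in "peak"
    # sequences (foreground) versus control sequences (background).
    import math
    from collections import Counter

    def kmer_seq_counts(seqs, k):
        """Count, for each k-mer, how many sequences contain it at least once."""
        counts = Counter()
        for s in seqs:
            counts.update({s[i:i + k] for i in range(len(s) - k + 1)})
        return counts

    def discriminative_kmers(fg, bg, k=6, top=5):
        fc, bc = kmer_seq_counts(fg, k), kmer_seq_counts(bg, k)

        def log_odds(m):
            # +1/+2 pseudocounts keep unseen k-mers finite
            return math.log(((fc[m] + 1) / (len(fg) + 2)) /
                            ((bc[m] + 1) / (len(bg) + 2)))

        return sorted(fc, key=log_odds, reverse=True)[:top]

    fg = ["ACGTGACGTT", "TTACGTGAAA", "GACGTGCCCC"]  # toy "peak" sequences
    bg = ["ACACACACAC", "GGGTTTGGGT", "CCCCAAAATT"]  # toy background sequences
    print(discriminative_kmers(fg, bg, k=5))
    ```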

  12. The LANDFIRE Refresh strategy: updating the national dataset

    USGS Publications Warehouse

    Nelson, Kurtis J.; Connot, Joel A.; Peterson, Birgit E.; Martin, Charley

    2013-01-01

    The LANDFIRE Program provides comprehensive vegetation and fuel datasets for the entire United States. As with many large-scale ecological datasets, vegetation and landscape conditions must be updated periodically to account for disturbances, growth, and natural succession. The LANDFIRE Refresh effort was the first attempt to consistently update these products nationwide. It incorporated a combination of specific systematic improvements to the original LANDFIRE National data, remote sensing based disturbance detection methods, field collected disturbance information, vegetation growth and succession modeling, and vegetation transition processes. This resulted in the creation of two complete datasets for all 50 states: LANDFIRE Refresh 2001, which includes the systematic improvements, and LANDFIRE Refresh 2008, which includes the disturbance and succession updates to the vegetation and fuel data. The new datasets are comparable for studying landscape changes in vegetation type and structure over a decadal period, and provide the most recent characterization of fuel conditions across the country. The applicability of the new layers is discussed and the effects of using the new fuel datasets are demonstrated through a fire behavior modeling exercise using the 2011 Wallow Fire in eastern Arizona as an example.

  13. Not only in the temperate zone: independent gametophytes of two vittarioid ferns (Pteridaceae, Polypodiales) in East Asian subtropics.

    PubMed

    Kuo, Li-Yaung; Chen, Cheng-Wei; Shinohara, Wataru; Ebihara, Atsushi; Kudoh, Hiroshi; Sato, Hirotoshi; Huang, Yao-Moan; Chiou, Wen-Liang

    2017-03-01

    Independent gametophyte ferns are unique among vascular plants because they lack sporophytes and reproduce asexually to maintain their populations in the gametophyte generation. Such ferns have primarily been discovered in the temperate zone and are usually hypothesized to have (sub)tropical origins, with subsequent extinction of the sporophyte due to climate change during glaciations. Presumably, independent fern gametophytes are unlikely to occur in the tropics and subtropics, where climates are relatively stable and less affected by glaciations. Nonetheless, the current study presents two independent gametophyte fern species in the East Asian subtropics. We applied plastid DNA sequences (trnL-L-F and matK + ndhF + chlL datasets) and comprehensive sampling (~80%) of congeneric species for molecular identification and divergence time estimation of these independent fern gametophytes. Both independent gametophyte ferns were found to belong to the genus Haplopteris (vittarioids, Pteridaceae), with no genetically identical sporophyte species in East Asia. For one species, divergence times between its populations imply recent overseas dispersal(s) by spores during the Pleistocene. Examination of their ex situ and in situ fertility revealed prezygotic sterility in these two Haplopteris, in which gametangia were not or only very seldom observed; this prezygotic sterility may account for their lack of functional sporophytes. Our field observations and habitat surveys suggest that microhabitat conditions might contribute to this prezygotic sterility. These findings raise the question of whether recent climate change during the Pleistocene glaciations resulted in ecophysiological maladaptation of non-temperate independent gametophyte ferns. In addition, we provide a new definition for classifying fern gametophyte independence at the population level. We expect that continued investigations into tropical and subtropical fern gametophyte floras will reveal further such cases.

  14. Agent independent task planning

    NASA Technical Reports Server (NTRS)

    Davis, William S.

    1990-01-01

    Agent-Independent Planning is a technique that allows the construction of activity plans without regard to the agent that will perform them. Once generated, a plan is then validated and translated into instructions for a particular agent, whether a robot, crewmember, or software-based control system. Because Space Station Freedom (SSF) is planned for orbital operations for approximately thirty years, it will almost certainly experience numerous enhancements and upgrades, including upgrades in robotic manipulators. Agent-Independent Planning provides the capability to construct plans for SSF operations, independent of specific robotic systems, by combining techniques of object oriented modeling, nonlinear planning and temporal logic. Since a plan is validated using the physical and functional models of a particular agent, new robotic systems can be developed and integrated with existing operations in a robust manner. This technique also provides the capability to generate plans for crewmembers with varying skill levels, and later apply these same plans to more sophisticated robotic manipulators made available by evolutions in technology.

  15. International exploration by independents

    SciTech Connect

    Bertagne, R.G.

    1992-04-01

    Recent industry trends indicate that the smaller U.S. independents are looking at foreign exploration opportunities as one of the alternatives for growth in the new age of exploration. Foreign finding costs per barrel usually are accepted to be substantially lower than domestic costs because of the large reserve potential of international plays. Getting involved in overseas exploration, however, requires the explorationist to adapt to different cultural, financial, legal, operational, and political conditions. Generally, foreign exploration proceeds at a slower pace than domestic exploration because concessions are granted by a country's government, or are explored in partnership with a national oil company. First, the explorationist must prepare a mid- to long-term strategy, tailored to the goals and the financial capabilities of the company; next comes an ongoing evaluation of quality prospects in various sedimentary basins, and careful planning and conduct of the operations. To successfully explore overseas also requires the presence of a minimum number of explorationists and engineers thoroughly familiar with the various exploratory and operational aspects of foreign work. Ideally, these team members will have had a considerable amount of on-site experience in various countries and climates. Independents best suited for foreign expansion are those who have been financially successful in domestic exploration. When properly approached, foreign exploration is well within the reach of smaller U.S. independents, and presents essentially no greater risk than domestic exploration; however, the reward can be much larger and can catapult the company into the 'big leagues.'

  16. International exploration by independents

    SciTech Connect

    Bertagne, R.G.

    1991-03-01

    Recent industry trends indicate that the smaller US independents are looking at foreign exploration opportunities as one of the alternatives for growth in the new age of exploration. It is usually accepted that foreign finding costs per barrel are substantially lower than domestic because of the large reserve potential of international plays. To get involved overseas requires, however, an adaptation to different cultural, financial, legal, operational, and political conditions. Generally, foreign exploration proceeds at a slower pace than domestic because concessions are granted by the government, or are explored in partnership with the national oil company. First, a mid- to long-term strategy, tailored to the goals and the financial capabilities of the company, must be prepared; it must be followed by an ongoing evaluation of quality prospects in various sedimentary basins, and a careful planning and conduct of the operations. To successfully explore overseas also requires the presence on the team of a minimum number of explorationists and engineers thoroughly familiar with the various exploratory and operational aspects of foreign work, having had a considerable amount of onsite experience in various geographical and climatic environments. Independents that are best suited for foreign expansion are those that have been financially successful domestically, and have a good discovery track record. When properly approached, foreign exploration is well within the reach of smaller US independents and presents essentially no greater risk than domestic exploration; the reward, however, can be much larger and can catapult the company into the big leagues.

  17. Why Additional Presentations Help Identify a Stimulus

    ERIC Educational Resources Information Center

    Guest, Duncan; Kent, Christopher; Adelman, James S.

    2010-01-01

    Nosofsky (1983) reported that additional stimulus presentations within a trial increase discriminability in absolute identification, suggesting that each presentation creates an independent stimulus representation, but it remains unclear whether exposure duration or the formation of independent representations improves discrimination in such…

  18. obs4MIPS: Satellite Datasets for Model Evaluation

    NASA Astrophysics Data System (ADS)

    Ferraro, R.; Waliser, D. E.; Gleckler, P. J.

    2013-12-01

    This poster will review the current status of the obs4MIPs project, whose purpose is to provide a limited collection of well-established and documented datasets for comparison with Earth system models. These datasets have been reformatted to correspond with the CMIP5 model output requirements, and include technical documentation specifically targeted for their use in model output evaluation. There are currently over 50 datasets containing observations that directly correspond to CMIP5 model output variables. We will review the rationale and requirements for obs4MIPs contributions, and provide summary information on the current obs4MIPs holdings on the Earth System Grid Federation. We will also provide some usage statistics, an update on governance for the obs4MIPs project, and plans for supporting CMIP6.

  19. The Global Precipitation Climatology Project (GPCP) Combined Precipitation Dataset

    NASA Technical Reports Server (NTRS)

    Huffman, George J.; Adler, Robert F.; Arkin, Philip; Chang, Alfred; Ferraro, Ralph; Gruber, Arnold; Janowiak, John; McNab, Alan; Rudolf, Bruno; Schneider, Udo

    1997-01-01

    The Global Precipitation Climatology Project (GPCP) has released the GPCP Version 1 Combined Precipitation Data Set, a global, monthly precipitation dataset covering the period July 1987 through December 1995. The primary product in the dataset is a merged analysis incorporating precipitation estimates from low-orbit-satellite microwave data, geosynchronous-orbit-satellite infrared data, and rain gauge observations. The dataset also contains the individual input fields, a combination of the microwave and infrared satellite estimates, and error estimates for each field. The data are provided on 2.5 deg x 2.5 deg latitude-longitude global grids. Preliminary analyses show general agreement with prior studies of global precipitation and extend prior studies of El Nino-Southern Oscillation precipitation patterns. At the regional scale there are systematic differences from standard climatologies.

  20. ESTATE: Strategy for Exploring Labeled Spatial Datasets Using Association Analysis

    NASA Astrophysics Data System (ADS)

    Stepinski, Tomasz F.; Salazar, Josue; Ding, Wei; White, Denis

    We propose an association analysis-based strategy for the exploration of multi-attribute spatial datasets possessing a naturally arising classification. The proposed strategy, ESTATE (Exploring Spatial daTa Association patTErns), inverts such a classification by interpreting the different classes found in the dataset in terms of sets of discriminative patterns of its attributes. It consists of several core steps, including discriminative data mining, computation of similarity between transactional patterns, and visualization. An algorithm for calculating the similarity measure between patterns is the major original contribution; it facilitates summarization of the discovered information and makes the entire framework practical for real-life applications. A detailed description of the ESTATE framework is followed by its application to the domain of ecology, using a dataset that fuses information on the geographical distribution of biodiversity of bird species across the contiguous United States with the distributions of 32 environmental variables across the same area.

  1. Identifying Differentially Abundant Metabolic Pathways in Metagenomic Datasets

    NASA Astrophysics Data System (ADS)

    Liu, Bo; Pop, Mihai

    Enabled by rapid advances in sequencing technology, metagenomic studies aim to characterize entire communities of microbes, bypassing the need for culturing individual bacterial members. One major goal of such studies is to identify specific functional adaptations of microbial communities to their habitats. Here we describe a powerful analytical method (MetaPath) that can identify differentially abundant pathways in metagenomic datasets, relying on a combination of metagenomic sequence data and prior metabolic pathway knowledge. We show that MetaPath outperforms other common approaches when evaluated on simulated datasets. We also demonstrate the power of our method in analyzing two publicly available metagenomic datasets: a comparison of the gut microbiome of obese and lean twins, and a comparison of the gut microbiome of infant and adult subjects. We demonstrate that the subpathways identified by our method provide valuable insights into the biological activities of the microbiome.

  2. Dataset for distribution of SIDER2 elements in the Leishmania major genome and transcriptome.

    PubMed

    Requena, Jose M; Rastrojo, Alberto; Garde, Esther; López, Manuel C; Thomas, M Carmen; Aguado, Begoña

    2017-04-01

    This paper contains data related to the research article entitled "Genomic cartography and proposal of nomenclature for the repeated, interspersed elements of the Leishmania major SIDER2 family and identification of SIDER2-containing transcripts" [1]. SIDER2 elements are repeated sequences, derived from now-extinct retrotransposons, that populate the genomes of protists of the genus Leishmania. This dataset (Supplementary file 1), an inventory of 1100 SIDER2 elements, was generated by surveying the L. major complete genome using bioinformatics tools with further manual refinements. In addition to the genomic distribution of these elements (summarized in Fig. 1), this dataset contains information regarding their association with specific transcripts, based on the recently established transcriptome for L. major [2].

  3. The Schema.org Datasets Schema: Experiences at the National Snow and Ice Data Center

    NASA Astrophysics Data System (ADS)

    Duerr, R.; Billingsley, B. W.; Harper, D.; Kovarik, J.

    2014-12-01

    Data discovery is still a major challenge for many users. Relevant data may be located anywhere, and there are currently no universal data registries. Often users start with a simple query through their web browser. But how do you get your data to actually show up near the top of the results? One relatively new way to accomplish this is to use schema.org dataset markup in your data pages. In theory, this provides web crawlers the additional information needed so that a query for data will preferentially return the pages that were marked up accordingly. The National Snow and Ice Data Center recently implemented an initial set of markup in the dataset pages returned by its catalog. The Datasets data model, our process, the challenges encountered, and the results will be described.
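
    For readers unfamiliar with this kind of markup, the sketch below emits a schema.org Dataset description as a JSON-LD script tag of the sort a crawler would pick up from a data page. The field values are generic examples, not NSIDC's actual records.

    ```python
    # Illustrative schema.org Dataset markup, emitted as a JSON-LD <script>
    # tag. Field values are generic examples, not NSIDC's actual records.
    import json

    dataset = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": "Example sea ice extent dataset",
        "description": "Monthly sea ice extent grids (illustrative entry only).",
        "url": "https://example.org/data/sea-ice-extent",
        "keywords": ["sea ice", "cryosphere"],
        "temporalCoverage": "1979-01/2014-12",
        "spatialCoverage": {
            "@type": "Place",
            "geo": {"@type": "GeoShape", "box": "60 -180 90 180"},
        },
    }

    markup = ('<script type="application/ld+json">\n'
              + json.dumps(dataset, indent=2)
              + "\n</script>")
    print(markup)
    ```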

  4. IsomiR expression patterns in canonical and Dicer-independent microRNAs

    PubMed Central

    Liang, Tingming; Yu, Jiafeng; Liu, Chang; Guo, Li

    2017-01-01

    Multiple microRNA (miRNA) variants, known as isomiRs, are extensively distributed in miRNA loci and predominantly derive from alternative cleavage by Drosha/Dicer and from 3′ addition events. The present study aimed to investigate the expression patterns of multiple isomiRs in typical miRNA and Dicer-independent miRNA loci by conducting evolutionary and expression analyses using public datasets. Although different miRNA maturation processes exist, multiple isomiRs can be detected with similar expression distributions. However, isomiR expression in Dicer-independent miRNA loci tends to be at a moderate level, particularly with respect to the random distribution at the ends that are cleaved by Dicer in typical miRNA loci. Compared with the mature miRNA locus (dominant miRNA locus), the non-dominant miRNA locus shows an expression distribution similar to that of the Dicer-independent miRNA locus. These results improve the understanding of multiple isomiRs in the progression of diseases. PMID:28098889

  5. Publishing datasets with eSciDoc and panMetaDocs

    NASA Astrophysics Data System (ADS)

    Ulbricht, D.; Klump, J.; Bertelmann, R.

    2012-04-01

    This contribution describes the publication of scientific datasets as electronic data supplements to research papers. Publication of research manuscripts already has a well-established workflow that shares junctures with other processes and involves several parties in the process of dataset publication. The activities of the author, the reviewer, the print publisher and the data publisher have to be coordinated into a common data publication workflow. Data publication at GFZ Potsdam has some specifics, e.g. the DOIDB webservice. The DOIDB is a proxy service at GFZ for DataCite [4] DOI registration and its metadata store. DOIDB provides a local summary of the dataset DOIs registered through GFZ as a publication agent. An additional use case for the DOIDB is its ability to enrich the DataCite metadata with additional custom attributes, like a geographic reference in a DIF record. These attributes are not currently available in the DataCite metadata schema but would be valuable elements for the compilation of data catalogues in the earth sciences and for dissemination of catalogue data via OAI-PMH. [1] http://www.escidoc.org , eSciDoc, FIZ Karlsruhe, Germany [2] http://panmetadocs.sf.net , panMetaDocs, GFZ Potsdam, Germany [3] http://metaworks.pangaea.de , panMetaWorks, Dr. R. Huber, MARUM, Univ. Bremen, Germany [4] http://www.datacite.org

  6. Extension of research data repository system to support direct compute access to biomedical datasets: enhancing Dataverse to support large datasets.

    PubMed

    McKinney, Bill; Meyer, Peter A; Crosas, Mercè; Sliz, Piotr

    2017-01-01

    Access to experimental X-ray diffraction image data is important for validation and reproduction of macromolecular models and indispensable for the development of structural biology processing methods. In response to the evolving needs of the structural biology community, we recently established a diffraction data publication system, the Structural Biology Data Grid (SBDG, data.sbgrid.org), to preserve primary experimental datasets supporting scientific publications. All datasets published through the SBDG are freely available to the research community under a public domain dedication license, with metadata compliant with the DataCite Schema (schema.datacite.org). A proof-of-concept study demonstrated community interest and utility. Publication of large datasets is a challenge shared by several fields, and the SBDG has begun collaborating with the Institute for Quantitative Social Science at Harvard University to extend the Dataverse (dataverse.org) open-source data repository system to structural biology datasets. Several extensions are necessary to support the size and metadata requirements for structural biology datasets. In this paper, we describe one such extension: functionality supporting preservation of file system structure within Dataverse, which is essential both for in-place computation and for supporting non-HTTP data transfers.

  7. Climate Model Datasets on Earth System Grid II (ESG II)

    DOE Data Explorer

    Earth System Grid (ESG) is a project that combines the power and capacity of supercomputers, sophisticated analysis servers, and datasets on the scale of petabytes. The goal is to provide a seamless distributed environment that allows scientists in many locations to work with large-scale data, perform climate change modeling and simulation, and share results in innovative ways. Though ESG is more about the computing environment than the data, there are several catalogs of data available at the web site that can be browsed or searched. Most of the datasets are restricted to registered users, but several are open to any access.

  8. BigNeuron dataset V.0.0

    DOE Data Explorer

    Ramanathan, Arvind

    2016-01-01

    The cleaned bench-testing reconstructions for the gold166 datasets have been put online at GitHub: https://github.com/BigNeuron/Events-and-News/wiki/BigNeuron-Events-and-News and https://github.com/BigNeuron/Data/releases/tag/gold166_bt_v1.0. The respective image datasets were released earlier from other sites (the main pointer is also available at GitHub: https://github.com/BigNeuron/Data/releases/tag/Gold166_v1), but since the files were large, the actual downloads were distributed across three continents.

  9. An evaluation of the global 1-km AVHRR land dataset

    USGS Publications Warehouse

    Teillet, P.M.; El Saleous, N.; Hansen, M.C.; Eidenshink, Jeffery C.; Justice, C.O.; Townshend, J.R.G.

    2000-01-01

    This paper summarizes the steps taken in the generation of the global 1-km AVHRR land dataset, and it documents an evaluation of the data product with respect to the original specifications and its usefulness in research and applications to date. The evaluation addresses data characterization, processing, compositing and handling issues. Examples of the main scientific outputs are presented and options for improved processing are outlined and prioritized. The dataset has made a significant contribution, and a strong recommendation is made for its reprocessing and continuation to produce a long-term record for global change research.

  10. The Wind Integration National Dataset (WIND) toolkit (Presentation)

    SciTech Connect

    Draxl, Caroline

    2014-01-01

    Regional wind integration studies require detailed wind power output data at many locations to perform simulations of how the power system will operate under high penetration scenarios. The wind datasets that serve as inputs into the study must realistically reflect the ramping characteristics, spatial and temporal correlations, and capacity factors of the simulated wind plants, as well as being time synchronized with available load profiles. As described in this presentation, the WIND Toolkit fulfills these requirements by providing a state-of-the-art national (US) wind resource, power production and forecast dataset.

  11. Using Multiple Big Datasets and Machine Learning to Produce a New Global Particulate Dataset: A Technology Challenge Case Study

    NASA Astrophysics Data System (ADS)

    Lary, D. J.

    2013-12-01

    A BigData case study is described where multiple datasets from several satellites, high-resolution global meteorological data, social media, and in-situ observations are combined using machine learning on a distributed cluster using an automated workflow. The global particulate dataset is relevant to global public health studies and would not be possible to produce without the use of the multiple big datasets, in-situ data, and machine learning. To greatly reduce development time and enhance functionality, a high-level language capable of parallel processing (Matlab) has been used. Key considerations for the system are high-speed access due to the large data volume, persistence of the large data volumes, and a precise process-time scheduling capability.

  12. In-depth evaluation of software tools for data-independent acquisition based label-free quantification.

    PubMed

    Kuharev, Jörg; Navarro, Pedro; Distler, Ute; Jahn, Olaf; Tenzer, Stefan

    2015-09-01

    Label-free quantification (LFQ) based on data-independent acquisition workflows is currently gaining popularity. Several software tools have been recently published or are commercially available. The present study focuses on the evaluation of three different software packages (Progenesis, synapter, and ISOQuant) supporting ion-mobility-enhanced data-independent acquisition data. In order to benchmark the LFQ performance of the different tools, we generated two hybrid proteome samples of defined quantitative composition containing tryptically digested proteomes of three different species (mouse, yeast, Escherichia coli). This model dataset simulates complex biological samples containing large numbers of both unregulated (background) proteins as well as up- and downregulated proteins with exactly known ratios between samples. We determined the number and dynamic range of quantifiable proteins and analyzed the influence of applied algorithms (retention time alignment, clustering, normalization, etc.) on quantification results. Analysis of technical reproducibility revealed median coefficients of variation of reported protein abundances below 5% for MS(E) data for Progenesis and ISOQuant. Regarding accuracy of LFQ, evaluation with synapter and ISOQuant yielded superior results compared to Progenesis. In addition, we discuss reporting formats and user-friendliness of the software packages. The data generated in this study have been deposited to the ProteomeXchange Consortium with identifier PXD001240 (http://proteomecentral.proteomexchange.org/dataset/PXD001240).
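
    As a reminder of how the reproducibility figure quoted above is computed, the sketch below derives per-protein coefficients of variation across technical replicates and reports the median; the abundance table is invented for illustration.

    ```python
    # Sketch of the reproducibility metric quoted above: per-protein coefficient
    # of variation (CV) across technical replicates, summarized as a median.
    # The abundance table below is invented for illustration.
    import numpy as np

    # rows = proteins, columns = technical replicate abundances
    abundances = np.array([
        [1.00e6, 1.03e6, 0.98e6],
        [2.50e5, 2.41e5, 2.55e5],
        [7.80e4, 8.10e4, 7.95e4],
    ])

    cv = abundances.std(axis=1, ddof=1) / abundances.mean(axis=1)
    print(f"median CV: {np.median(cv):.1%}")  # tools in the study reached <5%
    ```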

  13. Evaluation of a Moderate Resolution, Satellite-Based Impervious Surface Map Using an Independent, High-Resolution Validation Dataset

    EPA Science Inventory

    Given the relatively high cost of mapping impervious surfaces at regional scales, substantial effort is being expended in the development of moderate-resolution, satellite-based methods for estimating impervious surface area (ISA). To rigorously assess the accuracy of these data ...

  14. Comparison of automated and human assignment of MeSH terms on publicly-available molecular datasets.

    PubMed

    Ruau, David; Mbagwu, Michael; Dudley, Joel T; Krishnan, Vijay; Butte, Atul J

    2011-12-01

    Publicly available molecular datasets can be used for independent verification or investigative repurposing, but this depends on the presence, consistency and quality of descriptive annotations. Annotation and indexing of molecular datasets using well-defined controlled vocabularies or ontologies enables accurate and systematic data discovery, yet the majority of molecular datasets available through public data repositories lack such annotations. A number of automated annotation methods have been developed; however, few systematic evaluations of the quality of annotations supplied by application of these methods have been performed using annotations from standing public data repositories. Here, we compared manually assigned Medical Subject Heading (MeSH) annotations associated with experiments by data submitters in the PRoteomics IDEntification (PRIDE) proteomics data repository to automated MeSH annotations derived through the National Center for Biomedical Ontology Annotator and National Library of Medicine MetaMap programs. These programs were applied to free-text annotations for experiments in PRIDE. As many submitted datasets were referenced in publications, we used the manually curated MeSH annotations of those linked publications in MEDLINE as a "gold standard". Annotator and MetaMap exhibited recall performance 3-fold greater than that of the manual annotations. We connected PRIDE experiments in a network topology according to shared MeSH annotations and found 373 distinct clusters, many of which were found to be biologically coherent by network analysis. The results of this study suggest that both Annotator and MetaMap are capable of annotating public molecular datasets with a quality comparable to, and often exceeding, that of the actual data submitters, highlighting a continuous need to improve and apply automated methods to molecular datasets in public data repositories to maximize their value and utility.

  15. Cary Potter on Independent Education

    ERIC Educational Resources Information Center

    Potter, Cary

    1978-01-01

    Cary Potter was President of the National Association of Independent Schools from 1964-1978. As he leaves NAIS he gives his views on education, on independence, on the independent school, on public responsibility, on choice in a free society, on educational change, and on the need for collective action by independent schools. (Author/RK)

  16. Myth or Truth: Independence Day.

    ERIC Educational Resources Information Center

    Gardner, Traci

    Most Americans think of the Fourth of July as Independence Day, but is it really the day the U.S. declared and celebrated independence? By exploring myths and truths surrounding Independence Day, this lesson asks students to think critically about commonly believed stories regarding the beginning of the Revolutionary War and the Independence Day…

  17. Correcting OCR text by association with historical datasets

    NASA Astrophysics Data System (ADS)

    Hauser, Susan E.; Schlaifer, Jonathan; Sabir, Tehseen F.; Demner-Fushman, Dina; Straughan, Scott; Thoma, George R.

    2003-01-01

    The Medical Article Records System (MARS) developed by the Lister Hill National Center for Biomedical Communications uses scanning, OCR and automated recognition and reformatting algorithms to generate electronic bibliographic citation data from paper biomedical journal articles. The OCR server incorporated in MARS performs well in general, but fares less well with text printed in small or italic fonts. Affiliations are often printed in small italic fonts in the journals processed by MARS. Consequently, although the automatic processes generate much of the citation data correctly, the affiliation field frequently contains incorrect data, which must be manually corrected by verification operators. In contrast, author names are usually printed in large, normal fonts that are correctly converted to text by the OCR server. The National Library of Medicine's MEDLINE database contains 11 million indexed citations for biomedical journal articles. This paper documents our effort to use the historical author-affiliation relationships from this large dataset to find potential correct affiliations for MARS articles based on the author and the affiliation in the OCR output. Preliminary tests using a table of about 400,000 author/affiliation pairs extracted from the corrected data from MARS indicated that about 44% of the author/affiliation pairs were repeats and that about 47% of newly converted author names would be found in this set. A text-matching algorithm was developed to determine the likelihood that an affiliation found in the table corresponding to the OCR text of the first author was the current, correct affiliation. This matching algorithm compares an affiliation found in the author/affiliation table (found with the OCR text of the first author) to the OCR output affiliation, and calculates a score indicating the similarity of the affiliation found in the table to the OCR affiliation. Using a ground truth set of 519 OCR author/OCR affiliation/correct affiliation
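
    The matching step described above can be approximated in a few lines: look up the affiliations previously associated with the first author, score each against the OCR output, and accept the best match above a threshold. The lookup table, the scoring function (difflib's ratio), and the threshold below are simplified stand-ins for the MARS algorithm.

    ```python
    # Minimal sketch of the affiliation-correction idea: score historical
    # affiliations for the first author against the noisy OCR text. The table,
    # scoring function, and threshold are simplified stand-ins, not MARS itself.
    from difflib import SequenceMatcher

    # historical author -> affiliation table (toy entries, not MARS data)
    history = {
        "smith j": ["Dept. of Radiology, Example University, Boston, MA",
                    "Example General Hospital, Boston, MA"],
    }

    def best_affiliation(author, ocr_affiliation, threshold=0.6):
        """Return the historical affiliation most similar to the OCR text."""
        candidates = history.get(author.lower(), [])
        if not candidates:
            return None
        score, best = max(
            (SequenceMatcher(None, ocr_affiliation.lower(), c.lower()).ratio(), c)
            for c in candidates)
        return best if score >= threshold else None

    ocr_text = "Dept. of Radiologv, Example Umversity, Bostou, MA"  # noisy OCR
    print(best_affiliation("Smith J", ocr_text))
    ```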

  18. Dataset of manually measured QT intervals in the electrocardiogram

    PubMed Central

    Christov, Ivaylo; Dotsinsky, Ivan; Simova, Iana; Prokopova, Rada; Trendafilova, Elina; Naydenov, Stefan

    2006-01-01

    Background The QT interval and the QT dispersion are currently a subject of considerable interest. Cardiac repolarization delay is known to favor the development of arrhythmias. The QT dispersion, defined as the difference between the longest and the shortest QT intervals or as the standard deviation of the QT duration in the 12-lead ECG, is assumed to be a reliable predictor of cardiovascular mortality. The seventh annual PhysioNet/Computers in Cardiology Challenge (2006) addresses a question of high clinical interest: can the QT interval be measured by fully automated methods with accuracy acceptable for clinical evaluations? Method The PTB Diagnostic ECG Database was given to 4 cardiologists and 1 biomedical engineer for manual marking of QRS onsets and T-wave ends in 458 recordings. Each recording consisted of one selected beat in lead II, chosen visually to have minimum baseline shift, noise, and artifact. In cases where no T wave could be observed or its amplitude was very small, the referees were instructed to mark a 'group-T-wave end' taking into consideration leads with a better-manifested T wave. A modified Delphi approach was used, which included up to three rounds of measurements to obtain results closer to the median. Results A total of 2 × 5 × 548 Q-onsets and T-wave ends were manually marked during round 1. To bring results closer to the median, 8.58% of Q-onsets and 3.21% of T-wave ends had to be reviewed during round 2, and 1.50% of Q-onsets and 1.17% of T-wave ends in round 3. The mean and standard deviation of the differences between the values of the referees and the median after round 3 were 2.43 ± 0.96 ms for the Q-onset and 7.43 ± 3.44 ms for the T-wave end. Conclusion A fully accessible dataset of manually measured Q-onsets and T-wave ends was created, available on the Internet and presented in Additional file 1 (Table 4) with this article. Thus, an available standard can be used for the development of automated methods for the detection of Q-onsets and T-wave ends.
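
    The convergence step of the modified Delphi approach can be sketched simply: take the median of the referees' marks and flag any annotation deviating from it by more than a tolerance for re-measurement in the next round. The marks and tolerance below are invented for illustration.

    ```python
    # Sketch of the modified Delphi step described above: flag any referee whose
    # mark deviates from the median by more than a tolerance, so that it can be
    # re-measured in the next round. Marks (in ms) are invented for illustration.
    import statistics

    def flag_for_review(marks_ms, tolerance_ms=10.0):
        med = statistics.median(marks_ms)
        return med, [i for i, m in enumerate(marks_ms)
                     if abs(m - med) > tolerance_ms]

    t_end_marks = [412.0, 409.5, 431.0, 411.0, 410.5]  # five referees, one beat
    median, outliers = flag_for_review(t_end_marks)
    print(median, outliers)  # referee 2 (431.0 ms) would be asked to re-measure
    ```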

  19. Eastern Renewable Generation Integration Study Solar Dataset (Presentation)

    SciTech Connect

    Hummon, M.

    2014-04-01

    The National Renewable Energy Laboratory produced solar power production data for the Eastern Renewable Generation Integration Study (ERGIS) including "real time" 5-minute interval data, "four hour ahead forecast" 60-minute interval data, and "day-ahead forecast" 60-minute interval data for the year 2006. This presentation provides a brief overview of the three solar power datasets.

  20. A Dataset for Visual Navigation with Neuromorphic Methods

    PubMed Central

    Barranco, Francisco; Fermuller, Cornelia; Aloimonos, Yiannis; Delbruck, Tobi

    2016-01-01

    Standardized benchmarks in Computer Vision have greatly contributed to the advance of approaches to many problems in the field. If we want to enhance the visibility of event-driven vision and increase its impact, we will need benchmarks that allow comparison among different neuromorphic methods as well as comparison to Computer Vision conventional approaches. We present datasets to evaluate the accuracy of frame-free and frame-based approaches for tasks of visual navigation. Similar to conventional Computer Vision datasets, we provide synthetic and real scenes, with the synthetic data created with graphics packages, and the real data recorded using a mobile robotic platform carrying a dynamic and active pixel vision sensor (DAVIS) and an RGB+Depth sensor. For both datasets the cameras move with a rigid motion in a static scene, and the data includes the images, events, optic flow, 3D camera motion, and the depth of the scene, along with calibration procedures. Finally, we also provide simulated event data generated synthetically from well-known frame-based optical flow datasets. PMID:26941595

  1. A Dataset for Visual Navigation with Neuromorphic Methods.

    PubMed

    Barranco, Francisco; Fermuller, Cornelia; Aloimonos, Yiannis; Delbruck, Tobi

    2016-01-01

    Standardized benchmarks in Computer Vision have greatly contributed to the advance of approaches to many problems in the field. If we want to enhance the visibility of event-driven vision and increase its impact, we will need benchmarks that allow comparison among different neuromorphic methods as well as comparison to Computer Vision conventional approaches. We present datasets to evaluate the accuracy of frame-free and frame-based approaches for tasks of visual navigation. Similar to conventional Computer Vision datasets, we provide synthetic and real scenes, with the synthetic data created with graphics packages, and the real data recorded using a mobile robotic platform carrying a dynamic and active pixel vision sensor (DAVIS) and an RGB+Depth sensor. For both datasets the cameras move with a rigid motion in a static scene, and the data includes the images, events, optic flow, 3D camera motion, and the depth of the scene, along with calibration procedures. Finally, we also provide simulated event data generated synthetically from well-known frame-based optical flow datasets.

  2. The NASA Subsonic Jet Particle Image Velocimetry (PIV) Dataset

    NASA Technical Reports Server (NTRS)

    Bridges, James; Wernet, Mark P.

    2011-01-01

    Many tasks in fluids engineering require prediction of turbulence in jet flows. This report documents the single-point velocity statistics (mean and variance) of cold and hot jet flows. The jet velocities ranged from 0.5 to 1.4 times the ambient speed of sound, and temperatures ranged from unheated to a static temperature ratio of 2.7. Further, the report assesses the accuracy of the data, i.e., establishes uncertainties for the data. This paper covers the following five tasks: (1) document the acquisition and processing procedures used to create the particle image velocimetry (PIV) datasets; (2) compare PIV data with hotwire and laser Doppler velocimetry (LDV) data published in the open literature; (3) compare different datasets acquired at the same flow conditions in multiple tests to establish uncertainties; (4) create a consensus dataset for a range of hot jet flows, including uncertainty bands; and (5) analyze this consensus dataset for self-consistency and compare jet characteristics to those in the open literature. The final objective was fulfilled by using the potential core length and the spread rate of the half-velocity radius to collapse the mean and turbulent velocity fields over the first 20 jet diameters.

  3. The Nashua agronomic, water quality, and economic dataset

    Technology Transfer Automated Retrieval System (TEKTRAN)

    This paper describes a dataset on 36 0.4 hectare tile-drained plots relating management to nitrogen (N) loading and crop yields from 1990-2003 on the Northeast Research and Demonstration Farm near Nashua, Iowa. The measured data were analyzed with the Root Zone Water Quality Model (RZWQM) and summa...

  4. Automated single particle detection and tracking for large microscopy datasets

    PubMed Central

    Wilson, Rhodri S.; Yang, Lei; Dun, Alison; Smyth, Annya M.; Duncan, Rory R.; Rickman, Colin

    2016-01-01

    Recent advances in optical microscopy have enabled the acquisition of very large datasets from living cells with unprecedented spatial and temporal resolutions. Our ability to process these datasets now plays an essential role in order to understand many biological processes. In this paper, we present an automated particle detection algorithm capable of operating in low signal-to-noise fluorescence microscopy environments and handling large datasets. When combined with our particle linking framework, it can provide hitherto intractable quantitative measurements describing the dynamics of large cohorts of cellular components from organelles to single molecules. We begin with validating the performance of our method on synthetic image data, and then extend the validation to include experimental images with ground truth. Finally, we apply the algorithm to two single-particle-tracking photo-activated localization microscopy biological datasets, acquired from living primary cells at very high temporal rates. Our analysis of the dynamics of very large cohorts of tens of thousands of membrane-associated protein molecules shows that they behave as if caged in nanodomains. We show that the robustness and efficiency of our method provides a tool for the examination of single-molecule behaviour with unprecedented spatial detail and high acquisition rates. PMID:27293801
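
    As a minimal sketch of the frame-to-frame linking step (the published framework is considerably more sophisticated), one can greedily match each detection to its nearest unclaimed detection in the next frame, subject to a maximum displacement:

    ```python
    # Minimal frame-to-frame linking sketch in the spirit of the framework
    # described above: greedily match each detection in frame t to its nearest
    # unclaimed detection in frame t+1, subject to a maximum displacement.
    import numpy as np

    def link_frames(pts_a, pts_b, max_disp=5.0):
        """Greedy nearest-neighbour links (i, j) from pts_a[i] to pts_b[j]."""
        d = np.linalg.norm(pts_a[:, None, :] - pts_b[None, :, :], axis=2)
        links, taken = [], set()
        for i in np.argsort(d.min(axis=1)):      # most confident detections first
            for j in np.argsort(d[i]):
                if d[i, j] > max_disp:
                    break                        # remaining candidates are farther
                if int(j) not in taken:
                    links.append((int(i), int(j)))
                    taken.add(int(j))
                    break
        return links

    frame1 = np.array([[0.0, 0.0], [10.0, 10.0]])
    frame2 = np.array([[0.8, 0.4], [10.5, 9.7], [30.0, 30.0]])
    print(link_frames(frame1, frame2))  # [(1, 1), (0, 0)]: both particles linked
    ```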

  5. Fitting Meta-Analytic Structural Equation Models with Complex Datasets

    ERIC Educational Resources Information Center

    Wilson, Sandra Jo; Polanin, Joshua R.; Lipsey, Mark W.

    2016-01-01

    A modification of the first stage of the standard procedure for two-stage meta-analytic structural equation modeling for use with large complex datasets is presented. This modification addresses two common problems that arise in such meta-analyses: (a) primary studies that provide multiple measures of the same construct and (b) the correlation…

  6. UK surveillance: provision of quality assured information from combined datasets.

    PubMed

    Paiba, G A; Roberts, S R; Houston, C W; Williams, E C; Smith, L H; Gibbens, J C; Holdship, S; Lysons, R

    2007-09-14

    Surveillance information is most useful when provided within a risk framework, which is achieved by presenting results against an appropriate denominator. Often the datasets are captured separately and for different purposes, and will have inherent errors and biases that can be further confounded by the act of merging. The United Kingdom Rapid Analysis and Detection of Animal-related Risks (RADAR) system contains data from several sources and provides both data extracts for research purposes and reports for wider stakeholders. Considerable efforts are made to optimise the data in RADAR during the Extraction, Transformation and Loading (ETL) process. Despite efforts to ensure data quality, the final dataset inevitably contains some data errors and biases, most of which cannot be rectified during subsequent analysis. So, in order for users to establish the 'fitness for purpose' of data merged from more than one data source, Quality Statements are produced as defined within the overarching surveillance Quality Framework. These documents detail identified data errors and biases following ETL and report construction as well as relevant aspects of the datasets from which the data originated. This paper illustrates these issues using RADAR datasets, and describes how they can be minimised.

  7. A global experimental dataset for assessing grain legume production

    NASA Astrophysics Data System (ADS)

    Cernay, Charles; Pelzer, Elise; Makowski, David

    2016-09-01

    Grain legume crops are a significant component of the human diet and animal feed and have an important role in the environment, but the global diversity of agricultural legume species is currently underexploited. Experimental assessments of grain legume performances are required to identify potential species with high yields. Here, we introduce a dataset including results of field experiments published in 173 articles. The selected experiments were carried out over five continents on 39 grain legume species. The dataset includes measurements of grain yield, aerial biomass, crop nitrogen content, residual soil nitrogen content and water use. When available, yields for cereals and oilseeds grown after grain legumes in the crop sequence are also included. The dataset is arranged into a relational database with nine structured tables and 198 standardized attributes. Tillage, fertilization, pest and irrigation management are systematically recorded for each of the 8,581 crop × field site × growing season × treatment combinations. The dataset is freely reusable and easy to update. We anticipate that it will provide valuable information for assessing grain legume production worldwide.

  8. Finding the Maine Story in Huge, Cumbersome National Monitoring Datasets

    EPA Science Inventory

    What’s a manager, analyst, or concerned citizen to do with the complex datasets generated by State and Federal monitoring efforts? Is it possible to use such information to address Maine’s environmental issues without having a degree in informatics and statistics? This presentati...

  9. A daily global mesoscale ocean eddy dataset from satellite altimetry

    PubMed Central

    Faghmous, James H.; Frenger, Ivy; Yao, Yuanshun; Warmka, Robert; Lindell, Aron; Kumar, Vipin

    2015-01-01

    Mesoscale ocean eddies are ubiquitous coherent rotating structures of water with radial scales on the order of 100 kilometers. Eddies play a key role in the transport and mixing of momentum and tracers across the World Ocean. We present a global daily mesoscale ocean eddy dataset that contains ~45 million mesoscale features and 3.3 million eddy trajectories that persist at least two days, as identified in the AVISO dataset over the period 1993–2014. This dataset, along with the open-source eddy identification software, allows researchers to extract eddies with any parameters (minimum size, lifetime, etc.), to study global eddy properties and dynamics, and to empirically estimate the impact eddies have on mass or heat transport. Furthermore, our open-source software may be used to identify mesoscale features in model simulations and compare them to observed features. Finally, this dataset can be used to study the interaction between mesoscale ocean eddies and other components of the Earth System. PMID:26097744
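
    As an example of the kind of parameterized extraction the authors describe, the sketch below keeps only trajectories that persist for at least a given number of days. The column names ("track_id", "date") are hypothetical, not the dataset's actual schema.

    ```python
    # Sketch of a parameterized extraction over an eddy-trajectory table:
    # keep only tracks spanning at least `min_days`. The column names here
    # ("track_id", "date") are hypothetical, not the dataset's actual schema.
    import pandas as pd

    def long_lived_tracks(df, min_days=30):
        """Keep only eddy trajectories spanning at least `min_days` days."""
        lifetimes = df.groupby("track_id")["date"].agg(
            lambda s: (s.max() - s.min()).days + 1)
        keep = lifetimes[lifetimes >= min_days].index
        return df[df["track_id"].isin(keep)]

    eddies = pd.DataFrame({
        "track_id": [1, 1, 2, 2],
        "date": pd.to_datetime(["1993-01-01", "1993-03-15",
                                "1993-01-01", "1993-01-03"]),
    })
    print(long_lived_tracks(eddies)["track_id"].unique())  # [1]
    ```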

  10. A dataset of human decision-making in teamwork management

    PubMed Central

    Yu, Han; Shen, Zhiqi; Miao, Chunyan; Leung, Cyril; Chen, Yiqiang; Fauvel, Simon; Lin, Jun; Cui, Lizhen; Pan, Zhengxiang; Yang, Qiang

    2017-01-01

    Today, most endeavours require teamwork by people with diverse skills and characteristics. In managing teamwork, decisions are often made under uncertainty and resource constraints. The strategies and the effectiveness of the strategies different people adopt to manage teamwork under different situations have not yet been fully explored, partially due to a lack of detailed large-scale data. In this paper, we describe a multi-faceted large-scale dataset to bridge this gap. It is derived from a game simulating complex project management processes. It presents the participants with different conditions in terms of team members’ capabilities and task characteristics for them to exhibit their decision-making strategies. The dataset contains detailed data reflecting the decision situations, decision strategies, decision outcomes, and the emotional responses of 1,144 participants from diverse backgrounds. To our knowledge, this is the first dataset simultaneously covering these four facets of decision-making. With repeated measurements, the dataset may help establish baseline variability of decision-making in teamwork management, leading to more realistic decision theoretic models and more effective decision support approaches. PMID:28094787

  11. A global experimental dataset for assessing grain legume production

    PubMed Central

    Cernay, Charles; Pelzer, Elise; Makowski, David

    2016-01-01

    Grain legume crops are a significant component of the human diet and animal feed and have an important role in the environment, but the global diversity of agricultural legume species is currently underexploited. Experimental assessments of grain legume performances are required to identify potential species with high yields. Here, we introduce a dataset including results of field experiments published in 173 articles. The selected experiments were carried out over five continents on 39 grain legume species. The dataset includes measurements of grain yield, aerial biomass, crop nitrogen content, residual soil nitrogen content and water use. When available, yields for cereals and oilseeds grown after grain legumes in the crop sequence are also included. The dataset is arranged into a relational database with nine structured tables and 198 standardized attributes. Tillage, fertilization, pest and irrigation management are systematically recorded for each of the 8,581 crop × field site × growing season × treatment combinations. The dataset is freely reusable and easy to update. We anticipate that it will provide valuable information for assessing grain legume production worldwide. PMID:27676125

  12. A global experimental dataset for assessing grain legume production.

    PubMed

    Cernay, Charles; Pelzer, Elise; Makowski, David

    2016-09-27

    Grain legume crops are a significant component of the human diet and animal feed and have an important role in the environment, but the global diversity of agricultural legume species is currently underexploited. Experimental assessments of grain legume performances are required to identify potential species with high yields. Here, we introduce a dataset including results of field experiments published in 173 articles. The selected experiments were carried out over five continents on 39 grain legume species. The dataset includes measurements of grain yield, aerial biomass, crop nitrogen content, residual soil nitrogen content and water use. When available, yields for cereals and oilseeds grown after grain legumes in the crop sequence are also included. The dataset is arranged into a relational database with nine structured tables and 198 standardized attributes. Tillage, fertilization, pest and irrigation management are systematically recorded for each of the 8,581 crop × field site × growing season × treatment combinations. The dataset is freely reusable and easy to update. We anticipate that it will provide valuable information for assessing grain legume production worldwide.

  13. A dataset of human decision-making in teamwork management

    NASA Astrophysics Data System (ADS)

    Yu, Han; Shen, Zhiqi; Miao, Chunyan; Leung, Cyril; Chen, Yiqiang; Fauvel, Simon; Lin, Jun; Cui, Lizhen; Pan, Zhengxiang; Yang, Qiang

    2017-01-01

    Today, most endeavours require teamwork by people with diverse skills and characteristics. In managing teamwork, decisions are often made under uncertainty and resource constraints. The strategies and the effectiveness of the strategies different people adopt to manage teamwork under different situations have not yet been fully explored, partially due to a lack of detailed large-scale data. In this paper, we describe a multi-faceted large-scale dataset to bridge this gap. It is derived from a game simulating complex project management processes. It presents the participants with different conditions in terms of team members' capabilities and task characteristics for them to exhibit their decision-making strategies. The dataset contains detailed data reflecting the decision situations, decision strategies, decision outcomes, and the emotional responses of 1,144 participants from diverse backgrounds. To our knowledge, this is the first dataset simultaneously covering these four facets of decision-making. With repeated measurements, the dataset may help establish baseline variability of decision-making in teamwork management, leading to more realistic decision theoretic models and more effective decision support approaches.

  14. Filtergraph: An interactive web application for visualization of astronomy datasets

    NASA Astrophysics Data System (ADS)

    Burger, Dan; Stassun, Keivan G.; Pepper, Joshua; Siverd, Robert J.; Paegert, Martin; De Lee, Nathan M.; Robinson, William H.

    2013-08-01

    Filtergraph is a web application being developed and maintained by the Vanderbilt Initiative in Data-intensive Astrophysics (VIDA) to flexibly and rapidly visualize a large variety of astronomy datasets of various formats and sizes. The user loads a flat-file dataset into Filtergraph which automatically generates an interactive data portal that can be easily shared with others. From this portal, the user can immediately generate scatter plots of up to five dimensions as well as histograms and tables based on the dataset. Key features of the portal include intuitive controls with auto-completed variable names, the ability to filter the data in real time through user-specified criteria, the ability to select data by dragging on the screen, and the ability to perform arithmetic operations on the data in real time. To enable seamless data visualization and exploration, changes are quickly rendered on screen and visualizations can be exported as high quality graphics files. The application is optimized for speed in the context of large datasets: for instance, a plot generated from a stellar database of 3.1 million entries renders in less than 2 s on a standard web server platform. This web application has been created using the Web2py web framework based on the Python programming language. Filtergraph is free to use at http://filtergraph.vanderbilt.edu/.

  15. Accounting For Uncertainty in The Application Of High Throughput Datasets

    EPA Science Inventory

    The use of high throughput screening (HTS) datasets will need to adequately account for uncertainties in the data generation process and propagate these uncertainties through to ultimate use. Uncertainty arises at multiple levels in the construction of predictors using in vitro ...

  16. Oregon Cascades Play Fairway Analysis: Raster Datasets and Models

    SciTech Connect

    Adam Brandt

    2015-11-15

    This submission includes maps of the spatial distribution of basaltic and felsic rocks in the Oregon Cascades. It also includes a final Play Fairway Analysis (PFA) model, with the heat and permeability composite risk segments (CRS) supplied separately. Metadata for each raster dataset can be found within the zip files, in the TIF images.

  17. A Dataset for Breast Cancer Histopathological Image Classification.

    PubMed

    Spanhol, Fabio A; Oliveira, Luiz S; Petitjean, Caroline; Heutte, Laurent

    2016-07-01

    Today, medical image analysis papers require solid experiments to prove the usefulness of proposed methods. However, experiments are often performed on data selected by the researchers, which may come from different institutions, scanners, and populations. Different evaluation measures may be used, making it difficult to compare the methods. In this paper, we introduce a dataset of 7909 breast cancer histopathology images acquired from 82 patients, which is now publicly available from http://web.inf.ufpr.br/vri/breast-cancer-database. The dataset includes both benign and malignant images. The task associated with this dataset is the automated classification of these images into two classes, which would be a valuable computer-aided diagnosis tool for the clinician. In order to assess the difficulty of this task, we show some preliminary results obtained with state-of-the-art image classification systems. The accuracy ranges from 80% to 85%, showing that there is room for improvement. By providing this dataset and a standardized evaluation protocol to the scientific community, we hope to gather researchers in both the medical and the machine learning fields to advance toward this clinical application.

  18. A dataset of human decision-making in teamwork management.

    PubMed

    Yu, Han; Shen, Zhiqi; Miao, Chunyan; Leung, Cyril; Chen, Yiqiang; Fauvel, Simon; Lin, Jun; Cui, Lizhen; Pan, Zhengxiang; Yang, Qiang

    2017-01-17

    Today, most endeavours require teamwork by people with diverse skills and characteristics. In managing teamwork, decisions are often made under uncertainty and resource constraints. The strategies and the effectiveness of the strategies different people adopt to manage teamwork under different situations have not yet been fully explored, partially due to a lack of detailed large-scale data. In this paper, we describe a multi-faceted large-scale dataset to bridge this gap. It is derived from a game simulating complex project management processes. It presents the participants with different conditions in terms of team members' capabilities and task characteristics for them to exhibit their decision-making strategies. The dataset contains detailed data reflecting the decision situations, decision strategies, decision outcomes, and the emotional responses of 1,144 participants from diverse backgrounds. To our knowledge, this is the first dataset simultaneously covering these four facets of decision-making. With repeated measurements, the dataset may help establish baseline variability of decision-making in teamwork management, leading to more realistic decision theoretic models and more effective decision support approaches.

  19. Comparison and validation of gridded precipitation datasets for Spain

    NASA Astrophysics Data System (ADS)

    Quintana-Seguí, Pere; Turco, Marco; Míguez-Macho, Gonzalo

    2016-04-01

    In this study, two gridded precipitation datasets are compared and validated in Spain: the recently developed SAFRAN dataset and the Spain02 dataset. These are validated using rain gauges and they are also compared to the low-resolution ERA-Interim reanalysis. The SAFRAN precipitation dataset has been recently produced, using the SAFRAN meteorological analysis, which is extensively used in France (Durand et al. 1993, 1999; Quintana-Seguí et al. 2008; Vidal et al., 2010) and which has recently been applied to Spain (Quintana-Seguí et al., 2015). SAFRAN uses an optimal interpolation (OI) algorithm and uses all available rain gauges from the Spanish State Meteorological Agency (Agencia Estatal de Meteorología, AEMET). The product has a spatial resolution of 5 km and it spans from September 1979 to August 2014. This dataset has been produced mainly to be used in large-scale hydrological applications. Spain02 (Herrera et al. 2012, 2015) is another high-quality precipitation dataset for Spain based on a dense network of quality-controlled stations and it has different versions at different resolutions. In this study we used the version with a resolution of 0.11°. The product spans from 1971 to 2010. Spain02 is well tested and widely used, mainly, but not exclusively, for RCM model validation and statistical downscaling. ERA-Interim is a well-known global reanalysis with a spatial resolution of ~79 km. It has been included in the comparison because it is a widely used product for continental and global scale studies and also in smaller scale studies in data-poor countries. Thus, its comparison with higher resolution products of a data-rich country, such as Spain, allows us to quantify the errors made when using such datasets for national scale studies, in line with some of the objectives of the EU-FP7 eartH2Observe project. The comparison shows that SAFRAN and Spain02 perform similarly, even though their underlying principles are different. Both products are largely

  20. Orientation-independent measures of ground motion

    USGS Publications Warehouse

    Boore, D.M.; Watson-Lamprey, Jennie; Abrahamson, N.A.

    2006-01-01

    The geometric mean of the response spectra for two orthogonal horizontal components of motion, commonly used as the response variable in predictions of strong ground motion, depends on the orientation of the sensors as installed in the field. This means that the measure of ground-motion intensity could differ for the same actual ground motion. This dependence on sensor orientation is most pronounced for strongly correlated motion (the extreme example being linearly polarized motion), such as often occurs at periods of 1 sec or longer. We propose two new measures of the geometric mean, GMRotDpp and GMRotIpp, that are independent of the sensor orientations. Both are based on a set of geometric means computed from the as-recorded orthogonal horizontal motions rotated through all possible non-redundant rotation angles. GMRotDpp is determined as the ppth percentile of the set of geometric means for a given oscillator period. For example, GMRotD00, GMRotD50, and GMRotD100 correspond to the minimum, median, and maximum values, respectively. The rotations that lead to GMRotDpp depend on period, whereas a single period-independent rotation is used for GMRotIpp, the angle being chosen to minimize the spread of the rotation-dependent geometric mean (normalized by GMRotDpp) over the usable range of oscillator periods. GMRotI50 is the ground-motion intensity measure being used in the development of new ground-motion prediction equations by the Pacific Earthquake Engineering Research Center's Next Generation Attenuation project. Comparisons with as-recorded geometric means for a large dataset show that the new measures are systematically larger than the geometric-mean response spectra using the as-recorded values of ground acceleration, but only by a small amount (less than 3%). The theoretical advantage of the new measures is that they remove sensor orientation as a contributor to aleatory uncertainty. Whether the reduction is of practical significance awaits detailed studies of large
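
    To illustrate the construction, the following Python sketch computes GMRotDpp from two horizontal records, substituting peak amplitude for the full oscillator response spectra used in the paper (an illustrative simplification, not the authors' implementation): rotate the as-recorded components through all non-redundant angles, form the geometric mean at each angle, and take the ppth percentile.

      import numpy as np

      def gmrotd(a1, a2, pp=50):
          # Rotate through all non-redundant angles (0-89 degrees), form the
          # geometric mean of a peak measure of the two rotated components,
          # and return the pp-th percentile of that set.
          gms = []
          for theta in np.radians(np.arange(90)):
              r1 = a1 * np.cos(theta) + a2 * np.sin(theta)
              r2 = -a1 * np.sin(theta) + a2 * np.cos(theta)
              gms.append(np.sqrt(np.max(np.abs(r1)) * np.max(np.abs(r2))))
          return np.percentile(gms, pp)

      t = np.linspace(0, 10, 1000)
      a1 = np.sin(2 * np.pi * t)           # strongly correlated (nearly
      a2 = 0.8 * np.sin(2 * np.pi * t)     # linearly polarized) components
      print(gmrotd(a1, a2, 0), gmrotd(a1, a2, 50), gmrotd(a1, a2, 100))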

  1. Atlas-Guided Cluster Analysis of Large Tractography Datasets

    PubMed Central

    Ros, Christian; Güllmar, Daniel; Stenzel, Martin; Mentzel, Hans-Joachim; Reichenbach, Jürgen Rainer

    2013-01-01

    Diffusion Tensor Imaging (DTI) and fiber tractography are important tools to map the cerebral white matter microstructure in vivo and to model the underlying axonal pathways in the brain with three-dimensional fiber tracts. As the fast and consistent extraction of anatomically correct fiber bundles for multiple datasets is still challenging, we present a novel atlas-guided clustering framework for exploratory data analysis of large tractography datasets. The framework uses a hierarchical cluster analysis approach that exploits the inherent redundancy in large datasets to time-efficiently group fiber tracts. Structural information of a white matter atlas can be incorporated into the clustering to achieve an anatomically correct and reproducible grouping of fiber tracts. This approach facilitates not only the identification of the bundles corresponding to the classes of the atlas, but also the extraction of bundles that are not present in the atlas. The new technique was applied to cluster datasets of 46 healthy subjects. Prospects of automatic and anatomically correct as well as reproducible clustering are explored. Reconstructed clusters were well separated and showed good correspondence to anatomical bundles. Using the atlas-guided cluster approach, we observed consistent results across subjects with high reproducibility. In order to investigate the outlier elimination performance of the clustering algorithm, scenarios with varying amounts of noise were simulated and clustered with three different outlier elimination strategies. By exploiting the multithreading capabilities of modern multiprocessor systems in combination with novel algorithms, our toolkit clusters large datasets in a couple of minutes. Experiments were conducted to investigate the achievable speedup and to demonstrate the high performance of the clustering framework in a multiprocessing environment. PMID:24386292
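
    As a minimal illustration of the grouping step only, the Python sketch below clusters fixed-length tract descriptors hierarchically with SciPy; the descriptor, the atlas guidance, and the outlier elimination of the actual framework are not reproduced here.

      import numpy as np
      from scipy.cluster.hierarchy import linkage, fcluster

      rng = np.random.default_rng(1)
      bundle_a = rng.normal(0.0, 0.1, (50, 9))   # 50 tracts, 9-feature descriptor
      bundle_b = rng.normal(5.0, 0.1, (40, 9))   # 40 tracts near another path
      tracts = np.vstack([bundle_a, bundle_b])

      Z = linkage(tracts, method="average")      # agglomerative clustering
      labels = fcluster(Z, t=2, criterion="maxclust")
      print(np.bincount(labels)[1:])             # two clusters of 50 and 40 tracts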

  2. New insights on the sister lineage of percomorph fishes with an anchored hybrid enrichment dataset.

    PubMed

    Dornburg, Alex; Townsend, Jeffrey P; Brooks, Willa; Spriggs, Elizabeth; Eytan, Ron I; Moore, Jon A; Wainwright, Peter C; Lemmon, Alan; Lemmon, Emily Moriarty; Near, Thomas J

    2017-02-27

    Percomorph fishes represent over 17,100 species, including several model organisms and species of economic importance. Despite continuous advances in the resolution of the percomorph Tree of Life, resolution of the sister lineage to Percomorpha remains inconsistent but restricted to a small number of candidate lineages. Here we use an anchored hybrid enrichment (AHE) dataset of 132 loci with over 99,000 base pairs to identify the sister lineage of percomorph fishes. Initial analyses of this dataset failed to recover a strongly supported sister clade to Percomorpha; however, scrutiny of the AHE dataset revealed a bias towards high GC content at fast-evolving codon partitions (GC bias). By combining several existing approaches aimed at mitigating the impacts of convergence in GC bias, including RY coding and analyses of amino acids, we consistently recovered a strongly supported clade comprised of Holocentridae (squirrelfishes), Berycidae (alfonsinos), Melamphaidae (bigscale fishes), Cetomimidae (flabby whalefishes), and Rondeletiidae (redmouth whalefishes) as the sister lineage to Percomorpha. Additionally, implementing phylogenetic informativeness (PI) based metrics as a filtration method yielded this same topology, suggesting PI based approaches will preferentially filter these fast-evolving regions and act in a manner consistent with other phylogenetic approaches aimed at mitigating GC bias. Our results provide a new perspective on a key issue for studies investigating the evolutionary history of more than one quarter of all living species of vertebrates.
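
    RY coding, one of the mitigation strategies mentioned above, recodes purines and pyrimidines to two states so that transitions, which carry most of the GC-content signal, are discarded while transversions are retained. A minimal Python sketch:

      # Purines (A, G) -> R, pyrimidines (C, T) -> Y; any other symbol
      # (gaps, ambiguity codes) is passed through unchanged.
      RY = {"A": "R", "G": "R", "C": "Y", "T": "Y"}

      def ry_code(seq):
          return "".join(RY.get(base, base) for base in seq.upper())

      print(ry_code("ATGCGGCTA"))  # -> RYRYRRYYR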

  3. Discovering Associations in Biomedical Datasets by Link-based Associative Classifier (LAC)

    PubMed Central

    Yu, Pulan; Wild, David J.

    2012-01-01

    Associative classification mining (ACM) can be used to provide predictive models with high accuracy as well as interpretability. However, traditional ACM ignores the difference in significance among the features used for mining. Although weighted associative classification mining (WACM) addresses this issue by assigning different weights to features, most implementations can only be utilized when pre-assigned weights are available. In this paper, we propose a link-based approach to automatically derive weight information from a dataset using link-based models which treat the dataset as a bipartite graph. By combining this link-based feature weighting method with a traditional ACM method, classification based on associations (CBA), a Link-based Associative Classifier (LAC) is developed. We then demonstrate the application of LAC to biomedical datasets for association discovery between chemical compounds and bioactivities or diseases. The results indicate that the novel link-based weighting method is comparable to the support vector machine (SVM) and RELIEF methods, and is capable of capturing significant features. Additionally, LAC is shown to produce models with high accuracies and discover interesting associations which may otherwise remain unrevealed by traditional ACM. PMID:23227228
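
    The abstract does not spell out the link-based model, but a mutual-reinforcement (HITS-style) scheme on the sample-feature bipartite graph is one plausible reading; the Python sketch below is a hypothetical illustration of deriving feature weights that way, not the paper's exact algorithm.

      import numpy as np

      def link_weights(X, iters=50):
          # X: binary matrix, rows = samples, columns = features.
          w = np.ones(X.shape[1])      # feature weights
          for _ in range(iters):
              s = X @ w                # sample scores from feature weights
              w = X.T @ s              # feature weights from sample scores
              w /= np.linalg.norm(w)   # normalize to prevent overflow
          return w

      X = np.array([[1, 0, 1], [1, 1, 0], [1, 0, 0]])
      print(link_weights(X))           # the ubiquitous feature gets most weight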

  4. Historical Weather and Climate KML datasets at NOAA's National Climatic Data Center

    NASA Astrophysics Data System (ADS)

    Baldwin, R.; Ansari, S.; Reid, G.; Del Greco, S.; Lott, N.

    2008-12-01

    NOAA's National Climatic Data Center is using KML to share historical weather and climate data with the Virtual Globe community. Many diverse datasets are available as dynamic, static or custom manually created KML. The following dynamic datasets include archives delivered as REST-based KML web services: - NEXRAD Level-III point features describing general storm structure, hail, mesocyclone and tornado signatures - NOAA's National Weather Service Storm Events Database - NOAA's National Weather Service Local Storm Reports collected from storm spotters - NOAA's National Weather Service Warnings Static datasets include: - Integrated Surface Data (ISD), worldwide surface weather observations - Global Climate Observing System, a comprehensive system focused on the requirements for climate issues - Monthly Climatic Data for the World, approximately 1200 surface and 500 upper air worldwide stations In addition, the NOAA Weather and Climate Toolkit provides custom KML output for NEXRAD Radar and GOES Satellite Imagery. These various access methods provide KML capability to a wide variety of historical data and enhance the interoperability, integration and usability of NCDC data.

  5. A New Dataset of Spermatogenic vs. Oogenic Transcriptomes in the Nematode Caenorhabditis elegans

    PubMed Central

    Ortiz, Marco A.; Noble, Daniel; Sorokin, Elena P.; Kimble, Judith

    2014-01-01

    The nematode Caenorhabditis elegans is an important model for studies of germ cell biology, including the meiotic cell cycle, gamete specification as sperm or oocyte, and gamete development. Fundamental to those studies is a genome-level knowledge of the germline transcriptome. Here, we use RNA-Seq to identify genes expressed in isolated XX gonads, which are approximately 95% germline and 5% somatic gonadal tissue. We generate data from mutants making either sperm [fem-3(q96)] or oocytes [fog-2(q71)], both grown at 22°. Our dataset identifies a total of 10,754 mRNAs in the polyadenylated transcriptome of XX gonads, with 2748 enriched in spermatogenic gonads, 1732 enriched in oogenic gonads, and the remaining 6274 not enriched in either. These spermatogenic, oogenic, and gender-neutral gene datasets compare well with those of previous studies, but double the number of genes identified. A comparison of the additional genes found in our study with in situ hybridization patterns in the Kohara database suggests that most are expressed in the germline. We also query our RNA-Seq data for differential exon usage and find 351 mRNAs with sex-enriched isoforms. We suggest that this new dataset will prove useful for studies focusing on C. elegans germ cell biology. PMID:25060624

  6. Diagnostic variability for schizophrenia and major depression in a large public mental health care system dataset.

    PubMed

    Folsom, David P; Lindamer, Laurie; Montross, Lori P; Hawthorne, William; Golshan, Shahrokh; Hough, Richard; Shale, John; Jeste, Dilip V

    2006-11-15

    Administrative datasets can provide information about mental health treatment in real world settings; however, an important limitation in using these datasets is the uncertainty regarding psychiatric diagnosis. To better understand the psychiatric diagnoses, we investigated the diagnostic variability of schizophrenia and major depression in a large public mental health system. Using schizophrenia and major depression as the two comparison diagnoses, we compared the variability of diagnoses assigned to patients with one recorded diagnosis of schizophrenia or major depression. In addition, for both of these diagnoses, the diagnostic variability was compared across seven types of treatment settings. Statistical analyses were conducted using t tests for continuous data and chi-square tests for categorical data. We found that schizophrenia had greater diagnostic variability than major depression (43% vs. 31%). For both schizophrenia and major depression, variability was significantly higher in jail and the emergency psychiatric unit than in inpatient or outpatient settings. These findings demonstrate that the variability of psychiatric diagnoses recorded in the administrative dataset of a large public mental health system varies by diagnosis and by treatment setting. Further research is needed to clarify the relationship between psychiatric diagnosis, diagnostic variability and treatment setting.

  7. A new dataset of spermatogenic vs. oogenic transcriptomes in the nematode Caenorhabditis elegans.

    PubMed

    Ortiz, Marco A; Noble, Daniel; Sorokin, Elena P; Kimble, Judith

    2014-07-24

    The nematode Caenorhabditis elegans is an important model for studies of germ cell biology, including the meiotic cell cycle, gamete specification as sperm or oocyte, and gamete development. Fundamental to those studies is a genome-level knowledge of the germline transcriptome. Here, we use RNA-Seq to identify genes expressed in isolated XX gonads, which are approximately 95% germline and 5% somatic gonadal tissue. We generate data from mutants making either sperm [fem-3(q96)] or oocytes [fog-2(q71)], both grown at 22°. Our dataset identifies a total of 10,754 mRNAs in the polyadenylated transcriptome of XX gonads, with 2748 enriched in spermatogenic gonads, 1732 enriched in oogenic gonads, and the remaining 6274 not enriched in either. These spermatogenic, oogenic, and gender-neutral gene datasets compare well with those of previous studies, but double the number of genes identified. A comparison of the additional genes found in our study with in situ hybridization patterns in the Kohara database suggests that most are expressed in the germline. We also query our RNA-Seq data for differential exon usage and find 351 mRNAs with sex-enriched isoforms. We suggest that this new dataset will prove useful for studies focusing on C. elegans germ cell biology.

  8. Astronaut Photography of the Earth: A Long-Term Dataset for Earth Systems Research, Applications, and Education

    NASA Technical Reports Server (NTRS)

    Stefanov, William L.

    2017-01-01

    capabilities. It is expected that these value additions will increase interest and use of the dataset by the global community.

  9. A gridded hourly rainfall dataset for the UK applied to a national physically-based modelling system

    NASA Astrophysics Data System (ADS)

    Lewis, Elizabeth; Blenkinsop, Stephen; Quinn, Niall; Freer, Jim; Coxon, Gemma; Woods, Ross; Bates, Paul; Fowler, Hayley

    2016-04-01

    An hourly gridded rainfall product has great potential for use in many hydrological applications that require high temporal resolution meteorological data. One important example of this is flood risk management, with flooding in the UK highly dependent on sub-daily rainfall intensities amongst other factors. Knowledge of sub-daily rainfall intensities is therefore critical to designing hydraulic structures or flood defences to appropriate levels of service. Sub-daily rainfall rates are also essential inputs for flood forecasting, allowing for estimates of peak flows and stage for flood warning and response. In addition, an hourly gridded rainfall dataset has significant potential for practical applications such as better representation of extremes and pluvial flash flooding, validation of high resolution climate models and improving the representation of sub-daily rainfall in weather generators. A new 1km gridded hourly rainfall dataset for the UK has been created by disaggregating the daily Gridded Estimates of Areal Rainfall (CEH-GEAR) dataset using comprehensively quality-controlled hourly rain gauge data from over 1300 observation stations across the country. Quality control measures include identification of frequent tips, daily accumulations and dry spells, comparison of daily totals against the CEH-GEAR daily dataset, and nearest neighbour checks. The quality control procedure was validated against historic extreme rainfall events and the UKCP09 5km daily rainfall dataset. General use of the dataset has been demonstrated by testing the sensitivity of a physically-based hydrological modelling system for Great Britain to the distribution and rates of rainfall and potential evapotranspiration. Of the sensitivity tests undertaken, the largest improvements in model performance were seen when an hourly gridded rainfall dataset was combined with potential evapotranspiration disaggregated to hourly intervals, with 61% of catchments showing an increase in NSE between
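
    The core of such a disaggregation can be illustrated in a few lines of Python. The sketch below apportions a daily grid-cell total according to the fractional hourly profile of a nearby quality-controlled gauge; this is a generic scheme for illustration, and the dataset's actual method may differ in detail.

      import numpy as np

      def disaggregate_daily(daily_total, gauge_hourly):
          # gauge_hourly: 24 hourly totals from the donor gauge.
          gauge_hourly = np.asarray(gauge_hourly, dtype=float)
          gauge_day = gauge_hourly.sum()
          if gauge_day == 0:              # dry day at the gauge:
              return np.zeros(24)         # nothing to apportion
          return daily_total * gauge_hourly / gauge_day

      hourly = disaggregate_daily(12.0, [0] * 6 + [1, 3, 2] + [0] * 15)
      print(hourly.sum())                 # preserves the daily total: 12.0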

  10. annot8r: GO, EC and KEGG annotation of EST datasets

    PubMed Central

    Schmid, Ralf; Blaxter, Mark L

    2008-01-01

    Background The expressed sequence tag (EST) methodology is an attractive option for the generation of sequence data for species for which no completely sequenced genome is available. The annotation and comparative analysis of such datasets poses a formidable challenge for research groups that do not have the bioinformatics infrastructure of major genome sequencing centres. Therefore, there is a need for user-friendly tools to facilitate the annotation of non-model species EST datasets with well-defined ontologies that enable meaningful cross-species comparisons. To address this, we have developed annot8r, a platform for the rapid annotation of EST datasets with GO-terms, EC-numbers and KEGG-pathways. Results annot8r automatically downloads all files relevant for the annotation process and generates a reference database that stores UniProt entries, their associated Gene Ontology (GO), Enzyme Commission (EC) and Kyoto Encyclopaedia of Genes and Genomes (KEGG) annotation and additional relevant data. For each of GO, EC and KEGG, annot8r extracts a specific sequence subset from the UniProt dataset based on the information stored in the reference database. These three subsets are then formatted for BLAST searches. The user provides the protein or nucleotide sequences to be annotated and annot8r runs BLAST searches against these three subsets. The BLAST results are parsed and the corresponding annotations retrieved from the reference database. The annotations are saved both as flat files and in a relational PostgreSQL results database to facilitate more advanced searches within the results. annot8r is integrated with the PartiGene suite of EST analysis tools. Conclusion annot8r is a tool that assigns GO, EC and KEGG annotations for datasets resulting from EST sequencing projects both rapidly and efficiently. The benefits of an underlying relational database, flexibility and the ease of use of the program make it ideally suited for non-model species EST
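
    The pipeline's central step, a BLAST search against an annotation-specific UniProt subset followed by annotation lookup, might look as follows in Python (file names and the lookup table are placeholders; annot8r itself wraps the NCBI BLAST tools with a PostgreSQL reference database):

      import subprocess

      # Search query proteins against a GO-annotated UniProt subset
      # using NCBI BLAST+ tabular output (format 6).
      subprocess.run(
          ["blastp", "-query", "ests.faa", "-db", "uniprot_go_subset",
           "-outfmt", "6 qseqid sseqid evalue", "-evalue", "1e-8",
           "-out", "hits.tsv"],
          check=True,
      )

      go_terms = {}   # placeholder: subject ID -> GO terms from the reference DB
      with open("hits.tsv") as fh:
          for line in fh:
              qseqid, sseqid, evalue = line.rstrip("\n").split("\t")
              print(qseqid, go_terms.get(sseqid, "unannotated"))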

  11. A Large and Consistent Phylogenomic Dataset Supports Sponges as the Sister Group to All Other Animals.

    PubMed

    Simion, Paul; Philippe, Hervé; Baurain, Denis; Jager, Muriel; Richter, Daniel J; Di Franco, Arnaud; Roure, Béatrice; Satoh, Nori; Quéinnec, Éric; Ereskovsky, Alexander; Lapébie, Pascal; Corre, Erwan; Delsuc, Frédéric; King, Nicole; Wörheide, Gert; Manuel, Michaël

    2017-04-03

    Resolving the early diversification of animal lineages has proven difficult, even using genome-scale datasets. Several phylogenomic studies have supported the classical scenario in which sponges (Porifera) are the sister group to all other animals ("Porifera-sister" hypothesis), consistent with a single origin of the gut, nerve cells, and muscle cells in the stem lineage of eumetazoans (bilaterians + ctenophores + cnidarians). In contrast, several other studies have recovered an alternative topology in which ctenophores are the sister group to all other animals (including sponges). The "Ctenophora-sister" hypothesis implies that eumetazoan-specific traits, such as neurons and muscle cells, either evolved once along the metazoan stem lineage and were then lost in sponges and placozoans or evolved at least twice independently in Ctenophora and in Cnidaria + Bilateria. Here, we report on our reconstruction of deep metazoan relationships using a 1,719-gene dataset with dense taxonomic sampling of non-bilaterian animals that was assembled using a semi-automated procedure, designed to reduce known error sources. Our dataset outperforms previous metazoan gene superalignments in terms of data quality and quantity. Analyses with a best-fitting site-heterogeneous evolutionary model provide strong statistical support for placing sponges as the sister-group to all other metazoans, with ctenophores emerging as the second-earliest branching animal lineage. Only those methodological settings that exacerbated long-branch attraction artifacts yielded Ctenophora-sister. These results show that methodological issues must be carefully addressed to tackle difficult phylogenetic questions and pave the road to a better understanding of how fundamental features of animal body plans have emerged.

  12. A high resolution 7-Tesla resting-state fMRI test-retest dataset with cognitive and physiological measures.

    PubMed

    Gorgolewski, Krzysztof J; Mendes, Natacha; Wilfling, Domenica; Wladimirow, Elisabeth; Gauthier, Claudine J; Bonnen, Tyler; Ruby, Florence J M; Trampel, Robert; Bazin, Pierre-Louis; Cozatl, Roberto; Smallwood, Jonathan; Margulies, Daniel S

    2015-01-01

    Here we present a test-retest dataset of functional magnetic resonance imaging (fMRI) data acquired at rest. Twenty-two participants were scanned during two sessions spaced one week apart. Each session includes two 1.5 mm isotropic whole-brain scans and one 0.75 mm isotropic scan of the prefrontal cortex, giving a total of six time-points. Additionally, the dataset includes measures of mood, sustained attention, blood pressure, respiration, pulse, and the content of self-generated thoughts (mind wandering). These data enable the investigation of sources of both intra- and inter-session variability, not limited to physiological changes but also including alterations in cognitive and affective states, at high spatial resolution. The dataset is accompanied by a detailed experimental protocol and the source code of all stimuli used.

  13. A Physical Activity Reference Data-Set Recorded from Older Adults Using Body-Worn Inertial Sensors and Video Technology—The ADAPT Study Data-Set

    PubMed Central

    Bourke, Alan Kevin; Ihlen, Espen Alexander F.; Bergquist, Ronny; Wik, Per Bendik; Vereijken, Beatrix; Helbostad, Jorunn L.

    2017-01-01

    Physical activity monitoring algorithms are often developed using conditions that do not represent real-life activities, not developed using the target population, or not labelled to a high enough resolution to capture the true detail of human movement. We have designed a semi-structured supervised laboratory-based activity protocol and an unsupervised free-living activity protocol and recorded 20 older adults performing both protocols while wearing up to 12 body-worn sensors. Subjects’ movements were recorded using synchronised cameras (≥25 fps), both deployed in a laboratory environment to capture the in-lab portion of the protocol and a body-worn camera for out-of-lab activities. Video labelling of the subjects’ movements was performed by five raters using 11 different category labels. The overall level of agreement was high (percentage of agreement >90.05%, and Cohen’s Kappa, corrected kappa, Krippendorff’s alpha and Fleiss’ kappa >0.86). A total of 43.92 h of activities were recorded, including 9.52 h of in-lab and 34.41 h of out-of-lab activities. A total of 88.37% and 152.01% of planned transitions were recorded during the in-lab and out-of-lab scenarios, respectively. This study has produced the most detailed dataset to date of inertial sensor data, synchronised with high frame-rate (≥25 fps) video labelled data recorded in a free-living environment from older adults living independently. This dataset is suitable for validation of existing activity classification systems and development of new activity classification algorithms. PMID:28287449
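
    For reference, chance-corrected agreement statistics such as those reported above can be computed with standard libraries; a minimal Python sketch using scikit-learn (toy labels, not the study's data):

      from sklearn.metrics import cohen_kappa_score

      # Cohen's kappa between two raters' activity labels.
      rater1 = ["walk", "sit", "sit", "stand", "walk", "lie"]
      rater2 = ["walk", "sit", "stand", "stand", "walk", "lie"]
      print(cohen_kappa_score(rater1, rater2))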

  14. Improving the terrestrial gravity dataset in South Estonia

    NASA Astrophysics Data System (ADS)

    Oja, T.; Gruno, A.; Bloom, A.; Mäekivi, E.; Ellmann, A.; All, T.; Jürgenson, H.; Michelson, M.

    2009-04-01

    The only available gravity dataset covering the whole of Estonia was observed from 1949 to 1958. This historic dataset has been used as the main input source for many applications, including geoid determination, the realization of the height system, and geological mapping. However, some recent studies have indicated remarkable systematic biases in the dataset. For instance, a comparison of modern gravity control points with the historic data revealed unreasonable discrepancies in a large region of South Estonia. However, the gravity control points were sparsely distributed, which did not allow a full assessment of the quality of the historic data in the study area. In 2008 a pilot project was launched as a cooperation between the Estonian Land Board, the Geological Survey of Estonia, Tallinn University of Technology and the Estonian University of Life Sciences to densify the detected problematic area (about 2000 km²) with new and reliable gravity data. Field work was carried out in October and November 2008, with GPS RTK used for precise positioning and a Scintrex CG5 relative gravimeter for the gravity determinations. Altogether, more than 140 new points were determined along the roads. Despite bad weather conditions and an unstable observation base for the gravimeter (mostly on the bank of the road), an uncertainty better than ±0.1 mGal (1 mGal = 10⁻⁵ m/s²) was estimated from the adjustment of the gravimeter's readings. A separate gravity dataset of the Geological Survey of Estonia was also incorporated into the project's gravity database for further analysis. Those data were collected within several geological mapping projects in 1981-2007 and have uncertainties better than ±0.25 mGal. After the collection of the new gravity data, kriging with proper variogram modeling was applied to form Bouguer anomaly grids of the historic and the new datasets. The comparison of the resulting grids revealed biases of up to -4 mGal in certain regions
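
    As an indication of how such a gridding step can be reproduced, the Python sketch below interpolates synthetic point anomalies with ordinary kriging via the PyKrige package; the project's actual software and variogram parameters are not stated, so everything here is illustrative.

      import numpy as np
      from pykrige.ok import OrdinaryKriging

      rng = np.random.default_rng(0)
      x, y = rng.uniform(0, 50, 140), rng.uniform(0, 40, 140)       # station coords, km
      bouguer = -20 + 0.1 * x - 0.05 * y + rng.normal(0, 0.1, 140)  # anomalies, mGal

      ok = OrdinaryKriging(x, y, bouguer, variogram_model="spherical")
      gridx, gridy = np.arange(0, 50, 1.0), np.arange(0, 40, 1.0)
      z, ss = ok.execute("grid", gridx, gridy)   # anomaly grid and kriging variance
      print(z.shape)                             # (40, 50): rows follow gridy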

  15. SCSPOD14, a South China Sea physical oceanographic dataset derived from in situ measurements during 1919–2014

    PubMed Central

    Zeng, Lili; Wang, Dongxiao; Chen, Ju; Wang, Weiqiang; Chen, Rongyu

    2016-01-01

    In addition to the oceanographic data available for the South China Sea (SCS) from the World Ocean Database (WOD) and Array for Real-time Geostrophic Oceanography (Argo) floats, a suite of observations has been made by the South China Sea Institute of Oceanology (SCSIO) starting from the 1970s. Here, we assemble a SCS Physical Oceanographic Dataset (SCSPOD14) based on 51,392 validated temperature and salinity profiles collected from these three datasets for the period 1919–2014. A gridded dataset of climatological monthly mean temperature, salinity, and mixed and isothermal layer depth derived from an objective analysis of profiles is also presented. Comparisons with the World Ocean Atlas (WOA) and IFREMER/LOS Mixed Layer Depth Climatology confirm the reliability of the new dataset. This unique dataset offers an invaluable baseline perspective on the thermodynamic processes, spatial and temporal variability of water masses, and basin-scale and mesoscale oceanic structures in the SCS. We anticipate improvements and regular updates to this product as more observations become available from existing and future in situ networks. PMID:27116565

  16. SCSPOD14, a South China Sea physical oceanographic dataset derived from in situ measurements during 1919-2014.

    PubMed

    Zeng, Lili; Wang, Dongxiao; Chen, Ju; Wang, Weiqiang; Chen, Rongyu

    2016-04-26

    In addition to the oceanographic data available for the South China Sea (SCS) from the World Ocean Database (WOD) and Array for Real-time Geostrophic Oceanography (Argo) floats, a suite of observations has been made by the South China Sea Institute of Oceanology (SCSIO) starting from the 1970s. Here, we assemble a SCS Physical Oceanographic Dataset (SCSPOD14) based on 51,392 validated temperature and salinity profiles collected from these three datasets for the period 1919-2014. A gridded dataset of climatological monthly mean temperature, salinity, and mixed and isothermal layer depth derived from an objective analysis of profiles is also presented. Comparisons with the World Ocean Atlas (WOA) and IFREMER/LOS Mixed Layer Depth Climatology confirm the reliability of the new dataset. This unique dataset offers an invaluable baseline perspective on the thermodynamic processes, spatial and temporal variability of water masses, and basin-scale and mesoscale oceanic structures in the SCS. We anticipate improvements and regular updates to this product as more observations become available from existing and future in situ networks.

  17. SCSPOD14, a South China Sea physical oceanographic dataset derived from in situ measurements during 1919-2014

    NASA Astrophysics Data System (ADS)

    Zeng, Lili; Wang, Dongxiao; Chen, Ju; Wang, Weiqiang; Chen, Rongyu

    2016-04-01

    In addition to the oceanographic data available for the South China Sea (SCS) from the World Ocean Database (WOD) and Array for Real-time Geostrophic Oceanography (Argo) floats, a suite of observations has been made by the South China Sea Institute of Oceanology (SCSIO) starting from the 1970s. Here, we assemble a SCS Physical Oceanographic Dataset (SCSPOD14) based on 51,392 validated temperature and salinity profiles collected from these three datasets for the period 1919-2014. A gridded dataset of climatological monthly mean temperature, salinity, and mixed and isothermal layer depth derived from an objective analysis of profiles is also presented. Comparisons with the World Ocean Atlas (WOA) and IFREMER/LOS Mixed Layer Depth Climatology confirm the reliability of the new dataset. This unique dataset offers an invaluable baseline perspective on the thermodynamic processes, spatial and temporal variability of water masses, and basin-scale and mesoscale oceanic structures in the SCS. We anticipate improvements and regular updates to this product as more observations become available from existing and future in situ networks.

  18. Climatic Analysis of Oceanic Water Vapor Transports Based on Satellite E-P Datasets

    NASA Technical Reports Server (NTRS)

    Smith, Eric A.; Sohn, Byung-Ju; Mehta, Vikram

    2004-01-01

    Understanding the climatically varying properties of water vapor transports from a robust observational perspective is an essential step in calibrating climate models. This is tantamount to measuring year-to-year changes of monthly- or seasonally-averaged, divergent water vapor transport distributions. This cannot be done effectively with conventional radiosonde data over ocean regions where sounding data are generally sparse. This talk describes how a methodology designed to derive atmospheric water vapor transports over the world oceans from satellite-retrieved precipitation (P) and evaporation (E) datasets circumvents the problem of inadequate sampling. Ultimately, the method is intended to take advantage of the relatively complete and consistent coverage, as well as continuity in sampling, associated with E and P datasets obtained from satellite measurements. Independent P and E retrievals from Special Sensor Microwave Imager (SSM/I) measurements, along with P retrievals from Tropical Rainfall Measuring Mission (TRMM) measurements, are used to obtain transports by solving a potential function for the divergence of water vapor transport as balanced by large scale E - P conditions.
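
    The underlying balance can be written compactly. With Q the vertically integrated water vapor transport and χ a potential for its divergent component, a time-averaged budget with the storage term neglected gives (a standard formulation consistent with the description above, in LaTeX):

      \nabla \cdot \mathbf{Q} = E - P, \qquad
      \mathbf{Q}_{\mathrm{div}} = \nabla \chi, \qquad
      \nabla^{2} \chi = E - P .

    Solving the Poisson equation for χ with satellite-retrieved E and P then yields the divergent transport over oceans where radiosonde sampling is inadequate.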

  19. Effects of VR system fidelity on analyzing isosurface visualization of volume datasets.

    PubMed

    Laha, Bireswar; Bowman, Doug A; Socha, John J

    2014-04-01

    Volume visualization is an important technique for analyzing datasets from a variety of different scientific domains. Volume data analysis is inherently difficult because volumes are three-dimensional, dense, and unfamiliar, requiring scientists to precisely control the viewpoint and to make precise spatial judgments. Researchers have proposed that more immersive (higher fidelity) VR systems might improve task performance with volume datasets, and significant results tied to different components of display fidelity have been reported. However, more information is needed to generalize these results to different task types, domains, and rendering styles. We visualized isosurfaces extracted from synchrotron microscopic computed tomography (SR-μCT) scans of beetles, in a CAVE-like display. We ran a controlled experiment evaluating the effects of three components of system fidelity (field of regard, stereoscopy, and head tracking) on a variety of abstract task categories that are applicable to various scientific domains, and also compared our results with those from our prior experiment using 3D texture-based rendering. We report many significant findings. For example, for search and spatial judgment tasks with isosurface visualization, a stereoscopic display provides better performance, but for tasks with 3D texture-based rendering, displays with higher field of regard were more effective, independent of the levels of the other display components. We also found that systems with high field of regard and head tracking improve performance in spatial judgment tasks. Our results extend existing knowledge and produce new guidelines for designing VR systems to improve the effectiveness of volume data analysis.

  20. Improving the Fundamental Understanding of Regional Seismic Signal Processing with a Unique Western U.S. Dataset

    SciTech Connect

    Walter, W R; Smith, K; O'Boyle, J; Hauk, T F; Ryall, F; Ruppert, S D; Myers, S C; Anderson, M; Dodge, D A

    2003-07-18

    recovered and reformatted old event segmented data from the LLNL and SNL managed stations for past nuclear tests and earthquakes. We then used the preferred origin catalog to extract waveforms from continuous data and associate event segmented waveforms within the database. The result is a well-organized regional western US dataset with hundreds of nuclear tests, thousands of mining explosions and hundreds of thousands of earthquakes. In the second stage of the project we have chosen a subset of approximately 125 events that are well located and cover a range of magnitudes, source types, and locations. Ms. Flori Ryall, an experienced seismic analyst, is reviewing this dataset. She is picking all arrival onsets with quantitative uncertainties and making note of data problems (timing errors, glitches, dropouts) and other issues. The resulting arrivals and comments will then be loaded into the database for future researcher use. During the summer of 2003 we will be carrying out some analysis and quality control on this subset. It is anticipated that this set of consistently picked, independently located data will provide an effective test set for regional sparse-station location algorithms. In addition, because the set will include nuclear tests, earthquakes, and mine-related events, each with related source parameters, it will provide a valuable test set for regional discrimination and magnitude estimation as well. A final relational database of the approximately 125 events in the high-quality subset will be put onto a CD-ROM and distributed for other researchers to use in benchmarking regional algorithms after the conclusion of the project.

  1. Global heating distributions for January 1979 calculated from GLA assimilated and simulated model-based datasets

    NASA Technical Reports Server (NTRS)

    Schaack, Todd K.; Lenzen, Allen J.; Johnson, Donald R.

    1991-01-01

    This study surveys the large-scale distribution of heating for January 1979 obtained from five sources of information. Through intercomparison of these distributions, with emphasis on satellite-derived information, an investigation is conducted into the global distribution of atmospheric heating and the impact of observations on the diagnostic estimates of heating derived from assimilated datasets. The results indicate a substantial impact of satellite information on diagnostic estimates of heating in regions where there is a scarcity of conventional observations. The addition of satellite data provides information on the atmosphere's temperature and wind structure that is important for estimation of the global distribution of heating and energy exchange.

  2. A Dataset from TIMSS to Examine the Relationship between Computer Use and Mathematics Achievement

    ERIC Educational Resources Information Center

    Kadijevich, Djordje M.

    2015-01-01

    Because the relationship between computer use and achievement is still puzzling, there is a need to prepare and analyze good quality datasets on computer use and achievement. Such a dataset can be derived from TIMSS data. This paper describes how this dataset can be prepared. It also gives an example of how the dataset may be analyzed. The…

  3. Boosting medical diagnostics by pooling independent judgments

    PubMed Central

    Kurvers, Ralf H. J. M.; Herzog, Stefan M.; Hertwig, Ralph; Krause, Jens; Carney, Patricia A.; Bogart, Andy; Argenziano, Giuseppe; Zalaudek, Iris; Wolf, Max

    2016-01-01

    Collective intelligence refers to the ability of groups to outperform individual decision makers when solving complex cognitive problems. Despite its potential to revolutionize decision making in a wide range of domains, including medical, economic, and political decision making, at present, little is known about the conditions underlying collective intelligence in real-world contexts. We here focus on two key areas of medical diagnostics, breast and skin cancer detection. Using a simulation study that draws on large real-world datasets, involving more than 140 doctors making more than 20,000 diagnoses, we investigate when combining the independent judgments of multiple doctors outperforms the best doctor in a group. We find that similarity in diagnostic accuracy is a key condition for collective intelligence: Aggregating the independent judgments of doctors outperforms the best doctor in a group whenever the diagnostic accuracy of doctors is relatively similar, but not when doctors’ diagnostic accuracy differs too much. This intriguingly simple result is highly robust and holds across different group sizes, performance levels of the best doctor, and collective intelligence rules. The enabling role of similarity, in turn, is explained by its systematic effects on the number of correct and incorrect decisions of the best doctor that are overruled by the collective. By identifying a key factor underlying collective intelligence in two important real-world contexts, our findings pave the way for innovative and more effective approaches to complex real-world decision making, and to the scientific analyses of those approaches. PMID:27432950
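
    The key condition, similar accuracy among group members, is easy to reproduce in a toy Monte Carlo simulation. The Python sketch below compares majority voting with the best individual for a binary diagnosis task; the accuracy values are illustrative, not the study's data.

      import numpy as np

      rng = np.random.default_rng(42)

      def majority_vs_best(accs, n_cases=20000):
          # Independent doctors with given accuracies each make a binary
          # diagnosis; return (majority-vote accuracy, best individual).
          truth = rng.integers(0, 2, n_cases)
          votes = np.array([np.where(rng.random(n_cases) < a, truth, 1 - truth)
                            for a in accs])
          majority = (votes.sum(axis=0) > len(accs) / 2).astype(int)
          return (majority == truth).mean(), max(accs)

      print(majority_vs_best([0.72, 0.70, 0.68]))  # similar doctors: group wins
      print(majority_vs_best([0.90, 0.55, 0.52]))  # dissimilar: best doctor wins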

  4. Functional CAR models for large spatially correlated functional datasets.

    PubMed

    Zhang, Lin; Baladandayuthapani, Veerabhadran; Zhu, Hongxiao; Baggerly, Keith A; Majewski, Tadeusz; Czerniak, Bogdan A; Morris, Jeffrey S

    2016-01-01

    We develop a functional conditional autoregressive (CAR) model for spatially correlated data for which functions are collected on areal units of a lattice. Our model performs functional response regression while accounting for spatial correlations with potentially nonseparable and nonstationary covariance structure, in both the space and functional domains. We show theoretically that our construction leads to a CAR model at each functional location, with spatial covariance parameters varying and borrowing strength across the functional domain. Using basis transformation strategies, the nonseparable spatial-functional model is computationally scalable to enormous functional datasets, generalizable to different basis functions, and can be used on functions defined on higher dimensional domains such as images. Through simulation studies, we demonstrate that accounting for the spatial correlation in our modeling leads to improved functional regression performance. Applied to a high-throughput spatially correlated copy number dataset, the model identifies genetic markers not identified by comparable methods that ignore spatial correlations.
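
    For orientation, the scalar CAR model on which the construction builds specifies each areal effect conditionally on its lattice neighbours; in the functional version the parameters below become functions of the location t in the functional domain (a standard formulation, shown here for context in LaTeX):

      \phi_i \mid \phi_{-i} \sim \mathcal{N}\!\left(
        \rho \, \frac{\sum_{j} w_{ij}\,\phi_j}{w_{i+}},\;
        \frac{\tau^{2}}{w_{i+}} \right),
      \qquad w_{i+} = \sum_{j} w_{ij}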

  5. A Computational Approach to Qualitative Analysis in Large Textual Datasets

    PubMed Central

    Evans, Michael S.

    2014-01-01

    In this paper I introduce computational techniques to extend qualitative analysis into the study of large textual datasets. I demonstrate these techniques by using probabilistic topic modeling to analyze a broad sample of 14,952 documents published in major American newspapers from 1980 through 2012. I show how computational data mining techniques can identify and evaluate the significance of qualitatively distinct subjects of discussion across a wide range of public discourse. I also show how examining large textual datasets with computational methods can overcome methodological limitations of conventional qualitative methods, such as how to measure the impact of particular cases on broader discourse, how to validate substantive inferences from small samples of textual data, and how to determine if identified cases are part of a consistent temporal pattern. PMID:24498398
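
    The modeling step itself is now routine; a minimal Python sketch with gensim's LDA on toy tokenized documents (the paper's 14,952-document newspaper corpus and its preprocessing are not reproduced):

      from gensim import corpora, models

      texts = [["genome", "editing", "ethics"],
               ["court", "ruling", "patent"],
               ["genome", "patent", "court"]]        # toy tokenized documents
      dictionary = corpora.Dictionary(texts)
      corpus = [dictionary.doc2bow(t) for t in texts]
      lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary,
                            random_state=0, passes=10)
      for topic_id, words in lda.print_topics():
          print(topic_id, words)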

  6. geoknife: Reproducible web-processing of large gridded datasets

    USGS Publications Warehouse

    Read, Jordan S.; Walker, Jordan I.; Appling, Alison P.; Blodgett, David L.; Read, Emily K.; Winslow, Luke A.

    2016-01-01

    Geoprocessing of large gridded data according to overlap with irregular landscape features is common to many large-scale ecological analyses. The geoknife R package was created to facilitate reproducible analyses of gridded datasets found on the U.S. Geological Survey Geo Data Portal web application or elsewhere, using a web-enabled workflow that eliminates the need to download and store large datasets that are reliably hosted on the Internet. The package provides access to several data subset and summarization algorithms that are available on remote web processing servers. Outputs from geoknife include spatial and temporal data subsets, spatially-averaged time series values filtered by user-specified areas of interest, and categorical coverage fractions for various land-use types.

  7. Multiresolution techniques for the classification of bioimage and biometric datasets

    NASA Astrophysics Data System (ADS)

    Chebira, Amina; Kovačević, Jelena

    2007-09-01

    We survey our work on adaptive multiresolution (MR) approaches to the classification of biological and fingerprint images. The system adds MR decomposition in front of a generic classifier consisting of feature computation and classification in each MR subspace, yielding local decisions, which are then combined into a global decision using a weighting algorithm. The system is tested on four different datasets: subcellular protein location images, drosophila embryo images, histological images and fingerprint images. Given the very high accuracies obtained for all four datasets, we demonstrate that the space-frequency localized information in the multiresolution subspaces adds significantly to the discriminative power of the system. Moreover, we show that a vastly reduced set of features is sufficient. Finally, we prove that frames are the class of MR techniques that performs best in this context. This leads us to consider the construction of a new family of frames for classification, which we term lapped tight frame transforms.
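
    The MR front end can be sketched in Python with a 2-D wavelet transform standing in for the decomposition (the paper ultimately argues for frames rather than plain wavelet bases, so this is only a structural illustration): one simple feature, subband energy, is computed per subspace; per-subspace classifier decisions would then be combined by the weighting algorithm.

      import numpy as np
      import pywt

      image = np.random.default_rng(0).random((64, 64))   # toy bioimage
      coeffs = pywt.wavedec2(image, "db2", level=2)       # MR decomposition
      subbands = [coeffs[0]] + [d for level in coeffs[1:] for d in level]
      features = [float(np.mean(np.square(s))) for s in subbands]  # energies
      print(len(features))   # 7 subspaces: 1 approximation + 2 levels x 3 details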

  8. Validating silicon polytrodes with paired juxtacellular recordings: method and dataset

    PubMed Central

    Lopes, Gonçalo; Frazão, João; Nogueira, Joana; Lacerda, Pedro; Baião, Pedro; Aarts, Arno; Andrei, Alexandru; Musa, Silke; Fortunato, Elvira; Barquinha, Pedro; Kampff, Adam R.

    2016-01-01

    Cross-validating new methods for recording neural activity is necessary to accurately interpret and compare the signals they measure. Here we describe a procedure for precisely aligning two probes for in vivo “paired-recordings” such that the spiking activity of a single neuron is monitored with both a dense extracellular silicon polytrode and a juxtacellular micropipette. Our new method allows for efficient, reliable, and automated guidance of both probes to the same neural structure with micrometer resolution. We also describe a new dataset of paired-recordings, which is available online. We propose that our novel targeting system, and ever expanding cross-validation dataset, will be vital to the development of new algorithms for automatically detecting/sorting single-units, characterizing new electrode materials/designs, and resolving nagging questions regarding the origin and nature of extracellular neural signals. PMID:27306671

  9. A multimodal MRI dataset of professional chess players.

    PubMed

    Li, Kaiming; Jiang, Jing; Qiu, Lihua; Yang, Xun; Huang, Xiaoqi; Lui, Su; Gong, Qiyong

    2015-01-01

    Chess is a good model to study high-level human brain functions such as spatial cognition, memory, planning, learning and problem solving. Recent studies have demonstrated that non-invasive MRI techniques are valuable for researchers to investigate the underlying neural mechanisms of playing chess. For professional chess players (e.g., chess grandmasters and masters, or GM/Ms), the structural and functional alterations that result from long-term professional practice, and how these alterations relate to behavior, remain largely unknown. Here, we report a multimodal MRI dataset from 29 professional Chinese chess players (most of whom are GM/Ms), and 29 age-matched novices. We hope that this dataset will provide researchers with new materials to further explore high-level human brain functions.

  10. Circumpolar dataset of sequenced specimens of Promachocrinus kerguelensis (Echinodermata, Crinoidea)

    PubMed Central

    Hemery, Lenaïg G.; Améziane, Nadia; Eléaume, Marc

    2013-01-01

    This circumpolar dataset of the comatulid (Echinodermata: Crinoidea) Promachocrinus kerguelensis (Carpenter, 1888) from the Southern Ocean, documents biodiversity associated with the specimens sequenced in Hemery et al. (2012). The aim of Hemery et al. (2012) paper was to use phylogeographic and phylogenetic tools to assess the genetic diversity, demographic history and evolutionary relationships of this very common and abundant comatulid, in the context of the glacial history of the Antarctic and Sub-Antarctic shelves (Thatje et al. 2005, 2008). Over one thousand three hundred specimens (1307) used in this study were collected during seventeen cruises from 1996 to 2010, in eight regions of the Southern Ocean: Kerguelen Plateau, Davis Sea, Dumont d’Urville Sea, Ross Sea, Amundsen Sea, West Antarctic Peninsula, East Weddell Sea and Scotia Arc including the tip of the Antarctic Peninsula and the Bransfield Strait. We give here the metadata of this dataset, which lists sampling sources (cruise ID, ship name, sampling date, sampling gear), sampling sites (station, geographic coordinates, depth) and genetic data (phylogroup, haplotype, sequence ID) for each of the 1307 specimens. The identification of the specimens was controlled by an expert taxonomist specialist of crinoids (Marc Eléaume, Muséum national d’Histoire naturelle, Paris) and all the COI sequences were matched against those available on the Barcode of Life Data System (BOLD: http://www.boldsystems.org/index.php/IDS_OpenIdEngine). This dataset can be used by studies dealing with, among other interests, Antarctic and/or crinoid diversity (species richness, distribution patterns), biogeography or habitat / ecological niche modeling. This dataset is accessible through the GBIF network at http://ipt.biodiversity.aq/resource.do?r=proke. PMID:23878509

  11. Microscopic images dataset for automation of RBCs counting.

    PubMed

    Abbas, Sherif

    2015-12-01

    A method for counting Red Blood Corpuscles (RBCs) has been developed using light microscopic images of RBCs and a Matlab algorithm. The dataset consists of RBC images and their corresponding segmented images. A detailed description using a flow chart is given to show how to produce the RBC mask. The RBC mask was used to count the number of RBCs in the blood smear image.
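
    The counting step reduces to connected-component labelling of the segmented mask. A minimal Python sketch with scikit-image standing in for the dataset's Matlab implementation (the toy mask below is hypothetical):

      import numpy as np
      from skimage import measure

      mask = np.zeros((64, 64), dtype=bool)   # toy binary RBC mask
      mask[10:20, 10:20] = True               # first corpuscle
      mask[40:50, 30:45] = True               # second corpuscle

      labels = measure.label(mask)            # one integer label per component
      print("RBC count:", labels.max())       # -> 2

    A real pipeline would additionally separate touching or overlapping cells, for example with a watershed transform, before counting.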

  12. Circumpolar dataset of sequenced specimens of Promachocrinus kerguelensis (Echinodermata, Crinoidea).

    PubMed

    Hemery, Lenaïg G; Améziane, Nadia; Eléaume, Marc

    2013-01-01

    This circumpolar dataset of the comatulid (Echinodermata: Crinoidea) Promachocrinus kerguelensis (Carpenter, 1888) from the Southern Ocean, documents biodiversity associated with the specimens sequenced in Hemery et al. (2012). The aim of Hemery et al. (2012) paper was to use phylogeographic and phylogenetic tools to assess the genetic diversity, demographic history and evolutionary relationships of this very common and abundant comatulid, in the context of the glacial history of the Antarctic and Sub-Antarctic shelves (Thatje et al. 2005, 2008). Over one thousand three hundred specimens (1307) used in this study were collected during seventeen cruises from 1996 to 2010, in eight regions of the Southern Ocean: Kerguelen Plateau, Davis Sea, Dumont d'Urville Sea, Ross Sea, Amundsen Sea, West Antarctic Peninsula, East Weddell Sea and Scotia Arc including the tip of the Antarctic Peninsula and the Bransfield Strait. We give here the metadata of this dataset, which lists sampling sources (cruise ID, ship name, sampling date, sampling gear), sampling sites (station, geographic coordinates, depth) and genetic data (phylogroup, haplotype, sequence ID) for each of the 1307 specimens. The identification of the specimens was controlled by an expert taxonomist specialist of crinoids (Marc Eléaume, Muséum national d'Histoire naturelle, Paris) and all the COI sequences were matched against those available on the Barcode of Life Data System (BOLD: http://www.boldsystems.org/index.php/IDS_OpenIdEngine). This dataset can be used by studies dealing with, among other interests, Antarctic and/or crinoid diversity (species richness, distribution patterns), biogeography or habitat / ecological niche modeling. This dataset is accessible through the GBIF network at http://ipt.biodiversity.aq/resource.do?r=proke.

  13. BMDExpress Data Viewer - a visualization tool to analyze BMDExpress datasets.

    PubMed

    Kuo, Byron; Francina Webster, A; Thomas, Russell S; Yauk, Carole L

    2016-08-01

    Regulatory agencies increasingly apply benchmark dose (BMD) modeling to determine points of departure for risk assessment. BMDExpress applies BMD modeling to transcriptomic datasets to identify transcriptional BMDs. However, graphing and analytical capabilities within BMDExpress are limited, and the analysis of output files is challenging. We developed a web-based application, BMDExpress Data Viewer (http://apps.sciome.com:8082/BMDX_Viewer/), for visualizing and graphing BMDExpress output files. The application consists of "Summary Visualization" and "Dataset Exploratory" tools. Through analysis of transcriptomic datasets of the toxicants furan and 4,4'-methylenebis(N,N-dimethyl)benzenamine, we demonstrate that the "Summary Visualization Tools" can be used to examine distributions of gene and pathway BMD values, and to derive a potential point of departure value based on summary statistics. By applying filters on enrichment P-values and minimum number of significant genes, the "Functional Enrichment Analysis" tool enables the user to select biological processes or pathways that are selectively perturbed by chemical exposure and identify the related BMD. The "Multiple Dataset Comparison" tool enables comparison of gene and pathway BMD values across multiple experiments (e.g., across timepoints or tissues). The "BMDL-BMD Range Plotter" tool facilitates the observation of BMD trends across biological processes or pathways. Through our case studies, we demonstrate that BMDExpress Data Viewer is a useful tool to visualize, explore and analyze BMDExpress output files. Visualizing the data in this manner enables rapid assessment of data quality, model fit, doses of peak activity, most sensitive pathway perturbations and other metrics that will be useful in applying toxicogenomics in risk assessment. © 2015 Her Majesty the Queen in Right of Canada. Journal of Applied Toxicology published by John Wiley & Sons, Ltd.

  14. Microscopic images dataset for automation of RBCs counting

    PubMed Central

    Abbas, Sherif

    2015-01-01

    A method for Red Blood Corpuscle (RBC) counting has been developed using RBC light microscopic images and a Matlab algorithm. The dataset consists of RBC images and their corresponding segmented RBC images. A detailed description using a flow chart is given to show how to produce the RBC mask. The RBC mask was used to count the number of RBCs in the blood smear image. PMID:26380843
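
    The counting step can be illustrated compactly. The sketch below (Python rather than the authors' Matlab, so purely illustrative) counts RBCs as connected components of a binary mask; the min_area noise filter is a hypothetical parameter, not taken from the paper.

      # Count RBCs as connected components of a binary segmentation mask.
      import numpy as np
      from scipy import ndimage

      def count_rbcs(mask, min_area=50):
          labels, n = ndimage.label(mask)                     # label components 1..n
          areas = ndimage.sum(mask, labels, range(1, n + 1))  # pixel area per component
          return int(np.sum(areas >= min_area))               # drop sub-cell specks

      # Toy mask with two blobs:
      mask = np.zeros((100, 100), dtype=bool)
      mask[10:30, 10:30] = True
      mask[60:90, 50:80] = True
      print(count_rbcs(mask))  # -> 2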

  15. Quantification of NSW Ambulance Record Linkages with Multiple External Datasets.

    PubMed

    Carroll, Therese; Muecke, Sandy; Simpson, Judy; Irvine, Katie; Jenkins, André

    2015-01-01

    This study has two aims: 1) to describe linkage rates between ambulance data and external datasets for "episodes of care" and "patient only" linkages in New South Wales (NSW), Australia; and 2) to detect and report any systematic issues with linkage relating to patients, and to operational or clinical variables, that may introduce bias in subsequent studies if not adequately addressed. During 2010-11, the Centre for Health Record Linkage (CHeReL) in NSW linked the records for patients attended by NSW Ambulance paramedics for the period July 2006 to June 2009 with four external datasets: Emergency Department Data Collection; Admitted Patient Data Collection; NSW Registry of Births, Deaths and Marriages death registration data; and the Australian Bureau of Statistics mortality data. This study reports linkage rates in terms of those "expected" to link and those "not expected" to link with external databases within 24 hours of paramedic attendance. Following thorough data preparation, 2,041,728 NSW Ambulance care episodes for 1,116,509 patients fulfilled the inclusion criteria. The overall episode-specific hospital linkage rate was 97.2%. Where a patient was not transported to hospital following paramedic care, 8.6% of these episodes resulted in an emergency department attendance within 24 hours. For all care episodes, 5.2% linked to a death record at some time within the 3-year period, with 2.4% of all death episodes occurring within 7 days of a paramedic encounter. For NSW Ambulance episodes of care that were expected to link to an external dataset but did not, nonlinkage to hospital admission records tended to decrease with age. For all other variables, issues relating to rates of linkage and nonlinkage were more indiscriminate. This quantification of the limitations of this large linked dataset will underpin the interpretation and results of ensuing studies that will inform future clinical and operational policies and practices at NSW Ambulance.

  16. Evaluating summarised radionuclide concentration ratio datasets for wildlife.

    PubMed

    Wood, M D; Beresford, N A; Howard, B J; Copplestone, D

    2013-12-01

    Concentration ratios (CR(wo-media)) are used in most radioecological models to predict whole-body radionuclide activity concentrations in wildlife from those in environmental media. This simplistic approach amalgamates the various factors influencing transfer within a single generic value and, as a result, comparisons of model predictions with site-specific measurements can vary by orders of magnitude. To improve model predictions, the development of 'condition-specific' CR(wo-media) values has been proposed (e.g. for a specific habitat). However, the underlying datasets for most CR(wo-media) value databases, such as the wildlife transfer database (WTD) developed within the IAEA EMRAS II programme, include summarised data. This presents challenges for the calculation and subsequent statistical evaluation of condition-specific CR(wo-media) values. A further complication is the common use of arithmetic summary statistics to summarise data in source references, even though CR(wo-media) values generally tend towards a lognormal distribution and should, therefore, be summarised using geometric statistics. In this paper, we propose a statistically-defensible and robust method for reconstructing underlying datasets to calculate condition-specific CR(wo-media) values from summarised data and deriving geometric summary statistics. This method is applied to terrestrial datasets from the WTD. Statistically significant differences in sub-category CR(wo-media) values (e.g. mammals categorised by feeding strategy) were identified, which may justify the use of these CR(wo-media) values for specific assessment contexts. However, biases and limitations within the underlying datasets of the WTD explain some of these differences. Given the uncertainty in the summarised CR(wo-media) values, we suggest that the CR(wo-media) approach to estimating transfer is used with caution above screening-level assessments.
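
    The conversion between arithmetic and geometric summary statistics follows from the standard lognormal moment identities; the sketch below applies them, assuming (as the paper argues is usually appropriate for CR values) an underlying lognormal distribution.

      # Derive geometric summary statistics from arithmetic ones under a lognormal
      # assumption: GM = AM / sqrt(1 + CV^2), GSD = exp(sqrt(ln(1 + CV^2))).
      import math

      def geometric_from_arithmetic(am, asd):
          cv2 = (asd / am) ** 2                           # squared coefficient of variation
          gm = am / math.sqrt(1.0 + cv2)                  # geometric mean, exp(mu)
          gsd = math.exp(math.sqrt(math.log(1.0 + cv2)))  # geometric SD, exp(sigma)
          return gm, gsd

      # e.g. a summarised arithmetic mean CR of 0.2 with SD 0.4:
      print(geometric_from_arithmetic(0.2, 0.4))  # -> (~0.089, ~3.56)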

  17. Soil chemistry in lithologically diverse datasets: the quartz dilution effect

    USGS Publications Warehouse

    Bern, Carleton R.

    2009-01-01

    National- and continental-scale soil geochemical datasets are likely to move our understanding of broad soil geochemistry patterns forward significantly. Patterns of chemistry and mineralogy delineated from these datasets are strongly influenced by the composition of the soil parent material, which itself is largely a function of lithology and particle size sorting. Such controls present a challenge by obscuring subtler patterns arising from subsequent pedogenic processes. Here the effect of quartz concentration is examined in moist-climate soils from a pilot dataset of the North American Soil Geochemical Landscapes Project. Due to variable and high quartz contents (6.2–81.7 wt.%), and its residual and inert nature in soil, quartz is demonstrated to influence broad patterns in soil chemistry. A dilution effect is observed whereby concentrations of various elements are significantly and strongly negatively correlated with quartz. Quartz content drives artificial positive correlations between concentrations of some elements and obscures negative correlations between others. Unadjusted soil data show the highly mobile base cations Ca, Mg, and Na to be often strongly positively correlated with intermediately mobile Al or Fe, and generally uncorrelated with the relatively immobile high-field-strength elements (HFS) Ti and Nb. Both patterns are contrary to broad expectations for soils being weathered and leached. After transforming bulk soil chemistry to a quartz-free basis, the base cations are generally uncorrelated with Al and Fe, and negative correlations generally emerge with the HFS elements. Quartz-free element data may be a useful tool for elucidating patterns of weathering or parent-material chemistry in large soil datasets.
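
    The dilution correction itself is a one-line renormalisation: dividing each concentration by the non-quartz fraction of the sample. A minimal sketch (element names and values hypothetical):

      # Renormalise bulk concentrations to a quartz-free basis:
      # C_qf = C_bulk / (1 - f_quartz), with f_quartz the quartz weight fraction.
      import numpy as np

      def quartz_free(conc_wt_pct, quartz_wt_pct):
          return conc_wt_pct / (1.0 - quartz_wt_pct / 100.0)

      ca = np.array([1.2, 0.6, 0.3])      # hypothetical Ca concentrations (wt.%)
      qtz = np.array([20.0, 60.0, 80.0])  # quartz contents (wt.%)
      print(quartz_free(ca, qtz))         # -> [1.5 1.5 1.5]: the apparent Ca
                                          # variation was pure quartz dilution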

  18. GLEAM version 3: Global Land Evaporation Datasets and Model

    NASA Astrophysics Data System (ADS)

    Martens, B.; Miralles, D. G.; Lievens, H.; van der Schalie, R.; de Jeu, R.; Fernandez-Prieto, D.; Verhoest, N.

    2015-12-01

    Terrestrial evaporation links the energy, water and carbon cycles over land and is therefore a key variable of the climate system. However, the global-scale magnitude and variability of the flux, and the sensitivity of the underlying physical processes to changes in environmental factors, are still poorly understood due to limitations in in situ measurements. As a result, several methods have arisen to estimate global patterns of land evaporation from satellite observations. However, these algorithms generally differ in their approach to modelling evaporation, resulting in large differences in their estimates. One of these methods is GLEAM, the Global Land Evaporation: the Amsterdam Methodology. GLEAM estimates terrestrial evaporation based on daily satellite observations of meteorological variables, vegetation characteristics and soil moisture. Since the publication of the first version of the algorithm (2011), the model has been widely applied to analyse trends in the water cycle and land-atmospheric feedbacks during extreme hydrometeorological events. A third version of the GLEAM global datasets is foreseen by the end of 2015. Given the relevance of a continuous and reliable record of global-scale evaporation estimates for climate and hydrological research, the establishment of an online data portal to make these data publicly available is also foreseen. In this new release of the GLEAM datasets, different components of the model have been updated, with the most significant change being the revision of the data assimilation algorithm. In this presentation, we will highlight the most important changes to the methodology and present three new GLEAM datasets and their validation against in situ observations and an alternative dataset of terrestrial evaporation (ERA-Land). Results of the validation exercise indicate that the magnitude and the spatiotemporal variability of the modelled evaporation agree reasonably well with the estimates of ERA-Land and the in situ

  19. Wind Integration Datasets from the National Renewable Energy Laboratory (NREL)

    DOE Data Explorer

    The Wind Integration Datasets provide time-series wind data for 2004, 2005, and 2006. They are intended to be used by energy professionals such as transmission planners, utility planners, project developers, and university researchers, helping them to perform comparisons of sites and estimate power production from hypothetical wind plants. NREL cautions that the information from modeled data may not match wind resource information shown on NREL's state wind maps, as they were created for different purposes and using different methodologies.

  20. Development of a video tampering dataset for forensic investigation.

    PubMed

    Ismael Al-Sanjary, Omar; Ahmed, Ahmed Abdullah; Sulong, Ghazali

    2016-09-01

    Forgery is an act of modifying a document, product, image or video, among other media. Video tampering detection research requires an inclusive database of video modification. This paper aims to discuss a comprehensive proposal to create a dataset composed of modified videos for forensic investigation, in order to standardize existing techniques for detecting video tampering. The primary purpose of developing and designing this new video library is for usage in video forensics, which can be consciously associated with reliable verification using dynamic and static camera recognition. To the best of the authors' knowledge, no similar library exists among the research community. Videos were sourced from YouTube and by exploring social networking sites extensively, observing posted videos and rating their feedback. The video tampering dataset (VTD) comprises a total of 33 videos, divided among three categories of video tampering: (1) copy-move, (2) splicing, and (3) frame swapping. Compared to existing datasets, this is a higher number of tampered videos, with longer durations. The duration of every video is 16 s, with a 1280×720 resolution and a frame rate of 30 frames per second. Moreover, all videos share the same format and quality (720p HD, .avi). Both temporal and spatial video features were considered carefully during selection of the videos, and complete information is provided about the doctored regions in every modified video in the VTD dataset. The database has been made publicly available for research on splicing, frame swapping, and copy-move tampering and, as such, provides ground truth for various video tampering detection issues. It has been utilised by many international researchers and research groups.

  1. A comparison between general circulation model simulations using two sea surface temperature datasets for January 1979

    NASA Technical Reports Server (NTRS)

    Ose, Tomoaki; Mechoso, Carlos; Halpern, David

    1994-01-01

    Simulations with the UCLA atmospheric general circulation model (AGCM) using two different global sea surface temperature (SST) datasets for January 1979 are compared. One of these datasets is based on Comprehensive Ocean-Atmosphere Data Set (COADS) SSTs at locations where there are ship reports, and on climatology elsewhere; the other is derived from measurements by instruments onboard NOAA satellites. In the former dataset (COADS SST), data are concentrated along shipping routes in the Northern Hemisphere; in the latter dataset (HIRS SST, from the High Resolution Infrared Sounder), data cover the global domain. Ensembles of five 30-day mean fields are obtained from integrations performed in the perpetual-January mode. The results are presented as anomalies, that is, departures of each ensemble mean from that produced in a control simulation with climatological SSTs. Large differences are found between the anomalies obtained using COADS and HIRS SSTs, even in the Northern Hemisphere where the datasets are most similar to each other. The internal variability of the circulation in the control simulation and the simulated atmospheric response to anomalous forcings appear to be linked, in that the pattern of geopotential height anomalies obtained using COADS SSTs resembles the first empirical orthogonal function (EOF 1) in the control simulation. The corresponding pattern obtained using HIRS SSTs is substantially different and somewhat resembles EOF 2 in the sector from central North America to central Asia. To gain insight into the reasons for these results, three additional simulations are carried out with SST anomalies confined to regions where COADS SSTs are substantially warmer than HIRS SSTs. The regions correspond to warm pools in the northwest and northeast Pacific, and the northwest Atlantic. These warm pools tend to produce positive geopotential height anomalies in the northeastern part of the corresponding oceans. Both warm pools in the Pacific produce large

  2. Nanomaterial datasets to advance tomography in scanning transmission electron microscopy

    NASA Astrophysics Data System (ADS)

    Levin, Barnaby D. A.; Padgett, Elliot; Chen, Chien-Chun; Scott, M. C.; Xu, Rui; Theis, Wolfgang; Jiang, Yi; Yang, Yongsoo; Ophus, Colin; Zhang, Haitao; Ha, Don-Hyung; Wang, Deli; Yu, Yingchao; Abruña, Hector D.; Robinson, Richard D.; Ercius, Peter; Kourkoutis, Lena F.; Miao, Jianwei; Muller, David A.; Hovden, Robert

    2016-06-01

    Electron tomography in materials science has flourished with the demand to characterize nanoscale materials in three dimensions (3D). Access to experimental data is vital for developing and validating reconstruction methods that improve resolution and reduce radiation dose requirements. This work presents five high-quality scanning transmission electron microscope (STEM) tomography datasets in order to address the critical need for open access data in this field. The datasets represent the current limits of experimental technique, are of high quality, and contain materials with structural complexity. Included are tomographic series of a hyperbranched Co2P nanocrystal, platinum nanoparticles on a carbon nanofibre imaged over the complete 180° tilt range, a platinum nanoparticle and a tungsten needle both imaged at atomic resolution by equal slope tomography, and a through-focal tilt series of PtCu nanoparticles. A volumetric reconstruction from every dataset is provided for comparison and development of post-processing and visualization techniques. Researchers interested in creating novel data processing and reconstruction algorithms will now have access to state of the art experimental test data.

  3. Strategies for analyzing highly enriched IP-chip datasets

    PubMed Central

    Knott, Simon RV; Viggiani, Christopher J; Aparicio, Oscar M; Tavaré, Simon

    2009-01-01

    Background: Chromatin immunoprecipitation on tiling arrays (ChIP-chip) has been employed to examine features such as protein binding and histone modifications on a genome-wide scale in a variety of cell types. Array data from the latter studies typically have a high proportion of enriched probes whose signals vary considerably (due to heterogeneity in the cell population), and this makes their normalization and downstream analysis difficult. Results: Here we present strategies for analyzing such experiments, focusing our discussion on the analysis of Bromodeoxyuridine (BrdU) immunoprecipitation on tiling array (BrdU-IP-chip) datasets. BrdU-IP-chip experiments map large, recently replicated genomic regions and have similar characteristics to histone modification/location data. To prepare such data for downstream analysis we employ a dynamic programming algorithm that identifies a set of putative unenriched probes, which we use for both within-array and between-array normalization. We also introduce a second dynamic programming algorithm that incorporates a priori knowledge to identify and quantify positive signals in these datasets. Conclusion: Highly enriched IP-chip datasets are often difficult to analyze with traditional array normalization and analysis strategies. Here we present and test a set of analytical tools for their normalization and quantification that allows for accurate identification and analysis of enriched regions. PMID:19772646
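
    The paper's dynamic programming algorithms are not reproduced here, but the normalization idea can be sketched: given a set of putative unenriched probes (however identified), centre and rescale each array against that set. The median/MAD scheme below is an assumption for illustration, not the authors' exact procedure.

      # Normalize IP-chip arrays against a given set of putative unenriched probes.
      import numpy as np

      def normalize(arrays, unenriched):
          """arrays: (n_arrays, n_probes) log-ratios; unenriched: boolean probe mask."""
          # within-array: centre each array so its unenriched probes sit at zero
          centred = arrays - np.median(arrays[:, unenriched], axis=1, keepdims=True)
          # between-array: equalise the spread of the unenriched probes
          mad = np.median(np.abs(centred[:, unenriched]), axis=1, keepdims=True)
          return centred * (mad.mean() / mad)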

  4. Web-based 2-d Visualization with Large Datasets

    NASA Astrophysics Data System (ADS)

    Goldina, T.; Roby, W.; Wu, X.; Ly, L.

    2015-09-01

    Modern astronomical surveys produce large catalogs. Modern archives are web-based. As the science becomes more and more data driven, the pressure on visualization tools to support large datasets increases. While tables can render one page at a time, image overlays showing the returned catalog entries or XY plots showing the relationship between table columns must cover all of the rows to be meaningful. A large dataset could easily overwhelm the browser's capabilities. Therefore the amount of data to be transported or rendered must be reduced. IRSA's catalog visualization is based on the Firefly package, developed at IPAC (Roby 2013). Firefly is used by multiple web-based tools and archives maintained by IRSA: Catalog Search, Spitzer, WISE, Planck, etc. Its distinctive feature is the tri-view: table, image overlay, and XY plot. All three highly interactive components are integrated together. The tri-view presentation allows an astronomer to dissect a dataset in various ways and to detect underlying structure and anomalies in the data, which makes it a handy tool for data exploration. Many challenges are encountered when only a subset of data is used in place of the full dataset: preserving coherence and maintaining the ability to select and filter data become issues. This talk addresses how we have solved these problems in large dataset visualization.

  5. Classification of large microarray datasets using fast random forest construction.

    PubMed

    Manilich, Elena A; Özsoyoğlu, Z Meral; Trubachev, Valeriy; Radivoyevitch, Tomas

    2011-04-01

    Random forest is an ensemble classification algorithm. It performs well when most predictive variables are noisy and can be used when the number of variables is much larger than the number of observations. The use of bootstrap samples and restricted subsets of attributes makes it more powerful than simple ensembles of trees. The main advantage of a random forest classifier is its explanatory power: it measures variable importance or impact of each factor on a predicted class label. These characteristics make the algorithm ideal for microarray data. It was shown to build models with high accuracy when tested on high-dimensional microarray datasets. Current implementations of random forest in the machine learning and statistics community, however, limit its usability for mining over large datasets, as they require that the entire dataset remains permanently in memory. We propose a new framework, an optimized implementation of a random forest classifier, which addresses specific properties of microarray data, takes computational complexity of a decision tree algorithm into consideration, and shows excellent computing performance while preserving predictive accuracy. The implementation is based on reducing overlapping computations and eliminating dependency on the size of main memory. The implementation's excellent computational performance makes the algorithm useful for interactive data analyses and data mining.
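
    For orientation, the sketch below runs a standard random forest (scikit-learn) on microarray-shaped data with far more variables than observations, including the variable-importance ranking the abstract mentions; it illustrates the baseline algorithm, not the optimized implementation the paper proposes.

      # Random forest on microarray-shaped data (p >> n), with variable importance.
      import numpy as np
      from sklearn.ensemble import RandomForestClassifier

      rng = np.random.default_rng(0)
      X = rng.normal(size=(60, 5000))   # 60 samples x 5000 genes (synthetic)
      y = rng.integers(0, 2, size=60)   # hypothetical class labels

      clf = RandomForestClassifier(n_estimators=500, max_features="sqrt", n_jobs=-1)
      clf.fit(X, y)
      top = np.argsort(clf.feature_importances_)[::-1][:10]
      print("ten most 'important' genes:", top)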

  6. New public dataset for spotting patterns in medieval document images

    NASA Astrophysics Data System (ADS)

    En, Sovann; Nicolas, Stéphane; Petitjean, Caroline; Jurie, Frédéric; Heutte, Laurent

    2017-01-01

    With advances in technology, a large part of our cultural heritage is becoming digitally available. In particular, in the field of historical document image analysis, there is now a growing need for indexing and data mining tools, thus allowing us to spot and retrieve the occurrences of an object of interest, called a pattern, in a large database of document images. Patterns may present some variability in terms of color, shape, or context, making the spotting of patterns a challenging task. Pattern spotting is a relatively new field of research, still hampered by the lack of available annotated resources. We present a new publicly available dataset named DocExplore dedicated to spotting patterns in historical document images. The dataset contains 1500 images and 1464 queries, and allows the evaluation of two tasks: image retrieval and pattern localization. A standardized benchmark protocol along with ad hoc metrics is provided for a fair comparison of the submitted approaches. We also provide some first results obtained with our baseline system on this new dataset, which show that there is room for improvement and should encourage researchers in the document image analysis community to design new systems and submit improved results.

  7. Igloo-Plot: a tool for visualization of multidimensional datasets.

    PubMed

    Kuntal, Bhusan K; Ghosh, Tarini Shankar; Mande, Sharmila S

    2014-01-01

    Advances in science and technology have resulted in an exponential growth of multivariate (or multi-dimensional) datasets which are being generated from various research areas especially in the domain of biological sciences. Visualization and analysis of such data (with the objective of uncovering the hidden patterns therein) is an important and challenging task. We present a tool, called Igloo-Plot, for efficient visualization of multidimensional datasets. The tool addresses some of the key limitations of contemporary multivariate visualization and analysis tools. The visualization layout, not only facilitates an easy identification of clusters of data-points having similar feature compositions, but also the 'marker features' specific to each of these clusters. The applicability of the various functionalities implemented herein is demonstrated using several well studied multi-dimensional datasets. Igloo-Plot is expected to be a valuable resource for researchers working in multivariate data mining studies. Igloo-Plot is available for download from: http://metagenomics.atc.tcs.com/IglooPlot/.

  8. Nanomaterial datasets to advance tomography in scanning transmission electron microscopy

    PubMed Central

    Levin, Barnaby D.A.; Padgett, Elliot; Chen, Chien-Chun; Scott, M.C.; Xu, Rui; Theis, Wolfgang; Jiang, Yi; Yang, Yongsoo; Ophus, Colin; Zhang, Haitao; Ha, Don-Hyung; Wang, Deli; Yu, Yingchao; Abruña, Hector D.; Robinson, Richard D.; Ercius, Peter; Kourkoutis, Lena F.; Miao, Jianwei; Muller, David A.; Hovden, Robert

    2016-01-01

    Electron tomography in materials science has flourished with the demand to characterize nanoscale materials in three dimensions (3D). Access to experimental data is vital for developing and validating reconstruction methods that improve resolution and reduce radiation dose requirements. This work presents five high-quality scanning transmission electron microscope (STEM) tomography datasets in order to address the critical need for open access data in this field. The datasets represent the current limits of experimental technique, are of high quality, and contain materials with structural complexity. Included are tomographic series of a hyperbranched Co2P nanocrystal, platinum nanoparticles on a carbon nanofibre imaged over the complete 180° tilt range, a platinum nanoparticle and a tungsten needle both imaged at atomic resolution by equal slope tomography, and a through-focal tilt series of PtCu nanoparticles. A volumetric reconstruction from every dataset is provided for comparison and development of post-processing and visualization techniques. Researchers interested in creating novel data processing and reconstruction algorithms will now have access to state of the art experimental test data. PMID:27272459

  9. Securely Measuring the Overlap between Private Datasets with Cryptosets

    PubMed Central

    Swamidass, S. Joshua; Matlock, Matthew; Rozenblit, Leon

    2015-01-01

    Many scientific questions are best approached by sharing data—collected by different groups or across large collaborative networks—into a combined analysis. Unfortunately, some of the most interesting and powerful datasets—like health records, genetic data, and drug discovery data—cannot be freely shared because they contain sensitive information. In many situations, knowing if private datasets overlap determines if it is worthwhile to navigate the institutional, ethical, and legal barriers that govern access to sensitive, private data. We report the first method of publicly measuring the overlap between private datasets that is secure under a malicious model without relying on private protocols or message passing. This method uses a publicly shareable summary of a dataset’s contents, its cryptoset, to estimate its overlap with other datasets. Cryptosets approach “information-theoretic” security, the strongest type of security possible in cryptography, which is not even crackable with infinite computing power. We empirically and theoretically assess both the accuracy of these estimates and the security of the approach, demonstrating that cryptosets are informative, with a stable accuracy, and secure. PMID:25714898
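
    A much-simplified sketch of the idea (not the published protocol): each party hashes its private identifiers into a fixed-length public count vector, and the overlap is estimated from the inner product of two such vectors after correcting for expected hash collisions.

      # Toy cryptoset: hash IDs into L bins; estimate overlap from the dot product.
      import hashlib
      import numpy as np

      L = 1024  # shared, public cryptoset length

      def cryptoset(ids):
          counts = np.zeros(L, dtype=int)
          for s in ids:
              counts[int(hashlib.sha256(s.encode()).hexdigest(), 16) % L] += 1
          return counts

      def estimate_overlap(a, b):
          na, nb = a.sum(), b.sum()
          # under uniform hashing, E[a.b] = k + (na*nb - k)/L for k shared items
          return (float(a @ b) - na * nb / L) / (1.0 - 1.0 / L)

      ids_a = [f"record-{i}" for i in range(500)]
      ids_b = [f"record-{i}" for i in range(400, 900)]  # 100 truly shared IDs
      print(estimate_overlap(cryptoset(ids_a), cryptoset(ids_b)))  # approx. 100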

  10. Multiresolution persistent homology for excessively large biomolecular datasets

    NASA Astrophysics Data System (ADS)

    Xia, Kelin; Zhao, Zhixiong; Wei, Guo-Wei

    2015-10-01

    Although persistent homology has emerged as a promising tool for the topological simplification of complex data, it is computationally intractable for large datasets. We introduce multiresolution persistent homology to handle excessively large datasets. We match the resolution with the scale of interest so as to represent large scale datasets with appropriate resolution. We utilize the flexibility-rigidity index to assess the topological connectivity of the data set and define a rigidity density for the filtration analysis. By appropriately tuning the resolution of the rigidity density, we are able to focus the topological lens on the scale of interest. The proposed multiresolution topological analysis is validated by a hexagonal fractal image which has three distinct scales. We further demonstrate the proposed method for extracting topological fingerprints from DNA molecules. In particular, the topological persistence of a virus capsid with 273 780 atoms is successfully analyzed, which would otherwise be inaccessible to the normal point cloud method and unreliable by using coarse-grained multiscale persistent homology. The proposed method has also been successfully applied to the protein domain classification, which is the first time that persistent homology is used for practical protein domain analysis, to our knowledge. The proposed multiresolution topological method has potential applications in arbitrary data sets, such as social networks, biological networks, and graphs.
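
    A minimal sketch of a flexibility-rigidity-index style rigidity density with a Gaussian kernel; the resolution parameter eta is the knob the multiresolution analysis tunes (the kernel form and values here are assumptions for illustration).

      # Rigidity density on a grid; eta sets the resolution of the filtration.
      import numpy as np

      def rigidity_density(atoms, grid, eta):
          """atoms: (N, 3) coordinates; grid: (M, 3) points; eta: length scale."""
          d2 = ((grid[:, None, :] - atoms[None, :, :]) ** 2).sum(axis=-1)  # (M, N)
          return np.exp(-d2 / eta ** 2).sum(axis=1)

      atoms = np.random.default_rng(1).normal(size=(200, 3))
      grid = np.mgrid[-2:2:20j, -2:2:20j, -2:2:20j].reshape(3, -1).T
      coarse = rigidity_density(atoms, grid, eta=2.0)  # resolves global shape
      fine = rigidity_density(atoms, grid, eta=0.25)   # resolves atomic detail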

  11. Multiresolution persistent homology for excessively large biomolecular datasets

    PubMed Central

    Xia, Kelin; Zhao, Zhixiong; Wei, Guo-Wei

    2015-01-01

    Although persistent homology has emerged as a promising tool for the topological simplification of complex data, it is computationally intractable for large datasets. We introduce multiresolution persistent homology to handle excessively large datasets. We match the resolution with the scale of interest so as to represent large scale datasets with appropriate resolution. We utilize the flexibility-rigidity index to assess the topological connectivity of the data set and define a rigidity density for the filtration analysis. By appropriately tuning the resolution of the rigidity density, we are able to focus the topological lens on the scale of interest. The proposed multiresolution topological analysis is validated by a hexagonal fractal image which has three distinct scales. We further demonstrate the proposed method for extracting topological fingerprints from DNA molecules. In particular, the topological persistence of a virus capsid with 273 780 atoms is successfully analyzed, which would otherwise be inaccessible to the normal point cloud method and unreliable by using coarse-grained multiscale persistent homology. The proposed method has also been successfully applied to the protein domain classification, which is the first time that persistent homology is used for practical protein domain analysis, to our knowledge. The proposed multiresolution topological method has potential applications in arbitrary data sets, such as social networks, biological networks, and graphs. PMID:26450288

  12. Synchronization of networks of chaotic oscillators: Structural and dynamical datasets.

    PubMed

    Sevilla-Escoboza, Ricardo; Buldú, Javier M

    2016-06-01

    We provide the topological structure of a series of N=28 Rössler chaotic oscillators diffusively coupled through one of their variables. The dynamics of the y variable describing the evolution of the individual nodes of the network are given for a wide range of coupling strengths. Datasets capture the transition from unsynchronized to synchronized behavior as a function of the coupling strength between oscillators. The fact that both the underlying topology of the system and the dynamics of the nodes are given together makes this dataset a suitable candidate for evaluating the interplay between functional and structural networks and for serving as a benchmark to quantify the ability of a given algorithm to extract the structural network of connections from observations of the dynamics of the nodes. At the same time, it is possible to use the dataset to analyze the different dynamical properties (randomness, complexity, reproducibility, etc.) of an ensemble of oscillators as a function of the coupling strength.
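
    An illustrative integration of such a network (the adjacency matrix and parameter values below are assumptions; the dataset itself supplies the real topology):

      # N = 28 Rossler oscillators diffusively coupled through the y variable.
      import numpy as np
      from scipy.integrate import solve_ivp

      N, a, b, c, sigma = 28, 0.2, 0.2, 7.0, 0.05
      A = np.zeros((N, N))                    # assumed ring topology
      for i in range(N):
          A[i, (i + 1) % N] = A[i, (i - 1) % N] = 1.0
      Lap = np.diag(A.sum(axis=1)) - A        # graph Laplacian

      def rossler_net(t, s):
          x, y, z = s.reshape(3, N)
          dx = -y - z
          dy = x + a * y - sigma * (Lap @ y)  # diffusive coupling in y
          dz = b + z * (x - c)
          return np.concatenate([dx, dy, dz])

      s0 = np.random.default_rng(2).uniform(-1, 1, 3 * N)
      sol = solve_ivp(rossler_net, (0, 500), s0, t_eval=np.linspace(400, 500, 2000))
      y_series = sol.y.reshape(3, N, -1)[1]   # y time series, one row per node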

  13. Multiresolution persistent homology for excessively large biomolecular datasets

    SciTech Connect

    Xia, Kelin; Zhao, Zhixiong; Wei, Guo-Wei

    2015-10-07

    Although persistent homology has emerged as a promising tool for the topological simplification of complex data, it is computationally intractable for large datasets. We introduce multiresolution persistent homology to handle excessively large datasets. We match the resolution with the scale of interest so as to represent large scale datasets with appropriate resolution. We utilize the flexibility-rigidity index to assess the topological connectivity of the data set and define a rigidity density for the filtration analysis. By appropriately tuning the resolution of the rigidity density, we are able to focus the topological lens on the scale of interest. The proposed multiresolution topological analysis is validated by a hexagonal fractal image which has three distinct scales. We further demonstrate the proposed method for extracting topological fingerprints from DNA molecules. In particular, the topological persistence of a virus capsid with 273 780 atoms is successfully analyzed, which would otherwise be inaccessible to the normal point cloud method and unreliable by using coarse-grained multiscale persistent homology. The proposed method has also been successfully applied to the protein domain classification, which is the first time that persistent homology is used for practical protein domain analysis, to our knowledge. The proposed multiresolution topological method has potential applications in arbitrary data sets, such as social networks, biological networks, and graphs.

  14. Nanomaterial datasets to advance tomography in scanning transmission electron microscopy.

    PubMed

    Levin, Barnaby D A; Padgett, Elliot; Chen, Chien-Chun; Scott, M C; Xu, Rui; Theis, Wolfgang; Jiang, Yi; Yang, Yongsoo; Ophus, Colin; Zhang, Haitao; Ha, Don-Hyung; Wang, Deli; Yu, Yingchao; Abruña, Hector D; Robinson, Richard D; Ercius, Peter; Kourkoutis, Lena F; Miao, Jianwei; Muller, David A; Hovden, Robert

    2016-06-07

    Electron tomography in materials science has flourished with the demand to characterize nanoscale materials in three dimensions (3D). Access to experimental data is vital for developing and validating reconstruction methods that improve resolution and reduce radiation dose requirements. This work presents five high-quality scanning transmission electron microscope (STEM) tomography datasets in order to address the critical need for open access data in this field. The datasets represent the current limits of experimental technique, are of high quality, and contain materials with structural complexity. Included are tomographic series of a hyperbranched Co2P nanocrystal, platinum nanoparticles on a carbon nanofibre imaged over the complete 180° tilt range, a platinum nanoparticle and a tungsten needle both imaged at atomic resolution by equal slope tomography, and a through-focal tilt series of PtCu nanoparticles. A volumetric reconstruction from every dataset is provided for comparison and development of post-processing and visualization techniques. Researchers interested in creating novel data processing and reconstruction algorithms will now have access to state of the art experimental test data.

  15. Production of a national 1:1,000,000-scale hydrography dataset for the United States: feature selection, simplification, and refinement

    USGS Publications Warehouse

    Gary, Robin H.; Wilson, Zachary D.; Archuleta, Christy-Ann M.; Thompson, Florence E.; Vrabel, Joseph

    2009-01-01

    During 2006-09, the U.S. Geological Survey, in cooperation with the National Atlas of the United States, produced a 1:1,000,000-scale (1:1M) hydrography dataset comprising streams and waterbodies for the entire United States, including Puerto Rico and the U.S. Virgin Islands, for inclusion in the recompiled National Atlas. This report documents the methods used to select, simplify, and refine features in the 1:100,000-scale (1:100K) (1:63,360-scale in Alaska) National Hydrography Dataset to create the national 1:1M hydrography dataset. Custom tools and semi-automated processes were created to facilitate generalization of the 1:100K National Hydrography Dataset (1:63,360-scale in Alaska) to 1:1M on the basis of existing small-scale hydrography datasets. The first step in creating the new 1:1M dataset was to address feature selection and optimal data density in the streams network. Several existing methods were evaluated. The production method that was established for selecting features for inclusion in the 1:1M dataset uses a combination of the existing attributes and network in the National Hydrography Dataset and several of the concepts from the methods evaluated. The process for creating the 1:1M waterbodies dataset required a similar approach to that used for the streams dataset. Geometric simplification of features was the next step. Stream reaches and waterbodies indicated in the feature selection process were exported as new feature classes and then simplified using a geographic information system tool. The final step was refinement of the 1:1M streams and waterbodies. Refinement was done through the use of additional geographic information system tools.

  16. Evaluation of catchment delineation methods for the medium-resolution National Hydrography Dataset

    USGS Publications Warehouse

    Johnston, Craig M.; Dewald, Thomas G.; Bondelid, Timothy R.; Worstell, Bruce B.; McKay, Lucinda D.; Rea, Alan; Moore, Richard B.; Goodall, Jonathan L.

    2009-01-01

    Different methods for determining catchments (incremental drainage areas) for stream segments of the medium-resolution (1:100,000-scale) National Hydrography Dataset (NHD) were evaluated by the U.S. Geological Survey (USGS), in cooperation with the U.S. Environmental Protection Agency (USEPA). The NHD is a comprehensive set of digital spatial data that contains information about surface-water features (such as lakes, ponds, streams, and rivers) of the United States. The need for NHD catchments was driven primarily by the goal to estimate NHD streamflow and velocity to support water-quality modeling. The application of catchments for this purpose also demonstrates the broader value of NHD catchments for supporting landscape characterization and analysis. Five catchment delineation methods were evaluated. Four of the methods use topographic information for the delineation of the NHD catchments. These methods include the Raster Seeding Method; two variants of a method first used in a USGS New England study-one used the Watershed Boundary Dataset (WBD) and the other did not-termed the 'New England Methods'; and the Outlet Matching Method. For these topographically based methods, the elevation data source was the 30-meter (m) resolution National Elevation Dataset (NED), as this was the highest resolution available for the conterminous United States and Hawaii. The fifth method evaluated, the Thiessen Polygon Method, uses distance to the nearest NHD stream segments to determine catchment boundaries. Catchments were generated using each method for NHD stream segments within six hydrologically and geographically distinct Subbasins to evaluate the applicability of the method across the United States. The five methods were evaluated by comparing the resulting catchments with the boundaries and the computed area measurements available from several verification datasets that were developed independently using manual methods. The results of the evaluation indicated that the two
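
    Of the five methods, the Thiessen Polygon Method is simple enough to sketch: every cell is assigned to its nearest NHD stream segment, so catchment boundaries fall midway between segments and no topography is used (the coordinates and segment IDs below are hypothetical).

      # Thiessen-style catchments: label grid cells by nearest stream-segment vertex.
      import numpy as np
      from scipy.spatial import cKDTree

      seg_xy = np.array([[0.0, 0.0], [1.0, 0.2], [2.0, 1.5], [0.5, 2.0]])
      seg_id = np.array([101, 101, 102, 103])  # NHD segment ID per vertex

      xs, ys = np.meshgrid(np.linspace(0, 2, 200), np.linspace(0, 2, 200))
      cells = np.column_stack([xs.ravel(), ys.ravel()])

      _, nearest = cKDTree(seg_xy).query(cells)      # nearest vertex per cell
      catchment = seg_id[nearest].reshape(xs.shape)  # per-cell catchment label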

  17. Online Visualization and Analysis of Merged Global Geostationary Satellite Infrared Dataset

    NASA Astrophysics Data System (ADS)

    Liu, Z.; Ostrenga, D.; Leptoukh, G.; Mehta, A.

    2008-12-01

    The NASA Goddard Earth Sciences Data and Information Services Center (GES DISC) is home of the Tropical Rainfall Measuring Mission (TRMM) data archive. The global merged IR product, also known as the NCEP/CPC 4-km Global (60°N - 60°S) IR Dataset, is one of the TRMM ancillary datasets. It consists of globally merged (60°N-60°S), pixel-resolution (4 km) IR brightness temperature data (equivalent blackbody temperatures), merged from all available geostationary satellites (GOES-8/10, METEOSAT-7/5 and GMS). The availability of data from METEOSAT-5, currently located at 63°E, yields a unique opportunity for total global (60°N-60°S) coverage. The GES DISC has collected over 8 years of the data, beginning in February 2000. This high temporal resolution dataset can not only provide additional background information to TRMM and other satellite missions, but also allows observation of a wide range of meteorological phenomena from space, such as mesoscale convective systems, tropical cyclones, and hurricanes. The dataset can also be used to verify model simulations. Although the data can be downloaded via FTP, its large volume poses a challenge for many users. A single file occupies about 70 MB of disk space, and there are ~73,000 files (~4.5 TB) for the past 8 years. Because there is no data subsetting service, one has to download entire files, which can be time consuming and requires a lot of disk space. In order to facilitate data access, we have developed a web prototype, the Global Image ViewER (GIVER), to allow users to conduct online visualization and analysis of this dataset. With a web browser and a few mouse clicks, users have full access to over 8 years and over 4.5 TB of data and can generate black-and-white IR imagery and animations without downloading any software or data. Basic functions include selection of an area of interest, single imagery or animation, a time-skip capability for different temporal resolutions, and image size. Users

  18. Online Visualization and Analysis of Merged Global Geostationary Satellite Infrared Dataset

    NASA Technical Reports Server (NTRS)

    Liu, Zhong; Ostrenga, D.; Leptoukh, G.; Mehta, A.

    2008-01-01

    The NASA Goddard Earth Sciences Data and Information Services Center (GES DISC) is home of the Tropical Rainfall Measuring Mission (TRMM) data archive. The global merged IR product, also known as the NCEP/CPC 4-km Global (60 degrees N - 60 degrees S) IR Dataset, is one of the TRMM ancillary datasets. It consists of globally merged (60 degrees N - 60 degrees S), pixel-resolution (4 km) IR brightness temperature data (equivalent blackbody temperatures), merged from all available geostationary satellites (GOES-8/10, METEOSAT-7/5 and GMS). The availability of data from METEOSAT-5, currently located at 63E, yields a unique opportunity for total global (60 degrees N - 60 degrees S) coverage. The GES DISC has collected over 8 years of the data, beginning in February 2000. This high temporal resolution dataset can not only provide additional background information to TRMM and other satellite missions, but also allows observation of a wide range of meteorological phenomena from space, such as mesoscale convective systems, tropical cyclones, and hurricanes. The dataset can also be used to verify model simulations. Although the data can be downloaded via FTP, its large volume poses a challenge for many users. A single file occupies about 70 MB of disk space, and there are approximately 73,000 files (approximately 4.5 TB) for the past 8 years. In order to facilitate data access, we have developed a web prototype to allow users to conduct online visualization and analysis of this dataset. With a web browser and a few mouse clicks, users have full access to over 8 years and over 4.5 TB of data and can generate black-and-white IR imagery and animations without downloading any software or data. In short, you can make your own images! Basic functions include selection of an area of interest, single imagery or animation, a time-skip capability for different temporal resolutions, and image size. Users can save an animation as a file (animated gif) and import it in other

  19. CHARMe Commentary metadata for Climate Science: collecting, linking and sharing user feedback on climate datasets

    NASA Astrophysics Data System (ADS)

    Blower, Jon; Lawrence, Bryan; Kershaw, Philip; Nagni, Maurizio

    2014-05-01

    The research process can be thought of as an iterative activity, initiated based on prior domain knowledge as well as on a number of external inputs, and producing a range of outputs including datasets, studies and peer-reviewed publications. These outputs may describe the problem under study, the methodology used, the results obtained, etc. In any new publication, the author may cite or comment on other papers or datasets in order to support their research hypothesis. However, as their work progresses, the researcher may draw from many other latent channels of information. These could include, for example, a private conversation following a lecture or during a social dinner, or an opinion expressed concerning some significant event such as an earthquake or a satellite failure. In addition, other public sources of grey literature are important, such as informal papers (e.g. arXiv deposits), reports and studies. The climate science community is no exception to this pattern; the CHARMe project, funded under the European FP7 framework, is developing an online system for collecting and sharing user feedback on climate datasets. This is to help users judge how suitable such climate data are for an intended application. The user feedback could be comments about assessments, citations, or provenance of the dataset, or other information such as descriptions of uncertainty or data quality. We define this as a distinct category of metadata called Commentary or C-metadata. We link C-metadata with target climate datasets using a Linked Data approach via the Open Annotation data model. In the context of Linked Data, C-metadata plays the role of a resource which, depending on its nature, may be accessed as simple text or as more structured content. The project is implementing a range of software tools to create, search or visualize C-metadata, including a JavaScript plugin enabling this functionality to be integrated in situ with data provider portals.

  20. Improving the discoverability, accessibility, and citability of omics datasets: a case report.

    PubMed

    Darlington, Yolanda F; Naumov, Alexey; McOwiti, Apollo; Kankanamge, Wasula H; Becnel, Lauren B; McKenna, Neil J

    2016-07-12

    Although omics datasets represent valuable assets for hypothesis generation, model testing, and data validation, the infrastructure supporting their reuse lacks organization and consistency. Using nuclear receptor signaling transcriptomic datasets as proof of principle, we developed a model to improve the discoverability, accessibility, and citability of published omics datasets. Primary datasets were retrieved from archives, processed to extract data points, then subjected to metadata enrichment and gap filling. The resulting secondary datasets were exposed on responsive web pages to support mining of gene lists, discovery of related datasets, and single-click citation integration with popular reference managers. Automated processes were established to embed digital object identifier-driven links to the secondary datasets in associated journal articles, small molecule and gene-centric databases, and a dataset search engine. Our model creates multiple points of access to reprocessed and reannotated derivative datasets across the digital biomedical research ecosystem, promoting their visibility and usability across disparate research communities.

  1. Can atmospheric reanalysis datasets be used to reproduce flood characteristics?

    NASA Astrophysics Data System (ADS)

    Andreadis, K.; Schumann, G.; Stampoulis, D.

    2014-12-01

    Floods are one of the costliest natural disasters and the ability to understand their characteristics and their interactions with population, land cover and climate changes is of paramount importance. In order to accurately reproduce flood characteristics such as water inundation and heights both in the river channels and floodplains, hydrodynamic models are required. Most of these models operate at very high resolutions and are computationally very expensive, making their application over large areas very difficult. However, a need exists for such models to be applied at regional to global scales so that the effects of climate change with regards to flood risk can be examined. We use the LISFLOOD-FP hydrodynamic model to simulate a 40-year history of flood characteristics at the continental scale, particularly over Australia. LISFLOOD-FP is a 2-D hydrodynamic model that solves the approximate Saint-Venant equations at large scales (on the order of 1 km) using a sub-grid representation of the river channel. This implementation is part of an effort towards a global 1-km flood modeling framework that will allow the reconstruction of a long-term flood climatology. The components of this framework include a hydrologic model (the widely-used Variable Infiltration Capacity model) and a meteorological dataset that forces it. In order to extend the simulated flood climatology to 50-100 years in a consistent manner, reanalysis datasets have to be used. The objective of this study is the evaluation of multiple atmospheric reanalysis datasets (ERA, NCEP, MERRA, JRA) as inputs to the VIC/LISFLOOD-FP model. Comparisons of the simulated flood characteristics are made with both satellite observations of inundation and a benchmark simulation of LISFLOOD-FP being forced by observed flows. Finally, the implications of the availability of a global flood modeling framework for producing flood hazard maps and disseminating disaster information are discussed.

  2. A synthetic dataset for evaluating soft and hard fusion algorithms

    NASA Astrophysics Data System (ADS)

    Graham, Jacob L.; Hall, David L.; Rimland, Jeffrey

    2011-06-01

    There is an emerging demand for the development of data fusion techniques and algorithms that are capable of combining conventional "hard" sensor inputs such as video, radar, and multispectral sensor data with "soft" data including textual situation reports, open-source web information, and "hard/soft" data such as image or video data that includes human-generated annotations. New techniques that assist in sense-making over a wide range of vastly heterogeneous sources are critical to improving tactical situational awareness in counterinsurgency (COIN) and other asymmetric warfare situations. A major challenge in this area is the lack of realistic datasets available for test and evaluation of such algorithms. While "soft" message sets exist, they tend to be of limited use for data fusion applications due to the lack of critical message pedigree and other metadata. They also lack corresponding hard sensor data that presents reasonable "fusion opportunities" to evaluate the ability to make connections and inferences that span the soft and hard data sets. This paper outlines the design methodologies, content, and some potential use cases of a COIN-based synthetic soft and hard dataset created under a United States Multi-disciplinary University Research Initiative (MURI) program funded by the U.S. Army Research Office (ARO). The dataset includes realistic synthetic reports from a variety of sources, corresponding synthetic hard data, and an extensive supporting database that maintains "ground truth" through logical grouping of related data into "vignettes." The supporting database also maintains the pedigree of messages and other critical metadata.

  3. Determining similarity of scientific entities in annotation datasets.

    PubMed

    Palma, Guillermo; Vidal, Maria-Esther; Haag, Eric; Raschid, Louiqa; Thor, Andreas

    2015-01-01

    Linked Open Data initiatives have made available a diversity of scientific collections where scientists have annotated entities in the datasets with controlled vocabulary terms from ontologies. Annotations encode scientific knowledge, which is captured in annotation datasets. Determining relatedness between annotated entities becomes a building block for pattern mining, e.g. identifying drug-drug relationships may depend on the similarity of the targets that interact with each drug. A diversity of similarity measures has been proposed in the literature to compute relatedness between a pair of entities. Each measure exploits some knowledge including the name, function, relationships with other entities, taxonomic neighborhood and semantic knowledge. We propose a novel general-purpose annotation similarity measure called 'AnnSim' that measures the relatedness between two entities based on the similarity of their annotations. We model AnnSim as a 1-1 maximum weight bipartite match and exploit properties of existing solvers to provide an efficient solution. We empirically study the performance of AnnSim on real-world datasets of drugs and disease associations from clinical trials and relationships between drugs and (genomic) targets. Using baselines that include a variety of measures, we identify where AnnSim can provide a deeper understanding of the semantics underlying the relatedness of a pair of entities or where it could lead to predicting new links or identifying potential novel patterns. Although AnnSim does not exploit knowledge or properties of a particular domain, its performance compares well with a variety of state-of-the-art domain-specific measures. Database URL: http://www.yeastgenome.org/
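
    The matching at the heart of the measure can be sketched with an off-the-shelf solver; the pairwise term-similarity matrix and the normalisation below are illustrative assumptions, not the paper's exact formulation.

      # 1-1 maximum-weight bipartite matching similarity, AnnSim-style.
      import numpy as np
      from scipy.optimize import linear_sum_assignment

      def annsim(sim):
          """sim[i, j]: similarity of annotation i of entity 1 to annotation j of entity 2."""
          rows, cols = linear_sum_assignment(sim, maximize=True)
          # normalise the matched weight by the total number of annotations
          return 2.0 * sim[rows, cols].sum() / (sim.shape[0] + sim.shape[1])

      sim = np.array([[0.9, 0.1, 0.3],
                      [0.2, 0.8, 0.4]])  # 2 annotations vs 3 annotations
      print(annsim(sim))                 # -> 2 * (0.9 + 0.8) / 5 = 0.68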

  4. Determining similarity of scientific entities in annotation datasets

    PubMed Central

    Palma, Guillermo; Vidal, Maria-Esther; Haag, Eric; Raschid, Louiqa; Thor, Andreas

    2015-01-01

    Linked Open Data initiatives have made available a diversity of scientific collections where scientists have annotated entities in the datasets with controlled vocabulary terms from ontologies. Annotations encode scientific knowledge, which is captured in annotation datasets. Determining relatedness between annotated entities becomes a building block for pattern mining, e.g. identifying drug–drug relationships may depend on the similarity of the targets that interact with each drug. A diversity of similarity measures has been proposed in the literature to compute relatedness between a pair of entities. Each measure exploits some knowledge including the name, function, relationships with other entities, taxonomic neighborhood and semantic knowledge. We propose a novel general-purpose annotation similarity measure called ‘AnnSim’ that measures the relatedness between two entities based on the similarity of their annotations. We model AnnSim as a 1–1 maximum weight bipartite match and exploit properties of existing solvers to provide an efficient solution. We empirically study the performance of AnnSim on real-world datasets of drugs and disease associations from clinical trials and relationships between drugs and (genomic) targets. Using baselines that include a variety of measures, we identify where AnnSim can provide a deeper understanding of the semantics underlying the relatedness of a pair of entities or where it could lead to predicting new links or identifying potential novel patterns. Although AnnSim does not exploit knowledge or properties of a particular domain, its performance compares well with a variety of state-of-the-art domain-specific measures. Database URL: http://www.yeastgenome.org/ PMID:25725057

  5. Scientific Datasets: Discovery and Aggregation for Semantic Interpretation.

    NASA Astrophysics Data System (ADS)

    Lopez, L. A.; Scott, S.; Khalsa, S. J. S.; Duerr, R.

    2015-12-01

    One of the biggest challenges that interdisciplinary researchers face is finding suitable datasets in order to advance their science; this problem remains consistent across multiple disciplines. A surprising number of scientists, when asked what tool they use for data discovery, reply "Google", which is an acceptable solution in some cases, but not even Google can find (or cares to compile) all the data that is relevant for science, and particularly for the geosciences. If a dataset is not discoverable through a well-known search provider, it will remain dark data to the scientific world. For the past year, BCube, an EarthCube Building Block project, has been developing, testing and deploying a technology stack capable of data discovery at web scale using the ultimate dataset: the Internet. This stack has two principal components: a web-scale crawling infrastructure and a semantic aggregator. The web crawler is a modified version of Apache Nutch (the originator of Hadoop and other big data technologies) that has been improved and tailored for data and data service discovery. The second component is semantic aggregation, carried out by a Python-based workflow that extracts valuable metadata and stores it in the form of triples through the use of semantic technologies. While implementing the BCube stack we have run into several challenges, such as (a) scaling the project to cover big portions of the Internet at a reasonable cost, (b) making sense of very diverse and non-homogeneous data, and (c) extracting facts about these datasets using semantic technologies in order to make them usable for the geosciences community. Despite all these challenges we have proven that we can discover and characterize data that otherwise would have remained in the dark corners of the Internet. Having all this data indexed and 'triplelized' will enable scientists to access a trove of information relevant to their work in a more natural way. An important characteristic of the BCube stack is that all

  6. Visualization and data sharing of COSMIC radio occultation dataset

    NASA Astrophysics Data System (ADS)

    Ho, Y.; Weber, W. J.; Chastang, J.; Murray, D.; McWhirter, J.; Integrated Data Viewer

    2010-12-01

    Visualization of the trajectory and sounding profile of the COSMIC netCDF dataset, and of its evolution through time, has been developed in Unidata's Integrated Data Viewer (IDV). The COSMIC radio occultation data are located on a remote data server called RAMADDA, a content management system for earth science data. The combination of these two software packages provides powerful visualization and analysis tools for sharing real-time and archived data for research and education. In this presentation we demonstrate the development and usage of these two software packages.

  7. Fast methods for training Gaussian processes on large datasets

    PubMed Central

    Moore, C. J.; Berry, C. P. L.; Gair, J. R.

    2016-01-01

    Gaussian process regression (GPR) is a non-parametric Bayesian technique for interpolating or fitting data. The main barrier to further uptake of this powerful tool rests in the computational costs associated with the matrices which arise when dealing with large datasets. Here, we derive some simple results which we have found useful for speeding up the learning stage in the GPR algorithm, and especially for performing Bayesian model comparison between different covariance functions. We apply our techniques to both synthetic and real data and quantify the speed-up relative to using nested sampling to numerically evaluate model evidences. PMID:27293793
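
    As a hedged illustration of Bayesian model comparison between covariance functions, the sketch below fits two standard kernels to synthetic data and compares their log marginal likelihoods; scikit-learn stands in here, and this does not reproduce the authors' speed-up techniques or their nested-sampling baseline.

```python
# Illustrative sketch: comparing covariance functions for GPR by their
# log marginal likelihood (evidence), on synthetic data.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, Matern

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 1))
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(50)

for kernel in (RBF(length_scale=1.0), Matern(length_scale=1.0, nu=1.5)):
    gp = GaussianProcessRegressor(kernel=kernel, alpha=0.01).fit(X, y)
    # Higher (less negative) evidence favors that covariance function.
    print(type(kernel).__name__, gp.log_marginal_likelihood_value_)
```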

  8. Agile data management for curation of genomes to watershed datasets

    NASA Astrophysics Data System (ADS)

    Varadharajan, C.; Agarwal, D.; Faybishenko, B.; Versteeg, R.

    2015-12-01

    A software platform is being developed for data management and assimilation [DMA] as part of the U.S. Department of Energy's Genomes to Watershed Sustainable Systems Science Focus Area 2.0. The DMA components and capabilities are driven by the project science priorities and the development is based on agile development techniques. The goal of the DMA software platform is to enable users to integrate and synthesize diverse and disparate field, laboratory, and simulation datasets, including geological, geochemical, geophysical, microbiological, hydrological, and meteorological data across a range of spatial and temporal scales. The DMA objectives are (a) developing an integrated interface to the datasets, (b) storing field monitoring data, laboratory analytical results of water and sediments samples collected into a database, (c) providing automated QA/QC analysis of data and (d) working with data providers to modify high-priority field and laboratory data collection and reporting procedures as needed. The first three objectives are driven by user needs, while the last objective is driven by data management needs. The project needs and priorities are reassessed regularly with the users. After each user session we identify development priorities to match the identified user priorities. For instance, data QA/QC and collection activities have focused on the data and products needed for on-going scientific analyses (e.g. water level and geochemistry). We have also developed, tested and released a broker and portal that integrates diverse datasets from two different databases used for curation of project data. The development of the user interface was based on a user-centered design process involving several user interviews and constant interaction with data providers. The initial version focuses on the most requested feature - i.e. finding the data needed for analyses through an intuitive interface. Once the data is found, the user can immediately plot and download data
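
    A minimal sketch of the kind of automated QA/QC pass described above, assuming a simple plausibility-range rule on water-level readings; the thresholds, column names, and no-data value are hypothetical.

```python
# Flag water-level readings outside a plausible range (thresholds assumed).
import pandas as pd

readings = pd.DataFrame({
    "timestamp": pd.date_range("2015-06-01", periods=5, freq="h"),
    "water_level_m": [1.2, 1.3, -9999.0, 1.4, 25.0],  # -9999 = sensor no-data
})

MIN_LEVEL, MAX_LEVEL = 0.0, 10.0  # site-specific plausibility bounds (assumed)
readings["qc_flag"] = readings["water_level_m"].between(MIN_LEVEL, MAX_LEVEL)
print(readings[~readings["qc_flag"]])  # rows needing review
```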

  9. Automatic segmentation of blood vessels from dynamic MRI datasets.

    PubMed

    Kubassova, Olga

    2007-01-01

    In this paper we present an approach for blood vessel segmentation from dynamic contrast-enhanced MRI datasets of the hand joints acquired from patients with active rheumatoid arthritis. Exclusion of the blood vessels is needed for accurate visualisation of the activation events and objective evaluation of the degree of inflammation. The segmentation technique is based on statistical modelling motivated by the physiological properties of the individual tissues, such as speed of uptake and concentration of the contrast agent; it incorporates Markov random field probabilistic framework and principal component analysis. The algorithm was tested on 60 temporal slices and has shown promising results.

  10. Additive Similarity Trees

    ERIC Educational Resources Information Center

    Sattath, Shmuel; Tversky, Amos

    1977-01-01

    Tree representations of similarity data are investigated. Hierarchical clustering is critically examined, and a more general procedure, called the additive tree, is presented. The additive tree representation is then compared to multidimensional scaling. (Author/JKS)

  11. Polyimide processing additives

    NASA Technical Reports Server (NTRS)

    Pratt, J. R.; St. Clair, T. L.; Burks, H. D.; Stoakley, D. M.

    1987-01-01

    A method has been found for enhancing the melt flow of thermoplastic polyimides during processing. A high molecular weight 422 copoly(amic acid) or copolyimide was fused with approximately 0.05 to 5 pct by weight of a low molecular weight amic acid or imide additive, and this melt was studied by capillary rheometry. Excellent flow and improved composite properties on graphite resulted from the addition of a PMDA-aniline additive to LARC-TPI. Solution viscosity studies imply that amic acid additives temporarily lower molecular weight and, hence, enlarge the processing window. Thus, compositions containing the additive have a lower melt viscosity for a longer time than those unmodified.

  12. [Food additives and healthiness].

    PubMed

    Heinonen, Marina

    2014-01-01

    Additives are used for improving food structure or preventing its spoilage, for example. Many substances used as additives are also naturally present in food. The safety of additives is evaluated according to commonly agreed principles. If high concentrations of an additive cause adverse health effects for humans, a limit of acceptable daily intake (ADI) is set for it. An additive is a risk only when ADI is exceeded. The healthiness of food is measured on the basis of nutrient density and scientifically proven effects.

  13. Fast automatic myocardial segmentation in 4D cine CMR datasets.

    PubMed

    Queirós, Sandro; Barbosa, Daniel; Heyde, Brecht; Morais, Pedro; Vilaça, João L; Friboulet, Denis; Bernard, Olivier; D'hooge, Jan

    2014-10-01

    A novel automatic 3D+time left ventricle (LV) segmentation framework is proposed for cardiac magnetic resonance (CMR) datasets. The proposed framework consists of three conceptual blocks to delineate both endo- and epicardial contours throughout the cardiac cycle: (1) an automatic 2D mid-ventricular initialization and segmentation; (2) an automatic stack initialization followed by a 3D segmentation at the end-diastolic phase; and (3) a tracking procedure. Hereto, we propose to adapt the recent B-spline Explicit Active Surfaces (BEAS) framework to the properties of CMR images by integrating dedicated energy terms. Moreover, we extend the coupled BEAS formalism towards its application in 3D MR data by adapting it to a cylindrical space suited to deal with the topology of the image data. Furthermore, a fast stack initialization method is presented for efficient initialization and to enforce consistent cylindrical topology. Finally, we make use of an anatomically constrained optical flow method for temporal tracking of the LV surface. The proposed framework has been validated on 45 CMR datasets taken from the 2009 MICCAI LV segmentation challenge. Results show the robustness, efficiency and competitiveness of the proposed method both in terms of accuracy and computational load.

  14. Development of a Watershed Boundary Dataset for Mississippi

    USGS Publications Warehouse

    Van Wilson, K.; Clair, Michael G.; Turnipseed, D. Phil; Rebich, Richard A.

    2009-01-01

    The U.S. Geological Survey, in cooperation with the Mississippi Department of Environmental Quality, U.S. Department of Agriculture-Natural Resources Conservation Service, Mississippi Department of Transportation, U.S. Department of Agriculture-Forest Service, and the Mississippi Automated Resource Information System, developed a 1:24,000-scale Watershed Boundary Dataset for Mississippi including watershed and subwatershed boundaries, codes, names, and drainage areas. The Watershed Boundary Dataset for Mississippi provides a standard geographical framework for water-resources and selected land-resources planning. The original 8-digit subbasins (hydrologic unit codes) were further subdivided into 10-digit watersheds and 12-digit subwatersheds - the exceptions are the Lower Mississippi River Alluvial Plain (known locally as the Delta) and the Mississippi River inside levees, which were only subdivided into 10-digit watersheds. Also, large water bodies in the Mississippi Sound along the coast were not delineated as small as a typical 12-digit subwatershed. All of the data - including watershed and subwatershed boundaries, hydrologic unit codes and names, and drainage-area data - are stored in a Geographic Information System database.

  15. Testing the Neutral Theory of Biodiversity with Human Microbiome Datasets

    PubMed Central

    Li, Lianwei; Ma, Zhanshan (Sam)

    2016-01-01

    The human microbiome project (HMP) has made it possible to test important ecological theories for arguably the ecosystem most important to human health: the human microbiome. The limited number of existing studies has reported conflicting evidence regarding the neutral theory; the present study aims to comprehensively test the neutral theory with extensive HMP datasets covering all five major body sites inhabited by the human microbiome. Utilizing 7437 datasets of bacterial community samples, we discovered that only 49 communities (less than 1%) satisfied the neutral theory, and concluded that human microbial communities are not neutral in general. The 49 positive cases, although only a tiny minority, do demonstrate the existence of neutral processes. We realize that the traditional doctrine of microbial biogeography, "Everything is everywhere, but the environment selects", first proposed by Baas-Becking, resolves the apparent contradiction. The first part of the Baas-Becking doctrine states that microbes are not dispersal-limited and are therefore prone to neutrality, and the second part reiterates that the freely dispersed microbes must endure selection by the environment. Therefore, in most cases, it is the host environment that ultimately shapes the community assembly and tips the human microbiome toward a niche regime. PMID:27527985

  16. The Path from Large Earth Science Datasets to Information

    NASA Astrophysics Data System (ADS)

    Vicente, G. A.

    2013-12-01

    The NASA Goddard Earth Sciences Data (GES) and Information Services Center (DISC) is one of the Science Mission Directorate's (SMD) major centers for the archiving and distribution of Earth science remote sensing data, products and services. This virtual portal provides convenient access to Atmospheric Composition and Dynamics, Hydrology, Precipitation, Ozone, and model-derived datasets (generated by GSFC's Global Modeling and Assimilation Office), as well as the North American Land Data Assimilation System (NLDAS) and the Global Land Data Assimilation System (GLDAS) data products (both generated by GSFC's Hydrological Sciences Branch). This presentation demonstrates various tools and computational technologies developed in the GES DISC to manage the huge volume of data and products acquired from various missions and programs over the years. It explores approaches to archive, document, distribute, access and analyze Earth science data and information, and addresses the technical and scientific issues, governance, and user-support problems faced by scientists in need of multi-disciplinary datasets. It also discusses data and product metrics, user distribution profiles and lessons learned through interactions with the science communities around the world. Finally, it demonstrates some of the most-used data and product visualization and analysis tools developed and maintained by the GES DISC.

  17. Exploitation of a large COSMO-SkyMed interferometric dataset

    NASA Astrophysics Data System (ADS)

    Nutricato, Raffaele; Nitti, Davide O.; Bovenga, Fabio; Refice, Alberto; Chiaradia, Maria T.

    2014-10-01

    In this work we explored a dataset made up of more than 100 images acquired by the COSMO-SkyMed (CSK) constellation over the Port-au-Prince (Haiti) metropolitan and surrounding areas, which were severely hit by the January 12th, 2010 earthquake. The images were acquired along ascending passes by all four sensors of the constellation with a mean rate of one acquisition per week. This consistent CSK dataset was fully exploited by using the Persistent Scatterer Interferometry algorithm SPINUA with the aim of: i) providing a displacement map of the area; ii) assessing the use of CSK and PSI for ground elevation measurements; iii) exploring the CSK satellite orbital tube in terms of both precision and size. In particular, significant subsidence phenomena were detected affecting river deltas and coastal areas of the Port-au-Prince and Carrefour region, as well as very slow slope movements and local ground instabilities. Ground elevation was also measured on PS targets with a resolution of 3 m. The density of these measurable targets depends on the ground coverage, and reaches values higher than 4000 PS/km2 over urban areas, while it drops over vegetated areas or along slopes affected by layover and shadow. Height values were compared with LIDAR data at 1 m resolution collected soon after the 2010 earthquake. Furthermore, by using geocoding procedures and the precise LIDAR data as reference, the orbital errors affecting CSK records were investigated. The results are in line with other recent studies.

  18. Reactome Pathway Analysis to Enrich Biological Discovery in Proteomics Datasets

    PubMed Central

    Haw, Robin; Hermjakob, Henning; D’Eustachio, Peter; Stein, Lincoln

    2012-01-01

    Reactome (http://www.reactome.org) is an open source, expert-authored, peer-reviewed, manually curated database of reactions, pathways and biological processes. We provide an intuitive web-based user interface to pathway knowledge and a suite of data analysis tools. The Pathway Browser is a Systems Biology Graphical Notation (SBGN)-like visualization system that supports manual navigation of pathways by zooming, scrolling and event highlighting, and that exploits PSI Common Query Interface (PSIQUIC) web services to overlay pathways with molecular interaction data from the Reactome Functional Interaction (FI) Network and interaction databases such as IntAct, ChEMBL, and BioGRID. Pathway and Expression Analysis tools employ web services to provide ID mapping, pathway assignment and over-representation analysis of user-supplied datasets. By applying Ensembl Compara to curated human proteins and reactions, Reactome generates pathway inferences for 20 other species. The Species Comparison tool provides a summary of results for each of these species as a table showing numbers of orthologous proteins found by pathway from which users can navigate to inferred details for specific proteins and reactions. Reactome’s diverse pathway knowledge and suite of data analysis tools provide a platform for data mining, modeling and the analysis of large-scale proteomics datasets. PMID:21751369
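
    The over-representation analysis mentioned above is commonly implemented as a hypergeometric test. The sketch below shows that general calculation with invented counts; it is not Reactome's code or web-service API.

```python
# Hypergeometric tail probability behind pathway over-representation tests.
from scipy.stats import hypergeom

N = 20000   # genes in the background (invented)
K = 150     # genes annotated to the pathway (invented)
n = 300     # genes in the user-supplied dataset (invented)
k = 12      # dataset genes that hit the pathway (invented)

# P(X >= k): chance of seeing k or more hits if the dataset were random.
p_value = hypergeom.sf(k - 1, N, K, n)
print(f"over-representation p = {p_value:.3g}")
```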

  19. Fitting meta-analytic structural equation models with complex datasets.

    PubMed

    Wilson, Sandra Jo; Polanin, Joshua R; Lipsey, Mark W

    2016-06-01

    A modification of the first stage of the standard procedure for two-stage meta-analytic structural equation modeling for use with large complex datasets is presented. This modification addresses two common problems that arise in such meta-analyses: (a) primary studies that provide multiple measures of the same construct and (b) correlation coefficients that exhibit substantial heterogeneity, some of which obscures the relationships between the constructs of interest or undermines the comparability of the correlations across cells. One component of this approach is a three-level random-effects model capable of synthesizing a pooled correlation matrix with dependent correlation coefficients. Another component is a meta-regression that can be used to generate covariate-adjusted correlation coefficients that reduce the influence of selected unevenly distributed moderator variables. A non-technical presentation of these techniques is given, along with an illustration of the procedures with a meta-analytic dataset. Copyright © 2016 John Wiley & Sons, Ltd.

  20. Reliability of brain volume measurements: A test-retest dataset

    PubMed Central

    Maclaren, Julian; Han, Zhaoying; Vos, Sjoerd B; Fischbein, Nancy; Bammer, Roland

    2014-01-01

    Evaluation of neurodegenerative disease progression may be assisted by quantification of the volume of structures in the human brain using magnetic resonance imaging (MRI). Automated segmentation software has improved the feasibility of this approach, but often the reliability of measurements is uncertain. We have established a unique dataset to assess the repeatability of brain segmentation and analysis methods. We acquired 120 T1-weighted volumes from 3 subjects (40 volumes/subject) in 20 sessions spanning 31 days, using the protocol recommended by the Alzheimer's Disease Neuroimaging Initiative (ADNI). Each subject was scanned twice within each session, with repositioning between the two scans, allowing determination of test-retest reliability both within a single session (intra-session) and from day to day (inter-session). To demonstrate the application of the dataset, all 3D volumes were processed using FreeSurfer v5.1. The coefficient of variation of volumetric measurements was between 1.6% (caudate) and 6.1% (thalamus). Inter-session variability exceeded intra-session variability for lateral ventricle volume (P<0.0001), indicating that ventricle volume in the subjects varied between days. PMID:25977792
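
    For reference, the coefficient of variation reported above is straightforward to compute from repeated measurements; the sketch below uses invented volumes for a single structure.

```python
# Coefficient of variation (CV) across repeated volume measurements.
import numpy as np

caudate_ml = np.array([3.61, 3.55, 3.70, 3.58, 3.64])  # one structure, 5 scans (invented)
cv = caudate_ml.std(ddof=1) / caudate_ml.mean() * 100
print(f"CV = {cv:.1f}%")
```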

  1. Collaboration tools and techniques for large model datasets

    USGS Publications Warehouse

    Signell, R.P.; Carniel, S.; Chiggiato, J.; Janekovic, I.; Pullen, J.; Sherwood, C.R.

    2008-01-01

    In MREA and many other marine applications, it is common to have multiple models running with different grids, run by different institutions. Techniques and tools are described for low-bandwidth delivery of data from large multidimensional datasets, such as those from meteorological and oceanographic models, directly into generic analysis and visualization tools. Output is stored using the NetCDF CF Metadata Conventions and then delivered to collaborators over the web via OPeNDAP. OPeNDAP datasets served by different institutions are then organized via THREDDS catalogs. Tools and procedures then enable scientists to explore data on the original model grids using tools they are familiar with. The approach is also low-bandwidth, enabling users to extract just the data they require - an important feature for access from ships or remote areas. The entire implementation is simple enough to be handled by modelers working with their webmasters - no advanced programming support is necessary. © 2007 Elsevier B.V. All rights reserved.
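
    A minimal sketch of the low-bandwidth OPeNDAP access pattern described above, using the netCDF4-python library; the server URL and variable name are hypothetical, and the client must be built with DAP support.

```python
# Open a remote OPeNDAP endpoint and pull only the slice needed.
from netCDF4 import Dataset

url = "http://example.org/thredds/dodsC/ocean_model/output.nc"  # hypothetical
ds = Dataset(url)

# Only the requested hyperslab crosses the network, not the whole file.
sst_surface = ds.variables["temp"][0, 0, :, :]  # time 0, top layer (assumed names)
print(sst_surface.shape)
ds.close()
```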

  2. Overview of the CERES Edition-4 Multilayer Cloud Property Datasets

    NASA Astrophysics Data System (ADS)

    Chang, F. L.; Minnis, P.; Sun-Mack, S.; Chen, Y.; Smith, R. A.; Brown, R. R.

    2014-12-01

    Knowledge of the cloud vertical distribution is important for understanding the role of clouds in Earth's radiation budget and climate change. Since high-level cirrus clouds with low emission temperatures and small optical depths can provide a positive feedback to the climate system, and low-level stratus clouds with high emission temperatures and large optical depths can provide a negative feedback effect, the retrieval of multilayer cloud properties using satellite observations, like Terra and Aqua MODIS, is critically important for a variety of cloud and climate applications. For the objective of the Clouds and the Earth's Radiant Energy System (CERES), new algorithms have been developed using Terra and Aqua MODIS data to allow separate retrievals of cirrus and stratus cloud properties when the two dominant cloud types are simultaneously present in a multilayer system. In this paper, we will present an overview of the new CERES Edition-4 multilayer cloud property datasets derived from Terra as well as Aqua. Assessment of the new CERES multilayer cloud datasets will include high-level cirrus and low-level stratus cloud heights, pressures, and temperatures, as well as their optical depths, emissivities, and microphysical properties.

  3. Digital Astronaut Photography: A Discovery Dataset for Archaeology

    NASA Technical Reports Server (NTRS)

    Stefanov, William L.

    2010-01-01

    Astronaut photography acquired from the International Space Station (ISS) using commercial off-the-shelf cameras offers a freely-accessible source for high to very high resolution (4-20 m/pixel) visible-wavelength digital data of Earth. Since ISS Expedition 1 in 2000, over 373,000 images of the Earth-Moon system (including land surface, ocean, atmospheric, and lunar images) have been added to the Gateway to Astronaut Photography of Earth online database (http://eol.jsc.nasa.gov ). Handheld astronaut photographs vary in look angle, time of acquisition, solar illumination, and spatial resolution. These attributes of digital astronaut photography result from a unique combination of ISS orbital dynamics, mission operations, camera systems, and the individual skills of the astronaut. The variable nature of astronaut photography makes the dataset uniquely useful for archaeological applications in comparison with more traditional nadir-viewing multispectral datasets acquired from unmanned orbital platforms. For example, surface features such as trenches, walls, ruins, urban patterns, and vegetation clearing and regrowth patterns may be accentuated by low sun angles and oblique viewing conditions (Fig. 1). High spatial resolution digital astronaut photographs can also be used with sophisticated land cover classification and spatial analysis approaches like Object Based Image Analysis, increasing the potential for use in archaeological characterization of landscapes and specific sites.

  4. Biofuel Enduse Datasets from the Bioenergy Knowledge Discovery Framework (KDF)

    DOE Data Explorer

    The Bioenergy Knowledge Discovery Framework invites users to discover the power of bioenergy through an interface that provides extensive access to research data and literature, GIS mapping tools, and collaborative networks. The Bioenergy KDF supports efforts to develop a robust and sustainable bioenergy industry. The KDF facilitates informed decision making by providing a means to synthesize, analyze, and visualize vast amounts of information in a relevant and succinct manner. It harnesses Web 2.0 and social networking technologies to build a collective knowledge system that can better examine the economic and environmental impacts of development options for biomass feedstock production, biorefineries, and related infrastructure. [copied from https://www.bioenergykdf.net/content/about]

    Holdings include datasets, models, and maps. This is a very new resource, but the collections will grow due to both DOE contributions and individuals' data uploads. Currently the Biofuel Enduse collection includes 133 items. Most of these are categorized as literature, but 36 are listed as datasets and ten as models.

  5. Datasets for radiation network algorithm development and testing

    SciTech Connect

    Rao, Nageswara S; Sen, Satyabrata; Berry, M. L..; Wu, Qishi; Grieme, M.; Brooks, Richard R; Cordone, G.

    2016-01-01

    The Domestic Nuclear Detection Office's (DNDO) Intelligence Radiation Sensors Systems (IRSS) program supported the development of networks of commercial-off-the-shelf (COTS) radiation counters for detecting, localizing, and identifying low-level radiation sources. Under this program, a series of indoor and outdoor tests were conducted with multiple source strengths and types, different background profiles, and various types of source and detector movements. Following the tests, network algorithms were replayed in various reconstructed scenarios using sub-networks. These measurements and algorithm traces together provide a rich collection of highly valuable datasets for testing current and next-generation radiation network algorithms, including the ones (to be) developed by broader R&D communities such as distributed detection, information fusion, and sensor networks. From this multi-terabyte IRSS database, we distilled and packaged the first batch of canonical datasets for public release. They include measurements from ten indoor and two outdoor tests, which represent increasingly challenging baseline scenarios for robustly testing radiation network algorithms.

  6. Developing a data dictionary for the Irish nursing minimum dataset.

    PubMed

    Henry, Pamela; Mac Neela, Pádraig; Clinton, Gerard; Scott, Anne; Treacy, Pearl; Butler, Michelle; Hyde, Abbey; Morris, Roisin; Irving, Kate; Byrne, Anne

    2006-01-01

    One of the challenges in health care in Ireland is the relatively slow acceptance of standardised clinical information systems. Yet the national Irish health reform programme indicates that an Electronic Health Care Record (EHCR) will be implemented on a phased basis [3-5]. While nursing has a key role in ensuring the quality and comparability of health information, the so-called 'invisibility' of some nursing activities makes this a challenging aim to achieve [3-5]. Any integrated health care system requires the adoption of uniform standards for electronic data exchange [1-2]. One of the pre-requisites for uniform standards is the composition of a data dictionary. Inadequate definition of data elements in a particular dataset hinders the development of an integrated data repository or electronic health care record (EHCR). This paper outlines how work on the data dictionary for the Irish Nursing Minimum Dataset (INMDS) has addressed this issue. Dataset elements were devised on the basis of a large-scale empirical research programme. ISO 18104, the reference terminology for nursing [6], was used to cross-map the dataset elements with semantic domains, categories and links, and dataset items were dissected.

  7. A new compression format for fiber tracking datasets.

    PubMed

    Presseau, Caroline; Jodoin, Pierre-Marc; Houde, Jean-Christophe; Descoteaux, Maxime

    2015-04-01

    A single diffusion MRI streamline fiber tracking dataset may contain hundreds of thousands, and often millions, of streamlines and can take up to several gigabytes of memory. This amount of data is not only heavy to compute, but also difficult to visualize and hard to store on disk (especially when dealing with a collection of brains). These problems call for a fiber-specific compression format that simplifies its manipulation. As of today, no fiber compression format has yet been adopted and the need for one is now becoming an issue for future connectomics research. In this work, we propose a new compression format, .zfib, for streamline tractography datasets reconstructed from diffusion magnetic resonance imaging (dMRI). Tracts contain a large amount of redundant information and are relatively smooth. Hence, they are highly compressible. The proposed method is a processing pipeline containing a linearization, a quantization and an encoding step. Our pipeline is tested and validated under a wide range of DTI and HARDI tractography configurations (step size, streamline number, deterministic and probabilistic tracking) and compression options. Similar to JPEG, the user has one parameter to select: a worst-case maximum tolerance error in millimeters (mm). Overall, we find a compression factor of more than 96% for a maximum error of 0.1 mm without any perceptual change or change of diffusion statistics (mean fractional anisotropy and mean diffusivity) along bundles. This opens new opportunities for connectomics and tractometry applications.
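
    As a toy illustration of the quantization and encoding steps in such a pipeline (not the authors' .zfib format), the sketch below snaps streamline points to a grid chosen from a maximum error tolerance and then deflates the byte stream.

```python
# Quantize streamline coordinates within a tolerance, then compress.
import zlib
import numpy as np

def compress_streamline(points, max_error_mm=0.1):
    # Rounding to multiples of 2*max_error keeps each coordinate within tolerance.
    step = 2.0 * max_error_mm
    quantized = np.round(points / step).astype(np.int32)
    return zlib.compress(quantized.tobytes())

# Synthetic smooth streamline: cumulative sum of small random steps.
streamline = np.cumsum(np.random.rand(10000, 3).astype(np.float32), axis=0)
blob = compress_streamline(streamline)
print(f"{streamline.nbytes} bytes -> {len(blob)} bytes")
```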

  8. Polyester: simulating RNA-seq datasets with differential transcript expression

    PubMed Central

    Frazee, Alyssa C.; Jaffe, Andrew E.; Langmead, Ben; Leek, Jeffrey T.

    2015-01-01

    Motivation: Statistical methods development for differential expression analysis of RNA sequencing (RNA-seq) requires software tools to assess accuracy and error rate control. Since true differential expression status is often unknown in experimental datasets, artificially constructed datasets must be utilized, either by generating costly spike-in experiments or by simulating RNA-seq data. Results: Polyester is an R package designed to simulate RNA-seq data, beginning with an experimental design and ending with collections of RNA-seq reads. Its main advantage is the ability to simulate reads indicating isoform-level differential expression across biological replicates for a variety of experimental designs. Data generated by Polyester are a reasonable approximation to real RNA-seq data, and standard differential expression workflows can recover the differential expression set in the simulation by the user. Availability and implementation: Polyester is freely available from Bioconductor (http://bioconductor.org/). Contact: jtleek@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25926345

  9. Standardized dataset health services: Part 2--top to bottom.

    PubMed

    Galemore, Cynthia A; Maughan, Erin D

    2014-07-01

    It is critical for school nurses to promote and educate others on what they do. Data can help shape the message into understandable language across education and health. While Part 1 of this article discusses NASN's progress on identifying a standardized dataset for school health services, Part 2 focuses on the analysis and sharing of data at the local level. Examples of how to use the data to improve practice and create change are included. Guidance is provided in creating and sharing data as part of an annual report as a final step in advocating for school health services commensurate with student health needs. As the work on an evidence-based uniform dataset continues at the national level, what should be the response at the local level? Do we wait, or do we continue to collect certain data? The purpose of Part 2 of this article is to describe how data being collected locally illustrate health trends, benchmarking, and school nursing outcomes and can be compiled and shared in an annual report.

  10. Reliability of brain volume measurements: a test-retest dataset.

    PubMed

    Maclaren, Julian; Han, Zhaoying; Vos, Sjoerd B; Fischbein, Nancy; Bammer, Roland

    2014-01-01

    Evaluation of neurodegenerative disease progression may be assisted by quantification of the volume of structures in the human brain using magnetic resonance imaging (MRI). Automated segmentation software has improved the feasibility of this approach, but often the reliability of measurements is uncertain. We have established a unique dataset to assess the repeatability of brain segmentation and analysis methods. We acquired 120 T1-weighted volumes from 3 subjects (40 volumes/subject) in 20 sessions spanning 31 days, using the protocol recommended by the Alzheimer's Disease Neuroimaging Initiative (ADNI). Each subject was scanned twice within each session, with repositioning between the two scans, allowing determination of test-retest reliability both within a single session (intra-session) and from day to day (inter-session). To demonstrate the application of the dataset, all 3D volumes were processed using FreeSurfer v5.1. The coefficient of variation of volumetric measurements was between 1.6% (caudate) and 6.1% (thalamus). Inter-session variability exceeded intra-session variability for lateral ventricle volume (P<0.0001), indicating that ventricle volume in the subjects varied between days.

  11. Developing a Resource for Implementing ArcSWAT Using Global Datasets

    NASA Astrophysics Data System (ADS)

    Taggart, M.; Caraballo Álvarez, I. O.; Mueller, C.; Palacios, S. L.; Schmidt, C.; Milesi, C.; Palmer-Moloney, L. J.

    2015-12-01

    This project developed a comprehensive user manual outlining methods for adapting and implementing global datasets for use within ArcSWAT for international and worldwide applications. The Soil and Water Assessment Tool (SWAT) is a hydrologic model that estimates a number of hydrologic variables, including runoff and the chemical makeup of water, at a given location on the Earth's surface using Digital Elevation Models (DEM), land cover, soil, and weather data. However, the application of ArcSWAT for projects outside of the United States is challenging, as there is no standard framework for inputting global datasets into ArcSWAT. This project aims to remove this obstacle by outlining methods for adapting and implementing these global datasets via the user manual. The manual takes the user through the process of data conditioning while providing solutions and suggestions for common errors. The efficacy of the manual was explored using examples from watersheds located in Puerto Rico, Mexico and Western Africa. Each run explored the various options for setting up an ArcSWAT project as well as a range of satellite data products and soil databases. Future work will incorporate in-situ data for validation and calibration of the model and outline additional resources to assist future users in efficiently implementing the model for worldwide applications. The capacity to manage and monitor freshwater availability is of critical importance in both developed and developing countries. As populations grow and climate changes, both the quality and quantity of freshwater are affected, resulting in negative impacts on the health of the surrounding population. The use of hydrologic models such as ArcSWAT can help stakeholders and decision makers understand the future impacts of these changes, enabling informed and substantiated decisions.

  12. Remote web-based 3D visualization of hydrological forecasting datasets.

    NASA Astrophysics Data System (ADS)

    van Meersbergen, Maarten; Drost, Niels; Blower, Jon; Griffiths, Guy; Hut, Rolf; van de Giesen, Nick

    2015-04-01

    As the possibilities for larger and more detailed simulations of geoscientific data expand, the need for smart solutions in data visualization grows as well. Large volumes of data should be quickly accessible from anywhere in the world without the need for transferring the simulation results. We aim to provide tools for both processing and handling these large datasets. As an example, the eWaterCycle project (www.ewatercycle.org) aims to provide a running 14-day ensemble forecast to predict water-related stress around the globe. The large volumes of simulation results with uncertainty data that are generated through ensemble hydrological predictions provide a challenge for existing visualization solutions. One possible solution for this challenge lies in the use of web-enabled technology for visualization and analysis of these datasets. Web-based visualization provides an additional benefit in that it eliminates the need for any software installation and configuration and allows for the easy communication of research results between collaborating research parties. Providing interactive tools for the exploration of these datasets will not only help in the analysis of the data by researchers, it can also aid in the dissemination of the research results to the general public. In Vienna, we will present a working open-source solution for remote visualization of large volumes of global geospatial data based on the proven open-source 3D web visualization software package Cesium (cesiumjs.org), the ncWMS software package provided by the Reading e-Science Centre, and the WebGL and NetCDF standards.

  13. A High-Resolution Merged Wind Dataset for DYNAMO: Progress and Future Plans

    NASA Technical Reports Server (NTRS)

    Lang, Timothy J.; Mecikalski, John; Li, Xuanli; Chronis, Themis; Castillo, Tyler; Hoover, Kacie; Brewer, Alan; Churnside, James; McCarty, Brandi; Hein, Paul; Rutledge, Steve; Dolan, Brenda; Matthews, Alyssa; Thompson, Elizabeth

    2015-01-01

    In order to support research on optimal data assimilation methods for the Cyclone Global Navigation Satellite System (CYGNSS), launching in 2016, work has been ongoing to produce a high-resolution merged wind dataset for the Dynamics of the Madden Julian Oscillation (DYNAMO) field campaign, which took place during late 2011/early 2012. The winds are produced by assimilating DYNAMO observations into the Weather Research and Forecasting (WRF) three-dimensional variational (3DVAR) system. Data sources from the DYNAMO campaign include the upper-air sounding network, radial velocities from the radar network, vector winds from the Advanced Scatterometer (ASCAT) and Oceansat-2 Scatterometer (OSCAT) satellite instruments, the NOAA High Resolution Doppler Lidar (HRDL), and several others. In order to prepare them for 3DVAR, significant additional quality-control work is being done on the currently available TOGA and SMART-R radar datasets, including automatically dealiasing radial velocities and correcting for intermittent TOGA antenna azimuth angle errors. The assimilated winds are being made available as model output fields from WRF on two separate grids with different horizontal resolutions - a 3-km grid focusing on the main DYNAMO quadrilateral (i.e., Gan Island, the R/V Revelle, the R/V Mirai, and Diego Garcia), and a 1-km grid focusing on the Revelle. The wind dataset is focused on three separate, approximately 2-week periods during the Madden Julian Oscillation (MJO) onsets that occurred in October, November, and December 2011. Work is ongoing to convert the 10-m surface winds from these model fields to simulated CYGNSS observations using the CYGNSS End-To-End Simulator (E2ES), and these simulated satellite observations are being compared to radar observations of DYNAMO precipitation systems to document the anticipated ability of CYGNSS to provide information on the relationships between surface winds and oceanic precipitation at the mesoscale level. This research will

  14. Investigating Martian and Venusian hyperspectral datasets through Positive Source Separation

    NASA Astrophysics Data System (ADS)

    Tréguier, E.; Schmidt, F.; Schmidt, A.; Moussaoui, S.; Dobigeon, N.; Erard, S.; Cardesín, A.; Pinet, P.; Martin, P.

    2010-12-01

    Spectro-imagers with improved spectral/spatial resolution have mapped planetary bodies, providing high-dimensional hyperspectral datasets that contain abundant data about the surface and/or atmosphere. The spatial extent of a pixel is usually large enough to contain a mixture of various surface/atmospheric constituents which contribute to a single pixel spectrum. Unsupervised spectral unmixing [1] aims at identifying the spectral signatures of materials present in the image and at estimating their abundances in each pixel. Bayesian Positive Source Separation (BPSS) [2] is an interesting way to deal with this unmixing challenge under linearity constraints. Notably, it ensures the non-negativity of both the unmixed component spectra and their abundances. Such a constraint is crucial to the physical interpretability of the results. A sum-to-one constraint [3] can also be imposed on the estimated abundances; its relevance depends on the nature of the dataset under consideration. Despite undeniable advantages, the use of such algorithms has so far been hampered by excessive computational resource requirements; until now it has not been possible to process a whole hyperspectral image of a size typically encountered in Earth and Planetary Sciences. Two kinds of implementation strategies were adopted to overcome this computational issue [4]. Firstly, several technical optimizations made it possible to run the BPSS algorithms on a complete image for the first time. Secondly, a pixel selection method was investigated: performed as a preprocessing step, it aims at extracting a few especially relevant pixels among all the image pixels. Then, the algorithm can be launched on this selection, with significantly lower computation overhead. In order to better understand the behavior of the method, tests on synthetic datasets generated by linear mixing of known mineral endmembers were performed. They help to assess the potential loss of quality induced by the pixel selection, depending
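
    As a hedged stand-in for BPSS, the sketch below unmixes a synthetic linear mixture with scikit-learn's NMF, which shares the non-negativity constraints on spectra and abundances but is not Bayesian and does not enforce the sum-to-one constraint.

```python
# Non-negative unmixing of a synthetic linear mixture (NMF as a stand-in).
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(1)
n_pixels, n_bands, n_endmembers = 500, 100, 3
spectra = rng.random((n_endmembers, n_bands))                 # endmember signatures
abundances = rng.dirichlet(np.ones(n_endmembers), n_pixels)   # sum to one
image = abundances @ spectra + 0.01 * rng.random((n_pixels, n_bands))

model = NMF(n_components=n_endmembers, init="nndsvda", max_iter=500)
est_abundances = model.fit_transform(image)   # non-negative abundances
est_spectra = model.components_               # non-negative spectra
print(est_abundances.shape, est_spectra.shape)
```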

  15. Device-independent quantum key distribution

    NASA Astrophysics Data System (ADS)

    Hänggi, Esther

    2010-12-01

    In this thesis, we study two approaches to achieve device-independent quantum key distribution: in the first approach, the adversary can distribute any system to the honest parties that cannot be used to communicate between the three of them, i.e., it must be non-signalling. In the second approach, we limit the adversary to strategies which can be implemented using quantum physics. For both approaches, we show how device-independent quantum key distribution can be achieved when imposing an additional condition. In the non-signalling case this additional requirement is that communication is impossible between all pairwise subsystems of the honest parties, while, in the quantum case, we demand that measurements on different subsystems must commute. We give a generic security proof for device-independent quantum key distribution in these cases and apply it to an existing quantum key distribution protocol, thus proving its security even in this setting. We also show that, without any additional such restriction there always exists a successful joint attack by a non-signalling adversary.

  16. Independent Learning Models: A Comparison.

    ERIC Educational Resources Information Center

    Wickett, R. E. Y.

    Five models of independent learning are suitable for use in adult education programs. The common factor is a facilitator who works in some way with the student in the learning process. They display different characteristics, including the extent of independence in relation to content and/or process. Nondirective tutorial instruction and learning…

  17. Convergent Genetic and Expression Datasets Highlight TREM2 in Parkinson's Disease Susceptibility.

    PubMed

    Liu, Guiyou; Liu, Yongquan; Jiang, Qinghua; Jiang, Yongshuai; Feng, Rennan; Zhang, Liangcai; Chen, Zugen; Li, Keshen; Liu, Jiafeng

    2016-09-01

    A rare TREM2 missense mutation (rs75932628-T) was reported to confer a significant Alzheimer's disease (AD) risk. A recent study indicated no evidence of the involvement of this variant in Parkinson's disease (PD). Here, we used genetic and expression data to reinvestigate the potential association between TREM2 and PD susceptibility. In stage 1, using 10 independent studies (N = 89,157; 8787 cases and 80,370 controls), we conducted a subgroup meta-analysis. We identified a significant association between rs75932628 and PD (P = 3.10E-03, odds ratio (OR) = 3.88, 95% confidence interval (CI) 1.58-9.54) in the No-Northern Europe subgroup, and significantly higher PD risk (P = 0.01, Mann-Whitney test) in the No-Northern Europe subgroup than in the Northern Europe subgroup. In stage 2, we used the summary results from a large-scale PD genome-wide association study (GWAS; N = 108,990; 13,708 cases and 95,282 controls) to search for other TREM2 variants contributing to PD susceptibility. We identified 14 single-nucleotide polymorphisms (SNPs) associated with PD within a 50-kb range upstream and downstream of TREM2. In stage 3, using two brain expression GWAS datasets (N = 773), we identified 6 of the 14 SNPs as regulating increased expression of TREM2. In stage 4, using whole human genome microarray data (N = 50), we further identified significantly increased expression of TREM2 in PD cases compared with controls in the human prefrontal cortex. In summary, convergent genetic and expression datasets demonstrate that TREM2 is a potent risk factor for PD and may be a therapeutic target in PD and other neurodegenerative diseases.
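
    The subgroup meta-analysis above rests on the standard inverse-variance pooling of log odds ratios; the sketch below shows that fixed-effect calculation with three invented studies, not the paper's actual data.

```python
# Fixed-effect, inverse-variance meta-analysis of odds ratios.
import numpy as np

log_or = np.log([3.2, 4.5, 2.9])       # per-study odds ratios (invented)
se = np.array([0.45, 0.60, 0.52])      # standard errors of log(OR) (invented)

w = 1.0 / se**2                        # inverse-variance weights
pooled = (w * log_or).sum() / w.sum()
pooled_se = np.sqrt(1.0 / w.sum())
ci = np.exp(pooled + np.array([-1.96, 1.96]) * pooled_se)
print(f"pooled OR = {np.exp(pooled):.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```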

  18. Synthesizing Global and Local Datasets to Estimate Jurisdictional Forest Carbon Fluxes in Berau, Indonesia

    PubMed Central

    Griscom, Bronson W.; Ellis, Peter W.; Baccini, Alessandro; Marthinus, Delon; Evans, Jeffrey S.; Ruslandi

    2016-01-01

    Background Forest conservation efforts are increasingly being implemented at the scale of sub-national jurisdictions in order to mitigate global climate change and provide other ecosystem services. We see an urgent need for robust estimates of historic forest carbon emissions at this scale, as the basis for credible measures of climate and other benefits achieved. Despite the arrival of a new generation of global datasets on forest area change and biomass, confusion remains about how to produce credible jurisdictional estimates of forest emissions. We demonstrate a method for estimating the relevant historic forest carbon fluxes within the Regency of Berau in eastern Borneo, Indonesia. Our method integrates best available global and local datasets, and includes a comprehensive analysis of uncertainty at the regency scale. Principal Findings and Significance We find that Berau generated 8.91 ± 1.99 million tonnes of net CO2 emissions per year during 2000–2010. Berau is an early frontier landscape where gross emissions are 12 times higher than gross sequestration. Yet most (85%) of Berau's original forests are still standing. The majority of net emissions were due to conversion of native forests to unspecified agriculture (43% of total), oil palm (28%), and fiber plantations (9%). Most of the remainder was due to legal commercial selective logging (17%). Our overall uncertainty estimate offers an independent basis for assessing three other estimates for Berau; two of these were above the upper end of our uncertainty range. We emphasize the importance of including an uncertainty range for all parameters of the emissions equation to generate a comprehensive uncertainty estimate, which has not been done before. We believe comprehensive estimates of carbon flux uncertainty are increasingly important as national and international institutions are challenged with comparing alternative estimates and identifying a credible range of historic emissions values

  19. Identification and optimization of classifier genes from multi-class earthworm microarray dataset.

    PubMed

    Li, Ying; Wang, Nan; Perkins, Edward J; Zhang, Chaoyang; Gong, Ping

    2010-10-28

    Monitoring, assessment and prediction of environmental risks that chemicals pose demand rapid and accurate diagnostic assays. A variety of toxicological effects have been associated with explosive compounds TNT and RDX. One important goal of microarray experiments is to discover novel biomarkers for toxicity evaluation. We have developed an earthworm microarray containing 15,208 unique oligo probes and have used it to profile gene expression in 248 earthworms exposed to TNT, RDX or neither. We assembled a new machine learning pipeline consisting of several well-established feature filtering/selection and classification techniques to analyze the 248-array dataset in order to construct classifier models that can separate earthworm samples into three groups: control, TNT-treated, and RDX-treated. First, a total of 869 genes differentially expressed in response to TNT or RDX exposure were identified using a univariate statistical algorithm of class comparison. Then, decision tree-based algorithms were applied to select a subset of 354 classifier genes, which were ranked by their overall weight of significance. A multiclass support vector machine (MC-SVM) method and an unsupervised K-mean clustering method were applied to independently refine the classifier, producing a smaller subset of 39 and 30 classifier genes, separately, with 11 common genes being potential biomarkers. The combined 58 genes were considered the refined subset and used to build MC-SVM and clustering models with classification accuracy of 83.5% and 56.9%, respectively. This study demonstrates that the machine learning approach can be used to identify and optimize a small subset of classifier/biomarker genes from high dimensional datasets and generate classification models of acceptable precision for multiple classes.
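
    A schematic version of such a pipeline (univariate filtering followed by a multiclass SVM) is sketched below with scikit-learn on a random stand-in for the 248-array matrix; the shapes, labels, and k = 58 gene count echo the abstract, but the data are synthetic and this is not the authors' code.

```python
# Feature selection + multiclass SVM pipeline on a synthetic expression matrix.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.standard_normal((248, 15208))   # arrays x probes (synthetic)
y = rng.integers(0, 3, size=248)        # control / TNT / RDX (synthetic)

# Univariate filtering down to a small classifier-gene subset, then MC-SVM.
clf = make_pipeline(SelectKBest(f_classif, k=58), SVC(kernel="linear"))
print(cross_val_score(clf, X, y, cv=5).mean())
```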

  20. Restoration and Recalibration of the Viking MAWD Datasets

    NASA Astrophysics Data System (ADS)

    Nuno, R. G.; Paige, D. A.; Sullivan, M.

    2014-12-01

    High-resolution HiRISE images of transient dark albedo features, called Recurring Slope Lineae (RSL), have been interpreted as evidence for current hydrological activity [1]. If there are surface sources of water, then localized plumes of atmospheric water may be observable from orbit. The Viking MAWD column water vapor data are uniquely valuable for this purpose because they cover the full range of Martian local times and include data sampled at high spatial resolution [2]. They are also accompanied by simultaneously acquired surface and atmospheric temperatures from the Viking Infrared Thermal Mapper (IRTM) instruments. We searched the raster-averaged Viking Orbiter 1 and 2 MAWD column water vapor dataset for regions of localized elevated column water vapor abundances and found mid-latitude regions with transient water observations [3]. The raster-averaged Viking Orbiter 1 and 2 MAWD column water vapor data available in the Planetary Data System (PDS) were calculated from radiance measurements using seasonally and topographically varying surface pressures which, at the time, had high uncertainties [4]. Due to recent interest in transient hydrological activity on Mars [2], we decoded the non-raster-averaged Viking MAWD dataset, which is sampled at 15 times higher spatial resolution than the data currently available from the PDS. This new dataset is being used to recalculate column water vapor abundances using current topographical data, as well as dust and pressure measurements from the Mars Global Circulation Model. References: [1] McEwen, A. S., et al. (2011). Seasonal flows on warm Martian slopes. Science, 333(6043), 740-743. [2] Farmer, C. B., & Laporte, D. D. (1972). The Detection and Mapping of Water Vapor in the Martian Atmosphere. Icarus. [3] Nuno, R. G., et al. (2013). Searching for Localized Water Vapor Sources on Mars Utilizing Viking MAWD Data. 44th Lunar and Planetary Science Conference. [4] Farmer, C. B., et al. (1977

  1. Accuracy assessment of seven global land cover datasets over China

    NASA Astrophysics Data System (ADS)

    Yang, Yongke; Xiao, Pengfeng; Feng, Xuezhi; Li, Haixing

    2017-03-01

    Land cover (LC) is the vital foundation of Earth science. Up to now, several global LC datasets have arisen through the efforts of many scientific communities. To provide guidelines for data usage over China, nine LC maps from seven global LC datasets (IGBP DISCover, UMD, GLC, MCD12Q1, GLCNMO, CCI-LC, and GlobeLand30) were evaluated in this study. First, we compared their similarities and discrepancies in both area and spatial patterns, and analysed their inherent relations to data sources and classification schemes and methods. Next, five sets of validation sample units (VSUs) were collected to calculate their accuracy quantitatively. Further, we built a spatial analysis model and depicted their spatial variation in accuracy based on the five sets of VSUs. The results show that there are evident discrepancies among these LC maps in both area and spatial patterns. For LC maps produced by different institutes, GLC 2000 and CCI-LC 2000 have the highest overall spatial agreement (53.8%). For LC maps produced by the same institutes, the overall spatial agreement of CCI-LC 2000 and 2010, and of MCD12Q1 2001 and 2010, reaches up to 99.8% and 73.2%, respectively; however, more effort is still needed if we hope to use these LC maps as time-series data for model input, since both CCI-LC and MCD12Q1 fail to represent the rapidly changing trend of several key LC classes in the early 21st century, in particular urban and built-up, snow and ice, water bodies, and permanent wetlands. With the highest spatial resolution, the overall accuracy of GlobeLand30 2010 is 82.39%. For the other six LC datasets with coarse resolution, CCI-LC 2010/2000 has the highest overall accuracy, followed by MCD12Q1 2010/2001, GLC 2000, GLCNMO 2008, IGBP DISCover, and UMD in turn. While all maps exhibit high accuracy in homogeneous regions, local accuracies in other regions are quite different, particularly in the Farming-Pastoral Zone of North China, the mountains in Northeast China, and the Southeast Hills. Special

  2. MGH-USC Human Connectome Project datasets with ultra-high b-value diffusion MRI.

    PubMed

    Fan, Qiuyun; Witzel, Thomas; Nummenmaa, Aapo; Van Dijk, Koene R A; Van Horn, John D; Drews, Michelle K; Somerville, Leah H; Sheridan, Margaret A; Santillana, Rosario M; Snyder, Jenna; Hedden, Trey; Shaw, Emily E; Hollinshead, Marisa O; Renvall, Ville; Zanzonico, Roberta; Keil, Boris; Cauley, Stephen; Polimeni, Jonathan R; Tisdall, Dylan; Buckner, Randy L; Wedeen, Van J; Wald, Lawrence L; Toga, Arthur W; Rosen, Bruce R

    2016-01-01

    The MGH-USC CONNECTOM MRI scanner housed at the Massachusetts General Hospital (MGH) is a major hardware innovation of the Human Connectome Project (HCP). The 3T CONNECTOM scanner is capable of producing a magnetic field gradient of up to 300 mT/m strength for in vivo human brain imaging, which greatly shortens the time spent on diffusion encoding and decreases the signal loss due to T2 decay. To demonstrate the capability of the novel gradient system, data of healthy adult participants were acquired for this MGH-USC Adult Diffusion Dataset (N=35), minimally preprocessed, and shared through the Laboratory of Neuro Imaging Image Data Archive (LONI IDA) and the WU-Minn Connectome Database (ConnectomeDB). Another purpose of sharing the data is to facilitate methodological studies of diffusion MRI (dMRI) analyses utilizing high diffusion contrast, which is perhaps not easily feasible with standard MR gradient systems. In addition, acquisition of the MGH-Harvard-USC Lifespan Dataset is currently underway to include 120 healthy participants ranging from 8 to 90 years old, which will also be shared through LONI IDA and ConnectomeDB. Here we describe the efforts of the MGH-USC HCP consortium in acquiring and sharing the ultra-high b-value diffusion MRI data and provide a report on data preprocessing and access. We conclude with a demonstration of the example data, along with results of standard diffusion analyses, including q-ball Orientation Distribution Function (ODF) reconstruction and tractography.

  3. A multi-dataset data-collection strategy produces better diffraction data.

    PubMed

    Liu, Zhi Jie; Chen, Lirong; Wu, Dong; Ding, Wei; Zhang, Hua; Zhou, Weihong; Fu, Zheng Qing; Wang, Bi Cheng

    2011-11-01

    A multi-dataset (MDS) data-collection strategy is proposed and analyzed for macromolecular crystal diffraction data acquisition. The theoretical analysis indicated that the MDS strategy can reduce the standard deviation (background noise) of diffraction data compared with the commonly used single-dataset strategy for a fixed X-ray dose. In order to validate the hypothesis experimentally, a data-quality evaluation process, termed a readiness test of the X-ray data-collection system, was developed. The anomalous signals of sulfur atoms in zinc-free insulin crystals were used as the probe to differentiate the quality of data collected using different data-collection strategies. The data-collection results using home-laboratory-based rotating-anode X-ray and synchrotron X-ray systems indicate that the diffraction data collected with the MDS strategy contain more accurate anomalous signals from sulfur atoms than the data collected with a regular data-collection strategy. In addition, the MDS strategy offered more advantages with respect to radiation-damage-sensitive crystals and better usage of rotating-anode as well as synchrotron X-rays.

  4. New Atmospheric and Oceanic Angular Momentum Datasets for Predictions of Earth Rotation/Polar Motion

    NASA Astrophysics Data System (ADS)

    Salstein, D. A.; Stamatakos, N.

    2014-12-01

    We are reviewing the state of the art in available datasets for both atmospheric angular momentum (AAM) and oceanic angular momentum (OAM) for the purposes of analysis and prediction of both polar motion and length-of-day series. Both analyses and forecasts of these quantities have been used separately and in combination to aid in short- and medium-range predictions of Earth rotation parameters. The AAM and OAM combination, with the possible addition of hydrospheric angular momentum, can form a proxy index for the Earth rotation parameters themselves due to the conservation of angular momentum in the Earth system. Such a combination of the angular momentum of the geophysical fluids has helped in forecasts over periods of up to about 10 days using dynamic models and, together with extended statistical predictions, in predictions of Earth rotation parameters out as far as 90 days, according to Dill et al. (2013). We assess other dataset combinations that can be used in such analysis and prediction efforts for the Earth rotation parameters, and demonstrate the corresponding skill levels in doing so.

  5. Rule-based topology system for spatial databases to validate complex geographic datasets

    NASA Astrophysics Data System (ADS)

    Martinez-Llario, J.; Coll, E.; Núñez-Andrés, M.; Femenia-Ribera, C.

    2017-06-01

    A rule-based topology software system providing a highly flexible and fast procedure to enforce integrity in spatial relationships among datasets is presented. This improved topology rule system is built on the spatial extension Jaspa. Both projects are open-source, freely available software developed by the corresponding author of this paper. Currently, no spatial DBMS implements a rule-based topology engine (considering that the topology rules are designed and performed in the spatial backend). If the topology rules are applied in the frontend (as in many desktop GIS programs), ArcGIS is the most advanced solution. The system presented in this paper has several major advantages over the ArcGIS approach: it can be extended with new topology rules, it has a much wider set of rules, and it can mix feature attributes with topology rules as filters. In addition, the topology rule system can work with various DBMSs, including PostgreSQL, H2 or Oracle, and the logic is performed in the spatial backend. The proposed topology system allows users to check the complex spatial relationships among features (from one or several spatial layers) that complex cartographic datasets require, such as the data specifications proposed by INSPIRE in Europe and the Land Administration Domain Model (LADM) for cadastral data.
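
    To make the idea of a rule-based topology check concrete, here is a minimal sketch in Python with shapely of a single "must not overlap" rule applied to a toy parcel layer. This is not Jaspa's rule syntax or engine; the layer contents and rule name are illustrative assumptions.

```python
# Minimal sketch of a "must not overlap" topology rule on a toy layer,
# using shapely. Not Jaspa's actual rule engine or syntax.
from itertools import combinations
from shapely.geometry import Polygon

parcels = {
    "A": Polygon([(0, 0), (4, 0), (4, 4), (0, 4)]),
    "B": Polygon([(3, 3), (7, 3), (7, 7), (3, 7)]),   # overlaps A
    "C": Polygon([(8, 0), (12, 0), (12, 4), (8, 4)]),
}

def rule_must_not_overlap(layer):
    """Yield pairs of features whose interiors intersect (rule violations)."""
    for (ida, ga), (idb, gb) in combinations(layer.items(), 2):
        if ga.overlaps(gb) or ga.contains(gb) or gb.contains(ga):
            yield ida, idb

for a, b in rule_must_not_overlap(parcels):
    print(f"Topology violation: {a} overlaps {b}")
```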

  6. Analysis of RNAseq datasets from a comparative infectious disease zebrafish model using GeneTiles bioinformatics.

    PubMed

    Veneman, Wouter J; de Sonneville, Jan; van der Kolk, Kees-Jan; Ordas, Anita; Al-Ars, Zaid; Meijer, Annemarie H; Spaink, Herman P

    2015-03-01

    We present an RNA deep sequencing (RNAseq) analysis comparing the transcriptome responses of zebrafish larvae to infection with Staphylococcus epidermidis and Mycobacterium marinum bacteria. We show how our newly developed GeneTiles software can improve RNAseq analysis approaches by more confidently identifying a large set of markers upon infection with these bacteria. Currently, software programs such as Bowtie2 and Samtools are indispensable for the analysis of RNAseq data. However, these programs, designed for a LINUX environment, require dedicated programming skills and have no options for visualisation of the resulting mapped sequence reads. Especially with large data sets, this makes the analysis time-consuming and difficult for non-expert users. We have applied the GeneTiles software to the analysis of previously published and newly obtained RNAseq datasets of our zebrafish infection model, and we have shown the applicability of this approach also to published RNAseq datasets of other organisms by comparing our data with a published mammalian infection study. In addition, we have implemented the DEXSeq module in the GeneTiles software to identify genes, such as glucagon A, that are differentially spliced under infection conditions. In the analysis of our RNAseq data, this made it possible to efficiently compare larger data sets without using problem-dedicated programs, leading to quick identification of marker sets. Therefore, this approach will also be highly useful for transcriptome analyses of other organisms for which well-characterised genomes are available.
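
    For readers unfamiliar with the command-line steps GeneTiles is said to streamline, the conventional Bowtie2 + Samtools mapping workflow looks roughly like the sketch below, wrapped in Python for scripting; the index and file names are placeholders, not the study's actual data.

```python
# Hedged sketch of the conventional Bowtie2 + Samtools mapping steps;
# index and file names are hypothetical placeholders.
import subprocess

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

run(["bowtie2", "-x", "zebrafish_index", "-U", "reads.fastq",
     "-S", "mapped.sam"])                                # align reads
run(["samtools", "sort", "-o", "mapped.sorted.bam", "mapped.sam"])
run(["samtools", "index", "mapped.sorted.bam"])          # fast region queries
```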

  7. Wide-Area Mapping of Forest with National Airborne Laser Scanning and Field Inventory Datasets

    NASA Astrophysics Data System (ADS)

    Monnet, J.-M.; Ginzler, C.; Clivaz, J.-C.

    2016-06-01

    Airborne laser scanning (ALS) remote sensing data are now available for entire countries such as Switzerland. Methods for the estimation of forest parameters from ALS have been intensively investigated in recent years. However, the implementation of a forest mapping workflow based on available data at a regional level still remains challenging. A case study was implemented in the Canton of Valais (Switzerland). The national ALS dataset and field data of the Swiss National Forest Inventory were used to calibrate estimation models for mean and maximum height, basal area, stem density, mean diameter and stem volume. When stratification was performed based on ALS acquisition settings and geographical criteria, satisfactory prediction models were obtained for volume (R2 = 0.61, root mean square error of 47%) and basal area (R2 = 0.51, RMSE of 45%), while height variables had an error below 19%. This case study shows that the use of nationwide ALS and field datasets for forest resources mapping is cost efficient, but additional investigations are required to handle the limitations of the input data and optimize the accuracy.
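
    The stratified calibration described above can be sketched as fitting one regression per stratum and reporting R2 and relative RMSE. The example below uses scikit-learn on synthetic stand-in data; the stratum names, ALS metrics, and allometry are illustrative assumptions, not the study's models.

```python
# Sketch of stratified calibration: one regression per stratum (ALS
# acquisition block), reporting R^2 and relative RMSE. Synthetic data.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_squared_error

rng = np.random.default_rng(0)
n = 200
plots = pd.DataFrame({
    "stratum": rng.choice(["valley", "slope"], n),
    "h_mean": rng.uniform(5, 30, n),          # ALS mean canopy height (m)
    "cover": rng.uniform(0.2, 1.0, n),        # ALS canopy cover fraction
})
# Hypothetical allometry: volume grows with height and cover, plus noise.
plots["volume"] = 8 * plots["h_mean"] * plots["cover"] + rng.normal(0, 25, n)

for stratum, grp in plots.groupby("stratum"):
    X, y = grp[["h_mean", "cover"]].values, grp["volume"].values
    pred = LinearRegression().fit(X, y).predict(X)
    rmse_pct = 100 * np.sqrt(mean_squared_error(y, pred)) / y.mean()
    print(f"{stratum}: R2={r2_score(y, pred):.2f}, RMSE={rmse_pct:.0f}%")
```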

  8. Establishing macroecological trait datasets: digitalization, extrapolation, and validation of diet preferences in terrestrial mammals worldwide

    PubMed Central

    Kissling, Wilm Daniel; Dalby, Lars; Fløjgaard, Camilla; Lenoir, Jonathan; Sandel, Brody; Sandom, Christopher; Trøjelsgaard, Kristian; Svenning, Jens-Christian

    2014-01-01

    Ecological trait data are essential for understanding the broad-scale distribution of biodiversity and its response to global change. For animals, diet represents a fundamental aspect of species’ evolutionary adaptations, ecological and functional roles, and trophic interactions. However, the importance of diet for macroevolutionary and macroecological dynamics remains little explored, partly because of the lack of comprehensive trait datasets. We compiled and evaluated a comprehensive global dataset of diet preferences of mammals (“MammalDIET”). Diet information was digitized from two global and cladewide data sources and errors of data entry by multiple data recorders were assessed. We then developed a hierarchical extrapolation procedure to fill in diet information for species with missing information. Missing data were extrapolated with information from other taxonomic levels (genus, other species within the same genus, or family) and this extrapolation was subsequently validated both internally (with a jack-knife approach applied to the compiled species-level diet data) and externally (using independent species-level diet information from a comprehensive continentwide data source). Finally, we grouped mammal species into trophic levels and dietary guilds, and their species richness as well as their proportion of total richness were mapped at a global scale for those diet categories with good validation results. The success rate of correctly digitizing data was 94%, indicating that the consistency in data entry among multiple recorders was high. Data sources provided species-level diet information for a total of 2033 species (38% of all 5364 terrestrial mammal species, based on the IUCN taxonomy). For the remaining 3331 species, diet information was mostly extrapolated from genus-level diet information (48% of all terrestrial mammal species), and only rarely from other species within the same genus (6%) or from family level (8%). Internal and external
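
    The hierarchical extrapolation step lends itself to a compact illustration: missing species-level diet entries are filled from the genus consensus first, then the family consensus. A minimal pandas sketch follows; the toy table and column names are illustrative assumptions, not MammalDIET's actual schema.

```python
# Minimal sketch of hierarchical gap-filling: missing species-level diet is
# taken from the genus consensus first, then the family consensus.
import pandas as pd

df = pd.DataFrame({
    "species": ["a1", "a2", "a3", "b1"],
    "genus":   ["A",  "A",  "A",  "B"],
    "family":  ["F",  "F",  "F",  "F"],
    "diet":    ["herbivore", None, "herbivore", None],
})

def consensus(series):
    s = series.dropna()
    return s.mode().iloc[0] if len(s) else None

for level in ["genus", "family"]:                 # genus first, then family
    fill = df.groupby(level)["diet"].transform(consensus)
    df["diet"] = df["diet"].fillna(fill)

print(df)   # a2 filled from genus A; b1 filled from family F
```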

  9. Establishing macroecological trait datasets: digitalization, extrapolation, and validation of diet preferences in terrestrial mammals worldwide.

    PubMed

    Kissling, Wilm Daniel; Dalby, Lars; Fløjgaard, Camilla; Lenoir, Jonathan; Sandel, Brody; Sandom, Christopher; Trøjelsgaard, Kristian; Svenning, Jens-Christian

    2014-07-01

    Ecological trait data are essential for understanding the broad-scale distribution of biodiversity and its response to global change. For animals, diet represents a fundamental aspect of species' evolutionary adaptations, ecological and functional roles, and trophic interactions. However, the importance of diet for macroevolutionary and macroecological dynamics remains little explored, partly because of the lack of comprehensive trait datasets. We compiled and evaluated a comprehensive global dataset of diet preferences of mammals ("MammalDIET"). Diet information was digitized from two global and cladewide data sources and errors of data entry by multiple data recorders were assessed. We then developed a hierarchical extrapolation procedure to fill in diet information for species with missing information. Missing data were extrapolated with information from other taxonomic levels (genus, other species within the same genus, or family) and this extrapolation was subsequently validated both internally (with a jack-knife approach applied to the compiled species-level diet data) and externally (using independent species-level diet information from a comprehensive continentwide data source). Finally, we grouped mammal species into trophic levels and dietary guilds, and their species richness as well as their proportion of total richness were mapped at a global scale for those diet categories with good validation results. The success rate of correctly digitizing data was 94%, indicating that the consistency in data entry among multiple recorders was high. Data sources provided species-level diet information for a total of 2033 species (38% of all 5364 terrestrial mammal species, based on the IUCN taxonomy). For the remaining 3331 species, diet information was mostly extrapolated from genus-level diet information (48% of all terrestrial mammal species), and only rarely from other species within the same genus (6%) or from family level (8%). Internal and external

  10. Revisiting Frazier's subdeltas: enhancing datasets with dimensionality, better to understand geologic systems

    USGS Publications Warehouse

    Flocks, James

    2006-01-01

    Scientific knowledge from the past century is commonly represented by two-dimensional figures and graphs, as presented in manuscripts and maps. Using today's computer technology, this information can be extracted and projected into three- and four-dimensional perspectives. Computer models can be applied to datasets to provide additional insight into complex spatial and temporal systems. This process can be demonstrated by applying digitizing and modeling techniques to valuable information within widely used publications. The seminal paper by D. Frazier, published in 1967, identified 16 separate delta lobes formed by the Mississippi River during the past 6,000 yrs. The paper includes stratigraphic descriptions through geologic cross-sections, and provides distribution and chronologies of the delta lobes. The data from Frazier's publication are extensively referenced in the literature. Additional information can be extracted from the data through computer modeling. Digitizing and geo-rectifying Frazier's geologic cross-sections produce a three-dimensional perspective of the delta lobes. Adding the chronological data included in the report provides the fourth-dimension of the delta cycles, which can be visualized through computer-generated animation. Supplemental information can be added to the model, such as post-abandonment subsidence of the delta-lobe surface. Analyzing the regional, net surface-elevation balance between delta progradations and land subsidence is computationally intensive. By visualizing this process during the past 4,500 yrs through multi-dimensional animation, the importance of sediment compaction in influencing both the shape and direction of subsequent delta progradations becomes apparent. Visualization enhances a classic dataset, and can be further refined using additional data, as well as provide a guide for identifying future areas of study.

  11. Data Discovery of Big and Diverse Climate Change Datasets - Options, Practices and Challenges

    NASA Astrophysics Data System (ADS)

    Palanisamy, G.; Boden, T.; McCord, R. A.; Frame, M. T.

    2013-12-01

    Developing data search tools is a very common, but often confusing, task for most data-intensive scientific projects. These search interfaces need to be continually improved to handle the ever-increasing diversity and volume of data collections. Many aspects determine the type of search tool a project needs to provide to its user community, including the number of datasets, the amount and consistency of discovery metadata, ancillary information such as the availability of quality information and provenance, and the availability of similar datasets from other distributed sources. The Environmental Data Science and Systems (EDSS) group within the Environmental Science Division at the Oak Ridge National Laboratory has a long history of successfully managing diverse and big observational datasets for various scientific programs via various data centers, such as DOE's Atmospheric Radiation Measurement Program (ARM), DOE's Carbon Dioxide Information and Analysis Center (CDIAC), USGS's Core Science Analytics and Synthesis (CSAS) metadata Clearinghouse, and NASA's Distributed Active Archive Center (ORNL DAAC). This talk will showcase some of the recent developments for improving data discovery within these centers. The DOE ARM program recently developed a data discovery tool which allows users to search and discover over 4000 observational datasets. These datasets are key to the research efforts related to global climate change. The ARM discovery tool features many new functions such as filtered and faceted search logic, multi-pass data selection, filtering data based on data quality, graphical views of data quality and availability, direct access to data quality reports, and data plots. The ARM Archive also provides discovery metadata to other broader metadata clearinghouses such as ESGF, IASOA, and GOS. In addition to the new interface, ARM is also currently working on providing DOI metadata records to publishers such as Thomson Reuters and Elsevier. The ARM

  12. Independently Controlled Wing Stroke Patterns in the Fruit Fly Drosophila melanogaster

    PubMed Central

    Chakraborty, Soma; Bartussek, Jan; Fry, Steven N.; Zapotocky, Martin

    2015-01-01

    Flies achieve supreme flight maneuverability through a small set of minuscule steering muscles attached to the wing base. The fast flight maneuvers arise from precisely timed activation of the steering muscles and the resulting subtle modulation of the wing stroke. In addition, slower modulation of wing kinematics arises from changes in the activity of indirect flight muscles in the thorax. We investigated whether these modulations can be described as a superposition of a limited number of elementary deformations of the wing stroke that are under independent physiological control. Using a high-speed computer vision system, we recorded the wing motion of tethered flying fruit flies for up to 12 000 consecutive wing strokes at a sampling rate of 6250 Hz. We then decomposed the joint motion pattern of both wings into components that had the minimal mutual information (a measure of statistical dependence). In 100 flight segments measured from 10 individual flies, we identified 7 distinct types of frequently occurring least-dependent components, each defining a kinematic pattern (a specific deformation of the wing stroke and the sequence of its activation from cycle to cycle). Two of these stroke deformations can be associated with the control of yaw torque and total flight force, respectively. A third deformation involves a change in the downstroke-to-upstroke duration ratio, which is expected to alter the pitch torque. A fourth kinematic pattern consists of an alternation of stroke amplitude with a period of 2 wingbeat cycles, extending for dozens of cycles. Our analysis indicates that these four elementary kinematic patterns can be activated mutually independently, and occur both in isolation and in linear superposition. The results strengthen the available evidence for independent control of yaw torque, pitch torque, and total flight force. Our computational method facilitates systematic identification of novel patterns in large kinematic datasets. PMID:25710715
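
    The decomposition into least-dependent components is closely related to independent component analysis. The sketch below uses scikit-learn's FastICA as a stand-in on synthetic left/right stroke-amplitude data; it illustrates the idea of unmixing statistically independent drives, not the authors' mutual-information method.

```python
# Toy stand-in for least-dependent component analysis: FastICA unmixes a
# yaw-like (antisymmetric) and a force-like (symmetric) hidden drive from
# synthetic left/right stroke amplitudes.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(1)
n = 5000                                              # wingbeat cycles
yaw = np.sign(rng.normal(size=n))                     # antisymmetric drive
force = rng.laplace(size=n)                           # symmetric drive

left = 1.0 * force + 0.5 * yaw + 0.05 * rng.normal(size=n)
right = 1.0 * force - 0.5 * yaw + 0.05 * rng.normal(size=n)

X = np.column_stack([left, right])
sources = FastICA(n_components=2, random_state=0).fit_transform(X)
# Each recovered source should correlate strongly with one hidden drive.
for i in range(2):
    print(i, np.corrcoef(sources[:, i], yaw)[0, 1],
          np.corrcoef(sources[:, i], force)[0, 1])
```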

  13. SU-E-T-32: A Feasibility Study of Independent Dose Verification for IMAT

    SciTech Connect

    Kamima, T; Takahashi, R; Sato, Y; Baba, H; Tachibana, H; Yamashita, M; Sugawara, Y

    2015-06-15

    Purpose: To assess the feasibility of independent dose verification (Indp) for intensity modulated arc therapy (IMAT). Methods: An independent dose calculation software program (Simple MU Analysis, Triangle Products, JP) was used in this study, which computes the radiological path length from the surface to the reference point for each control point using the patient's CT image dataset, while the MLC aperture shape is simultaneously modeled from the MLC information in the DICOM-RT plan. Dose calculation was performed using a modified Clarkson method considering MLC transmission and the dosimetric leaf gap. IMAT plans from 120 patients across two sites (prostate, head and neck) from four institutes were retrospectively analyzed to compare the Indp to the TPS using patient CT images. In addition, an ion-chamber measurement was performed to verify the accuracy of the TPS and the Indp in a water-equivalent phantom. Results: The agreements between the Indp and the TPS (mean±1SD) were −0.8±2.4% and −1.3±3.8% for the prostate and head-and-neck regions, respectively. The measurement comparison showed similar results (−0.8±1.6% and 0.1±4.6% for prostate and head and neck). The variation was larger in the head and neck because those plans contain more segments in which the reference point lies under the MLC, and the modified Clarkson method cannot model the smooth falloff of the leaf penumbra. Conclusion: The independent verification program is practical and effective as a secondary check for IMAT, with sufficient accuracy in both the measurement and the CT-based calculation. The accuracy would improve further if the falloff of the leaf penumbra were taken into account.
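
    One ingredient named above, the radiological path length from the surface to the reference point, can be sketched as density sampling along a ray through the CT volume. The code below is a toy illustration on a synthetic phantom, not the Simple MU Analysis implementation; a clinical tool would convert HU to density and interpolate properly.

```python
# Toy sketch of radiological depth: sample relative electron density along
# the ray from the beam entry point to the reference point. Synthetic data.
import numpy as np

density = np.ones((100, 100, 100))            # water-equivalent phantom
density[:, :, :10] = 0.0                      # air above the surface

def radiological_depth(density, entry, point, step_mm=1.0):
    entry, point = np.asarray(entry, float), np.asarray(point, float)
    direction = point - entry
    length = np.linalg.norm(direction)
    direction /= length
    depth, t = 0.0, 0.0
    while t < length:
        idx = tuple(np.round(entry + t * direction).astype(int))
        depth += density[idx] * step_mm       # water-equivalent mm per step
        t += step_mm
    return depth

print(radiological_depth(density, entry=(50, 50, 0), point=(50, 50, 60)))
# ~50 mm: the first 10 voxels along the ray are air and contribute nothing.
```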

  14. Dataset used to improve liquid water absorption models in the microwave

    DOE Data Explorer

    Turner, David

    2015-12-14

    Two datasets, one a compilation of laboratory data and one a compilation from three field sites, are provided here. These datasets provide measurements of the real and imaginary refractive indices and absorption as a function of cloud temperature. These datasets were used in the development of the new liquid water absorption model that was published in Turner et al. 2015.

  15. MATCH: Metadata Access Tool for Climate and Health Datasets

    DOE Data Explorer

    MATCH is a searchable clearinghouse of publicly available Federal metadata (i.e. data about data) and links to datasets. Most metadata on MATCH pertain to geospatial data sets ranging from local to global scales. The goals of MATCH are to: 1) Provide an easily accessible clearinghouse of relevant Federal metadata on climate and health that will increase efficiency in solving research problems; 2) Promote application of research and information to understand, mitigate, and adapt to the health effects of climate change; 3) Facilitate multidirectional communication among interested stakeholders to inform and shape Federal research directions; 4) Encourage collaboration among traditional and non-traditional partners in development of new initiatives to address emerging climate and health issues. [copied from http://match.globalchange.gov/geoportal/catalog/content/about.page]

  16. Geocoding and stereo display of tropical forest multisensor datasets

    NASA Technical Reports Server (NTRS)

    Welch, R.; Jordan, T. R.; Luvall, J. C.

    1990-01-01

    Concern about the future of tropical forests has led to a demand for geocoded multisensor databases that can be used to assess forest structure, deforestation, thermal response, evapotranspiration, and other parameters linked to climate change. In response to studies being conducted at the Braulio Carrillo National Park, Costa Rica, digital satellite and aircraft images recorded by Landsat TM, SPOT HRV, Thermal Infrared Multispectral Scanner, and Calibrated Airborne Multispectral Scanner sensors were placed in register using the Landsat TM image as the reference map. Despite problems caused by relief, multitemporal datasets, and geometric distortions in the aircraft images, registration was accomplished to within ±20 m (±1 data pixel). A digital elevation model constructed from a multisensor Landsat TM/SPOT stereopair proved useful for generating perspective views of the rugged, forested terrain.

  17. Orthology Detection Combining Clustering and Synteny for Very Large Datasets

    PubMed Central

    Lechner, Marcus; Hernandez-Rosales, Maribel; Doerr, Daniel; Wieseke, Nicolas; Thévenin, Annelyse; Stoye, Jens; Hartmann, Roland K.; Prohaska, Sonja J.; Stadler, Peter F.

    2014-01-01

    The elucidation of orthology relationships is an important step both in gene function prediction as well as towards understanding patterns of sequence evolution. For large datasets, orthology assignments are usually derived directly from sequence similarities, because more exact approaches incur prohibitively high computational costs. Here we present PoFF, an extension for the standalone tool Proteinortho, which enhances orthology detection by combining clustering, sequence similarity, and synteny. In the course of this work, FFAdj-MCS, a heuristic that assesses pairwise gene order using adjacencies (a similarity measure related to the breakpoint distance), was adapted to support multiple linear chromosomes and extended to detect duplicated regions. PoFF largely reduces the number of false positives and enables more fine-grained predictions than purely similarity-based approaches. The extension maintains the low memory requirements and the efficient concurrency options of its basis Proteinortho, making the software applicable to very large datasets. PMID:25137074
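
    The adjacency measure underlying FFAdj-MCS can be illustrated compactly: two gene orders are compared by counting the unordered neighbour pairs they share. A toy sketch with hypothetical gene identifiers (the real heuristic operates on assigned orthologs across multiple chromosomes):

```python
# Toy sketch of adjacency-based gene-order similarity (a breakpoint-distance
# relative): count unordered neighbour pairs shared by two gene orders.
def adjacencies(order):
    return {frozenset(p) for p in zip(order, order[1:])}

def shared_adjacencies(order_a, order_b):
    return len(adjacencies(order_a) & adjacencies(order_b))

genome_a = ["g1", "g2", "g3", "g4", "g5"]
genome_b = ["g2", "g1", "g3", "g4", "g5"]      # one local rearrangement
print(shared_adjacencies(genome_a, genome_b))   # 3 of 4 adjacencies kept
```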

  18. Recovering complete and draft population genomes from metagenome datasets

    SciTech Connect

    Sangwan, Naseer; Xia, Fangfang; Gilbert, Jack A.

    2016-03-08

    Assembly of metagenomic sequence data into microbial genomes is of fundamental value to improving our understanding of microbial ecology and metabolism by elucidating the functional potential of hard-to-culture microorganisms. Here, we provide a synthesis of available methods to bin metagenomic contigs into species-level groups and highlight how genetic diversity, sequencing depth, and coverage influence binning success. Despite the computational cost on application to deeply sequenced complex metagenomes (e.g., soil), covarying patterns of contig coverage across multiple datasets significantly improve the binning process. We also discuss and compare current genome validation methods and reveal how these methods tackle the problem of chimeric genome bins, i.e., sequences from multiple species. Finally, we explore how population genome assembly can be used to uncover biogeographic trends and to characterize the effect of in situ functional constraints on the genome-wide evolution.
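
    The covarying-coverage idea can be shown with a toy differential-coverage binning sketch: contigs from the same genome rise and fall together in coverage across samples, so clustering their coverage profiles recovers the bins. The code below uses synthetic data and k-means as a simple stand-in; real binners also exploit sequence composition (k-mer frequencies).

```python
# Toy differential-coverage binning: cluster contigs by their per-sample
# coverage profiles. Synthetic data; not a production binner.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
n_samples = 6
abundance = {"genomeA": rng.uniform(5, 50, n_samples),
             "genomeB": rng.uniform(5, 50, n_samples)}

contigs, labels = [], []
for name, ab in abundance.items():
    for _ in range(40):                        # 40 contigs per genome
        contigs.append(rng.poisson(ab))        # per-sample coverage vector
        labels.append(name)

X = np.log1p(np.array(contigs, float))
bins = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
for b in (0, 1):
    members = [l for l, k in zip(labels, bins) if k == b]
    print("bin", b, {g: members.count(g) for g in set(members)})
```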

  19. Recovering complete and draft population genomes from metagenome datasets

    DOE PAGES

    Sangwan, Naseer; Xia, Fangfang; Gilbert, Jack A.

    2016-03-08

    Assembly of metagenomic sequence data into microbial genomes is of fundamental value to improving our understanding of microbial ecology and metabolism by elucidating the functional potential of hard-to-culture microorganisms. Here, we provide a synthesis of available methods to bin metagenomic contigs into species-level groups and highlight how genetic diversity, sequencing depth, and coverage influence binning success. Despite the computational cost on application to deeply sequenced complex metagenomes (e.g., soil), covarying patterns of contig coverage across multiple datasets significantly improve the binning process. We also discuss and compare current genome validation methods and reveal how these methods tackle the problem of chimeric genome bins, i.e., sequences from multiple species. Finally, we explore how population genome assembly can be used to uncover biogeographic trends and to characterize the effect of in situ functional constraints on the genome-wide evolution.

  20. Biofuel Production Datasets from DOE's Bioenergy Knowledge Discovery Framework (KDF)

    DOE Data Explorer

    The Bioenergy Knowledge Discovery Framework invites users to discover the power of bioenergy through an interface that provides extensive access to research data and literature, GIS mapping tools, and collaborative networks. The Bioenergy KDF supports efforts to develop a robust and sustainable bioenergy industry. The KDF facilitates informed decision making by providing a means to synthesize, analyze, and visualize vast amounts of information in a relevant and succinct manner. It harnesses Web 2.0 and social networking technologies to build a collective knowledge system that can better examine the economic and environmental impacts of development options for biomass feedstock production, biorefineries, and related infrastructure. [copied from https://www.bioenergykdf.net/content/about]

    Holdings include datasets, models, and maps, and the collections are growing due to both DOE contributions and data uploads from individuals.

  1. Feedstock Logistics Datasets from DOE's Bioenergy Knowledge Discovery Framework (KDF)

    DOE Data Explorer

    The Bioenergy Knowledge Discovery Framework invites users to discover the power of bioenergy through an interface that provides extensive access to research data and literature, GIS mapping tools, and collaborative networks. The Bioenergy KDF supports efforts to develop a robust and sustainable bioenergy industry. The KDF facilitates informed decision making by providing a means to synthesize, analyze, and visualize vast amounts of information in a relevant and succinct manner. It harnesses Web 2.0 and social networking technologies to build a collective knowledge system that can better examine the economic and environmental impacts of development options for biomass feedstock production, biorefineries, and related infrastructure. Holdings include datasets, models, and maps. [from https://www.bioenergykdf.net/content/about]

  2. Biofuel Distribution Datasets from the Bioenergy Knowledge Discovery Framework

    DOE Data Explorer

    The Bioenergy Knowledge Discovery Framework invites users to discover the power of bioenergy through an interface that provides extensive access to research data and literature, GIS mapping tools, and collaborative networks. The Bioenergy KDF supports efforts to develop a robust and sustainable bioenergy industry. The KDF facilitates informed decision making by providing a means to synthesize, analyze, and visualize vast amounts of information in a relevant and succinct manner. It harnesses Web 2.0 and social networking technologies to build a collective knowledge system that can better examine the economic and environmental impacts of development options for biomass feedstock production, biorefineries, and related infrastructure. [copied from https://www.bioenergykdf.net/content/about] Holdings include datasets, models, and maps and the collections are growing due to both DOE contributions and individuals' data uploads.

  3. Feedstock Production Datasets from the Bioenergy Knowledge Discovery Framework

    DOE Data Explorer

    The Bioenergy Knowledge Discovery Framework invites users to discover the power of bioenergy through an interface that provides extensive access to research data and literature, GIS mapping tools, and collaborative networks. The Bioenergy KDF supports efforts to develop a robust and sustainable bioenergy industry. The KDF facilitates informed decision making by providing a means to synthesize, analyze, and visualize vast amounts of information in a relevant and succinct manner. It harnesses Web 2.0 and social networking technologies to build a collective knowledge system that can better examine the economic and environmental impacts of development options for biomass feedstock production, biorefineries, and related infrastructure. [copied from https://www.bioenergykdf.net/content/about] Holdings include datasets, models, and maps and the collections are growing due to both DOE contributions and data uploads from individuals.

  4. Cluster analysis of long time-series medical datasets

    NASA Astrophysics Data System (ADS)

    Hirano, Shoji; Tsumoto, Shusaku

    2004-04-01

    This paper presents a comparative study of the characteristics of clustering methods for inhomogeneous time-series medical datasets. Using various combinations of comparison methods and grouping methods, we performed clustering experiments on the hepatitis data set and evaluated the validity of the results. The results suggested that (1) the complete-linkage (CL) criterion in agglomerative hierarchical clustering (AHC) outperformed the average-linkage (AL) criterion in terms of the interpretability of the dendrogram and the clustering results, (2) the combination of dynamic time warping (DTW) and CL-AHC consistently produced interpretable results, (3) the combination of DTW and rough clustering (RC) can be used to find the core sequences of the clusters, and (4) multiscale matching may suffer from the treatment of 'no-match' pairs; however, the problem may be avoided by using RC as a subsequent grouping method.
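
    A minimal sketch of the best-performing combination reported above, DTW distances fed into complete-linkage AHC, is given below using SciPy on toy sequences; the hepatitis series themselves are not reproduced here.

```python
# Sketch of DTW + complete-linkage agglomerative clustering on toy series.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def dtw(a, b):
    """Classic O(nm) dynamic time warping distance."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

x = np.linspace(0, 2 * np.pi, 40)
series = [np.sin(x), np.sin(x + 0.4), np.cos(x), np.cos(x + 0.4)]

n = len(series)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        dist[i, j] = dist[j, i] = dtw(series[i], series[j])

Z = linkage(squareform(dist), method="complete")   # CL-AHC
print(fcluster(Z, t=2, criterion="maxclust"))      # expected: sin vs cos groups
```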

  5. Probing cosmic acceleration by using the SNLS3 SNIa dataset

    SciTech Connect

    Li, Xiao-Dong; Wang, Shuang; Zhang, Wen-Shuai; Li, Song; Huang, Qing-Guo; Li, Miao E-mail: sli@itp.ac.cn E-mail: wszhang@mail.ustc.edu.cn E-mail: mli@itp.ac.cn

    2011-07-01

    We probe the cosmic acceleration by using the recently released SNLS3 sample of 472 type Ia supernovae. Combining this type Ia supernova dataset with the cosmic microwave background anisotropy data from the Wilkinson Microwave Anisotropy Probe 7-yr observations, the baryon acoustic oscillation results from the Sloan Digital Sky Survey data release 7, and the Hubble constant measurement from the Wide Field Camera 3 on the Hubble Space Telescope, we measure the dark energy equation of state w and the deceleration parameter q as functions of redshift by using the Chevallier-Polarski-Linder parametrization. Our result is consistent with a cosmological constant at the 1σ confidence level, without evidence for a recent slowing down of the cosmic acceleration. Furthermore, we consider three binned parametrizations (w is piecewise constant in redshift z) based on different binning methods. Similar results are obtained, i.e., the ΛCDM model remains nicely compatible with current observations.
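
    For reference, the Chevallier-Polarski-Linder parametrization named above is the standard two-parameter form; the second relation is the textbook expression for the deceleration parameter in a flat universe, not the paper's own notation:

```latex
% CPL equation of state: w -> w_0 today (z = 0) and w_0 + w_a at high z.
% Deceleration parameter assumes flatness (Omega_m + Omega_DE = 1).
\[
  w(z) = w_0 + w_a\,\frac{z}{1+z},
  \qquad
  q(z) = \frac{1}{2}\left[\,1 + 3\,w(z)\,\Omega_{\mathrm{DE}}(z)\,\right]
\]
```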

  6. Flow cytometry dataset for cells collected from touched surfaces

    PubMed Central

    Kwon, Ye Jin; Stanciu, Cristina E.; Philpott, M. Katherine; Ehrhardt, Christopher J.

    2016-01-01

    ‘Touch’ or trace cell mixtures submitted as evidence are a significant problem for forensic laboratories as they can render resulting genetic profiles difficult or even impossible to interpret. Optical signatures that distinguish epidermal cell populations from different contributors could facilitate the physical separation of mixture components prior to genetic analysis, and potentially the downstream production of single source profiles and/or simplified mixtures.  This dataset comprises the results from antibody hybridization surveys using Human Leukocyte Antigen (HLA) and Cytokeratin (CK) probes, as well as surveys of optical properties of deposited cells, including forward scatter (FSC), side scatter (SSC), and fluorescence emissions in the Allophycocyanin (APC) channel.  All analyses were performed on “touch” samples deposited by several different contributors on multiple days to assess inter- and intra-contributor variability. PMID:28105303

  7. Incorporating the TRMM Dataset into the GPM Mission Data Suite

    NASA Technical Reports Server (NTRS)

    Stocker, Erich Franz; Ji, Yimin; Chou, Joyce; Kelley, Owen; Kwiatkowski, John; Stout, John

    2016-01-01

    In June 2015 the TRMM satellite came to its end. The 17-plus years of mission data that it provided have proven a valuable asset to a variety of science communities. This 17-plus-year data set does not, however, stagnate with the end of the mission itself. NASA/JAXA intend to integrate the TRMM data set into the data suite of the GPM mission. This will ensure the creation of a consistent, intercalibrated, accurate dataset within GPM that extends back to November of 1998. This paper describes the plans for incorporating the TRMM 17-plus-year data into the GPM data suite. These plans call for using GPM algorithms for both radiometer and radar to reprocess TRMM data, as well as intercalibrating partner radiometers using GPM intercalibration techniques. This reprocessing will mean changes in content, logical format, and physical format, as well as improved geolocation, sensor corrections, and retrieval techniques.

  8. Polylactides in additive biomanufacturing.

    PubMed

    Poh, Patrina S P; Chhaya, Mohit P; Wunner, Felix M; De-Juan-Pardo, Elena M; Schilling, Arndt F; Schantz, Jan-Thorsten; van Griensven, Martijn; Hutmacher, Dietmar W

    2016-12-15

    New advanced manufacturing technologies under the alias of additive biomanufacturing allow the design and fabrication of a range of products from pre-operative models, cutting guides and medical devices to scaffolds. The process of printing in 3 dimensions of cells, extracellular matrix (ECM) and biomaterials (bioinks, powders, etc.) to generate in vitro and/or in vivo tissue analogue structures has been termed bioprinting. To advance additive biomanufacturing further, there are many aspects that can be learned from the wider additive manufacturing (AM) industry, which has progressed tremendously since its introduction into the manufacturing sector. First, this review gives an overview of additive manufacturing and of both industry and academia efforts in addressing specific challenges in AM technologies to drive toward an AM-enabled industrial revolution. Considerations of poly(lactides) as a biomaterial in additive biomanufacturing are then discussed. Challenges in the wider additive biomanufacturing field are discussed in terms of (a) biomaterials; (b) computer-aided design, engineering and manufacturing; (c) AM and additive biomanufacturing printer hardware; and (d) system integration. Finally, the outlook for additive biomanufacturing is discussed.

  9. Additive Manufactured Product Integrity

    NASA Technical Reports Server (NTRS)

    Waller, Jess; Wells, Doug; James, Steve; Nichols, Charles

    2017-01-01

    NASA is providing key leadership in an international effort linking NASA and non-NASA resources to speed adoption of additive manufacturing (AM) to meet NASA's mission goals. Participants include industry, NASA's space partners, other government agencies, standards organizations and academia. Nondestructive Evaluation (NDE) is identified as a universal need for all aspects of additive manufacturing.

  10. Alaska national hydrography dataset positional accuracy assessment study

    USGS Publications Warehouse

    Arundel, Samantha; Yamamoto, Kristina H.; Constance, Eric; Mantey, Kim; Vinyard-Houx, Jeremy

    2013-01-01

    Initial visual assessments show a wide range in the quality of fit between features in the NHD and these new image sources. No statistical analysis has been performed to actually quantify accuracy. Determining absolute accuracy is cost prohibitive (it requires collecting independent, well-defined test points), but quantitative analysis of relative positional error is feasible.

  11. Displaying Planetary and Geophysical Datasets on NOAA's Science On a Sphere (TM)

    NASA Astrophysics Data System (ADS)

    Albers, S. C.; MacDonald, A. E.; Himes, D.

    2005-12-01

    NOAA's Science On a Sphere(TM) (SOS) was developed to educate current and future generations about the changing Earth and its processes. This system presents NOAA's global science through a 3D representation of our planet as if the viewer were looking at the Earth from outer space. In our presentation, we will describe the preparation of various global datasets for display on Science On a Sphere(TM), a 1.7-m diameter spherical projection system developed and patented at the Forecast Systems Laboratory (FSL) in Boulder, Colorado. Four projectors cast rotating images onto a spherical projection screen to create the effect of the Earth, a planet, or a satellite floating in space. A static dataset can be prepared for display using popular image formats such as JPEG, usually sized at 1024x2048 or 2048x4096 pixels. A set of static images in a directory will comprise a movie. Imagery and data for SOS are obtained from a variety of government organizations, sometimes post-processed by various individuals, including the authors. Some datasets are already available in the required cylindrical projection. Readily available planetary maps can often be improved in coverage and/or appearance by reprojecting and combining additional images and mosaics obtained by various spacecraft, such as Voyager, Galileo, and Cassini. A map of Mercury was produced by blending some Mariner 10 photo-mosaics with a USGS shaded-relief map. An improved high-resolution map of Venus was produced by combining several Magellan mosaics, supplied by The Planetary Society, along with other spacecraft data. We now have a full set of Jupiter's Galilean satellite imagery that we can display on Science On a Sphere(TM). Photo-mosaics of several Saturnian satellites were updated by reprojecting and overlaying recently taken Cassini flyby images. Maps of imagery from five Uranian satellites were added, as well as one for Neptune. More image processing was needed to add a high-resolution Voyager mosaic to a pre-existing map

  12. Sequence independent amplification of DNA

    DOEpatents

    Bohlander, Stefan K.

    1998-01-01

    The present invention is a rapid sequence-independent amplification procedure (SIA). Even minute amounts of DNA from various sources can be amplified independent of any sequence requirements of the DNA or any a priori knowledge of any sequence characteristics of the DNA to be amplified. This method allows, for example the sequence independent amplification of microdissected chromosomal material and the reliable construction of high quality fluorescent in situ hybridization (FISH) probes from YACs or from other sources. These probes can be used to localize YACs on metaphase chromosomes but also--with high efficiency--in interphase nuclei.

  13. Sequence independent amplification of DNA

    DOEpatents

    Bohlander, S.K.

    1998-03-24

    The present invention is a rapid sequence-independent amplification procedure (SIA). Even minute amounts of DNA from various sources can be amplified independent of any sequence requirements of the DNA or any a priori knowledge of any sequence characteristics of the DNA to be amplified. This method allows, for example, the sequence independent amplification of microdissected chromosomal material and the reliable construction of high quality fluorescent in situ hybridization (FISH) probes from YACs or from other sources. These probes can be used to localize YACs on metaphase chromosomes but also--with high efficiency--in interphase nuclei. 25 figs.

  14. Polyimide processing additives

    NASA Technical Reports Server (NTRS)

    Fletcher, James C. (Inventor); Pratt, J. Richard (Inventor); St.clair, Terry L. (Inventor); Stoakley, Diane M. (Inventor); Burks, Harold D. (Inventor)

    1992-01-01

    A process for preparing polyimides having enhanced melt flow properties is described. The process consists of heating a mixture of a high molecular weight poly-(amic acid) or polyimide with a low molecular weight amic acid or imide additive in the range of 0.05 to 15 percent by weight of additive. The polyimide powders so obtained show improved processability, as evidenced by lower melt viscosity by capillary rheometry. Likewise, films prepared from mixtures of polymers with additives show improved processability with earlier onset of stretching by TMA.

  15. Polyimide processing additives

    NASA Technical Reports Server (NTRS)

    Pratt, J. Richard (Inventor); St.clair, Terry L. (Inventor); Stoakley, Diane M. (Inventor); Burks, Harold D. (Inventor)

    1993-01-01

    A process for preparing polyimides having enhanced melt flow properties is described. The process consists of heating a mixture of a high molecular weight poly-(amic acid) or polyimide with a low molecular weight amic acid or imide additive in the range of 0.05 to 15 percent by weight of the additive. The polyimide powders so obtained show improved processability, as evidenced by lower melt viscosity by capillary rheometry. Likewise, films prepared from mixtures of polymers with additives show improved processability with earlier onset of stretching by TMA.

  16. Chandra Independently Determines Hubble Constant

    NASA Astrophysics Data System (ADS)

    2006-08-01

    A critically important number that specifies the expansion rate of the Universe, the so-called Hubble constant, has been independently determined using NASA's Chandra X-ray Observatory. This new value matches recent measurements using other methods and extends their validity to greater distances, thus allowing astronomers to probe earlier epochs in the evolution of the Universe. "The reason this result is so significant is that we need the Hubble constant to tell us the size of the Universe, its age, and how much matter it contains," said Max Bonamente from the University of Alabama in Huntsville and NASA's Marshall Space Flight Center (MSFC) in Huntsville, Ala., lead author on the paper describing the results. "Astronomers absolutely need to trust this number because we use it for countless calculations." [Illustration: Sunyaev-Zeldovich Effect] The Hubble constant is calculated by measuring the speed at which objects are moving away from us and dividing by their distance. Most of the previous attempts to determine the Hubble constant have involved using a multi-step, or distance-ladder, approach in which the distance to nearby galaxies is used as the basis for determining greater distances. The most common approach has been to use a well-studied type of pulsating star known as a Cepheid variable, in conjunction with more distant supernovae, to trace distances across the Universe. Scientists using this method and observations from the Hubble Space Telescope were able to measure the Hubble constant to within 10%. However, only independent checks would give them the confidence they desired, considering that much of our understanding of the Universe hangs in the balance. [Chandra X-ray image: MACS J1149.5+223] By combining X-ray data from Chandra with radio observations of galaxy clusters, the team determined the distances to 38 galaxy clusters ranging from 1.4 billion to 9.3 billion
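
    The verbal definition above corresponds to the usual form of Hubble's law:

```latex
% Hubble's law: recession velocity v divided by distance d gives H_0.
\[
  v = H_0\,d \quad\Longrightarrow\quad H_0 = \frac{v}{d}
\]
```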

  17. Atlas Toolkit: Fast registration of 3D morphological datasets in the absence of landmarks

    PubMed Central

    Grocott, Timothy; Thomas, Paul; Münsterberg, Andrea E.

    2016-01-01

    Image registration is a gateway technology for Developmental Systems Biology, enabling computational analysis of related datasets within a shared coordinate system. Many registration tools rely on landmarks to ensure that datasets are correctly aligned; yet suitable landmarks are not present in many datasets. Atlas Toolkit is a Fiji/ImageJ plugin collection offering elastic group-wise registration of 3D morphological datasets, guided by segmentation of the interesting morphology. We demonstrate the method by combinatorial mapping of cell signalling events in the developing eyes of chick embryos, and use the integrated datasets to predictively enumerate Gene Regulatory Network states. PMID:26864723

  18. Food Additives and Hyperkinesis

    ERIC Educational Resources Information Center

    Wender, Ester H.

    1977-01-01

    The hypothesis that food additives are causally associated with hyperkinesis and learning disabilities in children is reviewed, and available data are summarized. Available from: American Medical Association 535 North Dearborn Street Chicago, Illinois 60610. (JG)

  19. Smog control fuel additives

    SciTech Connect

    Lundby, W.

    1993-06-29

    A method is described for controlling, reducing, or eliminating ozone and related smog resulting from photochemical reactions between ozone and automotive or industrial gases, comprising the addition of iodine or compounds of iodine to hydrocarbon-base fuels prior to or during combustion in an amount of about 1 part iodine per 240 to 10,000,000 parts fuel, by weight, to be accomplished by: (a) the addition of these inhibitors during or after the refining or manufacturing process of liquid fuels; (b) the production of these inhibitors for addition into fuel tanks, such as automotive or industrial tanks; or (c) the addition of these inhibitors into combustion chambers of equipment utilizing solid fuels for the purpose of reducing ozone.

  20. A New Dataset of Automatically Extracted Structure of Arms and Bars in Spiral Galaxies

    NASA Astrophysics Data System (ADS)

    Hayes, Wayne B.; Davis, D.

    2012-05-01

    We present an algorithm capable of automatically extracting quantitative structure (bars and arms) from images of spiral galaxies. We have run the algorithm on 30,000 galaxies and compared the results to human classifications generously provided pre-publication by the Galaxy Zoo 2 team. In all available measures, our algorithm agrees with the humans about as well as they agree with each other. In addition we provide objective, quantitative measures not available in human classifications. We provide a preliminary analysis of this dataset to see how the properties of arms and bars vary as a function of basic variables such as environment, redshift, absolute magnitude, and color. We also show how structure can vary across wavebands as well as along and across individual arms and bars. Finally, we present preliminary results of a measurement of the total angular momentum present in our observed set of galaxies with an eye towards determining if there is a preferred "handedness" in the universe.

  1. Compiling a Comprehensive EVA Training Dataset for NASA Astronauts

    NASA Technical Reports Server (NTRS)

    Laughlin, M. S.; Murray, J. D.; Lee, L. R.; Wear, M. L.; Van Baalen, M.

    2016-01-01

    Training for a spacewalk or extravehicular activity (EVA) is considered a hazardous duty for NASA astronauts. This places astronauts at risk for decompression sickness as well as various musculoskeletal disorders from working in the spacesuit. As a result, the operational and research communities over the years have requested access to EVA training data to supplement their studies. The purpose of this paper is to document the comprehensive EVA training data set that was compiled from multiple sources by the Lifetime Surveillance of Astronaut Health (LSAH) epidemiologists to investigate musculoskeletal injuries. The EVA training dataset does not contain any medical data, rather it only documents when EVA training was performed, by whom and other details about the session. The first activities practicing EVA maneuvers in water were performed at the Neutral Buoyancy Simulator (NBS) at the Marshall Spaceflight Center in Huntsville, Alabama. This facility opened in 1967 and was used for EVA training until the early Space Shuttle program days. Although several photographs show astronauts performing EVA training in the NBS, records detailing who performed the training and the frequency of training are unavailable. Paper training records were stored within the NBS after it was designated as a National Historic Landmark in 1985 and closed in 1997, but significant resources would be needed to identify and secure these records, and at this time LSAH has not pursued acquisition of these early training records. Training in the NBS decreased when the Johnson Space Center in Houston, Texas, opened the Weightless Environment Training Facility (WETF) in 1980. Early training records from the WETF consist of 11 hand-written dive logbooks compiled by individual workers that were digitized at the request of LSAH. The WETF was integral in the training for Space Shuttle EVAs until its closure in 1998. The Neutral Buoyancy Laboratory (NBL) at the Sonny Carter Training Facility near JSC

  2. Independent Schools: Landscape and Learnings.

    ERIC Educational Resources Information Center

    Oates, William A.

    1981-01-01

    Examines American independent schools (parochial, southern segregated, and private institutions) in terms of their funding, expenditures, changing enrollment patterns, teacher-student ratios, and societal functions. Journal available from Daedalus Subscription Department, 1172 Commonwealth Ave., Boston, MA 02132. (AM)

  3. Technology for Independent Living: Sourcebook.

    ERIC Educational Resources Information Center

    Enders, Alexandra, Ed.

    This sourcebook provides information for the practical implementation of independent living technology in the everyday rehabilitation process. "Information Services and Resources" lists databases, clearinghouses, networks, research and development programs, toll-free telephone numbers, consumer protection caveats, selected publications, and…

  4. Experimental interference of independent photons.

    PubMed

    Kaltenbaek, Rainer; Blauensteiner, Bibiane; Zukowski, Marek; Aspelmeyer, Markus; Zeilinger, Anton

    2006-06-23

    Interference of photons emerging from independent sources is essential for modern quantum-information processing schemes, above all quantum repeaters and linear-optics quantum computers. We report an observation of nonclassical interference of two single photons originating from two independent, separated sources, which were actively synchronized with an rms timing jitter of 260 fs. The resulting (two-photon) interference visibility was (83 ± 4)%.

  5. Comparative and Joint Analysis of Two Metagenomic Datasets from a Biogas Fermenter Obtained by 454-Pyrosequencing

    PubMed Central

    Jaenicke, Sebastian; Ander, Christina; Bekel, Thomas; Bisdorf, Regina; Dröge, Marcus; Gartemann, Karl-Heinz; Jünemann, Sebastian; Kaiser, Olaf; Krause, Lutz; Tille, Felix; Zakrzewski, Martha; Pühler, Alfred

    2011-01-01

    Biogas production from renewable resources is attracting increased attention as an alternative energy source due to the limited availability of traditional fossil fuels. Many countries are promoting the use of alternative energy sources for sustainable energy production. In this study, a metagenome from a production-scale biogas fermenter was analysed employing Roche's GS FLX Titanium technology and compared to a previous dataset obtained from the same community DNA sample that was sequenced on the GS FLX platform. Taxonomic profiling based on 16S rRNA-specific sequences and an Environmental Gene Tag (EGT) analysis employing CARMA demonstrated that both approaches benefit from the longer read lengths obtained on the Titanium platform. Results confirmed Clostridia as the most prevalent taxonomic class, whereas species of the order Methanomicrobiales are dominant among methanogenic Archaea. However, the analyses also identified additional taxa that were missed by the previous study, including members of the genera Streptococcus, Acetivibrio, Garciella, Tissierella, and Gelria, which might also play a role in the fermentation process leading to the formation of methane. Taking advantage of the CARMA feature to correlate taxonomic information of sequences with their assigned functions, it appeared that Firmicutes, followed by Bacteroidetes and Proteobacteria, dominate within the functional context of polysaccharide degradation whereas Methanomicrobiales represent the most abundant taxonomic group responsible for methane production. Clostridia is the most important class involved in the reductive CoA pathway (Wood-Ljungdahl pathway) that is characteristic for acetogenesis. Based on binning of 16S rRNA-specific sequences allocated to the dominant genus Methanoculleus, it could be shown that this genus is represented by several different species. Phylogenetic analysis of these sequences placed them in close proximity to the hydrogenotrophic methanogen Methanoculleus

  6. Multielement geochemical dataset of surficial materials for the northern Great Basin

    USGS Publications Warehouse

    Coombs, Mary Jane; Kotlyar, Boris B.; Ludington, Steve; Folger, Helen W.; Mossotti, Victor G.

    2002-01-01

    This report presents geochemical data generated during mineral and environmental assessments for the Bureau of Land Management in northern Nevada, northeastern California, southeastern Oregon, and southwestern Idaho, along with metadata and map representations of selected elements. The dataset presented here is a compilation of chemical analyses of over 10,200 stream-sediment and soil samples originally collected during the National Uranium Resource Evaluation's (NURE) Hydrogeochemical and Stream Sediment Reconnaissance (HSSR) program of the Department of Energy and its predecessors and reanalyzed to support a series of mineral-resource assessments by the U.S. Geological Survey (USGS). The dataset also includes the analyses of additional samples collected by the USGS in 1992. The sample sites are in southeastern Oregon, southwestern Idaho, northeastern California, and, primarily, in northern Nevada. These samples were collected from 1977 to 1983, before the development of most of the present-day large-scale mining infrastructure in northern Nevada. As such, these data may serve as an important baseline for current and future geoenvironmental studies. Largely because of the very diverse analytical methods used by the NURE HSSR program, the original NURE analyses in this area yielded little useful geochemical information. The Humboldt, Malheur-Jordan-Andrews, and Winnemucca-Surprise studies were designed to provide useful geochemical data via improved analytical methods (lower detection levels and higher precision) and, in the Malheur-Jordan-Andrews and Winnemucca Surprise areas, to collect additional stream-sediment samples to increase sampling coverage. The data are provided in *.xls (Microsoft Excel) and *.csv (comma-separated-value) format. We also present graphically 35 elements, interpolated ("gridded") in a geographic information system (GIS) and overlain by major geologic trends, so that users may view the variation in elemental concentrations over the

  7. Aster Global dem Version 3, and New Aster Water Body Dataset

    NASA Astrophysics Data System (ADS)

    Abrams, M.

    2016-06-01

    In 2016, the US/Japan ASTER (Advanced Spaceborne Thermal Emission and Reflection Radiometer) project released Version 3 of the Global DEM (GDEM). This 30 m DEM covers the Earth's surface from 82°N to 82°S, and improves on two earlier versions by correcting some artefacts and filling in areas of missing DEMs through the acquisition of additional data. The GDEM was produced by stereocorrelation of 2 million ASTER scenes, processed on a pixel-by-pixel basis: cloud screening; stacking data from overlapping scenes; removing outlier values; and averaging elevation values. As previously, the GDEM is packaged in ~23,000 1 x 1 degree tiles. Each tile has a DEM file and a NUM file reporting the number of scenes used for each pixel and identifying the source of fill-in data (where persistent clouds prevented computation of an elevation value). An additional dataset was concurrently produced and released: the ASTER Water Body Dataset (AWBD). This 30 m raster product encodes every water pixel as lake, river, or ocean, thus providing a global inland and shoreline water-body mask. Water was identified through spectral analysis algorithms and manual editing. This product was evaluated against the Shuttle Water Body Dataset (SWBD) and the Landsat-based Global Inland Water (GIW) product. The SWBD only covers the Earth between about 60 degrees north and south, so it is not a global product. The GIW only delineates inland water bodies, and does not deal with ocean coastlines. All products are at 30 m postings.
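
    The per-pixel stacking described above can be sketched in a few lines: stack the elevations from overlapping scenes, reject outliers, average, and record the scene count for the NUM file. The outlier threshold below is an illustrative assumption, not the production value.

```python
# Toy sketch of GDEM-style per-pixel stacking: drop cloud-masked scenes,
# reject outliers around the median, average the rest, and count scenes.
import numpy as np

def stack_pixel(elevations, max_dev_m=40.0):
    e = np.asarray(elevations, float)
    e = e[~np.isnan(e)]                       # cloud-masked scenes drop out
    med = np.median(e)
    kept = e[np.abs(e - med) <= max_dev_m]    # remove outlier scenes
    return kept.mean(), len(kept)             # DEM value, NUM scene count

dem, num = stack_pixel([512.0, 515.0, 509.0, 720.0, np.nan])
print(dem, num)   # ~512 m from 3 scenes; the 720 m outlier is rejected
```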

  8. Evaluating the use of different precipitation datasets in flood modelling

    NASA Astrophysics Data System (ADS)

    Akyurek, Zuhal; Soytekin, Arzu

    2016-04-01

    Satellite-based precipitation products, numerical weather prediction model precipitation forecasts, and weather radar precipitation estimates can be a remedy for gauge-sparse regions, especially in flood forecasting studies. However, there is a strong need to evaluate the performance and limitations of these estimates in hydrology. This study compares the Hydro-Estimator precipitation product, Weather Research and Forecasting (WRF) model precipitation, and weather radar values with gauge data in the Samsun-Terme region, located in the eastern Black Sea region of Turkey, which generally receives high rainfall on the north-facing slopes of its mountains. Using different statistical factors, the performance of the precipitation estimates is compared in a point-based and areal-based manner. In the point-based comparisons, three matching methods, the direct matching method (DM), the probability matching method (PMM), and the window correlation matching method (WCMM), are used to make comparisons for the flood event (22.11.2014), which lasted 40 hours. Hourly rainfall data from 13 ground observation stations were used in the analyses. This flood event created a 541 m3/sec peak discharge at the 22-45 discharge observation station and flooding at the downstream end of the basin. It is seen that the general trend of the rainfall is captured well by the radar rainfall estimation, but the radar underestimates the peaks. Moreover, it is observed that the assessment factor (gauge rainfall / radar rainfall estimation) does not depend on the distance between the radar and the gauge station. In the WCMM calculation, it is found that changing the space window from 1x1 to 5x5 does not improve the results dramatically. In the areal-based comparisons, it is found that the time-series distribution of the HE product does not show similarity with the other datasets. Furthermore, the geometry of the subbasins, the size of the area in 2D and 3D, and the average elevation do not have an impact on the mean statistics, RMSE, r and bias calculation for both radar
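
    The direct matching (DM) statistics used in such comparisons reduce to a few lines of NumPy: pair each gauge's hourly series with the radar estimate at the overlying pixel and compute bias, RMSE, correlation, and the assessment factor. The arrays below are synthetic stand-ins for the event data.

```python
# Toy direct-matching comparison between a gauge series and the co-located
# radar estimate. Synthetic stand-in data, not the Samsun-Terme event.
import numpy as np

rng = np.random.default_rng(3)
gauge = rng.gamma(1.5, 2.0, size=40)                        # hourly mm
radar = np.clip(0.8 * gauge + rng.normal(0, 0.5, 40), 0, None)  # low peaks

bias = (radar - gauge).mean()
rmse = np.sqrt(((radar - gauge) ** 2).mean())
r = np.corrcoef(gauge, radar)[0, 1]
af = gauge.sum() / radar.sum()            # assessment factor, as defined above
print(f"bias={bias:.2f} mm, RMSE={rmse:.2f} mm, r={r:.2f}, AF={af:.2f}")
```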

  9. Forest restoration: a global dataset for biodiversity and vegetation structure.

    PubMed

    Crouzeilles, Renato; Ferreira, Mariana S; Curran, Michael

    2016-08-01

    Restoration initiatives are being applied increasingly widely around the world. Billions of dollars have been spent on ecological restoration research and initiatives, but restoration outcomes differ widely among these initiatives, in part because of variable socioeconomic and ecological contexts. Here, we present the most comprehensive dataset gathered to date on forest restoration. It encompasses 269 primary studies across 221 study landscapes in 53 countries and contains 4,645 quantitative comparisons between reference ecosystems (e.g., old-growth forest) and degraded or restored ecosystems for five taxonomic groups (mammals, birds, invertebrates, herpetofauna, and plants) and five measures of vegetation structure reflecting different ecological processes (cover, density, height, biomass, and litter). We selected studies that (1) were conducted in forest ecosystems; (2) had multiple replicate sampling sites to measure indicators of biodiversity and/or vegetation structure in reference and restored and/or degraded ecosystems; and (3) used less-disturbed forests as a reference for the ecosystem under study. We recorded (1) latitude and longitude; (2) study year; (3) country; (4) biogeographic realm; (5) past disturbance type; (6) current disturbance type; (7) forest conversion class; (8) restoration activity; (9) time over which a system has been disturbed; (10) time elapsed since restoration started; (11) the ecological metric used to assess biodiversity; and (12) the quantitative value of the ecological metric of biodiversity and/or vegetation structure for reference and restored and/or degraded ecosystems. These were the most common data available in the selected studies. We also estimated forest cover and configuration in each study landscape using a recently developed 1 km consensus land cover dataset. We measured forest configuration as the (1) mean size of all forest patches; (2) size of the largest forest patch; and (3) edge:area ratio of forest patches (see the sketch below). Global analyses of the
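
    The three configuration measures reduce to connected-component arithmetic on a binary forest mask. The sketch below is an illustrative computation under assumed conventions (4-connected patches, sizes in pixels), not the authors' pipeline; SciPy's ndimage is used for the labelling.

    ```python
    import numpy as np
    from scipy import ndimage

    def forest_configuration(forest):
        """Patch metrics for a binary forest mask (1 = forest, 0 = non-forest):
        mean patch size, largest patch size, and edge:area ratio.
        Sizes are in pixels; scale by the pixel area (~1 km^2 for the
        consensus land-cover product) to convert to areas."""
        forest = np.asarray(forest, dtype=int)
        labels, n_patches = ndimage.label(forest)   # 4-connected components
        if n_patches == 0:
            return {"mean_patch": 0.0, "largest_patch": 0, "edge_area": np.nan}
        sizes = ndimage.sum(forest, labels, index=np.arange(1, n_patches + 1))

        # Edge length: count forest/non-forest transitions along rows and columns
        padded = np.pad(forest, 1)
        edge = np.abs(np.diff(padded, axis=0)).sum() + np.abs(np.diff(padded, axis=1)).sum()

        return {
            "mean_patch": float(sizes.mean()),
            "largest_patch": int(sizes.max()),
            "edge_area": edge / forest.sum(),
        }
    ```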

  10. Additive Manufacturing Infrared Inspection

    NASA Technical Reports Server (NTRS)

    Gaddy, Darrell

    2014-01-01

    Additive manufacturing is a rapid prototyping technology that allows parts to be built in a series of thin layers from plastics, ceramics, and metallics. Metallic additive manufacturing is an emerging form of rapid prototyping that allows complex structures to be built from various metallic powders, and significant time and cost savings have been observed relative to traditional techniques. The technology has advanced significantly over the last decade, although many of the techniques used to inspect parts made by these processes have not advanced comparably or have limitations. Several external-geometry inspection techniques exist, such as Coordinate Measurement Machines (CMM), Laser Scanners, Structured Light Scanning Systems, or even traditional calipers and gages. All of these techniques are limited to external geometry and contours, or must use a contact probe to inspect a limited set of internal dimensions. This presentation documents the development of a real-time dimensional inspection technique and digital quality record for the additive manufacturing process, based on infrared camera imaging and processing techniques.

  11. District nurses prescribing as nurse independent prescribers.

    PubMed

    Downer, Frances; Shepherd, Chew Kim

    2010-07-01

    Nurse prescribing has been established in the UK since 1994; however, limited attention has been paid to the experiences of district nurses adopting this additional role. This phenomenological study explores the experiences of district nurses prescribing as nurse independent prescribers across the West of Scotland. A qualitative Heideggerian approach examined the everyday experiences of independent prescribing among district nurses. A purposive sample was used, and data were collected in audio-taped one-to-one informal interviews. The data were analysed thematically using Colaizzi's seven procedural steps. Overall, these nurses reported that nurse prescribing was a predominantly positive experience. Participants identified improvements in patient care, job satisfaction, level of autonomy, and role development. However, some participants indicated that issues such as support, record keeping, confidence, and ongoing education all strongly influence prescribing practice.

  12. Phenylethynyl Containing Reactive Additives

    NASA Technical Reports Server (NTRS)

    Connell, John W. (Inventor); Smith, Joseph G., Jr. (Inventor); Hergenrother, Paul M. (Inventor)

    2002-01-01

    Phenylethynyl containing reactive additives were prepared from aromatic diamines containing phenylethynyl groups and various ratios of phthalic anhydride and 4-phenylethynylphthalic anhydride in glacial acetic acid to form the imide in one step or in N-methyl-2-pyrrolidinone to form the amide acid intermediate. The reactive additives were mixed in various amounts (10% to 90%) with oligomers containing either terminal or pendent phenylethynyl groups (or both) to reduce the melt viscosity and thereby enhance processability. Upon thermal cure, the additives react and become chemically incorporated into the matrix and effect an increase in crosslink density relative to that of the host resin. This resultant increase in crosslink density has advantageous consequences on the cured resin properties such as higher glass transition temperature and higher modulus as compared to that of the host resin.

  13. Phenylethynyl Containing Reactive Additives

    NASA Technical Reports Server (NTRS)

    Connell, John W. (Inventor); Smith, Joseph G., Jr. (Inventor); Hergenrother, Paul M. (Inventor)

    2002-01-01

    Phenylethynyl containing reactive additives were prepared from aromatic diamines containing phenylethynyl groups and various ratios of phthalic anhydride and 4-phenylethynylphthalic anhydride in glacial acetic acid to form the imide in one step or in N-methyl-2-pyrrolidinone to form the amide acid intermediate. The reactive additives were mixed in various amounts (10% to 90%) with oligomers containing either terminal or pendent phenylethynyl groups (or both) to reduce the melt viscosity and thereby enhance processability. Upon thermal cure, the additives react and become chemically incorporated into the matrix and effect an increase in crosslink density relative to that of the host resin. This resultant increase in crosslink density has advantageous consequences on the cured resin properties such as higher glass transition temperature and higher modulus as compared to that of the host resin.

  14. Impact of survey workflow on precision and accuracy of terrestrial LiDAR datasets

    NASA Astrophysics Data System (ADS)

    Gold, P. O.; Cowgill, E.; Kreylos, O.

    2009-12-01

    Ground-based LiDAR (Light Detection and Ranging) survey techniques are enabling remote visualization and quantitative analysis of geologic features at unprecedented levels of detail. For example, digital terrain models computed from LiDAR data have been used to measure displaced landforms along active faults and to quantify fault-surface roughness. But how accurately do terrestrial LiDAR data represent the true ground surface, and in particular, how internally consistent and precise are the mosaicked LiDAR datasets from which surface models are constructed? Addressing this question is essential for designing survey workflows that capture the necessary level of accuracy for a given project while minimizing survey time and equipment, a key consideration when surveying remote sites. We therefore seek to define a metric that quantifies how scan registration error changes as a function of survey workflow. Specifically, we are using a Trimble GX3D laser scanner to conduct a series of experimental surveys quantifying how common variables in field workflows affect the precision of scan registration. The primary variables we are testing are 1) use of an independently measured network of control points to locate scanner and target positions, 2) the number of known-point locations used to place the scanner and point clouds in 3-D space, 3) the type of target used to measure distances between the scanner and the known points, and 4) setting up the scanner over a known point as opposed to resectioning from known points. Precision of the registered point cloud is quantified in Trimble Realworks software by automatic calculation of registration errors (differences between the locations of the same known points in different scans). Accuracy of the registered cloud (i.e., its agreement with the ground truth) will be measured in subsequent experiments. To obtain an independent measure of scan-registration errors and to better visualize the effects of these errors on a registered point
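
    The registration-error calculation itself (distances between the same known points as located in different, already-registered scans) is simple once matched control-point coordinates are exported; this sketch assumes such an export and is not tied to the Realworks implementation.

    ```python
    import numpy as np

    def registration_error(points_a, points_b):
        """Per-point and summary registration errors between two scans.

        points_a, points_b : (n, 3) arrays of the same control points as
        located in each registered scan, matched row by row."""
        a = np.asarray(points_a, dtype=float)
        b = np.asarray(points_b, dtype=float)
        d = np.linalg.norm(a - b, axis=1)   # 3-D distance per control point
        return {"rms": float(np.sqrt(np.mean(d ** 2))),
                "max": float(d.max()),
                "per_point": d}
    ```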

  15. Additives in plastics.

    PubMed Central

    Deanin, R D

    1975-01-01

    The polymers used in plastics are generally harmless. However, they are rarely used in pure form. In almost all commercial plastics, they are "compounded" with monomeric ingredients to improve their processing and end-use performance. In order of total volume used, these monomeric additives may be classified as follows: reinforcing fibers, fillers, and coupling agents; plasticizers; colorants; stabilizers (halogen stabilizers, antioxidants, ultraviolet absorbers, and biological preservatives); processing aids (lubricants, others, and flow controls); flame retardants; peroxides; and antistats. Some information is already available, and much more is needed, on potential toxicity and safe handling of these additives during processing and manufacture of plastics products. PMID:1175566

  16. Additives in plastics.

    PubMed

    Deanin, R D

    1975-06-01

    The polymers used in plastics are generally harmless. However, they are rarely used in pure form. In almost all commercial plastics, they are "compounded" with monomeric ingredients to improve their processing and end-use performance. In order of total volume used, these monomeric additives may be classified as follows: reinforcing fibers, fillers, and coupling agents; plasticizers; colorants; stabilizers (halogen stabilizers, antioxidants, ultraviolet absorbers, and biological preservatives); processing aids (lubricants, others, and flow controls); flame retardants; peroxides; and antistats. Some information is already available, and much more is needed, on potential toxicity and safe handling of these additives during processing and manufacture of plastics products.

  17. A dataset from bottom trawl survey around Taiwan

    PubMed Central

    Shao, Kwang-Tsao; Lin, Jack; Wu, Chung-Han; Yeh, Hsin-Ming; Cheng, Tun-Yuan

    2012-01-01

    Bottom trawl fishery is one of the most important coastal fisheries in Taiwan, in both production and economic value. However, its annual production has been declining since the 1980s owing to overfishing, and its bycatch seriously damages the fishery resource. The government therefore banned bottom trawling within 3 nautical miles of the shoreline in 1989. To evaluate the effectiveness of this policy, a four-year survey was conducted from 2000 to 2003 in the waters around Taiwan and the Penghu (Pescadores) Islands, covering one region each year. All fish specimens collected by trawling were brought back to the laboratory for identification, counting of individuals, and body weight measurement. These raw data have been integrated into the Taiwan Fish Database (http://fishdb.sinica.edu.tw) and have also been published through TaiBIF (http://taibif.tw), FishBase, and GBIF (website see below). This dataset contains 631 fish species and 3,529 records, making it the most complete record of the demersal fish fauna and its temporal and spatial distribution on soft marine habitats in Taiwan. PMID:22707908

  18. A dataset from bottom trawl survey around Taiwan.

    PubMed

    Shao, Kwang-Tsao; Lin, Jack; Wu, Chung-Han; Yeh, Hsin-Ming; Cheng, Tun-Yuan

    2012-01-01

    Bottom trawl fishery is one of the most important coastal fisheries in Taiwan, in both production and economic value. However, its annual production has been declining since the 1980s owing to overfishing, and its bycatch seriously damages the fishery resource. The government therefore banned bottom trawling within 3 nautical miles of the shoreline in 1989. To evaluate the effectiveness of this policy, a four-year survey was conducted from 2000 to 2003 in the waters around Taiwan and the Penghu (Pescadores) Islands, covering one region each year. All fish specimens collected by trawling were brought back to the laboratory for identification, counting of individuals, and body weight measurement. These raw data have been integrated into the Taiwan Fish Database (http://fishdb.sinica.edu.tw) and have also been published through TaiBIF (http://taibif.tw), FishBase, and GBIF (website see below). This dataset contains 631 fish species and 3,529 records, making it the most complete record of the demersal fish fauna and its temporal and spatial distribution on soft marine habitats in Taiwan.

  19. CREMA-D: Crowd-sourced Emotional Multimodal Actors Dataset.

    PubMed

    Cao, Houwei; Cooper, David G; Keutmann, Michael K; Gur, Ruben C; Nenkova, Ani; Verma, Ragini

    2014-01-01

    People convey their emotional state through their face and voice. We present an audio-visual dataset uniquely suited to the study of multi-modal emotion expression and perception. The dataset consists of facial and vocal emotional expressions in sentences spoken in a range of basic emotional states (happy, sad, anger, fear, disgust, and neutral). 7,442 clips of 91 actors with diverse ethnic backgrounds were rated by multiple raters in three modalities: audio, visual, and audio-visual. Categorical emotion labels and real-valued intensity ratings of the perceived emotion were collected by crowd-sourcing from 2,443 raters. Human recognition of the intended emotion for the audio-only, visual-only, and audio-visual data is 40.9%, 58.2%, and 63.6%, respectively. Recognition rates are highest for neutral, followed by happy, anger, disgust, fear, and sad. Average emotion intensity is rated highest for visual-only perception. Accurate recognition of disgust and fear requires simultaneous audio-visual cues, while anger and happiness are well recognized from a single modality. This large dataset can be used to probe other questions concerning the audio-visual perception of emotion.
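
    The per-modality recognition rates quoted above are, in essence, the share of crowd-sourced responses that match the actor's intended emotion. A toy computation (the table layout and column names are our illustration, not the released file format):

    ```python
    import pandas as pd

    # One row per crowd-sourced response (illustrative layout and values)
    ratings = pd.DataFrame(
        [("audio", "happy", "happy"),
         ("audio", "happy", "neutral"),
         ("visual", "sad", "sad"),
         ("audio-visual", "anger", "anger")],
        columns=["modality", "intended", "perceived"],
    )

    # Recognition rate per modality = fraction of responses whose perceived
    # category equals the intended one
    rates = (ratings["perceived"] == ratings["intended"]).groupby(ratings["modality"]).mean()
    print(rates)
    ```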

  20. Privacy-preserving record linkage on large real world datasets.

    PubMed

    Randall, Sean M; Ferrante, Anna M; Boyd, James H; Bauer, Jacqueline K; Semmens, James B

    2014-08-01

    Record linkage typically involves dedicated linkage units that are supplied with personally identifying information in order to identify individuals within and across datasets. The personally identifying information supplied to linkage units is separated from clinical information by data custodians prior to release. While this substantially reduces the risk of disclosure of sensitive information, some residual risks exist and remain a concern for some custodians. In this paper we trial a method of record linkage which reduces privacy risk still further, on large real-world administrative data. The method uses encrypted personal identifying information (Bloom filters) in a probability-based linkage framework. The privacy-preserving linkage method was tested on ten years of New South Wales (NSW) and Western Australian (WA) hospital admissions data, comprising over 26 million records in total. No difference in linkage quality was found when the results were compared with traditional probabilistic methods using full, unencrypted personal identifiers. This presents a possible means of reducing the privacy risks of record linkage in population-level research studies. It is hoped that, through adaptations of this or similar privacy-preserving methods, the risks of information disclosure can be reduced so that the benefits of linked research can be fully realised.
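
    The core of the method, comparing Bloom-filter encodings of identifying fields with a set-similarity score, can be sketched as follows. This is a minimal illustration under assumed parameters: production systems use keyed HMACs rather than the salted SHA-256 shown here, agree filter lengths and hash counts in advance, and feed the field-level scores into the probability-based linkage framework mentioned above.

    ```python
    import hashlib

    def bloom_encode(value, n_bits=1000, n_hashes=30):
        """Encode a string field as the set of Bloom-filter bit positions
        set by hashing its character bigrams (parameters are illustrative)."""
        padded = f"_{value.strip().lower()}_"
        bigrams = {padded[i:i + 2] for i in range(len(padded) - 1)}
        bits = set()
        for gram in bigrams:
            for k in range(n_hashes):
                digest = hashlib.sha256(f"{k}:{gram}".encode()).hexdigest()
                bits.add(int(digest, 16) % n_bits)
        return bits

    def dice(a, b):
        """Dice coefficient between two encoded fields: tolerant of small
        spelling differences, so approximate matching survives encoding."""
        return 2 * len(a & b) / (len(a) + len(b)) if (a or b) else 1.0

    # Similar names still score highly after encoding
    print(dice(bloom_encode("catherine"), bloom_encode("katherine")))
    ```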