Science.gov

Sample records for additional independent datasets

  1. Optimal training dataset composition for SVM-based, age-independent, automated epileptic seizure detection.

    PubMed

    Bogaarts, J G; Gommer, E D; Hilkman, D M W; van Kranen-Mastenbroek, V H J M; Reulen, J P H

    2016-08-01

    Automated seizure detection is a valuable asset to health professionals, making adequate treatment possible in order to minimize brain damage. Most research focuses on two separate aspects of automated seizure detection: EEG feature computation and classification methods. Little research has been published regarding optimal training dataset composition for patient-independent seizure detection. This paper evaluates the performance of classifiers trained on different datasets in order to determine the optimal dataset for use in classifier training for automated, age-independent, seizure detection. Three datasets are used to train a support vector machine (SVM) classifier: (1) EEG from neonatal patients, (2) EEG from adult patients and (3) EEG from both neonates and adults. To correct for baseline EEG feature differences among patients, feature normalization is essential. Usually dedicated detection systems are developed for either neonatal or adult patients. Normalization might allow for the development of a single seizure detection system for patients irrespective of their age. Two classifier versions are trained on all three datasets: one with feature normalization and one without. This gives us six different classifiers to evaluate using both the neonatal and adult test sets. As a performance measure, the area under the receiver operating characteristics curve (AUC) is used. Application of feature baseline correction (FBC) resulted in performance values of 0.90 and 0.93 for neonatal and adult seizure detection, respectively. For neonatal seizure detection, the classifier trained on EEG from adult patients performed significantly worse than both the classifier trained on EEG data from neonatal patients and the classifier trained on both neonatal and adult EEG data. For adult seizure detection, optimal performance was achieved by either the classifier trained on adult EEG data or the classifier trained on both neonatal and adult EEG data. Our results show that age-independent
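
    A schematic companion to this record (not the authors' implementation): the sketch below trains an SVM with and without feature normalization, a stand-in for the paper's feature baseline correction (FBC), and evaluates it by AUC. All data and hyperparameters are synthetic placeholders.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import roc_auc_score

def train_and_score(X_train, y_train, X_test, y_test, normalize=True):
    # Optionally normalize features (stand-in for baseline correction).
    if normalize:
        scaler = StandardScaler().fit(X_train)
        X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)
    clf = SVC(kernel="rbf", probability=True).fit(X_train, y_train)
    return roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])

# Synthetic stand-ins for neonatal and adult EEG feature sets.
rng = np.random.default_rng(0)
X_neo, y_neo = rng.normal(size=(200, 10)), rng.integers(0, 2, 200)
X_adult, y_adult = rng.normal(1.0, 2.0, (200, 10)), rng.integers(0, 2, 200)
X_both, y_both = np.vstack([X_neo, X_adult]), np.concatenate([y_neo, y_adult])

for norm in (True, False):
    auc = train_and_score(X_both, y_both, X_neo, y_neo, normalize=norm)
    print(f"combined-trained classifier, normalize={norm}: AUC={auc:.2f}")
```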

  2. Prognostic breast cancer signature identified from 3D culture model accurately predicts clinical outcome across independent datasets

    SciTech Connect

    Martin, Katherine J.; Patrick, Denis R.; Bissell, Mina J.; Fournier, Marcia V.

    2008-10-20

    One of the major tenets in breast cancer research is that early detection is vital for patient survival by increasing treatment options. To that end, we have previously used a novel unsupervised approach to identify a set of genes whose expression predicts prognosis of breast cancer patients. The predictive genes were selected in a well-defined three dimensional (3D) cell culture model of non-malignant human mammary epithelial cell morphogenesis as down-regulated during breast epithelial cell acinar formation and cell cycle arrest. Here we examine the ability of this gene signature (3D-signature) to predict prognosis in three independent breast cancer microarray datasets having 295, 286, and 118 samples, respectively. Our results show that the 3D-signature accurately predicts prognosis in three unrelated patient datasets. At 10 years, the probability of positive outcome was 52, 51, and 47 percent in the group with a poor-prognosis signature and 91, 75, and 71 percent in the group with a good-prognosis signature for the three datasets, respectively (Kaplan-Meier survival analysis, p<0.05). Hazard ratios for poor outcome were 5.5 (95% CI 3.0 to 12.2, p<0.0001), 2.4 (95% CI 1.6 to 3.6, p<0.0001) and 1.9 (95% CI 1.1 to 3.2, p = 0.016) and remained significant for the two larger datasets when corrected for estrogen receptor (ER) status. Hence the 3D-signature accurately predicts breast cancer outcome in both ER-positive and ER-negative tumors, though individual genes differed in their prognostic ability in the two subtypes. Genes that were prognostic in ER+ patients are AURKA, CEP55, RRM2, EPHA2, FGFBP1, and VRK1, while genes prognostic in ER− patients include ACTB, FOXM1 and SERPINE2 (Kaplan-Meier p<0.05). Multivariable Cox regression analysis in the largest dataset showed that the 3D-signature was a strong independent factor in predicting breast cancer outcome. The 3D-signature accurately predicts breast cancer outcome across multiple datasets and holds prognostic
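
    For readers unfamiliar with the survival statistics quoted above, here is a minimal, hedged sketch of a Kaplan-Meier estimate and a Cox model using the lifelines Python package; the data frame is randomly generated and does not reproduce the study's numbers.

```python
import numpy as np
import pandas as pd
from lifelines import KaplanMeierFitter, CoxPHFitter

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "time": rng.exponential(8.0, 300),           # years to event/censoring
    "event": rng.integers(0, 2, 300),            # 1 = poor outcome observed
    "poor_signature": rng.integers(0, 2, 300),   # 3D-signature group
    "er_status": rng.integers(0, 2, 300),        # correction covariate
})

# Kaplan-Meier survival at 10 years, by signature group.
kmf = KaplanMeierFitter()
for grp, sub in df.groupby("poor_signature"):
    kmf.fit(sub["time"], sub["event"], label=f"signature={grp}")
    print(f"signature={grp}: S(10y) = {float(kmf.predict(10.0)):.2f}")

# Cox model: hazard ratio for the signature, adjusted for ER status.
cph = CoxPHFitter().fit(df, duration_col="time", event_col="event")
print(cph.hazard_ratios_)
```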

  3. Gene Expression Profile for Predicting Survival in Advanced-Stage Serous Ovarian Cancer Across Two Independent Datasets

    PubMed Central

    Yoshihara, Kosuke; Tajima, Atsushi; Yahata, Tetsuro; Kodama, Shoji; Fujiwara, Hiroyuki; Suzuki, Mitsuaki; Onishi, Yoshitaka; Hatae, Masayuki; Sueyoshi, Kazunobu; Fujiwara, Hisaya; Kudo, Yoshiki; Kotera, Kohei; Masuzaki, Hideaki; Tashiro, Hironori; Katabuchi, Hidetaka; Inoue, Ituro; Tanaka, Kenichi

    2010-01-01

    Background Advanced-stage ovarian cancer patients are generally treated with platinum/taxane-based chemotherapy after primary debulking surgery. However, there is a wide range of outcomes for individual patients. Therefore, the clinicopathological factors alone are insufficient for predicting prognosis. Our aim is to identify a progression-free survival (PFS)-related molecular profile for predicting survival of patients with advanced-stage serous ovarian cancer. Methodology/Principal Findings Advanced-stage serous ovarian cancer tissues from 110 Japanese patients who underwent primary surgery and platinum/taxane-based chemotherapy were profiled using oligonucleotide microarrays. We selected 88 PFS-related genes by a univariate Cox model (p<0.01) and generated the prognostic index based on 88 PFS-related genes after adjustment of regression coefficients of the respective genes by a ridge regression Cox model using 10-fold cross-validation. The prognostic index was independently associated with PFS time compared to other clinical factors in multivariate analysis [hazard ratio (HR), 3.72; 95% confidence interval (CI), 2.66–5.43; p<0.0001]. In an external dataset, multivariate analysis revealed that this prognostic index was significantly correlated with PFS time (HR, 1.54; 95% CI, 1.20–1.98; p = 0.0008). Furthermore, the correlation between the prognostic index and overall survival time was confirmed in the two independent external datasets (log rank test, p = 0.0010 and 0.0008). Conclusions/Significance The prognostic ability of our index based on the 88-gene expression profile in a ridge regression Cox hazard model was shown to be independent of other clinical factors in predicting cancer prognosis across two distinct datasets. Further study will be necessary to improve predictive accuracy of the prognostic index toward clinical application for evaluation of the risk of recurrence in patients with advanced-stage serous ovarian cancer. PMID:20300634
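
    The prognostic-index construction described above can be sketched as a ridge-penalized Cox model with a cross-validated penalty. The snippet below is an illustration under stated assumptions (lifelines' penalizer with l1_ratio=0 playing the role of the ridge penalty; fully synthetic data), not the authors' pipeline.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter
from lifelines.utils import k_fold_cross_validation

rng = np.random.default_rng(2)
n, g = 110, 88                                   # patients, PFS-related genes
df = pd.DataFrame(rng.normal(size=(n, g)),
                  columns=[f"gene{i}" for i in range(g)])
df["pfs"] = rng.exponential(24.0, n)             # months (synthetic)
df["event"] = rng.integers(0, 2, n)

# Choose the L2 (ridge) penalty by 10-fold cross-validated concordance.
scores = {}
for penalty in (0.1, 1.0, 10.0):
    cph = CoxPHFitter(penalizer=penalty, l1_ratio=0.0)
    cv = k_fold_cross_validation(cph, df, duration_col="pfs",
                                 event_col="event", k=10,
                                 scoring_method="concordance_index")
    scores[penalty] = np.mean(cv)

best = max(scores, key=scores.get)
cph = CoxPHFitter(penalizer=best, l1_ratio=0.0).fit(df, "pfs", "event")
prognostic_index = cph.predict_partial_hazard(df)  # per-patient risk score
print("chosen penalty:", best)
```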

  4. Cross-platform comparison of independent datasets identifies an immune signature associated with improved survival in metastatic melanoma

    PubMed Central

    Lardone, Ricardo D.; Plaisier, Seema B.; Navarrete, Marian S.; Shamonki, Jaime M.; Jalas, John R.

    2016-01-01

    Platform and study differences in prognostic signatures from metastatic melanoma (MM) gene expression reports often hinder arrival at a consensus. We performed survival/outcome-based pairwise comparisons of three independent MM gene expression profiles using the threshold-free rank-rank hypergeometric overlap (RRHO) algorithm. We found statistically significant overlap for genes overexpressed in favorable outcome (FO) groups, but no overlap for poor outcome (PO) groups. This “favorable outcome signature” (FOS) of 228 genes coinciding on all three overlapping gene lists showed that immune function predominated in FO MM. Surprisingly, specific cell signature-enrichment analysis showed B cell-associated genes enriched in FO MM, along with T cell-associated genes. Higher levels of B and T cells (p<0.05) and their relative proximity (p<0.05) were detected in FO-to-PO tumor comparisons from an independent cohort of MM patients. Finally, expression of FOS in two independent Stage III MM tumor datasets correctly predicted clinical outcome in 12/14 and 44/70 patients using a weighted gene voting classifier (area under the curve values 0.96 and 0.75, respectively). This RRHO-based, cross-study analysis emphasizes the power of the RRHO approach, confirms the relevance of T cells to prolonged MM survival, supports a favorable role for B cells in anti-melanoma immunity, and suggests the potential of B cells as a means of intervention in melanoma treatment. PMID:26883106
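
    A minimal sketch of the RRHO idea referenced above, assuming the usual formulation: slide thresholds down two ranked gene lists and score the overlap of each pair of top sublists with a hypergeometric tail probability. The gene lists and step size below are illustrative.

```python
import numpy as np
from scipy.stats import hypergeom

def rrho_map(list_a, list_b, step=50):
    # Both lists must rank the same gene universe.
    assert set(list_a) == set(list_b)
    n = len(list_a)
    grid = range(step, n + 1, step)
    pmap = np.zeros((len(grid), len(grid)))
    for ii, i in enumerate(grid):
        top_a = set(list_a[:i])
        for jj, j in enumerate(grid):
            k = len(top_a & set(list_b[:j]))
            # P(overlap >= k) for top-i vs top-j drawn from n genes.
            pmap[ii, jj] = hypergeom.sf(k - 1, n, i, j)
    return -np.log10(np.clip(pmap, 1e-300, None))  # large = significant

rng = np.random.default_rng(3)
genes = [f"gene{i}" for i in range(1000)]
a = list(rng.permutation(genes))
b = a[:200] + list(rng.permutation(a[200:]))  # lists agree at the top
print("peak -log10(P):", rrho_map(a, b).max())
```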

  5. Accuracy and Precision in the Southern Hemisphere Additional Ozonesondes (SHADOZ) Dataset in Light of the JOSIE-2000 Results

    NASA Technical Reports Server (NTRS)

    Witte, Jacquelyn C.; Thompson, Anne M.; Schmidlin, F. J.; Oltmans, S. J.; Smit, H. G. J.

    2004-01-01

    Since 1998 the Southern Hemisphere ADditional OZonesondes (SHADOZ) project has provided over 2000 ozone profiles over eleven southern hemisphere tropical and subtropical stations. Balloon-borne electrochemical concentration cell (ECC) ozonesondes are used to measure ozone. The data are archived at: http://croc.gsfc.nasa.gov/shadoz. In an analysis of ozonesonde imprecision within the SHADOZ dataset [Thompson et al., JGR, 108, 8238, 2003], we pointed out that variations in ozonesonde technique (sensor solution strength, instrument manufacturer, data processing) could lead to station-to-station biases within the SHADOZ dataset. Imprecisions and accuracy in the SHADOZ dataset are examined in light of new data. First, SHADOZ total ozone column amounts are compared to version 8 TOMS (2004 release). As for TOMS version 7, satellite total ozone is usually higher than the integrated column amount from the sounding. Discrepancies between the sonde and satellite datasets decline by two percentage points on average, compared to version 7 TOMS offsets. Second, the SHADOZ station data are compared to results of chamber simulations (JOSIE-2000, Juelich Ozonesonde Intercomparison Experiment) in which the various SHADOZ techniques were evaluated. The range of JOSIE column deviations from a standard instrument (-10%) in the chamber resembles that of the SHADOZ station data. It appears that some systematic variations in the SHADOZ ozone record are accounted for by differences in solution strength, data processing and instrument type (manufacturer).

  6. A pooled job physical exposure dataset from multiple independent studies in a consortium study of carpal tunnel syndrome

    PubMed Central

    Bao, Stephen S.; Kapellusch, Jay M.; Garg, Arun; Silverstein, Barbara A.; Harris-Adamson, Carisa; Burt, Susan E.; Dale, Ann Marie; Evanoff, Bradley A.; Gerr, Frederic E.; Hegmann, Kurt T.; Merlino, Linda A.; Thiese, Matthew S.; Rempel, David M.

    2016-01-01

    Background Six research groups independently conducted prospective studies of carpal tunnel syndrome (CTS) incidence in 54 US workplaces in 10 US States. Physical exposure variables were collected by all research groups at the individual worker level. Data from these research groups were pooled to increase the exposure spectrum and statistical power. Objective This paper provides a detailed description of the characteristics of the pooled physical exposure variables and the source data information from the individual research studies. Methods Physical exposure data were inspected and prepared by each of the individual research studies according to detailed instructions provided by an exposure sub-committee of the research consortium. Descriptive analyses were performed on the pooled physical exposure dataset. Correlation analyses were performed among exposure variables estimating similar exposure aspects. Results At baseline, there were a total of 3010 subjects in the pooled physical exposure dataset. Overall, the pooled data meaningfully increased the spectra of most exposure variables. The increased spectra were due to the wider range in exposure data of different jobs provided by the research studies. The correlations between variables estimating similar exposure aspects showed different patterns among data provided by the research studies. Conclusions The increased spectra of the physical exposure variables among the data pooled likely improved the possibility of detecting potential associations between these physical exposure variables and CTS incidence. It is also recognized that methods need to be developed for general use by all researchers for standardization of physical exposure variable definition, data collection, processing and reduction. PMID:25504866

  7. Evaluation of results from genome-wide studies of language and reading in a novel independent dataset.

    PubMed

    Carrion-Castillo, A; van Bergen, E; Vino, A; van Zuijen, T; de Jong, P F; Francks, C; Fisher, S E

    2016-07-01

    Recent genome-wide association scans (GWAS) for reading and language abilities have pinpointed promising new candidate loci. However, the potential contributions of these loci remain to be validated. In this study, we tested 17 of the most significantly associated single nucleotide polymorphisms (SNPs) from these GWAS (P < 10^(-6) in the original studies) in a new independent population dataset from the Netherlands, known as Familial Influences on Literacy Abilities. This dataset comprised 483 children from 307 nuclear families and 505 adults (including parents of participating children), and provided adequate statistical power to detect the effects that were previously reported. The following measures of reading and language performance were collected: word reading fluency, nonword reading fluency, phonological awareness and rapid automatized naming. Two SNPs (rs12636438 and rs7187223) were associated with performance in multivariate and univariate testing, but these did not remain significant after correction for multiple testing. Another SNP (rs482700) was only nominally associated in the multivariate test. For the rest of the SNPs, we did not find supportive evidence of association. The findings may reflect differences between our study and the previous investigations with respect to the language of testing, the exact tests used and the recruitment criteria. Alternatively, most of the prior reported associations may have been false positives. A larger-scale GWAS meta-analysis than those previously performed will likely be required to obtain robust insights into the genomic architecture underlying reading and language. PMID:27198479
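
    The replication logic above hinges on multiple-testing correction. The hedged sketch below shows that pattern with statsmodels' multipletests; the P-values are invented for illustration and are not taken from the study.

```python
from statsmodels.stats.multitest import multipletests

# Nominal association P-values (illustrative placeholders only).
pvals = {"rs12636438": 0.012, "rs7187223": 0.034,
         "rs482700": 0.049, "rs_other": 0.40}

reject, p_adj, _, _ = multipletests(list(pvals.values()), alpha=0.05,
                                    method="bonferroni")
for (snp, p), padj, sig in zip(pvals.items(), p_adj, reject):
    print(f"{snp}: P={p:.3f} adjusted={padj:.3f} significant={sig}")
```

    Nominal hits like the two SNPs named above must survive this adjustment step to count as replicated, which is exactly where they fell short in the study.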

  8. Complementary Aerodynamic Performance Datasets for Variable Speed Power Turbine Blade Section from Two Independent Transonic Turbine Cascades

    NASA Technical Reports Server (NTRS)

    Flegel, Ashlie B.; Welch, Gerard E.; Giel, Paul W.; Ames, Forrest E.; Long, Jonathon A.

    2015-01-01

    Two independent experimental studies were conducted in linear cascades on a scaled, two-dimensional mid-span section of a representative Variable Speed Power Turbine (VSPT) blade. The purpose of these studies was to assess the aerodynamic performance of the VSPT blade over large Reynolds number and incidence angle ranges. The influence of inlet turbulence intensity was also investigated. The tests were carried out in the NASA Glenn Research Center Transonic Turbine Blade Cascade Facility and at the University of North Dakota (UND) High Speed Compressible Flow Wind Tunnel Facility. A large database was developed by acquiring total pressure and exit angle surveys and blade loading data for ten incidence angles ranging from +15.8° to -51.0°. Data were acquired over six flow conditions with exit isentropic Reynolds number ranging from 0.05×10^6 to 2.12×10^6 and at exit Mach numbers of 0.72 (design) and 0.35. Flow conditions were examined within the respective facility constraints. The survey data were integrated to determine average exit total-pressure and flow angle. UND also acquired blade surface heat transfer data at two flow conditions across the entire incidence angle range, aimed at quantifying transitional flow behavior on the blade. Comparisons of the aerodynamic datasets were made for three "match point" conditions. The blade loading data at the match point conditions show good agreement between the facilities. This report shows comparisons of other data and highlights the unique contributions of the two facilities. The datasets are being used to advance understanding of the aerodynamic challenges associated with maintaining efficient power turbine operation over a wide shaft-speed range.
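
    As a toy illustration of the survey integration mentioned above (not the facilities' actual data-reduction procedure), the sketch below area-averages a synthetic exit total-pressure survey across one blade pitch; a real reduction would typically mass-average using local velocity as well.

```python
import numpy as np

pitch = np.linspace(0.0, 1.0, 51)                 # normalized pitchwise coordinate
# Synthetic total-pressure survey with a wake deficit near mid-pitch (kPa).
p_total = 101.3 - 2.0 * np.exp(-((pitch - 0.5) / 0.08) ** 2)
# Area average over one pitch (denominator is the integration interval).
p_avg = np.trapz(p_total, pitch) / (pitch[-1] - pitch[0])
print(f"area-averaged exit total pressure: {p_avg:.3f} kPa")
```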

  9. 10 CFR 431.175 - Additional requirements applicable to non-Voluntary Independent Certification Program participants.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... have such testing performed at an independent laboratory. In addition, you must test a sufficient... Independent Certification Program participants. 431.175 Section 431.175 Energy DEPARTMENT OF ENERGY ENERGY... requirements applicable to non-Voluntary Independent Certification Program participants. If you are...

  10. Objective identification of mid-latitude storms in satellite imagery: determination of an independent storm validation dataset.

    NASA Astrophysics Data System (ADS)

    Delsol, C.; Hodges, K.

    2003-04-01

    Current methods of validating GCMs involve comparing model results with re-analysis datasets in which observations have been combined with a model. The quality of this approach depends on the observational data distribution in space and time and on the model formulation. We propose to use an automatic and objective technique that can efficiently provide a dataset of “real” data against which the models and re-analyses can be validated, based on the identification and tracking of weather systems in satellite imagery. We present results of a boundary-finding method based on Fourier Shape Descriptors for the identification of extra-tropical cyclones in the mid-latitudes using NOAA’s AVHRR IR imagery. The boundary-finding method, initially derived for medical image processing, is designed to incorporate model-based information into a boundary finding process for continuously deformable objects. This allows us to work with objects that are diverse and irregular in shape, such as developing weather systems. The method is suited to work in an environment which may contain spurious and broken boundaries. The main characteristic features of an extra-tropical system, such as the vortex and associated frontal systems, are identified. This work provides a basis for statistical analyses of extra-tropical cyclones for climatological studies and for the validation of GCMs, making use of the vast amount of satellite archive data available. It is also useful for individual case studies for weather forecast verification.
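
    A minimal sketch of the Fourier shape descriptor idea: represent a closed contour as a complex sequence and keep the low-order FFT coefficients, which compactly encode a deformable boundary. The contour below is synthetic, not an actual cyclone boundary.

```python
import numpy as np

def fourier_descriptors(x, y, n_keep=10):
    z = x + 1j * y                       # contour as a complex signal
    coeffs = np.fft.fft(z)
    c = np.zeros_like(coeffs)
    keep = list(range(n_keep)) + list(range(-n_keep, 0))
    c[keep] = coeffs[keep]               # truncate to low-order terms
    return coeffs, np.fft.ifft(c)        # descriptors + smoothed boundary

t = np.linspace(0, 2 * np.pi, 256, endpoint=False)
x = 2.0 * np.cos(t) + 0.1 * np.cos(7 * t)   # irregular closed boundary
y = 1.0 * np.sin(t) + 0.1 * np.sin(5 * t)
coeffs, smooth = fourier_descriptors(x, y)
print(np.abs(coeffs[:5]))  # leading descriptors characterize the shape
```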

  11. Antimicrobial combinations: Bliss independence and Loewe additivity derived from mechanistic multi-hit models.

    PubMed

    Baeder, Desiree Y; Yu, Guozhi; Hozé, Nathanaël; Rolff, Jens; Regoes, Roland R

    2016-05-26

    Antimicrobial peptides (AMPs) and antibiotics reduce the net growth rate of bacterial populations they target. It is relevant to understand whether effects of multiple antimicrobials are synergistic or antagonistic, in particular for AMP responses, because naturally occurring responses involve multiple AMPs. There are several competing proposals describing how multiple types of antimicrobials add up when applied in combination, such as Loewe additivity or Bliss independence. These additivity terms are defined ad hoc from abstract principles explaining the supposed interaction between the antimicrobials. Here, we link these ad hoc combination terms to a mathematical model that represents the dynamics of antimicrobial molecules hitting targets on bacterial cells. In this multi-hit model, bacteria are killed when a certain number of targets are hit by antimicrobials. Using this bottom-up approach reveals that Bliss independence should be the model of choice if no interaction between antimicrobial molecules is expected. Loewe additivity, on the other hand, describes scenarios in which antimicrobials affect the same components of the cell, i.e. are not acting independently. While our approach idealizes the dynamics of antimicrobials, it provides a conceptual underpinning of the additivity terms. The choice of the additivity term is essential to determine synergy or antagonism of antimicrobials. This article is part of the themed issue 'Evolutionary ecology of arthropod antimicrobial peptides'. PMID:27160596
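
    For concreteness, here is a hedged sketch of the two reference models named above (not the paper's multi-hit model itself), for fractional effects in [0, 1] with Hill-type dose-response curves; all parameter values are arbitrary illustrations.

```python
from scipy.optimize import brentq

def hill(d, ec50, h):
    """Fractional effect of a single antimicrobial at dose d."""
    return d**h / (ec50**h + d**h)

def bliss(e_a, e_b):
    # Bliss independence: effects combine like independent probabilities.
    return e_a + e_b - e_a * e_b

def loewe_effect(d_a, d_b, ec50_a, h_a, ec50_b, h_b):
    # Loewe additivity: doses are additive when
    # d_a / D_A(e) + d_b / D_B(e) = 1, where D_X(e) is the dose of X
    # alone giving effect e. Solve the isobole equation for e.
    def iso(e):
        da_alone = ec50_a * (e / (1 - e)) ** (1 / h_a)  # inverse Hill
        db_alone = ec50_b * (e / (1 - e)) ** (1 / h_b)
        return d_a / da_alone + d_b / db_alone - 1.0
    return brentq(iso, 1e-9, 1 - 1e-9)

e_a, e_b = hill(1.0, ec50=2.0, h=2.0), hill(1.0, ec50=1.5, h=1.5)
print("Bliss prediction:", bliss(e_a, e_b))
print("Loewe prediction:", loewe_effect(1.0, 1.0, 2.0, 2.0, 1.5, 1.5))
```

    Loewe treats the mixture as a dilution of each drug against its own dose-response curve, which is why the isobole equation has to be solved numerically here, whereas Bliss is a closed-form product rule.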

  12. Nonparametric Independence Screening in Sparse Ultra-High Dimensional Additive Models.

    PubMed

    Fan, Jianqing; Feng, Yang; Song, Rui

    2011-06-01

    A variable screening procedure via correlation learning was proposed in Fan and Lv (2008) to reduce dimensionality in sparse ultra-high dimensional models. Even when the true model is linear, the marginal regression can be highly nonlinear. To address this issue, we further extend the correlation learning to marginal nonparametric learning. Our nonparametric independence screening, called NIS, is a specific member of the sure independence screening family. Several closely related variable screening procedures are proposed. It is shown that, under general nonparametric models and some mild technical conditions, the proposed independence screening methods enjoy a sure screening property. The extent to which the dimensionality can be reduced by independence screening is also explicitly quantified. As a methodological extension, a data-driven thresholding and an iterative nonparametric independence screening (INIS) are also proposed to enhance the finite sample performance for fitting sparse additive models. The simulation results and a real data analysis demonstrate that the proposed procedure works well with moderate sample size and large dimension and performs better than competing methods. PMID:22279246
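
    A hedged sketch of the screening pattern described above: regress the response on each covariate separately with a flexible (spline) basis and keep the covariates with the best marginal fit. The basis choice and cutoff below are illustrative, not the paper's exact estimator.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import SplineTransformer

rng = np.random.default_rng(4)
n, p = 400, 1000                        # ultra-high dimension: p >> n
X = rng.normal(size=(n, p))
# Nonlinear signal in x0 and x1 that linear screening could miss.
y = np.sin(2 * X[:, 0]) + X[:, 1] ** 2 + 0.5 * rng.normal(size=n)

def marginal_r2(xj, y):
    # Marginal nonparametric fit of y on a single covariate.
    model = make_pipeline(SplineTransformer(n_knots=6, degree=3),
                          LinearRegression())
    return model.fit(xj[:, None], y).score(xj[:, None], y)

scores = np.array([marginal_r2(X[:, j], y) for j in range(p)])
top = np.argsort(scores)[::-1][:10]     # screened submodel
print("selected covariates:", sorted(top.tolist()))
```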

  13. Concentration Addition, Independent Action and Generalized Concentration Addition Models for Mixture Effect Prediction of Sex Hormone Synthesis In Vitro

    PubMed Central

    Hadrup, Niels; Taxvig, Camilla; Pedersen, Mikael; Nellemann, Christine; Hass, Ulla; Vinggaard, Anne Marie

    2013-01-01

    Humans are concomitantly exposed to numerous chemicals. An infinite number of combinations and doses thereof can be imagined. For toxicological risk assessment the mathematical prediction of mixture effects, using knowledge on single chemicals, is therefore desirable. We investigated pros and cons of the concentration addition (CA), independent action (IA) and generalized concentration addition (GCA) models. First we measured effects of single chemicals and mixtures thereof on steroid synthesis in H295R cells. Then single chemical data were applied to the models; predictions of mixture effects were calculated and compared to the experimental mixture data. Mixture 1 contained environmental chemicals adjusted in ratio according to human exposure levels. Mixture 2 was a potency adjusted mixture containing five pesticides. Prediction of testosterone effects coincided with the experimental Mixture 1 data. In contrast, antagonism was observed for effects of Mixture 2 on this hormone. The mixtures contained chemicals exerting only limited maximal effects. This hampered prediction by the CA and IA models, whereas the GCA model could be used to predict a full dose response curve. Regarding effects on progesterone and estradiol, some chemicals were having stimulatory effects whereas others had inhibitory effects. The three models were not applicable in this situation and no predictions could be performed. Finally, the expected contributions of single chemicals to the mixture effects were calculated. Prochloraz was the predominant but not sole driver of the mixtures, suggesting that one chemical alone was not responsible for the mixture effects. In conclusion, the GCA model seemed to be superior to the CA and IA models for the prediction of testosterone effects. A situation with chemicals exerting opposing effects, for which the models could not be applied, was identified. In addition, the data indicate that in non-potency adjusted mixtures the effects cannot always be
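
    For the GCA model mentioned above, a commonly used closed form for unit Hill slopes is E = (sum_i a_i d_i/K_i) / (1 + sum_i d_i/K_i), which stays defined for partial agonists (a_i < 1), the situation that defeats plain CA. The sketch below assumes that form with invented parameters; it is not the paper's fitted model.

```python
def gca_effect(doses, alphas, ks):
    """Generalized concentration addition for unit Hill slopes.

    doses: concentrations; alphas: maximal effects; ks: EC50-like constants.
    """
    num = sum(a * d / k for d, a, k in zip(doses, alphas, ks))
    den = 1.0 + sum(d / k for d, k in zip(doses, ks))
    return num / den

# Two full agonists and one partial agonist (alpha < 1):
doses = [1.0, 0.5, 2.0]
alphas = [1.0, 1.0, 0.4]   # partial agonist caps at 40% of maximal effect
ks = [2.0, 1.0, 3.0]
print("GCA mixture effect:", gca_effect(doses, alphas, ks))
```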

  14. Automated determination of chemical functionalisation addition routes based on magnetic susceptibility and nucleus independent chemical shifts

    NASA Astrophysics Data System (ADS)

    Van Lier, G.; Ewels, C. P.; Geerlings, P.

    2008-07-01

    We present a modified version of our previously reported meta-code SACHA, for systematic analysis of chemical addition. The code automates the generation of structures, running of quantum chemical codes, and selection of preferential isomers based on chosen selection rules. While the selection rules for the previous version were based on the total system energy, predicting purely thermodynamic addition patterns, we examine here the possibility of using other system parameters, notably magnetic susceptibility as a descriptor of global aromaticity, and nucleus independent chemical shifts (NICS) as local aromaticity descriptor.

  15. 10 CFR 431.174 - Additional requirements applicable to Voluntary Independent Certification Program participants.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... Independent Certification Program participants. 431.174 Section 431.174 Energy DEPARTMENT OF ENERGY ENERGY... requirements applicable to Voluntary Independent Certification Program participants. (a) Description of Voluntary Independent Certification Program participant. For purposes of this subpart, a manufacturer...

  16. Independent and additive effects of glutamic acid and methionine on yeast longevity.

    PubMed

    Wu, Ziyun; Song, Lixia; Liu, Shao Quan; Huang, Dejian

    2013-01-01

    It is established that glucose restriction extends yeast chronological and replicative lifespan, but little is known about the influence of amino acids on yeast lifespan, although some amino acids were reported to delay aging in rodents. Here we show that amino acid composition greatly alters yeast chronological lifespan. We found that the amino acids methionine and glutamic acid (non-essential to yeast) had the most significant impact on yeast chronological lifespan extension; restriction of methionine and/or increase of glutamic acid led to longevity that was not the result of low acetic acid production and acidification in aging media. Remarkably, low methionine, high glutamic acid and glucose restriction additively and independently extended yeast lifespan, which could not be further extended by buffering the medium (pH 6.0). Our preliminary findings using yeasts with gene deletions demonstrate that glutamic acid addition, methionine restriction and glucose restriction promote yeast longevity through distinct mechanisms. This study may help fill a gap in the yeast model for the fast-developing view that nutrient balance is a critical factor in extending lifespan. PMID:24244480

  17. The Use of Additional GPS Frequencies to Independently Determine Tropospheric Water Vapor Profiles

    NASA Technical Reports Server (NTRS)

    Herman, B.M.; Feng, D.; Flittner, D. E.; Kursinski, E. R.

    2000-01-01

    It is well known that the currently employed L1 and L2 GPS/MET frequencies (1.2–1.6 GHz) do not allow for the separation of water vapor and density (or temperature) from active microwave occultation measurements in regions of the troposphere warmer than 240 K. Therefore, additional information must be used, from other types of measurements and weather analyses, to recover water vapor (and temperature) profiles. Thus in data-sparse regions, these inferred profiles can be subject to larger errors than would result in data-rich regions. The use of properly selected additional GPS frequencies enables a direct, independent measurement of the absorption associated with the water vapor profile, which may then be used in the standard GPS/MET retrievals to obtain a more accurate determination of atmospheric temperature throughout the water vapor layer. This study looks at the use of microwave crosslinks in the region of the 22 GHz water vapor absorption line for this purpose. An added advantage of using 22 GHz frequencies is that they are only negligibly affected by the ionosphere, in contrast to the large effect at the GPS frequencies. The retrieval algorithm uses both amplitude and phase measurements to obtain profiles of atmospheric pressure, temperature and water vapor pressure with a vertical resolution of 1 km or better. This technique also provides the cloud liquid water content along the ray path, which is in itself an important element in climate monitoring. Advantages of this method include the ability to make measurements in the presence of clouds and the use of techniques and technology proven through the GPS/MET experiment and several of NASA's planetary exploration missions. Simulations demonstrating this method will be presented for both clear and cloudy sky conditions.

  18. Accuracy and Precision in the Southern Hemisphere Additional Ozonesondes (SHADOZ) Dataset 1998-2000 in Light of the JOSIE-2000 Results

    NASA Technical Reports Server (NTRS)

    Witte, J. C.; Thompson, A. M.; Schmidlin, F. J.; Oltmans, S. J.; McPeters, R. D.; Smit, H. G. J.

    2003-01-01

    A network of 12 southern hemisphere tropical and subtropical stations in the Southern Hemisphere ADditional OZonesondes (SHADOZ) project has provided over 2000 profiles of stratospheric and tropospheric ozone since 1998. Balloon-borne electrochemical concentration cell (ECC) ozonesondes are used with standard radiosondes for pressure, temperature and relative humidity measurements. The archived data are available at: http://croc.gsfc.nasa.gov/shadoz. In Thompson et al., accuracies and imprecisions in the SHADOZ 1998–2000 dataset were examined using ground-based instruments and the TOMS total ozone measurement (version 7) as references. Small variations in ozonesonde technique introduced possible biases from station to station. SHADOZ total ozone column amounts are now compared to version 8 TOMS; discrepancies between the two datasets are reduced by 2% on average. An evaluation of ozone variations among the stations is made using the results of a series of chamber simulations of ozone launches (JOSIE-2000, Juelich Ozonesonde Intercomparison Experiment) in which a standard reference ozone instrument was employed with the various sonde techniques used in SHADOZ. A number of variations in SHADOZ ozone data are explained when differences in solution strength, data processing and instrument type (manufacturer) are taken into account.

  19. Nitrogen Addition and Warming Independently Influence the Belowground Micro-Food Web in a Temperate Steppe

    PubMed Central

    Li, Qi; Bai, Huahua; Liang, Wenju; Xia, Jianyang; Wan, Shiqiang; van der Putten, Wim H.

    2013-01-01

    Climate warming and atmospheric nitrogen (N) deposition are known to influence ecosystem structure and functioning. However, our understanding of the interactive effect of these global changes on ecosystem functioning is relatively limited, especially when it concerns the responses of soils and soil organisms. We conducted a field experiment to study the interactive effects of warming and N addition on the soil food web. The experiment was established in 2006 in a temperate steppe in northern China. After three to four years (2009–2010), we found that N addition positively affected microbial biomass and negatively influenced the trophic groups and ecological indices of soil nematodes. However, the warming effects were less obvious; only fungal PLFA showed a decreasing trend under warming. Interestingly, the influence of N addition did not depend on warming. Structural equation modeling analysis suggested that the direct pathways between N addition and soil food web components were more important than the indirect connections through alterations in soil abiotic characters or plant growth. Nitrogen enrichment also affected the soil nematode community indirectly through changes in soil pH and PLFA. We conclude that experimental warming influenced soil food web components of the temperate steppe less than N addition did, and there was little influence of warming on N addition effects under these experimental conditions. PMID:23544140

  20. Developing Independent Listening Skills for English as an Additional Language Students

    ERIC Educational Resources Information Center

    Picard, Michelle; Velautham, Lalitha

    2016-01-01

    This paper describes an action research project to develop online, self-access listening resources mirroring the authentic academic contexts experienced by graduate university students. Current listening materials for English as an Additional Language (EAL) students mainly use Standard American English or Standard British pronunciation, and far…

  1. Addition of uridines to edited RNAs in trypanosome mitochondria occurs independently of transcription

    SciTech Connect

    Harris, M.E.; Moore, D.R.; Hajduk, S.L.

    1990-07-05

    RNA editing is a novel RNA processing event of unknown mechanism that results in the introduction of nucleotides not encoded in the DNA into specific RNA molecules. We have examined the post-transcriptional addition of nucleotides into the mitochondrial RNA of Trypanosoma brucei. Utilizing an isolated organelle system we have determined that addition of uridines to edited RNAs does not require ongoing transcription. Trypanosome mitochondria incorporate CTP, ATP, and UTP into RNA in the absence of transcription. GTP is incorporated into RNA only as a result of the transcription process. Post-transcriptional CTP and ATP incorporation can be ascribed to known enzymatic activities. CTP is incorporated into tRNAs as a result of synthesis or turnover of their 3′ CCA sequences. ATP is incorporated into the 3′ CCA of tRNAs and into mitochondrial messenger RNAs due to polyadenylation. In the absence of transcription, UTP is incorporated into transcripts known to undergo editing, and the degree of UTP incorporation is consistent with the degree of editing occurring in these transcripts. Cytochrome b mRNAs, which contain a single editing site near their 5′ ends, are initially transcribed unedited at that site. Post-transcriptional labeling of cytochrome b mRNAs in the organelle with [α-32P]UTP results in the addition of uridines near the 5′ end of the RNA but not in a 3′ region which lacks an editing site. These results indicate that RNA editing is a post-transcriptional process in the mitochondria of trypanosomes.

  2. Casein synthesis is independently and additively related to individual essential amino acid supply.

    PubMed

    Arriola Apelo, S I; Singer, L M; Ray, W K; Helm, R F; Lin, X Y; McGilliard, M L; St-Pierre, N R; Hanigan, M D

    2014-05-01

    Individual AA effects on the casein fractional synthesis rate (CFSR) did not correlate with mammalian target of rapamycin (mTOR) signaling. Independent responses of CFSR to individual essential AA observed in this study contradict the single-limiting AA theory assumed in current requirement systems. The saturable responses in CFSR to these 4 AA also highlight the inadequacy of using a fixed postabsorptive AA efficiency approach for determining AA requirements for milk protein synthesis. PMID:24582441

  3. Additives

    NASA Technical Reports Server (NTRS)

    Smalheer, C. V.

    1973-01-01

    The chemistry of lubricant additives is discussed to show what the additives are chemically and what functions they perform in the lubrication of various kinds of equipment. Current theories regarding the mode of action of lubricant additives are presented. The additive groups discussed include the following: (1) detergents and dispersants, (2) corrosion inhibitors, (3) antioxidants, (4) viscosity index improvers, (5) pour point depressants, and (6) antifouling agents.

  4. Treatment Planning Constraints to Avoid Xerostomia in Head-and-Neck Radiotherapy: An Independent Test of QUANTEC Criteria Using a Prospectively Collected Dataset

    SciTech Connect

    Moiseenko, Vitali; Wu, Jonn; Hovan, Allan; Saleh, Ziad; Apte, Aditya; Deasy, Joseph O.; Harrow, Stephen; Rabuka, Carman; Muggli, Adam; Thompson, Anna

    2012-03-01

    Purpose: The severe reduction of salivary function (xerostomia) is a common complication after radiation therapy for head-and-neck cancer. Consequently, guidelines to ensure adequate function based on parotid gland tolerance dose-volume parameters have been suggested by the QUANTEC group and by Ortholan et al. We performed a validation test of these guidelines against a prospectively collected dataset and compared it with a previously published dataset. Methods and Materials: Whole-mouth stimulated salivary flow data from 66 head-and-neck cancer patients treated with radiotherapy at the British Columbia Cancer Agency (BCCA) were measured, and treatment planning data were abstracted. Flow measurements were collected from 50 patients at 3 months, and 60 patients at 12-month follow-up. Previously published data from a second institution, Washington University in St. Louis (WUSTL), were used for comparison. A logistic model was used to describe the incidence of Grade 4 xerostomia as a function of the mean dose of the spared parotid gland. The rate of correctly predicting the lack of xerostomia (negative predictive value [NPV]) was computed for both the QUANTEC constraints and the Ortholan et al. recommendation to constrain the total volume of both glands receiving more than 40 Gy to less than 33%. Results: Both datasets showed a rate of xerostomia of less than 20% when the mean dose to the least-irradiated parotid gland is kept to less than 20 Gy. Logistic model parameters for the incidence of xerostomia at 12 months after therapy, based on the least-irradiated gland, were D50 = 32.4 Gy and γ = 0.97. NPVs for the QUANTEC guideline were 94% (BCCA data) and 90% (WUSTL data). For the Ortholan et al. guideline, NPVs were 85% (BCCA) and 86% (WUSTL). Conclusion: These data confirm that the QUANTEC guideline effectively avoids xerostomia, and this is somewhat more effective than constraints on the volume receiving more than 40 Gy.
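
    A hedged sketch of the quoted logistic model, assuming the common parameterization NTCP(D) = 1 / (1 + (D50/D)^(4γ)) with the paper's fitted D50 = 32.4 Gy and γ = 0.97 (the authors may have used a slightly different functional form):

```python
def ntcp(mean_dose_gy, d50=32.4, gamma=0.97):
    # Logistic dose-response in the D50/gamma parameterization (assumed form):
    # risk is 0.5 at D = D50 and gamma sets the steepness of the curve.
    return 1.0 / (1.0 + (d50 / mean_dose_gy) ** (4.0 * gamma))

for d in (10.0, 20.0, 32.4, 40.0):
    print(f"mean dose {d:5.1f} Gy -> predicted xerostomia risk {ntcp(d):.2f}")
```

    Under this form, the QUANTEC-style constraint (spared-gland mean dose below 20 Gy) sits on the flat low-risk part of the curve, consistent with the sub-20% complication rate reported above.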

  5. Additional Saturday rehabilitation improves functional independence and quality of life and reduces length of stay: a randomized controlled trial

    PubMed Central

    2013-01-01

    Background Many inpatients receive little or no rehabilitation on weekends. Our aim was to determine what effect providing additional Saturday rehabilitation during inpatient rehabilitation had on functional independence, quality of life and length of stay compared to 5 days per week of rehabilitation. Methods This was a multicenter, single-blind (assessors) randomized controlled trial with concealed allocation and 12-month follow-up conducted in two publicly funded metropolitan inpatient rehabilitation facilities in Melbourne, Australia. Patients were eligible if they were adults (aged ≥18 years) admitted for rehabilitation for any orthopedic, neurological or other disabling conditions excluding those admitted for slow stream rehabilitation/geriatric evaluation and management. Participants were randomly allocated to usual care Monday to Friday rehabilitation (control) or to Monday to Saturday rehabilitation (intervention). The additional Saturday rehabilitation comprised physiotherapy and occupational therapy. The primary outcomes were functional independence (functional independence measure (FIM); measured on an 18 to 126 point scale), health-related quality of life (EQ-5D utility index; measured on a 0 to 1 scale, and EQ-5D visual analog scale; measured on a 0 to 100 scale), and patient length of stay. Outcome measures were assessed on admission, discharge (primary endpoint), and at 6 and 12 months post discharge. Results We randomly assigned 996 adults (mean (SD) age 74 (13) years) to Monday to Saturday rehabilitation (n = 496) or usual care Monday to Friday rehabilitation (n = 500). Relative to admission scores, intervention group participants had higher functional independence (mean difference (MD) 2.3, 95% confidence interval (CI) 0.5 to 4.1, P = 0.01) and health-related quality of life (MD 0.04, 95% CI 0.01 to 0.07, P = 0.009) on discharge and may have had a shorter length of stay by 2 days (95% CI 0 to 4, P = 0.1) when compared to

  6. Neck Circumference, along with Other Anthropometric Indices, Has an Independent and Additional Contribution in Predicting Fatty Liver Disease

    PubMed Central

    Huang, Bi-xia; Zhu, Ming-fan; Wu, Ting; Zhou, Jing-ya; Liu, Yan; Chen, Xiao-lin; Zhou, Rui-fen; Wang, Li-jun; Chen, Yu-ming; Zhu, Hui-lian

    2015-01-01

    Background and Aim Previous studies have indicated that neck circumference is a valuable predictor for obesity and metabolic syndrome, but little evidence is available for fatty liver disease. We examined the association of neck circumference with fatty liver disease and evaluated its predictive value in Chinese adults. Methods This cross-sectional study comprised 4053 participants (1617 women and 2436 men, aged 20-88) recruited from the Health Examination Center in Guangzhou, China between May 2009 and April 2010. Anthropometric measurements were taken, abdominal ultrasonography was conducted and blood biochemical parameters were measured. Covariance, logistic regression and receiver operating characteristic curve analyses were employed. Results The mean neck circumference was greater in subjects with fatty liver disease than those without the disease in both women and men after adjusting for age (P<0.001). Logistic regression analysis showed that the age-adjusted ORs (95% CI) of fatty liver disease for quartile 4 (vs. quartile 1) of neck circumference were 7.70 (4.95-11.99) for women and 12.42 (9.22-16.74) for men. After further adjusting for other anthropometric indices, both individually and combined, the corresponding ORs remained significant (all P-trends<0.05) but were attenuated to 1.94-2.53 for women and 1.45-2.08 for men. An additive interaction existed between neck circumference and the other anthropometric measures (all P<0.05). A high neck circumference value was associated with a much greater prevalence of fatty liver disease in participants with both high and normal BMI, waist circumference and waist-to-hip ratio values. Conclusions Neck circumference was an independent predictor for fatty liver disease and provided an additional contribution when applied with other anthropometric measures. PMID:25679378

  7. The influence of non-solvent addition on the independent and dependent parameters in roller electrospinning of polyurethane.

    PubMed

    Cengiz-Callioglu, Funda; Jirsak, Oldrich; Dayik, Mehmet

    2013-07-01

    This paper discusses the effects of 1,1,2,2-tetrachloroethylene (TCE) non-solvent addition on the independent parameters (electrical conductivity, dielectric constant, surface tension, the rheological properties of the solution, etc.) and the dependent parameters (number of Taylor cones per square meter (NTC/m2), spinning performance for one Taylor cone (SP/TC), total spinning performance (SP), and fiber properties such as diameter, diameter uniformity and non-fibrous area) in roller electrospinning of polyurethane (PU). The same process parameters (voltage, distance of the electrodes, humidity, etc.) were applied for all solutions during the spinning process. According to the results, the effect of TCE non-solvent concentration on the dielectric constant, surface tension, rheological properties of the solution and also spinning performance was statistically significant. Besides these results, TCE non-solvent concentration affects the quality of the fiber and nanoweb structure. Generally, high fiber density, low non-fibrous percentage and uniform nanofibers were obtained from the fiber morphology analyses. PMID:23901497

  8. Dataset of calcified plaque condition in the stenotic coronary artery lesion obtained using multidetector computed tomography to indicate the addition of rotational atherectomy during percutaneous coronary intervention.

    PubMed

    Akutsu, Yasushi; Hamazaki, Yuji; Sekimoto, Teruo; Kaneko, Kyouichi; Kodama, Yusuke; Li, Hui-Ling; Suyama, Jumpei; Gokan, Takehiko; Sakai, Koshiro; Kosaki, Ryota; Yokota, Hiroyuki; Tsujita, Hiroaki; Tsukamoto, Shigeto; Sakurai, Masayuki; Sambe, Takehiko; Oguchi, Katsuji; Uchida, Naoki; Kobayashi, Shinichi; Aoki, Atsushi; Kobayashi, Youichi

    2016-06-01

    Our data show the regional coronary artery calcium scores (lesion CAC) on multidetector computed tomography (MDCT) and the cross-section imaging on MDCT angiography (CTA) in the target lesion of patients with stable angina pectoris who were scheduled for percutaneous coronary intervention (PCI). CAC and CTA data were measured using a 128-slice scanner (Somatom Definition AS+; Siemens Medical Solutions, Forchheim, Germany) before PCI. CAC was measured in a non-contrast-enhanced scan and was quantified using the Calcium Score module of SYNAPSE VINCENT software (Fujifilm Co., Tokyo, Japan) and expressed in Agatston units. CTA was then performed with contrast-enhanced ECG gating to measure the severity of the calcified plaque condition. We show that both CAC and CTA data can be used as a benchmark for considering the addition of rotational atherectomy during PCI for severely calcified plaque lesions. PMID:26977441
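
    A simplified sketch of Agatston scoring as referenced above: voxels at or above 130 HU count as calcified, and each lesion's area is weighted 1-4 by its peak attenuation. Real scoring also applies minimum lesion sizes, slice spacing and per-lesion connected components; the CT slice below is synthetic.

```python
import numpy as np

def density_weight(max_hu):
    # Standard Agatston density weights by peak attenuation.
    if max_hu >= 400: return 4
    if max_hu >= 300: return 3
    if max_hu >= 200: return 2
    return 1  # 130-199 HU

def agatston_slice(hu, pixel_area_mm2):
    mask = hu >= 130                     # calcification threshold
    if not mask.any():
        return 0.0
    area = mask.sum() * pixel_area_mm2   # treats the slice as one lesion
    return area * density_weight(hu[mask].max())

rng = np.random.default_rng(5)
slice_hu = rng.normal(40, 30, size=(64, 64))   # soft-tissue background
slice_hu[30:34, 30:33] = 350                   # synthetic calcified plaque
print("lesion score:", agatston_slice(slice_hu, pixel_area_mm2=0.25))
```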

  9. Phytosterol intake and dietary fat reduction are independent and additive in their ability to reduce plasma LDL cholesterol

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The plasma LDL-cholesterol-lowering effect of plant sterols (PS) appears to be independent of background diet, but definitive proof is lacking. The effect of background diet on plasma concentrations of PS has not been reported. We determined the effects of manipulating dietary contents of PS and f...

  10. The Effects of Providing Additional Reading Opportunities for Struggling Readers at Their Independent Reading Levels within Content Areas

    ERIC Educational Resources Information Center

    Vasinko, Teresa

    2013-01-01

    A mixed-methods study was conducted to determine if extra reading practice incorporated into fifth-and sixth-grade social studies and science content classes would have a positive impact on reading assessments for readers at risk. At-risk readers' independent reading levels were assessed using Dynamic Indicator of Basic Early Literacy Skills or…

  11. Maturation of poultry G-I microbiome during 42d of growth is independent of organic acid feed additives

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Poultry remains a major source of foodborne infections in the U.S. and globally. A variety of additives with presumed anti-microbial and/or growth-promoting effects are commonly added to poultry feed, yet the effects of these additives on the ecology of the gastro-intestinal microbial community (th...

  12. Independent and additive association of prenatal famine exposure and intermediary life conditions with adult mortality age 18–63 years

    PubMed Central

    Ekamper, P.; van Poppel, F.; Stein, A.D.; Lumey, L.H.

    2014-01-01

    Objectives To quantify the relation between prenatal famine exposure and adult mortality, taking into account mediating effects of intermediary life conditions. Design Historical follow-up study. Setting The Dutch famine (Hunger Winter) of 1944–1945 which occurred towards the end of WWII in the occupied Netherlands. Study population From 408,015 Dutch male births born 1944–1947, examined for military service at age 18, we selected for follow-up all men born at the time of the famine in six affected cities in the Western Netherlands (n=25,283), and a sample of unexposed time (n=10,667) and place (n=9,087) controls. These men were traced and followed for mortality through the national population and death record systems. Outcome measure All-cause mortality between ages 18 and 63 years using Cox proportional hazards models adjusted for intermediary life conditions. Results An increase in mortality was seen after famine exposure in early gestation (HR 1.12; 95% confidence interval (CI): 1.01 to 1.24) but not late gestation (HR 1.04; 95% CI: 0.96 to 1.13). Among intermediary life conditions at age 18 years, educational level was inversely associated with mortality, and mortality was elevated in men whose fathers had manual versus non-manual occupations (HR 1.08; CI: 1.02 to 1.16) and in men who were declared unfit for military service (HR 1.44; CI: 1.31 to 1.58). Associations of intermediate factors with mortality were independent of famine exposure in early life, and associations between prenatal famine exposure and adult mortality were independent of social class and education at age 18. Conclusions Timing of exposure in relation to the stage of pregnancy may be of critical importance for later health outcomes independent of intermediary life conditions. PMID:24262812

  13. Statistical Reference Datasets

    National Institute of Standards and Technology Data Gateway

    Statistical Reference Datasets (Web, free access)   The Statistical Reference Datasets project is also supported by the Standard Reference Data Program. The purpose of this project is to improve the accuracy of statistical software by providing reference datasets with certified computational results that enable the objective evaluation of statistical software.
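
    The verification pattern such reference datasets support looks roughly like this sketch: fit with the software under test, then compare estimates against certified values to a tolerance. The data and "certified" numbers below are invented stand-ins, not actual NIST StRD values.

```python
import numpy as np

# Tiny linear-regression dataset with hand-computed reference values
# standing in for certified results (NOT actual NIST StRD numbers).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
certified = {"slope": 1.99, "intercept": 0.05}   # hypothetical certified values

slope, intercept = np.polyfit(x, y, 1)           # software under test
for name, est in (("slope", slope), ("intercept", intercept)):
    rel_err = abs(est - certified[name]) / abs(certified[name])
    print(f"{name}: estimate={est:.10f} certified={certified[name]} "
          f"pass={rel_err < 1e-6}")
```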

  14. Phylogenomic analysis of EST datasets.

    PubMed

    Peregrín-Alvarez, José M; Parkinson, John

    2009-01-01

    To date, the genomes of over 600 organisms have been generated, of which 100 are from eukaryotes. Together with partial genome data for an additional 700 eukaryotic organisms, these exceptional sequence resources offer new opportunities to explore phylogenetic relationships and species diversity. The identification of highly diverse sequences specific to an EST-based sequence dataset offers insights into the extent of genetic novelty within that dataset. Sequences that are only shared with other related species from the same taxon might represent genes associated with taxon-specific innovations. On the other hand, sequences that are highly conserved across many other species offer valuable resources for performing more in-depth phylogenetic analyses. In the following chapter, we guide the reader through the process of examining their sequence datasets in the context of phylogenetic relationships. Performed across large-scale datasets, such analyses are termed phylogenomics. Two complementary approaches are described, both based on the use of BLAST similarity metrics. The first uses an established Java tool - SimiTri - to visualize sequence similarity relationships between the EST dataset and three user-defined datasets. The second focuses on the use of phylogenetic profiles to identify groups of taxonomically related sequences. PMID:19277568

  15. Identification of novel estrogen receptor (ER) agonists that have additional and complementary anti-cancer activities via ER-independent mechanism.

    PubMed

    Kim, Taelim; Kim, Hye-In; An, Ji-Young; Lee, Jun; Lee, Na-Rae; Heo, Jinyuk; Kim, Ji-Eun; Yu, Jihyun; Lee, Yong Sup; Inn, Kyung-Soo; Kim, Nam-Jung

    2016-04-01

    In this study, a series of bis(4-hydroxy)benzophenone oxime ether derivatives such as 12c, 12e and 12h were identified as novel estrogen receptor (ER) agonists that have additional and complementary anti-proliferative activities via ER-independent mechanism in cancer cells. These compounds are expected to overcome the therapeutic limitation of existing ER agonists such as estradiol and tamoxifen, which have been known to induce the proliferation of cancer cells. PMID:26905830

  16. Segmentation of Unstructured Datasets

    NASA Technical Reports Server (NTRS)

    Bhat, Smitha

    1996-01-01

    Datasets generated by computer simulations and experiments in Computational Fluid Dynamics tend to be extremely large and complex. It is difficult to visualize these datasets using standard techniques like Volume Rendering and Ray Casting. Object Segmentation provides a technique to extract and quantify regions of interest within these massive datasets. This thesis explores basic algorithms to extract coherent amorphous regions from two-dimensional and three-dimensional scalar unstructured grids. The techniques are applied to datasets from Computational Fluid Dynamics and from Finite Element Analysis.

  17. Dataset Lifecycle Policy

    NASA Technical Reports Server (NTRS)

    Armstrong, Edward; Tauer, Eric

    2013-01-01

    The presentation focused on describing a new dataset lifecycle policy that the NASA Physical Oceanography DAAC (PO.DAAC) has implemented for its new and current datasets to foster improved stewardship and consistency across its archive. The overarching goal is to implement this dataset lifecycle policy for all new GHRSST GDS2 datasets and bridge the mission statements from the GHRSST Project Office and PO.DAAC to provide the best quality SST data in a cost-effective, efficient manner, preserving its integrity so that it will be available and usable to a wide audience.

  18. Fixing Dataset Search

    NASA Technical Reports Server (NTRS)

    Lynnes, Chris

    2014-01-01

    Three current search engines are queried for ozone data at the GES DISC. The results range from sub-optimal to counter-intuitive. We propose a method to fix dataset search by implementing a robust relevancy ranking scheme. The relevancy ranking scheme is based on several heuristics culled from more than 20 years of helping users select datasets.

  19. Introduction of a simple-model-based land surface dataset for Europe

    NASA Astrophysics Data System (ADS)

    Orth, Rene; Seneviratne, Sonia I.

    2015-04-01

    Land surface hydrology can play a crucial role during extreme events such as droughts, floods and even heat waves. We introduce in this study a new hydrological dataset for Europe that consists of soil moisture, runoff and evapotranspiration (ET). It is derived with a simple water balance model (SWBM) forced with precipitation, temperature and net radiation. The SWBM dataset extends over the period 1984-2013 with a daily time step and 0.5° × 0.5° resolution. We employ a novel calibration approach, in which we consider 300 random parameter sets chosen from an observation-based range. Using several independent validation datasets representing soil moisture (or terrestrial water content), ET and streamflow, we identify the best performing parameter set and hence the new dataset. To illustrate its usefulness, the SWBM dataset is compared against several state-of-the-art datasets (ERA-Interim/Land, MERRA-Land, GLDAS-2-Noah, simulations of the Community Land Model Version 4), using all validation datasets as reference. For soil moisture dynamics it outperforms the benchmarks. Therefore the SWBM soil moisture dataset constitutes a reasonable alternative to sparse measurements, little-validated model results, or proxy data such as precipitation indices. Also in terms of runoff the SWBM dataset performs well, whereas the evaluation of the SWBM ET dataset is overall satisfactory, although the dynamics of this variable are less well captured. This highlights the limitations of the dataset, as it is based on a simple model that uses uniform parameter values. Hence some processes impacting ET dynamics may not be captured, and quality issues may occur in regions with complex terrain. Even though the SWBM is well calibrated, it cannot replace more sophisticated models; however, as their calibration is a complex task, the present dataset may serve as a benchmark in the future. In addition we investigate the sources of skill of the SWBM dataset and find that the parameter set has a similar
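
    A toy illustration of the calibration strategy described above (300 random parameter sets scored against independent observations), using a deliberately crude one-bucket water balance model with synthetic forcing; it is far simpler than the actual SWBM.

```python
import numpy as np

rng = np.random.default_rng(6)
precip = rng.gamma(0.6, 4.0, 3 * 365)          # synthetic daily forcing (mm)

def bucket_model(precip, capacity, runoff_exp):
    # One-bucket soil: runoff rises nonlinearly as the bucket fills.
    soil, runoff = 0.5 * capacity, np.zeros_like(precip)
    for t, p in enumerate(precip):
        runoff[t] = p * (soil / capacity) ** runoff_exp
        et = 0.002 * soil                      # crude evapotranspiration
        soil = np.clip(soil + p - runoff[t] - et, 0.0, capacity)
    return runoff

obs = bucket_model(precip, capacity=250.0, runoff_exp=2.0)  # pseudo-truth

# Sample 300 random parameter sets; keep the best against "observations".
params = zip(rng.uniform(50, 500, 300), rng.uniform(1, 4, 300))
best = max(params, key=lambda cp: np.corrcoef(
    bucket_model(precip, *cp), obs)[0, 1])
print("best (capacity, runoff_exp):", best)
```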

  20. Handwritten mathematical symbols dataset

    PubMed Central

    Chajri, Yassine; Bouikhalene, Belaid

    2016-01-01

    Due to the technological advances in recent years, paper scientific documents are used less and less. Thus, the trend in the scientific community to use digital documents has increased considerably. Among these documents, there are scientific documents and more specifically mathematics documents. In this context, we present our own dataset of handwritten mathematical symbols composed of 10,379 images. This dataset gathers Arabic characters, Latin characters, Arabic numerals, Latin numerals, arithmetic operators, set-symbols, comparison symbols, delimiters, etc. PMID:27006975

  1. Handwritten mathematical symbols dataset.

    PubMed

    Chajri, Yassine; Bouikhalene, Belaid

    2016-06-01

    Due to technological advances in recent years, paper scientific documents are used less and less, and the trend in the scientific community toward using digital documents has increased considerably. Among these are scientific documents, and more specifically mathematics documents. In this context, we present our own dataset of handwritten mathematical symbols composed of 10,379 images. This dataset gathers Arabic characters, Latin characters, Arabic numerals, Latin numerals, arithmetic operators, set symbols, comparison symbols, delimiters, etc. PMID:27006975

  2. Fast randomization of large genomic datasets while preserving alteration counts

    PubMed Central

    Gobbi, Andrea; Iorio, Francesco; Dawson, Kevin J.; Wedge, David C.; Tamborero, David; Alexandrov, Ludmil B.; Lopez-Bigas, Nuria; Garnett, Mathew J.; Jurman, Giuseppe; Saez-Rodriguez, Julio

    2014-01-01

    Motivation: Studying combinatorial patterns in cancer genomic datasets has recently emerged as a tool for identifying novel cancer driver networks. Approaches have been devised to quantify, for example, the tendency of a set of genes to be mutated in a ‘mutually exclusive’ manner. The significance of the proposed metrics is usually evaluated by computing P-values under appropriate null models. To this end, a Monte Carlo method (the switching-algorithm) is used to sample simulated datasets under a null model that preserves patient- and gene-wise mutation rates. In this method, a genomic dataset is represented as a bipartite network, to which Markov chain updates (switching-steps) are applied. These steps modify the network topology, and a minimal number of them must be executed to draw simulated datasets independently under the null model. This number has previously been deduced empirically to be a linear function of the total number of variants, making this process computationally expensive. Results: We present a novel approximate lower bound for the number of switching-steps, derived analytically. Additionally, we have developed the R package BiRewire, including new efficient implementations of the switching-algorithm. We illustrate the performance of BiRewire by applying it to large real cancer genomics datasets. We report vast reductions in time requirements with respect to existing implementations/bounds and equivalent P-value computations. Thus, we propose BiRewire for studying statistical properties in genomic datasets, and other data that can be modeled as bipartite networks. Availability and implementation: BiRewire is available on BioConductor at http://www.bioconductor.org/packages/2.13/bioc/html/BiRewire.html Contact: iorio@ebi.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25161255
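
    The switching-step itself is simple to state: pick two edges of the bipartite patient-gene network and swap their endpoints, rejecting swaps that would create duplicate edges, so that all row and column sums (mutation counts) are preserved. A minimal Python sketch on a toy edge set follows; the number of steps here is arbitrary, since deriving a sufficient number is exactly the paper's contribution.

        import random

        random.seed(1)
        # patient-gene mutation edges of a toy dataset
        edges = {("p1", "gA"), ("p1", "gB"), ("p2", "gB"), ("p2", "gC"), ("p3", "gA")}

        def switching_step(edges):
            (u1, v1), (u2, v2) = random.sample(sorted(edges), 2)
            # swap partners; reject swaps that would duplicate an existing edge
            if u1 != u2 and v1 != v2 and (u1, v2) not in edges and (u2, v1) not in edges:
                edges -= {(u1, v1), (u2, v2)}
                edges |= {(u1, v2), (u2, v1)}

        for _ in range(100):
            switching_step(edges)
        print(sorted(edges))   # same degree sequence, shuffled topology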

  3. Exudate-based diabetic macular edema detection in fundus images using publicly available datasets

    SciTech Connect

    Giancardo, Luca; Meriaudeau, Fabrice; Karnowski, Thomas Paul; Li, Yaquin; Garg, Seema; Tobin Jr, Kenneth William; Chaum, Edward

    2011-01-01

    Diabetic macular edema (DME) is a common vision-threatening complication of diabetic retinopathy. In a large-scale screening environment, DME can be assessed by detecting exudates (a type of bright lesion) in fundus images. In this work, we introduce a new methodology for diagnosis of DME using a novel set of features based on colour, wavelet decomposition and automatic lesion segmentation. These features are employed to train a classifier able to automatically diagnose DME through the presence of exudation. We present a new publicly available dataset with ground-truth data containing 169 patients from various ethnic groups and levels of DME. This and two other publicly available datasets are employed to evaluate our algorithm. We are able to achieve diagnosis performance comparable to retina experts on the MESSIDOR (an independently labelled dataset with 1200 images) with cross-dataset testing (e.g., the classifier was trained on an independent dataset and tested on MESSIDOR). Our algorithm obtained an AUC between 0.88 and 0.94 depending on the dataset/features used. Additionally, it does not need ground truth at the lesion level to reject false positives and is computationally efficient, as it generates a diagnosis in an average of 4.4 s (9.3 s, considering the optic nerve localization) per image on a 2.6 GHz platform with an unoptimized Matlab implementation.
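
    The cross-dataset protocol (train on one labelled dataset, report AUC on a fully independent one) is easy to misread, so a schematic sketch may help; the features below are random stand-ins for the colour/wavelet/lesion features described above, and the classifier choice is illustrative rather than the paper's.

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.metrics import roc_auc_score

        rng = np.random.default_rng(0)
        X_train, y_train = rng.normal(size=(169, 10)), rng.integers(0, 2, 169)    # training dataset
        X_test, y_test = rng.normal(size=(1200, 10)), rng.integers(0, 2, 1200)    # independent test set

        clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
        print("cross-dataset AUC:", roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))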

  4. A Risk Score with Additional Four Independent Factors to Predict the Incidence and Recovery from Metabolic Syndrome: Development and Validation in Large Japanese Cohorts

    PubMed Central

    Obokata, Masaru; Negishi, Kazuaki; Ohyama, Yoshiaki; Okada, Haruka; Imai, Kunihiko; Kurabayashi, Masahiko

    2015-01-01

    Background Although many risk factors for metabolic syndrome (MetS) have been reported, there is no clinical score that predicts its incidence. The purposes of this study were to create and validate a risk score for predicting both incidence of and recovery from MetS in a large cohort. Methods Subjects without MetS at enrollment (n = 13,634) were randomly divided into 2 groups and followed to record incidence of MetS. We also examined recovery from MetS in the remaining 2,743 individuals with prevalent MetS. Results During a median follow-up of 3.0 years, 878 subjects in the derivation cohort and 757 in the validation cohort developed MetS. Multiple logistic regression analysis identified 12 independent variables from the derivation cohort, and an initial score for subsequent MetS was created, which showed good discrimination in both the derivation (c-statistic 0.82) and validation cohorts (0.83). The predictability of the initial score for recovery from MetS was tested in the 2,743-subject MetS population (906 subjects recovered from MetS), where nine variables (including age, sex, γ-glutamyl transpeptidase, uric acid and five MetS diagnostic criteria constituents) remained significant. The final score was then created using the nine variables. This score significantly predicted both recovery from MetS (c-statistic 0.70, p<0.001, 78% sensitivity and 54% specificity) and incident MetS (c-statistic 0.80), with an incremental discriminative ability over the model derived from the five factors used in the diagnosis of MetS (continuous net reclassification improvement: 0.35, p<0.001; integrated discrimination improvement: 0.01, p<0.001). Conclusions We identified four additional independent risk factors associated with subsequent MetS, and developed and validated a risk score to predict both incidence of and recovery from MetS. PMID:26230621
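
    As a schematic of the score-building and validation workflow (not the paper's variables or coefficients), one can fit a logistic model on a derivation split and report the c-statistic, which is the ROC AUC, on a validation split:

        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import roc_auc_score

        rng = np.random.default_rng(0)
        X = rng.normal(size=(1000, 9))                            # nine stand-in risk variables
        y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=1000) > 1   # synthetic outcome

        derive, validate = slice(0, 700), slice(700, 1000)
        model = LogisticRegression().fit(X[derive], y[derive])
        auc = roc_auc_score(y[validate], model.predict_proba(X[validate])[:, 1])
        print("validation c-statistic:", round(auc, 2))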

  5. STAT4 Associates with SLE Through Two Independent Effects that Correlate with Gene Expression and Act Additively with IRF5 to Increase Risk

    PubMed Central

    Abelson, Anna-Karin; Delgado-Vega, Angélica M.; Kozyrev, Sergey V.; Sánchez, Elena; Velázquez-Cruz, Rafael; Eriksson, Niclas; Wojcik, Jerome; Reddy, Prasad Linga; Lima, Guadalupe; D’Alfonso, Sandra; Migliaresi, Sergio; Baca, Vicente; Orozco, Lorena; Witte, Torsten; Ortego-Centeno, Norberto; Abderrahim, Hadi; Pons-Estel, Bernardo A.; Gutiérrez, Carmen; Suárez, Ana; González-Escribano, Maria Francisca; Martin, Javier; Alarcón-Riquelme, Marta E.

    2013-01-01

    Objectives To confirm and define the genetic association of STAT4 and systemic lupus erythematosus, investigate the possibility of correlations with differential splicing and/or expression levels, and genetic interaction with IRF5. Methods 30 tag SNPs were genotyped in an independent set of Spanish cases and controls. SNPs surviving correction for multiple tests were genotyped in 5 new sets of cases and controls for replication. STAT4 cDNA was analyzed by 5’-RACE PCR and sequencing. Expression levels were measured by quantitative PCR. Results In the fine-mapping, four SNPs were significant after correction for multiple testing, with rs3821236 and rs3024866 as the strongest signals, followed by the previously associated rs7574865, and by rs1467199. Association was replicated in all cohorts. After conditional regression analyses, two major independent signals, represented by SNPs rs3821236 and rs7574865, remained significant across the sets. These SNPs belong to separate haplotype blocks. High levels of STAT4 expression correlated with SNPs rs3821236, rs3024866 (both in the same haplotype block) and rs7574865, but not with other SNPs. We also detected transcription of alternative tissue-specific exons 1, indicating the presence of tissue-specific promoters of potential importance in the expression of STAT4. No interaction with associated SNPs of IRF5 was observed using regression analysis. Conclusions These data confirm STAT4 as a susceptibility gene for SLE and suggest the presence of at least two functional variants affecting levels of STAT4. Our results also indicate that the genes STAT4 and IRF5 act additively to increase the risk of SLE. PMID:19019891

  6. NATIONAL ELEVATION DATASET

    EPA Science Inventory

    The USGS National Elevation Dataset (NED) has been developed by merging the highest-resolution, best-quality elevation data available across the United States into a seamless raster format. NED is the result of the maturation of the USGS effort to provide 1:24,000-scale Digital ...

  7. NATIONAL ELEVATION DATASET HILLSHADE

    EPA Science Inventory

    The USGS National Elevation Dataset (NED) has been developed by merging the highest-resolution, best-quality elevation data available across the United States into a seamless raster format. NED is the result of the maturation of the USGS effort to provide 1:24,000-scale Digital E...

  8. NATIONAL HYDROGRAPHY DATASET

    EPA Science Inventory

    The National Hydrography Dataset (NHD) is a feature-based database that interconnects and uniquely identifies the stream segments or reaches that comprise the nation's surface water drainage system. It is based initially on the content of the U.S. Geological Survey 1:100,000-scal...

  9. NATIONAL HYDROGRAPHY DATASET

    EPA Science Inventory

    Resource Purpose:The National Hydrography Dataset (NHD) is a comprehensive set of digital spatial data that contains information about surface water features such as lakes, ponds, streams, rivers, springs and wells. Within the NHD, surface water features are combined to fo...

  10. Independent Contributions of the Central Executive, Intelligence, and In-Class Attentive Behavior to Developmental Change in the Strategies Used to Solve Addition Problems

    PubMed Central

    Geary, David C.; Hoard, Mary K.; Nugent, Lara

    2012-01-01

    Children’s (n = 275) use of retrieval, decomposition (e.g., 7 = 4+3, and thus 6+7=6+4+3), and counting to solve addition problems was longitudinally assessed from first to fourth grade, and intelligence, working memory, and in-class attentive behavior were assessed in one or several grades. The goal was to assess the relation between capacity of the central executive component of working memory, controlling for intelligence and in-class attentive behavior, and grade-related changes in children’s use of these strategies. The predictor-on-intercept effects from multilevel models revealed that children with higher central executive capacity correctly retrieved more facts and used the most sophisticated counting procedure more frequently and accurately than did their lower-capacity peers at the beginning of first grade, but the predictor-on-slope effects indicated that this advantage disappeared (retrieval) or declined in importance (counting) from first to fourth grade. The predictor-on-slope effects also revealed that from first through fourth grade, children with higher capacity adopted the decomposition strategy more quickly than did other children. The results remained robust with controls for children’s sex, race, school site, speed of encoding Arabic numerals and articulating number words, and mathematics achievement in kindergarten. The results also revealed that intelligence and in-class attentive behavior independently contributed to children’s strategy development. PMID:22698947

  11. Plant Functional Diversity Can Be Independent of Species Diversity: Observations Based on the Impact of 4-Yrs of Nitrogen and Phosphorus Additions in an Alpine Meadow

    PubMed Central

    Li, Wei; Cheng, Ji-Min; Yu, Kai-Liang; Epstein, Howard E.; Guo, Liang; Jing, Guang-Hua; Zhao, Jie; Du, Guo-Zhen

    2015-01-01

    Past studies have widely documented the decrease in species diversity in response to nutrient addition; however, functional diversity is often independent of species diversity. In this study, we conducted a field experiment to examine the effect of nitrogen and phosphorus fertilization ((NH4)2HPO4) at 0, 15, 30 and 60 g m^-2 yr^-1 (F0, F15, F30 and F60), after 4 years of continuous fertilization, on functional diversity and species diversity, and their relationship with productivity in an alpine meadow community on the Tibetan Plateau. For this purpose, three community-weighted mean trait values (specific leaf area, SLA; mature plant height, MPH; and seed size, SS) for 30 common species in each fertilization level were determined, and three components of functional diversity (functional richness, FRic; functional evenness, FEve; and Rao’s index of quadratic entropy, FRao) were quantified. Our results showed that: (i) species diversity sharply decreased, but functional diversity remained stable with fertilization; (ii) community-weighted mean traits (SLA and MPH) increased significantly with fertilization level; (iii) aboveground biomass was not correlated with functional diversity, but it was significantly correlated with species diversity and MPH. Our results suggest that decreases in species diversity due to fertilization do not result in corresponding changes in functional diversity. Functional identity of species may be more important than functional diversity in influencing aboveground productivity in this alpine meadow community, and our results also support the mass ratio hypothesis; that is, the traits of the dominant species influenced the community biomass production. PMID:26295345
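
    Of the three functional diversity components, Rao's quadratic entropy has the most compact definition, Q = sum_ij d_ij * p_i * p_j, with d_ij the trait distance between species i and j and p_i the relative abundance. A small sketch with made-up trait values for the three traits used above:

        import numpy as np

        traits = np.array([[20.0, 0.3, 1.2],   # species x (SLA, MPH, SS), illustrative values
                           [35.0, 0.6, 0.8],
                           [15.0, 0.2, 2.0]])
        p = np.array([0.5, 0.3, 0.2])          # relative abundances, summing to 1

        z = (traits - traits.mean(0)) / traits.std(0)               # standardize traits
        d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)  # pairwise trait distances
        print("Rao's Q:", round(float(p @ d @ p), 3))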

  12. Prediction of joint algal toxicity of nano-CeO2/nano-TiO2 and florfenicol: Independent action surpasses concentration addition.

    PubMed

    Wang, Zhuang; Wang, Se; Peijnenburg, Willie J G M

    2016-08-01

    Co-exposure of aquatic organisms to engineered nanoparticles (ENPs) and antibiotics is likely to take place in the environment. However, the impacts of co-exposure on aquatic organisms are virtually unknown and understanding the joint toxicity of ENPs and antibiotics is a topic of importance. The independent action (IA) model and the concentration addition (CA) model are two of the most common approaches to mixture toxicity assessment. In this study, the joint toxicity of two ENPs (nCeO2 and nTiO2) and one antibiotic (florfenicol, FLO) to Chlorella pyrenoidosa was determined to compare the applicability of the IA and the CA model. Concentration-response analyses were performed for single toxicants and for binary mixtures containing FLO and one of the ENPs at two suspended particle concentrations. The effect concentrations and the observed effects of the binary mixtures were compared to the predictions of the joint toxicity. The observed toxicity associated with the nCeO2 or nTiO2 exposure was enhanced by the concomitant FLO exposure. The joint toxicity of nCeO2 and FLO was significantly higher than that of nTiO2 and FLO. Predictions based on the IA and CA models tend to underestimate the overall toxicity (in terms of median effect concentration) of the binary mixtures, but IA performs better than CA, irrespective of the effect level under consideration and the types of mixtures studied. This result underpins the need to consider the effects of mixtures of ENPs and organic chemicals on aquatic organisms, and the practicability of the IA and CA methods in toxicity assessment of ENPs. PMID:27156210
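
    The two reference models have closed forms worth recalling: CA predicts the mixture EC50 from sum_i c_i / EC50_i = 1, while IA combines single-substance effects as E = 1 - prod_i (1 - E_i). A sketch for a binary mixture, assuming illustrative log-logistic concentration-response parameters rather than the fitted ones:

        import numpy as np

        def effect(c, ec50, hill):                 # log-logistic concentration-response
            return 1.0 / (1.0 + (ec50 / c) ** hill)

        ec50 = {"ENP": 40.0, "FLO": 5.0}           # mg/L, made-up parameters
        hill = {"ENP": 1.2, "FLO": 2.0}
        frac = {"ENP": 0.5, "FLO": 0.5}            # mixture composition by concentration

        ctot = np.array([1.0, 10.0, 100.0])        # total mixture concentrations
        ia = 1 - np.prod([1 - effect(ctot * frac[s], ec50[s], hill[s]) for s in ec50], axis=0)
        ca_ec50 = 1.0 / sum(frac[s] / ec50[s] for s in ec50)   # CA-predicted mixture EC50
        print("IA effects:", ia.round(2), "| CA mixture EC50:", round(ca_ec50, 1))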

  13. Selecting optimal partitioning schemes for phylogenomic datasets

    PubMed Central

    2014-01-01

    Background Partitioning involves estimating independent models of molecular evolution for different subsets of sites in a sequence alignment, and has been shown to improve phylogenetic inference. Current methods for estimating best-fit partitioning schemes, however, are only computationally feasible with datasets of fewer than 100 loci. This is a problem because datasets with thousands of loci are increasingly common in phylogenetics. Methods We develop two novel methods for estimating best-fit partitioning schemes on large phylogenomic datasets: strict and relaxed hierarchical clustering. These methods use information from the underlying data to cluster together similar subsets of sites in an alignment, and build on clustering approaches that have been proposed elsewhere. Results We compare the performance of our methods to each other, and to existing methods for selecting partitioning schemes. We demonstrate that while strict hierarchical clustering has the best computational efficiency on very large datasets, relaxed hierarchical clustering provides scalable efficiency and returns dramatically better partitioning schemes as assessed by common criteria such as AICc and BIC scores. Conclusions These two methods provide the best current approaches to inferring partitioning schemes for very large datasets. We provide free open-source implementations of the methods in the PartitionFinder software. We hope that the use of these methods will help to improve the inferences made from large phylogenomic datasets. PMID:24742000
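
    The selection criteria named above have standard closed forms; a minimal sketch of scoring one candidate partitioning subset, with k free model parameters and n alignment sites (the log-likelihood value is a placeholder):

        import math

        def aicc(lnl, k, n):
            # small-sample corrected Akaike information criterion
            return -2 * lnl + 2 * k + (2 * k * (k + 1)) / (n - k - 1)

        def bic(lnl, k, n):
            # Bayesian information criterion
            return -2 * lnl + k * math.log(n)

        lnl, k, n = -12345.6, 10, 1500
        print("AICc:", round(aicc(lnl, k, n), 1), "BIC:", round(bic(lnl, k, n), 1))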

  14. Error characterisation of global active and passive microwave soil moisture datasets

    NASA Astrophysics Data System (ADS)

    Dorigo, W. A.; Scipal, K.; Parinussa, R. M.; Liu, Y. Y.; Wagner, W.; de Jeu, R. A. M.; Naeimi, V.

    2010-12-01

    Understanding the error structures of remotely sensed soil moisture observations is essential for correctly interpreting observed variations and trends in the data or assimilating them in hydrological or numerical weather prediction models. Nevertheless, a spatially coherent assessment of the quality of the various globally available datasets is often hampered by the limited availability over space and time of reliable in-situ measurements. As an alternative, this study explores the triple collocation error estimation technique for assessing the relative quality of several globally available soil moisture products from active (ASCAT) and passive (AMSR-E and SSM/I) microwave sensors. Triple collocation is a powerful statistical tool to estimate the root mean square error while simultaneously solving for systematic differences in the climatologies of a set of three linearly related data sources with independent error structures. A prerequisite for this technique is the availability of a sufficiently large number of temporally corresponding observations. In addition to the active and passive satellite-based datasets, we used the ERA-Interim and GLDAS-NOAH reanalysis soil moisture datasets as a third, independent reference. The prime objective is to reveal trends in uncertainty related to different observation principles (passive versus active), the use of different frequencies (C-, X-, and Ku-band) for passive microwave observations, and the choice of the independent reference dataset (ERA-Interim versus GLDAS-NOAH). The results suggest that the triple collocation method provides realistic error estimates. Observed spatial trends agree well with the existing theory and studies on the performance of different observation principles and frequencies with respect to land cover and vegetation density. In addition, if all theoretical prerequisites are fulfilled (e.g. a sufficiently large number of common observations is available and errors of the different datasets are
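
    In its covariance form, triple collocation recovers each dataset's error variance from the three cross-covariances, e.g. var(e_x) = C_xx - C_xy * C_xz / C_yz for datasets x, y, z with mutually independent errors. A synthetic-data sketch of that standard estimator (the 0.30/0.50/0.40 error levels are invented):

        import numpy as np

        rng = np.random.default_rng(0)
        truth = rng.normal(0, 1, 10000)                 # unknown geophysical signal
        x = truth + rng.normal(0, 0.30, truth.size)     # e.g. active product
        y = truth + rng.normal(0, 0.50, truth.size)     # e.g. passive product
        z = truth + rng.normal(0, 0.40, truth.size)     # e.g. reanalysis reference

        C = np.cov(np.vstack([x, y, z]))
        err_var = [C[0, 0] - C[0, 1] * C[0, 2] / C[1, 2],
                   C[1, 1] - C[0, 1] * C[1, 2] / C[0, 2],
                   C[2, 2] - C[0, 2] * C[1, 2] / C[0, 1]]
        print("RMSE estimates:", np.sqrt(err_var).round(3))   # close to 0.30, 0.50, 0.40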

  15. OpenCL based machine learning labeling of biomedical datasets

    NASA Astrophysics Data System (ADS)

    Amoros, Oscar; Escalera, Sergio; Puig, Anna

    2011-03-01

    In this paper, we propose a two-stage labeling method for large biomedical datasets through a parallel approach on a single GPU. Diagnostic methods, structure volume measurements, and visualization systems are of major importance for surgery planning, intra-operative imaging and image-guided surgery. In all cases, providing an automatic and interactive method to label or tag the different structures contained in the input data becomes imperative. Several approaches to label or segment biomedical datasets have been proposed to discriminate different anatomical structures in an output tagged dataset. Among existing methods, supervised learning methods for segmentation have been devised to let a non-expert user easily analyze biomedical datasets. However, they still have some problems concerning practical application, such as slow learning and testing speeds. In addition, recent technological developments have led to widespread availability of multi-core CPUs and GPUs, as well as new software languages, such as NVIDIA's CUDA and OpenCL, making it possible to apply parallel programming paradigms on conventional personal computers. The Adaboost classifier is one of the most widely applied methods for labeling in the Machine Learning community. In a first stage, Adaboost trains a binary classifier from a set of pre-labeled samples described by a set of features. This binary classifier is defined as a weighted combination of weak classifiers. Each weak classifier is a simple decision function estimated on a single feature value. Then, at the testing stage, each weak classifier is independently applied on the features of a set of unlabeled samples. In this work, we propose an alternative representation of the Adaboost binary classifier. We use this proposed representation to define a new GPU-based parallelized Adaboost testing stage using OpenCL. We provide numerical experiments based on large available data sets and we compare our results to CPU-based strategies in terms of time and
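
    The structure being parallelized is the weighted vote of decision stumps: each weak classifier thresholds one feature, and the strong classifier is sign(sum_t alpha_t * h_t(x)). A serial Python sketch of that testing stage (stump parameters are invented; the paper's contribution is evaluating the stumps in parallel on the GPU):

        import numpy as np

        # (feature index, threshold, polarity, weight) per weak classifier -- made up
        stumps = [(0, 0.5, +1, 0.9), (2, -0.1, -1, 0.6), (1, 1.2, +1, 0.4)]

        def strong_classify(X, stumps):
            votes = np.zeros(len(X))
            for f, thr, pol, alpha in stumps:
                h = np.where(pol * (X[:, f] - thr) > 0, 1.0, -1.0)   # weak decision
                votes += alpha * h                                   # weighted vote
            return np.sign(votes)

        X = np.random.default_rng(0).normal(size=(5, 3))   # five unlabeled samples
        print(strong_classify(X, stumps))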

  16. Development of an Independent Global Land Cover Validation Dataset

    NASA Astrophysics Data System (ADS)

    Sulla-Menashe, D. J.; Olofsson, P.; Woodcock, C. E.; Holden, C.; Metcalfe, M.; Friedl, M. A.; Stehman, S. V.; Herold, M.; Giri, C.

    2012-12-01

    Accurate information on the global distribution and dynamics of land cover is critical for a large number of global change science questions. A growing number of land cover products have been produced at regional to global scales, but the uncertainty in these products and the relative strengths and weaknesses among available products are poorly characterized. To address this limitation we are compiling a database of high spatial resolution imagery to support international land cover validation studies. Validation sites were selected based on a probability sample, and may therefore be used to estimate statistically defensible accuracy statistics and associated standard errors. Validation site locations were identified using a stratified random design based on 21 strata derived from an intersection of Koppen climate classes and a population density layer. In this way, the two major sources of global variation in land cover (climate and human activity) are explicitly included in the stratification scheme. At each site we are acquiring high spatial resolution (< 1-m) satellite imagery for 5-km x 5-km blocks. The response design uses an object-oriented hierarchical legend that is compatible with the UN FAO Land Cover Classification System. Using this response design, we are classifying each site using a semi-automated algorithm that blends image segmentation with a supervised RandomForest classification algorithm. In the long run, the validation site database is designed to support international efforts to validate land cover products. To illustrate, we use the site database to validate the MODIS Collection 4 Land Cover product, providing a prototype for validating the VIIRS Surface Type Intermediate Product scheduled to start operational production early in 2013. As part of our analysis we evaluate sources of error in coarse resolution products, including semantic issues related to the class definitions, mixed pixels, and poor spectral separation between classes.

  17. Reduced Number of Transitional and Naive B Cells in Addition to Decreased BAFF Levels in Response to the T Cell Independent Immunogen Pneumovax®23

    PubMed Central

    Roth, Alena; Glaesener, Stephanie; Schütz, Katharina; Meyer-Bahlburg, Almut

    2016-01-01

    Protective immunity against T cell independent (TI) antigens such as Streptococcus pneumoniae is characterized by antibody production of B cells induced by the combined activation of T cell independent type 1 and type 2 antigens in the absence of direct T cell help. In mice, the main players in TI immune responses have been well defined as marginal zone (MZ) B cells and B-1 cells. However, the existence of human equivalents to these B cell subsets and the nature of the human B cell compartment involved in the immune reaction remain elusive. We therefore analyzed the effect of a TI antigen on the B cell compartment through immunization of healthy individuals with the pneumococcal polysaccharide (PnPS)-based vaccine Pneumovax®23, and subsequent characterization of B cell subpopulations. Our data demonstrates a transient decrease of transitional and naïve B cells, with a concomitant increase of IgA+ but not IgM+ or IgG+ memory B cells and a predominant generation of PnPS-specific IgA+ producing plasma cells. No alterations could be detected in T cells, or proposed human B-1 and MZ B cell equivalents. Consistent with the idea of a TI immune response, antigen-specific memory responses could not be observed. Finally, BAFF, which is supposed to drive class switching to IgA, was unexpectedly found to be decreased in serum in response to Pneumovax®23. Our results demonstrate that a characteristic TI response induced by Pneumovax®23 is associated with distinct phenotypical and functional changes within the B cell compartment. Those modulations occur in the absence of any modulations of T cells and without the development of a specific memory response. PMID:27031098

  18. Genomics dataset of unidentified disclosed isolates.

    PubMed

    Rekadwad, Bhagwan N

    2016-09-01

    Analysis of DNA sequences is necessary for higher hierarchical classification of organisms. It gives clues about the characteristics of organisms and their taxonomic position. This dataset was chosen to find complexities in the unidentified DNA in the disclosed patents. A total of 17 unidentified DNA sequences were thoroughly analyzed. Quick response (QR) codes were generated, and analysis of the AT/GC content of the DNA sequences was carried out. The QR codes are helpful for quick identification of isolates, and the AT/GC content is helpful for studying their stability at different temperatures. Additionally, a dataset of cleavage codes and enzyme codes from the restriction digestion study, which is helpful for performing studies using short DNA sequences, was reported. The dataset disclosed here is new revelatory data for the exploration of unique DNA sequences for evaluation, identification, comparison and analysis. PMID:27408929
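
    The AT/GC computation is a one-liner worth making concrete; a minimal sketch on a stand-in sequence:

        def at_gc_content(seq):
            seq = seq.upper()
            at = sum(seq.count(b) for b in "AT")
            gc = sum(seq.count(b) for b in "GC")
            return at / len(seq), gc / len(seq)

        print(at_gc_content("ATGCGGCCATTA"))   # -> (0.5, 0.5)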

  19. Data Mining for Imbalanced Datasets: An Overview

    NASA Astrophysics Data System (ADS)

    Chawla, Nitesh V.

    A dataset is imbalanced if the classification categories are not approximately equally represented. Recent years have brought increased interest in applying machine learning techniques to difficult "real-world" problems, many of which are characterized by imbalanced data. Additionally, the distribution of the testing data may differ from that of the training data, and the true misclassification costs may be unknown at learning time. Predictive accuracy, a popular choice for evaluating performance of a classifier, might not be appropriate when the data is imbalanced and/or the costs of different errors vary markedly. In this chapter, we discuss some of the sampling techniques used for balancing datasets, and the performance measures more appropriate for mining imbalanced datasets.
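
    One of the sampling techniques discussed, random oversampling of the minority class, together with AUC in place of raw accuracy, can be sketched in a few lines (synthetic data; scoring on the original imbalanced sample is for illustration only):

        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import roc_auc_score

        rng = np.random.default_rng(0)
        X = rng.normal(size=(1000, 5))
        y = (X[:, 0] > 1.8).astype(int)                  # rare positive class

        pos = np.flatnonzero(y == 1)
        extra = rng.choice(pos, size=(y == 0).sum() - pos.size)   # rebalance by resampling
        Xb, yb = np.vstack([X, X[extra]]), np.concatenate([y, y[extra]])

        clf = LogisticRegression().fit(Xb, yb)
        print("AUC:", round(roc_auc_score(y, clf.predict_proba(X)[:, 1]), 2))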

  20. Converting Static Image Datasets to Spiking Neuromorphic Datasets Using Saccades

    PubMed Central

    Orchard, Garrick; Jayawant, Ajinkya; Cohen, Gregory K.; Thakor, Nitish

    2015-01-01

    Creating datasets for Neuromorphic Vision is a challenging task. A lack of available recordings from Neuromorphic Vision sensors means that data must typically be recorded specifically for dataset creation rather than collecting and labeling existing data. The task is further complicated by a desire to simultaneously provide traditional frame-based recordings to allow for direct comparison with traditional Computer Vision algorithms. Here we propose a method for converting existing Computer Vision static image datasets into Neuromorphic Vision datasets using an actuated pan-tilt camera platform. Moving the sensor rather than the scene or image is a more biologically realistic approach to sensing and eliminates timing artifacts introduced by monitor updates when simulating motion on a computer monitor. We present conversion of two popular image datasets (MNIST and Caltech101) which have played important roles in the development of Computer Vision, and we provide performance metrics on these datasets using spike-based recognition algorithms. This work contributes datasets for future use in the field, as well as results from spike-based algorithms against which future works can compare. Furthermore, by converting datasets already popular in Computer Vision, we enable more direct comparison with frame-based approaches. PMID:26635513

  1. Independent Contributions of the Central Executive, Intelligence, and In-Class Attentive Behavior to Developmental Change in the Strategies Used to Solve Addition Problems

    ERIC Educational Resources Information Center

    Geary, David C.; Hoard, Mary K.; Nugent, Lara

    2012-01-01

    Children's (N = 275) use of retrieval, decomposition (e.g., 7 = 4+3 and thus 6+7 = 6+4+3), and counting to solve addition problems was longitudinally assessed from first grade to fourth grade, and intelligence, working memory, and in-class attentive behavior were assessed in one or several grades. The goal was to assess the relation between…

  2. Five year global dataset: NMC operational analyses (1978 to 1982)

    NASA Technical Reports Server (NTRS)

    Straus, David; Ardizzone, Joseph

    1987-01-01

    This document describes procedures used in assembling a five year dataset (1978 to 1982) using NMC Operational Analysis data. These procedures entailed replacing missing and unacceptable data in order to arrive at a complete dataset that is continuous in time. In addition, a subjective assessment of the integrity of all data (both preliminary and final) is presented. Documentation on the tapes comprising the Five Year Global Dataset is also included.

  3. Genomic Datasets for Cancer Research

    Cancer.gov

    A variety of datasets from genome-wide association studies of cancer and other genotype-phenotype studies, including sequencing and molecular diagnostic assays, are available to approved investigators through the Extramural National Cancer Institute Data Access Committee.

  4. Providing Geographic Datasets as Linked Data in Sdi

    NASA Astrophysics Data System (ADS)

    Hietanen, E.; Lehto, L.; Latvala, P.

    2016-06-01

    In this study, a prototype service to provide data from a Web Feature Service (WFS) as linked data is implemented. At first, persistent and unique Uniform Resource Identifiers (URIs) are created for all spatial objects in the dataset. The objects are available from those URIs in the Resource Description Framework (RDF) data format. Next, a Web Ontology Language (OWL) ontology is created to describe the dataset information content using the Open Geospatial Consortium's (OGC) GeoSPARQL vocabulary. The existing data model is modified in order to take into account the linked data principles. The implemented service produces an HTTP response dynamically. The data for the response is first fetched from the existing WFS. Then the Geographic Markup Language (GML) output of the WFS is transformed on-the-fly to the RDF format. Content negotiation is used to serve the data in different RDF serialization formats. This solution facilitates the use of a dataset in different applications without replicating the whole dataset. In addition, individual spatial objects in the dataset can be referred to with URIs. Furthermore, the needed information content of the objects can be easily extracted from the RDF serializations available from those URIs. A solution for linking data objects to the dataset URI is also introduced by using the Vocabulary of Interlinked Datasets (VoID). The dataset is divided into subsets, and each subset is given its own persistent and unique URI. This enables the whole dataset to be explored with a web browser and all individual objects to be indexed by search engines.
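
    The URI-minting step can be made concrete with the rdflib library; the base namespace, feature identifier, and attributes below are illustrative assumptions, not the prototype's actual vocabulary:

        from rdflib import Graph, Literal, Namespace, RDF, URIRef

        EX = Namespace("http://data.example.org/features/")    # hypothetical base URI
        GEO = Namespace("http://www.opengis.net/ont/geosparql#")

        g = Graph()
        feature = URIRef(EX["road.1001"])                      # persistent, unique URI
        g.add((feature, RDF.type, GEO.Feature))
        g.add((feature, GEO.asWKT, Literal("LINESTRING(24.94 60.17, 24.95 60.18)")))

        print(g.serialize(format="turtle"))                    # one possible RDF serialization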

  5. An integrated pan-tropical biomass map using multiple reference datasets.

    PubMed

    Avitabile, Valerio; Herold, Martin; Heuvelink, Gerard B M; Lewis, Simon L; Phillips, Oliver L; Asner, Gregory P; Armston, John; Ashton, Peter S; Banin, Lindsay; Bayol, Nicolas; Berry, Nicholas J; Boeckx, Pascal; de Jong, Bernardus H J; DeVries, Ben; Girardin, Cecile A J; Kearsley, Elizabeth; Lindsell, Jeremy A; Lopez-Gonzalez, Gabriela; Lucas, Richard; Malhi, Yadvinder; Morel, Alexandra; Mitchard, Edward T A; Nagy, Laszlo; Qie, Lan; Quinones, Marcela J; Ryan, Casey M; Ferry, Slik J W; Sunderland, Terry; Laurin, Gaia Vaglio; Gatti, Roberto Cazzolla; Valentini, Riccardo; Verbeeck, Hans; Wijaya, Arief; Willcock, Simon

    2016-04-01

    We combined two existing datasets of vegetation aboveground biomass (AGB) (Proceedings of the National Academy of Sciences of the United States of America, 108, 2011, 9899; Nature Climate Change, 2, 2012, 182) into a pan-tropical AGB map at 1-km resolution using an independent reference dataset of field observations and locally calibrated high-resolution biomass maps, harmonized and upscaled to 14 477 1-km AGB estimates. Our data fusion approach uses bias removal and weighted linear averaging that incorporates and spatializes the biomass patterns indicated by the reference data. The method was applied independently in areas (strata) with homogeneous error patterns of the input (Saatchi and Baccini) maps, which were estimated from the reference data and additional covariates. Based on the fused map, we estimated AGB stock for the tropics (23.4 N-23.4 S) of 375 Pg dry mass, 9-18% lower than the Saatchi and Baccini estimates. The fused map also showed differing spatial patterns of AGB over large areas, with higher AGB density in the dense forest areas in the Congo basin, Eastern Amazon and South-East Asia, and lower values in Central America and in most dry vegetation areas of Africa than either of the input maps. The validation exercise, based on 2118 estimates from the reference dataset not used in the fusion process, showed that the fused map had a RMSE 15-21% lower than that of the input maps and, most importantly, nearly unbiased estimates (mean bias 5 Mg dry mass ha(-1) vs. 21 and 28 Mg ha(-1) for the input maps). The fusion method can be applied at any scale including the policy-relevant national level, where it can provide improved biomass estimates by integrating existing regional biomass maps as input maps and additional, country-specific reference datasets. PMID:26499288
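
    The core of the fusion, bias removal followed by averaging with weights inverse to each map's error variance against the reference data, can be sketched on toy numbers (all values below are invented):

        import numpy as np

        ref = np.array([310.0, 150.0, 95.0, 220.0])     # reference AGB, Mg/ha
        map_a = np.array([355.0, 170.0, 120.0, 260.0])  # input map 1 at the same cells
        map_b = np.array([280.0, 135.0, 70.0, 200.0])   # input map 2

        a = map_a - (map_a - ref).mean()                # remove mean bias per stratum
        b = map_b - (map_b - ref).mean()
        wa, wb = 1 / (a - ref).var(), 1 / (b - ref).var()   # inverse-error-variance weights
        print(((wa * a + wb * b) / (wa + wb)).round(1))     # fused estimate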

  6. Detecting bimodality in astronomical datasets

    NASA Technical Reports Server (NTRS)

    Ashman, Keith A.; Bird, Christina M.; Zepf, Stephen E.

    1994-01-01

    We discuss statistical techniques for detecting and quantifying bimodality in astronomical datasets. We concentrate on the KMM algorithm, which estimates the statistical significance of bimodality in such datasets and objectively partitions data into subpopulations. By simulating bimodal distributions with a range of properties we investigate the sensitivity of KMM to datasets with varying characteristics. Our results facilitate the planning of optimal observing strategies for systems where bimodality is suspected. Mixture-modeling algorithms similar to the KMM algorithm have been used in previous studies to partition the stellar population of the Milky Way into subsystems. We illustrate the broad applicability of KMM by analyzing published data on globular cluster metallicity distributions, velocity distributions of galaxies in clusters, and burst durations of gamma-ray sources. FORTRAN code for the KMM algorithm and directions for its use are available from the authors upon request.
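
    The KMM idea, comparing a unimodal fit against a two-component mixture via a likelihood-ratio statistic, can be approximated with standard tools; a sketch using scikit-learn's Gaussian mixtures on synthetic bimodal data (KMM itself differs in details such as its common-covariance default):

        import numpy as np
        from sklearn.mixture import GaussianMixture

        rng = np.random.default_rng(0)
        data = np.concatenate([rng.normal(-1.2, 0.4, 150),   # synthetic bimodal sample
                               rng.normal(0.8, 0.5, 100)]).reshape(-1, 1)

        g1 = GaussianMixture(1, random_state=0).fit(data)
        g2 = GaussianMixture(2, random_state=0).fit(data)
        lrt = 2 * len(data) * (g2.score(data) - g1.score(data))   # score() is per sample
        print("LRT statistic:", round(lrt, 1))   # large values favour bimodality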

  7. A joint dataset of fair-weather atmospheric electricity

    NASA Astrophysics Data System (ADS)

    Tammet, H.

    2009-02-01

    A new open access dataset ATMEL2007A ( http://ael.physic.ut.ee/tammet/dd/) takes advantage of the diary-type data structure. The dataset comprises the measurements of atmospheric electric field, positive and negative conductivities, air ion concentrations and accompanying meteorological measurements at 13 stations, including 7 stations of the former World Data Centre network. The dataset incorporates more than half a million diurnal series of hourly averages and can easily be expanded with additional data. The dataset is designed for importing into a personal computer, which makes it possible to append private data while keeping it safely protected from public access. Available free software allows extracting data excerpts in the form of traditional data tables or spreadsheets. Examples show how the dataset can be used in research on correlations and trends in atmospheric electricity and air pollution.

  8. Querying Large Biological Network Datasets

    ERIC Educational Resources Information Center

    Gulsoy, Gunhan

    2013-01-01

    New experimental methods have resulted in an increasing amount of genetic interaction data being generated every day. Biological networks are used to store the genetic interaction data gathered. The increasing amount of available data requires fast, large-scale analysis methods. Therefore, we address the problem of querying large biological network datasets.…

  9. Development of a global historic monthly mean precipitation dataset

    NASA Astrophysics Data System (ADS)

    Yang, Su; Xu, Wenhui; Xu, Yan; Li, Qingxiang

    2016-04-01

    A global historic precipitation dataset is the basis for climate and water cycle research. There have been several global historic land surface precipitation datasets developed by international data centers such as the US National Climatic Data Center (NCDC), European Climate Assessment & Dataset project team, Met Office, etc., but so far there are no such datasets developed by any research institute in China. In addition, each dataset has its own focus of study region, and the existing global precipitation datasets only contain sparse observational stations over China, which may result in uncertainties in East Asian precipitation studies. In order to take into account comprehensive historic information, users might need to employ two or more datasets. However, the non-uniform data formats, data units, station IDs, and so on add extra difficulties for users to exploit these datasets. For this reason, a complete historic precipitation dataset that takes advantage of various datasets has been developed and produced in the National Meteorological Information Center of China. Precipitation observations from 12 sources are aggregated, and the data formats, data units, and station IDs are unified. Duplicated stations with the same ID are identified, with duplicated observations removed. Consistency test, correlation coefficient test, significance t-test at the 95% confidence level, and significance F-test at the 95% confidence level are conducted first to ensure the data reliability. Only those datasets that satisfy all the above four criteria are integrated to produce the China Meteorological Administration global precipitation (CGP) historic precipitation dataset version 1.0. It contains observations at 31 thousand stations with 1.87 × 10^7 data records, among which 4152 time series of precipitation are longer than 100 yr. This dataset plays a critical role in climate research due to its advantages in large data volume and high density of station network, compared to

  10. Geospatial datasets for watershed delineation and characterization used in the Hawaii StreamStats web application

    USGS Publications Warehouse

    Rea, Alan; Skinner, Kenneth D.

    2012-01-01

    The U.S. Geological Survey Hawaii StreamStats application uses an integrated suite of raster and vector geospatial datasets to delineate and characterize watersheds. The geospatial datasets used to delineate and characterize watersheds on the StreamStats website, and the methods used to develop the datasets are described in this report. The datasets for Hawaii were derived primarily from 10 meter resolution National Elevation Dataset (NED) elevation models, and the National Hydrography Dataset (NHD), using a set of procedures designed to enforce the drainage pattern from the NHD into the NED, resulting in an integrated suite of elevation-derived datasets. Additional sources of data used for computing basin characteristics include precipitation, land cover, soil permeability, and elevation-derivative datasets. The report also includes links for metadata and downloads of the geospatial datasets.

  11. Source Detection with Interferometric Datasets

    NASA Astrophysics Data System (ADS)

    Trott, Cathryn M.; Wayth, Randall B.; Macquart, Jean-Pierre R.; Tingay, Steven J.

    2012-04-01

    The detection of sources in interferometric radio data typically relies on extracting information from images, formed by Fourier transform of the underlying visibility dataset, and CLEANed of contaminating sidelobes through iterative deconvolution. Variable and transient radio sources span a large range of variability timescales, and their study has the potential to enhance our knowledge of the dynamic universe. Their detection and classification involve large data rates and non-stationary PSFs, commensal observing programs and ambitious science goals, and will demand a paradigm shift in the deployment of next-generation instruments. Optimal source detection and classification in real time requires efficient and automated algorithms. On short time-scales variability can be probed with an optimal matched filter detector applied directly to the visibility dataset. This paper shows the design of such a detector, and some preliminary detection performance results.

  12. Are Independent Probes Truly Independent?

    ERIC Educational Resources Information Center

    Camp, Gino; Pecher, Diane; Schmidt, Henk G.; Zeelenberg, Rene

    2009-01-01

    The independent cue technique has been developed to test traditional interference theories against inhibition theories of forgetting. In the present study, the authors tested the critical criterion for the independence of independent cues: Studied cues not presented during test (and unrelated to test cues) should not contribute to the retrieval…

  13. ISRUC-Sleep: A comprehensive public dataset for sleep researchers.

    PubMed

    Khalighi, Sirvan; Sousa, Teresa; Santos, José Moutinho; Nunes, Urbano

    2016-02-01

    To facilitate the performance comparison of new methods for sleep pattern analysis, publicly available datasets with quality content are very important and useful. We introduce an open-access comprehensive sleep dataset, called ISRUC-Sleep. The data were obtained from human adults, including healthy subjects, subjects with sleep disorders, and subjects under the effect of sleep medication. Each recording was randomly selected among PSG recordings that were acquired by the Sleep Medicine Centre of the Hospital of Coimbra University (CHUC). The dataset comprises three groups of data: (1) data concerning 100 subjects, with one recording session per subject; (2) data gathered from 8 subjects, with two recording sessions per subject; and (3) data collected from one recording session related to 10 healthy subjects. The polysomnography (PSG) recordings associated with each subject were visually scored by two human experts. Compared to existing sleep-related public datasets, ISRUC-Sleep provides data from a reasonable number of subjects with different characteristics, such as data useful for studies involving changes in the PSG signals over time, and data of healthy subjects useful for studies comparing healthy subjects with patients suffering from sleep disorders. This dataset was created to complement existing datasets by providing easy-to-apply data with some characteristics not yet covered. ISRUC-Sleep can be useful for analysis of new contributions: (i) in biomedical signal processing; (ii) in development of ASSC methods; and (iii) in sleep physiology studies. To evaluate and compare new contributions that use this dataset as a benchmark, results of applying a subject-independent automatic sleep stage classification (ASSC) method on the ISRUC-Sleep dataset are presented. PMID:26589468

  14. A polymer dataset for accelerated property prediction and design

    PubMed Central

    Huan, Tran Doan; Mannodi-Kanakkithodi, Arun; Kim, Chiho; Sharma, Vinit; Pilania, Ghanshyam; Ramprasad, Rampi

    2016-01-01

    Emerging computation- and data-driven approaches are particularly useful for rationally designing materials with targeted properties. Generally, these approaches rely on identifying structure-property relationships by learning from a dataset of a sufficiently large number of relevant materials. The learned information can then be used to predict the properties of materials not already in the dataset, thus accelerating the materials design. Herein, we develop a dataset of 1,073 polymers and related materials and make it available at http://khazana.uconn.edu/. This dataset is uniformly prepared using first-principles calculations with structures obtained either from other sources or by using structure search methods. Because the immediate target of this work is to assist the design of high dielectric constant polymers, it is initially designed to include the optimized structures, atomization energies, band gaps, and dielectric constants. It will be progressively expanded by accumulating new materials and including additional properties calculated for the optimized structures provided. PMID:26927478

  15. Forecasting medical waste generation using short and extra short datasets: Case study of Lithuania.

    PubMed

    Karpušenkaitė, Aistė; Ruzgas, Tomas; Denafas, Gintaras

    2016-04-01

    The aim of the study is to evaluate the performance of various mathematical modelling methods for forecasting medical waste generation, using Lithuania's annual medical waste data. A hazardous waste collection system that includes medical waste has been created only recently; therefore, the study's access to large sets of relevant data has been somewhat limited. Based on the data that could be obtained, three short and extra-short datasets with 20, 10 and 6 observations were developed. Spearman's correlation calculation showed that the influence of independent variables, such as visits to hospitals and other medical institutions, the number of children in the region, the number of beds in hospitals and other medical institutions, average life expectancy and doctor's visits in that region, is the most consistent and common across all three datasets. Tests of the performance of artificial neural networks, multiple linear regression, partial least squares, support vector machines and four non-parametric regression methods were conducted on the collected datasets. The best and most promising results were demonstrated by generalised additive models (R(2) = 0.90455) in the regional data case, smoothing spline models (R(2) = 0.98584) in the long annual data case, and multilayer feedforward artificial neural networks in the short annual data case (R(2) = 0.61103). PMID:26879908

  16. The CMS dataset bookkeeping service

    SciTech Connect

    Afaq, Anzar; Dolgert, Andrew; Guo, Yuyi; Jones, Chris; Kosyakov, Sergey; Kuznetsov, Valentin; Lueking, Lee; Riley, Dan; Sekhri, Vijay; /Fermilab

    2007-10-01

    The CMS Dataset Bookkeeping Service (DBS) has been developed to catalog all CMS event data from Monte Carlo and Detector sources. It provides the ability to identify MC or trigger source, track data provenance, construct datasets for analysis, and discover interesting data. CMS requires processing and analysis activities at various service levels and the DBS system provides support for localized processing or private analysis, as well as global access for CMS users at large. Catalog entries can be moved among the various service levels with a simple set of migration tools, thus forming a loose federation of databases. DBS is available to CMS users via a Python API, Command Line, and a Discovery web page interfaces. The system is built as a multi-tier web application with Java servlets running under Tomcat, with connections via JDBC to Oracle or MySQL database backends. Clients connect to the service through HTTP or HTTPS with authentication provided by GRID certificates and authorization through VOMS. DBS is an integral part of the overall CMS Data Management and Workflow Management systems.

  17. Salam's independence

    NASA Astrophysics Data System (ADS)

    Fraser, Gordon

    2009-01-01

    In his kind review of my biography of the Nobel laureate Abdus Salam (December 2008 pp45-46), John W Moffat wrongly claims that Salam had "independently thought of the idea of parity violation in weak interactions".

  18. Historical Space Weather Datasets within NOAA

    NASA Astrophysics Data System (ADS)

    Denig, W. F.; Mabie, J. J.; Horan, K.; Clark, C.

    2013-12-01

    The National Geophysical Data Center (NGDC) is primarily responsible for scientific data stewardship of operational space weather data from NOAA's fleet of environmental satellites in geostationary and polar low-earth orbits. In addition, as the former World Data Center for Solar-Terrestrial Physics from 1957 to 2011, NGDC acquired a large variety of solar and space environmental data in differing formats, including paper records and film. Management of this heterogeneous collection of environmental data remains a responsibility of NGDC as a participant in the new World Data System. Through the former NOAA Climate Data Modernization Program, many of these records were converted to digital format and are readily available online. However, reduced funding and staff have put a strain on NGDC's ability to effectively steward these historical datasets, some of which are unique and, in particular cases, were the basis of fundamental scientific breakthroughs in our understanding of the near-earth space environment. In this talk, I will provide an overview of the historical space weather datasets currently managed by NGDC and discuss strategies for preserving these data in fiscally stressful times.

  19. Data Integration for Heterogeneous Datasets

    PubMed Central

    2014-01-01

    More and more, the needs of data analysts require the use of data outside the control of their own organizations. The increasing amount of data available on the Web, the new technologies for linking data across datasets, and the increasing need to integrate structured and unstructured data are all driving this trend. In this article, we provide a technical overview of the emerging “broad data” area, in which the variety of heterogeneous data being used, rather than the scale of the data being analyzed, is the limiting factor in data analysis efforts. The article explores some of the emerging themes in data discovery, data integration, linked data, and the combination of structured and unstructured data. PMID:25553272

  20. Watershed Boundary Dataset for Mississippi

    USGS Publications Warehouse

    Wilson, K. Van, Jr.; Clair, Michael G., II; Turnipseed, D. Phil; Rebich, Richard A.

    2009-01-01

    The U.S. Geological Survey, in cooperation with the Mississippi Department of Environmental Quality, U.S. Department of Agriculture-Natural Resources Conservation Service, Mississippi Department of Transportation, U.S. Department of Agriculture-Forest Service, and the Mississippi Automated Resource Information System, developed a 1:24,000-scale Watershed Boundary Dataset for Mississippi, including watershed and subwatershed boundaries, codes, names, and areas. The Watershed Boundary Dataset for Mississippi provides a standard geographical framework for water-resources and selected land-resources planning. The original 8-digit subbasins (Hydrologic Unit Codes) were further subdivided into 10-digit watersheds (62.5 to 391 square miles (mi²)) and 12-digit subwatersheds (15.6 to 62.5 mi²) - the exceptions being the Delta part of Mississippi and the Mississippi River inside levees, which were subdivided into 10-digit watersheds only. Also, large water bodies in the Mississippi Sound along the coast were not delineated as small as a typical 12-digit subwatershed. All of the data - including watershed and subwatershed boundaries, subdivision codes and names, and drainage-area data - are stored in a Geographic Information System database, which is available at: http://ms.water.usgs.gov/. This map shows information on drainage and hydrography in the form of U.S. Geological Survey hydrologic unit boundaries for water-resource 2-digit regions, 4-digit subregions, 6-digit basins (formerly called accounting units), 8-digit subbasins (formerly called cataloging units), 10-digit watersheds, and 12-digit subwatersheds in Mississippi. A description of the project study area, methods used in the development of watershed and subwatershed boundaries for Mississippi, and results are presented in Wilson and others (2008). The data presented in this map and by Wilson and others (2008) supersede the data presented for Mississippi by Seaber and others (1987) and U.S. Geological Survey (1977).

  1. The Johns Hopkins University multimodal dataset for human action recognition

    NASA Astrophysics Data System (ADS)

    Murray, Thomas S.; Mendat, Daniel R.; Pouliquen, Philippe O.; Andreou, Andreas G.

    2015-05-01

    The Johns Hopkins University MultiModal Action (JHUMMA) dataset contains a set of twenty-one actions recorded with four sensor systems in three different modalities. The data was collected with a data acquisition system that includes three independent active sonar devices at three different frequencies and a Microsoft Kinect sensor that provides both RGB and Depth data. We have developed algorithms for human action recognition from active acoustics and provide benchmark baseline recognition performance results.

  2. A comprehensive polymer dataset for accelerated property prediction and design

    NASA Astrophysics Data System (ADS)

    Tran, Huan; Kumar Mannodi-Kanakkithodi, Arun; Kim, Chiho; Sharma, Vinit; Pilania, Ghanshyam; Ramprasad, Rampi

    Emerging computation- and data-driven approaches are particularly useful for rationally designing materials with targeted properties. In principle, these approaches rely on identifying structure-property relationships by learning from a dataset of a sufficiently large number of relevant materials. The learned information can then be used to rapidly predict the properties of materials not already in the dataset, thus accelerating the design of materials with preferable properties. Here, we report the development of a dataset of 1,065 polymers and related materials, which is available at http://khazana.uconn.edu/. This dataset is uniformly prepared using first-principles calculations with structures obtained either from other sources or by using structure search methods. Because the immediate target of this work is to assist the design of high dielectric constant polymers, it is initially designed to include the optimized structures, atomization energies, band gaps, and dielectric constants. The dataset will be progressively expanded by accumulating new materials and including additional properties calculated for the optimized structures provided. We discuss some information "learned" from the dataset and suggest that it may be used as a playground for further data-mining work.

  3. AMADA-Analysis of multidimensional astronomical datasets

    NASA Astrophysics Data System (ADS)

    de Souza, R. S.; Ciardi, B.

    2015-09-01

    We present AMADA, an interactive web application to analyze multidimensional datasets. The user uploads a simple ASCII file and AMADA performs a number of exploratory analyses together with contemporary visualization diagnostics. The package performs a hierarchical clustering in the parameter space, and the user can choose among linear, monotonic or non-linear correlation analysis. AMADA provides a number of clustering visualization diagnostics such as heatmaps, dendrograms, chord diagrams, and graphs. In addition, AMADA has the option to run a standard or robust principal components analysis, displaying the results as polar bar plots. The code is written in R and the web interface was created using the SHINY framework. AMADA source-code is freely available at https://goo.gl/KeSPue, and the shiny-app at http://goo.gl/UTnU7I.
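
    AMADA itself is written in R; the following Python sketch only illustrates the correlation-then-cluster idea on synthetic columns, with Spearman rank correlation standing in for the 'monotonic' option:

    ```python
    # Hierarchical clustering of variables by monotonic correlation:
    # correlated variables get a small distance and end up grouped together.
    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.stats import spearmanr

    rng = np.random.default_rng(0)
    data = rng.normal(size=(500, 5))      # stand-in for the uploaded ASCII table
    data[:, 1] += 0.8 * data[:, 0]        # make two columns correlated

    rho, _ = spearmanr(data)              # monotonic correlation matrix (5 x 5)
    dist = 1.0 - np.abs(rho)              # strong correlation -> small distance
    iu = np.triu_indices_from(dist, k=1)  # condensed distance vector for linkage
    Z = linkage(dist[iu], method="average")
    print(fcluster(Z, t=2, criterion="maxclust"))  # 2-cluster grouping of variables
    ```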

  4. Feature expressions: creating and manipulating sequence datasets.

    PubMed

    Fristensky, B

    1993-12-25

    Annotation of features, such as introns, exons and protein coding regions in GenBank/EMBL/DDBJ entries is now standardized through use of the Features Table (FT) language. The essence of the FT language is described by the relation 'expression-->sequence', meaning that each FT expression evaluates to a sequence. For example, the expression M74750:1..50 evaluates to the first 50 bases of the sequence with accession number M74750. Because FT is intrinsic to the database definition, it can serve as a software- and platform-independent lingua franca for sequence manipulation. The XYLEM package makes it possible to create and manipulate sequence datasets using FT expressions. FEATURES is a program that resolves FT expressions into their corresponding sequences. Annotated features can be retrieved either by feature key or by expression. Even unannotated portions of a sequence can be retrieved by user-generated FT expressions. Applications of the FT language include retrieval of subsequences from large sequence entries, generation of chromosome models or artificial DNA constructs, and representation of restriction maps or mutants. PMID:8290362
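
    A toy resolver for the simplest form of FT expression, ACCESSION:start..end; the sequence store below is hypothetical, and real FT expressions also allow operators such as join() and complement(), which are not handled here:

    ```python
    # Resolve a minimal FT expression of the form ACCESSION:start..end
    # against a local sequence store (a stand-in for a database fetch).
    import re

    SEQS = {"M74750": "ACGT" * 200}   # hypothetical sequence content

    def resolve(expr: str) -> str:
        m = re.fullmatch(r"(\w+):(\d+)\.\.(\d+)", expr)
        if not m:
            raise ValueError(f"unsupported FT expression: {expr}")
        acc, start, end = m.group(1), int(m.group(2)), int(m.group(3))
        return SEQS[acc][start - 1 : end]   # FT coordinates are 1-based, inclusive

    print(resolve("M74750:1..50"))          # first 50 bases of M74750
    ```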

  5. Understanding independence

    NASA Astrophysics Data System (ADS)

    Annan, James; Hargreaves, Julia

    2016-04-01

    In order to perform any Bayesian processing of a model ensemble, we need a prior over the ensemble members. In the case of multimodel ensembles such as CMIP, the historical approach of ``model democracy'' (i.e. equal weight for all models in the sample) is no longer credible (if it ever was) due to model duplication and inbreeding. The question of ``model independence'' is central to the question of prior weights. However, although this question has been repeatedly raised, it has not yet been satisfactorily addressed. Here I will discuss the issue of independence and present a theoretical foundation for understanding and analysing the ensemble in this context. I will also present some simple examples showing how these ideas may be applied and developed.

  6. REX: response exploration for neuroimaging datasets.

    PubMed

    Duff, Eugene P; Cunnington, Ross; Egan, Gary F

    2007-01-01

    Neuroimaging technologies produce large and complex datasets. The challenge of comprehensively analysing the recorded dynamics remains an important field of research. The whole-brain linear modelling of hypothesised response dynamics and experimental effects must utilise simple basis sets, which may not detect unexpected or complex signal effects. These unmodelled effects can influence statistical mapping results, and provide important additional clues to the underlying neural dynamics. They can be detected via exploration of the raw signal; however, this can be difficult. Specialised visualisation tools are required to manage the huge number of voxels, events and scans. Many effects can be occluded by noise in individual voxel time-series. This paper describes a visualisation framework developed for the assessment of entire neuroimaging datasets. While currently available tools tend to be tied to a specific model of experimental effects, this framework includes a novel metadata schema that enables the rapid selection and processing of responses based on easily-adjusted classifications of scans, brain regions, and events. Flexible event-related averaging and process pipelining capabilities enable users to investigate the effects of preprocessing algorithms and to visualise power spectra and other transformations of the data. The framework has been implemented as a MATLAB package, REX (Response Exploration), which has been utilised within our lab and is now publicly available for download. Its interface enables the real-time control of data selection and processing, for very rapid visualisation. The concepts outlined in this paper have general applicability, and could provide significant further functionality to neuroimaging databasing and process pipeline environments. PMID:17985253
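
    REX is a MATLAB package; the sketch below merely illustrates the core event-related averaging operation in Python, with hypothetical onsets and window sizes:

    ```python
    # Extract peri-event windows from a voxel time series and average them.
    import numpy as np

    def event_average(ts, onsets, pre=2, post=10):
        """Average peri-event windows (in samples) of one voxel's time series."""
        wins = [ts[t - pre : t + post] for t in onsets
                if t - pre >= 0 and t + post <= len(ts)]
        return np.mean(wins, axis=0)          # shape: (pre + post,)

    ts = np.random.default_rng(1).normal(size=300)   # one voxel's time series
    onsets = [20, 60, 110, 170, 240]                  # scan indices of events
    avg = event_average(ts, onsets)
    print(avg.shape)                                  # (12,)
    ```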

  7. A reanalysis dataset of the South China Sea

    PubMed Central

    Zeng, Xuezhi; Peng, Shiqiu; Li, Zhijin; Qi, Yiquan; Chen, Rongyu

    2014-01-01

    Ocean reanalysis provides a temporally continuous and spatially gridded four-dimensional estimate of the ocean state for a better understanding of the ocean dynamics and its spatial/temporal variability. Here we present a 19-year (1992–2010) high-resolution ocean reanalysis dataset of the upper ocean in the South China Sea (SCS) produced from an ocean data assimilation system. A wide variety of observations, including in-situ temperature/salinity profiles, ship-measured and satellite-derived sea surface temperatures, and sea surface height anomalies from satellite altimetry, are assimilated into the outputs of an ocean general circulation model using a multi-scale incremental three-dimensional variational data assimilation scheme, yielding a daily high-resolution reanalysis dataset of the SCS. Comparisons between the reanalysis and independent observations support the reliability of the dataset. The presented dataset provides the research community of the SCS an important data source for studying the thermodynamic processes of the ocean circulation and meso-scale features in the SCS, including their spatial and temporal variability. PMID:25977803
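
    The scheme used here is a multi-scale incremental 3DVAR; as a much-reduced illustration, the textbook analysis update it builds on can be written in a few lines (toy one-dimensional 'ocean', hypothetical covariances and observation locations):

    ```python
    # Textbook analysis update underlying 3D-Var-type assimilation:
    # x_a = x_b + B H^T (H B H^T + R)^(-1) (y - H x_b)
    import numpy as np

    n, m = 50, 5                                   # state size, number of obs
    xb = np.zeros(n)                               # background state
    i, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    B = np.exp(-((i - j) / 5.0) ** 2)              # smooth background covariance
    H = np.zeros((m, n))
    H[np.arange(m), [5, 15, 25, 35, 45]] = 1.0     # observe 5 grid points
    R = 0.1 * np.eye(m)                            # observation-error covariance
    y = np.array([1.0, 0.5, -0.3, 0.8, 0.2])       # hypothetical observations

    K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)   # gain matrix
    xa = xb + K @ (y - H @ xb)                     # analysis state
    ```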

  8. Evaluation of Uncertainty in Precipitation Datasets for New Mexico, USA

    NASA Astrophysics Data System (ADS)

    Besha, A. A.; Steele, C. M.; Fernald, A.

    2014-12-01

    Climate change, population growth and other factors are endangering water availability and sustainability in semiarid/arid areas, particularly in the southwestern United States. Wide coverage of spatial and temporal measurements of precipitation is key for regional water budget analysis and hydrological operations, which are themselves valuable tools for water resource planning and management. Rain gauge measurements are usually reliable and accurate at a point. They measure rainfall continuously, but spatial sampling is limited. Ground-based radar and satellite remotely sensed precipitation have wide spatial and temporal coverage. However, these measurements are indirect and subject to errors because of equipment, meteorological variability, the heterogeneity of the land surface itself and lack of regular recording. This study seeks to understand precipitation uncertainty and, in doing so, lessen uncertainty propagation into hydrological applications and operations. We reviewed, compared and evaluated the TRMM (Tropical Rainfall Measuring Mission) precipitation products, NOAA's (National Oceanic and Atmospheric Administration) Global Precipitation Climatology Centre (GPCC) monthly precipitation dataset, PRISM (Parameter elevation Regression on Independent Slopes Model) data and data from individual climate stations including Cooperative Observer Program (COOP), Remote Automated Weather Stations (RAWS), Soil Climate Analysis Network (SCAN) and Snowpack Telemetry (SNOTEL) stations. Though not yet finalized, this study finds that the uncertainty within precipitation datasets is influenced by regional topography, season, climate and precipitation rate. Ongoing work aims to further evaluate precipitation datasets based on the relative influence of these phenomena so that we can identify the optimum datasets for input to statewide water budget analysis.

  9. A reanalysis dataset of the South China Sea.

    PubMed

    Zeng, Xuezhi; Peng, Shiqiu; Li, Zhijin; Qi, Yiquan; Chen, Rongyu

    2014-01-01

    Ocean reanalysis provides a temporally continuous and spatially gridded four-dimensional estimate of the ocean state for a better understanding of the ocean dynamics and its spatial/temporal variability. Here we present a 19-year (1992-2010) high-resolution ocean reanalysis dataset of the upper ocean in the South China Sea (SCS) produced from an ocean data assimilation system. A wide variety of observations, including in-situ temperature/salinity profiles, ship-measured and satellite-derived sea surface temperatures, and sea surface height anomalies from satellite altimetry, are assimilated into the outputs of an ocean general circulation model using a multi-scale incremental three-dimensional variational data assimilation scheme, yielding a daily high-resolution reanalysis dataset of the SCS. Comparisons between the reanalysis and independent observations support the reliability of the dataset. The presented dataset provides the research community of the SCS an important data source for studying the thermodynamic processes of the ocean circulation and meso-scale features in the SCS, including their spatial and temporal variability. PMID:25977803

  10. Studying the Independent School Library

    ERIC Educational Resources Information Center

    Cahoy, Ellysa Stern; Williamson, Susan G.

    2008-01-01

    In 2005, the American Association of School Librarians' Independent Schools Section conducted a national survey of independent school libraries. This article analyzes the results of the survey, reporting specialized data and information regarding independent school library budgets, collections, services, facilities, and staffing. Additionally, the…

  11. Cosmological Analyses Based On The Combined Planck And WMAP Mission Datasets

    NASA Astrophysics Data System (ADS)

    Bennett, Charles

    independently analyzed the WMAP data. Most reproduced WMAP results, while others uncovered additional useful insights into the data, and still others found issues, which the WMAP team examined more carefully. Independent replication was quite important, as was the work extending the results and calling attention to issues. This process was not only helpful for getting the most out of the WMAP mission results, it was essential for establishing confidence in the mission datasets. WMAP team discussions with independent scientists were fruitful and provided invaluable replication and additional peer-review of the WMAP team work, in addition to new analysis and results. We expect that the Planck team will benefit from similar interactions with independent scientists. WMAP team members are especially important for computing detailed comparisons between Planck and WMAP data. Now that the WMAP project has ended, the WMAP team no longer has funding to carry out this crucial and compelling comparison of WMAP and Planck data at the level of detail needed for precision cosmology. This proposal requests that four of the most active and experienced WMAP team members with specialized knowledge in temperature calibration, beam calibration, foreground separation, simulations, power spectrum computation, and more, be supported to reconcile WMAP and Planck data in detail, to combine the datasets to obtain optimal results, and to produce improved cosmological results.

  12. Dataset of Scientific Inquiry Learning Environment

    ERIC Educational Resources Information Center

    Ting, Choo-Yee; Ho, Chiung Ching

    2015-01-01

    This paper presents the dataset collected from student interactions with INQPRO, a computer-based scientific inquiry learning environment. The dataset contains records of 100 students and is divided into two portions. The first portion comprises (1) "raw log data", capturing the student's name, interfaces visited, the interface…

  13. Modeling Genome-Scale mRNA Expression Datasets: From Matrix Algebra to Genetic Networks

    NASA Astrophysics Data System (ADS)

    Alter, Orly

    2003-03-01

    for two genome-scale datasets. GSVD is a unique data-driven linear transformation of the two datasets from the two genes × arrays spaces to two reduced and diagonal ``genelets'' × ``arraylets'' spaces. Some of the genelets can be associated with independent regulatory programs that are common to both datasets. Other genelets can be associated with independent biological processes or experimental artifacts that are almost exclusive to one of the datasets or the other. I will conclude with a discussion of the insights that these models may offer into the biology, chemistry and physics of gene expression.

  14. Assessment of Northern Hemisphere Snow Water Equivalent Datasets in ESA SnowPEx project

    NASA Astrophysics Data System (ADS)

    Luojus, Kari; Pulliainen, Jouni; Cohen, Juval; Ikonen, Jaakko; Derksen, Chris; Mudryk, Lawrence; Nagler, Thomas; Bojkov, Bojan

    2016-04-01

    Reliable information on snow cover across the Northern Hemisphere and Arctic and sub-Arctic regions is needed for climate monitoring, for understanding the Arctic climate system, and for the evaluation of the role of snow cover and its feedback in climate models. In addition to being of significant interest for climatological investigations, reliable information on snow cover is of high value for the purpose of hydrological forecasting and numerical weather prediction. Terrestrial snow covers up to 50 million km² of the Northern Hemisphere in winter and is characterized by high spatial and temporal variability. Therefore satellite observations provide the best means for timely and complete observations of the global snow cover. There are a number of independent SWE products available that describe the snow conditions on multi-decadal and global scales. Some products are derived using satellite-based information while others rely on meteorological observations and modelling. What is common to practically all the existing hemispheric SWE products is that their retrieval performance on hemispheric and multi-decadal scales is not accurately known. The purpose of the ESA funded SnowPEx project is to obtain a quantitative understanding of the uncertainty in satellite- as well as model-based SWE products through an internationally coordinated and consistent evaluation exercise. The currently available Northern Hemisphere wide satellite-based SWE datasets which were assessed include 1) the GlobSnow SWE, 2) the NASA Standard SWE, 3) NASA prototype and 4) NSIDC-SSM/I SWE products. The model-based datasets include: 5) the Global Land Data Assimilation System Version 2 (GLDAS-2) product; 6) the European Centre for Medium-Range Forecasts Interim Land Reanalysis (ERA-I-Land), which uses a simple snow scheme; 7) the Modern Era Retrospective Analysis for Research and Applications (MERRA), which uses an intermediate complexity snow scheme; and 8) SWE from the Crocus snow scheme, a

  15. The new Planetary Science Archive: A tool for exploration and discovery of scientific datasets from ESA's planetary missions.

    NASA Astrophysics Data System (ADS)

    Heather, David; Besse, Sebastien; Barbarisi, Isa; Arviset, Christophe; de Marchi, Guido; Barthelemy, Maud; Docasal, Ruben; Fraga, Diego; Grotheer, Emmanuel; Lim, Tanya; Macfarlane, Alan; Martinez, Santa; Rios, Carlos

    2016-04-01

    Introduction: The Planetary Science Archive (PSA) is the European Space Agency's (ESA) repository of science data from all planetary science and exploration missions. The PSA provides access to scientific datasets through various interfaces (e.g. FTP browser, Map based, Advanced search, and Machine interface): http://archives.esac.esa.int/psa All datasets are scientifically peer-reviewed by independent scientists, and are compliant with the Planetary Data System (PDS) standards. Updating the PSA: The PSA is currently implementing a number of significant changes, both to its web-based interface to the scientific community, and to its database structure. The new PSA will be up-to-date with versions 3 and 4 of the PDS standards, as PDS4 will be used for ESA's upcoming ExoMars and BepiColombo missions. The newly designed PSA homepage will provide direct access to scientific datasets via a text search for targets or missions. This will significantly reduce the complexity for users to find their data and will promote one-click access to the datasets. Additionally, the homepage will provide direct access to advanced views and searches of the datasets. Users will have direct access to documentation, information and tools that are relevant to the scientific use of the dataset, including ancillary datasets, Software Interface Specification (SIS) documents, and any tools/help that the PSA team can provide. A login mechanism will provide additional functionalities to the users to aid / ease their searches (e.g. saving queries, managing default views). Queries to the PSA database will be possible either via the homepage (for simple searches of missions or targets), or through a filter menu for more tailored queries. The filter menu will offer multiple options to search for a particular dataset or product, and will manage queries for both in-situ and remote sensing instruments. Parameters such as start-time, phase angle, and heliocentric distance will be emphasized. A further

  16. The new Planetary Science Archive: A tool for exploration and discovery of scientific datasets from ESA's planetary missions

    NASA Astrophysics Data System (ADS)

    Heather, David

    2016-07-01

    Introduction: The Planetary Science Archive (PSA) is the European Space Agency's (ESA) repository of science data from all planetary science and exploration missions. The PSA provides access to scientific datasets through various interfaces (e.g. FTP browser, Map based, Advanced search, and Machine interface): http://archives.esac.esa.int/psa All datasets are scientifically peer-reviewed by independent scientists, and are compliant with the Planetary Data System (PDS) standards. Updating the PSA: The PSA is currently implementing a number of significant changes, both to its web-based interface to the scientific community, and to its database structure. The new PSA will be up-to-date with versions 3 and 4 of the PDS standards, as PDS4 will be used for ESA's upcoming ExoMars and BepiColombo missions. The newly designed PSA homepage will provide direct access to scientific datasets via a text search for targets or missions. This will significantly reduce the complexity for users to find their data and will promote one-click access to the datasets. Additionally, the homepage will provide direct access to advanced views and searches of the datasets. Users will have direct access to documentation, information and tools that are relevant to the scientific use of the dataset, including ancillary datasets, Software Interface Specification (SIS) documents, and any tools/help that the PSA team can provide. A login mechanism will provide additional functionalities to the users to aid / ease their searches (e.g. saving queries, managing default views). Queries to the PSA database will be possible either via the homepage (for simple searches of missions or targets), or through a filter menu for more tailored queries. The filter menu will offer multiple options to search for a particular dataset or product, and will manage queries for both in-situ and remote sensing instruments. Parameters such as start-time, phase angle, and heliocentric distance will be emphasized. A further

  17. Comparison of global 3-D aviation emissions datasets

    NASA Astrophysics Data System (ADS)

    Olsen, S. C.; Wuebbles, D. J.; Owen, B.

    2013-01-01

    Aviation emissions are unique among transportation emissions, e.g., from road transportation and shipping, in that they occur at higher altitudes as well as at the surface. Aviation emissions of carbon dioxide, soot, and water vapor have direct radiative impacts on the Earth's climate system, while emissions of nitrogen oxides (NOx), sulfur oxides, carbon monoxide (CO), and hydrocarbons (HC) impact air quality and climate through their effects on ozone, methane, and clouds. The most accurate estimates of the impact of aviation on air quality and climate utilize three-dimensional chemistry-climate models and gridded four-dimensional (space and time) aviation emissions datasets. We compare five available aviation emissions datasets currently and historically used to evaluate the impact of aviation on climate and air quality - NASA-Boeing 1992, NASA-Boeing 1999, QUANTIFY 2000, Aero2k 2002, and AEDT 2006 - and aviation fuel usage estimates from the International Energy Agency. Roughly 90% of all aviation emissions are in the Northern Hemisphere and nearly 60% of all fuelburn and NOx emissions occur at cruise altitudes in the Northern Hemisphere. While these datasets were created by independent methods and are thus not strictly suitable for analyzing trends, they suggest that commercial aviation fuelburn and NOx emissions increased over the last two decades while HC emissions likely decreased and CO emissions did not change significantly. The bottom-up estimates compared here are consistently lower than International Energy Agency fuelburn statistics, although the gap is significantly smaller in the more recent datasets. Overall the emissions distributions are quite similar for fuelburn and NOx, with regional peaks over the populated land masses of North America, Europe, and East Asia. For CO and HC there are relatively larger differences. There are, however, some distinct differences in the altitude distribution of emissions in certain regions for the Aero2k dataset.
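
    Breakdowns like the hemispheric and cruise-altitude shares quoted above are simple reductions over the gridded arrays. A sketch with a synthetic (lat, lon, altitude, month) array; the shapes and the cruise-level index range are hypothetical:

    ```python
    # Sum a gridded 4-D emissions array to get hemispheric and altitude shares.
    import numpy as np

    lats = np.linspace(-89.5, 89.5, 180)
    emis = np.random.default_rng(2).random((180, 360, 20, 12))  # kg per cell

    total = emis.sum()
    nh_frac = emis[lats > 0].sum() / total          # Northern Hemisphere share
    cruise = emis[:, :, 9:13, :].sum() / total      # hypothetical cruise levels
    print(f"NH fraction: {nh_frac:.2f}, cruise fraction: {cruise:.2f}")
    ```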

  18. The generation of China land surface datasets for CLM

    NASA Astrophysics Data System (ADS)

    Li, Haiying; Peng, Hongchun; Li, Xin; Veroustraete, Frank

    2005-10-01

    Community land model or common land model (CLM) describes the exchange of fluxes of energy, mass and momentum between the earth's surface and the planetary boundary layer. This model is used to simulate environmental changes in China and hence requires a complete field of land surface parameters. The present paper focuses on generating the CLM surface datasets for China. Vegetation was divided into 39 Plant Functional Types (PFTs) of China from its classification map. The land surface datasets were created using vegetation type, five land cover types (lake, wetland, glacier, urban and vegetated), monthly maximum Normalized Difference Vegetation Index (NDVI) derived from SPOT_VGT data, and soil properties data. The percentages of glacier, lake and wetland were derived from their respective vector maps of China. The fractional coverage of PFTs was derived from the China vegetation map. Time-independent vegetation biophysical parameters, such as canopy top and bottom heights and other vegetation parameters related to photosynthesis, were based on values documented in the literature. The soil color dataset was derived from landuse and vegetation data based on their correspondence. The soil texture (clay%, sand% and silt%) came from a global dataset. Time-dependent vegetation biophysical parameters, such as leaf area index (LAI) and fractional absorbed photosynthetically active radiation (FPAR), were calculated from one year of NDVI monthly maximum value composites for the China region based on equations given in Sellers et al. (1996a,b) and Los et al. (2000). The resolution of these datasets for CLM is 1 km.
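
    One widely used form of the NDVI-to-FPAR-to-LAI chain, in the spirit of the cited equations; the scaling constants here are illustrative, not the values used by the authors:

    ```python
    # Map NDVI to FPAR by linear scaling, then invert a Beer's-law style
    # saturation relation to recover LAI (FPAR saturates as LAI grows).
    import numpy as np

    def ndvi_to_lai(ndvi, ndvi_min=0.05, ndvi_max=0.95,
                    fpar_max=0.95, lai_max=6.0):
        fpar = fpar_max * np.clip((ndvi - ndvi_min) / (ndvi_max - ndvi_min), 0, 1)
        lai = lai_max * np.log(1.0 - fpar) / np.log(1.0 - fpar_max)
        return fpar, lai

    fpar, lai = ndvi_to_lai(np.array([0.2, 0.5, 0.8]))
    print(fpar, lai)
    ```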

  19. Improved global aerosol datasets for 2008 from Aerosol_cci

    NASA Astrophysics Data System (ADS)

    Holzer-Popp, Thomas; de Leeuw, Gerrit

    2013-04-01

    Within the ESA Climate Change Initiative (CCI), the Aerosol_cci project has now produced and validated global datasets from AATSR, PARASOL, MERIS, OMI and GOMOS for the complete year 2008. Whereas OMI and GOMOS were used to derive absorbing aerosol index and stratospheric extinction profiles, respectively, Aerosol Optical Depth (AOD) and Angstrom coefficient were retrieved from the three nadir sensors. For AATSR, three algorithms were applied. AOD validation was conducted against AERONET sun photometer observations, also in comparison to MODIS and MISR datasets. Validation included level2 (pixel level) and level3 (gridded daily) datasets. Several validation metrics were used and in some cases developed further in order to comprehensively evaluate the capabilities and limitations of the datasets. These metrics include standard statistical quantities (bias, rmse, Pearson correlation, linear regression) as well as scoring approaches to quantitatively assess the spatial and temporal correlations against AERONET. Over open ocean, MAN data were also used to better constrain the aerosol background, but had limited coverage in 2008. The validation showed that the PARASOL (ocean only) and AATSR (land and ocean) datasets have improved significantly and now reach, and sometimes exceed, the quality of MODIS and MISR. However, the coverage of these European datasets is weaker than that of the NASA datasets due to smaller instrument swath widths. The MERIS dataset provides better coverage but has lower quality than the other datasets. A detailed regional and seasonal analysis revealed the strengths and weaknesses of each algorithm. The Angstrom coefficient was also validated and showed encouraging results (the more detailed aerosol type information provided in particular by PARASOL has not yet been evaluated further). Additionally, pixel uncertainties contained in each dataset were statistically assessed, which showed some remaining issues but also the added value
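
    The standard quantities named above are straightforward to compute once retrievals are matched to AERONET; a sketch with hypothetical matched AOD values:

    ```python
    # Bias, RMSE, Pearson correlation and linear regression of satellite AOD
    # against matched AERONET sun photometer AOD.
    import numpy as np
    from scipy.stats import linregress, pearsonr

    sat = np.array([0.12, 0.30, 0.25, 0.55, 0.40])       # hypothetical retrievals
    aeronet = np.array([0.10, 0.28, 0.30, 0.50, 0.35])   # matched AERONET AOD

    bias = np.mean(sat - aeronet)
    rmse = np.sqrt(np.mean((sat - aeronet) ** 2))
    r, _ = pearsonr(sat, aeronet)
    fit = linregress(aeronet, sat)                       # slope and intercept
    print(bias, rmse, r, fit.slope, fit.intercept)
    ```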

  20. 'Independence' Panorama

    NASA Technical Reports Server (NTRS)

    2005-01-01

    This is the Spirit 'Independence' panorama, acquired on martian days, or sols, 536 to 543 (July 6 to 13, 2005), from a position in the 'Columbia Hills' near the summit of 'Husband Hill.' The summit of 'Husband Hill' is the peak near the right side of this panorama and is about 100 meters (328 feet) away from the rover and about 30 meters (98 feet) higher in elevation. The rocky outcrops downhill and on the left side of this mosaic include 'Larry's Lookout' and 'Cumberland Ridge,' which Spirit explored in April, May, and June of 2005.

    The panorama spans 360 degrees and consists of 108 individual images, each acquired with five filters of the rover's panoramic camera. The approximate true color of the mosaic was generated using the camera's 750-, 530-, and 480-nanometer filters. During the 8 martian days, or sols, that it took to acquire this image, the lighting varied considerably, partly because of imaging at different times of sol, and partly because of small sol-to-sol variations in the dustiness of the atmosphere. These slight changes produced some image seams and rock shadows. These seams have been eliminated from the sky portion of the mosaic to better simulate the vista a person standing on Mars would see. However, it is often not possible or practical to smooth out such seams for regions of rock, soil, rover tracks or solar panels. Such is the nature of acquiring and assembling large panoramas from the rovers.

  1. Utilizing Multiple Datasets for Snow Cover Mapping

    NASA Technical Reports Server (NTRS)

    Tait, Andrew B.; Hall, Dorothy K.; Foster, James L.; Armstrong, Richard L.

    1999-01-01

    Snow-cover maps generated from surface data are based on direct measurements; however, they are prone to interpolation errors where climate stations are sparsely distributed. Snow cover is clearly discernible in satellite-acquired optical data because of the high albedo of snow, yet the surface is often obscured by cloud cover. Passive microwave (PM) data are unaffected by clouds; however, the snow-cover signature is significantly affected by melting snow, and the microwaves may be transparent to thin snow (less than 3 cm). Both optical and microwave sensors have problems discerning snow beneath forest canopies. This paper describes a method that combines ground and satellite data to produce a Multiple-Dataset Snow-Cover Product (MDSCP). Comparisons with current snow-cover products show that the MDSCP draws together the advantages of each of its component products while minimizing their potential errors. Improved estimates of the snow-covered area are derived through the addition of two snow-cover classes ("thin or patchy" and "high elevation" snow cover) and from the analysis of the climate station data within each class. The compatibility of this method for use with Moderate Resolution Imaging Spectroradiometer (MODIS) data, which will be available in 2000, is also discussed. With the assimilation of these data, the resolution of the MDSCP would be improved both spatially and temporally and the analysis would become completely automated.
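
    Purely as an illustration of how such a combination can work, here is a per-cell decision rule in the spirit of the MDSCP; the thresholds, class names and logic are simplified guesses, not the product's actual algorithm:

    ```python
    # Illustrative per-cell fusion of optical, passive-microwave (PM) and
    # station data; all thresholds and class assignments are hypothetical.
    def classify_cell(optical, pm_snow, station_snow_cm, elevation_m):
        if optical == "snow":
            return "snow"
        if optical == "cloud":                      # optical blocked by cloud
            if pm_snow:                             # PM sees through cloud
                return "snow"
            if station_snow_cm is not None and 0 < station_snow_cm < 3:
                return "thin or patchy"             # PM transparent to thin snow
        if elevation_m > 3000 and pm_snow:
            return "high elevation"
        return "no snow" if optical == "clear" else "unknown"

    print(classify_cell("cloud", False, 2.0, 1200))  # -> 'thin or patchy'
    ```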

  2. 77 FR 15052 - Dataset Workshop-U.S. Billion Dollar Disasters Dataset (1980-2011): Assessing Dataset Strengths...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-03-14

    ... Disasters (1980-2011) dataset and associated methods used to develop the data set. An important goal of the... data set addresses; What steps should be taken to enhance the robustness of the billion-dollar...

  3. A synthetic document image dataset for developing and evaluating historical document processing methods

    NASA Astrophysics Data System (ADS)

    Walker, Daniel; Lund, William; Ringger, Eric

    2012-01-01

    Document images accompanied by OCR output text and ground truth transcriptions are useful for developing and evaluating document recognition and processing methods, especially for historical document images. Additionally, research into improving the performance of such methods often requires further annotation of training and test data (e.g., topical document labels). However, transcribing and labeling historical documents is expensive. As a result, existing real-world document image datasets with such accompanying resources are rare and often relatively small. We introduce synthetic document image datasets of varying levels of noise that have been created from standard (English) text corpora using an existing document degradation model applied in a novel way. Included in the datasets is the OCR output from real OCR engines including the commercial ABBYY FineReader and the open-source Tesseract engines. These synthetic datasets are designed to exhibit some of the characteristics of an example real-world document image dataset, the Eisenhower Communiqués. The new datasets also benefit from additional metadata that exist due to the nature of their collection and prior labeling efforts. We demonstrate the usefulness of the synthetic datasets by training an existing multi-engine OCR correction method on the synthetic data and then applying the model to reduce word error rates on the historical document dataset. The synthetic datasets will be made available for use by other researchers.

  4. A high-resolution European dataset for hydrologic modeling

    NASA Astrophysics Data System (ADS)

    Ntegeka, Victor; Salamon, Peter; Gomes, Goncalo; Sint, Hadewij; Lorini, Valerio; Thielen, Jutta

    2013-04-01

    inputs to the hydrological calibration and validation of EFAS as well as for establishing long-term discharge "proxy" climatologies which can then in turn be used for statistical analysis to derive return periods or other time series derivatives. In addition, this dataset will be used to assess climatological trends in Europe. Unfortunately, to date no baseline dataset at the European scale exists to test the quality of the herein presented data. Hence, a comparison against other existing datasets can only be an indication of data quality. Due to availability, a comparison was made for precipitation and temperature only, arguably the most important meteorological drivers for hydrologic models. A variety of analyses was undertaken at country scale against data reported to EUROSTAT and E-OBS datasets. The comparison revealed that while the datasets showed overall similar temporal and spatial patterns, there were some differences in magnitudes, especially for precipitation. It is not straightforward to define the specific cause for these differences. However, in most cases the comparatively low observation station density appears to be the principal reason for the differences in magnitude.

  5. Simulation of Smart Home Activity Datasets

    PubMed Central

    Synnott, Jonathan; Nugent, Chris; Jeffers, Paul

    2015-01-01

    A globally ageing population is resulting in an increased prevalence of chronic conditions which affect older adults. Such conditions require long-term care and management to maximize quality of life, placing an increasing strain on healthcare resources. Intelligent environments such as smart homes facilitate long-term monitoring of activities in the home through the use of sensor technology. Access to sensor datasets is necessary for the development of novel activity monitoring and recognition approaches. Access to such datasets is limited due to issues such as sensor cost, availability and deployment time. The use of simulated environments and sensors may address these issues and facilitate the generation of comprehensive datasets. This paper provides a review of existing approaches for the generation of simulated smart home activity datasets, including model-based approaches and interactive approaches which implement virtual sensors, environments and avatars. The paper also provides recommendations for future work in intelligent environment simulation. PMID:26087371

  6. Simulation of Smart Home Activity Datasets.

    PubMed

    Synnott, Jonathan; Nugent, Chris; Jeffers, Paul

    2015-01-01

    A globally ageing population is resulting in an increased prevalence of chronic conditions which affect older adults. Such conditions require long-term care and management to maximize quality of life, placing an increasing strain on healthcare resources. Intelligent environments such as smart homes facilitate long-term monitoring of activities in the home through the use of sensor technology. Access to sensor datasets is necessary for the development of novel activity monitoring and recognition approaches. Access to such datasets is limited due to issues such as sensor cost, availability and deployment time. The use of simulated environments and sensors may address these issues and facilitate the generation of comprehensive datasets. This paper provides a review of existing approaches for the generation of simulated smart home activity datasets, including model-based approaches and interactive approaches which implement virtual sensors, environments and avatars. The paper also provides recommendations for future work in intelligent environment simulation. PMID:26087371

  7. Self-Aligning Manifolds for Matching Disparate Medical Image Datasets.

    PubMed

    Baumgartner, Christian F; Gomez, Alberto; Koch, Lisa M; Housden, James R; Kolbitsch, Christoph; McClelland, Jamie R; Rueckert, Daniel; King, Andy P

    2015-01-01

    Manifold alignment can be used to reduce the dimensionality of multiple medical image datasets into a single globally consistent low-dimensional space. This may be desirable in a wide variety of problems, from fusion of different imaging modalities for Alzheimer's disease classification to 4DMR reconstruction from 2D MR slices. Unfortunately, most existing manifold alignment techniques require either a set of prior correspondences or comparability between the datasets in high-dimensional space, which is often not possible. We propose a novel technique for the 'self-alignment' of manifolds (SAM) from multiple dissimilar imaging datasets without prior correspondences or inter-dataset image comparisons. We quantitatively evaluate the method on 4DMR reconstruction from realistic, synthetic sagittal 2D MR slices from 6 volunteers and real data from 4 volunteers. Additionally, we demonstrate the technique for the compounding of two free breathing 3D ultrasound views from one volunteer. The proposed method performs significantly better for 4DMR reconstruction than state-of-the-art image-based techniques. PMID:26221687

  8. A polymer dataset for accelerated property prediction and design

    DOE PAGES

    Huan, Tran Doan; Mannodi-Kanakkithodi, Arun; Kim, Chiho; Sharma, Vinit; Pilania, Ghanshyam; Ramprasad, Rampi

    2016-03-01

    Emerging computation- and data-driven approaches are particularly useful for rationally designing materials with targeted properties. Generally, these approaches rely on identifying structure-property relationships by learning from a dataset of a sufficiently large number of relevant materials. The learned information can then be used to predict the properties of materials not already in the dataset, thus accelerating the materials design. Herein, we develop a dataset of 1,073 polymers and related materials and make it available at http://khazana.uconn.edu/. This dataset is uniformly prepared using first-principles calculations with structures obtained either from other sources or by using structure search methods. Because the immediate target of this work is to assist the design of high dielectric constant polymers, it is initially designed to include the optimized structures, atomization energies, band gaps, and dielectric constants. As a result, it will be progressively expanded by accumulating new materials and including additional properties calculated for the optimized structures provided.

  9. Using bitmap index for interactive exploration of large datasets

    SciTech Connect

    Wu, Kesheng; Koegler, Wendy; Chen, Jacqueline; Shoshani, Arie

    2003-04-24

    Many scientific applications generate large spatio-temporal datasets. A common way of exploring these datasets is to identify and track regions of interest. Usually these regions are defined as contiguous sets of points whose attributes satisfy some user-defined conditions, e.g. high-temperature regions in a combustion simulation. At each time step, the regions of interest may be identified by first searching for all points that satisfy the conditions and then grouping the points into connected regions. To speed up this process, the searching step may use a tree-based indexing scheme, such as a kd-tree or an octree. However, these indices are efficient only if the searches are limited to one or a small number of selected attributes. Scientific datasets often contain hundreds of attributes and scientists frequently study these attributes in complex combinations, e.g. finding regions of high temperature yet low shear rate and pressure. Bitmap indexing is an efficient method for searching on multiple criteria simultaneously. We apply a bitmap compression scheme to reduce the size of the indices. In addition, we show that the compressed bitmaps can be used efficiently to perform the region growing and the region tracking operations. Analyses show that our approach scales well and our tests on two datasets from simulation of the auto-ignition process show impressive performance.
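
    A minimal sketch of the core idea with uncompressed boolean bitmaps (the paper's index additionally compresses them); the attributes and thresholds are synthetic:

    ```python
    # One bitmap per predicate; a compound query is a bitwise AND/OR.
    # A real index would bin each attribute and compress the bitmaps.
    import numpy as np

    rng = np.random.default_rng(3)
    temp = rng.uniform(300, 2000, size=1_000_000)     # per-point attributes
    shear = rng.uniform(0, 10, size=1_000_000)

    bm_hot = temp > 1500                              # bitmap for predicate 1
    bm_low_shear = shear < 2.0                        # bitmap for predicate 2

    region = bm_hot & bm_low_shear                    # multi-criteria search
    points = np.flatnonzero(region)                   # candidate cells to grow
    print(points.size, "points satisfy both conditions")
    ```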

  10. Enhanced Data Discoverability for in Situ Hyperspectral Datasets

    NASA Astrophysics Data System (ADS)

    Rasaiah, B.; Bellman, C.; Hewson, R. D.; Jones, S. D.; Malthus, T. J.

    2016-06-01

    Field spectroscopic metadata is a central component in the quality assurance, reliability, and discoverability of hyperspectral data and the products derived from it. Cataloguing, mining, and interoperability of these datasets rely upon the robustness of metadata protocols for field spectroscopy, and on the software architecture to support the exchange of these datasets. Currently no standard for in situ spectroscopy data or metadata protocols exists. This inhibits the effective sharing of growing volumes of in situ spectroscopy datasets and the exploitation of the benefits of integrating with the evolving range of data sharing platforms. A core metadataset for field spectroscopy was introduced by Rasaiah et al. (2011-2015), with extended support for specific applications. This paper presents a prototype model for an OGC- and ISO-compliant, platform-independent metadata discovery service aligned to the specific requirements of field spectroscopy. In this study, a proof-of-concept metadata catalogue has been described and deployed in a cloud-based architecture as a demonstration of an operationalized field spectroscopy metadata standard and web-based discovery service.

  11. Comparing multiple 3D magnetotelluric inversions of the same dataset

    NASA Astrophysics Data System (ADS)

    Walter, C.; Jones, A. G.

    2013-12-01

    The Taupo Volcanic Zone (TVZ) hosts the majority of the geothermal systems in New Zealand and is a valuable source for power generation and tourism. It is important for the sustainable exploitation of this area to fully understand the processes and structures in the TVZ. As part of the 'Hotter and Deeper' project of the Foundation for Research, Science and Technology (FRST), a dataset of 200 broadband magnetotelluric (MT) stations was collected in the TVZ of New Zealand in 2009 and 2010. Combined with a smaller dataset from Reporoa, a total of 230 stations are available for 3D inversion to image the deeper structures of the TVZ. For the study presented in this paper, multiple 3D inversions of this dataset using different control parameters have been undertaken to study the influence of the choice of parameters on the inversion result. The parameters that have been varied include: the type of responses used in the inversion, the use of topography and bathymetry, and varying vertical grid spacings. All inversions commenced with a uniform half-space so that there were no preconceived structures to begin with. The results show that the main structures in the model are robust in that they are independent of the choice of parameters and appear in every inversion. The only differences are in the shape and exact location of the structures, which vary between the models. Furthermore, different ways to obtain a measure of the differences between models have been explored.

  12. Status and Preliminary Evaluation for Chinese Re-Analysis Datasets

    NASA Astrophysics Data System (ADS)

    bin, zhao; chunxiang, shi; tianbao, zhao; dong, si; jingwei, liu

    2016-04-01

    Based on the operational T639L60 spectral model, combined with the Hybrid_GSI assimilation system and using meteorological observations including radiosondes, buoys, and satellites, a set of Chinese Re-Analysis (CRA) datasets is being developed by the National Meteorological Information Center (NMIC) of the China Meteorological Administration (CMA). The datasets are run at 30 km (0.28° latitude/longitude) resolution, which is higher than that of most existing reanalysis datasets. The reanalysis is undertaken to enhance the accuracy of historical synoptic analysis and to aid detailed investigation of various weather and climate systems. The reanalysis is currently at the stage of preliminary experimental analysis. One year of forecast data, from June 2013 to May 2014, has been simulated and used in synoptic and climate evaluation. We first examine the model's prediction ability with the new assimilation system and find that it yields significant improvement in both the Northern and Southern hemispheres: owing to the addition of new satellite data, upper-level prediction improves markedly compared with the operational T639L60 model, and overall prediction stability is enhanced. In the climatological analysis, compared with the ERA-40, NCEP/NCAR and NCEP/DOE reanalyses, the results show that the simulated surface temperature is slightly low over land and high over the ocean, the 850-hPa specific humidity anomaly is weakened, and the zonal wind anomaly is concentrated in the equatorial tropics. Meanwhile, the reanalysis dataset reproduces various climate indices well, such as the subtropical high index and the ESMI (East-Asia subtropical Summer Monsoon Index), especially the Indian and western North Pacific monsoon indices. We will further improve the assimilation system and the dynamical simulation performance, and produce a 40-year (1979-2018) reanalysis dataset. It will provide a more comprehensive analysis for synoptic and climate diagnosis.

  13. Food additives

    MedlinePlus

    Food additives are substances that become part of a food product when they are added during the processing or making of that food. "Direct" food additives are often added during processing to: Add nutrients ...

  14. Comparison of Shallow Survey 2012 Multibeam Datasets

    NASA Astrophysics Data System (ADS)

    Ramirez, T. M.

    2012-12-01

    The purpose of the Shallow Survey common dataset is a comparison of the different technologies utilized for data acquisition in the shallow-survey marine environment. The common dataset consists of a series of surveys conducted over a common area of seabed using a variety of systems. It provides equipment manufacturers the opportunity to showcase their latest systems while giving hydrographic researchers and scientists a chance to test their latest algorithms on the dataset so that rigorous comparisons can be made. Five companies collected data for the common dataset in the Wellington Harbor area in New Zealand between May 2010 and May 2011: Kongsberg, Reson, R2Sonic, GeoAcoustics, and Applied Acoustics. The Wellington harbor and surrounding coastal area was selected since it has a number of well-defined features, including the HMNZS South Seas and HMNZS Wellington wrecks, an armored seawall constructed of Tetrapods and Akmons, aquifers, wharves and marinas. The seabed inside the harbor basin is largely fine-grained sediment, with gravel and reefs around the coast. The area outside the harbor on the southern coast is an active environment, with moving sand and exposed reefs. A marine reserve is also in this area. For consistency between datasets, the coastal research vessel R/V Ikatere and crew were used for all surveys conducted for the common dataset. Using Triton's Perspective processing software, the multibeam datasets collected for the Shallow Survey were processed for detailed analysis. Datasets from each sonar manufacturer were processed using the CUBE algorithm developed by the Center for Coastal and Ocean Mapping/Joint Hydrographic Center (CCOM/JHC). Each dataset was gridded at 0.5 and 1.0 meter resolutions for cross comparison and compliance with International Hydrographic Organization (IHO) requirements. Detailed comparisons were made of equipment specifications (transmit frequency, number of beams, beam width), data density, total uncertainty, and
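
    For a feel of the gridding step, a simple mean-binning sketch over synthetic soundings; the actual comparisons used CUBE, which additionally models uncertainty and multiple depth hypotheses:

    ```python
    # Mean-bin (x, y, depth) soundings onto a regular grid of a given cell size.
    import numpy as np

    def grid_mean(x, y, z, cell=1.0):
        ix = np.floor((x - x.min()) / cell).astype(int)
        iy = np.floor((y - y.min()) / cell).astype(int)
        grid_sum = np.zeros((ix.max() + 1, iy.max() + 1))
        grid_n = np.zeros_like(grid_sum)
        np.add.at(grid_sum, (ix, iy), z)     # accumulate depths per cell
        np.add.at(grid_n, (ix, iy), 1)       # count soundings per cell
        with np.errstate(invalid="ignore"):
            return grid_sum / grid_n         # NaN where a cell has no soundings

    rng = np.random.default_rng(4)
    x, y = rng.uniform(0, 100, (2, 10_000))
    z = -20 + 0.05 * x + rng.normal(0, 0.1, 10_000)   # synthetic sloping seabed
    depth_grid = grid_mean(x, y, z, cell=1.0)
    ```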

  15. Interoperability of Multiple Datasets with JMARS

    NASA Astrophysics Data System (ADS)

    Smith, M. E.; Christensen, P. R.; Noss, D.; Anwar, S.; Dickenshied, S.

    2012-12-01

    Planetary Science includes all celestial bodies, including Earth. However, in Geographic Information System (GIS) applications, Earth and the other planetary bodies tend to be treated separately. One reason is that we have been studying Earth's properties much longer than we have been studying the other planetary bodies, so the archive of geographic coordinate systems (GCS) and projections is much larger. The first latitude and longitude system for Earth was devised between 276 BC and 194 BC by Eratosthenes, who was also the first to calculate the circumference of the Earth. As time went on, scientists continued to re-measure the Earth on both local and global scales, which has created a large collection of projections and geographic coordinate systems to choose from. This variety of options can make it a time-consuming task to determine which GCS or projection applies to each dataset and how to convert to the correct GCS or projection. Another issue arises when determining whether a dataset should be referenced to a geocentric sphere or a geodetic spheroid, each of which defines latitude differently. This can lead to inconsistent results and frustration for the user. This is not the case with other planetary bodies. Although the existence of other planets has been known since early Babylonian times, their rotation rates, sizes and geologic properties were not accurately known until several hundred years later. Therefore, the options for projections or GCSs are much more limited than the options one has for Earth data. Even then, the projection and GCS options for other celestial bodies are informal, so it can be hard for the user to determine which projection or GCS to apply to the other planets. JMARS (Java Mission Analysis for Remote Sensing) is an open source suite that was developed by Arizona State University's Mars Space Flight Facility. The beauty of JMARS is that the tool transforms all datasets behind the scenes
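
    The sphere-versus-spheroid point is concrete: on an ellipsoid, geodetic and geocentric latitude agree only at the equator and the poles. A sketch of the standard conversion, using the WGS84 flattening as an example value:

    ```python
    # Convert geodetic latitude to geocentric latitude on an ellipsoid:
    # tan(psi) = (1 - e^2) * tan(phi), with e^2 derived from the flattening.
    import math

    F = 1 / 298.257223563                 # WGS84 flattening (example ellipsoid)
    E2 = F * (2 - F)                      # first eccentricity squared

    def geodetic_to_geocentric(lat_deg: float) -> float:
        phi = math.radians(lat_deg)
        return math.degrees(math.atan((1 - E2) * math.tan(phi)))

    print(geodetic_to_geocentric(45.0))   # ~44.81 deg: a ~0.19 deg difference
    ```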

  16. Food additives

    PubMed Central

    Spencer, Michael

    1974-01-01

    Food additives are discussed from the food technology point of view. The reasons for their use are summarized: (1) to protect food from chemical and microbiological attack; (2) to even out seasonal supplies; (3) to improve their eating quality; (4) to improve their nutritional value. The various types of food additives are considered, e.g. colours, flavours, emulsifiers, bread and flour additives, preservatives, and nutritional additives. The paper concludes with consideration of those circumstances in which the use of additives is (a) justified and (b) unjustified. PMID:4467857

  17. Quality Visualization of Microarray Datasets Using Circos

    PubMed Central

    Koch, Martin; Wiese, Michael

    2012-01-01

    Quality control and normalization is considered the most important step in the analysis of microarray data. At present there are various methods available for quality assessment of microarray datasets. However, there seems to be no standard visualization routine that also depicts individual microarray quality. Here we present a convenient method for visualizing the results of standard quality control tests using Circos plots. In these plots, various quality measurements are drawn in a circular fashion, thus allowing for visualization of the quality and all outliers of each distinct array within a microarray dataset. The proposed method is intended for use with the Affymetrix Human Genome platform (i.e., GPL96, GPL570 and GPL571). Circos quality measurement plots are a convenient way for the initial quality estimate of Affymetrix datasets that are stored in publicly available databases.

  18. Introduction of a simple-model-based land surface dataset for Europe

    NASA Astrophysics Data System (ADS)

    Orth, Rene; Seneviratne, Sonia I.

    2015-04-01

    Land surface hydrology is important because it can play a crucial role during extreme events such as droughts, floods and even heat waves. We introduce in this study a new hydrological dataset for the European continent that consists of soil moisture, runoff and evapotranspiration. It is derived with a simple water balance model (SWBM) forced with precipitation, temperature and net radiation. The SWBM dataset covers Europe and extends over the period 1984-2013 with a daily time step and 0.5°x0.5° resolution. We employ a novel approach to calibrate the model, whereby we consider 300 random parameter sets chosen from an observation-based range. Using several independent validation datasets representing soil moisture (or terrestrial water content), evapotranspiration and streamflow, we identify the best performing parameter set and hence the new dataset. To illustrate its usefulness, the SWBM dataset is compared against ERA-Interim/Land and simulations of the Community Land Model Version 4, using all validation datasets as reference. For soil moisture dynamics it outperforms the benchmarks. Therefore the SWBM soil moisture dataset constitutes a reasonable alternative to sparse measurements, little validated model results, or proxy data such as precipitation indices. In terms of runoff the SWBM dataset also performs well versus the benchmarks. They all show a slight dry bias which is probably due to underestimated precipitation used to force the model. The evaluation of the SWBM evapotranspiration dataset is overall satisfactory, but the dynamics are less well captured for this variable. This highlights the limitations of the dataset, as it is based on a simple model that uses uniform parameter values. Hence some processes impacting evapotranspiration dynamics may not be captured, and quality issues may occur in regions with complex terrain. Furthermore we investigate the sources of skill of the SWBM dataset and find that the parameter set has a similar impact on the
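
    A schematic daily bucket model in the spirit of the SWBM; the storage capacity, exponents and forcing values below are illustrative, not the calibrated parameters:

    ```python
    # Daily soil-moisture bucket: runoff and evapotranspiration are simple
    # power functions of soil-moisture fullness (w / w_max).
    import numpy as np

    def swbm_step(w, precip, pet, w_max=400.0, gamma=2.0, alpha=1.0):
        frac = w / w_max
        runoff = precip * frac ** gamma        # wetter soil -> more runoff
        et = pet * frac ** alpha               # moisture-limited evapotranspiration
        w_next = np.clip(w + precip - runoff - et, 0.0, w_max)
        return w_next, runoff, et

    w = 200.0                                   # initial storage (mm)
    for precip, pet in [(5.0, 2.0), (0.0, 3.0), (12.0, 1.5)]:
        w, q, et = swbm_step(w, precip, pet)
        print(f"w={w:6.1f} mm  runoff={q:4.2f}  ET={et:4.2f}")
    ```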

  19. The Role of Datasets on Scientific Influence within Conflict Research

    PubMed Central

    Van Holt, Tracy; Johnson, Jeffery C.; Moates, Shiloh; Carley, Kathleen M.

    2016-01-01

    We inductively tested if a coherent field of inquiry in human conflict research emerged in an analysis of published research involving “conflict” in the Web of Science (WoS) over a 66-year period (1945–2011). We created a citation network that linked the 62,504 WoS records and their cited literature. We performed a critical path analysis (CPA), a specialized social network analysis on this citation network (~1.5 million works), to highlight the main contributions in conflict research and to test if research on conflict has in fact evolved to represent a coherent field of inquiry. Out of this vast dataset, 49 academic works were highlighted by the CPA suggesting a coherent field of inquiry; which means that researchers in the field acknowledge seminal contributions and share a common knowledge base. Other conflict concepts that were also analyzed—such as interpersonal conflict or conflict among pharmaceuticals, for example, did not form their own CP. A single path formed, meaning that there was a cohesive set of ideas that built upon previous research. This is in contrast to a main path analysis of conflict from 1957–1971 where ideas didn’t persist in that multiple paths existed and died or emerged reflecting lack of scientific coherence (Carley, Hummon, and Harty, 1993). The critical path consisted of a number of key features: 1) Concepts that built throughout include the notion that resource availability drives conflict, which emerged in the 1960s-1990s and continued on until 2011. More recent intrastate studies that focused on inequalities emerged from interstate studies on the democracy of peace earlier on the path. 2) Recent research on the path focused on forecasting conflict, which depends on well-developed metrics and theories to model. 3) We used keyword analysis to independently show how the CP was topically linked (i.e., through democracy, modeling, resources, and geography). Publically available conflict datasets developed early on helped

  20. The Role of Datasets on Scientific Influence within Conflict Research.

    PubMed

    Van Holt, Tracy; Johnson, Jeffery C; Moates, Shiloh; Carley, Kathleen M

    2016-01-01

    We inductively tested if a coherent field of inquiry in human conflict research emerged in an analysis of published research involving "conflict" in the Web of Science (WoS) over a 66-year period (1945-2011). We created a citation network that linked the 62,504 WoS records and their cited literature. We performed a critical path analysis (CPA), a specialized social network analysis on this citation network (~1.5 million works), to highlight the main contributions in conflict research and to test if research on conflict has in fact evolved to represent a coherent field of inquiry. Out of this vast dataset, 49 academic works were highlighted by the CPA suggesting a coherent field of inquiry; which means that researchers in the field acknowledge seminal contributions and share a common knowledge base. Other conflict concepts that were also analyzed-such as interpersonal conflict or conflict among pharmaceuticals, for example, did not form their own CP. A single path formed, meaning that there was a cohesive set of ideas that built upon previous research. This is in contrast to a main path analysis of conflict from 1957-1971 where ideas didn't persist in that multiple paths existed and died or emerged reflecting lack of scientific coherence (Carley, Hummon, and Harty, 1993). The critical path consisted of a number of key features: 1) Concepts that built throughout include the notion that resource availability drives conflict, which emerged in the 1960s-1990s and continued on until 2011. More recent intrastate studies that focused on inequalities emerged from interstate studies on the democracy of peace earlier on the path. 2) Recent research on the path focused on forecasting conflict, which depends on well-developed metrics and theories to model. 3) We used keyword analysis to independently show how the CP was topically linked (i.e., through democracy, modeling, resources, and geography). Publically available conflict datasets developed early on helped shape the

  1. A high-throughput system for high-quality tomographic reconstruction of large datasets at Diamond Light Source.

    PubMed

    Atwood, Robert C; Bodey, Andrew J; Price, Stephen W T; Basham, Mark; Drakopoulos, Michael

    2015-06-13

    Tomographic datasets collected at synchrotrons are becoming very large and complex, and, therefore, need to be managed efficiently. Raw images may have high pixel counts, and each pixel can be multidimensional and associated with additional data such as those derived from spectroscopy. In time-resolved studies, hundreds of tomographic datasets can be collected in sequence, yielding terabytes of data. Users of tomographic beamlines are drawn from various scientific disciplines, and many are keen to use tomographic reconstruction software that does not require a deep understanding of reconstruction principles. We have developed Savu, a reconstruction pipeline that enables users to rapidly reconstruct data to consistently create high-quality results. Savu is designed to work in an 'orthogonal' fashion, meaning that data can be converted between projection and sinogram space throughout the processing workflow as required. The Savu pipeline is modular and allows processing strategies to be optimized for users' purposes. In addition to the reconstruction algorithms themselves, it can include modules for identification of experimental problems, artefact correction, general image processing and data quality assessment. Savu is open source, open licensed and 'facility-independent': it can run on standard cluster infrastructure at any institution. PMID:25939626
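
    A minimal sketch of the "orthogonal" design idea in plain Python; this is illustrative only and does not reproduce Savu's actual plugin API. Each plugin declares the space it operates in, and the pipeline transposes the data block only when consecutive plugins disagree.

    ```python
    # Illustrative sketch (not Savu's API): plugins declare their working space;
    # data are transposed between projection space (angle, row, col) and
    # sinogram space (row, angle, col) only when needed.
    import numpy as np
    from scipy.ndimage import median_filter

    class Plugin:
        space = "projection"            # or "sinogram"
        def run(self, data):
            raise NotImplementedError

    class MedianFilter(Plugin):
        space = "projection"
        def run(self, data):
            return median_filter(data, size=(1, 3, 3))   # filter each projection

    class RingRemoval(Plugin):
        space = "sinogram"
        def run(self, data):
            return data - data.mean(axis=1, keepdims=True)  # crude de-striping

    def run_pipeline(data, plugins, current="projection"):
        for p in plugins:
            if p.space != current:             # convert between spaces
                data = np.swapaxes(data, 0, 1)
                current = p.space
            data = p.run(data)
        return data, current

    projections = np.random.rand(180, 64, 64)  # angles x rows x cols
    result, space = run_pipeline(projections, [MedianFilter(), RingRemoval()])
    ```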

  2. Future weather dataset for fourteen UK sites.

    PubMed

    Liu, Chunde

    2016-09-01

    This future weather dataset is used for assessing the risk of overheating, thermal discomfort, or heat stress in free-running buildings. The weather files are in the .epw format, which can be used in building simulation packages such as EnergyPlus, DesignBuilder, IES, etc. PMID:27570809
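
    A minimal sketch, assuming a hypothetical file name, of reading an .epw file with pandas: the EPW format is a CSV with 8 header records followed by hourly rows; the column position used here (dry-bulb temperature at index 6) follows the EnergyPlus EPW specification but should be checked against your file.

    ```python
    import pandas as pd

    epw_path = "future_weather_site01.epw"               # hypothetical file name
    df = pd.read_csv(epw_path, skiprows=8, header=None)  # skip the 8 EPW header records
    drybulb = df[6]                                      # field 7: dry-bulb temperature, degC

    # Crude overheating screen: hours above an assumed 28 degC comfort threshold.
    print("hours above 28 degC:", int((drybulb > 28.0).sum()))
    ```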

  3. Bacterial clinical infectious diseases ontology (BCIDO) dataset.

    PubMed

    Gordon, Claire L; Weng, Chunhua

    2016-09-01

    This article describes the Bacterial Clinical Infectious Diseases Ontology (BCIDO) dataset related to research published in http://dx.doi.org/10.1016/j.jbi.2015.07.014 [1], and contains the Protégé OWL files required to run BCIDO in the Protégé environment. BCIDO contains 1719 classes and 39 object properties. PMID:27508237
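
    A hedged sketch of opening the dataset's OWL file in Python with the owlready2 package (an assumption; the dataset itself targets Protégé), using a hypothetical file path.

    ```python
    from owlready2 import get_ontology

    onto = get_ontology("file:///data/BCIDO.owl").load()   # hypothetical path
    print(len(list(onto.classes())))             # expect 1719 classes per the abstract
    print(len(list(onto.object_properties())))   # expect 39 object properties
    ```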

  4. Thesaurus Dataset of Educational Technology in Chinese

    ERIC Educational Resources Information Center

    Wu, Linjing; Liu, Qingtang; Zhao, Gang; Huang, Huan; Huang, Tao

    2015-01-01

    The thesaurus dataset of educational technology is a knowledge description of educational technology in Chinese. The aims of this thesaurus were to collect the subject terms in the domain of educational technology, facilitate the standardization of terminology and promote the communication between Chinese researchers and scholars from various…

  5. A computationally efficient Bayesian sequential simulation approach for the assimilation of vast and diverse hydrogeophysical datasets

    NASA Astrophysics Data System (ADS)

    Nussbaumer, Raphaël; Gloaguen, Erwan; Mariéthoz, Grégoire; Holliger, Klaus

    2016-04-01

    Bayesian sequential simulation (BSS) is a powerful geostatistical technique, which notably has shown significant potential for the assimilation of datasets that are diverse with regard to their spatial resolution and their mutual relationships. However, such applications of BSS require a large number of realizations to adequately explore the solution space and to assess the corresponding uncertainties. Moreover, such simulations generally need to be performed on very fine grids in order to adequately exploit the technique's potential for characterizing heterogeneous environments. Correspondingly, the computational cost of BSS algorithms in their classical form is very high, which so far has limited an effective application of this method to large models and/or vast datasets. In this context, it is also important to note that the inherent assumption regarding the independence of the considered datasets is generally regarded as being too strong in the context of sequential simulation. To alleviate these problems, we have revisited the classical implementation of BSS and incorporated two key features to increase the computational efficiency. The first feature is a combined quadrant-spiral and superblock search, which targets run-time savings on large grids and adds flexibility with regard to the selection of neighboring points, using equal directional sampling and treating hard data and previously simulated points separately. The second feature is a constant path of simulation, which enhances the efficiency for multiple realizations. We have also modified the aggregation operator to be more flexible with regard to the assumption of independence of the considered datasets. This is achieved through log-linear pooling, which essentially allows for attributing weights to the various data components. Finally, a multi-grid simulation path was created to enforce large-scale variance and to allow for adapting parameters, such as, for example, the log-linear weights or the type
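
    A minimal sketch of the log-linear pooling operator mentioned above: component distributions are combined as a weighted geometric mean, p(x) ∝ ∏_i p_i(x)^{w_i}, so the weights relax the independence assumption by down- or up-weighting each data source. The distributions and weights below are stand-ins.

    ```python
    import numpy as np

    def log_linear_pool(dists, weights):
        """dists: (k, n) array of k discrete pdfs over n states; weights: (k,)."""
        log_pool = np.einsum("k,kn->n", weights, np.log(np.asarray(dists) + 1e-300))
        pool = np.exp(log_pool - log_pool.max())   # stabilize before normalizing
        return pool / pool.sum()

    p_hydro = np.array([0.7, 0.2, 0.1])   # e.g. likelihood from hydraulic data
    p_geo   = np.array([0.3, 0.4, 0.3])   # e.g. likelihood from geophysical data
    print(log_linear_pool([p_hydro, p_geo], weights=[1.0, 0.5]))
    ```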

  6. Precipitation trends using in-situ and gridded datasets

    NASA Astrophysics Data System (ADS)

    Beharry, Sharlene Lata; Clarke, Ricardo Marcus; Kurmarsingh, Kishan

    2014-02-01

    Six in situ precipitation time series of varying lengths from the northwestern region of Trinidad, together with the Global Precipitation Climatology Centre (GPCC) v6 0.5° monthly dataset (1901-2010), were statistically examined for monotonic trends. The Pettitt test was used to investigate abrupt changes in the mean, while the Mann-Kendall test was employed to assess monotonic trends. It was found that three in situ stations and the six grids experienced abrupt changes in rainfall patterns and that there was an apparent shift in the seasons. In addition, for five out of the six in situ stations no monotonic change was detected in the monthly, seasonal, or annual rainfall patterns. Gradual decreases were detected in the calculated weighted area average for five stations, the GPCCv6 dataset, and the St. Ann's time series. The GPCCv6 data indicated that the dry season in southern Trinidad is becoming drier. Results also suggested that the range between the greatest and lowest recorded rainfall values has increased for some months and decreased for others. The gridded dataset appears to give a good representation of the dry-season (January to May) rainfall compared with the wet season (June to December) and was found to be negatively biased for the northwestern region, but this may not necessarily hold for the entire island. The results suggest that micro-climates may exist in the northwestern region. Further investigation using in situ data is recommended.
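
    A hedged sketch of the Mann-Kendall trend test applied above (the tie correction to Var(S) is omitted, and the short synthetic series is a stand-in for station data).

    ```python
    import numpy as np
    from scipy.stats import norm

    def mann_kendall(x):
        x = np.asarray(x, dtype=float)
        n = len(x)
        # S statistic: sum of signs over all ordered pairs
        s = sum(np.sign(x[j] - x[i]) for i in range(n - 1) for j in range(i + 1, n))
        var_s = n * (n - 1) * (2 * n + 5) / 18.0                   # variance, no ties
        z = (s - np.sign(s)) / np.sqrt(var_s) if s != 0 else 0.0   # continuity correction
        p = 2 * (1 - norm.cdf(abs(z)))                             # two-sided p-value
        return s, z, p

    monthly_rain = np.array([210.0, 190.5, 230.1, 175.0, 160.2, 155.9, 140.3])
    print(mann_kendall(monthly_rain))
    ```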

  7. Interpolation of diffusion weighted imaging datasets.

    PubMed

    Dyrby, Tim B; Lundell, Henrik; Burke, Mark W; Reislev, Nina L; Paulson, Olaf B; Ptito, Maurice; Siebner, Hartwig R

    2014-12-01

    Diffusion weighted imaging (DWI) is used to study white-matter fibre organisation, orientation and structural connectivity by means of fibre reconstruction algorithms and tractography. In clinical settings, limited scan time compromises the achievable image resolution for finer anatomical details and the signal-to-noise ratio needed for reliable fibre reconstruction. We assessed the potential benefits of interpolating DWI datasets to a higher image resolution before fibre reconstruction using a diffusion tensor model. Simulations of straight and curved crossing tracts smaller than or equal to the voxel size showed that conventional higher-order interpolation methods improved the geometrical representation of white-matter tracts with reduced partial-volume-effect (PVE), except at tract boundaries. Simulations and interpolation of ex-vivo monkey brain DWI datasets revealed that conventional interpolation methods fail to disentangle fine anatomical details if PVE is too pronounced in the original data. For validation, we used ex-vivo DWI datasets acquired at various image resolutions as well as Nissl-stained sections. Increasing the image resolution by a factor of eight yielded finer geometrical resolution and more anatomical details in complex regions such as tract boundaries and cortical layers, which are normally only visualized at higher image resolutions. Similar results were found with a typical clinical human DWI dataset. However, a possible bias in quantitative values imposed by the interpolation method used should be considered. The results indicate that conventional interpolation methods can be successfully applied to DWI datasets for mining anatomical details that are normally seen only at higher resolutions, which will aid in tractography and microstructural mapping of tissue compartments. PMID:25219332
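
    A minimal sketch, assuming synthetic array shapes, of upsampling each diffusion-weighted volume with cubic spline interpolation, in the spirit of the higher-order interpolation discussed above.

    ```python
    import numpy as np
    from scipy.ndimage import zoom

    dwi = np.random.rand(48, 48, 30, 6)   # x, y, z, diffusion directions (stand-in)
    factor = 2                            # upsampling factor per spatial axis

    upsampled = np.stack(
        [zoom(dwi[..., g], factor, order=3) for g in range(dwi.shape[-1])],
        axis=-1,
    )  # cubic B-spline interpolation, applied volume-by-volume
    print(upsampled.shape)                # (96, 96, 60, 6)
    ```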

  8. Efficiently Finding Individuals from Video Dataset

    NASA Astrophysics Data System (ADS)

    Hao, Pengyi; Kamata, Sei-Ichiro

    We are interested in retrieving video shots or videos containing particular people from a video dataset. Owing to the large variations in pose, illumination conditions, occlusions, hairstyles and facial expressions, face tracks have recently been researched in the fields of face recognition, face retrieval and name labeling from videos. However, when the number of face tracks is very large, conventional methods, which match all or some pairs of faces in face tracks, will not be effective. Therefore, in this paper, an efficient method for finding a given person from a video dataset is presented. In our study, in addition to performing research on face tracks in a single video, we also consider how to organize all the faces in videos in a dataset and how to improve the search quality in the query process. Different videos may include the same person; thus, the management of individuals in different videos will be useful for their retrieval. The proposed method includes the following three points. (i) Face tracks of the same person appearing for a period in each video are first connected on the basis of scene information with a time constraint; then all the people in one video are organized by a proposed hierarchical clustering method. (ii) After obtaining the organizational structure of all the people in one video, the people are organized into an upper layer by affinity propagation. (iii) Finally, in the process of querying, a remeasuring method based on the index structure of videos is performed to improve the retrieval accuracy. We also build a video dataset that contains six types of videos: films, TV shows, educational videos, interviews, press conferences and domestic activities. The formation of face tracks in the six types of videos is first researched, then experiments are performed on this video dataset containing more than 1 million faces and 218,786 face tracks. The results show that the proposed approach has high search quality and a short search time.
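
    A hedged sketch of step (ii), organizing per-video person representatives into an upper layer with affinity propagation, which picks exemplars without a preset cluster count; the descriptors are random stand-ins for face-track features.

    ```python
    import numpy as np
    from sklearn.cluster import AffinityPropagation

    person_descriptors = np.random.rand(50, 128)   # one mean descriptor per person/video
    ap = AffinityPropagation(damping=0.7, random_state=0).fit(person_descriptors)

    print("exemplar indices:", ap.cluster_centers_indices_)
    print("upper-layer labels:", ap.labels_[:10])
    ```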

  9. A new method for evaluating age distributions of detrital zircon datasets by incorporating discordant data

    NASA Astrophysics Data System (ADS)

    Reimink, Jesse; Davies, Joshua; Rojas, Xavier; Waldron, John

    2015-04-01

    U-Pb ages from detrital zircons play an important role in sediment provenance studies. However, U-Pb ages from detrital zircon populations often contain a discordant component, which is traditionally removed before the age data are interpreted. Many different processes can create discordant analyses, with the most important being Pb-loss and mixing of distinct zircon age domains during analysis. Discordant ages contain important information regarding the history of a detrital zircon population, for example the timing of Pb-loss or metamorphism, and removing these analyses may significantly bias a zircon dataset. Here we present a new technique for analyzing detrital zircon populations that uses all U-Pb analyses, independent of discordance. We have developed computer code that evaluates the relative likelihood of discordia lines based on their proximity to discordant data points. When two or more data points lie on or near a discordia line the likelihood associated with that line increases. The upper and lower intercepts of each discordia line, as well as the relative likelihood along that line, are stored, and the likelihood of upper and lower intercepts are plotted with age. There are many benefits to using this technique for analysis of detrital zircon datasets. By utilizing the discordant analyses we allow for the addition of upper and lower intercept information to conventional analysis techniques (i.e. probability density functions or kernel density estimators). We are then able to use a much stricter discordance filter (e.g. < 3%) when analyzing 'concordant' data, thereby increasing the reliability of Pb/Pb ages used in the traditional analysis. Additionally, by not rejecting discordant data from zircon datasets we potentially reduce the overall bias in the analysis, which is a critical step in detrital zircon studies. This new technique is relatively quick and uses traditional analytical results, while the upper and lower intercept information is obtained
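
    A minimal sketch of the likelihood idea under stated assumptions: working in Wetherill concordia space (x = 207Pb/235U, y = 206Pb/238U) with standard decay constants, a candidate discordia line between an upper and a lower intercept age scores higher when measured points lie close to it. The measured ratios and scatter parameter below are invented for illustration.

    ```python
    import numpy as np

    L238, L235 = 1.55125e-10, 9.8485e-10          # decay constants (1/yr)

    def concordia(t):                              # t in years
        return np.exp(L235 * t) - 1.0, np.exp(L238 * t) - 1.0

    def discordia_likelihood(x, y, t_upper, t_lower, sigma=0.05):
        (x1, y1), (x2, y2) = concordia(t_upper), concordia(t_lower)
        # perpendicular distance of each point to the chord (x1,y1)-(x2,y2)
        d = np.abs((y2 - y1) * x - (x2 - x1) * y + x2 * y1 - y2 * x1) \
            / np.hypot(y2 - y1, x2 - x1)
        return np.exp(-0.5 * (d / sigma) ** 2).sum()   # Gaussian proximity score

    # Score a coarse grid of intercept pairs for some invented measured ratios.
    x = np.array([2.1, 3.5, 5.2]); y = np.array([0.18, 0.25, 0.33])
    ages = np.linspace(0.5e9, 3.0e9, 26)
    best = max(((tu, tl) for tu in ages for tl in ages if tl < tu),
               key=lambda p: discordia_likelihood(x, y, *p))
    print("best upper/lower intercepts (Ga):", best[0] / 1e9, best[1] / 1e9)
    ```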

  10. FTSPlot: Fast Time Series Visualization for Large Datasets

    PubMed Central

    Riss, Michael

    2014-01-01

    The analysis of electrophysiological recordings often involves visual inspection of time series data to locate specific experiment epochs, mask artifacts, and verify the results of signal processing steps, such as filtering or spike detection. Long-term experiments with continuous data acquisition generate large amounts of data. Rapid browsing through these massive datasets poses a challenge to conventional data plotting software because the plotting time increases proportionately to the increase in the volume of data. This paper presents FTSPlot, which is a visualization concept for large-scale time series datasets using techniques from the field of high performance computer graphics, such as hierarchic level of detail and out-of-core data handling. In a preprocessing step, time series data, event, and interval annotations are converted into an optimized data format, which then permits fast, interactive visualization. The preprocessing step has a computational complexity of O(n log(n)); the visualization itself can be done with a complexity of O(1) and is therefore independent of the amount of data. A demonstration prototype has been implemented, and benchmarks show that the technology is capable of displaying large amounts of time series data, event, and interval annotations lag-free, with millisecond response times. The current 64-bit implementation theoretically supports datasets with up to 2^64 bytes; on the x86_64 architecture, currently up to 2^48 bytes are supported, and benchmarks have been conducted with 2^40 bytes/1 TiB of double precision samples. The presented software is freely available and can be included as a Qt GUI component in future software projects, providing a standard visualization method for long-term electrophysiological experiments. PMID:24732865
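
    A minimal sketch of the hierarchic level-of-detail idea, assuming a simple min/max pyramid: each level stores the envelope of progressively larger sample blocks, so a viewer can draw from the level whose block count best matches the display width rather than from the raw samples.

    ```python
    import numpy as np

    def build_minmax_pyramid(samples, factor=16):
        """Each level's (min, max) pair covers factor**level raw samples."""
        mins, maxs = samples.copy(), samples.copy()
        levels = []
        while mins.size >= factor:
            n = mins.size // factor * factor           # drop the ragged tail
            mins = mins[:n].reshape(-1, factor).min(axis=1)
            maxs = maxs[:n].reshape(-1, factor).max(axis=1)
            levels.append((mins, maxs))
        return levels

    signal = np.random.randn(10_000_000)               # ~10M samples of a recording
    pyramid = build_minmax_pyramid(signal)
    print([lv[0].size for lv in pyramid])              # block counts per level
    ```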

  11. Spatially-based quality control for daily precipitation datasets

    NASA Astrophysics Data System (ADS)

    Serrano-Notivoli, Roberto; de Luis, Martín; Beguería, Santiago; Ángel Saz, Miguel

    2016-04-01

    There are many reasons why erroneous data can appear in original precipitation datasets, but their common characteristic is that none of them correspond to the natural variability of the climate variable. For this reason, a comprehensive analysis of the data from each station on each day is necessary to be certain that the final dataset will be consistent and reliable. Most quality-control techniques applied to daily precipitation are based on the comparison of each observed value with the rest of the values in the same series or in reference series built from the nearest stations. These methods are inherited from monthly precipitation studies, but at the daily scale the variability is greater and the methods have to differ. A character shared by all of these approaches is that they make reconstructions based on the best-correlated reference series, which can be a biased decision because, for example, an extreme precipitation event recorded on one day at more than one station could be flagged as erroneous. We propose a method based on the specific conditions of the day and location to determine the reliability of each observation. This method preserves the local variance of the variable and its independence from the time structure. To do this, individually for each daily value, we first compute the probability of precipitation occurrence through a multivariate logistic regression using the 10 nearest observations in binomial form (0 = dry; 1 = wet), which produces a binomial prediction (PB) between 0 and 1. Then, we compute a prediction of precipitation magnitude (PM) with the raw data of the same 10 nearest observations. Through these predictions we examine the original data for each day and location against five criteria: 1) suspect data; 2) suspect zero; 3) suspect outlier; 4) suspect wet; and 5) suspect dry. Tests over different datasets showed that flagged data depend mainly on the number of available data and on their homogeneous distribution.
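
    A hedged sketch of the occurrence step using scikit-learn on synthetic stand-in data: a logistic regression on the wet/dry states of the 10 nearest stations yields the binomial prediction PB, and observations far from PB become QC suspects.

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    neighbors = rng.random((2000, 10)) < 0.3     # wet/dry at the 10 nearest stations
    target = (neighbors.mean(axis=1) + 0.1 * rng.standard_normal(2000)) > 0.25

    model = LogisticRegression().fit(neighbors, target)
    pb = model.predict_proba(neighbors)[:, 1]    # binomial prediction PB in [0, 1]

    suspect_wet = target & (pb < 0.05)           # observed wet where PB says dry
    print("flagged station-days:", suspect_wet.sum())
    ```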

  12. Quantifying uncertainty in observational rainfall datasets

    NASA Astrophysics Data System (ADS)

    Lennard, Chris; Dosio, Alessandro; Nikulin, Grigory; Pinto, Izidine; Seid, Hussen

    2015-04-01

    The CO-ordinated Regional Downscaling Experiment (CORDEX) has to date seen the publication of at least ten journal papers that examine the African domain during 2012 and 2013. Five of these papers consider Africa generally (Nikulin et al. 2012, Kim et al. 2013, Hernandes-Dias et al. 2013, Laprise et al. 2013, Panitz et al. 2013) and five have regional foci: Tramblay et al. (2013) on Northern Africa, Mariotti et al. (2014) and Gbobaniyi et al. (2013) on West Africa, Endris et al. (2013) on East Africa, and Kalagnoumou et al. (2013) on southern Africa. A further three papers that the authors know about are under review. These papers all use observed rainfall and/or temperature data to evaluate/validate the regional model output and often proceed to assess projected changes in these variables due to climate change in the context of these observations. The most popular reference rainfall data used are the CRU, GPCP, GPCC, TRMM, and UDEL datasets. However, as Kalagnoumou et al. (2013) point out, there are many other rainfall datasets available for consideration, for example CMORPH, FEWS, TAMSAT & RIANNAA, TAMORA, and the WATCH & WATCH-DEI data. They, with others (Nikulin et al. 2012, Sylla et al. 2012), show that the observed datasets can have a very wide spread at a particular space-time coordinate. As more ground-, space- and reanalysis-based rainfall products become available, all of which use different methods to produce precipitation data, the selection of reference data is becoming an important factor in model evaluation. A number of factors can contribute to uncertainty in terms of the reliability and validity of the datasets, such as radiance conversion algorithms, the quantity and quality of available station data, interpolation techniques, and the blending methods used to combine satellite and gauge based products. However, to date no comprehensive study has been performed to evaluate the uncertainty in these observational datasets. We assess 18 gridded

  13. Food additives.

    PubMed

    Berglund, F

    1978-01-01

    The use of additives to food fulfils many purposes, as shown by the index issued by the Codex Committee on Food Additives: Acids, bases and salts; Preservatives; Antioxidants and antioxidant synergists; Anticaking agents; Colours; Emulsifiers; Thickening agents; Flour-treatment agents; Extraction solvents; Carrier solvents; Flavours (synthetic); Flavour enhancers; Non-nutritive sweeteners; Processing aids; Enzyme preparations. Many additives occur naturally in foods, but this does not exclude toxicity at higher levels. Some food additives are nutrients, or even essential nutrients, e.g. NaCl. Examples are known of food additives causing toxicity in man even when used according to regulations, e.g. cobalt in beer. In other instances, poisoning has been due to carry-over, e.g. by nitrate in cheese whey when used as artificial feed for infants. Poisonings also occur as the result of the permitted substance being added at too high levels, by accident or carelessness, e.g. nitrite in fish. Finally, there are examples of hypersensitivity to food additives, e.g. to tartrazine and other food colours. The toxicological evaluation, based on animal feeding studies, may be complicated by impurities, e.g. orthotoluene-sulfonamide in saccharin; by transformation or disappearance of the additive during food processing or storage, e.g. bisulfite in raisins; by reaction products with food constituents, e.g. formation of ethylurethane from diethyl pyrocarbonate; and by metabolic transformation products, e.g. formation in the gut of cyclohexylamine from cyclamate. Metabolic end products may differ in experimental animals and in man: guanylic acid and inosinic acid are metabolized to allantoin in the rat but to uric acid in man. The magnitude of the safety margin in man of the Acceptable Daily Intake (ADI) is not identical to the "safety factor" used when calculating the ADI. The symptoms of Chinese Restaurant Syndrome, although not hazardous, furthermore illustrate that the whole ADI

  14. MDL and RMSEP assessment of spectral pretreatments by adding different noises in calibration/validation datasets.

    PubMed

    Zhao, Na; Wu, Zhisheng; Cheng, Yaqian; Shi, Xinyuan; Qiao, Yanjiang

    2016-06-15

    In multivariate calibration, the optimization of pretreatment methods is usually based on the prediction error, and robustness evaluation is often lacking. This study investigated the robustness of pretreatment methods by adding different simulated noises to the validation dataset alone and to both the calibration and validation datasets. The root mean squared error of prediction (RMSEP) and multivariate detection limits (MDL) were calculated simultaneously to assess the robustness of the different pretreatment methods. Results with two different near-infrared (NIR) datasets illustrated that Multiplicative Scatter Correction (MSC) and Standard Normal Variate (SNV) were substantially more robust to additive noise, with smaller RMSEP and MDL values. PMID:27031447
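
    A minimal sketch of the evaluation loop under stated assumptions (synthetic spectra, a PLS model as the calibration, and the standard SNV formula): apply the pretreatment, add noise to the validation set, and compare RMSEP.

    ```python
    import numpy as np
    from sklearn.cross_decomposition import PLSRegression

    def snv(spectra):
        """Standard normal variate: center and scale each spectrum individually."""
        mu = spectra.mean(axis=1, keepdims=True)
        sd = spectra.std(axis=1, keepdims=True)
        return (spectra - mu) / sd

    rng = np.random.default_rng(1)
    X = rng.random((120, 200)); y = X[:, 50] * 3 + rng.normal(0, 0.05, 120)
    Xc, yc, Xv, yv = X[:80], y[:80], X[80:], y[80:]

    pls = PLSRegression(n_components=5).fit(snv(Xc), yc)
    Xv_noisy = Xv + rng.normal(0, 0.02, Xv.shape)     # additive noise on validation set
    rmsep = np.sqrt(np.mean((pls.predict(snv(Xv_noisy)).ravel() - yv) ** 2))
    print(f"RMSEP with noisy validation set: {rmsep:.4f}")
    ```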

  15. MDL and RMSEP assessment of spectral pretreatments by adding different noises in calibration/validation datasets

    NASA Astrophysics Data System (ADS)

    Zhao, Na; Wu, Zhisheng; Cheng, Yaqian; Shi, Xinyuan; Qiao, Yanjiang

    2016-06-01

    In multivariate calibration, the optimization of pretreatment methods is usually based on the prediction error, and robustness evaluation is often lacking. This study investigated the robustness of pretreatment methods by adding different simulated noises to the validation dataset alone and to both the calibration and validation datasets. The root mean squared error of prediction (RMSEP) and multivariate detection limits (MDL) were calculated simultaneously to assess the robustness of the different pretreatment methods. Results with two different near-infrared (NIR) datasets illustrated that Multiplicative Scatter Correction (MSC) and Standard Normal Variate (SNV) were substantially more robust to additive noise, with smaller RMSEP and MDL values.

  16. Independence Generalizing Monotone and Boolean Independences

    NASA Astrophysics Data System (ADS)

    Hasebe, Takahiro

    2011-01-01

    We define conditionally monotone independence in two states which interpolates monotone and Boolean ones. This independence is associative, and therefore leads to a natural probability theory in a non-commutative algebra.

  17. The late addition of core lipids to nascent apolipoprotein B100, resulting in the assembly and secretion of triglyceride-rich lipoproteins, is independent of both microsomal triglyceride transfer protein activity and new triglyceride synthesis.

    PubMed

    Pan, Meihui; Liang, Jun-shan; Fisher, Edward A; Ginsberg, Henry N

    2002-02-01

    Although microsomal triglyceride transfer protein (MTP) and newly synthesized triglyceride (TG) are critical for co-translational targeting of apolipoprotein B (apoB100) to lipoprotein assembly in hepatoma cell lines, their roles in the later stages of lipoprotein assembly remain unclear. Using N-acetyl-Leu-Leu-norleucinal to prevent proteasomal degradation, HepG2 cells were radiolabeled and chased for 0-90 min (chase I). The medium was changed and cells were chased for another 150 min (chase II) in the absence (control) or presence of the Pfizer MTP inhibitor CP-10447 (CP). As chase I was extended, inhibition of apoB100 secretion by CP during chase II decreased from 75.9% to only 15% of control (no CP during chase II). Additional studies were conducted in which chase I was either 0 or 90 min, and chase II was in the presence of [(3)H]glycerol and either BSA (control), CP (inhibits both MTP activity and TG synthesis), BMS-1976360-1 (BMS) (inhibits only MTP activity), or triacsin C (TC) (inhibits only TG synthesis). When chase I was 0 min, CP, BMS, and TC reduced apoB100 secretion during chase II by 75.3, 73.9, and 53.9%. However, when chase I was 90 min, those agents reduced apoB100 secretion during chase II by only 16.0, 19.2, and 13.9%. Of note, all three inhibited secretion of newly synthesized TG during chase II by 80, 80, and 40%, whether chase I was 0 or 90 min. In both HepG2 cells and McA-RH7777 cells, if chase I was at least 60 min, inhibition of TG synthesis and/or MTP activity did not affect the density of secreted apoB100-lipoproteins under basal conditions. Oleic acid increased secretion of TG-enriched apoB100-lipoproteins similarly in the absence or presence of CP, BMS, or TC. We conclude that neither MTP nor newly synthesized TG is necessary for the later stages of apoB100-lipoprotein assembly and secretion in either HepG2 or McA-RH7777 cells. PMID:11704664

  18. Global Precipitation Measurement: Methods, Datasets and Applications

    NASA Technical Reports Server (NTRS)

    Tapiador, Francisco; Turk, Francis J.; Petersen, Walt; Hou, Arthur Y.; Garcia-Ortega, Eduardo; Machado, Luiz, A. T.; Angelis, Carlos F.; Salio, Paola; Kidd, Chris; Huffman, George J.; De Castro, Manuel

    2011-01-01

    This paper reviews the many aspects of precipitation measurement that are relevant to providing an accurate global assessment of this important environmental parameter. Methods discussed include ground data, satellite estimates and numerical models. First, the methods for measuring, estimating, and modeling precipitation are discussed. Then, the most relevant datasets gathering precipitation information from those three sources are presented. The third part of the paper illustrates a number of the many applications of those measurements and databases. The aim of the paper is to organize the many links and feedbacks between precipitation measurement, estimation and modeling, indicating the uncertainties and limitations of each technique in order to identify areas requiring further attention, and to show the limits within which datasets can be used.

  19. Artificial neural networks for small dataset analysis.

    PubMed

    Pasini, Antonello

    2015-05-01

    Artificial neural networks (ANNs) are usually considered tools that can help analyze cause-effect relationships in complex systems within a big-data framework. On the other hand, health sciences face complexity more than any other scientific discipline, and in this field large datasets are seldom available. In this situation, I show how a particular neural network tool, which is able to handle small datasets of experimental or observational data, can help in identifying the main causal factors leading to changes in some variable that summarizes the behaviour of a complex system, for instance the onset of a disease. A detailed description of the neural network tool is given and its application to a specific case study is shown. Recommendations for a correct use of this tool are also supplied. PMID:26101654
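
    A hedged sketch, not the author's specific tool: with few observations, leave-one-out cross-validation gives an honest error estimate for a small neural network relating candidate causal factors to a response (all data below are synthetic).

    ```python
    import numpy as np
    from sklearn.neural_network import MLPRegressor
    from sklearn.model_selection import LeaveOneOut

    rng = np.random.default_rng(2)
    X = rng.random((25, 3))                       # 25 cases, 3 candidate factors
    y = 2 * X[:, 0] - X[:, 1] + rng.normal(0, 0.05, 25)

    errors = []
    for train, test in LeaveOneOut().split(X):
        net = MLPRegressor(hidden_layer_sizes=(4,), max_iter=5000, random_state=0)
        net.fit(X[train], y[train])               # refit on all-but-one case
        errors.append((net.predict(X[test])[0] - y[test][0]) ** 2)
    print("LOO RMSE:", np.sqrt(np.mean(errors)))
    ```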

  20. Artificial neural networks for small dataset analysis

    PubMed Central

    2015-01-01

    Artificial neural networks (ANNs) are usually considered tools that can help analyze cause-effect relationships in complex systems within a big-data framework. On the other hand, health sciences face complexity more than any other scientific discipline, and in this field large datasets are seldom available. In this situation, I show how a particular neural network tool, which is able to handle small datasets of experimental or observational data, can help in identifying the main causal factors leading to changes in some variable that summarizes the behaviour of a complex system, for instance the onset of a disease. A detailed description of the neural network tool is given and its application to a specific case study is shown. Recommendations for a correct use of this tool are also supplied. PMID:26101654

  1. Multiscale peak alignment for chromatographic datasets.

    PubMed

    Zhang, Zhi-Min; Liang, Yi-Zeng; Lu, Hong-Mei; Tan, Bin-Bin; Xu, Xiao-Na; Ferro, Miguel

    2012-02-01

    Chromatography has been extensively applied in many fields, such as metabolomics and quality control of herbal medicines. Preprocessing, especially peak alignment, is a time-consuming task prior to the extraction of useful information from the datasets by chemometrics and statistics. To accurately and rapidly align shifted peaks among one-dimensional chromatograms, multiscale peak alignment (MSPA) is presented in this research. Peaks of each chromatogram were detected based on the continuous wavelet transform (CWT) and aligned against a reference chromatogram gradually from large to small scale, and the alignment procedure is accelerated by fast Fourier transform cross-correlation. The presented method was compared with two widely used alignment methods on a chromatographic dataset, which demonstrates that MSPA can preserve the shapes of peaks and offers excellent speed during alignment. Furthermore, the MSPA method is robust and not sensitive to noise and baseline. MSPA was implemented and is available at http://code.google.com/p/mspa. PMID:22222564
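
    A hedged sketch of the two named ingredients, CWT-based peak detection and FFT cross-correlation, estimating a single global shift against a reference (MSPA itself proceeds scale by scale; the signals below are synthetic).

    ```python
    import numpy as np
    from scipy.signal import find_peaks_cwt

    rng = np.random.default_rng(3)
    t = np.linspace(0, 60, 3000)
    reference = np.exp(-((t - 20) ** 2)) + 0.6 * np.exp(-((t - 35) ** 2))
    sample = np.roll(reference, 40) + 0.01 * rng.standard_normal(t.size)

    peaks = find_peaks_cwt(sample, widths=np.arange(5, 40))   # CWT peak detection

    # FFT cross-correlation: argmax of ifft(F(ref) * conj(F(sample))) gives the lag.
    xcorr = np.fft.ifft(np.fft.fft(reference) * np.conj(np.fft.fft(sample))).real
    lag = int(np.argmax(xcorr))
    lag = lag - t.size if lag > t.size // 2 else lag          # wrap to a signed shift
    aligned = np.roll(sample, lag)
    print("detected peaks:", len(peaks), "estimated shift (samples):", lag)
    ```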

  2. Scalable Machine Learning for Massive Astronomical Datasets

    NASA Astrophysics Data System (ADS)

    Ball, Nicholas M.; Gray, A.

    2014-04-01

    We present the ability to perform data mining and machine learning operations on a catalog of half a billion astronomical objects. This is the result of the combination of robust, highly accurate machine learning algorithms with linear scalability that renders the applications of these algorithms to massive astronomical data tractable. We demonstrate the core algorithms: kernel density estimation, K-means clustering, linear regression, nearest neighbors, random forest and gradient-boosted decision tree, singular value decomposition, support vector machine, and two-point correlation function. Each of these is relevant for astronomical applications such as finding novel astrophysical objects, characterizing artifacts in data, object classification (including for rare objects), object distances, finding the important features describing objects, density estimation of distributions, probabilistic quantities, and exploring the unknown structure of new data. The software, Skytree Server, runs on any UNIX-based machine, a virtual machine, or cloud-based and distributed systems including Hadoop. We have integrated it on the cloud computing system of the Canadian Astronomical Data Centre, the Canadian Advanced Network for Astronomical Research (CANFAR), creating the world's first cloud computing data mining system for astronomy. We demonstrate results showing the scaling of each of our major algorithms on large astronomical datasets, including the full 470,992,970 objects of the 2 Micron All-Sky Survey (2MASS) Point Source Catalog. We demonstrate the ability to find outliers in the full 2MASS dataset utilizing multiple methods, e.g., nearest neighbors. This is likely of particular interest to the radio astronomy community given, for example, that survey projects contain groups dedicated to this topic. 2MASS is used as a proof-of-concept dataset due to its convenience and availability. These results are of interest to any astronomical project with large and/or complex

  3. Scalable Machine Learning for Massive Astronomical Datasets

    NASA Astrophysics Data System (ADS)

    Ball, Nicholas M.; Astronomy Data Centre, Canadian

    2014-01-01

    We present the ability to perform data mining and machine learning operations on a catalog of half a billion astronomical objects. This is the result of the combination of robust, highly accurate machine learning algorithms with linear scalability that renders the applications of these algorithms to massive astronomical data tractable. We demonstrate the core algorithms: kernel density estimation, K-means clustering, linear regression, nearest neighbors, random forest and gradient-boosted decision tree, singular value decomposition, support vector machine, and two-point correlation function. Each of these is relevant for astronomical applications such as finding novel astrophysical objects, characterizing artifacts in data, object classification (including for rare objects), object distances, finding the important features describing objects, density estimation of distributions, probabilistic quantities, and exploring the unknown structure of new data. The software, Skytree Server, runs on any UNIX-based machine, a virtual machine, or cloud-based and distributed systems including Hadoop. We have integrated it on the cloud computing system of the Canadian Astronomical Data Centre, the Canadian Advanced Network for Astronomical Research (CANFAR), creating the world's first cloud computing data mining system for astronomy. We demonstrate results showing the scaling of each of our major algorithms on large astronomical datasets, including the full 470,992,970 objects of the 2 Micron All-Sky Survey (2MASS) Point Source Catalog. We demonstrate the ability to find outliers in the full 2MASS dataset utilizing multiple methods, e.g., nearest neighbors, and the local outlier factor. 2MASS is used as a proof-of-concept dataset due to its convenience and availability. These results are of interest to any astronomical project with large and/or complex datasets that wishes to extract the full scientific value from its data.
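
    A hedged sketch of nearest-neighbor-based outlier detection, one of the methods named above, using scikit-learn's LocalOutlierFactor on a synthetic stand-in for a photometric catalog (the columns could be, e.g., two color indices).

    ```python
    import numpy as np
    from sklearn.neighbors import LocalOutlierFactor

    rng = np.random.default_rng(4)
    catalog = rng.normal(0.5, 0.15, (100_000, 2))   # bulk stellar locus (stand-in)
    catalog[:50] = rng.uniform(-2, 3, (50, 2))      # implanted oddballs

    lof = LocalOutlierFactor(n_neighbors=35)
    labels = lof.fit_predict(catalog)               # -1 marks outliers
    print("flagged objects:", (labels == -1).sum())
    ```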

  4. Performances of seven datasets in presenting the upper ocean heat content in the South China Sea

    NASA Astrophysics Data System (ADS)

    Chen, Xiao; Yan, Youfang; Cheng, Xuhua; Qi, Yiquan

    2013-09-01

    In this study, the upper ocean heat content (OHC) variations in the South China Sea (SCS) during 1993-2006 were investigated by examining ocean temperatures in seven datasets, including World Ocean Atlas 2009 (WOA09) (climatology), Ishii datasets, Ocean General Circulation Model for the Earth Simulator (OFES), Simple Ocean Data Assimilation system (SODA), Global Ocean Data Assimilation System (GODAS), China Oceanic ReAnalysis system (CORA), and an ocean reanalysis dataset for the joining area of Asia and Indian-Pacific Ocean (AIPO1.0). Among these datasets, two were independent of any numerical model, four relied on data assimilation, and one was generated without any data assimilation. The annual cycles revealed by the seven datasets were similar, but the interannual variations were different. Vertical structures of temperatures along the 18°N, 12.75°N, and 120°E sections were compared with data collected during open cruises in 1998 and 2005-08. The results indicated that Ishii, OFES, CORA, and AIPO1.0 were more consistent with the observations. Through systematic comparisons, we found that each dataset had its own shortcomings and advantages in presenting the upper OHC in the SCS.

  5. Projecting global datasets to achieve equal areas

    USGS Publications Warehouse

    Usery, E.L.; Finn, M.P.; Cox, J.D.; Beard, T.; Ruhl, S.; Bearden, M.

    2003-01-01

    Scientists routinely accomplish global modeling in the raster domain, but recent research has indicated that the transformation of large areas through map projection equations leads to errors. This research attempts to gauge the extent of map projection and resampling effects on the tabulation of categorical areas by comparing the results of three datasets for seven common projections. The datasets, Global Land Cover, Holdridge Life Zones, and Global Vegetation, were compiled at resolutions of 30 arc-second, 1/2 degree, and 1 degree, respectively. These datasets were projected globally from spherical coordinates to plane representations. Results indicate significant problems in the implementation of global projection transformations in commercial software, as well as differences in areal accuracy across projections. The level of raster resolution directly affects the accuracy of areal tabulations, with higher resolution yielding higher accuracy. If the raster resolution is high enough for individual pixels to approximate points, the areal error tends to zero. The 30-arc-second cells appear to approximate this condition.

  6. First observations using SPICE hyperspectral dataset

    NASA Astrophysics Data System (ADS)

    Rosario, Dalton; Romano, Joao; Borel, Christoph

    2014-06-01

    Our first observations using the longwave infrared (LWIR) hyperspectral data subset of the Spectral and Polarimetric Imagery Collection Experiment (SPICE) database are summarized in this paper, focusing on the inherent challenges associated with using this sensing modality for the purpose of object pattern recognition. Emphasis is also placed on data quality, qualitative validation of expected atmospheric spectral features, and qualitative comparison against another dataset of the same site using a different LWIR hyperspectral sensor. SPICE is a collaborative effort between the Army Research Laboratory, U.S. Army Armament RDEC, and more recently the Air Force Institute of Technology. It focuses on the collection and exploitation of longwave and midwave infrared (LWIR and MWIR) hyperspectral and polarimetric imagery. We concluded from this work that the quality of SPICE hyperspectral LWIR data is categorically comparable to other datasets recorded by a different sensor with similar specifications, and adequate for algorithm research, given the scope of SPICE. The scope was to conduct a long-term infrared data collection of the same site with targets, using both sensing modalities, under various weather and non-ideal conditions, and then use the vast dataset and associated ground truth information to assess the performance of state-of-the-art algorithms while determining sources of performance degradation. The expectation is that results from these assessments will spur new algorithmic ideas with the potential to augment pattern recognition performance in remote sensing applications. Over time, we are confident the SPICE database will prove to be an asset to the wider remote sensing community.

  7. Data assimilation and model evaluation experiment datasets

    NASA Technical Reports Server (NTRS)

    Lai, Chung-Cheng A.; Qian, Wen; Glenn, Scott M.

    1994-01-01

    The Institute for Naval Oceanography, in cooperation with Naval Research Laboratories and universities, executed the Data Assimilation and Model Evaluation Experiment (DAMEE) for the Gulf Stream region during fiscal years 1991-1993. Enormous effort has gone into the preparation of several high-quality and consistent datasets for model initialization and verification. This paper describes the preparation process, the temporal and spatial scopes, the contents, the structure, etc., of these datasets. The goal of DAMEE and the need for data in the four phases of the experiment are briefly stated. The preparation of DAMEE datasets consisted of a series of processes: (1) collection of observational data; (2) analysis and interpretation; (3) interpolation using the Optimum Thermal Interpolation System package; (4) quality control and re-analysis; and (5) data archiving and software documentation. The data products from these processes included a time series of 3D fields of temperature and salinity, 2D fields of surface dynamic height and mixed-layer depth, analysis of the Gulf Stream and rings system, and bathythermograph profiles. To date, these are the most detailed and high-quality data for mesoscale ocean modeling, data assimilation, and forecasting research. Feedback from ocean modeling groups who tested this data was incorporated into its refinement. Suggestions for DAMEE data usage include (1) ocean modeling and data assimilation studies, (2) diagnosis and theoretical studies, and (3) comparisons with locally detailed observations.

  8. Potlining Additives

    SciTech Connect

    Rudolf Keller

    2004-08-10

    In this project, a concept to improve the performance of aluminum production cells by introducing potlining additives was examined and tested. Boron oxide was added to cathode blocks, and titanium was dissolved in the metal pool; this resulted in the formation of titanium diboride and caused the molten aluminum to wet the carbonaceous cathode surface. Such wetting reportedly leads to operational improvements and extended cell life. In addition, boron oxide suppresses cyanide formation. This final report presents and discusses the results of this project. Substantial economic benefits for the practical implementation of the technology are projected, especially for modern cells with graphitized blocks. For example, with an energy savings of about 5% and an increase in pot life from 1500 to 2500 days, a cost savings of $0.023 per pound of aluminum produced is projected for a 200 kA pot.

  9. Phosphazene additives

    SciTech Connect

    Harrup, Mason K; Rollins, Harry W

    2013-11-26

    An additive comprising a phosphazene compound that has at least two reactive functional groups and at least one capping functional group bonded to phosphorus atoms of the phosphazene compound. One of the at least two reactive functional groups is configured to react with cellulose and the other of the at least two reactive functional groups is configured to react with a resin, such as an amine resin or a polycarboxylic acid resin. The at least one capping functional group is selected from the group consisting of a short chain ether group, an alkoxy group, or an aryloxy group. Also disclosed are an additive-resin admixture, a method of treating a wood product, and a wood product.

  10. The Development of a Noncontact Letter Input Interface “Fingual” Using Magnetic Dataset

    NASA Astrophysics Data System (ADS)

    Fukushima, Taishi; Miyazaki, Fumio; Nishikawa, Atsushi

    We have newly developed a noncontact letter input interface called “Fingual”. Fingual uses a glove mounted with inexpensive and small magnetic sensors. Using the glove, users can input letters by forming finger alphabet signs, a kind of sign language. The proposed method uses a dataset which consists of magnetic field measurements and the corresponding letter information. In this paper, we show two recognition methods using the dataset. The first method uses the Euclidean norm, and the second additionally uses a Gaussian function as a weighting function. Then we conducted verification experiments for the recognition rate of each method in two situations: one in which subjects used their own dataset, and one in which they used another person's dataset. As a result, the proposed method could recognize letters with a high rate in both situations, even though it is better to use one's own dataset than another person's. Though Fingual needs to collect a magnetic dataset for each letter in advance, its feature is the ability to recognize letters without complicated calculations such as inverse problems. This paper shows the results of the recognition experiments, demonstrating the utility of the proposed system “Fingual”.
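
    A minimal sketch of the two recognizers described above, assuming stand-in magnetic readings (one value per sensor): nearest neighbor by Euclidean norm, and a Gaussian-weighted vote over all stored samples.

    ```python
    import numpy as np

    def recognize_nn(reading, dataset, labels):
        d = np.linalg.norm(dataset - reading, axis=1)   # Euclidean distances
        return labels[np.argmin(d)]

    def recognize_gaussian(reading, dataset, labels, sigma=1.0):
        w = np.exp(-np.linalg.norm(dataset - reading, axis=1) ** 2 / (2 * sigma**2))
        scores = {c: w[labels == c].sum() for c in np.unique(labels)}
        return max(scores, key=scores.get)              # best-supported letter

    dataset = np.random.rand(260, 6)   # 10 samples x 26 letters, 6 magnetic sensors
    labels = np.repeat(np.array(list("abcdefghijklmnopqrstuvwxyz")), 10)
    print(recognize_nn(dataset[42], dataset, labels),
          recognize_gaussian(dataset[42], dataset, labels))
    ```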

  11. Development of a SPARK Training Dataset

    SciTech Connect

    Sayre, Amanda M.; Olson, Jarrod R.

    2015-03-01

    In its first five years, the National Nuclear Security Administration’s (NNSA) Next Generation Safeguards Initiative (NGSI) sponsored more than 400 undergraduate, graduate, and post-doctoral students in internships and research positions (Wyse 2012). In the past seven years, the NGSI program has produced, and continues to produce, a large body of scientific, technical, and policy work in targeted core safeguards capabilities and human capital development activities. Not only does the NGSI program carry out activities across multiple disciplines, but also across all U.S. Department of Energy (DOE)/NNSA locations in the United States. However, products are not readily shared among disciplines and across locations, nor are they archived in a comprehensive library. Rather, knowledge of NGSI-produced literature is localized to the researchers, clients, and internal laboratory/facility publication systems such as the Electronic Records and Information Capture Architecture (ERICA) at the Pacific Northwest National Laboratory (PNNL). There is also no incorporated way of analyzing existing NGSI literature to determine whether the larger NGSI program is achieving its core safeguards capabilities and activities. A complete library of NGSI literature could prove beneficial to a cohesive, sustainable, and more economical NGSI program. The Safeguards Platform for Automated Retrieval of Knowledge (SPARK) has been developed to be a knowledge storage, retrieval, and analysis capability to capture safeguards knowledge to exist beyond the lifespan of NGSI. During the development process, it was necessary to build a SPARK training dataset (a corpus of documents) for initial entry into the system and for demonstration purposes. We manipulated these data to gain new information about the breadth of NGSI publications and evaluated the science-policy interface at PNNL as a practical demonstration of SPARK’s intended analysis capability. The analysis demonstration sought to answer the

  12. Independent Peer Reviews

    SciTech Connect

    2012-03-16

    Independent Assessments: DOE's Systems Integrator convenes independent technical reviews to gauge progress toward meeting specific technical targets and to provide technical information necessary for key decisions.

  13. Detecting Corresponding Vertex Pairs between Planar Tessellation Datasets with Agglomerative Hierarchical Cell-Set Matching

    PubMed Central

    Huh, Yong; Yu, Kiyun; Park, Woojin

    2016-01-01

    This paper proposes a method to detect corresponding vertex pairs between planar tessellation datasets. Applying agglomerative hierarchical co-clustering, the method finds geometrically corresponding cell-set pairs, from which corresponding vertex pairs are detected. The map transformation is then performed with the vertex pairs. Since these pairs are detected independently for each corresponding cell-set pair, the method delivers improved matching performance regardless of locally uneven positional discrepancies between datasets. The proposed method was applied to complicated synthetic cell datasets assumed to be a cadastral map and a topographical map, and showed an improved result, with an F-measure of 0.84 compared to 0.48 for a previous matching method. PMID:27348229
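
    A hedged sketch of the final vertex-pairing step: once corresponding cell sets are aligned, candidate vertex pairs can be taken as mutual nearest neighbors within a distance tolerance. A KD-tree stands in for the paper's full co-clustering machinery, and the coordinates below are synthetic.

    ```python
    import numpy as np
    from scipy.spatial import cKDTree

    rng = np.random.default_rng(5)
    verts_a = rng.random((500, 2)) * 100
    verts_b = verts_a + rng.normal(0, 0.2, verts_a.shape)   # locally perturbed copy

    tree_a, tree_b = cKDTree(verts_a), cKDTree(verts_b)
    d_ab, j = tree_b.query(verts_a)     # nearest B vertex for each A vertex
    _, i = tree_a.query(verts_b)        # and vice versa

    tol = 1.0
    pairs = [(a, j[a]) for a in range(len(verts_a))
             if d_ab[a] < tol and i[j[a]] == a]   # mutual nearest within tolerance
    print("corresponding vertex pairs:", len(pairs))
    ```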

  14. Evaluation of anomalies in GLDAS-1996 dataset.

    PubMed

    Zhou, Xinyao; Zhang, Yongqiang; Yang, Yonghui; Yang, Yanmin; Han, Shumin

    2013-01-01

    Global Land Data Assimilation System (GLDAS) data are widely used for land-surface flux simulations. Therefore, simulation accuracy is largely contingent upon the accuracy of the GLDAS dataset. It is found that GLDAS land-surface model simulated runoff exhibits strong anomalies for 1996. These anomalies are investigated by evaluating four GLDAS meteorological forcing data (precipitation, air temperature, downward shortwave radiation and downward longwave radiation) in six large basins across the world (Danube, Mississippi, Yangtze, Congo, Amazon and Murray-Darling basins). Precipitation data from the Global Precipitation Climatology Centre (GPCC) are also compared with GLDAS forcing precipitation data. Large errors and a lack of monthly variability in the GLDAS-1996 precipitation data are the main sources of the anomalies in the simulated runoff. The impact of the precipitation data on simulated runoff for 1996 is investigated with the Community Atmosphere Biosphere Land Exchange (CABLE) land-surface model in the Yangtze basin, an area for which high-quality local precipitation data are available from the China Meteorological Administration (CMA). The CABLE model is driven by GLDAS daily precipitation data and CMA daily precipitation, respectively. The simulated daily and monthly runoffs obtained from CMA data are noticeably better than those obtained from GLDAS data, suggesting that GLDAS-1996 precipitation data are not so reliable for land-surface flux simulations. PMID:23579825

  15. Lifting Object Detection Datasets into 3D.

    PubMed

    Carreira, Joao; Vicente, Sara; Agapito, Lourdes; Batista, Jorge

    2016-07-01

    While data has certainly taken the center stage in computer vision in recent years, it can still be difficult to obtain in certain scenarios. In particular, acquiring ground truth 3D shapes of objects pictured in 2D images remains a challenging feat and this has hampered progress in recognition-based object reconstruction from a single image. Here we propose to bypass previous solutions such as 3D scanning or manual design, that scale poorly, and instead populate object category detection datasets semi-automatically with dense, per-object 3D reconstructions, bootstrapped from: (i) class labels, (ii) ground truth figure-ground segmentations and (iii) a small set of keypoint annotations. Our proposed algorithm first estimates camera viewpoint using rigid structure-from-motion and then reconstructs object shapes by optimizing over visual hull proposals guided by loose within-class shape similarity assumptions. The visual hull sampling process attempts to intersect an object's projection cone with the cones of minimal subsets of other similar objects among those pictured from certain vantage points. We show that our method is able to produce convincing per-object 3D reconstructions and to accurately estimate camera viewpoints on one of the most challenging existing object-category detection datasets, PASCAL VOC. We hope that our results will re-stimulate interest in joint object recognition and 3D reconstruction from a single image. PMID:27295458

  16. Massive Dataset Analysis for Geoscience Data (Invited)

    NASA Astrophysics Data System (ADS)

    Braverman, A. J.

    2013-12-01

    Many large datasets in the geosciences manifest a fundamental problem in massive data set analysis: to understand and quantify local, fine-scale structure in a global context. One approach is to reduce data in a way that preserves spatial, temporal, and inter-scale structures via discrete probability distribution estimates associated with cells of space-time grids at different resolutions. It is then possible to study relationships between cells at different scales. This talk describes the theory and implementation of such a data reduction method developed for NASA satellite missions. Data are stratified on a monthly, five-degree, latitude-longitude space-time grid to form subsets. Each subset is reduced using a clustering algorithm for which the loss function includes an information-theoretic penalty term to help choose the number of clusters and the assignment of observations to them. The clusters' centroids and populations define a set of discrete probability distributions, which become the fundamental units for data analysis. Since the cluster representatives are centroids of original data points, the distributions can be aggregated in time and space, allowing us to build statistical models that relate phenomena across scales. These ideas are illustrated with datasets produced through the application of this algorithm for the Multi-angle Imaging SpectroRadiometer (MISR) instrument.
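
    A minimal sketch of the reduction step: within one space-time grid cell, k-means centroids plus member counts define a discrete probability distribution that replaces the raw observations. The number of clusters is fixed here, whereas the described algorithm chooses it via an information-theoretic penalty.

    ```python
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(6)
    cell_obs = rng.normal(size=(50_000, 4))    # observations falling in one grid cell

    km = KMeans(n_clusters=20, n_init=10, random_state=0).fit(cell_obs)
    counts = np.bincount(km.labels_, minlength=20)
    # Atoms of the discrete distribution: (centroid, probability mass) pairs.
    distribution = list(zip(km.cluster_centers_, counts / counts.sum()))
    print("first atom:", distribution[0][0].round(2), "mass:", distribution[0][1])
    ```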

  17. Land cover trends dataset, 1973-2000

    USGS Publications Warehouse

    Soulard, Christopher E.; Acevedo, William; Auch, Roger F.; Sohl, Terry L.; Drummond, Mark A.; Sleeter, Benjamin M.; Sorenson, Daniel G.; Kambly, Steven; Wilson, Tamara S.; Taylor, Janis L.; Sayler, Kristi L.; Stier, Michael P.; Barnes, Christopher A.; Methven, Steven C.; Loveland, Thomas R.; Headley, Rachel; Brooks, Mark S.

    2014-01-01

    The U.S. Geological Survey Land Cover Trends Project is releasing a 1973–2000 time-series land-use/land-cover dataset for the conterminous United States. The dataset contains 5 dates of land-use/land-cover data for 2,688 sample blocks randomly selected within 84 ecological regions. The nominal dates of the land-use/land-cover maps are 1973, 1980, 1986, 1992, and 2000. The land-use/land-cover maps were classified manually from Landsat Multispectral Scanner, Thematic Mapper, and Enhanced Thematic Mapper Plus imagery using a modified Anderson Level I classification scheme. The resulting land-use/land-cover data has a 60-meter resolution and the projection is set to Albers Equal-Area Conic, North American Datum of 1983. The files are labeled using a standard file naming convention that contains the number of the ecoregion, sample block, and Landsat year. The downloadable files are organized by ecoregion, and are available in the ERDAS IMAGINE (.img) raster file format.

  18. Hydrologic information server for benchmark precipitation dataset

    NASA Astrophysics Data System (ADS)

    McEnery, John A.; McKee, Paul W.; Shelton, Gregory P.; Ramsey, Ryan W.

    2013-01-01

    This paper will present the methodology and overall system development by which a benchmark dataset of precipitation information has been made available. Rainfall is the primary driver of the hydrologic cycle. High-quality precipitation data is vital for hydrologic models, hydrometeorologic studies and climate analysis, and hydrologic time series observations are important to many water resources applications. Over the past two decades, with the advent of NEXRAD radar, the science of measuring and recording rainfall has improved dramatically. However, much existing data has not been readily available for public access or transferable among the agricultural, engineering and scientific communities. This project takes advantage of the existing CUAHSI Hydrologic Information System ODM model and tools to bridge the gap between data storage and data access, providing an accepted standard interface for internet access to the largest time-series dataset of NEXRAD precipitation data ever assembled. This research effort has produced an operational data system to ingest, transform, load and then serve one of the most important hydrologic variable sets.

  19. Integrated remotely sensed datasets for disaster management

    NASA Astrophysics Data System (ADS)

    McCarthy, Timothy; Farrell, Ronan; Curtis, Andrew; Fotheringham, A. Stewart

    2008-10-01

    Video imagery can be acquired from aerial, terrestrial and marine based platforms and has been exploited for a range of remote sensing applications over the past two decades. Examples include coastal surveys using aerial video, route-corridor infrastructure surveys using vehicle-mounted video cameras, aerial surveys over forestry and agriculture, underwater habitat mapping and disaster management. Many of these video systems are based on interlaced television standards such as North America's NTSC and the European SECAM and PAL systems, which are then recorded using various video formats. This technology has recently been employed as a front-line remote sensing technology for damage assessment post-disaster. This paper traces the development of spatial video as a remote sensing tool from the early 1980s to the present day. The background to a new spatial-video research initiative based at the National University of Ireland, Maynooth (NUIM), is described. New improvements are proposed, including low-cost encoders, easy-to-use software decoders, timing issues and interoperability. These developments will enable specialists and non-specialists to collect, process and integrate these datasets with minimal support. This integrated approach will enable decision makers to access relevant remotely sensed datasets quickly and so carry out rapid damage assessment during and post-disaster.

  20. Social voting advice applications-definitions, challenges, datasets and evaluation.

    PubMed

    Katakis, Ioannis; Tsapatsoulis, Nicolas; Mendez, Fernando; Triga, Vasiliki; Djouvas, Constantinos

    2014-07-01

    Voting advice applications (VAAs) are online tools that have become increasingly popular and purportedly aid users in deciding which party/candidate to vote for during an election. In this paper we present an innovation to current VAA design based on the introduction of a social network element. We refer to this new type of online tool as a social voting advice application (SVAA). SVAAs extend VAAs by providing (a) community-based recommendations, (b) comparison of users' political opinions, and (c) a channel of user communication. In addition, SVAAs, enriched with data mining modules, can operate as citizen sensors recording the sentiment of the electorate on issues and candidates. Drawing on VAA datasets generated by the Preference Matcher research consortium, we evaluate the results of the first VAA, Choose4Greece, which incorporated social voting features and was launched during the landmark Greek national elections of 2012. We demonstrate how an SVAA can provide community-based features and, at the same time, serve as a citizen sensor. Evaluation of the proposed techniques is realized on a series of datasets collected from various VAAs, including Choose4Greece. The collection is made available online in order to promote research in the field. PMID:24058045

  1. Benchmarking undedicated cloud computing providers for analysis of genomic datasets.

    PubMed

    Yazar, Seyhan; Gooden, George E C; Mackey, David A; Hewitt, Alex W

    2014-01-01

    A major bottleneck in biological discovery is now emerging at the computational level. Cloud computing offers a dynamic means whereby small and medium-sized laboratories can rapidly adjust their computational capacity. We benchmarked two established cloud computing services, Amazon Web Services Elastic MapReduce (EMR) on Amazon EC2 instances and Google Compute Engine (GCE), using publicly available genomic datasets (an E. coli CC102 strain and a Han Chinese male genome) and a standard bioinformatic pipeline on a Hadoop-based platform. Wall-clock time for complete assembly differed by 52.9% (95% CI: 27.5-78.2) for E. coli and 53.5% (95% CI: 34.4-72.6) for the human genome, with GCE being more efficient than EMR. The cost of running this experiment on EMR and GCE differed significantly, with the costs on EMR being 257.3% (95% CI: 211.5-303.1) and 173.9% (95% CI: 134.6-213.1) more expensive for the E. coli and human assemblies, respectively. Thus, GCE was found to outperform EMR in terms of both cost and wall-clock time. Our findings confirm that cloud computing is an efficient and potentially cost-effective alternative for analysis of large genomic datasets. In addition to releasing our cost-effectiveness comparison, we present ready-to-use scripts for establishing Hadoop instances with Ganglia monitoring on EC2 or GCE. PMID:25247298

  2. Publicly Releasing a Large Simulation Dataset with NDS Labs

    NASA Astrophysics Data System (ADS)

    Goldbaum, Nathan

    2016-03-01

    Optimally, all publicly funded research should be accompanied by the tools, code, and data necessary to fully reproduce the analysis performed in journal articles describing the research. This ideal can be difficult to attain, particularly when dealing with large (>10 TB) simulation datasets. In this lightning talk, we describe the process of publicly releasing a large simulation dataset to accompany the submission of a journal article. The simulation was performed using Enzo, an open source, community-developed N-body/hydrodynamics code, and was analyzed using a wide range of community-developed tools in the scientific Python ecosystem. Having performed and analyzed the simulation with an ecosystem of sustainably developed tools, we further enable sustainable science by making the data publicly available. Combining the data release with the NDS Labs infrastructure adds substantial value, including web-based access to analysis and visualization using the yt analysis package through an IPython notebook interface. In addition, we are able to accompany the paper submission to the arXiv preprint server with links to the raw simulation data as well as interactive real-time data visualizations that readers can explore on their own or share with colleagues during journal club discussions. It is our hope that the value added by these services will substantially increase the impact and readership of the paper.
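    For readers unfamiliar with the toolchain mentioned above, here is a minimal sketch of loading an Enzo output with yt and saving a density slice; the dataset path is a placeholder, not the released dataset's actual layout.

        # Sketch: load one Enzo simulation output with yt and render a density slice.
        import yt

        ds = yt.load("DD0046/DD0046")                    # placeholder Enzo output path
        slc = yt.SlicePlot(ds, "z", ("gas", "density"))  # slice normal to the z-axis
        slc.save("density_slice.png")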

  3. Multiresolution comparison of precipitation datasets for large-scale models

    NASA Astrophysics Data System (ADS)

    Chun, K. P.; Sapriza Azuri, G.; Davison, B.; DeBeer, C. M.; Wheater, H. S.

    2014-12-01

    Gridded precipitation datasets are crucial for driving large-scale models used in weather forecasting and climate research. However, the quality of each precipitation product is usually validated individually. Comparing gridded precipitation products against one another, along with ground observations, provides another avenue for investigating how precipitation uncertainty affects the performance of large-scale models. In this study, using data from a set of precipitation gauges over British Columbia and Alberta, we evaluate several widely used North American gridded products, including the Canadian Gridded Precipitation Anomalies (CANGRD), the National Center for Environmental Prediction (NCEP) reanalysis, the Water and Global Change (WATCH) project, the thin plate spline smoothing algorithm (ANUSPLIN), and the Canadian Precipitation Analysis (CaPA). Based on verification criteria for various temporal and spatial scales, the results provide an assessment of possible applications for the various precipitation datasets. For long-term climate variation studies (~100 years), CANGRD, NCEP, WATCH, and ANUSPLIN have different comparative advantages in terms of resolution and accuracy. For synoptic and mesoscale precipitation patterns, CaPA provides appealing spatial coherence. In addition to the product comparison, various downscaling methods are also surveyed to explore new verification and bias-reduction methods for improving gridded precipitation outputs for large-scale models.
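    As a hedged illustration of the kind of gauge-based verification criteria involved, the sketch below computes bias, root-mean-square error, and correlation between co-located gauge and gridded values; the numbers are invented for illustration.

        # Sketch: basic verification statistics for a gridded product against gauges.
        import numpy as np

        gauge = np.array([2.0, 0.0, 5.1, 1.3, 0.4])  # illustrative gauge accumulations (mm)
        grid = np.array([1.6, 0.2, 4.4, 1.8, 0.1])   # co-located gridded estimates (mm)

        bias = np.mean(grid - gauge)
        rmse = np.sqrt(np.mean((grid - gauge) ** 2))
        corr = np.corrcoef(grid, gauge)[0, 1]
        print(f"bias={bias:.2f} mm, RMSE={rmse:.2f} mm, r={corr:.2f}")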

  4. A comparison of clustering methods for biogeography with fossil datasets

    PubMed Central

    2016-01-01

    Cluster analysis is one of the most commonly used methods in palaeoecological studies, particularly in studies investigating biogeographic patterns. Although a number of different clustering methods are widely used, the approaches and underlying assumptions of many of these methods are quite different. For example, methods may be hierarchical or non-hierarchical in their approaches, and may use Euclidean distance or non-Euclidean indices to cluster the data. In order to assess the effectiveness of the different clustering methods relative to one another, a simulation was designed that could assess each method over a range of both cluster distinctiveness and sampling intensity. Additionally, a non-hierarchical, non-Euclidean, iterative clustering method implemented in the R Statistical Language is described. This method, Non-Euclidean Relational Clustering (NERC), creates distinct clusters by dividing the data set so as to maximize the average similarity within each cluster, identifying clusters in which each data point is on average more similar to those within its own group than to those in any other group. While all the methods performed well with clearly differentiated and well-sampled datasets, when data are less than ideal the linkage methods perform poorly compared to non-Euclidean k-means and the NERC method. Based on this analysis, the Unweighted Pair Group Method with Arithmetic Mean and neighbor-joining methods are less reliable with incomplete datasets like those found in palaeobiological analyses, and the k-means and NERC methods should be used in their place. PMID:26966658
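    NERC itself is published as an R implementation; purely to illustrate the underlying idea (iteratively reassigning samples to the group with the highest average non-Euclidean similarity), here is a Python sketch using Jaccard similarity on presence/absence data. All names and data are invented.

        # Sketch of the NERC idea: iterative, non-Euclidean relational clustering.
        # Illustrative only; not the published R implementation.
        import numpy as np

        def jaccard(a, b):
            union = np.sum(a | b)
            return np.sum(a & b) / union if union else 1.0

        def cluster(data, k, n_iter=20, seed=0):
            labels = np.random.default_rng(seed).integers(k, size=len(data))
            for _ in range(n_iter):
                for i, row in enumerate(data):
                    # reassign sample i to the cluster of highest mean similarity
                    scores = [np.mean([jaccard(row, data[j])
                                       for j in np.where(labels == c)[0] if j != i] or [0.0])
                              for c in range(k)]
                    labels[i] = int(np.argmax(scores))
            return labels

        taxa = np.random.default_rng(1).integers(0, 2, size=(30, 12)).astype(bool)
        print(cluster(taxa, k=3))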

  5. Statistics of large detrital geochronology datasets

    NASA Astrophysics Data System (ADS)

    Saylor, J. E.; Sundell, K. E., II

    2014-12-01

    Implementation of quantitative metrics for inter-sample comparison of detrital geochronological data sets has lagged behind the increase in data set size and in the ability to identify sub-populations and quantify their relative proportions. Visual comparison, and statistical approaches such as the Kolmogorov-Smirnov (KS) test that initially appeared to provide a simple way of comparing detrital data sets, may be inadequate to quantify their similarity. We evaluate several proposed metrics by applying them to four large synthetic datasets drawn randomly from a parent dataset, as well as to a recently published large empirical dataset consisting of four separate (n = ~1000 each) analyses of the same rock sample. Visual inspection of the cumulative probability density functions (CDF) and relative probability density functions (PDF) confirms an increasingly close correlation between data sets as the number of analyses increases. However, as data set size increases, the KS test yields lower mean p-values, implying greater confidence that the samples were not drawn from the same parent population, and high standard deviations, despite minor decreases in the mean difference between sample CDFs. We attribute this to the increasing sensitivity of the KS test when applied to larger data sets, which in turn limits its use for quantitative inter-sample comparison in detrital geochronology. Proposed alternative metrics, including Similarity, Likeness (the complement of Mismatch), and the coefficient of determination (R2) of a cross-plot of PDF quantiles, point to an increasingly close correlation between data sets with increasing size, although they are most sensitive at different ranges of data set sizes. The Similarity test is most sensitive to variation in data sets with n < 100 and is relatively insensitive to further convergence between larger data sets. The Likeness test reaches 90% of its asymptotic maximum at data set sizes of n = 200. The PDF cross-plot R2 value
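    The KS behavior described above is easy to reproduce; in the hedged sketch below, two synthetic age samples differ by a small fixed offset, and the KS p-value collapses as n grows even though the underlying difference stays constant.

        # Sketch: two-sample KS test sensitivity as detrital sample size grows.
        import numpy as np
        from scipy.stats import ks_2samp

        rng = np.random.default_rng(42)

        def ages(n):
            # two illustrative age modes (Ma)
            return np.concatenate([rng.normal(100, 10, n // 2),
                                   rng.normal(300, 30, n - n // 2)])

        for n in (50, 200, 1000):
            stat, p = ks_2samp(ages(n), ages(n) + 2.0)  # 2 Ma systematic offset
            print(f"n={n}: D={stat:.3f}, p={p:.3f}")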

  6. Dataset-Driven Research to Support Learning and Knowledge Analytics

    ERIC Educational Resources Information Center

    Verbert, Katrien; Manouselis, Nikos; Drachsler, Hendrik; Duval, Erik

    2012-01-01

    In various research areas, the availability of open datasets is considered key for research and application purposes. These datasets are used as benchmarks to develop new algorithms and to compare them to other algorithms in given settings. Finding such available datasets for experimentation can be a challenging task in technology enhanced…

  7. SAGE Research Methods Datasets: A Data Analysis Educational Tool.

    PubMed

    Vardell, Emily

    2016-01-01

    SAGE Research Methods Datasets (SRMD) is an educational tool designed to offer users the opportunity to obtain hands-on experience with data analysis. Users can search for and browse authentic datasets by method, discipline, and data type. Each of the datasets is supplemented with educational material on the research method and clear guidelines for how to approach data analysis. PMID:27391182

  8. ADAM: automated data management for research datasets

    PubMed Central

    Woodbridge, Mark; Tomlinson, Christopher D.; Butcher, Sarah A.

    2013-01-01

    Existing repositories for experimental datasets typically capture snapshots of data acquired using a single experimental technique and often require manual population and continual curation. We present a storage system for heterogeneous research data that performs dynamic automated indexing to provide powerful search, discovery and collaboration features without the restrictions of a structured repository. ADAM is able to index many commonly used file formats generated by laboratory assays and therefore offers specific advantages to the experimental biology community. However, it is not domain specific and can promote sharing and re-use of working data across scientific disciplines. Availability and implementation: ADAM is implemented using Java and supported on Linux. It is open source under the GNU General Public License v3.0. Installation instructions, binary code, a demo system, and a virtual machine image are available at http://www.imperial.ac.uk/bioinfsupport/resources/software/adam. Contact: m.woodbridge@imperial.ac.uk PMID:23109181

  9. National hydrography dataset--linear referencing

    USGS Publications Warehouse

    Simley, Jeffrey; Doumbouya, Ariel

    2012-01-01

    Geospatial data normally have a certain set of standard attributes, such as an identification number, the type of feature, and the name of the feature. These standard attributes are typically embedded into the default attribute table, which is directly linked to the geospatial features. However, it is impractical to embed too much information, because doing so can create a complex, inflexible, and hard-to-maintain geospatial dataset. Many scientists prefer to create a modular, or relational, data design where the information about the features is stored and maintained separately, then linked to the geospatial data. For example, information about the water chemistry of a lake can be maintained in a separate file and linked to the lake. A Geographic Information System (GIS) can then relate the water chemistry to the lake and analyze it as one piece of information. For example, the GIS can select all lakes larger than 50 acres with turbidity greater than 1.5 milligrams per liter.
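    A hedged sketch of the relational pattern described above: a separate chemistry table is joined to the feature table on a shared key and then queried. The column names and values are invented for illustration.

        # Sketch: relational join of lake features to a separate water-chemistry table.
        import pandas as pd

        lakes = pd.DataFrame({"lake_id": [1, 2, 3], "acres": [120.0, 40.0, 75.0]})
        chem = pd.DataFrame({"lake_id": [1, 2, 3], "turbidity_mg_l": [2.1, 0.8, 1.7]})

        joined = lakes.merge(chem, on="lake_id")  # relate chemistry to each lake
        print(joined.query("acres > 50 and turbidity_mg_l > 1.5"))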

  10. Internationally coordinated glacier monitoring: strategy and datasets

    NASA Astrophysics Data System (ADS)

    Hoelzle, Martin; Armstrong, Richard; Fetterer, Florence; Gärtner-Roer, Isabelle; Haeberli, Wilfried; Kääb, Andreas; Kargel, Jeff; Nussbaumer, Samuel; Paul, Frank; Raup, Bruce; Zemp, Michael

    2014-05-01

    (c) the Randolph Glacier Inventory (RGI), a new and globally complete digital dataset of outlines from about 180,000 glaciers with some meta-information, which has been used for many applications relating to the IPCC AR5 report. Concerning glacier changes, a database (Fluctuations of Glaciers) exists containing information about mass balance, front variations including past reconstructed time series, geodetic changes, and special events. Annual mass balance reporting contains information for about 125 glaciers, with a subset of 37 glaciers with continuous observational series since 1980 or earlier. Front variation observations of around 1800 glaciers are available from most of the mountain ranges world-wide. This database was recently updated with 26 glaciers having an unprecedented dataset of length changes from reconstructions of well-dated historical evidence going back as far as the 16th century. Geodetic observations of about 430 glaciers are available. The database is completed by a dataset containing information on special events, including glacier surges, glacier lake outbursts, ice avalanches, eruptions of ice-clad volcanoes, etc., related to about 200 glaciers. A special database of glacier photographs contains 13,000 pictures from around 500 glaciers, some of them dating back to the 19th century. A key challenge is to combine and extend the traditional observations with fast evolving datasets from new technologies.

  11. VAST Contest Dataset Use in Education

    SciTech Connect

    Whiting, Mark A.; North, Chris; Endert, Alexander; Scholtz, Jean; Haack, Jereme N.; Varley, Caroline F.; Thomas, James J.

    2009-12-13

    The IEEE Visual Analytics Science and Technology (VAST) Symposium has held a contest each year since its inception in 2006. These events are designed to provide visual analytics researchers and developers with analytic challenges similar to those encountered by professional information analysts. The VAST contest has had an extended life outside of the symposium, however, as materials are being used in universities and other educational settings, either to help teachers of visual analytics-related classes or for student projects. We describe how we develop VAST contest datasets that result in products that can be used in different settings, and we review some specific examples of the adoption of the VAST contest materials in the classroom. The examples are drawn from graduate and undergraduate courses at Virginia Tech and from the Visual Analytics "Summer Camp" run by the National Visualization and Analytics Center in 2008. We finish with a brief discussion on evaluation metrics for education.

  12. A Bayesian reanalysis of the quasar dataset

    NASA Astrophysics Data System (ADS)

    Cameron, E.; Pettitt, A. N.

    We investigate recent claims of spatial variation in the fine structure constant on cosmic distance scales based on estimates of its extra-galactic-to-on-Earth ratio recovered from "many multiplet" fitting of quasar absorption spectra. To overcome the limitations of previous analyses, which required the assumption of a strictly unbiased and Normal distribution for the "unexplained errors" of this quasar dataset, we employ a Bayesian model selection strategy with prior-sensitivity analysis. A particular strength of the hypothesis testing methodology advocated herein is that it can handle both parametric and semi-parametric models self-consistently through a combination of recursive marginal likelihood estimation and importance sample reweighting. We conclude from the presently available data that the observed trends are more likely to arise from biases of opposing sign in the two telescopes used to undertake these measurements than from a genuine large-scale trend in this fundamental "constant".

  13. Advanced Subsetter Capabilities for Atmospheric Science Datasets

    NASA Astrophysics Data System (ADS)

    Baskin, W. E.; Perez, J.

    2012-12-01

    Within the last three years, the NASA Atmospheric Sciences Data Center (ASDC) has developed and deployed production provider-specific search and subset web applications for the CALIPSO, CERES, and TES missions. ASDC is now collaborating with the MOPITT science team to provide tailored subsetting for their level 2 satellite datasets, leveraging the architecture of the recently deployed subsetting systems. This presentation explores the challenges encountered by the ASDC's development team and discusses solutions implemented for the following advanced subsetter capabilities: on-the-fly conversion of subsetted HDF data granules to NetCDF; generation of CF-compliant subset results for non-gridded data (level 2 swaths); parameter-specific filtering; multi-dimensional spatial subsetting; and complex temporal subsetting (temporal filtering).
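    A hedged sketch of the conversion-plus-subset idea: open a granule with xarray, apply parameter and spatial filters, and write the result as NetCDF. The file name, variable, and coordinate names are placeholders, and this assumes an HDF5-based granule readable by the h5netcdf engine.

        # Sketch: subset a granule and convert the result to NetCDF.
        import xarray as xr

        ds = xr.open_dataset("granule.h5", engine="h5netcdf")   # placeholder granule
        subset = ds[["aerosol_optical_depth"]].sel(             # parameter filtering
            latitude=slice(30, 50), longitude=slice(-110, -90)  # spatial subsetting
        )
        subset.to_netcdf("subset.nc")                           # NetCDF output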

  14. LIMS Version 6 Level 3 Dataset

    NASA Technical Reports Server (NTRS)

    Remsberg, Ellis E.; Lingenfelser, Gretchen

    2010-01-01

    This report describes the Limb Infrared Monitor of the Stratosphere (LIMS) Version 6 (V6) Level 3 data products and the assumptions used for their generation. A sequential estimation algorithm was used to obtain daily, zonal Fourier coefficients of the several parameters of the LIMS dataset for 216 days of 1978-79. The coefficients are available at up to 28 pressure levels, at every two degrees of latitude from 64°S to 84°N, and at the synoptic time of 12 UT. Example plots were prepared and archived from the data at 10 hPa for January 1, 1979, to illustrate the overall coherence of the features obtained with the LIMS-retrieved parameters.
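    For readers working with the Level 3 product, a minimal sketch of synthesizing a longitude profile at one latitude and pressure level from zonal Fourier coefficients; the coefficient values are invented for illustration.

        # Sketch: reconstruct a field from daily zonal Fourier coefficients.
        import numpy as np

        a0, a, b = 220.0, [1.5, 0.7], [0.9, -0.3]  # illustrative wave 1-2 coefficients (K)
        lon = np.radians(np.arange(0.0, 360.0, 5.0))
        field = a0 + sum(ak * np.cos((k + 1) * lon) + bk * np.sin((k + 1) * lon)
                         for k, (ak, bk) in enumerate(zip(a, b)))
        print(field.min(), field.max())  # temperature vs. longitude at 12 UT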

  15. Asteroids in the EXPLORE II Dataset

    NASA Astrophysics Data System (ADS)

    Schmoll, S.; Mallen-Ornelas, G.; Holman, M.

    2005-12-01

    The inner asteroid belt holds information about the solar system's history and future. The currently accepted theory of planet formation is that smaller rocky bodies collided and formed the planets of the inner solar system, and asteroids are relics of this past. Furthermore, near-Earth objects that could potentially collide with us usually originate in the main belt. Determining the size distribution of the main-belt asteroids is key to unlocking the processes of planet formation and possible problems with near-Earth objects. Here the EXtra Solar PLanet Occultation (EXPLORE) II data taken with the CFH12K mosaic CCD prime focus camera on the CFHT 3.6-m telescope are used to find the size distribution of main belt asteroids. The EXPLORE Project is an extrasolar planet detection survey that focuses on one patch of the sky per observing run. The resultant data have more observations per asteroid than any preceding deep asteroid search. Here a pipeline is presented to find the asteroids in this dataset, along with the other four EXPLORE datasets. This is done by processing the data with an image subtraction package called ISIS (Alard et al. 1997) and custom masking using IRAF. Asteroids are found using SExtractor (Bertin et al. 1996) and a set of custom C programs that detect moving objects in a series of images. Then light curves are created for each asteroid found. Sizes can be estimated based on the absolute magnitudes of the asteroids. We present absolute magnitudes and a preliminary size distribution for the >52 asteroids found thus far. This research was made possible by the NSF and SAO REU Program.
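    Estimating a size from an absolute magnitude uses the standard relation D(km) = 1329 · 10^(-H/5) / sqrt(p_V), which requires assuming a geometric albedo p_V; a minimal sketch:

        # Sketch: asteroid diameter from absolute magnitude H and an assumed albedo.
        import math

        def diameter_km(H, albedo=0.15):
            # standard relation: D = 1329 km * 10^(-H/5) / sqrt(p_V)
            return 1329.0 * 10 ** (-H / 5.0) / math.sqrt(albedo)

        print(f"{diameter_km(16.0):.1f} km")  # about 2.2 km for H = 16, p_V = 0.15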

  16. EVALUATION OF LAND USE/LAND COVER DATASETS FOR URBAN WATERSHED MODELING

    SciTech Connect

    S.J. BURIAN; M.J. BROWN; T.N. MCPHERSON

    2001-08-01

    Land use/land cover (LULC) data are a vital component for nonpoint source pollution modeling. Most watershed hydrology and pollutant loading models use, in some capacity, LULC information to generate runoff and pollutant loading estimates. Simple equation methods predict runoff and pollutant loads using runoff coefficients or pollutant export coefficients that are often correlated to LULC type. Complex models use input variables and parameters to represent watershed characteristics and pollutant buildup and washoff rates as a function of LULC type. Whether using simple or complex models, an accurate LULC dataset with an appropriate spatial resolution and level of detail is paramount for reliable predictions. The study presented in this paper compared and evaluated several LULC dataset sources for application in urban environmental modeling. The commonly used USGS LULC datasets have coarser spatial resolution and lower levels of classification than other LULC datasets. In addition, the USGS datasets do not accurately represent the land use in areas that have undergone significant land use change during the past two decades. We performed a watershed modeling analysis of three urban catchments in Los Angeles, California, USA to investigate the relative difference in average annual runoff volumes and total suspended solids (TSS) loads when using the USGS LULC dataset versus a more detailed and current LULC dataset. When the two LULC datasets were aggregated to the same land use categories, the relative differences in predicted average annual runoff volumes and TSS loads from the three catchments were 8 to 14% and 13 to 40%, respectively. The relative differences did not have a predictable relationship with catchment size.

  17. Independent Schools - Independent Thinking - Independent Art: Testing Assumptions.

    ERIC Educational Resources Information Center

    Carnes, Virginia

    This study consists of a review of selected educational reform issues from the past 10 years that deal with changing attitudes towards art and art instruction in the context of independent private sector schools. The major focus of the study is in visual arts and examines various programs and initiatives with an art focus. Programs include…

  18. Analysis Summary of an Assembled Western U.S. Dataset

    SciTech Connect

    Ryall, F

    2005-03-22

    The dataset for this report is described in Walter et al. (2004) and consists primarily of Nevada Test Site (NTS) explosions, hole collapses, and earthquakes. In addition, there were several earthquakes in California and Utah; earthquakes recorded near Cataract Creek, Arizona; mine blasts at two areas in Arizona; and two mine collapses in Wyoming. In the vicinity of NTS there were mainshock/aftershock sequences at Little Skull Mt., Scotty's Junction, and Hector Mine. All the events were shallow, and distances ranged from about 0.1 degree to regional distances. All of the data for these events were carefully reviewed and analyzed. In the following sections of the report, we describe analysis procedures, problems with the data, and results of the analysis.

  19. An Alternative Measure of Solar Activity from Detailed Sunspot Datasets

    NASA Astrophysics Data System (ADS)

    Muraközy, J.; Baranyi, T.; Ludmány, A.

    2016-05-01

    The sunspot number is analyzed by using detailed sunspot data, including aspects of observability, sunspot sizes, and proper identification of sunspot groups as discrete entities of solar activity. The tests show that in addition to the subjective factors there are also objective causes of the ambiguities in the series of sunspot numbers. To introduce an alternative solar-activity measure, the physical meaning of the sunspot number has to be reconsidered. It contains two components whose numbers are governed by different physical mechanisms and this is one source of the ambiguity. This article suggests an activity index, which is the amount of emerged magnetic flux. The only long-term proxy measure is the detailed sunspot-area dataset with proper calibration to the magnetic flux. The Debrecen sunspot databases provide an appropriate source for the establishment of the suggested activity index.
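    The two components referred to above are the ones entering the classical Wolf (relative) sunspot number, R = k(10g + s), where g is the number of sunspot groups, s the number of individual spots, and k an observer-dependent scaling factor; a trivial sketch:

        # Sketch: the classical Wolf relative sunspot number, R = k * (10 * g + s).
        def wolf_number(groups, spots, k=1.0):
            # k is the observer/instrument scaling factor
            return k * (10 * groups + spots)

        print(wolf_number(groups=4, spots=23))  # 63.0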

  20. Utilizing the Antarctic Master Directory to find orphan datasets

    NASA Astrophysics Data System (ADS)

    Bonczkowski, J.; Carbotte, S. M.; Arko, R. A.; Grebas, S. K.

    2011-12-01

    identifying the records containing a URL leading to a national data center or other disciplinary data repository, the remaining records were individually inspected for data type, format, and quality of metadata, and then assessed to determine how best to preserve them. Of the records reviewed, those for which appropriate repositories could be identified were submitted. An additional 35 had metadata of acceptable quality to register in the USAP-DCC. The contents of these datasets varied in nature, ranging from penguin counts to paleo-geologic maps to results of meteorological models, all of which are discoverable through our search interface, http://www.usap-data.org/search.php. The remaining 40 records either linked to no data or had inadequate documentation for preservation, highlighting the danger of serving datasets on local servers where minimal metadata standards cannot be enforced and long-term access cannot be ensured.

  1. Large scale validation of the M5L lung CAD on heterogeneous CT datasets

    SciTech Connect

    Lopez Torres, E. E-mail: cerello@to.infn.it; Fiorina, E.; Pennazio, F.; Peroni, C.; Saletta, M.; Cerello, P. E-mail: cerello@to.infn.it; Camarlinghi, N.; Fantacci, M. E.

    2015-04-15

    Purpose: M5L, a fully automated computer-aided detection (CAD) system for the detection and segmentation of lung nodules in thoracic computed tomography (CT), is presented and validated on several image datasets. Methods: M5L is the combination of two independent subsystems, based on the Channeler Ant Model as a segmentation tool [lung channeler ant model (lungCAM)] and on the voxel-based neural approach. The lungCAM was upgraded with a scan equalization module and a new procedure to recover the nodules connected to other lung structures; its classification module, which makes use of a feed-forward neural network, is based on a small number of features (13), so as to minimize the risk of lacking generalization, which could be possible given the large difference between the sizes of the training and testing datasets, which contain 94 and 1019 CTs, respectively. The lungCAM (standalone) and M5L (combined) performance was extensively tested on 1043 CT scans from three independent datasets, including a detailed analysis of the full Lung Image Database Consortium/Image Database Resource Initiative database, which is not yet found in the literature. Results: The lungCAM and M5L performance is consistent across the databases, with sensitivities of about 70% and 80%, respectively, at eight false positive findings per scan, despite the variable annotation criteria and acquisition and reconstruction conditions. A reduced sensitivity is found for subtle nodules and ground glass opacity (GGO) structures. A comparison with other CAD systems is also presented. Conclusions: The M5L performance on a large and heterogeneous dataset is stable and satisfactory, although the development of a dedicated module for GGO detection could further improve it, as could an iterative optimization of the training procedure. The main aim of the present study was accomplished: M5L results do not deteriorate when increasing the dataset size, making it a candidate for supporting radiologists on large

  2. Middle Atmosphere Transport Properties of Assimilated Datasets

    NASA Technical Reports Server (NTRS)

    Pawson, Steven; Rood, Richard

    1999-01-01

    One of the most compelling reasons for performing data assimilation in the middle atmosphere is to obtain global, balanced datasets for studies of trace gas transport and chemistry. This is a major motivation behind the Goddard Earth Observing System Data Assimilation System (GEOS-DAS). Previous studies have shown that while this and other data assimilation systems can generally obtain good estimates of the extratropical rotational velocity field, the divergent part of the dynamical field is deficient; this impacts the "residual circulation" and leads to spurious trace gas transport on seasonal and interannual timescales. These problems are impacted by the quality and the method of use of the observational data and by deficiencies in the atmospheric general circulation model. Whichever the cause at any place and time, the "solution" is to introduce non-physical forcing terms into the system (the so-called incremental analysis updates); these can directly (thermal) or indirectly (mechanical) affect the residual circulation. This paper will illustrate how the divergent circulation is affected by deficiencies in both observations and models. Theoretical considerations will be illustrated with examples from the GEOS-DAS and from simplified numerical experiments. These are designed to isolate known problems, such as the inability of models to sustain a quasi-biennial oscillation and sparse observational constraints on tropical dynamics, or radiative inconsistencies in the presence of volcanic aerosols.

  3. Intercalibration of Mars Global Surveyor Datasets

    NASA Technical Reports Server (NTRS)

    Houben, Howard; Bergstrom, R. W.; Hollingsworth, J.; Smith, M.; Martin, T.; Hinson, D.; DeVincenizi, D. (Technical Monitor)

    2002-01-01

    The calibration and validation of satellite soundings of atmospheric variables is always a difficult prospect, but this difficulty is greatly magnified when the measurements are made at a different planet, whose meteorology is poorly known and poorly constrained, and for which there are virtually no prospects of obtaining ground truth. The Mars Global Surveyor, which has been circling Mars in its mapping orbit since early 1999, includes a variety of instruments capable of making atmospheric observations: the Thermal Emission Spectrometer (TES), which takes more than 100,000 nadir-view infrared spectra per day (although these observations are confined to the 2 am-2 pm times of the sun-fixed orbit); much less frequent TES limb scans (still only at 2 am and 2 pm); the Mars Horizon Sensor Assembly, which measures side-looking broadband 15 micrometer radiation; Radio Science occultations, which at favorable seasons give high-resolution temperature profiles; and the Mars Orbiter Camera and Mars Orbiter Laser Altimeter, which have made water, dust, and carbon dioxide cloud detections. These observations are now being supplemented by high-resolution 15 micron measurements by THEMIS on Mars Odyssey. Thus, all of these observations are made at different times and places. Data assimilation techniques are being used to fuse this vast array of observations into a single dataset that best represents our understanding of the Martian atmosphere, its current meteorological state, and the relevant instrumental properties.

  5. Reconstructing thawing quintessence with multiple datasets

    NASA Astrophysics Data System (ADS)

    Lima, Nelson A.; Liddle, Andrew R.; Sahlén, Martin; Parkinson, David

    2016-03-01

    In this work we model the quintessence potential as a Taylor series expansion, up to second order, around the present-day value of the scalar field. The field is evolved in a thawing regime assuming zero initial velocity. We use the latest data from the Planck satellite, baryonic acoustic oscillation observations from the Sloan Digital Sky Survey, and supernova luminosity distance information from Union2.1 to constrain our model's parameters, and we also include perturbation growth data from the WiggleZ, BOSS, and 6dF surveys. The supernova data provide the strongest individual constraint on the potential parameters. We show that the growth data are competitive with the other datasets in constraining the dark energy parameters we introduce. We also conclude that the combined constraints we obtain for our model parameters, when compared to previous works of nearly a decade ago, show only modest improvement, even with new growth of structure data added to previously existing types of data.
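    Concretely, the expansion in question has the form below, where φ0 is the present-day field value and V0, V1, V2 are the constrained parameters (the notation is assumed here, following standard conventions rather than the paper itself):

        V(\phi) \simeq V_0 + V_1\,(\phi - \phi_0) + \tfrac{1}{2}\,V_2\,(\phi - \phi_0)^2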

  6. Visualization of cosmological particle-based datasets.

    PubMed

    Navratil, Paul; Johnson, Jarrett; Bromm, Volker

    2007-01-01

    We describe our visualization process for a particle-based simulation of the formation of the first stars and their impact on cosmic history. The dataset consists of several hundred time-steps of point simulation data, with each time-step containing approximately two million point particles. For each time-step, we interpolate the point data onto a regular grid using a method taken from the radiance estimate of photon mapping. We import the resulting regular grid representation into ParaView, with which we extract isosurfaces across multiple variables. Our images provide insights into the evolution of the early universe, tracing the cosmic transition from an initially homogeneous state to one of increasing complexity. Specifically, our visualizations capture the build-up of regions of ionized gas around the first stars, their evolution, and their complex interactions with the surrounding matter. These observations will guide the upcoming James Webb Space Telescope, the key astronomy mission of the next decade. PMID:17968129
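    A hedged sketch of the gather step in that interpolation (in the spirit of the photon-mapping radiance estimate, where the density at a grid node follows from the radius enclosing its k nearest particles); the particle data here are synthetic and far smaller than the actual dataset.

        # Sketch: kNN density estimate of point particles onto a regular grid.
        import numpy as np
        from scipy.spatial import cKDTree

        rng = np.random.default_rng(0)
        particles = rng.random((2000, 3))            # synthetic particle positions
        masses = np.full(len(particles), 1.0)

        nodes = np.stack(np.meshgrid(*[np.linspace(0, 1, 16)] * 3,
                                     indexing="ij"), axis=-1).reshape(-1, 3)
        dist, idx = cKDTree(particles).query(nodes, k=32)
        r = dist[:, -1]                              # radius enclosing 32 particles
        density = masses[idx].sum(axis=1) / (4.0 / 3.0 * np.pi * r ** 3)
        print(density.reshape(16, 16, 16).max())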

  7. Classification of antimicrobial peptides with imbalanced datasets

    NASA Astrophysics Data System (ADS)

    Camacho, Francy L.; Torres, Rodrigo; Ramos Pollán, Raúl

    2015-12-01

    In recent years, pattern recognition has been applied in many fields to solve problems in science and technology, for example in protein prediction. This methodology can be useful for predicting the activity of biological molecules, e.g., for determining the antimicrobial activity of synthetic and natural peptides. In this work, we evaluate the performance of different physico-chemical properties of peptides (descriptor groups) in the presence of imbalanced datasets, when facing the task of detecting whether a peptide has antimicrobial activity. We evaluate undersampling and class weighting techniques to deal with the class imbalance, using different classification methods and descriptor groups. Our classification model showed an estimated precision of 96%, showing that the descriptors used to codify the amino acid sequences contain enough information to correlate the peptide sequences with their antimicrobial activity by means of machine learning. Moreover, we show how certain descriptor groups (pseudo-amino acid composition type I) work better with imbalanced datasets while others (dipeptide composition) work better with balanced ones.
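    A hedged sketch of the two imbalance strategies evaluated, class weighting and random undersampling, on synthetic features (not the paper's actual descriptor sets):

        # Sketch: class weighting vs. random undersampling on an imbalanced problem.
        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.metrics import balanced_accuracy_score
        from sklearn.model_selection import train_test_split
        from sklearn.svm import SVC

        X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
        Xtr, Xte, ytr, yte = train_test_split(X, y, stratify=y, random_state=0)

        # Strategy 1: cost-sensitive learning via class weights.
        weighted = SVC(class_weight="balanced").fit(Xtr, ytr)

        # Strategy 2: randomly undersample the majority class to the minority size.
        rng = np.random.default_rng(0)
        maj, mino = np.where(ytr == 0)[0], np.where(ytr == 1)[0]
        keep = np.concatenate([rng.choice(maj, size=len(mino), replace=False), mino])
        under = SVC().fit(Xtr[keep], ytr[keep])

        for name, clf in (("weighted", weighted), ("undersampled", under)):
            print(name, balanced_accuracy_score(yte, clf.predict(Xte)))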

  8. ASSESSING THE ACCURACY OF NATIONAL LAND COVER DATASET AREA ESTIMATES AT MULTIPLE SPATIAL EXTENTS

    EPA Science Inventory

    Site-specific accuracy assessments provide fine-scale evaluation of the thematic accuracy of land use/land cover (LULC) datasets; however, they provide little insight into LULC accuracy across varying spatial extents. Additionally, LULC data are typically used to describe lands...

  9. Provenance Challenges for Earth Science Dataset Publication

    NASA Technical Reports Server (NTRS)

    Tilmes, Curt

    2011-01-01

    Modern science is increasingly dependent on computational analysis of very large data sets. Organizing, referencing, and publishing those data has become a complex problem. Published research that depends on such data often fails to cite the data in sufficient detail to allow an independent scientist to reproduce the original experiments and analyses. This paper explores some of the challenges related to data identification, equivalence, and reproducibility in the domain of data-intensive scientific processing. It uses the example of Earth Science satellite data, but the challenges also apply to other domains.

  10. Prioritizing Cancer Therapeutic Small Molecules by Integrating Multiple OMICS Datasets

    PubMed Central

    Lv, Sali; Xu, Yanjun; Chen, Xin; Li, Yan; Li, Ronghong; Wang, Qianghu

    2012-01-01

    Drug design is crucial for the effective discovery of anti-cancer drugs. The success or failure of drug design often depends on the leading compounds screened in pre-clinical studies. Many efforts, such as in vivo animal experiments and in vitro drug screening, have improved this process, but these methods are usually expensive and laborious. In the post-genomics era, it is possible to seek leading compounds for large-scale candidate small-molecule screening with multiple OMICS datasets. In the present study, we developed a computational method of prioritizing small molecules as leading compounds by integrating transcriptomics and toxicogenomics data. This method provides priority lists for the selection of leading compounds, thereby reducing the time required for drug design. We found 11 known therapeutic small molecules for breast cancer in the top 100 candidates in our list, 2 of which were in the top 10. Furthermore, another 3 of the top 10 small molecules were recorded as closely related to cancer treatment in the DrugBank database. A comparison of the results of our approach with permutation tests and shared gene methods demonstrated that our OMICS data-based method is quite competitive. In addition, we applied our method to a prostate cancer dataset. The results of this analysis indicated that our method surpasses both the shared gene method and random selection. These analyses suggest that our method may be a valuable tool for directing experimental studies in cancer drug design, and we believe this time- and cost-effective computational strategy will be helpful in future studies in cancer therapy. PMID:22917481

  11. Clementine: Anticipated scientific datasets from the Moon and Geographos

    NASA Technical Reports Server (NTRS)

    Mcewen, A. S.

    1993-01-01

    The Clementine spacecraft mission is designed to test the performance of new lightweight and low-power detectors developed at the Lawrence Livermore National Laboratory (LLNL) for the Strategic Defense Initiative Office (SDIO). A secondary objective of the mission is to acquire useful scientific data, principally of the Moon and the near-Earth asteroid Geographos. The spacecraft will be in an elliptical polar orbit about the Moon for about 2 months beginning in February of 1994, and it will fly by Geographos on August 31. Clementine will carry seven detectors, each weighing less than about 1 kg: two Star Trackers (wide-angle), UV/Vis (wide-angle), Short Wavelength IR (SWIR), Long-Wavelength IR (LWIR), and LIDAR (Laser Image Detection And Ranging) narrow-angle imaging and ranging. Additional presentations about the mission, detectors, and related science issues are in this volume. If fully successful, Clementine will return about 3 million lunar images, a dataset with nearly as many bits of data (uncompressed) as the first cycle of Magellan, and more than 5000 images of Geographos. The complete and efficient analysis of such large data sets requires systematic processing efforts. Described below are concepts for two such efforts for the Clementine mission: global multispectral imaging of the Moon and videos of the Geographos flyby. Other anticipated datasets for which systematic processing might be desirable include multispectral observations of Earth; LIDAR altimetry of the Moon with high-resolution imaging along each ground track; high-resolution LIDAR color along each lunar ground track, which could be used to identify potential titanium-rich deposits at scales of a few meters; and thermal IR imaging along each lunar ground track (including nighttime observations near the poles).

  12. Adventures in Creating an Historical Multi-sensor Precipitation Dataset

    NASA Astrophysics Data System (ADS)

    Fuelberg, H. E.

    2008-05-01

    Florida State University has created a ten-year historical multi-sensor precipitation dataset using the National Weather Service's (NWS) Multi-sensor Precipitation Estimator (MPE) software. MPE combines the high spatial resolution of radar-derived estimates with the assumed "ground truth" of the gauges. Input for the procedure included radar-derived hourly digital precipitation arrays on the 4 x 4 km HRAP grid, together with hourly rain gauge data from the National Climatic Data Center and five Florida Water Management Districts. This combination of gauge information provides comparatively high spatial resolution. The MPE output consists of hourly rainfall estimates on the 4 x 4 km grid. This paper will describe the many challenges that we faced in creating the multi-sensor dataset. Some of the topics to be discussed are: 1) Rain gauge data, even if said to have been quality controlled, still need a careful additional check. Objective procedures are needed due to the vast amount of hourly data, and it is challenging to develop a scheme that will catch most errors without deleting valid information. 2) The radar data also require careful scrutiny. Many types of false or erroneous returns lurk within the files and can lead to erroneous multi-sensor results. 3) The MPE procedure contains many adaptable parameters that need to be tuned to account for the density of the available data and the character of the precipitation. These parameters generally will need to be changed based on the geographical area of study. Finally, examples of the MPE dataset will be shown along with brief comparisons with gauge data alone.
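    MPE's actual gauge-radar merging involves local bias fields and optimal estimation, but the core idea of anchoring radar estimates to gauge "ground truth" can be hedged into a simple mean-field bias correction, sketched below with invented numbers.

        # Sketch: mean-field bias correction of a radar rainfall field using gauges.
        # Illustrative only; MPE uses more elaborate local-bias/optimal-estimation steps.
        import numpy as np

        gauges = np.array([4.1, 2.2, 0.9, 6.3])           # hourly gauge totals (mm)
        radar_at_gauges = np.array([3.2, 1.9, 0.6, 5.0])  # radar at the gauge pixels (mm)
        radar_field = np.array([[2.5, 3.1], [0.7, 4.2]])  # full gridded radar field (mm)

        bias = gauges.sum() / radar_at_gauges.sum()       # single multiplicative factor
        corrected = radar_field * bias
        print(f"bias factor = {bias:.2f}")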

  13. A reference GNSS tropospheric dataset over Europe.

    NASA Astrophysics Data System (ADS)

    Pacione, Rosa; Di Tomaso, Simona

    2016-04-01

    The present availability of 18 years of GNSS data belonging to the European Permanent Network (EPN, http://www.epncb.oma.be/) is a valuable database for the development of a climate data record of GNSS tropospheric products over Europe. This dataset has high potential for monitoring trends and variability in atmospheric water vapour, improving the knowledge of climatic trends in atmospheric water vapour, and being useful for global and regional NWP reanalyses as well as climate model simulations. In the framework of EPN-Repro2, a second reprocessing campaign of the EPN, five Analysis Centres have homogeneously reprocessed the EPN network for the period 1996-2013. Three Analysis Centres are providing homogeneously reprocessed solutions for the entire network, produced with three different software packages: Bernese, GAMIT, and GIPSY-OASIS. Smaller subnetworks based on Bernese 5.2 are also provided. A huge effort is made to provide solutions that are the basis for deriving new coordinates, velocities, and troposphere parameters, Zenith Tropospheric Delays and Horizontal Gradients, for the entire EPN. These individual contributions are combined in order to provide the official EPN reprocessed products. A preliminary tropospheric combined solution for the period 1996-2013 has been carried out. It is based on all the available homogeneously reprocessed solutions, and it offers the possibility to assess each of them prior to the ongoing final combination. We will present the results of the EPN-Repro2 tropospheric combined products and how the climate community will benefit from them. Acknowledgment: The EPN-Repro2 working group is acknowledged for providing the EPN solutions used in this work. E-GEOS activity is carried out in the framework of ASI contract 2015-050-R.0.

  14. Integrating diverse datasets improves developmental enhancer prediction.

    PubMed

    Erwin, Genevieve D; Oksenberg, Nir; Truty, Rebecca M; Kostka, Dennis; Murphy, Karl K; Ahituv, Nadav; Pollard, Katherine S; Capra, John A

    2014-06-01

    Gene-regulatory enhancers have been identified using various approaches, including evolutionary conservation, regulatory protein binding, chromatin modifications, and DNA sequence motifs. To integrate these different approaches, we developed EnhancerFinder, a two-step method for distinguishing developmental enhancers from the genomic background and then predicting their tissue specificity. EnhancerFinder uses a multiple kernel learning approach to integrate DNA sequence motifs, evolutionary patterns, and diverse functional genomics datasets from a variety of cell types. In contrast with prediction approaches that define enhancers based on histone marks or p300 sites from a single cell line, we trained EnhancerFinder on hundreds of experimentally verified human developmental enhancers from the VISTA Enhancer Browser. We comprehensively evaluated EnhancerFinder using cross validation and found that our integrative method improves the identification of enhancers over approaches that consider a single type of data, such as sequence motifs, evolutionary conservation, or the binding of enhancer-associated proteins. We find that VISTA enhancers active in embryonic heart are easier to identify than enhancers active in several other embryonic tissues, likely due to their uniquely high GC content. We applied EnhancerFinder to the entire human genome and predicted 84,301 developmental enhancers and their tissue specificity. These predictions provide specific functional annotations for large amounts of human non-coding DNA, and are significantly enriched near genes with annotated roles in their predicted tissues and lead SNPs from genome-wide association studies. We demonstrate the utility of EnhancerFinder predictions through in vivo validation of novel embryonic gene regulatory enhancers from three developmental transcription factor loci. Our genome-wide developmental enhancer predictions are freely available as a UCSC Genome Browser track, which we hope will enable

  16. Automatic registration method for multisensor datasets adopted for dimensional measurements on cutting tools

    NASA Astrophysics Data System (ADS)

    Shaw, L.; Ettl, S.; Mehari, F.; Weckenmann, A.; Häusler, G.

    2013-04-01

    Multisensor systems with optical 3D sensors are frequently employed to capture complete surface information by measuring workpieces from different views. During coarse and fine registration, the resulting datasets are transformed into one common coordinate system. Automatic fine registration methods are well established in dimensional metrology, whereas there is a deficit in automatic coarse registration methods. The advantage of a fully automatic registration procedure is twofold: it enables fast and contact-free alignment, and it can be applied flexibly to datasets from any kind of optical 3D sensor. In this paper, an algorithm adapted for robust automatic coarse registration is presented. The method was originally developed for the field of object reconstruction and localization. It is based on a segmentation of planes in the datasets to calculate the transformation parameters. The rotation is defined by the normals of three corresponding segmented planes of two overlapping datasets, while the translation is calculated via the intersection point of the segmented planes. First results have shown that the translation is strongly shape dependent: 3D data of objects with non-orthogonal planar flanks cannot be registered with the current method. In the novel supplement to the algorithm, the translation is additionally calculated via the distance between centroids of corresponding segmented planes, which results in more than one candidate transformation. A newly introduced measure considering the distance between the datasets after coarse registration selects the best possible transformation. Results of the robust automatic registration method are presented on the example of datasets taken from a cutting tool with a fringe-projection system and a focus-variation system. The successful application in dimensional metrology is proven with evaluations of shape parameters based on the registered datasets of a calibrated workpiece.
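    A hedged numpy sketch of the plane-based estimate described above: the rotation is recovered by aligning corresponding plane normals (here via the Kabsch/SVD solution), and the translation follows from the linear system given by the plane offsets, (R n_i) . t = e_i - d_i. The planes and the transform are synthetic.

        # Sketch: coarse registration from three corresponding segmented planes.
        import numpy as np
        from scipy.spatial.transform import Rotation

        # Source planes: unit normals N (rows) and offsets d, plane equation n . x = d.
        N = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.3, 0.0, 1.0]])
        N /= np.linalg.norm(N, axis=1, keepdims=True)
        d = np.array([1.0, 2.0, 0.5])

        # Synthetic ground-truth transform and the resulting destination planes.
        R_true = Rotation.from_euler("xyz", [10, 20, 30], degrees=True).as_matrix()
        t_true = np.array([0.5, -1.0, 2.0])
        M = N @ R_true.T            # destination normals, m_i = R n_i
        e = d + M @ t_true          # destination offsets

        # Rotation via Kabsch/SVD on the normal correspondences.
        U, _, Vt = np.linalg.svd(N.T @ M)
        D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
        R = Vt.T @ D @ U.T

        # Translation from the plane-offset system (R n_i) . t = e_i - d_i.
        t = np.linalg.solve(N @ R.T, e - d)
        print(np.allclose(R, R_true), np.allclose(t, t_true))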

  17. Application of Huang-Hilbert Transforms to Geophysical Datasets

    NASA Technical Reports Server (NTRS)

    Duffy, Dean G.

    2003-01-01

    The Huang-Hilbert transform is a promising new method for analyzing nonstationary and nonlinear datasets. In this talk I will apply this technique to several important geophysical datasets. To understand the strengths and weaknesses of this method, multi-year, hourly datasets of sea level heights and solar radiation will be analyzed. Then we will apply this transform to the analysis of gravity waves observed in a mesoscale observational network.
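    A hedged sketch of the two stages involved (empirical mode decomposition into intrinsic mode functions, then a Hilbert transform of each mode to obtain instantaneous frequency), using the third-party PyEMD package ("EMD-signal" on PyPI) and scipy on a synthetic hourly series:

        # Sketch: Hilbert-Huang analysis = EMD into IMFs + Hilbert instantaneous frequency.
        import numpy as np
        from scipy.signal import hilbert
        from PyEMD import EMD  # third-party package, installed as "EMD-signal"

        fs = 24.0                       # hourly sampling, in samples per day
        t = np.arange(0, 30, 1 / fs)    # 30 days
        signal = np.sin(2 * np.pi * t) + 0.5 * np.sin(2 * np.pi * 0.1 * t)

        for i, imf in enumerate(EMD().emd(signal)):
            phase = np.unwrap(np.angle(hilbert(imf)))
            inst_freq = np.diff(phase) * fs / (2 * np.pi)
            print(f"IMF {i}: mean instantaneous frequency = {inst_freq.mean():.3f} cycles/day")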

  18. Automatic Diabetic Macular Edema Detection in Fundus Images Using Publicly Available Datasets

    SciTech Connect

    Giancardo, Luca; Meriaudeau, Fabrice; Karnowski, Thomas Paul; Li, Yaquin; Garg, Seema; Tobin Jr, Kenneth William; Chaum, Edward

    2011-01-01

    Diabetic macular edema (DME) is a common vision-threatening complication of diabetic retinopathy. In a large-scale screening environment, DME can be assessed by detecting exudates (a type of bright lesion) in fundus images. In this work, we introduce a new methodology for diagnosis of DME using a novel set of features based on colour, wavelet decomposition, and automatic lesion segmentation. These features are employed to train a classifier able to automatically diagnose DME. We present a new publicly available dataset with ground-truth data containing 169 patients from various ethnic groups and levels of DME. This and two other publicly available datasets are employed to evaluate our algorithm. We are able to achieve diagnosis performance comparable to retina experts on MESSIDOR (an independently labelled dataset with 1200 images) with cross-dataset testing. Our algorithm is robust to segmentation uncertainties, does not need ground truth at the lesion level, and is very fast, generating a diagnosis in an average of 4.4 seconds per image on a 2.6 GHz platform with an unoptimised Matlab implementation.

  19. Framework for Interactive Parallel Dataset Analysis on the Grid

    SciTech Connect

    Alexander, David A.; Ananthan, Balamurali; Johnson, Tony; Serbo, Victor; /SLAC

    2007-01-10

    We present a framework for use at a typical Grid site to facilitate custom interactive parallel dataset analysis targeting terabyte-scale datasets of the type typically produced by large multi-institutional science experiments. We summarize the needs for interactive analysis and show a prototype solution that satisfies those needs. The solution consists of a desktop client tool and a set of Web Services that allow scientists to sign onto a Grid site, compose analysis script code to carry out physics analysis on datasets, distribute the code and datasets to worker nodes, collect the results back to the client, and construct professional-quality visualizations of the results.

  20. Development to Release of CTBT Knowledge Base Datasets

    SciTech Connect

    Moore, S.G.; Shepherd, E.R.

    1998-10-20

    For the CTBT Knowledge Base to be useful as a tool for improving U.S. monitoring capabilities, the contents of the Knowledge Base must be subjected to a well-defined set of procedures to ensure the integrity and relevance of the constituent datasets. This paper proposes a possible set of procedures for datasets that are delivered to Sandia National Laboratories (SNL) for inclusion in the Knowledge Base. The proposed procedures include defining preliminary acceptance criteria, performing verification and validation activities, and subjecting the datasets to approval by domain experts. Preliminary acceptance criteria include receipt of the data, its metadata, and a proposal for its usability for U.S. National Data Center operations. Verification activities establish the correctness and completeness of the data, while validation activities establish the relevance of the data to its proposed use. Results from these activities are presented to domain experts, such as analysts and peers, for final approval of the datasets for release to the Knowledge Base. Formats and functionality will vary across datasets, so the procedures proposed herein define an overall plan for establishing the integrity and relevance of each dataset. Specific procedures for verification, validation, and approval will be defined for each dataset, or for each type of dataset, as appropriate. Potential dataset sources, including Los Alamos National Laboratory and Lawrence Livermore National Laboratory, have contributed significantly to the development of this process.

  1. Quantifying the reliability of four global datasets for drought monitoring over a semiarid region

    NASA Astrophysics Data System (ADS)

    Katiraie-Boroujerdy, Pari-Sima; Nasrollahi, Nasrin; Hsu, Kuo-lin; Sorooshian, Soroosh

    2016-01-01

    Drought is one of the most consequential natural disasters, especially in arid regions such as Iran. One of the requirements for reliable drought monitoring is long-term, continuous, high-resolution precipitation data. Different climatic and global databases are being developed and made available in real time or near real time by different agencies and centers; however, for this purpose, these databases must be evaluated regionally and in different local climates. In this paper, a near real-time global climate model, a data assimilation system, and two gridded gauge-based datasets over Iran are evaluated. The ground truth data include 50 gauges from the period of 1980 to 2010. Drought analysis was carried out by means of the Standardized Precipitation Index (SPI) at 2-, 3-, 6-, and 12-month timescales. Although the results show spatial variations, overall the two gauge-based datasets perform better than the models. In addition, the results are more reliable for the western portion of the Zagros Range and the eastern region of the country. The analysis of the onsets of the 6-month moderate drought with at least 3 months' persistence indicates that all datasets perform better over the western portion of the Zagros Range, but display poor performance over the coast of the Caspian Sea. Based on the results of this study, the Modern-Era Retrospective Analysis for Research and Applications (MERRA) dataset is a preferred alternative for drought analysis in the region when gauge-based datasets are not available.
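
    The SPI computation used here follows a standard recipe: accumulate precipitation over the chosen timescale, fit a gamma distribution to the totals, and map the cumulative probabilities onto a standard normal. A minimal Python sketch, assuming a single gamma fit for the whole series (operational SPI fits each calendar month separately):

      import numpy as np
      from scipy import stats

      def spi(precip, scale=6):
          """Standardized Precipitation Index at a given timescale (months)."""
          acc = np.convolve(precip, np.ones(scale), mode='valid')  # running totals
          a, loc, b = stats.gamma.fit(acc[acc > 0], floc=0)  # gamma fit, wet totals
          q = np.mean(acc == 0)                              # probability of no rain
          cdf = q + (1 - q) * stats.gamma.cdf(acc, a, loc=loc, scale=b)
          return stats.norm.ppf(cdf)                         # map to standard normal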

  2. The CORA dataset: validation and diagnostics of in-situ ocean temperature and salinity measurements

    NASA Astrophysics Data System (ADS)

    Cabanes, C.; Grouazel, A.; von Schuckmann, K.; Hamon, M.; Turpin, V.; Coatanoan, C.; Paris, F.; Guinehut, S.; Boone, C.; Ferry, N.; de Boyer Montégut, C.; Carval, T.; Reverdin, G.; Pouliquen, S.; Le Traon, P.-Y.

    2013-01-01

    The French program Coriolis, as part of the French operational oceanographic system, produces the COriolis dataset for Re-Analysis (CORA) on a yearly basis. This dataset contains in-situ temperature and salinity profiles from different data types. The latest release, CORA3, covers the period 1990 to 2010. Several tests have been developed to ensure a homogeneous quality control of the dataset and to meet the requirements of the physical ocean reanalysis activities (assimilation and validation). Improved tests include some simple tests based on comparison with climatology and a model background check based on a global ocean reanalysis. Visual quality control is performed on all suspicious temperature and salinity profiles identified by the tests, and quality flags are modified in the dataset if necessary. In addition, improved diagnostic tools have been developed - including global ocean indicators - which give information on the quality of the CORA3 dataset and its potential applications. CORA3 is available on request through the MyOcean Service Desk (http://www.myocean.eu/).
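
    A climatology comparison of the kind mentioned can be sketched in a few lines; the n-sigma envelope below is an assumed threshold, not the CORA setting.

      import numpy as np

      def climatology_flag(profile, clim_mean, clim_std, n_sigma=5.0):
          """Return True at depth levels where a T or S value falls outside the
          climatological envelope; flagged values go on to visual QC."""
          return np.abs(profile - clim_mean) > n_sigma * clim_std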

  3. The CORA dataset: validation and diagnostics of ocean temperature and salinity in situ measurements

    NASA Astrophysics Data System (ADS)

    Cabanes, C.; Grouazel, A.; von Schuckmann, K.; Hamon, M.; Turpin, V.; Coatanoan, C.; Guinehut, S.; Boone, C.; Ferry, N.; Reverdin, G.; Pouliquen, S.; Le Traon, P.-Y.

    2012-03-01

    The French program Coriolis, as part of the French operational oceanographic system, produces the COriolis dataset for Re-Analysis (CORA) on a yearly basis; it is based on temperature and salinity measurements at observed levels from different data types. The latest release of CORA covers the period 1990 to 2010. To qualify this dataset, several tests have been developed to improve the quality of the raw dataset in a homogeneous way and to meet the level required by the physical ocean re-analysis activities (assimilation and validation). These include some simple tests, climatological tests and a model background check based on a global ocean reanalysis. Visual quality control (QC) is performed on all suspicious temperature (T) and salinity (S) profiles identified by the tests, and quality flags are modified in the dataset if necessary. In addition, improved diagnostic tools were developed - including global ocean indicators - which give information on the potential and quality of the CORA dataset for all applications. This Coriolis product is available on request through the MyOcean Service Desk (http://www.myocean.eu/).

  4. Normalization of transposon-mutant library sequencing datasets to improve identification of conditionally essential genes.

    PubMed

    DeJesus, Michael A; Ioerger, Thomas R

    2016-06-01

    Sequencing of transposon-mutant libraries using next-generation sequencing (TnSeq) has become a popular method for determining which genes and non-coding regions are essential for growth under various conditions in bacteria. For methods that rely on quantitative comparison of counts of reads at transposon insertion sites, proper normalization of TnSeq datasets is vitally important. Real TnSeq datasets are often noisy and exhibit a significant skew that can be dominated by high counts at a small number of sites (often for non-biological reasons). Comparing two datasets that are not appropriately normalized can cause the artifactual appearance of Differentially Essential (DE) genes in a statistical test, constituting type I errors (false positives). In this paper, we propose a novel method for normalization of TnSeq datasets that corrects for the skew of read-count distributions by fitting them to a Beta-Geometric distribution. We show that this read-count correction procedure reduces the number of false positives when comparing replicate datasets grown under the same conditions (for which no genuine differences in essentiality are expected). We compare these results to those obtained with other normalization procedures, and show that our method yields a greater reduction in the number of false positives. In addition, we investigate the effects of normalization on the detection of DE genes. PMID:26932272
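
    For context, the sketch below applies a simple skew-robust scaling; the paper's actual correction instead fits a Beta-Geometric distribution to the read counts, so this is an illustration of the normalization problem rather than the proposed method.

      import numpy as np
      from scipy import stats

      def normalize_counts(counts, trim=0.05):
          """Scale insertion-site read counts by the trimmed mean of non-zero
          sites, damping the influence of a few extreme-count sites."""
          nz = counts[counts > 0]
          return counts / stats.trim_mean(nz, proportiontocut=trim)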

  5. Advancements in Wind Integration Study Input Data Modeling: The Wind Integration National Dataset (WIND) Toolkit

    NASA Astrophysics Data System (ADS)

    Hodge, B.; Orwig, K.; McCaa, J. R.; Harrold, S.; Draxl, C.; Jones, W.; Searight, K.; Getman, D.

    2013-12-01

    Regional wind integration studies in the United States, such as the Western Wind and Solar Integration Study (WWSIS), Eastern Wind Integration and Transmission Study (EWITS), and Eastern Renewable Generation Integration Study (ERGIS), perform detailed simulations of the power system to determine the impact of high wind and solar energy penetrations on power systems operations. Some of the specific aspects examined include: infrastructure requirements, impacts on grid operations and conventional generators, ancillary service requirements, as well as the benefits of geographic diversity and forecasting. These studies require geographically broad and temporally consistent wind and solar power production input datasets that realistically reflect the ramping characteristics, spatial and temporal correlations, and capacity factors of wind and solar power plant production, and are time-synchronous with load profiles. The original western and eastern wind datasets were generated independently for 2004-2006 using numerical weather prediction (NWP) models run on a ~2 km grid with 10-minute resolution. Each utilized its own site selection process to augment existing wind plants with simulated sites of high development potential. The original dataset also included day-ahead simulated forecasts. These datasets were the first of their kind and many lessons were learned from their development. For example, the modeling approach used generated periodic false ramps that later had to be removed due to unrealistic impacts on ancillary service requirements. For several years, stakeholders have been requesting an updated dataset that: 1) covers more recent years; 2) spans four or more years to better evaluate interannual variability; 3) uses improved methods to minimize false ramps and spatial seams; 4) better incorporates solar power production inputs; and 5) is more easily accessible. To address these needs, the U.S. Department of Energy (DOE) Wind and Solar Programs have funded two

  6. Sharing Clouds: Showing, Distributing, and Sharing Large Point Datasets

    NASA Astrophysics Data System (ADS)

    Grigsby, S.

    2012-12-01

    Sharing large data sets with colleagues and the general public presents a unique technological challenge for scientists. In addition to large data volumes, there are significant challenges in representing data that is often irregular, multidimensional and spatial in nature. For derived data products, additional challenges exist in displaying and providing provenance data. For this presentation, several open source technologies are demonstrated for the remote display and access of large irregular point data sets. These technologies and techniques include the remote viewing of point data using HTML5 and OpenGL, which provides a highly accessible preview of the data sets for a range of audiences. Intermediate levels of accessibility and high levels of interactivity are accomplished with technologies such as WebDAV, which allows collaborators to run analysis on local clients, using data stored and administered on remote servers. Remote processing and analysis, including provenance tracking, will be discussed at the workgroup level. The data sets used for this presentation include data acquired from the NSF-funded National Center for Airborne Laser Mapping (NCALM), and data acquired for research and instructional use in NASA's Student Airborne Research Program (SARP). These datasets include Light Detection And Ranging (LiDAR) point clouds ranging in size from several hundred thousand to several hundred million data points; the techniques and technologies discussed are applicable to other forms of irregular point data.

  7. Public Availability to ECS Collected Datasets

    NASA Astrophysics Data System (ADS)

    Henderson, J. F.; Warnken, R.; McLean, S. J.; Lim, E.; Varner, J. D.

    2013-12-01

    Coastal nations have spent considerable resources exploring the limits of their extended continental shelf (ECS) beyond 200 nm. Although these studies are funded to fulfill requirements of the UN Convention on the Law of the Sea, the investments are producing new data sets in frontier areas of Earth's oceans that will be used to understand, explore, and manage the seafloor and sub-seafloor for decades to come. Although many of these datasets are considered proprietary until a nation's potential ECS has become 'final and binding', an increasing amount of data are being released and utilized by the public. Data sets include multibeam, seismic reflection/refraction, bottom sampling, and geophysical data. The U.S. ECS Project, a multi-agency collaboration whose mission is to establish the full extent of the continental shelf of the United States consistent with international law, relies heavily on data and accurate, standard metadata. The United States has made it a priority to make available to the public all data collected with ECS-funding as quickly as possible. The National Oceanic and Atmospheric Administration's (NOAA) National Geophysical Data Center (NGDC) supports this objective by partnering with academia and other federal government mapping agencies to archive, inventory, and deliver marine mapping data in a coordinated, consistent manner. This includes ensuring quality, standard metadata and developing and maintaining data delivery capabilities built on modern digital data archives. Other countries, such as Ireland, have submitted their ECS data for public availability and many others have made pledges to participate in the future. The data services provided by NGDC support the U.S. ECS effort as well as many developing nations' ECS efforts through the U.N. Environmental Program. Modern discovery, visualization, and delivery of scientific data and derived products that span national and international sources of data ensure the greatest re-use of data and

  8. Accuracy assessment of gridded precipitation datasets in the Himalayas

    NASA Astrophysics Data System (ADS)

    Khan, A.

    2015-12-01

    Accurate precipitation data are vital for hydro-climatic modelling and water resources assessments. Based on mass balance calculations and Turc-Budyko analysis, this study investigates the accuracy of twelve widely used gridded precipitation datasets for sub-basins in the Upper Indus Basin (UIB) in the Himalayas-Karakoram-Hindukush (HKH) region. These datasets are: 1) Global Precipitation Climatology Project (GPCP), 2) Climate Prediction Centre (CPC) Merged Analysis of Precipitation (CMAP), 3) NCEP / NCAR, 4) Global Precipitation Climatology Centre (GPCC), 5) Climatic Research Unit (CRU), 6) Asian Precipitation Highly Resolved Observational Data Integration Towards Evaluation of Water Resources (APHRODITE), 7) Tropical Rainfall Measuring Mission (TRMM), 8) European Reanalysis (ERA) interim data, 9) PRINCETON, 10) European Reanalysis-40 (ERA-40), 11) Willmott and Matsuura, and 12) WATCH Forcing Data based on ERA interim (WFDEI). Precipitation accuracy and consistency were assessed by a physical mass balance involving the sum of annual measured flow, estimated actual evapotranspiration (average of 4 datasets), estimated glacier mass balance melt contribution (average of 4 datasets), and groundwater recharge (average of 3 datasets), during 1999-2010. The mass balance assessment was complemented by Turc-Budyko non-dimensional analysis, where annual precipitation, measured flow and potential evapotranspiration (average of 5 datasets) data were used for the same period. Both analyses suggest that all tested precipitation datasets significantly underestimate precipitation in the Karakoram sub-basins. For the Hindukush and Himalayan sub-basins most datasets underestimate precipitation, except ERA-interim and ERA-40. The analysis indicates that for this large region with complicated terrain features and stark spatial precipitation gradients the reanalysis datasets have better consistency with flow measurements than datasets derived from records of only sparsely distributed climatic
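
    The mass-balance test can be made concrete with a worked example; the numbers below are placeholders, not values from the study.

      # Annual water-balance check for one sub-basin (all terms in mm/yr).
      P_dataset = 400.0                      # precipitation from a gridded dataset
      Q, ET, melt, recharge = 500.0, 150.0, 120.0, 30.0

      # Implied basin precipitation: flow plus losses, minus the part of the
      # flow supplied by glacier mass loss rather than same-year precipitation.
      P_implied = Q + ET + recharge - melt        # = 560 mm/yr
      bias = (P_dataset - P_implied) / P_implied  # ~ -0.29, i.e. ~29% underestimate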

  9. Genetic architecture of vitamin B12 and folate levels uncovered applying deeply sequenced large datasets.

    PubMed

    Grarup, Niels; Sulem, Patrick; Sandholt, Camilla H; Thorleifsson, Gudmar; Ahluwalia, Tarunveer S; Steinthorsdottir, Valgerdur; Bjarnason, Helgi; Gudbjartsson, Daniel F; Magnusson, Olafur T; Sparsø, Thomas; Albrechtsen, Anders; Kong, Augustine; Masson, Gisli; Tian, Geng; Cao, Hongzhi; Nie, Chao; Kristiansen, Karsten; Husemoen, Lise Lotte; Thuesen, Betina; Li, Yingrui; Nielsen, Rasmus; Linneberg, Allan; Olafsson, Isleifur; Eyjolfsson, Gudmundur I; Jørgensen, Torben; Wang, Jun; Hansen, Torben; Thorsteinsdottir, Unnur; Stefánsson, Kari; Pedersen, Oluf

    2013-06-01

    Genome-wide association studies have mainly relied on common HapMap sequence variations. Recently, sequencing approaches have allowed analysis of low frequency and rare variants in conjunction with common variants, thereby improving the search for functional variants and thus the understanding of the underlying biology of human traits and diseases. Here, we used a large Icelandic whole genome sequence dataset combined with Danish exome sequence data to gain insight into the genetic architecture of serum levels of vitamin B(12) (B12) and folate. Up to 22.9 million sequence variants were analyzed in combined samples of 45,576 and 37,341 individuals with serum B(12) and folate measurements, respectively. We found six novel loci associating with serum B(12) (CD320, TCN2, ABCD4, MMAA, MMACHC) or folate levels (FOLR3) and confirmed seven loci for these traits (TCN1, FUT6, FUT2, CUBN, CLYBL, MUT, MTHFR). Conditional analyses established that four loci contain additional independent signals. Interestingly, 13 of the 18 identified variants were coding and 11 of the 13 target genes have known functions related to B(12) and folate pathways. Contrary to epidemiological studies, we did not find consistent association of the variants with cardiovascular diseases, cancers or Alzheimer's disease, although some variants demonstrated pleiotropic effects. Although to some degree impeded by low statistical power for some of these conditions, these data suggest that sequence variants that contribute to the population diversity in serum B(12) or folate levels do not modify the risk of developing these conditions. Yet, the study demonstrates the value of combining whole genome and exome sequencing approaches to ascertain the genetic and molecular architectures underlying quantitative trait associations. PMID:23754956

  10. Vikodak - A Modular Framework for Inferring Functional Potential of Microbial Communities from 16S Metagenomic Datasets

    PubMed Central

    Nagpal, Sunil; Haque, Mohammed Monzoorul; Mande, Sharmila S.

    2016-01-01

    Background The overall metabolic/functional potential of any given environmental niche is a function of the sum total of genes/proteins/enzymes that are encoded and expressed by various interacting microbes residing in that niche. Consequently, prior (collated) information pertaining to genes and enzymes encoded by the resident microbes can aid in indirectly (re)constructing/inferring the metabolic/functional potential of a given microbial community (given its taxonomic abundance profile). In this study, we present Vikodak—a multi-modular package that is based on the above assumption and automates inferring and/or comparing the functional characteristics of an environment using taxonomic abundance generated from one or more environmental sample datasets. With the underlying assumptions of co-metabolism and independent contributions of different microbes in a community, a concerted effort has been made to accommodate microbial co-existence patterns in various modules incorporated in Vikodak. Results Validation experiments on over 1400 metagenomic samples have confirmed the utility of Vikodak in (a) deciphering enzyme abundance profiles of any KEGG metabolic pathway, (b) functional resolution of distinct metagenomic environments, (c) inferring patterns of functional interaction between resident microbes, and (d) automating statistical comparison of functional features of studied microbiomes. Novel features incorporated in Vikodak also facilitate automatic removal of false positives and spurious functional predictions. Conclusions With novel provisions for comprehensive functional analysis, inclusion of microbial co-existence pattern based algorithms, automated inter-environment comparisons, in-depth analysis of individual metabolic pathways and greater flexibilities at the user end, Vikodak is expected to be an important value addition to the family of existing tools for 16S based function prediction. Availability and Implementation A web implementation of Vikodak
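
    The underlying assumption of independent, abundance-weighted contributions of taxa reduces to a matrix product. A toy Python sketch with hypothetical numbers, illustrating the general idea behind such tools rather than Vikodak's exact algorithm:

      import numpy as np

      # taxa x samples: relative abundances from 16S profiling (hypothetical)
      abundance = np.array([[0.6, 0.1],
                            [0.4, 0.9]])
      # taxa x functions: per-taxon gene/enzyme copy numbers from references
      copy_number = np.array([[12, 0, 3],
                              [ 1, 8, 5]])

      # Community functional profile = abundance-weighted sum over taxa.
      function_profile = abundance.T @ copy_number   # samples x functions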

  11. Independence of Internal Auditors.

    ERIC Educational Resources Information Center

    Montondon, Lucille; Meixner, Wilda F.

    1993-01-01

    A survey of 288 college and university auditors investigated patterns in their appointment, reporting, and supervisory practices as indicators of independence and objectivity. Results indicate a weakness in the positioning of internal auditing within institutions, possibly compromising auditor independence. Because the auditing function is…

  12. American Independence. Fifth Grade.

    ERIC Educational Resources Information Center

    Crosby, Annette

    This fifth grade teaching unit covers early conflicts between the American colonies and Britain, battles of the American Revolutionary War, and the Declaration of Independence. Knowledge goals address the pre-revolutionary acts enforced by the British, the concepts of conflict and independence, and the major events and significant people from the…

  13. Fostering Musical Independence

    ERIC Educational Resources Information Center

    Shieh, Eric; Allsup, Randall Everett

    2016-01-01

    Musical independence has always been an essential aim of musical instruction. But this objective can refer to everything from high levels of musical expertise to more student choice in the classroom. While most conceptualizations of musical independence emphasize the demonstration of knowledge and skills within particular music traditions, this…

  14. Centering on Independent Study.

    ERIC Educational Resources Information Center

    Miller, Stephanie

    Independent study is an instructional approach that can have enormous power in the classroom. It can be used successfully with students at all ability levels, even though it is often associated with gifted students. Independent study is an opportunity for students to study a subject of their own choosing under the guidance of a teacher. The…

  15. AMADA: Analysis of Multidimensional Astronomical DAtasets

    NASA Astrophysics Data System (ADS)

    de Souza, Rafael S.; Ciardi, Benedetta

    2015-03-01

    AMADA allows an iterative exploration and information retrieval of high-dimensional data sets. This is done by performing a hierarchical clustering analysis for different choices of correlation matrices and by performing a principal component analysis on the original data. Additionally, AMADA provides a set of modern visualization and data-mining diagnostics. The user can switch between them using the different tabs.
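
    The two analyses AMADA combines can be sketched with standard scientific-Python tools; the correlation-to-distance transform and linkage method below are assumptions, not necessarily AMADA's own choices.

      import numpy as np
      from scipy.cluster.hierarchy import fcluster, linkage
      from scipy.spatial.distance import squareform

      X = np.random.rand(100, 8)            # 100 objects, 8 observed variables
      corr = np.corrcoef(X, rowvar=False)   # variable-variable correlations
      dist = squareform(1.0 - np.abs(corr), checks=False)
      Z = linkage(dist, method='average')   # hierarchical clustering of variables
      groups = fcluster(Z, t=0.5, criterion='distance')

      # Principal component analysis on the original data
      Xc = X - X.mean(axis=0)
      _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
      scores = Xc @ Vt.T                    # principal-component scores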

  16. NCAR's Research Data Archive: OPeNDAP Access for Complex Datasets

    NASA Astrophysics Data System (ADS)

    Dattore, R.; Worley, S. J.

    2014-12-01

    Many datasets have complex structures including hundreds of parameters and numerous vertical levels, grid resolutions, and temporal products. Making these data accessible is a challenge for a data provider. OPeNDAP is a powerful protocol for delivering multi-file datasets in real time to many analysis and visualization tools, but for these datasets there are too many choices about how to aggregate. Simple aggregation schemes can fail to support, or at least make very challenging, many potential studies based on complex datasets. We address this issue by using a rich file-content metadata collection to create a real-time customized OPeNDAP service that matches the full suite of access possibilities for complex datasets. The Climate Forecast System Reanalysis (CFSR) and its extension, the Climate Forecast System Version 2 (CFSv2), produced by the National Centers for Environmental Prediction (NCEP) and hosted by the Research Data Archive (RDA) at the Computational and Information Systems Laboratory (CISL) at NCAR, are examples of complex datasets that are difficult to aggregate with existing data server software. CFSR and CFSv2 contain 141 distinct parameters on 152 vertical levels, six grid resolutions and 36 products (analyses, n-hour forecasts, multi-hour averages, etc.), where not all parameter/level combinations are available at all grid resolution/product combinations. These data are archived in the RDA with the data structure provided by the producer; no additional re-organization or aggregation has been applied. Since 2011, users have been able to request customized subsets (e.g. temporal, parameter, spatial) from the CFSR/CFSv2, which are processed in delayed mode and then downloaded to a user's system. Until now, the complexity has made it difficult to provide real-time OPeNDAP access to the data. We have developed a service that leverages the already-existing subsetting interface and allows users to create a virtual dataset
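
    For readers unfamiliar with the access pattern, the sketch below shows how such a virtual dataset could be consumed from Python over OPeNDAP; the URL and variable name are placeholders, not real RDA paths, and a netCDF4 or pydap backend is assumed.

      import xarray as xr

      url = "https://example.org/opendap/cfsr/virtual-dataset"  # hypothetical
      ds = xr.open_dataset(url)                 # lazy open; nothing downloaded yet
      # slice orientation must match the dataset's coordinate ordering
      subset = ds["TMP_L100"].sel(lat=slice(50, 20), lon=slice(230, 300))
      mean_map = subset.mean(dim="time").load() # only the subset is transferred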

  17. Operational Dataset Update Functionality Included in the NCAR Research Data Archive Management System

    NASA Astrophysics Data System (ADS)

    Ji, Z.; Worley, S. J.; Schuster, D. C.

    2011-12-01

    at hourly, daily, monthly, and yearly intervals. DSUPDT is also fully scalable and continues to support addition of new data streams. This paper will introduce the powerful functionality of the RDAMS for operational dataset updates, and provide examples of its use.

  18. Interface between astrophysical datasets and distributed database management systems (DAVID)

    NASA Technical Reports Server (NTRS)

    Iyengar, S. S.

    1988-01-01

    This is a status report on the progress of the DAVID (Distributed Access View Integrated Database Management System) project being carried out at Louisiana State University, Baton Rouge, Louisiana. The objective is to implement an interface between astrophysical datasets and DAVID. Design details and implementation specifics of the interface between DAVID and astrophysical datasets are discussed.

  19. Really big data: Processing and analysis of large datasets

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Modern animal breeding datasets are large and getting larger, due in part to the recent availability of DNA data for many animals. Computational methods for efficiently storing and analyzing those data are under development. The amount of storage space required for such datasets is increasing rapidl...

  20. Querying Patterns in High-Dimensional Heterogenous Datasets

    ERIC Educational Resources Information Center

    Singh, Vishwakarma

    2012-01-01

    The recent technological advancements have led to the availability of a plethora of heterogenous datasets, e.g., images tagged with geo-location and descriptive keywords. An object in these datasets is described by a set of high-dimensional feature vectors. For example, a keyword-tagged image is represented by a color-histogram and a…

  1. Primary Datasets for Case Studies of River-Water Quality

    ERIC Educational Resources Information Center

    Goulder, Raymond

    2008-01-01

    Level 6 (final-year BSc) students undertook case studies on between-site and temporal variation in river-water quality. They used professionally-collected datasets supplied by the Environment Agency. The exercise gave students the experience of working with large, real-world datasets and led to their understanding how the quality of river water is…

  2. Finding Spatio-Temporal Patterns in Large Sensor Datasets

    ERIC Educational Resources Information Center

    McGuire, Michael Patrick

    2010-01-01

    Spatial or temporal data mining tasks are performed in the context of the relevant space, defined by a spatial neighborhood, and the relevant time period, defined by a specific time interval. Furthermore, when mining large spatio-temporal datasets, interesting patterns typically emerge where the dataset is most dynamic. This dissertation is…

  3. TagCleaner: Identification and removal of tag sequences from genomic and metagenomic datasets

    PubMed Central

    2010-01-01

    Background Sequencing metagenomes that were pre-amplified with primer-based methods requires the removal of the additional tag sequences from the datasets. The sequenced reads can contain deletions or insertions due to sequencing limitations, and the primer sequence may contain ambiguous bases. Furthermore, the tag sequence may be unavailable or incorrectly reported. Because of the potential for downstream inaccuracies introduced by unwanted sequence contaminations, it is important to use reliable tools for pre-processing sequence data. Results TagCleaner is a web application developed to automatically identify and remove known or unknown tag sequences allowing insertions and deletions in the dataset. TagCleaner is designed to filter the trimmed reads for duplicates, short reads, and reads with high rates of ambiguous sequences. An additional screening for and splitting of fragment-to-fragment concatenations that gave rise to artificial concatenated sequences can increase the quality of the dataset. Users may modify the different filter parameters according to their own preferences. Conclusions TagCleaner is a publicly available web application that is able to automatically detect and efficiently remove tag sequences from metagenomic datasets. It is easily configurable and provides a user-friendly interface. The interactive web interface facilitates export functionality for subsequent data processing, and is available at http://edwards.sdsu.edu/tagcleaner. PMID:20573248

  4. New addition curing polyimides

    NASA Technical Reports Server (NTRS)

    Frimer, Aryeh A.; Cavano, Paul

    1991-01-01

    In an attempt to improve the thermal-oxidative stability (TOS) of PMR-type polymers, the use of 1,4-phenylenebis(phenylmaleic anhydride) (PPMA) was evaluated. Two series of nadic end-capped addition-curing polyimides were prepared by imidizing PPMA with either 4,4'-methylene dianiline or p-phenylenediamine. The former resulted in improved solubility and increased resin flow, while the latter yielded a compression-molded neat resin sample with a T(sub g) of 408 C, close to 70 C higher than PMR-15. The performance of these materials in long-term weight loss studies was below that of PMR-15, independent of post-cure conditions. These results can be rationalized in terms of the thermal lability of the pendant phenyl groups and the incomplete imidization of the sterically congested PPMA. The preparation of model compounds as well as future research directions are discussed.

  5. Length-independent structural similarities enrich the antibody CDR canonical class model.

    PubMed

    Nowak, Jaroslaw; Baker, Terry; Georges, Guy; Kelm, Sebastian; Klostermann, Stefan; Shi, Jiye; Sridharan, Sudharsan; Deane, Charlotte M

    2016-01-01

    Complementarity-determining regions (CDRs) are antibody loops that make up the antigen binding site. Here, we show that all CDR types have structurally similar loops of different lengths. Based on these findings, we created length-independent canonical classes for the non-H3 CDRs. Our length variable structural clusters show strong sequence patterns suggesting either that they evolved from the same original structure or result from some form of convergence. We find that our length-independent method not only clusters a larger number of CDRs, but also predicts canonical class from sequence better than the standard length-dependent approach. To demonstrate the usefulness of our findings, we predicted cluster membership of CDR-L3 sequences from 3 next-generation sequencing datasets of the antibody repertoire (over 1,000,000 sequences). Using the length-independent clusters, we can structurally classify an additional 135,000 sequences, which represents a ∼20% improvement over the standard approach. This suggests that our length-independent canonical classes might be a highly prevalent feature of antibody space, and could substantially improve our ability to accurately predict the structure of novel CDRs identified by next-generation sequencing. PMID:26963563

  6. Length-independent structural similarities enrich the antibody CDR canonical class model

    PubMed Central

    Nowak, Jaroslaw; Baker, Terry; Georges, Guy; Kelm, Sebastian; Klostermann, Stefan; Shi, Jiye; Sridharan, Sudharsan; Deane, Charlotte M.

    2016-01-01

    ABSTRACT Complementarity-determining regions (CDRs) are antibody loops that make up the antigen binding site. Here, we show that all CDR types have structurally similar loops of different lengths. Based on these findings, we created length-independent canonical classes for the non-H3 CDRs. Our length variable structural clusters show strong sequence patterns suggesting either that they evolved from the same original structure or result from some form of convergence. We find that our length-independent method not only clusters a larger number of CDRs, but also predicts canonical class from sequence better than the standard length-dependent approach. To demonstrate the usefulness of our findings, we predicted cluster membership of CDR-L3 sequences from 3 next-generation sequencing datasets of the antibody repertoire (over 1,000,000 sequences). Using the length-independent clusters, we can structurally classify an additional 135,000 sequences, which represents a ∼20% improvement over the standard approach. This suggests that our length-independent canonical classes might be a highly prevalent feature of antibody space, and could substantially improve our ability to accurately predict the structure of novel CDRs identified by next-generation sequencing. PMID:26963563

  7. Lacunarity analysis of raster datasets and 1D, 2D, and 3D point patterns

    NASA Astrophysics Data System (ADS)

    Dong, Pinliang

    2009-10-01

    Spatial scale plays an important role in many fields. As a scale-dependent measure for spatial heterogeneity, lacunarity describes the distribution of gaps within a set at multiple scales. In Earth science, environmental science, and ecology, lacunarity has been increasingly used for multiscale modeling of spatial patterns. This paper presents the development and implementation of a geographic information system (GIS) software extension for lacunarity analysis of raster datasets and 1D, 2D, and 3D point patterns. Depending on the application requirement, lacunarity analysis can be performed in two modes: global mode or local mode. The extension works for: (1) binary (1-bit) and grey-scale datasets in any raster format supported by ArcGIS and (2) 1D, 2D, and 3D point datasets as shapefiles or geodatabase feature classes. For more effective measurement of lacunarity for different patterns or processes in raster datasets, the extension allows users to define an area of interest (AOI) in four different ways, including using a polygon in an existing feature layer. Additionally, directionality can be taken into account when grey-scale datasets are used for local lacunarity analysis. The methodology and graphical user interface (GUI) are described. The application of the extension is demonstrated using both simulated and real datasets, including Brodatz texture images, a Spaceborne Imaging Radar (SIR-C) image, simulated 1D points on a drainage network, and 3D random and clustered point patterns. The options of lacunarity analysis and the effects of polyline arrangement on lacunarity of 1D points are also discussed. Results from sample data suggest that the lacunarity analysis extension can be used for efficient modeling of spatial patterns at multiple scales.
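
    The gliding-box lacunarity measure itself is compact: for each box size r, collect the occupancy sums S of all r x r windows and form E[S^2]/E[S]^2. A Python sketch for 2-D binary rasters (not the ArcGIS extension's code):

      import numpy as np
      from scipy.ndimage import uniform_filter

      def lacunarity(binary, box_sizes=(2, 4, 8, 16)):
          """Gliding-box lacunarity of a 2-D binary array."""
          img = binary.astype(float)
          out = {}
          for r in box_sizes:
              mass = uniform_filter(img, size=r, mode='constant') * r * r
              k = r // 2                    # keep fully interior windows only
              core = mass[k:img.shape[0] - k, k:img.shape[1] - k]
              out[r] = np.mean(core ** 2) / np.mean(core) ** 2
          return out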

  8. Efficient segmentation of 3D fluoroscopic datasets from mobile C-arm

    NASA Astrophysics Data System (ADS)

    Styner, Martin A.; Talib, Haydar; Singh, Digvijay; Nolte, Lutz-Peter

    2004-05-01

    The emerging mobile fluoroscopic 3D technology linked with a navigation system combines the advantages of CT-based and C-arm-based navigation. The intra-operative, automatic segmentation of 3D fluoroscopy datasets enables the combined visualization of surgical instruments and anatomical structures for enhanced planning, surgical eye-navigation and landmark digitization. We performed a thorough evaluation of several segmentation algorithms using a large set of data from different anatomical regions and man-made phantom objects. The analyzed segmentation methods include automatic thresholding, morphological operations, an adapted region growing method and an implicit 3D geodesic snake method. In regard to computational efficiency, all methods performed within acceptable limits on a standard desktop PC (30 sec-5 min). In general, the best results were obtained with datasets from long bones, followed by extremities. The segmentations of spine, pelvis and shoulder datasets were generally of poorer quality. As expected, the threshold-based methods produced the worst results. The combined thresholding and morphological operations methods were considered appropriate for a smaller set of clean images. The region growing method performed generally much better in regard to computational efficiency and segmentation correctness, especially for datasets of joints, and lumbar and cervical spine regions. The less efficient implicit snake method was able to additionally remove wrongly segmented skin tissue regions. This study presents a step towards efficient intra-operative segmentation of 3D fluoroscopy datasets, but there is room for improvement. Next, we plan to study model-based approaches for datasets from the knee and hip joint region, which would be thenceforth applied to all anatomical regions in our continuing development of an ideal segmentation procedure for 3D fluoroscopic images.
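
    The simplest class of methods evaluated, thresholding followed by morphological clean-up, can be sketched as follows; the threshold, structuring element and size cut-off are assumptions, not the paper's settings.

      import numpy as np
      from scipy import ndimage

      def threshold_morph_segment(volume, threshold, min_voxels=500):
          """Threshold a 3-D volume, then remove noise and small components."""
          mask = volume > threshold
          mask = ndimage.binary_opening(mask, structure=np.ones((3, 3, 3)))
          labels, n = ndimage.label(mask)
          sizes = ndimage.sum(mask, labels, index=range(1, n + 1))
          keep = 1 + np.flatnonzero(sizes >= min_voxels)
          return np.isin(labels, keep)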

  9. Analyses of mitochondrial amino acid sequence datasets support the proposal that specimens of Hypodontus macropi from three species of macropodid hosts represent distinct species

    PubMed Central

    2013-01-01

    Background Hypodontus macropi is a common intestinal nematode of a range of kangaroos and wallabies (macropodid marsupials). Based on previous multilocus enzyme electrophoresis (MEE) and nuclear ribosomal DNA sequence datasets, H. macropi has been proposed to be a complex of species. To test this proposal using independent molecular data, we sequenced the whole mitochondrial (mt) genomes of individuals of H. macropi from three different species of hosts (Macropus robustus robustus, Thylogale billardierii and Macropus [Wallabia] bicolor) as well as that of Macropicola ocydromi (a related nematode), and undertook a comparative analysis of the amino acid sequence datasets derived from these genomes. Results The mt genomes sequenced by next-generation (454) technology from H. macropi from the three host species varied from 13,634 bp to 13,699 bp in size. Pairwise comparisons of the amino acid sequences predicted from these three mt genomes revealed differences of 5.8% to 18%. Phylogenetic analysis of the amino acid sequence datasets using Bayesian Inference (BI) showed that H. macropi from the three different host species formed distinct, well-supported clades. In addition, sliding window analysis of the mt genomes defined variable regions for future population genetic studies of H. macropi in different macropodid hosts and geographical regions around Australia. Conclusions The present analyses of inferred mt protein sequence datasets clearly supported the hypothesis that H. macropi from M. robustus robustus, M. bicolor and T. billardierii represent distinct species. PMID:24261823

  10. The Centennial Trends Greater Horn of Africa precipitation dataset

    PubMed Central

    Funk, Chris; Nicholson, Sharon E.; Landsfeld, Martin; Klotter, Douglas; Peterson, Pete; Harrison, Laura

    2015-01-01

    East Africa is a drought prone, food and water insecure region with a highly variable climate. This complexity makes rainfall estimation challenging, and this challenge is compounded by low rain gauge densities and inhomogeneous monitoring networks. The dearth of observations is particularly problematic over the past decade, since the number of records in globally accessible archives has fallen precipitously. This lack of data coincides with an increasing scientific and humanitarian need to place recent seasonal and multi-annual East African precipitation extremes in a deep historic context. To serve this need, scientists from the UC Santa Barbara Climate Hazards Group and Florida State University have pooled their station archives and expertise to produce a high quality gridded ‘Centennial Trends’ precipitation dataset. Additional observations have been acquired from the national meteorological agencies and augmented with data provided by other universities. Extensive quality control of the data was carried out and seasonal anomalies interpolated using kriging. This paper documents the CenTrends methodology and data. PMID:26451250

  11. The Centennial Trends Greater Horn of Africa precipitation dataset.

    PubMed

    Funk, Chris; Nicholson, Sharon E; Landsfeld, Martin; Klotter, Douglas; Peterson, Pete; Harrison, Laura

    2015-01-01

    East Africa is a drought prone, food and water insecure region with a highly variable climate. This complexity makes rainfall estimation challenging, and this challenge is compounded by low rain gauge densities and inhomogeneous monitoring networks. The dearth of observations is particularly problematic over the past decade, since the number of records in globally accessible archives has fallen precipitously. This lack of data coincides with an increasing scientific and humanitarian need to place recent seasonal and multi-annual East African precipitation extremes in a deep historic context. To serve this need, scientists from the UC Santa Barbara Climate Hazards Group and Florida State University have pooled their station archives and expertise to produce a high quality gridded 'Centennial Trends' precipitation dataset. Additional observations have been acquired from the national meteorological agencies and augmented with data provided by other universities. Extensive quality control of the data was carried out and seasonal anomalies interpolated using kriging. This paper documents the CenTrends methodology and data. PMID:26451250

  12. Domino: Extracting, Comparing, and Manipulating Subsets across Multiple Tabular Datasets

    PubMed Central

    Gratzl, Samuel; Gehlenborg, Nils; Lex, Alexander; Pfister, Hanspeter; Streit, Marc

    2016-01-01

    Answering questions about complex issues often requires analysts to take into account information contained in multiple interconnected datasets. A common strategy in analyzing and visualizing large and heterogeneous data is dividing it into meaningful subsets. Interesting subsets can then be selected and the associated data and the relationships between the subsets visualized. However, neither the extraction and manipulation nor the comparison of subsets is well supported by state-of-the-art techniques. In this paper we present Domino, a novel multiform visualization technique for effectively representing subsets and the relationships between them. By providing comprehensive tools to arrange, combine, and extract subsets, Domino allows users to create both common visualization techniques and advanced visualizations tailored to specific use cases. In addition to the novel technique, we present an implementation that enables analysts to manage the wide range of options that our approach offers. Innovative interactive features such as placeholders and live previews support rapid creation of complex analysis setups. We introduce the technique and the implementation using a simple example and demonstrate scalability and effectiveness in a use case from the field of cancer genomics. PMID:26356916

  13. Data Machine Independence

    1994-12-30

    Data-machine independence achieved by using four technologies (ASN.1, XDR, SDS, and ZEBRA) has been evaluated by encoding two different applications in each of the above and comparing the results against the standard programming method using C.

  14. Media independent interface

    NASA Technical Reports Server (NTRS)

    1987-01-01

    The work done on the Media Independent Interface (MII) Interface Control Document (ICD) program is described, and recommendations based on it are made. Explanations and rationale for the content of the ICD itself are presented.

  15. Identification of rogue datasets in serial crystallography

    PubMed Central

    Assmann, Greta; Brehm, Wolfgang; Diederichs, Kay

    2016-01-01

    Advances in beamline optics, detectors and X-ray sources allow new techniques of crystallographic data collection. In serial crystallography, a large number of partial datasets from crystals of small volume are measured. Merging of datasets from different crystals in order to enhance data completeness and accuracy is only valid if the crystals are isomorphous, i.e. sufficiently similar in cell parameters, unit-cell contents and molecular structure. Identification and exclusion of non-isomorphous datasets is therefore indispensable and must be done by means of suitable indicators. To identify rogue datasets, the influence of each dataset on CC1/2 [Karplus & Diederichs (2012). Science, 336, 1030–1033], the correlation coefficient between pairs of intensities averaged in two randomly assigned subsets of observations, is evaluated. The presented method employs a precise calculation of CC1/2 that avoids the random assignment, and instead of using an overall CC1/2, an average over resolution shells is employed to obtain sensible results. The selection procedure was verified by measuring the correlation of observed (merged) intensities and intensities calculated from a model. It is found that inclusion and merging of non-isomorphous datasets may bias the refined model towards those datasets, and measures to reduce this effect are suggested. PMID:27275144
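
    The conventional random-half estimate of CC1/2, computed per resolution shell, is sketched below; the paper's contribution is an exact calculation that avoids the random assignment.

      import numpy as np

      def cc_half(I_half1, I_half2, shell_ids):
          """CC1/2 per resolution shell from two half-dataset mean intensities
          of the same unique reflections (random-half convention)."""
          return {s: np.corrcoef(I_half1[shell_ids == s],
                                 I_half2[shell_ids == s])[0, 1]
                  for s in np.unique(shell_ids)}

      # Influence of dataset k: compare the shell-averaged CC1/2 of the merged
      # data with and without dataset k included.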

  16. Consensus gene regulatory networks: combining multiple microarray gene expression datasets

    NASA Astrophysics Data System (ADS)

    Peeling, Emma; Tucker, Allan

    2007-09-01

    In this paper we present a method for modelling gene regulatory networks by forming a consensus Bayesian network model from multiple microarray gene expression datasets. Our method is based on combining Bayesian network graph topologies and does not require any special pre-processing of the datasets, such as re-normalisation. We evaluate our method on a synthetic regulatory network and part of the yeast heat-shock response regulatory network using publicly available yeast microarray datasets. Results are promising; the consensus networks formed provide a broader view of the potential underlying network, obtaining an increased true positive rate over networks constructed from a single data source.
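
    One simple way to combine graph topologies is a majority vote over edges. The sketch below illustrates that idea under assumed 0/1 adjacency matrices; it is not necessarily the paper's exact combination rule.

      import numpy as np

      def consensus_network(adjacencies, min_support=0.5):
          """Keep edges appearing in at least `min_support` of the input
          graphs (one learned Bayesian-network adjacency per dataset)."""
          freq = np.mean(np.stack(adjacencies), axis=0)
          return (freq >= min_support).astype(int)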

  17. Dataset of aggregate producers in New Mexico

    USGS Publications Warehouse

    Orris, Greta J.

    2000-01-01

    This report presents data, including latitude and longitude, for aggregate sites in New Mexico that were believed to be active in the period 1997-1999. The data are presented in paper form in Part A of this report and as Microsoft Excel 97 and Data Interchange Format (DIF) files in Part B. The work was undertaken as part of the effort to update information for the National Atlas. This compilation includes data from: the files of U.S. Geological Survey (USGS); company contacts; the New Mexico Bureau of Mines and Mineral Resources, New Mexico Bureau of Mine Inspection, and the Mining and Minerals Division of the New Mexico Energy, Minerals and Natural Resources Department (Hatton and others, 1998); the Bureau of Land Management Information; and direct communications with some of the aggregate operators. Additional information on most of the sites is available in Hatton and others (1998).

  18. Techniques for Exploring Cluster Compressed Geospatial-Temporal Satellite Datasets

    NASA Astrophysics Data System (ADS)

    Ashley, John M.

    detail. I examined the stability and performance of the method. I developed the Palettized Automated Coloring Algorithm (PACA) to allow automated production of hierarchical global cluster set maps and additional maps to highlight the extent and changes over time of clusters. I then developed a graphic that displays, using multiple maps and colors, the evolution of hierarchical clusters over time. I developed a custom graphic to allow visualization of large numbers of weighted geophysical data vectors. It used color, overplotting, and structural meta-data about the physical data vectors in a fashion that can be extended to other datasets. The visualization can be extended for exploratory and interactive use. I applied a combination of these new techniques and tools to study several scenarios related to previous research. The graphics provided a step towards the goal of understanding what the grid cell cluster represented in terms of the geophysical variables.

  19. The Transition of NASA EOS Datasets to WFO Operations: A Model for Future Technology Transfer

    NASA Technical Reports Server (NTRS)

    Darden, C.; Burks, J.; Jedlovec, G.; Haines, S.

    2007-01-01

    The collocation of a National Weather Service (NWS) Forecast Office with atmospheric scientists from NASA/Marshall Space Flight Center (MSFC) in Huntsville, Alabama has afforded a unique opportunity for science sharing and technology transfer. Specifically, the NWS office in Huntsville has interacted closely with research scientists within the SPoRT (Short-term Prediction Research and Transition) Center at MSFC. One significant technology transfer that has reaped dividends is the transition of unique NASA EOS polar-orbiting datasets into NWS field operations. NWS forecasters primarily rely on the AWIPS (Advanced Weather Interactive Processing System) decision support system for their day-to-day forecast and warning decision making. Unfortunately, the transition of data from operational polar orbiters or low-inclination orbiting satellites into AWIPS has been relatively slow due to a variety of reasons. The ability to integrate these high-resolution NASA datasets into operations has yielded several benefits. The MODIS (MODerate-resolution Imaging Spectroradiometer) instrument flying on the Aqua and Terra satellites provides a broad spectrum of multispectral observations at resolutions as fine as 250 m. Forecasters routinely utilize these datasets to locate fine lines, boundaries, smoke plumes, locations of fog or haze fields, and other mesoscale features. In addition, these important datasets have been transitioned to other WFOs for a variety of local uses. For instance, WFO Great Falls, Montana, utilizes the MODIS snow cover product for hydrologic planning purposes, while several coastal offices utilize the output from the MODIS and AMSR-E instruments to supplement observations in the data-sparse regions of the Gulf of Mexico and western Atlantic. In the short term, these datasets have benefited local WFOs in a variety of ways. In the longer term, the process by which these unique datasets were successfully transitioned to operations will benefit the planning and

  20. The COST-HOME monthly benchmark dataset with temperature and precipitation data for testing homogenisation algorithms

    NASA Astrophysics Data System (ADS)

    Venema, V. K. C.; Mestre, O.

    2009-04-01

    As part of the COST Action HOME (Advances in homogenisation methods of climate series: an integrated approach) a dataset is generated that will serve as a benchmark for homogenisation algorithms. Members of the Action and third parties are invited to homogenise this dataset. The results of this exercise will be analysed by the HOME Working Groups (WG) on detection (WG2) and correction (WG3) algorithms to obtain recommendations for a standard homogenisation procedure for climate data. This talk will introduce this benchmark dataset. Based upon a survey among homogenisation experts, we chose to start our work with monthly values for temperature and precipitation. Temperature and precipitation are selected because most participants consider these elements the most relevant for their studies. Furthermore, they represent two important types of statistics (additive and multiplicative). The benchmark will have three different types of datasets: real data, surrogate data and synthetic data. Real datasets will allow comparing the different homogenisation methods with the most realistic type of data and inhomogeneities. Thus this part of the benchmark is important for a faithful comparison of algorithms with each other. However, as in this case the truth is not known, it is not possible to quantify the improvements due to homogenisation. Therefore, the benchmark also has two datasets with artificial data to which we inserted known inhomogeneities: surrogate and synthetic data. The aim of surrogate data is to reproduce the structure of measured data accurately enough that it can be used as a substitute for measurements. The surrogate climate networks have the spatial and temporal auto- and cross-correlation functions of real homogenised networks as well as the (non-Gaussian) exact distribution of each station. The idealised synthetic data is based on the surrogate networks. The change is that the difference between the stations has been modelled as uncorrelated Gaussian white

  1. Hadley cell dynamics in Japanese Reanalysis-55 dataset: evaluation using other reanalysis datasets and global radiosonde network observations

    NASA Astrophysics Data System (ADS)

    Mathew, Sneha Susan; Kumar, Karanam Kishore; Subrahmanyam, Kandula Venkata

    2016-02-01

    Hadley circulation (HC) is a planetary-scale circulation spanning one-third of the globe from the tropics to the sub-tropics. Recent changes in HC width and its temporal variability are topics of paramount interest because of the climate implications they carry. The present study attempts to bring out subtropical climate change indications in the comparatively new Japanese Re-analysis (JRA55) dataset by means of the mean meridional stream function (MSF). The observed features of the HC in JRA55 are found to be reproduced in the NCEP, MERRA and ECMWF datasets, with notable differences in the magnitudes of the MSF. The calculated annual cycle of HC edges, center and total width from this dataset closely resembles the annual cycle of the respective parameters derived from the rest of the datasets, with very little inter-annual variability. For the first time, the MSF estimated using four reanalysis datasets (JRA55, NCEP, MERRA and ECMWF) is verified against observations from the Integrated Global Radiosonde Archive, using the process of subsampling. The features so estimated show a high degree of similarity amongst each other as well as with observations. The monthly trend in the total width of the HC is quantified to show a maximum expansion during the month of July, which is significant at the 95 % confidence level for all datasets. The present paper also discusses the presence of a `minor circulation' feature in the northern hemisphere which is centered on 34°N during the June and July months, but not in all years. The significance of the present study lies in evaluating the relatively new JRA55 dataset against widely used reanalysis datasets and radiosonde observations, and in revealing a minor circulation not discussed hitherto in the context of HC dynamics.
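
    The MSF is the vertical integral of the zonal-mean meridional wind: psi(lat, p) = (2 pi a cos(lat) / g) * integral from 0 to p of [v] dp. A minimal Python sketch under assumed array conventions:

      import numpy as np
      from scipy.integrate import cumulative_trapezoid

      def stream_function(v_zonal_mean, lat_deg, p_hpa):
          """Mean meridional stream function psi(p, lat) in kg/s.
          v_zonal_mean: (n_p, n_lat), pressures increasing from the model top."""
          a, g = 6.371e6, 9.81
          integ = cumulative_trapezoid(v_zonal_mean, p_hpa * 100.0,
                                       axis=0, initial=0)
          return 2 * np.pi * a * np.cos(np.deg2rad(lat_deg)) / g * integ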

  2. The Intercalibration of the Night Lights Dataset

    NASA Astrophysics Data System (ADS)

    Ziskin, D. C.; Elvidge, C.; Baugh, K.; Tuttle, B.; Ghosh, T.

    2009-12-01

    The NOAA National Geophysical Data Center (NGDC) has archived approximately 17 years of data from the Operational Linescan System (OLS) aboard the Defense Meteorological Satellite Program (DMSP), spanning from 1992 to the present and flown on 5 different satellites. However, this extraordinary record of night lights lacks an onboard calibration system, so the radiometric value of the instruments' data numbers varies due to changes in orbital parameters, sensor degradation, and internal gain settings, in addition to changes in signal strength. Without having all the information needed to calibrate the data numbers, definitive measurements of change have been elusive. We have modeled reflected moonlight from high-albedo locations (e.g. White Sands, NM) to estimate the calibrated radiance the sensor experienced. By comparing the sensors' reported uncalibrated radiance to the modeled received radiance we obtain an estimate of the sensors' efficiency. Each satellite and year can then be calibrated based on the practically invariant geophysical properties of moonlight and desert albedo. After applying this calibration, the time series varies in a more predictable fashion, with more agreement between coincident observations than we were previously able to achieve. See Figure 1 for an example of prior intercalibration (Elvidge et al., 2009). Note that the prior method failed to converge on complete agreement between the observations, and there are features in the time series that were probably introduced by an imperfect intercalibration procedure. This paper will present the intercalibration based on an improved methodology.

  3. Creating Data Pipelines for PDS Datasets

    NASA Astrophysics Data System (ADS)

    Balfanz, Ryan; Armbrust, M.; Smith, A.; Gay, P. L.

    2010-01-01

    We present the details of an image processing pipeline and a new Python library providing a convenient interface to Planetary Data System (PDS) data products. The library aims to be a useful tool for general-purpose PDS processing. Test images have been extracted from existing PDS data products using the library, which will also work with lunar images from LRO/LROC. To process high-volume datasets we employ Hadoop, an open-source framework implementing the Map/Reduce paradigm for writing data-intensive distributed applications. By harnessing a cluster of processing nodes we are able to extract raw images from data products and convert them to web-friendly formats at the rate of gigabytes per minute. The resultant images have been converted using the Python Imaging Library. Additionally, the images have been cropped to postage-stamp images supporting various zoom levels. The final images, along with some metadata, are uploaded to Amazon's S3 data storage system, where they are served. Preliminary tests of the pipeline are promising, having processed 10,000 sample files totaling 30 GB in 15 minutes. The resultant JPEGs totaled only 3 GB after compression. The code base has not only proven successful in its own right, but also shows Python, an interpreted language, to be a viable alternative to more mainstream compiled languages such as C/C++ or Fortran, especially when combined with Hadoop. This work was funded through NASA ROSES NNX09AD34G.
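
    The per-image conversion step can be illustrated with the Python Imaging Library; the paths, zoom factors and JPEG quality below are placeholders.

      from PIL import Image

      def to_web_jpegs(src_path, out_prefix, zooms=(1, 2, 4)):
          """Convert an extracted PDS raster to JPEGs at several zoom levels."""
          img = Image.open(src_path).convert("L")    # greyscale source
          for z in zooms:
              w, h = img.size
              small = img.resize((max(1, w // z), max(1, h // z)))
              small.save(f"{out_prefix}_z{z}.jpg", quality=85)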

  4. Precipitation comparison for the CFSR, MERRA, TRMM3B42 and Combined Scheme datasets in Bolivia

    NASA Astrophysics Data System (ADS)

    Blacutt, Luis A.; Herdies, Dirceu L.; de Gonçalves, Luis Gustavo G.; Vila, Daniel A.; Andrade, Marcos

    2015-09-01

    An overwhelming number of applications depend on reliable precipitation estimates. However, over complex terrain in regions such as the Andes or the southwestern Amazon, the spatial coverage of rain gauges is scarce. Two reanalysis datasets, a satellite algorithm, and a scheme that combines surface observations with satellite estimates were selected for studying rainfall in the following areas of Bolivia: the central Andes, Altiplano, southwestern Amazonia, and Chaco. These Bolivian regions can be divided into three main basins: the Altiplano, La Plata, and Amazon. The selected reanalyses were the Modern-Era Retrospective Analysis for Research and Applications, whose horizontal resolution (~ 50 km) is conducive to studying rainfall in relatively small precipitation systems, and the Climate Forecast System Reanalysis and Reforecast, which features an improved horizontal resolution (~ 38 km). The third dataset was the seventh version of the Tropical Rainfall Measurement Mission 3B42 algorithm, which allows studying rainfall at an ~ 25 km horizontal resolution. The fourth dataset utilizes a new technique known as the Combined Scheme, which successfully removes satellite bias. All four of these datasets were aggregated to a coarser resolution. Additionally, the daily totals were calculated to match the cumulative daily values of the ground observations. This research aimed to describe and compare precipitation in the two reanalysis datasets, the satellite-algorithm dataset, and the Combined Scheme against ground observations. Two seasons were selected for studying the precipitation estimates: the rainy season (December-February) and the dry season (June-August). The average, bias, standard deviation, correlation coefficient, and root mean square error were calculated. Moreover, a contingency table was generated to calculate the accuracy, frequency bias, probability of detection, false alarm ratio, and equitable threat score. All four datasets correctly
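
    For reference, all of the contingency-table scores named above derive from the counts of hits, misses, false alarms and correct negatives at a wet-day threshold. A minimal sketch (the 0.1 mm threshold is an assumption, not taken from the paper):

        import numpy as np

        def contingency_scores(obs, est, wet=0.1):
            """Rain/no-rain contingency statistics for daily totals (mm)."""
            o, e = obs >= wet, est >= wet
            hits = np.sum(o & e)
            misses = np.sum(o & ~e)
            fals = np.sum(~o & e)
            corneg = np.sum(~o & ~e)
            n = hits + misses + fals + corneg
            hits_rand = (hits + misses) * (hits + fals) / n  # chance hits
            return {
                "accuracy": (hits + corneg) / n,
                "frequency_bias": (hits + fals) / (hits + misses),
                "pod": hits / (hits + misses),
                "far": fals / (hits + fals),
                "ets": (hits - hits_rand) / (hits + misses + fals - hits_rand),
            }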

  5. Level 2 Ancillary Products and Datasets Algorithm Theoretical Basis

    NASA Technical Reports Server (NTRS)

    Diner, D.; Abdou, W.; Gordon, H.; Kahn, R.; Knyazikhin, Y.; Martonchik, J.; McDonald, D.; McMuldroch, S.; Myneni, R.; West, R.

    1999-01-01

    This Algorithm Theoretical Basis (ATB) document describes the algorithms used to generate the parameters of certain ancillary products and datasets used during Level 2 processing of Multi-angle Imaging SpectroRadiometer (MISR) data.

  6. Constructing Phylogenetic Networks Based on the Isomorphism of Datasets

    PubMed Central

    Wang, Juan; Zhang, Zhibin; Li, Yanjuan

    2016-01-01

    Constructing rooted phylogenetic networks from rooted phylogenetic trees has become an important problem in molecular evolution. Many methods have been presented in this area, among which the most efficient are based on the incompatible graph, such as CASS, LNETWORK, and BIMLR. This paper investigates the commonalities of the methods based on the incompatible graph, the relationship between the incompatible graph and the phylogenetic network, and the topologies of incompatible graphs. We can find all the simplest datasets for a topology G and construct a network for each such dataset. For any dataset 𝒞, we can then compute a network from the network representing the simplest dataset that is isomorphic to 𝒞. This process saves time for the algorithms when constructing networks. PMID:27547759

  7. BMDExpress Data Viewer: A Visualization Tool to Analyze BMDExpress Datasets

    EPA Science Inventory

    Regulatory agencies increasingly apply benchmark dose (BMD) modeling to determine points of departure in human risk assessments. BMDExpress applies BMD modeling to transcriptomics datasets and groups genes to biological processes and pathways for rapid assessment of doses at whic...

  8. Constructing Phylogenetic Networks Based on the Isomorphism of Datasets.

    PubMed

    Wang, Juan; Zhang, Zhibin; Li, Yanjuan

    2016-01-01

    Constructing rooted phylogenetic networks from rooted phylogenetic trees has become an important problem in molecular evolution. Many methods have been presented in this area, among which the most efficient are based on the incompatible graph, such as CASS, LNETWORK, and BIMLR. This paper investigates the commonalities of the methods based on the incompatible graph, the relationship between the incompatible graph and the phylogenetic network, and the topologies of incompatible graphs. We can find all the simplest datasets for a topology G and construct a network for each such dataset. For any dataset 𝒞, we can then compute a network from the network representing the simplest dataset that is isomorphic to 𝒞. This process saves time for the algorithms when constructing networks. PMID:27547759

  9. Identifying Martian Hydrothermal Sites: Geological Investigation Utilizing Multiple Datasets

    NASA Technical Reports Server (NTRS)

    Dohm, J. M.; Baker, V. R.; Anderson, R. C.; Scott, D. H.; Rice, J. W., Jr.; Hare, T. M.

    2000-01-01

    Comprehensive geological investigations of martian landscapes that may have been modified by magmatic-driven hydrothermal activity, utilizing multiple datasets, will yield prime target sites for future hydrological, mineralogical, and biological investigations.

  10. Comparison of Eight Different Precipitation Datasets for South America

    NASA Astrophysics Data System (ADS)

    Pinto, L. C.; Costa, M. H.; Diniz, L. F.

    2007-05-01

    Long and continuous meteorological data series for large areas are hard to obtain, so several groups have developed climate datasets generated through the combination of models with observed and remote sensing data, including reanalysis products. This study compares eight different precipitation datasets for South America (NCEP/NCAR-2, ERA-40, CMAP, GPCP, CRU, CPTEC, TRMM, Legates and Willmott, Leemans and Cramer). For each dataset, we analyze the four moments of the data distribution (mean, variance, skewness, kurtosis) by latitude, for the major river basins and for the major vegetation types of the continent, allowing us to identify the geographical variations in each dataset. We verified that significant differences exist among the precipitation products.

  11. Knowledge Discovery Workflows in the Exploration of Complex Astronomical Datasets

    NASA Astrophysics Data System (ADS)

    D'Abrusco, Raffaele; Fabbiano, Giuseppina; Laurino, Omar; Massaro, Francesco

    2015-03-01

    The massive amount of data produced by the recent multi-wavelength large-area surveys has spurred the growth of unprecedentedly massive and complex astronomical datasets that are rendering traditional data analysis techniques more and more inadequate. Knowledge discovery techniques, while relatively new to astronomy, have been successfully applied in several other quantitative disciplines for the determination of patterns in extremely complex datasets. The concerted use of different unsupervised and supervised machine learning techniques, in particular, can be a powerful approach to answering specific questions involving high-dimensional datasets and degenerate observables. In this paper I present CLaSPS, a data-driven methodology for the discovery of patterns in high-dimensional astronomical datasets based on the combination of clustering techniques and pattern-recognition algorithms. I also describe the results of the application of CLaSPS to a sample of a peculiar class of AGNs, the blazars.

  12. Evaluation of Global Observations-Based Evapotranspiration Datasets and IPCC AR4 Simulations

    NASA Technical Reports Server (NTRS)

    Mueller, B.; Seneviratne, S. I.; Jimenez, C.; Corti, T.; Hirschi, M.; Balsamo, G.; Ciais, P.; Dirmeyer, P.; Fisher, J. B.; Guo, Z.; Jung, M.; Maignan, F.; McCabe, M. F.; Reichle, R.; Reichstein, M.; Rodell, M.; Sheffield, J.; Teuling, A. J.; Wang, K.; Wood, E. F.; Zhang, Y.

    2011-01-01

    Quantification of global land evapotranspiration (ET) has long been associated with large uncertainties due to the lack of reference observations. Several recently developed products now provide the capacity to estimate ET at global scales. These products, partly based on observational data, include satellite-based products, land surface model (LSM) simulations, atmospheric reanalysis output, estimates based on empirical upscaling of eddy-covariance flux measurements, and atmospheric water balance datasets. The LandFlux-EVAL project aims to evaluate and compare these newly developed datasets. Additionally, an evaluation of IPCC AR4 global climate model (GCM) simulations is presented, providing an assessment of their capacity to reproduce flux behavior relative to the observations-based products. Though differently constrained with observations, the analyzed reference datasets display similar large-scale ET patterns. ET from the IPCC AR4 simulations was significantly smaller than that from the other products for India (up to 1 mm/d) and parts of eastern South America, and larger in the western USA, Australia and China. The inter-product variance is lower across the IPCC AR4 simulations than across the reference datasets in several regions, which indicates that uncertainties may be underestimated in the IPCC AR4 models due to shared biases of these simulations.

  13. Query Large Scale Microarray Compendium Datasets Using a Model-Based Bayesian Approach with Variable Selection

    PubMed Central

    Hu, Ming; Qin, Zhaohui S.

    2009-01-01

    In microarray gene expression data analysis, it is often of interest to identify genes that share similar expression profiles with a particular gene, such as a key regulatory protein. Multiple studies have been conducted using various correlation measures to identify co-expressed genes. While these approaches work well for small datasets, the heterogeneity introduced by increased sample size inevitably reduces their sensitivity and specificity, because most co-expression relationships do not extend to all experimental conditions. With the rapid increase in the size of microarray datasets, identifying functionally related genes from large and diverse microarray gene expression datasets is a key challenge. We develop a model-based gene expression query algorithm built under the Bayesian model selection framework. It is capable of detecting co-expression profiles under a subset of samples/experimental conditions. In addition, it allows linearly transformed expression patterns to be recognized and is robust against sporadic outliers in the data. Both features are critically important for increasing the power of identifying co-expressed genes in large-scale gene expression datasets. Our simulation studies suggest that this method outperforms existing correlation-coefficient or mutual-information-based query tools. When we apply this new method to the Escherichia coli microarray compendium data, it identifies a majority of known regulons as well as novel potential target genes of numerous key transcription factors. PMID:19214232

  14. Development of a 10-year-old full body geometric dataset for computational modeling.

    PubMed

    Mao, Haojie; Holcombe, Sven; Shen, Ming; Jin, Xin; Wagner, Christina D; Wang, Stewart C; Yang, King H; King, Albert I

    2014-10-01

    The objective of this study was to create a computer-aided design (CAD) geometric dataset of a 10-year-old (10 YO) child. The study comprised two phases. In Phase One, the 10 YO whole-body CAD was developed from component computed tomography and magnetic resonance imaging scans of 12 pediatric subjects. Geometrical scaling methods were used to convert all component parts to the average size for a 10 YO child, based on available anthropometric data. The component surfaces were then compiled and integrated into a complete body. The bony structures and flesh were adjusted to be symmetric to minimize bias from a single subject while maintaining anthropometric measurements. Internal organs including the liver, spleen, and kidney were further verified against literature data. In Phase Two, internal characteristics of the cervical spine disc, wrist, hand, pelvis, femur, and tibia were verified with data measured from an additional 94 10 YO children. The CAD dataset developed through these processes was mostly within the corridor of one standard deviation (SD) of the mean. In conclusion, a geometric dataset for an average-size 10 YO child was created. The dataset serves as a foundation for developing computational 10 YO whole-body models for enhanced pediatric injury prevention. PMID:25118667

  15. Large-scale imputation of epigenomic datasets for systematic annotation of diverse human tissues.

    PubMed

    Ernst, Jason; Kellis, Manolis

    2015-04-01

    With hundreds of epigenomic maps, the opportunity arises to exploit the correlated nature of epigenetic signals, across both marks and samples, for large-scale prediction of additional datasets. Here, we undertake epigenome imputation by leveraging such correlations through an ensemble of regression trees. We impute 4,315 high-resolution signal maps, of which 26% are also experimentally observed. Imputed signal tracks show overall similarity to observed signals and surpass experimental datasets in consistency, recovery of gene annotations and enrichment for disease-associated variants. We use the imputed data to detect low-quality experimental datasets, to find genomic sites with unexpected epigenomic signals, to define high-priority marks for new experiments and to delineate chromatin states in 127 reference epigenomes spanning diverse tissues and cell types. Our imputed datasets provide the most comprehensive human regulatory region annotation to date, and our approach and the ChromImpute software constitute a useful complement to large-scale experimental mapping of epigenomic information. PMID:25690853
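
    ChromImpute itself trains ensembles of regression trees on features assembled from the same mark in other samples and from other marks in the same sample; the sketch below is only a generic stand-in for that idea, using scikit-learn's random forest on synthetic tracks rather than the authors' software.

        import numpy as np
        from sklearn.ensemble import RandomForestRegressor

        rng = np.random.default_rng(0)
        X = rng.normal(size=(10_000, 8))   # stand-in: correlated signal tracks per bin
        y = X[:, :3].sum(axis=1) + rng.normal(scale=0.5, size=10_000)  # target mark

        model = RandomForestRegressor(n_estimators=50, min_samples_leaf=20, n_jobs=-1)
        model.fit(X[:8_000], y[:8_000])    # train where the mark was observed
        imputed = model.predict(X[8_000:]) # impute the unobserved bins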

  16. Estimating multivariate similarity between neuroimaging datasets with sparse canonical correlation analysis: an application to perfusion imaging

    PubMed Central

    Rosa, Maria J.; Mehta, Mitul A.; Pich, Emilio M.; Risterucci, Celine; Zelaya, Fernando; Reinders, Antje A. T. S.; Williams, Steve C. R.; Dazzan, Paola; Doyle, Orla M.; Marquand, Andre F.

    2015-01-01

    An increasing number of neuroimaging studies are based on either combining more than one data modality (inter-modal) or combining more than one measurement from the same modality (intra-modal). To date, most intra-modal studies using multivariate statistics have focused on differences between datasets, for instance relying on classifiers to differentiate between effects in the data. However, to fully characterize these effects, multivariate methods able to measure similarities between datasets are needed. One classical technique for estimating the relationship between two datasets is canonical correlation analysis (CCA). However, in the context of high-dimensional data the application of CCA is extremely challenging. A recent extension of CCA, sparse CCA (SCCA), overcomes this limitation, by regularizing the model parameters while yielding a sparse solution. In this work, we modify SCCA with the aim of facilitating its application to high-dimensional neuroimaging data and finding meaningful multivariate image-to-image correspondences in intra-modal studies. In particular, we show how the optimal subset of variables can be estimated independently and we look at the information encoded in more than one set of SCCA transformations. We illustrate our framework using Arterial Spin Labeling data to investigate multivariate similarities between the effects of two antipsychotic drugs on cerebral blood flow. PMID:26528117

  17. Estimating multivariate similarity between neuroimaging datasets with sparse canonical correlation analysis: an application to perfusion imaging.

    PubMed

    Rosa, Maria J; Mehta, Mitul A; Pich, Emilio M; Risterucci, Celine; Zelaya, Fernando; Reinders, Antje A T S; Williams, Steve C R; Dazzan, Paola; Doyle, Orla M; Marquand, Andre F

    2015-01-01

    An increasing number of neuroimaging studies are based on either combining more than one data modality (inter-modal) or combining more than one measurement from the same modality (intra-modal). To date, most intra-modal studies using multivariate statistics have focused on differences between datasets, for instance relying on classifiers to differentiate between effects in the data. However, to fully characterize these effects, multivariate methods able to measure similarities between datasets are needed. One classical technique for estimating the relationship between two datasets is canonical correlation analysis (CCA). However, in the context of high-dimensional data the application of CCA is extremely challenging. A recent extension of CCA, sparse CCA (SCCA), overcomes this limitation, by regularizing the model parameters while yielding a sparse solution. In this work, we modify SCCA with the aim of facilitating its application to high-dimensional neuroimaging data and finding meaningful multivariate image-to-image correspondences in intra-modal studies. In particular, we show how the optimal subset of variables can be estimated independently and we look at the information encoded in more than one set of SCCA transformations. We illustrate our framework using Arterial Spin Labeling data to investigate multivariate similarities between the effects of two antipsychotic drugs on cerebral blood flow. PMID:26528117
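
    A common way to obtain sparse canonical vectors is to alternate soft-thresholded power iterations on the cross-covariance matrix (the penalized matrix decomposition heuristic); the sketch below follows that generic recipe, not the authors' exact modification, and assumes the columns of X and Y are centered.

        import numpy as np

        def soft_threshold(v, t):
            return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

        def scca_pair(X, Y, l1=0.1, n_iter=100):
            """One pair of sparse canonical weight vectors (u, v)."""
            C = X.T @ Y / len(X)                      # cross-covariance, p x q
            v = np.ones(C.shape[1]) / np.sqrt(C.shape[1])
            for _ in range(n_iter):
                u = soft_threshold(C @ v, l1)
                u /= np.linalg.norm(u) + 1e-12
                v = soft_threshold(C.T @ u, l1)
                v /= np.linalg.norm(v) + 1e-12
            return u, v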

  18. Making of a solar spectral irradiance dataset I: observations, uncertainties, and methods

    NASA Astrophysics Data System (ADS)

    Schöll, Micha; Dudok de Wit, Thierry; Kretzschmar, Matthieu; Haberreiter, Margit

    2016-03-01

    Context. Changes in the spectral solar irradiance (SSI) are a key driver of the variability of the Earth's environment, strongly affecting the upper atmosphere, but also impacting climate. However, its measurements have been sparse and of different quality. The "First European Comprehensive Solar Irradiance Data Exploitation project" (SOLID) aims at merging the complete set of European irradiance data, complemented by archive data that include data from non-European missions. Aims: As part of SOLID, we present all available space-based SSI measurements, reference spectra, and relevant proxies in a unified format with regular temporal re-gridding, interpolation, gap-filling as well as associated uncertainty estimations. Methods: We apply a coherent methodology to all available SSI datasets. Our pipeline approach consists of the pre-processing of the data, the interpolation of missing data by utilizing the spectral coherency of SSI, the temporal re-gridding of the data, an instrumental outlier detection routine, and a proxy-based interpolation for missing and flagged values. In particular, to detect instrumental outliers, we combine an autoregressive model with proxy data. We independently estimate the precision and stability of each individual dataset and flag all changes due to processing in an accompanying quality mask. Results: We present a unified database of solar activity records with accompanying meta-data and uncertainties. Conclusions: This dataset can be used for further investigations of the long-term trend of solar activity and the construction of a homogeneous SSI record.

  19. Nine martian years of dust optical depth observations: A reference dataset

    NASA Astrophysics Data System (ADS)

    Montabone, Luca; Forget, Francois; Kleinboehl, Armin; Kass, David; Wilson, R. John; Millour, Ehouarn; Smith, Michael; Lewis, Stephen; Cantor, Bruce; Lemmon, Mark; Wolff, Michael

    2016-07-01

    We present a multi-annual reference dataset of the horizontal distribution of airborne dust from martian year 24 to 32 using observations of the martian atmosphere from April 1999 to June 2015 made by the Thermal Emission Spectrometer (TES) aboard Mars Global Surveyor, the Thermal Emission Imaging System (THEMIS) aboard Mars Odyssey, and the Mars Climate Sounder (MCS) aboard Mars Reconnaissance Orbiter (MRO). Our methodology to build the dataset works by gridding the available retrievals of column dust optical depth (CDOD) from TES and THEMIS nadir observations, as well as the estimates of this quantity from MCS limb observations. The resulting (irregularly) gridded maps (one per sol) were validated with independent observations of CDOD by PanCam cameras and Mini-TES spectrometers aboard the Mars Exploration Rovers "Spirit" and "Opportunity", by the Surface Stereo Imager aboard the Phoenix lander, and by the Compact Reconnaissance Imaging Spectrometer for Mars aboard MRO. Finally, regular maps of CDOD are produced by spatially interpolating the irregularly gridded maps using a kriging method. These latter maps are used as dust scenarios in the Mars Climate Database (MCD) version 5, and are useful in many modelling applications. The two datasets (daily irregularly gridded maps and regularly kriged maps) for the nine available martian years are publicly available as NetCDF files and can be downloaded from the MCD website at the URL: http://www-mars.lmd.jussieu.fr/mars/dust_climatology/index.html

  20. Efficient genotype compression and analysis of large genetic variation datasets

    PubMed Central

    Layer, Ryan M.; Kindlon, Neil; Karczewski, Konrad J.; Quinlan, Aaron R.

    2015-01-01

    Genotype Query Tools (GQT) is a new indexing strategy that expedites analyses of genome variation datasets in VCF format based on sample genotypes, phenotypes and relationships. GQT’s compressed genotype index minimizes decompression for analysis, and performance relative to existing methods improves with cohort size. We show substantial (up to 443 fold) performance gains over existing methods and demonstrate GQT’s utility for exploring massive datasets involving thousands to millions of genomes. PMID:26550772
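
    The idea behind a genotype index, answering sample-subset queries with bitwise operations instead of re-parsing genotypes, can be sketched with two bit-planes per variant. This is a conceptual illustration only, not GQT's actual word-aligned hybrid compressed format.

        import numpy as np

        rng = np.random.default_rng(1)
        gts = rng.integers(0, 3, size=(1000, 64))   # 0/1/2 alt alleles per sample

        # two bit-planes per variant: ">=1 alt allele" and "homozygous alt"
        plane_any = np.packbits(gts >= 1, axis=1)
        plane_hom = np.packbits(gts == 2, axis=1)

        # query: alternate-allele count over the first 32 samples only,
        # answered with bitwise AND plus popcount on the planes
        subset = np.zeros(64, dtype=bool)
        subset[:32] = True
        m = np.packbits(subset)
        alt_count = (np.unpackbits(plane_any & m, axis=1).sum(axis=1)
                     + np.unpackbits(plane_hom & m, axis=1).sum(axis=1))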

  1. Sampling Within k-Means Algorithm to Cluster Large Datasets

    SciTech Connect

    Bejarano, Jeremy; Bose, Koushiki; Brannan, Tyler; Thomas, Anita; Adragni, Kofi; Neerchal, Nagaraj; Ostrouchov, George

    2011-08-01

    Due to current data collection technology, our ability to gather data has surpassed our ability to analyze it. In particular, k-means, one of the simplest and fastest clustering algorithms, is ill-equipped to handle extremely large datasets on even the most powerful machines. Our new algorithm uses a sample from a dataset to decrease runtime by reducing the amount of data analyzed. We perform a simulation study to compare our sampling-based k-means to the standard k-means algorithm by analyzing both the speed and accuracy of the two methods. Results show that our algorithm is significantly more efficient than the existing algorithm, with comparable accuracy. Further work on this project might include a more comprehensive study, both on more varied test datasets and on real weather datasets; this is especially important considering that this preliminary study was performed on rather tame datasets. Such studies should also analyze the performance of the algorithm for varied values of k. Lastly, this paper showed that the algorithm was accurate for relatively low sample sizes. We would like to analyze this further to see how accurate the algorithm is for even lower sample sizes: by manipulating width and confidence level, we could find the lowest sample sizes for which the algorithm remains acceptably accurate. In order for our algorithm to be a success, it needs to meet two benchmarks: match the accuracy of the standard k-means algorithm and significantly reduce runtime. Both goals are accomplished for all six datasets analyzed. However, on datasets of three and four dimensions, as the data become more difficult to cluster, both algorithms fail to obtain the correct classifications on some trials. Nevertheless, our algorithm consistently matches the performance of the standard algorithm while being remarkably more efficient with time. Therefore, we conclude that analysts can use our algorithm, expecting accurate results in considerably less time.
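
    A minimal sketch of the sampling idea, assuming scikit-learn: fit k-means on a simple random sample, then assign every point of the full dataset to the sampled centroids. Sample size and k are placeholders, not values from the study.

        import numpy as np
        from sklearn.cluster import KMeans
        from sklearn.datasets import make_blobs

        X, _ = make_blobs(n_samples=200_000, centers=5, random_state=0)

        rng = np.random.default_rng(0)
        sample = X[rng.choice(len(X), size=5_000, replace=False)]

        km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(sample)
        labels = km.predict(X)   # full-data assignment from sampled centroids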

  2. Toward Computational Cumulative Biology by Combining Models of Biological Datasets

    PubMed Central

    Faisal, Ali; Peltonen, Jaakko; Georgii, Elisabeth; Rung, Johan; Kaski, Samuel

    2014-01-01

    A main challenge of data-driven sciences is how to make maximal use of the progressively expanding databases of experimental datasets in order to keep research cumulative. We introduce the idea of a modeling-based dataset retrieval engine designed for relating a researcher's experimental dataset to earlier work in the field. The search is (i) data-driven to enable new findings, going beyond the state of the art of keyword searches in annotations, (ii) modeling-driven, to include both biological knowledge and insights learned from data, and (iii) scalable, as it is accomplished without building one unified grand model of all data. Assuming each dataset has been modeled beforehand, by the researchers or automatically by database managers, we apply a rapidly computable and optimizable combination model to decompose a new dataset into contributions from earlier relevant models. By using the data-driven decomposition, we identify a network of interrelated datasets from a large annotated human gene expression atlas. While tissue type and disease were major driving forces for determining relevant datasets, the found relationships were richer, and the model-based search was more accurate than the keyword search; moreover, it recovered biologically meaningful relationships that are not straightforwardly visible from annotations—for instance, between cells in different developmental stages such as thymocytes and T-cells. Data-driven links and citations matched to a large extent; the data-driven links even uncovered corrections to the publication data, as two of the most linked datasets were not highly cited and turned out to have wrong publication entries in the database. PMID:25427176

  3. Independent NOAA considered

    NASA Astrophysics Data System (ADS)

    Richman, Barbara T.

    A proposal to pull the National Oceanic and Atmospheric Administration (NOAA) out of the Department of Commerce and make it an independent agency was the subject of a recent congressional hearing. Supporters within the science community and in Congress said that an independent NOAA will benefit by being more visible and by not being tied to a cabinet-level department whose main concerns lie elsewhere. The proposal's critics, however, cautioned that making NOAA independent could make it even more vulnerable to the budget axe and would sever the agency's direct access to the President.The separation of NOAA from Commerce was contained in a June 1 proposal by President Ronald Reagan that also called for all federal trade functions under the Department of Commerce to be reorganized into a new Department of International Trade and Industry (DITI).

  4. Independent technical review, handbook

    SciTech Connect

    Not Available

    1994-02-01

    Purpose: To provide an independent engineering review of the major projects being funded by the Department of Energy, Office of Environmental Restoration and Waste Management. The independent engineering review will address whether engineering practice is sufficiently developed for a major project to be executed without significant technical problems. The independent review will focus on questions related to: (1) adequacy of development of the technical base of understanding; (2) status of development and availability of technology among the various alternatives; (3) status and availability of the industrial infrastructure to support project design, equipment fabrication, facility construction, and process and program/project operation; (4) adequacy of the design effort to provide a sound foundation to support execution of the project; (5) ability of the organization to fully integrate the system, and to direct, manage, and control the execution of a complex major project.

  5. Assessing global land cover reference datasets for different user communities

    NASA Astrophysics Data System (ADS)

    Tsendbazar, N. E.; de Bruin, S.; Herold, M.

    2015-05-01

    Global land cover (GLC) maps and assessments of their accuracy provide important information for different user communities. To date, there are several GLC reference datasets which are used for assessing the accuracy of specific maps. Despite significant efforts put into generating them, their availability and role in applications outside their intended use have been very limited. This study analyses metadata information from 12 existing and forthcoming GLC reference datasets and assesses their characteristics and potential uses in the context of 4 GLC user groups, i.e., climate modellers requiring data on Essential Climate Variables (ECV), global forest change analysts, the GEO Community of Practice for Global Agricultural Monitoring and GLC map producers. We assessed user requirements with respect to the sampling scheme, thematic coverage, spatial and temporal detail and quality control of the GLC reference datasets. Suitability of the datasets is highly dependent upon specific applications by the user communities considered. The LC-CCI, GOFC-GOLD, FAO-FRA and Geo-Wiki datasets had the broadest applicability for multiple uses. The re-usability of the GLC reference datasets would be greatly enhanced by making them publicly available in an expert framework that guides users on how to use them for specific applications.

  6. Supporting independent inventors

    SciTech Connect

    Bernard, M.J. III; Whalley, P.; Loyola Univ., Chicago, IL . Dept. of Sociology)

    1989-01-01

    Independent inventors contribute products to the marketplace despite the well-financed brain trusts at corporate, university, and federal R and D laboratories. But given the environment in which the basement/garage inventor labors, transferring a worthwhile invention into a commercial product is quite difficult. There is a growing effort by many state and local agencies and organizations to improve the inventor's working environment and to begin to routinize the process of developing the ideas and inventions of independent inventors into commercial products. 4 refs.

  7. "Uncertainty in downscaling using high-resolution observational datasets"

    NASA Astrophysics Data System (ADS)

    Oswald, E.; Rood, R. B.

    2013-12-01

    In order to bridge the gap between the resolution of global climate modeling efforts and the scale at which decision-makers work, statistical downscaling is often employed. The performance of any statistical downscaling is dependent on the quality of the observational data at the location(s) of downscaling (whether gridded or point-source). However, discussions of the assumptions made during statistical downscaling, such as the stationarity of the relationships between predictor(s) and predictand, normally do not acknowledge the uncertainty introduced by the observational dataset itself. Many observational datasets have not had the erroneous temporal discontinuities caused by non-climatic biases, such as instrument changes or station relocations, diminished by a homogenization process. Moreover, stations included within the underlying networks of high-resolution gridded datasets are typically not required to meet high standards of quality. Hence we evaluated three popular observational climate datasets of the high-resolution gridded type for their depiction of temperature values over the span of the datasets and the continental U.S. This was done using the homogenized United States Historical Climatology Network (USHCN) dataset version 2.0. The summer average temperatures at selected stations within the USHCN were compared to those created by interpolating gridpoints to the locations of those stations. The relationships these datasets have with more traditional climate datasets (e.g., GISS, CRU, USHCN) have not formally been evaluated. The comparisons were not intended to judge which dataset was most closely aligned with the USHCN dataset, but rather to discuss the common features (across datasets) of the residuals (i.e., differences with the USHCN dataset). We found that the lack of homogenization was a primary cause of the residuals, but that proxies for the non-climatic biases were not as well related to the residuals as expected. This was due in part to the gridding process that
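
    The grid-to-station comparison described above amounts to interpolating each gridded summer-mean field to the station coordinates and differencing against the homogenized USHCN values. A sketch with synthetic placeholder coordinates and values:

        import numpy as np
        from scipy.interpolate import RegularGridInterpolator

        # stand-in gridded summer-mean temperature field (deg C)
        lats = np.arange(25.0, 50.0, 0.5)
        lons = np.arange(-125.0, -65.0, 0.5)
        grid = 20 + np.random.default_rng(0).normal(scale=0.1,
                                                    size=(lats.size, lons.size))

        interp = RegularGridInterpolator((lats, lons), grid)
        stations = np.array([[39.1, -94.6], [42.3, -83.0]])  # (lat, lon) pairs
        station_obs = np.array([25.1, 23.4])                 # homogenized means

        residuals = interp(stations) - station_obs           # dataset minus USHCN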

  8. Developing a regional retrospective ensemble precipitation dataset for watershed hydrology modeling, Idaho, USA

    NASA Astrophysics Data System (ADS)

    Flores, A. N.; Smith, K.; LaPorte, P.

    2011-12-01

    Applications like flood forecasting, military trafficability assessment, and slope stability analysis necessitate the use of models capable of resolving hydrologic states and fluxes at spatial scales of hillslopes (e.g., 10s to 100s m). These models typically require precipitation forcings at spatial scales of kilometers or better and time intervals of hours. Yet in especially rugged terrain that typifies much of the Western US and throughout much of the developing world, precipitation data at these spatiotemporal resolutions is difficult to come by. Ground-based weather radars have significant problems in high-relief settings and are sparsely located, leaving significant gaps in coverage and high uncertainties. Precipitation gages provide accurate data at points but are very sparsely located and their placement is often not representative, yielding significant coverage gaps in a spatial and physiographic sense. Numerical weather prediction efforts have made precipitation data, including critically important information on precipitation phase, available globally and in near real-time. However, these datasets present watershed modelers with two problems: (1) spatial scales of many of these datasets are tens of kilometers or coarser, (2) numerical weather models used to generate these datasets include a land surface parameterization that in some circumstances can significantly affect precipitation predictions. We report on the development of a regional precipitation dataset for Idaho that leverages: (1) a dataset derived from a numerical weather prediction model, (2) gages within Idaho that report hourly precipitation data, and (3) a long-term precipitation climatology dataset. Hourly precipitation estimates from the Modern Era Retrospective-analysis for Research and Applications (MERRA) are stochastically downscaled using a hybrid orographic and statistical model from their native resolution (1/2 x 2/3 degrees) to a resolution of approximately 1 km. Downscaled

  9. Using Multiple Metadata Standards to Describe Climate Datasets in a Semantic Framework

    NASA Astrophysics Data System (ADS)

    Blumenthal, M. B.; Del Corral, J.; Bell, M.

    2007-12-01

    The standards underlying the Semantic Web -- Resource Description Framework (RDF) and Web Ontology Language (OWL), among others -- show great promise in addressing some of the basic problems in earth science metadata. In particular, they provide a single framework that allows us to describe datasets according to multiple standards, creating a more complete description than any single standard can support, and avoiding the difficult problem of creating a super-standard that can describe everything about everything. The Semantic Web standards provide a framework for explicitly describing the data models implicit in programs that display and manipulate data. They also provide a framework in which multiple metadata standards can be described. Most importantly, these data models and metadata standards can be interrelated, a key step in creating interoperability, and an important step in creating a practical system. As an exercise in understanding how this framework might be used, we have created an RDF expression of the datasets and some of the metadata in the IRI/LDEO Climate Data Library. This includes concepts like datasets, units, dependent variables, and independent variables. These datasets have been provided under diverse frameworks that have varied levels of associated metadata, including netCDF, GRIB, GeoTIFF, and OpenDAP; these frameworks have some associated concepts that are common, some that are similar, and some that are quite distinct. We have also created an RDF expression of a taxonomy that forms the basis of an earth data search interface. These concepts include location, time, quantity, realm, author, and institution. A series of inference engines using currently evolving semantic web technologies is then used to infer the connections between the diverse data-oriented concepts of the data library as well as the distinctly different conceptual framework of the data search.
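
    As an illustration of describing one dataset with terms drawn from more than one vocabulary, the sketch below uses rdflib with Dublin Core plus a made-up data-library namespace; the URIs and property names are invented for the example and are not the project's actual schema.

        from rdflib import Graph, Literal, Namespace, URIRef
        from rdflib.namespace import DCTERMS, RDF

        DL = Namespace("http://example.org/datalibrary#")  # hypothetical vocabulary
        g = Graph()
        ds = URIRef("http://example.org/dataset/precip-monthly")

        g.add((ds, RDF.type, DL.Dataset))
        g.add((ds, DCTERMS.title, Literal("Monthly precipitation")))
        g.add((ds, DL.independentVariable, Literal("time")))
        g.add((ds, DL.dependentVariable, Literal("precipitation")))
        g.add((ds, DL.units, Literal("mm/month")))

        print(g.serialize(format="turtle"))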

  10. Postcard from Independence, Mo.

    ERIC Educational Resources Information Center

    Archer, Jeff

    2004-01-01

    This article reports results showing that the Independence, Missouri school district failed to meet almost every one of its improvement goals under the No Child Left Behind Act. The state accreditation system stresses improvement over past scores, while the federal law demands specified amounts of annual progress toward the ultimate goal of 100…

  11. Touchstones of Independence.

    ERIC Educational Resources Information Center

    Roha, Thomas Arden

    1999-01-01

    Foundations affiliated with public higher education institutions can avoid having to open records for public scrutiny, by having independent boards of directors, occupying leased office space or paying market value for university space, using only foundation personnel, retaining legal counsel, being forthcoming with information and use of public…

  12. Independent Human Studies.

    ERIC Educational Resources Information Center

    Kaplan, Suzanne; Wilson, Gordon

    1978-01-01

    The Independent Human Studies program at Schoolcraft College offers an alternative method of earning academic credits. Students delineate an area of study, pose research questions, gather resources, synthesize the information, state the thesis, choose the method of presentation, set schedules, and take responsibility for meeting deadlines. (MB)

  13. Independence and Survival.

    ERIC Educational Resources Information Center

    James, H. Thomas

    Independent schools that are of viable size, well managed, and strategically located to meet competition will survive and prosper past the current financial crisis. We live in a complex technological society with insatiable demands for knowledgeable people to keep it running. The future will be marked by the orderly selection of qualified people,…

  14. Caring about Independent Lives

    ERIC Educational Resources Information Center

    Christensen, Karen

    2010-01-01

    With the rhetoric of independence, new cash for care systems were introduced in many developed welfare states at the end of the 20th century. These systems allow local authorities to pay people who are eligible for community care services directly, to enable them to employ their own careworkers. Despite the obvious importance of the careworker's…

  15. Independence, Disengagement, and Discipline

    ERIC Educational Resources Information Center

    Rubin, Ron

    2012-01-01

    School disengagement is linked to a lack of opportunities for students to fulfill their needs for independence and self-determination. Young people have little say about what, when, where, and how they will learn, the criteria used to assess their success, and the content of school and classroom rules. Traditional behavior management discourages…

  16. Coexpression analysis of large cancer datasets provides insight into the cellular phenotypes of the tumour microenvironment

    PubMed Central

    2013-01-01

    Background: Biopsies taken from individual tumours exhibit extensive differences in their cellular composition due to the inherent heterogeneity of cancers and vagaries of sample collection. As a result, genes expressed in specific cell types, or associated with certain biological processes, are detected at widely variable levels across samples in transcriptomic analyses. This heterogeneity also means that the level of expression of genes expressed specifically in a given cell type or process will vary in line with the number of those cells within samples or the activity of the pathway, and will therefore be correlated in their expression. Results: Using a novel 3D network-based approach we have analysed six large human cancer microarray datasets derived from more than 1,000 individuals. Based upon this analysis, and without needing to isolate the individual cells, we have defined a broad spectrum of cell-type- and pathway-specific gene signatures present in cancer expression data, which were also found to be largely conserved in a number of independent datasets. Conclusions: The conserved signature of the tumour-associated macrophage is shown to be largely independent of tumour cell type. All stromal cell signatures have some degree of correlation with each other, since they must all be inversely correlated with the tumour component. However, viewed in the context of established tumours, the interactions between stromal components appear to be multifactorial, given that the level of one component, e.g. vasculature, does not correlate tightly with another, such as the macrophage. PMID:23845084

  17. SALSA: A Novel Dataset for Multimodal Group Behavior Analysis.

    PubMed

    Alameda-Pineda, Xavier; Staiano, Jacopo; Subramanian, Ramanathan; Batrinca, Ligia; Ricci, Elisa; Lepri, Bruno; Lanz, Oswald; Sebe, Nicu

    2016-08-01

    Studying free-standing conversational groups (FCGs) in unstructured social settings (e.g., a cocktail party) is gratifying due to the wealth of information available at the group (mining social networks) and individual (recognizing native behavioral and personality traits) levels. However, analyzing social scenes involving FCGs is also highly challenging due to the difficulty in extracting behavioral cues such as target locations, their speaking activity and head/body pose, owing to crowdedness and the presence of extreme occlusions. To this end, we propose SALSA, a novel dataset facilitating multimodal and Synergetic sociAL Scene Analysis, and make two main contributions to research on automated social interaction analysis: (1) SALSA records social interactions among 18 participants in a natural, indoor environment for over 60 minutes, under the poster presentation and cocktail party contexts presenting difficulties in the form of low-resolution images, lighting variations, numerous occlusions, reverberations and interfering sound sources; (2) to alleviate these problems we facilitate multimodal analysis by recording the social interplay using four static surveillance cameras and sociometric badges worn by each participant, comprising a microphone, accelerometer, bluetooth and infrared sensors. In addition to raw data, we also provide annotations concerning individuals' personality as well as their position, head, body orientation and F-formation information over the entire event duration. Through extensive experiments with state-of-the-art approaches, we show (a) the limitations of current methods and (b) how the recorded multiple cues synergetically aid automatic analysis of social interactions. SALSA is available at http://tev.fbk.eu/salsa. PMID:26540677

  18. Removing Satellite Equatorial Crossing Time Biases from the OLR and HRC Datasets.

    NASA Astrophysics Data System (ADS)

    Waliser, Duane E.; Zhou, Wufeng

    1997-09-01

    artificial variability related to the satellite changes is captured and the natural variability is excluded. These modified time series are used in conjunction with their associated spatial patterns to compute the satellite-related artificial variability, which is then removed from the two datasets. These datasets provide an improved resource to study intraseasonal and longer timescale regional climate variations, large-scale interannual variability, and global-scale climate trends. Analyses of the long-term trends in both datasets show that the satellite biases induce artificial trends in the data and that these artificial trends are reduced in the corrected datasets. Further, each of the corrected datasets exhibits a trend in the tropical western-central Pacific that appears spatially independent of the satellite biases and agrees with results of previous studies that indicate an increase in precipitation has occurred in this region over the period encompassed by these datasets.

  19. Why Additional Presentations Help Identify a Stimulus

    ERIC Educational Resources Information Center

    Guest, Duncan; Kent, Christopher; Adelman, James S.

    2010-01-01

    Nosofsky (1983) reported that additional stimulus presentations within a trial increase discriminability in absolute identification, suggesting that each presentation creates an independent stimulus representation, but it remains unclear whether exposure duration or the formation of independent representations improves discrimination in such…

  20. A test-retest dataset for assessing long-term reliability of brain morphology and resting-state brain activity

    PubMed Central

    Huang, Lijie; Huang, Taicheng; Zhen, Zonglei; Liu, Jia

    2016-01-01

    We present a test-retest dataset for evaluation of long-term reliability of measures from structural and resting-state functional magnetic resonance imaging (sMRI and rfMRI) scans. The repeated scan dataset was collected from 61 healthy adults in two sessions using highly similar imaging parameters at an interval of 103–189 days. However, as the imaging parameters were not completely identical, the reliability estimated from this dataset shall reflect the lower bounds of the true reliability of sMRI/rfMRI measures. Furthermore, in conjunction with other test-retest datasets, our dataset may help explore the impact of different imaging parameters on reliability of sMRI/rfMRI measures, which is especially critical for assessing datasets collected from multiple centers. In addition, intelligence quotient (IQ) was measured for each participant using Raven’s Advanced Progressive Matrices. The data can thus be used for purposes other than assessing reliability of sMRI/rfMRI alone. For example, data from each single session could be used to associate structural and functional measures of the brain with the IQ metrics to explore brain-IQ association. PMID:26978040

  1. A test-retest dataset for assessing long-term reliability of brain morphology and resting-state brain activity.

    PubMed

    Huang, Lijie; Huang, Taicheng; Zhen, Zonglei; Liu, Jia

    2016-01-01

    We present a test-retest dataset for evaluation of long-term reliability of measures from structural and resting-state functional magnetic resonance imaging (sMRI and rfMRI) scans. The repeated scan dataset was collected from 61 healthy adults in two sessions using highly similar imaging parameters at an interval of 103-189 days. However, as the imaging parameters were not completely identical, the reliability estimated from this dataset shall reflect the lower bounds of the true reliability of sMRI/rfMRI measures. Furthermore, in conjunction with other test-retest datasets, our dataset may help explore the impact of different imaging parameters on reliability of sMRI/rfMRI measures, which is especially critical for assessing datasets collected from multiple centers. In addition, intelligence quotient (IQ) was measured for each participant using Raven's Advanced Progressive Matrices. The data can thus be used for purposes other than assessing reliability of sMRI/rfMRI alone. For example, data from each single session could be used to associate structural and functional measures of the brain with the IQ metrics to explore brain-IQ association. PMID:26978040
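
    Long-term reliability in such a two-session design is commonly summarized with an intraclass correlation. Below is a minimal sketch of ICC(2,1) (two-way random effects, absolute agreement) for one measure across the two sessions; this may or may not be the exact variant used with this dataset.

        import numpy as np

        def icc_2_1(session1, session2):
            """ICC(2,1) for n subjects each measured in two sessions."""
            Y = np.column_stack([session1, session2])
            n, k = Y.shape
            grand = Y.mean()
            ms_rows = k * ((Y.mean(axis=1) - grand) ** 2).sum() / (n - 1)
            ms_cols = n * ((Y.mean(axis=0) - grand) ** 2).sum() / (k - 1)
            resid = Y - Y.mean(axis=1, keepdims=True) - Y.mean(axis=0) + grand
            ms_err = (resid ** 2).sum() / ((n - 1) * (k - 1))
            return (ms_rows - ms_err) / (
                ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)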

  2. Discovery and Analysis of Intersecting Datasets: JMARS as a Comparative Science Platform

    NASA Astrophysics Data System (ADS)

    Carter, S.; Christensen, P. R.; Dickenshied, S.; Anwar, S.; Noss, D.

    2014-12-01

    sources under the given area. JMARS has the ability to geographically locate and display a vast array of remote sensing data for a user. In addition to its powerful searching ability, it also enables users to compare datasets using the Data Spike and Data Profile techniques. Plots and tables from this data can be exported and used in presentations, papers, or external software for further study.

  3. GLEAM v3: updated land evaporation and root-zone soil moisture datasets

    NASA Astrophysics Data System (ADS)

    Martens, Brecht; Miralles, Diego; Lievens, Hans; van der Schalie, Robin; de Jeu, Richard; Fernández-Prieto, Diego; Verhoest, Niko

    2016-04-01

    Evaporation determines the availability of surface water resources and the requirements for irrigation. In addition, through its impacts on the water, carbon and energy budgets, evaporation influences the occurrence of rainfall and the dynamics of air temperature. Therefore, reliable estimates of this flux at regional to global scales are of major importance for water management and meteorological forecasting of extreme events. However, the global-scale magnitude and variability of the flux, and the sensitivity of the underlying physical process to changes in environmental factors, are still poorly understood due to the limited global coverage of in situ measurements. Remote sensing techniques can help to overcome the lack of ground data. However, evaporation is not directly observable from satellite systems. As a result, recent efforts have focussed on combining the observable drivers of evaporation within process-based models. The Global Land Evaporation Amsterdam Model (GLEAM, www.gleam.eu) estimates terrestrial evaporation based on daily satellite observations of meteorological drivers of terrestrial evaporation, vegetation characteristics and soil moisture. Since the publication of the first version of the model in 2011, GLEAM has been widely applied for the study of trends in the water cycle, interactions between land and atmosphere and hydrometeorological extreme events. A third version of the GLEAM global datasets will be available from the beginning of 2016 and will be distributed using www.gleam.eu as gateway. The updated datasets include separate estimates for the different components of the evaporative flux (i.e. transpiration, bare-soil evaporation, interception loss, open-water evaporation and snow sublimation), as well as variables like the evaporative stress, potential evaporation, root-zone soil moisture and surface soil moisture. A new dataset using SMOS-based input data of surface soil moisture and vegetation optical depth will also be

  4. Dataset for a case report of a homozygous PEX16 F332del mutation.

    PubMed

    Bacino, Carlos; Chao, Yu-Hsin; Seto, Elaine; Lotze, Tim; Xia, Fan; Jones, Richard O; Moser, Ann; Wangler, Michael F

    2016-03-01

    This dataset provides a clinical description along with extensive biochemical and molecular characterization of a patient with a homozygous mutation in PEX16 with an atypical phenotype. This patient, described in Molecular Genetics and Metabolism Reports, was ultimately diagnosed with an atypical peroxisomal disorder on exome sequencing. A clinical timeline and diagnostic summary, and results of an extensive plasma and fibroblast analysis of this patient's peroxisomal profile, are provided. In addition, a table of additional variants from the exome analysis is provided. PMID:26870756

  5. Dataset for a case report of a homozygous PEX16 F332del mutation

    PubMed Central

    Bacino, Carlos; Chao, Yu-Hsin; Seto, Elaine; Lotze, Tim; Xia, Fan; Jones, Richard O.; Moser, Ann; Wangler, Michael F.

    2015-01-01

    This dataset provides a clinical description along with extensive biochemical and molecular characterization of a patient with a homozygous mutation in PEX16 with an atypical phenotype. This patient, described in Molecular Genetics and Metabolism Reports, was ultimately diagnosed with an atypical peroxisomal disorder on exome sequencing. A clinical timeline and diagnostic summary, and results of an extensive plasma and fibroblast analysis of this patient's peroxisomal profile, are provided. In addition, a table of additional variants from the exome analysis is provided. PMID:26870756

  6. The LANDFIRE Refresh strategy: updating the national dataset

    USGS Publications Warehouse

    Nelson, Kurtis J.; Connot, Joel A.; Peterson, Birgit; Martin, Charley

    2013-01-01

    The LANDFIRE Program provides comprehensive vegetation and fuel datasets for the entire United States. As with many large-scale ecological datasets, vegetation and landscape conditions must be updated periodically to account for disturbances, growth, and natural succession. The LANDFIRE Refresh effort was the first attempt to consistently update these products nationwide. It incorporated a combination of specific systematic improvements to the original LANDFIRE National data, remote sensing based disturbance detection methods, field collected disturbance information, vegetation growth and succession modeling, and vegetation transition processes. This resulted in the creation of two complete datasets for all 50 states: LANDFIRE Refresh 2001, which includes the systematic improvements, and LANDFIRE Refresh 2008, which includes the disturbance and succession updates to the vegetation and fuel data. The new datasets are comparable for studying landscape changes in vegetation type and structure over a decadal period, and provide the most recent characterization of fuel conditions across the country. The applicability of the new layers is discussed and the effects of using the new fuel datasets are demonstrated through a fire behavior modeling exercise using the 2011 Wallow Fire in eastern Arizona as an example.

  7. Dataset from chemical gas sensor array in turbulent wind tunnel.

    PubMed

    Fonollosa, Jordi; Rodríguez-Luján, Irene; Trincavelli, Marco; Huerta, Ramón

    2015-06-01

    The dataset includes the acquired time series of a chemical detection platform exposed to different gas conditions in a turbulent wind tunnel. The chemo-sensory elements were sampling directly the environment. In contrast to traditional approaches that include measurement chambers, open sampling systems are sensitive to dispersion mechanisms of gaseous chemical analytes, namely diffusion, turbulence, and advection, making the identification and monitoring of chemical substances more challenging. The sensing platform included 72 metal-oxide gas sensors that were positioned at 6 different locations of the wind tunnel. At each location, 10 distinct chemical gases were released in the wind tunnel, the sensors were evaluated at 5 different operating temperatures, and 3 different wind speeds were generated in the wind tunnel to induce different levels of turbulence. Moreover, each configuration was repeated 20 times, yielding a dataset of 18,000 measurements. The dataset was collected over a period of 16 months. The data is related to "On the performance of gas sensor arrays in open sampling systems using Inhibitory Support Vector Machines", by Vergara et al.[1]. The dataset can be accessed publicly at the UCI repository upon citation of [1]: http://archive.ics.uci.edu/ml/datasets/Gas+sensor+arrays+in+open+sampling+settings. PMID:26217739

  8. Realistic computer network simulation for network intrusion detection dataset generation

    NASA Astrophysics Data System (ADS)

    Payer, Garrett

    2015-05-01

    The KDD-99 Cup dataset is dead. While it can continue to be used as a toy example, the age of this dataset makes it all but useless for intrusion detection research and data mining. Many of the attacks used within the dataset are obsolete and do not reflect the features important for intrusion detection in today's networks. Creating a new dataset encompassing a large cross section of the attacks found on the Internet today could be useful, but would eventually fall to the same problem as the KDD-99 Cup; its usefulness would diminish after a period of time. To continue research into intrusion detection, the generation of new datasets needs to be as dynamic and as quick as the attacker. Simply examining existing network traffic and using domain experts such as intrusion analysts to label traffic is inefficient, expensive, and not scalable. The only viable methodology is simulation using technologies including virtualization, attack-toolsets such as Metasploit and Armitage, and sophisticated emulation of threat and user behavior. Simulating actual user behavior and network intrusion events dynamically not only allows researchers to vary scenarios quickly, but enables online testing of intrusion detection mechanisms by interacting with data as it is generated. As new threat behaviors are identified, they can be added to the simulation to make quicker determinations as to the effectiveness of existing and ongoing network intrusion technology, methodology and models.

  9. Securely measuring the overlap between private datasets with cryptosets.

    PubMed

    Swamidass, S Joshua; Matlock, Matthew; Rozenblit, Leon

    2015-01-01

    Many scientific questions are best approached by sharing data--collected by different groups or across large collaborative networks--into a combined analysis. Unfortunately, some of the most interesting and powerful datasets--like health records, genetic data, and drug discovery data--cannot be freely shared because they contain sensitive information. In many situations, knowing if private datasets overlap determines if it is worthwhile to navigate the institutional, ethical, and legal barriers that govern access to sensitive, private data. We report the first method of publicly measuring the overlap between private datasets that is secure under a malicious model without relying on private protocols or message passing. This method uses a publicly shareable summary of a dataset's contents, its cryptoset, to estimate its overlap with other datasets. Cryptosets approach "information-theoretic" security, the strongest type of security possible in cryptography, which is not even crackable with infinite computing power. We empirically and theoretically assess both the accuracy of these estimates and the security of the approach, demonstrating that cryptosets are informative, with a stable accuracy, and secure. PMID:25714898
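
    To make the idea concrete, here is a minimal sketch of a cryptoset-style overlap estimate as we read it: each party publishes only a histogram of hashed identifiers, and the overlap is estimated from the correlation of the two histograms. The bin count, hash choice, scaling, and example IDs are our assumptions, not the published protocol.

```python
import hashlib
import math

def cryptoset(ids, length=1024):
    # Public summary: a histogram of hashed private identifiers.
    counts = [0] * length
    for identifier in ids:
        digest = hashlib.sha256(identifier.encode("utf-8")).hexdigest()
        counts[int(digest, 16) % length] += 1
    return counts

def estimate_overlap(a, b):
    # Estimate |A intersect B| by scaling the Pearson correlation of the two
    # public count vectors by the geometric mean of the dataset sizes.
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    r = cov / math.sqrt(var_a * var_b)
    return r * math.sqrt(sum(a) * sum(b))

# Two hypothetical private ID lists with a true overlap of 500 records.
set_a = [f"patient-{i}" for i in range(0, 1500)]
set_b = [f"patient-{i}" for i in range(1000, 2500)]
print(estimate_overlap(cryptoset(set_a), cryptoset(set_b)))  # roughly 500
```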

  10. Dataset from chemical gas sensor array in turbulent wind tunnel

    PubMed Central

    Fonollosa, Jordi; Rodríguez-Luján, Irene; Trincavelli, Marco; Huerta, Ramón

    2015-01-01

    The dataset includes the acquired time series of a chemical detection platform exposed to different gas conditions in a turbulent wind tunnel. The chemo-sensory elements sampled the environment directly. In contrast to traditional approaches that include measurement chambers, open sampling systems are sensitive to the dispersion mechanisms of gaseous chemical analytes, namely diffusion, turbulence, and advection, making the identification and monitoring of chemical substances more challenging. The sensing platform included 72 metal-oxide gas sensors that were positioned at 6 different locations of the wind tunnel. At each location, 10 distinct chemical gases were released in the wind tunnel, the sensors were evaluated at 5 different operating temperatures, and 3 different wind speeds were generated in the wind tunnel to induce different levels of turbulence. Moreover, each configuration was repeated 20 times, yielding a dataset of 18,000 measurements. The dataset was collected over a period of 16 months. The data are related to “On the performance of gas sensor arrays in open sampling systems using Inhibitory Support Vector Machines”, by Vergara et al. [1]. The dataset can be accessed publicly at the UCI repository upon citation of [1]: http://archive.ics.uci.edu/ml/datasets/Gas+sensor+arrays+in+open+sampling+settings PMID:26217739

  11. The independent medical examination.

    PubMed

    Ameis, Arthur; Zasler, Nathan D

    2002-05-01

    The physiatrist, owing to expertise in impairment and disability analysis, is able to offer the medicolegal process considerable assistance. This chapter describes the scope and process of the independent medical examination (IME) and provides an overview of its component parts. Practical guidelines are provided for performing a physiatric IME of professional standard, and for serving as an impartial, expert witness. Caveats are described regarding testifying and medicolegal ethical issues along with practice management advice. PMID:12122847

  12. Use of country of birth as an indicator of refugee background in health datasets

    PubMed Central

    2014-01-01

    Background: Routine public health databases contain a wealth of data useful for research among vulnerable or isolated groups, who may be under-represented in traditional medical research. Identifying specific vulnerable populations, such as resettled refugees, can be particularly challenging; often country of birth is the sole indicator of whether an individual has a refugee background. The objective of this article was to review strengths and weaknesses of different methodological approaches to identifying resettled refugees and comparison groups from routine health datasets and to propose the application of additional methodological rigour in future research. Discussion: Methodological approaches to selecting refugee and comparison groups from existing routine health datasets vary widely and are often explained in insufficient detail. Linked data systems or datasets from specialized refugee health services can accurately select resettled refugee and asylum seeker groups but have limited availability and can be selective. In contrast, country of birth is commonly collected in routine health datasets but a robust method for selecting humanitarian source countries based solely on this information is required. The authors recommend use of national immigration data to objectively identify countries of birth with high proportions of humanitarian entrants, matched by time period to the study dataset. When available, additional migration indicators may help to better understand migration as a health determinant. Methodologically, if multiple countries of birth are combined, the proportion of the sample represented by each country of birth should be included, with sub-analysis of individual countries of birth potentially providing further insights, if population size allows. United Nations-defined world regions provide an objective framework for combining countries of birth when necessary. A comparison group of economic migrants from the same world region may be appropriate
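
    As an illustration of the recommended selection step, the sketch below flags countries of birth whose share of humanitarian entrants exceeds a chosen threshold. The figures, threshold, and column names are invented for the example; a real analysis would use national immigration statistics matched to the study period.

```python
import pandas as pd

# Invented national immigration figures for four hypothetical countries.
immigration = pd.DataFrame({
    "country_of_birth": ["Country A", "Country B", "Country C", "Country D"],
    "total_entrants": [5000, 12000, 800, 3000],
    "humanitarian_entrants": [4200, 600, 700, 2400],
})
immigration["humanitarian_share"] = (
    immigration["humanitarian_entrants"] / immigration["total_entrants"]
)

# Flag countries whose entrants are predominantly humanitarian (the 0.7
# threshold is an assumption; a real study would justify and period-match it).
refugee_source = immigration.loc[
    immigration["humanitarian_share"] >= 0.7, "country_of_birth"
]
print(list(refugee_source))
```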

  13. Agent independent task planning

    NASA Technical Reports Server (NTRS)

    Davis, William S.

    1990-01-01

    Agent-Independent Planning is a technique that allows the construction of activity plans without regard to the agent that will perform them. Once generated, a plan is then validated and translated into instructions for a particular agent, whether a robot, crewmember, or software-based control system. Because Space Station Freedom (SSF) is planned for orbital operations for approximately thirty years, it will almost certainly experience numerous enhancements and upgrades, including upgrades in robotic manipulators. Agent-Independent Planning provides the capability to construct plans for SSF operations, independent of specific robotic systems, by combining techniques of object-oriented modeling, nonlinear planning, and temporal logic. Since a plan is validated using the physical and functional models of a particular agent, new robotic systems can be developed and integrated with existing operations in a robust manner. This technique also provides the capability to generate plans for crewmembers with varying skill levels, and later apply these same plans to more sophisticated robotic manipulators made available by evolutions in technology.

  14. The Schema.org Datasets Schema: Experiences at the National Snow and Ice Data Center

    NASA Astrophysics Data System (ADS)

    Duerr, R.; Billingsley, B. W.; Harper, D.; Kovarik, J.

    2014-12-01

    Data discovery is still a major challenge for many users. Relevant data may be located anywhere, and there is currently no universal data registry. Often users start with a simple query through their web browser. But how do you get your data to actually show up near the top of the results? One relatively new way to accomplish this is to use schema.org dataset markup in your data pages. In theory, this provides web crawlers the additional information needed so that a query for data will preferentially return pages that were marked up accordingly. The National Snow and Ice Data Center recently implemented an initial set of markup in the dataset pages returned by its catalog. The Datasets data model, our process, challenges encountered, and results will be described.
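
    For readers unfamiliar with the markup, the sketch below generates a minimal schema.org Dataset description of the kind a landing page might embed in a script tag of type application/ld+json. Every field value is invented for illustration; this is not NSIDC's actual markup.

```python
import json

# Hypothetical schema.org Dataset markup; all values below are invented.
dataset_markup = {
    "@context": "http://schema.org",
    "@type": "Dataset",
    "name": "Example Sea Ice Extent Dataset",
    "description": "Daily sea ice extent derived from passive microwave data.",
    "url": "https://example.org/data/sea-ice-extent",
    "keywords": ["sea ice", "cryosphere", "passive microwave"],
    "temporalCoverage": "1979-01-01/2014-12-31",
    "spatialCoverage": {
        "@type": "Place",
        "geo": {"@type": "GeoShape", "box": "30.0 -180.0 90.0 180.0"},
    },
}

# Embedded in the page as: <script type="application/ld+json">...</script>
print(json.dumps(dataset_markup, indent=2))
```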

  15. Revisiting Spitzer Transit Observations with Independent Component Analysis: New Results for Exoplanetary Systems

    NASA Astrophysics Data System (ADS)

    Morello, G.; Waldmann, I. P.; Tinetti, G.; Howarth, I. D.; Micela, G.

    2015-10-01

    Blind source separation techniques are used to reanalyse transit lightcurves of several exoplanets recorded with the infrared camera IRAC on board the Spitzer Space Telescope during the "cold" era. These observations, together with observations at other IR wavelengths, are crucial for characterising the atmospheres of the planets. Previous analyses of the same datasets reported discrepant results, hence the necessity of the reanalyses. The method used here is based on the Independent Component Analysis (ICA) statistical technique, which ensures a high degree of objectivity. The use of ICA to detrend single photometric observations in a self-consistent way is novel in the literature. The advantage of our reanalyses over previous work is that we do not have to make any assumptions about the structure of the unknown instrumental systematics. We obtained for the first time coherent and repeatable results over different epochs for the exoplanets HD189733b and GJ436b [Morello et al. (2014), Morello et al. (2015)]. The technique has also been tested on simulated datasets with different instrument properties, proving its validity in a more general context [Morello et al. (2015b)]. We will present the technique and the results of its application to different observations, in addition to the already published ones. A uniform re-analysis of other archive data with this technique will provide improved parameters for a list of exoplanets, and in particular for some other results debated in the literature.
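
    As a generic illustration of the approach (not the authors' pipeline), the sketch below uses scikit-learn's FastICA to separate a synthetic transit signal from a synthetic instrumental systematic mixed into several observed channels:

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 500)
transit = np.where(np.abs(t - 0.5) < 0.05, -1.0, 0.0)  # box-shaped dip
systematic = np.sin(2 * np.pi * 7 * t)                 # stand-in for pointing jitter

# Each observed channel is a different linear mixture of the two sources.
mixing = rng.uniform(0.5, 1.5, size=(6, 2))
observed = np.column_stack([transit, systematic]) @ mixing.T
observed += 0.02 * rng.standard_normal(observed.shape)

ica = FastICA(n_components=2, random_state=0)
sources = ica.fit_transform(observed)  # columns are the recovered components

# One recovered component should correlate strongly with the injected transit.
print(max(abs(np.corrcoef(sources[:, k], transit)[0, 1]) for k in range(2)))
```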

  16. Bayesian Test of Significance for Conditional Independence: The Multinomial Model

    NASA Astrophysics Data System (ADS)

    de Morais Andrade, Pablo; Stern, Julio; de Bragança Pereira, Carlos

    2014-03-01

    Conditional independence tests (CI tests) have received special attention lately in the Machine Learning and Computational Intelligence literature as an important indicator of the relationship among the variables used by such models. In the field of Probabilistic Graphical Models (PGM)--which includes Bayesian Networks (BN) models--CI tests are especially important for the task of learning the PGM structure from data. In this paper, we propose the Full Bayesian Significance Test (FBST) for tests of conditional independence for discrete datasets. FBST is a powerful Bayesian test for precise hypotheses, offered as an alternative to frequentist significance tests (characterized by the calculation of the p-value).
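
    For contrast with the frequentist approach the FBST is meant to replace, the sketch below implements a conventional stratified chi-square test of conditional independence for a discrete three-way table. This is not the FBST itself, and the table counts are invented:

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

def conditional_independence_test(table):
    # table has shape (levels of Z, levels of X, levels of Y); per-stratum
    # chi-square statistics and degrees of freedom are pooled across Z.
    stat, dof = 0.0, 0
    for stratum in table:
        s, _, d, _ = chi2_contingency(stratum)
        stat, dof = stat + s, dof + d
    return stat, dof, chi2.sf(stat, dof)  # pooled statistic and its p-value

# Invented 2x2x2 counts: X and Y are (approximately) independent within
# each level of Z, so the pooled test should not reject.
counts = np.array([[[40, 10], [20, 5]],
                   [[5, 20], [10, 40]]])
print(conditional_independence_test(counts))
```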

  17. Publishing datasets with eSciDoc and panMetaDocs

    NASA Astrophysics Data System (ADS)

    Ulbricht, D.; Klump, J.; Bertelmann, R.

    2012-04-01

    publishing scientific datasets as electronic data supplements to research papers. Publication of research manuscripts already has a well-established workflow that shares junctures with other processes and involves several parties in the process of dataset publication. Activities of the author, the reviewer, the print publisher and the data publisher have to be coordinated into a common data publication workflow. The case of data publication at GFZ Potsdam displays some specifics, e.g. the DOIDB web service. The DOIDB is a proxy service at GFZ for DataCite [4] DOI registration and its metadata store. DOIDB provides a local summary of the dataset DOIs registered through GFZ as a publication agent. An additional use case for the DOIDB is its function to enrich the DataCite metadata with additional custom attributes, like a geographic reference in a DIF record. These attributes are at the moment not available in the DataCite metadata schema but would be valuable elements for the compilation of data catalogues in the earth sciences and for dissemination of catalogue data via OAI-PMH. [1] http://www.escidoc.org , eSciDoc, FIZ Karlsruhe, Germany [2] http://panmetadocs.sf.net , panMetaDocs, GFZ Potsdam, Germany [3] http://metaworks.pangaea.de , panMetaWorks, Dr. R. Huber, MARUM, Univ. Bremen, Germany [4] http://www.datacite.org

  18. ESTATE: Strategy for Exploring Labeled Spatial Datasets Using Association Analysis

    NASA Astrophysics Data System (ADS)

    Stepinski, Tomasz F.; Salazar, Josue; Ding, Wei; White, Denis

    We propose an association analysis-based strategy for the exploration of multi-attribute spatial datasets possessing a naturally arising classification. The proposed strategy, ESTATE (Exploring Spatial daTa Association patTErns), inverts such a classification by interpreting the different classes found in the dataset in terms of sets of discriminative patterns of its attributes. It consists of several core steps including discriminative data mining, similarity between transactional patterns, and visualization. An algorithm for calculating a similarity measure between patterns is the major original contribution that facilitates summarization of discovered information and makes the entire framework practical for real-life applications. A detailed description of the ESTATE framework is followed by its application to the domain of ecology, using a dataset that fuses information on the geographical distribution of biodiversity of bird species across the contiguous United States with distributions of 32 environmental variables across the same area.

  19. The Global Precipitation Climatology Project (GPCP) Combined Precipitation Dataset

    NASA Technical Reports Server (NTRS)

    Huffman, George J.; Adler, Robert F.; Arkin, Philip; Chang, Alfred; Ferraro, Ralph; Gruber, Arnold; Janowiak, John; McNab, Alan; Rudolf, Bruno; Schneider, Udo

    1997-01-01

    The Global Precipitation Climatology Project (GPCP) has released the GPCP Version 1 Combined Precipitation Data Set, a global, monthly precipitation dataset covering the period July 1987 through December 1995. The primary product in the dataset is a merged analysis incorporating precipitation estimates from low-orbit-satellite microwave data, geosynchronous-orbit-satellite infrared data, and rain gauge observations. The dataset also contains the individual input fields, a combination of the microwave and infrared satellite estimates, and error estimates for each field. The data are provided on 2.5 deg x 2.5 deg latitude-longitude global grids. Preliminary analyses show general agreement with prior studies of global precipitation and extend prior studies of El Nino-Southern Oscillation precipitation patterns. At the regional scale there are systematic differences with standard climatologies.

  20. obs4MIPS: Satellite Datasets for Model Evaluation

    NASA Astrophysics Data System (ADS)

    Ferraro, R.; Waliser, D. E.; Gleckler, P. J.

    2013-12-01

    This poster will review the current status of the obs4MIPs project, whose purpose is to provide a limited collection of well-established and documented datasets for comparison with Earth system models. These datasets have been reformatted to correspond with the CMIP5 model output requirements, and include technical documentation specifically targeted for their use in model output evaluation. There are currently over 50 datasets containing observations that directly correspond to CMIP5 model output variables. We will review the rationale and requirements for obs4MIPs contributions, and provide summary information about the current obs4MIPs holdings on the Earth System Grid Federation. We will also provide some usage statistics, an update on governance for the obs4MIPs project, and plans for supporting CMIP6.

  1. Boosting association rule mining in large datasets via Gibbs sampling.

    PubMed

    Qian, Guoqi; Rao, Calyampudi Radhakrishna; Sun, Xiaoying; Wu, Yuehua

    2016-05-01

    Current algorithms for association rule mining from transaction data are mostly deterministic and enumerative. They can be computationally intractable even for mining a dataset containing just a few hundred transaction items, if no action is taken to constrain the search space. In this paper, we develop a Gibbs-sampling-induced stochastic search procedure to randomly sample association rules from the itemset space, and perform rule mining from the reduced transaction dataset generated by the sample. Also, a general rule-importance measure is proposed to direct the stochastic search so that, because the randomly generated association rules constitute an ergodic Markov chain, the overall most important rules in the itemset space can be uncovered from the reduced dataset with probability 1 in the limit. In the simulation study and a real genomic data example, we show how to boost association rule mining by an integrated use of the stochastic search and the Apriori algorithm. PMID:27091963
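
    A minimal sketch of the general idea (our illustration, not the authors' algorithm): run a Gibbs sampler over itemsets, updating one item's inclusion at a time, with an importance measure -- plain support here -- as the unnormalized target, and mine rules from the itemsets the chain visits most often.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(1)
transactions = rng.random((1000, 12)) < 0.3  # invented binary transaction matrix

def importance(itemset_mask):
    # Plain support stands in for the paper's rule-importance measure; the
    # empty itemset gets negligible weight so the chain stays on informative states.
    if not itemset_mask.any():
        return 1e-9
    return transactions[:, itemset_mask].all(axis=1).mean() + 1e-9

state = np.zeros(12, dtype=bool)
samples = []
for sweep in range(200):
    for j in range(12):  # one Gibbs update per item: resample item j's inclusion
        with_j, without_j = state.copy(), state.copy()
        with_j[j], without_j[j] = True, False
        p_include = importance(with_j) / (importance(with_j) + importance(without_j))
        state[j] = rng.random() < p_include
    samples.append(state.copy())

# The most frequently visited itemsets form the reduced dataset for rule mining.
visited = Counter(frozenset(np.flatnonzero(s).tolist()) for s in samples)
print(visited.most_common(5))
```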

  2. Identifying Differentially Abundant Metabolic Pathways in Metagenomic Datasets

    NASA Astrophysics Data System (ADS)

    Liu, Bo; Pop, Mihai

    Enabled by rapid advances in sequencing technology, metagenomic studies aim to characterize entire communities of microbes, bypassing the need for culturing individual bacterial members. One major goal of such studies is to identify specific functional adaptations of microbial communities to their habitats. Here we describe a powerful analytical method (MetaPath) that can identify differentially abundant pathways in metagenomic datasets, relying on a combination of metagenomic sequence data and prior metabolic pathway knowledge. We show that MetaPath outperforms other common approaches when evaluated on simulated datasets. We also demonstrate the power of our methods in analyzing two publicly available metagenomic datasets: a comparison of the gut microbiome of obese and lean twins, and a comparison of the gut microbiome of infant and adult subjects. We demonstrate that the subpathways identified by our method provide valuable insights into the biological activities of the microbiome.

  3. Benchmark three-dimensional eye-tracking dataset for visual saliency prediction on stereoscopic three-dimensional video

    NASA Astrophysics Data System (ADS)

    Banitalebi-Dehkordi, Amin; Nasiopoulos, Eleni; Pourazad, Mahsa T.; Nasiopoulos, Panos

    2016-01-01

    Visual attention models (VAMs) predict the location of image or video regions that are most likely to attract human attention. Although saliency detection is well explored for two-dimensional (2-D) image and video content, only a few attempts have been made to design three-dimensional (3-D) saliency prediction models. Newly proposed 3-D VAMs have to be validated over large-scale video saliency prediction datasets, which also contain eye-tracking results. There are several publicly available eye-tracking datasets for 2-D image and video content. In the case of 3-D, however, there is still a need for large-scale video saliency datasets for the research community to use in validating different 3-D VAMs. We introduce a large-scale dataset containing eye-tracking data collected from 24 subjects in a free-viewing test on 61 stereoscopic 3-D videos (and 2-D versions of those). We evaluate the performance of the existing saliency detection methods over the proposed dataset. In addition, we created an online benchmark for validating the performance of the existing 2-D and 3-D VAMs and facilitating the addition of new VAMs to the benchmark. Our benchmark currently contains 50 different VAMs.

  4. Independence among People with Disabilities: II. Personal Independence Profile.

    ERIC Educational Resources Information Center

    Nosek, Margaret A.; And Others

    1992-01-01

    Developed Personal Independence Profile (PIP) as an instrument to measure aspects of independence beyond physical and cognitive functioning in people with diverse disabilities. PIP was tested for reliability and validity with 185 subjects from 10 independent living centers. Findings suggest that the Personal Independence Profile measures the…

  5. Integrated Northern Hemisphere Terrestrial Snow Extent Climate Datasets

    NASA Astrophysics Data System (ADS)

    Robinson, D. A.; Estilow, T. W.; Henderson, G.; Mote, T. L.; Hall, D. K.

    2011-12-01

    Multiple satellite-derived sources of snow cover extent (SCE), snow-cover fraction (SCF) and melting snow over Northern Hemisphere (NH) lands have been used to produce two integrated datasets. One dataset is long enough (over 30 years) to be considered a climate data record (CDR); a second has data of a quality suitable for a CDR but lacks the duration to be recognized as such. These datasets provide state-of-the-art NH snow information in multiple formats and on multiple time steps for the research community, decision-makers and stakeholders. To generate the CDR, a low-resolution, long-term SCE record available on a weekly timescale, derived from a thorough reanalysis of NOAA satellite-derived maps of NH continental SCE dating back to late 1966, was integrated with a newly generated 1979-present microwave SCE and melting snow product. A 100 km Equal-Area Scalable Earth version 2 (EASE2) grid developed at the National Snow and Ice Data Center is used for this CDR. The shorter-term integrated dataset has a higher spatial resolution (25 km EASE2) and a daily time step. It was developed to assess the level of agreement between the NOAA Interactive Multisensor Snow and Ice Mapping System SCE, the previously mentioned microwave product, and the MODIS cloud-gap-filled SCF product. The period of overlap for these three datasets spans from 2000 to present. We will discuss methodologies for identifying SCE, melting snow and SCF amongst the contributing data sources, techniques for integrating these products into the CDR and 25 km products, and confidence assessments. We will also provide an overall evaluation of how these datasets improve regional to NH scale monitoring of snow cover.

  6. An evaluation of the global 1-km AVHRR land dataset

    USGS Publications Warehouse

    Teillet, P.M.; El Saleous, N.; Hansen, M.C.; Eidenshink, Jeffery C.; Justice, C.O.; Townshend, J.R.G.

    2000-01-01

    This paper summarizes the steps taken in the generation of the global 1-km AVHRR land dataset, and it documents an evaluation of the data product with respect to the original specifications and its usefulness in research and applications to date. The evaluation addresses data characterization, processing, compositing and handling issues. Examples of the main scientific outputs are presented and options for improved processing are outlined and prioritized. The dataset has made a significant contribution, and a strong recommendation is made for its reprocessing and continuation to produce a long-term record for global change research.

  7. Climate Model Datasets on Earth System Grid II (ESG II)

    DOE Data Explorer

    Earth System Grid (ESG) is a project that combines the power and capacity of supercomputers, sophisticated analysis servers, and datasets on the scale of petabytes. The goal is to provide a seamless distributed environment that allows scientists in many locations to work with large-scale data, perform climate change modeling and simulation, and share results in innovative ways. Though ESG is more about the computing environment than the data, there are still several catalogs of data available at the web site that can be browsed or searched. Most of the datasets are restricted to registered users, but several are open to any access.

  8. Applying Urban Compactness Metrics on Pan-European Datasets

    NASA Astrophysics Data System (ADS)

    Stathakis, D.; Tsilimigkas, G.

    2013-05-01

    Urban compactness is measured for a number of medium-sized European cities based on metrics available in the literature. The information used is a combination of the Urban Atlas and Urban Audit datasets; the former is a source of spatial data, whereas the latter provides population data. These datasets have recently been made available, providing for the first time the opportunity to perform comparative analyses of urban compactness across European countries. The results provide an interesting insight into the variation amongst cities in different countries. The analysis is limited, however, by the quality and generalization of the datasets.

  9. The Wind Integration National Dataset (WIND) toolkit (Presentation)

    SciTech Connect

    Draxl, Caroline

    2014-01-01

    Regional wind integration studies require detailed wind power output data at many locations to perform simulations of how the power system will operate under high-penetration scenarios. The wind datasets that serve as inputs into the study must realistically reflect the ramping characteristics, spatial and temporal correlations, and capacity factors of the simulated wind plants, as well as being time synchronized with available load profiles. As described in this presentation, the WIND Toolkit fulfills these requirements by providing a state-of-the-art national (US) wind resource, power production, and forecast dataset.

  10. Dataset of mitochondrial genome variants associated with asymptomatic atherosclerosis

    PubMed Central

    Sazonova, Margarita A.; Zhelankin, Andrey V.; Barinova, Valeria A.; Sinyov, Vasily V.; Khasanova, Zukhra B.; Postnov, Anton Y.; Sobenin, Igor A.; Bobryshev, Yuri V.; Orekhov, Alexander N.

    2016-01-01

    This dataset report is dedicated to mitochondrial genome variants associated with asymptomatic atherosclerosis. These data were obtained using the method of next-generation pyrosequencing (NGPS). The whole mitochondrial genome of a sample of patients from the Moscow region was analyzed. This article presents the dataset, including anthropometric, biochemical and clinical parameters along with the mtDNA variants detected in patients with carotid atherosclerosis and in healthy individuals. Among the 58 most common homoplasmic mtDNA variants found in the observed sample, 7 variants occurred more often in patients with atherosclerosis and 16 variants occurred more often in healthy individuals. PMID:27222855

  11. Dataset of mitochondrial genome variants associated with asymptomatic atherosclerosis.

    PubMed

    Sazonova, Margarita A; Zhelankin, Andrey V; Barinova, Valeria A; Sinyov, Vasily V; Khasanova, Zukhra B; Postnov, Anton Y; Sobenin, Igor A; Bobryshev, Yuri V; Orekhov, Alexander N

    2016-06-01

    This dataset report is dedicated to mitochondrial genome variants associated with asymptomatic atherosclerosis. These data were obtained using the method of next-generation pyrosequencing (NGPS). The whole mitochondrial genome of a sample of patients from the Moscow region was analyzed. This article presents the dataset, including anthropometric, biochemical and clinical parameters along with the mtDNA variants detected in patients with carotid atherosclerosis and in healthy individuals. Among the 58 most common homoplasmic mtDNA variants found in the observed sample, 7 variants occurred more often in patients with atherosclerosis and 16 variants occurred more often in healthy individuals. PMID:27222855

  12. Independent studies using deep sequencing resolve the same set of core bacterial species dominating gut communities of honey bees.

    PubMed

    Sabree, Zakee L; Hansen, Allison K; Moran, Nancy A

    2012-01-01

    Starting in 2003, numerous studies using culture-independent methodologies to characterize the gut microbiota of honey bees have retrieved a consistent and distinctive set of eight bacterial species, based on near identity of the 16S rRNA gene sequences. A recent study [Mattila HR, Rios D, Walker-Sperling VE, Roeselers G, Newton ILG (2012) Characterization of the active microbiotas associated with honey bees reveals healthier and broader communities when colonies are genetically diverse. PLoS ONE 7(3): e32962], using pyrosequencing of the V1-V2 hypervariable region of the 16S rRNA gene, reported finding entirely novel bacterial species in honey bee guts, and used taxonomic assignments from these reads to predict metabolic activities based on known metabolisms of cultivable species. To better understand this discrepancy, we analyzed the Mattila et al. pyrotag dataset. In contrast to the conclusions of Mattila et al., we found that the large majority of pyrotag sequences belonged to clusters for which representative sequences were identical to sequences from previously identified core species of the bee microbiota. On average, they represent 95% of the bacteria in each worker bee in the Mattila et al. dataset, a slightly lower value than that found in other studies. Some colonies contain small proportions of other bacteria, mostly species of Enterobacteriaceae. Reanalysis of the Mattila et al. dataset also did not support a relationship between abundances of Bifidobacterium and of putative pathogens or a significant difference in gut communities between colonies from queens that were singly or multiply mated. Additionally, consistent with previous studies, the dataset supports the occurrence of considerable strain variation within core species, even within single colonies. The roles of these bacteria within bees, or the implications of the strain variation, are not yet clear. PMID:22829932

  13. Independent Studies Using Deep Sequencing Resolve the Same Set of Core Bacterial Species Dominating Gut Communities of Honey Bees

    PubMed Central

    Sabree, Zakee L.; Hansen, Allison K.; Moran, Nancy A.

    2012-01-01

    Starting in 2003, numerous studies using culture-independent methodologies to characterize the gut microbiota of honey bees have retrieved a consistent and distinctive set of eight bacterial species, based on near identity of the 16S rRNA gene sequences. A recent study [Mattila HR, Rios D, Walker-Sperling VE, Roeselers G, Newton ILG (2012) Characterization of the active microbiotas associated with honey bees reveals healthier and broader communities when colonies are genetically diverse. PLoS ONE 7(3): e32962], using pyrosequencing of the V1–V2 hypervariable region of the 16S rRNA gene, reported finding entirely novel bacterial species in honey bee guts, and used taxonomic assignments from these reads to predict metabolic activities based on known metabolisms of cultivable species. To better understand this discrepancy, we analyzed the Mattila et al. pyrotag dataset. In contrast to the conclusions of Mattila et al., we found that the large majority of pyrotag sequences belonged to clusters for which representative sequences were identical to sequences from previously identified core species of the bee microbiota. On average, they represent 95% of the bacteria in each worker bee in the Mattila et al. dataset, a slightly lower value than that found in other studies. Some colonies contain small proportions of other bacteria, mostly species of Enterobacteriaceae. Reanalysis of the Mattila et al. dataset also did not support a relationship between abundances of Bifidobacterium and of putative pathogens or a significant difference in gut communities between colonies from queens that were singly or multiply mated. Additionally, consistent with previous studies, the dataset supports the occurrence of considerable strain variation within core species, even within single colonies. The roles of these bacteria within bees, or the implications of the strain variation, are not yet clear. PMID:22829932

  14. Effects of the Training Dataset Characteristics on the Performance of Nine Species Distribution Models: Application to Diabrotica virgifera virgifera

    PubMed Central

    Dupin, Maxime; Reynaud, Philippe; Jarošík, Vojtěch; Baker, Richard; Brunel, Sarah; Eyre, Dominic; Pergl, Jan; Makowski, David

    2011-01-01

    Many distribution models developed to predict the presence/absence of invasive alien species need to be fitted to a training dataset before practical use. The training dataset is characterized by the number of recorded presences/absences and by their geographical locations. The aim of this paper is to study the effect of the training dataset characteristics on model performance and to compare the relative importance of three factors influencing model predictive capability: size of the training dataset, stage of the biological invasion, and choice of input variables. Nine models were assessed for their ability to predict the distribution of the western corn rootworm, Diabrotica virgifera virgifera, a major pest of corn in North America that has recently invaded Europe. Twenty-six training datasets of various sizes (from 10 to 428 presence records) corresponding to two different stages of invasion (1955 and 1980) and three sets of input bioclimatic variables (19 variables, six variables selected using information on insect biology, and three linear combinations of the 19 variables derived from Principal Component Analysis) were considered. The models were fitted to each training dataset in turn and their performance was assessed using independent data from North America and Europe. The models were ranked according to the area under the Receiver Operating Characteristic curve and the likelihood ratio. Model performance was highly sensitive to the geographical area used for calibration; most of the models performed poorly when fitted to a restricted area corresponding to an early stage of the invasion. Our results also showed that Principal Component Analysis was useful in reducing the number of model input variables for the models that performed poorly with 19 input variables. DOMAIN, Environmental Distance, MAXENT, and Envelope Score were the most accurate models, but all the models tested in this study led to a substantial rate of misclassification. PMID:21701579
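
    To illustrate the ranking criterion on independent data (a generic sketch with invented labels and scores, not the paper's models), the area under the ROC curve can be computed as follows:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
presence = rng.integers(0, 2, size=200)  # independent presence/absence records

# Invented suitability scores from two competing distribution models.
scores_model_a = presence * 0.6 + rng.random(200) * 0.5  # informative model
scores_model_b = rng.random(200)                         # uninformative baseline

for name, scores in [("model A", scores_model_a), ("model B", scores_model_b)]:
    print(name, roc_auc_score(presence, scores))  # higher AUC ranks higher
```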

  15. An independent hydrogen source

    SciTech Connect

    Kobzenko, G.F.; Chubenko, M.V.; Kobzenko, N.S.; Senkevich, A.I.; Shkola, A.A.

    1985-10-01

    Descriptions are given of the design and operation of an independent hydrogen source (IHS) used in purifying and storing hydrogen. If LaNi5 or TiFe is used as the sorbent, one can store about 500 liters of chemically bound hydrogen in a 0.9-liter vessel. Molecular purification of the desorbed hydrogen is used. The IHS is a safe hydrogen source, since the hydrogen is trapped in the sorbent in the chemically bound state, in equilibrium with LaNi5Hx at room temperature. If necessary, the IHS can serve as a compressor and provide higher hydrogen pressures. The device is compact and transportable.

  16. Employee vs independent contractor.

    PubMed

    Kolender, Ellen

    2012-01-01

    Finding qualified personnel for the cancer registry department has become increasingly difficult, as experienced abstractors retire and cancer diagnoses increase. Faced with hiring challenges, managers turn to teleworkers to fill positions and accomplish work in a timely manner. Suddenly, the hospital hires new legal staff and all telework agreements are disrupted. The question arises: Are teleworkers employees or independent contractors? Creating telework positions requires approval from the legal department and human resources. Caught off-guard in the last quarter of the year, I found myself again faced with hiring challenges. PMID:23599033

  17. Cary Potter on Independent Education

    ERIC Educational Resources Information Center

    Potter, Cary

    1978-01-01

    Cary Potter was President of the National Association of Independent Schools from 1964 to 1978. As he leaves NAIS he gives his views on education, on independence, on the independent school, on public responsibility, on choice in a free society, on educational change, and on the need for collective action by independent schools. (Author/RK)

  18. Myth or Truth: Independence Day.

    ERIC Educational Resources Information Center

    Gardner, Traci

    Most Americans think of the Fourth of July as Independence Day, but is it really the day the U.S. declared and celebrated independence? By exploring myths and truths surrounding Independence Day, this lesson asks students to think critically about commonly believed stories regarding the beginning of the Revolutionary War and the Independence Day…

  19. Would the ‘real’ observed dataset stand up? A critical examination of eight observed gridded climate datasets for China

    NASA Astrophysics Data System (ADS)

    Sun, Qiaohong; Miao, Chiyuan; Duan, Qingyun; Kong, Dongxian; Ye, Aizhong; Di, Zhenhua; Gong, Wei

    2014-01-01

    This research compared and evaluated the spatio-temporal similarities and differences of eight widely used gridded datasets. The datasets include daily precipitation over East Asia (EA), the Climatic Research Unit (CRU) product, the Global Precipitation Climatology Centre (GPCC) product, the University of Delaware (UDEL) product, Precipitation Reconstruction over Land (PREC/L), the Asian Precipitation Highly Resolved Observational (APHRO) product, the Institute of Atmospheric Physics (IAP) dataset from the Chinese Academy of Sciences, and the National Meteorological Information Center dataset from the China Meteorological Administration (CN05). The meteorological variables of interest are surface air temperature (SAT) and precipitation (PR) in China. All datasets presented general agreement on the whole spatio-temporal scale, but some differences appeared for specific periods and regions. On a temporal scale, EA shows the highest amount of PR, while APHRO shows the lowest. CRU and UDEL show higher SAT than IAP or CN05. On a spatial scale, the most significant differences occur in western China for PR and SAT. For PR, the difference between EA and CRU is the largest. When compared with CN05, CRU shows higher SAT in the central and southern Northwest river drainage basin, UDEL exhibits higher SAT over the Southwest river drainage system, and IAP has lower SAT in the Tibetan Plateau. The differences in annual mean PR and SAT primarily come from summer and winter, respectively. Finally, potential factors impacting agreement among gridded climate datasets are discussed, including raw data sources, quality control (QC) schemes, orographic correction, and interpolation techniques. The implications and challenges of these results for climate research are also briefly addressed.

  20. Orientation-independent measures of ground motion

    USGS Publications Warehouse

    Boore, D.M.; Watson-Lamprey, Jennie; Abrahamson, N.A.

    2006-01-01

    The geometric mean of the response spectra for two orthogonal horizontal components of motion, commonly used as the response variable in predictions of strong ground motion, depends on the orientation of the sensors as installed in the field. This means that the measure of ground-motion intensity could differ for the same actual ground motion. This dependence on sensor orientation is most pronounced for strongly correlated motion (the extreme example being linearly polarized motion), such as often occurs at periods of 1 sec or longer. We propose two new measures of the geometric mean, GMRotDpp and GMRotIpp, that are independent of the sensor orientations. Both are based on a set of geometric means computed from the as-recorded orthogonal horizontal motions rotated through all possible non-redundant rotation angles. GMRotDpp is determined as the ppth percentile of the set of geometric means for a given oscillator period. For example, GMRotD00, GMRotD50, and GMRotD100 correspond to the minimum, median, and maximum values, respectively. The rotations that lead to GMRotDpp depend on period, whereas a single period-independent rotation is used for GMRotIpp, the angle being chosen to minimize the spread of the rotation-dependent geometric mean (normalized by GMRotDpp) over the usable range of oscillator periods. GMRotI50 is the ground-motion intensity measure being used in the development of new ground-motion prediction equations by the Pacific Earthquake Engineering Research Center Next Generation Attenuation project. Comparisons with as-recorded geometric means for a large dataset show that the new measures are systematically larger than the geometric-mean response spectra using the as-recorded values of ground acceleration, but only by a small amount (less than 3%). The theoretical advantage of the new measures is that they remove sensor orientation as a contributor to aleatory uncertainty. Whether the reduction is of practical significance awaits detailed studies of large
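
    The rotation-and-percentile construction is easy to sketch. Below, peak amplitude stands in for the oscillator response spectrum the authors use, and the records are synthetic, so the numbers only illustrate the GMRotDpp idea:

```python
import numpy as np

def gmrotd(comp1, comp2, percentiles=(0, 50, 100)):
    # Rotate the as-recorded pair through the non-redundant angles 0-89 deg
    # and take percentiles of the geometric mean of the two peak amplitudes.
    gms = []
    for angle in np.deg2rad(np.arange(0, 90)):
        rot1 = comp1 * np.cos(angle) + comp2 * np.sin(angle)
        rot2 = -comp1 * np.sin(angle) + comp2 * np.cos(angle)
        gms.append(np.sqrt(np.abs(rot1).max() * np.abs(rot2).max()))
    return np.percentile(gms, percentiles)

# Synthetic, partially polarized horizontal components for illustration.
t = np.linspace(0.0, 40.0, 4000)
ns = np.sin(2 * np.pi * 0.8 * t) * np.exp(-0.05 * t)
ew = 0.6 * np.sin(2 * np.pi * 0.8 * t + 0.4) * np.exp(-0.05 * t)
print(gmrotd(ns, ew))  # analogues of GMRotD00, GMRotD50, GMRotD100
```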

  1. Seismic Electric Signals: An additional fact showing their physical interconnection with seismicity

    NASA Astrophysics Data System (ADS)

    Varotsos, P. A.; Sarlis, N. V.; Skordas, E. S.; Lazaridou, M. S.

    2013-03-01

    Natural time analysis reveals novel dynamical features hidden behind time series in complex systems. By applying it to the time series of earthquakes, we find that the order parameter of seismicity exhibits a unique change approximately at the date(s) at which Seismic Electric Signals (SES) activities have been reported to initiate. In particular, we show that the fluctuations of the order parameter of seismicity in Japan exhibit a clearly detectable minimum approximately at the time of the initiation of the SES activity observed by Uyeda and coworkers almost two months before the onset of the volcanic-seismic swarm activity in 2000 in the Izu Island region, Japan. To the best of our knowledge, this is the first time that, well before the occurrence of major earthquakes, anomalous changes are found to appear almost simultaneously in two independent datasets of different geophysical observables (geoelectrical measurements, seismicity). In addition, we show that these two phenomena are also linked closely in space.

  2. Independent task Fourier filters

    NASA Astrophysics Data System (ADS)

    Caulfield, H. John

    2001-11-01

    Since the early 1960s, a major part of optical computing systems has been Fourier pattern recognition, which takes advantage of high-speed filter changes to enable powerful nonlinear discrimination in `real time.' Because each filter has a task quite independent of the tasks of the other filters, the filters can be applied and evaluated in parallel or, in a simple approach I describe, in sequence very rapidly. Thus I use the name ITFF (independent task Fourier filter). These filters can also break very complex discrimination tasks into easily handled parts, so the wonderful space invariance properties of Fourier filtering need not be sacrificed to achieve high discrimination and good generalizability even for ultracomplex discrimination problems. The training procedure proceeds sequentially, as the task for a given filter is defined a posteriori by declaring it to be the discrimination of particular members of set A from all members of set B with sufficient margin. That is, we set the threshold to achieve the desired margin and note the A members discriminated by that threshold. Discriminating those A members from all members of B becomes the task of that filter. Those A members are then removed from set A, so no other filter will be asked to perform that already accomplished task.
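
    The sequential assignment of tasks can be sketched in a few lines. Here a crude linear template stands in for the optical Fourier filter and the data are invented, so this only illustrates the remove-what-is-already-discriminated loop:

```python
import numpy as np

rng = np.random.default_rng(3)
set_a = rng.normal(1.0, 1.0, size=(40, 8))   # patterns to accept (invented)
set_b = rng.normal(-1.0, 1.0, size=(60, 8))  # patterns to reject (invented)
margin = 0.5

remaining = set_a.copy()
filters = []
while remaining.shape[0] > 0:
    template = remaining.mean(axis=0)              # crude matched-filter stand-in
    scores_a = remaining @ template
    threshold = (set_b @ template).max() + margin  # margin above every B score
    handled = scores_a >= threshold                # A members this filter separates
    if not handled.any():                          # fallback so the loop advances
        threshold = scores_a.max() - 1e-9
        handled = scores_a >= threshold
    filters.append((template, threshold))
    remaining = remaining[~handled]                # remove accomplished tasks

print(len(filters), "filters cover all of set A")
```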

  3. Evaluation of a Moderate Resolution, Satellite-Based Impervious Surface Map Using an Independent, High-Resolution Validation Dataset

    EPA Science Inventory

    Given the relatively high cost of mapping impervious surfaces at regional scales, substantial effort is being expended in the development of moderate-resolution, satellite-based methods for estimating impervious surface area (ISA). To rigorously assess the accuracy of these data ...

  4. Spatial Disaggregation of the 0.25-degree GLDAS Air Temperature Dataset to 30-arcsec Resolution

    NASA Astrophysics Data System (ADS)

    Ji, L.; Senay, G. B.; Verdin, J. P.; Velpuri, N. M.

    2015-12-01

    Air temperature is a key input variable in ecological and hydrological models for simulating the hydrological cycle and water budget. Several global reanalysis products have been developed at different organizations, providing gridded air temperature datasets at resolutions ranging from 0.25° to 2.5° (27.8 - 278.3 km at the equator). However, gridded air temperature products at high resolution (≤1 km) are available only for limited areas of the world. To meet the needs of global eco-hydrological modeling, we aim to produce a continuous daily air temperature dataset at 1-km resolution with global coverage. In this study, we developed a technique that spatially disaggregates the 0.25° Global Land Data Assimilation System (GLDAS) daily air temperature data to 30-arcsec (0.928 km at the equator) resolution by integrating the GLDAS data with the 30-arcsec WorldClim 1950 - 2000 monthly normal air temperature data. The method was tested using the GLDAS and WorldClim maximum and minimum air temperature datasets from 2002 and 2010 for the conterminous United States and Africa. The 30-arcsec disaggregated GLDAS (GLDASd) air temperature dataset retains the mean values of the original GLDAS data, while adding spatial variability inherited from the WorldClim data. A great improvement in GLDAS disaggregation is shown in mountain areas, where complex terrain features have a strong impact on temperature. We validated the disaggregation method by comparing the GLDASd product with daily meteorological observations archived in the Global Historical Climatology Network (GHCN) and the Global Surface Summary of the Day (GSOD) datasets. Additionally, the 30-arcsec TopoWX daily air temperature product was used for comparison with the GLDASd data for the conterminous United States. The proposed data disaggregation method provides a convenient and efficient tool for generating a global high-resolution air temperature dataset, which will be beneficial to global eco
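
    One common anomaly-based formulation of such a disaggregation (the paper's exact scheme may differ) adds the coarse-grid daily departure from its monthly normal to the high-resolution monthly normal. A minimal sketch with invented grids:

```python
import numpy as np

def disaggregate(coarse_daily, coarse_monthly_normal, fine_monthly_normal, factor):
    # Expand the coarse grids to the fine grid (nearest neighbour), take the
    # daily departure from the coarse monthly normal, and add it to the
    # high-resolution monthly normal.
    daily_fine = np.kron(coarse_daily, np.ones((factor, factor)))
    normal_fine = np.kron(coarse_monthly_normal, np.ones((factor, factor)))
    return fine_monthly_normal + (daily_fine - normal_fine)

coarse = np.array([[10.0, 12.0], [14.0, 16.0]])          # coarse daily T (invented)
coarse_normal = coarse - 1.5                              # coarse monthly normal
fine_normal = np.kron(coarse_normal, np.ones((30, 30)))   # stand-in for WorldClim
fine_normal += np.random.default_rng(0).normal(0.0, 0.3, fine_normal.shape)
print(disaggregate(coarse, coarse_normal, fine_normal, 30).shape)  # (60, 60)
```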

  5. Estimated Perennial Streams of Idaho and Related Geospatial Datasets

    USGS Publications Warehouse

    Rea, Alan; Skinner, Kenneth D.

    2009-01-01

    The perennial or intermittent status of a stream has bearing on many regulatory requirements. Because of changing technologies over time, cartographic representation of the perennial/intermittent status of streams on U.S. Geological Survey (USGS) topographic maps is not always accurate and (or) consistent from one map sheet to another. Idaho Administrative Code defines an intermittent stream as one having a 7-day, 2-year low flow (7Q2) less than 0.1 cubic feet per second. To establish consistency with the Idaho Administrative Code, the USGS developed regional regression equations for Idaho streams for several low-flow statistics, including 7Q2. Using these regression equations, the 7Q2 streamflow may be estimated for naturally flowing streams anywhere in Idaho to help determine the perennial/intermittent status of streams. Using these equations in conjunction with a Geographic Information System (GIS) technique known as weighted flow accumulation allows for an automated and continuous estimation of 7Q2 streamflow at all points along a stream, which in turn can be used to determine whether a stream is intermittent or perennial according to the Idaho Administrative Code operational definition. The selected regression equations were applied to create continuous grids of 7Q2 estimates for the eight low-flow regression regions of Idaho. By applying the 0.1 ft³/s criterion, the perennial streams have been estimated in each low-flow region. Uncertainty in the estimates is shown by identifying a 'transitional' zone, corresponding to flow estimates of 0.1 ft³/s plus or minus one standard error. Considerable additional uncertainty exists in the model of perennial streams presented in this report. The regression models provide overall estimates based on general trends within each regression region. These models do not include local factors, such as a large spring or a losing reach, that may greatly affect flows at any given point. Site-specific flow data, assuming a sufficient period of

  6. Correcting OCR text by association with historical datasets

    NASA Astrophysics Data System (ADS)

    Hauser, Susan E.; Schlaifer, Jonathan; Sabir, Tehseen F.; Demner-Fushman, Dina; Straughan, Scott; Thoma, George R.

    2003-01-01

    The Medical Article Records System (MARS) developed by the Lister Hill National Center for Biomedical Communications uses scanning, OCR and automated recognition and reformatting algorithms to generate electronic bibliographic citation data from paper biomedical journal articles. The OCR server incorporated in MARS performs well in general, but fares less well with text printed in small or italic fonts. Affiliations are often printed in small italic fonts in the journals processed by MARS. Consequently, although the automatic processes generate much of the citation data correctly, the affiliation field frequently contains incorrect data, which must be manually corrected by verification operators. In contrast, author names are usually printed in large, normal fonts that are correctly converted to text by the OCR server. The National Library of Medicine's MEDLINE database contains 11 million indexed citations for biomedical journal articles. This paper documents our effort to use the historical author-affiliation relationships from this large dataset to find potential correct affiliations for MARS articles based on the author and the affiliation in the OCR output. Preliminary tests using a table of about 400,000 author/affiliation pairs extracted from the corrected MARS data indicated that about 44% of the author/affiliation pairs were repeats and that about 47% of newly converted author names would be found in this set. A text-matching algorithm was developed to determine the likelihood that an affiliation found in the table corresponding to the OCR text of the first author was the current, correct affiliation. This matching algorithm compares an affiliation found in the author/affiliation table (found with the OCR text of the first author) to the OCR output affiliation, and calculates a score indicating the similarity of the affiliation found in the table to the OCR affiliation. Using a ground truth set of 519 OCR author/OCR affiliation/correct affiliation
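
    The scoring step can be sketched with a generic string-similarity ratio (the paper's actual scoring algorithm differs, and the strings below are invented):

```python
from difflib import SequenceMatcher

def affiliation_similarity(table_affiliation, ocr_affiliation):
    # Score in [0, 1]: how closely a historical-table affiliation matches
    # the OCR-converted affiliation text.
    a = table_affiliation.lower().strip()
    b = ocr_affiliation.lower().strip()
    return SequenceMatcher(None, a, b).ratio()

ocr_text = "Dept. of Radio1ogy, Johns Hopk1ns University"  # small-font OCR errors
candidate = "Department of Radiology, Johns Hopkins University"
print(affiliation_similarity(candidate, ocr_text))  # high score -> likely correct
```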

  7. A Dataset for Visual Navigation with Neuromorphic Methods.

    PubMed

    Barranco, Francisco; Fermuller, Cornelia; Aloimonos, Yiannis; Delbruck, Tobi

    2016-01-01

    Standardized benchmarks in Computer Vision have greatly contributed to the advance of approaches to many problems in the field. If we want to enhance the visibility of event-driven vision and increase its impact, we will need benchmarks that allow comparison among different neuromorphic methods as well as comparison to Computer Vision conventional approaches. We present datasets to evaluate the accuracy of frame-free and frame-based approaches for tasks of visual navigation. Similar to conventional Computer Vision datasets, we provide synthetic and real scenes, with the synthetic data created with graphics packages, and the real data recorded using a mobile robotic platform carrying a dynamic and active pixel vision sensor (DAVIS) and an RGB+Depth sensor. For both datasets the cameras move with a rigid motion in a static scene, and the data includes the images, events, optic flow, 3D camera motion, and the depth of the scene, along with calibration procedures. Finally, we also provide simulated event data generated synthetically from well-known frame-based optical flow datasets. PMID:26941595

  8. The Pitfalls of Ignoring Multilevel Design in National Datasets.

    ERIC Educational Resources Information Center

    Roberts, J. Kyle

    This paper examines the differences between multilevel modeling and weighted ordinary least squares (OLS) regression for analyzing data from the National Education Longitudinal Study of 1988 (NELS:88). The final sample consisted of 718 students in 298 schools. Eighteen variables from the NELS:88 dataset were used, with the dependent variable being…
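
    The contrast at issue can be sketched with synthetic data (the school effects, variable names, and sizes below are invented; NELS:88 itself is not used): ignoring the nesting of students within schools treats correlated observations as independent, which typically misstates standard errors.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
schools = np.repeat(np.arange(30), 24)             # 30 schools, 24 students each
school_effect = rng.normal(0.0, 2.0, 30)[schools]  # shared within each school
ses = rng.normal(0.0, 1.0, schools.size)
score = 50 + 3 * ses + school_effect + rng.normal(0.0, 5.0, schools.size)
data = pd.DataFrame({"score": score, "ses": ses, "school": schools})

ols_fit = smf.ols("score ~ ses", data).fit()  # single-level model, ignores nesting
mlm_fit = smf.mixedlm("score ~ ses", data, groups=data["school"]).fit()
print(ols_fit.bse["ses"], mlm_fit.bse["ses"])  # compare the slope standard errors
```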

  9. Oregon Cascades Play Fairway Analysis: Raster Datasets and Models

    DOE Data Explorer

    Adam Brandt

    2015-11-15

    This submission includes maps of the spatial distribution of basaltic and felsic rocks in the Oregon Cascades. It also includes a final Play Fairway Analysis (PFA) model, with the heat and permeability composite risk segments (CRS) supplied separately. Metadata for each raster dataset can be found within the zip files, alongside the TIF images.

  10. A Dataset for Breast Cancer Histopathological Image Classification.

    PubMed

    Spanhol, Fabio A; Oliveira, Luiz S; Petitjean, Caroline; Heutte, Laurent

    2016-07-01

    Today, medical image analysis papers require solid experiments to prove the usefulness of proposed methods. However, experiments are often performed on data selected by the researchers, which may come from different institutions, scanners, and populations. Different evaluation measures may be used, making it difficult to compare the methods. In this paper, we introduce a dataset of 7909 breast cancer histopathology images acquired from 82 patients, which is now publicly available from http://web.inf.ufpr.br/vri/breast-cancer-database. The dataset includes both benign and malignant images. The task associated with this dataset is the automated classification of these images into two classes, which would be a valuable computer-aided diagnosis tool for the clinician. In order to assess the difficulty of this task, we show some preliminary results obtained with state-of-the-art image classification systems. The accuracy ranges from 80% to 85%, showing that room for improvement remains. By providing this dataset and a standardized evaluation protocol to the scientific community, we hope to gather researchers in both the medical and the machine learning fields to advance toward this clinical application. PMID:26540668

  11. The Nashua agronomic, water quality, and economic dataset

    Technology Transfer Automated Retrieval System (TEKTRAN)

    This paper describes a dataset on 36 tile-drained plots of 0.4 hectares each, relating management to nitrogen (N) loading and crop yields from 1990-2003 on the Northeast Research and Demonstration Farm near Nashua, Iowa. The measured data were analyzed with the Root Zone Water Quality Model (RZWQM) and summa...

  12. A Dataset for Visual Navigation with Neuromorphic Methods

    PubMed Central

    Barranco, Francisco; Fermuller, Cornelia; Aloimonos, Yiannis; Delbruck, Tobi

    2016-01-01

    Standardized benchmarks in Computer Vision have greatly contributed to the advance of approaches to many problems in the field. If we want to enhance the visibility of event-driven vision and increase its impact, we will need benchmarks that allow comparison among different neuromorphic methods as well as comparison to Computer Vision conventional approaches. We present datasets to evaluate the accuracy of frame-free and frame-based approaches for tasks of visual navigation. Similar to conventional Computer Vision datasets, we provide synthetic and real scenes, with the synthetic data created with graphics packages, and the real data recorded using a mobile robotic platform carrying a dynamic and active pixel vision sensor (DAVIS) and an RGB+Depth sensor. For both datasets the cameras move with a rigid motion in a static scene, and the data includes the images, events, optic flow, 3D camera motion, and the depth of the scene, along with calibration procedures. Finally, we also provide simulated event data generated synthetically from well-known frame-based optical flow datasets. PMID:26941595

  13. Solar Cycle Variability in New Merged Satellite Ozone Datasets

    NASA Astrophysics Data System (ADS)

    Kuchar, A.; Pisoft, P.

    2014-12-01

    Studies using coupled chemistry-climate model simulations of the solar cycle in the ozone field reveal agreement with the observed "double-peaked" ozone anomaly in the original satellite observations represented by the SBUV(/2), HALOE and SAGE datasets. The motivation of our analysis is to examine whether the solar signal in the latest generation of reanalysis datasets (i.e. MERRA and ERA-Interim) is consistent with the observed double-peaked ozone anomaly extracted from satellite measurements. Since an analysis of the solar cycle response requires long-term, temporally homogeneous time series of the ozone profile, and no single satellite instrument has covered the entire period since 1984, satellite measurements in our study are represented by new merged satellite ozone datasets, i.e. the GOZCARDS, SBUV MOD and SWOOSH datasets. The results of the presented study are based on attribution analysis using multiple nonlinear techniques in addition to the traditional linear approach based on multiple linear models. The study results are supplemented by a frequency analysis using pseudo-2D wavelet transform algorithms.

  14. NATIONAL HYDROGRAPHY DATASET - ALBEMARLE-PAMLICO ESTUARY STUDY

    EPA Science Inventory

    The National Hydrography Dataset (NHD) is a feature-based database that interconnects and uniquely identifies the stream segments or reaches that comprise the nation's surface water drainage system. It is based initially on the content of the US Geological Survey 1:100,000-scale D...

  15. Accounting For Uncertainty in The Application Of High Throughput Datasets

    EPA Science Inventory

    The use of high throughput screening (HTS) datasets will need to adequately account for uncertainties in the data generation process and propagate these uncertainties through to ultimate use. Uncertainty arises at multiple levels in the construction of predictors using in vitro ...

  16. Eastern Renewable Generation Integration Study Solar Dataset (Presentation)

    SciTech Connect

    Hummon, M.

    2014-04-01

    The National Renewable Energy Laboratory produced solar power production data for the Eastern Renewable Generation Integration Study (ERGIS) including "real time" 5-minute interval data, "four hour ahead forecast" 60-minute interval data, and "day-ahead forecast" 60-minute interval data for the year 2006. This presentation provides a brief overview of the three solar power datasets.

  17. Using Real Datasets for Interdisciplinary Business/Economics Projects

    ERIC Educational Resources Information Center

    Goel, Rajni; Straight, Ronald L.

    2005-01-01

    The workplace's global and dynamic nature allows and requires improved approaches for providing business and economics education. In this article, the authors explore ways of enhancing students' understanding of course material by using nontraditional, real-world datasets of particular interest to them. Teaching at a historically Black university,…

  18. Finding the Maine Story in Huge Cumbersome National Monitoring Datasets

    EPA Science Inventory

    What’s a manager, analyst, or concerned citizen to do with the complex datasets generated by State and Federal monitoring efforts? Is it possible to use such information to address Maine’s environmental issues without having a degree in informatics and statistics? This presentati...

  19. A daily global mesoscale ocean eddy dataset from satellite altimetry

    PubMed Central

    Faghmous, James H.; Frenger, Ivy; Yao, Yuanshun; Warmka, Robert; Lindell, Aron; Kumar, Vipin

    2015-01-01

    Mesoscale ocean eddies are ubiquitous coherent rotating structures of water with radial scales on the order of 100 kilometers. Eddies play a key role in the transport and mixing of momentum and tracers across the World Ocean. We present a global daily mesoscale ocean eddy dataset that contains ~45 million mesoscale features and 3.3 million eddy trajectories that persist at least two days, as identified in the AVISO dataset over the period 1993–2014. This dataset, along with the open-source eddy identification software, can be used to extract eddies with any parameters (minimum size, lifetime, etc.), to study global eddy properties and dynamics, and to empirically estimate the impact eddies have on mass or heat transport. Furthermore, our open-source software may be used to identify mesoscale features in model simulations and compare them to observed features. Finally, this dataset can be used to study the interaction between mesoscale ocean eddies and other components of the Earth System. PMID:26097744
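
    A toy sketch of the parameter-based extraction described above: trajectories are filtered by a minimum lifetime and radius. The record layout (dicts with lifetime_days and radius_km fields) is a hypothetical stand-in for the published file format.

    ```python
    # Filter eddy trajectories by user-chosen parameters (illustrative records).
    eddies = [
        {"id": 1, "lifetime_days": 5,  "radius_km": 80},
        {"id": 2, "lifetime_days": 30, "radius_km": 120},
        {"id": 3, "lifetime_days": 2,  "radius_km": 60},
    ]

    def select_eddies(tracks, min_lifetime_days=7, min_radius_km=100):
        return [t for t in tracks
                if t["lifetime_days"] >= min_lifetime_days
                and t["radius_km"] >= min_radius_km]

    print(select_eddies(eddies))  # only eddy 2 passes both thresholds
    ```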

  20. Alternative user interface devices for improved navigation of CT datasets.

    PubMed

    Lidén, M; Andersson, T; Geijer, H

    2011-02-01

    The workflow in radiology departments has changed dramatically with the transition to digital PACS, especially with the shift from tile mode to stack mode display of volumetric images. With the increasing number of images in routinely captured datasets, the standard user interface devices (UIDs) become inadequate. One basic approach to improve the navigation of the stack mode datasets is to take advantage of alternative UIDs developed for other domains, such as the computer game industry. We evaluated three UIDs both in clinical practice and in a task-based experiment. After using the devices in the daily image interpretation work, the readers reported that both of the tested alternative UIDs were better in terms of ergonomics compared to the standard mouse and that both alternatives were more efficient when reviewing large CT datasets. In the task-based experiment, one of the tested devices was faster than the standard mouse, while the other alternative was not significantly faster. One of the tested alternative devices showed a larger number of traversed images during the task. The results indicate that alternative user interface devices can improve the navigation of stack mode datasets and that radiologists should consider the potential benefits of alternatives to the standard mouse. PMID:19949832

  1. Automated single particle detection and tracking for large microscopy datasets

    PubMed Central

    Wilson, Rhodri S.; Yang, Lei; Dun, Alison; Smyth, Annya M.; Duncan, Rory R.; Rickman, Colin

    2016-01-01

    Recent advances in optical microscopy have enabled the acquisition of very large datasets from living cells with unprecedented spatial and temporal resolutions. Our ability to process these datasets now plays an essential role in understanding many biological processes. In this paper, we present an automated particle detection algorithm capable of operating in low signal-to-noise fluorescence microscopy environments and handling large datasets. When combined with our particle linking framework, it can provide hitherto intractable quantitative measurements describing the dynamics of large cohorts of cellular components, from organelles to single molecules. We begin by validating the performance of our method on synthetic image data, and then extend the validation to include experimental images with ground truth. Finally, we apply the algorithm to two single-particle-tracking photo-activated localization microscopy biological datasets, acquired from living primary cells with very high temporal rates. Our analysis of the dynamics of very large cohorts of tens of thousands of membrane-associated protein molecules shows that they behave as if caged in nanodomains. We show that the robustness and efficiency of our method provide a tool for the examination of single-molecule behaviour with unprecedented spatial detail and high acquisition rates. PMID:27293801
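
    The sketch below illustrates one common spot-detection recipe for low signal-to-noise images: a difference-of-Gaussians band-pass followed by local-maxima selection. It conveys the flavor of such detectors rather than the paper's specific algorithm; all amplitudes, sigmas, and thresholds are illustrative.

    ```python
    # Difference-of-Gaussians particle detection on a synthetic noisy image.
    import numpy as np
    from scipy.ndimage import gaussian_filter, maximum_filter

    rng = np.random.default_rng(0)
    img = rng.normal(0, 0.05, (128, 128))            # noisy background
    img[40, 60] += 2.0; img[90, 30] += 1.5           # two synthetic particles
    img = gaussian_filter(img, 1.0)                  # emulate the microscope PSF

    dog = gaussian_filter(img, 1.0) - gaussian_filter(img, 3.0)  # band-pass
    peaks = (dog == maximum_filter(dog, size=5)) & (dog > 0.05)  # local maxima
    print(np.argwhere(peaks))                        # detected particle centers
    ```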

  2. Using Progressive Resolution to Visualize large Satellite Image dataset

    NASA Astrophysics Data System (ADS)

    Ho, Yuan; Ramamurthy, Mohan

    2014-05-01

    Unidata's Integrated Data Viewer (IDV) is a Java-based software application that provides new and innovative ways of displaying satellite imagery, gridded data, and surface, upper air, and radar data within a unified interface. Progressive Resolution (PR) is an advanced feature newly developed in the IDV. When loading a large satellite dataset with PR turned on, the IDV calculates the resolution of the view window, sets the magnification factors dynamically, and loads a sufficient amount of the data to generate an image at the correct resolution. A rubber band box (RBB) interface allows the user to zoom in/out or change the projection, forcing the IDV to recalculate the magnification factors and get higher/lower resolution data. This new feature improves the IDV's memory usage significantly. In the preliminary test, loading 100 time steps of GOES-East 1 km 0.65 visible image data (100 X 10904 X 6928) with PR, both memory and CPU usage are comparable to generating a single time-step display at full resolution (10904 X 6928), and the quality of the resulting image is not compromised. The PR feature is currently available for both satellite imagery and gridded datasets, and will be expanded to other datasets. In this presentation we will show examples of PR usage with large satellite datasets for academic investigations and scientific discovery.
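
    A minimal sketch of the progressive-resolution idea: choose a decimation stride so the loaded subset roughly matches the view window's pixel budget. Function and variable names are illustrative, not IDV internals.

    ```python
    # Pick the smallest integer stride that fits the dataset into the view window.
    import math

    def decimation_factor(data_cols, data_rows, view_cols, view_rows):
        stride = max(math.ceil(data_cols / view_cols),
                     math.ceil(data_rows / view_rows))
        return max(stride, 1)

    # GOES-East visible image (10904 x 6928) shown in a 1024 x 768 window:
    f = decimation_factor(10904, 6928, 1024, 768)
    print(f, 10904 // f, 6928 // f)   # stride 11 -> roughly 991 x 629 pixels loaded
    ```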

  3. The NASA Subsonic Jet Particle Image Velocimetry (PIV) Dataset

    NASA Technical Reports Server (NTRS)

    Bridges, James; Wernet, Mark P.

    2011-01-01

    Many tasks in fluids engineering require prediction of turbulence of jet flows. The present report documents the single-point statistics of velocity, mean and variance, of cold and hot jet flows. The jet velocities ranged from 0.5 to 1.4 times the ambient speed of sound, and temperatures ranged from unheated to a static temperature ratio of 2.7. Further, the report assesses the accuracy of the data, e.g., establishing uncertainties for the data. This paper covers the following five tasks: (1) Document acquisition and processing procedures used to create the particle image velocimetry (PIV) datasets. (2) Compare PIV data with hotwire and laser Doppler velocimetry (LDV) data published in the open literature. (3) Compare different datasets acquired at the same flow conditions in multiple tests to establish uncertainties. (4) Create a consensus dataset for a range of hot jet flows, including uncertainty bands. (5) Analyze this consensus dataset for self-consistency and compare jet characteristics to those of the open literature. The final objective was fulfilled by using the potential core length and the spread rate of the half-velocity radius to collapse the mean and turbulent velocity fields over the first 20 jet diameters.

  4. A daily global mesoscale ocean eddy dataset from satellite altimetry.

    PubMed

    Faghmous, James H; Frenger, Ivy; Yao, Yuanshun; Warmka, Robert; Lindell, Aron; Kumar, Vipin

    2015-01-01

    Mesoscale ocean eddies are ubiquitous coherent rotating structures of water with radial scales on the order of 100 kilometers. Eddies play a key role in the transport and mixing of momentum and tracers across the World Ocean. We present a global daily mesoscale ocean eddy dataset that contains ~45 million mesoscale features and 3.3 million eddy trajectories that persist at least two days, as identified in the AVISO dataset over the period 1993-2014. This dataset, along with the open-source eddy identification software, can be used to extract eddies with any parameters (minimum size, lifetime, etc.), to study global eddy properties and dynamics, and to empirically estimate the impact eddies have on mass or heat transport. Furthermore, our open-source software may be used to identify mesoscale features in model simulations and compare them to observed features. Finally, this dataset can be used to study the interaction between mesoscale ocean eddies and other components of the Earth System. PMID:26097744

  5. Comparison and validation of gridded precipitation datasets for Spain

    NASA Astrophysics Data System (ADS)

    Quintana-Seguí, Pere; Turco, Marco; Míguez-Macho, Gonzalo

    2016-04-01

    In this study, two gridded precipitation datasets are compared and validated in Spain: the recently developed SAFRAN dataset and the Spain02 dataset. These are validated using rain gauges and they are also compared to the low resolution ERA-Interim reanalysis. The SAFRAN precipitation dataset has been recently produced, using the SAFRAN meteorological analysis, which is extensively used in France (Durand et al. 1993, 1999; Quintana-Seguí et al. 2008; Vidal et al., 2010) and which has recently been applied to Spain (Quintana-Seguí et al., 2015). SAFRAN uses an optimal interpolation (OI) algorithm and uses all available rain gauges from the Spanish State Meteorological Agency (Agencia Estatal de Meteorología, AEMET). The product has a spatial resolution of 5 km and it spans from September 1979 to August 2014. This dataset has been produced mainly to be used in large scale hydrological applications. Spain02 (Herrera et al. 2012, 2015) is another high quality precipitation dataset for Spain based on a dense network of quality-controlled stations and it has different versions at different resolutions. In this study we used the version with a resolution of 0.11°. The product spans from 1971 to 2010. Spain02 is well tested and widely used, mainly, but not exclusively, for RCM model validation and statistical downscaling. ERA-Interim is a well-known global reanalysis with a spatial resolution of ~79 km. It has been included in the comparison because it is a widely used product for continental and global scale studies and also in smaller scale studies in data-poor countries. Thus, its comparison with higher resolution products of a data-rich country, such as Spain, allows us to quantify the errors made when using such datasets for national scale studies, in line with some of the objectives of the EU-FP7 eartH2Observe project. The comparison shows that SAFRAN and Spain02 perform similarly, even though their underlying principles are different. Both products are largely
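
    Point validation of a gridded product against gauges typically reduces to statistics of the kind sketched below; the arrays are synthetic stand-ins for SAFRAN or Spain02 values sampled at gauge locations.

    ```python
    # Bias, RMSE, and correlation of a gridded product against gauge observations.
    import numpy as np

    gauge = np.array([1.2, 0.0, 5.4, 2.1, 0.3])   # observed precipitation (mm/day)
    grid  = np.array([1.0, 0.1, 4.8, 2.5, 0.2])   # gridded product at gauge sites

    bias = np.mean(grid - gauge)
    rmse = np.sqrt(np.mean((grid - gauge) ** 2))
    corr = np.corrcoef(grid, gauge)[0, 1]
    print(f"bias={bias:.2f} rmse={rmse:.2f} corr={corr:.3f}")
    ```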

  6. A New Dataset of Spermatogenic vs. Oogenic Transcriptomes in the Nematode Caenorhabditis elegans

    PubMed Central

    Ortiz, Marco A.; Noble, Daniel; Sorokin, Elena P.; Kimble, Judith

    2014-01-01

    The nematode Caenorhabditis elegans is an important model for studies of germ cell biology, including the meiotic cell cycle, gamete specification as sperm or oocyte, and gamete development. Fundamental to those studies is a genome-level knowledge of the germline transcriptome. Here, we use RNA-Seq to identify genes expressed in isolated XX gonads, which are approximately 95% germline and 5% somatic gonadal tissue. We generate data from mutants making either sperm [fem-3(q96)] or oocytes [fog-2(q71)], both grown at 22°. Our dataset identifies a total of 10,754 mRNAs in the polyadenylated transcriptome of XX gonads, with 2748 enriched in spermatogenic gonads, 1732 enriched in oogenic gonads, and the remaining 6274 not enriched in either. These spermatogenic, oogenic, and gender-neutral gene datasets compare well with those of previous studies, but double the number of genes identified. A comparison of the additional genes found in our study with in situ hybridization patterns in the Kohara database suggests that most are expressed in the germline. We also query our RNA-Seq data for differential exon usage and find 351 mRNAs with sex-enriched isoforms. We suggest that this new dataset will prove useful for studies focusing on C. elegans germ cell biology. PMID:25060624

  7. Reconstruction and exploration of virtual middle-ear models derived from micro-CT datasets

    PubMed Central

    Lee, Dong H.; Chan, Sonny; Salisbury, Curt; Kim, Namkeun; Salisbury, Kenneth; Puria, Sunil; Blevins, Nikolas H.

    2014-01-01

    Background Middle-ear anatomy is integrally linked to both its normal function and its response to disease processes. Micro-CT imaging provides an opportunity to capture high-resolution anatomical data in a relatively quick and non-destructive manner. However, to optimally extract functionally relevant details, an intuitive means of reconstructing and interacting with these data is needed. Materials and methods A micro-CT scanner was used to obtain high-resolution scans of freshly explanted human temporal bones. An advanced volume renderer was adapted to enable real-time reconstruction, display, and manipulation of these volumetric datasets. A custom-designed user interface provided for semi-automated threshold segmentation. A 6-degrees-of-freedom navigation device was designed and fabricated to enable exploration of the 3D space in a manner intuitive to those comfortable with the use of a surgical microscope. Standard haptic devices were also incorporated to assist in navigation and exploration. Results Our visualization workstation could be adapted to allow for the effective exploration of middle-ear micro-CT datasets. Functionally significant anatomical details could be recognized and objective data could be extracted. Conclusions We have developed an intuitive, rapid, and effective means of exploring otological micro-CT datasets. This system may provide a foundation for additional work based on middle-ear anatomical data. PMID:20100558

  8. Proteome dataset of pre-ovulatory follicular fluids from less fertile dairy cows

    PubMed Central

    Zachut, Maya; Sood, Pankaj; Livshitz, Lilya; Kra, Gitit; Levin, Yishai; Moallem, Uzi

    2016-01-01

    This article contains raw and processed data related to research published in Zachut et al. (2016) [1]. Proteomics data from preovulatory follicles in cows were obtained by liquid chromatography-mass spectrometry following protein extraction. Differential expression between controls and less fertile cows (LFC) was quantified using MS1 intensity-based label-free quantification. The only previous proteomic analysis of bovine FF detected merely 40 proteins in follicular cysts obtained from the slaughterhouse (Maniwa et al., 2005) [2], and the abundance of proteins in the bovine preovulatory FF remains unknown. Therefore, the objectives were to establish the first dataset of FF proteome in preovulatory follicles of cows, and to examine differentially expressed proteins in FF obtained in-vivo from preovulatory follicles of less fertile cows (also termed “repeat breeder”) and control (CTL) cows. The proteome of FF from 10 preovulatory follicles that were aspirated in vivo (estradiol/progesterone>1) was analyzed. This novel dataset contains 219 identified and quantified proteins in FF, consisting mainly of binding proteins, proteases, receptor ligands, enzymes and transporters. In addition, differential abundance of 8 proteins relevant to follicular function was found in LFC compared to CTL; these findings are discussed in our recent research article Zachut et al. (2016) [1]. The present dataset of bovine FF proteome can be used as a reference for any study involving disorders of follicular development in dairy cows or in comparative studies between species. PMID:27182550

  9. A multi-dataset data-collection strategy produces better diffraction data

    SciTech Connect

    Liu, Zhi-Jie; Chen, Lirong; Wu, Dong; Ding, Wei; Zhang, Hua; Zhou, Weihong; Fu, Zheng-Qing; Wang, Bi-Cheng

    2011-11-01

    Theoretical analysis and experimental validation prove that a multi-dataset data-collection strategy produces better diffraction data. The readiness test is a simple and sensitive method for evaluating and benchmarking X-ray data-collection systems. A multi-dataset (MDS) data-collection strategy is proposed and analyzed for macromolecular crystal diffraction data acquisition. The theoretical analysis indicated that the MDS strategy can reduce the standard deviation (background noise) of diffraction data compared with the commonly used single-dataset strategy for a fixed X-ray dose. In order to validate the hypothesis experimentally, a data-quality evaluation process, termed a readiness test of the X-ray data-collection system, was developed. The anomalous signals of sulfur atoms in zinc-free insulin crystals were used as the probe to differentiate the quality of data collected using different data-collection strategies. The data-collection results using home-laboratory-based rotating-anode X-ray and synchrotron X-ray systems indicate that the diffraction data collected with the MDS strategy contain more accurate anomalous signals from sulfur atoms than the data collected with a regular data-collection strategy. In addition, the MDS strategy offered more advantages with respect to radiation-damage-sensitive crystals and better usage of rotating-anode as well as synchrotron X-rays.
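
    The statistical intuition behind the MDS strategy can be illustrated numerically: averaging N independent measurements shrinks the standard deviation by about 1/sqrt(N). The toy simulation below demonstrates only this intuition, not the crystallographic details of dose partitioning.

    ```python
    # Averaging four independent noisy datasets halves the noise (1/sqrt(4)).
    import numpy as np

    rng = np.random.default_rng(0)
    true_intensity, noise_sd, n_datasets = 100.0, 5.0, 4

    single = true_intensity + rng.normal(0, noise_sd, size=10000)
    merged = np.mean(true_intensity +
                     rng.normal(0, noise_sd, size=(n_datasets, 10000)), axis=0)

    print(single.std())   # ~5.0
    print(merged.std())   # ~2.5
    ```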

  10. Proteome dataset of pre-ovulatory follicular fluids from less fertile dairy cows.

    PubMed

    Zachut, Maya; Sood, Pankaj; Livshitz, Lilya; Kra, Gitit; Levin, Yishai; Moallem, Uzi

    2016-06-01

    This article contains raw and processed data related to research published in Zachut et al. (2016) [1]. Proteomics data from preovulatory follicles in cows was obtained by liquid chromatography-mass spectrometry following protein extraction. Differential expression between controls and less fertile cows (LFC) was quantified using MS1 intensity based label-free. The only previous proteomic analysis of bovine FF detected merely 40 proteins in follicular cysts obtained from the slaughterhouse (Maniwa et al., 2005) [2], and the abundance of proteins in the bovine preovulatory FF remains unknown. Therefore, the objectives were to establish the first dataset of FF proteome in preovulatory follicles of cows, and to examine differentially expressed proteins in FF obtained in-vivo from preovulatory follicles of less fertile cows (also termed "repeat breeder") and control (CTL) cows. The proteome of FF from 10 preovulatory follicles that were aspirated in vivo (estradiol/progesterone>1) was analyzed. This novel dataset contains 219 identified and quantified proteins in FF, consisting mainly of binding proteins, proteases, receptor ligands, enzymes and transporters. In addition, differential abundance of 8 proteins relevant to follicular function was found in LFC compared to CTL; these findings are discussed in our recent research article Zachut et al. (2016) [1]. The present dataset of bovine FF proteome can be used as a reference for any study involving disorders of follicular development in dairy cows or in comparative studies between species. PMID:27182550

  11. Visually Integrating Datasets of the Lau Back-arc Basin

    NASA Astrophysics Data System (ADS)

    Jacobs, A. M.; Kent, G. M.; Harding, A. J.

    2004-12-01

    The Ridge 2000 (R2K) program aims to better understand the complex linkages existing between the biological and physical processes occurring at oceanic spreading centers. R2K scientists are approaching this challenge by studying several key spreading centers, the East Pacific Rise, the Juan de Fuca Ridge, and the Lau Back-arc Basin, from an in-depth and multidisciplinary perspective. However, to characterize and begin interdisciplinary analyses of these deep-sea locations, numerous disparate datasets must be collected. With data types ranging from microbiological fauna to petrogenic core samples to geophysical mantle dynamics, this variation exposes the need for an accessible, integrated, visual database. In conjunction with the principal investigators of applicable experiments, we are taking the first steps towards constructing such a database for the Lau Basin. Building off an earlier visual integration of seafloor bathymetry and multi-channel seismic sections, we intend to incorporate a range of biological, chemical, geological, and geophysical datasets into an interactive, 3-D visual database, or scene. The scene will be constructed by combining individual visual objects derived from various datasets. By assembling the initial scene in this modular fashion, we hope other R2K scientists will ultimately use the database to choose various sub-datasets of interest and create user-specific scenes. Contending with the range in scales associated with these datasets, from meters to hundreds of kilometers, may present a slight challenge; however, once overcome, the integrated data scene should be beneficial to numerous members of the R2K community.

  12. Atlas-guided cluster analysis of large tractography datasets.

    PubMed

    Ros, Christian; Güllmar, Daniel; Stenzel, Martin; Mentzel, Hans-Joachim; Reichenbach, Jürgen Rainer

    2013-01-01

    Diffusion Tensor Imaging (DTI) and fiber tractography are important tools to map the cerebral white matter microstructure in vivo and to model the underlying axonal pathways in the brain with three-dimensional fiber tracts. As the fast and consistent extraction of anatomically correct fiber bundles for multiple datasets is still challenging, we present a novel atlas-guided clustering framework for exploratory data analysis of large tractography datasets. The framework uses a hierarchical cluster analysis approach that exploits the inherent redundancy in large datasets to time-efficiently group fiber tracts. Structural information of a white matter atlas can be incorporated into the clustering to achieve an anatomically correct and reproducible grouping of fiber tracts. This approach facilitates not only the identification of the bundles corresponding to the classes of the atlas; it also enables the extraction of bundles that are not present in the atlas. The new technique was applied to cluster datasets of 46 healthy subjects. Prospects of automatic and anatomically correct as well as reproducible clustering are explored. Reconstructed clusters were well separated and showed good correspondence to anatomical bundles. Using the atlas-guided cluster approach, we observed consistent results across subjects with high reproducibility. In order to investigate the outlier elimination performance of the clustering algorithm, scenarios with varying amounts of noise were simulated and clustered with three different outlier elimination strategies. By exploiting the multithreading capabilities of modern multiprocessor systems in combination with novel algorithms, our toolkit clusters large datasets in a couple of minutes. Experiments were conducted to investigate the achievable speedup and to demonstrate the high performance of the clustering framework in a multiprocessing environment. PMID:24386292
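
    A minimal sketch of hierarchical clustering applied to fiber-tract descriptors, in the spirit of the framework above. Real pipelines cluster resampled 3D streamlines with tract-to-tract distances; here each "tract" is reduced to a small synthetic feature vector purely for illustration.

    ```python
    # Agglomerative clustering of synthetic tract descriptors into two bundles.
    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    rng = np.random.default_rng(1)
    tracts = np.vstack([rng.normal(0, 0.2, size=(20, 6)),    # bundle A
                        rng.normal(2, 0.2, size=(20, 6))])   # bundle B

    Z = linkage(tracts, method="average")
    labels = fcluster(Z, t=2, criterion="maxclust")
    print(labels)   # tracts separate cleanly into 2 clusters
    ```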

  13. Atlas-Guided Cluster Analysis of Large Tractography Datasets

    PubMed Central

    Ros, Christian; Güllmar, Daniel; Stenzel, Martin; Mentzel, Hans-Joachim; Reichenbach, Jürgen Rainer

    2013-01-01

    Diffusion Tensor Imaging (DTI) and fiber tractography are important tools to map the cerebral white matter microstructure in vivo and to model the underlying axonal pathways in the brain with three-dimensional fiber tracts. As the fast and consistent extraction of anatomically correct fiber bundles for multiple datasets is still challenging, we present a novel atlas-guided clustering framework for exploratory data analysis of large tractography datasets. The framework uses a hierarchical cluster analysis approach that exploits the inherent redundancy in large datasets to time-efficiently group fiber tracts. Structural information of a white matter atlas can be incorporated into the clustering to achieve an anatomically correct and reproducible grouping of fiber tracts. This approach facilitates not only the identification of the bundles corresponding to the classes of the atlas; it also enables the extraction of bundles that are not present in the atlas. The new technique was applied to cluster datasets of 46 healthy subjects. Prospects of automatic and anatomically correct as well as reproducible clustering are explored. Reconstructed clusters were well separated and showed good correspondence to anatomical bundles. Using the atlas-guided cluster approach, we observed consistent results across subjects with high reproducibility. In order to investigate the outlier elimination performance of the clustering algorithm, scenarios with varying amounts of noise were simulated and clustered with three different outlier elimination strategies. By exploiting the multithreading capabilities of modern multiprocessor systems in combination with novel algorithms, our toolkit clusters large datasets in a couple of minutes. Experiments were conducted to investigate the achievable speedup and to demonstrate the high performance of the clustering framework in a multiprocessing environment. PMID:24386292

  14. Synthetic neuronal datasets for benchmarking directed functional connectivity metrics

    PubMed Central

    Andrade, Alexandre

    2015-01-01

    Background. Datasets consisting of synthetic neural data generated with quantifiable and controlled parameters are a valuable asset in the process of testing and validating directed functional connectivity metrics. Considering the recent debate in the neuroimaging community concerning the use of these metrics for fMRI data, synthetic datasets that emulate the BOLD signal dynamics have played a central role by supporting claims that argue in favor or against certain choices. Generative models often used in studies that simulate neuronal activity, with the aim of gaining insight into specific brain regions and functions, have different requirements from the generative models for benchmarking datasets. Even though the latter must be realistic, there is a tradeoff between realism and computational demand that needs to be considered, and simulations that efficiently mimic the real behavior of single neurons or neuronal populations are preferred over more cumbersome and marginally more precise ones. Methods. This work explores how simple generative models are able to produce neuronal datasets, for benchmarking purposes, that reflect the simulated effective connectivity and how these can be used to obtain synthetic recordings of EEG and fMRI BOLD signals. The generative models covered here are AR processes, neural mass models consisting of linear and nonlinear stochastic differential equations and populations with thousands of spiking units. Forward models for EEG consist of the simple three-shell head model, while the fMRI BOLD signal is modeled with the Balloon-Windkessel model or by convolution with a hemodynamic response function. Results. The simulated datasets are tested for causality with the original spectral formulation for Granger causality. Modeled effective connectivity can be detected in the generated data for varying connection strengths and interaction delays. Discussion. All generative models produce synthetic neuronal data with detectable causal
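
    As a small sketch of this benchmarking workflow, the following simulates two coupled AR processes with a known directed influence (x drives y) and tests for Granger causality; the coefficients and lags are illustrative.

    ```python
    # Simulate x -> y coupling and test it with Granger causality.
    import numpy as np
    from statsmodels.tsa.stattools import grangercausalitytests

    rng = np.random.default_rng(0)
    n = 2000
    x, y = np.zeros(n), np.zeros(n)
    for t in range(1, n):
        x[t] = 0.5 * x[t - 1] + rng.normal()
        y[t] = 0.4 * y[t - 1] + 0.5 * x[t - 1] + rng.normal()  # x drives y

    # Column order is (effect, cause): does x help predict y?
    res = grangercausalitytests(np.column_stack([y, x]), maxlag=2)
    print(res[1][0]["ssr_ftest"][1])   # small p-value -> x Granger-causes y
    ```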

  15. A gridded hourly rainfall dataset for the UK applied to a national physically-based modelling system

    NASA Astrophysics Data System (ADS)

    Lewis, Elizabeth; Blenkinsop, Stephen; Quinn, Niall; Freer, Jim; Coxon, Gemma; Woods, Ross; Bates, Paul; Fowler, Hayley

    2016-04-01

    An hourly gridded rainfall product has great potential for use in many hydrological applications that require high temporal resolution meteorological data. One important example of this is flood risk management, with flooding in the UK highly dependent on sub-daily rainfall intensities amongst other factors. Knowledge of sub-daily rainfall intensities is therefore critical to designing hydraulic structures or flood defences to appropriate levels of service. Sub-daily rainfall rates are also essential inputs for flood forecasting, allowing for estimates of peak flows and stage for flood warning and response. In addition, an hourly gridded rainfall dataset has significant potential for practical applications such as better representation of extremes and pluvial flash flooding, validation of high resolution climate models and improving the representation of sub-daily rainfall in weather generators. A new 1km gridded hourly rainfall dataset for the UK has been created by disaggregating the daily Gridded Estimates of Areal Rainfall (CEH-GEAR) dataset using comprehensively quality-controlled hourly rain gauge data from over 1300 observation stations across the country. Quality control measures include identification of frequent tips, daily accumulations and dry spells, comparison of daily totals against the CEH-GEAR daily dataset, and nearest neighbour checks. The quality control procedure was validated against historic extreme rainfall events and the UKCP09 5km daily rainfall dataset. General use of the dataset has been demonstrated by testing the sensitivity of a physically-based hydrological modelling system for Great Britain to the distribution and rates of rainfall and potential evapotranspiration. Of the sensitivity tests undertaken, the largest improvements in model performance were seen when an hourly gridded rainfall dataset was combined with potential evapotranspiration disaggregated to hourly intervals, with 61% of catchments showing an increase in NSE between
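
    The core disaggregation step can be sketched as follows: a daily gridded total is redistributed across 24 hours using the sub-daily pattern of a nearby quality-controlled hourly gauge. The numbers are illustrative; the actual product applies this per grid cell with its own gauge-selection and quality-control rules.

    ```python
    # Disaggregate a daily total using a nearby gauge's hourly pattern.
    import numpy as np

    daily_total_mm = 12.0                        # daily grid value (CEH-GEAR-style)
    gauge_hourly = np.zeros(24)
    gauge_hourly[14:18] = [1.0, 3.0, 2.0, 0.5]   # gauge records an afternoon burst

    weights = gauge_hourly / gauge_hourly.sum()  # hourly fractions, sum to 1
    hourly_mm = daily_total_mm * weights

    print(hourly_mm.sum())    # 12.0 -- mass is conserved
    print(hourly_mm[14:18])   # the daily total lands in the burst hours
    ```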

  16. Frame independent cosmological perturbations

    SciTech Connect

    Prokopec, Tomislav; Weenink, Jan

    2013-09-01

    We compute the third order gauge invariant action for scalar-graviton interactions in the Jordan frame. We demonstrate that the gauge invariant action for scalar and tensor perturbations on one physical hypersurface only differs from that on another physical hypersurface via terms proportional to the equation of motion and boundary terms, such that the evolution of non-Gaussianity may be called unique. Moreover, we demonstrate that the gauge invariant curvature perturbation and graviton on uniform field hypersurfaces in the Jordan frame are equal to their counterparts in the Einstein frame. These frame independent perturbations are therefore particularly useful in relating results in different frames at the perturbative level. On the other hand, the field perturbation and graviton on uniform curvature hypersurfaces in the Jordan and Einstein frame are non-linearly related, as are their corresponding actions and n-point functions.

  17. A consistent aerosol optical depth (AOD) dataset over mainland China by integration of several AOD products

    NASA Astrophysics Data System (ADS)

    Xu, H.; Guang, J.; Xue, Y.; de Leeuw, Gerrit; Che, Y. H.; Guo, Jianping; He, X. W.; Wang, T. K.

    2015-08-01

    The Moderate Resolution Imaging Spectroradiometer (MODIS), the Multiangle Imaging Spectroradiometer (MISR) and the Sea-viewing Wide Field-of-view Sensor (SeaWiFS) provide validated aerosol optical depth (AOD) products over both land and ocean. However, the values of the AOD provided by each of these satellites may show spatial and temporal differences due to the instrument characteristics and aerosol retrieval algorithms used for each instrument. In this article we present a method to produce an AOD dataset over Asia for the year 2007 based on fusion of the data provided by different instruments and/or algorithms. First, the bias of each satellite-derived AOD product was calculated by comparison with ground-based AOD data derived from the AErosol RObotic NETwork (AERONET) and the China Aerosol Remote Sensing NETwork (CARSNET) for different values of the surface albedo and the AOD. Then, these multiple AOD products were combined using the maximum likelihood estimate (MLE) method, with weights derived from the root mean square error (RMSE) associated with the accuracies of the original AOD products. The original and merged AOD datasets have been validated by comparison with AOD data from CARSNET. Results show that the mean bias error (MBE) and mean absolute error (MAE) of the merged AOD dataset are not larger than those of any of the original AOD products. In addition, for the merged AOD dataset the fraction of pixels with no data is significantly smaller than that of any of the original products, thus increasing the spatial coverage. The fraction of retrievable area is about 50% for the merged AOD dataset and between 5% and 20% for the MISR, SeaWiFS, MODIS-DT and MODIS-DB algorithms.
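
    The merging step can be sketched as inverse-error-variance weighting, which is the maximum likelihood combination for independent Gaussian errors, with per-product RMSE estimated against ground truth. The retrievals and RMSE values below are illustrative.

    ```python
    # RMSE-weighted merge of several AOD retrievals at one pixel.
    import numpy as np

    aod  = np.array([0.42, 0.35, 0.50])   # e.g. MODIS, MISR, SeaWiFS retrievals
    rmse = np.array([0.08, 0.10, 0.15])   # per-product error vs. ground truth

    w = 1.0 / rmse**2
    w /= w.sum()
    merged = np.sum(w * aod)
    print(w.round(3), round(merged, 3))   # low-RMSE products dominate the merge
    ```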

  18. Boolean methods of optimization over independence systems

    SciTech Connect

    Hulme, B.L.

    1983-01-01

    This paper presents both a direct and an iterative method of solving the combinatorial optimization problem associated with any independence system. The methods use Boolean algebraic computations to produce solutions. In addition, the iterative method employs a version of the greedy algorithm both to compute upper bounds on the optimum value and to produce the additional circuits needed at every stage. The methods are extensions of those used to solve a problem of fire protection at nuclear reactor power plants.
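
    A compact sketch of the greedy step mentioned above, phrased for a generic independence system: scan elements in order of decreasing weight and keep each one whose addition preserves independence. The toy oracle here (at most one element per group) is a stand-in for a real independence test.

    ```python
    # Greedy selection over an independence system with a pluggable oracle.
    def greedy_max_weight(elements, weight, independent):
        chosen = []
        for e in sorted(elements, key=weight, reverse=True):
            if independent(chosen + [e]):
                chosen.append(e)
        return chosen

    items = [("a", 1, 5.0), ("b", 1, 4.0), ("c", 2, 3.0)]   # (name, group, weight)
    indep = lambda s: len({g for _, g, _ in s}) == len(s)   # one item per group
    print(greedy_max_weight(items, weight=lambda e: e[2], independent=indep))
    # -> [('a', 1, 5.0), ('c', 2, 3.0)]
    ```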

  19. Classifying and scoring of molecules with the NGN: new datasets, significance tests, and generalization

    PubMed Central

    2010-01-01

    This paper demonstrates how a Neural Grammar Network learns to classify and score molecules for a variety of tasks in chemistry and toxicology. In addition to a more detailed analysis on datasets previously studied, we introduce three new datasets (BBB, FXa, and toxicology) to show the generality of the approach. A new experimental methodology is developed and applied to both the new datasets as well as previously studied datasets. This methodology is rigorous and statistically grounded, and ultimately culminates in a Wilcoxon significance test that proves the effectiveness of the system. We further include a complete generalization of the specific technique to arbitrary grammars and datasets using a mathematical abstraction that allows researchers in different domains to apply the method to their own work. Background Our work can be viewed as an alternative to existing methods to solve the quantitative structure-activity relationship (QSAR) problem. To this end, we review a number of approaches from both a methodological and a performance perspective. In addition to these approaches, we also examined a number of chemical properties that can be used by generic classifier systems, such as feed-forward artificial neural networks. In studying these approaches, we identified a set of interesting benchmark problem sets to which many of the above approaches had been applied. These included: ACE, AChE, AR, BBB, BZR, Cox2, DHFR, ER, FXa, GPB, Therm, and Thr. Finally, we developed our own benchmark set by collecting data on toxicology. Results Our results show that our system performs better than, or comparatively to, the existing methods over a broad range of problem types. Our method does not require the expert knowledge that is necessary to apply the other methods to novel problems. Conclusions We conclude that our success is due to the ability of our system to: 1) encode molecules losslessly before presentation to the learning system, and 2) leverage the design of

  20. SCSPOD14, a South China Sea physical oceanographic dataset derived from in situ measurements during 1919–2014

    PubMed Central

    Zeng, Lili; Wang, Dongxiao; Chen, Ju; Wang, Weiqiang; Chen, Rongyu

    2016-01-01

    In addition to the oceanographic data available for the South China Sea (SCS) from the World Ocean Database (WOD) and Array for Real-time Geostrophic Oceanography (Argo) floats, a suite of observations has been made by the South China Sea Institute of Oceanology (SCSIO) starting from the 1970s. Here, we assemble a SCS Physical Oceanographic Dataset (SCSPOD14) based on 51,392 validated temperature and salinity profiles collected from these three datasets for the period 1919–2014. A gridded dataset of climatological monthly mean temperature, salinity, and mixed and isothermal layer depth derived from an objective analysis of profiles is also presented. Comparisons with the World Ocean Atlas (WOA) and IFREMER/LOS Mixed Layer Depth Climatology confirm the reliability of the new dataset. This unique dataset offers an invaluable baseline perspective on the thermodynamic processes, spatial and temporal variability of water masses, and basin-scale and mesoscale oceanic structures in the SCS. We anticipate improvements and regular updates to this product as more observations become available from existing and future in situ networks. PMID:27116565

  1. SCSPOD14, a South China Sea physical oceanographic dataset derived from in situ measurements during 1919-2014.

    PubMed

    Zeng, Lili; Wang, Dongxiao; Chen, Ju; Wang, Weiqiang; Chen, Rongyu

    2016-01-01

    In addition to the oceanographic data available for the South China Sea (SCS) from the World Ocean Database (WOD) and Array for Real-time Geostrophic Oceanography (Argo) floats, a suite of observations has been made by the South China Sea Institute of Oceanology (SCSIO) starting from the 1970s. Here, we assemble a SCS Physical Oceanographic Dataset (SCSPOD14) based on 51,392 validated temperature and salinity profiles collected from these three datasets for the period 1919-2014. A gridded dataset of climatological monthly mean temperature, salinity, and mixed and isothermal layer depth derived from an objective analysis of profiles is also presented. Comparisons with the World Ocean Atlas (WOA) and IFREMER/LOS Mixed Layer Depth Climatology confirm the reliability of the new dataset. This unique dataset offers an invaluable baseline perspective on the thermodynamic processes, spatial and temporal variability of water masses, and basin-scale and mesoscale oceanic structures in the SCS. We anticipate improvements and regular updates to this product as more observations become available from existing and future in situ networks. PMID:27116565

  2. Toward quality-controlled soil moisture products for climate studies using data assimilation of multiple satellite datasets

    NASA Astrophysics Data System (ADS)

    Lahoz, William; Svendby, Tove; Griesfeller, Alexandra; de Jeu, Richard; Dorigo, Wouter

    2015-04-01

    Satellite observations provide information on soil moisture spatio-temporal variability, which is key to understanding processes linking the land surface and the atmosphere, and their impact on, e.g., climate change. This is a key motivation behind the setting up by the European Space Agency (ESA) of the climate change initiative (CCI) project for soil moisture. The ESA CCI for soil moisture will produce a multi-year soil moisture dataset from various satellite datasets: ASCAT, AMSR-E, SSMR, SSM/I, TMI and the ERS Scatterometer. Other satellite datasets of interest to the ESA CCI for soil moisture include Windsat, AMSR-E-2, Feng Yun and SMOS. To add value to the ESA CCI soil moisture dataset, we perform data assimilation experiments using multiple satellite datasets with a variant of the Ensemble Kalman Filter (EnKF), in the first instance over the European domain. Initially, the satellite datasets are from the ASCAT, AMSR-E and SMOS platforms; later, these will include ESA CCI soil moisture datasets. Tests of the data assimilation set-up involve runs that are 1-month long; we then present results for a longer period (c. 1 year). We evaluate the data assimilation results by comparison against independent in situ soil moisture data from the International Soil Moisture Network. We show that the data assimilation method provides the following: (i) information on the observational error; (ii) information on the quality of the ESA CCI product and the land surface model used in the assimilation; (iii) extension of the ESA CCI product in the horizontal (by gridding the data) and the vertical (by providing root zone soil moisture); and (iv) information on the relative impact of the satellite observations, as well as the relative impact of the elements comprising the ESA CCI soil moisture product (active, passive and merged radiometer datasets). In this presentation, we first show and evaluate preliminary results from these assimilation experiments. We then discuss the way
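
    For illustration, the following is a textbook stochastic Ensemble Kalman Filter analysis step of the kind such experiments build on; it is a generic sketch, not the variant or configuration used in the study, and the state layout and numbers are invented.

    ```python
    # One stochastic EnKF update: nudge a soil moisture ensemble toward an observation.
    import numpy as np

    rng = np.random.default_rng(0)
    n_ens, n_state = 20, 3
    ens = rng.normal(0.25, 0.05, size=(n_ens, n_state))  # prior ensemble (3 layers)

    H = np.array([[1.0, 0.0, 0.0]])   # satellite observes the surface layer only
    obs, obs_err = 0.30, 0.03

    Hx = ens @ H.T                                        # predicted observations
    P_HT = np.cov(ens.T) @ H.T                            # state-obs cross-covariance
    K = P_HT @ np.linalg.inv(H @ P_HT + obs_err**2 * np.eye(1))  # Kalman gain

    perturbed_obs = obs + rng.normal(0, obs_err, size=(n_ens, 1))
    ens = ens + (perturbed_obs - Hx) @ K.T
    print(ens.mean(axis=0))   # surface layer pulled toward the observation
    ```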

  3. Improving the Fundamental Understanding of Regional Seismic Signal Processing with a Unique Western U.S. Dataset

    SciTech Connect

    Walter, W R; Smith, K; O'Boyle, J; Hauk, T F; Ryall, F; Ruppert, S D; Myers, S C; Anderson, M; Dodge, D A

    2003-07-18

    recovered and reformatted old event segmented data from the LLNL and SNL managed stations for past nuclear tests and earthquakes. We then used the preferred origin catalog to extract waveforms from continuous data and associate event segmented waveforms within the database. The result is a well-organized regional western US dataset with hundreds of nuclear tests, thousands of mining explosions and hundreds of thousands of earthquakes. In the second stage of the project we have chosen a subset of approximately 125 events that are well located and cover a range of magnitudes, source types, and locations. Ms. Flori Ryall, an experienced seismic analyst, is reviewing this dataset. She is picking all arrival onsets with quantitative uncertainties and making note of data problems (timing errors, glitches, dropouts) and issues. The resulting arrivals and comments will then be loaded into the database for future researcher use. During the summer of 2003 we will be carrying out some analysis and quality control on this subset. It is anticipated that this set of consistently picked, independently located data will provide an effective test set for regional sparse station location algorithms. In addition, because the set will include nuclear tests, earthquakes, and mine-related events, each with related source parameters, it will provide a valuable test set for regional discrimination and magnitude estimation as well. A final relational database of these approximately 125 events in the high-quality subset will be put onto a CD-ROM and distributed for other researchers to use in benchmarking regional algorithms after the conclusion of the project.

  4. Fast and Sensitive Alignment of Microbial Whole Genome Sequencing Reads to Large Sequence Datasets on a Desktop PC: Application to Metagenomic Datasets and Pathogen Identification

    PubMed Central

    2014-01-01

    Next generation sequencing (NGS) of metagenomic samples is becoming a standard approach to detect individual species or pathogenic strains of microorganisms. Computer programs used in the NGS community have to balance speed and sensitivity, and as a result, species- or strain-level identification is often inaccurate and low-abundance pathogens can sometimes be missed. We have developed Taxoner, an open source, taxon assignment pipeline that includes a fast aligner (e.g. Bowtie2) and a comprehensive DNA sequence database. We tested the program on simulated datasets as well as experimental data from Illumina, IonTorrent, and Roche 454 sequencing platforms. We found that Taxoner performs as well as, and often better than, BLAST, but requires two orders of magnitude less running time, meaning that it can be run on desktop or laptop computers. Taxoner is slower than the approaches that use small marker databases but is more sensitive due to the comprehensive reference database. In addition, it can be easily tuned to specific applications using small tailored databases. When applied to metagenomic datasets, Taxoner can provide a functional summary of the genes mapped and can provide strain-level identification. Taxoner is written in C for Linux operating systems. The code and documentation are available for research applications at http://code.google.com/p/taxoner. PMID:25077800

  5. Boosting medical diagnostics by pooling independent judgments

    PubMed Central

    Kurvers, Ralf H. J. M.; Herzog, Stefan M.; Hertwig, Ralph; Krause, Jens; Carney, Patricia A.; Bogart, Andy; Argenziano, Giuseppe; Zalaudek, Iris; Wolf, Max

    2016-01-01

    Collective intelligence refers to the ability of groups to outperform individual decision makers when solving complex cognitive problems. Despite its potential to revolutionize decision making in a wide range of domains, including medical, economic, and political decision making, at present, little is known about the conditions underlying collective intelligence in real-world contexts. We here focus on two key areas of medical diagnostics, breast and skin cancer detection. Using a simulation study that draws on large real-world datasets, involving more than 140 doctors making more than 20,000 diagnoses, we investigate when combining the independent judgments of multiple doctors outperforms the best doctor in a group. We find that similarity in diagnostic accuracy is a key condition for collective intelligence: Aggregating the independent judgments of doctors outperforms the best doctor in a group whenever the diagnostic accuracy of doctors is relatively similar, but not when doctors’ diagnostic accuracy differs too much. This intriguingly simple result is highly robust and holds across different group sizes, performance levels of the best doctor, and collective intelligence rules. The enabling role of similarity, in turn, is explained by its systematic effects on the number of correct and incorrect decisions of the best doctor that are overruled by the collective. By identifying a key factor underlying collective intelligence in two important real-world contexts, our findings pave the way for innovative and more effective approaches to complex real-world decision making, and to the scientific analyses of those approaches. PMID:27432950
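
    A toy simulation of the pooling rule studied above: independent binary diagnoses are aggregated by majority vote and compared against the best individual. The accuracies are illustrative; the paper's key condition is that they be similar.

    ```python
    # Majority vote among three similar diagnosticians beats the best one.
    import numpy as np

    rng = np.random.default_rng(0)
    n_cases, accuracies = 10000, [0.75, 0.74, 0.73]

    truth = rng.integers(0, 2, n_cases)
    votes = np.array([np.where(rng.random(n_cases) < a, truth, 1 - truth)
                      for a in accuracies])
    majority = (votes.sum(axis=0) >= 2).astype(int)

    print("best doctor:", max(accuracies))
    print("majority vote:", (majority == truth).mean())   # typically ~0.83
    ```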

  6. Boosting medical diagnostics by pooling independent judgments.

    PubMed

    Kurvers, Ralf H J M; Herzog, Stefan M; Hertwig, Ralph; Krause, Jens; Carney, Patricia A; Bogart, Andy; Argenziano, Giuseppe; Zalaudek, Iris; Wolf, Max

    2016-08-01

    Collective intelligence refers to the ability of groups to outperform individual decision makers when solving complex cognitive problems. Despite its potential to revolutionize decision making in a wide range of domains, including medical, economic, and political decision making, at present, little is known about the conditions underlying collective intelligence in real-world contexts. We here focus on two key areas of medical diagnostics, breast and skin cancer detection. Using a simulation study that draws on large real-world datasets, involving more than 140 doctors making more than 20,000 diagnoses, we investigate when combining the independent judgments of multiple doctors outperforms the best doctor in a group. We find that similarity in diagnostic accuracy is a key condition for collective intelligence: Aggregating the independent judgments of doctors outperforms the best doctor in a group whenever the diagnostic accuracy of doctors is relatively similar, but not when doctors' diagnostic accuracy differs too much. This intriguingly simple result is highly robust and holds across different group sizes, performance levels of the best doctor, and collective intelligence rules. The enabling role of similarity, in turn, is explained by its systematic effects on the number of correct and incorrect decisions of the best doctor that are overruled by the collective. By identifying a key factor underlying collective intelligence in two important real-world contexts, our findings pave the way for innovative and more effective approaches to complex real-world decision making, and to the scientific analyses of those approaches. PMID:27432950

  7. Troposphere-Stratosphere Connections in Recent Northern Winters in NASA GEOS Assimilated Datasets

    NASA Technical Reports Server (NTRS)

    Pawson, Steven

    2000-01-01

    The northern winter stratosphere displays a wide range of interannual variability, much of which is believed to result from the response to the damping of upward-propagating waves. However, there is considerable (growing) evidence that the stratospheric state can also impact the tropospheric circulation. This issue will be examined using datasets generated in the Data Assimilation Office (DAO) at NASA's Goddard Space Flight Center. Just as the tropospheric circulation in each of these years was dominated by differing synoptic-scale structures, the stratospheric polar vortex also displayed different evolutions. The two extremes are the winter 1998/1999, when the stratosphere underwent a series of warming events (including two major warmings), and the winter 1999/2000, which was dominated by a persistent, cold polar vortex, often distorted by a dominant blocking pattern in the troposphere. This study will examine several operational and research-level versions of the DAO's systems. The 70-level TRMM system with a resolution of 2-by-2.5 degrees and the 48-level, 1-by-1-degree resolution "Terra" system were operational in 1998/1999 and 1999/2000, respectively. Research versions of the system used a 48-level, 2-by-2.5-degree configuration, which facilitates studies of the impact of vertical resolution. The study includes checks against independent datasets and error analyses, as well as the main issue of troposphere-stratosphere interactions.

  8. Climatic Analysis of Oceanic Water Vapor Transports Based on Satellite E-P Datasets

    NASA Technical Reports Server (NTRS)

    Smith, Eric A.; Sohn, Byung-Ju; Mehta, Vikram

    2004-01-01

    Understanding the climatically varying properties of water vapor transports from a robust observational perspective is an essential step in calibrating climate models. This is tantamount to measuring year-to-year changes of monthly- or seasonally-averaged, divergent water vapor transport distributions. This cannot be done effectively with conventional radiosonde data over ocean regions where sounding data are generally sparse. This talk describes how a methodology designed to derive atmospheric water vapor transports over the world oceans from satellite-retrieved precipitation (P) and evaporation (E) datasets circumvents the problem of inadequate sampling. Ultimately, the method is intended to take advantage of the relatively complete and consistent coverage, as well as continuity in sampling, associated with E and P datasets obtained from satellite measurements. Independent P and E retrievals from Special Sensor Microwave Imager (SSM/I) measurements, along with P retrievals from Tropical Rainfall Measuring Mission (TRMM) measurements, are used to obtain transports by solving a potential function for the divergence of water vapor transport as balanced by large scale E - P conditions.
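
    Schematically, the method solves a Poisson equation for a transport potential whose Laplacian balances E - P. The sketch below does this spectrally on a doubly periodic Cartesian grid with a synthetic field; a real implementation would work in spherical geometry with observed E - P.

    ```python
    # Solve laplacian(chi) = E - P spectrally, then take gradient(chi) as transport.
    import numpy as np

    n, L = 128, 1.0e7                       # grid size, domain length (m)
    x = np.linspace(0, L, n, endpoint=False)
    X, Y = np.meshgrid(x, x)
    div = np.sin(2 * np.pi * X / L) * np.cos(2 * np.pi * Y / L)  # synthetic E - P

    k = 2 * np.pi * np.fft.fftfreq(n, d=L / n)
    KX, KY = np.meshgrid(k, k)
    k2 = KX**2 + KY**2
    k2[0, 0] = 1.0                          # avoid dividing the mean mode by zero

    chi_hat = -np.fft.fft2(div) / k2
    chi_hat[0, 0] = 0.0
    chi = np.real(np.fft.ifft2(chi_hat))

    qu = np.gradient(chi, L / n, axis=1)    # divergent transport components
    qv = np.gradient(chi, L / n, axis=0)
    ```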

  9. Tests of diffusion-free scaling behaviors in numerical dynamo datasets

    NASA Astrophysics Data System (ADS)

    Cheng, J. S.; Aurnou, J. M.

    2016-02-01

    Many dynamo studies extrapolate numerical model results to planetary conditions by empirically constructing scaling laws. The seminal work of Christensen and Aubert (2006) proposed a set of scaling laws that have been used throughout the geoscience community. These scalings make use of specially-constructed parameters that are independent of fluid diffusivities, anticipating that large-scale turbulent processes will dominate the physics in planetary dynamo settings. With these 'diffusion-free' parameterizations, the results of current numerical dynamo models extrapolate directly to fully-turbulent planetary core systems; the effects of realistic fluid properties merit no further investigation. In this study, we test the validity of diffusion-free heat transfer scaling arguments and their applicability to planetary conditions. We do so by constructing synthetic heat transfer datasets and examining their scaling properties alongside those proposed by Christensen and Aubert (2006). We find that the diffusion-free parameters compress and stretch the heat transfer data, eliminating information and creating an artificial alignment of the data. Most significantly, diffusion-free heat transfer scalings are found to be unrelated to bulk turbulence and are instead controlled by the onset of non-magnetic rotating convection, itself determined by the viscous diffusivity of the working fluid. Ultimately, our results, in conjunction with those of Stelzer and Jackson (2013) and King and Buffett (2013), show that diffusion-free scalings are not validated by current-day numerical dynamo datasets and cannot yet be extrapolated to planetary conditions.

  10. A Global, High-Resolution Gridded Dataset of Ecosystem Carbon and Water Fluxes (EC-MOD) from 2000 to Present: A Benchmark Dataset for Model Evaluation

    NASA Astrophysics Data System (ADS)

    Xiao, J.

    2011-12-01

    It is a challenge to evaluate the simulations of process-based ecosystem models and atmospheric inversions over regions, continents, or the globe due to the lack of spatially explicit reference information on carbon and water fluxes. Eddy covariance flux towers provide continuous measurements of ecosystem carbon and water fluxes for a large range of climate and ecosystem types. These observations, however, represent fluxes at the scale of the tower footprint. To examine carbon and water cycling over regions, continents, or the globe, we need to upscale Fluxnet observations from towers to these broad regions. At the global scale, the existing gridded flux fields derived from Fluxnet observations exhibit coarse spatial resolution (0.5 degree by 0.5 degree). Recent studies have demonstrated that land cover heterogeneity and spatial resolution have significant effects on carbon fluxes. The development of a global, high-resolution product of ecosystem carbon and water fluxes is essential for capturing the effects of spatial heterogeneity on carbon dynamics. Here we use a data-driven approach to upscale carbon and water fluxes from towers to the global scale and develop a global, high-resolution (0.05 degree by 0.05 degree) gridded flux product over the period from 2000 to present. We combine site-specific flux measurements, MODIS data streams, and climate data to develop predictive models for ecosystem carbon and water fluxes: gross primary productivity (GPP), ecosystem respiration (ER), net ecosystem exchange (NEE), and evapotranspiration (ET). The independent or explanatory variables include vegetation type, enhanced vegetation index (EVI), land surface temperature (LST), the normalized difference water index (NDWI) derived from surface reflectance, and photosynthetically active radiation (PAR). The performances of our predictive models are evaluated using cross-validation. The predictive models are applied to the global scale to produce continuous flux estimates from
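
    A minimal sketch of the upscaling step: fit a predictive model of tower fluxes from remote-sensing predictors and check it with cross-validation before applying it wall-to-wall to gridded inputs. The synthetic data and the choice of a random forest are assumptions; the abstract names the predictors but not the regression method.

    ```python
    # Train and cross-validate a flux-upscaling regressor on synthetic data.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    n = 500
    X = rng.random((n, 4))                  # columns stand in for EVI, LST, NDWI, PAR
    gpp = 3.0 * X[:, 0] + 1.5 * X[:, 3] + rng.normal(0, 0.2, n)  # synthetic GPP

    model = RandomForestRegressor(n_estimators=200, random_state=0)
    print(cross_val_score(model, X, gpp, cv=5, scoring="r2").mean())
    # A validated model would then be applied to gridded predictors to map fluxes.
    ```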

  11. Improving the terrestrial gravity dataset in South-Estonia

    NASA Astrophysics Data System (ADS)

    Oja, T.; Gruno, A.; Bloom, A.; Mäekivi, E.; Ellmann, A.; All, T.; Jürgenson, H.; Michelson, M.

    2009-04-01

    The only available gravity dataset covering the whole of Estonia was observed from 1949 to 1958. This historic dataset has been used as the main input source for many applications, including geoid determination, the realization of the height system, and geological mapping. However, some recent studies have indicated remarkable systematic biases in the dataset. For instance, a comparison of modern gravity control points with the historic data revealed unreasonable discrepancies in a large region in South-Estonia. The distribution of the gravity control points was too sparse, however, to fully assess the quality of the historic data in the study area. In 2008 a pilot project was carried out in cooperation between the Estonian Land Board, the Geological Survey of Estonia, Tallinn University of Technology and the Estonian University of Life Sciences to densify the detected problematic area (about 2000 km2) with new and reliable gravity data. Field work was carried out in October and November 2008, with GPS RTK positioning and a Scintrex CG-5 relative gravimeter used for precise positioning and gravity determinations, respectively. Altogether more than 140 new points were determined along the roads. Despite bad weather conditions and the unstable observation base of the gravimeter (mostly on the bank of the road), an uncertainty better than ±0.1 mGal (1 mGal = 10^-5 m/s^2) was estimated from the adjustment of the gravimeter readings. A separate gravity dataset of the Geological Survey of Estonia was also incorporated into the project's gravity database for further analysis. Those data were collected within several geological mapping projects in 1981-2007 and have uncertainties better than ±0.25 mGal. After the collection of the new gravity data, kriging with proper variogram modeling was applied to form Bouguer anomaly grids from the historic and the new datasets. The comparison of the resulting grids revealed biases up to -4 mGal at certain regions
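
    A minimal sketch of the gridding step, assuming the pykrige package and invented survey coordinates: ordinary kriging with a spherical variogram produces a Bouguer anomaly grid, and differencing two such grids (historic versus new) would expose the reported biases.

        import numpy as np
        from pykrige.ok import OrdinaryKriging

        rng = np.random.default_rng(2)
        # Stand-ins for survey points: easting/northing (km) and Bouguer anomaly (mGal)
        x = rng.uniform(0, 50, 140)
        y = rng.uniform(0, 40, 140)
        g = -20 + 0.1 * x - 0.05 * y + rng.normal(0, 0.1, 140)

        # Ordinary kriging with a spherical variogram model fitted to the data
        ok = OrdinaryKriging(x, y, g, variogram_model="spherical")
        gridx = np.arange(0.0, 50.0, 1.0)
        gridy = np.arange(0.0, 40.0, 1.0)
        grid, variance = ok.execute("grid", gridx, gridy)

        # Subtracting the historic grid from the new grid at the same nodes
        # would map the systematic biases described above
        print(grid.shape, variance.shape)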

  12. Global heating distributions for January 1979 calculated from GLA assimilated and simulated model-based datasets

    NASA Technical Reports Server (NTRS)

    Schaack, Todd K.; Lenzen, Allen J.; Johnson, Donald R.

    1991-01-01

    This study surveys the large-scale distribution of heating for January 1979 obtained from five sources of information. Through intercomparison of these distributions, with emphasis on satellite-derived information, an investigation is conducted into the global distribution of atmospheric heating and the impact of observations on the diagnostic estimates of heating derived from assimilated datasets. The results indicate a substantial impact of satellite information on diagnostic estimates of heating in regions where there is a scarcity of conventional observations. The addition of satellite data provides information on the atmosphere's temperature and wind structure that is important for estimation of the global distribution of heating and energy exchange.

  13. A Dataset from TIMSS to Examine the Relationship between Computer Use and Mathematics Achievement

    ERIC Educational Resources Information Center

    Kadijevich, Djordje M.

    2015-01-01

    Because the relationship between computer use and achievement is still puzzling, there is a need to prepare and analyze good quality datasets on computer use and achievement. Such a dataset can be derived from TIMSS data. This paper describes how this dataset can be prepared. It also gives an example of how the dataset may be analyzed. The…

  14. Development and Validation of a Novel Platform-Independent Metastasis Signature in Human Breast Cancer

    PubMed Central

    Speers, Corey; Liu, Meilan; Wilder-Romans, Kari; Lawrence, Theodore S.; Pierce, Lori J.; Feng, Felix Y.

    2015-01-01

    Purpose: The molecular drivers of metastasis in breast cancer are not well understood. Therefore, we sought to identify the biological processes underlying distant progression and define a prognostic signature for metastatic potential in breast cancer. Experimental design: In vivo screening for metastases was performed using Chick Chorioallantoic Membrane assays in 21 preclinical breast cancer models. Expressed genes associated with metastatic potential were identified using high-throughput analysis. Correlations with biological function were determined using the Database for Annotation, Visualization and Integrated Discovery. Results: We identified a broad range of metastatic potential that was independent of intrinsic breast cancer subtypes. 146 genes were significantly associated with metastasis progression and were linked to cancer-related biological functions, including cell migration/adhesion, Jak-STAT, TGF-beta, and Wnt signaling. These genes were used to develop a platform-independent gene expression signature (M-Sig), which was trained and subsequently validated on 5 independent cohorts totaling nearly 1800 breast cancer patients with all p-values < 0.005 and hazard ratios ranging from approximately 2.5 to 3. On multivariate analysis accounting for standard clinicopathologic prognostic variables, M-Sig remained the strongest prognostic factor for metastatic progression, with p-values < 0.001 and hazard ratios > 2 in three different cohorts. Conclusion: M-Sig is strongly prognostic for metastatic progression, and may provide clinical utility in combination with treatment prediction tools to better guide patient care. In addition, the platform-independent nature of the signature makes it an excellent research tool as it can be directly applied to existing and future datasets. PMID:25974184
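
    As an illustration of the survival analyses named here (Kaplan-Meier curves and multivariable Cox regression), the sketch below uses the lifelines library on a synthetic cohort; the column names and effect sizes are invented for the example and do not come from the study.

        import numpy as np
        import pandas as pd
        from lifelines import CoxPHFitter

        rng = np.random.default_rng(3)
        n = 300
        # Toy cohort: a signature score plus standard clinicopathologic covariates
        df = pd.DataFrame({
            "msig_score": rng.normal(0, 1, n),
            "er_status": rng.integers(0, 2, n),
            "tumor_size": rng.uniform(0.5, 5.0, n),
        })
        hazard = np.exp(0.9 * df["msig_score"])      # signature drives risk here
        df["time"] = rng.exponential(10 / hazard)    # time to metastasis (toy)
        df["event"] = (rng.uniform(size=n) < 0.7).astype(int)  # 1 = event observed

        # Multivariable Cox regression: does the signature remain prognostic
        # after adjusting for the other covariates?
        cph = CoxPHFitter()
        cph.fit(df, duration_col="time", event_col="event")
        cph.print_summary()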

  15. A comparison between general circulation model simulations using two sea surface temperature datasets for January 1979

    NASA Technical Reports Server (NTRS)

    Ose, Tomoaki; Mechoso, Carlos; Halpern, David

    1994-01-01

    Simulations with the UCLA atmospheric general circulation model (AGCM) using two different global sea surface temperature (SST) datasets for January 1979 are compared. One of these datasets is based on Comprehensive Ocean-Atmosphere Data Set (COADS) SSTs at locations where there are ship reports, and on climatology elsewhere; the other is derived from measurements by instruments onboard NOAA satellites. In the former dataset (COADS SST), data are concentrated along shipping routes in the Northern Hemisphere; in the latter dataset (High Resolution Infrared Sounder, HIRS SST), data cover the global domain. Ensembles of five 30-day mean fields are obtained from integrations performed in the perpetual-January mode. The results are presented as anomalies, that is, departures of each ensemble mean from that produced in a control simulation with climatological SSTs. Large differences are found between the anomalies obtained using COADS and HIRS SSTs, even in the Northern Hemisphere where the datasets are most similar to each other. The internal variability of the circulation in the control simulation and the simulated atmospheric response to anomalous forcings appear to be linked in that the pattern of geopotential height anomalies obtained using COADS SSTs resembles the first empirical orthogonal function (EOF 1) in the control simulation. The corresponding pattern obtained using HIRS SSTs is substantially different and somewhat resembles EOF 2 in the sector from central North America to central Asia. To gain insight into the reasons for these results, three additional simulations are carried out with SST anomalies confined to regions where COADS SSTs are substantially warmer than HIRS SSTs. The regions correspond to warm pools in the northwest and northeast Pacific, and the northwest Atlantic. These warm pools tend to produce positive geopotential height anomalies in the northeastern part of the corresponding oceans. Both warm pools in the Pacific produce large
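
    A minimal sketch of the EOF computation underlying this kind of comparison, using a plain SVD on synthetic anomaly fields (latitude area-weighting, which a real analysis would apply, is omitted for brevity):

        import numpy as np

        rng = np.random.default_rng(4)
        ntime, nlat, nlon = 150, 30, 60
        # Stand-in for 30-day-mean geopotential height fields (time, lat, lon)
        z = rng.standard_normal((ntime, nlat, nlon))

        # Remove the time mean to form anomalies, then flatten space
        anom = z - z.mean(axis=0)
        flat = anom.reshape(ntime, -1)

        # EOFs are the right singular vectors of the anomaly matrix
        u, s, vt = np.linalg.svd(flat, full_matrices=False)
        eof1 = vt[0].reshape(nlat, nlon)     # leading spatial pattern (EOF 1)
        pc1 = u[:, 0] * s[0]                 # its principal-component time series
        explained = s ** 2 / np.sum(s ** 2)
        print(f"EOF 1 explains {explained[0]:.1%} of the variance")

    Projecting the simulated anomaly patterns onto EOF 1 and EOF 2 of the control run is one way to quantify the resemblances described in the abstract.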

  16. Production of a national 1:1,000,000-scale hydrography dataset for the United States: feature selection, simplification, and refinement

    USGS Publications Warehouse

    Gary, Robin H.; Wilson, Zachary D.; Archuleta, Christy-Ann M.; Thompson, Florence E.; Vrabel, Joseph

    2009-01-01

    During 2006-09, the U.S. Geological Survey, in cooperation with the National Atlas of the United States, produced a 1:1,000,000-scale (1:1M) hydrography dataset comprising streams and waterbodies for the entire United States, including Puerto Rico and the U.S. Virgin Islands, for inclusion in the recompiled National Atlas. This report documents the methods used to select, simplify, and refine features in the 1:100,000-scale (1:100K) (1:63,360-scale in Alaska) National Hydrography Dataset to create the national 1:1M hydrography dataset. Custom tools and semi-automated processes were created to facilitate generalization of the 1:100K National Hydrography Dataset (1:63,360-scale in Alaska) to 1:1M on the basis of existing small-scale hydrography datasets. The first step in creating the new 1:1M dataset was to address feature selection and optimal data density in the streams network. Several existing methods were evaluated. The production method that was established for selecting features for inclusion in the 1:1M dataset uses a combination of the existing attributes and network in the National Hydrography Dataset and several of the concepts from the methods evaluated. The process for creating the 1:1M waterbodies dataset required a similar approach to that used for the streams dataset. Geometric simplification of features was the next step. Stream reaches and waterbodies indicated in the feature selection process were exported as new feature classes and then simplified using a geographic information system tool. The final step was refinement of the 1:1M streams and waterbodies. Refinement was done through the use of additional geographic information system tools.
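
    The report's custom generalization tools are not reproduced here; the sketch below shows a standard Douglas-Peucker simplification, via shapely, as an example of the geometric simplification step, with an invented stream reach and tolerance.

        from shapely.geometry import LineString

        # A stand-in stream reach digitized at 1:100K vertex density
        reach = LineString([(0, 0), (1.0, 0.1), (2.0, -0.05), (3.0, 0.2),
                            (4.0, 0.0), (5.0, 0.15), (6.0, 0.0)])

        # Douglas-Peucker simplification; the tolerance would be chosen to suit
        # the 1:1M target scale, and preserve_topology avoids self-intersections
        simplified = reach.simplify(tolerance=0.15, preserve_topology=True)

        print(len(reach.coords), "->", len(simplified.coords), "vertices")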

  17. Wear Independent Similarity.

    PubMed

    Steele, Adam; Davis, Alexander; Kim, Joohyung; Loth, Eric; Bayer, Ilker S

    2015-06-17

    This study presents a new factor that can be used to design materials where desired surface properties must be retained under in-system wear and abrasion. To demonstrate this factor, a synthetic nonwetting coating is presented that retains chemical and geometric performance as material is removed under multiple wear conditions: a coarse vitrified abradant (similar to sanding), a smooth abradant (similar to rubbing), and a mild abradant (a blend of sanding and rubbing). With this approach, such a nonwetting material displays unprecedented mechanical durability while maintaining desired performance under a range of demanding conditions. This performance, herein termed wear independent similarity performance (WISP), is critical because multiple mechanisms and/or modes of wear can be expected to occur in many typical applications, e.g., combinations of abrasion, rubbing, contact fatigue, weathering, particle impact, etc. Furthermore, these multiple wear mechanisms tend to quickly degrade a novel surface's unique performance, and thus many promising surfaces and materials never scale out of research laboratories. Dynamic goniometry and scanning electron microscopy results presented herein provide insight into these underlying mechanisms, which may also be applied to other coatings and materials. PMID:26018058

  18. CODE: A Data Complexity Framework for Imbalanced Datasets

    NASA Astrophysics Data System (ADS)

    Weng, Cheng G.; Poon, Josiah

    Imbalanced datasets occur in many domains, such as fraud detection, cancer detection and the web; in such domains, the class of interest often concerns rare events. It is thus important to achieve good performance on these classes while maintaining reasonable overall accuracy. Although imbalanced datasets can be difficult to learn from, previous research has suggested that the skewed class distribution is not necessarily what poses problems for learning. Therefore, when learning of the rare class becomes problematic, it does not follow that the skewed class distribution is to blame; the imbalanced distribution may simply be a byproduct of other hidden intrinsic difficulties.

  19. Comprehensive comparison of large-scale tissue expression datasets.

    PubMed

    Santos, Alberto; Tsafou, Kalliopi; Stolte, Christian; Pletscher-Frankild, Sune; O'Donoghue, Seán I; Jensen, Lars Juhl

    2015-01-01

    For tissues to carry out their functions, they rely on the right proteins to be present. Several high-throughput technologies have been used to map out which proteins are expressed in which tissues; however, the data have not previously been systematically compared and integrated. We present a comprehensive evaluation of tissue expression data from a variety of experimental techniques and show that these agree surprisingly well with each other and with results from literature curation and text mining. We further found that most datasets support the assumed but not demonstrated distinction between tissue-specific and ubiquitous expression. By developing comparable confidence scores for all types of evidence, we show that it is possible to improve both quality and coverage by combining the datasets. To facilitate use and visualization of our work, we have developed the TISSUES resource (http://tissues.jensenlab.org), which makes all the scored and integrated data available through a single user-friendly web interface. PMID:26157623

  20. Validating silicon polytrodes with paired juxtacellular recordings: method and dataset

    PubMed Central

    Lopes, Gonçalo; Frazão, João; Nogueira, Joana; Lacerda, Pedro; Baião, Pedro; Aarts, Arno; Andrei, Alexandru; Musa, Silke; Fortunato, Elvira; Barquinha, Pedro; Kampff, Adam R.

    2016-01-01

    Cross-validating new methods for recording neural activity is necessary to accurately interpret and compare the signals they measure. Here we describe a procedure for precisely aligning two probes for in vivo “paired-recordings” such that the spiking activity of a single neuron is monitored with both a dense extracellular silicon polytrode and a juxtacellular micropipette. Our new method allows for efficient, reliable, and automated guidance of both probes to the same neural structure with micrometer resolution. We also describe a new dataset of paired-recordings, which is available online. We propose that our novel targeting system, and ever expanding cross-validation dataset, will be vital to the development of new algorithms for automatically detecting/sorting single-units, characterizing new electrode materials/designs, and resolving nagging questions regarding the origin and nature of extracellular neural signals. PMID:27306671

  1. A Computational Approach to Qualitative Analysis in Large Textual Datasets

    PubMed Central

    Evans, Michael S.

    2014-01-01

    In this paper I introduce computational techniques to extend qualitative analysis into the study of large textual datasets. I demonstrate these techniques by using probabilistic topic modeling to analyze a broad sample of 14,952 documents published in major American newspapers from 1980 through 2012. I show how computational data mining techniques can identify and evaluate the significance of qualitatively distinct subjects of discussion across a wide range of public discourse. I also show how examining large textual datasets with computational methods can overcome methodological limitations of conventional qualitative methods, such as how to measure the impact of particular cases on broader discourse, how to validate substantive inferences from small samples of textual data, and how to determine if identified cases are part of a consistent temporal pattern. PMID:24498398
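
    A minimal sketch of probabilistic topic modeling in the spirit of the paper, using scikit-learn's LatentDirichletAllocation on a placeholder four-document corpus (the study's 14,952-article corpus and its modeling details are not reproduced here):

        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.decomposition import LatentDirichletAllocation

        docs = [
            "genetic engineering crops policy debate",
            "stem cell research funding ethics",
            "newspaper coverage of synthetic biology",
            "public opinion on gene therapy trials",
        ]  # placeholder corpus standing in for the newspaper sample

        # Bag-of-words counts, then a probabilistic topic model
        vec = CountVectorizer(stop_words="english")
        counts = vec.fit_transform(docs)
        lda = LatentDirichletAllocation(n_components=2, random_state=0)
        doc_topics = lda.fit_transform(counts)   # per-document topic proportions

        # Top words per topic characterize qualitatively distinct subjects
        terms = vec.get_feature_names_out()
        for k, comp in enumerate(lda.components_):
            top = [terms[i] for i in comp.argsort()[-4:][::-1]]
            print(f"topic {k}:", ", ".join(top))

    The per-document topic proportions are what allow tracing a subject's prevalence over time, which is how temporal patterns like those mentioned above can be checked.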

  2. Validating silicon polytrodes with paired juxtacellular recordings: method and dataset.

    PubMed

    Neto, Joana P; Lopes, Gonçalo; Frazão, João; Nogueira, Joana; Lacerda, Pedro; Baião, Pedro; Aarts, Arno; Andrei, Alexandru; Musa, Silke; Fortunato, Elvira; Barquinha, Pedro; Kampff, Adam R

    2016-08-01

    Cross-validating new methods for recording neural activity is necessary to accurately interpret and compare the signals they measure. Here we describe a procedure for precisely aligning two probes for in vivo "paired-recordings" such that the spiking activity of a single neuron is monitored with both a dense extracellular silicon polytrode and a juxtacellular micropipette. Our new method allows for efficient, reliable, and automated guidance of both probes to the same neural structure with micrometer resolution. We also describe a new dataset of paired-recordings, which is available online. We propose that our novel targeting system, and ever expanding cross-validation dataset, will be vital to the development of new algorithms for automatically detecting/sorting single-units, characterizing new electrode materials/designs, and resolving nagging questions regarding the origin and nature of extracellular neural signals. PMID:27306671

  3. Serial femtosecond crystallography datasets from G protein-coupled receptors.

    PubMed

    White, Thomas A; Barty, Anton; Liu, Wei; Ishchenko, Andrii; Zhang, Haitao; Gati, Cornelius; Zatsepin, Nadia A; Basu, Shibom; Oberthür, Dominik; Metz, Markus; Beyerlein, Kenneth R; Yoon, Chun Hong; Yefanov, Oleksandr M; James, Daniel; Wang, Dingjie; Messerschmidt, Marc; Koglin, Jason E; Boutet, Sébastien; Weierstall, Uwe; Cherezov, Vadim

    2016-01-01

    We describe the deposition of four datasets consisting of X-ray diffraction images acquired using serial femtosecond crystallography experiments on microcrystals of human G protein-coupled receptors, grown and delivered in lipidic cubic phase, at the Linac Coherent Light Source. The receptors are: the human serotonin receptor 2B in complex with an agonist ergotamine, the human δ-opioid receptor in complex with a bi-functional peptide ligand DIPP-NH2, the human smoothened receptor in complex with an antagonist cyclopamine, and finally the human angiotensin II type 1 receptor in complex with the selective antagonist ZD7155. All four datasets have been deposited, with minimal processing, in an HDF5-based file format, which can be used directly for crystallographic processing with CrystFEL or other software. We have provided processing scripts and supporting files for recent versions of CrystFEL, which can be used to validate the data. PMID:27479354
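
    A sketch of how one might open such a deposition, assuming the h5py package; the file and dataset names below are hypothetical, and the actual layout should be taken from the deposition's documentation and the provided CrystFEL geometry files.

        import h5py
        import numpy as np

        # Hypothetical file and internal path, for illustration only
        with h5py.File("serotonin_2B_run0001.h5", "r") as f:
            f.visit(print)                  # list the groups/datasets actually present
            frames = f["/data/data"]        # assumed path to the diffraction image stack
            first = np.asarray(frames[0])   # one detector frame as a numpy array
            print(first.shape, first.dtype, first.mean())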

  4. Serial femtosecond crystallography datasets from G protein-coupled receptors

    PubMed Central

    White, Thomas A.; Barty, Anton; Liu, Wei; Ishchenko, Andrii; Zhang, Haitao; Gati, Cornelius; Zatsepin, Nadia A.; Basu, Shibom; Oberthür, Dominik; Metz, Markus; Beyerlein, Kenneth R.; Yoon, Chun Hong; Yefanov, Oleksandr M.; James, Daniel; Wang, Dingjie; Messerschmidt, Marc; Koglin, Jason E.; Boutet, Sébastien; Weierstall, Uwe; Cherezov, Vadim

    2016-01-01

    We describe the deposition of four datasets consisting of X-ray diffraction images acquired using serial femtosecond crystallography experiments on microcrystals of human G protein-coupled receptors, grown and delivered in lipidic cubic phase, at the Linac Coherent Light Source. The receptors are: the human serotonin receptor 2B in complex with an agonist ergotamine, the human δ-opioid receptor in complex with a bi-functional peptide ligand DIPP-NH2, the human smoothened receptor in complex with an antagonist cyclopamine, and finally the human angiotensin II type 1 receptor in complex with the selective antagonist ZD7155. All four datasets have been deposited, with minimal processing, in an HDF5-based file format, which can be used directly for crystallographic processing with CrystFEL or other software. We have provided processing scripts and supporting files for recent versions of CrystFEL, which can be used to validate the data. PMID:27479354

  5. A multimodal MRI dataset of professional chess players

    PubMed Central

    Li, Kaiming; Jiang, Jing; Qiu, Lihua; Yang, Xun; Huang, Xiaoqi; Lui, Su; Gong, Qiyong

    2015-01-01

    Chess is a good model for studying high-level human brain functions such as spatial cognition, memory, planning, learning and problem solving. Recent studies have demonstrated that non-invasive MRI techniques are valuable for investigating the neural mechanisms underlying chess play. For professional chess players (e.g., chess grand masters and masters, or GM/Ms), the structural and functional alterations due to long-term professional practice, and how these alterations relate to behavior, remain largely unexplored. Here, we report a multimodal MRI dataset from 29 professional Chinese chess players (most of whom are GM/Ms) and 29 age-matched novices. We hope that this dataset will provide researchers with new materials to further explore high-level human brain functions. PMID:26346238

  6. geoknife: Reproducible web-processing of large gridded datasets

    USGS Publications Warehouse

    Read, Jordan S.; Walker, Jordan I.; Appling, Alison P.; Blodgett, David L.; Read, Emily Kara; Winslow, Luke A.

    2016-01-01

    Geoprocessing of large gridded data according to overlap with irregular landscape features is common to many large-scale ecological analyses. The geoknife R package was created to facilitate reproducible analyses of gridded datasets found on the U.S. Geological Survey Geo Data Portal web application or elsewhere, using a web-enabled workflow that eliminates the need to download and store large datasets that are reliably hosted on the Internet. The package provides access to several data subset and summarization algorithms that are available on remote web processing servers. Outputs from geoknife include spatial and temporal data subsets, spatially-averaged time series values filtered by user-specified areas of interest, and categorical coverage fractions for various land-use types.

  7. Comprehensive comparison of large-scale tissue expression datasets

    PubMed Central

    Stolte, Christian; Pletscher-Frankild, Sune; O’Donoghue, Seán I.

    2015-01-01

    For tissues to carry out their functions, they rely on the right proteins to be present. Several high-throughput technologies have been used to map out which proteins are expressed in which tissues; however, the data have not previously been systematically compared and integrated. We present a comprehensive evaluation of tissue expression data from a variety of experimental techniques and show that these agree surprisingly well with each other and with results from literature curation and text mining. We further found that most datasets support the assumed but not demonstrated distinction between tissue-specific and ubiquitous expression. By developing comparable confidence scores for all types of evidence, we show that it is possible to improve both quality and coverage by combining the datasets. To facilitate use and visualization of our work, we have developed the TISSUES resource (http://tissues.jensenlab.org), which makes all the scored and integrated data available through a single user-friendly web interface. PMID:26157623

  8. Wind Integration Datasets from the National Renewable Energy Laboratory (NREL)

    DOE Data Explorer

    The Wind Integration Datasets provide time-series wind data for 2004, 2005, and 2006. They are intended to be used by energy professionals such as transmission planners, utility planners, project developers, and university researchers, helping them to perform comparisons of sites and estimate power production from hypothetical wind plants. NREL cautions that the information from modeled data may not match wind resource information shown on NREL's state wind maps, as the maps and the datasets were created for different purposes and using different methodologies.

  9. Soil chemistry in lithologically diverse datasets: the quartz dilution effect

    USGS Publications Warehouse

    Bern, Carleton R.

    2009-01-01

    National- and continental-scale soil geochemical datasets are likely to move our understanding of broad soil geochemistry patterns forward significantly. Patterns of chemistry and mineralogy delineated from these datasets are strongly influenced by the composition of the soil parent material, which itself is largely a function of lithology and particle size sorting. Such controls present a challenge by obscuring subtler patterns arising from subsequent pedogenic processes. Here the effect of quartz concentration is examined in moist-climate soils from a pilot dataset of the North American Soil Geochemical Landscapes Project. Due to variable and high quartz contents (6.2–81.7 wt.%), and its residual and inert nature in soil, quartz is demonstrated to influence broad patterns in soil chemistry. A dilution effect is observed whereby concentrations of various elements are significantly and strongly negatively correlated with quartz. Quartz content drives artificial positive correlations between concentrations of some elements and obscures negative correlations between others. Unadjusted soil data show the highly mobile base cations Ca, Mg, and Na to be often strongly positively correlated with intermediately mobile Al or Fe, and generally uncorrelated with the relatively immobile high-field-strength elements (HFS) Ti and Nb. Both patterns are contrary to broad expectations for soils being weathered and leached. After transforming bulk soil chemistry to a quartz-free basis, the base cations are generally uncorrelated with Al and Fe, and negative correlations generally emerge with the HFS elements. Quartz-free element data may be a useful tool for elucidating patterns of weathering or parent-material chemistry in large soil datasets.
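
    The dilution effect is easy to reproduce numerically. In the sketch below (synthetic data with invented concentration ranges), two elements that are mutually independent on a quartz-free basis acquire a strong artificial correlation once a variable quartz fraction dilutes both; recasting on a quartz-free basis removes it.

        import numpy as np
        import pandas as pd

        rng = np.random.default_rng(5)
        n = 100
        quartz = rng.uniform(6.2, 81.7, n)   # wt.% quartz, spanning the reported range
        # Element concentrations diluted by the (inert) quartz fraction
        ca = rng.uniform(0.5, 3.0, n) * (100 - quartz) / 100
        al = rng.uniform(5.0, 12.0, n) * (100 - quartz) / 100
        soils = pd.DataFrame({"quartz": quartz, "Ca": ca, "Al": al})

        # Quartz dilution imposes a common (100 - quartz) factor, creating an
        # artificial positive correlation between otherwise unrelated elements
        print("bulk r(Ca, Al):", soils["Ca"].corr(soils["Al"]).round(2))

        # Recast concentrations on a quartz-free basis to remove the dilution
        qf = soils[["Ca", "Al"]].div(100 - soils["quartz"], axis=0) * 100
        print("quartz-free r(Ca, Al):", qf["Ca"].corr(qf["Al"]).round(2))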

  10. BMDExpress Data Viewer - a visualization tool to analyze BMDExpress datasets.

    PubMed

    Kuo, Byron; Francina Webster, A; Thomas, Russell S; Yauk, Carole L

    2016-08-01

    Regulatory agencies increasingly apply benchmark dose (BMD) modeling to determine points of departure for risk assessment. BMDExpress applies BMD modeling to transcriptomic datasets to identify transcriptional BMDs. However, graphing and analytical capabilities within BMDExpress are limited, and the analysis of output files is challenging. We developed a web-based application, BMDExpress Data Viewer (http://apps.sciome.com:8082/BMDX_Viewer/), for visualizing and graphing BMDExpress output files. The application consists of "Summary Visualization" and "Dataset Exploratory" tools. Through analysis of transcriptomic datasets of the toxicants furan and 4,4'-methylenebis(N,N-dimethyl)benzenamine, we demonstrate that the "Summary Visualization Tools" can be used to examine distributions of gene and pathway BMD values, and to derive a potential point of departure value based on summary statistics. By applying filters on enrichment P-values and minimum number of significant genes, the "Functional Enrichment Analysis" tool enables the user to select biological processes or pathways that are selectively perturbed by chemical exposure and identify the related BMD. The "Multiple Dataset Comparison" tool enables comparison of gene and pathway BMD values across multiple experiments (e.g., across timepoints or tissues). The "BMDL-BMD Range Plotter" tool facilitates the observation of BMD trends across biological processes or pathways. Through our case studies, we demonstrate that BMDExpress Data Viewer is a useful tool to visualize, explore and analyze BMDExpress output files. Visualizing the data in this manner enables rapid assessment of data quality, model fit, doses of peak activity, most sensitive pathway perturbations and other metrics that will be useful in applying toxicogenomics in risk assessment. © 2015 Her Majesty the Queen in Right of Canada. Journal of Applied Toxicology published by John Wiley & Sons, Ltd. PMID:26671443

  11. GLEAM version 3: Global Land Evaporation Datasets and Model

    NASA Astrophysics Data System (ADS)

    Martens, B.; Miralles, D. G.; Lievens, H.; van der Schalie, R.; de Jeu, R.; Fernandez-Prieto, D.; Verhoest, N.

    2015-12-01

    Terrestrial evaporation links energy, water and carbon cycles over land and is therefore a key variable of the climate system. However, the global-scale magnitude and variability of the flux, and the sensitivity of the underlying physical process to changes in environmental factors, are still poorly understood due to limitations in in situ measurements. As a result, several methods have emerged to estimate global patterns of land evaporation from satellite observations. However, these algorithms generally differ in their approach to modelling evaporation, resulting in large differences in their estimates. One of these methods is GLEAM, the Global Land Evaporation: the Amsterdam Methodology. GLEAM estimates terrestrial evaporation based on daily satellite observations of meteorological variables, vegetation characteristics and soil moisture. Since the publication of the first version of the algorithm (2011), the model has been widely applied to analyse trends in the water cycle and land-atmosphere feedbacks during extreme hydrometeorological events. A third version of the GLEAM global datasets is foreseen by the end of 2015. Given the relevance of a continuous and reliable record of global-scale evaporation estimates for climate and hydrological research, an online data portal to make these data publicly available is also foreseen. In this new release of the GLEAM datasets, different components of the model have been updated, with the most significant change being the revision of the data assimilation algorithm. In this presentation, we will highlight the most important changes to the methodology and present three new GLEAM datasets and their validation against in situ observations and an alternative dataset of terrestrial evaporation (ERA-Land). Results of the validation exercise indicate that the magnitude and the spatiotemporal variability of the modelled evaporation agree reasonably well with the estimates of ERA-Land and the in situ

  12. Fast 4D segmentation of large datasets using graph cuts

    NASA Astrophysics Data System (ADS)

    Lombaert, Herve; Sun, Yiyong; Cheriet, Farida

    2011-03-01

    In this paper, we propose to use 4D graph cuts for the segmentation of large spatio-temporal (4D) datasets. Indeed, as 4D datasets grow in popularity in many clinical areas, so will the demand for efficient general segmentation algorithms. The graph cuts method [1] has become a leading method for complex 2D and 3D image segmentation in many applications. Despite a few attempts [2-5] in 4D, the use of graph cuts on typical medical volumes quickly exceeds today's computer capacities. Among all existing graph-cuts-based methods [6-10], the multilevel banded graph cuts [9] is the fastest and uses the least amount of memory. Nevertheless, this method has its limitations. Memory becomes an issue when using large 4D volume sequences, and small structures become hardly recoverable when using narrow bands. We thus improve the boundary refinement efficiency by using a 4D competitive region growing. First, we construct a coarse graph at a low resolution with strong temporal links to prevent the shrink bias inherent to the graph cuts method. Second, we use a competitive region growing with a priority queue to capture all fine details. Leaks are prevented by constraining the competitive region growing within a banded region and by adding a viscosity term. This strategy yields results comparable to the multilevel banded graph cuts but is faster and allows application to large 4D datasets. We applied our method to both cardiac 4D MRI and 4D CT datasets with promising results.
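
    A minimal 2D sketch of the competitive region growing stage, assuming a simple intensity-difference cost: the optional band mask plays the role of the banded constraint, while the paper's 4D neighbourhood and viscosity term are omitted for brevity.

        import heapq
        import numpy as np

        def competitive_region_growing(image, seeds, band=None):
            # Grow labeled seeds through `image`; the front with the lowest
            # accumulated intensity difference claims each pixel first.
            # `band` optionally restricts growth to a masked region.
            labels = np.zeros(image.shape, dtype=int)
            heap = []
            for label, idx in seeds:
                labels[idx] = label
                heapq.heappush(heap, (0.0, idx, label))
            while heap:
                cost, idx, label = heapq.heappop(heap)
                for d in ((1, 0), (-1, 0), (0, 1), (0, -1)):   # 4-neighbourhood
                    nb = (idx[0] + d[0], idx[1] + d[1])
                    if not (0 <= nb[0] < image.shape[0] and 0 <= nb[1] < image.shape[1]):
                        continue
                    if labels[nb] or (band is not None and not band[nb]):
                        continue
                    step = abs(float(image[nb]) - float(image[idx]))
                    labels[nb] = label                 # claim pixel for this front
                    heapq.heappush(heap, (cost + step, nb, label))
            return labels

        img = np.random.default_rng(6).random((64, 64))
        out = competitive_region_growing(img, seeds=[(1, (5, 5)), (2, (60, 60))])
        print(np.bincount(out.ravel()))   # pixels claimed by each label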

  13. Development of a video tampering dataset for forensic investigation.

    PubMed

    Ismael Al-Sanjary, Omar; Ahmed, Ahmed Abdullah; Sulong, Ghazali

    2016-09-01

    Forgery is an act of modifying a document, product, image or video, among other media. Video tampering detection research requires an inclusive database of modified videos. This paper discusses a comprehensive proposal to create a dataset composed of modified videos for forensic investigation, in order to standardize existing techniques for detecting video tampering. The primary purpose of developing and designing this new video library is for use in video forensics, which can be consciously associated with reliable verification using dynamic and static camera recognition. To the best of the authors' knowledge, no similar library exists among the research community. Videos were sourced from YouTube and from extensive exploration of social networking sites, observing posted videos and their feedback ratings. The video tampering dataset (VTD) comprises a total of 33 videos, divided among three categories of video tampering: (1) copy-move, (2) splicing, and (3) frame swapping. Compared to existing datasets, this is a higher number of tampered videos, with longer durations. The duration of every video is 16 s, with a 1280×720 resolution and a frame rate of 30 frames per second. Moreover, all videos possess the same formatting quality (720p HD, .avi). Both temporal and spatial video features were considered carefully during selection of the videos, and complete information about the doctored regions is available for every modified video in the VTD dataset. This database has been made publicly available for research on splicing, frame swapping, and copy-move tampering, providing ground truth for various video tampering detection issues. The database has been utilised by many international researchers and research groups. PMID:27574113

  14. Mining spatiotemporal co-occurrence patterns in solar datasets

    NASA Astrophysics Data System (ADS)

    Aydin, B.; Kempton, D.; Akkineni, V.; Angryk, R.; Pillai, K. G.

    2015-11-01

    We address the problem of mining spatiotemporal co-occurrence patterns (STCOPs) in solar datasets with extended polygon-based geometric representations. Specially designed spatiotemporal indexing techniques are used in the mining of STCOPs. These include versions of two well-known spatiotemporal trajectory indexing techniques: the scalable and efficient trajectory index and Chebyshev polynomial indexing. We present a framework, STCOP-MINER, which implements a filter-and-refine STCOP mining algorithm and uses the aforementioned indexing techniques to perform the analysis efficiently.

  15. Improving the performance of predictive process modeling for large datasets

    PubMed Central

    Finley, Andrew O.; Sang, Huiyan; Banerjee, Sudipto; Gelfand, Alan E.

    2009-01-01

    Advances in Geographical Information Systems (GIS) and Global Positioning Systems (GPS) enable accurate geocoding of locations where scientific data are collected. This has encouraged collection of large spatial datasets in many fields and has generated considerable interest in statistical modeling for location-referenced spatial data. The setting where the number of locations yielding observations is too large to fit the desired hierarchical spatial random effects models using Markov chain Monte Carlo methods is considered. This problem is exacerbated in spatial-temporal and multivariate settings where many observations occur at each location. The recently proposed predictive process, motivated by kriging ideas, aims to maintain the richness of desired hierarchical spatial modeling specifications in the presence of large datasets. A shortcoming of the original formulation of the predictive process is that it induces a positive bias in the non-spatial error term of the models. A modified predictive process is proposed to address this problem. The predictive process approach is knot-based leading to questions regarding knot design. An algorithm is designed to achieve approximately optimal spatial placement of knots. Detailed illustrations of the modified predictive process using multivariate spatial regression with both a simulated and a real dataset are offered. PMID:20016667
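
    The authors' knot-placement algorithm is not reproduced here; a common space-filling heuristic for the same task, placing knots at k-means centers of the observed locations, is sketched below under that assumption.

        import numpy as np
        from sklearn.cluster import KMeans

        rng = np.random.default_rng(7)
        locations = rng.uniform(0, 100, size=(5000, 2))   # observed site coordinates

        # Heuristic knot design: m k-means centers spread the knots according
        # to the sampling density of the observations
        m = 64
        knots = KMeans(n_clusters=m, n_init=10, random_state=0).fit(locations).cluster_centers_
        print(knots.shape)   # (64, 2) knot coordinates for the predictive process

    The predictive process then replaces the full spatial random effect by its kriging projection onto these m knots, which is what makes the hierarchical model tractable for large n.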

  16. Multiresolution persistent homology for excessively large biomolecular datasets.

    PubMed

    Xia, Kelin; Zhao, Zhixiong; Wei, Guo-Wei

    2015-10-01

    Although persistent homology has emerged as a promising tool for the topological simplification of complex data, it is computationally intractable for large datasets. We introduce multiresolution persistent homology to handle excessively large datasets. We match the resolution with the scale of interest so as to represent large-scale datasets with appropriate resolution. We utilize the flexibility-rigidity index to assess the topological connectivity of the data set and define a rigidity density for the filtration analysis. By appropriately tuning the resolution of the rigidity density, we are able to focus the topological lens on the scale of interest. The proposed multiresolution topological analysis is validated by a hexagonal fractal image which has three distinct scales. We further demonstrate the proposed method for extracting topological fingerprints from DNA molecules. In particular, the topological persistence of a virus capsid with 273 780 atoms is successfully analyzed, which would otherwise be inaccessible to the normal point cloud method and unreliable by using coarse-grained multiscale persistent homology. The proposed method has also been successfully applied to protein domain classification, which is, to our knowledge, the first time that persistent homology has been used for practical protein domain analysis. The proposed multiresolution topological method has potential applications in arbitrary data sets, such as social networks, biological networks, and graphs. PMID:26450288
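
    A toy sketch of the resolution-tuning idea: a kernel density, standing in for the flexibility-rigidity-based rigidity density (the kernel form and weights here are assumptions), is evaluated at several scale parameters; the resulting fields would then feed a filtration for persistent homology.

        import numpy as np

        def rigidity_density(points, grid, eta):
            # Sum of Gaussian kernels of width `eta` evaluated on `grid`;
            # larger eta lowers the resolution, merging fine-scale features
            d2 = ((grid[:, None, :] - points[None, :, :]) ** 2).sum(-1)
            return np.exp(-d2 / eta ** 2).sum(axis=1)

        rng = np.random.default_rng(8)
        atoms = rng.uniform(0, 10, size=(200, 2))          # toy point cloud
        xs, ys = np.meshgrid(np.linspace(0, 10, 50), np.linspace(0, 10, 50))
        grid = np.column_stack([xs.ravel(), ys.ravel()])

        for eta in (0.2, 1.0, 4.0):                        # fine -> coarse resolution
            mu = rigidity_density(atoms, grid, eta)
            print(f"eta={eta}: density range {mu.min():.2f} to {mu.max():.2f}")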

  17. Securely Measuring the Overlap between Private Datasets with Cryptosets

    PubMed Central

    Swamidass, S. Joshua; Matlock, Matthew; Rozenblit, Leon

    2015-01-01

    Many scientific questions are best approached by sharing data—collected by different groups or across large collaborative networks—into a combined analysis. Unfortunately, some of the most interesting and powerful datasets—like health records, genetic data, and drug discovery data—cannot be freely shared because they contain sensitive information. In many situations, knowing if private datasets overlap determines if it is worthwhile to navigate the institutional, ethical, and legal barriers that govern access to sensitive, private data. We report the first method of publicly measuring the overlap between private datasets that is secure under a malicious model without relying on private protocols or message passing. This method uses a publicly shareable summary of a dataset’s contents, its cryptoset, to estimate its overlap with other datasets. Cryptosets approach “information-theoretic” security, the strongest type of security possible in cryptography, which is not even crackable with infinite computing power. We empirically and theoretically assess both the accuracy of these estimates and the security of the approach, demonstrating that cryptosets are informative, with a stable accuracy, and secure. PMID:25714898
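
    A toy illustration of the cryptoset idea (not the published protocol, whose exact estimator and security analysis are given in the paper): items are hashed into a fixed-length public count vector, and shared items induce correlated counts from which the overlap can be estimated without revealing the items themselves.

        import hashlib
        import numpy as np

        L = 512  # summary length; longer summaries trade privacy for accuracy

        def cryptoset(items):
            # Public summary: counts of items hashed into L buckets
            counts = np.zeros(L, dtype=int)
            for it in items:
                h = int(hashlib.sha256(it.encode()).hexdigest(), 16)
                counts[h % L] += 1
            return counts

        shared = [f"patient-{i}" for i in range(300)]        # records in both sets
        a = cryptoset(shared + [f"a-{i}" for i in range(700)])
        b = cryptoset(shared + [f"b-{i}" for i in range(500)])

        # Shared items add correlated counts; the centered dot product of the
        # two summaries approximates the overlap size (toy estimator)
        est = np.dot(a - a.mean(), b - b.mean())
        print(f"estimated overlap: {est:.0f} (true 300)")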

  18. Multiresolution persistent homology for excessively large biomolecular datasets

    NASA Astrophysics Data System (ADS)

    Xia, Kelin; Zhao, Zhixiong; Wei, Guo-Wei

    2015-10-01

    Although persistent homology has emerged as a promising tool for the topological simplification of complex data, it is computationally intractable for large datasets. We introduce multiresolution persistent homology to handle excessively large datasets. We match the resolution with the scale of interest so as to represent large-scale datasets with appropriate resolution. We utilize the flexibility-rigidity index to assess the topological connectivity of the data set and define a rigidity density for the filtration analysis. By appropriately tuning the resolution of the rigidity density, we are able to focus the topological lens on the scale of interest. The proposed multiresolution topological analysis is validated by a hexagonal fractal image which has three distinct scales. We further demonstrate the proposed method for extracting topological fingerprints from DNA molecules. In particular, the topological persistence of a virus capsid with 273 780 atoms is successfully analyzed, which would otherwise be inaccessible to the normal point cloud method and unreliable by using coarse-grained multiscale persistent homology. The proposed method has also been successfully applied to protein domain classification, which is, to our knowledge, the first time that persistent homology has been used for practical protein domain analysis. The proposed multiresolution topological method has potential applications in arbitrary data sets, such as social networks, biological networks, and graphs.

  19. Nanomaterial datasets to advance tomography in scanning transmission electron microscopy

    PubMed Central

    Levin, Barnaby D.A.; Padgett, Elliot; Chen, Chien-Chun; Scott, M.C.; Xu, Rui; Theis, Wolfgang; Jiang, Yi; Yang, Yongsoo; Ophus, Colin; Zhang, Haitao; Ha, Don-Hyung; Wang, Deli; Yu, Yingchao; Abruña, Hector D.; Robinson, Richard D.; Ercius, Peter; Kourkoutis, Lena F.; Miao, Jianwei; Muller, David A.; Hovden, Robert

    2016-01-01

    Electron tomography in materials science has flourished with the demand to characterize nanoscale materials in three dimensions (3D). Access to experimental data is vital for developing and validating reconstruction methods that improve resolution and reduce radiation dose requirements. This work presents five high-quality scanning transmission electron microscope (STEM) tomography datasets in order to address the critical need for open access data in this field. The datasets represent the current limits of experimental technique, are of high quality, and contain materials with structural complexity. Included are tomographic series of a hyperbranched Co2P nanocrystal, platinum nanoparticles on a carbon nanofibre imaged over the complete 180° tilt range, a platinum nanoparticle and a tungsten needle both imaged at atomic resolution by equal slope tomography, and a through-focal tilt series of PtCu nanoparticles. A volumetric reconstruction from every dataset is provided for comparison and development of post-processing and visualization techniques. Researchers interested in creating novel data processing and reconstruction algorithms will now have access to state of the art experimental test data. PMID:27272459

  20. Multiresolution persistent homology for excessively large biomolecular datasets

    SciTech Connect

    Xia, Kelin; Zhao, Zhixiong; Wei, Guo-Wei

    2015-10-07

    Although persistent homology has emerged as a promising tool for the topological simplification of complex data, it is computationally intractable for large datasets. We introduce multiresolution persistent homology to handle excessively large datasets. We match the resolution with the scale of interest so as to represent large-scale datasets with appropriate resolution. We utilize the flexibility-rigidity index to assess the topological connectivity of the data set and define a rigidity density for the filtration analysis. By appropriately tuning the resolution of the rigidity density, we are able to focus the topological lens on the scale of interest. The proposed multiresolution topological analysis is validated by a hexagonal fractal image which has three distinct scales. We further demonstrate the proposed method for extracting topological fingerprints from DNA molecules. In particular, the topological persistence of a virus capsid with 273 780 atoms is successfully analyzed, which would otherwise be inaccessible to the normal point cloud method and unreliable by using coarse-grained multiscale persistent homology. The proposed method has also been successfully applied to protein domain classification, which is, to our knowledge, the first time that persistent homology has been used for practical protein domain analysis. The proposed multiresolution topological method has potential applications in arbitrary data sets, such as social networks, biological networks, and graphs.

  1. Analysis of Amazon Extreme Drought in Two Reanalysis Datasets

    NASA Astrophysics Data System (ADS)

    de Souza, D. O.; Herdies, D. L.; Berbery, E. H.

    2013-05-01

    Over the past 10 years the Amazon region experienced two major drought events, which had large impacts on vegetation and rivers, directly affecting transport, fishing, agriculture and health. These events were characterized by large changes in precipitation patterns over the region in the dry and wet seasons. Aiming to analyze and identify anomalies related to the 2005 and 2010 droughts, as well as the effects of larger-scale disturbances such as El Niño, we used reanalysis datasets from CFSR/NCEP and MERRA/NASA. Overall the results show that the droughts do not have a very clear signal in the reanalysis datasets. Only drought events related to El Niño are well identified in the data. For the 2005 event, the reanalyses place precipitation anomalies in regions where they are not observed in the observational data. The 2010 case presents the same characteristic, with displacement of the anomalies and underestimation of the observed effects. In the general context of the results, a direct relationship was observed between vertical motion over the Atlantic and the precipitation anomalies over the Amazon. The drop in precipitation rates over the Amazon and the possible increase of the low-level flow in drought cases directly affected the precipitation over the La Plata Basin. Thus, even though the drought events are not well represented in the reanalysis datasets, it was possible to observe their local and remote effects over almost all of South America.

  2. Web-based 2-d Visualization with Large Datasets

    NASA Astrophysics Data System (ADS)

    Goldina, T.; Roby, W.; Wu, X.; Ly, L.

    2015-09-01

    Modern astronomical surveys produce large catalogs. Modern archives are web-based. As the science becomes more and more data driven, the pressure on visualization tools to support large datasets increases. While tables can render one page at a time, image overlays showing the returned catalog entries or XY plots showing the relationship between table columns must cover all of the rows to be meaningful. A large dataset could easily overwhelm the browser's capabilities. Therefore the amount of data to be transported or rendered must be reduced. IRSA's catalog visualization is based on the Firefly package, developed at IPAC (Roby 2013). Firefly is used by multiple web-based tools and archives maintained by IRSA: Catalog Search, Spitzer, WISE, Planck, etc. Its distinctive feature is the tri-view: table, image overlay, and XY plot. All three highly interactive components are integrated together. The tri-view presentation allows an astronomer to dissect a dataset in various ways and to detect underlying structure and anomalies in the data, which makes it a handy tool for data exploration. Many challenges are encountered when only a subset of the data is used in place of the full dataset. Preserving coherence and maintaining the ability to select and filter data become issues. This talk addresses how we have solved these problems in large dataset visualization.

  3. Nanomaterial datasets to advance tomography in scanning transmission electron microscopy.

    PubMed

    Levin, Barnaby D A; Padgett, Elliot; Chen, Chien-Chun; Scott, M C; Xu, Rui; Theis, Wolfgang; Jiang, Yi; Yang, Yongsoo; Ophus, Colin; Zhang, Haitao; Ha, Don-Hyung; Wang, Deli; Yu, Yingchao; Abruña, Hector D; Robinson, Richard D; Ercius, Peter; Kourkoutis, Lena F; Miao, Jianwei; Muller, David A; Hovden, Robert

    2016-01-01

    Electron tomography in materials science has flourished with the demand to characterize nanoscale materials in three dimensions (3D). Access to experimental data is vital for developing and validating reconstruction methods that improve resolution and reduce radiation dose requirements. This work presents five high-quality scanning transmission electron microscope (STEM) tomography datasets in order to address the critical need for open access data in this field. The datasets represent the current limits of experimental technique, are of high quality, and contain materials with structural complexity. Included are tomographic series of a hyperbranched Co2P nanocrystal, platinum nanoparticles on a carbon nanofibre imaged over the complete 180° tilt range, a platinum nanoparticle and a tungsten needle both imaged at atomic resolution by equal slope tomography, and a through-focal tilt series of PtCu nanoparticles. A volumetric reconstruction from every dataset is provided for comparison and development of post-processing and visualization techniques. Researchers interested in creating novel data processing and reconstruction algorithms will now have access to state of the art experimental test data. PMID:27272459

  4. Automatic run-time provenance capture for scientific dataset generation

    NASA Astrophysics Data System (ADS)

    Frew, J.; Slaughter, P.

    2008-12-01

    Provenance, the directed graph of a dataset's processing history, is difficult to capture effectively. Human-generated provenance, as narrative metadata, is labor-intensive and thus often incorrect, incomplete, or simply not recorded. Workflow systems capture some provenance implicitly in workflow specifications, but these systems are not ubiquitous or standardized, and a workflow specification may not capture all of the factors involved in a dataset's production. System audit trails capture potentially all processing activities, but not the relationships between them. We describe a system that transparently (i.e., without any modification to science codes) and automatically (i.e., without any human intervention) captures the low-level interactions (files read/written, parameters accessed, etc.) between scientific processes, and then synthesizes these relationships into a provenance graph. This system, the Earth System Science Server (ES3), is sufficiently general that it can accommodate any combination of stand-alone programs, interpreted codes (e.g. IDL), and command-language scripts. Provenance in ES3 can be published in well-defined XML formats (including formats suitable for graphical visualization), and queried to determine the ancestors or descendants of any specific data file or process invocation. We demonstrate how ES3 can be used to capture the provenance of a large operational ocean color dataset.
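
    A sketch of the synthesis step, assuming the networkx package and invented audit events: files and process invocations become nodes of a directed graph, and ancestor/descendant queries answer exactly the lineage questions described.

        import networkx as nx

        # Hypothetical audit events: (process invocation, files read, files written)
        events = [
            ("calibrate#1", ["raw_scene.hdf"], ["calibrated.hdf"]),
            ("composite#1", ["calibrated.hdf", "coastline.shp"], ["chlor_a_map.png"]),
        ]

        g = nx.DiGraph()
        for proc, reads, writes in events:
            for f in reads:
                g.add_edge(f, proc)     # file consumed by process
            for f in writes:
                g.add_edge(proc, f)     # file produced by process

        # Lineage queries: everything upstream/downstream of a given artifact
        print(sorted(nx.ancestors(g, "chlor_a_map.png")))
        print(sorted(nx.descendants(g, "raw_scene.hdf")))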

  5. ERA-Interim/Land: A global land surface reanalysis dataset

    NASA Astrophysics Data System (ADS)

    Balsamo, Gianpaolo; Albergel, Clement; Beljaars, Anton; Boussetta, Souhail; Brun, Eric; Cloke, Hannah; Dee, Dick; Dutra, Emanuel; Muñoz-Sabater, Joaquín; Pappenberger, Florian; De Rosnay, Patricia; Stockdale, Tim; Vitart, Frederic

    2015-04-01

    ERA-Interim/Land is a global land-surface reanalysis dataset covering the period 1979-2010 recently made publicly available from ECMWF. It describes the evolution of soil moisture, soil temperature and snowpack. ERA-Interim/Land is the result of a single 32-year simulation with the latest ECMWF land surface model driven by meteorological forcing from the ERA-Interim atmospheric reanalysis and precipitation adjustments based on monthly GPCP v2.1 (Global Precipitation Climatology Project). The horizontal resolution is about 80km and the time frequency is 3-hourly. ERA-Interim/Land includes a number of parameterization improvements in the land surface scheme with respect to the original ERA-Interim dataset, which makes it more suitable for climate studies involving land water resources. The quality of ERA-Interim/Land is assessed by comparing with ground-based and remote sensing observations. In particular, estimates of soil moisture, snow depth, surface albedo, turbulent latent and sensible fluxes, and river discharges are verified against a large number of site measurements. ERA-Interim/Land provides a global integrated and coherent estimate of soil moisture and snow water equivalent, which can also be used for the initialization of numerical weather prediction and climate models. Current plans for the extension and improvements of ERA-Interim/Land in the framework of future reanalyses will be briefly presented. References and dataset download information at: http://www.ecmwf.int/en/research/climate-reanalysis/era-interim/land

  6. Statistically significant deviations from additivity: What do they mean in assessing toxicity of mixtures?

    PubMed

    Liu, Yang; Vijver, Martina G; Qiu, Hao; Baas, Jan; Peijnenburg, Willie J G M

    2015-12-01

    There is increasing attention from scientists and policy makers to the joint effects of multiple metals on organisms when present in a mixture. Using root elongation of lettuce (Lactuca sativa L.) as a toxicity endpoint, the combined effects of binary mixtures of Cu, Cd, and Ni were studied. The statistical MixTox model was used to search for deviations from the reference models, i.e., concentration addition (CA) and independent action (IA). The deviations were subsequently interpreted as 'interactions'. A comprehensive experiment was designed to test the reproducibility of the 'interactions'. The results showed that the toxicity of binary metal mixtures was equally well predicted by both reference models. We found statistically significant 'interactions' in four of the five datasets. However, the patterns of 'interactions' were found to be inconsistent or even contradictory across the different independent experiments. It is recommended that a statistically significant 'interaction' be treated with care, as it is not necessarily biologically relevant. Finding a statistically significant interaction can be the starting point for further measurements and modeling to advance the understanding of the underlying mechanisms and of the non-additive interactions occurring inside organisms. PMID:26188643
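
    The two reference models can be written down compactly. The sketch below assumes log-logistic single-metal dose-response curves and invented parameters: independent action multiplies survival fractions, while concentration addition finds the effect level at which the toxic units sum to one.

        import numpy as np
        from scipy.optimize import brentq

        # Log-logistic dose-response for each single metal: effect E(c) in [0, 1]
        def effect(c, ec50, slope):
            return 1.0 / (1.0 + (ec50 / c) ** slope)

        ec50 = {"Cu": 0.8, "Cd": 1.5}      # illustrative EC50s (e.g., uM)
        slope = {"Cu": 2.0, "Cd": 1.5}     # illustrative slopes
        mix = {"Cu": 0.4, "Cd": 0.7}       # tested mixture concentrations

        # Independent action: response multiplication of single-metal effects
        e_ia = 1.0 - np.prod([1.0 - effect(mix[m], ec50[m], slope[m]) for m in mix])

        # Concentration addition: the effect level x whose equivalent
        # single-metal doses sum, as toxic units, to one
        def inv_effect(x, m):              # concentration of metal m alone giving effect x
            return ec50[m] * (x / (1.0 - x)) ** (1.0 / slope[m])

        e_ca = brentq(lambda x: sum(mix[m] / inv_effect(x, m) for m in mix) - 1.0,
                      1e-6, 1 - 1e-6)
        print(f"IA predicted effect: {e_ia:.2f}, CA predicted effect: {e_ca:.2f}")

    Observed mixture effects that deviate significantly from both predictions are what the MixTox analysis labels 'interactions'.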

  7. Evaluation of catchment delineation methods for the medium-resolution National Hydrography Dataset

    USGS Publications Warehouse

    Johnston, Craig M.; Dewald, Thomas G.; Bondelid, Timothy R.; Worstell, Bruce B.; McKay, Lucinda D.; Rea, Alan; Moore, Richard B.; Goodall, Jonathan L.

    2009-01-01

    Different methods for determining catchments (incremental drainage areas) for stream segments of the medium-resolution (1:100,000-scale) National Hydrography Dataset (NHD) were evaluated by the U.S. Geological Survey (USGS), in cooperation with the U.S. Environmental Protection Agency (USEPA). The NHD is a comprehensive set of digital spatial data that contains information about surface-water features (such as lakes, ponds, streams, and rivers) of the United States. The need for NHD catchments was driven primarily by the goal to estimate NHD streamflow and velocity to support water-quality modeling. The application of catchments for this purpose also demonstrates the broader value of NHD catchments for supporting landscape characterization and analysis. Five catchment delineation methods were evaluated. Four of the methods use topographic information for the delineation of the NHD catchments. These methods include the Raster Seeding Method; two variants of a method first used in a USGS New England study, termed the 'New England Methods' (one variant used the Watershed Boundary Dataset (WBD) and the other did not); and the Outlet Matching Method. For these topographically based methods, the elevation data source was the 30-meter (m) resolution National Elevation Dataset (NED), as this was the highest resolution available for the conterminous United States and Hawaii. The fifth method evaluated, the Thiessen Polygon Method, uses distance to the nearest NHD stream segments to determine catchment boundaries. Catchments were generated using each method for NHD stream segments within six hydrologically and geographically distinct Subbasins to evaluate the applicability of the method across the United States. The five methods were evaluated by comparing the resulting catchments with the boundaries and the computed area measurements available from several verification datasets that were developed independently using manual methods. The results of the evaluation indicated that the two

  8. Non Local Spatial and Angular Matching: Enabling higher spatial resolution diffusion MRI datasets through adaptive denoising.

    PubMed

    St-Jean, Samuel; Coupé, Pierrick; Descoteaux, Maxime

    2016-08-01

    Diffusion magnetic resonance imaging (MRI) datasets suffer from low Signal-to-Noise Ratio (SNR), especially at high b-values. Data acquired at high b-values contain relevant information and are now of great interest for microstructural and connectomics studies. High noise levels bias the measurements due to the non-Gaussian nature of the noise, which in turn can lead to a false and biased estimation of the diffusion parameters. Additionally, the usage of in-plane acceleration techniques during the acquisition leads to a spatially varying noise distribution, which depends on the parallel acceleration method implemented on the scanner. This paper proposes a novel diffusion MRI denoising technique that can be used on all existing data, without adding to the scanning time. We first apply a statistical framework to convert both stationary and non-stationary Rician and noncentral Chi distributed noise to Gaussian distributed noise, effectively removing the bias. We then introduce a spatially and angularly adaptive denoising technique, the Non Local Spatial and Angular Matching (NLSAM) algorithm. Each volume is first decomposed into small 4D overlapping patches, thus capturing the spatial and angular structure of the diffusion data, and a dictionary of atoms is learned on those patches. A local sparse decomposition is then found by bounding the reconstruction error with the local noise variance. We compare against three other state-of-the-art denoising methods and show quantitative local and connectivity results on a synthetic phantom and on an in-vivo high resolution dataset. Overall, our method restores perceptual information, removes the noise bias in common diffusion metrics, restores the extracted peaks coherence and improves reproducibility of tractography on the synthetic dataset. On the 1.2 mm high resolution in-vivo dataset, our denoising improves the visual quality of the data and reduces the number of spurious tracts when compared to the noisy acquisition. Our
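
    The bias the authors remove comes from the fact that magnitude MR data are Rician rather than Gaussian distributed. NLSAM uses a full stabilization framework; the sketch below shows only the much simpler classical moment-based correction, for intuition about where the bias term comes from.

```python
import numpy as np

def rician_bias_correct(magnitude, sigma):
    """Classical moment-based Rician correction: since E[M^2] = A^2 + 2*sigma^2
    for a Rician-distributed magnitude M with true signal A and noise level
    sigma, sqrt(max(M^2 - 2*sigma^2, 0)) is an approximately unbiased estimate
    of A. This ignores the spatially varying sigma that NLSAM handles."""
    m2 = np.asarray(magnitude, dtype=float) ** 2
    return np.sqrt(np.maximum(m2 - 2.0 * sigma ** 2, 0.0))
```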

  9. Online Visualization and Analysis of Merged Global Geostationary Satellite Infrared Dataset

    NASA Technical Reports Server (NTRS)

    Liu, Zhong; Ostrenga, D.; Leptoukh, G.; Mehta, A.

    2008-01-01

    The NASA Goddard Earth Sciences Data Information Services Center (GES DISC) is home to the Tropical Rainfall Measuring Mission (TRMM) data archive. The global merged IR product, also known as the NCEP/CPC 4-km Global (60 degrees N - 60 degrees S) IR Dataset, is one of the TRMM ancillary datasets. It consists of globally merged (60 degrees N - 60 degrees S), pixel-resolution (4 km) IR brightness temperature data (equivalent blackbody temperatures) from all available geostationary satellites (GOES-8/10, METEOSAT-7/5 and GMS). The availability of data from METEOSAT-5, which is located at 63E at the present time, yields a unique opportunity for total global (60 degrees N - 60 degrees S) coverage. The GES DISC has collected over 8 years of the data, beginning in February of 2000. This high temporal resolution dataset can not only provide additional background information to TRMM and other satellite missions, but also allow observing a wide range of meteorological phenomena from space, such as mesoscale convective systems, tropical cyclones, hurricanes, etc. The dataset can also be used to verify model simulations. Although the data can be downloaded via FTP, their large volume poses a challenge for many users. A single file occupies about 70 MB of disk space and there is a total of approximately 73,000 files (approximately 4.5 TB) for the past 8 years. In order to facilitate data access, we have developed a web prototype to allow users to conduct online visualization and analysis of this dataset. With a web browser and a few mouse clicks, users can have full access to over 8 years and over 4.5 TB of data and generate black and white IR imagery and animation without downloading any software or data. In short, you can make your own images! Basic functions include selection of area of interest, single imagery or animation, a time skip capability for different temporal resolutions, and image size. Users can save an animation as a file (animated gif) and import it into other

  10. Land surface model evaluation using a new soil moisture dataset from Kamennaya Steppe, Russia

    NASA Astrophysics Data System (ADS)

    Atkins, T.; Robock, A.; Speranskaya, N.

    2004-12-01

    The land surface affects the atmosphere through the transfer of energy and moisture and serves as the lower boundary in numerical weather prediction and climate models. To obtain good forecasts, these models must therefore accurately portray the land surface. Actual in situ measurements are vital for testing and developing these models. It is with this in mind that we have obtained a dataset of soil moisture, soil temperature and meteorological measurements from Kamennaya Steppe, Russia. The meteorological dataset spans 1965-1991, while the soil moisture dataset runs from 1956 to 1991. The soil moisture dataset contains gravimetric volumetric total soil moisture measurements for 10 layers taken from forest, agricultural and grassland soils. The meteorological dataset contains 3-hourly measurements of precipitation, temperature, wind speed, pressure and relative humidity. Longwave and shortwave radiation were derived using standard formulae. The data will be made available to the public via the Rutgers University Center for Environmental Prediction Global Soil Moisture Data Bank. Soil temperature is important in determining the timing, duration and intensity of runoff and snowmelt, particularly at the beginning and end of the winter when the ground is only partially frozen. Soil temperature can in turn be affected by the vertical distribution of roots. The soil temperature data are for 1969-1991. The data are daily averages for every 20 cm down to a depth of 1.2 meters. These data are used to investigate the natural sensitivity of soil temperature to vegetation type and root distribution. We also use the temperature data, as well as water balance and snowfall data, to test the sensitivity of the Noah land surface model (LSM) soil temperature to vertical root distribution, and what effect that has on the hydrology of the site. In addition to soil temperature data, we also have soil moisture data for several vegetation types. We compare the soil moisture time

  11. CHARMe Commentary metadata for Climate Science: collecting, linking and sharing user feedback on climate datasets

    NASA Astrophysics Data System (ADS)

    Blower, Jon; Lawrence, Bryan; Kershaw, Philip; Nagni, Maurizio

    2014-05-01

    The research process can be thought of as an iterative activity, initiated based on prior domain knowledge as well as on a number of external inputs, and producing a range of outputs including datasets, studies and peer reviewed publications. These outputs may describe the problem under study, the methodology used, the results obtained, etc. In any new publication, the author may cite or comment on other papers or datasets in order to support their research hypothesis. However, as their work progresses, the researcher may draw from many other latent channels of information. These could include, for example, a private conversation following a lecture or during a social dinner, or an opinion expressed concerning some significant event such as an earthquake or a satellite failure. In addition, public sources of grey literature, such as informal papers (e.g. arXiv deposits), reports and studies, are important. The climate science community is no exception to this pattern; the CHARMe project, funded under the European FP7 framework, is developing an online system for collecting and sharing user feedback on climate datasets. This is to help users judge how suitable such climate data are for an intended application. The user feedback could be comments about assessments, citations, or provenance of the dataset, or other information such as descriptions of uncertainty or data quality. We define this as a distinct category of metadata called Commentary or C-metadata. We link C-metadata with target climate datasets using a Linked Data approach via the Open Annotation data model. In the context of Linked Data, C-metadata plays the role of a resource which, depending on its nature, may be accessed as simple text or as more structured content. The project is implementing a range of software tools to create, search or visualize C-metadata, including a JavaScript plugin enabling this functionality to be integrated in situ with data provider portals
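
    In the Open Annotation data model, a piece of C-metadata is simply an annotation resource whose body carries the comment and whose target is the dataset it describes. The JSON-LD-style record below is a minimal illustration only; the URIs, comment text and exact serialization are hypothetical, not CHARMe's actual schema.

```python
import json

# A minimal C-metadata record in the spirit of the Open Annotation model:
# the body holds the free-text comment, the target identifies the dataset.
annotation = {
    "@type": "oa:Annotation",
    "hasBody": {
        "@type": "cnt:ContentAsText",
        "chars": "Step change in 1998 when the sensor was replaced; use with care for trends.",
    },
    "hasTarget": "http://example.org/datasets/sst-reanalysis-v2",  # hypothetical URI
    "annotatedBy": {"@type": "foaf:Person", "name": "A. Researcher"},
}
print(json.dumps(annotation, indent=2))
```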

  12. Extracting a low-dimensional description of multiple gene expression datasets reveals a potential driver for tumor-associated stroma in ovarian cancer.

    PubMed

    Celik, Safiye; Logsdon, Benjamin A; Battle, Stephanie; Drescher, Charles W; Rendi, Mara; Hawkins, R David; Lee, Su-In

    2016-01-01

    Patterns in expression data conserved across multiple independent disease studies are likely to represent important molecular events underlying the disease. We present the INSPIRE method to infer modules of co-expressed genes and the dependencies among the modules from multiple expression datasets that may contain different sets of genes. We show that INSPIRE infers more accurate models than existing methods to extract low-dimensional representation of expression data. We demonstrate that applying INSPIRE to nine ovarian cancer datasets leads to a new marker and potential driver of tumor-associated stroma, HOPX, followed by experimental validation. The implementation of INSPIRE is available at http://inspire.cs.washington.edu . PMID:27287041

  13. The NASA/GEWEX Surface Radiation Budget Dataset

    NASA Astrophysics Data System (ADS)

    Gupta, Shashi; Stackhouse, Paul; Cox, Stephen; Mikovitz, Colleen; Zhang, Taiping

    The surface radiation budget (SRB), consisting of downward and upward components of shortwave (SW) and longwave (LW) radiation, is a major component of the energy exchanges between the atmosphere and land/ocean surfaces and thus affects surface temperature fields, fluxes of sensible and latent heat, and every aspect of energy and hydrological cycles. The NASA Global Energy and Water-cycle Experiment (GEWEX) SRB project has recently updated and improved a global dataset of surface radiative fluxes on a 1-degree grid for a 23-year period (July 1983 to June 2006). Both SW and LW fluxes have been produced with two sets of algorithms: one designated as primary and the other as quality-check. The primary algorithms use a more explicit treatment of surface and atmospheric processes while quality-check algorithms use a more parameterized approach. Cloud and surface properties for input to the algorithms have been derived from ISCCP pixel level (DX) data, temperature and humidity profiles from GEOS-4 reanalysis products, and column ozone from a composite of TOMS, TOVS, and assimilated SBUV-2 datasets. Several top-of-atmosphere (TOA) radiation budget parameters have also been derived with the primary algorithms. Surface fluxes from all algorithms are extensively validated with ground-based measurements obtained from the Baseline Surface Radiation Network (BSRN), the Global Energy Balance Archive (GEBA), and the World Radiation Data Center (WRDC) archives. The SRB dataset is a major contributor to the GEWEX Radiative Flux Assessment activity. An overview of the latest version (Release-3.0) of the dataset with global and zonal statistics of fluxes, inferred cloud radiative forcing, and results of the validation activities will be presented. Time series of SRB parameters at the TOA and surface for global, land, ocean, and tropical area means will be presented along with analysis of flux anomalies related to El Nino/La Nina episodes, phases of North Atlantic Oscillation (NAO

  14. Satellite Merged Microwave Radiometer Datasets for Climate Studies

    NASA Astrophysics Data System (ADS)

    Smith, D. K.; Mears, C. A.; Hilburn, K. A.; Ricciardulli, L.

    2013-12-01

    With more than two decades of continuous, accurate monitoring over the global oceans, microwave satellite observations provide a very valuable data record for climate research and climate model validation. Observations of columnar water vapor, rain rates, cloud liquid water, and surface winds from microwave radiometers have been retrieved twice daily since 1987 from a number of sensors: SSM/I F08 through F15, SSMIS F16 and F17, AMSR-E, TMI and WindSat. Sea surface temperature measurements through clouds are available since 1998. These datasets have been carefully intercalibrated at the brightness temperature level. The recently released V7 ocean products have been produced with a consistent methodology common to all sensors. Given the enormous amount of data, using these observations for climate research and model evaluation is time-consuming and proper quality control is non-trivial. At Remote Sensing Systems we are focusing on creating merged monthly gridded data records suitable for climate research for all these ocean products. The methodology to construct these merged datasets has been carefully developed after exploring different methods of combining the data from different sensors. Important aspects of the methodology include: selection of input data, requirement of minimal data values per grid cell, use of extended area rain flagging, use of extended area ice flagging, the averaging method applied, and the application of derived merging parameters. The resulting merged datasets are monthly timeseries of global gridded data over the ocean, with a 1-deg resolution. The timeseries start in January 1988, and they are stored in a single NetCDF file which is updated every month. Included in the file are: monthly average timeseries, monthly climatology, monthly anomaly timeseries, trend map, and time-latitude array. Water vapor is the first of these merged datasets, and was released in early 2013. Merged ocean surface winds and precipitation rates are currently under
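
    Two of the listed merging ingredients, per-cell minimum-observation requirements and count-weighted averaging across sensors, are easy to sketch. The function below is illustrative only; the threshold and weighting scheme are assumptions, not Remote Sensing Systems' actual parameters.

```python
import numpy as np

def merge_sensors(means, counts, min_obs=20):
    """Merge per-sensor monthly grids into one field.
    means, counts: arrays of shape (n_sensors, nlat, nlon) holding each
    sensor's monthly mean and its number of observations per grid cell.
    Cells with fewer than min_obs observations are excluded from the merge."""
    valid = counts >= min_obs
    weights = np.where(valid, counts, 0)
    total = weights.sum(axis=0)
    with np.errstate(invalid="ignore", divide="ignore"):
        merged = (weights * means).sum(axis=0) / total
    return np.where(total > 0, merged, np.nan)  # NaN where no sensor qualified
```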

  15. Genome-wide inference of protein interaction sites: lessons from the yeast high-quality negative protein–protein interaction dataset

    PubMed Central

    Guo, Jie; Wu, Xiaomei; Zhang, Da-Yong; Lin, Kui

    2008-01-01

    High-throughput studies of protein interactions may have produced, experimentally and computationally, the most comprehensive protein–protein interaction datasets for the completely sequenced genomes. These datasets provide an opportunity, on a proteome scale, to discover the underlying protein interaction patterns. Here, we propose an approach to discovering motif pairs at interaction sites (often 3–8 residues) that are essential for understanding protein functions and helpful for the rational design of protein engineering and folding experiments. A gold standard positive (interacting) dataset and a gold standard negative (non-interacting) dataset were mined to infer the interacting motif pairs that are significantly overrepresented in the positive dataset compared to the negative dataset. Four negative datasets assembled by different strategies were evaluated and the one with the best performance was used as the gold standard negatives for further analysis. To assess the efficiency of our method in detecting potential interacting motif pairs, we compared it with previously developed approaches and found that our method achieved the highest prediction accuracy. In addition, many uncharacterized motif pairs of interest were found to be functional with experimental evidence in other species. This investigation demonstrates the important effect of a high-quality negative dataset on the performance of such statistical inference. PMID:18281313
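
    The core statistical step, testing whether a motif pair is overrepresented among interacting pairs relative to the negative set, can be sketched with a standard one-sided Fisher's exact test (the paper's own statistic may differ; this is an illustrative stand-in):

```python
from scipy.stats import fisher_exact

def motif_pair_enrichment(k_pos, n_pos, k_neg, n_neg):
    """Is a motif pair overrepresented in the positive set?
    k_pos of n_pos interacting protein pairs contain the motif pair;
    k_neg of n_neg non-interacting pairs do. Returns the odds ratio and
    the one-sided p-value for enrichment in the positive set."""
    table = [[k_pos, n_pos - k_pos],
             [k_neg, n_neg - k_neg]]
    return fisher_exact(table, alternative="greater")

# Example with made-up counts: 40/2000 positives vs 10/4000 negatives.
odds, p = motif_pair_enrichment(40, 2000, 10, 4000)
print(f"odds ratio = {odds:.1f}, p = {p:.2e}")
```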

  16. Robust language-independent OCR system

    NASA Astrophysics Data System (ADS)

    Lu, Zhidong A.; Bazzi, Issam; Kornai, Andras; Makhoul, John; Natarajan, Premkumar S.; Schwartz, Richard

    1999-01-01

    We present a language-independent optical character recognition system that is capable, in principle, of recognizing printed text from most of the world's languages. For each new language or script the system requires sample training data along with ground truth at the text-line level; there is no need to specify the location of either the lines or the words and characters. The system uses hidden Markov modeling technology to model each character. In addition to language independence, the technology enhances performance for degraded data, such as fax, by using unsupervised adaptation techniques. Thus far, we have demonstrated the language-independence of this approach for Arabic, English, and Chinese. Recognition results are presented in this paper, including results on faxed data.

  17. [Food additives and healthiness].

    PubMed

    Heinonen, Marina

    2014-01-01

    Additives are used for improving food structure or preventing its spoilage, for example. Many substances used as additives are also naturally present in food. The safety of additives is evaluated according to commonly agreed principles. If high concentrations of an additive cause adverse health effects for humans, a limit of acceptable daily intake (ADI) is set for it. An additive is a risk only when ADI is exceeded. The healthiness of food is measured on the basis of nutrient density and scientifically proven effects. PMID:24772784

  18. Polyimide processing additives

    NASA Technical Reports Server (NTRS)

    Pratt, J. R.; St. Clair, T. L.; Burks, H. D.; Stoakley, D. M.

    1987-01-01

    A method has been found for enhancing the melt flow of thermoplastic polyimides during processing. A high molecular weight 422 copoly(amic acid) or copolyimide was fused with approximately 0.05 to 5 pct by weight of a low molecular weight amic acid or imide additive, and this melt was studied by capillary rheometry. Excellent flow and improved composite properties on graphite resulted from the addition of a PMDA-aniline additive to LARC-TPI. Solution viscosity studies imply that amic acid additives temporarily lower molecular weight and, hence, enlarge the processing window. Thus, compositions containing the additive have a lower melt viscosity for a longer time than those unmodified.

  19. Determining similarity of scientific entities in annotation datasets.

    PubMed

    Palma, Guillermo; Vidal, Maria-Esther; Haag, Eric; Raschid, Louiqa; Thor, Andreas

    2015-01-01

    Linked Open Data initiatives have made available a diversity of scientific collections where scientists have annotated entities in the datasets with controlled vocabulary terms from ontologies. Annotations encode scientific knowledge, which is captured in annotation datasets. Determining relatedness between annotated entities becomes a building block for pattern mining, e.g. identifying drug-drug relationships may depend on the similarity of the targets that interact with each drug. A diversity of similarity measures has been proposed in the literature to compute relatedness between a pair of entities. Each measure exploits some knowledge including the name, function, relationships with other entities, taxonomic neighborhood and semantic knowledge. We propose a novel general-purpose annotation similarity measure called 'AnnSim' that measures the relatedness between two entities based on the similarity of their annotations. We model AnnSim as a 1-1 maximum weight bipartite match and exploit properties of existing solvers to provide an efficient solution. We empirically study the performance of AnnSim on real-world datasets of drugs and disease associations from clinical trials and relationships between drugs and (genomic) targets. Using baselines that include a variety of measures, we identify where AnnSim can provide a deeper understanding of the semantics underlying the relatedness of a pair of entities or where it could lead to predicting new links or identifying potential novel patterns. Although AnnSim does not exploit knowledge or properties of a particular domain, its performance compares well with a variety of state-of-the-art domain-specific measures. Database URL: http://www.yeastgenome.org/ PMID:25725057
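
    The key computational idea in AnnSim is the 1-1 maximum weight bipartite match between the two entities' annotation sets, which standard assignment solvers handle directly. A minimal sketch (the normalization shown is one plausible choice, not necessarily the paper's exact formula):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def annsim(sim):
    """AnnSim-style similarity from a matrix sim[i, j] giving the pairwise
    similarity between annotation i of entity 1 and annotation j of entity 2.
    A 1-1 maximum weight bipartite match is found by an assignment solver."""
    rows, cols = linear_sum_assignment(-sim)        # negate to maximize weight
    matched_weight = sim[rows, cols].sum()
    return 2.0 * matched_weight / (sim.shape[0] + sim.shape[1])

# Example: entity 1 has 3 annotations, entity 2 has 2.
sim = np.array([[0.9, 0.1],
                [0.2, 0.8],
                [0.4, 0.3]])
print(annsim(sim))  # matches (0,0) and (1,1): 2*(0.9+0.8)/5 = 0.68
```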

  1. A synthetic dataset for evaluating soft and hard fusion algorithms

    NASA Astrophysics Data System (ADS)

    Graham, Jacob L.; Hall, David L.; Rimland, Jeffrey

    2011-06-01

    There is an emerging demand for the development of data fusion techniques and algorithms that are capable of combining conventional "hard" sensor inputs such as video, radar, and multispectral sensor data with "soft" data including textual situation reports, open-source web information, and "hard/soft" data such as image or video data that includes human-generated annotations. New techniques that assist in sense-making over a wide range of vastly heterogeneous sources are critical to improving tactical situational awareness in counterinsurgency (COIN) and other asymmetric warfare situations. A major challenge in this area is the lack of realistic datasets available for test and evaluation of such algorithms. While "soft" message sets exist, they tend to be of limited use for data fusion applications due to the lack of critical message pedigree and other metadata. They also lack corresponding hard sensor data that presents reasonable "fusion opportunities" to evaluate the ability to make connections and inferences that span the soft and hard data sets. This paper outlines the design methodologies, content, and some potential use cases of a COIN-based synthetic soft and hard dataset created under a United States Multi-disciplinary University Research Initiative (MURI) program funded by the U.S. Army Research Office (ARO). The dataset includes realistic synthetic reports from a variety of sources, corresponding synthetic hard data, and an extensive supporting database that maintains "ground truth" through logical grouping of related data into "vignettes." The supporting database also maintains the pedigree of messages and other critical metadata.

  2. Rapid Global Fitting of Large Fluorescence Lifetime Imaging Microscopy Datasets

    PubMed Central

    Warren, Sean C.; Margineanu, Anca; Alibhai, Dominic; Kelly, Douglas J.; Talbot, Clifford; Alexandrov, Yuriy; Munro, Ian; Katan, Matilda

    2013-01-01

    Fluorescence lifetime imaging (FLIM) is widely applied to obtain quantitative information from fluorescence signals, particularly using Förster Resonant Energy Transfer (FRET) measurements to map, for example, protein-protein interactions. Extracting FRET efficiencies or population fractions typically entails fitting data to complex fluorescence decay models but such experiments are frequently photon constrained, particularly for live cell or in vivo imaging, and this leads to unacceptable errors when analysing data on a pixel-wise basis. Lifetimes and population fractions may, however, be more robustly extracted using global analysis to simultaneously fit the fluorescence decay data of all pixels in an image or dataset to a multi-exponential model under the assumption that the lifetime components are invariant across the image (dataset). This approach is often considered to be prohibitively slow and/or computationally expensive but we present here a computationally efficient global analysis algorithm for the analysis of time-correlated single photon counting (TCSPC) or time-gated FLIM data based on variable projection. It makes efficient use of both computer processor and memory resources, requiring less than a minute to analyse time series and multiwell plate datasets with hundreds of FLIM images on standard personal computers. This lifetime analysis takes account of repetitive excitation, including fluorescence photons excited by earlier pulses contributing to the fit, and is able to accommodate time-varying backgrounds and instrument response functions. We demonstrate that this global approach allows us to readily fit time-resolved fluorescence data to complex models including a four-exponential model of a FRET system, for which the FRET efficiencies of the two species of a bi-exponential donor are linked, and polarisation-resolved lifetime data, where a fluorescence intensity and bi-exponential anisotropy decay model is applied to the analysis of live cell
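
    The variable projection trick the authors build on exploits the fact that, for fixed lifetimes, the amplitudes enter the multi-exponential model linearly and can be solved exactly by linear least squares, so the nonlinear search runs only over the lifetimes. A bare-bones sketch (no instrument response function, repetitive excitation, or time-varying background, all of which the published algorithm handles):

```python
import numpy as np
from scipy.optimize import least_squares

def varpro_residuals(taus, t, y):
    """For fixed lifetimes taus, solve the amplitudes by linear least squares
    and return the residual vector; only taus are optimized nonlinearly."""
    basis = np.exp(-np.outer(t, 1.0 / taus))          # (n_times, n_components)
    amps, *_ = np.linalg.lstsq(basis, y, rcond=None)
    return basis @ amps - y

# Fit a synthetic bi-exponential decay: the search space is just two lifetimes.
t = np.linspace(0.05, 10.0, 200)
rng = np.random.default_rng(0)
y = 0.7 * np.exp(-t / 0.5) + 0.3 * np.exp(-t / 3.0) + 0.01 * rng.standard_normal(t.size)
fit = least_squares(varpro_residuals, x0=[0.3, 2.0], bounds=(0.01, 100.0), args=(t, y))
print("estimated lifetimes:", fit.x)   # close to the true values (0.5, 3.0)
```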

  3. Integrated Dataset of Screening Hits against Multiple Neglected Disease Pathogens

    PubMed Central

    Nwaka, Solomon; Besson, Dominique; Ramirez, Bernadette; Maes, Louis; Matheeussen, An; Bickle, Quentin; Mansour, Nuha R.; Yousif, Fouad; Townson, Simon; Gokool, Suzanne; Cho-Ngwa, Fidelis; Samje, Moses; Misra-Bhattacharya, Shailja; Murthy, P. K.; Fakorede, Foluke; Paris, Jean-Marc; Yeates, Clive; Ridley, Robert; Van Voorhis, Wesley C.; Geary, Timothy

    2011-01-01

    New chemical entities are desperately needed that overcome the limitations of existing drugs for neglected diseases. Screening a diverse library of 10,000 drug-like compounds against 7 neglected disease pathogens resulted in an integrated dataset of 744 hits. We discuss the prioritization of these hits for each pathogen and the strong correlation observed between compounds active against more than two pathogens and mammalian cell toxicity. Our work suggests that the efficiency of early drug discovery for neglected diseases can be enhanced through a collaborative, multi-pathogen approach. PMID:22247786

  4. Fast methods for training Gaussian processes on large datasets

    PubMed Central

    Moore, C. J.; Berry, C. P. L.; Gair, J. R.

    2016-01-01

    Gaussian process regression (GPR) is a non-parametric Bayesian technique for interpolating or fitting data. The main barrier to further uptake of this powerful tool rests in the computational costs associated with the matrices which arise when dealing with large datasets. Here, we derive some simple results which we have found useful for speeding up the learning stage in the GPR algorithm, and especially for performing Bayesian model comparison between different covariance functions. We apply our techniques to both synthetic and real data and quantify the speed-up relative to using nested sampling to numerically evaluate model evidences. PMID:27293793
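
    The expensive object at the heart of both GP training and Bayesian model comparison is the log marginal likelihood (evidence), whose cost is dominated by factorizing the covariance matrix. A plain Cholesky-based evaluation, the baseline such speed-up methods are measured against, looks like this (a sketch; assumes K is symmetric positive definite):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def log_evidence(K, y):
    """Log marginal likelihood of data y under a zero-mean GP with covariance K:
    -0.5 * y^T K^{-1} y - 0.5 * log|K| - (n/2) * log(2*pi).
    The Cholesky factorization is the O(n^3) bottleneck for large datasets."""
    n = y.size
    L = cho_factor(K, lower=True)
    alpha = cho_solve(L, y)                       # K^{-1} y without forming K^{-1}
    logdet = 2.0 * np.sum(np.log(np.diag(L[0])))  # log|K| from the Cholesky factor
    return -0.5 * y @ alpha - 0.5 * logdet - 0.5 * n * np.log(2.0 * np.pi)

# Comparing covariance functions amounts to comparing their log evidences on y.
```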

  5. Conformations of macromolecules and their complexes from heterogeneous datasets

    PubMed Central

    Schwander, P.; Fung, R.; Ourmazd, A.

    2014-01-01

    We describe a new generation of algorithms capable of mapping the structure and conformations of macromolecules and their complexes from large ensembles of heterogeneous snapshots, and demonstrate the feasibility of determining both discrete and continuous macromolecular conformational spectra. These algorithms naturally incorporate conformational heterogeneity without resort to sorting and classification, or prior knowledge of the type of heterogeneity present. They are applicable to single-particle diffraction and image datasets produced by X-ray lasers and cryo-electron microscopy, respectively, and particularly suitable for systems not easily amenable to purification or crystallization. PMID:24914167

  6. COGNIZER: A Framework for Functional Annotation of Metagenomic Datasets

    PubMed Central

    Bose, Tungadri; Haque, Mohammed Monzoorul; Reddy, CVSK; Mande, Sharmila S.

    2015-01-01

    Background Recent advances in sequencing technologies have resulted in an unprecedented increase in the number of metagenomes that are being sequenced world-wide. Given their volume, functional annotation of metagenomic sequence datasets requires specialized computational tools/techniques. In spite of having high accuracy, existing stand-alone functional annotation tools require end-users to perform compute-intensive homology searches of metagenomic datasets against "multiple" databases prior to functional analysis. Although web-based functional annotation servers address, to some extent, the problem of availability of compute resources, uploading and analyzing huge volumes of sequence data on a shared public web-service has its own set of limitations. In this study, we present COGNIZER, a comprehensive stand-alone annotation framework which enables end-users to functionally annotate sequences constituting metagenomic datasets. The COGNIZER framework provides multiple workflow options. A subset of these options employs a novel directed-search strategy which helps in reducing the overall compute requirements for end-users. The COGNIZER framework includes a cross-mapping database that enables end-users to simultaneously derive/infer KEGG, Pfam, GO, and SEED subsystem information from the COG annotations. Results Validation experiments performed with real-world metagenomes and metatranscriptomes, generated using diverse sequencing technologies, indicate that the novel directed-search strategy employed in COGNIZER helps in reducing the compute requirements without significant loss in annotation accuracy. A comparison of COGNIZER's results with pre-computed benchmark values indicates the reliability of the cross-mapping database employed in COGNIZER. Conclusion The COGNIZER framework is capable of comprehensively annotating any metagenomic or metatranscriptomic dataset from varied sequencing platforms in functional terms. Multiple search options in COGNIZER provide

  7. Fast methods for training Gaussian processes on large datasets.

    PubMed

    Moore, C J; Chua, A J K; Berry, C P L; Gair, J R

    2016-05-01

    Gaussian process regression (GPR) is a non-parametric Bayesian technique for interpolating or fitting data. The main barrier to further uptake of this powerful tool rests in the computational costs associated with the matrices which arise when dealing with large datasets. Here, we derive some simple results which we have found useful for speeding up the learning stage in the GPR algorithm, and especially for performing Bayesian model comparison between different covariance functions. We apply our techniques to both synthetic and real data and quantify the speed-up relative to using nested sampling to numerically evaluate model evidences. PMID:27293793

  8. Scientific Datasets: Discovery and Aggregation for Semantic Interpretation.

    NASA Astrophysics Data System (ADS)

    Lopez, L. A.; Scott, S.; Khalsa, S. J. S.; Duerr, R.

    2015-12-01

    One of the biggest challenges that interdisciplinary researchers face is finding suitable datasets in order to advance their science; this problem remains consistent across multiple disciplines. A surprising number of scientists, when asked what tool they use for data discovery, reply "Google", which is an acceptable solution in some cases, but not even Google can find (or cares to compile) all the data that is relevant for science, particularly the geosciences. If a dataset is not discoverable through a well-known search provider, it will remain dark data to the scientific world. For the past year, BCube, an EarthCube Building Block project, has been developing, testing and deploying a technology stack capable of data discovery at web scale using the ultimate dataset: the Internet. This stack has 2 principal components, a web-scale crawling infrastructure and a semantic aggregator. The web crawler is a modified version of Apache Nutch (the originator of Hadoop and other big data technologies) that has been improved and tailored for data and data service discovery. The second component is semantic aggregation, carried out by a python-based workflow that extracts valuable metadata and stores it in the form of triples through the use of semantic technologies. While implementing the BCube stack we have run into several challenges, such as a) scaling the project to cover large portions of the Internet at a reasonable cost, b) making sense of very diverse and non-homogeneous data, and lastly, c) extracting facts about these datasets using semantic technologies in order to make them usable for the geosciences community. Despite all these challenges we have proven that we can discover and characterize data that otherwise would have remained in the dark corners of the Internet. Having all this data indexed and 'triplelized' will enable scientists to access a trove of information relevant to their work in a more natural way. An important characteristic of the BCube stack is that all
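
    The semantic-aggregation step, turning harvested metadata fields into triples, can be illustrated with rdflib. Everything below (the dataset URI, the choice of Dublin Core predicates) is a hypothetical example, not BCube's actual vocabulary or workflow.

```python
from rdflib import Graph, Literal, URIRef

# Hypothetical metadata extracted by the crawler for one discovered dataset.
dataset = URIRef("http://example.org/dataset/sea-ice-extent-v2")
fields = {
    "http://purl.org/dc/terms/title": "Sea Ice Extent, Version 2",
    "http://purl.org/dc/terms/format": "NetCDF",
}

g = Graph()
for predicate, value in fields.items():
    g.add((dataset, URIRef(predicate), Literal(value)))  # one triple per field

print(g.serialize(format="turtle"))
```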

  9. Agile data management for curation of genomes to watershed datasets

    NASA Astrophysics Data System (ADS)

    Varadharajan, C.; Agarwal, D.; Faybishenko, B.; Versteeg, R.

    2015-12-01

    A software platform is being developed for data management and assimilation [DMA] as part of the U.S. Department of Energy's Genomes to Watershed Sustainable Systems Science Focus Area 2.0. The DMA components and capabilities are driven by the project science priorities and the development is based on agile development techniques. The goal of the DMA software platform is to enable users to integrate and synthesize diverse and disparate field, laboratory, and simulation datasets, including geological, geochemical, geophysical, microbiological, hydrological, and meteorological data across a range of spatial and temporal scales. The DMA objectives are (a) developing an integrated interface to the datasets, (b) storing field monitoring data and laboratory analytical results of water and sediment samples in a database, (c) providing automated QA/QC analysis of data, and (d) working with data providers to modify high-priority field and laboratory data collection and reporting procedures as needed. The first three objectives are driven by user needs, while the last objective is driven by data management needs. The project needs and priorities are reassessed regularly with the users. After each user session we identify development priorities to match the identified user priorities. For instance, data QA/QC and collection activities have focused on the data and products needed for on-going scientific analyses (e.g. water level and geochemistry). We have also developed, tested and released a broker and portal that integrates diverse datasets from two different databases used for curation of project data. The development of the user interface was based on a user-centered design process involving several user interviews and constant interaction with data providers. The initial version focuses on the most requested feature, i.e. finding the data needed for analyses through an intuitive interface. Once the data is found, the user can immediately plot and download data

  10. The Maunder minimum: A reassessment from multiple dataset

    NASA Astrophysics Data System (ADS)

    Usoskin, Ilya; Arlt, Rainer; Asvestari, Eleanna; Kovaltsov, Gennady; Krivova, Natalie; Lockwood, Michael; Käpylä, Maarit; Owens, Matthew; Sokoloff, Dmitry D.; Solanki, Sami; Soon, Willie; Vaquero, Jose; Scott, Chris

    2015-08-01

    The Maunder minimum (MM) in 1645-1715 was a period of the lowest solar activity ever recorded via sunspot numbers since 1610. Since it is the only Grand minimum of solar activity directly observed, it forms a benchmark for solar variability studies. Therefore, it is crucially important to assess the level and other features of temporal and spatial solar magnetic variability during that time. However, because of uncertainties related mostly to ambiguity of some historical sunspot observation records, the exact level of solar activity during the MM is somewhat unclear, leaving room for continuous discussions and speculations. Many of these issues were addressed by Jack Eddy in his cornerstone papers of 1976 and 1983, but since then numerous new pieces of evidence and datasets have appeared, making it possible to verify the paradigm of the Maunder minimum with far greater certainty than before. Here we provide a full reassessment of the Maunder minimum using all the available datasets: augmented sunspot counts and drawings; revisited historical archives; both well-known and newly revealed records of auroral observations; and cosmic ray variability via cosmogenic isotope records of 14C in tree trunks, 10Be in ice cores and 44Ti in fallen meteorites. We show that, while the exact level of the activity is not easy to determine, the Sun indeed exhibited exceptionally low magnetic activity during the MM, in comparison to other periods of moderate or decreased activity, such as the Dalton minimum (ca. 1800), the Gleissberg minimum (ca. 1900) and the present weak solar cycle 24. We show that a scenario of moderate or strong activity during the MM contradicts all the available datasets. Thus, we confirm, using all the presently available datasets of different nature, that the period of the Maunder minimum in 1645-1715 was indeed a Grand minimum, with very low solar surface magnetic activity, low intensity of the interplanetary magnetic field, as well as lower

  11. Additive usage levels.

    PubMed

    Langlais, R

    1996-01-01

    With the adoption of the European Parliament and Council Directives on sweeteners, colours and miscellaneous additives, the Commission is now embarking on the project of coordinating the activities of the European Union Member States in the collection of the data that are to make up the report on food additive intake requested by the European Parliament. This presentation looks at the inventory of available sources on additive use levels and concludes that, for the time being, national legislation is still the best source of information, considering that the directives have yet to be transposed into national legislation. Furthermore, this presentation covers the correlation of the food categories as found in the additives directives with those used by national consumption surveys, and finds that in a number of instances this correlation still leaves a lot to be desired. The intake of additives via food ingestion and the intake of substances which are chemically identical to additives but which occur naturally in fruits and vegetables is found in a number of cases to be higher than the intake of additives added during the manufacture of foodstuffs. While the difficulties of contributing to the compilation of food additive intake data are recognized, industry as a whole, i.e. the food manufacturing and food additive manufacturing industries, is confident that, in a concerted effort, industry data on food additive use can be made available. Lastly, the paper points out that between the transposition of the additives directives into national legislation and the time by which the food industry will be able to make use of the new food legislative environment, several years will still go by; food additive use data from the food industry will thus have to be reviewed at the beginning of the next century. PMID:8792135

  12. 11 Years of Cloud Characteristics from SEVIRI: 2nd Edition of the CLAAS Dataset by CMSAF

    NASA Astrophysics Data System (ADS)

    Finkensieper, Stephan; Stengel, Martin; Fokke Meirink, Jan; van Zadelhoff, Gerd-Jan; Kniffka, Anke

    2016-04-01

    Spatiotemporal variability of clouds is an important aspect of the climate system. Climate data records of cloud properties are therefore valuable to many researchers in the climate community. The passive SEVIRI imager onboard the geostationary Meteosat Second Generation satellites is well suited to the needs of cloud retrievals, as it provides measurements in 12 spectral channels every 15 minutes and thus allows for capturing both the spatial and the temporal variability of clouds. However, requirements on climate data records are high in terms of record length and homogeneity, so that intercalibration and homogenization among the available SEVIRI instruments become a crucial factor. We present the 2nd edition of the CLoud Property DAtAset using SEVIRI (CLAAS-2), generated within the EUMETSAT Satellite Application Facility on Climate Monitoring (CMSAF), which is temporally extended and qualitatively improved compared with the 1st edition. CLAAS-2 covers the time period 2004-2014 and features cloud mask, cloud top properties, cloud phase, cloud type, and microphysical cloud properties on the complete SEVIRI disc at 15-minute temporal resolution. Temporally and spatially averaged quantities, mean diurnal cycles and monthly histograms are included as well. CLAAS-2 was derived from a homogenized data basis, obtained by intercalibrating visible and infrared SEVIRI radiances (of Meteosat 8, 9 and 10) with MODIS, using state-of-the-art retrieval schemes. In addition to the dataset characteristics, we will present validation results using CALIPSO as reference observations. The CLAAS-2 dataset will allow for a large variety of applications, some of which will be indicated in our presentation, with a focus on determining diurnal to seasonal cycles and spatially resolved frequencies of cloud properties, as well as showing the potential of CLAAS-2 data for model process studies.

  13. Developing a Resource for Implementing ArcSWAT Using Global Datasets

    NASA Astrophysics Data System (ADS)

    Taggart, M.; Caraballo Álvarez, I. O.; Mueller, C.; Palacios, S. L.; Schmidt, C.; Milesi, C.; Palmer-Moloney, L. J.

    2015-12-01

    This project developed a comprehensive user manual outlining methods for adapting and implementing global datasets for use within ArcSWAT for international and worldwide applications. The Soil and Water Assessment Tool (SWAT) is a hydrologic model that simulates a number of hydrologic variables, including runoff and the chemical makeup of water at a given location on the Earth's surface, using Digital Elevation Models (DEM), land cover, soil, and weather data. However, the application of ArcSWAT for projects outside of the United States is challenging, as there is no standard framework for inputting global datasets into ArcSWAT. This project aims to remove this obstacle by outlining methods for adapting and implementing these global datasets via the user manual. The manual takes the user through the processes of data conditioning while providing solutions and suggestions for common errors. The efficacy of the manual was explored using examples from watersheds located in Puerto Rico, Mexico and Western Africa. Each run explored the various options for setting up an ArcSWAT project as well as a range of satellite data products and soil databases. Future work will incorporate in-situ data for validation and calibration of the model and outline additional resources to assist future users in efficiently implementing the model for worldwide applications. The capacity to manage and monitor freshwater availability is of critical importance in both developed and developing countries. As populations grow and climate changes, both the quality and quantity of freshwater are affected, resulting in negative impacts on the health of the surrounding population. The use of hydrologic models such as ArcSWAT can help stakeholders and decision makers understand the future impacts of these changes, enabling informed and substantiated decisions.

  14. Remote web-based 3D visualization of hydrological forecasting datasets.

    NASA Astrophysics Data System (ADS)

    van Meersbergen, Maarten; Drost, Niels; Blower, Jon; Griffiths, Guy; Hut, Rolf; van de Giesen, Nick

    2015-04-01

    As the possibilities for larger and more detailed simulations of geoscientific data expand, the need for smart solutions in data visualization grows as well. Large volumes of data should be quickly accessible from anywhere in the world without the need for transferring the simulation results. We aim to provide tools for both processing and handling these large datasets. As an example, the eWaterCycle project (www.ewatercycle.org) aims to provide a running 14-day ensemble forecast to predict water-related stress around the globe. The large volumes of simulation results with uncertainty data that are generated through ensemble hydrological predictions provide a challenge for existing visualization solutions. One possible solution for this challenge lies in the use of web-enabled technology for visualization and analysis of these datasets. Web-based visualization provides an additional benefit in that it eliminates the need for any software installation and configuration, and allows for the easy communication of research results between collaborating research parties. Providing interactive tools for the exploration of these datasets will not only help in the analysis of the data by researchers, it can also aid in the dissemination of the research results to the general public. In Vienna, we will present a working open-source solution for remote visualization of large volumes of global geospatial data based on the proven open-source 3D web visualization software package Cesium (cesiumjs.org), the ncWMS software package provided by the Reading e-Science Centre, and the WebGL and NetCDF standards.

  15. enRoute: dynamic path extraction from biological pathway maps for exploring heterogeneous experimental datasets

    PubMed Central

    2013-01-01

    Jointly analyzing biological pathway maps and experimental data is critical for understanding how biological processes work in different conditions and why different samples exhibit certain characteristics. This joint analysis, however, poses a significant challenge for visualization. Current techniques are either well suited to visualize large amounts of pathway node attributes, or to represent the topology of the pathway well, but do not accomplish both at the same time. To address this, we introduce enRoute, a technique that enables analysts to specify a path of interest in a pathway, extract this path into a separate, linked view, and show detailed experimental data associated with the nodes of this extracted path right next to it. This juxtaposition of the extracted path and the experimental data allows analysts to simultaneously investigate large amounts of potentially heterogeneous data, thereby solving the problem of joint analysis of topology and node attributes. As this approach does not modify the layout of pathway maps, it is compatible with arbitrary graph layouts, including those of hand-crafted, image-based pathway maps. We demonstrate the technique in the context of pathways from the KEGG and the Wikipathways databases. We apply experimental data from two public databases, the Cancer Cell Line Encyclopedia (CCLE) and The Cancer Genome Atlas (TCGA), which both contain a wide variety of genomic datasets for a large number of samples. In addition, we make use of a smaller dataset of hepatocellular carcinoma and common xenograft models. To verify the utility of enRoute, domain experts conducted two case studies in which they explored data from the CCLE and the hepatocellular carcinoma datasets in the context of relevant pathways. PMID:24564375

  16. A High-Resolution Merged Wind Dataset for DYNAMO: Progress and Future Plans

    NASA Technical Reports Server (NTRS)

    Lang, Timothy J.; Mecikalski, John; Li, Xuanli; Chronis, Themis; Castillo, Tyler; Hoover, Kacie; Brewer, Alan; Churnside, James; McCarty, Brandi; Hein, Paul; Rutledge, Steve; Dolan, Brenda; Matthews, Alyssa; Thompson, Elizabeth

    2015-01-01

    In order to support research on optimal data assimilation methods for the Cyclone Global Navigation Satellite System (CYGNSS), launching in 2016, work has been ongoing to produce a high-resolution merged wind dataset for the Dynamics of the Madden Julian Oscillation (DYNAMO) field campaign, which took place during late 2011/early 2012. The winds are produced by assimilating DYNAMO observations into the Weather Research and Forecasting (WRF) three-dimensional variational (3DVAR) system. Data sources from the DYNAMO campaign include the upper-air sounding network, radial velocities from the radar network, vector winds from the Advanced Scatterometer (ASCAT) and Oceansat-2 Scatterometer (OSCAT) satellite instruments, the NOAA High Resolution Doppler Lidar (HRDL), and several others. In order to prepare them for 3DVAR, significant additional quality control work is being done on the currently available TOGA and SMART-R radar datasets, including automatically dealiasing radial velocities and correcting for intermittent TOGA antenna azimuth angle errors. The assimilated winds are being made available as model output fields from WRF on two separate grids with different horizontal resolutions: a 3-km grid focusing on the main DYNAMO quadrilateral (i.e., Gan Island, the R/V Revelle, the R/V Mirai, and Diego Garcia), and a 1-km grid focusing on the Revelle. The wind dataset is focused on three separate, approximately 2-week periods during the Madden Julian Oscillation (MJO) onsets that occurred in October, November, and December 2011. Work is ongoing to convert the 10-m surface winds from these model fields to simulated CYGNSS observations using the CYGNSS End-To-End Simulator (E2ES), and these simulated satellite observations are being compared to radar observations of DYNAMO precipitation systems to document the anticipated ability of CYGNSS to provide information on the relationships between surface winds and oceanic precipitation at the mesoscale level. This research will
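
    Dealiasing is the one QC step above with a compact textbook form: Doppler velocities are only measured modulo twice the Nyquist velocity, so each observation is unfolded by the integer multiple of 2*v_nyq that brings it closest to a reference wind (e.g. from a sounding). A minimal sketch of that core relation (the actual DYNAMO processing is an automated, region-based scheme):

```python
import numpy as np

def unfold_velocity(v_obs, v_ref, v_nyq):
    """Unfold aliased Doppler radial velocities: add the multiple of 2*v_nyq
    that brings each observation closest to the reference velocity v_ref."""
    n = np.round((v_ref - v_obs) / (2.0 * v_nyq))
    return v_obs + 2.0 * n * v_nyq

# A +22 m/s wind observed with a 16 m/s Nyquist velocity aliases to -10 m/s;
# with a sounding estimate of ~20 m/s it unfolds back to +22 m/s.
print(unfold_velocity(np.array([-10.0]), v_ref=20.0, v_nyq=16.0))
```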

  17. Conditional analyses on the T1DGC MHC dataset: novel associations with type 1 diabetes around HLA-G and confirmation of HLA-B.

    PubMed

    Eike, M C; Becker, T; Humphreys, K; Olsson, M; Lie, B A

    2009-01-01

    The major histocompatibility complex (MHC) is known to harbour genetic risk factors for type 1 diabetes (T1D) in addition to the class II determinants HLA-DRB1, -DQA1 and -DQB1, but strong linkage disequilibrium (LD) has made efforts to establish their location difficult. This study utilizes a dataset generated by the T1D Genetics Consortium (T1DGC), with genotypes for 2965 markers across the MHC in 2321 T1D families of multiple (mostly Caucasian) ethnicities. Using a comprehensive approach consisting of complementary conditional methods and LD analyses, we identified three regions with T1D association, independent both of the known class II determinants and of each other. A subset of polymorphisms that could explain most of the association in each region included single nucleotide polymorphisms (SNPs) in the vicinity of HLA-G, particular HLA-B and HLA-DPB1 alleles, and SNPs close to the COL11A2 and RING1 genes. Apart from HLA-B and HLA-DPB1, all of these represent novel associations, and subpopulation analyses did not indicate large population-specific differences among Caucasians for our findings. On account of the unusual genetic complexity of the MHC, further fine mapping is needed, with the possible exception of HLA-B. However, our results mean that these efforts can be focused on narrow, defined regions of the MHC. PMID:18830248

  18. An additional middle cuneiform?

    PubMed Central

    Brookes-Fazakerley, S.D.; Jackson, G.E.; Platt, S.R.

    2015-01-01

    Additional cuneiform bones of the foot have been described in reference to the medial bipartite cuneiform or as small accessory ossicles. An additional middle cuneiform has not been previously documented. We present the case of a patient with an additional ossicle that has the appearance and location of an additional middle cuneiform. Recognizing such an anatomical anomaly is essential for ruling out second metatarsal base or middle cuneiform fractures and for the preoperative planning of arthrodesis or open reduction and internal fixation procedures in this anatomical location. PMID:26224890

  19. The Path from Large Earth Science Datasets to Information

    NASA Astrophysics Data System (ADS)

    Vicente, G. A.

    2013-12-01

    The NASA Goddard Earth Sciences Data (GES) and Information Services Center (DISC) is one of the major Science Mission Directorate (SMD) centers for archiving and distributing Earth Science remote sensing data, products and services. This virtual portal provides convenient access to Atmospheric Composition and Dynamics, Hydrology, Precipitation, Ozone, and model-derived datasets (generated by GSFC's Global Modeling and Assimilation Office), and to the North American Land Data Assimilation System (NLDAS) and Global Land Data Assimilation System (GLDAS) data products (both generated by GSFC's Hydrological Sciences Branch). This presentation demonstrates various tools and computational technologies developed in the GES DISC to manage the huge volume of data and products acquired from various missions and programs over the years. It explores approaches to archive, document, distribute, access and analyze Earth Science data and information, and addresses the technical and scientific issues, governance, and user support problems faced by scientists in need of multi-disciplinary datasets. It also discusses data and product metrics, user distribution profiles and lessons learned through interactions with the science communities around the world. Finally, it demonstrates some of the most used data and product visualization and analysis tools developed and maintained by the GES DISC.

  20. Introducing A Global Dataset Of Open Permanent Water Bodies

    NASA Astrophysics Data System (ADS)

    Santoro, Maurizio; Lamarche, Celine; Bontemps, Sophie; Wegmuller, Urs; Kalogirou, Vasileios; Arino, Oliver; Defourny, Pierre

    2013-12-01

    This paper introduces a 300-m global map of open permanent water bodies derived from multi-temporal synthetic aperture radar (SAR) data. The SAR dataset consisted of images of the radar backscatter acquired by Envisat Advanced SAR (ASAR) in Wide Swath Mode (WSM, 150 m spatial resolution) between 2005 and 2010. Extended time series of WSM to 2012, Image Mode Medium resolution (IMM) and Global Monitoring Mode (GMM) data have been used to fill gaps. Using as input the temporal variability (TV) of the backscatter and the minimum backscatter (MB), a SAR-based indicator of water bodies (SAR-WBI) has been generated for all continents with a previously validated thresholding algorithm and local refinements. The accuracy of the SAR-WBI is 80%; a threshold of 50% has been used for the land/water fraction in the case of mixed pixels. Correction of inconsistencies with respect to auxiliary datasets, completion of gaps and aggregation to 300 m were applied to obtain the final global water body map, referred to as the Climate Change Initiative Land Cover Water Body (CCI-LC WB) Product.
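
    The classification logic is straightforward: open water is both dark (low minimum backscatter) and stable (low temporal variability). A minimal sketch of such a thresholding rule; the threshold values below are illustrative placeholders, not the algorithm's calibrated values.

```python
import numpy as np

def water_body_indicator(mb_db, tv_db, mb_thresh=-16.0, tv_thresh=1.5):
    """Flag pixels as open water where the multi-temporal minimum backscatter
    (mb_db, in dB) is low and the temporal variability (tv_db, in dB) is small.
    Both thresholds here are illustrative, not the validated values."""
    return (mb_db < mb_thresh) & (tv_db < tv_thresh)
```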

  1. Robust Computational Analysis of rRNA Hypervariable Tag Datasets

    PubMed Central

    Sipos, Maksim; Jeraldo, Patricio; Chia, Nicholas; Qu, Ani; Dhillon, A. Singh; Konkel, Michael E.; Nelson, Karen E.; White, Bryan A.; Goldenfeld, Nigel

    2010-01-01

    Next-generation DNA sequencing is increasingly being utilized to probe microbial communities, such as gastrointestinal microbiomes, where it is important to be able to quantify measures of abundance and diversity. The fragmented nature of the 16S rRNA datasets obtained, coupled with their unprecedented size, has led to the recognition that the results of such analyses are potentially contaminated by a variety of artifacts, both experimental and computational. Here we quantify how multiple alignment and clustering errors contribute to overestimates of abundance and diversity, reflected by incorrect OTU assignment, corrupted phylogenies, inaccurate species diversity estimators, and rank abundance distribution functions. We show that straightforward procedural optimizations, combining preexisting tools, are effective in handling large 16S rRNA datasets, and we describe metrics to measure the effectiveness and quality of the estimators obtained. We introduce two metrics to ascertain the quality of clustering of pyrosequenced rRNA data, and show that complete linkage clustering greatly outperforms other widely used methods. PMID:21217830
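
    The clustering step that the paper finds decisive, complete linkage on pairwise sequence distances with a fixed OTU cutoff, is a standard hierarchical-clustering operation. A minimal sketch (the 3% cutoff is the conventional species-level choice, used here for illustration):

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def cluster_otus(dist_matrix, cutoff=0.03):
    """Complete-linkage clustering of pairwise sequence distances into OTUs.
    With method='complete', every pair of sequences within an OTU differs by
    at most `cutoff` (e.g. 3% for species-level OTUs)."""
    condensed = squareform(np.asarray(dist_matrix), checks=False)
    tree = linkage(condensed, method="complete")
    return fcluster(tree, t=cutoff, criterion="distance")
```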

  2. Sodankylä ionospheric tomography dataset 2003-2014

    NASA Astrophysics Data System (ADS)

    Norberg, J.; Roininen, L.; Kero, A.; Raita, T.; Ulich, T.; Markkanen, M.; Juusola, L.; Kauristie, K.

    2015-12-01

    Sodankylä Geophysical Observatory has been operating a tomographic receiver network and collecting the produced data since 2003. The collected dataset consists of phase difference curves measured from Russian COSMOS dual-frequency (150/400 MHz) low-Earth-orbit satellite signals, and tomographic electron density reconstructions obtained from these measurements. In this study, vertical total electron content (VTEC) values are integrated from the reconstructed electron densities to make a qualitative and quantitative analysis and to validate the long-term performance of the tomographic system. During the observation period, 2003-2014, there were three to five operational stations in the Fenno-Scandinavian sector. Altogether the analysis consists of around 66 000 overflights, but to ensure the quality of the reconstructions, the examination is limited to descending (north to south) overflights with a maximum elevation over 60°. These constraints limit the number of overflights to around 10 000. Based on this dataset, one solar cycle of ionospheric vertical total electron content estimates is constructed. The measurements are compared against the International Reference Ionosphere (IRI-2012) model, the F10.7 solar flux index, and sunspot number data. Qualitatively, the tomographic VTEC estimates correspond to the reference data very well, but the IRI-2012 model values are on average 40% higher than the tomographic results.

  3. High-throughput concentration-response analysis for omics datasets.

    PubMed

    Smetanová, Soňa; Riedl, Janet; Zitzkat, Dimitar; Altenburger, Rolf; Busch, Wibke

    2015-09-01

    Omics-based methods are increasingly used in current ecotoxicology. Therefore, a large number of observations for various toxic substances and organisms are available and may be used for identifying modes of action, adverse outcome pathways, or novel biomarkers. For these purposes, good statistical analysis of toxicogenomic data is vital. In contrast to established ecotoxicological techniques, concentration-response modeling is rarely used for large datasets. Instead, statistical hypothesis testing is prevalent, which provides only a limited scope for inference. The present study therefore applied automated concentration-response modeling to 3 different ecotoxicotranscriptomic and ecotoxicometabolomic datasets. The modeling process was performed by simultaneously applying 9 different regression models, representing distinct mechanistic, toxicological, and statistical ideas that result in different curve shapes. The best-fitting models were selected by using Akaike's information criterion. The linear and exponential models represented the best data description for more than 50% of responses. Models generating U-shaped curves were frequently selected for transcriptomic signals (30%), and sigmoid models were identified as best fit for many metabolomic signals (21%). Thus, selecting the models from an array of different types seems appropriate, because concentration-response functions may vary with the observed response type and also depend on the compound, the organism, and the investigated concentration and exposure duration range. The application of concentration-response models can help to further tap the potential of omics data and is a necessary step for quantitative mixture effect assessment at the molecular response level. PMID:25900799
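
    As an aside for readers who want to reproduce the model-selection step, the sketch below fits a few candidate concentration-response models and selects the best one by Akaike's information criterion. It is a minimal Python illustration under stated assumptions: the data are hypothetical, and only three candidate models stand in for the paper's set of nine.

      # Minimal sketch of AIC-based concentration-response model selection.
      # Candidate set and data are hypothetical, not the paper's exact models.
      import numpy as np
      from scipy.optimize import curve_fit

      def linear(c, a, b):
          return a + b * c

      def exponential(c, a, b):
          return a * np.exp(b * c)

      def sigmoid(c, top, ec50, slope):
          return top / (1.0 + (ec50 / np.maximum(c, 1e-12)) ** slope)

      def aic(y, y_hat, n_params):
          # Gaussian-error AIC: n * log(RSS / n) + 2k
          n = len(y)
          rss = np.sum((y - y_hat) ** 2)
          return n * np.log(rss / n) + 2 * n_params

      conc = np.array([0.1, 0.3, 1.0, 3.0, 10.0, 30.0])      # hypothetical
      resp = np.array([0.02, 0.05, 0.20, 0.55, 0.90, 0.98])  # hypothetical

      candidates = {"linear": (linear, [0.0, 0.1]),
                    "exponential": (exponential, [0.05, 0.1]),
                    "sigmoid": (sigmoid, [1.0, 2.0, 1.0])}

      scores = {}
      for name, (f, p0) in candidates.items():
          try:
              popt, _ = curve_fit(f, conc, resp, p0=p0, maxfev=10000)
              scores[name] = aic(resp, f(conc, *popt), len(popt))
          except RuntimeError:
              scores[name] = np.inf  # fit failed to converge
      print("best-fitting model by AIC:", min(scores, key=scores.get))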

  4. Testing the Neutral Theory of Biodiversity with Human Microbiome Datasets.

    PubMed

    Li, Lianwei; Ma, Zhanshan Sam

    2016-01-01

    The human microbiome project (HMP) has made it possible to test important ecological theories for arguably the most important ecosystem to human health-the human microbiome. The limited number of existing studies has reported conflicting evidence regarding the neutral theory; the present study aims to comprehensively test the neutral theory with extensive HMP datasets covering all five major body sites inhabited by the human microbiome. Utilizing 7437 datasets of bacterial community samples, we discovered that only 49 communities (less than 1%) satisfied the neutral theory, and concluded that human microbial communities are not neutral in general. The 49 positive cases, although only a tiny minority, do demonstrate the existence of neutral processes. We realize that the traditional doctrine of microbial biogeography, "Everything is everywhere, but the environment selects", first proposed by Baas-Becking, resolves the apparent contradiction. The first part of the Baas-Becking doctrine states that microbes are not dispersal-limited and are therefore prone to neutrality, and the second part states that the freely dispersed microbes must endure selection by the environment. Therefore, in most cases, it is the host environment that ultimately shapes the community assembly and tips the human microbiome toward the niche regime. PMID:27527985

  5. Biofuel Enduse Datasets from the Bioenergy Knowledge Discovery Framework (KDF)

    DOE Data Explorer

    The Bioenergy Knowledge Discovery Framework invites users to discover the power of bioenergy through an interface that provides extensive access to research data and literature, GIS mapping tools, and collaborative networks. The Bioenergy KDF supports efforts to develop a robust and sustainable bioenergy industry. The KDF facilitates informed decision making by providing a means to synthesize, analyze, and visualize vast amounts of information in a relevant and succinct manner. It harnesses Web 2.0 and social networking technologies to build a collective knowledge system that can better examine the economic and environmental impacts of development options for biomass feedstock production, biorefineries, and related infrastructure. [copied from https://www.bioenergykdf.net/content/about]

    Holdings include datasets, models, and maps. This is a very new resource, but the collections will grow due to both DOE contributions and individuals' data uploads. Currently the Biofuel Enduse collection includes 133 items. Most of these are categorized as literature, but 36 are listed as datasets and ten as models.

  6. Using Benford's law to investigate Natural Hazard dataset homogeneity.

    PubMed

    Joannes-Boyau, Renaud; Bodin, Thomas; Scheffers, Anja; Sambridge, Malcolm; May, Simon Matthias

    2015-01-01

    Working with a large temporal dataset spanning several decades often represents a challenging task, especially when the record is heterogeneous and incomplete. The use of statistical laws could potentially overcome these problems. Here we apply Benford's Law (also called the "First-Digit Law") to the traveled distances of tropical cyclones since 1842. The record of tropical cyclones has been extensively impacted by improvements in detection capabilities over the past decades. We have found that, while the first-digit distribution for the entire record follows Benford's Law prediction, specific changes such as satellite detection have had serious impacts on the dataset. The least-squares misfit measure is used as a proxy to observe temporal variations, allowing us to assess data quality and homogeneity over the entire record, and at the same time over specific periods. Such information is crucial when running climatic models, and Benford's Law could potentially be used to overcome and correct for data heterogeneity and/or to select the most appropriate part of the record for detailed studies. PMID:26156060
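
    The first-digit test described above is straightforward to reproduce. The following Python sketch computes a least-squares misfit between observed first-digit frequencies and Benford's Law; the lognormal "distances" stand in for a real cyclone record and are purely hypothetical.

      # Compare first-digit frequencies of a dataset against Benford's Law;
      # the data here are synthetic, not the actual cyclone record.
      import numpy as np

      def first_digits(values):
          v = np.abs(np.asarray(values, dtype=float))
          v = v[v > 0]
          # Scale each value into [1, 10) so its integer part is the first digit.
          return (v / 10.0 ** np.floor(np.log10(v))).astype(int)

      def benford_misfit(values):
          digits = first_digits(values)
          observed = np.array([(digits == d).mean() for d in range(1, 10)])
          expected = np.log10(1 + 1 / np.arange(1, 10))  # Benford probabilities
          return np.sum((observed - expected) ** 2)      # least-squares misfit

      rng = np.random.default_rng(0)
      distances = rng.lognormal(mean=6, sigma=1, size=5000)  # hypothetical record
      print(f"least-squares misfit to Benford's Law: {benford_misfit(distances):.5f}")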

  7. Development of a Watershed Boundary Dataset for Mississippi

    USGS Publications Warehouse

    Van Wilson, K., Jr.; Clair, Michael G., II; Turnipseed, D. Phil; Rebich, Richard A.

    2009-01-01

    The U.S. Geological Survey, in cooperation with the Mississippi Department of Environmental Quality, U.S. Department of Agriculture-Natural Resources Conservation Service, Mississippi Department of Transportation, U.S. Department of Agriculture-Forest Service, and the Mississippi Automated Resource Information System, developed a 1:24,000-scale Watershed Boundary Dataset for Mississippi including watershed and subwatershed boundaries, codes, names, and drainage areas. The Watershed Boundary Dataset for Mississippi provides a standard geographical framework for water-resources and selected land-resources planning. The original 8-digit subbasins (hydrologic unit codes) were further subdivided into 10-digit watersheds and 12-digit subwatersheds - the exceptions are the Lower Mississippi River Alluvial Plain (known locally as the Delta) and the Mississippi River inside levees, which were only subdivided into 10-digit watersheds. Also, large water bodies in the Mississippi Sound along the coast were not delineated as small as a typical 12-digit subwatershed. All of the data - including watershed and subwatershed boundaries, hydrologic unit codes and names, and drainage-area data - are stored in a Geographic Information System database.

  8. Condensing Massive Satellite Datasets For Rapid Interactive Analysis

    NASA Astrophysics Data System (ADS)

    Grant, G.; Gallaher, D. W.; Lv, Q.; Campbell, G. G.; Fowler, C.; LIU, Q.; Chen, C.; Klucik, R.; McAllister, R. A.

    2015-12-01

    Our goal is to enable users to interactively analyze massive satellite datasets, identifying anomalous data or values that fall outside of thresholds. To achieve this, the project seeks to create a derived database containing only the most relevant information, accelerating the analysis process. The database is designed to be an ancillary tool for the researcher, not an archival database to replace the original data. This approach is aimed at improving performance by reducing the overall size by way of condensing the data. The primary challenges of the project include:

    - The nature of the research question(s) may not be known ahead of time.
    - The thresholds for determining anomalies may be uncertain.
    - Problems associated with processing cloudy, missing, or noisy satellite imagery.
    - The contents and method of creation of the condensed dataset must be easily explainable to users.

    The architecture of the database will reorganize spatially-oriented satellite imagery into temporally-oriented columns of data (a.k.a. "data rods") to facilitate time-series analysis. The database itself is an open-source parallel database, designed to make full use of clustered server technologies. A demonstration of the system capabilities will be shown. Applications for this technology include quick-look views of the data, as well as the potential for on-board satellite processing of essential information, with the goal of reducing data latency.
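
    The "data rods" reorganization described above amounts to a transpose from image-ordered to pixel-ordered storage. A minimal Python sketch, with a synthetic stack standing in for real satellite imagery and an arbitrary anomaly threshold:

      # Reorganize a (time, y, x) stack of grids into per-pixel "data rods"
      # (pixel, time) for fast time-series scans; data and threshold are fake.
      import numpy as np

      n_time, ny, nx = 365, 180, 360
      stack = np.random.default_rng(1).normal(size=(n_time, ny, nx))

      # One reshape-transpose turns image-ordered storage into pixel rods.
      rods = stack.reshape(n_time, ny * nx).T          # shape (ny*nx, n_time)

      # Time-series analysis is now a row operation: flag pixels whose series
      # ever exceeds a 4-sigma threshold.
      anomalous = np.flatnonzero(np.any(np.abs(rods) > 4.0, axis=1))
      print(f"{anomalous.size} of {ny * nx} pixel rods contain anomalies")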

  9. A new compression format for fiber tracking datasets.

    PubMed

    Presseau, Caroline; Jodoin, Pierre-Marc; Houde, Jean-Christophe; Descoteaux, Maxime

    2015-04-01

    A single diffusion MRI streamline fiber tracking dataset may contain hundreds of thousands, and often millions, of streamlines and can take up to several gigabytes of memory. This amount of data is not only heavy to compute, but also difficult to visualize and hard to store on disk (especially when dealing with a collection of brains). These problems call for a fiber-specific compression format that simplifies its manipulation. As of today, no fiber compression format has yet been adopted, and the need for one is becoming an issue for future connectomics research. In this work, we propose a new compression format, .zfib, for streamline tractography datasets reconstructed from diffusion magnetic resonance imaging (dMRI). Tracts contain a large amount of redundant information and are relatively smooth; hence, they are highly compressible. The proposed method is a processing pipeline containing a linearization, a quantization, and an encoding step. Our pipeline is tested and validated under a wide range of DTI and HARDI tractography configurations (step size, streamline number, deterministic and probabilistic tracking) and compression options. Similar to JPEG, the user has one parameter to select: a worst-case maximum tolerance error in millimeters (mm). Overall, we find a compression factor of more than 96% for a maximum error of 0.1 mm without any perceptual change or change of diffusion statistics (mean fractional anisotropy and mean diffusivity) along bundles. This opens new opportunities for connectomics and tractometry applications. PMID:25592997
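
    To make the error-bounded quantization step concrete, here is a minimal Python sketch: snapping streamline coordinates to a uniform grid whose spacing guarantees that the worst-case reconstruction error stays below the user's tolerance. It illustrates the principle only and is not the published .zfib pipeline (which adds linearization and entropy encoding).

      # Snap streamline points to a grid whose spacing bounds the worst-case
      # rounding error by the chosen tolerance; illustrative only, not .zfib.
      import numpy as np

      def quantize(points_mm, max_error_mm=0.1):
          step = 2.0 * max_error_mm            # rounding error <= step/2 = tol
          return np.round(points_mm / step).astype(np.int32), step

      def dequantize(indices, step):
          return indices * step

      # A fake smooth streamline: a 3-D random walk.
      streamline = np.cumsum(np.random.default_rng(2).normal(size=(1000, 3)), axis=0)
      idx, step = quantize(streamline, max_error_mm=0.1)
      err = np.abs(dequantize(idx, step) - streamline).max()
      print(f"worst-case reconstruction error: {err:.4f} mm")   # <= 0.1 mm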

  10. New datasets and services for studying magnetospheric plasma processes

    NASA Astrophysics Data System (ADS)

    Laakso, H.; Perry, C.; Taylor, M.; Escoubet, C. P.

    2009-04-01

    The four-satellite Cluster mission investigates the small-scale structures and physical processes related to the interaction between the solar wind and the magnetospheric plasma. The mission has collected observations since 2001 and has been approved to operate until 2012. The Cluster Active Archive (CAA) (URL: http://caa.estec.esa.int) will contain the entire set of Cluster high-resolution data and other allied products in a standard format and with a complete set of metadata in machine-readable format. Currently there are more than 200 datasets from each spacecraft. The total amount of data files in compressed format is expected to exceed 50 TB. Later this year, the system will also provide access to the observations of the two Double Star spacecraft. The data archive is publicly accessible and suitable for science use and publication by the worldwide scientific community. The CAA became operational in February 2006 and now has more than 700 registered users. The CAA provides user-friendly services for searching and accessing these data and ancillary products, as well as for visualizing some of the scientific parameters. The CAA is continuously being upgraded in terms of datasets and services. This presentation first gives a quick overview of the CAA and then concentrates on recent updates to the overall system and its services.

  11. Overview of the CERES Edition-4 Multilayer Cloud Property Datasets

    NASA Astrophysics Data System (ADS)

    Chang, F. L.; Minnis, P.; Sun-Mack, S.; Chen, Y.; Smith, R. A.; Brown, R. R.

    2014-12-01

    Knowledge of the cloud vertical distribution is important for understanding the role of clouds in the earth's radiation budget and climate change. Since high-level cirrus clouds with low emission temperatures and small optical depths can provide a positive feedback to the climate system, and low-level stratus clouds with high emission temperatures and large optical depths can provide a negative feedback, the retrieval of multilayer cloud properties using satellite observations, such as those from Terra and Aqua MODIS, is critically important for a variety of cloud and climate applications. For the objective of the Clouds and the Earth's Radiant Energy System (CERES), new algorithms have been developed using Terra and Aqua MODIS data to allow separate retrievals of cirrus and stratus cloud properties when the two dominant cloud types are simultaneously present in a multilayer system. In this paper, we present an overview of the new CERES Edition-4 multilayer cloud property datasets derived from Terra as well as Aqua. Assessment of the new CERES multilayer cloud datasets will include high-level cirrus and low-level stratus cloud heights, pressures, and temperatures, as well as their optical depths, emissivities, and microphysical properties.

  12. Collaboration tools and techniques for large model datasets

    USGS Publications Warehouse

    Signell, R.P.; Carniel, S.; Chiggiato, J.; Janekovic, I.; Pullen, J.; Sherwood, C.R.

    2008-01-01

    In MREA and many other marine applications, it is common to have multiple models running with different grids, run by different institutions. Techniques and tools are described for low-bandwidth delivery of data from large multidimensional datasets, such as those from meteorological and oceanographic models, directly into generic analysis and visualization tools. Output is stored using the NetCDF CF Metadata Conventions and then delivered to collaborators over the web via OPeNDAP. OPeNDAP datasets served by different institutions are then organized via THREDDS catalogs. Tools and procedures are described that enable scientists to explore data on the original model grids using tools they are familiar with. The approach is also low-bandwidth, enabling users to extract just the data they require, an important feature for access from ships or remote areas. The entire implementation is simple enough to be handled by modelers working with their webmasters - no advanced programming support is necessary.
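
    The access pattern described above can be reproduced in a few lines. The sketch below uses xarray to open a remote OPeNDAP endpoint lazily and transfer only a small subset; the URL, variable name, and coordinate names are hypothetical placeholders for a real THREDDS-served model output.

      # Lazily open a remote OPeNDAP dataset and pull only a small subset.
      # URL, variable, and coordinate names are hypothetical placeholders.
      import xarray as xr

      url = "https://example.org/thredds/dodsC/ocean_model/output.nc"
      ds = xr.open_dataset(url)          # lazy: no bulk data transferred yet

      # Only this slice crosses the network, not the whole multi-GB file.
      subset = ds["temperature"].isel(time=-1).sel(
          lat=slice(40, 45), lon=slice(12, 18)).load()
      print(subset.shape)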

  13. Digital Astronaut Photography: A Discovery Dataset for Archaeology

    NASA Technical Reports Server (NTRS)

    Stefanov, William L.

    2010-01-01

    Astronaut photography acquired from the International Space Station (ISS) using commercial off-the-shelf cameras offers a freely accessible source of high to very high resolution (4-20 m/pixel) visible-wavelength digital data of Earth. Since ISS Expedition 1 in 2000, over 373,000 images of the Earth-Moon system (including land surface, ocean, atmospheric, and lunar images) have been added to the Gateway to Astronaut Photography of Earth online database (http://eol.jsc.nasa.gov). Handheld astronaut photographs vary in look angle, time of acquisition, solar illumination, and spatial resolution. These attributes of digital astronaut photography result from a unique combination of ISS orbital dynamics, mission operations, camera systems, and the individual skills of the astronaut. The variable nature of astronaut photography makes the dataset uniquely useful for archaeological applications in comparison with more traditional nadir-viewing multispectral datasets acquired from unmanned orbital platforms. For example, surface features such as trenches, walls, ruins, urban patterns, and vegetation clearing and regrowth patterns may be accentuated by low sun angles and oblique viewing conditions (Fig. 1). High spatial resolution digital astronaut photographs can also be used with sophisticated land cover classification and spatial analysis approaches like Object Based Image Analysis, increasing the potential for use in archaeological characterization of landscapes and specific sites.

  14. Interactive exploration of implicit and explicit relations in faceted datasets.

    PubMed

    Zhao, Jian; Collins, Christopher; Chevalier, Fanny; Balakrishnan, Ravin

    2013-12-01

    Many datasets, such as scientific literature collections, contain multiple heterogeneous facets which derive implicit relations, as well as explicit relational references between data items. The exploration of this data is challenging not only because of large data scales but also because of the complexity of resource structures and semantics. In this paper, we present PivotSlice, an interactive visualization technique which provides efficient faceted browsing as well as flexible capabilities to discover data relationships. With the metaphor of direct manipulation, PivotSlice allows the user to visually and logically construct a series of dynamic queries over the data, based on a multi-focus and multi-scale tabular view that subdivides the entire dataset into several meaningful parts with customized semantics. PivotSlice further facilitates the visual exploration and sensemaking process through features including live search and integration of online data, graphical interaction histories and smoothly animated visual state transitions. We evaluated PivotSlice through a qualitative lab study with university researchers and report the findings from our observations and interviews. We also demonstrate the effectiveness of PivotSlice using a scenario of exploring a repository of information visualization literature. PMID:24051774

  15. Periodicity detection method for small-sample time series datasets.

    PubMed

    Tominaga, Daisuke

    2010-01-01

    Time series of gene expression often exhibit periodic behavior under the influence of multiple signal pathways, and are represented by a model that incorporates multiple harmonics and noise. Most of these data, which are observed using DNA microarrays, consist of few sampling points in time, but most periodicity detection methods require a relatively large number of sampling points. We have previously developed a detection algorithm based on the discrete Fourier transform and Akaike's information criterion. Here we demonstrate the performance of the algorithm for small-sample time series data through a comparison with conventional and newly proposed periodicity detection methods based on a statistical analysis of the power of harmonics. We show that this method has higher sensitivity for data consisting of multiple harmonics, and is more robust against noise than other methods. Although "combinatorial explosion" occurs for large datasets, the computational time is not a problem for small-sample datasets. The MATLAB/GNU Octave script of the algorithm is available on the author's web site: http://www.cbrc.jp/%7Etominaga/piccolo/. PMID:21151841
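
    To sketch the flavor of DFT-plus-AIC periodicity detection on a short series, the following Python example compares a noise-only model against models retaining k dominant harmonics and picks the k minimizing AIC. It is an illustration of the general idea under simplifying assumptions, not the author's published algorithm.

      # Illustrative DFT + AIC periodicity check for a short time series;
      # hypothetical data, not the author's published algorithm.
      import numpy as np

      def harmonic_aic(y, k):
          """AIC of a model keeping the k strongest harmonics of y."""
          n = len(y)
          spec = np.fft.rfft(y - y.mean())
          power = np.abs(spec) ** 2
          keep = np.argsort(power[1:])[::-1][:k] + 1    # k strongest harmonics
          model = np.zeros_like(spec)
          model[keep] = spec[keep]
          resid = (y - y.mean()) - np.fft.irfft(model, n)
          rss = np.sum(resid ** 2)
          return n * np.log(rss / n) + 2 * (2 * k + 1)  # 2 params per harmonic

      t = np.arange(12)                                 # few sampling points
      y = np.sin(2 * np.pi * t / 6) + 0.2 * np.random.default_rng(3).normal(size=12)
      aics = {k: harmonic_aic(y, k) for k in (0, 1, 2, 3)}
      best_k = min(aics, key=aics.get)
      print("periodic" if best_k > 0 else "aperiodic", "best k =", best_k)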

  16. Testing the Neutral Theory of Biodiversity with Human Microbiome Datasets

    PubMed Central

    Li, Lianwei; Ma, Zhanshan (Sam)

    2016-01-01

    The human microbiome project (HMP) has made it possible to test important ecological theories for arguably the most important ecosystem to human health—the human microbiome. The limited number of existing studies has reported conflicting evidence regarding the neutral theory; the present study aims to comprehensively test the neutral theory with extensive HMP datasets covering all five major body sites inhabited by the human microbiome. Utilizing 7437 datasets of bacterial community samples, we discovered that only 49 communities (less than 1%) satisfied the neutral theory, and concluded that human microbial communities are not neutral in general. The 49 positive cases, although only a tiny minority, do demonstrate the existence of neutral processes. We realize that the traditional doctrine of microbial biogeography, “Everything is everywhere, but the environment selects”, first proposed by Baas-Becking, resolves the apparent contradiction. The first part of the Baas-Becking doctrine states that microbes are not dispersal-limited and are therefore prone to neutrality, and the second part states that the freely dispersed microbes must endure selection by the environment. Therefore, in most cases, it is the host environment that ultimately shapes the community assembly and tips the human microbiome toward the niche regime. PMID:27527985

  17. Exploitation of a large COSMO-SkyMed interferometric dataset

    NASA Astrophysics Data System (ADS)

    Nutricato, Raffaele; Nitti, Davide O.; Bovenga, Fabio; Refice, Alberto; Chiaradia, Maria T.

    2014-10-01

    In this work we explored a dataset made up of more than 100 images acquired by the COSMO-SkyMed (CSK) constellation over the Port-au-Prince (Haiti) metropolitan and surrounding areas, which were severely hit by the January 12th, 2010 earthquake. The images were acquired along an ascending pass by all four sensors of the constellation with a mean rate of 1 acquisition/week. This consistent CSK dataset was fully exploited by using the Persistent Scatterer Interferometry algorithm SPINUA with the aim of: i) providing a displacement map of the area; ii) assessing the use of CSK and PSI for ground elevation measurements; iii) exploring the CSK satellite orbital tube in terms of both precision and size. In particular, significant subsidence phenomena were detected affecting river deltas and coastal areas of the Port-au-Prince and Carrefour region, as well as very slow slope movements and local ground instabilities. Ground elevation was also measured on PS targets with a resolution of 3 m. The density of these measurable targets depends on the ground coverage, and reaches values higher than 4000 PS/km2 over urban areas, while it drops over vegetated areas or along slopes affected by layover and shadow. Height values were compared with LIDAR data at 1 m resolution collected soon after the 2010 earthquake. Furthermore, by using geocoding procedures and the precise LIDAR data as reference, the orbital errors affecting CSK records were investigated. The results are in line with other recent studies.

  18. Applicability of AgMERRA Forcing Dataset to Fill Gaps in Historical in-situ Meteorological Data

    NASA Astrophysics Data System (ADS)

    Bannayan, M.; Lashkari, A.; Zare, H.; Asadi, S.; Salehnia, N.

    2015-12-01

    Integrated assessment studies of food production systems use crop models to simulate the effects of climate and socio-economic changes on food security. Climate forcing data are one of the key inputs to crop models. This study evaluated the performance of the AgMERRA climate forcing dataset in filling gaps in historical in-situ meteorological data for different climatic regions of Iran. The AgMERRA dataset was intercompared with an in-situ observational dataset of daily maximum and minimum temperature and precipitation over the 1980-2010 period via the root mean square error (RMSE), mean absolute error (MAE), and mean bias error (MBE) for 17 stations in four climatic regions: humid and moderate; cold; dry and arid; and hot and humid. Moreover, probability distribution functions and cumulative distribution functions were compared between model and observed data. The agreement measures demonstrated small errors in the model data for all stations. Except for stations located in cold regions, the model data showed a slight under-prediction of daily maximum temperature and precipitation; however, it was not significant. In addition, the probability distribution functions and cumulative distribution functions showed the same trend for all stations between model and observed data. Therefore, the AgMERRA dataset is highly reliable for filling gaps in historical observations in different climatic regions of Iran, and it could also serve as a basis for future climate scenarios.
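
    The three agreement measures used in this study are simple to compute. A minimal Python sketch, with synthetic series standing in for station observations and AgMERRA values:

      # RMSE, MAE, and MBE between observed and modeled series; the two
      # series below are hypothetical stand-ins for station and AgMERRA data.
      import numpy as np

      def rmse(obs, mod):
          return np.sqrt(np.mean((mod - obs) ** 2))

      def mae(obs, mod):
          return np.mean(np.abs(mod - obs))

      def mbe(obs, mod):
          # Negative values indicate under-prediction by the model.
          return np.mean(mod - obs)

      rng = np.random.default_rng(4)
      obs = 15 + 10 * np.sin(np.linspace(0, 2 * np.pi, 365))   # fake station data
      mod = obs + rng.normal(-0.3, 1.5, size=obs.size)         # fake model data
      print(f"RMSE={rmse(obs, mod):.2f}  MAE={mae(obs, mod):.2f}  "
            f"MBE={mbe(obs, mod):.2f}")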

  19. Device-independent quantum key distribution

    NASA Astrophysics Data System (ADS)

    Hänggi, Esther

    2010-12-01

    In this thesis, we study two approaches to achieve device-independent quantum key distribution: in the first approach, the adversary can distribute any system to the honest parties that cannot be used to communicate between the three of them, i.e., it must be non-signalling. In the second approach, we limit the adversary to strategies which can be implemented using quantum physics. For both approaches, we show how device-independent quantum key distribution can be achieved when imposing an additional condition. In the non-signalling case this additional requirement is that communication is impossible between all pairwise subsystems of the honest parties, while, in the quantum case, we demand that measurements on different subsystems must commute. We give a generic security proof for device-independent quantum key distribution in these cases and apply it to an existing quantum key distribution protocol, thus proving its security even in this setting. We also show that, without any such additional restriction, there always exists a successful joint attack by a non-signalling adversary.

  20. Carbamate deposit control additives

    SciTech Connect

    Honnen, L.R.; Lewis, R.A.

    1980-11-25

    Deposit control additives for internal combustion engines are provided which maintain the cleanliness of intake systems without contributing to combustion chamber deposits. The additives are poly(oxyalkylene) carbamates comprising a hydrocarbyloxy-terminated poly(oxyalkylene) chain of 2-5 carbon oxyalkylene units bonded through an oxycarbonyl group to a nitrogen atom of ethylenediamine.

  1. Independently Controlled Wing Stroke Patterns in the Fruit Fly Drosophila melanogaster

    PubMed Central

    Chakraborty, Soma; Bartussek, Jan; Fry, Steven N.; Zapotocky, Martin

    2015-01-01

    Flies achieve supreme flight maneuverability through a small set of minuscule steering muscles attached to the wing base. The fast flight maneuvers arise from precisely timed activation of the steering muscles and the resulting subtle modulation of the wing stroke. In addition, slower modulation of wing kinematics arises from changes in the activity of indirect flight muscles in the thorax. We investigated if these modulations can be described as a superposition of a limited number of elementary deformations of the wing stroke that are under independent physiological control. Using a high-speed computer vision system, we recorded the wing motion of tethered flying fruit flies for up to 12 000 consecutive wing strokes at a sampling rate of 6250 Hz. We then decomposed the joint motion pattern of both wings into components that had the minimal mutual information (a measure of statistical dependence). In 100 flight segments measured from 10 individual flies, we identified 7 distinct types of frequently occurring least-dependent components, each defining a kinematic pattern (a specific deformation of the wing stroke and the sequence of its activation from cycle to cycle). Two of these stroke deformations can be associated with the control of yaw torque and total flight force, respectively. A third deformation involves a change in the downstroke-to-upstroke duration ratio, which is expected to alter the pitch torque. A fourth kinematic pattern consists of the alteration of stroke amplitude with a period of 2 wingbeat cycles, extending for dozens of cycles. Our analysis indicates that these four elementary kinematic patterns can be activated mutually independently, and occur both in isolation and in linear superposition. The results strengthen the available evidence for independent control of yaw torque, pitch torque, and total flight force. Our computational method facilitates systematic identification of novel patterns in large kinematic datasets. PMID:25710715
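
    Decomposing joint kinematics into least-dependent components is the problem that independent component analysis (ICA) addresses, since ICA minimizes the mutual information between components. The sketch below applies scikit-learn's FastICA to synthetic stand-ins for per-cycle wing-stroke features; it is an analogous illustration, not the authors' exact decomposition method.

      # Least-dependent components of synthetic "wing kinematics" via FastICA;
      # all data and the two-source setup are hypothetical stand-ins.
      import numpy as np
      from sklearn.decomposition import FastICA

      rng = np.random.default_rng(5)
      n_cycles = 5000
      # Two hypothetical independent sources: a yaw-like switching signal and
      # an amplitude-modulation-like signal.
      sources = np.column_stack([np.sign(np.sin(0.01 * np.arange(n_cycles))),
                                 rng.laplace(size=n_cycles)])
      mixing = np.array([[1.0, 0.4], [0.3, 1.0]])    # left/right wing mixture
      observed = sources @ mixing.T

      ica = FastICA(n_components=2, random_state=0)
      components = ica.fit_transform(observed)       # least-dependent components
      print(components.shape)                        # (5000, 2)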

  2. Marketing Handbook for Independent Schools.

    ERIC Educational Resources Information Center

    Boarding Schools, Boston, MA.

    This publication is a resource to help independent schools attract more families to their institutions and to increase the voluntary support by the larger community surrounding the school. The first chapter attempts to dispel misconceptions, define pertinent marketing terms, and relate their importance to independent schools. The rest of the book…

  3. Independent Learning Models: A Comparison.

    ERIC Educational Resources Information Center

    Wickett, R. E. Y.

    Five models of independent learning are suitable for use in adult education programs. The common factor is a facilitator who works in some way with the student in the learning process. They display different characteristics, including the extent of independence in relation to content and/or process. Nondirective tutorial instruction and learning…

  4. Synthesizing Global and Local Datasets to Estimate Jurisdictional Forest Carbon Fluxes in Berau, Indonesia

    PubMed Central

    Griscom, Bronson W.; Ellis, Peter W.; Baccini, Alessandro; Marthinus, Delon; Evans, Jeffrey S.; Ruslandi

    2016-01-01

    Background Forest conservation efforts are increasingly being implemented at the scale of sub-national jurisdictions in order to mitigate global climate change and provide other ecosystem services. We see an urgent need for robust estimates of historic forest carbon emissions at this scale, as the basis for credible measures of climate and other benefits achieved. Despite the arrival of a new generation of global datasets on forest area change and biomass, confusion remains about how to produce credible jurisdictional estimates of forest emissions. We demonstrate a method for estimating the relevant historic forest carbon fluxes within the Regency of Berau in eastern Borneo, Indonesia. Our method integrates best available global and local datasets, and includes a comprehensive analysis of uncertainty at the regency scale. Principal Findings and Significance We find that Berau generated 8.91 ± 1.99 million tonnes of net CO2 emissions per year during 2000–2010. Berau is an early frontier landscape where gross emissions are 12 times higher than gross sequestration. Yet most (85%) of Berau’s original forests are still standing. The majority of net emissions were due to conversion of native forests to unspecified agriculture (43% of total), oil palm (28%), and fiber plantations (9%). Most of the remainder was due to legal commercial selective logging (17%). Our overall uncertainty estimate offers an independent basis for assessing three other estimates for Berau; two of these were above the upper end of our uncertainty range. We emphasize the importance of including an uncertainty range for all parameters of the emissions equation to generate a comprehensive uncertainty estimate, which has not been done before. We believe comprehensive estimates of carbon flux uncertainty are increasingly important as national and international institutions are challenged with comparing alternative estimates and identifying a credible range of historic emissions values
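
    One common way to build the kind of comprehensive uncertainty estimate advocated above is Monte Carlo propagation through the emissions equation. A minimal Python sketch with hypothetical parameter distributions (loosely scaled to the Berau figures, but not the study's actual inputs):

      # Monte Carlo propagation through net = gross emissions - sequestration.
      # Parameter distributions are hypothetical, not the study's inputs.
      import numpy as np

      rng = np.random.default_rng(6)
      n = 100_000
      gross_emissions = rng.normal(9.7, 1.9, n)      # Mt CO2/yr, hypothetical
      gross_sequestration = rng.normal(0.8, 0.3, n)  # Mt CO2/yr, hypothetical
      net = gross_emissions - gross_sequestration

      lo, hi = np.percentile(net, [2.5, 97.5])
      print(f"net: {net.mean():.2f} Mt CO2/yr (95% interval {lo:.2f} to {hi:.2f})")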

  5. Global lightning NOx production estimated by an assimilation of multiple satellite datasets

    NASA Astrophysics Data System (ADS)

    Miyazaki, K.; Eskes, H. J.; Sudo, K.; Zhang, C.

    2013-11-01

    The global source of lightning-produced NOx (LNOx) is estimated by assimilating observations of NO2, O3, HNO3, and CO from multiple satellite measurements. Included are observations from the Ozone Monitoring Instrument (OMI), Microwave Limb Sounder (MLS), Tropospheric Emission Spectrometer (TES), and Measurements of Pollution in the Troposphere (MOPITT) instruments. The assimilation of multiple chemical datasets with different vertical sensitivity profiles provides comprehensive constraints on the global LNOx source while improving the representations of the entire chemical system affecting atmospheric NOx, including surface emissions and inflows from the stratosphere. The annual global LNOx source amount and NO production efficiency are estimated at 6.3 Tg N yr-1 and 350 mol NO flash-1, respectively. Sensitivity studies with perturbed satellite datasets and with varied model and data assimilation settings lead to an error estimate of about 1.4 Tg N yr-1 on this global LNOx source. These estimates are significantly different from those derived from NO2 observations alone, which may lead to an overestimate of the source adjustment. The total LNOx source is predominantly corrected by the assimilation of OMI NO2 observations, while TES and MLS observations add important constraints on the vertical source profile. The results indicate that the widely used lightning parameterization based on the C-shape assumption underestimates the source in the upper troposphere and overestimates the peak source height by up to about 1 km over land and the tropical western Pacific. Adjustments are larger over ocean than over land, suggesting that the cloud height dependence is too weak over the ocean in the Price and Rind (1992) approach. The significantly improved agreement between the analysed ozone fields and independent observations gives confidence in the performance of the LNOx source estimation.

  6. Convergent Genetic and Expression Datasets Highlight TREM2 in Parkinson's Disease Susceptibility.

    PubMed

    Liu, Guiyou; Liu, Yongquan; Jiang, Qinghua; Jiang, Yongshuai; Feng, Rennan; Zhang, Liangcai; Chen, Zugen; Li, Keshen; Liu, Jiafeng

    2016-09-01

    A rare TREM2 missense mutation (rs75932628-T) was reported to confer significant Alzheimer's disease (AD) risk. A recent study indicated no evidence of the involvement of this variant in Parkinson's disease (PD). Here, we used genetic and expression data to reinvestigate the potential association between TREM2 and PD susceptibility. In stage 1, using 10 independent studies (N = 89,157; 8787 cases and 80,370 controls), we conducted a subgroup meta-analysis. We identified a significant association between rs75932628 and PD (P = 3.10E-03, odds ratio (OR) = 3.88, 95% confidence interval (CI) 1.58-9.54) in the non-Northern Europe subgroup, and significantly increased PD risk (P = 0.01, Mann-Whitney test) in the non-Northern Europe subgroup compared with the Northern Europe subgroup. In stage 2, we used the summary results from a large-scale PD genome-wide association study (GWAS; N = 108,990; 13,708 cases and 95,282 controls) to search for other TREM2 variants contributing to PD susceptibility. We identified 14 single-nucleotide polymorphisms (SNPs) associated with PD within a 50-kb range upstream and downstream of TREM2. In stage 3, using two brain expression GWAS datasets (N = 773), we identified 6 of the 14 SNPs as regulating increased expression of TREM2. In stage 4, using whole human genome microarray data (N = 50), we further identified significantly increased expression of TREM2 in PD cases compared with controls in the human prefrontal cortex. In summary, convergent genetic and expression datasets demonstrate that TREM2 is a potent risk factor for PD and may be a therapeutic target in PD and other neurodegenerative diseases. PMID:26365049
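
    The stage-1 analysis pools per-study odds ratios. A minimal Python sketch of a fixed-effect, inverse-variance meta-analysis on the log-OR scale, with hypothetical study inputs (the actual study's per-cohort data are not reproduced here):

      # Fixed-effect, inverse-variance meta-analysis of odds ratios;
      # the three (OR, lower CI, upper CI) study entries are hypothetical.
      import numpy as np

      studies = [(4.2, 1.3, 13.5), (2.9, 0.9, 9.1), (5.1, 1.6, 16.4)]

      log_or = np.array([np.log(o) for o, _, _ in studies])
      # Recover each study's standard error from its 95% CI width on the log scale.
      se = np.array([(np.log(hi) - np.log(lo)) / (2 * 1.96) for _, lo, hi in studies])
      w = 1.0 / se ** 2                                # inverse-variance weights

      pooled = np.sum(w * log_or) / np.sum(w)
      pooled_se = np.sqrt(1.0 / np.sum(w))
      print(f"pooled OR = {np.exp(pooled):.2f} "
            f"(95% CI {np.exp(pooled - 1.96 * pooled_se):.2f}-"
            f"{np.exp(pooled + 1.96 * pooled_se):.2f})")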

  7. Monotonic Weighted Power Transformations to Additivity

    ERIC Educational Resources Information Center

    Ramsay, J. O.

    1977-01-01

    A class of monotonic transformations which generalize the power transformation is fit to the independent and dependent variables in multiple regression so that the resulting additive relationship is optimized. Examples of analysis of real and artificial data are presented. (Author/JKS)

  8. Comparison of Two U.S. Power-Plant Carbon Dioxide Emissions Datasets

    NASA Astrophysics Data System (ADS)

    Ackerman, K. V.; Sundquist, E. T.

    2006-12-01

    U.S. electric generating facilities account for 8-9 percent of global fossil-fuel CO2 emissions. Because estimates of fossil-fuel consumption and CO2 emissions are recorded at each power-plant point source, U.S. power-plant CO2 emissions may be the most thoroughly monitored globally significant source of fossil-fuel CO2 emissions. We examined two datasets for the years 1998-2000: (1) the Department of Energy/Energy Information Administration (EIA) dataset of emissions calculated from fuel data contained in the EIA electricity database files, and (2) eGRID (Emissions and Generation Resource Integrated Database), a publicly available database generated by the Environmental Protection Agency. We compared the eGRID and EIA estimates of CO2 emissions for electricity generation at power plants within the conterminous U.S. at two levels: (1) estimates for individual power-plant emissions, which allowed analysis of differences due to plant listings, calculation methods, and measurement methods; and (2) estimated conterminous U.S. totals for power-plant emissions, which allowed analysis of the aggregated effects of these individual plant differences, and assessment of the aggregated differences in the context of previously published uncertainty estimates. Comparison of data for individual plants, after removing outliers, shows the average difference (absolute value) between paired eGRID and EIA estimates to be approximately 12 percent, relative to the means of the paired estimates. Systematic differences are apparent in the eGRID and EIA reporting of emissions from combined heat and power plants. Additional differences between the eGRID and EIA datasets can be attributed to the fact that most of the emissions from the largest plants are derived from a Continuous Emissions Monitoring (CEM) system in eGRID and are calculated using fuel consumption data in the EIA dataset. This results in a conterminous U.S. total calculated by eGRID that is 3.4 to 5.8 percent
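
    The plant-level comparison metric, the absolute difference between paired estimates relative to the pair mean, can be written in a few lines. A minimal Python sketch with hypothetical plant values:

      # Paired percent difference relative to the pair mean;
      # the example emissions values are hypothetical.
      import numpy as np

      egrid = np.array([1.20, 0.85, 3.40, 0.52])  # Mt CO2, hypothetical plants
      eia = np.array([1.05, 0.90, 3.10, 0.60])

      pair_mean = (egrid + eia) / 2.0
      pct_diff = 100.0 * np.abs(egrid - eia) / pair_mean
      print(f"average plant-level difference: {pct_diff.mean():.1f}%")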

  9. Visual integration of multi-disciplinary datasets for the geophysical analysis of tectonic processes

    NASA Astrophysics Data System (ADS)

    Jacobs, A. M.; Dingler, J. A.; Brothers, D.; Kent, G. M.

    2006-12-01

    Within the scientific community, there is a growing emphasis on interdisciplinary analyses to gain a more complete understanding of how entire earth systems function. Challenges of this approach include integrating the numerous, and often disparate, datasets, while also presenting the integrated data in a manner comprehensible to a wide range of scientists. Three- and four-dimensional visualization is quickly becoming the primary tool for addressing these challenges. We frequently utilize the modular methodology of the IVS Fledermaus visualization software package to enhance our ability to better understand various geophysical datasets and the tectonic processes occurring within their respective systems. A main benefit of this software is that it allows us to generate individual visual objects from geo-referenced datasets and then combine them to form interactive, multi-dimensional visual scenes. Additionally, this visualization process is advantageous to interdisciplinary analyses because: 1) the visual objects are portable across scenes, 2) they can be easily exchanged between scientists to build new user-specific scenes, and 3) both the objects and scenes can be viewed using the full software package or the free viewer, iView3D, on any modern computer operating system (i.e., Mac OSX, Windows, Linux). Here we present examples of Fledermaus and how we have used visualization to better "see" oceanic, coastal, and continental tectonic environments. In one visualization, bathymetric, petrologic, and hydrothermal vent information from a spreading system in the Lau back-arc basin is integrated with multichannel seismic (MCS) data to ascertain where subduction zone influences begin to strongly shape the character of the spreading ridge. In visualizations of coastal environments, we combine high-resolution seismic CHIRP data with bathymetry, side-scan and MCS data, Landsat images, geological maps, and earthquake locations to look at slope stability in the Santa Barbara

  10. New Atmospheric and Oceanic Angular Momentum Datasets for Predictions of Earth Rotation/Polar Motion

    NASA Astrophysics Data System (ADS)

    Salstein, D. A.; Stamatakos, N.

    2014-12-01

    We review the state of the art in available datasets for both atmospheric angular momentum (AAM) and oceanic angular momentum (OAM) for the purposes of analysis and prediction of both polar motion and length-of-day series. Both analyses and forecasts of these quantities have been used separately and in combination to aid short- and medium-range predictions of Earth rotation parameters. The AAM and OAM combination, with the possible addition of hydrospheric angular momentum, can form a proxy index for the Earth rotation parameters themselves due to the conservation of angular momentum in the Earth system. Such a combination of the angular momentum of the geophysical fluids has helped forecasts within periods up to about 10 days, owing to the dynamic models, and, together with extended statistical predictions, forecasts of Earth rotation parameters out as far as 90 days, according to Dill et al. (2013). We assess other dataset combinations that can be used in such analysis and prediction efforts for the Earth rotation parameters, and demonstrate the corresponding skill levels in doing so.

  11. Toward a Comprehensive Carbon Budget for North America: Potential Applications of Adjoint Methods with Diverse Datasets

    NASA Technical Reports Server (NTRS)

    Andrews, A.

    2002-01-01

    A detailed mechanistic understanding of the sources and sinks of CO2 will be required to reliably predict future CO2 levels and climate. A commonly used technique for deriving information about CO2 exchange with surface reservoirs is to solve an "inverse problem," where CO2 observations are used with an atmospheric transport model to find the optimal distribution of sources and sinks. Synthesis inversion methods are powerful tools for addressing this question, but the results are disturbingly sensitive to the details of the calculation. Studies done using different atmospheric transport models and combinations of surface station data have produced substantially different distributions of surface fluxes. Adjoint methods are now being developed that will more effectively incorporate diverse datasets in estimates of surface fluxes of CO2. In an adjoint framework, it will be possible to combine CO2 concentration data from long-term surface monitoring stations with data from intensive field campaigns and with proposed future satellite observations. A major advantage of the adjoint approach is that meteorological and surface data, as well as data for other atmospheric constituents and pollutants, can be efficiently included in addition to observations of CO2 mixing ratios. This presentation will provide an overview of potentially useful datasets for carbon cycle research in general with an emphasis on planning for the North American Carbon Project. Areas of overlap with ongoing and proposed work on air quality/air pollution issues will be highlighted.

  12. Wide-Area Mapping of Forest with National Airborne Laser Scanning and Field Inventory Datasets

    NASA Astrophysics Data System (ADS)

    Monnet, J.-M.; Ginzler, C.; Clivaz, J.-C.

    2016-06-01

    Airborne laser scanning (ALS) remote sensing data are now available for entire countries such as Switzerland. Methods for the estimation of forest parameters from ALS have been intensively investigated in the past years. However, the implementation of a forest mapping workflow based on available data at a regional level remains challenging. A case study was implemented in the Canton of Valais (Switzerland). The national ALS dataset and field data of the Swiss National Forest Inventory were used to calibrate estimation models for mean and maximum height, basal area, stem density, mean diameter, and stem volume. When stratification was performed based on ALS acquisition settings and geographical criteria, satisfactory prediction models were obtained for volume (R2 = 0.61, with a root mean square error of 47%) and basal area (0.51 and 45%, respectively), while height variables had an error lower than 19%. This case study shows that the use of nationwide ALS and field datasets for forest resource mapping is cost efficient, but additional investigations are required to handle the limitations of the input data and optimize the accuracy.
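
    A minimal sketch of the calibration step, fitting a plot-level volume model from ALS predictors and reporting R2 and relative RMSE as above; the predictors and synthetic data are hypothetical, and the real workflow would add cross-validation and stratification:

      # Calibrate a plot-level volume model from ALS height metrics and report
      # R^2 and relative RMSE; predictor names and data are hypothetical.
      import numpy as np
      from sklearn.linear_model import LinearRegression
      from sklearn.metrics import r2_score

      rng = np.random.default_rng(7)
      n_plots = 200
      # Columns: mean canopy height (m), canopy cover fraction.
      als_metrics = rng.uniform([5, 0.2], [35, 0.9], size=(n_plots, 2))
      volume = 8 * als_metrics[:, 0] * als_metrics[:, 1] + rng.normal(0, 40, n_plots)

      model = LinearRegression().fit(als_metrics, volume)
      pred = model.predict(als_metrics)
      rel_rmse = 100 * np.sqrt(np.mean((pred - volume) ** 2)) / volume.mean()
      print(f"R2 = {r2_score(volume, pred):.2f}, relative RMSE = {rel_rmse:.0f}%")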

  13. Intercomparison and suitability of five Greenland topographic datasets for the purpose of hydrologic runoff modeling

    NASA Astrophysics Data System (ADS)

    Pitcher, L. H.; Smith, L. C.; Rennermalm, A. K.; Chu, V. W.; Gleason, C. J.; Yang, K.; Finnegan, D. C.; LeWinter, A. L.; Moller, D.; Moustafa, S.

    2012-12-01

    Rapid melting of the Greenland Ice Sheet (GrIS) and subsequent sea level rise have underscored the need for accurate modeling of hydrologic processes. Researchers rely on the accuracy of topography datasets for this purpose, especially in remote areas like Greenland where in situ validation data are difficult to acquire. A number of new remotely sensed Digital Elevation Models (DEMs) have recently become available for Greenland, but a comparative study of their respective quality and suitability for hydrologic modeling has not been undertaken. We examine five such remotely sensed DEMs acquired for proglacial and supraglacial ablation zones of Greenland, namely (1) WorldView stereo DEMs, (2) NASA GLISTIN-A experimental radar, (3) NASA/IceBridge Airborne Topographic Mapper (ATM), (4) Greenland Ice Mapping Project (GIMP) DEM, and (5) ASTER DEM. The quality, strengths, and weaknesses of these DEMs for GrIS hydrologic modeling are assessed through intercomparison and through comparison with in situ terrestrial lidar scanning data collected with precise RTK GPS control. Additionally, gridded bedrock (i.e., NASA/IceBridge Multichannel Coherent Radar Depth Sounder (MCoRDS); Bamber DEMs) and surface topography datasets are combined to create a hydraulic potentiometric surface for hydrologic modeling. Finally, the suitability of these combined topographic products for hydrologic modeling, characterization of GrIS meltwater runoff, and estimation of sub- and/or englacial pathways is explored.

  14. Establishing macroecological trait datasets: digitalization, extrapolation, and validation of diet preferences in terrestrial mammals worldwide

    PubMed Central

    Kissling, Wilm Daniel; Dalby, Lars; Fløjgaard, Camilla; Lenoir, Jonathan; Sandel, Brody; Sandom, Christopher; Trøjelsgaard, Kristian; Svenning, Jens-Christian

    2014-01-01

    Ecological trait data are essential for understanding the broad-scale distribution of biodiversity and its response to global change. For animals, diet represents a fundamental aspect of species’ evolutionary adaptations, ecological and functional roles, and trophic interactions. However, the importance of diet for macroevolutionary and macroecological dynamics remains little explored, partly because of the lack of comprehensive trait datasets. We compiled and evaluated a comprehensive global dataset of diet preferences of mammals (“MammalDIET”). Diet information was digitized from two global and cladewide data sources and errors of data entry by multiple data recorders were assessed. We then developed a hierarchical extrapolation procedure to fill in diet information for species with missing information. Missing data were extrapolated with information from other taxonomic levels (genus, other species within the same genus, or family) and this extrapolation was subsequently validated both internally (with a jack-knife approach applied to the compiled species-level diet data) and externally (using independent species-level diet information from a comprehensive continentwide data source). Finally, we grouped mammal species into trophic levels and dietary guilds, and their species richness as well as their proportion of total richness were mapped at a global scale for those diet categories with good validation results. The success rate of correctly digitizing data was 94%, indicating that the consistency in data entry among multiple recorders was high. Data sources provided species-level diet information for a total of 2033 species (38% of all 5364 terrestrial mammal species, based on the IUCN taxonomy). For the remaining 3331 species, diet information was mostly extrapolated from genus-level diet information (48% of all terrestrial mammal species), and only rarely from other species within the same genus (6%) or from family level (8%). Internal and external

  15. Establishing macroecological trait datasets: digitalization, extrapolation, and validation of diet preferences in terrestrial mammals worldwide.

    PubMed

    Kissling, Wilm Daniel; Dalby, Lars; Fløjgaard, Camilla; Lenoir, Jonathan; Sandel, Brody; Sandom, Christopher; Trøjelsgaard, Kristian; Svenning, Jens-Christian

    2014-07-01

    Ecological trait data are essential for understanding the broad-scale distribution of biodiversity and its response to global change. For animals, diet represents a fundamental aspect of species' evolutionary adaptations, ecological and functional roles, and trophic interactions. However, the importance of diet for macroevolutionary and macroecological dynamics remains little explored, partly because of the lack of comprehensive trait datasets. We compiled and evaluated a comprehensive global dataset of diet preferences of mammals ("MammalDIET"). Diet information was digitized from two global and cladewide data sources and errors of data entry by multiple data recorders were assessed. We then developed a hierarchical extrapolation procedure to fill in diet information for species with missing information. Missing data were extrapolated with information from other taxonomic levels (genus, other species within the same genus, or family) and this extrapolation was subsequently validated both internally (with a jack-knife approach applied to the compiled species-level diet data) and externally (using independent species-level diet information from a comprehensive continentwide data source). Finally, we grouped mammal species into trophic levels and dietary guilds, and their species richness as well as their proportion of total richness were mapped at a global scale for those diet categories with good validation results. The success rate of correctly digitizing data was 94%, indicating that the consistency in data entry among multiple recorders was high. Data sources provided species-level diet information for a total of 2033 species (38% of all 5364 terrestrial mammal species, based on the IUCN taxonomy). For the remaining 3331 species, diet information was mostly extrapolated from genus-level diet information (48% of all terrestrial mammal species), and only rarely from other species within the same genus (6%) or from family level (8%). Internal and external
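
    The hierarchical extrapolation rule, prefer species-level diet data and then fall back to genus- and family-level information, maps naturally onto a cascade of lookups. A minimal Python sketch with a toy taxonomy (all names and diet labels hypothetical):

      # Hierarchical fill-in: species-level diet if known, else genus, else
      # family. The toy taxonomy and diet labels are hypothetical.
      species_diet = {"Panthera leo": "carnivore"}           # species-level data
      genus_diet = {"Panthera": "carnivore", "Mus": "omnivore"}
      family_diet = {"Felidae": "carnivore", "Muridae": "omnivore"}

      def extrapolate_diet(species, genus, family):
          """Return (diet, level of evidence), preferring the most specific source."""
          for level, table, key in (("species", species_diet, species),
                                    ("genus", genus_diet, genus),
                                    ("family", family_diet, family)):
              if key in table:
                  return table[key], level
          return None, "unknown"

      print(extrapolate_diet("Panthera tigris", "Panthera", "Felidae"))  # genus level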

  16. Revisiting Frazier's subdeltas: enhancing datasets with dimensionality, better to understand geologic systems

    USGS Publications Warehouse

    Flocks, James

    2006-01-01

    Scientific knowledge from the past century is commonly represented by two-dimensional figures and graphs, as presented in manuscripts and maps. Using today's computer technology, this information can be extracted and projected into three- and four-dimensional perspectives. Computer models can be applied to datasets to provide additional insight into complex spatial and temporal systems. This process can be demonstrated by applying digitizing and modeling techniques to valuable information within widely used publications. The seminal paper by D. Frazier, published in 1967, identified 16 separate delta lobes formed by the Mississippi River during the past 6,000 yrs. The paper includes stratigraphic descriptions through geologic cross-sections, and provides distribution and chronologies of the delta lobes. The data from Frazier's publication are extensively referenced in the literature. Additional information can be extracted from the data through computer modeling. Digitizing and geo-rectifying Frazier's geologic cross-sections produce a three-dimensional perspective of the delta lobes. Adding the chronological data included in the report provides the fourth-dimension of the delta cycles, which can be visualized through computer-generated animation. Supplemental information can be added to the model, such as post-abandonment subsidence of the delta-lobe surface. Analyzing the regional, net surface-elevation balance between delta progradations and land subsidence is computationally intensive. By visualizing this process during the past 4,500 yrs through multi-dimensional animation, the importance of sediment compaction in influencing both the shape and direction of subsequent delta progradations becomes apparent. Visualization enhances a classic dataset, and can be further refined using additional data, as well as provide a guide for identifying future areas of study.

  17. Data Discovery of Big and Diverse Climate Change Datasets - Options, Practices and Challenges

    NASA Astrophysics Data System (ADS)

    Palanisamy, G.; Boden, T.; McCord, R. A.; Frame, M. T.

    2013-12-01

    Developing data search tools is a very common, but often confusing, task for most data-intensive scientific projects. These search interfaces need to be continually improved to handle the ever-increasing diversity and volume of data collections. There are many aspects which determine the type of search tool a project needs to provide to its user community. These include: the number of datasets; the amount and consistency of discovery metadata; ancillary information such as the availability of quality information and provenance; and the availability of similar datasets from other distributed sources. The Environmental Data Science and Systems (EDSS) group within the Environmental Science Division at the Oak Ridge National Laboratory has a long history of successfully managing diverse and big observational datasets for various scientific programs via various data centers such as DOE's Atmospheric Radiation Measurement Program (ARM), DOE's Carbon Dioxide Information and Analysis Center (CDIAC), USGS's Core Science Analytics and Synthesis (CSAS) metadata Clearinghouse, and NASA's Distributed Active Archive Center (ORNL DAAC). This talk will showcase some of the recent developments for improving data discovery within these centers. The DOE ARM program recently developed a data discovery tool which allows users to search and discover over 4000 observational datasets. These datasets are key to research efforts related to global climate change. The ARM discovery tool features many new functions such as filtered and faceted search logic, multi-pass data selection, filtering data based on data quality, graphical views of data quality and availability, direct access to data quality reports, and data plots. The ARM Archive also provides discovery metadata to other broader metadata clearinghouses such as ESGF, IASOA, and GOS. In addition to the new interface, ARM is also currently working on providing DOI metadata records to publishers such as Thomson Reuters and Elsevier. The ARM

  18. Web-based Data Information and Sharing System Using Mars Remotely Sensed Datasets

    NASA Astrophysics Data System (ADS)

    Necsoiu, M.; Dinwiddie, C. L.; Colton, S.; Coleman, N. M.

    2004-05-01

    to 2S) outflow channels. For Walla Walla Vallis (name provisionally approved by the International Astronomical Union), a small outflow channel, the integrated datasets helped resolve the locations of reaches that were indistinct in visible-light images. For Ravi Vallis, the composite data system enhanced our understanding of how some chaotic terrain forms. As presented by Coleman, N.M. (2004 Lunar and Planetary Science Conference, Abstract #1299), thinning of the cryosphere by deep fluvial incision spawned secondary breakouts of groundwater, forming new chaos zones. The system's flexible design allows for incorporation of additional remote sensing datasets, such as those provided by the MOC, TES, and MARSIS instruments. In summary, our integrated data-access system will make the wealth of new Martian data more readily available to planetary researchers, enabling scientists to focus more time on analyses and algorithm development rather than on finding data and converting formats. Disclaimer: An employee of the U.S. Nuclear Regulatory Commission (NRC) made contributions to this work on his own time apart from regular duties. NRC has neither approved nor disapproved the technical content of this abstract.

  19. Polyimide processing additives

    NASA Technical Reports Server (NTRS)

    Fletcher, James C. (Inventor); Pratt, J. Richard (Inventor); St.clair, Terry L. (Inventor); Stoakley, Diane M. (Inventor); Burks, Harold D. (Inventor)

    1992-01-01

    A process for preparing polyimides having enhanced melt flow properties is described. The process consists of heating a mixture of a high molecular weight poly-(amic acid) or polyimide with a low molecular weight amic acid or imide additive in the range of 0.05 to 15 percent by weight of additive. The polyimide powders so obtained show improved processability, as evidenced by lower melt viscosity by capillary rheometry. Likewise, films prepared from mixtures of polymers with additives show improved processability with earlier onset of stretching by TMA.

  20. Polyimide processing additives

    NASA Technical Reports Server (NTRS)

    Pratt, J. Richard (Inventor); St.clair, Terry L. (Inventor); Stoakley, Diane M. (Inventor); Burks, Harold D. (Inventor)

    1993-01-01

    A process for preparing polyimides having enhanced melt flow properties is described. The process consists of heating a mixture of a high molecular weight poly-(amic acid) or polyimide with a low molecular weight amic acid or imide additive in the range of 0.05 to 15 percent by weight of the additive. The polyimide powders so obtained show improved processability, as evidenced by lower melt viscosity by capillary rheometry. Likewise, films prepared from mixtures of polymers with additives show improved processability with earlier onset of stretching by TMA.

  1. Restoration and Recalibration of the Viking MAWD Datasets

    NASA Astrophysics Data System (ADS)

    Nuno, R. G.; Paige, D. A.; Sullivan, M.

    2014-12-01

    High-resolution HIRISE images of transient dark albedo features, called Recurring Slope Lineae (RSL), have been interpreted as evidence for current hydrological activity [1]. If there are surface sources of water, then localized plumes of atmospheric water may be observable from orbit. The Viking MAWD column water vapor data are uniquely valuable for this purpose because they cover the full range of Martian local times and include data sampled at high spatial resolution [2]. They are also accompanied by simultaneous surface and atmospheric temperatures acquired by the Viking Infrared Thermal Mapper (IRTM) instruments. We searched the raster-averaged Viking Orbiter 1 and 2 MAWD column water vapor dataset for regions of localized elevated column water vapor abundances and found mid-latitude regions with transient water observations [3]. The raster-averaged Viking Orbiter 1 and 2 MAWD column water vapor data available in the Planetary Data System (PDS) were calculated from radiance measurements using seasonally and topographically varying surface pressures which, at the time, had high uncertainties [4]. Due to recent interest in transient hydrological activity on Mars [2], we decoded the non-raster-averaged Viking MAWD dataset, which is sampled at 15 times higher spatial resolution than the data currently available from the PDS. This new dataset is being used to recalculate column water vapor abundances using current topographical data, as well as dust and pressure measurements from the Mars Global Circulation Model. References: [1] McEwen, A. S., et al. (2011). Seasonal flows on warm Martian slopes. Science, 333(6043), 740-3. [2] Farmer, C. B., & Laporte, D. D. (1972). The Detection and Mapping of Water Vapor in the Martian Atmosphere. Icarus. [3] Nuno, R. G., et al. (2013). Searching for Localized Water Vapor Sources on Mars Utilizing Viking MAWD Data. 44th Lunar and Planetary Science Conference. [4] Farmer, C. B., et al. (1977
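
    The search for localized elevated column water vapor abundances can be sketched minimally as flagging grid cells that exceed a local background estimate. The grid values, units, and threshold below are synthetic stand-ins for the MAWD maps, not the actual retrieval.

        # A sketch only: synthetic column water vapor grid with an injected plume.
        import numpy as np
        from scipy.ndimage import median_filter

        rng = np.random.default_rng(0)
        pr_um = rng.normal(10.0, 1.0, size=(90, 180))   # precipitable microns, lat x lon
        pr_um[40:43, 100:103] += 6.0                     # synthetic localized plume

        background = median_filter(pr_um, size=9)        # local background estimate
        anomaly = pr_um - background
        plume_mask = anomaly > 3.0                       # assumed detection threshold

        lat_idx, lon_idx = np.nonzero(plume_mask)
        print(f"{plume_mask.sum()} anomalous cells, e.g. at grid ({lat_idx[0]}, {lon_idx[0]})")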

  2. Smed454 dataset: unravelling the transcriptome of Schmidtea mediterranea

    PubMed Central

    2010-01-01

    Background: Freshwater planarians are an attractive model for regeneration and stem cell research and have become a promising tool in the field of regenerative medicine. With the availability of a sequenced planarian genome, the recent application of modern genetic and high-throughput tools has resulted in revitalized interest in these animals, long known for their amazing regenerative capabilities, which enable them to regrow even a new head after decapitation. However, a detailed description of the planarian transcriptome is essential for future investigation into regenerative processes using planarians as a model system. Results: In order to complement and improve existing gene annotations, we used a 454 pyrosequencing approach to analyze the transcriptome of the planarian species Schmidtea mediterranea. Altogether, 598,435 454-sequencing reads, with an average length of 327 bp, were assembled together with the ~10,000 sequences of the S. mediterranea UniGene set using different similarity cutoffs. The assembly was then mapped onto the current genome data. Remarkably, our Smed454 dataset contains more than 3 million novel transcribed nucleotides sequenced for the first time. A descriptive analysis of planarian splice sites was conducted on those Smed454 contigs that mapped unambiguously to the current genome assembly. Sequence analysis allowed us to identify genes encoding putative proteins with defined structural properties, such as transmembrane domains. Moreover, we annotated the Smed454 dataset using Gene Ontology, and identified putative homologues of several gene families that may play a key role during regeneration, such as neurotransmitter and hormone receptors, homeobox-containing genes, and genes related to eye function. Conclusions: We report the first planarian transcript dataset, Smed454, as an open resource tool that can be accessed via a web interface. Smed454 contains significant novel sequence information about most expressed genes of S. mediterranea

  3. NERIES: Seismic Data Gateways and User Composed Datasets Metadata Management

    NASA Astrophysics Data System (ADS)

    Spinuso, Alessandro; Trani, Luca; Kamb, Linus; Frobert, Laurent

    2010-05-01

    One of the main objectives of the NERIES EC project is to establish and improve the networking of seismic waveform data exchange and access among four main data centers in Europe: INGV, GFZ, ORFEUS and IPGP. Besides the implementation of the data backbone, several investigations and developments have been conducted in order to offer users the data available from this network, either programmatically or interactively. One of the challenges is to understand how to enable user activities such as discovering, aggregating, describing and sharing datasets, in order to reduce the replication of similar data queries towards the network and spare the data centers from having to guess at and create useful pre-packaged products. We have started to transfer this task increasingly towards the user community, where user-composed data products can be extensively re-used. The main link to the data is a centralized webservice (SeismoLink) acting as a single access point to the whole data network. Users can download either waveform data or seismic station inventories directly from their own software routines by connecting to this webservice, which routes the request to the data centers. The provenance of the data is maintained and transferred to the users in the form of URIs, which identify the dataset and implicitly refer to the data provider. SeismoLink, combined with other webservices (e.g., the EMSC-QuakeML earthquake catalog service), is used from a community gateway such as the NERIES web portal (http://www.seismicportal.eu). Here the user interacts with a map-based portlet which allows the dynamic composition of a data product, binding seismic event parameters with a set of seismic stations. The requested data is collected by the back-end processes of the portal, preserved and offered to the user in a personal data cart, where metadata can be generated interactively on-demand. The metadata, expressed in RDF, can also be remotely ingested. They offer rating
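
    As an illustration of metadata expressed in RDF for a user-composed data product, the following sketch uses rdflib. The URIs and property names are hypothetical placeholders, not the actual NERIES vocabulary.

        # A sketch only: hypothetical URIs and properties for a composed dataset.
        from rdflib import Graph, Literal, Namespace, RDF, URIRef
        from rdflib.namespace import DC

        EX = Namespace("http://example.org/neries/")
        g = Graph()

        ds = URIRef("http://example.org/neries/dataset/42")
        g.add((ds, RDF.type, EX.ComposedDataset))
        g.add((ds, DC.creator, Literal("some.user")))
        g.add((ds, DC.source, URIRef("http://www.seismicportal.eu")))  # provenance link
        g.add((ds, EX.event, Literal("quake-2009-xyz")))               # bound event id
        g.add((ds, EX.station, Literal("IU.KONO")))                    # bound station code

        print(g.serialize(format="turtle"))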

  4. Dataset used to improve liquid water absorption models in the microwave

    SciTech Connect

    Turner, David

    2015-12-14

    Two datasets, one a compilation of laboratory data and one a compilation from three field sites, are provided here. These datasets provide measurements of the real and imaginary refractive indices and absorption as a function of cloud temperature. These datasets were used in the development of the new liquid water absorption model that was published in Turner et al. 2015.

  5. Smog control fuel additives

    SciTech Connect

    Lundby, W.

    1993-06-29

    A method is described for controlling, reducing, or eliminating ozone and related smog resulting from photochemical reactions between ozone and automotive or industrial gases, comprising the addition of iodine or compounds of iodine to hydrocarbon-base fuels prior to or during combustion, in an amount of about 1 part iodine per 240 to 10,000,000 parts fuel, by weight, to be accomplished by: (a) the addition of these inhibitors during or after the refining or manufacturing process of liquid fuels; (b) the production of these inhibitors for addition into fuel tanks, such as automotive or industrial tanks; or (c) the addition of these inhibitors into combustion chambers of equipment utilizing solid fuels for the purpose of reducing ozone.

  6. Food Additives and Hyperkinesis

    ERIC Educational Resources Information Center

    Wender, Ester H.

    1977-01-01

    The hypothesis that food additives are causally associated with hyperkinesis and learning disabilities in children is reviewed, and available data are summarized. Available from: American Medical Association 535 North Dearborn Street Chicago, Illinois 60610. (JG)

  7. Additional Types of Neuropathy

    MedlinePlus

    Charcot's Joint, also called neuropathic arthropathy, ... can stop bone destruction and aid healing. Cranial neuropathy affects the 12 pairs of nerves ...

  8. Knowledge representation for platform-independent structured reporting.

    PubMed Central

    Kahn, C. E.; Huynh, P. N.

    1996-01-01

    Structured reporting systems allow health care providers to record observations using predetermined data elements and formats. We present a generalized language, based on the Standard Generalized Markup Language (SGML), for platform-independent structured reporting. DRML (Data-entry and Report Markup Language) specifies hierarchically organized concepts to be included in data-entry forms and reports. DRML documents serve as the knowledge base for SPIDER, a reporting system that uses the World Wide Web as its data-entry medium. SPIDER generates platform-independent documents that incorporate familiar data-entry objects such as text windows, checkboxes, and radio buttons. From the data entered on these forms, SPIDER uses its knowledge base to generate outline-format textual reports, and creates datasets for analysis of aggregate results. DRML allows knowledge engineers to design a wide variety of clinical reports and survey instruments. PMID:8947712
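
    The core DRML idea, one hierarchical concept tree driving both the data-entry form and the outline-format report, can be sketched as follows. The concept tree and field types are invented for illustration and are not the actual DRML element set.

        # A sketch only: a toy concept tree rendered as a form and as a report.
        concepts = {
            "name": "Chest Radiograph",
            "children": [
                {"name": "Heart", "field": "choice", "options": ["normal", "enlarged"]},
                {"name": "Lungs", "field": "text"},
            ],
        }

        def render_form(node, depth=0):
            """Emit a crude text data-entry form from the concept tree."""
            field = node.get("field")
            hint = f" [{field}: {node.get('options', 'free text')}]" if field else ""
            print("  " * depth + node["name"] + hint)
            for child in node.get("children", []):
                render_form(child, depth + 1)

        def render_report(node, values, depth=0):
            """Emit an outline-format report for the values entered on the form."""
            entry = values.get(node["name"])
            print("  " * depth + node["name"] + (f": {entry}" if entry else ""))
            for child in node.get("children", []):
                render_report(child, values, depth + 1)

        render_form(concepts)
        render_report(concepts, {"Heart": "normal", "Lungs": "clear"})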

  9. A comparison of absolute performance of different correlative and mechanistic species distribution models in an independent area.

    PubMed

    Shabani, Farzin; Kumar, Lalit; Ahmadi, Mohsen

    2016-08-01

    To investigate the comparative abilities of six different bioclimatic models in an independent area, we utilized the distributions of eight different species available at a global scale and in Australia. We tested a variety of bioclimatic models for eight different plant species employing five discriminatory correlative species distribution models (SDMs): Generalized Linear Model (GLM), MaxEnt, Random Forest (RF), Boosted Regression Tree (BRT), and Bioclim, together with CLIMEX (CL) as a mechanistic niche model. These models were fitted using a training dataset of available global data, but with the exclusion of Australian locations. The capabilities of these techniques in projecting suitable climate, based on independent records for these species in Australia, were compared. Thus, Australia was not used to calibrate the models and therefore serves as an independent area with respect to geographic locations. To assess and compare performance, we utilized the area under the receiver operating characteristic (ROC) curve (AUC), the true skill statistic (TSS), and fractional predicted areas for all SDMs. In addition, we assessed the agreement between the outputs of the six different bioclimatic models for all eight species in Australia. The modeling method impacted potential distribution predictions under current climate. However, the utilization of sensitivity and the fractional predicted areas showed that GLM, MaxEnt, Bioclim, and CL had the highest sensitivity for Australian climate conditions. Bioclim calculated the highest fractional predicted area of an independent area, while RF and BRT were poor. For many applications, it is difficult to decide which bioclimatic model to use. This research shows that variable results are obtained using different SDMs in an independent area. It also shows that the SDMs produce different results for different species; for example, Bioclim may not be good for one species but works better
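
    The two headline metrics used here can be computed as in the following sketch: AUC from continuous suitability scores, and TSS, defined as sensitivity + specificity - 1, from thresholded presence/absence predictions. The data and threshold below are synthetic.

        # A sketch only: synthetic presence/absence records and model scores.
        import numpy as np
        from sklearn.metrics import confusion_matrix, roc_auc_score

        rng = np.random.default_rng(1)
        observed = rng.integers(0, 2, 200)                                  # presence/absence
        score = np.clip(observed * 0.4 + rng.normal(0.4, 0.2, 200), 0, 1)   # model suitability

        auc = roc_auc_score(observed, score)

        predicted = (score >= 0.5).astype(int)                              # assumed threshold
        tn, fp, fn, tp = confusion_matrix(observed, predicted).ravel()
        sensitivity = tp / (tp + fn)
        specificity = tn / (tn + fp)
        tss = sensitivity + specificity - 1

        print(f"AUC = {auc:.2f}, TSS = {tss:.2f}")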

  10. Hypersonic Turbulent Boundary-Layer and Free Shear Database Datasets

    NASA Technical Reports Server (NTRS)

    Settles, Gary S.; Dodson, Lori J.

    1993-01-01

    A critical assessment and compilation of data are presented on attached hypersonic turbulent boundary layers in pressure gradients and on compressible turbulent mixing layers. Extensive searches were conducted to identify candidate experiments, which were subjected to a rigorous set of acceptance criteria. Accepted datasets are both tabulated and provided in machine-readable form. The purpose of this database effort is to make existing high-quality data available in detailed form for the turbulence-modeling and computational fluid dynamics communities. While significant recent data were found on the subject of compressible turbulent mixing, the available boundary-layer/pressure-gradient experiments are all older ones, and no acceptable data were found at hypersonic Mach numbers.

  11. A Discretized Method for Deriving Vortex Impulse from Volumetric Datasets

    NASA Astrophysics Data System (ADS)

    Buckman, Noam; Mendelson, Leah; Techet, Alexandra

    2015-11-01

    Many biological and mechanical systems transfer momentum through a fluid by creating vortical structures. To study this mechanism, we derive a method for extracting impulse and its time derivative from flow fields observed in experiments and simulations. We begin by discretizing a thin-cored vortex filament, and extend the model to account for finite vortex core thickness and asymmetric distributions of vorticity. Because it uses only velocity fields to extract vortex cores and calculate circulation, this method is applicable to 3D PIV datasets, even with low-spatial-resolution flow fields and measurement noise. To assess the performance of the analysis method, we simulate vortex rings and arbitrary vortex structures using the OpenFOAM computational fluid dynamics software and analyze the wake momentum in order to validate the method. We further examine a piston-vortex experiment, using 3D synthetic aperture particle image velocimetry (SAPIV) to capture velocity fields. Strengths, limitations, and improvements to the framework are discussed.
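
    The quantity being extracted is the hydrodynamic impulse, I = (ρ/2) ∫ x × ω dV. The sketch below evaluates this integral directly on a gridded, synthetic velocity field using central-difference vorticity; the paper's filament-based discretization is more elaborate than this direct volumetric sum.

        # A sketch only: synthetic swirling flow on a uniform grid.
        import numpy as np

        n, L, rho = 32, 1.0, 1000.0
        axis = np.linspace(-L, L, n)
        dx = axis[1] - axis[0]
        X, Y, Z = np.meshgrid(axis, axis, axis, indexing="ij")

        # Crude vortex-like velocity field (u, v, w).
        envelope = np.exp(-(X**2 + Y**2 + Z**2))
        u, v, w = -Y * envelope, X * envelope, np.zeros_like(X)

        # Vorticity omega = curl(u, v, w) via central differences.
        du = np.gradient(u, dx); dv = np.gradient(v, dx); dw = np.gradient(w, dx)
        wx = dw[1] - dv[2]   # dw/dy - dv/dz
        wy = du[2] - dw[0]   # du/dz - dw/dx
        wz = dv[0] - du[1]   # dv/dx - du/dy

        # I = (rho/2) * sum over cells of (x cross omega) * dV
        dV = dx**3
        Ix = 0.5 * rho * np.sum(Y * wz - Z * wy) * dV
        Iy = 0.5 * rho * np.sum(Z * wx - X * wz) * dV
        Iz = 0.5 * rho * np.sum(X * wy - Y * wx) * dV
        print(f"impulse = ({Ix:.3g}, {Iy:.3g}, {Iz:.3g}) kg m/s")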

  12. Feedstock Logistics Datasets from DOE's Bioenergy Knowledge Discovery Framework (KDF)

    DOE Data Explorer

    The Bioenergy Knowledge Discovery Framework invites users to discover the power of bioenergy through an interface that provides extensive access to research data and literature, GIS mapping tools, and collaborative networks. The Bioenergy KDF supports efforts to develop a robust and sustainable bioenergy industry. The KDF facilitates informed decision making by providing a means to synthesize, analyze, and visualize vast amounts of information in a relevant and succinct manner. It harnesses Web 2.0 and social networking technologies to build a collective knowledge system that can better examine the economic and environmental impacts of development options for biomass feedstock production, biorefineries, and related infrastructure. Holdings include datasets, models, and maps. [from https://www.bioenergykdf.net/content/about]

  13. Biofuel Production Datasets from DOE's Bioenergy Knowledge Discovery Framework (KDF)

    DOE Data Explorer

    The Bioenergy Knowledge Discovery Framework invites users to discover the power of bioenergy through an interface that provides extensive access to research data and literature, GIS mapping tools, and collaborative networks. The Bioenergy KDF supports efforts to develop a robust and sustainable bioenergy industry. The KDF facilitates informed decision making by providing a means to synthesize, analyze, and visualize vast amounts of information in a relevant and succinct manner. It harnesses Web 2.0 and social networking technologies to build a collective knowledge system that can better examine the economic and environmental impacts of development options for biomass feedstock production, biorefineries, and related infrastructure. [copied from https://www.bioenergykdf.net/content/about]

    Holdings include datasets, models, and maps, and the collections are growing due to both DOE contributions and data uploads from individuals.

  14. Biofuel Distribution Datasets from the Bioenergy Knowledge Discovery Framework

    DOE Data Explorer

    The Bioenergy Knowledge Discovery Framework invites users to discover the power of bioenergy through an interface that provides extensive access to research data and literature, GIS mapping tools, and collaborative networks. The Bioenergy KDF supports efforts to develop a robust and sustainable bioenergy industry. The KDF facilitates informed decision making by providing a means to synthesize, analyze, and visualize vast amounts of information in a relevant and succinct manner. It harnesses Web 2.0 and social networking technologies to build a collective knowledge system that can better examine the economic and environmental impacts of development options for biomass feedstock production, biorefineries, and related infrastructure. [copied from https://www.bioenergykdf.net/content/about] Holdings include datasets, models, and maps and the collections are growing due to both DOE contributions and individuals' data uploads.

  15. Feedstock Production Datasets from the Bioenergy Knowledge Discovery Framework

    DOE Data Explorer

    The Bioenergy Knowledge Discovery Framework invites users to discover the power of bioenergy through an interface that provides extensive access to research data and literature, GIS mapping tools, and collaborative networks. The Bioenergy KDF supports efforts to develop a robust and sustainable bioenergy industry. The KDF facilitates informed decision making by providing a means to synthesize, analyze, and visualize vast amounts of information in a relevant and succinct manner. It harnesses Web 2.0 and social networking technologies to build a collective knowledge system that can better examine the economic and environmental impacts of development options for biomass feedstock production, biorefineries, and related infrastructure. [copied from https://www.bioenergykdf.net/content/about] Holdings include datasets, models, and maps and the collections are growing due to both DOE contributions and data uploads from individuals.

  16. MATCH: Metadata Access Tool for Climate and Health Datasets

    DOE Data Explorer

    MATCH is a searchable clearinghouse of publicly available Federal metadata (i.e. data about data) and links to datasets. Most metadata on MATCH pertain to geospatial data sets ranging from local to global scales. The goals of MATCH are to: 1) Provide an easily accessible clearinghouse of relevant Federal metadata on climate and health that will increase efficiency in solving research problems; 2) Promote application of research and information to understand, mitigate, and adapt to the health effects of climate change; 3) Facilitate multidirectional communication among interested stakeholders to inform and shape Federal research directions; 4) Encourage collaboration among traditional and non-traditional partners in development of new initiatives to address emerging climate and health issues. [copied from http://match.globalchange.gov/geoportal/catalog/content/about.page]

  17. Orthology Detection Combining Clustering and Synteny for Very Large Datasets

    PubMed Central

    Lechner, Marcus; Hernandez-Rosales, Maribel; Doerr, Daniel; Wieseke, Nicolas; Thévenin, Annelyse; Stoye, Jens; Hartmann, Roland K.; Prohaska, Sonja J.; Stadler, Peter F.

    2014-01-01

    The elucidation of orthology relationships is an important step both in gene function prediction and towards understanding patterns of sequence evolution. For large datasets, orthology assignments are usually derived directly from sequence similarities, because more exact approaches incur prohibitively high computational costs. Here we present PoFF, an extension for the standalone tool Proteinortho, which enhances orthology detection by combining clustering, sequence similarity, and synteny. In the course of this work, FFAdj-MCS, a heuristic that assesses pairwise gene order using adjacencies (a similarity measure related to the breakpoint distance), was adapted to support multiple linear chromosomes and extended to detect duplicated regions. PoFF largely reduces the number of false positives and enables more fine-grained predictions than purely similarity-based approaches. The extension maintains the low memory requirements and the efficient concurrency options of its basis Proteinortho, making the software applicable to very large datasets. PMID:25137074
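
    The adjacency-based similarity idea can be illustrated minimally: count the gene adjacencies shared between two genomes' gene orders. The sketch below ignores gene orientation for brevity; FFAdj-MCS itself is a more involved matching heuristic.

        # A sketch only: toy gene orders, orientation-insensitive adjacencies.
        def adjacencies(order):
            """Set of unordered neighboring gene pairs in a linear gene order."""
            return {frozenset(pair) for pair in zip(order, order[1:])}

        def shared_adjacencies(order_a, order_b):
            """Count adjacencies conserved between two gene orders."""
            return len(adjacencies(order_a) & adjacencies(order_b))

        genome_a = ["g1", "g2", "g3", "g4", "g5"]
        genome_b = ["g2", "g1", "g3", "g5", "g4"]   # two local rearrangements
        print(shared_adjacencies(genome_a, genome_b))  # 2: {g1,g2} and {g4,g5}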

  18. Recovering complete and draft population genomes from metagenome datasets

    DOE PAGESBeta

    Sangwan, Naseer; Xia, Fangfang; Gilbert, Jack A.

    2016-03-08

    Assembly of metagenomic sequence data into microbial genomes is of fundamental value to improving our understanding of microbial ecology and metabolism by elucidating the functional potential of hard-to-culture microorganisms. Here, we provide a synthesis of available methods to bin metagenomic contigs into species-level groups and highlight how genetic diversity, sequencing depth, and coverage influence binning success. Despite the computational cost of application to deeply sequenced complex metagenomes (e.g., soil), covarying patterns of contig coverage across multiple datasets significantly improve the binning process. We also discuss and compare current genome validation methods and reveal how these methods tackle the problem of chimeric genome bins, i.e., sequences from multiple species. Finally, we explore how population genome assembly can be used to uncover biogeographic trends and to characterize the effect of in situ functional constraints on genome-wide evolution.
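
    The covarying-coverage signal can be sketched as follows: contigs whose read depth rises and falls together across samples likely derive from the same genome, so clustering coverage profiles recovers candidate bins. The data and clustering choices below are synthetic illustrations, not a production binner, which would also use sequence composition.

        # A sketch only: 3 synthetic genomes, 6 samples, 40 contigs per genome.
        import numpy as np
        from scipy.cluster.hierarchy import fcluster, linkage

        rng = np.random.default_rng(2)
        genome_profiles = rng.uniform(5, 50, size=(3, 6))          # genome x sample depth
        coverage = np.vstack([p * rng.normal(1, 0.05, size=(40, 6))
                              for p in genome_profiles])            # 120 contigs x 6 samples

        # Correlate coverage profiles and cluster on (1 - correlation) distance.
        corr = np.corrcoef(coverage)
        dist = 1 - corr[np.triu_indices_from(corr, k=1)]            # condensed distances
        bins = fcluster(linkage(dist, method="average"), t=0.5, criterion="distance")
        print(f"{len(set(bins))} bins recovered")                   # ideally 3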

  19. ConStrains identifies microbial strains in metagenomic datasets.

    PubMed

    Luo, Chengwei; Knight, Rob; Siljander, Heli; Knip, Mikael; Xavier, Ramnik J; Gevers, Dirk

    2015-10-01

    An important fraction of microbial diversity is harbored in strain individuality, so identification of conspecific bacterial strains is imperative for improved understanding of microbial community functions. Limitations in bioinformatics and sequencing technologies have to date precluded strain identification owing to difficulties in phasing short reads to faithfully recover the original strain-level genotypes, which have highly similar sequences. We present ConStrains, an open-source algorithm that identifies conspecific strains from metagenomic sequence data and reconstructs the phylogeny of these strains in microbial communities. The algorithm uses single-nucleotide polymorphism (SNP) patterns in a set of universal genes to infer within-species structures that represent strains. Applying ConStrains to simulated and host-derived datasets provides insights into microbial community dynamics. PMID:26344404

  20. Incorporating the TRMM Dataset into the GPM Mission Data Suite

    NASA Technical Reports Server (NTRS)

    Stocker, Erich Franz; Ji, Yimin; Chou, Joyce; Kelley, Owen; Kwiatkowski, John; Stout, John

    2016-01-01

    In June 2015, the TRMM satellite came to the end of its mission. The 17-plus years of mission data that it provided have proven a valuable asset to a variety of science communities. This 17-plus-year dataset does not, however, stagnate with the end of the mission itself. NASA and JAXA intend to integrate the TRMM dataset into the data suite of the GPM mission. This will ensure the creation of a consistent, intercalibrated, accurate dataset within GPM that extends back to November 1998. This paper describes the plans for incorporating the 17-plus years of TRMM data into the GPM data suite. These plans call for using GPM algorithms for both the radiometer and the radar to reprocess TRMM data, as well as intercalibrating partner radiometers using GPM intercalibration techniques. This reprocessing will mean changes in content, logical format, and physical format, as well as improved geolocation, sensor corrections, and retrieval techniques.
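
    A minimal sketch of the intercalibration step: fit a linear correction that maps a partner radiometer's brightness temperatures onto a calibrated reference over coincident observations. GPM's actual intercalibration scheme is more sophisticated; the matched observations below are synthetic.

        # A sketch only: synthetic matched brightness temperatures.
        import numpy as np

        rng = np.random.default_rng(3)
        tb_ref = rng.uniform(180, 280, 500)                          # reference Tb (K)
        tb_partner = 1.02 * tb_ref - 3.5 + rng.normal(0, 0.5, 500)   # biased partner sensor

        slope, offset = np.polyfit(tb_partner, tb_ref, 1)            # linear correction fit
        tb_corrected = slope * tb_partner + offset

        print(f"Tb_ref = {slope:.3f} * Tb_partner + {offset:.2f}")
        print(f"residual std: {np.std(tb_corrected - tb_ref):.2f} K")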